<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">26999</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2022.026999</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Multi-Scale Attention-Based Deep Neural Network for Brain Disease Diagnosis</article-title>
<alt-title alt-title-type="left-running-head">Multi-Scale Attention-Based Deep Neural Network for Brain Disease Diagnosis</alt-title>
<alt-title alt-title-type="right-running-head">Multi-Scale Attention-Based Deep Neural Network for Brain Disease Diagnosis</alt-title>
</title-group>
<contrib-group content-type="authors">
<contrib id="author-1" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Liang</surname><given-names>Yin</given-names></name><xref ref-type="aff" rid="aff-1">1</xref><email>yinliang@bjut.edu.cn</email>
</contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Xu</surname><given-names>Gaoxu</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western"><surname>Rehman</surname><given-names>Sadaqat ur</given-names></name><xref ref-type="aff" rid="aff-2">2</xref></contrib>
<aff id="aff-1"><label>1</label><institution>Faculty of Information Technology, College of Computer Science and Technology, Beijing Artificial Intelligence Institute, Beijing University of Technology</institution>, <addr-line>Beijing, 100124</addr-line>, <country>China</country></aff>
<aff id="aff-2"><label>2</label><institution>Department of Natural and Computing Science, University of Aberdeen</institution>, <addr-line>Aberdeen</addr-line>, <country>U.K.</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Yin Liang. Email: <email>yinliang@bjut.edu.cn</email></corresp>
</author-notes>
<pub-date pub-type="epub" date-type="pub" iso-8601-date="2022-04-20"><day>20</day>
<month>04</month>
<year>2022</year></pub-date>
<volume>72</volume>
<issue>3</issue>
<fpage>4645</fpage>
<lpage>4661</lpage>
<history>
<date date-type="received"><day>08</day><month>1</month><year>2022</year></date>
<date date-type="accepted"><day>02</day><month>3</month><year>2022</year></date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2022 Liang et al.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Liang et al.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_26999.pdf"></self-uri>
<abstract>
<p>Whole-brain functional connectivity (FC) patterns obtained from resting-state functional magnetic resonance imaging (rs-fMRI) have been widely used in the diagnosis of brain disorders such as autism spectrum disorder (ASD). Recently, an increasing number of studies have employed deep learning techniques to analyze FC patterns for brain disease classification. However, the high dimensionality of the FC features and the interpretability of deep learning results remain open issues in FC-based brain disease classification. In this paper, we propose a multi-scale attention-based deep neural network (MSA-DNN) model to classify FC patterns for ASD diagnosis. The model is implemented by adding a flexible multi-scale attention (MSA) module to an auto-encoder-based backbone DNN; the module extracts multi-scale features of the FC patterns and adjusts the level of attention given to different FCs through continuous learning. The model reinforces the weights of important FC features while suppressing unimportant ones, which promotes sparsity of the model weights and enhances model interpretability. We performed systematic experiments on a large multi-site ASD dataset with both ten-fold and leave-one-site-out cross-validation. The results showed that our model outperformed classical methods in brain disease classification and achieved robust inter-site prediction performance. We also localized important FC features and brain regions associated with ASD classification. Overall, our study advances biomarker detection and computer-aided classification for ASD diagnosis, and the proposed MSA module is flexible and easy to implement in other classification networks.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Autism spectrum disorder diagnosis</kwd>
<kwd>resting-state fMRI</kwd>
<kwd>deep neural network</kwd>
<kwd>functional connectivity</kwd>
<kwd>multi-scale attention module</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1"><label>1</label><title>Introduction</title>
<p>Brain disease diagnosis has become an active research topic at the intersection of artificial intelligence and brain science. Noninvasive brain imaging technologies have improved the understanding of the neural substrates underlying brain disorders and may help to reveal biomarkers that can be used for imaging-based diagnosis. Among these technologies, resting-state functional magnetic resonance imaging (rs-fMRI) has been widely applied in brain disease diagnosis [<xref ref-type="bibr" rid="ref-1">1</xref>,<xref ref-type="bibr" rid="ref-2">2</xref>]. Because different brain regions are expected to interact with one another, functional connectivity (FC) analysis, which measures the temporal correlations in fMRI activity between spatially distant brain regions, has become the primary method for analyzing rs-fMRI data. Recent studies have shown that many brain diseases, such as autism spectrum disorder (ASD), schizophrenia, and Alzheimer&#x2019;s disease, are associated with abnormalities in the brain FC patterns [<xref ref-type="bibr" rid="ref-3">3</xref>&#x2013;<xref ref-type="bibr" rid="ref-5">5</xref>].</p>
<p>With the rapid development of artificial intelligence and data mining techniques, machine learning methods have been employed in recent studies to classify FC patterns for brain disease diagnosis. As an important feature extraction technique, deep learning models can automatically learn lower-dimensional abstract feature representations from the initial input. Recently, a growing number of works have applied deep learning methods to FC-based brain disease classification [<xref ref-type="bibr" rid="ref-6">6</xref>&#x2013;<xref ref-type="bibr" rid="ref-8">8</xref>]. Among them, the auto-encoder (AE) is currently the most widely used building block for constructing fully connected deep neural networks (DNNs) for FC pattern classification. These methods reshape the FC patterns into vectors as input and commonly need to learn a large number of parameters. Although substantial progress has been made in FC pattern classification, the dense parameters of these DNN models can cause problems such as slow convergence and overfitting. Moreover, each entry of an FC pattern represents the strength of the functional correlation between a pair of brain regions and therefore has clear biological significance. Exploring robust classification models and improving model interpretability will therefore help to advance computer-aided brain disease classification and the search for biomarkers for clinical diagnosis.</p>
<p>In this work, we proposed a multi-scale attention-based DNN (MSA-DNN) model to classify the FC patterns for brain disease diagnosis. The model consisted of a backbone classification network based on a fully connected structure and a multi-scale attention (MSA) module. For the backbone network, we built a DNN based on AEs to project high-dimensional FC features into a lower-dimensional feature space. We combined both unsupervised and supervised training processes to improve the effectiveness of feature learning. Inspired by the attention mechanism [<xref ref-type="bibr" rid="ref-9">9</xref>,<xref ref-type="bibr" rid="ref-10">10</xref>], we proposed a flexible MSA module that can be embedded between the hidden layers of the backbone network. The MSA module extracted multi-scale features of the FC patterns and added attention weights to the FC features at each position, so that more important FC features are continuously emphasized and less important FC features are continuously suppressed. To verify the effectiveness of the proposed model, we performed systematic experiments on the Autism Brain Imaging Data Exchange (ABIDE) dataset, which aggregates large-scale collections of rs-fMRI data for ASD patients and healthy controls. Ten-fold and leave-one-site-out cross-validations were conducted to examine the classification performance. Moreover, we conducted saliency map analysis to locate the most important FC features associated with the ASD classification [<xref ref-type="bibr" rid="ref-11">11</xref>].</p>
<p>The main contributions of this paper are summarized as follows:
<list list-type="simple">
<list-item><label>(1)</label><p>We proposed a novel MSA-DNN model to classify FC patterns for ASD diagnosis. The model built a DNN with both unsupervised and supervised training steps to improve the effectiveness of feature learning. A flexible MSA module was added between the hidden layers of the DNN model, which can fuse the multi-scale features of the FC patterns to enhance the sparsity of the model weights and improve the model interpretability.</p></list-item>
<list-item><label>(2)</label><p>Systematic experiments were conducted on the large-scale multi-site ABIDE dataset. Results of ten-fold and leave-one-site-out cross-validation experiments indicate the robust classification performance of our MSA-DNN model. We also identified important FC features as biomarkers associated with ASD classification.</p></list-item>
<list-item><label>(3)</label><p>This study further extends previous studies on FC-based brain disease classification. The proposed MSA module is flexible and easy to implement, and can be embedded into other classification networks.</p></list-item>
</list></p>
</sec>
<sec id="s2"><label>2</label><title>Related Works</title>
<p>The use of non-invasive rs-fMRI has greatly advanced neuroscience research, helping to investigate the pathological mechanisms underlying brain diseases and to detect potential diagnostic biomarkers [<xref ref-type="bibr" rid="ref-1">1</xref>,<xref ref-type="bibr" rid="ref-12">12</xref>]. Rs-fMRI measures blood oxygen level-dependent (BOLD) signal fluctuations that reflect the functional activity of neurons or brain regions, and can thus be used to quantify the functional interactions between brain regions. Neuroscience studies have shown that the human brain is a highly interactive system that performs complex cognitive tasks through the interconnection of multiple brain regions. An increasing number of studies have indicated that many brain diseases are associated with interruptions or abnormalities in the FC patterns [<xref ref-type="bibr" rid="ref-13">13</xref>&#x2013;<xref ref-type="bibr" rid="ref-15">15</xref>].</p>
<p>Machine learning techniques have been widely used in recent rs-fMRI studies to identify the FC pattern differences associated with brain diseases [<xref ref-type="bibr" rid="ref-16">16</xref>&#x2013;<xref ref-type="bibr" rid="ref-19">19</xref>]. Classical machine learning methods such as the support vector machine (SVM), logistic regression (LR), and random forest (RF) have been found effective in analyzing fMRI data. Because they are simple and easy to implement, these methods, especially the SVM, have been widely employed as classifiers for FC pattern classification. For instance, Rosa et al. [<xref ref-type="bibr" rid="ref-18">18</xref>] built a sparse framework with graphical LASSO and an L1-norm regularized linear SVM for discriminating major depressive disorder (MDD). Chen et al. [<xref ref-type="bibr" rid="ref-19">19</xref>] applied an SVM to classify the FC patterns constructed from different frequency bands for ASD diagnosis. However, these methods may not be able to learn high-level abstract feature representations of the complex FC patterns effectively, which limits further improvement of their performance. As a promising alternative, deep learning methods can automatically learn multi-level low-dimensional abstract feature representations from the initial input, and have achieved outstanding performance in computer vision, communications, and fog computing [<xref ref-type="bibr" rid="ref-20">20</xref>&#x2013;<xref ref-type="bibr" rid="ref-24">24</xref>]. Recently, deep learning methods have attracted increasing attention in computer-aided medical diagnosis [<xref ref-type="bibr" rid="ref-25">25</xref>&#x2013;<xref ref-type="bibr" rid="ref-27">27</xref>]. Accordingly, adopting DNNs to analyze the FC patterns for brain disease classification has become a new trend [<xref ref-type="bibr" rid="ref-6">6</xref>&#x2013;<xref ref-type="bibr" rid="ref-8">8</xref>]. 
Among the deep learning methods, the AE is a commonly employed model for constructing fully connected DNNs for FC pattern classification. Kim et al. [<xref ref-type="bibr" rid="ref-28">28</xref>] adopted an AE with L-1 regularization as a pre-training model to initialize a DNN for the classification of schizophrenia, and obtained a lower error rate than an SVM. Heinsfeld et al. [<xref ref-type="bibr" rid="ref-8">8</xref>] built a stacked AE (SAE) model with two denoising AEs to distinguish the ASD group from the healthy controls, and achieved robust classification performance on the large-scale ASD dataset. In general, these DNNs can extract more informative abstract features from the FC patterns and achieve better classification performance than traditional machine learning methods. However, these DNN models commonly need to train a large number of model parameters from the high-dimensional input FC pattern, which may lead to slow model convergence and overfitting. Therefore, studying robust classification models that also enhance the sparsity of the model weights may further advance computer-aided brain disease classification.</p>
<p>In this study, we proposed a novel MSA-DNN model to classify FC patterns for ASD diagnosis. A flexible MSA module was introduced to fuse the multi-scale FC features and enhance the sparsity of model weights. Detailed implementations of our model are described in the following sections.</p>
</sec>
<sec id="s3"><label>3</label><title>Materials and Methods</title>
<sec id="s3_1"><label>3.1</label><title>Data Acquisition and Preprocessing</title>
<p>In this study, rs-fMRI data were obtained from the large-scale ASD dataset ABIDE (<uri xlink:href="http://fcon_1000.projects.nitrc.org/indi/abide/">http://fcon_1000.projects.nitrc.org/indi/abide/</uri>). ABIDE aggregates previously collected rs-fMRI data, with corresponding anatomical and phenotypic information, from 17 international sites and makes them available to the broader scientific community. The rs-fMRI data in ABIDE have been widely used in recent research to explore the pathological basis of ASD and potential diagnostic biomarkers. Data preprocessing was performed with the Configurable Pipeline for the Analysis of Connectomes (CPAC) [<xref ref-type="bibr" rid="ref-29">29</xref>], which mainly included slice-time correction, motion correction, spatial registration and normalization, nuisance signal regression, and band-pass filtering (0.01&#x2013;0.1&#x2005;Hz). After data checking and collation, a total of 989 subjects were included in the subsequent analysis. The phenotypic information of the subjects in this study is summarized in <xref ref-type="table" rid="table-1">Tab. 1</xref>.</p>
<table-wrap id="table-1"><label>Table 1</label><caption><title>Phenotypic information of subjects in ABIDE dataset</title></caption>
<table>
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Type</th>
<th align="left">Number</th>
<th align="left">Avg age (&#x00B1;SD)</th>
<th align="left">Gender (M/F)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">ASD</td>
<td align="left">480</td>
<td align="left">16.6 (&#x00B1;8.2)</td>
<td align="left">422/58</td>
</tr>
<tr>
<td align="left">HC</td>
<td align="left">509</td>
<td align="left">16.6 (&#x00B1;7.3)</td>
<td align="left">418/91</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="tfn1_1"><p>Note: ASD: Autism Spectrum Disorder, HC: Healthy Control, Avg age: Average Age, SD: Standard Deviation, M: Male, F: Female.</p></fn>
</table-wrap-foot>
</table-wrap>
</sec>
<sec id="s3_2"><label>3.2</label><title>Overview of the Proposed Classification Framework</title>
<p>In this study, we proposed an MSA-DNN model to classify the FC patterns for ASD diagnosis. <xref ref-type="fig" rid="fig-1">Fig. 1</xref> shows the overall flowchart of our classification framework. The FC patterns were constructed from the pre-processed rs-fMRI data by correlation analysis, and the network nodes were defined by the CC200 brain atlas (<xref ref-type="fig" rid="fig-1">Fig. 1a</xref>). Considering the high dimensionality of the FC features, we designed a novel DNN model to learn abstract feature representations from the FC patterns for ASD classification. The model consisted of a backbone network based on a fully connected structure and an MSA module. For the backbone network, we built a DNN based on AEs to project high-dimensional FC features into a lower-dimensional feature space (<xref ref-type="fig" rid="fig-1">Fig. 1b</xref>). In addition to the unsupervised learning process, a supervised training step was further employed to improve the effectiveness of feature learning. This was implemented by adding a flexible MSA module between the hidden layers of the backbone network (<xref ref-type="fig" rid="fig-1">Fig. 1c</xref>). The MSA module fused multi-scale features of the FC patterns and added attention weights to the FC features to continuously emphasize the more important FCs and suppress the less important FCs. Details of each stage are described in the following subsections.</p>
<fig id="fig-1"><label>Figure 1</label><caption><title>Overview flowchart of the proposed classification framework</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_26999-fig-1.png"/></fig>
</sec>
<sec id="s3_3"><label>3.3</label><title>Construction of the FC Patterns</title>
<p>As shown in <xref ref-type="fig" rid="fig-1">Fig. 1a</xref>, the average time series were extracted from each ROI, and the FC patterns were constructed by the computation of pairwise correlations between the regional-averaged rs-fMRI signals for each brain region pair. The correlations were calculated by Pearson&#x2019;s correlation coefficients. Assume that <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mi>M</mml:mi></mml:msup></mml:mrow></mml:math></inline-formula> represent the average rs-fMRI signals for the <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:mrow><mml:msup><mml:mi>i</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:mrow><mml:msup><mml:mi>j</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> ROIs at the time point <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:mi>t</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mrow></mml:mrow><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mi>T</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. <italic>M</italic> and <italic>T</italic> denote the total number of ROIs and total number of time points, respectively. 
The FC strength between these two ROIs <inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> can be defined as:
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>T</mml:mi></mml:munderover><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mover><mml:mi>x</mml:mi><mml:mo stretchy="false">&#x00AF;</mml:mo></mml:mover></mml:mrow></mml:mrow><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mover><mml:mi>x</mml:mi><mml:mo stretchy="false">&#x00AF;</mml:mo></mml:mover></mml:mrow></mml:mrow><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msqrt><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>T</mml:mi></mml:munderover><mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mover><mml:mi>x</mml:mi><mml:mo stretchy="false">&#x00AF;</mml:mo></mml:mover></mml:mrow></mml:mrow><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo 
stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow></mml:msqrt><mml:msqrt><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>T</mml:mi></mml:munderover><mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mover><mml:mi>x</mml:mi><mml:mo stretchy="false">&#x00AF;</mml:mo></mml:mover></mml:mrow></mml:mrow><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow></mml:msqrt></mml:mrow></mml:mfrac></mml:math></disp-formula>where <inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:mrow><mml:msub><mml:mrow><mml:mover><mml:mi>x</mml:mi><mml:mo stretchy="false">&#x00AF;</mml:mo></mml:mover></mml:mrow><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:mrow><mml:msub><mml:mrow><mml:mover><mml:mi>x</mml:mi><mml:mo stretchy="false">&#x00AF;</mml:mo></mml:mover></mml:mrow><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> represent the means of <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> and <inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>t</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>. 
Computing the Pearson correlation between the regional-average rs-fMRI time series for every pair of brain regions yields the classical correlation-based FC pattern. A Fisher r-to-z transformation was then applied so that the correlation values are approximately normally distributed. Because the FC matrices are symmetric, only the upper-triangle values (excluding the diagonal) of each FC matrix were retained and reshaped into an FC feature vector with <inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:mfrac><mml:mrow><mml:mi>M</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>M</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:mfrac></mml:math></inline-formula> elements. In this work, <italic>M</italic>&#x2009;&#x003D;&#x2009;200, so the initial FC feature dimension is <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:mi>L</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>M</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>M</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:mfrac><mml:mo>=</mml:mo><mml:mn>19900</mml:mn></mml:math></inline-formula>.</p>
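<p>The construction described in this subsection can be illustrated with a minimal NumPy sketch. It is not the authors&#x2019; code: the function name, the random stand-in time series, the number of time points (150), and the clipping bound used to keep the Fisher transform finite are all assumptions made for illustration.</p>
<code language="python">
```python
import numpy as np

def fc_feature_vector(ts):
    """ts: array of shape (T, M), one column per ROI time series."""
    # Pearson correlation between every pair of ROI time series, as in Eq. (1)
    r = np.corrcoef(ts, rowvar=False)              # shape (M, M)
    # Keep the strict upper triangle (diagonal excluded): M(M-1)/2 values
    iu = np.triu_indices(r.shape[0], k=1)
    # Assumption: clip so that arctanh stays finite for near-perfect correlations
    r_upper = np.clip(r[iu], -0.999999, 0.999999)
    # Fisher r-to-z transformation
    return np.arctanh(r_upper)

rng = np.random.default_rng(0)
ts = rng.standard_normal((150, 200))               # toy data: T = 150 (hypothetical), M = 200
features = fc_feature_vector(ts)
print(features.shape)                              # (19900,)
```
</code>
<p>With <italic>M</italic>&#x2009;&#x003D;&#x2009;200 the vector has 200&#x2009;&#x00D7;&#x2009;199/2&#x2009;&#x003D;&#x2009;19900 entries, matching the feature dimension <italic>L</italic> stated above.</p>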
</sec>
<sec id="s3_4"><label>3.4</label><title>AE-Based Backbone DNN Construction</title>
<p>For the backbone network, we built a DNN model based on AEs to learn abstract feature representations from the initial high-dimensional FC patterns. An AE is a neural network that learns a lower-dimensional feature representation (the hidden layer) of its input through encoding and decoding procedures trained without supervision (<xref ref-type="fig" rid="fig-2">Fig. 2</xref>). The purpose of AE training is to reduce the difference between the input data <italic>x<sub>i</sub></italic> and the reconstructed data <italic>z<sub>i</sub></italic> by continuously optimizing the loss function, so that the abstract feature representation retains as much useful information as possible.</p>
<fig id="fig-2"><label>Figure 2</label><caption><title>Two phases of the AE training</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_26999-fig-2.png"/></fig>
<p>The error between the input and the reconstructed features can be measured by the mean square error (MSE). Because the FC data are high-dimensional and the sample size is small, we also used the Kullback-Leibler (KL) divergence to constrain the sparsity of the hidden-layer activations of the AE, and added an L-2 regularization term to further reduce overfitting. The total loss function in the unsupervised training process can be defined as
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>s</mml:mi><mml:mi>s</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>W</mml:mi><mml:mo>,</mml:mo><mml:mi>b</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>J</mml:mi><mml:mrow><mml:mi>M</mml:mi><mml:mi>S</mml:mi><mml:mi>E</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>+</mml:mo><mml:mi>&#x03B2;</mml:mi><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>m</mml:mi></mml:munderover><mml:mrow><mml:mi>K</mml:mi><mml:mi>L</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>&#x03C1;</mml:mi><mml:mo fence="false" stretchy="false">|</mml:mo><mml:mo fence="false" stretchy="false">|</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mover><mml:mi>&#x03C1;</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow></mml:mrow><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mfrac><mml:mi>&#x03BB;</mml:mi><mml:mn>2</mml:mn></mml:mfrac><mml:mrow><mml:msub><mml:mi>J</mml:mi><mml:mrow><mml:mi>L</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-12"><mml:math id="mml-ieqn-12"><mml:mrow><mml:msub><mml:mi>J</mml:mi><mml:mrow><mml:mi>M</mml:mi><mml:mi>S</mml:mi><mml:mi>E</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> represents the MSE over all <italic>C</italic> samples; the second and third terms are the KL-divergence sparsity penalty and the L-2 regularization term, respectively; and <inline-formula id="ieqn-13"><mml:math id="mml-ieqn-13"><mml:mi>&#x03B2;</mml:mi></mml:math></inline-formula> and <inline-formula id="ieqn-14"><mml:math id="mml-ieqn-14"><mml:mi>&#x03BB;</mml:mi></mml:math></inline-formula> are hyperparameters that weight these two terms.</p>
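<p>As a rough illustration of how the three terms of Eq. (2) combine, the sketch below evaluates the loss with NumPy. Several details are assumptions rather than statements from the paper: the hidden activations are taken to lie in (0, 1), the symbol with the hat is read as the mean activation of hidden unit <italic>j</italic> over the samples, the L-2 term is taken as the sum of squared weights, and the default values of the hyperparameters are placeholders.</p>
<code language="python">
```python
import numpy as np

def sparse_ae_loss(x, z, h, weights, rho=0.05, beta=1.0, lam=1e-4):
    """x, z: (C, n_in) inputs and reconstructions; h: (C, m) hidden activations in (0, 1)."""
    # J_MSE: squared reconstruction error, averaged over the C samples
    j_mse = np.mean(np.sum((z - x) ** 2, axis=1))
    # rho_hat_j: mean activation of hidden unit j (assumed definition), clipped for the logs
    rho_hat = np.clip(h.mean(axis=0), 1e-8, 1 - 1e-8)
    # KL(rho || rho_hat_j) between two Bernoulli distributions, one value per hidden unit
    kl = rho * np.log(rho / rho_hat) + (1 - rho) * np.log((1 - rho) / (1 - rho_hat))
    # J_L-2: sum of squared weights over all weight matrices (assumed definition)
    j_l2 = sum(np.sum(w ** 2) for w in weights)
    return j_mse + beta * kl.sum() + 0.5 * lam * j_l2

x = np.ones((4, 3))
h = np.full((4, 2), 0.05)                     # activations already at the target sparsity
w = [np.zeros((3, 2)), np.zeros((2, 3))]
loss_zero = sparse_ae_loss(x, x, h, w)        # perfect reconstruction, no penalties
loss_pos = sparse_ae_loss(x, x + 1.0, h, w)   # reconstruction off by 1 in each of 3 features
```
</code>
<p>In this toy case the first call is essentially zero because every term vanishes, while the second is dominated by the reconstruction term (three unit errors per sample).</p>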
<p>In the network training, we first used a greedy layer-wise algorithm for the unsupervised training of the AEs. As shown in <xref ref-type="fig" rid="fig-1">Fig. 1b</xref>, we trained four AEs, each of which was trained independently, with the hidden layer of the current AE serving as the input to the next AE. The back-propagation algorithm was used to minimize the loss function in <xref ref-type="disp-formula" rid="eqn-2">Eq. (2)</xref> to obtain the optimal AE parameters, so that the network progressively learned a more generalized abstract feature representation of the FC patterns.</p>
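<p>The greedy layer-wise procedure can be sketched as follows. This is a simplified illustration under stated assumptions, not the authors&#x2019; implementation: each AE uses a sigmoid encoder and a linear decoder, is trained by plain gradient descent on the MSE only (the sparsity and L-2 terms of Eq. (2) are omitted for brevity), uses additive Gaussian noise as the corruption, and the layer sizes, learning rate, and toy input dimension are hypothetical.</p>
<code language="python">
```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def train_denoising_ae(x, n_hidden, epochs=20, lr=0.1, noise=0.1, seed=0):
    """Train one AE on x by gradient descent on the MSE; return the encoder (W, b)."""
    rng = np.random.default_rng(seed)
    n_in = x.shape[1]
    w_enc = rng.normal(0.0, 0.01, (n_in, n_hidden))
    b_enc = np.zeros(n_hidden)
    w_dec = rng.normal(0.0, 0.01, (n_hidden, n_in))
    b_dec = np.zeros(n_in)
    for _ in range(epochs):
        x_noisy = x + noise * rng.standard_normal(x.shape)   # corrupt the input
        h = sigmoid(x_noisy @ w_enc + b_enc)                 # encode
        z = h @ w_dec + b_dec                                # decode (linear)
        d_z = 2.0 * (z - x) / x.shape[0]                     # gradient of the MSE w.r.t. z
        d_a = (d_z @ w_dec.T) * h * (1.0 - h)                # back-propagate through the sigmoid
        w_dec -= lr * (h.T @ d_z)
        b_dec -= lr * d_z.sum(axis=0)
        w_enc -= lr * (x_noisy.T @ d_a)
        b_enc -= lr * d_a.sum(axis=0)
    return w_enc, b_enc

def greedy_pretrain(x, hidden_sizes):
    """Train the AEs one at a time; each hidden layer is the input of the next AE."""
    encoders, codes = [], x
    for k, n_hidden in enumerate(hidden_sizes):
        w, b = train_denoising_ae(codes, n_hidden, seed=k)
        encoders.append((w, b))
        codes = sigmoid(codes @ w + b)
    return encoders, codes

rng = np.random.default_rng(0)
x = rng.standard_normal((32, 100))        # toy stand-in for the 19900-dimensional FC vectors
encoders, codes = greedy_pretrain(x, [64, 32, 16, 8])   # four AEs, hypothetical sizes
print(codes.shape)                        # (32, 8)
```
</code>
<p>The returned list of encoder weights is what would be stacked to initialize a DNN before supervised fine-tuning.</p>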
<p>To further enhance the learning and classification performance and improve the model interpretability, we conducted supervised learning to fine-tune the overall network in addition to the unsupervised training process. As shown in <xref ref-type="fig" rid="fig-1">Fig. 1c</xref>, the pre-trained AEs were stacked to generate the initial DNN, and an MSA module was introduced between the hidden layers of the backbone network. More details about the MSA module are given in the next section. In the supervised training step, an additional layer (labels) was added on top of the DNN model, and the cross-entropy loss function was used for the supervised fine-tuning of the overall network:
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mrow><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>s</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>C</mml:mi></mml:mfrac><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>C</mml:mi></mml:munderover><mml:mrow><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mn>2</mml:mn></mml:munderover><mml:mrow><mml:mn>1</mml:mn><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:mrow></mml:mrow><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mi>j</mml:mi><mml:mo fence="false" stretchy="false">|</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>;</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula>where <inline-formula id="ieqn-15"><mml:math id="mml-ieqn-15"><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mi>j</mml:mi><mml:mo fence="false" stretchy="false">|</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>;</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> represents the probability that sample <inline-formula id="ieqn-16"><mml:math 
id="mml-ieqn-16"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> is classified in class <italic>j</italic> with the model parameter <inline-formula id="ieqn-17"><mml:math id="mml-ieqn-17"><mml:mi>&#x03B8;</mml:mi></mml:math></inline-formula>. This probability can be derived by:
<disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mi>j</mml:mi><mml:mo fence="false" stretchy="false">|</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>;</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mrow><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:msubsup><mml:mi>&#x03B8;</mml:mi><mml:mi>j</mml:mi><mml:mi>T</mml:mi></mml:msubsup><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:msup></mml:mrow></mml:mrow><mml:mrow><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mn>2</mml:mn></mml:munderover><mml:mrow><mml:mrow><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:msubsup><mml:mi>&#x03B8;</mml:mi><mml:mi>l</mml:mi><mml:mi>T</mml:mi></mml:msubsup><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:msup></mml:mrow></mml:mrow></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
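<p>As a minimal sketch of Eqs. (3) and (4), the loss can be computed as the mean negative log-probability of the true class under a softmax over two class scores. The function names and the example scores below are hypothetical, class indices 0 and 1 stand for <italic>j</italic> = 1 and <italic>j</italic> = 2, and the max-subtraction is a standard numerical-stability step rather than part of the equations.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def softmax(logits):
    """Eq. (4): turn per-class scores theta_j^T x_i into probabilities."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(batch_logits, labels):
    """Eq. (3): mean negative log-probability of the true class."""
    loss = 0.0
    for logits, y in zip(batch_logits, labels):
        # The indicator 1{y_i = j} keeps only the true-class term.
        loss -= math.log(softmax(logits)[y])
    return loss / len(labels)


# Hypothetical scores for three subjects (illustration only).
batch_logits = [[2.0, 0.5], [0.1, 1.2], [0.0, 0.0]]
labels = [0, 1, 0]
loss = cross_entropy(batch_logits, labels)
```
]]></preformat>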
<p>To reduce the information loss caused by the sharp dimensionality reduction between layers, we used a denoising AE with a sparsity penalty as the first AE, and denoising AEs for the other three to increase the robustness of our model. During supervised training, we used the Adam optimization algorithm to update the model parameters and applied a learning-rate decay strategy. The configuration of the backbone DNN is summarized in <xref ref-type="table" rid="table-2">Tab. 2</xref>.</p>
<table-wrap id="table-2"><label>Table 2</label><caption><title>Relevant configurations of the backbone DNN model</title></caption>
<table>
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Model</th>
<th align="left">Configuration</th>
<th align="left">Iteration</th>
<th align="left">Initial learning rate</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">1<sup>st</sup> AE</td>
<td align="left">19900-1000-19900</td>
<td align="left">150</td>
<td align="left">0.0001</td>
</tr>
<tr>
<td align="left">2<sup>nd</sup> AE</td>
<td align="left">1000-600-1000</td>
<td align="left">300</td>
<td align="left">0.0001</td>
</tr>
<tr>
<td align="left">3<sup>rd</sup> AE</td>
<td align="left">600-40-600</td>
<td align="left">800</td>
<td align="left">0.0001</td>
</tr>
<tr>
<td align="left">4<sup>th</sup> AE</td>
<td align="left">40-2-40</td>
<td align="left">2000</td>
<td align="left">0.0001</td>
</tr>
<tr>
<td align="left">DNN</td>
<td align="left">19900-1000-600-40-2</td>
<td align="left">200</td>
<td align="left">0.01</td>
</tr>
</tbody>
</table>
</table-wrap>
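<p>To give a sense of scale for the backbone configuration, the following sketch counts the trainable parameters implied by the layer widths 19900-1000-600-40-2. It assumes one bias per output unit in each fully connected layer and ignores the MSA module; both are assumptions for illustration, not details stated in the text. The input width 19900 equals the number of unique ROI pairs for 200 ROIs.</p>
<preformat preformat-type="code"><![CDATA[
```python
# Layer widths of the fine-tuned backbone DNN, as listed in the configuration table.
layer_sizes = [19900, 1000, 600, 40, 2]

# 19900 is the number of unique ROI pairs for a 200-ROI parcellation.
n_rois = 200
n_fc_features = n_rois * (n_rois - 1) // 2


def dense_params(sizes):
    """Weights plus biases per fully connected layer (biases are an assumption)."""
    return [n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:])]


per_layer = dense_params(layer_sizes)
total = sum(per_layer)
first_layer_share = per_layer[0] / total  # fraction held by the first layer
```
]]></preformat>
<p>Under these assumptions, almost all parameters sit in the first layer, which is consistent with the concern about high-dimensional features and small samples.</p>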
</sec>
<sec id="s3_5"><label>3.5</label><title>Multi-Scale Attention (MSA) Module</title>
<p>The attention mechanism mimics the perceptual process of the human visual system: it concentrates on features with clear inter-group differences and suppresses features that contribute little to the classification. In FC pattern classification, fMRI sample sizes are small compared with large natural-image datasets, so a traditional deep network alone may not focus well on the FCs that change most, which limits further improvement of model performance. Therefore, we introduced a flexible MSA module into our DNN model to focus on the more discriminative FC features by automatically adjusting the attention weights. This module further enhances the interpretability of the model and promotes sparsity of the network weights.</p>
<p>The basic configuration of the MSA module is shown in <xref ref-type="fig" rid="fig-3">Fig. 3</xref>. Let the input feature be <inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:mi>X</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mi>L</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, where 1 and <italic>L</italic> represent the number of channels and the length of the feature, respectively. In the following, we describe the data structure of the MSA module in the format: number of channels, sample length. The attention weights for the FC features were obtained in two steps. In the first step, we applied multi-scale convolutional operations to the FCs to enrich the information by describing FC features at multiple scales. In this work, we performed one-dimensional convolutional operations <inline-formula id="ieqn-19"><mml:math id="mml-ieqn-19"><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>5</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>7</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>9</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> with kernel sizes of 5, 7, and 9 to extract multi-scale FC features. 
Suppose <inline-formula id="ieqn-20"><mml:math id="mml-ieqn-20"><mml:mrow><mml:msub><mml:mi>V</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>C</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula> denote a convolution kernel of one scale, and the output after the convolution operation is <inline-formula id="ieqn-21"><mml:math id="mml-ieqn-21"><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>C</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>. 
Then, the output <inline-formula id="ieqn-22"><mml:math id="mml-ieqn-22"><mml:mrow><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> for that channel can be given as: <inline-formula id="ieqn-23"><mml:math id="mml-ieqn-23"><mml:mrow><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>c</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>c</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x2217;</mml:mo><mml:mi>X</mml:mi></mml:math></inline-formula>, where &#x002A; represents the convolution operation, <inline-formula id="ieqn-24"><mml:math id="mml-ieqn-24"><mml:mrow><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>c</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mi>L</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>. 
Subsequently, the feature maps <inline-formula id="ieqn-25"><mml:math id="mml-ieqn-25"><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>C</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>L</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, <inline-formula id="ieqn-26"><mml:math id="mml-ieqn-26"><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>C</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>L</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, and <inline-formula id="ieqn-27"><mml:math id="mml-ieqn-27"><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mn>3</mml:mn></mml:msub></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>C</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>L</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>, which contain features at three different scales, were concatenated along the channel dimension to obtain the fused feature representation <inline-formula id="ieqn-28"><mml:math id="mml-ieqn-28"><mml:mi>U</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mn>3</mml:mn><mml:mi>C</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>L</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>. In the second step, the fused features were condensed into a more compact representation to reduce the computational cost. We used average-pooling and max-pooling operations to integrate the information along the channel dimension. 
Pooling is a commonly used nonlinear down-sampling method. Assuming that the feature maps obtained after max-pooling and average-pooling are <inline-formula id="ieqn-29"><mml:math id="mml-ieqn-29"><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mrow><mml:mi>M</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msubsup><mml:mi>u</mml:mi><mml:mrow><mml:mi>M</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow><mml:mn>1</mml:mn></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>u</mml:mi><mml:mrow><mml:mi>M</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow><mml:mn>2</mml:mn></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mi>u</mml:mi><mml:mrow><mml:mi>M</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow><mml:mi>L</mml:mi></mml:msubsup></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-30"><mml:math id="mml-ieqn-30"><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mi>v</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msubsup><mml:mi>u</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mi>v</mml:mi><mml:mi>g</mml:mi></mml:mrow><mml:mn>1</mml:mn></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>u</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mi>v</mml:mi><mml:mi>g</mml:mi></mml:mrow><mml:mn>2</mml:mn></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mi>u</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mi>v</mml:mi><mml:mi>g</mml:mi></mml:mrow><mml:mi>L</mml:mi></mml:msubsup></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>, respectively. The process of using pooling operations to obtain feature maps can be expressed as follows:
<disp-formula id="eqn-5"><label>(5)</label><mml:math id="mml-eqn-5" display="block"><mml:msubsup><mml:mi>u</mml:mi><mml:mrow><mml:mi>M</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow><mml:mi>l</mml:mi></mml:msubsup><mml:mo>=</mml:mo><mml:mo movablelimits="true" form="prefix">max</mml:mo><mml:msubsup><mml:mi>u</mml:mi><mml:mi>c</mml:mi><mml:mi>l</mml:mi></mml:msubsup><mml:mo>,</mml:mo><mml:mi>c</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mo stretchy="false">[</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mn>3</mml:mn><mml:mi>C</mml:mi><mml:mo stretchy="false">]</mml:mo></mml:math></disp-formula>
<disp-formula id="eqn-6"><label>(6)</label><mml:math id="mml-eqn-6" display="block"><mml:msubsup><mml:mi>u</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mi>v</mml:mi><mml:mi>g</mml:mi></mml:mrow><mml:mi>l</mml:mi></mml:msubsup><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mn>3</mml:mn><mml:mi>C</mml:mi></mml:mrow></mml:mfrac><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>c</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>3</mml:mn><mml:mi>C</mml:mi></mml:mrow></mml:munderover><mml:mrow><mml:msubsup><mml:mi>u</mml:mi><mml:mi>c</mml:mi><mml:mi>l</mml:mi></mml:msubsup></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-31"><mml:math id="mml-ieqn-31"><mml:msubsup><mml:mi>u</mml:mi><mml:mi>c</mml:mi><mml:mi>l</mml:mi></mml:msubsup></mml:math></inline-formula> represents the <italic>l</italic>-th FC feature in channel <italic>c</italic>. Then, we spliced these two feature maps to generate a generalized representation of the fused features <italic>U</italic> as <inline-formula id="ieqn-32"><mml:math id="mml-ieqn-32"><mml:msup><mml:mi>U</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mi>L</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula>. Finally, we used a one-dimensional convolutional operation with kernel size of 7, and a <italic>Sigmoid</italic> function to obtain the attention weights for the FC features. These weights indicate the degree to which the model emphasizes or suppresses the corresponding FC features in the model training. As shown in <xref ref-type="fig" rid="fig-1">Fig. 
1c</xref>, before the features entered the next layer of the DNN model, the attention weights were multiplied element-wise with the learnt features of the current layer to integrate the attention description of the FC features. Briefly, the two steps of attention-weight generation described above are summarized by <xref ref-type="disp-formula" rid="eqn-7">Eqs. (7)</xref> and <xref ref-type="disp-formula" rid="eqn-8">(8)</xref>, respectively:
<disp-formula id="eqn-7"><label>(7)</label><mml:math id="mml-eqn-7" display="block"><mml:mi>U</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>5</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>X</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>;</mml:mo><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>7</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>X</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>;</mml:mo><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>9</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>X</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></disp-formula>
<disp-formula id="eqn-8"><label>(8)</label><mml:math id="mml-eqn-8" display="block"><mml:msup><mml:mi>W</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mi>&#x03C3;</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>7</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mo stretchy="false">[</mml:mo><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>M</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>U</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>;</mml:mo><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mi>v</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>U</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo stretchy="false">]</mml:mo><mml:mo stretchy="false">)</mml:mo><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula>where <inline-formula id="ieqn-33"><mml:math id="mml-ieqn-33"><mml:mo stretchy="false">[</mml:mo><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mo>&#x22C5;</mml:mo></mml:mtd><mml:mtd><mml:mo>;</mml:mo></mml:mtd><mml:mtd><mml:mo>&#x22C5;</mml:mo></mml:mtd></mml:mtr></mml:mtable><mml:mo stretchy="false">]</mml:mo></mml:math></inline-formula> represents the feature fusing, <inline-formula id="ieqn-34"><mml:math id="mml-ieqn-34"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>M</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mi>v</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> represent the max-pooling and average-pooling respectively, <inline-formula id="ieqn-35"><mml:math 
id="mml-ieqn-35"><mml:mi>&#x03C3;</mml:mi></mml:math></inline-formula> represents the <italic>Sigmoid</italic> function, and <inline-formula id="ieqn-36"><mml:math id="mml-ieqn-36"><mml:msup><mml:mi>W</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> represents the attention weights. The implementation of the MSA module to add attention weights for the FC features is described in <bold>Algorithm&#x00A0;1</bold>.
</p>
<fig id="fig-3"><label>Figure 3</label><caption><title>The illustration of the MSA module</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_26999-fig-3.png"/></fig>
<fig id="fig-6"><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_26999-fig-6.png"/>
</fig>
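<p>The two steps of Eqs. (5)-(8) can be sketched in plain Python as follows. This is an illustrative sketch, not the authors' implementation: the zero padding that keeps the output length at <italic>L</italic>, the random kernel values, and the sizes <italic>C</italic> = 4 and <italic>L</italic> = 40 are assumptions, and the helper names <monospace>conv1d_same</monospace> and <monospace>msa_weights</monospace> are hypothetical. As in most deep learning libraries, the "convolution" is computed as a sliding weighted sum without flipping the kernel.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math
import random


def conv1d_same(channels, kernel):
    """Zero-padded 1D convolution. `channels` is a list of length-L rows and
    `kernel` holds one row of taps per input channel; the output has length L."""
    length = len(channels[0])
    half = len(kernel[0]) // 2
    out = []
    for pos in range(length):
        acc = 0.0
        for row, taps in zip(channels, kernel):
            for k, w in enumerate(taps):
                idx = pos + k - half
                if 0 <= idx < length:  # positions outside the input count as zero
                    acc += w * row[idx]
        out.append(acc)
    return out


def msa_weights(x, kernels_by_scale, fuse_kernel):
    """Return attention weights W' in (0, 1) for a length-L feature vector x."""
    # Step 1, Eq. (7): convolve at each scale and stack along the channel axis.
    fused = [conv1d_same([x], [taps]) for scale in kernels_by_scale for taps in scale]
    # Step 2, Eqs. (5)-(6): pool across the 3C channels at every position l.
    u_max = [max(col) for col in zip(*fused)]
    u_avg = [sum(col) / len(col) for col in zip(*fused)]
    # Eq. (8): size-7 convolution over the two pooled maps, then the sigmoid.
    scores = conv1d_same([u_max, u_avg], fuse_kernel)
    return [1.0 / (1.0 + math.exp(-s)) for s in scores]


random.seed(0)
C, L = 4, 40  # hypothetical channel count and feature length
kernels = [[[random.gauss(0, 0.1) for _ in range(k)] for _ in range(C)] for k in (5, 7, 9)]
fuse = [[random.gauss(0, 0.1) for _ in range(7)] for _ in range(2)]
x = [random.gauss(0, 1) for _ in range(L)]
w = msa_weights(x, kernels, fuse)
reweighted = [wi * xi for wi, xi in zip(w, x)]  # element-wise product with the features
```
]]></preformat>
<p>Because every weight lies strictly between 0 and 1, the product can only shrink a feature, so features with weights near 1 are preserved and features with weights near 0 are suppressed.</p>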
</sec>
<sec id="s3_6"><label>3.6</label><title>Important Functional Connections Analysis</title>
<p>In order to identify the important FCs that best discriminate between ASD and HC subjects, we conducted saliency map analysis to find the FC features with the most significant contribution to the classification. The main idea of saliency map is to calculate the partial derivatives of the classification results to the FC features, obtain the gradients of classification results for each FC, and then obtain the importance of the FC during the classification process. Thus, we performed back propagation and obtained the derivative gradients to indicate the contribution of the input FC features to the classification. Assuming the FC between the <italic>i</italic>-th and <italic>j</italic>-th ROIs is denoted as <inline-formula id="ieqn-50"><mml:math id="mml-ieqn-50"><mml:mi>F</mml:mi><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>, <inline-formula id="ieqn-51"><mml:math id="mml-ieqn-51"><mml:mi>i</mml:mi><mml:mo>&#x2260;</mml:mo><mml:mi>j</mml:mi></mml:math></inline-formula>, <inline-formula id="ieqn-52"><mml:math id="mml-ieqn-52"><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mo stretchy="false">[</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mn>3</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mn>200</mml:mn><mml:mo stretchy="false">]</mml:mo></mml:math></inline-formula>, <inline-formula id="ieqn-53"><mml:math id="mml-ieqn-53"><mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> represents the importance of the FC feature during classification, which can be expressed by the absolute value of the gradient of the classification result <inline-formula id="ieqn-54"><mml:math 
id="mml-ieqn-54"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:math></inline-formula> with respect to <inline-formula id="ieqn-55"><mml:math id="mml-ieqn-55"><mml:mi>F</mml:mi><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>; that is, <inline-formula id="ieqn-56"><mml:math id="mml-ieqn-56"><mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:mfrac></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:math></inline-formula>. In this experiment, we calculated <inline-formula id="ieqn-57"><mml:math id="mml-ieqn-57"><mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> in each fold of cross-validation and averaged the results over the ten folds. Finally, we ranked these averaged weights in descending order and selected the top 20 FCs that contribute most to the ASD classification.</p>
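<p>The saliency ranking can be illustrated with a small sketch. Two substitutions are made for brevity and should be read as assumptions: a single logistic unit stands in for the trained network's class score, and the gradient is estimated by central finite differences instead of back propagation. All names and numbers are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def score(x, weights):
    """Hypothetical stand-in for the class score S_c: a single logistic unit."""
    z = sum(w * v for w, v in zip(weights, x))
    return 1.0 / (1.0 + math.exp(-z))


def saliency(x, weights, eps=1e-5):
    """|dS_c / dFC| per feature, estimated by central finite differences."""
    grads = []
    for i in range(len(x)):
        up = x[:i] + [x[i] + eps] + x[i + 1:]
        down = x[:i] + [x[i] - eps] + x[i + 1:]
        grads.append(abs(score(up, weights) - score(down, weights)) / (2 * eps))
    return grads


def top_k(fold_saliencies, k):
    """Average saliency over folds, then return the k highest-ranked feature indices."""
    n_folds = len(fold_saliencies)
    mean = [sum(col) / n_folds for col in zip(*fold_saliencies)]
    return sorted(range(len(mean)), key=lambda i: mean[i], reverse=True)[:k]


# Two hypothetical folds with slightly different fitted weights.
x = [0.2, -0.1, 0.4, 0.0]
folds = [[0.1, -2.0, 0.5, 0.0], [0.2, -1.5, 0.4, 0.05]]
ranking = top_k([saliency(x, w) for w in folds], k=2)
```
]]></preformat>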
</sec>
</sec>
<sec id="s4"><label>4</label><title>Experimental Results</title>
<p>In this study, we conducted systematic experiments on the large aggregate ABIDE dataset to evaluate the classification performance of the proposed model. We employed two cross-validation schemes. The first is the classical 10-fold cross-validation, performed as in previous studies; the second is leave-one-site-out cross-validation, which more closely emulates real clinical settings. In 10-fold cross-validation, we randomly divided the data into ten subsets of similar size, with an approximately equal proportion of ASD patients and HC subjects in each subset. In each fold, nine subsets were used as the training set and the remaining subset as the test set. Training was repeated ten times so that each subset served as the test set once. We compared our model with several classical methods: SVM, LR, RF, a one-dimensional convolutional neural network (1D-CNN), and stacked autoencoders (SAEs). These methods have been widely employed in recent studies on FC-based brain disease classification; the first three are classical machine learning methods and the last two are deep learning methods. In addition, we conducted leave-one-site-out cross-validation to verify how well the model generalizes under inter-site variability [<xref ref-type="bibr" rid="ref-30">30</xref>]. In this scheme, the data of one site were held out as the test set each time, and the data of the remaining sites were used as the training set. Data from different sites may be collected with different acquisition protocols (such as scanner type, acquisition parameters, and participant recruitment requirements). Therefore, leave-one-site-out cross-validation emulates real clinical conditions more closely and places higher demands on model generalization. The results are summarized in the following subsections. 
Classification performance was evaluated by accuracy, specificity, sensitivity, precision, and F1-score based on the cross-validation results.</p>
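<p>The two splitting schemes can be sketched as follows. This is a minimal illustration under stated assumptions: the round-robin stratification, the random seed, and the toy cohort of 60 subjects from three sites are hypothetical and do not reproduce the actual ABIDE partition.</p>
<preformat preformat-type="code"><![CDATA[
```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) with similar class proportions in every fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    for members in by_class.values():
        rng.shuffle(members)
        for pos, idx in enumerate(members):
            folds[pos % k].append(idx)  # deal each class round-robin over the folds
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(labels)) if i not in held_out]
        yield train, sorted(test)


def leave_one_site_out(sites):
    """Yield (site, train_idx, test_idx), holding out one acquisition site at a time."""
    for site in sorted(set(sites)):
        test = [i for i, s in enumerate(sites) if s == site]
        train = [i for i, s in enumerate(sites) if s != site]
        yield site, train, test


# Hypothetical toy cohort: 1 = ASD, 0 = HC, spread over three sites.
labels = [i % 2 for i in range(60)]
sites = ["NYU"] * 30 + ["UM"] * 20 + ["YALE"] * 10
kfold_splits = list(stratified_kfold(labels))
loso_splits = list(leave_one_site_out(sites))
```
]]></preformat>
<p>In the leave-one-site-out scheme the test sets differ in size from site to site, which is one reason per-site results can vary more than per-fold results.</p>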
<sec id="s4_1"><label>4.1</label><title>Classification Results of 10-Fold Cross-Validation</title>
<p>To evaluate the classification performance of the proposed model, we first performed classical 10-fold cross-validation experiments, as in previous studies of ASD classification. We compared our model with SVM, RF, LR, 1D-CNN, and SAEs, which are classical methods for FC pattern classification. The results (accuracy, specificity, sensitivity, precision, and F1-score) of the different methods are summarized in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>. As the results show, the proposed MSA-DNN obtained the best classification performance on all evaluation measures. Consistent with previous studies, the present work relied primarily on prediction accuracy to assess performance. The MSA-DNN achieved an average accuracy of 70.5&#x0025;, which was 5.2&#x0025;, 7.1&#x0025;, 4.4&#x0025;, 8.7&#x0025;, and 3.6&#x0025; higher than that of SVM, RF, LR, 1D-CNN, and SAEs, respectively. For specificity, sensitivity, precision, and F1-score, the MSA-DNN also showed clear advantages over the other methods. In addition, the standard errors of the MSA-DNN were generally lower than those of the comparison methods, suggesting better robustness of our model in the classification process. These results indicate that the proposed MSA-DNN classifies FC patterns better than the classical classification methods.</p>
<fig id="fig-4"><label>Figure 4</label><caption><title>Classification performance comparisons between the proposed model and competing methods using 10-fold cross-validation</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_26999-fig-4.png"/></fig>
</sec>
<sec id="s4_2"><label>4.2</label><title>Classification Results of Leave-One-Site-Out Cross-Validation</title>
<p>To evaluate the classifier performance across sites, we further performed a leave-one-site-out cross-validation experiment. In this process, the data of one site were held out as the test set, and the data of the remaining sites were used for training. This scenario emulates clinical settings more closely, and the results reflect the applicability of our model to new, different sites. The classification results of leave-one-site-out cross-validation are summarized in <xref ref-type="table" rid="table-3">Tab. 3</xref>. As the results show, our model obtained an average accuracy of 67.2&#x0025; on the entire dataset, suggesting robust inter-site prediction on data from new sites. Together with the results from 10-fold cross-validation, these results indicate the effectiveness of the proposed model.</p>
<table-wrap id="table-3"><label>Table 3</label><caption><title>Results of the leave-one-site-out cross-validation (&#x0025;)</title></caption>
<table>
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Site</th>
<th align="left">Accuracy</th>
<th align="left">Specificity</th>
<th align="left">Sensitivity</th>
<th align="left">Precision</th>
<th align="left">F1-score</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">CALTECH</td>
<td align="left">57.1</td>
<td align="left">42.9</td>
<td align="left">64.3</td>
<td align="left">69.2</td>
<td align="left">66.7</td>
</tr>
<tr>
<td align="left">CMU</td>
<td align="left">71.4</td>
<td align="left">66.7</td>
<td align="left">75.0</td>
<td align="left">75.0</td>
<td align="left">75.0</td>
</tr>
<tr>
<td align="left">KKI</td>
<td align="left">75.0</td>
<td align="left">85.0</td>
<td align="left">67.9</td>
<td align="left">86.4</td>
<td align="left">76.0</td>
</tr>
<tr>
<td align="left">LEUVEN</td>
<td align="left">68.9</td>
<td align="left">72.4</td>
<td align="left">65.6</td>
<td align="left">72.4</td>
<td align="left">68.9</td>
</tr>
<tr>
<td align="left">MAXMUN</td>
<td align="left">57.7</td>
<td align="left">62.5</td>
<td align="left">53.6</td>
<td align="left">62.5</td>
<td align="left">57.7</td>
</tr>
<tr>
<td align="left">NYU</td>
<td align="left">72.3</td>
<td align="left">60.0</td>
<td align="left">81.6</td>
<td align="left">72.7</td>
<td align="left">76.9</td>
</tr>
<tr>
<td align="left">OHSU</td>
<td align="left">69.2</td>
<td align="left">50.0</td>
<td align="left">85.7</td>
<td align="left">66.7</td>
<td align="left">75.0</td>
</tr>
<tr>
<td align="left">OLIN</td>
<td align="left">61.8</td>
<td align="left">52.6</td>
<td align="left">73.3</td>
<td align="left">55.0</td>
<td align="left">62.9</td>
</tr>
<tr>
<td align="left">PITT</td>
<td align="left">65.5</td>
<td align="left">44.8</td>
<td align="left">88.5</td>
<td align="left">59.0</td>
<td align="left">70.8</td>
</tr>
<tr>
<td align="left">SBL</td>
<td align="left">57.1</td>
<td align="left">28.6</td>
<td align="left">85.7</td>
<td align="left">54.5</td>
<td align="left">66.7</td>
</tr>
<tr>
<td align="left">SDSU</td>
<td align="left">72.4</td>
<td align="left">45.5</td>
<td align="left">88.9</td>
<td align="left">72.7</td>
<td align="left">80.0</td>
</tr>
<tr>
<td align="left">STANFORD</td>
<td align="left">76.9</td>
<td align="left">84.2</td>
<td align="left">70.0</td>
<td align="left">82.4</td>
<td align="left">75.7</td>
</tr>
<tr>
<td align="left">TRINITY</td>
<td align="left">62.2</td>
<td align="left">86.4</td>
<td align="left">39.1</td>
<td align="left">75.0</td>
<td align="left">51.4</td>
</tr>
<tr>
<td align="left">UCLA</td>
<td align="left">67.3</td>
<td align="left">66.7</td>
<td align="left">68.2</td>
<td align="left">62.5</td>
<td align="left">65.2</td>
</tr>
<tr>
<td align="left">UM</td>
<td align="left">68.6</td>
<td align="left">77.3</td>
<td align="left">60.8</td>
<td align="left">75.0</td>
<td align="left">67.2</td>
</tr>
<tr>
<td align="left">USM</td>
<td align="left">70.4</td>
<td align="left">60.9</td>
<td align="left">88.0</td>
<td align="left">55.0</td>
<td align="left">67.7</td>
</tr>
<tr>
<td align="left">YALE</td>
<td align="left">69.1</td>
<td align="left">66.7</td>
<td align="left">71.4</td>
<td align="left">69.0</td>
<td align="left">70.2</td>
</tr>
<tr>
<td align="left">Average</td>
<td align="left">67.2</td>
<td align="left">61.9</td>
<td align="left">72.2</td>
<td align="left">68.5</td>
<td align="left">69.0</td>
</tr>
</tbody>
</table>
</table-wrap>
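<p>For reference, the five reported measures follow from the confusion counts as sketched below, and the F1-score is the harmonic mean of precision and sensitivity. Treating ASD as the positive class is an assumption made here for illustration, and the confusion counts in the example are hypothetical. The check at the end recomputes F1 from the precision and sensitivity values of three rows of the table above and compares it with the tabulated F1.</p>
<preformat preformat-type="code"><![CDATA[
```python
def metrics(tp, tn, fp, fn):
    """Return the five measures as percentages (positive class assumed to be ASD)."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    values = [("accuracy", accuracy), ("specificity", specificity),
              ("sensitivity", sensitivity), ("precision", precision), ("f1", f1)]
    return {name: round(100 * value, 1) for name, value in values}


def f1_from(precision, sensitivity):
    """F1 as the harmonic mean of precision and sensitivity (inputs in percent)."""
    return 2 * precision * sensitivity / (precision + sensitivity)


# Rows copied from the table: (precision, sensitivity, reported F1-score).
rows = {"CALTECH": (69.2, 64.3, 66.7), "KKI": (86.4, 67.9, 76.0), "USM": (55.0, 88.0, 67.7)}
max_gap = max(abs(f1_from(p, s) - f1) for p, s, f1 in rows.values())

# Hypothetical confusion counts, not taken from the paper.
example = metrics(tp=8, tn=7, fp=3, fn=2)
```
]]></preformat>
<p>For these three rows the recomputed F1 agrees with the tabulated value to within rounding.</p>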
</sec>
<sec id="s4_3"><label>4.3</label><title>Important FCs for ASD Classification</title>
<p>Finally, we identified the important FCs that best discriminate between ASD patients and healthy controls. These FCs may serve as potential biomarkers for ASD diagnosis. We analyzed the importance of the FC features and obtained the top 20 FCs that contribute most to the ASD classification. To better visualize these important FCs, we illustrated them in a connectogram representation (<xref ref-type="fig" rid="fig-5">Fig. 5a</xref>) and mapped them onto the cortical surface (<xref ref-type="fig" rid="fig-5">Fig. 5b</xref>). Different colors indicate different modules (the frontal, temporal, occipital, and parietal lobes, cerebellum, vermis, and subcortical nuclei). Intra-module connections are drawn in the color of their module, while inter-module connections are drawn in gray.</p>
<fig id="fig-5"><label>Figure 5</label><caption><title>Visualization of the top 20 discriminative FCs for ASD classification. (a) Connectogram visualization. (b) Results mapped onto the cortical surface. The coordinates of each node follow the CC200 atlas, and the brain regions are scaled by their number of connections</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_26999-fig-5.png"/></fig>
</sec>
</sec>
<sec id="s5"><label>5</label><title>Discussion</title>
<p>This study proposed a novel MSA-DNN model to classify FC patterns for ASD diagnosis. The model employed AEs as the basic units of the backbone classification network and added an MSA module in the hidden layers to enhance the interpretability and sparsity of the DNN model. Both unsupervised and supervised learning processes were conducted to improve model performance. Systematic experiments were carried out on the large ABIDE dataset, which aggregates fMRI data of ASD patients and healthy controls from multiple sites worldwide. Results of both the 10-fold cross-validation and the leave-one-site-out cross-validation experiments demonstrated the robust generalization of the proposed model. We also identified important FCs associated with ASD classification that may serve as diagnostic biomarkers.</p>
<p>Due to the high acquisition cost of fMRI data, training DNN models on FC patterns commonly encounters the problem of high-dimensional features with relatively small sample sizes. To address this problem, we proposed a novel MSA-DNN model to classify the FC patterns. The model built a fully connected backbone DNN and combined both unsupervised and supervised training processes. For the backbone network, we built the DNN from AEs to project high-dimensional FC features into a lower-dimensional feature space. To further ensure the sparsity of the model weights and avoid overfitting, a flexible MSA module was proposed and added between the hidden layers of the backbone DNN. The MSA module extracted multi-scale features of the FC patterns and applied attention weights to the FC features. This ensured that more important FC features were continuously emphasized and less important FC features were continuously suppressed. The attention mechanism has demonstrated its utility in computer vision studies, where it is regarded as a useful means of enhancing the representation power towards the most informative features in a computationally efficient manner [<xref ref-type="bibr" rid="ref-31">31</xref>]. Recent studies have shown promising findings for combining spatial and channel attention and for modeling channel-wise relationships, fusing the features extracted by multiple convolution kernels of different sizes to improve the feature representation power [<xref ref-type="bibr" rid="ref-32">32</xref>,<xref ref-type="bibr" rid="ref-33">33</xref>]. Motivated by these studies, in this work we conducted multiple convolution operations to extract multi-scale FC features and obtained the attention weights for each FC. The proposed MSA module is simple and flexible, and can be easily embedded into other classification networks.</p>
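<p>The general idea of multi-scale attention weighting can be illustrated with a minimal sketch. It is not the MSA module used in this study: the function names <monospace>smooth</monospace> and <monospace>multi_scale_attention</monospace>, the kernel sizes (3, 5, 7), the fixed moving-average kernels (a stand-in for learned convolution kernels), and the sigmoid gating are all assumptions made purely for illustration.</p>
<preformat>
```python
import math


def smooth(x, k):
    """Moving-average 1-D convolution of odd width k with zero padding.

    A fixed averaging kernel is an assumption for this sketch; a trained
    network would learn its kernel weights instead.
    """
    half = k // 2
    n = len(x)
    out = []
    for i in range(n):
        lo, hi = max(0, i - half), min(n, i + half + 1)
        # Dividing by k (not by the window length) mimics zero padding.
        out.append(sum(x[lo:hi]) / k)
    return out


def multi_scale_attention(x, kernel_sizes=(3, 5, 7)):
    """Return (re-weighted features, attention weights) for a feature vector."""
    # One response per scale, each with the same length as the input.
    responses = [smooth(x, k) for k in kernel_sizes]
    # Fuse the scales by averaging them element-wise.
    fused = [sum(vals) / len(kernel_sizes) for vals in zip(*responses)]
    # A sigmoid maps each fused response to a weight strictly between 0 and 1.
    weights = [1.0 / (1.0 + math.exp(-v)) for v in fused]
    # Element-wise re-weighting emphasizes or suppresses each feature.
    return [w * v for w, v in zip(weights, x)], weights
```
</preformat>
<p>In this sketch, features whose multi-scale neighborhood responds strongly receive weights close to 1 and are largely preserved, whereas features with strongly negative fused responses receive weights close to 0 and are suppressed.</p>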
<p>Moreover, using larger datasets is usually considered a promising solution to the challenges of reproducibility and statistical power, which would in turn promote clinically useful imaging diagnosis and biomarker studies [<xref ref-type="bibr" rid="ref-34">34</xref>]. At the same time, large multi-site datasets are associated with inter-site variability owing to potential sources of variation across acquisition sites, such as scanner type, image acquisition parameters, and subject recruitment strategies [<xref ref-type="bibr" rid="ref-16">16</xref>,<xref ref-type="bibr" rid="ref-35">35</xref>]. Such site-related variation in aggregated datasets closely emulates the conditions in real clinical settings. In this study, the experiments on the whole ABIDE dataset reflect how our model generalizes to a large dataset with site-related variability. The results show that the proposed MSA-DNN achieves robust classification performance in both the 10-fold cross-validation and the leave-one-site-out cross-validation experiments. For 10-fold cross-validation, our MSA-DNN outperformed the competing methods on all evaluation measures, suggesting robust generalization of our model on a large-scale dataset. In addition, the leave-one-site-out cross-validation experiments, which held out the data of one entire site as test data, further reveal reliable prediction performance of our model on new, different sites. This scenario evaluates the performance of our model under simulated clinical conditions and suggests its potential for clinical application. Together, our results indicate the effectiveness of the proposed model on a large-scale dataset and suggest that it generalizes robustly under site-related variability.</p>
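<p>The leave-one-site-out protocol described above can be sketched as follows. This is a minimal illustration and not the code used in this study: the function name <monospace>leave_one_site_out</monospace> and the toy list of site labels are hypothetical, and real ABIDE site assignments are not reproduced.</p>
<preformat>
```python
from collections import defaultdict


def leave_one_site_out(sites):
    """Yield (held_out_site, train_idx, test_idx) once per distinct site.

    `sites` holds one acquisition-site label per subject.
    """
    by_site = defaultdict(list)
    for idx, site in enumerate(sites):
        by_site[site].append(idx)
    for held_out in sorted(by_site):
        # All subjects of the held-out site form the test set.
        test_idx = by_site[held_out]
        # Subjects from every other site form the training set.
        train_idx = sorted(
            i for s, members in by_site.items() if s != held_out for i in members
        )
        yield held_out, train_idx, test_idx


# Hypothetical toy labels, one per subject (illustration only).
sites = ["UCLA", "UM", "UCLA", "YALE", "UM", "USM"]
folds = list(leave_one_site_out(sites))
```
</preformat>
<p>Each fold trains on all remaining sites and tests on a site the model has never seen, so the number of folds equals the number of sites and no subject appears in both the training and test sets of the same fold.</p>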
<p>Furthermore, identifying discriminative FC features helps to determine which brain regions are related to the specific behaviors of ASD, and thus provides potential biomarkers for ASD diagnosis. In this work, we found that brain areas including the cerebellum, hippocampus, fusiform gyrus, temporal pole, middle temporal gyrus, superior temporal gyrus, cuneus, and occipital cortex are highly important in ASD classification. As shown in <xref ref-type="fig" rid="fig-5">Fig. 5</xref>, the discriminative FCs are mostly associated with these regions. The cerebellum is an important regulatory center for human movement and is vital for maintaining balance. Previous studies on ASD have found that the abnormalities of ASD patients in movement and language tasks may be caused by abnormal activations in the cerebellum [<xref ref-type="bibr" rid="ref-36">36</xref>,<xref ref-type="bibr" rid="ref-37">37</xref>]. It has also been reported that FCs in the cerebellum are much weaker than those in other regions for ASD patients [<xref ref-type="bibr" rid="ref-38">38</xref>]. In this study, we found that 4 of the top 20 discriminative FCs were related to the cerebellum. Together with the previous findings, this suggests that future studies should pay more attention to the functional and structural properties of the cerebellum. In addition, temporal-lobe areas including the temporal pole, middle temporal gyrus, and superior temporal gyrus are also involved in the discriminative FCs. Among them, the superior temporal gyrus is considered an important area for processing auditory and language information [<xref ref-type="bibr" rid="ref-39">39</xref>]. It has been found that the abnormal behaviors of ASD patients are related to this brain area [<xref ref-type="bibr" rid="ref-40">40</xref>,<xref ref-type="bibr" rid="ref-41">41</xref>]. Moreover, injury to the middle temporal gyrus may cause disorders in facial expressions and gestures for ASD patients. 
In clinical trials, patients with ASD often show problems in face recognition, which may be due to the inactivation of related neurons in the fusiform gyrus and occipital cortex [<xref ref-type="bibr" rid="ref-42">42</xref>]. Furthermore, as a core processing unit for memory coding and object recognition, the hippocampus plays an important role in high-level cognition. In this study, we found that 3 of the top 20 discriminative FCs are associated with the hippocampus. These FCs may be an important cause of the differences in memory tasks between ASD patients and healthy controls. In addition, previous studies have pointed out that differences in the visual cortex exist between ASD patients and healthy subjects, and that visual processing in the human brain is related to the calcarine, cuneus, and occipital cortex. Overall, our results are in line with previous findings and provide additional support that these important regions and FCs may serve as potential biomarkers for ASD detection.</p>
<p>This study applied deep learning methods to brain disease diagnosis. Its limitations and directions for future work are summarized as follows. Firstly, considering the complexity of brain diseases and potential individual differences, the functional interactions may vary across subjects, which makes the data distributions of the FC patterns much more difficult to model. The use of large aggregated datasets is commonly cited as a promising solution for reproducibility and statistical power. While this study validated the effectiveness of the proposed model on the large-scale ABIDE dataset, the identified features may still be biased and require further verification with more participants. Moreover, although the MSA module enhances the sparsity of the model weights and alleviates overfitting to some extent, the AE-based backbone DNN still needs to learn a large number of parameters. In view of the promising results obtained with multi-modality data fusion methods in recent computer-aided medicine studies [<xref ref-type="bibr" rid="ref-27">27</xref>,<xref ref-type="bibr" rid="ref-43">43</xref>], fusing structural MRI features with FC patterns and introducing a multi-task learning strategy may further improve model training and enhance the classification performance. This possibility will be explored in future work.</p>
</sec>
<sec id="s6"><label>6</label><title>Conclusion</title>
<p>In this study, we proposed a novel MSA-DNN model to classify FC patterns for ASD detection. The model built a DNN based on AEs for FC feature dimensionality reduction and learning, and combined both unsupervised and supervised training processes to improve the effectiveness of feature learning. A flexible MSA module was added between the hidden layers of the DNN model, which further ensured the sparsity of the model weights and improved the model interpretability. Systematic experiments on the large multi-site ABIDE dataset demonstrate the effectiveness of the proposed model. We also identified important FCs as potential biomarkers associated with ASD classification. In summary, our study provides an effective framework to learn and classify FC patterns for ASD diagnosis, and can be further extended to the imaging diagnosis of other brain diseases.</p>
</sec>
</body>
<back>
<ack>
<p>Multi-site fMRI data were downloaded from the ABIDE dataset. We sincerely thank ABIDE for making the data publicly available for download and further research.</p>
</ack>
<fn-group>
<fn fn-type="other"><p><bold>Funding Statement:</bold> This work was supported by the National Natural Science Foundation of China (No. 61906006).</p></fn>
<fn fn-type="conflict"><p><bold>Conflicts of Interest:</bold> We declare that we have no actual or potential conflict of interest including any financial, personal or other relationships with other people or organizations that can inappropriately influence our work.</p></fn>
</fn-group>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Ecker</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Marquand</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Mourao-Miranda</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Johnston</surname></string-name>, <string-name><given-names>E. M.</given-names> <surname>Daly</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Describing the brain in autism in five dimensions&#x2014;Magnetic resonance imaging-assisted diagnosis of autism spectrum disorder using a multiparameter classification approach</article-title>,&#x201D; <source>Journal of Neuroscience</source>, vol. <volume>30</volume>, no. <issue>32</issue>, pp. <fpage>10612</fpage>&#x2013;<lpage>10623</lpage>, <year>2010</year>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C. S.</given-names> <surname>Monk</surname></string-name>, <string-name><given-names>S. J.</given-names> <surname>Peltier</surname></string-name>, <string-name><given-names>J. L.</given-names> <surname>Wiggins</surname></string-name>, <string-name><given-names>S. J.</given-names> <surname>Weng</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Carrasco</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Abnormalities of intrinsic functional connectivity in autism spectrum disorders</article-title>,&#x201D; <source>NeuroImage</source>, vol. <volume>47</volume>, no. <issue>2</issue>, pp. <fpage>746</fpage>&#x2013;<lpage>772</lpage>, <year>2009</year>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M. E.</given-names> <surname>Lynall</surname></string-name>, <string-name><given-names>D. S.</given-names> <surname>Bassett</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Kerwin</surname></string-name>, <string-name><given-names>P. J.</given-names> <surname>McKenna</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Kitzbichler</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Functional connectivity and brain networks in schizophrenia</article-title>,&#x201D; <source>Journal of Neuroscience</source>, vol. <volume>30</volume>, no. <issue>28</issue>, pp. <fpage>9477</fpage>&#x2013;<lpage>9487</lpage>, <year>2010</year>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Tang</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Hu</surname></string-name>, <string-name><given-names>F. X.</given-names> <surname>Wu</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Improving Alzheimer&#x2019;s disease classification by combining multiple measures</article-title>,&#x201D; <source>IEEE/ACM Transactions on Computational Biology and Bioinformatics</source>, vol. <volume>15</volume>, no. <issue>5</issue>, pp. <fpage>1649</fpage>&#x2013;<lpage>1659</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C. P.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>C. L.</given-names> <surname>Keown</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Jahedi</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Nair</surname></string-name>, <string-name><given-names>M. E.</given-names> <surname>Pflieger</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Diagnostic classification of intrinsic functional connectivity highlights somatosensory, default mode, and visual regions in autism</article-title>,&#x201D; <source>NeuroImage: Clinical</source>, vol. <volume>8</volume>, pp. <fpage>238</fpage>&#x2013;<lpage>245</lpage>, <year>2015</year>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Guo</surname></string-name>, <string-name><given-names>K. C.</given-names> <surname>Dominick</surname></string-name>, <string-name><given-names>A. A.</given-names> <surname>Minai</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>C. A.</given-names> <surname>Erickson</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Diagnosing autism spectrum disorder from brain resting-state functional connectivity patterns using a deep neural network with a novel feature selection method</article-title>,&#x201D; <source>Frontiers in Neuroscience</source>, vol. <volume>11</volume>, pp. <fpage>460</fpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Kong</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Gao</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Xu</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Pan</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Wang</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Classification of autism spectrum disorder by combining brain connectivity and deep neural network classifier</article-title>,&#x201D; <source>Neurocomputing</source>, vol. <volume>324</volume>, pp. <fpage>63</fpage>&#x2013;<lpage>68</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A. S.</given-names> <surname>Heinsfeld</surname></string-name>, <string-name><given-names>A. R.</given-names> <surname>Franco</surname></string-name>, <string-name><given-names>R. C.</given-names> <surname>Craddock</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Buchweitz</surname></string-name> and <string-name><given-names>F.</given-names> <surname>Meneguzzi</surname></string-name></person-group>, &#x201C;<article-title>Identification of autism spectrum disorder using deep learning and the ABIDE dataset</article-title>,&#x201D; <source>NeuroImage: Clinical</source>, vol. <volume>17</volume>, pp. <fpage>16</fpage>&#x2013;<lpage>23</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Hu</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Shen</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Albanie</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Sun</surname></string-name> and <string-name><given-names>E.</given-names> <surname>Wu</surname></string-name></person-group>, &#x201C;<article-title>Squeeze-and-excitation networks</article-title>,&#x201D; <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>, vol. <volume>42</volume>, no. <issue>8</issue>, pp. <fpage>2011</fpage>&#x2013;<lpage>2023</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>F.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Jiang</surname></string-name>, <string-name><given-names>Q.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Yang</surname></string-name> and <string-name><given-names>X.</given-names> <surname>Tang</surname></string-name></person-group>, &#x201C;<article-title>Residual attention network for image classification</article-title>,&#x201D; in <conf-name>IEEE Conf. on Computer Vision and Pattern Recognition</conf-name>, <conf-loc>Hawaii, USA</conf-loc>, pp. <fpage>3156</fpage>&#x2013;<lpage>3164</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>Simonyan</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Vedaldi</surname></string-name> and <string-name><given-names>A.</given-names> <surname>Zisserman</surname></string-name></person-group>, &#x201C;<article-title>Deep inside convolutional networks: Visualising image classification models and saliency maps</article-title>,&#x201D; in <conf-name>Proc. of ICLR</conf-name>, <conf-loc>Banff, Canada</conf-loc>, <year>2014</year>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>G.</given-names> <surname>Mingoia</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Wagner</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Langbein</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Maitra</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Smesny</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Default mode network activity in schizophrenia studied at resting state using probabilistic ICA</article-title>,&#x201D; <source>Schizophrenia Research</source>, vol. <volume>138</volume>, pp. <fpage>143</fpage>&#x2013;<lpage>149</lpage>, <year>2012</year>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>F.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>C. Y.</given-names> <surname>Wee</surname></string-name>, <string-name><given-names>H. F.</given-names> <surname>Chen</surname></string-name> and <string-name><given-names>D. G.</given-names> <surname>Shen</surname></string-name></person-group>, &#x201C;<article-title>Inter-modality relationship constrained multi-modality multi-task feature selection for Alzheimer&#x2019;s disease and mild cognitive impairment identification</article-title>,&#x201D; <source>NeuroImage</source>, vol. <volume>84</volume>, pp. <fpage>466</fpage>&#x2013;<lpage>475</lpage>, <year>2014</year>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Yao</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Hu</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Xie</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Fang</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Liu</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Resting-state time-varying analysis reveals aberrant variations of functional connectivity in autism</article-title>,&#x201D; <source>Frontiers in Human Neuroscience</source>, vol. <volume>10</volume>, no. <issue>13</issue>, pp. <fpage>463</fpage>, <year>2016</year>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X. F.</given-names> <surname>Geng</surname></string-name>, <string-name><given-names>J. H.</given-names> <surname>Xu</surname></string-name>, <string-name><given-names>B. L.</given-names> <surname>Liu</surname></string-name> and <string-name><given-names>Y. G.</given-names> <surname>Shi</surname></string-name></person-group>, &#x201C;<article-title>Multivariate classification of major depressive disorder using the effective connectivity and functional connectivity</article-title>,&#x201D; <source>Frontiers in Neuroscience</source>, vol. <volume>12</volume>, no. <issue>38</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>16</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J. A.</given-names> <surname>Nielsen</surname></string-name>, <string-name><given-names>B. A.</given-names> <surname>Zielinski</surname></string-name>, <string-name><given-names>P. T.</given-names> <surname>Fletcher</surname></string-name>, <string-name><given-names>A. L.</given-names> <surname>Alexander</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Lange</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Multisite functional connectivity MRI classification of autism: ABIDE results</article-title>,&#x201D; <source>Frontiers in Human Neuroscience</source>, vol. <volume>7</volume>, no. <issue>1</issue>, pp. <fpage>599</fpage>, <year>2013</year>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>L. Q.</given-names> <surname>Uddin</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Supekar</surname></string-name>, <string-name><given-names>C. J.</given-names> <surname>Lynch</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Khouzam</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Phillips</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Salience network&#x2013;based classification and prediction of symptom severity in children with autism</article-title>,&#x201D; <source>JAMA Psychiatry</source>, vol. <volume>70</volume>, pp. <fpage>869</fpage>&#x2013;<lpage>879</lpage>, <year>2013</year>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>M. J.</given-names> <surname>Rosa</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Portugal</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Shawe-Taylor</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Mourao-Miranda</surname></string-name></person-group>, &#x201C;<article-title>Sparse network-based models for patient classification using fMRI</article-title>,&#x201D; in <conf-name>Proc. of IEEE Int. Workshop on Pattern Recognition in Neuroimaging</conf-name>, <conf-loc>Philadelphia, USA</conf-loc>, <year>2013</year>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Duan</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Lu</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Ma</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Zhang</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Multivariate classification of autism spectrum disorder using frequency-specific resting-state functional connectivity&#x2014;A multi-center study</article-title>,&#x201D; <source>Progress in Neuro-Psychopharmacology &#x0026; Biological Psychiatry</source>, vol. <volume>64</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>9</lpage>, <year>2016</year>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Farkh</surname></string-name>, <string-name><given-names>M. T.</given-names> <surname>Quasim</surname></string-name>, <string-name><given-names>K. A.</given-names> <surname>Jaloud</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Alhuwaumel</surname></string-name> and <string-name><given-names>S. T.</given-names> <surname>Siddiqui</surname></string-name></person-group>, &#x201C;<article-title>Computer vision-control-based CNN-PID for mobile robot</article-title>,&#x201D; <source>Computers, Materials &#x0026; Continua</source>, vol. <volume>68</volume>, no. <issue>1</issue>, pp. <fpage>1065</fpage>&#x2013;<lpage>1079</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Tu</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Waqas</surname></string-name>, <string-name><given-names>S. U.</given-names> <surname>Rahman</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Mir</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Haim</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Social phenomena and fog computing networks: A novel perspective for future networks</article-title>,&#x201D; <source>IEEE Transactions on Computational Social Systems</source>, vol. <volume>9</volume>, no. <issue>1</issue>, pp. <fpage>32</fpage>&#x2013;<lpage>44</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Tu</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Waqas</surname></string-name>, <string-name><given-names>S. U.</given-names> <surname>Rahman</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Mir</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Abbas</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Reinforcement learning assisted impersonation attack detection in device-to-device communications</article-title>,&#x201D; <source>IEEE Transactions on Vehicular Technology</source>, vol. <volume>70</volume>, no. <issue>2</issue>, pp. <fpage>1474</fpage>&#x2013;<lpage>1479</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Wan</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Waqas</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Tu</surname></string-name>, <string-name><given-names>S. M.</given-names> <surname>Hussain</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Shah</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>An efficient impersonation attack detection method in fog computing</article-title>,&#x201D; <source>Computers, Materials &#x0026; Continua</source>, vol. <volume>68</volume>, no. <issue>1</issue>, pp. <fpage>267</fpage>&#x2013;<lpage>281</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>P. N.</given-names> <surname>Srinivasu</surname></string-name>, <string-name><given-names>A. K.</given-names> <surname>Bhoi</surname></string-name>, <string-name><given-names>R. H.</given-names> <surname>Jhaveri</surname></string-name>, <string-name><given-names>G. T.</given-names> <surname>Reddy</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Bilal</surname></string-name></person-group>, &#x201C;<article-title>Probabilistic deep Q network for real-time path planning in censorious robotic procedures using force sensors</article-title>,&#x201D; <source>Journal of Real-Time Image Processing</source>, vol. <volume>18</volume>, pp. <fpage>1773</fpage>&#x2013;<lpage>1785</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>T. R.</given-names> <surname>Gadekallu</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Khare</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Bhattacharya</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Singh</surname></string-name>, <string-name><given-names>P. K. R.</given-names> <surname>Maddikunta</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Deep neural networks to predict diabetic retinopathy</article-title>,&#x201D; <source>Journal of Ambient Intelligence and Humanized Computing</source>, vol. <volume>13</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>14</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>T. R.</given-names> <surname>Gadekallu</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Khare</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Bhattacharya</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Singh</surname></string-name>, <string-name><given-names>P. K. R.</given-names> <surname>Maddikunta</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Early detection of diabetic retinopathy using PCA-firefly based deep learning model</article-title>,&#x201D; <source>Electronics</source>, vol. <volume>9</volume>, no. <issue>2</issue>, pp. <fpage>274</fpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>N. A.</given-names> <surname>El-Hag</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Sedik</surname></string-name>, <string-name><given-names>G. M.</given-names> <surname>El-Banby</surname></string-name>, <string-name><given-names>W.</given-names> <surname>El-Shafai</surname></string-name>, <string-name><given-names>A. A. M.</given-names> <surname>Khalaf</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Utilization of image interpolation and fusion in brain tumor segmentation</article-title>,&#x201D; <source>International Journal for Numerical Methods in Biomedical Engineering</source>, vol. <volume>37</volume>, no. <issue>8</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>26</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Kim</surname></string-name>, <string-name><given-names>V. D.</given-names> <surname>Calhoun</surname></string-name>, <string-name><given-names>E.</given-names> <surname>Shim</surname></string-name> and <string-name><given-names>J. H.</given-names> <surname>Lee</surname></string-name></person-group>, &#x201C;<article-title>Deep neural network with weight sparsity control and pre-training extracts hierarchical features and enhances classification performance: Evidence from whole-brain resting-state functional connectivity patterns of schizophrenia</article-title>,&#x201D; <source>NeuroImage</source>, vol. <volume>124</volume>, pp. <fpage>127</fpage>&#x2013;<lpage>146</lpage>, <year>2016</year>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>R. C.</given-names> <surname>Craddock</surname></string-name>, <string-name><given-names>G. A.</given-names> <surname>James</surname></string-name>, <string-name><given-names>P. E.</given-names> <surname>Holtzheimer</surname></string-name>, <string-name><given-names>X. P.</given-names> <surname>Hu</surname></string-name> and <string-name><given-names>H. S.</given-names> <surname>Mayberg</surname></string-name></person-group>, &#x201C;<article-title>A whole brain fMRI atlas generated via spatially constrained spectral clustering</article-title>,&#x201D; <source>Human Brain Mapping</source>, vol. <volume>33</volume>, pp. <fpage>1914</fpage>&#x2013;<lpage>1928</lpage>, <year>2012</year>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Abraham</surname></string-name>, <string-name><given-names>M. P.</given-names> <surname>Milham</surname></string-name>, <string-name><given-names>A. D.</given-names> <surname>Martino</surname></string-name>, <string-name><given-names>R. C.</given-names> <surname>Craddock</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Samaras</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Deriving reproducible biomarkers from multi-site resting-state data: An autism-based example</article-title>,&#x201D; <source>NeuroImage</source>, vol. <volume>147</volume>, pp. <fpage>736</fpage>&#x2013;<lpage>745</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Vaswani</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Shazeer</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Parmar</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Uszkoreit</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Jones</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Attention is all you need</article-title>,&#x201D; in <conf-name>Proc. NIPS</conf-name>, <conf-loc>California, USA</conf-loc>, pp. <fpage>5998</fpage>&#x2013;<lpage>6008</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Szegedy</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Jia</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Sermanet</surname></string-name>, <string-name><given-names>S. E.</given-names> <surname>Reed</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Going deeper with convolutions</article-title>,&#x201D; in <conf-name>Proc. CVPR</conf-name>, <conf-loc>Boston, USA</conf-loc>, pp. <fpage>1</fpage>&#x2013;<lpage>9</lpage>, <year>2015</year>.</mixed-citation></ref>
<ref id="ref-33"><label>[33]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Gao</surname></string-name>, <string-name><given-names>M. M.</given-names> <surname>Cheng</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Zhao</surname></string-name>, <string-name><given-names>X. Y.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>M. H.</given-names> <surname>Yang</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Res2net: A new multi-scale backbone architecture</article-title>,&#x201D; <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>, vol. <volume>43</volume>, pp. <fpage>652</fpage>&#x2013;<lpage>662</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-34"><label>[34]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>K. S.</given-names> <surname>Button</surname></string-name>, <string-name><given-names>J. P. A.</given-names> <surname>Ioannidis</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Mokrysz</surname></string-name>, <string-name><given-names>B. A.</given-names> <surname>Nosek</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Flint</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Power failure: Why small sample size undermines the reliability of neuroscience</article-title>,&#x201D; <source>Nature Reviews Neuroscience</source>, vol. <volume>13</volume>, no. <issue>5</issue>, pp. <fpage>365</fpage>&#x2013;<lpage>376</lpage>, <year>2013</year>.</mixed-citation></ref>
<ref id="ref-35"><label>[35]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>Dadi</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Rahim</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Abraham</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Chyzhyk</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Milham</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Benchmarking functional connectome-based predictive models for resting-state fMRI</article-title>,&#x201D; <source>NeuroImage</source>, vol. <volume>192</volume>, pp. <fpage>115</fpage>&#x2013;<lpage>134</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-36"><label>[36]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S. H.</given-names> <surname>Mostofsky</surname></string-name>, <string-name><given-names>S. K.</given-names> <surname>Powell</surname></string-name>, <string-name><given-names>D. J.</given-names> <surname>Simmonds</surname></string-name>, <string-name><given-names>M. C.</given-names> <surname>Goldberg</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Caffo</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Decreased connectivity and cerebellar activity in autism during motor task performance</article-title>,&#x201D; <source>Brain</source>, vol. <volume>132</volume>, no. <issue>9</issue>, pp. <fpage>2413</fpage>&#x2013;<lpage>2425</lpage>, <year>2009</year>.</mixed-citation></ref>
<ref id="ref-37"><label>[37]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Verly</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Verhoeven</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Zink</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Mantini</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Peeters</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Altered functional connectivity of the language network in ASD: Role of classical language areas and cerebellum</article-title>,&#x201D; <source>NeuroImage: Clinical</source>, vol. <volume>4</volume>, pp. <fpage>374</fpage>&#x2013;<lpage>382</lpage>, <year>2014</year>.</mixed-citation></ref>
<ref id="ref-38"><label>[38]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Long</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Duan</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Mantini</surname></string-name> and <string-name><given-names>H.</given-names> <surname>Chen</surname></string-name></person-group>, &#x201C;<article-title>Alteration of functional connectivity in autism spectrum disorder: Effect of age and anatomical distance</article-title>,&#x201D; <source>Scientific Reports</source>, vol. <volume>6</volume>, no. <issue>1</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>8</lpage>, <year>2016</year>.</mixed-citation></ref>
<ref id="ref-39"><label>[39]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J. D.</given-names> <surname>Lewis</surname></string-name>, <string-name><given-names>R. J.</given-names> <surname>Theilmann</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Townsend</surname></string-name> and <string-name><given-names>A. C.</given-names> <surname>Evans</surname></string-name></person-group>, &#x201C;<article-title>Network efficiency in autism spectrum disorder and its relation to brain overgrowth</article-title>,&#x201D; <source>Frontiers in Human Neuroscience</source>, vol. <volume>7</volume>, pp. <fpage>845</fpage>, <year>2013</year>.</mixed-citation></ref>
<ref id="ref-40"><label>[40]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S. A.</given-names> <surname>Green</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Rudie</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Colich</surname></string-name>, <string-name><given-names>J. J.</given-names> <surname>Wood</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Shirinyan</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Overreactive brain responses to sensory stimuli in youth with autism spectrum disorders</article-title>,&#x201D; <source>Journal of the American Academy of Child &#x0026; Adolescent Psychiatry</source>, vol. <volume>52</volume>, pp. <fpage>1158</fpage>&#x2013;<lpage>1172</lpage>, <year>2013</year>.</mixed-citation></ref>
<ref id="ref-41"><label>[41]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>O&#x2019;Connor</surname></string-name></person-group>, &#x201C;<article-title>Auditory processing in autism spectrum disorder: A review</article-title>,&#x201D; <source>Neuroscience &#x0026; Biobehavioral Reviews</source>, vol. <volume>36</volume>, no. <issue>2</issue>, pp. <fpage>836</fpage>&#x2013;<lpage>854</lpage>, <year>2012</year>.</mixed-citation></ref>
<ref id="ref-42"><label>[42]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>V.</given-names> <surname>Subbaraju</surname></string-name>, <string-name><given-names>M. B.</given-names> <surname>Suresh</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Sundaram</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Narasimhan</surname></string-name></person-group>, &#x201C;<article-title>Identifying differences in brain activities and an accurate detection of autism spectrum disorder using resting state functional-magnetic resonance imaging: A spatial filtering approach</article-title>,&#x201D; <source>Medical Image Analysis</source>, vol. <volume>35</volume>, pp. <fpage>375</fpage>&#x2013;<lpage>389</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-43"><label>[43]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Raki&#x0107;</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Cabezas</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Kushibar</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Oliver</surname></string-name> and <string-name><given-names>X.</given-names> <surname>Llado</surname></string-name></person-group>, &#x201C;<article-title>Improving the detection of autism spectrum disorder by combining structural and functional MRI information</article-title>,&#x201D; <source>NeuroImage: Clinical</source>, vol. <volume>25</volume>, pp. <fpage>102181</fpage>, <year>2020</year>.</mixed-citation></ref>
</ref-list>
</back>
</article>