<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">62643</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2025.062643</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Awareness with Machine: Hybrid Approach to Detecting ASD with a Clustering</article-title>
<alt-title alt-title-type="left-running-head">Awareness with Machine: Hybrid Approach to Detecting ASD with a Clustering</alt-title>
<alt-title alt-title-type="right-running-head">Awareness with Machine: Hybrid Approach to Detecting ASD with a Clustering</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Baydogmus</surname><given-names>Gozde Karatas</given-names></name><email>gkaratas@marmara.edu.tr</email></contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Demir</surname><given-names>Onder</given-names></name></contrib>
<aff id="aff-1"><institution>Department of Computer Engineering, Marmara University</institution>, <addr-line>Istanbul, 34854</addr-line>, <country>Turkey</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Gozde Karatas Baydogmus. Email: <email>gkaratas@marmara.edu.tr</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2025</year>
</pub-date>
<pub-date date-type="pub" publication-format="electronic">
<day>03</day><month>07</month><year>2025</year>
</pub-date>
<volume>84</volume>
<issue>2</issue>
<fpage>3393</fpage>
<lpage>3406</lpage>
<history>
<date date-type="received">
<day>23</day>
<month>12</month>
<year>2024</year>
</date>
<date date-type="accepted">
<day>20</day>
<month>5</month>
<year>2025</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2025 The Authors.</copyright-statement>
<copyright-year>2025</copyright-year>
<copyright-holder>Published by Tech Science Press.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_62643.pdf"></self-uri>
<abstract>
<p>Detection of Autism Spectrum Disorder (ASD) is a crucial area of research, representing a foundational aspect of psychological studies. The advancement of technology and the widespread adoption of machine learning methodologies have brought significant attention to this field in recent years. Interdisciplinary efforts have further propelled research into detection methods. Consequently, this study aims to contribute to both the fields of psychology and computer science. Specifically, the goal is to apply machine learning techniques to limited data for the detection of Autism Spectrum Disorder. This study is structured into two distinct phases: data preprocessing and classification. In the data preprocessing phase, four datasets&#x2014;Toddler, Children, Adolescent, and Adult&#x2014;were converted into numerical form, adjusted as necessary, and subsequently clustered. Clustering was performed using six different methods: K-means, agglomerative, DBSCAN (Density-Based Spatial Clustering of Applications with Noise), mean shift, spectral, and Birch. In the second phase, the clustered ASD data were classified. The model&#x2019;s accuracy was assessed using 5-fold cross-validation to ensure robust evaluation. In total, ten distinct machine learning algorithms were employed. The findings indicate that all clustering methods demonstrated success with various classifiers. Notably, the K-means algorithm emerged as particularly effective, achieving consistent and significant results across all datasets. This study is expected to serve as a guide for improving ASD detection performance, even with minimal data availability.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>ASD</kwd>
<kwd>ASD detection</kwd>
<kwd>machine learning</kwd>
<kwd>clustering methods</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>In today&#x2019;s world, ASD has gained significant attention. Diagnostic methods typically focus on behavioral symptoms in social, sensory, and motor skills. Recent studies indicate that ASD affects approximately 1 in 36 children in the United States, with prevalence rates rising globally [<xref ref-type="bibr" rid="ref-1">1</xref>]. This increase highlights the urgent need for accessible, efficient, and scalable diagnostic methods. Recent advances in technology, machine learning, and data analysis are improving quantitative and ecological validation methods. However, clinical screening tests remain expensive and time-consuming [<xref ref-type="bibr" rid="ref-2">2</xref>].</p>
<p>Machine learning has shown remarkable success across fields, providing techniques for learning, detection, data analysis, and pre-processing. Rapid developments in computer science have enabled prediction models that integrate multiple disciplines [<xref ref-type="bibr" rid="ref-3">3</xref>,<xref ref-type="bibr" rid="ref-4">4</xref>]. This study aims to develop a machine learning-based estimator for ASD detection across different age groups using limited data.</p>
<p>Early diagnosis is crucial, yet obtaining well-structured datasets remains a challenge. A dataset covering four age groups&#x2014;Toddler, Children, Adolescent, and Adult&#x2014;was selected. Six clustering techniques (k-means, agglomerative, DBSCAN, mean shift, spectral, and birch) and ten machine learning classifiers (logistic regression, support vector machines, k-nearest neighbor, multi-layered perceptron, extra tree classifier, Gaussian process classifier, passive aggressive classifier, ridge, stochastic gradient descent, and linear support vector machines) were employed. The prediction performance of these models was evaluated through clustering on ASD datasets.</p>
<p>Despite numerous studies, no research aligns precisely with the objectives of this study. Existing approaches primarily focus on popular machine learning and deep learning models with an emphasis on feature extraction rather than processing. Proper feature normalization significantly impacts model performance. This research aims to bridge this gap by designing a clustering-assisted classification model, contributing to cost-effective and efficient ASD screening solutions. Considering these aspects, the study aimed to address the following questions:
<list list-type="bullet">
<list-item>
<p>How do alternative machine learning algorithms, apart from the commonly used ones, affect the model&#x2019;s performance?</p></list-item>
<list-item>
<p>How can the model&#x2019;s performance be enhanced without reducing the number of features?</p></list-item>
</list></p>
<p>The designed model consists of two phases: data preprocessing and classification. In the data preprocessing phase, the four selected ASD datasets were converted into numerical form and then clustered. In the classification phase, the selected machine learning algorithms were trained on the clustered datasets, and the prediction results were evaluated. The remainder of the paper is organized as follows: <xref ref-type="sec" rid="s2">Section 2</xref> reviews related work; <xref ref-type="sec" rid="s3">Section 3</xref> describes the materials used and the proposed model; <xref ref-type="sec" rid="s4">Section 4</xref> presents the experimental results; <xref ref-type="sec" rid="s5">Section 5</xref> discusses these results; and <xref ref-type="sec" rid="s6">Section 6</xref> concludes the study.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Related Work</title>
<p>ASD detection with Machine Learning (ML) has only recently begun to attract attention, and few journal studies on the subject were found in the literature. Conference publications were therefore also included among the studies reviewed here.</p>
<p>Abdelwahab et al. explored the use of ML to improve ASD diagnosis [<xref ref-type="bibr" rid="ref-5">5</xref>]. Using publicly available datasets from Kaggle and UCI ML, the researchers tested several ML algorithms. Data preprocessing involved feature selection, encoding, and normalization. Among the algorithms, Random Forest achieved the highest accuracy at 99.75%, while Logistic Regression also performed well at 96.69%. The findings highlight ML&#x2019;s potential to complement traditional ASD diagnosis, enabling earlier intervention and reducing costs.</p>
<p>In 2024, researchers compared two AutoML tools&#x2014;TPOT and KNIME&#x2014;for ASD detection using data from rehabilitation centers in Pakistan [<xref ref-type="bibr" rid="ref-6">6</xref>]. Both tools automated feature selection and model tuning using the Q-CHAT-10 questionnaire. TPOT achieved 85.23% accuracy, while KNIME reached 83.89%, with the Q-CHAT-10 score identified as the most important predictor. The study highlights AutoML&#x2019;s potential to streamline ASD diagnosis, making ML more accessible to healthcare professionals while improving early detection and treatment.</p>
<p>Xu et al. developed a method to detect ASD in EEG (Electroencephalogram) datasets without using data augmentation methods [<xref ref-type="bibr" rid="ref-7">7</xref>]. They collected data from 97 ASD and 92 typically developing individuals from publicly available datasets. The data were recorded both at rest and while performing a task. They designed and implemented a combined network based on a convolutional neural network (CNN) and long short-term memory (LSTM) for ASD detection. The developed network achieved classification accuracies of 81.08% and 74.55% for resting-state and task-state data, respectively.</p>
<p>Dia et al. proposed a supervised learning method to classify Autism Spectrum Disorder and to assess emotion levels among autistic children [<xref ref-type="bibr" rid="ref-8">8</xref>]. To evaluate the performance of the proposed approach, they used YouTube video frames of autistic children exhibiting typical autistic behaviors in unconstrained environments and conditions, as well as images of neurotypical people. They also proposed an extended version of a dataset containing additional influence labels corresponding to the influence levels of autistic children. Experiments were conducted using different models to determine the optimal performance of their architecture.</p>
<p>In 2024, researchers explored how the use of AI (Artificial Intelligence), particularly ML and deep learning (DL), can improve ASD detection [<xref ref-type="bibr" rid="ref-9">9</xref>]. They used natural language processing (NLP) to analyze Twitter posts, aiming to identify linguistic patterns associated with ASD. Various models, including decision trees, XGBOOST (eXtreme Gradient Boosting), k-nearest neighbors (KNN), Recurrent neural network (RNN), long short-term memory (LSTM), bidirectional long short-term memory (Bi-LSTM), and BERT (Bidirectional Encoder Representations from Transformers)-based models, were tested on a dataset of 404,627 tweets. BERTweet achieved the highest accuracy of 87.7%, demonstrating AI&#x2019;s potential in ASD diagnosis.</p>
<p>Researchers reviewed AI-based methodologies for ASD detection through computer vision techniques in 2024 [<xref ref-type="bibr" rid="ref-10">10</xref>]. They studied ML models such as SVM, decision trees, and gradient boosting, alongside deep learning models like CNNs, RNNs, LSTMs, and Transformer-based approaches. They proposed a binary image classifier using the Xception CNN model trained on facial images of children aged 2 to 8 years. With a dataset of 23,000 images, the model achieved an accuracy of 88.87%, highlighting the effectiveness of facial analysis in ASD detection.</p>
<p>Loganathan et al. developed a hybrid ensemble model combining ResNet101 and BiGRU networks optimized with the CHGSO algorithm for ASD detection using EEG signals [<xref ref-type="bibr" rid="ref-11">11</xref>]. The hybrid ensemble model shows superior performance in ASD detection compared to existing methods such as DNN (Deep Neural Networks), SVM (Support Vector Machine), KNN, and MGOA-RF. It reaches a sensitivity of 98%, a specificity of 99%, an F1-score of 98%, an MCC of 99%, an accuracy of 98%, and a precision of 99%.</p>
<p>ML for ASD detection faces challenges such as limited datasets, symptom variability, and model interpretability. Small sample sizes can lead to overfitting, while diverse symptom presentations make pattern recognition difficult. Additionally, selecting relevant features and ensuring model reliability remain key hurdles. Addressing these issues requires robust preprocessing and validation techniques.</p>
</sec>
<sec id="s3">
<label>3</label>
<title>Methods</title>
<p>This section describes the datasets, clustering techniques, and machine learning algorithms used in the study, and then gives detailed information about the proposed model.</p>
<sec id="s3_1">
<label>3.1</label>
<title>Datasets</title>
<p>For this research, four publicly available ASD datasets were used. The Toddler dataset was taken from Kaggle [<xref ref-type="bibr" rid="ref-12">12</xref>], and the Children, Adult, and Adolescent datasets were taken from the UCI repository [<xref ref-type="bibr" rid="ref-13">13</xref>&#x2013;<xref ref-type="bibr" rid="ref-15">15</xref>]. Detailed information about the datasets is given in <xref ref-type="table" rid="table-1">Table 1</xref>.</p>
<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>Information about datasets</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th align="center">Dataset name</th>
<th align="center">Alias for dataset name</th>
<th align="center">Feature type</th>
<th align="center">Number of features</th>
<th align="center">Number of data in classes</th>
<th align="center">Number of data</th>
</tr>
</thead>
<tbody>
<tr>
<td>Autism screening data for toddlers</td>
<td>Toddler</td>
<td>Categorical, continuous and binary</td>
<td>18</td>
<td>no: 326, yes: 729</td>
<td>1054</td>
</tr>
<tr>
<td>Autistic spectrum disorder screening data for children</td>
<td>Children</td>
<td>Categorical, continuous and binary</td>
<td>21</td>
<td>no: 151, yes: 141</td>
<td>292</td>
</tr>
<tr>
<td>Autism screening adult</td>
<td>Adult</td>
<td>Categorical, continuous and binary</td>
<td>21</td>
<td>no: 515, yes: 189</td>
<td>704</td>
</tr>
<tr>
<td>Autistic spectrum disorder screening data for adolescent</td>
<td>Adolescent</td>
<td>Categorical, continuous and binary</td>
<td>21</td>
<td>no: 41, yes: 63</td>
<td>104</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>When these datasets were created, ten behavioral traits (AQ-10) were used together with additional individual characteristics that have proven effective in distinguishing ASD cases from controls in behavioral science. All datasets contain two classes [<xref ref-type="bibr" rid="ref-12">12</xref>&#x2013;<xref ref-type="bibr" rid="ref-15">15</xref>], ASD and non-ASD, so binary classification was performed in all the methods used. The &#x201C;Number of data in classes&#x201D; column of <xref ref-type="table" rid="table-1">Table 1</xref> shows the class distribution in each dataset. The Children and Adolescent datasets are comparatively balanced, whereas the Toddler and Adult datasets are imbalanced: the Toddler dataset contains far more ASD than non-ASD samples, while the Adult dataset shows the opposite pattern. This imbalance is likely to affect classification, as is typical of the imbalanced dataset problem [<xref ref-type="bibr" rid="ref-16">16</xref>].</p>
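<p>As a rough illustration of the class distribution discussed above, the following sketch computes a simple imbalance ratio (majority-class size divided by minority-class size) from the per-class counts listed in Table 1. The function names and the choice of this particular ratio are illustrative assumptions, not part of the original study.</p>
<code language="python"><![CDATA[
```python
# Class counts copied from the "Number of data in classes" column of Table 1.
CLASS_COUNTS = {
    "Toddler": {"no": 326, "yes": 729},
    "Children": {"no": 151, "yes": 141},
    "Adult": {"no": 515, "yes": 189},
    "Adolescent": {"no": 41, "yes": 63},
}


def imbalance_ratio(counts):
    """Return majority-class size divided by minority-class size (illustrative)."""
    return max(counts.values()) / min(counts.values())


def majority_class(counts):
    """Return the label of the class with the most samples."""
    return max(counts, key=counts.get)


for name, counts in CLASS_COUNTS.items():
    print(f"{name}: majority={majority_class(counts)}, "
          f"ratio={imbalance_ratio(counts):.2f}")
```
]]></code>
<p>A ratio close to 1 indicates a nearly balanced dataset, while larger values indicate a stronger skew toward the majority class.</p>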

</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Used Techniques</title>
<p>This section describes the methods used in the two phases of the study.</p>
<sec id="s3_2_1">
<label>3.2.1</label>
<title>Clustering Algorithms</title>
<p>The clustering algorithms employed in the study were chosen for their popularity and ease of use. They are K-means, Agglomerative Clustering, DBSCAN, MeanShift, Spectral Clustering, and Birch [<xref ref-type="bibr" rid="ref-17">17</xref>&#x2013;<xref ref-type="bibr" rid="ref-20">20</xref>].</p>
</sec>
<sec id="s3_2_2">
<label>3.2.2</label>
<title>ML Algorithms</title>
<p>In this section, brief information about the ML algorithms utilized in the study is provided. The following algorithms were examined [<xref ref-type="bibr" rid="ref-16">16</xref>,<xref ref-type="bibr" rid="ref-21">21</xref>,<xref ref-type="bibr" rid="ref-22">22</xref>]: Extra Trees Classifier (ETC) [<xref ref-type="bibr" rid="ref-23">23</xref>], Gaussian Process Classifier (GPC) [<xref ref-type="bibr" rid="ref-24">24</xref>], KNN [<xref ref-type="bibr" rid="ref-16">16</xref>], Linear Support Vector Machines (LSVC) and Support Vector Machines (SVM) [<xref ref-type="bibr" rid="ref-25">25</xref>], Logistic Regression (LR), Multi-Layered Perceptron (MLP), Passive Aggressive Classifier (PAC), Ridge Classifier (RC) [<xref ref-type="bibr" rid="ref-26">26</xref>], and Stochastic Gradient Descent (SGDC) [<xref ref-type="bibr" rid="ref-27">27</xref>].</p>
</sec>
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>Performance Metrics</title>
<p>Classification is a fundamental problem in ML: the task of predicting the class labels of given input data. The accuracy score is a widely used measure for evaluating such models. It quantifies the proportion of correct predictions made by the model relative to the total number of predictions. In addition to the accuracy score, F1-score and ROC (Receiver Operating Characteristic)/AUC (Area Under the Curve) values are calculated [<xref ref-type="bibr" rid="ref-16">16</xref>].</p>
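<p>The accuracy and F1-score named above can be sketched for the binary case as follows. This is a minimal standard-library illustration using the usual definitions (accuracy as correct predictions over all predictions, F1 as the harmonic mean of precision and recall); the function names and the toy labels are hypothetical and do not come from the study.</p>
<code language="python"><![CDATA[
```python
def confusion_counts(y_true, y_pred, positive=1):
    """Count true/false positives and negatives for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def accuracy(y_true, y_pred):
    """Proportion of predictions that match the true labels."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    return (tp + tn) / (tp + fp + tn + fn)


def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall: 2*TP / (2*TP + FP + FN)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


# Hypothetical toy labels (1 = ASD, 0 = non-ASD), for illustration only.
y_true = [1, 1, 1, 0, 0, 1, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]
print(accuracy(y_true, y_pred), f1_score(y_true, y_pred))
```
]]></code>
<p>On imbalanced data the two measures can diverge, which is one reason to report the F1-score alongside accuracy.</p>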
</sec>
<sec id="s3_4">
<label>3.4</label>
<title>Proposed Model</title>
<p>In recent years, substantial efforts have been dedicated to enhancing ASD classification, and this research continues to progress. Upon examining the studies, it becomes evident that both image and text-based datasets were utilized.</p>
<p>However, these datasets were observed to lack sufficient data. This is particularly true of text-based datasets, which depend on user trust in the clinical environment and therefore contain fewer attributes and data entries. Taking these factors into consideration, this study investigates how ML approaches can improve ASD detection with a limited amount of data. Accordingly, the research consists of two phases:
<list list-type="order">
<list-item>
<p>Clustering the dataset, which is called &#x201C;Data Preprocessing&#x201D;.</p></list-item>
<list-item>
<p>Applying the selected ML algorithms to the clustered datasets, which is called &#x201C;Classifier Training&#x201D;.</p></list-item>
</list></p>
<p><xref ref-type="fig" rid="fig-1">Fig. 1</xref> illustrates the phases of the proposed model and provides further details, which will be discussed in depth in the following section.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>Flowchart of the proposed model</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_62643-fig-1.tif"/>
</fig>
<sec id="s3_4_1">
<label>3.4.1</label>
<title>Environment and Development</title>
<p>One of the most important aspects of artificial intelligence studies is the development environment and the techniques employed. Reporting this environment is intended to guide researchers and prevent inaccuracies. The working environment used for the proposed model is outlined in <xref ref-type="table" rid="table-2">Table 2</xref>. The Python programming language was used for data preprocessing and for applying the ML techniques. Python has become a frequently preferred language for machine learning in recent years. Its wide range of machine learning modules simplifies implementation, and these modules can be customized to user requirements, which supports further development. Throughout this study, all ML algorithms were run with the default parameter values of their Python implementations; no parameters were changed.</p>
<table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>Development environment</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Hardware</th>
<th>Properties</th>
</tr>
</thead>
<tbody>
<tr>
<td>CPU</td>
<td>Intel(R) Core(TM) i7-8750H CPU @ 2.20 GHz, 6 cores</td>
</tr>
<tr>
<td>Op. Syst.</td>
<td>64 bits, Windows 11</td>
</tr>
<tr>
<td>Graphic Card</td>
<td>GTX 1650</td>
</tr>
<tr>
<td>L1/L2/L3 Cache</td>
<td>384 KB/1.5 MB/9.0 MB</td>
</tr>
<tr>
<td>RAM</td>
<td>16.00 GB</td>
</tr>
<tr>
<td>Python version</td>
<td>3.9 64-bit</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s3_4_2">
<label>3.4.2</label>
<title>Proposed Algorithm</title>
<p>This section describes the proposed hybrid method in detail and explains its implementation step by step.</p>
<p>All operations were executed in a consistent manner for all four datasets.
<list list-type="order">
<list-item>
<p>The dataset was transformed into a numerical format, as detailed below.
<list list-type="bullet">
<list-item>
<p>The dataset was converted into a numerical format to facilitate mathematical operations required for ML models. This process included structuring categorical data, handling missing values, and transforming labels into numerical representations to ensure consistency across all features.</p></list-item>
<list-item>
<p>Since all categorical variables in the dataset were nominal (i.e., they do not have an inherent order or ranking), no specialized encoding techniques such as ordinal encoding were necessary. Instead, these categorical values were directly transformed into numerical representations while preserving their original properties.</p></list-item>
<list-item>
<p>Some columns contained many unique categorical entries, particularly ethnicity, country of residence, and relation. To manage this effectively, Label Encoding was used instead of One-Hot Encoding. This decision was made to avoid a significant increase in feature dimensionality, which could lead to excessive sparsity and computational inefficiency. Label Encoding assigned each category a unique numerical value while preserving the dataset&#x2019;s structure and preventing unnecessary expansion of features.</p></list-item>
<list-item>
<p>Certain columns in the dataset contained missing values, represented by the symbol &#x2018;?&#x2019;. These missing entries were systematically addressed to maintain data integrity. Depending on the nature of the missing data, appropriate techniques such as imputation (e.g., replacing missing values with the mode or median) or row-wise removal were applied to ensure the dataset remained complete and suitable for ML analysis.</p></list-item>
</list></p></list-item>
<list-item>
<p>After the dataset was converted to numerical form, it was clustered separately with each of the selected clustering algorithms. The ML algorithms were then trained on the clustered dataset.</p></list-item>
<list-item>
<p>In addition, a normality test was conducted for the datasets presented in <xref ref-type="table" rid="table-1">Table 1</xref>. Specifically, the Shapiro-Wilk test was applied individually to each dataset. Under this hypothesis test, if the <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:mi>p</mml:mi></mml:math></inline-formula>-value of the examined data is equal to or above 0.05, the null hypothesis of normality is not rejected and the data are treated as normally distributed; otherwise, they are treated as not normally distributed.</p>
</list-item>
</list></p>
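<p>The missing-value handling and Label Encoding described in the first step above can be sketched as follows. This is a minimal standard-library illustration under stated assumptions: missing entries are replaced by the column mode (one of the options mentioned above), categories are numbered in sorted order, and the function names and the toy column are hypothetical rather than taken from the study.</p>
<code language="python"><![CDATA[
```python
from collections import Counter

MISSING = "?"  # missing-value marker described in the text


def impute_mode(column):
    """Replace missing entries with the most frequent observed value."""
    observed = [v for v in column if v != MISSING]
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v == MISSING else v for v in column]


def label_encode(column):
    """Map each distinct category to an integer (sorted order is an assumption)."""
    mapping = {cat: idx for idx, cat in enumerate(sorted(set(column)))}
    return [mapping[v] for v in column], mapping


# Hypothetical toy column standing in for a field such as ethnicity.
ethnicity = ["Asian", "?", "Latino", "Asian", "White", "?"]
filled = impute_mode(ethnicity)
codes, mapping = label_encode(filled)
print(filled)
print(codes, mapping)
```
]]></code>
<p>Each category becomes a single integer, so the number of columns is unchanged; One-Hot Encoding would instead add one column per distinct category.</p>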
<p>The second phase of the study involves applying ML algorithms to the clustered dataset and examining the determined &#x201C;Performance Metrics&#x201D;. Although 25 ML algorithms were initially applied, only 10 of them were ultimately selected. These selected algorithms are briefly summarized under the title of &#x201C;ML Algorithms&#x201D;. The reason for choosing these specific algorithms is that their performance ratios remained consistent regardless of clustering. Remarkable improvements were observed in the 10 algorithms examined and proposed in the study.</p>
<p>Additionally, the 5-fold cross-validation method was employed during the training and testing phases of the study. This approach ensured more robust testing and estimation processes.</p>
<p>The choice of five folds is based on recommendations in the literature for ML algorithms [<xref ref-type="bibr" rid="ref-16">16</xref>], where this number is considered suitable for obtaining reliable results. Algorithm 1 shows the pseudo code of the proposed method.</p>
<fig id="fig-4">
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_62643-fig-4.tif"/>
</fig>
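<p>The 5-fold cross-validation protocol described above can be sketched in plain Python as follows. This is only an illustrative sketch and does not reproduce Algorithm 1: the shuffling, the fixed seed, and the names <monospace>k_fold_indices</monospace>, <monospace>cross_val_accuracy</monospace>, and <monospace>fit_predict</monospace> are assumptions introduced here.</p>
<code language="python"><![CDATA[
```python
import random


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs for shuffled k-fold cross-validation."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Spread the remainder so that fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


def cross_val_accuracy(fit_predict, X, y, k=5, seed=0):
    """Average accuracy of fit_predict(X_train, y_train, X_test) over k folds."""
    scores = []
    for train_idx, test_idx in k_fold_indices(len(X), k, seed):
        X_train = [X[i] for i in train_idx]
        y_train = [y[i] for i in train_idx]
        X_test = [X[i] for i in test_idx]
        preds = fit_predict(X_train, y_train, X_test)
        correct = sum(p == y[i] for p, i in zip(preds, test_idx))
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)


# Fold sizes for 104 rows (the size of the Adolescent dataset in Table 1).
print([len(test) for _, test in k_fold_indices(104)])  # prints [21, 21, 21, 21, 20]
```
]]></code>
<p>Every sample is used for testing exactly once, so each reported score is an average over five held-out folds rather than the result of a single split.</p>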
</sec>
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Results</title>
<p>In this section, the proposed approach for the study was implemented, and all experiments were conducted in the environment specified under the title &#x201C;Environment and Development&#x201D;.</p>
<p>The results were evaluated separately for the clustered dataset. In the subsequent section, the outcomes obtained for each performance metric will be presented and thoroughly analyzed. Various clustering and ML approaches were considered in the study, but only the most prominent ones were included. Algorithms such as random forest and decision tree, which are commonly used in the literature, were excluded from the study, as there are already sufficient studies available about these algorithms. Additionally, algorithms with low accuracy were not included in the study to focus on the most effective ones.</p>
<p>As in all ML studies, the accuracy rate was first calculated in this study. The accuracy rate results are given in <xref ref-type="table" rid="table-3">Table 3</xref>.</p>
<table-wrap id="table-3">
<label>Table 3</label>
<caption>
<title>Accuracy scores with and without clustering using ML algorithms</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th></th>
<th></th>
<th colspan="10">Algorithms</th>
</tr>
<tr>
<th>Dataset</th>
<th>Clustering</th>
<th>ETC</th>
<th>GPC</th>
<th>KNN</th>
<th>LSVC</th>
<th>LR</th>
<th>MLP</th>
<th>PAC</th>
<th>RC</th>
<th>SGD</th>
<th>SVC</th>
</tr>
</thead>
<tbody>
<tr>
<td>Children</td>
<td>Agglomerative</td>
<td>98.29</td>
<td>100.00</td>
<td>75.00</td>
<td>95.89</td>
<td>100.00</td>
<td>81.51</td>
<td><bold>81.16</bold></td>
<td>96.92</td>
<td><bold>80.48</bold></td>
<td>81.85</td>
</tr>
<tr>
<td></td>
<td>Birch</td>
<td><bold>98.63</bold></td>
<td>100.00</td>
<td>75.00</td>
<td>94.18</td>
<td>100.00</td>
<td><bold>84.59</bold></td>
<td><bold>82.53</bold></td>
<td>96.92</td>
<td>76.37</td>
<td>81.85</td>
</tr>
<tr>
<td></td>
<td>DBSCAN</td>
<td>97.60</td>
<td>100.00</td>
<td>75.00</td>
<td><bold>96.58</bold></td>
<td>100.00</td>
<td><bold>85.96</bold></td>
<td><bold>83.90</bold></td>
<td>96.92</td>
<td><bold>79.79</bold></td>
<td>81.51</td>
</tr>
<tr>
<td></td>
<td>KMeans</td>
<td>98.29</td>
<td>100.00</td>
<td>75.00</td>
<td>93.84</td>
<td>100.00</td>
<td>80.82</td>
<td><bold>72.60</bold></td>
<td>96.92</td>
<td>78.77</td>
<td>81.51</td>
</tr>
<tr>
<td></td>
<td>MeanShift</td>
<td>98.29</td>
<td>100.00</td>
<td>75.00</td>
<td>94.86</td>
<td>100.00</td>
<td>80.82</td>
<td><bold>75.00</bold></td>
<td>96.92</td>
<td>69.18</td>
<td>81.51</td>
</tr>
<tr>
<td></td>
<td>NONE</td>
<td>98.29</td>
<td>100.00</td>
<td>75.00</td>
<td>95.89</td>
<td>100.00</td>
<td>82.88</td>
<td>72.26</td>
<td>96.92</td>
<td>79.11</td>
<td>81.85</td>
</tr>
<tr>
<td></td>
<td>Spectral</td>
<td><bold>98.63</bold></td>
<td>100.00</td>
<td>75.00</td>
<td>92.12</td>
<td>100.00</td>
<td>85.27</td>
<td><bold>72.60</bold></td>
<td>96.92</td>
<td><bold>87.33</bold></td>
<td>81.85</td>
</tr>
<tr>
<td>Toddler</td>
<td>Agglomerative</td>
<td>98.86</td>
<td>100.00</td>
<td>97.72</td>
<td>100.00</td>
<td>100.00</td>
<td><bold>98.67</bold></td>
<td><bold>95.64</bold></td>
<td>94.97</td>
<td>99.62</td>
<td>99.43</td>
</tr>
<tr>
<td></td>
<td>Birch</td>
<td>98.77</td>
<td>100.00</td>
<td>97.53</td>
<td>100.00</td>
<td>100.00</td>
<td>97.82</td>
<td><bold>96.58</bold></td>
<td>95.35</td>
<td>98.01</td>
<td>99.43</td>
</tr>
<tr>
<td></td>
<td>DBSCAN</td>
<td><bold>99.34</bold></td>
<td>100.00</td>
<td>97.91</td>
<td>100.00</td>
<td>100.00</td>
<td><bold>99.81</bold></td>
<td>93.83</td>
<td>95.54</td>
<td>98.01</td>
<td>99.43</td>
</tr>
<tr>
<td></td>
<td>KMeans</td>
<td>98.96</td>
<td>100.00</td>
<td><bold>98.01</bold></td>
<td>100.00</td>
<td>100.00</td>
<td>98.20</td>
<td><bold>96.39</bold></td>
<td>95.16</td>
<td>96.87</td>
<td>99.43</td>
</tr>
<tr>
<td></td>
<td>MeanShift</td>
<td>99.05</td>
<td>100.00</td>
<td>97.82</td>
<td>100.00</td>
<td>100.00</td>
<td><bold>98.67</bold></td>
<td><bold>94.88</bold></td>
<td>95.35</td>
<td>99.53</td>
<td>99.43</td>
</tr>
<tr>
<td></td>
<td>NONE</td>
<td>99.15</td>
<td>100.00</td>
<td>97.82</td>
<td>100.00</td>
<td>100.00</td>
<td>98.48</td>
<td>94.40</td>
<td>95.54</td>
<td>99.81</td>
<td>99.43</td>
</tr>
<tr>
<td></td>
<td>Spectral</td>
<td>99.24</td>
<td>100.00</td>
<td>97.82</td>
<td>99.81</td>
<td>100.00</td>
<td><bold>98.58</bold></td>
<td><bold>96.39</bold></td>
<td>95.35</td>
<td>98.39</td>
<td>99.43</td>
</tr>
<tr>
<td>Adolescent</td>
<td>Agglomerative</td>
<td>95.19</td>
<td>94.23</td>
<td>82.69</td>
<td><bold>82.69</bold></td>
<td>94.23</td>
<td>75.96</td>
<td><bold>81.73</bold></td>
<td>87.50</td>
<td>82.69</td>
<td>81.73</td>
</tr>
<tr>
<td></td>
<td>Birch</td>
<td>95.19</td>
<td>94.23</td>
<td>82.69</td>
<td><bold>83.65</bold></td>
<td>94.23</td>
<td><bold>87.50</bold></td>
<td>78.85</td>
<td>87.50</td>
<td>83.65</td>
<td>81.73</td>
</tr>
<tr>
<td></td>
<td>DBSCAN</td>
<td>94.23</td>
<td>94.23</td>
<td>82.69</td>
<td>81.73</td>
<td>94.23</td>
<td><bold>81.73</bold></td>
<td>77.88</td>
<td>89.42</td>
<td>78.85</td>
<td>81.73</td>
</tr>
<tr>
<td></td>
<td>KMeans</td>
<td>94.23</td>
<td>94.23</td>
<td>81.73</td>
<td>80.77</td>
<td>94.23</td>
<td><bold>83.65</bold></td>
<td>72.12</td>
<td>89.42</td>
<td>81.73</td>
<td>81.73</td>
</tr>
<tr>
<td></td>
<td>MeanShift</td>
<td>94.23</td>
<td>94.23</td>
<td>81.73</td>
<td>81.73</td>
<td>93.27</td>
<td><bold>85.58</bold></td>
<td>80.77</td>
<td>87.50</td>
<td><bold>84.62</bold></td>
<td>81.73</td>
</tr>
<tr>
<td></td>
<td>NONE</td>
<td>97.12</td>
<td>94.23</td>
<td>82.69</td>
<td>81.73</td>
<td>94.23</td>
<td>79.81</td>
<td>80.77</td>
<td>89.42</td>
<td>83.65</td>
<td>81.73</td>
</tr>
<tr>
<td></td>
<td>Spectral</td>
<td>94.23</td>
<td>94.23</td>
<td>82.69</td>
<td>80.77</td>
<td>94.23</td>
<td><bold>85.58</bold></td>
<td><bold>83.65</bold></td>
<td>87.50</td>
<td>81.73</td>
<td>81.73</td>
</tr>
<tr>
<td>Adult</td>
<td>Agglomerative</td>
<td><bold>98.44</bold></td>
<td>73.15</td>
<td>77.70</td>
<td>91.05</td>
<td><bold>99.86</bold></td>
<td><bold>93.47</bold></td>
<td><bold>76.56</bold></td>
<td>94.89</td>
<td><bold>79.83</bold></td>
<td>83.66</td>
</tr>
<tr>
<td></td>
<td>Birch</td>
<td><bold>98.15</bold></td>
<td>73.15</td>
<td>77.70</td>
<td><bold>92.47</bold></td>
<td><bold>99.86</bold></td>
<td><bold>92.61</bold></td>
<td><bold>78.55</bold></td>
<td>94.89</td>
<td><bold>86.51</bold></td>
<td>83.66</td>
</tr>
<tr>
<td></td>
<td>DBSCAN</td>
<td><bold>98.30</bold></td>
<td>73.15</td>
<td>77.70</td>
<td><bold>92.90</bold></td>
<td>98.58</td>
<td><bold>87.64</bold></td>
<td><bold>78.69</bold></td>
<td>94.89</td>
<td>75.99</td>
<td>83.66</td>
</tr>
<tr>
<td></td>
<td>KMeans</td>
<td><bold>98.86</bold></td>
<td>73.15</td>
<td>77.70</td>
<td><bold>93.89</bold></td>
<td>98.44</td>
<td><bold>89.49</bold></td>
<td>67.61</td>
<td><bold>95.03</bold></td>
<td><bold>86.65</bold></td>
<td>83.66</td>
</tr>
<tr>
<td></td>
<td>MeanShift</td>
<td><bold>98.15</bold></td>
<td>73.15</td>
<td>77.70</td>
<td><bold>93.89</bold></td>
<td><bold>99.72</bold></td>
<td><bold>91.90</bold></td>
<td><bold>81.25</bold></td>
<td>94.89</td>
<td><bold>81.68</bold></td>
<td>83.66</td>
</tr>
<tr>
<td></td>
<td>NONE</td>
<td>97.87</td>
<td>73.15</td>
<td>77.70</td>
<td>91.90</td>
<td>99.43</td>
<td>84.23</td>
<td>75.57</td>
<td>94.89</td>
<td>79.40</td>
<td>83.66</td>
</tr>
<tr>
<td></td>
<td>Spectral</td>
<td><bold>98.58</bold></td>
<td>73.15</td>
<td>77.70</td>
<td>89.63</td>
<td>99.43</td>
<td>84.09</td>
<td><bold>78.84</bold></td>
<td>94.89</td>
<td><bold>82.95</bold></td>
<td>83.66</td>
</tr>
</tbody>
</table>
</table-wrap>
<p><xref ref-type="table" rid="table-3">Table 3</xref> lists the datasets in the leftmost column, followed by the clustering methods and the accuracy rates of each algorithm. Each dataset was first classified without clustering (the NONE rows). Values in bold mark an increase in accuracy rate; the same convention is applied in the other tables. <xref ref-type="table" rid="table-3">Table 3</xref> shows that most of the clustering methods improve the performance of the algorithms. In particular, almost all algorithms showed an increase in accuracy rate with Spectral across the datasets. Furthermore, there is a noticeable difference in accuracy improvement between the Toddler and Adult datasets, as indicated under the &#x201C;Datasets&#x201D; title. In the Adult dataset, clustering improved accuracy rates for most algorithms, with Extra Trees Classifier (ETC) showing particularly notable success. This is attributed to the higher number of non-ASD samples in the Adult dataset. Overall, the most successful algorithms in <xref ref-type="table" rid="table-3">Table 3</xref> were MLP, PAC, and ETC, which generally maintained or increased their accuracy rates across the clustering methods.</p>
<p>The receiver operating characteristic (ROC) curve is an important performance measure for classification problems. It plots the true-positive rate against the false-positive rate over all decision thresholds, and the area under it (AUC) measures how well the classifier separates the two classes. <xref ref-type="fig" rid="fig-2">Fig. 2</xref> shows the performance of classification algorithms in detecting ASD with and without clustering.</p>
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>AUC/ROC scores for all datasets with and without clustering</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_62643-fig-2.tif"/>
</fig>
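<p>To make the AUC concrete, the following minimal sketch computes it from its ranking interpretation: the probability that a randomly chosen positive (ASD) sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The function name <monospace>roc_auc</monospace> and the toy labels and scores are assumptions made for illustration; they are not taken from the study, which may have used a library routine instead.</p>
<code language="python" xml:space="preserve">
```python
def roc_auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four samples (two non-ASD, two ASD).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(labels, scores))  # 0.75
```
</code>
<p>The pairwise loop is quadratic in the number of samples, which is acceptable for datasets of the size considered here; a sort-based rank computation would be the usual choice for larger data.</p>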
<p>ROC curves were also drawn for each run of the model. They are omitted here because they are too numerous and too complex to present clearly. <xref ref-type="table" rid="table-4">Table 4</xref> shows the F1-score of classification algorithms in detecting ASD with and without clustering.</p>
<table-wrap id="table-4">
<label>Table 4</label>
<caption>
<title>F1-scores with and without clustering using ML algorithms</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th></th>
<th></th>
<th colspan="10">Algorithms</th>
</tr>
<tr>
<th>Dataset</th>
<th>Clustering</th>
<th>ETC</th>
<th>GPC</th>
<th>KNN</th>
<th>LSVC</th>
<th>LR</th>
<th>MLP</th>
<th>PAC</th>
<th>RC</th>
<th>SGD</th>
<th>SVC</th>
</tr>
</thead>
<tbody>
<tr>
<td>Children</td>
<td>Agglomerative</td>
<td>98.23</td>
<td>100.00</td>
<td>74.39</td>
<td>95.89</td>
<td>100.00</td>
<td>82.91</td>
<td>81.36</td>
<td>96.86</td>
<td>81.79</td>
<td>82.27</td>
</tr>
<tr>
<td></td>
<td>Birch</td>
<td>98.59</td>
<td>100.00</td>
<td>74.39</td>
<td>93.99</td>
<td>100.00</td>
<td>85.34</td>
<td>83.71</td>
<td>96.86</td>
<td>76.61</td>
<td>82.27</td>
</tr>
<tr>
<td></td>
<td>DBSCAN</td>
<td>97.54</td>
<td>100.00</td>
<td>74.39</td>
<td>96.55</td>
<td>100.00</td>
<td>86.47</td>
<td>82.78</td>
<td>96.86</td>
<td>77.74</td>
<td>81.88</td>
</tr>
<tr>
<td></td>
<td>KMeans</td>
<td>98.23</td>
<td>100.00</td>
<td>74.39</td>
<td>93.66</td>
<td>100.00</td>
<td>82.28</td>
<td>74.36</td>
<td>96.86</td>
<td>80.86</td>
<td>81.88</td>
</tr>
<tr>
<td></td>
<td>MeanShift</td>
<td>98.25</td>
<td>100.00</td>
<td>74.39</td>
<td>94.74</td>
<td>100.00</td>
<td>82.05</td>
<td>74.91</td>
<td>96.86</td>
<td>65.38</td>
<td>81.88</td>
</tr>
<tr>
<td></td>
<td>NONE</td>
<td>98.23</td>
<td>100.00</td>
<td>74.39</td>
<td>95.83</td>
<td>100.00</td>
<td>83.87</td>
<td>68.73</td>
<td>96.86</td>
<td>80.76</td>
<td>82.27</td>
</tr>
<tr>
<td></td>
<td>Spectral</td>
<td>98.58</td>
<td>100.00</td>
<td>74.39</td>
<td>92.41</td>
<td>100.00</td>
<td>85.42</td>
<td>74.19</td>
<td>96.86</td>
<td>88.10</td>
<td>82.27</td>
</tr>
<tr>
<td>Toddler</td>
<td>Agglomerative</td>
<td>99.18</td>
<td>100.00</td>
<td>98.34</td>
<td>100.00</td>
<td>100.00</td>
<td>99.05</td>
<td>96.85</td>
<td>96.31</td>
<td>99.73</td>
<td>99.59</td>
</tr>
<tr>
<td></td>
<td>Birch</td>
<td>99.11</td>
<td>100.00</td>
<td>98.21</td>
<td>100.00</td>
<td>100.00</td>
<td>98.43</td>
<td>97.54</td>
<td>96.59</td>
<td>98.58</td>
<td>99.59</td>
</tr>
<tr>
<td></td>
<td>DBSCAN</td>
<td>99.52</td>
<td>100.00</td>
<td>98.48</td>
<td>100.00</td>
<td>100.00</td>
<td>99.86</td>
<td>95.56</td>
<td>96.73</td>
<td>98.54</td>
<td>99.59</td>
</tr>
<tr>
<td></td>
<td>KMeans</td>
<td>99.25</td>
<td>100.00</td>
<td>98.56</td>
<td>100.00</td>
<td>100.00</td>
<td>98.71</td>
<td>97.39</td>
<td>96.46</td>
<td>97.68</td>
<td>99.59</td>
</tr>
<tr>
<td></td>
<td>MeanShift</td>
<td>99.32</td>
<td>100.00</td>
<td>98.42</td>
<td>100.00</td>
<td>100.00</td>
<td>99.04</td>
<td>96.21</td>
<td>96.59</td>
<td>99.66</td>
<td>99.59</td>
</tr>
<tr>
<td></td>
<td>NONE</td>
<td>99.38</td>
<td>100.00</td>
<td>98.41</td>
<td>100.00</td>
<td>100.00</td>
<td>98.91</td>
<td>95.98</td>
<td>96.73</td>
<td>99.86</td>
<td>99.59</td>
</tr>
<tr>
<td></td>
<td>Spectral</td>
<td>99.45</td>
<td>100.00</td>
<td>98.41</td>
<td>99.86</td>
<td>100.00</td>
<td>98.98</td>
<td>97.45</td>
<td>96.59</td>
<td>98.85</td>
<td>99.59</td>
</tr>
<tr>
<td>Adolescent</td>
<td>Agglomerative</td>
<td>96.06</td>
<td>95.31</td>
<td>85.71</td>
<td>85.94</td>
<td>95.31</td>
<td>82.52</td>
<td>86.13</td>
<td>90.23</td>
<td>86.57</td>
<td>86.71</td>
</tr>
<tr>
<td></td>
<td>Birch</td>
<td>96.12</td>
<td>95.31</td>
<td>85.71</td>
<td>86.82</td>
<td>95.31</td>
<td>90.23</td>
<td>81.36</td>
<td>90.23</td>
<td>87.77</td>
<td>86.71</td>
</tr>
<tr>
<td></td>
<td>DBSCAN</td>
<td>95.31</td>
<td>95.31</td>
<td>85.71</td>
<td>85.27</td>
<td>95.31</td>
<td>85.71</td>
<td>82.71</td>
<td>91.85</td>
<td>83.08</td>
<td>86.71</td>
</tr>
<tr>
<td></td>
<td>KMeans</td>
<td>95.38</td>
<td>95.31</td>
<td>84.80</td>
<td>84.13</td>
<td>95.31</td>
<td>87.41</td>
<td>73.87</td>
<td>91.85</td>
<td>86.71</td>
<td>86.71</td>
</tr>
<tr>
<td></td>
<td>MeanShift</td>
<td>95.38</td>
<td>95.31</td>
<td>85.04</td>
<td>84.80</td>
<td>94.49</td>
<td>88.89</td>
<td>84.13</td>
<td>90.08</td>
<td>87.30</td>
<td>86.71</td>
</tr>
<tr>
<td></td>
<td>NONE</td>
<td>97.64</td>
<td>95.31</td>
<td>85.71</td>
<td>85.27</td>
<td>95.31</td>
<td>85.31</td>
<td>84.38</td>
<td>91.85</td>
<td>87.22</td>
<td>86.71</td>
</tr>
<tr>
<td></td>
<td>Spectral</td>
<td>95.38</td>
<td>95.31</td>
<td>85.71</td>
<td>84.38</td>
<td>95.31</td>
<td>88.55</td>
<td>86.61</td>
<td>90.51</td>
<td>85.71</td>
<td>86.71</td>
</tr>
<tr>
<td>Adult</td>
<td>Agglomerative</td>
<td>97.04</td>
<td>74.83</td>
<td>52.28</td>
<td>83.89</td>
<td>99.73</td>
<td>87.89</td>
<td>63.41</td>
<td>90.11</td>
<td>71.49</td>
<td>59.36</td>
</tr>
<tr>
<td></td>
<td>Birch</td>
<td>96.50</td>
<td>74.83</td>
<td>52.28</td>
<td>86.38</td>
<td>99.73</td>
<td>86.32</td>
<td>66.37</td>
<td>90.11</td>
<td>74.11</td>
<td>59.36</td>
</tr>
<tr>
<td></td>
<td>DBSCAN</td>
<td>96.77</td>
<td>74.83</td>
<td>52.28</td>
<td>86.63</td>
<td>97.37</td>
<td>78.41</td>
<td>67.11</td>
<td>90.06</td>
<td>63.66</td>
<td>59.07</td>
</tr>
<tr>
<td></td>
<td>KMeans</td>
<td>97.86</td>
<td>74.83</td>
<td>52.28</td>
<td>89.38</td>
<td>97.11</td>
<td>81.12</td>
<td>57.78</td>
<td>90.36</td>
<td>75.52</td>
<td>59.36</td>
</tr>
<tr>
<td></td>
<td>MeanShift</td>
<td>96.50</td>
<td>74.83</td>
<td>52.28</td>
<td>88.95</td>
<td>99.47</td>
<td>85.35</td>
<td>61.85</td>
<td>90.06</td>
<td>73.62</td>
<td>59.36</td>
</tr>
<tr>
<td></td>
<td>NONE</td>
<td>95.98</td>
<td>74.83</td>
<td>52.28</td>
<td>85.93</td>
<td>98.93</td>
<td>73.51</td>
<td>58.65</td>
<td>90.06</td>
<td>71.29</td>
<td>59.36</td>
</tr>
<tr>
<td></td>
<td>Spectral</td>
<td>97.33</td>
<td>74.83</td>
<td>52.28</td>
<td>82.82</td>
<td>98.93</td>
<td>73.58</td>
<td>62.84</td>
<td>90.06</td>
<td>73.57</td>
<td>59.36</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The metrics given above cannot give a complete picture for imbalanced datasets. The Matthews correlation coefficient (MCC) addresses this by measuring the correlation (phi coefficient) between the actual and the predicted labels. Since the Toddler and Adult datasets have an uneven class distribution, the best-performing clustering algorithms and classification methods vary. However, the overall result does not change: in ASD detection, the ETC, MLP, PAC, and RC algorithms, together with the Spectral and DBSCAN clustering algorithms, perform strongly and detect ASD accurately.</p>
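<p>The effect of class imbalance on these metrics can be illustrated with a short sketch. It computes accuracy, F1-score, and MCC from the confusion-matrix counts, where MCC = (TP x TN - FP x FN) / sqrt((TP + FP)(TP + FN)(TN + FP)(TN + FN)). The function names and the toy labels are assumptions made for this example, not data from the study; returning 0 when the denominator vanishes is a common convention rather than part of the definition.</p>
<code language="python" xml:space="preserve">
```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels coded as 0 and 1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)


def f1_score(tp, tn, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0  # convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denom


# Toy imbalanced case: 9 negatives, 1 positive, model always predicts 0.
y_true = [0] * 9 + [1]
y_pred = [0] * 10
counts = confusion_counts(y_true, y_pred)
print(accuracy(*counts))  # 0.9
print(f1_score(*counts))  # 0.0
print(mcc(*counts))       # 0.0
```
</code>
<p>In this toy case the accuracy is high even though the single positive sample is missed, while both the F1-score and the MCC fall to zero, which is why correlation-based evaluation is informative for imbalanced data.</p>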
</sec>
<sec id="s5">
<label>5</label>
<title>Discussion</title>
<p>A review of the literature shows that many researchers have addressed the problem of ASD detection. The main goal of this study is to design a model that increases detection performance on small samples without altering the number of features, and that serves people of different age groups. Because diagnosis is so important for ASD, studies in this area are valuable. Detection of ASD is particularly difficult in age groups with little data and a limited number of features.</p>
<p>The study examined the effect of six clustering methods on ASD datasets and the resulting rate of improvement in classification. To this end, the selected datasets were clustered and then evaluated with the specified performance metrics. The classification results were analyzed, the changes in detection results after clustering were observed, and a hybrid model was proposed. <xref ref-type="fig" rid="fig-3">Fig. 3</xref> shows the total time of clustering and classification for all algorithms.</p>
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Time for all datasets with and without clustering using ML algorithms</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_62643-fig-3.tif"/>
</fig>
<p>Almost every clustering method works well with certain algorithms on every dataset, but the Spectral method stands out. Clustering the ASD datasets with Spectral increased the detection of ASD in all datasets compared to the unclustered state. Spectral Clustering and DBSCAN have shown superior performance over the other clustering algorithms, particularly on small datasets, owing to their ability to capture complex data structures. Spectral Clustering leverages graph-based techniques: it first transforms the data into a similarity matrix and then applies clustering.</p>
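<p>The graph-based idea behind Spectral Clustering can be sketched in a few lines. The example below is an illustration under stated assumptions, not the configuration used in the study: it builds a Gaussian (RBF) similarity matrix, forms the unnormalized graph Laplacian L = D - W implicitly, approximates the eigenvector of the second-smallest eigenvalue (the Fiedler vector) by power iteration, and splits the samples into two groups by its sign. Library implementations typically use several eigenvectors followed by k-means; the function names, the <monospace>gamma</monospace> value, and the toy points are hypothetical.</p>
<code language="python" xml:space="preserve">
```python
import math


def rbf_similarity(points, gamma=1.0):
    """Similarity matrix W with W[i][j] = exp(-gamma * squared distance)."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-gamma * d2)
    return W


def fiedler_vector(W, iters=500):
    """Approximate the Fiedler vector of L = D - W by shifted power iteration."""
    n = len(W)
    deg = [sum(row) for row in W]
    shift = 2.0 * max(deg)  # Gershgorin bound on the largest eigenvalue of L
    v = [float(i) for i in range(n)]  # deterministic, non-constant start
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # project out the constant eigenvector
        # Multiply by (shift * I - L) so that small eigenvalues of L dominate.
        w = [(shift - deg[i]) * v[i] + sum(W[i][j] * v[j] for j in range(n))
             for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v


def spectral_bipartition(points, gamma=1.0):
    """Two-way split by the sign of the Fiedler vector."""
    v = fiedler_vector(rbf_similarity(points, gamma))
    return [1 if x > 0 else 0 for x in v]


# Two well-separated toy groups on a line.
points = [(0.0,), (0.1,), (0.2,), (5.0,), (5.1,), (5.2,)]
labels = spectral_bipartition(points)
print(labels)
```
</code>
<p>The sketch assumes at least two samples and a similarity graph with non-zero weights; it makes no attempt to handle degenerate inputs or to choose the number of clusters.</p>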
<p>The results of the study are summarized as follows:
<list list-type="order">
<list-item>
<p>Clustering the dataset before classification generally improves ASD detection performance. The quality and size of the dataset are crucial factors for building an effective prediction model. Clustering played a significant role in improving dataset quality, which, together with the increased availability of larger datasets, ultimately leads to more successful prediction models.</p></list-item>
<list-item>
<p>Particularly in imbalanced ASD datasets, the large number of non-ASD samples enhances the model&#x2019;s success rate.</p></list-item>
<list-item>
<p>Clustering brings the data within a certain range, leading to more consistent model performance. In this regard, Spectral clustering combined with MLP is a suitable option for the classification of ASD datasets.</p></list-item>
<list-item>
<p>For datasets that did not have a normal distribution, correct pre-processing was observed to increase the prediction rate.</p></list-item>
</list></p>
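<p>The cluster-then-classify idea summarized above can be sketched as follows. This section does not specify how the clustering output is passed to the classifiers, so the sketch assumes one simple scheme: the cluster index of each sample is appended as an extra feature before classification. A plain k-means and a one-nearest-neighbour classifier stand in for the clustering and classification stages purely for brevity; all function names and the toy data are hypothetical and do not reproduce the study's pipeline.</p>
<code language="python" xml:space="preserve">
```python
def dist2(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centers, p):
    """Index of the center closest to point p."""
    return min(range(len(centers)), key=lambda c: dist2(centers[c], p))


def kmeans(points, k, iters=20):
    """Lloyd's k-means with a deterministic, evenly spaced initialization."""
    n = len(points)
    centers = [list(points[(i * n) // k]) for i in range(k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(centers, p)].append(p)
        for c, members in enumerate(groups):
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return centers


def augment(points, centers):
    """Append each sample's cluster index as one extra feature."""
    return [list(p) + [float(nearest(centers, p))] for p in points]


def predict_1nn(train_X, train_y, x):
    """Label of the training sample closest to x."""
    best = min(range(len(train_X)), key=lambda i: dist2(train_X[i], x))
    return train_y[best]


# Toy data: two features per sample, label 1 for the second group.
X = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [4.0, 4.1], [4.2, 3.9], [3.9, 4.3]]
y = [0, 0, 0, 1, 1, 1]

centers = kmeans(X, k=2)                       # stage 1: clustering
X_aug = augment(X, centers)                    # cluster index as a feature
query = augment([[3.8, 4.0]], centers)[0]      # same transform at test time
print(predict_1nn(X_aug, y, query))            # stage 2: classification, prints 1
```
</code>
<p>The original features are kept untouched and only one column is added, which is in line with the stated aim of improving detection without reducing the number of features.</p>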
<p>Overall, the study highlights the importance of clustering techniques in improving the detection of ASD and identifies specific algorithms that perform exceptionally well in different dataset scenarios.</p>
<p><xref ref-type="table" rid="table-5">Table 5</xref> compares the accuracy rates with those of similar ASD studies found in the literature. The values for [<xref ref-type="bibr" rid="ref-28">28</xref>] and [<xref ref-type="bibr" rid="ref-29">29</xref>] are based on the reference [<xref ref-type="bibr" rid="ref-29">29</xref>], and the values for [<xref ref-type="bibr" rid="ref-30">30</xref>] represent the overall average accuracy achieved by their proposed models.</p>
<table-wrap id="table-5">
<label>Table 5</label>
<caption>
<title>Comparison of accuracy with other studies</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th></th>
<th colspan="4">Accuracy scores (%)</th>
</tr>
<tr>
<th>Datasets</th>
<th>[<xref ref-type="bibr" rid="ref-28">28</xref>]</th>
<th>[<xref ref-type="bibr" rid="ref-29">29</xref>]</th>
<th>[<xref ref-type="bibr" rid="ref-30">30</xref>]</th>
<th>Proposed study</th>
</tr>
</thead>
<tbody>
<tr>
<td>Toddler</td>
<td></td>
<td>98.77</td>
<td></td>
<td>99.34</td>
</tr>
<tr>
<td>Children</td>
<td>97.80</td>
<td>97.20</td>
<td>96.04</td>
<td>98.60</td>
</tr>
<tr>
<td>Adolescent</td>
<td>94.23</td>
<td>93.89</td>
<td>99.95</td>
<td>87.50</td>
</tr>
<tr>
<td>Adult</td>
<td>99.85</td>
<td>98.36</td>
<td>97.32</td>
<td>99.86</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The study has several limitations. First, while four different ASD datasets were used, the results may not fully generalize to other datasets or populations. The sample size in certain age groups remained relatively small, which could limit the model&#x2019;s performance in real-world, larger datasets. Additionally, the datasets were imbalanced, with more non-ASD samples than ASD samples, potentially influencing the model&#x2019;s ability to accurately detect ASD. Although Spectral Clustering and DBSCAN showed strong performance, their effectiveness may vary with different datasets, which limits the broader applicability of the findings. The preprocessing steps played a significant role in the model&#x2019;s success, but these methods may not be equally effective for datasets with different distributions. Finally, while the classification algorithms demonstrated improved performance with certain clustering methods, the results may not be consistent across all algorithms or datasets.</p>
</sec>
<sec id="s6">
<label>6</label>
<title>Conclusion and Future Work</title>
<p>Detection of ASD is a critical area of research in psychology, especially with the rise of technology and machine learning approaches. This study aimed to improve ASD detection in different age groups, particularly focusing on performance enhancement with small sample sizes. A hybrid model was proposed that integrates clustering and classification techniques, evaluating six clustering methods on four different ASD datasets. The results show that clustering significantly improved the performance of the 10 ML algorithms tested, with Spectral Clustering and the ETC, MLP, PAC, and RC algorithms yielding the most prominent improvements. Importantly, the study demonstrated that clustering on limited data could enhance estimation performance without reducing any features.</p>
<p>This work makes several key contributions: the development of a hybrid model for ASD detection, the application of clustering methods to small datasets, and the identification of algorithms that perform particularly well in these conditions. Moreover, the proposed model outperformed previous studies for three of the four age groups (Toddler, Children, and Adult), indicating its potential for improved detection in these groups.</p>
<p>However, the study does have limitations, such as the reliance on small datasets, which may have influenced the results, particularly for the Adolescent group. Future work will focus on integrating larger and more diverse datasets to validate the model&#x2019;s effectiveness. Collaboration with clinical psychologists will also be crucial to evaluate the clinical applicability and robustness of the model. Additionally, exploring other feature selection methods and testing the model on new datasets can help refine the approach and further enhance ASD detection performance.</p>
</sec>
</body>
<back>
<ack>
<p>Not applicable.</p>
</ack>
<sec>
<title>Funding Statement</title>
<p>The authors received no specific funding for this study.</p>
</sec>
<sec>
<title>Author Contributions</title>
<p>The authors confirm contribution to the paper as follows: study conception and design: Gozde Karatas Baydogmus; analysis and interpretation of results: Gozde Karatas Baydogmus, Onder Demir; draft manuscript preparation: Gozde Karatas Baydogmus, Onder Demir. All authors reviewed the results and approved the final version of the manuscript.</p>
</sec>
<sec sec-type="data-availability">
<title>Availability of Data and Materials</title>
<p>All data that support the findings of this study are included within the article.</p>
</sec>
<sec>
<title>Ethics Approval</title>
<p>Not applicable.</p>
</sec>
<sec sec-type="COI-statement">
<title>Conflicts of Interest</title>
<p>The authors declare no conflicts of interest to report regarding the present study.</p>
</sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><collab>Centers for Disease Control and Prevention</collab></person-group>. <article-title>Data &#x0026; statistics on autism spectrum disorder</article-title>. <year>2023</year> [cited 2025 May 19]. Available from: <ext-link ext-link-type="uri" xlink:href="https://www.cdc.gov/autism/data-research/index.html">https://www.cdc.gov/autism/data-research/index.html</ext-link>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Mag&#x00E1;n-Maganto</surname> <given-names>M</given-names></string-name>, <string-name><surname>Bejarano-Mart&#x00ED;n</surname> <given-names>&#x00C1;</given-names></string-name>, <string-name><surname>Fern&#x00E1;ndez-Alvarez</surname> <given-names>C</given-names></string-name>, <string-name><surname>Narzisi</surname> <given-names>A</given-names></string-name>, <string-name><surname>Garc&#x00ED;a-Primo</surname> <given-names>P</given-names></string-name>, <string-name><surname>Kawa</surname> <given-names>R</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Early detection and intervention of ASD: a European overview</article-title>. <source>Brain Sci</source>. <year>2017</year>;<volume>7</volume>(<issue>12</issue>):<fpage>159</fpage>. doi:<pub-id pub-id-type="doi">10.3390/brainsci7120159</pub-id>; <pub-id pub-id-type="pmid">29194420</pub-id></mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Farooq</surname> <given-names>MS</given-names></string-name>, <string-name><surname>Tehseen</surname> <given-names>R</given-names></string-name>, <string-name><surname>Sabir</surname> <given-names>M</given-names></string-name>, <string-name><surname>Atal</surname> <given-names>Z</given-names></string-name></person-group>. <article-title>Detection of autism spectrum disorder (ASD) in children and adults using machine learning</article-title>. <source>Sci Rep</source>. <year>2023</year>;<volume>13</volume>(<issue>1</issue>):<fpage>9605</fpage>. doi:<pub-id pub-id-type="doi">10.1038/s41598-023-35910-1</pub-id>; <pub-id pub-id-type="pmid">37311766</pub-id></mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Uddin</surname> <given-names>MJ</given-names></string-name>, <string-name><surname>Ahamad</surname> <given-names>MM</given-names></string-name>, <string-name><surname>Sarker</surname> <given-names>PK</given-names></string-name>, <string-name><surname>Aktar</surname> <given-names>S</given-names></string-name>, <string-name><surname>Alotaibi</surname> <given-names>N</given-names></string-name>, <string-name><surname>Alyami</surname> <given-names>SA</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>An integrated statistical and clinically applicable machine learning framework for the detection of autism spectrum disorder</article-title>. <source>Computers</source>. <year>2023</year>;<volume>12</volume>(<issue>5</issue>):<fpage>92</fpage>. doi:<pub-id pub-id-type="doi">10.3390/computers12050092</pub-id>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Abdelwahab</surname> <given-names>MM</given-names></string-name>, <string-name><surname>Al-Karawi</surname> <given-names>KA</given-names></string-name>, <string-name><surname>Hasanin</surname> <given-names>E</given-names></string-name>, <string-name><surname>Semary</surname> <given-names>H</given-names></string-name></person-group>. <article-title>Autism spectrum disorder prediction in children using machine learning</article-title>. <source>J Disab Res</source>. <year>2024</year>;<volume>3</volume>(<issue>1</issue>):<fpage>20230064</fpage>. doi:<pub-id pub-id-type="doi">10.57197/jdr-2023-0064</pub-id>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Abbas</surname> <given-names>RT</given-names></string-name>, <string-name><surname>Sultan</surname> <given-names>K</given-names></string-name>, <string-name><surname>Sheraz</surname> <given-names>M</given-names></string-name>, <string-name><surname>Chuah</surname> <given-names>TC</given-names></string-name></person-group>. <article-title>A comparative analysis of automated machine learning tools: a use case for autism spectrum disorder detection</article-title>. <source>Information</source>. <year>2024</year>;<volume>15</volume>(<issue>10</issue>):<fpage>625</fpage>. doi:<pub-id pub-id-type="doi">10.3390/info15100625</pub-id>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Xu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Yu</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Li</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Li</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>Autism spectrum disorder diagnosis with EEG signals using time series maps of brain functional connectivity and a combined CNN&#x2013;LSTM model</article-title>. <source>Comput Methods Programs Biomed</source>. <year>2024</year>;<volume>250</volume>(<issue>12</issue>):<fpage>108196</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.cmpb.2024.108196</pub-id>; <pub-id pub-id-type="pmid">38678958</pub-id></mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Dia</surname> <given-names>M</given-names></string-name>, <string-name><surname>Khodabandelou</surname> <given-names>G</given-names></string-name>, <string-name><surname>Sabri</surname> <given-names>AQM</given-names></string-name>, <string-name><surname>Othmani</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Video-based continuous affect recognition of children with Autism Spectrum Disorder using deep learning</article-title>. <source>Biomed Signal Process Control</source>. <year>2024</year>;<volume>89</volume>(<issue>2</issue>):<fpage>105712</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.bspc.2023.105712</pub-id>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Rubio-Mart&#x00ED;n</surname> <given-names>S</given-names></string-name>, <string-name><surname>Garc&#x00ED;a-Ord&#x00E1;s</surname> <given-names>MT</given-names></string-name>, <string-name><surname>Bay&#x00F3;n-Guti&#x00E9;rrez</surname> <given-names>M</given-names></string-name>, <string-name><surname>Prieto-Fern&#x00E1;ndez</surname> <given-names>N</given-names></string-name>, <string-name><surname>Ben&#x00ED;tez-Andrades</surname> <given-names>JA</given-names></string-name></person-group>. <article-title>Enhancing ASD detection accuracy: a combined approach of machine learning and deep learning models with natural language processing</article-title>. <source>Health Inf Sci Syst</source>. <year>2024</year>;<volume>12</volume>(<issue>1</issue>):<fpage>20</fpage>. doi:<pub-id pub-id-type="doi">10.1007/s13755-024-00281-y</pub-id>; <pub-id pub-id-type="pmid">38455725</pub-id></mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Pandey</surname> <given-names>R</given-names></string-name>, <string-name><surname>Maurya</surname> <given-names>N</given-names></string-name>, <string-name><surname>Maurya</surname> <given-names>P</given-names></string-name>, <string-name><surname>Saxena</surname> <given-names>P</given-names></string-name></person-group>. <article-title>Predictive approach for Autism detection using computer vision and deep learning</article-title>. In: <conf-name>2024 MIT Art, Design and Technology School of Computing International Conference (MITADTSoCiCon)</conf-name>. <publisher-loc>Piscataway, NJ, USA</publisher-loc>: <publisher-name>IEEE</publisher-name>; <year>2024</year>. p. <fpage>1</fpage>&#x2013;<lpage>6</lpage>. [cited 2025 May 19]. Available from: <ext-link ext-link-type="uri" xlink:href="https://ieeexplore.ieee.org/document/10575142">https://ieeexplore.ieee.org/document/10575142</ext-link>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Loganathan</surname> <given-names>S</given-names></string-name>, <string-name><surname>Geetha</surname> <given-names>C</given-names></string-name>, <string-name><surname>Nazaren</surname> <given-names>AR</given-names></string-name>, <string-name><surname>Fernandez</surname> <given-names>MHF</given-names></string-name></person-group>. <article-title>Autism spectrum disorder detection and classification using chaotic optimization based Bi-GRU network: an weighted average ensemble model</article-title>. <source>Expert Syst Appl</source>. <year>2023</year>;<volume>230</volume>(<issue>1</issue>):<fpage>120613</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.eswa.2023.120613</pub-id>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Thabtah</surname> <given-names>F</given-names></string-name></person-group>. <article-title>Autism screening data for toddlers</article-title>. <year>2018</year> <comment>[cited 2025 May 19]</comment>. Available from: <ext-link ext-link-type="uri" xlink:href="https://www.kaggle.com/datasets/fabdelja/autism-screening-for-toddlers">https://www.kaggle.com/datasets/fabdelja/autism-screening-for-toddlers</ext-link>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Thabtah</surname> <given-names>F</given-names></string-name></person-group>. <article-title>Autistic spectrum disorder screening data for children</article-title>. <source>UCI Machine Learning Repository</source>; <year>2017</year>. doi:<pub-id pub-id-type="doi">10.24432/C5659W</pub-id>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Thabtah</surname> <given-names>F</given-names></string-name></person-group>. <article-title>Autism screening adult</article-title>. <source>UCI Machine Learning Repository</source>; <year>2017</year>. doi:<pub-id pub-id-type="doi">10.24432/C5F019</pub-id>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Thabtah</surname> <given-names>F</given-names></string-name></person-group>. <article-title>Autistic spectrum disorder screening data for adolescent</article-title>. <source>UCI Machine Learning Repository</source>; <year>2017</year>. doi:<pub-id pub-id-type="doi">10.24432/C5V89T</pub-id>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Karatas</surname> <given-names>G</given-names></string-name>, <string-name><surname>Demir</surname> <given-names>O</given-names></string-name>, <string-name><surname>Sahingoz</surname> <given-names>OK</given-names></string-name></person-group>. <article-title>Increasing the performance of machine learning-based IDSs on an imbalanced and up-to-date dataset</article-title>. <source>IEEE Access</source>. <year>2020</year>;<volume>8</volume>:<fpage>32150</fpage>&#x2013;<lpage>62</lpage>. doi:<pub-id pub-id-type="doi">10.1109/access.2020.2973219</pub-id>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Xu</surname> <given-names>D</given-names></string-name>, <string-name><surname>Tian</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>A comprehensive survey of clustering algorithms</article-title>. <source>Ann Data Sci</source>. <year>2015</year>;<volume>2</volume>:<fpage>165</fpage>&#x2013;<lpage>93</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s40745-015-0040-1</pub-id>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Rodriguez</surname> <given-names>MZ</given-names></string-name>, <string-name><surname>Comin</surname> <given-names>CH</given-names></string-name>, <string-name><surname>Casanova</surname> <given-names>D</given-names></string-name>, <string-name><surname>Bruno</surname> <given-names>OM</given-names></string-name>, <string-name><surname>Amancio</surname> <given-names>DR</given-names></string-name>, <string-name><surname>Costa</surname> <given-names>LdF</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Clustering algorithms: a comparative approach</article-title>. <source>PLoS One</source>. <year>2019</year>;<volume>14</volume>(<issue>1</issue>):<fpage>e0210236</fpage>. doi:<pub-id pub-id-type="doi">10.1371/journal.pone.0210236</pub-id>; <pub-id pub-id-type="pmid">30645617</pub-id></mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Hussain</surname> <given-names>I</given-names></string-name>, <string-name><surname>Nataliani</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Ali</surname> <given-names>M</given-names></string-name>, <string-name><surname>Hussain</surname> <given-names>A</given-names></string-name>, <string-name><surname>Mujlid</surname> <given-names>HM</given-names></string-name>, <string-name><surname>Almaliki</surname> <given-names>FA</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Weighted multiview K-means clustering with L2 regularization</article-title>. <source>Symmetry</source>. <year>2024</year>;<volume>16</volume>(<issue>12</issue>):<fpage>1646</fpage>. doi:<pub-id pub-id-type="doi">10.3390/sym16121646</pub-id>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Yang</surname> <given-names>MS</given-names></string-name>, <string-name><surname>Hussain</surname> <given-names>I</given-names></string-name></person-group>. <article-title>Unsupervised multi-view K-means clustering algorithm</article-title>. <source>IEEE Access</source>. <year>2023</year>;<volume>11</volume>(<issue>6</issue>):<fpage>13574</fpage>&#x2013;<lpage>93</lpage>. doi:<pub-id pub-id-type="doi">10.1109/access.2023.3243133</pub-id>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Das</surname> <given-names>K</given-names></string-name>, <string-name><surname>Behera</surname> <given-names>RN</given-names></string-name></person-group>. <article-title>A survey on machine learning: concept, algorithms and applications</article-title>. <source>Int J Innovat Res Comput Commun Eng</source>. <year>2017</year>;<volume>5</volume>(<issue>2</issue>):<fpage>1301</fpage>&#x2013;<lpage>9</lpage>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Alzubi</surname> <given-names>J</given-names></string-name>, <string-name><surname>Nayyar</surname> <given-names>A</given-names></string-name>, <string-name><surname>Kumar</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Machine learning from theory to algorithms: an overview</article-title>. In: <conf-name>Journal of Physics: Conference Series</conf-name>. <publisher-loc>Bangalore, India</publisher-loc>: <publisher-name>IOP Publishing</publisher-name>; <year>2018</year>. Vol. <volume>1142</volume>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Geurts</surname> <given-names>P</given-names></string-name>, <string-name><surname>Ernst</surname> <given-names>D</given-names></string-name>, <string-name><surname>Wehenkel</surname> <given-names>L</given-names></string-name></person-group>. <article-title>Extremely randomized trees</article-title>. <source>Mach Learn</source>. <year>2006</year>;<volume>63</volume>(<issue>1</issue>):<fpage>3</fpage>&#x2013;<lpage>42</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s10994-006-6226-1</pub-id>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Gibbs</surname> <given-names>MN</given-names></string-name>, <string-name><surname>MacKay</surname> <given-names>DJ</given-names></string-name></person-group>. <article-title>Variational Gaussian process classifiers</article-title>. <source>IEEE Transact Neural Netw</source>. <year>2000</year>;<volume>11</volume>(<issue>6</issue>):<fpage>1458</fpage>&#x2013;<lpage>64</lpage>. doi:<pub-id pub-id-type="doi">10.1109/72.883477</pub-id>; <pub-id pub-id-type="pmid">18249869</pub-id></mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Tang</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>Deep learning using linear support vector machines</article-title>. <comment>arXiv:1306.0239. 2013</comment>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Singh</surname> <given-names>A</given-names></string-name>, <string-name><surname>Prakash</surname> <given-names>BS</given-names></string-name>, <string-name><surname>Chandrasekaran</surname> <given-names>K</given-names></string-name></person-group>. <article-title>A comparison of linear discriminant analysis and ridge classifier on Twitter data</article-title>. In: <conf-name>2016 International Conference on Computing, Communication and Automation (ICCCA)</conf-name>. <publisher-loc>Piscataway, NJ, USA</publisher-loc>: <publisher-name>IEEE</publisher-name>; <year>2016</year>. p. <fpage>133</fpage>&#x2013;<lpage>8</lpage>. [cited 2025 May 19]. Available from: <ext-link ext-link-type="uri" xlink:href="https://ieeexplore.ieee.org/document/7813704">https://ieeexplore.ieee.org/document/7813704</ext-link>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Amari</surname> <given-names>Si</given-names></string-name></person-group>. <article-title>Backpropagation and stochastic gradient descent method</article-title>. <source>Neurocomputing</source>. <year>1993</year>;<volume>5</volume>(<issue>4&#x2013;5</issue>):<fpage>185</fpage>&#x2013;<lpage>96</lpage>. doi:<pub-id pub-id-type="doi">10.1016/0925-2312(93)90006-o</pub-id>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Thabtah</surname> <given-names>F</given-names></string-name>, <string-name><surname>Peebles</surname> <given-names>D</given-names></string-name></person-group>. <article-title>A new machine learning model based on induction of rules for autism detection</article-title>. <source>Health Inform J</source>. <year>2020</year>;<volume>26</volume>(<issue>1</issue>):<fpage>264</fpage>&#x2013;<lpage>86</lpage>. doi:<pub-id pub-id-type="doi">10.1177/1460458218824711</pub-id>; <pub-id pub-id-type="pmid">30693818</pub-id></mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Akter</surname> <given-names>T</given-names></string-name>, <string-name><surname>Satu</surname> <given-names>MS</given-names></string-name>, <string-name><surname>Khan</surname> <given-names>MI</given-names></string-name>, <string-name><surname>Ali</surname> <given-names>MH</given-names></string-name>, <string-name><surname>Uddin</surname> <given-names>S</given-names></string-name>, <string-name><surname>Lio</surname> <given-names>P</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Machine learning-based models for early stage detection of autism spectrum disorders</article-title>. <source>IEEE Access</source>. <year>2019</year>;<volume>7</volume>:<fpage>166509</fpage>&#x2013;<lpage>27</lpage>. doi:<pub-id pub-id-type="doi">10.1109/access.2019.2952609</pub-id>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Raj</surname> <given-names>S</given-names></string-name>, <string-name><surname>Masood</surname> <given-names>S</given-names></string-name></person-group>. <article-title>Analysis and detection of autism spectrum disorder using machine learning techniques</article-title>. <source>Procedia Comput Sci</source>. <year>2020</year>;<volume>167</volume>(<issue>12</issue>):<fpage>994</fpage>&#x2013;<lpage>1004</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.procs.2020.03.399</pub-id>.</mixed-citation></ref>
</ref-list>
</back></article>