<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">64872</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2025.064872</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>FSFS: A Novel Statistical Approach for Fair and Trustworthy Impactful Feature Selection in Artificial Intelligence Models</article-title>
<alt-title alt-title-type="left-running-head">FSFS: A Novel Statistical Approach for Fair and Trustworthy Impactful Feature Selection in Artificial Intelligence Models</alt-title>
<alt-title alt-title-type="right-running-head">FSFS: A Novel Statistical Approach for Fair and Trustworthy Impactful Feature Selection in Artificial Intelligence Models</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Farea</surname><given-names>Ali Hamid</given-names></name><xref ref-type="aff" rid="aff-1">1</xref><email>Ahsfarea@ankara.edu.tr</email></contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Askerzade</surname><given-names>Iman</given-names></name><xref ref-type="aff" rid="aff-1">1</xref><xref ref-type="aff" rid="aff-2">2</xref></contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western"><surname>Alhazmi</surname><given-names>Omar H.</given-names></name><xref ref-type="aff" rid="aff-3">3</xref></contrib>
<contrib id="author-4" contrib-type="author">
<name name-style="western"><surname>Takan</surname><given-names>Sava&#x015F;</given-names></name><xref ref-type="aff" rid="aff-4">4</xref></contrib>
<aff id="aff-1"><label>1</label><institution>Department of Computer Engineering, Ankara University</institution>, <addr-line>Ankara, 06830, Türkiye</addr-line></aff>
<aff id="aff-2"><label>2</label><institution>Center for Theoretical Physics, Khazar University</institution>, <addr-line>Baku, AZ1096</addr-line>, <country>Azerbaijan</country></aff>
<aff id="aff-3"><label>3</label><institution>Department of Cyber Security, Taibah University</institution>, <addr-line>Medina, 42353</addr-line>, <country>Saudi Arabia</country></aff>
<aff id="aff-4"><label>4</label><institution>Department of Artificial Intelligence, Ankara University</institution>, <addr-line>Ankara, 06830, Türkiye</addr-line></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Ali Hamid Farea. Email: <email>Ahsfarea@ankara.edu.tr</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2025</year>
</pub-date>
<pub-date date-type="pub" publication-format="electronic">
<day>09</day><month>06</month><year>2025</year>
</pub-date>
<volume>84</volume>
<issue>1</issue>
<fpage>1457</fpage>
<lpage>1484</lpage>
<history>
<date date-type="received">
<day>26</day>
<month>2</month>
<year>2025</year>
</date>
<date date-type="accepted">
<day>08</day>
<month>5</month>
<year>2025</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2025 The Authors.</copyright-statement>
<copyright-year>2025</copyright-year>
<copyright-holder>Published by Tech Science Press.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_64872.pdf"></self-uri>
<abstract>
<p>Feature selection (FS) is a pivotal pre-processing step in developing data-driven models, influencing reliability, performance and optimization. Although existing FS techniques can yield high-performance metrics for certain models, they do not invariably guarantee the extraction of the most critical or impactful features. Prior literature underscores the significance of equitable FS practices and has proposed diverse methodologies for the identification of appropriate features. However, the challenge of discerning the most relevant and influential features persists, particularly in the context of the exponential growth and heterogeneity of big data&#x2014;a challenge that is increasingly salient in modern artificial intelligence (AI) applications. In response, this study introduces an innovative, automated statistical method termed Farea Similarity for Feature Selection (FSFS). The FSFS approach computes a similarity metric for each feature by benchmarking it against the record-wise mean, thereby finding feature dependencies and mitigating the influence of outliers that could potentially distort evaluation outcomes. Features are subsequently ranked according to their similarity scores, with the threshold established at the average similarity score. Notably, lower FSFS values indicate higher similarity and stronger data correlations, whereas higher values suggest lower similarity. The FSFS method is designed not only to yield reliable evaluation metrics but also to reduce data complexity without compromising model performance. Comparative analyses were performed against several established techniques, including Chi-squared (CS), Correlation Coefficient (CC), Genetic Algorithm (GA), Exhaustive Approach, Greedy Stepwise Approach, Gain Ratio, and Filtered Subset Eval, using a variety of datasets such as the Experimental Dataset, Breast Cancer Wisconsin (Original), KDD CUP 1999, NSL-KDD, UNSW-NB15, and Edge-IIoT. 
In the absence of the FSFS method, the highest classifier accuracies observed were 60.00%, 95.13%, 97.02%, 98.17%, 95.86%, and 94.62% for the respective datasets. When the FSFS technique was integrated with data normalization, encoding, balancing, and feature importance selection processes, accuracies improved to 100.00%, 97.81%, 98.63%, 98.94%, 94.27%, and 98.46%, respectively. The FSFS method, with a computational complexity of O(<italic>f</italic><sub><italic>n</italic></sub> log <italic>n</italic>), demonstrates robust scalability and is well-suited for datasets of large size, ensuring efficient processing even when the number of features is substantial. By automatically eliminating outliers and redundant data, FSFS reduces computational overhead, resulting in faster training and improved model performance. Overall, the FSFS framework not only optimizes performance but also enhances the interpretability and explainability of data-driven models, thereby facilitating more trustworthy decision-making in AI applications.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Artificial intelligence</kwd>
<kwd>big data</kwd>
<kwd>feature selection</kwd>
<kwd>FSFS</kwd>
<kwd>model trustworthiness</kwd>
<kwd>similarity-based feature ranking</kwd>
<kwd>explainable artificial intelligence (XAI)</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>FS is a crucial aspect of machine learning and deep learning methodologies, significantly impacting model accuracy and reliability [<xref ref-type="bibr" rid="ref-1">1</xref>]. As data size and complexity grow in the era of big data, models face challenges in managing large numbers of features. This often leads to overfitting, poor generalization, and inflated computational costs due to redundant or irrelevant features [<xref ref-type="bibr" rid="ref-2">2</xref>]. FS aims to address these issues by reducing data dimensionality, ensuring that only the most relevant features are used for training, thereby enhancing model performance and evaluation reliability [<xref ref-type="bibr" rid="ref-3">3</xref>]. Researchers and practitioners alike struggle with selecting the most influential features, a task further complicated by the massive and varied nature of modern datasets [<xref ref-type="bibr" rid="ref-4">4</xref>]. The failure to identify key features can result in inaccurate models, negatively affecting the decisions based on them. Moreover, irrelevant features increase training time, model complexity, and computational costs. Consequently, FS techniques have evolved to improve model performance and efficiency by reducing unnecessary data noise. While optimization-focused feature selection can yield models with high predictive accuracy, it often does so at the expense of a deeper understanding of feature relevance and impact. A more balanced approach, one that combines performance metrics with direct measures of feature importance and domain insight, can lead to models that are not only accurate but also more interpretable and robust.</p>
<p>Feature selection (FS) techniques encompass a broad spectrum of methodologies, ranging from traditional statistical analyses to advanced machine learning approaches. Although machine learning methods can operate with high efficiency, they often function as black boxes in the context of feature selection, thereby limiting interpretability and potentially compromising reliability [<xref ref-type="bibr" rid="ref-5">5</xref>]. Commonly implemented FS techniques include filter methods (e.g., CS, ANOVA (Analysis of Variance)), wrapper methods (e.g., GA), and embedded methods (e.g., Lasso regression) [<xref ref-type="bibr" rid="ref-5">5</xref>]. The primary objectives of these approaches are to accelerate training processes and enhance predictive accuracy. However, methods that achieve high accuracy by arbitrarily selecting features may sacrifice the reliability of results. In big data environments, effective FS is crucial not only for improving model performance but also for reducing training times and computational costs [<xref ref-type="bibr" rid="ref-6">6</xref>]. In applications where precision is of utmost importance, such as finance and security, FS plays a vital role in ensuring accurate and dependable outcomes. Research in this area continues to produce new methodologies that address the increasing complexity of contemporary datasets, thereby facilitating the development of more robust and interpretable models [<xref ref-type="bibr" rid="ref-7">7</xref>]. Filter methods, in particular, are widely favored due to their strong statistical foundation, which allows for rapid data interpretation and efficient filtering. Nonetheless, traditional filter approaches typically do not account for interactions among dependent variables.</p>
<p>To mitigate this shortcoming, there is a pressing need for novel techniques that simultaneously balance accuracy and reliability by integrating similarity metrics. In response, we propose a new filter-based approach that not only interprets the data but also identifies features with significant impact on model outcomes. Unlike conventional techniques that assess each feature in isolation, our method computes a similarity metric that reflects both the intrinsic importance of individual features and their relationships within the overall data structure, thereby providing a more comprehensive and holistic feature selection process.</p>
<p>This manuscript presents a novel statistical method called Farea Similarity for Feature Selection (FSFS), designed to automatically select the features with the greatest impact on model outcomes without losing essential data. The FSFS technique measures the similarity between each feature and the approximate average of records, ranking features according to the highest similarity scores. The proposed automatic threshold classifies features as either important or less important by calculating the mean of the total similarities. The most similar features are considered the most important, while the least similar ones are discarded. The FSFS approach also eliminates outlier values that negatively affect model outcomes. FSFS incorporates data encoding techniques that transform raw data into structured numerical form, making it machine-readable for AI models. After data structuring, the proposed method performs statistical operations and calculates the similarity and correlation between features and records. FSFS stands out from traditional statistical feature selection methods because it does not merely rank features based on isolated metrics (such as correlation or mutual information) but rather quantifies the similarity between each feature and the overall data pattern. In conventional filter methods, each feature is evaluated independently, often overlooking how features work together.</p>
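<p>The workflow described above can be made concrete with a minimal Python sketch: score each feature against the record-wise average, rank features by score, and apply the automatic threshold (the mean of all scores), where a lower score indicates higher similarity. The exact FSFS similarity formula is not reproduced in this excerpt; the mean absolute deviation used below, and the function and variable names, are illustrative assumptions only.</p>

```python
import numpy as np

def fsfs_select(X):
    """Illustrative sketch of the FSFS idea: score each feature by its
    similarity to the record-wise mean, rank features by score, and keep
    those whose score falls below the mean score (lower = more similar).
    The similarity metric here (mean absolute deviation from the record
    mean) is an assumption, not the published FSFS formula."""
    X = np.asarray(X, dtype=float)
    record_mean = X.mean(axis=1)                       # approximate average of each record
    scores = np.abs(X - record_mean[:, None]).mean(axis=0)
    threshold = scores.mean()                          # automatic threshold: mean of all scores
    selected = np.where(scores <= threshold)[0]        # most similar features are kept
    ranking = np.argsort(scores)                       # lowest score (highest similarity) first
    return selected, ranking, scores, threshold
```

<p>On a toy matrix with one outlying feature, the outlier receives a score above the automatic threshold and is discarded, matching the behavior the text describes.</p>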
<p>The scientific contributions of this paper are as follows:
<list list-type="bullet">
<list-item>
<p>Proposing a new, innovative statistical method for selecting the most important features called Farea Similarity for Feature Selection (FSFS).</p></list-item>
<list-item>
<p>Developing a fair FSFS approach that ranks features by their similarity to the approximate average of each record, capturing feature dependencies and discarding outliers that distort evaluation outcomes.</p></list-item>
<list-item>
<p>Classifying features into most and least important categories based on an FSFS automatic threshold.</p></list-item>
<list-item>
<p>Comparing the proposed FSFS method with existing FS methods and evaluating their performance.</p></list-item>
</list></p>
<p>This paper is organized as follows: <xref ref-type="sec" rid="s2">Section 2</xref> provides a review of related work, contextualizing the contributions of the proposed FSFS framework against existing methods. <xref ref-type="sec" rid="s3">Section 3</xref> outlines the methodology, detailing the FSFS architecture, step-by-step pseudocode, and practical implementation examples. <xref ref-type="sec" rid="s4">Section 4</xref> conducts a comprehensive statistical analysis comparing FSFS to state-of-the-art approaches, supported by rigorous experimental results and performance evaluations. Finally, <xref ref-type="sec" rid="s5">Section 5</xref> summarizes the key findings, discusses current limitations, and proposes actionable insights for future research directions.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Comparative Analysis of Feature Selection Approaches</title>
<p>The body of literature on FS provides a crucial foundation for understanding the available tools and techniques for selecting optimal features and improving model performance and evaluation results. However, with the continuous advancement in this field, challenges remain, such as ensuring fair and adaptive FS in response to evolving and diverse datasets. Challenges include handling imbalanced data and noise, as well as variations in data types and model training approaches. Existing studies highlight the need for developing new methods that address these challenges, opening new horizons for solving future problems as technologies and data representation methods evolve.</p>
<p>This section outlines some of the key FS techniques. <bold>Filter techniques:</bold> Filter techniques are among the oldest methods used in FS [<xref ref-type="bibr" rid="ref-8">8</xref>]. Studies such as [<xref ref-type="bibr" rid="ref-9">9</xref>] emphasized the importance of statistical methods in evaluating the relationship between features and target variables. Researchers employed tests like the Chi-square test and ANOVA to identify the most relevant features [<xref ref-type="bibr" rid="ref-10">10</xref>]. These studies demonstrated that using filter techniques can significantly reduce the number of features without losing critical information. <bold>Wrapper methods:</bold> Wrapper methods are more complex, relying on evaluating model performance with a specific subset of features. In the study by [<xref ref-type="bibr" rid="ref-11">11</xref>], the concept of eliminating unnecessary features was introduced through Recursive Feature Elimination (RFE) [<xref ref-type="bibr" rid="ref-12">12</xref>]. Their results indicated that this approach significantly improved model accuracy compared to traditional methods. <bold>Embedded methods:</bold> Embedded methods, which combine the benefits of both filter and wrapper approaches, are gaining increasing popularity. In the study [<xref ref-type="bibr" rid="ref-13">13</xref>] on Lasso Regression, the regression technique was used to strike a balance between model complexity and accuracy by imposing constraints on the coefficients. The results showed that Lasso could lead to effective FS while reducing overfitting. <bold>Recent innovations:</bold> With the development of machine and deep learning techniques, new studies have emerged that utilize deep learning for FS. Study [<xref ref-type="bibr" rid="ref-14">14</xref>] proposes a metaheuristic method for selecting optimal features in HER2 (Human epidermal growth factor receptor 2) image classification, enhancing accuracy and reducing complexity. 
It utilizes a transfer learning model combined with NSGA-II (non-dominated sorting genetic algorithm) and SVM (support vector machine) classifiers for improved performance. Study [<xref ref-type="bibr" rid="ref-15">15</xref>] focuses on enhancing phishing detection using machine learning techniques, particularly through feature selection and deep learning models. A dataset comprising 58,645 URLs was analyzed, identifying 111 features. A feedforward neural network model achieved an accuracy of 94.46% using only 14 selected features. <xref ref-type="table" rid="table-1">Table 1</xref> provides a summary of existing feature selection (FS) methods, highlighting their key characteristics and differences. <xref ref-type="table" rid="table-2">Table 2</xref> illustrates widely used FS approaches and compares them with the proposed FSFS theory. Previous studies, as summarized in <xref ref-type="table" rid="table-1">Tables 1</xref> and <xref ref-type="table" rid="table-2">2</xref>, have played a fundamental role in improving machine learning model performance. Some focus on enhancing system performance, others on optimizing evaluation metrics, while certain methods emphasize selecting noise-free features to improve model interpretability. Existing approaches often assume that the features leading to high performance are the most suitable without considering their significance. As a result, less important features that enhance system performance may be selected over more impactful ones. On the other hand, some methods focus solely on achieving high evaluation results, disregarding the importance of the selected features. These methods aim to balance feature selection and optimization, often sacrificing the more critical features that could increase the reliability of the results. 
Hence, techniques designed to improve system performance may not necessarily select the most important features, as their goal is to maximize performance and speed without emphasizing feature relevance. Similarly, techniques aimed at maximizing evaluation results may not prioritize the most crucial features, as their objective is to find features that yield high results, regardless of their impact on model outputs. These two families therefore differ chiefly in what they optimize: the former selects features to maximize the model&#x2019;s speed and efficiency, the latter to maximize its reported evaluation metrics.</p>
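<p>The filter family discussed above can be illustrated with two of the simplest criteria, variance thresholding and Pearson correlation with the target, implemented from first principles in Python. The function names and thresholds below are illustrative and do not correspond to any cited method&#x2019;s reference implementation.</p>

```python
import numpy as np

def variance_filter(X, threshold):
    """Filter method: keep features whose variance exceeds a threshold,
    following sigma^2 = mean((x - mu)^2)."""
    var = np.var(np.asarray(X, dtype=float), axis=0)
    return np.where(var > threshold)[0]

def correlation_filter(X, y, min_abs_r):
    """Filter method: keep features whose absolute Pearson correlation
    with the target y exceeds min_abs_r. Zero-variance features yield
    NaN correlations and are excluded."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    xc = X - X.mean(axis=0)
    yc = y - y.mean()
    with np.errstate(invalid="ignore", divide="ignore"):
        r = (xc * yc[:, None]).sum(axis=0) / (
            np.sqrt((xc ** 2).sum(axis=0)) * np.sqrt((yc ** 2).sum()))
    return np.where(np.abs(r) > min_abs_r)[0]
```

<p>Both criteria evaluate each feature independently of the others, which is exactly the limitation of traditional filter methods that the proposed FSFS approach seeks to address.</p>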
<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>General types of existing FS methods compared with the proposed FSFS method</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col align="center"/>
</colgroup>
<thead>
<tr align="center">
<th>FS Approaches</th>
<th>Speed</th>
<th>Scalability</th>
<th>Interpretability</th>
<th>Pros/Cons</th>
</tr>
</thead>
<tbody>
<tr align="center">
<td>Filter [<xref ref-type="bibr" rid="ref-7">7</xref>,<xref ref-type="bibr" rid="ref-16">16</xref>,<xref ref-type="bibr" rid="ref-17">17</xref>]</td>
<td>High</td>
<td>High</td>
<td>High</td>
<td>Simple, fast, and independent of the model. Cons: May ignore feature dependencies</td>
</tr>
<tr align="center">
<td>Wrapper [<xref ref-type="bibr" rid="ref-18">18</xref>,<xref ref-type="bibr" rid="ref-19">19</xref>]</td>
<td>Low</td>
<td>Low</td>
<td>Moderate</td>
<td>Can find optimal features for specific models. Cons: Computationally expensive, risk of overfitting</td>
</tr>
<tr align="center">
<td>Embedded [<xref ref-type="bibr" rid="ref-20">20</xref>,<xref ref-type="bibr" rid="ref-21">21</xref>]</td>
<td>Moderate</td>
<td>Moderate</td>
<td>Moderate</td>
<td>Integrates FS during model training. Cons: Model-dependent and complex</td>
</tr>
<tr align="center">
<td>Dimensionality [<xref ref-type="bibr" rid="ref-22">22</xref>,<xref ref-type="bibr" rid="ref-23">23</xref>]</td>
<td>Moderate</td>
<td>Moderate</td>
<td>Low</td>
<td>Reduces feature space effectively. Cons: Loss of interpretability and information</td>
</tr>
<tr align="center">
<td>Regularization [<xref ref-type="bibr" rid="ref-19">19</xref>,<xref ref-type="bibr" rid="ref-24">24</xref>]</td>
<td>Moderate</td>
<td>Moderate</td>
<td>Moderate</td>
<td>Prevents overfitting and simplifies models. Cons: May exclude useful features</td>
</tr>
<tr align="center">
<td><bold>This study (FSFS)</bold></td>
<td align="center" colspan="4"><bold>FSFS focuses not only on FS but also on interpretability, serving as a gateway approach for XAI and emphasizing speed. Therefore, the type of FSFS theory proposed in this study is filter-based</bold></td>
</tr>
</tbody>
</table>
</table-wrap><table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>Popular and practical FS approaches compared with the proposed FSFS approach</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr align="center">
<th>FS Methods/Ref.</th>
<th>Type</th>
<th align="center">Mathematical Equations/<break/>Concepts</th>
<th>Selection Criteria/Key Details</th>
</tr>
</thead>
<tbody>
<tr align="center">
<td>Variance threshold [<xref ref-type="bibr" rid="ref-11">11</xref>,<xref ref-type="bibr" rid="ref-25">25</xref>]</td>
<td>Filter</td>
<td><inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:mi>&#x03C3;</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mn>1</mml:mn><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03BC;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></td>
<td>Simplify the model, reduce noise/Pros: Simple, fast, easy to implement. Cons: May discard useful low-variance features.</td>
</tr>
<tr align="center">
<td>Correlation-based [<xref ref-type="bibr" rid="ref-26">26</xref>,<xref ref-type="bibr" rid="ref-27">27</xref>]</td>
<td>Filter</td>
<td><inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:mi>r</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mrow><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:mover><mml:mi>x</mml:mi><mml:mo accent="false">&#x00AF;</mml:mo></mml:mover><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:mover><mml:mi>y</mml:mi><mml:mo accent="false">&#x00AF;</mml:mo></mml:mover><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:msqrt><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:mover><mml:mi>x</mml:mi><mml:mo accent="false">&#x00AF;</mml:mo></mml:mover><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mover><mml:mi>y</mml:mi><mml:mo accent="false">&#x00AF;</mml:mo></mml:mover><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:msqrt></mml:mfrac></mml:mstyle></mml:math></inline-formula></td>
<td>Maximize relevance, and reduce redundancy/Pros: Simple, interpretable, computationally efficient. Cons: May miss complex relationships, and cannot detect non-linear correlations.</td>
</tr>
<tr align="center">
<td>Mutual Information (MI) [<xref ref-type="bibr" rid="ref-28">28</xref>,<xref ref-type="bibr" rid="ref-29">29</xref>]</td>
<td>Filter</td>
<td><inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:mi>I</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>X</mml:mi><mml:mo>;</mml:mo><mml:mi>Y</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mo>&#x2211;</mml:mo><mml:mo>&#x2211;</mml:mo><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mrow><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula></td>
<td>Maximize shared information/Pros: Detects non-linear relationships, and works with categorical data. Cons: Computationally expensive for large datasets.</td>
</tr>
<tr align="center">
<td>Chi-Square test [<xref ref-type="bibr" rid="ref-30">30</xref>,<xref ref-type="bibr" rid="ref-31">31</xref>]</td>
<td>Filter</td>
<td><inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mo>&#x2211;</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msub><mml:mi>E</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:msub><mml:mi>E</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mfrac></mml:mstyle></mml:math></inline-formula></td>
<td>Maximize dependency between features and target/Pros: Useful for categorical data, fast to compute.</td>
</tr>
<tr align="center">
<td>RFE [<xref ref-type="bibr" rid="ref-12">12</xref>,<xref ref-type="bibr" rid="ref-32">32</xref>]</td>
<td>Wrapper</td>
<td>Iterative selection process</td>
<td>Find the optimal subset of features/Pros: Select features based on real model performance. Cons: Risk of removing essential classification features.</td>
</tr>
<tr align="center">
<td>GA [<xref ref-type="bibr" rid="ref-33">33</xref>,<xref ref-type="bibr" rid="ref-34">34</xref>]</td>
<td>Wrapper</td>
<td>Evolutionary algorithms, fitness function</td>
<td>Optimize FS using population evolution/Pros: Finds global optima, works for non-linear problems. Cons: Computationally expensive, requires tuning.</td>
</tr>
<tr align="center">
<td>L1 regularization [<xref ref-type="bibr" rid="ref-13">13</xref>,<xref ref-type="bibr" rid="ref-35">35</xref>]</td>
<td>Embedded</td>
<td><inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:mi>M</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mn>2</mml:mn><mml:mi>n</mml:mi></mml:mrow></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mover><mml:mi>y</mml:mi><mml:mo accent="false">&#x00AF;</mml:mo></mml:mover><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:mi>&#x03B3;</mml:mi><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:munderover><mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mi>B</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo></mml:mrow></mml:math></inline-formula></td>
<td>Less important feature coefficients to zero/Pros: Reduces overfitting, and promotes sparsity.</td>
</tr>
<tr align="center">
<td>Ridge (L2 Regularization) [<xref ref-type="bibr" rid="ref-36">36</xref>,<xref ref-type="bibr" rid="ref-37">37</xref>]</td>
<td>Embedded</td>
<td><inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:mi>M</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mn>2</mml:mn><mml:mi>n</mml:mi></mml:mrow></mml:mfrac></mml:mstyle><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mover><mml:mi>y</mml:mi><mml:mo accent="false">&#x00AF;</mml:mo></mml:mover><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:mi>&#x03B3;</mml:mi><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:munderover><mml:msup><mml:mrow><mml:msub><mml:mi>B</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></td>
<td>Reduces overfitting/Pros: Prevents overfitting, and works with many features. Cons: Does not perform strict FS, and retains all features.</td>
</tr>
<tr align="center">
<td>Elastic net [<xref ref-type="bibr" rid="ref-35">35</xref>,<xref ref-type="bibr" rid="ref-38">38</xref>]</td>
<td>Embedded</td>
<td><inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:mi>&#x03B1;</mml:mi><mml:mi>L</mml:mi><mml:mn>1</mml:mn><mml:mo>+</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03B1;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>L</mml:mi><mml:mn>2</mml:mn></mml:math></inline-formula></td>
<td>Balances FS and regularization/Pros: Balances feature selection and regularization. Cons: More complex to tune due to two parameters.</td>
</tr>
<tr align="center">
<td>Decision trees [<xref ref-type="bibr" rid="ref-31">31</xref>,<xref ref-type="bibr" rid="ref-39">39</xref>]</td>
<td>Embedded</td>
<td>Decision nodes and Gini/Entropy impurity</td>
<td>Selects important features based on splits/Pros: Interpretable, handles categorical and continuous data. Cons: Can overfit, biased towards features with many distinct values.</td>
</tr>
<tr align="center">
<td>Forward selection [<xref ref-type="bibr" rid="ref-40">40</xref>,<xref ref-type="bibr" rid="ref-41">41</xref>]</td>
<td>Wrapper</td>
<td>Stepwise selection process</td>
<td>Finds optimal subset of features/Pros: Intuitive, interpretable. Cons: Computationally expensive.</td>
</tr>
<tr align="center">
<td>Backward elimination [<xref ref-type="bibr" rid="ref-41">41</xref>,<xref ref-type="bibr" rid="ref-42">42</xref>]</td>
<td>Wrapper</td>
<td>Stepwise elimination process</td>
<td>Finds optimal subset of features/Pros: Intuitive, interpretable. Cons: Risk of overfitting.</td>
</tr>
<tr align="center">
<td>PCA [<xref ref-type="bibr" rid="ref-43">43</xref>,<xref ref-type="bibr" rid="ref-44">44</xref>]</td>
<td>Dimensionality</td>
<td>Eigenvalues, eigenvectors,<break/> covariance matrix</td>
<td>Reduces dimensionality while retaining variance/Pros: Reduces multicollinearity, useful for high-dimensional data. Cons: Principal components are harder to interpret than the original features.</td>
</tr>
<tr align="center">
<td>Sequential Feature Selection (SFS) [<xref ref-type="bibr" rid="ref-45">45</xref>,<xref ref-type="bibr" rid="ref-46">46</xref>]</td>
<td>Wrapper</td>
<td>Sequential process</td>
<td>Finds the best-performing feature subset/Pros: Flexible, works with many model types. Cons: Computationally expensive for large feature sets.</td>
</tr>
<tr align="center">
<td>Fisher score [<xref ref-type="bibr" rid="ref-47">47</xref>,<xref ref-type="bibr" rid="ref-48">48</xref>]</td>
<td>Filter</td>
<td><inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>&#x03BC;</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>&#x03BC;</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mrow><mml:msubsup><mml:mi>&#x03C3;</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>&#x002B;</mml:mo><mml:msubsup><mml:mi>&#x03C3;</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:mrow></mml:mfrac></mml:mstyle></mml:math></inline-formula></td>
<td>Maximize separability between classes/Pros: Simple and effective for classification tasks. Cons: Assumes normality and equal variance in data.</td>
</tr>
<tr align="center">
<td><bold>This study (FSFS)</bold></td>
<td><bold>Filter</bold></td>
<td><inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:mrow><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">F</mml:mi><mml:mi mathvariant="bold-italic">S</mml:mi><mml:mi mathvariant="bold-italic">F</mml:mi><mml:mi mathvariant="bold-italic">S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi mathvariant="bold-italic">i</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>|</mml:mo><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">F</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mrow><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">0</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">n</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mi mathvariant="bold-italic">R</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="bold-italic">M</mml:mi><mml:mi mathvariant="bold-italic">a</mml:mi><mml:mi mathvariant="bold-italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">M</mml:mi></mml:mrow></mml:msup><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi mathvariant="bold-italic">j</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mtext mathvariant="bold">1</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mi mathvariant="bold-italic">J</mml:mi><mml:mo>&#x003E;</mml:mo><mml:mi mathvariant="bold-italic">i</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">n</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mi mathvariant="bold-italic">R</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">n</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">r</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">r</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>]</mml:mo></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:math></inline-formula></td>
<td><bold>Identifies features that positively impact model outputs, removes outliers, reduces dimensionality, and improves overall performance/Pros: Easy to implement with an automatic threshold for FS. Cons: Suitable only for numerical data; can be adapted for categorical data using encoding approaches.</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The evaluation in <xref ref-type="table" rid="table-1">Tables 1</xref> and <xref ref-type="table" rid="table-2">2</xref> shows that the proposed FSFS method, like other filter-based approaches, relies on statistical measures to assess feature importance; its efficient, statistics-driven analysis of features makes it a strong foundation for explainable AI and for extracting meaningful evidence. Traditional filter-based methods evaluate features independently using metrics such as correlation or mutual information, enabling fast and computationally efficient feature ranking. However, this isolated evaluation often overlooks feature interactions and dependencies. For example, two features may each exhibit a weak individual correlation with the target variable, yet their combination may carry significant predictive power. While filter-based methods excel at rapid dimensionality reduction, their inability to capture such joint relationships can cause critical insights to be missed. By contrast, the FSFS framework accounts for feature interdependencies, enhancing robustness in scenarios where collaborative feature effects are pivotal and enabling it to identify complex patterns that conventional filter methods fail to detect.</p>

<p>FSFS is designed to automatically identify and select the most important and impactful features in data-driven models while preserving essential information. The technique measures the similarity between each feature and the mean of the records and ranks features by their similarity scores. To keep the selection objective, an automatic thresholding mechanism classifies features as important or less significant by comparing each feature against the average similarity score: features with higher similarity are retained, while those with lower similarity are discarded. FSFS also eliminates outlier values that could negatively affect model outputs. By selecting features that exhibit strong statistical relationships and high similarity, the framework promotes fairness in FS while maintaining reliable evaluation outcomes. Although FSFS is inherently optimized for numerical data, it can be adapted to categorical data through preprocessing techniques that map categorical variables to numerical representations.</p>
<p>This framework includes an automated ranking mechanism to identify the features with the highest statistical significance and the greatest impact on model trustworthiness. To enable precise feature selection, FSFS integrates data encoding techniques that transform raw datasets into structured numerical formats compatible with machine learning workflows. Once the data are structured, the proposed FSFS statistical metrics are applied to guide the selection process, ensuring robustness, reproducibility, and alignment with model objectives. The method emphasizes fair feature selection, prioritizing the most critical and correlated features that influence overall system outputs while achieving high performance and reliable, reasonable evaluation results. Moreover, FSFS improves model interpretability by offering insights into data features from multiple perspectives, increasing users&#x2019; confidence in the results.</p>
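<p>The encoding step described above can be sketched in Python as follows. This is a minimal illustration of the idea, not the authors&#x2019; exact scheme: the function name and the first-seen integer code assignment are assumptions.</p>

```python
def replacement_encode(column):
    """Replace each distinct categorical value with a stable integer code,
    preserving the column's length and structure (a sketch of the
    replacement-encoding idea; first-seen code order is an assumption)."""
    codes = {}
    encoded = []
    for value in column:
        if value not in codes:
            codes[value] = len(codes)  # assign codes 0, 1, 2, ... in order seen
        encoded.append(codes[value])
    return encoded, codes

encoded, mapping = replacement_encode(["low", "high", "low", "mid"])
```

<p>Because every occurrence of a category maps to the same code, the encoded column has the same number of rows and can feed directly into the FSFS statistical stage.</p>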
</sec>
<sec id="s3">
<label>3</label>
<title>Methodology</title>
<p>The significance of this approach lies in its ability to select the most similar and related features by calculating the ratio of similarity and dependency among features. This is achieved by comparing each feature&#x2019;s total with the average of each record to identify the minimum distance: the shortest distance indicates the highest similarity, and higher similarity corresponds to a stronger relationship between features. After normalization (data standardization and unification), the dataset values are transformed into a uniform range, such as 0&#x2013;1, 0&#x2013;100, or 100&#x2013;1000, depending on the dataset&#x2019;s scale. This normalization mitigates the effects of data anomalies and extreme values, and FSFS further eliminates extreme values after normalization, reducing their influence on model outputs. For illustration, consider a dataset containing information on patients with cancer and diabetes. If the focus is on cancer-related data, FSFS calculates feature similarity to prioritize the parameters most closely related to cancer while minimizing the influence of dissimilar features that are more associated with diabetes. This targeted approach enhances performance and yields more reliable results by leveraging the similarity and dependencies in data patterns. As displayed in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>, the proposed FSFS methodology calculates the approximate average of each record and subtracts it from the sum of each attribute, incorporating deep and intelligent scaling. The scaling process (e.g., max-value division scaling) suppresses values that negatively impact the output of data-driven models, ensuring that the target values in the testing data remain consistent across records while aligning with the corresponding target class (e.g., class X or Y). FSFS is further integrated with a replacement encoding technique, which preserves the dimensionality of the data while protecting the original categorical values. The replacement encoding mechanism transforms categorical data into numerical representations, facilitating seamless integration with AI models. The proposed method for selecting and classifying features into the most and least important consists of three stages. (1) Data structuring and formatting: the data are organized through replacement encoding into a numerical format, ensuring uniformity and suitability for FSFS statistical analysis while maintaining the data&#x2019;s dimensions and structure; datasets that are already numerical skip this encoding, whereas categorical datasets must be encoded to enable computation. (2) Statistical stage: the core of the proposed method, in which the similarity between each feature and the approximate average of all records is calculated to identify the most correlated and similar features. (3) Automatic threshold stage: the threshold, calculated as the average of the summed feature similarities, classifies the features; features whose scores fall below the threshold are more correlated and similar and are classified as most important, while features whose scores exceed the threshold are less correlated and are classified as less important, with less impact on model outcomes. <xref ref-type="fig" rid="fig-1">Fig. 1</xref> illustrates the workflow underpinning the proposed FSFS theory, along with the equations that define its theoretical framework. The output of the automated preprocessing is numerical data, although outlier values may still affect the performance of AI models. To address this, the dataset undergoes standard normalization tailored to its characteristics and the desired value ranges. Once normalized, the dataset&#x2014;with reduced outlier effects&#x2014;serves as input to the FSFS methodology, which performs its statistical computations by analyzing and removing outliers with the Min-Max (M) function and computing feature similarity from the feature sums and the approximate averages of the records. Finally, the method applies an automated optimal thresholding process, classifying features into the most important and least important categories.</p>
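<p>The normalization step above can be sketched as follows. This assumes the simplest form of max-value division scaling, dividing each value by the column maximum to reach the 0&#x2013;1 range; the function name is illustrative, and other target ranges mentioned above would use a different scale factor.</p>

```python
def max_scale(values):
    """Scale a numeric column into the 0-1 range by dividing by its maximum
    (one simple instance of max-value division scaling; an assumption about
    the exact scaling used in the FSFS pipeline)."""
    peak = max(values)
    return [v / peak for v in values] if peak else list(values)

scaled = max_scale([2.0, 5.0, 10.0])
```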
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>Workflow and equations underpinning the proposed FSFS theory</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_64872-fig-1.tif"/>
</fig>
<p>The proposed equation for feature selection consists of three components. The preliminary condition in <xref ref-type="disp-formula" rid="eqn-1">Eq. (1)</xref>, denoted (<inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>), calculates the total sum of each feature. <xref ref-type="disp-formula" rid="eqn-2">Eq. (2)</xref> computes the correlation and similarity ratio, capturing the interactions between each feature (attributes) and the average of each record (observations), to identify the most significant features with the greatest impact on the model&#x2019;s output. <xref ref-type="disp-formula" rid="eqn-3">Eq. (3)</xref> establishes the automatic threshold that distinguishes the most important features from those of lesser importance. The process begins by calculating the sum of each feature, which serves as an initial condition for the subsequent similarity calculation. Next, the similarity and correlation between each feature (the first part of <xref ref-type="disp-formula" rid="eqn-2">Eq. (2)</xref>) and the average of all records (the second part of <xref ref-type="disp-formula" rid="eqn-2">Eq. (2)</xref>) are determined, yielding the most important, highly correlated features that have the greatest influence on the model&#x2019;s outcomes.
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>}</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>|</mml:mo><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mfrac><mml:mrow><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:mi>M</mml:mi><mml:mi>a</mml:mi><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msup><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi>J</mml:mi><mml:mo>&#x003E;</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>m</mml:mi><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:mo>]</mml:mo></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p><inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> represents the sum of each feature, from <inline-formula id="ieqn-12"><mml:math id="mml-ieqn-12"><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>. The term (FSFS) refers to the Farea Similarity for Feature Selection, which calculates the correlation of each feature, given by the first part of the equation (<inline-formula id="ieqn-13"><mml:math id="mml-ieqn-13"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>), with the approximate average of each record, given by the second part of the equation <inline-formula id="ieqn-14"><mml:math id="mml-ieqn-14"><mml:mrow><mml:mo>[</mml:mo><mml:mfrac><mml:mrow><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:mi>M</mml:mi><mml:mi>a</mml:mi><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msup><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi>J</mml:mi><mml:mo>&#x003E;</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>m</mml:mi><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>. <inline-formula id="ieqn-15"><mml:math id="mml-ieqn-15"><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the total across all instances, and <inline-formula id="ieqn-16"><mml:math id="mml-ieqn-16"><mml:mi>M</mml:mi><mml:mi>a</mml:mi><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is used to eliminate outliers that negatively impact the model&#x2019;s results while accounting for multiple outliers. Here, <italic>M</italic> indicates the number of outliers, which may be single or multiple values depending on the <italic>M</italic> configuration. Using the Interquartile Range (IQR) method, the Max function automatically calculates the upper and lower boundaries that identify outliers. The minus sign represents the calculation of the minimum difference, which identifies the highest similarity and correlation between features. <inline-formula id="ieqn-17"><mml:math id="mml-ieqn-17"><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the number of instances, and <inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:msub><mml:mi>m</mml:mi><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the number of outlier values subtracted to ensure accuracy. The FSFS approach thus excludes outliers and focuses on identifying highly correlated, significant features. <xref ref-type="table" rid="table-3">Table 3</xref> lists the symbols used in the proposed FSFS approach and their descriptions.</p>
<table-wrap id="table-3">
<label>Table 3</label>
<caption>
<title>Symbols and description of the FSFS approach</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col align="center"/>
</colgroup>
<thead>
<tr align="center">
<th>Symbols</th>
<th>Eq.</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr align="center">
<td><inline-formula id="ieqn-19"><mml:math id="mml-ieqn-19"><mml:msub><mml:mi mathvariant="bold-italic">F</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td><xref ref-type="disp-formula" rid="eqn-1">(1)</xref> and <xref ref-type="disp-formula" rid="eqn-2">(2)</xref></td>
<td>The sum of the <italic>i</italic>-th feature across all records (e.g., <inline-formula id="ieqn-20"><mml:math id="mml-ieqn-20"><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo></mml:mrow></mml:msub><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>).</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-21"><mml:math id="mml-ieqn-21"><mml:mrow><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">F</mml:mi><mml:mi mathvariant="bold-italic">S</mml:mi><mml:mi mathvariant="bold-italic">F</mml:mi><mml:mi mathvariant="bold-italic">S</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula></td>
<td><xref ref-type="disp-formula" rid="eqn-2">(2)</xref></td>
<td>Farea Similarity for Feature Selection score for the <italic>i</italic>-th feature, quantifying its correlation and similarity to the dataset&#x2019;s average structure.</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-22"><mml:math id="mml-ieqn-22"><mml:msub><mml:mi mathvariant="bold-italic">R</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td><xref ref-type="disp-formula" rid="eqn-2">(2)</xref></td>
<td><italic>i</italic>-th record (observation) in the dataset.</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-23"><mml:math id="mml-ieqn-23"><mml:msup><mml:mrow><mml:mi mathvariant="bold-italic">M</mml:mi><mml:mi mathvariant="bold-italic">a</mml:mi><mml:mi mathvariant="bold-italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">M</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula></td>
<td><xref ref-type="disp-formula" rid="eqn-2">(2)</xref></td>
<td>Operator to remove <italic>M</italic> outlier values from the records. <italic>M</italic> can be a single or multiple outlier, depending on (IQR).</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-24"><mml:math id="mml-ieqn-24"><mml:msub><mml:mi mathvariant="bold-italic">n</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">r</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td><xref ref-type="disp-formula" rid="eqn-2">(2)</xref></td>
<td>Total number of records (instances) in the dataset.</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-25"><mml:math id="mml-ieqn-25"><mml:msub><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">r</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td><xref ref-type="disp-formula" rid="eqn-2">(2)</xref></td>
<td>Number of outlier values removed during calculations.</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-26"><mml:math id="mml-ieqn-26"><mml:mi mathvariant="bold-italic">M</mml:mi></mml:math></inline-formula></td>
<td><xref ref-type="disp-formula" rid="eqn-2">(2)</xref></td>
<td>Number of outliers to eliminate (configurable based on IQR, e.g., <italic>M</italic> &#x003D; 1 for a single outlier).</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-27"><mml:math id="mml-ieqn-27"><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">n</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td><xref ref-type="disp-formula" rid="eqn-3">(3)</xref> and <xref ref-type="disp-formula" rid="eqn-4">(4)</xref></td>
<td>Total number of features in the dataset.</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-28"><mml:math id="mml-ieqn-28"><mml:mrow><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">F</mml:mi><mml:mi mathvariant="bold-italic">S</mml:mi><mml:mi mathvariant="bold-italic">F</mml:mi><mml:mi mathvariant="bold-italic">S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi mathvariant="bold-italic">i</mml:mi></mml:math></inline-formula></td>
<td><xref ref-type="disp-formula" rid="eqn-3">(3)</xref> and <xref ref-type="disp-formula" rid="eqn-4">(4)</xref></td>
<td>The <italic>FSFS</italic> score of the <italic>i</italic>-th feature as used in the threshold comparison; the sum &#x2211;(<italic>FSFS</italic>)<italic>i</italic> across all features is used to compute the threshold.</td>
</tr>
<tr align="center">
<td><bold>Threshold</bold></td>
<td><xref ref-type="disp-formula" rid="eqn-3">(3)</xref> and <xref ref-type="disp-formula" rid="eqn-4">(4)</xref></td>
<td><inline-formula id="ieqn-29"><mml:math id="mml-ieqn-29"><mml:mfrac><mml:mrow><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mfrac></mml:math></inline-formula>: Average <italic>FSFS</italic> score used to classify features.<break/>Features with (<italic>FSFS</italic>)<italic>i</italic> &#x2264; Threshold are automatically deemed important (<xref ref-type="disp-formula" rid="eqn-3">Eq. (3)</xref>), while those with (<italic>FSFS</italic>)<italic>i</italic> &#x003E; Threshold are less important (<xref ref-type="disp-formula" rid="eqn-4">Eq. (4)</xref>).</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>In <xref ref-type="disp-formula" rid="eqn-2">Eq. (2)</xref>, feature interactions are quantified by subtracting the sum of each feature from the average of each record, thereby minimizing the distance and maximizing similarity and correlation both vertically (across features) and horizontally (across records). The subtraction operation represents finding the smallest distance, which signifies the highest degree of similarity. In other words, the equation calculates the overall similarity between each feature (representing vertical data) and the average of each record (representing horizontal data). This process establishes a connection between the features and records to determine their correlation. FSFS considers feature dependencies rather than isolated metrics. This approach evaluates both feature consistency and inter-feature similarity to the overall data pattern. The FSFS measure, calculated as the absolute difference, quantifies dissimilarity, with a smaller difference indicating higher similarity. Traditional methods often neglect feature interactions. For instance, individually weak features may collectively possess strong predictive power, which FSFS aims to capture.</p>
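<p>The computation described above can be sketched in Python under one possible reading of Eq. (2): each feature&#x2019;s score is the absolute difference between its column sum and the approximate average of the record sums, computed after IQR-flagged outlier records are removed. The function name, the simple index-based quartile estimates, and the conventional 1.5 IQR multiplier are assumptions made for illustration, not the authors&#x2019; exact implementation.</p>

```python
def fsfs_scores(rows):
    """Score each feature per one possible reading of Eq. (2): the absolute
    difference between the feature's column sum and the approximate average
    of the record sums, after IQR-flagged outlier records are removed."""
    n_r = len(rows)
    record_sums = sorted(sum(r) for r in rows)
    q1 = record_sums[n_r // 4]              # crude lower-quartile estimate
    q3 = record_sums[(3 * n_r) // 4]        # crude upper-quartile estimate
    iqr = q3 - q1
    kept = [s for s in record_sums
            if q1 - 1.5 * iqr <= s <= q3 + 1.5 * iqr]
    m_r = n_r - len(kept)                   # number of outlier records removed
    approx_avg = sum(kept) / (n_r - m_r)    # denominator n_r - m_r of Eq. (2)
    col_sums = [sum(r[i] for r in rows) for i in range(len(rows[0]))]
    return [abs(c - approx_avg) for c in col_sums]

scores = fsfs_scores([[1, 2], [3, 4]])
```

<p>A smaller score means the feature&#x2019;s total sits closer to the average record pattern, i.e., higher similarity under the FSFS measure.</p>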
<p>Furthermore, FSFS incorporates the removal of outliers and irrelevant features after normalization, since such anomalous values can directly affect evaluation and calculation results. The equation allows the removal of either a single outlier or multiple outliers, which is crucial because outliers significantly affect feature selection and result variability. The method operates on a general statistical basis for calculating the correlation and similarity between features, ensuring a robust and accurate selection process. In the FSFS approach, outlier removal is integrated into the FS process itself, preserving the data without arbitrary elimination and maintaining meaningful information. This study deliberately avoids the indiscriminate exclusion of extreme values as part of data cleaning, which could result in significant information loss; the careful handling of extreme values is central to the overarching goal of improving feature selection.</p>
<p><xref ref-type="disp-formula" rid="eqn-3">Eqs. (3)</xref> and <xref ref-type="disp-formula" rid="eqn-4">(4)</xref> define the automatic threshold condition used to separate the most important features from the less important ones. The threshold is calculated as the average of the total similarity scores across all features. The most important features have values below the threshold, indicating that they are the most similar and significant, as shown in <xref ref-type="disp-formula" rid="eqn-3">Eq. (3)</xref>. Conversely, less important features have values above the threshold, indicating weaker correlation, as outlined in <xref ref-type="disp-formula" rid="eqn-4">Eq. (4)</xref>.
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mi>f</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mspace width="thinmathspace" /><mml:mrow><mml:mi mathvariant="bold-italic">M</mml:mi><mml:mi mathvariant="bold-italic">o</mml:mi><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mi mathvariant="bold-italic">e</mml:mi></mml:mrow><mml:mspace width="thinmathspace" /><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mi>p</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi><mml:mi>e</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mfrac><mml:mrow><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mfrac><mml:mo>&#x2265;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mspace width="thinmathspace" /><mml:mi>f</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mspace width="thinmathspace" /><mml:mrow><mml:mi mathvariant="bold-italic">L</mml:mi><mml:mi mathvariant="bold-italic">e</mml:mi><mml:mi mathvariant="bold-italic">s</mml:mi><mml:mi mathvariant="bold-italic">s</mml:mi></mml:mrow><mml:mspace width="thinmathspace" /><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mi>p</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi><mml:mi>e</mml:mi><mml:mspace width="thinmathspace" /><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x003E;</mml:mo><mml:mfrac><mml:mrow><mml:mo>&#x2211;</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mfrac><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
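<p>As a concrete illustration of this threshold rule, the sketch below applies a binary split (more vs. less important) to the (FSFS)<sub><italic>i</italic></sub> values worked out in <xref ref-type="table" rid="table-5">Table 5</xref>; the variable names are ours, and the three-way grading reported later is not reproduced here.</p>

```python
# (FSFS)_i scores for features f1..f8, taken from Table 5 of the paper.
fsfs = {"f1": 276.78, "f2": 680.78, "f3": 1554.78, "f4": 2649.78,
        "f5": 1219.78, "f6": 663.78, "f7": 29.68, "f8": 130.22}

# Automatic threshold: the mean of all FSFS scores (Eqs. (3) and (4)).
threshold = sum(fsfs.values()) / len(fsfs)

# Eq. (3): scores at or below the threshold mark the more important features;
# Eq. (4): scores above the threshold mark the less important ones.
more_important = sorted(f for f, s in fsfs.items() if s <= threshold)
less_important = sorted(f for f, s in fsfs.items() if s > threshold)
print(round(threshold, 2), more_important, less_important)
```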
<p>The proposed algorithm takes the features and records of the dataset as its input. Its outputs are the set of features classified as highly similar and of utmost importance, and the set identified as less correlated, with lower significance and impact on the model&#x2019;s outputs. A data encoding technique ensures proper reorganization and structuring of the data; a normalization step then scales down large values; the FSFS statistical computations are performed; and finally the automatic threshold condition is applied to distinguish the important features from the less significant ones. <xref ref-type="fig" rid="fig-2">Fig. 2</xref> illustrates the pseudocode detailing the sequence of operations in our proposed method.</p>
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>FSFS pseudocode detailing the sequence of operations in our proposed method</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_64872-fig-2.tif"/>
</fig>
</sec>
<sec id="s4">
<label>4</label>
<title>Results and Discussions</title>
<p>The proposed theory has been evaluated, tested, and compared against various feature selection techniques and datasets. The selected datasets include experimental data and the Breast Cancer Wisconsin (Original) [<xref ref-type="bibr" rid="ref-49">49</xref>], KDD Cup 1999 [<xref ref-type="bibr" rid="ref-50">50</xref>], NSL-KDD, UNSW-NB15, and Edge-IIoT [<xref ref-type="bibr" rid="ref-51">51</xref>] datasets, while the techniques compared with the proposed FSFS method include CS, Gain Ratio, Filtered Subset Eval, the Genetic Approach, the Exhaustive Approach, and the Greedy Stepwise Approach. We assessed the FSFS approach across three scenarios to validate its effectiveness and efficiency: the first uses simple experimental data to demonstrate and simplify the step-by-step calculations of the proposed FSFS theory, confirming its ease of use and applicability to data-driven models; the second uses health data related to breast cancer; and the third applies cybersecurity data to evaluate the approach further.</p>
<p>Scaling dataset values to a specific range through data normalization is crucial, particularly for handling anomalous values and mitigating the influence of outliers that can adversely affect the performance of AI models. The first scenario is intended solely to provide a mathematical understanding of the proposed FSFS approach; in this experimental setup, normalization aids the computational understanding of the theory, so we employed division by the maximum value as the normalization step for the datasets. The choice of normalization method depends on the dataset characteristics and the desired value ranges; such techniques standardize and unify the data while avoiding the adverse effects of outliers. For instance, consider a testing record R with features R (<italic>f</italic><sub>1</sub> &#x003D; 16, <italic>f</italic><sub>2</sub> &#x003D; 18, <italic>f</italic><sub>3</sub> &#x003D; 1000, <italic>f</italic><sub>4</sub> &#x003D; 7, T &#x003D; ?), where <italic>f</italic><sub>1</sub>, <italic>f</italic><sub>2</sub>, <italic>f</italic><sub>3</sub>, and <italic>f</italic><sub>4</sub> represent features and T is the target label. The training dataset includes two records: R<sub>1</sub> (<italic>f</italic><sub>1</sub> &#x003D; 15, <italic>f</italic><sub>2</sub> &#x003D; 13, <italic>f</italic><sub>3</sub> &#x003D; 500, <italic>f</italic><sub>4</sub> &#x003D; 4, T &#x003D; P) and R<sub>2</sub> (<italic>f</italic><sub>1</sub> &#x003D; 17, <italic>f</italic><sub>2</sub> &#x003D; 17, <italic>f</italic><sub>3</sub> &#x003D; 300, <italic>f</italic><sub>4</sub> &#x003D; 8, T &#x003D; N). Before normalization, these records contain anomalies, such as <italic>f</italic><sub>3</sub> &#x003D; 1000 in R, <italic>f</italic><sub>3</sub> &#x003D; 500 in R<sub>1</sub>, and <italic>f</italic><sub>3</sub> &#x003D; 300 in R<sub>2</sub>.
After normalization (scaling the data to the specific range using max-value division), the transformed records are: R<sup>&#x2032;</sup> (<italic>f</italic><sub>1</sub><sup>&#x2032;</sup> &#x003D; 0.016, <italic>f</italic><sub>2</sub><sup>&#x2032;</sup> &#x003D; 0.018, <italic>f</italic><sub>3</sub><sup>&#x2032;</sup> &#x003D; 1.000, <italic>f</italic><sub>4</sub><sup>&#x2032;</sup> &#x003D; 0.007, T &#x003D; ?), R<sub>1</sub><sup>&#x2032;</sup> (<italic>f</italic><sub>1</sub><sup>&#x2032;</sup> &#x003D; 0.015, <italic>f</italic><sub>2</sub><sup>&#x2032;</sup> &#x003D; 0.013, <italic>f</italic><sub>3</sub><sup>&#x2032;</sup> &#x003D; 0.500, <italic>f</italic><sub>4</sub><sup>&#x2032;</sup> &#x003D; 0.004, T &#x003D; P), R<sub>2</sub><sup>&#x2032;</sup> (<italic>f</italic><sub>1</sub><sup>&#x2032;</sup> &#x003D; 0.017, <italic>f</italic><sub>2</sub><sup>&#x2032;</sup> &#x003D; 0.017, <italic>f</italic><sub>3</sub><sup>&#x2032;</sup> &#x003D; 0.300, <italic>f</italic><sub>4</sub><sup>&#x2032;</sup> &#x003D; 0.008, T &#x003D; N). In traditional methods, similarity between records is calculated using the absolute differences between corresponding normalized feature values (e.g., |R<sup>&#x2032;</sup>_<italic>f</italic><sub>1</sub>-R<sub>1</sub><sup>&#x2032;</sup>_<italic>f</italic><sub>1</sub>|, |R<sup>&#x2032;</sup>_<italic>f</italic><sub>2</sub>-R<sub>2</sub><sup>&#x2032;</sup>_<italic>f</italic><sub>2</sub>|, etc.). Based on these calculations, R<sup>&#x2032;</sup> is closer to R<sub>1</sub><sup>&#x2032;</sup>, which would associate T with class P. However, the proposed FSFS approach introduces additional preprocessing steps by eliminating outliers and unrelated features (e.g., <italic>f</italic><sub>3</sub>), thus altering the results. 
For example, when focusing on the related features <italic>f</italic><sub>1</sub><sup>&#x2032;</sup>, <italic>f</italic><sub>2</sub><sup>&#x2032;</sup>, and <italic>f</italic><sub>4</sub><sup>&#x2032;</sup> and avoiding the anomaly-prone feature <italic>f</italic><sub>3</sub><sup>&#x2032;</sup>, FSFS detects that R<sup>&#x2032;</sup> shares more similarity with R<sub>2</sub><sup>&#x2032;</sup>. As a result, T is correctly classified into class N, demonstrating how FSFS enhances model reliability by accounting for interactions and dependencies between features. This highlights the significance of normalization in ensuring accurate AI outputs and underscores the added value of the FSFS method: by removing unrelated and outlier features, FSFS improves the trustworthiness of results and supports the more sensitive, precise calculations on which AI model outputs depend. Compared with standard normalization techniques alone, FSFS offers a more robust and context-aware approach to feature selection, ensuring reliable and consistent decision-making in AI systems.</p>
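<p>The worked example above can be reproduced as follows. The sketch assumes normalization by the overall maximum value and a sum-of-absolute-differences similarity; all variable names are illustrative.</p>

```python
import numpy as np

# Features f1, f2, f3, f4 of the three records; f3 is the anomaly-prone feature.
R  = np.array([16.0, 18.0, 1000.0, 7.0])   # testing record, label T unknown
R1 = np.array([15.0, 13.0, 500.0, 4.0])    # training record, label P
R2 = np.array([17.0, 17.0, 300.0, 8.0])    # training record, label N

# Normalization by the overall maximum value (1000 here).
m = max(R.max(), R1.max(), R2.max())
Rn, R1n, R2n = R / m, R1 / m, R2 / m

dist = lambda a, b: np.abs(a - b).sum()    # sum of absolute differences

# With f3 included, R looks closer to R1 (class P)...
print(dist(Rn, R1n), dist(Rn, R2n))
# ...but dropping the outlier-dominated f3 (index 2) flips the result
# to R2 (class N), as described in the text.
keep = [0, 1, 3]
print(dist(Rn[keep], R1n[keep]), dist(Rn[keep], R2n[keep]))
```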
<sec id="s4_1">
<label>4.1</label>
<title>First Scenario</title>
<p>To enhance understanding of the proposed theory, we applied a simple numerical example to a small dataset, aiming to illustrate the workflow of the FSFS theory and simplify it for the reader. The experimental dataset used in this example consists of 8 features and 15 patient records, and the objective is to classify the 8 features into those of utmost importance and those of lesser significance. <xref ref-type="table" rid="table-4">Table 4</xref> presents the experimental data to facilitate understanding. The features include Patient ID, Age, Weight, Blood Pressure, Cholesterol, Glucose, Heart Rate, Body Temp, BMI (Body mass index), and Oxygen Saturation (%). This small dataset was carefully selected to demonstrate the step-by-step mathematical calculations and functioning of our proposed theory, simplifying its application and understanding so that it can be used effectively on the larger datasets in the subsequent scenarios. <xref ref-type="table" rid="table-5">Table 5</xref> provides a detailed breakdown of the computational steps of the proposed FSFS theory as applied to the dataset shown in <xref ref-type="table" rid="table-4">Table 4</xref>. The first column represents the sum of each feature, serving as the initial quantity in our proposed theory. The second column shows the approximate sum of the average of each record, excluding outliers that negatively impact the results. The upper and lower boundaries for detecting outliers are determined using the interquartile range (IQR) method: in this case the IQR is 145.1, which yields a lower boundary of 430.75 and an upper boundary of 1011.15, so any values outside this range&#x2014;such as 1043.5 and 1116&#x2014;are classified as outliers. The third column presents the complete calculation of the proposed FSFS measure, which captures the similarity and correlation of features both vertically and horizontally. <xref ref-type="table" rid="table-6">Table 6</xref> summarizes the computational results of the proposed FSFS theory, indicating the similarity ratio between the features. It also delineates the automatic threshold employed to select features and classify them into highly similar, moderately related, and less correlated categories. Features whose FSFS value falls below the mean are, in order, the most similar and related; conversely, features whose value exceeds the mean are, in order, the least significant.</p>
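<p>The IQR boundaries quoted above can be reproduced on the record sums of <xref ref-type="table" rid="table-4">Table 4</xref>; the sketch below assumes linearly interpolated quartiles (the NumPy default) and is ours, not part of the original method.</p>

```python
import numpy as np

# Per-record sums from Table 4 (column sum of each record r1..r15).
sums = np.array([676.7, 705.0, 740.5, 620.1, 1043.5, 437.0, 798.8,
                 608.6, 788.2, 598.9, 1116.0, 876.0, 714.7, 742.1, 680.8])

# Quartiles with linear interpolation, the default np.percentile behaviour.
q1, q3 = np.percentile(sums, [25, 75])
iqr = q3 - q1                                   # 145.1
lower = q1 - 1.5 * iqr                          # 430.75
upper = q3 + 1.5 * iqr                          # 1011.15

# Values outside [lower, upper] are flagged as outliers: 1043.5 and 1116.0.
outliers = sums[(sums < lower) | (sums > upper)]
print(iqr, lower, upper, outliers)
```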
<table-wrap id="table-4">
<label>Table 4</label>
<caption>
<title>Workflow detailing the steps applied to the FSFS experimental dataset for patient analysis</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col align="center"/>
</colgroup>
<thead>
<tr align="center">
<th><inline-formula id="ieqn-30"><mml:math id="mml-ieqn-30"><mml:msub><mml:mi mathvariant="bold-italic">R</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">F</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-31"><mml:math id="mml-ieqn-31"><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">1</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-32"><mml:math id="mml-ieqn-32"><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">2</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-33"><mml:math id="mml-ieqn-33"><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">3</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-34"><mml:math id="mml-ieqn-34"><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">4</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-35"><mml:math id="mml-ieqn-35"><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">5</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-36"><mml:math id="mml-ieqn-36"><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">6</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-37"><mml:math id="mml-ieqn-37"><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">7</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-38"><mml:math id="mml-ieqn-38"><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">8</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-39"><mml:math id="mml-ieqn-39"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">R</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th align="center">T</th>
</tr>
</thead>
<tbody>
<tr align="center">
<td><inline-formula id="ieqn-40"><mml:math id="mml-ieqn-40"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>45</td>
<td>80</td>
<td>120</td>
<td>200</td>
<td>95</td>
<td>75</td>
<td>36.7</td>
<td>25</td>
<td>676.7</td>
<td>1</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-41"><mml:math id="mml-ieqn-41"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>50</td>
<td>85</td>
<td>140</td>
<td>180</td>
<td>105</td>
<td>80</td>
<td>37</td>
<td>28</td>
<td>705.0</td>
<td>1</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-42"><mml:math id="mml-ieqn-42"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>60</td>
<td>70</td>
<td>130</td>
<td>220</td>
<td>115</td>
<td>85</td>
<td>36.5</td>
<td>24</td>
<td>740.5</td>
<td>0</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-43"><mml:math id="mml-ieqn-43"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>4</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>35</td>
<td>65</td>
<td>110</td>
<td>190</td>
<td>90</td>
<td>70</td>
<td>37.1</td>
<td>23</td>
<td>620.1</td>
<td>0</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-44"><mml:math id="mml-ieqn-44"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>80</td>
<td>110</td>
<td>200</td>
<td>300</td>
<td>180</td>
<td>100</td>
<td>38.5</td>
<td>35</td>
<td>1043.5</td>
<td>1</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-45"><mml:math id="mml-ieqn-45"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>6</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>25</td>
<td>45</td>
<td>80</td>
<td>120</td>
<td>65</td>
<td>50</td>
<td>35</td>
<td>17</td>
<td>437.0</td>
<td>0</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-46"><mml:math id="mml-ieqn-46"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>7</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>55</td>
<td>90</td>
<td>150</td>
<td>230</td>
<td>130</td>
<td>78</td>
<td>36.8</td>
<td>29</td>
<td>798.8</td>
<td>1</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-47"><mml:math id="mml-ieqn-47"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>8</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>40</td>
<td>68</td>
<td>115</td>
<td>170</td>
<td>85</td>
<td>72</td>
<td>36.6</td>
<td>22</td>
<td>608.6</td>
<td>0</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-48"><mml:math id="mml-ieqn-48"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>9</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>65</td>
<td>75</td>
<td>160</td>
<td>210</td>
<td>125</td>
<td>90</td>
<td>37.2</td>
<td>26</td>
<td>788.2</td>
<td>1</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-49"><mml:math id="mml-ieqn-49"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>10</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>30</td>
<td>60</td>
<td>105</td>
<td>180</td>
<td>100</td>
<td>65</td>
<td>36.9</td>
<td>22</td>
<td>598.9</td>
<td>0</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-50"><mml:math id="mml-ieqn-50"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>11</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>90</td>
<td>120</td>
<td>210</td>
<td>320</td>
<td>190</td>
<td>110</td>
<td>39</td>
<td>37</td>
<td>1116.0</td>
<td>1</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-51"><mml:math id="mml-ieqn-51"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>12</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>70</td>
<td>95</td>
<td>170</td>
<td>250</td>
<td>135</td>
<td>88</td>
<td>37</td>
<td>31</td>
<td>876.0</td>
<td>1</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-52"><mml:math id="mml-ieqn-52"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>13</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>52</td>
<td>82</td>
<td>130</td>
<td>195</td>
<td>110</td>
<td>82</td>
<td>36.7</td>
<td>27</td>
<td>714.7</td>
<td>1</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-53"><mml:math id="mml-ieqn-53"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>14</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>45</td>
<td>88</td>
<td>135</td>
<td>210</td>
<td>125</td>
<td>76</td>
<td>37.1</td>
<td>26</td>
<td>742.1</td>
<td>0</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-54"><mml:math id="mml-ieqn-54"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>15</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>60</td>
<td>73</td>
<td>125</td>
<td>200</td>
<td>95</td>
<td>68</td>
<td>36.8</td>
<td>23</td>
<td>680.8</td>
<td>0</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-55"><mml:math id="mml-ieqn-55"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>802.0</td>
<td>1206.0</td>
<td>2080.0</td>
<td>3175.0</td>
<td>1745.0</td>
<td>1189.0</td>
<td>554.9</td>
<td>395.0</td>
<td align="center" colspan="2">11,146.9</td>
</tr>
</tbody>
</table>
</table-wrap><table-wrap id="table-5">
<label>Table 5</label>
<caption>
<title>A detailed breakdown of the computational steps for the proposed theory FSFS on the dataset</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr align="center">
<th><inline-formula id="ieqn-56"><mml:math id="mml-ieqn-56"><mml:msubsup><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">0</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">n</mml:mi></mml:mrow></mml:msubsup><mml:msub><mml:mi mathvariant="bold-italic">F</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-57"><mml:math id="mml-ieqn-57"><mml:mrow><mml:mo>[</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mrow><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">0</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">n</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mi mathvariant="bold-italic">R</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="bold-italic">M</mml:mi><mml:mi mathvariant="bold-italic">a</mml:mi><mml:mi mathvariant="bold-italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">M</mml:mi></mml:mrow></mml:msup><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi mathvariant="bold-italic">j</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mtext mathvariant="bold">1</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mi mathvariant="bold-italic">J</mml:mi><mml:mo>&#x003E;</mml:mo><mml:mi mathvariant="bold-italic">i</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">n</mml:mi></mml:mrow></mml:munderover><mml:mrow><mml:mi mathvariant="bold-italic">R</mml:mi><mml:mi mathvariant="bold-italic">j</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">n</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">r</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">r</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-58"><mml:math id="mml-ieqn-58"><mml:mrow><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">F</mml:mi><mml:mi mathvariant="bold-italic">S</mml:mi><mml:mi mathvariant="bold-italic">F</mml:mi><mml:mi mathvariant="bold-italic">S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi mathvariant="bold-italic">i</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>|</mml:mo><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">F</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mrow><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">0</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">n</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mi mathvariant="bold-italic">R</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="bold-italic">M</mml:mi><mml:mi mathvariant="bold-italic">a</mml:mi><mml:mi mathvariant="bold-italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">M</mml:mi></mml:mrow></mml:msup><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi mathvariant="bold-italic">j</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mtext mathvariant="bold">1</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mi mathvariant="bold-italic">J</mml:mi><mml:mo>&#x003E;</mml:mo><mml:mi mathvariant="bold-italic">i</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">n</mml:mi></mml:mrow></mml:munderover><mml:mrow><mml:mi mathvariant="bold-italic">R</mml:mi><mml:mi mathvariant="bold-italic">j</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">n</mml:mi><mml:mrow><mml:mi 
mathvariant="bold-italic">r</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">r</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>]</mml:mo></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:math></inline-formula></th>
</tr>
</thead>
<tbody>
<tr align="center">
<td><inline-formula id="ieqn-59"><mml:math id="mml-ieqn-59"><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; 802.0</td>
<td><inline-formula id="ieqn-60"><mml:math id="mml-ieqn-60"><mml:mrow><mml:mo>[</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mrow><mml:mn>8987.4</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>M</mml:mi><mml:mi>a</mml:mi><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mn>1043.5</mml:mn><mml:mo>+</mml:mo><mml:mn>1116</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>15</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-61"><mml:math id="mml-ieqn-61"><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; |802&#x2013;525.22| &#x003D; 276.78</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-62"><mml:math id="mml-ieqn-62"><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; 1206.0</td>
<td>525.22</td>
<td><inline-formula id="ieqn-63"><mml:math id="mml-ieqn-63"><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; |1206&#x2013;525.22| &#x003D; 680.78</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-64"><mml:math id="mml-ieqn-64"><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; 2080.0</td>
<td>525.22</td>
<td><inline-formula id="ieqn-65"><mml:math id="mml-ieqn-65"><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; |2080&#x2013;525.22| &#x003D; 1554.78</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-66"><mml:math id="mml-ieqn-66"><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>4</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; 3175.0</td>
<td>525.22</td>
<td><inline-formula id="ieqn-67"><mml:math id="mml-ieqn-67"><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>4</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; |3175&#x2013;525.22| &#x003D; 2649.78</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-68"><mml:math id="mml-ieqn-68"><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; 1745.0</td>
<td>525.22</td>
<td><inline-formula id="ieqn-69"><mml:math id="mml-ieqn-69"><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; |1745&#x2013;525.22| &#x003D; 1219.78</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-70"><mml:math id="mml-ieqn-70"><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>6</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; 1189.0</td>
<td>525.22</td>
<td><inline-formula id="ieqn-71"><mml:math id="mml-ieqn-71"><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>6</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; |1189&#x2013;525.22| &#x003D; 663.78</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-72"><mml:math id="mml-ieqn-72"><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>7</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; 554.9</td>
<td>525.22</td>
<td><inline-formula id="ieqn-73"><mml:math id="mml-ieqn-73"><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>7</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; |554.9&#x2013;525.22| &#x003D; 29.68</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-74"><mml:math id="mml-ieqn-74"><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>8</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; 395.0</td>
<td>525.22</td>
<td><inline-formula id="ieqn-75"><mml:math id="mml-ieqn-75"><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>8</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; |395&#x2013;525.22| &#x003D; 130.22</td>
</tr>
</tbody>
</table>
</table-wrap><table-wrap id="table-6">
<label>Table 6</label>
<caption>
<title>Computational results of the proposed FSFS theory</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr align="center">
<th><italic>F</italic> <sub><italic>i</italic></sub></th>
<th><inline-formula id="ieqn-76"><mml:math id="mml-ieqn-76"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">1</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-77"><mml:math id="mml-ieqn-77"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">2</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-78"><mml:math id="mml-ieqn-78"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">3</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-79"><mml:math id="mml-ieqn-79"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">4</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-80"><mml:math id="mml-ieqn-80"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">5</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-81"><mml:math id="mml-ieqn-81"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">6</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-82"><mml:math id="mml-ieqn-82"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">7</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-83"><mml:math id="mml-ieqn-83"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">8</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
</tr>
</thead>
<tbody>
<tr align="center">
<td><inline-formula id="ieqn-84"><mml:math id="mml-ieqn-84"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>802.0</td>
<td>1206.0</td>
<td>2080.0</td>
<td>3175.0</td>
<td>1745.0</td>
<td>1189.0</td>
<td>554.9</td>
<td>395.0</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-85"><mml:math id="mml-ieqn-85"><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-86"><mml:math id="mml-ieqn-86"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-87"><mml:math id="mml-ieqn-87"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-88"><mml:math id="mml-ieqn-88"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-89"><mml:math id="mml-ieqn-89"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>4</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>4</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-90"><mml:math id="mml-ieqn-90"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-91"><mml:math id="mml-ieqn-91"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>6</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>6</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-92"><mml:math id="mml-ieqn-92"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>7</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>7</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-93"><mml:math id="mml-ieqn-93"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>8</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>8</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-94"><mml:math id="mml-ieqn-94"><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>i</mml:mi></mml:math></inline-formula></td>
<td>276.78</td>
<td>680.78</td>
<td>1554.78</td>
<td>2649.78</td>
<td>1219.78</td>
<td>663.78</td>
<td>29.68</td>
<td>130.22</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-95"><mml:math id="mml-ieqn-95"><mml:mfrac><mml:mrow><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mfrac></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-96"><mml:math id="mml-ieqn-96"><mml:mfrac><mml:mn>7205.58</mml:mn><mml:mn>8</mml:mn></mml:mfrac></mml:math></inline-formula></td>
<td>900.698</td>
<td>900.698</td>
<td>900.698</td>
<td>900.698</td>
<td>900.698</td>
<td>900.698</td>
<td>900.698</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-97"><mml:math id="mml-ieqn-97"><mml:mfrac><mml:mrow><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mfrac><mml:mo>&#x2265;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula></td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-98"><mml:math id="mml-ieqn-98"><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x003E;</mml:mo><mml:mfrac><mml:mrow><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mfrac></mml:math></inline-formula></td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The similarity between features does not merely reflect the closeness of their numerical values but also highlights the relationship and relevance among the features themselves. Features that exhibit strong associations, such as <italic>f</italic><sub>1</sub>, <italic>f</italic><sub>2</sub>, <italic>f</italic><sub>6</sub>, and <italic>f</italic><sub>7</sub>, tend to group together, indicating their collective influence on reliable classification results. This underscores the importance of evaluating not only numerical proximity but also feature relevance when diagnosing cases or drawing conclusions. For instance, assigning features <italic>f</italic><sub>3</sub>, <italic>f</italic><sub>4</sub>, and <italic>f</italic><sub>5</sub> to one condition (e.g., a specific disease) and grouping the remaining features under another condition is more effective than treating them all as a single group. This differentiation enhances diagnostic accuracy and result reliability. In the context of this study, the features with the smallest differences in values are considered the most similar, and the most similar features are typically the most correlated. Conversely, larger differences indicate lower similarity and weaker correlation. This relationship can be conceptualized geometrically: determining correlation strength amounts to finding the smallest vertical and horizontal deviations between points, which helps pinpoint the strength and direction of the correlation. In this context, the most significant features are <italic>f</italic><sub>7</sub>, <italic>f</italic><sub>8</sub>, <italic>f</italic><sub>1</sub>, and <italic>f</italic><sub>6</sub>. Among these, <italic>f</italic><sub>7</sub> shows the smallest difference in value relative to the other features, making it the most similar and most strongly correlated. Conversely, <italic>f</italic><sub>4</sub> has the greatest difference in value, making it the least similar and least correlated feature. 
To better illustrate this concept, consider a scenario where similarity is assessed between a given individual and two others based on specific features. By calculating the differences in feature values, the individual is determined to be closer to the person with the smallest overall difference, signifying greater similarity. This principle underlines the importance of minimizing differences in critical features to identify strong correlations and reliable relationships within data. According to the FSFS approach, the computational results presented in <xref ref-type="table" rid="table-6">Table 6</xref> indicate a stronger correlation and similarity among features <italic>f</italic><sub>1</sub>, <italic>f</italic><sub>2</sub>, <italic>f</italic><sub>6</sub>, <italic>f</italic><sub>7</sub>, and <italic>f</italic><sub>8</sub> than among features <italic>f</italic><sub>3</sub>, <italic>f</italic><sub>4</sub>, and <italic>f</italic><sub>5</sub>. Since the distance between features <italic>f</italic><sub>1</sub>, <italic>f</italic><sub>2</sub>, <italic>f</italic><sub>6</sub>, <italic>f</italic><sub>7</sub>, and <italic>f</italic><sub>8</sub> is much smaller than that between features <italic>f</italic><sub>3</sub>, <italic>f</italic><sub>4</sub>, and <italic>f</italic><sub>5</sub>, the former group is more similar to one another than the latter. <xref ref-type="fig" rid="fig-3">Fig. 3</xref> illustrates the distribution of important and unimportant data within the experimental dataset. The lower a feature's FSFS value, the more significant the feature, as these points fall below the FSFS average line. Conversely, higher values indicate less importance. 
This is because lower values represent smaller differences in the output, signifying greater similarity among features. Thus, features with smaller FSFS values are more strongly correlated and considered more critical in influencing the model&#x2019;s results. <xref ref-type="table" rid="table-7">Table 7</xref> presents a ranked list of features based on their statistical significance and similarity. Features <italic>f</italic><sub>7</sub>, <italic>f</italic><sub>8</sub>, <italic>f</italic><sub>1</sub>, <italic>f</italic><sub>6</sub>, and <italic>f</italic><sub>2</sub> exhibit the highest levels of significance, with <italic>f</italic><sub>7</sub> being the most important feature overall. In contrast, features <italic>f</italic><sub>5</sub>, <italic>f</italic><sub>3</sub>, and <italic>f</italic><sub>4</sub> show the lowest levels of significance, with <italic>f</italic><sub>4</sub> having the weakest correlation. The overall ranking from most to least significant is: <italic>f</italic><sub>7</sub>, <italic>f</italic><sub>8</sub>, <italic>f</italic><sub>1</sub>, <italic>f</italic><sub>6</sub>, <italic>f</italic><sub>2</sub>, <italic>f</italic><sub>5</sub>, <italic>f</italic><sub>3</sub>, and <italic>f</italic><sub>4</sub>. Therefore, <italic>f</italic><sub>7</sub> has the strongest correlation, while <italic>f</italic><sub>4</sub> demonstrates the weakest correlation. <xref ref-type="fig" rid="fig-4">Fig. 4</xref> illustrates the significance levels, where the zigzag line represents the FSFS ratio, shown in light orange. Feature <italic>f</italic><sub>7</sub> exhibits the smallest distance, indicating the highest similarity and correlation, while feature <italic>f</italic><sub>4</sub> shows the largest difference and distance, reflecting the lowest similarity and correlation. <xref ref-type="table" rid="table-8">Table 8</xref> details a comparison of feature selection techniques with the proposed FSFS method integrated with a Random Forest (RF) classifier. 
It compares the FSFS theory with several other methods (CS, CC, and GA), listing the number and names of the features selected by each method along with the evaluation results. The proposed FSFS method achieved the strongest evaluation results, reaching 100% accuracy compared with 60% for the other techniques.</p>
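The selection rule just described (compute each feature's absolute deviation from the record average, then keep features whose deviation does not exceed the mean deviation) can be sketched in Python. This is an illustrative reimplementation, not the authors' code; the feature column sums and the record average of 525.22 are taken from the worked example above.

```python
# Sketch of the FSFS selection rule (illustrative, not the authors' code).
# Feature column sums from Table 6 and the overall record average (525.22).
feature_sums = {
    "f1": 802.0, "f2": 1206.0, "f3": 2080.0, "f4": 3175.0,
    "f5": 1745.0, "f6": 1189.0, "f7": 554.9, "f8": 395.0,
}
record_average = 525.22

# FSFS_i = |sum(f_i) - average|: smaller values mean higher similarity.
fsfs = {f: abs(s - record_average) for f, s in feature_sums.items()}

# Threshold: the mean FSFS value over all n features.
threshold = sum(fsfs.values()) / len(fsfs)

# Features at or below the threshold are kept as impactful,
# ranked from most to least similar (ascending FSFS).
selected = [f for f, v in sorted(fsfs.items(), key=lambda kv: kv[1])
            if v <= threshold]
```

Under these inputs the sketch reproduces the ranking reported in Table 7: f7, f8, f1, f6, and f2 fall below the mean FSFS line and are selected, while f3, f4, and f5 fall above it.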
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Distribution of important and unimportant data as classified by the FSFS theory</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_64872-fig-3.tif"/>
</fig><table-wrap id="table-7">
<label>Table 7</label>
<caption>
<title>A ranked list of features based on their statistical significance and similarity</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr align="center">
<th><italic>F</italic> <sub><italic>i</italic></sub> &#x2191;</th>
<th><inline-formula id="ieqn-99"><mml:math id="mml-ieqn-99"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">7</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-100"><mml:math id="mml-ieqn-100"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">8</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-101"><mml:math id="mml-ieqn-101"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">1</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-102"><mml:math id="mml-ieqn-102"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">6</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-103"><mml:math id="mml-ieqn-103"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">2</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-104"><mml:math id="mml-ieqn-104"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">5</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-105"><mml:math id="mml-ieqn-105"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">3</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-106"><mml:math id="mml-ieqn-106"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">4</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
</tr>
</thead>
<tbody>
<tr align="center">
<td>Name (<inline-formula id="ieqn-107"><mml:math id="mml-ieqn-107"><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>)</td>
<td>Temp (C)</td>
<td>BMI</td>
<td>Age</td>
<td>Heart (bpm)</td>
<td>Weight (kg)</td>
<td>Glucose</td>
<td>Blood pressure</td>
<td>Cholesterol (mg/dL)</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-108"><mml:math id="mml-ieqn-108"><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>i</mml:mi></mml:math></inline-formula>&#x2191;</td>
<td>29.68</td>
<td>130.22</td>
<td>276.78</td>
<td>663.78</td>
<td>680.78</td>
<td>1219.78</td>
<td>1554.78</td>
<td>2649.78</td>
</tr>
<tr align="center">
<td></td>
<td align="center" colspan="5">Most important features</td>
<td align="center" colspan="3">Less important features</td>
</tr>
</tbody>
</table>
</table-wrap><fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>Significance levels and FSFS values</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_64872-fig-4.tif"/>
</fig><table-wrap id="table-8">
<label>Table 8</label>
<caption>
<title>Comparative evaluation of established FS techniques versus the proposed FSFS method</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr align="center">
<th align="center">FS</th>
<th align="center">Selected Features</th>
<th align="center">Training<break/>(70<inline-formula id="ieqn-109"><mml:math id="mml-ieqn-109"><mml:mi mathvariant="bold">&#x0025;</mml:mi></mml:math></inline-formula>)</th>
<th align="center">Testing (30<inline-formula id="ieqn-110"><mml:math id="mml-ieqn-110"><mml:mi mathvariant="bold">&#x0025;</mml:mi></mml:math></inline-formula>)</th>
<th align="center">Algorithm</th>
<th align="center">Accuracy &#x003D; <inline-formula id="ieqn-111"><mml:math id="mml-ieqn-111"><mml:mfrac><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">Number</mml:mtext></mml:mrow><mml:mspace width="thinmathspace" /><mml:mrow><mml:mi mathvariant="bold-italic">o</mml:mi><mml:mi mathvariant="bold-italic">f</mml:mi></mml:mrow><mml:mspace width="thinmathspace" /><mml:mrow><mml:mi mathvariant="bold-italic">C</mml:mi><mml:mi mathvariant="bold-italic">o</mml:mi><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mi mathvariant="bold-italic">e</mml:mi><mml:mi mathvariant="bold-italic">c</mml:mi><mml:mi mathvariant="bold-italic">t</mml:mi></mml:mrow><mml:mspace width="thinmathspace" /><mml:mrow><mml:mi mathvariant="bold-italic">P</mml:mi><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mi mathvariant="bold-italic">e</mml:mi><mml:mi mathvariant="bold-italic">d</mml:mi><mml:mi mathvariant="bold-italic">i</mml:mi><mml:mi mathvariant="bold-italic">c</mml:mi><mml:mi mathvariant="bold-italic">t</mml:mi><mml:mi mathvariant="bold-italic">i</mml:mi><mml:mi mathvariant="bold-italic">o</mml:mi><mml:mi mathvariant="bold-italic">n</mml:mi><mml:mi mathvariant="bold-italic">s</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">T</mml:mi><mml:mi mathvariant="bold-italic">o</mml:mi><mml:mi mathvariant="bold-italic">t</mml:mi><mml:mi mathvariant="bold-italic">a</mml:mi><mml:mi mathvariant="bold-italic">l</mml:mi></mml:mrow><mml:mspace width="thinmathspace" /><mml:mrow><mml:mtext mathvariant="bold">Predictions</mml:mtext></mml:mrow></mml:mrow></mml:mfrac></mml:math></inline-formula> &#x00D7; 100</th>
</tr>
</thead>
<tbody>
<tr align="center">
<td><bold>CS</bold></td>
<td><italic>f</italic><sub>4</sub>, <italic>f</italic><sub>3</sub>, <italic>f</italic><sub>1</sub>, <italic>f</italic><sub>2</sub>, <italic>f</italic><sub>5</sub></td>
<td><italic>r</italic><sub>1</sub>, <italic>r</italic><sub>2</sub>, <italic>r</italic><sub>5</sub>, <italic>r</italic><sub>7</sub>, <italic>r</italic><sub>9</sub>, <italic>r</italic><sub>11</sub>, <italic>r</italic><sub>12</sub>, <italic>r</italic><sub>13</sub>, <italic>r</italic><sub>14</sub>, <italic>r</italic><sub>15</sub></td>
<td><italic>r</italic><sub>3</sub>, <italic>r</italic><sub>4</sub>, <italic>r</italic><sub>6</sub>, <italic>r</italic><sub>8</sub>, <italic>r</italic><sub>10</sub></td>
<td>RF</td>
<td><bold>Correct predictions:</bold> 3 (<italic>r</italic><sub>3</sub>, <italic>r</italic><sub>6</sub>, <italic>r</italic><sub>8</sub>)<break/><bold>Total predictions:</bold> 5 and Accuracy &#x003D; <inline-formula id="ieqn-112"><mml:math id="mml-ieqn-112"><mml:mrow><mml:mn>3</mml:mn></mml:mrow><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:math></inline-formula> &#x00D7; 100 &#x003D; 60<inline-formula id="ieqn-113"><mml:math id="mml-ieqn-113"><mml:mrow><mml:mtext>%</mml:mtext></mml:mrow></mml:math></inline-formula></td>
</tr>
<tr align="center">
<td><bold>CC</bold></td>
<td><italic>f</italic><sub>6</sub>, <italic>f</italic><sub>4</sub>, <italic>f</italic><sub>1</sub>, <italic>f</italic><sub>5</sub>, <italic>f</italic><sub>2</sub></td>
<td><italic>r</italic><sub>1</sub>, <italic>r</italic><sub>2</sub>, <italic>r</italic><sub>5</sub>, <italic>r</italic><sub>7</sub>, <italic>r</italic><sub>9</sub>, <italic>r</italic><sub>11</sub>, <italic>r</italic><sub>12</sub>, <italic>r</italic><sub>13</sub>, <italic>r</italic><sub>14</sub>, <italic>r</italic><sub>15</sub></td>
<td><italic>r</italic><sub>3</sub>, <italic>r</italic><sub>4</sub>, <italic>r</italic><sub>6</sub>, <italic>r</italic><sub>8</sub>, <italic>r</italic><sub>10</sub></td>
<td>RF</td>
<td><bold>Correct predictions:</bold> 3 (<italic>r</italic><sub>3</sub>, <italic>r</italic><sub>6</sub>, <italic>r</italic><sub>8</sub>) <bold>Total predictions:</bold> 5 and Accuracy &#x003D; <inline-formula id="ieqn-114"><mml:math id="mml-ieqn-114"><mml:mrow><mml:mn>3</mml:mn></mml:mrow><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:math></inline-formula> &#x00D7; 100 &#x003D; 60<inline-formula id="ieqn-115"><mml:math id="mml-ieqn-115"><mml:mrow><mml:mtext>%</mml:mtext></mml:mrow></mml:math></inline-formula></td>
</tr>
<tr align="center">
<td><bold>GA</bold></td>
<td><italic>f</italic><sub>7</sub>, <italic>f</italic><sub>4</sub>, <italic>f</italic><sub>6</sub>, <italic>f</italic><sub>2</sub>, <italic>f</italic><sub>3</sub></td>
<td><italic>r</italic><sub>1</sub>, <italic>r</italic><sub>2</sub>, <italic>r</italic><sub>5</sub>, <italic>r</italic><sub>7</sub>, <italic>r</italic><sub>9</sub>, <italic>r</italic><sub>11</sub>, <italic>r</italic><sub>12</sub>, <italic>r</italic><sub>13</sub>, <italic>r</italic><sub>14</sub>, <italic>r</italic><sub>15</sub></td>
<td><italic>r</italic><sub>3</sub>, <italic>r</italic><sub>4</sub>, <italic>r</italic><sub>6</sub>, <italic>r</italic><sub>8</sub>, <italic>r</italic><sub>10</sub></td>
<td>RF</td>
<td><bold>Correct predictions:</bold> 3 (<italic>r</italic><sub>3</sub>, <italic>r</italic><sub>6</sub>, <italic>r</italic><sub>8</sub>)<break/><bold>Total predictions:</bold> 5 and Accuracy &#x003D; <inline-formula id="ieqn-116"><mml:math id="mml-ieqn-116"><mml:mfrac><mml:mrow><mml:mtext mathvariant="bold">3</mml:mtext></mml:mrow><mml:mrow><mml:mtext mathvariant="bold">5</mml:mtext></mml:mrow></mml:mfrac></mml:math></inline-formula> &#x00D7; 100 &#x003D; 60<inline-formula id="ieqn-117"><mml:math id="mml-ieqn-117"><mml:mrow><mml:mtext>%</mml:mtext></mml:mrow></mml:math></inline-formula></td>
</tr>
<tr align="center">
<td><bold>This study (FSFS)</bold></td>
<td><bold><italic>f</italic></bold><sub><bold>7</bold></sub><bold>, <italic>f</italic></bold><sub><bold>8</bold></sub><bold>, <italic>f</italic></bold><sub><bold>1</bold></sub><bold>, <italic>f</italic></bold><sub><bold>6</bold></sub><bold>, <italic>f</italic></bold><sub><bold>2</bold></sub></td>
<td><bold><italic>r</italic></bold><sub><bold>1</bold></sub><bold>, <italic>r</italic></bold><sub><bold>2</bold></sub><bold>, <italic>r</italic></bold><sub><bold>4</bold></sub><bold>, <italic>r</italic></bold><sub><bold>5</bold></sub><bold>, <italic>r</italic></bold><sub><bold>7</bold></sub><bold>, <italic>r</italic></bold><sub><bold>8</bold></sub><bold>, <italic>r</italic></bold><sub><bold>10</bold></sub><bold>, <italic>r</italic></bold><sub><bold>11,</bold></sub> <bold><italic>r</italic></bold><sub><bold>13</bold></sub><bold>, <italic>r</italic></bold><sub><bold>14</bold></sub></td>
<td><bold><italic>r</italic></bold><sub><bold>3</bold></sub><bold>, <italic>r</italic></bold><sub><bold>6</bold></sub><bold>, <italic>r</italic></bold><sub><bold>9</bold></sub><bold>, <italic>r</italic></bold><sub><bold>12,</bold></sub> <bold><italic>r</italic></bold><sub><bold>15</bold></sub></td>
<td><bold>RF</bold></td>
<td><bold>Correct Predictions: 5 (<italic>r</italic></bold><sub><bold>3</bold></sub><bold>, <italic>r</italic></bold><sub><bold>6</bold></sub><bold>, <italic>r</italic></bold><sub><bold>9</bold></sub><bold>, <italic>r</italic></bold><sub><bold>12</bold></sub><bold>, <italic>r</italic></bold><sub><bold>15</bold></sub><bold>)</bold><break/><bold>Total Predictions: 5 and Accuracy &#x003D; </bold><inline-formula id="ieqn-118"><mml:math id="mml-ieqn-118"><mml:mfrac><mml:mrow><mml:mtext mathvariant="bold">5</mml:mtext></mml:mrow><mml:mrow><mml:mtext mathvariant="bold">5</mml:mtext></mml:mrow></mml:mfrac></mml:math></inline-formula> <bold>&#x00D7; 100 &#x003D; 100</bold><inline-formula id="ieqn-119"><mml:math id="mml-ieqn-119"><mml:mrow><mml:mtext mathvariant="bold">%</mml:mtext></mml:mrow></mml:math></inline-formula></td>
</tr>
</tbody>
</table>
</table-wrap>
<p>In this study, we define affected model outputs as cases where selecting less important features leads to contradictory outcomes. For example, selecting 5 features out of 8 may classify a new data record into class A, while selecting 6 features out of 8 from the same dataset may classify the same record into class B. This inconsistency arises from the removal of features with extreme values that significantly and directly affect the model&#x2019;s results, yielding volatile outcomes. <xref ref-type="fig" rid="fig-5">Figs. 5</xref> and <xref ref-type="fig" rid="fig-6">6</xref> present a comparison of our proposed feature selection method on the experimental dataset with several widely used and established techniques, namely CS, CC, GA, and RF. In our newly developed statistical FSFS method, the features <italic>f</italic><sub>3</sub>, <italic>f</italic><sub>4</sub>, and <italic>f</italic><sub>5</sub> were identified as less significant, while <italic>f</italic><sub>7</sub>, <italic>f</italic><sub>8</sub>, <italic>f</italic><sub>1</sub>, <italic>f</italic><sub>6</sub>, and <italic>f</italic><sub>2</sub> were ranked as more important, in that order. When applying the CS method to the same dataset, <italic>f</italic><sub>6</sub>, <italic>f</italic><sub>7</sub>, and <italic>f</italic><sub>8</sub> were classified as less important, while <italic>f</italic><sub>4</sub>, <italic>f</italic><sub>3</sub>, <italic>f</italic><sub>1</sub>, <italic>f</italic><sub>2</sub>, and <italic>f</italic><sub>5</sub> were categorized as more important, also in that order. Similarly, the CC method selected <italic>f</italic><sub>6</sub>, <italic>f</italic><sub>4</sub>, <italic>f</italic><sub>1</sub>, <italic>f</italic><sub>5</sub>, and <italic>f</italic><sub>2</sub> as the most important features. 
Additionally, when using the GA method on the same dataset, the features <italic>f</italic><sub>1</sub>, <italic>f</italic><sub>5</sub>, and <italic>f</italic><sub>8</sub> were deemed less important, whereas <italic>f</italic><sub>7</sub>, <italic>f</italic><sub>4</sub>, <italic>f</italic><sub>6</sub>, <italic>f</italic><sub>2</sub>, and <italic>f</italic><sub>3</sub> were ranked as more significant. Furthermore, applying the encoding mechanisms to the same dataset with the full feature set resulted in optimal model performance. This outcome suggests that discarding certain features can degrade results and highlights the significant impact that the choice of test data has on model performance.</p>
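The accuracy figure used throughout Table 8 is simply the share of correct predictions over total predictions, scaled to a percentage. A minimal sketch follows; the label vectors are hypothetical stand-ins for the five test records, not the dataset's actual classes.

```python
# Accuracy as defined in Table 8: correct predictions / total predictions x 100.
def accuracy_pct(y_true, y_pred):
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100.0 * correct / len(y_true)

# Hypothetical class labels for five test records: all five correct mirrors
# the FSFS row (100%); three of five correct mirrors the CS/CC/GA rows (60%).
fsfs_acc = accuracy_pct(["A", "B", "A", "B", "A"], ["A", "B", "A", "B", "A"])
other_acc = accuracy_pct(["A", "B", "A", "B", "A"], ["A", "B", "A", "A", "B"])
```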
<fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>Feature significance determined using the RF method on the complete feature set</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_64872-fig-5.tif"/>
</fig>
<fig id="fig-6">
<label>Figure 6</label>
<caption>
<title>FSFS vs. CS, CC &#x0026; GA on dataset</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_64872-fig-6a.tif"/>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_64872-fig-6b.tif"/>
</fig>
</sec>
<sec id="s4_2">
<label>4.2</label>
<title>Second Scenario</title>
<p>In the second scenario, a health-related dataset, referred to as the Breast Cancer Wisconsin (Original) dataset, was used. The proposed FSFS method was applied to and evaluated on this dataset. The dataset comprises 699 records and 10 features, with the ID feature excluded due to its significant deviation from the rest of the data. The FSFS value was calculated for each feature, along with the average for each record, as shown in <xref ref-type="table" rid="table-9">Table 9</xref>. <xref ref-type="table" rid="table-10">Table 10</xref> presents the calculation of similarity based on the proposed FSFS theory applied to the dataset. It is observed that features <italic>f</italic><sub>3</sub>, <italic>f</italic><sub>4</sub>, <italic>f</italic><sub>6</sub>, <italic>f</italic><sub>8</sub>, and <italic>f</italic><sub>9</sub> were classified as highly significant, while features <italic>f</italic><sub>1</sub>, <italic>f</italic><sub>2</sub>, <italic>f</italic><sub>5</sub>, and <italic>f</italic><sub>7</sub> showed lower similarity and correlation. The features were automatically classified based on the proposed FSFS threshold. Additionally, the chart illustrates the relationship between the total number of features and the correlation within the proposed FSFS framework. <xref ref-type="fig" rid="fig-7">Fig. 7</xref> illustrates the relation between the sum and FSFS values for similarity. <xref ref-type="table" rid="table-11">Table 11</xref> ranks the importance of the features, with feature <italic>f</italic><sub>9</sub> being the most important, while feature <italic>f</italic><sub>1</sub> is the most distinct, having the lowest similarity and correlation. 
Overall, the features are ranked in terms of importance, similarity, and correlation from highest to lowest as follows: <italic>f</italic><sub>9</sub>, <italic>f</italic><sub>4</sub>, <italic>f</italic><sub>3</sub>, <italic>f</italic><sub>6</sub>, <italic>f</italic><sub>8</sub>, <italic>f</italic><sub>2</sub>, <italic>f</italic><sub>5</sub>, <italic>f</italic><sub>7</sub>, and <italic>f</italic><sub>1</sub>. <xref ref-type="fig" rid="fig-8">Fig. 8</xref> shows the distribution of significant and insignificant data within the Breast Cancer Wisconsin (Original) dataset, while <xref ref-type="fig" rid="fig-9">Fig. 9</xref> illustrates the feature importance rankings in the same dataset. Features with lower FSFS values are considered more important, as they lie below the FSFS average line, while higher values correspond to less important features. Lower FSFS values indicate smaller output differences, which reflect a higher similarity among features. Therefore, features with the smallest FSFS values are those most closely aligned and have the greatest impact on the model&#x2019;s outcomes. <xref ref-type="table" rid="table-12">Table 12</xref> presents a comparative analysis of the proposed FSFS method and theory against several other feature selection techniques, such as the Genetic Approach, Exhaustive Approach, and Greedy Stepwise Approach. The number of selected features varies among these methods, influenced by their underlying principles, methodologies, and statistical formulations. Notably, the FSFS method identified only five features, namely <italic>f</italic><sub>9</sub>, <italic>f</italic><sub>4</sub>, <italic>f</italic><sub>3</sub>, <italic>f</italic><sub>6</sub>, and <italic>f</italic><sub>8</sub>, as highly significant and impactful on the model&#x2019;s outcomes. This represents the smallest feature subset selected by any of the methods, yet it consistently outperformed the others in terms of evaluation metrics. 
<xref ref-type="fig" rid="fig-10">Fig. 10</xref> presents a comparative analysis of the proposed FSFS theory against other feature selection techniques, visualizing accuracy and the number of features selected by each method. The RF classifier integrated with the FSFS approach achieved an accuracy of 97.81% on the Breast Cancer Wisconsin (Original) dataset, along with a precision of 97.2%, recall of 98.1%, and an F1-score of 97.7%.</p>
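<p>The calculations in Tables 9 and 10 can be reproduced with a short script. This is a minimal sketch: the per-feature sums and the record-wise term (28.06) are taken directly from Table 9, and the variable names are illustrative, not from the paper.</p>

```python
# FSFS sketch for the Breast Cancer Wisconsin (Original) dataset.
# Per-feature sums over the 699 records, as listed in Table 9.
feature_sums = {
    "f1": 3088, "f2": 2191, "f3": 1993, "f4": 1962, "f5": 2248,
    "f6": 2000, "f7": 2403, "f8": 2004, "f9": 1111,
}
# Record-wise term from Table 9: (19,670 - 82) / (699 - 1) = 28.06
record_term = 28.06

# FSFS_i = | sum(F_i) - record_term |
fsfs = {f: abs(s - record_term) for f, s in feature_sums.items()}

# Threshold: the mean FSFS value over all features (18,747.46 / 9 = 2083.05).
threshold = sum(fsfs.values()) / len(fsfs)

# Features at or below the threshold are highly significant (higher similarity);
# the rest are less important. Sorting ascending gives the importance ranking.
significant = sorted((f for f, v in fsfs.items() if v <= threshold), key=fsfs.get)
insignificant = sorted((f for f, v in fsfs.items() if v > threshold), key=fsfs.get)
```

<p>Running this reproduces the classification in Table 10 and the ranking in Table 11: <monospace>significant</monospace> comes out as f9, f4, f3, f6, f8 and <monospace>insignificant</monospace> as f2, f5, f7, f1, ordered by ascending FSFS value.</p>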
<table-wrap id="table-9">
<label>Table 9</label>
<caption>
<title>Calculation of the FSFS metric for each feature, incorporating the record-wise average</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr align="center">
<th><inline-formula id="ieqn-120"><mml:math id="mml-ieqn-120"><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">0</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">n</mml:mi></mml:mrow></mml:munderover><mml:mrow><mml:mi mathvariant="bold-italic">F</mml:mi><mml:mi mathvariant="bold-italic">i</mml:mi></mml:mrow></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-121"><mml:math id="mml-ieqn-121"><mml:mrow><mml:mo>[</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mrow><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">0</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">n</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mi mathvariant="bold-italic">R</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="bold-italic">M</mml:mi><mml:mi mathvariant="bold-italic">a</mml:mi><mml:mi mathvariant="bold-italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">M</mml:mi></mml:mrow></mml:msup><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi mathvariant="bold-italic">j</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mtext mathvariant="bold">1</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mi mathvariant="bold-italic">J</mml:mi><mml:mo>&#x003E;</mml:mo><mml:mi mathvariant="bold-italic">i</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">n</mml:mi></mml:mrow></mml:munderover><mml:mrow><mml:mi mathvariant="bold-italic">R</mml:mi><mml:mi mathvariant="bold-italic">j</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">n</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">r</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">r</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-122"><mml:math id="mml-ieqn-122"><mml:mrow><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">F</mml:mi><mml:mi mathvariant="bold-italic">S</mml:mi><mml:mi mathvariant="bold-italic">F</mml:mi><mml:mi mathvariant="bold-italic">S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi mathvariant="bold-italic">i</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>|</mml:mo><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">F</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mrow><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">0</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">n</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mi mathvariant="bold-italic">R</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="bold-italic">M</mml:mi><mml:mi mathvariant="bold-italic">a</mml:mi><mml:mi mathvariant="bold-italic">x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">M</mml:mi></mml:mrow></mml:msup><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi mathvariant="bold-italic">j</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mtext mathvariant="bold">1</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mi mathvariant="bold-italic">J</mml:mi><mml:mo>&#x003E;</mml:mo><mml:mi mathvariant="bold-italic">i</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">n</mml:mi></mml:mrow></mml:munderover><mml:mrow><mml:mi mathvariant="bold-italic">R</mml:mi><mml:mi mathvariant="bold-italic">j</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mi mathvariant="bold-italic">n</mml:mi><mml:mrow><mml:mi 
mathvariant="bold-italic">r</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">m</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">r</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>]</mml:mo></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:math></inline-formula></th>
</tr>
</thead>
<tbody>
<tr align="center">
<td><inline-formula id="ieqn-123"><mml:math id="mml-ieqn-123"><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; 3088</td>
<td><inline-formula id="ieqn-124"><mml:math id="mml-ieqn-124"><mml:mrow><mml:mo>[</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mrow><mml:mn>19</mml:mn><mml:mo>,</mml:mo><mml:mspace width="negativethinmathspace" /><mml:mn>670</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>M</mml:mi><mml:mi>a</mml:mi><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mn>82</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mn>699</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-125"><mml:math id="mml-ieqn-125"><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; |3088<italic>&#x2013;</italic>28.06| &#x003D; 3059.94</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-126"><mml:math id="mml-ieqn-126"><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; 2191</td>
<td>28.06</td>
<td><inline-formula id="ieqn-127"><mml:math id="mml-ieqn-127"><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; |2191<italic>&#x2013;</italic>28.06| &#x003D; 2162.94</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-128"><mml:math id="mml-ieqn-128"><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; 1993</td>
<td>28.06</td>
<td><inline-formula id="ieqn-129"><mml:math id="mml-ieqn-129"><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; |1993<italic>&#x2013;</italic>28.06| &#x003D; 1964.94</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-130"><mml:math id="mml-ieqn-130"><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>4</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; 1962</td>
<td>28.06</td>
<td><inline-formula id="ieqn-131"><mml:math id="mml-ieqn-131"><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>4</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; |1962<italic>&#x2013;</italic>28.06| &#x003D; 1933.94</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-132"><mml:math id="mml-ieqn-132"><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; 2248</td>
<td>28.06</td>
<td><inline-formula id="ieqn-133"><mml:math id="mml-ieqn-133"><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; |2248<italic>&#x2013;</italic>28.06| &#x003D; 2219.94</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-134"><mml:math id="mml-ieqn-134"><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>6</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; 2000</td>
<td>28.06</td>
<td><inline-formula id="ieqn-135"><mml:math id="mml-ieqn-135"><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>6</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; |2000<italic>&#x2013;</italic>28.06| &#x003D; 1971.94</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-136"><mml:math id="mml-ieqn-136"><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>7</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; 2403</td>
<td>28.06</td>
<td><inline-formula id="ieqn-137"><mml:math id="mml-ieqn-137"><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>7</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; |2403<italic>&#x2013;</italic>28.06| &#x003D; 2374.94</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-138"><mml:math id="mml-ieqn-138"><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>8</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; 2004</td>
<td>28.06</td>
<td><inline-formula id="ieqn-139"><mml:math id="mml-ieqn-139"><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>8</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; |2004<italic>&#x2013;</italic>28.06| &#x003D; 1975.94</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-140"><mml:math id="mml-ieqn-140"><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>9</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; 1111</td>
<td>28.06</td>
<td><inline-formula id="ieqn-141"><mml:math id="mml-ieqn-141"><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>9</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; |1111<italic>&#x2013;</italic>28.06| &#x003D; 1082.94</td>
</tr>
</tbody>
</table>
</table-wrap><table-wrap id="table-10">
<label>Table 10</label>
<caption>
<title>Calculation of similarity based on the proposed FSFS theory applied to the dataset</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr align="center">
<th><italic>F</italic><sub><italic>i</italic></sub></th>
<th><inline-formula id="ieqn-142"><mml:math id="mml-ieqn-142"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">1</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-143"><mml:math id="mml-ieqn-143"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">2</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-144"><mml:math id="mml-ieqn-144"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">3</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-145"><mml:math id="mml-ieqn-145"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">4</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-146"><mml:math id="mml-ieqn-146"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">5</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-147"><mml:math id="mml-ieqn-147"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">6</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-148"><mml:math id="mml-ieqn-148"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">7</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-149"><mml:math id="mml-ieqn-149"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">8</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-150"><mml:math id="mml-ieqn-150"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">9</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
</tr>
</thead>
<tbody>
<tr align="center">
<td><inline-formula id="ieqn-151"><mml:math id="mml-ieqn-151"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>3088</td>
<td>2191</td>
<td>1993</td>
<td>1962</td>
<td>2248</td>
<td>2000</td>
<td>2403</td>
<td>2004</td>
<td>1111</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-152"><mml:math id="mml-ieqn-152"><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-153"><mml:math id="mml-ieqn-153"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-154"><mml:math id="mml-ieqn-154"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-155"><mml:math id="mml-ieqn-155"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-156"><mml:math id="mml-ieqn-156"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>4</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>4</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-157"><mml:math id="mml-ieqn-157"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-158"><mml:math id="mml-ieqn-158"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>6</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>6</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-159"><mml:math id="mml-ieqn-159"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>7</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>7</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-160"><mml:math id="mml-ieqn-160"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>8</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>8</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-161"><mml:math id="mml-ieqn-161"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>9</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>9</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-162"><mml:math id="mml-ieqn-162"><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>i</mml:mi></mml:math></inline-formula></td>
<td>3059.94</td>
<td>2162.94</td>
<td>1964.94</td>
<td>1933.94</td>
<td>2219.94</td>
<td>1971.94</td>
<td>2374.94</td>
<td>1975.94</td>
<td>1082.94</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-163"><mml:math id="mml-ieqn-163"><mml:mfrac><mml:mrow><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mfrac></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-164"><mml:math id="mml-ieqn-164"><mml:mfrac><mml:mrow><mml:mn>18</mml:mn><mml:mo>,</mml:mo><mml:mn>747.46</mml:mn></mml:mrow><mml:mn>9</mml:mn></mml:mfrac></mml:math></inline-formula></td>
<td>2083.05</td>
<td>2083.05</td>
<td>2083.05</td>
<td>2083.05</td>
<td>2083.05</td>
<td>2083.05</td>
<td>2083.05</td>
<td>2083.05</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-165"><mml:math id="mml-ieqn-165"><mml:mfrac><mml:mrow><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mfrac><mml:mo>&#x2265;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula></td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>&#x221A;</td>
<td>&#x221A;</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-166"><mml:math id="mml-ieqn-166"><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x003E;</mml:mo><mml:mfrac><mml:mrow><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mfrac></mml:math></inline-formula></td>
<td>&#x221A;</td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>&#x221A;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
</tr>
</tbody>
</table>
</table-wrap><fig id="fig-7">
<label>Figure 7</label>
<caption>
<title>Relationship between the sum and FSFS values for similarity</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_64872-fig-7.tif"/>
</fig><table-wrap id="table-11">
<label>Table 11</label>
<caption>
<title>Features ranked by ascending FSFS value, from most to least important</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr align="center">
<th><italic>F</italic> <sub><italic>i</italic></sub> &#x2191;</th>
<th><inline-formula id="ieqn-167"><mml:math id="mml-ieqn-167"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">9</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-168"><mml:math id="mml-ieqn-168"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">4</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-169"><mml:math id="mml-ieqn-169"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">3</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-170"><mml:math id="mml-ieqn-170"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">6</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-171"><mml:math id="mml-ieqn-171"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">8</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-172"><mml:math id="mml-ieqn-172"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">2</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-173"><mml:math id="mml-ieqn-173"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">5</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-174"><mml:math id="mml-ieqn-174"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">7</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-175"><mml:math id="mml-ieqn-175"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">1</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
</tr>
</thead>
<tbody>
<tr align="center">
<td>Name (<inline-formula id="ieqn-176"><mml:math id="mml-ieqn-176"><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>)</td>
<td>Mitoses</td>
<td>Marginal Adhesion</td>
<td>Cell Shape</td>
<td>Bare Nuclei</td>
<td>Normal Nucleoli</td>
<td>Cell Size</td>
<td>Single Epithelial Cell Size</td>
<td>Bland Chromatin</td>
<td>Clump Thickness</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-177"><mml:math id="mml-ieqn-177"><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>i</mml:mi></mml:math></inline-formula>&#x2191;</td>
<td>1082.94</td>
<td>1933.94</td>
<td>1964.94</td>
<td>1971.94</td>
<td>1975.94</td>
<td>2162.94</td>
<td>2219.94</td>
<td>2374.94</td>
<td>3059.94</td>
</tr>
<tr align="center">
<td></td>
<td align="center" colspan="5">Most important features</td>
<td align="center" colspan="4">Less important features</td>
</tr>
</tbody>
</table>
</table-wrap><fig id="fig-8">
<label>Figure 8</label>
<caption>
<title>Distribution of data classified as significant and insignificant by the FSFS method</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_64872-fig-8.tif"/>
</fig><fig id="fig-9">
<label>Figure 9</label>
<caption>
<title>Rank of features importance in Breast Cancer Wisconsin (Original) dataset</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_64872-fig-9.tif"/>
</fig><table-wrap id="table-12">
<label>Table 12</label>
<caption>
<title>Proposed approach vs. other feature-selection techniques: a comparative analysis</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col/>
<col align="center"/>
<col align="center"/>
<col/>
</colgroup>
<thead>
<tr align="center">
<th>Ref.</th>
<th align="center">Feature Selection Approach</th>
<th align="center">Algorithm</th>
<th>Dataset</th>
<th align="center">Selected Features</th>
<th align="center">Name of Selected Features</th>
<th>Accuracy (%)</th>
</tr>
</thead>
<tbody>
<tr align="center">
<td rowspan="3">[<xref ref-type="bibr" rid="ref-52">52</xref>&#x2013;<xref ref-type="bibr" rid="ref-55">55</xref>]</td>
<td>Genetic approach</td>
<td>Decision tree (DT)</td>
<td rowspan="4"><bold>Breast Cancer Wisconsin (Original)</bold></td>
<td>6 out of 9</td>
<td>N/A</td>
<td>94.84</td>
</tr>
<tr align="center">
<td>Exhaustive approach</td>
<td>DT</td>
<td>6 out of 9</td>
<td>N/A</td>
<td>95.13</td>
</tr>
<tr align="center">
<td>Greedy stepwise approach</td>
<td>DT</td>
<td>7 out of 9</td>
<td>N/A</td>
<td>93.99</td>
</tr>
<tr align="center">
<td><bold>This study (FSFS)</bold></td>
<td><bold>FSFS approach</bold></td>
<td><bold>RF</bold></td>
<td><bold>5 out of 9</bold></td>
<td><inline-formula id="ieqn-178"><mml:math id="mml-ieqn-178"><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">9</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">4</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">3</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">6</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">8</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td><bold>97.81</bold></td>
</tr>
</tbody>
</table>
</table-wrap><fig id="fig-10">
<label>Figure 10</label>
<caption>
<title>Comparative analysis of the proposed FSFS theory against other FS techniques</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_64872-fig-10.tif"/>
</fig>
</sec>
<sec id="s4_3">
<label>4.3</label>
<title>Third Scenario</title>
<p>In this scenario, a cybersecurity-focused dataset, the KDD Cup 1999 dataset, was employed. This dataset, consisting of 41 features and 4,000,000 instances, was subjected to the proposed FSFS method. The FSFS metric was calculated for each feature, and an average FSFS score was determined for each record, as presented in <xref ref-type="table" rid="table-13">Table 13</xref>. The proposed FSFS theory selected only eight of the 41 features, prioritizing those with the highest significance and correlation. The eight features were selected using the same mechanism as in the first and second scenarios, with the FSFS calculations performed step by step as outlined earlier. The features identified as most impactful, ranked from highest to lowest according to the FSFS theory, are <italic>f</italic><sub>4</sub>, <italic>f</italic><sub>1</sub>, <italic>f</italic><sub>23</sub>, <italic>f</italic><sub>9</sub>, <italic>f</italic><sub>22</sub>, <italic>f</italic><sub>3</sub>, <italic>f</italic><sub>11</sub>, and <italic>f</italic><sub>19</sub>. Feature <italic>f</italic><sub>4</sub> (flag) is the most significant, while <italic>f</italic><sub>19</sub> (num_access_files) is the least, as depicted in <xref ref-type="fig" rid="fig-11">Fig. 11</xref>. The corresponding labels for these eight features are flag, duration, count, urgent, is_guest_login, service, num_failed_logins, and num_access_files, respectively. Notably, the remaining 33 features were deemed insignificant and discarded by the FSFS approach. <xref ref-type="table" rid="table-14">Table 14</xref> presents a comparative analysis of the proposed FSFS method and theory against several other feature selection techniques, including the CS Approach, Gain Ratio Approach, and Filtered Subset Eval Approach. 
The number of selected features varies among these methods, influenced by their underlying principles, methodologies, and statistical formulations. Notably, the FSFS method identified only eight features, namely <italic>f</italic><sub>4</sub>, <italic>f</italic><sub>1</sub>, <italic>f</italic><sub>23</sub>, <italic>f</italic><sub>9</sub>, <italic>f</italic><sub>22</sub>, <italic>f</italic><sub>3</sub>, <italic>f</italic><sub>11</sub>, and <italic>f</italic><sub>19</sub>, as highly significant and impactful on the model&#x2019;s outcomes. The integration of the FSFS approach with the RF classifier yielded robust results on the KDD CUP 1999 dataset. Specifically, the model achieved an accuracy of 98.63%, with precision, recall, and F1-score values of 97.9%, 98.3%, and 98.5%, respectively. This demonstrates the FSFS method&#x2019;s ability to select a minimal feature subset while consistently outperforming the others in terms of evaluation metrics. <xref ref-type="fig" rid="fig-12">Fig. 12</xref> presents a comparative analysis of the proposed FSFS theory against other feature selection techniques, visualizing the accuracy achieved and the number of features selected by each method. <xref ref-type="table" rid="table-15">Table 15</xref> presents a comparison of several data processing techniques applied to a standard dataset, all implemented on the same Deep Neural Network (DNN) model, which demonstrated superior performance compared to other models for the selected data. The results reveal that applying encoding approaches integrated with normalization techniques had a positive and direct impact on the model outputs in the first scenario. In contrast, when the one-hot encoding technique was used without normalization, the results deteriorated due to an increase in data dimensionality, which led to misclassification, as observed in the second scenario. 
In the third scenario, which aligns with the workflow proposed in this study, encoding mechanisms were integrated with normalization techniques and the proposed FSFS approach. This combination produced notable performance differences relative to the first scenario, with improvements attributed to the selection of reliable features based on similarity. These reliable features contributed to achieving balanced outcomes. The findings underscore that encoding and normalization techniques play a crucial role in improving model performance. However, achieving high accuracy alone is insufficient unless the results are reliable, balanced, and reasonable, preserving the data most critical to the model&#x2019;s outputs. Here, we summarize the observations and results of the proposed FSFS method and compare it to similar techniques. The primary objective of the proposed FSFS method is to identify the features with the most significant impact, whether positive or negative, on the model outputs. These findings increase the reliability of the model outcomes, making them dependable for decision-making processes. Moreover, the FSFS technique is highly interpretable, making it a strong foundation for explainable AI (XAI). It also contributes to the overall improvement of data-driven models&#x2019; performance. While FSFS achieves reasonably good accuracy compared to its counterparts, it stands out in its interpretability.</p>
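<p>The selection mechanism applied in all three scenarios can be sketched as a small, dataset-agnostic routine. This is a reconstruction from the formulas in Table 9, where the record-wise term is (total sum &#x2212; maximum record sum) / (<italic>n</italic><sub><italic>r</italic></sub> &#x2212; 1), i.e., (19,670 &#x2212; 82)/(699 &#x2212; 1) = 28.06 for the second scenario; the function name and the toy matrix below are illustrative assumptions, not from the paper.</p>

```python
def fsfs_select(data):
    """Sketch of FSFS selection for a records x features matrix (list of rows).

    Reconstruction of the paper's rule: each feature's score is the absolute
    difference between its column sum and a shared record-wise term, and
    features scoring at or below the mean score are kept as significant.
    """
    record_sums = [sum(row) for row in data]
    # Record-wise term, as in Table 9: (total sum - max record sum) / (n_r - 1).
    term = (sum(record_sums) - max(record_sums)) / (len(data) - 1)
    n_feat = len(data[0])
    col_sums = [sum(row[j] for row in data) for j in range(n_feat)]
    scores = [abs(s - term) for s in col_sums]       # (FSFS)_i per feature
    threshold = sum(scores) / len(scores)            # mean FSFS value
    kept = [j for j, v in enumerate(scores) if v <= threshold]
    return scores, threshold, kept

# Toy 3-record, 2-feature matrix (illustrative data only).
scores, threshold, kept = fsfs_select([[1, 10], [2, 20], [3, 30]])
```

<p>On the real KDD Cup 1999 matrix, <monospace>kept</monospace> would hold the indices of the eight retained features; the retained subset is then passed to the downstream RF classifier.</p>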
<table-wrap id="table-13">
<label>Table 13</label>
<caption>
<title>The FSFS metric calculation</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col align="center"/>
<col/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr align="center">
<th><italic>F</italic> <sub><italic>i</italic></sub> &#x2191;</th>
<th><inline-formula id="ieqn-179"><mml:math id="mml-ieqn-179"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">4</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-180"><mml:math id="mml-ieqn-180"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">1</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-181"><mml:math id="mml-ieqn-181"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">23</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-182"><mml:math id="mml-ieqn-182"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">9</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-183"><mml:math id="mml-ieqn-183"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">22</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-184"><mml:math id="mml-ieqn-184"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">3</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-185"><mml:math id="mml-ieqn-185"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">11</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-186"><mml:math id="mml-ieqn-186"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">19</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-187"><mml:math id="mml-ieqn-187"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">21</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-188"><mml:math id="mml-ieqn-188"><mml:mo>&#x2211;</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">f</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">12</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></th>
</tr>
</thead>
<tbody>
<tr align="center">
<td>Name (<inline-formula id="ieqn-189"><mml:math id="mml-ieqn-189"><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>)</td>
<td>Flag</td>
<td>Duration</td>
<td>Count</td>
<td>Urgent</td>
<td>Is guest login</td>
<td>Service</td>
<td>Failed logins</td>
<td>Access files</td>
<td>Is host login</td>
<td>Logged in</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-190"><mml:math id="mml-ieqn-190"><mml:mrow><mml:mo>(</mml:mo><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mi>F</mml:mi><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>i</mml:mi></mml:math></inline-formula>&#x2191;</td>
<td>78.01</td>
<td>82.25</td>
<td>97.17</td>
<td>114.32</td>
<td>121.01</td>
<td>137.41</td>
<td>151.73</td>
<td>182.11</td>
<td>201.03</td>
<td>217.91</td>
</tr>
<tr align="center">
<td></td>
<td align="center" colspan="8">Most important features</td>
<td align="center" colspan="2">Less important</td>
</tr>
</tbody>
</table>
</table-wrap><fig id="fig-11">
<label>Figure 11</label>
<caption>
<title>Features ranked by impact, from highest to lowest, according to the FSFS approach</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_64872-fig-11.tif"/>
</fig><table-wrap id="table-14">
<label>Table 14</label>
<caption>
<title>FSFS vs. other FS techniques: a comparative analysis</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col/>
</colgroup>
<thead>
<tr align="center">
<th>Ref.</th>
<th align="center">FS<break/>Approach</th>
<th>Algorithm</th>
<th>Dataset</th>
<th align="center">Selected<break/>Feature</th>
<th>Name of Selected Features</th>
<th>Accuracy (%)</th>
</tr>
</thead>
<tbody>
<tr align="center">
<td rowspan="3">[<xref ref-type="bibr" rid="ref-52">52</xref>&#x2013;<xref ref-type="bibr" rid="ref-54">54</xref>]</td>
<td>Chi-squared</td>
<td>Na&#x00EF;ve Bayes (NB)</td>
<td rowspan="4">KDD CUP 1999 dataset</td>
<td>30 out of 41</td>
<td><inline-formula id="ieqn-191"><mml:math id="mml-ieqn-191"><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>6</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>4</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>40</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>33</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>41</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>38</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>23</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>37</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>35</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>34</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula><break/><inline-formula id="ieqn-192"><mml:math 
id="mml-ieqn-192"><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>27</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>24</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>29</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>36</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>25</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>13</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>28</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>11</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>39</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>32</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>30</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>18</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula><break/><inline-formula id="ieqn-193"><mml:math id="mml-ieqn-193"><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>9</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>8</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>31</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>10</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>93.209</td>
</tr>
<tr align="center">
<td>Gain ratio</td>
<td>NB</td>
<td>30 out of 41</td>
<td><inline-formula id="ieqn-194"><mml:math id="mml-ieqn-194"><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>11</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>9</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>13</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>8</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>4</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>18</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>28</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>29</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>30</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>12</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>41</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula><break/><inline-formula id="ieqn-195"><mml:math 
id="mml-ieqn-195"><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>21</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>10</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>22</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>27</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>36</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>25</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>14</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>16</mml:mn><mml:mo>,</mml:mo></mml:mrow></mml:msub><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>40</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>24</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>6</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula><break/><inline-formula id="ieqn-196"><mml:math id="mml-ieqn-196"><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>35</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>38</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>26</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>34</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>89.037</td>
</tr>
<tr align="center">
<td>Filtered subset eval</td>
<td>Decision tree</td>
<td>7 out of 41</td>
<td><inline-formula id="ieqn-197"><mml:math id="mml-ieqn-197"><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>4</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>6</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>24</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>36</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td>97.026</td>
</tr>
<tr align="center">
<td><bold>This study (FSFS)</bold></td>
<td><bold>FSFS approach</bold></td>
<td><bold>RF</bold></td>
<td><bold>8 out of 41</bold></td>
<td><inline-formula id="ieqn-198"><mml:math id="mml-ieqn-198"><mml:msub><mml:mi mathvariant="bold-italic">F</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">4</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">F</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">1</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">F</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">23</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">F</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">9</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">F</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">22</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">F</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">3</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">F</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">11</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mi mathvariant="bold-italic">F</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">19</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td><bold>98.63</bold></td>
</tr>
</tbody>
</table>
</table-wrap><fig id="fig-12">
<label>Figure 12</label>
<caption>
<title>Comparative analysis of the proposed FSFS theory against other feature selection techniques</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_64872-fig-12.tif"/>
</fig><table-wrap id="table-15">
<label>Table 15</label>
<caption>
<title>Comparison of data processing techniques and accuracy on a standard dataset</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col align="center"/>
<col align="center"/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr align="center">
<th>Features&#x2192;Datasets&#x2193;</th>
<th>Models</th>
<th align="center">Replacement Encoding</th>
<th align="center">One Hot Encoding</th>
<th align="center">Min-Max Normalized</th>
<th align="center">Selected Features</th>
<th>FSFS</th>
<th>Accuracy (<inline-formula id="ieqn-199"><mml:math id="mml-ieqn-199"><mml:mi mathvariant="bold">&#x0025;</mml:mi></mml:math></inline-formula>)</th>
<th>F1-Score (<inline-formula id="ieqn-200"><mml:math id="mml-ieqn-200"><mml:mi mathvariant="bold">&#x0025;</mml:mi></mml:math></inline-formula>)</th>
</tr>
</thead>
<tbody>
<tr align="center">
<td align="center" colspan="9"><bold>First scenario</bold></td>
</tr>
<tr align="center">
<td>NSL-KDD</td>
<td rowspan="3">DNN</td>
<td rowspan="3">Yes</td>
<td rowspan="3">No</td>
<td rowspan="3">Yes</td>
<td>Full (41)</td>
<td rowspan="3">No</td>
<td>98.17</td>
<td>98.2</td>
</tr>
<tr align="center">
<td>UNSW-NB15</td>
<td>Full (42)</td>
<td>95.86</td>
<td>95.5</td>
</tr>
<tr align="center">
<td>Edge-IIoT</td>
<td>Full (61)</td>
<td>94.62</td>
<td>94.2</td>
</tr>
<tr align="center">
<td align="center" colspan="9"><bold>Second scenario</bold></td>
</tr>
<tr align="center">
<td>NSL-KDD</td>
<td rowspan="3">DNN</td>
<td rowspan="3">No</td>
<td>41 became 122 features</td>
<td rowspan="3">No</td>
<td>Full (41)</td>
<td rowspan="3">No</td>
<td>94.68</td>
<td>94.5</td>
</tr>
<tr align="center">
<td>UNSW-NB15</td>
<td>42 became 194 features</td>
<td>Full (42)</td>
<td>93.15</td>
<td>92.4</td>
</tr>
<tr align="center">
<td>Edge-IIoT dataset</td>
<td>61 became 218 features</td>
<td>Full (61)</td>
<td>86.11</td>
<td>86.4</td>
</tr>
<tr align="center">
<td align="center" colspan="9"><bold>Third scenario</bold></td>
</tr>
<tr align="center">
<td>NSL-KDD</td>
<td rowspan="3">DNN</td>
<td rowspan="3">Yes</td>
<td rowspan="3">No</td>
<td rowspan="3">Yes</td>
<td>Selected (16)</td>
<td rowspan="3">Yes</td>
<td>98.94</td>
<td>98.7</td>
</tr>
<tr align="center">
<td>UNSW-NB15</td>
<td>Selected (20)</td>
<td>94.27</td>
<td>92.2</td>
</tr>
<tr align="center">
<td>Edge-IIoT dataset</td>
<td>Selected (17)</td>
<td>98.46</td>
<td>98.7</td>
</tr>
</tbody>
</table>
</table-wrap>
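The preprocessing combinations compared in Table 15 (replacement encoding plus min-max normalization, as in the first and third scenarios) can be sketched in a few lines. This is an illustrative snippet only; the helper names are our own, and the authors' pipeline may differ in detail. The key contrast with one-hot encoding is that replacement encoding keeps the number of columns fixed instead of expanding 41 features into 122.

```python
def replacement_encode(column):
    """Map each distinct categorical value to a small integer.

    Replacement (label) encoding keeps dimensionality fixed, unlike
    one-hot encoding, which adds one column per category.
    """
    mapping = {v: i for i, v in enumerate(dict.fromkeys(column))}
    return [mapping[v] for v in column], mapping

def min_max(values):
    """Scale numeric values into [0, 1] (min-max normalization)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # guard against constant columns
    return [(v - lo) / span for v in values]

# Hypothetical protocol-type column, as found in the KDD-style datasets.
encoded, mapping = replacement_encode(["tcp", "udp", "tcp", "icmp"])
scaled = min_max(encoded)
```

Here `encoded` is `[0, 1, 0, 2]` and `scaled` is `[0.0, 0.5, 0.0, 1.0]`: one column in, one column out, already in the [0, 1] range expected by the DNN.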
<p>FS techniques range from traditional to modern, each with specific goals, and they differ in how they operate and select the most suitable features. Each method selects different features according to its underlying mechanism, yet certain features are consistently ranked as important across all of the techniques compared here (FSFS, GA, CC, etc.). This indicates that differences in the selected subsets reflect the distinct statistical principles of each technique. The most suitable features are not necessarily the most important ones; they are often those best tailored to achieving high performance and evaluation results. Conversely, agreement on certain features across techniques suggests common factors in their statistical processes and selection criteria, possibly reflecting how feature similarity is computed. Some techniques focus on improving system performance by reducing data dimensionality, such as the CC and CS methods highlighted and compared in the first and third scenarios. Others prioritize high evaluation results through optimal solutions, such as the GA techniques discussed in the first and second scenarios, while still others aim to select noise-free features. Methods focused on system performance often cannot choose the features most impactful on model outputs; conversely, less impactful features may sometimes matter more for achieving high performance. In both cases, reducing the number of features generally enhances performance. Some methods prioritize high evaluation results regardless of feature importance, searching the feature space for optimal solutions; this can yield high accuracy while overlooking crucial features, and such a search does not always guarantee high results. Therefore, techniques that prioritize performance do not necessarily select the most important features; they focus on optimization and speed without weighing the role of feature importance in ensuring reliable outcomes. 
It is preferable to choose features that provide reliable results rather than judging by accuracy alone; in this study, selecting the most suitable and important features guarantees reliable outcomes with reasonable performance and evaluation results. The computational complexity of the FSFS approach, which uses the IQR method for outlier detection, is driven primarily by the sorting required for each feature. For a dataset with <italic>f</italic> features and <italic>n</italic> records, sorting each feature to compute its quartiles costs O(<italic>n</italic> log <italic>n</italic>), giving an overall time complexity of O(<italic>f</italic> &#x00B7; <italic>n</italic> log <italic>n</italic>). Additional operations such as computing feature sums and correlations contribute only linear terms and are dominated by the sorting step. In terms of space, the method requires O(<italic>f</italic> &#x00B7; <italic>n</italic>) to store the dataset and minimal additional space for intermediate computations. This analysis indicates that while FSFS scales to large datasets, its performance may degrade on very large datasets due to the cost of sorting. The method emphasizes the dependencies and interactions among features, as characterized by similarity measures, to ensure reliable model predictions. The proposed FSFS method addresses the challenge of ensuring both performance and reliability by prioritizing the identification of similarities among features: it highlights correlations between data points and selects the most relevant features, ensuring that critical information is preserved. By focusing on these relationships, FSFS enhances the reliability and trustworthiness of the results while maintaining reasonable performance. Continuous innovation in FS techniques thus remains essential for adapting to evolving and diverse datasets.</p>
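The per-feature IQR step that dominates this complexity can be sketched as follows. This is a minimal pure-Python illustration assuming the standard 1.5 * IQR fence; the authors' exact quartile convention and fence multiplier are not specified here, so both are assumptions.

```python
def quartiles(values):
    """Compute Q1 and Q3 via the median-split method.

    The sort on each call is what makes the per-feature cost
    O(n log n), and hence O(f * n log n) over f features.
    """
    s = sorted(values)
    n = len(s)
    mid = n // 2
    lower, upper = s[:mid], s[mid + n % 2:]

    def med(v):
        m = len(v) // 2
        return v[m] if len(v) % 2 else (v[m - 1] + v[m]) / 2

    return med(lower), med(upper)

def iqr_filter(values, k=1.5):
    """Drop records outside [Q1 - k*IQR, Q3 + k*IQR], the usual IQR fence."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    lo, hi = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lo <= v <= hi]
```

For example, `iqr_filter([1, 2, 3, 4, 5, 100])` drops the outlying 100: Q1 = 2 and Q3 = 5, so the fences are [-2.5, 9.5].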
<p>One limitation of the proposed FSFS method is that it currently applies only to numerical data; however, it can be combined with data encoding techniques that transform other data types into structured numerical forms suitable for statistical operations. FSFS is also inherently interpretable: it operates by calculating the differences between features and their similarity ratios, so features can be compared directly on the basis of how similar they are. As a future direction, we aim to develop this similarity-based evidence and feature analysis further, so that FSFS can serve as a foundation and gateway for interpretable artificial intelligence.</p>
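Since the exact FSFS similarity formula is not reproduced in this section, the sketch below only illustrates the general idea stated in the text: scoring each feature by its closeness to the approximate per-record average. The `similarity_ratio` function and its normalization are our own assumptions for illustration, not the paper's definition.

```python
def similarity_ratio(feature, record_means):
    """Hypothetical similarity score in [0, 1]: the mean relative closeness
    of a feature's values to the per-record approximate average
    (1.0 = the feature coincides with the record averages)."""
    sims = []
    for x, m in zip(feature, record_means):
        denom = max(abs(x), abs(m)) or 1.0  # avoid division by zero
        sims.append(1.0 - abs(x - m) / denom)
    return sum(sims) / len(sims)

# Toy dataset: 2 records with 3 features each.
records = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
means = [sum(r) / len(r) for r in records]          # approximate record averages
col = lambda j: [r[j] for r in records]             # extract feature column j
scores = [similarity_ratio(col(j), means) for j in range(3)]
```

In this toy example the middle feature equals the record averages exactly, so it scores 1.0 and would be ranked as the most similar (and, under this reading of FSFS, the most reliable) feature.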
</sec>
</sec>
<sec id="s5">
<label>5</label>
<title>Conclusion</title>
<p>Farea Similarity for Feature Selection (FSFS) is introduced as a novel statistical mechanism for feature selection. FSFS is an automated statistical method designed to identify and classify the most significant features in large-scale and data-driven AI models. FSFS not only demonstrated reliable results by selecting the features with the greatest impact on model outcomes but also effectively reduced data dimensionality without compromising accuracy. By calculating the similarity ratio between features and the approximate average of each record, FSFS systematically excluded outliers, improving the fairness and trustworthiness of feature selection.</p>
<p>In comparison to existing approaches, FSFS achieved robust, balanced evaluation metrics by fairly identifying the most important features, ensuring that those with the highest similarity were selected while irrelevant features were discarded. Without the FSFS approach, the accuracy of the best classifiers reached 60.00%, 95.13%, 97.02%, 98.17%, 95.86%, and 94.62% on the Experimental dataset, Breast Cancer Wisconsin (Original), KDD CUP 1999, NSL-KDD, UNSW-NB15, and Edge-IIoT datasets, respectively. Integrating the FSFS method with data normalization, encoding, data balancing, and feature importance selection improved accuracy to 100.00%, 97.81%, 98.63%, 98.94%, 94.27%, and 98.46%. Although results fluctuated across datasets, rigorous testing against existing feature selection techniques, including CS, CC, and GA, demonstrated that FSFS excels at selecting features strongly correlated with model outcomes, enhancing its reliability and effectiveness. Notably, the predictive power afforded by the interplay of feature interactions and dependencies underscores the importance of explicitly modelling these relationships&#x2014;a critical gap addressed by our FSFS approach. Because FSFS uses the IQR for outlier detection, its cost is dominated by per-feature sorting, making it suitable for large datasets but potentially challenging for very large ones; replacing the sort with a linear-scan quantile approximation could further improve performance.</p>
<p>Extensive validation demonstrates the applicability of this method in data-driven domains&#x2014;such as cybersecurity and healthcare&#x2014;where informed and interpretable insights are paramount for reliable decision-making. By elucidating inter-feature relationships and providing a clear rationale for feature importance, FSFS establishes a robust foundation for transparent and accountable AI models, thereby facilitating their deployment in high-stakes environments.</p>
</sec>
</body>
<back>
<ack>
<p>Not applicable.</p>
</ack>
<sec>
<title>Funding Statement</title>
<p>The authors received no specific funding for this study.</p>
</sec>
<sec>
<title>Author Contributions</title>
<p>All authors&#x2014;Ali Hamid Farea, Iman Askerzade, Omar H. Alhazmi, and Sava&#x015F; Takan&#x2014;contributed to the conceptualization, literature review, methodology, data curation, implementation, validation, writing, and editing of the manuscript. All authors reviewed the results and approved the final version of the manuscript.</p>
</sec>
<sec sec-type="data-availability">
<title>Availability of Data and Materials</title>
<p>The data are openly available in a public repository and are cited in the references.</p>
</sec>
<sec>
<title>Ethics Approval</title>
<p>Not applicable.</p>
</sec>
<sec sec-type="COI-statement">
<title>Conflicts of Interest</title>
<p>The authors declare no conflicts of interest to report regarding the present study.</p>
</sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Bouktif</surname> <given-names>S</given-names></string-name>, <string-name><surname>Fiaz</surname> <given-names>A</given-names></string-name>, <string-name><surname>Ouni</surname> <given-names>A</given-names></string-name>, <string-name><surname>Serhani</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Optimal deep learning LSTM model for electric load forecasting using feature selection and genetic algorithm: comparison with machine learning approaches</article-title>. <source>Energies</source>. <year>2018</year>;<volume>11</volume>(<issue>7</issue>):<fpage>1636</fpage>. doi:<pub-id pub-id-type="doi">10.3390/en11071636</pub-id>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Philip Chen</surname> <given-names>CL</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>CY</given-names></string-name></person-group>. <article-title>Data-intensive applications, challenges, techniques and technologies: a survey on Big Data</article-title>. <source>Inf Sci</source>. <year>2014</year>;<volume>275</volume>(<issue>4</issue>):<fpage>314</fpage>&#x2013;<lpage>47</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.ins.2014.01.015</pub-id>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zebari</surname> <given-names>R</given-names></string-name>, <string-name><surname>Abdulazeez</surname> <given-names>A</given-names></string-name>, <string-name><surname>Zeebaree</surname> <given-names>D</given-names></string-name>, <string-name><surname>Zebari</surname> <given-names>D</given-names></string-name>, <string-name><surname>Saeed</surname> <given-names>J</given-names></string-name></person-group>. <article-title>A comprehensive review of dimensionality reduction techniques for feature selection and feature extraction</article-title>. <source>J Appl Sci Technol Trends</source>. <year>2020</year>;<volume>1</volume>(<issue>1</issue>):<fpage>56</fpage>&#x2013;<lpage>70</lpage>. doi:<pub-id pub-id-type="doi">10.38094/jastt1224</pub-id>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Leskovec</surname> <given-names>J</given-names></string-name>, <string-name><surname>Rajaraman</surname> <given-names>A</given-names></string-name>, <string-name><surname>Ullman</surname> <given-names>JD</given-names></string-name></person-group>. <source>Mining of massive data sets</source>. <publisher-loc>Cambridge, UK</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>; <year>2020</year>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Cheng</surname> <given-names>X</given-names></string-name></person-group>. <article-title>A comprehensive study of feature selection techniques in machine learning models</article-title>. <source>Ins Comput Signal Syst</source>. <year>2024</year>;<volume>1</volume>(<issue>1</issue>):<fpage>65</fpage>&#x2013;<lpage>78</lpage>. doi:<pub-id pub-id-type="doi">10.70088/xpf2b276</pub-id>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Theng</surname> <given-names>D</given-names></string-name>, <string-name><surname>Bhoyar</surname> <given-names>KK</given-names></string-name></person-group>. <article-title>Feature selection techniques for machine learning: a survey of more than two decades of research</article-title>. <source>Knowl Inf Syst</source>. <year>2024</year>;<volume>66</volume>(<issue>3</issue>):<fpage>1575</fpage>&#x2013;<lpage>637</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s10115-023-02010-5</pub-id>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Schwalbe</surname> <given-names>G</given-names></string-name>, <string-name><surname>Finzel</surname> <given-names>B</given-names></string-name></person-group>. <article-title>A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts</article-title>. <source>Data Min Knowl Discov</source>. <year>2024</year>;<volume>38</volume>(<issue>5</issue>):<fpage>3043</fpage>&#x2013;<lpage>101</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s10618-022-00867-8</pub-id>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Munirathinam</surname> <given-names>DR</given-names></string-name>, <string-name><surname>Ranganadhan</surname> <given-names>M</given-names></string-name></person-group>. <article-title>A new improved filter-based feature selection model for high-dimensional data</article-title>. <source>J Supercomput</source>. <year>2020</year>;<volume>76</volume>(<issue>8</issue>):<fpage>5745</fpage>&#x2013;<lpage>62</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s11227-019-02975-7</pub-id>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Kira</surname> <given-names>K</given-names></string-name>, <string-name><surname>Rendell</surname> <given-names>LA</given-names></string-name></person-group>. <article-title>The feature selection problem: traditional methods and a new algorithm</article-title>. In: <conf-name>Proceedings of the Tenth National Conference on Artificial Intelligence</conf-name>; <year>1992 Jul 12&#x2013;16</year>; <publisher-loc>San Jose, CA, USA</publisher-loc>. p. <fpage>129</fpage>&#x2013;<lpage>34</lpage>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Franke</surname> <given-names>TM</given-names></string-name>, <string-name><surname>Ho</surname> <given-names>T</given-names></string-name>, <string-name><surname>Christie</surname> <given-names>CA</given-names></string-name></person-group>. <article-title>The chi-square test: often used and more often misinterpreted</article-title>. <source>Am J Eval</source>. <year>2012</year>;<volume>33</volume>(<issue>3</issue>):<fpage>448</fpage>&#x2013;<lpage>58</lpage>. doi:<pub-id pub-id-type="doi">10.1177/1098214011426594</pub-id>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Guyon</surname> <given-names>I</given-names></string-name>, <string-name><surname>Elisseeff</surname> <given-names>A</given-names></string-name></person-group>. <article-title>An introduction to variable and feature selection</article-title>. <source>J Mach Learn Res</source>. <year>2003</year>;<volume>3</volume>:<fpage>1157</fpage>&#x2013;<lpage>82</lpage>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Chen</surname> <given-names>XW</given-names></string-name>, <string-name><surname>Jeong</surname> <given-names>JC</given-names></string-name></person-group>. <article-title>Enhanced recursive feature elimination</article-title>. In: <conf-name>Proceedings of the 6th International Conference on Machine Learning and Applications (ICMLA)</conf-name>; <year>2007 Dec 13&#x2013;15</year>; <publisher-loc>Cincinnati, OH, USA</publisher-loc>. p. <fpage>429</fpage>&#x2013;<lpage>35</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ICMLA.2007.35</pub-id>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Tibshirani</surname> <given-names>R</given-names></string-name></person-group>. <article-title>Regression shrinkage and selection via the lasso</article-title>. <source>J R Stat Soc Ser B Stat Methodol</source>. <year>1996</year>;<volume>58</volume>(<issue>1</issue>):<fpage>267</fpage>&#x2013;<lpage>88</lpage>. doi:<pub-id pub-id-type="doi">10.1111/j.2517-6161.1996.tb02080.x</pub-id>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Rashid</surname> <given-names>TA</given-names></string-name>, <string-name><surname>Majidpour</surname> <given-names>J</given-names></string-name>, <string-name><surname>Thinakaran</surname> <given-names>R</given-names></string-name>, <string-name><surname>Batumalay</surname> <given-names>M</given-names></string-name>, <string-name><surname>Dewi</surname> <given-names>DA</given-names></string-name>, <string-name><surname>Hassan</surname> <given-names>BA</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>NSGA-II-DL: metaheuristic optimal feature selection with deep learning framework for HER2 classification in breast cancer</article-title>. <source>IEEE Access</source>. <year>2024</year>;<volume>12</volume>:<fpage>38885</fpage>&#x2013;<lpage>98</lpage>. doi:<pub-id pub-id-type="doi">10.1109/access.2024.3374890</pub-id>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Nayak</surname> <given-names>GS</given-names></string-name>, <string-name><surname>Muniyal</surname> <given-names>B</given-names></string-name>, <string-name><surname>Belavagi</surname> <given-names>MC</given-names></string-name></person-group>. <article-title>Enhancing phishing detection: a machine learning approach with feature selection and deep learning models</article-title>. <source>IEEE Access</source>. <year>2025</year>;<volume>13</volume>(<issue>12</issue>):<fpage>33308</fpage>&#x2013;<lpage>20</lpage>. doi:<pub-id pub-id-type="doi">10.1109/access.2025.3543738</pub-id>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Tang</surname> <given-names>J</given-names></string-name>, <string-name><surname>Alelyani</surname> <given-names>S</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>H</given-names></string-name></person-group>. <article-title>Feature selection for classification: a review</article-title>. <source>Data Classif Algorithms Appl</source>. <year>2014</year>;<volume>37</volume>:<fpage>1</fpage>&#x2013;<lpage>29</lpage>. doi:<pub-id pub-id-type="doi">10.1201/b17320</pub-id>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Sadeghian</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Akbari</surname> <given-names>E</given-names></string-name>, <string-name><surname>Nematzadeh</surname> <given-names>H</given-names></string-name>, <string-name><surname>Motameni</surname> <given-names>H</given-names></string-name></person-group>. <article-title>A review of feature selection methods based on meta-heuristic algorithms</article-title>. <source>J Exp Theor Artif Intell</source>. <year>2025</year>;<volume>37</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>51</lpage>. doi:<pub-id pub-id-type="doi">10.1080/0952813x.2023.2183267</pub-id>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Chen</surname> <given-names>RC</given-names></string-name>, <string-name><surname>Dewi</surname> <given-names>C</given-names></string-name>, <string-name><surname>Huang</surname> <given-names>SW</given-names></string-name>, <string-name><surname>Caraka</surname> <given-names>RE</given-names></string-name></person-group>. <article-title>Selecting critical features for data classification based on machine learning methods</article-title>. <source>J Big Data</source>. <year>2020</year>;<volume>7</volume>(<issue>1</issue>):<fpage>52</fpage>. doi:<pub-id pub-id-type="doi">10.1186/s40537-020-00327-4</pub-id>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Rong</surname> <given-names>M</given-names></string-name>, <string-name><surname>Gong</surname> <given-names>D</given-names></string-name>, <string-name><surname>Gao</surname> <given-names>X</given-names></string-name></person-group>. <article-title>Feature selection and its use in big data: challenges, methods, and trends</article-title>. <source>IEEE Access</source>. <year>2019</year>;<volume>7</volume>:<fpage>19709</fpage>&#x2013;<lpage>25</lpage>. doi:<pub-id pub-id-type="doi">10.1109/access.2019.2894366</pub-id>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Obaid</surname> <given-names>L</given-names></string-name>, <string-name><surname>Hamad</surname> <given-names>K</given-names></string-name>, <string-name><surname>Ali Khalil</surname> <given-names>M</given-names></string-name>, <string-name><surname>Nassif</surname> <given-names>AB</given-names></string-name></person-group>. <article-title>Effect of feature optimization on performance of machine learning models for predicting traffic incident duration</article-title>. <source>Eng Appl Artif Intell</source>. <year>2024</year>;<volume>131</volume>(<issue>6</issue>):<fpage>107845</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.engappai.2024.107845</pub-id>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Rickert</surname> <given-names>CA</given-names></string-name>, <string-name><surname>Henkel</surname> <given-names>M</given-names></string-name>, <string-name><surname>Lieleg</surname> <given-names>O</given-names></string-name></person-group>. <article-title>An efficiency-driven, correlation-based feature elimination strategy for small datasets</article-title>. <source>APL Mach Learn</source>. <year>2023</year>;<volume>1</volume>(<issue>1</issue>):<fpage>016105</fpage>. doi:<pub-id pub-id-type="doi">10.1063/5.0118207</pub-id>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Jia</surname> <given-names>W</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>M</given-names></string-name>, <string-name><surname>Lian</surname> <given-names>J</given-names></string-name>, <string-name><surname>Hou</surname> <given-names>S</given-names></string-name></person-group>. <article-title>Feature dimensionality reduction: a review</article-title>. <source>Complex Intell Syst</source>. <year>2022</year>;<volume>8</volume>(<issue>3</issue>):<fpage>2663</fpage>&#x2013;<lpage>93</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s40747-021-00637-x</pub-id>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Malik</surname> <given-names>HK</given-names></string-name>, <string-name><surname>Al-Anber</surname> <given-names>NJ</given-names></string-name></person-group>. <article-title>Comparison of feature selection and feature extraction role in dimensionality reduction of big data</article-title>. <source>J Tech</source>. <year>2023</year>;<volume>5</volume>(<issue>1</issue>):<fpage>16</fpage>&#x2013;<lpage>24</lpage>. doi:<pub-id pub-id-type="doi">10.51173/jt.v5i1.1027</pub-id>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Abdel Majeed</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Awadalla</surname> <given-names>SS</given-names></string-name>, <string-name><surname>Patton</surname> <given-names>JL</given-names></string-name></person-group>. <article-title>Regression techniques employing feature selection to predict clinical outcomes in stroke</article-title>. <source>PLoS One</source>. <year>2018</year>;<volume>13</volume>(<issue>10</issue>):<fpage>e0205639</fpage>. doi:<pub-id pub-id-type="doi">10.1371/journal.pone.0205639</pub-id>; <pub-id pub-id-type="pmid">30339669</pub-id></mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Jain</surname> <given-names>A</given-names></string-name>, <string-name><surname>Zongker</surname> <given-names>D</given-names></string-name></person-group>. <article-title>Feature selection: evaluation, application, and small sample performance</article-title>. <source>IEEE Trans Pattern Anal Mach Intell</source>. <year>1997</year>;<volume>19</volume>(<issue>2</issue>):<fpage>153</fpage>&#x2013;<lpage>8</lpage>. doi:<pub-id pub-id-type="doi">10.1109/34.574797</pub-id>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Hall</surname> <given-names>MA</given-names></string-name></person-group>. <article-title>Correlation-based feature selection for machine learning</article-title> [dissertation]. <publisher-loc>Hamilton, New Zealand</publisher-loc>: <publisher-name>University of Waikato</publisher-name>; <year>1999</year>. <comment>[cited 2025 May 7]</comment>. Available from: <ext-link ext-link-type="uri" xlink:href="https://www.lri.fr/&#x007E;pierres/donn%E9es/save/these/articles/lpr-queue/hall99correlationbased.pdf">https://www.lri.fr/&#x007E;pierres/donn%E9es/save/these/articles/lpr-queue/hall99correlationbased.pdf</ext-link>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Brown</surname> <given-names>G</given-names></string-name>, <string-name><surname>Pocock</surname> <given-names>A</given-names></string-name>, <string-name><surname>Zhao</surname> <given-names>MJ</given-names></string-name>, <string-name><surname>Luj&#x00E1;n</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Conditional likelihood maximisation: a unifying framework for information theoretic feature selection</article-title>. <source>J Mach Learn Res</source>. <year>2012</year>;<volume>13</volume>:<fpage>27</fpage>&#x2013;<lpage>66</lpage>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Peng</surname> <given-names>H</given-names></string-name>, <string-name><surname>Long</surname> <given-names>F</given-names></string-name>, <string-name><surname>Ding</surname> <given-names>C</given-names></string-name></person-group>. <article-title>Feature selection based on mutual information criteria of max-dependency, max-relevance, and min-redundancy</article-title>. <source>IEEE Trans Pattern Anal Mach Intell</source>. <year>2005</year>;<volume>27</volume>(<issue>8</issue>):<fpage>1226</fpage>&#x2013;<lpage>38</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TPAMI.2005.159</pub-id>; <pub-id pub-id-type="pmid">16119262</pub-id></mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Fleuret</surname> <given-names>F</given-names></string-name></person-group>. <article-title>Fast binary feature selection with conditional mutual information</article-title>. <source>J Mach Learn Res</source>. <year>2004</year>;<volume>5</volume>:<fpage>1531</fpage>&#x2013;<lpage>55</lpage>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Liu</surname> <given-names>H</given-names></string-name>, <string-name><surname>Setiono</surname> <given-names>R</given-names></string-name></person-group>. <article-title>Chi2: feature selection and discretization of numeric attributes</article-title>. In: <conf-name>Proceedings of 7th IEEE International Conference on Tools with Artificial Intelligence</conf-name>; <year>1995 Nov 5&#x2013;8</year>; <publisher-loc>Herndon, VA, USA</publisher-loc>. p. <fpage>388</fpage>&#x2013;<lpage>91</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TAI.1995.479783</pub-id>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Quinlan</surname> <given-names>JR</given-names></string-name></person-group>. <article-title>Induction of decision trees</article-title>. <source>Mach Learn</source>. <year>1986</year>;<volume>1</volume>(<issue>1</issue>):<fpage>81</fpage>&#x2013;<lpage>106</lpage>. doi:<pub-id pub-id-type="doi">10.1007/BF00116251</pub-id>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Priyatno</surname> <given-names>AM</given-names></string-name>, <string-name><surname>Widiyaningtyas</surname> <given-names>T</given-names></string-name></person-group>. <article-title>A systematic literature review: recursive feature elimination algorithms</article-title>. <source>J Ilmu Pengetah Dan Teknol Komput (JITK)</source>. <year>2024</year>;<volume>9</volume>(<issue>2</issue>):<fpage>196</fpage>&#x2013;<lpage>207</lpage>. doi:<pub-id pub-id-type="doi">10.33480/jitk.v9i2.5015</pub-id>.</mixed-citation></ref>
<ref id="ref-33"><label>[33]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Siedlecki</surname> <given-names>W</given-names></string-name>, <string-name><surname>Sklansky</surname> <given-names>J</given-names></string-name></person-group>. <article-title>A note on genetic algorithms for large-scale feature selection</article-title>. <source>Pattern Recognit Lett</source>. <year>1989</year>;<volume>10</volume>(<issue>5</issue>):<fpage>335</fpage>&#x2013;<lpage>47</lpage>. doi:<pub-id pub-id-type="doi">10.1016/0167-8655(89)90037-8</pub-id>.</mixed-citation></ref>
<ref id="ref-34"><label>[34]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Mitra</surname> <given-names>P</given-names></string-name>, <string-name><surname>Murthy</surname> <given-names>CA</given-names></string-name>, <string-name><surname>Pal</surname> <given-names>SK</given-names></string-name></person-group>. <article-title>Unsupervised feature selection using feature similarity</article-title>. <source>IEEE Trans Pattern Anal Mach Intell</source>. <year>2002</year>;<volume>24</volume>(<issue>3</issue>):<fpage>301</fpage>&#x2013;<lpage>12</lpage>. doi:<pub-id pub-id-type="doi">10.1109/34.990133</pub-id>.</mixed-citation></ref>
<ref id="ref-35"><label>[35]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zou</surname> <given-names>H</given-names></string-name>, <string-name><surname>Hastie</surname> <given-names>T</given-names></string-name></person-group>. <article-title>Regularization and variable selection via the elastic net</article-title>. <source>J R Stat Soc Ser B Stat Methodol</source>. <year>2005</year>;<volume>67</volume>(<issue>2</issue>):<fpage>301</fpage>&#x2013;<lpage>20</lpage>. doi:<pub-id pub-id-type="doi">10.1111/j.1467-9868.2005.00503.x</pub-id>.</mixed-citation></ref>
<ref id="ref-36"><label>[36]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Hoerl</surname> <given-names>AE</given-names></string-name>, <string-name><surname>Kennard</surname> <given-names>RW</given-names></string-name></person-group>. <article-title>Ridge regression: biased estimation for nonorthogonal problems</article-title>. <source>Technometrics</source>. <year>2000</year>;<volume>42</volume>(<issue>1</issue>):<fpage>80</fpage>. doi:<pub-id pub-id-type="doi">10.2307/1271436</pub-id>.</mixed-citation></ref>
<ref id="ref-37"><label>[37]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Hastie</surname> <given-names>T</given-names></string-name>, <string-name><surname>Tibshirani</surname> <given-names>R</given-names></string-name>, <string-name><surname>Friedman</surname> <given-names>J</given-names></string-name></person-group>. <source>The elements of statistical learning</source>. <publisher-loc>Berlin/Heidelberg, Germany</publisher-loc>: <publisher-name>Springer</publisher-name>; <year>2009</year>.</mixed-citation></ref>
<ref id="ref-38"><label>[38]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Friedman</surname> <given-names>J</given-names></string-name>, <string-name><surname>Hastie</surname> <given-names>T</given-names></string-name>, <string-name><surname>Tibshirani</surname> <given-names>R</given-names></string-name></person-group>. <article-title>Regularization paths for generalized linear models via coordinate descent</article-title>. <source>J Stat Softw</source>. <year>2010</year>;<volume>33</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>22</lpage>. doi:<pub-id pub-id-type="doi">10.18637/jss.v033.i01</pub-id>.</mixed-citation></ref>
<ref id="ref-39"><label>[39]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Breiman</surname> <given-names>L</given-names></string-name></person-group>. <article-title>Random forests</article-title>. <source>Mach Learn</source>. <year>2001</year>;<volume>45</volume>(<issue>1</issue>):<fpage>5</fpage>&#x2013;<lpage>32</lpage>. doi:<pub-id pub-id-type="doi">10.1023/A:1010933404324</pub-id>.</mixed-citation></ref>
<ref id="ref-40"><label>[40]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Whitney</surname> <given-names>AW</given-names></string-name></person-group>. <article-title>A direct method of nonparametric measurement selection</article-title>. <source>IEEE Trans Comput</source>. <year>1971</year>;<volume>C-20</volume>(<issue>9</issue>):<fpage>1100</fpage>&#x2013;<lpage>3</lpage>. doi:<pub-id pub-id-type="doi">10.1109/T-C.1971.223410</pub-id>.</mixed-citation></ref>
<ref id="ref-41"><label>[41]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Miller</surname> <given-names>A</given-names></string-name></person-group>. <source>Subset selection in regression</source>. <publisher-loc>Boca Raton, FL, USA</publisher-loc>: <publisher-name>CRC Press</publisher-name>; <year>2002</year>.</mixed-citation></ref>
<ref id="ref-42"><label>[42]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Hocking</surname> <given-names>RR</given-names></string-name></person-group>. <article-title>A Biometrics invited paper. The analysis and selection of variables in linear regression</article-title>. <source>Biometrics</source>. <year>1976</year>;<volume>32</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>49</lpage>. doi:<pub-id pub-id-type="doi">10.2307/2529336</pub-id>.</mixed-citation></ref>
<ref id="ref-43"><label>[43]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Jolliffe</surname> <given-names>IT</given-names></string-name></person-group>. <source>Principal component analysis</source>. <edition>2nd ed</edition>. <publisher-loc>Berlin/Heidelberg, Germany</publisher-loc>: <publisher-name>Springer</publisher-name>; <year>2002</year>.</mixed-citation></ref>
<ref id="ref-44"><label>[44]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Abdi</surname> <given-names>H</given-names></string-name>, <string-name><surname>Williams</surname> <given-names>LJ</given-names></string-name></person-group>. <article-title>Principal component analysis</article-title>. <source>Wires Comput Stat</source>. <year>2010</year>;<volume>2</volume>(<issue>4</issue>):<fpage>433</fpage>&#x2013;<lpage>59</lpage>. doi:<pub-id pub-id-type="doi">10.1002/wics.101</pub-id>.</mixed-citation></ref>
<ref id="ref-45"><label>[45]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Pudil</surname> <given-names>P</given-names></string-name>, <string-name><surname>Novovi&#x010D;ov&#x00E1;</surname> <given-names>J</given-names></string-name>, <string-name><surname>Kittler</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Floating search methods in feature selection</article-title>. <source>Pattern Recognit Lett</source>. <year>1994</year>;<volume>15</volume>(<issue>11</issue>):<fpage>1119</fpage>&#x2013;<lpage>25</lpage>. doi:<pub-id pub-id-type="doi">10.1016/0167-8655(94)90127-9</pub-id>.</mixed-citation></ref>
<ref id="ref-46"><label>[46]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Kohavi</surname> <given-names>R</given-names></string-name>, <string-name><surname>John</surname> <given-names>GH</given-names></string-name></person-group>. <article-title>Wrappers for feature subset selection</article-title>. <source>Artif Intell</source>. <year>1997</year>;<volume>97</volume>(<issue>1&#x2013;2</issue>):<fpage>273</fpage>&#x2013;<lpage>324</lpage>. doi:<pub-id pub-id-type="doi">10.1016/S0004-3702(97)00043-X</pub-id>.</mixed-citation></ref>
<ref id="ref-47"><label>[47]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Duda</surname> <given-names>RO</given-names></string-name>, <string-name><surname>Hart</surname> <given-names>PE</given-names></string-name>, <string-name><surname>Stork</surname> <given-names>DG</given-names></string-name></person-group>. <source>Pattern classification</source>. <edition>2nd ed</edition>. <publisher-loc>Hoboken, NJ, USA</publisher-loc>: <publisher-name>Wiley</publisher-name>; <year>2001</year>.</mixed-citation></ref>
<ref id="ref-48"><label>[48]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Gu</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Li</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Han</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Generalized Fisher score for feature selection</article-title>. In: <conf-name>Proceedings of the 27th Conference on Uncertainty in Artificial Intelligence (UAI 11)</conf-name>; <year>2011 Jul 14&#x2013;17</year>; <publisher-loc>Barcelona, Spain</publisher-loc>. p. <fpage>266</fpage>&#x2013;<lpage>73</lpage>.</mixed-citation></ref>
<ref id="ref-49"><label>[49]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Wolberg</surname> <given-names>W</given-names></string-name></person-group>. <article-title>Breast cancer Wisconsin (original) [dataset]</article-title>. <publisher-loc>Irvine, CA, USA</publisher-loc>: <publisher-name>UCI Machine Learning Repository</publisher-name>; <year>1990</year>. <comment>[cited 2025 Jan 1]</comment>. Available from: doi:<pub-id pub-id-type="doi">10.24432/C5HP4Z</pub-id>.</mixed-citation></ref>
<ref id="ref-50"><label>[50]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><collab>KDD Cup</collab></person-group>. <article-title>Data [online]</article-title>. <publisher-loc>Irvine, CA, USA</publisher-loc>: <publisher-name>UCI Machine Learning Repository</publisher-name>; <year>1999</year>. <comment>[cited 2025 Jan 1]</comment>. Available from: <ext-link ext-link-type="uri" xlink:href="https://archive.ics.uci.edu/ml/datasets/KDD&#x002B;Cup&#x002B;1999&#x002B;Data">https://archive.ics.uci.edu/ml/datasets/KDD&#x002B;Cup&#x002B;1999&#x002B;Data</ext-link>.</mixed-citation></ref>
<ref id="ref-51"><label>[51]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Ferrag</surname> <given-names>MA</given-names></string-name>, <string-name><surname>Friha</surname> <given-names>O</given-names></string-name>, <string-name><surname>Hamouda</surname> <given-names>D</given-names></string-name>, <string-name><surname>Maglaras</surname> <given-names>L</given-names></string-name>, <string-name><surname>Janicke</surname> <given-names>H</given-names></string-name></person-group>. <article-title>Edge-IIoTset: a new comprehensive realistic cyber security dataset of IoT and IIoT applications for centralized and federated learning</article-title>. <source>IEEE Access</source>. <year>2022</year>;<volume>10</volume>:<fpage>40281</fpage>&#x2013;<lpage>306</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ACCESS.2022.3165809</pub-id>.</mixed-citation></ref>
<ref id="ref-52"><label>[52]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Singh</surname> <given-names>R</given-names></string-name>, <string-name><surname>Kumar</surname> <given-names>H</given-names></string-name>, <string-name><surname>Singla</surname> <given-names>RK</given-names></string-name></person-group>. <article-title>Analysis of feature selection techniques for network traffic dataset</article-title>. In: <conf-name>Proceedings of the 2013 International Conference on Machine Intelligence and Research Advancement</conf-name>; <year>2013 Dec 21&#x2013;23</year>; <publisher-loc>Katra, India</publisher-loc>. p. <fpage>42</fpage>&#x2013;<lpage>6</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ICMIRA.2013.15</pub-id>.</mixed-citation></ref>
<ref id="ref-53"><label>[53]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Kumar</surname> <given-names>M</given-names></string-name>, <string-name><surname>Nidhi</surname></string-name>, <string-name><surname>Sharma</surname> <given-names>B</given-names></string-name>, <string-name><surname>Handa</surname> <given-names>D</given-names></string-name></person-group>. <article-title>Building predictive model by using data mining and feature selection techniques on academic dataset</article-title>. <source>Int J Mod Educ Comput Sci</source>. <year>2022</year>;<volume>14</volume>(<issue>4</issue>):<fpage>16</fpage>&#x2013;<lpage>29</lpage>. doi:<pub-id pub-id-type="doi">10.5815/ijmecs.2022.04.02</pub-id>.</mixed-citation></ref>
<ref id="ref-54"><label>[54]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Lavanya</surname> <given-names>D</given-names></string-name>, <string-name><surname>Rani</surname> <given-names>DKU</given-names></string-name></person-group>. <article-title>Analysis of feature selection with classification: breast cancer datasets</article-title>. <source>Indian J Comput Sci Eng (IJCSE)</source>. <year>2011</year>;<volume>2</volume>(<issue>5</issue>):<fpage>756</fpage>&#x2013;<lpage>63</lpage>.</mixed-citation></ref>
<ref id="ref-55"><label>[55]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Farea</surname> <given-names>AH</given-names></string-name>, <string-name><surname>Alhazmi</surname> <given-names>OH</given-names></string-name>, <string-name><surname>Kucuk</surname> <given-names>K</given-names></string-name></person-group>. <article-title>Advanced optimized anomaly detection system for IoT cyberattacks using artificial intelligence</article-title>. <source>Comput Mater Contin</source>. <year>2024</year>;<volume>78</volume>(<issue>2</issue>):<fpage>1525</fpage>&#x2013;<lpage>45</lpage>. doi:<pub-id pub-id-type="doi">10.32604/cmc.2023.045794</pub-id>.</mixed-citation></ref>
</ref-list>
</back></article>