<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CSSE</journal-id>
<journal-id journal-id-type="nlm-ta">CSSE</journal-id>
<journal-id journal-id-type="publisher-id">CSSE</journal-id>
<journal-title-group>
<journal-title>Computer Systems Science &#x0026; Engineering</journal-title>
</journal-title-group>
<issn pub-type="ppub">0267-6192</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">34277</article-id>
<article-id pub-id-type="doi">10.32604/csse.2023.034277</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>A Lightweight Deep Autoencoder Scheme for Cyberattack Detection in the Internet of Things</article-title><alt-title alt-title-type="left-running-head">A Lightweight Deep Autoencoder Scheme for Cyberattack Detection in the Internet of Things</alt-title><alt-title alt-title-type="right-running-head">A Lightweight Deep Autoencoder Scheme for Cyberattack Detection in the Internet of Things</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author">
<name name-style="western"><surname>Sabir</surname><given-names>Maha</given-names></name>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
<contrib id="author-2" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Ahmad</surname><given-names>Jawad</given-names></name>
<xref ref-type="aff" rid="aff-2">2</xref><email>J.Ahmad@napier.ac.uk</email>
</contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western"><surname>Alghazzawi</surname><given-names>Daniyal</given-names></name>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
<aff id="aff-1"><label>1</label><institution>Information Systems Department, Faculty of Computing and Information Technology, King Abdulaziz University</institution>, <addr-line>Jeddah, 80200</addr-line>, <country>Saudi Arabia</country></aff>
<aff id="aff-2"><label>2</label><institution>School of Computing, Edinburgh Napier University</institution>, <addr-line>Edinburgh EH10 5DY</addr-line>, <country>UK</country></aff>
</contrib-group><author-notes><corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Jawad Ahmad. Email: <email>J.Ahmad@napier.ac.uk</email></corresp></author-notes>
<pub-date date-type="collection" publication-format="electronic"><year>2023</year></pub-date>
<pub-date date-type="pub" publication-format="electronic"><day>17</day><month>1</month><year>2023</year></pub-date>
<volume>46</volume>
<issue>1</issue>
<fpage>57</fpage>
<lpage>72</lpage>
<history>
<date date-type="received"><day>12</day><month>7</month><year>2022</year></date>
<date date-type="accepted"><day>22</day><month>9</month><year>2022</year></date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2023 Sabir et al.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Sabir et al.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CSSE_34277.pdf"></self-uri>
<abstract><p>The Internet of Things (IoT) is an emerging paradigm that integrates devices and services to collect real-time data from the surroundings and process the information at very high speed to support decision-making. Despite several advantages, the resource-constrained and heterogeneous nature of IoT networks makes them a favorite target for cybercriminals. A single successful network intrusion can compromise the complete IoT network, leading to unauthorized access to the valuable information of consumers and industries. To overcome the security challenges of IoT networks, this article proposes a lightweight deep autoencoder (DAE) based cyberattack detection framework. The proposed approach learns the normal and anomalous data patterns to identify the various types of network intrusions. The most significant feature of the proposed technique is its lower complexity, which is attained by reducing the number of operations. To optimally train the proposed DAE, a range of hyperparameters was determined through extensive experiments that ensure higher attack detection accuracy. The efficacy of the suggested framework is evaluated via two standard and open-source datasets. The proposed DAE achieved accuracies of 98.86&#x0025; and 98.26&#x0025; on the NSL-KDD dataset, and 99.32&#x0025; and 98.79&#x0025; on the UNSW-NB15 dataset, in the binary-class and multi-class scenarios respectively. The performance of the suggested attack detection framework is also compared with several state-of-the-art intrusion detection schemes. Experimental outcomes proved the promising performance of the proposed scheme for cyberattack detection in IoT networks.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Autoencoder</kwd>
<kwd>cybersecurity</kwd>
<kwd>deep learning</kwd>
<kwd>intrusion detection</kwd>
<kwd>IoT</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label><title>Introduction</title>
<p>The IoT is most commonly described as the interconnection of smart sensors and devices with the internet. An IoT network can be built from many types of devices, ranging from environmental sensors to consumer electronics and wearable devices. The IoT has been incorporated into a vast variety of applications such as healthcare, industry, transportation, agriculture, and smart buildings [<xref ref-type="bibr" rid="ref-1">1</xref>]. The interconnection of smart devices through IoT networks allows them to exchange valuable information and data. However, this exchange of information can be intercepted by attackers and intruders, which consequently compromises the security and privacy of all devices and the complete IoT architecture [<xref ref-type="bibr" rid="ref-2">2</xref>]. An intrusion detection system (IDS) can be useful in ensuring the security of IoT networks, as it can detect several malicious activities, such as security violations and unauthorized access to valuable information in the IoT network [<xref ref-type="bibr" rid="ref-3">3</xref>,<xref ref-type="bibr" rid="ref-4">4</xref>].</p>
<p>In this article, a lightweight deep autoencoder (DAE) scheme is presented for cyberattack detection in IoT networks. The suggested technique learns the normal and malicious patterns of the data to successfully identify anomalous network behavior. One of the primary objectives in developing the suggested DAE framework is to minimize the model&#x2019;s overall complexity by reducing the number of operations. This lowers the computational cost and energy consumption, making the IDS more feasible to implement in resource-constrained IoT networks. The proposed DAE-based IDS has been evaluated in both binary-class and multi-class scenarios using two standard, open-source datasets: NSL-KDD and UNSW-NB15.</p>
<p>The rest of the article is organized as follows. Section 2 comprises the latest research related to the IDSs for IoT. A detailed research methodology of the proposed framework is presented and discussed in Section 3. Section 4 presents the implementation procedure and a detailed discussion of the obtained results. Finally, Section 5 concludes the outcomes of this research.</p>
</sec>
<sec id="s2">
<label>2</label><title>Literature Review</title>
<p>This section presents an overview of some of the latest research on DL-based intrusion detection algorithms for IoT networks. The discussed studies have been categorized according to the proposed DL schemes, utilized datasets, and performance metrics.</p>
<p>Parra et al. [<xref ref-type="bibr" rid="ref-5">5</xref>] introduced a cloud-based DL approach for cyberattack detection in the IoT. The suggested scheme combines two deep learning models, a convolutional neural network (CNN) and a long short-term memory (LSTM) network. The performance of the suggested technique was analyzed on the N-BaIoT dataset, and the experimental outcomes demonstrate that the proposed scheme efficiently detected phishing and botnet attacks with high accuracy. Shone et al. [<xref ref-type="bibr" rid="ref-6">6</xref>] introduced a nonsymmetric deep autoencoder (NDAE) for unsupervised feature learning and presented a novel DL-based intrusion detection model using stacked NDAEs. The proposed model was implemented in graphics processing unit (GPU)-enabled TensorFlow and its performance was evaluated using the NSL-KDD and KDDCup99 datasets. Awotunde et al. [<xref ref-type="bibr" rid="ref-7">7</xref>] introduced a deep feed-forward neural network (DFNN) with a rule-based feature selection method for cyberattack detection in the IIoT. The proposed architecture inspects the information collected from network packets, and its effectiveness was analyzed using two standard IoT security datasets. In another work, Attota et al. [<xref ref-type="bibr" rid="ref-8">8</xref>] presented a multi-view federated learning-based IDS (MV-FLID) that uses the grey wolf optimization (GWO) technique for feature selection. The proposed model was trained and evaluated using a lightweight MQTT protocol dataset.</p>
<p>Qaddoura et al. [<xref ref-type="bibr" rid="ref-9">9</xref>] designed an FNN-LSTM-based hybrid IDS for IoT networks. The designed hybrid scheme utilizes the SMOTE oversampling method to balance the samples of each class, and the suggested design was evaluated on the IoTID20 dataset. Hassan et al. [<xref ref-type="bibr" rid="ref-10">10</xref>] proposed a reliable cyberattack detection scheme to improve the trustworthiness of an IIoT network. The researchers utilized an ensemble learning technique that combines a random subspace (RS) with a random tree (RT) for cyberattack detection. The suggested scheme was tested over 15 datasets from SCADA networks, and the experimental findings indicated its superior performance over conventional attack detection models. Li et al. [<xref ref-type="bibr" rid="ref-11">11</xref>] designed a bidirectional long short-term memory (BiLSTM) network for cyberattack detection in the IIoT. In the proposed scheme, sequence and stage feature layers are introduced that enable the model to learn the corresponding attack intervals from historical data. Compared with related works, the proposed scheme demonstrated lower false-positive and false-negative rates. Despite the considerable effort spent on annotating IoT traffic data, the number of labeled records remains very small, which increases the challenge of detecting attacks and intrusions as growing IoT networks become more vulnerable to various types of cyberattacks. Luo et al. [<xref ref-type="bibr" rid="ref-12">12</xref>] introduced a web attack detection system (WADS) in which three DL-based models detect cyberattacks separately and an ensemble classifier makes the final decision from their combined results. Real-world and open-source security datasets were utilized for performance evaluation, and the experimental results proved the efficacy of the proposed framework in detecting several web attacks with higher accuracy and lower false alarm rates.</p>
<p>As the above discussion shows, considerable progress has been made in developing ML/DL-based intrusion detection schemes for the IoT. However, there is still room for a lightweight and advanced attack detection scheme that improves attack detection accuracy, reduces the computational cost, and remains highly compatible with the resource-constrained nature of IoT networks. The proposed DAE presents significant advantages over existing schemes, such as a compact design and optimal hyperparameter selection. Furthermore, it reduces the computational complexity and energy requirements, which facilitates the integration of the IDS into resource-constrained networks.</p>
</sec>
<sec id="s3">
<label>3</label><title>Research Methodology</title>
<p>This section presents a detailed description of the proposed scheme including utilized datasets, the deep autoencoder (DAE) design, and the performance assessment parameters.</p>
<sec id="s3_1">
<label>3.1</label><title>Datasets Description</title>
<p>The proposed scheme is evaluated through two publicly available IoT security datasets: NSL-KDD and UNSW-NB15. In the following, a short description of each dataset is presented.</p>
<sec id="s3_1_1">
<label>3.1.1</label><title>NSL-KDD</title>
<p>This is one of the most commonly used datasets for benchmarking intrusion detection against modern-day internet traffic. Each record contains 42 features, of which 41 are input features and the last is the label that determines whether the record is normal or malicious. Furthermore, it contains 4 different classes of cyberattacks: Probe, Remote to Local (R2L), Denial of Service (DoS), and User to Root (U2R) [<xref ref-type="bibr" rid="ref-13">13</xref>].</p>
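<p>As a brief, hedged illustration of this record layout (the column names below are hypothetical; the public dataset ships without a header row), the last column can be split off as the label while the remaining columns serve as input features:</p>

```python
import pandas as pd

# Toy NSL-KDD-style records: input feature columns followed by a final
# label column. Column names are illustrative, not the official ones.
df = pd.DataFrame(
    [[0, "tcp", "http", 181, "normal"],
     [0, "udp", "domain_u", 44, "dos"]],
    columns=["duration", "protocol_type", "service", "src_bytes", "label"],
)

X = df.iloc[:, :-1]   # everything except the last column: input features
y = df.iloc[:, -1]    # the last column: normal vs. attack label
print(X.shape, sorted(y.unique()))   # (2, 4) ['dos', 'normal']
```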
</sec>
<sec id="s3_1_2">
<label>3.1.2</label><title>UNSW-NB15</title>
<p>This is considered a new-generation dataset and was first published in 2015. It has 49 features and a total of 257,673 labeled samples comprising both normal and malicious traffic: 164,673 malicious and 93,000 normal samples [<xref ref-type="bibr" rid="ref-14">14</xref>,<xref ref-type="bibr" rid="ref-15">15</xref>]. Features of this dataset are categorized into 6 groups, named &#x201C;basic&#x201D;, &#x201C;time&#x201D;, &#x201C;flow&#x201D;, &#x201C;content&#x201D;, &#x201C;additional generated&#x201D;, and &#x201C;labeled&#x201D; features. Further, UNSW-NB15 has 9 different classes of modern attacks: Analysis, Backdoor, DoS, Exploits, Fuzzers, Generic, Reconnaissance, Shellcode, and Worms.</p>
</sec>
</sec>
<sec id="s3_2">
<label>3.2</label><title>The Proposed Framework</title>
<p>The workflow of the suggested framework is depicted in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>. The main operation consists of three stages: data pre-processing, the mathematical model of the DAE, and performance assessment. In the following sub-sections, each stage is described briefly.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption><title>Block diagram of the proposed architecture</title></caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_34277-fig-1.tif"/>
</fig>
<sec id="s3_2_1">
<label>3.2.1</label><title>Data Pre-processing</title>
<p>Data pre-processing is one of the most important stages of any ML/DL model; it transforms the data into a form suitable as input to a neural network. In our study, two datasets have been utilized, and a different procedure has been adopted for the pre-processing of each.<list list-type="simple"><list-item>
<p><italic>a) Pre-processing of NSL-KDD:</italic> In the NSL-KDD dataset, some features contain text values; therefore, the nominal features must be transformed into numeric values. At this stage, categorical features are converted into numerical features using one-hot encoding. Since the dataset is large and there is a large variation between values, data normalization is also required for better performance. We utilized &#x201C;min-max scaling&#x201D;, which maps the values of each feature into the range of zero to one. The detailed class distribution of the NSL-KDD dataset is presented in <xref ref-type="table" rid="table-1">Table 1</xref>.</p>
</list-item><list-item>
<p><italic>b) Pre-processing of UNSW-NB15:</italic> This dataset contains 10 classes and 49 features. In the pre-processing phase, we used two approaches: data conversion and data normalization. In the first stage, the data conversion technique transforms all the categorical data into numerical data. In the second stage, the large variance of the features is reduced using the &#x201C;min-max scaling&#x201D; technique, which removes invalid samples and scales the values into the range of zero to one. A detailed distribution of the UNSW-NB15 dataset is presented in <xref ref-type="table" rid="table-2">Table 2</xref>.</p>
</list-item></list></p>
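<p>The two pre-processing steps above can be sketched as follows. This is a minimal sketch with made-up values, using one-hot encoding for the categorical feature and a manual min-max rescaling; the real feature set is much larger.</p>

```python
import pandas as pd

# Made-up records with one categorical and two numeric features.
df = pd.DataFrame({
    "protocol_type": ["tcp", "udp", "icmp"],
    "src_bytes": [181.0, 239.0, 0.0],
    "dst_bytes": [5450.0, 486.0, 1337.0],
})

# Step 1: data conversion -- one-hot encode the categorical feature.
df = pd.get_dummies(df, columns=["protocol_type"])

# Step 2: min-max scaling -- map each numeric feature into [0, 1].
for col in ["src_bytes", "dst_bytes"]:
    lo, hi = df[col].min(), df[col].max()
    df[col] = (df[col] - lo) / (hi - lo)

print(sorted(df.columns))
print(df["src_bytes"].tolist())   # scaled values now lie in [0, 1]
```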
<table-wrap id="table-1"><label>Table 1</label>
<caption><title>Class distribution of the NSL-KDD dataset</title></caption>
<table><colgroup><col align="left"/><col align="left"/><col align="left"/><col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Classes</th>
<th align="left">Total samples</th>
<th align="left">Training</th>
<th align="left">Testing</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Normal</td>
<td align="left">77054</td>
<td align="left">53938</td>
<td align="left">23116</td>
</tr>
<tr>
<td align="left">DoS</td>
<td align="left">53387</td>
<td align="left">37371</td>
<td align="left">16016</td>
</tr>
<tr>
<td align="left">Probe</td>
<td align="left">14077</td>
<td align="left">9854</td>
<td align="left">4223</td>
</tr>
<tr>
<td align="left">R2L</td>
<td align="left">3880</td>
<td align="left">2716</td>
<td align="left">1164</td>
</tr>
<tr>
<td align="left">U2R</td>
<td align="left">119</td>
<td align="left">83</td>
<td align="left">36</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="table-2"><label>Table 2</label>
<caption><title>Class distribution of the UNSW-NB15 dataset</title></caption>
<table><colgroup><col align="left"/><col align="left"/><col align="left"/><col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Classes</th>
<th align="left">Total samples</th>
<th align="left">Training</th>
<th align="left">Testing</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Normal</td>
<td align="left">93000</td>
<td align="left">65100</td>
<td align="left">27900</td>
</tr>
<tr>
<td align="left">Analysis</td>
<td align="left">2677</td>
<td align="left">1874</td>
<td align="left">803</td>
</tr>
<tr>
<td align="left">Backdoor</td>
<td align="left">2329</td>
<td align="left">1630</td>
<td align="left">699</td>
</tr>
<tr>
<td align="left">DoS</td>
<td align="left">16353</td>
<td align="left">11447</td>
<td align="left">4906</td>
</tr>
<tr>
<td align="left">Exploits</td>
<td align="left">44525</td>
<td align="left">31168</td>
<td align="left">13358</td>
</tr>
<tr>
<td align="left">Fuzzers</td>
<td align="left">24246</td>
<td align="left">16972</td>
<td align="left">7274</td>
</tr>
<tr>
<td align="left">Generic</td>
<td align="left">58871</td>
<td align="left">41210</td>
<td align="left">17661</td>
</tr>
<tr>
<td align="left">Reconnaissance</td>
<td align="left">13987</td>
<td align="left">9791</td>
<td align="left">4196</td>
</tr>
<tr>
<td align="left">Shellcode</td>
<td align="left">1511</td>
<td align="left">1058</td>
<td align="left">453</td>
</tr>
<tr>
<td align="left">Worms</td>
<td align="left">174</td>
<td align="left">122</td>
<td align="left">52</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s3_2_2">
<label>3.2.2</label><title>Mathematical Model of the Proposed Attack Detection Scheme</title>
<p>The proposed attack detection scheme contains three main stages: feature extraction, feature selection, and classification. The first two stages perform dimensionality reduction, which increases the computational efficiency of the model and makes it highly compatible with resource-constrained IoT devices. A DAE combined with mutual information (MI) is used for the extraction and selection of optimal features, and a support vector machine (SVM) trained with a gradient descent (GD) algorithm performs the detection.</p>
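<p>This three-stage flow can be sketched with off-the-shelf components. The snippet below is an assumption-laden stand-in, not the paper&#x2019;s exact configuration: randomly generated toy features replace the DAE-extracted ones, mutual information ranks the features, and a hinge-loss SGD classifier acts as a linear SVM fitted by gradient descent.</p>

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.linear_model import SGDClassifier
from sklearn.pipeline import make_pipeline

# Toy stand-in for DAE-extracted features: 200 samples, 20 columns, 2 classes.
X, y = make_classification(n_samples=200, n_features=20, random_state=0)

clf = make_pipeline(
    SelectKBest(mutual_info_classif, k=8),        # MI-based feature selection
    SGDClassifier(loss="hinge", random_state=0),  # linear SVM trained by GD
)
clf.fit(X, y)
print(round(clf.score(X, y), 2))
```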
<p>The basic design of the proposed DAE is shown in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>. The DAE is an unsupervised neural network that uses the backpropagation algorithm to learn from unlabeled information. Its output is trained to approximate its input, i.e., it learns the hypothesis function<disp-formula id="eqn-1"><label>(1)</label>
<mml:math id="mml-eqn-1" display="block"><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>W</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x00A0;</mml:mtext><mml:mi>b</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>r</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2248;</mml:mo><mml:mi>r</mml:mi><mml:mtext>&#x00A0;</mml:mtext></mml:math>
</disp-formula></p>
<fig id="fig-2">
<label>Figure 2</label>
<caption><title>The proposed Deep Autoencoder (DAE) for cyberattack detection</title></caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_34277-fig-2.tif"/>
</fig>
<p>The DAE consists of an encoder and a decoder. The encoder compresses the incoming information into a low-dimensional representation, while the decoder reconstructs the data from that low-dimensional representation. During encoding, the input vectors are transformed into an abstract vector and the dimensionality of the input data space is reduced.</p>
<p>To accurately perform the encoding and decoding operations, multiple constraints are imposed on the neural network. By selecting fewer hidden neurons than input features, a useful compressed representation can be discovered during the reconstruction operation. As a result, if there are any correlations among the features, the DAE will be able to discover them.</p>
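<p>A minimal numerical sketch of this bottleneck idea follows. The layer sizes are illustrative (41 inputs matching the NSL-KDD feature count, a 16-unit code), and the weights are random, untrained values:</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Encoder compresses 41 input features into a 16-unit code; the decoder
# reconstructs 41 outputs from that code. Weights here are untrained.
n_in, n_hidden = 41, 16
W_enc = rng.normal(0.0, 0.1, (n_in, n_hidden))
W_dec = rng.normal(0.0, 0.1, (n_hidden, n_in))

r = rng.random((5, n_in))        # 5 sample records
code = sigmoid(r @ W_enc)        # encoding: low-dimensional representation
r_hat = sigmoid(code @ W_dec)    # decoding: reconstruction of the input
print(code.shape, r_hat.shape)   # (5, 16) (5, 41)
```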
<p>The constraint shown in <xref ref-type="disp-formula" rid="eqn-2">Eq. (2)</xref> is applied to hidden neurons in the encoder that compresses the input data representation and performs feature extraction. Here <inline-formula id="ieqn-1">
<mml:math id="mml-ieqn-1"><mml:msub><mml:mrow><mml:mrow><mml:mover><mml:mrow><mml:mi mathvariant="normal">&#x03B4;</mml:mi></mml:mrow><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow></mml:mrow><mml:mi>z</mml:mi></mml:msub></mml:math>
</inline-formula> in <xref ref-type="disp-formula" rid="eqn-3">Eq. (3)</xref> represents the average activation and <inline-formula id="ieqn-2">
<mml:math id="mml-ieqn-2"><mml:msub><mml:mi>a</mml:mi><mml:mi>z</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>r</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math>
</inline-formula> represents the activation of hidden neuron <italic>z</italic>. Neuron <italic>z</italic> is considered active when its activation is close to 1 and inactive when it is close to 0. The variable <inline-formula id="ieqn-3">
<mml:math id="mml-ieqn-3"><mml:mi>&#x03B4;</mml:mi></mml:math>
</inline-formula> indicates the sparsity parameter and is usually set near zero to ensure the inactive state of neurons most of the time.<disp-formula id="eqn-2"><label>(2)</label>
<mml:math id="mml-eqn-2" display="block"><mml:msub><mml:mrow><mml:mrow><mml:mover><mml:mrow><mml:mi mathvariant="normal">&#x03B4;</mml:mi></mml:mrow><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow></mml:mrow><mml:mi>z</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03B4;</mml:mi></mml:mrow></mml:mrow></mml:math>
</disp-formula><disp-formula id="eqn-3"><label>(3)</label>
<mml:math id="mml-eqn-3" display="block"><mml:msub><mml:mrow><mml:mrow><mml:mover><mml:mrow><mml:mi mathvariant="normal">&#x03B4;</mml:mi></mml:mrow><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow></mml:mrow><mml:mi>z</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>m</mml:mi></mml:mfrac></mml:mrow><mml:mtext>&#x00A0;</mml:mtext><mml:msubsup><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>y</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>m</mml:mi></mml:msubsup><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:msub><mml:mi>a</mml:mi><mml:mi>z</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>r</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mstyle></mml:math>
</disp-formula></p>
<p>The mean squared error (MSE) shown in <xref ref-type="disp-formula" rid="eqn-4">Eq. (4)</xref> specifies the cost function of DAE.<disp-formula id="eqn-4"><label>(4)</label>
<mml:math id="mml-eqn-4" display="block"><mml:mi>M</mml:mi><mml:mi>S</mml:mi><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>m</mml:mi></mml:mfrac></mml:mrow><mml:msubsup><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>y</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msubsup><mml:mstyle displaystyle="true" scriptlevel="0"><mml:msup><mml:mrow><mml:mo symmetric="true">&#x2016;</mml:mo><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac></mml:mrow><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>W</mml:mi><mml:mo>,</mml:mo><mml:mi>b</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msup><mml:mi>r</mml:mi><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:msup><mml:mi>s</mml:mi><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mo symmetric="true">&#x2016;</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mstyle></mml:mstyle></mml:math>
</disp-formula></p>
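<p>With hypothetical activations and reconstructions, the average activation of Eq. (3) and the cost of Eq. (4) can be computed as below. The MSE is written in the standard sparse-autoencoder form, with the half squared reconstruction error averaged over the samples and the target equal to the input:</p>

```python
import numpy as np

rng = np.random.default_rng(1)
m, n_hidden, n_in = 100, 16, 41

a = rng.random((m, n_hidden))   # hidden activations a_z(r) for each sample
r = rng.random((m, n_in))       # inputs (also the autoencoder's targets)
r_hat = rng.random((m, n_in))   # reconstructions h_{W,b}(r)

# Eq. (3): average activation of each hidden neuron over the m samples.
delta_hat = a.mean(axis=0)

# Eq. (4): mean half squared reconstruction error over the m samples.
mse = np.mean(0.5 * np.sum((r_hat - r) ** 2, axis=1))
print(delta_hat.shape, mse > 0)
```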
<p>L2 regularization, shown in <xref ref-type="disp-formula" rid="eqn-5">Eq. (5)</xref>, is added to the cost function to prevent overfitting by decreasing the weights <inline-formula id="ieqn-4">
<mml:math id="mml-ieqn-4"><mml:msubsup><mml:mi>W</mml:mi><mml:mrow><mml:mi>y</mml:mi><mml:mi>z</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>l</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math>
</inline-formula> among neuron <italic>y</italic> in layer <inline-formula id="ieqn-5">
<mml:math id="mml-ieqn-5"><mml:mi>l</mml:mi><mml:mtext>&#x00A0;</mml:mtext></mml:math>
</inline-formula> and neuron <italic>z</italic> in layer <inline-formula id="ieqn-6">
<mml:math id="mml-ieqn-6"><mml:mi>l</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:math>
</inline-formula>:<disp-formula id="eqn-5"><label>(5)</label>
<mml:math id="mml-eqn-5" display="block"><mml:msub><mml:mi mathvariant="normal">&#x03A9;</mml:mi><mml:mrow><mml:mi>L</mml:mi><mml:mn>2</mml:mn><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac></mml:mrow><mml:msubsup><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>L</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mo>&#x2061;</mml:mo><mml:msubsup><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>y</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:mo>&#x2061;</mml:mo><mml:msubsup><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>z</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>k</mml:mi></mml:msubsup><mml:mo>&#x2061;</mml:mo><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>W</mml:mi><mml:mrow><mml:mi>y</mml:mi><mml:mi>z</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>l</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mstyle></mml:math>
</disp-formula></p>
<p>Here <italic>L</italic> represents the total layers in a neural network. The other parameters <italic>n</italic> and <italic>k</italic> indicate the number of neurons in layers <italic>l</italic> and <inline-formula id="ieqn-7">
<mml:math id="mml-ieqn-7"><mml:mi>l</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:math>
</inline-formula> respectively.</p>
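<p>For a small illustrative 41-16-41 network (so the sum over layers in Eq. (5) runs over <italic>l</italic> = 1, 2), the L2 term can be evaluated directly; the weights below are random placeholder values:</p>

```python
import numpy as np

rng = np.random.default_rng(2)

# Weight matrices W^(l) for an illustrative 41-16-41 network (L = 3 layers).
weights = [rng.normal(0.0, 0.1, (41, 16)),
           rng.normal(0.0, 0.1, (16, 41))]

# Eq. (5): half the sum of squared weights across all layer transitions.
l2_reg = 0.5 * sum(np.sum(W ** 2) for W in weights)
print(l2_reg > 0)
```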
<p>Additionally, a sparsity regularization is added to the cost function as shown in <xref ref-type="disp-formula" rid="eqn-6">Eq. (6)</xref>. It penalizes <inline-formula id="ieqn-8">
<mml:math id="mml-ieqn-8"><mml:msub><mml:mrow><mml:mrow><mml:mover><mml:mrow><mml:mi mathvariant="normal">&#x03B4;</mml:mi></mml:mrow><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow></mml:mrow><mml:mi>z</mml:mi></mml:msub></mml:math>
</inline-formula> for deviating from &#948;, using the Kullback-Leibler (KL) divergence [<xref ref-type="bibr" rid="ref-16">16</xref>]. The KL divergence measures the difference between two distributions. This function is zero if <xref ref-type="disp-formula" rid="eqn-2">Eq. (2)</xref> is satisfied and takes a higher value as <inline-formula id="ieqn-9">
<mml:math id="mml-ieqn-9"><mml:msub><mml:mrow><mml:mrow><mml:mover><mml:mrow><mml:mi mathvariant="normal">&#x03B4;</mml:mi></mml:mrow><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow></mml:mrow><mml:mi>z</mml:mi></mml:msub></mml:math>
</inline-formula> diverges from &#948;. The minimization of this term enables the <inline-formula id="ieqn-10">
<mml:math id="mml-ieqn-10"><mml:msub><mml:mrow><mml:mrow><mml:mover><mml:mrow><mml:mi mathvariant="normal">&#x03B4;</mml:mi></mml:mrow><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow></mml:mrow><mml:mi>z</mml:mi></mml:msub></mml:math>
</inline-formula> to be close to &#948;. Here <inline-formula id="ieqn-11">
<mml:math id="mml-ieqn-11"><mml:msub><mml:mi>S</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:math>
</inline-formula> represents the number of hidden neurons within the encoder.<disp-formula id="ueqn-1">
<mml:math id="mml-ueqn-1" display="block"><mml:msub><mml:mi mathvariant="normal">&#x03A9;</mml:mi><mml:mrow><mml:mi>S</mml:mi><mml:mi>p</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>z</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow></mml:msubsup><mml:mo>&#x2061;</mml:mo><mml:mi>K</mml:mi><mml:mi>L</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x03B4;</mml:mi></mml:mrow><mml:mo>&#x2016;</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mrow><mml:mi mathvariant="normal">&#x03B4;</mml:mi></mml:mrow><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mi>z</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math>
</disp-formula><disp-formula id="eqn-6"><label>(6)</label>
<mml:math id="mml-eqn-6" display="block"><mml:msub><mml:mi mathvariant="normal">&#x03A9;</mml:mi><mml:mrow><mml:mi>S</mml:mi><mml:mi>p</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>z</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow></mml:msubsup><mml:mo>&#x2061;</mml:mo><mml:mi>&#x03B4;</mml:mi><mml:mtext>&#x00A0;</mml:mtext><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>g</mml:mi><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03B4;</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mrow><mml:mover><mml:mrow><mml:mi mathvariant="normal">&#x03B4;</mml:mi></mml:mrow><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow></mml:mrow></mml:mrow><mml:mi>z</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03B4;</mml:mi></mml:mrow></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mtext>&#x00A0;</mml:mtext><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>g</mml:mi><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03B4;</mml:mi></mml:mrow></mml:mrow></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mrow><mml:mover><mml:mrow><mml:mi mathvariant="normal">&#x03B4;</mml:mi></mml:mrow><mml:mo 
stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow></mml:mrow></mml:mrow><mml:mi>z</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:mstyle></mml:math>
</disp-formula></p>
<p>The cost function is the sum of the MSE, an <inline-formula id="ieqn-12">
<mml:math id="mml-ieqn-12"><mml:msub><mml:mi>L</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:math>
</inline-formula> regularization term, and a sparsity regularization term. Here <inline-formula id="ieqn-13">
<mml:math id="mml-ieqn-13"><mml:mrow><mml:mrow><mml:mo mathvariant="italic">&#x3C6;</mml:mo></mml:mrow></mml:mrow></mml:math>
</inline-formula> and <inline-formula id="ieqn-14">
<mml:math id="mml-ieqn-14"><mml:mi>&#x03D1;</mml:mi></mml:math>
</inline-formula> regulate the strength of <inline-formula id="ieqn-15">
<mml:math id="mml-ieqn-15"><mml:msub><mml:mi>L</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:math>
</inline-formula> regularization and sparsity, respectively.<disp-formula id="eqn-7"><label>(7)</label>
<mml:math id="mml-eqn-7" display="block"><mml:msub><mml:mi>&#x03BE;</mml:mi><mml:mrow><mml:mi>S</mml:mi><mml:mi>p</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>s</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:msub><mml:mtext>&#x00A0;</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>W</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x00A0;</mml:mtext><mml:mi>b</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>M</mml:mi><mml:mi>S</mml:mi><mml:mi>E</mml:mi><mml:mo>+</mml:mo><mml:mrow><mml:mrow><mml:mo mathvariant="italic">&#x3C6;</mml:mo></mml:mrow></mml:mrow><mml:mo>&#x2217;</mml:mo><mml:msub><mml:mi mathvariant="normal">&#x03A9;</mml:mi><mml:mrow><mml:mi>S</mml:mi><mml:mi>L</mml:mi><mml:mn>2</mml:mn><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>&#x03D1;</mml:mi><mml:mo>&#x2217;</mml:mo><mml:msub><mml:mi mathvariant="normal">&#x03A9;</mml:mi><mml:mrow><mml:mi>S</mml:mi><mml:mi>p</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi><mml:mi>y</mml:mi></mml:mrow></mml:msub></mml:math>
</disp-formula></p>
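As a concrete illustration, the cost function of Eq. (7) can be sketched in Python with NumPy. This is a minimal sketch rather than the authors' implementation; the function names and the default values of the sparsity target &#948; (`delta`), the L2 weight &#x3C6; (`phi`), and the sparsity weight &#x03D1; (`theta`) are illustrative assumptions.

```python
import numpy as np

def kl_bernoulli(delta, delta_hat, eps=1e-8):
    """KL divergence between Bernoulli(delta) and Bernoulli(delta_hat), as in Eq. (6)."""
    delta_hat = np.clip(delta_hat, eps, 1 - eps)   # avoid log(0)
    return (delta * np.log(delta / delta_hat)
            + (1 - delta) * np.log((1 - delta) / (1 - delta_hat)))

def sparse_cost(x, x_hat, hidden_act, W, delta=0.05, phi=1e-4, theta=1.0):
    """Cost of Eq. (7): MSE + phi * L2 penalty + theta * sparsity penalty."""
    mse = np.mean((x - x_hat) ** 2)
    l2 = 0.5 * np.sum(W ** 2)
    delta_hat = hidden_act.mean(axis=0)    # average activation of each hidden unit
    sparsity = np.sum(kl_bernoulli(delta, delta_hat))
    return mse + phi * l2 + theta * sparsity
```

When every hidden unit's average activation equals the target &#948;, the sparsity penalty vanishes, leaving only the reconstruction and weight-decay terms.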
<p>The proposed attack detection scheme performs feature selection through the DAE. This yields the most informative features and removes irrelevant ones, reducing computational complexity and improving attack detection performance. In the proposed scheme, mutual information (MI) is incorporated into the feature selection process.</p>
<p>MI measures the mutual dependency between two random variables: it quantifies how much information one random variable carries about another, i.e., the reduction in uncertainty about one variable that results from observing the other. As stated in <xref ref-type="disp-formula" rid="eqn-8">Eq. (8)</xref>, MI is related to the concept of entropy <inline-formula id="ieqn-16">
<mml:math id="mml-ieqn-16"><mml:mi>&#x03C8;</mml:mi></mml:math>
</inline-formula>, which is the expected information content of a random variable <italic>R</italic>:<disp-formula id="eqn-8"><label>(8)</label>
<mml:math id="mml-eqn-8" display="block"><mml:mi>&#x03C8;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>R</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mi>y</mml:mi></mml:msub><mml:mo>&#x2061;</mml:mo><mml:mi>&#x03BC;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>g</mml:mi><mml:mi>&#x03BC;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mtext>&#x00A0;</mml:mtext></mml:math>
</disp-formula></p>
<p>Here, <inline-formula id="ieqn-17">
<mml:math id="mml-ieqn-17"><mml:mi>&#x03BC;</mml:mi></mml:math>
</inline-formula> represents the probability of occurrence of an event with index <italic>y</italic>. The conditional entropy of two random variables <italic>R</italic> and <italic>S</italic> with values <inline-formula id="ieqn-18">
<mml:math id="mml-ieqn-18"><mml:msub><mml:mi>r</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:math>
</inline-formula> and <inline-formula id="ieqn-19">
<mml:math id="mml-ieqn-19"><mml:msub><mml:mi>s</mml:mi><mml:mi>z</mml:mi></mml:msub></mml:math>
</inline-formula> can be defined as shown in <xref ref-type="disp-formula" rid="eqn-9">Eq. (9)</xref><disp-formula id="eqn-9"><label>(9)</label>
<mml:math id="mml-eqn-9" display="block"><mml:mi>&#x03C8;</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>R</mml:mi><mml:mtext>&#x00A0;</mml:mtext><mml:mrow><mml:mo fence="false" stretchy="false">|</mml:mo></mml:mrow><mml:mtext>&#x00A0;</mml:mtext><mml:mi>S</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x00A0;</mml:mtext><mml:mi>z</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2061;</mml:mo><mml:mi>&#x03BC;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x00A0;&#x00A0;</mml:mtext><mml:msub><mml:mi>s</mml:mi><mml:mi>z</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>g</mml:mi><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:mi>&#x03BC;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x00A0;&#x00A0;</mml:mtext><mml:msub><mml:mi>s</mml:mi><mml:mi>z</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>&#x03BC;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mi>z</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:math>
</disp-formula></p>
<p>Here <inline-formula id="ieqn-20">
<mml:math id="mml-ieqn-20"><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x00A0;&#x00A0;</mml:mtext><mml:msub><mml:mi>s</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</inline-formula> represents the joint probability distribution. Then, <inline-formula id="ieqn-21">
<mml:math id="mml-ieqn-21"><mml:mi>M</mml:mi><mml:mi>I</mml:mi></mml:math>
</inline-formula> of two discrete variables <italic>R</italic> and <italic>S</italic> can be described as<disp-formula id="ueqn-2">
<mml:math id="mml-ueqn-2" display="block"><mml:mi>I</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>R</mml:mi><mml:mo>;</mml:mo><mml:mtext>&#x00A0;</mml:mtext><mml:mi>S</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>&#x03C8;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>R</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03C8;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>R</mml:mi><mml:mrow><mml:mo fence="false" stretchy="false">|</mml:mo></mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</disp-formula><disp-formula id="ueqn-3">
<mml:math id="mml-ueqn-3" display="block"><mml:mspace width="1em" /><mml:mspace width="2em" /><mml:mo>=</mml:mo><mml:mi>&#x03C8;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>R</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mi>&#x03C8;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03C8;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>R</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x00A0;</mml:mtext><mml:mi>S</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</disp-formula><disp-formula id="eqn-10"><label>(10)</label>
<mml:math id="mml-eqn-10" display="block"><mml:mspace width="1em" /><mml:mspace width="2em" /><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x00A0;</mml:mtext><mml:mi>z</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2061;</mml:mo><mml:mi>&#x03BC;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x00A0;&#x00A0;</mml:mtext><mml:msub><mml:mi>s</mml:mi><mml:mi>z</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>g</mml:mi><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:mi>&#x03BC;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x00A0;&#x00A0;</mml:mtext><mml:msub><mml:mi>s</mml:mi><mml:mi>z</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>&#x03BC;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mi>&#x03BC;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mi>z</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow><mml:mtext>&#x00A0;</mml:mtext></mml:mstyle></mml:math>
</disp-formula></p>
<p>Here, <inline-formula id="ieqn-22">
<mml:math id="mml-ieqn-22"><mml:mi>&#x03C8;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>R</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x00A0;</mml:mtext><mml:mi>S</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</inline-formula> indicates the joint entropy. A larger MI value implies a greater reduction in the uncertainty of one variable given the other, whereas a lower value implies that little uncertainty is removed.</p>
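The entropy and MI definitions above can be illustrated numerically. The sketch below is an illustration rather than the paper's code; it computes &#x03C8; from a probability vector (using the natural logarithm) and obtains MI from a joint distribution via the standard identity I(R;&#x00A0;S) = &#x03C8;(R) + &#x03C8;(S) &#x2212; &#x03C8;(R,&#x00A0;S).

```python
import numpy as np

def entropy(p):
    """psi of Eq. (8): expected information content of a distribution
    (natural logarithm used here for simplicity)."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]                      # 0 * log 0 is taken as 0
    return -np.sum(p * np.log(p))

def mutual_information(joint):
    """MI of two discrete variables from their joint distribution, using the
    standard identity I(R; S) = psi(R) + psi(S) - psi(R, S)."""
    joint = np.asarray(joint, dtype=float)
    mu_r = joint.sum(axis=1)          # marginal distribution of R
    mu_s = joint.sum(axis=0)          # marginal distribution of S
    return entropy(mu_r) + entropy(mu_s) - entropy(joint)
```

For independent variables the joint factorizes and MI is zero; for perfectly dependent variables MI equals the entropy of either variable.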
<p>The proposed scheme classifies the data into two classes, attack and normal, using SVM. To perform this operation, a linear SVM with GD as the optimizer is employed. Linear SVM is a supervised ML technique for solving binary classification problems. Several hyperplanes may separate the classes, so a technique for determining the optimal one is necessary. SVM seeks the best decision boundary by maximizing the margin between the boundary and the nearest data instances. These nearest instances, which establish the maximum margin, are called support vectors.</p>
<p>Given training data of <italic>n</italic> instances <inline-formula id="ieqn-23">
<mml:math id="mml-ieqn-23"><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x00A0;&#x00A0;</mml:mtext><mml:msub><mml:mi>s</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mtext>&#x00A0;</mml:mtext><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mtext>&#x00A0;</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>n</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x00A0;&#x00A0;</mml:mtext><mml:msub><mml:mi>s</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</inline-formula>, where <inline-formula id="ieqn-24">
<mml:math id="mml-ieqn-24"><mml:msub><mml:mi>s</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:math>
</inline-formula> is the true class label of input data <inline-formula id="ieqn-25">
<mml:math id="mml-ieqn-25"><mml:msub><mml:mi>r</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mtext>&#x00A0;</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>y</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x00A0;</mml:mtext><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mtext>&#x00A0;</mml:mtext><mml:mi>n</mml:mi><mml:mtext>&#x00A0;</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</inline-formula> and is either <inline-formula id="ieqn-26">
<mml:math id="mml-ieqn-26"><mml:mn>1</mml:mn></mml:math>
</inline-formula> or <inline-formula id="ieqn-27">
<mml:math id="mml-ieqn-27"><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:math>
</inline-formula>, the decision boundary is defined as<disp-formula id="eqn-11"><label>(11)</label>
<mml:math id="mml-eqn-11" display="block"><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msup><mml:mi>W</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:msub><mml:mi>r</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mi>b</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:math>
</disp-formula></p>
<p>Here, <italic>w</italic> and <italic>b</italic> represent the weight vector and bias, respectively.</p>
<p>To prevent data instances from lying on the wrong side, the following constraints are enforced for each <inline-formula id="ieqn-28">
<mml:math id="mml-ieqn-28"><mml:mi>y</mml:mi></mml:math>
</inline-formula>:<disp-formula id="eqn-12"><label>(12)</label>
<mml:math id="mml-eqn-12" display="block"><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mrow><mml:mi>i</mml:mi><mml:mi>f</mml:mi><mml:mtext>&#x00A0;</mml:mtext><mml:msub><mml:mi>s</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x00A0;</mml:mtext></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:msup><mml:mi>w</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:msub><mml:mi>r</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mi>b</mml:mi><mml:mo>&#x2265;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math>
</disp-formula><disp-formula id="eqn-13"><label>(13)</label>
<mml:math id="mml-eqn-13" display="block"><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mrow><mml:mi>i</mml:mi><mml:mi>f</mml:mi><mml:mtext>&#x00A0;</mml:mtext><mml:msub><mml:mi>s</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x00A0;</mml:mtext></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:msup><mml:mi>w</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:msub><mml:mi>r</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mi>b</mml:mi><mml:mo>&#x2264;</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math>
</disp-formula></p>
<p><xref ref-type="disp-formula" rid="eqn-12">Eqs. (12)</xref> and <xref ref-type="disp-formula" rid="eqn-13">(13)</xref> can be combined as<disp-formula id="eqn-14"><label>(14)</label>
<mml:math id="mml-eqn-14" display="block"><mml:msub><mml:mi>s</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msup><mml:mi>w</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:msub><mml:mi>r</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mi>b</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2265;</mml:mo><mml:mn>1</mml:mn><mml:mspace width="1em" /><mml:mtext>&#x00A0;</mml:mtext><mml:mi>f</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mtext>&#x00A0;</mml:mtext><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>l</mml:mi><mml:mtext>&#x00A0;</mml:mtext><mml:mn>1</mml:mn><mml:mo>&#x2264;</mml:mo><mml:mi>y</mml:mi><mml:mo>&#x2264;</mml:mo><mml:mi>n</mml:mi></mml:math>
</disp-formula></p>
<p>SVM can address non-linearly separable problems by employing the kernel technique, which maps the original data into a higher-dimensional space. One drawback is that SVM may require a lengthy training period: despite producing high-performance outcomes, its training times are frequently long compared with alternative classifiers. However, a linear variant of SVM is used in this work, which shortens the training time while achieving comparable results.</p>
<p>SVM is optimized using the hinge loss as its loss function. The hinge loss can be defined for an output <inline-formula id="ieqn-29">
<mml:math id="mml-ieqn-29"><mml:msub><mml:mi>s</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mo>&#x00B1;</mml:mo><mml:mn>1</mml:mn></mml:math>
</inline-formula> as<disp-formula id="eqn-15"><label>(15)</label>
<mml:math id="mml-eqn-15" display="block"><mml:mo movablelimits="true" form="prefix">max</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x00A0;&#x00A0;</mml:mtext><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mtext>&#x00A0;</mml:mtext><mml:msub><mml:mi>s</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</disp-formula><disp-formula id="eqn-16"><label>(16)</label>
<mml:math id="mml-eqn-16" display="block"><mml:mi>c</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x00A0;&#x00A0;</mml:mtext><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x00A0;&#x00A0;</mml:mtext><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</disp-formula><disp-formula id="eqn-17"><label>(17)</label>
<mml:math id="mml-eqn-17" display="block"><mml:mi>c</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x00A0;</mml:mtext><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mtext>&#x00A0;&#x00A0;</mml:mtext><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd columnalign="left"><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x00A0;</mml:mtext></mml:mrow></mml:mtd><mml:mtd columnalign="left"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>y</mml:mi><mml:mtext>&#x00A0;</mml:mtext></mml:mrow></mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2265;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd columnalign="left"><mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mtext>&#x00A0;</mml:mtext></mml:mrow></mml:mtd><mml:mtd columnalign="left"><mml:mrow><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>w</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow></mml:math>
</disp-formula></p>
<p>The objective function <inline-formula id="ieqn-30">
<mml:math id="mml-ieqn-30"><mml:mi>&#x03F5;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>w</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math>
</inline-formula> presented in <xref ref-type="disp-formula" rid="eqn-18">Eq. (18)</xref> consists of two terms: a regularization term and a loss term. Because the hinge loss is convex, standard convex optimizers can be employed. The objective function to be minimized is:<disp-formula id="eqn-18"><label>(18)</label>
<mml:math id="mml-eqn-18" display="block"><mml:mi>M</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>z</mml:mi><mml:mi>e</mml:mi><mml:mtext>&#x00A0;</mml:mtext><mml:mi>&#x03F5;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>w</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:mrow><mml:mo mathvariant="italic">&#x3C6;</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:mfrac></mml:mrow><mml:mo fence="false" stretchy="false">&#x2016;</mml:mo><mml:msup><mml:mi>w</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo fence="false" stretchy="false">&#x2016;</mml:mo><mml:mo>+</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac></mml:mrow><mml:msubsup><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>y</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:mo>&#x2061;</mml:mo><mml:mo movablelimits="true" form="prefix">max</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x00A0;&#x00A0;</mml:mtext><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mtext>&#x00A0;</mml:mtext></mml:mstyle></mml:mstyle></mml:math>
</disp-formula></p>
<p>GD iteratively updates the parameters in the direction opposite to the gradient, which requires derivatives with respect to <italic>b</italic> and <italic>w</italic>. However, because the hinge loss is not differentiable at the hinge point, the following sub-gradient is utilized for <italic>w</italic> and <inline-formula id="ieqn-31">
<mml:math id="mml-ieqn-31"><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</inline-formula>:<disp-formula id="eqn-19"><label>(19)</label>
<mml:math id="mml-eqn-19" display="block"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:mi>W</mml:mi></mml:mrow></mml:mfrac></mml:mrow><mml:mo movablelimits="true" form="prefix">max</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x00A0;&#x00A0;</mml:mtext><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd columnalign="left"><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x00A0;</mml:mtext></mml:mrow></mml:mtd><mml:mtd columnalign="left"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>y</mml:mi><mml:mtext>&#x00A0;</mml:mtext></mml:mrow></mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2265;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd columnalign="left"><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mi>y</mml:mi></mml:msub><mml:msub><mml:mi>r</mml:mi><mml:mi>y</mml:mi></mml:msub></mml:mrow></mml:mtd><mml:mtd columnalign="left"><mml:mrow><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>w</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:mtd></mml:mtr></mml:mtable><mml:mtext>&#x00A0;</mml:mtext></mml:mrow><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow></mml:mstyle></mml:math>
</disp-formula></p>
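A minimal sketch of this training procedure is given below: sub-gradient descent on the hinge-loss objective of Eq. (18), applying the sub-gradient &#x2212;s<sub>y</sub>r<sub>y</sub> only to margin-violating samples. This is an illustration under stated assumptions, not the authors' implementation; the function names and default values of the regularization weight, learning rate, and epoch count are assumptions.

```python
import numpy as np

def train_linear_svm(R, s, phi=0.01, lr=0.1, epochs=200):
    """Minimize Eq. (18): (phi/2)*||w||^2 + (1/n)*sum(max(0, 1 - s_y*f(r_y)))
    with f(r_y) = w^T r_y + b, by sub-gradient descent on the hinge loss."""
    n, d = R.shape
    w, b = np.zeros(d), 0.0
    for _ in range(epochs):
        margins = s * (R @ w + b)
        active = margins < 1                  # samples violating the margin
        # sub-gradient: regularizer term plus -s_y * r_y over active samples
        grad_w = phi * w - (s[active] @ R[active]) / n
        grad_b = -np.sum(s[active]) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

def predict(R, w, b):
    """Classify by the sign of the decision function of Eq. (11)."""
    return np.where(R @ w + b >= 0, 1, -1)
```

Samples that already satisfy the margin contribute nothing to the loss gradient, so only the regularizer shrinks the weights on those steps.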
</sec>
<sec id="s3_2_3">
<label>3.2.3</label><title>Performance Evaluation Metrics</title>
<p>To analyze the proposed DAE, several performance assessment metrics are defined. Each of these metrics is described in the following.<list list-type="simple"><list-item>
<p><italic>a) Accuracy:</italic> The most commonly used performance indicator, it is the ratio of correctly predicted observations to the total number of observations.<disp-formula id="ueqn-4">
<mml:math id="mml-ueqn-4" display="block"><mml:mi>A</mml:mi><mml:mi>c</mml:mi><mml:mi>c</mml:mi><mml:mi>u</mml:mi><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>c</mml:mi><mml:mi>y</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi>p</mml:mi><mml:mi>o</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mi>e</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi>p</mml:mi><mml:mi>o</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>p</mml:mi><mml:mi>o</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mi>e</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mi>e</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:math>
</disp-formula></p></list-item><list-item>
<p><italic>b) Precision:</italic> It is the proportion of correctly predicted positive observations to all predicted positive observations.<disp-formula id="ueqn-5">
<mml:math id="mml-ueqn-5" display="block"><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi>p</mml:mi><mml:mi>o</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi>p</mml:mi><mml:mi>o</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>p</mml:mi><mml:mi>o</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:math>
</disp-formula></p></list-item><list-item>
<p><italic>c) Recall:</italic> It is the proportion of correctly predicted positive observations to all actual positive observations.<disp-formula id="ueqn-6">
<mml:math id="mml-ueqn-6" display="block"><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi>p</mml:mi><mml:mi>o</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi>p</mml:mi><mml:mi>o</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mi>e</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:math>
</disp-formula></p></list-item><list-item>
<p><italic>d) F1 Score:</italic> It is the harmonic mean of precision and recall. As a result, this score takes both false positives and false negatives into account. While F1 is less intuitive than accuracy, it is often more informative, especially when the class distribution is uneven.<disp-formula id="ueqn-7">
<mml:math id="mml-ueqn-7" display="block"><mml:mi>F</mml:mi><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>S</mml:mi><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mo>=</mml:mo><mml:mn>2</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>l</mml:mi></mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo>+</mml:mo><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:math>
</disp-formula></p></list-item></list></p>
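The four metrics above follow directly from the confusion-matrix counts. A small helper (illustrative, not from the paper; the function name is an assumption) computes all of them at once.

```python
def classification_metrics(t_pos, t_neg, f_pos, f_neg):
    """Accuracy, precision, recall, and F1 score from confusion-matrix counts."""
    accuracy = (t_pos + t_neg) / (t_pos + f_pos + f_neg + t_neg)
    precision = t_pos / (t_pos + f_pos)
    recall = t_pos / (t_pos + f_neg)
    f1_score = 2 * (precision * recall) / (precision + recall)
    return accuracy, precision, recall, f1_score
```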
</sec>
</sec>
</sec>
<sec id="s4">
<label>4</label><title>Implementation and Performance Analysis</title>
<p>This section presents the proposed framework&#x2019;s implementation details and evaluates the effectiveness of the proposed DAE through extensive experimentation.</p>
<sec id="s4_1">
<label>4.1</label><title>Simulation Platform</title>
<p>The simulations and performance analysis of the proposed DAE are performed on a Dell Precision 7550 Data Science workstation. It contains an Intel Xeon W-10855M processor and 32 GB of DDR4 2933&#x2005;MHz ECC memory. An NVIDIA Quadro RTX 5000 graphics card with 16 GB of memory ensures the smooth operation of the DAE. The main algorithm of the proposed DAE is written in Python within the &#x201C;Anaconda Navigator&#x201D; environment.</p>
</sec>
<sec id="s4_2">
<label>4.2</label><title>Selection of Hyperparameters</title>
<p>In all experiments, the main structure of the DAE is fixed. Through extensive experiments, we selected the hyperparameters of the proposed model that ensure the best performance on all the datasets. The tuned hyperparameters are the learning rate, batch size, number of epochs, and latent space dimension; the selected values are listed in <xref ref-type="table" rid="table-3">Table 3</xref>.</p>
<table-wrap id="table-3"><label>Table 3</label>
<caption><title>Utilized hyperparameters for training and performance evaluation</title></caption>
<table><colgroup><col align="left"/><col align="left"/><col align="left"/>
</colgroup>
<thead valign="top">
<tr>
<th align="left" rowspan="2">Hyperparameters</th>
<th align="center" colspan="2">Datasets</th>
</tr>
<tr>
<th align="left">NSL-KDD</th>
<th align="left">UNSW-NB15</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Learning rate</td>
<td align="left">0.001, 0.01, 0.10</td>
<td align="left">0.005, 0.075, 0.150</td>
</tr>
<tr>
<td align="left">Batch size</td>
<td align="left">32, 64, 128</td>
<td align="left">64, 128, 256</td>
</tr>
<tr>
<td align="left">Number of epochs</td>
<td align="left">100</td>
<td align="left">100</td>
</tr>
<tr>
<td align="left">Latent space</td>
<td align="left">12</td>
<td align="left">14</td>
</tr>
</tbody>
</table>
</table-wrap>
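<p>The grid in Table 3 can be sketched as a simple enumeration of learning-rate and batch-size combinations per dataset; every pair is trained for a fixed number of epochs with a fixed latent-space size. The loop below is a minimal illustrative sketch, not the authors&#x2019; code, and the <monospace>configurations</monospace> helper is a hypothetical name:</p>

```python
# Sketch (assumed, not the authors' implementation) of the hyperparameter
# grid from Table 3: each dataset is evaluated over every learning-rate /
# batch-size pair, with epochs and latent space held fixed.
from itertools import product

GRID = {
    "NSL-KDD":   {"lr": [0.001, 0.01, 0.10],   "batch": [32, 64, 128],  "latent": 12},
    "UNSW-NB15": {"lr": [0.005, 0.075, 0.150], "batch": [64, 128, 256], "latent": 14},
}
EPOCHS = 100  # fixed for all runs (Table 3)

def configurations(dataset: str):
    """Enumerate every training configuration for one dataset."""
    g = GRID[dataset]
    return [{"lr": lr, "batch": b, "latent": g["latent"], "epochs": EPOCHS}
            for lr, b in product(g["lr"], g["batch"])]

# Each dataset yields 3 learning rates x 3 batch sizes = 9 training runs.
print(len(configurations("NSL-KDD")))  # 9
```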
<sec id="s4_2_1">
<label>4.2.1</label><title>Learning Rate</title>
<p>This parameter defines how much the model weights change in response to the estimated error each time they are updated. Selecting the learning rate is tricky: a value that is too small may result in a protracted training process that stalls, while a value that is too large may cause the model to converge too quickly to a suboptimal set of weights or make training unstable.</p>
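<p>The trade-off described above can be demonstrated on a toy objective. The sketch below (not related to the paper&#x2019;s model) runs plain gradient descent on <italic>f</italic>(<italic>w</italic>) = <italic>w</italic><sup>2</sup> and shows a too-small rate barely moving, a moderate rate converging, and a too-large rate diverging:</p>

```python
# Toy demonstration of the learning-rate trade-off: gradient descent on
# f(w) = w**2, whose gradient is 2*w. Returns |w| after `steps` updates.

def descend(lr: float, steps: int = 50, w0: float = 1.0) -> float:
    w = w0
    for _ in range(steps):
        w -= lr * 2 * w  # one gradient-descent update
    return abs(w)

print(descend(0.001) > 0.5)   # True: too small, barely moved after 50 steps
print(descend(0.10) < 1e-3)   # True: moderate rate converges near the minimum
print(descend(1.10) > 1.0)    # True: too large, |w| grows (divergence)
```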
</sec>
<sec id="s4_2_2">
<label>4.2.2</label><title>Batch Size</title>
<p>This parameter specifies how many samples must be processed before the internal model parameters are updated.</p>
</sec>
<sec id="s4_2_3">
<label>4.2.3</label><title>Number of Epochs</title>
<p>This parameter indicates how many times the learning algorithm will traverse the entire training dataset. Each epoch gives every sample in the training dataset an opportunity to update the internal model parameters.</p>
</sec>
<sec id="s4_2_4">
<label>4.2.4</label><title>Latent Space</title>
<p>Latent space is a compressed representation of the data in which similar data points lie relatively close together. It can be used to discover salient features of the data and to develop simpler representations of the data for analysis.</p>
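<p>To make the compression concrete, the sketch below shows an encoder mapping a 41-feature NSL-KDD record into the 12-dimensional latent space used in this work, and a decoder reconstructing it. The weights and activation are assumptions for illustration only; this is not the authors&#x2019; exact DAE architecture:</p>

```python
# Minimal numpy sketch (assumed architecture) of autoencoder compression:
# a 41-feature input is squeezed into a 12-dimensional latent vector
# (the NSL-KDD latent-space setting from Table 3) and reconstructed.
import numpy as np

rng = np.random.default_rng(0)
n_features, latent_dim = 41, 12  # NSL-KDD features; latent space = 12

W_enc = rng.standard_normal((n_features, latent_dim)) * 0.1  # untrained weights
W_dec = rng.standard_normal((latent_dim, n_features)) * 0.1

def encode(x):
    return np.tanh(x @ W_enc)    # compressed latent representation

def decode(z):
    return z @ W_dec             # reconstruction back to feature space

x = rng.standard_normal(n_features)   # one dummy record
z = encode(x)
x_hat = decode(z)
print(z.shape, x_hat.shape)           # (12,) (41,)
```

<p>In training, the weights would be fitted to minimize the reconstruction error between <monospace>x</monospace> and <monospace>x_hat</monospace>.</p>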
</sec>
</sec>
<sec id="s4_3">
<label>4.3</label><title>Performance Analysis</title>
<p>All datasets have been split into training and testing subsets at a 70/30 ratio. The detailed distribution of all the datasets is presented in <xref ref-type="table" rid="table-1">Tables 1</xref> and <xref ref-type="table" rid="table-2">2</xref>. In the following, we discuss the performance of the DAE for each dataset.</p>
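<p>A 70/30 shuffled split of this kind can be sketched as follows. The helper name and the dummy data are illustrative; the authors&#x2019; preprocessing pipeline is not specified at this level of detail:</p>

```python
# Generic sketch of a shuffled 70/30 train/test split (not the authors'
# exact preprocessing). Rows are permuted, then cut at the 70% mark.
import numpy as np

def split_70_30(X: np.ndarray, seed: int = 42):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))        # shuffle record indices
    cut = int(0.7 * len(X))              # 70% boundary
    return X[idx[:cut]], X[idx[cut:]]

X = np.arange(1000).reshape(500, 2)      # 500 dummy records, 2 features
train, test = split_70_30(X)
print(len(train), len(test))             # 350 150
```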
<sec id="s4_3_1">
<label>4.3.1</label><title>Performance Evaluation with NSL-KDD</title>
<p>The simulations for the NSL-KDD dataset have been conducted with learning rates of 0.001, 0.01, and 0.10 and batch sizes of 32, 64, and 128. The latent space is fixed at 12 and all simulations are executed for 100 epochs. The efficiency of the suggested DAE is evaluated for both binary class and multiclass scenarios using the NSL-KDD dataset.<list list-type="simple"><list-item>
<p><italic>a) Binary-class Performance Assessment:</italic> In the first batch of experiments, the effectiveness of the suggested scheme was evaluated for the NSL-KDD dataset in the binary class scenario. Experiments were conducted using three learning rates (0.001, 0.01, and 0.10) and batch sizes of 32, 64, and 128. Performance scores of the proposed DAE for batch size 32 are presented as bar graphs in <xref ref-type="fig" rid="fig-3">Fig. 3</xref>. The suggested scheme attained the highest accuracy of 98.86&#x0025; at a learning rate of 0.001. In the second stage, all experiments were repeated for batch size 64 using the same learning rates. Performance scores of the proposed DAE for batch size 64 are presented as bar graphs in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>. The suggested DAE attained the highest accuracy of 98.68&#x0025; at a learning rate of 0.001. In the third stage, all experiments were repeated for batch size 128 using the same learning rates. Performance scores of the proposed DAE for batch size 128 are presented as bar graphs in <xref ref-type="fig" rid="fig-5">Fig. 5</xref>. The suggested scheme attained the highest accuracy of 98.51&#x0025; at a learning rate of 0.001.</p>
</list-item><list-item>
<p><italic>b) Multiclass Performance Evaluation:</italic> In the second batch of experiments, the effectiveness of the suggested scheme was evaluated for the NSL-KDD dataset in the multiclass classification scenario. Experiments were conducted using three learning rates (0.001, 0.01, and 0.10) and batch sizes of 32, 64, and 128. As in the binary class experiments, a batch size of 32 was selected in the first stage. Performance scores of the suggested DAE for batch size 32 are presented as bar graphs in <xref ref-type="fig" rid="fig-6">Fig. 6</xref>. The suggested scheme attained the highest accuracy of 98.14&#x0025; at a learning rate of 0.001. In the second stage, all experiments were repeated for batch size 64 using the same learning rates. Performance scores of the DAE for batch size 64 are presented as bar graphs in <xref ref-type="fig" rid="fig-7">Fig. 7</xref>. The suggested DAE attained an accuracy of 98.08&#x0025; at a learning rate of 0.001. In the third stage, all experiments were repeated for batch size 128 using the same learning rates. Performance scores of the suggested technique for batch size 128 are presented as bar graphs in <xref ref-type="fig" rid="fig-8">Fig. 8</xref>. The suggested scheme attained the highest accuracy of 98.26&#x0025; at a learning rate of 0.001.</p>
</list-item></list></p>
<fig id="fig-3">
<label>Figure 3</label>
<caption><title>Binary class evaluation with NSL-KDD for batch size 32</title></caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_34277-fig-3.tif"/>
</fig><fig id="fig-4">
<label>Figure 4</label>
<caption><title>Binary class evaluation with NSL-KDD for batch size 64</title></caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_34277-fig-4.tif"/>
</fig><fig id="fig-5">
<label>Figure 5</label>
<caption><title>Binary class evaluation with NSL-KDD for batch size 128</title></caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_34277-fig-5.tif"/>
</fig>
<fig id="fig-6">
<label>Figure 6</label>
<caption><title>Multiclass evaluation with NSL-KDD for batch size 32</title></caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_34277-fig-6.tif"/>
</fig><fig id="fig-7">
<label>Figure 7</label>
<caption><title>Multiclass evaluation with NSL-KDD for batch size 64</title></caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_34277-fig-7.tif"/>
</fig><fig id="fig-8">
<label>Figure 8</label>
<caption><title>Multiclass evaluation with NSL-KDD for batch size 128</title></caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_34277-fig-8.tif"/>
</fig>
</sec>
<sec id="s4_3_2">
<label>4.3.2</label><title>Performance Evaluation with UNSW-NB15</title>
<p>The simulations for the UNSW-NB15 dataset have been conducted with learning rates of 0.005, 0.075, and 0.150 and batch sizes of 64, 128, and 256. The latent space is fixed at 14 and all simulations are executed for 100 epochs. The effectiveness of the suggested scheme is analyzed in both binary class and multiclass scenarios using the UNSW-NB15 dataset.</p><list list-type="simple"><list-item>
<p><italic>a) Binary-class Performance Assessment:</italic> In the third batch of experiments, the performance of the suggested scheme was analyzed for the UNSW-NB15 dataset in the binary class scenario. Experiments were conducted using three learning rates (0.005, 0.075, and 0.150) and batch sizes of 64, 128, and 256. In the first stage, the performance of the proposed DAE was evaluated for batch size 64. Performance scores for batch size 64 are presented as bar graphs in <xref ref-type="fig" rid="fig-9">Fig. 9</xref>. The suggested scheme attained the highest accuracy of 99.07&#x0025; at a learning rate of 0.075. In the second stage, all experiments were repeated for batch size 128 using the same learning rates. Performance scores of the DAE for batch size 128 are presented as bar graphs in <xref ref-type="fig" rid="fig-10">Fig. 10</xref>. The suggested scheme attained the highest accuracy of 99.32&#x0025; at a learning rate of 0.075. In the third stage, all experiments were repeated for batch size 256 using the same learning rates. Performance scores of the DAE for batch size 256 are presented as bar graphs in <xref ref-type="fig" rid="fig-11">Fig. 11</xref>. The suggested scheme attained the highest accuracy of 99.21&#x0025; at a learning rate of 0.075.</p>
</list-item><list-item>
<p><italic>b) Multiclass Performance Evaluation: </italic>In the fourth batch of experiments, the effectiveness of the suggested scheme was evaluated for the UNSW-NB15 dataset in the multiclass classification scenario. Experiments were conducted using three learning rates (0.005, 0.075, and 0.150) and batch sizes of 64, 128, and 256. As in the binary class experiments, a batch size of 64 was selected in the first stage. Performance scores of the proposed DAE for batch size 64 are presented as bar graphs in <xref ref-type="fig" rid="fig-12">Fig. 12</xref>. The suggested scheme attained the highest accuracy of 98.82&#x0025; at a learning rate of 0.005. In the second stage, all experiments were repeated for batch size 128 using the same learning rates. Performance scores of the DAE for batch size 128 are presented as bar graphs in <xref ref-type="fig" rid="fig-13">Fig. 13</xref>. The suggested DAE attained the highest accuracy of 98.79&#x0025; at a learning rate of 0.005. In the third stage, all experiments were repeated for batch size 256 using the same learning rates. Performance scores of the DAE for batch size 256 are presented as bar graphs in <xref ref-type="fig" rid="fig-14">Fig. 14</xref>. The suggested scheme attained the highest accuracy of 98.66&#x0025; at a learning rate of 0.005.</p>
</list-item></list>
<fig id="fig-9">
<label>Figure 9</label>
<caption><title>Binary class evaluation with UNSW-NB15 for batch size 64</title></caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_34277-fig-9.tif"/>
</fig><fig id="fig-10">
<label>Figure 10</label>
<caption><title>Binary class evaluation with UNSW-NB15 for batch size 128</title></caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_34277-fig-10.tif"/>
</fig><fig id="fig-11">
<label>Figure 11</label>
<caption><title>Binary class evaluation with UNSW-NB15 for batch size 256</title></caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_34277-fig-11.tif"/>
</fig>
<fig id="fig-12">
<label>Figure 12</label>
<caption><title>Multiclass evaluation with UNSW-NB15 for batch size 64</title></caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_34277-fig-12.tif"/>
</fig><fig id="fig-13">
<label>Figure 13</label>
<caption><title>Multiclass evaluation with UNSW-NB15 for batch size 128</title></caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_34277-fig-13.tif"/>
</fig><fig id="fig-14">
<label>Figure 14</label>
<caption><title>Multiclass evaluation with UNSW-NB15 for batch size 256</title></caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_34277-fig-14.tif"/>
</fig>
</sec>
<sec id="s4_3_3">
<label>4.3.3</label><title>Performance Comparison with the State-of-the-art</title>
<p>To further investigate the efficiency and robustness of the proposed scheme, its performance is also compared with related works. A brief performance comparison is presented in <xref ref-type="table" rid="table-4">Table 4</xref>. The comparison is organized by the utilized DL algorithm, datasets, hyperparameter selection, and attack detection accuracy. First, most researchers utilized computationally intensive deep learning algorithms that are not suitable for deployment in resource-constrained IoT networks. Second, only a few studies focused on selecting suitable hyperparameters for the optimal training of their schemes. Third, most studies presented only a binary class evaluation, while a multiclass evaluation is missing. The performance of the proposed scheme is analyzed in both binary and multiclass scenarios, and it attained higher attack detection accuracies than several state-of-the-art IDSs.</p>
<table-wrap id="table-4"><label>Table 4</label>
<caption><title>Performance comparison with the state-of-the-art IDSs</title></caption>
<table><colgroup><col align="left"/><col align="left"/><col align="left"/><col align="left"/><col align="left"/><col align="left"/>
</colgroup>
<thead valign="top">
<tr>
<th align="left" rowspan="2">Reference</th>
<th align="left" rowspan="2">Proposed scheme</th>
<th align="left" rowspan="2">Utilized dataset</th>
<th align="left" rowspan="2">Hyperparameter selection</th>
<th align="center" colspan="2">Accuracy</th>
</tr>
<tr>
<th align="left">Binary class</th>
<th align="left">Multiclass</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-5">5</xref>]</td>
<td align="left">LSTM</td>
<td align="left">N_BaIoT</td>
<td align="left">No</td>
<td align="left">97.74&#x0025;</td>
<td align="left">Not evaluated</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-6">6</xref>]</td>
<td align="left">NDAE</td>
<td align="left">KDD Cup &#x2018;99 and NSL-KDD</td>
<td align="left">No</td>
<td align="left">Not evaluated</td>
<td align="left">97.85&#x0025;, 80.58</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-7">7</xref>]</td>
<td align="left">DFFNN</td>
<td align="left">NSL-KDD, UNSW-NB15</td>
<td align="left">Yes</td>
<td align="left">99.0&#x0025;, 98.90&#x0025;</td>
<td align="left">93.64&#x0025;, 91.22&#x0025;</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-8">8</xref>]</td>
<td align="left">MV-FLID</td>
<td align="left">MQTT dataset</td>
<td align="left">No</td>
<td align="left">98.0&#x0025;</td>
<td align="left">Not evaluated</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-9">9</xref>]</td>
<td align="left">SLFN</td>
<td align="left">IoTID20</td>
<td align="left">No</td>
<td align="left">86.20&#x0025;</td>
<td align="left">Not evaluated</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-10">10</xref>]</td>
<td align="left">RSRT</td>
<td align="left">SCADA dataset</td>
<td align="left">Yes</td>
<td align="left">96.78&#x0025;</td>
<td align="left">Not evaluated</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-11">11</xref>]</td>
<td align="left">B-MLSTM</td>
<td align="left">CTU-13, gas-water, AWID</td>
<td align="left">No</td>
<td align="left">95.01&#x0025;, 93.41&#x0025; 97.58&#x0025;</td>
<td align="left">Not evaluated</td>
</tr>
<tr>
<td align="left">Proposed scheme</td>
<td align="left">DAE</td>
<td align="left">NSL-KDD, UNSW-NB15</td>
<td align="left">Yes</td>
<td align="left">98.96&#x0025;, 99.32&#x0025;</td>
<td align="left">98.26&#x0025;, 98.82&#x0025;</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
</sec>
<sec id="s5">
<label>5</label><title>Conclusion</title>
<p>This article proposed a novel DAE-based framework for cyberattack detection in IoT networks. The most significant feature of the proposed design is its low computational complexity, which makes it a resource-efficient cybersecurity framework for IoT networks. To achieve the optimum performance of the DAE, a range of suitable hyperparameters was determined for training the neural network, including the learning rate, batch size, latent space, and number of epochs. Extensive experiments were conducted to analyze the efficacy of the suggested scheme using two standard security datasets, NSL-KDD and UNSW-NB15. The performance was evaluated through several assessment metrics, such as accuracy, precision, recall, and F1 score, in both binary class and multiclass scenarios. Experimental results show that the suggested scheme attained high attack detection accuracy and strong scores on the other metrics for both datasets. In future work, a single-board computing platform can be incorporated as a hardware accelerator to improve the speed and performance of the proposed attack detection.</p>
</sec>
</body>
<back>
<sec><title>Funding Statement</title>
<p><funding-source>The Deanship of Scientific Research (DSR)</funding-source> at King Abdulaziz University (KAU), Jeddah, Saudi Arabia has funded this project, under Grant No. (<award-id>IFPDP-279-22</award-id>).</p>
</sec>
<sec sec-type="COI-statement"><title>Conflicts of Interest</title>
<p>The authors declare that they have no conflicts of interest to report regarding the present study.</p>
</sec>
<ref-list content-type="authoryear"><title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>K. P.</given-names> <surname>Dharshini</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Gopalakrishnan</surname></string-name>, <string-name><given-names>C. K.</given-names> <surname>Shankar</surname></string-name> and <string-name><given-names>R.</given-names> <surname>Ramya</surname></string-name></person-group>, &#x201C;<article-title>A survey on IoT applications in smart cities</article-title>,&#x201D; <source>Immersive Technology in Smart Cities</source>, pp. <fpage>179</fpage>&#x2013;<lpage>204</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Latif</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Huma</surname></string-name>, <string-name><given-names>S. S.</given-names> <surname>Jamal</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Ahmed</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Ahmad</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Intrusion detection framework for the internet of things using a dense random neural network</article-title>,&#x201D; <source>IEEE Transactions on Industrial Informatics</source>, vol. <volume>18</volume>, no. <issue>9</issue>, pp. <fpage>6435</fpage>&#x2013;<lpage>6444</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Churcher</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Ullah</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Ahmad</surname></string-name>, <string-name><given-names>S. U.</given-names> <surname>Rehman</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Masood</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>An experimental analysis of attack classification using machine learning in IoT networks</article-title>,&#x201D; <source>Sensors</source>, vol. <volume>21</volume>, no. <issue>2</issue>, pp. <fpage>446</fpage>&#x2013;<lpage>478</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S. M.</given-names> <surname>Tahsien</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Karimipour</surname></string-name> and <string-name><given-names>P.</given-names> <surname>Spachos</surname></string-name></person-group>, &#x201C;<article-title>Machine learning based solutions for security of internet of things (IoT): A survey</article-title>,&#x201D; <source>Journal of Network and Computer Applications</source>, vol. <volume>161</volume>, pp. <fpage>102630</fpage>&#x2013;<lpage>102651</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>G. D. L. T.</given-names> <surname>Parra</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Rad</surname></string-name>, <string-name><given-names>K. K. R.</given-names> <surname>Choo</surname></string-name> and <string-name><given-names>N.</given-names> <surname>Beebe</surname></string-name></person-group>, &#x201C;<article-title>Detecting internet of things attacks using distributed deep learning</article-title>,&#x201D; <source>Journal of Network and Computer Applications</source>, vol. <volume>163</volume>, pp. <fpage>102662</fpage>&#x2013;<lpage>102675</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>N.</given-names> <surname>Shone</surname></string-name>, <string-name><given-names>T. N.</given-names> <surname>Ngoc</surname></string-name>, <string-name><given-names>V. D.</given-names> <surname>Phai</surname></string-name> and <string-name><given-names>Q.</given-names> <surname>Shi</surname></string-name></person-group>, &#x201C;<article-title>A deep learning approach to network intrusion detection</article-title>,&#x201D; <source>IEEE Transactions on Emerging Topics in Computational Intelligence</source>, vol. <volume>2</volume>, no. <issue>1</issue>, pp. <fpage>41</fpage>&#x2013;<lpage>50</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J. B.</given-names> <surname>Awotunde</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Chakraborty</surname></string-name> and <string-name><given-names>A. E.</given-names> <surname>Adeniyi</surname></string-name></person-group>, &#x201C;<article-title>Intrusion detection in industrial internet of things network-based on deep learning model with rule-based feature selection</article-title>,&#x201D; <source>Wireless Communications and Mobile Computing</source>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D. C.</given-names> <surname>Attota</surname></string-name>, <string-name><given-names>V.</given-names> <surname>Mothukuri</surname></string-name>, <string-name><given-names>R. M.</given-names> <surname>Parizi</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Pouriyeh</surname></string-name></person-group>, &#x201C;<article-title>An ensemble multi-view federated learning intrusion detection for iot</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>9</volume>, pp. <fpage>117734</fpage>&#x2013;<lpage>117745</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Qaddoura</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Al-Zoubi</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Faris</surname></string-name> and <string-name><given-names>I.</given-names> <surname>Almomani</surname></string-name></person-group>, &#x201C;<article-title>A multi-layer classification approach for intrusion detection in iot networks based on deep learning</article-title>,&#x201D; <source>Sensors</source>, vol. 21, no. <issue>9</issue>, pp. <fpage>2987</fpage>&#x2013;<lpage>3008</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M. M.</given-names> <surname>Hassan</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Gumaei</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Huda</surname></string-name> and <string-name><given-names>A.</given-names> <surname>Almogren</surname></string-name></person-group>, &#x201C;<article-title>Increasing the trustworthiness in the industrial IoT networks through a reliable cyberattack detection model</article-title>,&#x201D; <source>IEEE Transactions on Industrial Informatics</source>, vol. <volume>16</volume>, no. <issue>9</issue>, pp. <fpage>6154</fpage>&#x2013;<lpage>6162</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Xu</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Vijayakumar</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Kumar</surname></string-name> and <string-name><given-names>X.</given-names> <surname>Liu</surname></string-name></person-group>, &#x201C;<article-title>Detection of low-frequency and multi-stage attacks in industrial internet of things</article-title>,&#x201D; <source>IEEE Transactions on Vehicular Technology</source>, vol. <volume>69</volume>, no. <issue>8</issue>, pp. <fpage>8820</fpage>&#x2013;<lpage>8831</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Luo</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Tan</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Min</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Gan</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Shi</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>A novel web attack detection system for internet of things via ensemble classification</article-title>,&#x201D; <source>IEEE Transactions on Industrial Informatics</source>, vol. <volume>17</volume>, no. <issue>8</issue>, pp. <fpage>5810</fpage>&#x2013;<lpage>5818</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Su</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Sun</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Zhu</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Wang</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>Li</surname></string-name></person-group>, &#x201C;<article-title>BAT: Deep learning methods on network intrusion detection using NSL-KDD dataset</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>8</volume>, pp. <fpage>29575</fpage>&#x2013;<lpage>29585</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>V.</given-names> <surname>Kumar</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Sinha</surname></string-name>, <string-name><given-names>A. K.</given-names> <surname>Das</surname></string-name>, <string-name><given-names>S. C.</given-names> <surname>Pandey</surname></string-name> and <string-name><given-names>R. T.</given-names> <surname>Goswami</surname></string-name></person-group>, &#x201C;<article-title>An integrated rule based intrusion detection system: Analysis on UNSW-NB15 data set and the real time online dataset</article-title>,&#x201D; <source>Cluster Computing</source>, vol. <volume>23</volume>, pp. <fpage>1397</fpage>&#x2013;<lpage>1418</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Ahmad</surname></string-name>, <string-name><given-names>S. A.</given-names> <surname>Shah</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Latif</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Ahmed</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Zou</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>DRaNN_PSO: A deep random neural network with particle swarm optimization for intrusion detection in the industrial internet of things</article-title>,&#x201D; <source>Journal of King Saud University-Computer and Information Sciences</source>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S. J.</given-names> <surname>Lee</surname></string-name>, <string-name><given-names>P. D.</given-names> <surname>Yoo</surname></string-name>, <string-name><given-names>A. T.</given-names> <surname>Asyhari</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Jhi</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Chermak</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>IMPACT: Impersonation attack detection via edge computing using deep autoencoder and feature abstraction</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>8</volume>, pp. <fpage>65520</fpage>&#x2013;<lpage>65529</lpage>, <year>2020</year>.</mixed-citation></ref>
</ref-list>
</back>
</article>