Optimal Deep Learning Driven Intrusion Detection in SDN-Enabled IoT Environment

Abstract: In recent years, wireless networks have been widely used in different domains. This trend has increased the number of Internet of Things (IoT) devices and their applications. Although IoT has numerous advantages, commonly used IoT devices are regularly exposed to cyber-attacks. This scenario necessitates real-time automated detection and mitigation of different types of attacks in high-traffic networks. The


Introduction
The Internet of Things (IoT) refers to a dynamic network that contains smartphones, sensor nodes, software, switches/routers and servers [1]. IoT has developed into a widespread phenomenon. In this dynamic network, real-time data movements or activities are sensed and processed. The IoT network acts as a common platform for data transmission between the physical world and the conventional Internet. The idea of the IoT network has resulted in extensive production, processing and consumption of information [2]. The number of devices connected via the Internet has surpassed the global population and is expected to increase considerably in a few years [3,4].
On the other hand, devices with constrained resources contribute to the security issues in the IoT network, considerably increasing its risks, vulnerabilities and threats [5]. Against this background, an appropriate analysis of the information recorded in the IoT platforms assists in the prediction and early detection of threats [6]. Intrusion Detection Systems (IDSs) can identify malicious activities in the IoT network through real-time traffic analysis. An IDS protects the system from damage through attack detection and classification processes. The Software-Defined Network (SDN) is a novel concept in the networking domain that decouples the packet-forwarding plane from the control plane. The SDN approach provides a global view of the network and centralized control over it [7,8]. Different authors in the literature have focused on developing novel IDSs for conventional networks. These studies focused on the constrained devices in the IoT networks and the recognition of malicious activities in them [9]. Fig. 1 depicts the processes involved in the SDN environment. The concept of an IDS based on software-defined network systems is unique, especially in the IoT environment [6].
In conventional networks, network devices such as routers or switches forward the data through their own control mechanisms [10]. However, the SDN model conceives the network as a programmable entity and decouples the control plane from the data plane. Here, the routers or switches simply act as forwarding devices since the control module is operated from the centralized controller. The SDN approach manages the networks based on the abstraction of low-level functionality and maintains an Application Programming Interface (API) to control the low-level devices [11]. In general, the SDN controller has a global view of the network, making it easy to configure the network as and when required.
Furthermore, if any modifications are to be made to the networking systems, the programmability feature of the SDN makes them relatively straightforward [9]. The security system and the associated features are programmed via an API module and are enforced in the network through flow rules. These rules are administered through the OpenFlow protocol [12]. The programmability feature increases the flexibility of the network. As mentioned earlier, when the networking system is modified, the control plane performs the change instead of every device in the network being reconfigured individually.
In an earlier study [13], an SDN-enabled deep-learning-driven structure was suggested for attack detection in the IoT environment. Existing classifiers such as the CUDA-Deep Neural Network (DNN), CUDA-Bidirectional Long Short-Term Memory (Cu-BLSTM) and the Gated Recurrent Unit (Cu-DNNGRU) were leveraged for attack detection. In that study, a tenfold Cross-Validation (CV) was executed to obtain unbiased results. Shu et al. [14] used a Deep Learning (DL) technique with generative adversarial networks. They explored the distributed SDN to devise a Collaborative IDS (CIDS) for the Vehicular Ad hoc Network (VANET) model. In this model, multiple SDN controllers jointly train a global intrusion detection method for the entire network without any direct exchange of the sub-network flows. Shrestha et al. [15] suggested a security method for satellite-related, Unmanned Aerial Vehicle (UAV) 5G networks in which a Machine Learning (ML) technique was used to identify cyberattacks and other vulnerabilities efficiently. The solution had two major phases: the creation of the intrusion detection model using several ML techniques, and the deployment of the ML-related techniques in satellite or terrestrial gateways.
Aslam et al. [16] suggested an Adaptive ML-related SDN-enabled Distributed Denial of Service (DDoS) attack Detection and Mitigation (AMLSDM) structure. The suggested AMLSDM structure was an SDN-enabled security system for IoT devices. An adaptive ML classification method was utilized to achieve effective detection and mitigation of DDoS attacks. The presented structure used ML methods in an adaptive multi-layered feed-forward scheme to identify the DDoS attacks successfully. This was accomplished by probing the static attributes of the network traffic under review. Derhab et al. [17] suggested a security architecture in which the Software-Defined Network (SDN) model and Blockchain technology were integrated. The suggested IDS security structure was created by integrating the K-Nearest Neighbor (KNN) technique and the Random Subspace Learning (RSL) technique. This was done to defend against forged commands that target industrial control processes, while the Blockchain-related Integrity Checking System (BICS) prevents misrouting attacks that tamper with the OpenFlow rules of SDN-enabled industrial IoT systems.
The current study devises a Harmony Search algorithm-based Feature Selection with Optimal Convolutional Autoencoder (HSAFS-OCAE) for the purpose of intrusion detection in the SDN-enabled IoT environment. The presented HSAFS-OCAE method follows a three-stage process in which the Harmony Search algorithm-based FS (HSAFS) technique is first exploited for feature selection. Next, the CAE algorithm is used for the recognition and classification of the intrusions in the SDN-enabled IoT environment. Finally, the Artificial Fish Swarm Algorithm (AFSA) is used to fine-tune the hyperparameters to boost the intrusion detection performance of the CAE algorithm.
The proposed HSAFS-OCAE technique was experimentally validated, and the results were assessed under different measures.

The Proposed Model
In this study, a new HSAFS-OCAE model has been proposed for the proficient recognition of intrusions in the SDN-enabled IoT environment. The presented HSAFS-OCAE model follows a three-stage process in which the HSAFS technique is exploited at first for feature selection. Next, the CAE approach is leveraged to recognize and classify intrusions in the SDN-enabled IoT environment. Finally, the AFSA-based hyperparameter fine-tuning process is executed to boost the intrusion detection performance of the CAE model. Fig. 2 shows the overall block diagram of the HSAFS-OCAE approach.

Design of HSAFS Technique
In this study, the HSAFS technique is exploited for feature selection. In the HSA method, each candidate solution (a component of the solution space) is named a 'harmony', viz., an n-dimensional real vector. The initial population is assigned arbitrary values and is stored in the Harmony Memory (HM) [18]. Then, a novel candidate harmony is generated in each iteration (improvisation) from the components in the HM, either by adjusting the pitch or through an arbitrary selection of an element from the HM. Afterwards, the newly-evaluated candidate harmony is compared with the worst vector in the HM. This process is repeated until the ending criterion is satisfied. The parameters of the HS optimization approach are as follows: (i) Pitch Adjusting Rate (PAR), (ii) the size of the HM (HMS), (iii) distance bandwidth (BW), (iv) HM Consideration Rate (HMCR), and (v) the number of iterations or improvisations (NI). It is crucial to configure the HM. Assume that $x_i = \{x_i(1), x_i(2), \ldots, x_i(n)\}$ characterizes an arbitrarily-initialized HM vector:
$$x_i(k) = X_l(k) + \left(X_u(k) - X_l(k)\right) \cdot \mathrm{rand}(0, 1), \quad k = 1, 2, \ldots, n, \quad i = 1, 2, \ldots, \mathrm{HMS},$$
where HMS is the size of the HM, and $X_l(k)$ and $X_u(k)$ characterize the lower and the upper bounds of the searching space, respectively. Each row of the HM matrix is a harmony vector.
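A new harmony vector, $x_{new}$, is created from the HM by three operations: (i) memory consideration, (ii) pitch adjustment, and (iii) an arbitrary re-initialization. Initially, for the first decision variable $x_{new}(1)$, an arbitrary number $r_1$ is drawn in the range of 0 to 1. If $r_1$ is smaller than the HMCR, the variable is taken from the HM through memory consideration; otherwise, it is obtained through an arbitrary re-initialization within the search bounds. In the standard HS formulation, these two operations are defined for every decision variable $k$ as
$$x_{new}(k) = \begin{cases} x_i(k) \in \{x_1(k), x_2(k), \ldots, x_{\mathrm{HMS}}(k)\}, & \text{with probability } \mathrm{HMCR}, \\ X_l(k) + \left(X_u(k) - X_l(k)\right) \cdot \mathrm{rand}(0, 1), & \text{with probability } 1 - \mathrm{HMCR}. \end{cases}$$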
The newly-generated $x_{new}$ value is then inspected to decide whether it should be pitch-adjusted or not. For this purpose, a PAR entity is defined by combining the Bandwidth Factor (BF) and the frequency of adjustment. These two factors are adjusted to get a novel $x_{new}$ value around the selected HM solution during the local search process. The pitch-adjusted novel solution is calculated as $x_{new}(k) = x_{new}(k) \pm \mathrm{rand}(0, 1) \cdot BW$ with a probability of PAR. This pitch adjustment is similar to the mutation process in evolutionary bio-inspired techniques. The range of the PAR is limited. At last, $x_{new}$, i.e., the newly-generated harmony vector, is evaluated and compared with the worst harmony vector $x_w$ in the HM. If $x_{new}$ is better, $x_w$ is replaced by $x_{new}$, whereas the rest of the HM is retained. In this way, the HM is updated to maximize the objective function (inter-class variance) for which the HSA is employed.
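The improvisation loop described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the default parameter values, the clamping of adjusted pitches to the search bounds and the choice to minimise the objective (as would suit an error-based fitness) are all assumptions made for illustration.

```python
import random


def init_harmony_memory(hms, lower, upper, rng):
    """Fill the HM with `hms` vectors: x_i(k) = X_l(k) + (X_u(k) - X_l(k)) * rand(0, 1)."""
    return [[lo + (hi - lo) * rng.random() for lo, hi in zip(lower, upper)]
            for _ in range(hms)]


def improvise(hm, lower, upper, hmcr, par, bw, rng):
    """Build one new harmony x_new, one decision variable at a time."""
    x_new = []
    for k, (lo, hi) in enumerate(zip(lower, upper)):
        if rng.random() < hmcr:
            # Memory consideration: copy variable k from a randomly chosen stored harmony.
            value = rng.choice(hm)[k]
            if rng.random() < par:
                # Pitch adjustment: x_new(k) +/- rand(0, 1) * BW.
                value += rng.choice((-1.0, 1.0)) * rng.random() * bw
        else:
            # Arbitrary re-initialization inside the search bounds.
            value = lo + (hi - lo) * rng.random()
        # Assumption: clamp so the result stays within [X_l(k), X_u(k)].
        x_new.append(min(max(value, lo), hi))
    return x_new


def harmony_search(objective, lower, upper, hms=10, hmcr=0.9, par=0.3,
                   bw=0.1, ni=500, seed=0):
    """Minimise `objective`; the worst harmony is replaced when x_new is better."""
    rng = random.Random(seed)
    hm = init_harmony_memory(hms, lower, upper, rng)
    scores = [objective(x) for x in hm]
    for _ in range(ni):
        x_new = improvise(hm, lower, upper, hmcr, par, bw, rng)
        s_new = objective(x_new)
        worst = max(range(hms), key=scores.__getitem__)
        if s_new < scores[worst]:
            hm[worst], scores[worst] = x_new, s_new
    best = min(range(hms), key=scores.__getitem__)
    return hm[best], scores[best]
```

For feature selection, each real-valued harmony would additionally have to be mapped to a feature subset, for example by thresholding each component; that mapping is not specified in the text and is left out of the sketch.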
Each harmony (solution) uses k elements as the decision variables of the optimization problem. The threshold values $th_k$ are applied in a multi-level segmentation process, as formulated herewith.
Here, T characterizes the transpose of the matrix, and the maximum size of the HM is denoted by HMS. Each element from the HM is denoted by $x_i$, which lies in the range of [0, k]. The HSAFS method derives a Fitness Function (FF) to handle the trade-off between the number of chosen features and the classification accuracy achieved with the selected features. FF is determined as given below.
Here, $\gamma_R(D)$ represents the error rate of the presented classifier, $|R|$ indicates the size of the selected subset, $|C|$ denotes the total feature count, and $\alpha$ and $\beta$ correspond to constants.
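The fitness equation itself is not reproduced in this text. A form commonly used in wrapper-based feature selection, and consistent with the symbols defined above, is $\alpha \, \gamma_R(D) + \beta \, |R| / |C|$. The sketch below assumes that form; the default weights 0.99 and 0.01 and the thresholding helper are hypothetical choices, not values taken from the paper.

```python
def fitness(error_rate, n_selected, n_total, alpha=0.99, beta=0.01):
    """Assumed fitness: alpha * gamma_R(D) + beta * |R| / |C| (lower is better)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    return alpha * error_rate + beta * (n_selected / n_total)


def mask_from_harmony(harmony, threshold=0.5):
    """Hypothetical mapping from a real-valued harmony to a feature mask."""
    return [value > threshold for value in harmony]


# Example: a 10% error rate using 10 of 40 features.
mask = mask_from_harmony([0.2, 0.7, 0.5])
score = fitness(error_rate=0.10, n_selected=10, n_total=40)
```

With these weights, the error rate dominates the score and the subset size acts mainly as a tie-breaker between subsets of similar accuracy.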

Design of CAE Classification Model
In this stage, the CAE technique is used to recognize and classify intrusions in the SDN-enabled IoT environment. The Autoencoder (AE) is a self-supervised learning mechanism that exploits the Neural Network (NN) for representation learning [19]. Representation learning is a method in which a scheme learns how to encode the input dataset. The AE approach maps the input datasets to a compressed-domain representation, i.e., a low-dimensional space. A bottleneck is imposed, which forces the algorithm to learn a compressed-domain representation of the input dataset. In general, the AE approach encompasses four elements: the Encoder Network (EN), the Bottleneck Layer, the Decoder Network (DN) and the Reconstruction Loss. The Encoder Network is an NN that encodes the input dataset into a compressed domain. The bottleneck layer is the final layer of the Encoder Network, and its output is called the encoded input data.
In Eq. (5), $X_{ei}$ corresponds to the input for the $i$th layer, $X_{ei+1}$ denotes the output of the $i$th layer of the EN, $WV_{ei}$ shows the weight vector for the $i$th layer, $b_{ei}$ indicates the bias for the $i$th layer, and $f_{ei}$ indicates the activation function for the $i$th layer.
In Eq. (6), $X_{di}$ denotes the input for the $i$th layer of the DN, $X_{di+1}$ signifies the output of the $i$th layer, $w_{di}$ shows the weight vector for the $i$th layer, $b_{di}$ refers to the bias of the $i$th layer, and $f_{di}$ illustrates the activation function for the $i$th layer. The difference between the original dataset $X_O$ and the reconstructed dataset $X_R$ is called the Reconstruction Loss. The Binary Cross-Entropy (BCE) and the Mean Squared Error (MSE) losses are the two loss functions commonly applied in the calculation of the Reconstruction Loss. Here, $D$ indicates the count of the instances in the dataset on which the AE is employed.
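Since Eqs. (5) and (6) are not reproduced in this text, the following sketch assumes the usual layer form, output = f(W x + b), for both the encoder and the decoder, and averages the two reconstruction losses over the D instances. It uses fully connected layers for brevity rather than the convolutional layers of the CAE, and all function names are illustrative.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(x, weights, bias, activation):
    """One layer: activation(W x + b), with `weights` given as a list of rows."""
    return [activation(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


def forward(x, layers):
    """Pass x through a stack of (weights, bias, activation) layers."""
    for weights, bias, activation in layers:
        x = dense(x, weights, bias, activation)
    return x


def mse_loss(originals, reconstructions):
    """Mean squared error between X_O and X_R, averaged over the D instances."""
    total = 0.0
    for x_o, x_r in zip(originals, reconstructions):
        total += sum((o - r) ** 2 for o, r in zip(x_o, x_r)) / len(x_o)
    return total / len(originals)


def bce_loss(originals, reconstructions, eps=1e-12):
    """Binary cross-entropy between X_O and X_R, averaged over the D instances."""
    total = 0.0
    for x_o, x_r in zip(originals, reconstructions):
        row = 0.0
        for o, r in zip(x_o, x_r):
            r = min(max(r, eps), 1.0 - eps)  # keep log() finite
            row -= o * math.log(r) + (1.0 - o) * math.log(1.0 - r)
        total += row / len(x_o)
    return total / len(originals)


# Toy autoencoder: 2 inputs -> 1-unit bottleneck -> 2 outputs.
encoder = [([[0.5, -0.5]], [0.0], sigmoid)]
decoder = [([[1.0], [-1.0]], [0.0, 0.0], sigmoid)]
sample = [1.0, 0.0]
reconstruction = forward(forward(sample, encoder), decoder)
loss = mse_loss([sample], [reconstruction])
```

BCE assumes inputs scaled to [0, 1], whereas MSE places no such restriction on the feature range.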

Algorithmic Process of AFSA Based Hyperparameter Tuning
Finally, the AFSA hyperparameter tuning process is performed to boost the intrusion detection performance of the CAE model. The AFSA model is a kind of swarm intelligence technique that is inspired by the behaviour of animals [20]. In this method, the clustering, collision and foraging behaviours of fish are simulated, along with their collective support within a fish swarm, to reach the global optimum. For an Artificial Fish (AF), the maximum distance it moves in a single move is referred to as Step, and the distance within which the AF can perceive its surroundings is referred to as Visual.
Further, the number of retries is characterized by Try-Number, and the crowd factor is characterized by $\eta$. The position of a particular AF is given by the vector $X = (X_1, X_2, \ldots, X_n)$, and the distance between the AFs $i$ and $j$ is determined using $d_{ij} = \lVert X_i - X_j \rVert$. The behaviours of the AF are described in the following order: prey, random, follow and swarm.
A fish observes the food via its eyes. Its existing position is denoted by $X_i$, and an arbitrarily-chosen position $X_j$ within its perceptive range is given below.
In Eq. (9), rand(0, 1) characterizes an arbitrary value in [0, 1]. If $Y_i > Y_j$, where $Y_i$ and $Y_j$ are the objective values at $X_i$ and $X_j$, the fish moves in the direction of $X_j$. Otherwise, the technique arbitrarily chooses a novel position $X_j$ again to check whether it fulfils the motion condition, as follows.
If the motion condition is not fulfilled after Try-Number attempts, an arbitrary motion is created as given below.
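The prey behaviour described above can be sketched as follows. The equations are not reproduced in this text, so the update rules here follow the common AFSA formulation and should be read as assumptions: the candidate position is sampled with a factor in [-1, 1] per coordinate so that the fish can look in any direction, and the move towards $X_j$ is a unit-direction step scaled by Step and rand(0, 1). The motion condition $Y_i > Y_j$ is taken from the text, which corresponds to minimising the objective.

```python
import math
import random


def distance(a, b):
    """Euclidean distance d_ij between two positions."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def random_neighbour(x, visual, rng):
    """Pick an arbitrary position within the Visual range around x (assumed sampling)."""
    return [v + visual * rng.uniform(-1.0, 1.0) for v in x]


def step_towards(x, target, step, rng):
    """Move from x towards target by at most Step along the connecting direction."""
    d = distance(x, target)
    if d == 0.0:
        return list(x)
    scale = step * rng.random() / d
    return [v + (t - v) * scale for v, t in zip(x, target)]


def prey(x_i, objective, visual, step, try_number, rng):
    """Prey behaviour: try up to Try-Number candidates, else move arbitrarily."""
    y_i = objective(x_i)
    for _ in range(try_number):
        x_j = random_neighbour(x_i, visual, rng)
        if y_i > objective(x_j):  # motion condition Y_i > Y_j
            return step_towards(x_i, x_j, step, rng)
    # No candidate satisfied the condition: arbitrary motion.
    return random_neighbour(x_i, visual, rng)
```

In hyperparameter tuning, `objective` would evaluate a CAE trained with the hyperparameters encoded in the position vector, for example by returning its validation error.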
In order to avoid over-crowding, the existing location of the AF is set as $X_i$. Next, the number of companions $n_f$ and their centre $X_c$ within the area ($d_{ij} < Visual$) are determined. If $Y_c / n_f < \eta \times Y_i$, the companions' location is characterized by an optimal food level and low crowding. Consequently, the fish moves towards the companion area, i.e., the centre of the location.
Otherwise, it follows the prey behaviour.
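A minimal sketch of the swarm test follows, assuming the crowding condition exactly as stated in the text, $Y_c / n_f < \eta \times Y_i$. The helper names are illustrative, and the caller is expected to fall back to the prey behaviour when no target is returned.

```python
import math


def distance(a, b):
    """Euclidean distance d_ij between two positions."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def companions_of(i, swarm, visual):
    """All other fish whose distance to fish i is below Visual."""
    return [x for j, x in enumerate(swarm)
            if j != i and distance(swarm[i], x) < visual]


def centre(points):
    """Coordinate-wise mean X_c of a non-empty list of positions."""
    n = len(points)
    return [sum(coords) / n for coords in zip(*points)]


def swarm_target(i, swarm, objective, visual, eta):
    """Return X_c if Y_c / n_f < eta * Y_i holds, otherwise None."""
    companions = companions_of(i, swarm, visual)
    if not companions:
        return None
    x_c = centre(companions)
    n_f = len(companions)
    if objective(x_c) / n_f < eta * objective(swarm[i]):
        return x_c
    return None


# Example with a sphere objective: fish 0 has two companions within Visual = 2.
sphere = lambda x: sum(v * v for v in x)
swarm = [[2.0, 2.0], [1.0, 1.0], [1.0, 3.0], [10.0, 10.0]]
target = swarm_target(0, swarm, sphere, visual=2.0, eta=0.5)
```

Here the centre of the two companions is (1, 2) with an objective value of 5, so the test 5 / 2 < 0.5 × 8 passes and fish 0 would move towards that centre.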
The existing position of the AF is referred to as $X_i$. The AF searches for the companion $X_j$, with the value $Y_j$, within the area ($d_{ij} < Visual$). If $Y_j / n_f < \eta \times Y_i$, then the location of the companion offers an optimal food level with less crowding.
The random behaviour allows the AF to search for companions as well as food over a larger region. A location is chosen in a random manner, and the AF moves towards it.
For a search region of dimension $D$, the maximum possible distance between two AFs is applied to dynamically limit the Visual and Step of the AF, as given below.
In Eq. (14), $x_{min}$ and $x_{max}$ signify the lower and the upper limits, and $D$ denotes the dimension of the searching region.

Results and Discussion
The classification results accomplished by the proposed HSAFS-OCAE model are portrayed as confusion matrices in Fig. 3. The figure reports that the proposed HSAFS-OCAE model effectively recognized and classified all the input data under six class labels. Table 2 and Fig. 4 highlight the classification outcomes achieved by the proposed HSAFS-OCAE model on 80% of the TR data. The results imply that the proposed HSAFS-OCAE model accomplished effective outcomes under each class. For instance, on 80% of the TR data, the HSAFS-OCAE model gained an $accu_y$ of 98.75%, $prec_n$ of 99.35%, $reca_l$ of 99.13%, $F_{score}$ of 99.24% and an MCC of 95.76%. Then, when using 80% of the TR data, the presented HSAFS-OCAE method obtained an $accu_y$ of 99.31%, $prec_n$ of 90.96%, $reca_l$ of 89.77%, $F_{score}$ of 90.36% and an MCC of 90%. When using 80% of the TR data, the proposed HSAFS-OCAE model attained an $accu_y$ of 99.38%, $prec_n$ of 87.85%, $reca_l$ of 96%, $F_{score}$ of 91.74% and an MCC of 91.52%.
To showcase the superiority of the proposed HSAFS-OCAE model, a comparative study was conducted, and the results are shown in Table 4 [11].

Conclusion
In this study, a new HSAFS-OCAE model has been devised to recognize intrusions in the SDN-enabled IoT environment proficiently. The presented HSAFS-OCAE model follows a three-stage process in which the HSAFS technique is exploited for feature selection. Next, the CAE methodology is leveraged to recognize and classify intrusions in the SDN-enabled IoT environment. Finally, the AFSA-based hyperparameter tuning process is performed to boost the intrusion detection performance of the CAE approach. The proposed HSAFS-OCAE methodology was experimentally validated under several aspects. The comparison study established the improved outcomes of the HSAFS-OCAE model over other techniques.
In the future, the HSAFS-OCAE model's performance can be improved using hybrid metaheuristic approaches.