Hybrid Metaheuristics Feature Selection with Stacked Deep Learning-Enabled Cyber-Attack Detection Model

Due to the exponential increase in smart, resource-limited devices and high-speed communication technologies, the Internet of Things (IoT) has received significant attention in many application areas. However, the IoT environment is highly susceptible to cyber-attacks because of its memory, processing, and communication restrictions. Since traditional models are not adequate for securing the IoT environment, recent deep learning (DL) models have proven beneficial. This study introduces a novel hybrid metaheuristics feature selection with stacked deep learning-enabled cyber-attack detection (HMFS-SDLCAD) model. The major intention of the HMFS-SDLCAD model is to recognize the occurrence of cyberattacks in the IoT environment. At the preliminary stage, data pre-processing is carried out to transform the input data into a useful format. Next, a salp swarm optimization algorithm combined with particle swarm optimization (SSOPSO) is used for feature selection. Besides, a stacked bidirectional gated recurrent unit (SBiGRU) model is utilized for the identification and classification of cyberattacks. Finally, the whale optimization algorithm (WOA) is employed for hyperparameter optimization. The HMFS-SDLCAD model is validated on a benchmark dataset, and the results are assessed under several aspects. The simulation outcomes show the improvements of the HMFS-SDLCAD model over recent approaches.


Introduction
The Internet of Things (IoT) consists of a collection of heterogeneous, resource-constrained objects interlinked through distinct network frameworks, such as wireless sensor networks (WSNs) [1]. These "things" or objects are generally made up of processors, sensors, and actuators that can interact with one another to achieve a common objective or application, using unique identifiers based on the Internet protocol (IP) [2]. Recent IoT applications involve smart buildings, agriculture, industrial and manufacturing processes, aerospace and aviation, telecommunications, medical and pharmaceutical services, and environmental monitoring [3]. The fundamental IoT architecture consists of three layers: the perception layer, comprising edge devices that interact with the environment to identify specific external elements or other smart objects; the network layer, made up of networking devices that discover and link devices beyond the IoT network to send and receive the sensed data; and the application layer, made up of IoT services or applications responsible for data storage and processing. Many cyber-attacks target the network and application layers of the IoT system [4]. Once the IoT architecture is breached, attackers can share IoT data with unauthorized parties and compromise the consistency and accuracy of the IoT data over its whole life cycle [5]. Thus, these cyber-attacks must be addressed for the IoT to be used safely. Fig. 1 depicts the role of machine learning (ML) in cybersecurity.
Network intrusion detection approaches have progressed from mechanisms relying on port inspection to methods that make full use of ML [6]. Conventional port-based approaches are outdated, since recent applications largely depend on dynamic port allocation instead of registered port numbers [7]. The rising proportion of encrypted traffic also causes payload-based methods to fail. This has steered cybersecurity experts toward ML and network flow features. Recent developments in ML methods for network anomaly detection have been widely welcomed [8,9]. Owing to the diverse and heterogeneous nature of cloud environments, ML offers responses to the …
Panda et al. [11] utilized the University of New South Wales (UNSW)-NB15 data and a novel IoT-Botnet dataset, which are imbalanced and noisy, to categorize cyberattacks. Scatter search-based feature engineering and K-Medoid sampling methods are utilized to obtain representative data with optimal feature sets. Al-Haija [12] proposed an effective and generic top-down structure for intrusion classification and detection in IoT networks using non-conventional ML techniques. The method is customized for intrusion classification/detection on IoT cyber-attack data, such as the Message Queuing Telemetry Transport (MQTT) dataset and the CICIDS dataset. In particular, the method comprises detection and classification (DC), feature engineering (FE), and feature learning (FL) subsystems. In [13], a hybrid deep random neural network (HDRaNN) for detecting cyber-attacks in the IIoT is proposed. The method integrates a multilayer perceptron with dropout regularization and a deep random neural network.
Amma [14] proposed a Vector Convolution Deep Autonomous Learning (VCDAL) classifier for detecting cyberattacks in network traffic datasets. The classifier extracts features through a vector convolutional neural network (CNN), automatically learns features via incremental learning using distilled cross entropy, and classifies the evolving network traffic dataset via a softmax function. The classifier has been evaluated through experiments on standard network traffic datasets, and the results suggest that it can identify both known and unknown cyberattacks. An et al. [15][16][17][18] presented an unsupervised ensemble autoencoder (AE) interconnected with a Gaussian mixture model (GMM) to adapt to various domains regardless of the skewness of each domain. In the latent space of the ensemble AE, attention-based latent representations and reconstructed features with minimal error are employed.
This study introduces a novel hybrid metaheuristics feature selection with stacked deep learning-enabled cyber-attack detection (HMFS-SDLCAD) model. The HMFS-SDLCAD model employs a salp swarm optimization algorithm combined with particle swarm optimization (SSOPSO) for feature selection. Moreover, a stacked bidirectional gated recurrent unit (SBiGRU) model is utilized for the identification and classification of cyberattacks. Finally, the whale optimization algorithm (WOA) is employed for hyperparameter optimization. The HMFS-SDLCAD model is validated on a benchmark dataset, and the results are assessed under several aspects.

The Proposed Cyber-Attack Detection Model
In this study, a new HMFS-SDLCAD model has been developed to recognize the occurrence of cyberattacks in the IoT environment. At the preliminary stage, data pre-processing is carried out to transform the input data into a useful format. Then, the SSOPSO algorithm is utilized to select features. Finally, the SBiGRU model, with hyperparameters tuned by the WOA, is utilized for the identification and classification of cyberattacks. Fig. 2 demonstrates the block diagram of the HMFS-SDLCAD technique.
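The section does not state which pre-processing operations are applied. As a minimal sketch, assuming the common choices of min-max scaling each numeric flow feature to [0, 1] and encoding class names as integers (both are assumptions, not the authors' stated procedure), the step could look like this; the function names and the toy records are hypothetical.

```python
from typing import List, Sequence, Tuple


def min_max_scale(rows: Sequence[Sequence[float]]) -> List[List[float]]:
    """Assumed pre-processing: rescale each feature column to [0, 1] (constant columns -> 0.0)."""
    cols = list(zip(*rows))
    lows = [min(c) for c in cols]
    highs = [max(c) for c in cols]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def encode_labels(labels: Sequence[str]) -> Tuple[List[int], List[str]]:
    """Assumed pre-processing: map class names to integer ids in order of first appearance."""
    classes: List[str] = []
    ids: List[int] = []
    for name in labels:
        if name not in classes:
            classes.append(name)
        ids.append(classes.index(name))
    return ids, classes


# Hypothetical toy records: two numeric flow features and one class name each.
X = min_max_scale([[1, 10], [3, 10], [2, 30]])
y, classes = encode_labels(["benign", "ddos", "benign"])
print(X)  # [[0.0, 0.0], [1.0, 0.0], [0.5, 1.0]]
print(y, classes)
```

Scaling to a common range keeps features with large raw magnitudes from dominating the later thresholding and recurrent layers; a real pipeline would fit the minima and maxima on the training split only.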

Process Involved in SSOPSO Based FS Model
In this work, the SSOPSO algorithm is employed to choose an optimal subset of features from the preprocessed data. As its name indicates, SSOPSO integrates the SSO and PSO approaches: the basic SSO framework is improved by enhancing the step that updates the population positions, merging the PSO update process into SSO. This combination gives SSO more flexibility in exploring the search space, helps maintain population diversity, and allows it to reach the optimal value rapidly.
In the first stage, SSOPSO sets its parameters and creates the population that represents the group of candidate solutions to the given problem (feature selection) [19]. Next, the quality of each solution is measured by calculating its fitness function (FF), and the best solution is determined. The next stage updates the current population using either the SSO or the PSO technique, depending on the quality of the FF (evaluated by its probability): when the probability of the current solution's FF is greater than 0.5, SSO is used; otherwise, PSO is used. The FF of every updated solution is then calculated, and the best solution is updated. Finally, the termination criterion is checked: if it is satisfied, the best solution is returned; otherwise, the preceding stages are repeated, starting from the probability calculation.
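The control flow just described can be sketched in Python. This is an illustrative sketch, not the authors' implementation: it assumes the standard salp swarm leader/follower updates and the standard PSO velocity update, assumes the per-solution probability is the min-max-normalized fitness (0 for the best solution, 1 for the worst), and uses hypothetical names (`ssopso`, `w`, `c_p`, `c_g`) and a toy sphere objective in place of the feature-selection fitness.

```python
import math
import random
from typing import Callable, List

Vector = List[float]


def ssopso(objective: Callable[[Vector], float], dim: int, n: int = 20,
           iters: int = 100, lb: float = -1.0, ub: float = 1.0,
           w: float = 0.7, c_p: float = 1.5, c_g: float = 1.5,
           seed: int = 0) -> Vector:
    """Sketch only: minimise `objective`, switching per solution between SSA and PSO moves."""
    rng = random.Random(seed)
    pop = [[rng.uniform(lb, ub) for _ in range(dim)] for _ in range(n)]
    vel = [[0.0] * dim for _ in range(n)]
    fit = [objective(x) for x in pop]
    pbest, pbest_fit = [x[:] for x in pop], fit[:]
    k = min(range(n), key=lambda i: fit[i])
    food, food_fit = pop[k][:], fit[k]  # best solution found so far

    for t in range(1, iters + 1):
        c1 = 2 * math.exp(-((4 * t / iters) ** 2))  # SSA step size, shrinks over time
        f_min, f_max = min(fit), max(fit)
        for i in range(n):
            # Assumed probability: min-max normalised fitness (0 = best, 1 = worst).
            pro = (fit[i] - f_min) / (f_max - f_min) if f_max > f_min else 0.0
            if pro > 0.5:  # SSA move
                if i == 0:  # leader salp explores around the food source
                    for j in range(dim):
                        step = c1 * ((ub - lb) * rng.random() + lb)
                        pop[i][j] = food[j] + step if rng.random() >= 0.5 else food[j] - step
                else:  # follower salp moves halfway toward the salp ahead of it
                    pop[i] = [(a + b) / 2 for a, b in zip(pop[i], pop[i - 1])]
            else:  # PSO move: inertia + cognitive + social terms
                for j in range(dim):
                    vel[i][j] = (w * vel[i][j]
                                 + c_p * rng.random() * (pbest[i][j] - pop[i][j])
                                 + c_g * rng.random() * (food[j] - pop[i][j]))
                    pop[i][j] += vel[i][j]
            pop[i] = [min(max(v, lb), ub) for v in pop[i]]  # keep inside bounds
            fit[i] = objective(pop[i])
            if fit[i] < pbest_fit[i]:
                pbest[i], pbest_fit[i] = pop[i][:], fit[i]
            if fit[i] < food_fit:
                food, food_fit = pop[i][:], fit[i]
    return food


def sphere(x: Vector) -> float:
    """Toy objective standing in for the feature-selection fitness."""
    return sum(v * v for v in x)


best = ssopso(sphere, dim=5)
print(round(sphere(best), 6))
```

In an actual feature-selection run, `sphere` would be replaced by a fitness such as the one in Eq. (2), evaluated on the binarized solution.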
The SSOPSO technique begins by setting the initial parameters of the SSO and PSO techniques. SSO then creates a random population $X$ of size $N$ in dimension $D$ and computes the food fitness of each solution $x_i$, $i = 1, 2, \dots, N$. Before the objective function is calculated, however, each solution $x_i$ is converted into a binary vector (containing only 1s and 0s) based on a random threshold $\epsilon \in [0, 1]$ using the following formula:

$$x_{ij} = \begin{cases} 1, & \text{if } x_{ij} > \epsilon \\ 0, & \text{otherwise} \end{cases} \quad (1)$$

Thus, only the elements $x_{ij}$ equal to 1 are selected to represent the chosen features, while the other elements, which signify irrelevant features, are ignored. The next stage computes the objective function for each $x_i$ as in Eq. (2):

$$f(x_i(t)) = \xi E_{x_i(t)} + (1 - \xi) \frac{\sum_{j=1}^{D} x_{ij}}{D} \quad (2)$$

where $E_{x_i(t)}$ denotes the classification error of the classifier trained on the selected features, and the second term denotes the proportion of chosen features. The parameter $\xi \in [0, 1]$ balances the classification error against the number of chosen features. The next stage computes the probability $Pro_i$ of each solution's fitness. Based on the $Pro_i$ value, the current solution $x_i$ is updated using either the SSO or the PSO technique. The FF is calculated for all updated solutions, and the best solution is updated. This sequence is iterated until the termination criterion is met (the SSOPSO technique runs until the maximum number of iterations is reached).
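A minimal sketch of the binarization and fitness evaluation, assuming a solution keeps feature $j$ when its value exceeds the threshold $\epsilon$ and that the second fitness term is the fraction of selected features. `error_fn` is a hypothetical placeholder for the classifier error $E_{x_i(t)}$, and the values `eps=0.5` and `xi=0.99` are illustrative assumptions rather than settings reported in the section.

```python
from typing import Callable, List, Sequence


def binarize(x: Sequence[float], eps: float) -> List[int]:
    """Keep feature j (1) when x_j exceeds the threshold eps, otherwise drop it (0)."""
    return [1 if v > eps else 0 for v in x]


def fs_fitness(x: Sequence[float], eps: float, xi: float,
               error_fn: Callable[[List[int]], float]) -> float:
    """xi * classifier error + (1 - xi) * fraction of selected features (lower is better)."""
    mask = binarize(x, eps)
    return xi * error_fn(mask) + (1 - xi) * sum(mask) / len(mask)


# Hypothetical stand-in for the classifier error: features 0 and 2 are "relevant",
# and the error is the share of relevant features that the mask leaves out.
RELEVANT = (0, 2)


def toy_error(mask: List[int]) -> float:
    return sum(1 for j in RELEVANT if mask[j] == 0) / len(RELEVANT)


x = [0.9, 0.1, 0.7, 0.3]
print(binarize(x, eps=0.5))                                        # [1, 0, 1, 0]
print(round(fs_fitness(x, eps=0.5, xi=0.99, error_fn=toy_error), 6))  # 0.005
```

With $\xi$ close to 1, the classification error dominates and the feature-count term mainly breaks ties between subsets of similar accuracy.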

SBiGRU Based Classification
Once the feature subsets are chosen, the next step is to identify the cyberattacks using the SBiGRU model. The SBiGRU consists of forward and backward GRU layers stacked on top of one another. The input data are fed to the first forward and backward layers, and their output sequences are passed to the subsequent forward and backward layers [20]. For a time series of length $t$, the input sequence $\{e_1, e_2, \dots, e_t\}$ enters the forward hidden layer $\{h^a_1, h^a_2, \dots, h^a_t\}$ to obtain comprehensive information from every past time step, and enters the backward hidden layer $\{h^c_1, h^c_2, \dots, h^c_t\}$ to obtain comprehensive information from every future time step. Next, each upper hidden layer takes the output of the lower hidden layer as input to extract higher-level features. Specifically, the upper forward hidden layer is $\{h^b_1, h^b_2, \dots, h^b_t\}$ and the upper backward hidden layer is $\{h^d_1, h^d_2, \dots, h^d_t\}$. Lastly, the output layer combines the hidden vectors of the two upper layers to produce the output. In each of the four layers ($h^a_t$ and $h^b_t$ in the forward direction, $h^c_t$ and $h^d_t$ in the backward direction), the hidden state is obtained from the reset gate, the update gate, and the candidate value. In the standard GRU form, for a generic layer with input $x_t$ and previous state $h_{t-1}$ (replaced by $h_{t+1}$ in the backward layers):

$$r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r)$$
$$z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z)$$
$$\tilde{h}_t = \tanh(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h)$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$

where $x_t$ is $e_t$ for the first forward and backward layers and the lower-layer output for the upper layers, and each of the four layers has its own weights $W$, $U$, and biases $b$.
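To make the data flow through the four layers concrete, the following NumPy sketch runs an untrained stacked bidirectional GRU forward pass. It assumes the standard GRU gate equations, that both upper layers read the concatenated lower-layer states $[h^a_t; h^c_t]$, that the output combines $h^b_t$ and $h^d_t$ by concatenation, and a mean-pooled softmax head over 12 classes (matching the 12 class labels of the dataset). None of these choices or function names come from the source, and the weights are random rather than trained.

```python
import numpy as np


def sigmoid(v: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-v))


def init_gru(rng: np.random.Generator, n_in: int, n_hid: int) -> dict:
    """Random weights for the reset (r), update (z) and candidate (h) transforms."""
    p = {}
    for g in "rzh":
        p["W" + g] = rng.normal(0.0, 0.1, (n_hid, n_in))
        p["U" + g] = rng.normal(0.0, 0.1, (n_hid, n_hid))
        p["b" + g] = np.zeros(n_hid)
    return p


def gru_step(x: np.ndarray, h_prev: np.ndarray, p: dict) -> np.ndarray:
    r = sigmoid(p["Wr"] @ x + p["Ur"] @ h_prev + p["br"])             # reset gate
    z = sigmoid(p["Wz"] @ x + p["Uz"] @ h_prev + p["bz"])             # update gate
    h_cand = np.tanh(p["Wh"] @ x + p["Uh"] @ (r * h_prev) + p["bh"])  # candidate value
    return (1.0 - z) * h_prev + z * h_cand


def run_gru(seq: np.ndarray, p: dict, reverse: bool = False) -> np.ndarray:
    """Run one GRU layer over a (T, n_in) sequence and return (T, n_hid) states."""
    T, n_hid = seq.shape[0], p["bz"].shape[0]
    h, out = np.zeros(n_hid), np.zeros((T, n_hid))
    for t in (range(T - 1, -1, -1) if reverse else range(T)):
        h = gru_step(seq[t], h, p)
        out[t] = h
    return out


def sbigru_forward(seq, layers, w_out, b_out):
    h_a = run_gru(seq, layers["a"])                  # lower forward layer
    h_c = run_gru(seq, layers["c"], reverse=True)    # lower backward layer
    lower = np.concatenate([h_a, h_c], axis=1)       # assumed input to the upper layers
    h_b = run_gru(lower, layers["b"])                # upper forward layer
    h_d = run_gru(lower, layers["d"], reverse=True)  # upper backward layer
    upper = np.concatenate([h_b, h_d], axis=1)       # assumed output combination
    logits = w_out @ upper.mean(axis=0) + b_out      # assumed mean-pooled softmax head
    probs = np.exp(logits - logits.max())
    return upper, probs / probs.sum()


rng = np.random.default_rng(0)
T, n_in, n_hid, n_cls = 6, 4, 8, 12
layers = {"a": init_gru(rng, n_in, n_hid), "c": init_gru(rng, n_in, n_hid),
          "b": init_gru(rng, 2 * n_hid, n_hid), "d": init_gru(rng, 2 * n_hid, n_hid)}
w_out, b_out = rng.normal(0.0, 0.1, (n_cls, 2 * n_hid)), np.zeros(n_cls)
upper, probs = sbigru_forward(rng.normal(size=(T, n_in)), layers, w_out, b_out)
print(upper.shape, probs.shape)  # (6, 16) (12,)
```

Feeding the concatenated lower states to both upper layers lets each direction see past and future context; the alternative reading, chaining $h^a \to h^b$ and $h^c \to h^d$ separately, only changes the `lower` argument passed to the upper layers.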

WOA Based Hyperparameter Optimization
Finally, the WOA is utilized for the hyperparameter optimization process. Mirjalili et al. [21] presented the WOA, inspired by the behavior of humpback whales, whose foraging behavior is called the bubble-net feeding technique. In WOA, the current best candidate solution is assumed to be the target prey or close to the optimum, and the other search agents try to update their positions toward this best agent. Mathematically, this can be expressed as follows:

$$\vec{D} = |\vec{C} \cdot \vec{X}^*(t) - \vec{X}(t)| \quad (21)$$
$$\vec{X}(t+1) = \vec{X}^*(t) - \vec{A} \cdot \vec{D} \quad (22)$$

where $t$ indicates the current iteration, $\vec{X}$ denotes the position vector, $\vec{X}^*$ represents the position vector of the best solution found so far, and $\vec{A}$ and $\vec{C}$ denote coefficient vectors, determined as follows:

$$\vec{A} = 2\vec{a} \cdot \vec{r} - \vec{a} \quad (23)$$
$$\vec{C} = 2\vec{r} \quad (24)$$

where $\vec{r}$ is a random vector in $[0, 1]$ and $\vec{a}$ is decreased linearly from 2 to 0 over the iterations. The method has two stages: exploitation and exploration. The exploitation stage is divided into two mechanisms:
(1) Shrinking encircling mechanism: this is achieved by decreasing the value of $\vec{a}$. Note that $\vec{A}$ is a random value within $[-a, a]$.
(2) Spiral updating position: this technique calculates the distance between the whale and the prey, and a spiral equation is used to mimic the helix-shaped movement:

$$\vec{X}(t+1) = \vec{D}' \cdot e^{bl} \cdot \cos(2\pi l) + \vec{X}^*(t) \quad (25)$$

where $\vec{D}' = |\vec{X}^*(t) - \vec{X}(t)|$, $l$ denotes a random value within $[-1, 1]$, and $b$ denotes a constant defining the shape of the spiral. There is a 50% probability of choosing between the shrinking encircling model and the spiral model. Therefore, the mathematical model is expressed as:

$$\vec{X}(t+1) = \begin{cases} \vec{X}^*(t) - \vec{A} \cdot \vec{D}, & p < 0.5 \\ \vec{D}' \cdot e^{bl} \cdot \cos(2\pi l) + \vec{X}^*(t), & p \geq 0.5 \end{cases} \quad (26)$$

where $p$ denotes a random value drawn from a uniform distribution in $[0, 1]$. In the exploration stage, $\vec{A}$ takes random values greater than 1 or less than $-1$ to force the search agent to move away from the current best agent, which is expressed as:

$$\vec{D} = |\vec{C} \cdot \vec{X}_{rand} - \vec{X}| \quad (27)$$
$$\vec{X}(t+1) = \vec{X}_{rand} - \vec{A} \cdot \vec{D} \quad (28)$$

where $\vec{X}_{rand}$ is the position vector of a whale selected at random from the current population.
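The following Python sketch applies the WOA rules (shrinking encircling, spiral update, and exploration around a random whale) to a small box-constrained search. The section does not list which SBiGRU hyperparameters are tuned or how they are scored, so the objective here is a hypothetical surrogate for a validation loss over a learning-rate exponent and a dropout rate. Drawing $A$ and $C$ as scalars per whale and clipping positions to the search box are assumptions of this sketch.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple


def woa(objective: Callable[[List[float]], float], lb: Sequence[float],
        ub: Sequence[float], n_whales: int = 20, iters: int = 100,
        b: float = 1.0, seed: int = 0) -> Tuple[List[float], float]:
    """Minimise `objective` inside the box [lb, ub] with the whale optimization algorithm."""
    rng = random.Random(seed)
    dim = len(lb)
    X = [[rng.uniform(lb[j], ub[j]) for j in range(dim)] for _ in range(n_whales)]
    fit = [objective(x) for x in X]
    k = min(range(n_whales), key=lambda i: fit[i])
    best, best_fit = X[k][:], fit[k]  # X*: best position found so far

    for t in range(iters):
        a = 2.0 - 2.0 * t / iters  # decreases linearly from 2 toward 0
        for i in range(n_whales):
            A = 2.0 * a * rng.random() - a  # assumed scalar per whale, in [-a, a]
            C = 2.0 * rng.random()
            p, ell = rng.random(), rng.uniform(-1.0, 1.0)  # ell is l in Eq. (25)
            x_rand = X[rng.randrange(n_whales)]  # only used when exploring
            new = []
            for j in range(dim):
                if p < 0.5 and abs(A) < 1:  # shrinking encircling of the prey
                    val = best[j] - A * abs(C * best[j] - X[i][j])
                elif p < 0.5:               # exploration around a random whale
                    val = x_rand[j] - A * abs(C * x_rand[j] - X[i][j])
                else:                       # spiral update
                    d_prime = abs(best[j] - X[i][j])
                    val = d_prime * math.exp(b * ell) * math.cos(2 * math.pi * ell) + best[j]
                new.append(min(max(val, lb[j]), ub[j]))  # clip to the search box
            X[i], fit[i] = new, objective(new)
            if fit[i] < best_fit:
                best, best_fit = new[:], fit[i]
    return best, best_fit


# Hypothetical validation-loss surrogate over (log10 learning rate, dropout rate).
def surrogate_loss(h: List[float]) -> float:
    return (h[0] + 3.0) ** 2 + (h[1] - 0.3) ** 2


best, loss = woa(surrogate_loss, lb=[-5.0, 0.0], ub=[-1.0, 0.8])
print([round(v, 3) for v in best], round(loss, 6))
```

In the actual model, each call to the objective would train and validate the SBiGRU with the candidate hyperparameters, so the number of whales and iterations directly sets the tuning cost.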

Experimental Validation
In this section, the HMFS-SDLCAD model is experimentally validated on a benchmark dataset available at https://dataset.litnet.lt/. The dataset holds samples under 12 class labels with 84 features, from which the proposed model selected a set of 47 features. Tab. 1 provides the details of the dataset. … Also, with label-5, the HMFS-SDLCAD approach has offered $accu_y$, $prec_n$, $reca_l$, $F_{score}$, and MCC of 99.85%, 99.50%, 99.04%, 99.27%, and 99.18%, respectively. Moreover, with label-10, the HMFS-SDLCAD algorithm has offered $accu_y$, $prec_n$, $reca_l$, $F_{score}$, and MCC of 99.91%, 96.68%, 99.10%, 97.88%, and 97.84%, respectively. Furthermore, with label-12, the HMFS-SDLCAD system has obtained $accu_y$, $prec_n$, $reca_l$, $F_{score}$, and MCC of 99.96%, 97.75%, 97.75%, 97.75%, and 97.73%, respectively.
Tab. 3 and Fig. 6 demonstrate the overall classification outcome of the HMFS-SDLCAD technique on 70% of the TRS dataset. The experimental values show that the HMFS-SDLCAD approach has reached high classification results under all class labels. For instance, with label-1, the HMFS-SDLCAD approach has offered $accu_y$, $prec_n$, $reca_l$, $F_{score}$, and MCC of 99.77%, 98.29%, 99.51%, 98.89%, and 98.77%, respectively. In addition, with label-5, the HMFS-SDLCAD model has offered $accu_y$, $prec_n$, $reca_l$, …
The training accuracy (TA) and validation accuracy (VA) attained by the HMFS-SDLCAD method on the test dataset are demonstrated in Fig. 9. The experimental outcome implies that the HMFS-SDLCAD model has gained high values of TA and VA. In particular, VA appeared to be higher than TA.
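The section reports $accu_y$, $prec_n$, $reca_l$, $F_{score}$, and MCC per label without defining them. Assuming the usual one-versus-rest definitions, in which the chosen label is the positive class and all other labels are negative, the following Python sketch computes the five metrics from true and predicted labels; the function name and toy labels are made up for illustration.

```python
import math
from typing import Dict, Hashable, Sequence


def one_vs_rest_metrics(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                        label: Hashable) -> Dict[str, float]:
    """Accuracy, precision, recall, F-score and MCC for one label versus all others."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == label and p == label for t, p in pairs)
    fp = sum(t != label and p == label for t, p in pairs)
    fn = sum(t == label and p != label for t, p in pairs)
    tn = len(pairs) - tp - fp - fn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f_score = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f_score": f_score, "mcc": mcc}


# Made-up labels: for label 1 there are 2 true positives, 0 false positives, 1 false negative.
m = one_vs_rest_metrics([1, 1, 1, 2, 2, 3], [1, 1, 2, 2, 2, 3], label=1)
print({k: round(v, 4) for k, v in m.items()})
```

Because each label is scored against all others, per-label accuracy is dominated by true negatives on a 12-class dataset, which is why it stays near 99% even where precision or MCC is noticeably lower.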
The training loss (TL) and validation loss (VL) achieved by the HMFS-SDLCAD system on the test dataset are shown in Fig. 10. The experimental outcomes indicate that the HMFS-SDLCAD approach has attained low values of TL and VL. In particular, VL appeared to be lower than TL.
… performance with lower values of $accu_y$, $prec_n$, and $reca_l$. Next, the long short-term memory (LSTM) algorithm exhibited moderate performance with $accu_y$, $prec_n$, and $reca_l$ of 99.42%, 97.85%, and 98.90%, respectively. In addition, the RF-NML model resulted in reasonable outcomes with $accu_y$, $prec_n$, and $reca_l$ of 98.50%, 97.26%, and 98.12%, respectively. However, the HMFS-SDLCAD model outperformed the other methods in $prec_n$ and $reca_l$, with $accu_y$, $prec_n$, and $reca_l$ of 98.86%, 98.79%, and 99.07%, respectively. Fig. 13 demonstrates a comparative $F_{score}$ and MCC examination of the HMFS-SDLCAD model with existing models. The figure indicates that the RF, SVM, MLP, and DNN models have shown poor performance with lower values of $F_{score}$ and MCC. Next, the LSTM model exhibited moderate performance with $F_{score}$ and MCC of 98.34% and 98.26%, respectively. Then, the RF-NML model resulted in reasonable outcomes with $F_{score}$ and MCC of 98.59% and 98.11%, respectively. However, the HMFS-SDLCAD model outperformed the other methods with the highest $F_{score}$ and MCC of 98.93% and 98.85%, respectively. These results indicate that the HMFS-SDLCAD model accomplishes effective outcomes compared with the other methods.

Conclusion
In this study, a new HMFS-SDLCAD model has been developed to recognize the occurrence of cyberattacks in the IoT environment. At the preliminary stage, data pre-processing is carried out to transform the input data into a useful format. Then, the SSOPSO algorithm is utilized to select features. Finally, the SBiGRU model, with hyperparameters tuned by the WOA, is utilized for the identification and classification of cyberattacks. The HMFS-SDLCAD model has been validated on a benchmark dataset, and the results have been assessed under several aspects. The simulation outcomes point out the improvements of the HMFS-SDLCAD model over recent approaches. Thus, the HMFS-SDLCAD model can be employed for the effective identification of cyberattacks in the IoT environment. In future work, feature reduction and outlier removal approaches can be included to enhance the classification outcomes.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.