GRU-based Buzzer Ensemble for Abnormal Detection in Industrial Control Systems

: Recently, Industrial Control Systems (ICSs) have been changing from a closed environment to an open environment because of the expansion of digital transformation, smart factories, and Industrial Internet of Things (IIoT). Since security accidents that occur in ICSs can cause national confusion and human casualties, research on detecting abnormalities by using normal operation data learning is being actively conducted. The single technique proposed by existing studies does not detect abnormalities well or provide satisfactory results. In this paper, we propose a GRU-based Buzzer Ensemble for Abnormal Detection (GBE-AD) model for detecting anomalies in industrial control systems to ensure rapid response and process availability. The newly proposed ensemble model of the buzzer method resolves False Negatives (FNs) by complementing the limited range that can be detected in a single model because of the internal models composing GBE-AD. Because the internal models remain suppressed for False Positives (FPs), GBE-AD provides better generalization. In addition, we generated mean prediction error data in GBE-AD and inferred abnormal processes using soft and hard clustering. We confirmed that the detection model’s Time-series Aware Precision (TaP) suppressed FPs at 97.67%. The final performance was 94.04% in an experiment using an HIL-based Augmented ICS (HAI) Security Dataset (ver.21.03) among public datasets.


Introduction
In the past, industrial control systems were protected by whitelist-based detection [1,2], separation-based technologies [3] and fault diagnosis prediction [4] in closed environments.But recent industrial control systems are being converted to open environments with the expansion of digital transformation, smart factories, and IIoT.As the industrial control system transforms itself into an environment that can operate and control itself, research on Operational Technology (OT) security using artificial intelligence technology is under way.
Abnormal processes in industrial control systems can occur in various ways, such as from cyberattacks (e.g., malicious code infection, control logic changes), hardware and software errors, operator errors, and environmental factors.In particular, if the control logic of the safety system is changed, that may lead to facility destruction, operation interruption, and even casualties.In case of attack by bypassing the IT system (e.g., Ukraine power grid hack, Triton Attack), the OT system must be able to detect it.Therefore, it is necessary to detect abnormal processes by using learning operation data that occurs between processes.
In the abnormal detection studies [5][6][7][8][9][10][11][12][13] using various deep learning models and detection techniques based on operation data, the most important consideration is how to detect new and unknown attacks.However, if an abnormal process is detected based on the prediction error of a deep learning model trained on normal operation data, it is directly related to the cost of response, such as on-site due diligence and process interruption, so the problem of FPs must be addressed.One must also be able to identify the physical location of a detected anomaly.
In this paper, we propose a GRU-based Buzzer Ensemble for Abnormal Detection (GBE-AD) model with suppressed FPs and infer the abnormal process by means of soft and hard clustering from the mean prediction error (MPE) generated from the built model.The first step in detecting anomalies is to build a Gated Recurrent Units (GRU) [14] prediction model by means of training data consisting of only normal data.In the second step, the time step (window size) of the time-series data is set differently, and the training data is shuffled to create a model similar to the first built model.This is defined by the internal model of GBE-AD.Each internal model has a threshold for suppressing FPs, and the internal models are complementary to FNs.Finally, the prediction error data collected from the detection model is generated, and the detailed process of the abnormal data is inferred by means of fuzzy c-means and k-means clustering.

Suggestions contributed are:
We propose a GBE-AD model with suppressed FPs using GRU.We derived the detailed process points of the abnormal process from the MPE data collected from the predictive model.
We confirmed that FPs were suppressed in TaP 97.67% of the detection model, and the final performance was 94.04%.In addition, by clustering the MPE data, we could infer the detailed process of the abnormal data.

Related Works 2.1 HIL-based Augmented ICS (HAI) Security Dataset
The ICS public dataset is being published for industrial control system security research.Among them, the HAI dataset used in this paper was constructed to facilitate synchronization between components and data collection by means of OPC-UA (Open Platform Communications-Unified Architecture) and Hardware-in-the-loop (HIL) simulation.In addition, it is a very reliable dataset, because it includes various stealth attacks and labels them by means of information generated using automated attack tools [15].
As shown in Fig. 1, The HAI 21.03 dataset uses normal data collected for 352 h of 78 data points in a testbed consisting of a boiler, turbine, water treatment process, and HIL simulator as training data, and includes 50 attack scenarios for 112 h.Abnormal data is used as test data [16].

Approaches to Detecting Abnormalities in ICSs
Detecting abnormalities by means of operation data learning aims to protect the level 0 ∼ 1 area of the Purdue Model [17] that has actuators, sensors, and controllers.Threat modeling can be defined as outliers of sensors and actuators, injected network traffic, and damage to control systems [18].
Continuous monitoring is required to ensure the data integrity and deterministic output of the ICS process.However, as the complexity of industrial control systems increases and attacks become more precise, creating a knowledge-based model using domain knowledge cannot resolve the problem.For detecting new abnormalities of industrial control systems, a deep learning-based approach has been studied, as shown in Tab. 1.In most studies using deep learning, abnormalities were detected by means of the threshold-based results of residual analysis and prediction error techniques [5][6]8,9,11,12].In addition, abnormal data that does not belong to the normal data group was detected with each technique, such as reconstruction error using GAN (Generative Adversarial Networks) discriminator, classification by decision tree, and unknown class classification using open set recognition [7,10,13].Each study uses different evaluation indicators, such as attack scenarios, classification evaluation metrics, and timeseries evaluation metrics to represent the research results.

Considerations for detecting abnormalities of industrial control systems:
In order for a model to be used in various industrial control systems, it must compare the difference between the actual value and the predicted value by predicting with a multivariate regression model based on unsupervised learning.
Because a single model may not detect anomalies well, it needs to be improved.
The performance of the detection model should be measured using a time-series-based evaluation considering the process environment.
When deciding whether to update the detection model, one must restore the result data related to detecting abnormalities as data that can be analyzed.
When detecting anomalies for the entire process, one must reduce the problem of response costs.

Proposed GRU-based Buzzer Ensemble Scheme for Abnormal Detection
In this section, we propose a GBE-AD to detect abnormal processes by learning operation data of industrial control systems.Since a single model using deep learning learns and analyzes data by means of a single technique, it does not detect anomalies well.In addition, the most important point in anomaly detection is to minimize detection errors (FPs, FNs).As shown in Fig. 2, the proposed model learns the same data from various perspectives, so that comprehensive conclusions can be drawn.Extended Threshold (ET) is set to a value slightly higher than max( ŷt − (|y t |) of Normal Data.If ET (Internal Model) expresses the normal range, the normal range and the abnormal range can be expressed in n internal models as shown below. Thus, This section is divided into four subsections: Data Preprocessing includes the process of scaling operation data and then converting it into time-series data.
In the model tuning process, we describe how to select the optimal hyperparameter and create an internal model that composes GBE-AD.
Describes the buzzer method ensemble technique of the proposed model.When abnormal process is detected, the internal model of GBE-AD generates mean prediction error data and infers the abnormal detailed processes.

Data Preprocessing
Data Preprocessing required for training is very important.Operation data generated in industrial processes is the result of the operation of industrial facilities.It has deterministic properties, and its physical state changes over time.Therefore, it is appropriate to interpret it in the form of time-series data according to the operating environment.
Since the data must be converted to time-series data, it may interfere with learning if it is not numeric data.The features required for learning use the points of the process.null, not a number, and missing values of data are handled because all data generated in a normal operating environment must be quantified.Because the scale is different for each point of the operational data, one must prevent the learning model from having bias.Min-Max Normalization is used for data scaling as follows, and the original data can be restored by means of simple calculations. (3) The scaled data is converted into time-series data to complete the data preprocessing.The conversion process is shown in Algorithm 1.

Tuning of the Internal Model
After completing data preprocessing for training, the model trains and predicts using stacked-GRU.The model generated here is defined as the internal model of GBE-AD, and is tuned for optimization.The hyperparameters used for tuning determine the time step of time-series data, the number of cells and layers, epoch, and batch size [23].We used grid search for the optimization, as shown in Tab. 2. In addition, we created the final model by removing unnecessary features from the tuned full model.Since the internal model was created using all points as features, we removed unnecessary features by means of Backward Elimination.An internal model with unnecessary features removed is shown in Fig. 3.
A label value of "0.00" is in the normal range, and "0.20" is in the abnormal range.There are three types of model optimization results using Backward Elimination: Type 1 evaluation lowers the MPE for normal data.Type 2 evaluation increases the MPE for abnormal data.Type 3 evaluation lowers the MPE range of normal data identified as anomalies.Input X expressed as time-series data 'X ', point (feature) 'P', number of points 'k', current time 't', and time-step size 'w' is as follows: The data preprocessed with the size of time step for each internal model is input to each internal model, and the mean absolute error of each point is output: Then, if the output Y n of each Internal Model n is greater than Extended Threshold n , an abnormality is detected: When abnormal data occurs in the process, it affects for a certain period according to the task sequence, so a range-based performance evaluation is required [24].We evaluated the performance using TaPR (Time-Series Aware Precision and Recall) [25], which added a score for the ambiguous detection range in which the system does not operate normally during the recovery period of the control system.

Inferring Abnormal Processes
In order to infer the abnormal process, the prediction error data of the detecting abnormalities period is generated in the GBE-AD model.The most important point in this part is to identify the end time of the attack and extract prediction errors for the same attack.Then, the prediction error data for each process is averaged.Since the level of attack difficulty is different for each attack scenario, data preprocessing is calculated as the L2 norm for each process based on the attack scenario.
The order of generating mean prediction error data is shown in Algorithm 2.
When the test data is input to the GBE-AD model, the prediction error data detected as anomalies is extracted.
The classification of attack scenarios is labeled based on the value of adding 120 s (twice the time step) to the last detection time for each attack.
Data is generated by calculating the mean prediction error for each point based on each attack scenario.
The mean prediction error for each process is extracted from the generated data and preprocessed using Normalizer() in Scikit-learn.We inferred abnormal processes using the fuzzy c-means clustering algorithm of soft clustering and the k-means clustering algorithm of hard clustering.For the number of clusters, we tested the range of 3 to 6 using the number of processes as the minimum value and the number of types of attack scenarios as the maximum value.fuzzy c-means clustering is a soft clustering method; so we set the number of clusters (K) to 3, and tested k-means clustering by increasing the number of clusters to 6 to check whether multiple attacks can be distinguished.As shown in Fig. 6 and Tab. 5, we obtained good results when there were 3 clusters for fuzzy c-means clustering and 3-4 for k-means clustering.Fuzzy c-means '•' clusters the cases where the P3 process is attacked.but includes some other processes (P1, P1+P2).Each clustering method accurately derived the target process of a single attack scenario.For multiple attack scenarios, two or more target processes could not be grouped separately, but some of the attacked processes could be inferred.

Results
We evaluated the detection of abnormalities by adding internal models that make up the GBE-AD model one by one.In the process of recovering to normal after attack, TaP decreased by 97.678%, but TaR (Time-series Aware Recall) increased to 90.663% relatively, and FPs did not occur.The GBE-AD model composed of 10 internal models is similar to the increase rate of the previous model, so it is most effective to configure GBE-AD with 10 internal models.
In 50 attack scenarios, attack 10 was not detected by the GBE-AD model; so only 49 attacks were detected.In the internal models constituting GEB-AD, attack 10 was similar to the normal process, so no one was able to detect it.If only the points of a specific process are learned or a technique that can detect subtle differences is added to the internal model, undetected attacks will be detectable.
When compared using soft and hard clustering in an experiment to infer an anomalous process, the target process of a single attack scenario was mostly derived.Since the P1 and P2 processes are synchronized with each other, we confirmed that the prediction error is large in the connected process because of the change in the value of the target point.In addition, since the attack scenario targeting P3 has fewer attacks than did other attacks, only a part of the compound attack could be inferred.
Only some processes were inferred even if the number of clusters was increased in an experiment that derived anomalous processes in multiple attack scenarios, for the following reasons: Since the number of points used in each process is different, the prediction error of each process is affected.
The number of attack scenarios is insufficient, and the attack scenarios are unbalanced for each target process.
Various attack scenarios should be created to take into account the difficulty and impact and severity of the attack on the process.

Conclusion
We proposed the GBE-AD model by learning operation data to detect abnormal processes because security incidents occurring in industrial control systems can affect the real world.Most of the existing studies have focused on proposing strategies and techniques to detect new attacks.However, FPs and FNs used for detection evaluation affect each other's performance.Abnormal detection should consider FP and FN simultaneously.In this paper, we solved the FNs problem by combining internal models after suppressing the FPs problem of each internal model constituting GEB-AD.The proposed ensemble model suppressed FPs.Thus, a complementary approach to FNs is possible even when combined with other internal models.The buzzer ensemble technique can improve the abnormal detection limit of a single model or technique.In addition, the model may be able to infer abnormal processes by generating prediction error data in GBE-AD.In a further study, the limitations on the CMC, 2023, vol.74, no.1 abnormal process inference will be advanced by combining data expansion and domain knowledge of the industrial control system.

Figure 1 :
Figure 1: Process control architecture of HAI Testbed

Figure 2 :
Figure 2: Comparison of detecting abnormalities by means of a single model and the proposed model

Figure 4 :
Figure 4: Operational process of the proposed GBE-AD model

Fig. 5
Fig.5shows the anomaly detection results of the GBE-AD model on the test data.In TaPR's performance evaluation, attack scenarios 10 and 25 were undetected, but attack scenario 25 was partially detected in the internal model.Since the GBE-AD model keeps FPs suppressed, the anomaly detection results are reliable; so 49 out of 50 attack scenarios were detected.

Figure 5 :
Figure 5: Detection results of the GBE-AD model

Figure 6 :
Figure 6: Comparison of clustering methods for abnormal process inference (soft vs. hard)

Table 1 :
Deep learning-based anomaly detection approach in ICSs

Table 2 :
Hyperparameter values used in internal models

Table 3 :
Performance of internal model by time step

Table 4 :
Performance of GBE-AD model when increasing internal model

Table 5 :
Comparison of optimal clustering for anomalous process inference