Ensemble Classifier Design Based on Perturbation Binary Salp Swarm Algorithm for Classification

Multiple classifier systems exhibit stronger classification capacity than single classifiers, but they require significant computational resources. Selective ensemble systems aim to attain equivalent or better classification accuracy with fewer classifiers. However, current methods fail to identify precise solutions for constructing an ensemble classifier. In this study, we propose an ensemble classifier design technique based on the perturbation binary salp swarm algorithm (ECDPB). Because extreme learning machines (ELMs) have rapid learning rates and good generalization ability, they can serve as the base classifier for creating multiple candidates while using fewer computational resources. We introduce a combined diversity measure that takes both the complementarity and the accuracy of ELMs into account; it is used to identify the ELMs that have good diversity and low error. In addition, we propose a perturbation binary salp swarm algorithm (PBSSA) with powerful optimizing ability; it is employed to find the optimal subset of ELMs. The selected ELMs are then used to form an ensemble classifier. Experiments on 10 benchmark datasets demonstrate that the proposed ECDPB delivers superior classification capacity compared with alternative methods.


Introduction
The multiple classifier system (MCS) [1] is a popular topic in machine learning. Compared with single classifiers, it offers a potential enhancement in classification capability by integrating multiple base members [2]. MCS has been applied to various tasks, including intrusion detection [3], expression recognition [4], image processing [5], and imbalanced learning [6]. In addition, some scholars have used it to solve classification or regression problems in healthcare [7], well-being index [8], cancer detection [9], and population forecasting [10]. An MCS mainly involves two steps: first, an initial pool of classifiers with good diversity is created; second, the results of the classifiers are aggregated to produce the final decisions. Unfortunately, choosing an MCS with strong ability in actual applications involves tradeoffs [11], because it requires significant computing resources as the data size and the number of members increase [12]. Selective ensemble methods [13,14] aim to employ fewer classifiers while maintaining the classification capability of the ensemble, thereby reducing the computing complexity. Considering these advantages, many scholars have studied how selective ensemble methods can be used to design well-performing ensemble classifiers [15].
Numerous methods have been developed to build an ensemble classifier with good classification ability. Different diversity measures, such as double-fault [13], Kappa [12], and the margin measure [16], can be used to assess each learner and to extract the members that satisfy certain conditions using different pruning methods. The selected constituents are then integrated into an ensemble classifier. Meanwhile, many heuristic algorithms, such as the genetic algorithm [17], glowworm swarm optimization [18], and the artificial fish swarm algorithm (AFSA) [19], have been used to search for the final ensemble. Some scholars also utilize other means, such as graph coloring [20], simple coalitional games [15], greedy pruning [21], and clustering techniques [22,23], to identify the well-performing classifiers that form an ensemble. Moreover, the fusion of heuristic algorithms and diversity measures is a new approach to developing a selective ensemble of classifiers [16,19]. Diversity measures such as double-fault [19] and the margin measure [16] are used to select the initial classifiers, and heuristic algorithms, such as AFSA [16], are employed to select the final ensemble. This combination performs well in selecting classifiers with good performance within an ensemble [16,19].
Although the selective ensemble method has a direct effect on the final ensemble classifier, it is also critical to select an appropriate base classifier, which can be used to create multiple classifiers with good diversity [13,20]. The base classifier should also have high computational efficiency to reduce the training time of multiple classifiers. Compared with traditional methods [24], extreme learning machines (ELMs) [25,26] are effective and simple learning methods. The random generation of the ELM's hidden nodes increases its learning speed [27,28]. Given the advantages of the ELM (high learning efficiency, conceptual simplicity, and good generalization capacity) [29,30], ELMs can be used as the base classifier for designing ensemble classifiers.
From this discussion, we find that fusion approaches work well when designing an ensemble classifier with good capability. However, the diversity measures used typically consider only the diversity or only the precision of classifiers. This makes it difficult to capture the members with both good diversity and low error to form the ensemble. If we only consider diversity, then members with good diversity but low accuracy may be selected to construct the final ensemble, which may result in low classification capability. If we only consider accuracy, then members with high precision but poor diversity may be used, which may lead to only a slight improvement compared with single classifiers [16]. Additionally, the adopted heuristic algorithms cannot reach the expected convergence performance because of their shortcomings, such as difficulty of implementation, a tendency to fall into local optima, and low global search capability. Since the salp swarm algorithm (SSA) [31-33] is simply constructed, has fewer parameters, and is more easily implemented, we build on it and develop a perturbation binary salp swarm algorithm (PBSSA) with enhanced searching ability. In this article, we propose a novel method called the ensemble classifier design technique based on the perturbation binary salp swarm algorithm (ECDPB), which employs a combined diversity measure composed of complementarity and accuracy to preselect a subset of candidates. Based on the preselected classifiers, we adopt the proposed PBSSA to search for better-performing classifiers in order to determine the final ensemble classifier.
The contributions of this work can be listed as follows: (1) We propose the ECDPB to design a well-performing ensemble classifier by using a combined diversity measure and the PBSSA. (2) The presented combined diversity measure simultaneously considers complementarity and accuracy in evaluating ELMs, and the proposed PBSSA delivers improved searching capacity compared with other algorithms. (3) Results on 10 benchmark datasets demonstrate that the proposed technique is superior to other related approaches.
The rest of this work is organized as follows. In Section 2, we introduce the combined diversity measure. Section 3 presents the ECDPB in detail. Comparison experiments are presented in Section 4. Conclusions and future research directions are discussed in Section 5.

Combined Diversity Measures
The diversity and the accuracy of ELMs are two fundamental determining factors for the classification capacity of an ensemble. However, there is no widely accepted measure of diversity. In addition, how to design a criterion that extracts the ELMs with good diversity and classification capacity remains an unresolved question [12,13]. We provide a novel combined diversity measure, which balances the diversity and the accuracy of ELMs. Most current measures assess ELMs separately in terms of either their complementarity or their accuracy [34]. Instead, we evaluate the ELMs by simultaneously considering both factors, so that the ELMs selected by the measure can be used to form an ensemble with high classification capacity.
An ensemble with good predictive capability requires ELMs that complement each other [12,34]. In order to measure the ELM's complementarity with others, a complementarity measure is introduced, which can be used to select those ELMs that can correct the predictive errors obtained by other ELMs on the training samples. Meanwhile, the measure can make the selected ELMs perform diversely, which is an important prerequisite for attaining better classification capacity. If the ELMs provide identical output, it is then impossible to attain an improvement in classification by aggregating them [13].
Consider a set of samples $X = \{(x_i, y_i),\ i = 1, 2, \ldots, N\}$, where each sample is described by a feature vector $x_i$ and its label $y_i$. For a set of ELMs $H = \{h_1, h_2, \ldots, h_M\}$, the complementarity measure $Com_i$ of ELM $h_i$ can be calculated as follows: where $I(\cdot)$ is an indicator function ($I(\text{true}) = 1$ and $I(\text{false}) = 0$), and $h_i(x_k)$ denotes the prediction of ELM $h_i$ on the sample $x_k$.
Although the complementarity measure shows which ELM has higher complementarity with the others, using it alone to select the ELMs can be problematic. Selecting ELMs by this measure can increase their diversity; however, it may lead to either a higher or a lower error rate for the ensemble.
Selecting an ELM with a high complementarity measure may correct the predictive errors on some samples; however, it may produce flawed results on the remaining samples, particularly when the ELM has high complementarity with the others but its own classification error is high. The ensemble may then generate incorrect decisions, which can result in low precision of the final ensemble. Hence, we should take into consideration not only the ELM's complementarity but also its accuracy. In this work, we define accuracy in terms of the classification error of the ELM. The accuracy measure of ELM $h_i$ can be calculated as follows:
To choose an ensemble with high predictive ability, the ELMs in the ensemble should possess high levels of both diversity and accuracy. When we attempt to increase the complementarity measures of the ELMs, their mean accuracy may be reduced. Conversely, when we try to increase the precision of the ELMs, their complementarity may be reduced. There is thus a trade-off between the complementarity measure and the accuracy measure of the ELMs. In order to find the optimal combination, we assign a set of weights to the complementarity measure and the accuracy measure, which we use to create the combined diversity measure. The combined diversity measure $CDM_i$ of ELM $h_i$ can be formulated as follows: where $\alpha$ and $\beta$ indicate the weights of the complementarity measure and the accuracy measure of the ELM $h_i$, respectively, and $\alpha + \beta = 1$.
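The three measures discussed above can be illustrated with a short sketch. The exact formulas are not reproduced in this text, so the forms below are assumptions chosen to match the verbal description: `com` counts, for each ELM, how often it is correct on a sample that another ELM misclassifies (normalized by (M − 1)·N); `acc` is the fraction of correctly classified samples; and `cdm` is taken to be the weighted sum α·Com + β·Acc with α + β = 1. The function name `combined_diversity` and the toy data are hypothetical.

```python
import numpy as np

def combined_diversity(preds, y, alpha=0.4, beta=0.6):
    """Score each classifier by an ASSUMED complementarity/accuracy mix.

    preds : (M, N) array, preds[i, k] is the label predicted by ELM i on sample k
    y     : (N,) array of true labels
    Requires M >= 2. The formulas are illustrative assumptions, not the
    paper's exact Eqs. (1)-(3).
    """
    correct = (preds == y)                      # (M, N) boolean matrix
    m, n = correct.shape
    acc = correct.mean(axis=1)                  # assumed accuracy measure
    wrong_per_sample = (~correct).sum(axis=0)   # how many ELMs fail on each sample
    # If ELM i is correct on sample k, every ELM that fails on k is "corrected" by i.
    com = (correct * wrong_per_sample).sum(axis=1) / ((m - 1) * n)
    cdm = alpha * com + beta * acc              # assumed weighted-sum combination
    return com, acc, cdm

# Toy example: three classifiers, four samples.
preds = np.array([[0, 1, 1, 0],
                  [0, 0, 1, 1],
                  [1, 1, 0, 0]])
y = np.array([0, 1, 1, 0])
com, acc, cdm = combined_diversity(preds, y)
ranking = np.argsort(-cdm, kind="stable")       # best-scoring ELM first
print(ranking, cdm)
```

Ranking by `cdm` in descending order gives the ordering from which ELMs would be added to the ensemble one at a time, starting with the highest-scoring one.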
The combined diversity measure (CDM) can be used to assess the ELMs so as to reduce the probability of selecting ELMs with good diversity but low accuracy. Instead, those ELMs with higher complementarity measures and accuracy measures can be selected to form a final ensemble with good classification capacity. We can rank the ELMs using the combined diversity measure in Eq. (3). According to the rankings of the ELMs, we start the ensemble with the ELM that has the maximal measure value, and then add further ELMs one at a time to construct the ensemble [34]. In the following analysis of the combined diversity measure, the weights $(\alpha, \beta)$ are set as (0.4, 0.6).
From Fig. 1, we can see that, as the number of ELMs in the ensemble increases, the error curves under the weight combinations of (0.1, 0.9), (0.2, 0.8) and (0.4, 0.6) display lower error rates than those under the weight combinations of (0.6, 0.4), (0.8, 0.2) and (0.9, 0.1). Under the weight combinations of (0.1, 0.9), (0.2, 0.8) and (0.4, 0.6), the weight of the accuracy measure is greater than that of the complementarity measure, leading to the selection of ELMs with lower errors at the beginning. They can therefore perform better as complementary ELMs are added to the ensemble. Under the weight combinations of (0.6, 0.4), (0.8, 0.2) and (0.9, 0.1), the ELMs with high complementarity measures are selected to form the ensemble. However, they perform poorly because of their low accuracy measures. We can clearly observe from Fig. 1 that, among the weight combinations of (0.1, 0.9), (0.2, 0.8) and (0.4, 0.6), the weight combination (0.4, 0.6) has significantly lower classification errors. The ELMs selected by the weights (0.1, 0.9) and (0.2, 0.8) have higher precision; however, they lack complementarity.
According to the above analyses, it is easy to identify the tradeoff between the complementarity and the error of ELMs. The combined diversity measure with weights (0.4, 0.6) performs the best. Fig. 1 shows that the errors generated by this combined measure first decline and then rise as the number of ELMs increases, which shows that the combined measure can be used to extract the well-performing ELMs. As such, the weights $(\alpha, \beta)$ of the combined diversity measure are set at (0.4, 0.6). To evaluate the performance of the combined diversity measure, we compare its classification errors, with weights of (0.4, 0.6), against those of the double-fault measure [13], kappa measure [12], complementarity measure [12], and margin measure [16]. Fig. 2 displays the error trend curves of these diversity measures. The combined diversity measure achieves lower classification errors than the others because it evaluates ELMs by considering their accuracy and complementarity simultaneously, whereas the alternatives consider only one of them. In conclusion, the analysis demonstrates that the combined diversity measure produces good classification results.
In this section, we propose the ECDPB. First, we present the process used to generate the base ELMs. Second, we present the proposed PBSSA. Finally, we propose the ECDPB using the combined diversity measure and the PBSSA. We also show how to employ the ECDPB to identify the final subset of ELMs to design a well-performing ensemble classifier. In addition, the pseudo-code of the ECDPB is presented.

Generating Multiple Extreme Learning Machines
In this section, we design an ensemble classifier with good capacity and employ the ECDPB to select the ELMs for the ensemble. That is, we extract the ELMs with good complementarity and low error for inclusion in the final ensemble classifier. Before this, we need to generate a pool of ELMs with good diversity [20] from which a collection of well-performing ELMs can be selected. ELMs have fast learning rates and strong generalization ability; we make full use of these advantages to create multiple classifiers quickly. To increase the diversity of the generated ELMs, we use bootstrap extraction [35] to select a part of the samples in the training set as the input of an ELM, which produces a trained ELM. Repeating the bootstrap extraction for M iterations generates M subsets, and training an ELM on each subset yields multiple ELMs with good diversity.
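A minimal sketch of this pool-generation step is given below. It is only an illustration under stated assumptions: the `ELM` class is a basic single-hidden-layer ELM with random input weights, a sigmoid activation, and output weights obtained by a pseudo-inverse; the number of hidden nodes, the activation, and drawing N samples with replacement for each bootstrap subset are all assumed here, since the text does not specify them. The names `ELM` and `bootstrap_elms` are hypothetical.

```python
import numpy as np

class ELM:
    """Basic extreme learning machine for classification (illustrative)."""

    def __init__(self, n_hidden, rng):
        self.n_hidden = n_hidden
        self.rng = rng

    def _hidden(self, X):
        # Sigmoid activation of the randomly weighted hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y, n_classes):
        # Hidden-layer parameters are drawn at random and never trained.
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        T = np.eye(n_classes)[y]                      # one-hot targets
        # Output weights via the Moore-Penrose pseudo-inverse.
        self.beta = np.linalg.pinv(self._hidden(X)) @ T
        return self

    def predict(self, X):
        return np.argmax(self._hidden(X) @ self.beta, axis=1)

def bootstrap_elms(X, y, n_models, n_hidden=20, seed=0):
    """Train n_models ELMs, each on its own bootstrap subset (assumed scheme).

    y must hold integer labels 0..C-1.
    """
    rng = np.random.default_rng(seed)
    n_classes = int(y.max()) + 1
    pool = []
    for _ in range(n_models):
        idx = rng.integers(0, len(X), size=len(X))    # sample with replacement
        pool.append(ELM(n_hidden, rng).fit(X[idx], y[idx], n_classes))
    return pool

# Toy usage on two synthetic clusters.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(-2, 0.5, (30, 2)), rng.normal(2, 0.5, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
pool = bootstrap_elms(X, y, n_models=5)
preds = np.array([elm.predict(X) for elm in pool])    # (M, N) prediction matrix
```

Diversity in this sketch comes from two sources: each ELM sees a different bootstrap subset, and each draws its own random hidden-layer weights.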

Perturbation Binary Salp Swarm Algorithm
In this section, we combine the PBSSA and the combined diversity measure to select a fraction of well-performing ELMs in order to design an ensemble classifier. We first introduce the SSA. Mirjalili et al. [31] first proposed the SSA, which was inspired by the swarming behavior of salps in the ocean. Salps consume marine phytoplankton and move by inhaling and exhaling seawater. They usually gather to form a chain called a salp chain, and this chain behavior is used to forage for food. In the SSA, the salp chain is composed of two kinds of salps: the leader and the followers. The updating process of the leader salp can be formulated as shown in Eqs. (4) and (5).
where $x_j^1$ denotes the jth dimension of the leader; $F_j$ indicates the jth dimension of the food source; $ub_j$ and $lb_j$ represent the upper and lower bounds of the jth dimension, respectively; $c_2$ and $c_3$ denote random numbers between 0 and 1; $m$ represents the power factor; $t$ denotes the current iteration; $T_{max}$ denotes the maximal number of iterations; and $c_1$ is the parameter that balances the exploration and exploitation capabilities of the SSA.
The followers' positions can be updated using Eq. (6), which describes their movement.
where $x_j^i$ ($i \ge 2$) denotes the jth dimension of the ith follower (the first salp is the leader). After the salp population is initialized, the salps seek the optimal solution according to Eqs. (4) and (6).
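For orientation, one iteration of the continuous SSA can be sketched as follows. Since Eqs. (4)-(6) are not reproduced in this text, the sketch follows the commonly published form of the algorithm, and the details are assumptions: $c_1 = 2\exp(-(4t/T_{max})^m)$ with the power factor $m$ replacing the usual exponent 2, a threshold of 0.5 on $c_3$ to choose the direction of the leader's move, followers moving to the midpoint between themselves and the salp ahead of them, and clipping to the bounds. The function name `ssa_step` is hypothetical.

```python
import numpy as np

def ssa_step(pop, food, lb, ub, t, t_max, m=2.0, rng=None):
    """One iteration of a continuous salp swarm update (assumed standard form).

    pop  : (n, d) salp positions, row 0 is the leader
    food : (d,) best position found so far (the food source)
    """
    rng = np.random.default_rng() if rng is None else rng
    n, d = pop.shape
    # c1 shrinks over the iterations, shifting from exploration to exploitation.
    c1 = 2.0 * np.exp(-((4.0 * t / t_max) ** m))
    c2 = rng.random(d)
    c3 = rng.random(d)
    move = c1 * ((ub - lb) * c2 + lb)
    new = pop.copy()
    # Leader: step around the food source, direction chosen per dimension by c3.
    new[0] = np.where(c3 >= 0.5, food + move, food - move)
    # Followers: midpoint between own position and the (already updated) salp ahead.
    for i in range(1, n):
        new[i] = 0.5 * (pop[i] + new[i - 1])
    return np.clip(new, lb, ub)

# Toy usage at the last iteration, where c1 is tiny and the leader sits on the food.
rng = np.random.default_rng(0)
pop = rng.random((6, 4))
food = np.full(4, 0.5)
out = ssa_step(pop, food, lb=np.zeros(4), ub=np.ones(4), t=50, t_max=50, rng=rng)
```

This continuous update produces real-valued positions, which is why it cannot be applied directly to the 0/1 selection of ELMs discussed next.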
The selection of ELMs is a combinatorial optimization problem, which the basic SSA cannot directly solve. Hence, we need to improve the motion pattern of the salps, such that they can search for the optimal solution in the discrete solution space. The enhancements of SSA are discussed next.

Improvement of the Search Process
The motion pattern of salps in the SSA cannot match the movements of individuals in a discrete solution space; this means that the motion of the leader and the follower should be changed. Considering the characteristics of the problem to be solved and that the motion pattern should be simple and effective, we change the original updating process of the leader and the follower in SSA. The updating processes of the leader and the follower are presented in the following equations.
For followers, we update their movement using Eq. (9).
where r denotes a random number that is either 0 or 1.

Perturbation Mechanism
In the SSA, the individuals usually gather to construct the salp chain, and the population may lack diversity. Hence, we attempt to improve the solution diversity in order to enhance the searching efficiency. A perturbation mechanism [36] is introduced to the SSA, which can optimize the tradeoff between exploitation and exploration in the searching process. A predefined perturbation factor perfactor is introduced, and the perturbation mechanism is presented as follows: where $coh$ signifies the cohesion index of the salp population; $x_i$ denotes the ith individual; $\bar{x}$ denotes the center position of all of the salps; $r_p$ indicates a predefined perturbation probability; and $step$ denotes a perturbation step. When $coh$ is smaller, it suggests that the cohesion index is larger and the individuals flock together, which requires a large perturbation; otherwise, it indicates that the cohesion index is smaller and the individuals are dispersed, which may require only a small perturbation.
In the early stage of the SSA's search, the individuals are scattered in the solution space, and the cohesion index is low, which produces a small perturbation. In the later stage of the search, most of the individuals have gathered, and the cohesion index is high, generating a large perturbation. This process ensures that the population performs diversely.

Gauss Mutation Operation
To prevent the salp individuals from becoming trapped in local optima, we introduce a Gauss mutation operation [37,38], so that some of the individuals can break away from a local optimum with a large probability. After each iteration, we select the 20% of the individuals with the worst performance to carry out the Gauss mutation operation. The operation can be described as follows: where $r_g$ denotes a random number drawn from the Gaussian distribution N(1, 1).

Selection Process of Extreme Learning Machines
In this section, we utilize the ECDPB to select the ELMs for constructing the ensemble classifier. This process involves four main steps. First, the combined diversity measure of each ELM is calculated using Eq. (3). Second, we rank all the ELMs in terms of their combined measures to obtain their new ordering. Third, we retain the first $M'$ ($M' \le M$) ELMs with the largest combined diversity measures; $M'$ is a parameter of the ECDPB, and we analyze it in Section 4.3. Finally, we adopt the PBSSA to choose well-performing ELMs from the retained $M'$ options. Thereby, the proposed ECDPB can extract the ELMs with good diversity and low error to constitute an ensemble classifier with good classification capacity.
The selection of ELMs is accomplished using the ECDPB, and the classification error of the ensemble classifier is taken as the objective function in this study. The ensemble process of multiple ELMs is calculated as follows: where $h_i$ denotes the ith ELM; $(x, y)$ denotes a sample; and $C$ denotes the class label set. In addition, we present the pseudo-code of the ECDPB as follows:
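The objective described above can be sketched as follows, under the assumption that the ensemble aggregates its members by plurality (majority) voting over the class label set C; the aggregation formula itself is not reproduced in this text. A candidate solution is represented as a 0/1 mask over the retained ELMs, which is the kind of vector a binary optimizer such as the PBSSA would search over. The function names and the convention of returning an error of 1.0 for an empty selection are hypothetical choices.

```python
import numpy as np

def majority_vote(preds, n_classes):
    """Plurality vote over the rows of preds.

    preds : (K, N) integer labels predicted by K selected ELMs
    Ties are broken in favor of the smallest class label.
    """
    votes = np.zeros((n_classes, preds.shape[1]), dtype=int)
    for c in range(n_classes):
        votes[c] = (preds == c).sum(axis=0)     # votes received by class c
    return votes.argmax(axis=0)

def ensemble_error(mask, preds, y, n_classes):
    """Classification error of the sub-ensemble encoded by a 0/1 mask."""
    mask = np.asarray(mask, dtype=bool)
    if not mask.any():
        return 1.0                              # assumed penalty for an empty ensemble
    return float(np.mean(majority_vote(preds[mask], n_classes) != y))

# Toy example: three ELMs, four samples, two classes.
preds = np.array([[0, 1, 1, 0],
                  [0, 0, 1, 1],
                  [1, 1, 0, 0]])
y = np.array([0, 1, 1, 0])
print(ensemble_error([1, 1, 1], preds, y, n_classes=2))   # all three ELMs vote
print(ensemble_error([0, 1, 0], preds, y, n_classes=2))   # only the second ELM
```

In the toy example the two weaker classifiers make errors on different samples, so the three-member vote is correct on every sample even though two of its members are individually only 50% accurate.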

Complexity Analysis of ECDPB
In this section, we analyze the time complexity of the proposed ECDPB. Assume that the population size of the PBSSA is $N$, the maximal number of iterations is $T_{max}$, the number of ELMs in the initial pool is $M$, and the dimension of each salp is $M'$. First, we use the combined diversity measure to select $M'$ ELMs, and its complexity is $O(M \log_2 M)$. Second, the time complexity of the salp population initialization is $O(N \times M')$, and the complexity of the searching process is $O(N^2)$. Finally, the overall complexity of the ECDPB is $O(T_{max} \times N^2)$ after $T_{max}$ iterations.

Experiments
To evaluate the ECDPB's utility, we employ 10 benchmark datasets, as listed in Table 1. To reduce the randomness of the experimental results, each experiment was repeated 30 times. The experiments were carried out in Matlab 2020a on a computer running 64-bit Windows 10 with a 3.6 GHz i7-9700K processor and 32 GB of memory. Each dataset is randomly divided into five equal parts: three are used for training, one is used for validation, and one is used for testing.
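The data partition described above can be sketched in a few lines. This is an illustrative reading of the protocol rather than the authors' Matlab code: the indices are shuffled, cut into five nearly equal parts, and assigned 3/1/1 to training, validation, and testing. The function name `split_3_1_1` and the use of one seed per repetition are assumptions.

```python
import numpy as np

def split_3_1_1(n_samples, seed=0):
    """Randomly split sample indices into train/validation/test at a 3:1:1 ratio."""
    rng = np.random.default_rng(seed)
    parts = np.array_split(rng.permutation(n_samples), 5)   # five (nearly) equal parts
    train = np.concatenate(parts[:3])
    return train, parts[3], parts[4]

# One split per repetition, e.g. 30 repetitions with different seeds.
splits = [split_3_1_1(100, seed=rep) for rep in range(30)]
train, val, test = splits[0]
```

Using a different seed for each of the repetitions gives a fresh random partition each time, so that the reported means and standard deviations reflect variation across splits.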

Experimental Results
The ECDPB is composed of two parts: the CDM and the PBSSA. We attempt to determine whether the ECDPB (the fusion of the CDM and the PBSSA) can attain lower errors than the CDM and the PBSSA used individually. Tables 2 and 3 list the classification errors produced by the ECDPB and those obtained by the CDM and the PBSSA. The Wilcoxon test [20] is applied in this study to verify whether the differences between the ECDPB and the others are significant (its significance level is usually set to 0.05). When the p-value is lower than 0.05, we reject the null hypothesis, and the difference in results between the two methods can be regarded as significant. As can be seen in Tables 2 and 3, the errors and standard deviations of the ECDPB are lower than those of the CDM and the PBSSA, which indicates that the ECDPB achieves clear improvements in classification over the two alternatives. Additionally, the results in Tables 2 and 3 include p-values that are under 0.05. The results in Tables 2 and 3 also indicate that the ECDPB uses fewer ELMs than the CDM and the PBSSA while producing fewer errors. Therefore, we can conclude that the ECDPB is an effective technique for designing an ensemble classifier with greater classification ability.
It can also be seen from Tables 2 and 3 that, overall, over 75 percent of the ELMs can be eliminated by the ECDPB, meaning that we can use fewer ELMs to design a better-performing ensemble classifier and save significant computational resources in practical applications. Regarding the selection of ELMs, as the number of ELMs increases, the computational complexity of the selection increases exponentially [12,13]. In comparison with the CDM and the PBSSA using 100 and 200 ELMs, the ECDPB produces fewer errors for both pool sizes. In addition, the errors can be reduced as the number of ELMs increases.
When we employ 200 ELMs to design an ensemble classifier, the ECDPB achieves only a slight enhancement in classification. As a result, the number of ELMs should be set at 100.
To verify the utility of the ECDPB, we compare the proposed method with the following techniques: GASEN [17], DMEP [20], MDOEP [39], IBAFSEN [19], PEAD [40], RCOA [34], and IDAFMEP [16]. The primary goal of these techniques is to enhance the classification capacity in comparison with a bagging ensemble. The implementations of these approaches follow the corresponding literature. In contrast to IBAFSEN and IDAFMEP, the proposed ECDPB uses the combined diversity measure and the PBSSA and optimizes them as a whole, and the combined diversity measure takes both the diversity and the accuracy of classifiers into consideration. IBAFSEN and IDAFMEP use a double-fault or margin measure together with AFSA to select the final ensemble, optimizing the two components separately. Double-fault and margin measures each consider only diversity or only accuracy. In addition, the proposed PBSSA has a more powerful optimization ability than AFSA, as shown in Section 4.2.
We present the comparisons of classification errors and standard deviations between the proposed method and the alternatives with 100 ELMs in Table 4. The results in Table 4 reveal that the proposed ECDPB produces lower errors and standard deviations than the other methods, though it does produce higher errors than IDAFMEP on the Website dataset. Moreover, the proposed technique does not obtain the highest errors on any of the 10 datasets. Overall, the ECDPB yields lower errors than its competitors with good stability. Meanwhile, IDAFMEP also obtains lower errors than the remaining methods. The numbers of ELMs used by the different approaches to design an ensemble classifier are displayed in Table 5. We find that the ECDPB uses fewer ELMs than most methods for designing an ensemble classifier; however, it still uses more ELMs than DMEP and IDAFMEP. Although the proposed ECDPB does not use the fewest ELMs, it does produce lower classification errors.
In addition, we use the Wilcoxon test [20] to verify the significance of the differences between the proposed ECDPB and its competitors for each of the datasets, as shown in Table 6, where "+/-" indicates the number of datasets on which the difference between the proposed ECDPB and the other method is significant or not significant, respectively. The results in Table 6 show that most of the p-values are lower than 0.05, which by and large implies that there are significant differences between the proposed method and the comparison methods. Therefore, we can conclude that the proposed ECDPB provides superior classification compared with the other techniques and that it is an effective approach for designing a better-performing ensemble classifier. The proposed ECDPB can deliver better classification capacity because it employs the combined diversity measure to evaluate each ELM by simultaneously considering diversity and accuracy. It can therefore choose a collection of better-performing ELMs than alternative measures that consider only one of the two factors. Furthermore, the proposed ECDPB uses the PBSSA, with its powerful optimizing ability, to search for a more competitive subset of ELMs, which can provide an ensemble classifier with superior performance. Therefore, the proposed ECDPB is a reasonable and effective technique for designing an ensemble classifier.

Parameters Sensitivity
In this study, we employ the proposed ECDPB to design an ensemble classifier, and its parameters have great effects on its performance. Hence, we need to explore the effects of the ECDPB's parameters: the maximal number of iterations $T_{max}$, the retained size $M'$, the population size $N$, the perturbation probability $r_p$, and the power factor $m$. Due to space constraints, we carry out the experiments on the Energy dataset using 100 ELMs. The sensitivity analysis of the parameters is reported next.
The ECDPB uses the PBSSA proposed in Section 3.2 as its component. We need to analyze the influence of the number of iterations on the performance of the PBSSA, which is shown in Fig. 3. Fig. 3 indicates that the classification errors of the PBSSA initially decrease as the number of iterations increases, flattening out afterward. Moreover, we assess the search performance of the proposed PBSSA in comparison with other heuristic algorithms. The comparison algorithms include IDAFS [16], IBAFS [19], BAFS [41], IAFS [42], DGSO [43], BGSO [44], MPSO [45], BPSO [46], and BGA [17].
The trends in Fig. 3 suggest that the proposed PBSSA produces fewer errors than the other methods, and that it has a more powerful optimizing ability and higher optimizing precision. The results also show that the different strategies we employ to modify the basic SSA are reasonable and effective. We explore the effect of the retained size of ELMs produced by the CDM on the results of the ECDPB, as shown in Fig. 4a. Fig. 4b demonstrates the impact of the population size on the classification errors. We show the relationship between the perturbation probability and the performance of the PBSSA in Fig. 4c. The effect of the power factor on the results of the PBSSA is exhibited in Fig. 4d. With these figures, we can identify the best choice for each parameter. In addition, similar observations can be made on the other datasets. Therefore, we set $T_{max} = 600$, $M' = 35$, $N = 35$, $r_p = 0.6$, and $m = 3.5$ for the experiments in this study.

Conclusions
We proposed the ECDPB as a powerful means for designing a well-performing ensemble classifier. The proposed ECDPB unifies the combined diversity measure, the high-performance PBSSA, and the efficient ELM to form an ensemble classifier. The combined diversity measure optimizes the selection of the ELMs to account for the tradeoff between diversity and precision, and it enables those ELMs with good performance to be selected. The high-performance PBSSA selects the ELMs rapidly and accurately, which allows the ensemble classifier to attain good classification results while using fewer ELMs. The efficient ELM can create a base classifier using very few computing resources, which is critical for minimizing resource consumption when generating multiple ELMs. Together, these components allow for improvements in classification.
We have assessed the classification capacity of the ECDPB on 10 benchmark datasets, and the results imply that, compared with other algorithms, the proposed ECDPB can deliver superior classification performance with fewer ELMs. In summary, the proposed ECDPB is an effective way to design an ensemble classifier. In future work, we will attempt to apply the proposed technique to the stock market [47] and compare it with feature extraction using deep learning methods, with the aim of improving the prediction precision of stock price indices and providing support for stock investments.