A Transfer Learning-Enabled Optimized Extreme Deep Learning Paradigm for Diagnosis of COVID-19

Many respiratory infections around the world have been caused by coronaviruses. COVID-19 is one of the most serious coronavirus diseases due to its rapid spread between people and its low survival rate. There is a high need for computer-assisted diagnostics (CAD) based on artificial intelligence to help doctors and radiologists identify COVID-19 patients in cloud systems. Machine learning (ML) has been used to examine chest X-ray frames. In this paper, a new transfer learning-based optimized extreme deep learning paradigm is proposed to classify chest X-ray images into three classes: a pneumonia patient, a COVID-19 patient, or a normal person. First, three different pre-trained Convolutional Neural Network (CNN) models (ResNet18, ResNet50, DenseNet201) are employed for deep feature extraction. Second, each feature vector is passed through the binary Butterfly optimization algorithm (bBOA) to remove redundant features, extract the most representative ones, and enhance the performance of the CNN models. These selected features are then passed to an Extreme learning machine (ELM) improved using the BOA to classify the chest X-ray images. The proposed paradigm achieves a 99.48% accuracy in detecting COVID-19 cases.

The extreme learning machine (ELM) [25] is widely used in recognition systems because of its fast learning capability and adequate generalization [26]. In the basic ELM model, however, the random initialization of the input weights and hidden biases can leave the ELM solution close to local minima [27]. The generalization ability of ELM can be enhanced by combining this model with other techniques [28,29]. In some studies, researchers have successfully optimized the ELM using nature-inspired algorithms. Mohapatra et al. [30] introduced an ELM model optimized by cuckoo search to classify medical datasets. Satapathy et al. [31] suggested another ELM optimized by the firefly algorithm. The hybrid firefly-ELM model was then applied to a photovoltaic interactive microgrid for stability analysis. Researchers in [32] enhanced the ELM using a whale optimization algorithm and then utilized it to evaluate the ageing degree of the insulated gate bipolar transistor. The comparison of the optimized ELM with the standalone ELM showed that the optimized ELM achieves better recognition performance.
The butterfly optimization algorithm (BOA) [33] is a recent meta-heuristic swarm intelligence technique motivated by the food-foraging behavior of butterflies. Based on random exploration and improvement by exploitation, the BOA can solve complex problems. The authors in [33] observed that BOA mostly performs better on unimodal and multimodal benchmark functions. Strong exploitation and a high convergence rate are the particular strengths of BOA, which rely on elitism and random movement. These appealing properties have allowed researchers to apply BOA in several other applications (e.g., wrapper-based feature selection). A binary Butterfly optimization algorithm (bBOA) was presented by Arora et al. [34] to address feature selection problems. The bBOA selects robust features that enhance the classification accuracy. The bBOA also has a good convergence rate and produces near-optimal solutions.
This study presents an end-to-end ML-based system to identify COVID-19 from CXR scans automatically. Pre-trained CNN models, namely Resnet18 [35], Resnet50 [35], and Densenet201 [36], were applied to CXR images to extract discriminative features. The bBOA algorithm was then used to pick the most informative features from the collected deep features. These features were then combined and applied as the input to an optimized ELM model (ELM-BOA). Briefly, this study demonstrates that combining the deep features extracted from the common levels of different CNN architectures enhances the efficiency of the classification process. The contributions of this study are summarized as follows:
-The proposed framework uses several pre-trained CNNs for feature extraction.
-The framework uses a butterfly optimization algorithm (BOA) for feature selection and for optimizing the ELM model for classification.
-The output of the proposed methodology is evaluated and compared with state-of-the-art approaches.
The remainder of the paper is organized as follows. Section 2 reviews previous studies. Methods are explained in Section 3. Section 4 describes the proposed model. The obtained results and their detailed analysis are discussed in Section 5. Section 6 concludes the paper.

Literature Review
Recently, various methods have been developed for the identification of COVID-19 patients from their X-ray and CT images. These methods utilize computer vision (CV) based machine learning (ML) algorithms. Apostolopoulos et al. [37] executed transfer learning on an X-ray image dataset using different Convolutional Neural Network (CNN) models. They evaluated their technique on two datasets, which include confirmed COVID-19, bacterial pneumonia, and normal cases. The maximum accuracy achieved with MobileNetv2 is 96.78% on 2-class and 94.72% on 3-class classification. Researchers in [38] introduced a novel DL model (COVNet) for the detection of COVID-19 disease from chest CT frames. This model achieved a sensitivity and specificity of 90% and 96%, respectively. Ghoshal et al. [39] conducted a study to investigate uncertainty in DL models that identify COVID-19 in chest X-ray images. They estimated the uncertainty by performing transfer learning on a Bayesian DL classifier.
Narin et al. [40] performed experiments on three CNN models, ResNet50, Inception-ResNetV2, and InceptionV3, to identify COVID-19 patients from chest X-ray images. According to their experiments, ResNet50 produced the highest performance with 98% classification accuracy. In [41], a DL-based COVIDX-Net was proposed for automatic detection of COVID-19 in X-ray images. This COVIDX-Net model is based on seven different CNN models: VGG19, DenseNet201, InceptionV3, ResNetV2, InceptionResNetV2, Xception, and MobileNetV2. This technique achieved a 0.91 F1-score with VGG19 and DenseNet201. Wang et al. [42] proposed a DL-based method to predict COVID-19 disease in CT images. They fine-tuned a modified Inception architecture and extracted the CNN features for classification. This method achieved an accuracy of 82.9% and an AUC of 0.90.
A new artificial neural network based on CapsNet [43] was introduced for coronavirus detection in a chest X-ray image dataset. The researchers tested their network on binary and multi-class classification. The achieved recognition accuracy is 97.24% for binary classification and 84.22% for multi-class classification. A dual-branch combination network (DCN) [44] was introduced to identify COVID-19 and its associated risk in chest CT scans. The researchers first segment the lesion area to gain more accurate results in the classification phase. The DCN model achieved 96.74% accuracy. Horry et al. [45] utilized the TL approach for COVID-19 detection in ultrasound, X-ray, and CT frames. To perform TL, they selected the VGG-19 network and made appropriate changes to the parameters to fine-tune the model. The capabilities of the presented technique were measured using precision, and it achieved precision rates of 100%, 86%, and 84% for ultrasound, X-ray, and CT scans, respectively. COVIDiagnosis-Net [46] is a deep learning model based on SqueezeNet with Bayesian optimization. This model was trained and evaluated on a dataset that contains a small number of X-ray images of COVID-19 cases; it produced a 100% accuracy for the detection of the COVID-19 class and an overall accuracy of 98.26%. Rahimzadeh et al. [47] merged the features produced by two pre-trained networks: Xception and ResNet50 V2. However, this merge increased the model's size to 560 MB, which is unsuitable for practical real-time detection. The model was trained using 3783 CXR images and tested on 11302 CXR images. The model achieved an overall accuracy of 99.5% and 91.4% in the COVID-19 class. The authors compared the results obtained from a model that used concatenated features with the results produced by models that use features from a single CNN, and showed that using concatenated features achieves better results.
Shaban et al. [48] achieved a high accuracy of 96% by exploiting the advantages of a genetic algorithm for feature selection and using an enhanced K-nearest neighbor as a classifier. This model was tested on chest CT images to distinguish infected from non-infected people and achieved better results than other recent models. A medical model for COVID-19 detection was proposed by Nour et al. [49] to support clinical applications. It is built on deep learning and Bayesian optimization. The CNN model automatically extracts features, which are then processed using various machine learning methods, including KNN, SVM, and Decision Tree. Data augmentation was applied to increase the number of COVID-19 class samples. The proposed system's efficiency was assessed using 70% and 30% of the dataset for training and testing, respectively. The introduced technique obtained an accuracy of 97.14%.
Deep learning models have gained significant importance in detecting patients infected by a coronavirus from the chest X-ray images. The detection model for COVID-19 can be enhanced by using deep learning models as feature extractors or by fine-tuning deep learning models. Thus, our work's primary goal is to develop a deep learning-based methodology for the recognition of COVID-19 affected patients.

Convolutional Neural Network (CNN)
Deep learning techniques help extract meaningful features from some data types, such as images and videos. In medical research, the Convolutional Neural Network (CNN) is widely utilized to extract these features from large volumes of medical images such as X-ray images and computerized tomography (CT) images. Besides, it achieves high accuracy at a lower computational cost, which provides generous support for health community research [50]. "Deep" refers to the large number of layers in the network. This type of architecture helps these networks find complex features that simple networks cannot. The core principle of a CNN is to capture local features in the early layers and merge them in the later layers to form more complex features [51].
Transfer learning is an efficient technique that takes advantage of a previously learned model's knowledge to solve another, possibly related, task with minimal re-training or fine-tuning [52]. Deep learning algorithms have two essential requirements to work effectively: a massive amount of labeled data and substantial computing power. Forming a large-scale and high-quality dataset is very difficult and complicated [53], while providing a powerful device to run deep learning techniques requires well-equipped laboratories or large funds [54]. Hence, Deep Transfer Learning (DTL) attempts to solve this issue.
The first form of transfer learning uses the pre-trained CNN as a fixed feature extractor [55]. This approach preserves the original architecture and all learned weights. After the CNN extracts the features, they are fed into a new network to perform the classification task. In the second and more complex form, called fine-tuning [56], some particular modifications are applied to the pre-trained CNN to achieve better results. These modifications involve architecture adjustment and parameter tuning. Besides, some new parameters are inserted into the network, and these require training on a relatively large amount of data to be beneficial.

Extreme Learning Machine (ELM)
ELM is a learning model for a single hidden layer feed-forward neural network (SLFN). It is also classified as a neural network with random weights (NNRW). It was presented by Huang et al. [57]. NNRW is preferable to traditional artificial neural networks (ANNs), because ANNs have several drawbacks due to their training mechanism (error backpropagation) and the number of hidden layers, such as slow convergence, long training time, and local minima problems [58]. The ELM was introduced to avoid over-fitting problems, reduce the training time, and address these drawbacks of traditional ANNs. This method randomly initializes the connection weights between the input and hidden layers and the hidden biases. The output weights that connect the hidden layer with the output layer are then computed analytically. This training mechanism allows ELM to be faster than a traditional ANN. To determine the impact of the random parameters on ELM, Cao et al. [59] organized an experimental framework and studied the relationship between the parameters (e.g., the number of neurons in the hidden layer, the randomization range of the thresholds of the hidden nodes, the randomization range of the weights between the input layer and hidden layer, and the type of activation function) and the optimal performance of ELM. They found that all the parameters mentioned above play a dominant role in the stability of the model.
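For N training samples (x_j, t_j), K hidden nodes, and the activation function g(p), the ELM model can be given as

$$\sum_{i=1}^{K} \gamma_i \, g(w_i \cdot x_j + b_i) = t_j, \quad j = 1, \dots, N \tag{1}$$

where w_i and b_i are the randomly assigned input weight vector and bias of the ith hidden node, and γ_i is the output weight vector connecting the ith hidden node to the output layer. The equation for the ELM model can be simplified and represented in matrix form as

$$D \gamma = T \tag{2}$$

where $\gamma = [\gamma_1^{\top}, \dots, \gamma_K^{\top}]^{\top}$, $T = [t_1^{\top}, \dots, t_N^{\top}]^{\top}$ is the target matrix, and the superscript $\top$ denotes the transpose. D denotes the hidden layer output matrix and can be given as

$$D = \begin{bmatrix} g(w_1 \cdot x_1 + b_1) & \cdots & g(w_K \cdot x_1 + b_K) \\ \vdots & \ddots & \vdots \\ g(w_1 \cdot x_N + b_1) & \cdots & g(w_K \cdot x_N + b_K) \end{bmatrix}_{N \times K} \tag{3}$$

γ can be determined using the Moore-Penrose (MP) inverse as given below.

$$\gamma = D^{\dagger} T \tag{4}$$

where $D^{\dagger}$ is the MP generalized inverse of D.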
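The training procedure described above (random hidden parameters, then an analytic solution for the output weights) can be illustrated with a small dependency-free sketch. It is not the authors' MATLAB implementation. The function names, the toy XOR data, the sigmoid activation, and the number of hidden nodes are all assumptions made for illustration. To avoid an external linear algebra library, the sketch solves the ridge-regularized normal equations (D^T D + λI)γ = D^T T instead of computing the exact MP inverse; as λ approaches 0 this solution approaches γ = D†T.

```python
import math
import random

def sigmoid(p):
    return 1.0 / (1.0 + math.exp(-p))

def hidden_output(X, W, b):
    """Hidden layer output matrix D (N x K): D[j][i] = g(w_i . x_j + b_i)."""
    return [[sigmoid(sum(wv * xv for wv, xv in zip(w, x)) + bi)
             for w, bi in zip(W, b)] for x in X]

def solve(A, B):
    """Solve A Z = B by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [ra[:] + rb[:] for ra, rb in zip(A, B)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        pv = M[col][col]
        M[col] = [v / pv for v in M[col]]
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [vr - factor * vc for vr, vc in zip(M[r], M[col])]
    return [row[n:] for row in M]

def train_elm(X, T, K, ridge=1e-3, seed=0):
    """Random input weights/biases in [-1, 1]; output weights by regularized least squares."""
    rng = random.Random(seed)
    m = len(X[0])
    W = [[rng.uniform(-1, 1) for _ in range(m)] for _ in range(K)]
    b = [rng.uniform(-1, 1) for _ in range(K)]
    D = hidden_output(X, W, b)
    n_out = len(T[0])
    # Normal equations: (D^T D + ridge * I) gamma = D^T T  (assumed stand-in for D^+ T)
    DtD = [[sum(row[p] * row[q] for row in D) + (ridge if p == q else 0.0)
            for q in range(K)] for p in range(K)]
    DtT = [[sum(D[j][p] * T[j][o] for j in range(len(D))) for o in range(n_out)]
           for p in range(K)]
    gamma = solve(DtD, DtT)
    return W, b, gamma

def predict(X, W, b, gamma):
    D = hidden_output(X, W, b)
    return [[sum(d * g[o] for d, g in zip(row, gamma)) for o in range(len(gamma[0]))]
            for row in D]

def rmse(Y, T):
    n = len(T) * len(T[0])
    return math.sqrt(sum((y - t) ** 2 for ry, rt in zip(Y, T)
                         for y, t in zip(ry, rt)) / n)

# Toy XOR-style data (hypothetical, for illustration only)
X = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
T = [[0.0], [1.0], [1.0], [0.0]]
W, b, gamma = train_elm(X, T, K=6)
pred = predict(X, W, b, gamma)
print("training RMSE:", rmse(pred, T))
```

Because only the output weights are solved for, training reduces to one linear system, which is the source of the speed advantage over backpropagation mentioned above.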

Butterfly Optimization Algorithm (BOA)
The butterfly optimization algorithm (BOA) [33] is a recent meta-heuristic swarm intelligence technique motivated by the food-foraging behavior of butterflies. Based on random exploration and improvement by exploitation, the BOA can solve complex problems.
In the BOA, each butterfly has its own unique fragrance. Mathematically, the fragrance of a butterfly can be defined as follows:

$$pf_i = c \, I^{a} \tag{5}$$

where pf_i denotes the perceived magnitude of fragrance, i.e., the intensity of the fragrance of the ith butterfly as detected by other butterflies, I represents the stimulus intensity, c represents the sensory modality, and the power exponent a accounts for the varying degree of absorption and depends on the modality. The BOA consists of three phases, as follows:
Phase 1: Initialization phase. This phase consists of three steps:
-Step 1: The values are assigned for the algorithm's parameters.
-Step 2: The fitness function and its solution space are defined.
-Step 3: An initial population of butterflies is generated.
Phase 2: Iteration phase. In this phase, the search is performed by the algorithm with the artificial butterflies generated in the initialization phase. This phase consists of the following steps:
-Step 1: Each butterfly produces fragrance at its position, which can be calculated by Eq. (5).
-Step 2: The fitness value of each butterfly in the search space is computed.
-Global search: If the butterfly can sense the fragrance of the fittest butterfly/solution X*, it moves toward that solution X*, as given in Eq. (6):

$$x_i^{t+1} = x_i^{t} + \left(r^{2} \times X^{*} - x_i^{t}\right) \times f_i \tag{6}$$

where x_i^t denotes the solution vector x_i of the ith butterfly in iteration t, f_i denotes the fragrance of the ith butterfly, and r is a random value within the range [0, 1].
-Local search: When the butterflies fail to sense the scent of the other butterflies, they randomly change their position in the search space. This process can be defined as follows:

$$x_i^{t+1} = x_i^{t} + \left(r^{2} \times x_j^{t} - x_k^{t}\right) \times f_i \tag{7}$$

where x_j^t and x_k^t are the jth and kth butterflies from the search space.
Phase 3: Termination phase. The iteration phase continues until the stopping criteria (e.g., maximum number of iterations, a specific value of error rate, etc.) are reached.
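The three phases can be sketched in a few lines of Python. This is a minimal illustration, not the reference implementation of [33]. Several choices are assumptions: the stimulus intensity I is taken as the absolute objective value of the butterfly, the choice between global and local search is made with a switch probability p = 0.8, a candidate position is accepted only if it improves the butterfly's fitness, and the parameter values (c, a, population size, bounds) and the sphere test function are illustrative.

```python
import random

def boa_minimize(objective, dim, n_butterflies=15, n_iter=50,
                 c=0.01, a=0.1, p=0.8, bounds=(-5.0, 5.0), seed=0):
    """Minimize `objective`; returns (best position, best fitness, best-fitness history)."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Phase 1: initialization of the butterfly population
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_butterflies)]
    fit = [objective(x) for x in pop]
    best_idx = min(range(n_butterflies), key=lambda i: fit[i])
    best, best_fit = pop[best_idx][:], fit[best_idx]
    history = [best_fit]
    # Phase 2: iteration
    for _ in range(n_iter):
        for i in range(n_butterflies):
            frag = c * abs(fit[i]) ** a  # fragrance pf = c * I^a, with I assumed to be |fitness|
            r = rng.random()
            if rng.random() < p:
                # Global search: move toward the best butterfly X*
                cand = [xi + (r * r * bi - xi) * frag for xi, bi in zip(pop[i], best)]
            else:
                # Local search: random move based on two random butterflies j and k
                j = rng.randrange(n_butterflies)
                k = rng.randrange(n_butterflies)
                cand = [xi + (r * r * xj - xk) * frag
                        for xi, xj, xk in zip(pop[i], pop[j], pop[k])]
            cand = [min(hi, max(lo, v)) for v in cand]  # keep inside the search bounds
            cand_fit = objective(cand)
            if cand_fit < fit[i]:  # greedy acceptance (an assumption of this sketch)
                pop[i], fit[i] = cand, cand_fit
                if cand_fit < best_fit:
                    best, best_fit = cand[:], cand_fit
        history.append(best_fit)
    # Phase 3: termination after a fixed iteration budget
    return best, best_fit, history

def sphere(x):
    """Simple unimodal test function with minimum 0 at the origin."""
    return sum(v * v for v in x)

best, best_fit, history = boa_minimize(sphere, dim=3)
print("best fitness:", best_fit)
```

Other stopping criteria, such as a target error rate, could replace the fixed iteration budget without changing the update rules.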

Proposed Framework
Briefly, the proposed framework consists of four stages, as shown in Fig. 1. First, the collected dataset passes through the pre-processing stage to make it suitable as input for the CNN. Then, the pre-processed images are fed to the next stage to compute features from each input image. After that, the feature set goes through the feature selection stage to choose the most relevant features. Finally, the selected features are forwarded to the classification model to decide which class they belong to.

Pre-processing
The prepared dataset must pass through some pre-processing steps before being fed to the pre-trained CNN models. First, the images are resized to the input size accepted by the pre-trained models, which is 224 × 224 pixels in this experiment. They are then converted to RGB with 24-bit depth.

CNN Feature Extraction Using Transfer Learning
Resnet18, Resnet50, and Densenet201 are used for feature extraction in this work. We extract the features from the 1000-feature fully connected layer of each pre-trained CNN model. The full feature set obtained from each pre-trained CNN model has size n × 1000, where n denotes the total number of X-ray images in the dataset.

Binary Butterfly Optimization Algorithm (bBOA) for Feature Selection
Binary optimization problems perform feature selection over the binary values {0, 1} only. These feature optimization problems are multi-objective optimization problems with two fundamental goals: selecting the smallest number of features and obtaining the maximum recognition accuracy. The best solution is obtained when the optimization algorithm achieves the best performance with few features.

Figure 1: Proposed framework
In our proposed framework, the bBOA is utilized to select the most informative features in order to increase the recognition accuracy and reduce the computational time for COVID-19 prediction. Each feature subset in the bBOA is represented as a butterfly, or solution. Each solution is specified as a one-dimensional vector whose dimension equals the number of features in the dataset. Each cell of the vector takes one of two values, 1 or 0. The value 1 indicates that the corresponding feature is selected, while the value 0 indicates that the feature is not selected.
First, the butterflies are randomly initialized, and their fitness value can be computed by Eq. (5). When a butterfly senses the fragrance of the best butterfly in the search space, it steps towards that butterfly, and this movement is given by Eq. (6). This phase is known as the global search. If the butterflies do not sense any fragrance, they move randomly according to Eq. (7); this is known as the local search phase. The sigmoid function then forces the butterflies to move in a binary search space. The sigmoid function is described in Eq. (8).

$$S\left(F_i^{k}(t)\right) = \frac{1}{1 + e^{-F_i^{k}(t)}} \tag{8}$$

where F_i^k(t) represents the continuous-valued fragrance of the ith butterfly in the kth element during iteration t. The transfer function maps an unbounded input to a bounded output. Because the S-shaped transfer function produces a continuous output, a threshold is applied to obtain a binary-valued output. The applied threshold is given in Eq. (9).

$$x_i^{k}(t+1) = \begin{cases} 0, & \text{if } rand < S\left(F_i^{k}(t)\right) \\ 1, & \text{if } rand \geq S\left(F_i^{k}(t)\right) \end{cases} \tag{9}$$

where x_i^k(t) and F_i^k(t) designate the position and fragrance, respectively, of the ith butterfly in the kth element during iteration t, and rand is a random number drawn uniformly from [0, 1].
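The mapping from continuous values to a binary feature mask can be sketched as follows. The function names are hypothetical. The threshold convention is an assumption of this sketch: a bit is set to 0 when a uniform random draw falls below the sigmoid output and to 1 otherwise; some binary metaheuristics use the opposite rule, so the direction should be checked against the bBOA formulation in [34] before reuse.

```python
import math
import random

def sigmoid_transfer(f):
    """S-shaped transfer function: maps any real value into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-f))

def binarize(values, rand=random.random):
    """Turn continuous values into a 0/1 mask.

    Assumed convention: bit = 0 if rand() < S(value), else 1.
    `rand` is injectable so the behaviour can be checked deterministically.
    """
    return [0 if rand() < sigmoid_transfer(v) else 1 for v in values]

# Deterministic illustration with a fixed "random" draw of 0.3
mask = binarize([2.0, -2.0, 0.0], rand=lambda: 0.3)
print(mask)
```

With the fixed draw of 0.3, the sigmoid outputs are roughly 0.88, 0.12, and 0.5, so only the second element ends up as 1 under the assumed convention.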
A fitness function based on the KNN classifier is specified to evaluate each solution. It measures the quality of the selected feature subset. Mathematically, the suggested fitness function is defined as

$$Fitness = \alpha \, ERR(D) + \beta \, \frac{|R|}{|N|} \tag{10}$$

where ERR(D) represents the error rate (calculated using the KNN classifier), |N| is the number of features in the original feature set, and |R| is the number of selected features. The parameters α and β are selected within the range [0, 1]; they are the weights of the error rate and the selection ratio, respectively, where α is the complement of β (β = 1 − α).
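The trade-off expressed by this fitness function, a weighted sum of the classification error rate and the fraction of features kept, can be illustrated numerically. The function name and the values below (α = 0.99, an error rate of 5%, 200 of 1000 features kept) are hypothetical and chosen only for illustration; in the framework the error rate would come from the KNN classifier.

```python
def feature_selection_fitness(error_rate, n_selected, n_total, alpha=0.99):
    """Weighted sum of error rate and selection ratio; beta is the complement of alpha.

    Lower values are better: fewer errors and fewer selected features.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    beta = 1.0 - alpha
    return alpha * error_rate + beta * (n_selected / n_total)

# Two hypothetical subsets with the same error rate: the smaller subset scores better.
small = feature_selection_fitness(0.05, 200, 1000)
large = feature_selection_fitness(0.05, 800, 1000)
print(small, large)
```

With α close to 1, the error rate dominates and the subset size mainly acts as a tie-breaker between solutions of similar accuracy.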
In the proposed framework, the feature vectors from the multiple CNNs are combined after bBOA selection to obtain a single feature set. The rationale behind the concatenation of feature sets is to exploit each CNN's capabilities in extracting useful features.
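The selection-then-concatenation step can be sketched for a single image as follows. The function names and the tiny four-element vectors are hypothetical; in the framework each CNN yields 1000 features per image and the binary masks come from the bBOA.

```python
def select_features(features, mask):
    """Keep only the features whose mask bit is 1."""
    if len(features) != len(mask):
        raise ValueError("features and mask must have the same length")
    return [f for f, bit in zip(features, mask) if bit == 1]

def fuse_selected(feature_vectors, masks):
    """Apply each model's mask to its feature vector and concatenate the results."""
    fused = []
    for features, mask in zip(feature_vectors, masks):
        fused.extend(select_features(features, mask))
    return fused

# Hypothetical per-image feature vectors from three CNNs and their bBOA masks
vectors = [[0.1, 0.2, 0.3, 0.4], [1.1, 1.2, 1.3, 1.4], [2.1, 2.2, 2.3, 2.4]]
masks = [[1, 0, 0, 1], [0, 1, 0, 0], [1, 1, 1, 0]]
fused = fuse_selected(vectors, masks)
print(fused)  # length equals the total number of 1 bits across all masks
```

The length of the fused vector is the sum of the selected feature counts of the individual models.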

Hybrid ELM-BOA Model for Classification
The ELM model randomly initializes the input weights and hidden biases and analytically calculates the output weights using the MP inverse technique [23]. However, because of this random assignment of weights and biases, ELM has some limitations, such as long training time and weak generalization ability. In this study, ELM is optimized using the BOA to overcome these problems, and a new hybrid model, ELM-BOA, is presented (Fig. 2). The BOA is mainly used to find the optimal set of weights and biases to improve the learning performance of ELM. When the network structure of the ELM-BOA model consists of m inputs and K hidden nodes, the particle's length (L) is determined by Eq. (11).

$$L = K \times m + K \tag{11}$$
In this study, the minimization of the Root Mean Square Error (RMSE) is selected as the fitness function to assess the performance of each solution generated by the BOA. The learning method of the ELM-BOA model is described as follows:
(1) A learning sample is set, including the input vector and the output vector.
(2) The topology of the ELM-BOA neural network is established: the numbers of input, hidden, and output layer neurons are determined, and the activation function is selected.
(3) The swarm of BOA butterflies is randomly generated. Each butterfly consists of the input weights and hidden biases to be optimized and represents a candidate ELM. Here, the elements of each butterfly are randomly initialized within the range [−1, 1].
(4) The fitness value of each butterfly in the swarm is calculated. To evaluate the fitness value, the output weights of the ELM are generated using Eq. (4) after formulating the SLFN using the butterfly elements and computing the matrix D using Eq. (3).
(5) The butterfly with the best fitness, X*, is determined.
(6) The BOA algorithm is applied to update the position of each butterfly using Eqs. (6) and (7).
(7) If the stopping criterion is reached, the iteration is terminated, the best butterfly X* is output, and the resulting ELM-BOA model is applied to the test data to assess its generalization performance; otherwise, the process starts again from Step 5.
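Steps (3) and (4) depend on how a butterfly vector is turned into a candidate ELM. A minimal sketch of that decoding and of the RMSE fitness is given below. The layout of the vector (all input weights first, hidden node by hidden node, followed by the K biases) is an assumption of this sketch, as are the function names; the text only states that each butterfly holds the input weights and hidden biases for m inputs and K hidden nodes.

```python
import math

def particle_length(m, K):
    """Number of tunable values per butterfly: K*m input weights plus K hidden biases."""
    return K * m + K

def decode_butterfly(position, m, K):
    """Split a flat position vector into input weights W (K x m) and biases b (K).

    Assumed layout: [w_1 (m values), ..., w_K (m values), b_1, ..., b_K].
    """
    if len(position) != particle_length(m, K):
        raise ValueError("position length does not match K*m + K")
    W = [position[i * m:(i + 1) * m] for i in range(K)]
    b = position[K * m:]
    return W, b

def rmse(predicted, target):
    """Root mean square error between two equal-length sequences (lower is better)."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target))

# Hypothetical tiny network: m = 3 inputs, K = 2 hidden nodes
W, b = decode_butterfly([0, 1, 2, 3, 4, 5, 6, 7], m=3, K=2)
print(W, b)
```

Given W and b, the hidden layer output matrix and the output weights would then be computed as in the basic ELM, and the RMSE of the resulting predictions serves as the butterfly's fitness.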

Experiments
In this section, we present the datasets used, the parameter settings of the methods, and the experimental design.

Datasets Description
In this experiment, the proposed model is trained and tested on a balanced CXR dataset composed of three different classes: normal, pneumonia, and COVID-19. Out of 3885 CXR images, there are 1295 COVID-19 images, 1295 normal images, and 1295 pneumonia images. The normal and pneumonia classes are extracted from one source: the COVID-19 Radiography Database on the Kaggle website [60]. Because of the disease's novelty, the COVID-19 class was obtained from several open-source image databases: 800 already augmented images were collected from Alqudah et al. [61] and 495 images from Cohen et al. [62].

Parameter Settings
In this study, all experiments are carried out in MATLAB R2020a on a PC with a 2.60 GHz Intel (R) Core (TM) i7-4510U CPU, 8 GB RAM, and Windows 10 (64 bit). The image dataset is split in a 70:30 ratio, 70% for training and 30% for testing. The transfer learning approach was performed on the pre-trained deep networks to utilize them for a new task. The pre-trained CNN training parameters used in these experiments are presented in Tab. 1. The performance of the presented ELM-BOA is compared with SVM, KNN, and ELM. The parameter settings of ELM, GA, PSO, GWO, and BOA are shown in Tab. 2. All five algorithms run for 100 iterations. The average experimental results are obtained after running each algorithm 20 times.

Experiments Design
To evaluate the proposed automated COVID-19 detection and classification system, the following experiments were formulated:
• Experiment 1-The three CNN models were applied to the dataset to evaluate the classification performance.
• Experiment 2-The 1000 features from the FC1000 layer of each CNN model were extracted, and classification performance was evaluated using four machine learning classifiers and the proposed ELM-BOA model.
• Experiment 3-The BOA algorithm was applied to the feature set obtained from the FC1000 layer, the different feature sets from the CNN models were combined, and the performance was evaluated.
• Experiment 4-The ELM-BOA classification performance was assessed and compared with the other optimized ELM models.

Performance Measures
The performance of the proposed COVID-19 detection and classification model is measured from four basic counts: true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN). The measures used to evaluate the proposed framework are derived from these counts.
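As an illustration of how such measures are derived from the four counts, the sketch below computes accuracy, precision, recall (sensitivity), specificity, and F1-score. This particular set of measures is an assumption, chosen because these are common choices; the counts in the example are hypothetical and are not results from this study.

```python
def classification_measures(tp, fp, tn, fn):
    """Common measures derived from confusion counts for one class (one-vs-rest)."""
    total = tp + fp + tn + fn
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0  # also called sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }

# Hypothetical counts for one class
measures = classification_measures(tp=90, fp=10, tn=180, fn=20)
for name, value in measures.items():
    print(f"{name}: {value:.4f}")
```

For a three-class problem such as this one, the counts would be computed per class in a one-vs-rest manner and the per-class measures averaged.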

Results
In this study, the evaluation of the proposed framework is demonstrated through four experiments. In the first experiment, the original three pre-trained CNN models were applied to classify the COVID-19 images. In each model, the features were first extracted from the dataset's images, and then the SoftMax classifier was used to assign each image to a predefined class. Tab. 3 presents the results of the experimental analysis. The best result was obtained using the Densenet201 model with an accuracy of 96.66% and a training time of 670 min. As seen in Tab. 3, the three CNN models' training and validation processes were completed with high accuracy; however, the training time was very high. In such a scheme, transfer learning can be used for feature extraction to leverage the power of CNNs and reduce the computational costs, as illustrated in the next experiments.
In the second experiment, the dataset was processed by the three CNN models, and 1000 features were extracted from the FC1000 layer of each model. The results of the four machine learning models and the proposed ELM-BOA model are shown in Tab. 4. The proposed ELM-BOA classifier was superior to the traditional ELM, SVM, and KNN machine learning methods. It was observed that the ELM-BOA model improves the automated detection of COVID-19. In contrast, the classification performance was significantly reduced when KNN and SVM performed the classification task. The best performance was obtained with the ELM-BOA classifier when it was fed with the Densenet201 features.
In the third experiment, the bBOA method was utilized to select the most relevant features. The results in Tabs. 4-6 show that the bBOA produces better classification results with fewer features. The best performance was obtained by combining the three deep CNN models, as shown in Tab. 6. The 601 selected features achieved the best results. The maximum recognition rate was 99.48%, obtained by the ELM-BOA classifier.
In the fourth experiment, we used the original dataset to analyze the presented framework. 30% of the data was designated as the test data. By utilizing the bBOA technique, we combined the robust features extracted from the CNNs and obtained a new feature set with 601 features. Classification was then performed using the proposed ELM-BOA and compared with ELM optimized with other metaheuristic algorithms, namely the Grey wolf optimization (GWO) algorithm, the Genetic algorithm (GA), and particle swarm optimization (PSO). The classification accuracy results are shown in Tab. 7. In this experiment, the best recognition rate achieved is 99.48% with the ELM-BOA method, with a training time of 34 s.

Discussion
COVID-19 has infected many people, but large-scale labeled databases are still not available. Different datasets are therefore combined to carry out the computational work on automated COVID-19 detection models. Recently, researchers have concentrated on chest X-ray images to develop automated systems for the clinical assessment of COVID-19 disease. Several COVID-19 computational models have been introduced based on CNN models. Compared with traditional machine learning techniques, CNN-based models perform well in terms of efficiency and accuracy. These models extract robust features and produce good results in the classification phase. In this respect, our proposed technique consists of innovative components.
In this work, we exploited the advantages of the CNN models' end-to-end learning scheme. We extracted the deep features and performed transfer learning on the Resnet18, Resnet50, and Densenet201 models. After the selection of robust features, the final feature vector was fed to the ELM-BOA classifier for the recognition of the disease. For a fair comparison, we used a balanced dataset with the same number of images for normal, pneumonia, and COVID-19. Besides, we utilized the feature selection method (bBOA) to obtain a robust feature set that yields better results compared with other studies (Tab. 8). We also improved the time efficiency and classification accuracy. The best classification accuracy achieved is 99.48%. We conclude that using the pre-trained CNNs as feature extractors, as in our system, outperforms the end-to-end CNN classification tested in the first experiment in terms of efficiency.

Conclusion
This study's main objective is to establish an accurate and rapid AI-based diagnostic method that can categorize patients as COVID-19, normal, or pneumonia. We use multi-class classification to decide whether the respiratory infections are caused by coronavirus or other viruses (pneumonia). Consequently, the hospital's workload can be reduced significantly. We aimed to have an equal number of images in each class of the dataset, which improved our proposed model's robustness and effectiveness. We based our model on multiple CNNs, concatenating the deep features produced by each network after a feature selection step using a butterfly optimization algorithm. Then, we used the optimized ELM model (ELM-BOA) as a classifier due to its ability to learn and modify weights, which decreased the error between the actual and predicted output and achieved an accuracy of 99.48%. To prevent overfitting, 5-fold cross-validation is used. It is evident from the experimental results that the proposed methodology outperforms competitive techniques, as shown in Tab. 7. The drawback of this approach is that it cannot be applied when the patient is in a critical condition and is unable to attend X-ray scanning. In future work, we aim to develop our proposed model as a mobile application to increase reliability and availability.

Funding Statement:
The authors received no specific funding for this study.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.