SNELM: SqueezeNet-Guided ELM for COVID-19 Recognition

(Aim) COVID-19 had caused 6.26 million deaths and 522.06 million confirmed cases as of May 17, 2022. Chest computed tomography is a precise way to help clinicians diagnose COVID-19 patients. (Method) Two datasets are chosen for this study. Multiple-way data augmentation, including speckle noise, random translation, scaling, salt-and-pepper noise, vertical shear, Gamma correction, rotation, Gaussian noise, and horizontal shear, is harnessed to increase the size of the training set. Then, the SqueezeNet (SN) with complex bypass is used to generate SN features. Finally, the extreme learning machine (ELM) serves as the classifier due to its simplicity of use, quick learning speed, and good generalization performance. The number of hidden neurons in ELM is set to 2000. Ten runs of 10-fold cross-validation are implemented to generate unbiased results. (Result) On the 296-image dataset, our SNELM model attains a sensitivity of 96.35 ± 1.50%, a specificity of 96.08 ± 1.05%, a precision of 96.10 ± 1.00%, and an accuracy of 96.22 ± 0.94%. On the 640-image dataset, SNELM attains a sensitivity of 96.00 ± 1.25%, a specificity of 96.28 ± 1.16%, a precision of 96.28 ± 1.13%, and an accuracy of 96.14 ± 0.96%. (Conclusion) The proposed SNELM model is successful in diagnosing COVID-19, and its performance is higher than that of seven state-of-the-art COVID-19 recognition models.


Introduction
COVID-19 had caused 6.26 million deaths and 522.06 million confirmed cases as of May 17, 2022. The polymerase chain reaction (PCR) test can effectively detect its presence; however, clusters of false positives [1] perplex clinicians. Chest computed tomography (CCT) [2] is another precise way to help clinicians diagnose COVID-19 patients. Till July 2022, numerous models had been proposed for COVID-19 recognition. Nevertheless, these models still have room to improve in terms of their recognition performance, i.e., accuracy. Inspired by the model in Özyurt et al. [13], we propose the SqueezeNet-guided ELM (SNELM), which combines the traditional SqueezeNet (SN) with the extreme learning machine (ELM). Our SNELM differs from [13] in two ways. First, we do not use fuzzy C-means for super-resolution. Second, we choose the SN model with complex bypass, while [13] chooses the vanilla SN model. Our experiments show the effectiveness of the proposed SNELM model. In all, this study makes several novel contributions:

(a) Multiple-way data augmentation (MDA) is used to increase the size of the training set.
(b) We propose the novel SNELM model to diagnose COVID-19.
(c) The SNELM model achieves better results than seven state-of-the-art models.

Dataset and Preprocessing
Two datasets (D1 and D2) are used, since results over two datasets are less biased. Details of the two datasets can be found in [4,5]. Table 1 displays the descriptions of D1 and D2. Suppose $n_1$ stands for the number of subjects and $n_2$ for the number of CCT images. There are $n_2 = 296$ images in D1 and $n_2 = 640$ images in D2.
Suppose the $k$-th raw grayscale CCT image is $u_1(k)$. Its minimum and maximum gray values are

$$u_{\min} = \min_{x} \min_{y} u_1(x, y \mid k), \qquad u_{\max} = \max_{x} \max_{y} u_1(x, y \mid k), \tag{1}$$

and the histogram-stretched (HSed) image $u_2(k)$ is defined as

$$u_2(x, y \mid k) = \frac{u_1(x, y \mid k) - u_{\min}}{u_{\max} - u_{\min}} \times 255,$$

so that the grayscale range of $u_2(k)$ is stretched from $[u_{\min}, u_{\max}]$ to $[0, 255]$. Figs. 1b and 1c show the raw COVID-19 image and the preprocessed image, respectively. The downsampled dataset is symbolized as $U_4 = \{u_4(k)\}$, with each image of size $(a_1, a_2)$. The final grayscale image $u_4(k)$ is then stacked along the channel direction to output the color image $u(k)$:

$$u(k) = f_{\mathrm{cat}}^{\mathrm{channel}} \left[ u_4(k), u_4(k), u_4(k) \right],$$

where $f_{\mathrm{cat}}^{\mathrm{channel}}$ denotes the concatenation function along the channel direction. The size of $u(k)$ is now $a_1 \times a_2 \times 3$. Table 2 itemizes the abbreviations and their meanings.
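To make the preprocessing chain concrete, below is a minimal NumPy sketch of histogram stretching, downsampling, and channel stacking. The resampling filter is an assumption, any intermediate cropping step between $u_2(k)$ and $u_4(k)$ is omitted, and the function and variable names are ours, not the paper's.

```python
import numpy as np
from PIL import Image

def preprocess(u1: np.ndarray, a1: int = 227, a2: int = 227) -> np.ndarray:
    """Histogram stretching, downsampling, and channel stacking (sketch)."""
    # Histogram stretching: map [u_min, u_max] to [0, 255].
    u_min, u_max = float(u1.min()), float(u1.max())
    u2 = (u1 - u_min) / (u_max - u_min) * 255.0

    # Downsample to (a1, a2); bilinear resampling is an assumption here.
    u4 = np.asarray(Image.fromarray(u2.astype(np.uint8)).resize((a2, a1),
                                                                Image.BILINEAR))

    # Stack the grayscale image three times along the channel direction,
    # producing an a1 x a2 x 3 color image u(k).
    return np.stack([u4, u4, u4], axis=-1)
```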

Multiple-Way Data Augmentation
Fig. 2 illustrates the schematic of MDA. Assume the original image is $u(k)$; the horizontally mirrored image (HMI) $u_{\mathrm{HMI}}(k)$ is then defined as

$$u_{\mathrm{HMI}}(x, y \mid k) = u(a_1 - x, y \mid k),$$

where we do not take color channels into consideration. Then, all $b_1$ different data augmentation (DA) methods $g_i^{\mathrm{DA}}$, $i = 1, \dots, b_1$, are applied to both $u(k)$ and $u_{\mathrm{HMI}}(k)$. Suppose each DA method generates $b_2$ new images. Finally, the whole set of generated images $\Lambda(k)$ is defined as

$$\Lambda(k) = f_{\mathrm{con}}^{\mathrm{image}} \left[ u(k), g_1^{\mathrm{DA}}(u(k)), \dots, g_{b_1}^{\mathrm{DA}}(u(k)), u_{\mathrm{HMI}}(k), g_1^{\mathrm{DA}}(u_{\mathrm{HMI}}(k)), \dots, g_{b_1}^{\mathrm{DA}}(u_{\mathrm{HMI}}(k)) \right],$$

where $f_{\mathrm{con}}^{\mathrm{image}}$ is the concatenation function along the image direction. The augmentation factor of MDA (AFMDA) is accordingly

$$b_3 = 2 \times (b_1 \times b_2 + 1).$$

Compared to individual DA methods, MDA fuses the separate DA methods together and can thus yield better performance [14].
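The MDA procedure can be sketched as follows, assuming each DA method is a callable that returns $b_2$ perturbed copies of its input; the nine concrete methods are not shown except for a hypothetical Gaussian-noise example. The returned count matches the AFMDA formula above.

```python
import numpy as np

def gaussian_noise(img: np.ndarray, b2: int, sigma: float = 10.0) -> list:
    """One hypothetical DA method: b2 noisy copies of img (gray range 0-255)."""
    return [np.clip(img + np.random.normal(0.0, sigma, img.shape), 0, 255)
            for _ in range(b2)]

def mda(u: np.ndarray, da_methods: list, b2: int = 30) -> list:
    """Multiple-way data augmentation: apply every DA method to the image
    and its horizontally mirrored copy, keeping both originals as well."""
    u_hmi = u[::-1, :, ...]        # mirror: u_HMI(x, y) = u(a1 - x, y)
    generated = [u, u_hmi]
    for g in da_methods:           # b1 different DA methods
        for img in (u, u_hmi):
            generated.extend(g(img, b2))
    return generated               # 2 * (b1 * b2 + 1) images in total
```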

Fire Module and SqueezeNet with Complex Bypass
SqueezeNet (SN) is chosen since it can achieve a 50× reduction in model size compared to AlexNet and maintain the same accuracy [15]. This lightweight SN can help make our final COVID-19 recognition model fast and still have sufficient accuracy.
The fire module (FM) is the core component of the SN. It contains a squeeze layer (SL), which uses only 1 × 1 kernels, followed by an expand layer (EL), which contains a mixture of 1 × 1 and 3 × 3 kernels [16]. The structure of the FM is shown in Fig. 3. Three hyperparameters need to be tuned in an FM: $s_{1\times1}$, $e_{1\times1}$, and $e_{3\times3}$, which stand for the number of 1 × 1 kernels in the SL and the numbers of 1 × 1 and 3 × 3 kernels in the EL, respectively.
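A minimal PyTorch sketch of the FM described above (the framework choice and class names are ours; details that may vary between SN implementations are omitted):

```python
import torch
import torch.nn as nn

class Fire(nn.Module):
    """Fire module: a squeeze layer of 1x1 kernels followed by an expand
    layer mixing 1x1 and 3x3 kernels, as in SqueezeNet."""
    def __init__(self, c_in: int, s1x1: int, e1x1: int, e3x3: int):
        super().__init__()
        self.squeeze = nn.Conv2d(c_in, s1x1, kernel_size=1)
        self.expand1 = nn.Conv2d(s1x1, e1x1, kernel_size=1)
        self.expand3 = nn.Conv2d(s1x1, e3x3, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        x = self.relu(self.squeeze(x))
        # Expand outputs are concatenated along the channel direction.
        return torch.cat([self.relu(self.expand1(x)),
                          self.relu(self.expand3(x))], dim=1)
```

For instance, `Fire(96, 16, 64, 64)` squeezes 96 input channels down to 16 and expands back to 64 + 64 = 128 output channels.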
Compared to ordinary convolutional neural network (CNN) architectures, the SN [17] has three main advantages: (i) it replaces traditional 3 × 3 kernels with 1 × 1 kernels; (ii) it decreases the number of input channels to the 3 × 3 kernels using SLs; (iii) it downsamples late in the network, so the convolution layers have large activation maps [18].
There are different variants of the SN. Özyurt et al. [13] used the vanilla SN, while our SNELM uses the SN with complex bypass. Fig. 4 shows the flowchart, where we can observe that both simple and complex bypasses are added between some FMs. If the "same-number-of-channels" requirement is met, a simple bypass is added; otherwise, a complex bypass is added. These bypasses help improve the recognition performance, and their design is similar to that of ResNet.
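The bypass selection rule can be sketched as a small wrapper around the `Fire` module above; exactly which FMs receive bypasses follows Fig. 4 and is not encoded here.

```python
import torch.nn as nn

class FireWithBypass(nn.Module):
    """Fire module plus a bypass: identity when the input and output channel
    counts match (simple bypass), a 1x1 convolution otherwise (complex)."""
    def __init__(self, c_in: int, s1x1: int, e1x1: int, e3x3: int):
        super().__init__()
        self.fire = Fire(c_in, s1x1, e1x1, e3x3)
        c_out = e1x1 + e3x3
        if c_in == c_out:                 # "same-number-of-channels" met
            self.bypass = nn.Identity()   # simple bypass
        else:                             # requirement not met
            self.bypass = nn.Conv2d(c_in, c_out, kernel_size=1)  # complex bypass

    def forward(self, x):
        # Element-wise addition, as in ResNet-style shortcut connections.
        return self.fire(x) + self.bypass(x)
```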

SN-Guided ELM
The SN features after the global avgpool (see Fig. 4) are used as the learnt features and passed to the extreme learning machine (ELM) [19], which is a very fast classifier. Besides, ELM is simple to use, generalizes well, and is compatible with many nonlinear kernel functions and activation functions. Its structure is a single-hidden-layer feedforward network, shown in Fig. 5.
Let the $i$-th input sample be $x_i = (x_{i1}, \dots, x_{in})^T \in \mathbb{R}^n$, $i = 1, \dots, N$. The output of an ELM with $L$ hidden neurons is

$$O_i = \sum_{j=1}^{L} \lambda_j \, h\!\left(\alpha_j^T x_i + \beta_j\right), \quad i = 1, \dots, N,$$

where $h$ stands for the activation function, $\alpha_j = (\alpha_{j1}, \alpha_{j2}, \dots, \alpha_{jn})^T$ the input weight, $\beta_j$ the bias, $\lambda_j$ the output weight, and $O_i = (o_{i1}, o_{i2}, o_{i3}, \dots, o_{im})^T$ the output of the model for the $i$-th input sample.
Afterwards, the model is trained to match the targets $y_i$ with zero error, i.e., there exist $\alpha_j$, $\beta_j$, and $\lambda_j$ such that

$$\sum_{j=1}^{L} \lambda_j \, h\!\left(\alpha_j^T x_i + \beta_j\right) = y_i, \quad i = 1, \dots, N.$$

Let us rephrase the above equation in matrix form as $M \lambda = Y$, where

$$M = \begin{bmatrix} h(\alpha_1^T x_1 + \beta_1) & \cdots & h(\alpha_L^T x_1 + \beta_L) \\ \vdots & \ddots & \vdots \\ h(\alpha_1^T x_N + \beta_1) & \cdots & h(\alpha_L^T x_N + \beta_L) \end{bmatrix}_{N \times L}, \quad \lambda = \begin{bmatrix} \lambda_1^T \\ \vdots \\ \lambda_L^T \end{bmatrix}_{L \times m}, \quad Y = \begin{bmatrix} y_1^T \\ \vdots \\ y_N^T \end{bmatrix}_{N \times m}. \tag{10}$$

Acquiring the optimal $\alpha_j$, $\beta_j$, and $\lambda_j$ simultaneously is challenging. Instead, ELM fixes $\alpha_j$ and $\beta_j$ at random values and yields a solution quickly via the pseudoinverse:

$$\lambda = M^{\dagger} Y, \tag{12}$$

where $M^{\dagger}$ signifies the Moore-Penrose inverse [20] of $M$. The pseudocode is shown in Algorithm 1.

Algorithm 1: ELM
Step A: Initialize the input weights $\alpha_j$ and the biases $\beta_j$ randomly.
Step B: Compute the output matrix $M$ using Eq. (10).
Step C: Compute the output weight $\lambda$ using the pseudoinverse in Eq. (12).
Output: The trained ELM model.
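A minimal NumPy sketch of Algorithm 1, assuming $X$ is the $N \times n$ matrix of SN features, $Y$ the $N \times m$ one-hot target matrix, and the sigmoid activation from the hyperparameter setting; the function names are ours.

```python
import numpy as np

def train_elm(X: np.ndarray, Y: np.ndarray, L: int = 2000, seed: int = 0):
    """Steps A-C of Algorithm 1."""
    rng = np.random.default_rng(seed)
    n = X.shape[1]
    alpha = rng.standard_normal((L, n))   # Step A: random input weights
    beta = rng.standard_normal(L)         # Step A: random biases
    M = 1.0 / (1.0 + np.exp(-(X @ alpha.T + beta)))  # Step B: output matrix M
    lam = np.linalg.pinv(M) @ Y           # Step C: lambda = M^dagger Y
    return alpha, beta, lam

def predict_elm(X: np.ndarray, alpha, beta, lam) -> np.ndarray:
    """Forward pass of the trained ELM; predicted class = argmax over columns."""
    M = 1.0 / (1.0 + np.exp(-(X @ alpha.T + beta)))
    return M @ lam
```

Because only $\lambda$ is learned, and in closed form, training reduces to one matrix pseudoinversion, which is the source of ELM's speed.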

Cross-Validation and Evaluation
$T$ runs of $I$-fold cross-validation (CV) are carried out. Assume the test confusion matrix (TCM, symbolized as $\Theta$) over the $t$-th run and $i$-th fold is

$$\Theta(t, i) = \begin{bmatrix} \theta_{11} & \theta_{12} \\ \theta_{21} & \theta_{22} \end{bmatrix},$$

where $i = 1, \dots, I$ stands for the fold index and $t = 1, \dots, T$ for the run index. The entries $(\theta_{11}, \theta_{12}, \theta_{21}, \theta_{22})$ signify true positives, false negatives, false positives, and true negatives, respectively.
At the $i$-th trial, the $i$-th fold is employed as the test set, and the remaining folds $\{1, \dots, i-1, i+1, \dots, I\}$ altogether are employed as the training set, as shown in Fig. 6; hence, one $I$-fold CV consists of $I$ trials.

The previous process describes one run of $I$-fold CV. The experiment repeats the $I$-fold CV for $T$ runs. After all runs, the mean and standard deviation (MSD) of all seven indicators $\kappa_m$ ($m = 1, \dots, 7$) are gauged over the $T$ runs:

$$\mu(\kappa_m) = \frac{1}{T} \sum_{t=1}^{T} \kappa_m(t), \qquad \sigma(\kappa_m) = \sqrt{\frac{1}{T-1} \sum_{t=1}^{T} \left[\kappa_m(t) - \mu(\kappa_m)\right]^2},$$

where $\mu$ signifies the mean value and $\sigma$ the standard deviation. The values of MSD are recorded in the format $\mu \pm \sigma$.
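The protocol can be sketched as below, shown for accuracy only; in the paper, all seven indicators $\kappa_1, \dots, \kappa_7$ are derived from the run-level TCM in the same way. The fold splitter and function names are our assumptions.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

def cv_msd(X, y, fold_tcm, T: int = 10, I: int = 10):
    """T runs of I-fold CV; fold_tcm returns one fold's 2x2 TCM
    (theta11=TP, theta12=FN, theta21=FP, theta22=TN)."""
    per_run = []
    for t in range(T):
        skf = StratifiedKFold(n_splits=I, shuffle=True, random_state=t)
        tcm = np.zeros((2, 2))
        for tr, te in skf.split(X, y):      # each fold is the test set once
            tcm += fold_tcm(X[tr], y[tr], X[te], y[te])
        tp, fn, fp, tn = tcm.ravel()
        per_run.append((tp + tn) / tcm.sum())   # accuracy of run t
    # MSD over the T runs, reported as mu +/- sigma.
    return np.mean(per_run), np.std(per_run, ddof=1)
```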

Hyperparameter Setting
The hyperparameters are listed in Table 3. The minimum and maximum gray values of the HSed images are (0, 255). The size of the downsampled image is 227 × 227. In total, $b_1 = 9$ different DA methods are applied to both the raw image and its HMI, and every DA method produces $b_2 = 30$ images, so the AFMDA is $b_3 = 2 \times (9 \times 30 + 1) = 542$. The activation function in ELM is the sigmoid function, and the number of hidden neurons in ELM is set to $L = 2000$. Ten runs of 10-fold CV are carried out to report robust results.
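Collected into a single configuration, the stated hyperparameters look like this (the dictionary itself is illustrative, not from the paper):

```python
HYPERPARAMS = {
    "hs_gray_range": (0, 255),   # min/max gray values of HSed images
    "input_size": (227, 227),    # downsampled image size (a1, a2)
    "b1": 9,                     # number of DA methods
    "b2": 30,                    # new images per DA method
    "b3": 2 * (9 * 30 + 1),      # AFMDA = 542
    "elm_activation": "sigmoid",
    "elm_hidden_neurons": 2000,  # L
    "cv_runs": 10,               # T
    "cv_folds": 10,              # I
}
```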

Results of MDA
The MDA results on Fig. 1c are shown in Fig. 7, in which we can observe the nine DA outputs, i.e., speckle noise, random translation, scaling, salt-and-pepper noise, vertical shear, Gamma correction, rotation, Gaussian noise, and horizontal shear. Due to space limits, the nine DA outcomes on the HMI are not displayed. Fig. 7 indicates that MDA can increase the diversity of the training set.
Meanwhile, the AFMDA value $b_3 = 542$ makes the training burden of our model 542 times as much as that of the model without MDA. Nevertheless, in the test stage there is no need to apply MDA to the test images, so our model is as quick as the model without MDA. Table 4 displays the ten runs of 10-fold CV, where $t = 1, 2, \dots, 10$ denotes the run index.

Confusion Matrix and ROC Curve
After combining the ten runs altogether, we can draw the overall TCMs and ROC curves of the two datasets. The top row of Fig. 8 displays the TCMs of the two datasets, and the bottom row displays their corresponding ROC curves. The AUC values of D1 and D2 are 0.9767 and 0.9776, respectively.
Error bars (EBs) can assist in observing the differences in the models' performances. Fig. 9 displays the EBs of different models over the two datasets. It shows that the performance of the proposed SNELM model is higher than those of seven state-of-the-art models. The reason for the success of the SNELM model may lie in three points: (i) MDA significantly increases the size of the training set; (ii) the SN with complex bypass extracts efficient features; (iii) ELM serves as an effective classifier.

Conclusions
This study proposes an innovative SNELM model for COVID-19 detection. MDA is used to increase the size of the training set, the SN with complex bypass is employed to generate SN features, and ELM is used as the classifier. The proposed SNELM model produces better results than seven state-of-the-art models.
There are three deficiencies of the proposed SNELM model: (i) strict clinical validation has not been carried out; (ii) the SNELM model is a black box; (iii) other chest-related infectious diseases are not considered.
In future studies, our team shall first deploy the proposed SNELM model to an online cloud computing environment (such as Microsoft Azure or Amazon Web Services). Second, we intend to incorporate Grad-CAM into the model to make it explainable. Third, chest-related infectious diseases, such as tuberculosis and pneumonia, will be added to our task.

Table 4: Results of ten runs of 10-fold CV of the proposed SNELM model.
Table 5: Comparison of the proposed SNELM with state-of-the-art models (unit: %), reporting the indicators $\kappa_1$ through $\kappa_7$ per dataset and model.