Automatic Detection of COVID-19 Using a Stacked Denoising Convolutional Autoencoder

Abstract: The exponential increase in new coronavirus disease 2019 (COVID-19) cases and deaths has made COVID-19 the leading cause of death in many countries. Thus, in this study, we propose an efficient technique for the automatic detection of COVID-19 and pneumonia based on X-ray images. A stacked denoising convolutional autoencoder (SDCA) model is proposed to classify X-ray images into three classes: normal, pneumonia, and COVID-19. The SDCA model is used to obtain a good representation of the input data and to extract the relevant features from noisy images. The proposed model’s architecture is mainly composed of eight autoencoders, whose output is fed to two dense layers and a SoftMax classifier. The proposed model was evaluated on 6356 images collected from different sources. The experiments were carried out with an 80/20 training/validation split and with fivefold cross-validation, respectively. The metrics used to evaluate the SDCA model were the classification accuracy, precision, sensitivity, and specificity for both schemes. Our results demonstrate that the proposed model classifies X-ray images with a high accuracy of 96.8%. Therefore, this model can help physicians accelerate COVID-19 diagnosis.


Introduction
The coronavirus disease 2019 (COVID-19) pandemic has severely disrupted various industries, sectors, and occupations. According to the World Health Organization (WHO), the number of new cases has been increasing exponentially worldwide. In February 2021, the daily average number of new confirmed cases was recorded to be more than 300,000, posing a significant challenge for [...] the images and their corresponding segments that have been manually labeled by the doctors. Thus, training is performed to segment the input images accurately [15].
Many DL algorithms with varying modalities, datasets, preprocessing algorithms, machine learning techniques, and performance criteria have been explored for COVID-19 diagnosis (Tab. 1). In one study, the authors proposed the COVID smart data-based network (COVID-SDNet) algorithm for diagnosing COVID-19 using the COVIDGR-1.0 dataset of CXR images. Different preprocessing techniques, including segmentation, data augmentation, and data transformation, were used to eliminate irrelevant information from the original images [16]. In another study, ResNet-50 was used to extract features from a dataset collected from different publicly available repositories. A deep learning-based extreme learning machine (ELM) classifier was employed to distinguish patients with COVID-19 from uninfected patients [17]. Meanwhile, Martini et al. [18] compared the interpretation of conventional radiography (CXR) with that of machine learning-enhanced CXR (mlCXR) for COVID-19 diagnosis. They found that the sensitivity for COVID-19/pneumonia diagnosis improved with mlCXR image interpretation. While one study applied a convolutional neural network model for COVID-19 diagnosis based on CXR images collected from a Kaggle dataset [19], another study demonstrated that deep learning models failed to classify CXR images taken with smartphones [20]. Therefore, it is essential to check the sources of images before inputting them into deep learning models.
DL algorithms have been applied to COVID-19 diagnosis with varying degrees of success. For example, one study proposed the COVID-ResNet model on the COVIDx dataset and achieved a detection accuracy of 96.23% across all classes [21]. Another team diagnosed COVID-19 based on the AlexNet model. They collected images from various sources and achieved a detection accuracy of 98% on two main classes [22]. Meanwhile, another team applied the VGG-19 and DenseNet201 models to identify the health status of patients with respect to COVID-19 and achieved an accuracy of 90% for the binary classification of X-ray images [23]. Moreover, a team developed a new deep learning model for anomaly detection. They collected chest X-ray images from GitHub and other images from the ChestX-ray14 dataset. With binary classification, their model detected 96% of the COVID-19 cases [24]. Lastly, a group of researchers developed a new deep CNN called COVID-Net to detect COVID-19 from the CXR images of the COVIDx dataset. In addition, their model could predict COVID-19 cases together with the critical factors behind each prediction [25].

The Proposed Model
In this work, the proposed multilayer SDCA model is used to label each image as COVID-19, pneumonia, or normal. We propose a customized loss function to improve the reconstruction of the original images. The SDCA, which is a stochastic extension of the classic autoencoder (AE) [28], is used to reduce the dimensions and learn latent features; it can also be used as a generative model to generate fake samples. The SDCA tries to learn the identity function, where the output is reproduced from the input. Generally, the AE is built from two parts: an encoder, which projects the input data onto a low-dimensional space, and a decoder, which reconstructs the data from that space. Let $x \in \mathbb{R}^{m \times n}$ be a distorted image and $y \in \mathbb{R}^{m \times n}$ be the corresponding normal image without any noise. The image distortion can be defined as [29]:

$$x = \delta(y)$$

where $\delta : \mathbb{R}^{m \times n} \to \mathbb{R}^{m \times n}$ depicts a random distortion of the normal image. The encoder tries to compress the image into a smaller representation, also known as a latent space or bottleneck layer. The encoder is defined as:

$$h_i = f(W x_i + b)$$

where $x_i$ is the noisy input image, $h_i$ is its bottleneck representation, $W$ is the weight matrix of the CNN, $b$ is the bias of the CNN, and $f$ is the activation function. The parameters $(W, b)$ of the encoder are used to reconstruct the original data via the encoder's inverse function.
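As an illustration of the distortion δ described above, the following minimal Python sketch corrupts a clean image with additive Gaussian noise. The function name `add_gaussian_noise`, the zero mean, and the clipping to [0, 1] are assumptions made for this example; only the use of Gaussian noise and the noise level of 0.30 come from the experiments reported in this paper.

```python
import random

def add_gaussian_noise(image, sigma=0.30, mean=0.0, seed=None):
    """Return a noisy copy x = delta(y) of a clean image y with values in [0, 1].

    `image` is a 2D list of floats. The zero mean and the clipping step are
    assumptions of this sketch, not details confirmed by the paper.
    """
    rng = random.Random(seed)
    noisy = []
    for row in image:
        # Add an independent Gaussian sample to each pixel, then clip to [0, 1].
        noisy.append([min(1.0, max(0.0, p + rng.gauss(mean, sigma))) for p in row])
    return noisy

clean = [[0.5] * 4 for _ in range(4)]            # toy 4 x 4 "image"
noisy = add_gaussian_noise(clean, sigma=0.30, seed=0)
```

The autoencoder is then trained with `noisy` as input and `clean` as the reconstruction target.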
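Based on this bottleneck, the AE tries to rebuild the original image using the decoder function:

$$\hat{y}(x_i) = g(W' h_i + b')$$

where $h_i$ is the bottleneck representation of the noisy input $x_i$, $W'$ is the decoder weight matrix, $b'$ is the decoder bias, $g$ is the decoder activation function, and $\hat{y}(x_i)$ is the approximation of the original image $y_i$. All parameters are optimized over the input data $x = (x_1, x_2, \ldots, x_n)$ and $y = (y_1, y_2, \ldots, y_n)$.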
Training the autoencoder amounts to computing, and then minimizing, the distance between the original data and its decompressed (reconstructed) representation. Different metrics, such as the cross-entropy or the mean square error, can be used to compute the loss between the input and output of the SDCA. The loss function applied to approximate the input data is defined as the sum of the loss of each layer, as defined in Eq. (4):

$$L_{\mathrm{SDCA}} = \sum_{k=1}^{N} L_k \tag{4}$$

where $N$ is the number of layers in the SDCA and $L_k$ is the loss of layer $k$.
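The layer-wise sum described above can be sketched in a few lines of Python. This is a toy illustration that uses the mean square error as the per-layer loss; the helper names `mse` and `total_loss` and the sample values are hypothetical.

```python
def mse(output, target):
    """Mean squared error between two equally sized flat lists of values."""
    return sum((o - t) ** 2 for o, t in zip(output, target)) / len(target)

def total_loss(layer_outputs, layer_targets):
    """Sum the per-layer reconstruction losses, one term per autoencoder layer."""
    return sum(mse(o, t) for o, t in zip(layer_outputs, layer_targets))

# Two toy "layers": the first reconstructs imperfectly, the second perfectly.
outputs = [[0.0, 1.0], [0.5, 0.5]]
targets = [[0.0, 0.0], [0.5, 0.5]]
loss = total_loss(outputs, targets)
```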
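To prevent overfitting, the sparsity-regularized reconstruction loss function, $L_1$, is defined as follows:

$$L_1 = \frac{1}{n} \sum_{i=1}^{n} \lVert \hat{y}(x_i) - y_i \rVert^2 + \beta \, \mathrm{KL}(\rho \,\|\, \hat{\rho}) + \frac{\lambda}{2} \lVert W \rVert^2$$

where $\beta$ represents the sparsity term and $\lambda$ is the weight decay term coefficient.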
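The relative entropy, or Kullback-Leibler (KL) divergence [30], measures how much the target activation, $\rho$, and the average activation of the hidden layer, $\hat{\rho}$, differ from each other:

$$\mathrm{KL}(\rho \,\|\, \hat{\rho}) = \rho \log \frac{\rho}{\hat{\rho}} + (1 - \rho) \log \frac{1 - \rho}{1 - \hat{\rho}}$$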
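To make the sparsity term concrete, the sketch below evaluates the KL divergence between a target activation and an observed average activation, treating both as Bernoulli parameters. The function names and the values ρ = 0.05 and β = 3.0 are hypothetical and chosen only for illustration.

```python
import math

def kl_bernoulli(rho, rho_hat):
    """KL divergence between Bernoulli(rho) and Bernoulli(rho_hat), both in (0, 1)."""
    return (rho * math.log(rho / rho_hat)
            + (1 - rho) * math.log((1 - rho) / (1 - rho_hat)))

def sparsity_penalty(rho, rho_hat, beta):
    """Sparsity term of the loss: beta times the KL divergence."""
    return beta * kl_bernoulli(rho, rho_hat)

# The penalty vanishes when the average activation matches the target
# and grows as the two move apart.
matched = sparsity_penalty(0.05, 0.05, beta=3.0)
mismatched = sparsity_penalty(0.05, 0.50, beta=3.0)
```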
Following Erhan et al. [31], we remove the sparsity regularization from the expression of the loss when the pre-trained weights are used to regulate the network. Consequently, the entire loss function $L_{\mathrm{SDCA}}$ is minimized using the gradient descent optimization algorithm [32].
In this section, the hyperparameter tuning of the proposed model is introduced. The proposed model consists of three main parts. Technically, the CNN [33] is considered one of the best and most robust algorithms for extracting the relevant features from the input data at different levels without any human supervision. Owing to its architectural structure and layers, the CNN is reliable for image processing tasks. The CNN structure consists of a combination of convolutional layers, non-linear processing units, and subsampling layers. In particular, this stage is focused on obtaining deep features with discriminative representation capability. The convolutional matrix C can be computed based on a filter matrix B and image matrix A as follows:

$$C(i, j) = \sum_{p=1}^{m} \sum_{q=1}^{n} A(i + p - 1,\; j + q - 1)\, B(p, q)$$

where matrix A represents the input matrix, B refers to a 2D filter matrix with size (m, n), C denotes the feature map, and p and q index the rows and columns of the filter. Therefore, each element of C is computed from the sum of the elementwise multiplication of B and the corresponding patch of A.
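The feature-map computation described above can be sketched in plain Python. The example assumes a stride of 1, no padding, and no kernel flipping (the usual CNN convention); the function name `conv2d_valid` and the toy matrices are hypothetical.

```python
def conv2d_valid(A, B):
    """Slide filter B over image A and return the feature map C.

    Each C[i][j] is the sum of the elementwise product of B and the
    patch of A whose top-left corner is at (i, j).
    """
    m, n = len(B), len(B[0])
    out_h = len(A) - m + 1
    out_w = len(A[0]) - n + 1
    C = [[0.0] * out_w for _ in range(out_h)]
    for i in range(out_h):
        for j in range(out_w):
            C[i][j] = sum(A[i + p][j + q] * B[p][q]
                          for p in range(m) for q in range(n))
    return C

A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]
B = [[1, 0],
     [0, -1]]
C = conv2d_valid(A, B)   # 2 x 2 feature map
```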
The proposed model is designed as a stack of layers. The architecture of the encoding stage mainly includes a set of kernels, batch normalization, 2D Max-pooling, and upsampling operations. In addition, the size of the filters varies between the layers. The SDCA is composed of 16 convolutional layers, 14 batch normalization layers, 2 Max-pooling layers, and 1 fully connected stage composed of 2 dense layers and 1 SoftMax layer. The encoder architecture comprises two embedded instances of Bloc 1, followed by two embedded instances of Bloc 2 (Figs. 2 and 3, respectively). Bloc 1 is designed as a stack of convolution, batch normalization, and Max-pooling, whereas Bloc 2 is designed as Bloc 1 without the Max-pooling layer. Indeed, different architectures have been proposed during the expansion of convolutional neural networks. In many cases, adding more layers yields better data compression [34]. The operations used are described in detail in the following subsections (Fig. 4).
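To give a feel for how the two Max-pooling layers shrink the feature maps, the sketch below applies a 2 × 2 max-pooling twice to a 32 × 32 map. The pool size, the stride of 2, and the assumption that the convolutions preserve the spatial size are not stated in the paper; they are chosen here only for illustration, and `max_pool_2x2` is a hypothetical helper.

```python
def max_pool_2x2(fmap):
    """2 x 2 max-pooling with stride 2 on a 2D list with even height and width."""
    return [
        [max(fmap[i][j], fmap[i][j + 1], fmap[i + 1][j], fmap[i + 1][j + 1])
         for j in range(0, len(fmap[0]), 2)]
        for i in range(0, len(fmap), 2)
    ]

# Toy 32 x 32 feature map whose values increase row by row.
fmap = [[r * 32 + c for c in range(32)] for r in range(32)]
once = max_pool_2x2(fmap)    # 16 x 16 after the first pooling layer
twice = max_pool_2x2(once)   # 8 x 8 after the second pooling layer
```

Under these assumptions, the bottleneck of the encoder would operate on 8 × 8 maps.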

Image Preprocessing
Rifai et al. [35] demonstrated that adding some noise to the input images led to a significant improvement in generalization. In addition, this technique can be considered a kind of augmentation of the dataset. In this experiment, all the images in the database were resized to 32 × 32 × 1 to obtain a consistent dimension for all the input images. Another essential preprocessing stage is intensity normalization, which rescales the intensity values of all images from [0, 255] to the range [0, 1].
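The two preprocessing steps above can be sketched as follows. The nearest-neighbour interpolation is an assumption, since the resizing method is not specified in the paper, and the helper names `resize_nearest` and `normalize` are hypothetical.

```python
def resize_nearest(image, out_h=32, out_w=32):
    """Resize a 2D list of pixels with nearest-neighbour sampling (an assumption)."""
    in_h, in_w = len(image), len(image[0])
    return [
        [image[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
        for r in range(out_h)
    ]

def normalize(image):
    """Rescale 8-bit intensities from [0, 255] to [0, 1]."""
    return [[p / 255.0 for p in row] for row in image]

# Toy 64 x 64 grayscale image with 8-bit intensities.
raw = [[(r + c) % 256 for c in range(64)] for r in range(64)]
prepared = normalize(resize_nearest(raw))
```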

Figure 4: The proposed architecture of SDCA

Data Augmentation
Imbalanced data can have a critical impact on the training process and the detection capability of a deep learning network; therefore, it is considered a limitation for classification. Several techniques have been identified for solving the problem of imbalanced data, including random under-sampling, random oversampling, the synthetic minority oversampling technique (SMOTE) [36], and the adaptive synthetic sampling method (ADASYN) [37]. Alternatively, an augmentation procedure can be used to add new training examples, derived from the existing training data, to the classes with fewer samples. Usually, this method is applied only to the training dataset, and not to the validation and test datasets. It is based on several transforms, such as shifts, flips, zooms, and rotations, that can be applied to an image. In this study, the following operations have been applied to the input images: random rotation of ±10%, zoom range of ±10%, horizontal flipping of ±10%, and, finally, the vertical flipping shift of 10%.
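As a rough illustration of such transforms, the sketch below generates augmented variants of an image with a random horizontal flip and a random horizontal shift of at most 10% of the width. It covers only a subset of the operations listed above, and the helper names, the flip probability of 0.5, and the zero padding are assumptions of this example rather than the settings used in the experiments.

```python
import random

def horizontal_flip(image):
    """Mirror each row of a 2D list of pixel values."""
    return [row[::-1] for row in image]

def shift_right(image, pixels, fill=0.0):
    """Shift the image right by `pixels` columns, padding the left edge with `fill`."""
    return [[fill] * pixels + row[:len(row) - pixels] for row in image]

def augment(image, rng, max_fraction=0.10):
    """Apply a random flip, then a random shift of at most `max_fraction` of the width."""
    out = horizontal_flip(image) if rng.random() < 0.5 else [row[:] for row in image]
    max_shift = int(len(image[0]) * max_fraction)
    return shift_right(out, rng.randint(0, max_shift))

rng = random.Random(0)
image = [[float(c) for c in range(32)] for _ in range(32)]
variants = [augment(image, rng) for _ in range(3)]   # three new training samples
```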

Evaluation Metrics
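In this study, several metrics, namely accuracy, precision, recall, and F1-score, are used to evaluate the proposed model. The metrics are calculated, respectively, as:

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}$$

$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

$$\mathrm{F1} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$

where TP, FN, FP, and TN represent the number of true positives, false negatives, false positives, and true negatives, respectively.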
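For a three-class problem, these metrics are typically computed per class from the confusion matrix in a one-vs-rest fashion. The sketch below shows one way to do this; the confusion matrix is a made-up example and not a result from this study.

```python
def per_class_metrics(cm, k):
    """Precision, recall, and F1 for class k.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    tp = cm[k][k]
    fn = sum(cm[k]) - tp                     # class-k samples predicted as another class
    fp = sum(row[k] for row in cm) - tp      # other samples predicted as class k
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def accuracy(cm):
    """Fraction of all samples that lie on the diagonal of the confusion matrix."""
    return sum(cm[i][i] for i in range(len(cm))) / sum(map(sum, cm))

# Hypothetical confusion matrix for three classes.
cm = [[8, 2, 0],
      [1, 9, 0],
      [0, 0, 10]]
overall = accuracy(cm)
first_class = per_class_metrics(cm, 0)
```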

Datasets
The SDCA was applied to a dataset containing 6356 images to verify the effectiveness of the proposed model. The dataset was collected from diverse sources of images of normal, pneumonia, and COVID-19 cases. First, 5856 X-ray images were collected from the Kaggle repository [38], including 4273 pneumonia and 1583 normal images. Then, 125 COVID-19 images were collected from Ozturk et al. [39] and augmented to obtain 500 images. The number of X-ray images in each class is given in Tab. 2.
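The class counts above can be checked, and the degree of imbalance quantified, with a few lines of Python. The dictionary keys and variable names are hypothetical; the counts are those given in the text.

```python
# Number of images per class, as reported in the text.
counts = {"pneumonia": 4273, "normal": 1583, "covid19": 500}

total = sum(counts.values())
# Share of each class in the full dataset.
shares = {name: n / total for name, n in counts.items()}
# Ratio between the largest and the smallest class.
imbalance_ratio = max(counts.values()) / min(counts.values())
```

Even after augmenting the COVID-19 class to 500 images, the pneumonia class remains several times larger than the smallest class.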

Experimental Results
In this study, Python was used for the experiments. A Windows-based computer system with an Intel(R) Core(TM) i7-7700HQ 2.8 GHz processor and 16 GB RAM was used. The proposed architecture was implemented using the Keras package with TensorFlow on an Nvidia GeForce GTX 1050 Ti GPU with 4 GB RAM. The SDCA was evaluated using an 80% training and 20% test split, as well as a fivefold cross-validation method. The accuracy, precision, recall, F1-score, and confusion matrix were computed for each experiment. For all the experiments, the average error was taken over 30 runs for all the methods used. All the parameters were chosen based on values used in the literature.
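A fivefold cross-validation split of the 6356 images can be sketched as follows. The sketch shuffles the indices and is not stratified by class; whether the original experiments used stratification is not stated, and the function name `k_fold_indices` and the seed are hypothetical.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists for k-fold cross-validation.

    Indices are shuffled once and dealt into k folds; each fold serves
    as the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test

splits = list(k_fold_indices(6356, k=5))
```

With five folds, each iteration trains on roughly 80% of the images and tests on the remaining 20%.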
In the first experiment, 1589 images of the dataset were used for the test stage. As mentioned above, the appropriate dataset of noisy X-ray images was developed by applying Gaussian noise. The initial value of the Gaussian is selected as μ, σ = 0.30 to regulate the level of noise in the images. To obtain better performance, the proposed SDCA model was trained with 16 layers in the encoder-decoder, followed by three connected layers that were used to increase the prediction performance. In the experiments, the image size was fixed to 32 × 32 with a batch size of 64. The proposed deep learning-based SDCA model achieved the reconstruction of the original X-ray images within 50 epochs. Fig. 5 presents the training and validation loss during the reconstruction of the original input. Fig. 6 shows the feature map of the convolutional layer.
Figs. 7 and 8 depict the noisy images used as input to the SDCA model and the restored images after training, respectively. Fig. 9 shows the original image without noise. The SDCA model succeeded in removing the noise with a loss of 0.003, but some contrast improvements are needed for better visibility.
Next, the performance metrics of the proposed model under the two splitting schemes were analyzed. Tab. 3 reports the results for the 80%/20% data split, together with the number of samples in each test class, and Fig. 10 visualizes the precision, recall, and F1-score values for the three classes. For the COVID-19 cases, the proposed SDCA deep learning model achieved 100% precision, 99% recall, and 100% F1-score. For the pneumonia cases, the model achieved 97% for all the metrics. For the normal cases, the model achieved slightly lower scores than for the other cases, i.e., 93% precision, 92% recall, and 93% F1-score. The average accuracy achieved by the proposed model was 96.8%. The confusion matrix of the test results (Fig. 11) demonstrates that the proposed model accurately detected COVID-19 and pneumonia images. Among the 1589 images of the test dataset, only 62 images were misclassified. Among the 125 COVID-19 images, only 1 image was misclassified. Lastly, Fig. 12 displays the validation accuracy and loss.
The performance metrics of the proposed SDCA obtained with fivefold cross-validation are shown in Tab. 4.

Discussion
This work was evaluated on a public dataset of X-ray images to detect diseases. Based on deep learning, the proposed model succeeded in differentiating the CXR radiographs of COVID-19, pneumonia, and healthy patients with high accuracy, sensitivity, and specificity. The SDCA model extracted the relevant features from noisy images with the help of the augmentation approach. The model was validated on 6356 X-ray images using a training/test split and fivefold cross-validation, respectively. Although the data were imbalanced, the proposed model still achieved strong results. The experimental results on CXR images demonstrated that the features extracted by the stacked denoising autoencoder architecture and classified by the feed-forward neural network achieved an accuracy of 96.8%. Therefore, physicians can use this framework to accelerate the diagnosis of COVID-19 and to improve decisions regarding X-ray images misclassified by radiologists.
Clinical diagnosis based on computed tomography (CT) images faces many challenges: the number of people with COVID-19 pneumonia is huge, whereas there is a shortage of highly experienced radiologists who can continuously distinguish between CT images of pneumonia and those of the emerging COVID-19 virus.
The large number of CT scans is also one of the factors affecting the quality of CT images during the data storage and transmission process.
As mentioned, the diagnosis of COVID-19 is a challenging task. In this paper, the SDCA has been applied to X-ray images of two types of pulmonary diseases, including COVID-19. The results for COVID-19 were found to be very similar to those for pneumonia. For this reason, it remains advisable to have an experienced radiologist distinguish X-ray images of pneumonia from those of COVID-19 pneumonia.
Despite the high performance of the proposed model for COVID-19 pneumonia diagnosis, certain limitations remain, and these issues should be considered in future work.
First, the experiments in this work used only chest X-ray images for the diagnosis of COVID-19 cases; however, clinical data and laboratory tests are considered pillars of a correct diagnosis. Second, the strategy used to acquire an X-ray image also influences its quality.