A Classification–Detection Approach of COVID-19 Based on Chest X-ray and CT by Using Keras Pre-Trained Deep Learning Models

Abstract: The Coronavirus Disease 2019 (COVID-19) is wreaking havoc around the world, placing enormous pressure on national health systems and medical staff. One of the most effective and critical steps in the fight against COVID-19 is to examine the patient's lungs using chest X-ray and CT images generated by radiation imaging. In this paper, five Keras-related deep learning models, i.e., ResNet50, InceptionResNetV2, Xception, transfer-learning VGG16 and pre-trained VGG16, are applied to formulate a classification-detection approach for COVID-19. Two benchmark methods, SVM (Support Vector Machine) and CNN (Convolutional Neural Network), are provided for comparison with the classification-detection approaches based on the performance indicators, i.e., precision, recall, F1 score, confusion matrix, classification accuracy and three types of AUC (Area Under Curve). The highest classification accuracies derived by the classification-detection approaches based on 5857 chest X-rays and 767 chest CTs are 84% and 75%, respectively, which shows that the Keras-related deep learning approaches facilitate accurate and effective COVID-19-assisted detection.


Introduction
Using chest X-ray radiation imaging to examine the patient's lungs is one of the most effective and critical steps in the fight against COVID-19 [1][2][3][4][5][6][7]. Many deep learning-based artificial intelligence (AI) systems have been proposed, and the results show that chest X-ray and CT images benefit the detection of COVID-19-infected patients and can effectively improve detection accuracy. Bernheim et al. [8] analyzed chest CT imaging of COVID-19 infection in 121 symptomatic patients and found that about 65% of patients initially had a normal CT and later presented more imaging findings, such as consolidation, and bilateral and linear opacities. Fang et al. [9] investigated the sensitivity of chest CT and the viral nucleic acid assay, comparing against results derived by baseline methods [10]. The experiments indicated that the sensitivity of CT and RT-PCR for COVID-19 infection is 98% and 71%, respectively. Hellewell et al. [11] developed a stochastic transmission model to systematically quantify the effectiveness of contact tracing and case isolation, and provided an effective strategy for controlling SARS-CoV-2-like pathogens. Shi et al. [12] conducted a descriptive study of 81 patients with COVID-19 pneumonia using radiological findings, and showed that COVID-19 pneumonia can be effectively detected from chest CT imaging abnormalities even in asymptomatic patients. A convolutional neural network (CNN) is a kind of feed-forward neural network with a multi-level deep structure, whose neurons are not fully connected and which shares weights among neurons within the same layer. CNNs therefore have a significant advantage in processing high-dimensional data and automatically extracting features, and deep learning is increasingly widely used in medically assisted analysis; for example, CNNs have been applied to segment knee cartilage in MRI [13].
In general, a CNN requires a large number of samples for training and structural adjustment in order to form a model with strong feature-analysis capabilities. Reinforcement learning (RL) is an important learning method whose main purpose is to achieve goal optimization through learned strategies [14]. The significant advantage of RL is that it can receive learning signals and update model parameters without any training data prepared in advance, relying only on feedback on actions from the external environment. RL has been widely used in image analysis, financial trading systems and planetary-rover path planning [15][16][17]. Improving a CNN's generalization ability with unsupervised competitive learning or RL can effectively reduce the over-reliance on large samples and improve the model's classification accuracy.
Deep convolutional neural networks (DCNNs) such as VGG16 (VGG with 16 weight layers) and DenseNet121, combined with transfer learning, were applied in [18] to enhance the generalization ability of the proposed model and achieve better classification accuracy on a pediatric chest X-ray dataset. Experiments based on an occlusion test were proposed to detect the relevant image area and visualize the model's output. Three-class classification (normal, bacterial pneumonia and viral pneumonia) on publicly available COVID-19 chest X-ray medical images was implemented with a CNN using transfer learning [19]. The precision, recall and classification accuracy for the COVID-19 dataset were 89.6%, 93.0% and 98.2%, respectively, showing better performance than results reported in the literature. Zhang et al. [20] proposed an approach based on confidence-aware anomaly detection (CAAD) to implement binary classification. Experiments based on 5,977 non-COVID-19 viral pneumonia samples and 18,774 healthy control cases showed that viral pneumonia usually exhibits significantly different visual appearances on chest X-ray images. Baltruschat et al. [21] proposed a novel ResNet-50-based deep learning approach to classify the ChestX-ray14 dataset, in which transfer learning without fine-tuning was incorporated into the new X-ray network. The ROC statistics of the ResNet-50 with extended architecture indicate that the proposed deep learning model achieves the best overall results compared with other approaches. Bhandary et al. [22] developed a modified AlexNet (MAN) deep-learning architecture to evaluate human lung abnormality based on chest CT images.
The extracted features were effectively selected by PCA to construct the principal-features dataset, and experiments compared with state-of-the-art methods such as VGG16, VGG19 and ResNet50 showed that the proposed approach can achieve a classification accuracy of 97.25%. Moreover, GPUs benefit the rapid convergence of CNN models when a large amount of structured data requires floating-point operations. A deep CNN model can extract more detailed features than a simple one, but when the limited training samples cannot stimulate all the modes of the system, the generalization capability of the CNN model is difficult to guarantee. RL and pre-training methods are therefore utilized to train the deep CNN, i.e., the VGG16 model, by transferring the knowledge of a simple CNN to optimize the deep network architecture and achieve high-precision recognition and detection of pneumonia, which undoubtedly benefits the early diagnosis of COVID-19 pneumonia.
The rest of this paper is organized as follows. Section 2 introduces the experimental analysis and model performance evaluation, and provides fundamental analysis of the experimental results for the detection of COVID-19. Section 3 concludes the paper with discussions and directions for further research.

Proposed Classification-Detection Approach
In routine medical examinations, usually only the professional experience of medical imaging doctors is sufficient to accurately distinguish normal samples from infected samples in chest X-ray and CT datasets. Because differences in visual appearance are difficult to evaluate, performance indicators of image similarity, such as the statistical histogram, Mean Squared Error (MSE) and the Structural Similarity Index (SSIM) [23], are applied to two arbitrarily selected images from the dataset to estimate their similarity by checking whether their distributions are identical or nearly identical. The MSE and SSIM used to calculate the quantitative results are given by Eqs. (1) and (2), respectively.
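The two equations are absent from this excerpt; assuming the standard definitions for two M × N images P and Q (a reconstruction, where c1 and c2 are the usual small stabilizing constants and σ_xy is the covariance of the two images), they read:

MSE = (1 / MN) Σ_{i=1}^{M} Σ_{j=1}^{N} [P(i, j) − Q(i, j)]²    (1)

SSIM(x, y) = [(2 μ_x μ_y + c1)(2 σ_xy + c2)] / [(μ_x² + μ_y² + c1)(σ_x² + σ_y² + c2)]    (2)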
where P(i, j) and Q(i, j) are the pixel intensities at location (i, j) of two arbitrary images, and μ_x (or μ_y) and σ²_x (or σ²_y) are the means and variances of the pixel intensities of the two images, respectively. MSE is a perceived-difference measurement that can be used to evaluate two arbitrary images of the same size, but it is sensitive to outliers because large pixel-intensity differences are amplified; in other words, a small MSE does not guarantee that two images are truly similar. SSIM is a performance indicator that focuses more on the similarity of structural information: it reflects the structural properties of objects in the two images, and evaluates their similarity through differences in brightness, contrast and structure. The quantitative results and the visual comparison are shown in Fig. 1. Normal chest X-ray and CT images, with the corresponding histograms of RGB pixel intensity, are shown in Fig. 2.
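As an illustration, a minimal NumPy sketch of the two indicators (computing SSIM once over the whole image rather than with the sliding window of the usual implementation, which is a simplifying assumption):

```python
import numpy as np

def mse(p, q):
    """Mean Squared Error between two equal-sized images."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    return float(np.mean((p - q) ** 2))

def ssim_global(p, q, data_range=1.0):
    """Simplified SSIM over the whole image (one global window, no sliding)."""
    p = np.asarray(p, dtype=np.float64)
    q = np.asarray(q, dtype=np.float64)
    c1 = (0.01 * data_range) ** 2   # standard stabilizing constants
    c2 = (0.03 * data_range) ** 2
    mu_p, mu_q = p.mean(), q.mean()
    var_p, var_q = p.var(), q.var()
    cov = ((p - mu_p) * (q - mu_q)).mean()
    num = (2 * mu_p * mu_q + c1) * (2 * cov + c2)
    den = (mu_p ** 2 + mu_q ** 2 + c1) * (var_p + var_q + c2)
    return float(num / den)
```

For identical images MSE is 0 and SSIM is 1; structurally dissimilar images push SSIM toward (or below) 0 even when their histograms look alike.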

Figure 1: MSE and SSIM between the normal and pneumonia samples
The maxima of the RGB pixel-intensity histograms of the normal chest X-ray and CT are reached at pixel intensities of 0.6 and 0.2, respectively, and their approximate dispersions are distributed over [0.1, 0.9] and [0.2, 1.0], respectively. In terms of similarity, the MSE and SSIM of the chest X-ray are higher than those of the CT by 0.08 and 0.07, respectively. This shows that there are significant differences in the feature distributions of these two types of samples.
The processing diagram of this paper is shown in Fig. 3. The construction of the classification-detection model is divided into three parts. First, the chest X-ray and CT images are divided into training and testing samples, which are then used as the inputs for feature learning. Second, five Keras-related deep learning models, i.e., ResNet50, InceptionResNetV2, Xception, transfer-learning VGG16 and pre-trained VGG16, are provided to formulate the classification-detection approaches for COVID-19. Finally, the performance indicators, i.e., precision, recall, F1 score, confusion matrix and AUC, are used to evaluate the classification-detection approaches and to compare them with two benchmark methods, SVM and CNN, to demonstrate the performance of the proposed approaches.

Experiments Setup and Design
The data collected from the chest X-ray [24][25][26] and chest CT [27][28][29] datasets are classified into two groups: Normal and Pneumonia. This is essentially a typical binary classification problem: judging whether a person is infected or not. The training and testing datasets are divided as in Tab. 1 to evaluate the performance of the proposed approach. The abbreviations in Tab. 1 are explained as follows. Xtrain and Xtest: the training and testing samples of images of normal persons. ytrain and ytest: the training and testing samples of images of persons infected by pneumonia.
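A minimal sketch of assembling such a binary-labeled dataset and splitting it for training and testing (the array shapes and 75/25 split here are illustrative assumptions, and unlike Tab. 1, which uses Xtrain/ytrain for the normal and pneumonia image groups, this sketch uses the conventional images/labels pairing):

```python
import numpy as np

# Illustrative placeholder images: 8 normal and 8 pneumonia samples, 150x150 grayscale.
normal = np.zeros((8, 150, 150), dtype=np.float32)
pneumonia = np.ones((8, 150, 150), dtype=np.float32)

images = np.concatenate([normal, pneumonia])
labels = np.concatenate([np.zeros(len(normal)), np.ones(len(pneumonia))])

# Shuffle and hold out 25% for testing.
rng = np.random.default_rng(0)
idx = rng.permutation(len(images))
split = int(0.75 * len(images))
train_idx, test_idx = idx[:split], idx[split:]
Xtrain, Xtest = images[train_idx], images[test_idx]
ytrain, ytest = labels[train_idx], labels[test_idx]
```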

Architecture of the VGGNet
All programming experiments are based on Python 3.6 under Windows 10. The hardware used for the experiments is an i7-9750H CPU, 32 GB RAM and an NVIDIA P620 4 GB GPU. VGG16 was proposed by the Visual Geometry Group at Oxford. Compared with AlexNet, VGG utilizes successive 3 × 3 convolution kernels instead of the larger kernels in AlexNet (11 × 11, 7 × 7, 5 × 5). For a given receptive field (the local size of the input image associated with an output), stacking small convolutions is preferable to one large convolution, because the additional nonlinear layers increase the CNN's depth and allow it to learn more complex patterns at a lower cost and with fewer parameters. In addition, 25 normal and pneumonia chest X-ray (and CT) images and the corresponding visualization heatmaps are shown in Figs. 4 and 5, respectively. Because the distribution of the two classes (positive and negative samples, corresponding to pneumonia and normal cases) is uneven, and the training sample size is much larger than the test sample size, model performance may not be effectively evaluated or reflected if only classification accuracy is reported, especially under an unbalanced class distribution. In other words, it may be insufficient to evaluate model performance based on accuracy alone. To this end, additional performance indicators, i.e., precision, recall, F1 score and support, are provided to comprehensively evaluate the classification-detection approaches.
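The parameter saving from stacking 3 × 3 convolutions, noted above, can be checked with a quick calculation (biases ignored; C denotes the channel count, assumed equal for input and output):

```python
def conv_params(k, channels):
    """Weights of one k x k convolution with `channels` input and output channels (no bias)."""
    return k * k * channels * channels

C = 64
# Two stacked 3x3 convolutions cover a 5x5 receptive field.
stacked_5 = 2 * conv_params(3, C)   # 2 * 9 * C^2 = 18 C^2
single_5 = conv_params(5, C)        # 25 C^2
# Three stacked 3x3 convolutions cover a 7x7 receptive field.
stacked_7 = 3 * conv_params(3, C)   # 27 C^2
single_7 = conv_params(7, C)        # 49 C^2
```

The stacked variants use fewer weights for the same receptive field while inserting an extra nonlinearity after each 3 × 3 layer.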

Experimental Comparison and Analysis
In order to further verify classification-detection performance, SVM, a two-hidden-layer CNN (i.e., two convolution layers with max-pooling), ResNet50, InceptionResNetV2, Xception, transfer-learning VGG16 and pre-trained VGG16 are each used for the final classification detection based on the positive and negative samples. For the SVM with a Radial Basis Function kernel, the gamma parameter is 0.001 and the penalty parameter takes its default value. Likewise, given the excellent classification performance of deep learning, the Keras framework is used to build the CNN classification model: the positive and negative input samples are normalized to 150 × 150 × 1, two convolution layers with max-pooling are set, and the classification result is output through two fully connected layers. The stochastic gradient descent algorithm is applied to optimize the architecture of the CNN model and improve its feature-learning ability, and the learning rate of the five outlined models, i.e., ResNet50, InceptionResNetV2, Xception, TLVGG16 and PTVGG16, is 0.0001. Performance evaluation and comparison of SVM, CNN, ResNet50, InceptionResNetV2, Xception, TLVGG16 (transfer-learning VGG16) and PTVGG16 (pre-trained VGG16) are given in Tabs. 2 and 3, respectively. The computational resource consumption of the deep learning approaches is given in Tab. 4, where ATC denotes the average time cost in seconds per step in each learning epoch. The numbers of training (validation) samples for X-ray and CT are 2698 (468) and 600 (140), respectively. The experimental results show that the classification accuracy obtained by SVM for pneumonia is 0.77 (chest X-ray) and 0.63 (chest CT), respectively.
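A minimal Keras sketch of the benchmark CNN described above (the paper specifies only 150 × 150 × 1 inputs, two convolution/max-pooling stages, two fully connected layers and SGD; the filter counts, kernel sizes and hidden width here are assumptions):

```python
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(150, 150, 1)),           # normalized grayscale chest image
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(64, activation="relu"),         # first fully connected layer
    layers.Dense(1, activation="sigmoid"),       # binary output: normal vs. pneumonia
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=1e-4),
              loss="binary_crossentropy", metrics=["accuracy"])
```

Training would then call `model.fit` on the normalized training arrays with the corresponding validation split.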
More precisely, for the SVM, the classification accuracy on positive samples is significantly higher than on negative samples, by 0.20 (chest X-ray) and 0.17 (chest CT), respectively. This indicates that the SVM has a stronger ability to identify normal lungs but is still not qualified to identify patients with pneumonia. For chest X-ray images, a significant increase in training and testing sample size effectively improves classification accuracy, although it increases training time. The aforementioned methods facilitate the accurate identification and detection of pneumonia. The accuracy curves (and loss curves) of the five outlined Keras-related deep learning models are successively marked a-1 to e-1 (a-2 to e-2) in Fig. 6. The training and validation curves against the number of iteration steps are shown in panels a-e of Fig. 7, while the corresponding confusion matrices are provided in panels a-e of Fig. 8. In addition, the final model performance is reported in Tabs. 5-9. The AUC (Area Under Curve) is defined as the area enclosed by the axes under the ROC (Receiver Operating Characteristic) curve; its value lies in the interval (0.5, 1). More precisely, the closer the AUC is to 1.0, the higher the credibility of the detection method; in particular, a method is no better than random guessing if its AUC equals 0.5. To verify the performance of the methods utilized in this paper, AUC_SVM, AUC_NB and AUC_LOG for ResNet50 X-ray (CT), InceptionResNetV2 X-ray (CT), Xception X-ray (CT), TLVGG16 X-ray (CT) and PTVGG16 X-ray (CT) are listed in Tab. 10, where AUC_NB and AUC_LOG represent the ROC curves for naive Bayes and logistic regression classification, respectively.
SVM produces lower AUC values than naive Bayes and logistic regression, and the ROC curve for naive Bayes is generally equal to that of logistic regression, indicating better performance than the SVM classifier.
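The AUC discussed above can be computed directly from its probabilistic interpretation: the probability that a randomly chosen positive sample is scored higher than a randomly chosen negative one. A minimal sketch:

```python
import numpy as np

def auc_score(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Equivalent to the area under the ROC curve; ties count as half.
    O(n_pos * n_neg) pairwise comparison, fine for a sketch.
    """
    sp = np.asarray(scores_pos, dtype=np.float64)
    sn = np.asarray(scores_neg, dtype=np.float64)
    wins = (sp[:, None] > sn[None, :]).sum()
    ties = (sp[:, None] == sn[None, :]).sum()
    return (wins + 0.5 * ties) / (len(sp) * len(sn))
```

Perfect separation gives 1.0, reversed ranking gives 0.0, and a classifier that cannot separate the classes sits at 0.5, matching the "no better than random" reading of AUC = 0.5.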

Performance Analysis of the Proposed Approaches
Precision usually refers to the ratio of the number of correctly predicted positive samples to the total number of predicted positive samples, that is, Precision = TP/(TP + FP); a higher value means a lower false positive rate. The highest precision values for the normal and pneumonia classes are: 0.99 (TLVGG16, chest X-ray, Normal), 0.75 (InceptionResNetV2, chest X-ray, Pneumonia), 0.81 (SVM, chest CT, Normal) and 0.80 (PTVGG16, chest CT, Pneumonia), and almost all of the outlined five deep learning models score higher than 0.7 on both chest X-ray and chest CT. This shows that the performance of the five deep learning models in this paper is excellent.
Recall, also called sensitivity, is defined as Recall = TP/(TP + FN) and represents the ratio of correctly predicted positive samples to all actual positive samples, i.e., the fraction of positive samples that are correctly predicted; a model usually has good sensitivity if this value is higher than 0.5. The highest recall values for the normal and pneumonia classes are: 0.77 (Xception, chest X-ray, Normal), 0.99 (TLVGG16 and PTVGG16, chest X-ray, Pneumonia), 0.83 (PTVGG16, chest CT, Normal) and 0.84 (SVM, chest CT, Pneumonia), and almost all of the outlined five deep learning models score higher than 0.6 on both chest X-ray and chest CT. This shows that the overall performance of the models is good.
F1 Score = 2 × (Recall × Precision)/(Recall + Precision), i.e., the harmonic mean of precision and recall. When the data presents an uneven class distribution, the precision and recall indicators are sometimes inconsistent; in this case the F-measure (also known as F-score) is more helpful for weighing the costs of false positives and false negatives.
The highest F1 scores for the normal and pneumonia classes are: 0.82 (ResNet50, chest X-ray, Normal), 0.86 (ResNet50, chest X-ray, Pneumonia), 0.77 (ResNet50 and PTVGG16, chest CT, Normal) and 0.74 (PTVGG16, chest CT, Pneumonia), and almost all of the outlined five deep learning models score higher than 0.7 on both chest X-ray and chest CT. This shows that the performance of the models is generally superior.
The confusion matrix is mainly used to evaluate the performance of a classification model on a given test set. It includes four parts: True Positives (TP), False Positives (FP), True Negatives (TN) and False Negatives (FN), where TP and FP denote the numbers of correctly and incorrectly predicted positive samples, respectively, and TN and FN denote the numbers of correctly and incorrectly predicted negative samples, respectively. In other words, the fewer the false positives and false negatives, the better the model's classification performance. Based on this discussion, deep transfer learning approaches are suitable for the rapid and accurate detection of pneumonia.
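The confusion-matrix entries determine all of the indicators above; a minimal sketch for binary labels (the toy label vectors in the usage note are invented for illustration):

```python
def binary_metrics(y_true, y_pred):
    """Confusion-matrix counts and derived precision, recall and F1 for 0/1 labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"tp": tp, "fp": fp, "tn": tn, "fn": fn,
            "precision": precision, "recall": recall, "f1": f1}
```

For example, with y_true = [1, 1, 0, 0, 1] and y_pred = [1, 0, 0, 1, 1], the counts are TP = 2, FP = 1, TN = 1, FN = 1, giving precision, recall and F1 of 2/3 each.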

Conclusions
The main purpose of this paper is to formulate Keras-related deep learning approaches for the classification-detection of COVID-19. Model architecture design, structure selection and model summaries are illustrated in detail to explain the main processing diagram of the proposed approaches. The experimental results obtained by the five Keras-related deep learning approaches are compared with the benchmark methods according to the performance indicators. The experimental evaluation indicates that the deep learning approaches benefit the accurate auxiliary detection of COVID-19. In further work, clinical dataset analysis will be combined with the classification-detection approach to further improve performance.