WACPN: A Neural Network for Pneumonia Diagnosis

Community-acquired pneumonia (CAP) is a type of pneumonia acquired outside hospitals and clinics. To diagnose CAP more efficiently, we propose a novel neural network model. We introduce the two-dimensional wavelet entropy (2d-WE) layer and an adaptive chaotic particle swarm optimization (ACP) algorithm to train a feed-forward neural network. ACP uses an adaptive inertia weight factor (AIWF) and the Rossler attractor (RA) to improve the performance of standard particle swarm optimization. The final combined model, named the WE-layer ACP-based network (WACPN), attains a sensitivity of 91.87±1.37%, a specificity of 90.70±1.19%, a precision of 91.01±1.12%, an accuracy of 91.29±1.09%, an F1 score of 91.43±1.09%, an MCC of 82.59±2.19%, and an FMI of 91.44±1.09%. The AUC of the WACPN model is 0.9577. We find that a maximum decomposition level of four obtains the best result. Experiments demonstrate the effectiveness of both AIWF and RA. Overall, the proposed WACPN is effective in diagnosing CAP and superior to six state-of-the-art models. Our model will be deployed to a cloud computing environment.


Introduction
Community-acquired pneumonia (CAP) is a type of pneumonia [1] acquired outside hospitals, clinics, and infirmaries [2]. CAP may affect people of any age, but it is more prevalent among the very young and the elderly, who may need hospital treatment if they develop CAP [3]. Chest computed tomography (CCT) is a crucial tool that helps radiologists and physicians diagnose CAP. Recently, automatic diagnosis models based on artificial intelligence (AI) have achieved promising performance and attracted researchers' attention. For example, Heckerling, et al. [4] employed a genetic algorithm to train neural networks to predict CAP; this approach is abbreviated as genetic algorithm for pneumonia (GAN). Afterward, Liu, et al. [5] proposed a computer-aided detection (CADe) model to detect lung nodules in CCT slices. Strehlitz, et al. [6] presented several prediction systems based on support vector machines (SVMs) together with Monte Carlo cross-validation. Dong, et al. [7] proposed an improved quantum neural network (IQNN) for pneumonia image recognition. Ishimaru, et al. [8] proposed a decision tree (DT) model to predict atypical pathogens of CAP. Zhou [9] introduced the cat swarm optimization (CSO) method to recognize CAP. Wang, et al. [10] proposed an advanced deep residual dense network for the image super-resolution problem. Wang, et al. [11] proposed a CFW-Net for X-ray-based COVID-19 detection.
However, the above methods still have room for improvement. Their recognition performances, for example, their accuracies, are no more than or barely above 91.0%. We analyzed these models and believe the main limitation lies in their training algorithms. After comparing recent global optimization algorithms, we find that particle swarm optimization (PSO) is one of the most successful, compared to other optimization algorithms such as the artificial bee colony [12] and the bat algorithm [13]. Hence, we use the framework in Zhou [9] but replace CSO with an improved PSO. Specifically, we introduce a two-dimensional wavelet-entropy (2d-WE) layer and an improved PSO method, adaptive chaotic PSO (ACP) [14], and combine them with a feed-forward neural network. The final combined model is named the WE-layer ACP-based network (WACPN). The experiments show the effectiveness of the proposed WACPN model. In summary, our contributions are threefold:

(a) The 2d-WE layer serves as the feature extractor.

(b) ACP is utilized to train the neural network to obtain a robust classifier.

(c) The proposed WACPN is shown to give better results than six state-of-the-art models.

Dataset and Preprocessing
The dataset is described in Zhou [9], where we have 305 CAP images and 298 healthy control (HC) images. The detailed demographic information can be found in Ref. [9]. Assume the raw CCT dataset is signified as $F_A$, within which each image is signified as $f_a$, and the number of images of both classes is $|F| = 603$, so that $F_A = \{f_a(i), i = 1, 2, \ldots, |F|\}$. The size of each image can be obtained as

$$(W_0, H_0) = h_{size}[f_a(i)], \quad i = 1, \ldots, |F|,$$

where $(W_0, H_0)$ connotes the width and height of the images in $F_A$ and $h_{size}(x)$ outputs the size of $x$. Here $W_0 = H_0 = 1024$. Fig. 1(a-b) depicts the schematic for preprocessing, which aims to grayscale the raw images, enhance their contrast, cut the margins and texts, and resize the images.
First, the raw images are converted to grayscale, yielding $F_B = \{f_b(i)\}$. Second, we use histogram stretching (HS) on all images in $F_B$ to enhance the contrast. Taking the $i$-th image $f_b(i)$ as an example, its image-wise minimum and maximum grayscale values $f_b^l(i)$ and $f_b^h(i)$ are calculated as

$$f_b^l(i) = \min_{p_w, p_h} f_b(i \mid p_w, p_h), \qquad f_b^h(i) = \max_{p_w, p_h} f_b(i \mid p_w, p_h),$$

where $(p_w, p_h)$ are temporary variables signifying the indexes of width and height of the image $f_b(i)$, respectively. The HSed image set $F_C = \{f_c(i), i = 1, \ldots, |F|\}$ can then be determined as

$$f_c(i \mid p_w, p_h) = \frac{f_b(i \mid p_w, p_h) - f_b^l(i)}{f_b^h(i) - f_b^l(i)}.$$

Fig. 2 shows two examples from the preprocessed image set. We use 10-fold cross-validation in our experiment.
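A minimal sketch of this preprocessing pipeline is given below, assuming OpenCV is used; the crop margin and the output size are illustrative placeholders rather than the paper's exact settings.

import cv2
import numpy as np

def preprocess(path, crop=64, out_size=256):
    # Grayscale, histogram-stretch, cut margins/texts, and resize one CCT slice.
    f_a = cv2.imread(path, cv2.IMREAD_GRAYSCALE).astype(np.float32)   # raw 1024x1024 image
    # Histogram stretching: map this image's [min, max] range onto [0, 1]
    lo, hi = f_a.min(), f_a.max()
    f_c = (f_a - lo) / (hi - lo + 1e-12)
    # Cut the margins (removes borders and embedded text near the edges)
    f_c = f_c[crop:-crop, crop:-crop]
    # Resize to the working resolution
    return cv2.resize(f_c, (out_size, out_size), interpolation=cv2.INTER_AREA)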

Discrete Wavelet Transform
Tab. 1 enumerates all abbreviations and their associated meanings. The advantage of the wavelet transform (WT) is that it holds both time/spatial and frequency information of the given signal/image. In practice, however, the discrete wavelet transform (DWT) is chosen to convert the raw signal $r(t)$ into the wavelet coefficient domain [15]. Suppose the signal $r(t)$ is one-dimensional. First, we define the continuous wavelet transform (CWT) $E_\gamma(s_a, s_t)$ of $r(t)$ as

$$E_\gamma(s_a, s_t) = \int_{-\infty}^{\infty} r(t) \times \gamma(t \mid s_a, s_t)\, dt, \quad (5)$$

in which $E$ stands for the wavelet coefficient and $\gamma$ the mother wavelet. $\gamma(t \mid s_a, s_t)$ is defined as

$$\gamma(t \mid s_a, s_t) = \frac{1}{\sqrt{s_a}}\, \gamma\!\left(\frac{t - s_t}{s_a}\right), \quad s_a > 0, \; s_t > 0, \quad (6)$$

where $s_a$ signifies the scale factor (SF) and $s_t$ the translation factor (TF).
Now, we derive the definition of DWT from CWT. Equation (5) is discretized by substituting $s_a$ and $s_t$ with two discrete variables (DVs) $c$ and $v$, where $c$ signifies the DV of the SF $s_a$, and $v$ the DV of the TF $s_t$ [16]. Moreover, the original signal $r(t)$ is discretized to $r(q)$, where $q$ signifies the DV of $t$. In this way, two subbands (SBs) can be calculated. The approximation SB $E_A(q \mid c, v)$ is determined as

$$E_A(q \mid c, v) = S_D\big[r(q) \ast f_A(q)\big],$$

where $f_A(q)$ signifies the low-pass filter, $\ast$ denotes convolution, and $S_D$ is the down-sampling operation. The detail SB $E_D(q \mid c, v)$ is determined as

$$E_D(q \mid c, v) = S_D\big[r(q) \ast f_D(q)\big],$$

where $f_D(q)$ signifies the high-pass filter.
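As a quick illustration of this filter-and-downsample operation, the snippet below uses the PyWavelets library to compute one level of 1d-DWT; the Haar wavelet and the random input are placeholders, since the paper does not state the mother wavelet here.

import numpy as np
import pywt  # PyWavelets

# One level of 1d-DWT: low-pass filtering plus downsampling yields the approximation SB E_A;
# high-pass filtering plus downsampling yields the detail SB E_D.
r = np.random.rand(1024)          # stand-in for the discretized signal r(q)
E_A, E_D = pywt.dwt(r, 'haar')    # the mother wavelet here is illustrative
print(E_A.shape, E_D.shape)       # each subband has half the samples: (512,) (512,)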

2d-WE Layer
Suppose we handle a two-dimensional (2d) image $Q$; the 2d-DWT [17] is worked out by applying row-wise and column-wise 1d-DWT in succession [15]. Initially, the 2d-DWT operates on the original image $Q$, generating four SBs $(Z_1, O_1, F_1, A_1)$, where the subscript denotes the decomposition level. Tab. 2 itemizes the descriptions of the four SBs. Note that MDL means the maximum decomposition level.
Assuming $h_{2d\text{-}DWT}$ signifies a 2d-DWT decomposition operation, we deduce

$$(A_1, Z_1, O_1, F_1) = h_{2d\text{-}DWT}(Q).$$

The subsequent decompositions run as

$$(A_m, Z_m, O_m, F_m) = h_{2d\text{-}DWT}(A_{m-1}), \quad m = 2, \ldots, M,$$

where $M$ is the MDL and $m$ the current decomposition level [18].
The subband $A_1$ is further decomposed into four SBs $(A_2, Z_2, O_2, F_2)$ at the 2nd level. The SB $A_2$ is later decomposed into $(A_3, Z_3, O_3, F_3)$, and then SB $A_3$ is decomposed accordingly. Fig. 3 portrays a diagram of a 5-level 2d-DWT, whose pseudocode is presented in Algorithm 1. This study chooses an $M$-level decomposition. The optimal value of $M$ is found via a trial-and-error approach [19] and reported in Section 4.1.
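The recursive decomposition of the approximation subband can be sketched with PyWavelets as follows; the wavelet name, the image size, and M = 4 are illustrative assumptions for this snippet.

import numpy as np
import pywt

M = 4                                    # maximum decomposition level (MDL)
A = np.random.rand(256, 256)             # stand-in for a preprocessed CCT slice Q
subbands = []
for m in range(1, M + 1):
    # One 2d-DWT step h_2d-DWT: A_{m-1} -> (A_m, Z_m, O_m, F_m)
    A, (Z, O, F) = pywt.dwt2(A, 'haar')  # the mother wavelet here is illustrative
    subbands += [Z, O, F]
subbands.append(A)                       # the final approximation subband A_M
print(len(subbands))                     # 3M + 1 = 13 subbands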

Pseudocode of 2d-DWT
Input: Image Q
Step 1: Decompose the image Q into four subbands $(A_1, Z_1, O_1, F_1)$.
for m = 2 : M
    Step 2: Decompose the approximation subband $A_{m-1}$ into four subbands $(A_m, Z_m, O_m, F_m)$.
end
Output: The concatenated 2d-WE feature vector $I$ with $N_I$ features. See Eq. (14).

The entropy of each SB is then computed. First, the grayscale values of each SB are converted into a probability mass function (PMF) $p(s)$:

$$p(s_h) = h_{Pr}(S = s_h), \quad h = 1, 2, \cdots, H, \quad (12)$$

where $h_{Pr}$ signifies the probability function and $H$ the number of grayscale levels. Second, the entropy of the PMF $p(s)$ is calculated as $f_e(s)$:

$$f_e(s) = -\sum_{h=1}^{H} p(s_h) \log_2 p(s_h), \quad (13)$$

where $f_e$ is the entropy function. Lastly, the entropy values of all the SBs are concatenated to form a feature vector $I$:

$$I = \big[f_e(A_M), f_e(Z_m), f_e(O_m), f_e(F_m)\big], \quad m = 1, \ldots, M, \quad (14)$$

where the number of features in $I$ is $N_I = 3M + 1$, which equals the number of SBs. Record $I(n) \leftarrow f_e(s)$.
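As a sketch of the complete 2d-WE layer, the code below combines the M-level decomposition with the per-subband entropy of Eqs. (12)-(14); the Haar wavelet and the H = 256 histogram bins are assumptions, not the paper's stated choices.

import numpy as np
import pywt

def shannon_entropy(sb, H=256):
    # Eqs. (12)-(13): quantize a subband into H bins, form the PMF p(s), return its entropy.
    hist, _ = np.histogram(sb.ravel(), bins=H)
    p = hist / hist.sum()
    p = p[p > 0]                              # drop empty bins so log2 is well defined
    return -np.sum(p * np.log2(p))

def wacpn_2d_we(Q, M=4, wavelet='haar'):
    # 2d-WE layer: M-level 2d-DWT followed by one entropy value per subband, Eq. (14).
    coeffs = pywt.wavedec2(Q, wavelet, level=M)
    subbands = [coeffs[0]] + [sb for level in coeffs[1:] for sb in level]
    return np.array([shannon_entropy(sb) for sb in subbands])   # feature vector I of length 3M + 1

I = wacpn_2d_we(np.random.rand(256, 256))
print(I.shape)                                # (13,) when M = 4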

ACP Network
The feature vector $I$ is fed into a feed-forward neural network with one hidden layer (HL), where $z_j(n)$, $j = 1, \ldots, N_L$, signifies the output of the $j$-th neuron in the HL. The description of $z_j(n)$ is

$$z_j(n) = \beta_1\!\left(\sum_{i=1}^{N_I} a_{i,j}\, x_i(n) + r_j\right), \quad (17)$$

where $A = \{a_{i,j}\}, i = 1, \ldots, N_I, j = 1, \ldots, N_L$, and $R = \{r_j\}, j = 1, \ldots, N_L$, are the WBs (weights and biases) of the neurons connecting the input layer with the HL, and $\beta_1$ is the AF (activation function) linked to the HL.
The parameter training is an optimization problem that guides us to search for the optimal WB parametric vector $\theta = (A, B, R, S)$. The length of $\theta$ is the number of parameters we need to optimize and is calculated as $N_\theta = N_I \times N_L + N_L \times N_o + N_L + N_o$. The training algorithm we choose is adaptive chaotic PSO (ACP) [14].
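For intuition, a minimal sketch of how the flat vector θ could map onto the network's weights and biases is shown below; the hidden-layer size and the sigmoid activation are illustrative assumptions, since the paper does not specify them here.

import numpy as np

N_I, N_L, N_o = 13, 10, 2                       # input, hidden, and output sizes (N_L is illustrative)
N_theta = N_I * N_L + N_L * N_o + N_L + N_o     # length of the WB vector theta

def unpack(theta):
    # Split the flat particle position theta into (A, B, R, S).
    a, b = N_I * N_L, N_L * N_o
    A = theta[:a].reshape(N_I, N_L)              # input-to-hidden weights
    B = theta[a:a + b].reshape(N_L, N_o)         # hidden-to-output weights
    R = theta[a + b:a + b + N_L]                 # hidden-layer biases
    S = theta[a + b + N_L:]                      # output-layer biases
    return A, B, R, S

def forward(theta, x, beta=lambda t: 1.0 / (1.0 + np.exp(-t))):
    # Eq. (17) for the hidden layer, then the output layer; the sigmoid AF is an assumption.
    A, B, R, S = unpack(theta)
    z = beta(x @ A + R)                          # hidden-layer outputs z_j(n)
    return beta(z @ B + S)                       # network outputs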
Recall that two attributes (position x and velocity v) are associated with each particle p in the standard PSO algorithm. Those two attributes are defined as the position of the particle (PoP) and the velocity of the particle (VoP). In each epoch, the fitness function E is recalculated for all particles {p} in the swarm. The VoP v is re-evaluated by keeping track of the two best positions (BPs).
The first is the BP that particle p has visited so far. It is dubbed pBest and symbolized as $x_{pB}$. The second is the BP that any neighbor of p has visited so far. It is the neighborhood best, named nBest and symbolized as $x_{nB}$.
If p takes the entire swarm as its neighborhood, the nBest becomes the global best and is therefore named gBest. In standard PSO, the VoP v of particle p is updated as

$$v \leftarrow \omega v + b_1 r_1 (x_{pB} - x) + b_2 r_2 (x_{nB} - x), \quad (18)$$

where $\omega$ signifies the inertia weight (IW) controlling the influence of the preceding velocity of the particle on its present one, $b_1$ and $b_2$ stand for two positive constants named acceleration coefficients, and $r_1$ and $r_2$ are two random numbers uniformly distributed in the range [0, 1]. $r_1$ and $r_2$ are re-drawn whenever they occur. The PoP x of the particle p is updated as

$$x \leftarrow x + v \Delta t, \quad (19)$$

where $\Delta t$ is the assumed time step and always equals 1 for simplicity.
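A compact sketch of Eqs. (18)-(19) for one particle is shown below; the acceleration coefficients b1 = b2 = 2 are common defaults rather than the paper's stated values.

import numpy as np

def pso_step(x, v, x_pb, x_nb, omega, b1=2.0, b2=2.0):
    # Standard PSO update, Eqs. (18)-(19), with the time step fixed at 1.
    r1, r2 = np.random.rand(*x.shape), np.random.rand(*x.shape)   # pseudo-random numbers in [0, 1]
    v = omega * v + b1 * r1 * (x_pb - x) + b2 * r2 * (x_nb - x)   # Eq. (18)
    x = x + v                                                     # Eq. (19)
    return x, v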
The ACP algorithm introduces an adaptive IW factor (AIWF) strategy, which replaces $\omega$ with $\omega_{AIWF}$:

$$\omega_{AIWF} = \omega_{\max} - \frac{\omega_{\max} - \omega_{\min}}{k_{\max}} \times k, \quad (20)$$

where $\omega_{\max}$ signifies the maximum IW, $\omega_{\min}$ the minimum IW, $k_{\max}$ the epoch at which the IW reaches its final minimum value, and $k$ the present epoch.
Another improvement in ACP concerns the two random numbers $(r_1, r_2)$. In practice, $(r_1, r_2)$ are created by pseudo-random number generators (RNGs), which cannot guarantee the ergodicity of the optimization in solution space since they are only pseudo-random. The Rossler attractor (RA) is a good choice for generating the random numbers $(r_1, r_2)$. The RA equations are defined as

$$\frac{dx}{dt} = -(y + z), \quad \frac{dy}{dt} = x + \delta_a y, \quad \frac{dz}{dt} = \delta_b + xz - \delta_c z, \quad (21)$$

where $\delta_a$, $\delta_b$, and $\delta_c$ are inherent parameters of the RA. We choose $\delta_a = 0.2$, $\delta_b = 0.4$, $\delta_c = 5.7$ via the trial-and-error method [20]. The corresponding curve is drawn in Fig. 5(a). We set $r_1 = x(t)$ and $r_2 = y(t)$ to embed the chaotic properties of the RA into the two parameters $(r_1, r_2)$ of standard PSO. The (x, y) plane of the RA is displayed in Fig. 5(b).
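To make the two ACP ingredients concrete, the sketch below pairs the AIWF schedule of Eq. (20) with a simple Euler integration of the RA to supply chaotic (r1, r2) pairs; the integration step, the initial state, the IW bounds, and the rescaling of x(t), y(t) onto [0, 1] are assumptions not specified in the paper.

import numpy as np

def aiwf(k, k_max, w_max=0.9, w_min=0.4):
    # Eq. (20): inertia weight decays linearly from w_max to w_min over k_max epochs.
    # The bounds w_max = 0.9 and w_min = 0.4 are common defaults, not the paper's values.
    return w_max - (w_max - w_min) / k_max * k

def rossler_stream(n, dt=0.01, da=0.2, db=0.4, dc=5.7):
    # Euler integration of the Rossler attractor, Eq. (21), using the paper's parameters.
    x, y, z = 0.1, 0.1, 0.1
    xs, ys = np.empty(n), np.empty(n)
    for i in range(n):
        dx, dy, dz = -(y + z), x + da * y, db + x * z - dc * z
        x, y, z = x + dt * dx, y + dt * dy, z + dt * dz
        xs[i], ys[i] = x, y
    rescale = lambda s: (s - s.min()) / (s.max() - s.min())   # map the chaotic series onto [0, 1]
    return rescale(xs), rescale(ys)                           # chaotic (r1, r2) sequences

r1_seq, r2_seq = rossler_stream(1000)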

Parameter Configuration
The parameters of this study are listed in Tab. 3.

Results of Proposed WACPN Model
Tab. 4 shows the results of ten runs of 10-fold CV using the parameters listed in Tab. 3.

Effects of AIWF and RA
If we remove the AIWF from our WACPN model, the results using the same configuration are shown in Tab. 5. Similarly, the results of removing RA from our WACPN model are shown in Tab. 6. Comparing the results in Tab. 4 against those in Tab. 5 and Tab. 6, we can deduce that both strategies, AIWF and RA, are beneficial to our WACPN model. Fig. 7 presents the ROC curves, together with their upper and lower bounds, of the proposed WACPN model and its two ablated variants (without AIWF and without RA). The AUC of the WACPN model is 0.9577. The AUCs of the models with AIWF or RA removed are only 0.9319 and 0.9456, respectively, demonstrating that both AIWF and RA help improve the standard PSO.
Error bars (EBs) are an excellent tool for visual evaluation. Fig. 8 presents the EBs of the model comparison, from which we can observe that the proposed WACPN model is superior to six state-of-the-art models. The reasons are threefold. First, the 2d-WE layer is a proficient way to characterize CCT images. Second, ACP is efficient in training the FNN. Third, we fine-tune and select the best parameters for the RA. In the future, our model may be applied to other fields [21,22].

Conclusions
A novel WACPN method is proposed for diagnosing CAP in CCT images. In WACPN, the 2d-WE layer works as the feature extractor, and the optimization algorithm ACP is used to train the neural network. The proposed WACPN model is verified to give better results than six state-of-the-art models. To address the three limitations, first, we shall use data augmentation to enlarge the number of images in the dataset. Second, our team shall deploy the proposed WACPN model to an online cloud computing environment (such as Azure) and invite specialists, clinicians, and physicians to examine its efficiency. Third, trustworthy or explainable AI techniques, which may provide heatmaps pointing out the lesions, are optional ways to add explainability to the proposed WACPN model.

Table 6: Ten-run results without RA