COVID-19 is a pandemic that has affected nearly every country in the world. At present, sustainable development in the area of public health is considered vital to securing a promising and prosperous future for humans. However, widespread diseases, such as COVID-19, create numerous challenges to this goal, and some of those challenges are not yet defined. In this study, a Shallow Single-Layer Perceptron Neural Network (SSLPNN) and a Gaussian Process Regression (GPR) model were used for the classification and prediction of confirmed COVID-19 cases in five geographically distributed regions of Asia with diverse settings and environmental conditions: namely, China, South Korea, Japan, Saudi Arabia, and Pakistan. Significant environmental and non-environmental features were taken as the input dataset, and confirmed COVID-19 cases were taken as the output dataset. A correlation analysis was done to identify patterns in the cases related to fluctuations in the associated variables. The results of this study established that the population and air quality index of a region had a statistically significant influence on the cases. However, age and the human development index had a negative influence on the cases. The proposed SSLPNN-based classification model performed well when predicting the classes of confirmed cases. During training, the binary classification model was highly accurate, with a Root Mean Square Error (RMSE) of 0.91. Likewise, the results of the regression analysis using the GPR technique with Matern 5/2 were highly accurate (RMSE = 0.95239) when predicting the number of confirmed COVID-19 cases in an area. Dynamic management has occupied a core place in studies on the sustainable development of public health; however, it depends on proactive strategies based on statistically verified approaches, such as Artificial Intelligence (AI).
In this study, an SSLPNN model has been trained to fit public health associated data into an appropriate class, allowing GPR to predict the number of confirmed COVID-19 cases in an area based on the given values of selected parameters. Therefore, this tool can help authorities in different ecological settings effectively manage COVID-19.

In December 2019, an outbreak of pneumonia of unknown etiology was noticed in Wuhan City, China, which later spread across the globe. In January 2020, the cause of this pneumonia-like disease was confirmed to be a novel coronavirus known as SARS-CoV-2 [

In March 2020, the World Health Organization (WHO) classified COVID-19 as a pandemic that could threaten millions of people all over the world [

COVID-19 poses a significant challenge for governments. Though stakeholders have dedicated many resources to fight it, the epidemic has nevertheless caused a social and economic crisis in both developed and developing countries. During the present crisis, it is important to understand how to maintain sustainability practices with limited resources so that long-term public health outcomes can still be achieved. Sustainable development depends on the cooperation of stakeholders across social, ecological, cultural, and political domains. The current challenges of COVID-19 have caused mortality and morbidity on a massive scale, directly or indirectly influencing all these domains. After the emergency declaration from the WHO, all trade and travel were banned, which led to social unrest and devastating economic consequences. In the past, the Ebola, influenza, SARS, and H1N1 epidemics caused almost US $10 billion in losses. The current crisis is similar in nature to what occurred during the SARS epidemic and may have worse consequences; if the spread of the epidemic continues as it has, the worldwide losses are projected to exceed US $150 billion [

As the situation worsens, relevant tools based on artificial intelligence (AI) need to be studied; a machine learning process uses big data for pattern recognition, explanation, and prediction based on input data [

This study was designed to predict the number of COVID-19 cases based on environmental and non-environmental factors. We used two different approaches. First, we analyzed the correlations between the confirmed cases (from February 1, 2020 to April 20, 2020) and several environmental factors (temperature, humidity, wind speed, ultraviolet (UV) index, elevation, air quality index and pollution level) and non-environmental factors (population, population density, gender ratio, and human development index). Second, we built a binary classification model to predict and classify COVID-19 cases using an SSLPNN algorithm based on critical factors related to sustainable development in the area of public health. These factors were divided into two significant modules: the first was the non-environmental module, and the second was the environmental module. Both modules were used as the inputs, with the number of confirmed COVID-19 cases designated as the outputs. The study design is presented in

In the analysis, specific conditions were applied. These conditions included the following:

In addition to the number of COVID-19 cases, 14 different environmental and non-environmental variables were used, including temperature (minimum, maximum, and average), humidity, wind speed, air quality index, UV index, pollution level, elevation, population, population density, gender ratio, average age, and human development index levels.

To enhance the precision of the estimates and to reduce bias, different countries were considered due to their different topographical, monetary, and ecological situations.

The environmental data used in this study was based on the capital cities of the selected regions, as these regions generally had larger populaces.

The non-environmental data used in this study was also taken from the regions of the respective countries.

The analysis period was from February 1, 2020 to April 20, 2020.

Different countries of the Asian continent were selected for this study.

In this study, data was collected from the various official and independent websites of the selected countries, which were China, South Korea, Japan, Pakistan, and Saudi Arabia [

To see the relationships between the total confirmed COVID-19 cases in the 54 provinces of the five countries included in this study and the 14 environmental and non-environmental variables, a correlation analysis was performed.

Before building the model, it was necessary to evaluate the correlations between each independent dataset. For this purpose, a Spearman’s correlation analysis was done on the non-parametric dataset. In a non-parametric dataset, the population data usually does not have a normal distribution and is randomly distributed vertically and horizontally. For this test, the selected parameters (environmental and non-environmental) and the total number of confirmed COVID-19 cases were included. This test was conducted to reveal the associations between two different variables without considering the distribution of the data, which is highly recommended for a dataset with at least ordinal scale. The relationships among the non-parametric variables are represented by parallel plot in

s_{i} = the difference between the ranks of corresponding parameters

m = the number of values
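For reference, when no ranks are tied, Spearman's coefficient is commonly written with these symbols as $\rho = 1 - \frac{6\sum_{i=1}^{m} s_i^{2}}{m(m^{2}-1)}$, taking $s_i$ as the difference between the two ranks of the $i$-th observation. The following is a minimal sketch of that calculation, not the authors' implementation; the function names and the toy values are hypothetical. A tie-tolerant variant (the Pearson correlation of average ranks) is included because real case counts often contain ties.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..m; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the block while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks

def spearman_rho_no_ties(x, y):
    """Closed form 1 - 6*sum(s_i^2) / (m*(m^2 - 1)); valid only without ties."""
    rx, ry = average_ranks(x), average_ranks(y)
    m = len(x)
    sum_sq = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1.0 - 6.0 * sum_sq / (m * (m ** 2 - 1))

def spearman_rho(x, y):
    """Tie-tolerant form: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    m = len(x)
    mean_x, mean_y = sum(rx) / m, sum(ry) / m
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / sqrt(var_x * var_y)

# Hypothetical toy data: five regions, one predictor and a case count.
predictor = [10, 20, 30, 40, 50]
cases = [1, 3, 2, 5, 4]
rho = spearman_rho(predictor, cases)
```

Both functions agree when there are no ties; only the rank-based Pearson form remains valid once ties appear.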

AI embraces a wide variety of approaches and algorithms based on machine intelligence. It has numerous applications in innumerable areas of science, encompassing fuzzy logic theory, machine learning techniques, risk valuations and hazard detection, meta-heuristic algorithms and classification, and clustering techniques [

In this study, we used a neural network architecture with 2,352 inputs for each selected parameter, one output neuron with a linear output function, and a single-layer grid. Through forward propagation, the network calculated the dot product between the n^{th} sample x(n) and the weight vector w and then added the bias b. This calculation produced the weighted sum of the inputs with bias correction:

w = weight vector

b = bias

g = activation function
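The forward pass described above (a dot product of the sample with the weight vector, plus the bias, passed through the activation function) can be sketched as follows. This is an illustrative sketch under stated assumptions: the function names are hypothetical, the identity activation is used as the default because the text mentions a linear output function, and the sigmoid is shown only as a common alternative.

```python
import math

def identity(z):
    """Linear output function: returns the weighted sum unchanged."""
    return z

def sigmoid(z):
    """Logistic activation, a common alternative for binary outputs."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, w, b, g=identity):
    """Compute g(w . x + b) for one sample x."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b  # weighted sum with bias
    return g(z)

# Hypothetical two-feature sample and parameters.
output = forward([1.0, 2.0], [0.5, -0.25], 0.1)
```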

The mean square error function assesses the credibility of the algorithm on a distinct trial:

where y^{(n)} is 2 if the n^{th} trial fits category 2, 1 if the n^{th} trial fits category 1, and 0 if the n^{th} trial fits category 0. A cost function with L2 regularization of the weights is used to assess the global performance of the classifier. The regularization term is added to the cost function to penalize large weights and shrink the search space, driving uninformative weights toward zero and thus yielding simpler representations:

where:

For large values of

Of note, the accuracy is computed as if it were a classification part.
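A mean-square-error cost with an L2 penalty on the weights can be sketched as below. The exact scaling of the penalty used in the study is not recoverable from the text, so the form "MSE + lam * sum(w_i^2)" and the name `lam` for the regularization strength are assumptions made for illustration.

```python
def mean_square_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def regularized_cost(y_true, y_pred, w, lam):
    """MSE plus an L2 penalty; lam is an assumed regularization strength."""
    penalty = lam * sum(wi * wi for wi in w)
    return mean_square_error(y_true, y_pred) + penalty

# Hypothetical targets, predictions, and weights.
cost = regularized_cost([1, 0], [1, 1], w=[1.0, 2.0], lam=0.1)
```

A larger `lam` makes large weights more expensive, which pushes the optimizer toward smaller weights at the price of a looser fit to the training data.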

The cost function measures the error of the current predictions, so learning amounts to minimizing it. Because the training samples are fixed, the cost function depends only on the network parameters (the weights and bias). Minimizing the cost function is therefore equivalent to optimizing the network parameters. The whole process is governed by the following equations.

The objective function to be minimized is the cost function K_{n}(θ), where n denotes the n^{th} epoch, θ is a label for w and b, and g_{n} represents the gradient.

This gradient is then used to update two exponential moving averages: m_{n} for the gradient and v_{n} for the squared gradient.

The two hyper-parameters β_{1}, β_{2} ∈ [0, 1) control the exponential decay rates of these moving averages.

Finally, the network parameters are updated using the classical method of gradient descent represented by

The term

The SSLPNN algorithm can forecast the value
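The optimization steps described above (a gradient, exponential moving averages of the gradient and of the squared gradient with decay rates β_{1} and β_{2}, then a parameter update) match the widely used Adam update rule. The sketch below illustrates one such step; the bias-correction terms, the small constant `eps`, the default hyper-parameter values, and the quadratic toy objective are assumptions taken from the standard formulation, not details confirmed by the text.

```python
import math

def adam_step(theta, grad, m, v, n, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update at epoch n (n starts at 1). Returns new theta, m, v."""
    new_theta, new_m, new_v = [], [], []
    for t, g, mi, vi in zip(theta, grad, m, v):
        mi = beta1 * mi + (1.0 - beta1) * g        # moving average of the gradient
        vi = beta2 * vi + (1.0 - beta2) * g * g    # moving average of the squared gradient
        m_hat = mi / (1.0 - beta1 ** n)            # bias correction (assumed, standard Adam)
        v_hat = vi / (1.0 - beta2 ** n)
        new_theta.append(t - lr * m_hat / (math.sqrt(v_hat) + eps))
        new_m.append(mi)
        new_v.append(vi)
    return new_theta, new_m, new_v

# Hypothetical toy objective K(theta) = (theta - 3)^2 with gradient 2*(theta - 3).
theta, m, v = [0.0], [0.0], [0.0]
for n in range(1, 1001):
    grad = [2.0 * (theta[0] - 3.0)]
    theta, m, v = adam_step(theta, grad, m, v, n, lr=0.1)
```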

To improve the precision of the model and to reduce the learning errors, different models were created by trial and error to find the appropriate number of layers and neurons for each layer. The input variables were the previously mentioned 14 factors: namely, the population, population density, gender ratio, average age, human development index, elevation, temperatures (maximum, minimum, and average), relative humidity, wind speed, air quality index, pollution level, and UV index of each region. The number of confirmed COVID-19 cases was used as the output dataset. Two classes (labels) were assigned to the number of confirmed cases: counts of 800 or fewer were labeled “0,” and counts above 800 were labeled “1.” Cases from all five countries were included in the study. For modeling, 70% of the cases were used for training, 15% for validation, and 15% for testing.
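The labeling rule and the 70/15/15 partition can be sketched as follows. The function names, the random seed, and the rounding rule are assumptions; the rounding was chosen so that 2,352 samples split into 1,646, 353, and 353, the partition sizes reported with the classification results.

```python
import random

THRESHOLD = 800  # confirmed-case count separating class 0 from class 1

def label_cases(confirmed):
    """Return 1 when the count exceeds the threshold, otherwise 0."""
    return 1 if confirmed > THRESHOLD else 0

def split_indices(n_samples, train=0.70, val=0.15, seed=0):
    """Shuffle sample indices and split them into train/validation/test."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(n_samples * train)   # truncation is an assumed rounding rule
    n_val = round(n_samples * val)     # rounding is an assumed rounding rule
    train_idx = indices[:n_train]
    val_idx = indices[n_train:n_train + n_val]
    test_idx = indices[n_train + n_val:]
    return train_idx, val_idx, test_idx

train_idx, val_idx, test_idx = split_indices(2352)
```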

Determining the regulating parameters of an algorithm is important because it speeds up convergence. There were no explicit associations among most of the parameters in this study. Thus, these parameters were considered independent and identified with the assistance of recent studies, experts, and trial-and-error methods. It is also important to identify the relationships between parameters through regression analysis, which supports predictions with the least learning error, as measured by the Root Mean Square Error (RMSE). Therefore, the process of selection was dimensionless and influenced the sensitivity of the modeling error. The RMSE was used as the measure of accuracy for the regression learner model, which used the GPR algorithm with the Matern 5/2 preset.

When the dimensionality of the data is high, parameter identification typically becomes essential for the learning algorithms, as high-dimensional data tends to degrade the efficacy of most learning algorithms. Parameter identification is an effective dimensionality reduction procedure that chooses an ideal subset of the original parameters, delivering strong predictive power when modeling the data. These features can then be used to separate samples into different classes. In this study, the Principal Component Analysis (PCA) procedure was used to select the optimal parameters.

In regression analysis, a GPR algorithm with variable models can adapt to numerous types of pattern recognition data for prediction through classification. The excellent experimental results demonstrate that GPR models provide a very promising feature selection solution to numerous pattern recognition problems through PCA. The algorithm can acquire patterns from the global distribution, therefore improving the precision of its pattern recognition capabilities.

GPR models are non-parametric, kernel-based probabilistic models in which any finite collection of random variables has a multivariate normal distribution, so every linear combination of them is normally distributed. The Gaussian process is named after Carl Friedrich Gauss because it builds on the Gaussian distribution, and it can be viewed as an infinite-dimensional generalization of the multivariate normal distribution. In this study, a Gaussian process was used for statistical modeling, regression to multiple target values, and analyses of mapping in higher dimensions. In addition, a GPR model with the Matern 5/2 preset was used to plot the behavior of the algorithm; calculate the RMSE, R-squared value, MSE, Mean Absolute Error (MAE), prediction speed, and training time; and analyze the results of the GPR to see the similarities and differences in the data. The Matern 5/2 kernel does not suffer from concentration-of-measure problems in high-dimensional spaces. The mathematical model of the Matern 5/2 GPR is illustrated as follows:

where:
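For reference, the Matern 5/2 covariance function is commonly written as $k(x, x') = \sigma_f^{2}\left(1 + \frac{\sqrt{5}\,r}{\sigma_l} + \frac{5 r^{2}}{3\sigma_l^{2}}\right)\exp\left(-\frac{\sqrt{5}\,r}{\sigma_l}\right)$, with $r$ the Euclidean distance between $x$ and $x'$. The symbols $\sigma_f$ (signal standard deviation) and $\sigma_l$ (length scale) follow a common convention and are assumptions here, as are the function names and the toy inputs in this minimal sketch.

```python
import math

def matern52(x, y, sigma_f=1.0, sigma_l=1.0):
    """Matern 5/2 covariance between two points given as equal-length sequences."""
    r = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))  # Euclidean distance
    s = math.sqrt(5.0) * r / sigma_l
    # s*s/3 equals 5*r^2 / (3*sigma_l^2), the third term of the polynomial factor.
    return sigma_f ** 2 * (1.0 + s + s * s / 3.0) * math.exp(-s)

def covariance_matrix(points, sigma_f=1.0, sigma_l=1.0):
    """Pairwise Matern 5/2 covariances for a list of points."""
    return [[matern52(p, q, sigma_f, sigma_l) for q in points] for p in points]

# Hypothetical two-dimensional inputs.
K = covariance_matrix([[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]])
```

The covariance equals $\sigma_f^{2}$ at zero distance and decays smoothly as the points move apart, which is what lets the regression interpolate between nearby observations.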

The number of cases showed a significant correlation with the population and air quality index of a region. A statistically significant inverse relationship was observed between the number of cases and the average age and human development index levels. The results of the correlation analysis are presented in

Variable | Correlation coefficient | p-value |
---|---|---|
Population | 0.597* | <0.001 |
Population Density | −0.01 | 0.93 |
Gender Ratio | 0.23 | 0.079 |
Average Age | −0.26* | 0.04 |
Human Development Index (HDI) | −0.52* | <0.001 |
Elevation | −0.19 | 0.14 |
Maximum Temperature | 0.19 | 0.14 |
Minimum Temperature | 0.06 | 0.63 |
Average Temperature | 0.15 | 0.23 |
Humidity | 0.11 | 0.37 |
Wind Speed | 0.03 | 0.76 |
Air Quality Index | 0.37* | 0.004 |
Pollution Level | 0.23 | 0.07 |
UV Index | −0.04 | 0.74 |

*Correlation is significant at the 0.05 level (2-tailed).

Furthermore, all included variables were analyzed to assess their correlations. An independent association was observed between each of the parameters. The results of this analysis are presented in

Variables | Population | Gender ratio | Age | HDI | Elevation | Max temp | Min temp | Avg temp | Humidity | Wind speed | Air quality | Pollution level | UV index | COVID-19 cases |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Population | 1.00 | −0.19 | 0.14 | −0.48* | −0.27* | −0.25 | −0.37* | −0.30* | 0.39* | −0.16 | 0.35* | 0.04 | 0.20 | 0.59* |
Gender Ratio | −0.19 | 1.00 | −0.84* | −0.30* | 0.48* | 0.64* | 0.56* | 0.63* | −0.49* | 0.13 | 0.32* | 0.59* | −0.24 | 0.23 |
Age | 0.14 | −0.84* | 1.00 | 0.55* | −0.62* | −0.73* | −0.64* | −0.72* | 0.43* | −0.22 | −0.33* | −0.70* | 0.34* | −0.26* |
HDI | −0.48* | −0.30* | 0.55* | 1.00 | −0.36* | −0.26* | −0.14 | −0.24 | −0.17 | 0.01 | −0.43* | −0.45* | 0.00 | −0.52* |
Elevation | −0.27* | 0.48* | −0.62* | −0.36* | 1.00 | 0.42* | 0.39* | 0.42* | −0.41* | 0.11 | 0.01 | 0.34* | −0.20 | −0.19 |
Max Temp | −0.25 | 0.64* | −0.73* | −0.26* | 0.42* | 1.00 | 0.91* | 0.97* | −0.31* | 0.30* | 0.10 | 0.42* | −0.41* | 0.19 |
Min Temp | −0.37* | 0.56* | −0.64* | −0.14 | 0.39* | 0.91* | 1.00 | 0.97* | −0.31* | 0.36* | 0.00 | 0.35* | −0.36* | 0.06 |
Avg Temp | −0.30* | 0.63* | −0.72* | −0.24 | 0.42* | 0.97* | 0.97* | 1.00 | −0.31* | 0.31* | 0.06 | 0.40* | −0.41* | 0.15 |
Humidity | 0.39* | −0.49* | 0.43* | −0.17 | −0.41* | −0.31* | −0.31* | −0.31* | 1.00 | −0.27* | −0.19 | −0.44* | 0.03 | 0.11 |
Wind Speed | −0.16 | 0.13 | −0.22 | 0.01 | 0.11 | 0.30* | 0.36* | 0.31* | −0.27* | 1.00 | 0.10 | 0.24 | 0.15 | 0.03 |
Air Quality | 0.35* | 0.32* | −0.33* | −0.43* | 0.01 | 0.10 | 0.08 | 0.06 | −0.19 | 0.10 | 1.00 | 0.80* | 0.12 | 0.37* |
Pollution Level | 0.04 | 0.59* | −0.70* | −0.45* | 0.34* | 0.42* | 0.3* | 0.40* | −0.44* | 0.24 | 0.80* | 1.00 | −0.12 | 0.235 |
UV Index | 0.20 | −0.24 | 0.34* | 0.06 | −0.20 | −0.41* | −0.36* | −0.41* | 0.03 | 0.15 | 0.12 | −0.12 | 1.00 | −0.04 |
COVID-19 Cases | 0.59* | 0.23 | −0.26* | −0.52* | −0.19 | 0.19 | 0.06 | 0.15 | 0.11 | 0.039 | 0.37* | 0.23 | −0.04 | 1.00 |

*Correlation is significant at the 0.05 level (2-tailed).

Before applying the binary classification through the pattern recognition model using the SSLPNN algorithm, a correlation analysis was conducted for the 54 case studies in the five countries, which included China, South Korea, Japan, Saudi Arabia, and Pakistan. This analysis showed a reasonable correlation coefficient (R) among the non-parametric variables (^{−01}).

Neural Network Architecture | Number of Hidden Nodes | Epochs | Gradient | Validation Checks | Performance | Accuracy (%) | RMSE (%) | Cross-Entropy | Training (Classified/Misclassified) out of 1646 | Testing (Classified/Misclassified) out of 353 | Validation (Classified/Misclassified) out of 353 | Overall (Classified/Misclassified) out of 2352 |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Shallow Neural Network | 5 | 40 | 0.003 | 6 | 0.06 | 98.67 | 1.33 | 1.31 | 1624/22 | 347/6 | 353/0 | 2324/28 |
Shallow Neural Network | 10 | 32 | 0.009 | 6 | 0.05 | 98.66 | 1.34 | 1.23 | 1624/22 | 351/2 | 349/4 | 2324/28 |
Shallow Neural Network | 50 | 21 | 0.02 | 6 | 0.05 | 98.97 | 1.03 | 1.26 | 1629/17 | 347/6 | 348/5 | 2324/28 |
Shallow Neural Network | 100 | 33 | 0.03 | 6 | 0.06 | 99.09 | 0.91 | 1.33 | 1631/15 | 347/6 | 346/7 | 2324/28 |
Shallow Neural Network | 200 | 54 | 0.04 | 6 | 0.08 | 98.67 | 1.33 | 1.29 | 1624/22 | 351/2 | 349/4 | 2324/28 |
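The accuracy figures in the table can be related to the classified and misclassified counts with a short calculation. The sketch below is illustrative only (the function names are hypothetical). It suggests that the reported accuracy is the training accuracy and that the "RMSE (%)" column appears to equal 100 minus that accuracy, that is, the training misclassification rate.

```python
def accuracy_percent(classified, misclassified):
    """Share of correctly classified samples, in percent."""
    total = classified + misclassified
    return 100.0 * classified / total

def error_percent(classified, misclassified):
    """Share of misclassified samples, in percent."""
    return 100.0 - accuracy_percent(classified, misclassified)

# Training counts for the 100-hidden-node network, taken from the table.
train_accuracy = accuracy_percent(1631, 15)
train_error = error_percent(1631, 15)

# Overall counts (identical for every row of the table).
overall_accuracy = accuracy_percent(2324, 28)
```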

The results of the regression analysis using the GPR technique with Matern 5/2 were reasonably accurate, with an RMSE of 0.95239 in the prediction of confirmed COVID-19 cases. The PCA technique was used for the removal of noise and redundant parameters in order to reduce the dimensionality of the dataset. The information and results for the models are presented in

Serial No. | Model | Preset | RMSE | R-Squared | MSE | MAE | Prediction Speed (obs/s) | Training Time (sec) | Model Settings |
---|---|---|---|---|---|---|---|---|---|
1 | Linear Regression | Linear | 317.4 | 0.58 | 1.01e+05 | 243.96 | ~26000 | 3.1733 | Terms: Linear; Robust: Off; PCA: after training, 7 components were kept |
2 | Linear Regression | Interactions Linear | 231.2 | 0.78 | 53454 | 169.98 | ~24000 | 0.5699 | Terms: Interactions; Robust: Off |
3 | Linear Regression | Robust Linear | 342.71 | 0.51 | 1.18e+05 | 223.13 | ~34000 | 0.6687 | Terms: Linear; Robust: On |
4 | Stepwise Linear Regression | Stepwise Linear | 231.94 | 0.78 | 53794 | 172.57 | ~29000 | 21.744 | Terms: Linear; Upper bound on Terms: Interactions; Max. no. of Steps: 1000 |
5 | Tree | Medium Tree | 18.37 | 1.00 | 337.31 | 1.1309 | ~35000 | 0.385 | Min. Leaf Size: 4; Surrogate Decision Splits: Off |
6 | Tree | Fine Medium Tree | 18.05 | 1.00 | 325.91 | 1.0737 | ~37000 | 2.334 | Min. Leaf Size: 12; Surrogate Decision Splits: Off |
7 | Tree | Fine Coarse Tree | 131.97 | 0.93 | 17417 | 1.0737 | ~39000 | 0.383 | Min. Leaf Size: 36; Surrogate Decision Splits: Off |
8 | SVM | Linear SVM | 334.53 | 0.53 | 1.1191e+05 | 221.5 | ~35000 | 8.31 | Kernel Function: Linear; Kernel Scale: Automatic; Box Constraint: Automatic; Epsilon: Automatic; Standardize Data: True |
9 | SVM | Quadratic SVM | 219.4 | 0.80 | 48137 | 134.58 | ~33000 | 73.372 | Kernel Function: Quadratic; Kernel Scale: Automatic; Box Constraint: Automatic; Epsilon: Automatic; Standardize Data: True |
10 | SVM | Cubic | 39.41 | 0.99 | 1553 | 36.47 | ~35000 | 95.735 | Kernel Function: Cubic; Kernel Scale: Automatic; Box Constraint: Automatic; Epsilon: Automatic; Standardize Data: True |
11 | SVM | Fine Gaussian SVM | 44.15 | 0.99 | 1949.6 | 43.352 | ~31000 | 754625.73 | Kernel Function: Gaussian; Kernel Scale: 0.66; Box Constraint: Automatic; Epsilon: Automatic; Standardize Data: True |
12 | SVM | Medium Gaussian SVM | 77.54 | 0.97 | 6013 | 52.411 | ~36000 | 1.016 | Kernel Function: Gaussian; Kernel Scale: 2.6; Box Constraint: Automatic; Epsilon: Automatic; Standardize Data: True |
13 | SVM | Coarse Gaussian SVM | 279.48 | 0.67 | 78108 | 184.23 | ~32000 | 1.280 | Kernel Function: Gaussian; Kernel Scale: 11; Box Constraint: Automatic; Epsilon: Automatic; Standardize Data: True |
14 | Ensemble | Boosted Trees | 57.12 | 0.99 | 3262.3 | 42.551 | ~15000 | 4.051 | Min. Leaf Size: 8; Learning Rate: 0.1; No. of Leaves: 30 |
15 | Ensemble | Bagged Trees | 12.82 | 1.00 | 164.29 | 1.162 | ~19000 | 2.34 | Min. Leaf Size: 8; No. of Leaves: 30 |
16 | Gaussian Process Regression | Squared Exponential GPR | 0.995 | 1.00 | 0.98963 | 0.224 | ~14000 | 117.9 | Kernel Function: Squared Exponential; Kernel Scale: Automatic; Standardize Data: True; Basis Function: Constant; Use of Isotropic Kernel or Optimized Numeric Parameter: True; Kernel Sigma: Automatic |
17 | Gaussian Process Regression | Matern 5/2 GPR | 0.952 | 1.00 | 0.90705 | 0.208 | ~11000 | 134.04 | Kernel Function: 5/2 GPR; Kernel Scale: Automatic; Standardize Data: True; Basis Function: Constant; Use of Isotropic Kernel: True; Optimized Numeric Parameter: True; Kernel Sigma: Automatic |
18 | Gaussian Process Regression | Exponential GPR | 1.579 | 1.00 | 2.4923 | 0.172 | ~16000 | 120.04 | Kernel Function: Exponential; Standardize Data: True; Basis Function: Constant; Use of Isotropic Kernel: True; Optimized Numeric Parameter: True; Kernel Sigma: Automatic |
19 | Gaussian Process Regression | Rational Quadratic GPR | 0.9973 | 1.00 | 0.99457 | 0.206 | ~7800 | 171.41 | Kernel Function: Rational Quadratic; Kernel Scale: Automatic; Standardize Data: True; Basis Function: Constant; Use of Isotropic Kernel: True; Optimized Numeric Parameter: True; Kernel Sigma: Automatic |
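The error measures reported for each regression model (RMSE, R-squared, MSE, and MAE) follow standard definitions, sketched below. The function name and the toy observations are hypothetical and are not data from the study. The last line checks the relation MSE = RMSE², which the Matern 5/2 GPR figures satisfy (0.95239² ≈ 0.90705).

```python
import math

def regression_metrics(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R-squared for paired observations."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    mean_y = sum(y_true) / n
    ss_res = sum(e * e for e in errors)               # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)   # total sum of squares
    return {
        "mse": mse,
        "rmse": math.sqrt(mse),
        "mae": mae,
        "r2": 1.0 - ss_res / ss_tot,
    }

# Hypothetical observed and predicted case counts.
metrics = regression_metrics([100, 200, 300, 400], [110, 190, 310, 390])

# Consistency check on the reported Matern 5/2 GPR values.
matern_mse_from_rmse = 0.95239 ** 2
```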

Finally, the predicted number of COVID-19 cases was compared with the actual observed cases; the results were close. The observed number of cases was 1,271.00, and our model predicted 1,118.2, corresponding to an accuracy of 87.96%. The results are presented in

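Values | Population | Population Density | Gender Ratio | Average Age | HDI | Elevation | Max Temp | Min Temp | Avg Temp | Humidity | Wind Speed | Air Quality Index | Pollution Level | UV Index | COVID-19 Cases |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
Observed Value | 96,050,000.00 | 575.00 | 1.06 | 38.40 | 0.73 | 104.00 | 13.70 | −0.10 | 6.80 | 61.90 | 6.80 | 123.00 | 58.00 | 6.00 | 1,271.00 |
Predicted Value | 96,050,000.00 | 575.00 | 1.06 | 38.40 | 0.73 | 104.00 | 13.70 | −0.10 | 6.80 | 61.90 | 6.80 | 123.00 | 58.00 | 6.00 | 1,118.20 |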

This paper examined the relationship between COVID-19 cases and different environmental, ecological, and socio-economic factors and established a model system based on these variables to classify and predict rates of infection. COVID-19 has created a panic among the public. Scientific approaches must be identified and developed to predict the impact of these factors and to help policymakers take appropriate actions in the future.

Weather conditions, such as temperature, humidity, wind speed, and air quality, can affect the viability of viruses. Studies suggest that temperature and humidity have a strong influence on the transmission of COVID-19 [

The results of this study indicate that population size and human development index levels can also be associated with the number of COVID-19 cases in an area. Socio-economic factors like population size and low human development index levels are significant drivers of emerging infectious diseases and their subsequent effects on public health [

Despite significant advancements in medical science, infectious diseases are a leading cause of mortality. For a novel disease like COVID-19 that does not have any standard guidelines for treatment and vaccination, the short-term response from medical science will be limited. However, we can utilize mathematical tools to better understand and forecast the impacts of such diseases. In the last few years, AI has been widely adopted to better understand infectious diseases and to predict epidemics [

Studies have reported on the use of neural networks to predict the outbreaks of many diseases, such as foot and mouth disease, influenza, epidemic diarrhea, Ebola virus, Rift Valley fever virus, Nipah virus, and SARS [

In this paper, we used the SSLPNN algorithm, which performed excellently, predicting the classes of COVID-19 cases for both the training and testing datasets with an accuracy of 99.09% and 99.04%, respectively.

The results of the binary classification modeling using SSLPNN with Scaled Conjugate Gradient Backpropagation (SCGB) showed high accuracy, with an MSE of 0.0114858 in five selected countries. Moreover, the results of the regression analysis using the GPR technique with Matern 5/2 for 54 case studies in five countries also showed high accuracy in the prediction of COVID-19 confirmed cases, with an RMSE of 0.952. This study established some previously unexplored patterns in the relationships between COVID-19 infections and the environmental and non-environmental conditions of select countries. Based on this analysis, we propose that both SCGB and GPR may be applicable to classifying and predicting patterns of COVID-19 cases. The results show that AI techniques can provide reasonable estimates about upcoming events based on specific inputs by learning the hidden structures of a scenario [

Our findings are consistent with previous studies into the effects of climatic conditions on epidemic diseases and public health [