COVID-19, a disease that has spread fear and anxiety worldwide, is one of the most recent emergent respiratory disorders. The virus that causes it is similar to MERS-CoV and SARS-CoV, which affected large populations in several countries in 2012 and 2002, respectively. Various standard models have been used for COVID-19 epidemic prediction, but they suffered from low accuracy owing to limited data availability and a high level of uncertainty. The proposed approach uses the machine learning-based time-series Facebook NeuralProphet model to predict the number of deaths as well as confirmed cases, and compares it with the Poisson Distribution and Random Forest models. The dataset covers the period from 1st January 2020 to 16th July 2021, and the model has been developed to obtain forecast values until September 2021. This study aims to predict the course of the second wave of COVID-19 in India using the latest time-series model to observe and forecast the pandemic situation across the country. In India, cases have been increasing rapidly day by day since mid-February 2021. The proposed model shows a good ability to forecast death cases on the COVID-19 dataset, particularly in the second wave, and it works effectively for future validation of the predictions.

The coronavirus is also known as the 2019 novel coronavirus, or simply the coronavirus. It was first identified in Wuhan, China, in December 2019. The respiratory illness it causes is referred to as COVID-19 and is caused by a new strain of coronavirus called SARS-CoV-2 [

The prime objective of this paper is to predict and forecast COVID-19 cases in India, specifically daily new cases, daily death cases, cumulative new cases, and cumulative death cases, using the latest Facebook NeuralProphet model. The model is based on time-series forecasting, which suits the problem because COVID-19 prediction has a time component that is difficult to handle with a simple machine learning model. The model relies on hyperparameter tuning, which further enhances the accuracy of the results. The proposed scheme extracts and predicts the trend of COVID-19 cases, and its forecasts are compared with those of the Random Forest and Poisson Distribution models. In this comparison, the proposed model outperformed both in terms of the accuracy and reliability of the forecasted and predicted results.

The remainder of the paper is organized into four sections. Section 2 describes the related works and Section 3 illustrates the methodology implemented in the paper. The experimental analysis of the proposed work has been explained in Section 4 and finally concluding remarks are discussed in Section 5.

For respirational measurement, Massaroni et al. [

Furthermore, numerous flexible predictive mixture models [

Authors, Ref. | Month and year of publication | Model adopted | Accuracy | Pros | Cons |

Aljameel et al. [ | April 2021 | Logistic regression, random forest and extreme gradient boosting | Very high | Identifies risk and assists the decision-making process | Validation issues across multiple datasets |

Kafieh et al. [ | February 2021 | Random forest, multilayer perceptron, long short-term memory with regular, extended, multivariate features | High | Predicts healthcare equipment requirements | Data quality issues in real-time modeling |

Ardabili et al. [ | October 2020 | Multi-layered perceptron, adaptive network-based fuzzy inference system | High | Overcomes the drawbacks of SIR and SEIR models | Lack of generalization and abstraction for prediction |

Khayyat et al. [ | January 2021 | Prophet model | Moderate | Forecasts the death cases efficiently | Low accuracy in prediction for recovered cases |

Mangayarkarasi et al. [ | January 2021 | Seasonal autoregressive integrated moving average and Prophet | High | Forecasting AQI and PM 2.5 values helps regulatory bodies in decision-making | Slow modeling process |

Gupta et al. [ | January 2021 | Polynomial regression, decision tree regression, and random forest regression | Moderate | Predicts resource requirements to provide better facilitation | Overfitting issues |

Zoabi et al. [ | December 2020 | Cloud-based smart detection algorithm using support vector machine | Moderate | Predicts whether COVID-19 tests are required when resources are scarce | Low accuracy for new COVID-19 mutants |

Aldhyani et al. [ | November 2020 | Long short-term memory and Holt-trend model | High | Advanced time-series model used | Lack of validated dataset |

Predicting the second wave involves a variety of phases and tools. The initial and critical step is to collect authenticated data so that the predictions are as accurate as possible. Once the data is available, the next step is to scrutinize it according to the needs of the study, since the entire dataset cannot always be used; it must be filtered and cleaned to prepare it for the subsequent forecasting process [

Random Forest is a machine learning algorithm used to solve regression and classification problems. The model has two prerequisites for performing well. First, its features must carry some actual signal, so that the model predicts better than random guessing. Second, the predictions of the individual decision trees should have very low correlation with one another. When the results of multiple such predictors are combined, they yield better predictions than the best individual predictor. A group formed by aggregating more than one predictor is referred to as an ensemble, and the learning process based on it is known as ensemble learning.
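The benefit of combining weakly correlated predictors can be illustrated with a toy simulation. The sketch below is not a Random Forest implementation: it assumes hypothetical predictors whose errors are independent Gaussian noise (the function name, noise level, and sample sizes are illustrative assumptions) and compares the error of their average with that of the best single predictor.

```python
import random
import statistics


def simulate_ensemble(n_predictors=25, n_samples=2000, noise_sd=1.0, seed=0):
    """Return (best single-predictor MSE, MSE of the averaged ensemble)."""
    rng = random.Random(seed)
    truth = [rng.uniform(0.0, 10.0) for _ in range(n_samples)]

    # Each hypothetical predictor sees the truth plus its own independent noise,
    # so the errors of different predictors are uncorrelated by construction.
    predictions = [
        [y + rng.gauss(0.0, noise_sd) for y in truth]
        for _ in range(n_predictors)
    ]

    def mse(pred):
        return statistics.fmean((p - y) ** 2 for p, y in zip(pred, truth))

    best_single = min(mse(pred) for pred in predictions)
    # Ensemble prediction: average the predictors sample by sample.
    averaged = [statistics.fmean(column) for column in zip(*predictions)]
    return best_single, mse(averaged)


best_single_mse, ensemble_mse = simulate_ensemble()
print(f"best single predictor MSE: {best_single_mse:.3f}")
print(f"averaged ensemble MSE:     {ensemble_mse:.3f}")
```

With uncorrelated errors the averaged error shrinks roughly in proportion to the number of predictors; strongly correlated trees would lose most of that gain, which is why low correlation is listed as a prerequisite.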

The Poisson Distribution may be defined as a discrete distribution that gives the probability of a specific event occurring a given number of times in a specified time duration. It takes only integer values, not fractional or decimal values, so its values do not lie in a continuous range. This model is generally used to analyze independent events that occur at a constant rate within a defined time period. It rests on the following assumptions.

A variable k indicates the count of occurrences of an event and can take the values 0, 1, 2, and so on.

The event occurrences are entirely independent of each other; that is, the occurrence of an event x does not make the occurrence of another event y any more or less likely.

The average rate of event occurrence is independent of any specific occurrence. For simplicity, its value is generally kept constant, although in real-life practice it could vary with time.

Two events cannot take place at exactly the same instant; rather, in every sufficiently small time interval, exactly one event either occurs or does not occur.
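Under these assumptions, the probability of observing k events in an interval with average rate λ takes the standard form P(K = k) = λ^k e^(−λ) / k!. The following is a minimal sketch of that formula; the symbol λ (written `lam`) and the function name are notation assumed here and do not come from the paper.

```python
import math


def poisson_pmf(k, lam):
    """Probability of exactly k events when the average count per interval is lam."""
    if k < 0 or int(k) != k:
        raise ValueError("k must be a non-negative integer")
    if lam <= 0:
        raise ValueError("lam must be positive")
    return math.exp(-lam) * lam ** k / math.factorial(int(k))


# Illustrative rate only: an average of 2 events per interval.
probabilities = [poisson_pmf(k, 2.0) for k in range(60)]
total_probability = sum(probabilities)  # close to 1, since the tail beyond k = 59 is negligible
```

Because k is restricted to 0, 1, 2, and so on, the probabilities over all counts sum to one, which reflects the discrete nature of the distribution.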

The NeuralProphet model builds on Prophet, a popular forecasting tool developed by Facebook. NeuralProphet is essentially a decomposable, modular time-series model comprising several components, namely trend, seasonality, auto-regression, special events, lagged regressors, and future regressors. Lagged regressors are external variables whose values are confined to the observed time period, whereas future regressors are external variables whose values are known for the forecast period as well. Future regressors function much like special events, and their future values must be supplied in order to produce a forecast. The trend component can be configured as either linear or piece-wise linear, with the slope updated at the change points. Seasonality is modeled using Fourier terms, so multiple seasonalities in high-frequency data can be handled. AR-Net, short for Auto-Regressive Feed-Forward Neural Network, can be used to model auto-regression in the time series, and a separate feed-forward neural network is used to model the lagged regressors. The remaining two components, special events and future regressors, are modeled as covariates of the model with dedicated coefficients. The model is robust to missing data and to changes in the trend, and it handles outliers well. NeuralProphet consists of the sum of three time functions with the error term, that is, growth

The modeling process of NeuralProphet can be made faster than that of Prophet by using PyTorch's gradient descent optimization engine. The model detects change points automatically, or they can be customized. Change points are the points at which the direction of the data shifts. For example, in the second wave of COVID-19, new cases started to fall once COVID-19 vaccination became available, and hence the data deviated from its earlier direction. Conversely, the increase in COVID-19 cases occurred due to the new strain of the coronavirus.
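The two building blocks discussed above, a piece-wise linear trend with change points and Fourier-term seasonality, can be sketched in a few lines. This is an illustrative sketch and not NeuralProphet's internal code: the function names, parameter names, and all numeric values are assumptions chosen for the example.

```python
import math


def piecewise_linear_trend(t, base_slope, base_offset, changepoints, slope_deltas):
    """Continuous trend whose slope changes by slope_deltas[i] at changepoints[i]."""
    slope, offset = base_slope, base_offset
    for s, delta in zip(changepoints, slope_deltas):
        if t >= s:
            slope += delta
            offset -= s * delta  # keeps the trend continuous at the change point
    return slope * t + offset


def fourier_seasonality(t, period, coefficients):
    """Sum of Fourier terms; the order is the number of (cos, sin) coefficient pairs."""
    return sum(
        a * math.cos(2.0 * math.pi * n * t / period)
        + b * math.sin(2.0 * math.pi * n * t / period)
        for n, (a, b) in enumerate(coefficients, start=1)
    )


# Hypothetical values: cases rise by 2 per day until day 10, then fall by 1 per day.
trend_at_change = piecewise_linear_trend(10, 2.0, 1.0, [10], [-3.0])
trend_after_change = piecewise_linear_trend(12, 2.0, 1.0, [10], [-3.0])

# Weekly seasonality of order 3 with made-up coefficients.
weekly_coefficients = [(0.5, 0.2), (0.1, -0.3), (0.05, 0.0)]
additive_forecast = trend_after_change + fourier_seasonality(12, 7.0, weekly_coefficients)
```

In an additive model the components are simply summed, as in the last line; a change point alters only the slope of the trend, so the fitted curve bends without jumping.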

Once the desired dataset has been retrieved, it has to be divided into proportions for training and testing. Typically, 70% of the data is used to train the model and the remainder is used to test it and predict future outcomes. In the training step, the proposed model is fit to data samples spanning approximately 70% of the dataset; the evaluation of the fitted model depends on this phase, which makes it crucial for model generalization. For extensive experimentation with the NeuralProphet model, the free parameters were set so that the forecasting period is seventy days. The proposed model achieved a training accuracy of 97.21%.
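For time-series data, a 70/30 split is usually taken in chronological order so that the model is tested only on observations later than those it was trained on. The paper does not state how its split was drawn, so the sketch below is one plausible reading; the function name and the sample values are hypothetical.

```python
def chronological_split(series, train_percent=70):
    """Split an ordered series into (train, test) without shuffling."""
    if not 0 < train_percent < 100:
        raise ValueError("train_percent must be between 0 and 100")
    # Integer arithmetic avoids floating-point surprises at the cut point.
    cut = len(series) * train_percent // 100
    return series[:cut], series[cut:]


# Made-up daily counts, oldest first.
daily_cases = [12, 15, 20, 26, 31, 45, 52, 60, 71, 80]
train, test = chronological_split(daily_cases)
```

Keeping the order intact matters here because shuffling would let information from later dates leak into training.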

The dataset was obtained from Kaggle and consists of time-series summary tables. The proposed framework uses hyperparameter tuning in which the uncertainty interval parameter is fixed to 80 and 3000 simulations are used to estimate the uncertainty intervals. The NeuralProphet model has a number of hyperparameters whose values may be supplied explicitly by the user; when no explicit value is given, the default value is used. By default, changepoints_range is 0.8, the seasonality mode is additive, and the growth parameter is linear. The additive seasonality mode has been selected with the strength of seasonality set to 20. The changepoint range (ratio) is set to 0.80 with 25 potential change points, and the flexibility of the change points is set to 0.05. Orders 3 and 10 have been used for weekly and annual seasonality, respectively, and holiday effects are added merely as dummy variables.
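The settings listed above can be collected in one place for reference. The sketch below is a plain record of the stated values with basic sanity checks; the field names are hypothetical labels chosen here and are not argument names of the NeuralProphet library, and reading the interval value 80 as a percentage is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ForecastSettings:
    """Hyperparameter values as reported in the text (field names are illustrative)."""
    uncertainty_interval_percent: int = 80  # assumed to mean an 80% interval
    uncertainty_simulations: int = 3000
    seasonality_mode: str = "additive"
    seasonality_strength: float = 20.0
    growth: str = "linear"
    changepoint_range: float = 0.80
    n_changepoints: int = 25
    changepoint_flexibility: float = 0.05
    weekly_seasonality_order: int = 3
    yearly_seasonality_order: int = 10
    forecast_horizon_days: int = 70

    def validate(self):
        """Raise ValueError if a value falls outside its sensible range."""
        if not 0.0 < self.changepoint_range <= 1.0:
            raise ValueError("changepoint_range must lie in (0, 1]")
        if self.n_changepoints < 0 or self.forecast_horizon_days <= 0:
            raise ValueError("counts and horizon must be positive")
        if self.seasonality_mode not in ("additive", "multiplicative"):
            raise ValueError("unknown seasonality mode")


settings = ForecastSettings()
settings.validate()
```

A frozen record of this kind makes it easy to see which values were tuned and which were left at their defaults when an experiment is repeated.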

The present work uses data in CSV format with four attributes: new cases, cumulative new cases, new death cases, and cumulative death cases of COVID-19. The prediction function used to predict daily and cumulative new cases and death cases is based on a time-series forecasting technique. The forecasting approach forecasts the cases for July, August, and the last week of September 2021. Three models are considered for forecasting: the NeuralProphet model, the Random Forest model, and the Poisson Distribution model. Since the results rely on these models, their basic working, primitives, and significance are described individually.

In the proposed approach, the results have been obtained using the NeuralProphet model, as mentioned above. To ensure the integrity and correctness of these results, they have been compared with the results obtained using the Poisson Distribution and Random Forest models. Predictions are made for new cases as well as death cases over the specified duration; therefore, the results are divided into two parts: (i) comparative analysis of daily new and cumulative cases and (ii) comparative analysis of daily death and cumulative death cases.

For the comparisons, the daily new cases and cumulative new cases, together with their associated statistical details, were considered for the period from 1st January 2020 to 16th July 2021. The actual values have been compared with the predicted values obtained using the time-series-based Facebook NeuralProphet model.

Based on both comparative analyses, covering daily and cumulative cases, it may be summarized that the values predicted by the NeuralProphet model are more accurate than those of the other models considered.

Similar to predicting the confirmed new cases, the next concern of the proposed approach was to predict the death cases as well. For forecasting the values, the daily death cases and cumulative death cases considered and their associated statistical details are shown in

However, the results obtained from the Random Forest and Poisson Distribution model, illustrated in

The dataset has been split so that testing can be performed after training. Accordingly, 70% of the data has been used for training and the remaining 30% for testing, to validate the efficacy and effectiveness of the developed model. The time-series data for India has been considered for evaluating the trends of COVID-19 cases. Three models have been trained on the dataset: the NeuralProphet model for predictions and forecasting, and the Random Forest and Poisson Distribution models for comparative analysis. The programming language was used for the training of the models. Moreover, the model accuracy was computed using scale-dependent error.

The evaluation metrics for all three models, measured using the Mean Absolute Percentage Error (MAPE), are reported in the table below.

Model | Random forest (%) | Poisson distribution (%) | NeuralProphet (%) |

MAPE | 0.21 | 0.32 | 0.12 |
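For reference, MAPE averages the absolute errors expressed as a percentage of the actual values. The following is a minimal sketch of the standard definition; the function name and the sample numbers are illustrative and are not taken from the paper's dataset.

```python
def mape(actual, predicted):
    """Mean Absolute Percentage Error, in percent."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Made-up example: both predictions are off by 10% of the actual value.
example_mape = mape([100, 200], [110, 180])
```

Because each error is divided by the actual value, days with zero reported cases have to be excluded or handled separately before this metric is applied.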

S. No. | Model | Accuracy (%) |

1. | Poisson distribution | 86.87 |

2. | Random forest | 93.43 |

3. | NeuralProphet model | 97.21 |

It is evident from these results that the NeuralProphet model outperformed the other two models, namely the Poisson Distribution and Random Forest models; it may therefore be used to capture the trends of COVID-19 cases and the outbreak in India. A further performance metric, the Root Mean Square Error (RMSE), is also calculated and compared. The RMSE directly affects the reliability of a single time-series forecast, and the objective is to minimize this error in order to increase accuracy.
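RMSE is the square root of the mean squared difference between actual and predicted values, so it is expressed in the same units as the data and penalizes large errors more heavily. The sketch below follows the standard definition; the function name and sample values are illustrative only.

```python
import math


def rmse(actual, predicted):
    """Root Mean Square Error, in the units of the data."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Made-up example: errors of 3 and 4 give sqrt((9 + 16) / 2).
example_rmse = rmse([0, 0], [3, 4])
```

Since RMSE depends on the scale of the series, values for daily counts and cumulative counts are not directly comparable with each other, only across models on the same series.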

The performance of the models has been calculated using a valid dataset of confirmed and deceased COVID-19 cases, and the comparison is presented in the table below.

Data | Poisson distribution | Random forest | NeuralProphet model |

Daily new cases | 74.76 | 39.58 | 27.187 |

Cumulative new cases | 59.64 | 34.468 | 24.829 |

Daily death cases | 42.68 | 22.432 | 19.325 |

Cumulative death cases | 22.78 | 20.45 | 14.35 |

Data | Residual quality | Poisson distribution (%) | Random forest (%) | NeuralProphet model (%) |

Daily new cases | Very good | 41.80 | 46.50 | 48.90 |

Daily new cases | Good | 16.10 | 17.40 | 19.10 |

Daily new cases | Regular | 12.20 | 14.40 | 15.80 |

Daily new cases | Unreliable | 29.90 | 21.70 | 16.20 |

Cumulative new cases | Very good | 49.10 | 51.30 | 55.20 |

Cumulative new cases | Good | 11.10 | 12.70 | 13.90 |

Cumulative new cases | Regular | 15.30 | 15.80 | 16.60 |

Cumulative new cases | Unreliable | 24.50 | 21.20 | 14.30 |

Daily death cases | Very good | 27.20 | 34.60 | 38.10 |

Daily death cases | Good | 16.30 | 17.10 | 17.20 |

Daily death cases | Regular | 12.50 | 13.10 | 14.10 |

Daily death cases | Unreliable | 44.00 | 35.20 | 30.60 |

Cumulative death cases | Very good | 44.50 | 48.10 | 52.30 |

Cumulative death cases | Good | 23.10 | 23.40 | 24.30 |

Cumulative death cases | Regular | 11.10 | 11.40 | 12.10 |

Cumulative death cases | Unreliable | 21.30 | 17.10 | 11.30 |

The statistics may be interpreted as follows. Predictions classified as ‘very good’ have residuals of no more than 15% of the actual mean value relative to their actual counterparts; those classified as ‘good’ have residuals in the range of 15% to 25%; those classified as ‘regular’ lie in the range of 25% to 40% of the mean value; and any prediction with a residual beyond 40% of the actual mean value is categorized as ‘unreliable’. It follows that predictions with residuals below 40% are considered ‘reliable’; therefore, the overall reliability, used as the summary metric, comprises the ‘very good’, ‘good’ and ‘regular’ predictions.
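The classification rule and the overall reliability can be written out as a short sketch. The function names are hypothetical, and treating each upper bound (15%, 25%, 40%) as inclusive is an assumption, since the text does not say which class a residual exactly on a boundary belongs to. The shares used in the example are the NeuralProphet values for daily new cases from the table above.

```python
def classify_residual(actual, predicted, mean_actual):
    """Label a prediction by its residual as a fraction of the mean actual value."""
    if mean_actual <= 0:
        raise ValueError("mean_actual must be positive")
    ratio = abs(actual - predicted) / mean_actual
    if ratio <= 0.15:
        return "very good"
    if ratio <= 0.25:
        return "good"
    if ratio <= 0.40:
        return "regular"
    return "unreliable"


def overall_reliability(shares):
    """Sum the percentage shares of every class except 'unreliable'."""
    return sum(value for label, value in shares.items() if label != "unreliable")


neuralprophet_daily_new = {
    "very good": 48.90,
    "good": 19.10,
    "regular": 15.80,
    "unreliable": 16.20,
}
reliability = round(overall_reliability(neuralprophet_daily_new), 2)  # 83.8
```

The same sum for the other columns of the table gives the overall reliability of the Poisson Distribution and Random Forest models, which allows the three models to be compared on a single figure.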

The predictions of the NeuralProphet model were more accurate than those of the Poisson Distribution and Random Forest models. However, if the government takes the necessary steps to contain the outbreak and effective treatment or medication becomes available, the outbreak would decline drastically, which might affect the values predicted by the proposed approach.

The COVID-19 pandemic has traumatized the entire world and led to health emergencies worldwide. To gain insight into the proliferation and repercussions of the epidemic, it has become the need of the hour to advance the prediction models for the outbreak and improve the accuracy of their results. The existing standard epidemiological models failed to yield accurate long-term predictions because of the non-availability of crucial data and the high level of uncertainty. Therefore, the proposed approach aims to predict the end of the pandemic by predicting the confirmed cases as well as the death cases, based on eighteen months of data. The proposed scheme has been developed with the help of an advanced time-series model that makes use of hyperparameter tuning. The results indicate that the proposed method based on the NeuralProphet model succeeded in producing accurate predictions, and its accuracy is better than that of the other two models in predicting the number of confirmed cases and death cases. Furthermore, the predictions indicated that confirmed cases would start to decline in May 2021 and that the epidemic would come to an end in India by September 2021. The proposed work may be extended to perform real-time live forecasting, which could be implemented using various advanced deep learning approaches or reinforcement learning models. The results would have greater real-life utility if a few realistic parameters, namely human behavior, vaccinations or doses, and various government policies, were incorporated into the approaches.

This research was supported by Taif University Researchers supporting Project Number (TURSP-2020/254), Taif University, Taif, Saudi Arabia.