Machine Learning Based Depression, Anxiety, and Stress Predictive Model During COVID-19 Crisis

Abstract: Coronavirus Disease 2019 (COVID-19) was first reported in Wuhan, China, in December 2019. The World Health Organization (WHO) declared COVID-19 a pandemic, i.e., a global health crisis, on March 11, 2020. The outbreak and the subsequent lockdowns imposed to curb its spread not only affected the economies of many countries but also increased the levels of Depression, Anxiety, and Stress (DAS) among people. Therefore, there is a need to understand the relationship among psycho-social factors in a country that is presumably affected by high levels of stress and fear, with severely restrictive social distancing and lockdown measures in force, and with high rates of new cases and deaths. With this motivation, the current study investigates DAS levels among college students during the COVID-19 lockdown, since they are identified as a highly susceptible population. The study proposes an Intelligent Feature Subset Selection with Machine Learning-based DAS predictive (IFSSML-DAS) model. The presented IFSSML-DAS model involves data preprocessing, Feature Subset Selection (FSS), classification, and parameter tuning. The model uses a Group Gray Wolf Optimization based FSS (GGWO-FSS) technique to reduce the curse of dimensionality. In addition, a Beetle Swarm Optimization based Least Square Support Vector Machine (BSO-LSSVM) model is employed for classification, in which the weight and bias parameters of the LSSVM model are optimally adjusted using the BSO algorithm. The performance of the proposed IFSSML-DAS model was tested on a benchmark DASS-21 dataset and the results were investigated under different measures. The outcome of the study suggests the development of specialized programs to handle DAS among the population so as to overcome the COVID-19 crisis.


Introduction
The World Health Organization (WHO) declared Coronavirus Disease (COVID-19) a pandemic on March 11, 2020. This respiratory infection is caused by SARS-CoV-2, a novel coronavirus first reported in China. COVID-19 has affected more than 187 million people across the globe, and the disease is characterized by mild to moderate signs such as cough, sore throat, fatigue, fever, and shortness of breath. According to recent data, people with comorbidities and those above 60 years of age are vulnerable and highly prone to serious respiratory infection, which increases the mortality rate. On the other hand, pregnant women do not seem to be adversely affected.
To curtail the spread of the coronavirus, governments introduced various changes to daily life, such as the closure of workplaces, shopping areas, and public amenities, widespread testing for the virus, enforced isolation in schools, and constraints on civil liberties. These changes triggered a range of psychological responses that affect the degree of compliance with prevention measures. Indeed, earlier public health studies focused on slowing or preventing the spread of the crisis, while also emphasizing the significance of the social and psychological aspects of curfews. The implications of this difficult aspect for compliance with the prevention measures introduced by different governments should be examined in detail. In fact, the societal and psychological impacts might be more pronounced, more widespread, and longer lasting than the physical health complications brought by the disease [1]. A few weeks after the COVID-19 outbreak in China, 53.8% of people surveyed, mostly Chinese, reported that the epidemic had a mild to serious psychological impact on them. Extensive earlier research on the effect of infectious epidemics on mental health demonstrates that such a crisis is an extremely stressful event that forces people to handle ambiguous, uncertain, and unexpected conditions. In particular, two major factors of epidemics have been established to influence the mental state of people. The first relates to risk (i.e., fear of contagion). This fear may arise from perceived threats and occasionally causes behavioral contagion, panic, and emotional crisis [2,3]. The second is the introduction of rapid and multiple changes in regular social, work, and family habits due to social distancing and self-isolation measures. A long self-isolation period leads more people to experience boredom and frustration, together with worry about the disease.
Well-documented psychological responses to such crises include anxiety behaviors, anger, emotional distress, sleep disorders, fear, health concerns, depression, uncertainty, and a sense of powerlessness. Moreover, researchers who examined the long-term impact of infectious epidemics reported that some people may continue to experience symptoms of Post-Traumatic Stress Disorder (PTSD). One analysis suggests that a person may take around 3 years to recover from PTSD that occurred as a result of an epidemic. Stress-related reactions may be physiological, behavioral, emotional, or cognitive. Depending on the severity, type, and timing of the exposure to stress, the resulting stress may become a risk factor for various diseases, including psychiatric/cardiovascular diseases. An emergency such as COVID-19 can be regarded as a serious stressor, since it is novel and unpredictable, severely affects health (through both personal experience and the experience of relatives), and brings social restrictions. However, no event by itself is the direct cause of illness and its consequences. Rather, it is the perception of stress (i.e., the degree to which an individual considers the event to be stressful) that changes the mental and physical responses to a condition [4,5]. It is therefore important first to identify vulnerable people and to support them through efficient prevention schemes, so that they can cope with such conditions quickly and avoid negative psychological outcomes.
The current research article examines DAS levels among college students during the COVID-19 lockdown, since they are a highly susceptible population. The study proposes an Intelligent Feature Subset Selection (FSS) with Machine Learning-based DAS predictive (IFSSML-DAS) model. The presented IFSSML-DAS model uses a Group Gray Wolf Optimization-based FSS (GGWO-FSS) technique to reduce the curse of dimensionality. Moreover, a Beetle Swarm Optimization-based Least Square Support Vector Machine (BSO-LSSVM) model is applied as the classifier, with its parameters optimized by the BSO algorithm. The proposed IFSSML-DAS model was experimentally validated using a benchmark DASS-21 dataset and the results were examined under different dimensions. In short, the contributions of the study are as follows.
• Assessment of the relationship between psycho-social factors and COVID-19 lockdown among college students.
• Development of an automated DAS predictive model, IFSSML-DAS, to detect different levels of DAS during the COVID-19 crisis.
• The proposed model involves different sub-processes, namely data preprocessing, feature subset selection, classification, and parameter tuning.
• The GGWO-FSS technique is employed to choose an optimal feature subset and reduce the computational complexity.
• The BSO-LSSVM model is proposed as a classifier to detect different levels of DAS due to the COVID-19 crisis.
• The BSO algorithm is applied to fine-tune the weights and biases of the LSSVM model, thereby increasing the classification performance.
• The performance of the IFSSML-DAS model was validated against the benchmark DASS dataset.

Related Works
The pandemic has forced several governments to impose severe restrictions on the movement of people. The governments of the countries worst affected in terms of patients, infections, and deaths, such as Spain, China, Ecuador, and Italy, imposed longer periods of lockdown and quarantine during which people had to stay at home. This has severely disrupted the living conditions of these countries' citizens and has been especially damaging for populations with fewer resources, for example in Latin America. Worrying factors such as uncertainty about the spread of the disease, its evolution, the immunity of infected persons, and the lack of a vaccine have together resulted in a heightened feeling of fear among the population. Such fears are triggered by threatening stimuli, as also occurred in previous health crises such as the outbreak of SARS/Middle East Respiratory Syndrome Coronavirus [6].
The severity of the threat posed by the COVID-19 pandemic and its influence have been well documented in various fields of health, well-being, development, and human survival. Ahorsu et al. [7] devised a survey-based scale to measure the fear of this pathogen. This scale has been used extensively in many countries. Several researchers have identified a connection between COVID-19 fear and anxiety [8]. Researchers have also applied the Hospital Anxiety and Depression Scale (HADS) and the Depression Anxiety Stress Scale (DASS-21) and found a lesser extent of depression. Recent findings indicate that the fear of COVID-19 is strongly related to stress and anxiety and, to a lesser degree, to depression [9]. Though the evidence connecting depression and fear is limited, suicidal thoughts attributed to COVID-19 have been registered in the population [10]. Additionally, the number of cases reported daily has increased, with high mortality rates. The explosion of information presented by the media has also started to influence mood disorders [11]. In the earlier stages of the pandemic, Chinese scientists established moderate and severe symptoms of DAS among the Chinese population [12]. The associations among depression, anxiety, and stress have been recognized in their research works. Theoretical models are supported by survey evidence, and socio-environmental stress is linked with biological processes that drive depression in response to pathogens.
A linear study conducted on the younger generation also recommends the early prediction of depression and stress [13]. According to that research, highly stressful conditions can be recognized, and there is a close connection between depression and anxiety, i.e., people who suffer from PTSD frequently exhibit high levels of anxiety and fear. Since depression and anxiety are positively correlated, each can be used to predict the other. In the context of this crisis, the current study is conducted to reveal specific differences on the basis of age and gender. The young female population shows high levels of DAS and fear of COVID-19. However, the investigations have largely been conducted on medical personnel [14], while little is known about the younger generation. University students reported greater fear of COVID-19 than graduates [15]. Moreover, several studies indicate that the symptoms of anxiety and depression among students are rising because of social distancing and lockdown rules [16].
Fig. 1 illustrates the overall working process of the proposed IFSSML-DAS model. According to the figure, the DAS dataset is fed as input to the preprocessing module. Next, the GGWO algorithm is applied as a feature selector to derive a set of useful features. Finally, the BSO-LSSVM model is employed as a classifier to determine the DAS levels.

Dataset Collection
The presented IFSSML-DAS model was simulated on a benchmark DAS dataset. The dataset holds 938 instances with 7 attributes and 5 classes for DAS. The information related to the dataset is given in Tab. 1.

Data Preprocessing
Data preprocessing plays a vital role in improving the outcome of ML-based predictive models. This step is important because raw data is usually noisy and occasionally has missing values. Therefore, a data cleaning process is required to remove the noise. In addition, missing data imputation and min-max normalization are carried out so that all attributes are rescaled to a common range. This enhances the predictive outcome of the DAS classification process.
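As a minimal sketch of this step, the snippet below imputes missing values and then applies min-max scaling with NumPy. The source does not state which imputation rule is used, so column-mean imputation is an assumption here, and the function names `impute_mean` and `min_max_scale` as well as the toy array are hypothetical.

```python
import numpy as np

def impute_mean(X):
    """Replace NaN entries with the mean of the observed values in each column.

    Mean imputation is an assumption; the source does not name its rule.
    """
    X = np.asarray(X, dtype=float).copy()
    col_means = np.nanmean(X, axis=0)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = col_means[cols]
    return X

def min_max_scale(X):
    """Rescale each column to the range [0, 1]; constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    lo = X.min(axis=0)
    span = X.max(axis=0) - lo
    span[span == 0] = 1.0  # avoid division by zero for constant columns
    return (X - lo) / span

# Toy data (hypothetical): one missing value in the first column.
raw = np.array([[1.0, 10.0],
                [np.nan, 30.0],
                [3.0, 20.0]])
clean = min_max_scale(impute_mean(raw))
print(clean)
```

Min-max scaling changes the range of each attribute but not the shape of its distribution, which is why it is described here as rescaling rather than as a transformation toward normality.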

Feature Subset Selection Using GGWO Algorithm
After data preprocessing, the FSS process takes place using the GGWO algorithm. In this stage, the preprocessed data is fed into the GGWO algorithm to select the optimal subset of features. It is essential to reduce the feature vectors before classification, since a large number of input features imposes a considerable limitation. Consequently, a method is required that performs feature dimensionality reduction without losing information. GWO was created by mathematically modeling both the hunting process and the social behavior of grey wolves [17]. The processes involved in the GGWO-FSS model are concisely described herewith.
1) Feature set initialization: In order to optimize the features, the GGWO method creates an initial population of the swarm. Initializing the solutions is a crucial step in enhancing the method, since it allows the best solution to be identified quickly. The determined feature set is given herewith.
2) Grouping model: For group creation, the grey wolves (i.e., features) are initially sorted in descending order. The first features in the ordered list denote the optimal ones. Features of the same type, from every group, share an equivalent search space, and three different groups of feature information are assumed for every instance.
3) Objective function: The Fitness Function (FF) is an important factor in the classification process. Here, classification accuracy is the fundamental criterion used to define the FF. For every iteration, the FF is evaluated as follows.
4) Updating feature selection: After the fitness function is evaluated, the solution is updated based on the simulated movement of the grey wolves. In the initial iteration, the feature arrangement is divided among the three best solutions, namely best α, best β, and best ω. Guided by these three grey wolves, the enhanced method selects the best features for the classification procedure. The proposed method uses the condition given below.
Coefficient vectors are given herewith.
where $r_1$ and $r_2$ represent random values in the range [0, 1] and $a$ is linearly reduced from 2 to 0.
Hunting behavior: The hunt is usually guided by α, whereas β and ω participate occasionally. The hunting behavior of the grey wolves is based on the three best solutions, which are represented herewith.
Encircling behavior: The following equation is used to model the process mathematically.
where $fe_\alpha$, $fe_\beta$, and $fe_\omega$ denote the location vectors of grey wolves α, β, and ω. The updated position is a random place within the circle defined by these previous positions. GWO has two main parameters to be adjusted, and the algorithm is kept as simple as possible, with the fewest operators to be adjusted.

5) Termination process:
The algorithm terminates after reaching the maximum number of iterations, and the solution with the optimum fitness value is selected. The features in this best solution are then provided to the classification method. Fig. 2 shows the flowchart of the GGWO technique.
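The loop described in steps 1 to 5 can be sketched as follows. This is a plain GWO wrapper for feature selection, not the authors' GGWO: the grouping step is omitted because the text does not specify it fully, positions are thresholded at 0.5 to obtain a feature mask (an assumption), and the function name, its parameters, and the toy cost are hypothetical. The three leaders correspond to the wolves called α, β, and ω in the text, and the update uses the standard GWO coefficient vectors A = 2a·r1 − a and C = 2·r2.

```python
import numpy as np

def gwo_feature_selection(fitness, n_features, n_wolves=8, n_iter=30, seed=0):
    """Minimise fitness(mask) over boolean feature masks with a plain GWO loop."""
    rng = np.random.default_rng(seed)
    pos = rng.random((n_wolves, n_features))   # continuous positions in [0, 1]
    to_mask = lambda p: p > 0.5                # threshold binarisation (assumption)
    best_mask, best_cost = None, np.inf
    for t in range(n_iter):
        costs = np.array([fitness(to_mask(p)) for p in pos])
        order = np.argsort(costs)
        if costs[order[0]] < best_cost:        # remember the best mask seen so far
            best_cost = costs[order[0]]
            best_mask = to_mask(pos[order[0]])
        leaders = pos[order[:3]].copy()        # the three best wolves
        a = 2.0 * (1 - t / n_iter)             # decreases linearly from 2 toward 0
        new_pos = np.zeros_like(pos)
        for leader in leaders:
            r1 = rng.random(pos.shape)
            r2 = rng.random(pos.shape)
            A = 2 * a * r1 - a                 # coefficient vector A
            C = 2 * r2                         # coefficient vector C
            D = np.abs(C * leader - pos)       # distance to this leader
            new_pos += leader - A * D          # candidate position from this leader
        pos = np.clip(new_pos / 3.0, 0.0, 1.0) # average of the three candidates
    return best_mask, best_cost

# Toy cost (hypothetical): fraction of features that disagree with a known mask.
target = np.array([True, False, True, False, False, True])
def toy_cost(mask):
    return np.sum(mask != target) / target.size

mask, cost = gwo_feature_selection(toy_cost, target.size)
print(mask, cost)
```

In the setting of the paper, `fitness` would wrap a classifier and return a cost based on classification accuracy; the toy cost only keeps the example self-contained.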

DAS Classification Using BSO-LSSVM Model
In the BSO-LSSVM-based data classification process, the chosen subset of features is fed as input in order to allocate appropriate class labels. In addition, the BSO algorithm is used to adjust the weight and bias values of the LSSVM model so as to enhance the predictive outcome considerably. LSSVM is an enhanced variant of SVM and is commonly employed in classification. Though modified, its basic concept remains unchanged: identify the hyperplane that optimizes the classification and maximizes the margin between classes in order to increase reliability. The difference between SVM and LSSVM lies in the objective function, in which LSSVM uses a squared term for the error. At the same time, LSSVM replaces the inequality constraints of SVM with equality constraints in the optimization problem. Fig. 3 demonstrates the hyperplane of the SVM model. As a result, the solution reduces to a system of linear equations, which decreases the complexity of the method and increases the solving speed. These benefits distinguish it from other enhancements of the SVM method [18].
where $x_i \in R^n$ denotes the input vector of the $i$th instance, $y_i \in R$ indicates the target value of the $i$th instance, and $l$ represents the number of training instances. In the feature space, the LSSVM model is represented as follows.
where $\phi(x)$ represents the non-linear map function that maps the input samples to a higher-dimensional feature space, $W$ denotes the weight vector, and $B$ indicates the bias. The objective function of the LSSVM model is defined herewith.
where $\xi$ represents the error variable and $\gamma > 0$ denotes the penalty coefficient. To ease the analysis, the Lagrangian function is introduced as follows.
where $a_i$ denotes the Lagrange multiplier. Applying the KKT optimality conditions gives $\partial L/\partial w = 0$, $\partial L/\partial b = 0$, $\partial L/\partial \xi = 0$, and $\partial L/\partial a_i = 0$. Thus, the following system of linear equations is obtained.
where $I$ represents the identity matrix. Based on the Mercer condition, the kernel function is given by $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$. Once $a$ and $b$ are obtained from Eqs. (11) and (12), the non-linear function of LSSVM is obtained as follows.
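To make the linear-system view concrete, the sketch below fits an LSSVM by solving the standard KKT system [[0, 1ᵀ], [1, K + I/γ]] [b; a] = [0; y] with NumPy. It is an illustration under stated assumptions rather than the authors' implementation: an RBF kernel is assumed, `gamma` and `sigma` are fixed by hand instead of being tuned by BSO, the function names and toy data are hypothetical, and a two-class decision is obtained by taking the sign of the output. The five-class problem in the paper would need an extension such as one-vs-rest.

```python
import numpy as np

def rbf_kernel(A, B, sigma=1.0):
    """Gaussian kernel matrix K[i, j] = exp(-||A_i - B_j||^2 / (2 sigma^2))."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2 * sigma**2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve the LSSVM linear system for the multipliers a and the bias b."""
    n = X.shape[0]
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = 1.0                                   # first row: sum of a_i = 0
    M[1:, 0] = 1.0                                   # bias column
    M[1:, 1:] = rbf_kernel(X, X, sigma) + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(M, rhs)
    return sol[1:], sol[0]                           # a, b

def lssvm_predict(X_train, a, b, X_new, sigma=1.0):
    """Evaluate y(x) = sum_i a_i K(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ a + b

# Toy two-class data (hypothetical): two well-separated clusters.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
              [3.0, 3.0], [3.0, 4.0], [4.0, 3.0], [4.0, 4.0]])
y = np.array([-1.0, -1.0, -1.0, -1.0, 1.0, 1.0, 1.0, 1.0])
a, b = lssvm_fit(X, y)
labels = np.sign(lssvm_predict(X, a, b, X))
print(labels)
```

Because training is a single call to a linear solver, there is no quadratic program to solve, which is the speed advantage the text attributes to LSSVM.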
The BSO algorithm is employed to tune the weight and bias parameters of the LSSVM model, with the aim of enhancing the classification accuracy. A 10-fold Cross-Validation (CV) scheme is used to evaluate the FF. In 10-fold CV, the training dataset is randomly separated into 10 mutually exclusive subsets of equal size, of which 9 subsets are used for training while the remaining subset is used for testing [19]. This process is repeated 10 times so that every subset is used once for testing. The FF is determined as 1 − CA validation of the 10-fold CV process on the training dataset, as given in Eqs. (14) and (15). Thus, a solution with a higher CA validation has a lower fitness value.
$$CA_{validation} = 1 - \frac{1}{10} \sum_{i=1}^{10} \frac{y_c}{y_c + y_f} \times 100 \tag{15}$$
where $y_c$ and $y_f$ denote the counts of true and false classifications, respectively.
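The cross-validated fitness can be sketched as below. The code interprets the classification accuracy of each fold as y_c / (y_c + y_f) and returns one minus the mean over the folds, so that a lower value is better; this interpretation is an assumption, since the exact form of the equations is not fully recoverable from the text. The names `cv_fitness`, `train_and_predict`, and `majority_class` and the toy data are hypothetical.

```python
import numpy as np

def cv_fitness(train_and_predict, X, y, k=10, seed=0):
    """Return 1 - mean validation accuracy over k folds (lower is better)."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)          # k mutually exclusive subsets
    accs = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        pred = train_and_predict(X[train], y[train], X[test])
        y_c = np.sum(pred == y[test])       # correct classifications
        y_f = np.sum(pred != y[test])       # false classifications
        accs.append(y_c / (y_c + y_f))
    return 1.0 - float(np.mean(accs))

# Stand-in model (hypothetical): always predict the majority training class.
def majority_class(X_tr, y_tr, X_te):
    values, counts = np.unique(y_tr, return_counts=True)
    return np.full(len(X_te), values[np.argmax(counts)])

X = np.arange(40, dtype=float).reshape(20, 2)
y = np.array([0] * 15 + [1] * 5)
fitness = cv_fitness(majority_class, X, y)
print(fitness)
```

In the paper's pipeline, `train_and_predict` would train the LSSVM with the candidate parameters proposed by a beetle and return the predicted labels for the held-out fold.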
In the BSO method, every beetle denotes a potential solution to the optimization problem and corresponds to a fitness value defined by the FF. Mathematically, it borrows the concept of the PSO technique. There is a population of $n$ beetles denoted by $X = (X_1, X_2, \cdots, X_n)$ in an $S$-dimensional search space, where the $i$th beetle is the $S$-dimensional vector $X_i = (x_{i1}, x_{i2}, \cdots, x_{iS})^T$, which indicates the location of the $i$th beetle in the search space and also denotes a potential solution to the problem. Based on the target function, the fitness value of every beetle location is computed. The speed of the $i$th beetle is given by $V_i = (V_{i1}, V_{i2}, \cdots, V_{iS})^T$. The individual best of a beetle is denoted by $P_i = (P_{i1}, P_{i2}, \cdots, P_{iS})^T$, and the global best of the population is denoted by $P_g = (P_{g1}, P_{g2}, \cdots, P_{gS})^T$. The position update is expressed in Eq. (16), where $s = 1, 2, \cdots, S$; $i = 1, 2, \cdots, n$; and $k$ represents the current iteration number. $V_{is}$ indicates the speed of the beetle, $\xi_{is}$ denotes the increment in beetle position, and $\lambda$ represents a positive constant. The speed update is expressed in Eq. (17), where $c_1$ and $c_2$ denote two positive constants, $r_1$ and $r_2$ represent two random numbers between zero and one, and $\omega$ indicates the inertia weight [20]. In the standard PSO method, $\omega$ is a fixed constant. However, as the method developed, several researchers proposed a varying inertia weight.
The current study follows the decreasing inertia weight approach, in which $\omega_{min}$ and $\omega_{max}$ denote the minimum and maximum values of $\omega$, and $k$ and $K$ indicate the current and maximum number of iterations. In this study, the maximum value of $\omega$ is fixed as 0.9 and the minimum value is fixed as 0.4. Thus, the technique can search a wide range at the beginning of the run and locate the region of the best solution quickly. As $\omega$ gradually decreases, the beetle speed also decreases and the method enters a local search. The increment $\xi$ is evaluated as given in Eq. (19), where $\delta$ denotes the step size. The search behavior of the right and left antennae is expressed as follows.
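The displayed equations did not survive in this text. For orientation, the update rules of BSO as they are commonly stated in the literature can be written with the symbols defined above as follows. This is an assumption about the content of Eqs. (16), (17), and (19), not a transcription of them; $f$ (the objective function), $X_{rs}$ and $X_{ls}$ (the right and left antenna positions), and $d$ (the distance between the antennae) are symbols introduced here for illustration.

```latex
\begin{align}
X_{is}^{k+1} &= X_{is}^{k} + \lambda V_{is}^{k} + (1-\lambda)\,\xi_{is}^{k} \tag{16}\\
V_{is}^{k+1} &= \omega V_{is}^{k} + c_1 r_1 \bigl(P_{is}^{k} - X_{is}^{k}\bigr)
               + c_2 r_2 \bigl(P_{gs}^{k} - X_{is}^{k}\bigr) \tag{17}\\
\omega &= \omega_{\max} - \frac{\omega_{\max}-\omega_{\min}}{K}\,k \notag\\
\xi_{is}^{k+1} &= \delta^{k}\, V_{is}^{k}\,
               \operatorname{sign}\bigl(f(X_{rs}^{k}) - f(X_{ls}^{k})\bigr) \tag{19}\\
X_{rs}^{k+1} &= X_{rs}^{k} + V_{is}^{k}\,\frac{d}{2}, \qquad
X_{ls}^{k+1} = X_{ls}^{k} - V_{is}^{k}\,\frac{d}{2} \notag
\end{align}
```

With $\omega_{\max} = 0.9$ and $\omega_{\min} = 0.4$, the inertia weight falls linearly from 0.9 at $k = 0$ to 0.4 at $k = K$. With the sign as written, the increment $\xi$ points toward the antenna with the larger objective value; when $f$ is a cost to be minimised, as with the fitness used here, the sign is reversed.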
To represent the search path visually, the researchers used a small population size and displayed the position changes over ten iterations in 3-dimensional space. Since factors such as the step length and the inertia weight coefficient decrease during the iterative procedure, the technique does not converge on the target point too early, so the population largely avoids falling into a local optimum. The BSO technique initializes a population of random solutions. In every iteration, each search agent updates its position according to its own search mechanism and the best solution currently available. The integration of these two components not only speeds up the population's iterations but also decreases the likelihood of the population falling into a local optimum, which makes the method more stable when handling high-dimensional problems. In this sense, the BSO method has the capacity for both exploitation and exploration and thus belongs to the global optimization methods. Moreover, the linear combination of the speed update and the beetle search improves the accuracy and rapidity of population optimization and results in a highly stable method.
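A compact sketch of the BSO loop described above is given below. It minimises a generic cost function and is illustrative only: the parameter values `lam`, `c1`, `c2`, `step`, `step_decay`, and `bounds`, the antenna distance tied to the step size, and the sphere test function are all assumptions, while the inertia weight limits 0.9 and 0.4 come from the text. Because the cost is minimised, each beetle steps toward the antenna with the smaller cost.

```python
import numpy as np

def bso_minimize(f, dim, n_beetles=10, n_iter=100, bounds=(-5.0, 5.0),
                 lam=0.4, c1=2.0, c2=2.0, w_max=0.9, w_min=0.4,
                 step=1.0, step_decay=0.95, seed=0):
    """Minimise f with a PSO-style swarm whose members also probe two antennae."""
    rng = np.random.default_rng(seed)
    lo, hi = bounds
    X = rng.uniform(lo, hi, (n_beetles, dim))       # beetle positions
    V = rng.uniform(-1.0, 1.0, (n_beetles, dim))    # beetle speeds
    P = X.copy()                                    # individual best positions
    p_cost = np.array([f(x) for x in X])
    g = P[np.argmin(p_cost)].copy()                 # global best position
    g_cost = p_cost.min()
    for k in range(n_iter):
        w = w_max - (w_max - w_min) * k / n_iter    # linearly decreasing inertia
        r1 = rng.random((n_beetles, dim))
        r2 = rng.random((n_beetles, dim))
        V = w * V + c1 * r1 * (P - X) + c2 * r2 * (g - X)
        d = step / 2.0                              # antenna distance (assumption)
        f_r = np.array([f(x) for x in X + V * d / 2.0])   # right antenna cost
        f_l = np.array([f(x) for x in X - V * d / 2.0])   # left antenna cost
        # Increment toward the antenna with the smaller cost (minimisation).
        xi = -step * V * np.sign(f_r - f_l)[:, None]
        X = np.clip(X + lam * V + (1 - lam) * xi, lo, hi)
        cost = np.array([f(x) for x in X])
        better = cost < p_cost
        P[better], p_cost[better] = X[better], cost[better]
        if p_cost.min() < g_cost:
            g_cost = p_cost.min()
            g = P[np.argmin(p_cost)].copy()
        step *= step_decay                          # shrink the step size
    return g, g_cost

# Toy objective (hypothetical): the sphere function, minimised at the origin.
def sphere(x):
    return float(np.sum(x**2))

best_x, best_cost = bso_minimize(sphere, dim=3)
print(best_x, best_cost)
```

To tune a classifier in the way the paper describes, `f` would be replaced by a cross-validated fitness evaluated at the parameter vector carried by each beetle.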

Experimental Validation
This section examines the performance of the proposed model on the applied DAS dataset. Tab. 2 and Fig. 4 show the results of the FSS analysis attained by the GGWO algorithm and other existing FSS algorithms. The table values indicate that the GGWO algorithm accomplished better outcomes, as seen from the least cost it achieved. For instance, on the depression dataset, GGWO obtained the least cost of 0.0083, whereas the GWO, ACO, and PSO algorithms incurred higher costs of 0.0187, 0.0259, and 0.0351, respectively. Likewise, on the anxiety dataset, GGWO obtained the least cost of 0.0173, whereas GWO, ACO, and PSO performed worse, with best costs of 0.0190, 0.0268, and 0.0391, respectively. Finally, on the stress dataset, GGWO achieved the least cost of 0.0457, whereas GWO, ACO, and PSO performed worse, with best costs of 0.0782, 0.0829, and 0.0973, respectively.

Conclusion
The current study designed an automated DAS predictive model, the IFSSML-DAS model, to observe the relationship between psycho-social factors and the COVID-19 lockdown among college students. It involves different sub-processes, namely data preprocessing, FSS, classification, and parameter tuning. The presented method employs the GGWO-FSS technique to choose an optimal subset of features and to reduce the computational complexity. The BSO-LSSVM model is then used as a classifier to detect different levels of DAS that arose as a result of the COVID-19 crisis, with the BSO algorithm applied to fine-tune the weights and biases of the LSSVM model so as to increase the classification performance. The proposed IFSSML-DAS model was experimentally validated on a benchmark DASS-21 dataset and the results were examined under different dimensions. The experimental outcomes of the proposed IFSSML-DAS model were promising in comparison with other state-of-the-art methods. In future, a new classification model can be designed for crisis management in small-to-medium-sized enterprises during COVID-19 lockdown.