Modelling the Psychological Impact of COVID-19 in Saudi Arabia Using Machine Learning

This article aims to assess health habits, safety behaviors, and anxiety factors in the community during the novel coronavirus disease (COVID-19) pandemic in Saudi Arabia based on primary data collected through a questionnaire with 320 respondents. In other words, this paper aims to provide empirical insights into the correlation and the correspondence between sociodemographic factors (gender, nationality, age, citizenship factors, income, and education) and psycho-behavioral effects on individuals in response to the emergence of this new pandemic. To focus on the interaction between these variables and their effects, we use different methods of analysis, comprising regression trees and support vector machine regression (SVMR) algorithms. According to the regression tree results, the age variable plays a predominant role in health habits, safety behaviors, and anxiety. The health habit index, which focuses on the extent of behavioral change toward the commitment to use the health and protection methods, is highly affected by gender and age factors. The average monthly income is also a relevant factor but has contrasting effects during the COVID-19 pandemic period. The results of the SVMR model reveal a strong positive effect of income, with R² values of 99.59%, 99.93%, and 99.88% corresponding to health habits, safety behaviors, and anxiety, respectively.


Introduction
With the announcement by the World Health Organization (WHO) that the new coronavirus disease (COVID-19) had become a global pandemic, the rapid activation of "social spacing" or "social distancing" was recommended. These terms refer to making space between people when they are outside their homes, such as by avoiding crowded places and mass gatherings, for the sake of preventing the spread of the disease [1][2][3][4][5][6]. However, there must be special care taken when applying the term "social distancing," which implies closing schools, universities, and some institutions. Such practices can be compensated for by working remotely from home if possible and minimizing physical interaction with others. The SARS epidemic resulted in 8,000 cases and 800 deaths worldwide (in 26 countries) in 2003 [7]; it was controlled by July 2003, within eight months [7,8]. Symptoms of moderate to severe post-traumatic stress were reported among the population in areas that were severely affected by the SARS epidemic [9][10][11][12][13]. PTSD is a psychological result of exposure to traumatic events and is characterized by symptoms of re-experiencing the trauma and avoiding trauma reminders. Earlier studies have shown that Middle East respiratory syndrome, the H1N1 virus, and Ebolavirus have affected people's mental health, resulting in such symptoms as anxiety, depression, and post-traumatic stress. Likewise, patients with coronavirus have mental health problems with similar symptoms [12][13][14]. Furthermore, among the reported effects of these diseases, people have experienced food and resource insecurity and discrimination. Such effects can lead to some negative mental health outcomes during these epidemics [14][15][16][17][18][19].
People around the world have developed significant anxiety because the new coronavirus has spread globally; the best way for people to protect themselves is to stay in their homes as a necessary step to avoid infection. Thus, people are affected by being isolated in their homes and exposed to changing social and economic lifestyles. Health organizations and authorities have reported that people with prior health problems and older people are at higher risk of death when they contract the virus in this pandemic. This has led country authorities and health organizations to impose measures for limiting the spread of COVID-19. They have called on individuals to avoid gatherings except in necessary cases and to stay in their homes, as well as to ensure hygiene. Such warnings and measures have caused panic in various societies; as experts indicate, this will be difficult to contain [20].
There is a paucity of research that has assessed mental health concerns during the COVID-19 epidemic. Considering all the factors mentioned above, to address this gap, this study aims to assess knowledge, attitudes, anxiety, and perceived mental healthcare needs in the community during the coronavirus epidemic in Saudi Arabia. This research paper attempts to assess health habits, safety behaviors, and anxiety factors in the Saudi population. Coronavirus has left the world in a state of uncertainty, and the flow of news has engendered constant anxiety. Following the news is understandable, but the worry this causes can enhance any psychological problems that a person already has [21]. When the WHO published its recommendations on how to protect mental health in the face of the coronavirus outbreak, the measure was widely welcomed. Therefore, this study aims to investigate whether there is an immediate effect of the COVID-19 epidemic on health habits, safety behaviors, and anxiety. It further examines the lifestyle habits and quality of life among Saudi people, especially after the imposed quarantine and travel restrictions by the government.
The anxiety and concerns in society globally affect every individual to varying degrees [22]. Recent evidence has proposed that isolated and quarantined individuals suffer from great distress in the form of anxiety, anger, confusion, and post-traumatic stress symptoms [23]. Public knowledge and attitudes are expected to largely affect the degree of commitment to personal protective measures, and eventually, the clinical outcome [24][25][26]. Therefore, it is important to study this phenomenon in society.
According to the objective of the National Mental Health Program to develop and encourage mental health care services, community members first have to be evaluated to identify their perceptions, knowledge, and attitudes toward people who access mental health services [27][28][29]. Developing mental health services can assist the government in reducing the spread of pandemics.
By the beginning of March 2020, the Saudi government had reported the first case of COVID-19 in the city of Qatif. At the end of March 2020, about 1,563 confirmed cases had been discovered in different cities. As a result of the increase in deaths, levels of fear and anxiety escalated among people.

Overview of the Dataset and Variables
This section presents the material and methods of the research.

Design of Questionnaire Tool
The questionnaire was developed to measure the changes in societal behaviors that have occurred during the emergence of the COVID-19 pandemic. It has three main parts. The first part consists of five questions on demographic information, specifically, gender, age, nationality, education, and average monthly income. The second part consists of 11 questions focused on testing the hypotheses represented by the current behaviors in society in light of facing COVID-19 and changes in social behaviors that may continue after COVID-19. The final part of the questionnaire is composed of nine questions that focus on the extent of behavioral change toward the commitment to use the safety and protection measures of COVID-19. A 5-point Likert scale was used with the following response options: "strongly agree," "agree," "neutral," "disagree," and "strongly disagree." The level of commitment to the methods of protecting against exposure to the coronavirus was measured on a scale of 1-5, representing "rarely or never," "rarely," "seldom," "sometimes," and "always."

Data Description
Tab. 1 reports the descriptive statistics and the demographic information about the survey respondents. Most of the respondents were men (76.88%), aged between 30 and 50 years, and 53.44% of the participants had a bachelor's degree. Respondents were asked about their average monthly income and nationality. It was observed that most respondents were Saudi (90%), while 23.13% received less than 3,000 riyals, 19.69% received 3,000-9,000 riyals, and 57.18% received more than 9,000 riyals. Tab. 2 presents the items and their corresponding descriptive statistics. We used Cronbach's alpha to assess the internal consistency (or reliability) of the set of scale items [30]. The reliability of any given measurement refers to the extent to which it is a consistent measure of a concept, and the alpha coefficient is one way of measuring the strength of that consistency [31,32]. When calculating alpha, we ensured that all items were formulated in the same direction (positively or negatively worded); items worded in the opposite direction were reverse-scored prior to the reliability analysis [31,33]. There were 20 items on the scale, and the scale reliability coefficient (Cronbach's alpha) was 0.7937. However, a single Cronbach's alpha value for all 20 items is not appropriate on its own; thus, Tab. 2 also reports the value for each item.
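As an illustration of the reliability step, the following minimal sketch computes Cronbach's alpha from its textbook definition and reverse-scores a Likert item. The function names and the toy responses are hypothetical and are not the survey data; the sketch is not necessarily the routine used to obtain the reported value of 0.7937.

```python
from statistics import pvariance


def reverse_score(value, low=1, high=5):
    """Reverse a Likert response so that all items point in the same direction."""
    return low + high - value


def cronbach_alpha(responses):
    """Cronbach's alpha for a list of respondents, each a list of item scores."""
    k = len(responses[0])                      # number of items
    items = list(zip(*responses))              # transpose: one tuple per item
    item_var = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in responses])
    return k / (k - 1) * (1 - item_var / total_var)


# Hypothetical 5-point responses (4 respondents, 3 items), for illustration only.
toy = [[4, 5, 4], [2, 3, 2], [5, 5, 4], [3, 3, 3]]
alpha = cronbach_alpha(toy)
```

Because alpha is a ratio of variances, using the population or the sample variance gives the same result as long as the choice is applied consistently.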

Methods of Analysis
We used exploratory factor analysis for data reduction. This approach reduces the number of variables by describing linear combinations of the items that contain most of the information and admit meaningful interpretations.

Exploratory Factor Analysis
Principal component factor (PCF) analysis allows us to transform the statistically correlated variables (20 items) into a few independent factors. We use this approach as an exploratory tool to reduce the information into several components expressing the maximum variance of the data. The meaning and interpretation of the obtained factors are deduced from the initial items with which they are strongly associated. The factoring procedure consists of seeking, within a cloud of points, an axis for which the inertia projected on this axis is maximized [34]. This makes it possible to highlight latent components, considering the total variance of all the initial variables (generating a synthetic quantity that best differentiates the individual behaviors). The latent components are linear combinations of the initial variables. Graphically, the observations are projected on axes, and their new coordinates are the values of the principal components. The highly correlated variables are grouped around an axis that represents a newly constructed factor. The new axes are the eigenvectors, ordered by decreasing eigenvalues, of the covariance matrix of the data. Bartlett's test of sphericity and the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy are used to ensure the validity of the measurement scales [35,36]. In the factoring procedure, we use the polychoric correlation matrix, which assumes that the ordinal items are imperfect measures of underlying latent continuous variables [37]. Determining the number of factors to extract is also a critical decision in exploratory factor analysis [38]. The Kaiser criterion suggests retaining those factors with eigenvalues equal to or higher than 1. However, this criterion is well known for overspecifying the number of factors; that is, it suggests more factors than it should [39]. Parallel analysis [40] adjusts the original eigenvalues for sampling error-induced collinearity among the variables to arrive at adjusted eigenvalues. It compares the randomly generated eigenvalues with those from the original analysis [33,39,41].

Non-Parametric Regression Trees
We use non-parametric regression trees to divide our observation space to analyze the interactions between the individual characteristics and the three obtained factors-health habits, safety behaviors, and anxiety. Regression tree analysis is a machine learning approach that aims to accurately predict the value of the output variable from certain explanatory variables [42]. In fact, a regression tree establishes a hierarchy between the explanatory variables using their contribution to the overall fit of the regression. More exactly, it divides the set of observations into subclasses characterized by their values in terms of their contribution to the overall fit and their prediction for the dependent variable [43]. These values are validated against a fraction (10%) of the sample that is not used in the estimation. The value at which the partitioning is stopped (and the tree cut) is given by the complexity parameter (cp) [44]. A regression tree is flexible and powerful in the clarification of the structure of the observations [42,45,46]. The tree gives a hierarchical sequence of "conditions" on the independent variables of the model: The higher the role of a condition in the classification of the observed cases, the higher its status on the tree. For each condition, the left branch gives the case for which the condition is true, and the right branch gives the case that is compatible with the complementary condition [47][48][49][50].
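To make the partitioning idea concrete, the sketch below searches for the single binary split that most reduces the sum of squared errors of the response, which is the usual least-squares criterion for regression trees. The splitting rule is an assumption here, since the text does not state it explicitly, and the rows and variable names are hypothetical rather than survey data. A full tree repeats this search recursively within each branch until a stopping rule such as the complexity parameter is met.

```python
def sse(values):
    """Sum of squared deviations from the mean (0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(rows, target, features):
    """Return (feature, threshold, sse_after) for the split with the lowest SSE."""
    best = None
    for feat in features:
        levels = sorted({r[feat] for r in rows})
        # Candidate thresholds lie midway between consecutive observed values.
        for lo, hi in zip(levels, levels[1:]):
            thr = (lo + hi) / 2
            left = [r[target] for r in rows if r[feat] < thr]
            right = [r[target] for r in rows if r[feat] >= thr]
            total = sse(left) + sse(right)
            if best is None or total < best[2]:
                best = (feat, thr, total)
    return best


# Hypothetical respondents: the score depends mainly on the "female" indicator.
rows = [
    {"age": 25, "female": 0, "score": 3.0},
    {"age": 45, "female": 0, "score": 3.2},
    {"age": 28, "female": 1, "score": 4.0},
    {"age": 50, "female": 1, "score": 4.2},
]
feature, threshold, remaining_sse = best_split(rows, "score", ["age", "female"])
```

In this toy example the first split falls on the indicator variable rather than on age, because separating the two groups leaves the least unexplained variation in the score.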

Support Vector Machine Regression (SVMR)
In this subsection, the support vector machine regression (SVMR) algorithm is applied to find the behavioral and perceived mental responses of Saudi society. The SVMR algorithm includes a nonlinear mapping of an n-dimensional information space into a high-dimensional component space [51].
The support vector (SV) algorithm is a machine learning algorithm within the framework of statistical learning theory, which has been developed over the last four decades [52]. The SVM algorithm has been used to classify hyperspectral images [53]; the authors compared the performance of this algorithm with existing research. Three algorithms, namely, the MLR, SVR, and artificial neural network (ANN) algorithms, have been applied to predict load performance using hyperspectral data [54]. SVR has also been applied to trickle bed reactors, for which numerous correlations for key design variables exist in the literature [55]. The SVMR model has great potential and superior performance, as has been witnessed in many existing studies. In [56], SVR was used for travel-time forecasting, and the prediction results were compared with those of another baseline travel-time prediction model using real highway traffic data. The SVMR model has also been applied to predict network traffic: reference [57] applied SVR to predict the link load of a network, and the results showed that the SVR model is robust for modeling network traffic.
The main purpose of the SVM algorithm is to classify data. To use the SVM algorithm for regression, we employ an $\varepsilon$-insensitive SVR, whose main idea is to find a function $f(x)$ that deviates by at most $\varepsilon$ from the observed targets $y_i$ for the training data while remaining as flat as possible. The main principle is the same as in SVM classification, but we have a new function to be minimized. In its standard linear form, $f(x) = \langle w, x \rangle + b$, and we must solve the problem
$$\min_{w,\,b} \ \frac{1}{2}\lVert w \rVert^{2}$$
subject to
$$y_i - \langle w, x_i \rangle - b \le \varepsilon, \qquad \langle w, x_i \rangle + b - y_i \le \varepsilon.$$
If the problem is not feasible, we need to introduce the slack variables $\xi_i, \xi_i^{*}$. This is called the soft margin:
$$\min_{w,\,b,\,\xi,\,\xi^{*}} \ \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{n} \left( \xi_i + \xi_i^{*} \right)$$
subject to
$$y_i - \langle w, x_i \rangle - b \le \varepsilon + \xi_i, \qquad \langle w, x_i \rangle + b - y_i \le \varepsilon + \xi_i^{*}, \qquad \xi_i,\ \xi_i^{*} \ge 0,$$
where $n$ is the number of training observations. The constant $C > 0$ determines the trade-off between the flatness of $f(x)$ and the amount up to which deviations larger than $\varepsilon$ are tolerated. This corresponds to the $\varepsilon$-insensitive loss function $|\xi|_{\varepsilon}$:
$$|\xi|_{\varepsilon} = \begin{cases} 0, & |\xi| \le \varepsilon, \\ |\xi| - \varepsilon, & \text{otherwise.} \end{cases}$$
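The soft-margin idea can be illustrated numerically. The sketch below assumes the standard linear formulation $f(x) = \langle w, x \rangle + b$ with a regularization constant `C` and tolerance `eps`; these symbols, the function names, and the toy values are assumptions made for illustration. It evaluates the objective for a given weight vector and does not solve the optimization problem.

```python
def eps_insensitive_loss(residual, eps):
    """Zero inside the eps-tube, linear in the excess deviation outside it."""
    return max(0.0, abs(residual) - eps)


def svr_objective(w, b, xs, ys, C, eps):
    """Soft-margin primal objective: 0.5*||w||^2 + C * sum of eps-insensitive losses."""
    flatness = 0.5 * sum(wj * wj for wj in w)
    penalty = 0.0
    for x, y in zip(xs, ys):
        prediction = sum(wj * xj for wj, xj in zip(w, x)) + b
        penalty += eps_insensitive_loss(y - prediction, eps)
    return flatness + C * penalty


# Toy one-dimensional data (hypothetical): the first point lies inside the tube.
objective = svr_objective(w=[1.0], b=0.0, xs=[[1.0], [2.0]], ys=[1.05, 3.0],
                          C=10.0, eps=0.1)
```

At the optimal slack values, the sum of the two slack variables for an observation equals its $\varepsilon$-insensitive loss, which is why the objective can be written with the loss directly.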

Normalization
The normalization method has been employed to improve time series models. We applied the min-max method to scale the data to the same range to help the SVM regression model obtain an appropriate output. The min-max method maps input data into a predefined range [0,1]:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \left( \mathrm{New}_{\max x} - \mathrm{New}_{\min x} \right) + \mathrm{New}_{\min x}$$
where $x_{\min}$ is the minimum of the data and $x_{\max}$ is the maximum of the data. Furthermore, $\mathrm{New}_{\min x}$ is the minimum number 0 and $\mathrm{New}_{\max x}$ is the maximum number 1.
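A minimal sketch of min-max scaling follows. The function name and the example values are hypothetical, and the handling of a constant column (mapping every value to the lower bound) is an assumption, since the text does not discuss that case.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Map values linearly so the smallest becomes new_min and the largest new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant column: no spread to rescale, so return the lower bound.
        return [new_min for _ in values]
    span = hi - lo
    return [(v - lo) / span * (new_max - new_min) + new_min for v in values]


# Hypothetical monthly incomes in riyals.
scaled = min_max_scale([3000, 9000, 6000])
```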

Evaluation Metrics
Several evaluation metrics have been used to test and evaluate SVMR for predicting health habits, safety behaviors, and anxiety. The evaluation metrics used here were the R-squared value, mean square error (MSE), and root mean square error (RMSE). A short description of these evaluation metrics, along with their formulations, is given below.
Mean Square Error (MSE):
$$\mathrm{MSE} = \frac{1}{n} \sum_{t=1}^{n} \left( x_t - \hat{x}_t \right)^{2}$$
where $x_t$ denotes the observed responses, $\hat{x}_t$ the estimated responses, and $n$ the number of observations.
Root Mean Square Error (RMSE):
$$\mathrm{RMSE} = \sqrt{\mathrm{MSE}}$$
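The three metrics named above can be sketched as follows. MSE and RMSE follow their usual definitions; R-squared is computed here with the conventional coefficient-of-determination formula, which is an assumption because the source's own formulation is not available in this section. The function names and sample values are hypothetical.

```python
from math import sqrt


def mse(observed, estimated):
    """Mean of the squared differences between observed and estimated responses."""
    return sum((o - e) ** 2 for o, e in zip(observed, estimated)) / len(observed)


def rmse(observed, estimated):
    """Square root of the mean square error."""
    return sqrt(mse(observed, estimated))


def r_squared(observed, estimated):
    """Coefficient of determination: 1 - residual SS / total SS (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, estimated))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical observed and estimated factor scores.
obs = [1.0, 2.0, 3.0, 4.0]
est = [1.1, 1.9, 3.2, 3.8]
```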

Results
Regression trees and SVMR techniques were used to determine the health habits, safety behaviors, and anxiety caused by COVID-19 in Saudi Arabian society.

Results of Factor Analysis
The results of the principal component factor analysis are given in Tab. 3. Five factors are initially retained. Factor loadings with absolute values smaller than 0.4 are replaced by blanks. Each principal component (axis) represents a linear combination of a group of interrelated variables having the greatest contribution to the axis. However, Horn's [40] parallel analysis criterion indicates the existence of three components. Furthermore, this result is confirmed using the hierarchical clustering of variables around latent components, which clearly shows the existence of three factors. Thus, the first three axes are retained for the analysis.
The first principal component accounts for the largest share of the variance in the data. The uniqueness is the proportion of a variable's variance that is not explained by the common factors. It should be noted that a greater uniqueness is associated with a lower relevance of the variable in the factor model. Generally, the extracted factors should account for at least 50% of a variable's variance; thus, the uniqueness values should be below 0.5. The KMO measure of sampling adequacy is equal to 0.889, which indicates that the items correlate sufficiently [35]. Bartlett's test of sphericity is significant (p-value < 0.001), which indicates that the items are correlated (i.e., the correlation matrix is not an identity matrix) and are therefore suitable for use in factor analysis. Thus, the results of factor analysis will be appropriate [35]. In addition, the likelihood ratio (LR) test of the independent versus saturated model gives χ²(190) = 4749.39 with a p-value < 0.001. This indicates that the no-factor (independent) model fits the observed correlation matrix significantly worse than the saturated (perfect-fit) model. The first three retained axes correspond to what can readily be interpreted as health habits, safety behaviors, and anxiety. We calculate the new composite scores as the means of the items related to each factor (their continuous values lie between 1 and 5).
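Two of the quantities described here lend themselves to a short sketch: the uniqueness of an item and the composite score of a factor. The sketch assumes orthogonal factors, so that uniqueness equals one minus the sum of squared loadings. The loadings, responses, and item positions are hypothetical and are not taken from Tab. 3.

```python
def uniqueness(loadings):
    """1 minus the communality (sum of squared loadings on the retained factors)."""
    return 1.0 - sum(value * value for value in loadings)


def composite_score(responses, item_indices):
    """Mean of one respondent's answers to the items belonging to a factor."""
    return sum(responses[i] for i in item_indices) / len(item_indices)


# Hypothetical loadings of one item on three retained factors.
u = uniqueness([0.8, 0.1, 0.2])

# Hypothetical respondent; the first two items are assumed to form one factor.
score = composite_score([4, 5, 3, 2], item_indices=[0, 1])
```

Under the 50% rule of thumb quoted above, this illustrative item would be kept, since its uniqueness of about 0.31 is below 0.5.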

Results of the Regression Tree Analysis
The regression trees help us to better understand the interactions between the independent variables and the possible complementarity (or substitutability) that can exist between them in relation to health habits, safety behaviors, or anxiety. In the trees below, we include the five variables (nationality, gender, age, education, and monthly income) as characteristics that are potentially related to individual behaviors. The trees explore which combinations of these characteristics are associated with high (or low) expected values of health habits, safety behaviors, or anxiety. The regression trees select the more relevant variables and are read starting from the root [58]. Fig. 2a shows that the initial split for health habits is made on the gender variable (which plays a predominant role); the left branch, for men, leads to a second split on the average monthly income. Node 2, on the right side of the tree in Fig. 2a, splits the women by age. The highest expected value of the composite factor (health habit) is observed on the right side of the tree: the 44 women who are more than 30 years old have an expected mean E(health habit) = 4.306, whereas the 30 women below 30 years old have an expected mean E(health habit) = 3.785. The second node, on the left side, indicates that the best variable to classify and predict health habits for men is the average monthly income. We observe that the 155 men with a monthly income of more than 10,000 riyals have an expected mean of 3.841. The regression tree in Fig. 2c also provides useful information on the variables used at each split of the main determinants of the safety behavior index. The variables used in the final regression tree were as follows: monthly income, nationality, and age. The first split used to bifurcate the data was the monthly income (Fig. 2b). The persons with an average monthly income above 20,000 riyals were found to have an expected safety behavior index of 3.45 (n = 24 persons). The highest expected values of the composite factor (more than 4) can be observed on the right side of the tree for Saudi citizens with an average monthly income under 20,000 riyals. For persons above 30 years old, the expected mean is E(safety behavior) = 4.167 (n = 175 persons). For Saudi citizens aged under 30 years old and with an average monthly income above 13,500 riyals, E(safety behavior) = 4.511.

Results of the SVMR Model
This section presents and describes the results of the SVMR model to predict health habits, safety behaviors, and anxiety from COVID-19 in Saudi Arabia. We have considered five independent variables, which are as follows: age, income, education, gender, and nationality. The dependent variables are health habits, safety behaviors, and anxiety scores. The min-max method was employed to normalize the data.
Tab. 4 summarizes the empirical results for the health habits factor with the independent variables. The analysis results show that income, gender, and nationality have a strong positive relationship with health habits, with R² values of 99.59%, 95.55%, and 95.55%, respectively. Figs. 3a-3d show the performance of the SVMR model for estimating the health habits factor related to COVID-19.
Tab. 5 presents the results of the SVMR model for the correlation between the independent variables of age, income, education, and nationality and the dependent safety behavior factor. The prediction results indicate that the income and age variables are more strongly associated with the safety behavior factor in relation to COVID-19. The regression result of income with the safety behavior factor is R² = 99.93%, whereas that of age with the safety behavior factor is R² = 92.80%. Figs. 4a-4e illustrate the performance of SVMR for estimating the safety behavior factor related to COVID-19. It is concluded that the income and age variables have a strong relationship with the safety behavior factor.
Tab. 6 summarizes the output of the SVMR model for the relationship between the independent variables and the dependent anxiety factor related to COVID-19 in Saudi society. It shows that the independent variables are positively related to the anxiety factor. The strongest relationships are those of the income and education variables with the anxiety factor, with R² values of 99.88% and 87.87%, respectively. Figs. 5a-5e show the performance of the SVMR model in relating the independent variables, such as age, income, education, and nationality, to the anxiety factor (dependent variable) in relation to COVID-19. We conclude that income and education have a positive influence on the anxiety factor in Saudi society during the COVID-19 outbreak.

Discussion
This paper provides insights into the correlation between health habits, safety behaviors, and anxiety on the one hand and age, gender, nationality, income, and citizenship factors on the other. It investigates individual behaviors during the COVID-19 pandemic in Saudi Arabia. Which combinations of the characteristics favor the psycho-behavioral responses to COVID-19 in Saudi Arabia? We used regression trees to answer this question. The regression tree analysis was performed to identify relevant discriminating factors (age, gender, nationality, income, and citizenship) that affect individual behavior during the COVID-19 pandemic. The scores calculated using principal factor analysis were used as continuous variables in our analysis.
For each index, Tab. 7 gives the combinations of characteristics that correspond to the lowest and highest average expected scores. This table summarizes the paths as they appear in the regression trees. A score is considered high if its expected value is more than 4 and low if its mean is less than 3. The numbers in parentheses give the order of importance (level) of the corresponding factors. It should be noted that women over 30 years of age have a high expected mean for the health habit index, whereas they have a somewhat high average of anxiety if they are under 30 years old. Women spend a lot of time shopping, and their food purchases increase during the quarantine period. They spend a lot of time keeping track of coronavirus updates via news channels and social media. Furthermore, women feel afraid of infection with the coronavirus whenever they feel similar symptoms. However, men with an average monthly income of less than 10,500 Saudi riyals and a university education show a somewhat high average health habit score. The health habit index consists of nine items (q12 to q20) that focus on the extent of behavioral change toward the commitment to use the health and protection methods related to COVID-19. We also note that people aged over 30 years have the lowest expected score for anxiety (E(Y) = 2.325, n = 175) but the highest scores for health habit (E(Y) = 4.306, n = 44) and safety behaviors (E(Y) = 4.167, n = 175). Participant responses indicate that the respondents have more knowledge of COVID-19 and the methods of prevention. Quarantine has not affected their behavior in general, but they avoid leaving the house frequently because of fear of infection and to support public health. The responses indicate that respondents do not leave the house frequently because of the mandatory quarantine. However, the findings show that only nine respondents aged below 30 years have higher expected scores for safety behaviors. Moreover, Fig. 2 highlights that the average monthly income plays predominant and contrasting roles in health habits, safety behaviors, and anxiety during the COVID-19 pandemic period. Saudi citizens have a higher predicted average score of safety behaviors than other residents do.
The results indicate that 87.8% of the respondents were experiencing mild anxiety and 12.2% moderate anxiety during the COVID-19 outbreak, where the predicted means were between 3 and 4. Unfortunately, the results do not show whether economic effects and effects on daily life, or delays in economic activities, are positively associated with anxiety symptoms. Social support may be negatively correlated with the level of anxiety regarding the COVID-19 epidemic [59]. Similarly, the place of residence and the source of income affect the anxiety level resulting from COVID-19 [59]. The respondents' anxiety may be related to the place of residence, the source of income, and the effect of the coronavirus on their employment or studies [60][61][62]. The differences between individuals' behaviors could potentially be explained by the imbalance of economic, cultural, and educational resources in urban versus rural areas. Some regions are relatively prosperous and provide citizens with more sanitary conditions, which are better in cities than in towns and villages [63,64]. The stability of family income is a factor in experiencing anxiety during the coronavirus period, which could be explained by increased psychological and economic pressure [60]. Because of the outbreak, some families will lose their source of income, and individuals might feel anxious about their financial obligations and daily expenses.

Conclusion
The coronavirus pandemic has had an extensive psychological impact on the Saudi Arabian population. This paper assessed empirical evidence on the role of sociodemographic characteristics (gender, age, nationality, education, and average monthly income) in psycho-behavioral responses in Saudi Arabia. Regression tree and SVMR model analyses were performed to identify relevant discriminating factors (age, gender, nationality, income, and citizenship) that affect individual behavior during the COVID-19 pandemic. This permitted a better understanding of the interactions between the independent factors and the possible complementarity (or substitutability) that can exist between them in relation to health habits, safety behaviors, or anxiety scores. The scores calculated using PCF analysis were used as continuous variables in our analysis. According to the regression tree analysis, the age variable plays a predominant role in health habits, safety behaviors, and anxiety, whereas the SVMR model demonstrated that the income variable has a strong and positive relationship with the health habits, safety behaviors, and anxiety factors. The health habit index, which focuses on the extent of behavioral change toward the commitment to use the health and protection methods related to COVID-19, is highly affected by the gender and age factors. We found that women over 30 years of age had a high expected mean on the health habit index, whereas women under 30 years old had a medium average for anxiety. The average monthly income was also a relevant factor, but it had contrasting effects on health habits, safety behaviors, and anxiety during the COVID-19 pandemic period.
This study could serve as a basis for future in-depth research papers on the effect of COVID-19 on MENA countries, where a woefully limited number of studies has been published. Despite the robust findings in the current study and the importance of the topic related to the analysis of the effect of COVID-19 on individual behavior, which is under-represented in the literature, we note several limitations of our research. Specifically, this study only included people who had smartphones, email addresses, and the ability to speak Arabic and English. Saudi Arabia has different communities that do not speak Arabic or English, including Indian, Bangladeshi, and Indonesian communities, and they were not represented. As another limitation, most of the respondents were relatively educated; this study could be improved by including non-educated people to increase awareness and mental health care in all Saudi communities in future possible epidemics. As a final limitation, the respondents of the current study mostly belonged to the Saudi nationality; thus, including other nationalities in future research could enhance the mental healthcare systems.

Funding Statement:
The author(s) received no specific funding for this study.

Conflicts of Interest:
The authors declare that they have no conflicts of interest to report regarding the present study.