Deep Learning Based Face Mask Detection in Religious Mass Gathering During COVID-19 Pandemic

Notwithstanding the religious intentions of billions of devotees, religious mass gatherings raised major public health concerns, since they were likely to become super-spreading events for severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2). Most attendees ignored preventive measures, namely maintaining physical distance, practising hand hygiene, and wearing face masks. Wearing a face mask in public areas helps protect people from the spread of COVID-19. Artificial intelligence (AI) based on deep learning (DL) and machine learning (ML) could assist in fighting COVID-19 in several ways. This study introduces a new Deep Learning-based Face Mask Detection in Religious Mass Gathering (DLFMD-RMG) technique for the COVID-19 pandemic. The DLFMD-RMG technique focuses mainly on detecting face masks in a religious mass gathering. To accomplish this, the presented DLFMD-RMG technique applies two pre-processing stages: Bilateral Filtering (BF) and Contrast Enhancement. For face detection, the DLFMD-RMG technique uses YOLOv5 with a ResNet-50 detector. In addition, the face detection performance is improved by using the seeker optimization algorithm (SOA) to tune the hyperparameters of the ResNet-50 module, which constitutes the novelty of the work. Finally, the faces with and without masks are classified using the Fuzzy Neural Network (FNN) model. The simulation study of the DLFMD-RMG algorithm is carried out on a benchmark dataset. The results highlight the remarkable performance of the DLFMD-RMG algorithm over other recent approaches.


Introduction
Religious events that gather an enormous number of practitioners are significant ritualized features of a religion. A massive gathering event (MGE) is an assembly of many people attending an organized event within a defined space [1], and religious events are no exception. MGEs have large public health consequences. Public health concerns during such events encompass a range of problems: the spread of food-, air-, and water-borne infections, exacerbation of non-communicable diseases, vector-borne infectious diseases, accidents, health consequences of alcohol and substance abuse, stampedes, and worsening of mental health problems [2]. A mass gathering is characterized by the concentration of people at a particular place for a particular purpose over a set time, which can potentially strain the response and planning resources of the host community or country.
A mass gathering may occur as a single event or as a group of several events at different places [3]. It might be private or public, spontaneous or planned, one-off or recurrent, and of different durations and sizes. The range of mass gatherings is wide, from music, sports, religious, entertainment, or business events to large meetings and conferences. Some health interventions, such as mass drug administrations or immunization campaigns, are also considered mass gatherings [4]. Mass gatherings include high-visibility events, which are frequently associated with international travel, large participation, extended media coverage, prolonged duration, and many venues (multiple host countries). High-visibility events are also often associated with an increased frequency of small private gatherings (at home, in streets, restaurants, bars, etc.), which pose a further difficulty since they are less controlled [5]. During the COVID-19 pandemic, mass gatherings could be closely related to a high risk of spreading SARS-CoV-2; they can also strain the response and planning resources of the host community or country and have destructive effects on health services [6].
Organizing a crowded event frequently involves complicated tasks; organizing one during a pandemic or an ongoing crisis such as COVID-19, which requires adherence to numerous constraints, is even more demanding [7]. Also, many uncontrolled crowded events take place with or without the participation of regulatory bodies (the government of the region or country). Besides frequent hand washing and physical distancing, the appropriate use of face masks has become a pillar of preventing community transmission of the disease [8]. The purpose is to protect oneself from infection and to avoid spreading the virus. Policymakers now face several risks and challenges in dealing with the transmission and spread of COVID-19. Laws and rules emerged in response to the considerable growth of cases and deaths in several regions [9,10]. But monitoring massive groups of people has become increasingly complex.
This study introduces a new Deep Learning Based Face Mask Detection in Religious Mass Gathering (DLFMD-RMG) technique for the COVID-19 pandemic. The DLFMD-RMG technique focuses mainly on detecting face masks in religious mass gatherings. To accomplish this, the presented DLFMD-RMG technique applies two pre-processing stages: Bilateral Filtering (BF) and Contrast Enhancement. For face detection, the DLFMD-RMG technique uses YOLOv5 with a ResNet-50 detector. In addition, the face detection performance is improved by using the seeker optimization algorithm (SOA) to tune the hyperparameters of the ResNet-50 model. Finally, the faces with and without masks are classified using the Fuzzy Neural Network (FNN) model. The simulation study of the DLFMD-RMG model is carried out on benchmark datasets.

Literature Review
In [11], a single shot multibox detector and MobileNetV2 (SSDMNV2) approach is proposed for the detection of face masks, in which MobileNetV2, TensorFlow, Keras, and the OpenCV Deep Neural Network (DNN) module are used for image classification. The OpenCV DNN employed in SSDMNV2 contains an SSD with ResNet-10 as the backbone and can identify faces from many angles, while MobileNetV2 offers lightweight and precise estimates for classification. In [12], a hybrid method using deep and traditional ML for face mask detection is presented. This method has two components, namely feature extraction and classification.
Gupta et al. [13] introduce a mask detector that uses an ML facial classification mechanism to determine whether a mask is worn. It is linked to a closed-circuit television (CCTV) system to verify that only individuals wearing masks are allowed in. Asif et al. [14] propose using DL to identify face masks in video automatically. The presented framework has two components. First, faces are detected and tracked using ML and OpenCV; then, the facial frames are passed to a deep transfer learning (DTL) MobileNetV2 model to identify the mask region. In [15], an IoT-based smart door uses an ML technique to detect face masks and monitor body temperature. This method can be used at apartment entrances, shopping malls, hotels, etc.
Loey et al. [16] aim to localize and annotate face mask objects in real-time images. Wearing a face mask in public places protects individuals from transmitting the disease to one another. The presented method has two components. The first performs feature extraction based on the ResNet50 DTL model, while the second detects face masks based on YOLO v2. Singh et al. [17] suggest a method that draws bounding boxes (green or red) around the faces of individuals, depending on whether the individual is wearing a mask or not. The authors also compare the performance of both methods in terms of inference time and precision rate.

The Proposed Model
In this study, a new DLFMD-RMG technique has been developed. The DLFMD-RMG technique focuses mainly on detecting face masks in religious mass gatherings. The DLFMD-RMG technique applies two pre-processing stages: the BF technique and Contrast Enhancement. It then involves two processes: face detection and face mask classification. Fig. 1 depicts the overall flow of the DLFMD-RMG system.
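The two pre-processing stages can be illustrated with a minimal pure-Python sketch on a toy grayscale image. The filter radius, the two sigma values, and the use of min-max stretching as the contrast enhancement step are assumptions made for illustration; the text does not state which parameters or which contrast enhancement method the authors used.

```python
import math


def bilateral_filter(img, radius=1, sigma_s=1.0, sigma_r=25.0):
    """Edge-preserving smoothing of a 2-D grayscale image (list of rows)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            num = den = 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    yy, xx = y + dy, x + dx
                    if 0 <= yy < h and 0 <= xx < w:
                        # Spatial weight: nearby pixels count more.
                        ws = math.exp(-(dx * dx + dy * dy) / (2 * sigma_s ** 2))
                        # Range weight: pixels of similar intensity count more.
                        diff = img[yy][xx] - img[y][x]
                        wr = math.exp(-(diff * diff) / (2 * sigma_r ** 2))
                        num += ws * wr * img[yy][xx]
                        den += ws * wr
            out[y][x] = num / den  # den >= 1 because the centre pixel has weight 1
    return out


def stretch_contrast(img, lo=0.0, hi=255.0):
    """Min-max contrast stretching (an assumed stand-in for the paper's method)."""
    flat = [p for row in img for p in row]
    mn, mx = min(flat), max(flat)
    if mx == mn:
        return [[lo for _ in row] for row in img]
    scale = (hi - lo) / (mx - mn)
    return [[lo + (p - mn) * scale for p in row] for row in img]


# Toy 4x4 image with a sharp vertical edge between a dark and a bright half.
image = [[50, 52, 200, 198],
         [51, 49, 201, 199],
         [50, 51, 199, 200],
         [49, 50, 200, 201]]
smoothed = bilateral_filter(image)
enhanced = stretch_contrast(smoothed)
```

Because the range weight is close to zero across the edge, the dark and bright halves are smoothed separately and the edge itself stays sharp, which is the property that makes bilateral filtering attractive before face detection.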

Face Detection Module
For face detection, the DLFMD-RMG technique uses YOLOv5 with a ResNet-50 detector. YOLO is a target detection technique with high accuracy and fast detection. The base YOLO model processes images at 45 frames per second [18]. The YOLO network has 24 convolution layers and 2 fully connected (FC) layers. Alternating 1 × 1 convolution layers reduce the feature space from the preceding layers. YOLO pre-trains the convolution layers on the ImageNet classification task at half the resolution (224 × 224 input images) and then doubles the resolution for detection. Several YOLO versions have been released. The YOLO v3 family involves three models: YOLO v3-SPP, YOLO v3, and YOLO v3-tiny; the YOLO v4 family involves four models: YOLO v4m-mish, YOLO v4s-mish, YOLO v4x-mish, and YOLO v4l-mish; the YOLO v5 family involves YOLO v5s, YOLO v5n, YOLO v5x, and YOLO v5l.
Three approaches are chosen for the study: YOLO v4s-mish, YOLO v5s, and YOLO v3. The reasons are as follows. First, the three models produce test result graphs with the same indicators, which makes them suitable for analysis and comparison. Second, the three methods are lightweight and appropriate for target recognition in small scenes and on small and medium datasets. Third, the three approaches are built on the PyTorch DL framework developed by Facebook and achieve good results in target recognition.
The detection network is a convolutional neural network (CNN) that comprises convolution, transform, and output layers. The transform layer extracts the activations of the convolution layers and improves the stability of the DNN. The output layer generates the locations of the bounding boxes.
In this work, ResNet50 is employed as a deep transfer model for extracting features [19]. ResNet is a type of DTL model based on residual learning. ResNet50 has sixteen residual bottleneck blocks; every block has convolutions of size 1 × 1, 3 × 3, and 1 × 1 with feature maps (64, 128, 256, 512, 1024). The loss function of YOLO v5 measures the mean squared error (MSE) between the target and predicted bounding boxes. Its localization term involves the height (h) and width (w) of the bounding boxes in the grid cells.
In this loss, $q_1$ is a weight, $g$ is the number of grid cells, and $v$ is the number of bounding boxes in $g$; $(x_i, y_i)$ is the center of $v$ in $g$; $(w_i, h_i)$ are the width and height of $v$ in $g$; $(\hat{x}_i, \hat{y}_i)$ is the center of the target in every $g$; $(\hat{w}_i, \hat{h}_i)$ are the width and height of the target in every $g$; and $\mathbf{1}_{ij}^{\text{obj}}$ is 1 when there is an object in $v$ in $g$ and 0 otherwise. Once an object is identified in the $v$ bounding boxes of $g$, the confidence loss measures the error of the confidence score.

In the confidence loss, Eq. (3), $q_2$ and $q_3$ are the weights of the confidence error, $cs_i$ is the confidence score of $v$ in $g$, $\mathbf{1}_{ij}^{\text{obj}}$ is 1 when there is an object in $v$ in $g$ and 0 otherwise, and $\mathbf{1}_{ij}^{\text{noobj}}$ is 1 when there is no object in $v$ in $g$ and 0 otherwise. The classification loss is determined by Eq. (4), in which $q_4$ is the weight of the classification error, and $p_i(z)$ and $\hat{p}_i(z)$ denote the estimated and actual conditional class probabilities of an object in a grid cell.
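The loss expressions themselves did not survive in the text. As a hedged reconstruction, the standard YOLO-style sum-of-squared-errors loss can be written with the symbols defined above. The square roots on width and height, the summation limits, and the target confidence $\widehat{cs}_i$ are assumptions taken from the usual YOLO formulation rather than from this paper; the second and third lines correspond to what the text calls Eq. (3) and Eq. (4).

```latex
\begin{align*}
L_{\text{loc}} &= q_1 \sum_{i=1}^{g} \sum_{j=1}^{v} \mathbf{1}_{ij}^{\text{obj}}
   \Big[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2
       + \big(\sqrt{w_i} - \sqrt{\hat{w}_i}\big)^2
       + \big(\sqrt{h_i} - \sqrt{\hat{h}_i}\big)^2 \Big] \\
L_{\text{conf}} &= q_2 \sum_{i=1}^{g} \sum_{j=1}^{v} \mathbf{1}_{ij}^{\text{obj}}
   \big( cs_i - \widehat{cs}_i \big)^2
   + q_3 \sum_{i=1}^{g} \sum_{j=1}^{v} \mathbf{1}_{ij}^{\text{noobj}}
   \big( cs_i - \widehat{cs}_i \big)^2 \\
L_{\text{cls}} &= q_4 \sum_{i=1}^{g} \mathbf{1}_{i}^{\text{obj}}
   \sum_{z \in \text{classes}} \big( p_i(z) - \hat{p}_i(z) \big)^2 \\
L &= L_{\text{loc}} + L_{\text{conf}} + L_{\text{cls}}
\end{align*}
```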
In addition, the face detection performance is improved by using the SOA to tune the hyperparameters of the ResNet-50 model. The SOA is built on an extensive analysis of human search behaviours [20]. The search direction is determined through an "experience gradient", and uncertainty reasoning is used to determine the search step length.
The SOA has three major updating steps. Here, $i$ refers to the $i$-th individual searcher and $j$ to the dimension of the individual; $s$ is the total number of individuals; $D$ is the total number of dimensions; $t$ is the current generation, and $iter_{\max}$ is the maximum number of generations. $x_{ij}(t)$ and $x_{ij}(t+1)$ are the searcher positions at generations $t$ and $t+1$, respectively.
The forward direction of the search is determined using the experience gradient obtained from the individual's own movement and from the assessment of the previous locations of other searchers. The preemptive direction $f_{i,p}(t)$, egoistic direction $f_{i,e}(t)$, and altruistic direction $f_{i,a}(t)$ of the $i$-th individual in each dimension can then be obtained.
The searcher uses a random weighted average of these to obtain the search direction.
In Eq. (6), $t_1, t_2 \in \{t, t-1, t-2\}$; $x_i(t_1)$ and $x_i(t_2)$ denote the best and the worst positions among $\{x_i(t-2), x_i(t-1), x_i(t)\}$; $g_{i,best}$ is the historical best position of the neighbourhood in which the $i$-th searcher is located; $p_{i,best}$ is the best position found so far by the $i$-th searcher; $\varphi_1$ and $\varphi_2$ are random numbers between zero and one; and $\omega$ is the inertia weight.
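The direction expressions are missing from the text. A sketch of the form commonly given for the SOA, written with the document's direction symbols, is shown below. It is an assumption that the authors used exactly this form; $\varphi_1$ and $\varphi_2$ denote the two random numbers in $[0, 1]$ and $\omega$ the inertia weight.

```latex
\begin{align*}
f_{i,e}(t) &= p_{i,best} - x_i(t) \\
f_{i,a}(t) &= g_{i,best} - x_i(t) \\
f_{i,p}(t) &= x_i(t_1) - x_i(t_2) \\
f_i(t) &= \operatorname{sign}\!\big( \omega\, f_{i,p}(t) + \varphi_1 f_{i,e}(t) + \varphi_2 f_{i,a}(t) \big)
\end{align*}
```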
The SOA relies on the approximation capability of fuzzy reasoning. Fuzzy systems express natural human language in a computable form and thereby simulate the behaviour of human intelligence. When the search rule is expressed as a fuzzy rule, the algorithm adapts better to the optimization problem: a smaller fitness value corresponds to a smaller search step length. The membership function of the step length is

$$\mu(\alpha) = e^{-\alpha^2 / (2\delta^2)} \qquad (7)$$

In Eq. (7), $\alpha$ and $\delta$ are the parameters of the membership function. The probability of the output exceeding $[-3\delta, 3\delta]$ is less than 0.0111, so $\mu_{\min} = 0.0111$. Usually, the best position of an individual has $\mu_{\max} = 1.0$, and the worst position has 0.0111. A function of the rank is chosen as the fuzzy variable for "smaller" objective function values. In it, $\mu_{ij}$ is defined using the above formula, and $I_i$ is the rank of the current individual in the sequence $X(t)$ ordered from higher to lower values. The function $\mathrm{rand}(\mu_i, 1)$ returns a real number within $[\mu_i, 1]$. Eq. (8) simulates the random search behaviour of humans.
The step length of the $j$-th dimension of the search space is then defined.
In Eq. (10), $\delta_{ij}$ is the parameter of the Gaussian membership function. Here, $\omega$ is the inertia weight, which decreases linearly from 0.9 to 0.1 as the generations proceed, and $x_{\min}$ and $x_{\max}$ are the variables with the minimal and maximal function values.
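The expressions for the membership values and the step length are missing from the text. One reconstruction consistent with the surrounding definitions is sketched below. The rank-based form of $\mu_i$ and the form of $\delta_{ij}$ follow common SOA descriptions and are assumptions; the step length is obtained by inverting Eq. (7), which gives exactly $3\delta$ at $\mu_{\min} = 0.0111$ (some SOA descriptions omit the factor 2 under the root). The linear schedule for $\omega$ is one choice that matches the stated range of 0.9 to 0.1.

```latex
\begin{align*}
\mu_i &= \mu_{\max} - \frac{s - I_i}{s - 1}\,\big(\mu_{\max} - \mu_{\min}\big) \\
\mu_{ij} &= \mathrm{rand}(\mu_i, 1) \\
\alpha_{ij} &= \delta_{ij}\,\sqrt{-2 \ln \mu_{ij}} \\
\delta_{ij} &= \omega\,\big| x_{\min,j} - x_{\max,j} \big| \\
\omega(t) &= 0.9 - (0.9 - 0.1)\,\frac{t}{iter_{\max}}
\end{align*}
```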
After the step length and direction are obtained, the position is updated as

$$x_{ij}(t+1) = x_{ij}(t) + \alpha_{ij}(t)\, f_{ij}(t), \qquad i = 1, 2, \ldots, s; \; j = 1, 2, \ldots, D,$$

where $f_{ij}(t)$ and $\alpha_{ij}(t)$ are, respectively, the search direction and step size of the searcher at time $t$.
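The update steps described above can be put together in a short, self-contained sketch. It is not the authors' implementation: the `sphere` objective is a hypothetical stand-in for the detector's error rate, the global best is used in place of a neighbourhood best, the proactive direction uses only the two most recent positions, and the population size, bounds, and seed are illustrative assumptions.

```python
import math
import random


def sphere(x):
    """Toy objective; a hypothetical stand-in for a validation error rate."""
    return sum(v * v for v in x)


def sign(v):
    return (v > 0) - (v < 0)


def soa_minimize(f, dim=2, s=10, iter_max=50, lo=-5.0, hi=5.0, seed=0):
    rng = random.Random(seed)
    mu_max, mu_min = 1.0, 0.0111
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(s)]
    prev = [p[:] for p in pos]                # positions one generation back
    pbest = [p[:] for p in pos]               # best position of each searcher
    pbest_val = [f(p) for p in pos]
    k = min(range(s), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[k][:], pbest_val[k]
    history = [gbest_val]

    for t in range(iter_max):
        omega = 0.9 - 0.8 * t / iter_max      # inertia weight, from 0.9 toward 0.1
        vals = [f(p) for p in pos]
        # Sort from highest to lowest value, so the best searcher gets rank s.
        order = sorted(range(s), key=lambda i: vals[i], reverse=True)
        rank = {idx: r + 1 for r, idx in enumerate(order)}
        x_max, x_min = pos[order[0]], pos[order[-1]]
        new_pos = []
        for i in range(s):
            mu_i = mu_max - (s - rank[i]) / (s - 1) * (mu_max - mu_min)
            # Better and worse of the searcher's two most recent positions.
            if vals[i] <= f(prev[i]):
                better, worse = pos[i], prev[i]
            else:
                better, worse = prev[i], pos[i]
            x_new = []
            for j in range(dim):
                f_pro = better[j] - worse[j]           # proactive direction
                f_ego = pbest[i][j] - pos[i][j]        # egoistic direction
                f_alt = gbest[j] - pos[i][j]           # altruistic direction
                direction = sign(omega * f_pro
                                 + rng.random() * f_ego
                                 + rng.random() * f_alt)
                mu_ij = min(1.0, rng.uniform(mu_i, 1.0))
                delta = omega * abs(x_min[j] - x_max[j])
                alpha = delta * math.sqrt(-2.0 * math.log(mu_ij))  # step length
                x_new.append(min(hi, max(lo, pos[i][j] + alpha * direction)))
            new_pos.append(x_new)
        prev, pos = pos, new_pos
        for i in range(s):
            v = f(pos[i])
            if v < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], v
            if v < gbest_val:
                gbest, gbest_val = pos[i][:], v
        history.append(gbest_val)
    return gbest, gbest_val, history


best_x, best_val, history = soa_minimize(sphere)
```

For hyperparameter tuning, each searcher position would encode a candidate hyperparameter vector and `f` would train and evaluate the detector, which is far more expensive than the toy objective used here.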
The SOA derives a fitness function aimed at achieving better classification performance. It defines a positive value that characterizes the performance of a candidate solution. The classification error rate, which is to be minimized, is taken as the fitness function.
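The fitness expression is not preserved in the text. A minimal sketch, assuming the usual definition of the error rate as the percentage of misclassified samples, is given below; the function name and the label lists are hypothetical.

```python
def error_rate_fitness(y_true, y_pred):
    """Percentage of misclassified samples; lower values are better."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    wrong = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return 100.0 * wrong / len(y_true)


# Hypothetical labels: 1 = with mask, 0 = without mask.
y_true = [1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 0, 1, 1]
fitness = error_rate_fitness(y_true, y_pred)  # 2 of 8 samples are wrong
```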

Face Mask Classification Model
In this study, faces with and without masks are classified using the FNN model. The NN is an effective and dynamic approach to supervised learning [21]. The FNN comprises input, membership function (MF), rule, and output layers. Fig. 2 represents the framework of the FNN. The dimension of the input vector $X = [x_1, x_2, \ldots, x_n]$ determines the number of neurons in the input layer. The MF layer calculates the membership degrees of the inputs, and every neuron in this layer defines a linguistic variable. The MF $F_{ij}$ gives the membership degree of input $i$ in fuzzy set $j$, where $c_{ij}$ and $\sigma_{ij}^2$ define the center and width of the Gaussian membership function of $x_i$. Gaussian MFs are the basis for linking fuzzy systems and radial basis function networks (RBFN). The output of this function is smooth and continuous and takes a maximal value of one. The multivariate Gaussian function is constructed as the product of univariate ones. In the rule layer of the FNN, every neuron represents a fuzzy rule with a resultant activation degree $A_j$; here, $A_j$ denotes the normalized value, and $r$ is the number of neurons in the rule layer. The output layer computes the outcome $Y$ from the activation degrees $A_j$ and the weights $w_j$ connecting the rule and output layers. During FNN learning, the main task is to train the weights $w_j$ connecting the rule and output layers and the widths $\sigma_{ij}^2$ and centers $c_{ij}$ of the Gaussian membership functions of $x_i$. Gradient descent is the most widely used technique for optimizing these parameters. It minimizes a function by iteratively moving in the direction of steepest descent, given by the negative of the gradient, and it is used here to update the model parameters.
In contrast to a deep belief network (DBN) fine-tuned by back-propagation (BP), the FNN gains several benefits from its supervised learning structure. The most important benefit is that the weights between the rule and output layers are updated by applying the BP technique only once, and the parameters of the membership layer are likewise updated once. In particular, BP in the FNN does not need to be applied repeatedly, which effectively avoids the gradient diffusion issue of DBN fine-tuning.
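The forward pass through the four FNN layers can be sketched as follows. The Gaussian form $F_{ij} = \exp(-(x_i - c_{ij})^2 / \sigma_{ij}^2)$, the product rule, and the sum normalization are assumptions based on the verbal description, since the expressions are not preserved; the function name and all numeric values are hypothetical.

```python
import math


def fnn_forward(x, centers, widths, weights):
    """Forward pass of a four-layer fuzzy neural network (sketch).

    x       : list of n inputs
    centers : centers[i][j] = c_ij, center of fuzzy set j for input i
    widths  : widths[i][j]  = sigma_ij^2, squared width of the same set
    weights : weights[j]    = w_j, rule-to-output weight
    """
    n, r = len(x), len(weights)
    # Membership layer: Gaussian degree of input i in fuzzy set j.
    F = [[math.exp(-((x[i] - centers[i][j]) ** 2) / widths[i][j])
          for j in range(r)] for i in range(n)]
    # Rule layer: product of memberships across inputs, one value per rule.
    raw = []
    for j in range(r):
        prod = 1.0
        for i in range(n):
            prod *= F[i][j]
        raw.append(prod)
    total = sum(raw)
    if total == 0.0:
        raise ValueError("all rule activations underflowed to zero")
    A = [a / total for a in raw]  # normalized activation degrees
    # Output layer: weighted sum of the normalized activations.
    y = sum(w * a for w, a in zip(weights, A))
    return y, A


# Hypothetical two-input, two-rule network.
x = [0.2, 0.8]
centers = [[0.0, 1.0], [1.0, 0.0]]
widths = [[0.5, 0.5], [0.5, 0.5]]
weights = [1.0, 0.0]
y, A = fnn_forward(x, centers, widths, weights)
```

With these values the input lies much closer to the first rule's centers, so the first normalized activation dominates and the output is close to the first rule's weight.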

Results and Discussion
The face mask detection results of the DLFMD-RMG model are investigated using a dataset [22] comprising 1000 images, described in Table 1. Fig. 3 illustrates some sample images with and without masks. The proposed model is simulated using Python 3.6.5. The experiments are run on a PC with an i5-8600K processor, a GeForce 1050Ti 4 GB GPU, 16 GB RAM, a 250 GB SSD, and a 1 TB HDD. The parameter settings are: learning rate 0.01, dropout 0.5, batch size 5, epoch count 50, and ReLU activation.
The confusion matrices obtained by the DLFMD-RMG model for the identification of face masks in mass gatherings are shown in Fig. 4. The figures show that the DLFMD-RMG model has categorized the face mask images appropriately. Table 2 offers the face mask recognition outcomes of the DLFMD-RMG technique on 80% training (TR) data and 20% testing (TS) data. Fig. 5 exhibits the brief face mask recognition results of the DLFMD-RMG system on 80% of TR data. The DLFMD-RMG model has recognized images 'with mask' with $accu_y$ of 97.88%, $prec_n$ of 97.97%, $reca_l$ of 97.72%, $F_{score}$ of 97.85%, and MCC of 95.75%. Moreover, the DLFMD-RMG technique has identified images 'without mask' with $accu_y$ of 97.88%, $prec_n$ of 97.78%, $reca_l$ of 98.02%, $F_{score}$ of 97.90%, and MCC of 95.75%. Fig. 6 shows a brief face mask detection outcome of the DLFMD-RMG method on 20% of TS data. The DLFMD-RMG method has identified images 'with mask' with $accu_y$ of 98.50%, $prec_n$ of 99.04%, $reca_l$ of 98.10%, $F_{score}$ of 98.56%, and MCC of 97%. Furthermore, the DLFMD-RMG method has identified images 'without mask' with $accu_y$ of 98.50%, $prec_n$ of 97.92%, $reca_l$ of 98.95%, $F_{score}$ of 98.43%, and MCC of 97%.
Table 3 offers the face mask recognition outcomes of the DLFMD-RMG system on 70% of TR and 30% of TS data. Fig. 7 shows a brief face mask detection outcome of the DLFMD-RMG system on 70% of TR data. The DLFMD-RMG system has identified images 'with mask' with $accu_y$ of 95.86%, $prec_n$ of 99.11%, $reca_l$ of 92.80%, $F_{score}$ of 95.85%, and MCC of 91.92%. Furthermore, the DLFMD-RMG method has identified images 'without mask' with $accu_y$ of 95.86%, $prec_n$ of 92.82%, $reca_l$ of 99.12%, $F_{score}$ of 95.86%, and MCC of 91.92%. Fig. 8 demonstrates a brief face mask detection outcome of the DLFMD-RMG system on 30% of TS data. The DLFMD-RMG method has identified images 'with mask' with $accu_y$ of 96.33%, $prec_n$ of 97.76%, $reca_l$ of 94.24%, $F_{score}$ of 95.97%, and MCC of 92.66%. Furthermore, the DLFMD-RMG technique has identified images 'without mask' with $accu_y$ of 96.33%, $prec_n$ of 95.18%, $reca_l$ of 98.14%, $F_{score}$ of 96.64%, and MCC of 92.66%.
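To show how the reported measures relate to a confusion matrix, the sketch below computes accuracy, precision, recall, F-score, and MCC from four counts. The counts are not read from Fig. 4: they are back-calculated under the assumption that the 20% test split contains 200 of the 1000 images, and they happen to reproduce the percentages reported for that split.

```python
import math


def binary_metrics(tp, fn, fp, tn):
    """Standard binary classification measures from confusion-matrix counts."""
    total = tp + fn + fp + tn
    accuracy = (tp + tn) / total
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_score = 2 * precision * recall / (precision + recall)
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_score": f_score, "mcc": mcc}


def as_percent(metrics):
    return {name: round(100 * value, 2) for name, value in metrics.items()}


# Assumed counts for the 20% test split (200 images), 'with mask' as positive.
with_mask = as_percent(binary_metrics(tp=103, fn=2, fp=1, tn=94))
# Same matrix viewed with 'without mask' as the positive class.
without_mask = as_percent(binary_metrics(tp=94, fn=1, fp=2, tn=103))
```

Accuracy and MCC are identical for the two classes because both are symmetric in the choice of the positive class, which explains why the text reports the same values for 'with mask' and 'without mask'.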
A brief ROC analysis of the DLFMD-RMG method on the test dataset is illustrated in Fig. 9. The results indicate that the DLFMD-RMG system is able to classify the different classes. Finally, a comparative study of the DLFMD-RMG model with recent approaches is given in Table 4 and Fig. 10 [12,16,23]. These results show that the DLFMD-RMG model achieves better outcomes than the other models. For instance, the DLFMD-RMG model attains a higher $accu_y$ of 98.50%, an improved $prec_n$ of 98.48%, and an improved $reca_l$ of 98.52%. Therefore, the DLFMD-RMG model can be employed for face mask recognition in religious mass gatherings.

Conclusion
In this study, a new DLFMD-RMG technique has been developed. The DLFMD-RMG technique focuses mainly on detecting face masks in religious mass gatherings. To accomplish this, the presented DLFMD-RMG technique applies two pre-processing stages: the BF technique and Contrast Enhancement. For face detection, the DLFMD-RMG technique uses YOLOv5 with a ResNet-50 detector. In addition, the face detection performance is improved by using the SOA to tune the hyperparameters of the ResNet-50 model. Finally, the faces with and without masks are classified using the FNN model. The experimental validation of the DLFMD-RMG algorithm is carried out on a benchmark dataset.
Conflicts of Interest: The authors declare they have no conflicts of interest to report regarding the present study.