<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">IASC</journal-id>
<journal-id journal-id-type="nlm-ta">IASC</journal-id>
<journal-id journal-id-type="publisher-id">IASC</journal-id>
<journal-title-group>
<journal-title>Intelligent Automation &#x0026; Soft Computing</journal-title>
</journal-title-group>
<issn pub-type="epub">2326-005X</issn>
<issn pub-type="ppub">1079-8587</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">30638</article-id>
<article-id pub-id-type="doi">10.32604/iasc.2023.030638</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Face Mask and Social Distance Monitoring via Computer Vision and Deployable System Architecture</article-title><alt-title alt-title-type="left-running-head">Face Mask and Social Distance Monitoring via Computer Vision and Deployable System Architecture</alt-title><alt-title alt-title-type="right-running-head">Face Mask and Social Distance Monitoring via Computer Vision and Deployable System Architecture</alt-title>
</title-group>
<contrib-group content-type="authors">
<contrib id="author-1" contrib-type="author">
<name name-style="western"><surname>Ratul</surname><given-names>Meherab Mamun</given-names></name>
</contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Rahman</surname><given-names>Kazi Ayesha</given-names></name>
</contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western"><surname>Fazal</surname><given-names>Javeria</given-names></name>
</contrib>
<contrib id="author-4" contrib-type="author">
<name name-style="western"><surname>Abanto</surname><given-names>Naimur Rahman</given-names></name>
</contrib>
<contrib id="author-5" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Khan</surname><given-names>Riasat</given-names></name><email>riasat.khan@northsouth.edu</email>
</contrib><aff><institution>Department of Electrical and Computer Engineering, North South University</institution>, <addr-line>Dhaka</addr-line>, <country>Bangladesh</country></aff>
</contrib-group><author-notes><corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Riasat Khan. Email: <email>riasat.khan@northsouth.edu</email>.</corresp></author-notes>
<pub-date pub-type="epub" date-type="pub" iso-8601-date="2022-08-08"><day>08</day>
<month>08</month>
<year>2022</year></pub-date>
<volume>35</volume>
<issue>3</issue>
<fpage>3641</fpage>
<lpage>3658</lpage>
<history>
<date date-type="received"><day>30</day><month>03</month><year>2022</year></date>
<date date-type="accepted"><day>08</day><month>6</month><year>2022</year></date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2023 Ratul et al.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Ratul et al.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_IASC_30638.pdf"></self-uri>
<abstract>
<p>The coronavirus (COVID-19) is a lethal virus that has caused a rapidly spreading infectious disease throughout the globe. Spreading awareness, taking preventive measures, imposing strict restrictions on public gatherings, wearing facial masks, and maintaining safe social distancing have become crucial to keeping the virus at bay. Even though the world has spent a whole year preventing and treating the disease caused by the COVID-19 virus, the statistics show that the virus can cause a large-scale outbreak at any time if thorough preventive measures are not maintained. Technologically developed systems have become very useful in fighting the spread of this virus. However, an automatic, robust, continuous, and lightweight monitoring system that can be efficiently deployed on an embedded device has not yet become prevalent in the mass community. This paper develops an automatic system, deployed on an embedded system, that simultaneously detects social distance and face mask violations in real time. A modified version of a convolutional neural network, the ResNet50 model, has been utilized to identify masked faces. The You Only Look Once (YOLOv3) approach is applied for object detection, and the DeepSORT technique is used to measure social distance. The efficiency of the proposed model is tested on real-time video sequences streamed from an embedded system (the Jetson Nano edge computing device) and from smartphones (Android and iOS applications). Empirical results show that the implemented model can efficiently detect facial masks and social distance violations with acceptable accuracy and precision scores.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Artificial intelligence</kwd>
<kwd>COVID-19</kwd>
<kwd>deep learning technique</kwd>
<kwd>face mask detection</kwd>
<kwd>social distance monitor</kwd>
<kwd>you only look once</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>COVID-19, caused by the coronavirus SARS-CoV-2, originated from animals in Wuhan, China, at the end of 2019 [<xref ref-type="bibr" rid="ref-1">1</xref>]. This virus causes respiratory illness and can affect multiple organ systems in the body [<xref ref-type="bibr" rid="ref-2">2</xref>]. According to the latest statistics, COVID-19 has claimed more than 6.22 million lives worldwide in a span of just over two years, making it one of the deadliest epidemics of the 21<sup>st</sup> century. By taking the form of a pandemic, this deadly virus continues to infect new people every day. Apart from the deaths, the coronavirus causes long-term health complications in its victims, who total up to 140 million people [<xref ref-type="bibr" rid="ref-3">3</xref>]. With consequences ranging from respiratory illness to severe heart and kidney failure, the disease made the world realize the importance of preventing its spread within a few months of its discovery and outbreak. Various institutions and vaccine providers are working around the clock to create a preventive solution to the pandemic [<xref ref-type="bibr" rid="ref-4">4</xref>]. However, until a vaccine is globally administered, the coronavirus continues to pose the risk of claiming thousands of lives every day. Therefore, we cannot depend solely on the development of a vaccine. As the disease transmits rapidly, it cannot be controlled without proper safety and monitoring protocols [<xref ref-type="bibr" rid="ref-5">5</xref>].</p>
<p>The coronavirus spreads through person-to-person contact via respiratory droplets [<xref ref-type="bibr" rid="ref-6">6</xref>]. Asymptomatic and presymptomatic people can transmit the virus efficiently. According to reports from the World Health Organization, maintaining social distance, wearing face masks, and avoiding crowded places are the simplest and most effective ways to reduce health risks among the general public while continuing everyday activities with little to no obstruction [<xref ref-type="bibr" rid="ref-7">7</xref>]. Even though preventing the coronavirus is a relatively simple process, many people around the globe still do not follow the safety guidelines [<xref ref-type="bibr" rid="ref-8">8</xref>]. This issue therefore calls for systematized and automated monitoring of the general public to ensure preventive and protective measures against the disease.</p>
<p>In this research, an automatic system has been developed to monitor everyday pandemic preventive measures using artificial intelligence and computer vision techniques. Two detection models observe local feeds from surveillance cameras, smartphones, and embedded-system video streams to detect facial masks and the social distance between people in public areas. The work also aims to create a safer environment for people to carry out their daily duties during the pandemic by evaluating the system&#x2019;s performance on remote platforms and broadening its deployment across different video streaming devices, including an embedded system (Jetson Nano with a webcam) and smartphones. The proposed system is expected to be highly useful in spaces where large crowds gather, as it can be operated using a simple smartphone. The major contributions of this manuscript are as follows:<list list-type="bullet"><list-item>
<p>An automatic COVID-19 prevention system has been developed to detect face masks and monitor social distance. A modified version of a convolutional neural network (CNN), the ResNet50 model, has been used to identify masked faces in this research. YOLOv3 and DeepSORT approaches are applied to monitor the social distance between people.</p></list-item><list-item>
<p>The NVIDIA Jetson Nano embedded system has been used as a medium for collecting video streams from CCTV surveillance footage.</p></list-item><list-item>
<p>We have also utilized smartphones (both Android and iOS devices) to obtain real-time video streams. Smartphone applications have been developed employing Xcode and Swift environments.</p></list-item><list-item>
<p>To the best of our knowledge, an integrated face mask detection and social distance monitoring system has been designed for the first time in this paper using the Jetson Nano embedded device and smartphone applications.</p></list-item></list></p>
<p>The paper is arranged accordingly: Section 2 summarizes some of the related works on automatic face mask recognition and social distance monitoring. Section 3 explains the proposed system implementation, including the dataset, utilized software and hardware components. Next, the real-time results of the implemented system are demonstrated in Section 4. Lastly, Section 5 wraps up the paper with some suggestions for future implementations.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Related Work</title>
<p>Traditional prevention of COVID-19 through manual inspection of face masks and physical distances is highly inefficient and requires extensive human labor. Therefore, computer vision and artificial intelligence-based automatic face mask detection and physical distance measurement have been thoroughly studied in recent years [<xref ref-type="bibr" rid="ref-9">9</xref>]. Some of these studies involve only face mask detection [<xref ref-type="bibr" rid="ref-10">10</xref>], some include social distance measurement [<xref ref-type="bibr" rid="ref-11">11</xref>], and some combine both. Several works in this context of automatic coronavirus prevention are discussed briefly in the subsequent paragraphs.</p>
<p>Many works have studied the detection of masked and unmasked faces for COVID-19 prevention, security, person identification, and criminal tracking. For instance, in a recent work [<xref ref-type="bibr" rid="ref-12">12</xref>], the authors proposed an automated technique to detect masked faces through four steps: estimating the person&#x2019;s distance from the camera, eye line detection, facial part recognition, and eye detection. They outlined benchmarks for every step, using commonly known algorithms for human and face detection. Analog Devices, Inc.&#x2019;s CrossCore Embedded Studio (CCES) and a HOG-SVM pipeline were utilized to determine the distance from the camera and to identify individuals. The distance-from-camera step, executed on an ADSP-BF609 dual-core processor, achieved the highest accuracy of 90 percent, while eye line, facial part, and eye detection achieved accuracies of 69.8 percent, 46.6 percent, and 40 percent, respectively. Face detection has some limitations: the face cannot be appropriately detected if the person is masked. Moreover, false detection rates are highest for eye detection and eye line detection, so the accuracy for these two cases is lower than expected. In [<xref ref-type="bibr" rid="ref-13">13</xref>], the authors harnessed the TensorFlow, Keras, OpenCV, and Scikit-Learn libraries and various machine learning packages to detect face masks. Their system can distinguish faces from other objects and then determine whether a face is wearing a mask, and, when provided with a surveillance feed, it can recognize both faces and face masks even in motion. The proposed model achieved 95.77 percent and 94.5 percent accuracy, respectively, on two open-source datasets. X. Fan and his colleagues implemented a lightweight deep learning-based face mask detection framework utilizing the MobileNet CNN architecture in [<xref ref-type="bibr" rid="ref-14">14</xref>]. A Gaussian heat map regression is added to strengthen the feature learning of the proposed model. The performance of the implemented network is then assessed on two public datasets, viz. AIZOO and Moxa3K. The authors reported mAP improvements of 1.7% and 10.5% for the proposed CNN architecture compared to the YOLOv3 model.</p>
<p>Many authors have employed computer vision-based deep learning techniques to monitor social distance and measure the physical spacing between people. The authors of [<xref ref-type="bibr" rid="ref-15">15</xref>] presented a well-planned framework for monitoring social distance through object detection and deep learning models. They initially perform calibration using a bird&#x2019;s-eye view, where all pedestrians are assumed to be on the same ground plane, and the distance between each pair of persons is then estimated in this bird&#x2019;s-eye view. The YOLOv3 object detection model is used for person detection, and bounding boxes are drawn on people to distinguish the individuals violating the social distancing protocols. The dataset is accumulated from Oxford Town Center CCTV footage containing 2,200 pedestrians. The results show that, in terms of frames per second (FPS), the Single Shot MultiBox Detector (SSD) performs well compared to YOLOv3; however, in terms of mean average precision (mAP), YOLOv3 outperforms SSD. In [<xref ref-type="bibr" rid="ref-16">16</xref>], I. Ahmed implemented a deep learning-based social distance monitoring technique that uses the pre-trained YOLOv3 object detection model, with transfer learning applied to improve the model&#x2019;s accuracy. The distance between people is computed from the bounding box detection information, and alarms are generated if people violate the minimum distance threshold. The proposed centroid technique achieved a tracking accuracy of 95%; the detection model&#x2019;s accuracy is 92%, and with transfer learning it increases to 95%. In [<xref ref-type="bibr" rid="ref-17">17</xref>], A. Rahim, A. Maqbool and T. Rana presented a well-organized framework for monitoring social distance in low-light environments using the YOLOv4 model and a single stationary ToF camera. The YOLOv4 model, evaluated with COCO detection metrics, is trained on the ExDARK dataset with 12 distinct classes of objects. A custom dataset, obtained from a market in Rawalpindi, Pakistan, is also used for social distance supervision. The results show that the YOLOv4 model achieves the highest accuracy in the low-light environment with a mAP coefficient of 0.9784.</p>
<p>Recently, face mask detection and social distance monitoring have been performed simultaneously in some works. As an illustration, in [<xref ref-type="bibr" rid="ref-18">18</xref>], K. Bhambani et al. implemented an efficient system that focuses on three particular objects, i.e., masked faces, unmasked faces, and the Euclidean distance between people. The authors used the MAFA dataset and locally linear embedding convolutional neural networks to detect face masks, achieving an accuracy of 76.4 percent on the MAFA test set. They then applied YOLO object detection and the DeepSORT object tracking modality to track people in the video stream, with pixel distance limits between 90 and 170. The dataset used in this paper contains 7,959 images drawn from the WIDER Face and MAFA datasets; about 6,120 images were set aside for training, whereas 1,839 images were used for validation. A bounding box is created to identify and label people according to the height and width of the image. The authors also created their own dataset for detecting social distance, and the error of the proposed system increases when the subjects move far from the camera. In [<xref ref-type="bibr" rid="ref-19">19</xref>], the authors implemented an automated system for face mask detection and social distance measurement to address the COVID-19 pandemic. They used Faster R-CNN (with 97 percent accuracy) and deep learning techniques to detect masked faces, and the YOLOv2 model to estimate the physical separation between two people. Notably, even when people wore glasses or scarves or had beards, the proposed system&#x2019;s accuracy using Faster R-CNN was 93.4 percent. For social distance, a threshold of 3 meters was set by the authors and was successfully detected among pedestrians.</p>
<p>From the above literature review, we can conclude that significant work has been done on automatic COVID-19 prevention. Artificial intelligence and neural network techniques have been successfully applied in many studies for automatic face mask identification and social distance monitoring. However, most of these works do not consider deploying the automated detection process on an embedded device. Therefore, a simultaneous face mask detection and social distance monitoring system is proposed in this article, employing the Jetson Nano edge device and smartphone applications.</p>
</sec>
<sec id="s3">
<label>3</label>
<title>Proposed System</title>
<p>In the following paragraphs, the required software and hardware components and the system architecture of the proposed face mask detection and social distance monitoring network are discussed. The main objective of the proposed system is to analyze frames from a video stream or clip, taken from any recording source or real-time live stream, and apply detection algorithms to determine whether COVID-19 precautionary measures are being violated. The optimized model is also implemented on a remote desktop PC environment with a real-time video feed from an embedded system (Jetson Nano with a webcam) and a smartphone, enabling easy distribution and low-cost integration within a pre-built infrastructure.</p>
<p>Violations are determined according to a specific set of rules:<list list-type="simple"><list-item>
<p>a) Is a person properly wearing a facial mask? This condition is checked by first performing facial feature recognition and then executing the face mask detector on the obtained features.</p></list-item><list-item>
<p>b) Are people maintaining a safe social distance (a minimum of 6 feet from one person to another)? The system checks this by calculating the distance between the bounding boxes of detected persons.</p></list-item></list></p>
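<p>The distance rule above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors&#x2019; implementation; it assumes axis-aligned (x, y, w, h) bounding boxes and a hypothetical calibration factor <monospace>pixels_per_foot</monospace> that maps image pixels to real-world feet.</p>

```python
from itertools import combinations
import math

def centroid(box):
    # box = (x, y, w, h) in pixels; returns the box center point
    x, y, w, h = box
    return (x + w / 2.0, y + h / 2.0)

def distance_violations(boxes, pixels_per_foot, min_feet=6.0):
    """Return index pairs of persons closer than the 6-foot threshold."""
    violations = []
    for (i, a), (j, b) in combinations(enumerate(boxes), 2):
        (ax, ay), (bx, by) = centroid(a), centroid(b)
        # Euclidean distance between box centroids, converted to feet
        dist_feet = math.hypot(ax - bx, ay - by) / pixels_per_foot
        if dist_feet < min_feet:
            violations.append((i, j))
    return violations
```

In a deployed system, the conversion factor would come from camera calibration (e.g., the bird&#x2019;s-eye-view transform discussed in the related work) rather than a fixed constant.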
<sec id="s3_1">
<label>3.1</label>
<title>Software Tools</title>
<p>The software tools and technologies employed in the proposed system for detecting face mask and social distance violations are summarized in <xref ref-type="table" rid="table-1">Tab. 1</xref>.</p>
<table-wrap id="table-1"><label>Table 1</label>
<caption>
<title>Used software tools in this work</title></caption>
<table><colgroup>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Software tools</th>
<th>Functions</th>
</tr>
</thead>
<tbody>
<tr>
<td>Ubuntu 20.10 Groovy Gorilla</td>
<td>Main operating system</td>
</tr>
<tr>
<td>JupyterLab text editor</td>
<td>Coding purposes</td>
</tr>
<tr>
<td>Python 3</td>
<td>Core programming language</td>
</tr>
<tr>
<td>Jetpack software development kit</td>
<td>Supports newer versions of CUDA, cuDNN and TensorRT for better optimization</td>
</tr>
<tr>
<td>Android Studio version 2020.3.1</td>
<td>Captures the video feed onto the remote PC <italic>via</italic> the Android smartphone device</td>
</tr>
<tr>
<td>Xcode version 13</td>
<td>Takes the video feed onto the remote PC <italic>via</italic> the iOS smartphone device</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Hardware Components</title>
<p>The hardware components employed in the proposed system for detecting face mask and social distance violations are summarized in <xref ref-type="table" rid="table-2">Tab. 2</xref>.</p>
<table-wrap id="table-2"><label>Table 2</label>
<caption>
<title>Utilized hardware components in this work</title></caption>
<table><colgroup>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Hardware components</th>
<th>Functions</th>
</tr>
</thead>
<tbody>
<tr>
<td>Desktop PC</td>
<td>In this paper, the experimental hardware platform used as the sole processing unit of the proposed system has an Intel Core i5 8th generation processor, 16GB memory and a GTX 1060 graphics card. The CCTV footage is collected through a CCTV camera system and various online CCTV video clips. Some of the footage is taken from a webcam connected to the Jetson Nano and from multiple smartphones running two operating systems, <italic>i.e</italic>., Android and iOS.</td>
</tr>
<tr>
<td>Jetson Nano with webcam</td>
<td>Jetson Nano developer kit of 4GB RAM, with a Fantech 1080P 2MP web camera, is used as a medium for taking real-time video streams for the detection models.</td>
</tr>
<tr>
<td>Smartphones</td>
<td>As the size and usability of smartphones for taking video streams are convenient and effective, we have used an Android smartphone (Samsung Galaxy A51) and an iOS smartphone (Apple iPhone 10) to collect real-time video feed. This implementation will enable us to deploy the proposed system detection methods on a broader scale in the future.</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>System Architecture</title>
<p>The proposed system&#x2019;s primary function is to monitor people who violate standard physical distancing and face mask protocols using video footage from CCTV cameras. A deep learning approach combining YOLOv3 and DBSCAN clustering has been utilized for monitoring social distance. To identify people without a face mask, ResNet50, a convolutional neural network, is employed. Blurring effects and augmented masked faces are also applied during training so that the proposed model can identify real-life faces instantly. The designed detection system works as a sequence of tasks: person detection, face identification, face mask classification, and, finally, clustering detection. The proposed network then flags violations based on the face mask and proximity clustering results for the detected persons. The working sequence of the proposed social distance monitoring and facial mask recognition system is demonstrated in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>.</p>
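<p>The proximity-clustering step can be illustrated with a simplified, DBSCAN-like grouping of detected person centroids. The sketch below is only a stand-in for the actual clustering configuration; the <monospace>eps</monospace> radius is illustrative and would be derived from camera calibration in practice.</p>

```python
import math

def proximity_clusters(points, eps):
    """Group 2-D centroids into clusters where every member lies within
    eps of at least one other member (a simplified, DBSCAN-like
    single-linkage grouping; isolated points form singleton groups)."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Union-find with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    # Merge every pair of points closer than eps
    for i in range(n):
        for j in range(i + 1, n):
            (ax, ay), (bx, by) = points[i], points[j]
            if math.hypot(ax - bx, ay - by) <= eps:
                union(i, j)

    clusters = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

Clusters with more than one member would then be flagged as potential social distance violations.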
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>Working sequences of the proposed COVID-19 prevention system</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="IASC_30638-fig-1.png"/>
</fig>
<sec id="s3_3_1">
<label>3.3.1</label>
<title>Working Procedure of the Social Distance Monitoring Model</title>
<p>There are various deep learning techniques for automatic object (people) detection, e.g., region-based convolutional neural networks (R-CNN), SSD, YOLO, etc. These models offer different trade-offs between accuracy, in terms of mAP scores, and frames-per-second speed, as demonstrated in <xref ref-type="fig" rid="fig-2">Fig. 2</xref> [<xref ref-type="bibr" rid="ref-20">20</xref>]. For real-time object detection on embedded devices, high inference speed in tandem with acceptable accuracy must be considered. In this work, the YOLO deep learning model has been used for object detection because of its moderate mAP scores and high frame rate, since it will be executed on real-time edge devices. The specific YOLOv3 framework has been employed because of its outstanding balance between accuracy and detection speed.</p>
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>Performance comparison of various object detection models on MS-COCO dataset</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="IASC_30638-fig-2.png"/>
</fig>
<p>YOLO (You Only Look Once) is a deep learning-based object detection model that is fast and can detect multiple classes within a dataset [<xref ref-type="bibr" rid="ref-21">21</xref>]. The algorithm uses a CNN for detection. Over the years, YOLO has undergone various optimizations, and its latest version is YOLOv5. However, YOLOv3 is well known for its stable optimization, and consequently this model has been used for the proposed social distance violation detection. The YOLOv3 network used for physical distance monitoring is illustrated in <xref ref-type="fig" rid="fig-3">Fig. 3</xref>. In this paper, social distance monitoring with people recognition and tracking is performed with the YOLOv3 and DeepSORT approaches.</p>
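<p>YOLO-family detectors prune overlapping candidate boxes with non-maximum suppression (NMS) based on intersection-over-union (IoU). The following framework-free Python sketch illustrates that idea; the (x1, y1, x2, y2) box format and the 0.5 threshold are illustrative defaults, not values taken from this work.</p>

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / float(area_a + area_b - inter) if inter else 0.0

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring boxes, dropping any box that overlaps
    an already-kept box by more than the IoU threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep
```

In practice, detection frameworks expose equivalent routines (e.g., OpenCV&#x2019;s <monospace>cv2.dnn.NMSBoxes</monospace>), but the underlying logic reduces to this pairwise IoU test.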
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Schematic representation of the proposed YOLOv3 architecture</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="IASC_30638-fig-3.png"/>
</fig>
<p>The social distance measurement uses the COCO dataset&#x2019;s person key-point localization, where the key points are annotated body landmarks of a person in unconstrained images. The model localizes all of these points and combines them to represent a detected person, thereby recognizing a person in an image or in a stream of images (a video). The key-point evaluation metrics used by the COCO database for object detection are average precision (AP), average recall (AR), and their variants. These metrics measure the correspondence between ground truth objects and detected objects. For each object, the ground truth key-points are denoted as [<inline-formula id="ieqn-1">
<mml:math id="mml-ieqn-1"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow></mml:math>
</inline-formula>, <inline-formula id="ieqn-2">
<mml:math id="mml-ieqn-2"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow></mml:math>
</inline-formula>, <inline-formula id="ieqn-3">
<mml:math id="mml-ieqn-3"><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow></mml:math>
</inline-formula>, <italic>&#x2026;</italic>, <inline-formula id="ieqn-4">
<mml:math id="mml-ieqn-4"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula>, <inline-formula id="ieqn-5">
<mml:math id="mml-ieqn-5"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula>, <inline-formula id="ieqn-6">
<mml:math id="mml-ieqn-6"><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula>], where <inline-formula id="ieqn-7">
<mml:math id="mml-ieqn-7"><mml:mi>x</mml:mi></mml:math>
</inline-formula>, <inline-formula id="ieqn-8">
<mml:math id="mml-ieqn-8"><mml:mi>y</mml:mi></mml:math>
</inline-formula> are the key point locations and <inline-formula id="ieqn-9">
<mml:math id="mml-ieqn-9"><mml:mi>v</mml:mi></mml:math>
</inline-formula> denotes a visibility flag defined as <inline-formula id="ieqn-10">
<mml:math id="mml-ieqn-10"><mml:mi>v</mml:mi></mml:math>
</inline-formula> &#x003D; 0: unlabeled, <inline-formula id="ieqn-11">
<mml:math id="mml-ieqn-11"><mml:mi>v</mml:mi></mml:math>
</inline-formula> &#x003D; 1: labeled but not visible, and <inline-formula id="ieqn-12">
<mml:math id="mml-ieqn-12"><mml:mi>v</mml:mi></mml:math>
</inline-formula> &#x003D; 2: labeled and visible. The object key-points similarity (<inline-formula id="ieqn-13">
<mml:math id="mml-ieqn-13"><mml:mi>O</mml:mi><mml:mi>K</mml:mi><mml:mi>S</mml:mi></mml:math>
</inline-formula>) is expressed as:</p>
<p><disp-formula id="eqn-1"><label>(1)</label>
<mml:math id="mml-eqn-1" display="block"><mml:mi>O</mml:mi><mml:mi>K</mml:mi><mml:mi>S</mml:mi><mml:mo>=</mml:mo><mml:mspace width="thickmathspace" /><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:mo>&#x2211;</mml:mo><mml:mi>i</mml:mi><mml:mspace width="thickmathspace" /><mml:mo stretchy="false">[</mml:mo><mml:mspace width="thickmathspace" /><mml:mrow><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:msup><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow><mml:mrow><mml:mn>2</mml:mn><mml:mrow><mml:msup><mml:mi>s</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow><mml:msup><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:mrow></mml:msup></mml:mrow><mml:mspace width="thickmathspace" /><mml:mo>.</mml:mo><mml:mi>&#x03B4;</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&gt;</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:mo stretchy="false">]</mml:mo></mml:mrow><mml:mrow><mml:mo>&#x2211;</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy="false">[</mml:mo><mml:mi>&#x03B4;</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&gt;</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:mo stretchy="false">]</mml:mo></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:math>
</disp-formula></p>
<p>According to <xref ref-type="disp-formula" rid="eqn-1">(1)</xref>, the object keypoint similarity metric, which has been used in this work, scores detected objects based on the accuracy of their key points. Here, <inline-formula id="ieqn-14">
<mml:math id="mml-ieqn-14"><mml:mi>s</mml:mi></mml:math>
</inline-formula> indicates object scale, and <inline-formula id="ieqn-15">
<mml:math id="mml-ieqn-15"><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula> means a per-keypoint constant that controls falloff. For each keypoint, this generates a keypoint similarity that ranges between 0 and 1 [<xref ref-type="bibr" rid="ref-22">22</xref>]. In <xref ref-type="disp-formula" rid="eqn-1">(1)</xref>, Euclidean distance (<inline-formula id="ieqn-16">
<mml:math id="mml-ieqn-16"><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula>) measures the spacing between the corresponding ground truth and detected keypoints. On the other hand, <inline-formula id="ieqn-17">
<mml:math id="mml-ieqn-17"><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula> indicates the visibility flags of the ground truth (this is separate from the predicted <inline-formula id="ieqn-18">
<mml:math id="mml-ieqn-18"><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula> value from the detector). We then pass the <inline-formula id="ieqn-19">
<mml:math id="mml-ieqn-19"><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula> value through an unnormalized Gaussian with standard deviation <inline-formula id="ieqn-20">
<mml:math id="mml-ieqn-20"><mml:mi>s</mml:mi><mml:mrow><mml:msub><mml:mi>k</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula> in order to calculate the <inline-formula id="ieqn-21">
<mml:math id="mml-ieqn-21"><mml:mi>O</mml:mi><mml:mi>K</mml:mi><mml:mi>S</mml:mi><mml:mspace width="thickmathspace" /></mml:math>
</inline-formula>function.</p>
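<p>As a concrete illustration, <xref ref-type="disp-formula" rid="eqn-1">(1)</xref> can be evaluated in a few lines of code. The following sketch is not the paper's implementation; the keypoint coordinates and the per-keypoint constants passed in are placeholders chosen by the caller.</p>

```python
import math

def oks(gt_pts, det_pts, visibility, s, k):
    """Object Keypoint Similarity, Eq. (1).

    gt_pts, det_pts: lists of (x, y) keypoints (ground truth / detection).
    visibility:      ground-truth visibility flags v_i (v_i > 0 means labeled).
    s:               object scale.
    k:               per-keypoint falloff constants k_i.
    """
    num = den = 0.0
    for (gx, gy), (dx, dy), v, ki in zip(gt_pts, det_pts, visibility, k):
        if v > 0:  # delta(v_i > 0): only labeled keypoints contribute
            d2 = (gx - dx) ** 2 + (gy - dy) ** 2  # squared Euclidean distance d_i^2
            num += math.exp(-d2 / (2 * s ** 2 * ki ** 2))
            den += 1
    return num / den if den else 0.0
```

<p>A perfect detection (every visible keypoint at zero distance) yields an OKS of 1, and the score decays toward 0 as the normalized distances grow.</p>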
<p>Going through the video, the model processes every frame, identifying, detecting, and labeling objects with bounding boxes, which are then stored. The model identifies people by detecting their faces and checks each face to determine whether it is masked or unmasked. By finding faces, the proposed system can recognize the presence of people in a single frame and count them. The proposed framework draws a green or red bounding box on each face depending on whether a mask is worn, and it also places a bounding box around each person.</p>
<p><disp-formula id="eqn-2"><label>(2)</label>
<mml:math id="mml-eqn-2" display="block"><mml:msubsup><mml:mi>S</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>p</mml:mi></mml:msub></mml:mrow><mml:mo>:</mml:mo><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>p</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03BC;</mml:mi><mml:mrow><mml:mrow><mml:msubsup><mml:mi></mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msup><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow><mml:mo>&#x2264;</mml:mo><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>p</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03BC;</mml:mi><mml:mrow><mml:mrow><mml:msubsup><mml:mi></mml:mi><mml:mi>j</mml:mi><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msup><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow><mml:mspace width="thickmathspace" /><mml:mi mathvariant="normal">&#x2200;</mml:mi><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2264;</mml:mo><mml:mi>j</mml:mi><mml:mo>&#x2264;</mml:mo><mml:mi>k</mml:mi></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math>
</disp-formula></p>
<p>In <xref ref-type="disp-formula" rid="eqn-2">(2)</xref>, the K-means clustering algorithm is demonstrated, which is a method for vector quantization. Here, <inline-formula id="ieqn-22">
<mml:math id="mml-ieqn-22"><mml:mi>S</mml:mi></mml:math>
</inline-formula> expresses the objective function, which is calculated from the distance of the points. In <xref ref-type="disp-formula" rid="eqn-2">(2)</xref>, the total number of clusters is denoted by <inline-formula id="ieqn-23">
<mml:math id="mml-ieqn-23"><mml:mi>k</mml:mi></mml:math>
</inline-formula> and <inline-formula id="ieqn-24">
<mml:math id="mml-ieqn-24"><mml:mi>&#x03BC;</mml:mi></mml:math>
</inline-formula> illustrates the centroid of the corresponding cluster. The algorithm groups people based on similar data points. It checks whether the bounding box of one person overlaps the bounding box of someone in the same cluster and measures their separation using the Euclidean distance [<xref ref-type="bibr" rid="ref-23">23</xref>]. The assignment expressed in <xref ref-type="disp-formula" rid="eqn-2">(2)</xref> is repeated until the model can determine whether people have crossed the social distancing threshold. The working sequence of the proposed social distance monitoring system is:</p>
<fig id="fig-12">
<graphic mimetype="image" mime-subtype="png" xlink:href="IASC_30638-fig-12.png"/>
</fig>
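<p>The assignment step in <xref ref-type="disp-formula" rid="eqn-2">(2)</xref>, together with the 150-pixel safe distance from <xref ref-type="table" rid="table-3">Tab. 3</xref>, can be sketched as follows. This is a minimal illustration, not the deployed code; the coordinates are assumed to be bounding-box centroids in pixel units.</p>

```python
import math

def assign_clusters(points, centroids):
    """K-means assignment step, Eq. (2): each point x_p joins the cluster
    whose centroid mu_j is nearest in squared Euclidean distance."""
    clusters = {i: [] for i in range(len(centroids))}
    for p in points:
        i = min(range(len(centroids)),
                key=lambda j: (p[0] - centroids[j][0]) ** 2
                            + (p[1] - centroids[j][1]) ** 2)
        clusters[i].append(p)
    return clusters

def violations(points, threshold=150.0):
    """Index pairs of people closer than the safe pixel distance (Tab. 3)."""
    out = []
    for a in range(len(points)):
        for b in range(a + 1, len(points)):
            if math.dist(points[a], points[b]) < threshold:
                out.append((a, b))
    return out
```

<p>In practice the assignment step alternates with a centroid update until convergence; the violation check then runs within each detected cluster per frame.</p>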
<p><xref ref-type="table" rid="table-3">Tab. 3</xref> lists the hyperparameters used to train the social distance monitoring framework. In this work, non-maximum suppression (NMS) with a threshold value of 0.30 is applied to the bounding boxes of detected people.</p>
<table-wrap id="table-3"><label>Table 3</label>
<caption>
<title>Parameters used in the social distancing detection model</title></caption>
<table><colgroup>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Parameter</th>
<th>Corresponding value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Confidence Threshold</td>
<td>0.50</td>
</tr>
<tr>
<td>NMS IoU Threshold</td>
<td>0.30</td>
</tr>
<tr>
<td>Distance Threshold (Safe distance in pixel units)</td>
<td>150 Pixel, best suited for our analyzed captured videos</td>
</tr>
<tr>
<td>Object Detection Frame Range with YOLOv3</td>
<td>blobFromImage &#x003D; (0.00392, (416, 416), (0, 0, 0))</td>
</tr>
<tr>
<td>Stored Detected Objects Confidences</td>
<td>(0.50, 0.40)</td>
</tr>
<tr>
<td>Facemask Classifier Dropout</td>
<td>(0.50)</td>
</tr>
</tbody>
</table>
</table-wrap>
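<p>To make the NMS setting from <xref ref-type="table" rid="table-3">Tab. 3</xref> concrete, here is a minimal greedy non-maximum suppression sketch over (x1, y1, x2, y2) boxes with the 0.50 confidence threshold and 0.30 IoU threshold. This is an illustrative re-implementation, not the OpenCV/YOLOv3 pipeline used in the paper.</p>

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, conf_thresh=0.50, iou_thresh=0.30):
    """Greedy NMS with the Tab. 3 thresholds: drop low-confidence boxes,
    then suppress any box overlapping a stronger pick by more than iou_thresh."""
    order = sorted((i for i, s in enumerate(scores) if s >= conf_thresh),
                   key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= iou_thresh for j in keep):
            keep.append(i)
    return keep
```

<p>Each surviving index corresponds to one detected person, which is what the distance check and the per-person bounding boxes operate on.</p>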
</sec>
<sec id="s3_3_2">
<label>3.3.2</label>
<title>Working Procedure of the Face Mask Detection Model</title>
<p>The well-known dual shot face detector (DSFD) network is utilized to detect faces in this work. This improved face detection model, with its feature-learning enhancements, exhibited better accuracy than a single shot detector [<xref ref-type="bibr" rid="ref-24">24</xref>]. It can detect faces in many orientations thanks to enhanced anchor matching and improved data augmentation techniques, rendering it better than other pre-trained face classifiers. To identify masked faces, we used a modified version of a convolutional neural network, the ResNet50 model [<xref ref-type="bibr" rid="ref-25">25</xref>]. It comprises an ImageNet-pretrained base, an AveragePooling2D layer, and dense (with dropout) layers, followed by a sigmoid or softmax classifier. The DSFD framework applies a feature enhance module on top of a feedforward ResNet architecture, together with two progressive anchor loss (PAL) layers: a first-shot PAL for the original features and a second-shot PAL for the enhanced features.</p>
<p>To enhance the performance of the mask classifier, we artificially place face masks on random face images. Using a deep neural network, anchor points are found by locating facial landmarks, namely the nose bridge and chin, and a face mask is then overlaid to create the auto-generated image. Video frames often contain blurred faces, which DSFD must still mark correctly; this blurriness may occur because of rapid movement, incorrect camera settings, low-light conditions, or grainy footage. We therefore apply a blurring effect to a random portion of the training data. Three kinds of blur with varying kernel sizes are used: Motion Blur (mimics rapid movement), Average Blur (mimics out-of-focus footage), and Gaussian Blur (mimics random noise).</p>
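<p>The averaging variant of the blurring augmentation can be sketched without image libraries as follows; the actual pipeline presumably uses OpenCV, so this dependency-free version, with a kernel size sampled from the Average Blur range in <xref ref-type="table" rid="table-4">Tab. 4</xref>, is only illustrative (the image is a grayscale grid stored as a list of lists).</p>

```python
import random

def average_blur(img, k):
    """Average (box) blur: each pixel becomes the mean of the k x k
    neighborhood around it (coordinates clamp at the image border)."""
    h, w = len(img), len(img[0])
    r = k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            vals = [img[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                    for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
            out[y][x] = sum(vals) / len(vals)
    return out

def random_average_blur(img, k_range=(3, 7)):
    """Sample an odd kernel size from the Tab. 4 range and blur."""
    k = random.choice([k for k in range(k_range[0], k_range[1] + 1) if k % 2 == 1])
    return average_blur(img, k)
```

<p>Motion and Gaussian blur follow the same pattern with a directional or Gaussian-weighted kernel instead of a uniform one.</p>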
<p>Finally, the ImageDataGenerator class of Keras has been utilized to perform on-the-fly augmentation, so the training data is automatically re-augmented after every epoch. Additionally, traditional augmentations are applied, such as rotation, horizontal flip, and brightness shift. It is worth mentioning that the blurring augmentations are applied with their associated probabilities during the training stage. <xref ref-type="table" rid="table-4">Tab. 4</xref> lists the parameters used in the facemask classifier on top of the pre-trained ResNet50 model.</p>
<table-wrap id="table-4"><label>Table 4</label>
<caption>
<title>Parameters used in the facemask classifier on top of pre-trained ResNet50 model</title></caption>
<table><colgroup>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Parameter</th>
<th>Corresponding value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Confidence Threshold</td>
<td>0.50</td>
</tr>
<tr>
<td>NMS IoU Threshold</td>
<td>0.30</td>
</tr>
<tr>
<td>Motion Blur Kernel Range</td>
<td>(4, 8)</td>
</tr>
<tr>
<td>Average Blur Kernel Range</td>
<td>(3, 7)</td>
</tr>
<tr>
<td>Gaussian Blur Kernel Range</td>
<td>(3, 8)</td>
</tr>
<tr>
<td>ResNet50 Base Network Input Shape</td>
<td>(224, 224, 3)</td>
</tr>
<tr>
<td>Facemask Classifier AveragePooling2D Pool Size</td>
<td>(7, 7)</td>
</tr>
<tr>
<td>Facemask Classifier Dense Layer (activation)</td>
<td>(128, activation ReLU)<break/>(1, activation sigmoid)</td>
</tr>
<tr>
<td>Facemask Classifier Dropout</td>
<td>(0.50)</td>
</tr>
</tbody>
</table>
</table-wrap>
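<p>A quick sanity check on the classifier head in <xref ref-type="table" rid="table-4">Tab. 4</xref>: for a 224 &#x00D7; 224 &#x00D7; 3 input, ResNet50's final convolutional feature map is 7 &#x00D7; 7 &#x00D7; 2048, so the (7, 7) average pooling collapses it to a 2048-dimensional vector before the dense layers. Assuming standard fully connected layers (weights plus biases; dropout adds no parameters), the head's trainable parameter count works out as:</p>

```python
def dense_params(n_in, n_out):
    """Trainable parameters of a fully connected layer: weights + biases."""
    return n_in * n_out + n_out

features = 2048                       # 7x7x2048 map pooled by (7, 7) -> 2048 vector
hidden = dense_params(features, 128)  # Dense(128, ReLU)
output = dense_params(128, 1)         # Dense(1, sigmoid); Dropout(0.50) adds none
total = hidden + output
print(total)  # trainable parameters added on top of the frozen ResNet50 base
```

<p>This small head (roughly a quarter-million parameters) is what is trained on the ~1,000-image mask dataset, which keeps fine-tuning tractable.</p>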
</sec>
<sec id="s3_3_3">
<label>3.3.3</label>
<title>Working Procedure of Using Jetson Nano with Webcam for Capturing the Video Stream</title>
<p>The main objective of this research is to provide a detection system that can help with the control and monitoring of the COVID-19 pandemic. To achieve this goal, the proposed face mask detection and physical distance monitoring system also needs to be flexible and easily deployable. The Jetson Nano embedded system allows us to quickly deploy a remote architecture capable of receiving real-time video feeds from various remote locations. As an AI-capable device, it leaves room for future improvements and video feed optimizations. This paper uses a webcam with a Jetson Nano computing device to receive the captured video feed and analyze the video stream for people detection, face mask identification, and social distance violation feedback. In this implementation, the webcam attached to the Jetson Nano captures the video while simultaneously sending the stream to the server, as depicted in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>. The remote host computer then receives the stream, executes the detection models, and generates the relevant results. Since a desktop personal computer serves as the processing unit, cloud-based resources were not required. We devised a localhost server to which the real-time stream was sent; the same server is then used on the PC with a Jupyter notebook to process and display the results on the monitor.</p>
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>Jetson Nano Setup with webcam</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="IASC_30638-fig-4.png"/>
</fig>
</sec>
<sec id="s3_3_4">
<label>3.3.4</label>
<title>Working Procedure of Smartphone Devices for the Capturing of Video Stream</title>
<p>As technology improves each day, smartphones have become ubiquitous in everyday life. These devices offer adequate processing power and serve as a portable medium for on-device real-time video transfer. To ensure the proposed system architecture covers all deployment scopes, we have also tested Android and iOS smartphones that send captured video streams to the remote desktop computer. <xref ref-type="fig" rid="fig-5">Fig. 5</xref> shows the working procedure of the designed Android application.</p>
<fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>Working procedure of the video streaming application</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="IASC_30638-fig-5.png"/>
</fig><list list-type="simple"><list-item>
<p>a) Android Device: To enable an Android smartphone to capture and transfer the video feed directly to the remote PC, we have devised an Android application. The app is configured to record and simultaneously send the video to the connected server set up in the remote local network on the PC. The app can also be configured to record video at different resolutions and at 30 or 60 frames per second, allowing the user to trade quality for robustness under poor network connections and for faster processing.</p></list-item><list-item>
<p>b) iOS Device: The same process has been used for an iOS device (Apple iPhone 10). A separate iOS application has been developed to enable the device to stream the video feed to the remote server. The app has been built with Xcode and Swift for compatibility with iOS devices. Users can configure the app to record video at different resolutions and at 30 or 60 frames per second.</p></list-item></list>
</sec>
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Results and Discussion</title>
<p>This section presents real-time results for the proposed face mask detection and social distance monitoring system. Two datasets have been used in this paper: a combined dataset for the face mask identification model and the COCO (Common Objects in Context) dataset for the social distance violation detection model. For the face mask detection model, the combined open-source dataset contains more than 1,000 pictures. These pictures are divided into four classes: (1) human face with a facial mask, (2) human face without a facial mask, (3) facial mask worn correctly, and (4) facial mask worn incorrectly.</p>
<p>The COCO dataset, which is extensively used for large-scale object detection, segmentation, and captioning tasks, is used for the social distance violation model. It consists of more than 300 thousand images, of which 200 thousand are labeled. It contains 80 object categories and 250 thousand person instances annotated with keypoints, which allows a people detection model to detect a person in a captured frame with high precision [<xref ref-type="bibr" rid="ref-26">26</xref>].</p>
<sec id="s4_1">
<label>4.1</label>
<title>Results of Face Mask Classifier System</title>
<p>In this work, a modified version of a convolutional neural network, the ResNet50 model, has been employed to identify masked faces. The accuracy and loss <italic>vs</italic>. epochs of the face mask detection model are shown in <xref ref-type="fig" rid="fig-6">Fig. 6</xref>. The proposed network achieved 78.50% validation accuracy. The reason behind this modest accuracy is that the face mask dataset was relatively small, with about 1,000 images covering both masked and unmasked faces.</p>
<fig id="fig-6">
<label>Figure 6</label>
<caption>
<title>Validation accuracy and loss <italic>vs</italic>. epochs of the modified ResNet50 model</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="IASC_30638-fig-6.png"/>
</fig>
<p>The confusion matrix in <xref ref-type="fig" rid="fig-7">Fig. 7</xref> summarizes the results of the proposed face mask detection model, including the true positive, false positive, true negative, and false negative values. The matrix shows that our model correctly predicted 517 masked people and 268 unmasked people. On the other hand, it incorrectly detected 132 people as masked even though they were unmasked, and it recognized 83 people as unmasked even though they were masked.</p>
<fig id="fig-7">
<label>Figure 7</label>
<caption>
<title>Confusion matrix of the face mask detection model</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="IASC_30638-fig-7.png"/>
</fig>
<p>Precision is the ratio of true positives to all predicted positives (true plus false positives). In this case, there are 649 predicted positives, of which 517 are true positives, so the face mask detection model achieves a precision of 79.66%, correctly predicting a large share of masked faces. Accuracy measures how many predictions are correct out of all predictions; with the pre-trained model, the obtained accuracy is 78.50%, a moderate performance. As expected, the accuracy would improve with a larger face mask dataset. Finally, the false positive and false negative counts, 132 and 83 respectively, are low compared to the total dataset. Therefore, the F1 score is 82.79%, indicating that the proposed model performs relatively well with few incorrect predictions.</p>
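<p>These figures follow directly from the confusion matrix counts (517 true positives, 268 true negatives, 132 false positives, 83 false negatives):</p>

```python
tp, tn, fp, fn = 517, 268, 132, 83   # counts from the confusion matrix in Fig. 7

precision = tp / (tp + fp)           # 517 / 649
recall    = tp / (tp + fn)           # 517 / 600
accuracy  = (tp + tn) / (tp + tn + fp + fn)
f1        = 2 * precision * recall / (precision + recall)

print(f"precision={precision:.2%} accuracy={accuracy:.2%} f1={f1:.2%}")
```

<p>Rounded to two decimals, this reproduces the 79.66% precision, 78.50% accuracy, and 82.79% F1 score reported above.</p>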
</sec>
<sec id="s4_2">
<label>4.2</label>
<title>Evaluation of Social Distance Monitoring Using YOLOv3</title>
<p>In this paper, the deep learning based YOLOv3 model and the DeepSORT technique have been used to monitor social distance via person detection and tracking. The following paragraphs discuss the precision of the YOLOv3 method in detecting people in the video stream and how it compares with various other models in terms of accuracy and class recognition.</p>
<p>For object detection, YOLO predicts the classes and locations of objects in an RGB image in a single pass. The algorithm treats the detection assignment as a regression task instead of a classification one. It then assigns class probabilities to the extracted image regions and binds them to the anchor boxes.</p>
<p>The losses per iteration are measured by applying the RPN localization loss, RPN objectness loss, and classification with localization loss for the YOLOv3 model. Then the total loss and the overall result with mAP (mean average precision) are determined. The equation for <inline-formula id="ieqn-40">
<mml:math id="mml-ieqn-40"><mml:mi>A</mml:mi><mml:mi>P</mml:mi></mml:math>
</inline-formula> (average precision) is given by:</p>
<p><disp-formula id="eqn-3"><label>(3)</label>
<mml:math id="mml-eqn-3" display="block"><mml:mi>A</mml:mi><mml:mi>P</mml:mi><mml:mo>=</mml:mo><mml:mspace width="thickmathspace" /><mml:munder><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mi>n</mml:mi></mml:munder><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:math>
</disp-formula></p>
<p>In <xref ref-type="disp-formula" rid="eqn-3">(3)</xref>, the average precision algorithm is expressed; it is calculated by taking the mean over all classes and/or over all IoU thresholds. Here, <italic>n</italic> denotes the threshold index, and <italic>R<sub>n</sub></italic> and <italic>P<sub>n</sub></italic> indicate the recall and precision at the <italic>n</italic>-th threshold, respectively.</p>
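<p>A minimal evaluation of <xref ref-type="disp-formula" rid="eqn-3">(3)</xref> on a toy precision-recall sequence (the values are illustrative, not from the paper's experiments):</p>

```python
def average_precision(recalls, precisions):
    """Eq. (3): AP = sum_n (R_n - R_{n-1}) * P_n, with R_0 = 0.
    recalls must be sorted in increasing order."""
    ap, prev_r = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_r) * p
        prev_r = r
    return ap
```

<p>mAP then averages this AP over classes and/or IoU thresholds.</p>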
<p>Next, we captured ten random frames from the surveillance camera footage to compute the evaluation metrics of the proposed social distance detection model with the YOLOv3 approach. The confusion matrix in <xref ref-type="fig" rid="fig-8">Fig. 8</xref> shows 110 correctly detected distances, 2 falsely detected distances, and 27 false negatives. This measurement shows that the implemented model exhibits 79% accuracy, which is reasonably high. Moreover, the precision score is 98%, meaning that 98% of the predicted positive values were correct. The mean average precision (mAP) of the YOLOv3-based object detection model, computed by <xref ref-type="disp-formula" rid="eqn-3">(3)</xref>, is 95%. The recall of 80% indicates that out of 137 true values, 27 cases were misidentified as false. Finally, the F1 score is 88%, which denotes that the YOLOv3 model performs well.</p>
<fig id="fig-8">
<label>Figure 8</label>
<caption>
<title>Confusion matrix of the social distance measurement model using YOLOv3</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="IASC_30638-fig-8.png"/>
</fig>
<p>Finally, the proposed system successfully detects social distance and facial masks simultaneously among the people in the frames of the testing videos. Simulation test results for various output frames are demonstrated in <xref ref-type="fig" rid="fig-9">Fig. 9</xref>. We can observe that the model detects persons in the video frames and determines whether they are maintaining social distance. By utilizing K-means clustering, the implemented model can easily detect clusters of people and mark them with corresponding bounding boxes. The system places green or red bounding boxes depending on whether the health protocols are maintained. The proposed network counts the different cases arising from the social distance and face mask detection, i.e., masked, unmasked, safe, unsafe, and unknown.</p>
<fig id="fig-9">
<label>Figure 9</label>
<caption>
<title>Simulation test results of the proposed social distance and face mask monitoring system</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="IASC_30638-fig-9.png"/>
</fig>
<p><xref ref-type="fig" rid="fig-10">Fig. 10</xref> illustrates an output frame of the proposed system when the processed video is obtained from the Jetson Nano device with a webcam setup.</p>
<fig id="fig-10">
<label>Figure 10</label>
<caption>
<title>Processed video feed from Jetson Nano with webcam setup</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="IASC_30638-fig-10.png"/>
</fig>
<p>Lastly, the efficiency of the proposed pandemic prevention system is tested for the video sequence captured by the designed application of a smartphone. The qualitative result of this setup is demonstrated in <xref ref-type="fig" rid="fig-11">Fig. 11</xref>.</p>
<fig id="fig-11">
<label>Figure 11</label>
<caption>
<title>Processed video feed frame, which is captured using the designed application of a smartphone</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="IASC_30638-fig-11.png"/>
</fig>
<p><xref ref-type="table" rid="table-5">Tab. 5</xref> demonstrates the comparison of the proposed framework with other similar works. According to <xref ref-type="table" rid="table-5">Tab. 5</xref>, this work exclusively implemented an integrated approach to detect face masks and monitor social distance simultaneously. This simultaneous implementation constrained us to use low-quality real-time video sequences, which lowered the model&#x2019;s performance. Additionally, this work captured instantaneous video frames by utilizing Jetson Nano with a webcam and specialized smartphone applications.</p>
<table-wrap id="table-5"><label>Table 5</label>
<caption>
<title>Comparison of the proposed framework with other similar works</title></caption>
<table><colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Reference</th>
<th>Primary features</th>
<th>Applied techniques</th>
<th>Performance metrics for mask detection</th>
<th>Performance metrics for social distance</th>
</tr>
</thead>
<tbody>
<tr>
<td>[<xref ref-type="bibr" rid="ref-12">12</xref>]</td>
<td>Detect face masks</td>
<td>HOGSVM</td>
<td>Accuracy: 69.8%</td>
<td>NA</td>
</tr>
<tr>
<td>[<xref ref-type="bibr" rid="ref-13">13</xref>]</td>
<td>Detect face mask</td>
<td>TensorFlow, Keras, OpenCV, Scikit-learn</td>
<td>Accuracy: 95%</td>
<td>NA</td>
</tr>
<tr>
<td>[<xref ref-type="bibr" rid="ref-15">15</xref>]</td>
<td>Social distance monitor</td>
<td>YOLOv3</td>
<td>NA</td>
<td>mAP: 91%</td>
</tr>
<tr>
<td>[<xref ref-type="bibr" rid="ref-16">16</xref>]</td>
<td>Monitor social distance</td>
<td>YOLOv3, extra-trained with transfer learning</td>
<td>NA</td>
<td>Accuracy: 92%</td>
</tr>
<tr>
<td>[<xref ref-type="bibr" rid="ref-17">17</xref>]</td>
<td>Monitor social distance under low light conditions</td>
<td>YOLOv4, ToF camera</td>
<td>NA</td>
<td>mAP: 97.84%</td>
</tr>
<tr>
<td>[<xref ref-type="bibr" rid="ref-18">18</xref>]</td>
<td>Face mask and social distance detection in real-time</td>
<td>CNN and YOLOv3</td>
<td>Accuracy: 76.4%</td>
<td>mAP: 94.75%</td>
</tr>
<tr>
<td>[<xref ref-type="bibr" rid="ref-19">19</xref>]</td>
<td>Real-time face mask and social distance detection</td>
<td>Faster R-CNN, YOLOv2</td>
<td>Accuracy: 93.4%</td>
<td>NA</td>
</tr>
<tr>
<td>Proposed Work</td>
<td>Simultaneous face mask and social distance detection</td>
<td>YOLOv3, DBSCAN<break/>clustering, ResNet50, Faster CNN</td>
<td>Accuracy: 82.79%</td>
<td>Accuracy: 79%</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec id="s5">
<label>5</label>
<title>Conclusions</title>
<p>The main objective of this paper is to develop an automatic system that can monitor the COVID-19 precautionary measures by identifying face masks and measuring physical separation simultaneously. The face mask detection model is built on the modified ResNet50 technique. The proposed social distance measuring model employs the deep learning based YOLOv3 object detection and DeepSORT techniques. The performance of the proposed pandemic prevention system is validated on real-time video feeds captured by the Jetson Nano embedded device with a webcam and by customized smartphone applications for both Android and iOS devices. The implemented object detection model creates bounding boxes, where red boxes indicate facial mask and social distance infractions. The performance of the object detection model can be improved by using more advanced artificial intelligence and deep learning techniques, such as multi-layer feedforward BP neural frameworks and transformer-based and anchor-free modalities [<xref ref-type="bibr" rid="ref-27">27</xref>]. The accuracy of the face mask classifier can be increased by incorporating real-time CCTV footage datasets and hyperparameter tuning. Future improvements may involve using a more lightweight model incorporating a transfer learning framework for the face mask detection algorithm [<xref ref-type="bibr" rid="ref-28">28</xref>]. This detection system can assist healthcare facilities, malls, education centers, and public gathering sites in identifying preventive measure violations and imposing improved safety protocols to restrict the spread of coronavirus.</p>
</sec>
</body>
<back><fn-group>
<fn fn-type="other">
<p><bold>Funding Statement:</bold> The authors would like to thank North South University, Bangladesh, for procuring the Jetson Nano developer kit under the CTRG research grant 2021.</p>
</fn>
<fn fn-type="conflict">
<p><bold>Conflicts of Interest:</bold> The authors declare that they have no conflicts of interest to report regarding the present study.</p>
</fn>
</fn-group>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>K. G.</given-names> <surname>Andersen</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Rambaut</surname></string-name>, <string-name><given-names>W. I.</given-names> <surname>Lipkin</surname></string-name>, <string-name><given-names>E. C.</given-names> <surname>Holmes</surname></string-name> and <string-name><given-names>R. F.</given-names> <surname>Garry</surname></string-name></person-group>, &#x201C;<article-title>The proximal origin of SARS-CoV-2</article-title>,&#x201D; <source>Nature Medicine</source>, vol. <volume>26</volume>, no. <issue>4</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>3</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Singhal</surname></string-name></person-group>, &#x201C;<article-title>A review of coronavirus disease-2019 (COVID-19)</article-title>,&#x201D; <source>The Indian Journal of Pediatrics</source>, vol. <volume>87</volume>, no. <issue>4</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>6</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Groff</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Sun</surname></string-name>, <string-name><given-names>A. E.</given-names> <surname>Ssentongo</surname></string-name>, <string-name><given-names>D. M.</given-names> <surname>Ba</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Parsons</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Short-term and long-term rates of postacute sequelae of SARS-CoV-2 infection: A systematic review</article-title>,&#x201D; <source>JAMA Network Open</source>, vol. <volume>4</volume>, no. <issue>10</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>17</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Q.</given-names> <surname>Wu</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Dudley</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Bai</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Dong</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Evaluation of the safety profile of COVID-19 vaccines: A rapid review</article-title>,&#x201D; <source>BMC Medicine</source>, vol. <volume>19</volume>, pp. <fpage>1407</fpage>&#x2013;<lpage>1416</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Lotfi</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Hamblin</surname></string-name> and <string-name><given-names>N.</given-names> <surname>Rezaei</surname></string-name></person-group>, &#x201C;<article-title>COVID-19: Transmission, prevention, and potential therapeutic opportunities</article-title>,&#x201D; <source>Clinica Chimica Acta</source>, vol. <volume>508</volume>, no. <issue>10223</issue>, pp. <fpage>254</fpage>&#x2013;<lpage>266</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Shereen</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Khan</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Kazmi</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Bashir</surname></string-name> and <string-name><given-names>R.</given-names> <surname>Siddique</surname></string-name></person-group>, &#x201C;<article-title>COVID-19 infection: Origin, transmission, and characteristics of human coronaviruses</article-title>,&#x201D; <source>Journal of Advanced Research</source>, vol. <volume>24</volume>, no. <issue>9393</issue>, pp. <fpage>91</fpage>&#x2013;<lpage>98</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H. A. H.</given-names> <surname>Mahmoud</surname></string-name>, <string-name><given-names>A. H.</given-names> <surname>Alharbi</surname></string-name> and <string-name><given-names>N. S.</given-names> <surname>Alghamdi</surname></string-name></person-group>, &#x201C;<article-title>A framework for mask-wearing recognition in complex scenes for different face sizes</article-title>,&#x201D; <source>Intelligent Automation &#x0026; Soft Computing</source>, vol. <volume>32</volume>, no. <issue>2</issue>, pp. <fpage>1153</fpage>&#x2013;<lpage>1165</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Anwar</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Nasrullah</surname></string-name> and <string-name><given-names>M. J.</given-names> <surname>Hosen</surname></string-name></person-group>, &#x201C;<article-title>COVID-19 and Bangladesh: Challenges and how to address them</article-title>,&#x201D; <source>Frontiers in Public Health</source>, vol. <volume>8</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>8</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Nowrin</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Afroz</surname></string-name>, <string-name><given-names>M. S.</given-names> <surname>Rahman</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Mahmud</surname></string-name> and <string-name><given-names>Y. -Z.</given-names> <surname>Cho</surname></string-name></person-group>, &#x201C;<article-title>Comprehensive review on facemask detection techniques in the context of COVID-19</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>9</volume>, pp. <fpage>106839</fpage>&#x2013;<lpage>106864</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Koklu</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Cinar</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>Taspinar</surname></string-name></person-group>, &#x201C;<article-title>CNN-based bi-directional and directional long-short term memory network for determination of face mask</article-title>,&#x201D; <source>Biomedical Signal Processing and Control</source>, vol. <volume>71</volume>, no. <issue>8</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>13</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Ansari</surname></string-name> and <string-name><given-names>D.</given-names> <surname>Singh</surname></string-name></person-group>, &#x201C;<article-title>Monitoring social distancing through human detection for preventing/reducing COVID spread</article-title>,&#x201D; <source>International Journal of Information Technology</source>, vol. <volume>13</volume>, no. <issue>3</issue>, pp. <fpage>1255</fpage>&#x2013;<lpage>1264</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>G.</given-names> <surname>Deore</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Bodhula</surname></string-name>, <string-name><given-names>V.</given-names> <surname>Udpikar</surname></string-name> and <string-name><given-names>V.</given-names> <surname>More</surname></string-name></person-group>, &#x201C;<article-title>Study of masked face detection approach in video analytics</article-title>,&#x201D; in <conf-name>Conf. on Advances in Signal Processing (CASP)</conf-name>, <conf-loc>Pune, India</conf-loc>, pp. <fpage>196</fpage>&#x2013;<lpage>200</lpage>, <year>2016</year>. </mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Das</surname></string-name>, <string-name><given-names>M. W.</given-names> <surname>Ansari</surname></string-name> and <string-name><given-names>R.</given-names> <surname>Basak</surname></string-name></person-group>, &#x201C;<article-title>COVID-19 face mask detection using Tensorflow, Keras and OpenCV</article-title>,&#x201D; in <conf-name>IEEE India Council Int. Conf. (INDICON)</conf-name>, <conf-loc>Delhi, India</conf-loc>, pp. <fpage>1</fpage>&#x2013;<lpage>5</lpage>, <year>2020</year>. </mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Fan</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Jiang</surname></string-name> and <string-name><given-names>H.</given-names> <surname>Yan</surname></string-name></person-group>, &#x201C;<article-title>A deep learning based light-weight face mask detector with residual context attention and Gaussian heatmap to fight against COVID-19</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>9</volume>, pp. <fpage>96964</fpage>&#x2013;<lpage>96974</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Magoo</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Singh</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Jindal</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Hooda</surname></string-name> and <string-name><given-names>P.</given-names> <surname>Rana</surname></string-name></person-group>, &#x201C;<article-title>Deep learning based bird eye view social distancing monitoring using surveillance video for curbing the COVID-19 spread</article-title>,&#x201D; <source>Neural Computing and Applications</source>, vol. <volume>33</volume>, no. <issue>22</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>8</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>I.</given-names> <surname>Ahmed</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Ahmad</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Rodrigues</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Jeon</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Din</surname></string-name></person-group>, &#x201C;<article-title>A deep learning based social distance monitoring framework for COVID-19</article-title>,&#x201D; <source>Sustainable Cities and Society</source>, vol. <volume>65</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>12</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Rahim</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Maqbool</surname></string-name> and <string-name><given-names>T.</given-names> <surname>Rana</surname></string-name></person-group>, &#x201C;<article-title>Monitoring social distancing under various low light conditions with deep learning and a single motionless time of flight camera</article-title>,&#x201D; <source>PLoS One</source>, vol. <volume>16</volume>, no. <issue>2</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>19</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>Bhambani</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Jain</surname></string-name> and <string-name><given-names>K. A.</given-names> <surname>Sultanpure</surname></string-name></person-group>, &#x201C;<article-title>Real-time face mask and social distancing violation detection system using YOLO</article-title>,&#x201D; in <conf-name>Bangalore Humanitarian Technology Conf. (B-HTC)</conf-name>, <conf-loc>Bangalore, India</conf-loc>, pp. <fpage>1</fpage>&#x2013;<lpage>6</lpage>, <year>2020</year>. </mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Meivel</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Devi</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Maheswari</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Menaka</surname></string-name></person-group>, &#x201C;<article-title>Real time data analysis of face mask detection and social distance measurement using MATLAB</article-title>,&#x201D; in <conf-name>Materials Today: Proceedings</conf-name>, pp. <fpage>1</fpage>&#x2013;<lpage>7</lpage>, <year>2021</year>. <uri>https://www.sciencedirect.com/science/article/pii/S2214785320407606?via%3Dihub</uri>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Srivastava</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Divekar</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Anilkumar</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Naik</surname></string-name>, <string-name><given-names>V.</given-names> <surname>Kulkarni</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Comparative analysis of deep learning image detection algorithms</article-title>,&#x201D; <source>Journal of Big Data</source>, vol. <volume>8</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>27</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Redmon</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Divvala</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Girshick</surname></string-name> and <string-name><given-names>A.</given-names> <surname>Farhadi</surname></string-name></person-group>, &#x201C;<article-title>You only look once: Unified, real-time object detection</article-title>,&#x201D; in <conf-name>IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)</conf-name>, <conf-loc>Nevada, USA</conf-loc>, pp. <fpage>779</fpage>&#x2013;<lpage>788</lpage>, <year>2016</year>. </mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Ren</surname></string-name>, <string-name><given-names>K.</given-names> <surname>He</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Girshick</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Sun</surname></string-name></person-group>, &#x201C;<chapter-title>Faster R-CNN: Towards real-time object detection with region proposal networks</chapter-title>,&#x201D; in <source>Advances in Neural Information Processing Systems</source>, <person-group person-group-type="editor"><string-name><given-names>C.</given-names> <surname>Cortes</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Lawrence</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Lee</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Sugiyama</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Garnett</surname></string-name></person-group> (eds.), Vol. <volume>28</volume>, <publisher-name>Curran Associates, Inc</publisher-name>, <publisher-loc>USA</publisher-loc>, pp. <fpage>91</fpage>&#x2013;<lpage>99</lpage>, <year>2015</year>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A. A.</given-names> <surname>Bushra</surname></string-name> and <string-name><given-names>G.</given-names> <surname>Yi</surname></string-name></person-group>, &#x201C;<article-title>Comparative analysis review of pioneering DBSCAN and successive density-based clustering algorithms</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>9</volume>, pp. <fpage>87918</fpage>&#x2013;<lpage>87935</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Tai</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Qian</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>DSFD: Dual shot face detector</article-title>,&#x201D; in <conf-name>IEEE/CVF Conf. on Computer Vision and Pattern Recognition (CVPR)</conf-name>, <conf-loc>California, USA</conf-loc>, pp. <fpage>5055</fpage>&#x2013;<lpage>5064</lpage>, <year>2019</year>. </mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>He</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Ren</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Sun</surname></string-name></person-group>, &#x201C;<article-title>Deep residual learning for image recognition</article-title>,&#x201D; in <conf-name>IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)</conf-name>, <conf-loc>Nevada, USA</conf-loc>, pp. <fpage>770</fpage>&#x2013;<lpage>778</lpage>, <year>2016</year>. </mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><given-names>T.-Y.</given-names> <surname>Lin</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Maire</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Belongie</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Hays</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Perona</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<chapter-title>Microsoft COCO: Common objects in context</chapter-title>,&#x201D; in <source>Computer Vision&#x2013;ECCV 2014</source>, <person-group person-group-type="editor"><string-name><given-names>D.</given-names> <surname>Fleet</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Pajdla</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Schiele</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Tuytelaars</surname></string-name></person-group> (eds.), <publisher-loc>Cham</publisher-loc>: <publisher-name>Springer International Publishing</publisher-name>, pp. <fpage>740</fpage>&#x2013;<lpage>755</lpage>, <year>2014</year>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X. R.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Sun</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Sun</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Xu</surname></string-name> and <string-name><given-names>P. P.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>Deformation expression of soft tissue based on BP neural network</article-title>,&#x201D; <source>Intelligent Automation &#x0026; Soft Computing</source>, vol. <volume>32</volume>, no. <issue>2</issue>, pp. <fpage>1041</fpage>&#x2013;<lpage>1053</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X. R.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Zhou</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Sun</surname></string-name> and <string-name><given-names>S. K.</given-names> <surname>Jha</surname></string-name></person-group>, &#x201C;<article-title>A lightweight CNN based on transfer learning for COVID-19 diagnosis</article-title>,&#x201D; <source>Computers, Materials &#x0026; Continua</source>, vol. <volume>72</volume>, no. <issue>1</issue>, pp. <fpage>1123</fpage>&#x2013;<lpage>1137</lpage>, <year>2022</year>.</mixed-citation></ref>
</ref-list>
</back>
</article>