<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">IASC</journal-id>
<journal-id journal-id-type="nlm-ta">IASC</journal-id>
<journal-id journal-id-type="publisher-id">IASC</journal-id>
<journal-title-group>
<journal-title>Intelligent Automation &#x0026; Soft Computing</journal-title>
</journal-title-group>
<issn pub-type="epub">2326-005X</issn>
<issn pub-type="ppub">1079-8587</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">22359</article-id>
<article-id pub-id-type="doi">10.32604/iasc.2022.022359</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>A Framework for Mask-Wearing Recognition in Complex Scenes for Different Face Sizes</article-title><alt-title alt-title-type="left-running-head">A Framework for Mask-Wearing Recognition in Complex Scenes for Different Face Sizes</alt-title><alt-title alt-title-type="right-running-head">A Framework for Mask-Wearing Recognition in Complex Scenes for Different Face Sizes</alt-title>
</title-group>
<contrib-group content-type="authors">
<contrib id="author-1" contrib-type="author">
<name name-style="western"><surname>Hosni Mahmoud</surname><given-names>Hanan A.</given-names></name>
</contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Alharbi</surname><given-names>Amal H.</given-names></name>
</contrib>
<contrib id="author-3" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Alghamdi</surname><given-names>Norah S.</given-names></name><email>NOSAlghamdi@pnu.edu.sa</email>
</contrib>
<aff id="aff-1"><institution>Department of Computer Sciences, College of Computer and Information Sciences, Princess Nourah Bint Abdulrahman University</institution>, <addr-line>Riyadh, 11047</addr-line>, <country>KSA</country></aff>
</contrib-group><author-notes><corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Norah S. Alghamdi. Email: <email>NOSAlghamdi@pnu.edu.sa</email></corresp></author-notes>
<pub-date pub-type="epub" date-type="pub" iso-8601-date="2021-11-3"><day>3</day>
<month>11</month>
<year>2021</year></pub-date>
<volume>32</volume>
<issue>2</issue>
<fpage>1153</fpage>
<lpage>1165</lpage>
<history>
<date date-type="received"><day>05</day><month>8</month><year>2021</year></date>
<date date-type="accepted"><day>06</day><month>9</month><year>2021</year></date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2022 Hosni Mahmoud et al.</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Hosni Mahmoud et al.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_IASC_22359.pdf"></self-uri>
<abstract>
<p>Nowadays, with the Covid-19 pandemic, people are required to wear masks in many countries. Automated mask detection is crucial for identifying people who do not wear masks. Another important application is surveillance, where concealed faces that might pose a safety threat must be detected. However, automated mask-wearing detection can be difficult in complex scenes, such as hospitals and shopping malls, where many people are present. In this paper, we present an analysis of several detection techniques and their performance. Because we face different face sizes and orientations, we propose a technique to detect faces of varying sizes and orientations. In this research, we propose a framework that incorporates two deep learning procedures to develop a technique for mask-wearing recognition, especially in complex scenes and images of various resolutions. A regional convolutional neural network (R-CNN) is used to detect face regions, and it is further enhanced by introducing different-size face detection, even for smaller targets. We combine this with an algorithm that can detect faces even in low-resolution images. We propose a mask-wearing detection algorithm for complex situations under different resolutions and face sizes. We use a convolutional neural network (CNN) to detect the presence of a mask around the detected face. Experimental results show that our process enhances the precision and recall of the combined detection algorithm. The proposed technique achieves a precision of 94.5&#x0025; and outperforms the other techniques under comparison.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Mask detection</kwd>
<kwd>deep learning</kwd>
<kwd>CNN</kwd>
<kwd>small faces</kwd>
<kwd>Covid-19</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>Masks play an important role in defending people against Covid-19. Many countries have enforced laws requiring masks to be worn in public. Some people may not adhere to such protective laws or may forget to wear masks. Nevertheless, wearing a mask can lessen the risk of corona infection [<xref ref-type="bibr" rid="ref-1">1</xref>]. A WHO report confirmed over 120 million cases and more than 2 million deaths by February 2021. Many of these cases could have been avoided by taking precautionary measures such as wearing masks and social distancing. Respiratory droplets from sneezing or talking loudly can transmit Covid-19, as depicted in [<xref ref-type="bibr" rid="ref-2">2</xref>,<xref ref-type="bibr" rid="ref-3">3</xref>]. A proven way to reduce the spread of Covid-19 is to enforce mask wearing in public, as advised by the WHO [<xref ref-type="bibr" rid="ref-2">2</xref>].</p>
<p>In this paper, we studied the micro-graphs of different types of masks by considering the texture features of their images. Texture feature extraction can be performed by fractal analysis methods, Fourier transform techniques, and the gray-level co-occurrence matrix. The authors of [<xref ref-type="bibr" rid="ref-4">4</xref>] presented a study indicating that the gray-level co-occurrence matrix (GLCM) can quantify the spatial association among adjacent pixels in a digital image. The GLCM is utilized in texture detection [<xref ref-type="bibr" rid="ref-5">5</xref>], skin texture identification [<xref ref-type="bibr" rid="ref-6">6</xref>], defect recognition [<xref ref-type="bibr" rid="ref-7">7</xref>], fabric cataloging [<xref ref-type="bibr" rid="ref-8">8</xref>] and egg fertility detection [<xref ref-type="bibr" rid="ref-9">9</xref>]. The authors of [<xref ref-type="bibr" rid="ref-10">10</xref>] categorized fabrics utilizing a support vector machine with GLCM and component analysis. The experimental results of [<xref ref-type="bibr" rid="ref-11">11</xref>] show that combining the GLCM with back propagation delivers 93&#x0025; accuracy. Since the GLCM is efficient for texture analysis, we decided to utilize it for the texture analysis of masks. Also, the k-nearest neighbor technique is simple and performs well in classification [<xref ref-type="bibr" rid="ref-12">12</xref>]; it has been used successfully in the medical text domain [<xref ref-type="bibr" rid="ref-13">13</xref>] and in the examination of ingestion patterns [<xref ref-type="bibr" rid="ref-14">14</xref>]. The authors of [<xref ref-type="bibr" rid="ref-15">15</xref>] concluded that a variant of the k-nearest neighbors (KNN) algorithm could identify Coronavirus patients more accurately. Therefore, our research utilizes the KNN algorithm to detect the wearing time of masks based on texture features mined from micro-images at different time periods.</p>
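<p>To make the GLCM feature extraction concrete, the following is a minimal Python sketch, not the authors' exact code: a single-offset co-occurrence matrix and the standard contrast feature. The offset choice (horizontal neighbor) and the feature choice are illustrative assumptions.</p>

```python
import numpy as np

def glcm(image, levels, dx=1, dy=0):
    """Gray-level co-occurrence matrix for one pixel offset (dx, dy).

    image: 2-D integer array with values in [0, levels).
    Counts how often gray level i occurs adjacent to gray level j.
    """
    h, w = image.shape
    M = np.zeros((levels, levels), dtype=np.int64)
    for y in range(h - dy):
        for x in range(w - dx):
            M[image[y, x], image[y + dy, x + dx]] += 1
    return M

def contrast(M):
    """GLCM contrast feature: sum over i,j of P(i,j) * (i - j)^2."""
    P = M / M.sum()
    i, j = np.indices(M.shape)
    return float((P * (i - j) ** 2).sum())
```

Such per-image features would then feed the KNN classifier described above.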
<p>A CNN is a neural network model that uses a convolutional architecture [<xref ref-type="bibr" rid="ref-16">16</xref>]. LeNet was one of the first CNNs to be proposed. Rectified linear units gave a boost to CNNs around 2011. Object detection methodologies have been established using region-based convolutional neural networks (R-CNN) [<xref ref-type="bibr" rid="ref-17">17</xref>]. R-CNN is an upgraded process based on the CNN. The authors of [<xref ref-type="bibr" rid="ref-18">18</xref>] proposed an early R-CNN system that was utilized mainly for object detection. It used a selective search technique to identify candidate regions, normalized them to the CNN input size, and then identified objects with an SVM combined with a linear regression model for the bounding boxes. Its drawbacks were a complicated training phase and slow testing speed.</p>
<p>Fast R-CNN improved the training phase and testing speed of R-CNN. Fast R-CNN uses fewer layers together with a pooling layer. It also uses a softmax classifier instead of the SVM for better and faster classification [<xref ref-type="bibr" rid="ref-19">19</xref>]. However, the selective search technique used in Fast R-CNN makes the computation speed unsuitable for large datasets. Faster R-CNN introduces combined feature extraction and bounding-box regression to speed up the process. Faster R-CNN performs well for large objects (faces) but cannot identify small objects or faces [<xref ref-type="bibr" rid="ref-20">20</xref>]. We therefore have to utilize other techniques for smaller objects that can optimize face detection with respect to scale invariance and image resolution. Scale invariance is a central part of all current object detection methods.</p>
<p>Observing mask-wearing manually in large crowds can be impractical and costly, especially in the Pilgrimage season. Reducing manual observation while ensuring that all people wear masks at all times is an urgent problem. Image analysis tools can reduce labor and material costs and can considerably protect people in many areas. Computer vision procedures can be utilized for mask detection, and deep neural networks are also widely used in object classification and recognition [<xref ref-type="bibr" rid="ref-21">21</xref>].</p>
<p>Very little computer vision research has been done on mask detection, and it has usually been applied to very simple scenes rather than surveillance images. For this reason, we survey other object detection techniques that can help in our research. For object detection, researchers have utilized a histogram of oriented gradients with an SVM to detect persons, a Hough transform to detect helmets, and background subtraction to locate faces. They also used background modeling and a face classification method called C4 to detect faces, and detected whether masks were worn using color transformation and recognition [<xref ref-type="bibr" rid="ref-22">22</xref>]. However, this was not suitable for complex scenes and dynamic backgrounds, such as busy streets and the Pilgrimage season. The authors of [<xref ref-type="bibr" rid="ref-23">23</xref>] utilized a single-shot object detector to detect faces with RetinaNet and multiple features to overcome the limitation in accuracy. They also utilized the YOLO algorithm to detect helmet wearing in images with fewer than four people.</p>
<p>In this research, we propose a framework that incorporates two deep learning procedures to develop a technique for mask-wearing recognition, especially in complex scenes and images of various resolutions. We propose a mask-wearing detection algorithm for complex situations under different resolutions and face sizes. Testing data are utilized for validation of our model.</p>
<p>The paper is organized as follows: the method is described in Section 2, experiments are described and results are analyzed in Section 3, and conclusions are drawn in Section 4.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Method</title>
<p>The proposed technique targets mask-wearing detection. The proposed model comprises two streams: the first detects anchors for faces, and the second detects the mask-wearing objects. The block diagram of the proposed technique is depicted in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>. The face detection stream identifies and computes the anchor boxes containing the faces using a sliding window. <xref ref-type="fig" rid="fig-2">Fig. 2</xref> displays anchor boxes over the whole picture, while <xref ref-type="fig" rid="fig-3">Fig. 3</xref> displays anchor boxes over two faces with different orientations.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>The block diagram of the proposed technique</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="IASC_22359-fig-1.png"/>
</fig>
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>Anchor boxes over the whole picture</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="IASC_22359-fig-2.png"/>
</fig>
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Anchor boxes over two faces with different orientation</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="IASC_22359-fig-3.png"/>
</fig>
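<p>The anchor boxes laid over the picture, as illustrated in the figures above, can be generated over a sliding-window grid. The following Python sketch is illustrative only; the stride, scales, and aspect ratios are assumed values, not parameters reported by this work.</p>

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride, scales=(32, 64, 128), ratios=(0.5, 1.0, 2.0)):
    """Anchor boxes (x1, y1, x2, y2) centred on each sliding-window position.

    Each grid cell of a feat_h x feat_w feature map spawns one anchor per
    (scale, ratio) pair; a scale s yields an anchor of area s*s.
    """
    anchors = []
    for cy in range(feat_h):
        for cx in range(feat_w):
            px, py = (cx + 0.5) * stride, (cy + 0.5) * stride
            for s in scales:
                for r in ratios:
                    w, h = s * np.sqrt(r), s / np.sqrt(r)
                    anchors.append((px - w / 2, py - h / 2, px + w / 2, py + h / 2))
    return np.array(anchors)
```

For a 2 x 2 grid this produces 2 x 2 x 9 = 36 candidate boxes, which are then scored for face content.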
<p>Mask-wearing detection can be addressed as an object detection problem. Many object detection algorithms are established in the literature; convolutional neural networks (CNNs), support vector machines (SVMs) and naive Bayes classifiers are examples. CNNs are found to be more suitable for scenes with complex surroundings that contain many objects with different orientations, and they offer superior accuracy and dependability in such cases. We therefore choose to utilize convolutional neural networks to resolve this matter.</p>
<p>In this paper, we use a region-based CNN (R-CNN) for face detection. R-CNN performs feature extraction via several layers, namely input, convolution, ReLU and pooling layers. The region-based CNN applies a large number of anchor boxes to the input image; 256 positive anchor boxes and 256 negative boxes are randomly selected in the training phase. A softmax layer utilizes these anchor boxes to extract candidate regions and their bounding boxes. R-CNN scans images using a sliding window over the anchors. Finally, it yields the anchors with the maximum probability of enclosing a face object, as shown in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>.</p>
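<p>The random selection of 256 positive and 256 negative anchors for a training minibatch can be sketched as follows. This is an illustrative Python snippet under our reading of the text; the label convention (1 positive, 0 negative, &#x2212;1 ignored) is an assumption.</p>

```python
import numpy as np

def sample_anchors(labels, num_pos=256, num_neg=256, rng=None):
    """Randomly pick up to num_pos positive (label 1) and num_neg negative
    (label 0) anchor indices for one training minibatch; label -1 is ignored."""
    rng = rng if rng is not None else np.random.default_rng(0)
    pos = np.flatnonzero(labels == 1)
    neg = np.flatnonzero(labels == 0)
    pos = rng.choice(pos, size=min(num_pos, len(pos)), replace=False)
    neg = rng.choice(neg, size=min(num_neg, len(neg)), replace=False)
    return pos, neg
```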
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>The result of the Complete learning method (CLM)</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="IASC_22359-fig-4.png"/>
</fig>
<p>To calculate the similarity of an anchor box with the surrounding anchor boxes, using a sliding window, we use the metric in <xref ref-type="disp-formula" rid="eqn-1">Eq. (1)</xref>.<disp-formula id="eqn-1"><label>(1)</label>
<mml:math id="mml-eqn-1" display="block"><mml:mi>S</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>l</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi><mml:mi>y</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:mtext>Area&#xA0;of&#xA0;overlap&#xA0;of&#xA0;anchor&#xA0;boxes</mml:mtext></mml:mrow><mml:mrow><mml:mtext>Area&#xA0;of&#xA0;union&#xA0;of&#xA0;compared&#xA0;boxes</mml:mtext></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:math>
</disp-formula>Positive anchor boxes have similarity&#x2009;&#x003E;&#x2009;0.75, while negative anchor boxes have similarity&#x2009;&#x003C;&#x2009;0.3. All other anchor boxes, with similarity measures between 0.3 and 0.75, are ignored in training.</p>
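<p>Eq. (1) is the familiar intersection-over-union measure, and the thresholds above turn it into anchor labels. A minimal Python sketch of this labeling rule (the box representation is an assumed (x1, y1, x2, y2) convention):</p>

```python
def iou(a, b):
    """Eq. (1): intersection area over union area of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def label_anchor(anchor, gt, pos_thr=0.75, neg_thr=0.3):
    """1 = positive anchor, 0 = negative anchor, -1 = ignored during training."""
    s = iou(anchor, gt)
    if s > pos_thr:
        return 1
    if s < neg_thr:
        return 0
    return -1
```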
<p>For an input image <italic>Im</italic>, the ground-truth anchors are represented as <italic>G</italic>. <italic>G<sub>A</sub></italic> represents the selected anchor boxes with similarity &#x003E; &#x025B;, where &#x025B; is a predetermined threshold that was found to give better classification when equal to 0.75, as stated before. The symbol <italic>c<sub>A</sub></italic> represents the mask-presence confidence score calculated by the R-CNN algorithm. The term <italic>A</italic> represents the used algorithm and <italic>W<sub>A</sub></italic> is the weight of the CNN, as depicted in the following <xref ref-type="disp-formula" rid="eqn-2">Eqs. (2)</xref>&#x2013;<xref ref-type="disp-formula" rid="eqn-4">(4)</xref><disp-formula id="eqn-2"><label>(2)</label>
<mml:math id="mml-eqn-2" display="block"><mml:msub><mml:mi>G</mml:mi><mml:mi>A</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:msub><mml:mi>c</mml:mi><mml:mi>A</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi>A</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>I</mml:mi><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:mspace width="thickmathspace" /><mml:msub><mml:mi>W</mml:mi><mml:mi>A</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math>
</disp-formula><disp-formula id="ueqn-1">
<mml:math id="mml-ueqn-1" display="block"><mml:mi>w</mml:mi><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mspace width="thickmathspace" /><mml:msub><mml:mi>G</mml:mi><mml:mi>A</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi>A</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>I</mml:mi><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:mspace width="thickmathspace" /><mml:msub><mml:mi>W</mml:mi><mml:mi>A</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo stretchy="false">[</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy="false">]</mml:mo><mml:mspace width="thickmathspace" /><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mspace width="thickmathspace" /><mml:mspace width="thickmathspace" /><mml:msub><mml:mi>c</mml:mi><mml:mi>A</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi>A</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>I</mml:mi><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:mspace width="thickmathspace" /><mml:msub><mml:mi>W</mml:mi><mml:mi>A</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mspace width="thickmathspace" /><mml:mo stretchy="false">[</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">]</mml:mo><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /></mml:math>
</disp-formula></p>
<p>Therefore, the Loss function can be defined as follows:<disp-formula id="eqn-3"><label>(3)</label>
<mml:math id="mml-eqn-3" display="block"><mml:mi>L</mml:mi><mml:mi>o</mml:mi><mml:mi>s</mml:mi><mml:mi>s</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mspace width="thickmathspace" /><mml:mi>A</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>I</mml:mi><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:mspace width="thickmathspace" /><mml:msub><mml:mi>W</mml:mi><mml:mi>A</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo stretchy="false">[</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy="false">]</mml:mo><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:mspace width="thickmathspace" /><mml:mi>A</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>I</mml:mi><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:mspace width="thickmathspace" /><mml:msub><mml:mi>W</mml:mi><mml:mi>A</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo stretchy="false">[</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">]</mml:mo><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:mspace width="thickmathspace" /><mml:mi>G</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math>
</disp-formula>To train the model we use the minimization function in <xref ref-type="disp-formula" rid="eqn-4">Eq. (4)</xref>.<disp-formula id="eqn-4"><label>(4)</label>
<mml:math id="mml-eqn-4" display="block"><mml:msubsup><mml:mi>W</mml:mi><mml:mi>A</mml:mi><mml:mo>&#x2217;</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mi>A</mml:mi></mml:msub></mml:mrow></mml:msub><mml:mi>L</mml:mi><mml:mi>o</mml:mi><mml:mi>s</mml:mi><mml:mi>s</mml:mi><mml:mspace width="thickmathspace" /><mml:mo stretchy="false">(</mml:mo><mml:mi>A</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>I</mml:mi><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:mspace width="thickmathspace" /><mml:msub><mml:mi>W</mml:mi><mml:mi>A</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo stretchy="false">[</mml:mo><mml:mn>0</mml:mn><mml:mo stretchy="false">]</mml:mo><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:mspace width="thickmathspace" /><mml:mi>A</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>I</mml:mi><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:mspace width="thickmathspace" /><mml:msub><mml:mi>W</mml:mi><mml:mi>A</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo stretchy="false">[</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">]</mml:mo><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:mspace width="thickmathspace" /><mml:mi>G</mml:mi></mml:math>
</disp-formula></p>
<sec id="s2_1">
<label>2.1</label>
<title>Different Size Face Detection Algorithm (DS-Face)</title>
<p>We propose a one-stage face detection method that incorporates the fusion of multi-scale features. We present an algorithm called DS-Face to facilitate the detection of faces of different, smaller sizes. The training data are utilized at multiple scales to enhance the prediction accuracy. For an input image <italic>Im</italic>, we use <italic>G<sub>D</sub></italic> to represent the bounding boxes and <italic>c<sub>D</sub></italic> the confidence scores computed by the different-size face detection algorithm, where <italic>D</italic> represents the algorithm and <italic>W<sub>D</sub></italic> its weight, as shown in <xref ref-type="disp-formula" rid="eqn-5">Eq. (5)</xref>.<disp-formula id="eqn-5"><label>(5)</label>
<mml:math id="mml-eqn-5" display="block"><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>D</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mi>D</mml:mi></mml:msub><mml:mspace width="thickmathspace" /></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>D</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>I</mml:mi><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:mspace width="thickmathspace" /><mml:msub><mml:mi>W</mml:mi><mml:mi>D</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math>
</disp-formula></p>
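<p>Multi-scale inference of this kind can be sketched as follows. This Python snippet is a schematic under assumed interfaces (the <monospace>detector</monospace> callable and the scale set are hypothetical, not part of DS-Face as published): the image is rescaled, a single-scale detector is run at each scale, and the resulting boxes are mapped back to original-image coordinates.</p>

```python
def multi_scale_detect(image_size, detector, scales=(0.5, 1.0, 2.0)):
    """Run a single-scale detector at several image scales and map the
    resulting boxes back to original-image coordinates.

    detector((w, h)) is assumed to return boxes (x1, y1, x2, y2, conf)
    in the coordinates of the rescaled w x h image.
    """
    w, h = image_size
    boxes = []
    for s in scales:
        for (x1, y1, x2, y2, conf) in detector((int(w * s), int(h * s))):
            # Divide by the scale factor to return to original coordinates.
            boxes.append((x1 / s, y1 / s, x2 / s, y2 / s, conf))
    return boxes
```

Upscaling effectively enlarges small faces so the base detector can find them, which is the motivation behind DS-Face.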
</sec>
<sec id="s2_2">
<label>2.2</label>
<title>Low and High Resolution Mask-Wearing Detection: LHMD</title>
<p>For the face anchors <italic>G<sub>A</sub></italic> and <italic>G<sub>D</sub></italic> identified by the proposed algorithms, we perform a merging procedure using the following scheme:</p>
<fig id="fig-12"><graphic mimetype="image" mime-subtype="png" xlink:href="IASC_22359-fig-12.png"/></fig>
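<p>The merging scheme itself is given only as a figure; one plausible sketch in Python is below. The IoU threshold and the keep-the-higher-confidence rule are our assumptions for illustration, not necessarily the authors' exact procedure.</p>

```python
def merge_detections(boxes_a, boxes_d, iou_thr=0.5):
    """Merge R-CNN boxes (G_A) with DS-Face boxes (G_D).

    Take the union of both sets; when a DS-Face box overlaps an existing
    box heavily, keep whichever has the higher confidence score.
    Boxes are (x1, y1, x2, y2, conf).
    """
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
        ua = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
        return inter / ua if ua else 0.0

    merged = list(boxes_a)
    for d in boxes_d:
        dup = next((i for i, m in enumerate(merged) if iou(m, d) > iou_thr), None)
        if dup is None:
            merged.append(d)          # a face only DS-Face found
        elif d[4] > merged[dup][4]:
            merged[dup] = d           # same face, higher-confidence box wins
    return merged
```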
</sec>
<sec id="s2_3">
<label>2.3</label>
<title>CNN for Mask-Wearing Detection</title>
<p>The three algorithms, R-CNN, DS-Face and LHMD, can determine that the image encloses a face but cannot determine whether a mask is present. Therefore, we add a CNN to identify mask-wearing. For anchor boxes identified by the different-size face detection algorithm, we use a CNN for mask-wearing detection. The face region is enlarged and used as input to the CNN for prediction. The confidence score indicates whether a mask is worn on the identified face, as computed by the CNN in <xref ref-type="disp-formula" rid="eqn-6">Eq. (6)</xref>.<disp-formula id="eqn-6"><label>(6)</label>
<mml:math id="mml-eqn-6" display="block"><mml:mi>S</mml:mi><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:msub><mml:mi>e</mml:mi><mml:mi>p</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi>P</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mspace width="thickmathspace" /><mml:msub><mml:mi>G</mml:mi><mml:mi>D</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:mspace width="thickmathspace" /><mml:mi>I</mml:mi><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:mspace width="thickmathspace" /><mml:msub><mml:mi>W</mml:mi><mml:mi>P</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math>
</disp-formula><italic>P</italic> represents the forward propagation function of the CNN, composed of two convolution layers followed by a fully connected (FC) layer. The loss function is presented in <xref ref-type="disp-formula" rid="eqn-7">Eq. (7)</xref>.<disp-formula id="eqn-7"><label>(7)</label>
<mml:math id="mml-eqn-7" display="block"><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>z</mml:mi><mml:mi>e</mml:mi><mml:mspace width="thickmathspace" /><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>L</mml:mi><mml:mi>o</mml:mi><mml:mi>s</mml:mi><mml:mi>s</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>G</mml:mi><mml:mi>D</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:mspace width="thickmathspace" /><mml:mi>P</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>G</mml:mi><mml:mi>D</mml:mi></mml:msub><mml:mspace width="thickmathspace" /><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:mi>I</mml:mi><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:mspace width="thickmathspace" /><mml:msub><mml:mi>W</mml:mi><mml:mi>P</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:mspace width="thickmathspace" /><mml:mi>G</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math>
</disp-formula></p>
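<p>The step of enlarging the detected face region before feeding it to the mask-classification CNN can be sketched as follows. The enlargement factor of 1.3 is an assumed value for illustration; the paper does not specify one.</p>

```python
def enlarge_region(box, image_w, image_h, factor=1.3):
    """Expand a face box (x1, y1, x2, y2) around its centre before it is
    cropped and fed to the mask-classification CNN, clamped to the image."""
    x1, y1, x2, y2 = box
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    hw, hh = (x2 - x1) * factor / 2, (y2 - y1) * factor / 2
    return (max(0.0, cx - hw), max(0.0, cy - hh),
            min(float(image_w), cx + hw), min(float(image_h), cy + hh))
```

The enlarged crop gives the classifier context around the chin and nose, where the mask boundary lies.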
</sec>
</sec>
<sec id="s3">
<label>3</label>
<title>Experimental Results</title>
<p>A large dataset of mask-wearing faces is essential for training deep learning networks to classify mask-wearing and non-mask-wearing faces. In [<xref ref-type="bibr" rid="ref-18">18</xref>], the authors built a public dataset of mask-wearing faces (CMFD) and non-mask-wearing faces (IMFD) in MaskedFace-Net. Its 137,016 face images include faces of different sizes and are accessible at [<xref ref-type="bibr" rid="ref-18">18</xref>]. Half of them are mask-wearing faces and the other half non-mask-wearing faces.</p>
<p>The performance of our proposed method is evaluated using multiple criteria. We compute the true positives (TP), false positives (FP), false negatives (FN), and the recall and precision rates. A six-fold cross-validation scheme is utilized for the experiments: the dataset is randomly partitioned into six folds. The training set contains half of the images (three folds, with both masked and unmasked faces), the validation subset contains 2/6, and the testing subset contains 1/6 of the whole database.</p>
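<p>The 3/6 &#x2013; 2/6 &#x2013; 1/6 partition and the precision/recall computations described above can be sketched in Python as follows (an illustrative snippet; the shuffling seed is arbitrary):</p>

```python
import numpy as np

def six_fold_split(n, rng=None):
    """Shuffle n sample indices and split them 3/6 train, 2/6 validation,
    1/6 test, matching the partition used in the experiments."""
    rng = rng if rng is not None else np.random.default_rng(0)
    idx = rng.permutation(n)
    fold = n // 6
    return idx[:3 * fold], idx[3 * fold:5 * fold], idx[5 * fold:]

def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    return tp / (tp + fp), tp / (tp + fn)
```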
<sec id="s3_1">
<label>3.1</label>
<title>Analysis</title>
<p>The experimental results of the R-CNN for identifying faces are depicted in <xref ref-type="fig" rid="fig-5">Figs. 5</xref> and <xref ref-type="fig" rid="fig-6">6</xref>. From the results, we conclude that the R-CNN can identify large faces but not small faces, while the DS-Face algorithm combined with LHMD plus the CNN can identify small faces at high and low resolution, as shown in <xref ref-type="fig" rid="fig-6">Fig. 6</xref>. <xref ref-type="fig" rid="fig-7">Fig. 7</xref> depicts the results of combining the three algorithms DS-Face &#x002B; LHMD &#x002B; R-CNN.</p>
<fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>R-CNN for identifying big faces</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="IASC_22359-fig-5.png"/>
</fig>
<fig id="fig-6">
<label>Figure 6</label>
<caption>
<title>Detecting small faces with high and low resolution by DS-Face combined with LHMD &#x002B; CNN</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="IASC_22359-fig-6.png"/>
</fig>
<fig id="fig-7">
<label>Figure 7</label>
<caption>
<title>DS-Face &#x002B; LHMD &#x002B; R-CNN</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="IASC_22359-fig-7.png"/>
</fig>
<p>A comparison of the ability of the three models to detect faces in a given area is depicted in <xref ref-type="fig" rid="fig-8">Fig. 8</xref>. The test included 2000 images from the dataset. The number of faces of different sizes and resolutions was counted for each model. We compare the ground truth (the actual number of faces in a given area, where the area is a P &#x00D7; P image region and P is the number of pixels) with the three models:<list list-type="simple"><list-item><label>1)</label>
<p>The R-CNN alone</p></list-item><list-item><label>2)</label>
<p>The R-CNN combined with the DS-Face</p></list-item><list-item><label>3)</label>
<p>The R-CNN with both the DS-Face and the LHMD models</p></list-item></list></p>
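The per-region counting described above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: face-box centers are bucketed into a grid of P &#x00D7; P pixel cells and counted per cell; the function name and sample coordinates are hypothetical.

```python
from collections import Counter

def count_faces_per_region(face_centers, p):
    """Bucket face-box centers into a grid of P x P pixel regions
    and count the faces falling in each region."""
    counts = Counter()
    for x, y in face_centers:
        cell = (int(x // p), int(y // p))  # (column, row) of the P x P cell
        counts[cell] += 1
    return counts

# Example: three detected face centers, bucketed into 128 x 128 pixel cells.
centers = [(10, 10), (20, 15), (300, 40)]
per_region = count_faces_per_region(centers, p=128)
```

Comparing such per-region counts against the ground-truth counts, cell by cell, yields the statistics plotted for each model.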
<fig id="fig-8">
<label>Figure 8</label>
<caption>
<title>Statistics of the ground truth against R-CNN, R-CNN&#x002B;DS-Face and R-CNN&#x002B;DS-Face&#x002B;LHMD algorithms</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="IASC_22359-fig-8.png"/>
</fig>
<p>The ground truth specified that there are 7883 large faces and 3170 small faces in the 2000-image dataset. The R-CNN model was able to detect 7700 large faces, a true positive rate of 97.5&#x0025;, and 1050 small faces, a true positive rate of 33&#x0025;. It was also observed that the false negative (FN) rate for large faces is 3.2&#x0025;, while the FN rate for small faces is around 60.5&#x0025;. These results were expected, because the R-CNN is trained mainly on large faces.</p>
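The rates above follow directly from the detection counts. As a minimal sketch (not the paper's evaluation code), the true positive and false negative rates can be computed as:

```python
def true_positive_rate(detected, ground_truth):
    """TPR (%): correctly detected faces over the ground-truth count."""
    return 100.0 * detected / ground_truth

def false_negative_rate(missed, ground_truth):
    """FNR (%): ground-truth faces the detector failed to find."""
    return 100.0 * missed / ground_truth

# Using the reported small-face counts: 1050 detected out of 3170.
small_tpr = true_positive_rate(1050, 3170)
```

Applied to the reported small-face counts, this yields roughly the 33% figure quoted above.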
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Accuracy of the Mask Detection Algorithm</title>
<p>In this section, we evaluate the accuracy of the mask detection algorithm. We found that increasing the number of training steps helps the model avoid overfitting, so we ran several sets of experiments to observe the accuracy under various numbers of training steps. In the training phase, we labeled each face in the training dataset as wearing or not wearing a mask, and utilized images from the dataset as a training subset to assess the accuracy of the model. With 10,000 training steps, testing on the images in the training subset, the accuracy was not satisfactory: most of the small faces were missed. We note that the precision of the model is evaluated by the number of detected faces (large and small). We repeated the experiment with 20,000 training steps and found that the number of detected faces in a testing subset of 1500 images stayed at about 1900 targets. As the number of training steps increased, the count of detected faces grew slightly; with 50,000 training steps, it reached 2511 faces. At the 50,000-step milestone, we measured the precision and recall rates and found them to be 88.3&#x0025; and 86.9&#x0025;, respectively. By increasing the training steps to 75,000, the number of detected faces reached around 3000, with a precision rate of 93.2&#x0025; and a recall of 91.3&#x0025;. The results for different numbers of training steps are listed in <xref ref-type="table" rid="table-1">Tab. 1</xref>. We also display the ROC curves on the training image set in <xref ref-type="fig" rid="fig-9 fig-10 fig-11">Figs. 9&#x2013;11</xref>.</p>
<fig id="fig-9">
<label>Figure 9</label>
<caption>
<title>ROC of mask detection algorithm preceded by the R-CNN alone</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="IASC_22359-fig-9.png"/>
</fig>
<fig id="fig-10">
<label>Figure 10</label>
<caption>
<title>ROC of the mask detection algorithm preceded by both R-CNN and DS-Face</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="IASC_22359-fig-10.png"/>
</fig>
<fig id="fig-11">
<label>Figure 11</label>
<caption>
<title>ROC of the mask detection algorithm preceded by R-CNN with both the DS-Face and the LHMD algorithms</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="IASC_22359-fig-11.png"/>
</fig>
<table-wrap id="table-1"><label>Table 1</label>
<caption>
<title>Number of training steps and accuracy of the model</title></caption>
<table frame="hsides"><colgroup><col align="left"/><col align="left"/><col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Number of training steps</th>
<th align="left">Precision (&#x0025;)</th>
<th align="left">Recall (&#x0025;)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">10,000</td>
<td align="left">80.0</td>
<td align="left">72.4</td>
</tr>
<tr>
<td align="left">20,000</td>
<td align="left">83.5</td>
<td align="left">81.9</td>
</tr>
<tr>
<td align="left">50,000</td>
<td align="left">88.3</td>
<td align="left">86.9</td>
</tr>
<tr>
<td align="left">75,000</td>
<td align="left">93.2</td>
<td align="left">91.3</td>
</tr>
</tbody>
</table>
</table-wrap>
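The precision and recall figures in Tab. 1 follow the standard definitions over true positives, false positives, and false negatives. A minimal sketch (illustrative only; the counts below are hypothetical, chosen to reproduce the 50,000-step rates):

```python
def precision(tp, fp):
    """Fraction of reported detections that are true faces."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Fraction of ground-truth faces that were detected."""
    return tp / (tp + fn)

# Hypothetical counts consistent with the 50,000-step row of Tab. 1
# (precision 88.3%, recall 86.9%).
p_50k = precision(883, 117)
r_50k = recall(869, 131)
```
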
<p>The accuracy of the trained Small Face model of the DS-Face algorithm is aided by decreasing the scoring threshold to 0.5, which increases the ability of the Small Face model to localize small faces. DS-Face achieved a detection precision of 88.6&#x0025; for small faces, with a recall rate of 68.9&#x0025;. To measure the accuracy of the R-CNN, we cropped the images so that each contains only one target. We applied this cropping to 7,000 images from the training image set, obtaining over 30,000 images with a single face each. We utilized 25,000 images as training input for the R-CNN and the remaining 5,000 images for the validation phase to measure its accuracy. The 30,000 cropped images are partitioned into people with and without masks.</p>
<p>We utilized the cross-validation method [<xref ref-type="bibr" rid="ref-24">24</xref>,<xref ref-type="bibr" rid="ref-25">25</xref>], and the CNN is composed of six convolution layers, each followed by a pooling layer. The kernel size of the first and second convolution layers is 5 &#x00D7; 5, while that of the third and fourth convolution layers is 3 &#x00D7; 3. With this configuration, the precision of the classifier at the last layer reached 90.3&#x0025;. The ROC curve on the training set using the R-CNN alone is shown in <xref ref-type="fig" rid="fig-9">Fig. 9</xref>.</p>
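The six conv-plus-pooling blocks above can be sanity-checked with a small shape-propagation sketch. This is an assumption-laden illustration, not the authors' network definition: we assume 'same' padding for the convolutions, 2 &#x00D7; 2 max pooling with stride 2 after each block, a 128 &#x00D7; 128 input, and 3 &#x00D7; 3 kernels for the unspecified fifth and sixth layers.

```python
def feature_map_size(input_size, num_blocks):
    """Spatial size after num_blocks of (same-padding conv + 2x2 pool).

    With 'same' padding, the convolution preserves the spatial size,
    so each block simply halves it via the 2x2 stride-2 pooling.
    """
    size = input_size
    for _ in range(num_blocks):
        size //= 2
    return size

# Kernel sizes per block: 5x5 for blocks 1-2 and 3x3 for blocks 3-4 (as in
# the text); 3x3 is ASSUMED for blocks 5-6. Kernel size does not affect the
# spatial size under 'same' padding, but is listed for completeness.
kernels = [5, 5, 3, 3, 3, 3]
final = feature_map_size(128, len(kernels))
```

Under these assumptions, a 128 &#x00D7; 128 crop shrinks to a 2 &#x00D7; 2 feature map before the final classification layer.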
<p>The ROC curve of the R-CNN combined with DS-Face shows that this combination enhances the classification accuracy: the area under its ROC curve increased from 0.83 to 0.88, as shown in <xref ref-type="fig" rid="fig-10">Fig. 10</xref>.</p>
<p>The model that combines the R-CNN, DS-Face, and the LHMD model has the largest area under the ROC curve of the three models, with a coverage area of 0.91, making it the best among the three.</p>
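The reported areas under the ROC curves (0.83, 0.88, 0.91) can be approximated from sampled (FPR, TPR) points with the trapezoidal rule. A minimal, library-free sketch (the point lists in the examples are hypothetical, not the paper's measured curves):

```python
def roc_auc(points):
    """Trapezoidal area under a ROC curve.

    points: list of (fpr, tpr) pairs sorted by increasing fpr,
    starting at (0, 0) and ending at (1, 1).
    """
    area = 0.0
    prev_x, prev_y = points[0]
    for x, y in points[1:]:
        area += (x - prev_x) * (y + prev_y) / 2.0  # one trapezoid slice
        prev_x, prev_y = x, y
    return area
```

A diagonal curve (random classifier) gives 0.5, while a curve that jumps straight to TPR = 1 at FPR = 0 gives the perfect score of 1.0.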
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>Comparison of the Three Models</title>
<p>We performed a comparison between the three models using the metrics TPR, FPR, FNR, precision, and recall. The results are demonstrated in <xref ref-type="table" rid="table-2">Tabs. 2</xref> and <xref ref-type="table" rid="table-3">3</xref>. The tables show that the third model, which combines the R-CNN with our proposed DS-Face and LHMD algorithms, performs better in terms of all the metrics.</p>
<table-wrap id="table-2"><label>Table 2</label>
<caption>
<title>The metrics TPR, FPR, FNR</title></caption>
<table frame="hsides"><colgroup><col align="left"/><col align="left"/><col align="left"/><col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Model</th>
<th align="left">True positive rate (&#x0025;)</th>
<th align="left">False positive rate (&#x0025;)</th>
<th align="left">False negative rate (&#x0025;)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Mask detection algorithm preceded by R-CNN alone</td>
<td align="left">84.7</td>
<td align="left">43.1</td>
<td align="left">52.7</td>
</tr>
<tr>
<td align="left">Mask detection algorithm preceded by R-CNN and DS-Face algorithms</td>
<td align="left">89.8</td>
<td align="left">25.8</td>
<td align="left">21.9</td>
</tr>
<tr>
<td align="left">Mask detection algorithm preceded by R-CNN with DS-Face and LHMD algorithms</td>
<td align="left">95.6</td>
<td align="left">4.7</td>
<td align="left">4.7</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="table-3"><label>Table 3</label>
<caption>
<title>Comparison of precision and recall</title></caption>
<table frame="hsides"><colgroup><col align="left"/><col align="left"/><col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Model</th>
<th align="left">Precision (&#x0025;)</th>
<th align="left">Recall (&#x0025;)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Mask detection algorithm preceded by R-CNN alone</td>
<td align="left">85.4</td>
<td align="left">27.3</td>
</tr>
<tr>
<td align="left">Mask detection algorithm preceded by R-CNN and DS-Face algorithms</td>
<td align="left">93.5</td>
<td align="left">48.1</td>
</tr>
<tr>
<td align="left">Mask detection algorithm preceded by R-CNN with DS-Face and LHMD algorithms</td>
<td align="left">95.5</td>
<td align="left">57.5</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Conclusion</title>
<p>In this research, we propose a framework that incorporates two deep learning procedures to develop a technique for mask-wearing recognition, especially in complex scenes and images of various resolutions. An R-CNN is used to detect face regions, and it is further enhanced to detect faces of different sizes, even smaller targets. To increase the accuracy of our R-CNN, we integrated two more algorithms: DS-Face, to increase the accuracy of detecting faces of different sizes, especially small faces, and LHMD, to increase the accuracy for low-resolution targets. The combined model improves the accuracy to a great extent. Our experiments proved that the DS-Face with the mask classifier CNN overcomes the inadequacies that the R-CNN model alone has when trying to detect small faces. A single deep learning model, the R-CNN combined with the CNN for mask detection, did not perform well on its own because it misses many small faces and low-resolution targets. We therefore combined two other algorithms with the base R-CNN to detect more faces and achieve better results. Our model can run faster, and in real time, by utilizing GPUs and distributed computing for higher processing speed, especially in complex scenarios.</p>
</sec>
</body>
<back><fn-group>
<fn fn-type="other">
<p><bold>Funding Statement:</bold> This research was funded by the Deanship of Scientific Research at Princess Nourah bint Abdulrahman University through the Fast-track Research Funding Program.</p>
</fn>
<fn fn-type="conflict">
<p><bold>Conflicts of Interest:</bold> The authors declare that they have no conflicts of interest to report regarding the present study.</p>
</fn>
</fn-group>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Hui</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Azhar</surname></string-name> and <string-name><given-names>T.</given-names> <surname>Madani</surname></string-name></person-group>, &#x201C;<article-title>The continuing 2019-nCoV epidemic threat of novel coronaviruses to global health&#x2014;the latest 2019 novel coronavirus outbreak in wuhan China</article-title>,&#x201D; in <conf-name>Proc. of Int. Journal of Infectious Diseases</conf-name>, <conf-loc>Cleveland, Ohio</conf-loc>, vol. <volume>91</volume>, no. <issue>1</issue>, pp. <fpage>264</fpage>&#x2013;<lpage>266</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Sun</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Highsmith</surname></string-name></person-group>, &#x201C;<article-title>Performance comparison of deep-learning techniques for recognizing birds in aerial images</article-title>,&#x201D; in <conf-name>Proc. IEEE Third Int. Conf. on Data Science in Cyberspace (DSC)</conf-name>, <conf-loc>Athens, Greece</conf-loc>, vol. <volume>3</volume>, pp. <fpage>317</fpage>&#x2013;<lpage>324</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Kundu</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>Zhu</surname></string-name></person-group>, &#x201C;<article-title>3d object proposals for accurate object class detection</article-title>,&#x201D; <source>Advances in Neural Information Processing Systems</source>, vol. <volume>2</volume>, no. <issue>1</issue>, pp. <fpage>424</fpage>&#x2013;<lpage>432</lpage>, <year>2015</year>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>W.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Anguelov</surname></string-name> and <string-name><given-names>D.</given-names> <surname>Erhan</surname></string-name></person-group>, &#x201C;<article-title>SSD: Single shot detector</article-title>,&#x201D; in <conf-name>Proc. of European Conference on Computer Vision</conf-name>, <conf-loc>Paris, France</conf-loc>, vol. <volume>4</volume>, pp. <fpage>21</fpage>&#x2013;<lpage>37</lpage>, <year>2016</year>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>F.</given-names> <surname>Yang</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Choi</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>Lin</surname></string-name></person-group>, &#x201C;<article-title>Exploit all the layers: Fast and accurate CNN object detector with scale dependent pooling and cascaded rejection classifiers</article-title>,&#x201D; in <conf-name>Proc. of IEEE Conf. on Computer Vision and Pattern Recognition (CVPR)</conf-name>, <conf-loc>Cairo, Egypt</conf-loc>, vol. <volume>13</volume>, pp. <fpage>210</fpage>&#x2013;<lpage>221</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Szegedy</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Ioffe</surname></string-name> and <string-name><given-names>V.</given-names> <surname>Vanhoucke</surname></string-name></person-group>, &#x201C;<article-title>Inception-v4 inception-resnet and the impact of residual connections on learning</article-title>,&#x201D; <source>International Journal of Artificial Intelligence</source>, vol. <volume>2</volume>, no. <issue>1</issue>, pp. <fpage>123</fpage>&#x2013;<lpage>134</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Sadiq</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Yu</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Yuan</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Fan</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Rehman</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Motor imagery EEG signals classification based on mode amplitude and frequency components using empirical wavelet transform</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>7</volume>, no. <issue>6</issue>, pp. <fpage>127678</fpage>&#x2013;<lpage>127692</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>G.</given-names> <surname>Cao</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Xie</surname></string-name> and <string-name><given-names>W.</given-names> <surname>Yang</surname></string-name></person-group>, &#x201C;<article-title>Feature-fused SSD: Fast detection for small objects</article-title>,&#x201D; in <conf-name>Proc. Ninth Int. Conf. on Graphic and Image Processing</conf-name>, <conf-loc>New York, NY</conf-loc>, vol. <volume>9</volume>, pp. <fpage>106</fpage>&#x2013;<lpage>113</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Sadiq</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Yu</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Yuan</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Zeming</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Rehman</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Motor imagery EEG signals decoding by multivariate empirical wavelet transform-based framework for robust brain&#x2013;Computer interfaces</article-title>,&#x201D; in <source>IEEE Access</source>, vol. <volume>7</volume>, no. <issue>9</issue>, pp. <fpage>171431</fpage>&#x2013;<lpage>171451</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>G.</given-names> <surname>Takacs</surname></string-name>, <string-name><given-names>V.</given-names> <surname>Chandrasekhar</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Tsai</surname></string-name></person-group>, &#x201C;<article-title>Unified real-time tracking and recognition with rotation-invariant fast features</article-title>,&#x201D; in <conf-name>Proc. IEEE Computer Society Conf. on Computer Vision and Pattern Recognition</conf-name>, <conf-loc>Bern, Germany</conf-loc>, vol. <volume>9</volume>, pp. <fpage>934</fpage>&#x2013;<lpage>941</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Zou</surname></string-name> and <string-name><given-names>Z.</given-names> <surname>Shi</surname></string-name></person-group>, &#x201C;<article-title>Ship detection in spaceborne optical image with SVD networks</article-title>,&#x201D; <source>IEEE Transactions on Geoscience and Remote Sensing</source>, vol. <volume>54</volume>, no. <issue>10</issue>, pp. <fpage>5832</fpage>&#x2013;<lpage>5845</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>G.</given-names> <surname>Cheng</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Zhou</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Han</surname></string-name></person-group>, &#x201C;<article-title>Learning rotation-invariant convolutional neural networks for object detection in VHR optical remote sensing images</article-title>,&#x201D; <source>IEEE Transactions on Geoscience and Remote Sensing</source>, vol. <volume>54</volume>, no. <issue>12</issue>, pp. <fpage>7405</fpage>&#x2013;<lpage>7415</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>W.</given-names> <surname>Ouyang</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Wang</surname></string-name> and <string-name><given-names>X.</given-names> <surname>Zeng</surname></string-name></person-group>, &#x201C;<article-title>Deepid-net: Deformable deep-convolutional neural networks for object detection</article-title>,&#x201D; in <conf-name>Proc. of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name>, <conf-loc>Ostrava, CZ</conf-loc>, vol. <volume>10</volume>, pp. <fpage>2403</fpage>&#x2013;<lpage>2412</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>P.</given-names> <surname>Sermanet</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>LeCun</surname></string-name></person-group>, &#x201C;<article-title>Traffic sign recognition with multi-scale convolutional networks</article-title>,&#x201D; in <conf-name>Proc. of Int. Joint Conf. on Neural Networks</conf-name>, <conf-loc>Napoli, Italy</conf-loc>, vol. <volume>7</volume>, pp. <fpage>2809</fpage>&#x2013;<lpage>2813</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Zhu</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Liang</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Zhang</surname></string-name></person-group>, &#x201C;<article-title>Traffic-sign detection and classification in the wild</article-title>,&#x201D; <source>Computer Vision and Pattern Recognition</source>, vol. <volume>2</volume>, no. <issue>3</issue>, pp. <fpage>2110</fpage>&#x2013;<lpage>2118</lpage>, <year>2016</year>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Jin</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Fu</surname></string-name> and <string-name><given-names>C.</given-names> <surname>Zhang</surname></string-name></person-group>, &#x201C;<article-title>Traffic sign recognition with hinge loss trained convolutional neural networks</article-title>,&#x201D; <source>IEEE Transactions on Intelligent Transportation Systems</source>, vol. <volume>15</volume>, no. <issue>5</issue>, pp. <fpage>1991</fpage>&#x2013;<lpage>2000</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Sagonas</surname></string-name>, <string-name><given-names>E.</given-names> <surname>Antonakos</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Tzimiropoulos</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Zafeiriou</surname></string-name></person-group>, &#x201C;<article-title>300 faces in-the-wild challenge: Database and results</article-title>,&#x201D; <source>Image and Vision Computing</source>, vol. <volume>47</volume>, no. <issue>2</issue>, pp. <fpage>3</fpage>&#x2013;<lpage>18</lpage>, <year>2016</year>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Cabani</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Hammoudi</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Benhabiles</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Melkemi</surname></string-name></person-group>, &#x201C;<article-title>Masked face-net&#x2013;A dataset of correctly/incorrectly masked face images in the context of COVID-19</article-title>,&#x201D; <source>Smart Health</source>, vol. <volume>19</volume>, no. <issue>1</issue>, pp. <fpage>125</fpage>&#x2013;<lpage>137</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>W.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Anguelov</surname></string-name> and <string-name><given-names>D.</given-names> <surname>Erhan</surname></string-name></person-group>, &#x201C;<article-title>SSD: Single shot multibox detector</article-title>,&#x201D; <source>Computer Vision</source>, vol. <volume>1</volume>, no. <issue>2</issue>, pp. <fpage>21</fpage>&#x2013;<lpage>37</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>G.</given-names> <surname>He</surname></string-name> and <string-name><given-names>H.</given-names> <surname>Chen</surname></string-name></person-group>, &#x201C;<article-title>Scale adaptive proposal network for object detection in remote sensing images</article-title>,&#x201D; <source>IEEE Geoscience and Remote Sensing Letters</source>, vol. <volume>16</volume>, no. <issue>6</issue>, pp. <fpage>864</fpage>&#x2013;<lpage>868</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Howard</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Zhu</surname></string-name> and <string-name><given-names>B.</given-names> <surname>Chen</surname></string-name></person-group>, &#x201C;<article-title>Mobilenets: Efficient convolutional neural networks for mobile vision applications</article-title>,&#x201D; <source>Computer Vision</source>, vol. <volume>2</volume>, no. <issue>1</issue>, pp. <fpage>214</fpage>&#x2013;<lpage>224</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>I.</given-names> <surname>Freeman</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Roese-Koerner</surname></string-name> and <string-name><given-names>A.</given-names> <surname>Kummert</surname></string-name></person-group>, &#x201C;<article-title>Effnet: An efficient structure for convolutional neural networks</article-title>,&#x201D; <source>Image Processing</source>, vol. <volume>2</volume>, no. <issue>1</issue>, pp. <fpage>310</fpage>&#x2013;<lpage>321</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Redmon</surname></string-name> and <string-name><given-names>A.</given-names> <surname>Farhadi</surname></string-name></person-group>, &#x201C;<article-title>YOLO9000: Better faster stronger</article-title>,&#x201D; <source>Computer Vision and Pattern Recognition</source>, vol. <volume>2</volume>, no. <issue>1</issue>, pp. <fpage>117</fpage>&#x2013;<lpage>129</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Sadiq</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Yu</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Yuan</surname></string-name> and <string-name><given-names>Z.</given-names> <surname>Aziz</surname></string-name></person-group>, &#x201C;<article-title>Motor imagery BCI classification based on novel two-dimensional modelling in empirical wavelet transform</article-title>,&#x201D; <source>Electronics Letters</source>, vol. <volume>12</volume>, no. <issue>2</issue>, pp. <fpage>1367</fpage>&#x2013;<lpage>1369</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Sadiq</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Yu</surname></string-name> and <string-name><given-names>Z.</given-names> <surname>Yuan</surname></string-name></person-group>, &#x201C;<article-title>Exploiting dimensionality reduction and neural network techniques for the development of expert brain&#x2013;computer interfaces</article-title>,&#x201D; <source>Expert Systems with Applications</source>, vol. <volume>16</volume>, no. <issue>1</issue>, pp. <fpage>123</fpage>&#x2013;<lpage>134</lpage>, <year>2021</year>.</mixed-citation></ref>
</ref-list>
</back>
</article>