<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">42466</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2023.042466</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Novel Rifle Number Recognition Based on Improved YOLO in Military Environment</article-title>
<alt-title alt-title-type="left-running-head">Novel Rifle Number Recognition Based on Improved YOLO in Military Environment</alt-title>
<alt-title alt-title-type="right-running-head">Novel Rifle Number Recognition Based on Improved YOLO in Military Environment</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Kwon</surname><given-names>Hyun</given-names></name><xref ref-type="aff" rid="aff-1">1</xref><email>hkwon.cs@gmail.com</email></contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Lee</surname><given-names>Sanghyun</given-names></name><xref ref-type="aff" rid="aff-2">2</xref></contrib>
<aff id="aff-1"><label>1</label><institution>Department of Artificial Intelligence and Data Science, Korea Military Academy</institution>, <addr-line>Seoul</addr-line>, <country>Korea</country></aff>
<aff id="aff-2"><label>2</label><institution>Graduate School of Information Security, Korea Advanced Institute of Science and Technology</institution>, <addr-line>Daejeon</addr-line>, <country>Korea</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Hyun Kwon. Email: <email>hkwon.cs@gmail.com</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2024</year></pub-date>
<pub-date date-type="pub" publication-format="electronic"><day>30</day>
<month>1</month>
<year>2024</year></pub-date>
<volume>78</volume>
<issue>1</issue>
<fpage>249</fpage>
<lpage>263</lpage>
<history>
<date date-type="received"><day>31</day><month>5</month><year>2023</year></date>
<date date-type="accepted"><day>25</day><month>7</month><year>2023</year></date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2024 Kwon and Lee</copyright-statement>
<copyright-year>2024</copyright-year>
<copyright-holder>Kwon and Lee</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_42466.pdf"></self-uri>
<abstract>
<p>Deep neural networks perform well in image recognition, object recognition, pattern analysis, and speech recognition. In military applications, deep neural networks can detect equipment and recognize objects. For military equipment management, it is necessary to detect and recognize the numbers of rifles, an important piece of equipment, using deep neural networks. There have been no previous studies on the detection of real rifle numbers using real rifle image datasets. In this study, we propose a method for detecting and recognizing rifle numbers when rifle image data are insufficient. The proposed method is designed to improve the recognition rate for a specific dataset using data fusion and transfer learning. In the proposed method, real rifle images and existing digit images are fused as training data, and the final layer is transferred to the YOLOv5 model. The detection and recognition performance for rifle numbers was improved and analyzed using rifle image and numerical datasets. As the experimental environment, we used actual rifle image data (the K-2 rifle) and numeric image datasets, with PyTorch as the machine learning library. Experimental results show that the proposed method achieves 84.42&#x0025; accuracy, 73.54&#x0025; precision, 81.81&#x0025; recall, and a 77.46&#x0025; F1-score in detecting and recognizing rifle numbers, demonstrating that it is effective for this task.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Machine learning</kwd>
<kwd>deep neural network</kwd>
<kwd>rifle number recognition</kwd>
<kwd>detection</kwd>
</kwd-group>
<funding-group>
<award-group id="awg1">
<funding-source>Future Strategy and Technology Research Institute</funding-source>
<award-id>RN: 23-AI-04</award-id>
</award-group>
<award-group id="awg2">
<funding-source>Hwarang-Dae Research Institute</funding-source>
<award-id>2023B1015</award-id>
</award-group>
<award-group id="awg3">
<funding-source>Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education</funding-source>
<award-id>2021R1I1A1A01040308</award-id>
</award-group>
</funding-group>
</article-meta>
</front>
<body>
<sec id="s1"><label>1</label><title>Introduction</title>
<p>Deep neural networks [<xref ref-type="bibr" rid="ref-1">1</xref>] perform well in image recognition [<xref ref-type="bibr" rid="ref-2">2</xref>&#x2013;<xref ref-type="bibr" rid="ref-5">5</xref>] and speech recognition [<xref ref-type="bibr" rid="ref-6">6</xref>&#x2013;<xref ref-type="bibr" rid="ref-9">9</xref>]. Deep neural networks have been applied to closed-circuit television (CCTV) systems [<xref ref-type="bibr" rid="ref-10">10</xref>&#x2013;<xref ref-type="bibr" rid="ref-12">12</xref>] to enhance real-time object recognition: linked to CCTV, a deep neural network can recognize a specific person or object and provide the recognized information to the observer. Military situations in particular impose many surveillance requirements, but because the number of surveillance personnel is small, automated detection using CCTV is necessary; a system is needed in which a deep neural network installed on such CCTVs detects objects by itself and issues warnings without human intervention. In military environments, object detection and recognition can also be used to manage military equipment. A military environment contains various types of equipment [<xref ref-type="bibr" rid="ref-13">13</xref>], and an automated management system is required. Among these pieces of equipment, managing the main firearm, the rifle, is especially important, and rifles are currently managed by manually writing down the rifle number. In this study, we propose a method for automatically detecting and recognizing rifle numbers.</p>
<p>In addition, the amount of data is insufficient because data collection is limited owing to security issues in military environments. Therefore, methods are needed that improve rifle number recognition when the amount of data is insufficient. Accordingly, we improve the detection performance for rifle numbers by combining rifle images published on the Internet with other types of numeric image datasets, such as the Modified National Institute of Standards and Technology (MNIST) database [<xref ref-type="bibr" rid="ref-14">14</xref>].</p>
<p>In this study, we propose a method for detecting rifle numbers using deep neural networks in situations where actual rifle image data are insufficient. The proposed method is designed to improve the recognition rate for a specific dataset using data fusion and transfer learning. Real rifle images and existing digit images are fused as training data, and the final layer is transferred to the YOLOv5 model. The method detects the area corresponding to the rifle number and then recognizes the digits within the detected area. We further improve performance by exploiting other datasets when actual rifle data are insufficient. The contributions of this study are as follows. We propose a rifle number detection method that learns from a real rifle dataset fused with other numerical datasets when real rifle data are scarce. For the experimental environment, we used an actual K-2 rifle dataset fused with various numerical datasets and comparatively analyzed the performance.</p>
<p>The remainder of this paper is organized as follows. Related studies are explained in <xref ref-type="sec" rid="s2">Section 2</xref>. In <xref ref-type="sec" rid="s3">Section 3</xref>, the methodology is described. In <xref ref-type="sec" rid="s4">Section 4</xref>, the experiments and evaluations are explained. <xref ref-type="sec" rid="s5">Section 5</xref> concludes the paper.</p>
</sec>
<sec id="s2"><label>2</label><title>Related Work</title>
<p>This section describes deep neural networks, object recognition using deep learning, and data augmentation methods.</p>
<sec id="s2_1"><label>2.1</label><title>YOLO Model</title>
<p>The YOLO ("You Only Look Once") model is a deep learning model that detects objects by processing the entire image in a single pass, which makes it possible to detect both the position of an object in an image and its class. The YOLO model can be divided into object location detection and object classification.</p>
<p>Object location detection uses a rectangular bounding box whose extent is determined by the coordinates of the four vertices enclosing the detected object. Fitting the bounding box is a regression problem: the model learns an appropriate object location by reducing the difference between the ground-truth and predicted values.</p>
<p>In terms of object classification, a deep neural network is used. The structure of a deep neural network comprises a combination of nodes organized into an input layer, hidden layers, and an output layer. In the input layer, a value corresponding to the input data is assigned to each node, and the result of linear operations such as multiplication and addition is passed through an activation function. If the value obtained through the activation function is less than a certain threshold, 0 is produced; if it is above the threshold, 1 or the current value is produced, depending on the activation function, such as the rectified linear unit (ReLU) [<xref ref-type="bibr" rid="ref-15">15</xref>] or sigmoid [<xref ref-type="bibr" rid="ref-16">16</xref>], and the computed value is transferred to the next layer. The hidden part is internally composed of several layers, and as the number of layers increases, so does the number of computations; before improvements in computing technology, operation speed was slow, but the development of parallel computation has since made these operations fast. In the hidden layers, if the value assigned to a node falls below the activation threshold, the result computed through multiplication and addition is set to 0. In the output layer, the value from the last hidden layer is converted through a softmax layer [<xref ref-type="bibr" rid="ref-17">17</xref>] into a probability value for each class; the probabilities of all classes sum to 1, and the largest probability corresponds to the class recognized for the input data. During training, each node's weight parameters are optimized using a cross-entropy loss function and gradient descent, based on the correspondence between the predicted value for each input and the actual class. Thus, once the deep neural network is fully trained, new test data given as input yield highly accurate predictions. The proposed model is composed of a composite structure based on a YOLO model; details are described in <xref ref-type="sec" rid="s3">Section 3</xref>.</p>
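As a minimal NumPy sketch (not the paper's implementation), the forward pass described above can be illustrated as linear operations in a hidden layer with a ReLU activation, followed by a softmax output layer whose class probabilities sum to 1. All sizes and weights here are illustrative assumptions.

```python
import numpy as np

def relu(x):
    # Values below zero become 0; values above zero pass through unchanged.
    return np.maximum(0.0, x)

def softmax(z):
    # Subtract the max for numerical stability before exponentiating.
    e = np.exp(z - np.max(z))
    return e / e.sum()

def forward(x, W1, b1, W2, b2):
    h = relu(W1 @ x + b1)        # hidden layer: multiply, add, activate
    return softmax(W2 @ h + b2)  # output layer: class probabilities

rng = np.random.default_rng(0)
x = rng.normal(size=4)                       # input vector (4 features)
W1, b1 = rng.normal(size=(8, 4)), np.zeros(8)
W2, b2 = rng.normal(size=(3, 8)), np.zeros(3)
probs = forward(x, W1, b1, W2, b2)           # probabilities over 3 classes
```

In training, the cross-entropy loss between `probs` and the true class would be minimized by gradient descent, as the section describes.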
</sec>
<sec id="s2_2"><label>2.2</label><title>Object Recognition Using Deep Learning</title>
<p>As recognition methods have developed, recognition technologies based on specific colors and shapes, such as road signs and license plates, have come into active use. Objects recognized using deep learning include signboards [<xref ref-type="bibr" rid="ref-18">18</xref>&#x2013;<xref ref-type="bibr" rid="ref-20">20</xref>], road signs [<xref ref-type="bibr" rid="ref-21">21</xref>&#x2013;<xref ref-type="bibr" rid="ref-23">23</xref>], and vehicle license plates [<xref ref-type="bibr" rid="ref-24">24</xref>&#x2013;<xref ref-type="bibr" rid="ref-26">26</xref>]. To recognize signs of various colors and shapes, the boundaries of the signboard in the image must be delineated. To distinguish the boundary line clearly, one study identified it by photographing a signboard that emits light at night and searching only areas with high pixel brightness values. In daytime environments, sign recognition from images applies technology developed to recognize road signs or license plates.</p>
<p>Unlike signboards, road signs and license plates are produced to a fixed standard and are therefore of constant size; in addition, their color combinations are simple so that people can recognize them easily. For road signs, studies have extracted green and blue areas using the red, green, and blue (RGB) color model to detect and extract sign boundaries, and other methods find the four vertices of a road sign using a polar coordinate system. License plates are recognized in various ways: based on a feature descriptor, by recognizing the plate pattern under changing illumination with gray-scale preprocessing, or through a multistep process that locates the arranged rectangular region on the plate. In this study, we developed a method for recognizing the numbers on rifles. Rifle numbers are smaller and less visible to humans than license plates, and the digit color is almost identical to the background. Moreover, because obtaining a dataset of rifle numbers is restricted in the military context, we studied a number-recognition method that improves the recognition of rifle numbers under these constraints.</p>
</sec>
<sec id="s2_3"><label>2.3</label><title>Data Augmentation Methods and Data Fusion Methods</title>
<p>It is important to secure sufficient data when using deep-learning technologies. However, in the healthcare, biological, and military fields, it is difficult to secure data owing to patient privacy and military confidentiality. When image data are insufficient, they are augmented [<xref ref-type="bibr" rid="ref-27">27</xref>] using various linear techniques, such as rotating or enlarging the original data and then cropping or flipping it. Methods for data augmentation using generative adversarial networks (GANs) have also been proposed [<xref ref-type="bibr" rid="ref-28">28</xref>]; however, these methods have the drawback that the generated data cannot cover diverse distributions. In this study, the proposed method augments data using mirroring, random cropping, rotation, and shearing. Additionally, the proposed method applies data fusion after multiple datasets are built: we improve the recognition of rifle numbers by combining the rifle dataset with other datasets, so that even when the rifle dataset is insufficient, number recognition benefits from the varied distributions of the combined datasets.</p>
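The four linear augmentations named above (mirroring, random cropping, rotation, and shearing) can be sketched in NumPy as follows. This is an illustrative sketch, not the authors' pipeline; the 90-degree rotation and integer row-shift shear are simplifying assumptions, since the paper does not specify angles or shear factors.

```python
import numpy as np

def mirror(img):
    return np.flip(img, axis=1)          # flip left-right

def random_crop(img, size, rng):
    h, w = img.shape[:2]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return img[top:top + size, left:left + size]

def rotate90(img):
    return np.rot90(img)                 # 90-degree rotation; arbitrary angles need interpolation

def shear_horizontal(img, step=1):
    # Shift each row progressively to the right: a simple integer shear.
    out = np.empty_like(img)
    for i, row in enumerate(img):
        out[i] = np.roll(row, (i * step) // 4, axis=0)
    return out

rng = np.random.default_rng(0)
img = rng.integers(0, 256, size=(64, 64), dtype=np.uint8)
augmented = [mirror(img), random_crop(img, 48, rng), rotate90(img), shear_horizontal(img)]
```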
</sec>
</sec>
<sec id="s3"><label>3</label><title>Proposed System</title>
<p>Data preprocessing is performed on the rifle number dataset used in the proposed method. As shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>, the rifle number-related dataset is first collected. The images are then scaled to a size of 640&#x2009;&#x00D7;&#x2009;640, and mirroring, random cropping, rotation, and shearing are performed for data augmentation. Finally, the bounding box and classification label are set in the annotations, which are the ground-truth values for each datum.</p>
<fig id="fig-1"><label>Figure 1</label><caption><title>Overview of the pre-processing procedure</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_42466-fig-1.tif"/></fig>
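The scaling step of the preprocessing above can be sketched as a resize to 640 x 640. The interpolation method is not specified in the paper; nearest-neighbour indexing is used here purely as an illustrative assumption.

```python
import numpy as np

def resize_nearest(img, out_h=640, out_w=640):
    # Map each output pixel back to the nearest source pixel by index.
    h, w = img.shape[:2]
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return img[rows][:, cols]

img = np.zeros((480, 720, 3), dtype=np.uint8)  # arbitrary input size
scaled = resize_nearest(img)
print(scaled.shape)  # (640, 640, 3)
```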
<p>The proposed method is divided into a part that detects an object and draws a bounding box, and a part that recognizes the rifle number inside the bounding box. For recognizing the rifle number in the bounding box, transfer learning is performed with a fusion method that mixes the real rifle dataset with other types of digit datasets; transfer learning on the last layer specializes the model for rifle number detection. Structurally, the methodology used to detect and recognize the rifle number employs a single neural network that predicts bounding boxes and outputs the probability of the recognized class for each test image. The model first divides the entire image into grids of specific sizes and generates, in each grid cell, anchor boxes for the predicted scale and size. Each anchor box carries an object score; the x and y offsets of the box center; the box width and height; and the predicted class score. The model detects objects rapidly, enabling one-stage object detection, and also achieves good detection accuracy.</p>
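To make the anchor-box prediction concrete, the following sketch decodes one grid-cell prediction (center offsets, width, height) into an absolute box, in the style of YOLOv5 decoding. The input values, anchor sizes, and stride are made up for illustration.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode_box(tx, ty, tw, th, cell_x, cell_y, anchor_w, anchor_h, stride):
    # Center offsets are predicted relative to the grid cell; width and
    # height are predicted relative to the anchor assigned to that cell.
    bx = (2.0 * sigmoid(tx) - 0.5 + cell_x) * stride
    by = (2.0 * sigmoid(ty) - 0.5 + cell_y) * stride
    bw = (2.0 * sigmoid(tw)) ** 2 * anchor_w
    bh = (2.0 * sigmoid(th)) ** 2 * anchor_h
    return bx, by, bw, bh

box = decode_box(0.2, -0.1, 0.3, 0.3,
                 cell_x=5, cell_y=7, anchor_w=32, anchor_h=16, stride=8)
```

Alongside these four coordinates, each anchor also carries an object score and per-class scores, which together form the output vector described above.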
<p>In terms of its detailed structure, the methodology is a lightweight algorithm improved from YOLOv3 in YOLOv5 [<xref ref-type="bibr" rid="ref-29">29</xref>], and it performs data augmentation, uses an efficient activation function, and supports multi-graphics processing unit (multi-GPU) training. The proposed method consists of three parts, backbone, neck, and prediction, as shown in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>. The backbone extracts the features of each image: because a cross-stage partial network (CSPNet) [<xref ref-type="bibr" rid="ref-30">30</xref>] has the advantage of fast processing, CSPNet is used as the backbone to extract feature maps from the images, and performance is further improved by accommodating various input image sizes through a spatial pyramid pooling layer. The neck extracts feature maps at different stages: a path aggregation network (PANet) [<xref ref-type="bibr" rid="ref-31">31</xref>] is used as the neck to obtain feature pyramids, which help the model detect unseen data and generalize across object scales, identifying the same object at different sizes. The prediction part classifies the input image and bounds each object with a box: anchor boxes are applied to each object, and the final output vectors are generated from the class probability, object score, and bounding box. The convolution, batch normalization, and leaky-ReLU (CBL) module is composed of a convolution layer, a batch normalization layer, and a leaky-ReLU activation function. CSP1 is composed of a CBL, a residual unit (Res Unit), a convolution layer, and a concat layer. The residual unit is used in the residual structure and builds on the CBL module; it directly superposes the tensor through the added layer. Res N contains n residual units in residual blocks and adds a CBL module and a zero-padding layer. 
CSP2 is CSP1 with the residual unit replaced by a CBL. A detailed description of each component is given in <xref ref-type="table" rid="table-1">Table 1</xref>.</p>
<fig id="fig-2"><label>Figure 2</label><caption><title>Overview of the methodology</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_42466-fig-2.tif"/></fig><table-wrap id="table-1"><label>Table 1</label><caption><title>Characteristics table and components</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Characteristics</th>
<th align="left">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Cross-stage partial network (CSPNet or CSP)</td>
<td align="left">Serves as the backbone and extracts feature maps from images; performance is improved by accommodating various input image sizes through a spatial pyramid pooling layer.</td>
</tr>
<tr>
<td align="left">Path aggregation network (PANet)</td>
<td align="left">Used as the neck to obtain feature pyramids, which help the model detect unseen data and generalize across object scales, identifying the same object at different sizes and scales.</td>
</tr>
<tr>
<td align="left">Convolution, batch normalization, and leaky-ReLU (CBL)</td>
<td align="left">Combination of a convolution layer, batch normalization layer, and a leaky-ReLU activation function.</td>
</tr>
<tr>
<td align="left">Concatenate (Concat)</td>
<td align="left">A method that connects an array in the direction of the selected axis.</td>
</tr>
<tr>
<td align="left">Convolution layer (Conv)</td>
<td align="left">Aggregates the data inside each filter-sized window into a single value, using as many distinct filters as there are output channels.</td>
</tr>
<tr>
<td align="left">Spatial pyramid pooling (SPP)</td>
<td align="left">Converts region proposals of different sizes into fixed-size vectors through a pyramid of pooling operations.</td>
</tr>
<tr>
<td align="left">Batch normalization layer</td>
<td align="left">Places a normalization layer at each layer and adjusts the activations so that a distorted distribution does not appear.</td>
</tr>
<tr>
<td align="left">Residual block or residual unit</td>
<td align="left">Passes the result of one layer not only to the next layer but also to a later layer.</td>
</tr>
<tr>
<td align="left">Leaky-ReLU</td>
<td align="left">Outputs a very small value (a small negative slope) instead of zero when the input value is negative.</td>
</tr>
<tr>
<td align="left">Upsampling</td>
<td align="left">Increases the sampling rate (resolution) of the signal or feature map.</td>
</tr>
</tbody>
</table>
</table-wrap>
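Two of the componentwise operations in Table 1, the leaky-ReLU activation used in the CBL block and the per-feature normalization performed by a batch-normalization layer, can be sketched in NumPy. These are illustrative assumptions (e.g., the 0.01 slope), not the authors' implementation.

```python
import numpy as np

def leaky_relu(x, slope=0.01):
    # Negative inputs are scaled by a small slope instead of being zeroed.
    return np.where(x > 0, x, slope * x)

def batch_norm(x, eps=1e-5):
    # Normalize each feature over the batch to zero mean, unit variance.
    mean = x.mean(axis=0)
    var = x.var(axis=0)
    return (x - mean) / np.sqrt(var + eps)

x = np.array([[-2.0, 4.0], [2.0, 0.0]])
activated = leaky_relu(x)   # the -2.0 entry becomes -0.02
normed = batch_norm(x)      # each column now has mean 0
```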
<p>In terms of the rifle number-recognition procedure, the methodology is divided into a part that detects the area of the rifle number and a part that recognizes each digit in the detected area. When input data are fed through the proposed model, the area corresponding to the rifle number is first detected as a rectangle, after which each character in the detected area is recognized through segmentation.</p>
</sec>
<sec id="s4"><label>4</label><title>Performance Evaluations</title>
<p>To demonstrate the performance of rifle number recognition, an experimental evaluation was performed with the proposed method, implemented in PyTorch, on a dataset of images related to rifle numbers. The firearm used in the experiment was the Korean K2 rifle.</p>
<sec id="s4_1"><label>4.1</label><title>Datasets</title>
<p>The dataset consisted of two main types: because the number of actual rifle images was small, numerical image data were also used. One type was rifle data with the rifle number visible from the front, and the other was numerical image data. The dataset was built from 375 images, including actual rifle images and their augmentations. Images were taken with the rifle number located in the center, the rifle body as the reference, and no tilt. The rifle numbers were labeled, and the data were divided into 300 training images and 75 test images.</p>
<p>The purpose of this dataset was first to detect the rifle number area in the rifle image, then to crop and save the area containing the rifle number. Recognition accuracy can then be increased by detecting digits only within the area where the rifle number is located. <xref ref-type="table" rid="table-2">Table 2</xref> lists the training and test data for the four types. The other dataset, numeric pictures, was divided into four types to test performance: the first type comprised 1000 font-digit images; the second, 25 rifle number images and 1000 font-digit images; the third, 2000 font-digit images and 25 rifle number images; and the fourth, 949 MNIST images and 25 rifle number images.</p>
<table-wrap id="table-2"><label>Table 2</label><caption><title>Training data and test data in four types</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Description</th>
<th align="left">Type 1</th>
<th align="left">Type 2</th>
<th align="left">Type 3</th>
<th align="left">Type 4</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Training data</td>
<td align="left">800</td>
<td align="left">820</td>
<td align="left">1620</td>
<td align="left">779</td>
</tr>
<tr>
<td align="left">Test data</td>
<td align="left">200</td>
<td align="left">205</td>
<td align="left">405</td>
<td align="left">195</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The font-digit images are the digits 0 to 9 written in 100 different fonts; the dataset was constructed by typing the digits in each font in Microsoft Word, capturing the result, and labeling the captured images. The digits are white on a black background. A rifle number image is an image cropped by detecting only the rifle number part of a rifle image, with only the digit region labeled. Each dataset was divided into 80&#x0025; training data and 20&#x0025; test data; the details are listed in <xref ref-type="table" rid="table-2">Table 2</xref>. All images used in training were resized to 416&#x2009;&#x00D7;&#x2009;416. Separately, 11 images of firearms not used for training were prepared to test rifle number recognition.</p>
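The 80/20 split above reproduces the counts in Table 2. As a small check, splitting the total image count of each type (training plus test, from the table) at 80&#x0025; with integer arithmetic recovers the table's rows:

```python
# Total images per type = training + test counts from Table 2.
totals = {"Type 1": 1000, "Type 2": 1025, "Type 3": 2025, "Type 4": 974}

# 80% training / 20% test, using integer arithmetic for the floor.
splits = {name: (n * 4 // 5, n - n * 4 // 5) for name, n in totals.items()}
print(splits["Type 1"])  # (800, 200)
```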

</sec>
<sec id="s4_2"><label>4.2</label><title>Model Configuration</title>
<p>The detection and recognition models used the model described in <xref ref-type="sec" rid="s3">Section 3</xref>. The model was trained with a binary cross-entropy loss function to compute the loss over the class probabilities and object scores, and stochastic gradient descent (SGD) [<xref ref-type="bibr" rid="ref-32">32</xref>] was used as the optimization function. The model parameters are listed in <xref ref-type="table" rid="table-3">Table 3</xref>; the listed values are the best values found experimentally. Performance changes with each parameter, and the values corresponding to the sweet spot were obtained experimentally.</p>
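The binary cross-entropy loss mentioned above can be sketched in NumPy; the predicted probabilities and labels below are made-up illustrative values, not results from the paper.

```python
import numpy as np

def bce_loss(p, y, eps=1e-7):
    # Clip to avoid log(0); average the per-element binary cross-entropy.
    p = np.clip(p, eps, 1 - eps)
    return -(y * np.log(p) + (1 - y) * np.log(1 - p)).mean()

pred = np.array([0.9, 0.2, 0.8])    # predicted probabilities / object scores
target = np.array([1.0, 0.0, 1.0])  # ground-truth labels
loss = bce_loss(pred, target)
```

In training, this loss would be minimized by the SGD optimizer with the momentum and weight-decay settings of Table 3.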
<table-wrap id="table-3"><label>Table 3</label><caption><title>Model parameters</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Parameter</th>
<th align="left">Values</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Learning rate</td>
<td align="left">0.01</td>
</tr>
<tr>
<td align="left">Momentum</td>
<td align="left">0.937</td>
</tr>
<tr>
<td align="left">Optimizer weight decay</td>
<td align="left">0.0005</td>
</tr>
<tr>
<td align="left">Warmup epochs</td>
<td align="left">3.0</td>
</tr>
<tr>
<td align="left">Warmup initial momentum</td>
<td align="left">0.8</td>
</tr>
<tr>
<td align="left">Warmup initial bias</td>
<td align="left">0.1</td>
</tr>
<tr>
<td align="left">Box loss gain</td>
<td align="left">0.05</td>
</tr>
<tr>
<td align="left">Cls loss gain</td>
<td align="left">0.5</td>
</tr>
<tr>
<td align="left">Anchor-multiple threshold</td>
<td align="left">4.0</td>
</tr>
<tr>
<td align="left">Image HSV-Hue augmentation (fraction)</td>
<td align="left">0.015</td>
</tr>
<tr>
<td align="left">Image HSV-Saturation augmentation (fraction)</td>
<td align="left">0.7</td>
</tr>
<tr>
<td align="left">Image HSV-Value augmentation (fraction)</td>
<td align="left">0.4</td>
</tr>
<tr>
<td align="left">Image flip left-right (probability)</td>
<td align="left">0.5</td>
</tr>
<tr>
<td align="left">Epochs</td>
<td align="left">600</td>
</tr>
<tr>
<td align="left">Batch size</td>
<td align="left">24</td>
</tr>
<tr>
<td align="left">Number of classes</td>
<td align="left">80</td>
</tr>
<tr>
<td align="left">Model depth multiple</td>
<td align="left">0.33</td>
</tr>
<tr>
<td align="left">Layer channel multiple</td>
<td align="left">0.50</td>
</tr>
</tbody>
</table>
</table-wrap>
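For reference, the parameters of Table 3 can be collected into a single configuration, shown here as a Python dict; the key names follow YOLOv5's hyperparameter-file conventions, which is a convenience assumption (YOLOv5 normally reads them from a YAML file).

```python
# Hyperparameters from Table 3, keyed in YOLOv5 style (assumed mapping).
hyp = {
    "lr0": 0.01,             # learning rate
    "momentum": 0.937,
    "weight_decay": 0.0005,  # optimizer weight decay
    "warmup_epochs": 3.0,
    "warmup_momentum": 0.8,  # warmup initial momentum
    "warmup_bias_lr": 0.1,   # warmup initial bias
    "box": 0.05,             # box loss gain
    "cls": 0.5,              # cls loss gain
    "anchor_t": 4.0,         # anchor-multiple threshold
    "hsv_h": 0.015,          # HSV-Hue augmentation fraction
    "hsv_s": 0.7,            # HSV-Saturation augmentation fraction
    "hsv_v": 0.4,            # HSV-Value augmentation fraction
    "fliplr": 0.5,           # left-right flip probability
}
train_cfg = {"epochs": 600, "batch_size": 24, "num_classes": 80,
             "depth_multiple": 0.33, "width_multiple": 0.50}
```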
</sec>
<sec id="s4_3"><label>4.3</label><title>Experiment Results</title>
<p><xref ref-type="fig" rid="fig-3">Fig. 3</xref> shows the process of detecting the area corresponding to the rifle number and recognizing each digit in the detected area. Detection and recognition of the rifle number proceed as shown in <xref ref-type="fig" rid="fig-3">Figs. 3a</xref>&#x2013;<xref ref-type="fig" rid="fig-3">3d</xref>. First, we trained the model after labeling the rifle number area in each rifle image. The model can then detect the location of the rifle number in an arbitrary rifle image, and only the corresponding area is cropped using the x- and y-coordinates of the detected bounding box. The rifle number is then recognized within this cropped area. Each identified digit is sorted in order of its x-coordinate and output as one continuous rifle number.</p>
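The final assembly step above, sorting the recognized digits by the x-coordinates of their boxes and concatenating them, can be sketched as follows. The detections are made-up example values.

```python
def assemble_number(detections):
    # detections: list of (x_center, digit) pairs from the recognizer.
    # Sorting by x orders the digits left to right along the rifle.
    return "".join(digit for _, digit in sorted(detections))

dets = [(120.0, "3"), (40.0, "1"), (80.0, "0"), (160.0, "7")]
print(assemble_number(dets))  # "1037"
```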
<fig id="fig-3"><label>Figure 3</label><caption><title>Process of detecting the area corresponding to the rifle number and recognizing each number in the detected area</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_42466-fig-3.tif"/></fig>
<p><xref ref-type="fig" rid="fig-4">Fig. 4</xref> shows image data from the different types of numerical image datasets used to increase recognition of actual rifle numbers. The original image is larger; for ease of identification, it was cropped so that the number part is visible, as shown in <xref ref-type="fig" rid="fig-4">Fig. 4a</xref>. The dataset consisting of the rifle and augmented images shown in <xref ref-type="fig" rid="fig-4">Fig. 4a</xref> was used to detect the location of the rifle number. <xref ref-type="fig" rid="fig-4">Figs. 4b</xref>&#x2013;<xref ref-type="fig" rid="fig-4">4d</xref> show examples of the datasets used to recognize rifle numbers: <xref ref-type="fig" rid="fig-4">Fig. 4b</xref> shows an example of actual rifle numbers, <xref ref-type="fig" rid="fig-4">Fig. 4c</xref> an example of the font-specific digits provided by Microsoft Word, and <xref ref-type="fig" rid="fig-4">Fig. 4d</xref> an example from MNIST.</p>
<fig id="fig-4"><label>Figure 4</label><caption><title>Image data using different types of numeric image datasets to increase recognition of the rifle number</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_42466-fig-4.tif"/></fig>
<p><xref ref-type="fig" rid="fig-5">Fig. 5</xref> shows an example of the results of detecting the rifle number area. In the figure, the model detects the area corresponding to the rifle number well with a bounding box. <xref ref-type="fig" rid="fig-6">Fig. 6</xref> shows the results of recognizing each digit in the cropped rifle number bounding box. In the figure, each digit is recognized correctly with high probability.</p>
<fig id="fig-5"><label>Figure 5</label><caption><title>Example of the result of detecting the rifle number area</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_42466-fig-5.tif"/></fig><fig id="fig-6"><label>Figure 6</label><caption><title>Result of recognizing the rifle number for each number in the cut-out rifle number bounding box</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_42466-fig-6.tif"/></fig>
<p><xref ref-type="table" rid="table-4">Table 4</xref> lists the accuracy, precision, recall, and F1-score of rifle number recognition for each of the six cases. Accuracy is the percentage of matches between the actual rifle number and the number recognized by the model. Experiments were conducted for six cases: Case 1: 1000 numbers for each font; Case 2: 1000 numbers for each font, 1000 license plates, and 25 rifle number images; Case 3: 2000 numbers for each font, 1000 license plates, and 25 rifle number images; Case 4: 949 MNIST images and 25 rifle number images; Case 5: the same as Case 4 but with 1000 epochs; and Case 6: 25 rifle number images.</p>
<table-wrap id="table-4"><label>Table 4</label><caption><title>Accuracy, precision, recall, and F1-score of rifle number recognition for each of the six cases</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Description</th>
<th align="left">Case 1</th>
<th align="left">Case 2</th>
<th align="left">Case 3</th>
<th align="left">Case 4</th>
<th align="left">Case 5</th>
<th align="left">Case 6</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Accuracy</td>
<td align="left">0&#x0025;</td>
<td align="left">84.42&#x0025;</td>
<td align="left">75.32&#x0025;</td>
<td align="left">63.64&#x0025;</td>
<td align="left">61.04&#x0025;</td>
<td align="left">37.66&#x0025;</td>
</tr>
<tr>
<td align="left">Precision</td>
<td align="left">0&#x0025;</td>
<td align="left">73.54&#x0025;</td>
<td align="left">73.74&#x0025;</td>
<td align="left">77.93&#x0025;</td>
<td align="left">72.93&#x0025;</td>
<td align="left">42.76&#x0025;</td>
</tr>
<tr>
<td align="left">Recall</td>
<td align="left">0&#x0025;</td>
<td align="left">81.81&#x0025;</td>
<td align="left">75.90&#x0025;</td>
<td align="left">57.26&#x0025;</td>
<td align="left">55.60&#x0025;</td>
<td align="left">28.52&#x0025;</td>
</tr>
<tr>
<td align="left">F1-score</td>
<td align="left">0&#x0025;</td>
<td align="left">77.46&#x0025;</td>
<td align="left">74.81&#x0025;</td>
<td align="left">66.02&#x0025;</td>
<td align="left">63.10&#x0025;</td>
<td align="left">34.22&#x0025;</td>
</tr>
<tr>
<td align="left">Number of correct answers per average rifle number</td>
<td align="left">0</td>
<td align="left">5.91</td>
<td align="left">5.27</td>
<td align="left">4.45</td>
<td align="left">4.27</td>
<td align="left">2.64</td>
</tr>
</tbody>
</table>
</table-wrap>
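The per-digit metrics reported in Table 4 can be computed along the following lines. This is a hedged sketch: `digit_metrics` is a hypothetical helper that macro-averages precision and recall over the digit classes, which may differ from the exact averaging the authors used.

```python
from collections import Counter

def digit_metrics(y_true, y_pred, classes=range(10)):
    """Accuracy plus macro-averaged precision, recall, and F1 over digit classes."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1          # correct digit
        else:
            fp[p] += 1          # predicted class p wrongly
            fn[t] += 1          # missed true class t
    precisions, recalls = [], []
    for c in classes:
        if tp[c] + fp[c] > 0:   # class was predicted at least once
            precisions.append(tp[c] / (tp[c] + fp[c]))
        if tp[c] + fn[c] > 0:   # class appears in the ground truth
            recalls.append(tp[c] / (tp[c] + fn[c]))
    precision = sum(precisions) / len(precisions) if precisions else 0.0
    recall = sum(recalls) / len(recalls) if recalls else 0.0
    # F1 here is derived from the macro-averaged precision and recall.
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    return accuracy, precision, recall, f1

acc, prec, rec, f1 = digit_metrics([1, 2, 3, 3], [1, 2, 3, 1])
print(round(acc, 2), round(prec, 3))  # 0.75 0.833
```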
<p>In terms of the detection performance for each case, Case 2 performed the best, with an accuracy of 84.42&#x0025;, compared to the other cases. The case with the worst performance was Case 1, with an accuracy of 0&#x0025;: no numbers were detected in the rifle number pictures. In Case 1, the position of the number could not be identified because the training data contained only numbers in each font. In Case 6, which contained labeled pictures of the rifle number, the area of the rifle number was detected, but the exact numbers were often not recognized in the detected area (average accuracy of 62.34&#x0025;). However, in Case 2, which combines the data of Cases 1 and 6, the area of the rifle number was detected well, and the recognition of the rifle number in the detected area was good. This shows that the number location can be detected from images of the actual firearm, and each number in the detected area can be correctly identified by additionally learning a numeric dataset such as MNIST.</p>
<p>Cases 2&#x2013;5 used a numeric dataset, such as MNIST, together with pictures of rifle numbers as the training data. From Case 2 to Case 3, the number of images for each font increased from 1000 to 2000, but the rifle number recognition performance degraded owing to overfitting. In addition, from Case 4 to Case 5, even though the number of training epochs increased from 600 to 1000, the detection performance was slightly lower. In Cases 4 and 5, the MNIST images are human handwriting and lack the consistent shapes of the font numbers; therefore, the actual rifle numbers were recognized relatively poorly.</p>
</sec>
<sec id="s4_4"><label>4.4</label><title>Experiment Analysis</title>
<p><bold>Dataset.</bold> In the proposed method, we applied a recognition enhancement method that fuses different datasets to improve rifle number detection. To increase the recognition of rifle numbers, numeric datasets, namely MNIST and the font-specific numbers provided by Microsoft Word, together with data augmentation, were combined into the training data. In addition, as shown in <xref ref-type="table" rid="table-4">Table 4</xref>, the performance of the model was confirmed by combining the datasets for rifle number recognition in various ways from Case 1 to Case 6. The detection rate of the rifle number area could be increased using the actual rifle number images, and the recognition of the rifle numbers was high when MNIST and the font-specific numeric image dataset provided by Microsoft Word were used. In particular, because the rifle numbers have a characteristic pattern and a digit distribution similar to the fonts provided by Microsoft Word, the font dataset yielded better performance than MNIST.</p>
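Fusing several digit datasets and adding augmented copies, as described above, might look like the following sketch. The names `augment` and `fuse_datasets` are illustrative, and the specific augmentation operations (brightness jitter, additive noise) are assumptions; the paper does not specify them.

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    """Simple illustrative augmentations: brightness jitter plus additive Gaussian noise."""
    jittered = np.clip(img.astype(np.float32) * rng.uniform(0.8, 1.2), 0, 255)
    noisy = np.clip(jittered + rng.normal(0, 5, img.shape), 0, 255)
    return noisy.astype(np.uint8)

def fuse_datasets(*datasets, copies=2):
    """Merge several (image, label) datasets and append `copies` augmented versions of each image."""
    fused = []
    for data in datasets:
        for img, label in data:
            fused.append((img, label))
            fused.extend((augment(img), label) for _ in range(copies))
    return fused

# Example: one toy "font digit" dataset of three labeled 8x8 images.
fonts = [(np.full((8, 8), 128, dtype=np.uint8), d) for d in (1, 2, 3)]
fused = fuse_datasets(fonts, copies=2)
print(len(fused))  # 9 (each of the 3 images plus 2 augmented copies)
```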

<p><bold>The process of detection and recognition.</bold> This method employs two processes: detection and recognition. The proposed model first detects the rifle number area in the rifle image and then recognizes each digit in the detected area. Therefore, the detection stage must perform well; the detected area is stored separately, and each digit in the area is recognized. In addition, although a rifle number has a total of seven digits, the proposed method detects the digits regardless of the number of digits in the rifle number, thereby enabling general number recognition.</p>
<p><bold>Differences from license plate recognition.</bold> An alternative to the proposed method for detecting numbers and letters, such as rifle numbers, is Py-Tesseract. However, we confirmed through a separate experiment that Py-Tesseract is not suitable for detecting rifle numbers in images taken at various angles, lighting conditions, and sizes, as opposed to images of computer-typed text or scanned documents. Because Py-Tesseract cannot satisfy all the image preprocessing options (Gaussian blur, thresholding, contour finding, contour range finding, etc.) required to increase its detection rate, its rifle number detection performance is poor. Therefore, instead of Py-Tesseract, we used the proposed model to detect the rifle number area and recognize the rifle number.</p>
<p><bold>The proposed method additionally trained on a license plate dataset.</bold> A license plate dataset was constructed and used for training, as shown in <xref ref-type="fig" rid="fig-7">Fig. 7</xref>. The dataset consisted of 80,000 training images and 10,000 test images.</p>
<fig id="fig-7"><label>Figure 7</label><caption><title>Image data using license plate datasets to train the proposed method</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_42466-fig-7.tif"/></fig>
<p>However, the rifle number detection performance of the proposed method was only 79.12&#x0025;, a performance degradation. This is because the license plate dataset differs from the rifle number dataset. The license plate characters appear different depending on the shooting angle; including the license plate data increased the number of character classes; background color variations across vehicles were reflected; and the license plate dataset was large. As a result, the rifle number detection performance was rather poor.</p>
<p>When the size of the actual rifle number dataset was considered comprehensively, the transfer learning method was judged to be effective. Transfer learning is a method of learning with a small amount of data by updating only the parameters of the fully connected layer, which is the last output layer, of a pre-trained model. Transfer learning is effective when the number of training samples is small and has the advantages of fast training and high accuracy. The license plate data differed from the actual rifle number data in several respects. In terms of the background, the license plates were black and white, green, or white, whereas the rifle background was fixed to black and white. In terms of angle, the license plate dataset was taken from various CCTV images and the angle was not constant, whereas the rifle dataset had some degree of consistency in angle because the numbers were photographed in the same manner as barcodes. In terms of the amount of data, the license plate dataset was larger than the rifle number dataset, so the model was trained to be more suitable for license plate number detection than rifle number detection, which resulted in an imbalance in the rifle number detection classification performance.</p>
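The transfer learning step described above, freezing a pre-trained model and updating only the final fully connected layer, can be sketched in PyTorch. The tiny `nn.Sequential` below is a stand-in for a pre-trained backbone such as YOLOv5, not the authors' actual model.

```python
import torch
import torch.nn as nn

# Hypothetical stand-in for a pre-trained model; in practice this would be
# a network such as YOLOv5 loaded with pre-trained weights.
model = nn.Sequential(
    nn.Conv2d(1, 8, 3, padding=1),
    nn.ReLU(),
    nn.Flatten(),
    nn.Linear(8 * 28 * 28, 10),  # final fully connected (output) layer
)

# Freeze every parameter of the pre-trained model...
for p in model.parameters():
    p.requires_grad = False
# ...then unfreeze only the last fully connected layer for fine-tuning.
for p in model[-1].parameters():
    p.requires_grad = True

# The optimizer only sees the trainable (unfrozen) parameters.
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.Adam(trainable, lr=1e-3)
print(len(trainable))  # 2 (weight and bias of the final layer)
```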
<p><bold>Comparative analysis of accuracy and image processing speed for R-CNN, Faster R-CNN, and proposed method.</bold> In <xref ref-type="table" rid="table-5">Table 5</xref>, we compare the proposed method with the regions with convolutional neural network features (R-CNN) and Faster R-CNN models. R-CNN and Faster R-CNN are two-stage methods, whereas the proposed method is one-stage.</p>
<table-wrap id="table-5"><label>Table 5</label><caption><title>Comparative analysis of accuracy and image processing speed for R-CNN, Faster R-CNN, and the proposed method</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Description</th>
<th align="left">R-CNN</th>
<th align="left">Faster R-CNN</th>
<th align="left">Proposed</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Accuracy</td>
<td align="left">65.3&#x0025;</td>
<td align="left">70.4&#x0025;</td>
<td align="left">76.8&#x0025;</td>
</tr>
<tr>
<td align="left">Number of images processed per second</td>
<td align="left">1</td>
<td align="left">2</td>
<td align="left">45</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>R-CNN is a method that proposes regions using algorithms such as edge boxes and classifies within each region using a CNN. Faster R-CNN improves speed by using a region proposal network (RPN) instead of edge boxes. In contrast, the proposed method proposes regions and performs classification simultaneously. The proposed method achieved higher accuracy than Faster R-CNN and has the advantage of fast image processing per second.</p>
</sec>
</sec>
<sec id="s5"><label>5</label><title>Conclusion</title>
<p>In this study, we proposed a method for recognizing rifle numbers using deep neural networks in situations where actual rifle image data are insufficient. The proposed method was designed to improve the recognition rate on a specific dataset using data fusion and transfer learning. In the proposed method, real rifle images and existing digit images were fused as training data, and transfer learning was applied to the final layer of the YOLOv5 model. The method first detects the rifle number area in the input image and then recognizes the rifle number in the detected area. In addition, a performance analysis was conducted on various combinations of the numeric datasets. The experimental results showed that the proposed method correctly recognized the rifle number with 84.42&#x0025; accuracy, 73.54&#x0025; precision, 81.81&#x0025; recall, and a 77.46&#x0025; F1-score. The proposed method can be used to recognize and manage rifle numbers in a military environment and can be used in conjunction with other methods.</p>
<p>Rifle number detection has not been addressed in previous studies; this study contributes to this sparsely studied area and has the advantage of real-time rifle number detection. An interesting topic for future studies will be the development of a model that can classify rifle types in addition to detecting rifle numbers. In a situation where the number of rifle images was not large, data were combined for each case to improve performance. The availability of military-related images is limited, but with a sufficient number of rifle images, performance can be improved further. Although rifle number detection was improved using a unified model with the proposed methodology, the ensemble method could be the subject of future research. Additionally, the proposed method can be used to detect military equipment numbers, such as the serial numbers of communication equipment, and its applicability can be extended to address various security issues [<xref ref-type="bibr" rid="ref-33">33</xref>&#x2013;<xref ref-type="bibr" rid="ref-36">36</xref>].</p>
</sec>
</body>
<back>
<ack>
<p>We thank the editor and anonymous reviewers who provided very helpful comments that improved this paper.</p>
</ack>
<sec><title>Funding Statement</title>
<p>This study was supported by the Future Strategy and Technology Research Institute (RN: 23-AI-04) of Korea Military Academy, the Hwarang-Dae Research Institute (RN: 2023B1015) of Korea Military Academy, and Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (2021R1I1A1A01040308).</p></sec>
<sec><title>Author Contributions</title>
<p>Study conception and design: H. Kwon and S. Lee; data collection: H. Kwon and S. Lee; analysis and interpretation of results: H. Kwon and S. Lee; draft manuscript preparation: H. Kwon. All authors reviewed the results and approved the final version of the manuscript.</p></sec>
<sec sec-type="data-availability"><title>Availability of Data and Materials</title>
<p>The data and materials used to support the findings of this study are available from the corresponding author upon request after acceptance.</p></sec>
<sec sec-type="COI-statement"><title>Conflicts of Interest</title>
<p>The authors declare that they have no conflicts of interest to report regarding the present study.</p></sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Zhan</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Li</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Zhang</surname></string-name></person-group>, &#x201C;<article-title>Evolutionary deep learning: A survey</article-title>,&#x201D; <source>Neurocomputing</source>, vol. <volume>483</volume>, pp. <fpage>42</fpage>&#x2013;<lpage>58</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Mai</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Jeong</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Quispe</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Kim</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Online continual learning in image classification: An empirical survey</article-title>,&#x201D; <source>Neurocomputing</source>, vol. <volume>469</volume>, pp. <fpage>28</fpage>&#x2013;<lpage>51</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Kim</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Cosa-Linan</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Santhanam</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Jannesari</surname></string-name>, <string-name><given-names>M. E.</given-names> <surname>Maros</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Transfer learning for medical image classification: A literature review</article-title>,&#x201D; <source>BMC Medical Imaging</source>, vol. <volume>22</volume>, no. <issue>1</issue>, no. <issue>22</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>13</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Q.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Xu</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Zhang</surname></string-name> and <string-name><given-names>D.</given-names> <surname>Tao</surname></string-name></person-group>, &#x201C;<article-title>ViTAEv2 Vision transformer advanced by exploring inductive bias for image recognition and beyond</article-title>,&#x201D; <source>International Journal of Computer Vision</source>, vol. <volume>131</volume>, pp. <fpage>1141</fpage>&#x2013;<lpage>1162</lpage>, <year>2023</year>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Ning</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Tian</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Yu</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Bai</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>HCFNN: High-order coverage function neural network for image classification</article-title>,&#x201D; <source>Pattern Recognition</source>, vol. <volume>131</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>11</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>D. S.</given-names> <surname>Park</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Han</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Qin</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Gulati</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>BigSSL: Exploring the frontier of large-scale semi-supervised learning for automatic speech recognition</article-title>,&#x201D; <source>IEEE Journal of Selected Topics in Signal Processing</source>, vol. <volume>16</volume>, no. <issue>6</issue>, pp. <fpage>1519</fpage>&#x2013;<lpage>1532</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>V.</given-names> <surname>Bhardwaj</surname></string-name>, <string-name><given-names>M. T. B.</given-names> <surname>Othman</surname></string-name>, <string-name><given-names>V.</given-names> <surname>Kukreja</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Belkhier</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Bajaj</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Automatic speech recognition (ASR) systems for children: A systematic literature review</article-title>,&#x201D; <source>Applied Sciences</source>, vol. <volume>12</volume>, no. <issue>9</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>26</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>P.</given-names> <surname>&#x017B;elasko</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Feng</surname></string-name>, <string-name><given-names>L. M.</given-names> <surname>Vel&#x00E1;zquez</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Abavisani</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Bhati</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Discovering phonetic inventories with crosslingual automatic speech recognition</article-title>,&#x201D; <source>Computer Speech &#x0026; Language</source>, vol. <volume>74</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>22</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>P.</given-names> <surname>Ma</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Petridis</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Pantic</surname></string-name></person-group>, &#x201C;<article-title>Visual speech recognition for multiple languages in the wild</article-title>,&#x201D; <source>Nature Machine Intelligence</source>, vol. <volume>4</volume>, pp. <fpage>930</fpage>&#x2013;<lpage>939</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>P.</given-names> <surname>Sukamto</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Ispandi</surname></string-name>, <string-name><given-names>A. S.</given-names> <surname>Putra</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Aisyah</surname></string-name> and <string-name><given-names>R.</given-names> <surname>Toufiq</surname></string-name></person-group>, &#x201C;<article-title>Forensic digital analysis for CCTV video recording</article-title>,&#x201D; <source>International Journal of Science, Technology &#x0026; Management</source>, vol. <volume>3</volume>, no. <issue>1</issue>, pp. <fpage>284</fpage>&#x2013;<lpage>291</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Ushasukhanya</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Karthikeyan</surname></string-name></person-group>, &#x201C;<article-title>Automatic human detection using reinforced Faster-RCNN for electricity conservation system</article-title>,&#x201D; <source>Intelligent Automation &#x0026; Soft Computing</source>, vol. <volume>32</volume>, no. <issue>2</issue>, pp. <fpage>1261</fpage>&#x2013;<lpage>1275</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Kim</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Kim</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>Cha</surname></string-name></person-group>, &#x201C;<article-title>Methodology of displaying surveillance area of CCTV camera on the map for immediate response in border defense military system</article-title>,&#x201D; <source>Advances in Intelligent Systems and Computing</source>, vol. <volume>1252</volume>, pp. <fpage>631</fpage>&#x2013;<lpage>637</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Smith</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Doma</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Heilbronn</surname></string-name> and <string-name><given-names>A.</given-names> <surname>Leicht</surname></string-name></person-group>, &#x201C;<article-title>Effect of exercise training programs on physical fitness domains in military personnel: A systematic review and meta-analysis</article-title>,&#x201D; <source>Military Medicine</source>, vol. <volume>187</volume>, no. <issue>9</issue>, pp. <fpage>1065</fpage>&#x2013;<lpage>1073</lpage>, <year>2022</year>; <pub-id pub-id-type="pmid">35247052</pub-id></mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Li</surname></string-name></person-group>, &#x201C;<article-title>The MNIST database of handwritten digit images for machine learning research</article-title>,&#x201D; <source>IEEE Signal Processing Magazine</source>, vol. <volume>29</volume>, no. <issue>6</issue>, pp. <fpage>141</fpage>&#x2013;<lpage>142</lpage>, <year>2012</year>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>G.</given-names> <surname>Vardi</surname></string-name>, <string-name><given-names>O.</given-names> <surname>Shamir</surname></string-name> and <string-name><given-names>N.</given-names> <surname>Srebro</surname></string-name></person-group>, &#x201C;<article-title>On margin maximization in linear and relu networks</article-title>,&#x201D; <source>Advances in Neural Information Processing Systems</source>, vol. <volume>35</volume>, pp. <fpage>37024</fpage>&#x2013;<lpage>37036</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S. S.</given-names> <surname>Atamanalp</surname></string-name></person-group>, &#x201C;<article-title>Endoscopic decompression of sigmoid volvulus: Review of 748 patients</article-title>,&#x201D; <source>Journal of Laparoendoscopic &#x0026; Advanced Surgical Techniques</source>, vol. <volume>32</volume>, no. <issue>7</issue>, pp. <fpage>763</fpage>&#x2013;<lpage>767</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>S.</given-names> <surname>L&#x00FC;</surname></string-name> and <string-name><given-names>Z.</given-names> <surname>Li</surname></string-name></person-group>, &#x201C;<article-title>Unsupervised domain adaptation via softmax-based prototype construction and adaptation</article-title>,&#x201D; <source>Information Sciences</source>, vol. <volume>609</volume>, pp. <fpage>257</fpage>&#x2013;<lpage>275</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S. Y.</given-names> <surname>Arafat</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Ashraf</surname></string-name>, <string-name><given-names>M. J.</given-names> <surname>Iqbal</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Ahmad</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Khan</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Urdu signboard detection and recognition using deep learning</article-title>,&#x201D; <source>Multimedia Tools and Applications</source>, vol. <volume>81</volume>, pp. <fpage>11965</fpage>&#x2013;<lpage>11987</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Dai</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>Gu</surname></string-name></person-group>, &#x201C;<article-title>OSO-YOLOv5: Automatic extraction method of store signboards in street view images based on multi-dimensional analysis</article-title>,&#x201D; <source>ISPRS International Journal of Geo-Information</source>, vol. <volume>11</volume>, no. <issue>9</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>20</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Lin</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Zeng</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Ye</surname></string-name> and <string-name><given-names>H.</given-names> <surname>Qu</surname></string-name></person-group>, &#x201C;<article-title>Saliency-aware color harmony models for outdoor signboard</article-title>,&#x201D; <source>Computers &#x0026; Graphics</source>, vol. <volume>105</volume>, pp. <fpage>25</fpage>&#x2013;<lpage>35</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Dong</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Gao</surname></string-name></person-group>, &#x201C;<article-title>Improved YOLOv5 network for real-time multi-scale traffic sign detection</article-title>,&#x201D; <source>Neural Computing and Applications</source>, vol. <volume>35</volume>, no. <issue>10</issue>, pp. <fpage>7853</fpage>&#x2013;<lpage>7865</lpage>, <year>2023</year>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Zhu</surname></string-name> and <string-name><given-names>W. Q.</given-names> <surname>Yan</surname></string-name></person-group>, &#x201C;<article-title>Traffic sign recognition based on deep learning</article-title>,&#x201D; <source>Multimedia Tools and Applications</source>, vol. <volume>81</volume>, no. <issue>13</issue>, pp. <fpage>17779</fpage>&#x2013;<lpage>17791</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>N.</given-names> <surname>Dobrota</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Stevanovic</surname></string-name> and <string-name><given-names>N.</given-names> <surname>Mitrovic</surname></string-name></person-group>, &#x201C;<article-title>Modifying signal retiming procedures and policies by utilizing high-fidelity modeling with medium-resolution traffic data</article-title>,&#x201D; <source>Transportation Research Record</source>, vol. <volume>2676</volume>, no. <issue>3</issue>, pp. <fpage>660</fpage>&#x2013;<lpage>684</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Shi</surname></string-name> and <string-name><given-names>D.</given-names> <surname>Zhao</surname></string-name></person-group>, &#x201C;<article-title>License plate recognition system based on improved YOLOv5 and GRU</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>11</volume>, pp. <fpage>10429</fpage>&#x2013;<lpage>10439</lpage>, <year>2023</year>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Padmasiri</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Shashirangana</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Meedeniya</surname></string-name>, <string-name><given-names>O.</given-names> <surname>Rana</surname></string-name> and <string-name><given-names>C.</given-names> <surname>Perera</surname></string-name></person-group>, &#x201C;<article-title>Automated license plate recognition for resource-constrained environments</article-title>,&#x201D; <source>Sensors</source>, vol. <volume>22</volume>, no. <issue>4</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>29</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>P.</given-names> <surname>Kaur</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Kumar</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Ahmed</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Alhumam</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Singla</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Automatic license plate recognition system for vehicles using a CNN</article-title>,&#x201D; <source>Computers, Materials &#x0026; Continua</source>, vol. <volume>71</volume>, no. <issue>1</issue>, pp. <fpage>35</fpage>&#x2013;<lpage>50</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>N. E.</given-names> <surname>Nour</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Loey</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Mirjalili</surname></string-name></person-group>, &#x201C;<article-title>A comprehensive survey of recent trends in deep learning for digital images augmentation</article-title>,&#x201D; <source>Artificial Intelligence Review</source>, vol. <volume>55</volume>, pp. <fpage>2351</fpage>&#x2013;<lpage>2377</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Yao</surname></string-name>, <string-name><given-names>E.</given-names> <surname>Adeli</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Zhang</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Generative adversarial U-Net for domain-free few-shot medical diagnosis</article-title>,&#x201D; <source>Pattern Recognition Letters</source>, vol. <volume>157</volume>, pp. <fpage>112</fpage>&#x2013;<lpage>118</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Xu</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Su</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Gao</surname></string-name> and <string-name><given-names>T.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>Deep learning for SAR ship detection: Past, present and future</article-title>,&#x201D; <source>Remote Sensing</source>, vol. <volume>14</volume>, no. <issue>11</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>44</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Mustaqim</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Fatichah</surname></string-name> and <string-name><given-names>N.</given-names> <surname>Suciati</surname></string-name></person-group>, &#x201C;<article-title>Combination of cross stage partial network and GhostNet with spatial pyramid pooling on YOLOv4 for detection of acute lymphoblastic leukemia subtypes in multi-cell blood microscopic image</article-title>,&#x201D; <source>Scientific Journal of Informatics</source>, vol. <volume>9</volume>, no. <issue>2</issue>, pp. <fpage>139</fpage>&#x2013;<lpage>148</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>L.</given-names> <surname>Zhou</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Rao</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Zuo</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Qiao</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>A lightweight object detection method in aerial images based on dense feature fusion path aggregation network</article-title>,&#x201D; <source>ISPRS International Journal of Geo-Information</source>, vol. <volume>11</volume>, no. <issue>3</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>24</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Q.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Xiong</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Shang</surname></string-name></person-group>, &#x201C;<article-title>Adjusted stochastic gradient descent for latent factor analysis</article-title>,&#x201D; <source>Information Sciences</source>, vol. <volume>588</volume>, pp. <fpage>196</fpage>&#x2013;<lpage>213</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-33"><label>[33]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Choi</surname></string-name> and <string-name><given-names>X.</given-names> <surname>Zhang</surname></string-name></person-group>, &#x201C;<article-title>Classifications of restricted web streaming contents based on convolutional neural network and long short-term memory (CNN-LSTM)</article-title>,&#x201D; <source>Journal of Internet Services and Information Security</source>, vol. <volume>12</volume>, no. <issue>3</issue>, pp. <fpage>49</fpage>&#x2013;<lpage>62</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-34"><label>[34]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Lee</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Woo</surname></string-name></person-group>, &#x201C;<article-title>Practical data acquisition and analysis method for automobile event data recorders forensics</article-title>,&#x201D; <source>Journal of Internet Services and Information Security</source>, vol. <volume>12</volume>, no. <issue>3</issue>, pp. <fpage>76</fpage>&#x2013;<lpage>86</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-35"><label>[35]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Cabra</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Parra</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Mendez</surname></string-name> and <string-name><given-names>L.</given-names> <surname>Trujillo</surname></string-name></person-group>, &#x201C;<article-title>Mechanisms of authentication toward habitude pattern lock and ECG: An overview</article-title>,&#x201D; <source>Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications</source>, vol. <volume>13</volume>, no. <issue>2</issue>, pp. <fpage>23</fpage>&#x2013;<lpage>67</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-36"><label>[36]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>N.</given-names> <surname>Cassavia</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Caviglione</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Guarascio</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Manco</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Zuppelli</surname></string-name></person-group>, &#x201C;<article-title>Detection of steganographic threats targeting digital images in heterogeneous ecosystems through machine learning</article-title>,&#x201D; <source>Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications</source>, vol. <volume>13</volume>, no. <issue>3</issue>, pp. <fpage>50</fpage>&#x2013;<lpage>67</lpage>, <year>2022</year>.</mixed-citation></ref>
</ref-list>
</back></article>