<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">48998</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2024.048998</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Unmanned Aerial Vehicles General Aerial Person-Vehicle Recognition Based on Improved YOLOv8s Algorithm</article-title>
<alt-title alt-title-type="left-running-head">Unmanned Aerial Vehicles General Aerial Person-Vehicle Recognition Based on Improved YOLOv8s Algorithm</alt-title>
<alt-title alt-title-type="right-running-head">Unmanned Aerial Vehicles General Aerial Person-Vehicle Recognition Based on Improved YOLOv8s Algorithm</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Liu</surname><given-names>Zhijian</given-names></name><email>liuzj984296576@163.com</email></contrib>
<aff><institution>School of Electrical Engineering and Electronic Information, Xihua University</institution>, <addr-line>Chengdu, 610036</addr-line>, <country>China</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Zhijian Liu. Email: <email>liuzj984296576@163.com</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2024</year></pub-date>
<pub-date date-type="pub" publication-format="electronic"><day>26</day>
<month>3</month>
<year>2024</year></pub-date>
<volume>78</volume>
<issue>3</issue>
<fpage>3787</fpage>
<lpage>3803</lpage>
<history>
<date date-type="received">
<day>24</day>
<month>12</month>
<year>2023</year>
</date>
<date date-type="accepted">
<day>22</day>
<month>1</month>
<year>2024</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2024 Liu</copyright-statement>
<copyright-year>2024</copyright-year>
<copyright-holder>Liu</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_48998.pdf"></self-uri>
<abstract>
<p>The imaging size of targets captured by unmanned aerial vehicles (UAVs) varies with flight altitude, and factors such as light and weather can cause missed and false detections. To address these problems, this paper presents a comprehensive detection model, L_YOLO, based on an improved lightweight You Only Look Once version 8s (YOLOv8s) algorithm for natural-light and infrared scenes. The algorithm introduces a special feature pyramid network (SFPN) structure and replaces most of the neck feature extraction modules with the special deformable convolution feature extraction module (SDCN). The model is then pruned to eliminate redundant channels. Finally, a non-maximum suppression algorithm using the intersection-over-union ratio based on minimum point distance (MPDIOU_NMS) is integrated to eliminate redundant detection boxes, and the model is validated on the infrared aerial dataset and the Visdrone2019 dataset. The experimental results show that, with the number of parameters and floating-point operations reduced by 30% and 20%, respectively, mean average precision at a threshold of 0.5 (mAP(0.5)) increases by 1.2% and mAP(0.5:0.95) by 4.8% on the infrared dataset. On the Visdrone2019 dataset, mAP increases by 12.4% on average, and the accuracy and recall rates increase by 9.2% and 3.6%, respectively.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>YOLOv8s</kwd>
<kwd>SFPN</kwd>
<kwd>SDCN</kwd>
<kwd>pruning</kwd>
<kwd>MPDIOU_NMS</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>With the rapid advancement of drone technology and computer vision, drones have found extensive applications in law enforcement, traffic control, surveillance, and reconnaissance. In particular, unmanned aerial vehicles (UAVs) equipped with infrared imaging cameras have the potential to mitigate the effects of weather, lighting, and other environmental factors on UAV imaging [<xref ref-type="bibr" rid="ref-1">1</xref>]. Nevertheless, variations in UAV altitude change the size at which targets are imaged. The lack of contextual semantic information in infrared images makes it difficult to separate target foreground from background. Additionally, the absence of semantic information in certain detection targets hinders accurate target identification following the deployment of the UAV [<xref ref-type="bibr" rid="ref-2">2</xref>]. Hence, the primary objective of this paper is to reduce the model&#x2019;s parameter count and complexity while enhancing its detection accuracy.</p>
<p>The conventional detection algorithm primarily depends on manual screening of image features for training, which makes the process cumbersome and results in poor model robustness [<xref ref-type="bibr" rid="ref-3">3</xref>]. Nevertheless, deep learning algorithms have the potential to compensate for this drawback. Currently, deep learning-based detection algorithms are widely utilized, for instance, in the precise identification of cattle in animal husbandry [<xref ref-type="bibr" rid="ref-4">4</xref>] and the detection of soybean pests in intricate environments [<xref ref-type="bibr" rid="ref-5">5</xref>]. Numerous classical target detection algorithms rooted in deep learning exist, including multi-stage algorithms (e.g., Detection Transformer with YOLOv2 [<xref ref-type="bibr" rid="ref-6">6</xref>], Mask convolutional neural networks [<xref ref-type="bibr" rid="ref-7">7</xref>]) and single-stage algorithms (e.g., You Only Look Once [<xref ref-type="bibr" rid="ref-8">8</xref>], Single shot multibox detector (SSD) [<xref ref-type="bibr" rid="ref-9">9</xref>]). In this paper, the YOLOv8 algorithm has been chosen as the foundational algorithm. The algorithm considers both accuracy and detection speed, and its overall performance surpasses that of other algorithms [<xref ref-type="bibr" rid="ref-10">10</xref>]. Extracting semantic features of small targets in dense target scenes while maintaining a streamlined model has consistently posed a research challenge. Various solutions have previously been proposed for different application scenarios [<xref ref-type="bibr" rid="ref-11">11</xref>]. For instance, Zhao et al. [<xref ref-type="bibr" rid="ref-12">12</xref>] introduced an enhanced YOLOv7 model designed to tackle the challenges related to ship detection and recognition tasks, including irregular ship shapes and size variations. Wang et al. 
[<xref ref-type="bibr" rid="ref-13">13</xref>] proposed a traffic sign detection algorithm utilizing residual network, to minimize missed and false detections of traffic signs in complex environment conditions. Chen et al. [<xref ref-type="bibr" rid="ref-14">14</xref>] introduced a YOLO algorithm-based UAV for the purpose of detecting the poles and quantifying the distribution network, to improve the efficiency of post-disaster distribution network repairs.</p>
<p>The paper is structured as follows: <xref ref-type="sec" rid="s2">Section 2</xref> introduces the current research status of UAV aerial photography, outlines the existing problems, and presents the proposed methods. <xref ref-type="sec" rid="s3">Section 3</xref> gives a detailed introduction to the proposed special feature pyramid network (SFPN) structure, the special deformable convolution feature extraction (SDCN) module, pruning, and the MPDIOU_NMS algorithm. It also provides a brief overview of the evaluation indicators and experimental parameter settings. <xref ref-type="sec" rid="s4">Section 4</xref> briefly analyzes the two datasets and then examines the ablation experiments, a comparison of different algorithms, and a comparison across datasets. The fifth section summarizes the paper&#x2019;s findings.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Related Work</title>
<p>The conventional detection algorithm relies on the manual extraction of target semantic feature information, which is then passed to the algorithm network to produce the detection results [<xref ref-type="bibr" rid="ref-15">15</xref>]. These algorithms are ill-suited to complex scenes containing small or densely packed targets: the dataset contains numerous detection targets, and manual feature screening may result in insufficient feature extraction [<xref ref-type="bibr" rid="ref-16">16</xref>]. In contrast, target detection algorithms based on deep learning automatically filter and extract image features, requiring only the processed images to be input directly into the network. This enables them to process large-scale datasets and accommodate more intricate detection tasks.</p>
<p>To address inadequate network feature extraction capability, Cao et al. [<xref ref-type="bibr" rid="ref-17">17</xref>] introduced an algorithm for detecting small targets using an improved YOLOv5s model for UAVs, resulting in a 9.2% increase in mAP(0.5). However, that analysis still needs to account fully for the impact of factors such as inadequate lighting. Hui et al. [<xref ref-type="bibr" rid="ref-18">18</xref>] introduced a small target detection algorithm for UAV remote sensing images, which relies on an enhanced Shifted Window Transformer and a class-weighted classification decoupling head. However, the algorithm&#x2019;s parameter size is excessive for deployment on edge devices. Ali et al. [<xref ref-type="bibr" rid="ref-19">19</xref>] incorporated the Kalman filter into the YOLO algorithm to enable the detection and tracking of vehicles by drones. Lu [<xref ref-type="bibr" rid="ref-20">20</xref>] introduced a hybrid CNN-Transformer model for detecting targets in UAV images, utilizing the Cross-Shaped Window Transformer. However, the model&#x2019;s parameter count approaches 70 M. To make models more lightweight, Qian et al. [<xref ref-type="bibr" rid="ref-21">21</xref>] introduced a lightweight YOLO feature fusion network aimed at multi-scale defect detection. The network demonstrated promising results across three different datasets. Zhu et al. [<xref ref-type="bibr" rid="ref-22">22</xref>] introduced a lightweight and efficient target detection network for remote sensing, capable of achieving an inference speed of 487 frames per second. Zou et al. [<xref ref-type="bibr" rid="ref-23">23</xref>] proposed a method for lightweight target detection in a coal seam tracking system, utilizing knowledge distillation and model pruning. The proposed approach achieves a processing speed of 45 frames per second on a central processing unit (CPU).</p>
<p>In summary, this paper addresses two issues: the size of the model and its limited feature extraction capability. To this end, four enhancements are made to the YOLOv8s algorithm. First, an SFPN structure is proposed to facilitate the comprehensive exchange of semantic information across layers. Second, the special deformable convolution feature extraction (SDCN) module is proposed to improve the model&#x2019;s feature extraction capability. Third, the model is pruned to remove redundant channels. Finally, the MPDIOU_NMS algorithm is incorporated to reduce redundant detection boxes.</p>
</sec>
<sec id="s3">
<label>3</label>
<title>Materials and Methods</title>
<sec id="s3_1">
<label>3.1</label>
<title>YOLOv8 Network Structure</title>
<p>The YOLOv8 algorithm framework is depicted in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>. The network comprises four components: data preprocessing, backbone network, neck network, and detection head. First, the input image is preprocessed with Mosaic data augmentation, and the processed image is passed to the YOLOv8 backbone network. Subsequently, the fused features are sent to the neck network for further feature extraction. Finally, the detection head identifies and distinguishes the target based on the features extracted by the neck network [<xref ref-type="bibr" rid="ref-24">24</xref>].</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>YOLOv8 network structure</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_48998-fig-1.tif"/>
</fig>
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>L_YOLO Network Architecture</title>
<p>The workflows of the YOLOv8 and L_YOLO algorithms are compared in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>; <xref ref-type="fig" rid="fig-2">Figs. 2a</xref> and <xref ref-type="fig" rid="fig-2">2b</xref> show the flow charts of the YOLOv8 algorithm and the L_YOLO algorithm, respectively. Compared with the YOLOv8 algorithm, the L_YOLO workflow differs in three places, as depicted in <xref ref-type="fig" rid="fig-2">Fig. 2b</xref>. First, the network structure is improved. Second, the light blue rectangle marks the pruning step, which slims the bloated network. Third, the light yellow rectangle indicates that the MPDIOU_NMS algorithm is used for validation and testing during the model inference stage. Compared to the YOLOv8s algorithm, the L_YOLO model demonstrates better detection performance with approximately 30% fewer parameters.</p>
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>Comparison diagram of YOLOv8 algorithm and L_YOLO algorithm workflow</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_48998-fig-2.tif"/>
</fig>
<p>The specific improvements are depicted in <xref ref-type="fig" rid="fig-3">Fig. 3</xref>. The YOLOv8s network structure is improved in four key aspects: (a) The original YOLOv8s network&#x2019;s feature pyramid network does not fully facilitate the exchange of semantic information between adjacent layers. As a solution, the SFPN structure is proposed to comprehensively exchange the semantic features of different layers. (b) An SDCN feature extraction module is proposed. (c) The trained model undergoes channel pruning to remove redundant channels and further reduce its size. (d) An MPDIOU_NMS algorithm is proposed.</p>
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Framework of L_YOLO algorithm</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_48998-fig-3.tif"/>
</fig>
<sec id="s3_2_1">
<label>3.2.1</label>
<title>SFPN Structure</title>
<p>Because the texture characteristics of small targets may diminish or vanish as the number of network layers increases, this study proposes an SFPN framework derived from the original FPN framework (<xref ref-type="fig" rid="fig-4">Fig. 4a</xref>) [<xref ref-type="bibr" rid="ref-25">25</xref>]. The objective is to facilitate the complete exchange of semantic features across the detection layers and to counter the semantic feature degradation caused by increased depth. The primary improvement is depicted in <xref ref-type="fig" rid="fig-4">Fig. 4b</xref>, where the features of each layer of the backbone network are incorporated into the 12th feature extraction layer to comprehensively integrate semantic information. Furthermore, the features of layer 4 and layer 6 are merged into layer 21 and layer 24, respectively, thereby enriching the semantic information within the network.</p>
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>SFPN</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_48998-fig-4.tif"/>
</fig>
</sec>
<sec id="s3_2_2">
<label>3.2.2</label>
<title>SDCN Module</title>
<p>The original YOLOv8s has limited feature extraction capability for dense and small targets. This paper introduces the SDCN module to replace the majority of feature extraction modules in the neck network. The specific replacement is illustrated in <xref ref-type="fig" rid="fig-5">Fig. 5</xref>. The module combines and further improves the deformable convolution version 2 (DCNV2) module [<xref ref-type="bibr" rid="ref-26">26</xref>] and the squeeze aggregated excitation layer (SaElayer) [<xref ref-type="bibr" rid="ref-27">27</xref>]. The DCNV2 module is designed to detect dense targets effectively. It enlarges the network&#x2019;s receptive field by stacking multiple deformable convolution modules and adds skip connections to obtain a richer gradient flow.</p>
<fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>SDCN module</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_48998-fig-5.tif"/>
</fig>
<p>Excessive incorporation of the DCNV2 module may result in prolonged training time, while insufficient incorporation may lead to inadequate feature extraction capability of the model. Consequently, this study adds an SaElayer module to the output of DCNV2, and the specific structure of the SDCN module is illustrated in <xref ref-type="fig" rid="fig-5">Fig. 5</xref>. The SaElayer module combines the squeeze excitation network module and the dense layer. It also introduces multi-branch fully connected layers with different branch sizes to enhance the network&#x2019;s ability to capture global knowledge. Incorporating the SaElayer module can therefore sharpen the network&#x2019;s focus on valuable semantic information and optimize network bandwidth to reduce model training time.</p>
</sec>
<sec id="s3_2_3">
<label>3.2.3</label>
<title>Channel Pruning</title>
<p>Because mobile devices have limited capability and cannot accommodate models with excessive parameters, the trained YOLOv8s model must be pruned [<xref ref-type="bibr" rid="ref-28">28</xref>]. Given the low mAP of the base models on both datasets, the approach taken in this study is to reduce model parameters and complexity while preserving accuracy. A Slim pruning method is employed [<xref ref-type="bibr" rid="ref-29">29</xref>], and its operating principle is depicted in <xref ref-type="fig" rid="fig-6">Fig. 6</xref>. The method simplifies the initial network in three steps: the network first undergoes sparse training, the model is then pruned, and finally the pruned model is fine-tuned to restore its accuracy. If the model is multi-channel, the algorithm also generates supplementary branches.</p>
<fig id="fig-6">
<label>Figure 6</label>
<caption>
<title>Pruning flow chart</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_48998-fig-6.tif"/>
</fig>
<p>In this paper, the batch normalization layer (BN) scaling factor in the convolution is initially utilized as the channel scaling factor &#x03B3; of the pruning. It is subsequently multiplied by the output of the channel. Secondly, the network weights and scaling factors are jointly trained, and sparse regularization is applied to the scaling factors. Following the application of channel-level sparse-induced regularization during training, a model is derived wherein numerous scaling factors approach zero. Subsequently, it is possible to eliminate channels with scaling factors close to zero by removing all inbound and outbound connections, as well as the associated weights. Ultimately, the pruned network undergoes fine-tuning [<xref ref-type="bibr" rid="ref-29">29</xref>]. The details are shown in <xref ref-type="fig" rid="fig-7">Fig. 7</xref>.</p>
<fig id="fig-7">
<label>Figure 7</label>
<caption>
<title>Pruning comparison diagram</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_48998-fig-7.tif"/>
</fig>
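<p>The sparsity training and channel selection described above can be sketched in a few lines. In the network slimming method of [<xref ref-type="bibr" rid="ref-29">29</xref>], the training loss is augmented with an L1 penalty &#x03BB;&#x2211;|&#x03B3;| on the BN scaling factors, and channels with the smallest |&#x03B3;| are removed. The sketch below is a minimal, framework-free illustration under stated assumptions and is not the implementation used in this paper: the function names, the example &#x03B3; values, and the <monospace>prune_ratio</monospace> argument are hypothetical, and the pruning parameter reg is only presumed to play the role of &#x03BB;.</p>
<code language="python" xml:space="preserve"><![CDATA[
```python
# Illustrative sketch of BN-scale-based channel selection (network slimming).
# All names and numbers are hypothetical; this is not the paper's code.

def l1_sparsity_penalty(gammas, reg):
    """Sparsity term reg * sum(|gamma|) that is added to the training loss."""
    return reg * sum(abs(g) for g in gammas)


def l1_subgradient_step(gammas, reg, lr):
    """One gradient step on the penalty alone; it pushes each gamma toward zero."""
    def sign(g):
        return (g > 0) - (g < 0)
    return [g - lr * reg * sign(g) for g in gammas]


def select_channels(gammas, prune_ratio):
    """Return the indices of channels kept after dropping the smallest |gamma|."""
    n_prune = int(len(gammas) * prune_ratio)
    order = sorted(range(len(gammas)), key=lambda i: abs(gammas[i]))
    pruned = set(order[:n_prune])
    return [i for i in range(len(gammas)) if i not in pruned]


# Hypothetical scaling factors of one BN layer after sparse training.
gammas = [0.91, 0.002, 0.47, -0.001, 0.33, 0.0004]
keep = select_channels(gammas, prune_ratio=0.5)
print(keep)  # indices of the three channels with the largest |gamma|
```
]]></code>
<p>In a real network, removing a channel also removes the matching input slice of the following convolution, which is why layers that cannot be sliced consistently are usually excluded from pruning.</p>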
<p><xref ref-type="fig" rid="fig-8">Fig. 8</xref> compares the channels of each layer before and after pruning. The x-axis denotes the name of each layer in the network, while the y-axis represents the parameter quantity. The yellow bars indicate the parameter quantity of each layer in the original model, whereas the red bars represent the parameter quantity after pruning. For pruning to complete successfully, network layers that are not amenable to pruning must be bypassed. The DCNV2 layer in the SDCN feature extraction module is skipped in this paper, as are specific convolution layers and Distribution Focal Loss layers in the detection head [<xref ref-type="bibr" rid="ref-30">30</xref>]. Because the primary objective of this paper is not maximal compression, the trade-off between precision and model size must also be considered. Consequently, the pruned model and the base model shown in the figure do not differ dramatically.</p>
<fig id="fig-8">
<label>Figure 8</label>
<caption>
<title>Pruning channel comparison diagram</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_48998-fig-8.tif"/>
</fig>
</sec>
<sec id="s3_2_4">
<label>3.2.4</label>
<title>MPDIOU_NMS Algorithm</title>
<p>Since both datasets fall within the dense target detection category, redundant detection boxes can impact the model&#x2019;s final target assessment. The non-maximum suppression (NMS) algorithm of YOLOv8s may result in information loss, and the traditional Soft-NMS algorithm [<xref ref-type="bibr" rid="ref-31">31</xref>] could wrongly suppress detection boxes because of discriminant errors. Consequently, this paper incorporates the MPDIOU_NMS algorithm in the model verification and testing phase to eliminate redundant detection boxes.</p>
<p>The MPDIOU loss function [<xref ref-type="bibr" rid="ref-32">32</xref>] is depicted in <xref ref-type="disp-formula" rid="eqn-1">Eq. (1)</xref>. Boxes A and B represent two detection areas, respectively. The coordinates <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>A</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>y</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>A</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mi>A</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>y</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mi>A</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> denote the position of the upper left corner and the lower right corner of the detection box for A, respectively. 
The coordinates <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>y</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>y</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> denote the position of the upper left corner and the lower right corner of the B detection box, respectively.</p>
<p><disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:mi>M</mml:mi><mml:mi>P</mml:mi><mml:mi>D</mml:mi><mml:mi>I</mml:mi><mml:mi>O</mml:mi><mml:mi>U</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>A</mml:mi><mml:mo>&#x2229;</mml:mo><mml:mi>B</mml:mi></mml:mrow><mml:mrow><mml:mi>A</mml:mi><mml:mo>&#x222A;</mml:mo><mml:mi>B</mml:mi></mml:mrow></mml:mfrac><mml:mo>&#x2212;</mml:mo><mml:mfrac><mml:mrow><mml:msubsup><mml:mi>d</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>d</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:mrow><mml:mrow><mml:msup><mml:mi>w</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:msup><mml:mi>h</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mfrac></mml:math></disp-formula>
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:msubsup><mml:mi>d</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>A</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mi>y</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mi>y</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>A</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></disp-formula>
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:msubsup><mml:mi>d</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mi>A</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mi>y</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mi>B</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mi>y</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mi>A</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></disp-formula></p>
<p>Here <italic>d</italic><sub>1</sub> and <italic>d</italic><sub>2</sub> are the distances between the upper left corners and between the lower right corners of A and B, as given by <xref ref-type="disp-formula" rid="eqn-2">Eqs. (2)</xref> and <xref ref-type="disp-formula" rid="eqn-3">(3)</xref>. Following [<xref ref-type="bibr" rid="ref-32">32</xref>], <italic>w</italic> and <italic>h</italic> are taken to be the width and height of the input image.</p>
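<p>This paper introduces an MPDIOU_NMS algorithm built on the MPDIOU measure. Its score-update rules are presented in <xref ref-type="disp-formula" rid="eqn-4">Eq. (4)</xref> and <xref ref-type="disp-formula" rid="eqn-5">Eq. (5)</xref>, which follow the linear and Gaussian decay forms of Soft-NMS with MPDIOU in place of the intersection-over-union ratio. Here <italic>S<sub>i</sub></italic> is the confidence score of candidate box <italic>b<sub>i</sub></italic>, <italic>M</italic> is the box with the highest current score, <italic>N<sub>t</sub></italic> is the suppression threshold, &#x03C3; controls the Gaussian decay, and <italic>D</italic> is the set of boxes already retained. In contrast to the conventional Soft-NMS algorithm, this algorithm suppresses redundant detection boxes more effectively, thereby improving the model&#x2019;s classification and recognition performance.</p>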
<p><disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mtable columnalign="left left" rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo></mml:mtd><mml:mtd><mml:mi>M</mml:mi><mml:mi>P</mml:mi><mml:mi>D</mml:mi><mml:mi>I</mml:mi><mml:mi>O</mml:mi><mml:mi>U</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>M</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x003C;</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>M</mml:mi><mml:mi>P</mml:mi><mml:mi>D</mml:mi><mml:mi>I</mml:mi><mml:mi>O</mml:mi><mml:mi>U</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>M</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mi>M</mml:mi><mml:mi>P</mml:mi><mml:mi>D</mml:mi><mml:mi>I</mml:mi><mml:mi>O</mml:mi><mml:mi>U</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>M</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2265;</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow></mml:math></disp-formula>
<disp-formula id="eqn-5"><label>(5)</label><mml:math id="mml-eqn-5" display="block"><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mfrac><mml:mrow><mml:mi>M</mml:mi><mml:mi>P</mml:mi><mml:mi>D</mml:mi><mml:mi>I</mml:mi><mml:mi>O</mml:mi><mml:mi>U</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>M</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mi>&#x03C3;</mml:mi></mml:mfrac></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:mi mathvariant="normal">&#x2200;</mml:mi><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2209;</mml:mo><mml:mi>D</mml:mi></mml:math></disp-formula></p>
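<p>As a worked illustration of Eqs. (1)&#x2013;(4), the following minimal sketch computes MPDIOU for two boxes and applies the linear score decay of Eq. (4) in a greedy loop. It is a sketch under stated assumptions rather than the implementation evaluated in this paper: boxes are assumed to be (x1, y1, x2, y2) tuples, <monospace>img_w</monospace> and <monospace>img_h</monospace> stand for <italic>w</italic> and <italic>h</italic> and are assumed to be the input image size, and the function names, the default threshold, and the <monospace>score_min</monospace> cut-off are hypothetical. The Gaussian variant of Eq. (5) is not included.</p>
<code language="python" xml:space="preserve"><![CDATA[
```python
# Illustrative MPDIOU and linear score decay; names and defaults are hypothetical.

def mpdiou(a, b, img_w, img_h):
    """Eq. (1): IoU minus the normalized squared corner distances.

    Boxes are (x1, y1, x2, y2); img_w and img_h are assumed to be the image size.
    """
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    iou = inter / union if union > 0 else 0.0
    d1_sq = (b[0] - a[0]) ** 2 + (b[1] - a[1]) ** 2  # Eq. (2), upper left corners
    d2_sq = (b[2] - a[2]) ** 2 + (b[3] - a[3]) ** 2  # Eq. (3), lower right corners
    return iou - (d1_sq + d2_sq) / (img_w ** 2 + img_h ** 2)


def mpdiou_nms(boxes, scores, img_w, img_h, nt=0.5, score_min=0.001):
    """Greedy suppression with the linear decay of Eq. (4).

    Returns a list of (box index, final score) in the order the boxes are kept.
    """
    scores = list(scores)
    remaining = list(range(len(boxes)))
    kept = []
    while remaining:
        m = max(remaining, key=lambda i: scores[i])  # current best box M
        remaining.remove(m)
        kept.append((m, scores[m]))
        survivors = []
        for i in remaining:
            v = mpdiou(boxes[m], boxes[i], img_w, img_h)
            if v >= nt:
                scores[i] = scores[i] * (1.0 - v)  # decay heavily overlapping boxes
            if scores[i] >= score_min:
                survivors.append(i)
        remaining = survivors
    return kept


boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (200, 200, 240, 240)]
scores = [0.9, 0.8, 0.7]
result = mpdiou_nms(boxes, scores, img_w=640, img_h=640)
print([index for index, _ in result])
```
]]></code>
<p>In this example the second box nearly coincides with the first, so its score is decayed and it drops behind the distant third box, whose MPDIOU with the first box is negative and whose score is therefore left unchanged.</p>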
</sec>
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>Experimental Parameters Setting</title>
<p>When training the YOLOv8 algorithm, this study employs the stochastic gradient descent (SGD) algorithm to optimize the loss function. The batch size is set to 32, and the number of threads is set to 16. The model is trained for 220 iterations to obtain the optimal model.</p>
<p>Furthermore, the pruning parameter reg is set to 0.0005, and 500 rounds of sparse training are performed. To preserve the model&#x2019;s accuracy after pruning, this study sets the &#x2018;speed_up&#x2019; value to 1.5, does not activate the global pruning branch, and conducts 300 rounds of pruning recovery training. The computer configuration utilized in the experiment is presented in <xref ref-type="table" rid="table-1">Table 1</xref>.</p>
<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>Computer configuration</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th>Platform</th>
<th>Configuration information</th>
</tr>
</thead>
<tbody>
<tr>
<td>System</td>
<td>Ubuntu 20.04</td>
</tr>
<tr>
<td>GPU</td>
<td>NVIDIA RTX A5000 (24 GB)</td>
</tr>
<tr>
<td>CPU</td>
<td>15 vCPU AMD EPYC 7543 32-Core Processor</td>
</tr>
<tr>
<td>Language</td>
<td>Python 3.8.0</td>
</tr>
<tr>
<td>GPU computing platform</td>
<td>CUDA 11.8</td>
</tr>
<tr>
<td>Deep learning framework</td>
<td>PyTorch 2.0.0</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s3_4">
<label>3.4</label>
<title>Evaluation Indicators</title>
<p>Precision, Recall, and mean average precision (mAP) serve as the key metrics for evaluating the accuracy of the network.</p>
<p><disp-formula id="eqn-6"><label>(6)</label><mml:math id="mml-eqn-6" display="block"><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>P</mml:mi></mml:mrow></mml:mfrac></mml:math></disp-formula>
<disp-formula id="eqn-7"><label>(7)</label><mml:math id="mml-eqn-7" display="block"><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:mfrac></mml:math></disp-formula>
<disp-formula id="eqn-8"><label>(8)</label><mml:math id="mml-eqn-8" display="block"><mml:mi>A</mml:mi><mml:mi>P</mml:mi><mml:mo>=</mml:mo><mml:msubsup><mml:mo>&#x222B;</mml:mo><mml:mrow><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>r</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mi>d</mml:mi><mml:mi>r</mml:mi></mml:math></disp-formula></p>
<p><disp-formula id="eqn-9"><label>(9)</label><mml:math id="mml-eqn-9" display="block"><mml:mi>m</mml:mi><mml:mi>A</mml:mi><mml:mi>P</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover><mml:mi>A</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mi>N</mml:mi></mml:mfrac></mml:math></disp-formula>where TP is the number of true positives, FP is the number of false positives, FN is the number of false negatives, p(r) is the function of the P-R curve, and N is the number of categories. This paper also reports the number of parameters and the floating point operations (FLOPS), where FLOPS denotes the amount of computation required by the model.</p>
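<p>Eqs. (6)&#x2013;(9) can be illustrated with a short Python sketch. The function names are hypothetical, and the integral in Eq. (8) is approximated here with the trapezoidal rule over sampled points of the P-R curve, which is an assumption of this example; common evaluation toolkits typically use an interpolated precision envelope instead.</p>
<p><code language="python"><![CDATA[
```python
def precision(tp, fp):
    """Eq. (6): TP / (TP + FP); returns 0.0 when there are no predictions."""
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp, fn):
    """Eq. (7): TP / (TP + FN); returns 0.0 when there are no ground-truth objects."""
    return tp / (tp + fn) if (tp + fn) else 0.0

def average_precision(recalls, precisions):
    """Eq. (8), approximated by the trapezoidal rule over points sorted by recall."""
    pts = sorted(zip(recalls, precisions))
    ap = 0.0
    for (r0, p0), (r1, p1) in zip(pts, pts[1:]):
        ap += (r1 - r0) * (p0 + p1) / 2.0
    return ap

def mean_average_precision(ap_per_class):
    """Eq. (9): mean of the per-category AP values (N = number of categories)."""
    return sum(ap_per_class) / len(ap_per_class)
```
]]></code></p>
<p>For example, 8 true positives with 2 false positives give a precision of 0.8, and the same 8 true positives with 8 false negatives give a recall of 0.5.</p>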
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Results Analysis</title>
<sec id="s4_1">
<label>4.1</label>
<title>Dataset Analysis</title>
<p>This paper uses a UAV infrared dataset and the Visdrone2019 dataset as an auxiliary dataset. The distribution of detection box sizes in the UAV infrared dataset is shown in <xref ref-type="fig" rid="fig-9">Fig. 9a</xref>. The dataset clearly involves both multi-scale target detection and small target detection. It comprises 6996 images gathered by Shandong Yantai Arrow Photoelectric Technology Co., Ltd. (China) and contains six categories: pedestrians, cars, buses, bicycles, trucks, and other targets. The dataset is partitioned in a 9:1:1 ratio, specifically 5724:636:636.</p>
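<p>A partition with the subset sizes stated above can be reproduced with a sketch such as the following. The random shuffle, the seed, the helper name <monospace>split_indices</monospace>, and the assignment of the three counts to training, validation, and test subsets are assumptions of this example, since the text gives only the counts.</p>
<p><code language="python"><![CDATA[
```python
import random

def split_indices(n_items, sizes, seed=0):
    """Shuffle indices 0..n_items-1 and cut them into consecutive subsets of the given sizes."""
    if sum(sizes) != n_items:
        raise ValueError("subset sizes must sum to the number of items")
    idx = list(range(n_items))
    random.Random(seed).shuffle(idx)  # fixed seed so the split is repeatable
    subsets, start = [], 0
    for size in sizes:
        subsets.append(idx[start:start + size])
        start += size
    return subsets

# Counts taken from the text: 6996 images split as 5724:636:636.
train, val, test = split_indices(6996, (5724, 636, 636))
```
]]></code></p>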
<fig id="fig-9">
<label>Figure 9</label>
<caption>
<title>Dataset analysis</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_48998-fig-9.tif"/>
</fig>
<p>The distribution of detection box sizes in the Visdrone2019 dataset is shown in <xref ref-type="fig" rid="fig-9">Fig. 9b</xref>. The dataset was collected by the AISKYEYE team at Tianjin University. It comprises ten categories, including pedestrians, cars, bicycles, and tricycles. The Visdrone2019 dataset is evidently more complex than the infrared dataset.</p>

</sec>
<sec id="s4_2">
<label>4.2</label>
<title>Ablation Experiment</title>
<p>Owing to the unique characteristics of certain modules, this study conducts the ablation experiment cumulatively, adding one module at a time. Four improvements are examined, namely m1 (the SPAN module), m2 (the SDCN feature extraction module), m3 (pruning the trained model), and m4 (adding the MPDIOU_NMS algorithm). <inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> represents the combination of the modules <inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:msub><mml:mi>m</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, with S0 denoting the baseline without any improvement module.</p>
<p>As shown in <xref ref-type="table" rid="table-2">Table 2</xref>, S1 adds the SPAN structure and raises mAP(0.5:0.95) by 0.7% compared to S0, although its GFLOPS increases by 7.9. S2 incorporates the SDCN module on top of S1 to strengthen the model&#x2019;s feature extraction capability, at the cost of a negligible change in model complexity. Compared to S1, the four accuracy indicators increase by an average of 1.15%. S3 prunes redundant channels in the model; approximately 30% of the parameters are removed while model performance is maintained. Because the purpose of pruning is to reduce model size, and it changes the individual indicators by differing amounts, its effect is not directly comparable with that of the other modules. S4 adds the MPDIOU_NMS algorithm during validation and testing. Compared to S3, mAP(0.5) and precision decrease slightly, while mAP(0.5:0.95) and recall improve.</p>
<table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>Ablation experiment</title>
</caption>
<table frame="hsides">
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Models</th>
<th>m1</th>
<th>m2</th>
<th>m3</th>
<th>m4</th>
<th>mAP(0.5)</th>
<th>mAP(0.5:0.95)</th>
<th>P</th>
<th>R</th>
<th>Parameters</th>
<th>GFLOPS</th>
</tr>
</thead>
<tbody>
<tr>
<td>S0</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td>0.902</td>
<td>0.595</td>
<td>0.854</td>
<td>0.870</td>
<td>11137906</td>
<td>28.7</td>
</tr>
<tr>
<td>S1</td>
<td>1</td>
<td></td>
<td></td>
<td></td>
<td>0.910</td>
<td>0.602</td>
<td>0.858</td>
<td>0.874</td>
<td>10576056</td>
<td>36.6</td>
</tr>
<tr>
<td>S2</td>
<td>1</td>
<td>1</td>
<td></td>
<td></td>
<td>0.916</td>
<td>0.616</td>
<td>0.870</td>
<td>0.884</td>
<td>11037676</td>
<td>35.5</td>
</tr>
<tr>
<td>S3</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td></td>
<td>0.917</td>
<td>0.624</td>
<td>0.874</td>
<td>0.863</td>
<td>7726636</td>
<td>23.3</td>
</tr>
<tr>
<td>S4</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0.914</td>
<td>0.643</td>
<td>0.872</td>
<td>0.871</td>
<td>7726636</td>
<td>23.3</td>
</tr>
</tbody>
</table>
</table-wrap>
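<p>The step-to-step changes in <xref ref-type="table" rid="table-2">Table 2</xref> can be recomputed with a small script such as the one below. The metric values and the S2 and S3 parameter counts are copied from the table; the dictionary layout and the helper names are hypothetical and serve only as an illustration.</p>
<p><code language="python"><![CDATA[
```python
# Rows copied from Table 2: (mAP(0.5), mAP(0.5:0.95), P, R, GFLOPS)
ROWS = {
    "S0": (0.902, 0.595, 0.854, 0.870, 28.7),
    "S1": (0.910, 0.602, 0.858, 0.874, 36.6),
    "S2": (0.916, 0.616, 0.870, 0.884, 35.5),
    "S3": (0.917, 0.624, 0.874, 0.863, 23.3),
    "S4": (0.914, 0.643, 0.872, 0.871, 23.3),
}
COLUMNS = ("mAP50", "mAP50_95", "P", "R", "GFLOPS")

# Parameter counts immediately before and after pruning, also from Table 2.
PARAMS = {"S2": 11037676, "S3": 7726636}

def delta(before, after):
    """Column-wise change between two configurations, rounded to three decimals."""
    return {c: round(a - b, 3) for c, b, a in zip(COLUMNS, ROWS[before], ROWS[after])}

def pruned_fraction(before, after):
    """Fraction of parameters removed when going from one configuration to another."""
    return 1.0 - PARAMS[after] / PARAMS[before]
```
]]></code></p>
<p>With these values, <monospace>delta("S0", "S1")</monospace> reports a GFLOPS increase of 7.9, and <monospace>pruned_fraction("S2", "S3")</monospace> is close to 0.30, matching the roughly 30% of parameters removed by pruning.</p>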
</sec>
<sec id="s4_3">
<label>4.3</label>
<title>Comparison between Different Algorithms</title>
<p>This paper compares the four model sizes (n, s, m, l) of the YOLOv5 [<xref ref-type="bibr" rid="ref-33">33</xref>], YOLOv6 [<xref ref-type="bibr" rid="ref-34">34</xref>], and YOLOv8 algorithms, as well as YOLOv7 [<xref ref-type="bibr" rid="ref-35">35</xref>] and YOLOv7x. As shown in <xref ref-type="table" rid="table-3">Table 3</xref>, YOLOv5l, YOLOv6l, YOLOv7x, and YOLOv8l exhibit superior performance, and their curves are compared with that of the L_YOLO algorithm in <xref ref-type="fig" rid="fig-10">Fig. 10</xref>. To highlight the proposed algorithm, the L_YOLO curve is drawn in red, while the other curves are drawn in grey. With roughly one-ninth of the average parameter count of these four models, L_YOLO achieves a comparable mAP(0.5) and clearly outperforms all other algorithms on mAP(0.5:0.95).</p>
<table-wrap id="table-3">
<label>Table 3</label>
<caption>
<title>Comparison between different algorithms</title>
</caption>
<table frame="hsides" >
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th>Models</th>
<th>Precision</th>
<th>Recall</th>
<th>mAP(0.5)</th>
<th>mAP(0.5:0.95)</th>
<th>Parameters</th>
<th>GFLOPS</th>
</tr>
</thead>
<tbody>
<tr>
<td>YOLOv8n</td>
<td>0.853</td>
<td>0.837</td>
<td>0.884</td>
<td>0.563</td>
<td>3006818</td>
<td>8.1</td>
</tr>
<tr>
<td>YOLOv8s</td>
<td>0.854</td>
<td>0.87</td>
<td>0.902</td>
<td><bold>0.595</bold></td>
<td>11137906</td>
<td>28.7</td>
</tr>
<tr>
<td>YOLOv8m</td>
<td>0.859</td>
<td>0.878</td>
<td>0.912</td>
<td>0.614</td>
<td>25843234</td>
<td>78.7</td>
</tr>
<tr>
<td>YOLOv8l</td>
<td>0.868</td>
<td>0.889</td>
<td>0.920</td>
<td>0.619</td>
<td>43634450</td>
<td>165.4</td>
</tr>
<tr>
<td>YOLOv7</td>
<td>0.88</td>
<td>0.869</td>
<td>0.914</td>
<td>0.590</td>
<td>37223526</td>
<td>105.2</td>
</tr>
<tr>
<td>YOLOv7x</td>
<td>0.861</td>
<td>0.889</td>
<td>0.922</td>
<td>0.595</td>
<td>70848782</td>
<td>189</td>
</tr>
<tr>
<td>YOLOv6n</td>
<td>0.832</td>
<td>0.813</td>
<td>0.863</td>
<td>0.550</td>
<td>4238722</td>
<td>11.9</td>
</tr>
<tr>
<td>YOLOv6s</td>
<td>0.86</td>
<td>0.851</td>
<td>0.895</td>
<td>0.591</td>
<td>16298594</td>
<td>44</td>
</tr>
<tr>
<td>YOLOv6m</td>
<td>0.861</td>
<td>0.859</td>
<td>0.902</td>
<td>0.593</td>
<td>51998946</td>
<td>161.2</td>
</tr>
<tr>
<td>YOLOv6l</td>
<td>0.855</td>
<td>0.873</td>
<td>0.901</td>
<td>0.600</td>
<td>110897810</td>
<td>391.9</td>
</tr>
<tr>
<td>YOLOv5n</td>
<td>0.857</td>
<td>0.828</td>
<td>0.884</td>
<td>0.563</td>
<td>2509618</td>
<td>7.1</td>
</tr>
<tr>
<td>YOLOv5s</td>
<td>0.872</td>
<td>0.856</td>
<td>0.907</td>
<td>0.596</td>
<td>9113858</td>
<td>23.8</td>
</tr>
<tr>
<td>YOLOv5m</td>
<td>0.873</td>
<td>0.865</td>
<td>0.914</td>
<td>0.610</td>
<td>25068590</td>
<td>64.4</td>
</tr>
<tr>
<td>YOLOv5l</td>
<td>0.882</td>
<td>0.869</td>
<td>0.916</td>
<td>0.617</td>
<td>53136034</td>
<td>134.7</td>
</tr>
<tr>
<td>L_YOLO</td>
<td>0.872</td>
<td>0.871</td>
<td>0.914</td>
<td><bold>0.643</bold></td>
<td>7726636</td>
<td>23.3</td>
</tr>
</tbody>
</table>
</table-wrap><fig id="fig-10">
<label>Figure 10</label>
<caption>
<title>Comparison curves of different algorithms</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_48998-fig-10.tif"/>
</fig>
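<p>The parameter-count comparison between L_YOLO and the four largest baseline models can be checked with the short calculation below. The parameter counts are taken from <xref ref-type="table" rid="table-3">Table 3</xref>; the variable names are illustrative only.</p>
<p><code language="python"><![CDATA[
```python
# Parameter counts copied from Table 3.
LARGE_MODELS = {
    "YOLOv5l": 53136034,
    "YOLOv6l": 110897810,
    "YOLOv7x": 70848782,
    "YOLOv8l": 43634450,
}
L_YOLO_PARAMS = 7726636

# Average size of the four large baselines, and how many times larger it is than L_YOLO.
mean_params = sum(LARGE_MODELS.values()) / len(LARGE_MODELS)
ratio = mean_params / L_YOLO_PARAMS
```
]]></code></p>
<p>The resulting ratio is approximately 9, which is the basis for describing the large models as having about nine times as many parameters on average.</p>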
<p>To represent the models&#x2019; performance more comprehensively, this study extends the comparison to three dimensions, as illustrated in <xref ref-type="fig" rid="fig-11">Fig. 11</xref>. The abscissa and ordinate denote mAP(0.5:0.95) and mAP(0.5), respectively. The area of each circle corresponds to the number of floating point operations, with a larger area indicating a more complex model. The L_YOLO model lies closer to the upper right corner than the others, indicating its superior comprehensive performance.</p>
<fig id="fig-11">
<label>Figure 11</label>
<caption>
<title>Performance comparison</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_48998-fig-11.tif"/>
</fig>
</sec>
<sec id="s4_4">
<label>4.4</label>
<title>Comparison between Different Datasets</title>
<p>Two datasets are used in this paper, with <xref ref-type="fig" rid="fig-12">Fig. 12a</xref> showing the results on the infrared dataset and <xref ref-type="fig" rid="fig-12">Fig. 12b</xref> showing those on the Visdrone2019 dataset. The red curves compare the two algorithms on mAP(0.5), while the blue curves compare them on mAP(0.5:0.95). The solid lines represent the L_YOLO algorithm, and the dotted lines represent the YOLOv8s algorithm. The L_YOLO algorithm outperforms the YOLOv8s algorithm on both datasets, particularly on the Visdrone2019 dataset. The detailed results are presented in <xref ref-type="table" rid="table-4">Table 4</xref>. The L_YOLO algorithm improves mAP(0.5:0.95) by 4.8% on the infrared dataset and mAP(0.5) by 12.9% on the Visdrone2019 dataset. The frames per second (FPS) of the L_YOLO algorithm on both datasets is much lower than that of the YOLOv8s algorithm, primarily because the underlying code of the MPDIOU_NMS algorithm is written in Python. However, the FPS of the L_YOLO algorithm still exceeds 25 on both datasets, which meets basic industrial requirements and makes it suitable for daily use.</p>
<fig id="fig-12">
<label>Figure 12</label>
<caption>
<title>Comparison between different datasets</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_48998-fig-12.tif"/>
</fig><table-wrap id="table-4">
<label>Table 4</label>
<caption>
<title>Different datasets</title>
</caption>
<table frame="hsides">
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Dataset</th>
<th>Modules</th>
<th>mAP(0.5)</th>
<th>mAP(0.5:0.95)</th>
<th>P</th>
<th>R</th>
<th>GFLOPS</th>
<th>FPS</th>
<th>Parameters</th>
</tr>
</thead>
<tbody>
<tr>
<td>Visdrone2019</td>
<td>YOLOv8s</td>
<td>0.39</td>
<td>0.232</td>
<td>0.507</td>
<td>0.384</td>
<td>28.50</td>
<td>426</td>
<td>11129454</td>
</tr>
<tr>
<td></td>
<td>L_YOLO</td>
<td><bold>0.519</bold></td>
<td><bold>0.351</bold></td>
<td><bold>0.600</bold></td>
<td><bold>0.420</bold></td>
<td>23.03</td>
<td>29.3</td>
<td>7720770</td>
</tr>
<tr>
<td>Infrared</td>
<td>YOLOv8s</td>
<td>0.902</td>
<td>0.595</td>
<td>0.854</td>
<td>0.870</td>
<td>28.70</td>
<td>521</td>
<td>11137906</td>
</tr>
<tr>
<td></td>
<td>L_YOLO</td>
<td><bold>0.914</bold></td>
<td><bold>0.643</bold></td>
<td><bold>0.872</bold></td>
<td>0.871</td>
<td>23.30</td>
<td>31.6</td>
<td>7726636</td>
</tr>
</tbody>
</table>
</table-wrap>
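<p>The FPS values in <xref ref-type="table" rid="table-4">Table 4</xref> can be converted to a per-frame processing time to make the 25 FPS requirement concrete. The sketch below is illustrative; the helper names are hypothetical, and the 25 FPS default is the threshold quoted in the text.</p>
<p><code language="python"><![CDATA[
```python
def latency_ms(fps):
    """Average processing time per frame in milliseconds for a given frame rate."""
    return 1000.0 / fps

def meets_frame_rate(fps, required_fps=25.0):
    """True when the measured frame rate reaches the required one."""
    return fps >= required_fps

# FPS values copied from Table 4.
FPS = {
    ("Visdrone2019", "YOLOv8s"): 426,
    ("Visdrone2019", "L_YOLO"): 29.3,
    ("Infrared", "YOLOv8s"): 521,
    ("Infrared", "L_YOLO"): 31.6,
}
LATENCY_MS = {key: round(latency_ms(value), 1) for key, value in FPS.items()}
```
]]></code></p>
<p>A rate of 25 FPS corresponds to a budget of 40 ms per frame; at 29.3 FPS and 31.6 FPS, L_YOLO needs roughly 34 ms and 32 ms per frame, respectively.</p>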
<p>This paper also presents heatmap visualizations for the two datasets obtained with the YOLOv8s and L_YOLO algorithms, as shown in <xref ref-type="fig" rid="fig-13">Fig. 13</xref>. The first two images are from the infrared dataset, and the last two are from the Visdrone2019 dataset. The L_YOLO algorithm pays more attention to useful semantic information than the YOLOv8s algorithm.</p>
<fig id="fig-13">
<label>Figure 13</label>
<caption>
<title>Comparison of UAV infrared image feature visualization results</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_48998-fig-13.tif"/>
</fig>
</sec>
</sec>
<sec id="s5">
<label>5</label>
<title>Conclusion</title>
<p>The appearance of targets in UAV aerial imagery varies with flight altitude, and low-light conditions at night make it challenging for operators to discern targets accurately. To address these problems, this paper presents the L_YOLO algorithm, which improves YOLOv8 in the following ways. First, the PAN structure of YOLOv8 is replaced with the SPAN structure, and the feature extraction module of the neck network is replaced with the SDCN module. The model is then pruned. Finally, the MPDIOU_NMS algorithm is added to assist the model during validation and testing. The algorithm achieves good results on the infrared and Visdrone2019 datasets. Nevertheless, the current work still has shortcomings, and the following issues remain for future investigation: (1) The inference speed of the model needs to be improved further, for instance, by optimizing the NMS algorithm [<xref ref-type="bibr" rid="ref-36">36</xref>]. (2) The model can be pruned further, for example by using group-level pruning [<xref ref-type="bibr" rid="ref-37">37</xref>], which ensures that no part of the network layer is skipped.</p>
</sec>
</body>
<back>
<ack>
<p>The author thanks Shandong Yantai Arrow Optoelectronics Technology Co., Ltd. for providing the infrared dataset.</p>
</ack>
<sec><title>Funding Statement</title>
<p>The authors received no specific funding for this study.</p>
</sec>
<sec><title>Author Contributions</title>
<p>Study conception and design: Zhijian Liu; data collection, analysis, and interpretation of results: Zhijian Liu; draft manuscript preparation: Zhijian Liu. All authors reviewed the results and approved the final version of the manuscript.</p>
</sec>
<sec sec-type="data-availability"><title>Availability of Data and Materials</title>
<p>The data that support the findings of this study are openly available at <ext-link ext-link-type="uri" xlink:href="https://github.com/pastrami06/CMC">https://github.com/pastrami06/CMC</ext-link>.</p>
</sec>
<sec sec-type="COI-statement"><title>Conflicts of Interest</title>
<p>The authors declare that they have no conflicts of interest to report regarding the present study.</p>
</sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H. Z.</given-names> <surname>Fang</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Ding</surname></string-name>, <string-name><given-names>L. M.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Chang</surname></string-name>, and <string-name><given-names>L. X.</given-names> <surname>Yan</surname></string-name></person-group>, &#x201C;<article-title>Infrared small UAV target detection based on depthwise separable residual dense network and multiscale feature fusion</article-title>,&#x201D; <source>IEEE Trans. Instrum. Meas.</source>, vol. <volume>71</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>20</lpage>, <year>Aug. 2022</year>. doi: <pub-id pub-id-type="doi">10.1109/TIM.2022.3198490</pub-id>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>X. C.</given-names> <surname>Zhong</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Jiang</surname></string-name>, and <string-name><given-names>X. L.</given-names> <surname>Jin</surname></string-name></person-group>, &#x201C;<article-title>Estimates of rice lodging using indices derived from UAV visible and thermal infrared image</article-title>,&#x201D; <source>Agr. Forest. Meteorol.</source>, vol. <volume>252</volume>, pp. <fpage>144</fpage>&#x2013;<lpage>154</lpage>, <year>Jan. 2018</year>. doi: <pub-id pub-id-type="doi">10.1016/j.agrformet.2018.01.021</pub-id>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C. M.</given-names> <surname>Wu</surname></string-name>, <string-name><given-names>Y. Q.</given-names> <surname>Sun</surname></string-name>, <string-name><given-names>T. J.</given-names> <surname>Wang</surname></string-name>, and <string-name><given-names>Y. L.</given-names> <surname>Liu</surname></string-name></person-group>, &#x201C;<article-title>Underwater trash detection algorithm based on improved YOLOv5s</article-title>,&#x201D; <source>J. Real-Time. Image Process.</source>, vol. <volume>19</volume>, no. <issue>5</issue>, pp. <fpage>911</fpage>&#x2013;<lpage>920</lpage>, <year>Oct. 2022</year>. doi: <pub-id pub-id-type="doi">10.1007/s11554-022-01232-0</pub-id>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>W.</given-names> <surname>Hao</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Ren</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Han</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Zhang</surname></string-name>, and <string-name><given-names>F.</given-names> <surname>Li</surname></string-name></person-group>, &#x201C;<article-title>Cattle body detection based on YOLOv5-EMA for precision livestock farming</article-title>,&#x201D; <source>Anim.</source>, vol. <volume>13</volume>, no. <issue>22</issue>, pp. <fpage>3535</fpage>, <year>Nov. 2023</year>. doi: <pub-id pub-id-type="doi">10.3390/ani13223535</pub-id>; <pub-id pub-id-type="pmid">38003152</pub-id></mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>L. Q.</given-names> <surname>Zhu</surname></string-name>, <string-name><given-names>X. M.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>H. M.</given-names> <surname>Sun</surname></string-name>, and <string-name><given-names>Y. P.</given-names> <surname>Han</surname></string-name></person-group>, &#x201C;<article-title>Research on CBF-YOLO detection model for common soybean pests in complex environment</article-title>,&#x201D; <source>Comput. Electron. Agr.</source>, vol. <volume>216</volume>, pp. <fpage>108515</fpage>, <year>Jan. 2024</year>. doi: <pub-id pub-id-type="doi">10.1016/j.compag.2023.108515</pub-id>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Ouyang</surname></string-name></person-group>, &#x201C;<article-title>DEYOv2: Rank feature with greedy matching for end-to-end object detection</article-title>,&#x201D; <year>Jun. 2023</year>. doi: <pub-id pub-id-type="doi">10.48550/arXiv.2306.09165</pub-id>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Gao</surname></string-name>, and <string-name><given-names>Y.</given-names> <surname>Lv</surname></string-name></person-group>, &#x201C;<article-title>OP mask R-CNN: An advanced mask R-CNN network for cattle individual recognition on large farms</article-title>,&#x201D; in <conf-name>2023 Int. Conf. Netw. Netw. Appl. (NaNA)</conf-name>, <conf-loc>Qingdao, China</conf-loc>, <year>Oct. 2023</year>, pp. <fpage>601</fpage>&#x2013;<lpage>606</lpage>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>P. Y.</given-names> <surname>Jiang</surname></string-name>, <string-name><given-names>D. J.</given-names> <surname>Ergu</surname></string-name>, <string-name><given-names>F. Y.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Cai</surname></string-name>, and <string-name><given-names>B.</given-names> <surname>Ma</surname></string-name></person-group>, &#x201C;<article-title>A review of yolo algorithm developments</article-title>,&#x201D; <source>Procedia Comput. Sci.</source>, vol. <volume>199</volume>, pp. <fpage>1066</fpage>&#x2013;<lpage>1073</lpage>, <year>Feb. 2022</year>. doi: <pub-id pub-id-type="doi">10.1016/j.procs.2022.01.135</pub-id>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>W.</given-names> <surname>Liu</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>SSD: Single shot multibox detector</article-title>,&#x201D; in <conf-name>Proc. Comput. Vis.-ECCV 2016: 14th Eur. Conf.</conf-name>, <conf-loc>Amsterdam, The Netherlands</conf-loc>, <year>Sep. 2016</year>, pp. <fpage>21</fpage>&#x2013;<lpage>37</lpage>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>F. M.</given-names> <surname>Talaat</surname></string-name> and <string-name><given-names>H. Z.</given-names> <surname>Eldin</surname></string-name></person-group>, &#x201C;<article-title>An improved fire detection approach based on YOLO-v8 for smart cities</article-title>,&#x201D; <source>Neural. Comput. Appl.</source>, vol. <volume>35</volume>, no. <issue>28</issue>, pp. <fpage>20939</fpage>&#x2013;<lpage>20954</lpage>, <year>Jul. 2023</year>. doi: <pub-id pub-id-type="doi">10.1007/s00521-023-08809-1</pub-id>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>L.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>H. C.</given-names> <surname>Zheng</surname></string-name>, <string-name><given-names>C. H.</given-names> <surname>Yin</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Wang</surname></string-name>, and <string-name><given-names>Z. X.</given-names> <surname>Bai</surname></string-name></person-group>, &#x201C;<article-title>Dense papaya target detection in natural environment based on improved YOLOv5s</article-title>,&#x201D; <source>Agron.</source>, vol. <volume>13</volume>, no. <issue>8</issue>, pp. <fpage>2019</fpage>, <year>Jul. 2023</year>. doi: <pub-id pub-id-type="doi">10.3390/agronomy13082019</pub-id>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>W.</given-names> <surname>Zhao</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Syafrudin</surname></string-name>, and <string-name><given-names>N. L.</given-names> <surname>Fitriyani</surname></string-name></person-group>, &#x201C;<article-title>CRAS-YOLO: A novel multi-category vessel detection and classification model based on YOLOv5s algorithm</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>11</volume>, pp. <fpage>11463</fpage>&#x2013;<lpage>11478</lpage>, <year>Feb. 2023</year>. doi: <pub-id pub-id-type="doi">10.1109/ACCESS.2023.3241630</pub-id>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Tian</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Zheng</surname></string-name>, and <string-name><given-names>C.</given-names> <surname>Liu</surname></string-name></person-group>, &#x201C;<article-title>C2Net-YOLOv5: A bidirectional Res2Net-based traffic sign detection algorithm</article-title>,&#x201D; <source>Comput., Mater. Contin.</source>, vol. <volume>77</volume>, no. <issue>2</issue>, pp. <fpage>1949</fpage>&#x2013;<lpage>1965</lpage>, <year>Sep. 2023</year>. doi: <pub-id pub-id-type="doi">10.32604/cmc.2023.042224</pub-id>; <pub-id pub-id-type="pmid">37303558</pub-id></mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>B.</given-names> <surname>Chen</surname></string-name> and <string-name><given-names>X.</given-names> <surname>Miao</surname></string-name></person-group>, &#x201C;<article-title>Distribution line pole detection and counting based on YOLO using UAV inspection line video</article-title>,&#x201D; <source>J. Electr. Eng. Technol.</source>, vol. <volume>15</volume>, pp. <fpage>441</fpage>&#x2013;<lpage>448</lpage>, <year>Jun. 2020</year>. doi: <pub-id pub-id-type="doi">10.1007/s42835-019-00230-w</pub-id>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Andoli</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Mohammed</surname></string-name>, <string-name><given-names>S. C.</given-names> <surname>Tan</surname></string-name>, and <string-name><given-names>W. P.</given-names> <surname>Cheah</surname></string-name></person-group>, &#x201C;<article-title>A review on community detection in large complex networks from conventional to deep learning methods: A call for the use of parallel meta-heuristic algorithms</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>9</volume>, pp. <fpage>96501</fpage>&#x2013;<lpage>96527</lpage>, <year>Jul. 2021</year>. doi: <pub-id pub-id-type="doi">10.1109/ACCESS.2021.3095335</pub-id>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Diwan</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Anirudh</surname></string-name>, and <string-name><given-names>J. V.</given-names> <surname>Tembhurne</surname></string-name></person-group>, &#x201C;<article-title>Object detection using YOLO: Challenges, architectural successors, datasets and applications</article-title>,&#x201D; <source>Multimed. Tools. Appl.</source>, vol. <volume>82</volume>, no. <issue>6</issue>, pp. <fpage>9243</fpage>&#x2013;<lpage>9275</lpage>, <year>Jan. 2023</year>. doi: <pub-id pub-id-type="doi">10.1007/s11042-022-13644-y</pub-id>; <pub-id pub-id-type="pmid">35968414</pub-id></mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S. H.</given-names> <surname>Cao</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Li</surname></string-name>, and <string-name><given-names>Z. H.</given-names> <surname>Mao</surname></string-name></person-group>, &#x201C;<article-title>UAV small target detection algorithm based on an improved YOLOv5s model</article-title>,&#x201D; <source>J. Vis. Commun. Image Rep.</source>, vol. <volume>97</volume>, pp. <fpage>103936</fpage>, <year>Sep. 2023</year>. doi: <pub-id pub-id-type="doi">10.1016/j.jvcir.2023.103936</pub-id>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y. M.</given-names> <surname>Hui</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Wang</surname></string-name>, and <string-name><given-names>B.</given-names> <surname>Li</surname></string-name></person-group>, &#x201C;<article-title>STF-YOLO: A small target detection algorithm for UAV remote sensing images based on improved SwinTransformer and class weighted classification decoupling head</article-title>,&#x201D; <source>Meas.</source>, vol. <volume>224</volume>, pp. <fpage>113936</fpage>, <year>Jan. 2024</year>. doi: <pub-id pub-id-type="doi">10.1016/j.measurement.2023.113936</pub-id>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Ali</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Jalal</surname></string-name>, <string-name><given-names>M. H.</given-names> <surname>Alatiyyah</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Alnowaiser</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Park</surname></string-name></person-group>, &#x201C;<article-title>Vehicle detection and tracking in UAV imagery via YOLOv3 and Kalman filter</article-title>,&#x201D; <source>Comput., Mater. Contin.</source>, vol. <volume>76</volume>, no. <issue>1</issue>, pp. <fpage>1249</fpage>&#x2013;<lpage>1265</lpage>, <year>Jun. 2023</year>. doi: <pub-id pub-id-type="doi">10.32604/cmc.2023.038114</pub-id>; <pub-id pub-id-type="pmid">37303558</pub-id></mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>W.</given-names> <surname>Lu</surname></string-name></person-group>, &#x201C;<article-title>A CNN-transformer hybrid model based on CSWin transformer for UAV image object detection</article-title>,&#x201D; <source>IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.</source>, vol. <volume>16</volume>, pp. <fpage>1211</fpage>&#x2013;<lpage>1231</lpage>, <year>Jan. 2023</year>. doi: <pub-id pub-id-type="doi">10.1109/JSTARS.2023.3234161</pub-id>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Qian</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Yang</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Lei</surname></string-name></person-group>, &#x201C;<article-title>LFF-YOLO: A YOLO algorithm with lightweight feature fusion network for multi-scale defect detection</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>10</volume>, pp. <fpage>130339</fpage>&#x2013;<lpage>130349</lpage>, <year>Dec. 2022</year>. doi: <pub-id pub-id-type="doi">10.1109/ACCESS.2022.3227205</pub-id>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Zhu</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Miao</surname></string-name></person-group>, &#x201C;<article-title>SCNet: A lightweight and efficient object detection network for remote sensing</article-title>,&#x201D; <source>IEEE Geosci. Remote Sens. Lett.</source>, vol. <volume>21</volume>, pp. <fpage>1</fpage>, <year>Dec. 2023</year>. doi: <pub-id pub-id-type="doi">10.1109/LGRS.2023.3344937</pub-id>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y. B.</given-names> <surname>Zou</surname></string-name> and <string-name><given-names>C. Y.</given-names> <surname>Liu</surname></string-name></person-group>, &#x201C;<article-title>A light-weight object detection method based on knowledge distillation and model pruning for seam tracking system</article-title>,&#x201D; <source>Meas.</source>, vol. <volume>220</volume>, pp. <fpage>113438</fpage>, <year>Oct. 2023</year>. doi: <pub-id pub-id-type="doi">10.1016/j.measurement.2023.113438</pub-id>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X. Q.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>H. B.</given-names> <surname>Gao</surname></string-name>, <string-name><given-names>Z. M.</given-names> <surname>Jia</surname></string-name>, and <string-name><given-names>Z. J.</given-names> <surname>Li</surname></string-name></person-group>, &#x201C;<article-title>BL-YOLOv8: An improved road defect detection model based on YOLOv8</article-title>,&#x201D; <source>Sens.</source>, vol. <volume>23</volume>, no. <issue>20</issue>, pp. <fpage>8361</fpage>, <year>Sep. 2023</year>. doi: <pub-id pub-id-type="doi">10.3390/s23208361</pub-id>; <pub-id pub-id-type="pmid">37896455</pub-id></mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>G.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>Y. F.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>P.</given-names> <surname>An</surname></string-name>, <string-name><given-names>H. Y.</given-names> <surname>Hong</surname></string-name>, and <string-name><given-names>J. H.</given-names> <surname>Hu</surname></string-name></person-group>, &#x201C;<article-title>UAV-YOLOv8: A small-object-detection model based on improved YOLOv8 for UAV aerial photography scenarios</article-title>,&#x201D; <source>Sens.</source>, vol. <volume>23</volume>, no. <issue>16</issue>, pp. <fpage>7190</fpage>, <year>Jul. 2023</year>. doi: <pub-id pub-id-type="doi">10.3390/s23167190</pub-id>; <pub-id pub-id-type="pmid">37631727</pub-id></mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>X. Z.</given-names> <surname>Zhu</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Hu</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Lin</surname></string-name>, and <string-name><given-names>J. F.</given-names> <surname>Dai</surname></string-name></person-group>, &#x201C;<article-title>Deformable ConvNets V2: More deformable, better results</article-title>,&#x201D; in <conf-name>Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit.</conf-name>, <conf-loc>Long Beach, CA, USA</conf-loc>, <year>Jun. 2019</year>, pp. <fpage>9308</fpage>&#x2013;<lpage>9316</lpage>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>N.</given-names> <surname>Mahendran</surname></string-name></person-group>, &#x201C;<article-title>SENetV2: Aggregated dense layer for channelwise and global representations</article-title>,&#x201D; <year>Nov. 2023</year>. doi: <pub-id pub-id-type="doi">10.48550/arXiv.2311.10807</pub-id>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D. D.</given-names> <surname>Wang</surname></string-name> and <string-name><given-names>D. J.</given-names> <surname>He</surname></string-name></person-group>, &#x201C;<article-title>Channel pruned YOLO V5s-based deep learning approach for rapid and accurate apple fruitlet detection before fruit thinning</article-title>,&#x201D; <source>Biosyst. Eng.</source>, vol. <volume>210</volume>, pp. <fpage>271</fpage>&#x2013;<lpage>281</lpage>, <year>Oct. 2021</year>. doi: <pub-id pub-id-type="doi">10.1016/j.biosystemseng.2021.08.015</pub-id>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>J. G.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>Z. Q.</given-names> <surname>Shen</surname></string-name>, and <string-name><given-names>G.</given-names> <surname>Huang</surname></string-name></person-group>, &#x201C;<article-title>Learning efficient convolutional networks through network slimming</article-title>,&#x201D; in <conf-name>Proc. IEEE Int. Conf. Comput. Vis.</conf-name>, <conf-loc>Venice, Italy</conf-loc>, <year>Dec. 2017</year>, pp. <fpage>2736</fpage>&#x2013;<lpage>2744</lpage>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>T. T.</given-names> <surname>Yang</surname></string-name>, <string-name><given-names>S. Y.</given-names> <surname>Zhou</surname></string-name>, <string-name><given-names>A. J.</given-names> <surname>Xu</surname></string-name>, and <string-name><given-names>J. H.</given-names> <surname>Ye</surname></string-name></person-group>, &#x201C;<article-title>An approach for plant leaf image segmentation based on YOLOV8 and the improved DEEPLABV3&#x002B;</article-title>,&#x201D; <source>Plants</source>, vol. <volume>12</volume>, no. <issue>19</issue>, pp. <fpage>3438</fpage>, <year>Sep. 2023</year>. doi: <pub-id pub-id-type="doi">10.3390/plants12193438</pub-id>; <pub-id pub-id-type="pmid">37836178</pub-id></mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>N.</given-names> <surname>Bodla</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Singh</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Chellappa</surname></string-name>, and <string-name><given-names>L. S.</given-names> <surname>Davis</surname></string-name></person-group>, &#x201C;<article-title>Soft-NMS&#x2013;Improving object detection with one line of code</article-title>,&#x201D; in <conf-name>Proc. IEEE Int. Conf. Comput. Vis.</conf-name>, <conf-loc>Venice, Italy</conf-loc>, <year>Apr. 2017</year>, pp. <fpage>5561</fpage>&#x2013;<lpage>5569</lpage>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S. L.</given-names> <surname>Ma</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>Xu</surname></string-name></person-group>, &#x201C;<article-title>MPDIoU: A loss for efficient and accurate bounding box regression</article-title>,&#x201D; <year>Jul. 2023</year>. doi: <pub-id pub-id-type="doi">10.48550/arXiv.2307.07662</pub-id>.</mixed-citation></ref>
<ref id="ref-33"><label>[33]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>W. T.</given-names> <surname>Wu</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>L. L.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>Y. L.</given-names> <surname>Long</surname></string-name>, and <string-name><given-names>X. D.</given-names> <surname>Wan</surname></string-name></person-group>, &#x201C;<article-title>Application of local fully convolutional neural network combined with YOLO v5 algorithm in small target detection of remote sensing image</article-title>,&#x201D; <source>PLoS One</source>, vol. <volume>16</volume>, no. <issue>10</issue>, pp. <fpage>e0259283</fpage>, <year>Sep. 2021</year>. doi: <pub-id pub-id-type="doi">10.1371/journal.pone.0259283</pub-id>; <pub-id pub-id-type="pmid">34714878</pub-id></mixed-citation></ref>
<ref id="ref-34"><label>[34]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C. Y.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>L. L.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>H. L.</given-names> <surname>Jiang</surname></string-name>, <string-name><given-names>K. H.</given-names> <surname>Weng</surname></string-name>, and <string-name><given-names>Y. F.</given-names> <surname>Geng</surname></string-name></person-group>, &#x201C;<article-title>YOLOv6: A single-stage object detection framework for industrial applications</article-title>,&#x201D; <year>Sep. 2022</year>. doi: <pub-id pub-id-type="doi">10.48550/arXiv.2209.02976</pub-id>.</mixed-citation></ref>
<ref id="ref-35"><label>[35]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C. Y.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Bochkovskiy</surname></string-name>, and <string-name><given-names>H. Y. M.</given-names> <surname>Liao</surname></string-name></person-group>, &#x201C;<article-title>YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors</article-title>,&#x201D; <year>Jul. 2022</year>. doi: <pub-id pub-id-type="doi">10.48550/arXiv.2207.02696</pub-id>.</mixed-citation></ref>
<ref id="ref-36"><label>[36]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Zhao</surname></string-name>, <string-name><given-names>J. K.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>D. Y.</given-names> <surname>Dai</surname></string-name>, <string-name><given-names>S. Q.</given-names> <surname>Lin</surname></string-name>, and <string-name><given-names>Z. H.</given-names> <surname>Chen</surname></string-name></person-group>, &#x201C;<article-title>D-NMS: A dynamic NMS network for general object detection</article-title>,&#x201D; <source>Neurocomput.</source>, vol. <volume>512</volume>, pp. <fpage>225</fpage>&#x2013;<lpage>234</lpage>, <year>Nov. 2022</year>. doi: <pub-id pub-id-type="doi">10.1016/j.neucom.2022.09.080</pub-id>.</mixed-citation></ref>
<ref id="ref-37"><label>[37]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>G. F.</given-names> <surname>Fang</surname></string-name>, <string-name><given-names>X. Y.</given-names> <surname>Ma</surname></string-name>, <string-name><given-names>M. L.</given-names> <surname>Song</surname></string-name>, <string-name><given-names>M. B.</given-names> <surname>Mi</surname></string-name>, and <string-name><given-names>X. C.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>DepGraph: Towards any structural pruning</article-title>,&#x201D; in <conf-name>Proc. IEEE/CVF Conf. Comput. Vis. Pattern Recognit.</conf-name>, <conf-loc>Vancouver, BC, Canada</conf-loc>, <year>Jun. 2023</year>, pp. <fpage>16091</fpage>&#x2013;<lpage>16101</lpage>.</mixed-citation></ref>
</ref-list>
</back></article>