<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">47469</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2024.047469</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>A Simple and Effective Surface Defect Detection Method of Power Line Insulators for Difficult Small Objects</article-title>
<alt-title alt-title-type="left-running-head">A Simple and Effective Surface Defect Detection Method of Power Line Insulators for Difficult Small Objects</alt-title>
<alt-title alt-title-type="right-running-head">A Simple and Effective Surface Defect Detection Method of Power Line Insulators for Difficult Small Objects</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Lu</surname><given-names>Xiao</given-names></name><xref ref-type="aff" rid="aff-1">1</xref><email>lx_sz@js.sgcc.com.cn</email></contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Jiang</surname><given-names>Chengling</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western"><surname>Ma</surname><given-names>Zhoujun</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-4" contrib-type="author">
<name name-style="western"><surname>Li</surname><given-names>Haitao</given-names></name><xref ref-type="aff" rid="aff-2">2</xref></contrib>
<contrib id="author-5" contrib-type="author">
<name name-style="western"><surname>Liu</surname><given-names>Yuexin</given-names></name><xref ref-type="aff" rid="aff-2">2</xref></contrib>
<aff id="aff-1"><label>1</label><institution>State Grid Jiangsu Electric Power Co., Ltd.</institution>, <addr-line>Nanjing, 210024</addr-line>, <country>China</country></aff>
<aff id="aff-2"><label>2</label><institution>State Grid Changzhou Power Supply Company</institution>, <addr-line>Changzhou, 213003</addr-line>, <country>China</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Xiao Lu. Email: <email>lx_sz@js.sgcc.com.cn</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic"><year>2024</year></pub-date>
<pub-date date-type="pub" publication-format="electronic"><day>25</day><month>4</month><year>2024</year></pub-date>
<volume>79</volume>
<issue>1</issue>
<fpage>373</fpage>
<lpage>390</lpage>
<history>
<date date-type="received"><day>06</day><month>11</month><year>2023</year></date>
<date date-type="accepted"><day>02</day><month>2</month><year>2024</year></date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2024 Lu et al.</copyright-statement>
<copyright-year>2024</copyright-year>
<copyright-holder>Lu et al.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_47469.pdf"></self-uri>
<abstract>
<p>Insulator defect detection plays a vital role in maintaining the secure operation of power systems. To address the difficulty of detecting small objects and the missed detections caused by the small scale, variable scale, and fuzzy edge morphology of insulator defects, we construct an insulator dataset with 1600 samples containing flashovers and breakages. We then propose a simple and effective surface defect detection method of power line insulators for difficult small objects. Firstly, a high-resolution feature map is introduced and a small object prediction layer is added so that the model can detect tiny objects. Secondly, a simplified adaptive spatial feature fusion (S-ASFF) module is introduced to perform cross-scale spatial fusion, improving adaptability to variable multi-scale features. Finally, we propose an enhanced deformable attention mechanism (EDAM) module. By integrating a gating activation function, it further encourages the model to learn a small number of critical sampling points near the reference points and improves the perception of object morphology. The experimental results on the dataset of flashover and breakage defects indicate that this method improves the performance of YOLOv5, YOLOv7, and YOLOv8. In practical application, it can simply and effectively improve the precision of power line insulator defect detection and reduce missed detections of difficult small objects.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Insulator defect detection</kwd>
<kwd>small object</kwd>
<kwd>power line</kwd>
<kwd>deformable attention mechanism</kwd>
</kwd-group>
<funding-group>
<award-group id="awg1">
<funding-source>State Grid Jiangsu Electric Power Co., Ltd. of the Science and Technology Project</funding-source>
<award-id>J2022004</award-id>
</award-group>
</funding-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>Insulators are among the key pieces of equipment for secure power transmission. Their main function is to work together with metal fittings to fix the conductors to tower poles and to maintain insulation between the conductors and the poles. However, insulators develop various defects in harsh environments: the insulation characteristics of a defective insulator gradually deviate from the reliable range until it loses its insulation capacity. Common insulator defects include missing caps, pollution, flashover, breakage, and current leakage. According to statistics, insulator defects cause more than half of power accidents, the largest proportion of any cause [<xref ref-type="bibr" rid="ref-1">1</xref>], and flashover causes the second largest number of insulator accidents [<xref ref-type="bibr" rid="ref-2">2</xref>]. Moreover, insulator status detection is considered one of the most formidable issues in power line inspection [<xref ref-type="bibr" rid="ref-3">3</xref>].</p>
<p>To prevent power accidents and ensure a stable power supply, power companies must regularly inspect the power system to identify and address equipment defects. In recent years, inspection methods have developed rapidly. Traditional methods include: (1) observation by maintenance personnel, which is limited by weather, terrain, and environment, and is inefficient and risky; and (2) video recording from a manned helicopter, which is flown at a safe distance from the power lines and equipment while maintenance personnel record videos of the various equipment for subsequent inspection, but this approach is costly and inaccurate. As Unmanned Aerial Vehicle (UAV) technology has been widely adopted in many fields, such as agriculture and search and rescue, it has also been promoted in power inspection. UAV inspection offers low cost and high efficiency: it can capture many kinds of data, such as visible-light images, quickly and efficiently. By processing these images with computer-vision-based detection and classification techniques, images can be analyzed automatically, further enabling intelligent surveillance of the power system.</p>
<p>However, after several years of development, the vast volume of aerial images captured by UAVs has itself become a bottleneck limiting the intelligent monitoring of power systems. Effectively employing advanced technologies to intelligently process and analyze these large-scale image collections has emerged as a critical concern.</p>
<p>Considerable work has been devoted to detecting insulators and their defects to better monitor the power system. Zhang et al. [<xref ref-type="bibr" rid="ref-4">4</xref>] introduced an enhanced YOLOv8s model with multi-scale large kernel attention and lightweight Group Shuffle Convolution (GSConv) to tackle sluggish recognition speed and low accuracy. Zhang et al. [<xref ref-type="bibr" rid="ref-5">5</xref>] proposed a densely connected feature pyramid based on YOLOv3, which efficiently fuses the positional information of shallow features with the semantic information of deep features; however, its detection performance varies widely across object scales. Jiang [<xref ref-type="bibr" rid="ref-6">6</xref>] proposed a method based on YOLOv5 that adopts a cascade framework, detecting all insulator objects with a first-level model and the various defect types with a second-level model. Although this method can detect defects such as flashovers, breakages, and missing caps, it localizes flashovers and breakages with weak edge features poorly, and its detection precision is low. Xu [<xref ref-type="bibr" rid="ref-7">7</xref>] proposed a super-resolution generative network, combining GridMask, a random-erasure algorithm, and a generative adversarial network to expand the small object dataset while sharpening small object boundaries; small object detection precision was then improved by introducing Transformer and Swin Transformer modules into YOLOv5. However, the method's structure is comparatively complicated, which hinders practical application.</p>
<p>To address the aforementioned challenges, this paper establishes a dataset encompassing flashover and breakage defects. Additionally, it presents a method for detecting insulator defects in power lines based on YOLOv5. The key contributions of our research can be outlined as follows:
<list list-type="simple">
<list-item><label>&#x25CF;</label>
<p>We introduce a high-resolution feature map to be fused with feature pyramids to enhance the detailed features of small objects in the neck network. Then the small object prediction layer is added to further improve the detection precision of tiny objects.</p></list-item>
<list-item>
<label>&#x25CF;</label><p>We propose an S-ASFF module to improve adaptability to variable scales. It enhances scale perception by establishing relationships between the scale and spatial dimensions, retaining key information and letting the most important scale features dominate the fusion.</p></list-item>
<list-item>
<label>&#x25CF;</label><p>We propose an EDAM module to highlight weak edge features of insulator defects. It can further inspire the deformable attention mechanism to learn a small number of critical sampling points near the reference point to extract discriminative features.</p></list-item>
<list-item>
<label>&#x25CF;</label><p>Experimental results show that the proposed method can effectively detect insulator flashover and breakage defects and reduce missing detection.</p></list-item>
</list></p>
<p>The rest of this paper is organized as follows. <xref ref-type="sec" rid="s2">Section 2</xref> reports the related work. <xref ref-type="sec" rid="s3">Section 3</xref> describes the details of the proposed methodology for defect detection in power lines. Experimental results on defect detection are presented in <xref ref-type="sec" rid="s4">Section 4</xref>, and a short conclusion is finally drawn in <xref ref-type="sec" rid="s5">Section 5</xref>.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Related Work</title>
<p><bold>Vision foundation models.</bold> Currently, insulator defect detection models fall into two main categories: traditional image-processing-based methods and deep-learning-based methods. Traditional image-processing methods design algorithms around hand-crafted features [<xref ref-type="bibr" rid="ref-8">8</xref>], including color [<xref ref-type="bibr" rid="ref-9">9</xref>,<xref ref-type="bibr" rid="ref-10">10</xref>], morphology [<xref ref-type="bibr" rid="ref-11">11</xref>,<xref ref-type="bibr" rid="ref-12">12</xref>], gradient [<xref ref-type="bibr" rid="ref-13">13</xref>], edge [<xref ref-type="bibr" rid="ref-14">14</xref>], texture [<xref ref-type="bibr" rid="ref-15">15</xref>], and spatial characteristics [<xref ref-type="bibr" rid="ref-16">16</xref>]. These extracted features generalize poorly to different tasks or objects and have thus gradually been replaced by deep learning methods. Furthermore, such methods are susceptible to interference from complex backgrounds, which is not conducive to small object detection.</p>
<p>The rise of deep learning has made it a prominent technique in the intelligent inspection of power lines. Deep-learning-based methods are categorized into two-stage and single-stage methods. Algorithms such as R-CNN [<xref ref-type="bibr" rid="ref-17">17</xref>], Fast R-CNN [<xref ref-type="bibr" rid="ref-18">18</xref>], and Faster R-CNN [<xref ref-type="bibr" rid="ref-19">19</xref>] exemplify the two-stage methods. Reference [<xref ref-type="bibr" rid="ref-20">20</xref>] proposed an improved Faster R-CNN model to raise the precision of fault detection: it replaced the feature extraction network, used a feature pyramid for feature fusion, and finally used RoIAlign instead of RoIPooling to reduce the impact of quantization, thereby reducing both the missed and false detection rates. While these two-stage methods offer high detection precision, they struggle to meet real-time requirements in practical scenarios because the candidate-box generation phase introduces significant computational redundancy, reducing detection speed.</p>
<p>Single-stage methods, represented by algorithms such as YOLO [<xref ref-type="bibr" rid="ref-21">21</xref>], YOLOv4 [<xref ref-type="bibr" rid="ref-22">22</xref>], YOLOv5 [<xref ref-type="bibr" rid="ref-23">23</xref>], YOLOv7 [<xref ref-type="bibr" rid="ref-24">24</xref>], and SSD [<xref ref-type="bibr" rid="ref-25">25</xref>], directly predict object locations and classes in a single pass. In particular, the YOLO algorithms achieve faster detection speed than two-stage methods while maintaining high detection accuracy, so the YOLO series is widely used in industrial applications. Hao et al. [<xref ref-type="bibr" rid="ref-26">26</xref>] redesigned both the backbone and the neck of YOLOv4: they designed CSP-ResNeSt to extract stronger features and weaken the influence of complex backgrounds in aerial images, and then introduced Bi-SimAM-FPN, featuring split-attention blocks, to address the challenge of accurately identifying small-scale insulator defects. Reference [<xref ref-type="bibr" rid="ref-27">27</xref>] introduced Mina-Net for detecting self-blast in insulators, leveraging the YOLOv4 framework; the approach incorporated shallow feature mapping within the feature pyramid and then enhanced Squeeze-and-Excitation Networks (SENet) to recalibrate features across different levels in the channel direction. In another work, Ding et al. [<xref ref-type="bibr" rid="ref-28">28</xref>] integrated the Assumption-free K-MC2 (AFK-MC2) algorithm into YOLOv5, adapting the K-means method to improve both the accuracy and speed of detecting defects in insulator strings.</p>
<p>The architectural framework of the YOLO series comprises three components: the backbone, the neck, and the head. For extracting multi-scale features, top-down and laterally connected paths can represent multi-scale objects correctly. However, each feature layer is then responsible only for detecting objects at its corresponding scale, and the different scales are not sufficiently fused. In addition, tiny object detection still suffers from insufficient precision.</p>
<p>In this paper, we choose YOLOv5 as the baseline for four reasons: (1) YOLOv5 is the first model in the YOLO family to apply the gradient-shunting idea, building a more efficient network architecture with the Cross Stage Partial Network (CSPNet) [<xref ref-type="bibr" rid="ref-29">29</xref>]. The subsequent YOLOv7 and YOLOv8 follow the same idea, branching more gradient flow in parallel to obtain richer gradient information and thus higher precision; using YOLOv5 as the baseline therefore demonstrates the value of our improvements most clearly. (2) YOLOv5 is a popular and mature object detection model for industrial applications; it has been highly optimized compared with other versions and therefore performs very stably. (3) YOLOv5 has a strong advantage in rapid model deployment: it dramatically improves detection speed while maintaining precision, and it is also more flexible and friendly to deploy. (4) YOLOv5 delivers excellent FPS and, compared with YOLOv8, is better suited for deployment and real-time application on devices without GPU support. Besides, YOLOv7 and YOLOv8 both have higher parameter counts and FLOPs than YOLOv5 at the same model scale.</p>
<p><bold>Attention mechanisms.</bold> In recent years, attention mechanisms have been widely used in power line defect detection. Chen et al. [<xref ref-type="bibr" rid="ref-30">30</xref>] added SENet, a channel attention mechanism, to the YOLOv5 backbone network to improve the model's feature extraction ability. Some researchers combine SENet with other attention mechanisms to address the low accuracy of power line defect detection: Efficient Channel Attention (ECA) and SENet were combined into a double attention fusion module [<xref ref-type="bibr" rid="ref-31">31</xref>], and SENet and the Convolutional Block Attention Module (CBAM) were introduced respectively to merge object features at different scales and highlight feature information [<xref ref-type="bibr" rid="ref-32">32</xref>]. The Transformer [<xref ref-type="bibr" rid="ref-33">33</xref>], based on the self-attention mechanism, has achieved great success in natural language processing, and some scholars have also applied it to power line defect detection [<xref ref-type="bibr" rid="ref-34">34</xref>,<xref ref-type="bibr" rid="ref-35">35</xref>]. However, self-attention is computationally heavy because it processes all the pixels in an image. In contrast to the above methods, inspired by the Deformable Attention Mechanism (DAM) proposed in Deformable DETR [<xref ref-type="bibr" rid="ref-36">36</xref>], the EDAM proposed in this paper builds on DAM, which significantly reduces computation and performs better than the self-attention mechanism. EDAM has a more powerful attention ability than DAM (especially on small objects) and more stable training gradients thanks to the introduced fusion gating activation.</p>
</sec>
<sec id="s3">
<label>3</label>
<title>Methodology</title>
<p>In this paper, we present a simple and effective surface defect detection method for power line insulators. This method addresses challenges such as low precision and missed detection caused by small defect objects, variable scales, and fuzzy edge morphology.</p>
<p><xref ref-type="fig" rid="fig-1">Fig. 1</xref> illustrates the overall framework of the proposed defect detection method. The CBL block is composed of a convolution layer, batch normalization, and the Leaky ReLU activation function. The model contains two structures, CSP1_X and CSP2_X, where X denotes the number of residual units; for CSP2_X, X &#x003D; 1, indicating that it contains only one residual unit. To maximize the retention of detailed features conducive to small object detection, a shallow high-resolution feature map of size 160 &#x00D7; 160 &#x00D7; 64 from the backbone network is first fused with the Feature Pyramid Network (FPN). Secondly, we introduce the ASFF module and improve it: the adaptive spatial feature fusion layer S-ASFF performs cross-scale fusion of the FPN layers at each scale, so that the most important feature layer dominates the fusion at each spatial location, improving the multi-scale characterization capability. Finally, an enhanced deformable attention mechanism with a fusion gating activation function is introduced, which further encourages the model to learn a small number of critical sampling points within each set of sampling points, reducing the missed detection rate for defects with weak edge features.</p>
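The CBL unit (convolution, batch normalization, Leaky ReLU) can be sketched in PyTorch as follows; the kernel size, stride, and negative slope below are illustrative assumptions, not values specified in the paper:

```python
import torch
import torch.nn as nn

class CBL(nn.Module):
    """Conv + BatchNorm + LeakyReLU, the basic building block of the framework."""
    def __init__(self, in_ch: int, out_ch: int, k: int = 3, s: int = 1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.LeakyReLU(0.1, inplace=True)  # slope 0.1 is an assumption

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.conv(x)))

# A stride-2 CBL halves the spatial resolution, as in backbone downsampling.
x = torch.randn(1, 64, 160, 160)
y = CBL(64, 128, k=3, s=2)(x)  # shape (1, 128, 80, 80)
```

The `bias=False` convolution is the usual idiom when batch normalization follows, since BN's affine shift makes the conv bias redundant.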
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>The overall architecture of the proposed method</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_47469-fig-1.tif"/>
</fig>
<sec id="s3_1">
<label>3.1</label>
<title>Strategy Adjustment of YOLOv5 Network Structure</title>
<p>The following is the setting of FPN used to construct the feature pyramid. The original feature representations are denoted as <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:mi>P</mml:mi><mml:mo>=</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mn>4</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msub><mml:mo fence="false" stretchy="false">}</mml:mo></mml:math></inline-formula>, which correspond to the multilevel feature maps <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:mrow><mml:mi>&#x1D49E;</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:msub><mml:mrow><mml:mi>&#x1D49E;</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mspace width="thinmathspace" /><mml:msub><mml:mrow><mml:mi>&#x1D49E;</mml:mi></mml:mrow><mml:mrow><mml:mn>4</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mspace width="thinmathspace" /><mml:msub><mml:mrow><mml:mi>&#x1D49E;</mml:mi></mml:mrow><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msub><mml:mo fence="false" stretchy="false">}</mml:mo></mml:math></inline-formula> with predetermined strides {8, 16, 32} in feature hierarchy of the input image. 
To address the defect missed detections caused by small defect objects, their low proportion of occupied pixels, and the serious loss of detailed features during convolutional downsampling, we introduce the high-resolution feature map <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:msub><mml:mrow><mml:mi>&#x1D49E;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>, obtained after two downsamplings in the backbone network, and fuse it into the FPN to form <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>. This enhances the detailed features of small objects in the neck network and alleviates their serious loss in the deeper layers. The specific steps are: (1) <inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> is upsampled and then fused with <inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:msub><mml:mrow><mml:mi>&#x1D49E;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> to obtain <inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>, which increases the detailed features in the neck network that are favorable for small object detection; (2) the feature map <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> is passed through convolution, batch normalization, and Leaky ReLU activation and then input into the YOLOv5 head, ultimately improving the detection of tiny objects.</p>
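The two steps above can be sketched minimally in PyTorch. The channel counts below, nearest-neighbor upsampling, and channel concatenation as the fusion operator are illustrative assumptions (the text does not pin down the fusion operator); only the 160 &#x00D7; 160 &#x00D7; 64 shape of C2 comes from the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical tensors: C2 from the backbone (160x160x64, per the paper),
# P3 from the FPN at half that resolution with an assumed 128 channels.
c2 = torch.randn(1, 64, 160, 160)   # shallow high-resolution map C2
p3 = torch.randn(1, 128, 80, 80)    # FPN level P3

# Step (1): upsample P3 to C2's resolution and fuse them to obtain P2.
p3_up = F.interpolate(p3, scale_factor=2, mode="nearest")
p2 = torch.cat([c2, p3_up], dim=1)  # -> 160x160 with 64+128 channels

# Step (2): convolution + batch norm + Leaky ReLU before the extra
# small-object prediction head.
head_in = nn.Sequential(
    nn.Conv2d(192, 64, 3, padding=1, bias=False),
    nn.BatchNorm2d(64),
    nn.LeakyReLU(0.1),
)(p2)  # shape (1, 64, 160, 160)
```

Keeping the fused map at stride 4 (160 &#x00D7; 160 on a 640-pixel input) is what lets the added head see objects only a few pixels wide.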
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Simplified Adaptive Spatial Feature Fusion</title>
<p>To tackle the insufficient fusion of features across different scales, which hampers the network's ability to adapt to significant variations in object scale, S-ASFF is added to the neck network, inspired by the Adaptive Spatial Feature Fusion (ASFF) [<xref ref-type="bibr" rid="ref-37">37</xref>] module. In addition, to reduce the number of parameters, we remove the original fusion layers ASFF-1, ASFF-2, and ASFF-3. Through the feature fusion mechanism of S-ASFF, the model adaptively learns the weights of the different FPN scale layers at each spatial position, so that the most important feature layer dominates the fusion. The comparison between the ASFF module and the S-ASFF module is shown in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>. The original ASFF contains three fusion layers, while S-ASFF has only one, which integrates the FPN feature layer <inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> carrying more detailed features. Because S-ASFF fuses the higher-resolution bottom feature maps of the FPN, it improves the detection of smaller objects.</p>
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>The network structure comparison between ASFF and S-ASFF</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_47469-fig-2.tif"/>
</fig>
<p>As shown in <xref ref-type="fig" rid="fig-3">Fig. 3</xref>, S-ASFF consists of three steps: (1) up-sampling <inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:mi>P</mml:mi><mml:mo>=</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mn>4</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msub><mml:mo fence="false" stretchy="false">}</mml:mo></mml:math></inline-formula> so that all feature layers share a consistent scale during fusion; (2) spatial filtering at each position after up-sampling, to learn the relative importance of the different scale layers and enhance scale perception; and (3) cross-scale spatial fusion, to enhance adaptability to scale changes. In <xref ref-type="fig" rid="fig-3">Fig. 3</xref>, <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> is the highest layer and <inline-formula id="ieqn-12"><mml:math id="mml-ieqn-12"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> the lowest. After up-sampling, the feature map of each layer is denoted as <inline-formula id="ieqn-13"><mml:math id="mml-ieqn-13"><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mo stretchy="false">[</mml:mo><mml:mn>2</mml:mn><mml:mrow><mml:mo>,</mml:mo></mml:mrow><mml:mn>5</mml:mn><mml:mo stretchy="false">]</mml:mo></mml:math></inline-formula>.</p>
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>The structure of simplified adaptive spatial feature fusion. (a) Represents the overall architecture of S-ASFF with FPN. (b) Represents the details of spatial filtering and cross-scale fusion for S-ASFF</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_47469-fig-3.tif"/>
</fig>
<p>S-ASFF takes both the scale and spatial dimensions into account, assigning spatial weights to the scaled feature maps at each level. A softmax activation function with a control factor <inline-formula id="ieqn-14"><mml:math id="mml-ieqn-14"><mml:mi>&#x03C1;</mml:mi></mml:math></inline-formula> computes a spatial mask indicating the relative importance of the corresponding positions across the scale layers. Taking the layer <inline-formula id="ieqn-15"><mml:math id="mml-ieqn-15"><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> as an example, the mask value at pixel (<italic>i, j</italic>) of the 5-th layer is computed as follows:</p>
<p><disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:msubsup><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mfrac><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:msubsup><mml:mi>&#x03C1;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msubsup></mml:mrow></mml:msup><mml:mrow><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:msubsup><mml:mi>&#x03C1;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:msubsup><mml:mi>&#x03C1;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msubsup></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:msubsup><mml:mi>&#x03C1;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mn>4</mml:mn></mml:mrow></mml:msubsup></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:msubsup><mml:mi>&#x03C1;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msubsup></mml:mrow></mml:msup></mml:mrow></mml:mfrac><mml:mo>,</mml:mo></mml:math></disp-formula>where each pixel corresponds to a control factor <inline-formula id="ieqn-16"><mml:math id="mml-ieqn-16"><mml:msub><mml:mi>&#x03C1;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> to generate <inline-formula id="ieqn-17"><mml:math id="mml-ieqn-17"><mml:msub><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>. 
The values of <inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:msub><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> across the four levels are collectively denoted <inline-formula id="ieqn-19"><mml:math id="mml-ieqn-19"><mml:mo stretchy="false">[</mml:mo><mml:msubsup><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mn>4</mml:mn></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msubsup><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msubsup><mml:mo stretchy="false">]</mml:mo></mml:math></inline-formula>.</p>
<p>The aggregation across scales and the filtering of spatial conflicts are formulated as follows:</p>
<p><disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:msubsup><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x2032;</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:msubsup><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>&#x22C5;</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msubsup><mml:mo>&#x22C5;</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mn>4</mml:mn></mml:mrow></mml:msubsup><mml:mo>&#x22C5;</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mn>4</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msubsup><mml:mo>&#x22C5;</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msubsup><mml:mo>,</mml:mo></mml:math></disp-formula>where <inline-formula id="ieqn-20"><mml:math id="mml-ieqn-20"><mml:msubsup><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>, k &#x2208; [2,5] denotes the feature tensor at pixel (<italic>i, j</italic>) and <inline-formula id="ieqn-21"><mml:math 
id="mml-ieqn-21"><mml:msubsup><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mn>4</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula>. <inline-formula id="ieqn-22"><mml:math id="mml-ieqn-22"><mml:msubsup><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mo>&#x2032;</mml:mo></mml:mrow></mml:msubsup></mml:math></inline-formula> denotes the output feature tensor produced by S-ASFF.</p>
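<p>As an illustrative sketch only (not the authors' released implementation), the per-pixel softmax of Eq. (1) and the weighted aggregation of Eq. (2) can be written in plain Python, treating each level's feature at a pixel as a scalar for simplicity:</p>

```python
import math

def s_asff_fuse(rho, x):
    """Fuse per-level features with spatial softmax weights (toy sketch).

    rho: dict level -> 2-D list of control factors, one per pixel
    x:   dict level -> 2-D list of scalar feature values at each pixel
    Returns the fused map y' of Eq. (2); the weights of Eq. (1) sum to 1.
    """
    levels = sorted(rho)                      # e.g. [2, 3, 4, 5]
    h, w = len(rho[levels[0]]), len(rho[levels[0]][0])
    y = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            # softmax over levels at pixel (i, j) -- Eq. (1)
            exps = {l: math.exp(rho[l][i][j]) for l in levels}
            z = sum(exps.values())
            alpha = {l: exps[l] / z for l in levels}
            # weighted aggregation across scales -- Eq. (2)
            y[i][j] = sum(alpha[l] * x[l][i][j] for l in levels)
    return y
```

Because the weights at each pixel sum to 1, fusing four constant maps of value 1 returns 1 at every pixel, a quick sanity check on the implementation.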
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>Enhanced Deformable Attention Mechanism</title>
<p>To enhance the transformation modeling ability of the convolutional neural network, the model should adaptively adjust the shape of the convolution kernel to fit object features of different morphologies. This strengthens the localization of insulator defects with fuzzy edge morphology and thereby reduces missed detections. To this end, EDAM is introduced to improve the perception of object morphology. As shown in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>, the given feature tensor <inline-formula id="ieqn-23"><mml:math id="mml-ieqn-23"><mml:mi>F</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="fraktur">R</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mi>&#x1D49E;</mml:mi></mml:mrow><mml:mo>&#x00D7;</mml:mo><mml:mi>H</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>W</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is the feature space based on S-ASFF, and after a general convolution, the feature map <italic>y</italic> is obtained. &#x2299; indicates that each of the <italic>K</italic> sampling points of a pixel is matched with its attention mask value, and <inline-formula id="ieqn-24"><mml:math id="mml-ieqn-24"><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:munderover><mml:mo>&#x2299;</mml:mo></mml:math></inline-formula> indicates that attention is paid to all <italic>K</italic> sampling points. The adaptive sampling process is a self-learning process that obtains the offsets during network training. EDAM enhances the attention ability of DAM so that the model can focus on more meaningful locations. Specifically, it zeroes out the attention values of unimportant sampling points by fusing a gating activation function, prompting the model to transition from learning a limited set of sampling points near a reference point to focusing on a few crucial sampling points within that set.</p>
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>The structure diagram of the enhanced deformable attention mechanism</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_47469-fig-4.tif"/>
</fig>
<p>The mask is obtained from the gating activation function <inline-formula id="ieqn-25"><mml:math id="mml-ieqn-25"><mml:mrow><mml:mi>&#x2205;</mml:mi></mml:mrow></mml:math></inline-formula>. This activation function zeroes out negative weights so that the network can learn how to expand the perceptual domain of attention and extract effective discriminative features. The formula is as follows:</p>
<p><disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:mi>&#x2205;</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mspace width="thinmathspace" /><mml:mfrac><mml:mrow><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>h</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03C6;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>h</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>&#x03C6;</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>+</mml:mo><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>h</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>&#x03C6;</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mn>0</mml:mn><mml:mrow><mml:mo>,</mml:mo></mml:mrow><mml:mn>1</mml:mn><mml:mo>]</mml:mo></mml:mrow><mml:mspace width="negativethinmathspace" /><mml:mo>,</mml:mo></mml:math></disp-formula>where <italic>x</italic> is the feature map after convolution, <inline-formula id="ieqn-26"><mml:math id="mml-ieqn-26"><mml:mi>&#x03C6;</mml:mi></mml:math></inline-formula> is a predefined hyperparameter and tanh is the hyperbolic tangent function.</p>
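<p>A minimal plain-Python sketch of the gating activation of Eq. (3) follows; the concrete value of the hyperparameter (0.5) is an assumption for illustration only:</p>

```python
import math

def gate(x, phi=0.5):
    """Gating activation of Eq. (3).

    The outer max zeroes negative weights, and the tanh-based fraction
    keeps the output in [0, 1]. phi = 0.5 is an assumed value; the paper
    treats phi as a predefined hyperparameter.
    """
    t = math.tanh(phi)
    return max(0.0, (math.tanh(x - phi) + t) / (1.0 + t))
```

At large inputs the gate saturates toward 1, while sufficiently negative inputs are suppressed to exactly 0, which is the source of the attention sparsity discussed later.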
<p>Based on the features in the across-scale fusion space, the output after attention for <italic>K</italic> sampling points on a pixel <inline-formula id="ieqn-27"><mml:math id="mml-ieqn-27"><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> is represented as follows:</p>
<p><disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:mrow><mml:mover><mml:mi>F</mml:mi><mml:mo>&#x02D9;</mml:mo></mml:mover></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:munderover><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:munderover><mml:mo>&#x2061;</mml:mo><mml:msub><mml:mrow><mml:mtext>A</mml:mtext></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>&#x22C5;</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mrow><mml:mtext>k</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>&#x22C5;</mml:mo><mml:mi>x</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi mathvariant="normal">&#x0394;</mml:mi><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mspace width="negativethinmathspace" /><mml:mspace width="negativethinmathspace" /><mml:mo>,</mml:mo></mml:math></disp-formula>where <inline-formula id="ieqn-28"><mml:math id="mml-ieqn-28"><mml:mrow><mml:mover><mml:mi>F</mml:mi><mml:mo>&#x02D9;</mml:mo></mml:mover></mml:mrow></mml:math></inline-formula> is the feature space for EDAM. 
For a pixel <inline-formula id="ieqn-29"><mml:math id="mml-ieqn-29"><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> on the feature map <italic>y</italic> after general convolution, <italic>K</italic> is the total number of sampling points on a pixel <inline-formula id="ieqn-30"><mml:math id="mml-ieqn-30"><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>. We set <inline-formula id="ieqn-31"><mml:math id="mml-ieqn-31"><mml:mi>K</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mi>&#x211B;</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow></mml:math></inline-formula>. The range of the convolution kernel is defined by <inline-formula id="ieqn-32"><mml:math id="mml-ieqn-32"><mml:mrow><mml:mi>&#x211B;</mml:mi></mml:mrow></mml:math></inline-formula>. In this paper, <inline-formula id="ieqn-33"><mml:math id="mml-ieqn-33"><mml:mrow><mml:mi>&#x211B;</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:mo>.</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>0</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:mspace width="negativethinmathspace" /><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:mspace width="negativethinmathspace" /><mml:mo>,</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:mo>)</mml:mo></mml:mrow><mml:mo fence="false" stretchy="false">}</mml:mo></mml:math></inline-formula>, which is represented 
as a <inline-formula id="ieqn-34"><mml:math id="mml-ieqn-34"><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>3</mml:mn></mml:math></inline-formula> convolution kernel with dilation 1. <inline-formula id="ieqn-35"><mml:math id="mml-ieqn-35"><mml:msub><mml:mrow><mml:mtext>A</mml:mtext></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the mask value for the <italic>k-th</italic> sampling point produced by the gating activation function <inline-formula id="ieqn-36"><mml:math id="mml-ieqn-36"><mml:mrow><mml:mi>&#x2205;</mml:mi></mml:mrow></mml:math></inline-formula>. <inline-formula id="ieqn-37"><mml:math id="mml-ieqn-37"><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x0394;</mml:mi></mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the offset position of the <italic>k-th</italic> sampling point. It is worth noting that when the offset coordinates are floating-point numbers, we round them with nearest-neighbor interpolation to obtain the revised coordinates and determine the offset position <inline-formula id="ieqn-38"><mml:math id="mml-ieqn-38"><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x0394;</mml:mi></mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>.</p>
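<p>The sampling of Eq. (4) with nearest-neighbor rounding of the offset coordinates can be sketched as follows; this is a toy scalar version, and the function and argument names are ours, for illustration only:</p>

```python
def edam_response(x, p0, offsets, deltas, masks, weights):
    """Attention output of Eq. (4) at pixel p0 (toy scalar sketch).

    x       : 2-D feature map (list of lists)
    p0      : (row, col) of the reference pixel
    offsets : the K fixed grid positions p_k of the kernel range R
    deltas  : learned fractional offsets delta-p_k (floats)
    masks   : gate-activation mask values A_k in [0, 1]
    weights : convolution weights w_k
    Fractional sampling positions are rounded to the nearest pixel,
    mirroring the nearest-neighbor interpolation described above.
    """
    h, w = len(x), len(x[0])
    out = 0.0
    for dk, (di, dj), ak, wk in zip(deltas, offsets, masks, weights):
        # nearest-neighbor rounding of the offset coordinates,
        # clamped to the feature-map boundary
        i = min(max(round(p0[0] + di + dk[0]), 0), h - 1)
        j = min(max(round(p0[1] + dj + dk[1]), 0), w - 1)
        out += ak * wk * x[i][j]
    return out
```

A mask value A_k of 0 removes a sampling point from the sum entirely, which is how the gating activation concentrates attention on the crucial points.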
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Experiments</title>
<sec id="s4_1">
<label>4.1</label>
<title>Dataset Preparation</title>
<p>This paper utilizes a dataset of 1600 raw images. We first collected insulator images containing flashover and breakage from the Electric Power Research Institute (EPRI) and the public dataset UPID [<xref ref-type="bibr" rid="ref-38">38</xref>,<xref ref-type="bibr" rid="ref-39">39</xref>]. We then pre-processed the images by removing damaged ones and resizing the rest. In addition, image flipping, saturation adjustment, contrast adjustment, and noise addition were used to expand the data. Next, a dataset in YOLO format was produced for training and evaluation. Specifically, the images were labeled with the Labelimg tool to obtain the object categories and the coordinate information of the ground truth. There are three object classes: Pollution_flashover, broken, and insulator, with sample sizes of 1994, 861, and 1466, respectively. The dataset was partitioned into a training set and a test set at an 8:2 ratio.</p>
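<p>The 8:2 partition described above can be sketched as follows; the helper name and the fixed shuffle seed are our illustrative choices, not details from the paper:</p>

```python
import random

def split_dataset(image_ids, train_ratio=0.8, seed=0):
    """Shuffle the image ids and partition them at the given ratio."""
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)   # reproducible shuffle
    cut = int(len(ids) * train_ratio)
    return ids[:cut], ids[cut:]
```

Applied to the 1600 raw images, this yields 1280 training images and 320 test images.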
<p>The label correlogram for objects of each size in the dataset is shown in <xref ref-type="fig" rid="fig-5">Fig. 5</xref>. It helps identify patterns or correlations in the distribution of object annotations across different classes and scales, revealing whether certain classes are more likely to appear at specific scales [<xref ref-type="bibr" rid="ref-40">40</xref>]. <italic>x</italic> and <italic>y</italic> are the coordinates of an object&#x2019;s bounding box, and width and height are its dimensions, respectively. The three coordinate graphs with red labeled boxes show that most bounding-box widths and heights are less than 1/4 or even 1/8 of the original image size. Moreover, because the network input size is fixed at 640 &#x00D7; 640, most objects occupy an even smaller proportion of pixels in the actual training images. Overall, the object bounding boxes are distributed along the whole x-axis, indicating that the object scale varies greatly. In addition, this dataset covers most of the defect scenes in practical applications, which makes the trained model generalizable. To further reduce overfitting, mosaic data augmentation was applied to the samples in the dataset before training. Furthermore, augmentation strategies such as random scaling and random cropping were used to improve the model&#x2019;s classification performance.</p>
<fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>The label correlogram for objects of each size in the dataset. (a) Represents a class of graphs depicting the relationship between x, y, width, and height. Similarly, (b) represents a class of graphs depicting the distribution</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_47469-fig-5.tif"/>
</fig>
</sec>
<sec id="s4_2">
<label>4.2</label>
<title>Experimental Metrics and Implementation Details</title>
<p>For a more comprehensive evaluation of the model, this paper employs three metrics, Precision (P), Recall (R), and F1-Score, to comprehensively assess the prediction of breakage and flashover. Additionally, mAP is utilized as an overall performance metric to characterize the model&#x2019;s quality. All experiments in this paper were conducted in the hardware environment shown in <xref ref-type="table" rid="table-1">Table 1</xref>.</p>
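<p>For reference, F1-Score is the harmonic mean of precision and recall; a one-line sketch (with the function name ours) is:</p>

```python
def f1_score(p, r):
    """F1-Score: harmonic mean of precision P and recall R (same units)."""
    return 2 * p * r / (p + r) if (p + r) else 0.0
```

For example, the YOLOv5s breakage row of Table 2 (P = 87.3, R = 89.6) gives F1 &#x2248; 88.4, matching the tabulated value.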
<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>The experimental running environment</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th>Type</th>
<th>Configuration</th>
</tr>
</thead>
<tbody>
<tr>
<td>CPU</td>
<td>Intel(R) Core(TM) i5&#x2013;10500 CPU @ 3.10 GHz</td>
</tr>
<tr>
<td>GPU</td>
<td>Nvidia GeForce RTX3090 (24 G)</td>
</tr>
<tr>
<td>Accelerated environment</td>
<td>CUDA 11.5</td>
</tr>
<tr>
<td>Operating system</td>
<td>Windows10</td>
</tr>
<tr>
<td>Deep learning framework</td>
<td>Pytorch1.10.1</td>
</tr>
<tr>
<td>Programming language</td>
<td>Python3.7.12</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s4_3">
<label>4.3</label>
<title>Model Training</title>
<p>The network was fed with images of size 640 &#x00D7; 640, and several key parameters were configured: The batch size was set to 8, momentum to 0.937, the initial learning rate to 0.01, and weight decay to 0.0005. The network was trained from scratch for 300 epochs. For the constructed dataset, the number of objects per class is inevitably unbalanced. To alleviate the impact of this problem, we adopted as the baseline YOLOv5 with the class-imbalance strategy it proposes, which introduces class weights and image weights. A class weight is obtained by counting the labels of each class in the dataset and taking the reciprocal of that count; in other words, the more labels a class has, the smaller its weight in the images containing that class. The image weight is the sum of the class weights of all classes contained in an image, and the greater the image weight, the greater the probability of the image being sampled. Specifically, images are randomly selected according to their image weights, with the number of selected images equal to the size of the training set, so images with larger weights are more likely to be chosen for training. This method increases the training proportion of classes with few objects so that the model learns the features of each object class in a more balanced way and avoids over-fitting. The training results are depicted in <xref ref-type="fig" rid="fig-6">Fig. 6</xref>. As the number of training epochs gradually increased to 300, the loss curves stabilized, indicating effective training.</p>
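<p>The class-weight and image-weight computation described above can be sketched as follows; the function names are ours, and YOLOv5's actual implementation differs in detail (e.g., it normalizes the weights):</p>

```python
def class_weights(label_counts):
    """Class weight = reciprocal of the number of labels of that class."""
    return {c: 1.0 / n for c, n in label_counts.items()}

def image_weight(classes_in_image, cw):
    """Image weight = sum of the class weights of classes in the image."""
    return sum(cw[c] for c in classes_in_image)
```

With the label counts from Section 4.1 (1994, 861, and 1466), the rarest class, broken, receives the largest class weight, so images containing breakage defects are sampled more often during training.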
<fig id="fig-6">
<label>Figure 6</label>
<caption>
<title>The graphs of training loss and validation loss. (a) Illustrates the bounding box regression loss. (b) Represents the classification loss</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_47469-fig-6.tif"/>
</fig>
</sec>
<sec id="s4_4">
<label>4.4</label>
<title>Comparison with the SOTA Models</title>
<p>In this section, we assessed the detection capabilities of our method. For a fair comparison, we re-implemented the relevant models and employed the same evaluation dataset to calculate performance metrics, including precision, recall, and F1-Score, for the different models. All results are presented in <xref ref-type="table" rid="table-2">Table 2</xref>. The detection models based on Faster RCNN and SSD have low precision and recall for flashover and breakage, which are small objects, due to the limitations of their basic network frameworks, so they are difficult to apply in practice. Compared with YOLOv5s, both YOLOv7 and YOLOv8s perform better. However, when applied to YOLOv5s, YOLOv7, and YOLOv8s, respectively, our proposed method brings each of them further gains in precision and recall. In particular, when the YOLOv5s baseline was combined with our proposed method, the precision on breakage and flashover defects improved by 5.7% and 4.7%, respectively. This exceeds the YOLOv8s baseline, and even the YOLOv8s &#x002B; Ours model, and significantly narrows the performance gap with YOLOv7. As for the higher gains of our method on YOLOv5, a possible reason is that the YOLOv5 model is less robust: YOLOv7 and YOLOv8 have stronger network structures and performance owing to their optimizations in feature extraction and in the strategy for matching positive and negative samples.</p>
<table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>The comparison with the state-of-the-art models</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead valign="top">
<tr>
<th rowspan="2">Model</th>
<th align="center" colspan="3">Breakage (%)</th>
<th align="center" colspan="3">Flashover (%)</th>
<th rowspan="2">mAP</th>
</tr>
<tr>
<th>P</th>
<th>R</th>
<th>F1-Score</th>
<th>P</th>
<th>R</th>
<th>F1-Score</th>
</tr>
</thead>
<tbody>
<tr>
<td>Faster RCNN</td>
<td>60.6</td>
<td>81.5</td>
<td>69.0</td>
<td>43.6</td>
<td>60.8</td>
<td>51.0</td>
<td>62.8</td>
</tr>
<tr>
<td>SSD</td>
<td>91.2</td>
<td>75.4</td>
<td>83.0</td>
<td>74.9</td>
<td>67.6</td>
<td>71.1</td>
<td>75.3</td>
</tr>
<tr>
<td>FINet [<xref ref-type="bibr" rid="ref-41">41</xref>]</td>
<td>90.3</td>
<td>87.5</td>
<td>88.9</td>
<td>87.3</td>
<td>78.5</td>
<td>82.7</td>
<td>80.5</td>
</tr>
<tr>
<td>AFNet</td>
<td>88.0</td>
<td>87.5</td>
<td>87.7</td>
<td>88.1</td>
<td>81.5</td>
<td>84.6</td>
<td>81.3</td>
</tr>
<tr>
<td>YOLOv5s</td>
<td>87.3</td>
<td>89.6</td>
<td>88.4</td>
<td>83.4</td>
<td>83.0</td>
<td>83.2</td>
<td>83.5</td>
</tr>
<tr>
<td>YOLOv5s &#x002B; Ours</td>
<td>93.1 &#x002B; 5.7</td>
<td>90.6 &#x002B; 1.0</td>
<td>91.8</td>
<td>88.1 &#x002B; 4.7</td>
<td>86.0 &#x002B; 3.0</td>
<td>87.0</td>
<td>85.6 &#x002B; 2.1</td>
</tr>
<tr>
<td>YOLOv7</td>
<td>94.7</td>
<td>91.3</td>
<td>93.0</td>
<td>89.8</td>
<td>86.3</td>
<td>88.0</td>
<td>87.4</td>
</tr>
<tr>
<td>YOLOv7 &#x002B; Ours</td>
<td>95.4 &#x002B; 0.7</td>
<td>92.7 &#x002B; 1.4</td>
<td>94.0</td>
<td>92.0 &#x002B; 1.2</td>
<td>87.2 &#x002B; 0.9</td>
<td>89.5</td>
<td>87.9 &#x002B; 0.5</td>
</tr>
<tr>
<td>YOLOv8s</td>
<td>90.3</td>
<td>89.6</td>
<td>89.9</td>
<td>87.5</td>
<td>75.7</td>
<td>81.2</td>
<td>84.9</td>
</tr>
<tr>
<td>YOLOv8s &#x002B; Ours</td>
<td>92.2 &#x002B; 1.9</td>
<td>89.9 &#x002B; 0.3</td>
<td>91.0</td>
<td>88.0 &#x002B; 0.5</td>
<td>79.4 &#x002B; 3.7</td>
<td>83.5</td>
<td>86.1 &#x002B; 1.2</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>We also provide the deployment performance of each model in practical applications, as shown in <xref ref-type="table" rid="table-3">Table 3</xref>. In the task of power line inspection, deploying deep learning models is a comprehensive process involving several key factors, and how efficiently a model runs on specific hardware is critical. YOLOv7 and YOLOv8s are higher than YOLOv5s in FLOPs, parameters, and model size, so YOLOv5s is more suitable for deployment on hardware devices without GPU support. However, YOLOv5s is significantly inferior to YOLOv5s &#x002B; Ours in terms of precision. In actual deployment, compared with YOLOv7 &#x002B; Ours and YOLOv8s &#x002B; Ours, YOLOv5s &#x002B; Ours has the advantages of small model size, high precision, low computational cost, and flexible deployment. If optimal precision is required, YOLOv7 can be chosen as the baseline to detect flashover and breakage defects of power line insulators.</p>
<table-wrap id="table-3">
<label>Table 3</label>
<caption>
<title>The comparison of deployment performance in practical application</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead valign="top">
<tr>
<th>Model</th>
<th>FLOPs</th>
<th>Parameters (MB)</th>
<th>Model size (MB)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Faster RCNN</td>
<td>474.0</td>
<td>28.8</td>
<td>108.2</td>
</tr>
<tr>
<td>SSD</td>
<td>137.0</td>
<td>24.0</td>
<td>91.6</td>
</tr>
<tr>
<td>YOLOv5s</td>
<td>17.0</td>
<td>7.3</td>
<td>14.1</td>
</tr>
<tr>
<td>FINet</td>
<td>16.9</td>
<td>7.3</td>
<td>14.2</td>
</tr>
<tr>
<td>AFNet</td>
<td>17.7</td>
<td>8.1</td>
<td>15.8</td>
</tr>
<tr>
<td>YOLOv7</td>
<td>53.2</td>
<td>37.6</td>
<td>71.3</td>
</tr>
<tr>
<td>YOLOv8s</td>
<td>28.7</td>
<td>11.2</td>
<td>21.5</td>
</tr>
<tr>
<td>YOLOv5s &#x002B; Ours</td>
<td>23.1</td>
<td>8.4</td>
<td>16.6</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The results show that our proposed method is effective and lightweight in handling small objects, scale variations, and weak edge morphological features. First, although the strategy adjustment of the YOLOv5 network structure simply uses a high-resolution feature map, it brings two major advantages to the model: (1) it reduces the loss of small-object details; (2) it passes smaller object-scale information to the second component (S-ASFF). This balances the preference for learning large and small objects, allowing the model to learn tiny object features that are otherwise difficult to learn. Second, the performance gains of S-ASFF benefit from the fact that the most important scale layer dominates each position in the feature pyramid at minimal computational cost. For better fusion, we weight the scales of the feature pyramid at the least computational cost while preserving as much small-object information as possible from the first proposed component (the strategy adjustment); other fusion approaches either cause a secondary loss of small-object information or add unnecessary computational cost. Finally, because EDAM introduces the gating activation function <inline-formula id="ieqn-39"><mml:math id="mml-ieqn-39"><mml:mrow><mml:mi>&#x2205;</mml:mi></mml:mrow></mml:math></inline-formula> based on tanh instead of the sigmoid used in DAM, it suppresses all attention values with negative input to 0, similar to the negative half of ReLU, which increases the sparsity of attention. Compared with DAM, which indiscriminately learns from a small group of sampling points near a pixel, EDAM focuses on more meaningful locations to learn discriminative features, resulting in improved performance. In addition, EDAM alleviates gradient dispersion because the sensitive range of the gradient is expanded from [0, 0.25] to [0, 1], so the gradient is more stable during training. Furthermore, our method is more generic than existing YOLO-specific solutions and can improve the performance of models across the YOLO family.</p>
<p>In summary, this paper designs a general enhancement method that makes reasonable use of high-resolution feature maps and simply and effectively improves the adaptability to variable multi-scale features and the perception of weak edge features, with only a small increase in parameters and computation. It significantly enhances the detection performance of the YOLO series on difficult small objects and reduces missed detections of power line defects.</p>
<p>To verify the detection performance of our model on large-scale objects and make the results more convincing, we trained and tested several mainstream algorithms on the public dataset UPID and compared the results with those on the IFD dataset constructed in this paper. As shown in <xref ref-type="table" rid="table-4">Table 4</xref>, for large-scale objects, the AP of our model is the best on both the IFD dataset and the UPID public dataset.</p>
<table-wrap id="table-4">
<label>Table 4</label>
<caption>
<title>The performance comparison on different datasets</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead valign="top">
<tr>
<th>Model</th>
<th>IFD</th>
<th>UPID</th>
</tr>
<tr>
<td/>
<th>Insulator AP (%)</th>
<th>Insulator AP (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td>SSD</td>
<td>97.46</td>
<td>91.21</td>
</tr>
<tr>
<td>YOLOv5s</td>
<td>97.50</td>
<td>93.74</td>
</tr>
<tr>
<td>FINet</td>
<td>98.39</td>
<td>94.59</td>
</tr>
<tr>
<td>AFNet</td>
<td>98.21</td>
<td>94.95</td>
</tr>
<tr>
<td>Ours</td>
<td>98.48</td>
<td>95.45</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Object detection was performed on the test set and the results were visualized; <xref ref-type="fig" rid="fig-7">Fig. 7</xref> shows some of the results. The experiments demonstrate that the detection model has strong generalization capability and wide application potential.</p>
<fig id="fig-7">
<label>Figure 7</label>
<caption>
<title>The visualization of defect detection results of power line equipment. (a) Illustrates that objects can be effectively identified when the attitude is deformed. (b) Illustrates that dense and weakly discharged flashovers can also be effectively detected. (c) Represents that broken cross-sections can be effectively detected under intense light. (d) Represents detection results for ice-covered scenarios. (e) Illustrates no missed detection in the case of dense breakage defects and overlapping edges on an insulator. (f) Severe accumulation of dirt on insulators is not mistakenly detected as flashover</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_47469-fig-7.tif"/>
</fig>
</sec>
<sec id="s4_5">
<label>4.5</label>
<title>Ablation Study</title>
<p>In this section, we conducted ablation experiments to quantitatively analyze the effectiveness of each proposed component. We gradually introduced the relevant modules into the network during training, with YOLOv5s serving as the baseline for all ablation studies. For convenience, the strategy adjustment of the YOLOv5s network structure is denoted as SAStrgy. The results are shown in <xref ref-type="table" rid="table-5">Table 5</xref>.</p>
<table-wrap id="table-5">
<label>Table 5</label>
<caption>
<title>The effect of each component</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead valign="top">
<tr>
<th rowspan="2">Model</th>
<th align="center" colspan="3">Breakage (%)</th>
<th align="center" colspan="3">Flashover (%)</th>
</tr>
<tr>
<th>P</th>
<th>R</th>
<th>F1-Score</th>
<th>P</th>
<th>R</th>
<th>F1-Score</th>
</tr>
</thead>
<tbody>
<tr>
<td>YOLOv5s</td>
<td>87.31</td>
<td>89.58</td>
<td>88.43</td>
<td>83.37</td>
<td>83.04</td>
<td>83.20</td>
</tr>
<tr>
<td>YOLOv5s &#x002B; SAStrgy</td>
<td>86.92</td>
<td>89.10</td>
<td>88.00</td>
<td>86.77</td>
<td>85.40</td>
<td>86.08</td>
</tr>
<tr>
<td>YOLOv5s &#x002B; SAStrgy &#x002B; S-ASFF</td>
<td>91.89</td>
<td>88.54</td>
<td>90.19</td>
<td>86.45</td>
<td>85.60</td>
<td>86.03</td>
</tr>
<tr>
<td>YOLOv5s &#x002B; SAStrgy &#x002B; S-ASFF &#x002B; DAM</td>
<td>92.38</td>
<td>88.85</td>
<td>90.58</td>
<td>86.95</td>
<td>85.02</td>
<td>85.97</td>
</tr>
<tr>
<td>YOLOv5s &#x002B; SAStrgy &#x002B; S-ASFF &#x002B; EDAM</td>
<td>93.05</td>
<td>90.63</td>
<td>91.82</td>
<td>88.08</td>
<td>86.00</td>
<td>87.03</td>
</tr>
</tbody>
</table>
</table-wrap>
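The P, R, and F1-Score columns in the table above are related by the standard harmonic mean; a minimal check (our sketch, not part of the original article) reproduces the reported F1 values from P and R:

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given in percent."""
    return 2 * precision * recall / (precision + recall)

# Baseline YOLOv5s row of Table 5: breakage P = 87.31, R = 89.58.
print(round(f1_score(87.31, 89.58), 2))  # → 88.43, matching the table
```

The same computation recovers the flashover baseline F1 of 83.20 from P = 83.37 and R = 83.04.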
<p>We note that directly adding SAStrgy slightly degrades the performance on breakage but improves the precision on flashover, which suggests an unstable effect. We consider a possible reason: the plain YOLOv5s has no advantage in detecting small objects, owing to their variable scale and weak edge features in addition to complex backgrounds, so it has difficulty judging defect location and damage degree precisely. For specific defect detection tasks, further optimized strategies are therefore necessary to improve detection performance.</p>
<p>The results show that adding S-ASFF and EDAM on top of SAStrgy boosts the performance of the baseline: the detection precision of flashover is improved by 3.01% and that of breakage by 4.58%. EDAM improves the model performance most significantly: compared with YOLOv5s, the detection precision of flashover is boosted by 4.71% and recall by 2.96%, while the detection precision of breakage is increased by 5.74% and recall by 1.05%. This is because S-ASFF provides key details to EDAM and reduces interference from complex backgrounds, enabling EDAM to more accurately expand the perceptual domain of attention and extract discriminative features.</p>
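The gains quoted above can be checked directly against Table 5; a small sketch (ours, not from the article) derives the full model's improvements over the YOLOv5s baseline:

```python
# P/R values (%) copied from Table 5: baseline YOLOv5s vs. the full model
# (YOLOv5s + SAStrgy + S-ASFF + EDAM).
baseline = {"breakage": {"P": 87.31, "R": 89.58},
            "flashover": {"P": 83.37, "R": 83.04}}
full = {"breakage": {"P": 93.05, "R": 90.63},
        "flashover": {"P": 88.08, "R": 86.00}}

gains = {cls: {m: round(full[cls][m] - baseline[cls][m], 2)
               for m in ("P", "R")}
         for cls in baseline}
print(gains)
# breakage: P +5.74, R +1.05; flashover: P +4.71, R +2.96
```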
<p>To illustrate the benefits of our model more visually, we visualize intermediate feature maps in <xref ref-type="fig" rid="fig-8">Fig. 8</xref>. The first row shows an example of breakage and the second row shows an example of flashover. Layer 23 is further used to predict small objects; together with layer 26 and layer 29, it serves as input to the YOLO heads for the classification and regression tasks. Before S-ASFF (as shown in layer 21), features are retained in the object regions because of the stronger semantic information. In contrast, after the object scale perception of S-ASFF (as shown in layer 22), features are extracted around each object. Compared with layer 21, large-scale insulator and small-scale defect (flashover and breakage) objects show significantly better feature distinction in scale and are more sensitive to location. After EDAM (as shown in layer 23), distinguishing features are highlighted from the key information retained by the previous module, and object sizes and edge morphological features are precisely learned. At the same time, one may notice that the defect objects can be learned in both layer 26 and layer 29, because YOLOv5 allows ground-truth boxes to perform anchor matching in all prediction layers simultaneously to increase the number of positive samples. However, layer 23 contains more accurate small-object sizes and richer edge morphology. Therefore, for SAStrgy, adding a tiny-object prediction layer that predicts directly on the feature map of layer 23 is conducive to improving the precision for smaller objects, especially for detecting difficult small objects.</p>
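As noted above, YOLOv5 matches a ground-truth box to anchors on every prediction layer at once, using a width/height shape-ratio test rather than IoU. A simplified sketch follows; the threshold 4.0 (YOLOv5's hyperparameter 'anchor_t') and the anchor sizes are common defaults assumed by us, not figures from the article:

```python
# Shape-ratio anchor matching in the style of YOLOv5; ANCHOR_T = 4.0 and the
# anchor sizes below are assumed defaults, not taken from the article.
ANCHOR_T = 4.0

def matches_anchor(gt_wh, anchor_wh, anchor_t=ANCHOR_T):
    """A ground-truth box matches an anchor when both the width ratio and the
    height ratio lie within [1/anchor_t, anchor_t]."""
    rw = gt_wh[0] / anchor_wh[0]
    rh = gt_wh[1] / anchor_wh[1]
    return anchor_t > max(rw, 1 / rw) and anchor_t > max(rh, 1 / rh)

# A hypothetical small defect box tested against one anchor per prediction layer:
small_defect = (14, 18)  # ground-truth width, height in pixels (illustrative)
anchors = {"layer 23": (10, 13), "layer 26": (30, 61), "layer 29": (116, 90)}
print({name: matches_anchor(small_defect, wh) for name, wh in anchors.items()})
# layer 23 and layer 26 match (positive samples); layer 29 does not
```

This is why the same small defect can appear as a positive sample in several prediction layers simultaneously.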
<fig id="fig-8">
<label>Figure 8</label>
<caption>
<title>The results of heat map visualization</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_47469-fig-8.tif"/>
</fig>
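The article does not state how the heat maps in Fig. 8 were generated; a common recipe, sketched here as an assumption, averages a layer's C × H × W activations over the channel dimension and min-max normalizes to [0, 1] before color-mapping:

```python
def activation_heatmap(feature_map):
    """feature_map: list of C channels, each an H x W grid of floats.
    Returns an H x W map, channel-averaged and min-max normalized to [0, 1]."""
    c = len(feature_map)
    h, w = len(feature_map[0]), len(feature_map[0][0])
    mean = [[sum(ch[i][j] for ch in feature_map) / c for j in range(w)]
            for i in range(h)]
    lo = min(min(row) for row in mean)
    hi = max(max(row) for row in mean)
    span = (hi - lo) or 1.0  # avoid division by zero on a flat map
    return [[(v - lo) / span for v in row] for row in mean]

# Two identical 2x2 channels: responses scale linearly from 0.0 to 1.0.
fm = [[[0.0, 2.0], [4.0, 6.0]],
      [[0.0, 2.0], [4.0, 6.0]]]
print(activation_heatmap(fm))  # corner values run from 0.0 (top-left) to 1.0
```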
</sec>
<sec id="s4_6">
<label>4.6</label>
<title>Runtime Analysis</title>
<p>We also recorded the testing time of our method based on YOLOv5. Specifically, with a batch size of 16, our method achieved an inference time of 6.1 ms and a non-maximum suppression (NMS) time of 0.9 ms. Consequently, the total processing time for an image with a size of 640 pixels was 7.1 ms.</p>
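The reported budget (6.1 ms inference plus 0.9 ms NMS, about 140 images per second at 7.1 ms per image) includes a standard greedy non-maximum suppression step. A minimal pure-Python sketch of such an NMS follows (ours, not the article's implementation; the 0.45 IoU threshold is a common default we assume):

```python
def iou(a, b):
    """Intersection over union of two boxes in (x1, y1, x2, y2) format."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thr=0.45):
    """Greedy NMS: keep highest-scoring boxes, drop overlaps above iou_thr."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou_thr >= iou(boxes[i], boxes[k]) for k in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores))  # → [0, 2]: box 1 is suppressed by box 0
```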
</sec>
</sec>
<sec id="s5">
<label>5</label>
<title>Conclusion</title>
<p>In this paper, we conduct research on defect detection in power lines based on YOLOv5, an object detection framework in the field of computer vision, and analyze the challenges in this field. Through image dataset construction, image processing, model improvement, and experimental validation on flashover and breakage, we propose a simple and effective surface defect detection method of power line insulators for difficult small objects. Experimental validation shows that, compared with state-of-the-art detection models, our model has the advantages of high precision, a low omission rate, stability, and fast convergence, and it can detect objects in real time. Our model can be extended to other defect detection tasks, such as bird&#x2019;s nests on power lines or towers and hanging foreign objects. Deep learning-based defect detection for flashover and breakage in power lines is under-reported in the literature, so this paper has clear engineering research value. If more sufficient and diverse defect image data can be obtained, the next step will focus on multiple defect detection scenes of various power components and solve the difficult small object detection problem in scenes with multi-scale objects.</p>
</sec>
</body>
<back>
<ack>
<p>The authors would like to express their gratitude to State Grid Jiangsu Electric Power Co., Ltd. for support of this study.</p>
</ack>
<sec><title>Funding Statement</title>
<p>This research was funded by State Grid Jiangsu Electric Power Co., Ltd. of the Science and Technology Project (Grant No. J2022004).</p>
</sec>
<sec><title>Author Contributions</title>
<p>The authors confirm contribution to the paper as follows: study conception and design: Xiao Lu, Chengling Jiang; data collection: Xiao Lu, Zhoujun Ma; analysis and interpretation of results: Xiao Lu, Chengling Jiang, Haitao Li; draft manuscript preparation: Xiao Lu, Chengling Jiang, Yuexin Liu. All authors reviewed the results and approved the final version of the manuscript.</p>
</sec>
<sec sec-type="data-availability"><title>Availability of Data and Materials</title>
<p>The datasets are openly available in UPID at <ext-link ext-link-type="uri" xlink:href="https://github.com/heitorcfelix/public-insulator-datasets">https://github.com/heitorcfelix/public-insulator-datasets</ext-link>. The code/supplementary data supporting the findings of this study are available from the corresponding author upon request.</p>
</sec>
<sec sec-type="COI-statement"><title>Conflicts of Interest</title>
<p>The authors declare that they have no conflicts of interest to report regarding the present study.</p>
</sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y. J.</given-names> <surname>Zhai</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>M. L.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>J. R.</given-names> <surname>Wang</surname></string-name>, and <string-name><given-names>F.</given-names> <surname>Guo</surname></string-name></person-group>, &#x201C;<article-title>Fault detection of insulator based on saliency and adaptive morphology</article-title>,&#x201D; <source>Multimed. Tools Appl.</source>, vol. <volume>76</volume>, no. <issue>9</issue>, pp. <fpage>12051</fpage>&#x2013;<lpage>12064</lpage>, <year>2017</year>. doi: <pub-id pub-id-type="doi">10.1007/s11042-016-3981-2</pub-id>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Mei</surname></string-name>, <string-name><given-names>T. C.</given-names> <surname>Lu</surname></string-name>, <string-name><given-names>X. Y.</given-names> <surname>Wu</surname></string-name>, and <string-name><given-names>B.</given-names> <surname>Zhang</surname></string-name></person-group>, &#x201C;<article-title>Insulator surface dirt image detection technology based on improved watershed algorithm</article-title>,&#x201D; in <conf-name>Proc. 2012 Asia-Pacific Power and Energy Engineering Conf.</conf-name>, <publisher-loc>Shanghai, China</publisher-loc>, <year>2012</year>, pp. <fpage>1</fpage>&#x2013;<lpage>5</lpage>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M. D. F.</given-names> <surname>Ahmed</surname></string-name>, <string-name><given-names>J. C.</given-names> <surname>Mohanta</surname></string-name>, and <string-name><given-names>A.</given-names> <surname>Sanyal</surname></string-name></person-group>, &#x201C;<article-title>Inspection and identification of transmission line insulator breakdown based on deep learning using aerial images</article-title>,&#x201D; <source>Electr. Power Syst. Res.</source>, vol. <volume>211</volume>, no. <issue>5</issue>, pp. <fpage>108199</fpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.1016/j.epsr.2022.108199</pub-id>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>L.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>B. Q.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Cui</surname></string-name>, <string-name><given-names>Y. S.</given-names> <surname>Lai</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Gao</surname></string-name></person-group>, &#x201C;<article-title>Research on improved YOLOv8 algorithm for insulator defect detection</article-title>,&#x201D; <year>2023</year>. <comment> Accessed: 15 Sep. 2023</comment>. [Online]. Available: <pub-id pub-id-type="doi">10.21203/rs.3.rs-3337929/v1</pub-id></mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X. T.</given-names> <surname>Zhang</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>InsuDet: A fault detection method for insulators of overhead transmission lines using convolutional neural networks</article-title>,&#x201D; <source>IEEE Trans. Instrum. Meas.</source>, vol. <volume>70</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>12</lpage>, <year>2021</year>. doi: <pub-id pub-id-type="doi">10.1109/TIM.2021.3127641</pub-id>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Jiang</surname></string-name></person-group>, &#x201C;<article-title>Multi-defect detection of transmission line insulators based on YOLOV5 UAV aerial photography</article-title>,&#x201D; <comment>M.S. dissertation</comment>, <publisher-name>Guangdong Univ. of Tech.</publisher-name>, <publisher-loc>China</publisher-loc>, <year>2021</year>. </mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Xu</surname></string-name></person-group>, &#x201C;<article-title>Research on small target detection and defect identification of transmission lines based on machine vision</article-title>,&#x201D; <comment>M.S. dissertation</comment>, <publisher-name>Zhejiang Univ.</publisher-name>, <publisher-loc>China</publisher-loc>, <year>2022</year>. </mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Cheng</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Liao</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Zhang</surname></string-name>, and <string-name><given-names>L.</given-names> <surname>Yang</surname></string-name></person-group>, &#x201C;<article-title>Nonlinear mechanical model of composite insulator interface and nondestructive testing method for weak bonding defects</article-title>,&#x201D; <source>Chin. J. Electr. Eng.</source>, vol. <volume>39</volume>, no. <issue>3</issue>, pp. <fpage>895</fpage>&#x2013;<lpage>905</lpage>, <year>2019</year>. doi: <pub-id pub-id-type="doi">10.1049/hve.2019.0044</pub-id>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y. P.</given-names> <surname>Ma</surname></string-name>, <string-name><given-names>Q. W.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>L. L.</given-names> <surname>Chu</surname></string-name>, <string-name><given-names>Y. Q.</given-names> <surname>Zhou</surname></string-name>, and <string-name><given-names>C.</given-names> <surname>Xu</surname></string-name></person-group>, &#x201C;<article-title>Real-time detection and spatial localization of insulators for UAV inspection based on binocular stereo vision</article-title>,&#x201D; <source>Remote Sens.</source>, vol. <volume>13</volume>, no. <issue>2</issue>, pp. <fpage>230</fpage>, <year>2021</year>. doi: <pub-id pub-id-type="doi">10.3390/rs13020230</pub-id>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S. Y.</given-names> <surname>Ma</surname></string-name>, <string-name><given-names>J. B.</given-names> <surname>An</surname></string-name>, and <string-name><given-names>F. U.</given-names> <surname>Chen</surname></string-name></person-group>, &#x201C;<article-title>Segmentation of insulator images based on HIS color space</article-title>,&#x201D; <source>Journal of Dalian Nationalities University</source>, vol. <volume>12</volume>, no. <issue>5</issue>, pp. <fpage>481</fpage>&#x2013;<lpage>484</lpage>, <year>2010</year>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y. T.</given-names> <surname>Jiang</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Han</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Ding</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Fu</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Wang</surname></string-name> and <string-name><given-names>W.</given-names> <surname>Cao</surname></string-name></person-group>, &#x201C;<article-title>The identification and diagnosis of self-blast defects of glass insulators based on multi-feature fusion</article-title>,&#x201D; <source>Electronic Power</source>, vol. <volume>50</volume>, no. <issue>5</issue>, pp. <fpage>52</fpage>&#x2013;<lpage>58</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Q.</given-names> <surname>Wu</surname></string-name> and <string-name><given-names>J.</given-names> <surname>An</surname></string-name></person-group>, &#x201C;<article-title>An active contour model based on texture distribution for extracting inhomogeneous insulators from aerial images</article-title>,&#x201D; <source>IEEE Trans. Geosci. Remote Sens.</source>, vol. <volume>52</volume>, no. <issue>6</issue>, pp. <fpage>3613</fpage>&#x2013;<lpage>3626</lpage>, <year>2014</year>. doi: <pub-id pub-id-type="doi">10.1109/TGRS.2013.2274101</pub-id>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>H. B.</given-names> <surname>Zai</surname></string-name>, <string-name><given-names>L.</given-names> <surname>He</surname></string-name>, and <string-name><given-names>Y. F.</given-names> <surname>Liu</surname></string-name></person-group>, &#x201C;<article-title>Target tracking method of transmission line insulator based on multi feature fusion and adaptive scale filter</article-title>,&#x201D; in <conf-name>Proc. ACPEE</conf-name>, <publisher-loc>Chengdu, China</publisher-loc>, <year>2020</year>, pp. <fpage>4</fpage>&#x2013;<lpage>7</lpage>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J. G.</given-names> <surname>Yin</surname></string-name>, <string-name><given-names>Y. F.</given-names> <surname>Lu</surname></string-name>, <string-name><given-names>Z. X.</given-names> <surname>Gong</surname></string-name>, <string-name><given-names>Y. C.</given-names> <surname>Jiang</surname></string-name>, and <string-name><given-names>J. G.</given-names> <surname>Yao</surname></string-name></person-group>, &#x201C;<article-title>Edge detection of high-voltage porcelain insulators in infrared image using dual parity morphological gradients</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>7</volume>, pp. <fpage>32728</fpage>&#x2013;<lpage>32734</lpage>, <year>2019</year>. doi: <pub-id pub-id-type="doi">10.1109/ACCESS.2019.2900658</pub-id>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>B. F.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>D. L.</given-names> <surname>Wu</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Cong</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Xia</surname></string-name>, and <string-name><given-names>Y. D.</given-names> <surname>Tang</surname></string-name></person-group>, &#x201C;<article-title>A method of insulator detection from video sequence</article-title>,&#x201D; in <conf-name>Proc. 2012 Fourth Int. Symp. Eng. Educ.</conf-name>, <publisher-loc>Shanghai, China</publisher-loc>, <year>2012</year>, pp. <fpage>386</fpage>&#x2013;<lpage>389</lpage>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H. Y.</given-names> <surname>Cheng</surname></string-name>, <string-name><given-names>Y. J.</given-names> <surname>Zhai</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Dong</surname></string-name> and <string-name><given-names>Y. T.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>Self-shattering defect detection of glass insulators based on spatial features</article-title>,&#x201D; <source>Energies</source>, vol. <volume>12</volume>, no. <issue>3</issue>, pp. <fpage>543</fpage>, <year>2019</year>. doi: <pub-id pub-id-type="doi">10.3390/en12030543</pub-id>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Girshick</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Donahue</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Darrell</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Malik</surname></string-name></person-group>, &#x201C;<article-title>Rich feature hierarchies for accurate object detection and semantic segmentation</article-title>,&#x201D; in <conf-name>Proc. CVPR</conf-name>, <publisher-loc>Columbus, OH, USA</publisher-loc>, <year>2014</year>, pp. <fpage>580</fpage>&#x2013;<lpage>587</lpage>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Girshick</surname></string-name></person-group>, &#x201C;<article-title>Fast R-CNN</article-title>,&#x201D; in <conf-name>Proc. ICCV</conf-name>, <publisher-loc>Santiago, Chile</publisher-loc>, <year>2015</year>, pp. <fpage>1440</fpage>&#x2013;<lpage>1448</lpage>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S. Q.</given-names> <surname>Ren</surname></string-name>, <string-name><given-names>K. M.</given-names> <surname>He</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Girshick</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Sun</surname></string-name></person-group>, &#x201C;<article-title>Faster R-CNN: Towards real-time object detection with region proposal networks</article-title>,&#x201D; <source>IEEE Trans. Pattern Anal. Mach. Intell.</source>, vol. <volume>39</volume>, no. <issue>6</issue>, pp. <fpage>1137</fpage>&#x2013;<lpage>1149</lpage>, <year>2017</year>. doi: <pub-id pub-id-type="doi">10.1109/TPAMI.2016.2577031</pub-id>; <pub-id pub-id-type="pmid">27295650</pub-id></mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>J. P.</given-names> <surname>Tang</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>H. L.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>J. Y.</given-names> <surname>Wei</surname></string-name>, <string-name><given-names>Y. J.</given-names> <surname>Wei</surname></string-name> and <string-name><given-names>M. S.</given-names> <surname>Qin</surname></string-name></person-group>, &#x201C;<article-title>Insulator defect detection based on improved Faster R-CNN</article-title>,&#x201D; in <conf-name>Proc. AEEES</conf-name>, <publisher-loc>Chengdu, China</publisher-loc>, <year>2022</year>, pp. <fpage>541</fpage>&#x2013;<lpage>546</lpage>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Redmon</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Divvala</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Girshick</surname></string-name>, and <string-name><given-names>A.</given-names> <surname>Farhadi</surname></string-name></person-group>, &#x201C;<article-title>You only look once: Unified, real-time object detection</article-title>,&#x201D; in <conf-name>Proc. CVPR</conf-name>, <publisher-loc>Las Vegas, NV, USA</publisher-loc>, <year>2016</year>, pp. <fpage>779</fpage>&#x2013;<lpage>788</lpage>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Bochkovskiy</surname></string-name>, <string-name><given-names>C. Y.</given-names> <surname>Wang</surname></string-name>, and <string-name><given-names>H. Y. M.</given-names> <surname>Liao</surname></string-name></person-group>, &#x201C;<article-title>YOLOv4: Optimal speed and accuracy of object detection</article-title>,&#x201D; <comment>arXiv:2004.10934</comment>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>G.</given-names> <surname>Jocher</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>ultralytics/yolov5:v6.0-YOLOv5n &#x2018;Nano&#x2019; models, Roboflow integration, TensorFlow export, OpenCV DNN support</article-title>,&#x201D; <source>Zenodo</source>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>C. Y.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Bochkovskiy</surname></string-name>, and <string-name><given-names>H. Y. M.</given-names> <surname>Liao</surname></string-name></person-group>, &#x201C;<article-title>YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors</article-title>,&#x201D; in <conf-name>Proc. CVPR</conf-name>, <publisher-loc>Vancouver, Canada</publisher-loc>, <year>2023</year>, pp. <fpage>7464</fpage>&#x2013;<lpage>7475</lpage>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>W.</given-names> <surname>Liu</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>SSD: Single shot multibox detector</article-title>,&#x201D; in <conf-name>Proc. ECCV</conf-name>, <publisher-loc>Cham, Berlin</publisher-loc>, <year>2016</year>, pp. <fpage>21</fpage>&#x2013;<lpage>37</lpage>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>Hao</surname></string-name>, <string-name><given-names>G. K.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Zhao</surname></string-name>, <string-name><given-names>Z. S.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>Y. L.</given-names> <surname>Liu</surname></string-name> and <string-name><given-names>C. Q.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>An insulator defect detection model in aerial images based on multiscale feature pyramid network</article-title>,&#x201D; <source>IEEE Trans. Instrum. Meas.</source>, vol. <volume>71</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>12</lpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.1109/TIM.2022.3200861</pub-id>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>He</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>An insulator self-blast detection method based on YOLOv4 with aerial images</article-title>,&#x201D; <source>Energy Rep.</source>, vol. <volume>8</volume>, no. <issue>2018</issue>, pp. <fpage>448</fpage>&#x2013;<lpage>454</lpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.1016/j.egyr.2021.11.115</pub-id>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Ding</surname></string-name>, <string-name><given-names>H. N.</given-names> <surname>Cao</surname></string-name>, <string-name><given-names>X. L.</given-names> <surname>Ding</surname></string-name>, and <string-name><given-names>C. H.</given-names> <surname>An</surname></string-name></person-group>, &#x201C;<article-title>High accuracy real-time insulator string defect detection method based on improved YOLOv5</article-title>,&#x201D; <source>Front. Energy Res.</source>, vol. <volume>10</volume>, pp. <fpage>928164</fpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.3389/fenrg.2022.928164</pub-id>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>C. Y.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>H. Y. Mark</given-names> <surname>Liao</surname></string-name>, <string-name><given-names>Y. H.</given-names> <surname>Wu</surname></string-name>, <string-name><given-names>P. Y.</given-names> <surname>Chen</surname></string-name>, and <string-name><given-names>J. W.</given-names> <surname>Hsieh</surname></string-name></person-group>, &#x201C;<article-title>CSPNet: A new backbone that can enhance learning capability of CNN</article-title>,&#x201D; in <conf-name>Proc. CVPRW</conf-name>, <publisher-loc>Seattle, WA, USA</publisher-loc>, <year>2020</year>, pp. <fpage>1571</fpage>&#x2013;<lpage>1580</lpage>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J. L.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>Z. J.</given-names> <surname>Fu</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Chen</surname></string-name>, and <string-name><given-names>F.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>A method for power lines insulator defect detection with attention feedback and double spatial pyramid</article-title>,&#x201D; <source>Electr. Power Syst. Res.</source>, vol. <volume>218</volume>, no. <issue>99</issue>, pp. <fpage>109175</fpage>, <year>2023</year>. doi: <pub-id pub-id-type="doi">10.1016/j.epsr.2023.109175</pub-id>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>L. R.</given-names> <surname>Li</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Insulator defect detection based on multi-scale feature coding and dual attention fusion</article-title>,&#x201D; <source>Advances in Laser and Optoelectronics</source>, vol. <volume>59</volume>, no. <issue>24</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>10</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Xie</surname></string-name>, <string-name><given-names>Y. W.</given-names> <surname>Du</surname></string-name>, <string-name><given-names>Z. J.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>T. Y.</given-names> <surname>Wang</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Mou</surname></string-name></person-group>, &#x201C;<article-title>Visible light insulator defect detection algorithm based on lightweight improved YOLOv5s</article-title>,&#x201D; <year>2022</year>. <comment>Accessed: 19 Nov. 2022</comment>. [Online]. Available: <pub-id pub-id-type="doi">10.13335/j.10003673.pst.2022.1438</pub-id></mixed-citation></ref>
<ref id="ref-33"><label>[33]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Vaswani</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Attention is all you need</article-title>,&#x201D; in <conf-name>Proc. 31st Int. Conf. on Neural Information Processing Systems</conf-name>, <publisher-loc>Long Beach, CA, USA</publisher-loc>, <year>2017</year>, pp. <fpage>6000</fpage>&#x2013;<lpage>6010</lpage>.</mixed-citation></ref>
<ref id="ref-34"><label>[34]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y. Z.</given-names> <surname>Cheng</surname></string-name>, <string-name><given-names>J. F.</given-names> <surname>Zhang</surname></string-name>, and <string-name><given-names>B. Y.</given-names> <surname>Sun</surname></string-name></person-group>, &#x201C;<article-title>Defect detection model of angle tower bolts based on Transformer and attention mechanism</article-title>,&#x201D; <source>Computer Systems &#x0026; Applications</source>, vol. <volume>32</volume>, no. <issue>4</issue>, pp. <fpage>248</fpage>&#x2013;<lpage>254</lpage>, <year>2023</year>.</mixed-citation></ref>
<ref id="ref-35"><label>[35]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Xue</surname></string-name>, <string-name><given-names>E. Y.</given-names> <surname>Jiang</surname></string-name>, <string-name><given-names>W. T.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>S. F.</given-names> <surname>Lin</surname></string-name>, and <string-name><given-names>Y.</given-names> <surname>Mi</surname></string-name></person-group>, &#x201C;<article-title>Foreign body detection in transmission line channel based on the fusion of window self-attention network and YOLOv5</article-title>,&#x201D; <source>Journal of Shanghai Jiaotong University</source>, <year>2023</year>. <comment>Accessed: 30 Nov. 2023</comment>. [Online]. Available: <pub-id pub-id-type="doi">10.16183/j.cnki.jsjtu.2023.301</pub-id></mixed-citation></ref>
<ref id="ref-36"><label>[36]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>X. Z.</given-names> <surname>Zhu</surname></string-name>, <string-name><given-names>W. J.</given-names> <surname>Su</surname></string-name>, <string-name><given-names>L. W.</given-names> <surname>Lu</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>X. G.</given-names> <surname>Wang</surname></string-name>, and <string-name><given-names>J. F.</given-names> <surname>Dai</surname></string-name></person-group>, &#x201C;<article-title>Deformable DETR: Deformable transformers for end-to-end object detection</article-title>,&#x201D; <comment>arXiv:2010.04159</comment>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-37"><label>[37]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Huang</surname></string-name>, and <string-name><given-names>Y.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>Learning spatial fusion for single-shot object detection</article-title>,&#x201D; <comment>arXiv:1911.09516</comment>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-38"><label>[38]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M. J. B.</given-names> <surname>Reddy</surname></string-name>, <string-name><given-names>B. K.</given-names> <surname>Chandra</surname></string-name>, and <string-name><given-names>D. K.</given-names> <surname>Mohanta</surname></string-name></person-group>, &#x201C;<article-title>A DOST based approach for the condition monitoring of 11 kV distribution line insulators</article-title>,&#x201D; <source>IEEE Trans. Dielectr. Electr. Insul.</source>, vol. <volume>18</volume>, no. <issue>2</issue>, pp. <fpage>588</fpage>&#x2013;<lpage>595</lpage>, <year>2011</year>.</mixed-citation></ref>
<ref id="ref-39"><label>[39]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>V. S.</given-names> <surname>Andrel</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Chaves</surname></string-name>, and <string-name><given-names>H.</given-names> <surname>Felix</surname></string-name></person-group>, &#x201C;<article-title>Unifying public datasets for insulator detection and fault classification in electrical power lines</article-title>,&#x201D; <year>2020</year>. <comment>Accessed: 12 Mar. 2020</comment>. [Online]. Available: <ext-link ext-link-type="uri" xlink:href="https://github.com/heitorcfelix/public-insulator-datasets">https://github.com/heitorcfelix/public-insulator-datasets</ext-link></mixed-citation></ref>
<ref id="ref-40"><label>[40]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>G.</given-names> <surname>Jocher</surname></string-name></person-group>, &#x201C;<article-title>Explaining the labels_correlogram.jpg</article-title>,&#x201D; <year>2023</year>. <comment>Accessed: 12 Oct. 2021</comment>. [Online]. Available: <ext-link ext-link-type="uri" xlink:href="https://github.com/ultralytics/yolov5/issues/5138">https://github.com/ultralytics/yolov5/issues/5138</ext-link></mixed-citation></ref>
<ref id="ref-41"><label>[41]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Z. D.</given-names> <surname>Zhang</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>FINet: An insulator dataset and detection benchmark based on synthetic fog and improved YOLOv5</article-title>,&#x201D; <source>IEEE Trans. Instrum. Meas.</source>, vol. <volume>71</volume>, no. <issue>8</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>8</lpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.1109/TIM.2022.3194909</pub-id>.</mixed-citation></ref>
</ref-list>
</back></article>