<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">IASC</journal-id>
<journal-id journal-id-type="nlm-ta">IASC</journal-id>
<journal-id journal-id-type="publisher-id">IASC</journal-id>
<journal-title-group>
<journal-title>Intelligent Automation &#x0026; Soft Computing</journal-title>
</journal-title-group>
<issn pub-type="epub">2326-005X</issn>
<issn pub-type="ppub">1079-8587</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">36897</article-id>
<article-id pub-id-type="doi">10.32604/iasc.2024.036897</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Multi-Layer Feature Extraction with Deformable Convolution for Fabric Defect Detection</article-title>
<alt-title alt-title-type="left-running-head">Multi-Layer Feature Extraction with Deformable Convolution for Fabric Defect Detection</alt-title>
<alt-title alt-title-type="right-running-head">Multi-Layer Feature Extraction with Deformable Convolution for Fabric Defect Detection</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Jiang</surname><given-names>Jielin</given-names></name><xref ref-type="aff" rid="aff-1">1</xref><xref ref-type="aff" rid="aff-2">2</xref><xref ref-type="aff" rid="aff-3">3</xref><xref ref-type="aff" rid="aff-4">4</xref><email>jiangjielin2008@163.com</email></contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Cui</surname><given-names>Chao</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western"><surname>Xu</surname><given-names>Xiaolong</given-names></name><xref ref-type="aff" rid="aff-1">1</xref><xref ref-type="aff" rid="aff-2">2</xref><xref ref-type="aff" rid="aff-3">3</xref><xref ref-type="aff" rid="aff-4">4</xref></contrib>
<contrib id="author-4" contrib-type="author">
<name name-style="western"><surname>Cui</surname><given-names>Yan</given-names></name><xref ref-type="aff" rid="aff-5">5</xref></contrib>
<aff id="aff-1"><label>1</label><institution>School of Computer Science, Nanjing University of Information Science and Technology</institution>, <addr-line>Nanjing, 210044</addr-line>, <country>China</country></aff>
<aff id="aff-2"><label>2</label><institution>State Key Lab. for Novel Software Technology, Nanjing University</institution>, <addr-line>Nanjing, 210023</addr-line>, <country>China</country></aff>
<aff id="aff-3"><label>3</label><institution>Engineering Research Center of Digital Forensics, Ministry of Education, Nanjing University of Information Science and Technology</institution>, <addr-line>Nanjing, 210044</addr-line>, <country>China</country></aff>
<aff id="aff-4"><label>4</label><institution>Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET), Nanjing University of Information Science and Technology</institution>, <addr-line>Nanjing, 210044</addr-line>, <country>China</country></aff>
<aff id="aff-5"><label>5</label><institution>College of Mathematics and Information Science, Nanjing Normal University of Special Education</institution>, <addr-line>Nanjing, 210038</addr-line>, <country>China</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Jielin Jiang. Email: <email>jiangjielin2008@163.com</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2024</year></pub-date>
<pub-date date-type="pub" publication-format="electronic"><day>06</day><month>09</month><year>2024</year></pub-date>
<volume>39</volume>
<issue>4</issue>
<fpage>725</fpage>
<lpage>744</lpage>
<history>
<date date-type="received">
<day>15</day>
<month>10</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>06</day>
<month>12</month>
<year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2024 The Authors.</copyright-statement>
<copyright-year>2024</copyright-year>
<copyright-holder>Published by Tech Science Press.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_IASC_36897.pdf"></self-uri>
<abstract>
<p>In the textile industry, the presence of defects on the surface of fabric is an essential factor in determining fabric quality. Therefore, identifying fabric defects forms a crucial part of the fabric production process. Traditional fabric defect detection algorithms can only detect specific materials and specific fabric defect types; in addition, their detection efficiency is low, and their detection results are relatively poor. Deep learning-based methods have many advantages in the field of fabric defect detection; however, such methods are less effective in identifying multi-scale fabric defects and defects with complex shapes. Therefore, we propose an effective algorithm, namely multi-layer feature extraction combined with deformable convolution (MFDC), for fabric defect detection. In MFDC, multi-layer feature extraction is used to fuse the low-level location features with high-level classification features through a top-down architecture with lateral connections to improve the detection of multi-scale fabric defects. On this basis, a deformable convolution is added to address the algorithm&#x2019;s weak ability to detect irregularly shaped fabric defects. In addition, RoI Align and Cascade-RCNN are integrated to enhance the adaptability of the algorithm to materials with complex patterned backgrounds. The experimental results show that the MFDC algorithm can achieve good detection results for both multi-scale fabric defects and defects with complex shapes, at the expense of a small increase in detection time.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Fabric defect detection</kwd>
<kwd>multi-layer features</kwd>
<kwd>deformable convolution</kwd>
</kwd-group>
<funding-group>
<award-group id="awg1">
<funding-source>National Science Foundation of China</funding-source>
<award-id>62001236</award-id>
</award-group>
<award-group id="awg2">
<funding-source>Natural Science Foundation of the Jiangsu Higher Education Institutions</funding-source>
<award-id>20KJA520003</award-id>
</award-group>
</funding-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>Fabric is closely related to human life and industrial production. With the rapid development of science and technology, automated textile equipment has greatly improved the production efficiency of fabric. However, defects are still introduced by the production environment, equipment faults and other factors. Current statistics indicate that the price of fabrics with defects may need to be discounted by 50% [<xref ref-type="bibr" rid="ref-1">1</xref>]. Therefore, in order to bring improved economic benefits to fabric manufacturers, defect detection is an essential step to ensure fabric quality. Defect detection is traditionally performed by a skilled fabric inspector who locates and classifies the defects [<xref ref-type="bibr" rid="ref-2">2</xref>]. A skilled worker inspecting by eye can evaluate fabrics with a width of about 1.8 m at a speed of around 12 m per minute. However, it is difficult to guarantee the accuracy of manual detection due to the influence of objective factors such as variable light and fabric speed and subjective factors such as worker fatigue and experience level [<xref ref-type="bibr" rid="ref-3">3</xref>]. In addition, it is highly time-consuming to train an operator to identify complex fabric defects, resulting in low production efficiency.</p>
<p>In recent decades, many solutions, such as the Fourier transform [<xref ref-type="bibr" rid="ref-4">4</xref>], the Gabor filter [<xref ref-type="bibr" rid="ref-5">5</xref>] and the Wigner distribution, have been proposed to detect defects in specific materials, including plain fabric, white gauze and single-striped fabric. These methods usually require different parameters to be set for different defects in different fabric materials. They can only handle a single fabric material or a single defect type and are thus limited in terms of their defect detection and classification.</p>
<p>With the boom in artificial intelligence research and the arrival of the big data era, deep learning methods are widely used in edge computing [<xref ref-type="bibr" rid="ref-6">6</xref>&#x2013;<xref ref-type="bibr" rid="ref-8">8</xref>], data analysis [<xref ref-type="bibr" rid="ref-9">9</xref>&#x2013;<xref ref-type="bibr" rid="ref-11">11</xref>], image recognition [<xref ref-type="bibr" rid="ref-12">12</xref>], object detection [<xref ref-type="bibr" rid="ref-13">13</xref>&#x2013;<xref ref-type="bibr" rid="ref-18">18</xref>], image denoising [<xref ref-type="bibr" rid="ref-19">19</xref>] and other fields. Deep learning builds neural network structures that are loosely modeled on the way the human brain operates. In terms of fabrics, human vision will focus first on obvious features such as the pattern, color, shape and edge contour of the fabric, which more easily attract people&#x2019;s attention. In deep learning, fabric defect features are described hierarchically: different convolution templates are used to gradually extract more complex visual shapes, and the fabric&#x2019;s texture information can be expressed through combinations of these features. Overall, deep learning methods can automatically and effectively extract features from input images without the need for complex hand-designed features. In recent years, deep learning has been gradually applied in the field of fabric defect detection, using methods such as region-based convolutional neural networks (RCNN) [<xref ref-type="bibr" rid="ref-20">20</xref>], which can achieve good detection results for common fabric defects. However, for multi-scale and complex-shaped defects, these methods usually do not achieve satisfactory detection results, thus resulting in low detection accuracy.</p>
<p>Given the shortcomings of current fabric defect detection methods, this paper proposes a multi-layer feature extraction approach combined with deformable convolution (MFDC) for fabric defect detection. The main contributions of the study are as follows:
<list list-type="order">
<list-item>
<p>This paper proposes a generic fabric defect detection method based on deep learning. By integrating the RoI Align and Cascade-RCNN approaches and progressively increasing the intersection over union (IOU) threshold, the adaptability of the defect detection algorithm to fabrics with complex patterned backgrounds is enhanced, with a clear reduction in close false positives observed.</p></list-item>
<list-item>
<p>Multi-layer feature extraction is applied to improve the detection accuracy of multi-scale fabric defects by combining the semantic features of the upper and lower layers. In addition, visual information from the bottom-layer fabric defect features is used to improve the algorithm&#x2019;s ability to detect small defects.</p></list-item>
<list-item>
<p>Deformable convolution is used to enhance the generalization ability of the algorithm to handle complex-shaped defects and to extract the characteristics of fabric defects more accurately. This approach improves the detection accuracy for fabric defects with complex shapes and extreme aspect ratios.</p></list-item>
</list></p>
</sec>
<sec id="s2">
<label>2</label>
<title>Related Work</title>
<p>Considerable progress has been achieved in fabric defect detection following decades of research. Traditional methods can only identify whether the fabric has defects but cannot accurately determine the location of the fabric defects. Additionally, most traditional fabric defect detection algorithms can only deal with plain fabric without background patterns or large fabric defects. Abouelela et al. [<xref ref-type="bibr" rid="ref-21">21</xref>] used median, mean, variance and other features to detect defects based on texture segmentation; their proposed method can meet the requirements of real-time detection, but it is ineffective for images containing irregular textures. Hu et al. [<xref ref-type="bibr" rid="ref-22">22</xref>] combined wavelet analysis with a Fourier transform for fabric defect detection. The defects identified by the Fourier transform are then denoised by wavelet shrinkage to achieve unsupervised detection. Karlekar et al. [<xref ref-type="bibr" rid="ref-23">23</xref>] proposed a wavelet filtering method by combining morphology and a wavelet transform. This method was used to model fabric texture and fabric defects and detect defects in horizontal, vertical and diagonal lines. Zhu et al. [<xref ref-type="bibr" rid="ref-24">24</xref>] obtained the size of the image detection window through the autocorrelation function and calculated the gray level co-occurrence matrix between the image template and the fabric image; in their approach, the appropriate threshold was specified manually to achieve fabric defect detection. By extracting texture information from the gray level co-occurrence matrix, Thakare et al. [<xref ref-type="bibr" rid="ref-25">25</xref>] proposed an improved gray level co-occurrence matrix detection method that combines texture information with self-organizing mapping as the basis for fabric defect classification.</p>
<p>Traditional defect detection methods have high requirements for the regularity of fabric texture and background; complex fabric textures lead to poor defect detection performance. In recent years, convolutional neural networks (CNNs) have been widely used in defect detection, and the approaches used can be divided into two categories, namely, one-stage algorithms, represented by SSD [<xref ref-type="bibr" rid="ref-26">26</xref>] and YOLO [<xref ref-type="bibr" rid="ref-27">27</xref>], and two-stage algorithms based on candidate regions, represented by Faster-RCNN. Li et al. [<xref ref-type="bibr" rid="ref-28">28</xref>] proposed a focal loss [<xref ref-type="bibr" rid="ref-29">29</xref>] method based on ResNet50 to solve the problem of poor detection performance caused by imbalanced fabric image samples. Although these two types of method are feasible, they also have some limitations, such as requiring extensive computing resources, which can result in slow recognition speed and low recognition accuracy. Zhao [<xref ref-type="bibr" rid="ref-30">30</xref>] proposed an improved non-maximum suppression algorithm, which considers the similarity between defect types in the detection process.</p>
<p>Compared with general defect detection, fabric defect detection has unique characteristics. The size of defects within the same type of fabric can vary markedly. Some defects occupy more than half of the image while others occupy only a few pixels, leading to poor detection performance in two-stage defect detection algorithms such as Faster-RCNN [<xref ref-type="bibr" rid="ref-31">31</xref>]. YOLO [<xref ref-type="bibr" rid="ref-27">27</xref>] is a typical one-stage algorithm with high detection speed but low detection accuracy; therefore, researchers have proposed many improved algorithms based on YOLO, such as using PAN-Net [<xref ref-type="bibr" rid="ref-32">32</xref>] or SPP-net [<xref ref-type="bibr" rid="ref-33">33</xref>] as the network backbone model, using a MISH activation function [<xref ref-type="bibr" rid="ref-34">34</xref>], and adding a K-means [<xref ref-type="bibr" rid="ref-35">35</xref>] clustering algorithm to improve the detection performance. However, despite these refinements, YOLO and other one-stage defect detection algorithms still perform poorly on small-sized defects.</p>
</sec>
<sec id="s3">
<label>3</label>
<title>Multi-Layer Feature Extraction Combined with Deformable Convolution</title>
<p>At present, although researchers have proposed many effective fabric defect detection algorithms, such approaches have only a weak ability to detect multi-scale defects and fabric defects with complex shapes. To solve this problem, in this study, multi-layer feature extraction combined with deformable convolution (MFDC) is proposed for fabric defect detection. First, the fabric image after data enhancement is passed through the ResNet50 backbone network using multi-layer feature extraction technology [<xref ref-type="bibr" rid="ref-36">36</xref>]; the upper and lower layer defect feature semantics are then combined to improve the detection accuracy of multi-scale fabric defects. Second, the deformable convolution module [<xref ref-type="bibr" rid="ref-37">37</xref>] is fused into the network to extract the features of complex-shaped defects effectively, improving the feature extraction ability and detection accuracy for fabric defects with complex shapes and extreme aspect ratios. Finally, RoI Align and Cascade-RCNN [<xref ref-type="bibr" rid="ref-38">38</xref>] are integrated; by adopting a strategy of progressively increasing the IOU threshold, this paper improves the Cascade-RCNN network model, thereby reducing close false positives and improving defect detection and location accuracy. The architecture of the proposed MFDC can be seen in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>The architecture of the proposed MFDC</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="IASC_36897-fig-1.tif"/>
</fig>
<sec id="s3_1">
<label>3.1</label>
<title>Multi-Layer Feature Extraction</title>
<p>Due to improper production equipment, manual operation and other related factors, various defects will be formed during the fabric production process. The size and shape of these fabric defects can differ markedly. For some small defects, the differences in pixel values between the defect and the background area are small; therefore, such defects are often not detected. Traditional CNN-based methods use featurized image pyramid or pyramidal feature hierarchy approaches [<xref ref-type="bibr" rid="ref-39">39</xref>] to solve this problem. The featurized image pyramid approach uses a set of images with different resolutions generated from the same image by repeated hierarchical down-sampling. Features are extracted from each resolution and a prediction is made on each of them. Finally, the prediction results for all the feature sizes are combined. Although this method solves the problem of multi-scale defects through multi-scale feature extraction and can effectively improve the detection of small-sized defects, it also greatly increases memory usage and model computation requirements, thus increasing the difficulty of training the network. The pyramidal feature hierarchy method directly detects fabric defects on feature maps with different resolutions. Although this method does not add much computational overhead, it nonetheless causes two issues. First, the semantic information of the low-level features is insufficient: although small defects can be detected, they are often wrongly classified. Second, the resolution of the high-level feature maps is not sufficient to detect small defects. To solve these two problems, multi-layer feature extraction is applied in this work to improve the detection performance, the details of which are shown in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>.</p>
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>Multi-layer feature extraction networks</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="IASC_36897-fig-2.tif"/>
</fig>
<p>In <xref ref-type="fig" rid="fig-2">Fig. 2</xref>, the size of the input image is 512 &#x00D7; 512. The first part of the multi-layer feature extraction is the backbone of the forward-propagating CNN, which calculates a feature hierarchy with a scaling step size of 2. The size of the feature map is then gradually scaled through convolution and pooling. In this process, the feature maps are arranged from large to small according to their resolution, forming a pyramid structure. Several adjacent layers may output feature maps of the same scale, so we put these feature maps in the same stage, and the last layer of each stage contains the most obvious features. We take the output of this feature layer as part of the feature extraction of fabric defects. The output of each block can be marked as {C1,C2,C3,C4,C5}, in order.</p>
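<p>The stage resolutions implied by a scaling step of 2 can be checked with a short calculation. The following Python sketch is illustrative only: the function name <monospace>stage_sizes</monospace> is hypothetical, and it assumes that each of the five stages halves the spatial resolution exactly once, which is a simplification of the actual backbone.</p>
<code language="python"><![CDATA[
```python
def stage_sizes(input_size, num_stages=5, stride=2):
    """Return the side length of the last feature map of each stage.

    Assumption: every stage reduces the resolution by `stride` exactly once.
    """
    sizes = {}
    size = input_size
    for stage in range(1, num_stages + 1):
        size //= stride
        sizes[f"C{stage}"] = size
    return sizes


# A 512 x 512 input, as used in Fig. 2.
sizes = stage_sizes(512)
print(sizes)
```
]]></code>
<p>Under these assumptions, C4 and C5 have side lengths of 32 and 16, matching the sizes quoted for the semantic fusion structure.</p>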
<p>The second part of the proposed approach is a semantic fusion structure, in which feature maps with both higher resolution and stronger semantics are obtained by combining nearest neighbor upsampling with lateral connections. The top output C5 (<inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:mrow><mml:mtext>size</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mn>16</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>16</mml:mn></mml:math></inline-formula>) on the left in <xref ref-type="fig" rid="fig-2">Fig. 2</xref> is taken through a lateral connection. After the number of channels is adjusted through 1 &#x00D7; 1 convolution, the obtained result is the top level of the semantic fusion structure, labeled as P5 (<inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:mrow><mml:mtext>size</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mn>16</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>16</mml:mn></mml:math></inline-formula>), since a 1 &#x00D7; 1 convolution does not change the spatial size. The output C4 (<inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:mrow><mml:mtext>size</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mn>32</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>32</mml:mn></mml:math></inline-formula>) is selected through lateral connection; P5 is then upsampled by a factor of 2 using nearest neighbor interpolation, and C4 is added to the upsampled result, with the sum marked as P4 (<inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:mrow><mml:mtext>size</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mn>32</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>32</mml:mn></mml:math></inline-formula>). By analogy, P3 and P2 can also be obtained. Since C1 is only obtained by one convolution of the original image and contains almost no semantic information, P1 is not calculated. Since the aliasing effect generated in the upsampling process will affect the subsequent prediction, we apply a 3 &#x00D7; 3 convolution to all the feature maps obtained by upsampling to reduce the influence of the aliasing effect and generate the final feature maps.
We use the top-down path and lateral connections to combine low-resolution, semantically strong defect feature information with high-resolution, semantically weak defect location information. The resulting feature pyramid has rich semantics at all levels, thus improving the detection accuracy of multi-scale fabric defects, especially for small-sized defects.</p>
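<p>The top-down fusion described above can be sketched in a few lines of plain Python. This is a minimal single-channel illustration under stated assumptions, not the implementation used in this work: feature maps are 2-D lists, the 1 &#x00D7; 1 lateral convolution is reduced to a scalar weight applied at every level, the 3 &#x00D7; 3 smoothing convolution is omitted, and all function names are hypothetical.</p>
<code language="python"><![CDATA[
```python
def upsample_nearest_2x(fmap):
    """Nearest neighbor upsampling by a factor of 2 in both directions."""
    out = []
    for row in fmap:
        wide = [value for value in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def lateral(fmap, weight=1.0):
    """Stand-in for the 1 x 1 lateral convolution (assumption: scalar weight)."""
    return [[weight * value for value in row] for row in fmap]


def add_maps(a, b):
    """Element-wise sum of two feature maps of identical size."""
    return [[x + y for x, y in zip(row_a, row_b)] for row_a, row_b in zip(a, b)]


def top_down_fusion(c_maps):
    """c_maps is [C2, C3, C4, C5]; returns [P2, P3, P4, P5]."""
    p = lateral(c_maps[-1])            # P5 comes from C5 alone
    pyramid = [p]
    for c in reversed(c_maps[:-1]):    # C4, then C3, then C2
        p = add_maps(upsample_nearest_2x(p), lateral(c))
        pyramid.insert(0, p)
    return pyramid


def constant_map(size, value):
    return [[value] * size for _ in range(size)]


# Toy inputs: C2..C5 for a 512 x 512 image, filled with 2.0, 3.0, 4.0 and 5.0.
c_maps = [constant_map(s, float(i + 2)) for i, s in enumerate([128, 64, 32, 16])]
p_maps = top_down_fusion(c_maps)
shapes = [(len(p), len(p[0])) for p in p_maps]
print(shapes)
```
]]></code>
<p>Each output level keeps the resolution of its lateral input while accumulating information from every coarser level, which is the property the fusion structure relies on for small defects.</p>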
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Deformable Convolution Module</title>
<p>Generally, the shape of fabric defects varies markedly as a result of different fabric materials and different manufacturing equipment, and fabric defects with complex shapes are commonly classified incorrectly. Traditional algorithms usually employ two strategies to solve this problem. One is to expand the number of defect samples to enhance the model&#x2019;s ability to adapt to the scale transformation of fabric defects; the other is to propose feature-based algorithms for specific defect types. However, both methods have disadvantages. The first approach has low generalization ability due to the limitations of the input fabric defect samples and thus cannot be generalized to general defect detection. The second approach has difficulty in dealing with overly complex fabric defects. The convolution units of traditional defect detection algorithms sample fixed positions of the feature map; however, the relevant positions shift when defects differ in scale or are deformed, so the detection performance of these algorithms is poor. In this paper, in order to accurately locate the defects, a deformable convolution method is applied to adapt the scale and receptive field size, thereby improving the algorithm&#x2019;s ability to model complex defects.</p>
<p>As shown in <xref ref-type="fig" rid="fig-3">Fig. 3</xref>, the deformable convolution [<xref ref-type="bibr" rid="ref-38">38</xref>] adds a displacement to the normal sampling coordinates of fabric defects to make the receptive field more representative of the object&#x2019;s actual shape. In the traditional defect detection algorithm, the output <italic>y</italic>(<italic>p</italic><sub>0</sub>) at each location <italic>p</italic><sub>0</sub> on the output feature map <italic>y</italic> can be expressed as:</p>
<p><disp-formula id="eqn-1">
<label>(1)</label>
<mml:math id="mml-eqn-1" display="block"><mml:mi>y</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mtext>R</mml:mtext></mml:mrow></mml:mrow></mml:munder><mml:mi>w</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x22C5;</mml:mo><mml:mi>x</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>where R is the regular sampling grid of the convolution kernel (e.g., 3 &#x00D7; 3), <italic>w</italic> denotes the kernel weights, <italic>x</italic> is the input feature map, and <italic>p</italic><sub>n</sub> enumerates the positions in R. In deformable convolution, the positions in R are augmented with offsets <inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:mi>&#x03B4;</mml:mi><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, where <italic>n</italic> &#x003D; 1, 2, 3, ..., N and N is the number of positions in R. <xref ref-type="disp-formula" rid="eqn-1">Eq. (1)</xref> can then be written as:
<disp-formula id="eqn-2">
<label>(2)</label>
<mml:math id="mml-eqn-2" display="block"><mml:mi>y</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mtext>R</mml:mtext></mml:mrow></mml:mrow></mml:munder><mml:mi>w</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x22C5;</mml:mo><mml:mi>x</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>&#x03B4;</mml:mi><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Deformable convolution</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="IASC_36897-fig-3.tif"/>
</fig>
<p>This offset converts the fixed-grid convolution into an irregular convolution, and the features are sampled at the irregular, offset positions <inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>&#x03B4;</mml:mi><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>.</p>
<p>Since the offset is usually fractional, whereas the feature map is defined only at integer coordinates, the feature values in <xref ref-type="disp-formula" rid="eqn-2">Eq. (2)</xref> are computed by bilinear interpolation, so that <italic>x</italic>(<italic>p</italic>) can be expressed as:
<disp-formula id="ueqn-3">
<mml:math id="mml-ueqn-3" display="block"><mml:mi>x</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>p</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>q</mml:mi></mml:mrow></mml:munder><mml:mi>G</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>q</mml:mi><mml:mo>,</mml:mo><mml:mi>p</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x22C5;</mml:mo><mml:mi>x</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>q</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>
where the two-dimensional kernel <italic>G</italic> is separable into two one-dimensional kernels:
<disp-formula id="ueqn-4">
<mml:math id="mml-ueqn-4" display="block"><mml:mi>G</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>q</mml:mi><mml:mo>,</mml:mo><mml:mi>p</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>g</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>q</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x22C5;</mml:mo><mml:mi>g</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>q</mml:mi><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>
With <italic>g</italic>(<italic>a</italic>, <italic>b</italic>) &#x003D; max(0, 1 &#x2212; |<italic>a</italic> &#x2212; <italic>b</italic>|), this gives:
<disp-formula id="eqn-3">
<label>(3)</label>
<mml:math id="mml-eqn-3" display="block"><mml:mi>x</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>p</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>q</mml:mi></mml:mrow></mml:munder><mml:mo movablelimits="true" form="prefix">max</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mi>q</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x22C5;</mml:mo><mml:mo movablelimits="true" form="prefix">max</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mi>q</mml:mi><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x22C5;</mml:mo><mml:mi>x</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>q</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:mi>p</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>&#x03B4;</mml:mi><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is an arbitrary, generally fractional, position, <italic>q</italic> enumerates all integer spatial positions in the feature map <italic>x</italic>, and <italic>G</italic> is the two-dimensional bilinear interpolation kernel. Only the four integer positions surrounding <italic>p</italic> have a non-zero weight, so the sum is cheap to evaluate.</p>
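<p>Eqs. (2) and (3) can be illustrated with a short Python sketch that evaluates a deformable 3 &#x00D7; 3 convolution at a single output location. The sketch is an assumption-laden illustration rather than the implementation used in this work: it handles one channel, treats positions outside the feature map as zero, and takes the offsets as given inputs, whereas in a deformable convolution layer the offsets are predicted by an additional convolution. The names <monospace>bilinear_sample</monospace> and <monospace>deform_conv_at</monospace> are hypothetical.</p>
<code language="python"><![CDATA[
```python
import math


def bilinear_sample(x, py, px):
    """Eq. (3): weighted sum over the four integer neighbors of (py, px)."""
    height, width = len(x), len(x[0])
    y0, x0 = math.floor(py), math.floor(px)
    total = 0.0
    for qy in (y0, y0 + 1):
        for qx in (x0, x0 + 1):
            if 0 <= qy < height and 0 <= qx < width:  # zero outside the map
                g = max(0.0, 1 - abs(qy - py)) * max(0.0, 1 - abs(qx - px))
                total += g * x[qy][qx]
    return total


# Regular 3 x 3 sampling grid R, as (row, column) displacements p_n.
GRID = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def deform_conv_at(x, weights, offsets, p0):
    """Eq. (2) at one output location p0 = (row, column).

    weights and offsets are listed in the same order as GRID.
    """
    y = 0.0
    for (dy, dx), w, (oy, ox) in zip(GRID, weights, offsets):
        y += w * bilinear_sample(x, p0[0] + dy + oy, p0[1] + dx + ox)
    return y


# Toy feature map with value 4 * row + column, and an averaging kernel.
feature = [[float(4 * r + c) for c in range(4)] for r in range(4)]
weights = [1.0 / 9] * 9

regular = deform_conv_at(feature, weights, [(0.0, 0.0)] * 9, (1, 1))
shifted = deform_conv_at(feature, weights, [(0.0, 0.5)] * 9, (1, 1))
print(regular, shifted)
```
]]></code>
<p>With all offsets set to zero the function reduces to the regular convolution of Eq. (1); shifting every sampling point by half a pixel to the right moves the response on this linear toy map by 0.5.</p>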
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>The Improved Cascade-RCNN</title>
<p>In CNN-based fabric defect detection algorithms, an IOU threshold is required to define positive and negative samples. Training with a low IOU threshold admits noisy positive samples, whereas training with a high IOU threshold sharply reduces the number of positive training examples, which commonly causes the model to overfit. In addition, detectors trained with a single IOU threshold often do not produce optimal results when tested at other IOU thresholds. The traditional remedy for these two problems is iterative bounding box regression [<xref ref-type="bibr" rid="ref-39">39</xref>], which holds that a single box regression is insufficient to generate accurate positional information; multiple iterations are therefore required to fine-tune the bounding box, as shown in <xref ref-type="disp-formula" rid="eqn-4">Eq. (4)</xref>.
<disp-formula id="eqn-4">
<label>(4)</label>
<mml:math id="mml-eqn-4" display="block"><mml:msup><mml:mi>f</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>b</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>f</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mi>f</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mi>f</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mo>&#x22C5;</mml:mo><mml:mo>&#x22C5;</mml:mo><mml:mi>f</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>b</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula></p>
<p>After each iteration, the distribution of the bounding boxes changes to a certain extent; however, the algorithm&#x2019;s classifier is trained on the initial bounding boxes, so a single IOU threshold generates more outliers. A single regressor cannot achieve good results at all IOU thresholds; proposals of differing quality should therefore be handled by detection branches trained under different IOU thresholds, which can achieve superior detection results. To solve this problem, this paper proposes an improved Cascade-RCNN approach, as shown in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>.</p>
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>An improved Cascade-RCNN. Labels C1, C2 and C3 are categories, and B1, B2 and B3 are BBoxes</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="IASC_36897-fig-4.tif"/>
</fig>
<p>In <xref ref-type="fig" rid="fig-4">Fig. 4</xref>, the regression is divided into three stages with three regressors (regressor1, regressor2 and regressor3). As shown in <xref ref-type="disp-formula" rid="eqn-5">Eq. (5)</xref>, the three detectors are trained with gradually increasing IOU thresholds, with each stage corresponding to a different IOU threshold. This helps to eliminate outliers and close false positives and to adapt to the new proposal distribution. The output of each detector is used as the input for the next, higher-quality detector. In this way, we can ensure that there are sufficient positive samples on each branch to reduce overfitting. The regression function of the Cascade-RCNN can be written as:
<disp-formula id="eqn-5">
<label>(5)</label>
<mml:math id="mml-eqn-5" display="block"><mml:msup><mml:mi>f</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>b</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msub><mml:mo>&#x22C5;</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>T</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x22C5;</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>T</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>&#x22EF;</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>b</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>In <xref ref-type="disp-formula" rid="eqn-5">Eq. (5)</xref>, <italic>T</italic> is the number of cascade stages and b is the data distribution of the corresponding stage. Each branch <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is optimized on the training data <inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:msup><mml:mi>b</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> that reaches it. Cascaded regression is thus a resampling procedure that changes the distribution of hypotheses to be processed by the different stages. The multiple specialized regressors <inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:mo fence="false" stretchy="false">{</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x22EF;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo fence="false" stretchy="false">}</mml:mo></mml:math></inline-formula> are optimized for the resampled distributions of the different stages. In contrast to the single regressor f of iterative bounding box regression, this specialization enables more precise localization with no further human engineering. The loss function can be expressed as:
<disp-formula id="eqn-6">
<label>(6)</label>
<mml:math id="mml-eqn-6" display="block"><mml:mrow><mml:mtext>L</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:mrow><mml:mtext>g</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>c</mml:mi><mml:mi>l</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:msup><mml:mi>y</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mi>&#x03B2;</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:msup><mml:mi>y</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msup><mml:mo>&#x2265;</mml:mo><mml:mn>1</mml:mn><mml:mo>]</mml:mo></mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mi>b</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mi>g</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>where <italic>b</italic><sup><italic>t</italic></sup> is derived from the output of <italic>b</italic><sup>1</sup> after all branch operations and can be expressed as <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:msup><mml:mi>b</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo 
stretchy="false">(</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mi>b</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>, g is the ground truth value of data <inline-formula id="ieqn-12"><mml:math id="mml-ieqn-12"><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, <italic>&#x03B2;</italic> &#x003D; 1 is the trade-off coefficient, [.] is the indicator function, and <inline-formula id="ieqn-13"><mml:math id="mml-ieqn-13"><mml:msup><mml:mi>y</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is the label of data <inline-formula id="ieqn-14"><mml:math id="mml-ieqn-14"><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> at stage <italic>t</italic> for a given threshold. 
At each stage <inline-formula id="ieqn-15"><mml:math id="mml-ieqn-15"><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:math></inline-formula>, the R-CNN includes a classifier <inline-formula id="ieqn-16"><mml:math id="mml-ieqn-16"><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and a regressor <inline-formula id="ieqn-17"><mml:math id="mml-ieqn-17"><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> optimized for IOU threshold <inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, where <inline-formula id="ieqn-19"><mml:math id="mml-ieqn-19"><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>&#x003E;</mml:mo><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>. This guarantees a sequence of effectively trained detectors of increasing quality. By repeating the same cascade procedure, the quality of the hypotheses is improved sequentially, so higher-quality detectors only need to operate on higher-quality hypotheses, which enables high-quality object detection.</p>
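<p>The cascade described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: <monospace>toy_regressor</monospace> is a hypothetical stand-in for a learned regressor and is handed the ground-truth box directly, which a trained regressor is not; the box format (x1, y1, x2, y2) and the example coordinates are also assumptions. The thresholds 0.5, 0.6 and 0.7 are the stage thresholds reported in the experiments.</p>
<code language="python"><![CDATA[
```python
def iou(a, b):
    """IOU of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def toy_regressor(step):
    """Hypothetical stand-in for a learned f_t.

    It moves every box coordinate a fraction `step` of the way towards
    the target box; a real regressor predicts this shift from features.
    """
    def f(box, target):
        return tuple(c + step * (t - c) for c, t in zip(box, target))
    return f


def run_cascade(box, gt, regressors, thresholds):
    """Apply f_1, ..., f_T in order and record the label y^t at each stage."""
    history = []
    for f_t, u_t in zip(regressors, thresholds):
        overlap = iou(box, gt)
        label = 1 if overlap >= u_t else 0  # positive only if IOU >= u_t
        history.append((u_t, overlap, label))
        box = f_t(box, gt)                  # output feeds the next stage
    return box, history


ground_truth = (10.0, 10.0, 50.0, 50.0)
proposal = (16.0, 16.0, 56.0, 56.0)
stages = [toy_regressor(0.5)] * 3
refined, stage_log = run_cascade(proposal, ground_truth, stages, [0.5, 0.6, 0.7])
```
]]></code>
<p>In this toy run the proposal stays positive at every stage because each stage receives a box that the previous stage has already refined, which mirrors the argument that raising the threshold stage by stage does not starve later stages of positive samples.</p>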
<p>In Cascade-RCNN, ROI Pooling pools the region of the overall feature map indicated by the position coordinates of the preselected box into a fixed-size feature map, on which the subsequent classification and regression operations are performed. The position coordinates of the preselected box obtained by regression are usually floating point numbers, whereas the pooled feature map requires a fixed size. ROI Pooling therefore involves two quantization steps: the first quantizes the boundaries of the candidate box to integer coordinates, and the second divides the quantized region evenly into M &#x00D7; M units and quantizes the boundaries of each unit. After these two quantization steps, the candidate box deviates from the position originally obtained by regression. When the fabric defect is small, this deviation lowers the detection accuracy. To address this problem, ROI Align is used in place of ROI Pooling, and bilinear interpolation is used to obtain pixel values at floating point coordinates. The candidate region is divided into M &#x00D7; M units without quantizing either the floating point boundaries or the unit coordinates. Four sampling positions are fixed in each unit; the values at these four positions are calculated by bilinear interpolation, and the maximum pooling operation is then performed. The backpropagation for ROI Pooling can be expressed as:
<disp-formula id="eqn-7">
<label>(7)</label>
<mml:math id="mml-eqn-7" display="block"><mml:mfrac><mml:mrow><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:mo>=</mml:mo><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:munder><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:munder><mml:mo stretchy="false">[</mml:mo><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:msup><mml:mi>i</mml:mi><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup><mml:mo stretchy="false">(</mml:mo><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo stretchy="false">]</mml:mo><mml:mfrac><mml:mrow><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:math></disp-formula>where <inline-formula id="ieqn-20"><mml:math id="mml-ieqn-20"><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the pixel on the feature map before the pooling operation, <inline-formula id="ieqn-21"><mml:math id="mml-ieqn-21"><mml:msup><mml:mi>i</mml:mi><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup><mml:mo stretchy="false">(</mml:mo><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> is the index of the input pixel from which <inline-formula id="ieqn-22"><mml:math id="mml-ieqn-22"><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> takes its value, and <inline-formula id="ieqn-23"><mml:math id="mml-ieqn-23"><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the <inline-formula id="ieqn-24"><mml:math id="mml-ieqn-24"><mml:msup><mml:mi>j</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> pixel of the <inline-formula id="ieqn-25"><mml:math id="mml-ieqn-25"><mml:msup><mml:mi>r</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> region after the pooling operation. The backpropagation for ROI Align can be written as:
<disp-formula id="eqn-8">
<label>(8)</label>
<mml:math id="mml-eqn-8" display="block"><mml:mfrac><mml:mrow><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:mo>=</mml:mo><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:munder><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:mo>[</mml:mo><mml:mi>d</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:msup><mml:mi>i</mml:mi><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x003C;</mml:mo><mml:mn>1</mml:mn><mml:mo>]</mml:mo></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x0394;</mml:mi></mml:mrow><mml:mi>h</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x0394;</mml:mi></mml:mrow><mml:mi>w</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mfrac><mml:mrow><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>r</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:math></disp-formula>where <italic>d</italic>(.) 
represents the distance between two points, and <inline-formula id="ieqn-26"><mml:math id="mml-ieqn-26"><mml:mrow><mml:mi mathvariant="normal">&#x0394;</mml:mi></mml:mrow><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-27"><mml:math id="mml-ieqn-27"><mml:mrow><mml:mi mathvariant="normal">&#x0394;</mml:mi></mml:mrow><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:math></inline-formula> are the coordinate differences between <inline-formula id="ieqn-28"><mml:math id="mml-ieqn-28"><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-29"><mml:math id="mml-ieqn-29"><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mrow><mml:msup><mml:mi>i</mml:mi><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>r</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> in the vertical and horizontal directions, respectively. ROI Align thus removes the misalignment caused by the two quantization steps of ROI Pooling and improves the detection accuracy for small fabric defects.</p>
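<p>The misalignment discussed above can be made concrete with a one-dimensional sketch that compares the unit boundaries produced by the two schemes. The rounding rule used for ROI Pooling here (round the box boundaries to the nearest integer, then floor and ceil each unit edge) is one common implementation choice and is an assumption, as are the function names and the example coordinates.</p>
<code language="python"><![CDATA[
```python
import math


def roi_pool_bins(start, end, m):
    """Unit edges along one axis with the two quantization steps of ROI Pooling."""
    start_q, end_q = round(start), round(end)     # first quantization: box boundaries
    size = max(end_q - start_q, 1) / m
    return [(math.floor(start_q + k * size),      # second quantization: unit edges
             math.ceil(start_q + (k + 1) * size))
            for k in range(m)]


def roi_align_bins(start, end, m):
    """Unit edges along one axis for ROI Align: nothing is quantized."""
    size = (end - start) / m
    return [(start + k * size, start + (k + 1) * size) for k in range(m)]


def boundary_shift(start, end, m):
    """Distance by which ROI Pooling moves the outer box boundaries."""
    pooled = roi_pool_bins(start, end, m)
    return abs(pooled[0][0] - start), abs(pooled[-1][1] - end)


pooled_units = roi_pool_bins(2.3, 9.6, 2)    # integer edges
aligned_units = roi_align_bins(2.3, 9.6, 2)  # floating point edges
shift = boundary_shift(2.3, 9.6, 2)
```
]]></code>
<p>For a candidate box spanning 2.3 to 9.6 on the feature map, the pooled region is shifted by a fraction of a pixel at both ends, which is a large relative error when the defect itself covers only a few feature-map pixels. ROI Align keeps the original boundaries and reads the values at the resulting fractional positions by bilinear interpolation.</p>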
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Experiment</title>
<sec id="s4_1">
<label>4.1</label>
<title>Experimental Datasets</title>
<p>To verify the effectiveness of the proposed MFDC method, this paper uses two public datasets, namely the Ali Tianchi dataset (2019 Ali Tianchi Guangdong Industrial Intelligent Manufacturing Innovation Competition dataset) and ZJU-Leaper [<xref ref-type="bibr" rid="ref-40">40</xref>], for experiments. <xref ref-type="fig" rid="fig-5">Figs. 5</xref> and <xref ref-type="fig" rid="fig-6">6</xref> show the detailed statistics of the ZJU-Leaper and Ali Tianchi datasets, respectively.</p>
<fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>Data analysis (ZJU-Leaper dataset)</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="IASC_36897-fig-5a.tif"/><graphic mimetype="image" mime-subtype="tif" xlink:href="IASC_36897-fig-5b.tif"/>
</fig><fig id="fig-6">
<label>Figure 6</label>
<caption>
<title>Data analysis (Ali Tianchi dataset)</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="IASC_36897-fig-6a.tif"/><graphic mimetype="image" mime-subtype="tif" xlink:href="IASC_36897-fig-6b.tif"/>
</fig>
<p>ZJU-Leaper: Group4 of the ZJU-Leaper dataset released by Zhejiang University was used in the analysis. Group4 consists of three kinds of fabric materials, comprising 3721 defective images and 14,884 defect-free images with an image resolution of 512 &#x00D7; 512. In our experiment, all data were randomly divided into a training set and a test set in a 4:1 ratio. <xref ref-type="fig" rid="fig-5">Fig. 5</xref> shows the details of the ZJU-Leaper training set. <xref ref-type="fig" rid="fig-5">Fig. 5a</xref> shows the length&#x2013;width ratio of fabric defects, and <xref ref-type="fig" rid="fig-5">Fig. 5b</xref> shows the distribution of fabric defect area. Defects are grouped by size following the standard COCO convention, i.e., Small (s &#x003C; 32), Medium (32 &#x2264; s &#x003C; 96), and Large (s &#x2265; 96). The fabric defect scale s, the square root of the defect area, can be expressed as:
<disp-formula id="eqn-9">
<label>(9)</label>
<mml:math id="mml-eqn-9" display="block"><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:msqrt><mml:mi>w</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>h</mml:mi></mml:msqrt></mml:math></disp-formula>where <italic>w</italic> and <italic>h</italic> represent the width and the height of the fabric defect, respectively. <xref ref-type="fig" rid="fig-5">Fig. 5c</xref> shows the number of defects in the fabric images, and <xref ref-type="fig" rid="fig-5">Fig. 5d</xref> is the height and width scatter plot of the fabric defect.</p>
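<p>As a small worked example of Eq. (9) and the size grouping above, the following Python sketch assigns a defect to a size group from its width and height in pixels. The function names are chosen for this illustration only.</p>
<code language="python"><![CDATA[
```python
import math


def defect_scale(w, h):
    """s = sqrt(w * h), the square root of the defect's bounding-box area."""
    return math.sqrt(w * h)


def size_category(w, h):
    """Group a defect as Small (s < 32), Medium (32 <= s < 96) or Large (s >= 96)."""
    s = defect_scale(w, h)
    if s < 32:
        return "Small"
    if s < 96:
        return "Medium"
    return "Large"


thin_defect = size_category(10, 100)  # a 10 x 100 defect has s of about 31.6
```
]]></code>
<p>The example shows why elongated defects are hard: a defect 100 pixels long but only 10 pixels wide still falls into the Small group.</p>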
<p>Ali Tianchi: This dataset was released by Alibaba. In contrast to the ZJU-Leaper dataset, it provides numerous high-resolution images with a resolution of 4096 &#x00D7; 1800. In addition, it contains 15 fabric defect categories with more complex fabric background colors. In this paper, 3107 defective fabric images and 6000 non-defective fabric images were used, which were randomly divided into a training set and a testing set in a 4:1 ratio. The dataset statistics are shown in <xref ref-type="fig" rid="fig-6">Fig. 6</xref>.</p>
<p><xref ref-type="fig" rid="fig-6">Fig. 6a</xref> shows the length&#x2013;width ratio of fabric defects, where the horizontal axis represents the length&#x2013;width ratio, and the vertical axis shows the number of images. As shown in <xref ref-type="fig" rid="fig-6">Fig. 6</xref>, there are numerous fabric defects with length&#x2013;width ratios greater than 10, which makes fabric defect detection highly challenging.</p>
<p><xref ref-type="fig" rid="fig-6">Fig. 6b</xref> shows the area of the fabric defects, which is similar to that of the ZJU-Leaper dataset. <xref ref-type="fig" rid="fig-6">Fig. 6c</xref> shows the number of fabric defects per image, with up to 20 defects present in a single image. <xref ref-type="fig" rid="fig-6">Fig. 6d</xref> shows the number of defects in each of the 15 fabric defect categories, <xref ref-type="fig" rid="fig-6">Fig. 6e</xref> shows the number of images containing each fabric defect type, and <xref ref-type="fig" rid="fig-6">Fig. 6f</xref> shows a pie chart of the proportions of the various fabric defect types, illustrating that the proportions of the different fabric defects vary greatly.</p>
</sec>
<sec id="s4_2">
<label>4.2</label>
<title>Experiments and Results</title>
<p>In the proposed MFDC model, ResNet50 is used as the backbone, and parameters pre-trained on ImageNet are used for model initialization. The input images are randomly flipped and rotated for data augmentation. The learning rate per image was set to 0.00125, the IOU thresholds were set to 0.5, 0.6, and 0.7, and the momentum and weight decay factors of the optimizer were set to 0.9 and 0.0001, respectively. We warm up the learning rate at the start of training, which helps to slow down overfitting in the initial stages and keep the distribution stable. To accelerate the convergence of MFDC, the network is initialized from the pre-trained parameters and trained for a total of 20 epochs.</p>
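<p>The gradual start-up of the learning rate described above (commonly called warm-up) can be sketched as a linear ramp. Only the target value 0.00125 comes from the text; the ramp length <monospace>warmup_steps</monospace>, the starting fraction <monospace>start_factor</monospace> and the linear shape are hypothetical values chosen for this illustration, since the paper does not report them.</p>
<code language="python"><![CDATA[
```python
def warmup_lr(step, base_lr=0.00125, warmup_steps=500, start_factor=0.001):
    """Learning rate at a given training step with linear warm-up.

    warmup_steps and start_factor are illustrative assumptions, not
    values reported for MFDC.
    """
    if step >= warmup_steps:
        return base_lr
    alpha = step / warmup_steps  # progress through the ramp, from 0 to 1
    return base_lr * (start_factor + (1.0 - start_factor) * alpha)


schedule = [warmup_lr(s) for s in (0, 250, 500, 1000)]
```
]]></code>
<p>The rate starts at a small fraction of its target and reaches 0.00125 once the ramp ends, after which it stays constant in this sketch.</p>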
<p>The IOU threshold is usually used in evaluating the performance of object detection models. When the overlap between the prediction box and the real box is greater than the IOU threshold value, the corresponding samples are called positive samples; otherwise, the samples are called negative samples. IOU can be written as:
<disp-formula id="eqn-10">
<label>(10)</label>
<mml:math id="mml-eqn-10" display="block"><mml:mrow><mml:mtext>IOU</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>D</mml:mi><mml:mi>e</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>s</mml:mi><mml:mi>u</mml:mi><mml:mi>l</mml:mi><mml:mi>t</mml:mi><mml:mo>&#x2229;</mml:mo><mml:mi>G</mml:mi><mml:mi>r</mml:mi><mml:mi>o</mml:mi><mml:mi>u</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mi>T</mml:mi><mml:mi>r</mml:mi><mml:mi>u</mml:mi><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>D</mml:mi><mml:mi>e</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>s</mml:mi><mml:mi>u</mml:mi><mml:mi>l</mml:mi><mml:mi>t</mml:mi><mml:mo>&#x222A;</mml:mo><mml:mi>G</mml:mi><mml:mi>r</mml:mi><mml:mi>o</mml:mi><mml:mi>u</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mi>T</mml:mi><mml:mi>r</mml:mi><mml:mi>u</mml:mi><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
<p>Since the two open datasets used in the experiment include many fabric defect types, the use of visual object detection boxes alone cannot fully and objectively reflect the benefits of the MFDC model. Therefore, the common evaluation criteria of the COCO standard format data are introduced as evaluation indicators, including Precision, Recall, Accuracy (ACC) and mAP.
<disp-formula id="eqn-11">
<label>(11)</label>
<mml:math id="mml-eqn-11" display="block"><mml:mrow><mml:mtext>Precision</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mtext>all detections</mml:mtext></mml:mrow></mml:mfrac><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>P</mml:mi></mml:mrow></mml:mfrac></mml:math></disp-formula>
<disp-formula id="eqn-12">
<label>(12)</label>
<mml:math id="mml-eqn-12" display="block"><mml:mrow><mml:mtext>Recall</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mtext>all ground truths</mml:mtext></mml:mrow></mml:mfrac><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:mfrac></mml:math></disp-formula>
<disp-formula id="eqn-13">
<label>(13)</label>
<mml:math id="mml-eqn-13" display="block"><mml:mrow><mml:mtext>ACC</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>T</mml:mi><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>T</mml:mi><mml:mi>N</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:mfrac></mml:math></disp-formula>where <italic>FP</italic> is the number of false positive samples, <italic>FN</italic> is the number of false negative samples, <italic>TP</italic> is the number of true positive samples, and <italic>TN</italic> is the number of true negative samples. In the fabric defect detection algorithm, these indexes cannot independently evaluate the detection performance; therefore, we introduce the average precision (AP) index, which can be written as:
<disp-formula id="eqn-14">
<label>(14)</label>
<mml:math id="mml-eqn-14" display="block"><mml:mrow><mml:mtext>AP</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:msubsup><mml:mo>&#x222B;</mml:mo><mml:mrow><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>R</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>d</mml:mi><mml:mi>R</mml:mi></mml:math></disp-formula>where <italic>R</italic> stands for recall, <italic>P</italic> stands for precision, and <italic>AP</italic> is the area under the precision-recall curve <italic>P</italic>(<italic>R</italic>). To verify the detection performance of the proposed MFDC more comprehensively, the mean of <inline-formula id="ieqn-30"><mml:math id="mml-ieqn-30"><mml:mrow><mml:mtext>AP</mml:mtext></mml:mrow></mml:math></inline-formula> over all defect categories (mAP) is also used as an evaluation index.
<disp-formula id="eqn-15">
<label>(15)</label>
<mml:math id="mml-eqn-15" display="block"><mml:mrow><mml:mtext>mAP</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>N</mml:mi></mml:mfrac><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover><mml:mi>A</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:math></disp-formula></p>
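<p>The evaluation indices in Eqs. (11) to (15) can be sketched in Python as follows. The integral in Eq. (14) is approximated here by a simple rectangle rule over the ranked detections; this choice, the function names and the toy detections are assumptions for illustration, and the official COCO evaluation uses an interpolated precision-recall curve instead.</p>
<code language="python"><![CDATA[
```python
def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0


def accuracy(tp, tn, fp, fn):
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def average_precision(detections, num_ground_truth):
    """Approximate AP as the sum of P * (change in R) over ranked detections.

    detections: list of (confidence, is_true_positive) pairs for one category.
    """
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    tp = fp = 0
    ap = 0.0
    previous_recall = 0.0
    for _, is_true_positive in ranked:
        if is_true_positive:
            tp += 1
        else:
            fp += 1
        current_recall = tp / num_ground_truth
        ap += precision(tp, fp) * (current_recall - previous_recall)
        previous_recall = current_recall
    return ap


def mean_average_precision(ap_values):
    """mAP: the mean of the per-category AP values."""
    return sum(ap_values) / len(ap_values)


toy_detections = [(0.9, True), (0.8, False), (0.7, True)]
toy_ap = average_precision(toy_detections, num_ground_truth=2)
```
]]></code>
<p>In the toy example the first detection is correct (precision 1 at recall 0.5) and the third recovers the remaining ground truth (precision 2/3 at recall 1), so the false positive ranked between them lowers AP without changing recall.</p>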
<p>To demonstrate the superiority of the proposed MFDC algorithm, this paper uses two common evaluation indicators (mAP and ACC) to assess its performance. The proposed MFDC is compared with two advanced detection algorithms (Faster-RCNN and Cascade-RCNN). <xref ref-type="fig" rid="fig-7">Fig. 7</xref> shows the training process of the three algorithms on the Ali Tianchi dataset. In <xref ref-type="fig" rid="fig-7">Fig. 7a</xref>, the vertical axis shows the mAP value, which ranges from 0 to 1, and the horizontal axis shows the training progress over the 20 training epochs. As shown in <xref ref-type="fig" rid="fig-7">Fig. 7</xref>, the average mAP of MFDC is 0.52, which is considerably higher than those of both Faster-RCNN and Cascade-RCNN. A similar conclusion can be drawn from <xref ref-type="fig" rid="fig-8">Fig. 8a</xref>.</p>
<fig id="fig-7">
<label>Figure 7</label>
<caption>
<title>Comparison of experimental results of Ali Tianchi</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="IASC_36897-fig-7.tif"/>
</fig><fig id="fig-8">
<label>Figure 8</label>
<caption>
<title>Comparison of experimental results of ZJU-Leaper</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="IASC_36897-fig-8.tif"/>
</fig>
<p>The ACC values range from 0 to 100. In <xref ref-type="fig" rid="fig-7">Fig. 7b</xref>, the vertical axis shows the ACC value and the horizontal axis shows the number of iterations in the training phase. In <xref ref-type="fig" rid="fig-7">Fig. 7</xref>, the average ACC of the MFDC algorithm is 97; this algorithm thus achieves higher ACC values than those of Faster-RCNN and Cascade-RCNN. Similar conclusions can be obtained from <xref ref-type="fig" rid="fig-8">Fig. 8</xref>. As shown in <xref ref-type="fig" rid="fig-7">Figs. 7</xref> and <xref ref-type="fig" rid="fig-8">8</xref>, MFDC has a stronger defect detection ability.</p>
</sec>
<sec id="s4_3">
<label>4.3</label>
<title>Ablation Experiment</title>
<p>To further demonstrate the effectiveness of the proposed MFDC algorithm, a series of ablation experiments were conducted on its three modules, namely multi-layer feature extraction, the deformable convolution module, and the improved Cascade-RCNN. Each reported value is the mean of five results. The ablation experiments were evaluated on the Ali Tianchi and ZJU-Leaper datasets.</p>
<p>As shown in <xref ref-type="table" rid="table-1">Table 1</xref>, on the Ali Tianchi dataset, we use Cascade-RCNN as a benchmark for comparison. For the Cascade-RCNN&#x002B;A approach, multi-layer feature extraction improves the detection capability of multi-scale fabric defects by combining the semantic features of the upper and lower layers, especially in terms of the detection capability of small defects. The mAP(S) and mAP indices are improved by 0.111 and 0.061, respectively. For Cascade-RCNN&#x002B;A&#x002B;B, deformable convolution is applied to enhance the generalization ability when dealing with complex shape defects, and the mAP is increased by 0.033. For MFDC, the improved Cascade-RCNN is added to enhance the adaptability of the defect detection algorithm in complex pattern backgrounds. In this instance, the mAP(S) and mAP indices are improved by 0.036 and 0.024, respectively. As shown in <xref ref-type="table" rid="table-1">Table 1</xref>, similar conclusions can be drawn for the ZJU-Leaper dataset.</p>
<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>Comparison of test results of the three algorithms and the ablation variants</title>
</caption>
<table frame="hsides">
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th></th>
<th align="center" colspan="4">Ali Tianchi dataset</th>
<th align="center" colspan="4">ZJU-Leaper dataset</th>
</tr>
<tr>
<td></td>
<td>ACC</td>
<td>mAP(S)</td>
<td>mAP</td>
<td>FPS</td>
<td>ACC</td>
<td>mAP(S)</td>
<td>mAP</td>
<td>FPS</td>
</tr>
</thead>
<tbody>
<tr>
<td>Faster-RCNN</td>
<td>85.8</td>
<td>0.059</td>
<td>0.351</td>
<td><bold>9.08</bold></td>
<td>85.7</td>
<td>0.084</td>
<td>0.661</td>
<td><bold>30.3</bold></td>
</tr>
<tr>
<td>Cascade-RCNN</td>
<td>87.7</td>
<td>0.073</td>
<td>0.398</td>
<td>8.29</td>
<td>87.9</td>
<td>0.097</td>
<td>0.698</td>
<td>26.9</td>
</tr>
<tr>
<td>Cascade-RCNN&#x002B;A</td>
<td>92.4</td>
<td>0.184</td>
<td>0.459</td>
<td>8.12</td>
<td>91.7</td>
<td>0.271</td>
<td>0.729</td>
<td>26.53</td>
</tr>
<tr>
<td>Cascade-RCNN&#x002B;A&#x002B;B</td>
<td>95.1</td>
<td>0.196</td>
<td>0.492</td>
<td>7.90</td>
<td>94.6</td>
<td>0.288</td>
<td>0.761</td>
<td>25.76</td>
</tr>
<tr>
<td>A&#x002B;B&#x002B;C (MFDC)</td>
<td><bold>96.7</bold></td>
<td><bold>0.232</bold></td>
<td><bold>0.516</bold></td>
<td>7.81</td>
<td><bold>96.8</bold></td>
<td><bold>0.352</bold></td>
<td><bold>0.768</bold></td>
<td>25.39</td>
</tr>
</tbody>
</table>
<table-wrap-foot><fn>
<p>Note: A: Multi-layer feature extraction; B: Deformable convolution module; C: Improved Cascade-RCNN. mAP(S): mAP value of fabric defects with areas less than 32 <inline-formula id="ieqn-31"><mml:math id="mml-ieqn-31"><mml:mo>&#x00D7;</mml:mo></mml:math></inline-formula> 32.</p>
</fn>
</table-wrap-foot>
</table-wrap>
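<p>The note to <xref ref-type="table" rid="table-1">Table 1</xref> defines mAP(S) over defects with areas below 32 &#x00D7; 32. A minimal sketch of how an annotation could be assigned to this small-defect subset is given below. It assumes that boxes are stored as (x_min, y_min, x_max, y_max) in pixels and that the area is taken from the bounding box; both are assumptions of this example, since the exact area definition used in the evaluation is not stated here. The function names are hypothetical.</p>
<code language="python"><![CDATA[
```python
SMALL_AREA_LIMIT = 32 * 32  # square pixels, threshold taken from the Table 1 note


def box_area(box):
    """Area of an axis-aligned box given as (x_min, y_min, x_max, y_max)."""
    x_min, y_min, x_max, y_max = box
    # Degenerate or inverted boxes are treated as having zero area.
    return max(0.0, x_max - x_min) * max(0.0, y_max - y_min)


def is_small_defect(box, limit=SMALL_AREA_LIMIT):
    """True when the box area is strictly below the small-defect limit."""
    return box_area(box) < limit


if __name__ == "__main__":
    print(is_small_defect((10, 10, 30, 40)))   # 20 x 30 box
    print(is_small_defect((0, 0, 64, 64)))     # 64 x 64 box
```
]]></code>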
<p><xref ref-type="table" rid="table-1">Table 1</xref> indicates that the proposed MFDC achieves higher mAP and ACC values than Faster-RCNN and Cascade-RCNN on both datasets. In terms of running speed, on the Ali Tianchi dataset MFDC achieves 7.81 FPS, which is 0.48 FPS and 1.27 FPS slower than Cascade-RCNN and Faster-RCNN, respectively. On the ZJU-Leaper dataset, MFDC achieves 25.39 FPS, which is 1.51 FPS and 4.91 FPS slower than Cascade-RCNN and Faster-RCNN, respectively. Comparing the evaluation indices on the training and test sets of the two public datasets shows that the MFDC algorithm greatly improves mAP and ACC at the cost of a small reduction in detection speed. Therefore, the MFDC algorithm has clear advantages in terms of its average fabric detection accuracy and its ability to correctly identify whether fabric images contain defects.</p>
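<p>To express the speed cost in relative terms, the following sketch converts the FPS values as listed in <xref ref-type="table" rid="table-1">Table 1</xref> into an absolute and a percentage slowdown of MFDC against each baseline. The dictionary layout and the helper name <monospace>slowdown</monospace> are assumptions made for illustration.</p>
<code language="python"><![CDATA[
```python
# FPS values as listed in Table 1.
FPS = {
    "Ali Tianchi": {"Faster-RCNN": 9.08, "Cascade-RCNN": 8.29, "MFDC": 7.81},
    "ZJU-Leaper": {"Faster-RCNN": 30.3, "Cascade-RCNN": 26.9, "MFDC": 25.39},
}


def slowdown(baseline_fps, method_fps):
    """Return (absolute FPS drop, drop as a percentage of the baseline FPS)."""
    drop = baseline_fps - method_fps
    return round(drop, 2), round(100.0 * drop / baseline_fps, 1)


if __name__ == "__main__":
    for dataset, fps in FPS.items():
        for baseline in ("Cascade-RCNN", "Faster-RCNN"):
            drop, pct = slowdown(fps[baseline], fps["MFDC"])
            print(f"{dataset}: MFDC is {drop} FPS ({pct}%) slower than {baseline}")
```
]]></code>
<p>With these table values, the slowdown relative to Cascade-RCNN stays below 6% on both datasets.</p>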

<p>Some typical visual results of fabric defect detection by the different methods are shown in <xref ref-type="fig" rid="fig-9">Fig. 9</xref>. Each detection result consists of two parts: the fabric defect type and the algorithm&#x2019;s confidence, expressed as a value ranging from 0 to 1. For ease of comparison, parts of <xref ref-type="fig" rid="fig-9">Fig. 9</xref> have been locally enlarged.</p>
<fig id="fig-9">
<label>Figure 9</label>
<caption>
<title>Visual comparison of experimental results</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="IASC_36897-fig-9a.tif"/><graphic mimetype="image" mime-subtype="tif" xlink:href="IASC_36897-fig-9b.tif"/>
</fig>
<p><xref ref-type="fig" rid="fig-9">Fig. 9a</xref> shows a fabric image with a complex background. The results show that all the algorithms can detect large-sized defects. The Faster-RCNN algorithm fails to detect the small defect due to the high degree of integration between the defect and the fabric background. Although Cascade-RCNN can detect the small defect, its confidence level is 0.47, indicating that Cascade-RCNN has a relatively weak learning ability for such defects. The proposed MFDC algorithm can identify small defects with a much higher confidence level of 0.9; therefore, MFDC has a stronger ability to detect fabric defects against complex backgrounds.</p>
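<p>Because each detection carries a defect type and a confidence value, the practical effect of the confidence gap can be illustrated with a simple score filter. The sketch below is hypothetical: the <monospace>Detection</monospace> record, the label string, and the 0.5 cut-off are assumptions of this example and are not taken from the experiments. Only the two confidence values (0.47 and 0.9) come from the text above.</p>
<code language="python"><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    label: str    # predicted defect type
    score: float  # confidence in [0, 1]


def keep_confident(detections, threshold=0.5):
    """Keep detections whose confidence reaches the threshold (0.5 is an assumed cut-off)."""
    return [d for d in detections if d.score >= threshold]


if __name__ == "__main__":
    # Confidence values for the small defect in Fig. 9a as quoted in the text;
    # the label string is a placeholder.
    cascade = [Detection("small defect", 0.47)]
    mfdc = [Detection("small defect", 0.90)]
    print(len(keep_confident(cascade)), len(keep_confident(mfdc)))
```
]]></code>
<p>Under such a cut-off, a detection reported with a confidence of 0.47 would be discarded, whereas one reported with 0.9 would be retained.</p>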

<p>In <xref ref-type="fig" rid="fig-9">Fig. 9b</xref>, the defects are similar to the background of the fabric itself in both color and pattern. In this instance, Faster-RCNN identifies only one defect. Cascade-RCNN detects the leftmost and rightmost fabric defects, but its detection frames overlap severely, the pattern in the middle of the fabric image is wrongly detected as a defect, and the two fabric defects on the right of the image are missed. MFDC not only accurately detects the two defects on the right side that are similar to the background color but also accurately identifies the defects on the fabric pattern, without obvious overlapping of detection frames. This outcome indicates that MFDC handles close false positives better and can additionally detect fabric defects that are difficult to recognize with the naked eye.</p>
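<p>The degree of overlap between two detection frames is commonly quantified by the intersection over union (IoU). A minimal sketch of the standard IoU computation for axis-aligned boxes follows. The (x_min, y_min, x_max, y_max) box format is an assumption of this example, and the code is not taken from the MFDC implementation.</p>
<code language="python"><![CDATA[
```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x_min, y_min, x_max, y_max)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the intersection rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


if __name__ == "__main__":
    # Two 10 x 10 boxes shifted by half a width share one third of their union.
    print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```
]]></code>
<p>Two frames that cover nearly the same region yield an IoU close to 1, which is the situation described above as severe overlap.</p>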

<p><xref ref-type="fig" rid="fig-9">Fig. 9c</xref> contains fabric defects of various scales and types. Faster-RCNN detects only one obvious stain defect and misses all other defects. The Cascade-RCNN algorithm detects a defect in the middle of the image and a relatively small stain defect in the upper part, and misses all other defects. The MFDC method detects most of the small fabric defects accurately, with a confidence value exceeding 0.7 in some cases. In addition, MFDC can detect the fabric defects with extreme aspect ratios in the middle of the image. Similar conclusions can be drawn from <xref ref-type="fig" rid="fig-9">Fig. 9e</xref>.</p>

<p>The defect in <xref ref-type="fig" rid="fig-9">Fig. 9d</xref> involves missing printing. The shape of this type of defect is usually complex. The same background pattern has a high degree of randomness and is usually mixed with other patterns of the fabric&#x2019;s background. As shown in <xref ref-type="fig" rid="fig-9">Fig. 9d</xref>, the Faster-RCNN algorithm cannot detect any defects. Cascade-RCNN only detects defects at four positions with low confidence, with a severe overlap of frames. In contrast, the MFDC algorithm detects all defects with a confidence level of around 0.9; therefore, MFDC has a stronger detection ability for defects with complex shapes.</p>

</sec>
</sec>
<sec id="s5">
<label>5</label>
<title>Conclusion</title>
<p>This paper presents a multi-layer feature extraction method combined with deformable convolution (MFDC) for fabric defect detection. Using ResNet50 as the backbone network, a multi-layer feature extraction approach is applied to improve the detection of multi-scale fabric defects, and deformable convolution is incorporated to detect irregularly shaped fabric defects. By integrating RoiAlign with Cascade-RCNN, close false positives are reduced through progressively increasing IoU thresholds, and the detection accuracy is significantly improved. The experimental results show that the proposed MFDC algorithm greatly improves detection accuracy at the expense of a small increase in detection time, and achieves better mAP and ACC indices than the other compared algorithms.</p>
<p>The MFDC proposed in this paper needs sufficient labeled data as training samples to obtain good defect detection performance. However, in practical industrial applications, obtaining high-quality labels can be a bottleneck because the annotation process is time-consuming and expensive. Consequently, the detection performance of MFDC decreases when data samples are insufficient. In future work, we will integrate an end-to-end semi-supervised detection framework into this method to improve its defect detection performance.</p>
</sec>
</body>
<back>
<ack>
<p>The authors express their heartfelt thanks to the supervisor for his direction and unwavering support during this study.</p>
</ack>
<sec><title>Funding Statement</title>
<p>This work was supported in part by the National Science Foundation of China under Grant 62001236 and in part by the Natural Science Foundation of the Jiangsu Higher Education Institutions of China under Grant 20KJA520003.</p>
</sec>
<sec><title>Author Contributions</title>
<p>The authors confirm contribution to the paper as follows: Study conception and design: Jielin Jiang, Chao Cui; data collection: Chao Cui, Xiaolong Xu; analysis and interpretation of results: Jielin Jiang, Chao Cui, Yan Cui; draft manuscript preparation: Chao Cui, Jielin Jiang. All authors reviewed the results and approved the final version of the manuscript.</p>
</sec>
<sec sec-type="data-availability"><title>Availability of Data and Materials</title>
<p>Due to the nature of this research, participants of this study did not agree for their data to be shared publicly, so supporting data is not available.</p>
</sec>
<sec sec-type="COI-statement"><title>Conflicts of Interest</title>
<p>The authors declare that they have no conflicts of interest to report regarding the present study.</p>
</sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y. S.</given-names> <surname>Normuminovich</surname></string-name>, <string-name><given-names>A. F.</given-names> <surname>Payziyevna</surname></string-name>, and <string-name><given-names>A. N.</given-names> <surname>Azimdjanovna</surname></string-name></person-group>, &#x201C;<article-title>Analysis of the effectiveness of the textile industry</article-title>,&#x201D; <source>J. Hunan Univ. (Natural Sci. Ed.)</source>, vol. <volume>48</volume>, no. <issue>12</issue>, pp. <fpage>1587</fpage>&#x2013;<lpage>1597</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Wu</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Automatic fabric defect detection using a wide-and-light network</article-title>,&#x201D; <source>Appl. Intell.</source>, vol. <volume>51</volume>, no. <issue>7</issue>, pp. <fpage>4945</fpage>&#x2013;<lpage>4961</lpage>, <year>2021</year>. doi: <pub-id pub-id-type="doi">10.1007/s10489-020-02084-6</pub-id>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>H. P.</given-names> <surname>Liu</surname></string-name></person-group>, &#x201C;<article-title>Research on nonwoven fabric defect online detection system using machine vision,&#x201D; M.S. thesis, Huazhong University of Science and Technology</article-title>, <publisher-loc>China</publisher-loc>, <year>2015</year>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><given-names>R. N.</given-names> <surname>Bracewell</surname></string-name></person-group>, <source>The Fourier Transform and its Applications</source>. <publisher-loc>NY, USA</publisher-loc>: <publisher-name>McGraw-Hill</publisher-name>, <year>1986</year>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J. L.</given-names> <surname>Raheja</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Kumar</surname></string-name>, and <string-name><given-names>A.</given-names> <surname>Chaudhary</surname></string-name></person-group>, &#x201C;<article-title>Fabric defect detection based on GLCM and Gabor filter: A comparison</article-title>,&#x201D; <source>Optik</source>, vol. <volume>124</volume>, no. <issue>23</issue>, pp. <fpage>6469</fpage>&#x2013;<lpage>6474</lpage>, <year>2013</year>. doi: <pub-id pub-id-type="doi">10.1016/j.ijleo.2013.05.004</pub-id>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Zhou</surname></string-name>, and <string-name><given-names>C.</given-names> <surname>Zhao</surname></string-name></person-group>, &#x201C;<article-title>Heart-rate analysis of healthy and insomnia groups with detrended fractal dimension feature in edge</article-title>,&#x201D; <source>Tsinghua Sci. Technol.</source>, vol. <volume>27</volume>, no. <issue>2</issue>, pp. <fpage>325</fpage>&#x2013;<lpage>332</lpage>, <year>2021</year>. doi: <pub-id pub-id-type="doi">10.26599/TST.2021.9010030</pub-id>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Xu</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Game theory for distributed IoV task offloading with fuzzy neural network in edge computing</article-title>,&#x201D; <source>IEEE Trans. Fuzzy Syst.</source>, vol. <volume>30</volume>, no. <issue>11</issue>, pp. <fpage>4593</fpage>&#x2013;<lpage>4604</lpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.1109/TFUZZ.2022.3158000</pub-id>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Xu</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Tian</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Qi</surname></string-name>, <string-name><given-names>Q.</given-names> <surname>He</surname></string-name>, and <string-name><given-names>W.</given-names> <surname>Dou</surname></string-name></person-group>, &#x201C;<article-title>DisCOV: Distributed COVID-19 detection on X-ray images with edge-cloud collaboration</article-title>,&#x201D; <source>IEEE Trans. Serv. Comput.</source>, vol. <volume>15</volume>, no. <issue>3</issue>, pp. <fpage>1206</fpage>&#x2013;<lpage>1219</lpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.1109/TSC.2022.3142265</pub-id>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>L.</given-names> <surname>Qi</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Xu</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Bilal</surname></string-name>, and <string-name><given-names>H.</given-names> <surname>Song</surname></string-name></person-group>, &#x201C;<article-title>Privacy-aware point of interest category recommendation in internet of things</article-title>,&#x201D; <source>IEEE Internet Things J.</source>, vol. <volume>9</volume>, no. <issue>21</issue>, pp. <fpage>21398</fpage>&#x2013;<lpage>21408</lpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.1109/JIOT.2022.3181136</pub-id>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Liu</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>A long short-term memory-based model for greenhouse climate prediction</article-title>,&#x201D; <source>Int. J. Intell. Syst.</source>, vol. <volume>37</volume>, no. <issue>1</issue>, pp. <fpage>135</fpage>&#x2013;<lpage>151</lpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.1002/int.22620</pub-id>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>L.</given-names> <surname>Qi</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Lin</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Dou</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Xu</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Chen</surname></string-name></person-group>, &#x201C;<article-title>A correlation graph based approach for personalized and compatible web APIs recommendation in mobile APP development</article-title>,&#x201D; <source>IEEE Trans. Knowl. Data Eng.</source>, vol. <volume>34</volume>, no. <issue>4</issue>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.1109/TKDE.2022.3168611</pub-id>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>Simonyan</surname></string-name> and <string-name><given-names>A.</given-names> <surname>Zisserman</surname></string-name></person-group>, &#x201C;<article-title>Very deep convolutional networks for large-scale image recognition</article-title>,&#x201D; <comment>arXiv preprint arXiv:1409.1556</comment>, <year>2014</year>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Zhao</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Zheng</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Xu</surname></string-name>, and <string-name><given-names>X.</given-names> <surname>Wu</surname></string-name></person-group>, &#x201C;<article-title>Object detection with deep learning: A review</article-title>,&#x201D; <source>IEEE Trans. Neural Netw. Learn. Syst.</source>, vol. <volume>30</volume>, no. <issue>11</issue>, pp. <fpage>3212</fpage>&#x2013;<lpage>3232</lpage>, <year>2019</year>. doi: <pub-id pub-id-type="doi">10.1109/TNNLS.2018.2876865</pub-id>; <pub-id pub-id-type="pmid">30703038</pub-id></mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Haque</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Murshed</surname></string-name>, and <string-name><given-names>M.</given-names> <surname>Paul</surname></string-name></person-group>, &#x201C;<article-title>On stable dynamic background generation technique using Gaussian mixture models for robust object detection</article-title>,&#x201D; in <conf-name>Proc. IEEE Fifth Int. Conf. Adv. Video Sig. Based Surveill.</conf-name>, <publisher-loc>Washington, DC, USA</publisher-loc>, <year>2008</year>, pp. <fpage>41</fpage>&#x2013;<lpage>48</lpage>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Stauffer</surname></string-name> and <string-name><given-names>W.</given-names> <surname>Grimson</surname></string-name></person-group>, &#x201C;<article-title>Adaptive background mixture models for real-time tracking</article-title>,&#x201D; in <conf-name>Proc. IEEE Comput. Soc. Conf. Comput. Vis. Pattern Recognit.</conf-name>, <publisher-loc>Fort Collins, CO, USA</publisher-loc>, <year>1999</year>, pp. <fpage>246</fpage>&#x2013;<lpage>252</lpage>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D. K.</given-names> <surname>Yadav</surname></string-name> and <string-name><given-names>K.</given-names> <surname>Singh</surname></string-name></person-group>, &#x201C;<article-title>A combined approach of kullback-leibler divergence and background subtraction for moving object detection in thermal video</article-title>,&#x201D; <source>Infrared Phys. Technol.</source>, vol. <volume>76</volume>, no. <issue>8</issue>, pp. <fpage>21</fpage>&#x2013;<lpage>31</lpage>, <year>2016</year>. doi: <pub-id pub-id-type="doi">10.1016/j.infrared.2015.12.027</pub-id>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Kumar</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Ray</surname></string-name>, and <string-name><given-names>D.</given-names> <surname>Yadav</surname></string-name></person-group>, &#x201C;<article-title>Moving human detection and tracking from thermal video through intelligent surveillance system for smart applications</article-title>,&#x201D; <source>Multimed. Tools</source>, vol. <volume>81</volume>, no. <issue>18</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>20</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>L.</given-names> <surname>Sharma</surname></string-name> and <string-name><given-names>D.</given-names> <surname>Yadav</surname></string-name></person-group>, &#x201C;<article-title>Histogram-based adaptive learning for background modelling: Moving object detection in video surveillance</article-title>,&#x201D; <source>Int. J. Telemed. Clin. Pract.</source>, vol. <volume>2</volume>, no. <issue>1</issue>, pp. <fpage>74</fpage>&#x2013;<lpage>92</lpage>, <year>2017</year>. doi: <pub-id pub-id-type="doi">10.1504/IJTMCP.2017.082107</pub-id>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Buades</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Coll</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Morel</surname></string-name></person-group>, &#x201C;<article-title>A review of image denoising algorithms, with a new one</article-title>,&#x201D; <source>Multiscale Model. Simul.</source>, vol. <volume>4</volume>, no. <issue>2</issue>, pp. <fpage>490</fpage>&#x2013;<lpage>530</lpage>, <year>2005</year>. doi: <pub-id pub-id-type="doi">10.1137/040616024</pub-id>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Girshick</surname></string-name></person-group>, &#x201C;<article-title>Fast R-CNN</article-title>,&#x201D; in <conf-name>Proc. IEEE Int. Conf. Comput. Vis.</conf-name>, <publisher-loc>Santiago, Chile</publisher-loc>, <year>2015</year>, pp. <fpage>1440</fpage>&#x2013;<lpage>1448</lpage>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Abouelela</surname></string-name>, <string-name><given-names>H. M.</given-names> <surname>Abbas</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Eldeeb</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Wahdan</surname></string-name>, and <string-name><given-names>S. M.</given-names> <surname>Nassar</surname></string-name></person-group>, &#x201C;<article-title>Automated vision system for localizing structural defects in textile fabrics</article-title>,&#x201D; <source>Pattern Recognit. Lett.</source>, vol. <volume>26</volume>, no. <issue>10</issue>, pp. <fpage>1435</fpage>&#x2013;<lpage>1443</lpage>, <year>2005</year>. doi: <pub-id pub-id-type="doi">10.1016/j.patrec.2004.11.016</pub-id>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>G.</given-names> <surname>Hu</surname></string-name>, <string-name><given-names>Q.</given-names> <surname>Wang</surname></string-name>, and <string-name><given-names>G. H.</given-names> <surname>Zhang</surname></string-name></person-group>, &#x201C;<article-title>Unsupervised defect detection in textiles based on fourier analysis and wavelet shrinkage</article-title>,&#x201D; <source>Appl. Opt.</source>, vol. <volume>54</volume>, no. <issue>10</issue>, pp. <fpage>2963</fpage>&#x2013;<lpage>2980</lpage>, <year>2015</year>. doi: <pub-id pub-id-type="doi">10.1364/AO.54.002963</pub-id>; <pub-id pub-id-type="pmid">25967212</pub-id></mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>V.</given-names> <surname>Karlekar</surname></string-name>, <string-name><given-names>M. S.</given-names> <surname>Biradar</surname></string-name>, and <string-name><given-names>K. B.</given-names> <surname>Bhangale</surname></string-name></person-group>, &#x201C;<article-title>Fabric defect detection using wavelet filter</article-title>,&#x201D; in <conf-name>Proc. Int. Conf. Comput. Commun. Control Autom.</conf-name>, <publisher-loc>Jeju Island, Republic of Korea</publisher-loc>, <year>2015</year>, pp. <fpage>712</fpage>&#x2013;<lpage>715</lpage>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Zhu</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Pan</surname></string-name>, <string-name><given-names>W. D.</given-names> <surname>Gao</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Zhang</surname></string-name></person-group>, &#x201C;<article-title>YARN-DYED fabric defect detection based on auto correlation function and GLCM</article-title>,&#x201D; <source>Autex Res. J.</source>, vol. <volume>15</volume>, no. <issue>3</issue>, pp. <fpage>226</fpage>&#x2013;<lpage>232</lpage>, <year>2015</year>. doi: <pub-id pub-id-type="doi">10.1515/aut-2015-0001</pub-id>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>V. S.</given-names> <surname>Thakare</surname></string-name> and <string-name><given-names>N. N.</given-names> <surname>Patil</surname></string-name></person-group>, &#x201C;<article-title>Classification of texture using gray level co-occurrence matrix and self-organizing map</article-title>,&#x201D; in <conf-name>Proc. Int. Conf. Electron. Syst. Sig. Process. Comput. Technol.</conf-name>, <publisher-loc>Poznan, Poland</publisher-loc>, <year>2014</year>, pp. <fpage>350</fpage>&#x2013;<lpage>355</lpage>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>W.</given-names> <surname>Liu</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>SSD: Single shot multibox detector</article-title>,&#x201D; in <conf-name>Proc. European Conf. Comput. Vis.</conf-name>, <publisher-loc>Amsterdam, Netherlands</publisher-loc>, <year>2016</year>, pp. <fpage>21</fpage>&#x2013;<lpage>37</lpage>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Redmon</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Divvala</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Girshick</surname></string-name>, and <string-name><given-names>A.</given-names> <surname>Farhadi</surname></string-name></person-group>, &#x201C;<article-title>You Only Look Once: Unified, real-time object detection</article-title>,&#x201D; in <conf-name>Proc. IEEE Conf. Comput. Vis. Pattern Recognit.</conf-name>, <publisher-loc>Las Vegas, NV, USA</publisher-loc>, <year>2016</year>, pp. <fpage>779</fpage>&#x2013;<lpage>788</lpage>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>L.</given-names> <surname>Li</surname></string-name></person-group>, &#x201C;<article-title>Research on fabric defect detection method based on convolutional neural network,&#x201D; M.S. thesis, Huazhong University of Science and Technology</article-title>, <publisher-loc>China</publisher-loc>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Hershey</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>CNN architectures for large-scale audio classification</article-title>,&#x201D; in <conf-name>Proc. IEEE Int. Conf. Acoust. Speech Sig. Process.</conf-name>, <publisher-loc>LA, USA</publisher-loc>, <year>2017</year>, pp. <fpage>131</fpage>&#x2013;<lpage>135</lpage>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>Z. Y.</given-names> <surname>Zhao</surname></string-name></person-group>, &#x201C;<article-title>Research on recognition and detection of textile defects based on deep learning,&#x201D; M.S. thesis, Huazhong University of Science and Technology</article-title>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Ren</surname></string-name>, <string-name><given-names>K.</given-names> <surname>He</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Girshick</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Sun</surname></string-name></person-group>, &#x201C;<article-title>Faster R-CNN: Towards real-time object detection with region proposal networks</article-title>,&#x201D; <source>IEEE Trans. Pattern Anal. Mach. Intell.</source>, vol. <volume>39</volume>, no. <issue>6</issue>, pp. <fpage>1137</fpage>&#x2013;<lpage>1149</lpage>, <year>2017</year>. doi: <pub-id pub-id-type="doi">10.1109/TPAMI.2016.2577031</pub-id>; <pub-id pub-id-type="pmid">27295650</pub-id></mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Yang</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Fu</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Hu</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Huang</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Ding</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Paisley</surname></string-name></person-group>, &#x201C;<article-title>PanNet: A deep network architecture for pan-sharpening</article-title>,&#x201D; in <conf-name>Proc. IEEE Int. Conf. Comput. Vis.</conf-name>, <publisher-loc>Venice, Italy</publisher-loc>, <year>2017</year>, pp. <fpage>1753</fpage>&#x2013;<lpage>1761</lpage>.</mixed-citation></ref>
<ref id="ref-33"><label>[33]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>He</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Ren</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Sun</surname></string-name></person-group>, &#x201C;<article-title>Spatial pyramid pooling in deep convolutional networks for visual recognition</article-title>,&#x201D; <source>IEEE Trans. Pattern Anal. Mach. Intell.</source>, vol. <volume>37</volume>, no. <issue>9</issue>, pp. <fpage>1904</fpage>&#x2013;<lpage>1916</lpage>, <year>2015</year>. doi: <pub-id pub-id-type="doi">10.1109/TPAMI.2015.2389824</pub-id>; <pub-id pub-id-type="pmid">26353135</pub-id></mixed-citation></ref>
<ref id="ref-34"><label>[34]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Misra</surname></string-name></person-group>, &#x201C;<article-title>Mish: A self regularized non-monotonic neural activation function</article-title>,&#x201D; <source>arXiv preprint arXiv:1908.08681</source>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-35"><label>[35]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A. M.</given-names> <surname>Hafiz</surname></string-name></person-group>, &#x201C;<article-title>K-nearest neighbour and support vector machine hybrid classification</article-title>,&#x201D; <source>arXiv preprint arXiv:2007.00045</source>, vol. <volume>19</volume>, no. <issue>4</issue>, pp. <fpage>33</fpage>&#x2013;<lpage>41</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-36"><label>[36]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Huang</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Jiang</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Guo</surname></string-name>, and <string-name><given-names>A.</given-names> <surname>Men</surname></string-name></person-group>, &#x201C;<article-title>Hyper feature fusion pyramid network for object detection</article-title>,&#x201D; in <conf-name>Proc. IEEE Int. Conf. Multimed. Expo Workshops</conf-name>, <publisher-loc>San Diego, CA, USA</publisher-loc>, <year>2018</year>, pp. <fpage>1</fpage>&#x2013;<lpage>6</lpage>.</mixed-citation></ref>
<ref id="ref-37"><label>[37]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Dai</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Deformable convolutional networks</article-title>,&#x201D; in <conf-name>Proc. IEEE Int. Conf. Comput. Vis.</conf-name>, <publisher-loc>Venice, Italy</publisher-loc>, <year>2017</year>, pp. <fpage>764</fpage>&#x2013;<lpage>773</lpage>.</mixed-citation></ref>
<ref id="ref-38"><label>[38]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Cai</surname></string-name> and <string-name><given-names>N.</given-names> <surname>Vasconcelos</surname></string-name></person-group>, &#x201C;<article-title>Cascade R-CNN: Delving into high quality object detection</article-title>,&#x201D; in <conf-name>Proc. IEEE Conf. Comput. Vis. Pattern Recognit.</conf-name>, <publisher-loc>Salt Lake City, UT, USA</publisher-loc>, <year>2018</year>, pp. <fpage>6154</fpage>&#x2013;<lpage>6162</lpage>.</mixed-citation></ref>
<ref id="ref-39"><label>[39]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>T. Y.</given-names> <surname>Lin</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Doll&#x00E1;r</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Girshick</surname></string-name>, <string-name><given-names>K.</given-names> <surname>He</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Hariharan</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Belongie</surname></string-name></person-group>, &#x201C;<article-title>Feature pyramid networks for object detection</article-title>,&#x201D; in <conf-name>Proc. IEEE Conf. Comput. Vis. Pattern Recognit.</conf-name>, <publisher-loc>Honolulu, HI, USA</publisher-loc>, <year>2017</year>, pp. <fpage>2117</fpage>&#x2013;<lpage>2125</lpage>.</mixed-citation></ref>
<ref id="ref-40"><label>[40]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Feng</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Wang</surname></string-name>, and <string-name><given-names>Y.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>ZJU-Leaper: A benchmark dataset for fabric defect detection and a comparative study</article-title>,&#x201D; <source>IEEE Trans. Artif. Intell.</source>, vol. <volume>1</volume>, no. <issue>3</issue>, pp. <fpage>219</fpage>&#x2013;<lpage>232</lpage>, <year>2020</year>. doi: <pub-id pub-id-type="doi">10.1109/TAI.2021.3057027</pub-id>.</mixed-citation></ref>
</ref-list>
</back></article>