<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">33417</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2023.033417</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Boosted Stacking Ensemble Machine Learning Method for Wafer Map Pattern Classification</article-title>
<alt-title alt-title-type="left-running-head">Boosted Stacking Ensemble Machine Learning Method for Wafer Map Pattern Classification</alt-title>
<alt-title alt-title-type="right-running-head">Boosted Stacking Ensemble Machine Learning Method for Wafer Map Pattern Classification</alt-title>
</title-group>
<contrib-group content-type="authors">
<contrib id="author-1" contrib-type="author">
<name name-style="western"><surname>Choi</surname><given-names>Jeonghoon</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-2" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Suh</surname><given-names>Dongjun</given-names></name><xref ref-type="aff" rid="aff-1">1</xref><email>dongjunsuh@knu.ac.kr</email></contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western"><surname>Otto</surname><given-names>Marc-Oliver</given-names></name><xref ref-type="aff" rid="aff-2">2</xref></contrib>
<aff id="aff-1"><label>1</label><institution>Department of Convergence and Fusion System Engineering, Kyungpook National University</institution>, <addr-line>Sangju</addr-line>, 37224, <country>Korea</country></aff>
<aff id="aff-2"><label>2</label><institution>Department of Mathematics, Natural and Economic Sciences, Ulm University of Applied Sciences</institution>, <addr-line>Ulm, 89075</addr-line>, <country>Germany</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Dongjun Suh. Email: <email>dongjunsuh@knu.ac.kr</email></corresp>
</author-notes>
<pub-date pub-type="epub" date-type="pub" iso-8601-date="2022-10-28"><day>28</day>
<month>10</month>
<year>2022</year></pub-date>
<volume>74</volume>
<issue>2</issue>
<fpage>2945</fpage>
<lpage>2966</lpage>
<history>
<date date-type="received"><day>16</day><month>6</month><year>2022</year></date>
<date date-type="accepted"><day>02</day><month>8</month><year>2022</year></date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2023 Choi et al.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Choi et al.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_33417.pdf"></self-uri>
<abstract>
<p>Recently, machine learning-based technologies have been developed to automate the classification of wafer map defect patterns during semiconductor manufacturing. Existing approaches to wafer map pattern classification include learning directly from the image with a convolutional neural network and applying an ensemble method after extracting image features. This study aims to classify wafer map defects more effectively and to derive algorithms that remain robust even for datasets with insufficient defect patterns. Because the number of defects observed during the actual process may be limited, data for the insufficient defect patterns are first generated using a convolutional auto-encoder (CAE), and the expanded data are verified using the structural similarity index measure (SSIM). After handcrafted features are extracted, a boosted stacking ensemble model that integrates four base-level classifiers with an extreme gradient boosting classifier as the meta-level classifier is built and trained on the expanded data for final prediction. Since the proposed algorithm performs better than existing ensemble classifiers even for insufficient defect patterns, the results of this study will contribute to improving the product quality and yield of the actual semiconductor manufacturing process.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Wafer map</kwd>
<kwd>pattern classification</kwd>
<kwd>machine learning</kwd>
<kwd>boosted stacking ensemble</kwd>
<kwd>semiconductor manufacturing processing</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1"><label>1</label><title>Introduction</title>
<p>A wafer is a basic unit created to evaluate electrical properties during semiconductor manufacturing [<xref ref-type="bibr" rid="ref-1">1</xref>], and a wafer map is used to visualize the location of defects on the wafer. Defective IC chips usually form defect patterns on the wafer map, and these patterns carry useful information about the semiconductor manufacturing process. Thus, wafer map defect pattern classification is essential for investigating the root cause of defects occurring in the process. For example, physical etching frequently produces edge-ring patterns, while chemical etching often produces circle and scratch patterns. Therefore, accurate identification and classification of these defect patterns increases the chances of fixing the root cause of the main problem&#x00A0;[<xref ref-type="bibr" rid="ref-2">2</xref>].</p>
<p>In the actual semiconductor manufacturing process, defects occur very rarely. When manufacturing process data are collected, very few cases show detectable defect patterns, and most of the data are in a normal state. Because the few defect patterns must be learned from such an imbalanced dataset, classification tends to be inaccurate and time-consuming [<xref ref-type="bibr" rid="ref-3">3</xref>]. Furthermore, pattern classification for the collected wafer map data relies on visual inspection by skilled engineers. Engineers randomly select samples from entire wafers and use high-resolution microscopy to analyze defects, which is a time-consuming and inconsistent process [<xref ref-type="bibr" rid="ref-4">4</xref>]. To save time and money in this process, it is essential to study automated wafer map pattern classification algorithms [<xref ref-type="bibr" rid="ref-5">5</xref>]. A considerable amount of research is underway on wafer map defect classification using feature extraction algorithms in the semiconductor manufacturing process. In initial studies, wafer map defect patterns were successfully classified by applying machine learning alone, without feature extraction techniques [<xref ref-type="bibr" rid="ref-6">6</xref>]. In addition, techniques for extracting features that focus on the characteristics of the wafer map have been investigated [<xref ref-type="bibr" rid="ref-7">7</xref>&#x2013;<xref ref-type="bibr" rid="ref-10">10</xref>]. Further, such feature extraction techniques have been applied to analyze spatial defect patterns using machine learning and automated clustering algorithms [<xref ref-type="bibr" rid="ref-11">11</xref>&#x2013;<xref ref-type="bibr" rid="ref-13">13</xref>]. In recent studies, defects have been analyzed by extracting features directly from images using deep learning. 
There are also many studies that have successfully implemented wafer map defect classification by applying the feature extraction technique followed by an ensemble learning algorithm [<xref ref-type="bibr" rid="ref-14">14</xref>&#x2013;<xref ref-type="bibr" rid="ref-20">20</xref>].</p>
<p>To improve wafer map pattern classification accuracy, this study proposes a Boosted Stacking Ensemble Machine Learning (BSEML) algorithm that applies data augmentation to insufficient defect patterns. Given a training dataset, data augmentation is first performed through CAE-based model learning. Then, handcrafted features based on density, Radon, and geometry are extracted. The extracted feature vectors are combined and used to construct a BSEML model that performs the final prediction. The contributions of this study are as follows.
<list list-type="order">
<list-item><p>The effectiveness of the proposed technique was verified using wafer datasets collected from semiconductor manufacturers.</p></list-item>
<list-item><p>The computational efficiency was increased by extracting the key defect pattern information hidden from the original image using various feature extraction techniques.</p></list-item>
<list-item><p>Data augmentation was performed using a CAE-based model to address the scarcity and imbalance of defect patterns, and the accuracy of the proposed model was improved using the augmented data.</p></list-item>
</list></p>
<p>The rest of this study is structured as follows. Section 2 briefly describes the techniques used in related studies. Section 3 introduces the proposed algorithm. Section 4 describes the data structure and experimental methods. Sections 5 and 6 present the results of the study and the conclusions, respectively.</p>
</sec>
<sec id="s2"><label>2</label><title>Related Work</title>
<p>In the past few years, many studies have applied machine learning to wafer map pattern classification. These studies fall largely into two types according to how the features of the wafer map are extracted and how the defects are classified. <xref ref-type="table" rid="table-1">Tab. 1</xref> summarizes the related studies.</p>
<table-wrap id="table-1"><label>Table 1</label><caption><title>Machine learning approaches for wafer map pattern classification</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Ref. No</th>
<th align="left">Method</th>
<th align="left">Ensemble method</th>
<th align="left">Input feature</th>
<th align="left">Input shape</th>
<th align="left">Classifier</th>
<th align="left">Data processing</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-6">6</xref>]</td>
<td align="left">MFE</td>
<td align="left">-</td>
<td align="left">Wafer map</td>
<td align="left">30</td>
<td align="left">SVM</td>
<td align="left">EOL test</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-24">24</xref>]</td>
<td align="left">MFE</td>
<td align="left">-</td>
<td align="left">Features</td>
<td align="left">53</td>
<td align="left">JLNDA-FD</td>
<td align="left">Denoising</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-18">18</xref>]</td>
<td align="left">MFE</td>
<td align="left">Bagging</td>
<td align="left">Features</td>
<td align="left">4</td>
<td align="left">DT</td>
<td align="left">-</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-17">17</xref>]</td>
<td align="left">MFE</td>
<td align="left">Voting</td>
<td align="left">Features</td>
<td align="left">66</td>
<td align="left">LR, RF, GBM, ANN</td>
<td align="left">-</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-19">19</xref>]</td>
<td align="left">MFE</td>
<td align="left">Stacking</td>
<td align="left">Spatial</td>
<td align="left">10</td>
<td align="left">AB, ET, XGB</td>
<td align="left">-</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-27">27</xref>]</td>
<td align="left">CNN</td>
<td align="left">-</td>
<td align="left">Wafer map</td>
<td align="left">(286, 400)</td>
<td align="left">CNN</td>
<td align="left">Simulated generation</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-28">28</xref>]</td>
<td align="left">CNN</td>
<td align="left">-</td>
<td align="left">Wafer map</td>
<td align="left">(100, 100)</td>
<td align="left">DCNN</td>
<td align="left">Noise reduction</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-15">15</xref>]</td>
<td align="left">CNN</td>
<td align="left">-</td>
<td align="left">Wafer map</td>
<td align="left">(256, 256)</td>
<td align="left">CNN</td>
<td align="left">Contrast, binarization</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-29">29</xref>]</td>
<td align="left">CNN</td>
<td align="left">Stacking</td>
<td align="left">Wafer map</td>
<td align="left">(300, 300)</td>
<td align="left">ECNN</td>
<td align="left">-</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-30">30</xref>]</td>
<td align="left">CNN</td>
<td align="left">-</td>
<td align="left">Wafer map</td>
<td align="left">(64, 64)</td>
<td align="left">CNN</td>
<td align="left">CAE</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-31">31</xref>]</td>
<td align="left">CNN</td>
<td align="left">-</td>
<td align="left">Wafer map</td>
<td align="left">(224, 224)</td>
<td align="left">CBAM</td>
<td align="left">C-Mean filtering</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-32">32</xref>]</td>
<td align="left">CNN</td>
<td align="left">-</td>
<td align="left">Wafer map</td>
<td align="left">(416, 416)</td>
<td align="left">YOLO</td>
<td align="left">-</td>
</tr>
<tr>
<td align="left">Proposed</td>
<td align="left">MFE</td>
<td align="left">Boosted Stacking</td>
<td align="left">Features</td>
<td align="left">(32, 32), 59</td>
<td align="left">DT, SVM, RF, KNN, XGB</td>
<td align="left">CAE</td>
</tr>
</tbody>
</table>
</table-wrap>
<sec id="s2_1"><label>2.1</label><title>Wafer Map Pattern Classification</title>
<p>The first method is to extract handcrafted features and train an off-the-shelf classifier. The most commonly used features in this approach include density, geometry, and Radon properties [<xref ref-type="bibr" rid="ref-7">7</xref>]. Such handcrafted feature extraction reduces dimensionality by transforming the wafer map into a vector. The classification model then takes this vector as input and makes predictions. This step involves various existing learning algorithms such as support vector machine (SVM), logistic regression (LR), naive Bayes (NB), and K-nearest neighbors (K-NN). The SVM model can be constructed more simply than a neural network model, and it is reported to be less prone to overfitting and less affected by erroneous data in multi-class classification [<xref ref-type="bibr" rid="ref-10">10</xref>]. In addition, Baly&#x00A0;et&#x00A0;al.&#x00A0;preprocessed the wafer map through an end-of-line (EOL) test before classification using the SVM classifier [<xref ref-type="bibr" rid="ref-6">6</xref>]. The LR model is a widely used classification model that provides a probability for each class. This is a major advantage over models that output only the final class [<xref ref-type="bibr" rid="ref-21">21</xref>].</p>
<p>The NB model is based on Bayes&#x2019; theorem and learns very quickly compared to other learning algorithms. In particular, it allows easy and quick prediction in multi-class classification when the features are probabilistically independent [<xref ref-type="bibr" rid="ref-22">22</xref>]. The K-NN model achieves high accuracy because it compares new data against all values of the existing classification system, and erroneous data are excluded from the comparison and therefore do not affect the result [<xref ref-type="bibr" rid="ref-23">23</xref>]. Yu&#x00A0;et&#x00A0;al.&#x00A0;maximized classification performance through image denoising with a median filter using an algorithm based on a K-NN classifier [<xref ref-type="bibr" rid="ref-24">24</xref>]. Studies based on these methods focus on optimizing the model design to enhance pattern classification performance. These methods, however, do not overcome the limitations of the individual models, and some important information from the raw wafer map image might be lost.</p>
<p>The second method is a CNN-based raw image classification method. As shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>, the method aims to detect defects by extracting features directly from the wafer map image data. CNNs are end-to-end models designed to process arrays of two or more dimensions as input. The end-to-end approach is useful because it does not require the development of separate feature extractors [<xref ref-type="bibr" rid="ref-25">25</xref>]. Because the wafer map is expressed as a two-dimensional array, a CNN can extract convolutional features from it directly. These advantages have led to the method being actively applied to the classification of wafer maps [<xref ref-type="bibr" rid="ref-26">26</xref>&#x2013;<xref ref-type="bibr" rid="ref-28">28</xref>]. In addition, CNN-based wafer map classification studies using various data processing techniques have been conducted recently [<xref ref-type="bibr" rid="ref-28">28</xref>&#x2013;<xref ref-type="bibr" rid="ref-32">32</xref>].</p>
<fig id="fig-1"><label>Figure 1</label><caption><title>The architecture of the convolutional neural network approach [<xref ref-type="bibr" rid="ref-25">25</xref>]</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_33417-fig-1.png"/></fig>
</sec>
<sec id="s2_2"><label>2.2</label><title>Ensemble Model Learning of Handcrafted Features</title>
<p>The ensemble system is constructed based on principles such as reliability estimation, data fusion, and unbalanced data processing. The performance of an ensemble system depends on the accuracy of individual classifiers and the number of base-level classifiers included [<xref ref-type="bibr" rid="ref-33">33</xref>]. However, it is very difficult to select an appropriate classifier for designing an ensemble system. The ML classifier used in wafer defect classification may be suitable for some defects, but may not be suitable for recognizing all defect classes [<xref ref-type="bibr" rid="ref-34">34</xref>]. The ensemble techniques are used to overcome the limitations of individual classifiers in ML. Learning by assigning specific weights to individual classifiers ensures robustness for all defect classes. The goal of the ensemble classification technique is to integrate the prediction results of various ML models within the given training data and generate the final prediction result with improved accuracy [<xref ref-type="bibr" rid="ref-35">35</xref>]. It also facilitates fast classification through minimal calculations, coupled with handcrafted features that improve defect identification on large-scale wafer data [<xref ref-type="bibr" rid="ref-36">36</xref>].</p>
<p>In recent years, increasing interest in ensemble techniques has led to the emergence of various ensemble-based algorithms such as Voting, Bagging, Boosting, AdaBoost, XGBoost, and Mixture of Experts (MoE) [<xref ref-type="bibr" rid="ref-37">37</xref>]. Accordingly, studies applying the ensemble classification technique to classify wafer map defect patterns have appeared. The voting method first combines different algorithm models.</p>
<p>There are three types of voting methods for deriving the result: the majority, hard, and soft voting methods. Experimental verification has shown that the soft voting ensemble method performs best for deriving the final result [<xref ref-type="bibr" rid="ref-17">17</xref>]. The bagging ensemble method draws samples from the data with replacement, so that a sample may be selected more than once, and then trains the same algorithm, such as a decision tree (DT) or random forest (RF), on the different sample combinations. Subsequently, the average of the results is calculated to obtain the final result. A model that is robust to various defect patterns has been presented based on the mathematical model of DT as the internal algorithm [<xref ref-type="bibr" rid="ref-18">18</xref>].</p>
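<p>The difference between hard and soft voting can be illustrated with a minimal Python sketch. The function names, class labels, and probability values below are hypothetical and chosen only for illustration; they are not taken from the cited studies. Hard voting counts each classifier&#x2019;s most probable class, whereas soft voting averages the class probabilities before choosing a class.</p>
<code language="python" xml:space="preserve">
```python
from collections import Counter


def hard_vote(probas, classes):
    """Majority vote over each classifier's most probable class."""
    picks = [classes[max(range(len(p)), key=p.__getitem__)] for p in probas]
    return Counter(picks).most_common(1)[0][0]


def soft_vote(probas, classes):
    """Average the class probabilities, then pick the most probable class."""
    n = len(probas)
    mean = [sum(p[j] for p in probas) / n for j in range(len(classes))]
    return classes[max(range(len(classes)), key=mean.__getitem__)]


# Illustrative labels and probabilities (assumed values, one row per classifier).
classes = ["Circle", "Edge-Ring", "Scratch"]
probas = [
    [0.45, 0.40, 0.15],  # classifier 1: weakly prefers Circle
    [0.45, 0.40, 0.15],  # classifier 2: weakly prefers Circle
    [0.05, 0.90, 0.05],  # classifier 3: strongly prefers Edge-Ring
]

print(hard_vote(probas, classes))  # Circle
print(soft_vote(probas, classes))  # Edge-Ring
```
</code>
<p>In this toy case the two rules disagree: two weak votes win under hard voting, while the single confident classifier dominates the averaged probabilities under soft voting.</p>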
</sec>
</sec>
<sec id="s3"><label>3</label><title>Proposed Method</title>
<p>This section describes the proposed technique in detail. <xref ref-type="fig" rid="fig-2">Fig. 2</xref> shows the overall process, which is as follows. Raw wafer image data may have a class imbalance or lack defect patterns. Data augmentation using the CAE model is therefore applied to expand the data so that the ratio of the insufficient patterns is matched. So that the ML-based classifier can exploit as many features of the augmented data as possible, the amount of computation is reduced by converting the 2D array image into a 1D array while minimizing the loss of feature information caused by the dimension reduction [<xref ref-type="bibr" rid="ref-38">38</xref>]. Radon-, density-, and geometry-based features are extracted, and the resulting feature vectors are combined and fed into the BSEML model. Finally, the combined feature vectors are learned by the base-level classifiers inside the BSEML model, and the final prediction is performed by the meta-level classifier.</p>
<fig id="fig-2"><label>Figure 2</label><caption><title>The architecture of the proposed method</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_33417-fig-2.png"/></fig>
<sec id="s3_1"><label>3.1</label><title>Feature Extraction</title>
<p>The feature extraction technique produces a one-dimensional array by reducing the dimension of the two-dimensional array of the wafer map image. With this dimension reduction, not only is the amount of computation reduced, but the important feature information is also converted into a one-dimensional vector [<xref ref-type="bibr" rid="ref-39">39</xref>]. <xref ref-type="fig" rid="fig-3">Fig. 3</xref> presents sample wafer maps from each defect pattern type.</p>
<fig id="fig-3"><label>Figure 3</label><caption><title>Sample images</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_33417-fig-3.png"/></fig>
<p>First, the density-based feature extraction technique calculates how densely defects are distributed in each section of the wafer map [<xref ref-type="bibr" rid="ref-9">9</xref>]. To extract the density-based features, each wafer map is divided into 20 parts of the (6, 6) region, and the failure density in each part is calculated. As shown in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>, the defect density distribution across the wafer regions differs between defect patterns.</p>
<fig id="fig-4"><label>Figure 4</label><caption><title>Density-based images</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_33417-fig-4.png"/></fig>
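<p>The region-wise density computation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name <monospace>density_features</monospace>, the regular non-overlapping block layout, and the 4&#x00D7;4 toy map are hypothetical and do not reproduce the exact region layout used in this study. Each cell is assumed to be 1 for a defective die and 0 otherwise.</p>
<code language="python" xml:space="preserve">
```python
def density_features(wafer, block_rows, block_cols):
    """Return the defect fraction of each non-overlapping block, row by row."""
    n_rows, n_cols = len(wafer), len(wafer[0])
    features = []
    for r0 in range(0, n_rows, block_rows):
        for c0 in range(0, n_cols, block_cols):
            # Collect the cells of the current block, clipped at the map border.
            cells = [wafer[r][c]
                     for r in range(r0, min(r0 + block_rows, n_rows))
                     for c in range(c0, min(c0 + block_cols, n_cols))]
            features.append(sum(cells) / len(cells))
    return features


# Toy 4x4 map (assumed data): 1 = defective die, 0 = otherwise.
wafer = [
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 0, 0],
]

print(density_features(wafer, 2, 2))  # [1.0, 0.0, 0.0, 0.25]
```
</code>
<p>Each entry of the returned vector is the failure density of one region, so a cluster of defects in one corner appears as a single large value while isolated defects yield small values.</p>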
<p>Second, the Radon-based feature extraction technique generates a two-dimensional representation of a wafer map by applying the projection-based Radon transformation [<xref ref-type="bibr" rid="ref-40">40</xref>]. A projection is constructed by casting a set of parallel rays through the object of interest in a 2D image and accumulating the object&#x2019;s integral contrast along each ray into a single pixel of the projection. A sinogram, which is a linear transform of the original image, is the collection of these projections from various angles [<xref ref-type="bibr" rid="ref-8">8</xref>]. The Radon transformation is expressed in <xref ref-type="disp-formula" rid="eqn-1">Eq. (1)</xref>.
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:mi>g</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03C1;</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mo>&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msubsup><mml:msubsup><mml:mrow><mml:mo>&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>y</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mi mathvariant="bold">M</mml:mi></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>&#x03B4;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mi>cos</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo>+</mml:mo><mml:mi>y</mml:mi><mml:mi>sin</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03C1;</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>Here, <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:mrow><mml:mi mathvariant="bold">M</mml:mi></mml:mrow></mml:math></inline-formula> is a wafer map of size m&#x00D7;n. Each element in <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:mrow><mml:mi mathvariant="bold">M</mml:mi></mml:mrow></mml:math></inline-formula> is set to 1 to denote a defective die, and to 0 otherwise. <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:mi>&#x03C1;</mml:mi></mml:math></inline-formula> denotes the distance between the point of origin and the line, and <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:mi>&#x03B8;</mml:mi></mml:math></inline-formula> indicates the angle from the x-axis. <inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:mi>g</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03C1;</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is the response of a projection, and <inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:mi>&#x03B4;</mml:mi></mml:math></inline-formula> is the impulse function. <xref ref-type="fig" rid="fig-5">Fig. 5</xref> shows the results of the Radon transformation for eight common defect classes, obtained as the collection of projections at different angles.</p>
<fig id="fig-5"><label>Figure 5</label><caption><title>Radon-based images</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_33417-fig-5.png"/></fig>
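<p>A discrete reading of <xref ref-type="disp-formula" rid="eqn-1">Eq. (1)</xref> can be sketched as follows. This is a simplified illustration and not the implementation used in this study: the function name <monospace>radon_projection</monospace> is hypothetical, and the impulse function is approximated by rounding x&#x00B7;cos&#x03B8;&#x00A0;+&#x00A0;y&#x00B7;sin&#x03B8; to the nearest integer &#x03C1;, which is an assumption made here for a pixel grid.</p>
<code language="python" xml:space="preserve">
```python
import math


def radon_projection(wafer, theta_deg):
    """Sum M(x, y) over cells whose rounded x*cos(theta) + y*sin(theta) equals rho.

    Returns a dict mapping each integer rho to the projection response g(rho, theta).
    """
    theta = math.radians(theta_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    projection = {}
    for x, row in enumerate(wafer, start=1):      # x = 1..m as in Eq. (1)
        for y, value in enumerate(row, start=1):  # y = 1..n as in Eq. (1)
            # Assumed discretization of the impulse: nearest-integer rho bin.
            rho = round(x * cos_t + y * sin_t)
            projection[rho] = projection.get(rho, 0) + value
    return projection


# Toy 3x3 map (assumed data) with a line of defects along y = 2.
wafer = [
    [0, 1, 0],
    [0, 1, 0],
    [0, 1, 0],
]

print(radon_projection(wafer, 0))   # {1: 1, 2: 1, 3: 1}
print(radon_projection(wafer, 90))  # {1: 0, 2: 3, 3: 0}
```
</code>
<p>Evaluating the projection over a range of angles and stacking the results gives a sinogram; the line of defects produces a sharp peak at the angle whose rays run along it and a flat response at the perpendicular angle.</p>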
<p>Third, a geometry-based feature extraction technique is used to evaluate the geometric properties of each wafer map [<xref ref-type="bibr" rid="ref-41">41</xref>]. Geometry-based features have been derived by calculating local, statistical, and linear properties based on the analysis of various wafer map patterns and consultation with domain experts. These properties are scale- and rotation-invariant, and a region-labeling algorithm is used to compute them. The algorithm reveals the most prominent areas of the wafer defect pattern.</p>
<p><xref ref-type="fig" rid="fig-6">Fig. 6</xref> shows the most prominent regions with the maximum area for each wafer map defect class. This step also acts as noise filtering, removing defect noise that is randomly present on the wafer map image. As a result, a total of 59 handcrafted features were extracted, comprising 13 density-based, 40 Radon-based, and six geometry-based features, which were used to train the model.</p>
<fig id="fig-6"><label>Figure 6</label><caption><title>Geometry-based images</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_33417-fig-6.png"/></fig>
</sec>
<sec id="s3_2"><label>3.2</label><title>The BSEML Model</title>
<p><xref ref-type="fig" rid="fig-7">Fig. 7</xref> shows the architecture of the BSEML model. In the proposed model, the base level was constructed using four ML classifiers: random forest (RF), decision tree (DT), KNN, and SVM. The meta-level classifier was constructed using the extreme gradient boosting classifier. The identification accuracy of each base-level classifier depended on the wafer map defect class. Individual classifiers failed to achieve high accuracy across all classes because each classifier had its own learning capabilities and parameter values. Therefore, an ensemble approach was used in which the best results from all classifiers are collected, aggregated, and fed to the meta-level classifier to obtain the final classification results for all defect classes. Each individual classifier is summarized as follows.</p>
<fig id="fig-7"><label>Figure 7</label><caption><title>The architecture of the BSEML model</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_33417-fig-7.png"/></fig>
<p>A decision tree (DT), also called a classification and regression tree used in both classification and regression analysis, is a classification model that divides the independent variable space while sequentially applying various rules. In predicting target variables or solving classification problems, the model enables checking which explanatory variable is the most important influencing factor and determines the detailed criteria for the prediction and classification of each explanatory variable [<xref ref-type="bibr" rid="ref-42">42</xref>].</p>
<p>A random forest (RF) is a bagging ensemble algorithm that trains several DT models and synthesizes the results to make the prediction. The bagging ensemble algorithm is a method of training individual DT models with a sampled dataset by allowing duplicates from the original dataset. In addition, DT is based on the principle of uncertainty called entropy, and the concept of entropy is expressed by the following expression [<xref ref-type="bibr" rid="ref-43">43</xref>].
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:mi>H</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mrow><mml:mo>&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msubsup><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo fence="false" stretchy="false">|</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:msub><mml:mi>log</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>&#x2061;</mml:mo><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo fence="false" stretchy="false">|</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p><inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo fence="false" stretchy="false">|</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> denotes the probability of class <italic>i</italic> occurring at node <italic>t</italic>, and <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:mi>H</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> quantifies the uncertainty of the random variable. In ML, a higher entropy value indicates a lower probability of successful classification. Therefore, RF and DT models are trained by selecting the splits that lower the entropy. Values predicted through multiple models are averaged to produce a final predicted value. The RF algorithm improves the generalization performance of the predictive model by randomly selecting features to increase the diversity of the DT models [<xref ref-type="bibr" rid="ref-44">44</xref>].</p>
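<p>As a worked illustration of <xref ref-type="disp-formula" rid="eqn-2">Eq. (2)</xref>, the following Python sketch computes the entropy of the class labels at a node and the weighted entropy after a candidate split. The function names and the toy labels are hypothetical; class probabilities are estimated here as label frequencies, which is the usual convention but is stated as an assumption.</p>
<code language="python" xml:space="preserve">
```python
import math
from collections import Counter


def entropy(labels):
    """H(t) of Eq. (2), written as sum_i p(i|t) * log2(1 / p(i|t))."""
    total = len(labels)
    return sum((n / total) * math.log2(total / n)
               for n in Counter(labels).values())


def split_entropy(left, right):
    """Size-weighted average entropy of the two child nodes of a split."""
    total = len(left) + len(right)
    return (len(left) / total) * entropy(left) + (len(right) / total) * entropy(right)


# Toy node (assumed data) with two equally frequent classes.
parent = ["Scratch", "Scratch", "Circle", "Circle"]

print(entropy(parent))                         # 1.0
print(split_entropy(parent[:2], parent[2:]))   # 0.0
```
</code>
<p>The balanced node has the maximum entropy for two classes, and the split that separates the classes perfectly reduces the entropy to zero, which is the kind of split a tree-growing procedure would prefer.</p>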
<p>KNN is an algorithm that is used to determine the classification of new data. The KNN for classification is expressed as follows [<xref ref-type="bibr" rid="ref-45">45</xref>].
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mover><mml:mi>x</mml:mi><mml:mo>&#x007E;</mml:mo></mml:mover></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:munder><mml:mrow><mml:mtext>argmax</mml:mtext></mml:mrow><mml:mrow><mml:mi>q</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mi>Q</mml:mi><mml:mo>}</mml:mo></mml:mrow></mml:mrow></mml:munder><mml:mfrac><mml:mn>1</mml:mn><mml:mi>k</mml:mi></mml:mfrac><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>:</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mi>x</mml:mi><mml:mo>&#x007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mover><mml:mi>x</mml:mi><mml:mo>&#x007E;</mml:mo></mml:mover></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:munder><mml:mi>T</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>q</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>For the input data <inline-formula id="ieqn-9a"><mml:math id="mml-ieqn-9a"><mml:mrow><mml:mover><mml:mi>x</mml:mi><mml:mo>&#x007E;</mml:mo></mml:mover></mml:mrow></mml:math></inline-formula>, the KNN classifier predicts a label, <inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mover><mml:mi>x</mml:mi><mml:mo>&#x007E;</mml:mo></mml:mover></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. Here, <inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mover><mml:mi>x</mml:mi><mml:mo>&#x007E;</mml:mo></mml:mover></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is the set of the <italic>k</italic> training points <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:msub><mml:mrow><mml:mover><mml:mi>x</mml:mi><mml:mo>&#x007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> closest to <inline-formula id="ieqn-12"><mml:math id="mml-ieqn-12"><mml:mrow><mml:mover><mml:mi>x</mml:mi><mml:mo>&#x007E;</mml:mo></mml:mover></mml:mrow></mml:math></inline-formula>, and <inline-formula id="ieqn-13"><mml:math id="mml-ieqn-13"><mml:mi>T</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is a function that outputs 1 if <italic>y</italic> is true and 0 if it is false. By comparing the existing data with the newly input data, the input data are assigned to the class that is most common among the most similar existing data. Because the algorithm compares the input with all existing data, its accuracy is high, and because only the <italic>k</italic> nearest data points are used, erroneous data far from the input are excluded from the comparison. Therefore, erroneous data do not significantly affect the result [<xref ref-type="bibr" rid="ref-46">46</xref>].</p>
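<p>The voting rule of <xref ref-type="disp-formula" rid="eqn-3">Eq. (3)</xref> can be sketched in a few lines of Python. The Euclidean distance, the function name <monospace>knn_predict</monospace>, and the two-dimensional toy features are assumptions made for illustration; the text does not specify the distance metric used in this study.</p>
<code language="python"><![CDATA[
```python
import math
from collections import Counter


def knn_predict(x_new, X_train, y_train, k=3):
    """Predict the label of x_new by majority vote among its k nearest neighbours."""
    # Euclidean distance is an assumption; the text does not name a metric
    dists = [(math.dist(x_new, x), y) for x, y in zip(X_train, y_train)]
    # N_k(x): the k training points closest to x_new
    neighbours = sorted(dists, key=lambda d: d[0])[:k]
    # Counting labels is the sum of T(y_t = q); dividing by k would not change the argmax
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]


# Hypothetical 2-D feature vectors; the point (5.0, 5.0) lies far from the query,
# so it falls outside the k nearest neighbours and does not vote
X_train = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (1.0, 1.0), (1.1, 0.9), (5.0, 5.0)]
y_train = ["None", "None", "None", "Scratch", "Scratch", "None"]
prediction = knn_predict((0.9, 1.0), X_train, y_train, k=3)
```
]]></code>
<p>With <italic>k</italic> = 3, two of the three nearest neighbours of the query carry the label Scratch, so that label wins the vote even though None is the most frequent label in the toy training set.</p>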
<p>SVM is an algorithm that performs classification using support vectors and hyperplanes. The data are classified by maximizing the margin between the separating hyperplane and the support vectors while minimizing the error [<xref ref-type="bibr" rid="ref-47">47</xref>]. Training that maximizes the margin may tolerate some training errors, but the classification accuracy is high for newly input data. Training that only minimizes errors may lead to incorrect classification of new data because of a narrow margin. The expression for maximizing the SVM margin is as follows [<xref ref-type="bibr" rid="ref-48">48</xref>].
<disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:mrow><mml:mtext mathvariant="italic">maximize</mml:mtext></mml:mrow><mml:mi>M</mml:mi></mml:math></disp-formula>
<disp-formula id="eqn-5"><label>(5)</label><mml:math id="mml-eqn-5" display="block"><mml:mi>s</mml:mi><mml:mo>.</mml:mo><mml:mi>t</mml:mi><mml:mo>.</mml:mo><mml:msubsup><mml:mrow><mml:mo>&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msubsup><mml:msubsup><mml:mrow><mml:mi>&#x03B1;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo></mml:math></disp-formula>
<disp-formula id="eqn-6"><label>(6)</label><mml:math id="mml-eqn-6" display="block"><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mo>&#x22EF;</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2265;</mml:mo><mml:mi>M</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>&#x03BC;</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo></mml:math></disp-formula>
<disp-formula id="eqn-7"><label>(7)</label><mml:math id="mml-eqn-7" display="block"><mml:msub><mml:mi>&#x03BC;</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2265;</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mo>&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msubsup><mml:msub><mml:mi>&#x03BC;</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2264;</mml:mo><mml:mi>C</mml:mi></mml:math></disp-formula></p>
<p><inline-formula id="ieqn-14"><mml:math id="mml-ieqn-14"><mml:mi>M</mml:mi></mml:math></inline-formula> denotes the margin, <inline-formula id="ieqn-15"><mml:math id="mml-ieqn-15"><mml:mi>&#x03B1;</mml:mi></mml:math></inline-formula> denotes the coefficients that define the hyperplane, <inline-formula id="ieqn-16"><mml:math id="mml-ieqn-16"><mml:mi>x</mml:mi></mml:math></inline-formula> denotes the feature values of an observation, and <italic>y</italic> denotes the ground truth class. The error <inline-formula id="ieqn-17"><mml:math id="mml-ieqn-17"><mml:msub><mml:mi>&#x03BC;</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> allows an individual observation <inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> to violate the margin or to be misclassified, and the tuning parameter <italic>C</italic> bounds the sum of these errors, which gives the classifier a soft margin. Therefore, in this study, the accuracy of defect classification was improved by selecting a method for maximizing the margin [<xref ref-type="bibr" rid="ref-49">49</xref>].</p>
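<p>To make the role of the constraints in <xref ref-type="disp-formula" rid="eqn-5">Eqs. (5)</xref>&#x2013;<xref ref-type="disp-formula" rid="eqn-7">(7)</xref> concrete, the following Python sketch checks whether a candidate hyperplane and margin are feasible for a given budget <italic>C</italic>. It does not solve the optimization problem. The function name, the class encoding of &#x2212;1 and +1, and the toy observations are assumptions made for illustration.</p>
<code language="python"><![CDATA[
```python
import math


def soft_margin_check(alpha0, alpha, X, y, M, C):
    """Check constraints (5)-(7) for a candidate hyperplane and margin M."""
    # Constraint (5): scale the coefficients so that their squares sum to 1
    norm = math.sqrt(sum(a * a for a in alpha))
    alpha0, alpha = alpha0 / norm, [a / norm for a in alpha]
    slacks = []
    for x_i, y_i in zip(X, y):
        # Left-hand side of Eq. (6): signed distance of observation i from the hyperplane
        f_i = y_i * (alpha0 + sum(a * v for a, v in zip(alpha, x_i)))
        # Smallest non-negative mu_i that satisfies f_i >= M * (1 - mu_i)
        slacks.append(max(0.0, 1.0 - f_i / M))
    # Constraint (7): the total slack must not exceed the budget C
    return slacks, sum(slacks) <= C


# Hypothetical observations with classes encoded as -1 and +1 (an assumption)
X = [(2.0, 0.0), (-2.0, 1.0), (0.5, 0.0), (-0.5, 0.0)]
y = [1, -1, 1, 1]

# Candidate hyperplane x1 = 0 with margin M = 1
slacks, feasible_c2 = soft_margin_check(0.0, [1.0, 0.0], X, y, M=1.0, C=2.0)
_, feasible_c1 = soft_margin_check(0.0, [1.0, 0.0], X, y, M=1.0, C=1.0)
```
]]></code>
<p>The third observation lies inside the margin and the fourth lies on the wrong side of the hyperplane, so both need a positive error term. The candidate is feasible only when <italic>C</italic> is large enough to cover their sum, which illustrates how <italic>C</italic> controls the softness of the margin.</p>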
<p>The proposed BSEML model is an ensemble technique that combines base-level classifiers to improve prediction performance [<xref ref-type="bibr" rid="ref-50">50</xref>]. Based on the stacking ensemble structure, the error rate of individual classifiers is minimized. A stacking ensemble consists of base-level classifiers and a meta-level classifier. All base-level classifiers are trained with different approaches to perform target tasks using different learning algorithms. The diversity of the ensemble model was improved by selecting different base-level classifiers with different parameter boundaries. Since the classifiers selected in this way were trained with the same extracted features, various predictive models were created from the same input data according to their decision boundaries, thereby preserving the uniqueness of each classifier [<xref ref-type="bibr" rid="ref-33">33</xref>]. The meta-level classifier was trained to integrate the robustness of the different base-level classifiers by identifying which base-level classifier is more accurate for each class of defects when performing the target task.</p>
<p>The base-level classifier output is then provided to the meta-level classifier to make final predictions [<xref ref-type="bibr" rid="ref-51">51</xref>]. In this study, extreme gradient boosting (XGB) was selected as a meta-level classifier to construct a boosted stacking ensemble.</p>
<p>XGB is a widely used tree-based ensemble learning algorithm that is based on the principle of boosting. A strong prediction model is built by weighting the learning error of each weak learner and reflecting it sequentially in the next learning model. Although the model is based on a gradient boosting machine (GBM), it addresses the slow execution time and lack of regularization that are the weaknesses of GBM [<xref ref-type="bibr" rid="ref-52">52</xref>].</p>
<p>In this experiment, the meta-level classifier increased the accuracy of the final predictions by applying weights to the predictions of weak learner models among the base-level classifiers and by performing parallel learning. <xref ref-type="table" rid="table-2">Tab. 2</xref> shows the algorithm for the BSEML model.</p>
<table-wrap id="table-2"><label>Table 2</label><caption><title>Algorithm of the BSEML model</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left"><bold>Algorithm:</bold> Algorithm for proposed model</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left"><bold>Input</bold>: training data <inline-formula id="ieqn-56"><mml:math id="mml-ieqn-56"><mml:mi>D</mml:mi><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mrow><mml:mo>{</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>}</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>; base-level classifiers <inline-formula id="ieqn-57"><mml:math id="mml-ieqn-57"><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>; initialized distribution <inline-formula id="ieqn-58"><mml:math id="mml-ieqn-58"><mml:mi>S</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>; normalization factor <inline-formula id="ieqn-59"><mml:math id="mml-ieqn-59"><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>; meta-level classifier <inline-formula id="ieqn-60"><mml:math id="mml-ieqn-60"><mml:mi>H</mml:mi></mml:math></inline-formula></td>
</tr>
<tr>
<td align="left"><bold>Output</bold>: trained ensemble classifier <inline-formula id="ieqn-61"><mml:math id="mml-ieqn-61"><mml:mi>H</mml:mi></mml:math></inline-formula></td>
</tr>
<tr>
<td align="left">&#x2002;&#x2002;&#x2002;&#x2002;&#x2002;&#x2002;<italic>Initialisation:</italic></td>
</tr>
<tr>
<td align="left">1:&#x2002;&#x2002;&#x2002;&#x2002;<bold>Step 1</bold>: learn base-level classifiers</td>
</tr>
<tr>
<td align="left">2:&#x2002;&#x2002;&#x2002;&#x2002;<bold>for</bold> <inline-formula id="ieqn-63"><mml:math id="mml-ieqn-63"><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula> to <italic>T</italic> do</td>
</tr>
<tr>
<td align="left">4:&#x2002;&#x2002;&#x2002;&#x2002;&#x2002;&#x2002;&#x2002;learn <inline-formula id="ieqn-64"><mml:math id="mml-ieqn-64"><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> based on <inline-formula id="ieqn-65"><mml:math id="mml-ieqn-65"><mml:mi>D</mml:mi></mml:math></inline-formula></td>
</tr>
<tr>
<td align="left">5:&#x2002;&#x2002;&#x2002;&#x2002;<bold>end for</bold></td>
</tr>
<tr>
<td align="left">6:&#x2002;&#x2002;&#x2002;&#x2002;<bold>Step 2</bold>: construct new data set of predictions</td>
</tr>
<tr>
<td align="left">7:&#x2002;&#x2002;&#x2002;&#x2002;<bold>for</bold> <inline-formula id="ieqn-66"><mml:math id="mml-ieqn-66"><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula> to <italic>m</italic> do</td>
</tr>
<tr>
<td align="left">8:&#x2002;&#x2002;&#x2002;&#x2002;&#x2002;&#x2002;&#x2002;<inline-formula id="ieqn-67"><mml:math id="mml-ieqn-67"><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:msubsup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:msup><mml:mi></mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula> where <inline-formula id="ieqn-68"><mml:math id="mml-ieqn-68"><mml:msubsup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:msup><mml:mi></mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula></td>
</tr>
<tr>
<td align="left">9:&#x2002;&#x2002;&#x2002;&#x2002;<bold>End for</bold></td>
</tr>
<tr>
<td align="left">10:&#x2002;&#x2002;&#x2002;&#x2002;<bold>Step 3</bold>: learn a meta-level classifier</td>
</tr>
<tr>
<td align="left">11:&#x2002;&#x2002;&#x2002;&#x2002;<bold>for</bold> <inline-formula id="ieqn-69"><mml:math id="mml-ieqn-69"><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula> to <italic>j</italic> do</td>
</tr>
<tr>
<td align="left">12:&#x2002;&#x2002;&#x2002;&#x2002;&#x2002;&#x2002;&#x2002;Determine weight <inline-formula id="ieqn-70"><mml:math id="mml-ieqn-70"><mml:msub><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> of <inline-formula id="ieqn-71"><mml:math id="mml-ieqn-71"><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula></td>
</tr>
<tr>
<td align="left">13:&#x2002;&#x2002;&#x2002;&#x2002;&#x2002;&#x2002;&#x2002;Initialized distribution <inline-formula id="ieqn-72"><mml:math id="mml-ieqn-72"><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msup><mml:mi>m</mml:mi><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></td>
</tr>
<tr>
<td align="left">14:&#x2002;&#x2002;&#x2002;&#x2002;&#x2002;&#x2002;&#x2002;Update weights</td>
</tr>
<tr>
<td align="left">15:&#x2002;&#x2002;&#x2002;&#x2002;&#x2002;&#x2002;&#x2002;<inline-formula id="ieqn-73"><mml:math id="mml-ieqn-73"><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msub><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msup><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula></td>
</tr>
<tr>
<td align="left">16:&#x2002;&#x2002;&#x2002;&#x2002;&#x2002;&#x2002;&#x2002;<inline-formula id="ieqn-74"><mml:math id="mml-ieqn-74"><mml:mi>H</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>g</mml:mi><mml:mi>n</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mrow><mml:mo>&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo>(</mml:mo><mml:mi>S</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula></td>
</tr>
<tr>
<td align="left">17:&#x2002;&#x2002;&#x2002;&#x2002;<bold>End for</bold></td>
</tr>
<tr>
<td align="left">18:&#x2002;&#x2002;&#x2002;&#x2002;Learn <italic>H</italic> based on <inline-formula id="ieqn-75"><mml:math id="mml-ieqn-75"><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula></td>
</tr>
<tr>
<td align="left">19:&#x2002;&#x2002;&#x2002;&#x2002;<bold>Return</bold> <inline-formula id="ieqn-76"><mml:math id="mml-ieqn-76"><mml:mi>H</mml:mi></mml:math></inline-formula></td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Once the BSEML model is trained, it can be utilized to classify wafer map patterns. Given wafer map <italic>x</italic> as a new input, the predictive label <italic>y</italic> is derived by the following process. The wafer map <italic>x</italic> is augmented by the CAE model and passed to <inline-formula id="ieqn-19"><mml:math id="mml-ieqn-19"><mml:msup><mml:mi>g</mml:mi><mml:mrow><mml:mi>M</mml:mi><mml:mi>F</mml:mi><mml:mi>E</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> to generate a feature vector. The feature vectors produced by <inline-formula id="ieqn-21"><mml:math id="mml-ieqn-21"><mml:msup><mml:mi>g</mml:mi><mml:mrow><mml:mi>M</mml:mi><mml:mi>F</mml:mi><mml:mi>E</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> are aggregated and used as inputs to the BSEML model, which yields the final probability prediction <inline-formula id="ieqn-20"><mml:math id="mml-ieqn-20"><mml:msup><mml:mrow><mml:mover><mml:mi>H</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>B</mml:mi><mml:mi>S</mml:mi><mml:mi>E</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>. When training data <italic>D</italic> is input to the BSEML model, the base-level classifiers are trained. Next, a new dataset <inline-formula id="ieqn-22"><mml:math id="mml-ieqn-22"><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is created from the predictions of the base-level classifiers <italic>h</italic>. Finally, when <inline-formula id="ieqn-23"><mml:math id="mml-ieqn-23"><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is passed to the meta-level classifier, the weights of the learners are updated to perform the final class prediction. This gives high weights to weak learners with low accuracy and low weights to strong learners with high accuracy, resulting in appropriate weight updates [<xref ref-type="bibr" rid="ref-53">53</xref>].
<disp-formula id="eqn-8"><label>(8)</label><mml:math id="mml-eqn-8" display="block"><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mo>=</mml:mo><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mtext mathvariant="italic">argmax</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mrow><mml:mover><mml:mi>H</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>B</mml:mi><mml:mi>S</mml:mi><mml:mi>E</mml:mi></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p><xref ref-type="disp-formula" rid="eqn-8">Eq. (8)</xref> shows the equation for the final class prediction. The final prediction <inline-formula id="ieqn-25"><mml:math id="mml-ieqn-25"><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow></mml:math></inline-formula> for the wafer map <italic>x</italic> is the class corresponding to the largest element of <inline-formula id="ieqn-24"><mml:math id="mml-ieqn-24"><mml:msup><mml:mrow><mml:mover><mml:mi>H</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>B</mml:mi><mml:mi>S</mml:mi><mml:mi>E</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>.</p>
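<p>The stacking steps of <xref ref-type="table" rid="table-2">Tab. 2</xref> and the final prediction of <xref ref-type="disp-formula" rid="eqn-8">Eq. (8)</xref> can be sketched as follows. This is a toy illustration under stated assumptions, not the BSEML implementation: the base-level classifiers are hand-written threshold rules that stand in for already trained models, and the meta-level learner is a simple frequency table that stands in for XGB. All names and data are hypothetical.</p>
<code language="python"><![CDATA[
```python
from collections import Counter, defaultdict


def fit_meta(D_h):
    """Stand-in meta-level learner: class frequencies per tuple of base predictions."""
    table = defaultdict(Counter)
    for x_prime, y in D_h:
        table[x_prime][y] += 1
    return table


def predict(x, base, meta, classes):
    """Return the class with the largest meta-level probability (Eq. 8)."""
    x_prime = tuple(h(x) for h in base)
    counts = meta.get(x_prime, Counter())
    total = sum(counts.values()) or 1
    H_hat = [counts[c] / total for c in classes]  # probability vector
    return classes[H_hat.index(max(H_hat))]       # argmax over the classes


# Hypothetical one-dimensional data; the true class boundary lies near 0.4
D = [(0.1, "A"), (0.3, "A"), (0.45, "B"), (0.6, "B"), (0.9, "B")]

# Step 1 stand-in: two base-level classifiers, assumed to be already trained
base = [lambda x: "A" if x < 0.5 else "B",   # h_1 misclassifies x = 0.45
        lambda x: "A" if x < 0.4 else "B"]   # h_2

# Step 2: D_h pairs each vector of base predictions with the true label
D_h = [(tuple(h(x) for h in base), y) for x, y in D]

# Step 3: train the meta-level classifier on D_h
meta = fit_meta(D_h)
label = predict(0.45, base, meta, classes=["A", "B"])
```
]]></code>
<p>In this toy case the two base-level classifiers disagree on the input 0.45, and the meta-level learner resolves the disagreement in favor of the class that was observed for that combination of base predictions in the training data.</p>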
</sec>
</sec>
<sec id="s4"><label>4</label><title>Experimental Analysis</title>
<sec id="s4_1"><label>4.1</label><title>Data Description</title>
<p>The WM-811K dataset, obtained in an actual industrial process, was used in this study; the dataset is publicly available from [<xref ref-type="bibr" rid="ref-54">54</xref>]. The dataset contains 811,457 wafer maps generated from over 40,000 detectors during circuit testing in the manufacturing process. Domain experts labeled the defect patterns of 172,950 of these wafer maps. For the experiment, only the labeled wafer maps were used. A labeled wafer map belongs to one of nine classes: Center, Donut, Edge-Ring, Edge-Local, Local, Random, Near-full, Scratch, and None. Each wafer map was checked as a two-dimensional array before being passed on to augmentation preprocessing.</p>
<p>As feature extraction was not possible for wafer map arrays with fewer than 100 defective elements, four abnormal wafer maps were removed. These four wafer maps were found to belong to the None class. Therefore, the number of labeled wafer maps was reduced to 172,946. <xref ref-type="table" rid="table-3">Tab. 3</xref> shows the class distribution in the labeled dataset. The None class accounts for the largest share of the total. The shape of the wafer maps varies from (26, 26) to (300, 300). A dataset obtained from the actual process contains very few defect patterns, and collecting more requires considerable money and time. Therefore, 14,326 training samples were extracted by random sampling from the labeled dataset. To apply the wafer maps to the later processes, all wafer maps were reshaped to (32, 32), at which the defect patterns were evenly distributed.</p>
<table-wrap id="table-3"><label>Table 3</label><caption><title>Distribution of defect classes in the labeled dataset</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Class Index</th>
<th align="left">Defect pattern</th>
<th align="left">Number of wafer maps</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">1</td>
<td align="left">Center</td>
<td align="left">4294</td>
</tr>
<tr>
<td align="left">2</td>
<td align="left">Donut</td>
<td align="left">555</td>
</tr>
<tr>
<td align="left">3</td>
<td align="left">Edge-local</td>
<td align="left">5189</td>
</tr>
<tr>
<td align="left">4</td>
<td align="left">Edge-ring</td>
<td align="left">9680</td>
</tr>
<tr>
<td align="left">5</td>
<td align="left">Local</td>
<td align="left">3593</td>
</tr>
<tr>
<td align="left">6</td>
<td align="left">Near-full</td>
<td align="left">149</td>
</tr>
<tr>
<td align="left">7</td>
<td align="left">Random</td>
<td align="left">866</td>
</tr>
<tr>
<td align="left">8</td>
<td align="left">Scratch</td>
<td align="left">1193</td>
</tr>
<tr>
<td align="left">9</td>
<td align="left">None</td>
<td align="left">147427</td>
</tr>
<tr>
<td align="left">Total</td>
<td align="left"/>
<td align="left">172946</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s4_2"><label>4.2</label><title>Convolutional Auto-Encoder for Data Augmentation</title>
<p>In a dataset acquired from an actual process, the amount of data differs between defect classes, and in severe cases the data are concentrated in the majority class. Machine learning algorithms generally learn under the assumption that the classes are present in equal proportions. On a dataset with class imbalance, a model therefore learns imprecisely and is biased toward the class that occupies a large proportion of the dataset [<xref ref-type="bibr" rid="ref-55">55</xref>].</p>
<p>The WM-811K dataset used in this study is imbalanced. The None class accounts for about 85&#x0025; of the labeled wafer maps (<xref ref-type="table" rid="table-3">Tab. 3</xref>), and there are insufficient defect patterns in the Donut and Near-full classes.</p>
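<p>The degree of imbalance can be checked directly from the counts in <xref ref-type="table" rid="table-3">Tab. 3</xref>. The following Python sketch computes the share of each class and the ratio between the largest and smallest classes. The variable names are assumptions, and the None count is derived here as the table total minus the eight defect classes rather than entered directly.</p>
<code language="python"><![CDATA[
```python
# Counts of the eight classes with a visible defect pattern, taken from Tab. 3
defect_counts = {
    "Center": 4294, "Donut": 555, "Edge-local": 5189, "Edge-ring": 9680,
    "Local": 3593, "Near-full": 149, "Random": 866, "Scratch": 1193,
}
total_labeled = 172946  # total number of labeled wafer maps in Tab. 3

# The None count is derived as the remainder of the total
counts = dict(defect_counts)
counts["None"] = total_labeled - sum(defect_counts.values())

# Share of each class in the labeled dataset
shares = {name: n / total_labeled for name, n in counts.items()}

# Ratio between the largest and the smallest class
imbalance_ratio = max(counts.values()) / min(counts.values())

for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:10s} {share:7.2%}")
```
]]></code>
<p>Under these counts the majority class is several hundred times larger than the smallest class, which is the situation that the augmentation step is meant to mitigate.</p>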
<p>Therefore, in order to expand the number of defect images in the dataset and improve the generalization ability of the model, a data augmentation method based on CAE was used [<xref ref-type="bibr" rid="ref-56">56</xref>].</p>
<p>CAE is a variant of convolutional neural networks that is used as a tool for unsupervised learning of convolution filters [<xref ref-type="bibr" rid="ref-57">57</xref>]. CAE is usually applied in image reconstruction processes to minimize reconstruction errors by learning optimal filters. In a standard AE, the images must be flattened into single vectors, and the network must be designed around the resulting constraint on the number of inputs.</p>
<p>However, unlike a standard AE, which completely ignores the 2D image structure, CAE is a feature extractor that can learn directly from two-dimensional images [<xref ref-type="bibr" rid="ref-58">58</xref>]. <xref ref-type="fig" rid="fig-8">Fig. 8</xref> shows the CAE parameters and architecture employed in this study. For each 2D convolution layer of the encoder and decoder of the CAE model, a kernel size of (3, 3) was used, and MaxPooling2D was applied. The kernel size of the pooling layer was (2, 2). ReLU was used as the activation function of all layers, and a sigmoid was used for the deconvolution layer at the end of the model. The entire process is as follows. The input image data pass through the convolution layers of the encoder unit while the spatial information is maintained. The information then passes through the central latent space layer and finally, with noise added, through the decoder unit. The noise scale was set to 10&#x0025; to minimize the effect on the defect pattern in this study [<xref ref-type="bibr" rid="ref-59">59</xref>].</p>
<fig id="fig-8"><label>Figure 8</label><caption><title>Architecture of the CAE model</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_33417-fig-8.png"/></fig>
</sec>
<sec id="s4_3"><label>4.3</label><title>Structural Similarity Index Measure Methods for Augmented Data Validation</title>
<p>SSIM was used to compare the original wafer image data with the augmented wafer image data. SSIM is a method designed to evaluate visual similarity rather than numerical error. It extracts the structural information of the images and compares the degree of distortion of that structural information [<xref ref-type="bibr" rid="ref-60">60</xref>]. The SSIM equation is expressed as <xref ref-type="disp-formula" rid="eqn-9">Eq. (9)</xref>, and the equations that follow it represent its component terms [<xref ref-type="bibr" rid="ref-3">3</xref>].
<disp-formula id="eqn-9"><label>(9)</label><mml:math id="mml-eqn-9" display="block"><mml:mi>S</mml:mi><mml:mi>S</mml:mi><mml:mi>I</mml:mi><mml:mi>M</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mn>2</mml:mn><mml:msub><mml:mi>&#x03BB;</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>&#x03BB;</mml:mi><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>C</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mn>2</mml:mn><mml:msub><mml:mi>o</mml:mi><mml:mrow><mml:mi>x</mml:mi><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>C</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x03BB;</mml:mi></mml:mrow><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x03BB;</mml:mi></mml:mrow><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msub><mml:mi>C</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mrow><mml:mi>o</mml:mi></mml:mrow><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mrow><mml:mi>o</mml:mi></mml:mrow><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msub><mml:mi>C</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:math></disp-formula>
<disp-formula id="eqn-10"><label>(10)</label><mml:math id="mml-eqn-10" display="block"><mml:mi>L</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>2</mml:mn><mml:msub><mml:mi>&#x03BB;</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>&#x03BB;</mml:mi><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>C</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msubsup><mml:mrow><mml:mi>&#x03BB;</mml:mi></mml:mrow><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x03BB;</mml:mi></mml:mrow><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msub><mml:mi>C</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
<p>In <xref ref-type="disp-formula" rid="eqn-10">Eq. (10)</xref>, which compares the luminance of the two images, <inline-formula id="ieqn-26"><mml:math id="mml-ieqn-26"><mml:msub><mml:mi>&#x03BB;</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> denotes the mean of image <italic>x</italic>, <inline-formula id="ieqn-27"><mml:math id="mml-ieqn-27"><mml:msub><mml:mi>&#x03BB;</mml:mi><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> denotes the mean of image <inline-formula id="ieqn-28"><mml:math id="mml-ieqn-28"><mml:mrow><mml:mtext>y</mml:mtext></mml:mrow></mml:math></inline-formula>, and <inline-formula id="ieqn-29"><mml:math id="mml-ieqn-29"><mml:msub><mml:mi>C</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> denotes the normalization constant for brightness. The following equation compares the contrast of the two images.
<disp-formula id="eqn-11"><label>(11)</label><mml:math id="mml-eqn-11" display="block"><mml:mi>C</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>2</mml:mn><mml:msub><mml:mi>o</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>o</mml:mi><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>C</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msubsup><mml:mrow><mml:mi>o</mml:mi></mml:mrow><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msubsup><mml:mrow><mml:mi>o</mml:mi></mml:mrow><mml:mrow><mml:mi>y</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msub><mml:mi>C</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
<p>In <xref ref-type="disp-formula" rid="eqn-11">Eq. (11)</xref>, <inline-formula id="ieqn-30"><mml:math id="mml-ieqn-30"><mml:msub><mml:mi>o</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> denotes the standard deviation of image <italic>x</italic>, <inline-formula id="ieqn-31"><mml:math id="mml-ieqn-31"><mml:msub><mml:mi>o</mml:mi><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> denotes the standard deviation of image <italic>y</italic>, and <inline-formula id="ieqn-32"><mml:math id="mml-ieqn-32"><mml:msub><mml:mi>C</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> denotes the stabilizing constant of the contrast term. The following equation compares the structure of the two images.
<disp-formula id="eqn-12"><label>(12)</label><mml:math id="mml-eqn-12" display="block"><mml:mi>S</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mi>o</mml:mi><mml:mrow><mml:mi>x</mml:mi><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>C</mml:mi><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>o</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>o</mml:mi><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>C</mml:mi><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
<p>In <xref ref-type="disp-formula" rid="eqn-12">Eq. (12)</xref>, <inline-formula id="ieqn-33"><mml:math id="mml-ieqn-33"><mml:msub><mml:mi>o</mml:mi><mml:mrow><mml:mi>x</mml:mi><mml:mi>y</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> denotes the covariance between <italic>x</italic> and <inline-formula id="ieqn-34"><mml:math id="mml-ieqn-34"><mml:mi>y</mml:mi></mml:math></inline-formula>, and <italic>C</italic><sub>3</sub> denotes the stabilizing constant of the structure term. <italic>S</italic>(<italic>x</italic>, <italic>y</italic>) therefore approximates the correlation coefficient between <italic>x</italic> and <italic>y</italic>, which was calculated to compare the structures of the original image and the augmented image.</p>
<p>The raw and augmented images were compared using the SSIM measure, and only augmented images with an SSIM value of 90&#x0025; or higher were used as input to the feature extraction model [<xref ref-type="bibr" rid="ref-61">61</xref>]. <xref ref-type="fig" rid="fig-9">Fig. 9</xref> shows a comparison between the raw image and the augmented image.</p>
<fig id="fig-9"><label>Figure 9</label><caption><title>Comparison of the generated images with the original image</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_33417-fig-9.png"/></fig>
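<p>As an illustration of the SSIM-based filtering described above, the following is a minimal sketch rather than the implementation used in this study. It evaluates <xref ref-type="disp-formula" rid="eqn-9">Eq. (9)</xref> once over the whole image instead of over local windows, and the function names <monospace>ssim_global</monospace> and <monospace>keep_similar</monospace>, the 8-bit data range of 255, and the constants <italic>K</italic><sub>1</sub> = 0.01 and <italic>K</italic><sub>2</sub> = 0.03 are assumptions that are not stated in the text.</p>
<code language="python">
```python
def ssim_global(x, y, data_range=255.0, k1=0.01, k2=0.03):
    """Single-window SSIM of two equally sized grayscale images (lists of rows).

    Assumption: statistics are taken over the whole image, not local windows.
    """
    px = [float(v) for row in x for v in row]
    py = [float(v) for row in y for v in row]
    if not px or len(px) != len(py):
        raise ValueError("images must be non-empty and have the same size")
    n = len(px)
    c1 = (k1 * data_range) ** 2  # stabilizer of the luminance term (C1)
    c2 = (k2 * data_range) ** 2  # stabilizer of the contrast term (C2)
    mean_x = sum(px) / n
    mean_y = sum(py) / n
    var_x = sum((a - mean_x) * (a - mean_x) for a in px) / n
    var_y = sum((b - mean_y) * (b - mean_y) for b in py) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(px, py)) / n
    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


def keep_similar(original, candidates, threshold=0.9):
    """Keep augmented images whose SSIM to the original reaches the threshold."""
    return [img for img in candidates if ssim_global(original, img) >= threshold]


if __name__ == "__main__":
    raw = [[0, 255], [255, 0]]        # hypothetical 2x2 image
    inverted = [[255, 0], [0, 255]]   # structurally opposite image
    print(ssim_global(raw, raw), ssim_global(raw, inverted))
    print(len(keep_similar(raw, [raw, inverted])))
```
</code>
<p>An identical copy passes the 0.9 threshold, whereas the inverted image has a negative covariance with the original and is discarded.</p>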
</sec>
<sec id="s4_4"><label>4.4</label><title>Experimental Settings</title>
<p>In this study, four dataset cases were constructed from the training dataset for performance evaluation. Because the defect classes are highly imbalanced in the original data, three levels of augmentation were applied to mitigate this problem. <xref ref-type="table" rid="table-4">Tab. 4</xref> shows the wafer map data organized by augmentation ratio. Case 1 consists of the raw wafer data without augmentation. Cases 2, 3, and 4 consist of the raw wafer data with 30&#x0025;, 40&#x0025;, and 50&#x0025; augmentation, respectively. During augmentation, the ratio of each defect pattern in the original data was maintained as far as possible. The None class was excluded from augmentation because it contains far more data than the defect classes. In each experiment, 80&#x0025; of the data was used as the training dataset, and the remaining 20&#x0025; was used as the test dataset.</p>
<table-wrap id="table-4"><label>Table 4</label><caption><title>Distribution of generated images</title></caption>
 
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Case index</th>
<th align="left">Defect pattern type</th>
<th align="left">Number of wafer maps</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="9">Case 1 (Original)</td>
<td align="left">Center</td>
<td align="left">90</td>
</tr>
<tr>
<td align="left">Donut</td>
<td align="left">12</td>
</tr>
<tr>
<td align="left">Edge-Loc</td>
<td align="left">285</td>
</tr>
<tr>
<td align="left">Edge-Ring</td>
<td align="left">31</td>
</tr>
<tr>
<td align="left">Loc</td>
<td align="left">297</td>
</tr>
<tr>
<td align="left">Near-Full</td>
<td align="left">23</td>
</tr>
<tr>
<td align="left">Random</td>
<td align="left">74</td>
</tr>
<tr>
<td align="left">Scratch</td>
<td align="left">65</td>
</tr>
<tr>
<td align="left">None</td>
<td align="left">13,489</td>
</tr>
<tr>
<td align="left"><bold>Total</bold></td>
<td align="left"/>
<td align="left"><bold>14,366</bold></td>
</tr>
<tr>
<td align="left" rowspan="9">Case 2 (30&#x0025; augmentation)</td>
<td align="left">Center</td>
<td align="left">630</td>
</tr>
<tr>
<td align="left">Donut</td>
<td align="left">80</td>
</tr>
<tr>
<td align="left">Edge-Loc</td>
<td align="left">888</td>
</tr>
<tr>
<td align="left">Edge-Ring</td>
<td align="left">527</td>
</tr>
<tr>
<td align="left">Loc</td>
<td align="left">891</td>
</tr>
<tr>
<td align="left">Near-Full</td>
<td align="left">96</td>
</tr>
<tr>
<td align="left">Random</td>
<td align="left">592</td>
</tr>
<tr>
<td align="left">Scratch</td>
<td align="left">576</td>
</tr>
<tr>
<td align="left">None</td>
<td align="left">13,489</td>
</tr>
<tr>
<td align="left"><bold>Total</bold></td>
<td align="left"/>
<td align="left"><bold>17,769</bold></td>
</tr>
<tr>
<td align="left" rowspan="9">Case 3 (40&#x0025; augmentation)</td>
<td align="left">Center</td>
<td align="left">900</td>
</tr>
<tr>
<td align="left">Donut</td>
<td align="left">252</td>
</tr>
<tr>
<td align="left">Edge-Loc</td>
<td align="left">1,184</td>
</tr>
<tr>
<td align="left">Edge-Ring</td>
<td align="left">806</td>
</tr>
<tr>
<td align="left">Loc</td>
<td align="left">1,188</td>
</tr>
<tr>
<td align="left">Near-Full</td>
<td align="left">268</td>
</tr>
<tr>
<td align="left">Random</td>
<td align="left">888</td>
</tr>
<tr>
<td align="left">Scratch</td>
<td align="left">864</td>
</tr>
<tr>
<td align="left">None</td>
<td align="left">13,489</td>
</tr>
<tr>
<td align="left"><bold>Total</bold></td>
<td align="left"/>
<td align="left"><bold>19,839</bold></td>
</tr>
<tr>
<td align="left" rowspan="9">Case 4 (50&#x0025; augmentation)</td>
<td align="left">Center</td>
<td align="left">1,170</td>
</tr>
<tr>
<td align="left">Donut</td>
<td align="left">302</td>
</tr>
<tr>
<td align="left">Edge-Loc</td>
<td align="left">1,480</td>
</tr>
<tr>
<td align="left">Edge-Ring</td>
<td align="left">1,054</td>
</tr>
<tr>
<td align="left">Loc</td>
<td align="left">1,485</td>
</tr>
<tr>
<td align="left">Near-Full</td>
<td align="left">324</td>
</tr>
<tr>
<td align="left">Random</td>
<td align="left">1,110</td>
</tr>
<tr>
<td align="left">Scratch</td>
<td align="left">1,080</td>
</tr>
<tr>
<td align="left">None</td>
<td align="left">13,489</td>
</tr>
<tr>
<td align="left"><bold>Total</bold></td>
<td align="left"/>
<td align="left"><bold>21,494</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
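<p>The class imbalance in each case can be checked directly from the counts in <xref ref-type="table" rid="table-4">Tab. 4</xref>. The short sketch below transcribes those counts and computes, for each case, the total number of wafer maps and the share of defect (non-None) maps. The helper name <monospace>summarize</monospace> is hypothetical and serves only this illustration.</p>
<code language="python">
```python
# Per-class wafer map counts transcribed from Table 4 (Cases 1 to 4).
CLASSES = ["Center", "Donut", "Edge-Loc", "Edge-Ring", "Loc",
           "Near-Full", "Random", "Scratch", "None"]
COUNTS = {
    "Case 1": [90, 12, 285, 31, 297, 23, 74, 65, 13489],
    "Case 2": [630, 80, 888, 527, 891, 96, 592, 576, 13489],
    "Case 3": [900, 252, 1184, 806, 1188, 268, 888, 864, 13489],
    "Case 4": [1170, 302, 1480, 1054, 1485, 324, 1110, 1080, 13489],
}


def summarize(counts):
    """Return the total, the number of defect maps, and the defect share."""
    total = sum(counts)
    defects = total - counts[CLASSES.index("None")]
    return {"total": total, "defects": defects, "defect_share": defects / total}


for case, counts in COUNTS.items():
    s = summarize(counts)
    print(f"{case}: total={s['total']}, defects={s['defects']}, "
          f"defect share={s['defect_share']:.1%}")
```
</code>
<p>With these counts, the defect share rises from about 6&#x0025; in Case 1 to about 37&#x0025; in Case 4, while the None class stays fixed at 13,489 maps.</p>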
<p>The experiments were performed using Python 3.6 in an Ubuntu 12.04 environment, and the handcrafted features were extracted with the scikit-image library [<xref ref-type="bibr" rid="ref-62">62</xref>]. The scikit-learn, Ensemble-Pytorch, and XGBoost libraries were used to train the proposed and compared models [<xref ref-type="bibr" rid="ref-63">63</xref>&#x2013;<xref ref-type="bibr" rid="ref-65">65</xref>].</p>
</sec>
<sec id="s4_5"><label>4.5</label><title>Validation Methods</title>
<p>Macro-average <inline-formula id="ieqn-35"><mml:math id="mml-ieqn-35"><mml:mi>F</mml:mi><mml:mn>1</mml:mn></mml:math></inline-formula>, micro-average <inline-formula id="ieqn-36"><mml:math id="mml-ieqn-36"><mml:mi>F</mml:mi><mml:mn>1</mml:mn></mml:math></inline-formula>, and confusion matrix were used to evaluate classification performance. These are performance metrics commonly used for classifying wafer map patterns, mainly on imbalanced data [<xref ref-type="bibr" rid="ref-66">66</xref>]. <inline-formula id="ieqn-37"><mml:math id="mml-ieqn-37"><mml:mi>F</mml:mi><mml:msub><mml:mn>1</mml:mn><mml:mrow><mml:mrow><mml:mtext mathvariant="italic">macro</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-38"><mml:math id="mml-ieqn-38"><mml:mi>F</mml:mi><mml:msub><mml:mn>1</mml:mn><mml:mrow><mml:mrow><mml:mtext mathvariant="italic">micro</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> are described by the following expressions.
<disp-formula id="eqn-13"><label>(13)</label><mml:math id="mml-eqn-13" display="block"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:math></disp-formula>
<disp-formula id="eqn-14"><label>(14)</label><mml:math id="mml-eqn-14" display="block"><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:math></disp-formula>
<disp-formula id="eqn-15"><label>(15)</label><mml:math id="mml-eqn-15" display="block"><mml:mi>F</mml:mi><mml:msub><mml:mn>1</mml:mn><mml:mrow><mml:mrow><mml:mtext mathvariant="italic">macro</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>C</mml:mi></mml:mfrac><mml:msubsup><mml:mrow><mml:mo>&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>C</mml:mi></mml:mrow></mml:msubsup><mml:mfrac><mml:mrow><mml:mn>2</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:math></disp-formula>
<disp-formula id="eqn-16"><label>(16)</label><mml:math id="mml-eqn-16" display="block"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="italic">micro</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msubsup><mml:mrow><mml:mo>&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>C</mml:mi></mml:mrow></mml:msubsup><mml:mi>T</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msubsup><mml:mrow><mml:mo>&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>C</mml:mi></mml:mrow></mml:msubsup><mml:mi>T</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msubsup><mml:mrow><mml:mo>&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>C</mml:mi></mml:mrow></mml:msubsup><mml:mi>F</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:math></disp-formula>
<disp-formula id="eqn-17"><label>(17)</label><mml:math id="mml-eqn-17" display="block"><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="italic">micro</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msubsup><mml:mrow><mml:mo>&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>C</mml:mi></mml:mrow></mml:msubsup><mml:mi>T</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msubsup><mml:mrow><mml:mo>&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>C</mml:mi></mml:mrow></mml:msubsup><mml:mi>T</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msubsup><mml:mrow><mml:mo>&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>C</mml:mi></mml:mrow></mml:msubsup><mml:mi>F</mml:mi><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:math></disp-formula>
<disp-formula id="eqn-18"><label>(18)</label><mml:math id="mml-eqn-18" display="block"><mml:mi>F</mml:mi><mml:msub><mml:mn>1</mml:mn><mml:mrow><mml:mrow><mml:mtext mathvariant="italic">micro</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>2</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="italic">micro</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="italic">micro</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="italic">micro</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mrow><mml:mtext mathvariant="italic">micro</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:math></disp-formula>where, <inline-formula id="ieqn-39"><mml:math id="mml-ieqn-39"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-40"><mml:math id="mml-ieqn-40"><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> denote precision and recall, respectively. <inline-formula id="ieqn-41"><mml:math id="mml-ieqn-41"><mml:mi>F</mml:mi><mml:msub><mml:mn>1</mml:mn><mml:mrow><mml:mrow><mml:mtext mathvariant="italic">macro</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> calculates the unweighted average <inline-formula id="ieqn-42"><mml:math id="mml-ieqn-42"><mml:mi>F</mml:mi><mml:mn>1</mml:mn></mml:math></inline-formula> score for each class. 
Here, <italic>TP<sub>k</sub></italic>, <italic>FP<sub>k</sub></italic>, and <italic>FN<sub>k</sub></italic> in <xref ref-type="disp-formula" rid="eqn-13">Eqs. (13)</xref>&#x2013;<xref ref-type="disp-formula" rid="eqn-18">(18)</xref> denote the numbers of true positives, false positives, and false negatives for class <italic>k</italic>, and <italic>C</italic> denotes the number of classes. Because it gives the same weight to every class, <inline-formula id="ieqn-43"><mml:math id="mml-ieqn-43"><mml:mi>F</mml:mi><mml:msub><mml:mn>1</mml:mn><mml:mrow><mml:mrow><mml:mtext mathvariant="italic">macro</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> reflects the performance on minority classes in data with class imbalance. In contrast, <inline-formula id="ieqn-44"><mml:math id="mml-ieqn-44"><mml:mi>F</mml:mi><mml:msub><mml:mn>1</mml:mn><mml:mrow><mml:mrow><mml:mtext mathvariant="italic">micro</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> gives the same weight to every sample, so a class with a large amount of data dominates the score in data with class imbalance. In this experiment, <inline-formula id="ieqn-45"><mml:math id="mml-ieqn-45"><mml:mi>F</mml:mi><mml:msub><mml:mn>1</mml:mn><mml:mrow><mml:mrow><mml:mtext mathvariant="italic">macro</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> was generally smaller than <inline-formula id="ieqn-46"><mml:math id="mml-ieqn-46"><mml:mi>F</mml:mi><mml:msub><mml:mn>1</mml:mn><mml:mrow><mml:mrow><mml:mtext mathvariant="italic">micro</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> because of the class imbalance in the dataset.</p>
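<p>The two averages in <xref ref-type="disp-formula" rid="eqn-13">Eqs. (13)</xref>&#x2013;<xref ref-type="disp-formula" rid="eqn-18">(18)</xref> can be sketched in a few lines of Python. This is an illustrative sketch, not the evaluation code of this study. The function name <monospace>f1_macro_micro</monospace>, the convention of returning 0 when a denominator is zero, and the toy labels are assumptions.</p>
<code language="python">
```python
def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero (assumed convention)."""
    return num / den if den else 0.0


def f1_macro_micro(y_true, y_pred, labels):
    """Compute (F1_macro, F1_micro); every label in the data must be in labels."""
    tp = {k: 0 for k in labels}
    fp = {k: 0 for k in labels}
    fn = {k: 0 for k in labels}
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p gains a false positive
            fn[t] += 1  # actual class t gains a false negative

    per_class_f1 = []
    for k in labels:
        p_k = safe_div(tp[k], tp[k] + fp[k])                      # Eq. (13)
        r_k = safe_div(tp[k], tp[k] + fn[k])                      # Eq. (14)
        per_class_f1.append(safe_div(2 * p_k * r_k, p_k + r_k))
    f1_macro = sum(per_class_f1) / len(labels)                    # Eq. (15)

    tp_sum, fp_sum, fn_sum = sum(tp.values()), sum(fp.values()), sum(fn.values())
    p_micro = safe_div(tp_sum, tp_sum + fp_sum)                   # Eq. (16)
    r_micro = safe_div(tp_sum, tp_sum + fn_sum)                   # Eq. (17)
    f1_micro = safe_div(2 * p_micro * r_micro, p_micro + r_micro) # Eq. (18)
    return f1_macro, f1_micro


if __name__ == "__main__":
    # Toy imbalanced example: eight None maps and two Scratch maps.
    y_true = ["None"] * 8 + ["Scratch"] * 2
    y_pred = ["None"] * 8 + ["None", "Scratch"]
    print(f1_macro_micro(y_true, y_pred, ["None", "Scratch"]))
```
</code>
<p>In the toy example, one of the two minority-class samples is misclassified, which lowers the macro average more than the micro average.</p>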
<p>The confusion matrix is a table that summarizes the performance of a trained classifier. Each row of the matrix corresponds to the instances of a predicted class, and each column corresponds to the instances of an actual class. The confusion matrix used in this experiment was normalized for effective analysis [<xref ref-type="bibr" rid="ref-67">67</xref>].</p>
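<p>A normalized confusion matrix with rows as predicted classes and columns as actual classes can be sketched as follows. The text does not state the normalization axis. This sketch assumes that each column is divided by the number of samples of the corresponding actual class, so that each non-empty column sums to one. The function name <monospace>normalized_confusion</monospace> is hypothetical.</p>
<code language="python">
```python
def normalized_confusion(y_true, y_pred, labels):
    """Confusion matrix with rows = predicted class and columns = actual class.

    Assumption: each column is normalized by the size of its actual class.
    """
    index = {k: i for i, k in enumerate(labels)}
    n = len(labels)
    counts = [[0] * n for _ in range(n)]
    for t, p in zip(y_true, y_pred):
        counts[index[p]][index[t]] += 1
    col_sums = [sum(counts[r][c] for r in range(n)) for c in range(n)]
    return [
        [counts[r][c] / col_sums[c] if col_sums[c] else 0.0 for c in range(n)]
        for r in range(n)
    ]


if __name__ == "__main__":
    y_true = ["None"] * 8 + ["Scratch"] * 2
    y_pred = ["None"] * 8 + ["None", "Scratch"]
    for row in normalized_confusion(y_true, y_pred, ["None", "Scratch"]):
        print(row)
```
</code>
<p>Under this normalization, the diagonal entries equal the per-class recall, which makes minority classes comparable with the None class despite the difference in sample counts.</p>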
</sec>
<sec id="s4_6"><label>4.6</label><title>Compared Models</title>
<p>The proposed model was compared with two base-level classifiers and four ensemble-based models. <xref ref-type="table" rid="table-5">Tab. 5</xref> shows the <inline-formula id="ieqn-47"><mml:math id="mml-ieqn-47"><mml:mi>F</mml:mi><mml:mn>1</mml:mn></mml:math></inline-formula> scores of the proposed and compared models for each data case. The two base-level classifiers, SVM and KNN, are provided by the scikit-learn library and are also used in the stacking layer of the BSEML model. The four ensemble-based models are voting, stacking, bagging, and boosting models. For the voting ensemble model, SVM, KNN, RF, and DT were selected as internal models in the same way as for the BSEML model, and they were combined using the soft voting method. The internal models of the stacking classifier were also selected in the same way as for the BSEML model, and DT was used as the meta-level classifier [<xref ref-type="bibr" rid="ref-68">68</xref>]. For the bagging model, the BaggingClassifier provided by scikit-learn was used, with <inline-formula id="ieqn-48"><mml:math id="mml-ieqn-48"><mml:mi>b</mml:mi><mml:mi>a</mml:mi><mml:mi>s</mml:mi><mml:mi>e</mml:mi><mml:mi mathvariant="normal">&#x005F;</mml:mi><mml:mrow><mml:mtext mathvariant="italic">estimator</mml:mtext></mml:mrow></mml:math></inline-formula> set to an RF-based classifier and n_estimators set to 100. GBM was used as the boosting model, with <inline-formula id="ieqn-49"><mml:math id="mml-ieqn-49"><mml:mi>n</mml:mi><mml:mi mathvariant="normal">&#x005F;</mml:mi><mml:mrow><mml:mtext mathvariant="italic">estimators</mml:mtext></mml:mrow></mml:math></inline-formula> set to 200, considering its robustness to overfitting [<xref ref-type="bibr" rid="ref-69">69</xref>].</p>
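<p>The soft voting rule used by the voting ensemble can be illustrated without any library: the class probabilities predicted by the internal models are averaged, and the class with the highest average is selected. The sketch below is a simplified illustration with equal model weights. The function name <monospace>soft_vote</monospace> and the probability values are hypothetical and are not results of this study.</p>
<code language="python">
```python
def soft_vote(probabilities, classes):
    """Average per-model class probabilities and return (winner, averages).

    probabilities: one list per model, ordered like classes.
    Assumption: all models carry equal weight.
    """
    n_models = len(probabilities)
    averaged = [
        sum(model[i] for model in probabilities) / n_models
        for i in range(len(classes))
    ]
    best = max(range(len(classes)), key=averaged.__getitem__)
    return classes[best], averaged


if __name__ == "__main__":
    classes = ["Center", "Loc", "None"]
    # Hypothetical outputs of four internal models (e.g., SVM, KNN, RF, DT).
    probabilities = [
        [0.55, 0.45, 0.00],
        [0.52, 0.48, 0.00],
        [0.10, 0.85, 0.05],
        [0.30, 0.60, 0.10],
    ]
    print(soft_vote(probabilities, classes))
```
</code>
<p>In this example, two models narrowly prefer Center and two prefer Loc, so a majority vote would tie, whereas soft voting selects Loc because the models that prefer it are more confident.</p>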
<table-wrap id="table-5"><label>Table 5</label><caption><title><inline-formula id="ieqn-77"><mml:math id="mml-ieqn-77"><mml:mi mathvariant="bold-italic">F</mml:mi><mml:mn>1</mml:mn></mml:math></inline-formula> score comparison of the proposed model and the compared models for each data case</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Metric</th>
<th align="left">Case</th>
<th align="left">SVM</th>
<th align="left">KNN</th>
<th align="left">Voting</th>
<th align="left">Stacking</th>
<th align="left">Bagging</th>
<th align="left">Boosting</th>
<th align="left">BSEML</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left"><inline-formula id="ieqn-78"><mml:math id="mml-ieqn-78"><mml:mrow><mml:mtext>F</mml:mtext></mml:mrow><mml:msub><mml:mn>1</mml:mn><mml:mrow><mml:mrow><mml:mtext>macro</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td align="left">1</td>
<td align="left">0.457</td>
<td align="left">0.274</td>
<td align="left">0.471</td>
<td align="left">0.517</td>
<td align="left">0.432</td>
<td align="left">0.398</td>
<td align="left"><bold>0.525</bold></td>
</tr>
<tr>
<td align="left"/>
<td align="left">2</td>
<td align="left">0.535</td>
<td align="left">0.647</td>
<td align="left">0.668</td>
<td align="left">0.821</td>
<td align="left">0.807</td>
<td align="left">0.851</td>
<td align="left"><bold>0.896</bold></td>
</tr>
<tr>
<td align="left"/>
<td align="left">3</td>
<td align="left">0.545</td>
<td align="left">0.761</td>
<td align="left">0.765</td>
<td align="left">0.867</td>
<td align="left">0.833</td>
<td align="left">0.872</td>
<td align="left"><bold>0.897</bold></td>
</tr>
<tr>
<td align="left"/>
<td align="left">4</td>
<td align="left">0.526</td>
<td align="left">0.756</td>
<td align="left">0.753</td>
<td align="left">0.891</td>
<td align="left">0.843</td>
<td align="left">0.901</td>
<td align="left"><bold>0.931</bold></td>
</tr>
<tr>
<td align="left"><inline-formula id="ieqn-79"><mml:math id="mml-ieqn-79"><mml:mrow><mml:mtext>F</mml:mtext></mml:mrow><mml:msub><mml:mn>1</mml:mn><mml:mrow><mml:mrow><mml:mtext>micro</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td align="left">1</td>
<td align="left">0.788</td>
<td align="left">0.836</td>
<td align="left">0.879</td>
<td align="left">0.922</td>
<td align="left">0.898</td>
<td align="left">0.907</td>
<td align="left"><bold>0.933</bold></td>
</tr>
<tr>
<td align="left"/>
<td align="left">2</td>
<td align="left">0.839</td>
<td align="left">0.877</td>
<td align="left">0.885</td>
<td align="left">0.926</td>
<td align="left">0.912</td>
<td align="left">0.923</td>
<td align="left"><bold>0.961</bold></td>
</tr>
<tr>
<td align="left"/>
<td align="left">3</td>
<td align="left">0.806</td>
<td align="left">0.892</td>
<td align="left">0.895</td>
<td align="left">0.930</td>
<td align="left">0.957</td>
<td align="left">0.927</td>
<td align="left"><bold>0.977</bold></td>
</tr>
<tr>
<td align="left"/>
<td align="left">4</td>
<td align="left">0.799</td>
<td align="left">0.873</td>
<td align="left">0.895</td>
<td align="left">0.934</td>
<td align="left">0.961</td>
<td align="left">0.933</td>
<td align="left"><bold>0.978</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec id="s5"><label>5</label><title>Results and Discussion</title>
<p><xref ref-type="table" rid="table-5">Tab. 5</xref> compares the <inline-formula id="ieqn-50"><mml:math id="mml-ieqn-50"><mml:mi>F</mml:mi><mml:msub><mml:mn>1</mml:mn><mml:mrow><mml:mrow><mml:mtext mathvariant="italic">macro</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-51"><mml:math id="mml-ieqn-51"><mml:mi>F</mml:mi><mml:msub><mml:mn>1</mml:mn><mml:mrow><mml:mrow><mml:mtext mathvariant="italic">micro</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> scores of the compared models and the proposed model for each data case. As shown in <xref ref-type="table" rid="table-5">Tab. 5</xref>, the proposed model outperformed the other models in all cases. In particular, it showed a marked increase in <inline-formula id="ieqn-52"><mml:math id="mml-ieqn-52"><mml:mi>F</mml:mi><mml:msub><mml:mn>1</mml:mn><mml:mrow><mml:mrow><mml:mtext mathvariant="italic">macro</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula>, indicating better classification performance on data with class imbalance. In Case 1, with raw data, the proposed model also achieved the best <inline-formula id="ieqn-53"><mml:math id="mml-ieqn-53"><mml:mi>F</mml:mi><mml:msub><mml:mn>1</mml:mn><mml:mrow><mml:mrow><mml:mtext mathvariant="italic">macro</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula>, although its margin over the stacking model was small (0.525 vs. 0.517). While the performance of the compared models increased with data augmentation, the proposed model retained the best classification performance among them. <xref ref-type="table" rid="table-6">Tab. 6</xref> compares the <inline-formula id="ieqn-54"><mml:math id="mml-ieqn-54"><mml:mi>F</mml:mi><mml:mn>1</mml:mn></mml:math></inline-formula> scores of the seven models by data case for each defect class. The ensemble models showed different strengths across the defect classes. In particular, the stacking model and the bagging model performed well on the Random and Scratch classes, respectively.</p>
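<p>The margins between the proposed model and the strongest compared model can be read off <xref ref-type="table" rid="table-5">Tab. 5</xref> with a short calculation. The sketch below transcribes the scores from that table. The helper name <monospace>margin_over_best_baseline</monospace> is hypothetical, and the rounding to three decimals is an assumption that matches the precision of the reported scores.</p>
<code language="python">
```python
# Scores transcribed from Table 5; the last entry of each row is BSEML.
MODELS = ["SVM", "KNN", "Voting", "Stacking", "Bagging", "Boosting", "BSEML"]
F1_MACRO = {
    1: [0.457, 0.274, 0.471, 0.517, 0.432, 0.398, 0.525],
    2: [0.535, 0.647, 0.668, 0.821, 0.807, 0.851, 0.896],
    3: [0.545, 0.761, 0.765, 0.867, 0.833, 0.872, 0.897],
    4: [0.526, 0.756, 0.753, 0.891, 0.843, 0.901, 0.931],
}
F1_MICRO = {
    1: [0.788, 0.836, 0.879, 0.922, 0.898, 0.907, 0.933],
    2: [0.839, 0.877, 0.885, 0.926, 0.912, 0.923, 0.961],
    3: [0.806, 0.892, 0.895, 0.930, 0.957, 0.927, 0.977],
    4: [0.799, 0.873, 0.895, 0.934, 0.961, 0.933, 0.978],
}


def margin_over_best_baseline(scores):
    """Return (best compared model, BSEML score minus that model's score)."""
    baseline = scores[:-1]
    best = max(range(len(baseline)), key=baseline.__getitem__)
    return MODELS[best], round(scores[-1] - baseline[best], 3)


for case in sorted(F1_MACRO):
    print(f"Case {case}: macro {margin_over_best_baseline(F1_MACRO[case])}, "
          f"micro {margin_over_best_baseline(F1_MICRO[case])}")
```
</code>
<p>The strongest compared model differs by case and by metric, which is consistent with the observation that the individual ensemble methods have different strengths.</p>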
<table-wrap id="table-6"><label>Table 6</label><caption><title><inline-formula id="ieqn-80"><mml:math id="mml-ieqn-80"><mml:mi mathvariant="bold-italic">F</mml:mi><mml:mn>1</mml:mn></mml:math></inline-formula> score comparison analysis of the proposed model for every defect class</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Case index</th>
<th align="left">Defect</th>
<th align="left">SVM</th>
<th align="left">KNN</th>
<th align="left">Voting</th>
<th align="left">Stacking</th>
<th align="left">Bagging</th>
<th align="left">Boosting</th>
<th align="left">BSEML</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left" rowspan="9">Case 1</td>
<td align="left">Center</td>
<td align="left">0.558</td>
<td align="left">0.176</td>
<td align="left">0.378</td>
<td align="left">0.390</td>
<td align="left">0.400</td>
<td align="left"><bold>0.500</bold></td>
<td align="left">0.475</td>
</tr>
<tr>
<td align="left">Donut</td>
<td align="left">0.323</td>
<td align="left">0.333</td>
<td align="left">0.452</td>
<td align="left">0.365</td>
<td align="left">0.331</td>
<td align="left">0.436</td>
<td align="left"><bold>0.565</bold></td>
</tr>
<tr>
<td align="left">Edge-Loc</td>
<td align="left">0.424</td>
<td align="left">0.208</td>
<td align="left">0.466</td>
<td align="left">0.543</td>
<td align="left">0.454</td>
<td align="left">0.534</td>
<td align="left"><bold>0.554</bold></td>
</tr>
<tr>
<td align="left">Edge-Ring</td>
<td align="left">0.400</td>
<td align="left">0.316</td>
<td align="left">0.400</td>
<td align="left">0.545</td>
<td align="left">0.222</td>
<td align="left">0.308</td>
<td align="left"><bold>0.656</bold></td>
</tr>
<tr>
<td align="left">Loc</td>
<td align="left">0.277</td>
<td align="left">0.148</td>
<td align="left">0.292</td>
<td align="left">0.323</td>
<td align="left">0.276</td>
<td align="left">0.336</td>
<td align="left"><bold>0.370</bold></td>
</tr>
<tr>
<td align="left">Near-Full</td>
<td align="left">0.167</td>
<td align="left">0.556</td>
<td align="left">0.933</td>
<td align="left">0.933</td>
<td align="left">0.667</td>
<td align="left">0.222</td>
<td align="left"><bold>0.750</bold></td>
</tr>
<tr>
<td align="left">Random</td>
<td align="left">0.650</td>
<td align="left">0.080</td>
<td align="left">0.545</td>
<td align="left"><bold>0.776</bold></td>
<td align="left">0.515</td>
<td align="left">0.595</td>
<td align="left">0.727</td>
</tr>
<tr>
<td align="left">Scratch</td>
<td align="left">0.267</td>
<td align="left">0.132</td>
<td align="left">0.346</td>
<td align="left">0.296</td>
<td align="left"><bold>0.348</bold></td>
<td align="left">0.320</td>
<td align="left">0.261</td>
</tr>
<tr>
<td align="left">None</td>
<td align="left">0.969</td>
<td align="left">0.978</td>
<td align="left">0.980</td>
<td align="left">0.981</td>
<td align="left">0.979</td>
<td align="left">0.933</td>
<td align="left"><bold>0.982</bold></td>
</tr>
<tr>
<td align="left" rowspan="9">Case 2</td>
<td align="left">Center</td>
<td align="left">0.403</td>
<td align="left">0.580</td>
<td align="left">0.603</td>
<td align="left">0.861</td>
<td align="left"><bold>0.916</bold></td>
<td align="left">0.868</td>
<td align="left">0.869</td>
</tr>
<tr>
<td align="left">Donut</td>
<td align="left">1.000</td>
<td align="left">0.821</td>
<td align="left">0.951</td>
<td align="left">1.000</td>
<td align="left">1.000</td>
<td align="left">1.000</td>
<td align="left"><bold>1.000</bold></td>
</tr>
<tr>
<td align="left">Edge-Loc</td>
<td align="left">0.276</td>
<td align="left">0.353</td>
<td align="left">0.416</td>
<td align="left">0.605</td>
<td align="left">0.754</td>
<td align="left">0.627</td>
<td align="left"><bold>0.827</bold></td>
</tr>
<tr>
<td align="left">Edge-Ring</td>
<td align="left">0.703</td>
<td align="left">0.790</td>
<td align="left">0.889</td>
<td align="left">0.934</td>
<td align="left">0.955</td>
<td align="left">0.958</td>
<td align="left"><bold>0.958</bold></td>
</tr>
<tr>
<td align="left">Loc</td>
<td align="left">0.192</td>
<td align="left">0.262</td>
<td align="left">0.246</td>
<td align="left">0.512</td>
<td align="left">0.494</td>
<td align="left">0.565</td>
<td align="left"><bold>0.565</bold></td>
</tr>
<tr>
<td align="left">Near-Full</td>
<td align="left">0.371</td>
<td align="left">0.636</td>
<td align="left">0.389</td>
<td align="left">0.778</td>
<td align="left">0.982</td>
<td align="left">0.830</td>
<td align="left"><bold>0.983</bold></td>
</tr>
<tr>
<td align="left">Random</td>
<td align="left">0.385</td>
<td align="left">0.726</td>
<td align="left">0.807</td>
<td align="left"><bold>0.957</bold></td>
<td align="left">0.845</td>
<td align="left">0.926</td>
<td align="left">0.926</td>
</tr>
<tr>
<td align="left">Scratch</td>
<td align="left">0.514</td>
<td align="left">0.684</td>
<td align="left">0.741</td>
<td align="left">0.863</td>
<td align="left"><bold>0.941</bold></td>
<td align="left">0.908</td>
<td align="left">0.909</td>
</tr>
<tr>
<td align="left">None</td>
<td align="left">0.970</td>
<td align="left">0.971</td>
<td align="left">0.969</td>
<td align="left">0.967</td>
<td align="left">0.975</td>
<td align="left">0.975</td>
<td align="left"><bold>0.979</bold></td>
</tr>
<tr>
<td align="left" rowspan="9">Case 3</td>
<td align="left">Center</td>
<td align="left">0.435</td>
<td align="left">0.756</td>
<td align="left">0.694</td>
<td align="left">0.871</td>
<td align="left">0.883</td>
<td align="left">0.891</td>
<td align="left"><bold>0.931</bold></td>
</tr>
<tr>
<td align="left">Donut</td>
<td align="left">0.994</td>
<td align="left">0.871</td>
<td align="left">0.887</td>
<td align="left">1.000</td>
<td align="left">1.000</td>
<td align="left">0.994</td>
<td align="left"><bold>1.000</bold></td>
</tr>
<tr>
<td align="left">Edge-Loc</td>
<td align="left">0.273</td>
<td align="left">0.561</td>
<td align="left">0.524</td>
<td align="left">0.680</td>
<td align="left">0.789</td>
<td align="left">0.700</td>
<td align="left"><bold>0.826</bold></td>
</tr>
<tr>
<td align="left">Edge-Ring</td>
<td align="left">0.714</td>
<td align="left">0.891</td>
<td align="left">0.905</td>
<td align="left">0.969</td>
<td align="left">0.970</td>
<td align="left">0.971</td>
<td align="left"><bold>0.983</bold></td>
</tr>
<tr>
<td align="left">Loc</td>
<td align="left">0.285</td>
<td align="left">0.441</td>
<td align="left">0.464</td>
<td align="left">0.658</td>
<td align="left">0.667</td>
<td align="left">0.639</td>
<td align="left"><bold>0.810</bold></td>
</tr>
<tr>
<td align="left">Near-Full</td>
<td align="left">0.382</td>
<td align="left">0.673</td>
<td align="left">0.762</td>
<td align="left">0.824</td>
<td align="left">0.859</td>
<td align="left">0.878</td>
<td align="left"><bold>0.944</bold></td>
</tr>
<tr>
<td align="left">Random</td>
<td align="left">0.371</td>
<td align="left">0.857</td>
<td align="left">0.856</td>
<td align="left">0.918</td>
<td align="left">0.935</td>
<td align="left">0.898</td>
<td align="left"><bold>0.971</bold></td>
</tr>
<tr>
<td align="left">Scratch</td>
<td align="left">0.386</td>
<td align="left">0.825</td>
<td align="left">0.797</td>
<td align="left">0.909</td>
<td align="left">0.905</td>
<td align="left">0.906</td>
<td align="left"><bold>0.951</bold></td>
</tr>
<tr>
<td align="left">None</td>
<td align="left">0.971</td>
<td align="left">0.967</td>
<td align="left">0.974</td>
<td align="left">0.976</td>
<td align="left"><bold>0.978</bold></td>
<td align="left">0.969</td>
<td align="left">0.975</td>
</tr>
<tr>
<td align="left" rowspan="9">Case 4</td>
<td align="left">Center</td>
<td align="left">0.405</td>
<td align="left">0.718</td>
<td align="left">0.665</td>
<td align="left">0.891</td>
<td align="left">0.923</td>
<td align="left">0.891</td>
<td align="left"><bold>0.951</bold></td>
</tr>
<tr>
<td align="left">Donut</td>
<td align="left"><bold>1.000</bold></td>
<td align="left">0.886</td>
<td align="left">0.940</td>
<td align="left">0.995</td>
<td align="left">0.995</td>
<td align="left">0.995</td>
<td align="left">0.999</td>
</tr>
<tr>
<td align="left">Edge-Loc</td>
<td align="left">0.297</td>
<td align="left">0.557</td>
<td align="left">0.567</td>
<td align="left">0.758</td>
<td align="left">0.785</td>
<td align="left">0.759</td>
<td align="left"><bold>0.875</bold></td>
</tr>
<tr>
<td align="left">Edge-Ring</td>
<td align="left">0.699</td>
<td align="left">0.865</td>
<td align="left">0.912</td>
<td align="left">0.971</td>
<td align="left">0.976</td>
<td align="left">0.972</td>
<td align="left"><bold>0.981</bold></td>
</tr>
<tr>
<td align="left">Loc</td>
<td align="left">0.234</td>
<td align="left">0.489</td>
<td align="left">0.483</td>
<td align="left">0.693</td>
<td align="left">0.741</td>
<td align="left">0.693</td>
<td align="left"><bold>0.858</bold></td>
</tr>
<tr>
<td align="left">Near-Full</td>
<td align="left">0.350</td>
<td align="left">0.630</td>
<td align="left">0.588</td>
<td align="left">0.861</td>
<td align="left">0.915</td>
<td align="left"><bold>0.959</bold></td>
<td align="left">0.885</td>
</tr>
<tr>
<td align="left">Random</td>
<td align="left">0.424</td>
<td align="left">0.803</td>
<td align="left">0.943</td>
<td align="left">0.953</td>
<td align="left">0.921</td>
<td align="left">0.916</td>
<td align="left"><bold>0.981</bold></td>
</tr>
<tr>
<td align="left">Scratch</td>
<td align="left">0.354</td>
<td align="left">0.796</td>
<td align="left">0.912</td>
<td align="left">0.918</td>
<td align="left">0.917</td>
<td align="left">0.971</td>
<td align="left"><bold>0.975</bold></td>
</tr>
<tr>
<td align="left">None</td>
<td align="left">0.971</td>
<td align="left">0.966</td>
<td align="left">0.967</td>
<td align="left">0.976</td>
<td align="left">0.976</td>
<td align="left">0.947</td>
<td align="left"><bold>0.977</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The proposed model performs well for all defect classes. These results indicate that it increases the <inline-formula id="ieqn-55"><mml:math id="mml-ieqn-55"><mml:mi>F</mml:mi><mml:mn>1</mml:mn></mml:math></inline-formula> score for each defect class by appropriately combining ensemble models for each defect pattern. <xref ref-type="fig" rid="fig-10">Fig. 10</xref> shows the normalized confusion matrices of the base-level SVM classifier, the voting ensemble model, and the proposed model. Classification performance improves clearly from the base-level classifier to the ensemble classifier and then to the proposed model. The base-level classifier shows high accuracy only for the Donut and None classes. The ensemble classifier shows high accuracy for all defect classes with the exception of one. The proposed model combines the strengths of the ensemble classifiers to achieve very high defect detection rates for all classes. In addition, the stacking ensemble model and the BSEML model were compared to investigate which defect patterns were weighted more heavily as learning progressed. <xref ref-type="fig" rid="fig-11">Fig. 11</xref> shows, as a bar graph, the average of the learning weight matrix for each defect class in the Case 4 data set. A larger classifier weight for a defect class indicates that the classifier has a greater impact on the final prediction. The proposed model tends to assign high weights to defect patterns that are difficult to classify. It was trained with low weights for the Donut and Edge-Loc classes, for which it showed higher classification accuracy than the stacking ensemble model. This suggests that assigning appropriate weights to defect patterns during learning improves the classification performance of the proposed model.</p>
<fig id="fig-10"><label>Figure 10</label><caption><title>Confusion matrices of the baseline models and the proposed model</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_33417-fig-10.png"/></fig><fig id="fig-11"><label>Figure 11</label><caption><title>Comparison of weights for each defect class</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_33417-fig-11.png"/></fig>
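<p>The per-class F1 scores and the normalized confusion matrix discussed above can be computed from true and predicted labels along the following lines. This is a minimal sketch in plain Python, not the authors' implementation: the function names (confusion_matrix, normalize_rows, per_class_f1) and the toy labels are assumptions chosen for illustration, and the values it produces are unrelated to the reported results.</p>
<preformat>
```python
# Illustrative sketch only: the labels below are made-up toy data,
# not results from the paper.

def confusion_matrix(y_true, y_pred, classes):
    """Count matrix: rows are true classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


def normalize_rows(matrix):
    """Divide each row by its sum so the diagonal holds per-class recall."""
    normalized = []
    for row in matrix:
        total = sum(row)
        normalized.append([v / total if total else 0.0 for v in row])
    return normalized


def per_class_f1(matrix):
    """F1 = 2PR / (P + R) for each class, computed from the count matrix."""
    n = len(matrix)
    scores = []
    for k in range(n):
        tp = matrix[k][k]
        predicted = sum(matrix[r][k] for r in range(n))  # column sum: TP + FP
        actual = sum(matrix[k])                          # row sum: TP + FN
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        denom = precision + recall
        scores.append(2 * precision * recall / denom if denom else 0.0)
    return scores


# Hypothetical three-class example using class names from the table.
classes = ["Donut", "Loc", "None"]
y_true = ["Donut", "Donut", "Loc", "Loc", "Loc", "None", "None", "None"]
y_pred = ["Donut", "Donut", "Loc", "None", "Loc", "None", "None", "Loc"]

counts = confusion_matrix(y_true, y_pred, classes)
normalized = normalize_rows(counts)
f1_scores = dict(zip(classes, per_class_f1(counts)))
```
</preformat>
<p>Row normalization places per-class recall on the diagonal, so classes of different sizes can be compared in a single matrix, while the F1 score additionally penalizes a class that attracts many false positives.</p>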
</sec>
<sec id="s6"><label>6</label><title>Conclusion</title>
<p>In this study, an algorithm was proposed that combines the reinforcement of insufficient defect patterns with a high-performing hybrid model. The proposed method performs data augmentation using CAE on image-type wafer maps and then extracts features by applying density-based, geometry-based, and Radon-based feature extraction methods. This feature extraction technique improved the efficiency of the wafer defect identification system by providing detailed information about the wafer map and reducing the amount of computation required for learning. Four machine learning classifiers were then stacked, and an ensemble model was built using the XGB Classifier as the meta-level classifier. The proposed method demonstrated superior classification performance compared with the base-level classifiers and the ensemble models, and it was robust to defect classes with insufficient data. The effectiveness of the proposed method was verified experimentally using real data sets.</p>
<p>The improved classification performance demonstrated in this study is expected to contribute significantly to the stable automation of wafer map classification, leading to improved product quality and yield in actual semiconductor manufacturing processes. Building on the proposed model, it should be possible to develop models that remain robust while maintaining high performance in various manufacturing domains, and to optimize a model for a given domain by applying actual data sets from that manufacturing field.</p>
</sec>
</body>
<back>
<fn-group>
<fn fn-type="other"><p><bold>Funding Statement:</bold> This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2021R1A5A8033165) and by the &#x201C;Human Resources Program in Energy Technology&#x201D; of the Korea Institute of Energy Technology Evaluation and Planning (KETEP), which was granted financial resources from the Ministry of Trade, Industry &#x0026; Energy, Republic of Korea (No. 20214000000200).</p></fn>
<fn fn-type="conflict"><p><bold>Conflicts of Interest:</bold> The authors declare that they have no conflicts of interest to report regarding the present study.</p></fn>
</fn-group>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Yuan</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Kuo</surname></string-name> and <string-name><given-names>S. J.</given-names> <surname>Bae</surname></string-name></person-group>, &#x201C;<article-title>Detection of spatial defect patterns generated in semiconductor fabrication processes</article-title>,&#x201D; <source>IEEE Transactions on Semiconductor Manufacturing</source>, vol. <volume>24</volume>, no. <issue>3</issue>, pp. <fpage>392</fpage>&#x2013;<lpage>403</lpage>, <year>2011</year>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J. S.</given-names> <surname>Fenner</surname></string-name>, <string-name><given-names>M. K.</given-names> <surname>Jeong</surname></string-name> and <string-name><given-names>J. C.</given-names> <surname>Lu</surname></string-name></person-group>, &#x201C;<article-title>Optimal automatic control of multistage production processes</article-title>,&#x201D; <source>IEEE Transactions on Semiconductor Manufacturing</source>, vol. <volume>18</volume>, no. <issue>1</issue>, pp. <fpage>94</fpage>&#x2013;<lpage>103</lpage>, <year>2005</year>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>G.</given-names> <surname>Hong</surname></string-name> and <string-name><given-names>D.</given-names> <surname>Suh</surname></string-name></person-group>, &#x201C;<article-title>Supervised-learning-based intelligent fault diagnosis for mechanical equipment</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>9</volume>, pp. <fpage>116147</fpage>&#x2013;<lpage>116162</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>N. G.</given-names> <surname>Shankar</surname></string-name> and <string-name><given-names>Z. W.</given-names> <surname>Zhong</surname></string-name></person-group>, &#x201C;<article-title>Defect detection on semiconductor wafer surfaces</article-title>,&#x201D; <source>Microelectronic Engineering</source>, vol. <volume>77</volume>, no. <issue>3&#x2013;4</issue>, pp. <fpage>337</fpage>&#x2013;<lpage>346</lpage>, <year>2005</year>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>C. M.</given-names> <surname>Tan</surname></string-name> and <string-name><given-names>K. T.</given-names> <surname>Lau</surname></string-name></person-group>, &#x201C;<article-title>Automated wafer defect map generation for process yield improvement</article-title>,&#x201D; in <conf-name>2011 Int. Symp. on Integrated Circuits</conf-name>, <conf-loc>Singapore</conf-loc>, pp. <fpage>313</fpage>&#x2013;<lpage>316</lpage>, <year>2011</year>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Baly</surname></string-name> and <string-name><given-names>H.</given-names> <surname>Hajj</surname></string-name></person-group>, &#x201C;<article-title>Wafer classification using support vector machines</article-title>,&#x201D; <source>IEEE Transactions on Semiconductor Manufacturing</source>, vol. <volume>25</volume>, no. <issue>3</issue>, pp. <fpage>373</fpage>&#x2013;<lpage>383</lpage>, <year>2012</year>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M. -J.</given-names> <surname>Wu</surname></string-name>, <string-name><given-names>J. -S. R.</given-names> <surname>Jang</surname></string-name> and <string-name><given-names>J. -L.</given-names> <surname>Chen</surname></string-name></person-group>, &#x201C;<article-title>Wafer map failure pattern recognition and similarity ranking for large-scale datasets</article-title>,&#x201D; <source>IEEE Transactions on Semiconductor Manufacturing</source>, vol. <volume>28</volume>, no. <issue>1</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>12</lpage>, <year>2015</year>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>Jafari-Khouzani</surname></string-name> and <string-name><given-names>H.</given-names> <surname>Soltanian-Zadeh</surname></string-name></person-group>, &#x201C;<article-title>Radon transform orientation estimation for rotation invariant texture analysis</article-title>,&#x201D; <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>, vol. <volume>27</volume>, no. <issue>6</issue>, pp. <fpage>1004</fpage>&#x2013;<lpage>1008</lpage>, <year>2005</year>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>E. E.</given-names> <surname>Schadt</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Ellis</surname></string-name> and <string-name><given-names>W. H.</given-names> <surname>Wong</surname></string-name></person-group>, &#x201C;<article-title>Feature extraction and normalization algorithms for high-density oligonucleotide gene expression array data</article-title>,&#x201D; <source>Journal of Cellular Biochemistry</source>, vol. <volume>84</volume>, no. <issue>S37</issue>, pp. <fpage>120</fpage>&#x2013;<lpage>125</lpage>, <year>2001</year>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M. E.</given-names> <surname>Mavroforakis</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Theodoridis</surname></string-name></person-group>, &#x201C;<article-title>A geometric approach to support vector machine (SVM) classification</article-title>,&#x201D; <source>IEEE Transactions on Neural Networks</source>, vol. <volume>17</volume>, no. <issue>3</issue>, pp. <fpage>671</fpage>&#x2013;<lpage>682</lpage>, <year>2006</year>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>F. -L.</given-names> <surname>Chen</surname></string-name> and <string-name><given-names>S. -F.</given-names> <surname>Liu</surname></string-name></person-group>, &#x201C;<article-title>A neural-network approach to recognize defect spatial pattern in semiconductor fabrication</article-title>,&#x201D; <source>IEEE Transactions on Semiconductor Manufacturing</source>, vol. <volume>13</volume>, no. <issue>3</issue>, pp. <fpage>366</fpage>&#x2013;<lpage>373</lpage>, <year>2000</year>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>B.</given-names> <surname>Singh</surname></string-name>, <string-name><given-names>S. S.</given-names> <surname>Gleason</surname></string-name>, <string-name><given-names>J. K. W.</given-names> <surname>Tobin</surname></string-name>, <string-name><given-names>T. P.</given-names> <surname>Karnowski</surname></string-name> and <string-name><given-names>F.</given-names> <surname>Lakhani</surname></string-name></person-group>, &#x201C;<article-title>Rapid yield learning through optical defect and electrical test analysis</article-title>,&#x201D; <source>Metrology, Inspection, and Process Control for Microlithography XII</source>, vol. <volume>3332</volume>, pp. <fpage>232</fpage>&#x2013;<lpage>242</lpage>, <year>1998</year>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S. P.</given-names> <surname>Cunningham</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Mackinnon</surname></string-name></person-group>, &#x201C;<article-title>Statistical methods for visual defect metrology</article-title>,&#x201D; <source>IEEE Transactions on Semiconductor Manufacturing</source>, vol. <volume>11</volume>, no. <issue>1</issue>, pp. <fpage>48</fpage>&#x2013;<lpage>53</lpage>, <year>1998</year>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>Kyeong</surname></string-name> and <string-name><given-names>H.</given-names> <surname>Kim</surname></string-name></person-group>, &#x201C;<article-title>Classification of mixed-type defect patterns in wafer bin maps using convolutional neural networks</article-title>,&#x201D; <source>IEEE Transactions on Semiconductor Manufacturing</source>, vol. <volume>31</volume>, no. <issue>3</issue>, pp. <fpage>395</fpage>&#x2013;<lpage>402</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J. -C.</given-names> <surname>Chien</surname></string-name>, <string-name><given-names>M. -T.</given-names> <surname>Wu</surname></string-name> and <string-name><given-names>J. -D.</given-names> <surname>Lee</surname></string-name></person-group>, &#x201C;<article-title>Inspection and classification of semiconductor wafer surface defects using CNN deep learning networks</article-title>,&#x201D; <source>Applied Sciences</source>, vol. <volume>10</volume>, no. <issue>15</issue>, pp. <fpage>5340</fpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>N.</given-names> <surname>Yu</surname></string-name>, <string-name><given-names>Q.</given-names> <surname>Xu</surname></string-name> and <string-name><given-names>H.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>Wafer defect pattern recognition and analysis based on convolutional neural network</article-title>,&#x201D; <source>IEEE Transactions on Semiconductor Manufacturing</source>, vol. <volume>32</volume>, no. <issue>4</issue>, pp. <fpage>566</fpage>&#x2013;<lpage>573</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Saqlain</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Jargalsaikhan</surname></string-name> and <string-name><given-names>J. Y.</given-names> <surname>Lee</surname></string-name></person-group>, &#x201C;<article-title>A voting ensemble classifier for wafer map defect patterns identification in semiconductor manufacturing</article-title>,&#x201D; <source>IEEE Transactions on Semiconductor Manufacturing</source>, vol. <volume>32</volume>, no. <issue>2</issue>, pp. <fpage>171</fpage>&#x2013;<lpage>182</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Piao</surname></string-name>, <string-name><given-names>C. H.</given-names> <surname>Jin</surname></string-name>, <string-name><given-names>J. Y.</given-names> <surname>Lee</surname></string-name> and <string-name><given-names>J. -Y.</given-names> <surname>Byun</surname></string-name></person-group>, &#x201C;<article-title>Decision tree ensemble-based wafer map failure pattern recognition based on radon transform-based features</article-title>,&#x201D; <source>IEEE Transactions on Semiconductor Manufacturing</source>, vol. <volume>31</volume>, no. <issue>2</issue>, pp. <fpage>250</fpage>&#x2013;<lpage>257</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>L. L. Y.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>K. S. M.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>K. C. C.</given-names> <surname>Cheng</surname></string-name>, <string-name><given-names>S. J.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>A. Y. A.</given-names> <surname>Hwang</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>TestDNA-E: Wafer defect signature for pattern recognition by ensemble learning</article-title>,&#x201D; <source>IEEE Transactions on Semiconductor Manufacturing</source>, vol. <volume>35</volume>, no. <issue>2</issue>, pp. <fpage>373</fpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Fan</surname></string-name>, <string-name><given-names>Q.</given-names> <surname>Wang</surname></string-name> and <string-name><given-names>B.</given-names> <surname>van der Waal</surname></string-name></person-group>, &#x201C;<article-title>Wafer defect patterns recognition based on OPTICS and multi-label classification</article-title>,&#x201D; in <conf-name>2016 IEEE Advanced Information Management, Communicates, Electronic and Automation Control Conf. (IMCEC)</conf-name>, <conf-loc>Xi'an, China</conf-loc>, pp. <fpage>912</fpage>&#x2013;<lpage>915</lpage>, <year>2016</year>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>I.</given-names> <surname>Naseem</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Togneri</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Bennamoun</surname></string-name></person-group>, &#x201C;<article-title>Linear regression for face recognition</article-title>,&#x201D; <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>, vol. <volume>32</volume>, no. <issue>11</issue>, pp. <fpage>2106</fpage>&#x2013;<lpage>2112</lpage>, <year>2010</year>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M. -L.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>J. M.</given-names> <surname>Pe&#x00F1;a</surname></string-name> and <string-name><given-names>V.</given-names> <surname>Robles</surname></string-name></person-group>, &#x201C;<article-title>Feature selection for multi-label naive Bayes classification</article-title>,&#x201D; <source>Information Sciences</source>, vol. <volume>179</volume>, no. <issue>19</issue>, pp. <fpage>3218</fpage>&#x2013;<lpage>3229</lpage>, <year>2009</year>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Zong</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Zhu</surname></string-name> and <string-name><given-names>R.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>Efficient kNN classification with different numbers of nearest neighbors</article-title>,&#x201D; <source>IEEE Transactions on Neural Networks and Learning Systems</source>, vol. <volume>29</volume>, no. <issue>5</issue>, pp. <fpage>1774</fpage>&#x2013;<lpage>1785</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Yu</surname></string-name> and <string-name><given-names>X.</given-names> <surname>Lu</surname></string-name></person-group>, &#x201C;<article-title>Wafer map defect detection and recognition using joint local and nonlocal linear discriminant analysis</article-title>,&#x201D; <source>IEEE Transactions on Semiconductor Manufacturing</source>, vol. <volume>29</volume>, no. <issue>1</issue>, pp. <fpage>33</fpage>&#x2013;<lpage>43</lpage>, <year>2015</year>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Albawi</surname></string-name>, <string-name><given-names>T. A.</given-names> <surname>Mohammed</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Al-Zawi</surname></string-name></person-group>, &#x201C;<article-title>Understanding of a convolutional neural network</article-title>,&#x201D; in <conf-name>2017 Int. Conf. on Engineering and Technology (ICET)</conf-name>, <conf-loc>Antalya, Turkey</conf-loc>, pp. <fpage>1</fpage>&#x2013;<lpage>6</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Wang</surname></string-name> and <string-name><given-names>N.</given-names> <surname>Chen</surname></string-name></person-group>, &#x201C;<article-title>Defect pattern recognition on wafers using convolutional neural networks</article-title>,&#x201D; <source>Quality and Reliability Engineering International</source>, vol. <volume>36</volume>, no. <issue>4</issue>, pp. <fpage>1245</fpage>&#x2013;<lpage>1257</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Nakazawa</surname></string-name> and <string-name><given-names>D. V.</given-names> <surname>Kulkarni</surname></string-name></person-group>, &#x201C;<article-title>Wafer map defect pattern classification and image retrieval using convolutional neural network</article-title>,&#x201D; <source>IEEE Transactions on Semiconductor Manufacturing</source>, vol. <volume>31</volume>, no. <issue>2</issue>, pp. <fpage>309</fpage>&#x2013;<lpage>314</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Ishida</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Nitta</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Fukuda</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>Kanazawa</surname></string-name></person-group>, &#x201C;<article-title>Deep learning-based wafer-map failure pattern recognition framework</article-title>,&#x201D; in <conf-name>20th Int. Symp. on Quality Electronic Design (ISQED)</conf-name>, <conf-loc>Santa Clara, CA, USA</conf-loc>, pp. <fpage>291</fpage>&#x2013;<lpage>297</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C. -Y.</given-names> <surname>Hsu</surname></string-name> and <string-name><given-names>J. -C.</given-names> <surname>Chien</surname></string-name></person-group>, &#x201C;<article-title>Ensemble convolutional neural networks with weighted majority for wafer bin map pattern classification</article-title>,&#x201D; <source>Journal of Intelligent Manufacturing</source>, vol. <volume>33</volume>, no. <issue>1</issue>, pp. <fpage>831</fpage>&#x2013;<lpage>844</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>T. -H.</given-names> <surname>Tsai</surname></string-name> and <string-name><given-names>Y. -C.</given-names> <surname>Lee</surname></string-name></person-group>, &#x201C;<article-title>A light-weight neural network for wafer map classification based on data augmentation</article-title>,&#x201D; <source>IEEE Transactions on Semiconductor Manufacturing</source>, vol. <volume>33</volume>, no. <issue>4</issue>, pp. <fpage>663</fpage>&#x2013;<lpage>672</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Q.</given-names> <surname>Xu</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Yu</surname></string-name> and <string-name><given-names>F.</given-names> <surname>Essaf</surname></string-name></person-group>, &#x201C;<article-title>Improved wafer map inspection using attention mechanism and cosine normalization</article-title>,&#x201D; <source>Machines</source>, vol. <volume>10</volume>, no. <issue>2</issue>, pp. <fpage>146</fpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>P. P.</given-names> <surname>Shinde</surname></string-name>, <string-name><given-names>P. P.</given-names> <surname>Pai</surname></string-name> and <string-name><given-names>S. P.</given-names> <surname>Adiga</surname></string-name></person-group>, &#x201C;<article-title>Wafer defect localization and classification using deep learning techniques</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>10</volume>, pp. <fpage>39969</fpage>&#x2013;<lpage>39974</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-33"><label>[33]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Polikar</surname></string-name></person-group>, &#x201C;<chapter-title>Ensemble learning</chapter-title>,&#x201D; in <source>Ensemble Machine Learning</source>, <publisher-loc>Boston, MA, USA</publisher-loc>: <publisher-name>Springer</publisher-name>, pp. <fpage>1</fpage>&#x2013;<lpage>34</lpage>, <year>2012</year>.</mixed-citation></ref>
<ref id="ref-34"><label>[34]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><given-names>G.</given-names> <surname>Bonaccorso</surname></string-name></person-group>, &#x201C;<chapter-title>Important elements in machine learning</chapter-title>,&#x201D; in <source>Machine Learning Algorithms</source>, <publisher-loc>Birmingham, UK</publisher-loc>: <publisher-name>Packt Publishing Ltd.</publisher-name>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-35"><label>[35]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>G.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Sun</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Ma</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Xu</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Gu</surname></string-name></person-group>, &#x201C;<article-title>Sentiment classification: The contribution of ensemble learning</article-title>,&#x201D; <source>Decision Support Systems</source>, vol. <volume>57</volume>, pp. <fpage>77</fpage>&#x2013;<lpage>93</lpage>, <year>2014</year>.</mixed-citation></ref>
<ref id="ref-36"><label>[36]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Kang</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Kang</surname></string-name></person-group>, &#x201C;<article-title>A stacking ensemble classifier with handcrafted and convolutional features for wafer map pattern classification</article-title>,&#x201D; <source>Computers in Industry</source>, vol. <volume>129</volume>, pp. <fpage>103450</fpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-37"><label>[37]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>O.</given-names> <surname>Sagi</surname></string-name> and <string-name><given-names>L.</given-names> <surname>Rokach</surname></string-name></person-group>, &#x201C;<article-title>Ensemble learning: A survey</article-title>,&#x201D; <source>Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery</source>, vol. <volume>8</volume>, no. <issue>4</issue>, pp. <fpage>e1249</fpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-38"><label>[38]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Wang</surname></string-name> and <string-name><given-names>K. K.</given-names> <surname>Paliwal</surname></string-name></person-group>, &#x201C;<article-title>Feature extraction and dimensionality reduction algorithms and their applications in vowel recognition</article-title>,&#x201D; <source>Pattern Recognition</source>, vol. <volume>36</volume>, no. <issue>10</issue>, pp. <fpage>2429</fpage>&#x2013;<lpage>2439</lpage>, <year>2003</year>.</mixed-citation></ref>
<ref id="ref-39"><label>[39]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Nixon</surname></string-name> and <string-name><given-names>A.</given-names> <surname>Aguado</surname></string-name></person-group>, &#x201C;<chapter-title>Image processing</chapter-title>,&#x201D; in <source>Feature Extraction and Image Processing for Computer Vision</source>, <publisher-loc>London, UK</publisher-loc>: <publisher-name>Academic Press</publisher-name>, pp. <fpage>83</fpage>&#x2013;<lpage>136</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-40"><label>[40]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>V. F.</given-names> <surname>Leavers</surname></string-name></person-group>, &#x201C;<article-title>Use of the two-dimensional radon transform to generate a taxonomy of shape for the characterization of abrasive powder particles</article-title>,&#x201D; <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>, vol. <volume>22</volume>, no. <issue>12</issue>, pp. <fpage>1411</fpage>&#x2013;<lpage>1423</lpage>, <year>2000</year>.</mixed-citation></ref>
<ref id="ref-41"><label>[41]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><given-names>R. M.</given-names> <surname>Haralick</surname></string-name> and <string-name><given-names>L. G.</given-names> <surname>Shapiro</surname></string-name></person-group>, &#x201C;<chapter-title>Computer vision: Overview</chapter-title>,&#x201D; in <source>Computer and Robot Vision</source>, <publisher-loc>California, USA</publisher-loc>: <publisher-name>Addison-Wesley Longman Publishing Co., Inc.</publisher-name>, <year>1992</year>.</mixed-citation></ref>
<ref id="ref-42"><label>[42]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J. R.</given-names> <surname>Quinlan</surname></string-name></person-group>, &#x201C;<article-title>Learning decision tree classifiers</article-title>,&#x201D; <source>ACM Computing Surveys (CSUR)</source>, vol. <volume>28</volume>, no. <issue>1</issue>, pp. <fpage>71</fpage>&#x2013;<lpage>72</lpage>, <year>1996</year>.</mixed-citation></ref>
<ref id="ref-43"><label>[43]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Li</surname></string-name> and <string-name><given-names>C.</given-names> <surname>Claramunt</surname></string-name></person-group>, &#x201C;<article-title>A spatial entropy-based decision tree for classification of geographical information</article-title>,&#x201D; <source>Transactions in GIS</source>, vol. <volume>10</volume>, no. <issue>3</issue>, pp. <fpage>451</fpage>&#x2013;<lpage>467</lpage>, <year>2006</year>.</mixed-citation></ref>
<ref id="ref-44"><label>[44]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Paul</surname></string-name>, <string-name><given-names>D. P.</given-names> <surname>Mukherjee</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Das</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Gangopadhyay</surname></string-name>, <string-name><given-names>A. R.</given-names> <surname>Chintha</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Improved random forest for classification</article-title>,&#x201D; <source>IEEE Transactions on Image Processing</source>, vol. <volume>27</volume>, no. <issue>8</issue>, pp. <fpage>4012</fpage>&#x2013;<lpage>4024</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-45"><label>[45]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Zong</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Zhu</surname></string-name> and <string-name><given-names>D.</given-names> <surname>Cheng</surname></string-name></person-group>, &#x201C;<article-title>Learning k for kNN classification</article-title>,&#x201D; <source>ACM Transactions on Intelligent Systems and Technology (TIST)</source>, vol. <volume>8</volume>, no. <issue>3</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>19</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-46"><label>[46]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>G.</given-names> <surname>Guo</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Bell</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Bi</surname></string-name> and <string-name><given-names>K.</given-names> <surname>Greer</surname></string-name></person-group>, &#x201C;<article-title>KNN model-based approach in classification</article-title>,&#x201D; in <conf-name>OTM Confederated Int. Conf. on the Move to Meaningful Internet Systems</conf-name>, <conf-loc>Catania, Italy</conf-loc>, pp. <fpage>986</fpage>&#x2013;<lpage>996</lpage>, <year>2003</year>.</mixed-citation></ref>
<ref id="ref-47"><label>[47]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Joachims</surname></string-name></person-group>, &#x201C;<chapter-title>Making large-scale SVM learning practical</chapter-title>,&#x201D; in <source>Advances in Kernel Methods</source>, <publisher-loc>London, England</publisher-loc>: <publisher-name>Technical Report</publisher-name>, pp. <fpage>169</fpage>, <year>1998</year>.</mixed-citation></ref>
<ref id="ref-48"><label>[48]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Huang</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Shi</surname></string-name> and <string-name><given-names>J. A.</given-names> <surname>Suykens</surname></string-name></person-group>, &#x201C;<article-title>Support vector machine classifier with pinball loss</article-title>,&#x201D; <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>, vol. <volume>36</volume>, no. <issue>5</issue>, pp. <fpage>984</fpage>&#x2013;<lpage>997</lpage>, <year>2013</year>.</mixed-citation></ref>
<ref id="ref-49"><label>[49]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>Lau</surname></string-name> and <string-name><given-names>Q.</given-names> <surname>Wu</surname></string-name></person-group>, &#x201C;<article-title>Online training of support vector classifier</article-title>,&#x201D; <source>Pattern Recognition</source>, vol. <volume>36</volume>, no. <issue>8</issue>, pp. <fpage>1913</fpage>&#x2013;<lpage>1920</lpage>, <year>2003</year>.</mixed-citation></ref>
<ref id="ref-50"><label>[50]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Bieshaar</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Zernetsch</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Hubert</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Sick</surname></string-name> and <string-name><given-names>K.</given-names> <surname>Doll</surname></string-name></person-group>, &#x201C;<article-title>Cooperative starting movement detection of cyclists using convolutional neural networks and a boosted stacking ensemble</article-title>,&#x201D; <source>IEEE Transactions on Intelligent Vehicles</source>, vol. <volume>3</volume>, no. <issue>4</issue>, pp. <fpage>534</fpage>&#x2013;<lpage>544</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-51"><label>[51]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>B.</given-names> <surname>Pavlyshenko</surname></string-name></person-group>, &#x201C;<article-title>Using stacking approaches for machine learning models</article-title>,&#x201D; in <conf-name>2018 IEEE Second Int. Conf. on Data Stream Mining &#x0026; Processing (DSMP)</conf-name>, <conf-loc>Lviv, Ukraine</conf-loc>, pp. <fpage>255</fpage>&#x2013;<lpage>258</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-52"><label>[52]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>T.</given-names> <surname>He</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Benesty</surname></string-name>, <string-name><given-names>V.</given-names> <surname>Khotilovich</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Tang</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Xgboost: Extreme gradient boosting</article-title>,&#x201D; <source>R Package Version 0.4-2</source>, vol. <volume>1</volume>, no. <issue>4</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>4</lpage>, <year>2015</year>.</mixed-citation></ref>
<ref id="ref-53"><label>[53]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>V. B.</given-names> <surname>Vaghela</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Ganatra</surname></string-name> and <string-name><given-names>A.</given-names> <surname>Thakkar</surname></string-name></person-group>, &#x201C;<article-title>Boost a weak learner to a strong learner using ensemble system approach</article-title>,&#x201D; in <conf-name>2009 IEEE Int. Advance Computing Conf.</conf-name>, <conf-loc>Patiala, India</conf-loc>, pp. <fpage>1432</fpage>&#x2013;<lpage>1436</lpage>, <year>2009</year>.</mixed-citation></ref>
<ref id="ref-54"><label>[54]</label><mixed-citation publication-type="web"><collab>MIR Lab</collab>, &#x201C;<article-title>WM-811k datasets</article-title>,&#x201D; in <italic>LSWMD Data (Accessed 12 July 2020)</italic>. [Online]. Available: <uri xlink:href="https://mirlab.org/dataSet/public">https://mirlab.org/dataSet/public</uri>.</mixed-citation></ref>
<ref id="ref-55"><label>[55]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Kaur</surname></string-name>, <string-name><given-names>H. S.</given-names> <surname>Pannu</surname></string-name> and <string-name><given-names>A. K.</given-names> <surname>Malhi</surname></string-name></person-group>, &#x201C;<article-title>A systematic review on imbalanced data challenges in machine learning</article-title>,&#x201D; <source>ACM Computing Surveys</source>, vol. <volume>52</volume>, no. <issue>4</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>36</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-56"><label>[56]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Luo</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Wen</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Fei</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Cheng</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Shuo</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>GPR B-scan image denoising via multi-scale convolutional autoencoder with data augmentation</article-title>,&#x201D; <source>Electronics</source>, vol. <volume>10</volume>, no. <issue>11</issue>, pp. <fpage>1269</fpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-57"><label>[57]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Shi</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Wu</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Guizani</surname></string-name></person-group>, &#x201C;<article-title>Deep features learning for medical image analysis with convolutional autoencoder neural network</article-title>,&#x201D; <source>IEEE Transactions on Big Data</source>, vol. <volume>7</volume>, no. <issue>4</issue>, pp. <fpage>750</fpage>&#x2013;<lpage>758</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-58"><label>[58]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Masci</surname></string-name>, <string-name><given-names>U.</given-names> <surname>Meier</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Cire&#x015F;an</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Schmidhuber</surname></string-name></person-group>, &#x201C;<article-title>Stacked convolutional auto-encoders for hierarchical feature extraction</article-title>,&#x201D; in <conf-name>Int. Conf. on Artificial Neural Networks</conf-name>, <conf-loc>Berlin, Heidelberg, Germany</conf-loc>, pp. <fpage>52</fpage>&#x2013;<lpage>59</lpage>, <year>2011</year>.</mixed-citation></ref>
<ref id="ref-59"><label>[59]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M. S.</given-names> <surname>Seyfio&#x011F;lu</surname></string-name>, <string-name><given-names>A. M.</given-names> <surname>&#x00D6;zbayo&#x011F;lu</surname></string-name> and <string-name><given-names>S. Z.</given-names> <surname>G&#x00FC;rb&#x00FC;z</surname></string-name></person-group>, &#x201C;<article-title>Deep convolutional autoencoder for radar-based classification of similar aided and unaided human activities</article-title>,&#x201D; <source>IEEE Transactions on Aerospace and Electronic Systems</source>, vol. <volume>54</volume>, no. <issue>4</issue>, pp. <fpage>1709</fpage>&#x2013;<lpage>1723</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-60"><label>[60]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Brunet</surname></string-name>, <string-name><given-names>E. R.</given-names> <surname>Vrscay</surname></string-name> and <string-name><given-names>Z.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>On the mathematical properties of the structural similarity index</article-title>,&#x201D; <source>IEEE Transactions on Image Processing</source>, vol. <volume>21</volume>, no. <issue>4</issue>, pp. <fpage>1488</fpage>&#x2013;<lpage>1499</lpage>, <year>2011</year>.</mixed-citation></ref>
<ref id="ref-61"><label>[61]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Hore</surname></string-name> and <string-name><given-names>D.</given-names> <surname>Ziou</surname></string-name></person-group>, &#x201C;<article-title>Image quality metrics: PSNR vs. SSIM</article-title>,&#x201D; in <conf-name>2010 20th Int. Conf. on Pattern Recognition</conf-name>, <conf-loc>Istanbul, Turkey</conf-loc>, pp. <fpage>2366</fpage>&#x2013;<lpage>2369</lpage>, <year>2010</year>.</mixed-citation></ref>
<ref id="ref-62"><label>[62]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>van der Walt</surname></string-name>, <string-name><given-names>J. L.</given-names> <surname>Sch&#x00F6;nberger</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Nunez-Iglesias</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Boulogne</surname></string-name>, <string-name><given-names>J. D.</given-names> <surname>Warner</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Scikit-image: Image processing in Python</article-title>,&#x201D; <source>PeerJ</source>, vol. <volume>2</volume>, pp. <fpage>e453</fpage>, <year>2014</year>.</mixed-citation></ref>
<ref id="ref-63"><label>[63]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>F.</given-names> <surname>Pedregosa</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Varoquaux</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Gramfort</surname></string-name>, <string-name><given-names>V.</given-names> <surname>Michel</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Thirion</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Scikit-learn: Machine learning in Python</article-title>,&#x201D; <source>The Journal of Machine Learning Research</source>, vol. <volume>12</volume>, pp. <fpage>2825</fpage>&#x2013;<lpage>2830</lpage>, <year>2011</year>.</mixed-citation></ref>
<ref id="ref-64"><label>[64]</label><mixed-citation publication-type="web"><collab>Ensemble-PyTorch</collab>, <year>2021</year>. [Online]. Available: <uri xlink:href="https://ensemble-pytorch.readthedocs.io/">https://ensemble-pytorch.readthedocs.io/</uri>.</mixed-citation></ref>
<ref id="ref-65"><label>[65]</label><mixed-citation publication-type="web"><collab>XGBoost</collab>, <year>2016</year>. [Online]. Available: <uri xlink:href="https://xgboost.readthedocs.io/">https://xgboost.readthedocs.io/</uri>.</mixed-citation></ref>
<ref id="ref-66"><label>[66]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Opitz</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Burst</surname></string-name></person-group>, &#x201C;<article-title>Macro F1 and macro F1</article-title>,&#x201D; arXiv preprint arXiv:1911.03347, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-67"><label>[67]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Visa</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Ramsay</surname></string-name>, <string-name><given-names>A. L.</given-names> <surname>Ralescu</surname></string-name> and <string-name><given-names>E.</given-names> <surname>Van Der Knaap</surname></string-name></person-group>, &#x201C;<article-title>Confusion matrix-based feature selection</article-title>,&#x201D; <source>MAICS</source>, vol. <volume>710</volume>, pp. <fpage>120</fpage>&#x2013;<lpage>127</lpage>, <year>2011</year>.</mixed-citation></ref>
<ref id="ref-68"><label>[68]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Kozak</surname></string-name> and <string-name><given-names>U.</given-names> <surname>Boryczka</surname></string-name></person-group>, &#x201C;<article-title>Multiple boosting in the ant colony decision forest meta-classifier</article-title>,&#x201D; <source>Knowledge-Based Systems</source>, vol. <volume>75</volume>, pp. <fpage>141</fpage>&#x2013;<lpage>151</lpage>, <year>2015</year>.</mixed-citation></ref>
<ref id="ref-69"><label>[69]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><given-names>V. K.</given-names> <surname>Ayyadevara</surname></string-name></person-group>, &#x201C;<chapter-title>Gradient boosting machine</chapter-title>,&#x201D; in <source>Pro Machine Learning Algorithms</source>, <publisher-loc>Berkeley, CA, USA</publisher-loc>: <publisher-name>Springer</publisher-name>, pp. <fpage>117</fpage>&#x2013;<lpage>134</lpage>, <year>2018</year>.</mixed-citation></ref>
</ref-list>
</back>
</article>