<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">60319</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2025.060319</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Semi-Supervised New Intention Discovery for Syntactic Elimination and Fusion in Elastic Neighborhoods</article-title>
<alt-title alt-title-type="left-running-head">Semi-Supervised New Intention Discovery for Syntactic Elimination and Fusion in Elastic Neighborhoods</alt-title>
<alt-title alt-title-type="right-running-head">Semi-Supervised New Intention Discovery for Syntactic Elimination and Fusion in Elastic Neighborhoods</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Wu</surname><given-names>Di</given-names></name><xref rid="cor1" ref-type="corresp">&#x002A;</xref><email>wudiwudi@hebeu.edu.cn</email></contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Feng</surname><given-names>Liming</given-names></name></contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western"><surname>Wang</surname><given-names>Xiaoyu</given-names></name></contrib>
<aff id="aff-1"><institution>School of Information and Electronic Engineering, Hebei University of Engineering</institution>, <addr-line>Handan, 056038</addr-line>, <country>China</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Di Wu. Email: <email>wudiwudi@hebeu.edu.cn</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2025</year>
</pub-date>
<pub-date date-type="pub" publication-format="electronic">
<day>26</day><month>03</month><year>2025</year>
</pub-date>
<volume>83</volume>
<issue>1</issue>
<fpage>977</fpage>
<lpage>999</lpage>
<history>
<date date-type="received">
<day>29</day>
<month>10</month>
<year>2024</year>
</date>
<date date-type="accepted">
<day>17</day>
<month>1</month>
<year>2025</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2025 The Authors.</copyright-statement>
<copyright-year>2025</copyright-year>
<copyright-holder>Published by Tech Science Press.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_60319.pdf"></self-uri>
<abstract>
<p>Semi-supervised new intent discovery is a significant research focus in natural language understanding. To address the limitations of current semi-supervised training data and the underutilization of implicit information, a Semi-supervised New Intent Discovery for Elastic Neighborhood Syntactic Elimination and Fusion model (SNID-ENSEF) is proposed. Syntactic elimination contrast learning leverages verb-dominant syntactic features, systematically replacing specific words to enhance data diversity. The radius of the positive sample neighborhood is elastically adjusted to eliminate invalid samples and improve training efficiency. A neighborhood sample fusion strategy, based on sample distribution patterns, dynamically adjusts neighborhood size and fuses sample vectors to reduce noise and improve implicit information utilization and discovery accuracy. Experimental results show that SNID-ENSEF achieves average improvements of 0.88%, 1.27%, and 1.30% in Normalized Mutual Information (NMI), Accuracy (ACC), and Adjusted Rand Index (ARI), respectively, outperforming PTJN, DPN, MTP-CLNN, and DWG models on the Banking77, StackOverflow, and Clinc150 datasets. The code is available at <ext-link ext-link-type="uri" xlink:href="https://github.com/qsdesz/SNID-ENSEF">https://github.com/qsdesz/SNID-ENSEF</ext-link>, accessed on 16 January 2025.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Natural language understanding</kwd>
<kwd>semi-supervised new intent discovery</kwd>
<kwd>syntactic elimination contrast learning</kwd>
<kwd>neighborhood sample fusion strategies</kwd>
<kwd>bidirectional encoder representations from transformers (BERT)</kwd>
</kwd-group>
<funding-group>
<award-group id="awg1">
<funding-source>Nature Science Foundation of Hebei Province</funding-source>
<award-id>F2021402005</award-id>
</award-group>
</funding-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>Dialogue generation is a key research area in natural language processing [<xref ref-type="bibr" rid="ref-1">1</xref>], with intent recognition serving as its foundation. Accurate intent identification is essential for addressing dialogue generation challenges. However, existing models cannot directly recognize undefined intents, requiring unknown intents to be mapped to predefined categories. New intent discovery clusters similar unknown intents, facilitating intent definition and reducing the complexity of dialogue generation across various domains [<xref ref-type="bibr" rid="ref-2">2</xref>]. Leveraging labeled intent data for semi-supervised new intent discovery is crucial [<xref ref-type="bibr" rid="ref-3">3</xref>], as it improves unknown intent recognition and advances dialogue generation development [<xref ref-type="bibr" rid="ref-4">4</xref>].</p>
<p>Pre-trained models possess strong sentence representation capabilities. Bidirectional Encoder Representations from Transformers (BERT), proposed by Devlin et al. [<xref ref-type="bibr" rid="ref-5">5</xref>], laid the foundation for pre-trained models with its encoder-only architecture. It captures rich contextual information using the Masked Language Modeling (MLM) task and the Next Sentence Prediction (NSP) task. Building on BERT, RoBERTa, proposed by Liu et al. [<xref ref-type="bibr" rid="ref-6">6</xref>], removes the NSP objective and optimizes hyperparameters. DistilBERT, introduced by Sanh et al. [<xref ref-type="bibr" rid="ref-7">7</xref>], reduces the size of BERT using knowledge distillation while retaining most of its performance, making it more efficient for real-time applications. ALBERT, proposed by Lan et al. [<xref ref-type="bibr" rid="ref-8">8</xref>], introduces parameter sharing and factorized embedding techniques, significantly reducing the model size while remaining competitive in natural language understanding tasks. Efficiently Learning an Encoder that Classifies Token Replacements Accurately (ELECTRA), proposed by Clark et al. [<xref ref-type="bibr" rid="ref-9">9</xref>], draws inspiration from Generative Adversarial Networks (GANs): the model learns to distinguish original tokens from replaced ones, which yields strong sentence representation capabilities. Despite their success in sentence representation, these models still face challenges related to resource consumption and reliance on large datasets.</p>
<p>Leveraging unlabeled data reduces reliance on manual annotation, which is particularly useful when unlabeled data are abundant. Celik et al. [<xref ref-type="bibr" rid="ref-10">10</xref>] proposed a teacher-student learning paradigm based on feature refinement and pseudo-labeling, minimizing dependence on labeled data. Jin et al. [<xref ref-type="bibr" rid="ref-11">11</xref>] introduced DictABSA, a knowledge-enhanced framework for Aspect-based Sentiment Analysis (ABSA), incorporating descriptive knowledge from the Oxford Dictionary to address the challenge of large-scale supervised corpora. Yang et al. [<xref ref-type="bibr" rid="ref-12">12</xref>] proposed a Node-level Capsule Graph Neural Network (NCGNN) to prevent feature overmixing during learning. Xiu et al. [<xref ref-type="bibr" rid="ref-13">13</xref>] created Semi-supervised Hybrid Tensor Networks (SHTN), utilizing unsupervised modules to generate pseudo-labels. Yang et al. [<xref ref-type="bibr" rid="ref-14">14</xref>] introduced a Sequential Visual and Semantic Consistency (SVSC) semi-supervised learning method, combining visual and semantic aspects with word-level coherence regularization; SVSC uses unlabeled data for visual-semantic integration. Wang et al. [<xref ref-type="bibr" rid="ref-15">15</xref>] proposed a Semiotic Signal Integration Network (SSIN), combining syntactic and semantic features while addressing computational resource demands. Zhao et al. [<xref ref-type="bibr" rid="ref-16">16</xref>] developed PromptMR, a series of prompt learning methods for metonymy resolution, mitigating resource scarcity. While these studies reduce dependence on labeled data, they do not thoroughly address the impact of pseudo-labeling noise.</p>
<p>To address the noise problem associated with unlabeled data, researchers have proposed numerous data augmentation techniques to explicitly expand labeled datasets. Wei et al. [<xref ref-type="bibr" rid="ref-17">17</xref>] introduced Easy Data Augmentation (EDA), which consists of four data augmentation operations aimed at augmenting labeled data. Zhao et al. [<xref ref-type="bibr" rid="ref-18">18</xref>] proposed an edge enhancement technique, utilizing explicit graph-based approaches to expand the labeled data. Whitehouse et al. [<xref ref-type="bibr" rid="ref-19">19</xref>] introduced a novel data augmentation method based on Large Language Models (LLMs), leveraging LLMs to augment raw data at both the context and entity levels. Thakur et al. [<xref ref-type="bibr" rid="ref-20">20</xref>] proposed enhanced sentence embeddings using Siamese BERT networks (SBERT) to improve data quality. Qiu et al. [<xref ref-type="bibr" rid="ref-21">21</xref>] developed a hierarchical framework combining large language models with deep reinforcement learning, effectively inducing cooperative behavior among agents to extract complex semantic information and improve distillation data labeling quality. This approach also makes efficient use of unlabeled data. Ziyaden et al. [<xref ref-type="bibr" rid="ref-22">22</xref>] proposed a combined data augmentation strategy, expanding the training dataset through the integration of EDA techniques with text translation. These studies minimize the impact of label noise through various data augmentation strategies. However, they generally do not fully exploit the information available in the training data.</p>
<p>Contrastive learning exploits the structural information among data points, enriching the use of training data. Qin et al. [<xref ref-type="bibr" rid="ref-23">23</xref>] proposed Clustering Contrastive Learning (CCL), in which cluster graphs are treated as individual graphs in contrastive learning, enhancing the uniformity of the model feature distribution. Xiao et al. [<xref ref-type="bibr" rid="ref-24">24</xref>] proposed Asymmetric Contrastive Learning for Graphs (GraphACL), in which an anchor and its nearby neighbors are selected as positive pairs and contrasted with different samples, yielding discriminative representations of the discourse. Gao et al. [<xref ref-type="bibr" rid="ref-25">25</xref>] proposed a pre-training paradigm based on contrastive learning that considers an asymmetric view of the neighboring nodes, enhancing the model&#x2019;s ability to discover new intents. Xu et al. [<xref ref-type="bibr" rid="ref-26">26</xref>] proposed a novel contrastive learning method to improve diversity and discriminability for domain adaptation (IDD-ICL), designing a new implicit contrastive learning loss at the sample level to implicitly augment the samples in the source domain. These methods use the intrinsic structural information of the data to aid training and increase the amount of training data. However, they do not consider the validity of the training data or the matching of feature vectors to the task.</p>
<p>To alleviate the mismatch between feature acquisition and task, Zhang et al. [<xref ref-type="bibr" rid="ref-27">27</xref>] proposed a Robust and Adaptive Prototypical learning framework (RAP), in which instances are forced to aggregate toward their corresponding prototypes so that decision boundaries suitable for new intent categories are formed. Liang et al. [<xref ref-type="bibr" rid="ref-28">28</xref>] proposed Cluster semantic enhanced Prompt Learning (CsePL), which uses two-level contrastive learning with labeled semantic alignment to diminish the dominance of existing intents and reduce the spacing within classes. Hu et al. [<xref ref-type="bibr" rid="ref-29">29</xref>] proposed Interactive Supervision for New Intent Discovery (INS-NID), which establishes a connection between parameter clustering and representation learning. Oskouei et al. [<xref ref-type="bibr" rid="ref-30">30</xref>] proposed a novel Semi-Supervised Fuzzy c-means approach, which applies adaptive weights to each feature based on its importance in clustering, thereby ensuring an optimal clustering structure. Liu et al. [<xref ref-type="bibr" rid="ref-31">31</xref>] proposed a Multi-view Clustering Intent Discovery Framework (MCIDF), which employs a two-branch representation learning strategy to learn high-quality discourse representations and enhance the degree of cohesion. Shi et al. [<xref ref-type="bibr" rid="ref-32">32</xref>] proposed the Graph Smoothing Filter (GSF), which explicitly utilizes structural relations to filter the high-frequency noise contained in semantically ambiguous samples on the clustering boundary. While these methods improve model adaptability in feature vector extraction, the implicit information in the sample distribution pattern remains underutilized.</p>
<p>In summary, a Semi-supervised New Intent Discovery model for Elastic Neighborhood Syntactic Elimination and Fusion (SNID-ENSEF) is proposed to enhance the utilization of implicit data information. Syntactic elimination contrast learning is employed to maximize valid data usage and reduce invalid training samples, improving training data quality. Features conducive to new intent discovery are generated. Neighborhood sample fusion strategies exploit intrinsic data structure, replacing sample representations with neighborhood cluster representations, thereby reducing task difficulty and improving new intent discovery accuracy.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>The SNID-ENSEF Model</title>
<p>The SNID-ENSEF model framework is shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>The SNID-ENSEF model framework</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_60319-fig-1.tif"/>
</fig>
<p>In <xref ref-type="fig" rid="fig-1">Fig. 1</xref>, the framework is divided into three parts. The first part presents the Semi-supervised New Intent Discovery Framework, which includes sentence representation pre-training, sentence representation learning, and new intent discovery. Sentence representation pre-training uses the Banking dataset, containing both labeled &#x201C;known data&#x201D; and unlabeled &#x201C;unknown data.&#x201D; For the known data, a cross-entropy classification task is performed, while for the unknown data, a mask prediction task is used. The two tasks are trained jointly, and the outputs are taken from the pooling layer. Sentence representation learning applies elastic neighborhood selection, where the neighborhood radius is determined by an elastic algorithm. The positive sample domain is refined by calculating an elimination ratio using supervised information to reduce ineffective samples. Data augmentation replaces verbs with semantically similar ones to enhance data diversity, followed by the computation of the contrastive learning loss to complete the sentence representation training. For new intent discovery, the trained model generates sentence representations, which are processed through a nearest-neighbor fusion strategy. The nearest-neighbor domain size is selected, and samples are fused to obtain the final representation. Intent classification algorithms are then used to discover new intents and form new intent clusters. The second part illustrates the Example of Invalid Sample Elimination, showing the proportion of ineffective and effective samples in a pie chart, where two ineffective samples are eliminated from a total of 20, increasing the proportion of effective samples. The third part depicts the Example of Sample Fusion, where the Neighbor Sample represents the neighboring domain, the Original Sample is the pre-fusion sample, and the Fusional Sample is the resulting fused sample. A sample&#x2019;s neighborhood is selected and fused using mean aggregation to improve the accuracy of the representation.</p>
<sec id="s2_1">
<label>2.1</label>
<title>Sentence Representation Pre-Training</title>
<p>High-quality sentence representation is essential for accurate new intent discovery. Multi-task pre-training is conducted using the BERT model to adapt representations for this task, integrating masked language modeling and sentence classification. Through predicting missing words and classifying sentences, the model learns intent-aware representations, enhancing its ability to handle unseen topics and diverse intents. The core of masked language modeling is to mask certain words in a sentence and predict them based on the remaining context. This process enables the model to capture both the semantics of intent-related words and the overall sentence structure. An illustration of this task is shown in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>.</p>
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>The illustration of the masked language modeling task</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_60319-fig-2.tif"/>
</fig>
<p>In <xref ref-type="fig" rid="fig-2">Fig. 2</xref>, text denotes the text input to the model. Some words in the sentence are masked using a random masking strategy, and the model outputs its predictions for the masked words. Predicting masked words in a sentence allows the model to understand the internal structure of the sentence and learn sentence information. The equation of loss <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:mrow><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is shown below:
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:mrow><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mo>&#x2212;</mml:mo><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:munderover><mml:mrow><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> denotes the masked prediction probability distribution, <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:mrow><mml:msub><mml:mrow><mml:mi>W</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> denotes the predicted word, <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:mrow><mml:msub><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> denotes the masked sentence, and 
<inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is the number of masked words. Minimizing this loss improves the model&#x2019;s ability to predict masked words from sentence context and to capture sentence dependencies more effectively. The core idea of the classification task is to generate deep feature representations of the sentences, from which the corresponding sentence label is then calculated. Key features of the text are extracted during the classification process, improving sentence comprehension. The illustration of the classification task is shown in <xref ref-type="fig" rid="fig-3">Fig. 3</xref>.</p>
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>The illustration of classification task</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_60319-fig-3.tif"/>
</fig>
<p>In <xref ref-type="fig" rid="fig-3">Fig. 3</xref>, <italic>CLS</italic> is the output of the model, Model Output is the model output layer, Linear Layer is the linear layer, and Softmax is the normalization layer. The <italic>CLS</italic> output from the model is fed into a linear layer. Several linear layers reduce the high-dimensional features to match the number of classes. A Softmax layer then normalizes the outputs into probabilities between 0 and 1, giving the classification probability <inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:mrow><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>,</mml:mo><mml:mi>c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>. Together with the class label, this probability is used to compute the cross-entropy loss. The equation of cross-entropy classification loss <inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:mrow><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>C</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is shown below:
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:mrow><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>C</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:munderover><mml:mrow><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>c</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>C</mml:mi></mml:mrow></mml:munderover><mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x03C6;</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>,</mml:mo><mml:mi>c</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>,</mml:mo><mml:mi>c</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is the number of categorized samples, <italic>C</italic> is the number of categories, <inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:mrow><mml:msub><mml:mrow><mml:mi>&#x03C6;</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>,</mml:mo><mml:mi>c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is a symbolic function (0 or 1), indicating that the true category of 
sample <italic>n</italic> is <italic>c</italic> (it takes 1 in that case and 0 otherwise), and <inline-formula><mml:math><mml:mrow><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>,</mml:mo><mml:mi>c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is the predicted probability that sample <italic>n</italic> belongs to category <italic>c</italic>. The masked language modeling loss is added to the classification loss, resulting in the total multi-task pre-training loss <inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:mrow><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mi>u</mml:mi><mml:mi>l</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>. The equation of <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:mrow><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mi>u</mml:mi><mml:mi>l</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is shown below:
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:mrow><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mi>u</mml:mi><mml:mi>l</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>C</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-12"><mml:math id="mml-ieqn-12"><mml:mrow><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> denotes the loss of the masked language modeling task, and <inline-formula id="ieqn-13"><mml:math id="mml-ieqn-13"><mml:mrow><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>C</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> denotes the loss of the classification task. Joint training of the two tasks helps prevent the SNID-ENSEF model from overfitting on a single task or data type. A balanced optimization across different tasks results in effective initial sentence representations, providing a solid foundation for subsequent training. The final layer of the SNID-ENSEF model connects to a mean pooling layer, preserving overall semantic information. The representation vectors for each word are averaged, producing a sentence vector that captures the combined semantic information of the words in the sentence. The equation of the pooled representation <inline-formula id="ieqn-14"><mml:math id="mml-ieqn-14"><mml:mrow><mml:msub><mml:mrow><mml:mi>Y</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi><mml:mi>o</mml:mi><mml:mi>o</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is shown below:
<disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:mrow><mml:msub><mml:mrow><mml:mi>Y</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi><mml:mi>o</mml:mi><mml:mi>o</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:munderover><mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:math></disp-formula>where <inline-formula id="ieqn-15"><mml:math id="mml-ieqn-15"><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>w</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> represents the number of word vectors, <inline-formula id="ieqn-16"><mml:math id="mml-ieqn-16"><mml:mrow><mml:msub><mml:mrow><mml:mi>F</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> denotes the feature vector of a word, and <inline-formula id="ieqn-17"><mml:math id="mml-ieqn-17"><mml:mi>i</mml:mi></mml:math></inline-formula> indicates the position of the word vector. Mean pooling is applied to the word vectors by calculating the average representation of all words. The influence of random or irrelevant words is reduced on the overall representation. The overall representation improves stability and mitigates noise to some extent. The process facilitates subsequent training and the discovery of new intents.</p>
<p>Multi-task pre-training allows the SNID-ENSEF model to learn data features from different perspectives. The distinguishability of sentence representations is enhanced, and understanding of intent domain sentences is improved. The pooling layer outputs sentence representations, reducing the impact of noise and reinforcing stability.</p>
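<p>The quantities in Eqs. (1)&#x2013;(4) can be illustrated with a minimal Python sketch. It is not the authors&#x2019; implementation: the function names and the toy probabilities are hypothetical, and plain lists stand in for the tensors a BERT encoder would produce.</p>
<code language="python"><![CDATA[
```python
import math


def masked_lm_loss(true_word_probs):
    """Eq. (1): negative sum of log-probabilities over the N_m masked words."""
    return -sum(math.log(p) for p in true_word_probs)


def classification_loss(probs, labels):
    """Eq. (2): mean cross-entropy; the indicator keeps only the true class."""
    n_c = len(labels)
    return -sum(math.log(probs[n][labels[n]]) for n in range(n_c)) / n_c


def multi_task_loss(l_m, l_c):
    """Eq. (3): L_Multi = L_M + L_C."""
    return l_m + l_c


def mean_pool(word_vectors):
    """Eq. (4): average the N_w word vectors dimension by dimension."""
    n_w = len(word_vectors)
    dim = len(word_vectors[0])
    return [sum(vec[d] for vec in word_vectors) / n_w for d in range(dim)]


# Toy inputs (assumed values, for illustration only).
mlm_probs = [0.5, 0.25]                            # P(W_m | T_m) per masked word
class_probs = [[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]]   # softmax outputs P_{n,c}
class_labels = [0, 1]                              # true class index per sample

l_m = masked_lm_loss(mlm_probs)
l_c = classification_loss(class_probs, class_labels)
l_multi = multi_task_loss(l_m, l_c)
sentence_vector = mean_pool([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
```
]]></code>
<p>In this sketch the two losses are summed with equal weight, as in Eq. (3), and the pooled sentence vector has the same dimension as each word vector.</p>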
</sec>
<sec id="s2_2">
<label>2.2</label>
<title>Sentence Representation Learning</title>
<p>After multi-task pre-training, universal intent sentence representations are obtained, but they lack task-specific optimization for new intent discovery. To address this, a syntactic contrastive learning approach is proposed. First, syntactic elimination increases the proportion of valid samples in the positive sample domain by removing invalid ones, improving training efficiency. This is akin to clearing clutter, allowing the model to focus on relevant data. Second, syntactic data augmentation enriches sample diversity, introducing varied representations within the same category. Together, these strategies help the model more effectively locate useful samples and benefit from enhanced sample diversity, improving new intent discovery.</p>
<p>The selection of positive samples is crucial in contrastive learning. A semi-supervised approach is used to maximize the number of positive samples for training: supervised information is used to flexibly adjust the neighborhood radius of positive samples and thereby define the positive sample domain. The elastic neighborhood radius <italic>R</italic> is chosen so that the positive sample domain contains as many samples as possible while its boundary excludes ineffective samples. Finding the elastic neighborhood radius amounts to identifying the region around each sample that balances useful data against irrelevant noise, ensuring more accurate intent classification. Within an appropriate elastic neighborhood radius, only the most relevant data surrounding each example is included. This process is akin to continuously zooming in or out until the optimal level of detail is achieved. The selection of the elastic neighborhood radius <italic>R</italic> is shown in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>.</p>
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>The selection of the elastic neighborhood radius <italic>R</italic></title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_60319-fig-4.tif"/>
</fig>
<p>In <xref ref-type="fig" rid="fig-4">Fig. 4</xref>, <italic>R</italic> represents the elastic neighborhood radius, <italic>N</italic> denotes the number of iterations, and <italic>M</italic><sub><italic>N</italic></sub> is a flag indicating whether invalid samples exist in the neighborhood during the <italic>N</italic>-th iteration. <italic>M</italic> = 0 means that there are no invalid samples in the neighborhood, and <italic>M</italic> = 1 means that there are invalid samples in the neighborhood. <italic>K</italic> represents the state change counter. When <italic>K</italic> reaches 2, the neighborhood radius has undergone a large-small-moderate or small-large-moderate state change, meaning that <italic>R</italic> is the appropriate neighborhood radius. The condition <italic>N</italic> = 0 is used to check whether the loop is in its first iteration. During the first iteration, there is no historical state, so the comparison of states is skipped. The overall process begins by calculating the upper and lower bounds of the elastic neighborhood radius <italic>R</italic> as the initial input. A binary search is then used to find the appropriate neighborhood radius. If invalid samples are found within the radius <italic>R</italic>, the radius is reduced until no invalid samples are present, and then <italic>R</italic> is increased until invalid samples are just present. If no invalid samples are found within the neighborhood of radius <italic>R</italic>, <italic>R</italic> is first increased until invalid samples are present and then decreased until no invalid samples are found. The elastic neighborhood radius <italic>R</italic> is illustrated in <xref ref-type="fig" rid="fig-5">Fig. 5</xref>.</p>
<fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>The illustration of the elastic neighborhood radius <italic>R</italic></title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_60319-fig-5.tif"/>
</fig>
<p>In <xref ref-type="fig" rid="fig-5">Fig. 5</xref>, yellow samples represent valid samples, while green samples represent invalid samples. The orange cross-circles indicate a radius that is too large, which brings too many invalid samples into the neighborhood. The yellow dashed circles represent a radius that is slightly too small and therefore excludes valid samples that could have been included. The green circles represent the appropriate neighborhood radius. If the judgment condition were <italic>K</italic> &#x003C; 1, the search would stop at the first state change, so the green circle could not be obtained and the result would always be either too large or too small. Therefore, the judgment condition is set to <inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:mrow><mml:mi>K</mml:mi></mml:mrow><mml:mo>&#x003C;</mml:mo><mml:mn>2</mml:mn></mml:math></inline-formula>, allowing for one further adjustment after the radius becomes too large or too small.</p>
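<p>The radius search described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions rather than the implementation used in the experiments: the function names, the representation of a neighborhood as a list of distances with supervised invalid flags, and the choice to return the largest invalid-free radius visited are all hypothetical.</p>
<code language="python"><![CDATA[
```python
def has_invalid(distances, is_invalid, radius):
    """Flag M: 1 if a known-invalid sample lies within `radius`, else 0."""
    return int(any(d <= radius and bad for d, bad in zip(distances, is_invalid)))


def elastic_radius(distances, is_invalid, max_changes=2, max_iter=50):
    """Bisection search for a radius that keeps known-invalid samples out.

    All names are illustrative; `max_changes=2` mirrors the condition K < 2.
    """
    lo, hi = 0.0, max(distances)      # lower and upper bounds on R
    radius = (lo + hi) / 2
    prev_m = None                     # no history in the first iteration (N = 0)
    k = 0                             # state change counter K
    best = lo                         # largest invalid-free radius seen so far
    for _ in range(max_iter):
        m = has_invalid(distances, is_invalid, radius)
        if prev_m is not None and m != prev_m:
            k += 1                    # M flipped between two iterations
        if m == 0:
            best = max(best, radius)
        if k >= max_changes:          # the condition K < 2 no longer holds
            break
        if m == 1:
            hi = radius               # radius too large: shrink
        else:
            lo = radius               # invalid-free: try a larger radius
        radius = (lo + hi) / 2
        prev_m = m
    return best


# Hypothetical neighbourhood: distances to an anchor and supervised flags.
distances = [0.1, 0.2, 0.3, 0.6, 0.7, 0.9]
is_invalid = [False, False, False, True, False, True]
r = elastic_radius(distances, is_invalid)
```
]]></code>
<p>In this hypothetical neighborhood, the returned radius lies between the farthest valid sample before the first invalid one (0.3) and that invalid sample (0.6), so the three nearest valid samples are kept and the invalid sample is excluded.</p>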
<p>Because the positive sample domain contains ineffective samples, most of which are unlabeled and therefore unknown, supervised information is used to count the ineffective samples under the selection strategy and to remove the ineffective samples from the supervised portion. This count is then used to estimate the ineffective sample ratio in the positive sample domain and to calculate the sample reduction rate. This ensures that, after elimination, the overall proportion of effective samples increases, improving the training effectiveness of syntactic contrastive learning. Before the elimination of ineffective samples, the equation of the sample efficiency <inline-formula id="ieqn-19"><mml:math id="mml-ieqn-19"><mml:mi>&#x03B5;</mml:mi></mml:math></inline-formula> under the initial elastic neighborhood radius <italic>R</italic> is shown below:
<disp-formula id="eqn-5"><label>(5)</label><mml:math id="mml-eqn-5" display="block"><mml:mi>&#x03B5;</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:math></disp-formula>where <inline-formula id="ieqn-20"><mml:math id="mml-ieqn-20"><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> represents the total number of samples, and <inline-formula id="ieqn-21"><mml:math id="mml-ieqn-21"><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> denotes the estimated number of ineffective samples. The process of eliminating ineffective samples can be understood as extracting <italic>L</italic> samples from <inline-formula id="ieqn-22"><mml:math id="mml-ieqn-22"><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> samples. After extracting <italic>L</italic> samples, the proportion of effective samples among those remaining can fall into three scenarios: remaining unchanged, decreasing, or increasing. When exactly <italic>Q</italic> of the <italic>L</italic> extracted samples are effective, the efficiency remains unchanged. The equation for calculating <italic>Q</italic> is shown below:
<disp-formula id="eqn-6"><label>(6)</label><mml:math id="mml-eqn-6" display="block"><mml:mi>Q</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03B5;</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi>L</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula>where <inline-formula id="ieqn-23"><mml:math id="mml-ieqn-23"><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> represents the total number of samples, <inline-formula id="ieqn-24"><mml:math id="mml-ieqn-24"><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> denotes the estimated number of ineffective samples, <inline-formula id="ieqn-25"><mml:math id="mml-ieqn-25"><mml:mi>&#x03B5;</mml:mi></mml:math></inline-formula> indicates the efficiency of the elastic neighborhood samples, and <italic>L</italic> signifies the number of samples to be eliminated. When <italic>L</italic> samples are extracted, and the number of effective samples is greater than <italic>Q</italic>, the efficiency after extraction is less than that before extraction. The equation for calculating the probability <inline-formula id="ieqn-26"><mml:math id="mml-ieqn-26"><mml:mrow><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>O</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is shown below:
<disp-formula id="eqn-7"><label>(7)</label><mml:math id="mml-eqn-7" display="block"><mml:mrow><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>O</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>e</mml:mi><mml:mo>=</mml:mo><mml:mi>Q</mml:mi></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:munderover><mml:mrow><mml:mi>P</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>X</mml:mi><mml:mo>=</mml:mo><mml:mi>e</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-27"><mml:math id="mml-ieqn-27"><mml:mi>P</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>X</mml:mi><mml:mo>=</mml:mo><mml:mi>e</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> represents the probability that exactly <italic>e</italic> of the eliminated samples are effective, and the summation gives the cumulative probability. When the number of extracted effective samples is less than <italic>Q</italic>, the efficiency after extraction is higher than that before extraction. The equation for calculating the probability <inline-formula id="ieqn-28"><mml:math id="mml-ieqn-28"><mml:mrow><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is shown below:
<disp-formula id="eqn-8"><label>(8)</label><mml:math id="mml-eqn-8" display="block"><mml:mrow><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>e</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>Q</mml:mi></mml:mrow></mml:munderover><mml:mrow><mml:mi>P</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>X</mml:mi><mml:mo>=</mml:mo><mml:mi>e</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-29"><mml:math id="mml-ieqn-29"><mml:mi>P</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>X</mml:mi><mml:mo>=</mml:mo><mml:mi>e</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> represents the probability of extracting exactly <italic>e</italic> effective samples, and the summation gives the cumulative probability. After calculating the probabilities, the value of <italic>L</italic> is selected for which the probability <inline-formula id="ieqn-30"><mml:math id="mml-ieqn-30"><mml:mrow><mml:msub><mml:mrow><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> exceeds 50%. The negative impact of ineffective samples on training effectiveness far exceeds the benefit of retaining a small proportion of additional effective samples. Therefore, when this probability exceeds 50%, the overall training performance of the model is expected to improve. The equation for calculating the reduction rate <inline-formula id="ieqn-31"><mml:math id="mml-ieqn-31"><mml:mi>p</mml:mi></mml:math></inline-formula> is shown below:
<disp-formula id="eqn-9"><label>(9)</label><mml:math id="mml-eqn-9" display="block"><mml:mi>p</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mi>L</mml:mi><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:math></disp-formula>where <italic>L</italic> represents the number of eliminated samples, and <inline-formula id="ieqn-32"><mml:math id="mml-ieqn-32"><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> denotes the total number of samples. The equation for calculating the positive sample domain <inline-formula id="ieqn-33"><mml:math id="mml-ieqn-33"><mml:mrow><mml:msub><mml:mrow><mml:mi>H</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is shown below:
<disp-formula id="eqn-10"><label>(10)</label><mml:math id="mml-eqn-10" display="block"><mml:mrow><mml:msub><mml:mrow><mml:mi>H</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>H</mml:mi></mml:mrow><mml:mrow><mml:mi>o</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula>where <inline-formula id="ieqn-34"><mml:math id="mml-ieqn-34"><mml:mrow><mml:msub><mml:mrow><mml:mi>H</mml:mi></mml:mrow><mml:mrow><mml:mi>o</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> represents the original positive sample domain, and <inline-formula id="ieqn-35"><mml:math id="mml-ieqn-35"><mml:mi>p</mml:mi></mml:math></inline-formula> denotes the reduction rate. That is, samples in the positive sample domain are eliminated with a probability of <inline-formula id="ieqn-36"><mml:math id="mml-ieqn-36"><mml:mi>p</mml:mi></mml:math></inline-formula>. After obtaining a suitable positive sample domain, data augmentation is applied to the positive samples using a combination of syntactic data augmentation and random token replacement. Data augmentation increases the diversity of the training data, enhancing the model&#x2019;s ability to recognize sentences. The positive sample data augmentation is shown in <xref ref-type="fig" rid="fig-6">Fig. 6</xref>.</p>
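<p>The quantities in Eqs. (5), (6), (8), and (9) can be illustrated with a short Python sketch. The distribution of <italic>X</italic> is not specified in the text, so the sketch assumes, purely for illustration, that the <italic>L</italic> samples are removed uniformly at random without replacement, which makes <italic>X</italic> hypergeometric. The function names and the numeric values are hypothetical.</p>
<code language="python"><![CDATA[
```python
from fractions import Fraction
from math import comb, floor


def efficiency(n_s, n_v):
    """Eq. (5): share of effective samples before elimination (exact fraction)."""
    return Fraction(n_s - n_v, n_s)


def break_even_q(n_s, n_v, l):
    """Eq. (6): number of effective samples among the L removed ones that
    leaves the efficiency unchanged."""
    return n_s - n_v - efficiency(n_s, n_v) * (n_s - l)


def prob_removed_effective(n_s, n_v, l, e):
    """P(X = e) under the ASSUMED hypergeometric model (uniform removal)."""
    return comb(n_s - n_v, e) * comb(n_v, l - e) / comb(n_s, l)


def p_n(n_s, n_v, l):
    """Eq. (8): cumulative probability for e = 0, ..., Q (Q rounded down)."""
    q = min(floor(break_even_q(n_s, n_v, l)), l)
    return sum(prob_removed_effective(n_s, n_v, l, e) for e in range(q + 1))


def reduction_rate(n_s, l):
    """Eq. (9): p = L / N_s."""
    return l / n_s


# Hypothetical values: 100 samples, 20 estimated ineffective, 5 to remove.
N_S, N_V, L = 100, 20, 5
q = break_even_q(N_S, N_V, L)
probability = p_n(N_S, N_V, L)
rate = reduction_rate(N_S, L)
```
]]></code>
<p>With these hypothetical values, <monospace>break_even_q</monospace> returns 4 and <monospace>reduction_rate</monospace> returns 0.05. Exact fractions are used for Eq. (6) so that rounding <italic>Q</italic> down is not affected by floating-point error.</p>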
<fig id="fig-6">
<label>Figure 6</label>
<caption>
<title>The positive sample data augmentation</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_60319-fig-6.tif"/>
</fig>
<p>In <xref ref-type="fig" rid="fig-6">Fig. 6</xref>, <inline-formula id="ieqn-37"><mml:math id="mml-ieqn-37"><mml:mrow><mml:msub><mml:mrow><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> represents the <italic>i</italic>-th training sample, <inline-formula id="ieqn-38"><mml:math id="mml-ieqn-38"><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mover><mml:mi>J</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> denotes the positive sample uniformly selected from the positive sample domain of <inline-formula id="ieqn-39"><mml:math id="mml-ieqn-39"><mml:mrow><mml:msub><mml:mrow><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>, <italic>G</italic> indicates the sample after syntactic data augmentation, and <italic>S</italic> represents the sample after random token replacement. Syntactic data augmentation involves two steps. The first step uses syntactic information to select a verb as the word to be replaced, and the second step chooses a synonym as its substitute. The sentence &#x201C;What steps are taken to transfer money into my account&#x201D; is used as an example to demonstrate how syntactic augmentation selects and replaces tokens. An example of syntactic replacement is shown in <xref ref-type="fig" rid="fig-7">Fig. 7</xref>.</p>
<fig id="fig-7">
<label>Figure 7</label>
<caption>
<title>The illustration of syntactic replacement</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_60319-fig-7.tif"/>
</fig>
<p>In <xref ref-type="fig" rid="fig-7">Fig. 7</xref>, the original sentence is analyzed using Stanza CoreNLP [<xref ref-type="bibr" rid="ref-33">33</xref>] for part-of-speech tagging, obtaining the part-of-speech tag of each word in the sentence, such as &#x2018;VB&#x2019; for verbs and &#x2018;NN&#x2019; for nouns. &#x2018;Syntactic Select&#x2019; refers to the process of selecting the word with the part-of-speech tag &#x2018;VB&#x2019; (here, the word &#x2018;transfer&#x2019;). &#x2018;Similarity search and select&#x2019; refers to searching for the list of candidate words with the highest semantic similarity to the selected word. Using an open-source online dictionary [<xref ref-type="bibr" rid="ref-34">34</xref>], a semantic similarity search is performed for the word &#x2018;transfer&#x2019;, identifying the 15 most semantically similar words as candidates for replacement. One word is randomly selected from this list to replace the word in the VB position, which yields the syntactically augmented sentence. The illustration of random token replacement is shown in <xref ref-type="fig" rid="fig-8">Fig. 8</xref>.</p>
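<p>The two steps of syntactic replacement can be sketched in Python as follows. To keep the sketch self-contained, it does not call a tagger or a dictionary: the part-of-speech tags are assigned by hand and the candidate list is a hypothetical stand-in for the 15 retrieved words, so both are assumptions rather than actual tool output. The function name is also illustrative.</p>
<code language="python"><![CDATA[
```python
import random


def syntactic_replace(tagged_tokens, candidates, rng, target_tag="VB"):
    """Replace the first token tagged `target_tag` with a random candidate.

    tagged_tokens: list of (token, tag) pairs.
    candidates: dict mapping a token to its list of similar words.
    """
    tokens = [token for token, _ in tagged_tokens]
    for position, (token, tag) in enumerate(tagged_tokens):
        if tag == target_tag and candidates.get(token):
            # Step 2: pick one substitute at random from the candidate list.
            tokens[position] = rng.choice(candidates[token])
            break
    return tokens


# Hand-assigned Penn Treebank style tags (assumption, not tagger output).
tagged = [("What", "WP"), ("steps", "NNS"), ("are", "VBP"), ("taken", "VBN"),
          ("to", "TO"), ("transfer", "VB"), ("money", "NN"), ("into", "IN"),
          ("my", "PRP$"), ("account", "NN")]
# Hypothetical candidate list standing in for the dictionary lookup.
candidates = {"transfer": ["move", "send", "shift"]}

augmented = syntactic_replace(tagged, candidates, random.Random(0))
```
]]></code>
<p>Only the token in the VB position changes; all other tokens of the sentence are kept, so the augmented sentence preserves the original syntactic structure.</p>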
<fig id="fig-8">
<label>Figure 8</label>
<caption>
<title>The illustration of random token replacement</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_60319-fig-8.tif"/>
</fig>
<p>In <xref ref-type="fig" rid="fig-8">Fig. 8</xref>, &#x2018;Random select&#x2019; refers to the random selection of the positions of the words to be replaced, while &#x2018;Random replace&#x2019; refers to completing the data augmentation by substituting randomly selected words at those positions. By applying both the syntactic augmentation and the random replacement strategies to the sentences, augmented positive samples are obtained. Subsequently, syntactic elimination contrastive learning is used to train the model. The equation of syntactic elimination contrastive learning loss <inline-formula id="ieqn-40"><mml:math id="mml-ieqn-40"><mml:mrow><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>C</mml:mi><mml:mi>O</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is shown below:
<disp-formula id="eqn-11"><label>(11)</label><mml:math id="mml-eqn-11" display="block"><mml:mrow><mml:msub><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>C</mml:mi><mml:mi>O</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mtext>S</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:mfrac><mml:munder><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>M</mml:mi></mml:mrow><mml:mrow><mml:mi>S</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:munder><mml:mrow><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mfrac><mml:mrow><mml:mi>exp</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi>S</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mi>G</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mi>&#x03C4;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mi>exp</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi>S</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mi>&#x03C4;</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msubsup><mml:mo 
movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x2260;</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>M</mml:mi></mml:mrow><mml:mrow><mml:mi>N</mml:mi><mml:mi>e</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:msubsup><mml:mrow><mml:mi>exp</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi>S</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mrow><mml:mover><mml:mi>J</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow></mml:mrow></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mi>&#x03C4;</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mrow></mml:mfrac></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-41"><mml:math id="mml-ieqn-41"><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>S</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> represents the number of samples, <inline-formula id="ieqn-42"><mml:math id="mml-ieqn-42"><mml:mrow><mml:msub><mml:mrow><mml:mi>M</mml:mi></mml:mrow><mml:mrow><mml:mi>S</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> denotes the sample index, <inline-formula id="ieqn-43"><mml:math id="mml-ieqn-43"><mml:mrow><mml:msub><mml:mrow><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is the sentence embedding of the <inline-formula id="ieqn-44"><mml:math id="mml-ieqn-44"><mml:mi>i</mml:mi></mml:math></inline-formula>-th sample, <inline-formula id="ieqn-45"><mml:math 
id="mml-ieqn-45"><mml:mrow><mml:msub><mml:mrow><mml:mi>M</mml:mi></mml:mrow><mml:mrow><mml:mi>N</mml:mi><mml:mi>e</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is the index of the negative sample for <inline-formula id="ieqn-46"><mml:math id="mml-ieqn-46"><mml:mrow><mml:msub><mml:mrow><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>, <italic>S</italic> is the sentence embedding of sample <inline-formula id="ieqn-47"><mml:math id="mml-ieqn-47"><mml:mrow><mml:msub><mml:mrow><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> after random replacement, <italic>G</italic> is the sentence embedding of sample <inline-formula id="ieqn-48"><mml:math id="mml-ieqn-48"><mml:mrow><mml:msub><mml:mrow><mml:mi>J</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> after syntactic data augmentation, and <inline-formula id="ieqn-49"><mml:math id="mml-ieqn-49"><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mover><mml:mi>J</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is the <inline-formula id="ieqn-50"><mml:math id="mml-ieqn-50"><mml:mi>k</mml:mi></mml:math></inline-formula>-th embedding of the negative pair after augmentation. <inline-formula id="ieqn-51"><mml:math id="mml-ieqn-51"><mml:mi>&#x03C4;</mml:mi></mml:math></inline-formula> is the temperature parameter, and <inline-formula id="ieqn-52"><mml:math id="mml-ieqn-52"><mml:mi>S</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mo>.</mml:mo><mml:mo>,</mml:mo><mml:mo>.</mml:mo><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> is the similarity function on the normalized feature vectors.</p>
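<p>A plain Python sketch of the loss in Eq. (11) is given below. It is an illustration under stated assumptions, not the training code: cosine similarity is assumed for the similarity function, the negatives of a sample are assumed to be the augmented views of all other samples in the batch, and the temperature value 0.5 is a placeholder. The function names and the toy embeddings are hypothetical.</p>
<code language="python"><![CDATA[
```python
import math


def cosine(u, v):
    """Similarity on normalized vectors (assumed to be cosine similarity)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def contrastive_loss(anchors, syn_views, rand_views, tau=0.5):
    """Eq. (11) for a batch of at least two samples.

    anchors[i]    : embedding J_i
    syn_views[i]  : embedding G of J_i after syntactic augmentation
    rand_views[i] : embedding S of J_i after random token replacement
    Negatives (assumption): both augmented views of every other sample.
    """
    n = len(anchors)
    total = 0.0
    for i in range(n):
        positive = (math.exp(cosine(anchors[i], syn_views[i]) / tau)
                    + math.exp(cosine(anchors[i], rand_views[i]) / tau))
        negative = sum(
            math.exp(cosine(anchors[i], views[k]) / tau)
            for k in range(n) if k != i
            for views in (syn_views, rand_views)
        )
        total += math.log(positive / negative)
    return -total / n


# Toy two-dimensional embeddings (hypothetical).
anchors = [[1.0, 0.0], [0.0, 1.0]]
syn = [[0.9, 0.1], [0.1, 0.9]]
rand = [[1.0, 0.2], [0.2, 1.0]]

aligned = contrastive_loss(anchors, syn, rand)
mismatched = contrastive_loss(anchors, syn[::-1], rand[::-1])
```
]]></code>
<p>In this toy batch, pairing each sample with its own augmented views gives a lower loss than pairing it with the views of the other sample, which is the behavior the loss is meant to reward.</p>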
<p>The model optimizes syntactic elimination contrastive learning to cluster sentences with the same intent, reducing representation differences and achieving a more compact distribution in vector space. Conversely, it maximizes differences between representations of different intents, creating a more dispersed distribution. This adjustment of sentence representations establishes a solid foundation for new intent discovery.</p>
</sec>
<sec id="s2_3">
<label>2.3</label>
<title>New Intent Discovery</title>
<p>In new intent discovery, the distances between samples are key to accurate intent identification. Large intra-class distances can separate samples of the same intent, while small inter-class distances may cause misclassification. To address this, a neighborhood sample fusion strategy replaces each sample vector with the mean of its neighborhood vectors, reducing the influence of noise and outliers. This results in more compact representations, decreasing intra-class distance while increasing inter-class distance, thereby improving the accuracy of new intent discovery. The neighborhood sample fusion strategy is illustrated in <xref ref-type="fig" rid="fig-9">Fig. 9</xref>.</p>
<fig id="fig-9">
<label>Figure 9</label>
<caption>
<title>The illustration of the neighborhood sample fusion strategy</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_60319-fig-9.tif"/>
</fig>
<p>In <xref ref-type="fig" rid="fig-9">Fig. 9</xref>, the numbers represent sample indices. In <xref ref-type="fig" rid="fig-9">Fig. 9a</xref>, the samples to be tested are selected as the central samples. The neighborhood sample fusion strategy is applied to the green sample with index 0, selecting the <italic>k</italic> nearest-neighbors of the 0-th sample vector. In <xref ref-type="fig" rid="fig-9">Fig. 9b</xref>, the size of the sample neighborhood is determined based on the elastic neighborhood selection strategy. The yellow samples are the neighbors of the green sample at index 0, while the pink samples represent other samples surrounding the green sample. In <xref ref-type="fig" rid="fig-9">Fig. 9c</xref>, the 0-th sample is replaced with the mean of its neighbor samples. The equation of the mean of the sample vectors <inline-formula id="ieqn-53"><mml:math id="mml-ieqn-53"><mml:mrow><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>E</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> is shown below:
<disp-formula id="eqn-12"><label>(12)</label><mml:math id="mml-eqn-12" display="block"><mml:mrow><mml:msub><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mi>E</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:msubsup><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mi>S</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mi>k</mml:mi></mml:mfrac></mml:math></disp-formula>where <inline-formula id="ieqn-54"><mml:math id="mml-ieqn-54"><mml:mi>k</mml:mi></mml:math></inline-formula> represents the number of nearest-neighbor samples, <inline-formula id="ieqn-55"><mml:math id="mml-ieqn-55"><mml:mi>n</mml:mi></mml:math></inline-formula> is the index of the nearest-neighbor sample, and <inline-formula id="ieqn-56"><mml:math id="mml-ieqn-56"><mml:mrow><mml:msub><mml:mrow><mml:mi>S</mml:mi></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> denotes the vector of the <inline-formula id="ieqn-57"><mml:math id="mml-ieqn-57"><mml:mi>n</mml:mi></mml:math></inline-formula>-th neighbor sample. The neighborhood sample fusion strategy creates cohesive clusters based on similarity and adjacency, which is akin to placing similar items closer together. This reduces disorder and improves the accuracy of intent detection. The neighborhood can be compared to the subject areas of a library, where books on similar topics are shelved together so that related books can be found quickly. This helps the model recognize new intents more efficiently.
As a result, the determination of sample similarity becomes more reliable, effectively lowering the difficulty of new intent discovery and aiding in the more accurate identification and definition of new intent categories. After determining the number of new intents, a similarity measurement method is employed to classify the intent sentence vectors into <inline-formula id="ieqn-58"><mml:math id="mml-ieqn-58"><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> new intent categories. An initial label selection algorithm is then used to choose <inline-formula id="ieqn-59"><mml:math id="mml-ieqn-59"><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> intent sentences as pseudo-labels for these new intent classes [<xref ref-type="bibr" rid="ref-32">32</xref>]. The remaining sentences are categorized into their respective new intent classes using similarity measures. The equation of the new intent class <inline-formula id="ieqn-60"><mml:math id="mml-ieqn-60"><mml:mrow><mml:msup><mml:mrow><mml:mi>o</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math></inline-formula> is shown below:
<disp-formula id="eqn-13"><label>(13)</label><mml:math id="mml-eqn-13" display="block"><mml:mrow><mml:msup><mml:mrow><mml:mi>o</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo>=</mml:mo><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>g</mml:mi><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:msubsup><mml:mi>n</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:msubsup><mml:mi>d</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover><mml:mi>h</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula>where <inline-formula id="ieqn-61"><mml:math id="mml-ieqn-61"><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>g</mml:mi><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:math></inline-formula> returns the index <italic>i</italic> at which the function attains its smallest value over the given range. <inline-formula id="ieqn-62"><mml:math id="mml-ieqn-62"><mml:mi>d</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mo>.</mml:mo><mml:mo>,</mml:mo><mml:mo>.</mml:mo><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> calculates the spatial distance between two vectors. <inline-formula id="ieqn-63"><mml:math id="mml-ieqn-63"><mml:mrow><mml:mover><mml:mi>h</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow></mml:math></inline-formula> denotes the vector of an unclassified intent sentence.
<inline-formula id="ieqn-64"><mml:math id="mml-ieqn-64"><mml:mrow><mml:msub><mml:mrow><mml:mi>h</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula> denotes the vector of intent sentences for the <inline-formula id="ieqn-65"><mml:math id="mml-ieqn-65"><mml:mi>i</mml:mi></mml:math></inline-formula>-th pseudo-label.</p>
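<p>Eqs. (12) and (13) can be illustrated with the following Python sketch. It assumes Euclidean distance for the spatial distance, a fixed number of neighbors <italic>k</italic> in place of the elastic neighborhood, and a neighborhood that excludes the central sample itself. These choices, the function names, and the toy vectors are assumptions made for illustration.</p>
<code language="python"><![CDATA[
```python
import math


def fuse_with_neighbors(vectors, index, k):
    """Eq. (12): mean of the k nearest neighbours of vectors[index].

    The central sample itself is excluded from the mean (assumption).
    """
    others = [j for j in range(len(vectors)) if j != index]
    others.sort(key=lambda j: math.dist(vectors[index], vectors[j]))
    neighbors = others[:k]
    dim = len(vectors[index])
    return [sum(vectors[j][d] for j in neighbors) / len(neighbors)
            for d in range(dim)]


def assign_intent(h_hat, pseudo_label_vectors):
    """Eq. (13): index of the pseudo-label vector closest to h_hat.

    Indices are 0-based here, whereas Eq. (13) counts from 1.
    """
    return min(range(len(pseudo_label_vectors)),
               key=lambda i: math.dist(h_hat, pseudo_label_vectors[i]))


# Toy two-dimensional sentence vectors (hypothetical).
vectors = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [10.0, 10.0]]
fused = fuse_with_neighbors(vectors, index=0, k=2)

pseudo_labels = [[0.5, 0.5], [10.0, 10.0]]
intent = assign_intent([0.4, 0.6], pseudo_labels)
```
]]></code>
<p>In this toy example, the sample at index 0 is replaced by the mean of its two nearest neighbors, and the unclassified vector is assigned to the first pseudo-label because that vector is the closest one.</p>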
<p>The SNID-ENSEF model achieves the ability to represent intent sentences through multi-task pre-training and fine-tuning with contrastive learning, resulting in a uniform distribution of intent sentences within the vector space. New intent classes are obtained using a similarity classification method.</p>
</sec>
</sec>
<sec id="s3">
<label>3</label>
<title>Experimental Results and Analysis</title>
<sec id="s3_1">
<label>3.1</label>
<title>Experimental Environment and Datasets</title>
<p>Model development and experiments are conducted on a cloud server, with an NVIDIA GeForce RTX 3090 GPU used for training the BERT model. The Adam optimizer is used, and the experiments are implemented in Python with the PyTorch framework. The versions used are Python 3.8.18, PyTorch 1.12.0, and CUDA 11.3. The experimental hyperparameter settings are as follows: the learning rate (<italic>Lr</italic>) is set to 1e-5, the batch size (<italic>Bs</italic>) is set to 128, the number of training epochs (<italic>Ep</italic>) is set to 50, and the elimination rate (<italic>p</italic>) is set to 0.05. The SNID-ENSEF model is tested on three publicly available intent datasets. Banking77 is a dataset of banking dialogues containing 77 intents derived from conversations in the banking context. StackOverflow is a large-scale dataset collected from an online Q&#x0026;A platform. Clinc150 encompasses a wide range of user intents and scenarios, not limited to specific domains [<xref ref-type="bibr" rid="ref-32">32</xref>].</p>
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Evaluation Indicators</title>
<p>The Adjusted Rand Index (<italic>ARI</italic>), Accuracy (<italic>ACC</italic>), and Normalized Mutual Information (<italic>NMI</italic>) are used to evaluate the SNID-ENSEF model and to compare it with the other models [<xref ref-type="bibr" rid="ref-35">35</xref>]. ARI measures the agreement between the categorization results and the ground truth, corrected for chance. ACC measures the proportion of correctly categorized samples. NMI measures the consistency between the categorization results and the real labels. ACC and NMI lie in [0, 1], while ARI is at most 1 and is close to 0 for a random assignment. For all three indicators, larger values represent more accurate categorization results.</p>
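<p>To make two of these indicators concrete, the sketch below computes ARI from the contingency table of two labelings and computes clustering accuracy under the best one-to-one mapping between predicted clusters and true classes. It is a small illustration and not the evaluation code used in the experiments: the function names are hypothetical, the mapping is found by brute force over permutations (practical only for a handful of clusters, whereas evaluation code typically uses the Hungarian algorithm), and the number of predicted clusters is assumed not to exceed the number of true classes.</p>
<code language="python">
```python
from collections import Counter
from itertools import permutations
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """ARI computed from the contingency table of two labelings."""
    n = len(labels_true)
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)
    max_index = 0.5 * (sum_rows + sum_cols)
    if max_index == expected:
        # Degenerate case (for example a single cluster on both sides).
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def clustering_accuracy(labels_true, labels_pred):
    """Accuracy under the best one-to-one cluster-to-class mapping."""
    true_ids = sorted(set(labels_true))
    pred_ids = sorted(set(labels_pred))
    best_hits = 0
    # Brute force: try every injective mapping of clusters to classes.
    for perm in permutations(true_ids, len(pred_ids)):
        mapping = dict(zip(pred_ids, perm))
        hits = sum(mapping[p] == t for t, p in zip(labels_true, labels_pred))
        best_hits = max(best_hits, hits)
    return best_hits / len(labels_true)


# Cluster ids are arbitrary: a relabeled but identical grouping is perfect.
y_true = [0, 0, 1, 1]
y_pred = [1, 1, 0, 0]
print(adjusted_rand_index(y_true, y_pred), clustering_accuracy(y_true, y_pred))
```
</code>
<p>Both functions depend only on how the samples are grouped, not on the cluster identifiers themselves, which is why they suit new intent discovery, where predicted cluster ids carry no predefined meaning.</p>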
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>Elimination Rate Analysis</title>
<p>In syntactic elimination contrastive learning, the magnitude of the elimination rate significantly impacts the effectiveness of training samples. To verify the rationale behind the chosen elimination rate, the effects of different elimination rates on the SNID-ENSEF model are examined. Five elimination rates ranging from 0.01 to 0.05 are selected around the optimal elimination rate, and the evaluation metrics are reported for the StackOverflow dataset. The variation of the metrics with the elimination rate is shown in <xref ref-type="fig" rid="fig-10">Fig. 10</xref>.</p>
<fig id="fig-10">
<label>Figure 10</label>
<caption>
<title>The illustration of elimination rate variation</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_60319-fig-10.tif"/>
</fig>
<p>In <xref ref-type="fig" rid="fig-10">Fig. 10</xref>, the model performs best at an elimination rate of 0.03. <xref ref-type="fig" rid="fig-10">Fig. 10a</xref> shows the change in the ACC index with the elimination rate. <xref ref-type="fig" rid="fig-10">Fig. 10b</xref> shows the change in the ARI index with the elimination rate. <xref ref-type="fig" rid="fig-10">Fig. 10c</xref> shows the change in the NMI index with the elimination rate. <xref ref-type="fig" rid="fig-10">Fig. 10d</xref> shows the change in the sum of the three indicators with the elimination rate. When the elimination rate is too low, invalid samples remain in the training set and continue to interfere with training, so the effectiveness of the training samples does not improve. Conversely, if the elimination rate is too high, the model may incorrectly remove valid samples, which reduces the effective training data and degrades overall model performance. Therefore, an elimination rate of 0.03 strikes a balance between removing invalid samples and retaining valid ones, providing the model with adequate training data, which helps enhance its performance and generalization ability.</p>
</sec>
<sec id="s3_4">
<label>3.4</label>
<title>Neighborhood Parameter <italic>k</italic> Analysis</title>
<p>To select an appropriate parameter <italic>k</italic> for the neighborhood sample fusion strategy, the effects of different <italic>k</italic> values on the SNID-ENSEF model are examined. Ten <italic>k</italic> values around the best-performing value are tested. The variations in SNID-ENSEF model metrics under different parameters of the neighborhood sample fusion strategy are shown in <xref ref-type="fig" rid="fig-11">Fig. 11</xref>.</p>
<fig id="fig-11">
<label>Figure 11</label>
<caption>
<title>Variation of metrics with different neighbors <italic>k</italic> in Banking77</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_60319-fig-11.tif"/>
</fig>
<p>In <xref ref-type="fig" rid="fig-11">Fig. 11</xref>, differences in model performance are observed under various settings of <italic>k</italic>. <xref ref-type="fig" rid="fig-11">Fig. 11a</xref> shows the change in the NMI index with <italic>k</italic>. <xref ref-type="fig" rid="fig-11">Fig. 11b</xref> shows the change in the ACC index with <italic>k</italic>. <xref ref-type="fig" rid="fig-11">Fig. 11c</xref> shows the change in the ARI index with <italic>k</italic>. <xref ref-type="fig" rid="fig-11">Fig. 11d</xref> shows the change in the sum of the three indicators with <italic>k</italic>. When <italic>k</italic> is too small, the neighborhood sample fusion strategy fails to filter out noise. Conversely, if <italic>k</italic> is too large, new noise may be introduced, which prevents new intents from stabilizing toward their respective classes and keeps the model from reaching its best results. The value of <italic>k</italic> is therefore selected according to the experimental results.</p>
</sec>
<sec id="s3_5">
<label>3.5</label>
<title>Ablation Experiments</title>
<p>The SNID-ENSEF model is primarily divided into the syntactic elimination contrastive learning module (SECL) and the neighborhood sample fusion strategy module (NSFS). SECL contains syntactic augmentation (SA) and elastic neighborhood elimination (ENE). An ablation study with different module combinations is conducted on the StackOverflow dataset to analyze the SNID-ENSEF model and to validate the effectiveness of the two modules. The selection of modules for the SNID-ENSEF ablation experiments is shown in <xref ref-type="table" rid="table-1">Table 1</xref>.</p>
<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>The selection of modules for SNID-ENSEF ablation experiments</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr align="center">
<th>Experiment</th>
<th colspan="2">SECL</th>
<th>NSFS</th>
<th>NMI (%)</th>
<th>ARI (%)</th>
<th>ACC (%)</th>
</tr>
<tr align="center">
<th>Number</th>
<th>SA</th>
<th>ENE</th>
<th></th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr align="center">
<td>1</td>
<td><inline-formula id="ieqn-66"><mml:math id="mml-ieqn-66"><mml:mi>&#x2713;</mml:mi></mml:math></inline-formula></td>
<td></td>
<td></td>
<td>81.96</td>
<td>75.60</td>
<td>87.60</td>
</tr>
<tr align="center">
<td>2</td>
<td></td>
<td><inline-formula id="ieqn-67"><mml:math id="mml-ieqn-67"><mml:mi>&#x2713;</mml:mi></mml:math></inline-formula></td>
<td></td>
<td>82.14</td>
<td>75.87</td>
<td>87.70</td>
</tr>
<tr align="center">
<td>3</td>
<td><inline-formula id="ieqn-68"><mml:math id="mml-ieqn-68"><mml:mi>&#x2713;</mml:mi></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-69"><mml:math id="mml-ieqn-69"><mml:mi>&#x2713;</mml:mi></mml:math></inline-formula></td>
<td></td>
<td>82.36</td>
<td>76.19</td>
<td>87.90</td>
</tr>
<tr align="center">
<td>4</td>
<td></td>
<td></td>
<td><inline-formula id="ieqn-70"><mml:math id="mml-ieqn-70"><mml:mi>&#x2713;</mml:mi></mml:math></inline-formula></td>
<td>82.09</td>
<td>76.15</td>
<td>87.80</td>
</tr>
<tr align="center">
<td>5</td>
<td><inline-formula id="ieqn-71"><mml:math id="mml-ieqn-71"><mml:mi>&#x2713;</mml:mi></mml:math></inline-formula></td>
<td></td>
<td><inline-formula id="ieqn-72"><mml:math id="mml-ieqn-72"><mml:mi>&#x2713;</mml:mi></mml:math></inline-formula></td>
<td>82.27</td>
<td>76.32</td>
<td>87.90</td>
</tr>
<tr align="center">
<td>6</td>
<td></td>
<td><inline-formula id="ieqn-73"><mml:math id="mml-ieqn-73"><mml:mi>&#x2713;</mml:mi></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-74"><mml:math id="mml-ieqn-74"><mml:mi>&#x2713;</mml:mi></mml:math></inline-formula></td>
<td>82.30</td>
<td>76.46</td>
<td>88.00</td>
</tr>
<tr align="center">
<td>7</td>
<td><inline-formula id="ieqn-75"><mml:math id="mml-ieqn-75"><mml:mi>&#x2713;</mml:mi></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-76"><mml:math id="mml-ieqn-76"><mml:mi>&#x2713;</mml:mi></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-77"><mml:math id="mml-ieqn-77"><mml:mi>&#x2713;</mml:mi></mml:math></inline-formula></td>
<td>82.79</td>
<td>76.95</td>
<td>88.30</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>In <xref ref-type="table" rid="table-1">Table 1</xref>, Experiment 1 uses only syntactic augmentation. Experiment 2 uses only elastic neighborhood elimination. Experiment 3 combines syntactic augmentation and elastic neighborhood elimination. Experiment 4 uses only the neighborhood sample fusion strategy. Experiment 5 combines syntactic augmentation with neighborhood sample fusion. Experiment 6 pairs elastic neighborhood elimination with neighborhood sample fusion. Finally, Experiment 7 represents the full SNID-ENSEF model, which employs both syntactic elimination contrastive learning and the neighborhood sample fusion strategy.</p>

<p>Experiments 1&#x2013;3 show that elastic neighborhood elimination outperforms syntactic augmentation within syntactic elimination contrastive learning. While data augmentation enriches sample diversity, elastic neighborhood elimination increases the proportion of effective samples and thus provides more valid training data. Both methods contribute to expanding the training data, and their combination enhances sentence understanding and new intent recognition accuracy. Experiment 4 demonstrates that neighborhood sample fusion on its own also aids in recognizing new intents. Experiments 5&#x2013;6 confirm that combining the components of syntactic elimination contrastive learning with neighborhood sample fusion yields further gains and that elastic neighborhood elimination is again more effective than syntactic augmentation. The ARI, ACC, and NMI of syntactic elimination contrastive learning alone (Experiment 3) are higher than those of the neighborhood sample fusion strategy module alone (Experiment 4). According to <xref ref-type="table" rid="table-1">Table 1</xref>, removing syntactic elimination contrastive learning (Experiment 4) lowers ACC from 88.30% to 87.80%, and removing the neighborhood sample fusion strategy (Experiment 3) lowers ACC from 88.30% to 87.90%. In conclusion, the two modules introduced in the study both contribute to the accuracy of new intent discovery. The ablation experiments of random token replacement (RTR) and syntactic token replacement (STR) are shown in <xref ref-type="table" rid="table-2">Table 2</xref>.</p>
<table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>Ablation experiment of syntactic data enhancement</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr align="center">
<th>Experiment</th>
<th colspan="2">SA</th>
<th>NMI (%)</th>
<th>ARI (%)</th>
<th>ACC (%)</th>
</tr>
<tr align="center">
<th>Number</th>
<th>STR</th>
<th>RTR</th>
<th></th>
<th></th>
<th></th>
</tr>
</thead>
<tbody>
<tr align="center">
<td>1</td>
<td><inline-formula id="ieqn-78"><mml:math id="mml-ieqn-78"><mml:mi>&#x2713;</mml:mi></mml:math></inline-formula></td>
<td></td>
<td>81.74</td>
<td>75.40</td>
<td>87.52</td>
</tr>
<tr align="center">
<td>2</td>
<td></td>
<td><inline-formula id="ieqn-79"><mml:math id="mml-ieqn-79"><mml:mi>&#x2713;</mml:mi></mml:math></inline-formula></td>
<td>81.69</td>
<td>75.18</td>
<td>87.44</td>
</tr>
<tr align="center">
<td>3</td>
<td><inline-formula id="ieqn-80"><mml:math id="mml-ieqn-80"><mml:mi>&#x2713;</mml:mi></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-81"><mml:math id="mml-ieqn-81"><mml:mi>&#x2713;</mml:mi></mml:math></inline-formula></td>
<td>81.96</td>
<td>75.60</td>
<td>87.60</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>In <xref ref-type="table" rid="table-2">Table 2</xref>, the experimental results show that STR outperforms RTR on all three metrics, although the margin is small. When the two are combined, STR enhances sentence diversity while RTR introduces a moderate amount of noise, which improves the model&#x2019;s robustness and leads to the best performance.</p>

</sec>
<sec id="s3_6">
<label>3.6</label>
<title>Comparison Experiments</title>
<p>To evaluate the performance of the proposed SNID-ENSEF model, comparative experiments are conducted on the Banking77, StackOverflow, and Clinc150 datasets. The benchmark models for comparison include: Kmeans&#x002B;&#x002B; [<xref ref-type="bibr" rid="ref-36">36</xref>]: A traditional clustering algorithm that improves the initialization of cluster centroids. PTJN [<xref ref-type="bibr" rid="ref-37">37</xref>]: A robust pseudo-label training and source domain joint training network, in which noisy pseudo-labels are refined using prior knowledge and a new extractor-generator-corrector architecture is introduced. ELECTR [<xref ref-type="bibr" rid="ref-38">38</xref>]: A Transformer-based language representation pre-training model that draws on the ideas of GANs and is trained to distinguish real words from &#x201C;fake&#x201D; words generated by a small generator model. DPN [<xref ref-type="bibr" rid="ref-39">39</xref>]: An end-to-end deep contrastive clustering algorithm that jointly updates model parameters and clustering centers through supervised and self-supervised learning, optimizing the use of labeled and unlabeled data. MPNET [<xref ref-type="bibr" rid="ref-40">40</xref>]: A pre-training model that improves traditional pre-training methods through a &#x201C;Masked and Permuted Pre-training&#x201D; strategy. MTP-CLNN [<xref ref-type="bibr" rid="ref-35">35</xref>]: A multi-task pre-training model for new intent discovery that utilizes self-supervised signals in the representation space to improve the accuracy of new intent discovery. USNID [<xref ref-type="bibr" rid="ref-41">41</xref>]: A new intent discovery model that introduces a centroid-guided clustering mechanism. DWG [<xref ref-type="bibr" rid="ref-32">32</xref>]: A new intent discovery model that employs a diffusion-weighted graph framework, which uses a weighting method based on semantic similarity and local structure for contrastive learning.</p>
<p>As shown in <xref ref-type="table" rid="table-3">Table 3</xref>, the SNID-ENSEF model achieves the best NMI, ARI, and ACC on the Banking77, StackOverflow, and Clinc150 datasets. Relative to DWG, the strongest baseline overall among Kmeans&#x002B;&#x002B;, PTJN, ELECTR, DPN, MPNET, MTP-CLNN, USNID, and DWG, the SNID-ENSEF model improves NMI by 1.17%, 1.15%, and 0.34% and ACC by 2.27%, 0.90%, and 0.75% on the three datasets, respectively. For ARI, the improvements are 0.88% on Banking77 (relative to USNID, which has the best baseline ARI on that dataset), 1.86% on StackOverflow, and 1.07% on Clinc150 (both relative to DWG). On Banking77, USNID also attains the highest baseline NMI (87.41%), over which the improvement is 0.04%. The training of the SNID-ENSEF model utilizes elastic neighborhood boundaries to select positive sample domains, ensuring a high quantity of training data while eliminating ineffective samples to enhance sample efficiency. Additionally, by referencing syntactic information and substituting meaningful words in sentences, the model increases the diversity of training samples, thereby improving training effectiveness. The use of the neighborhood sample fusion strategy reduces noise and decreases the difficulty of the new intent discovery task. By combining these approaches, the SNID-ENSEF model learns high-quality intent sentence representations from limited training samples, enhancing the accuracy of the new intent discovery task.</p>
<table-wrap id="table-3">
<label>Table 3</label>
<caption>
<title>Model performance across different datasets</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr align="center">
<th>Models</th>
<th colspan="3">Banking77</th>
<th colspan="3">StackOverflow</th>
<th colspan="3">Clinc150</th>
</tr>
<tr align="center">
<th></th>
<th>NMI (%)</th>
<th>ARI (%)</th>
<th>ACC (%)</th>
<th>NMI (%)</th>
<th>ARI (%)</th>
<th>ACC (%)</th>
<th>NMI (%)</th>
<th>ARI (%)</th>
<th>ACC (%)</th>
</tr>
</thead>
<tbody>
<tr align="center">
<td>Kmeans&#x002B;&#x002B;</td>
<td>78.06</td>
<td>53.89</td>
<td>67.29</td>
<td>68.10</td>
<td>54.93</td>
<td>74.78</td>
<td>90.24</td>
<td>70.05</td>
<td>79.29</td>
</tr>
<tr align="center">
<td>PTJN</td>
<td>81.69</td>
<td>59.20</td>
<td>71.77</td>
<td>75.43</td>
<td>61.90</td>
<td>74.18</td>
<td>94.41</td>
<td>81.07</td>
<td>87.35</td>
</tr>
<tr align="center">
<td>ELECTR</td>
<td>82.98</td>
<td>60.16</td>
<td>70.94</td>
<td>75.70</td>
<td>64.15</td>
<td>77.38</td>
<td>91.24</td>
<td>74.59</td>
<td>82.02</td>
</tr>
<tr align="center">
<td>DPN</td>
<td>82.58</td>
<td>61.21</td>
<td>72.96</td>
<td>78.39</td>
<td>68.59</td>
<td>84.23</td>
<td>95.11</td>
<td>86.72</td>
<td>89.06</td>
</tr>
<tr align="center">
<td>MPNET</td>
<td>86.59</td>
<td>67.92</td>
<td>77.54</td>
<td>79.14</td>
<td>72.58</td>
<td>84.81</td>
<td>95.28</td>
<td>84.41</td>
<td>89.70</td>
</tr>
<tr align="center">
<td>MTP-CLNN</td>
<td>85.77</td>
<td>67.60</td>
<td>76.82</td>
<td>81.62</td>
<td>74.74</td>
<td>86.60</td>
<td>96.08</td>
<td>86.97</td>
<td>91.24</td>
</tr>
<tr align="center">
<td>USNID</td>
<td>87.41</td>
<td>69.54</td>
<td>78.36</td>
<td>80.13</td>
<td>74.90</td>
<td>85.66</td>
<td>96.42</td>
<td>86.77</td>
<td>90.36</td>
</tr>
<tr align="center">
<td>DWG</td>
<td>86.28</td>
<td>67.56</td>
<td>78.67</td>
<td>81.64</td>
<td>75.09</td>
<td>87.40</td>
<td>96.89</td>
<td>90.05</td>
<td>94.49</td>
</tr>
<tr align="center">
<td><bold>SNID-ENSEF</bold></td>
<td><bold>87.45</bold></td>
<td><bold>70.42</bold></td>
<td><bold>80.94</bold></td>
<td><bold>82.79</bold></td>
<td><bold>76.95</bold></td>
<td><bold>88.30</bold></td>
<td><bold>97.23</bold></td>
<td><bold>91.12</bold></td>
<td><bold>95.24</bold></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="table-3fn1" fn-type="other">
<p>Note: Bold indicates the model with the best results in the dataset.</p>
</fn>
</table-wrap-foot>
</table-wrap>
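<p>The differences between SNID-ENSEF and DWG can be recomputed directly from the two corresponding rows of Table 3, as in the short sketch below. The variable names are illustrative. Note that the 0.88 figure quoted for ARI on Banking77 is measured against USNID (69.54%), so it differs from the DWG-based value computed here.</p>
<code language="python">
```python
# Rows copied from Table 3, ordered as (NMI, ARI, ACC) for
# Banking77, StackOverflow, and Clinc150.
dwg = [86.28, 67.56, 78.67, 81.64, 75.09, 87.40, 96.89, 90.05, 94.49]
snid_ensef = [87.45, 70.42, 80.94, 82.79, 76.95, 88.30, 97.23, 91.12, 95.24]

# Absolute gain of SNID-ENSEF over DWG, in percentage points.
gains = [round(ours - base, 2) for ours, base in zip(snid_ensef, dwg)]

datasets = ("Banking77", "StackOverflow", "Clinc150")
metrics = ("NMI", "ARI", "ACC")
columns = [f"{d} {m}" for d in datasets for m in metrics]

for name, gain in zip(columns, gains):
    print(f"{name}: +{gain:.2f}")
```
</code>
<p>All nine differences are positive, which is consistent with the bold entries in Table 3.</p>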
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Discussion</title>
<sec id="s4_1">
<label>4.1</label>
<title>Generalized Performance Test</title>
<p>To validate the generalization ability of the model, two additional datasets are introduced to evaluate its performance across different domains. MCID [<xref ref-type="bibr" rid="ref-42">42</xref>]: An open-source intent detection dataset for COVID-19 chatbots focusing on the healthcare domain. It contains sixteen intents and is used to test the applicability of the model in the medical field. HWU64 [<xref ref-type="bibr" rid="ref-43">43</xref>]: A dataset consisting of 25,716 utterances across 21 domains and 64 intents. Because Clinc150 covers fewer domains, HWU64 makes it possible to test the model across a broader range of domains. The results are presented in <xref ref-type="table" rid="table-4">Table 4</xref>.</p>
<table-wrap id="table-4">
<label>Table 4</label>
<caption>
<title>Model performance on the MCID and HWU64 datasets</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr align="center">
<th>Models</th>
<th colspan="3">MCID</th>
<th colspan="3">HWU64</th>
</tr>
<tr align="center">
<th></th>
<th>NMI (%)</th>
<th>ARI (%)</th>
<th>ACC (%)</th>
<th>NMI (%)</th>
<th>ARI (%)</th>
<th>ACC (%)</th>
</tr>
</thead>
<tbody>
<tr align="center">
<td>MTP-CLNN</td>
<td>83.75</td>
<td>73.22</td>
<td>84.36</td>
<td>74.19</td>
<td>60.79</td>
<td>78.95</td>
</tr>
<tr align="center">
<td>DWG</td>
<td>85.06</td>
<td>74.16</td>
<td>85.45</td>
<td>74.69</td>
<td>61.27</td>
<td>80.44</td>
</tr>
<tr align="center">
<td>SNID-ENSEF</td>
<td>86.55</td>
<td>76.09</td>
<td>87.21</td>
<td>75.95</td>
<td>62.12</td>
<td>81.93</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>As shown in <xref ref-type="table" rid="table-4">Table 4</xref>, the SNID-ENSEF model improves on DWG and MTP-CLNN in NMI, ARI, and ACC on both datasets, exhibiting stable performance across different domains. This stability supports, to some extent, the generalization capability of the SNID-ENSEF model.</p>

</sec>
<sec id="s4_2">
<label>4.2</label>
<title>Expectations and Future Prospects</title>
<p>With the rapid development of large language models, an increasing number of task-specific models are being enhanced by these large models, and integrating them is expected to further improve performance on specific tasks. For the SNID-ENSEF model, leveraging large language models can refine the distinction of previously unknown intents, allowing broadly separated intents to be differentiated in more detail and thereby increasing the accuracy of intent discovery. Another future direction involves converting newly discovered intents into defined intents. However, this process requires significant human effort and computational resources. Therefore, integrating large language models to assist in defining discovered intents is an important direction for future work.</p>
</sec>
<sec id="s4_3">
<label>4.3</label>
<title>Practical Application</title>
<p>Virtual assistants are able to respond to users&#x2019; questions. The application of new intent discovery in virtual assistants enables them to provide appropriate replies to various user inquiries, allowing them to more intelligently address a wide range of user needs without being limited by predefined tasks. This increase in flexibility has a profound impact on user satisfaction and interaction experience, making conversations more engaging and open-ended. For example, when a home voice assistant encounters a newly introduced term for the first time, it may not provide an effective response because it cannot recognize the meaning of the new term. However, through the discovery of new intent, the assistant can capture this intent, allowing it to provide appropriate replies in the future when the term or its associated intent is mentioned again.</p>
</sec>
<sec id="s4_4">
<label>4.4</label>
<title>Discussion of Marginal Cases</title>
<p>To examine the ability of the SNID-ENSEF model to handle overlapping intent meanings and similar intent sentences, two major overlapping intent categories in the Banking77 dataset, Card and Transaction intents, were each divided into four overlapping sub-groups, resulting in a total of twenty-nine categories. The performance of DWG and SNID-ENSEF was then tested in these extreme cases. The dataset labels and intent distribution are shown in <xref ref-type="table" rid="table-5">Table 5</xref>. The model performance is shown in <xref ref-type="table" rid="table-6">Table 6</xref>.</p>
<table-wrap id="table-5">
<label>Table 5</label>
<caption>
<title>Overlapping intent data set label settings</title>
</caption>
<table>
<colgroup>
<col/>
<col width="125mm"/>
</colgroup>
<thead>
<tr align="center">
<th>Intent type</th>
<th>Specific intent label</th>
</tr>
</thead>
<tbody>
<tr align="center">
<td>Card type</td>
<td>visa-or-mastercard, supported-cards-and-currencies, disposable-card-limits, getting-virtual-card</td>
</tr>
<tr align="center">
<td>Card payment</td>
<td>Card-payment-not-recognised, declined-card-payment, card-payment-fee-charged</td>
</tr>
<tr align="center">
<td>Card function</td>
<td>Card-not-working, card-swallowed, compromised-card, card-about-to-expire</td>
</tr>
<tr align="center">
<td>Card loss</td>
<td>Lost-or-stolen-card, lost-or-stolen-phone, compromised-card</td>
</tr>
<tr align="center">
<td>Transfer problem</td>
<td>Pending-transfer, failed-transfer, declined-transfer, cancel-transfer</td>
</tr>
<tr align="center">
<td>Payment problem</td>
<td>Refund-not-showing-up, request-refund, pending-card-payment, pending-transfer</td>
</tr>
<tr align="center">
<td>Top-up problem</td>
<td>Top-up-failed, top-up-reverted, top-up-by-card-charge, top-up-by-bank-transfer-charge</td>
</tr>
<tr align="center">
<td>Balance problem</td>
<td>Balance-not-updated-after-cash-deposit, balance-not-updated-transfer, pending-cash-withdrawal</td>
</tr>
</tbody>
</table>
</table-wrap><table-wrap id="table-6">
<label>Table 6</label>
<caption>
<title>Performance under extreme conditions</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr align="center">
<th>Models</th>
<th>NMI (%)</th>
<th>ARI (%)</th>
<th>ACC (%)</th>
</tr>
</thead>
<tbody>
<tr align="center">
<td>DWG</td>
<td>61.29</td>
<td>57.30</td>
<td>81.70</td>
</tr>
<tr align="center">
<td>SNID-ENSEF</td>
<td>72.74</td>
<td>68.02</td>
<td>85.54</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>As shown in <xref ref-type="table" rid="table-6">Table 6</xref>, in extreme cases, SNID-ENSEF outperforms the strongest competing model, DWG, in the NMI, ARI, and ACC metrics. This indicates that SNID-ENSEF still retains a certain ability to recognize intents even under extreme conditions.</p>

</sec>
<sec id="s4_5">
<label>4.5</label>
<title>Real-Time Performance Index</title>
<p>To assess the practical cost of the model, SNID-ENSEF and the DWG model were trained on the Banking77 dataset, and the training time and GPU memory usage were recorded for comparison. The experimental results are shown in <xref ref-type="table" rid="table-7">Table 7</xref>.</p>
<table-wrap id="table-7">
<label>Table 7</label>
<caption>
<title>Comparison of training time and GPU memory usage</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr align="center">
<th>Models</th>
<th>Training run time</th>
<th>Training GPU memory usage</th>
</tr>
</thead>
<tbody>
<tr align="center">
<td>DWG</td>
<td>26 m 25 s</td>
<td>17,166 MB</td>
</tr>
<tr align="center">
<td>SNID-ENSEF</td>
<td>27 m 15 s</td>
<td>17,176 MB</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>As shown in <xref ref-type="table" rid="table-7">Table 7</xref>, training the SNID-ENSEF model takes 50 s longer than training the DWG model, an increase of about 3.2%. Because the proposed method is implemented with matrix operations, this time difference remains within an acceptable range. The memory usage increases by only 10 MB (about 0.06%). Overall, the SNID-ENSEF model does not have significant disadvantages in terms of time and memory usage compared to the strongest competing model while showing an improvement in performance.</p>

</sec>
<sec id="s4_6">
<label>4.6</label>
<title>Statistical Significance Test</title>
<p>Significance testing is performed on the model&#x2019;s metrics to verify the performance improvement of the SNID-ENSEF model over the competing models. The formula for the <inline-formula id="ieqn-82"><mml:math id="mml-ieqn-82"><mml:mi>t</mml:mi></mml:math></inline-formula>-test is as follows:
<disp-formula id="eqn-14"><label>(14)</label><mml:math id="mml-eqn-14" display="block"><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>X</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03BC;</mml:mi></mml:mrow><mml:mfrac><mml:mrow><mml:mi>S</mml:mi><mml:mi>D</mml:mi></mml:mrow><mml:msqrt><mml:mi>n</mml:mi></mml:msqrt></mml:mfrac></mml:mfrac></mml:math></disp-formula>where <italic>X</italic> represents the data point to be tested (here, the SNID-ENSEF result), <inline-formula id="ieqn-83"><mml:math id="mml-ieqn-83"><mml:mi>&#x03BC;</mml:mi></mml:math></inline-formula> represents the mean of the other data, <italic>SD</italic> represents the standard deviation, and <inline-formula id="ieqn-84"><mml:math id="mml-ieqn-84"><mml:mi>n</mml:mi></mml:math></inline-formula> represents the number of data points. The mean and standard deviation of the metrics listed in <xref ref-type="table" rid="table-3">Table 3</xref> are calculated to compute the <italic>t</italic>-value, and the <inline-formula id="ieqn-85"><mml:math id="mml-ieqn-85"><mml:mi>p</mml:mi></mml:math></inline-formula>-value is then obtained from the corresponding <italic>t</italic>-distribution. The null hypothesis is that there is no significant difference between the SNID-ENSEF model and the competing models. The alternative hypothesis is that there is a significant difference between the metrics of the SNID-ENSEF model and those of the competing models. The null hypothesis is rejected if the <italic>p</italic>-value is less than 0.05. The specific calculations are as follows.</p>

<p>As shown in <xref ref-type="table" rid="table-8">Table 8</xref>, the <inline-formula id="ieqn-86"><mml:math id="mml-ieqn-86"><mml:mi>p</mml:mi></mml:math></inline-formula>-values for all metrics of the SNID-ENSEF model are less than 0.05 compared to the competing models, allowing us to reject the null hypothesis. Additionally, the silhouette scores for the strongest competing model, DWG, and the SNID-ENSEF model are calculated. The silhouette score of the DWG model is 0.6811, while the silhouette score of the SNID-ENSEF model is 0.8755, further validating the performance improvement of the SNID-ENSEF model.</p>
<table-wrap id="table-8">
<label>Table 8</label>
<caption>
<title>The statistical significance test results of the model</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr align="center">
<th>Index</th>
<th colspan="3">Banking77</th>
<th colspan="3">StackOverflow</th>
<th colspan="3">Clinc150</th>
</tr>
<tr align="center">
<th></th>
<th>NMI (%)</th>
<th>ARI (%)</th>
<th>ACC (%)</th>
<th>NMI (%)</th>
<th>ARI (%)</th>
<th>ACC (%)</th>
<th>NMI (%)</th>
<th>ARI (%)</th>
<th>ACC (%)</th>
</tr>
</thead>
<tbody>
<tr align="center">
<td>Mean</td>
<td>83.89</td>
<td>63.61</td>
<td>73.24</td>
<td>77.12</td>
<td>68.76</td>
<td>80.24</td>
<td>94.44</td>
<td>82.76</td>
<td>86.10</td>
</tr>
<tr align="center">
<td>SD</td>
<td>3.37</td>
<td>5.47</td>
<td>4.16</td>
<td>4.66</td>
<td>7.64</td>
<td>4.91</td>
<td>2.51</td>
<td>6.47</td>
<td>5.80</td>
</tr>
<tr align="center">
<td>t</td>
<td>2.99</td>
<td>3.51</td>
<td>5.23</td>
<td>3.44</td>
<td>3.03</td>
<td>4.63</td>
<td>3.14</td>
<td>3.65</td>
<td>4.46</td>
</tr>
<tr align="center">
<td><inline-formula id="ieqn-87"><mml:math id="mml-ieqn-87"><mml:mi>p</mml:mi></mml:math></inline-formula></td>
<td>0.02</td>
<td>0.008</td>
<td>0.001</td>
<td>0.009</td>
<td>0.017</td>
<td>0.003</td>
<td>0.015</td>
<td>0.007</td>
<td>0.003</td>
</tr>
</tbody>
</table>
</table-wrap>
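<p>As a worked instance of Eq. (14), the sketch below recomputes the <italic>t</italic>-value for NMI on Banking77 from the SNID-ENSEF result in Table 3 and the mean and standard deviation in Table 8. The function name is hypothetical, and the sample size of 8 is an assumption based on the eight competing models listed in Table 3.</p>
<code language="python">
```python
import math


def t_statistic(x, mu, sd, n):
    """Eq. (14): t = (X - mu) / (SD / sqrt(n))."""
    return (x - mu) / (sd / math.sqrt(n))


# Banking77 NMI: X from Table 3, mean and SD from Table 8.
# Assumption: n = 8, the number of competing models in Table 3.
t_value = t_statistic(x=87.45, mu=83.89, sd=3.37, n=8)
print(round(t_value, 2))
```
</code>
<p>Under this assumption the script prints 2.99, which agrees with the <italic>t</italic>-value reported for NMI on Banking77 in Table 8. Other columns can differ in the last digit because the tabulated mean and standard deviation are rounded.</p>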
</sec>
<sec id="s4_7">
<label>4.7</label>
<title>Select Radius Adjustment Thresholds and Parameters</title>
<p>To illustrate how the initial values and parameters of the elastic neighborhood strategy are selected, a set of 669 samples is taken, and the Euclidean distance from the first sample to all other samples is calculated. For convenience, the Euclidean distances in this paper are scaled by a factor of 1000. The relationship between the samples and their distances is shown in <xref ref-type="fig" rid="fig-12">Fig. 12</xref>.</p>
<fig id="fig-12">
<label>Figure 12</label>
<caption>
<title>The relationship between sample and distance</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_60319-fig-12.tif"/>
</fig>
<p>In <xref ref-type="fig" rid="fig-12">Fig. 12</xref>, the distances between the first sample and all other samples are plotted, where the minimum distance is denoted as <inline-formula id="ieqn-88"><mml:math id="mml-ieqn-88"><mml:mrow><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mo movablelimits="true" form="prefix">min</mml:mo></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>. At this point, the neighborhood of the sample contains only one sample, and there are no invalid samples. The maximum distance is denoted as <inline-formula id="ieqn-89"><mml:math id="mml-ieqn-89"><mml:mrow><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mo movablelimits="true" form="prefix">max</mml:mo></mml:mrow></mml:msub></mml:mrow></mml:math></inline-formula>, at which point the neighborhood contains all samples, and there are certainly invalid samples. This choice of values ensures that the elastic neighborhood strategy will have a solution. The change in the value of <italic>R</italic> after the judgment process is determined by the following formula:
<disp-formula id="eqn-15"><label>(15)</label><mml:math id="mml-eqn-15" display="block"><mml:mrow><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>C</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mi>D</mml:mi></mml:mrow><mml:mrow><mml:mi>E</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>S</mml:mi><mml:mi>A</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:math></disp-formula></p>

<p>The distance range is close to 600 and about 600 samples are distributed within it, so <inline-formula id="ieqn-90"><mml:math id="mml-ieqn-90"><mml:msub><mml:mrow><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>C</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is approximately 1. The value of <italic>R</italic> is therefore adjusted in unit steps, to <italic>R</italic>&#x002B;1 or <italic>R</italic>&#x2212;1.</p>
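<p>The radius selection described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the implementation used in the experiments: the function names, the toy two-dimensional samples, and the rule that the radius shrinks when invalid samples are present and grows otherwise are all hypothetical, and the reading of the numerator of Eq. (15) as the distance range and the denominator as the number of samples in that range is inferred from the surrounding text.</p>
<code language="python"><![CDATA[
```python
import math

SCALE = 1000.0  # the section scales Euclidean distances by a factor of 1000


def scaled_distances(samples, scale=SCALE):
    """Scaled Euclidean distance from the first sample to every other sample."""
    anchor = samples[0]
    return [scale * math.dist(anchor, other) for other in samples[1:]]


def radius_bounds(distances):
    """Return (R_min, R_max), the smallest and largest distance to the anchor."""
    return min(distances), max(distances)


def radius_step(distance_extent, num_samples):
    """Eq. (15): R_C = D_E / N_SA. Reading D_E as the distance range and
    N_SA as the number of samples in that range is an assumption."""
    return distance_extent / num_samples


def adjust_radius(radius, has_invalid, step, r_min, r_max):
    """Move R by one step and keep it inside [R_min, R_max].

    Assumed rule: shrink when the neighborhood holds invalid samples,
    otherwise grow. The section only states that R becomes R+1 or R-1.
    """
    candidate = radius - step if has_invalid else radius + step
    return max(r_min, min(r_max, candidate))


# Toy 2-D embeddings standing in for sentence representations (made-up values).
samples = [(0.0, 0.0), (0.003, 0.004), (0.06, 0.08), (0.3, 0.4)]
distances = scaled_distances(samples)
r_min, r_max = radius_bounds(distances)
step = radius_step(600, 600)  # the section's figures: range ~600, ~600 samples
radius = adjust_radius(r_min, has_invalid=False, step=step, r_min=r_min, r_max=r_max)
print(r_min, r_max, step, radius)
```
]]></code>
<p>Clamping the candidate radius to the two bounds reflects the observation that a solution always lies between the minimum and maximum distances. With a distance range and a sample count that are both about 600, the step evaluates to 1, which matches the unit adjustment used in this paper.</p>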
</sec>
</sec>
<sec id="s5">
<label>5</label>
<title>Conclusion</title>
<p>In dialogue generation, discovering new intents among unknown ones improves the recognition of unknown intents and advances the development of dialogue generation. This paper proposes a Semi-supervised New Intent Discovery for Elastic Neighborhood Syntactic Elimination and Fusion model (SNID-ENSEF). Syntactic elimination comparative learning and syntactic data augmentation introduce true synonyms, which enriches the training samples and allows the model to learn the features of intent sentences. Ineffective samples are eliminated through the elastic selection of positive sample domains. Together, these steps significantly increase the quantity and effectiveness of training samples and, as a result, improve sentence representation. In addition, the neighborhood sample fusion strategy filters out sample noise. This transformation addresses the new intent classification problem and reduces the difficulty of discovering new intents, which improves the accuracy of new intent discovery. The experimental results indicate that the SNID-ENSEF model achieves average improvements of 0.88%, 1.27%, and 1.30% in NMI, ACC, and ARI, respectively, over the baseline models PTJN, DPN, MTP-CLNN, and DWG, demonstrating its stronger intent discovery capability. In summary, research on semi-supervised intent discovery is essential. In everyday use, SNID-ENSEF could make voice assistants more capable by recognizing and remembering new intents that users mention, allowing smoother responses in later conversations. Future work will focus on integrating large language models to enhance the performance of SNID-ENSEF and on using large models to define the unknown intents that SNID-ENSEF recognizes.</p>
</sec>
</body>
<back>
<ack>
<p>The authors thank the anonymous reviewers and editors for their insightful comments and suggestions, which have helped improve the quality of this paper.</p>
</ack>
<sec>
<title>Funding Statement</title>
<p>This work is supported by Research Projects of the Natural Science Foundation of Hebei Province (F2021402005).</p>
</sec>
<sec>
<title>Author Contributions</title>
<p>The authors confirm contribution to the paper as follows: study conception and design: Di Wu, Liming Feng; data collection: Xiaoyu Wang; analysis and interpretation of results: Di Wu, Liming Feng, Xiaoyu Wang; draft manuscript preparation: Di Wu, Liming Feng. All authors reviewed the results and approved the final version of the manuscript.</p>
</sec>
<sec sec-type="data-availability">
<title>Availability of Data and Materials</title>
<p>The datasets used or analyzed during the current study are available from the corresponding author, Di Wu, on reasonable request.</p>
</sec>
<sec>
<title>Ethics Approval</title>
<p>Not applicable.</p>
</sec>
<sec sec-type="COI-statement">
<title>Conflicts of Interest</title>
<p>The authors declare no conflicts of interest to report regarding the present study.</p>
</sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Singh</surname> <given-names>GV</given-names></string-name>, <string-name><surname>Firdaus</surname> <given-names>M</given-names></string-name>, <string-name><surname>Chauhan</surname> <given-names>DS</given-names></string-name>, <string-name><surname>Ekbal</surname> <given-names>A</given-names></string-name>, <string-name><surname>Bhattacharyya</surname> <given-names>P</given-names></string-name></person-group>. <article-title>Zero-shot multitask intent and emotion prediction from multimodal data: a benchmark study</article-title>. <source>Neurocomputing</source>. <year>2024</year>;<volume>569</volume>(<issue>3</issue>):<fpage>127128</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.neucom.2023.127128</pub-id>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Musto</surname> <given-names>C</given-names></string-name>, <string-name><surname>Martina</surname> <given-names>AFM</given-names></string-name>, <string-name><surname>Iovine</surname> <given-names>A</given-names></string-name>, <string-name><surname>Narducci</surname> <given-names>F</given-names></string-name>, <string-name><surname>de Gemmis</surname> <given-names>M</given-names></string-name>, <string-name><surname>Semeraro</surname> <given-names>G</given-names></string-name></person-group>. <article-title>Tell me what you Like: introducing natural language preference elicitation strategies in a virtual assistant for the movie domain</article-title>. <source>J Intell Inform Syst</source>. <year>2024</year>;<volume>62</volume>(<issue>2</issue>):<fpage>575</fpage>&#x2013;<lpage>99</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s10844-023-00835-8</pub-id>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Al-Besher</surname> <given-names>A</given-names></string-name>, <string-name><surname>Kumar</surname> <given-names>K</given-names></string-name>, <string-name><surname>Sangeetha</surname> <given-names>M</given-names></string-name>, <string-name><surname>Butsa</surname> <given-names>T</given-names></string-name></person-group>. <article-title>BERT for conversational question answering systems using semantic similarity estimation</article-title>. <source>Comput Mater Contin</source>. <year>2022</year>;<volume>70</volume>(<issue>3</issue>):<fpage>4763</fpage>&#x2013;<lpage>80</lpage>. doi:<pub-id pub-id-type="doi">10.32604/cmc.2022.021033</pub-id>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Chandrakala</surname> <given-names>C</given-names></string-name>, <string-name><surname>Bhardwaj</surname> <given-names>R</given-names></string-name>, <string-name><surname>Pujari</surname> <given-names>C</given-names></string-name></person-group>. <article-title>An intent recognition pipeline for conversational AI</article-title>. <source>Int J Inform Technol</source>. <year>2024</year>;<volume>16</volume>(<issue>2</issue>):<fpage>731</fpage>&#x2013;<lpage>43</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s41870-023-01642-8</pub-id>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Devlin</surname> <given-names>J</given-names></string-name>, <string-name><surname>Chang</surname> <given-names>MW</given-names></string-name>, <string-name><surname>Lee</surname> <given-names>K</given-names></string-name>, <string-name><surname>Toutanova</surname> <given-names>K</given-names></string-name></person-group>. <article-title>BERT: pre-training of deep bidirectional transformers for language understanding</article-title>. In: <conf-name>Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers)</conf-name>; <year>2019</year>. p. <fpage>4171</fpage>&#x2013;<lpage>86</lpage>.</mixed-citation></ref>
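<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Liu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Ott</surname> <given-names>M</given-names></string-name>, <string-name><surname>Goyal</surname> <given-names>N</given-names></string-name>, <string-name><surname>Du</surname> <given-names>J</given-names></string-name>, <string-name><surname>Joshi</surname> <given-names>M</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>D</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>RoBERTa: a robustly optimized BERT pretraining approach</article-title>. <comment>arXiv:1907.11692. 2019</comment>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Sanh</surname> <given-names>V</given-names></string-name>, <string-name><surname>Debut</surname> <given-names>L</given-names></string-name>, <string-name><surname>Chaumond</surname> <given-names>J</given-names></string-name>, <string-name><surname>Wolf</surname> <given-names>T</given-names></string-name></person-group>. <article-title>DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter</article-title>. <comment>arXiv:1910.01108. 2019</comment>.</mixed-citation></ref>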
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Lan</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>M</given-names></string-name>, <string-name><surname>Goodman</surname> <given-names>S</given-names></string-name>, <string-name><surname>Gimpel</surname> <given-names>K</given-names></string-name>, <string-name><surname>Sharma</surname> <given-names>P</given-names></string-name>, <string-name><surname>Soricut</surname> <given-names>R</given-names></string-name></person-group>. <article-title>ALBERT: a Lite BERT for self-supervised learning of language representations</article-title>. <comment>arXiv:1909.11942. 2019</comment>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Clark</surname> <given-names>K</given-names></string-name>, <string-name><surname>Luong</surname> <given-names>MT</given-names></string-name>, <string-name><surname>Le</surname> <given-names>QV</given-names></string-name>, <string-name><surname>Manning</surname> <given-names>CD</given-names></string-name></person-group>. <article-title>ELECTRA: pre-training text encoders as discriminators rather than generators</article-title>. <comment>arXiv:2003.10555. 2020</comment>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>&#x00C7;elik</surname> <given-names>A</given-names></string-name>, <string-name><surname>K&#x00FC;&#x00E7;&#x00FC;kmanisa</surname> <given-names>A</given-names></string-name>, <string-name><surname>Urhan</surname> <given-names>O</given-names></string-name></person-group>. <article-title>Feature distillation from vision-language model for semisupervised action classification</article-title>. <source>Turkish J Electr Eng Comput Sci</source>. <year>2023</year>;<volume>31</volume>(<issue>6</issue>):<fpage>1129</fpage>&#x2013;<lpage>45</lpage>. doi:<pub-id pub-id-type="doi">10.55730/1300-0632.4038</pub-id>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Jin</surname> <given-names>W</given-names></string-name>, <string-name><surname>Zhao</surname> <given-names>B</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>L</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>C</given-names></string-name>, <string-name><surname>Yu</surname> <given-names>H</given-names></string-name></person-group>. <article-title>Back to common sense: oxford dictionary descriptive knowledge augmentation for aspect-based sentiment analysis</article-title>. <source>Inform Process Manag</source>. <year>2023</year>;<volume>60</volume>(<issue>3</issue>):<fpage>103260</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.ipm.2022.103260</pub-id>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Yang</surname> <given-names>R</given-names></string-name>, <string-name><surname>Dai</surname> <given-names>W</given-names></string-name>, <string-name><surname>Li</surname> <given-names>C</given-names></string-name>, <string-name><surname>Zou</surname> <given-names>J</given-names></string-name>, <string-name><surname>Xiong</surname> <given-names>H</given-names></string-name></person-group>. <article-title>NCGNN: node-level capsule graph neural network for semisupervised classification</article-title>. <source>IEEE Trans Neural Netw Learn Syst</source>. <year>2022</year>;<volume>35</volume>(<issue>1</issue>):<fpage>1025</fpage>&#x2013;<lpage>39</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TNNLS.2022.3179306</pub-id>; <pub-id pub-id-type="pmid">35679381</pub-id></mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Xiu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Ye</surname> <given-names>F</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>Hybrid tensor networks for fully supervised and semi-supervised hyperspectral image classification</article-title>. <source>IEEE J Sel Top Appl Earth Obs Remote Sens</source>. <year>2023</year>;<volume>16</volume>:<fpage>7882</fpage>&#x2013;<lpage>95</lpage>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Yang</surname> <given-names>M</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>B</given-names></string-name>, <string-name><surname>Liao</surname> <given-names>M</given-names></string-name>, <string-name><surname>Zhu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Bai</surname> <given-names>X</given-names></string-name></person-group>. <article-title>Sequential visual and semantic consistency for semi-supervised text recognition</article-title>. <source>Pattern Recognit Lett</source>. <year>2024</year>;<volume>178</volume>(<issue>1</issue>):<fpage>174</fpage>&#x2013;<lpage>80</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.patrec.2024.01.008</pub-id>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wang</surname> <given-names>H</given-names></string-name>, <string-name><surname>Qiu</surname> <given-names>X</given-names></string-name>, <string-name><surname>Tan</surname> <given-names>X</given-names></string-name></person-group>. <article-title>Multivariate graph neural networks on enhancing syntactic and semantic for aspect-based sentiment analysis</article-title>. <source>Appl Intell</source>. <year>2024</year>;<volume>54</volume>(<issue>22</issue>):<fpage>11672</fpage>&#x2013;<lpage>89</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s10489-024-05802-6</pub-id>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhao</surname> <given-names>B</given-names></string-name>, <string-name><surname>Jin</surname> <given-names>W</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Huang</surname> <given-names>S</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>G</given-names></string-name></person-group>. <article-title>Prompt learning for metonymy resolution: enhancing performance with internal prior knowledge of pre-trained language models</article-title>. <source>Knowl Based Syst</source>. <year>2023</year>;<volume>279</volume>(<issue>3</issue>):<fpage>110928</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.knosys.2023.110928</pub-id>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Wei</surname> <given-names>J</given-names></string-name>, <string-name><surname>Zou</surname> <given-names>K</given-names></string-name></person-group>. <article-title>EDA: easy data augmentation techniques for boosting performance on text classification tasks</article-title>. In: <conf-name>Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP)</conf-name>; <year>2019</year>; <publisher-loc>Hong Kong, China</publisher-loc>. p. <fpage>6382</fpage>&#x2013;<lpage>8</lpage>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhao</surname> <given-names>T</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Neves</surname> <given-names>L</given-names></string-name>, <string-name><surname>Woodford</surname> <given-names>O</given-names></string-name>, <string-name><surname>Jiang</surname> <given-names>M</given-names></string-name>, <string-name><surname>Shah</surname> <given-names>N</given-names></string-name></person-group>. <article-title>Data augmentation for graph neural networks</article-title>. <source>Proc AAAI Conf Artif Intell</source>. <year>2021</year>;<volume>35</volume>(<issue>12</issue>):<fpage>11015</fpage>&#x2013;<lpage>23</lpage>. doi:<pub-id pub-id-type="doi">10.1609/aaai.v35i12.17315</pub-id>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Whitehouse</surname> <given-names>C</given-names></string-name>, <string-name><surname>Choudhury</surname> <given-names>M</given-names></string-name>, <string-name><surname>Aji</surname> <given-names>A</given-names></string-name></person-group>. <article-title>LLM-powered data augmentation for enhanced cross-lingual performance</article-title>. In: <conf-name>Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing</conf-name>; <year>2023</year>; <publisher-loc>Singapore</publisher-loc>. p. <fpage>671</fpage>&#x2013;<lpage>86</lpage>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Thakur</surname> <given-names>N</given-names></string-name>, <string-name><surname>Reimers</surname> <given-names>N</given-names></string-name>, <string-name><surname>Daxenberger</surname> <given-names>J</given-names></string-name>, <string-name><surname>Gurevych</surname> <given-names>I</given-names></string-name></person-group>. <article-title>Augmented SBERT: data augmentation method for improving Bi-encoders for pairwise sentence scoring tasks</article-title>. In: <conf-name>Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</conf-name>; <year>2021</year>. p. <fpage>296</fpage>&#x2013;<lpage>310</lpage>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Qiu</surname> <given-names>X</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>H</given-names></string-name>, <string-name><surname>Tan</surname> <given-names>X</given-names></string-name>, <string-name><surname>Qu</surname> <given-names>C</given-names></string-name></person-group>. <article-title>ILTS: inducing intention propagation in decentralized multi-agent tasks with large language models</article-title>. In: <conf-name>Proceedings of the 33rd ACM International Conference on Information and Knowledge Management</conf-name>; <year>2024</year>; <publisher-loc>Boise, ID, USA</publisher-loc>. p. <fpage>3989</fpage>&#x2013;<lpage>93</lpage>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Ziyaden</surname> <given-names>A</given-names></string-name>, <string-name><surname>Yelenov</surname> <given-names>A</given-names></string-name>, <string-name><surname>Hajiyev</surname> <given-names>F</given-names></string-name>, <string-name><surname>Rustamov</surname> <given-names>S</given-names></string-name>, <string-name><surname>Pak</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Text data augmentation and pre-trained Language Model for enhancing text classification of low-resource languages</article-title>. <source>PeerJ Comput Sci</source>. <year>2024</year>;<volume>10</volume>(<issue>5</issue>):<fpage>e1974</fpage>. doi:<pub-id pub-id-type="doi">10.7717/peerj-cs.1974</pub-id>; <pub-id pub-id-type="pmid">38660166</pub-id></mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Qin</surname> <given-names>P</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>W</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>M</given-names></string-name>, <string-name><surname>Li</surname> <given-names>D</given-names></string-name>, <string-name><surname>Feng</surname> <given-names>G</given-names></string-name></person-group>. <article-title>CC-GNN: a clustering contrastive learning network for graph semi-supervised learning</article-title>. <source>IEEE Access</source>. <year>2024</year>;<volume>12</volume>:<fpage>71956</fpage>&#x2013;<lpage>69</lpage>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Xiao</surname> <given-names>T</given-names></string-name>, <string-name><surname>Zhu</surname> <given-names>H</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>S</given-names></string-name></person-group>. <article-title>Simple and asymmetric graph contrastive learning without augmentations</article-title>. <source>Adv Neural Inf Process Syst</source>. <year>2024</year>;<volume>36</volume>:<fpage>1</fpage>&#x2013;<lpage>24</lpage>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Xu</surname> <given-names>H</given-names></string-name>, <string-name><surname>Shi</surname> <given-names>C</given-names></string-name>, <string-name><surname>Fan</surname> <given-names>W</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>Z</given-names></string-name></person-group>. <article-title>Improving diversity and discriminability based implicit contrastive learning for unsupervised domain adaptation</article-title>. <source>Appl Intell</source>. <year>2024</year>;<volume>54</volume>(<issue>20</issue>):<fpage>10007</fpage>&#x2013;<lpage>17</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s10489-024-05351-y</pub-id>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Kumar</surname> <given-names>R</given-names></string-name>, <string-name><surname>Patidar</surname> <given-names>M</given-names></string-name>, <string-name><surname>Varshney</surname> <given-names>V</given-names></string-name>, <string-name><surname>Vig</surname> <given-names>L</given-names></string-name>, <string-name><surname>Shroff</surname> <given-names>G</given-names></string-name></person-group>. <article-title>Intent detection and discovery from user logs via deep semi-supervised contrastive clustering</article-title>. In: <conf-name>Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</conf-name>; <year>2022</year>; <publisher-loc>Seattle, WA, USA</publisher-loc>. p. <fpage>1836</fpage>&#x2013;<lpage>53</lpage>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>S</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>J</given-names></string-name>, <string-name><surname>Bai</surname> <given-names>J</given-names></string-name>, <string-name><surname>Yan</surname> <given-names>C</given-names></string-name>, <string-name><surname>Li</surname> <given-names>T</given-names></string-name>, <string-name><surname>Yan</surname> <given-names>Z</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>New intent discovery with attracting and dispersing prototype</article-title>. In: <conf-name>Proceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024)</conf-name>; <year>2024</year>; <publisher-loc>Torino, Italy</publisher-loc>. p. <fpage>12193</fpage>&#x2013;<lpage>206</lpage>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Liang</surname> <given-names>J</given-names></string-name>, <string-name><surname>Liao</surname> <given-names>L</given-names></string-name></person-group>. <article-title>ClusterPrompt: cluster semantic enhanced prompt learning for new intent discovery</article-title>. In: <conf-name>Findings of the Association for Computational Linguistics: EMNLP 2023</conf-name>; <year>2023</year>. p. <fpage>10468</fpage>&#x2013;<lpage>81</lpage>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Hu</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>He</surname> <given-names>L</given-names></string-name>, <string-name><surname>Nie</surname> <given-names>F</given-names></string-name></person-group>. <article-title>Interactive supervision for new intent discovery</article-title>. <source>IEEE Signal Process Lett</source>. <year>2024</year>;<volume>31</volume>:<fpage>1680</fpage>&#x2013;<lpage>4</lpage>. doi:<pub-id pub-id-type="doi">10.1109/LSP.2024.3416882</pub-id>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Oskouei</surname> <given-names>AG</given-names></string-name>, <string-name><surname>Samadi</surname> <given-names>N</given-names></string-name>, <string-name><surname>Tanha</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Feature-weight and cluster-weight learning in fuzzy c-means method for semi-supervised clustering</article-title>. <source>Appl Soft Comput</source>. <year>2024</year>;<volume>161</volume>(<issue>2</issue>):<fpage>111712</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.asoc.2024.111712</pub-id>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Liu</surname> <given-names>H</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>J</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>H</given-names></string-name></person-group>. <article-title>New intent discovery with multi-view clustering</article-title>. In: <conf-name>ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)</conf-name>; <year>2024</year>; <publisher-loc>Seoul, Republic of Korea</publisher-loc>: <publisher-name>IEEE</publisher-name>. p. <fpage>12381</fpage>&#x2013;<lpage>5</lpage>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Shi</surname> <given-names>W</given-names></string-name>, <string-name><surname>An</surname> <given-names>W</given-names></string-name>, <string-name><surname>Tian</surname> <given-names>F</given-names></string-name>, <string-name><surname>Zheng</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>P</given-names></string-name></person-group>. <article-title>A diffusion weighted graph framework for new intent discovery</article-title>. In: <conf-name>Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing</conf-name>; <year>2023</year>. p. <fpage>8033</fpage>&#x2013;<lpage>42</lpage>.</mixed-citation></ref>
<ref id="ref-33"><label>[33]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Manning</surname> <given-names>CD</given-names></string-name>, <string-name><surname>Surdeanu</surname> <given-names>M</given-names></string-name>, <string-name><surname>Bauer</surname> <given-names>J</given-names></string-name>, <string-name><surname>Finkel</surname> <given-names>JR</given-names></string-name>, <string-name><surname>Bethard</surname> <given-names>S</given-names></string-name>, <string-name><surname>McClosky</surname> <given-names>D</given-names></string-name></person-group>. <article-title>The Stanford CoreNLP natural language processing toolkit</article-title>. In: <conf-name>Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations</conf-name>; <year>2014</year>; <publisher-loc>Baltimore, MD, USA</publisher-loc>. p. <fpage>55</fpage>&#x2013;<lpage>60</lpage>.</mixed-citation></ref>
<ref id="ref-34"><label>[34]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Qi</surname> <given-names>F</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>L</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Wantwords: an open-source online reverse dictionary system</article-title>. In: <conf-name>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</conf-name>; <year>2020</year>. p. <fpage>175</fpage>&#x2013;<lpage>81</lpage>.</mixed-citation></ref>
<ref id="ref-35"><label>[35]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>H</given-names></string-name>, <string-name><surname>Zhan</surname> <given-names>LM</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>XM</given-names></string-name>, <string-name><surname>Lam</surname> <given-names>A</given-names></string-name></person-group>. <article-title>New intent discovery with pre-training and contrastive learning</article-title>. In: <conf-name>Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</conf-name>; <year>2022</year>; <publisher-loc>Dublin, Ireland</publisher-loc>. p. <fpage>256</fpage>&#x2013;<lpage>69</lpage>.</mixed-citation></ref>
<ref id="ref-36"><label>[36]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Arthur</surname> <given-names>D</given-names></string-name>, <string-name><surname>Vassilvitskii</surname> <given-names>S</given-names></string-name></person-group>. <article-title>k-means&#x002B;&#x002B;: the advantages of careful seeding</article-title>. In: <conf-name>Proceedings of the Eighteenth Annual ACM-SIAM Symposium on Discrete Algorithms</conf-name>; <year>2007</year>; <publisher-loc>New Orleans, LA, USA</publisher-loc>: <publisher-name>ACM-SIAM</publisher-name>. p. <fpage>1027</fpage>&#x2013;<lpage>35</lpage>.</mixed-citation></ref>
<ref id="ref-37"><label>[37]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>An</surname> <given-names>W</given-names></string-name>, <string-name><surname>Tian</surname> <given-names>F</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>P</given-names></string-name>, <string-name><surname>Zheng</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Ding</surname> <given-names>W</given-names></string-name></person-group>. <article-title>New user intent discovery with robust pseudo label training and source domain joint training</article-title>. <source>IEEE Intell Syst</source>. <year>2023</year>;<volume>38</volume>(<issue>4</issue>):<fpage>21</fpage>&#x2013;<lpage>31</lpage>. doi:<pub-id pub-id-type="doi">10.1109/MIS.2023.3283909</pub-id>.</mixed-citation></ref>
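<ref id="ref-38"><label>[38]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Clark</surname> <given-names>K</given-names></string-name>, <string-name><surname>Luong</surname> <given-names>MT</given-names></string-name>, <string-name><surname>Le</surname> <given-names>QV</given-names></string-name>, <string-name><surname>Manning</surname> <given-names>CD</given-names></string-name></person-group>. <article-title>ELECTRA: pre-training text encoders as discriminators rather than generators</article-title>. <comment>arXiv:2003.10555</comment>. <year>2020</year>.</mixed-citation></ref>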
<ref id="ref-39"><label>[39]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>An</surname> <given-names>W</given-names></string-name>, <string-name><surname>Tian</surname> <given-names>F</given-names></string-name>, <string-name><surname>Zheng</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Ding</surname> <given-names>W</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>P</given-names></string-name></person-group>. <article-title>Generalized category discovery with decoupled prototypical network</article-title>. <source>Proc AAAI Conf Artif Intell</source>. <year>2023</year>;<volume>37</volume>(<issue>11</issue>):<fpage>12527</fpage>&#x2013;<lpage>35</lpage>. doi:<pub-id pub-id-type="doi">10.1609/aaai.v37i11.26475</pub-id>.</mixed-citation></ref>
<ref id="ref-40"><label>[40]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Song</surname> <given-names>K</given-names></string-name>, <string-name><surname>Tan</surname> <given-names>X</given-names></string-name>, <string-name><surname>Qin</surname> <given-names>T</given-names></string-name>, <string-name><surname>Lu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>TY</given-names></string-name></person-group>. <article-title>MPNet: masked and permuted pre-training for language understanding</article-title>. <source>Adv Neural Inf Process Syst</source>. <year>2020</year>;<volume>33</volume>:<fpage>16857</fpage>&#x2013;<lpage>67</lpage>.</mixed-citation></ref>
<ref id="ref-41"><label>[41]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>H</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>H</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Long</surname> <given-names>F</given-names></string-name>, <string-name><surname>Gao</surname> <given-names>K</given-names></string-name></person-group>. <article-title>USNID: a framework for unsupervised and semi-supervised new intent discovery</article-title>. <comment>arXiv:2304.07699v1</comment>. <year>2023</year>.</mixed-citation></ref>
<ref id="ref-42"><label>[42]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>H</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Zhan</surname> <given-names>LM</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>J</given-names></string-name>, <string-name><surname>Shi</surname> <given-names>G</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>XM</given-names></string-name>, <etal>et al.</etal></person-group> <article-title>Effectiveness of pre-training for few-shot intent classification</article-title>. In: <conf-name>Findings of the Association for Computational Linguistics: EMNLP 2021</conf-name>; <year>2021</year>; <publisher-name>ACL</publisher-name>. p. <fpage>1114</fpage>&#x2013;<lpage>20</lpage>.</mixed-citation></ref>
<ref id="ref-43"><label>[43]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Liu</surname> <given-names>X</given-names></string-name>, <string-name><surname>Eshghi</surname> <given-names>A</given-names></string-name>, <string-name><surname>Swietojanski</surname> <given-names>P</given-names></string-name>, <string-name><surname>Rieser</surname> <given-names>V</given-names></string-name></person-group>. <article-title>Benchmarking natural language understanding services for building conversational agents</article-title>. In: <conf-name>Increasing Naturalness and Flexibility in Spoken Dialogue Interaction: 10th International Workshop on Spoken Dialogue Systems</conf-name>; <year>2021</year>; <publisher-name>Springer</publisher-name>. p. <fpage>165</fpage>&#x2013;<lpage>83</lpage>.</mixed-citation></ref>
</ref-list>
</back></article>