<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">63209</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2025.063209</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>GSPT-CVAE: A New Controlled Long Text Generation Method Based on T-CVAE</article-title>
<alt-title alt-title-type="left-running-head">GSPT-CVAE: A New Controlled Long Text Generation Method Based on T-CVAE</alt-title>
<alt-title alt-title-type="right-running-head">GSPT-CVAE: A New Controlled Long Text Generation Method Based on T-CVAE</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Zhao</surname><given-names>Tian</given-names></name><email>102201084@hbut.edu.cn</email></contrib>
<contrib id="author-2" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Tu</surname><given-names>Jun</given-names></name><email>tujun@mail.hbut.edu.cn</email></contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western"><surname>Quan</surname><given-names>Puzheng</given-names></name></contrib>
<contrib id="author-4" contrib-type="author">
<name name-style="western"><surname>Xiong</surname><given-names>Ruisheng</given-names></name></contrib>
<aff id="aff-1"><institution>School of Computer Science, Hubei University of Technology</institution>, <addr-line>Wuhan, 430070</addr-line>, <country>China</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Authors: Tian Zhao. Email: <email>102201084@hbut.edu.cn</email>; Jun Tu. Email: <email>tujun@mail.hbut.edu.cn</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2025</year>
</pub-date>
<pub-date date-type="pub" publication-format="electronic">
<day>09</day><month>06</month><year>2025</year>
</pub-date>
<volume>84</volume>
<issue>1</issue>
<fpage>1351</fpage>
<lpage>1377</lpage>
<history>
<date date-type="received">
<day>08</day>
<month>1</month>
<year>2025</year>
</date>
<date date-type="accepted">
<day>11</day>
<month>4</month>
<year>2025</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2025 The Authors.</copyright-statement>
<copyright-year>2025</copyright-year>
<copyright-holder>Published by Tech Science Press.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_63209.pdf"></self-uri>
<abstract>
<p>To address the problems of incomplete characterization of textual relations, weak guidance from latent representations, and low generation quality in controllable long text generation, this paper proposes a new GSPT-CVAE model (Graph Structured Processing, Single Vector, and Potential Attention Computing Transformer-Based Conditioned Variational Autoencoder model). The model obtains a more comprehensive representation of textual relations by processing the input text into a graph structure, and derives an effective latent representation by weighted merging of the graph-processed vector sequence into a single vector. When the latent representation guides text generation, the model combines traditional embedding with latent attention computation to fully exploit the guiding role of the latent representation, improving the controllability and effectiveness of text generation. Experimental results show that the model has excellent representation learning ability and learns rich, useful representations of textual relationships. The model also achieves satisfactory effectiveness and controllability, generating long texts that match the given constraints. The model attains a ROUGE-1 F1 score of 0.243, a ROUGE-2 F1 score of 0.041, a ROUGE-L F1 score of 0.22, and a PPL-Word score of 34.303, giving GSPT-CVAE an advantage over the baseline models. We also compare the model with state-of-the-art generative models such as T5, GPT-4, and Llama2, and the results show that GSPT-CVAE is competitive.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Controllable text generation</kwd>
<kwd>textual graph structuring</kwd>
<kwd>text relationships</kwd>
<kwd>potential characterization</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>In today&#x2019;s information age, Deep Learning (DL), an important branch of machine learning, has achieved remarkable results in image recognition, speech processing, and natural language processing (NLP). Text generation, a key problem in NLP, focuses on enabling computers to produce natural, coherent, and meaningful text. With continued progress in NLP technology, text generation models have achieved notable success in dialog systems, automatic writing, and translation. In practical applications, however, users increasingly need to control the generated text, while traditional text generation models often lack sufficient controllability over content, style, and tone. In this context, controlled text generation has become a highly anticipated research direction, aiming to enhance the controllability of model-generated text and better meet users&#x2019; specific needs.</p>
<p>The controlled text generation (CTG) task [<xref ref-type="bibr" rid="ref-1">1</xref>] refers to generating natural language text that satisfies constraints such as topic, emotion, and style while ensuring grammatical correctness. It is an essential tool in human-computer interaction [<xref ref-type="bibr" rid="ref-2">2</xref>]. Early approaches were mainly template-based and could only control the structure of the text, so the generated text lacked diversity of expression. In recent years, as application scenarios have grown more complex, the demand for controllability over various aspects of text has increased. To meet these flexible and diverse needs, data-driven neural network methods have become the standard approach to controllable text generation. Large-scale pre-trained language models can generate fluent text that meets constraints through different strategies [<xref ref-type="bibr" rid="ref-3">3</xref>] such as fine-tuning [<xref ref-type="bibr" rid="ref-4">4</xref>] and prompt learning.</p>
<p>Control conditions in CTG can be categorized as explicit or implicit. Explicit control refers to providing well-defined instructions through human-computer interaction, such as input prompts, to guide the model in generating text with a specific style, like a Shakespearean or humorous tone [<xref ref-type="bibr" rid="ref-5">5</xref>]. In contrast, implicit control ensures that the generated content adheres to certain standards even when these requirements are not explicitly stated. This includes maintaining non-toxic, inoffensive, and nondiscriminatory language. For example, intelligent customer service systems should consistently convey a positive and optimistic tone to improve user experience. To meet these implicit expectations, the model needs to autonomously adjust its output and prevent content that may cause social concerns.</p>
<p>Research on controlled text generation broadly encompasses three foci of interest along different dimensions. The first focuses primarily on the relationships between neighboring texts (short text environments), such as story completion and poetry generation; the main methods are fine-tuning of pre-trained models [<xref ref-type="bibr" rid="ref-6">6</xref>,<xref ref-type="bibr" rid="ref-7">7</xref>], prompt learning [<xref ref-type="bibr" rid="ref-8">8</xref>,<xref ref-type="bibr" rid="ref-9">9</xref>,<xref ref-type="bibr" rid="ref-10">10</xref>], conditional pre-trained language models, attribute bootstrapping [<xref ref-type="bibr" rid="ref-11">11</xref>&#x2013;<xref ref-type="bibr" rid="ref-14">14</xref>], and adversarial generative structures [<xref ref-type="bibr" rid="ref-15">15</xref>,<xref ref-type="bibr" rid="ref-16">16</xref>]. The second focuses on textual diversity control, mainly using approaches based on sampling strategies [<xref ref-type="bibr" rid="ref-17">17</xref>,<xref ref-type="bibr" rid="ref-18">18</xref>], variational autoencoders [<xref ref-type="bibr" rid="ref-19">19</xref>,<xref ref-type="bibr" rid="ref-20">20</xref>], and diffusion models [<xref ref-type="bibr" rid="ref-21">21</xref>,<xref ref-type="bibr" rid="ref-22">22</xref>]. The third focuses on the global relations of the text (extended textual contexts); mainstream methods include multi-step generation strategies [<xref ref-type="bibr" rid="ref-23">23</xref>&#x2013;<xref ref-type="bibr" rid="ref-26">26</xref>], semantic consistency discriminators [<xref ref-type="bibr" rid="ref-27">27</xref>], and relevance-based knowledge enhancement [<xref ref-type="bibr" rid="ref-28">28</xref>&#x2013;<xref ref-type="bibr" rid="ref-30">30</xref>].</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Related Work</title>
<p>With the rapid advancement of Large Language Models (LLMs) and their expanding role in NLP, text generation quality has improved significantly. However, real-world applications impose more complex and rigorous content requirements. For instance, in finance [<xref ref-type="bibr" rid="ref-31">31</xref>] and news reporting [<xref ref-type="bibr" rid="ref-32">32</xref>], models must not only avoid misleading or biased outputs but also align precisely with specific conditions and user preferences, such as mimicking a particular writing style or incorporating poetic elements. To address such needs, Controllable Text Generation (also known as Controlled or Constrained Text Generation) has emerged, ensuring that generated content meets both quality standards and application-specific demands.</p>
<p>LLMs excel in logical reasoning, text analysis, and problem-solving [<xref ref-type="bibr" rid="ref-33">33</xref>], but they struggle with CTG tasks involving sentiment and style, which demand accuracy and precise alignment of sentiment and tone. A key limitation is their insufficient grasp of subtle emotional expression: while capable of simulating emotions, they often misjudge intensity and shifts, producing text that is too flat or too strong and undermining emotional precision in contexts such as customer service or literary creation [<xref ref-type="bibr" rid="ref-34">34</xref>]. LLMs also struggle with style control, since their mimicry relies on statistical patterns rather than genuine stylistic understanding. This makes them ineffective at blending complex styles, such as professionalism with humor, and they often produce content that fails to meet expectations. Furthermore, CTG tasks require high context sensitivity, yet LLMs often fail to maintain emotional coherence in long texts due to context misinterpretation or memory limitations. The subjective nature of CTG evaluation further complicates optimization, as user preferences vary across cultures and individuals, and LLMs lack adaptability to these differences. Thus, CTG both expands LLM applications and exposes deficiencies in handling emotion, style, and context. Enhancing LLMs for CTG requires larger, higher-quality datasets along with advanced sentiment modeling and context management [<xref ref-type="bibr" rid="ref-35">35</xref>].</p>
<p>Therefore, most researchers now combine pre-trained LLMs with latent variable models, the most typical of which are VAEs [<xref ref-type="bibr" rid="ref-36">36</xref>]. However, this combination raises two main problems: a) the encoders of large pre-trained models such as the Transformer process text mainly in a linearized way, e.g., through linear transformations [<xref ref-type="bibr" rid="ref-37">37</xref>], and usually use the last hidden state to generate the latent space, producing a vector sequence [<xref ref-type="bibr" rid="ref-38">38</xref>]. As a result, a comprehensive representation of the relationships within the input text is not captured, which makes the latent representations obtained from the hidden space less accurate; b) after the decoding process is added to the pre-trained model, the latent representations obtained from the latent variable model fail to achieve the expected validity and controllability when guiding text generation [<xref ref-type="bibr" rid="ref-39">39</xref>].</p>
<p>At the same time, controlled text generation faces key challenges that affect practical applications. In automated content creation (news, blogs, novels), incomplete text representation may omit key facts, reducing credibility; poor latent guidance can cause deviations from user intent, leading to unfocused blog topics; and low generation quality manifests as repetitive or unclear text, lowering user acceptance. In education, incomplete representation may omit key concepts, poor guidance may result in mismatched difficulty levels, and low quality can introduce grammar errors, reducing readability. In customer service, incomplete representation leads to inaccurate responses, poor guidance prevents adaptation to user needs, and low quality can make responses unprofessional. In healthcare, missing contextual details in generated medical advice can mislead patients, poor guidance may cause content to deviate from the intended topic, and low quality may result in incorrect medical terminology, reducing trust. These issues persist in traditional generative models, as shown in <xref ref-type="table" rid="table-1">Table 1</xref>.</p>
<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>Generation failures of traditional generative models</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th align="center">Generated text</th>
<th align="center">Error analysis</th>
</tr>
</thead>
<tbody>
<tr>
<td>Quantum entanglement is a new technology that enables instantaneous communication over infinite distances, allowing information to be transmitted faster than the speed of light. Many modern devices, such as smartphones and Wi-Fi routers, already integrate quantum entanglement for ultra-secure encryption.</td>
<td>The generated text contains physically incorrect content, i.e., quantum entanglement cannot be used for FTL communication and is not widely used in smart devices. The model, in the absence of explicit knowledge constraints, may generate content that appears reasonable but is actually wrong.</td>
</tr>
<tr>
<td>The artificial intelligence has improve diagnostic accuracy by analyzing large amounts of datas, it also assisting in surgeries and patient monitoring. The deep learning algorithms are very helps in early disease detection.</td>
<td>The text contains grammatical errors (e.g., &#x2018;datas&#x2019; should be &#x2018;data&#x2019;, &#x2018;are very helps&#x2019; should be &#x2018;are very helpful&#x2019;), which may be due to the model&#x2019;s failure to learn grammar rules correctly, or to instability in the decoding phase.</td>
</tr>
<tr>
<td>Renewable energy has significantly reduced electricity costs in developing countries. However, due to its high initial costs, it has led to increased electricity prices for consumers. This economic burden makes renewable energy less accessible to rural populations, yet it remains the cheapest option available.</td>
<td>The text contains contradictory logic: it states that renewable energy reduces the cost of electricity, then that high initial costs increase electricity prices, and finally that it is the &#x2018;cheapest option&#x2019;. The model may be locally consistent when generating long texts while the overall semantics conflict.</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Given the current problems of controllable long text generation, this paper studies the effectiveness and controllability of long text generation. Our approach abandons the traditional encoder structure and combines a pre-trained model with a text graph-structured latent variable model, processing the input text into a graph structure and modeling the resulting textual relations. This allows the pre-trained language model to serve as a reliable tool for feature extraction or text decoding, while the hidden space of the latent variable model captures comprehensive features of the text, which helps improve the controllability of long texts.</p>
<p>In this paper, the GPT-2 model is chosen as the decoder for the GSPT-CVAE model because: 1) GPT-2 is an autoregressive language model that produces text by generating word sequences token by token [<xref ref-type="bibr" rid="ref-40">40</xref>]. In contrast, models such as BERT are based on Masked Language Modeling (MLM) and can only fill in parts of a text. GPT-2 therefore generates more coherent and natural text and can produce more diverse text. 2) Compared with a traditional VAE, GPT-2 adopts a parameter-sharing strategy: the model&#x2019;s parameters are used in both the encoder and decoder parts, which reduces the number of parameters to be trained and improves the efficiency and generalization of the model. 3) GPT-2 is a pre-trained language model that can be used directly for text generation without additional pre-training or fine-tuning, whereas models such as BERT usually require pre-training or fine-tuning on specific tasks, demanding more time and computational resources. GPT-2 thus not only meets the requirement of continuous text generation in GSPT-CVAE but also achieves a good balance between resource usage and performance, and is therefore selected as the decoder of this model.</p>
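The autoregressive decoding that makes GPT-2 suitable here can be illustrated with a minimal sketch, in which a hypothetical lookup table stands in for the trained model; `bigram` and `generate` are illustrative names, not part of GSPT-CVAE or the GPT-2 API:

```python
# Toy illustration of autoregressive decoding as used by GPT-2-style decoders:
# each step conditions on what has been generated so far. Here the "model" is
# reduced to a hypothetical deterministic bigram table rather than a transformer.
bigram = {
    "<s>": "the", "the": "model", "model": "generates",
    "generates": "text", "text": "</s>",
}

def generate(max_len=10):
    """Generate tokens one at a time until the end marker or a length cap."""
    tokens = ["<s>"]
    while tokens[-1] != "</s>" and len(tokens) < max_len:
        tokens.append(bigram[tokens[-1]])  # next token depends on the prefix
    return [t for t in tokens if t not in ("<s>", "</s>")]

print(generate())  # ['the', 'model', 'generates', 'text']
```

A masked language model, by contrast, would fill in blanks within a fixed-length sequence rather than extend it left to right, which is why the autoregressive form suits open-ended long text generation.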
<sec id="s2_1">
<label>2.1</label>
<title>Conditional Variational Auto Encoder</title>
<p>In this paper, we adopt prompt-based controlled long text generation, i.e., generating open-domain long text from an input prompt text, and utilize the hidden space of the latent variable model C-VAE to obtain latent representations of textual relations. <xref ref-type="fig" rid="fig-1">Fig. 1</xref> illustrates the model diagram of the variational autoencoder (VAE), whose architecture consists of an encoder (inference network) and a decoder (generation network). The encoder maps the input data x to the probability distribution <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:mrow><mml:msub><mml:mi>q</mml:mi><mml:mi>&#x03C6;</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>z</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> of the latent variable z; a sample is then drawn from this distribution, with the latent variable z obtained from the computed mean and variance. The decoder, taking z as input, generates the reconstructed probability distribution <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mo>&#x2223;</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>z</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> of the data <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>.
The traditional VAE decoder <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>&#x03B8;</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> is usually in autoregressive form; in this paper, we use a pre-trained GPT-2-based model as the decoder.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>Diagram of the basic structure of the C-VAE model used in our GSPT-CVAE model. In controlled text generation, x and <inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> refer to prompt text and generated text, respectively. <inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:mi>c</mml:mi></mml:math></inline-formula> is the condition, <inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:mi>&#x03BC;</mml:mi></mml:math></inline-formula> is the mean value, <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:mi>&#x03C3;</mml:mi></mml:math></inline-formula> is the variance, <inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:mi>z</mml:mi></mml:math></inline-formula> is the latent variable</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_63209-fig-1.tif"/>
</fig>
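The encode&#x2013;sample&#x2013;decode cycle described above can be sketched in NumPy. This is a minimal illustration with untrained stand-in weights and hypothetical dimensions; `encode`, `reparameterize`, and `decode` are illustrative names, not the paper's implementation:

```python
import numpy as np

rng = np.random.default_rng(0)
D_X, D_Y, D_Z = 8, 3, 4   # hypothetical sizes: input vector, condition, latent

# Untrained stand-in parameters for the inference and generation networks.
W_mu  = 0.1 * rng.standard_normal((D_Z, D_X + D_Y))
W_lv  = 0.1 * rng.standard_normal((D_Z, D_X + D_Y))
W_dec = 0.1 * rng.standard_normal((D_X, D_Z + D_Y))

def encode(x, y):
    """q_phi(z | x, y): map [x; y] to the mean and log-variance of a diagonal Gaussian."""
    h = np.concatenate([x, y])
    return W_mu @ h, W_lv @ h

def reparameterize(mu, logvar):
    """Draw z = mu + sigma * eps, keeping the sampling step differentiable."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z, y):
    """p_theta(x' | y, z): reconstruct x' from the latent variable and the condition."""
    return W_dec @ np.concatenate([z, y])

x, y = rng.standard_normal(D_X), rng.standard_normal(D_Y)
mu, logvar = encode(x, y)
x_rec = decode(reparameterize(mu, logvar), y)
```

In the full model, the reconstruction is produced by the GPT-2 decoder rather than a linear map; the sketch only shows how the condition y enters both the inference and generation sides.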
<p>The goal of C-VAE training is to maximize the data log-likelihood <inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:msub><mml:mi>E</mml:mi><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>&#x223C;</mml:mo><mml:mi>D</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula>. However, since posterior inference is difficult to achieve, C-VAE approximates the posterior distribution through variational inference methods to maximize the lower bound on the marginal probability of the observed data (ELBO):
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:msub><mml:mrow><mml:mi mathvariant="double-struck">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mo>&#x223C;</mml:mo><mml:mrow><mml:mi>&#x1D49F;</mml:mi></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mtext>log</mml:mtext></mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">&#x03B8;</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x2265;</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd /><mml:mtd><mml:msub><mml:mrow><mml:mi mathvariant="double-struck">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mo>&#x223C;</mml:mo><mml:mrow><mml:mi>&#x1D49F;</mml:mi></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="double-struck">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mo>&#x223C;</mml:mo><mml:msub><mml:mi>q</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">&#x03D5;</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mo 
stretchy="false">)</mml:mo></mml:mrow></mml:msub><mml:mrow><mml:mtext>log</mml:mtext></mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">&#x03B8;</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>]</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="double-struck">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mo>&#x223C;</mml:mo><mml:mrow><mml:mi>&#x1D49F;</mml:mi></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mtext>KL</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>q</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">&#x03D5;</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x2225;</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>)</mml:mo></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>Here, <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:msub><mml:mrow><mml:mi mathvariant="double-struck">E</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mo>&#x223C;</mml:mo><mml:mrow><mml:mi>&#x1D49F;</mml:mi></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mi mathvariant="normal">K</mml:mi><mml:mi mathvariant="normal">L</mml:mi></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>q</mml:mi><mml:mrow><mml:mi mathvariant="bold-italic">&#x03D5;</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x2225;</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi mathvariant="bold-italic">z</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mi mathvariant="bold-italic">y</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>)</mml:mo></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula> is the KL divergence between the approximate posterior distribution <inline-formula id="ieqn-12"><mml:math id="mml-ieqn-12"><mml:msub><mml:mi>q</mml:mi><mml:mrow><mml:mi>&#x03D5;</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mi>z</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mi>x</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> learned by the encoder and the prior distribution <inline-formula id="ieqn-13"><mml:math id="mml-ieqn-13"><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>, which ensures the continuity and compactness of the latent space. 
<inline-formula id="ieqn-14"><mml:math id="mml-ieqn-14"><mml:mi>x</mml:mi></mml:math></inline-formula> is the input text, and <inline-formula id="ieqn-15"><mml:math id="mml-ieqn-15"><mml:mi>y</mml:mi></mml:math></inline-formula> is the condition.</p>
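The KL term in Eq. (1) has a closed form when both distributions are diagonal Gaussians. The following is a minimal sketch assuming a standard-normal prior, a common simplification rather than the paper's exact conditional prior p(z|y):

```python
import numpy as np

def kl_diag_gaussian(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ): the ELBO penalty that keeps
    the approximate posterior q_phi(z|x, y) close to the prior."""
    return 0.5 * np.sum(np.exp(logvar) + mu**2 - 1.0 - logvar)

# The KL vanishes exactly when the posterior equals the prior ...
print(kl_diag_gaussian(np.zeros(4), np.zeros(4)))   # 0.0
# ... and grows as the posterior mean moves away from the prior's.
print(kl_diag_gaussian(np.ones(4), np.zeros(4)))    # 2.0
```

During training this term is subtracted from the expected reconstruction log-likelihood, matching the two expectations on the right-hand side of Eq. (1).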
</sec>
<sec id="s2_2">
<label>2.2</label>
<title>Graph Convolutional Networks (GCN) Encoder</title>
<p>A graph convolutional encoder is a model for learning representations of graph data: it combines a node&#x2019;s features with its neighbors&#x2019; information to generate a low-dimensional representation of the node. The focus of this paper is on the global relations of the text. <xref ref-type="fig" rid="fig-2">Fig. 2</xref> illustrates a graph convolutional encoder with a two-layer convolutional network. Each word in the input text is treated as a node in the graph, the node features are vector representations of the input text, and the edges between nodes correspond to the entries of the resulting adjacency matrix A. After the text passes through the graph convolutional network, the text features are updated to obtain a new vector representation of the text.</p>
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>Graph convolutional encoder (with two layers of convolutional networks)</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_63209-fig-2.tif"/>
</fig>
<p>The GCN formula is as follows:
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:msup><mml:mi>H</mml:mi><mml:mrow><mml:mi>l</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mi>&#x03C3;</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msup><mml:mrow><mml:mover><mml:mi>D</mml:mi><mml:mo stretchy="false">&#x007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac></mml:mrow></mml:msup><mml:mrow><mml:mover><mml:mi>A</mml:mi><mml:mo stretchy="false">&#x007E;</mml:mo></mml:mover></mml:mrow><mml:msup><mml:mrow><mml:mover><mml:mi>D</mml:mi><mml:mo stretchy="false">&#x007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac></mml:mrow></mml:msup><mml:msup><mml:mi>H</mml:mi><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:msup><mml:mi>W</mml:mi><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula></p>
<p>Here <inline-formula id="ieqn-16"><mml:math id="mml-ieqn-16"><mml:mrow><mml:mover><mml:mi>A</mml:mi><mml:mo stretchy="false">&#x007E;</mml:mo></mml:mover></mml:mrow></mml:math></inline-formula> is the new adjacency matrix obtained by adding the identity matrix to the adjacency matrix <italic>A</italic>, <italic>D</italic> with a tilde is the degree matrix of this new adjacency matrix, <italic>W</italic> is the weight parameter matrix of the <italic>l</italic>th layer, and <inline-formula id="ieqn-17"><mml:math id="mml-ieqn-17"><mml:mi>&#x03C3;</mml:mi></mml:math></inline-formula> is a nonlinear activation function, e.g., ReLU (Rectified Linear Unit). For a two-layer convolutional network, the forward propagation formula is:
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:mi>z</mml:mi><mml:mo>=</mml:mo><mml:mi>f</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:mi>A</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mi>s</mml:mi><mml:mi>o</mml:mi><mml:mi>f</mml:mi><mml:mi>t</mml:mi><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover><mml:mi>A</mml:mi><mml:mo stretchy="false">&#x007E;</mml:mo></mml:mover></mml:mrow><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>L</mml:mi><mml:mi>U</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover><mml:mi>A</mml:mi><mml:mo stretchy="false">&#x007E;</mml:mo></mml:mover></mml:mrow><mml:mi>X</mml:mi><mml:msup><mml:mi>W</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msup><mml:mo stretchy="false">)</mml:mo><mml:msup><mml:mi>W</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula></p>
<p>For the loss function, the traditional cross-entropy function is chosen.</p>
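<p>As an illustration of Eqs. (2) and (3), the two-layer forward pass can be sketched in NumPy as follows (a minimal sketch for clarity, not the implementation used in this paper; the symmetric normalization of Eq. (2) is applied in both layers):</p>

```python
import numpy as np

def normalize_adj(A):
    # Eq. (2): A~ = A + I, D~ = degree matrix of A~, return D~^(-1/2) A~ D~^(-1/2)
    A_t = A + np.eye(A.shape[0])
    d = A_t.sum(axis=1)
    D_inv_sqrt = np.diag(d ** -0.5)
    return D_inv_sqrt @ A_t @ D_inv_sqrt

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def gcn_two_layer(X, A, W0, W1):
    # Eq. (3): z = softmax(A_hat ReLU(A_hat X W0) W1)
    A_hat = normalize_adj(A)
    H1 = np.maximum(A_hat @ X @ W0, 0.0)  # ReLU
    return softmax(A_hat @ H1 @ W1)
```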
<p>The rest of the paper is organized as follows. <xref ref-type="sec" rid="s3">Section 3</xref> details the overall architectural design of the model proposed in this paper. <xref ref-type="sec" rid="s4">Section 4</xref> is the experimental part and comparative analysis of the experimental results. <xref ref-type="sec" rid="s5">Section 5</xref> summarizes the work of this paper and provides an outlook on the next steps.</p>
</sec>
</sec>
<sec id="s3">
<label>3</label>
<title>Model Architecture (GSPT-CVAE)</title>
<p>The overall model diagram of the GSPT-CVAE model proposed in this paper is shown in <xref ref-type="fig" rid="fig-3">Fig. 3</xref>. The model consists of three main modules:</p>

<p><list list-type="bullet">
<list-item>
<p>In the encoding stage, the text is graph-structured to obtain the graph-structured representation of the text, which is then passed through the GCN encoder to obtain the text node representations, i.e., the vector sequence representing the text, used in the subsequent stages.</p></list-item>
<list-item>
<p>The vector sequences are combined into a single vector representation using the vector merging layer, and this representation is then fed into the linear layer to predict both the prior and posterior distributions, which results in the latent representation Z.</p></list-item>
<list-item>
<p>In the decoding stage, the latent representation Z is added to the Transformer decoder through a combination of embedding and latent attention computation to guide subsequent decoding and generation.</p></list-item>
</list></p>
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Overall architecture of the GSPT-CVAE model</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_63209-fig-3.tif"/>
</fig>
<sec id="s3_1">
<label>3.1</label>
<title>Textual Graph Structuring</title>
<p>After the input text data is encoded, the resulting text vectors accurately represent the nodes in the graph, i.e., the feature matrix X. Thus, only the adjacency matrix A of the text nodes needs to be constructed. Afterward, the feature matrix X and the adjacency matrix A are fed into the GCN encoder with a two-layer convolutional network to obtain new node representations containing comprehensive relationships among the texts.</p>
<p>Commonly used methods for structuring text graphs include dependency syntactic analysis trees, sentence component analysis, AMR (Abstract Meaning Representation) graphs, information extraction graphs, knowledge graphs, etc. However, all of these methods have some limitations. For example, sentence component analysis may have difficulty in describing sentence components in specific languages and text types [<xref ref-type="bibr" rid="ref-41">41</xref>]; the construction of AMR graphs relies on the subjective interpretation of manual annotators, which may lead to inconsistency and subjectivity problems [<xref ref-type="bibr" rid="ref-42">42</xref>]; and information extraction graphs usually focus on the extraction of entities and relations, which may ignore important contextual information [<xref ref-type="bibr" rid="ref-43">43</xref>].</p>
<p>A convolutional neural network extracts information from an image by sliding a fixed-size filter window across it. Inspired by this filter window, this paper proposes a &#x201C;sliding window&#x201D; that traverses the input text. Within this window, the adjacency matrix A is constructed from the features of the text, and the word embeddings of the text are used as the feature matrix X, thus completing the graph structuring of the text.</p>
<p>However, the sliding window approach also introduces a new problem: the relationships between words vary in range. When using a sliding window to capture relationships, a window that is too small cannot capture the complete relationship information, while a window that is too large introduces unnecessary information, i.e., noise. Either case affects the construction of the final graph.</p>
<p>To address this problem, a dynamic sliding window can be designed. &#x201C;Dynamic&#x201D; here means that the window size is adjusted in real time according to the strength of the textual relationship, so that relationships of different strengths can be captured. The specific process is as follows: set an initial window size k (e.g., 3) and traverse the entire text sequence. For the words within the current window, compute the pairwise cosine similarities of their vector representations and take the average. Given a threshold T, when the average cosine similarity is greater than T, increase the window size; otherwise, reduce it (to a minimum of 2). Repeat until the entire text sequence has been traversed, which yields the adjacency matrix A. Each element of A represents the strength of the relationship between word i and word j: the larger the value, the more similar the two words; the smaller the value, the lower the similarity. The specific process is shown in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>.</p>
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>Dynamic sliding window to construct the adjacency matrix</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_63209-fig-4.tif"/>
</fig>
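<p>The dynamic sliding window procedure can be sketched as follows (a simplified NumPy sketch; the initial window size k, the threshold T, and the minimum size of 2 follow the text, while details such as the traversal step and how similarity values are written into A are illustrative assumptions):</p>

```python
import numpy as np

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def dynamic_window_adjacency(E, k=3, T=0.5, k_min=2):
    """E: (n, d) word embeddings. Returns a symmetric adjacency A where
    A[i, j] holds the cosine similarity of words i and j that share a window."""
    n = E.shape[0]
    A = np.zeros((n, n))
    i = 0
    while i < n:
        w = E[i:i + k]
        pairs = [(a, b) for a in range(len(w)) for b in range(a + 1, len(w))]
        # average pairwise cosine similarity inside the current window
        avg = np.mean([cos(w[a], w[b]) for a, b in pairs]) if pairs else 0.0
        # record relationship strengths for all word pairs in the window
        for a, b in pairs:
            A[i + a, i + b] = A[i + b, i + a] = cos(w[a], w[b])
        # grow the window when the context is coherent, shrink it otherwise
        k = k + 1 if avg > T else max(k_min, k - 1)
        i += 1
    return A
```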
<p>The traditional Transformer relies only on positional encoding and self-attention to model inter-word relationships, which makes it easy to ignore long-distance dependencies. The method in this paper instead uses a GCN to model the text as a graph representation, dynamically connecting related words through the sliding window to enhance global dependency modeling. The dynamic sliding window additionally records the strength of the relationships between words, and the final graph structure is refined according to these strengths, so that the resulting textual relationships carry better-differentiated strength information than the relationship representations obtained from traditional encoding. The following demonstrates how the dynamic sliding window captures strong and weak textual relations.</p>
<p>Now consider the sets of word vectors for two windows <inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-19"><mml:math id="mml-ieqn-19"><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>, where <inline-formula id="ieqn-20"><mml:math id="mml-ieqn-20"><mml:msub><mml:mrow><mml:mi mathvariant="normal">W</mml:mi></mml:mrow><mml:mn>2</mml:mn></mml:msub><mml:mo>&gt;</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="normal">W</mml:mi></mml:mrow><mml:mn>1</mml:mn></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-21"><mml:math id="mml-ieqn-21"><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> contains <inline-formula id="ieqn-22"><mml:math id="mml-ieqn-22"><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>. The average cosine similarity between them is:
<disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd><mml:mi>A</mml:mi><mml:mi>&#x03BD;</mml:mi><mml:mi>g</mml:mi><mml:mi>C</mml:mi><mml:mi>o</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>e</mml:mi><mml:mi>S</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mi>l</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi><mml:mi>y</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mo movablelimits="true" form="prefix">min</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>a</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msub><mml:mi>W</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow></mml:munder><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>b</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msub><mml:mi>W</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow></mml:munder><mml:mrow><mml:mtext>Cosine Similarity</mml:mtext></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>a</mml:mi><mml:mo>,</mml:mo><mml:mi>b</mml:mi><mml:mo 
stretchy="false">)</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>Here, <inline-formula id="ieqn-23"><mml:math id="mml-ieqn-23"><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-24"><mml:math id="mml-ieqn-24"><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> denote the two window sizes. Define a threshold T. When <italic>AvgCosineSimilarity</italic><inline-formula id="ieqn-25"><mml:math id="mml-ieqn-25"><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>W</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>W</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>&gt;</mml:mo><mml:mi>T</mml:mi></mml:math></inline-formula>, the similarity between the two windows is high, meaning that the textual contexts within them are similar, i.e., better textual relations can be obtained when <inline-formula id="ieqn-26"><mml:math id="mml-ieqn-26"><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> is enlarged to the size of <inline-formula id="ieqn-27"><mml:math id="mml-ieqn-27"><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>. Similarly, when <italic>AvgCosineSimilarity</italic> <inline-formula id="ieqn-28"><mml:math id="mml-ieqn-28"><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>W</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>W</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x003C;</mml:mo><mml:mi>T</mml:mi></mml:math></inline-formula>, the similarity between the two windows is low, suggesting a large gap between their textual contexts; in this case the window can be narrowed to focus on a tighter context. This shows that the dynamic sliding window approach in this paper can better capture the semantic relationships between texts.</p>
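<p>Eq. (4) can be computed directly; the sketch below assumes each window is given as a list of word vectors:</p>

```python
import numpy as np

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def avg_cosine_similarity(W1, W2):
    # Eq. (4): sum all pairwise similarities, scaled by 1 / min(|W1|, |W2|)
    total = sum(cosine(a, b) for a in W1 for b in W2)
    return total / min(len(W1), len(W2))
```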
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Vector Sequence Merge</title>
<p>When data is encoded, traditional Transformer models use only the last hidden state from the encoder to generate the latent space. After the final self-attention layer, a sequence of vectors is obtained whose length equals the number of input tokens. However, the vector sequence obtained from such encoding may not be sufficient to summarize continuous data and retain long-term knowledge, mainly because of the limitations of the self-attention mechanism, positional encoding, and related components. This paper proposes a vector sequence merging method that merges the vector sequence into a single vector through attention scores for the subsequent latent representation generation. The single vector contains both global and local information about the input text and can better process and retain feature information than a vector sequence. Moreover, because attention-score weighted merging is used, each feature is weighted during merging, which further emphasizes important features and attenuates minor ones. The method is validated below from the perspectives of both weights and expectations.</p>
<p>First, from the perspective of weights, suppose there is an input sequence <inline-formula id="ieqn-29"><mml:math id="mml-ieqn-29"><mml:mi>X</mml:mi><mml:mo>=</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo>&#x2026;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo fence="false" stretchy="false">}</mml:mo></mml:math></inline-formula>, where each <inline-formula id="ieqn-30"><mml:math id="mml-ieqn-30"><mml:mi>x</mml:mi></mml:math></inline-formula> denotes an element of the sequence. Using the self-attention mechanism, a vector of attention weights <inline-formula id="ieqn-31"><mml:math id="mml-ieqn-31"><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="normal">i</mml:mi></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> can be computed at each position and then applied to the original vector sequence. This yields a weighted vector sequence Z, where the weighted sum for each position i is computed as follows:
<disp-formula id="eqn-5"><label>(5)</label><mml:math id="mml-eqn-5" display="block"><mml:msub><mml:mi>Z</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x22C5;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:math></disp-formula></p>
<p>Here, <inline-formula id="ieqn-32"><mml:math id="mml-ieqn-32"><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the attention weight from position <italic>i</italic> to position <italic>j</italic>. To combine this sequence into a single vector, consider weighting and summing all the positions, i.e., <inline-formula id="ieqn-33"><mml:math id="mml-ieqn-33"><mml:mi>f</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi mathvariant="normal">&#x005F;</mml:mi><mml:mi>o</mml:mi><mml:mi>u</mml:mi><mml:mi>t</mml:mi><mml:mi>p</mml:mi><mml:mi>u</mml:mi><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:msubsup><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:msub><mml:mi>z</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula>. Rearranging this equation and introducing the attention weights gives:
<disp-formula id="eqn-6"><label>(6)</label><mml:math id="mml-eqn-6" display="block"><mml:mi>f</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi mathvariant="normal">&#x005F;</mml:mi><mml:mi>o</mml:mi><mml:mi>u</mml:mi><mml:mi>t</mml:mi><mml:mi>p</mml:mi><mml:mi>u</mml:mi><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:mrow><mml:mo>(</mml:mo><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x22C5;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>Obtained by swapping the summation order:
<disp-formula id="eqn-7"><label>(7)</label><mml:math id="mml-eqn-7" display="block"><mml:mi>f</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi mathvariant="normal">&#x005F;</mml:mi><mml:mi>o</mml:mi><mml:mi>u</mml:mi><mml:mi>t</mml:mi><mml:mi>p</mml:mi><mml:mi>u</mml:mi><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:mrow><mml:mo>(</mml:mo><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x22C5;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:math></disp-formula></p>
<p>The portion in parentheses is the sum of all the attention weights for position j, which can be thought of as the composite weight of position j. Let <inline-formula id="ieqn-34"><mml:math id="mml-ieqn-34"><mml:msub><mml:mi>C</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>. The above equation can then be simplified as:
<disp-formula id="eqn-8"><label>(8)</label><mml:math id="mml-eqn-8" display="block"><mml:mi>f</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi mathvariant="normal">&#x005F;</mml:mi><mml:mi>o</mml:mi><mml:mi>u</mml:mi><mml:mi>t</mml:mi><mml:mi>p</mml:mi><mml:mi>u</mml:mi><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:msub><mml:mi>C</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>&#x22C5;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:math></disp-formula></p>
<p>This equation shows that the final representation is formed by weighting the original vector sequence X with the composite weights <inline-formula id="ieqn-35"><mml:math id="mml-ieqn-35"><mml:msub><mml:mi>C</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:math></inline-formula> obtained through the self-attention mechanism. These composite weights can better capture the relationships between different positions in the sequence, thus providing richer semantic information.</p>
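<p>The summation swap of Eqs. (6)&#x2013;(8) can be verified numerically (an illustrative NumPy check, not part of the model code):</p>

```python
import numpy as np

def merge_by_attention(A, X):
    """A: (n, n) attention weights, X: (n, d) vector sequence.
    Returns the single merged vector of Eq. (8)."""
    C = A.sum(axis=0)  # C_j = sum_i A[i, j], the composite weight of position j
    return C @ X       # Eq. (8): sum_j C_j * x_j

rng = np.random.default_rng(0)
n, d = 5, 3
A = rng.random((n, n))
A /= A.sum(axis=1, keepdims=True)  # rows sum to 1, as softmax weights do
X = rng.normal(size=(n, d))
direct = (A @ X).sum(axis=0)       # Eq. (6): sum_i sum_j A[i, j] * x_j
assert np.allclose(direct, merge_by_attention(A, X))
```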
<p>Moving on to the expectation perspective, suppose again that there is an input sequence X, denoted <inline-formula id="ieqn-36"><mml:math id="mml-ieqn-36"><mml:mo fence="false" stretchy="false">{</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>T</mml:mi></mml:msub><mml:mo fence="false" stretchy="false">}</mml:mo></mml:math></inline-formula>, of length T. Each <inline-formula id="ieqn-37"><mml:math id="mml-ieqn-37"><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is a vector. Now compute the attentional weight <inline-formula id="ieqn-38"><mml:math id="mml-ieqn-38"><mml:msub><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, where the attentional weight <inline-formula id="ieqn-39"><mml:math id="mml-ieqn-39"><mml:msub><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is denoted:
<disp-formula id="eqn-9"><label>(9)</label><mml:math id="mml-eqn-9" display="block"><mml:msub><mml:mi>&#x03B1;</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>exp</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>e</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>T</mml:mi></mml:munderover><mml:mi>exp</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>e</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
<p>Here, <inline-formula id="ieqn-40"><mml:math id="mml-ieqn-40"><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is a score associated with <inline-formula id="ieqn-41"><mml:math id="mml-ieqn-41"><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, which can be computed using any appropriate mechanism (e.g., dot product, scaled dot product, etc.). The attention-weighted average vector is then computed:
<disp-formula id="eqn-10"><label>(10)</label><mml:math id="mml-eqn-10" display="block"><mml:mi>A</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mi>A</mml:mi><mml:mi>v</mml:mi><mml:mi>g</mml:mi><mml:mo>=</mml:mo><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>T</mml:mi></mml:munderover><mml:msub><mml:mi>&#x03B1;</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>&#x2217;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:math></disp-formula></p>
<p>First, the attention weights satisfy <inline-formula id="ieqn-42"><mml:math id="mml-ieqn-42"><mml:msubsup><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>T</mml:mi></mml:msubsup><mml:msub><mml:mi>&#x03B1;</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula>, because they form a normalized probability distribution. Next, the expected value of the attention-weighted average vector is computed:
<disp-formula id="eqn-11"><label>(11)</label><mml:math id="mml-eqn-11" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd><mml:mrow><mml:mi mathvariant="double-struck">E</mml:mi></mml:mrow><mml:mo stretchy="false">[</mml:mo><mml:mi>A</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mi>A</mml:mi><mml:mi>v</mml:mi><mml:mi>g</mml:mi><mml:mo stretchy="false">]</mml:mo></mml:mtd><mml:mtd><mml:mi></mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mi mathvariant="double-struck">E</mml:mi></mml:mrow><mml:mstyle scriptlevel="0"><mml:mrow><mml:mo maxsize="2.470em" minsize="2.470em">[</mml:mo></mml:mrow></mml:mstyle><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>&#x22C5;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mstyle scriptlevel="0"><mml:mrow><mml:mo maxsize="2.470em" minsize="2.470em">]</mml:mo></mml:mrow></mml:mstyle></mml:mtd></mml:mtr><mml:mtr><mml:mtd /><mml:mtd><mml:mi></mml:mi><mml:mo>=</mml:mo><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>T</mml:mi></mml:munderover><mml:mrow><mml:mi mathvariant="double-struck">E</mml:mi></mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:msub><mml:mi>&#x03B1;</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>&#x22C5;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>]</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd 
/><mml:mtd><mml:mi></mml:mi><mml:mo>=</mml:mo><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>T</mml:mi></mml:munderover><mml:mrow><mml:mi mathvariant="double-struck">E</mml:mi></mml:mrow><mml:mo stretchy="false">[</mml:mo><mml:msub><mml:mi>&#x03B1;</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo stretchy="false">]</mml:mo><mml:mo>&#x22C5;</mml:mo><mml:mrow><mml:mi mathvariant="double-struck">E</mml:mi></mml:mrow><mml:mo stretchy="false">[</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo stretchy="false">]</mml:mo><mml:mspace width="1em" /><mml:mrow><mml:mtext>(linear property)</mml:mtext></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd /><mml:mtd><mml:mi></mml:mi><mml:mo>=</mml:mo><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:munderover><mml:mrow><mml:mi mathvariant="double-struck">E</mml:mi></mml:mrow><mml:mstyle scriptlevel="0"><mml:mrow><mml:mo maxsize="1.2em" minsize="1.2em">[</mml:mo></mml:mrow></mml:mstyle><mml:msub><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mstyle scriptlevel="0"><mml:mrow><mml:mo maxsize="1.2em" minsize="1.2em">]</mml:mo></mml:mrow></mml:mstyle><mml:mo>&#x22C5;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mspace width="1em" /><mml:mrow><mml:mtext>(Definition of expectation)</mml:mtext></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd /><mml:mtd><mml:mi></mml:mi><mml:mo>=</mml:mo><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:munderover><mml:mstyle scriptlevel="0"><mml:mrow><mml:mo maxsize="2.470em" minsize="2.470em">(</mml:mo></mml:mrow></mml:mstyle><mml:mfrac><mml:mrow><mml:mi>exp</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mo 
stretchy="false">(</mml:mo><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:munderover><mml:mi>exp</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac><mml:mstyle scriptlevel="0"><mml:mrow><mml:mo maxsize="2.470em" minsize="2.470em">)</mml:mo></mml:mrow></mml:mstyle><mml:mo>&#x22C5;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>By normalizing <inline-formula id="ieqn-43"><mml:math id="mml-ieqn-43"><mml:msub><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, the expected attention-weighted average vector concentrates on the parts with higher scores <inline-formula id="ieqn-44"><mml:math id="mml-ieqn-44"><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, i.e., the model pays more attention to the task-relevant parts.</p>
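<p>Eqs. (9) and (10) amount to a softmax over the scores followed by a weighted average, as the following sketch shows (illustrative only):</p>

```python
import numpy as np

def attention_avg(X, scores):
    """Eqs. (9)-(10): X is (T, d), scores holds the e_t values."""
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()       # Eq. (9): softmax, so the weights sum to 1
    return alpha, alpha @ X    # Eq. (10): attention-weighted average vector
```

When one score dominates, the weighted average concentrates on that position, which is exactly the behavior the normalization argument above describes.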
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>Embedding Combined with Latent Attention Computation</title>
<p>After the latent representation Z of the latent space is obtained, Z is usually injected into the subsequent decoding and generation process either by combining Z with the decoder&#x2019;s input vectors (through concatenation, multiplication, or similar operations) or by fusing Z with each layer of the decoder according to attention weights. In this paper, we not only concatenate Z with the input vectors but also add Z to the K and V matrices of the attention computation to obtain new <inline-formula id="ieqn-45"><mml:math id="mml-ieqn-45"><mml:mtext>K'</mml:mtext></mml:math></inline-formula> and <inline-formula id="ieqn-46"><mml:math id="mml-ieqn-46"><mml:mtext>V'</mml:mtext></mml:math></inline-formula> matrices before computing the attention. In this way, Z is fully integrated into the decoder&#x2019;s generation process for better control of text generation.</p>
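<p>A minimal sketch of this idea (our illustration, not the authors' code): the latent representation Z is added to K and V before the standard scaled dot-product attention is computed. Broadcasting a single latent vector over all positions is an assumption made for this sketch:</p>

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def latent_attention(Q, K, V, z):
    """Q, K, V: (n, d); z: (d,) latent representation.
    K' = K + z and V' = V + z, then standard attention as in Eq. (12)."""
    Kp, Vp = K + z, V + z
    d = Q.shape[-1]
    return softmax(Q @ Kp.T / np.sqrt(d)) @ Vp
```

With z set to zero the computation reduces to ordinary scaled dot-product attention, which makes the role of the latent term easy to isolate.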
<p>For the traditional embedding approach, the drawback is the noise introduced into the self-attention computation. The proof is as follows: <inline-formula id="ieqn-47"><mml:math id="mml-ieqn-47"><mml:msub><mml:mi>e</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>e</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:math></inline-formula> are defined as the token embeddings of two inputs, and <inline-formula id="ieqn-48"><mml:math id="mml-ieqn-48"><mml:msub><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the attention weight between the ith and jth tokens. According to the self-attention formula:
<disp-formula id="eqn-12"><label>(12)</label><mml:math id="mml-eqn-12" display="block"><mml:mi>A</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>Q</mml:mi><mml:mo>,</mml:mo><mml:mi>K</mml:mi><mml:mo>,</mml:mo><mml:mi>V</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mi>s</mml:mi><mml:mi>o</mml:mi><mml:mi>f</mml:mi><mml:mi>t</mml:mi><mml:mo movablelimits="true" form="prefix">max</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mi>Q</mml:mi><mml:mi>T</mml:mi></mml:msup><mml:mi>K</mml:mi></mml:mrow><mml:msqrt><mml:mi>d</mml:mi></mml:msqrt></mml:mfrac><mml:mo>)</mml:mo></mml:mrow><mml:mi>V</mml:mi></mml:math></disp-formula></p>
<p>The expression for <inline-formula id="ieqn-49"><mml:math id="mml-ieqn-49"><mml:msub><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> can then be obtained as:
<disp-formula id="eqn-13"><label>(13)</label><mml:math id="mml-eqn-13" display="block"><mml:msub><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mi>W</mml:mi><mml:mi>q</mml:mi></mml:msup><mml:msub><mml:mi>e</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mi>T</mml:mi></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mi>W</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:msub><mml:mi>e</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msubsup><mml:mi>e</mml:mi><mml:mi>i</mml:mi><mml:mi>T</mml:mi></mml:msubsup><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mi>W</mml:mi><mml:mi>q</mml:mi></mml:msup><mml:mo>)</mml:mo></mml:mrow><mml:mi>T</mml:mi></mml:msup><mml:msup><mml:mi>W</mml:mi><mml:mi>k</mml:mi></mml:msup><mml:msub><mml:mi>e</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:math></disp-formula></p>
<p><inline-formula id="ieqn-50"><mml:math id="mml-ieqn-50"><mml:msup><mml:mi>W</mml:mi><mml:mrow><mml:mi>q</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> and <inline-formula id="ieqn-51"><mml:math id="mml-ieqn-51"><mml:msup><mml:mi>W</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> are the parameter matrices that map the input token embeddings into the query (Q) and key (K) spaces. Denoting the right-hand side of this equation as <inline-formula id="ieqn-52"><mml:math id="mml-ieqn-52"><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>e</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>e</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>, adding Z directly to the token embeddings gives:
<disp-formula id="eqn-14"><label>(14)</label><mml:math id="mml-eqn-14" display="block"><mml:msub><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo stretchy="false">[</mml:mo><mml:msup><mml:mi>W</mml:mi><mml:mrow><mml:mi>q</mml:mi></mml:mrow></mml:msup><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:msup><mml:mo stretchy="false">]</mml:mo><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:mo stretchy="false">[</mml:mo><mml:msup><mml:mi>W</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msup><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo stretchy="false">]</mml:mo><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>+</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>+</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mi>z</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>+</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mi>z</mml:mi><mml:mo>,</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula></p>
<p>Note that the resulting expression contains a redundant term <inline-formula id="ieqn-53"><mml:math id="mml-ieqn-53"><mml:mo stretchy="false">(</mml:mo><mml:mi>z</mml:mi><mml:mo>,</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>, which introduces additional noise into the attention mechanism.</p>
<p>To solve the problems caused by the embedding method, this paper introduces a latent attention computation alongside the embedding method. Specifically, the latent representation obtained previously is divided into L vectors <inline-formula id="ieqn-54"><mml:math id="mml-ieqn-54"><mml:mo stretchy="false">[</mml:mo><mml:msub><mml:mi>Z</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>Z</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo>&#x2026;</mml:mo><mml:msub><mml:mi>Z</mml:mi><mml:mi>L</mml:mi></mml:msub><mml:mo stretchy="false">]</mml:mo></mml:math></inline-formula>, and these latent vectors are merged into the original self-attention computation by adding them to the matrices K and V in each layer of the attention computation.</p>
<p>Integrating Z into K and V yields the new <inline-formula id="ieqn-55"><mml:math id="mml-ieqn-55"><mml:mtext>K</mml:mtext></mml:math></inline-formula><sup>&#x2032;</sup> and <inline-formula id="ieqn-56"><mml:math id="mml-ieqn-56"><mml:mtext>V</mml:mtext></mml:math></inline-formula><sup>&#x2032;</sup>:
<disp-formula id="eqn-15"><label>(15)</label><mml:math id="mml-eqn-15" display="block"><mml:msup><mml:mi>K</mml:mi><mml:mo>&#x2032;</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:msub><mml:mi>Z</mml:mi><mml:mi>K</mml:mi></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>K</mml:mi></mml:mtd></mml:mtr></mml:mtable><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>+</mml:mo><mml:mi>l</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x00D7;</mml:mo><mml:mi>d</mml:mi></mml:mrow></mml:msup><mml:mspace width="1em" /><mml:msup><mml:mi>V</mml:mi><mml:mo>&#x2032;</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:msub><mml:mi>Z</mml:mi><mml:mi>V</mml:mi></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>V</mml:mi></mml:mtd></mml:mtr></mml:mtable><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>+</mml:mo><mml:mi>l</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x00D7;</mml:mo><mml:mi>d</mml:mi></mml:mrow></mml:msup></mml:math></disp-formula></p>
<p>The attention score <inline-formula id="ieqn-57"><mml:math id="mml-ieqn-57"><mml:mi>&#x03B1;</mml:mi></mml:math></inline-formula> can be written as <inline-formula id="ieqn-58"><mml:math id="mml-ieqn-58"><mml:mi>&#x03B1;</mml:mi><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mi>Q</mml:mi><mml:mo>&#x2217;</mml:mo><mml:msup><mml:mi>K</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>; this is the first time Z is fused with the feature vectors. The attention score is then multiplied by the new value matrix in the subsequent self-attention computation: <inline-formula id="ieqn-59"><mml:math id="mml-ieqn-59"><mml:mi>A</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mi>&#x03B1;</mml:mi><mml:mo>&#x2217;</mml:mo><mml:msup><mml:mi>V</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>; this is the second time Z is fused with the feature vectors. In this computation, the value matrix <inline-formula id="ieqn-60"><mml:math id="mml-ieqn-60"><mml:mtext>V'</mml:mtext></mml:math></inline-formula> already contains information about the latent variable Z, so when the embedding method is used again, the effect of the noise is resolved.
<disp-formula id="eqn-16"><label>(16)</label><mml:math id="mml-eqn-16" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:mi></mml:mi><mml:mo stretchy="false">[</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>+</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>+</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mi>z</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>+</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mi>z</mml:mi><mml:mo>,</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo stretchy="false">]</mml:mo><mml:mo>&#x2217;</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msup><mml:mi>V</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mo stretchy="false">)</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd /><mml:mtd><mml:mi></mml:mi><mml:mo>=</mml:mo><mml:mo stretchy="false">[</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>+</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>+</mml:mo><mml:mo 
stretchy="false">(</mml:mo><mml:mi>z</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>+</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mi>z</mml:mi><mml:mo>,</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo stretchy="false">]</mml:mo><mml:mo>&#x2217;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mi>V</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>V</mml:mi></mml:mtd></mml:mtr></mml:mtable><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd /><mml:mtd><mml:mi></mml:mi><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x2217;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mi>V</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>V</mml:mi></mml:mtd></mml:mtr></mml:mtable><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x2217;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mi>V</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>V</mml:mi></mml:mtd></mml:mtr></mml:mtable><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mo 
stretchy="false">(</mml:mo><mml:mi>z</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x2217;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mi>V</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>V</mml:mi></mml:mtd></mml:mtr></mml:mtable><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mi>z</mml:mi><mml:mo>,</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x2217;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mi>V</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>V</mml:mi></mml:mtd></mml:mtr></mml:mtable><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>As the above equation shows, the noise term <inline-formula id="ieqn-61"><mml:math id="mml-ieqn-61"><mml:mo stretchy="false">(</mml:mo><mml:mi>z</mml:mi><mml:mo>,</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> present in the original embedding method is ultimately multiplied by a matrix that incorporates information about the latent variable Z. This term is therefore equivalent to a weighting of the latent variable Z and no longer acts as noise.</p>
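The latent attention step described above can be sketched as follows. This is a minimal single-head illustration with assumed shapes (Q is n&#x00D7;d; Z_K and Z_V are the latent rows prepended to K and V), not the paper's implementation:

```python
import math

def matmul(A, B):
    # (n x d) @ (d x m) -> (n x m), with matrices as nested lists
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def softmax(row):
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    s = sum(exps)
    return [e / s for e in exps]

def latent_attention(Q, K, V, Z_K, Z_V):
    """Attention(Q, K', V') with K' = [Z_K; K] and V' = [Z_V; V]."""
    K_prime = Z_K + K                      # (m + l) x d
    V_prime = Z_V + V                      # (m + l) x d
    d = len(Q[0])
    KT = [list(col) for col in zip(*K_prime)]       # K'^T
    scores = matmul(Q, KT)                           # Q @ K'^T
    scores = [[s / math.sqrt(d) for s in row] for row in scores]
    weights = [softmax(row) for row in scores]       # rows sum to 1
    return matmul(weights, V_prime)                  # mixes in Z_V rows
```

Because each output row is a convex combination of the rows of V&#x2032;, the latent rows Z_V contribute to every generated position, which is the second fusion of Z described above.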
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Results and Discussion</title>
<sec id="s4_1">
<label>4.1</label>
<title>Dataset Selection</title>
<p>The datasets used in this paper are Arxiv [<xref ref-type="bibr" rid="ref-44">44</xref>], Yelp [<xref ref-type="bibr" rid="ref-45">45</xref>], WritingPrompts [<xref ref-type="bibr" rid="ref-46">46</xref>], and WikiPlots [<xref ref-type="bibr" rid="ref-47">47</xref>]. The Arxiv dataset is used for controlled text generation and is divided into training, testing, and validation sets in the ratio 80:10:10. Arxiv is an online dataset of abstracts extracted from &#x2018;arXiv&#x2019; articles: topic queries are searched on arXiv, and the abstracts of matching articles are collected. This paper selects article types with the keywords &#x2018;artificial intelligence,&#x2019; &#x2018;computer vision,&#x2019; and &#x2018;text generation.&#x2019; Meanwhile, this paper evaluates the GSPT-CVAE model on the WritingPrompts and WikiPlots datasets to show its performance across different data domains. The Yelp dataset is mainly used for the pre-experimental analysis and the textual relationship validation visualization experiments.</p>
</sec>
<sec id="s4_2">
<label>4.2</label>
<title>Selection of Comparison Methods</title>
<p>The comparison methods chosen for this paper are as follows:
<list list-type="bullet">
<list-item>
<p>FIST: This method is a technique for fine-tuning natural language processing models by introducing special tokens to guide the model to focus on specific tasks or samples during the fine-tuning process [<xref ref-type="bibr" rid="ref-48">48</xref>]. These unique tokens can mark important locations or sample attributes in the input sequence, thus making the model better adapted to the target task.</p></list-item>
<list-item>
<p>PSA: By introducing a pseudo-self-attention mechanism into the generation process, the model can selectively focus on different parts of the input text during the text generation process. This approach can effectively control specific attributes or styles of the generated text, such as sentiment, theme, or tone, leading to a more accurate text generation task.</p></list-item>
<list-item>
<p>Fusion: This method achieves precise control of the attributes, style, and content of the generated text by combining information from multiple input modalities (e.g., text, image, speech) and introducing external control signals. Fusing multimodal details makes the generated text richer and more diverse, meeting the needs of different application scenarios such as emotion generation and theme transformation.</p></list-item>
<list-item>
<p>Dynamic Prompting (DP) [<xref ref-type="bibr" rid="ref-49">49</xref>]: Dynamic cueing techniques control text generation by dynamically generating cues based on context. This approach utilizes contextual information to adjust prompt words in real time to make the generated text more coherent and aligned with expectations. It has the advantage of being adaptive and automatically adjusting the strategy according to different generation tasks.</p></list-item>
</list></p>
</sec>
<sec id="s4_3">
<label>4.3</label>
<title>Evaluation Metrics</title>
<p>This paper evaluates the following automated metrics for the target text:
<list list-type="bullet">
<list-item>
<p>Perplexity (PPL): PPL is used to evaluate language models and is often considered a proxy for generation quality. All GPT-2-based models use a byte-level tokenization scheme (PPL-BPE); based on this, the models&#x2019; word-level perplexity (PPL-Word) is also computed in this paper to compare them with previous models.</p></list-item>
<list-item>
<p>ROUGE: The ROUGE score calculates the n-gram overlap between the test-generated text and the given target text. For completeness, ROUGE scores are computed for n-gram overlaps (ROUGE-1, ROUGE-2, and ROUGE-L) as well as F1 scores for each precision (P) and recall (R).</p></list-item>
<list-item>
<p>BLEU: The BLEU score considers the degree of N-gram overlap between the generated text and the reference text. BLEU-4 is a form of the BLEU metric that evaluates the overlap of 4-grams (four consecutive words) and normalizes these matches. The BLEU value ranges from 0 to 1, with values closer to 1 indicating that the generated text is closer to the reference text.</p></list-item>
<list-item>
<p>Manual Evaluation: The readability, relevance, and completeness of the generated text were manually scored (out of 10) in the model robustness experiment.</p></list-item>
</list></p>
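The n-gram overlap behind the ROUGE-N scores above can be sketched in a few lines. This is an illustrative implementation of clipped n-gram precision, recall, and F1, not the official ROUGE toolkit:

```python
from collections import Counter

def ngrams(tokens, n):
    # multiset of n-grams in a token sequence
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def rouge_n(candidate, reference, n):
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    overlap = sum((cand & ref).values())        # clipped n-gram matches
    p = overlap / max(sum(cand.values()), 1)    # precision
    r = overlap / max(sum(ref.values()), 1)     # recall
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```

BLEU-4 follows the same n-gram counting idea, combining modified precisions up to 4-grams with a brevity penalty.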
</sec>
<sec id="s4_4">
<label>4.4</label>
<title>Parameter Settings</title>
<p>This paper uses the smallest publicly available version of GPT-2 as the model&#x2019;s decoder for computational efficiency. Specifically, it has L &#x003D; 12 layers, H &#x003D; 12 attention heads per layer, and a model dimensionality of d &#x003D; 768 units, totaling 117 million parameters. In addition, to address the posterior collapse problem, we implement a cyclic annealing schedule by modifying the coefficient <inline-formula id="ieqn-62"><mml:math id="mml-ieqn-62"><mml:mi>&#x03B2;</mml:mi></mml:math></inline-formula> of the KL divergence in Eq. (2). Specifically, we keep <inline-formula id="ieqn-63"><mml:math id="mml-ieqn-63"><mml:mi>&#x03B2;</mml:mi></mml:math></inline-formula> close to zero in the first half of each cycle, linearly anneal <inline-formula id="ieqn-64"><mml:math id="mml-ieqn-64"><mml:mi>&#x03B2;</mml:mi></mml:math></inline-formula> to 1 over the next one-fourth of the cycle, and keep <inline-formula id="ieqn-65"><mml:math id="mml-ieqn-65"><mml:mi>&#x03B2;</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula> for the remaining one-fourth. Finally, in evaluation, we generate stories using the top-k top-p random sampling scheme with k &#x003D; 100 and p &#x003D; 0.95. A temperature smoothing technique is also applied with T &#x003D; 0.95.</p>
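The cyclic annealing schedule described above can be sketched as follows; `cycle_len` is an assumed hyperparameter name:

```python
def kl_beta(step, cycle_len):
    """KL coefficient at a training step under the cyclic schedule."""
    t = (step % cycle_len) / cycle_len   # position within the cycle, in [0, 1)
    if t < 0.5:
        return 0.0                       # "close to zero" phase
    if t < 0.75:
        return (t - 0.5) / 0.25          # linear ramp from 0 to 1
    return 1.0                           # held at 1 for the last quarter
```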
</sec>
<sec id="s4_5">
<label>4.5</label>
<title>Threshold Selection</title>
<p>When constructing the text graph, the text similarity within a window is compared with a threshold to dynamically adjust the window size and better capture the relationships between texts. The choice of threshold is therefore critical. Several experiments were run with different thresholds; the results are shown in <xref ref-type="fig" rid="fig-5">Fig. 5</xref>. As the figure shows, when the threshold is too small, almost all pairs of nodes in the graph are connected, and the graph becomes too dense, with nodes joined by a large number of irrelevant edges. These edges link texts unrelated to the context and interfere with subsequent feature propagation and learning. Conversely, when the threshold is too large, the connectivity of the adjacency matrix decreases; too low a connectivity may produce isolated nodes or small isolated subgraphs, preventing the GCN from efficiently propagating features and, thus, from capturing the relationships between texts.</p>
<fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>Relationship between the threshold T and the node degree distribution of the adjacency matrix</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_63209-fig-5.tif"/>
</fig>
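The effect of the threshold on graph density can be illustrated with a small sketch; the cosine similarity measure and the toy vectors are assumptions for illustration, not necessarily the paper's exact similarity function:

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def adjacency(vectors, threshold):
    # connect two text nodes iff their similarity reaches the threshold
    n = len(vectors)
    A = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(vectors[i], vectors[j]) >= threshold:
                A[i][j] = A[j][i] = 1
    return A

def degrees(A):
    return [sum(row) for row in A]
```

Raising the threshold prunes edges, so node degrees fall; set it too high and nodes become isolated, too low and the graph is saturated with irrelevant edges, matching the behavior in Fig. 5.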
</sec>
<sec id="s4_6">
<label>4.6</label>
<title>Results</title>
<sec id="s4_6_1">
<label>4.6.1</label>
<title>Pre-Experiment on VAE</title>
<p>To assess the effectiveness of transformer-based latent variable modeling, this study first conducted pre-experiments using the VAE architecture on two small datasets. The prior followed a standard spherical Gaussian distribution, N(0, I). The VAE model performed purely unsupervised learning, encoding unlabeled linguistic text into a distributed representation and then decoding it. The Arxiv and Yelp datasets were chosen for the pre-experiments. The Arxiv dataset was compiled by searching for topics on arXiv and collecting article abstracts matching the selected keywords: &#x201C;Artificial Intelligence,&#x201D; &#x201C;Computer Vision,&#x201D; and &#x201C;Text Generation.&#x201D; The Yelp dataset, a public collection of restaurant reviews from the &#x201C;Yelp&#x201D; website, was also used. In this study, reviews with ratings above three were classified as positive, while those with ratings of three or below were considered negative.</p>
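The latent sampling in such a VAE pre-experiment can be sketched via the standard reparameterization trick, with z = &#x03BC; + &#x03C3; &#x00B7; &#x03B5; and &#x03B5; drawn from the spherical Gaussian prior; the names `mu` and `log_var` are illustrative:

```python
import math
import random

def sample_latent(mu, log_var, rng=random):
    # posterior sample: z = mu + exp(log_var / 2) * eps, eps ~ N(0, I)
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
            for m, lv in zip(mu, log_var)]

def sample_prior(dim, rng=random):
    # prior sample from the standard spherical Gaussian N(0, I)
    return [rng.gauss(0.0, 1.0) for _ in range(dim)]
```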
</sec>
<sec id="s4_6_2">
<label>4.6.2</label>
<title>Data Visualizations</title>
<p><xref ref-type="fig" rid="fig-6">Fig. 6</xref> illustrates the posterior z of the text in the test dataset in a two-dimensional space using t-SNE. As can be seen from the figure, the model learns meaningful latent spaces and clusters the high-dimensional data based on the proximity between latent codes. In the Arxiv dataset, the &#x201C;Artificial Intelligence&#x201D; cluster sits between the &#x201C;Computer Vision&#x201D; and &#x201C;Language Generation&#x201D; clusters, which coincides with our understanding of these topics. This visualization shows the model&#x2019;s excellent representation learning capabilities.</p>
<fig id="fig-6">
<label>Figure 6</label>
<caption>
<title>Pre-experiment results. (a) Arxiv: topics are drawn in different colors: red for artificial intelligence; blue for computer vision; green for language generation; (b) Yelp: sentiments are drawn in two colors: red for negative; blue for positive</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_63209-fig-6.tif"/>
</fig>
<p>Second, to illustrate how graph structuring enhances text structure and coherence, this paper conducts comparative experiments on the Yelp dataset, comparing the original text representations with the textual relational representations produced by the text graph structuring process. The results are visualized by t-SNE dimensionality reduction. As shown in <xref ref-type="fig" rid="fig-7">Fig. 7</xref>, graph-structured texts (red dots) cluster more tightly in the 2D space and show stronger semantic consistency and coherence than the original texts (blue dots). This suggests that text graph structuring better captures the relationships between different parts of a text, thereby improving the overall quality and controllability of the generated text.</p>
<fig id="fig-7">
<label>Figure 7</label>
<caption>
<title>t-SNE visualization: comparison of textual relational representations of raw text and graph-structured text. The blue points represent the dimensionality-reduced representation of the original text, and the red points represent the dimensionality-reduced representation of the graph-structured text</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_63209-fig-7.tif"/>
</fig>
</sec>
<sec id="s4_6_3">
<label>4.6.3</label>
<title>Comparative Experiments</title>
<p><xref ref-type="fig" rid="fig-8">Figs. 8</xref> and <xref ref-type="fig" rid="fig-9">9</xref> show the experimental results: both the word-level PPL and byte-level PPL of the GSPT-CVAE model are lower than those of the other models, from the first iteration through the 40,000th iteration and after convergence. In the ROUGE metrics, the F1 scores of ROUGE-1, ROUGE-2, and ROUGE-L are higher than those of the traditional models, demonstrating the better performance, generation effectiveness, and controllability of the proposed model.</p>
<fig id="fig-8">
<label>Figure 8</label>
<caption>
<title>Comparison of experimental results between the GSPT-CVAE model and other models on the PPL-Word and PPL-BPE metrics</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_63209-fig-8a.tif"/>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_63209-fig-8b.tif"/>
</fig><fig id="fig-9">
<label>Figure 9</label>
<caption>
<title>Comparison of experimental results between the GSPT-CVAE model and other models on the ROUGE score and BLEU-4 metrics</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_63209-fig-9.tif"/>
</fig>
<p>Meanwhile, the GSPT-CVAE model was compared with the comparison methods chosen for this paper on various metrics, and it performs better than the other methods for controlling text generation. <xref ref-type="table" rid="table-2">Table 2</xref> shows the experimental results of each method on the evaluation metrics, and <xref ref-type="table" rid="table-3">Table 3</xref> analyzes the advantages and disadvantages of the other methods and explains the superiority of GSPT-CVAE.</p>
<table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>Comparison of different methods for assessing metrics for controlled text generation</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Methods</th>
<th>PPL-BPE</th>
<th>PPL-word</th>
<th>ROUGE-1 F1</th>
<th>ROUGE-2 F1</th>
<th>ROUGE-L F1</th>
</tr>
</thead>
<tbody>
<tr>
<td>FIST</td>
<td>26.5</td>
<td>38.9</td>
<td>0.181</td>
<td>0.023</td>
<td>0.17</td>
</tr>
<tr>
<td>PSA</td>
<td>47.8</td>
<td>79.5</td>
<td>0.188</td>
<td>0.026</td>
<td>0.172</td>
</tr>
<tr>
<td>Fusion</td>
<td>&#x2013;</td>
<td>36.0</td>
<td>0.223</td>
<td>0.038</td>
<td>0.206</td>
</tr>
<tr>
<td>DP</td>
<td>34.7</td>
<td>34.6</td>
<td>0.24</td>
<td>0.04</td>
<td>0.215</td>
</tr>
<tr>
<td>GSPT-CVAE</td>
<td>21.786</td>
<td>34.303</td>
<td>0.243</td>
<td>0.041</td>
<td>0.22</td>
</tr>
</tbody>
</table>
</table-wrap><table-wrap id="table-3">
<label>Table 3</label>
<caption>
<title>Analysis of the strengths and weaknesses of the methods used in other models and the superiority of the methods used in GSPT-CVAE</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th align="center">Method</th>
<th align="center">Features and advantages</th>
<th align="center">Limitations</th>
<th align="center">Improvements of GSPT-CVAE</th>
</tr>
</thead>
<tbody>
<tr>
<td>FIST</td>
<td>- Guides the model to focus on specific tasks or samples through special tokens - Marks important positions or sample attributes</td>
<td>- Relies solely on tokens, making it difficult to capture complex semantic relationships - Limited to specific tasks, weak cross-task generalization</td>
<td>- Introduces structured text graphs to capture global semantic relationships - Stronger cross-task generalization capability</td>
</tr>
<tr>
<td>PSA</td>
<td>- Uses pseudo self-attention to selectively focus on different parts of the input - Marks important positions or sample attributes</td>
<td>- Relies solely on tokens, making it difficult to capture complex semantic relationships - Limited to specific tasks, weak cross-task generalization</td>
<td>- Models global semantics using latent variables - Utilizes CVAE to achieve multi-attribute joint control</td>
</tr>
<tr>
<td>Fusion</td>
<td>- Integrates multi-modal inputs (e.g., text, images, speech), and generates richer and more diverse text in multi-modal scenarios - Provides precise control over text attributes, style, and content</td>
<td>- Performance drops in single-modal scenarios (e.g., text only) - Requires significant computational resources for multi-modal integration, reducing efficiency</td>
<td>- Uses latent variables and structured text graphs, eliminating the need for multi-modal input - Fully captures semantic relationships in single-modal scenarios</td>
</tr>
<tr>
<td>Dynamic Prompting (DP)</td>
<td>- Dynamically generates prompts based on context with adaptive capabilities - Ensures coherent text generation and adapts to various tasks</td>
<td>- Limited adaptability in complex control scenarios - Dynamic adjustments to context prompts may lead to instability in text control</td>
<td>- Stabilizes text control using latent variables and attention mechanisms - Provides efficient context modeling</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Furthermore, we compare our model with T5, GPT-4, and Llama 2, representing state-of-the-art large language models. As seen in <xref ref-type="fig" rid="fig-10">Figs. 10</xref> and <xref ref-type="fig" rid="fig-11">11</xref>, GSPT-CVAE achieves a lower word-level PPL (34.303) than T5 (35.245) and Llama 2 (35.526), indicating its superior efficiency in controllable text generation. In terms of BLEU-4 scores, our model maintains a stable advantage, surpassing T5 (0.065), GPT-4 (0.064), and Llama 2 (0.063) with a final score of 0.066, demonstrating better consistency in text fluency. Regarding the ROUGE evaluation, GSPT-CVAE outperforms T5, GPT-4, and Llama 2 across all F1-score metrics. Specifically, in ROUGE-1, our model attains 0.243, compared to T5 (0.24), GPT-4 (0.237), and Llama 2 (0.235), confirming its superiority in capturing relevant textual features. Similarly, ROUGE-2 F1 scores show that GSPT-CVAE achieves 0.049, while T5, GPT-4, and Llama 2 obtain 0.047, 0.045, and 0.044, respectively, further reinforcing our model&#x2019;s ability to generate semantically rich and structurally coherent content. Lastly, for ROUGE-L F1, our model attains 0.22, which is higher than T5 (0.215), GPT-4 (0.212), and Llama 2 (0.21), highlighting its effectiveness in producing well-structured and meaningful long text.</p>
<fig id="fig-10">
<label>Figure 10</label>
<caption>
<title>Comparison of experimental results between the GSPT-CVAE model and other models on the PPL-Word and PPL-BPE metrics</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_63209-fig-10.tif"/>
</fig><fig id="fig-11">
<label>Figure 11</label>
<caption>
<title>Comparison of experimental results between the GSPT-CVAE model and other models on the PPL-Word and PPL-BPE metrics</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_63209-fig-11.tif"/>
</fig>
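<p>For reference, the word-level perplexity (PPL-Word) reported above can be computed from a model&#x2019;s per-token log-probabilities by normalizing the total negative log-likelihood by the word count rather than the BPE-token count. The following is a minimal illustrative sketch; the token log-probabilities are invented for demonstration and are not taken from any model evaluated here.</p>

```python
import math

def word_level_ppl(token_log_probs, n_words):
    """Word-level perplexity: exponentiate the total negative
    log-likelihood divided by the number of *words*, so that models
    with different BPE vocabularies remain comparable."""
    total_nll = -sum(token_log_probs)
    return math.exp(total_nll / n_words)

# Toy example: 8 BPE tokens covering 5 words (values are illustrative).
log_probs = [-2.1, -0.9, -3.4, -1.2, -0.7, -2.8, -1.5, -1.9]
ppl = word_level_ppl(log_probs, n_words=5)  # lower is better
```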
<p>Finally, we conducted two more comparison experiments of the model on WritingPrompts and WikiPlots datasets in addition to the Arxiv dataset, and the results are shown in <xref ref-type="table" rid="table-4">Table 4</xref>.</p>
<table-wrap id="table-4">
<label>Table 4</label>
<caption>
<title>Comparison of different methods for assessing metrics for controlled text generation</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Models</th>
<th>PPL-BPE</th>
<th>PPL-Word</th>
<th>ROUGE-1 F1</th>
<th>ROUGE-2 F1</th>
<th>ROUGE-L F1</th>
</tr>
</thead>
<tbody>
<tr>
<td colspan="6"><bold>WritingPrompts dataset</bold></td>
</tr>
<tr>
<td>T5</td>
<td>23.8</td>
<td>28.6</td>
<td>0.258</td>
<td>0.045</td>
<td>0.343</td>
</tr>
<tr>
<td>GPT-4</td>
<td>22.9</td>
<td>27.9</td>
<td>0.262</td>
<td>0.048</td>
<td>0.348</td>
</tr>
<tr>
<td>Llama 2</td>
<td>23.5</td>
<td>28.3</td>
<td>0.260</td>
<td>0.047</td>
<td>0.345</td>
</tr>
<tr>
<td>GSPT-CVAE</td>
<td>23.1</td>
<td>27.8</td>
<td>0.265</td>
<td>0.049</td>
<td>0.350</td>
</tr>
<tr>
<td colspan="6"><bold>WikiPlots dataset</bold></td>
</tr>
<tr>
<td>T5</td>
<td>25.1</td>
<td>30.2</td>
<td>0.238</td>
<td>0.041</td>
<td>0.315</td>
</tr>
<tr>
<td>GPT-4</td>
<td>24.5</td>
<td>29.7</td>
<td>0.241</td>
<td>0.044</td>
<td>0.318</td>
</tr>
<tr>
<td>Llama 2</td>
<td>24.8</td>
<td>30.0</td>
<td>0.240</td>
<td>0.043</td>
<td>0.317</td>
</tr>
<tr>
<td>GSPT-CVAE</td>
<td>24.2</td>
<td>29.4</td>
<td>0.243</td>
<td>0.045</td>
<td>0.320</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>These results collectively demonstrate that GSPT-CVAE surpasses traditional controllable text generation models and maintains competitive performance against advanced large-scale models such as T5, GPT-4, and Llama 2, particularly in controllability, generation fluency, and coherence.</p>
</sec>
<sec id="s4_6_4">
<label>4.6.4</label>
<title>Robustness Experiment</title>
<p>To analyze the performance of GSPT-CVAE under adversarial conditions, this paper conducts robustness experiments on the model with noisy inputs, very short prompts, and domain shift. In the noisy-input setting, noise (e.g., spelling mistakes, random symbols, and missing words) is randomly inserted into the input text to simulate the dirty data found in the real world; in the short-prompt setting, the input prompts are extremely short (fewer than five words); and in the domain-shift setting, the model is transferred from the Arxiv scientific-paper dataset to the CNN/DailyMail news dataset. The experimental results are shown in <xref ref-type="table" rid="table-5">Tables 5</xref>&#x2013;<xref ref-type="table" rid="table-7">7</xref>.</p>
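<p>The noisy-input condition can be reproduced with a simple corruption routine. The sketch below is illustrative only: the corruption rates and the symbol set are our own assumptions rather than the exact experimental protocol.</p>

```python
import random

def add_noise(text, p=0.1, seed=0):
    """Randomly corrupt a prompt with typos (adjacent-character
    swaps), stray symbols, and dropped words."""
    rng = random.Random(seed)
    noisy = []
    for word in text.split():
        r = rng.random()
        if r < p and len(word) > 3:
            # Spelling mistake: swap two adjacent characters.
            i = rng.randrange(len(word) - 1)
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
            noisy.append(word)
        elif r < 2 * p:
            # Random symbol appended to the word.
            noisy.append(word + rng.choice("#@%&"))
        elif r < 2.5 * p:
            # Missing word: drop it entirely.
            continue
        else:
            noisy.append(word)
    return " ".join(noisy)
```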
<table-wrap id="table-5">
<label>Table 5</label>
<caption>
<title>Noisy input test</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Models</th>
<th>BLEU</th>
<th>ROUGE-L</th>
<th>PPL</th>
</tr>
</thead>
<tbody>
<tr>
<td>GSPT-CVAE</td>
<td>18.4</td>
<td>42.5</td>
<td>26.1</td>
</tr>
<tr>
<td>T5</td>
<td>16.8</td>
<td>40.2</td>
<td>28.7</td>
</tr>
<tr>
<td>GPT-4</td>
<td>20.1</td>
<td>45.3</td>
<td>24.5</td>
</tr>
</tbody>
</table>
</table-wrap><table-wrap id="table-6">
<label>Table 6</label>
<caption>
<title>Short prompt test</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Models</th>
<th>Readability</th>
<th>Relevance</th>
<th>Completeness</th>
</tr>
</thead>
<tbody>
<tr>
<td>GSPT-CVAE</td>
<td>8.3</td>
<td>7.9</td>
<td>7.5</td>
</tr>
<tr>
<td>T5</td>
<td>7.5</td>
<td>7.2</td>
<td>6.8</td>
</tr>
<tr>
<td>GPT-4</td>
<td>9.1</td>
<td>8.5</td>
<td>8.7</td>
</tr>
</tbody>
</table>
</table-wrap><table-wrap id="table-7">
<label>Table 7</label>
<caption>
<title>Cross-domain test</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Models</th>
<th>ROUGE-1</th>
<th>ROUGE-2</th>
<th>ROUGE-L</th>
</tr>
</thead>
<tbody>
<tr>
<td>GSPT-CVAE</td>
<td>39.2</td>
<td>19.5</td>
<td>35.1</td>
</tr>
<tr>
<td>T5</td>
<td>36.8</td>
<td>17.9</td>
<td>33.2</td>
</tr>
<tr>
<td>GPT-4</td>
<td>42.1</td>
<td>21.3</td>
<td>37.4</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>From the experimental results, in the noisy-input test GSPT-CVAE outperforms T5 but is slightly below GPT-4. Its PPL value is relatively low, indicating that it maintains good text fluency on noisy data. Because the GCN structure models global textual relations, even if part of the input information is corrupted, the model can still fill in the missing information through the graph structure and reduce semantic distortion. In the future, data augmentation methods such as spelling correction and noise-simulation training could further improve the model&#x2019;s robustness. In the very short prompt test, GSPT-CVAE extends the short prompt well and generates long text with coherent contextual logic, performing better than T5; however, its information-completeness score is lower than GPT-4&#x2019;s, possibly because the latent variable relies on rich contextual information, which a short prompt does not provide. In the future, few-shot learning could be added to adapt to very short prompt scenarios, or knowledge distillation could be introduced to enhance the short-text generation capability of GSPT-CVAE by borrowing high-quality text generated by GPT-4. Finally, in the cross-domain test, the generalization ability of GSPT-CVAE is stronger, possibly because the GCN structure extracts global relationships, so even when the text style varies considerably, the model can still maintain a certain degree of contextual consistency; meanwhile, the CVAE structure provides more flexible control over the latent variables, making it adaptable to the contexts of different domains.</p>
</sec>
<sec id="s4_6_5">
<label>4.6.5</label>
<title>Ablation Experiment</title>
<p>To verify that the methods used in the GSPT-CVAE model are effective, this paper conducts ablation experiments on the three components of the model, i.e., the structured processing of the text graph, the merging of vector sequences, and the combination of embeddings with the latent attention computation, and visualizes the resulting evaluation metrics. As shown in <xref ref-type="fig" rid="fig-12">Fig. 12</xref>, after removing the structured processing of the text graph, both the ROUGE F1 and BLEU scores decrease significantly, indicating the effectiveness of the structured text graph. After removing the latent attention computation, the F1 score is lower than that of the GSPT-CVAE model except at a few moments, and the BLEU score is ultimately lower than the GSPT-CVAE model&#x2019;s. Finally, after removing the vector-sequence merging layer, the F1 scores are slightly higher than the original model&#x2019;s at some moments, but the overall scores remain significantly lower than the GSPT-CVAE model&#x2019;s; the BLEU scores are lower than the GSPT-CVAE model&#x2019;s throughout the iterations and only gradually approach the GSPT-CVAE model&#x2019;s after the model converges.</p>
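<p>As a rough illustration of the third ablated component, the combination of embeddings with a latent attention computation can be sketched as a scaled dot-product attention between the token embeddings and a sampled latent code, fused back through a residual connection. This is a generic form written under our own assumptions, not the exact GSPT-CVAE operator.</p>

```python
import numpy as np

def latent_attention(embeddings, z):
    """Weight each token embedding by its affinity to the latent
    code z (softmax over scaled dot-products), then fuse the
    weighted embeddings back with a residual connection."""
    d = embeddings.shape[-1]
    scores = embeddings @ z / np.sqrt(d)        # (seq_len,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                    # softmax over tokens
    context = weights[:, None] * embeddings     # per-token weighting
    return embeddings + context                 # residual fusion

rng = np.random.default_rng(0)
emb = rng.normal(size=(6, 16))    # 6 tokens, embedding dim 16
z = rng.normal(size=16)           # latent variable sampled by the CVAE
fused = latent_attention(emb, z)  # same shape as the input embeddings
```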
<fig id="fig-12">
<label>Figure 12</label>
<caption>
<title>Results of ablation experiments of the three methods on each evaluation metric</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_63209-fig-12.tif"/>
</fig>
<p>The results of the above ablation experiments show that all three methods used in the GSPT-CVAE model have enhanced the effectiveness and controllability of the model&#x2019;s text generation, which also verifies the superiority of the GSPT-CVAE model proposed in this paper in terms of controllable text generation.</p>
</sec>
<sec id="s4_6_6">
<label>4.6.6</label>
<title>Generated Cases</title>
<p><xref ref-type="table" rid="table-8">Table 8</xref> shows the performance of the traditional CVAE model, the T-CVAE model, the Bert-based CVAE model, and the GSPT-CVAE model proposed in this paper on actual text generation cases. As the table shows, all models can generate corresponding text when the conditions are given. However, the text generated by the traditional CVAE model differs significantly from the target text, and its generated text is relatively short compared with the other models. Although the text generated by the T-CVAE model is close to the target text in sentence meaning and length, it contains many ill-formed sentences, logical errors, and even grammatical errors. The Bert-based CVAE model performs well overall but exhibits a few semantic ambiguities and similar problems. The GSPT-CVAE model proposed in this paper is the closest to the target text in both length and quality, and its generated text contains no grammatical, logical, or semantic errors, demonstrating the model&#x2019;s excellent long-text generation capability.</p>
<table-wrap id="table-8">
<label>Table 8</label>
<caption>
<title>Cases generated by different models on the Arxiv test dataset</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th align="center">Prompt text</th>
<th align="center">Reactive programming in standard ML</th>
</tr>
</thead>
<tbody>
<tr>
<td>Target text</td>
<td>A reactive system is a system that maintains a continuous interaction with its environment and is activated by receiving input events from the environment and generating output events in response. Modern programming languages specialized for programming such systems use a paradigm based on the concepts of transience and activation. We present a standard ML library that provides a basic prototype for programming reactive systems. The library is an underlying system upon which more complex reactive behaviors can be built, and it provides a convenient framework for prototypical extensions to existing reactive languages.</td>
</tr>
<tr>
<td>Generated text (CVAE)</td>
<td>We created a Python program similar to JavaScript but faster. It supports web and mobile apps, using HTML-like elements and functions for interactions. It can sort numbers and is memory-efficient, though it needs more testing for long tasks.</td>
</tr>
<tr>
<td>Generated text (T-CVAE)</td>
<td>We introduce a reactive ML language focused on dynamic values, adding more values and operations. Our aim is a reactive programming framework, demonstrated through an MFG tool case. We use a simplified predicate calculus model and briefly touch on type theory. Simplified reactive methods make this language practical for developers.</td>
</tr>
<tr>
<td>Generated text (Bert-CVAE)</td>
<td>We present a language based on ML&#x2019;s reactive paradigm, enabling explicit value assignments and dynamic updates. This reactive programming form in ML allows multi-value operations and uses specialized compilers. A case study illustrates its dynamic features, utilizing ML within simple predicate calculus to demonstrate functionality and provide new insights into programming languages.</td>
</tr>
<tr>
<td>Generated text (GSPT-CVAE)</td>
<td>We present a dynamic programming language based on ML&#x2019;s reactive programming paradigm, which extends ML by enabling explicit value assignments. This language supports multiple values, operations on values, and compiler usage. Using the MFG tool, we implement it in ML with a first-class predicate calculus and reformulate its type theory similarly to reactive programming. A case study demonstrates its implementation, yielding a type-theoretic description.</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
</sec>
<sec id="s5">
<label>5</label>
<title>Conclusions</title>
<p>This paper presents the GSPT-CVAE model for controlled text generation. The model improves the effectiveness and controllability of text generation through three innovative approaches: text graph structuring, vector sequence merging, and combining embedding with latent attention computation. Unlike traditional controlled text generation, the model focuses on long text generation through prompting, i.e., given an input prompt, the model generates long text within the domain of that prompt. The superiority of the GSPT-CVAE model and the validity of the methods used in the model are verified through experiments. The GSPT-CVAE model is of great significance in practical applications. It can significantly improve the efficiency and quality of content creation, especially in scenarios that require rich expression and diversified generation, such as literary creation and news writing. In education, the model can support personalized teaching by generating personalized learning materials for different topics and objectives. In addition, the model is well suited to automatic report generation, for example in business, healthcare, and scientific research, where it can generate structured, logical long-text reports from input prompts, significantly reducing manual editing time. These application scenarios demonstrate the potential of the GSPT-CVAE model as an indispensable tool for text-generation tasks. The model can be deployed as a microservices-based API, allowing developers to integrate it into existing content management platforms, chatbots, and automated writing assistants. By leveraging a distributed inference architecture (e.g., a cloud-based GPU cluster, or edge deployment using model quantization), GSPT-CVAE can efficiently serve real-time text generation requests. Additionally, a Retrieval-Augmented Generation (RAG) pipeline can be added to give the model access to external knowledge bases, improving factual accuracy and reducing hallucinations. Future work will focus on optimizing the inference speed of the model, improving its robustness to brief prompts, and developing interactive fine-tuning mechanisms that dynamically incorporate user feedback to improve output quality. These advances will further solidify the reliability and scalability of GSPT-CVAE for real-world text generation tasks. However, the model still has limitations: when the input prompt is too short, the graph obtained by structuring the text is sparsely represented, which can lead to suboptimal results. Research on controlled text generation therefore still has a long way to go; many methods and models remain to be improved or optimized in the future, and more effort needs to be invested in this research.</p>
</sec>
</body>
<back>
<ack>
<p>None.</p>
</ack>
<sec>
<title>Funding Statement</title>
<p>The authors received no specific funding for this study.</p>
</sec>
<sec>
<title>Author Contributions</title>
<p>The authors confirm contribution to the paper as follows: study conception and design: Jun Tu; data collection, analysis, interpretation of results and draft manuscript preparation: Tian Zhao; manuscript guidance: Puzheng Quan, Ruisheng Xiong. All authors reviewed the results and approved the final version of the manuscript.</p>
</sec>
<sec sec-type="data-availability">
<title>Availability of Data and Materials</title>
<p>The data that support the findings of this study are openly available in public repositories on GitHub at <ext-link ext-link-type="uri" xlink:href="https://github.com/gcunhase/ArXivAbsTitleDataset">https://github.com/gcunhase/ArXivAbsTitleDataset</ext-link>; <ext-link ext-link-type="uri" xlink:href="https://github.com/seanreed1111/yelp-reviews">https://github.com/seanreed1111/yelp-reviews</ext-link> (accessed on 10 April 2025).</p>
</sec>
<sec>
<title>Ethics Approval</title>
<p>Not applicable.</p>
</sec>
<sec sec-type="COI-statement">
<title>Conflicts of Interest</title>
<p>The authors declare no conflicts of interest to report regarding the present study.</p>
</sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>H</given-names></string-name>, <string-name><surname>Song</surname> <given-names>H</given-names></string-name>, <string-name><surname>Li</surname> <given-names>S</given-names></string-name>, <string-name><surname>Zhou</surname> <given-names>M</given-names></string-name>, <string-name><surname>Song</surname> <given-names>D</given-names></string-name></person-group>. <article-title>A survey of controllable text generation using transformer-based pre-trained language models</article-title>. <source>ACM Comput Surv</source>. <year>2023</year>;<volume>56</volume>(<issue>3</issue>):<fpage>1</fpage>&#x2013;<lpage>37</lpage>. doi:<pub-id pub-id-type="doi">10.1145/3676955</pub-id>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Prabhumoye</surname> <given-names>S</given-names></string-name>, <string-name><surname>Black</surname> <given-names>AW</given-names></string-name>, <string-name><surname>Salakhutdinov</surname> <given-names>R</given-names></string-name></person-group>. <article-title>Exploring controllable text generation techniques</article-title>. In: <conf-name>Proceedings of the 28th International Conference on Computational Linguistics</conf-name>. <publisher-loc>Barcelona, Spain (Online)</publisher-loc>; 2020. p. <fpage>1</fpage>&#x2013;<lpage>14</lpage>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Liu</surname> <given-names>P</given-names></string-name>, <string-name><surname>Yuan</surname> <given-names>W</given-names></string-name>, <string-name><surname>Fu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Jiang</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Hayashi</surname> <given-names>H</given-names></string-name>, <string-name><surname>Neubig</surname> <given-names>G</given-names></string-name></person-group>. <article-title>Pre-train, prompt, and predict: a systematic survey of prompting methods in natural language processing</article-title>. <source>ACM Comput Surv</source>. <year>2023</year>;<volume>55</volume>(<issue>9</issue>):<fpage>1</fpage>&#x2013;<lpage>35</lpage>. doi:<pub-id pub-id-type="doi">10.1145/3560815</pub-id>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Zheng</surname> <given-names>X</given-names></string-name>, <string-name><surname>Lin</surname> <given-names>H</given-names></string-name>, <string-name><surname>Han</surname> <given-names>X</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>L</given-names></string-name></person-group>. <article-title>Toward unified controllable text generation via regular expression instruction</article-title>. In: <conf-name>Proceedings of the 13th International Joint Conference on Natural Language Processing and the 3rd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics (Volume 1: Long Papers)</conf-name>. <publisher-loc>Nusa Dua, Bali</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>; 2023. p. <fpage>1</fpage>&#x2013;<lpage>14</lpage>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Han</surname> <given-names>J</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>L</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>W</given-names></string-name>, <string-name><surname>Song</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Mao</surname> <given-names>Z</given-names></string-name></person-group>. <chapter-title>Text style transfer with contrastive transfer pattern mining</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Rogers</surname> <given-names>A</given-names></string-name>, <string-name><surname>Boyd-Graber</surname> <given-names>J</given-names></string-name>, <string-name><surname>Okazaki</surname> <given-names>N</given-names></string-name></person-group>, editors. <source>Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>. <publisher-loc>Toronto, ON, Canada</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>; <year>2023</year>. p. <fpage>7914</fpage>&#x2013;<lpage>27</lpage>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>He</surname> <given-names>X</given-names></string-name></person-group>. <article-title>Parallel refinements for lexically constrained text generation with BART</article-title>. In: <conf-name>Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing</conf-name>. <publisher-loc>Online and Punta Cana, Dominican Republic</publisher-loc>; <year>2021</year>. p. <fpage>8653</fpage>&#x2013;<lpage>66</lpage>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Cui</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Che</surname> <given-names>W</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>T</given-names></string-name>, <string-name><surname>Qin</surname> <given-names>B</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>S</given-names></string-name>, <string-name><surname>Hu</surname> <given-names>G</given-names></string-name></person-group>. <chapter-title>Revisiting pre-trained models for chinese natural language processing</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Cohn</surname> <given-names>T</given-names></string-name>, <string-name><surname>He</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Y</given-names></string-name></person-group>, editors. <source>Findings of the Association for Computational Linguistics: EMNLP 2020</source>. <publisher-loc>Online</publisher-loc>; <year>2020</year>. p. <fpage>657</fpage>&#x2013;<lpage>68</lpage>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Ouyang</surname> <given-names>L</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Jiang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Almeida</surname> <given-names>D</given-names></string-name>, <string-name><surname>Wainwright</surname> <given-names>C</given-names></string-name>, <string-name><surname>Mishkin</surname> <given-names>P</given-names></string-name>, <etal>et al</etal></person-group>. <chapter-title>Training language models to follow instructions with human feedback</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Koyejo</surname> <given-names>S</given-names></string-name>, <string-name><surname>Mohamed</surname> <given-names>S</given-names></string-name>, <string-name><surname>Agarwal</surname> <given-names>A</given-names></string-name>, <string-name><surname>Belgrave</surname> <given-names>D</given-names></string-name>, <string-name><surname>Cho</surname> <given-names>K</given-names></string-name>, <string-name><surname>Oh</surname> <given-names>A</given-names></string-name></person-group>, editors. <source>Advances in neural information processing systems</source>. Vol. <volume>35</volume>. <publisher-loc>Newry, UK</publisher-loc>: <publisher-name>Curran Associates, Inc.</publisher-name>; <year>2022</year>. p. <fpage>27730</fpage>&#x2013;<lpage>44</lpage>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Zhou</surname> <given-names>W</given-names></string-name>, <string-name><surname>Jiang</surname> <given-names>YE</given-names></string-name>, <string-name><surname>Wilcox</surname> <given-names>E</given-names></string-name>, <string-name><surname>Cotterell</surname> <given-names>R</given-names></string-name>, <string-name><surname>Sachan</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Controlled text generation with natural language instructions</article-title>. In: <conf-name>Proceedings of the 40th International Conference on Machine Learning</conf-name>; <year>2023</year>. Vol. <volume>202</volume>, p. <fpage>42602</fpage>&#x2013;<lpage>13</lpage>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Qian</surname> <given-names>J</given-names></string-name>, <string-name><surname>Dong</surname> <given-names>L</given-names></string-name>, <string-name><surname>Shen</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Wei</surname> <given-names>F</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>W</given-names></string-name></person-group>. <article-title>Controllable natural language generation with contrastive prefixes</article-title>. In: <conf-name>Findings of the Association for Computational Linguistics: ACL 2022</conf-name>; <year>2022</year>; <publisher-loc>Dublin, Ireland</publisher-loc>. p. <fpage>2912</fpage>&#x2013;<lpage>24</lpage>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Gu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Feng</surname> <given-names>X</given-names></string-name>, <string-name><surname>Ma</surname> <given-names>S</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>L</given-names></string-name>, <string-name><surname>Gong</surname> <given-names>H</given-names></string-name>, <string-name><surname>Qin</surname> <given-names>B</given-names></string-name></person-group>. <article-title>A distributional lens for multi-aspect controllable text generation</article-title>. In: <conf-name>Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing</conf-name>; <year>2022</year>; <publisher-loc>Abu Dhabi, United Arab Emirates</publisher-loc>. p. <fpage>1023</fpage>&#x2013;<lpage>43</lpage>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Zou</surname> <given-names>X</given-names></string-name>, <string-name><surname>Yin</surname> <given-names>D</given-names></string-name>, <string-name><surname>Zhong</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>H</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Tang</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Controllable generation from pre-trained language models via inverse prompting</article-title>. In: <conf-name>Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery &#x0026; Data Mining: KDD &#x2019;21</conf-name>; <year>2021</year>; <publisher-loc>New York, NY, USA</publisher-loc>. p. <fpage>2450</fpage>&#x2013;<lpage>60</lpage>. doi:<pub-id pub-id-type="doi">10.1145/3447548.3467418</pub-id>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Pascual</surname> <given-names>D</given-names></string-name>, <string-name><surname>Egressy</surname> <given-names>B</given-names></string-name>, <string-name><surname>Meister</surname> <given-names>C</given-names></string-name>, <string-name><surname>Cotterell</surname> <given-names>R</given-names></string-name>, <string-name><surname>Wattenhofer</surname> <given-names>R</given-names></string-name></person-group>. <chapter-title>A plug-and-play method for controlled text generation</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Moens</surname> <given-names>MF</given-names></string-name>, <string-name><surname>Huang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Specia</surname> <given-names>L</given-names></string-name>, <string-name><surname>Wen-tau</surname> <given-names>Yih S</given-names></string-name></person-group>, editors. <source>Findings of the Association for Computational Linguistics: EMNLP 2021</source>. <publisher-loc>Punta Cana, Dominican Republic</publisher-loc>; <year>2021</year>. p. <fpage>3973</fpage>&#x2013;<lpage>97</lpage>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Yang</surname> <given-names>K</given-names></string-name>, <string-name><surname>Klein</surname> <given-names>D</given-names></string-name></person-group>. <chapter-title>FUDGE: controlled text generation with future discriminators</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Toutanova</surname> <given-names>K</given-names></string-name>, <string-name><surname>Rumshisky</surname> <given-names>A</given-names></string-name>, <string-name><surname>Zettlemoyer</surname> <given-names>L</given-names></string-name>, <string-name><surname>Hakkani-Tur</surname> <given-names>D</given-names></string-name>, <string-name><surname>Beltagy</surname> <given-names>I</given-names></string-name>, <string-name><surname>Bethard</surname> <given-names>S</given-names></string-name>, <etal>et al.</etal></person-group>, editors. <source>Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>. <publisher-loc>Online</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>; <year>2021</year>. p. <fpage>3511</fpage>&#x2013;<lpage>35</lpage>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Gu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Feng</surname> <given-names>X</given-names></string-name>, <string-name><surname>Ma</surname> <given-names>S</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Gong</surname> <given-names>H</given-names></string-name>, <string-name><surname>Qin</surname> <given-names>B</given-names></string-name></person-group>. <chapter-title>Improving controllable text generation with position-aware weighted decoding</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Muresan</surname> <given-names>S</given-names></string-name>, <string-name><surname>Nakov</surname> <given-names>P</given-names></string-name>, <string-name><surname>Villavicencio</surname> <given-names>A</given-names></string-name></person-group>, editors. <source>Findings of the Association for Computational Linguistics: ACL 2022</source>. <publisher-loc>Dublin, Ireland</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>; <year>2022</year>. p. <fpage>3449</fpage>&#x2013;<lpage>67</lpage>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Chen</surname> <given-names>J</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Jia</surname> <given-names>C</given-names></string-name>, <string-name><surname>Zheng</surname> <given-names>H</given-names></string-name>, <string-name><surname>Huang</surname> <given-names>G</given-names></string-name></person-group>. <article-title>Customizable text generation via conditional text generative adversarial network</article-title>. <source>Neurocomputing</source>. <year>2020</year>;<volume>416</volume>(<issue>1</issue>):<fpage>125</fpage>&#x2013;<lpage>35</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.neucom.2018.12.092</pub-id>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Welleck</surname> <given-names>S</given-names></string-name>, <string-name><surname>Kulikov</surname> <given-names>I</given-names></string-name>, <string-name><surname>Roller</surname> <given-names>S</given-names></string-name>, <string-name><surname>Dinan</surname> <given-names>E</given-names></string-name>, <string-name><surname>Cho</surname> <given-names>K</given-names></string-name>, <string-name><surname>Weston</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Neural text generation with unlikelihood training</article-title>. In: <conf-name>International Conference on Learning Representations</conf-name>; <year>2020</year>. p. <fpage>1</fpage>&#x2013;<lpage>18</lpage>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Su</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Lan</surname> <given-names>T</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Yogatama</surname> <given-names>D</given-names></string-name>, <string-name><surname>Kong</surname> <given-names>L</given-names></string-name>, <string-name><surname>Collier</surname> <given-names>N</given-names></string-name></person-group>. <chapter-title>A contrastive framework for neural text generation</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Koyejo</surname> <given-names>S</given-names></string-name>, <string-name><surname>Mohamed</surname> <given-names>S</given-names></string-name>, <string-name><surname>Agarwal</surname> <given-names>A</given-names></string-name>, <string-name><surname>Belgrave</surname> <given-names>D</given-names></string-name>, <string-name><surname>Cho</surname> <given-names>K</given-names></string-name>, <string-name><surname>Oh</surname> <given-names>A</given-names></string-name></person-group>, editors. <source>Advances in neural information processing systems</source>. Vol. <volume>35</volume>. <publisher-loc>Newry, UK</publisher-loc>: <publisher-name>Curran Associates, Inc.</publisher-name>; <year>2022</year>. p. <fpage>21548</fpage>&#x2013;<lpage>61</lpage>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Shu</surname> <given-names>L</given-names></string-name>, <string-name><surname>Papangelis</surname> <given-names>A</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>YC</given-names></string-name>, <string-name><surname>Tur</surname> <given-names>G</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>H</given-names></string-name>, <string-name><surname>Feizollahi</surname> <given-names>Z</given-names></string-name>, <etal>et al.</etal></person-group>. <chapter-title>Controllable text generation with focused variation</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Cohn</surname> <given-names>T</given-names></string-name>, <string-name><surname>He</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Y</given-names></string-name></person-group>, editors. <source>Findings of the Association for Computational Linguistics: EMNLP 2020</source>. <publisher-loc>Online</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>; <year>2020</year>. p. <fpage>3805</fpage>&#x2013;<lpage>17</lpage>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Chen</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Gr&#x00F6;ner</surname> <given-names>H</given-names></string-name>, <string-name><surname>Zarrie&#x00DF;</surname> <given-names>S</given-names></string-name>, <string-name><surname>Eger</surname> <given-names>S</given-names></string-name></person-group>. <chapter-title>Evaluating diversity in automatic poetry generation</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Al-Onaizan</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Bansal</surname> <given-names>M</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>YN</given-names></string-name></person-group>, editors. <source>Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing</source>. <publisher-loc>Miami, FL, USA</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>; <year>2024</year>. p. <fpage>19671</fpage>&#x2013;<lpage>92</lpage>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>X</given-names></string-name>, <string-name><surname>Thickstun</surname> <given-names>J</given-names></string-name>, <string-name><surname>Gulrajani</surname> <given-names>I</given-names></string-name>, <string-name><surname>Liang</surname> <given-names>PS</given-names></string-name>, <string-name><surname>Hashimoto</surname> <given-names>TB</given-names></string-name></person-group>. <chapter-title>Diffusion-LM improves controllable text generation</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Koyejo</surname> <given-names>S</given-names></string-name>, <string-name><surname>Mohamed</surname> <given-names>S</given-names></string-name>, <string-name><surname>Agarwal</surname> <given-names>A</given-names></string-name>, <string-name><surname>Belgrave</surname> <given-names>D</given-names></string-name>, <string-name><surname>Cho</surname> <given-names>K</given-names></string-name>, <string-name><surname>Oh</surname> <given-names>A</given-names></string-name></person-group>, editors. <source>Advances in neural information processing systems</source>. Vol. <volume>35</volume>. <publisher-loc>Newry, UK</publisher-loc>: <publisher-name>Curran Associates, Inc.</publisher-name>; <year>2022</year>. p. <fpage>4328</fpage>&#x2013;<lpage>43</lpage>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Lovelace</surname> <given-names>J</given-names></string-name>, <string-name><surname>Kishore</surname> <given-names>V</given-names></string-name>, <string-name><surname>Wan</surname> <given-names>C</given-names></string-name>, <string-name><surname>Shekhtman</surname> <given-names>E</given-names></string-name>, <string-name><surname>Weinberger</surname> <given-names>KQ</given-names></string-name></person-group>. <chapter-title>Latent diffusion for language generation</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Oh</surname> <given-names>A</given-names></string-name>, <string-name><surname>Naumann</surname> <given-names>T</given-names></string-name>, <string-name><surname>Globerson</surname> <given-names>A</given-names></string-name>, <string-name><surname>Saenko</surname> <given-names>K</given-names></string-name>, <string-name><surname>Hardt</surname> <given-names>M</given-names></string-name>, <string-name><surname>Levine</surname> <given-names>S</given-names></string-name></person-group>, editors. <source>Advances in neural information processing systems</source>. Vol. <volume>36</volume>. <publisher-loc>Newry, UK</publisher-loc>: <publisher-name>Curran Associates, Inc.</publisher-name>; <year>2023</year>. p. <fpage>56998</fpage>&#x2013;<lpage>7025</lpage>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Liang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Tang</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Li</surname> <given-names>J</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>M</given-names></string-name></person-group>. <chapter-title>Open-ended long text generation via masked language modeling</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Rogers</surname> <given-names>A</given-names></string-name>, <string-name><surname>Boyd-Graber</surname> <given-names>J</given-names></string-name>, <string-name><surname>Okazaki</surname> <given-names>N</given-names></string-name></person-group>, editors. <source>Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>. <publisher-loc>Toronto, ON, Canada</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>; <year>2023</year>. p. <fpage>223</fpage>&#x2013;<lpage>41</lpage>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Li</surname> <given-names>P</given-names></string-name>, <string-name><surname>Bi</surname> <given-names>W</given-names></string-name>, <string-name><surname>Ren</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Lai</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Kong</surname> <given-names>L</given-names></string-name></person-group>. <chapter-title>Event transition planning for open-ended text generation</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Muresan</surname> <given-names>S</given-names></string-name>, <string-name><surname>Nakov</surname> <given-names>P</given-names></string-name>, <string-name><surname>Villavicencio</surname> <given-names>A</given-names></string-name></person-group>, editors. <source>Findings of the Association for Computational Linguistics: ACL 2022</source>. <publisher-loc>Dublin, Ireland</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>; <year>2022 May 22&#x2013;27</year>. p. <fpage>3412</fpage>&#x2013;<lpage>26</lpage>. doi:<pub-id pub-id-type="doi">10.18653/v1/2022.findings-acl.269</pub-id>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Mu</surname> <given-names>F</given-names></string-name>, <string-name><surname>Li</surname> <given-names>W</given-names></string-name></person-group>. <article-title>Enhancing text generation via multi-level knowledge aware reasoning</article-title>. In: <conf-name>International Joint Conference on Artificial Intelligence</conf-name>; <year>2022</year>; <publisher-loc>Vienna, Austria</publisher-loc>. p. <fpage>4310</fpage>&#x2013;<lpage>6</lpage>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Hu</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Chan</surname> <given-names>HP</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Xiao</surname> <given-names>X</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>H</given-names></string-name>, <string-name><surname>Huang</surname> <given-names>L</given-names></string-name></person-group>. <chapter-title>PLANET: dynamic content planning in autoregressive transformers for long-form text generation</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Muresan</surname> <given-names>S</given-names></string-name>, <string-name><surname>Nakov</surname> <given-names>P</given-names></string-name>, <string-name><surname>Villavicencio</surname> <given-names>A</given-names></string-name></person-group>, editors. <source>Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>. <publisher-loc>Dublin, Ireland</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>; <year>2022</year>. p. <fpage>2288</fpage>&#x2013;<lpage>305</lpage>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Krause</surname> <given-names>B</given-names></string-name>, <string-name><surname>Gotmare</surname> <given-names>AD</given-names></string-name>, <string-name><surname>McCann</surname> <given-names>B</given-names></string-name>, <string-name><surname>Keskar</surname> <given-names>NS</given-names></string-name>, <string-name><surname>Joty</surname> <given-names>S</given-names></string-name>, <string-name><surname>Socher</surname> <given-names>R</given-names></string-name>, <etal>et al.</etal></person-group>. <chapter-title>GeDi: generative discriminator guided sequence generation</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Moens</surname> <given-names>MF</given-names></string-name>, <string-name><surname>Huang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Specia</surname> <given-names>L</given-names></string-name>, <string-name><surname>Yih</surname> <given-names>SW</given-names></string-name></person-group>, editors. <source>Findings of the Association for Computational Linguistics: EMNLP 2021</source>. <publisher-loc>Punta Cana, Dominican Republic</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>; <year>2021</year>. p. <fpage>4929</fpage>&#x2013;<lpage>52</lpage>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Guan</surname> <given-names>J</given-names></string-name>, <string-name><surname>Huang</surname> <given-names>F</given-names></string-name>, <string-name><surname>Zhao</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Zhu</surname> <given-names>X</given-names></string-name>, <string-name><surname>Huang</surname> <given-names>M</given-names></string-name></person-group>. <article-title>A knowledge-enhanced pretraining model for commonsense story generation</article-title>. <source>Trans Assoc Comput Linguist</source>. <year>2020</year>;<volume>8</volume>(<issue>5</issue>):<fpage>93</fpage>&#x2013;<lpage>108</lpage>. doi:<pub-id pub-id-type="doi">10.1162/tacl_a_00302</pub-id>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Guan</surname> <given-names>J</given-names></string-name>, <string-name><surname>Mao</surname> <given-names>X</given-names></string-name>, <string-name><surname>Fan</surname> <given-names>C</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Ding</surname> <given-names>W</given-names></string-name>, <string-name><surname>Huang</surname> <given-names>M</given-names></string-name></person-group>. <chapter-title>Long text generation by modeling sentence-level and discourse-level coherence</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Zong</surname> <given-names>C</given-names></string-name>, <string-name><surname>Xia</surname> <given-names>F</given-names></string-name>, <string-name><surname>Li</surname> <given-names>W</given-names></string-name>, <string-name><surname>Navigli</surname> <given-names>R</given-names></string-name></person-group>, editors. <source>Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021 (Volume 1: Long Papers)</source>. <publisher-loc>Virtual Event</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>; <year>2021 Aug 1&#x2013;6</year>. p. <fpage>6379</fpage>&#x2013;<lpage>93</lpage>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Xu</surname> <given-names>P</given-names></string-name>, <string-name><surname>Patwary</surname> <given-names>M</given-names></string-name>, <string-name><surname>Shoeybi</surname> <given-names>M</given-names></string-name>, <string-name><surname>Puri</surname> <given-names>R</given-names></string-name>, <string-name><surname>Fung</surname> <given-names>P</given-names></string-name>, <string-name><surname>Anandkumar</surname> <given-names>A</given-names></string-name>, <etal>et al.</etal></person-group>. <article-title>Controllable story generation with external knowledge using large-scale language models</article-title>. In: <conf-name>Conference on Empirical Methods in Natural Language Processing 2020</conf-name>. <publisher-loc>Online</publisher-loc>; <year>2020</year>. p. <fpage>2831</fpage>&#x2013;<lpage>45</lpage>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Lee</surname> <given-names>J</given-names></string-name>, <string-name><surname>Stevens</surname> <given-names>N</given-names></string-name>, <string-name><surname>Han</surname> <given-names>SC</given-names></string-name>, <string-name><surname>Song</surname> <given-names>M</given-names></string-name></person-group>. <article-title>A survey of large language models in finance (FinLLMs)</article-title>. <source>Neural Comput Appl</source>. <year>2024</year>;<volume>33</volume>(<issue>240</issue>):<fpage>1877</fpage>. doi:<pub-id pub-id-type="doi">10.1007/s00521-024-10495-6</pub-id>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Liang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Song</surname> <given-names>S</given-names></string-name>, <string-name><surname>Niu</surname> <given-names>S</given-names></string-name>, <string-name><surname>Li</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Xiong</surname> <given-names>F</given-names></string-name>, <string-name><surname>Tang</surname> <given-names>B</given-names></string-name>, <etal>et al.</etal></person-group>. <chapter-title>UHGEval: benchmarking the hallucination of Chinese large language models via unconstrained generation</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Ku</surname> <given-names>LW</given-names></string-name>, <string-name><surname>Martins</surname> <given-names>A</given-names></string-name>, <string-name><surname>Srikumar</surname> <given-names>V</given-names></string-name></person-group>, editors. <source>Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>. <publisher-loc>Bangkok, Thailand</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>; <year>2024</year>. p. <fpage>5266</fpage>&#x2013;<lpage>93</lpage>.</mixed-citation></ref>
<ref id="ref-33"><label>[33]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Bartsch</surname> <given-names>H</given-names></string-name>, <string-name><surname>Jorgensen</surname> <given-names>O</given-names></string-name>, <string-name><surname>Rosati</surname> <given-names>D</given-names></string-name>, <string-name><surname>Hoelscher-Obermaier</surname> <given-names>J</given-names></string-name>, <string-name><surname>Pfau</surname> <given-names>J</given-names></string-name></person-group>. <chapter-title>Self-consistency of large language models under ambiguity</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Belinkov</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Hao</surname> <given-names>S</given-names></string-name>, <string-name><surname>Jumelet</surname> <given-names>J</given-names></string-name>, <string-name><surname>Kim</surname> <given-names>N</given-names></string-name>, <string-name><surname>McCarthy</surname> <given-names>A</given-names></string-name>, <string-name><surname>Mohebbi</surname> <given-names>H</given-names></string-name></person-group>, editors. <source>Proceedings of the 6th BlackboxNLP Workshop: Analyzing and Interpreting Neural Networks for NLP</source>. <publisher-loc>Singapore</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>; <year>2023</year>. p. <fpage>89</fpage>&#x2013;<lpage>105</lpage>.</mixed-citation></ref>
<ref id="ref-34"><label>[34]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Havaldar</surname> <given-names>S</given-names></string-name>, <string-name><surname>Singhal</surname> <given-names>B</given-names></string-name>, <string-name><surname>Rai</surname> <given-names>S</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>L</given-names></string-name>, <string-name><surname>Guntuku</surname> <given-names>SC</given-names></string-name>, <string-name><surname>Ungar</surname> <given-names>L</given-names></string-name></person-group>. <chapter-title>Multilingual language models are not multicultural: a case study in emotion</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Barnes</surname> <given-names>J</given-names></string-name>, <string-name><surname>De Clercq</surname> <given-names>O</given-names></string-name>, <string-name><surname>Klinger</surname> <given-names>R</given-names></string-name></person-group>, editors. <source>Proceedings of the 13th Workshop on Computational Approaches to Subjectivity, Sentiment, &#x0026; Social Media Analysis</source>. <publisher-loc>Toronto, ON, Canada</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>; <year>2023</year>. p. <fpage>202</fpage>&#x2013;<lpage>14</lpage>.</mixed-citation></ref>
<ref id="ref-35"><label>[35]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Liu</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>K</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>T</given-names></string-name>, <string-name><surname>Xie</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Yu</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Ananiadou</surname> <given-names>S</given-names></string-name></person-group>. <article-title>EmoLLMs: a series of emotional large language models and annotation tools for comprehensive affective analysis</article-title>. In: <conf-name>KDD&#x2019;24: Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining</conf-name>; <year>2024</year>; <publisher-loc>Barcelona, Spain</publisher-loc>. p. <fpage>5487</fpage>&#x2013;<lpage>96</lpage>.</mixed-citation></ref>
<ref id="ref-36"><label>[36]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Hua</surname> <given-names>X</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>L</given-names></string-name></person-group>. <article-title>PAIR: planning and iterative refinement in pre-trained transformers for long text generation</article-title>. In: <conf-name>Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP)</conf-name>. <publisher-loc>Online</publisher-loc>; <year>2020</year>. p. <fpage>781</fpage>&#x2013;<lpage>93</lpage>.</mixed-citation></ref>
<ref id="ref-37"><label>[37]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Kitaev</surname> <given-names>N</given-names></string-name>, <string-name><surname>Kaiser</surname> <given-names>L</given-names></string-name>, <string-name><surname>Levskaya</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Reformer: the efficient transformer</article-title>. In: <conf-name>ICLR 2020: The Eighth International Conference on Learning Representations</conf-name>. <publisher-loc>Online</publisher-loc>; <year>2020</year>. p. <fpage>1</fpage>&#x2013;<lpage>12</lpage>.</mixed-citation></ref>
<ref id="ref-38"><label>[38]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Qin</surname> <given-names>G</given-names></string-name>, <string-name><surname>Feng</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Van Durme</surname> <given-names>B</given-names></string-name></person-group>. <chapter-title>The NLP task effectiveness of long-range transformers</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Vlachos</surname> <given-names>A</given-names></string-name>, <string-name><surname>Augenstein</surname> <given-names>I</given-names></string-name></person-group>, editors. <source>Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics</source>. <publisher-loc>Dubrovnik, Croatia</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>; <year>2023</year>. p. <fpage>3774</fpage>&#x2013;<lpage>90</lpage>.</mixed-citation></ref>
<ref id="ref-39"><label>[39]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Carvalho</surname> <given-names>D</given-names></string-name>, <string-name><surname>Valentino</surname> <given-names>M</given-names></string-name>, <string-name><surname>Pratt-Hartmann</surname> <given-names>I</given-names></string-name>, <string-name><surname>Freitas</surname> <given-names>A</given-names></string-name></person-group>. <chapter-title>Improving semantic control in discrete latent spaces with transformer quantized variational autoencoders</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Graham</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Purver</surname> <given-names>M</given-names></string-name></person-group>, editors. <source>Findings of the Association for Computational Linguistics: EACL 2024</source>. <publisher-loc>St. Julian&#x2019;s, Malta</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>; <year>2024</year>. p. <fpage>1434</fpage>&#x2013;<lpage>50</lpage>.</mixed-citation></ref>
<ref id="ref-40"><label>[40]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Yang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Peynetti</surname> <given-names>E</given-names></string-name>, <string-name><surname>Meerman</surname> <given-names>V</given-names></string-name>, <string-name><surname>Tanner</surname> <given-names>C</given-names></string-name></person-group>. <chapter-title>What GPT knows about who is who</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Tafreshi</surname> <given-names>S</given-names></string-name>, <string-name><surname>Sedoc</surname> <given-names>J</given-names></string-name>, <string-name><surname>Rogers</surname> <given-names>A</given-names></string-name>, <string-name><surname>Drozd</surname> <given-names>A</given-names></string-name>, <string-name><surname>Rumshisky</surname> <given-names>A</given-names></string-name>, <string-name><surname>Akula</surname> <given-names>A</given-names></string-name></person-group>, editors. <source>Proceedings of the Third Workshop on Insights from Negative Results in NLP</source>. <publisher-loc>Dublin, Ireland</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>; <year>2022</year>. p. <fpage>75</fpage>&#x2013;<lpage>81</lpage>.</mixed-citation></ref>
<ref id="ref-41"><label>[41]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Lei</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Huang</surname> <given-names>R</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>L</given-names></string-name>, <string-name><surname>Beauchamp</surname> <given-names>N</given-names></string-name></person-group>. <article-title>Sentence-level media bias analysis informed by discourse structures</article-title>. In: <conf-name>Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing</conf-name>; <year>2022</year>; <publisher-loc>Abu Dhabi, United Arab Emirates</publisher-loc>. p. <fpage>10040</fpage>&#x2013;<lpage>50</lpage>.</mixed-citation></ref>
<ref id="ref-42"><label>[42]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Cai</surname> <given-names>J</given-names></string-name>, <string-name><surname>Ahmed</surname> <given-names>SR</given-names></string-name>, <string-name><surname>Bonn</surname> <given-names>J</given-names></string-name>, <string-name><surname>Wright-Bettner</surname> <given-names>K</given-names></string-name>, <string-name><surname>Palmer</surname> <given-names>M</given-names></string-name>, <string-name><surname>Martin</surname> <given-names>JH</given-names></string-name></person-group>. <chapter-title>CAMRA: copilot for AMR annotation</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Feng</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Lefever</surname> <given-names>E</given-names></string-name></person-group>, editors. <source>Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing: System Demonstrations</source>. <publisher-loc>Singapore</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>; <year>2023</year>. p. <fpage>381</fpage>&#x2013;<lpage>8</lpage>.</mixed-citation></ref>
<ref id="ref-43"><label>[43]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Wang</surname> <given-names>F</given-names></string-name>, <string-name><surname>Li</surname> <given-names>F</given-names></string-name>, <string-name><surname>Fei</surname> <given-names>H</given-names></string-name>, <string-name><surname>Li</surname> <given-names>J</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>S</given-names></string-name>, <string-name><surname>Su</surname> <given-names>F</given-names></string-name>, <etal>et al.</etal></person-group> <chapter-title>Entity-centered cross-document relation extraction</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Goldberg</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Kozareva</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>Y</given-names></string-name></person-group>, editors. <source>Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing</source>. <publisher-loc>Abu Dhabi, United Arab Emirates</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>; <year>2022</year>. p. <fpage>9871</fpage>&#x2013;<lpage>81</lpage>.</mixed-citation></ref>
<ref id="ref-44"><label>[44]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Sun</surname> <given-names>X</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Meng</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Li</surname> <given-names>J</given-names></string-name>, <string-name><surname>Fan</surname> <given-names>C</given-names></string-name></person-group>. <chapter-title>Summarize, outline, and elaborate: long-text generation via hierarchical supervision from extractive summaries</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Calzolari</surname> <given-names>N</given-names></string-name>, <string-name><surname>Huang</surname> <given-names>CR</given-names></string-name>, <string-name><surname>Kim</surname> <given-names>H</given-names></string-name>, <string-name><surname>Pustejovsky</surname> <given-names>J</given-names></string-name>, <string-name><surname>Wanner</surname> <given-names>L</given-names></string-name>, <string-name><surname>Choi</surname> <given-names>KS</given-names></string-name>, <etal>et al.</etal></person-group>, editors. <source>Proceedings of the 29th International Conference on Computational Linguistics</source>. <publisher-loc>Gyeongju, Republic of Korea</publisher-loc>: <publisher-name>International Committee on Computational Linguistics</publisher-name>; <year>2022</year>. p. <fpage>6392</fpage>&#x2013;<lpage>402</lpage>.</mixed-citation></ref>
<ref id="ref-45"><label>[45]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Cavalin</surname> <given-names>PR</given-names></string-name>, <string-name><surname>Vasconcelos</surname> <given-names>M</given-names></string-name>, <string-name><surname>Grave</surname> <given-names>M</given-names></string-name>, <string-name><surname>Pinhanez</surname> <given-names>CS</given-names></string-name>, <string-name><surname>Ribeiro</surname> <given-names>VHA</given-names></string-name></person-group>. <article-title>From disjoint sets to parallel data to train seq2seq models for sentiment transfer</article-title>. In: <conf-name>Findings of the Association for Computational Linguistics: EMNLP 2020</conf-name>. <publisher-loc>Online</publisher-loc>; <year>2020</year>. p. <fpage>689</fpage>&#x2013;<lpage>98</lpage>.</mixed-citation></ref>
<ref id="ref-46"><label>[46]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Wilmot</surname> <given-names>D</given-names></string-name>, <string-name><surname>Keller</surname> <given-names>F</given-names></string-name></person-group>. <chapter-title>Modelling suspense in short stories as uncertainty reduction over neural representation</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Jurafsky</surname> <given-names>D</given-names></string-name>, <string-name><surname>Chai</surname> <given-names>J</given-names></string-name>, <string-name><surname>Schluter</surname> <given-names>N</given-names></string-name>, <string-name><surname>Tetreault</surname> <given-names>J</given-names></string-name></person-group>, editors. <source>Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>. <publisher-loc>Online</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>; <year>2020</year>. p. <fpage>1763</fpage>&#x2013;<lpage>88</lpage>.</mixed-citation></ref>
<ref id="ref-47"><label>[47]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Yang</surname> <given-names>E</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>M</given-names></string-name>, <string-name><surname>Xiong</surname> <given-names>D</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>J</given-names></string-name></person-group>. <chapter-title>Long text generation with topic-aware discrete latent variable model</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Goldberg</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Kozareva</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>Y</given-names></string-name></person-group>, editors. <source>Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing</source>. <publisher-loc>Abu Dhabi, United Arab Emirates</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>; <year>2022</year>. p. <fpage>8100</fpage>&#x2013;<lpage>7</lpage>.</mixed-citation></ref>
<ref id="ref-48"><label>[48]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Song</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Bi</surname> <given-names>W</given-names></string-name>, <string-name><surname>Yan</surname> <given-names>R</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>M</given-names></string-name></person-group>. <chapter-title>Learning to customize model structures for few-shot dialogue generation tasks</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Jurafsky</surname> <given-names>D</given-names></string-name>, <string-name><surname>Chai</surname> <given-names>J</given-names></string-name>, <string-name><surname>Schluter</surname> <given-names>N</given-names></string-name>, <string-name><surname>Tetreault</surname> <given-names>J</given-names></string-name></person-group>, editors. <source>Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics</source>. <publisher-loc>Online</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>; <year>2020</year>. p. <fpage>5832</fpage>&#x2013;<lpage>41</lpage>.</mixed-citation></ref>
<ref id="ref-49"><label>[49]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Swamy</surname> <given-names>S</given-names></string-name>, <string-name><surname>Tabari</surname> <given-names>N</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>C</given-names></string-name>, <string-name><surname>Gangadharaiah</surname> <given-names>R</given-names></string-name></person-group>. <chapter-title>Contextual dynamic prompting for response generation in task-oriented dialog systems</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Vlachos</surname> <given-names>A</given-names></string-name>, <string-name><surname>Augenstein</surname> <given-names>I</given-names></string-name></person-group>, editors. <source>Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics</source>. <publisher-loc>Dubrovnik, Croatia</publisher-loc>: <publisher-name>Association for Computational Linguistics</publisher-name>; <year>2023</year>. p. <fpage>3102</fpage>&#x2013;<lpage>11</lpage>.</mixed-citation></ref>
</ref-list>
</back></article>