<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">47811</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2024.047811</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Graph Convolutional Networks Embedding Textual Structure Information for Relation Extraction</article-title>
<alt-title alt-title-type="left-running-head">Graph Convolutional Networks Embedding Textual Structure Information for Relation Extraction</alt-title>
<alt-title alt-title-type="right-running-head">Graph Convolutional Networks Embedding Textual Structure Information for Relation Extraction</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Wei</surname><given-names>Chuyuan</given-names></name><email>weichyuan@bucea.edu.cn</email></contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Li</surname><given-names>Jinzhe</given-names></name></contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western"><surname>Wang</surname><given-names>Zhiyuan</given-names></name></contrib>
<contrib id="author-4" contrib-type="author">
<name name-style="western"><surname>Wan</surname><given-names>Shanshan</given-names></name></contrib>
<contrib id="author-5" contrib-type="author">
<name name-style="western"><surname>Guo</surname><given-names>Maozu</given-names></name></contrib>
<aff><institution>School of Electrical and Information Engineering, Beijing University of Civil Engineering and Architecture</institution>, <addr-line>Beijing, 102616</addr-line>, <country>China</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Chuyuan Wei. Email: <email>weichyuan@bucea.edu.cn</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2024</year></pub-date>
<pub-date date-type="pub" publication-format="electronic"><day>15</day>
<month>5</month>
<year>2024</year></pub-date>
<volume>79</volume>
<issue>2</issue>
<fpage>3299</fpage>
<lpage>3314</lpage>
<history>
<date date-type="received">
<day>18</day>
<month>11</month>
<year>2023</year>
</date>
<date date-type="accepted">
<day>19</day>
<month>2</month>
<year>2024</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2024 Wei et al.</copyright-statement>
<copyright-year>2024</copyright-year>
<copyright-holder>Wei et al.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_47811.pdf"></self-uri>
<abstract>
<p>Deep neural network-based relation extraction research has made significant progress in recent years, providing data support for many downstream natural language processing tasks such as knowledge graph construction, sentiment analysis and question-answering systems. However, previous studies ignored much unused structural information in sentences that could enhance the performance of the relation extraction task. Moreover, most existing dependency-based models use self-attention to distinguish the importance of context, which can hardly handle multiple types of structure information. To efficiently leverage multiple types of structure information, this paper proposes a dynamic structure attention mechanism model based on <bold>textual structure information</bold>, which deeply integrates word embeddings, named entity recognition labels, part-of-speech tags, the dependency tree and dependency types into a graph convolutional network. Specifically, our model extracts text features of different structures from the input sentence. <bold>Textual Structure information Graph Convolutional Networks</bold> employs the dynamic structure attention mechanism to learn multi-structure attention, effectively distinguishing important contextual features in the various kinds of structural information. In addition, multi-structure weights are carefully designed as a merging mechanism over the different structure attentions to dynamically adjust the final attention. This paper combines these features and trains a graph convolutional network for relation extraction. We experiment on supervised relation extraction datasets including SemEval 2010 Task 8, TACRED, TACREV, and Re-TACRED, and the results significantly outperform those of previous models.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Relation extraction</kwd>
<kwd>graph convolutional neural networks</kwd>
<kwd>dependency tree</kwd>
<kwd>dynamic structure attention</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>Relation Extraction (RE) aims to identify and extract the relation between two given entities in the input sentence. This task is vital in information extraction and has significant implications for various downstream natural language processing (NLP) applications, including sentiment analysis [<xref ref-type="bibr" rid="ref-1">1</xref>,<xref ref-type="bibr" rid="ref-2">2</xref>], question-answering systems [<xref ref-type="bibr" rid="ref-3">3</xref>] and text summarization [<xref ref-type="bibr" rid="ref-4">4</xref>]. As a critical and challenging task, how to improve the performance of RE has attracted considerable attention from researchers.</p>
<p>It is very important to fully exploit the different types of features in text to enhance the performance of the RE task [<xref ref-type="bibr" rid="ref-5">5</xref>&#x2013;<xref ref-type="bibr" rid="ref-7">7</xref>]. To leverage the rich feature information in word sequences, many RE models [<xref ref-type="bibr" rid="ref-8">8</xref>&#x2013;<xref ref-type="bibr" rid="ref-13">13</xref>] have been proposed for extracting relations between entities, including recurrent neural network (RNN)-based approaches, long short-term memory (LSTM)-based models and transformer-based architectures. However, such models struggle to capture long-distance connections between words when modeling the linear sequence of text. Many studies utilize additional features and knowledge to deal with this problem. Among these options, dependency parses have been widely used and proven to be effective [<xref ref-type="bibr" rid="ref-14">14</xref>&#x2013;<xref ref-type="bibr" rid="ref-17">17</xref>]. Dependency trees can provide long-distance word-word relations, which are essential supplementary structures for existing RE models. To effectively utilize dependency trees, most methods [<xref ref-type="bibr" rid="ref-15">15</xref>,<xref ref-type="bibr" rid="ref-16">16</xref>,<xref ref-type="bibr" rid="ref-18">18</xref>&#x2013;<xref ref-type="bibr" rid="ref-20">20</xref>] employ graph convolutional networks (GCNs) to model dependencies and extract relations between entities. Nevertheless, excessive reliance on dependency information could introduce confusion into RE [<xref ref-type="bibr" rid="ref-21">21</xref>&#x2013;<xref ref-type="bibr" rid="ref-26">26</xref>]. Recently, Zhang et al. [<xref ref-type="bibr" rid="ref-15">15</xref>] combined a pruning strategy with a GCN to model the dependency structure and perform RE. Tian et al. [<xref ref-type="bibr" rid="ref-20">20</xref>] proposed a new model that distinguishes important contextual information through dependency attention. These methods focus on utilizing the graph-structure information within word sequences but do not leverage other important inner textual features, such as part-of-speech (POS) labels and named entity recognition (NER) labels. This omission may limit the performance of RE models.</p>
<p>Despite their effectiveness, existing methods have the following drawbacks:</p>
<p>1) Most previous studies [<xref ref-type="bibr" rid="ref-27">27</xref>&#x2013;<xref ref-type="bibr" rid="ref-29">29</xref>] could not simultaneously utilize sequence-structure information and graph-structure information in the input text to extract the relation between entities. Introducing certain types of sequence information into the model may help mitigate the effects of dependency noise: NER tags can provide entity features and build constrained relations between words, POS tags can determine the function and features of words, and dependency trees can provide long-distance relations between words.</p>
<p>2) The attention mechanisms in traditional research make it difficult to learn important information from multi-graph structures. Moreover, pruning strategies may themselves introduce new noise into the dependency tree. Because dependency trees are automatically extracted by NLP toolkits, it is difficult to distinguish the noise when directly using them for modeling. Previous studies [<xref ref-type="bibr" rid="ref-7">7</xref>,<xref ref-type="bibr" rid="ref-15">15</xref>] have consistently required pruning strategies before utilizing dependency information for modeling. While some studies [<xref ref-type="bibr" rid="ref-20">20</xref>] employ self-attention mechanisms to distinguish dependency tree noise, they often focus on specific types of information, which makes it challenging to discern noise in various dimensions.</p>
<p>To alleviate the impact of dependency tree noise on RE and effectively leverage inner textual features, we propose <bold>Textual Structure information Graph Convolutional Networks</bold> (<bold>TS-GCN</bold>). The model employs dynamic structure attention to learn contextual feature weights from multiple types of information, filling the gap left by previous methods that did not simultaneously leverage both sequence information (such as POS types and NER types) and graph information (such as dependency trees and dependency types). We collectively refer to sequence information and graph information as &#x2018;<bold>Textual Structure Information</bold>&#x2019;. In addition, when there is noise in some structural information, the dynamic structure attention mechanism alleviates the interference by adjusting the contextual attention weights for the different structural information. Specifically, we first utilize the Stanford CoreNLP Toolkit (SCT) to extract textual structure information from the input, then build various graphs based on the dependency tree to represent the different textual structure information. Next, TS-GCN dynamically calculates the weights between words connected by dependency relations based on the multiple graphs of textual structure information, and finally utilizes the dynamic weights to predict relations between entities. Besides, TS-GCN dynamically distributes the weights among the different graph structures based on their information features, a crucial aspect often overlooked in previous studies, especially those employing attention mechanisms [<xref ref-type="bibr" rid="ref-18">18</xref>,<xref ref-type="bibr" rid="ref-20">20</xref>]. Experimental results on four English benchmark datasets&#x2014;TACRED, TACREV, Re-TACRED, and SemEval 2010 Task 8&#x2014;demonstrate the effectiveness of our RE approach using TS-GCN equipped with the dynamic structure attention mechanism. State-of-the-art performance is observed across all datasets.</p>
<p>The contribution of this paper can be summarized as follows:</p>
<p>1) We propose TS-GCN, a model based on textual structure information. It can effectively model both the sequential and graphical information within a sentence to extract entity relations.</p>
<p>2) We propose a dynamic structure attention mechanism aimed at mitigating the impact of dependency tree noise on relation extraction. This mechanism independently assigns weights to the feature connections within the various text structure graphs, then dynamically adjusts the contextual attention based on these individual connection weights, thereby alleviating the impact of structural noise (such as dependency tree noise) on relation extraction.</p>
<p>3) We design a relation modeling method based on multiple sources of structure information. By integrating sequence structure into the graph convolutional network, we create a multi-layered graph structure within the sentence, leading to a significant improvement in model performance.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Related Work</title>
<p>Early RE methods [<xref ref-type="bibr" rid="ref-30">30</xref>&#x2013;<xref ref-type="bibr" rid="ref-33">33</xref>] typically relied on rule-based techniques or statistical mechanisms. These approaches heavily depended on the high-quality design of manually crafted features, and the effectiveness of the models was significantly influenced by the quality of these handcrafted features.</p>
<p>With the development of deep learning technology, neural network methods [<xref ref-type="bibr" rid="ref-34">34</xref>&#x2013;<xref ref-type="bibr" rid="ref-38">38</xref>] excel at extracting semantic features embedded in text and have found widespread application in RE tasks. Current RE models can be broadly categorized into two main types: Sequence-based and graph-based.</p>
<p>Sequence-based models [<xref ref-type="bibr" rid="ref-13">13</xref>,<xref ref-type="bibr" rid="ref-34">34</xref>], including CNNs, RNNs, and Transformers, employ neural networks to encode contextual information and capture latent features from word sequences. DNN [<xref ref-type="bibr" rid="ref-5">5</xref>] is recognized as one of the pioneering models that first introduced the use of CNNs for relation extraction, employing a convolutional method to acquire sentence features. Att-BLSTM [<xref ref-type="bibr" rid="ref-11">11</xref>] employed Bidirectional Long Short-Term Memory Networks (Bi-LSTM) to extract crucial semantic features from a sentence. It utilized an attention mechanism to capture associations between entities while taking the text context into account. This approach significantly enhanced the performance of relation extraction. SpanBERT [<xref ref-type="bibr" rid="ref-34">34</xref>] was a pre-training method specialized in predicting text spans. It achieved relation extraction by masking contiguous random spans within a given text and subsequently training the model based on representations of these span boundaries. This unique approach equipped SpanBERT with the ability to capture intricate contextual information within the text. Zhou et al. [<xref ref-type="bibr" rid="ref-13">13</xref>] introduced an innovative baseline approach for relation extraction, which integrates an entity representation technique. This technique was designed to effectively tackle the challenges associated with entity representation and ameliorate the influence of noisy or ambiguously defined labels. However, this modeling method faces challenges in effectively leveraging various knowledge sources, particularly the dependency tree and syntactic information.</p>
<p>Graph-based models, different from sequence-based models, leverage graph structure from dependency parsing information to capture long-distance contextual features. Currently, utilizing dependency trees for RE has become a mainstream trend. However, in most studies, dependency trees are automatically generated by toolkits, which may introduce some noise. Therefore, it is crucial to mitigate the impact of noise on RE. C-GCN [<xref ref-type="bibr" rid="ref-15">15</xref>] was the first to apply a graph convolutional network to relation extraction. It enabled effective aggregation of features from dependency structures, and the implementation of a novel path-centric pruning strategy designed to eliminate superfluous dependency information. C-GCN-MG [<xref ref-type="bibr" rid="ref-19">19</xref>] addressed cross-sentence n-ary relation extraction. It utilized a contextualized graph convolutional network spanning multiple dependent sub-graphs, and a method for building graphs around entities based on the dependency tree. A-GCN [<xref ref-type="bibr" rid="ref-20">20</xref>] leveraged dependency-type information and self-attention mechanisms to reduce the reliance on pruning strategies. RE-DMP [<xref ref-type="bibr" rid="ref-31">31</xref>] introduced multiple order dependency connections and types into the pre-training model to obtain an encoder equipped with dependency information. Zhang et al. [<xref ref-type="bibr" rid="ref-28">28</xref>] proposed a dual attention graph convolutional network (DAGCN) with a parallel structure. This network can establish multi-turn interactions between contextual and dependency information, simulating the multi-turn looking-back actions observed in human comprehension. Wu et al. [<xref ref-type="bibr" rid="ref-29">29</xref>] designed an engineering-oriented RE model based on Multilayer Perceptron (MLP) and Graph Neural Networks (GNN). 
This model replaces the information aggregation process in GCNs with an MLP and achieves improved RE performance.</p>
<p>With the recent advancements of large language models (LLMs) in NLP, recent studies have often employed prompt learning or in-context learning (ICL) for RE tasks. However, many studies [<xref ref-type="bibr" rid="ref-39">39</xref>&#x2013;<xref ref-type="bibr" rid="ref-43">43</xref>] indicate that ICL models perform less effectively in relation extraction tasks than traditional pre-train fine-tuning models, especially when the relation label space is extensive or the input sentence structure is complex. The performance of ICL in RE is influenced by various factors, including computational costs [<xref ref-type="bibr" rid="ref-40">40</xref>,<xref ref-type="bibr" rid="ref-41">41</xref>], prompt templates [<xref ref-type="bibr" rid="ref-42">42</xref>], LLM parameters [<xref ref-type="bibr" rid="ref-39">39</xref>], and constraints on input sequence length. These factors contribute to significant differences in model performance. Yang et al. [<xref ref-type="bibr" rid="ref-43">43</xref>] observed that when relation extraction datasets already comprise rich and well-annotated data, with very few out-of-distribution examples in the test set, pre-train fine-tuned models consistently outperform ICL approaches. Longpre et al. [<xref ref-type="bibr" rid="ref-44">44</xref>] observed that the current upper limit of the capabilities of pre-train fine-tuning models has not yet been reached.</p>
<p>Although the graph-based studies mentioned above have made significant progress in the field of RE, they still have some shortcomings. On the one hand, some of the models [<xref ref-type="bibr" rid="ref-20">20</xref>,<xref ref-type="bibr" rid="ref-38">38</xref>] solely utilize the graph structure feature from the input for modeling, they fall short in comprehensively leveraging sequence-structure features. On the other hand, to mitigate the impact of the dependence noise on relation extraction, some models [<xref ref-type="bibr" rid="ref-18">18</xref>&#x2013;<xref ref-type="bibr" rid="ref-20">20</xref>] utilize a self-attention mechanism based on word features and dependency types for extracting relations between entities, while other models [<xref ref-type="bibr" rid="ref-19">19</xref>,<xref ref-type="bibr" rid="ref-21">21</xref>] incorporate manually designed complex pruning methods to alleviate the impact of dependency tree noise. However, these methods face challenges in handling input information with multiple structural features and a large amount of noise.</p>
<p>Different from the existing RE models, our model has a dynamic structure attention mechanism to capture the important features from diverse structure information, thus alleviating the influence of dependency tree noise on the RE task. Additionally, our model deeply integrates POS types, NER types, dependency trees, and dependency types in RE tasks. In summary, our model is a textual structure model that effectively integrates various types of features and dynamically adjusts attention weights in textual structure information.</p>
</sec>
<sec id="s3">
<label>3</label>
<title>Proposed Methodology</title>
<sec id="s3_1">
<label>3.1</label>
<title>Task Definition</title>
<p>A conventional approach to relation extraction is to treat it as a classification task. In this study, we aim to mitigate the impact of dependency tree noise on the RE task. To this end, we propose TS-GCN, a graph convolutional relation extraction model built on a dynamic structure attention mechanism. TS-GCN leverages textual structure information to enrich the sparse features of dependency matrices, which improves its ability to distinguish dependency tree noise and enhances its relation extraction performance. <xref ref-type="fig" rid="fig-1">Fig. 1</xref> shows the overall architecture of TS-GCN.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>The overall architecture of our model TS-GCN for RE, illustrated with an example input sentence (the two entities &#x201C;I&#x201D; and &#x201C;sound&#x201D; are highlighted in blue and red; green marks other structure information coalesced into our model, e.g., NER labels)</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_47811-fig-1.tif"/>
</fig>
<p>Specifically, given an unstructured input sentence <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:mrow><mml:mi mathvariant="normal">&#x03C4;</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x22EF;</mml:mo><mml:msub><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>n</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> with n words and two entity words <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mtext>e</mml:mtext></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext>e</mml:mtext></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, we utilize an off-the-shelf toolkit to obtain various textual structure information <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:mrow><mml:mtext>Z&#xA0;</mml:mtext></mml:mrow></mml:math></inline-formula> in <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:mrow><mml:mi mathvariant="normal">&#x03C4;</mml:mi></mml:mrow></mml:math></inline-formula>. The relation between <inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:msub><mml:mrow><mml:mtext>e</mml:mtext></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:msub><mml:mrow><mml:mtext>e</mml:mtext></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> in <inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:mrow><mml:mi mathvariant="normal">&#x03C4;</mml:mi></mml:mrow></mml:math></inline-formula> is predicted by
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mover accent="true"><mml:mrow><mml:mtext>r</mml:mtext></mml:mrow><mml:mo>&#x005E;</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mtext>argmax</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>r</mml:mtext></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mtext>R</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mtext>P</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>r</mml:mtext></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mtext>TS</mml:mtext></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:mtext>GCN</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x03C4;</mml:mi></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>Z&#xA0;</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>where <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:mrow><mml:mtext>R</mml:mtext></mml:mrow></mml:math></inline-formula> is the relation set and <inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:mrow><mml:mtext>Z&#xA0;</mml:mtext></mml:mrow></mml:math></inline-formula> is the textual structure information set, including POS tags, dependency types and NER labels. <inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:mrow><mml:mi mathvariant="normal">&#x03C4;</mml:mi></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:mrow><mml:mtext>Z&#xA0;</mml:mtext></mml:mrow></mml:math></inline-formula> are the inputs of TS-GCN. The following sections begin by elaborating on the main components of our proposed TS-GCN and conclude by illustrating how TS-GCN is applied to the classification paradigm for relation extraction.</p>
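Eq. (1) amounts to scoring every candidate relation and taking the argmax. The following minimal sketch shows only this prediction step; the relation set and the logits are hypothetical stand-ins for the output of TS-GCN(&#x03C4;, Z):

```python
import numpy as np

# Hypothetical relation set R; real datasets such as TACRED define many more labels.
RELATIONS = ["no_relation", "per:employee_of", "org:founded_by"]

def predict_relation(logits):
    """Eq. (1): r_hat = argmax_{r in R} P(r | TS-GCN(tau, Z)).

    `logits` stands in for the model's output scores; the softmax turns
    them into the distribution P(r | ...), and argmax picks the relation.
    """
    probs = np.exp(logits - logits.max())  # stable softmax
    probs /= probs.sum()
    return RELATIONS[int(np.argmax(probs))], probs

label, probs = predict_relation(np.array([0.1, 2.3, -0.5]))
# softmax is monotone, so the argmax of probs equals the argmax of logits
```

Since softmax is monotone, in practice the argmax can be taken directly over the logits; the softmax only matters when the probabilities themselves are needed.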
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Textual Structure Information Encoder</title>
<p>To enhance the reliability of dependency information, we combine word embeddings, POS tags, dependency types and NER labels into a dependency matrix. As shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>, these data can be mined by toolkits from the given input sentence <inline-formula id="ieqn-12"><mml:math id="mml-ieqn-12"><mml:mrow><mml:mi mathvariant="normal">&#x03C4;</mml:mi></mml:mrow></mml:math></inline-formula> with <inline-formula id="ieqn-13"><mml:math id="mml-ieqn-13"><mml:mrow><mml:mtext>n</mml:mtext></mml:mrow></mml:math></inline-formula> words (our datasets already provide this information). We then select the word information <inline-formula id="ieqn-14"><mml:math id="mml-ieqn-14"><mml:mrow><mml:mi mathvariant="normal">&#x03C4;</mml:mi></mml:mrow></mml:math></inline-formula>, the POS sequence <inline-formula id="ieqn-15"><mml:math id="mml-ieqn-15"><mml:mrow><mml:mtext>p</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:msub><mml:mrow><mml:mtext>pos</mml:mtext></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x22EF;</mml:mo><mml:mo>,</mml:mo><mml:mrow><mml:mtext>po</mml:mtext></mml:mrow><mml:msub><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>n</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula>, which represents the corresponding POS features of the words, and the dependency type matrix <inline-formula id="ieqn-16"><mml:math id="mml-ieqn-16"><mml:msub><mml:mrow><mml:mtext>T</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>m</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, where <inline-formula id="ieqn-17"><mml:math id="mml-ieqn-17"><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the dependency type class if the two words <inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:msub><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>i</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-19"><mml:math id="mml-ieqn-19"><mml:msub><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>j</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> have a dependency connection, and <inline-formula id="ieqn-20"><mml:math id="mml-ieqn-20"><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:math></inline-formula> otherwise. Because POS information is inherently word-dependent and exhibits a sequential contextual structure, a POS graph alone is not proficient at capturing these sequential features. In contrast to previous research, we employ a Bi-LSTM to acquire contextual POS information, enabling the model to learn this contextual structure effectively.
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mrow><mml:mtext>pos</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mtext>Linear</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>B</mml:mi><mml:mi>i</mml:mi><mml:mi>L</mml:mi><mml:mi>S</mml:mi><mml:mi>T</mml:mi><mml:mi>M</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>p</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>where <inline-formula id="ieqn-21"><mml:math id="mml-ieqn-21"><mml:mrow><mml:mtext>pos</mml:mtext></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>d</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> and <inline-formula id="ieqn-22"><mml:math id="mml-ieqn-22"><mml:mi>d</mml:mi></mml:math></inline-formula> is the encoder&#x2019;s hidden dimension. Finally, we utilize the <inline-formula id="ieqn-23"><mml:math id="mml-ieqn-23"><mml:mrow><mml:mtext>pos</mml:mtext></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-24"><mml:math id="mml-ieqn-24"><mml:mi>x</mml:mi></mml:math></inline-formula> feature sequence to build respective matrices of learning contextual information <inline-formula id="ieqn-25"><mml:math id="mml-ieqn-25"><mml:msub><mml:mrow><mml:mtext>P</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>m</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>p</mml:mi><mml:mi>o</mml:mi><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-26"><mml:math id="mml-ieqn-26"><mml:msub><mml:mrow><mml:mtext>X</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>m</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> where <inline-formula id="ieqn-27"><mml:math 
id="mml-ieqn-27"><mml:msub><mml:mrow><mml:mtext>pos</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mtext>pos</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mrow><mml:mtext>pos</mml:mtext></mml:mrow><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula id="ieqn-28"><mml:math id="mml-ieqn-28"><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>. The matrices <inline-formula id="ieqn-29"><mml:math id="mml-ieqn-29"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mrow><mml:mtext>m</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula id="ieqn-30"><mml:math id="mml-ieqn-30"><mml:msub><mml:mrow><mml:mtext>X</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>m</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula>, and <inline-formula id="ieqn-31"><mml:math id="mml-ieqn-31"><mml:msub><mml:mrow><mml:mtext>T</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>m</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> are all n &#x00D7; n dimensional matrices.</p>
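To make the construction concrete, here is a minimal NumPy sketch of the pairwise structure matrices (our own illustration, not the authors' released code; we assume the product of the two word features is taken elementwise, so each matrix stores a d-dimensional feature per word pair):

```python
import numpy as np

n, d = 5, 8  # sentence length and hidden dimension (illustrative values)
rng = np.random.default_rng(0)

# pos: position features after BiLSTM + Linear; x: token features
pos = rng.standard_normal((n, d))
x = rng.standard_normal((n, d))

# Pairwise feature matrices: entry (i, j) combines the features of
# words i and j via an elementwise product (our assumption).
P_m = pos[:, None, :] * pos[None, :, :]
X_m = x[:, None, :] * x[None, :, :]

assert P_m.shape == (n, n, d) and X_m.shape == (n, n, d)
```

Under this reading each matrix is n &#x00D7; n over d-dimensional pair features, and the elementwise product makes the raw matrices symmetric; directionality is introduced later by the structure attention.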
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>TS-GCN</title>
<p>TS-GCN models word connections differently from classic GCN-based models, which assign edge weights of either 0 or 1. We propose a dynamic structure attention mechanism that learns node weights from different textual graphs, allowing the model to attend to diverse information across distinct structures simultaneously and avoiding interference from structural noise. By considering the differences in textual structure information among nodes, the structure attention mechanism learns bidirectional weights for dependency paths.</p>
<p>First, we concatenate the input matrices <inline-formula id="ieqn-32"><mml:math id="mml-ieqn-32"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mrow><mml:mtext>m</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula id="ieqn-33"><mml:math id="mml-ieqn-33"><mml:msub><mml:mrow><mml:mtext>X</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>m</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula>, and <inline-formula id="ieqn-34"><mml:math id="mml-ieqn-34"><mml:msub><mml:mrow><mml:mtext>T</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>m</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> into a complete matrix <inline-formula id="ieqn-35"><mml:math id="mml-ieqn-35"><mml:mi>T</mml:mi><mml:mi>s</mml:mi></mml:math></inline-formula> with multi-structure information, where <inline-formula id="ieqn-36"><mml:math id="mml-ieqn-36"><mml:mi>T</mml:mi><mml:mi>s</mml:mi></mml:math></inline-formula> is an n &#x00D7; n &#x00D7; 3d dimensional matrix. Then, we take the dot product of <inline-formula id="ieqn-37"><mml:math id="mml-ieqn-37"><mml:mi>T</mml:mi><mml:mi>s</mml:mi></mml:math></inline-formula> and the standard dependency connection matrix <inline-formula id="ieqn-38"><mml:math id="mml-ieqn-38"><mml:mrow><mml:mtext>A</mml:mtext></mml:mrow></mml:math></inline-formula> to retain, in the matrix <inline-formula id="ieqn-39"><mml:math id="mml-ieqn-39"><mml:mi>T</mml:mi><mml:mi>s</mml:mi></mml:math></inline-formula>, only the node information with dependency connections.
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mi>T</mml:mi><mml:mi>s</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mtext>P</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>m</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>&#x2295;</mml:mo><mml:msub><mml:mrow><mml:mtext>X</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>m</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>&#x2295;</mml:mo><mml:msub><mml:mrow><mml:mtext>T</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>m</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mtext>A</mml:mtext></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>where <inline-formula id="ieqn-40"><mml:math id="mml-ieqn-40"><mml:mo>&#x2295;</mml:mo></mml:math></inline-formula> denotes the vector concatenation operation, <inline-formula id="ieqn-41"><mml:math id="mml-ieqn-41"><mml:mi>T</mml:mi><mml:mi>s</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, and <inline-formula id="ieqn-42"><mml:math id="mml-ieqn-42"><mml:mrow><mml:mtext>A</mml:mtext></mml:mrow></mml:math></inline-formula> is the standard dependency matrix (an entry is 1 if a dependency connection exists between the corresponding nodes, and 0 otherwise).</p>
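The concatenation and dependency masking of Eq. (3) can be sketched as follows (our own illustration; we read the dot product with A as zeroing out word pairs that have no dependency connection):

```python
import numpy as np

n, d = 5, 8
rng = np.random.default_rng(1)
P_m, X_m, T_m = (rng.standard_normal((n, n, d)) for _ in range(3))

# Concatenate the three structure matrices along the feature axis.
Ts = np.concatenate([P_m, X_m, T_m], axis=-1)   # n x n x 3d

# Standard dependency matrix: 1 where a dependency edge exists, else 0.
A = np.zeros((n, n))
A[0, 1] = A[1, 0] = A[1, 2] = 1.0

# Keep only pair features that lie on dependency connections.
Ts = Ts * A[:, :, None]
```

With this masking, ts_{i,j} stays a 3d-dimensional feature vector for connected pairs and is zero elsewhere.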
<p>The dynamic structure attention computes attention weights for <inline-formula id="ieqn-43"><mml:math id="mml-ieqn-43"><mml:mi>T</mml:mi><mml:mi>s</mml:mi></mml:math></inline-formula> based on the combined structure information. Next, we apply the weight <inline-formula id="ieqn-44"><mml:math id="mml-ieqn-44"><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> to dynamically filter the feature information in <inline-formula id="ieqn-45"><mml:math id="mml-ieqn-45"><mml:mi>T</mml:mi><mml:mi>s</mml:mi></mml:math></inline-formula> and selectively enhance the representative information by
<disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>t</mml:mi><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>where <inline-formula id="ieqn-46"><mml:math id="mml-ieqn-46"><mml:mi>t</mml:mi><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mn>3</mml:mn><mml:mi>d</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, <inline-formula id="ieqn-47"><mml:math id="mml-ieqn-47"><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mn>3</mml:mn><mml:mi>d</mml:mi></mml:mrow></mml:msup><mml:mo stretchy="false">&#x2192;</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mn>3</mml:mn><mml:mi>d</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>. <inline-formula id="ieqn-48"><mml:math id="mml-ieqn-48"><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>h</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is a learnable 3d &#x00D7; 3d weight matrix, and <inline-formula id="ieqn-49"><mml:math id="mml-ieqn-49"><mml:mo>&#x00D7;</mml:mo></mml:math></inline-formula> denotes matrix multiplication. Unlike previous methods, our structural attention is directional, i.e., <inline-formula id="ieqn-50"><mml:math id="mml-ieqn-50"><mml:mi>t</mml:mi><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2260;</mml:mo><mml:mi>t</mml:mi><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>. This implies that when the dynamic structure attention acquires contextual semantic information weights, distinct weights are assigned based on the direction of the dependency path by
<disp-formula id="eqn-5"><label>(5)</label><mml:math id="mml-eqn-5" display="block"><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:msup><mml:mrow><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:mtext>H</mml:mtext></mml:mrow><mml:mo>&#x22C5;</mml:mo><mml:msup><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:msup></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>where <inline-formula id="ieqn-51"><mml:math id="mml-ieqn-51"><mml:mrow><mml:mtext>H</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula id="ieqn-52"><mml:math id="mml-ieqn-52"><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mn>6</mml:mn><mml:mi>d</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, and <inline-formula id="ieqn-53"><mml:math id="mml-ieqn-53"><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> indexes the attention heads. <inline-formula id="ieqn-54"><mml:math id="mml-ieqn-54"><mml:mrow><mml:mover><mml:mi>a</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover></mml:mrow></mml:math></inline-formula> is the forward dependency-path attention weight from i to j.
Afterwards, we aggregate forward <inline-formula id="ieqn-55"><mml:math id="mml-ieqn-55"><mml:mrow><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:math></inline-formula> and reverse <inline-formula id="ieqn-56"><mml:math id="mml-ieqn-56"><mml:mover><mml:mrow><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">&#x21BC;</mml:mo></mml:mrow></mml:mover></mml:math></inline-formula> to obtain the one-head attention matrix <inline-formula id="ieqn-57"><mml:math id="mml-ieqn-57"><mml:msubsup><mml:mi>a</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msubsup></mml:math></inline-formula> and compute the output of each one-head attention by
<disp-formula id="eqn-6"><label>(6)</label><mml:math id="mml-eqn-6" display="block"><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:msubsup><mml:mi>a</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mtext>LeakyReLU</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mrow><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:msup><mml:mrow><mml:mover><mml:mrow><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">&#x21BC;</mml:mo></mml:mrow></mml:mover></mml:mrow><mml:mrow><mml:mrow><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="eqn-7"><label>(7)</label><mml:math id="mml-eqn-7" display="block"><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:msubsup><mml:mi>n</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mtext>softmax</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mi>a</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mi>a</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:munder><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mi>a</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
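Eqs. (4)-(7) leave some shapes implicit; one GAT-style reading, offered only as a hedged sketch, scores each ordered pair (i, j) by concatenating the filtered edge feature h_{i,j} with its reverse h_{j,i} (hence a vector a of dimension 6d), which makes the attention directional:

```python
import numpy as np

def leaky_relu(z, slope=0.01):
    return np.where(z > 0, z, slope * z)

n, d3 = 5, 24                       # n words, 3d-dimensional edge features
rng = np.random.default_rng(2)
ts = rng.standard_normal((n, n, d3))

w_h = rng.standard_normal((d3, d3))  # learnable filter, Eq. (4)
H = ts @ w_h                         # h_{i,j}, still n x n x 3d

a = rng.standard_normal(2 * d3)      # scoring vector, a in R^{6d} (assumed)

# Directional score: forward edge feature h_{i,j} concatenated with the
# reverse h_{j,i}; swapping i and j changes the score, so the attention
# is asymmetric, matching ts_{i,j} != ts_{j,i}.
pair = np.concatenate([H, H.transpose(1, 0, 2)], axis=-1)  # n x n x 6d
scores = leaky_relu(pair @ a)                              # Eqs. (5)-(6)

# Row-wise softmax over j, Eq. (7)
e = np.exp(scores - scores.max(axis=1, keepdims=True))
attention = e / e.sum(axis=1, keepdims=True)
```

This is one plausible reconstruction of the single-head case; the paper's released implementation may differ in how the forward and reverse terms are parameterized.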
<p>Finally, we use the head weights <inline-formula id="ieqn-58"><mml:math id="mml-ieqn-58"><mml:msubsup><mml:mi>w</mml:mi><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> to aggregate and normalize the dynamic structure attention by
<disp-formula id="eqn-8"><label>(8)</label><mml:math id="mml-eqn-8" display="block"><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:msup><mml:mi>n</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msup><mml:mo>&#x00D7;</mml:mo><mml:msubsup><mml:mi>w</mml:mi><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msubsup></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>where <inline-formula id="ieqn-59"><mml:math id="mml-ieqn-59"><mml:msubsup><mml:mi>w</mml:mi><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:mi>d</mml:mi></mml:mrow></mml:msup><mml:mo stretchy="false">&#x2192;</mml:mo><mml:mi>R</mml:mi></mml:math></inline-formula> and <inline-formula id="ieqn-60"><mml:math id="mml-ieqn-60"><mml:msubsup><mml:mi>w</mml:mi><mml:mrow><mml:mi>a</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> is a head &#x00D7; 1 dimensional weight matrix. We apply the attention <inline-formula id="ieqn-61"><mml:math id="mml-ieqn-61"><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> to the common connection between <inline-formula id="ieqn-62"><mml:math id="mml-ieqn-62"><mml:mi>T</mml:mi><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-63"><mml:math id="mml-ieqn-63"><mml:mi>T</mml:mi><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> to obtain the output representation <inline-formula id="ieqn-64"><mml:math id="mml-ieqn-64"><mml:msub><mml:mi>o</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> by
<disp-formula id="eqn-9"><label>(9)</label><mml:math id="mml-eqn-9" display="block"><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mi>T</mml:mi><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x22C5;</mml:mo><mml:mrow><mml:mtext>W</mml:mtext></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="eqn-10"><label>(10)</label><mml:math id="mml-eqn-10" display="block"><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:msub><mml:mi>o</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mtext>ReLU</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:mo>(</mml:mo><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x22C5;</mml:mo><mml:mi>T</mml:mi><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mi>b</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>where <inline-formula id="ieqn-65"><mml:math id="mml-ieqn-65"><mml:mrow><mml:mtext>W</mml:mtext></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msup><mml:mo stretchy="false">&#x2192;</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> and <inline-formula id="ieqn-66"><mml:math id="mml-ieqn-66"><mml:mrow><mml:mtext>W</mml:mtext></mml:mrow></mml:math></inline-formula> is a d &#x00D7; d trainable weight matrix. Compared with the traditional GCN, TS-GCN uses dynamic structure attention to assign dynamic weights that distinguish the importance of different structural content. This helps the model understand and leverage complex textual structure information more fully. Furthermore, our approach allows additional features, such as NER labels, to be incorporated into the textual structure information, which enhances the extensibility and convenience of our model and enables the exploration of additional textual structure information.</p>
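Continuing the sketch for Eqs. (8)-(10) (hypothetical shapes and initializations, not the released implementation), the per-head attention maps are combined with the head weights and then applied to the summed structure features:

```python
import numpy as np

n, d, heads = 5, 8, 4
rng = np.random.default_rng(3)

# Per-head attention maps with rows already softmax-normalized, Eq. (7)
attention = rng.random((heads, n, n))
attention /= attention.sum(axis=2, keepdims=True)

# Eq. (8): weighted sum of the heads with learnable head weights w_a
w_a = rng.random(heads)
atten = np.einsum('tij,t->ij', attention, w_a)

# Eq. (9): sum the three structure matrices and project with W
P_m, X_m, T_m = (rng.standard_normal((n, n, d)) for _ in range(3))
W = rng.standard_normal((d, d))
Th = (P_m + X_m + T_m) @ W

# Eq. (10): weight each neighbour's pair feature by the attention score,
# sum over j, add a bias, and apply ReLU
b = rng.standard_normal(d)
o = np.maximum(0.0, np.einsum('ij,ijd->id', atten, Th) + b)
```

The output `o` holds one d-dimensional representation per word, which is what the relation-extraction head in the next subsection consumes.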
</sec>
<sec id="s3_4">
<label>3.4</label>
<title>Relation Extraction with TS-GCN</title>
<p>Before employing TS-GCN for RE, we first employ BERT [<xref ref-type="bibr" rid="ref-45">45</xref>] to encode the input <inline-formula id="ieqn-67"><mml:math id="mml-ieqn-67"><mml:mi>x</mml:mi></mml:math></inline-formula> into hidden embeddings, with <inline-formula id="ieqn-68"><mml:math id="mml-ieqn-68"><mml:msubsup><mml:mi>t</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mn>0</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula> representing the hidden embeddings for <inline-formula id="ieqn-69"><mml:math id="mml-ieqn-69"><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>. We next apply our proposed TS-GCN with <inline-formula id="ieqn-70"><mml:math id="mml-ieqn-70"><mml:mi>N</mml:mi></mml:math></inline-formula> layers to obtain the corresponding output <inline-formula id="ieqn-71"><mml:math id="mml-ieqn-71"><mml:msup><mml:mrow><mml:msub><mml:mi>o</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> based on the input <inline-formula id="ieqn-72"><mml:math id="mml-ieqn-72"><mml:msubsup><mml:mi>t</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mn>0</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msubsup></mml:math></inline-formula>. Then, we employ the max pooling mechanism to obtain the output hidden embeddings <inline-formula id="ieqn-73"><mml:math id="mml-ieqn-73"><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>e</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>e</mml:mi><mml:mo>=</mml:mo><mml:mi>s</mml:mi><mml:mi>u</mml:mi><mml:mi>b</mml:mi><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>o</mml:mi><mml:mi>b</mml:mi><mml:mi>j</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> for the entity words by
<disp-formula id="eqn-11"><label>(11)</label><mml:math id="mml-eqn-11" display="block"><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>e</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mtext>MaxPooling</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:msubsup><mml:mi>t</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>N</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mi>s</mml:mi><mml:mi>u</mml:mi><mml:mi>b</mml:mi><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>o</mml:mi><mml:mi>b</mml:mi><mml:mi>j</mml:mi><mml:mo>}</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>Afterward, we multiply the concatenated embeddings of the two entities by the trainable matrix <inline-formula id="ieqn-74"><mml:math id="mml-ieqn-74"><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and apply the ReLU activation function to obtain the output embedding z; the probability of each relation is then computed by
<disp-formula id="eqn-12"><label>(12)</label><mml:math id="mml-eqn-12" display="block"><mml:mtable columnalign="left" rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>r</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mrow><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:mi>z</mml:mi><mml:mo>+</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:msup><mml:mi>r</mml:mi><mml:mrow><mml:mo>&#x2032;</mml:mo></mml:mrow></mml:msup><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mo>|</mml:mo><mml:mi>R</mml:mi><mml:mo>|</mml:mo></mml:mrow></mml:mrow></mml:munder><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:msup><mml:mi>r</mml:mi><mml:mrow><mml:mo>&#x2032;</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:msub><mml:mi>z</mml:mi><mml:mo>+</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:msup><mml:mi>r</mml:mi><mml:mrow><mml:mo>&#x2032;</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mstyle></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>where <inline-formula id="ieqn-75"><mml:math id="mml-ieqn-75"><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, <inline-formula id="ieqn-76"><mml:math id="mml-ieqn-76"><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:msup><mml:mi>r</mml:mi><mml:mrow><mml:mo>&#x2032;</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, <inline-formula id="ieqn-77"><mml:math id="mml-ieqn-77"><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:mi>R</mml:mi></mml:math></inline-formula>, <inline-formula id="ieqn-78"><mml:math id="mml-ieqn-78"><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:msup><mml:mi>r</mml:mi><mml:mrow><mml:mo>&#x2032;</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:mi>R</mml:mi></mml:math></inline-formula> and <inline-formula id="ieqn-79"><mml:math id="mml-ieqn-79"><mml:mrow><mml:mo>|</mml:mo><mml:mi>R</mml:mi><mml:mo>|</mml:mo></mml:mrow></mml:math></inline-formula> is the number of relation types. <inline-formula id="ieqn-80"><mml:math id="mml-ieqn-80"><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-81"><mml:math id="mml-ieqn-81"><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:msup><mml:mi>r</mml:mi><mml:mrow><mml:mo>&#x2032;</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:msub></mml:math></inline-formula> are d-dimensional weight vectors.
<inline-formula id="ieqn-82"><mml:math id="mml-ieqn-82"><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-83"><mml:math id="mml-ieqn-83"><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:msup><mml:mi>r</mml:mi><mml:mrow><mml:mo>&#x2032;</mml:mo></mml:mrow></mml:msup></mml:mrow></mml:msub></mml:math></inline-formula> are bias terms.</p>
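The entity pooling and classification of Eqs. (11)-(12) can be sketched as follows; the entity spans and the intermediate z = ReLU([t_subj &#x2295; t_obj] W_p) are our assumptions about details the text leaves implicit:

```python
import numpy as np

n, d, num_rel = 10, 8, 4
rng = np.random.default_rng(4)
t = rng.standard_normal((n, d))           # TS-GCN outputs t_i^(N)

subj, obj = slice(1, 3), slice(6, 8)      # hypothetical entity spans
t_subj = t[subj].max(axis=0)              # Eq. (11): max pooling over
t_obj = t[obj].max(axis=0)                # each entity's tokens

# Assumed form of the unstated step: concatenate, project with W_p, ReLU
W_p = rng.standard_normal((2 * d, d))
z = np.maximum(0.0, np.concatenate([t_subj, t_obj]) @ W_p)

# Eq. (12): softmax over |R| relation types
W_r = rng.standard_normal((num_rel, d))   # one weight vector per relation
b_r = rng.standard_normal(num_rel)
logits = W_r @ z + b_r
probs = np.exp(logits - logits.max())
probs /= probs.sum()
```

The predicted relation is then `probs.argmax()`; all shapes here are illustrative.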
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Experiments and Analyses</title>
<sec id="s4_1">
<label>4.1</label>
<title>Preliminary</title>
<p><bold>Datasets.</bold> We use four English datasets in the experiments: SemEval 2010 Task 8 (SemEval) [<xref ref-type="bibr" rid="ref-46">46</xref>] and three versions of TACRED, namely the original TACRED [<xref ref-type="bibr" rid="ref-12">12</xref>], TACREV [<xref ref-type="bibr" rid="ref-47">47</xref>], and Re-TACRED [<xref ref-type="bibr" rid="ref-48">48</xref>]. Because approximately 6.62% of the instances in TACRED are noisily labeled, Alt et al. [<xref ref-type="bibr" rid="ref-47">47</xref>] relabeled its development and test sets to produce TACREV, and Stoica et al. [<xref ref-type="bibr" rid="ref-48">48</xref>] relabeled the whole dataset with further refined label definitions to produce Re-TACRED. For SemEval, we use its official train/test split. We provide the statistics of the datasets in <xref ref-type="table" rid="table-1">Table 1</xref>.</p>
<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>The statistics of datasets</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th>Dataset</th>
<th>Train</th>
<th>Dev</th>
<th>Test</th>
<th>Class</th>
</tr>
</thead>
<tbody>
<tr>
<td>SemEval</td>
<td>8000</td>
<td>&#x2013;</td>
<td>2717</td>
<td>10</td>
</tr>
<tr>
<td>TACRED</td>
<td>68124</td>
<td>22631</td>
<td>15509</td>
<td>42</td>
</tr>
<tr>
<td>TACREV</td>
<td>68124</td>
<td>22631</td>
<td>15509</td>
<td>42</td>
</tr>
<tr>
<td>Re-TACRED</td>
<td>58465</td>
<td>19584</td>
<td>13418</td>
<td>40</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s4_2">
<label>4.2</label>
<title>Results and Discussion</title>
<p><bold>Model configurations.</bold> We follow the study of Soares et al. [<xref ref-type="bibr" rid="ref-49">49</xref>] and insert four special tokens, &#x201C;e1&#x201D;, &#x201C;/e1&#x201D;, &#x201C;e2&#x201D;, and &#x201C;/e2&#x201D;, into the input sentence to mark the boundaries of the two entities. This strategy allows the encoder to distinguish the positions of the entities during encoding and improves model performance. For the encoder, we utilize the uncased versions of BERT-base and BERT-large [<xref ref-type="bibr" rid="ref-45">45</xref>] from HuggingFace with their default settings. Our model is optimized with Adam [<xref ref-type="bibr" rid="ref-50">50</xref>] using a learning rate of 7e-6 for both BERT-base and BERT-large, and we use four-head dynamic structure attention to obtain important representations. For each model, we evaluate all hyperparameter combinations and keep the one with the best F1 score on the development set.</p>
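Entity-marker insertion in the style of Soares et al. [49] can be sketched with plain string handling (the helper `mark_entities` and the angle-bracket marker strings are our own illustrative choices, not the exact tokens used in the released code):

```python
def mark_entities(tokens, subj_span, obj_span):
    """Insert e1/e2 boundary markers around the two entity spans.

    subj_span and obj_span are (start, end) token indices, end exclusive.
    Spans are assumed not to overlap.
    """
    inserts = [(subj_span[0], "<e1>"), (subj_span[1], "</e1>"),
               (obj_span[0], "<e2>"), (obj_span[1], "</e2>")]
    out = list(tokens)
    # Insert from the rightmost position so earlier indices stay valid.
    for pos, marker in sorted(inserts, reverse=True):
        out.insert(pos, marker)
    return out

tokens = "Bill Gates founded Microsoft in 1975".split()
marked = mark_entities(tokens, subj_span=(0, 2), obj_span=(3, 4))
# ['<e1>', 'Bill', 'Gates', '</e1>', 'founded',
#  '<e2>', 'Microsoft', '</e2>', 'in', '1975']
```

The marked sequence is then tokenized and fed to BERT, with the marker strings registered as special tokens so they are not split by the subword tokenizer.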
<p><bold>Evaluation.</bold> For SemEval, we follow previous studies and use the official evaluation script. (The official evaluation script can be downloaded from <ext-link ext-link-type="uri" xlink:href="https://huggingface.co/datasets/sem_eval_2010_task_8/blob/main/sem_eval_2010_task_8.py">https://huggingface.co/datasets/sem_eval_2010_task_8/blob/main/sem_eval_2010_task_8.py</ext-link>).</p>
<p>For the three versions of TACRED, we use the mainstream evaluation metrics <inline-formula id="ieqn-84"><mml:math id="mml-ieqn-84"><mml:mi>P</mml:mi></mml:math></inline-formula>, <inline-formula id="ieqn-85"><mml:math id="mml-ieqn-85"><mml:mi>R</mml:mi></mml:math></inline-formula>, and Micro-F1 (F1), computed as follows.
<disp-formula id="eqn-13"><label>(13)</label><mml:math id="mml-eqn-13" display="block"><mml:mtable columnalign="left" rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mi>P</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mn>1</mml:mn><mml:mi>C</mml:mi></mml:mfrac></mml:mstyle><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>C</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:mn>100</mml:mn><mml:mrow><mml:mtext>%</mml:mtext></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="eqn-14"><label>(14)</label><mml:math id="mml-eqn-14" display="block"><mml:mtable columnalign="left" rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mi>R</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mn>1</mml:mn><mml:mi>C</mml:mi></mml:mfrac></mml:mstyle><mml:msubsup><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>C</mml:mi></mml:mrow></mml:msubsup><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:mn>100</mml:mn><mml:mrow><mml:mtext>%</mml:mtext></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p><disp-formula id="eqn-15"><label>(15)</label><mml:math id="mml-eqn-15" display="block"><mml:mtable columnalign="left" rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mrow><mml:mn>2</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi>P</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>R</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>R</mml:mi></mml:mrow></mml:mfrac></mml:mstyle></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>where <inline-formula id="ieqn-86"><mml:math id="mml-ieqn-86"><mml:mi>C</mml:mi></mml:math></inline-formula> is the number of relation classes, <inline-formula id="ieqn-87"><mml:math id="mml-ieqn-87"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the precision score for class i, and <inline-formula id="ieqn-88"><mml:math id="mml-ieqn-88"><mml:msub><mml:mi>R</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the recall score for class i.</p>
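<p>As an illustrative sketch (not the official scorer), the class-averaged metrics in Eqs. (13)&#x2013;(15) can be transcribed directly; the function name <monospace>macro_prf</monospace> and the toy labels below are hypothetical:</p>

```python
def macro_prf(gold, pred, classes):
    """Class-averaged precision, recall, and F1 as written in
    Eqs. (13)-(15): per-class P_i and R_i are averaged over the
    C relation classes and scaled by 100."""
    per_class = []
    for c in classes:
        tp = sum(1 for g, p in zip(gold, pred) if g == c and p == c)
        fp = sum(1 for g, p in zip(gold, pred) if g != c and p == c)
        fn = sum(1 for g, p in zip(gold, pred) if g == c and p != c)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        per_class.append((prec, rec))
    C = len(classes)
    P = 100 * sum(p for p, _ in per_class) / C   # Eq. (13)
    R = 100 * sum(r for _, r in per_class) / C   # Eq. (14)
    F1 = 2 * P * R / (P + R) if P + R else 0.0   # Eq. (15)
    return P, R, F1
```

<p>For example, with gold labels <monospace>['a', 'a', 'b', 'b']</monospace> and predictions <monospace>['a', 'b', 'b', 'b']</monospace>, the averaged precision is 83.33 and the averaged recall is 75.0.</p>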
<p><bold>Baseline.</bold> We compare TS-GCN on BERT-Large and BERT-Base with the state-of-the-art sentence-level relation extraction model proposed by Tian et al. [<xref ref-type="bibr" rid="ref-20">20</xref>], which utilizes dependency types to compute attention over dependency nodes, representing the importance of each node in the information matrix. We follow the best default settings given by Tian et al. [<xref ref-type="bibr" rid="ref-20">20</xref>] to train their model on the three versions of TACRED, since they only reported the best F1 scores on SemEval.</p>
<p>Furthermore, TS-GCN demonstrates performance improvements on the TACRED, TACREV, and Re-TACRED datasets. We also compare against the latest baseline model by Zhou et al. [<xref ref-type="bibr" rid="ref-13">13</xref>], which is based on a transformer architecture.</p>
<p><xref ref-type="table" rid="table-2">Table 2</xref> compares our TS-GCN approach to the baseline, which uses dependency-driven relation extraction, and to other studies. Our approach outperforms the baseline methods on all four datasets. On the TACRED dataset in particular, our approach achieves F1 scores of 87.73% and 88.28% (with BERT-Base and BERT-Large, respectively), significantly higher than the baseline scores of 85.76% and 86.64% by Tian et al. [<xref ref-type="bibr" rid="ref-20">20</xref>], and sets a new SOTA compared to previous studies such as 72.9% by Zhou et al. [<xref ref-type="bibr" rid="ref-13">13</xref>], 70.8% by Joshi et al. [<xref ref-type="bibr" rid="ref-27">27</xref>], and 66.3% by Zhang et al. [<xref ref-type="bibr" rid="ref-15">15</xref>]. This demonstrates that our method brings consistent and considerable performance improvements on all the datasets. Moreover, when utilizing BERT-Base as the encoder, TS-GCN still achieves state-of-the-art (SOTA) performance on the four datasets. On the one hand, this indicates that TS-GCN effectively learns representations of the textual structure information in the input text, which reduces the impact of dependency tree noise on relation extraction. On the other hand, it demonstrates that the performance gain of TS-GCN does not result from replacing the encoder.</p>
<table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>Comparison of F1 scores (in %) between previous studies and our best models</title>
</caption>
<table frame="hsides">
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Models</th>
<th>SemEval</th>
<th align="center" colspan="3">TACRED</th>
<th align="center" colspan="3">TACREV</th>
<th align="center" colspan="3">Re-TACRED</th>
</tr>
<tr>
<td></td>
<td>F<sub>1</sub></td>
<td>P</td>
<td>R</td>
<td>F<sub>1</sub></td>
<td>P</td>
<td>R</td>
<td>F<sub>1</sub></td>
<td>P</td>
<td>R</td>
<td>F<sub>1</sub></td>
</tr>
</thead>
<tbody>
<tr>
<td>BERTEM&#x002B;MTB [<xref ref-type="bibr" rid="ref-49">49</xref>]</td>
<td>89.5</td>
<td>71.8</td>
<td>68.4</td>
<td>70.1</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
</tr>
<tr>
<td>LST-AGCN [<xref ref-type="bibr" rid="ref-1">1</xref>]</td>
<td>86.0</td>
<td>69.6</td>
<td>68.0</td>
<td>68.8</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
</tr>
<tr>
<td>C-GCN-MG [<xref ref-type="bibr" rid="ref-19">19</xref>]</td>
<td>85.9</td>
<td>67.1</td>
<td>65.1</td>
<td>66.1</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
</tr>
<tr>
<td>PA-LSTM [<xref ref-type="bibr" rid="ref-12">12</xref>]</td>
<td>&#x2013;</td>
<td>66.6</td>
<td>63.6</td>
<td>65.1</td>
<td>74.5</td>
<td>72.1</td>
<td>73.3<sup>3</sup></td>
<td>82.1</td>
<td>76.8</td>
<td>79.4<sup>2</sup></td>
</tr>
<tr>
<td>C-GCN [<xref ref-type="bibr" rid="ref-15">15</xref>]</td>
<td>84.8</td>
<td>87.2</td>
<td>82.5</td>
<td>84.8</td>
<td>84.9</td>
<td>84.7</td>
<td>84.8<sup>3</sup></td>
<td>86.6</td>
<td>83.1</td>
<td>84.8<sup>2</sup></td>
</tr>
<tr>
<td>SpanBERT [<xref ref-type="bibr" rid="ref-27">27</xref>]</td>
<td>&#x2013;</td>
<td>71.2</td>
<td>70.3</td>
<td>70.8</td>
<td>80.5</td>
<td>75.6</td>
<td>78.0<sup>1</sup></td>
<td>85.8</td>
<td>84.7</td>
<td>85.3<sup>2</sup></td>
</tr>
<tr>
<td>RE-Improved (BERT<sub>LARGE</sub>) [<xref ref-type="bibr" rid="ref-13">13</xref>]</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>72.9</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>81.3</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>89.7</td>
</tr>
<tr>
<td>DAGCN [<xref ref-type="bibr" rid="ref-28">28</xref>]</td>
<td>&#x2013;</td>
<td>72.4</td>
<td>64.8</td>
<td>68.4</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
</tr>
<tr>
<td>Wu et al. [<xref ref-type="bibr" rid="ref-29">29</xref>]</td>
<td>83.5</td>
<td>71.1</td>
<td>62.8</td>
<td>66.7</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
</tr>
<tr>
<td>A-GCN (BERT<sub>BASE</sub>) [<xref ref-type="bibr" rid="ref-20">20</xref>]</td>
<td>89.16</td>
<td>87.87<sup>4</sup></td>
<td>83.74<sup>4</sup></td>
<td>85.76<sup>4</sup></td>
<td>88.75<sup>4</sup></td>
<td>87.12<sup>4</sup></td>
<td>87.94<sup>4</sup></td>
<td>90.01<sup>4</sup></td>
<td>86.51<sup>4</sup></td>
<td>88.23<sup>4</sup></td>
</tr>
<tr>
<td>A-GCN (BERT<sub>LARGE</sub>) [<xref ref-type="bibr" rid="ref-20">20</xref>]</td>
<td>89.85</td>
<td>88.73<sup>4</sup></td>
<td>84.63<sup>4</sup></td>
<td>86.64<sup>4</sup></td>
<td>89.40<sup>4</sup></td>
<td>87.06<sup>4</sup></td>
<td>88.22<sup>4</sup></td>
<td>92.31<sup>4</sup></td>
<td>86.71<sup>4</sup></td>
<td>89.43<sup>4</sup></td>
</tr>
<tr>
<td>Our Model</td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td>TS-GCN (BERT<sub>BASE</sub>)</td>
<td>89.86</td>
<td>90.41</td>
<td>85.20</td>
<td>87.73</td>
<td>92.41</td>
<td>89.09</td>
<td>90.72</td>
<td>92.93</td>
<td>87.86</td>
<td>90.33</td>
</tr>
<tr>
<td>TS-GCN (BERT<sub>LARGE</sub>)</td>
<td><bold>91.61</bold></td>
<td>89.78</td>
<td>86.82</td>
<td><bold>88.28</bold></td>
<td>93.37</td>
<td>90.33</td>
<td><bold>91.81</bold></td>
<td>93.76</td>
<td>88.56</td>
<td><bold>91.09</bold></td>
</tr>
</tbody>
</table>
<table-wrap-foot><fn id="table-2fn">
<p>Note: <sup>1</sup> Marks re-implemented results from Alt et al. [<xref ref-type="bibr" rid="ref-47">47</xref>]. <sup>2</sup> Marks re-implemented results from Stoica et al. [<xref ref-type="bibr" rid="ref-48">48</xref>].
<sup>3</sup> Marks re-implemented results from Zhou et al. [<xref ref-type="bibr" rid="ref-13">13</xref>]. <sup>4</sup> Marks our re-implemented results.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>In <xref ref-type="fig" rid="fig-2">Fig. 2</xref>, we present the F1 score progression of TS-GCN as the number of epochs increases. TS-GCN reaches convergence faster than the A-GCN baseline model, a significant advantage because the model can be trained more efficiently, saving both time and resources. The figure also shows that TS-GCN converges faster than A-GCN on all four datasets we tested, confirming that TS-GCN is a robust and effective model for relation extraction.</p>
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>Comparison of F1 scores on the four datasets during training with the BERT-Large encoder</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_47811-fig-2.tif"/>
</fig>
<p>Overall, all evaluations demonstrate that TS-GCN is a powerful and efficient model for RE. Its ability to converge quickly and achieve a higher F1 score than the A-GCN baseline makes it an excellent choice for relation extraction.</p>
</sec>
<sec id="s4_3">
<label>4.3</label>
<title>Ablation Study</title>
<p>To further analyze TS-GCN, we conduct an ablation study on our best model to examine the effectiveness of each component on the four datasets. Compared to previous RE models that apply GCNs, TS-GCN enhances semantic exploration in two aspects: 1) a bi-directional long short-term memory (Bi-LSTM) network enriches the POS representations and enhances sensitivity to context; 2) multi-head dynamic structure attention weights different textual structure information to reduce the interference of dependency tree noise. The best model includes two TS-GCN layers and 4 dynamic structure attention heads, and utilizes dependency type and POS information.</p>
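<p>As a simplified, hypothetical sketch of the second component (not the authors' implementation), multi-head structure attention can be viewed as attention restricted to dependency-graph edges, with random matrices standing in for learned parameters; <monospace>structure_attention</monospace> and its signature are illustrative only:</p>

```python
import numpy as np

def structure_attention(H, A, n_heads=4, seed=0):
    """Toy multi-head structure attention: each head scores every
    dependency edge in A (assumed to include self-loops), normalizes
    the scores with a softmax over each node's neighbors, and
    aggregates neighbor features; head outputs are averaged."""
    rng = np.random.default_rng(seed)
    n, d = H.shape
    head_outputs = []
    for _ in range(n_heads):
        W = rng.standard_normal((d, d)) / np.sqrt(d)  # stand-in for a learned weight
        scores = H @ W @ H.T                          # pairwise edge scores
        scores = np.where(A > 0, scores, -np.inf)     # keep only tree edges
        att = np.exp(scores - scores.max(axis=1, keepdims=True))
        att /= att.sum(axis=1, keepdims=True)         # softmax over neighbors
        head_outputs.append(att @ H)                  # weighted aggregation
    return np.mean(head_outputs, axis=0)
```

<p>Each attention row sums to one over that node's dependency neighbors, so the heads differ only in how they redistribute weight along tree edges.</p>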
<p><xref ref-type="table" rid="table-3">Table 3</xref> shows the experimental results for the different modules, including the GCN baseline and the BERT-only baseline for reference. The results indicate that ablating any module degrades performance. In particular, removing the multi-head dynamic structure attention module significantly impairs TS-GCN: without its guided learning, TS-GCN becomes susceptible to dependency tree noise and struggles to learn correct features.</p>
<table-wrap id="table-3">
<label>Table 3</label>
<caption>
<title>The ablation study results (F1) of TS-GCN with and without the dynamic structure attention mechanism (G-ATT) and the POS Bi-LSTM. &#x2018;&#x2713;&#x2019; and &#x2018;&#x2715;&#x2019; indicate whether a module is used</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th></th>
<th>Bi-LSTM</th>
<th>G-ATT</th>
<th>SemEval</th>
<th>TACRED</th>
<th>TACREV</th>
<th>Re-TACRED</th>
</tr>
</thead>
<tbody>
<tr>
<td>BERT<sub>BASE</sub></td>
<td align="center" colspan="2">GCN</td>
<td>88.62</td>
<td>83.35</td>
<td>86.21</td>
<td>86.75</td>
</tr>
<tr>
<td rowspan="3">TS-GCN</td>
<td>&#x2713;</td>
<td>&#x2713;</td>
<td>89.86</td>
<td>87.73</td>
<td>90.10</td>
<td>90.33</td>
</tr>
<tr>
<td>&#x2715;</td>
<td>&#x2713;</td>
<td>89.21</td>
<td>86.51</td>
<td>89.55</td>
<td>89.87</td>
</tr>
<tr>
<td>&#x2713;</td>
<td>&#x2715;</td>
<td>88.03</td>
<td>85.79</td>
<td>88.13</td>
<td>88.36</td>
</tr>
<tr>
<td></td>
<td align="center" colspan="2">Only BERT</td>
<td>87.87</td>
<td>71.56</td>
<td>79.33</td>
<td>85.91</td>
</tr>
<tr>
<td>BERT<sub>LARGE</sub></td>
<td align="center" colspan="2">GCN</td>
<td>89.13</td>
<td>84.95</td>
<td>86.68</td>
<td>87.02</td>
</tr>
<tr>
<td rowspan="3">TS-GCN</td>
<td>&#x2713;</td>
<td>&#x2713;</td>
<td>91.61</td>
<td>88.28</td>
<td>91.81</td>
<td>91.09</td>
</tr>
<tr>
<td>&#x2715;</td>
<td>&#x2713;</td>
<td>89.56</td>
<td>87.22</td>
<td>90.22</td>
<td>90.26</td>
</tr>
<tr>
<td>&#x2713;</td>
<td>&#x2715;</td>
<td>88.39</td>
<td>86.98</td>
<td>89.88</td>
<td>89.18</td>
</tr>
<tr>
<td></td>
<td align="center" colspan="2">Only BERT</td>
<td>89.02</td>
<td>72.95</td>
<td>81.31</td>
<td>86.72</td>
</tr>
</tbody>
</table>
</table-wrap>
<p><xref ref-type="table" rid="table-4">Table 4</xref> shows the experimental results for different combinations of textual structure information: dependency types (Dep), POS labels (POS), and NER labels (NER). The results indicate that adding more types of textual structure information improves performance. Multiple types of textual structure information are important to TS-GCN, especially when the input contains noise. Without the POS and NER features, the F1 score of TS-GCN with BERT-Base decreases by 0.88%, 1.96%, and 1.12% on the three datasets, while with BERT-Large it decreases by 0.89%, 1.53%, and 0.55%. This illustrates that our method effectively mitigates the impact of dependency tree noise on context learning, leading to improved relation extraction results.</p>
<table-wrap id="table-4">
<label>Table 4</label>
<caption>
<title>The ablation study results (F1) of TS-GCN on the textual structure information module</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th></th>
<th>Dep</th>
<th>POS</th>
<th>NER</th>
<th>TACRED</th>
<th>TACREV</th>
<th>Re-TACRED</th>
</tr>
</thead>
<tbody>
<tr>
<td>BERT<sub>BASE</sub></td>
<td>&#x2713;</td>
<td>&#x2713;</td>
<td>&#x2713;</td>
<td>88.24</td>
<td>91.71</td>
<td>91.04</td>
</tr>
<tr>
<td rowspan="2">TS-GCN</td>
<td>&#x2713;</td>
<td>&#x2713;</td>
<td>&#x2715;</td>
<td>87.73</td>
<td>90.72</td>
<td>90.33</td>
</tr>
<tr>
<td>&#x2713;</td>
<td>&#x2715;</td>
<td>&#x2715;</td>
<td>87.36</td>
<td>89.75</td>
<td>89.92</td>
</tr>
<tr>
<td>BERT<sub>LARGE</sub></td>
<td>&#x2713;</td>
<td>&#x2713;</td>
<td>&#x2713;</td>
<td>88.72</td>
<td>92.26</td>
<td>91.38</td>
</tr>
<tr>
<td rowspan="2">TS-GCN</td>
<td>&#x2713;</td>
<td>&#x2713;</td>
<td>&#x2715;</td>
<td>88.28</td>
<td>91.81</td>
<td>91.09</td>
</tr>
<tr>
<td>&#x2713;</td>
<td>&#x2715;</td>
<td>&#x2715;</td>
<td>87.83</td>
<td>90.73</td>
<td>90.83</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s4_4">
<label>4.4</label>
<title>Case Study</title>
<p>To investigate the effect of the number of heads in dynamic structure attention on TS-GCN, we conducted a case study using our TS-GCN models with different numbers of dynamic structure attention heads.</p>
<p><xref ref-type="table" rid="table-5">Table 5</xref> shows the experimental results with different numbers of dynamic structure attention heads: 1, 2, 4, 8, and 12. We observe that the 4-head configuration outperforms the 1-head and 2-head configurations. Furthermore, compared to the 8-head and 12-head configurations, it requires fewer computing resources while achieving comparable performance, so using 4 heads improves the training efficiency of our model. Overall, we conclude that the optimal number of heads for multi-head dynamic structure attention is 4.</p>
<table-wrap id="table-5">
<label>Table 5</label>
<caption>
<title>The case study results (F1) for multi-head dynamic structure attention (BERT-Large); S is the training memory (in GB) consumed by TS-GCN</title>
</caption>
<table frame="hsides">
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Head Num</th>
<th align="center" colspan="2">SemEval</th>
<th align="center" colspan="2">TACRED</th>
<th align="center" colspan="2">TACREV</th>
<th align="center" colspan="2">Re-TACRED</th>
</tr>
<tr>
<td></td>
<td>F<sub>1</sub></td>
<td>S (GB)</td>
<td>F<sub>1</sub></td>
<td>S (GB)</td>
<td>F<sub>1</sub></td>
<td>S (GB)</td>
<td>F<sub>1</sub></td>
<td>S (GB)</td>
</tr>
</thead>
<tbody>
<tr>
<td>1 head</td>
<td>89.37</td>
<td>2.4</td>
<td>86.10</td>
<td>3.6</td>
<td>89.05</td>
<td>2.9</td>
<td>89.08</td>
<td>3.2</td>
</tr>
<tr>
<td>2 head</td>
<td>90.16</td>
<td>5.1</td>
<td>87.09</td>
<td>7.7</td>
<td>90.33</td>
<td>6.6</td>
<td>90.09</td>
<td>6.9</td>
</tr>
<tr>
<td>4 head</td>
<td>91.61</td>
<td>9.1</td>
<td>88.28</td>
<td>15.9</td>
<td>91.81</td>
<td>14.5</td>
<td>91.09</td>
<td>15.3</td>
</tr>
<tr>
<td>8 head</td>
<td>91.36</td>
<td>14.0</td>
<td>88.36</td>
<td>21.4</td>
<td>91.73</td>
<td>19.8</td>
<td>90.40</td>
<td>20.0</td>
</tr>
<tr>
<td>12 head</td>
<td>91.46</td>
<td>20.6</td>
<td>87.94</td>
<td>30.0</td>
<td>90.82</td>
<td>28.3</td>
<td>91.11</td>
<td>28.8</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>To investigate the effect of the number of layers in TS-GCN on RE, we conducted a case study by training our model with different numbers of layers.</p>
<p><xref ref-type="table" rid="table-6">Table 6</xref> shows the experimental results on the test datasets for TS-GCN with 1, 2, 3, and 4 layers. We observe that TS-GCN with 2 layers outperforms the other configurations. We attribute this result to the sensitivity of the multi-head dynamic structure attention weights to the number of convolutional layers. With only 1 layer, TS-GCN struggles to learn deep contextual features. Conversely, when the number of layers exceeds 2, the multi-head dynamic structure attention weights become averaged out, which makes the model less sensitive to noise in the input text. Overall, we conclude that the optimal configuration for TS-GCN is 2 layers.</p>
<table-wrap id="table-6">
<label>Table 6</label>
<caption>
<title>The case study results (F1) on varying the number of layers in TS-GCN (BERT-Large); S is the training memory (in GB) consumed by TS-GCN</title>
</caption>
<table frame="hsides">
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Layers Num</th>
<th align="center" colspan="2">SemEval</th>
<th align="center" colspan="2">TACRED</th>
<th align="center" colspan="2">TACREV</th>
<th align="center" colspan="2">Re-TACRED</th>
</tr>
<tr>
<td></td>
<td>F<sub>1</sub></td>
<td>S (GB)</td>
<td>F<sub>1</sub></td>
<td>S (GB)</td>
<td>F<sub>1</sub></td>
<td>S (GB)</td>
<td>F<sub>1</sub></td>
<td>S (GB)</td>
</tr>
</thead>
<tbody>
<tr>
<td>1 Layer</td>
<td>88.67</td>
<td>4.7</td>
<td>87.16</td>
<td>7.5</td>
<td>90.00</td>
<td>6.1</td>
<td>88.76</td>
<td>7.4</td>
</tr>
<tr>
<td>2 Layers</td>
<td>91.61</td>
<td>9.1</td>
<td>88.28</td>
<td>15.9</td>
<td>91.81</td>
<td>14.5</td>
<td>91.09</td>
<td>15.3</td>
</tr>
<tr>
<td>3 Layers</td>
<td>90.87</td>
<td>14.5</td>
<td>88.03</td>
<td>23.6</td>
<td>91.22</td>
<td>21.8</td>
<td>90.71</td>
<td>22.9</td>
</tr>
<tr>
<td>4 Layers</td>
<td>89.60</td>
<td>18.8</td>
<td>87.77</td>
<td>31.1</td>
<td>90.96</td>
<td>30.1</td>
<td>89.85</td>
<td>30.7</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>To investigate the resistance of TS-GCN to dependency tree noise, we conducted a case study by randomly masking some of the dependency edges in the test sets.</p>
<p><xref ref-type="table" rid="table-7">Table 7</xref> shows the test results of the best trained model on noisy test sets, in which 5%, 10%, and 20% of the dependency edges were randomly removed, respectively. The results indicate that when the noise proportion is at most 10%, there is no significant decrease in model performance. We believe that textual structure information enhances the capacity of our model for self-correction. Furthermore, the dynamic structure attention mechanism adapts the contextual attention weights based on distinct information characteristics, thereby mitigating the interference of dependency tree noise in RE. Overall, we conclude that TS-GCN has strong resilience to noisy features.</p>
<table-wrap id="table-7">
<label>Table 7</label>
<caption>
<title>The case study results (F1) on the noise resistance of TS-GCN (BERT-Large)</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th>Mask Pct</th>
<th>SemEval</th>
<th>TACRED</th>
<th>TACREV</th>
<th>Re-TACRED</th>
</tr>
<tr>
<td></td>
<td>F<sub>1</sub></td>
<td>F<sub>1</sub></td>
<td>F<sub>1</sub></td>
<td>F<sub>1</sub></td>
</tr>
</thead>
<tbody>
<tr>
<td>0%</td>
<td>91.61</td>
<td>88.28</td>
<td>91.81</td>
<td>91.09</td>
</tr>
<tr>
<td>5%</td>
<td>90.49</td>
<td>88.04</td>
<td>90.97</td>
<td>89.24</td>
</tr>
<tr>
<td>10%</td>
<td>89.87</td>
<td>87.84</td>
<td>89.58</td>
<td>88.21</td>
</tr>
<tr>
<td>20%</td>
<td>88.91</td>
<td>85.47</td>
<td>84.52</td>
<td>86.76</td>
</tr>
</tbody>
</table>
</table-wrap>
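<p>The edge-masking procedure used for this noise test can be sketched as follows; <monospace>mask_dependency_edges</monospace> is a hypothetical helper reflecting one plausible reading of the setup, not the authors' exact code:</p>

```python
import numpy as np

def mask_dependency_edges(adj, mask_pct, seed=0):
    """Randomly remove a fraction of the dependency edges from a
    symmetric adjacency matrix, mirroring the noise-resistance test."""
    rng = np.random.default_rng(seed)
    adj = adj.copy()
    rows, cols = np.nonzero(np.triu(adj, k=1))   # undirected edges, no self-loops
    n_drop = int(len(rows) * mask_pct)
    dropped = rng.choice(len(rows), size=n_drop, replace=False)
    for i in dropped:
        adj[rows[i], cols[i]] = 0                # remove both directions
        adj[cols[i], rows[i]] = 0
    return adj
```

<p>Applying this at 5%, 10%, and 20% to the test-set dependency trees would reproduce the three noise settings in Table 7.</p>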
</sec>
</sec>
<sec id="s5">
<label>5</label>
<title>Conclusion</title>
<p>In this paper, we propose a graph convolutional network embedding textual structure information for relation extraction. We transform the task into a multi-information graph structure problem by incorporating different sequence information into graph nodes, and propose a TS-GCN model that utilizes a dynamic structure attention mechanism to learn the importance of contextual information on dependency tree paths. This attention-learning process is dynamic and selectively highlights important path information according to the composition of structural information features. Furthermore, we assign different learning weights to all information graph structures to reduce the impact of noise generated during information graph construction on relation extraction. Experiments are conducted on the popular TACRED, TACREV, Re-TACRED, and SemEval 2010 Task 8 datasets. The results demonstrate that TS-GCN surpasses the best existing GCN-based models on all four datasets. We show that TS-GCN is a multiple-structure attention method, which underscores the importance of textual structure information in relation extraction. To validate our approach, we conduct ablation experiments on the proposed dynamic structure attention mechanism and the additional textual structure information. The experimental results show that increasing the types of information mitigates the impact of dependency noise on relation extraction, and that dynamic structure attention improves the ability of the model to learn multiple kinds of structure information. However, the size of the TS-GCN model increases significantly as the number of attention heads and graph convolution layers grows, while the performance gradually levels off. Although our model achieves satisfactory results in relation extraction with graph neural networks, there is still significant room for future work.
Specifically, we plan to propose a more generalizable model template that minimizes the training cost when introducing new textual structure information. Another meaningful direction is to compress the existing TS-GCN model to reduce computational costs: the dependency tree matrix is often sparse yet incurs large computational costs, which presents a challenging but important problem. We would also like to explore large language models (LLMs) combined with textual structure information to learn contextual features and further enhance relation extraction performance. In addition, relation extraction can be combined with technologies such as knowledge graphs to provide technical support for practical problems in many industrial fields. For example, it can help construct a knowledge graph of industrial parts or manufacturing processes and infer whether a part is qualified from external information such as its shape and size. Such research can provide application directions for GCN-based relation extraction methods and promote the further development of relation extraction technology.</p>
</sec>
</body>
<back>
<ack>
<p>Not applicable.</p>
</ack>
<sec><title>Funding Statement</title>
<p>The authors received no specific funding for this study.</p>
</sec>
<sec><title>Author Contributions</title>
<p>Study conception and design: Chuyuan Wei, Jinzhe Li; data collection: Zhiyuan Wang; analysis and interpretation of results: Jinzhe Li, Shanshan Wan; draft manuscript preparation: Chuyuan Wei, Maozu Guo. All authors reviewed the results and approved the final version of the manuscript.</p>
</sec>
<sec sec-type="data-availability"><title>Availability of Data and Materials</title>
<p>The TACRED data used in this study are available for purchase at <ext-link ext-link-type="uri" xlink:href="https://catalog.ldc.upenn.edu/LDC2018T24">https://catalog.ldc.upenn.edu/LDC2018T24</ext-link>. The SemEval data used in this study are publicly available at <ext-link ext-link-type="uri" xlink:href="https://huggingface.co/datasets/sem_eval_2010_task_8">https://huggingface.co/datasets/sem_eval_2010_task_8</ext-link>.</p>
</sec>
<sec sec-type="COI-statement"><title>Conflicts of Interest</title>
<p>The authors declare that they have no conflicts of interest to report regarding the present study.</p>
</sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>Sun</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Mensah</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Mao</surname></string-name>, and <string-name><given-names>X.</given-names> <surname>Liu</surname></string-name></person-group>, &#x201C;<article-title>Aspect-level sentiment analysis via convolution over dependency tree</article-title>,&#x201D; in <conf-name>Proc. EMNLP-IJCNLP</conf-name>, <publisher-loc>Hong Kong, China</publisher-loc>, <year>2019</year>, pp. <fpage>5679</fpage>&#x2013;<lpage>5688</lpage>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Khan</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Boudjellal</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Ahmad</surname></string-name>, and <string-name><given-names>M.</given-names> <surname>Khan</surname></string-name></person-group>, &#x201C;<article-title>Improving sentiment analysis in election-based conversations on twitter with elecbert language model</article-title>,&#x201D; <source>Comput. Mater. Contin.</source>, vol. <volume>76</volume>, no. <issue>3</issue>, pp. <fpage>3345</fpage>&#x2013;<lpage>3361</lpage>, <year>2023</year>. doi: <pub-id pub-id-type="doi">10.32604/cmc.2023.041520</pub-id>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>Xu</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Reddy</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Feng</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Huang</surname></string-name>, and <string-name><given-names>D.</given-names> <surname>Zhao</surname></string-name></person-group>, &#x201C;<article-title>Question answering on Freebase via relation extraction and textual evidence</article-title>,&#x201D; in <conf-name>Proc. ACL</conf-name>, <publisher-loc>Berlin, Germany</publisher-loc>, <year>2016</year>, pp. <fpage>2326</fpage>&#x2013;<lpage>2336</lpage>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>L.</given-names> <surname>Wang</surname></string-name> and <string-name><given-names>C.</given-names> <surname>Cardie</surname></string-name></person-group>, &#x201C;<article-title>Focused meeting summarization via unsupervised relation extraction</article-title>,&#x201D; in <conf-name>Proc. SIGDIAL</conf-name>, <publisher-loc>Seoul, South Korea</publisher-loc>, <year>2012</year>, pp. <fpage>304</fpage>&#x2013;<lpage>313</lpage>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Zeng</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Lai</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Zhou</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Zhao</surname></string-name></person-group>, &#x201C;<article-title>Relation classification via convolutional deep neural network</article-title>,&#x201D; in <conf-name>Proc. COLING</conf-name>, <publisher-loc>Dublin, Ireland</publisher-loc>, <year>2014</year>, pp. <fpage>2335</fpage>&#x2013;<lpage>2344</lpage>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Zhang</surname></string-name> and <string-name><given-names>D.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>Relation classification via recurrent neural network</article-title>,&#x201D; <comment>arXiv preprint arXiv: 1508.01006</comment>, <year>2015</year>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Xu</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Mou</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Peng</surname></string-name> and <string-name><given-names>Z.</given-names> <surname>Jin</surname></string-name></person-group>, &#x201C;<article-title>Classifying relations via long short term memory networks along shortest dependency paths</article-title>,&#x201D; in <conf-name>Proc. EMNLP</conf-name>, <publisher-loc>Lisbon, Portugal</publisher-loc>, <year>2015</year>, pp. <fpage>1785</fpage>&#x2013;<lpage>1794</lpage>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>dos Santos</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Xiang</surname></string-name>, and <string-name><given-names>B.</given-names> <surname>Zhou</surname></string-name></person-group>, &#x201C;<article-title>Classifying relations by ranking with convolutional neural networks</article-title>,&#x201D; in <conf-name>Proc. ACL-IJCNLP</conf-name>, <publisher-loc>Beijing, China</publisher-loc>, <year>2015</year>, pp. <fpage>626</fpage>&#x2013;<lpage>634</lpage>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Zheng</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Hu</surname></string-name>, and <string-name><given-names>M.</given-names> <surname>Yang</surname></string-name></person-group>, &#x201C;<article-title>Bidirectional long short-term memory networks for relation classification</article-title>,&#x201D; in <conf-name>Proc. PACLIC</conf-name>, <publisher-loc>Shanghai, China</publisher-loc>, <year>2015</year>, pp. <fpage>73</fpage>&#x2013;<lpage>78</lpage>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>L.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Cao</surname></string-name>, <string-name><given-names>G.</given-names> <surname>de Melo</surname></string-name>, and <string-name><given-names>Z.</given-names> <surname>Liu</surname></string-name></person-group>, &#x201C;<article-title>Relation classification via multi-level attention CNNs</article-title>,&#x201D; in <conf-name>Proc. ACL</conf-name>, <publisher-loc>Berlin, Germany</publisher-loc>, <year>2016</year>, pp. <fpage>1298</fpage>&#x2013;<lpage>1307</lpage>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>P.</given-names> <surname>Zhou</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Attention-based bidirectional long short-term memory networks for relation classification</article-title>,&#x201D; in <conf-name>Proc. ACL</conf-name>, <publisher-loc>Berlin, Germany</publisher-loc>, <year>2016</year>, pp. <fpage>207</fpage>&#x2013;<lpage>212</lpage>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>V.</given-names> <surname>Zhong</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Angeli</surname></string-name>, and <string-name><given-names>C. D.</given-names> <surname>Manning</surname></string-name></person-group>, &#x201C;<article-title>Position-aware attention and supervised data improve slot filling</article-title>,&#x201D; in <conf-name>Proc. EMNLP</conf-name>, <publisher-loc>Copenhagen, Denmark</publisher-loc>, <year>2017</year>, pp. <fpage>35</fpage>&#x2013;<lpage>45</lpage>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>W.</given-names> <surname>Zhou</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Chen</surname></string-name></person-group>, &#x201C;<article-title>An improved baseline for sentence-level relation extraction</article-title>,&#x201D; in <conf-name>Proc. AACL-IJCNLP</conf-name>, <year>2022</year>, pp. <fpage>161</fpage>&#x2013;<lpage>168</lpage>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Miwa</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Bansal</surname></string-name></person-group>, &#x201C;<article-title>End-to-end relation extraction using LSTMs on sequences and tree structures</article-title>,&#x201D; in <conf-name>Proc. ACL</conf-name>, <publisher-loc>Berlin, Germany</publisher-loc>, <year>2016</year>, pp. <fpage>1105</fpage>&#x2013;<lpage>1116</lpage>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Qi</surname></string-name>, and <string-name><given-names>C. D.</given-names> <surname>Manning</surname></string-name></person-group>, &#x201C;<article-title>Graph convolution over pruned dependency trees improves relation extraction</article-title>,&#x201D; in <conf-name>Proc. EMNLP</conf-name>, <publisher-loc>Brussels, Belgium</publisher-loc>, <year>2018</year>, pp. <fpage>2205</fpage>&#x2013;<lpage>2215</lpage>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>Sun</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Mao</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Mensah</surname></string-name>, and <string-name><given-names>X.</given-names> <surname>Liu</surname></string-name></person-group>, &#x201C;<article-title>Relation extraction with convolutional network over learnable syntax-transport graph</article-title>,&#x201D; in <conf-name>Proc. AAAI</conf-name>, <publisher-loc>New York, USA</publisher-loc>, <year>2020</year>, vol. <volume>34</volume>, pp. <fpage>8928</fpage>&#x2013;<lpage>8935</lpage>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>G.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Tian</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Song</surname></string-name>, and <string-name><given-names>X.</given-names> <surname>Wan</surname></string-name></person-group>, &#x201C;<article-title>Relation extraction with type-aware map memories of word dependencies</article-title>,&#x201D; in <conf-name>Proc. ACL-IJCNLP</conf-name>, <year>2021</year>, pp. <fpage>2501</fpage>&#x2013;<lpage>2512</lpage>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Guo</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Zhang</surname></string-name>, and <string-name><given-names>W.</given-names> <surname>Lu</surname></string-name></person-group>, &#x201C;<article-title>Attention guided graph convolutional networks for relation extraction</article-title>,&#x201D; in <conf-name>Proc. ACL</conf-name>, <publisher-loc>Florence, Italy</publisher-loc>, <year>2019</year>, pp. <fpage>241</fpage>&#x2013;<lpage>251</lpage>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Mandya</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Bollegala</surname></string-name>, and <string-name><given-names>F.</given-names> <surname>Coenen</surname></string-name></person-group>, &#x201C;<article-title>Graph convolution over multiple dependency sub-graphs for relation extraction</article-title>,&#x201D; in <conf-name>Proc. COLING</conf-name>, <publisher-loc>Barcelona, Spain</publisher-loc>, <year>2020</year>, pp. <fpage>6424</fpage>&#x2013;<lpage>6435</lpage>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Tian</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Song</surname></string-name>, and <string-name><given-names>X.</given-names> <surname>Wan</surname></string-name></person-group>, &#x201C;<article-title>Dependency-driven relation extraction with attentive graph convolutional networks</article-title>,&#x201D; in <conf-name>Proc. ACL-IJCNLP</conf-name>, <year>2021</year>, pp. <fpage>4458</fpage>&#x2013;<lpage>4471</lpage>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>B.</given-names> <surname>Yu</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Xue</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Wang</surname></string-name>, and <string-name><given-names>B.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>Learning to prune dependency trees with rethinking for neural relation extraction</article-title>,&#x201D; in <conf-name>Proc. COLING</conf-name>, <publisher-loc>Barcelona, Spain</publisher-loc>, <year>2020</year>, pp. <fpage>3842</fpage>&#x2013;<lpage>3852</lpage>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Hao</surname></string-name>, <string-name><given-names>X. S.</given-names> <surname>Tang</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Cai</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Xiao</surname></string-name> and <string-name><given-names>T.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>A multi-feature fusion model for Chinese relation extraction with entity sense</article-title>,&#x201D; <source>Knowl.-Based Syst.</source>, vol. <volume>206</volume>, pp. <fpage>106348</fpage>, <year>2020</year>. doi: <pub-id pub-id-type="doi">10.1016/j.knosys.2020.106348</pub-id>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Javeed</surname></string-name></person-group>, &#x201C;<article-title>A hybrid attention mechanism for multi-target entity relation extraction using graph neural networks</article-title>,&#x201D; <source>Mach. Learn. Appl.</source>, vol. <volume>11</volume>, pp. <fpage>100444</fpage>, <year>2023</year>. doi: <pub-id pub-id-type="doi">10.1016/j.mlwa.2022.100444</pub-id>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Q.</given-names> <surname>Wan</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Du</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Fang</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Wei</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Liu</surname></string-name></person-group>, &#x201C;<article-title>A hybrid attention mechanism for multi-target entity relation extraction using graph neural networks</article-title>,&#x201D; <source>Mach. Learn. Appl.</source>, vol. <volume>11</volume>, pp. <fpage>2666</fpage>&#x2013;<lpage>8270</lpage>, <year>2023</year>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Liao</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Du</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Hu</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Li</surname></string-name> and <string-name><given-names>X.</given-names> <surname>Chen</surname></string-name></person-group>, &#x201C;<article-title>A contextual dependency-aware graph convolutional network for extracting entity relations</article-title>,&#x201D; <source>Expert Syst. Appl.</source>, vol. <volume>239</volume>, pp. <fpage>122366</fpage>, <year>2024</year>. doi: <pub-id pub-id-type="doi">10.1016/j.eswa.2023.122366</pub-id>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>Du</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Yang</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Chang</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Li</surname></string-name> and <string-name><given-names>G.</given-names> <surname>Yi</surname></string-name></person-group>, &#x201C;<article-title>Relation extraction for manufacturing knowledge graphs based on feature fusion of attention mechanism and graph convolution network</article-title>,&#x201D; <source>Knowl.-Based Syst.</source>, vol. <volume>255</volume>, pp. <fpage>109703</fpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.1016/j.knosys.2022.109703</pub-id>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>N.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Ren</surname></string-name>, and <string-name><given-names>H.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>Document-level relation extraction with multi-layer heterogeneous graph attention network</article-title>,&#x201D; <source>Eng. Appl. Artif. Intell.</source>, vol. <volume>123</volume>, no. <issue>4</issue>, pp. <fpage>106212</fpage>, <year>2023</year>. doi: <pub-id pub-id-type="doi">10.1016/j.engappai.2023.106212</pub-id>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Jia</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Wu</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Liu</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Tan</surname></string-name></person-group>, &#x201C;<article-title>Dual attention graph convolutional network for relation extraction</article-title>,&#x201D; <source>IEEE Trans. Knowl. Data Eng.</source>, vol. <volume>36</volume>, no. <issue>2</issue>, pp. <fpage>1588</fpage>&#x2013;<lpage>2191</lpage>, <year>2023</year>. doi: <pub-id pub-id-type="doi">10.1109/TKDE.2023.3289879</pub-id>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Wu</surname></string-name>, <string-name><given-names>X.</given-names> <surname>You</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Xian</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Pu</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Qiao</surname></string-name> and <string-name><given-names>C.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>Towards deep understanding of graph convolutional networks for relation extraction</article-title>,&#x201D; <source>Data Knowl. Eng.</source>, vol. <volume>149</volume>, pp. <fpage>102265</fpage>, <year>2023</year>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>N.</given-names> <surname>Kambhatla</surname></string-name></person-group>, &#x201C;<article-title>Combining lexical, syntactic, and semantic features with maximum entropy models for information extraction</article-title>,&#x201D; in <conf-name>Proc. ACL Interact. Poster Demon. Sessions, Assoc. Comput. Linguist.</conf-name>, <publisher-loc>Barcelona, Spain</publisher-loc>, <year>2004</year>, pp. <fpage>178</fpage>&#x2013;<lpage>181</lpage>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>G.</given-names> <surname>Zhou</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Su</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Zhang</surname></string-name>, and <string-name><given-names>M.</given-names> <surname>Zhang</surname></string-name></person-group>, &#x201C;<article-title>Exploring various knowledge in relation extraction</article-title>,&#x201D; in <conf-name>Proc. ACL, Ann Arbor</conf-name>, <publisher-loc>Michigan, USA</publisher-loc>, <year>2005</year>, pp. <fpage>427</fpage>&#x2013;<lpage>434</lpage>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Zelenko</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Aone</surname></string-name>, and <string-name><given-names>A.</given-names> <surname>Richardella</surname></string-name></person-group>, &#x201C;<article-title>Kernel methods for relation extraction</article-title>,&#x201D; <source>J. Mach. Learn. Res.</source>, vol. <volume>3</volume>, pp. <fpage>1083</fpage>&#x2013;<lpage>1106</lpage>, <year>2003</year>.</mixed-citation></ref>
<ref id="ref-33"><label>[33]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Aone</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Halverson</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Hampton</surname></string-name>, and <string-name><given-names>M.</given-names> <surname>Ramos-Santacruz</surname></string-name></person-group>, &#x201C;<article-title>SRA: Description of the IE<sub>2</sub> system used for MUC-7</article-title>,&#x201D; in <conf-name>Proc. MUC-7</conf-name>, <publisher-loc>Virginia, USA</publisher-loc>, <year>Apr. 29&#x2013;May 1, 1998</year>.</mixed-citation></ref>
<ref id="ref-34"><label>[34]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Joshi</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>D. S.</given-names> <surname>Weld</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Zettlemoyer</surname></string-name> and <string-name><given-names>O.</given-names> <surname>Levy</surname></string-name></person-group>, &#x201C;<article-title>SpanBERT: Improving pre-training by representing and predicting spans</article-title>,&#x201D; <source>Trans. Assoc. Comput. Linguist.</source>, vol. <volume>8</volume>, pp. <fpage>64</fpage>&#x2013;<lpage>77</lpage>, <year>2020</year>. doi: <pub-id pub-id-type="doi">10.1162/tacl_a_00300</pub-id>.</mixed-citation></ref>
<ref id="ref-35"><label>[35]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>Ding</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>A knowledge-enriched and span-based network for joint entity and relation extraction</article-title>,&#x201D; <source>Comput. Mater. Contin.</source>, vol. <volume>68</volume>, no. <issue>1</issue>, pp. <fpage>377</fpage>&#x2013;<lpage>389</lpage>, <year>2021</year>. doi: <pub-id pub-id-type="doi">10.32604/cmc.2021.016301</pub-id>.</mixed-citation></ref>
<ref id="ref-36"><label>[36]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Zeng</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Xiao</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Dai</surname></string-name>, and <string-name><given-names>A.</given-names> <surname>Kumar Sangaiah</surname></string-name></person-group>, &#x201C;<article-title>Distant supervised relation extraction with cost-sensitive loss</article-title>,&#x201D; <source>Comput. Mater. Contin.</source>, vol. <volume>60</volume>, no. <issue>3</issue>, pp. <fpage>1251</fpage>&#x2013;<lpage>1261</lpage>, <year>2019</year>. doi: <pub-id pub-id-type="doi">10.32604/cmc.2019.06100</pub-id>.</mixed-citation></ref>
<ref id="ref-37"><label>[37]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>L.</given-names> <surname>Yin</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Meng</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Li</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Sun</surname></string-name></person-group>, &#x201C;<article-title>Relation extraction for massive news texts</article-title>,&#x201D; <source>Comput. Mater. Contin.</source>, vol. <volume>60</volume>, no. <issue>1</issue>, pp. <fpage>275</fpage>&#x2013;<lpage>285</lpage>, <year>2019</year>. doi: <pub-id pub-id-type="doi">10.32604/cmc.2019.05556</pub-id>.</mixed-citation></ref>
<ref id="ref-38"><label>[38]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Tian</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Song</surname></string-name>, and <string-name><given-names>F.</given-names> <surname>Xia</surname></string-name></person-group>, &#x201C;<article-title>Improving relation extraction through syntax-induced pre-training with dependency masking</article-title>,&#x201D; in <conf-name>Find. Assoc. Comput. Linguist.: ACL 2022, Assoc. Comput. Linguist.</conf-name>, <publisher-loc>Dublin, Ireland</publisher-loc>, <year>2022</year>, pp. <fpage>1875</fpage>&#x2013;<lpage>1886</lpage>.</mixed-citation></ref>
<ref id="ref-39"><label>[39]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Wan</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>GPT-RE: In-context learning for relation extraction using large language models</article-title>,&#x201D; <comment>arXiv preprint arXiv:2305.02105</comment>, <year>2023</year>.</mixed-citation></ref>
<ref id="ref-40"><label>[40]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Xu</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Zhu</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Wang</surname></string-name>, and <string-name><given-names>N.</given-names> <surname>Zhang</surname></string-name></person-group>, &#x201C;<article-title>How to unleash the power of large language models for few-shot relation extraction?</article-title>&#x201D; in <conf-name>Proc. SustaiNLP</conf-name>, <publisher-loc>Toronto, Canada</publisher-loc>, <year>2023</year>, pp. <fpage>190</fpage>&#x2013;<lpage>200</lpage>.</mixed-citation></ref>
<ref id="ref-41"><label>[41]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Ozyurt</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Feuerriegel</surname></string-name>, and <string-name><given-names>C.</given-names> <surname>Zhang</surname></string-name></person-group>, &#x201C;<article-title>In-context few-shot relation extraction via pre-trained language models</article-title>,&#x201D; <comment>arXiv preprint arXiv:2310.11085</comment>, <year>2023</year>.</mixed-citation></ref>
<ref id="ref-42"><label>[42]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Peng</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Model tuning or prompt tuning? A study of large language models for clinical concept and relation extraction</article-title>,&#x201D; <comment>arXiv preprint arXiv:2310.06239</comment>, <year>2023</year>.</mixed-citation></ref>
<ref id="ref-43"><label>[43]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Yang</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Harnessing the power of LLMs in practice: A survey on ChatGPT and beyond</article-title>,&#x201D; <comment>arXiv preprint arXiv:2304.13712</comment>, <year>2023</year>.</mixed-citation></ref>
<ref id="ref-44"><label>[44]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Longpre</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>The flan collection: Designing data and methods for effective instruction tuning</article-title>,&#x201D; <comment>arXiv preprint arXiv:2301.13688</comment>, <year>2023</year>.</mixed-citation></ref>
<ref id="ref-45"><label>[45]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Devlin</surname></string-name>, <string-name><given-names>M. W.</given-names> <surname>Chang</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Lee</surname></string-name>, and <string-name><given-names>K.</given-names> <surname>Toutanova</surname></string-name></person-group>, &#x201C;<article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>,&#x201D; in <conf-name>Proc. NAACL</conf-name>, <publisher-loc>Minneapolis, Minnesota, USA</publisher-loc>, <year>2019</year>, pp. <fpage>4171</fpage>&#x2013;<lpage>4186</lpage>.</mixed-citation></ref>
<ref id="ref-46"><label>[46]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>I.</given-names> <surname>Hendrickx</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>SemEval-2010 Task 8: Multi-way classification of semantic relations between pairs of nominals</article-title>,&#x201D; in <conf-name>Proc. SemEval</conf-name>, <publisher-loc>Uppsala, Sweden</publisher-loc>, <year>2010</year>, pp. <fpage>33</fpage>&#x2013;<lpage>38</lpage>.</mixed-citation></ref>
<ref id="ref-47"><label>[47]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Alt</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Gabryszak</surname></string-name>, and <string-name><given-names>L.</given-names> <surname>Hennig</surname></string-name></person-group>, &#x201C;<article-title>TACRED revisited: A thorough evaluation of the TACRED relation extraction task</article-title>,&#x201D; in <conf-name>Proc. ACL</conf-name>, <year>2020</year>, pp. <fpage>1558</fpage>&#x2013;<lpage>1569</lpage>.</mixed-citation></ref>
<ref id="ref-48"><label>[48]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>G.</given-names> <surname>Stoica</surname></string-name>, <string-name><given-names>E. A.</given-names> <surname>Platanios</surname></string-name>, and <string-name><given-names>B.</given-names> <surname>Poczos</surname></string-name></person-group>, &#x201C;<article-title>Re-TACRED: Addressing shortcomings of the TACRED dataset</article-title>,&#x201D; in <conf-name>Proc. AAAI</conf-name>, <year>2021</year>, vol. <volume>35</volume>, no. <issue>15</issue>, pp. <fpage>13843</fpage>&#x2013;<lpage>13850</lpage>. doi: <pub-id pub-id-type="doi">10.1609/aaai.v35i15.17631</pub-id>.</mixed-citation></ref>
<ref id="ref-49"><label>[49]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>L. B.</given-names> <surname>Soares</surname></string-name>, <string-name><given-names>N.</given-names> <surname>FitzGerald</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Ling</surname></string-name>, and <string-name><given-names>T.</given-names> <surname>Kwiatkowski</surname></string-name></person-group>, &#x201C;<article-title>Matching the blanks: Distributional similarity for relation learning</article-title>,&#x201D; in <conf-name>Proc. ACL</conf-name>, <publisher-loc>Florence, Italy</publisher-loc>, <year>2019</year>, pp. <fpage>2895</fpage>&#x2013;<lpage>2905</lpage>.</mixed-citation></ref>
<ref id="ref-50"><label>[50]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>D. P.</given-names> <surname>Kingma</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Ba</surname></string-name></person-group>, &#x201C;<article-title>Adam: A method for stochastic optimization</article-title>,&#x201D; <comment>arXiv preprint arXiv:1412.6980</comment>, <year>2014</year>.</mixed-citation></ref>
</ref-list>
</back></article>