A Two-Phase Paradigm for Joint Entity-Relation Extraction

Span-based models have been extensively investigated for the joint entity and relation extraction task. However, these models sample a large number of negative entities and negative relations during training; such negative examples are essential, but they produce grossly imbalanced data distributions and in turn cause suboptimal model performance. To address this issue, we propose a two-phase paradigm for span-based joint entity and relation extraction: the first phase identifies entities and relations, and the second phase predicts their types. The two-phase paradigm enables our model to significantly reduce the data distribution gap, including the gap between negative entities and other entities, as well as the gap between negative relations and other relations. In addition, we make the first attempt to combine entity type and entity distance as global features, which proves effective, especially for relation extraction. Experimental results on several datasets demonstrate that the span-based joint extraction model augmented with the two-phase paradigm and the global features consistently outperforms previous state-of-the-art span-based models, establishing new state-of-the-art results. Qualitative and quantitative analyses further validate the effectiveness of the proposed paradigm and the global features.


Introduction
Span-based joint entity and relation extraction models simultaneously conduct NER (Named Entity Recognition) and RE (Relation Extraction) over text spans. Typically, these models work as follows: given an unstructured text, the model divides it into text spans; it then constructs ordered span pairs (a.k.a. relation tuples); and finally, it obtains entities and relations by performing classifications on the semantic representations of spans and relation tuples, respectively. We present a typical case in Fig. 1: "In", "In 1831", and "James Garfield" are three span examples; <"James Garfield", "U.S."> and <"James Garfield", "Ohio"> are two relation tuple examples; a span-based model predicts the types of spans and relation tuples by performing classifications on the related semantic representations. For instance, "In" is classified as the Not-Entity type, and <"James Garfield", "Ohio"> is classified as the Live type. Span-based joint extraction models [2-7] sample numerous negative entities and relations (i.e., spans of the Not-Entity type and relation tuples of the Not-Relation type) during model training. These negative examples lead to grossly imbalanced data distributions, which is one of the primary reasons for suboptimal model performance. As shown in Tab. 1, the entity distribution between Other and Not-Entity is 592:101555 (approximately 1:172), and the relation distribution between Kill and Not-Relation is 229:12915 (approximately 1:56). Paradoxically, previous work [1] demonstrates that an adequate number of negative examples is required to ensure that the model performs well. Thus, resolving the issue of grossly imbalanced data distributions while maintaining an adequate number of negative examples is a feasible way to improve model performance.

Global features, such as those derived from entity information, can be critical in the joint extraction task. As illustrated in Fig. 1, if SpERT knows beforehand that "James Garfield" is a person (Per) entity and "U.S." is a location (Loc) entity, it may easily classify <"James Garfield", "U.S."> into the Live type. Moreover, entity distance, which counts the words between two entities, can reflect the entities' correlation. For example, in the CoNLL04 dataset, relations with an entity distance of less than 6 account for 64.5%, and the smaller the distance, the more likely the two entities have a relation. However, as far as we know, previous work [8-12] has used either entity type or entity distance but not both. The combination of these two types of information may play a more important role in the joint extraction task. As shown in Tab. 2, <Loc, Loc> tends to have the LocIn relation when the entity distance is small, e.g., 76.6% for [0-3], 12.8% for [4-7], and 3.5% for [8-11], whereas <Per, Per> tends to have the Kill relation at larger entity distances, e.g., 21.3% for [0-3], 33.5% for [4-7], and 26.7% for [8-11].
In this paper, we propose a two-phase span-based model for the joint extraction task, with the goal of addressing the grossly imbalanced data distributions and the lack of effective global features. Motivated by the fact that NER (RE) can be achieved in two steps, namely first identifying all entities (relations) and then predicting their types, we divide the joint extraction task into two phases: the first phase obtains entities and relations, and the second phase predicts their types. The two-phase paradigm reduces the data distribution gap by dozens of times. Take the data in Tab. 1 as an example: (1) in the first phase, the entity distribution can be reduced to 1:24 and the relation distribution to 1:8, whereas the corresponding values in SpERT are 1:172 and 1:56, respectively; (2) in the second phase, our model predicts the types of entities and relations, where the data distributions are roughly even. Moreover, we attempt for the first time to combine entity type and entity distance as global features and use them to augment our model. Furthermore, we propose a gated mechanism for fusing various semantic representations, taking the weighted importance of each representation into account. In Section 4.5, we validate the effectiveness of these model components.

Table 2:
Entity distance statistics of the CoNLL04 dataset. We use the ordered entity type tuple to denote all ordered relation tuples of the same type, e.g., <Per, Loc> denotes all relation tuples whose first entity is of the Per type and whose second entity is of the Loc type. We divide all distances into four distance intervals, i.e., [0-3], [4-7], [8-11], and [≥12].

In summary, our model differs from previous span-based models in three ways: (1) As far as we know, our model makes the first attempt to balance the grossly imbalanced data distributions. (2) Our model combines entity type and entity distance as global features, whereas previous span-based models use at most one of them. (3) Our model uses a gated mechanism to fuse various semantic representations, whereas previous span-based models use simple concatenation.

Span-based Joint Entity and Relation Extraction
Recently, span-based models have been extensively investigated for the joint entity and relation extraction task. Luan et al. [2] propose one of the first span-based joint models and attempt to further improve model performance by incorporating the coreference resolution task [13,14]. Luan et al. [4] also include the coreference resolution task in their span-based joint model. Moreover, some other span-based models [5] have examined how to incorporate additional natural language processing tasks, such as event detection [15,16]. More recently, Dixit and Al-Onaizan [3] introduce a pre-trained language model, i.e., ELMo (Embeddings from Language Models) [17], into the span-based joint model for the first time. Eberts and Ulges [1] propose to use BERT (Bidirectional Encoder Representation from Transformers) [18] as the backbone of their span-based joint model. Zhong and Chen [7] propose to use ALBERT (A Lite BERT) [19] in their span-based joint model. However, these models suffer from grossly imbalanced data distributions, as the span-based paradigm requires extensive negative entities and relations. Although our model also samples a large number of negative examples, we propose a two-phase paradigm that effectively reduces the data distribution gap.

Global Features
Entity type and entity distance are two important global features that are frequently used in joint extraction models [20-27]. Miwa and Bansal [8], Sun and Grishman [28], and Bekoulis et al. [9] are among the first to use entity types as global features in their joint extraction models; they concatenate fixed-size embeddings trained for entity types to relation semantic representations. Zhao et al. [10] model strong correlations between entity labels and text tokens and concatenate entity label embeddings to relation semantic representations. For entity distance, Zeng et al. [11] and Ye et al. [12] concatenate relative entity position features to relation semantic representations. However, the above models use either entity type or entity distance but make no attempt to combine them. In contrast, our model combines entity type and entity distance as global features, which we validate to be more effective.

Model
The neural architecture of our two-phase span-based model is illustrated in Fig. 2. For a given unstructured text T = (t_1, t_2, …, t_n), where t_i denotes the i-th text token, our model first obtains its BERT embedding sequence (Section 3.1). Then, in Phase One, our model obtains entities and relations by performing binary classifications on the semantic representations of spans and relation tuples, respectively; these entities and relations are referred to as coarse-grained entities and relations (Section 3.2). Next, in Phase Two, our model predicts the types of these coarse-grained entities and relations, obtaining fine-grained entities and relations (Section 3.3). In both phases, we combine entity type and entity distance as global features and use a gated mechanism to fuse various semantic representations.

Embedding Layer
Our model uses BERT [18] as the word embedding generator. We denote the BERT embedding sequence for the text T as E = (e_0, e_1, …, e_n), where E ∈ ℝ^{(n+1)×d} and d is the BERT embedding dimension. e_0 is the BERT embedding of the added [CLS] token, which is a built-in setting of the BERT model, and e_i is the BERT embedding of the token t_i. Because BERT may tokenize a token into several sub-tokens to avoid the Out-of-Vocabulary (OOV) problem, we obtain e_i by applying the max-pooling function to the BERT embeddings of the sub-tokens tokenized from t_i.
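To make the sub-token pooling concrete, the following sketch shows one way to collapse sub-token embeddings into per-token embeddings with max-pooling; the function and tensor names (and the use of PyTorch) are illustrative assumptions rather than the released implementation.

```python
import torch

def pool_subtokens(subtoken_embeddings, token_spans):
    """Collapse BERT sub-token embeddings into one embedding per original token.

    subtoken_embeddings: tensor of shape (num_subtokens, d) produced by BERT.
    token_spans: list of (start, end) sub-token index ranges (end exclusive),
                 one per original token, e.g. from the tokenizer's word alignment.
    Returns a tensor of shape (num_tokens, d).
    """
    pooled = []
    for start, end in token_spans:
        # Element-wise max over the sub-tokens belonging to this token.
        pooled.append(subtoken_embeddings[start:end].max(dim=0).values)
    return torch.stack(pooled)
```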

Phase One
As shown in Fig. 2, Phase One is composed of two modules: Entity Classification and Relation Classification, where the former obtains coarse-grained entities and the latter obtains coarse-grained relations.

Entity Classification
This module obtains coarse-grained entities by performing binary classification on span semantic representations. We begin by converting all entity types in the training set to the Entity type and setting the type of the sampled negative entities to the Not-Entity type. Our model is trained to classify a span as the Entity type if it is an entity, and as the Not-Entity type otherwise.
In this paper, we build the span semantic representation from three different types of representations: (1) the span token representation, (2) the contextual representation, and (3) the span width embedding.
In this paper, we take e_0 ∈ ℝ^d as the contextual representation for any span s from the text T.
The span width embedding allows the model to incorporate prior knowledge about span widths. During model training, we learn a fixed-size embedding for each span width (i.e., 1, 2, ...), and we refer to the width embedding of the span s (whose length is k+1) as w_{k+1}, where w_{k+1} ∈ ℝ^d. Intuitively, the span token representation ê_s should contribute the most to the span semantic representation, whereas w_{k+1} should contribute the least. However, previous work [1,4] overlooks this property and simply concatenates the above representations, which has been demonstrated to be insufficient [10]. In this paper, we propose a gated mechanism that weighs the importance of each representation. The span semantic representation (denoted as e'_s) is then obtained by summing the weighted representations:

g_i = σ(W_1ᵀ x_i + b_1),   e'_s = Σ_{i=1}^{n} g_i · x_i,   (5)

where W_1 ∈ ℝ^d, b_1 is a scalar, and {x_i, e'_s} ∈ ℝ^d. In the current scenario, n is set to 3, and x_1, x_2, and x_3 are ê_s, e_0, and w_{k+1}, respectively.
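For illustration, here is a minimal sketch of the gated mechanism in Eq. (5), under the assumption that a single shared linear scorer produces one scalar gate per input representation; the module and variable names are ours, not the authors' implementation.

```python
import torch
import torch.nn as nn

class GatedFusion(nn.Module):
    """Fuse n same-sized representations by summing them with learned scalar gates."""

    def __init__(self, dim):
        super().__init__()
        # Shared scorer playing the role of W_1 (in R^d) and the scalar bias b_1.
        self.scorer = nn.Linear(dim, 1)

    def forward(self, reps):
        # reps: list of n tensors, each of shape (batch, dim).
        stacked = torch.stack(reps, dim=1)           # (batch, n, dim)
        gates = torch.sigmoid(self.scorer(stacked))  # (batch, n, 1): one weight per representation
        return (gates * stacked).sum(dim=1)          # (batch, dim): weighted sum

# For the span representation, n = 3:
# span_rep = GatedFusion(d)([span_token_rep, cls_rep, width_emb])
```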
To obtain coarse-grained entities, we first pass e'_s through an FFN (Feed-Forward Network) and then feed the output into the sigmoid function, which yields a probability distribution for the span s over the above two types, i.e., Entity and Not-Entity:

ŷ_s = σ(W_2 e'_s + b_2),

where W_2 ∈ ℝ^{2×d} and b_2 ∈ ℝ^2 are trainable FFN parameters, and ŷ_s ∈ ℝ^2. By selecting the highest-scored class, ŷ_s determines whether s is a coarse-grained entity or not. We build the coarse-grained entity set with the predicted entities.

Relation Classification
This module obtains coarse-grained relations by performing binary classification on the semantic representations of relation tuples. We begin by converting all relation types in the training set to the Relation type and assigning the Not-Relation type to the sampled negative relations. Our model is trained to classify a relation tuple as the Relation type if it expresses a relation, and as the Not-Relation type otherwise.
We denote a relation tuple as an ordered span pair r = <s_1, s_2>. We obtain the semantic representation of r from four different types of representations, namely (1) the representation of s_1, (2) the representation of s_2, (3) the relation contextual representation, and (4) the global features. We use e'_{s_1} and e'_{s_2}, which are calculated with Eq. (5) in Section 3.2.1, as the representations of s_1 and s_2, respectively. The relation context is the text between the two entities of a relation tuple [29]. We denote the relation context of r as C = (t_j, t_{j+1}, …, t_{j+m}), so the BERT embedding sequence of C is (e_j, e_{j+1}, …, e_{j+m}). (8) We obtain the contextual representation of r (denoted as c_r) by applying the max-pooling function to this sequence:

c_r = [max(e_{j,1}, e_{j+1,1}, …, e_{j+m,1}), max(e_{j,2}, e_{j+1,2}, …, e_{j+m,2}), …, max(e_{j,d}, e_{j+1,d}, …, e_{j+m,d})], (9)

i.e., an element-wise maximum over the context embeddings. In this paper, we propose to combine entity type and entity distance as global features. Because all entities in Phase One are of the Entity type, only the entity distance can be used to distinguish different feature entries; as shown in Fig. 2, we refer to these as binary global features. During model training, we learn a fixed-size embedding for each feature entry and denote the feature embedding of r as f_r, where f_r ∈ ℝ^d.
We obtain the semantic representation of r (denoted as e_r) using the proposed gated mechanism, as in Eq. (5). In the current scenario, n is set to 4, and x_1, x_2, x_3, and x_4 are e'_{s_1}, e'_{s_2}, c_r, and f_r, respectively. To obtain coarse-grained relations, we first pass e_r through an FFN and then feed the output into the sigmoid function, which yields a probability distribution for r over the above two types, i.e., Relation and Not-Relation:

z_r = W_3 e_r + b_3, (10a)
ŷ_r = σ(z_r), (10b)

where W_3 ∈ ℝ^{2×d} and b_3 ∈ ℝ^2 are trainable FFN parameters, and ŷ_r ∈ ℝ^2. By selecting the highest-scored class, ŷ_r determines whether r expresses a relation or not. We build the coarse-grained relation set with the predicted relations.
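The sketch below assembles the coarse-grained relation representation as described above: the two span representations, a max-pooled context representation, and a distance-bucket embedding standing in for the binary global feature, fused with the GatedFusion sketch from Section 3.2.1 and scored by the binary head. All names and shapes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class CoarseRelationScorer(nn.Module):
    def __init__(self, dim, num_distance_buckets=11):
        super().__init__()
        # Binary global feature: one embedding per entity-distance bucket (f_r).
        self.distance_emb = nn.Embedding(num_distance_buckets, dim)
        self.fusion = GatedFusion(dim)       # from the earlier sketch
        self.classifier = nn.Linear(dim, 2)  # Relation vs. Not-Relation

    def forward(self, head_rep, tail_rep, context_embs, distance_bucket):
        # context_embs: (batch, ctx_len, dim) BERT embeddings of the tokens between the two spans.
        context_rep = context_embs.max(dim=1).values      # element-wise max-pooling (Eq. (9))
        dist_rep = self.distance_emb(distance_bucket)     # (batch, dim)
        rel_rep = self.fusion([head_rep, tail_rep, context_rep, dist_rep])
        return torch.sigmoid(self.classifier(rel_rep))    # scores for the two classes
```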

Training Loss of Phase One
For each of the above two binary classifications, the training objective is to minimize the following binary cross-entropy loss:

L_c = - (1/N_c) Σ_{i=1}^{N_c} y_i^c · log ŷ_i^c,

where c denotes one of the two classifications, y_i^c is the one-hot vector of the gold type, ŷ_i^c is the predicted probability distribution, and N_c is the number of instances for the classification c.

Phase Two
In Phase Two, our model predicts the types of the coarse-grained entities and relations, obtaining fine-grained entities and relations. As illustrated in Fig. 2, Phase Two is composed of two modules: Entity Type Prediction and Relation Type Prediction.

Entity Type Prediction
In this module, we obtain entity types by conducting multi-class classification on the semantic representations of coarse-grained entities. Specifically, for each coarse-grained entity s in the coarse-grained entity set, we denote its semantic representation as e'_s ∈ ℝ^d, which is obtained in the same way as the span semantic representation in Section 3.2.1. To obtain the type of s, we first pass e'_s through an FFN and then feed the output into the softmax function, which yields a probability distribution for s over Ω, the set of all pre-defined entity types:

ŷ_s^Ω = softmax(W_4 e'_s + b_4),

where W_4 ∈ ℝ^{|Ω|×d} and b_4 ∈ ℝ^{|Ω|} are trainable FFN parameters, and |Ω| is the number of pre-defined entity types. By selecting the highest-scored class, ŷ_s^Ω assigns a pre-defined entity type to s.

Relation Type Prediction
We obtain relation types by performing multi-class classification on relation semantic representations. As shown in Fig. 2, the relation semantic representation is derived from two parts: the relation representation used for the binary relation classification, and the multi-class global features.
For each coarse-grained relation r in the coarse-grained relation set, we denote its representation used for the binary relation classification as e_r, obtained with the approach described in Section 3.2.2. As shown in Fig. 2, we combine the entity type and entity distance as the multi-class global features, formulated as

F = Ω ⨂ Ω ⨂ Δ,

where Ω is the set of pre-defined entity types, Δ is the set of entity distances, and ⨂ denotes the Cartesian product. During model training, we learn a fixed-size embedding for each feature entry in F, and we denote the feature embedding of r as f'_r ∈ ℝ^d.
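As a rough sketch, the multi-class global feature can be implemented as one embedding per entry of Ω ⨂ Ω ⨂ Δ, i.e., per (head entity type, tail entity type, distance bucket) triple; the flattening scheme and names below are our assumptions.

```python
import torch.nn as nn

class MultiClassGlobalFeature(nn.Module):
    """One learned embedding per (head type, tail type, distance bucket) combination."""

    def __init__(self, num_entity_types, num_distance_buckets, dim):
        super().__init__()
        self.num_entity_types = num_entity_types
        self.num_distance_buckets = num_distance_buckets
        self.table = nn.Embedding(
            num_entity_types * num_entity_types * num_distance_buckets, dim)

    def forward(self, head_type, tail_type, distance_bucket):
        # Flatten the triple into a single row index of the feature table.
        index = (head_type * self.num_entity_types + tail_type) \
            * self.num_distance_buckets + distance_bucket
        return self.table(index)
```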
We then obtain the fine-grained relation semantic representation (denoted as e'_r) from e_r and f'_r using Eq. (5). In the current scenario, n is set to 2, and x_1 and x_2 are e_r and f'_r, respectively.
To obtain the type of r, we first pass e'_r through an FFN and then feed the output into the softmax function, which yields a probability distribution for r over Ψ, the set of all pre-defined relation types:

ŷ_r^Ψ = softmax(W_5 e'_r + b_5),

where W_5 ∈ ℝ^{|Ψ|×d} and b_5 ∈ ℝ^{|Ψ|} are trainable FFN parameters, and |Ψ| is the number of pre-defined relation types. By selecting the highest-scored class, ŷ_r^Ψ assigns a pre-defined relation type to r.

Training Loss of Phase Two
For each of the above two multi-class classifications, the training objective is to minimize the following cross-entropy loss:

L_{c'} = - (1/N_{c'}) Σ_{i=1}^{N_{c'}} y_i^{c'} · log ŷ_i^{c'},

where c' denotes one of the two classifications, y_i^{c'} is the one-hot vector of the gold type, ŷ_i^{c'} is the predicted probability distribution, and N_{c'} is the number of instances for the classification c'.

Model Training
During model training, we minimize the following joint training loss:

L = Σ_c L_c + Σ_{c'} L_{c'},

where c ranges over the two binary classifications and c' over the two multi-class classifications.
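Under the assumption of uniform weighting, the joint objective simply sums the two Phase-One binary losses and the two Phase-Two multi-class losses; the helper below is a minimal sketch, not the released training code.

```python
def joint_loss(loss_entity_bin, loss_relation_bin, loss_entity_type, loss_relation_type):
    """Sum the Phase-One and Phase-Two losses (equal weights assumed for illustration)."""
    return (loss_entity_bin + loss_relation_bin
            + loss_entity_type + loss_relation_type)
```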
CoNLL04 defines four entity types (Loc, Org, Per, and Other) and five relation types (Kill, Live, LocIn, OrgBI, and Work).We use the splits defined by Ji et al. [6] and Wang et al. [25].The dataset consists of 910 instances for training, 243 for development and 288 for test.
SciERC is derived from 500 abstracts of AI papers.The dataset defines six scientific entities (Task, Method, Metric, Material, Other, and Generic) and seven relation types (Compare, Conjunction, Evaluate-for, Used-for, Feature-of, Part-of, and Hyponym-of) in a total of 2,687 sentences.We use the same training (1,861 sentences), development (275 sentences), and test (551 sentences) split following the previous work [3,34].

Experimental Setup
For a fair comparison with previous work, we use the bert-base-cased model on ACE05 and CoNLL04, and the scibert-scivocab-cased model on SciERC. We optimize our model with BertAdam for 120 epochs, with a learning rate of 5e-5 and a weight decay of 1e-2. We set the span width threshold to 10 for all datasets and the entity distance set Δ to {0, 1, …, 10}; if an entity distance is greater than 10, we clip it to 10. Moreover, we employ the same negative sampling strategy as Eberts and Ulges [1]. We use the standard Precision (P), Recall (R), and F1-score to evaluate model performance:

P = TP / (TP + FP),   R = TP / (TP + FN),   F1 = 2 · P · R / (P + R),

where TP, FP, and FN stand for true positives, false positives, and false negatives, respectively. For ACE05, an entity mention is considered correct if its head region and type match the ground truth, and a relation is correct if both its relation type and its two entity mentions are correct. For CoNLL04, an entity mention is considered correct if its offsets and type match the ground truth, and a relation is correct if both its relation type and its two entity mentions are correct. For SciERC, the entity type is not considered when evaluating relation extraction, in line with previous work [6,7]; the remaining settings are identical to those for CoNLL04.
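For completeness, here is a small helper that computes precision, recall, and F1 from raw TP/FP/FN counts in the standard way:

```python
def precision_recall_f1(tp, fp, fn):
    """Standard precision, recall, and F1 from true-positive, false-positive, and false-negative counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1
```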

Main Results
We compare our model with all published span-based models for the joint extraction task that we are aware of. We report the comparison results in Tab. 3, Tab. 4, and Tab. 5, from which we can observe that our model consistently outperforms the strongest baselines in terms of F1-score across the three datasets.
To be more precise, on ACE05, our model achieves +0.4% and +3.2% absolute F1 gains on NER and RE, respectively, compared to Ji et al. [6], which achieves the previous best NER performance. In addition, compared to Zhong and Chen [7], which achieves the previous best RE performance, our model achieves +1.3% and +1.4% absolute F1 gains on NER and RE, respectively. On CoNLL04, our model achieves +0.3% and +1.6% absolute F1 gains on NER and RE, respectively, compared to the strongest baseline, Ji et al. [6]. On SciERC, compared to Santosh et al. [35], which achieves the previous best NER performance, our model delivers +0.5% and +1.4% absolute F1 gains; compared to Zhang et al. [36], which achieves the previous best RE results, our model achieves +0.6% and +0.2% absolute F1 gains. We attribute these improvements to our model's ability to balance the grossly imbalanced data distributions and to exploit effective global features.

Effectiveness Investigations
We conduct extensive effectiveness investigations across the three datasets, using SpERT [1] as the baseline. SpERT is the model most similar to ours: it uses the BERT model as its backbone and two linear decoders for the entity and relation classifications. However, SpERT ignores global features and does not balance the imbalanced data distributions. For a fair comparison, our model employs the same negative sampling strategy as SpERT.

Data Distributions
As shown in Tab. 6, we compare our model with the baseline in terms of the most imbalanced data distributions. We obtain the data distributions on NER and RE by comparing the counts of different types of entities and relations, i.e., the smallest count vs. the largest count, measured during model training. We make the following observations: (1) On ACE05, the most imbalanced data distributions of the baseline are 1:773.3 on NER and 1:150.0 on RE, which our model reduces to 1:21.3 and 1:13.8, respectively. (2) On CoNLL04, the most imbalanced data distributions of the baseline are 1:171.5 on NER and 1:56.4 on RE, which our model reduces to 1:23.7 and 1:9.9, respectively. (3) On SciERC, the most imbalanced data distributions of the baseline are 1:605.3 on NER and 1:913.5 on RE, which our model reduces to 1:25.5 and 1:35.6, respectively.
Based on the above observations, we conclude that the two-phase paradigm allows our model to avoid suffering from grossly imbalanced data distributions.

Table 6:
Comparisons regarding the most imbalanced data distributions between our model and the baseline.

Effectiveness against Entity Length
In general, as entity lengths increase, it becomes increasingly difficult to recognize the entities. In this section, we investigate NER performance in relation to entity length. We divide all entity lengths, which are restricted by the span width threshold (set to 10), into five intervals, i.e., [1-2], [3-4], [5-6], [7-8], and [9-10]. We conduct the investigations on the dev sets of the three datasets and report the results in Fig. 3. We can observe that our model consistently outperforms the baseline across all length intervals on the three datasets. Moreover, our model obtains greater F1 gains as the entity length increases. To be more precise, our model achieves the greatest improvement on ACE05 when the entity length is in [7-8], and on CoNLL04 and SciERC when the entity length is in [9-10], suggesting that our model is more effective for long entities.

Effectiveness against Entity Distance
In general, as the distance between the two entities of a relation increases, the relation becomes more difficult to extract. In this section, we investigate RE performance in relation to entity distance. We divide all entity distances into five intervals, namely [0], [1-3], [4-6], [7-9], and [≥10]. We conduct the investigations on the dev sets of the three datasets and report the results in Fig. 4. The results demonstrate that our model beats the baseline across all distance intervals. Specifically, our model obtains greater improvement as the distance increases, demonstrating that our model is more effective in the case of long entity distances.

Ablation Study
We conduct ablation studies on the dev sets of the three datasets to analyze the effects of various model components. We report the ablation results in Tab. 7, where "w/o Two-Phase" denotes ablating the two-phase paradigm; as a result, the model can no longer balance the imbalanced data distributions and cannot make use of the binary global features, but it retains the multi-class global features. "w/o Bi-Features" denotes ablating the binary global features, which is realized by removing f_r from e_r. "w/o Multi-Features" denotes ablating the multi-class global features, which is realized by removing f'_r from e'_r. "w/o Both-Features" denotes conducting the "w/o Bi-Features" and "w/o Multi-Features" ablations simultaneously. "w/o Gated" denotes ablating the gated mechanism and instead concatenating the various semantic representations. "base" denotes conducting all of the above ablations; the resulting model has the same neural architecture as SpERT.

Figure 1 :
Figure 1: An example of span-based joint extraction. Loc and Per are two pre-defined entity types, and Live is a pre-defined relation type. × denotes that <"James Garfield", "U.S.">, which should be classified into the Live type, is actually classified into the Not-Relation type by SpERT [1], a Span-based joint Entity-Relation extraction model with Transformer.

Figure 2 :
Figure 2: Neural architecture of the proposed model. In Phase One, the model classifies entities and relations; in Phase Two, the model predicts their types. In both phases, the model combines entity type and entity distance as global features.

Figure 3 :
Figure 3: NER performance (F1-score) comparison of our model and the baseline under various entity length intervals, which are tested on the dev sets of three datasets.

Figure 4 :
Figure 4: RE performance (F1-score) comparison of our model and the baseline under various entity distance intervals, which are tested on the dev sets of three datasets.

Table 1 :
The entity and relation counts in CoNLL04, obtained using SpERT. Other, Org, Per, and Loc are four pre-defined entity types; Kill, LocIn, Work, OrgBI, and Live are five pre-defined relation types; Not-Entity and Not-Relation are the types of negative entities and relations, respectively.

Table 3 :
Performance comparisons on ACE05. ⁎ denotes using the bert-base-cased model and the single-sentence setting for a fair comparison. The bold values denote the best results.

Table 4 :
Performance comparisons on CoNLL04.The bold values denote the best results.

Table 5 :
Performance comparisons on SciERC.⁑ denotes using the scibert-scivocab-cased model.The bold values denote the best results.

Table 7 :
Ablation results on the dev sets of the three datasets. We only report the F1-scores.

The binary and multi-class global features have a negligible effect on NER; a plausible explanation is that these features are derived from entity information and are employed only in relation extraction. The combination of the two types of global features results in improved RE performance, suggesting that they have a beneficial effect on one another. The proposed gated mechanism consistently improves model performance, bringing +0.2% to +1.5% F1-score gains on NER and +0.5% to +0.9% on RE, suggesting that it fuses the various semantic representations more effectively.

Conclusion
In this paper, we propose a two-phase span-based model for the joint entity and relation extraction task, aiming to tackle the grossly imbalanced data distributions caused by the essential negative sampling. We augment the proposed model with global features obtained by combining entity types and entity distances. Moreover, we propose a gated mechanism for effectively fusing various semantic representations. Experimental results on several datasets demonstrate that our model consistently outperforms the strongest span-based models for the joint extraction task, establishing new state-of-the-art results.