Future Event Prediction Based on Temporal Knowledge Graph Embedding

Accurate prediction of future events brings great benefits and reduces losses for society in many domains, such as civil unrest, pandemics, and crime. A knowledge graph is a general language for describing and modeling complex systems. Different types of events occur continually and are often related to historical and concurrent events. In this paper, we formalize future event prediction as a temporal knowledge graph reasoning problem. Most existing studies either conduct reasoning on static knowledge graphs or assume that the knowledge graphs of all timestamps are available during training. As a result, they cannot effectively reason over temporal knowledge graphs and predict events happening in the future. To address this problem, some recent works learn to infer future events from historical event-based temporal knowledge graphs. However, these methods do not comprehensively consider the latent patterns and influences behind historical events and concurrent events simultaneously. This paper proposes a new graph representation learning model, namely the Recurrent Event Graph ATtention Network (RE-GAT), based on a novel historical and concurrent events attention-aware mechanism that models the event knowledge graph sequence recurrently. More specifically, RE-GAT uses an attention-based historical events embedding module to encode past events, and employs an attention-based concurrent events embedding module to model the associations among events at the same timestamp. A translation-based decoder module and a learning objective are developed to optimize the embeddings of entities and relations. We evaluate our proposed method on four benchmark datasets. Extensive experimental results demonstrate the superiority of our RE-GAT model compared with various baselines, which proves that our method can more accurately predict which events are going to happen.


Introduction
From the 9/11 terrorist attacks to the COVID-19 pandemic, societal events often deeply affect people's daily lives and cause huge economic burdens. Predicting these events in advance is highly valuable for the risk perception and prevention capabilities of our society [1]. It is not surprising that computational social science has exploded in prominence as an active field driven by the need to analyze societal events [2]. With the advent of the big data era, computational social science now focuses more on social intelligence than on social information processing. This shift is achieved by capturing human social dynamics and modelling social behavior through existing big data [3].
Given the promising future of this field, considerable attention has been devoted to its development over the past decades. Relevant methods, systems, and event databases have been proposed in succession, e.g., the Integrated Crisis Early Warning System (ICEWS) [4], which helps US policy analysts predict international crises. Another example is Early Model Based Event Recognition using Surrogates (EMBERS) [5,6] for forecasting events including influenza-like illness, civil unrest, domestic political crises, and elections. Among existing research efforts, GDELT [7] has emerged as an interesting project because it is a free, open platform that monitors societal events across nearly all countries of the world in over 100 languages.
Recently, Knowledge Graphs (KGs) [8][9][10][11][12] have been widely used in many real-world applications. Since knowledge graphs can model and reflect real-world facts, the event prediction problem can be transformed into a missing-fact reasoning problem on KGs. Most existing research on knowledge reasoning is based on static knowledge graphs. In particular, an event is normally defined in the form of a triplet comprising an event subject, an event object, and the relation between them, i.e., (subject, relation, object). However, as facts are highly correlated with time, temporal knowledge graphs (TKGs) have been proposed to associate events with their corresponding timestamps, i.e., (subject, relation, object, time). Fig. 1 shows an example of event prediction on a temporal knowledge graph. We can see that events occur dynamically over time, as the relations (suppress or negotiation) between the same entities (Taliban and Government) differ across dates (2021/07/09 and 2021/08/13).
Reasoning tasks on temporal knowledge graphs are mainly divided into two types: interpolation and extrapolation [13]. Given a temporal knowledge graph whose timestamps vary from t_0 to t_T, interpolation reasoning predicts events for a time t with t_0 ≤ t ≤ t_T, while extrapolation reasoning focuses on predicting unseen events for t beyond t_T (i.e., t > t_T). Existing research attempts to solve the KG reasoning problem either with static knowledge graph embedding approaches or by simply extending static KG embedding methods with timestamps. Moreover, the latter mostly focuses on interpolation scenarios [14], or encodes the patterns associated with the occurrence of events using simple aggregation methods [13]. Thus, it is desirable to develop a more efficient and comprehensive method that can extrapolate future events by representing historical events through modeling local graphs within a time window.
In this work, we propose a Recurrent Event Graph ATtention Network (RE-GAT) to predict events in the extrapolation setting. Unlike traditional temporal knowledge graph embedding methods that neglect graph structure during representation learning, RE-GAT employs an attention-based historical events embedding module to encode past events and an attention-based concurrent events embedding module to model the associations among events at the same timestamp. A translation-based decoder module and a learning objective are used to optimize the embeddings of event entities and relations. The contributions of our work can be summarized as follows. We formalize the future event prediction problem as a temporal knowledge graph extrapolation reasoning problem. RE-GAT uses RNNs and GNNs to jointly encode temporal and structural event information from historical and concurrent events for predicting future events; in addition, we employ a novel attention mechanism to obtain better representations of the sophisticated patterns associated with events. We conduct extensive experiments on four real-world datasets, and the results demonstrate that our proposed method outperforms various state-of-the-art baselines.

Static Knowledge Graph Embeddings
Static knowledge graph embeddings, which ignore temporal information, have been extensively studied; they mostly aim to embed entities and relations into latent vector spaces. A class of methods focuses on translation-based modeling [15][16][17][18], representing the relation between two entities as a translation vector. RotatE [19] defines a relation as a rotation between entities in the vector space. Other models capture semantic information by using a triangular norm to measure the plausibility of facts [20,21]. There are also works based on deep neural networks [22][23][24]. However, these methods are not effective at predicting future events due to their inability to capture temporally dynamic facts.

Temporal Knowledge Graph Embedding
More recently, researchers have attempted to model facts that vary over time in temporal knowledge graphs. TTransE [25] extends the traditional TransE [16] to temporal scenarios by embedding temporal information into the score function. Similarly, HyTE [26] extends TransH [15]. DE-SimplE [27] combines entities and timestamps to generate time-specific representations. Despite good performance on their respective tasks, these methods do not take into consideration the long-term temporal relationships among real-world events. They assume that all timestamps and the corresponding knowledge graphs are available during training, and hence they are not able to predict events in the future.
Another line of work tries to model graph sequences to capture the long-term dependency of facts. DyREP [28] proposes a two-time-scale deep process to jointly model global and local topological evolution. The Historical Information Passing (HIP) network [29] models the evolution of event-based knowledge graphs by passing information from three perspectives (temporal, structural, and repetitive). RE-GCN [30], based on GCNs, models the knowledge graph sequence recurrently to learn a representation at each timestamp. CyGNet [14] proposes a time-aware copy-generation representation learning method to model temporal knowledge graphs. RE-NET [13] uses an RNN-based autoregressive architecture to perform inference over temporal knowledge graphs of event sequences.

Event Prediction
Traditional event prediction tasks are mainly viewed as classic machine learning classification problems, e.g., customer churn prediction [31], civil unrest [32], and adverse drug reactions [33]. However, not all event prediction tasks can be modeled as classification problems. With the development of graph techniques, events can be represented as nodes or links in a graph, and event prediction tasks can then be modeled as node/link prediction tasks. The Abstract Causality Network [34] embeds real-world events into a continuous vector space and predicts causal events by minimizing a defined energy function. A Dynamic Graph Convolution Network [35] has been proposed to provide contextual information alongside the predicted result, and has been further improved for multi-event prediction tasks [36]. Overall, event prediction via reasoning over temporal knowledge graphs remains relatively unexplored.

Problem Formulation
We first describe the notation for temporal knowledge graphs (TKGs), and then define the TKG reasoning problem.
An event-based TKG can be regarded as a sequence of static knowledge graphs (SKGs) sorted by event timestamp, i.e., G = {G_1, G_2, …, G_t, …}. Each SKG in G can be represented as G_t = (E, R, T), where E, R, and T denote the sets of event entities, event types, and timestamps, respectively. G_t consists of the set of events sharing the same timestamp t. An event in G_t is represented as a time-stamped quadruple (subject, relation, object, time), denoted by (s, r, o, t) ∈ G_t. This means that an event happened at timestamp t ∈ T between subject s ∈ E and object o ∈ E, with the event type denoted by the relation r ∈ R.
The future event prediction problem is formalized as predicting the event object or the event subject given the set of all historical events before timestamp t, namely (s, r, ?, t) or (?, r, o, t). We assume that the events at a time step t, i.e., G_t, depend on the events at the previous k time steps (i.e., {G_{t−k}, G_{t−k+1}, …, G_{t−1}}), denoted as G_{t−k:t−1}. We use H_t and R_t to denote the embedding matrices of event entities and event types at t, respectively. To predict an event at time t, the information of the historical KGs is embedded in the matrix of event subjects and objects H_{t−1} ∈ ℝ^{|E|×d_E} and the matrix of event types R_{t−1} ∈ ℝ^{|R|×d_R} at timestamp t − 1, where d_E and d_R are the dimensions of the event entity and event type vector representations, respectively. Given all past events, i.e., the historical event sequence G_{t−k:t−1}, we formulate the future event object prediction problem as a ranking problem. Given a future event prediction task (s, r, ?, t), our proposed RE-GAT model utilizes the event subject s, the event type r, and the past events G_{t−k:t−1} to calculate the conditional probability of all candidate event objects:

p(o | G_{t−k:t−1}, s, r) = p(o | H_{t−1}, R_{t−1}, s, r).   (1)

Similarly, we can define the problems of predicting the future event subject (?, r, o, t) and the event type (s, ?, o, t) as:

p(s | G_{t−k:t−1}, o, r) = p(s | H_{t−1}, R_{t−1}, o, r),   (2)

p(r | G_{t−k:t−1}, s, o) = p(r | H_{t−1}, R_{t−1}, s, o).   (3)
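To make the formulation concrete, the following Python sketch shows one way the event-based TKG and the history window G_{t−k:t−1} could be represented; the integer-ID quadruples, the grouping scheme, and the window size k are illustrative assumptions rather than the paper's actual data pipeline.

```python
# A minimal sketch of the data layout assumed above: an event-based TKG stored as
# timestamped quadruples (subject, relation, object, time) with integer IDs, grouped
# by timestamp so that a history window G_{t-k:t-1} can be sliced out for prediction.
# The example quadruples and the window size k are illustrative, not from the paper.
from collections import defaultdict

quadruples = [
    (0, 2, 1, 0),  # (s, r, o, t): entity 0 --relation 2--> entity 1 at time 0
    (1, 3, 0, 1),
    (0, 2, 3, 1),
    (2, 1, 0, 2),
]

# Group events into per-timestamp static KGs: G = {G_0, G_1, ...}.
graphs = defaultdict(list)
for s, r, o, t in quadruples:
    graphs[t].append((s, r, o))

def history_window(graphs, t, k):
    """Return the sequence G_{t-k:t-1} used to score queries such as (s, r, ?, t)."""
    return [graphs[i] for i in range(max(0, t - k), t)]

print(history_window(graphs, t=2, k=2))  # the two snapshots preceding timestamp 2
```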

The RE-GAT Model
We introduce our proposed RE-GAT model in this section. RE-GAT uses an attention-based recurrent neural network to encode the informative sequential patterns across historical events, and learns the local structural relations between concurrent events in the knowledge graph at each timestamp with an attention-based graph neural network representation mechanism. Based on the learned temporal embeddings of event subjects, event objects, and event types, future events at the subsequent timestamp can be predicted with a classical translation-based decoder. As shown in Fig. 2, RE-GAT consists of an entity and relation embedding encoder and a decoder. The former contains an attention-based concurrent events embedding module (translational-based GAT) and an attention-based historical events embedding module (time gate, GRU, attention, etc.) to encode the historical event KGs. The latter employs a translation-based score function for the corresponding entity prediction task.

Attention-Based Concurrent Events Embedding Module
To capture the information of concurrent events at the same timestamp, we use the attention-based concurrent events embedding module to encode the structural dependencies and associations among the entities and relations in these events. Since graph neural networks (GNNs) have strong expressive power for unstructured graph data [23,[37][38][39][40]] and different neighbors exert different influences in reality [41][42][43], we utilize an ω-layer graph attention network (GAT) to model the neighborhood information of concurrent events. To represent the inverse event type (relation) between event entities in our model, we add the inverse event quadruple (o, r^{−1}, s, t) at the same timestamp to the event-based KG for each event (s, r, o, t). Without loss of generality, we take the object prediction problem (s, r, ?, t) as an example. Specifically, for each knowledge graph at timestamp t, an event object o aggregates information at layer l ∈ [0, ω − 1] from the corresponding event subjects and event types in its quadruples under a graph attention network framework, and learns its vector representation at the (l + 1)-th layer as follows:

e^{(l)}_{os} = LeakyReLU( a^{(l)} · [ h^{(l)}_{o,t} || h^{(l)}_{s,t} ] ),   (5)

α^{(l)}_{os} = softmax_s( e^{(l)}_{os} ),   (6)

h^{(l+1)}_{o,t} = f( Σ_{(s,r): (s,r,o) ∈ G_t} α^{(l)}_{os} W^{(l)}_1 ( h^{(l)}_{s,t} + r_t ) + W^{(l)}_2 h^{(l)}_{o,t} ),   (7)

where h^{(l)}_{o,t}, h^{(l)}_{s,t}, and r_t denote the l-th layer vector embeddings of event object o, event subject s, and event type r at timestamp t, respectively; α^{(l)}_{os} is the learnable attention weight, and W^{(l)}_1 and W^{(l)}_2 are the learnable weight matrices in the l-th layer. We calculate a pairwise unnormalized attention score between the event subject and event object in Eq. (5), where || denotes concatenation: we first concatenate the vector representations of the event object and event subject, then take a dot product with a learnable weight vector a^{(l)}, and finally apply a LeakyReLU to the result. In Eq. (6), a softmax function normalizes the attention scores over all quadruples containing the event object entity. Similar to the aggregator of the classic graph convolutional network (GCN), the embeddings from concurrent events are multiplied by the attention scores, summed together, and added to the self-loop embedding in Eq. (7), where f(·) is an activation function. Note that each KG G_t is composed of a set of events occurring at the same timestamp. We use h^{(l)}_{s,t} + r_t in Eq. (7) to capture the translational relationship between the event subject, event type, and event object; this also implies d_E = d_R, which we abbreviate as d in the following.
The attention-based concurrent events embedding module thus obtains the vector representations (embeddings) of an event entity from the concurrent events occurring alongside the target event and from its own embedding. These operations can be interpreted as capturing the evolution and change of events.
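As an illustration of Eqs. (5)-(7), the following PyTorch sketch implements one plausible layer of the attention-based concurrent events embedding module; the scatter-softmax implementation, activation function, and tensor layout are our assumptions, not the authors' released code.

```python
# One layer of an attention-based concurrent events embedding module in the spirit of
# Eqs. (5)-(7): translation-style messages h_s + r are weighted by attention scores computed
# from [h_o || h_s] and summed with a self-loop term.
import torch
import torch.nn as nn
import torch.nn.functional as F

class ConcurrentEventAttentionLayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.w1 = nn.Linear(dim, dim, bias=False)        # W_1: transforms h_s + r messages
        self.w2 = nn.Linear(dim, dim, bias=False)        # W_2: self-loop transform
        self.attn = nn.Parameter(torch.randn(2 * dim))   # attention vector a^(l)

    def forward(self, ent_emb, rel_emb, edges):
        # edges: LongTensor of shape (num_events, 3) with rows (s, r, o) at one timestamp
        s, r, o = edges[:, 0], edges[:, 1], edges[:, 2]
        h_s, h_o, h_r = ent_emb[s], ent_emb[o], rel_emb[r]

        # Eq. (5): unnormalized score from the concatenated object/subject embeddings.
        score = F.leaky_relu(torch.cat([h_o, h_s], dim=-1) @ self.attn)

        # Eq. (6): softmax over all events sharing the same object entity (scatter softmax).
        exp = (score - score.max()).exp()                 # shift for numerical stability
        denom = torch.zeros(ent_emb.size(0)).index_add_(0, o, exp)
        alpha = exp / (denom[o] + 1e-16)

        # Eq. (7): attention-weighted sum of translated messages plus a self-loop term.
        msg = alpha.unsqueeze(-1) * self.w1(h_s + h_r)
        agg = torch.zeros_like(ent_emb).index_add_(0, o, msg)
        return F.relu(agg + self.w2(ent_emb))             # activation choice is an assumption

ent_emb, rel_emb = torch.randn(5, 16), torch.randn(3, 16)
edges = torch.tensor([[0, 1, 2], [3, 0, 2], [1, 2, 4]])
print(ConcurrentEventAttentionLayer(16)(ent_emb, rel_emb, edges).shape)  # torch.Size([5, 16])
```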

Attention-Based Historical Events Embedding Module
This module seeks to model the historical event patterns between entity pairs, encode the temporal information across time, and generate the temporal embeddings of entities and relations. For the event subject s and event object o in an event (s, r, o, t) or the inverse-type event (o, r^{−1}, s, t), the latent temporal features and patterns contained in the historical events imply historical trends and regularities. To cover as many temporal patterns of historical events as possible, the model needs to take the time sequence of events into account. Since the output of the attention-based concurrent events embedding module (translational-based GAT) in the final layer, i.e., h^{(ω)}_{o,t−1}, already encodes the vector representation of event objects at timestamp t − 1, one might consider using this output event entity embedding matrix H_{t−1} directly as the input of the translational-based GAT module at time t, namely H^{(0)}_t = H_{t−1}. However, this is equivalent to stacking all the ω-layer translational-based GAT modules together, which results in the over-smoothing problem [44]: the embeddings of event objects, event subjects, and event types converge to the same vector values. The large number of stacked translational-based GAT modules may also introduce the vanishing gradient problem, preventing the weights from being updated during training. Thus, following [30], we utilize a time gate component in our attention-based historical events embedding module to address these problems. The event entity matrix H_t is computed from the final-layer output H^{(ω)}_t of the attention-based concurrent events embedding module at timestamp t and the matrix H_{t−1} from the same module at timestamp t − 1. More specifically,

H_t = U_t ⊗ H^{(ω)}_t + (1 − U_t) ⊗ H_{t−1},   (8)

where ⊗ represents the element-wise product operation. The time gate matrix U_t ∈ ℝ^{|E|×d} applies a non-linear sigmoid transformation:

U_t = σ( W_3 H_{t−1} ),   (9)

where σ(·) denotes the sigmoid function and W_3 is the learnable weight matrix.
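The time gate of Eqs. (8)-(9) can be sketched as follows; the bias term and the exact parameterization of W_3 are assumptions based on our reading of the text and [30].

```python
# A minimal sketch of the time gate: the entity matrix at t interpolates between the fresh
# GAT output H^ω_t and the previous matrix H_{t-1} through a learned sigmoid gate.
import torch
import torch.nn as nn

class TimeGate(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.w3 = nn.Linear(dim, dim)  # W_3 plus a bias term (the bias is an assumption)

    def forward(self, h_prev, h_gat):
        u = torch.sigmoid(self.w3(h_prev))      # Eq. (9): gate matrix U_t in [0, 1]
        return u * h_gat + (1.0 - u) * h_prev   # Eq. (8): element-wise interpolation

gate = TimeGate(16)
h_prev = torch.randn(5, 16)   # H_{t-1}
h_gat = torch.randn(5, 16)    # H^ω_t from the concurrent events module
print(gate(h_prev, h_gat).shape)  # torch.Size([5, 16])
```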
To better capture the event representations, we employ a historical event attention mechanism that allows the module to dynamically select and linearly combine the embeddings of different historical events related to each relation [45]:

a_s = softmax( v_e^⊤ tanh( W_e h_{s,t−1} + U_e r_{t−1} ) ),

where v_e, W_e, and U_e are learnable parameters. The factors a_s determine which parts of the historical events should be emphasized or ignored when making predictions. The resulting relation embeddings r_t form the embedding matrix of relations at t, i.e., R_t.
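A minimal sketch of an additive attention of the kind suggested by the parameters v_e, W_e, and U_e is given below; which embeddings are attended over and how the output feeds into r_t are assumptions for illustration, since the mechanism is described here only in prose.

```python
# Additive attention over historical event embeddings: scores are normalized with a softmax
# and used to form a weighted combination that can update the relation embedding r_t.
import torch
import torch.nn as nn

class HistoricalEventAttention(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.w_e = nn.Linear(dim, dim, bias=False)  # W_e: transforms historical event embeddings
        self.u_e = nn.Linear(dim, dim, bias=False)  # U_e: transforms the previous relation embedding
        self.v_e = nn.Parameter(torch.randn(dim))   # v_e: projection to a scalar score

    def forward(self, hist_emb, r_prev):
        # hist_emb: (num_history, dim) embeddings of historical events related to r
        # r_prev:   (dim,) relation embedding at the previous timestamp
        scores = torch.tanh(self.w_e(hist_emb) + self.u_e(r_prev)) @ self.v_e
        a = torch.softmax(scores, dim=0)            # factors a_s over historical events
        return (a.unsqueeze(-1) * hist_emb).sum(0)  # attention-weighted summary feeding r_t

attn = HistoricalEventAttention(16)
print(attn(torch.randn(7, 16), torch.randn(16)).shape)  # torch.Size([16])
```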

Translation-Based Decoder Module
Traditional knowledge graph entity prediction methods [19,22,24,37] usually use a scoring function to measure the plausibility of quadruples given the embeddings, and they use training data consisting of positive and sampled negative quadruples to update the representations. Previous studies [22,24,37] have demonstrated that GNNs with convolutional score functions perform well on knowledge graph entity prediction tasks. To preserve the translational property of the vector representations in Eq. (7), ConvTransE [24] is chosen as the decoder model to compute the conditional probabilities in Eqs. (1), (2), and (3). Following [30], the probability of the event object is:

p(o | H_{t−1}, R_{t−1}, s, r) = σ( H_{t−1} ConvTransE( s_{t−1}, r_{t−1} ) ),

and, in the same way, the probability score of the event type is:

p(r | H_{t−1}, R_{t−1}, s, o) = σ( R_{t−1} ConvTransE( s_{t−1}, o_{t−1} ) ),

where σ(·) denotes the widely used sigmoid function and s_{t−1}, r_{t−1}, o_{t−1} represent the vector representations of event subject s, event type r, and event object o in H_{t−1} and R_{t−1} at timestamp t − 1, respectively.
Note that the ConvTransE model can be replaced by any other translation-based score functions or decoders. We omit the details of ConvTransE for brevity.
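For illustration, the following sketch shows a simplified ConvTransE-style decoder that stacks the subject and relation embeddings as two channels of a 1D convolution and scores the result against all entity embeddings; the kernel size, channel count, and omission of dropout and batch normalization are simplifications, so this should not be read as the authors' exact decoder.

```python
# A simplified translation-style convolutional decoder: (s, r) embeddings are fused by a
# 1D convolution, projected back to the embedding dimension, and scored against all entities.
import torch
import torch.nn as nn

class ConvTranslationDecoder(nn.Module):
    def __init__(self, dim, channels=32, kernel=3):
        super().__init__()
        self.conv = nn.Conv1d(2, channels, kernel, padding=kernel // 2)
        self.fc = nn.Linear(channels * dim, dim)

    def forward(self, sub_emb, rel_emb, all_ent_emb):
        # sub_emb, rel_emb: (batch, dim); all_ent_emb: (|E|, dim)
        x = torch.stack([sub_emb, rel_emb], dim=1)   # (batch, 2, dim): two input channels
        x = torch.relu(self.conv(x)).flatten(1)      # (batch, channels * dim)
        q = self.fc(x)                               # query vector per (s, r) pair
        return torch.sigmoid(q @ all_ent_emb.t())    # scores over all candidate objects

dec = ConvTranslationDecoder(16)
scores = dec(torch.randn(4, 16), torch.randn(4, 16), torch.randn(100, 16))
print(scores.shape)  # torch.Size([4, 100])
```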

Learning Objective
In this section, we discuss the training process of the RE-GAT model. The event object prediction problem (s, r, ?, t) can be regarded as a multi-class classification problem in which each class corresponds to one event object entity. Similarly, the subject entity prediction problem (?, r, o, t) can be treated as a multi-class classification task. Without loss of generality, we describe the future event prediction problem as predicting the event object in a time-stamped quadruple (s, r, ?, t); the model easily extends to predicting the event subject, i.e., (?, r, o, t).
Following [30], we use y^e_t ∈ ℝ^{|E|} and y^r_t ∈ ℝ^{|R|} to represent the label vectors for the event entity prediction task and the relation prediction task at timestamp t. The two task losses are then:

L^e = − Σ_{t=1}^{T} Σ_{i=0}^{|E|−1} y^e_{t,i} log p_i( o | H_{t−1}, R_{t−1}, s, r ),   (15)

L^r = − Σ_{t=1}^{T} Σ_{i=0}^{|R|−1} y^r_{t,i} log p_i( r | H_{t−1}, R_{t−1}, s, o ),   (16)

where T denotes the total number of event-based KG timestamps in the training dataset, and y^e_{t,i} and y^r_{t,i} denote the i-th elements of y^e_t and y^r_t, respectively. Note that the elements of y^e_t and y^r_t are 1 for events that do occur and 0 otherwise.
We use a multi-task learning framework [30,46] for the event entity prediction and event type prediction tasks. Therefore, the final loss is calculated as:

L = L^e + λ_1 L^r,

where λ_1 is the importance parameter. Its value can be chosen according to the task to control the relative importance of each component.
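Under the assumption that each query has a single ground-truth label, the multi-task objective can be sketched as follows; treating the labels as class indices rather than multi-hot vectors, and the default value of λ_1, are simplifications for illustration.

```python
# A minimal sketch of the multi-task objective: cross-entropy over candidate objects for the
# entity task plus a weighted cross-entropy over relations, combined via lambda_1.
import torch
import torch.nn.functional as F

def re_gat_loss(obj_logits, obj_targets, rel_logits, rel_targets, lambda_1=0.7):
    # obj_logits: (batch, |E|) scores over candidate objects
    # rel_logits: (batch, |R|) scores over candidate relations
    loss_entity = F.cross_entropy(obj_logits, obj_targets)     # entity prediction task L^e
    loss_relation = F.cross_entropy(rel_logits, rel_targets)   # relation prediction task L^r
    return loss_entity + lambda_1 * loss_relation               # weighting scheme is assumed

loss = re_gat_loss(torch.randn(4, 100), torch.randint(0, 100, (4,)),
                   torch.randn(4, 20), torch.randint(0, 20, (4,)))
print(loss.item())
```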

Experiments
We evaluate the performance of the RE-GAT model on four public event datasets in this section. First, we explain the experimental settings in detail, including the datasets and baselines; then we discuss the experimental results.

Experimental Setup
In this section, we compare the performance of our proposed model against various static knowledge graph embedding methods and some recent temporal knowledge graph models.
Evaluation Metrics. The methods are evaluated on the link prediction and relation prediction tasks, which measure whether the ground-truth event quadruple (fact) is ranked ahead of other candidate quadruples. We report the mean reciprocal rank (MRR) and hits at 1/3/10 (H@1/3/10) in our experiments. Hits at k (H@k) measures the proportion of correct quadruples that appear among the top k ranked quadruples. Many previous works remove corrupted event quadruples during evaluation, which is called the filtered setting. However, as mentioned in [30,49,50], removing all event quadruples that occur in the training, validation, or test sets from the ranking candidates ignores their timestamps and is therefore not suitable for temporal knowledge graph entity prediction tasks. To this end, we only report experimental results under the raw setting.
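For clarity, the raw-setting MRR and H@k metrics can be computed as in the sketch below; the tie-breaking convention (counting only strictly higher scores) is an assumption.

```python
# Raw-setting ranking metrics: for each test query, the rank of the ground-truth entity among
# all scored candidates gives the reciprocal rank, and Hits@k checks whether it is in the top k.
import torch

def mrr_and_hits(scores, targets, ks=(1, 3, 10)):
    # scores: (num_queries, num_entities); targets: (num_queries,) ground-truth entity ids
    target_scores = scores.gather(1, targets.unsqueeze(1))   # score of the true entity
    ranks = (scores > target_scores).sum(dim=1) + 1          # 1-based raw rank (no filtering)
    mrr = (1.0 / ranks.float()).mean().item()
    hits = {k: (ranks <= k).float().mean().item() for k in ks}
    return mrr, hits

scores = torch.randn(5, 50)
targets = torch.randint(0, 50, (5,))
print(mrr_and_hits(scores, targets))
```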

Experimental Results
Tab. 2 presents the future event entity prediction performance of RE-GAT and the baseline models on the four event-based datasets. The best scores are boldfaced and the second-best scores are underlined.
As we can see, static KGE methods perform much worse than RE-GAT since they cannot capture temporal event information. We can also observe that RE-GAT performs much better than HyTE and TA-DistMult on the MRR and H@1/3/10 metrics. We believe this is because HyTE and TA-DistMult learn event representations independently for each timestamp and lack the ability to capture long-term dependencies.
It can also be observed from Tab. 2 that RE-GAT outperforms all the baselines on the ICEWS05-15, ICEWS14, and ICEWS18 datasets. For instance, RE-GAT achieves an improvement of 11.19% over the second-best result on the H@3 metric on the ICEWS05-15 dataset.
To further study the performance of our RE-GAT model and the intuitive visualization advantage of knowledge graphs, we present a case study of three subgraphs from the event-based temporal knowledge graph in the ICEWS18 test set. As shown in Fig. 3, we are given historical events (quadruples) at timestamps 2018/09/26 and 2018/09/27, and attempt to predict which entity Militant (Taliban) will use unconventional violence against at timestamp 2018/09/28. As we can see from the subgraph for 2018/09/28 in Fig. 3, RE-GAT successfully produces the correct answer Military (Afghanistan), which shows that temporal and structural event information can be learned by our RE-GAT model.

Conclusion
It is highly desirable to predict the occurrence of events (such as political events, pandemics, and crimes) in advance to reduce potential damage and social upheaval. In this paper, we formulate the event prediction problem as an extrapolation reasoning problem in temporal knowledge graphs and propose the RE-GAT model to tackle it. RE-GAT learns event information from both the historical and concurrent structural perspectives to make future predictions. The proposed model also considers the complex influence of past historical events and of concurrent events at the same timestamp, which enables it to effectively capture historical patterns and neighborhood event interactions. As shown by the experimental results on four real-world datasets, our proposed RE-GAT model achieves significant improvements over the baselines.