<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">63208</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2025.063208</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>An Integrated Perception Model for Predicting and Analyzing Urban Rail Transit Emergencies Based on Unstructured Data</article-title>
<alt-title alt-title-type="left-running-head">An Integrated Perception Model for Predicting and Analyzing Urban Rail Transit Emergencies Based on Unstructured Data</alt-title>
<alt-title alt-title-type="right-running-head">An Integrated Perception Model for Predicting and Analyzing Urban Rail Transit Emergencies Based on Unstructured Data</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author">
<name name-style="western"><surname>Mu</surname> <given-names>Liang</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Kang</surname><given-names>Yurui</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western"><surname>Yan</surname><given-names>Zixu</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-4" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Zhu</surname><given-names>Guangyu</given-names></name><xref ref-type="aff" rid="aff-2">2</xref><xref rid="cor1" ref-type="corresp">&#x002A;</xref><email>gyzhu@sxu.edu.cn</email></contrib>
<aff id="aff-1"><label>1</label><institution>School of Traffic and Transportation, Beijing Jiaotong University</institution>, <addr-line>Beijing, 100044</addr-line>, <country>China</country></aff>
<aff id="aff-2"><label>2</label><institution>School of Automation and Software Engineering, Shanxi University</institution>, <addr-line>Taiyuan, 030006</addr-line>, <country>China</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Guangyu Zhu. Email: <email>gyzhu@sxu.edu.cn</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2025</year>
</pub-date>
<pub-date date-type="pub" publication-format="electronic">
<day>03</day><month>07</month><year>2025</year>
</pub-date>
<volume>84</volume>
<issue>2</issue>
<fpage>2495</fpage>
<lpage>2512</lpage>
<history>
<date date-type="received">
<day>08</day>
<month>1</month>
<year>2025</year>
</date>
<date date-type="accepted">
<day>17</day>
<month>4</month>
<year>2025</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2025 The Authors.</copyright-statement>
<copyright-year>2025</copyright-year>
<copyright-holder>Published by Tech Science Press.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_63208.pdf"></self-uri>
<abstract>
<p>The accurate prediction and analysis of emergencies in Urban Rail Transit Systems (URTS) are essential for the development of effective early warning and prevention mechanisms. This study presents an integrated perception model designed to predict emergencies and analyze their causes based on historical unstructured emergency data. To address the unstructured form of the data and its missing values, we employed label encoding for data structuring and an Elastic Net Regularization-based Generative Adversarial Interpolation Network (ER-GAIN) for imputation. Additionally, to mitigate the impact of imbalanced data on predictive performance, we introduced an Adaptive Boosting Ensemble Model (AdaBoost) to forecast the key features of emergencies, including event types and levels. We also utilized Information Gain (IG) to analyze and rank the causes of various significant emergencies. Experimental results indicate that, compared to baseline data imputation models, ER-GAIN improved the prediction accuracy of key emergency features by 3.67% and 3.78%, respectively. Furthermore, AdaBoost enhanced the accuracy by over 4.34% and 3.25% compared to baseline predictive models. Through causation analysis, we identified the critical causes of train operation and fire incidents. The findings of this research will contribute to the establishment of early warning and prevention mechanisms for emergencies in URTS, potentially leading to safer and more reliable URTS operations.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Urban rail transit system</kwd>
<kwd>emergency prediction</kwd>
<kwd>generative adversarial imputation network</kwd>
<kwd>ensemble learning</kwd>
<kwd>cause analysis</kwd>
</kwd-group>
<funding-group>
<award-group id="awg1">
<funding-source>Fundamental Research Funds for the Central Universities</funding-source>
<award-id>2024YJS096</award-id>
</award-group>
<award-group id="awg2">
<funding-source>National Natural Science Foundation of China</funding-source>
<award-id>62433005</award-id>
<award-id>62272036</award-id>
<award-id>62173167</award-id>
</award-group>
</funding-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>Urban rail transit (URT) is the preferred mode of transportation for urban communities, and its safe and stable operation is an essential reflection of the city&#x2019;s level of safety management. The urban rail transit system (URTS) operates in relatively enclosed spaces and often concentrates many passengers within a short period. In such scenarios, emergencies can result in massive casualties and severe consequences [<xref ref-type="bibr" rid="ref-1">1</xref>&#x2013;<xref ref-type="bibr" rid="ref-3">3</xref>]. Research on perception methods for emergencies in URTS, which mines the information hidden in historical emergency data to forecast potential events and analyze their causes, is therefore a crucial approach to preventing incidents and enhancing system security [<xref ref-type="bibr" rid="ref-4">4</xref>,<xref ref-type="bibr" rid="ref-5">5</xref>].</p>
<p>Urban Rail Transit Emergencies (URTE) data predominantly consist of unstructured narrative texts, such as accident reports, eyewitness descriptions, investigation reports, and news reports [<xref ref-type="bibr" rid="ref-6">6</xref>,<xref ref-type="bibr" rid="ref-7">7</xref>]. Current methods for structuring text data include Label Encoding [<xref ref-type="bibr" rid="ref-8">8</xref>], One Hot Encoding [<xref ref-type="bibr" rid="ref-9">9</xref>], and Bag of Words models [<xref ref-type="bibr" rid="ref-10">10</xref>]. Label encoding maps text labels to numerical values, preserving the ordering relationships among labels to the greatest extent possible.</p>
<p>The low incidence of emergencies, incomplete reporting records, and constraints on data collection and sharing [<xref ref-type="bibr" rid="ref-11">11</xref>,<xref ref-type="bibr" rid="ref-12">12</xref>] have contributed to the sparse and incomplete characteristics of structured URTE data. Currently, there are three types of interpolation methods for incomplete data [<xref ref-type="bibr" rid="ref-13">13</xref>,<xref ref-type="bibr" rid="ref-14">14</xref>]. The first is based on statistics and regression, and includes mean imputation [<xref ref-type="bibr" rid="ref-15">15</xref>], regression models [<xref ref-type="bibr" rid="ref-16">16</xref>,<xref ref-type="bibr" rid="ref-17">17</xref>], and principal component analysis (PCA) [<xref ref-type="bibr" rid="ref-18">18</xref>]. The second is based on machine learning, and includes K-Nearest Neighbor (KNN) [<xref ref-type="bibr" rid="ref-19">19</xref>,<xref ref-type="bibr" rid="ref-20">20</xref>], decision trees [<xref ref-type="bibr" rid="ref-21">21</xref>], and Bayesian networks [<xref ref-type="bibr" rid="ref-22">22</xref>]. However, these two types of methods are restricted to low-dimensional, simple datasets and face challenges with high-dimensional and complex URTE data, potentially reducing the accuracy of perception models and limiting comprehensive risk identification.</p>
<p>An increasing number of scholars are exploring deep learning methods for interpolating missing data values, forming a third category of approaches. In 2018, Yoon et al. [<xref ref-type="bibr" rid="ref-23">23</xref>] introduced the Generative Adversarial Imputation Network (GAIN), which uses generative adversarial networks to fill in missing data values. Based on GAIN, Sun et al. [<xref ref-type="bibr" rid="ref-24">24</xref>] compared various imputation algorithms, such as the variational auto-encoder (VAE) with one-hot encoding, on multiple datasets under different missingness mechanisms, and the experiments showed that GAIN performed better under all of them. Bernardini et al. [<xref ref-type="bibr" rid="ref-25">25</xref>] proposed a conditional generative adversarial network (ccGAN) data interpolation method in the medical clinical domain for real-world electronic health record data containing multiple records with missing values. In summary, GAIN surpasses traditional methods by accurately capturing the complex distribution of raw data. However, it is prone to overfitting during training.</p>
<p>The prediction of URTE relies on models that learn the characteristics of events and predict the critical features of emergencies. Studies have used traditional machine learning methods such as Decision Trees [<xref ref-type="bibr" rid="ref-26">26</xref>], Support Vector Machines (SVM) [<xref ref-type="bibr" rid="ref-27">27</xref>,<xref ref-type="bibr" rid="ref-28">28</xref>], and logistic regression [<xref ref-type="bibr" rid="ref-29">29</xref>] to predict emergencies, but such methods are only suited to simple, balanced datasets. Emergency data are often imbalanced across categories and levels [<xref ref-type="bibr" rid="ref-30">30</xref>], which biases traditional machine learning perception models toward the features of the most frequent categories and levels. As a result, prediction models are prone to errors when samples are limited.</p>
<p>Compared to traditional machine learning methods, Ensemble Learning (EL) methods, which aggregate the results of multiple base classifiers, are gaining significant attention from researchers [<xref ref-type="bibr" rid="ref-31">31</xref>&#x2013;<xref ref-type="bibr" rid="ref-33">33</xref>]. Meng et al. [<xref ref-type="bibr" rid="ref-34">34</xref>] proposed an EL strategy for accident-type prediction and causation analysis in response to unbalanced railway accident data. Wang et al. [<xref ref-type="bibr" rid="ref-35">35</xref>] used an EL approach with AdaBoost to predict critical indicators such as PM2.5 in the metro environment. This approach aims to enhance air quality management and help prevent cardiopulmonary diseases caused by passengers&#x2019; exposure to hazardous air. EL methods offer a robust approach for various predictive tasks, especially in scenarios involving complex and imbalanced datasets. However, they have so far rarely been used to predict emergencies in URTS.</p>
<p>The cause analysis of emergencies involves identifying the primary causes that lead to such events, thereby clarifying the processes underlying emergency situations. In recent years, researchers have employed a variety of methodologies to explore the origins of emergencies. One such approach is model-based, which includes System-Theoretic Accident Model and Processes (STAMP) [<xref ref-type="bibr" rid="ref-36">36</xref>] and AcciMap [<xref ref-type="bibr" rid="ref-37">37</xref>]. These methodologies identify potential accident causes by analyzing the interactions and feedback loops within a system. However, the complexity of URTS and the inherent uncertainty of emergencies make modeling within actual URT environments exceedingly challenging. Consequently, data-driven methods [<xref ref-type="bibr" rid="ref-38">38</xref>] have increasingly been utilized to uncover the deep relationships between emergency characteristics and their causes, providing a more objective and quantifiable perspective for cause analysis.</p>
<p>The main contributions of this study are summarized as follows:
<list list-type="simple">
<list-item><label>(1)</label><p>Coding schemes for the types and levels of emergencies are built, and label encoding is carried out to achieve structured data processing.</p></list-item>
<list-item><label>(2)</label><p>This study integrates elastic net regularization to refine the GAIN (ER-GAIN). By leveraging its capability to learn the complex distribution of the original data, the model effectively interpolates missing values within the dataset, thereby enhancing the integrity of emergency data.</p></list-item>
<list-item><label>(3)</label><p>To address the imbalance in data related to URTE, we employed the AdaBoost ensemble learning model to predict the key features of these emergencies.</p></list-item>
<list-item><label>(4)</label><p>This paper uses Information Gain (IG), a data-driven method, to analyze the causes of emergencies, thereby identifying the key factors that contribute to the occurrence of such events.</p></list-item>
</list></p>
<p>The rest of the paper is structured as follows: <xref ref-type="sec" rid="s2">Section 2</xref> provides a detailed description of the integrated perception model; <xref ref-type="sec" rid="s3">Section 3</xref> reports experiments on actual URTE data; <xref ref-type="sec" rid="s4">Section 4</xref> discusses the results; and <xref ref-type="sec" rid="s5">Section 5</xref> concludes the paper and outlines future work.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Methodology</title>
<p>The framework of the URTE integrated perception model is shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>. It contains four parts: (1) structuring of data, (2) data interpolation based on ER-GAIN, (3) emergency prediction, and (4) cause analysis of emergencies.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>Framework of integrated perception model for URTE under incomplete data</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_63208-fig-1.tif"/>
</fig>
<sec id="s2_1">
<label>2.1</label>
<title>Structured Method for Emergency Data</title>
<p>Unstructured data obtained from various sources, such as social media and incident reports, provide narrative textual records of the processes and outcomes of URTE. Structured data, by contrast, are stored and organized in a predefined format whose structure and content follow clear rules and patterns, which facilitates efficient processing and analysis by computer programs. This paper employs a label encoding method to convert unstructured emergency data into structured form, developing a labeling scheme for the cause, type, and level of emergencies.</p>
<sec id="s2_1_1">
<label>2.1.1</label>
<title>Causes of Emergencies</title>
<p>The causes or interactions of causes leading to URTE directly reflect system vulnerabilities and insufficient resilience. According to the theory of event causality [<xref ref-type="bibr" rid="ref-39">39</xref>], these emergencies can often be attributed to four key factors: unsafe human behavior, unsafe material conditions, adverse environmental conditions, and managerial deficiencies. Building on this theory, our study further refines the causes of URTE across four dimensions: human, machine, environment, and management. By combining a literature review and expert experience, we identified 19 specific causal factors, detailed in <xref ref-type="table" rid="table-1">Table 1</xref>.</p>
<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>Label coding scheme for the causation of URTE</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Causative factors</th>
<th>Specific causes</th>
<th>Coded value</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="6">Human</td>
<td>Human error <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td rowspan="19">0&#x2013;1: Likelihood that the cause exists</td>
</tr>
<tr>
<td>Unsafe behavior of personnel <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
</tr>
<tr>
<td>Weak security awareness among personnel <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
</tr>
<tr>
<td>Low professional competence of employees <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>4</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
</tr>
<tr>
<td>Poor emergency response capacity <inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
</tr>
<tr>
<td>Passenger&#x2019;s destructive behavior <inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>6</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
</tr>
<tr>
<td rowspan="7">Machine</td>
<td>Power supply system failure <inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>7</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
</tr>
<tr>
<td>Station equipment failure <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>8</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
</tr>
<tr>
<td>Train malfunction <inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>9</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
</tr>
<tr>
<td>Signal system failure <inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>10</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
</tr>
<tr>
<td>Tunnels and lines damaged <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>11</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
</tr>
<tr>
<td>Insufficient security equipment <inline-formula id="ieqn-12"><mml:math id="mml-ieqn-12"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>12</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
</tr>
<tr>
<td>Design and construction defects <inline-formula id="ieqn-13"><mml:math id="mml-ieqn-13"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>13</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
</tr>
<tr>
<td rowspan="3">Environment</td>
<td>Natural disasters and severe weather <inline-formula id="ieqn-14"><mml:math id="mml-ieqn-14"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>14</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
</tr>
<tr>
<td>Heavy traffic <inline-formula id="ieqn-15"><mml:math id="mml-ieqn-15"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>15</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
</tr>
<tr>
<td>Foreign object intrusion <inline-formula id="ieqn-16"><mml:math id="mml-ieqn-16"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>16</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
</tr>
<tr>
<td rowspan="3">Management</td>
<td>Untimely maintenance of equipment <inline-formula id="ieqn-17"><mml:math id="mml-ieqn-17"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>17</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
</tr>
<tr>
<td>Inadequate security precautions <inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>18</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
</tr>
<tr>
<td>Inadequate supervision of safety and quality <inline-formula id="ieqn-19"><mml:math id="mml-ieqn-19"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>19</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s2_1_2">
<label>2.1.2</label>
<title>Types of Emergencies</title>
<p>Event type refers to the category of emergency that may occur during operations and that may seriously affect passenger safety, operational order, and facilities and equipment. In this study, the classification of URTE and the corresponding encoding values are detailed in <xref ref-type="table" rid="table-2">Table 2</xref>.</p>
<table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>Label coding scheme for event type</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Event types</th>
<th>Description of the accident</th>
<th>Coded value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Driving accident</td>
<td>Derailment, rear-end collision, collision, suspension, delay, etc.</td>
<td>1</td>
</tr>
<tr>
<td>Fire accident</td>
<td>Equipment catching fire, arson, etc.</td>
<td>2</td>
</tr>
<tr>
<td>Public safety accident</td>
<td>Passenger disembarkation, overcrowding, violence, etc.</td>
<td>3</td>
</tr>
<tr>
<td>Terrorist attack</td>
<td>Poison gas, bombs, etc.</td>
<td>4</td>
</tr>
<tr>
<td>Power outage accident</td>
<td>Failure of external and internal power supply units, etc.</td>
<td>5</td>
</tr>
<tr>
<td>Flooding accident</td>
<td>Rainwater or other water entering underground sections, etc.</td>
<td>6</td>
</tr>
<tr>
<td>Natural disaster</td>
<td>Earthquake, wind, lightning, etc.</td>
<td>7</td>
</tr>
<tr>
<td>Equipment failure</td>
<td>Screen door failure, lift failure, etc.</td>
<td>8</td>
</tr>
<tr>
<td>Construction accident</td>
<td>Construction section collapse, etc.</td>
<td>9</td>
</tr>
</tbody>
</table>
</table-wrap>
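<p>The label encoding of event types in Table 2 can be sketched as a simple lookup. The following minimal Python sketch is illustrative only: the names <monospace>EVENT_TYPE_CODES</monospace>, <monospace>encode_event_type</monospace>, and <monospace>encode_records</monospace> are hypothetical, and returning <monospace>None</monospace> for a missing or unrecognized label is an assumption made here so that such entries can later be treated as missing values.</p>
<code language="python"><![CDATA[
```python
# Hypothetical lookup table transcribed from Table 2 (event type -> coded value).
EVENT_TYPE_CODES = {
    "Driving accident": 1,
    "Fire accident": 2,
    "Public safety accident": 3,
    "Terrorist attack": 4,
    "Power outage accident": 5,
    "Flooding accident": 6,
    "Natural disaster": 7,
    "Equipment failure": 8,
    "Construction accident": 9,
}


def encode_event_type(label):
    """Return the coded value for a type label, or None if it is missing or unknown."""
    if label is None:
        return None
    return EVENT_TYPE_CODES.get(label.strip())


def encode_records(labels):
    """Encode a list of type labels, keeping None where no code can be assigned."""
    return [encode_event_type(label) for label in labels]


if __name__ == "__main__":
    print(encode_records(["Fire accident", None, "Equipment failure"]))
```
]]></code>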
</sec>
<sec id="s2_1_3">
<label>2.1.3</label>
<title>Levels of Emergencies</title>
<p>Event level refers to the grading of emergencies based on factors such as their severity, scope of impact, degree of harm, and emergency response needs [<xref ref-type="bibr" rid="ref-40">40</xref>]; the grading scheme is shown in <xref ref-type="table" rid="table-3">Table 3</xref>. An event is assigned to a level if it meets any one of that level&#x2019;s demarcation conditions.</p>
<table-wrap id="table-3">
<label>Table 3</label>
<caption>
<title>Label coding scheme for levels of URTE</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th align="center">Event levels</th>
<th colspan="3">Demarcation conditions</th>
<th align="center">Coded value</th>
</tr>
<tr>
<th/>
<th align="center">Duration of interruptions</th>
<th align="center">Personnel casualties</th>
<th align="center">Economic loss</th>
<th/>
</tr>
</thead>
<tbody>
<tr>
<td>Particularly significant (I)</td>
<td>Interruption &#x2265; 36 h</td>
<td>Deaths &#x2265; 30 or serious injuries &#x2265; 100</td>
<td>Direct economic losses &#x2265; 100M yuan</td>
<td>1</td>
</tr>
<tr>
<td>Significant (II)</td>
<td>24 h &#x2264; Interruption &#x003C; 36 h</td>
<td>10 &#x2264; Deaths &#x003C; 30 or 50 &#x2264; serious injuries &#x003C; 100</td>
<td>50M yuan &#x2264; Direct economic losses &#x003C; 100M yuan</td>
<td>2</td>
</tr>
<tr>
<td>Larger (III)</td>
<td>6 h &#x2264; Interruption &#x003C; 24 h</td>
<td>3 &#x2264; Deaths &#x003C; 10 or 10 &#x2264; serious injuries &#x003C; 50</td>
<td>10M yuan &#x2264; Direct economic losses &#x003C; 50M yuan</td>
<td>3</td>
</tr>
<tr>
<td>General (IV)</td>
<td>2 h &#x2264; Interruption &#x003C; 6 h</td>
<td>Deaths &#x003C; 3 or serious injuries &#x003C; 10</td>
<td>500,000 yuan &#x2264; Direct economic losses &#x003C; 10M yuan</td>
<td>4</td>
</tr>
<tr>
<td>Minor (V)</td>
<td>Interruption &#x003C; 2 h</td>
<td>No casualties</td>
<td>Direct economic losses &#x003C; 500,000 yuan</td>
<td>5</td>
</tr>
</tbody>
</table>
</table-wrap>
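<p>The grading rule in Table 3 can be sketched as a threshold lookup. The minimal Python sketch below rests on several assumptions that the text does not state explicitly: each criterion is graded separately and the most severe result (smallest coded value) is taken; an interruption of exactly 2 h is treated as level IV; any death or serious injury places the event at level IV or above, since level V requires no casualties; and all four quantities are known. The function and constant names are hypothetical.</p>
<code language="python"><![CDATA[
```python
# Lower bounds per criterion, ordered from most severe (code 1) to least (code 4).
# Values are transcribed from Table 3; anything below the last bound is code 5.
INTERRUPTION_H = [(1, 36), (2, 24), (3, 6), (4, 2)]
DEATHS = [(1, 30), (2, 10), (3, 3), (4, 1)]
SERIOUS_INJURIES = [(1, 100), (2, 50), (3, 10), (4, 1)]
LOSS_YUAN = [(1, 100e6), (2, 50e6), (3, 10e6), (4, 0.5e6)]


def level_from_thresholds(value, thresholds):
    """Return the code of the most severe level whose lower bound is reached."""
    for code, lower_bound in thresholds:
        if value >= lower_bound:
            return code
    return 5


def classify_level(interruption_h, deaths, serious_injuries, loss_yuan):
    """Grade each criterion separately and keep the most severe (smallest) code."""
    codes = [
        level_from_thresholds(interruption_h, INTERRUPTION_H),
        level_from_thresholds(deaths, DEATHS),
        level_from_thresholds(serious_injuries, SERIOUS_INJURIES),
        level_from_thresholds(loss_yuan, LOSS_YUAN),
    ]
    return min(codes)


if __name__ == "__main__":
    # A 7 h interruption with no casualties and a small loss reaches level III.
    print(classify_level(7, 0, 0, 200_000))
```
]]></code>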
</sec>
</sec>
<sec id="s2_2">
<label>2.2</label>
<title>GAIN-Based Data Interpolation Method</title>
<p>The issue of data missingness in URTE is complex, encompassing both random and non-random factors. GAIN is an unsupervised interpolation method whose core idea is to use a generative adversarial network to learn the underlying distribution of event data so that missing values can be estimated and interpolated more accurately. However, the traditional GAIN is prone to overfitting during training, which results in poor generalization. Therefore, we propose an improved data imputation method that combines elastic net regularization (ER) with GAIN. The framework is shown in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>.</p>
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>The framework of ER-GAIN</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_63208-fig-2.tif"/>
</fig>
<p>ER [<xref ref-type="bibr" rid="ref-41">41</xref>] is a machine learning method that combines L1 and L2 regularization techniques. It aims to perform variable selection and control model complexity by constructing an optimized loss function, thereby effectively preventing the model from overfitting. Specifically, L1 regularization (Lasso) encourages sparsity in the model parameters, which aids in feature selection, while L2 regularization (Ridge) controls the growth of model weights to prevent overfitting. The general form of the elastic net regularization loss function is shown as <xref ref-type="disp-formula" rid="eqn-1">Eq. (1)</xref>.
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:msub><mml:mrow><mml:mrow><mml:mi>&#x02112;</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>E</mml:mi><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi>&#x02112;</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>&#x03BB;</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:msubsup><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msubsup><mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mi>&#x03B8;</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:msub><mml:mi>&#x03BB;</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:msubsup><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msubsup><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>&#x03B8;</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></disp-formula>where, <inline-formula id="ieqn-20"><mml:math id="mml-ieqn-20"><mml:msub><mml:mrow><mml:mrow><mml:mi>&#x02112;</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> represents the original loss function, such as mean square error or cross entropy loss. 
<inline-formula id="ieqn-21"><mml:math id="mml-ieqn-21"><mml:msub><mml:mi>&#x03BB;</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-22"><mml:math id="mml-ieqn-22"><mml:msub><mml:mi>&#x03BB;</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> are regularization parameters that control the strength of L1 and L2 regularization, respectively. <inline-formula id="ieqn-23"><mml:math id="mml-ieqn-23"><mml:msub><mml:mi>&#x03B8;</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the weight parameter of the model.</p>
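<p>As a small numerical illustration of Eq. (1), the sketch below computes the elastic net penalty and one of its subgradients for a flat list of weights. It is a minimal sketch under stated assumptions: both sums are taken over the same weights, the function names are hypothetical, and the subgradient at zero is taken to be zero for the L1 term. The L1 part contributes a constant-magnitude pull toward zero (which is what encourages sparsity), while the L2 part contributes a pull proportional to the weight (which limits weight growth).</p>
<code language="python"><![CDATA[
```python
def elastic_net_penalty(theta, lam1, lam2):
    """Return lam1 * sum(|theta_i|) + lam2 * sum(theta_i ** 2)."""
    l1_term = sum(abs(t) for t in theta)
    l2_term = sum(t * t for t in theta)
    return lam1 * l1_term + lam2 * l2_term


def elastic_net_subgradient(theta, lam1, lam2):
    """Return one subgradient of the penalty with respect to each weight."""
    def sign(t):
        # Evaluates to -1, 0, or 1; 0 is the choice made here at t == 0.
        return (t > 0) - (t < 0)

    return [lam1 * sign(t) + 2.0 * lam2 * t for t in theta]


def regularized_loss(original_loss, theta, lam1, lam2):
    """Add the elastic net penalty to an already computed original loss value."""
    return original_loss + elastic_net_penalty(theta, lam1, lam2)


if __name__ == "__main__":
    weights = [1.0, -2.0, 0.0]
    print(regularized_loss(1.0, weights, lam1=0.1, lam2=0.01))
    print(elastic_net_subgradient(weights, lam1=0.1, lam2=0.01))
```
]]></code>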
<p>Applying elastic net regularization to GAIN involves reconstructing the loss functions of both the generator and the discriminator. For the generator, the goal is to learn more generalized feature representations while reducing the noise in the generated data. For the discriminator, the aim is to enhance its ability to generalize, thereby improving its capacity to distinguish between real and generated data. The reconstructed loss functions for both components are presented as <xref ref-type="disp-formula" rid="eqn-2">Eqs. (2)</xref> and <xref ref-type="disp-formula" rid="eqn-3">(3)</xref>:
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:msub><mml:mrow><mml:mrow><mml:mi>&#x02112;</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>E</mml:mi><mml:mi>R</mml:mi><mml:mo>,</mml:mo><mml:mi>D</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi>&#x02112;</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>D</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>&#x03BB;</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:msubsup><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msubsup><mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mi>&#x03B8;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>D</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:msub><mml:mi>&#x03BB;</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:msubsup><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msubsup><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>&#x03B8;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>D</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></disp-formula>
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:msub><mml:mrow><mml:mrow><mml:mi>&#x02112;</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>E</mml:mi><mml:mi>R</mml:mi><mml:mo>,</mml:mo><mml:mi>G</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mrow><mml:mi>&#x02112;</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>G</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>&#x03BB;</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:msubsup><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msubsup><mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mi>&#x03B8;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>G</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:msub><mml:mi>&#x03BB;</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:msubsup><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:msubsup><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>&#x03B8;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>G</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></disp-formula>where, <inline-formula id="ieqn-24"><mml:math id="mml-ieqn-24"><mml:msub><mml:mrow><mml:mrow><mml:mi>&#x02112;</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>G</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-25"><mml:math id="mml-ieqn-25"><mml:msub><mml:mrow><mml:mrow><mml:mi>&#x02112;</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>D</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> 
represent the original loss functions of the generator and discriminator in the GAIN algorithm (Appendix A, <xref ref-type="disp-formula" rid="eqn-A3">Eqs. (A3)</xref> and <xref ref-type="disp-formula" rid="eqn-A4">(A4)</xref>). The terms <inline-formula id="ieqn-26"><mml:math id="mml-ieqn-26"><mml:msub><mml:mi>&#x03B8;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>G</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-27"><mml:math id="mml-ieqn-27"><mml:msub><mml:mi>&#x03B8;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>D</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> denote the weight parameters of the <italic>i</italic>th unit of the generator and discriminator, respectively. The coefficients <italic>&#x03BB;</italic><sub>1</sub> and <italic>&#x03BB;</italic><sub>2</sub> weight the L1 and L2 penalty terms, respectively.</p>
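<p>The penalty structure shared by Eqs. (2) and (3) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names <monospace>elastic_net_penalty</monospace> and <monospace>regularized_loss</monospace> are hypothetical, the base loss is assumed to be computed elsewhere, both sums are assumed to run over the same list of weights, and the default coefficients simply mirror the values reported in Section 3.2.</p>
<code language="python"><![CDATA[
```python
def elastic_net_penalty(weights, lam1, lam2):
    """Return lam1 * sum(|theta_i|) + lam2 * sum(theta_i ** 2)."""
    l1_term = sum(abs(w) for w in weights)
    l2_term = sum(w * w for w in weights)
    return lam1 * l1_term + lam2 * l2_term


def regularized_loss(base_loss, weights, lam1=0.05, lam2=0.5):
    """Add the elastic net penalty to an already computed GAIN loss value.

    Assumption: base_loss is the original generator or discriminator loss,
    and weights is a flat list of that network's weight parameters.
    """
    return base_loss + elastic_net_penalty(weights, lam1, lam2)


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
example_weights = [0.5, -1.0, 2.0]
example_loss = regularized_loss(1.0, example_weights)
```
]]></code>
<p>The same helper applies to either network: only the base loss and the weight list change between the generator and the discriminator.</p>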
</sec>
<sec id="s2_3">
<label>2.3</label>
<title>Emergency Prediction Method</title>
<p>The probability of occurrence varies widely across URTE: certain equipment failure events are far more frequent than major disasters. This leads to a significant imbalance in the URTE dataset with respect to event types and severity levels. To prevent this imbalance from degrading the performance of the prediction model, URTE are predicted using AdaBoost [<xref ref-type="bibr" rid="ref-42">42</xref>] (<xref ref-type="fig" rid="fig-3">Fig. 3</xref>).</p>
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Integrated prediction model for URTE</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_63208-fig-3.tif"/>
</fig>
</sec>
<sec id="s2_4">
<label>2.4</label>
<title>URTE Cause Analysis Method</title>
<p>The causal analysis of URTE aims to identify causes associated with emergency types. IG can serve as an indicator to evaluate the importance of a cause. This indicator is derived by measuring the reduction in uncertainty regarding the type of emergency given a specific cause. The calculation method is presented as <xref ref-type="disp-formula" rid="eqn-4">Eqs. (4)</xref>&#x2013;<xref ref-type="disp-formula" rid="eqn-6">(6)</xref>:
<disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:mi>I</mml:mi><mml:mi>g</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>S</mml:mi><mml:mo>,</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>H</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi>H</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>S</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>
<disp-formula id="eqn-5"><label>(5)</label><mml:math id="mml-eqn-5" display="block"><mml:mi>H</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>S</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>C</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>log</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>&#x2061;</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>
<disp-formula id="eqn-6"><label>(6)</label><mml:math id="mml-eqn-6" display="block"><mml:mi>H</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>S</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msubsup><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msubsup><mml:mfrac><mml:mrow><mml:mo>|</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>|</mml:mo></mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mi>S</mml:mi><mml:mo>|</mml:mo></mml:mrow></mml:mfrac><mml:mi>H</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>where, <inline-formula id="ieqn-28"><mml:math id="mml-ieqn-28"><mml:mi>H</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>S</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> represents the original entropy of the entire dataset, indicating the uncertainty of event types in the absence of any feature. <inline-formula id="ieqn-29"><mml:math id="mml-ieqn-29"><mml:msub><mml:mi>p</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula> denotes the proportion of the <italic>i</italic>th type within the dataset, and <inline-formula id="ieqn-30"><mml:math id="mml-ieqn-30"><mml:mi>C</mml:mi></mml:math></inline-formula> is the total number of emergency types. 
<inline-formula id="ieqn-31"><mml:math id="mml-ieqn-31"><mml:mi>H</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>S</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is the conditional entropy for each cause, representing the uncertainty of event types given that cause. <inline-formula id="ieqn-32"><mml:math id="mml-ieqn-32"><mml:mi>S</mml:mi></mml:math></inline-formula> denotes the URTE dataset, while <inline-formula id="ieqn-33"><mml:math id="mml-ieqn-33"><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> refers to a subset under the condition of cause <inline-formula id="ieqn-34"><mml:math id="mml-ieqn-34"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>. The variable <inline-formula id="ieqn-35"><mml:math id="mml-ieqn-35"><mml:mi>m</mml:mi></mml:math></inline-formula> specifies the number of causes, and <inline-formula id="ieqn-36"><mml:math id="mml-ieqn-36"><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mi>S</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow></mml:math></inline-formula> is the total size of the dataset.</p>
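<p>A minimal Python sketch of Eqs. (4)&#x2013;(6) follows. It assumes that each record carries one emergency type label and one discrete value for the cause under study, and that the conditional entropy sums over the distinct values of that cause; the function names and the toy records are hypothetical and are not taken from the URTE dataset.</p>
<code language="python"><![CDATA[
```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy H(S) in bits of a non-empty list of type labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(labels, cause_values):
    """IG = H(S) - H(S | r_j) for one cause r_j.

    labels[k] is the emergency type of record k and cause_values[k] is the
    value of the cause for the same record (assumed discrete).
    """
    n = len(labels)
    subsets = {}
    for label, value in zip(labels, cause_values):
        subsets.setdefault(value, []).append(label)
    # Conditional entropy: size-weighted entropy of each subset S_j.
    conditional = sum(len(s) / n * entropy(s) for s in subsets.values())
    return entropy(labels) - conditional


# Toy records (hypothetical): the cause separates the two types perfectly.
types = ["failure", "failure", "fire", "fire"]
informative_cause = [1, 1, 0, 0]
uninformative_cause = [1, 0, 1, 0]
```
]]></code>
<p>With these toy records, the informative cause removes all uncertainty about the type, whereas the uninformative cause leaves the entropy unchanged, so its information gain is zero.</p>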
</sec>
</sec>
<sec id="s3">
<label>3</label>
<title>Experimental Analysis</title>
<sec id="s3_1">
<label>3.1</label>
<title>URTE Dataset and Structuring</title>
<p>We manually collected text records from the Internet and URTS operational event reports, totaling 496 URTS event records worldwide from 1969 to 2021. After structuring the textual data using label coding, we statistically analyzed the events for key features and missing data, as shown in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>.</p>
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>Statistical results of the emergency data. <bold>(a)</bold> Key features; <bold>(b)</bold> Missing data</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_63208-fig-4.tif"/>
</fig>
<p>The emergency data are highly unevenly distributed: equipment failures and minor-grade events account for the largest share, whereas events of other types and levels are few, and some categories have no records at all. The structured emergency data also contain a large number of missing values. In particular, more than 50% of the values are missing for causes such as tunnel and line damage <inline-formula id="ieqn-37"><mml:math id="mml-ieqn-37"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>11</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>, insufficient safety equipment <inline-formula id="ieqn-38"><mml:math id="mml-ieqn-38"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>12</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>, and foreign object invasion <inline-formula id="ieqn-39"><mml:math id="mml-ieqn-39"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>16</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>, and the other causes are missing to varying degrees. To ensure the accuracy of the model, the missing data must be imputed.</p>
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Imputation of Structured Emergency Data</title>
<p>This section employs the proposed ER-GAIN to impute missing values in the structured emergency data. The generator and discriminator are trained using the methodology outlined in <xref ref-type="sec" rid="app-1">Appendix A</xref>. The training parameters were configured as follows: the number of training steps was set to 20,000, and <italic>&#x03BB;</italic><sub>1</sub> and <italic>&#x03BB;</italic><sub>2</sub> were set to 0.05 and 0.5, respectively. The evolution of the loss of both models during training is depicted in <xref ref-type="fig" rid="fig-5">Fig. 5</xref>.</p>
<fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>Variation of generator and discriminator losses during the training process</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_63208-fig-5.tif"/>
</fig>
<p>The generator loss gradually stabilizes during training, whereas the discriminator loss still fluctuates considerably in the late stage of training. This behavior suggests that the imputation model fills in missing data well, since the discriminator cannot reliably distinguish the generated data from the actual data.</p>
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>Emergencies Prediction and Result Analysis</title>
<sec id="s3_3_1">
<label>3.3.1</label>
<title>Tuning of Hyperparameters of Predictive Models</title>
<p>The number of weak classifiers and the learning rate significantly affect the prediction performance of AdaBoost. To obtain the best prediction performance, both hyperparameters must be tuned before training to find an optimal pair of values.</p>
<p>Grid search is used to find the optimal number of weak classifiers and learning rate by exhaustive (brute-force) search. To keep the computation efficient, the training step size is uniformly set to 30, the number of weak classifiers ranges from 1 to 30, and the learning rate ranges from 0.1 to 1. The grid search process for the optimal hyperparameters is shown in <xref ref-type="fig" rid="fig-6">Fig. 6</xref>.</p>
<fig id="fig-6">
<label>Figure 6</label>
<caption>
<title>Grid search process for optimal hyperparameters of the prediction model</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_63208-fig-6.tif"/>
</fig>
<p>As the number of weak classifiers increases, the prediction accuracy initially rises; beyond a certain number of weak classifiers, it instead declines. A learning rate that is too small leaves the model inadequately trained, whereas one that is too large prevents the model from converging; both reduce the accuracy of the model. According to the grid search results, the highest model accuracy (88.18%) was achieved with 18 weak classifiers and a learning rate of 0.6.</p>
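<p>The exhaustive search described above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the authors' code: <monospace>grid_search</monospace> and <monospace>toy_score</monospace> are hypothetical names, the learning-rate grid uses a step of 0.1 (the source gives only the range), and the scoring function is a toy surrogate standing in for the validation accuracy of a trained AdaBoost model. Its peak is placed at the reported optimum purely for illustration and does not reproduce the accuracy surface in Fig. 6.</p>
<code language="python"><![CDATA[
```python
from itertools import product


def grid_search(score_fn, n_estimators_grid, learning_rate_grid):
    """Evaluate every (n_estimators, learning_rate) pair and keep the best.

    score_fn(n_estimators, learning_rate) must return a number where
    larger is better, for example a validation accuracy.
    """
    best = None
    for n, lr in product(n_estimators_grid, learning_rate_grid):
        score = score_fn(n, lr)
        if best is None or score > best[0]:
            best = (score, n, lr)
    return best


def toy_score(n, lr):
    """Toy surrogate (assumption): a smooth bump, NOT a real accuracy."""
    return -((n - 18) ** 2) / 100 - (lr - 0.6) ** 2


# Search ranges from the text; the 0.1 learning-rate step is an assumption.
n_grid = list(range(1, 31))
lr_grid = [round(0.1 * k, 1) for k in range(1, 11)]

best_score, best_n, best_lr = grid_search(toy_score, n_grid, lr_grid)
```
]]></code>
<p>In practice <monospace>toy_score</monospace> would be replaced by a function that trains the classifier with the given pair of hyperparameters and returns its accuracy on held-out data.</p>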
</sec>
<sec id="s3_3_2">
<label>3.3.2</label>
<title>Indicators for Model Assessment</title>
<p>To evaluate the performance of the predictive model and the benchmark models comprehensively, Accuracy, F1-score, Precision, Recall, and AUC (Area Under Curve) are selected as evaluation indicators, as shown in <xref ref-type="table" rid="table-4">Table 4</xref>.</p>
<table-wrap id="table-4">
<label>Table 4</label>
<caption>
<title>Calculation formula and representation of each assessment indicator</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th>Indicators</th>
<th>Formulas</th>
<th>Significance</th>
</tr>
</thead>
<tbody>
<tr>
<td>Accuracy</td>
<td><inline-formula id="ieqn-40"><mml:math id="mml-ieqn-40"><mml:msub><mml:mrow><mml:mtext>Accuracy</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:munderover><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle><mml:mrow><mml:mo>(</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>T</mml:mi><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>T</mml:mi><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula></td>
<td>The model&#x2019;s ability to perceive positive and negative samples</td>
</tr>
<tr>
<td>Precision</td>
<td><inline-formula id="ieqn-41"><mml:math id="mml-ieqn-41"><mml:msub><mml:mrow><mml:mtext>Precision</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:munderover><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle><mml:mrow><mml:mo>(</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula></td>
<td>The model&#x2019;s ability to perceive positive samples</td>
</tr>
<tr>
<td>Recall</td>
<td><inline-formula id="ieqn-42"><mml:math id="mml-ieqn-42"><mml:msub><mml:mrow><mml:mtext>Recall</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:munderover><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle><mml:mrow><mml:mo>(</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mstyle><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula></td>
<td>The model&#x2019;s ability to retrieve all positive samples</td>
</tr>
<tr>
<td>F1-score</td>
<td><inline-formula id="ieqn-43"><mml:math id="mml-ieqn-43"><mml:mrow><mml:mtext>F</mml:mtext></mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mrow><mml:mtext>score</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mrow><mml:mn>2</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mrow><mml:mtext>Precision</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mrow><mml:mtext>Recall</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mtext>Precision</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mtext>Recall</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mfrac></mml:mstyle></mml:math></inline-formula></td>
<td>The overall performance of the model</td>
</tr>
<tr>
<td>AUC</td>
<td><inline-formula id="ieqn-44"><mml:math id="mml-ieqn-44"><mml:msub><mml:mrow><mml:mtext>AUC</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:munderover><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mi>N</mml:mi></mml:mfrac></mml:mstyle><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mo>&#x222B;</mml:mo><mml:mrow><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:msub><mml:mrow><mml:mtext>ROC</mml:mtext></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>f</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>d</mml:mi><mml:mi>f</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula></td>
<td>The performance of the model at different thresholds</td>
</tr>
</tbody>
</table>
</table-wrap>
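<p>The class-weighted Precision, Recall, and F1-score in Table 4 can be illustrated with the following Python sketch. It assumes that the weight of each class is its share of the true labels and that the F1-score is formed from the weighted Precision and Recall, as the table's formula suggests; the function name and the toy labels are hypothetical.</p>
<code language="python"><![CDATA[
```python
from collections import Counter


def weighted_metrics(y_true, y_pred):
    """Return class-weighted (precision, recall, f1) for multi-class labels."""
    n = len(y_true)
    support = Counter(y_true)  # number of true samples per class
    precision = 0.0
    recall = 0.0
    for cls, count in support.items():
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = count - tp
        weight = count / n
        # A class that is never predicted contributes zero precision.
        precision += weight * (tp / (tp + fp) if tp + fp else 0.0)
        recall += weight * (tp / (tp + fn))
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy labels (hypothetical): three records of class "a", one of class "b".
y_true = ["a", "a", "a", "b"]
y_pred = ["a", "a", "b", "b"]
precision, recall, f1 = weighted_metrics(y_true, y_pred)
```
]]></code>
<p>For the toy labels the weighted Precision is 0.875 and the weighted Recall is 0.75. With this weighting, the weighted Recall always equals the overall fraction of correctly classified records.</p>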
</sec>
<sec id="s3_3_3">
<label>3.3.3</label>
<title>Comparative Analysis of Experimental Results</title>
<p>AdaBoost is trained with the hyperparameters selected above. The dataset is divided into training and test sets at a ratio of 8:2. The prediction results are shown in <xref ref-type="fig" rid="fig-7">Fig. 7</xref>.</p>
<fig id="fig-7">
<label>Figure 7</label>
<caption>
<title>Prediction results. <bold>(a)</bold> Event types; <bold>(b)</bold> Event levels</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_63208-fig-7.tif"/>
</fig>
<p>The confusion matrix derived from the predictive outcomes reveals that the model exhibits greater accuracy in predicting event types and levels with larger sample sizes, achieving an accuracy rate of over 85% in most cases. However, for events such as natural disasters and terrorist attacks, which are characterized by a scarcity of samples, the model&#x2019;s training is insufficient, thereby adversely affecting the accuracy of its predictions for these types of incidents. Different thresholds are set to plot the ROC curves of the perceptual model in terms of both event types and event levels, and the results are shown in <xref ref-type="fig" rid="fig-8">Fig. 8</xref>.</p>
<fig id="fig-8">
<label>Figure 8</label>
<caption>
<title>Prediction ROC curve. <bold>(a)</bold> Event types; <bold>(b)</bold> Event levels</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_63208-fig-8.tif"/>
</fig>
<p>As the figure shows, in the prediction of event types, the AUC values for equipment failure and power outage both exceed 0.9, and those for the other types are above 0.7; in the prediction of event levels, the AUC for level V exceeds 0.9, and those for the other levels are above 0.75. This indicates that the model was effectively trained and that it still performs well when validated on the test set, which suggests that the model copes with the imbalanced data.</p>
<p>After verifying the performance of the data-completion model and the prediction model, commonly used machine learning algorithms are selected as benchmark models, and a comprehensive comparative analysis is conducted across different data-completion methods to verify the superiority of the proposed model. The benchmark models include Naive Bayes Classification (NBC) [<xref ref-type="bibr" rid="ref-43">43</xref>], SVM [<xref ref-type="bibr" rid="ref-27">27</xref>], and Artificial Neural Networks (ANN) [<xref ref-type="bibr" rid="ref-44">44</xref>], and the data completion methods include MICE, KNN, and ER-GAIN. The comparative results of the performance of the different models in the prediction of event types and levels are shown in <xref ref-type="table" rid="table-5">Tables 5</xref> and <xref ref-type="table" rid="table-6">6</xref>.</p>
<table-wrap id="table-5">
<label>Table 5</label>
<caption>
<title>Prediction performance comparison of different models in event types</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Prediction model</th>
<th>Completion model</th>
<th colspan="5">Evaluating indicator</th>
</tr>
<tr>
<th/>
<th/>
<th>Accuracy</th>
<th>Precision</th>
<th>Recall</th>
<th>F1-Score</th>
<th>AUC</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="2">NBC [<xref ref-type="bibr" rid="ref-43">43</xref>]</td>
<td>None</td>
<td>0.7489</td>
<td>0.7524</td>
<td>0.7489</td>
<td>0.7533</td>
<td>0.7488</td>
</tr>
<tr>
<td>ER-GAIN</td>
<td>0.7665</td>
<td>0.7647</td>
<td>0.7665</td>
<td>0.7594</td>
<td>0.7558</td>
</tr>
<tr>
<td rowspan="2">SVM [<xref ref-type="bibr" rid="ref-27">27</xref>]</td>
<td>None</td>
<td>0.7642</td>
<td>0.7547</td>
<td>0.7642</td>
<td>0.7716</td>
<td>0.7581</td>
</tr>
<tr>
<td>ER-GAIN</td>
<td>0.7741</td>
<td>0.7615</td>
<td>0.7741</td>
<td>0.7664</td>
<td>0.7767</td>
</tr>
<tr>
<td rowspan="2">ANN [<xref ref-type="bibr" rid="ref-44">44</xref>]</td>
<td>None</td>
<td>0.7817</td>
<td>0.7891</td>
<td>0.7817</td>
<td>0.7864</td>
<td>0.7881</td>
</tr>
<tr>
<td>ER-GAIN</td>
<td>0.7849</td>
<td>0.7861</td>
<td>0.7849</td>
<td>0.7805</td>
<td>0.7932</td>
</tr>
<tr>
<td rowspan="4">AdaBoost [<xref ref-type="bibr" rid="ref-42">42</xref>]</td>
<td>None</td>
<td>0.7916</td>
<td>0.7907</td>
<td>0.7916</td>
<td>0.8011</td>
<td>0.7989</td>
</tr>
<tr>
<td>KNN</td>
<td>0.8047</td>
<td>0.7968</td>
<td>0.8047</td>
<td>0.8004</td>
<td>0.8141</td>
</tr>
<tr>
<td>MICE</td>
<td>0.7942</td>
<td>0.7903</td>
<td>0.7942</td>
<td>0.7912</td>
<td>0.8171</td>
</tr>
<tr>
<td>ER-GAIN (proposed)</td>
<td><bold>0.8283</bold></td>
<td><bold>0.8273</bold></td>
<td><bold>0.8283</bold></td>
<td><bold>0.8278</bold></td>
<td><bold>0.8367</bold></td>
</tr>
</tbody>
</table>
</table-wrap><table-wrap id="table-6">
<label>Table 6</label>
<caption>
<title>Prediction performance comparison of different models in event levels</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Prediction model</th>
<th>Completion model</th>
<th colspan="5">Evaluating indicator</th>
</tr>
<tr>
<th/>
<th/>
<th>Accuracy</th>
<th>Precision</th>
<th>Recall</th>
<th>F1-Score</th>
<th>AUC</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="2">NBC [<xref ref-type="bibr" rid="ref-43">43</xref>]</td>
<td>None</td>
<td>0.8108</td>
<td>0.8163</td>
<td>0.8108</td>
<td>0.8164</td>
<td>0.7869</td>
</tr>
<tr>
<td>ER-GAIN</td>
<td>0.8291</td>
<td>0.8299</td>
<td>0.8291</td>
<td>0.8288</td>
<td>0.798</td>
</tr>
<tr>
<td rowspan="2">SVM [<xref ref-type="bibr" rid="ref-27">27</xref>]</td>
<td>None</td>
<td>0.8314</td>
<td>0.8418</td>
<td>0.8314</td>
<td>0.8371</td>
<td>0.8169</td>
</tr>
<tr>
<td>ER-GAIN</td>
<td>0.8476</td>
<td>0.8486</td>
<td>0.8476</td>
<td>0.8494</td>
<td>0.8243</td>
</tr>
<tr>
<td rowspan="2">ANN [<xref ref-type="bibr" rid="ref-44">44</xref>]</td>
<td>None</td>
<td>0.827</td>
<td>0.837</td>
<td>0.827</td>
<td>0.8336</td>
<td>0.8301</td>
</tr>
<tr>
<td>ER-GAIN</td>
<td>0.8463</td>
<td>0.85</td>
<td>0.8463</td>
<td>0.8583</td>
<td>0.845</td>
</tr>
<tr>
<td rowspan="4">AdaBoost [<xref ref-type="bibr" rid="ref-42">42</xref>]</td>
<td>None</td>
<td>0.841</td>
<td>0.8473</td>
<td>0.841</td>
<td>0.8414</td>
<td>0.816</td>
</tr>
<tr>
<td>KNN</td>
<td>0.8518</td>
<td>0.8661</td>
<td>0.8518</td>
<td>0.8601</td>
<td>0.8397</td>
</tr>
<tr>
<td>MICE</td>
<td>0.8496</td>
<td>0.8571</td>
<td>0.8496</td>
<td>0.855</td>
<td>0.8316</td>
</tr>
<tr>
<td>ER-GAIN (proposed)</td>
<td><bold>0.8788</bold></td>
<td><bold>0.8814</bold></td>
<td><bold>0.8788</bold></td>
<td><bold>0.8801</bold></td>
<td><bold>0.8704</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
<p>As the comparison results in <xref ref-type="table" rid="table-5">Tables 5</xref> and <xref ref-type="table" rid="table-6">6</xref> show, using the data completed by ER-GAIN improves the accuracy, recall, and AUC of each model for both event types and event levels. Under the same conditions, the model proposed in this study outperforms the other models on all five indicators: accuracy, precision, recall, F1-score, and AUC. Relative to the best competing result for each indicator, the prediction of event types improves by 2.36, 3.05, 2.36, 2.67, and 1.96 percentage points, respectively, and the prediction of event levels improves by 2.70, 1.53, 2.70, 2.00, and 2.54 percentage points, respectively. This supports the effectiveness of ER-GAIN for incomplete emergency data, which improves the validity and completeness of the data. In addition, AdaBoost gains more from the data completed using ER-GAIN than the other benchmark models do.</p>
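<p>The gains quoted for event types can be reproduced from Table 5 as differences, in percentage points, between the proposed configuration and the best competing value in each column. This reading is inferred from the table values rather than stated by the authors, and the function name <monospace>gain_over_best</monospace> is hypothetical. A minimal Python check:</p>
<code language="python"><![CDATA[
```python
def gain_over_best(proposed, competitors):
    """Per-column gain (percentage points) over the best competing row."""
    columns = zip(*competitors)
    return [round(100 * (p - max(col)), 2) for p, col in zip(proposed, columns)]


# Values copied from Table 5 (event types).
# Columns: Accuracy, Precision, Recall, F1-Score, AUC.
competitors_types = [
    [0.7489, 0.7524, 0.7489, 0.7533, 0.7488],  # NBC, no completion
    [0.7665, 0.7647, 0.7665, 0.7594, 0.7558],  # NBC, ER-GAIN
    [0.7642, 0.7547, 0.7642, 0.7716, 0.7581],  # SVM, no completion
    [0.7741, 0.7615, 0.7741, 0.7664, 0.7767],  # SVM, ER-GAIN
    [0.7817, 0.7891, 0.7817, 0.7864, 0.7881],  # ANN, no completion
    [0.7849, 0.7861, 0.7849, 0.7805, 0.7932],  # ANN, ER-GAIN
    [0.7916, 0.7907, 0.7916, 0.8011, 0.7989],  # AdaBoost, no completion
    [0.8047, 0.7968, 0.8047, 0.8004, 0.8141],  # AdaBoost, KNN
    [0.7942, 0.7903, 0.7942, 0.7912, 0.8171],  # AdaBoost, MICE
]
proposed_types = [0.8283, 0.8273, 0.8283, 0.8278, 0.8367]  # AdaBoost, ER-GAIN

gains_types = gain_over_best(proposed_types, competitors_types)
```
]]></code>
<p>The resulting list matches the five event-type figures quoted in the text; the same function applied to the rows of Table 6 gives the event-level figures.</p>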
</sec>
<sec id="s3_3_4">
<label>3.3.4</label>
<title>Real Scene Testing</title>
<p>Two scenarios (<xref ref-type="table" rid="table-7">Table 7</xref>) are set up to test the trained emergency perception model by predicting the features of the emergencies that may occur in each scenario.</p>
<table-wrap id="table-7">
<label>Table 7</label>
<caption>
<title>Causes of the two real scenarios</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th></th>
<th><inline-formula id="ieqn-45"><mml:math id="mml-ieqn-45"><mml:msub><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mrow><mml:mn mathvariant="bold">1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-46"><mml:math id="mml-ieqn-46"><mml:msub><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mrow><mml:mn mathvariant="bold">2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-47"><mml:math id="mml-ieqn-47"><mml:msub><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mrow><mml:mn mathvariant="bold">3</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-48"><mml:math id="mml-ieqn-48"><mml:msub><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mrow><mml:mn mathvariant="bold">4</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-49"><mml:math id="mml-ieqn-49"><mml:msub><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mrow><mml:mn mathvariant="bold">5</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-50"><mml:math id="mml-ieqn-50"><mml:msub><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mrow><mml:mn mathvariant="bold">6</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-51"><mml:math id="mml-ieqn-51"><mml:msub><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mrow><mml:mn mathvariant="bold">7</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-52"><mml:math id="mml-ieqn-52"><mml:msub><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mrow><mml:mn mathvariant="bold">8</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-53"><mml:math id="mml-ieqn-53"><mml:msub><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mrow><mml:mn mathvariant="bold">9</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-54"><mml:math id="mml-ieqn-54"><mml:msub><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mrow><mml:mn mathvariant="bold">10</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-55"><mml:math id="mml-ieqn-55"><mml:msub><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mrow><mml:mn mathvariant="bold">11</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-56"><mml:math id="mml-ieqn-56"><mml:msub><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mrow><mml:mn mathvariant="bold">12</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-57"><mml:math id="mml-ieqn-57"><mml:msub><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mrow><mml:mn mathvariant="bold">13</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-58"><mml:math id="mml-ieqn-58"><mml:msub><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mrow><mml:mn mathvariant="bold">14</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-59"><mml:math id="mml-ieqn-59"><mml:msub><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mrow><mml:mn mathvariant="bold">15</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-60"><mml:math id="mml-ieqn-60"><mml:msub><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mrow><mml:mn mathvariant="bold">16</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-61"><mml:math id="mml-ieqn-61"><mml:msub><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mrow><mml:mn mathvariant="bold">17</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-62"><mml:math id="mml-ieqn-62"><mml:msub><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mrow><mml:mn mathvariant="bold">18</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-63"><mml:math id="mml-ieqn-63"><mml:msub><mml:mi mathvariant="bold-italic">r</mml:mi><mml:mrow><mml:mn mathvariant="bold">19</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></th>
</tr>
</thead>
<tbody>
<tr>
<td>Scenario 1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0.5</td>
<td>1</td>
<td>0.3</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
</tr>
<tr>
<td>Scenario 2</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0.5</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The prediction results for these two scenarios are shown in <xref ref-type="fig" rid="fig-9">Fig. 9</xref>.</p>
<fig id="fig-9">
<label>Figure 9</label>
<caption>
<title>Prediction results for two scenarios. <bold>(a)</bold> Event type of Scenario 1; <bold>(b)</bold> Event type of Scenario 2; <bold>(c)</bold> Event level of Scenario 1; <bold>(d)</bold> Event level of Scenario 2</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_63208-fig-9.tif"/>
</fig>
<p>A Level II/III Driving/Equipment failure accident is most likely to occur in Scenario 1, and a Level IV/IV Fire/Power failure accident is most likely to occur in Scenario 2. These predictions are consistent with the historical case data and with the judgment of experts in the field. Practitioners can use such predictions to analyze accidents in depth and implement corresponding remedial measures. For example, geographical information such as latitude and longitude can be used to identify accident-prone stations or areas, so that more human resources and funding can be allocated to these locations. This supports enhanced early warning systems, improved risk management strategies, and more effective maintenance of operating trains and tracks. Singapore&#x2019;s Intelligent Transport System (ITS) [<xref ref-type="bibr" rid="ref-45">45</xref>], which uses real-time traffic information and data analytics to predict traffic events and improve public transportation, illustrates how such an approach can enhance the safety and efficiency of urban rail transit systems.</p>
</sec>
</sec>
<sec id="s3_4">
<label>3.4</label>
<title>URTE Cause Analysis</title>
<p>This section employs the IG method to analyze the key causes of URTE. As shown in <xref ref-type="fig" rid="fig-6">Fig. 6</xref>, the primary types of Level I and II catastrophic URTE are driving accidents and fires. The URTE dataset was filtered for these two types, and the IG method was used to calculate the importance of each cause. The results are illustrated in <xref ref-type="fig" rid="fig-10">Fig. 10</xref>.</p>
<fig id="fig-10">
<label>Figure 10</label>
<caption>
<title>Ranking of the importance of the causes of driving and fire accidents. <bold>(a)</bold> Driving; <bold>(b)</bold> Fire</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_63208-fig-10.tif"/>
</fig>
<p>The results indicate that major driving and fire accidents arise primarily from a combination of Human and Machine factors. In driving accidents, Machine factors are relatively more prominent, whereas in fires, Human factors have a greater influence. Unsafe human behaviors, human errors, and power system failures are the most critical contributing factors. In-depth analysis and effective management of these key factors can reduce the occurrence of major driving and fire accidents, thereby safeguarding lives and property.</p>
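<p>As a rough illustration of the ranking step, and assuming that IG denotes the standard information gain (the reduction in label entropy obtained by conditioning on a cause), the importance of binary cause indicators could be computed as in the following sketch. The function names, the cause names, the target definition, and the toy records are hypothetical and are not taken from the URTE dataset.</p>
<code language="python">
```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature, labels):
    """IG = H(labels) - sum over v of P(feature = v) * H(labels | feature = v)."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature):
        subset = [y for x, y in zip(feature, labels) if x == value]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional


# Hypothetical toy records: 1 = cause present, 0 = cause absent.
causes = {
    "human_error": [1, 1, 0, 0, 1, 0],
    "power_failure": [1, 0, 1, 0, 1, 0],
}
# Hypothetical binary target, e.g. 1 = accident of the type under study.
target = [1, 1, 0, 0, 1, 0]

# Causes sorted from most to least informative about the target.
ranking = sorted(causes, key=lambda c: information_gain(causes[c], target), reverse=True)
```
</code>
<p>In this toy example, the first cause perfectly separates the target and therefore ranks above the second; with real data, the same ranking would be computed separately for the driving and fire subsets.</p>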
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Discussion</title>
<sec id="s4_1">
<label>4.1</label>
<title>Model Limitations</title>
<p>The data structuring component of our model is tailored to the characteristics of the URTS and relies on the expertise of domain professionals. The URTS has a distinctive operating mode, with specialized equipment architecture and a specific logic for data generation, which allows our data structuring approach to extract and integrate the features of URTS data precisely. However, this specificity limits the model&#x2019;s application beyond the URTS domain. The data structuring methodology, rooted in the experience of experts familiar with urban rail transit, is not readily transferable to other fields, primarily because data sources, formats, and embedded business logic differ substantially across domains.</p>
<p>The model&#x2019;s applicability can be extended in three ways. Firstly, the framework proposed in this research can be modularized into domain-specific rules and directly generalizable algorithms, so that only the domain-specific rules need to be redefined for a new domain. Secondly, for domains with different data formats, such as GPS data in transport or sensor data in industrial systems, flexible data preprocessing methods can be developed to convert the raw data into standardized data. Finally, by collaborating with relevant experts, domain-specific knowledge can be integrated and the data coding schemes modified, enabling the migration of this research to other domains.</p>
</sec>
<sec id="s4_2">
<label>4.2</label>
<title>Data Limitations</title>
<p>This study has certain limitations in terms of data. In constructing the interannual analysis framework, rail transit data from 1969 to 2021 were initially integrated through standardized processing. However, because of differences in statistical standards and the absence of key indicators in the early historical data, the framework does not fully capture how the technological standards, operation modes, and other characteristics of the urban rail transit system evolved over time. This limitation could be addressed by developing data preprocessing methods based on backward calibration. For passenger flow data, for example, a baseline equivalence can be established between historical manually counted passenger flow data and modern automatically counted data. The overlapping validation period (1995&#x2013;2005), in which the two data types co-exist, can then be used to determine a conversion factor that aligns data collected under the different standards.</p>
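<p>The backward calibration idea can be sketched as follows, under the simplifying assumption that manual and automated counts differ by a single multiplicative factor, estimated here by least squares through the origin over the overlap period. The function names and all numbers are hypothetical placeholders rather than values from the dataset, and a real calibration might require a richer model.</p>
<code language="python">
```python
def conversion_factor(manual, automated):
    """Least-squares scale k minimizing sum((automated - k * manual) ** 2)."""
    numerator = sum(m * a for m, a in zip(manual, automated))
    denominator = sum(m * m for m in manual)
    return numerator / denominator


def calibrate(manual_series, k):
    """Rescale historical manual counts onto the automated-count scale."""
    return [k * value for value in manual_series]


# Hypothetical overlap-period counts (millions of passengers per year).
manual_overlap = [100.0, 110.0, 120.0]
automated_overlap = [105.0, 115.5, 126.0]

k = conversion_factor(manual_overlap, automated_overlap)

# Hypothetical manual counts from years before the overlap period.
pre_overlap_manual = [80.0, 90.0]
aligned = calibrate(pre_overlap_manual, k)
```
</code>
<p>Fitting the factor only on years in which both counting methods co-exist keeps the estimate independent of the early records that it is later used to correct.</p>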
</sec>
</sec>
<sec id="s5">
<label>5</label>
<title>Conclusion</title>
<p>URTS confronts significant operational risks due to the abrupt and unpredictable nature of URTE. The inherent characteristics of these emergencies result in URTE data that are often unstructured, incomplete, and imbalanced. This study, therefore, presents an integrated incident perception model that encompasses data structuring, completion, emergency prediction, and causal analysis. First, a tagging encoding scheme was developed through literature synthesis and expert consultation to structure the unstructured textual URTE data. Second, to address the incompleteness of URTE data and prevent biases in subsequent models, an improved GAIN method with elastic net regularization was proposed to complete the incomplete data. Third, to counteract the negative impact of class and grade imbalances in URTS data on model predictions, an EL approach was employed to construct an incident prediction model, enriching the decision boundaries of the perception model and enhancing its predictive performance.</p>
<p>To substantiate the efficacy of this study, the proposed integrated perception model underwent experimental analysis, with various machine learning methods serving as benchmark models for comparison. The results demonstrate that the ER-GAIN data completion method not only improves statistical metrics but also enhances the predictive performance of all models. Moreover, the EL-based AdaBoost model outperforms the benchmark models in predicting emergencies. These findings validate the effectiveness of both the data completion method and the emergency prediction model. A causal analysis was conducted on the two most severe types of accidents: driving accidents and fires. Using the IG index, key causes of these accidents were identified, and their causal coupling mechanisms were revealed. This study provides a scientific foundation for the development of targeted safety prevention measures, the refinement of emergency management strategies, and the enhancement of the overall safety performance of URTS.</p>
<p>Future work will focus on two key areas to further enhance the safety and efficiency of URTS. Firstly, expanding the dataset to include diverse scenarios and emerging risk factors will improve model robustness and adaptability. Secondly, integrating real-time data feeds into the incident prediction model could facilitate proactive management and quicker response times.</p>
</sec>
</body>
<back>
<ack>
<p>The authors would like to thank the editors and reviewers for their valuable work.</p>
</ack>
<sec>
<title>Funding Statement</title>
<p>This research was supported by the Fundamental Research Funds for the Central Universities (grant number 2024YJS096) and the National Natural Science Foundation of China (grant numbers 62433005, 62272036, 62173167).</p>
</sec>
<sec>
<title>Author Contributions</title>
<p>Liang Mu: Conceptualization, Methodology, Software, Writing&#x2014;original draft. Yurui Kang: Writing&#x2014;review &#x0026; editing. Zixu Yan: Data curation, Visualization. Guangyu Zhu: Supervision and review. All authors reviewed the results and approved the final version of the manuscript.</p>
</sec>
<sec sec-type="data-availability">
<title>Availability of Data and Materials</title>
<p>The authors confirm that the data supporting the findings of this study are available at <ext-link ext-link-type="uri" xlink:href="https://doi.org/10.6084/m9.figshare.27021613.v1">https://doi.org/10.6084/m9.figshare.27021613.v1</ext-link>.</p>
</sec>
<sec>
<title>Ethics Approval</title>
<p>Not applicable.</p>
</sec>
<sec sec-type="COI-statement">
<title>Conflicts of Interest</title>
<p>The authors declare no conflicts of interest to report regarding the present study.</p>
</sec>
<app-group id="apg-1">
<app id="app-1"><label>Appendix A</label>
<title></title>
<p>The GAIN imputation process for incomplete emergency data is as follows:
<list list-type="simple">
<list-item><label>(1)</label><p>Suppose <bold><italic>X</italic></bold> is the original data matrix, in which some entries (for example <inline-formula id="ieqn-64"><mml:math id="mml-ieqn-64"><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>) are missing. Define <bold><italic>M</italic></bold> as a binary mask matrix of the same size as <bold><italic>X</italic></bold> that indicates whether each entry is missing: if <inline-formula id="ieqn-65"><mml:math id="mml-ieqn-65"><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is missing, then <inline-formula id="ieqn-66"><mml:math id="mml-ieqn-66"><mml:msub><mml:mi>m</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:math></inline-formula>; otherwise, <inline-formula id="ieqn-67"><mml:math id="mml-ieqn-67"><mml:msub><mml:mi>m</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula>. Define <bold><italic>Z</italic></bold> as a random noise matrix of the same size as <bold><italic>X</italic></bold>, which assists the generator in producing imputed data. The cue matrix <bold><italic>H</italic></bold> is generated by blurring the binary mask matrix <bold><italic>M</italic></bold>. Specifically, each 1 in <bold><italic>M</italic></bold> (indicating that the entry is observed) is set to 0.5 with a certain probability, and each 0 in <bold><italic>M</italic></bold> (indicating that the entry is missing) is likewise set to 0.5 with a certain probability.</p></list-item>
<list-item><label>(2)</label><p>The Generator tries to estimate the values of the missing data based on the original data matrix <bold><italic>X</italic></bold>, the mask matrix <bold><italic>M</italic></bold>, and the random noise matrix <bold><italic>Z</italic></bold> and outputs the complete data matrix <inline-formula id="ieqn-68"><mml:math id="mml-ieqn-68"><mml:mrow><mml:mover><mml:mi mathvariant="bold-italic">X</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow></mml:math></inline-formula> as shown in the following equation:
<disp-formula id="eqn-A1"><label>(A1)</label><mml:math id="mml-eqn-A1" display="block"><mml:mrow><mml:mover><mml:mi mathvariant="bold-italic">X</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mo>=</mml:mo><mml:mi>G</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi mathvariant="bold-italic">X</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="bold-italic">Z</mml:mi><mml:mo>,</mml:mo><mml:mi mathvariant="bold-italic">M</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>
 <italic>G</italic> stands for the generator function, which is a fully connected multilayer feed-forward neural network used to learn the distribution of the original data and generate missing values. The generator&#x2019;s hidden layer usually uses the ReLU (Rectified Linear Unit) activation function, and its output layer usually uses the Softmax activation function.</p></list-item>
<list-item><label>(3)</label><p>The discriminator receives the generator output <inline-formula id="ieqn-69"><mml:math id="mml-ieqn-69"><mml:mrow><mml:mover><mml:mi mathvariant="bold-italic">X</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow></mml:math></inline-formula> and the cue matrix <bold><italic>H</italic></bold> as input, tries to determine whether each value in the data matrix is actual data or filler data produced by the generator, and calculates the probability that each value is actual data, as shown in the following equation:
<disp-formula id="eqn-A2"><label>(A2)</label><mml:math id="mml-eqn-A2" display="block"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>o</mml:mi><mml:mi>b</mml:mi><mml:mi>s</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>D</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mover><mml:mi mathvariant="bold-italic">X</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mo>,</mml:mo><mml:mi mathvariant="bold-italic">H</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>
 <italic>D</italic> stands for the discriminator function, a fully connected multilayer feedforward neural network that distinguishes actual data from generator-generated artificial data. The hidden layer of the discriminator usually uses the ReLU activation function, and its output layer usually uses the Sigmoid activation function.</p></list-item>
<list-item><label>(4)</label><p>In GAIN, the Discriminator and Generator are trained with separate loss functions, <inline-formula id="ieqn-70"><mml:math id="mml-ieqn-70"><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>D</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-71"><mml:math id="mml-ieqn-71"><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>G</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, respectively, as shown in the following equations:
<disp-formula id="eqn-A3"><label>(A3)</label><mml:math id="mml-eqn-A3" display="block"><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>D</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mi>E</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mi mathvariant="bold-italic">M</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mi>ln</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi>D</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mover><mml:mi mathvariant="bold-italic">X</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mo>,</mml:mo><mml:mi mathvariant="bold-italic">H</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi mathvariant="bold-italic">M</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x22C5;</mml:mo><mml:mi>ln</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>D</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mover><mml:mi mathvariant="bold-italic">X</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mo>,</mml:mo><mml:mi mathvariant="bold-italic">H</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></disp-formula>
<disp-formula id="eqn-A4"><label>(A4)</label><mml:math id="mml-eqn-A4" display="block"><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>G</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mi>E</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi mathvariant="bold-italic">M</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x22C5;</mml:mo><mml:mi>ln</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi>D</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mover><mml:mi mathvariant="bold-italic">X</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mo>,</mml:mo><mml:mi mathvariant="bold-italic">H</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></disp-formula>
The objective function of GAIN is:
<disp-formula id="eqn-A5"><label>(A5)</label><mml:math id="mml-eqn-A5" display="block"><mml:munder><mml:mo movablelimits="true" form="prefix">min</mml:mo><mml:mrow><mml:mi>G</mml:mi></mml:mrow></mml:munder><mml:munder><mml:mo movablelimits="true" form="prefix">max</mml:mo><mml:mrow><mml:mi>D</mml:mi></mml:mrow></mml:munder><mml:mi>V</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>D</mml:mi><mml:mo>,</mml:mo><mml:mi>G</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>E</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mi mathvariant="bold-italic">M</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mi>ln</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mi>D</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mover><mml:mi mathvariant="bold-italic">X</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mo>,</mml:mo><mml:mi mathvariant="bold-italic">H</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi mathvariant="bold-italic">M</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x22C5;</mml:mo><mml:mi>ln</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>D</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mover><mml:mi mathvariant="bold-italic">X</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mo>,</mml:mo><mml:mi mathvariant="bold-italic">H</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math></disp-formula></p>
</list-item>
<list-item><label>(5)</label><p>In the training process of GAIN, the objective function contains the mathematical expectation <italic>E</italic>. The objective of <italic>G</italic> is to minimize the function, while the objective of <italic>D</italic> is to maximize it. Through adversarial training, the two models gradually approach a Nash equilibrium, at which point <italic>D</italic> cannot accurately identify whether the input data are real data or imputed data generated by <italic>G</italic>.</p></list-item>
</list></p>
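<p>Steps (1) and (4) above can be sketched in a few lines. This is only an illustrative sketch: the expectation <italic>E</italic> is approximated by the mean over all matrix entries, the discriminator outputs are fixed toy numbers rather than the output of a trained network, and the names <monospace>make_cue</monospace>, <monospace>gain_losses</monospace>, and <monospace>reveal_prob</monospace> are hypothetical.</p>
<code language="python">
```python
import math
import random


def make_cue(mask, reveal_prob, rng):
    """Keep each mask entry with probability reveal_prob; otherwise blur it to 0.5."""
    return [[m if reveal_prob > rng.random() else 0.5 for m in row] for row in mask]


def gain_losses(mask, d_prob):
    """Evaluate Eqs. (A3) and (A4), taking E as the mean over all entries."""
    loss_d, loss_g, count = 0.0, 0.0, 0
    for mask_row, prob_row in zip(mask, d_prob):
        for m, p in zip(mask_row, prob_row):
            # Discriminator: reward high p on observed entries, low p on imputed ones.
            loss_d -= m * math.log(p) + (1 - m) * math.log(1 - p)
            # Generator: reward high p on the entries it imputed (m = 0).
            loss_g -= (1 - m) * math.log(p)
            count += 1
    return loss_d / count, loss_g / count


# Hypothetical 2x2 example with one missing entry (m = 0).
mask = [[1, 0], [1, 1]]
# Hypothetical discriminator outputs D(X_hat, H), each strictly between 0 and 1.
d_prob = [[0.9, 0.2], [0.8, 0.7]]

cue = make_cue(mask, reveal_prob=0.9, rng=random.Random(0))
loss_d, loss_g = gain_losses(mask, d_prob)
```
</code>
<p>In this toy case the discriminator assigns a low probability to the single imputed entry, so the generator loss is comparatively large; during adversarial training the generator would be updated to raise that probability.</p>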
</app>
</app-group>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhu</surname> <given-names>G</given-names></string-name>, <string-name><surname>Ding</surname> <given-names>J</given-names></string-name>, <string-name><surname>Wei</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Yi</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>SS-D</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>EQ</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Two-Stage OD flow prediction for emergency in urban rail transit</article-title>. <source>IEEE Trans Intell Transp Syst</source>. <year>2024</year>;<volume>25</volume>(<issue>1</issue>):<fpage>920</fpage>&#x2013;<lpage>8</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TITS.2023.3235413</pub-id>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhou</surname> <given-names>M</given-names></string-name>, <string-name><surname>Dong</surname> <given-names>H</given-names></string-name>, <string-name><surname>Ning</surname> <given-names>B</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>F</given-names></string-name></person-group>. <article-title>Parallel urban rail transit stations for passenger emergency management</article-title>. <source>IEEE Intell Syst</source>. <year>2020</year>;<volume>35</volume>(<issue>6</issue>):<fpage>16</fpage>&#x2013;<lpage>27</lpage>. doi:<pub-id pub-id-type="doi">10.1109/MIS.2019.2963192</pub-id>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhou</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Irizarry</surname> <given-names>J</given-names></string-name>, <string-name><surname>Guo</surname> <given-names>W</given-names></string-name></person-group>. <article-title>A network-based approach to modeling safety accidents and causations within the context of subway construction project management</article-title>. <source>Saf Sci</source>. <year>2021</year>;<volume>139</volume>(<issue>10</issue>):<fpage>105261</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.ssci.2021.105261</pub-id>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Xue</surname> <given-names>G</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>S</given-names></string-name>, <string-name><surname>Ren</surname> <given-names>L</given-names></string-name>, <string-name><surname>Gong</surname> <given-names>D</given-names></string-name></person-group>. <article-title>A data aggregation-based spatiotemporal model for rail transit risk path forecasting</article-title>. <source>Reliab Eng Syst Saf</source>. <year>2023</year>;<volume>239</volume>(<issue>2</issue>):<fpage>109530</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.ress.2023.109530</pub-id>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Xue</surname> <given-names>G</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>S</given-names></string-name>, <string-name><surname>Gong</surname> <given-names>D</given-names></string-name></person-group>. <article-title>Identifying abnormal riding behavior in urban rail transit: a survey on &#x201C;in-out&#x201D; in the same subway station</article-title>. <source>IEEE Trans Intell Transp Syst</source>. <year>2020</year>;<volume>23</volume>(<issue>4</issue>):<fpage>3201</fpage>&#x2013;<lpage>13</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TITS.2020.3032843</pub-id>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhu</surname> <given-names>G</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>R</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>EQ</given-names></string-name>, <string-name><surname>Law</surname> <given-names>R</given-names></string-name></person-group>. <article-title>Extraction of emergency elements and business process model of urban rail transit plans</article-title>. <source>IEEE Trans Comput Soc Syst</source>. <year>2024</year>;<volume>11</volume>(<issue>2</issue>):<fpage>1744</fpage>&#x2013;<lpage>52</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TCSS.2023.3235338</pub-id>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhu</surname> <given-names>G</given-names></string-name>, <string-name><surname>Huang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>R</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>R</given-names></string-name></person-group>. <article-title>Relationship extraction method for urban rail transit operation emergencies records</article-title>. <source>IEEE Trans Intell Veh</source>. <year>2023</year>;<volume>8</volume>(<issue>1</issue>):<fpage>520</fpage>&#x2013;<lpage>30</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TIV.2022.3160502</pub-id>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Jia</surname> <given-names>BB</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>ML</given-names></string-name></person-group>. <article-title>Multi-dimensional classification via decomposed label encoding</article-title>. <source>IEEE Trans Knowl Data Eng</source>. <year>2021</year>;<volume>35</volume>(<issue>2</issue>):<fpage>1844</fpage>&#x2013;<lpage>56</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TKDE.2021.3100436</pub-id>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>B</given-names></string-name>, <string-name><surname>Tang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Qi</surname> <given-names>X</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Li</surname> <given-names>C-G</given-names></string-name>, <string-name><surname>Xiao</surname> <given-names>R</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>EMU: effective multi-hot encoding net for lightweight scene text recognition with a large character set</article-title>. <source>IEEE Trans Circuits Syst Video Technol</source>. <year>2022</year>;<volume>32</volume>(<issue>8</issue>):<fpage>5374</fpage>&#x2013;<lpage>85</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TCSVT.2022.3146240</pub-id>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Guo</surname> <given-names>S</given-names></string-name>, <string-name><surname>Guo</surname> <given-names>W</given-names></string-name></person-group>. <article-title>Process monitoring and fault prediction in multivariate time series using bag-of-words</article-title>. <source>IEEE Trans Autom Sci Eng</source>. <year>2020</year>;<volume>19</volume>(<issue>1</issue>):<fpage>230</fpage>&#x2013;<lpage>42</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TASE.2020.3026065</pub-id>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Geng</surname> <given-names>S</given-names></string-name></person-group>. <article-title>Dynamic uncertain causality graph applied to dynamic fault diagnoses of large and complex systems</article-title>. <source>IEEE Trans Reliab</source>. <year>2015</year>;<volume>64</volume>(<issue>3</issue>):<fpage>910</fpage>&#x2013;<lpage>27</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TR.2015.2416332</pub-id>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Sheng</surname> <given-names>K</given-names></string-name>, <string-name><surname>Niu</surname> <given-names>P</given-names></string-name>, <string-name><surname>Chu</surname> <given-names>C</given-names></string-name>, <string-name><surname>Li</surname> <given-names>M</given-names></string-name>, <string-name><surname>Jia</surname> <given-names>L</given-names></string-name></person-group>. <article-title>A comprehensive analysis method of urban rail transit operation accidents and safety management strategies based on text big data</article-title>. <source>Saf Sci</source>. <year>2024</year>;<volume>172</volume>(<issue>1</issue>):<fpage>106400</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.ssci.2023.106400</pub-id>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Bertsimas</surname> <given-names>D</given-names></string-name>, <string-name><surname>Pawlowski</surname> <given-names>C</given-names></string-name>, <string-name><surname>Zhuo</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>From predictive methods to missing data imputation: an optimization approach</article-title>. <source>J Mach Learn Res</source>. <year>2018</year>;<volume>18</volume>(<issue>196</issue>):<fpage>1</fpage>&#x2013;<lpage>39</lpage>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Miao</surname> <given-names>X</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>L</given-names></string-name>, <string-name><surname>Gao</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Yin</surname> <given-names>J</given-names></string-name></person-group>. <article-title>An experimental survey of missing data imputation algorithms</article-title>. <source>IEEE Trans Knowl Data Eng</source>. <year>2023</year>;<volume>35</volume>(<issue>7</issue>):<fpage>6630</fpage>&#x2013;<lpage>50</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TKDE.2022.3186498</pub-id>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Henrickson</surname> <given-names>K</given-names></string-name>, <string-name><surname>Zou</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>Flexible and robust method for missing loop detector data imputation</article-title>. <source>Transp Res Rec</source>. <year>2015</year>;<volume>2527</volume>(<issue>1</issue>):<fpage>29</fpage>&#x2013;<lpage>36</lpage>. doi:<pub-id pub-id-type="doi">10.3141/2527-04</pub-id>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wang</surname> <given-names>R</given-names></string-name>, <string-name><surname>Su</surname> <given-names>M</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Q</given-names></string-name></person-group>. <article-title>Distributed nonparametric regression imputation for missing response problems with large-scale data</article-title>. <source>J Mach Learn Res</source>. <year>2023</year>;<volume>24</volume>(<issue>68</issue>):<fpage>1</fpage>&#x2013;<lpage>52</lpage>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Karmitsa</surname> <given-names>N</given-names></string-name>, <string-name><surname>Taheri</surname> <given-names>S</given-names></string-name>, <string-name><surname>Bagirov</surname> <given-names>A</given-names></string-name>, <string-name><surname>M&#x00E4;kinen</surname> <given-names>P</given-names></string-name></person-group>. <article-title>Missing value imputation via clusterwise linear regression</article-title>. <source>IEEE Trans Knowl Data Eng</source>. <year>2022</year>;<volume>34</volume>(<issue>4</issue>):<fpage>1889</fpage>&#x2013;<lpage>901</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TKDE.2020.3001694</pub-id>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>L</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>H</given-names></string-name>, <string-name><surname>Zhou</surname> <given-names>H</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>C</given-names></string-name></person-group>. <article-title>Missing data estimation method for time series data in structure health monitoring systems by probability principal component analysis</article-title>. <source>Adv Eng Softw</source>. <year>2020</year>;<volume>149</volume>(<issue>2</issue>):<fpage>102901</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.advengsoft.2020.102901</pub-id>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Seoane</surname> <given-names>SM</given-names></string-name>, <string-name><surname>Abreu</surname> <given-names>PH</given-names></string-name>, <string-name><surname>Fern&#x00E1;ndez</surname> <given-names>A</given-names></string-name>, <string-name><surname>Luengo</surname> <given-names>J</given-names></string-name>, <string-name><surname>Santos</surname> <given-names>J</given-names></string-name></person-group>. <article-title>The impact of heterogeneous distance functions on missing data imputation and classification performance</article-title>. <source>Eng Appl Artif Intel</source>. <year>2022</year>;<volume>111</volume>(<issue>3</issue>):<fpage>104791</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.engappai.2022.104791</pub-id>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Al-Helali</surname> <given-names>B</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Xue</surname> <given-names>B</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>M</given-names></string-name></person-group>. <article-title>A new imputation method based on genetic programming and weighted KNN for symbolic regression with incomplete data</article-title>. <source>Soft Comput</source>. <year>2021</year>;<volume>25</volume>(<issue>8</issue>):<fpage>5993</fpage>&#x2013;<lpage>6012</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s00500-021-05590-y</pub-id>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Lyngdoh</surname> <given-names>GA</given-names></string-name>, <string-name><surname>Zaki</surname> <given-names>M</given-names></string-name>, <string-name><surname>Krishnan</surname> <given-names>NMA</given-names></string-name>, <string-name><surname>Das</surname> <given-names>S</given-names></string-name></person-group>. <article-title>Prediction of concrete strengths enabled by missing data imputation and interpretable machine learning</article-title>. <source>Cem Concr Comp</source>. <year>2022</year>;<volume>128</volume>:<fpage>104414</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.cemconcomp.2022.104414</pub-id>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Yu</surname> <given-names>H</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>G</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>J</given-names></string-name></person-group>. <article-title>A Bayesian vector autoregression-based data analytics approach to enable irregularly-spaced mixed-frequency traffic collision data imputation with missing values</article-title>. <source>Transp Res C-Emer</source>. <year>2019</year>;<volume>108</volume>:<fpage>302</fpage>&#x2013;<lpage>19</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.trc.2019.09.013</pub-id>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Yoon</surname> <given-names>J</given-names></string-name>, <string-name><surname>Jordon</surname> <given-names>J</given-names></string-name>, <string-name><surname>van der Schaar</surname> <given-names>M</given-names></string-name></person-group>. <article-title>GAIN: missing data imputation using generative adversarial nets</article-title>. In: <conf-name>Proceedings of the 35th International Conference on Machine Learning</conf-name>; <year>2018 Jul 10&#x2013;15</year>; <publisher-loc>Stockholm, Sweden</publisher-loc>. p. <fpage>5689</fpage>&#x2013;<lpage>98</lpage>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Sun</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Li</surname> <given-names>J</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>T</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>X</given-names></string-name></person-group>. <article-title>Deep learning versus conventional methods for missing data imputation: a review and comparative study</article-title>. <source>Expert Syst Appl</source>. <year>2023</year>;<volume>227</volume>:<fpage>120201</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.eswa.2023.120201</pub-id>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Bernardini</surname> <given-names>M</given-names></string-name>, <string-name><surname>Doinychko</surname> <given-names>A</given-names></string-name>, <string-name><surname>Romeo</surname> <given-names>L</given-names></string-name>, <string-name><surname>Frontoni</surname> <given-names>E</given-names></string-name>, <string-name><surname>Amini</surname> <given-names>M-R</given-names></string-name></person-group>. <article-title>A novel missing data imputation approach based on clinical conditional Generative Adversarial Networks applied to EHR datasets</article-title>. <source>Comput Biol Med</source>. <year>2023</year>;<volume>163</volume>:<fpage>107188</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.compbiomed.2023.107188</pub-id>; <pub-id pub-id-type="pmid">37393785</pub-id></mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zheng</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Lu</surname> <given-names>P</given-names></string-name>, <string-name><surname>Tolliver</surname> <given-names>D</given-names></string-name></person-group>. <article-title>Decision tree approach to accident prediction for highway-rail grade crossings: empirical analysis</article-title>. <source>Transp Res Rec</source>. <year>2016</year>;<volume>2545</volume>(<issue>1</issue>):<fpage>115</fpage>&#x2013;<lpage>22</lpage>. doi:<pub-id pub-id-type="doi">10.3141/2545-12</pub-id>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Jha</surname> <given-names>AN</given-names></string-name>, <string-name><surname>Chatterjee</surname> <given-names>N</given-names></string-name>, <string-name><surname>Tiwari</surname> <given-names>G</given-names></string-name></person-group>. <article-title>A performance analysis of prediction techniques for impacting vehicles in hit-and-run road accidents</article-title>. <source>Accid Anal Prev</source>. <year>2021</year>;<volume>157</volume>(<issue>24</issue>):<fpage>106164</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.aap.2021.106164</pub-id>; <pub-id pub-id-type="pmid">33957476</pub-id></mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Lu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Fan</surname> <given-names>X</given-names></string-name>, <string-name><surname>Zhao</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Jiang</surname> <given-names>X</given-names></string-name></person-group>. <article-title>Dynamic fire risk classification prediction of stadiums: multi-dimensional machine learning analysis based on intelligent perception</article-title>. <source>Appl Sci</source>. <year>2022</year>;<volume>12</volume>(<issue>13</issue>):<fpage>6607</fpage>. doi:<pub-id pub-id-type="doi">10.3390/app12136607</pub-id>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Liu</surname> <given-names>X</given-names></string-name>, <string-name><surname>Saat</surname> <given-names>MR</given-names></string-name>, <string-name><surname>Barkan</surname> <given-names>CPL</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Freight-train derailment rates for railroad safety and risk analysis</article-title>. <source>Accid Anal Prev</source>. <year>2017</year>;<volume>98</volume>:<fpage>1</fpage>&#x2013;<lpage>9</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.aap.2016.09.012</pub-id>; <pub-id pub-id-type="pmid">27676241</pub-id></mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Deng</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Li</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Skitmore</surname> <given-names>M</given-names></string-name>, <string-name><surname>Zhou</surname> <given-names>Z</given-names></string-name></person-group>. <article-title>An incident database for improving metro safety: the case of Shanghai</article-title>. <source>Saf Sci</source>. <year>2016</year>;<volume>84</volume>(<issue>3</issue>):<fpage>88</fpage>&#x2013;<lpage>96</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.ssci.2015.11.023</pub-id>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Ganaie</surname> <given-names>MA</given-names></string-name>, <string-name><surname>Hu</surname> <given-names>M</given-names></string-name>, <string-name><surname>Malik</surname> <given-names>AK</given-names></string-name>, <string-name><surname>Tanveer</surname> <given-names>M</given-names></string-name>, <string-name><surname>Suganthan</surname> <given-names>PN</given-names></string-name></person-group>. <article-title>Ensemble deep learning: a review</article-title>. <source>Eng Appl Artif Intel</source>. <year>2022</year>;<volume>115</volume>:<fpage>105151</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.engappai.2022.105151</pub-id>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Dong</surname> <given-names>X</given-names></string-name>, <string-name><surname>Yu</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Cao</surname> <given-names>W</given-names></string-name>, <string-name><surname>Shi</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Ma</surname> <given-names>Q</given-names></string-name></person-group>. <article-title>A survey on ensemble learning</article-title>. <source>Front Comput Sci</source>. <year>2020</year>;<volume>14</volume>(<issue>2</issue>):<fpage>241</fpage>&#x2013;<lpage>58</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s11704-019-8208-z</pub-id>.</mixed-citation></ref>
<ref id="ref-33"><label>[33]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Mohammed</surname> <given-names>A</given-names></string-name>, <string-name><surname>Kora</surname> <given-names>R</given-names></string-name></person-group>. <article-title>A comprehensive review on ensemble deep learning: opportunities and challenges</article-title>. <source>J King Saud Univ-Comput Inf Sci</source>. <year>2023</year>;<volume>35</volume>(<issue>2</issue>):<fpage>757</fpage>&#x2013;<lpage>74</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.jksuci.2023.01.014</pub-id>.</mixed-citation></ref>
<ref id="ref-34"><label>[34]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Meng</surname> <given-names>H</given-names></string-name>, <string-name><surname>Tong</surname> <given-names>X</given-names></string-name>, <string-name><surname>Zheng</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Xie</surname> <given-names>G</given-names></string-name>, <string-name><surname>Ji</surname> <given-names>W</given-names></string-name>, <string-name><surname>Hei</surname> <given-names>X</given-names></string-name></person-group>. <article-title>Railway accident prediction strategy based on ensemble learning</article-title>. <source>Accid Anal Prev</source>. <year>2022</year>;<volume>176</volume>(<issue>6</issue>):<fpage>106817</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.aap.2022.106817</pub-id>; <pub-id pub-id-type="pmid">36057162</pub-id></mixed-citation></ref>
<ref id="ref-35"><label>[35]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wang</surname> <given-names>J</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>D</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>F</given-names></string-name>, <string-name><surname>Yoo</surname> <given-names>CK</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>H</given-names></string-name></person-group>. <article-title>Soft sensor for predicting indoor PM2.5 concentration in subway with adaptive boosting deep learning model</article-title>. <source>J Hazard Mater</source>. <year>2024</year>;<volume>465</volume>(<issue>24</issue>):<fpage>133074</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.jhazmat.2023.133074</pub-id>; <pub-id pub-id-type="pmid">38029591</pub-id></mixed-citation></ref>
<ref id="ref-36"><label>[36]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Bjerga</surname> <given-names>T</given-names></string-name>, <string-name><surname>Aven</surname> <given-names>T</given-names></string-name>, <string-name><surname>Zio</surname> <given-names>E</given-names></string-name></person-group>. <article-title>Uncertainty treatment in risk analysis of complex systems: the cases of STAMP and FRAM</article-title>. <source>Reliab Eng Syst Safe</source>. <year>2016</year>;<volume>156</volume>(<issue>1</issue>):<fpage>203</fpage>&#x2013;<lpage>9</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.ress.2016.08.004</pub-id>.</mixed-citation></ref>
<ref id="ref-37"><label>[37]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Salmon</surname> <given-names>PM</given-names></string-name>, <string-name><surname>Hulme</surname> <given-names>A</given-names></string-name>, <string-name><surname>Walker</surname> <given-names>GH</given-names></string-name>, <string-name><surname>Waterson</surname> <given-names>P</given-names></string-name>, <string-name><surname>Berber</surname> <given-names>E</given-names></string-name>, <string-name><surname>Stanton</surname> <given-names>NA</given-names></string-name></person-group>. <article-title>The big picture on accident causation: a review, synthesis and meta-analysis of AcciMap studies</article-title>. <source>Saf Sci</source>. <year>2020</year>;<volume>126</volume>(<issue>4</issue>):<fpage>104650</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.ssci.2020.104650</pub-id>.</mixed-citation></ref>
<ref id="ref-38"><label>[38]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Xie</surname> <given-names>X</given-names></string-name>, <string-name><surname>Shu</surname> <given-names>X</given-names></string-name>, <string-name><surname>Fu</surname> <given-names>G</given-names></string-name>, <string-name><surname>Shen</surname> <given-names>S</given-names></string-name>, <string-name><surname>Jia</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Hu</surname> <given-names>J</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Accident causes data-driven coal and gas outburst accidents prevention: application of data mining and machine learning in accident path mining and accident case-based deduction</article-title>. <source>Process Saf Env</source>. <year>2022</year>;<volume>162</volume>:<fpage>891</fpage>&#x2013;<lpage>913</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.psep.2022.04.059</pub-id>.</mixed-citation></ref>
<ref id="ref-39"><label>[39]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhu</surname> <given-names>G</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>R</given-names></string-name>, <string-name><surname>Fan</surname> <given-names>J</given-names></string-name>, <string-name><surname>Li</surname> <given-names>F</given-names></string-name>, <string-name><surname>Hou</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Yu</surname> <given-names>H</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Coupling effect and chain evolution of urban rail transit emergencies</article-title>. <source>IEEE Trans Intell Transp Syst</source>. <year>2024</year>;<volume>25</volume>(<issue>1</issue>):<fpage>1044</fpage>&#x2013;<lpage>53</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TITS.2023.3283100</pub-id>.</mixed-citation></ref>
<ref id="ref-40"><label>[40]</label><mixed-citation publication-type="other"><article-title>National emergency response plan for emergencies in urban rail transit operations [Internet]. Beijing, China: General Office of the State Council</article-title>; <comment>2015 [cited 2025 Jan 08]</comment>. Available from: <ext-link ext-link-type="uri" xlink:href="https://www.gov.cn/zhengce/content/2015-05/14/content_9751.htm">https://www.gov.cn/zhengce/content/2015-05/14/content_9751.htm</ext-link>.</mixed-citation></ref>
<ref id="ref-41"><label>[41]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Tay</surname> <given-names>JK</given-names></string-name>, <string-name><surname>Narasimhan</surname> <given-names>B</given-names></string-name>, <string-name><surname>Hastie</surname> <given-names>T</given-names></string-name></person-group>. <article-title>Elastic net regularization paths for all generalized linear models</article-title>. <source>J Stat Softw</source>. <year>2023</year>;<volume>106</volume>(<issue>1</issue>). doi:<pub-id pub-id-type="doi">10.18637/jss.v106.i01</pub-id>; <pub-id pub-id-type="pmid">37138589</pub-id></mixed-citation></ref>
<ref id="ref-42"><label>[42]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>R&#x00E4;tsch</surname> <given-names>G</given-names></string-name>, <string-name><surname>Onoda</surname> <given-names>T</given-names></string-name>, <string-name><surname>M&#x00FC;ller</surname> <given-names>KR</given-names></string-name></person-group>. <article-title>Soft margins for AdaBoost</article-title>. <source>Mach Learn</source>. <year>2001</year>;<volume>42</volume>(<issue>3</issue>):<fpage>287</fpage>&#x2013;<lpage>320</lpage>. doi:<pub-id pub-id-type="doi">10.1023/A:1007618119488</pub-id>.</mixed-citation></ref>
<ref id="ref-43"><label>[43]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Frank</surname> <given-names>E</given-names></string-name>, <string-name><surname>Trigg</surname> <given-names>L</given-names></string-name>, <string-name><surname>Holmes</surname> <given-names>G</given-names></string-name>, <string-name><surname>Witten</surname> <given-names>IH</given-names></string-name></person-group>. <article-title>Naive Bayes for regression</article-title>. <source>Mach Learn</source>. <year>2000</year>;<volume>41</volume>(<issue>1</issue>):<fpage>5</fpage>&#x2013;<lpage>25</lpage>. doi:<pub-id pub-id-type="doi">10.1023/A:1007670802811</pub-id>.</mixed-citation></ref>
<ref id="ref-44"><label>[44]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Feng</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Development and application of artificial neural network</article-title>. <source>Wirel Pers Commun</source>. <year>2018</year>;<volume>102</volume>(<issue>2</issue>):<fpage>1645</fpage>&#x2013;<lpage>56</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s11277-017-5224-x</pub-id>.</mixed-citation></ref>
<ref id="ref-45"><label>[45]</label><mixed-citation publication-type="other"><article-title>Smart Nation: Singapore&#x2019;s Intelligent Transport System (ITS) [Internet]. Singapore: The ASEAN Post</article-title>; <comment>2018 [cited 2025 Jan 08]</comment>. Available from: <ext-link ext-link-type="uri" xlink:href="https://theaseanpost.com/article/smart-nation-singapores-intelligent-transport-system-its">https://theaseanpost.com/article/smart-nation-singapores-intelligent-transport-system-its</ext-link>.</mixed-citation></ref>
</ref-list>
</back></article>