<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">74009</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2025.074009</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>An Integrated Attention-BiLSTM Approach for Probabilistic Remaining Useful Life Prediction</article-title>
<alt-title alt-title-type="left-running-head">An Integrated Attention-BiLSTM Approach for Probabilistic Remaining Useful Life Prediction</alt-title>
<alt-title alt-title-type="right-running-head">An Integrated Attention-BiLSTM Approach for Probabilistic Remaining Useful Life Prediction</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author">
<name name-style="western"><surname>Zhu</surname><given-names>Bo</given-names></name><xref ref-type="author-notes" rid="afn1">#</xref></contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Dong</surname><given-names>Enzhi</given-names></name><xref ref-type="author-notes" rid="afn1">#</xref></contrib>
<contrib id="author-3" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Cheng</surname><given-names>Zhonghua</given-names></name><xref rid="cor1" ref-type="corresp">&#x002A;</xref><email>a15032073178@sina.com</email></contrib>
<contrib id="author-4" contrib-type="author">
<name name-style="western"><surname>Jiang</surname><given-names>Kexin</given-names></name></contrib>
<contrib id="author-5" contrib-type="author">
<name name-style="western"><surname>Guo</surname><given-names>Chiming</given-names></name></contrib>
<contrib id="author-6" contrib-type="author">
<name name-style="western"><surname>Yue</surname><given-names>Shuai</given-names></name></contrib>
<aff id="aff-1"><institution>Shijiazhuang Campus of Army Engineering University of PLA</institution>, <addr-line>Shijiazhuang, 050003</addr-line>, <country>China</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Zhonghua Cheng. Email: <email>a15032073178@sina.com</email></corresp>
<fn id="afn1">
<p><sup>#</sup>These authors contributed equally to this work</p>
</fn>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2026</year>
</pub-date>
<pub-date date-type="pub" publication-format="electronic">
<day>10</day><month>2</month><year>2026</year>
</pub-date>
<volume>87</volume>
<issue>1</issue>
<elocation-id>38</elocation-id>
<history>
<date date-type="received">
<day>30</day>
<month>09</month>
<year>2025</year>
</date>
<date date-type="accepted">
<day>14</day>
<month>11</month>
<year>2025</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2026 The Authors.</copyright-statement>
<copyright-year>2026</copyright-year>
<copyright-holder>Published by Tech Science Press.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_74009.pdf"></self-uri>
<abstract>
<p>Accurate prediction of remaining useful life (RUL) provides a reliable basis for maintenance strategies, effectively reducing both the frequency of failures and the associated costs. As a core component of prognostics and health management (PHM), RUL prediction plays a crucial role in preventing equipment failures and optimizing maintenance decision-making. However, deep learning models often falter when processing raw, noisy temporal signals, fail to quantify prediction uncertainty, and struggle to capture the nonlinear dynamics of equipment degradation. To address these issues, this study proposes a novel deep learning framework. First, a bidirectional long short-term memory (BiLSTM) network integrated with an attention mechanism is designed to enhance temporal feature extraction with improved noise robustness. Second, a probabilistic prediction framework based on kernel density estimation (KDE) is constructed, incorporating residual connections and stochastic regularization to achieve precise RUL estimation. Finally, extensive experiments on the C-MAPSS dataset demonstrate that the proposed method achieves competitive performance in terms of the RMSE and Score metrics compared with state-of-the-art models. More importantly, the probabilistic output provides a quantifiable measure of prediction confidence, which is crucial for risk-informed maintenance planning, enabling managers to optimize maintenance strategies based on a quantified understanding of failure risk.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Bidirectional long short-term memory network</kwd>
<kwd>attention mechanism</kwd>
<kwd>kernel density estimation</kwd>
<kwd>remaining useful life prediction</kwd>
</kwd-group>
<funding-group>
<award-group id="awg1">
<funding-source>scientific research projects</funding-source>
<award-id>JY2024B011</award-id>
</award-group>
</funding-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>In modern industrial systems, Prognostics and Health Management (PHM) plays a crucial role in maintaining equipment functionality, reducing failure rates, and minimizing maintenance costs. By leveraging advanced sensing technologies, PHM collects operational data from equipment, which is then processed through techniques such as data analysis and information fusion to assess real-time health status and predict potential failures. These insights support managerial decision-making regarding maintenance actions [<xref ref-type="bibr" rid="ref-1">1</xref>,<xref ref-type="bibr" rid="ref-2">2</xref>].</p>
<p>During operation, equipment is subject to performance degradation due to various external destructive factors, ultimately leading to failure. Research on Remaining Useful Life (RUL) focuses on analyzing performance degradation characteristics to forecast mechanical failures and prevent potential accidents. The total lifespan of equipment refers to the duration from its initial normal state through progressive degradation until functional failure occurs. RUL is defined as the expected operational time from a given inspection point until failure occurs, under the condition of no maintenance. RUL prediction models utilize degradation trajectories and integrate current and historical data to estimate the remaining operational life effectively. However, conventional techniques such as Convolutional Neural Networks (CNNs) and Long Short-Term Memory (LSTM) networks are often applied directly to raw signals that contain random noise, which can significantly impair their performance. Given the importance of temporal features, Bidirectional Long Short-Term Memory (BiLSTM) has been widely adopted in RUL prediction due to its superior capability in capturing temporal dependencies [<xref ref-type="bibr" rid="ref-3">3</xref>,<xref ref-type="bibr" rid="ref-4">4</xref>]. As illustrated in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>, which displays four randomly selected engine operation sequences from the C-MAPSS dataset, noticeable noise is commonly present in the raw temporal signals. Such noise induces substantial signal fluctuations and leads to highly degraded encoded sequence information, thereby reducing RUL prediction accuracy.</p>
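<p>As a concrete illustration (not taken from this article), RUL targets for a run-to-failure trajectory are commonly constructed as a piecewise-linear countdown with an early-life cap, reflecting the assumption that degradation is negligible at first. A minimal Python sketch, assuming the cap of 125 cycles often used for C-MAPSS:</p>

```python
import numpy as np

def rul_labels(total_cycles, cap=125):
    """Piecewise-linear RUL targets for one run-to-failure trajectory.

    RUL counts down to 0 at the failure cycle; early-life values are
    clipped at `cap`, a common convention for the C-MAPSS benchmark.
    """
    rul = np.arange(total_cycles - 1, -1, -1)  # ..., 2, 1, 0 at failure
    return np.minimum(rul, cap)

# A hypothetical engine that fails after 200 cycles
labels = rul_labels(200)
```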
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>Raw time-series signals with widespread random noise</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_74009-fig-1.tif"/>
</fig>
<p>Although BiLSTM networks and attention mechanisms have been extensively explored, most existing studies focus only on achieving more accurate point predictions [<xref ref-type="bibr" rid="ref-5">5</xref>,<xref ref-type="bibr" rid="ref-6">6</xref>]. In contrast, this study addresses the paradigm shift from point prediction to probabilistic distribution prediction. Notably, the proposed Kernel Density Estimation (KDE) integration framework converts the multiple deterministic prediction points output by the deep learning model into an interpretable probability distribution. This constitutes a key innovation that distinguishes our work from previous efforts, which merely refined network structures to improve point prediction accuracy.</p>
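<p>The conversion step can be sketched as follows: given a set of RUL prediction samples, a Gaussian KDE sums a kernel centred on each sample to obtain a smooth density. The snippet below is a minimal numpy illustration with synthetic samples and Silverman's rule-of-thumb bandwidth; it is not the exact configuration used in the article.</p>

```python
import numpy as np

def gaussian_kde_pdf(samples, grid, bandwidth=None):
    """Gaussian kernel density estimate of `samples`, evaluated on `grid`.

    Uses Silverman's rule of thumb for the bandwidth when none is given.
    """
    n = samples.size
    if bandwidth is None:
        bandwidth = 1.06 * samples.std(ddof=1) * n ** (-1 / 5)
    # Sum of Gaussian kernels, one centred on each prediction sample
    diffs = (grid[:, None] - samples[None, :]) / bandwidth
    return np.exp(-0.5 * diffs**2).sum(axis=1) / (n * bandwidth * np.sqrt(2 * np.pi))

rng = np.random.default_rng(0)
preds = 100 + 5 * rng.standard_normal(50)   # hypothetical RUL samples (cycles)
grid = np.linspace(80, 120, 401)
pdf = gaussian_kde_pdf(preds, grid)
point_estimate = grid[pdf.argmax()]          # mode of the estimated distribution
```

<p>Besides the mode as a point estimate, the same density yields credible intervals, giving maintenance planners an explicit measure of prediction confidence.</p>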
<p>To address these issues, this study proposes a deep network integrating BiLSTM with an attention mechanism and a prediction framework based on KDE. This combined approach enables comprehensive training and testing of real-time feature data from equipment degradation, thereby providing more scientific and accurate RUL predictions. This work seeks to promote the broader application of deep learning technologies in PHM and enhance equipment reliability and operational safety. By shifting the paradigm from deterministic point estimation to probabilistic risk assessment, the proposed framework equips managers with not only a predicted RUL but also a clear understanding of the associated uncertainty, thereby enabling more scientific and cost-effective maintenance decisions.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Related Work</title>
<p>Addressing the complexity of industrial AI, machine learning has emerged as a vital tool for fault diagnosis and smart factory health management. Key contributions collectively underscore this trend: Raouf et al. [<xref ref-type="bibr" rid="ref-7">7</xref>] provided a systemic overview of PHM in smart factories, while Kumar et al. [<xref ref-type="bibr" rid="ref-8">8</xref>] reviewed algorithmic strategies for robotics and machinery. On the implementation front, Raouf et al. proposed a fault classification system using motor current analysis [<xref ref-type="bibr" rid="ref-9">9</xref>] and later a feature aggregation network for detecting bearing faults in industrial robots [<xref ref-type="bibr" rid="ref-10">10</xref>]. RUL prediction aims to estimate the operational lifespan remaining for critical components of equipment by analyzing degradation trajectories and leveraging time-series data from multiple sensors (e.g., vibration signals). Conventional RUL prediction approaches can be broadly categorized into two groups: physics-based models and data-driven methods [<xref ref-type="bibr" rid="ref-11">11</xref>,<xref ref-type="bibr" rid="ref-12">12</xref>].</p>
<p>In recent years, deep learning-based methods have gained widespread adoption. These approaches treat the equipment as a black box, automatically learning high-level abstract features from raw time-series signals through multi-type and multi-dimensional neural network layers. This eliminates the need for manual feature engineering, making them particularly suitable for complex systems with high-dimensional and unstructured operational data. Common architectures include CNN-based, LSTM-based [<xref ref-type="bibr" rid="ref-3">3</xref>,<xref ref-type="bibr" rid="ref-4">4</xref>], and temporal convolutional network (TCN)-based [<xref ref-type="bibr" rid="ref-13">13</xref>&#x2013;<xref ref-type="bibr" rid="ref-15">15</xref>] models. In practice, hybrid models combining multiple neural networks are often developed to enhance performance. For instance, Zha et al. [<xref ref-type="bibr" rid="ref-13">13</xref>] proposed an RUL prediction model integrating XGBoost for feature extraction and an improved TCN. He et al. [<xref ref-type="bibr" rid="ref-16">16</xref>] developed a bearing RUL prediction model using a dual correlation adaptive gated graph convolutional network (DCAGGCN). Liao et al. [<xref ref-type="bibr" rid="ref-17">17</xref>] introduced an LSTM-based feedforward neural network (LSTM-FNN) with a bootstrap method (LSTMBS) for uncertainty prediction in RUL estimation. Xiang et al. [<xref ref-type="bibr" rid="ref-18">18</xref>] developed a multicellular LSTM (MCLSTM) structure based on layer-partitioned and multi-cell units, establishing a deep learning framework for RUL prediction. Chen et al. [<xref ref-type="bibr" rid="ref-19">19</xref>] proposed a novel approach that improves the accuracy of bearing RUL prediction by employing a prototypical network to identify abrupt changes in health states. Yu et al. [<xref ref-type="bibr" rid="ref-20">20</xref>] integrated similarity curve matching into a bidirectional recurrent network-based autoencoder scheme for RUL estimation.
Zhang et al. [<xref ref-type="bibr" rid="ref-21">21</xref>] devised a bidirectional gated recurrent unit with a temporal self-attention mechanism (BiGRU-TSAM), which assigns self-learned weights to each considered time instance for RUL prediction. Al-Dulaimi et al. [<xref ref-type="bibr" rid="ref-22">22</xref>] presented a hybrid deep neural network (HDNN) framework, reported as the first to integrate two deep learning models in parallel.</p>
<p>Currently, numerous studies have integrated deep learning networks with attention mechanisms [<xref ref-type="bibr" rid="ref-23">23</xref>&#x2013;<xref ref-type="bibr" rid="ref-25">25</xref>]. Han et al. [<xref ref-type="bibr" rid="ref-26">26</xref>] proposed a novel parallel convolutional LSTM dual-attention (PCLD) model incorporating both long short-term memory networks and a dual-attention mechanism. Zhang et al. [<xref ref-type="bibr" rid="ref-27">27</xref>] combined slow feature analysis-assisted attention with a dual-LSTM network to enhance feature representation. Liu et al. [<xref ref-type="bibr" rid="ref-28">28</xref>] adopted a two-phase training strategy: the model is first pre-trained to capture general equipment operating states, and then fine-tuned on a single dataset to adapt to specific working conditions, while a sequential feature attention mechanism is employed to integrate multi-dimensional time-series data.</p>
<p>Despite advances in deep learning for RUL prediction, existing methods still struggle with two key challenges in real industrial settings: limited robustness under complex noise and fluctuating conditions, which obscures true degradation trends, and the inability to shift from deterministic point estimates to probabilistic uncertainty quantification, leaving prediction reliability unassessed.</p>
<p>To critically synthesize and analyze the work covered in existing literature, this study summarizes the differences between deterministic models, probabilistic models, and hybrid models in terms of core ideas, advantages, limitations, and representative methods, as presented in <xref ref-type="table" rid="table-1">Table 1</xref>.</p>
<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>Comparison of differences between different model types</title>
</caption>
<table>
<colgroup>
<col align="center" width="20mm"/>
<col align="center" width="50mm"/>
<col align="center" width="35mm"/>
<col align="center" width="35mm"/> </colgroup>
<thead>
<tr>
<th>Type</th>
<th>Core idea</th>
<th>Advantages</th>
<th>Limitations</th>
</tr>
</thead>
<tbody>
<tr>
<td>Deterministic model</td>
<td>A deterministic mapping function for single RUL point estimates</td>
<td>Strong feature learning for point prediction</td>
<td>Sensitive to noise; unable to quantify prediction uncertainty</td>
</tr>
<tr>
<td>Probabilistic model</td>
<td>Learning the probability distribution of RUL to characterize and quantify prediction uncertainty</td>
<td>Provides uncertainty information</td>
<td>Limited in complex temporal feature extraction and noise suppression</td>
</tr>
<tr>
<td>Hybrid model</td>
<td>Integrating the strong feature learning capability of deterministic models with the uncertainty quantification ability of probabilistic models</td>
<td>Balances prediction accuracy and uncertainty quantification</td>
<td>Higher computational complexity; requires careful fusion strategy design</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>This paper proposes an integrated deep learning framework for the probabilistic prediction of RUL of mechanical equipment. At the conceptual level, this work shifts the paradigm from deterministic point estimation to probabilistic distribution prediction. The key innovation lies in integrating KDE to convert multiple deterministic model outputs into a complete and interpretable probability density function, thereby transforming the prediction from a single value into a quantified uncertainty. On the practical front, our method provides a full probability distribution that enables risk-informed decision-making, moving beyond a single point estimate of unknown reliability. This allows maintenance strategies to evolve from simplistic threshold-based triggers to dynamic, risk-based planning, offering significant practical value for optimizing operational economics and safety. The main innovations can be summarized as follows:</p>
<p><italic>An Attention-BiLSTM architecture</italic></p>
<p>The proposed Attention-BiLSTM architecture leverages bidirectional layers to capture both historical and future context, overcoming the limitation of unidirectional models in learning long-term, bidirectional dependencies. An integrated additive attention mechanism automatically prioritizes critical degradation phases while suppressing irrelevant noise.</p>
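<p>The additive attention step can be illustrated with a short numpy sketch: each hidden state is scored, the scores are normalized with a softmax over time steps, and the weighted sum yields a context vector. The parameter names W and v below are illustrative; the exact parameterization in the proposed model may differ.</p>

```python
import numpy as np

def additive_attention(H, W, v):
    """Additive (Bahdanau-style) attention over BiLSTM outputs.

    H: (T, d) hidden states; W: (d, d) projection; v: (d,) scoring vector.
    Returns the attention weights (T,) and the context vector (d,).
    """
    scores = np.tanh(H @ W) @ v            # e_t = v^T tanh(W h_t)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()               # softmax over the T time steps
    context = weights @ H                  # weighted sum of hidden states
    return weights, context

rng = np.random.default_rng(1)
T, d = 30, 8                               # 30 time steps, 8 hidden units
H = rng.standard_normal((T, d))
weights, context = additive_attention(H, rng.standard_normal((d, d)),
                                      rng.standard_normal(d))
```

<p>Time steps carrying informative degradation cues receive larger weights, while noisy steps are down-weighted, which is what allows the mechanism to suppress irrelevant fluctuations.</p>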
<p><italic>A probabilistic prediction framework based on KDE</italic></p>
<p>This framework generates a set of prediction samples for KDE by performing multiple samplings from the posterior distribution of network weights. Applying KDE to these generated samples yields a full probability density distribution of the RUL. This synergistic design ensures stable model optimization while constituting an end-to-end probabilistic prediction solution.</p>
<p><italic>A shift from deterministic point prediction to probabilistic distribution prediction</italic></p>
<p>While most existing studies focus on improving point prediction accuracy of RUL through network structure enhancements, the proposed probabilistic framework not only provides a point estimate of RUL but also enables the quantification of prediction uncertainty, offering a reliable basis for risk-informed maintenance decision-making.</p>
</sec>
<sec id="s3">
<label>3</label>
<title>Methodology</title>
<sec id="s3_1">
<label>3.1</label>
<title>Deep Neural Network Based on BiLSTM with Attention Mechanism</title>
<p>The workflow of the proposed probabilistic RUL prediction framework is illustrated in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>. The figure depicts the end-to-end data flow: raw multi-sensor time-series data are preprocessed and segmented with a sliding window; the BiLSTM layer performs bidirectional feature encoding; the attention layer computes weights for the key time steps; the fully connected layer outputs deterministic predictions; Monte Carlo Dropout is kept active for multiple stochastic forward passes to model uncertainty; and finally KDE fuses all sampled predictions into a probability density distribution. This distribution provides a basis for risk quantification in decision-making.</p>
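<p>The Monte Carlo Dropout stage can be sketched in a few lines: dropout remains active at inference, and repeated stochastic forward passes produce a spread of predictions whose dispersion reflects model uncertainty. The toy single-layer "network" below is an assumption for illustration only, not the architecture of the proposed framework.</p>

```python
import numpy as np

rng = np.random.default_rng(42)

def mc_dropout_predict(x, W, b, p=0.2, n_samples=100):
    """Monte Carlo Dropout: run several stochastic forward passes with
    dropout active; the spread of the outputs reflects epistemic
    uncertainty and feeds the subsequent KDE step."""
    preds = []
    for _ in range(n_samples):
        mask = rng.random(x.shape) >= p                     # random dropout mask
        preds.append(float((x * mask / (1 - p)) @ W + b))   # inverted-dropout scaling
    return np.array(preds)

x = np.ones(16)                 # hypothetical feature vector from the encoder
W = rng.standard_normal(16)     # stand-in for trained output-layer weights
samples = mc_dropout_predict(x, W, b=50.0)
mean_rul, std_rul = samples.mean(), samples.std()
```

<p>In the full framework, the resulting sample set is passed to KDE to obtain the probability density of the RUL rather than being summarized by the mean alone.</p>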
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>Flowchart of the probabilistic RUL prediction framework</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_74009-fig-2.tif"/>
</fig>
<p>To address the challenge of predicting the RUL of equipment, a deep fusion network incorporating a BiLSTM architecture is proposed, with the aim of enhancing the extraction of temporal features and complex patterns. The core idea leverages the bidirectional temporal modeling capability of BiLSTM to fully capture contextual relationships in time-series data through the integration of both forward and backward directions. This mechanism overcomes the limitations of traditional unidirectional LSTM in modeling long-range dependencies and significantly improves the capture of dynamic correlations within time-series data. The workflow is illustrated in <xref ref-type="fig" rid="fig-3">Fig. 3</xref>.</p>
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Workflow diagram of fusion network model</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_74009-fig-3.tif"/>
</fig>
<p>BiLSTM has demonstrated strong capabilities in capturing sequential dependencies. The computational process for the forward module of the model is given as follows [<xref ref-type="bibr" rid="ref-3">3</xref>]:
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:mover><mml:mi>i</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>&#x03C3;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mover><mml:mi>&#x03C9;</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mi>i</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mover><mml:mi>x</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mover><mml:mi>&#x03C9;</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mi>h</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mover><mml:mi>h</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:msub><mml:mover><mml:mi>b</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:mover><mml:mi>f</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>&#x03C3;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mover><mml:mi>&#x03C9;</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mi>i</mml:mi><mml:mi>f</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mover><mml:mi>x</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mover><mml:mi>&#x03C9;</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mi>h</mml:mi><mml:mi>f</mml:mi></mml:mrow></mml:msub><mml:mover><mml:mi>h</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:msub><mml:mover><mml:mi>b</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:mover><mml:mi>g</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>tanh</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mover><mml:mi>&#x03C9;</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mi>i</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mover><mml:mi>x</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mover><mml:mi>&#x03C9;</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mi>h</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msub><mml:mover><mml:mi>h</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:msub><mml:mover><mml:mi>b</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mi>g</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:mover><mml:mi>o</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>&#x03C3;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mover><mml:mi>&#x03C9;</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mi>i</mml:mi><mml:mi>o</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mover><mml:mi>x</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mover><mml:mi>&#x03C9;</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mi>h</mml:mi><mml:mi>o</mml:mi></mml:mrow></mml:msub><mml:mover><mml:mi>h</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:msub><mml:mover><mml:mi>b</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mi>o</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="eqn-5"><label>(5)</label><mml:math id="mml-eqn-5" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:mover><mml:mi>c</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mover><mml:mi>f</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2299;</mml:mo><mml:mover><mml:mi>c</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mover><mml:mi>i</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2299;</mml:mo><mml:mover><mml:mi>g</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="eqn-6"><label>(6)</label><mml:math id="mml-eqn-6" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:mover><mml:mi>h</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mover><mml:mi>o</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2299;</mml:mo><mml:mi>tanh</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mover><mml:mi>c</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
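<p>For clarity, the forward cell defined by Eqs. (1)&#x2013;(6) can be transcribed literally into numpy. The weight and bias names below mirror the equations (input weights w, recurrent weights u, biases b per gate) and are illustrative rather than taken from the article's implementation.</p>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev,
              w_i, u_i, b_i, w_f, u_f, b_f,
              w_g, u_g, b_g, w_o, u_o, b_o):
    """One forward LSTM step following Eqs. (1)-(6)."""
    i = sigmoid(w_i @ x_t + u_i @ h_prev + b_i)   # input gate, Eq. (1)
    f = sigmoid(w_f @ x_t + u_f @ h_prev + b_f)   # forget gate, Eq. (2)
    g = np.tanh(w_g @ x_t + u_g @ h_prev + b_g)   # candidate state, Eq. (3)
    o = sigmoid(w_o @ x_t + u_o @ h_prev + b_o)   # output gate, Eq. (4)
    c = f * c_prev + i * g                        # cell state update, Eq. (5)
    h = o * np.tanh(c)                            # hidden state, Eq. (6)
    return h, c

rng = np.random.default_rng(0)
d_in, d_h = 3, 4                                  # toy dimensions
Ws = [rng.standard_normal((d_h, d_in)) for _ in range(4)]
Us = [rng.standard_normal((d_h, d_h)) for _ in range(4)]
bs = [np.zeros(d_h) for _ in range(4)]
h1, c1 = lstm_step(rng.standard_normal(d_in), np.zeros(d_h), np.zeros(d_h),
                   Ws[0], Us[0], bs[0], Ws[1], Us[1], bs[1],
                   Ws[2], Us[2], bs[2], Ws[3], Us[3], bs[3])
```

<p>The backward module of the BiLSTM runs the same recurrence over the reversed sequence, and the two hidden states are concatenated at each time step.</p>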
<p>The backward module of the BiLSTM is formulated analogously, with the temporal sequence reversed:
<disp-formula id="eqn-7"><label>(7)</label><mml:math id="mml-eqn-7" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:mover><mml:mi>i</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>&#x03C3;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mover><mml:mi>&#x03C9;</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mi>i</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mover><mml:mi>x</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mover><mml:mi>&#x03C9;</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mi>h</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mover><mml:mi>h</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:msub><mml:mover><mml:mi>b</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="eqn-8"><label>(8)</label><mml:math id="mml-eqn-8" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:mover><mml:mi>f</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>&#x03C3;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mover><mml:mi>&#x03C9;</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mi>i</mml:mi><mml:mi>f</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mover><mml:mi>x</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mover><mml:mi>&#x03C9;</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mi>h</mml:mi><mml:mi>f</mml:mi></mml:mrow></mml:msub><mml:mover><mml:mi>h</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:msub><mml:mover><mml:mi>b</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mi>f</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="eqn-9"><label>(9)</label><mml:math id="mml-eqn-9" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:mover><mml:mi>g</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>tanh</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mover><mml:mi>&#x03C9;</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mi>i</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mover><mml:mi>x</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mover><mml:mi>&#x03C9;</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mi>h</mml:mi><mml:mi>g</mml:mi></mml:mrow></mml:msub><mml:mover><mml:mi>h</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:msub><mml:mover><mml:mi>b</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mi>g</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="eqn-10"><label>(10)</label><mml:math id="mml-eqn-10" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:mover><mml:mi>o</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>&#x03C3;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mover><mml:mi>&#x03C9;</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mi>i</mml:mi><mml:mi>o</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mover><mml:mi>x</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mover><mml:mi>&#x03C9;</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mi>h</mml:mi><mml:mi>o</mml:mi></mml:mrow></mml:msub><mml:mover><mml:mi>h</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:msub><mml:mover><mml:mi>b</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mi>o</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="eqn-11"><label>(11)</label><mml:math id="mml-eqn-11" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:mover><mml:mi>c</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mover><mml:mi>f</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2299;</mml:mo><mml:mover><mml:mi>c</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mover><mml:mi>i</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2299;</mml:mo><mml:mover><mml:mi>g</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="eqn-12"><label>(12)</label><mml:math id="mml-eqn-12" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:mover><mml:mi>h</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mover><mml:mi>o</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2299;</mml:mo><mml:mi>tanh</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mover><mml:mi>c</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>In <xref ref-type="disp-formula" rid="eqn-1">Eqs. (1)</xref>&#x2013;<xref ref-type="disp-formula" rid="eqn-12">(12)</xref>, <italic>i</italic>(<italic>t</italic>), <italic>f</italic>(<italic>t</italic>), <italic>g</italic>(<italic>t</italic>), and <italic>o</italic>(<italic>t</italic>) represent the input gate, forget gate, candidate cell state, and output gate at time <italic>t</italic>, respectively; <italic>h</italic>(<italic>t</italic>) and <italic>c</italic>(<italic>t</italic>) denote the hidden state and cell state at time <italic>t</italic>, respectively; <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:mi>&#x03C3;</mml:mi></mml:math></inline-formula> denotes the sigmoid function, <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:mo>&#x2299;</mml:mo></mml:math></inline-formula> denotes the Hadamard product, <italic>w</italic><sub><italic>ii</italic></sub>, <italic>w</italic><sub><italic>hi</italic></sub>, <italic>w</italic><sub><italic>if</italic></sub>, <italic>w</italic><sub><italic>hf</italic></sub>, <italic>w</italic><sub><italic>io</italic></sub>, <italic>w</italic><sub><italic>ho</italic></sub>, <italic>w</italic><sub><italic>ig</italic></sub>, <italic>w</italic><sub><italic>hg</italic></sub> are learnable weights, <italic>b</italic><sub><italic>i</italic></sub>, <italic>b</italic><sub><italic>f</italic></sub>, <italic>b</italic><sub><italic>g</italic></sub>, <italic>b</italic><sub><italic>o</italic></sub> are bias terms, &#x2192; indicates the forward direction, and &#x2190; indicates the backward direction. The above equations collectively constitute the core computational mechanism of the BiLSTM network. Through a series of differentiable control structures&#x2014;including the input gate, forget gate, and output gate&#x2014;the network adaptively manages the state of its internal memory cells. 
The forget gate <italic>f</italic>(<italic>t</italic>) determines how much information to retain from the historical memory cell <italic>c</italic>(<italic>t</italic> &#x2212; 1) based on the current input and the previous hidden state; a value closer to 1 indicates more memory retention. The input gate <italic>i</italic>(<italic>t</italic>) works in conjunction with the candidate state <italic>g</italic>(<italic>t</italic>) to decide how much new information from the current input should be updated into the memory cell <italic>c</italic>(<italic>t</italic>). Finally, the output gate <italic>o</italic>(<italic>t</italic>) controls how much information from the current memory cell <italic>c</italic>(<italic>t</italic>) is output as the hidden state <italic>h</italic>(<italic>t</italic>) and passed to subsequent layers. By processing sequences along both forward and backward time axes, the BiLSTM can integrate contextual information at each time step.</p>
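<p>The gate computations in Eqs. (7)&#x2013;(12) can be sketched numerically. The following is a minimal single-step LSTM cell update in NumPy for one direction; all weight shapes, names, and values are illustrative and not taken from the paper&#x2019;s implementation.</p>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_step(x_t, h_prev, c_prev, w_in, w_hid, b):
    """One LSTM cell update following Eqs. (7)-(12) for a single direction.

    w_in stacks the input-to-gate weights (i, f, g, o), w_hid the
    hidden-to-gate weights, and b the four bias vectors.
    """
    w_ii, w_if, w_ig, w_io = w_in
    w_hi, w_hf, w_hg, w_ho = w_hid
    b_i, b_f, b_g, b_o = b

    i_t = sigmoid(w_ii @ x_t + w_hi @ h_prev + b_i)   # input gate
    f_t = sigmoid(w_if @ x_t + w_hf @ h_prev + b_f)   # forget gate
    g_t = np.tanh(w_ig @ x_t + w_hg @ h_prev + b_g)   # candidate cell state
    o_t = sigmoid(w_io @ x_t + w_ho @ h_prev + b_o)   # output gate

    c_t = f_t * c_prev + i_t * g_t   # cell state update (Hadamard products)
    h_t = o_t * np.tanh(c_t)         # hidden state passed to the next step
    return h_t, c_t

# Toy dimensions: input size 3, hidden size 4
rng = np.random.default_rng(0)
w_in = 0.1 * rng.standard_normal((4, 4, 3))
w_hid = 0.1 * rng.standard_normal((4, 4, 4))
b = np.zeros((4, 4))
h, c = lstm_cell_step(rng.standard_normal(3), np.zeros(4), np.zeros(4),
                      w_in, w_hid, b)
```

<p>Running the same update right-to-left with a second set of weights gives the backward states of Eqs. (7)&#x2013;(12).</p>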
<p>The BiLSTM integrates two oppositely oriented LSTM layers to capture long-term temporal dependencies from bidirectional data. After processing through multiple LSTM layers, the model employs data smoothing to filter out noise interference from the raw input signals, thereby facilitating more accurate extraction of temporal features and improving the precision of RUL prediction. The architecture of the model is illustrated in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>, and its final output is expressed as follows:</p>
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>Schematic diagram of the BiLSTM model workflow</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_74009-fig-4.tif"/>
</fig>
<p><disp-formula id="eqn-13"><label>(13)</label><mml:math id="mml-eqn-13" display="block"><mml:mi>h</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mover><mml:mi>h</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2295;</mml:mo><mml:mover><mml:mi>h</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mrow><mml:mo>(</mml:mo><mml:mi>t</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
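<p>Eq. (13) simply concatenates the two directional hidden states at each time step. A small NumPy sketch (with hypothetical hidden states) illustrates the resulting dimensionality:</p>

```python
import numpy as np

# Hypothetical per-step hidden states from the two directional LSTMs;
# in the model these come from the forward and backward passes.
T, m = 5, 3
rng = np.random.default_rng(0)
h_forward = rng.standard_normal((T, m))    # left-to-right pass
h_backward = rng.standard_normal((T, m))   # right-to-left pass

# Eq. (13): concatenate both directions at every time step,
# doubling the per-step feature dimension from m to 2m.
h = np.concatenate([h_forward, h_backward], axis=1)
```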
<p>The Additive Attention Layer, a classical implementation of the attention mechanism originally proposed by Bahdanau et al. in 2015 (also referred to as Bahdanau attention), computes the relevance between queries and keys through learnable parameters. Its core idea involves mapping queries and keys to the same dimension via separate linear transformations, followed by an additive operation and a nonlinear activation function to derive the attention weights [<xref ref-type="bibr" rid="ref-29">29</xref>]. For a given input sequence <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:mrow><mml:mo>{</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula>, this encoder produces a forward hidden state sequence <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:mrow><mml:mo>{</mml:mo><mml:mover><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mover><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">&#x2192;</mml:mo></mml:mover><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula> and a backward hidden state sequence <inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:mrow><mml:mo>{</mml:mo><mml:mover><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mover><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">&#x2190;</mml:mo></mml:mover><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula>. 
The encoded representation of any element <italic>x</italic><sub><italic>j</italic></sub> is formed by concatenating its corresponding forward and backward hidden states. The attention mechanism contains an alignment model that measures how well the input annotation <italic>h</italic><sub><italic>j</italic></sub> matches the output hidden state <italic>s</italic><sub><italic>i</italic>&#x2212;1</sub>. The alignment score is calculated as follows:
<disp-formula id="eqn-14"><label>(14)</label><mml:math id="mml-eqn-14" display="block"><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:mi>tan</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mi>h</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>U</mml:mi><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>m</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> and <inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:msub><mml:mi>U</mml:mi><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mn>2</mml:mn><mml:mi>m</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> are learnable weight matrices, <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>a</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is a learnable weight vector that compresses the result to a scalar, and <italic>m</italic> is the dimension of the attention hidden layer.</p>
<p>The attention weights are obtained by normalizing the alignment scores with the <italic>softmax</italic> function; each weight represents the probability that the input <italic>x</italic><sub><italic>j</italic></sub> is aligned with the output <italic>y</italic><sub><italic>i</italic></sub>.
<disp-formula id="eqn-15"><label>(15)</label><mml:math id="mml-eqn-15" display="block"><mml:msub><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mtext>soft</mml:mtext></mml:mrow><mml:mo movablelimits="true" form="prefix">max</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mfrac><mml:msub><mml:mi>e</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:msqrt><mml:mi>m</mml:mi></mml:msqrt></mml:mfrac><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>As the output of the attention mechanism, the context vector is calculated as a weighted sum of the annotations <italic>h</italic><sub><italic>j</italic></sub>:
<disp-formula id="eqn-16"><label>(16)</label><mml:math id="mml-eqn-16" display="block"><mml:mi>C</mml:mi><mml:mi>o</mml:mi><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></disp-formula></p>
<p>The alignment model in <xref ref-type="disp-formula" rid="eqn-14">Eq. (14)</xref> learns to evaluate the relevance between the hidden state <italic>h</italic><sub><italic>j</italic></sub> (at the <italic>j</italic>-th time step of the input sequence) and the current prediction target. Subsequently, these scores are normalized into a probability distribution <inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:msub><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> via the <italic>softmax</italic> function in <xref ref-type="disp-formula" rid="eqn-15">Eq. (15)</xref>. Finally, as shown in <xref ref-type="disp-formula" rid="eqn-16">Eq. (16)</xref>, the hidden states of all time steps are weighted and summed according to their attention weights, generating a context vector <italic>Con</italic><sub><italic>i</italic></sub> that condenses the key information of the sequence.</p>
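<p>Eqs. (14)&#x2013;(16) can be combined into one short routine. The sketch below implements additive attention in NumPy under the stated shapes (<italic>W</italic><sub><italic>a</italic></sub> is m &#x00D7; m, <italic>U</italic><sub><italic>a</italic></sub> is m &#x00D7; 2m); all inputs are randomly generated for illustration only.</p>

```python
import numpy as np

def additive_attention(s_prev, H, W_a, U_a, v_a):
    """Additive (Bahdanau-style) attention following Eqs. (14)-(16).

    s_prev : (m,)    previous output-side hidden state s_{i-1}
    H      : (N, 2m) BiLSTM annotations h_j (forward ++ backward)
    Returns the attention weights alpha_ij and the context vector.
    """
    m = s_prev.shape[0]
    # Alignment scores e_ij = v_a^T tanh(W_a s_{i-1} + U_a h_j), Eq. (14)
    e = np.tanh(W_a @ s_prev + H @ U_a.T) @ v_a        # shape (N,)
    # Scaled softmax normalization, Eq. (15)
    z = e / np.sqrt(m)
    alpha = np.exp(z - z.max())
    alpha /= alpha.sum()
    # Context vector = weighted sum of annotations, Eq. (16)
    context = alpha @ H                                 # shape (2m,)
    return alpha, context

# Random toy inputs: hidden size m = 4, sequence length N = 6
rng = np.random.default_rng(1)
m, N = 4, 6
alpha, ctx = additive_attention(
    rng.standard_normal(m),            # s_{i-1}
    rng.standard_normal((N, 2 * m)),   # annotations h_1..h_N
    rng.standard_normal((m, m)),       # W_a
    rng.standard_normal((m, 2 * m)),   # U_a
    rng.standard_normal(m),            # v_a
)
```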
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>KDE-Based Probabilistic RUL Prediction Framework</title>
<p>The innovation of the proposed prediction framework lies in the synergistic mechanism between residual learning and stochastic regularization. The residual pathway enables cross-layer propagation of raw signals through parallel linear transformations, which in essence establishes a shortcut channel for gradient flow. Experimental results demonstrate that this design effectively mitigates the gradient vanishing problem when handling long-term temporal dependencies in sensor signals. Meanwhile, LayerNorm standardizes the activation values of neurons, maintaining the data distribution of each layer&#x2019;s output within a stable range. Since each forward propagation with Dropout active effectively samples from the posterior distribution of network weights, the predictive distribution obtained through repeated sampling naturally incorporates model epistemic uncertainty. This introduces Bayesian inference into conventional neural networks [<xref ref-type="bibr" rid="ref-30">30</xref>] and enables probabilistic outputs without altering the base architecture. The selection of the KDE method is primarily based on two considerations. First, as a non-parametric method, KDE does not require prior assumptions about the distribution shape of predictions, enabling more flexible fitting of complex distribution patterns (e.g., skewed or multimodal distributions) that may occur in real-world data. Second, it offers high computational efficiency, making it particularly suitable for combination with Monte Carlo Dropout sampling and thus more feasible and practical for engineering applications.</p>
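<p>As an illustration of the KDE step, the following NumPy sketch fits a non-parametric density to a set of hypothetical MC-Dropout RUL samples using Gaussian kernels; the bandwidth (Silverman&#x2019;s rule of thumb) and all sample values are assumptions for illustration, not the paper&#x2019;s reported settings.</p>

```python
import numpy as np

def gaussian_kde_1d(samples, grid, bandwidth=None):
    """Minimal 1-D Gaussian kernel density estimate.

    With no bandwidth given, Silverman's rule of thumb is applied
    (an illustrative default, not the paper's reported setting).
    """
    n = samples.size
    if bandwidth is None:
        bandwidth = 1.06 * samples.std(ddof=1) * n ** (-1.0 / 5.0)
    # Average of Gaussian kernels centred on each prediction sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    return np.exp(-0.5 * z ** 2).sum(axis=1) / (n * bandwidth * np.sqrt(2.0 * np.pi))

# Hypothetical RUL predictions (in cycles) from 200 stochastic
# forward passes for a single test unit.
rng = np.random.default_rng(2)
rul_samples = 120.0 + 8.0 * rng.standard_normal(200)

grid = np.linspace(rul_samples.min() - 30.0, rul_samples.max() + 30.0, 500)
density = gaussian_kde_1d(rul_samples, grid)

# Key quantiles give a 90% predictive interval for maintenance decisions
lo, hi = np.percentile(rul_samples, [5.0, 95.0])
```

<p>Because the estimate is a sum of kernels, skewed or multimodal MC-Dropout samples produce correspondingly shaped densities without any parametric assumption.</p>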
<p>Theoretical analysis indicates that the generalization capability of this architecture stems from the synergy of multiple regularization mechanisms: Dropout provides a model averaging effect in deep networks, LayerNorm maintains feature distribution stability, and residual connections ensure optimization feasibility. Each forward pass utilizes a randomly generated Dropout mask, effectively sampling a distinct sub-network from the learned weight posterior. This process yields a discrete set of <italic>T</italic> prediction points, capturing the epistemic uncertainty arising from model parameters. Residual connections are incorporated after the second BiLSTM layer, with their outputs summed to those of the attention layer to mitigate gradient vanishing in deep networks. Stochastic regularization is implemented via Dropout layers applied after each BiLSTM layer and the first fully-connected layer, with a dropout rate of 0.2 determined through grid search.</p>
<p>For practical decision-making, key quantiles can be easily derived from the probability density distribution to define a predictive confidence interval. By evaluating this failure risk, strategies like condition-based maintenance can be optimized to balance the costs of unexpected downtime against those of premature component replacement. The main workflow is illustrated in <xref ref-type="fig" rid="fig-5">Fig. 5</xref>.</p>
<fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>Implementation flowchart of the RUL prediction framework</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_74009-fig-5.tif"/>
</fig>
<p>Following the approach in reference [<xref ref-type="bibr" rid="ref-31">31</xref>], in Bayesian learning, for a given input <italic>x&#x002A;</italic>, obtaining its predictive distribution requires marginalizing over all model parameters:
<disp-formula id="eqn-17"><label>(17)</label><mml:math id="mml-eqn-17" display="block"><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mi>y</mml:mi><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup><mml:mrow><mml:mo>|</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:mi>Y</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mo>&#x222B;</mml:mo><mml:mspace width="negativethinmathspace" /><mml:mspace width="negativethinmathspace" /><mml:mspace width="negativethinmathspace" /><mml:mspace width="negativethinmathspace" /><mml:mspace width="negativethinmathspace" /><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mi>y</mml:mi><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup><mml:mrow><mml:mo>|</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mi>&#x03C9;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03C9;</mml:mi><mml:mrow><mml:mo>|</mml:mo><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:mtext>Y</mml:mtext></mml:mrow><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mi>d</mml:mi><mml:mi>&#x03C9;</mml:mi></mml:math></disp-formula>where <inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:mi>&#x03C9;</mml:mi><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mo>{</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mi>N</mml:mi><mml:mo 
stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>}</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> denotes the set of random variables for an L-layer model.</p>
<p>To approximate the predictive distribution, <italic>T</italic> sets of vectors are sampled from a Bernoulli distribution with probability <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:mrow><mml:mo>{</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>L</mml:mi></mml:mrow></mml:msub><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula>, enabling empirical estimation of the first two moments:
<disp-formula id="eqn-18"><label>(18)</label><mml:math id="mml-eqn-18" display="block"><mml:msub><mml:mi>E</mml:mi><mml:mrow><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mtext>y</mml:mtext></mml:mrow><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>y</mml:mtext></mml:mrow><mml:mo>&#x2217;</mml:mo><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2248;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>T</mml:mi></mml:mfrac><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:munderover><mml:msup><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mover><mml:mi>N</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mover><mml:mi>N</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>This Monte Carlo estimator is referred to as MC-Dropout. In practice, it is equivalent to performing <italic>T</italic> stochastic forward passes through the deep network and averaging the outputs, which yields well-founded uncertainty estimates. Alternatively, the integration can be approximated by averaging the network weights (i.e., scaling each <italic>W</italic><sub><italic>i</italic></sub> by <italic>p</italic><sub><italic>i</italic></sub> at test time). For regression tasks, the predictive mean and variance can be computed as:
<disp-formula id="eqn-19"><label>(19)</label><mml:math id="mml-eqn-19" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:msub><mml:mi>E</mml:mi><mml:mrow><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mtext>y</mml:mtext></mml:mrow><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mrow><mml:mtext>y</mml:mtext></mml:mrow><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mrow><mml:mtext>y</mml:mtext></mml:mrow><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2248;</mml:mo><mml:msup><mml:mi>&#x03C2;</mml:mi><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:msub><mml:mrow><mml:mtext mathvariant="bold">I</mml:mtext></mml:mrow><mml:mrow><mml:mi>D</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>T</mml:mi></mml:mfrac><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:munderover><mml:msup><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo 
stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mover><mml:mi>N</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mover><mml:mi>N</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:msup><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mover><mml:mi>N</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mover><mml:mi>N</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="eqn-20"><label>(20)</label><mml:math id="mml-eqn-20" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:mi>V</mml:mi><mml:mi>a</mml:mi><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mtext>y</mml:mtext></mml:mrow><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mrow><mml:mtext>y</mml:mtext></mml:mrow><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2248;</mml:mo><mml:msup><mml:mi>&#x03C2;</mml:mi><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:msub><mml:mrow><mml:mtext mathvariant="bold">I</mml:mtext></mml:mrow><mml:mrow><mml:mi>D</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>T</mml:mi></mml:mfrac><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:munderover><mml:msup><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mover><mml:mi>N</mml:mi><mml:mo 
stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mover><mml:mi>N</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup><mml:msup><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mover><mml:mi>N</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mover><mml:mi>N</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>L</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>E</mml:mi><mml:mrow><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mtext>y</mml:mtext></mml:mrow><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup><mml:mo fence="true" stretchy="true" 
symmetric="true"></mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mrow><mml:mtext>y</mml:mtext></mml:mrow><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msup></mml:mtd></mml:mtr><mml:mtr><mml:mtd /><mml:mtd><mml:mspace width="1em" /><mml:mspace width="1em" /><mml:mspace width="1em" /><mml:mspace width="1em" /><mml:mspace width="1em" /><mml:msub><mml:mi>E</mml:mi><mml:mrow><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mtext>y</mml:mtext></mml:mrow><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mrow><mml:mtext>y</mml:mtext></mml:mrow><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>where the predictive uncertainty is estimated via the sample variance over <italic>T</italic> forward passes plus the inverse model precision, denoted by <inline-formula id="ieqn-12"><mml:math id="mml-ieqn-12"><mml:mi>&#x03C2;</mml:mi></mml:math></inline-formula>. Since the model precision is often related to the ratio between learning rate <italic>r</italic><sub>1</sub> and weight decay <italic>r</italic><sub>2</sub>, we define <inline-formula id="ieqn-13"><mml:math id="mml-ieqn-13"><mml:mi>&#x03C2;</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>.</p>
<p>This prediction framework performs multiple forward passes with Dropout kept active at inference time, drawing <italic>T</italic> stochastic predictions from each model in the ensemble to approximate Bayesian inference. The final prediction is obtained by averaging all sampled outputs, while the spread of the resulting probability density distribution reflects the model&#x2019;s predictive uncertainty. Dropout applied in this way can be interpreted as a Bayesian approximation of a Gaussian process; its key advantage lies in the provision of predictive distributions rather than point estimates, which is crucial for risk assessment in RUL prediction applications. Theoretical studies have also demonstrated that residual connections can enhance the optimization behavior of the network.</p>
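The variance estimate of Eq. (20) can be sketched as follows. This is a minimal illustration, not the paper's network: the fixed linear layer stands in for the trained model, and the dropout rate and the learning-rate/weight-decay values defining the precision <italic>&#x03C2;</italic> = <italic>r</italic><sub>1</sub>/<italic>r</italic><sub>2</sub> are assumed for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the trained network: a fixed linear map over 14 sensor
# features. The real model is the BiLSTM fusion network; this is illustrative.
W = rng.normal(size=14)

def forward(x, drop_p=0.2):
    """One stochastic forward pass with a Bernoulli dropout mask on the input."""
    mask = rng.random(x.shape) >= drop_p            # keep units with prob 1 - p
    return float((x * mask / (1.0 - drop_p)) @ W)   # inverted-dropout scaling

def mc_dropout_predict(x, T=100, r1=1e-3, r2=1e-5):
    """Predictive mean and variance per Eq. (20), with precision s = r1 / r2."""
    samples = np.array([forward(x) for _ in range(T)])
    mean = samples.mean()
    var = (r2 / r1) + samples.var()  # inverse precision + sample variance over T passes
    return mean, var

x_star = rng.normal(size=14)        # one 14-sensor input vector (synthetic)
mu, var = mc_dropout_predict(x_star)
```

The inverse-precision term acts as a variance floor, so even if all passes agreed exactly the reported uncertainty would not collapse to zero.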
<p>In engineering practice, when applying KDE, an ensemble strategy is adopted: each of the 10 base models performs 100 independent predictions, resulting in a total of 1000 predictive samples. These predictions are processed via KDE to generate probability distribution curves of the RUL. Research indicates that under large-sample conditions, the choice of kernel function has limited influence on the shape of the probability density function. The mathematical formulation can be expressed as follows:
<disp-formula id="eqn-21"><label>(21)</label><mml:math id="mml-eqn-21" display="block"><mml:mrow><mml:mover><mml:mi>f</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover><mml:mfrac><mml:mn>1</mml:mn><mml:mi>h</mml:mi></mml:mfrac><mml:mi>&#x03BA;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mfrac><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mi>h</mml:mi></mml:mfrac><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mi>h</mml:mi><mml:msqrt><mml:mn>2</mml:mn><mml:mi>&#x03C0;</mml:mi></mml:msqrt></mml:mrow></mml:mfrac><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac><mml:mo stretchy="false">(</mml:mo><mml:mfrac><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mi>h</mml:mi></mml:mfrac><mml:mrow><mml:msup><mml:mo stretchy="false">)</mml:mo><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mrow></mml:msup></mml:math></disp-formula></p>
<p>In <xref ref-type="disp-formula" rid="eqn-21">Eq. (21)</xref>, <inline-formula id="ieqn-14"><mml:math id="mml-ieqn-14"><mml:mi>&#x03BA;</mml:mi></mml:math></inline-formula> represents the Gaussian kernel function, <inline-formula id="ieqn-15"><mml:math id="mml-ieqn-15"><mml:mfrac><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mi>h</mml:mi></mml:mfrac></mml:math></inline-formula> denotes the normalized distance variable, and <italic>h</italic> is the bandwidth parameter, which controls the smoothness of the resulting density estimate. We adopted the commonly used Scott&#x2019;s rule to adaptively determine the optimal bandwidth, which provides excellent smoothing for data approximating a normal distribution. The specific formula is <inline-formula id="ieqn-16"><mml:math id="mml-ieqn-16"><mml:mi>h</mml:mi><mml:mo>=</mml:mo><mml:msup><mml:mi>n</mml:mi><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mi>d</mml:mi><mml:mo>+</mml:mo><mml:mn>4</mml:mn></mml:mrow></mml:mfrac></mml:mrow></mml:msup></mml:math></inline-formula>, where <italic>n</italic> denotes the number of samples and <italic>d</italic> represents the data dimension.</p>
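Eq. (21) can be implemented directly in a few lines of numpy, using the stated Scott bandwidth <italic>h</italic> = <italic>n</italic><sup>&#x2212;1/(<italic>d</italic>+4)</sup> with <italic>d</italic> = 1. The 1000 RUL samples below are synthetic stand-ins for the ensemble's 10 &#x00D7; 100 predictions, assumed only for demonstration.

```python
import numpy as np

def gaussian_kde(samples, grid):
    """Gaussian KDE of Eq. (21) with Scott's bandwidth h = n**(-1/(d+4)), d = 1."""
    n = len(samples)
    h = n ** (-1.0 / 5.0)                        # d = 1 for scalar RUL samples
    u = (grid[:, None] - samples[None, :]) / h   # normalized distances (x - x_i) / h
    k = np.exp(-0.5 * u**2) / np.sqrt(2 * np.pi) # Gaussian kernel
    return k.sum(axis=1) / (n * h)

# Synthetic stand-in for 10 base models x 100 stochastic passes = 1000 samples.
rng = np.random.default_rng(1)
rul_samples = rng.normal(loc=62.0, scale=3.0, size=1000)

grid = np.linspace(40, 85, 451)
density = gaussian_kde(rul_samples, grid)
mode = grid[np.argmax(density)]                  # most probable RUL on the grid
```

The density integrates to one over the grid, and its mode serves as the point estimate while its spread quantifies prediction risk.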
<p>Unlike deterministic RUL prediction models, the proposed framework employs KDE to model the probability density distribution of the RUL prediction, thereby enabling uncertainty quantification, an essential feature for supporting maintenance decision-making.</p>
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Case Study</title>
<sec id="s4_1">
<label>4.1</label>
<title>Dataset Description</title>
<p>To validate the effectiveness of the proposed RUL prediction method, time-series data from 21 sensors in the C-MAPSS dataset [<xref ref-type="bibr" rid="ref-32">32</xref>] were utilized, with the aircraft engine model illustrated in <xref ref-type="fig" rid="fig-6">Fig. 6</xref>. These sensors capture measurements such as temperature, pressure, and rotational speed from various locations within the engine. By simulating turbofan engine degradation under diverse conditions and faults, the dataset captures complex performance dynamics. The significant variation in complexity and fault types among its subsets makes them an ideal testbed for a thorough evaluation of model adaptability and generalizability. Detailed information on the four subsets of C-MAPSS is provided in <xref ref-type="table" rid="table-2">Table 2</xref>. As shown in the table, FD002 and FD004 are more complex than the other subsets, containing more training and testing trajectories and involving six different operating conditions. Following the setup in [<xref ref-type="bibr" rid="ref-4">4</xref>], sensor outputs that remain constant throughout the lifecycle, and therefore provide no useful information for RUL prediction, were discarded. Thus, 14 sensor signals were selected for both training and testing, specifically from sensors numbered 2, 3, 4, 7, 8, 9, 11, 12, 13, 14, 15, 17, 20, and 21.</p>
<fig id="fig-6">
<label>Figure 6</label>
<caption>
<title>Schematic diagram of the C-MAPSS aero-engine model</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_74009-fig-6.tif"/>
</fig><table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>Information of the C-MAPSS dataset</title>
</caption>
<table>
<colgroup>
<col align="center" width="45mm"/>
<col align="center" width="14mm"/>
<col align="center" width="14mm"/>
<col align="center" width="14mm"/>
<col align="center" width="14mm"/> </colgroup>
<thead>
<tr>
<th>Data subset</th>
<th>FD001</th>
<th>FD002</th>
<th>FD003</th>
<th>FD004</th>
</tr>
</thead>
<tbody>
<tr>
<td>Operating condition</td>
<td>1</td>
<td>6</td>
<td>1</td>
<td>6</td>
</tr>
<tr>
<td>Fault mode</td>
<td>1</td>
<td>1</td>
<td>2</td>
<td>2</td>
</tr>
<tr>
<td>Training set trajectory</td>
<td>100</td>
<td>260</td>
<td>100</td>
<td>249</td>
</tr>
<tr>
<td>Test set trajectory</td>
<td>100</td>
<td>259</td>
<td>100</td>
<td>248</td>
</tr>
<tr>
<td>Training sample</td>
<td>17,731</td>
<td>46,219</td>
<td>21,820</td>
<td>54,028</td>
</tr>
<tr>
<td>Test sample</td>
<td>13,096</td>
<td>33,991</td>
<td>16,596</td>
<td>41,214</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>In accordance with [<xref ref-type="bibr" rid="ref-4">4</xref>], a sliding window of size 30 was applied to segment the sensor data. For sequences shorter than 30 time steps, the segmented data were padded using the first available value. To direct the model&#x2019;s focus toward the critical degradation phase near failure, the RUL labels in early stages were truncated to a constant value. A sliding step of 1 was adopted to maximize training data utilization by generating a large number of overlapping windows, thereby enriching the training samples. Data normalization, which has been shown to be crucial in RUL prediction, was performed using min-max scaling on all sensor readings, with the maximum RUL value set to 125.</p>
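The segmentation and labeling steps above can be sketched as follows; the function names are hypothetical, and a synthetic single-engine run stands in for the C-MAPSS sensor data.

```python
import numpy as np

WINDOW, MAX_RUL = 30, 125

def make_windows(signals):
    """Segment a (time, sensors) array into overlapping windows of length 30.

    Sequences shorter than the window are front-padded with their first row;
    the sliding step is 1, so a run of length T yields max(T - 29, 1) windows.
    """
    t = signals.shape[0]
    if t < WINDOW:  # pad at the front with the first available value
        pad = np.repeat(signals[:1], WINDOW - t, axis=0)
        signals = np.vstack([pad, signals])
        t = WINDOW
    return np.stack([signals[i:i + WINDOW] for i in range(t - WINDOW + 1)])

def piecewise_rul(total_life):
    """True RUL per cycle, truncated to a constant 125 in the early stage."""
    return np.minimum(np.arange(total_life - 1, -1, -1), MAX_RUL)

run = np.random.default_rng(2).normal(size=(200, 14))  # one synthetic engine run
X = make_windows(run)                                  # (171, 30, 14) windows
y = piecewise_rul(200)[WINDOW - 1:]                    # label = RUL at window end
```

Each window is paired with the RUL at its final time step, so early windows all carry the truncated label of 125 while windows near failure count down to zero.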
</sec>
<sec id="s4_2">
<label>4.2</label>
<title>Network Hyperparameters</title>
<p>The hyperparameter configuration of the proposed deep fusion network reflects a carefully balanced design tailored for time-series regression tasks. The input window length was set to 30 time steps, sufficiently capturing engine degradation trends without introducing excessive noise. The corresponding 14 input features represent the most predictive sensor indicators retained after preprocessing. The hidden architecture comprises two LSTM layers, each with 50 units, providing adequate memory capacity to learn temporal dependencies while controlling model complexity to avoid overfitting. Subsequent fully connected layers consist of 96 and 128 neurons, respectively. This progressively expanding structure facilitates hierarchical feature abstraction, enabling the extraction of high-level patterns from sequential encodings for accurate prediction.</p>
<p>The Adam optimizer was selected for its ability to adaptively adjust the learning rate without manual scheduling, starting from an initial learning rate of 0.001. The MSE loss function was used to heavily penalize large prediction errors. An early stopping mechanism with a patience of 10 epochs was employed to terminate training promptly when the validation loss ceased to improve and to restore the optimal weights, effectively preventing overfitting. A learning rate decay mechanism was also adopted, reducing the learning rate to one-tenth of its current value if no improvement was observed after 5 epochs. For model training, the batch size was set to 32 and the number of training epochs to 100, with all experiments conducted under the TensorFlow framework on a workstation equipped with an NVIDIA RTX 3080 GPU. The complete training process took approximately 1&#x2013;1.5 h, with slight variations across the dataset subsets.</p>
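The early-stopping and learning-rate-decay logic described above (patience 10 with best-weight restoration, and decay to one tenth after 5 stagnant epochs) mirrors the behavior of Keras's EarlyStopping and ReduceLROnPlateau callbacks. A simplified framework-agnostic sketch, with a string standing in for the actual weight snapshot:

```python
class TrainingController:
    """Early stopping (patience 10) plus reduce-LR-on-plateau (factor 0.1,
    patience 5), mirroring the training callbacks described in the text."""

    def __init__(self, lr=1e-3, stop_patience=10, lr_patience=5, factor=0.1):
        self.lr, self.factor = lr, factor
        self.stop_patience, self.lr_patience = stop_patience, lr_patience
        self.best, self.best_epoch, self.wait_lr = float("inf"), 0, 0
        self.best_weights, self.stopped = None, False

    def on_epoch_end(self, epoch, val_loss, weights):
        if val_loss < self.best:                   # improvement: snapshot weights
            self.best, self.best_epoch = val_loss, epoch
            self.best_weights, self.wait_lr = weights, 0
        else:
            self.wait_lr += 1
            if self.wait_lr >= self.lr_patience:   # decay LR to one tenth
                self.lr *= self.factor
                self.wait_lr = 0
            if epoch - self.best_epoch >= self.stop_patience:
                self.stopped = True                # caller restores best_weights
        return self.stopped

ctl = TrainingController()
losses = [1.0, 0.8, 0.7] + [0.9] * 12              # validation loss stalls after epoch 2
for ep, loss in enumerate(losses):
    if ctl.on_epoch_end(ep, loss, weights=f"w{ep}"):
        break
```

In this run the learning rate is decayed twice (epochs 7 and 12) before training stops 10 epochs after the last improvement, with the epoch-2 weights retained as the final model.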
</sec>
<sec id="s4_3">
<label>4.3</label>
<title>Evaluation Metrics</title>
<p>To validate the effectiveness and reliability of the proposed model, the following evaluation metrics were employed: Root Mean Square Error (RMSE) and Score function.
<list list-type="simple">
<list-item><label>1.</label><p>RMSE: This metric measures the deviation between predicted values and actual values. It is defined as follows:
<disp-formula id="eqn-22"><label>(22)</label><mml:math id="mml-eqn-22" display="block"><mml:mi>R</mml:mi><mml:mi>M</mml:mi><mml:mi>S</mml:mi><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:msqrt><mml:mfrac><mml:mn>1</mml:mn><mml:mi>N</mml:mi></mml:mfrac><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="normal">&#x0394;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:msqrt></mml:math></disp-formula></p></list-item>
<list-item><label>2.</label><p>Score: In practical applications, late predictions may lead to more severe consequences than early predictions. To reflect this, the Score function imposes a higher penalty on late predictions compared to early ones. It is calculated as follows:
<disp-formula id="eqn-23"><label>(23)</label><mml:math id="mml-eqn-23" display="block"><mml:mi>S</mml:mi><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mtable columnalign="left" rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mfrac><mml:msub><mml:mrow><mml:mi mathvariant="normal">&#x0394;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mn>13</mml:mn></mml:mfrac></mml:mrow></mml:msup><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="normal">&#x0394;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>&#x003C;</mml:mo><mml:mn>0</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mfrac><mml:msub><mml:mrow><mml:mi mathvariant="normal">&#x0394;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mn>10</mml:mn></mml:mfrac></mml:mrow></mml:msup><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="normal">&#x0394;</mml:mi></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2265;</mml:mo><mml:mn>0</mml:mn></mml:mtd></mml:mtr></mml:mtable><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow></mml:math></disp-formula></p></list-item>
</list></p>
<p>In <xref ref-type="disp-formula" rid="eqn-22">Eqs. (22)</xref> and <xref ref-type="disp-formula" rid="eqn-23">(23)</xref>, <inline-formula id="ieqn-17"><mml:math id="mml-ieqn-17"><mml:msub><mml:mi mathvariant="normal">&#x0394;</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mrow><mml:mi>R</mml:mi><mml:mi>U</mml:mi><mml:mi>L</mml:mi></mml:mrow><mml:mo>&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:mi>R</mml:mi><mml:mi>U</mml:mi><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, <italic>N</italic> represents the number of samples, while <inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:msub><mml:mrow><mml:mover><mml:mrow><mml:mi>R</mml:mi><mml:mi>U</mml:mi><mml:mi>L</mml:mi></mml:mrow><mml:mo>&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-19"><mml:math id="mml-ieqn-19"><mml:mi>R</mml:mi><mml:mi>U</mml:mi><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> denote the predicted and true values of RUL, respectively.</p>
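Both metrics are straightforward to implement from Eqs. (22) and (23); the example values below are illustrative only.

```python
import numpy as np

def rmse(pred, true):
    """Root mean square error of Eq. (22)."""
    d = np.asarray(pred, dtype=float) - np.asarray(true, dtype=float)
    return float(np.sqrt(np.mean(d**2)))

def score(pred, true):
    """Asymmetric Score of Eq. (23): late predictions (delta >= 0) are
    penalized more heavily (divisor 10) than early ones (divisor 13)."""
    d = np.asarray(pred, dtype=float) - np.asarray(true, dtype=float)
    return float(np.sum(np.where(d < 0,
                                 np.exp(-d / 13.0) - 1.0,
                                 np.exp(d / 10.0) - 1.0)))

# A late error of +10 cycles costs more than an early error of -10 cycles.
late, early = score([110], [100]), score([90], [100])
```

This asymmetry is what makes the Score metric better aligned with maintenance risk than RMSE alone: underestimating RUL wastes useful life, but overestimating it risks an in-service failure.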
</sec>
<sec id="s4_4">
<label>4.4</label>
<title>RUL Prediction Results</title>
<p>This section presents a comparative analysis between the proposed BiLSTM-based fusion network and several state-of-the-art methods referenced in <xref ref-type="table" rid="table-3">Table 3</xref>, including LSTM-BS [<xref ref-type="bibr" rid="ref-17">17</xref>], MCLSTM [<xref ref-type="bibr" rid="ref-18">18</xref>], GAT [<xref ref-type="bibr" rid="ref-19">19</xref>], RNN-AE [<xref ref-type="bibr" rid="ref-20">20</xref>], BiGRU-TSAM [<xref ref-type="bibr" rid="ref-21">21</xref>], and BLCNN [<xref ref-type="bibr" rid="ref-22">22</xref>]. The BiLSTM-based fusion network outperforms most of the comparative methods across the subsets. Notably, it achieves significant improvements in both RMSE and Score on FD003 and on the most complex subset, FD004.</p>
<table-wrap id="table-3">
<label>Table 3</label>
<caption>
<title>Comparison table of various RUL prediction models</title>
</caption>
<table>
<colgroup>
<col align="center" width="26mm"/>
<col align="center" width="9mm"/>
<col align="center" width="9mm"/>
<col align="center" width="9mm"/>
<col align="center" width="9mm"/>
<col align="center" width="9mm"/>
<col align="center" width="10mm"/>
<col align="center" width="9mm"/>
<col align="center" width="14mm"/> </colgroup>
<thead>
<tr>
<th>Evaluation metric</th>
<th colspan="4">RMSE</th>
<th colspan="4">Score</th>
</tr>
<tr>
<th>Dataset</th>
<th>FD001</th>
<th>FD002</th>
<th>FD003</th>
<th>FD004</th>
<th>FD001</th>
<th>FD002</th>
<th>FD003</th>
<th>FD004</th>
</tr>
</thead>
<tbody>
<tr>
<td>LSTM-BS [<xref ref-type="bibr" rid="ref-17">17</xref>]</td>
<td>14.89</td>
<td>26.86</td>
<td>15.11</td>
<td>27.11</td>
<td>481</td>
<td>7982</td>
<td>493</td>
<td>5200</td>
</tr>
<tr>
<td>MCLSTM [<xref ref-type="bibr" rid="ref-18">18</xref>]</td>
<td>13.71</td>
<td>/</td>
<td>/</td>
<td>23.81</td>
<td>315</td>
<td>/</td>
<td>/</td>
<td>4826</td>
</tr>
<tr>
<td>GAT [<xref ref-type="bibr" rid="ref-19">19</xref>]</td>
<td>13.82</td>
<td>18.52</td>
<td>15.07</td>
<td>19.02</td>
<td>333.14</td>
<td>3289.6</td>
<td>778.45</td>
<td>2262.7</td>
</tr>
<tr>
<td>RNN-AE [<xref ref-type="bibr" rid="ref-20">20</xref>]</td>
<td>13.27</td>
<td>19.59</td>
<td>19.16</td>
<td>22.15</td>
<td>228</td>
<td>2650</td>
<td>1727</td>
<td>2901</td>
</tr>
<tr>
<td>BiGRU-TSAM [<xref ref-type="bibr" rid="ref-21">21</xref>]</td>
<td>12.56</td>
<td>18.94</td>
<td>12.45</td>
<td>20.47</td>
<td>213.35</td>
<td>2264.13</td>
<td>232.86</td>
<td>3610.34</td>
</tr>
<tr>
<td>BLCNN [<xref ref-type="bibr" rid="ref-22">22</xref>]</td>
<td>13.18</td>
<td>19.09</td>
<td>13.75</td>
<td>20.97</td>
<td>302.28</td>
<td>1557.55</td>
<td>381.37</td>
<td>3858.78</td>
</tr>
<tr>
<td>This study</td>
<td>12.13</td>
<td>18.93</td>
<td>11.93</td>
<td>18.09</td>
<td>232.42</td>
<td>1249.28</td>
<td>218.75</td>
<td>2090.76</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>To verify the statistical significance and robustness of the proposed method, we conducted systematic replicate experiments. For the four subsets of the C-MAPSS dataset, the model was independently run under identical experimental settings using 10 different random seeds (42, 123, 456, 789, 999, 111, 222, 333, 444, 555). Each experiment included a complete workflow of data preprocessing, model training, and evaluation to ensure strict consistency of experimental conditions. The significance test analysis of the proposed method is presented in <xref ref-type="table" rid="table-4">Table 4</xref>, which includes the mean, coefficient of variation, and 95% confidence interval of the RMSE and Score metrics, providing a comprehensive evaluation of the model&#x2019;s robustness across different datasets.</p>
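The statistics reported in Table 4 can be reproduced from the 10 per-seed metric values as follows. The RMSE values below are synthetic stand-ins, and the two-sided Student-t critical value for 9 degrees of freedom is hardcoded as an assumption.

```python
import numpy as np

def summarize(runs, t_crit=2.262):
    """Mean, coefficient of variation, and 95% confidence interval for a set
    of repeated metric runs. t_crit is the two-sided Student-t critical value
    for df = n - 1 = 9 (hardcoded for the 10-seed setup)."""
    runs = np.asarray(runs, dtype=float)
    n, mean, sd = len(runs), runs.mean(), runs.std(ddof=1)
    half = t_crit * sd / np.sqrt(n)
    return mean, sd / mean, (mean - half, mean + half)

# Synthetic stand-in for ten FD001 RMSE runs (one per random seed).
rmse_runs = [12.10, 12.25, 11.98, 12.05, 12.31, 12.19, 12.02, 12.14, 12.22, 12.08]
mean, cv, ci = summarize(rmse_runs)
```

A coefficient of variation below a few percent, as in Table 4, indicates that the model's performance is stable across random initializations rather than an artifact of one favorable seed.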
<table-wrap id="table-4">
<label>Table 4</label>
<caption>
<title>Statistical significance test analysis of the model</title>
</caption>
<table>
<colgroup>
<col align="center" width="15mm"/>
<col align="center" width="21mm"/>
<col align="center" width="25mm"/>
<col align="center" width="21mm"/>
<col align="center" width="25mm"/> </colgroup>
<thead>
<tr>
<th></th>
<th colspan="2">RMSE</th>
<th colspan="2">Score</th>
</tr>
<tr>
<th>Dataset</th>
<th>Confidence interval</th>
<th>Coefficient of variation</th>
<th>Confidence interval</th>
<th>Coefficient of variation</th>
</tr>
</thead>
<tbody>
<tr>
<td>FD001</td>
<td>[12.01, 12.25]</td>
<td>1.48%</td>
<td>[227.85, 236.99]</td>
<td>2.77%</td>
</tr>
<tr>
<td>FD002</td>
<td>[17.71, 18.15]</td>
<td>1.79%</td>
<td>[1232.15, 1266.41]</td>
<td>2.03%</td>
</tr>
<tr>
<td>FD003</td>
<td>[11.78, 12.08]</td>
<td>1.76%</td>
<td>[213.15, 224.35]</td>
<td>3.57%</td>
</tr>
<tr>
<td>FD004</td>
<td>[17.84, 18.34]</td>
<td>1.93%</td>
<td>[2061.25, 2120.27]</td>
<td>2.02%</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>To further analyze the RUL prediction accuracy on the C-MAPSS dataset, visualizations of the prediction results for all four subsets are provided in <xref ref-type="fig" rid="fig-7">Figs. 7</xref> and <xref ref-type="fig" rid="fig-8">8</xref>. <xref ref-type="fig" rid="fig-7">Fig. 7</xref> illustrates the overall prediction performance across all test engines within each subset, while <xref ref-type="fig" rid="fig-8">Fig. 8</xref> depicts the results for individual engines. As shown in the subplots of <xref ref-type="fig" rid="fig-7">Fig. 7</xref>, a noticeable deviation between the predicted and actual RUL values is observed when the true RUL is high. However, as the true RUL approaches zero, the predicted and actual curves converge, demonstrating improved alignment and smoother fitting. The overall trends captured in <xref ref-type="fig" rid="fig-8">Fig. 8</xref> reveal that the predicted curves consistently follow the true RUL trajectories, particularly during the critical degradation phase of the engines. This indicates the model&#x2019;s capability to accurately reflect the gradual decline in RUL and capture the underlying degradation trends. These results visually underscore the robustness of the model in scenarios involving rapid health deterioration, highlighting its practical suitability for providing timely and accurate early warnings in industrial fault prediction systems.</p>
<fig id="fig-7">
<label>Figure 7</label>
<caption>
<title>RUL prediction results for all engines in FD001 (<bold>a</bold>), FD002 (<bold>b</bold>), FD003 (<bold>c</bold>), and FD004 (<bold>d</bold>)</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_74009-fig-7.tif"/>
</fig><fig id="fig-8">
<label>Figure 8</label>
<caption>
<title>RUL prediction results for a representative individual engine from each subset</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_74009-fig-8.tif"/>
</fig>
<p>In addition, the probability density prediction distribution for individual engine lifetimes within the four subsets was visualized, as shown in <xref ref-type="fig" rid="fig-9">Figs. 9</xref>&#x2013;<xref ref-type="fig" rid="fig-12">12</xref>. The results demonstrate that the proposed method achieves accurate RUL predictions under various operating conditions and scenarios, with prediction errors consistently lower than those of the traditional LSTM model. The probability density curves exhibit an approximately Gaussian shape, and the predicted RUL means show minimal deviation from the actual values. From a maintenance perspective, the shape and spread of these distributions are highly informative. For instance, in <xref ref-type="fig" rid="fig-10">Fig. 10a</xref>,<xref ref-type="fig" rid="fig-10">b</xref>, the narrow and sharp distribution indicates high prediction confidence, suggesting that maintenance can be scheduled with high certainty around the mean RUL. The proposed framework not only maintains a low mean squared error but also generates probability distribution curves, which are unavailable through traditional point-estimation methods. This advancement shifts the prediction paradigm from &#x201C;point estimation&#x201D; to &#x201C;risk quantification,&#x201D; highlighting the value of probabilistic prediction in maintenance decision-making.</p>
<fig id="fig-9">
<label>Figure 9</label>
<caption>
<title>Probability density prediction distribution of sample 38 (<bold>a</bold>) and sample 91 (<bold>b</bold>) in the FD001 subset</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_74009-fig-9.tif"/>
</fig><fig id="fig-10">
<label>Figure 10</label>
<caption>
<title>Probability density prediction distribution of sample 109 (<bold>a</bold>) and sample 147 (<bold>b</bold>) in the FD002 subset</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_74009-fig-10.tif"/>
</fig><fig id="fig-11">
<label>Figure 11</label>
<caption>
<title>Probability density prediction distribution of sample 2 (<bold>a</bold>) and sample 59 (<bold>b</bold>) in the FD003 subset</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_74009-fig-11.tif"/>
</fig><fig id="fig-12">
<label>Figure 12</label>
<caption>
<title>Probability density prediction distribution of sample 13 (<bold>a</bold>) and sample 221 (<bold>b</bold>) in the FD004 subset</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_74009-fig-12.tif"/>
</fig>
<p>The model exhibits two main types of RUL prediction bias: (1) slight overestimation during early-to-mid operation, where weak degradation signals are masked by noise; although the attention mechanism is designed to focus on key information, the model struggles to extract strongly predictive early degradation patterns from complex multi-sensor signals when degradation features are not yet prominent; and (2) occasional significant underestimation near the end of life for units with abrupt performance drops, likely because the training data contain relatively few examples of such rapid failure, preventing the model from adequately learning the dynamics of fast degradation.</p>
</sec>
<sec id="s4_5">
<label>4.5</label>
<title>Ablation Study</title>
<p>To quantitatively evaluate the individual contribution of each key component in the proposed framework (namely the attention mechanism, the residual connections, and the KDE-based probabilistic framework), a comprehensive ablation study was conducted. The objective is to isolate the impact of these components on the model&#x2019;s overall prediction performance and its uncertainty quantification capability.</p>
<p>The ablation experiments were performed on four subsets under identical experimental settings (e.g., hyperparameters, training-testing split) as the full model. We designed three model variants for comparison: Variant A (w/o Attention), Variant B (w/o Residual), and Variant C (w/o KDE). The results of the ablation study are summarized in <xref ref-type="table" rid="table-5">Table 5</xref>.</p>
<table-wrap id="table-5">
<label>Table 5</label>
<caption>
<title>Results of the ablation study</title>
</caption>
<table>
<colgroup>
<col align="center" width="20mm"/>
<col align="center" width="41mm"/>
<col align="center" width="18mm"/>
<col align="center" width="20mm"/> </colgroup>
<thead>
<tr>
<th>Dataset</th>
<th>Model variant</th>
<th>RMSE</th>
<th>Score</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="4">FD001</td>
<td>A</td>
<td>13.89</td>
<td>352.54</td>
</tr>
<tr>
<td>B</td>
<td>13.65</td>
<td>268.33</td>
</tr>
<tr>
<td>C</td>
<td>12.41</td>
<td>245.90</td>
</tr>
<tr>
<td>Proposed model</td>
<td>12.13</td>
<td>232.42</td>
</tr>
<tr>
<td rowspan="4">FD002</td>
<td>A</td>
<td>20.87</td>
<td>2028.65</td>
</tr>
<tr>
<td>B</td>
<td>19.42</td>
<td>1751.77</td>
</tr>
<tr>
<td>C</td>
<td>19.15</td>
<td>1398.44</td>
</tr>
<tr>
<td>Proposed model</td>
<td>18.93</td>
<td>1249.28</td>
</tr>
<tr>
<td rowspan="4">FD003</td>
<td>A</td>
<td>13.75</td>
<td>305.18</td>
</tr>
<tr>
<td>B</td>
<td>13.38</td>
<td>269.61</td>
</tr>
<tr>
<td>C</td>
<td>12.10</td>
<td>235.14</td>
</tr>
<tr>
<td>Proposed model</td>
<td>11.93</td>
<td>218.75</td>
</tr>
<tr>
<td rowspan="4">FD004</td>
<td>A</td>
<td>24.25</td>
<td>3256.89</td>
</tr>
<tr>
<td>B</td>
<td>20.65</td>
<td>2514.52</td>
</tr>
<tr>
<td>C</td>
<td>19.34</td>
<td>2350.13</td>
</tr>
<tr>
<td>Proposed model</td>
<td>18.09</td>
<td>2090.76</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>In conclusion, the attention mechanism enables temporal feature selection, residual connections ensure stable training of deep networks, and the KDE framework provides essential uncertainty quantification for risk-conscious decision-making. Their synergistic integration in the full model is justified by its superior and robust performance across all evaluation scenarios.</p>
</sec>
</sec>
<sec id="s5">
<label>5</label>
<title>Conclusion</title>
<p>This study has successfully established an integrated deep learning framework for the probabilistic prediction of RUL, effectively addressing key challenges in processing noisy sensor data and quantifying predictive uncertainty. This work provides a complete probabilistic RUL prediction solution spanning from data processing to decision-making. The value of this solution lies in the fact that the output probability distribution can be directly applied to risk assessment and maintenance strategy optimization, thereby offering a new technical approach for health management of complex mechanical systems. The main conclusions are as follows:</p>
<p>The proposed fusion neural network demonstrates a superior capability in modeling complex, nonlinear degradation patterns from multi-sensor time-series data, outperforming conventional models and achieving higher prediction accuracy on the benchmark C-MAPSS dataset.</p>
<p>A well-balanced trade-off among model capacity, training efficiency, and generalization ability is achieved through the synergistic use of residual connections and stochastic regularization, ensuring stable training and robust performance.</p>
<p>The incorporation of KDE enhances the robustness of RUL predictions and enables uncertainty quantification. The framework improves the accuracy of interval predictions, thereby offering more reliable decision support for equipment health management.</p>
<p>Although the proposed framework demonstrates superior performance in RUL probability prediction, several limitations remain. The current work primarily relies on homogeneous sensor sequences and has not yet explored the deep integration of multi-modal sensor information. Furthermore, the model&#x2019;s cross-equipment transfer learning capability requires validation, and a dynamic updating mechanism for online streaming PHM systems has not been implemented.</p>
<p>Future research will focus on extending the framework to incorporate multi-source heterogeneous information fusion, developing efficient transfer learning strategies, and constructing a dynamic prediction framework capable of online adaptive learning, thereby further enhancing the model&#x2019;s practicality and generalization in complex industrial scenarios.</p>
</sec>
</body>
<back>
<ack>
<p>Not applicable.</p>
</ack>
<sec>
<title>Funding Statement</title>
<p>This research was funded by scientific research projects under Grant JY2024B011.</p>
</sec>
<sec>
<title>Author Contributions</title>
<p>The authors confirm contribution to the paper as follows: Conceptualization, Bo Zhu, Zhonghua Cheng; methodology, Shuai Yue; software, Enzhi Dong; validation, Kexin Jiang; formal analysis, Bo Zhu; investigation, Enzhi Dong, Shuai Yue, Chiming Guo; resources, Shuai Yue; data curation, Shuai Yue, Chiming Guo; writing&#x2014;original draft preparation, Bo Zhu; writing&#x2014;review and editing, Enzhi Dong, Kexin Jiang, Zhonghua Cheng; visualization, Kexin Jiang; supervision, Zhonghua Cheng, Chiming Guo; project administration, Zhonghua Cheng, Chiming Guo. All authors reviewed the results and approved the final version of the manuscript.</p>
</sec>
<sec sec-type="data-availability">
<title>Availability of Data and Materials</title>
<p>The data that support the findings of this study are available from the corresponding author, Zhonghua Cheng, upon reasonable request.</p>
</sec>
<sec>
<title>Ethics Approval</title>
<p>Not applicable.</p>
</sec>
<sec sec-type="COI-statement">
<title>Conflicts of Interest</title>
<p>The authors declare no conflicts of interest regarding the present study.</p>
</sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Qiu</surname> <given-names>LJ</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>MH</given-names></string-name></person-group>. <article-title>PHM technology framework and its key technologies review</article-title>. <source>Foreign Electron Meas Technol</source>. <year>2018</year>;<volume>37</volume>(<issue>2</issue>):<fpage>10</fpage>&#x2013;<lpage>5</lpage>. (In Chinese). doi:<pub-id pub-id-type="doi">10.19652/j.cnki.femt.1700638</pub-id>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Kong</surname> <given-names>JZ</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Zhu</surname> <given-names>JZ</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Tsui</surname> <given-names>KL</given-names></string-name>, <string-name><surname>Peng</surname> <given-names>ZK</given-names></string-name>, <etal>et al.</etal></person-group> <article-title>Review on lithium-ion battery PHM from the perspective of key PHM steps</article-title>. <source>Chin J Mech Eng</source>. <year>2024</year>;<volume>37</volume>(<issue>4</issue>):<fpage>14</fpage>&#x2013;<lpage>35</lpage>. doi:<pub-id pub-id-type="doi">10.1186/s10033-024-01055-z</pub-id>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Jin</surname> <given-names>RB</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>ZH</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>KY</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>M</given-names></string-name>, <string-name><surname>Li</surname> <given-names>XL</given-names></string-name>, <string-name><surname>Yan</surname> <given-names>RQ</given-names></string-name></person-group>. <article-title>Bi-LSTM-based two-stream network for machine remaining useful life prediction</article-title>. <source>IEEE Trans Instrum Meas</source>. <year>2022</year>;<volume>71</volume>:<fpage>3511110</fpage>. doi:<pub-id pub-id-type="doi">10.1109/TIM.2022.3167778</pub-id>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Song</surname> <given-names>JW</given-names></string-name>, <string-name><surname>Park</surname> <given-names>YI</given-names></string-name>, <string-name><surname>Hong</surname> <given-names>JJ</given-names></string-name>, <string-name><surname>Kim</surname> <given-names>SG</given-names></string-name>, <string-name><surname>Kang</surname> <given-names>SJ</given-names></string-name></person-group>. <article-title>Attention-based bidirectional LSTM-CNN model for remaining useful life estimation</article-title>. In: <conf-name>2021 IEEE International Symposium on Circuits and Systems (ISCAS)</conf-name>; <year>2021</year> May 22&#x2013;28; Virtual. p. <fpage>1</fpage>&#x2013;<lpage>5</lpage>. doi: <pub-id pub-id-type="doi">10.1109/ISCAS51556.2021.9401572</pub-id>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Cai</surname> <given-names>S</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>JW</given-names></string-name>, <string-name><surname>Li</surname> <given-names>C</given-names></string-name>, <string-name><surname>He</surname> <given-names>ZQ</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>ZM</given-names></string-name></person-group>. <article-title>A RUL prediction method of rolling bearings based on degradation detection and deep BiLSTM</article-title>. <source>Electron Res Arch</source>. <year>2024</year>;<volume>32</volume>(<issue>5</issue>):<fpage>3145</fpage>&#x2013;<lpage>61</lpage>. doi:<pub-id pub-id-type="doi">10.3934/era.2024144</pub-id>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Guo</surname> <given-names>XF</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>KZ</given-names></string-name>, <string-name><surname>Yao</surname> <given-names>S</given-names></string-name>, <string-name><surname>Fu</surname> <given-names>GJ</given-names></string-name>, <string-name><surname>Ning</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>RUL prediction of lithium ion battery based on CEEMDAN-CNN BiLSTM model</article-title>. <source>Energy Rep</source>. <year>2023</year>;<volume>9</volume>(<issue>1</issue>):<fpage>1299</fpage>&#x2013;<lpage>306</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.egyr.2023.05.121</pub-id>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Raouf</surname> <given-names>I</given-names></string-name>, <string-name><surname>Kumar</surname> <given-names>P</given-names></string-name>, <string-name><surname>Khalid</surname> <given-names>S</given-names></string-name>, <string-name><surname>Kim</surname> <given-names>HS</given-names></string-name></person-group>. <article-title>Comprehensive analysis of current developments, challenges, and opportunities for the health assessment of smart factory</article-title>. <source>Int J Precis Eng Manuf-Green Technol</source>. <year>2025</year>;<volume>12</volume>(<issue>4</issue>):<fpage>1321</fpage>&#x2013;<lpage>38</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s40684-025-00694-4</pub-id>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Kumar</surname> <given-names>P</given-names></string-name>, <string-name><surname>Khalid</surname> <given-names>S</given-names></string-name>, <string-name><surname>Kim</surname> <given-names>HS</given-names></string-name></person-group>. <article-title>Prognostics and health management of rotating machinery of industrial robot with deep learning applications&#x2014;a review</article-title>. <source>Math</source>. <year>2023</year>;<volume>11</volume>(<issue>13</issue>):<fpage>3008</fpage>. doi:<pub-id pub-id-type="doi">10.3390/math11133008</pub-id>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Raouf</surname> <given-names>I</given-names></string-name>, <string-name><surname>Lee</surname> <given-names>H</given-names></string-name>, <string-name><surname>Kim</surname> <given-names>HS</given-names></string-name></person-group>. <article-title>Mechanical fault detection based on machine learning for robotic RV reducer using electrical current signature analysis: a data-driven approach</article-title>. <source>J Comput Des Eng</source>. <year>2022</year>;<volume>9</volume>(<issue>2</issue>):<fpage>417</fpage>&#x2013;<lpage>33</lpage>. doi:<pub-id pub-id-type="doi">10.1093/jcde/qwac015</pub-id>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Raouf</surname> <given-names>I</given-names></string-name>, <string-name><surname>Kumar</surname> <given-names>P</given-names></string-name>, <string-name><surname>Kim</surname> <given-names>HS</given-names></string-name></person-group>. <article-title>Deep learning-based fault diagnosis of servo motor bearing using the attention-guided feature aggregation network</article-title>. <source>Expert Syst Appl</source>. <year>2024</year>;<volume>258</volume>(<issue>4</issue>):<fpage>125137</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.eswa.2024.125137</pub-id>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Ma</surname> <given-names>MY</given-names></string-name></person-group>. <source>Application of deep learning in fault diagnosis and life prediction of rolling bearings [dissertation]</source>. <publisher-loc>Nanjing, China</publisher-loc>: <publisher-name>Nanjing University of Information Science and Technology</publisher-name>; <year>2024</year>. doi: <pub-id pub-id-type="doi">10.27248/d.cnki.gnjqc.2024.000262</pub-id>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>ZZ</given-names></string-name></person-group>. <source>Research on remaining useful life prediction methods for mechanical equipment based on deep learning [dissertation]</source>. <publisher-loc>Jinan, China</publisher-loc>: <publisher-name>Shandong University</publisher-name>; <year>2024</year>. doi: <pub-id pub-id-type="doi">10.27272/d.cnki.gshdu.2024.000382</pub-id>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zha</surname> <given-names>WT</given-names></string-name>, <string-name><surname>Ye</surname> <given-names>YH</given-names></string-name></person-group>. <article-title>An aero-engine remaining useful life prediction model based on feature selection and the improved TCN</article-title>. <source>Franklin Open</source>. <year>2024</year>;<volume>6</volume>(<issue>5</issue>):<fpage>100083</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.fraope.2024.100083</pub-id>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Chen</surname> <given-names>FD</given-names></string-name>, <string-name><surname>Yu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Li</surname> <given-names>YJ</given-names></string-name></person-group>. <article-title>An aero-engine remaining useful life prediction model based on clustering analysis and the improved GRU-TCN</article-title>. <source>Meas Sci Technol</source>. <year>2025</year>;<volume>36</volume>(<issue>1</issue>):<fpage>016001</fpage>. doi:<pub-id pub-id-type="doi">10.1088/1361-6501/ad825a</pub-id>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhu</surname> <given-names>B</given-names></string-name>, <string-name><surname>Dong</surname> <given-names>E</given-names></string-name>, <string-name><surname>Cheng</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Zhan</surname> <given-names>X</given-names></string-name>, <string-name><surname>Jiang</surname> <given-names>K</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>R</given-names></string-name></person-group>. <article-title>An integrated approach to condition-based maintenance decision-making of planetary gearboxes: combining temporal convolutional network auto encoders with wiener process</article-title>. <source>Comput Mater Contin</source>. <year>2025</year>;<volume>86</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>26</lpage>. doi:<pub-id pub-id-type="doi">10.32604/cmc.2025.069194</pub-id>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>He</surname> <given-names>D</given-names></string-name>, <string-name><surname>Zhao</surname> <given-names>J</given-names></string-name>, <string-name><surname>Jin</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Huang</surname> <given-names>C</given-names></string-name>, <string-name><surname>Yi</surname> <given-names>C</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>J</given-names></string-name></person-group>. <article-title>DCAGGCN: a novel method for remaining useful life prediction of bearings</article-title>. <source>Reliab Eng Syst Saf</source>. <year>2025</year>;<volume>260</volume>(<issue>9</issue>):<fpage>110978</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.ress.2025.110978</pub-id>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Liao</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>L</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>C</given-names></string-name></person-group>. <article-title>Uncertainty prediction of remaining useful life using long short-term memory network based on bootstrap method</article-title>. In: <conf-name>Proceedings of the 2018 IEEE International Conference on Prognostics and Health Management (ICPHM)</conf-name>; <year>2018</year> Jun 11&#x2013;13; Seattle, WA, USA. p. <fpage>1</fpage>&#x2013;<lpage>8</lpage>. doi: <pub-id pub-id-type="doi">10.1109/ICPHM.2018.8448804</pub-id>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Xiang</surname> <given-names>S</given-names></string-name>, <string-name><surname>Qin</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Luo</surname> <given-names>J</given-names></string-name>, <string-name><surname>Pu</surname> <given-names>H</given-names></string-name>, <string-name><surname>Tang</surname> <given-names>B</given-names></string-name></person-group>. <article-title>Multicellular LSTM-based deep learning model for aero-engine remaining useful life prediction</article-title>. <source>Reliab Eng Syst Saf</source>. <year>2021</year>;<volume>216</volume>:<fpage>107927</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.ress.2021.107927</pub-id>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Chen</surname> <given-names>X</given-names></string-name>, <string-name><surname>Tian</surname> <given-names>Z</given-names></string-name></person-group>. <article-title>A novel method for identifying sudden degradation changes in remaining useful life prediction for bearing</article-title>. <source>Expert Syst Appl</source>. <year>2025</year>;<volume>278</volume>(<issue>3</issue>):<fpage>127315</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.eswa.2025.127315</pub-id>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Yu</surname> <given-names>W</given-names></string-name>, <string-name><surname>Kim</surname> <given-names>IY</given-names></string-name>, <string-name><surname>Mechefske</surname> <given-names>C</given-names></string-name></person-group>. <article-title>An improved similarity-based prognostic algorithm for RUL estimation using an RNN autoencoder scheme</article-title>. <source>Reliab Eng Syst Saf</source>. <year>2020</year>;<volume>199</volume>:<fpage>106926</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.ress.2020.106926</pub-id>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>J</given-names></string-name>, <string-name><surname>Jiang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>S</given-names></string-name>, <string-name><surname>Li</surname> <given-names>X</given-names></string-name>, <string-name><surname>Luo</surname> <given-names>H</given-names></string-name>, <string-name><surname>Yin</surname> <given-names>S</given-names></string-name></person-group>. <article-title>Prediction of remaining useful life based on bidirectional gated recurrent unit with temporal self-attention mechanism</article-title>. <source>Reliab Eng Syst Saf</source>. <year>2022</year>;<volume>221</volume>(<issue>4</issue>):<fpage>108297</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.ress.2021.108297</pub-id>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Al-Dulaimi</surname> <given-names>A</given-names></string-name>, <string-name><surname>Zabihi</surname> <given-names>S</given-names></string-name>, <string-name><surname>Asif</surname> <given-names>A</given-names></string-name>, <string-name><surname>Mohammadi</surname> <given-names>A</given-names></string-name></person-group>. <article-title>A multimodal and hybrid deep neural network model for remaining useful life estimation</article-title>. <source>Comput Ind</source>. <year>2019</year>;<volume>108</volume>:<fpage>186</fpage>&#x2013;<lpage>96</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.compind.2019.02.004</pub-id>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Chen</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>M</given-names></string-name>, <string-name><surname>Zhao</surname> <given-names>R</given-names></string-name>, <string-name><surname>Guretno</surname> <given-names>F</given-names></string-name>, <string-name><surname>Yan</surname> <given-names>R</given-names></string-name>, <string-name><surname>Li</surname> <given-names>X</given-names></string-name></person-group>. <article-title>Machine remaining useful life prediction via an attention-based deep learning approach</article-title>. <source>IEEE Trans Ind Electron</source>. <year>2021</year>;<volume>68</volume>(<issue>3</issue>):<fpage>2521</fpage>&#x2013;<lpage>31</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TIE.2020.2972443</pub-id>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Shan</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Xiao</surname> <given-names>L</given-names></string-name>, <string-name><surname>Fang</surname> <given-names>X</given-names></string-name></person-group>. <article-title>Predicting the remaining useful life of aircraft engines using spatial and temporal attention mechanisms</article-title>. <source>IEEE Trans Instrum Meas</source>. <year>2025</year>;<volume>74</volume>:<fpage>1</fpage>&#x2013;<lpage>14</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TIM.2025.3545540</pub-id>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Gao</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Jiang</surname> <given-names>W</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Dai</surname> <given-names>T</given-names></string-name></person-group>. <article-title>Multiscale spatiotemporal attention network for remaining useful life prediction of mechanical systems</article-title>. <source>IEEE Sens J</source>. <year>2025</year>;<volume>25</volume>(<issue>4</issue>):<fpage>6825</fpage>&#x2013;<lpage>35</lpage>. doi:<pub-id pub-id-type="doi">10.1109/JSEN.2024.3523176</pub-id>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Han</surname> <given-names>B</given-names></string-name>, <string-name><surname>Yin</surname> <given-names>P</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>J</given-names></string-name>, <string-name><surname>Bao</surname> <given-names>H</given-names></string-name>, <string-name><surname>Song</surname> <given-names>L</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Remaining useful life prediction of turbofan engines based on dual attention mechanism guided parallel CNN-LSTM</article-title>. <source>Meas Sci Technol</source>. <year>2025</year>;<volume>36</volume>(<issue>1</issue>):<fpage>016160</fpage>. doi:<pub-id pub-id-type="doi">10.1088/1361-6501/ad8946</pub-id>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>W</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Yan</surname> <given-names>X</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Cai</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Remaining useful life prediction of motor bearings based on slow feature analysis-assisted attention mechanism and dual-LSTM networks</article-title>. <source>Struct Health Monit</source>. <year>2025</year>;<volume>1</volume>(<issue>1</issue>):<fpage>97</fpage>. doi:<pub-id pub-id-type="doi">10.1177/14759217251324103</pub-id>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Liu</surname> <given-names>M</given-names></string-name>, <string-name><surname>Zheng</surname> <given-names>D</given-names></string-name>, <string-name><surname>Gou</surname> <given-names>P</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>J</given-names></string-name>, <string-name><surname>Lai</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>D</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>A two-phase equipment remaining useful life prediction model with sequence-feature attention mechanism</article-title>. <source>IEEE Trans Instrum Meas</source>. <year>2025</year>;<volume>74</volume>(<issue>4</issue>):<fpage>1</fpage>&#x2013;<lpage>14</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TIM.2025.3576953</pub-id>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Bahdanau</surname> <given-names>D</given-names></string-name>, <string-name><surname>Cho</surname> <given-names>K</given-names></string-name>, <string-name><surname>Bengio</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>Neural machine translation by jointly learning to align and translate</article-title>. <source>arXiv:1409.0473</source>. <year>2014</year>. doi:<pub-id pub-id-type="doi">10.48550/arXiv.1409.0473</pub-id>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Le Folgoc</surname> <given-names>L</given-names></string-name>, <string-name><surname>Baltatzis</surname> <given-names>V</given-names></string-name>, <string-name><surname>Desai</surname> <given-names>S</given-names></string-name>, <string-name><surname>Devaraj</surname> <given-names>A</given-names></string-name>, <string-name><surname>Ellis</surname> <given-names>S</given-names></string-name>, <string-name><surname>Manzanera</surname> <given-names>OEM</given-names></string-name>, <etal>et al.</etal></person-group> <article-title>Is MC dropout Bayesian?</article-title> <source>arXiv:2110.04286</source>. <year>2021</year>. doi: <pub-id pub-id-type="doi">10.48550/arXiv.2110.04286</pub-id>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Gal</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Ghahramani</surname> <given-names>Z</given-names></string-name></person-group>. <article-title>Dropout as a Bayesian approximation: representing model uncertainty in deep learning</article-title>. <source>arXiv:1506.02142</source>. <year>2015</year>. doi: <pub-id pub-id-type="doi">10.48550/arXiv.1506.02142</pub-id>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Saxena</surname> <given-names>A</given-names></string-name>, <string-name><surname>Goebel</surname> <given-names>K</given-names></string-name>, <string-name><surname>Simon</surname> <given-names>D</given-names></string-name>, <string-name><surname>Eklund</surname> <given-names>N</given-names></string-name></person-group>. <article-title>Damage propagation modeling for aircraft engine run-to-failure simulation</article-title>. In: <conf-name>Proceedings of the International Conference on Prognostics and Health Management</conf-name>; <year>2008</year> Oct 6&#x2013;9; Denver, CO, USA. Piscataway, NJ, USA: IEEE. p. <fpage>1</fpage>&#x2013;<lpage>9</lpage>. doi:<pub-id pub-id-type="doi">10.1109/PHM.2008.4711414</pub-id>.</mixed-citation></ref>
</ref-list>
</back></article>