<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">72777</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2025.072777</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>The Missing Data Recovery Method Based on Improved GAN</article-title>
<alt-title alt-title-type="left-running-head">The Missing Data Recovery Method Based on Improved GAN</alt-title>
<alt-title alt-title-type="right-running-head">The Missing Data Recovery Method Based on Improved GAN</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author">
<name name-style="western"><surname>Zhang</surname><given-names>Su</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-2" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Deng</surname><given-names>Song</given-names></name><xref ref-type="aff" rid="aff-1">1</xref><email>dengsong@njupt.edu.cn</email></contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western"><surname>Liu</surname><given-names>Qingsheng</given-names></name><xref ref-type="aff" rid="aff-2">2</xref></contrib>
<aff id="aff-1"><label>1</label><institution>College of Automation, Nanjing University of Posts and Telecommunications</institution>, <addr-line>Nanjing, 210023</addr-line>, <country>China</country></aff>
<aff id="aff-2"><label>2</label><institution>State Grid Jiashan Power Supply Company</institution>, <addr-line>Jiaxing, 314100</addr-line>, <country>China</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Song Deng. Email: <email>dengsong@njupt.edu.cn</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2026</year>
</pub-date>
<pub-date date-type="pub" publication-format="electronic">
<day>10</day><month>2</month><year>2026</year>
</pub-date>
<volume>87</volume>
<issue>1</issue>
<elocation-id>45</elocation-id>
<history>
<date date-type="received">
<day>03</day>
<month>09</month>
<year>2025</year>
</date>
<date date-type="accepted">
<day>24</day>
<month>11</month>
<year>2025</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2026 The Authors.</copyright-statement>
<copyright-year>2026</copyright-year>
<copyright-holder>Published by Tech Science Press.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_72777.pdf"></self-uri>
<abstract>
<p>Accurate and reliable power system data are fundamental for critical operations such as grid monitoring, fault diagnosis, and load forecasting, especially as power grids become increasingly intelligent and digitalized. However, data loss and anomalies frequently compromise data integrity in practical settings, significantly impacting system operational efficiency and security. Most existing data recovery methods require complete datasets for training, leading to substantial data and computational demands and limited generalization. To address these limitations, this study proposes a missing data imputation model based on an improved Generative Adversarial Network (BAC-GAN). Within the BAC-GAN framework, the generator utilizes Bidirectional Long Short-Term Memory (BiLSTM) networks and Multi-Head Attention mechanisms to capture temporal dependencies and complex relationships within power system data. The discriminator employs a Convolutional Neural Network (CNN) architecture to integrate local features with global structures, effectively mitigating the generation of implausible imputations. Experimental results on two public datasets demonstrate that the BAC-GAN model achieves superior data recovery accuracy compared to five state-of-the-art and classical benchmark methods, with an average improvement of 17.7% in reconstruction accuracy. The proposed method significantly enhances the accuracy of grid fault diagnosis and provides reliable data support for the stable operation of smart grids, showing great potential for practical applications in power systems.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Power system</kwd>
<kwd>data recovery</kwd>
<kwd>generative adversarial network</kwd>
<kwd>bidirectional long short-term memory network</kwd>
<kwd>multi-head attention mechanism</kwd>
<kwd>convolutional neural network</kwd>
</kwd-group>
<funding-group>
<award-group id="awg1">
<funding-source>National Natural Science Foundation of China</funding-source>
<award-id>51977113</award-id>
</award-group>
<award-group id="awg2">
<funding-source>Science and Technology Project of State Grid Zhejiang Electric Power</funding-source>
<award-id>5211JX240001</award-id>
</award-group>
</funding-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>Driven by increasing intelligence and digitization in power systems, modern grid data have grown significantly in complexity. The widespread deployment of sensors, smart meters, and IoT devices enables real-time collection and transmission of operational, power delivery, weather, and business data [<xref ref-type="bibr" rid="ref-1">1</xref>&#x2013;<xref ref-type="bibr" rid="ref-3">3</xref>], supporting enhanced grid monitoring and management [<xref ref-type="bibr" rid="ref-4">4</xref>]. However, data integrity is frequently compromised by missing or incomplete values resulting from equipment failures and communication network issues [<xref ref-type="bibr" rid="ref-5">5</xref>]. This data loss severely impacts critical functions including real-time monitoring, fault diagnosis, and load forecasting [<xref ref-type="bibr" rid="ref-6">6</xref>]. A notable example is the 2018 California grid outage, where missing monitoring data from substations significantly hampered fault diagnosis and restoration efforts, ultimately affecting approximately 200,000 households and businesses [<xref ref-type="bibr" rid="ref-7">7</xref>]. Consequently, effective recovery of missing data is essential for ensuring power system reliability and security.</p>
<p>Methods for recovering missing data in power systems are broadly classified as statistical, machine learning (ML), and deep learning (DL) approaches. Statistical methods encompass techniques such as linear interpolation [<xref ref-type="bibr" rid="ref-8">8</xref>], multiple imputation [<xref ref-type="bibr" rid="ref-9">9</xref>], and regression imputation [<xref ref-type="bibr" rid="ref-10">10</xref>]. Time-series specific approaches include the ARMA model (combining autoregressive (AR) and moving average (MA) components) for prediction [<xref ref-type="bibr" rid="ref-11">11</xref>], ARIMA for non-stationary time series recovery [<xref ref-type="bibr" rid="ref-12">12</xref>], and spatial interpolation using Delaunay triangulation [<xref ref-type="bibr" rid="ref-13">13</xref>]. While these methods primarily infer missing values based on underlying statistical characteristics, they often struggle to effectively capture the complex nonlinear dynamics and temporal dependencies inherent in power system data due to its intricate nature.</p>
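<p>As a concrete illustration of the statistical family, the sketch below fills gaps in a univariate load series by linear interpolation. It is a minimal NumPy example of the general technique, not code from any of the cited works; the series values are invented.</p>

```python
import numpy as np

def linear_interpolate(series: np.ndarray) -> np.ndarray:
    """Fill NaN gaps in a 1-D series by linear interpolation between observed neighbors."""
    filled = series.copy()
    idx = np.arange(len(series))
    observed = ~np.isnan(series)
    # np.interp estimates each missing position from the surrounding observed (x, y) pairs
    filled[~observed] = np.interp(idx[~observed], idx[observed], series[observed])
    return filled

load = np.array([10.0, np.nan, 14.0, np.nan, np.nan, 20.0])
print(linear_interpolate(load))  # [10. 12. 14. 16. 18. 20.]
```

<p>Such interpolation assumes the signal varies smoothly between observations, which is exactly the assumption that breaks down for the nonlinear, strongly periodic power data discussed above.</p>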
<p>As recognition of power system data complexity grows, machine learning (ML) has been increasingly adopted for missing data imputation. Compared to statistical methods, ML excels at automatically learning intricate data patterns. For instance, K-Nearest Neighbors (KNN) has been applied to impute missing wind power data [<xref ref-type="bibr" rid="ref-14">14</xref>], hybrid approaches combine statistical and ML techniques like kernel canonical correlation analysis to enhance accuracy [<xref ref-type="bibr" rid="ref-15">15</xref>], and random forest-based multiple imputation algorithms have been developed [<xref ref-type="bibr" rid="ref-16">16</xref>]. However, ML methods typically depend on explicit feature engineering. Given the high dimensionality, nonlinearity, and complex temporal dependencies inherent in power system data, these approaches often struggle to capture deep underlying relationships, limiting their effectiveness in complex scenarios.</p>
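<p>To make the ML family concrete, here is a minimal NumPy sketch of KNN-style imputation in the spirit of the wind-power application above: a row with a gap borrows the average of the k complete rows closest on its observed features. The helper and all values are illustrative, not the cited implementation.</p>

```python
import numpy as np

def knn_impute(X: np.ndarray, k: int = 2) -> np.ndarray:
    """Impute NaNs in each row from the k nearest fully observed rows
    (Euclidean distance on the columns observed in the target row)."""
    X = X.copy()
    complete = X[~np.isnan(X).any(axis=1)]  # rows with no missing entries
    for i, row in enumerate(X):
        miss = np.isnan(row)
        if not miss.any():
            continue
        # distance measured only on the columns this row actually observed
        d = np.linalg.norm(complete[:, ~miss] - row[~miss], axis=1)
        neighbors = complete[np.argsort(d)[:k]]
        X[i, miss] = neighbors[:, miss].mean(axis=0)
    return X

X = np.array([[1.0, 2.0], [1.1, 2.1], [5.0, 6.0], [1.05, np.nan]])
print(knn_impute(X, k=2)[3])  # gap filled from the two closest rows
```

<p>The reliance on a hand-chosen distance over raw features is one form of the explicit feature engineering that limits these methods on high-dimensional grid data.</p>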
<p>Deep learning (DL) advances beyond conventional ML by employing sophisticated neural architectures to autonomously extract latent patterns and correlations from raw data. Representative approaches include: bidirectional RNNs (M-RNN) for data stream interpolation [<xref ref-type="bibr" rid="ref-17">17</xref>]; GRU-D models incorporating masking and time intervals [<xref ref-type="bibr" rid="ref-18">18</xref>]; graph networks enhanced with sparse spatiotemporal attention for improved robustness to sparsity [<xref ref-type="bibr" rid="ref-19">19</xref>]; conditional score diffusion models for time series imputation [<xref ref-type="bibr" rid="ref-20">20</xref>]; Gaussian mixture models parameterized via maximum likelihood estimation [<xref ref-type="bibr" rid="ref-21">21</xref>]; and denoising autoencoders for missing data handling [<xref ref-type="bibr" rid="ref-22">22</xref>]. Despite these innovations, DL-based imputation methods remain challenged by complex missing patterns and limitations in model robustness.</p>
<p>To address these challenges, generative adversarial networks (GANs) have emerged as a promising approach for missing data imputation, leveraging their strong generative capabilities to produce realistic and plausible imputation results. Notably, GAIN (Generative Adversarial Imputation Network) [<xref ref-type="bibr" rid="ref-23">23</xref>] introduced a GAN-based framework that generates high-quality imputations through adversarial training. However, the GAIN model exhibits two key limitations for power system applications: (1) Its reliance on fully connected neural networks hinders effective capture of long-term temporal dependencies essential for complex power system time series characterized by strong periodicity and temporal correlations, and (2) It inadequately models local data correlations, limiting its ability to exploit local patterns and variation trends inherent in power system data.</p>
<p>To overcome these limitations, this paper proposes BAC-GAN (Bidirectional LSTM with Multi-Head Attention and CNN-based Discriminator Generative Adversarial Network), an enhanced GAN architecture for power system data imputation. The generator integrates bidirectional long short-term memory (BiLSTM) networks with multi-head attention (MA) mechanisms to capture complex temporal dependencies. Simultaneously, the discriminator employs a convolutional neural network (CNN) architecture to effectively model local features and global structures, mitigating the generation of implausible imputations.</p>
<p><bold>The principal contributions are:</bold>
<list list-type="bullet">
<list-item>
<p>We propose BAC-GAN, a novel generative adversarial network framework incorporating Bidirectional LSTM with Multi-Head Attention and a CNN-based Discriminator, specifically designed for high-accuracy imputation of missing data in power systems.</p></list-item>
<list-item>
<p>The generator integrates Bidirectional LSTM (BiLSTM) networks with a Multi-Head Attention (MA) mechanism to comprehensively capture complex long-term temporal dependencies, while the discriminator employs a Convolutional Neural Network (CNN) architecture to effectively model local features and integrate them with global structural context, thereby ensuring the plausibility of generated imputations.</p></list-item>
<list-item>
<p>Comprehensive experimental evaluation on two public power system datasets demonstrates that BAC-GAN achieves significantly superior imputation accuracy compared to five state-of-the-art and classical benchmark methods.</p></list-item>
</list></p>
<p>The remainder of this paper is organized as follows. <xref ref-type="sec" rid="s2">Section 2</xref> provides prerequisite knowledge. <xref ref-type="sec" rid="s3">Section 3</xref> details the BAC-GAN model. <xref ref-type="sec" rid="s4">Section 4</xref> presents the experimental analysis. <xref ref-type="sec" rid="s5">Section 5</xref> concludes the paper.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Prerequisite Knowledge</title>
<p>Consider the <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:mi>d</mml:mi></mml:math></inline-formula>-dimensional space <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:mrow><mml:mi>&#x1D4B3;</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>&#x1D4B3;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mrow><mml:mi>&#x1D4B3;</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>. Let <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:mrow><mml:mi>X</mml:mi></mml:mrow></mml:math></inline-formula> be a random variable taking values in this space, with distribution <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:mi>P</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>X</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>; it is referred to as the power data vector, representing the observed data in the power system. 
Meanwhile, a random variable <inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:mi>M</mml:mi><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>M</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>M</mml:mi><mml:mi>d</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> takes values in <inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:mo fence="false" stretchy="false">{</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:msup><mml:mo fence="false" stretchy="false">}</mml:mo><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> and <inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:math></inline-formula> is called the mask vector, which identifies which components in <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:mrow><mml:mi>X</mml:mi></mml:mrow></mml:math></inline-formula> are observed values (<inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:msub><mml:mi>M</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula> represents observed values, and <inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:msub><mml:mi>M</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:math></inline-formula> represents missing values). 
For each <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:mi>i</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mi>d</mml:mi><mml:mo fence="false" stretchy="false">}</mml:mo></mml:math></inline-formula>, the original space <inline-formula id="ieqn-12"><mml:math id="mml-ieqn-12"><mml:msub><mml:mrow><mml:mi>&#x1D4B3;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is extended to define a new space <inline-formula id="ieqn-13"><mml:math id="mml-ieqn-13"><mml:mrow><mml:mover><mml:mrow><mml:mi>&#x1D4B3;</mml:mi></mml:mrow><mml:mo stretchy="false">&#x007E;</mml:mo></mml:mover></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi>&#x1D4B3;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x222A;</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:mo>&#x2217;</mml:mo><mml:mo fence="false" stretchy="false">}</mml:mo></mml:math></inline-formula>, where &#x2217; is a special symbol used to represent unobserved values.</p>
<p>Based on this, a new random variable <inline-formula id="ieqn-14"><mml:math id="mml-ieqn-14"><mml:mrow><mml:mover><mml:mi>X</mml:mi><mml:mo stretchy="false">&#x007E;</mml:mo></mml:mover></mml:mrow><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mi>X</mml:mi><mml:mo stretchy="false">&#x007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mi>X</mml:mi><mml:mo stretchy="false">&#x007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mover><mml:mrow><mml:mi>&#x1D4B3;</mml:mi></mml:mrow><mml:mo stretchy="false">&#x007E;</mml:mo></mml:mover></mml:mrow></mml:math></inline-formula> is defined, with its computation rule as follows:
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:msub><mml:mrow><mml:mover><mml:mi>X</mml:mi><mml:mo stretchy="false">&#x007E;</mml:mo></mml:mover></mml:mrow><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mtable columnalign="left left" rowspacing=".2em" columnspacing="1em" displaystyle="false"><mml:mtr><mml:mtd><mml:msub><mml:mi>X</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo></mml:mtd><mml:mtd><mml:msub><mml:mi>M</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mo>&#x2217;</mml:mo><mml:mo>,</mml:mo></mml:mtd><mml:mtd><mml:msub><mml:mi>M</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mtd></mml:mtr></mml:mtable><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>The power system data matrix is <inline-formula id="ieqn-15"><mml:math id="mml-ieqn-15"><mml:mi>X</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mi>N</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>C</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, and <italic>M</italic> is its corresponding mask matrix. The input missing data matrix is <inline-formula id="ieqn-16"><mml:math id="mml-ieqn-16"><mml:mrow><mml:mover><mml:mi>X</mml:mi><mml:mo stretchy="false">&#x007E;</mml:mo></mml:mover></mml:mrow><mml:mo>=</mml:mo><mml:mi>X</mml:mi><mml:mo>&#x2299;</mml:mo><mml:mi>M</mml:mi></mml:math></inline-formula>, where <inline-formula id="ieqn-17"><mml:math id="mml-ieqn-17"><mml:mo>&#x2299;</mml:mo></mml:math></inline-formula> denotes element-wise multiplication.</p>
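<p>Under this notation the masked input matrix can be formed in a few lines. The NumPy sketch below uses NaN as the placeholder &#x2217; and random data purely for illustration:</p>

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))                    # complete data matrix, N x C
M = (rng.random((4, 3)) > 0.3).astype(float)   # mask: 1 = observed, 0 = missing

X_tilde = X * M            # element-wise product X (.) M zeroes the missing entries
X_tilde[M == 0] = np.nan   # mark missing positions with the placeholder symbol
```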
</sec>
<sec id="s3">
<label>3</label>
<title>The BAC-GAN Model</title>
<p>Given the temporal and high-dimensional nature of power system data, this paper proposes a missing data recovery model named BAC-GAN (as shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>), which is based on an improved Generative Adversarial Network.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>Recovery model</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_72777-fig-1.tif"/>
</fig>
<p>The generator adopts a structure that combines BiLSTM with a multi-head attention mechanism. BiLSTM integrates historical information and future trends through its bidirectional information flow mechanism, aiding in a comprehensive understanding of the dynamic characteristics of time series data. The multi-head attention mechanism captures global dependencies and key time points in the sequence through weighted allocation, further enhancing the model&#x2019;s ability to model temporal features and compensating for BiLSTM&#x2019;s limitations in capturing global contextual information.</p>
<p>The discriminator employs a CNN architecture. Power system data often exhibit high dimensionality, containing multivariate information such as voltage, current, frequency, and load. CNNs can capture local correlations and underlying structures within the data. When processing high-dimensional data, the CNN models interdependencies among variables through multi-channel input mechanisms, reducing the impact of redundant information. This provides more reliable feedback signals to the generator, prompting it to continuously optimize the imputation performance. Through adversarial training between the generator and the discriminator, high-precision recovery of missing data is ultimately achieved.</p>
<sec id="s3_1">
<label>3.1</label>
<title>Generator</title>
<p>Conventional generators typically employ fully connected networks, which fail to effectively capture bidirectional temporal dependencies and global features, resulting in low-quality imputed data. To address this issue, this paper proposes a generator architecture based on a BiLSTM and multi-head attention mechanism. The core of our temporal modeling is a two-layer BiLSTM. This stacked design aims to learn hierarchical temporal representations: the first BiLSTM layer processes the input sequence and generates preliminary hidden states containing fundamental bidirectional patterns; the second BiLSTM layer builds upon this foundation to model more complex and abstract long-term dependencies at a higher level of feature abstraction. This hierarchical processing enables the network to capture intricate temporal dynamics that might be missed by a single-layer BiLSTM. Subsequently, the output from the second BiLSTM layer is fed into a single-layer multi-head attention mechanism. We employ a single attention layer not for deep feature transformation, but to serve as a powerful global weighting and aggregation module. Its function is to reweight the temporal features refined by the BiLSTM, identifying and emphasizing the most critical time steps across the entire sequence from multiple representation subspaces. This single-layer structure effectively captures global contextual relationships while avoiding the computational overhead and overfitting risks associated with multi-layer attention stacking. The detailed structure is illustrated in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>.</p>
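<p>A schematic PyTorch sketch of such a generator follows; the hidden size, number of heads, and sigmoid output scaling are illustrative assumptions, since the text does not specify exact layer dimensions.</p>

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """Two-layer BiLSTM followed by a single multi-head attention layer,
    mirroring the architecture described above (sizes are assumptions)."""
    def __init__(self, n_features: int, hidden: int = 64, heads: int = 4):
        super().__init__()
        self.bilstm = nn.LSTM(n_features, hidden, num_layers=2,
                              bidirectional=True, batch_first=True)
        self.attn = nn.MultiheadAttention(2 * hidden, heads, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_features)

    def forward(self, x):
        h, _ = self.bilstm(x)              # (batch, seq, 2*hidden), both directions stacked
        a, _ = self.attn(h, h, h)          # self-attention reweights the whole sequence
        return torch.sigmoid(self.out(a))  # imputed values scaled to (0, 1)

g = Generator(n_features=5)
y = g(torch.randn(2, 10, 5))
print(y.shape)  # torch.Size([2, 10, 5])
```

<p>The attention layer consumes the concatenated forward and backward hidden states (dimension 2 &#x00D7; hidden), which is why its embedding dimension must match twice the BiLSTM hidden size.</p>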
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>Improvement of generator structure</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_72777-fig-2.tif"/>
</fig>
<p>Generator <italic>G</italic> receives the power data with missing values <inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:mrow><mml:mi>X</mml:mi></mml:mrow></mml:math></inline-formula>, the mask matrix <inline-formula id="ieqn-19"><mml:math id="mml-ieqn-19"><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:math></inline-formula>, and the masked noise matrix <inline-formula id="ieqn-20"><mml:math id="mml-ieqn-20"><mml:mrow><mml:mi>Z</mml:mi></mml:mrow></mml:math></inline-formula> as inputs. The mask matrix confines the injected randomness to the missing-value positions while preserving the observed entries and the target dimensionality; the generator produces <inline-formula id="ieqn-21"><mml:math id="mml-ieqn-21"><mml:mrow><mml:mover><mml:mi>X</mml:mi><mml:mo stretchy="false">&#x007E;</mml:mo></mml:mover></mml:mrow><mml:mo>=</mml:mo><mml:mi>G</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:mi>M</mml:mi><mml:mo>,</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>M</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x2299;</mml:mo><mml:mi>Z</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>.</p>
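<p>The composition of the generator input can be written out directly. A NumPy sketch (the uniform noise range is an assumption, in the style of GAIN-like setups):</p>

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.random((4, 3))                           # observed data (missing entries irrelevant)
M = (rng.random((4, 3)) > 0.25).astype(float)    # mask: 1 = observed, 0 = missing
Z = rng.uniform(0.0, 0.01, size=(4, 3))          # small uniform noise (assumed scale)

# randomness enters only where M = 0; observed entries pass through unchanged
G_input = X * M + (1.0 - M) * Z
```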
<p>BiLSTM is an extension of LSTM, consisting of two LSTMs: one processes information in the forward direction, while the other handles information in the backward direction. BiLSTM can capture bidirectional temporal information in sequences, making it suitable for tasks that require consideration of both past and future information.</p>
<p>Based on the temporal features extracted by BiLSTM, the multi-head attention mechanism further enhances the feature representation capability. The multi-head attention mechanism employs multiple attention heads to capture dependencies between different time steps of the input sequence from distinct subspaces. For the query (Query), key (Key), and value (Value) of the input sequence, denoted as <inline-formula id="ieqn-22"><mml:math id="mml-ieqn-22"><mml:mrow><mml:mi>Q</mml:mi></mml:mrow></mml:math></inline-formula>, <inline-formula id="ieqn-23"><mml:math id="mml-ieqn-23"><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-24"><mml:math id="mml-ieqn-24"><mml:mrow><mml:mi>V</mml:mi><mml:mo>,</mml:mo></mml:mrow></mml:math></inline-formula> respectively, the computation for a single head of attention is as follows:
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:mrow><mml:mtext>Attention</mml:mtext></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>Q</mml:mi><mml:mo>,</mml:mo><mml:mi>K</mml:mi><mml:mo>,</mml:mo><mml:mi>V</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mrow><mml:mtext>softmax</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mfrac><mml:mrow><mml:mi>Q</mml:mi><mml:msup><mml:mi>K</mml:mi><mml:mi mathvariant="normal">&#x22A4;</mml:mi></mml:msup></mml:mrow><mml:msqrt><mml:msub><mml:mi>d</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:msqrt></mml:mfrac><mml:mo>)</mml:mo></mml:mrow><mml:mi>V</mml:mi></mml:math></disp-formula>where <inline-formula id="ieqn-25"><mml:math id="mml-ieqn-25"><mml:mfrac><mml:mn>1</mml:mn><mml:msqrt><mml:msub><mml:mi>d</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:msqrt></mml:mfrac></mml:math></inline-formula> is the scaling factor, <inline-formula id="ieqn-26"><mml:math id="mml-ieqn-26"><mml:msub><mml:mi>d</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:math></inline-formula> is the dimension of the key, and the softmax function ensures the normalization of attention weights. The multi-head attention mechanism computes attention in parallel through multiple independent attention heads and concatenates the results of all heads. By leveraging this mechanism, the model can capture global dependencies in temporal data from diverse perspectives, further refining feature representation.</p>
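<p>Eq. (2) can be checked in a few lines of NumPy for a single head; the multi-head case runs several such computations in parallel and concatenates the results. The matrix sizes below are arbitrary.</p>

```python
import numpy as np

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    # numerically stable softmax over the key axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V, weights

rng = np.random.default_rng(0)
Q, K, V = (rng.normal(size=(6, 8)) for _ in range(3))

out, w = attention(Q, K, V)
print(out.shape)  # (6, 8); each row of w sums to 1
```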
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Discriminator</title>
<p>In the model proposed in this paper, a hint matrix (Hint Mechanism) and a CNN-based network architecture are introduced to enhance the discriminator&#x2019;s capability, as illustrated in <xref ref-type="fig" rid="fig-3">Fig. 3</xref>.</p>
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Improvement of discriminator structure</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_72777-fig-3.tif"/>
</fig>
<p>The hint matrix provides information about the validity of the original data, helping the discriminator more accurately distinguish between observed and imputed values. Particularly in cases of severe data missingness, the generator may produce multiple plausible imputation results, making it challenging for the discriminator to make effective judgments. The calculation formula for the hint matrix is:
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:mi>H</mml:mi><mml:mo>=</mml:mo><mml:mi>B</mml:mi><mml:mo>&#x2299;</mml:mo><mml:mi>M</mml:mi><mml:mo>+</mml:mo><mml:mn>0.8</mml:mn><mml:mo>&#x2299;</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>B</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula>where <italic>B</italic> is a random binary matrix and <italic>M</italic> is the mask matrix; at positions where <italic>B</italic> equals 1 the hint reveals the true mask entry, while the remaining positions receive the constant value 0.8.</p>
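<p>Eq. (3) in NumPy; the Bernoulli rate used to draw <italic>B</italic> here is an assumption for illustration only:</p>

```python
import numpy as np

rng = np.random.default_rng(2)
M = (rng.random((4, 3)) > 0.3).astype(float)  # mask: 1 = observed
B = (rng.random((4, 3)) < 0.9).astype(float)  # random binary reveal matrix (assumed rate)

# where B = 1 the discriminator sees the true mask entry; elsewhere it sees 0.8
H = B * M + 0.8 * (1.0 - B)
```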
<p>The input to the discriminator consists of the generated sample <inline-formula id="ieqn-27"><mml:math id="mml-ieqn-27"><mml:mrow><mml:mover><mml:mi>X</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow></mml:math></inline-formula> and the hint matrix <italic>H</italic>. To better capture the features in the input data <inline-formula id="ieqn-28"><mml:math id="mml-ieqn-28"><mml:mrow><mml:mover><mml:mi>X</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow></mml:math></inline-formula> and enhance the discriminator&#x2019;s ability to distinguish between observed and imputed values, a CNN architecture is adopted for the discriminator. In this model, the discriminator is structured with three convolutional layers, which are systematically designed to extract hierarchical features from the input data through sequential convolutional operations. The initial convolutional layer functions as a low-level feature detector, capturing fundamental patterns and local structures within the input. As the network depth increases, the subsequent layers progressively learn more abstract and complex representations: the second layer integrates local features to identify intermediate patterns, while the third layer synthesizes global contextual information, enabling the discriminator to effectively distinguish subtle inconsistencies between observed and imputed values. This hierarchical feature extraction mechanism significantly enhances the discriminator&#x2019;s capacity to evaluate data authenticity across multiple scales.</p>
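<p>A schematic PyTorch sketch of the three-convolutional-layer discriminator described above; channel counts and kernel sizes are illustrative assumptions, since the text does not specify them.</p>

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """Three stacked 1-D convolutions over the feature channels, producing a
    per-position probability that each entry is observed rather than imputed."""
    def __init__(self, n_features: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(2 * n_features, 32, kernel_size=3, padding=1),  # data + hint channels
            nn.ReLU(),
            nn.Conv1d(32, 64, kernel_size=3, padding=1),              # intermediate patterns
            nn.ReLU(),
            nn.Conv1d(64, n_features, kernel_size=3, padding=1),      # global synthesis
        )

    def forward(self, x_hat, hint):
        # inputs: (batch, seq, n_features); Conv1d expects channels first
        z = torch.cat([x_hat, hint], dim=-1).transpose(1, 2)
        return torch.sigmoid(self.net(z)).transpose(1, 2)

d = Discriminator(n_features=5)
p = d(torch.rand(2, 10, 5), torch.rand(2, 10, 5))
print(p.shape)  # torch.Size([2, 10, 5])
```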
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>Loss Function</title>
<sec id="s3_3_1">
<label>3.3.1</label>
<title>Discriminator Loss Function</title>
<p>The discriminator <italic>D</italic> aims to distinguish between observed values and generated values. The loss function of the discriminator is defined as cross-entropy:
<disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>D</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover><mml:mrow><mml:mo>[</mml:mo><mml:msub><mml:mi>M</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mi>D</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo stretchy="false">)</mml:mo><mml:mo>+</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>M</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>D</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo stretchy="false">)</mml:mo><mml:mo>]</mml:mo></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-29"><mml:math id="mml-ieqn-29"><mml:mi>D</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> represents the probability predicted by the discriminator that the <inline-formula id="ieqn-30"><mml:math id="mml-ieqn-30"><mml:mi>i</mml:mi></mml:math></inline-formula>-th component is real data, <inline-formula id="ieqn-31"><mml:math
id="mml-ieqn-31"><mml:msub><mml:mi>M</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the <inline-formula id="ieqn-32"><mml:math id="mml-ieqn-32"><mml:mi>i</mml:mi></mml:math></inline-formula>-th value in the mask matrix (with missing positions marked as 0), and <inline-formula id="ieqn-33"><mml:math id="mml-ieqn-33"><mml:mrow><mml:mi mathvariant="normal">n</mml:mi></mml:mrow></mml:math></inline-formula> is the total number of data samples.</p>
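As a concrete illustration, the masked cross-entropy in Eq. (4) can be sketched in a few lines of NumPy. This is a minimal sketch for exposition, not the paper's PyTorch implementation; the clipping constant `eps` is an assumption added here for numerical safety.

```python
import numpy as np

def discriminator_loss(d_prob, mask, eps=1e-8):
    """Cross-entropy loss of Eq. (4).

    d_prob : discriminator outputs D(x_i), each in (0, 1)
    mask   : M_i, 1 at observed positions, 0 at missing (imputed) ones
    """
    d_prob = np.clip(d_prob, eps, 1 - eps)  # avoid log(0)
    return -np.mean(mask * np.log(d_prob) + (1 - mask) * np.log(1 - d_prob))

# A near-perfect discriminator (D ~ 1 on observed, ~ 0 on imputed) gives a loss near 0.
loss = discriminator_loss(np.array([0.99, 0.01]), np.array([1.0, 0.0]))
```

An undecided discriminator that outputs 0.5 everywhere yields a loss of log 2, the usual cross-entropy baseline.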
</sec>
<sec id="s3_3_2">
<label>3.3.2</label>
<title>Generator Loss Function</title>
<p>The objective of the generator is to impute missing data by generating values that are as realistic as possible, such that the discriminator cannot easily distinguish between observed and imputed values. Furthermore, to ensure the accuracy of the imputed data at the positions of observed values, a reconstruction error term is incorporated.</p>
<p>(1) Adversarial Loss. The generator aims to mislead the discriminator into classifying the imputed data as observed data. The adversarial loss is applied exclusively to the positions of missing values:
<disp-formula id="eqn-5"><label>(5)</label><mml:math id="mml-eqn-5" display="block"><mml:msubsup><mml:mi>L</mml:mi><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>v</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover><mml:mrow><mml:mo>[</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>M</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mi>D</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo stretchy="false">)</mml:mo><mml:mo>]</mml:mo></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-34"><mml:math id="mml-ieqn-34"><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>M</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> ensures that the loss is computed only for missing values, and <inline-formula id="ieqn-35"><mml:math id="mml-ieqn-35"><mml:mi>D</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> represents the probability predicted by the discriminator that the <inline-formula id="ieqn-36"><mml:math id="mml-ieqn-36"><mml:mi>i</mml:mi></mml:math></inline-formula>-th component is real data.</p>
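The adversarial term in Eq. (5) differs from the discriminator loss only in its masking: the factor 1 &#x2212; M<sub>i</sub> keeps only the missing positions. A minimal NumPy sketch (illustrative only; the `eps` clipping is an assumption added for numerical safety):

```python
import numpy as np

def generator_adv_loss(d_prob, mask, eps=1e-8):
    """Adversarial term of Eq. (5): only missing positions (M_i = 0) contribute."""
    d_prob = np.clip(d_prob, eps, 1 - eps)  # avoid log(0)
    return -np.mean((1 - mask) * np.log(d_prob))

# The observed position (mask = 1) is zeroed out of the sum; only the
# imputed position (mask = 0, D = 0.9) contributes.
loss = generator_adv_loss(np.array([0.2, 0.9]), np.array([1.0, 0.0]))
```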
<p>(2) Reconstruction Loss. To ensure that the imputed values produced by the generator align closely with the ground truth data at observed positions, the reconstruction loss is formulated as follows:
<disp-formula id="eqn-6"><label>(6)</label><mml:math id="mml-eqn-6" display="block"><mml:msub><mml:mi>L</mml:mi><mml:mi>M</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover><mml:mrow><mml:mo>[</mml:mo><mml:msub><mml:mi>M</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x2212;</mml:mo><mml:mi>G</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:msup><mml:mo stretchy="false">)</mml:mo><mml:mn>2</mml:mn></mml:msup><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mi>M</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:math></disp-formula>where <inline-formula id="ieqn-37"><mml:math id="mml-ieqn-37"><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> represents the ground truth data values, <inline-formula id="ieqn-38"><mml:math id="mml-ieqn-38"><mml:mi>G</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> denotes the values generated by the generator <italic>G</italic>, and the mask factor <inline-formula id="ieqn-38a"><mml:math id="mml-ieqn-38a"><mml:msub><mml:mi>M</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> restricts the average to observed positions.
The total loss function of the generator is defined as a weighted sum of the adversarial loss <inline-formula id="ieqn-39"><mml:math id="mml-ieqn-39"><mml:msubsup><mml:mi>L</mml:mi><mml:mrow><mml:mi>G</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>v</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> and the reconstruction loss <inline-formula id="ieqn-40"><mml:math id="mml-ieqn-40"><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>M</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>:
<disp-formula id="eqn-7"><label>(7)</label><mml:math id="mml-eqn-7" display="block"><mml:msub><mml:mi>L</mml:mi><mml:mi>G</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mi>L</mml:mi><mml:mi>G</mml:mi><mml:mrow><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>v</mml:mi></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mi>&#x03B1;</mml:mi><mml:msub><mml:mi>L</mml:mi><mml:mi>M</mml:mi></mml:msub></mml:math></disp-formula>where <inline-formula id="ieqn-41"><mml:math id="mml-ieqn-41"><mml:mi>&#x03B1;</mml:mi></mml:math></inline-formula> is a hyperparameter that balances the two loss functions.</p>
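Using the convention above (M<sub>i</sub> = 1 at observed positions), the reconstruction term of Eq. (6) and the combined objective of Eq. (7) can be sketched as follows. This is an illustrative NumPy version, with `alpha` defaulting to the value selected later in the hyperparameter experiments.

```python
import numpy as np

def reconstruction_loss(x, g_out, mask):
    """Masked MSE of Eq. (6): average squared error over observed positions."""
    return np.sum(mask * (x - g_out) ** 2) / np.sum(mask)

def generator_loss(adv_loss, rec_loss, alpha=400.0):
    """Total generator objective of Eq. (7): L_G = L_G^adv + alpha * L_M."""
    return adv_loss + alpha * rec_loss

x = np.array([1.0, 2.0, 3.0])
g = np.array([1.1, 2.0, 9.9])      # third value is imputed, mask = 0 there
m = np.array([1.0, 1.0, 0.0])
rec = reconstruction_loss(x, g, m)  # only the first two entries count
total = generator_loss(0.1, rec)    # 0.1 stands in for an adversarial loss value
```

The large weight on the reconstruction term keeps the generator anchored to the known values while the adversarial term shapes the imputations.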
</sec>
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Experimental Analysis</title>
<p>All experiments were run on the Ubuntu 20.04 operating system, on a machine equipped with an Intel<sup>&#x00AE;</sup> Xeon<sup>&#x00AE;</sup> Platinum 8474C processor (16-core vCPU) and an NVIDIA GeForce RTX 4090D GPU (24 GB VRAM). The deep learning framework used is PyTorch 2.0.0, with Python version 3.8 and CUDA version 11.8. Our experimental datasets and source code will be available at <ext-link ext-link-type="uri" xlink:href="https://github.com/zhangsu1234/BAC-GAN">https://github.com/zhangsu1234/BAC-GAN</ext-link> (accessed on 23 November 2025). In the subsequent experiments, this study aims to address the following research questions:</p>
<p>RQ1: How does the performance of the proposed model compare to other methods as the proportion of missing data increases?</p>
<p>RQ2: How does the proposed model perform across different datasets?</p>
<sec id="s4_1">
<label>4.1</label>
<title>Datasets</title>
<p>The datasets used in the experiments are summarized in <xref ref-type="table" rid="table-1">Table 1</xref>.</p>
<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>Experimental dataset</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/> </colgroup>
<thead>
<tr>
<th>Acronym</th>
<th>Dataset name</th>
<th>Instance</th>
<th>Feature</th>
</tr>
</thead>
<tbody>
<tr>
<td>RLD</td>
<td>Residential load dataset</td>
<td>1580</td>
<td>96</td>
</tr>
<tr>
<td>SMD</td>
<td>Smart meter data</td>
<td>8760</td>
<td>134</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>(1) Residential Load Dataset (RLD): The dataset comprises residential household electricity consumption data, with daily load profiles sampled at 15-min intervals, resulting in a feature dimension of 96. This dataset provides detailed documentation of typical residential electricity usage patterns and holds significant value for analyzing and predicting household electricity consumption behaviors. The time series plot is shown in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>.</p>
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>Time series plot of RLD (15-min intervals)</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_72777-fig-4.tif"/>
</fig>
<p>(2) Smart Meter Dataset (SMD): The SMD aggregates hourly energy consumption and multi-dimensional electrical parameters&#x2014;including voltage, current, and power factor&#x2014;from smart meters deployed across urban and suburban regions in a regional power distribution network. Covering a full calendar year of 2020 (8760 hourly records), the dataset integrates metadata such as timestamps and device status, enabling comprehensive analysis of load profiles, anomaly detection, and missing data imputation under realistic grid operation scenarios. The time series plot is shown in <xref ref-type="fig" rid="fig-5">Fig. 5</xref>.</p>
<fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>Time series plot of SMD (1-h intervals)</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_72777-fig-5.tif"/>
</fig>
</sec>
<sec id="s4_2">
<label>4.2</label>
<title>Compared Algorithms</title>
<p>1. KNN [<xref ref-type="bibr" rid="ref-14">14</xref>]: This method identifies the K nearest observations to a sample with missing data using Euclidean distance and performs a distance-weighted average to impute the missing values.</p>
<p>2. VAE [<xref ref-type="bibr" rid="ref-24">24</xref>]: By maximizing the likelihood in the latent space and minimizing the discrepancy between the generated data and the original data, this approach recovers missing values.</p>
<p>3. GAIN [<xref ref-type="bibr" rid="ref-23">23</xref>]: Built upon the Generative Adversarial Network (GAN) framework, this model recovers missing data by performing conditional modeling of the missing locations and adversarial training.</p>
<p>4. M-RNN [<xref ref-type="bibr" rid="ref-19">19</xref>]: This method constructs a generative module based on Recurrent Neural Networks (RNNs) to model temporal dependencies by integrating historical and contextual information. It achieves data recovery through recursive prediction.</p>
<p>5. MIVAE [<xref ref-type="bibr" rid="ref-25">25</xref>]: By jointly optimizing the posterior likelihood of latent variables and the reconstruction error, this approach generates multiple probabilistic imputations within a variational framework, effectively modeling the uncertainty of missing values and enabling high-accuracy data recovery.</p>
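As an illustration of the simplest baseline above (method 1), distance-weighted KNN imputation can be sketched in NumPy. This toy version only borrows values from fully observed rows; practical implementations also exploit partially observed neighbours.

```python
import numpy as np

def knn_impute(X, k=2):
    """Distance-weighted KNN imputation (minimal sketch of the KNN baseline).

    For each sample with missing entries (NaN), find the k nearest fully
    observed samples by Euclidean distance over the observed features, then
    fill each gap with the inverse-distance-weighted average of the
    neighbours' values at that feature.
    """
    X = X.astype(float).copy()
    complete = X[~np.isnan(X).any(axis=1)]          # fully observed rows
    for i in range(len(X)):
        miss = np.isnan(X[i])
        if not miss.any():
            continue
        d = np.sqrt(((complete[:, ~miss] - X[i, ~miss]) ** 2).sum(axis=1))
        nn = np.argsort(d)[:k]                      # k nearest neighbours
        w = 1.0 / (d[nn] + 1e-8)                    # inverse-distance weights
        X[i, miss] = (w @ complete[nn][:, miss]) / w.sum()
    return X

X = np.array([[1.0, 10.0], [2.0, 20.0], [1.5, np.nan]])
filled = knn_impute(X, k=2)  # the gap is filled midway between the neighbours
```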
</sec>
<sec id="s4_3">
<label>4.3</label>
<title>Evaluation Metric</title>
<p>Root Mean Square Error (RMSE), Mean Square Error (MSE), Mean Absolute Error (MAE), and the Coefficient of Determination (<inline-formula id="ieqn-42"><mml:math id="mml-ieqn-42"><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>) are commonly used evaluation metrics for assessing the accuracy of predictive models or algorithms. The formulas for each metric are as follows:
<disp-formula id="eqn-8"><label>(8)</label><mml:math id="mml-eqn-8" display="block"><mml:mi>R</mml:mi><mml:mi>M</mml:mi><mml:mi>S</mml:mi><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:msqrt><mml:mfrac><mml:mn>1</mml:mn><mml:mi>N</mml:mi></mml:mfrac><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:msqrt></mml:math></disp-formula>
<disp-formula id="eqn-9"><label>(9)</label><mml:math id="mml-eqn-9" display="block"><mml:mi>M</mml:mi><mml:mi>S</mml:mi><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>N</mml:mi></mml:mfrac><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mi>i</mml:mi></mml:msub><mml:msup><mml:mo stretchy="false">)</mml:mo><mml:mn>2</mml:mn></mml:msup></mml:math></disp-formula>
<disp-formula id="eqn-10"><label>(10)</label><mml:math id="mml-eqn-10" display="block"><mml:mi>M</mml:mi><mml:mi>A</mml:mi><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>N</mml:mi></mml:mfrac><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover><mml:mo>&#x2223;</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">&#x2223;</mml:mo></mml:math></disp-formula>
<disp-formula id="eqn-11"><label>(11)</label><mml:math id="mml-eqn-11" display="block"><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mfrac><mml:mrow><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:msup><mml:mo stretchy="false">)</mml:mo><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:mover><mml:mi>y</mml:mi><mml:mo accent="false">&#x00AF;</mml:mo></mml:mover><mml:msup><mml:mo stretchy="false">)</mml:mo><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mfrac></mml:math></disp-formula>where <inline-formula id="ieqn-43"><mml:math id="mml-ieqn-43"><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula> is the actual value at the missing position, <inline-formula id="ieqn-44"><mml:math id="mml-ieqn-44"><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x00AF;</mml:mo></mml:mover></mml:mrow></mml:math></inline-formula> is the mean of the actual values, <inline-formula id="ieqn-45"><mml:math id="mml-ieqn-45"><mml:msub><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the imputed value at the missing position obtained by the imputation algorithm, and <inline-formula id="ieqn-46"><mml:math id="mml-ieqn-46"><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:math></inline-formula> is the total number of missing values.</p>
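Computed over the N missing positions, Eqs. (8)&#x2013;(11) translate directly into code; a small NumPy sketch:

```python
import numpy as np

def imputation_metrics(y_true, y_pred):
    """RMSE, MSE, MAE and R^2 of Eqs. (8)-(11) over the missing positions."""
    err = y_true - y_pred
    mse = np.mean(err ** 2)
    rmse = np.sqrt(mse)
    mae = np.mean(np.abs(err))
    # R^2: 1 minus residual sum of squares over total sum of squares
    r2 = 1 - np.sum(err ** 2) / np.sum((y_true - y_true.mean()) ** 2)
    return rmse, mse, mae, r2

y = np.array([1.0, 2.0, 3.0, 4.0])       # ground truth at the missing positions
yhat = np.array([1.1, 1.9, 3.2, 3.8])    # imputed values
rmse, mse, mae, r2 = imputation_metrics(y, yhat)
```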
</sec>
<sec id="s4_4">
<label>4.4</label>
<title>Experimental Analysis</title>
<p>Sixty percent of the dataset was used as the training set, twenty percent was allocated to the validation set, and the remaining twenty percent was used for the test set. The experimental data were subjected to random missingness at missing rates ranging from <inline-formula id="ieqn-47"><mml:math id="mml-ieqn-47"><mml:mn>10</mml:mn><mml:mi mathvariant="normal">&#x0025;</mml:mi></mml:math></inline-formula> to <inline-formula id="ieqn-48"><mml:math id="mml-ieqn-48"><mml:mn>90</mml:mn><mml:mi mathvariant="normal">&#x0025;</mml:mi></mml:math></inline-formula>.</p>
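The split and masking procedure can be sketched as follows. This is illustrative NumPy code with a fixed seed and random data of the RLD's shape; the actual experiments use the real datasets.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.random((1580, 96))   # stand-in for RLD: 1580 samples x 96 features

# 60/20/20 train/validation/test split on shuffled indices
n = len(data)
idx = rng.permutation(n)
train, val, test = np.split(data[idx], [int(0.6 * n), int(0.8 * n)])

# Random (MCAR) missingness at a given rate: mask = 1 keeps a value, 0 drops it
missing_rate = 0.5
mask = (rng.random(train.shape) > missing_rate).astype(float)
x_miss = np.where(mask == 1, train, np.nan)
```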
<p>The selection of the optimal hyperparameters, specifically Hint &#x003D; 0.9 and Alpha &#x003D; 400, was driven by a systematic grid search on the validation set with a <inline-formula id="ieqn-49"><mml:math id="mml-ieqn-49"><mml:mn>50</mml:mn><mml:mi mathvariant="normal">&#x0025;</mml:mi></mml:math></inline-formula> missing rate, with the results detailed in <xref ref-type="table" rid="table-2">Table 2</xref>.</p>
<table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>Hyperparameter selection experiment</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/> </colgroup>
<thead>
<tr>
<th>(Alpha, Hint)</th>
<th>MSE</th>
<th>(Alpha, Hint)</th>
<th>MSE</th>
<th>(Alpha, Hint)</th>
<th>MSE</th>
<th>(Alpha, Hint)</th>
<th>MSE</th>
</tr>
</thead>
<tbody>
<tr>
<td>(100, 0.5)</td>
<td>0.0985</td>
<td>(200, 0.5)</td>
<td>0.0875</td>
<td>(400, 0.5)</td>
<td>0.0458</td>
<td>(800, 0.5)</td>
<td>0.0652</td>
</tr>
<tr>
<td>(100, 0.6)</td>
<td>0.0974</td>
<td>(200, 0.6)</td>
<td>0.0812</td>
<td>(400, 0.6)</td>
<td>0.0412</td>
<td>(800, 0.6)</td>
<td>0.0612</td>
</tr>
<tr>
<td>(100, 0.7)</td>
<td>0.0875</td>
<td>(200, 0.7)</td>
<td>0.0789</td>
<td>(400, 0.7)</td>
<td>0.0398</td>
<td>(800, 0.7)</td>
<td>0.0545</td>
</tr>
<tr>
<td>(100, 0.8)</td>
<td>0.0814</td>
<td>(200, 0.8)</td>
<td>0.0745</td>
<td>(400, 0.8)</td>
<td>0.0354</td>
<td>(800, 0.8)</td>
<td>0.0446</td>
</tr>
<tr>
<td>(100, 0.9)</td>
<td>0.0755</td>
<td>(200, 0.9)</td>
<td>0.0678</td>
<td>(400, 0.9)</td>
<td>0.0215</td>
<td>(800, 0.9)</td>
<td>0.0401</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Analysis of the experimental data reveals a consistent performance improvement as the Hint value increases from 0.5 to 0.9 across all Alpha groups. For instance, within the critical Alpha = 400 group, the MSE decreases monotonically from 0.0458 at Hint = 0.5 to 0.0215 at Hint = 0.9, indicating that providing more precise location information to the discriminator is crucial for guiding the generator toward more accurate imputations. The optimality of Alpha = 400 emerges from its interaction with Hint = 0.9: Alpha = 200 with Hint = 0.9 yields a relatively high MSE of 0.0678, Alpha = 800 with Hint = 0.9 achieves a better but still suboptimal MSE of 0.0401, while Alpha = 400 with Hint = 0.9 attains the lowest MSE of 0.0215 across the entire parameter space tested. This indicates that Alpha = 400 provides a suitable weight for the reconstruction loss, ensuring that the generator not only fools the discriminator but also accurately reconstructs the known values, a capability fully leveraged when the discriminator is well informed through Hint = 0.9. This combination was therefore selected: the generator receives the most effective guidance from the discriminator while being constrained to the highest reconstruction accuracy. The parameters of the network model are configured as shown in <xref ref-type="table" rid="table-3">Table 3</xref>.</p>
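The grid search described above can be sketched as follows. Here `evaluate` is a hypothetical stand-in for training the model and measuring validation MSE at a 50% missing rate, and the partial `table2` dictionary merely replays a few values from Table 2 for illustration.

```python
import itertools

def grid_search(evaluate, alphas=(100, 200, 400, 800),
                hints=(0.5, 0.6, 0.7, 0.8, 0.9)):
    """Return the (alpha, hint) pair minimising validation MSE.

    `evaluate(alpha, hint)` is assumed to train the model with those
    hyperparameters and return the validation-set MSE (not implemented here).
    """
    return min(itertools.product(alphas, hints), key=lambda p: evaluate(*p))

# Replaying a few Table 2 entries: (400, 0.9) comes out on top.
table2 = {(400, 0.9): 0.0215, (800, 0.9): 0.0401, (200, 0.9): 0.0678,
          (400, 0.8): 0.0354, (100, 0.9): 0.0755}
best = grid_search(lambda a, h: table2.get((a, h), 1.0))
```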
<table-wrap id="table-3">
<label>Table 3</label>
<caption>
<title>Model parameters</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/> </colgroup>
<thead>
<tr>
<th>Parameters</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Hint</td>
<td>0.9</td>
</tr>
<tr>
<td>Missing_rate</td>
<td>0.1&#x2013;0.9</td>
</tr>
<tr>
<td>Alpha</td>
<td>400</td>
</tr>
<tr>
<td>num_epochs</td>
<td>1000</td>
</tr>
<tr>
<td>Batch_size</td>
<td>64</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The imputation results of the RLD dataset across a range of missing rates are shown in <xref ref-type="table" rid="table-4">Tables 4</xref>&#x2013;<xref ref-type="table" rid="table-8">8</xref>. The time series after imputation for a missing rate of 0.5 is shown in <xref ref-type="fig" rid="fig-6">Fig. 6</xref>.</p>
<table-wrap id="table-4">
<label>Table 4</label>
<caption>
<title>RMSE of RLD under different missing rate level</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/> </colgroup>
<thead>
<tr>
<th align="center" colspan="10">Method</th>
</tr>
<tr>
<th>Missing rate</th>
<th>0.1</th>
<th>0.2</th>
<th>0.3</th>
<th>0.4</th>
<th>0.5</th>
<th>0.6</th>
<th>0.7</th>
<th>0.8</th>
<th>0.9</th>
</tr>
</thead>
<tbody>
<tr>
<td>BAC-GAN</td>
<td>0.1163</td>
<td>0.1321</td>
<td>0.1327</td>
<td>0.1414</td>
<td>0.1558</td>
<td>0.1611</td>
<td>0.1648</td>
<td>0.1874</td>
<td>0.2185</td>
</tr>
<tr>
<td>GAIN</td>
<td>0.0825</td>
<td>0.1080</td>
<td>0.1215</td>
<td>0.1750</td>
<td>0.1898</td>
<td>0.2453</td>
<td>0.3198</td>
<td>0.3370</td>
<td>0.3726</td>
</tr>
<tr>
<td>KNN</td>
<td>0.1586</td>
<td>0.1645</td>
<td>0.1653</td>
<td>0.1696</td>
<td>0.1712</td>
<td>0.1734</td>
<td>0.2105</td>
<td>0.2235</td>
<td>0.2339</td>
</tr>
<tr>
<td>VAE</td>
<td>0.2041</td>
<td>0.2103</td>
<td>0.2409</td>
<td>0.2740</td>
<td>0.2753</td>
<td>0.2781</td>
<td>0.2790</td>
<td>0.2804</td>
<td>0.2822</td>
</tr>
<tr>
<td>MRNN</td>
<td>0.1208</td>
<td>0.1800</td>
<td>0.2275</td>
<td>0.2305</td>
<td>0.2430</td>
<td>0.2906</td>
<td>0.2974</td>
<td>0.2998</td>
<td>0.3345</td>
</tr>
<tr>
<td>MIVAE</td>
<td>0.1341</td>
<td>0.1715</td>
<td>0.2074</td>
<td>0.2171</td>
<td>0.2201</td>
<td>0.2597</td>
<td>0.2607</td>
<td>0.2654</td>
<td>0.2697</td>
</tr>
</tbody>
</table>
</table-wrap><table-wrap id="table-5">
<label>Table 5</label>
<caption>
<title>MSE of RLD under different missing rate level</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/> </colgroup>
<thead>
<tr>
<th align="center" colspan="10">Method</th>
</tr>
<tr>
<th>Missing rate</th>
<th>0.1</th>
<th>0.2</th>
<th>0.3</th>
<th>0.4</th>
<th>0.5</th>
<th>0.6</th>
<th>0.7</th>
<th>0.8</th>
<th>0.9</th>
</tr>
</thead>
<tbody>
<tr>
<td>BAC-GAN</td>
<td>0.0135</td>
<td>0.0174</td>
<td>0.0176</td>
<td>0.0200</td>
<td>0.0243</td>
<td>0.0260</td>
<td>0.0272</td>
<td>0.0351</td>
<td>0.0477</td>
</tr>
<tr>
<td>GAIN</td>
<td>0.0068</td>
<td>0.0117</td>
<td>0.0148</td>
<td>0.0306</td>
<td>0.0360</td>
<td>0.0602</td>
<td>0.1023</td>
<td>0.1136</td>
<td>0.1388</td>
</tr>
<tr>
<td>KNN</td>
<td>0.0252</td>
<td>0.0270</td>
<td>0.0273</td>
<td>0.0288</td>
<td>0.0293</td>
<td>0.0301</td>
<td>0.0443</td>
<td>0.0499</td>
<td>0.0547</td>
</tr>
<tr>
<td>VAE</td>
<td>0.0416</td>
<td>0.0442</td>
<td>0.0580</td>
<td>0.0751</td>
<td>0.0758</td>
<td>0.0773</td>
<td>0.0779</td>
<td>0.0787</td>
<td>0.0796</td>
</tr>
<tr>
<td>MRNN</td>
<td>0.0146</td>
<td>0.0324</td>
<td>0.0517</td>
<td>0.0531</td>
<td>0.0590</td>
<td>0.0844</td>
<td>0.0885</td>
<td>0.0899</td>
<td>0.1119</td>
</tr>
<tr>
<td>MIVAE</td>
<td>0.0180</td>
<td>0.0294</td>
<td>0.0430</td>
<td>0.0471</td>
<td>0.0484</td>
<td>0.0674</td>
<td>0.0680</td>
<td>0.0704</td>
<td>0.0727</td>
</tr>
</tbody>
</table>
</table-wrap><table-wrap id="table-6">
<label>Table 6</label>
<caption>
<title>MAE of RLD under different missing rate level</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/> </colgroup>
<thead>
<tr>
<th align="center" colspan="10">Method</th>
</tr>
<tr>
<th>Missing rate</th>
<th>0.1</th>
<th>0.2</th>
<th>0.3</th>
<th>0.4</th>
<th>0.5</th>
<th>0.6</th>
<th>0.7</th>
<th>0.8</th>
<th>0.9</th>
</tr>
</thead>
<tbody>
<tr>
<td>BAC-GAN</td>
<td>0.0998</td>
<td>0.1144</td>
<td>0.1182</td>
<td>0.1195</td>
<td>0.1254</td>
<td>0.1382</td>
<td>0.1511</td>
<td>0.1654</td>
<td>0.1796</td>
</tr>
<tr>
<td>GAIN</td>
<td>0.0615</td>
<td>0.0825</td>
<td>0.0886</td>
<td>0.1328</td>
<td>0.1491</td>
<td>0.1847</td>
<td>0.2882</td>
<td>0.2936</td>
<td>0.3305</td>
</tr>
<tr>
<td>KNN</td>
<td>0.1258</td>
<td>0.1267</td>
<td>0.1279</td>
<td>0.1314</td>
<td>0.1406</td>
<td>0.1485</td>
<td>0.1762</td>
<td>0.1780</td>
<td>0.1852</td>
</tr>
<tr>
<td>VAE</td>
<td>0.1760</td>
<td>0.1799</td>
<td>0.2118</td>
<td>0.2473</td>
<td>0.2489</td>
<td>0.2512</td>
<td>0.2598</td>
<td>0.2642</td>
<td>0.2565</td>
</tr>
<tr>
<td>MRNN</td>
<td>0.1123</td>
<td>0.1566</td>
<td>0.1715</td>
<td>0.1804</td>
<td>0.1896</td>
<td>0.2685</td>
<td>0.2703</td>
<td>0.2775</td>
<td>0.2955</td>
</tr>
<tr>
<td>MIVAE</td>
<td>0.1194</td>
<td>0.1367</td>
<td>0.1304</td>
<td>0.1417</td>
<td>0.1569</td>
<td>0.1614</td>
<td>0.1789</td>
<td>0.1814</td>
<td>0.1914</td>
</tr>
</tbody>
</table>
</table-wrap><table-wrap id="table-7">
<label>Table 7</label>
<caption>
<title><inline-formula id="ieqn-50"><mml:math id="mml-ieqn-50"><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> of RLD under different missing rate level</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/> </colgroup>
<thead>
<tr>
<th align="center" colspan="10">Method</th>
</tr>
<tr>
<th>Missing rate</th>
<th>0.1</th>
<th>0.2</th>
<th>0.3</th>
<th>0.4</th>
<th>0.5</th>
<th>0.6</th>
<th>0.7</th>
<th>0.8</th>
<th>0.9</th>
</tr>
</thead>
<tbody>
<tr>
<td>BAC-GAN</td>
<td>0.9865</td>
<td>0.9825</td>
<td>0.9824</td>
<td>0.9800</td>
<td>0.9757</td>
<td>0.9740</td>
<td>0.9728</td>
<td>0.9649</td>
<td>0.9523</td>
</tr>
<tr>
<td>GAIN</td>
<td>0.9732</td>
<td>0.9683</td>
<td>0.9552</td>
<td>0.9494</td>
<td>0.9440</td>
<td>0.9398</td>
<td>0.9377</td>
<td>0.9201</td>
<td>0.8907</td>
</tr>
<tr>
<td>MIVAE</td>
<td>0.9487</td>
<td>0.9354</td>
<td>0.9217</td>
<td>0.9201</td>
<td>0.9199</td>
<td>0.9155</td>
<td>0.9109</td>
<td>0.9001</td>
<td>0.8999</td>
</tr>
<tr>
<td>MRNN</td>
<td>0.9360</td>
<td>0.8527</td>
<td>0.8065</td>
<td>0.7770</td>
<td>0.7896</td>
<td>0.7684</td>
<td>0.7578</td>
<td>0.8356</td>
<td>0.7478</td>
</tr>
<tr>
<td>VAE</td>
<td>0.7858</td>
<td>0.7829</td>
<td>0.7858</td>
<td>0.7189</td>
<td>0.7210</td>
<td>0.6943</td>
<td>0.6899</td>
<td>0.6418</td>
<td>0.6039</td>
</tr>
<tr>
<td>KNN</td>
<td>0.6583</td>
<td>0.6558</td>
<td>0.5742</td>
<td>0.6049</td>
<td>0.5842</td>
<td>0.5727</td>
<td>0.5522</td>
<td>0.5314</td>
<td>0.5204</td>
</tr>
</tbody>
</table>
</table-wrap><table-wrap id="table-8">
<label>Table 8</label>
<caption>
<title>Recovery time of RLD under different missing rate level (Unit: ms)</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/> </colgroup>
<thead>
<tr>
<th align="center" colspan="10">Method</th>
</tr>
<tr>
<th>Missing rate</th>
<th>0.1</th>
<th>0.2</th>
<th>0.3</th>
<th>0.4</th>
<th>0.5</th>
<th>0.6</th>
<th>0.7</th>
<th>0.8</th>
<th>0.9</th>
</tr>
</thead>
<tbody>
<tr>
<td>BAC-GAN</td>
<td>69</td>
<td>80</td>
<td>95</td>
<td>143</td>
<td>152</td>
<td>178</td>
<td>188</td>
<td>201</td>
<td>260</td>
</tr>
<tr>
<td>GAIN</td>
<td>78</td>
<td>85</td>
<td>99</td>
<td>103</td>
<td>110</td>
<td>138</td>
<td>165</td>
<td>198</td>
<td>241</td>
</tr>
<tr>
<td>VAE</td>
<td>84</td>
<td>98</td>
<td>107</td>
<td>118</td>
<td>136</td>
<td>187</td>
<td>196</td>
<td>219</td>
<td>264</td>
</tr>
<tr>
<td>KNN</td>
<td>120</td>
<td>134</td>
<td>174</td>
<td>189</td>
<td>201</td>
<td>213</td>
<td>239</td>
<td>287</td>
<td>334</td>
</tr>
<tr>
<td>MRNN</td>
<td>94</td>
<td>102</td>
<td>103</td>
<td>178</td>
<td>198</td>
<td>215</td>
<td>245</td>
<td>258</td>
<td>298</td>
</tr>
<tr>
<td>MIVAE</td>
<td>97</td>
<td>102</td>
<td>110</td>
<td>137</td>
<td>146</td>
<td>199</td>
<td>235</td>
<td>244</td>
<td>294</td>
</tr>
</tbody>
</table>
</table-wrap><fig id="fig-6">
<label>Figure 6</label>
<caption>
<title>Time series plot of RLD after imputation (Missing rate &#x003D; 0.5)</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_72777-fig-6.tif"/>
</fig>
<p>At low missing rates (0.1&#x2013;0.3), as measured by RMSE, MSE, and MAE, the proposed model ranked second only to GAIN. At medium to high missing rates (0.4&#x2013;0.9), it achieved the best performance on all three metrics, significantly outperforming the other methods and demonstrating strong robustness in high-missing-rate scenarios. The model also exhibited favorable <inline-formula id="ieqn-51"><mml:math id="mml-ieqn-51"><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> values, indicating that the imputed data closely track the ground truth. A comparison of recovery times further shows that the model is not only accurate but also computationally efficient.</p>

<p>The imputation results of the SMD dataset across a range of missing rates are shown in <xref ref-type="table" rid="table-9">Tables 9</xref>&#x2013;<xref ref-type="table" rid="table-13">13</xref>. The time series after imputation for a missing rate of 0.5 is shown in <xref ref-type="fig" rid="fig-7">Fig. 7</xref>.</p>
<table-wrap id="table-9">
<label>Table 9</label>
<caption>
<title>RMSE of SMD under different missing rate level</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/> </colgroup>
<thead>
<tr>
<th align="center" colspan="10">Method</th>
</tr>
<tr>
<th>Missing rate</th>
<th>0.1</th>
<th>0.2</th>
<th>0.3</th>
<th>0.4</th>
<th>0.5</th>
<th>0.6</th>
<th>0.7</th>
<th>0.8</th>
<th>0.9</th>
</tr>
</thead>
<tbody>
<tr>
<td>BAC-GAN</td>
<td>0.1363</td>
<td>0.1371</td>
<td>0.1378</td>
<td>0.1390</td>
<td>0.1436</td>
<td>0.1438</td>
<td>0.1542</td>
<td>0.1597</td>
<td>0.1627</td>
</tr>
<tr>
<td>GAIN</td>
<td>0.1064</td>
<td>0.1109</td>
<td>0.1292</td>
<td>0.1783</td>
<td>0.1820</td>
<td>0.1850</td>
<td>0.1968</td>
<td>0.2081</td>
<td>0.2087</td>
</tr>
<tr>
<td>VAE</td>
<td>0.2210</td>
<td>0.2211</td>
<td>0.2297</td>
<td>0.2326</td>
<td>0.2349</td>
<td>0.2370</td>
<td>0.2393</td>
<td>0.2421</td>
<td>0.2471</td>
</tr>
<tr>
<td>KNN</td>
<td>0.1733</td>
<td>0.2182</td>
<td>0.2681</td>
<td>0.2689</td>
<td>0.2736</td>
<td>0.2769</td>
<td>0.2820</td>
<td>0.2949</td>
<td>0.2968</td>
</tr>
<tr>
<td>MRNN</td>
<td>0.1295</td>
<td>0.1945</td>
<td>0.2195</td>
<td>0.2752</td>
<td>0.2870</td>
<td>0.2901</td>
<td>0.2993</td>
<td>0.3017</td>
<td>0.3130</td>
</tr>
<tr>
<td>MIVAE</td>
<td>0.1414</td>
<td>0.1517</td>
<td>0.1774</td>
<td>0.1801</td>
<td>0.1921</td>
<td>0.2017</td>
<td>0.2114</td>
<td>0.2152</td>
<td>0.2314</td>
</tr>
</tbody>
</table>
</table-wrap><table-wrap id="table-10">
<label>Table 10</label>
<caption>
<title>MSE of SMD under different missing rate levels</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/> </colgroup>
<thead>
<tr>
<th align="center" colspan="10">Method</th>
</tr>
<tr>
<th>Missing rate</th>
<th>0.1</th>
<th>0.2</th>
<th>0.3</th>
<th>0.4</th>
<th>0.5</th>
<th>0.6</th>
<th>0.7</th>
<th>0.8</th>
<th>0.9</th>
</tr>
</thead>
<tbody>
<tr>
<td>BAC-GAN</td>
<td>0.0186</td>
<td>0.0188</td>
<td>0.0190</td>
<td>0.0193</td>
<td>0.0206</td>
<td>0.0207</td>
<td>0.0238</td>
<td>0.0255</td>
<td>0.0265</td>
</tr>
<tr>
<td>GAIN</td>
<td>0.0113</td>
<td>0.0123</td>
<td>0.0167</td>
<td>0.0318</td>
<td>0.0332</td>
<td>0.0342</td>
<td>0.0387</td>
<td>0.0433</td>
<td>0.0436</td>
</tr>
<tr>
<td>VAE</td>
<td>0.0480</td>
<td>0.0489</td>
<td>0.0528</td>
<td>0.0541</td>
<td>0.0552</td>
<td>0.0561</td>
<td>0.0573</td>
<td>0.0586</td>
<td>0.0611</td>
</tr>
<tr>
<td>KNN</td>
<td>0.0300</td>
<td>0.0476</td>
<td>0.0719</td>
<td>0.0723</td>
<td>0.0749</td>
<td>0.0767</td>
<td>0.0795</td>
<td>0.0869</td>
<td>0.0881</td>
</tr>
<tr>
<td>MRNN</td>
<td>0.0168</td>
<td>0.0378</td>
<td>0.0482</td>
<td>0.0752</td>
<td>0.0823</td>
<td>0.0846</td>
<td>0.0896</td>
<td>0.0910</td>
<td>0.0979</td>
</tr>
<tr>
<td>MIVAE</td>
<td>0.0200</td>
<td>0.0230</td>
<td>0.0315</td>
<td>0.0324</td>
<td>0.0369</td>
<td>0.0404</td>
<td>0.0447</td>
<td>0.0463</td>
<td>0.0535</td>
</tr>
</tbody>
</table>
</table-wrap><table-wrap id="table-11">
<label>Table 11</label>
<caption>
<title>MAE of SMD under different missing rate levels</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/> </colgroup>
<thead>
<tr>
<th align="center" colspan="10">Method</th>
</tr>
<tr>
<th>Missing rate</th>
<th>0.1</th>
<th>0.2</th>
<th>0.3</th>
<th>0.4</th>
<th>0.5</th>
<th>0.6</th>
<th>0.7</th>
<th>0.8</th>
<th>0.9</th>
</tr>
</thead>
<tbody>
<tr>
<td>BAC-GAN</td>
<td>0.1014</td>
<td>0.1071</td>
<td>0.1160</td>
<td>0.1173</td>
<td>0.1207</td>
<td>0.1278</td>
<td>0.1348</td>
<td>0.1379</td>
<td>0.1390</td>
</tr>
<tr>
<td>GAIN</td>
<td>0.0780</td>
<td>0.0810</td>
<td>0.0989</td>
<td>0.1352</td>
<td>0.1393</td>
<td>0.1418</td>
<td>0.1567</td>
<td>0.1556</td>
<td>0.1563</td>
</tr>
<tr>
<td>VAE</td>
<td>0.1860</td>
<td>0.1863</td>
<td>0.2032</td>
<td>0.2064</td>
<td>0.2089</td>
<td>0.2112</td>
<td>0.2138</td>
<td>0.2168</td>
<td>0.2221</td>
</tr>
<tr>
<td>KNN</td>
<td>0.1374</td>
<td>0.1731</td>
<td>0.2133</td>
<td>0.2138</td>
<td>0.2175</td>
<td>0.2200</td>
<td>0.2238</td>
<td>0.2336</td>
<td>0.2353</td>
</tr>
<tr>
<td>MRNN</td>
<td>0.1001</td>
<td>0.1602</td>
<td>0.1835</td>
<td>0.2339</td>
<td>0.2471</td>
<td>0.2666</td>
<td>0.2881</td>
<td>0.2922</td>
<td>0.3001</td>
</tr>
<tr>
<td>MIVAE</td>
<td>0.1101</td>
<td>0.1317</td>
<td>0.1379</td>
<td>0.1401</td>
<td>0.1497</td>
<td>0.1501</td>
<td>0.1697</td>
<td>0.1897</td>
<td>0.2014</td>
</tr>
</tbody>
</table>
</table-wrap><table-wrap id="table-12">
<label>Table 12</label>
<caption>
<title><inline-formula id="ieqn-52"><mml:math id="mml-ieqn-52"><mml:msup><mml:mi>R</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> of SMD under different missing rate levels</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/> </colgroup>
<thead>
<tr>
<th align="center" colspan="10">Method</th>
</tr>
<tr>
<th>Missing rate</th>
<th>0.1</th>
<th>0.2</th>
<th>0.3</th>
<th>0.4</th>
<th>0.5</th>
<th>0.6</th>
<th>0.7</th>
<th>0.8</th>
<th>0.9</th>
</tr>
</thead>
<tbody>
<tr>
<td>BAC-GAN</td>
<td>0.9503</td>
<td>0.9400</td>
<td>0.9401</td>
<td>0.9317</td>
<td>0.9256</td>
<td>0.9220</td>
<td>0.9201</td>
<td>0.9129</td>
<td>0.9089</td>
</tr>
<tr>
<td>GAIN</td>
<td>0.9274</td>
<td>0.9117</td>
<td>0.9008</td>
<td>0.9017</td>
<td>0.8994</td>
<td>0.8954</td>
<td>0.8845</td>
<td>0.8514</td>
<td>0.8102</td>
</tr>
<tr>
<td>VAE</td>
<td>0.7895</td>
<td>0.7785</td>
<td>0.7565</td>
<td>0.7486</td>
<td>0.7341</td>
<td>0.7214</td>
<td>0.7100</td>
<td>0.6458</td>
<td>0.6130</td>
</tr>
<tr>
<td>KNN</td>
<td>0.6624</td>
<td>0.5618</td>
<td>0.5012</td>
<td>0.4512</td>
<td>0.4529</td>
<td>0.4408</td>
<td>0.3917</td>
<td>0.3879</td>
<td>0.3540</td>
</tr>
<tr>
<td>MRNN</td>
<td>0.8971</td>
<td>0.8814</td>
<td>0.8543</td>
<td>0.8241</td>
<td>0.8214</td>
<td>0.8107</td>
<td>0.8047</td>
<td>0.7778</td>
<td>0.7521</td>
</tr>
<tr>
<td>MIVAE</td>
<td>0.9101</td>
<td>0.9087</td>
<td>0.9065</td>
<td>0.8958</td>
<td>0.8914</td>
<td>0.8814</td>
<td>0.8797</td>
<td>0.8701</td>
<td>0.8696</td>
</tr>
</tbody>
</table>
</table-wrap><table-wrap id="table-13">
<label>Table 13</label>
<caption>
<title>Recovery time of SMD under different missing rate levels (Unit: ms)</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/> </colgroup>
<thead>
<tr>
<th align="center" colspan="10">Method</th>
</tr>
<tr>
<th>Missing rate</th>
<th>0.1</th>
<th>0.2</th>
<th>0.3</th>
<th>0.4</th>
<th>0.5</th>
<th>0.6</th>
<th>0.7</th>
<th>0.8</th>
<th>0.9</th>
</tr>
</thead>
<tbody>
<tr>
<td>BAC-GAN</td>
<td>402</td>
<td>554</td>
<td>689</td>
<td>898</td>
<td>1201</td>
<td>1388</td>
<td>1590</td>
<td>1958</td>
<td>2164</td>
</tr>
<tr>
<td>GAIN</td>
<td>356</td>
<td>417</td>
<td>598</td>
<td>752</td>
<td>1024</td>
<td>1411</td>
<td>1741</td>
<td>2378</td>
<td>2575</td>
</tr>
<tr>
<td>VAE</td>
<td>312</td>
<td>401</td>
<td>498</td>
<td>665</td>
<td>997</td>
<td>1347</td>
<td>1607</td>
<td>2187</td>
<td>2413</td>
</tr>
<tr>
<td>KNN</td>
<td>845</td>
<td>1054</td>
<td>1254</td>
<td>1423</td>
<td>1665</td>
<td>1998</td>
<td>2385</td>
<td>2798</td>
<td>3041</td>
</tr>
<tr>
<td>MRNN</td>
<td>498</td>
<td>654</td>
<td>758</td>
<td>994</td>
<td>1474</td>
<td>1554</td>
<td>1745</td>
<td>2063</td>
<td>2369</td>
</tr>
<tr>
<td>MIVAE</td>
<td>641</td>
<td>875</td>
<td>956</td>
<td>1001</td>
<td>1417</td>
<td>1697</td>
<td>1956</td>
<td>2407</td>
<td>2598</td>
</tr>
</tbody>
</table>
</table-wrap><fig id="fig-7">
<label>Figure 7</label>
<caption>
<title>Time series plot of SMD after imputation (Missing rate &#x003D; 0.5)</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_72777-fig-7.tif"/>
</fig>
<p>The proposed model demonstrates outstanding performance across all missing-rate scenarios, with its superiority being particularly pronounced at medium to high missing rates (0.4&#x2013;0.9). Under these conditions, the model significantly outperforms the comparative methods in imputation accuracy, robustness, and the ability to capture complex temporal dependencies. Meanwhile, BAC-GAN also exhibits excellent recovery-time efficiency, maintaining fast response capabilities even under high missing-rate conditions, which highlights its applicability in practical situations where high proportions of data loss commonly occur.</p>

<p>Based on the four evaluation metrics mentioned above, the proposed model demonstrates excellent recovery performance under most missing-rate scenarios. The key to its success lies in the fact that BAC-GAN integrates BiLSTM and a multi-head attention mechanism, which effectively captures complex dependencies in time-series data and enhances the learning of global relationships. Meanwhile, the incorporation of CNN improves the model&#x2019;s ability to handle high-dimensional data and strengthens local feature extraction, thereby increasing the accuracy and robustness of data recovery.</p>
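As a rough illustration of the architecture described above, the generator (BiLSTM followed by multi-head attention) and the CNN-based discriminator can be sketched in PyTorch. This is a sketch under stated assumptions, not the paper's configuration: the layer sizes, the mask-concatenation conditioning, and the per-entry discriminator output are illustrative choices:

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """BiLSTM + multi-head attention generator (illustrative sizes)."""
    def __init__(self, n_features, hidden=64, heads=4):
        super().__init__()
        # input at each step: observed values concatenated with the mask
        self.bilstm = nn.LSTM(n_features * 2, hidden, batch_first=True,
                              bidirectional=True)
        self.attn = nn.MultiheadAttention(2 * hidden, heads, batch_first=True)
        self.out = nn.Linear(2 * hidden, n_features)

    def forward(self, x, mask):
        # mask: 1 = observed, 0 = missing (assumed convention)
        h, _ = self.bilstm(torch.cat([x * mask, mask], dim=-1))
        h, _ = self.attn(h, h, h)            # global temporal dependencies
        x_hat = torch.sigmoid(self.out(h))   # reconstruction in [0, 1]
        # keep observed entries, fill missing ones with generator output
        return x * mask + x_hat * (1 - mask)

class Discriminator(nn.Module):
    """1-D CNN discriminator scoring each entry (illustrative sizes)."""
    def __init__(self, n_features, channels=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(n_features, channels, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.Conv1d(channels, n_features, kernel_size=3, padding=1),
        )

    def forward(self, x):
        # (batch, time, features) -> per-entry "observed" probability
        return torch.sigmoid(self.net(x.transpose(1, 2)).transpose(1, 2))
```

The bidirectional LSTM propagates context from both temporal directions, the attention layer mixes information across all time steps, and the convolutional discriminator judges local windows of the imputed series, matching the division of labor described in the text.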
</sec>
<sec id="s4_5">
<label>4.5</label>
<title>Ablation Study</title>
<p>To validate the effectiveness of individual components in our proposed BAC-GAN framework, we conducted comprehensive ablation studies under various missing rate scenarios (0.2, 0.4, 0.6, 0.8). As demonstrated in <xref ref-type="table" rid="table-14">Tables 14</xref> and <xref ref-type="table" rid="table-15">15</xref>, the complete BAC-GAN model consistently achieves superior or competitive performance across all missing rates, particularly excelling at higher missing scenarios (0.6&#x2013;0.8).</p>
<table-wrap id="table-14">
<label>Table 14</label>
<caption>
<title>Ablation study results on RLD dataset under different missing rates</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/> </colgroup>
<thead>
<tr>
<th>Method</th>
<th colspan="2">0.2</th>
<th colspan="2">0.4</th>
<th colspan="2">0.6</th>
<th colspan="2">0.8</th>
</tr>
<tr>
<th></th>
<th>MSE</th>
<th>MAE</th>
<th>MSE</th>
<th>MAE</th>
<th>MSE</th>
<th>MAE</th>
<th>MSE</th>
<th>MAE</th>
</tr>
</thead>
<tbody>
<tr>
<td>BAC-GAN</td>
<td>0.0174</td>
<td>0.1144</td>
<td>0.0200</td>
<td>0.1195</td>
<td>0.0260</td>
<td>0.1382</td>
<td>0.0351</td>
<td>0.1654</td>
</tr>
<tr>
<td>B-GAN</td>
<td>0.0180</td>
<td>0.1230</td>
<td>0.0264</td>
<td>0.1264</td>
<td>0.0310</td>
<td>0.1614</td>
<td>0.0794</td>
<td>0.2497</td>
</tr>
<tr>
<td>BC-GAN</td>
<td>0.0178</td>
<td>0.1149</td>
<td>0.0213</td>
<td>0.1201</td>
<td>0.0278</td>
<td>0.1397</td>
<td>0.0456</td>
<td>0.1756</td>
</tr>
<tr>
<td>BA-GAN</td>
<td>0.0183</td>
<td>0.1201</td>
<td>0.0245</td>
<td>0.1214</td>
<td>0.0281</td>
<td>0.1412</td>
<td>0.0642</td>
<td>0.1879</td>
</tr>
<tr>
<td>A-GAN</td>
<td>0.0190</td>
<td>0.1268</td>
<td>0.0287</td>
<td>0.1307</td>
<td>0.0541</td>
<td>0.1798</td>
<td>0.1014</td>
<td>0.2789</td>
</tr>
<tr>
<td>AC-GAN</td>
<td>0.0109</td>
<td>0.0798</td>
<td>0.0254</td>
<td>0.1256</td>
<td>0.0297</td>
<td>0.1498</td>
<td>0.0754</td>
<td>0.2014</td>
</tr>
<tr>
<td>C-GAN</td>
<td>0.0186</td>
<td>0.1245</td>
<td>0.0275</td>
<td>0.1301</td>
<td>0.0419</td>
<td>0.1697</td>
<td>0.0851</td>
<td>0.2596</td>
</tr>
<tr>
<td>GAN</td>
<td>0.0117</td>
<td>0.0825</td>
<td>0.0306</td>
<td>0.1328</td>
<td>0.0602</td>
<td>0.1847</td>
<td>0.1136</td>
<td>0.2936</td>
</tr>
</tbody>
</table>
</table-wrap><table-wrap id="table-15">
<label>Table 15</label>
<caption>
<title>Ablation study results on SMD dataset under different missing rates</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/> </colgroup>
<thead>
<tr>
<th rowspan="2" align="center">Method</th>
<th colspan="2">0.2</th>
<th colspan="2">0.4</th>
<th colspan="2">0.6</th>
<th colspan="2">0.8</th>
</tr>
<tr>
<th>MSE</th>
<th>MAE</th>
<th>MSE</th>
<th>MAE</th>
<th>MSE</th>
<th>MAE</th>
<th>MSE</th>
<th>MAE</th>
</tr>
</thead>
<tbody>
<tr>
<td>BAC-GAN</td>
<td>0.0188</td>
<td>0.1071</td>
<td>0.0193</td>
<td>0.1173</td>
<td>0.0207</td>
<td>0.1278</td>
<td>0.0255</td>
<td>0.1379</td>
</tr>
<tr>
<td>B-GAN</td>
<td>0.0193</td>
<td>0.1079</td>
<td>0.0259</td>
<td>0.1248</td>
<td>0.0265</td>
<td>0.1345</td>
<td>0.0312</td>
<td>0.1511</td>
</tr>
<tr>
<td>BC-GAN</td>
<td>0.0134</td>
<td>0.0910</td>
<td>0.0199</td>
<td>0.1189</td>
<td>0.0219</td>
<td>0.1298</td>
<td>0.0261</td>
<td>0.1412</td>
</tr>
<tr>
<td>BA-GAN</td>
<td>0.0156</td>
<td>0.0970</td>
<td>0.0208</td>
<td>0.1197</td>
<td>0.0232</td>
<td>0.1307</td>
<td>0.0278</td>
<td>0.1489</td>
</tr>
<tr>
<td>A-GAN</td>
<td>0.0201</td>
<td>0.1175</td>
<td>0.0304</td>
<td>0.1304</td>
<td>0.0327</td>
<td>0.1401</td>
<td>0.0412</td>
<td>0.1543</td>
</tr>
<tr>
<td>AC-GAN</td>
<td>0.0178</td>
<td>0.1041</td>
<td>0.0216</td>
<td>0.1211</td>
<td>0.0242</td>
<td>0.1316</td>
<td>0.0287</td>
<td>0.1501</td>
</tr>
<tr>
<td>C-GAN</td>
<td>0.0197</td>
<td>0.1085</td>
<td>0.0274</td>
<td>0.1297</td>
<td>0.0298</td>
<td>0.1365</td>
<td>0.0387</td>
<td>0.1525</td>
</tr>
<tr>
<td>GAN</td>
<td>0.0123</td>
<td>0.0810</td>
<td>0.0318</td>
<td>0.1352</td>
<td>0.0342</td>
<td>0.1418</td>
<td>0.0433</td>
<td>0.1556</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The performance degradation observed in the models with only a subset of the components reveals the complementary nature of the modules designed in this work. The BA-GAN variant delivers consistently intermediate performance, indicating that the attention mechanism contributes significantly to feature extraction but requires the CNN component to achieve optimal results. The A-GAN and C-GAN models exhibit relatively weaker performance, confirming that individual components alone are insufficient to handle complex missing-data patterns.</p>
<p>Furthermore, the progressive performance decline from the complete BAC-GAN to reduced variants such as BC-GAN highlights the challenging nature of high missing-rate scenarios and validates the robustness of the proposed model. The complete BAC-GAN model maintains the most stable performance trajectory, exhibiting the smallest performance degradation as the missing rate increases from 0.2 to 0.8. These findings collectively demonstrate that the synergistic integration of all components in BAC-GAN is crucial for achieving robust performance across diverse missing-data scenarios, with each element playing a distinct yet interdependent role in the overall architecture.</p>
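The stability claim can be checked directly against the reported numbers: computing the relative MSE growth between missing rates 0.2 and 0.8 for each variant, using the RLD values from Table 14, shows BAC-GAN degrading least. A small Python check (values copied from the table):

```python
# MSE at missing rates 0.2 and 0.8, taken from Table 14 (RLD dataset)
mse_rld = {
    "BAC-GAN": (0.0174, 0.0351),
    "B-GAN":   (0.0180, 0.0794),
    "BC-GAN":  (0.0178, 0.0456),
    "BA-GAN":  (0.0183, 0.0642),
    "A-GAN":   (0.0190, 0.1014),
    "AC-GAN":  (0.0109, 0.0754),
    "C-GAN":   (0.0186, 0.0851),
    "GAN":     (0.0117, 0.1136),
}

# relative MSE growth factor as the missing rate rises from 0.2 to 0.8
degradation = {m: hi / lo for m, (lo, hi) in mse_rld.items()}
best = min(degradation, key=degradation.get)
# BAC-GAN shows the smallest relative growth (~2.0x, vs up to ~9.7x for GAN)
```

Note that variants such as AC-GAN and the plain GAN start from a lower MSE at the 0.2 missing rate but deteriorate far faster, which is exactly the trade-off the ablation is meant to expose.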
</sec>
</sec>
<sec id="s5">
<label>5</label>
<title>Conclusion</title>
<p>With the continuous development of power systems, the complexity and diversity of power data have been steadily increasing. Due to factors such as sensor failures and communication interruptions, missing or anomalous data frequently occur in power systems, posing significant challenges to system monitoring and management. To address this issue, this paper proposes a data recovery method based on an improved Generative Adversarial Network (BAC-GAN). The generator of this method incorporates bidirectional LSTM and multi-head attention mechanisms to capture complex dependencies in time-series data, while the discriminator employs CNN to integrate local features with global structures, ensuring the rationality of data recovery. Without relying on complete datasets, this method achieves high-precision data recovery. Experiments conducted on three publicly available power system datasets demonstrate that the BAC-GAN model exhibits significant advantages in recovery accuracy compared to five state-of-the-art and classical data recovery methods, providing an effective solution for missing data recovery in power systems.</p>
<p>The study of nonlinear generation mechanisms for multidimensional data represents a critical direction for the future development of smart grids. Current generation models based on linear assumptions struggle to accurately capture the complex dynamic coupling characteristics among source-grid-load-storage components. This limitation is particularly evident in high-penetration renewable energy integration scenarios, where traditional methods fail to adequately characterize the spatiotemporal correlations of wind and solar power outputs and load response characteristics. Future research should focus on developing interpretable generation architectures that integrate domain knowledge, ensuring that the data generation process aligns with power system operational principles.</p>
<p>As the scale and complexity of power system data continue to grow, missing data recovery methods based on generative adversarial networks still hold broad prospects for development. On one hand, more efficient network architectures and training strategies can be explored to further enhance the recovery accuracy and generalization capability of the model. On the other hand, the integration of technologies such as edge computing and federated learning may enable efficient data recovery in distributed environments.</p>
<p>Furthermore, although this study does not delve deeply into multidimensional data fusion within power systems, the potential of GANs for handling multidimensional power data deserves further exploration. By integrating different types of power system data and generating high-quality recovery results, GANs could significantly improve the comprehensiveness and accuracy of data recovery. While this paper focuses on time-series data recovery, future work could extend the approach to multidimensional data fusion, further enhancing the performance and robustness of missing data recovery.</p>
</sec>
</body>
<back>
<ack>
<p>The authors acknowledge the support provided by the Advanced Ocean Institute of Southeast University, Nantong.</p>
</ack>
<sec>
<title>Funding Statement</title>
<p>This work was supported by the National Natural Science Foundation of China (No. 51977113) and the Science and Technology Project of State Grid Zhejiang Electric Power Co., Ltd. (No. 5211JX240001).</p>
</sec>
<sec>
<title>Author Contributions</title>
<p>Study conception and design: Su Zhang, Song Deng and Qingsheng Liu; data collection: Su Zhang, Song Deng and Qingsheng Liu; analysis and interpretation of results: Su Zhang, Song Deng and Qingsheng Liu; draft manuscript preparation: Su Zhang, Song Deng and Qingsheng Liu. All authors reviewed the results and approved the final version of the manuscript.</p>
</sec>
<sec sec-type="data-availability">
<title>Availability of Data and Materials</title>
<p>The authors confirm that the data supporting the findings of this study are available within the article.</p>
</sec>
<sec>
<title>Ethics Approval</title>
<p>Not applicable.</p>
</sec>
<sec sec-type="COI-statement">
<title>Conflicts of Interest</title>
<p>The authors declare no conflicts of interest to report regarding the present study.</p>
</sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Cheng</surname> <given-names>X</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Jara</surname> <given-names>A</given-names></string-name>, <string-name><surname>Song</surname> <given-names>H</given-names></string-name>, <string-name><surname>Tian</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>Big data and knowledge extraction for cyber-physical systems</article-title>. <source>Int J Distrib Sens Netw</source>. <year>2015</year>;<volume>11</volume>(<issue>9</issue>):<fpage>231527</fpage>. doi:<pub-id pub-id-type="doi">10.1155/2015/231527</pub-id>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Baccarelli</surname> <given-names>E</given-names></string-name>, <string-name><surname>Cordeschi</surname> <given-names>N</given-names></string-name>, <string-name><surname>Mei</surname> <given-names>A</given-names></string-name>, <string-name><surname>Panella</surname> <given-names>M</given-names></string-name>, <string-name><surname>Shojafar</surname> <given-names>M</given-names></string-name>, <string-name><surname>Stefa</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Energy-efficient dynamic traffic offloading and reconfiguration of networked data centers for big data stream mobile computing: review, challenges, and a case study</article-title>. <source>IEEE Netw</source>. <year>2016</year>;<volume>30</volume>(<issue>2</issue>):<fpage>54</fpage>&#x2013;<lpage>61</lpage>. doi:<pub-id pub-id-type="doi">10.1109/MNET.2016.7437025</pub-id>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wang</surname> <given-names>K</given-names></string-name>, <string-name><surname>Li</surname> <given-names>H</given-names></string-name>, <string-name><surname>Feng</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Tian</surname> <given-names>G</given-names></string-name></person-group>. <article-title>Big data analytics for system stability evaluation strategy in the energy Internet</article-title>. <source>IEEE Trans Ind Inform</source>. <year>2017</year>;<volume>13</volume>(<issue>4</issue>):<fpage>1969</fpage>&#x2013;<lpage>78</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TII.2017.2692775</pub-id>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Fang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Misra</surname> <given-names>S</given-names></string-name>, <string-name><surname>Xue</surname> <given-names>G</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>D</given-names></string-name></person-group>. <article-title>Smart grid&#x2014;the new and improved power grid: a survey</article-title>. <source>IEEE Commun Surv Tutor</source>. <year>2011</year>;<volume>14</volume>(<issue>4</issue>):<fpage>944</fpage>&#x2013;<lpage>80</lpage>. doi:<pub-id pub-id-type="doi">10.1109/SURV.2011.101911.00087</pub-id>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wang</surname> <given-names>MC</given-names></string-name>, <string-name><surname>Tsai</surname> <given-names>CF</given-names></string-name>, <string-name><surname>Lin</surname> <given-names>WC</given-names></string-name></person-group>. <article-title>Towards missing electric power data imputation for energy management systems</article-title>. <source>Expert Syst Appl</source>. <year>2021</year>;<volume>174</volume>(<issue>1</issue>):<fpage>114743</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.eswa.2021.114743</pub-id>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Mohammadi</surname> <given-names>M</given-names></string-name>, <string-name><surname>Kavousi-Fard</surname> <given-names>A</given-names></string-name>, <string-name><surname>Dabbaghjamanesh</surname> <given-names>M</given-names></string-name>, <string-name><surname>Farughian</surname> <given-names>A</given-names></string-name>, <string-name><surname>Khosravi</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Effective management of energy internet in renewable hybrid microgrids: a secured data driven resilient architecture</article-title>. <source>IEEE Trans Ind Inform</source>. <year>2021</year>;<volume>18</volume>(<issue>3</issue>):<fpage>1896</fpage>&#x2013;<lpage>904</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TII.2021.3081683</pub-id>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Do</surname> <given-names>V</given-names></string-name>, <string-name><surname>McBrien</surname> <given-names>H</given-names></string-name>, <string-name><surname>Flores</surname> <given-names>NM</given-names></string-name>, <string-name><surname>Northrop</surname> <given-names>AJ</given-names></string-name>, <string-name><surname>Schlegelmilch</surname> <given-names>J</given-names></string-name>, <string-name><surname>Kiang</surname> <given-names>MV</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Spatiotemporal distribution of power outages with climate events and social vulnerability in the USA</article-title>. <source>Nat Commun</source>. <year>2023</year>;<volume>14</volume>(<issue>1</issue>):<fpage>2470</fpage>. doi:<pub-id pub-id-type="doi">10.1038/s41467-023-38084-6</pub-id>; <pub-id pub-id-type="pmid">37120649</pub-id></mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Yuan</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>C</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>C</given-names></string-name>, <string-name><surname>Ji</surname> <given-names>J</given-names></string-name>, <string-name><surname>Feng</surname> <given-names>C</given-names></string-name></person-group>. <article-title>Linear interpolation process and its influence on the secondary equipment in substations</article-title>. In: <conf-name>2017 China International Electrical and Energy Conference (CIEEC); 2017 Oct 25&#x2013;27</conf-name>; <publisher-loc>Beijing, China</publisher-loc>. p. <fpage>205</fpage>&#x2013;<lpage>9</lpage>. doi:<pub-id pub-id-type="doi">10.1109/CIEEC.2017.8388447</pub-id>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Liu</surname> <given-names>T</given-names></string-name>, <string-name><surname>Wei</surname> <given-names>H</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>K</given-names></string-name></person-group>. <article-title>Wind power prediction with missing data using Gaussian process regression and multiple imputation</article-title>. <source>Appl Soft Comput</source>. <year>2018</year>;<volume>71</volume>:<fpage>905</fpage>&#x2013;<lpage>16</lpage>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Van Buuren</surname> <given-names>S</given-names></string-name>, <string-name><surname>Groothuis-Oudshoorn</surname> <given-names>K</given-names></string-name></person-group>. <article-title>mice: multivariate imputation by chained equations in R</article-title>. <source>J Stat Softw</source>. <year>2011</year>;<volume>45</volume>:<fpage>1</fpage>&#x2013;<lpage>67</lpage>. doi:<pub-id pub-id-type="doi">10.18637/jss.v045.i03</pub-id>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Broersen</surname> <given-names>PMT</given-names></string-name>, <string-name><surname>Bos</surname> <given-names>R</given-names></string-name></person-group>. <article-title>Time-series analysis if data are randomly missing</article-title>. <source>IEEE Trans Instrum Meas</source>. <year>2006</year>;<volume>55</volume>(<issue>1</issue>):<fpage>79</fpage>&#x2013;<lpage>84</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TIM.2005.861247</pub-id>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Box</surname> <given-names>GEP</given-names></string-name>, <string-name><surname>Jenkins</surname> <given-names>GM</given-names></string-name>, <string-name><surname>Reinsel</surname> <given-names>GC</given-names></string-name></person-group>. <source>Time series analysis: forecasting and control</source>. <publisher-loc>Hoboken, NJ, USA</publisher-loc>: <publisher-name>John Wiley and Sons</publisher-name>; <year>2015</year>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>B</given-names></string-name>, <string-name><surname>Lin</surname> <given-names>YP</given-names></string-name>, <string-name><surname>Zhou</surname> <given-names>SW</given-names></string-name>, <string-name><surname>Luo</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Yin</surname> <given-names>B</given-names></string-name></person-group>. <article-title>An interpolation algorithm based on sliding neighborhood in wireless sensor networks</article-title>. <source>J Comput Res Dev</source>. <year>2012</year>;<volume>49</volume>(<issue>6</issue>):<fpage>1196</fpage>&#x2013;<lpage>203</lpage>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Poloczek</surname> <given-names>J</given-names></string-name>, <string-name><surname>Treiber</surname> <given-names>NA</given-names></string-name>, <string-name><surname>Kramer</surname> <given-names>O</given-names></string-name></person-group>. <article-title>KNN regression as geo-imputation method for spatio-temporal wind data</article-title>. In: <conf-name>Proceedings of International Joint Conference SOCO&#x2019;14-CISIS&#x2019;14-ICEUTE&#x2019;14; 2014 Jun 25&#x2013;27</conf-name>; <publisher-loc>Bilbao, Spain. Cham, Switzerland</publisher-loc>: <publisher-name>Springer International Publishing</publisher-name>; <year>2014</year>. p. <fpage>185</fpage>&#x2013;<lpage>93</lpage>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Su</surname> <given-names>T</given-names></string-name>, <string-name><surname>Shi</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Yu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Yue</surname> <given-names>C</given-names></string-name>, <string-name><surname>Zhou</surname> <given-names>F</given-names></string-name></person-group>. <article-title>Nonlinear compensation algorithm for multidimensional temporal data: a missing value imputation for the power grid applications</article-title>. <source>Knowl Based Syst</source>. <year>2021</year>;<volume>215</volume>:<fpage>106743</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.knosys.2021.106743</pub-id>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Stekhoven</surname> <given-names>DJ</given-names></string-name>, <string-name><surname>B&#x00FC;hlmann</surname> <given-names>P</given-names></string-name></person-group>. <article-title>MissForest&#x2014;non-parametric missing value imputation for mixed-type data</article-title>. <source>Bioinformatics</source>. <year>2012</year>;<volume>28</volume>(<issue>1</issue>):<fpage>112</fpage>&#x2013;<lpage>8</lpage>. doi:<pub-id pub-id-type="doi">10.1093/bioinformatics/btr597</pub-id>; <pub-id pub-id-type="pmid">22039212</pub-id></mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Yoon</surname> <given-names>J</given-names></string-name>, <string-name><surname>Zame</surname> <given-names>WR</given-names></string-name>, <string-name><surname>Van Der Schaar</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Estimating missing data in temporal data streams using multi-directional recurrent neural networks</article-title>. <source>IEEE Trans Biomed Eng</source>. <year>2018</year>;<volume>66</volume>(<issue>5</issue>):<fpage>1477</fpage>&#x2013;<lpage>90</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TBME.2018.2874712</pub-id>; <pub-id pub-id-type="pmid">30296210</pub-id></mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Che</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Purushotham</surname> <given-names>S</given-names></string-name>, <string-name><surname>Cho</surname> <given-names>K</given-names></string-name>, <string-name><surname>Sontag</surname> <given-names>D</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>Recurrent neural networks for multivariate time series with missing values</article-title>. <source>Sci Rep</source>. <year>2018</year>;<volume>8</volume>(<issue>1</issue>):<fpage>6085</fpage>. doi:<pub-id pub-id-type="doi">10.1038/s41598-018-24271-9</pub-id>; <pub-id pub-id-type="pmid">29666385</pub-id></mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Marisca</surname> <given-names>I</given-names></string-name>, <string-name><surname>Cini</surname> <given-names>A</given-names></string-name>, <string-name><surname>Alippi</surname> <given-names>C</given-names></string-name></person-group>. <article-title>Learning to reconstruct missing data from spatiotemporal graphs with sparse observations</article-title>. <source>Adv Neural Inf Process Syst</source>. <year>2022</year>;<volume>35</volume>:<fpage>32069</fpage>&#x2013;<lpage>82</lpage>. doi:<pub-id pub-id-type="doi">10.48550/arXiv.2205.13479</pub-id>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Tashiro</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Song</surname> <given-names>J</given-names></string-name>, <string-name><surname>Song</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Ermon</surname> <given-names>S</given-names></string-name></person-group>. <article-title>CSDI: conditional score-based diffusion models for probabilistic time series imputation</article-title>. <source>Adv Neural Inf Process Syst</source>. <year>2021</year>;<volume>34</volume>:<fpage>24804</fpage>&#x2013;<lpage>16</lpage>. doi:<pub-id pub-id-type="doi">10.48550/arXiv.2107.03502</pub-id>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Mesquita</surname> <given-names>DPP</given-names></string-name>, <string-name><surname>Gomes</surname> <given-names>JPP</given-names></string-name>, <string-name><surname>Rodrigues</surname> <given-names>LR</given-names></string-name></person-group>. <article-title>Artificial neural networks with random weights for incomplete datasets</article-title>. <source>Neural Process Lett</source>. <year>2019</year>;<volume>50</volume>(<issue>3</issue>):<fpage>2345</fpage>&#x2013;<lpage>72</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s11063-019-10012-0</pub-id>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Jiang</surname> <given-names>B</given-names></string-name>, <string-name><surname>Siddiqi</surname> <given-names>MD</given-names></string-name>, <string-name><surname>Asadi</surname> <given-names>R</given-names></string-name>, <string-name><surname>Regan</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Imputation of missing traffic flow data using denoising autoencoders</article-title>. <source>Procedia Comput Sci</source>. <year>2021</year>;<volume>184</volume>:<fpage>84</fpage>&#x2013;<lpage>91</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.procs.2021.03.122</pub-id>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Yoon</surname> <given-names>J</given-names></string-name>, <string-name><surname>Jordon</surname> <given-names>J</given-names></string-name>, <string-name><surname>van der Schaar</surname> <given-names>M</given-names></string-name></person-group>. <article-title>GAIN: missing data imputation using generative adversarial nets</article-title>. In: <conf-name>Proceedings of the 35th International Conference on Machine Learning; 2018 Jul 10&#x2013;15</conf-name>; <publisher-loc>Stockholm, Sweden</publisher-loc>. p. <fpage>5689</fpage>&#x2013;<lpage>98</lpage>. doi:<pub-id pub-id-type="doi">10.48550/arXiv.1806.02920</pub-id>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>McCoy</surname> <given-names>JT</given-names></string-name>, <string-name><surname>Kroon</surname> <given-names>S</given-names></string-name>, <string-name><surname>Auret</surname> <given-names>L</given-names></string-name></person-group>. <article-title>Variational autoencoders for missing data imputation with application to a simulated milling circuit</article-title>. <source>IFAC-PapersOnLine</source>. <year>2018</year>;<volume>51</volume>(<issue>21</issue>):<fpage>141</fpage>&#x2013;<lpage>6</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.ifacol.2018.09.406</pub-id>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Ma</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Li</surname> <given-names>X</given-names></string-name>, <string-name><surname>Bai</surname> <given-names>M</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Ning</surname> <given-names>B</given-names></string-name>, <string-name><surname>Li</surname> <given-names>G</given-names></string-name></person-group>. <article-title>MIVAE: multiple imputation based on variational auto-encoder</article-title>. <source>Eng Appl Artif Intell</source>. <year>2023</year>;<volume>123</volume>:<fpage>106270</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.engappai.2023.106270</pub-id>.</mixed-citation></ref>
</ref-list>
</back></article>