<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">48008</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2024.048008</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>TSCND: Temporal Subsequence-Based Convolutional Network with Difference for Time Series Forecasting</article-title>
<alt-title alt-title-type="left-running-head">TSCND: Temporal Subsequence-based Convolutional Network with Difference for Time Series Forecasting</alt-title>
<alt-title alt-title-type="right-running-head">TSCND: Temporal Subsequence-based Convolutional Network with Difference for Time Series Forecasting</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author">
<name name-style="western"><surname>Huang</surname><given-names>Haoran</given-names></name></contrib>
<contrib id="author-2" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Chen</surname><given-names>Weiting</given-names></name><email>wtchen@sei.ecnu.edu.cn</email></contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western"><surname>Fan</surname><given-names>Zheming</given-names></name></contrib>
<aff><institution>MOE Research Center of Software/Hardware Co-Design Engineering, East China Normal University</institution>, <addr-line>Shanghai, 200062</addr-line>, <country>China</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Weiting Chen. Email: <email>wtchen@sei.ecnu.edu.cn</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2024</year></pub-date>
<pub-date date-type="pub" publication-format="electronic"><day>26</day>
<month>3</month>
<year>2024</year></pub-date>
<volume>78</volume>
<issue>3</issue>
<fpage>3665</fpage>
<lpage>3681</lpage>
<history>
<date date-type="received">
<day>24</day>
<month>11</month>
<year>2023</year>
</date>
<date date-type="accepted">
<day>11</day>
<month>1</month>
<year>2024</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2024 Huang, Chen and Fan</copyright-statement>
<copyright-year>2024</copyright-year>
<copyright-holder>Huang, Chen and Fan</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_48008.pdf"></self-uri>
<abstract>
<p>Time series forecasting plays an important role in various fields, such as energy, finance, transport, and weather. Temporal convolutional networks (TCNs) based on dilated causal convolution have been widely used in time series forecasting. However, two problems weaken the performance of TCNs. One is that in dilated causal convolution, causal convolution leads to the receptive fields of outputs being concentrated in the earlier part of the input sequence, whereas recent input information is severely lost. The other is that the distribution shift problem in time series has not been adequately solved. To address the first problem, we propose a subsequence-based dilated convolution method (SDC). By using multiple convolutional filters to convolve elements of neighboring subsequences, the method extracts temporal features from a growing receptive field via a growing subsequence rather than a single element. Ultimately, the receptive field of each output element can cover the whole input sequence. To address the second problem, we propose a difference and compensation method (DCM). The method reduces the discrepancies between and within the input sequences by difference operations and then compensates the outputs for the information lost due to difference operations. Based on SDC and DCM, we further construct a temporal subsequence-based convolutional network with difference (TSCND) for time series forecasting. The experimental results show that TSCND can reduce prediction mean squared error by 7.3% and save runtime, compared with state-of-the-art models and vanilla TCN.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Difference</kwd>
<kwd>data prediction</kwd>
<kwd>time series</kwd>
<kwd>temporal convolutional network</kwd>
<kwd>dilated convolution</kwd>
</kwd-group>
<funding-group>
<award-group id="awg1">
<funding-source>National Key Research and Development Program</funding-source>
<award-id>2018YFB2101300</award-id>
</award-group>
<award-group id="awg2">
<funding-source>National Natural Science Foundation</funding-source>
<award-id>61871186</award-id>
</award-group>
<award-group id="awg3">
<funding-source>Dean&#x2019;s Fund of Engineering Research Center of Software/Hardware Co-Design Technology and Application, Ministry of Education</funding-source>
</award-group>
</funding-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>Inferring the future state of data based on past information [<xref ref-type="bibr" rid="ref-1">1</xref>], time series forecasting can help make current decisions, and it has important applications in weather [<xref ref-type="bibr" rid="ref-2">2</xref>], transport [<xref ref-type="bibr" rid="ref-3">3</xref>], finance [<xref ref-type="bibr" rid="ref-4">4</xref>], healthcare [<xref ref-type="bibr" rid="ref-5">5</xref>], and energy [<xref ref-type="bibr" rid="ref-6">6</xref>]. In recent years, research on time series forecasting has evolved from traditional statistical methods and machine learning techniques to forecasting models based on deep learning [<xref ref-type="bibr" rid="ref-7">7</xref>,<xref ref-type="bibr" rid="ref-8">8</xref>].</p>
<p>As a common type of deep learning model for time series modeling, temporal convolutional networks (TCNs) have been widely used in current research on time series forecasting. When dealing with long sequences, TCNs are not as prone to gradient disappearance problems as recurrent neural networks (RNNs). Compared with transformers, TCNs have advantages in memory consumption [<xref ref-type="bibr" rid="ref-9">9</xref>] and do not use the permutation-invariant self-attention mechanism [<xref ref-type="bibr" rid="ref-10">10</xref>], which may lead to temporal information loss. These advantages make TCNs excellent modeling methods for time series forecasting.</p>
<p>Dilated causal convolution plays an important role in TCNs [<xref ref-type="bibr" rid="ref-11">11</xref>], and the structure of dilated causal convolution is shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>. It is essentially a pyramidal-like process of aggregating and learning sequence information. TCNs with dilated causal convolution have excellent memory capabilities [<xref ref-type="bibr" rid="ref-9">9</xref>].</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>Dilated causal convolution with dilation factors of <italic>d</italic> &#x003D; 1, 2, 4 and a filter size of <italic>k</italic> &#x003D; 2</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_48008-fig-1.tif"/>
</fig>
<p>However, for time series forecasting, it is unnecessary to employ causal convolution to prevent future information leakage into the past, as the input sequence is solely past information compared to the predicted sequence. Worse still, causal convolution leads to the following problems. On the one hand, in dilated causal convolution, the earlier an input element is located, the greater its effect on the output sequences. For example, in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>, <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> affects <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:msub><mml:mi>o</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>o</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>o</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, while <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> only affects the output <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:msub><mml:mi>o</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>. Nevertheless, the model should focus more on recent elements of the inputs, as recent elements are closer to the time series to be predicted. On the other hand, only the last element <inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:msub><mml:mi>o</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> can receive the information from the whole long sequence, and using this single element instead of a sequence to extract the whole input sequence information may result in serious information loss. 
Given these limitations of causal convolution for time series forecasting, we abandon causal convolution and design a new dilated convolution method. Since using a single convolutional filter per layer for dilated convolution tends to lose a large amount of information, we use multiple convolutional filters to generate different elements that share the receptive field, and we group these elements into subsequences. By continuously convolving different subsequences, each final output element comes to share a receptive field covering the entire input sequence. This is an improvement over dilated causal convolution, in which only the last output element captures the information of the entire input sequence.</p>
<p>In addition, the problem of distribution shift exists widely in time series, which significantly reduces the performance of forecasting models. The distribution shift problem is that the statistical properties (such as the mean) of the time series may change over time, which can lead to a distribution shift between training and test data. For example, in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>, there are significant differences in the time series across various time periods. Previous work mainly focused on eliminating discrepancies between the input sequences of the test and training datasets. However, in a long input sequence, the statistical properties vary considerably in different parts of the sequence; thus, the discrepancies within an input sequence also need to be addressed. Since differencing can help stabilize the mean of a time series and eliminate trend and seasonality, we use differencing to solve the above problem.</p>
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>The original time series and difference sequence in the ETTh1 dataset</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_48008-fig-2.tif"/>
</fig>
<p>The contributions of this paper are as follows:
<list list-type="bullet">
<list-item>
<p>To mitigate information loss in dilated convolution on long sequences, we propose a novel subsequence-based dilated convolution method (SDC). The method extracts temporal features from a receptive field via a growing subsequence, and a subsequence has a richer representation than a single element.</p></list-item>
<list-item>
<p>To break the limitation of dilated causal convolution on the receptive field, we use multiple convolution filters to generate elements that share a receptive field in SDC without causal convolution. As the elements of the shared receptive field increase, eventually, all output elements will be able to look back at the entire input sequence.</p></list-item>
<list-item>
<p>To alleviate the distribution shift in time series, we propose a difference and compensation method (DCM) to reduce the discrepancies between and within input sequences by difference operations. As shown in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>, the difference operation greatly reduces the variance of sequences across different time periods.</p></list-item>
<list-item>
<p>Based on SDC and DCM, we further construct a temporal subsequence-based convolutional network with difference (TSCND) for time series forecasting. Experimentally, compared with state-of-the-art methods and vanilla TCN, TSCND can reduce prediction mean squared error by 7.3% and save runtime. The results of the ablation experiments also demonstrate the effectiveness of SDC and DCM for time series forecasting.</p></list-item>
</list></p>
</sec>
<sec id="s2">
<label>2</label>
<title>Related Works</title>
<sec id="s2_1">
<label>2.1</label>
<title>Time Series Forecasting</title>
<p>Current time series forecasting methods can be divided into traditional statistics-based methods and deep learning-based methods.</p>
<p>Traditional statistics-based methods, such as the autoregressive integrated moving average (ARIMA) [<xref ref-type="bibr" rid="ref-12">12</xref>] and Kalman filter models [<xref ref-type="bibr" rid="ref-13">13</xref>], have been studied extensively in the past. However, these traditional statistics-based methods perform poorly on complex time series.</p>
<p>Deep learning can automatically learn and model the hidden features of complex series based on raw data and can achieve better forecasting accuracy on complex time series datasets. Therefore, more research is now based on deep learning methods.</p>
<p>Recurrent neural networks (RNNs) [<xref ref-type="bibr" rid="ref-14">14</xref>] play an important role in time series forecasting [<xref ref-type="bibr" rid="ref-15">15</xref>&#x2013;<xref ref-type="bibr" rid="ref-17">17</xref>]. However, when the given time series is long, much of the information contained earlier in the time series will be lost [<xref ref-type="bibr" rid="ref-18">18</xref>].</p>
<p>In recent years, transformer-based models [<xref ref-type="bibr" rid="ref-19">19</xref>] have been widely used for time series forecasting tasks [<xref ref-type="bibr" rid="ref-20">20</xref>]. Informer [<xref ref-type="bibr" rid="ref-21">21</xref>] reduces the computational complexity of the self-attention mechanism and performs well in long-term time series forecasting. Autoformer [<xref ref-type="bibr" rid="ref-22">22</xref>] uses autocorrelation attention and seasonal decomposition methods for model construction, reducing the required computational workload and improving forecasting accuracy. FEDformer [<xref ref-type="bibr" rid="ref-23">23</xref>] uses the seasonal trend decomposition method and frequency-enhanced attention. PatchTST [<xref ref-type="bibr" rid="ref-24">24</xref>] achieves SOTA accuracy on several long-term time series forecasting datasets based on channel independence and subseries-level patch input mechanism. However, when these transformer-based models deal with long sequence data, they always require considerable computational resources and memory. In addition, using self-attention mechanisms for time series modeling may lead to the loss of temporal information [<xref ref-type="bibr" rid="ref-10">10</xref>].</p>
<p>TCNs are also popular for time series forecasting tasks [<xref ref-type="bibr" rid="ref-25">25</xref>&#x2013;<xref ref-type="bibr" rid="ref-28">28</xref>]. MISO-TCN [<xref ref-type="bibr" rid="ref-29">29</xref>] is a lightweight and novel weather forecasting model based on TCN. TCN-CBAM [<xref ref-type="bibr" rid="ref-30">30</xref>] uses the convolutional block attention module for the prediction of chaotic time series. VMD-GRU-TCN [<xref ref-type="bibr" rid="ref-31">31</xref>] integrates variational modal decomposition, gated recurrent unit and TCN into a hybrid network for high and low frequency load forecasting. Veg-W2TCN [<xref ref-type="bibr" rid="ref-32">32</xref>] combines multi-resolution analysis wavelet transform and TCN for vegetation change forecasting. TCNs use dilated causal convolution in the temporal dimension to learn temporal dependencies [<xref ref-type="bibr" rid="ref-33">33</xref>,<xref ref-type="bibr" rid="ref-34">34</xref>]. For vanilla convolution operations, the size of the receptive field is linearly related to the number of convolution layers, which leads to an inability to handle long sequence inputs. Dilated causal convolution is a very efficient structure that helps a model achieve exponential receptive field sizes. For an input sequence <inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:mi>X</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>}</mml:mo></mml:mrow></mml:math></inline-formula>, the dilated causal convolution operation for the element located at <italic>s</italic> can be formalized as:
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>s</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msubsup><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>i</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x22C5;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi>d</mml:mi><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></disp-formula>where <inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>&#x003A;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo>}</mml:mo></mml:mrow><mml:mo stretchy="false">&#x2192;</mml:mo><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:math></inline-formula> is a convolutional filter, <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:mi>d</mml:mi></mml:math></inline-formula> is the dilation factor, and <inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:mi>k</mml:mi></mml:math></inline-formula> is the convolutional filter size. The use of a larger <inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:mi>d</mml:mi></mml:math></inline-formula> allows the top output to represent a larger range of inputs, which effectively expands the receptive field of the convolutional network, as shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>.</p>
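<p>As a concrete illustration of Eq. (1), the following minimal NumPy sketch implements dilated causal convolution for a univariate sequence. Treating out-of-range indices as zeros (left zero-padding) is an assumption of this sketch, made so that the output sequence has the same length as the input.</p>

```python
import numpy as np

def dilated_causal_conv(x, f, d):
    """Dilated causal convolution of Eq. (1): F(s) = sum_{i=0}^{k-1} f(i) * x[s - d*i].
    x: 1-D input sequence, f: convolutional filter of size k, d: dilation factor.
    Out-of-range indices are treated as zeros (left zero-padding assumption)."""
    k = len(f)
    out = np.zeros(len(x), dtype=float)
    for s in range(len(x)):
        for i in range(k):
            j = s - d * i
            if j >= 0:  # causal: only current and past elements contribute
                out[s] += f[i] * x[j]
    return out

x = np.arange(8, dtype=float)
y = dilated_causal_conv(x, f=np.array([1.0, 1.0]), d=2)  # k = 2, d = 2
```

With this summing filter, each output is x[s] + x[s-2], which makes visible how only the last outputs see the deepest history, while early inputs influence many outputs.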
<p>Dilated causal convolution can capture the long-term dependencies of the time series, but the causal convolution structure limits the receptive field and results in severe information loss during dilated convolution. Therefore, we propose a new dilated convolution method to replace dilated causal convolution in TCNs.</p>
</sec>
<sec id="s2_2">
<label>2.2</label>
<title>Distribution Shift Problem Addressing</title>
<p>Forecasting models often suffer badly from distribution shift in time series. Domain adaptation [<xref ref-type="bibr" rid="ref-35">35</xref>] and domain generalization [<xref ref-type="bibr" rid="ref-36">36</xref>] are common methods to alleviate the distribution shift [<xref ref-type="bibr" rid="ref-37">37</xref>,<xref ref-type="bibr" rid="ref-38">38</xref>]. However, these methods are often complex, and it is not easy to define the domain in non-stationary time series. Recently, some simple and effective methods have been proposed to address the distribution shift problem. The RevIN method [<xref ref-type="bibr" rid="ref-39">39</xref>] normalizes each input sequence and then denormalizes the model output sequence. However, this method prevents the model from obtaining features such as the magnitude of data changes. The NLinear model [<xref ref-type="bibr" rid="ref-10">10</xref>] subtracts the last value of the input sequence from every value of the input sequence and then adds it back to the output sequence. However, significant differences remain within the input sequence, which undermines the effectiveness of model training. Our proposed method minimizes the discrepancies between and within input sequences without losing information from the original data.</p>
</sec>
</sec>
<sec id="s3">
<label>3</label>
<title>Methods</title>
<sec id="s3_1">
<label>3.1</label>
<title>Model Structure</title>
<p>The overall architecture of our proposed model is shown in <xref ref-type="fig" rid="fig-3">Fig. 3</xref>. First, the DCM carries out a difference operation on the input time series to obtain difference sequences with smaller discrepancies in values, and then the Padding operation is used to extend the sequence length to meet the requirements of dilated convolution. Next, the Embedding operation maps the padded sequences to a high-dimensional space, followed by multiple layers of SDC to extract temporal dependencies of the input. Finally, the Decoding operation is applied, and the DCM compensates the decoded output with the original input values to obtain the prediction result.</p>
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Overview of the TSCND architecture</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_48008-fig-3.tif"/>
</fig>
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Difference and Compensation Method (DCM)</title>
<p>DCM is used to address the problem of distribution shift. It contains a difference stage and a compensation stage.</p>
<p>In the difference stage, a difference sequence is obtained through differencing adjacent elements in an input sequence. For an input sequence <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:mi>X</mml:mi><mml:mo>=</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo fence="false" stretchy="false">}</mml:mo></mml:math></inline-formula> with length <inline-formula id="ieqn-12"><mml:math id="mml-ieqn-12"><mml:mi>t</mml:mi></mml:math></inline-formula>, the difference stage can be formalized as:</p>
<p><disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo fence="false" stretchy="false">}</mml:mo></mml:math></disp-formula></p>
<p>In the compensation stage, to compensate for the information loss caused by difference operation, the last element value of the original sequence is added back to the output. For the output sequence <inline-formula id="ieqn-13"><mml:math id="mml-ieqn-13"><mml:mi>Y</mml:mi><mml:mo>=</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>q</mml:mi></mml:mrow></mml:msub><mml:mo fence="false" stretchy="false">}</mml:mo></mml:math></inline-formula> of length <inline-formula id="ieqn-14"><mml:math id="mml-ieqn-14"><mml:mi>q</mml:mi></mml:math></inline-formula>, the compensation stage can be formalized as:</p>
<p><disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:mi>Y</mml:mi><mml:mo>=</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>q</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>q</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msub><mml:mo fence="false" stretchy="false">}</mml:mo></mml:math></disp-formula></p>
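<p>A minimal NumPy sketch of the two DCM stages defined by Eqs. (2) and (3). The model output here is a placeholder array, since DCM itself is model-agnostic: any forecaster can sit between the two stages.</p>

```python
import numpy as np

def dcm_difference(x):
    """Difference stage, Eq. (2): first-order differences of adjacent elements.
    The sequence length drops by one."""
    return np.diff(x)

def dcm_compensate(y, x_last):
    """Compensation stage, Eq. (3): add the last original input value x_t
    back to every element of the model output."""
    return y + x_last

x = np.array([10.0, 12.0, 11.0, 15.0])  # original input sequence
xd = dcm_difference(x)                   # differenced input fed to the model
y_model = np.array([0.5, -0.5])          # placeholder for the network's output
y = dcm_compensate(y_model, x[-1])       # final prediction in the original scale
```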
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>Padding and Embedding</title>
<p>To ensure that the sequence length can meet the requirements of the SDC, we pad the difference sequence <inline-formula id="ieqn-15"><mml:math id="mml-ieqn-15"><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> with zeros, and the length <inline-formula id="ieqn-16"><mml:math id="mml-ieqn-16"><mml:mi>L</mml:mi></mml:math></inline-formula> of the padded sequence is:</p>
<p><disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:mi>L</mml:mi><mml:mo>=</mml:mo><mml:msup><mml:mi>k</mml:mi><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msup></mml:math></disp-formula>where <inline-formula id="ieqn-17"><mml:math id="mml-ieqn-17"><mml:mi>k</mml:mi></mml:math></inline-formula> is the convolutional filter size, and <inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:mi>c</mml:mi></mml:math></inline-formula> is the number of layers of the Multi-layer SDC.</p>
<p>The process of padding and embedding the sequence <inline-formula id="ieqn-19"><mml:math id="mml-ieqn-19"><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> can be formalized as:</p>
<p><disp-formula id="eqn-5"><label>(5)</label><mml:math id="mml-eqn-5" display="block"><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>P</mml:mi><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>g</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mspace width="negativethinmathspace" /><mml:mo>,</mml:mo></mml:math></disp-formula></p>
<p><disp-formula id="eqn-6"><label>(6)</label><mml:math id="mml-eqn-6" display="block"><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>E</mml:mi><mml:mi>m</mml:mi><mml:mi>b</mml:mi><mml:mi>e</mml:mi><mml:mi>d</mml:mi><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>g</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-20"><mml:math id="mml-ieqn-20"><mml:mi>E</mml:mi><mml:mi>m</mml:mi><mml:mi>b</mml:mi><mml:mi>e</mml:mi><mml:mi>d</mml:mi><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>g</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mo>.</mml:mo><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> is a linear layer that maps the padded sequence <inline-formula id="ieqn-21"><mml:math id="mml-ieqn-21"><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>L</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> to <inline-formula id="ieqn-22"><mml:math id="mml-ieqn-22"><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>L</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, <inline-formula id="ieqn-23"><mml:math id="mml-ieqn-23"><mml:mi>v</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi></mml:math></inline-formula> denotes the number of variates in time series, <inline-formula id="ieqn-24"><mml:math 
id="mml-ieqn-24"><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi></mml:math></inline-formula> is a hyperparameter that represents the hidden variable dimension of the model, and <inline-formula id="ieqn-25"><mml:math id="mml-ieqn-25"><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> is the input of the Multi-layer SDC.</p>
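<p>The padding and embedding steps of Eqs. (4)–(6) can be sketched as follows. Padding on the left and the toy sizes are assumptions of this sketch; the paper only fixes the padded length L = k<sup>c</sup> and the shapes of the mapping.</p>

```python
import numpy as np

def pad_to_power(xd, k, c):
    """Zero-pad the difference sequence to length L = k**c (Eq. (4)).
    Padding on the left is an assumption; the text only fixes the length.
    xd: (var, t) array of variates over time."""
    L = k ** c
    var, t = xd.shape
    return np.concatenate([np.zeros((var, L - t)), xd], axis=1)

def embedding(xp, W):
    """Linear embedding (Eq. (6)): maps X_p in R^(var x L) to H_0 in R^(dim x L)
    via a per-time-step linear layer with weights W in R^(dim x var)."""
    return W @ xp

k, c, var, dim = 2, 3, 1, 4       # toy hyperparameters, not from the paper
xd = np.ones((var, 5))            # a difference sequence of length t = 5
xp = pad_to_power(xd, k, c)       # padded to L = k**c = 8
H0 = embedding(xp, np.ones((dim, var)))
```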
</sec>
<sec id="s3_4">
<label>3.4</label>
<title>Subsequence-Based Dilated Convolution Layer</title>
<p>Because the causal structure of dilated causal convolution limits the receptive field, we propose SDC as a replacement for dilated causal convolution. SDC shares two properties with dilated causal convolution: <bold>(1)</bold> The size of the receptive field of the output sequence elements is exponentially related to the number of network layers, so very large receptive fields can be achieved by stacking a few convolutional layers. <bold>(2)</bold> The input and output sequences are of equal length, which facilitates the stacking of layers to obtain a larger receptive field. What distinguishes SDC is that it uses multiple convolutional filters to convolve the elements of adjacent subsequences, and the elements in a subsequence share a receptive field. <xref ref-type="fig" rid="fig-4">Fig. 4</xref> shows a 3-layer SDC structure.</p>
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>The multi-layer SDC contains 3 SDC layers with <inline-formula id="ieqn-28"><mml:math id="mml-ieqn-28"><mml:mi mathvariant="bold-italic">k</mml:mi></mml:math></inline-formula> (here <inline-formula id="ieqn-29"><mml:math id="mml-ieqn-29"><mml:mi mathvariant="bold-italic">k</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mtext mathvariant="bold">2</mml:mtext></mml:mrow></mml:math></inline-formula>) convolution filters at each layer; the residual connection is omitted. Elements generated by the same convolution filter are shown in the same color (e.g., at Layer 2, the red elements are all generated by the same convolution filter, which is not shared with the green elements). Subsequences are marked by rounded rectangles in the figure. Solid lines show how an output element acquires information about the entire input sequence via the SDC</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_48008-fig-4.tif"/>
</fig>
<p>At an SDC layer, the input and output sequences are each divided into several subsequences. In the multi-layer SDC, the initial subsequence length is 1, and the length grows with each successive SDC layer through the SDC operation. Each SDC layer uses <inline-formula id="ieqn-26"><mml:math id="mml-ieqn-26"><mml:mi>k</mml:mi></mml:math></inline-formula> convolutional filters of size <inline-formula id="ieqn-27"><mml:math id="mml-ieqn-27"><mml:mi>k</mml:mi></mml:math></inline-formula>, i.e., the number of convolutional filters in each layer equals their size. This setting ensures that the input and output sequences have equal lengths.</p>
<p>Specifically, at SDC Layer <inline-formula id="ieqn-30"><mml:math id="mml-ieqn-30"><mml:mi>i</mml:mi></mml:math></inline-formula>, <inline-formula id="ieqn-31"><mml:math id="mml-ieqn-31"><mml:mi>k</mml:mi></mml:math></inline-formula> input subsequences are convolved to generate an output subsequence whose length is the sum of the lengths of these input subsequences. Therefore, we can calculate the length of subsequences for each layer. The output sequence <inline-formula id="ieqn-32"><mml:math id="mml-ieqn-32"><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> of SDC Layer <inline-formula id="ieqn-33"><mml:math id="mml-ieqn-33"><mml:mi>i</mml:mi></mml:math></inline-formula> can be expressed based on subsequence as:</p>
<p><disp-formula id="eqn-7"><label>(7)</label><mml:math id="mml-eqn-7" display="block"><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:msubsup><mml:mi>h</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mi>h</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mo>,</mml:mo><mml:mi>b</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mi>h</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mo fence="false" stretchy="false">}</mml:mo></mml:math></disp-formula>where <inline-formula id="ieqn-34"><mml:math id="mml-ieqn-34"><mml:msubsup><mml:mi>h</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mo>,</mml:mo><mml:mi>b</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> denotes the <inline-formula id="ieqn-35"><mml:math id="mml-ieqn-35"><mml:mi>a</mml:mi></mml:math></inline-formula>-th element of the <inline-formula id="ieqn-36"><mml:math id="mml-ieqn-36"><mml:mi>b</mml:mi></mml:math></inline-formula>-th subsequence, the subsequence length <inline-formula id="ieqn-37"><mml:math id="mml-ieqn-37"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msup><mml:mi>k</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, <inline-formula id="ieqn-38"><mml:math 
id="mml-ieqn-38"><mml:msup><mml:mi>k</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is the <inline-formula id="ieqn-39"><mml:math id="mml-ieqn-39"><mml:mi>i</mml:mi></mml:math></inline-formula>-th power of <inline-formula id="ieqn-40"><mml:math id="mml-ieqn-40"><mml:mi>k</mml:mi></mml:math></inline-formula>, <inline-formula id="ieqn-41"><mml:math id="mml-ieqn-41"><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the number of the subsequences, and <inline-formula id="ieqn-42"><mml:math id="mml-ieqn-42"><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>L</mml:mi><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>.</p>
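To make the bookkeeping of Eq. (7) concrete, a tiny helper (function name is ours) computes the subsequence length and count at each layer:

```python
def subseq_shape(k, i, L):
    """Subsequence length r_i = k**i and count p_i = L // r_i at SDC Layer i (Eq. (7))."""
    r_i = k ** i
    return r_i, L // r_i

# With k = 2 filters and input length L = 8, matching the 3-layer SDC of Fig. 4:
shapes = [subseq_shape(2, i, 8) for i in range(4)]
print(shapes)  # [(1, 8), (2, 4), (4, 2), (8, 1)]
```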
<p>The detailed process of the SDC operation for convolving subsequences is as follows: first, the elements in neighboring subsequences are convolved using several different filters, while elements within the same subsequence are not convolved with each other. Then, the output elements generated from the same input elements by different convolution filters are placed adjacent to each other, and the elements generated from the same subsequences are merged into a new subsequence. For example, in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>, convolution is performed between <inline-formula id="ieqn-43"><mml:math id="mml-ieqn-43"><mml:msubsup><mml:mi>h</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula id="ieqn-44"><mml:math id="mml-ieqn-44"><mml:msubsup><mml:mi>h</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula> but not between <inline-formula id="ieqn-45"><mml:math id="mml-ieqn-45"><mml:msubsup><mml:mi>h</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula id="ieqn-46"><mml:math id="mml-ieqn-46"><mml:msubsup><mml:mi>h</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula>; the output elements <inline-formula id="ieqn-47"><mml:math id="mml-ieqn-47"><mml:msubsup><mml:mi>h</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula> and <inline-formula id="ieqn-48"><mml:math 
id="mml-ieqn-48"><mml:msubsup><mml:mi>h</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula> generated by the same input elements are adjacent to each other.</p>
<p>We first formalize SDC operations from the perspective of a single output element. The SDC operation on the <inline-formula id="ieqn-49"><mml:math id="mml-ieqn-49"><mml:mi>a</mml:mi></mml:math></inline-formula>-th element <inline-formula id="ieqn-50"><mml:math id="mml-ieqn-50"><mml:msubsup><mml:mrow><mml:mover><mml:mi>h</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mo>,</mml:mo><mml:mi>b</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> of the <inline-formula id="ieqn-51"><mml:math id="mml-ieqn-51"><mml:mi>b</mml:mi></mml:math></inline-formula>-th subsequence can be formalized as:</p>
<p><disp-formula id="eqn-8"><label>(8)</label><mml:math id="mml-eqn-8" display="block"><mml:msubsup><mml:mrow><mml:mover><mml:mi>h</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mo>,</mml:mo><mml:mi>b</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mi>f</mml:mi><mml:mi>d</mml:mi><mml:mi>c</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mi>a</mml:mi><mml:mo>,</mml:mo><mml:mi>b</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:msubsup><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mi>n</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x22C5;</mml:mo><mml:msubsup><mml:mi>h</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mover><mml:mi>r</mml:mi><mml:mo>.</mml:mo></mml:mover><mml:mo>,</mml:mo><mml:mover><mml:mi>p</mml:mi><mml:mo>.</mml:mo></mml:mover><mml:mo>+</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msubsup></mml:math></disp-formula>where <inline-formula id="ieqn-52"><mml:math id="mml-ieqn-52"><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> is the input sequence of Layer <inline-formula id="ieqn-53"><mml:math id="mml-ieqn-53"><mml:mi>i</mml:mi></mml:math></inline-formula>, <inline-formula id="ieqn-54"><mml:math id="mml-ieqn-54"><mml:mi>k</mml:mi></mml:math></inline-formula> is the convolutional filter size, <inline-formula id="ieqn-55"><mml:math 
id="mml-ieqn-55"><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msub><mml:mo>&#x003A;</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo fence="false" stretchy="false">}</mml:mo><mml:mo stretchy="false">&#x2192;</mml:mo><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:math></inline-formula> is the <inline-formula id="ieqn-56"><mml:math id="mml-ieqn-56"><mml:mi>m</mml:mi></mml:math></inline-formula>-th convolution filter, <inline-formula id="ieqn-57"><mml:math id="mml-ieqn-57"><mml:mi>m</mml:mi><mml:mo>=</mml:mo><mml:mi>a</mml:mi><mml:mspace width="thinmathspace" /><mml:mi>m</mml:mi><mml:mi>o</mml:mi><mml:mi>d</mml:mi><mml:mspace width="thinmathspace" /><mml:mi>k</mml:mi></mml:math></inline-formula>, <inline-formula id="ieqn-58"><mml:math id="mml-ieqn-58"><mml:mover><mml:mi>r</mml:mi><mml:mo>.</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mo fence="false" stretchy="false">&#x230A;</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mi>a</mml:mi><mml:mi>k</mml:mi></mml:mfrac></mml:mstyle><mml:mo fence="false" stretchy="false">&#x230B;</mml:mo></mml:math></inline-formula>, and <inline-formula id="ieqn-59"><mml:math id="mml-ieqn-59"><mml:mover><mml:mi>p</mml:mi><mml:mo>.</mml:mo></mml:mover><mml:mo>=</mml:mo><mml:mi>b</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mi>k</mml:mi></mml:math></inline-formula>.</p>
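A minimal pure-Python sketch of the single-element operation in Eq. (8) (function and variable names are ours); the input sequence is stored as a list of subsequences:

```python
def fdc(H_prev, a, b, filters):
    """Eq. (8): output element (a, b) of an SDC layer.

    H_prev[j][e] is the e-th element of the j-th input subsequence;
    filters[m][n] is the n-th tap of filter m, with k = len(filters)."""
    k = len(filters)
    m = a % k        # filter index: m = a mod k
    r_dot = a // k   # element index within each input subsequence
    p_dot = b * k    # first of the k adjacent input subsequences
    return sum(filters[m][n] * H_prev[p_dot + n][r_dot] for n in range(k))

# k = 2: four input subsequences of length 1, filters [1, 1] and [1, -1]
H_prev = [[1.0], [2.0], [3.0], [4.0]]
filters = [[1.0, 1.0], [1.0, -1.0]]
print(fdc(H_prev, 0, 0, filters))  # 1*1 + 1*2 = 3.0
print(fdc(H_prev, 1, 0, filters))  # 1*1 - 1*2 = -1.0
```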
<p>Next, based on the SDC operation for a single element, we extend to the SDC operation for the sequence. The SDC operation for the input sequence <inline-formula id="ieqn-60"><mml:math id="mml-ieqn-60"><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> can be formalized as:</p>
<p><disp-formula id="eqn-9"><label>(9)</label><mml:math id="mml-eqn-9" display="block"><mml:msub><mml:mrow><mml:mover><mml:mi>H</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>F</mml:mi><mml:mi>D</mml:mi><mml:mi>C</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula>
<disp-formula id="ueqn-10"><mml:math id="mml-ueqn-10" display="block"><mml:mspace width="1em" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:msubsup><mml:mrow><mml:mover><mml:mi>h</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msubsup><mml:mrow><mml:mover><mml:mi>h</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mo>}</mml:mo></mml:mrow></mml:math></disp-formula>
<disp-formula id="ueqn-11"><mml:math id="mml-ueqn-11" display="block"><mml:mspace width="1em" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mi>f</mml:mi><mml:mi>d</mml:mi><mml:mi>c</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>0</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mi>f</mml:mi><mml:mi>d</mml:mi><mml:mi>c</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>For an SDC Layer <inline-formula id="ieqn-61"><mml:math id="mml-ieqn-61"><mml:mi>i</mml:mi></mml:math></inline-formula>, the output <inline-formula id="ieqn-62"><mml:math id="mml-ieqn-62"><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is obtained after the SDC operation and a residual connection; the process can be formalized as:</p>
<p><disp-formula id="eqn-10"><label>(10)</label><mml:math id="mml-eqn-10" display="block"><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>L</mml:mi><mml:mi>U</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mi>H</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula>where <inline-formula id="ieqn-63"><mml:math id="mml-ieqn-63"><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>L</mml:mi><mml:mi>U</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mo>&#x22C5;</mml:mo><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> is the rectified linear unit (ReLU) activation function [<xref ref-type="bibr" rid="ref-40">40</xref>].</p>
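Combining Eqs. (8) to (10), one SDC layer can be sketched as follows (a simplified scalar version with our own naming; the actual model operates on dim-dimensional hidden vectors):

```python
def sdc_layer(H_prev, filters):
    """One SDC layer: the SDC operation of Eq. (9) followed by the residual
    connection and ReLU of Eq. (10). H_prev is a list of equal-length
    subsequences; the output keeps the same total length, but its
    subsequences are k times longer."""
    k = len(filters)
    r_prev = len(H_prev[0])                  # input subsequence length
    H_out = []
    for b in range(len(H_prev) // k):        # k input subsequences -> 1 output
        sub = []
        for a in range(k * r_prev):
            m, r_dot, p_dot = a % k, a // k, b * k
            val = sum(filters[m][n] * H_prev[p_dot + n][r_dot] for n in range(k))
            flat = b * k * r_prev + a        # position in the flattened sequence
            res = H_prev[flat // r_prev][flat % r_prev]  # residual input element
            sub.append(max(0.0, val + res))  # ReLU
        H_out.append(sub)
    return H_out

print(sdc_layer([[1.0], [2.0]], [[1.0, 1.0], [1.0, -1.0]]))  # [[4.0, 1.0]]
```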
<p>To make the SDC layer easier to follow, we use pseudo-code to illustrate the process in Algorithm 1.</p>
<fig id="fig-7">
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_48008-fig-7.tif"/>
</fig>
</sec>
<sec id="s3_5">
<label>3.5</label>
<title>Decoding</title>
<p>The decoding module consists of two fully-connected layers. The first maps <inline-formula id="ieqn-92"><mml:math id="mml-ieqn-92"><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>L</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> to <inline-formula id="ieqn-93"><mml:math id="mml-ieqn-93"><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>v</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>L</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, where <inline-formula id="ieqn-94"><mml:math id="mml-ieqn-94"><mml:mi>t</mml:mi><mml:mi>v</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi></mml:math></inline-formula> denotes the number of target variates to be predicted. The second maps <inline-formula id="ieqn-95"><mml:math id="mml-ieqn-95"><mml:msub><mml:mi>H</mml:mi><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>v</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>L</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> to Y <inline-formula id="ieqn-96"><mml:math id="mml-ieqn-96"><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>v</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>p</mml:mi><mml:mi>L</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, where <inline-formula id="ieqn-97"><mml:math id="mml-ieqn-97"><mml:mi>p</mml:mi><mml:mi>L</mml:mi></mml:math></inline-formula> denotes the number of time steps to be predicted for the target variates.</p>
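A shape-level sketch of the two decoding layers (the weight shapes and names are our assumption; the paper states only the input and output dimensions):

```python
import numpy as np

def decode(H_c, W1, b1, W2, b2):
    """First layer maps H_c (dim x L) to H_d (tvar x L) across channels;
    second layer maps H_d to Y (tvar x pL) across time steps."""
    H_d = W1 @ H_c + b1    # (tvar, dim) @ (dim, L) -> (tvar, L)
    return H_d @ W2 + b2   # (tvar, L) @ (L, pL)  -> (tvar, pL)

rng = np.random.default_rng(0)
dim, L, tvar, pL = 64, 168, 1, 24
H_c = rng.standard_normal((dim, L))
Y = decode(H_c, rng.standard_normal((tvar, dim)), rng.standard_normal((tvar, 1)),
           rng.standard_normal((L, pL)), rng.standard_normal((tvar, pL)))
print(Y.shape)  # (1, 24)
```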
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Experiments</title>
<p>To evaluate our model&#x2019;s performance, we conducted univariate time series forecasting on four popular real-world datasets ETTh1, ETTh2, ECL, and WTH.</p>
<sec id="s4_1">
<label>4.1</label>
<title>Datasets</title>
<p><bold>ETT (Electricity Transformer Temperature):</bold> This dataset consists of two years of electricity transformer temperatures from two different counties in China. ETTh1 and ETTh2 correspond to data collected at one-hour intervals from these two counties. The target &#x201C;oil temperature&#x201D; was predicted based on past data. The train/validation/test split is 12/4/4 months.</p>
<p><bold>ECL (Electricity Consumption Load):</bold> This dataset collects the electricity consumption loads (kWh) of 321 customers. We set &#x201C;MT_320&#x201D; as the target value. The training, validation, and test sets were split in a 0.6/0.2/0.2 ratio.</p>
<p><bold>WTH (Weather):</bold> This dataset collects four years of climate data, from 2010 to 2013, covering 1600 locations in the United States, with data points collected every hour. Each data point includes the target value &#x201C;wet bulb&#x201D;. The training, validation, and test sets were split in a 0.6/0.2/0.2 ratio.</p>
</sec>
<sec id="s4_2">
<label>4.2</label>
<title>Baselines</title>
<p>To verify the superiority of our proposed method, we compared it with four SOTA models: PatchTST, FEDformer, Autoformer, and Informer. In addition, our proposed method was also compared with the classic models TCN and LSTM.</p>
<p><bold>PatchTST (2023)</bold> [<xref ref-type="bibr" rid="ref-24">24</xref>]: This is a transformer-based model. It achieves SOTA accuracy on the ETTh1 and ETTh2 datasets through channel independence and a subseries-level patch input mechanism.</p>
<p><bold>FEDformer (2022)</bold> [<xref ref-type="bibr" rid="ref-23">23</xref>]: This is a transformer-based model. It uses the seasonal trend decomposition method and a frequency-enhanced attention module. Since the frequency-enhanced attention module has linear complexity, FEDformer is more efficient than the standard transformer. It achieves the best forecasting performance on the benchmark dataset in comparison with previous state-of-the-art algorithms.</p>
<p><bold>Autoformer (2021)</bold> [<xref ref-type="bibr" rid="ref-22">22</xref>]: This is a transformer-based model. It uses auto-correlation attention and seasonal decomposition for model construction, reducing the required computational effort and achieving improved forecasting accuracy.</p>
<p><bold>Informer (2021)</bold> [<xref ref-type="bibr" rid="ref-21">21</xref>]: This is an efficient transformer-based model that uses the ProbSparse self-attention mechanism to reduce its computational complexity. The model performs well in long-term time series prediction tasks.</p>
<p><bold>TCN (2018)</bold> [<xref ref-type="bibr" rid="ref-9">9</xref>]: This is a convolutional network for sequence modeling that uses dilated causal convolution. It outperforms recurrent neural networks in many sequence modeling tasks.</p>
<p><bold>LSTM (1997)</bold> [<xref ref-type="bibr" rid="ref-41">41</xref>]: This is a recurrent neural network whose gating mechanism alleviates the problem of gradient disappearance and allows the model to capture long-term dependencies.</p>
</sec>
<sec id="s4_3">
<label>4.3</label>
<title>Evaluation Metrics</title>
<p>We used the following evaluation metrics:</p>
<p>(1) Mean squared error (MSE):</p>
<p><disp-formula id="eqn-11"><label>(11)</label><mml:math id="mml-eqn-11" display="block"><mml:mi>M</mml:mi><mml:mi>S</mml:mi><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:mo>&#x2211;</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mi>y</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msup><mml:mi>y</mml:mi><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup></mml:mrow><mml:msup><mml:mo stretchy="false">)</mml:mo><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></disp-formula></p>
<p>(2) Mean absolute error (MAE):</p>
<p><disp-formula id="eqn-12"><label>(12)</label><mml:math id="mml-eqn-12" display="block"><mml:mi>M</mml:mi><mml:mi>A</mml:mi><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mi>y</mml:mi><mml:mo>&#x2212;</mml:mo><mml:msup><mml:mi>y</mml:mi><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow></mml:math></disp-formula>where <inline-formula id="ieqn-98"><mml:math id="mml-ieqn-98"><mml:mi>n</mml:mi></mml:math></inline-formula> is the number of time steps to be predicted, <inline-formula id="ieqn-99"><mml:math id="mml-ieqn-99"><mml:mi>y</mml:mi></mml:math></inline-formula> is the predicted value, and <inline-formula id="ieqn-100"><mml:math id="mml-ieqn-100"><mml:msup><mml:mi>y</mml:mi><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula> is the ground truth.</p>
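Both metrics in Eqs. (11) and (12) can be computed directly, for example:

```python
import numpy as np

def mse(y, y_star):
    """Eq. (11): mean squared error between prediction y and ground truth y*."""
    y, y_star = np.asarray(y), np.asarray(y_star)
    return float(np.mean((y - y_star) ** 2))

def mae(y, y_star):
    """Eq. (12): mean absolute error between prediction y and ground truth y*."""
    y, y_star = np.asarray(y), np.asarray(y_star)
    return float(np.mean(np.abs(y - y_star)))

print(mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # (0 + 0 + 4) / 3 = 1.333...
print(mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))  # (0 + 0 + 2) / 3 = 0.666...
```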
</sec>
<sec id="s4_4">
<label>4.4</label>
<title>Implementation Details</title>
<p>For all the compared models, the same parameter settings were used for the training process, with predicted sequence lengths of 24, 48, 168, 336, and 720. The models were optimized using the adaptive moment estimation (Adam) optimizer with learning rates starting at 1e-3. The total number of epochs was 8, with early stopping. We used mean squared error (MSE) as the loss function. The inputs of each dataset were zero-mean normalized. Following previous related work [<xref ref-type="bibr" rid="ref-42">42</xref>], we used a stack of 10 residual blocks in the TCN. Following Informer [<xref ref-type="bibr" rid="ref-21">21</xref>] and PatchTST [<xref ref-type="bibr" rid="ref-24">24</xref>], the input length was selected from {168, 336, 512, 720}. All other settings used their default values.</p>
<p>For better performance of TSCND, we set the convolutional filter size <inline-formula id="ieqn-101"><mml:math id="mml-ieqn-101"><mml:mi>k</mml:mi></mml:math></inline-formula> to 2 and the hidden layer dimension <inline-formula id="ieqn-102"><mml:math id="mml-ieqn-102"><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi></mml:math></inline-formula> to 64 based on the validation dataset. To let the output elements see the entire input sequence while keeping the computational cost as low as possible, the number of model layers is computed from the input length <inline-formula id="ieqn-103"><mml:math id="mml-ieqn-103"><mml:mi>t</mml:mi></mml:math></inline-formula> as <inline-formula id="ieqn-104"><mml:math id="mml-ieqn-104"><mml:mi>c</mml:mi><mml:mo>=</mml:mo><mml:mo fence="false" stretchy="false">&#x230A;</mml:mo><mml:msub><mml:mi>log</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2061;</mml:mo><mml:mi>t</mml:mi><mml:mo fence="false" stretchy="false">&#x230B;</mml:mo><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula>.</p>
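For instance, with k = 2 and an input length of 168 this gives c = floor(log2 168) + 1 = 8 layers. A small integer-exact sketch (function name is ours):

```python
def num_layers(t, k=2):
    """Number of SDC layers c = floor(log_k t) + 1, computed with integer
    arithmetic to avoid floating-point log rounding at exact powers of k."""
    c, power = 0, 1
    while power * k <= t:   # find the largest c with k**c <= t
        power *= k
        c += 1
    return c + 1

print(num_layers(168))  # 8, since 2**7 = 128 <= 168 < 256
print(num_layers(512))  # 10
```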
</sec>
<sec id="s4_5">
<label>4.5</label>
<title>Experimental Results</title>
<p><xref ref-type="table" rid="table-1">Tables 1</xref> and <xref ref-type="table" rid="table-2">2</xref> list the forecasting results of the comparison models and our model. Our model achieved the best results in 32 of 40 cases and the second-best results in 6 of the remaining 8 cases. Compared with the SOTA model PatchTST, our model yields an overall 7.3% relative MSE reduction, and when the prediction length is 720, the improvement exceeds 15%. TSCND performs slightly worse than PatchTST on the MSE metric on the ETTh1 dataset. This is due to the many outliers in the ETTh1 dataset, where the self-attention mechanism employed by PatchTST handles outliers better than convolutional networks. Meanwhile, TSCND outperforms PatchTST on the MAE metric on the ETTh1 dataset. Since the MAE metric is insensitive to outliers, this supports the above view and suggests that TSCND is better at predicting the normal values of ETTh1.</p>
<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>The MSE results for time series forecasting tasks with different prediction lengths</title>
</caption>
<table frame="hsides">
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th colspan="2">Method</th>
<th>Ours</th>
<th>PatchTST</th>
<th>FEDformer</th>
<th>Autoformer</th>
<th>Informer</th>
<th>LSTM</th>
<th>TCN</th>
</tr>
<tr>
<th colspan="2">Metric</th>
<th>MSE</th>
<th>MSE</th>
<th>MSE</th>
<th>MSE</th>
<th>MSE</th>
<th>MSE</th>
<th>MSE</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="5">ETTh1</td>
<td>24</td>
<td><bold>0.027</bold></td>
<td><underline>0.028</underline></td>
<td>0.046</td>
<td>0.064</td>
<td>0.098</td>
<td>0.065</td>
<td>0.044</td>
</tr>
<tr>
<td>48</td>
<td><bold>0.040</bold></td>
<td><underline>0.041</underline></td>
<td>0.066</td>
<td>0.091</td>
<td>0.158</td>
<td>0.120</td>
<td>0.061</td>
</tr>
<tr>
<td>168</td>
<td><underline>0.073</underline></td>
<td><bold>0.071</bold></td>
<td>0.100</td>
<td>0.118</td>
<td>0.183</td>
<td>0.249</td>
<td>0.086</td>
</tr>
<tr>
<td>336</td>
<td><underline>0.085</underline></td>
<td><bold>0.080</bold></td>
<td>0.125</td>
<td>0.119</td>
<td>0.222</td>
<td>0.244</td>
<td>0.131</td>
</tr>
<tr>
<td>720</td>
<td><underline>0.089</underline></td>
<td><bold>0.086</bold></td>
<td>0.162</td>
<td>0.123</td>
<td>0.269</td>
<td>0.266</td>
<td>0.195</td>
</tr>
<tr>
<td rowspan="5">ETTh2</td>
<td>24</td>
<td><bold>0.067</bold></td>
<td><underline>0.072</underline></td>
<td>0.108</td>
<td>0.104</td>
<td>0.093</td>
<td>0.147</td>
<td>0.090</td>
</tr>
<tr>
<td>48</td>
<td><bold>0.098</bold></td>
<td><underline>0.101</underline></td>
<td>0.130</td>
<td>0.140</td>
<td>0.155</td>
<td>0.182</td>
<td>0.119</td>
</tr>
<tr>
<td>168</td>
<td><bold>0.164</bold></td>
<td><underline>0.165</underline></td>
<td>0.183</td>
<td>0.182</td>
<td>0.232</td>
<td>0.276</td>
<td>0.223</td>
</tr>
<tr>
<td>336</td>
<td><bold>0.188</bold></td>
<td><underline>0.195</underline></td>
<td>0.206</td>
<td>0.268</td>
<td>0.263</td>
<td>0.300</td>
<td>0.268</td>
</tr>
<tr>
<td>720</td>
<td><bold>0.202</bold></td>
<td><underline>0.209</underline></td>
<td>0.305</td>
<td>0.351</td>
<td>0.277</td>
<td>0.355</td>
<td>0.312</td>
</tr>
<tr>
<td rowspan="5">ECL</td>
<td>24</td>
<td><bold>0.138</bold></td>
<td><bold>0.138</bold></td>
<td>0.371</td>
<td>0.366</td>
<td>0.246</td>
<td>0.774</td>
<td>0.182</td>
</tr>
<tr>
<td>48</td>
<td><bold>0.176</bold></td>
<td><underline>0.180</underline></td>
<td>0.350</td>
<td>0.450</td>
<td>0.285</td>
<td>0.909</td>
<td>0.213</td>
</tr>
<tr>
<td>168</td>
<td><bold>0.237</bold></td>
<td><underline>0.256</underline></td>
<td>0.295</td>
<td>0.644</td>
<td>0.373</td>
<td>0.908</td>
<td>0.287</td>
</tr>
<tr>
<td>336</td>
<td><bold>0.298</bold></td>
<td><underline>0.306</underline></td>
<td>0.421</td>
<td>0.703</td>
<td>0.416</td>
<td>0.955</td>
<td>0.314</td>
</tr>
<tr>
<td>720</td>
<td><bold>0.357</bold></td>
<td>0.502</td>
<td>0.453</td>
<td>0.677</td>
<td>0.408</td>
<td>0.995</td>
<td><underline>0.361</underline></td>
</tr>
<tr>
<td rowspan="5">WTH</td>
<td>24</td>
<td><bold>0.093</bold></td>
<td><bold>0.093</bold></td>
<td>0.254</td>
<td>0.143</td>
<td>0.116</td>
<td>0.150</td>
<td>0.094</td>
</tr>
<tr>
<td>48</td>
<td><bold>0.137</bold></td>
<td>0.140</td>
<td>0.257</td>
<td>0.181</td>
<td>0.203</td>
<td>0.196</td>
<td><underline>0.138</underline></td>
</tr>
<tr>
<td>168</td>
<td>0.233</td>
<td><underline>0.224</underline></td>
<td>0.307</td>
<td>0.273</td>
<td>0.284</td>
<td>0.272</td>
<td><bold>0.221</bold></td>
</tr>
<tr>
<td>336</td>
<td><bold>0.276</bold></td>
<td>0.306</td>
<td>0.336</td>
<td>0.320</td>
<td>0.331</td>
<td>0.315</td>
<td><underline>0.284</underline></td>
</tr>
<tr>
<td>720</td>
<td><bold>0.351</bold></td>
<td>0.398</td>
<td>0.371</td>
<td>0.404</td>
<td>0.353</td>
<td>0.405</td>
<td><underline>0.372</underline></td>
</tr>
</tbody>
</table>
<table-wrap-foot><p>Note: A lower MSE indicates a better prediction. The best results are highlighted in bold and the next best result is highlighted with an underline.</p>
</table-wrap-foot>
</table-wrap><table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>The MAE results for time series forecasting tasks with different prediction lengths</title>
</caption>
<table frame="hsides">
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th colspan="2">Method</th>
<th>TSCND</th>
<th>PatchTST</th>
<th>FEDformer</th>
<th>Autoformer</th>
<th>Informer</th>
<th>LSTM</th>
<th>TCN</th>
</tr>
<tr>
<th colspan="2">Metric</th>
<th>MAE</th>
<th>MAE</th>
<th>MAE</th>
<th>MAE</th>
<th>MAE</th>
<th>MAE</th>
<th>MAE</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="5">ETTh1</td>
<td>24</td>
<td><bold>0.125</bold></td>
<td><underline>0.128</underline></td>
<td>0.162</td>
<td>0.204</td>
<td>0.247</td>
<td>0.205</td>
<td>0.163</td>
</tr>
<tr>
<td>48</td>
<td><bold>0.155</bold></td>
<td><underline>0.157</underline></td>
<td>0.191</td>
<td>0.237</td>
<td>0.319</td>
<td>0.288</td>
<td>0.190</td>
</tr>
<tr>
<td>168</td>
<td><bold>0.212</bold></td>
<td><bold>0.212</bold></td>
<td>0.246</td>
<td>0.267</td>
<td>0.346</td>
<td>0.431</td>
<td>0.225</td>
</tr>
<tr>
<td>336</td>
<td><underline>0.231</underline></td>
<td><bold>0.227</bold></td>
<td>0.284</td>
<td>0.271</td>
<td>0.387</td>
<td>0.424</td>
<td>0.286</td>
</tr>
<tr>
<td>720</td>
<td><bold>0.235</bold></td>
<td><underline>0.239</underline></td>
<td>0.319</td>
<td>0.278</td>
<td>0.435</td>
<td>0.449</td>
<td>0.364</td>
</tr>
<tr>
<td rowspan="5">ETTh2</td>
<td>24</td>
<td><bold>0.190</bold></td>
<td><underline>0.207</underline></td>
<td>0.251</td>
<td>0.255</td>
<td>0.240</td>
<td>0.307</td>
<td>0.235</td>
</tr>
<tr>
<td>48</td>
<td><bold>0.237</bold></td>
<td><underline>0.248</underline></td>
<td>0.279</td>
<td>0.291</td>
<td>0.314</td>
<td>0.342</td>
<td>0.270</td>
</tr>
<tr>
<td>168</td>
<td><bold>0.318</bold></td>
<td><underline>0.324</underline></td>
<td>0.335</td>
<td>0.332</td>
<td>0.389</td>
<td>0.424</td>
<td>0.384</td>
</tr>
<tr>
<td>336</td>
<td><bold>0.348</bold></td>
<td><underline>0.357</underline></td>
<td>0.361</td>
<td>0.403</td>
<td>0.417</td>
<td>0.442</td>
<td>0.423</td>
</tr>
<tr>
<td>720</td>
<td><bold>0.363</bold></td>
<td><underline>0.366</underline></td>
<td>0.443</td>
<td>0.472</td>
<td>0.431</td>
<td>0.484</td>
<td>0.450</td>
</tr>
<tr>
<td rowspan="5">ECL</td>
<td>24</td>
<td><underline>0.269</underline></td>
<td><bold>0.264</bold></td>
<td>0.452</td>
<td>0.472</td>
<td>0.363</td>
<td>0.726</td>
<td>0.317</td>
</tr>
<tr>
<td>48</td>
<td><bold>0.300</bold></td>
<td><underline>0.302</underline></td>
<td>0.447</td>
<td>0.504</td>
<td>0.382</td>
<td>0.786</td>
<td>0.341</td>
</tr>
<tr>
<td>168</td>
<td><bold>0.345</bold></td>
<td><underline>0.350</underline></td>
<td>0.407</td>
<td>0.624</td>
<td>0.442</td>
<td>0.773</td>
<td>0.387</td>
</tr>
<tr>
<td>336</td>
<td><bold>0.386</bold></td>
<td><underline>0.384</underline></td>
<td>0.493</td>
<td>0.618</td>
<td>0.481</td>
<td>0.788</td>
<td>0.407</td>
</tr>
<tr>
<td>720</td>
<td><bold>0.450</bold></td>
<td>0.514</td>
<td>0.513</td>
<td>0.646</td>
<td>0.480</td>
<td>0.818</td>
<td><underline>0.458</underline></td>
</tr>
<tr>
<td rowspan="5">WTH</td>
<td>24</td>
<td><bold>0.211</bold></td>
<td><bold>0.211</bold></td>
<td>0.386</td>
<td>0.278</td>
<td>0.255</td>
<td>0.291</td>
<td>0.212</td>
</tr>
<tr>
<td>48</td>
<td><underline>0.264</underline></td>
<td><underline>0.264</underline></td>
<td>0.386</td>
<td>0.315</td>
<td>0.338</td>
<td>0.329</td>
<td><bold>0.262</bold></td>
</tr>
<tr>
<td>168</td>
<td>0.354</td>
<td><underline>0.341</underline></td>
<td>0.440</td>
<td>0.390</td>
<td>0.416</td>
<td>0.391</td>
<td><bold>0.345</bold></td>
</tr>
<tr>
<td>336</td>
<td><bold>0.390</bold></td>
<td>0.412</td>
<td>0.449</td>
<td>0.419</td>
<td>0.457</td>
<td>0.420</td>
<td><underline>0.397</underline></td>
</tr>
<tr>
<td>720</td>
<td><bold>0.450</bold></td>
<td>0.475</td>
<td>0.475</td>
<td>0.468</td>
<td>0.464</td>
<td>0.495</td>
<td><underline>0.457</underline></td>
</tr>
</tbody>
</table>
<table-wrap-foot><p>Note: A lower MAE indicates a better prediction. The best results are highlighted in bold and the next best result is highlighted with an underline.</p>
</table-wrap-foot>
</table-wrap>
<p>Compared to TCN, our method improves performance in all cases on the ETTh1, ETTh2, and ECL datasets and in 7 of 10 cases on the WTH dataset, achieving an average MSE reduction of 19.4%. In short-term forecasting on WTH, TSCND and TCN produce similar results for the following reasons. The time series in the WTH dataset are relatively stationary, so the DCM method offers the model little help. Moreover, when the prediction length is short, the advantage of SDC in using subsequences rather than single elements to extract features is not pronounced, so TSCND and TCN perform similarly. For long-term forecasting, however, TSCND is clearly superior to TCN.</p>
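<p>The benefit of differencing for non-stationary series can be illustrated with a minimal sketch (assuming plain first-order differencing; the paper's DCM may differ in detail): two windows at different levels become nearly identical after differencing, and a cumulative sum restores the original values.</p>

```python
import numpy as np

def to_differences(x):
    # Keep the first value so the window can be restored exactly
    # (hypothetical sketch, not the paper's exact DCM).
    d = np.empty_like(x)
    d[0] = x[0]
    d[1:] = x[1:] - x[:-1]
    return d

def restore(d):
    # Invert to_differences via a cumulative sum.
    return np.cumsum(d)

# Two windows from the same process at different levels:
a = np.array([10.0, 10.5, 11.0, 11.5])
b = np.array([50.0, 50.5, 51.0, 51.5])
# After differencing, the windows agree in every position except the
# retained first value, so the level shift between them disappears.
```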
<p>To evaluate the efficiency of our proposed model in handling inputs of different lengths, we chose PatchTST and TCN for comparison and measured the time required to train each model for one epoch on the ETTh2 dataset under different input sequence lengths. The results are shown in <xref ref-type="table" rid="table-3">Table 3</xref>. Compared with PatchTST and TCN, TSCND reduces training time by over 40% on average. TSCND trains faster than TCN because <bold>(1)</bold> TSCND uses only one convolutional layer per layer, whereas TCN uses two; <bold>(2)</bold> TSCND adjusts its number of layers to the input length, whereas TCN uses a fixed number of layers, so TSCND uses fewer layers for short inputs; and <bold>(3)</bold> since the stride of the convolutional layers in TCN is shorter than in TSCND, each element in TCN participates in the convolution more times.</p>
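<p>The depth rule in point (2) can be sketched as follows, assuming each layer shrinks the sequence length by a factor equal to its stride (the stride value and the formula are illustrative, not taken from the paper):</p>

```python
import math

def num_layers(input_len, stride):
    # Layers needed until one output subsequence covers the whole input,
    # assuming each layer reduces the length by a factor of `stride`.
    return max(1, math.ceil(math.log(input_len, stride)))

# With stride 4, short inputs need far fewer layers than long ones,
# e.g., an input of length 24 versus one of length 720.
```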
<table-wrap id="table-3">
<label>Table 3</label>
<caption>
<title>Time taken by each model to train for an epoch on the ETTh2 dataset</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th>Input length</th>
<th>Ours</th>
<th>PatchTST</th>
<th>TCN</th>
</tr>
</thead>
<tbody>
<tr>
<td>24</td>
<td><bold>2.43 s</bold></td>
<td>5.23 s</td>
<td>6.28 s</td>
</tr>
<tr>
<td>48</td>
<td><bold>2.69 s</bold></td>
<td>5.70 s</td>
<td>6.30 s</td>
</tr>
<tr>
<td>168</td>
<td><bold>4.06 s</bold></td>
<td>6.26 s</td>
<td>6.41 s</td>
</tr>
<tr>
<td>336</td>
<td><bold>4.33 s</bold></td>
<td>7.09 s</td>
<td>7.13 s</td>
</tr>
<tr>
<td>720</td>
<td><bold>5.01 s</bold></td>
<td>7.18 s</td>
<td>7.61 s</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s4_6">
<label>4.6</label>
<title>Ablation Studies</title>
<p>To evaluate the proposed method for addressing the distribution shift problem, we conducted experiments on the ETTh1 dataset using our proposed method, the Revin method (Revin) [<xref ref-type="bibr" rid="ref-39">39</xref>], and the NLinear method (SubLast) [<xref ref-type="bibr" rid="ref-10">10</xref>]. The experimental results are shown in <xref ref-type="table" rid="table-4">Table 4</xref>: our method achieves the best performance in almost all cases. Compared with the original input, it yields an overall 52.6% relative MSE reduction, indicating its effectiveness. We also compared the time taken by each method on an AMD Ryzen 7 5800 CPU; the CPU times are shown in <xref ref-type="table" rid="table-5">Table 5</xref>. Our method takes more time than the SubLast method but much less than the Revin method, and this cost is almost negligible compared to the overall training time of the model. Therefore, we believe our method remains a practical alternative to the SubLast method even when efficiency is a priority.</p>
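<p>For context, the two baseline normalizations can be sketched as follows (simplified: RevIN's learnable affine parameters are omitted, and the function names are illustrative):</p>

```python
import numpy as np

def sublast_normalize(x):
    # NLinear-style: subtract the window's last value; it is added back
    # to the forecast at the output.
    last = x[-1]
    return x - last, last

def revin_normalize(x, eps=1e-5):
    # RevIN-style instance normalization, without learnable affine terms;
    # the statistics are kept so the forecast can be de-normalized.
    mu, sigma = x.mean(), x.std()
    return (x - mu) / (sigma + eps), (mu, sigma)

x = np.array([1.0, 2.0, 4.0, 3.0])
z, last = sublast_normalize(x)   # z ends in 0; outputs are shifted back by `last`
zr, stats = revin_normalize(x)   # zr has (approximately) zero mean
```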
<table-wrap id="table-4">
<label>Table 4</label>
<caption>
<title>The MSE results of TSCND on the ETTh1 dataset using three different methods to address the distribution shift problem</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th>Predicted length</th>
<th>Origin</th>
<th>Ours</th>
<th>Revin [<xref ref-type="bibr" rid="ref-39">39</xref>]</th>
<th>SubLast [<xref ref-type="bibr" rid="ref-10">10</xref>]</th>
</tr>
</thead>
<tbody>
<tr>
<td>24</td>
<td>0.044</td>
<td><bold>0.027</bold></td>
<td>0.029</td>
<td>0.028</td>
</tr>
<tr>
<td>48</td>
<td>0.067</td>
<td><bold>0.040</bold></td>
<td>0.041</td>
<td><bold>0.040</bold></td>
</tr>
<tr>
<td>168</td>
<td>0.110</td>
<td>0.073</td>
<td>0.087</td>
<td><bold>0.071</bold></td>
</tr>
<tr>
<td>336</td>
<td>0.131</td>
<td><bold>0.085</bold></td>
<td>0.095</td>
<td>0.089</td>
</tr>
<tr>
<td>720</td>
<td>0.311</td>
<td><bold>0.089</bold></td>
<td>0.111</td>
<td>0.097</td>
</tr>
</tbody>
</table>
</table-wrap><table-wrap id="table-5">
<label>Table 5</label>
<caption>
<title>The comparison of CPU times taken by methods for addressing distribution shift</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th>Predicted length</th>
<th>Ours</th>
<th>Revin</th>
<th>SubLast</th>
</tr>
</thead>
<tbody>
<tr>
<td>168</td>
<td><underline>2.23 ms</underline></td>
<td>3.31 ms</td>
<td><bold>1.23 ms</bold></td>
</tr>
<tr>
<td>336</td>
<td><underline>6.08 ms</underline></td>
<td>15.62 ms</td>
<td><bold>4.71 ms</bold></td>
</tr>
<tr>
<td>720</td>
<td><underline>9.41 ms</underline></td>
<td>21.45 ms</td>
<td><bold>8.42 ms</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
<p>To test whether a subsequence can capture more temporal dependencies than a single element, we evaluated three different Decoder Layer structures for making predictions from the SDC output.</p>
<p><bold>Structure A:</bold> This is the structure used in TSCND. It uses the whole output sequence to predict each future time point, as shown as Structure A in <xref ref-type="fig" rid="fig-5">Fig. 5</xref>.</p>
<fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>Three different decoding structures</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_48008-fig-5.tif"/>
</fig>
<p><bold>Structure B:</bold> In this structure, each single output element of the Multi-layer SDC is used to predict a time point independently, which is shown as Structure B in <xref ref-type="fig" rid="fig-5">Fig. 5</xref>.</p>
<p><bold>Structure C:</bold> This is similar to dilated causal convolution, where only one output element can obtain information about the whole input sequence. The structure predicts every time point using that single element of the Multi-layer SDC outputs, as shown as Structure C in <xref ref-type="fig" rid="fig-5">Fig. 5</xref>.</p>
<p>We conducted experiments with the above structures on the ECL dataset; the results are shown in <xref ref-type="fig" rid="fig-6">Fig. 6</xref>. The prediction error of Structure A is much lower than that of the other two structures at all prediction lengths, especially long ones. Using output sequences rather than single elements for prediction improves accuracy, indicating that SDC enhances feature extraction by using subsequences to extract features from receptive fields.</p>
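<p>The contrast between Structures A and B can be sketched as two linear decoding heads (shapes and weights are illustrative, not the paper's implementation): Structure A projects the whole output sequence to every horizon step, while Structure B lets each output element predict one step on its own.</p>

```python
import numpy as np

rng = np.random.default_rng(0)
h = rng.normal(size=8)      # final SDC output sequence (illustrative length)
pred_len = 3

# Structure A: every future point is a projection of the WHOLE sequence
# (one weight row per horizon step).
W_a = rng.normal(size=(pred_len, h.size))
pred_a = W_a @ h

# Structure B: the i-th output element alone predicts the i-th point.
w_b = rng.normal(size=pred_len)
pred_b = w_b * h[:pred_len]

# Structure A's head sees all of h for every step, so it can exploit the
# full receptive-field information that B (and C) discard.
```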
<fig id="fig-6">
<label>Figure 6</label>
<caption>
<title>Time series forecasting results of three different decoding structures on the ECL dataset</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_48008-fig-6.tif"/>
</fig>
</sec>
</sec>
<sec id="s5">
<label>5</label>
<title>Conclusions</title>
<p>This paper proposes a temporal subsequence-based convolutional network with difference for time series forecasting, with two effective modules: (i) SDC, which extracts information from a receptive field via a subsequence rather than a single element and uses multiple convolutional filters at each layer so that elements in the same subsequence share a receptive field. These designs allow the receptive fields of all final output elements to cover the entire input sequence, reducing the information loss of dilated convolution. (ii) DCM, which effectively addresses the distribution shift problem. It reduces the discrepancies between and within input sequences through difference operations and, by restoring the original input information to the outputs, compensates for the information lost in differencing.</p>
<p>We conducted experiments on commonly used time series datasets. Firstly, TSCND outperforms the SOTA method on most of the MAE and MSE metrics, which indicates that TSCND is effective in both short-term and long-term time series forecasting. Secondly, the experiments on training time demonstrate the advantage of TSCND in efficiency. Finally, ablation experiments show that DCM is useful in mitigating the distribution shift problem and the use of subsequences instead of single elements in the SDC method can enhance the feature extraction capability.</p>
<p>In future work, we will pursue the following directions. Firstly, the convolutional filter size and hidden-layer dimension of the proposed model are important hyperparameters; they are currently set manually based on experience, and it would be valuable to study their automatic selection. Secondly, we will explore combining the proposed SDC with research on heterogeneous information systems for multivariate time series forecasting. Finally, we will investigate the SDC as an alternative to dilated causal convolution for time series classification and anomaly detection.</p>
</sec>
</body>
<back>
<ack>
<p>We thank the members of the MOE Research Center of Software/Hardware Co-Design Engineering for their contributions to this work.</p>
</ack>
<sec><title>Funding Statement</title>
<p>This work was supported by the National Key Research and Development Program of China (No. 2018YFB2101300), the National Natural Science Foundation of China (Grant No. 61871186), and the Dean&#x2019;s Fund of Engineering Research Center of Software/Hardware Co-Design Technology and Application, Ministry of Education (East China Normal University).</p>
</sec>
<sec><title>Author Contributions</title>
<p>The authors confirm contribution to the paper as follows: study conception and design: Haoran Huang and Weiting Chen; data collection: Haoran Huang; analysis and interpretation of results: Haoran Huang and Weiting Chen; draft manuscript preparation: Haoran Huang, Weiting Chen, and Zheming Fan. All authors reviewed the results and approved the final version of the manuscript.</p>
</sec>
<sec sec-type="data-availability"><title>Availability of Data and Materials</title>
<p>All datasets that support the findings of this study are openly available. The ETT dataset is available at <ext-link ext-link-type="uri" xlink:href="https://github.com/zhouhaoyi/ETDataset">https://github.com/zhouhaoyi/ETDataset</ext-link>; The ECL dataset is available at <ext-link ext-link-type="uri" xlink:href="https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014">https://archive.ics.uci.edu/ml/datasets/ElectricityLoadDiagrams20112014</ext-link>; The WTH dataset is available at <ext-link ext-link-type="uri" xlink:href="https://www.ncei.noaa.gov/data/local-climatological-data/">https://www.ncei.noaa.gov/data/local-climatological-data/</ext-link>. The code repository address of this paper is <ext-link ext-link-type="uri" xlink:href="https://github.com/1546645614/TSCND">https://github.com/1546645614/TSCND</ext-link>.</p>
</sec>
<sec sec-type="COI-statement"><title>Conflicts of Interest</title>
<p>The authors declare that they have no conflicts of interest to report regarding the present study.</p>
</sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X. H.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>J. J.</given-names> <surname>Peng</surname></string-name>, <string-name><given-names>Z. C.</given-names> <surname>Zhang</surname></string-name>, and <string-name><given-names>X. L.</given-names> <surname>Tang</surname></string-name></person-group>, &#x201C;<article-title>A hybrid framework for multivariate long-sequence time series forecasting</article-title>,&#x201D; <source>Appl. Intell.</source>, vol. <volume>53</volume>, no. <issue>11</issue>, pp. <fpage>13549</fpage>&#x2013;<lpage>13568</lpage>, <year>2023</year>. doi: <pub-id pub-id-type="doi">10.1007/s10489-022-04110-1</pub-id>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Fathi</surname></string-name>, <string-name><given-names>K. M.</given-names> <surname>Haghi</surname></string-name>, <string-name><given-names>S. M.</given-names> <surname>Jameii</surname></string-name>, and <string-name><given-names>E.</given-names> <surname>Mahdipour</surname></string-name></person-group>, &#x201C;<article-title>Big data analytics in weather forecasting: A systematic review</article-title>,&#x201D; <source>Arch. Comput. Method. Eng.</source>, vol. <volume>29</volume>, no. <issue>2</issue>, pp. <fpage>1247</fpage>&#x2013;<lpage>1275</lpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.1007/s11831-021-09616-4</pub-id>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y. H.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Fang</surname></string-name>, <string-name><given-names>C. X.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>S. M.</given-names> <surname>Xiang</surname></string-name>, and <string-name><given-names>C. H.</given-names> <surname>Pan</surname></string-name></person-group>, &#x201C;<article-title>TVGCN: Time-variant graph convolutional network for traffic forecasting</article-title>,&#x201D; <source>Neurocomputing</source>, vol. <volume>471</volume>, no. <issue>6</issue>, pp. <fpage>118</fpage>&#x2013;<lpage>129</lpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.1016/j.neucom.2021.11.006</pub-id>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>P.</given-names> <surname>D&#x2019;Urso</surname></string-name>, <string-name><given-names>L. de</given-names> <surname>Giovanni</surname></string-name>, and <string-name><given-names>R.</given-names> <surname>Massari</surname></string-name></person-group>, &#x201C;<article-title>Trimmed fuzzy clustering of financial time series based on dynamic time warping</article-title>,&#x201D; <source>Ann. Oper. Res.</source>, vol. <volume>299</volume>, pp. <fpage>1379</fpage>&#x2013;<lpage>1395</lpage>, <year>2021</year>. doi: <pub-id pub-id-type="doi">10.1007/s10479-019-03284-1</pub-id>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>F.</given-names> <surname>Piccialli</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Giampaolo</surname></string-name>, <string-name><given-names>E.</given-names> <surname>Prezioso</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Camacho</surname></string-name>, and <string-name><given-names>G.</given-names> <surname>Acampora</surname></string-name></person-group>, &#x201C;<article-title>Artificial intelligence and healthcare: Forecasting of medical bookings through multi-source time-series fusion</article-title>,&#x201D; <source>Inf. Fusion</source>, vol. <volume>74</volume>, no. <issue>1</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>16</lpage>, <year>2021</year>. doi: <pub-id pub-id-type="doi">10.1016/j.inffus.2021.03.004</pub-id>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J. Z.</given-names> <surname>Zhu</surname></string-name>, <string-name><given-names>H. J.</given-names> <surname>Dong</surname></string-name>, <string-name><given-names>W. Y.</given-names> <surname>Zheng</surname></string-name>, <string-name><given-names>S. L.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>Y. T.</given-names> <surname>Huang</surname></string-name> and <string-name><given-names>L.</given-names> <surname>Xi</surname></string-name></person-group>, &#x201C;<article-title>Review and prospect of data-driven techniques for load forecasting in integrated energy systems</article-title>,&#x201D; <source>Appl. Energ.</source>, vol. <volume>321</volume>, pp. <fpage>119269</fpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.1016/j.apenergy.2022.119269</pub-id>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C. R.</given-names> <surname>Yin</surname></string-name> and <string-name><given-names>D.</given-names> <surname>Qun</surname></string-name></person-group>, &#x201C;<article-title>A deep multivariate time series multistep forecasting network</article-title>,&#x201D; <source>Appl. Intell.</source>, vol. <volume>52</volume>, no. <issue>8</issue>, pp. <fpage>8956</fpage>&#x2013;<lpage>8974</lpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.1007/s10489-021-02899-x</pub-id>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>W. F.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Yu</surname></string-name>, <string-name><given-names>Q. Y.</given-names> <surname>Zhao</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Cheng</surname></string-name>, <string-name><given-names>X. B.</given-names> <surname>Hou</surname></string-name> and <string-name><given-names>S. Q.</given-names> <surname>He</surname></string-name></person-group>, &#x201C;<article-title>Time series forecasting fusion network model based on prophet and improved LSTM</article-title>,&#x201D; <source>Comput. Mater. Contin.</source>, vol. <volume>74</volume>, no. <issue>2</issue>, pp. <fpage>3199</fpage>&#x2013;<lpage>3219</lpage>, <year>2023</year>. doi: <pub-id pub-id-type="doi">10.32604/cmc.2023.032595</pub-id>; <pub-id pub-id-type="pmid">37303558</pub-id></mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>S. J.</given-names> <surname>Bai</surname></string-name>, <string-name><given-names>J. Z.</given-names> <surname>Kolter</surname></string-name>, and <string-name><given-names>V.</given-names> <surname>Koltun</surname></string-name></person-group>, &#x201C;<article-title>An empirical evaluation of generic convolutional and recurrent networks for sequence modeling</article-title>,&#x201D; <comment>arXiv preprint arXiv:1803.01271</comment>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>A. L.</given-names> <surname>Zeng</surname></string-name>, <string-name><given-names>M. X.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Zhang</surname></string-name>, and <string-name><given-names>Q.</given-names> <surname>Xu</surname></string-name></person-group>, &#x201C;<article-title>Are transformers effective for time series forecasting?</article-title>,&#x201D; in <conf-name>Proc. AAAI Conf. Artif. Intell.</conf-name>, <conf-loc>Washington, USA</conf-loc>, <year>2023</year>, vol. <volume>37</volume>, pp. <fpage>11121</fpage>&#x2013;<lpage>11128</lpage>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>M. H.</given-names> <surname>Liu</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Scinet: Time series modeling and forecasting with sample convolution and interaction</article-title>,&#x201D; in <conf-name>Proc. Annu. Conf. Neural Inform. Process. Syst.</conf-name>, <conf-loc>New Orleans, USA</conf-loc>, <year>2022</year>, vol. <volume>35</volume>, pp. <fpage>5816</fpage>&#x2013;<lpage>5828</lpage>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>G. E.</given-names> <surname>Box</surname></string-name> and <string-name><given-names>G. M.</given-names> <surname>Jenkins</surname></string-name></person-group>, &#x201C;<article-title>Some recent advances in forecasting and control</article-title>,&#x201D; <source>J. R. Stat. Soc. Ser. C. (Appl. Stat.)</source>, vol. <volume>17</volume>, no. <issue>2</issue>, pp. <fpage>91</fpage>&#x2013;<lpage>109</lpage>, <year>1968</year>. doi: <pub-id pub-id-type="doi">10.2307/2985674</pub-id>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>G. W.</given-names> <surname>Morrison</surname></string-name> and <string-name><given-names>D. H.</given-names> <surname>Pike</surname></string-name></person-group>, &#x201C;<article-title>Kalman filtering applied to statistical forecasting</article-title>,&#x201D; <source>Manage. Sci.</source>, vol. <volume>23</volume>, no. <issue>7</issue>, pp. <fpage>768</fpage>&#x2013;<lpage>774</lpage>, <year>1977</year>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>Greff</surname></string-name>, <string-name><given-names>R. K.</given-names> <surname>Srivastava</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Koutn&#x00ED;k</surname></string-name>, <string-name><given-names>B. R.</given-names> <surname>Steunebrink</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Schmidhuber</surname></string-name></person-group>, &#x201C;<article-title>LSTM: A search space odyssey</article-title>,&#x201D; <source>IEEE Trans. Neur. Net. Lear. Syst.</source>, vol. <volume>28</volume>, no. <issue>10</issue>, pp. <fpage>2222</fpage>&#x2013;<lpage>2232</lpage>, <year>2016</year>. doi: <pub-id pub-id-type="doi">10.1109/TNNLS.2016.2582924</pub-id>; <pub-id pub-id-type="pmid">27411231</pub-id></mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Salinas</surname></string-name>, <string-name><given-names>V.</given-names> <surname>Flunkert</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Gasthaus</surname></string-name>, and <string-name><given-names>T.</given-names> <surname>Januschowski</surname></string-name></person-group>, &#x201C;<article-title>DeepAR: Probabilistic forecasting with autoregressive recurrent networks</article-title>,&#x201D; <source>Int. J. Forecasting</source>, vol. <volume>36</volume>, no. <issue>3</issue>, pp. <fpage>1181</fpage>&#x2013;<lpage>1191</lpage>, <year>2020</year>. doi: <pub-id pub-id-type="doi">10.1016/j.ijforecast.2019.07.001</pub-id>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Gao</surname></string-name>, <string-name><given-names>X. B.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Zhao</surname></string-name>, <string-name><given-names>W. J.</given-names> <surname>Ji</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Jing</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>He</surname></string-name></person-group>, &#x201C;<article-title>Short-term electricity load forecasting model based on EMD-GRU with feature selection</article-title>,&#x201D; <source>Energies</source>, vol. <volume>12</volume>, no. <issue>6</issue>, pp. <fpage>1140</fpage>, <year>2019</year>. doi: <pub-id pub-id-type="doi">10.3390/en12061140</pub-id>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>J. X.</given-names> <surname>Cao</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Yuan</surname></string-name>, and <string-name><given-names>M.</given-names> <surname>Cheng</surname></string-name></person-group>, &#x201C;<article-title>Short-term forecasting of natural gas prices by using a novel hybrid method based on a combination of the CEEMDAN-SE-and the PSO-ALS-optimized GRU network</article-title>,&#x201D; <source>Energy</source>, vol. <volume>233</volume>, no. <issue>10</issue>, pp. <fpage>121082</fpage>, <year>2021</year>. doi: <pub-id pub-id-type="doi">10.1016/j.energy.2021.121082</pub-id>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>Q. L.</given-names> <surname>Ma</surname></string-name>, <string-name><given-names>Z. X.</given-names> <surname>Lin</surname></string-name>, <string-name><given-names>E. H.</given-names> <surname>Chen</surname></string-name>, and <string-name><given-names>G.</given-names> <surname>Cottrell</surname></string-name></person-group>, &#x201C;<article-title>Temporal pyramid recurrent neural network</article-title>,&#x201D; in <conf-name>Proc. AAAI Conf. Artif. Intell.</conf-name>, <conf-loc>New York, USA</conf-loc>, <year>2020</year>, vol. <volume>34</volume>, no. <issue>1</issue>, pp. <fpage>5061</fpage>&#x2013;<lpage>5068</lpage>. doi: <pub-id pub-id-type="doi">10.1609/aaai.v34i04.5947</pub-id>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Vaswani</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Attention is all you need</article-title>,&#x201D; in <conf-name>Proc. Annu. Conf. Neural Inform. Process. Syst.</conf-name>, <conf-loc>California, USA</conf-loc>, <year>2017</year>, vol. <volume>30</volume>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Zaheer</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Big bird: Transformers for longer sequences</article-title>,&#x201D; in <conf-name>Proc. Annu. Conf. Neural Inform. Process. Syst.</conf-name>, <year>2020</year>, vol. <volume>33</volume>, pp. <fpage>17283</fpage>&#x2013;<lpage>17297</lpage>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>H. Y.</given-names> <surname>Zhou</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Informer: Beyond efficient transformer for long sequence time-series forecasting</article-title>,&#x201D; in <conf-name>Proc. AAAI. Conf. Artif. Intell.</conf-name>, <year>2021</year>, vol. <volume>35</volume>, pp. <fpage>11106</fpage>&#x2013;<lpage>11115</lpage>. doi: <pub-id pub-id-type="doi">10.1609/aaai.v35i12.17325</pub-id>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>H. X.</given-names> <surname>Wu</surname></string-name>, <string-name><given-names>J. H.</given-names> <surname>Xu</surname></string-name>, <string-name><given-names>J. M.</given-names> <surname>Wang</surname></string-name>, and <string-name><given-names>M. S.</given-names> <surname>Long</surname></string-name></person-group>, &#x201C;<article-title>Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting</article-title>,&#x201D; in <conf-name>Proc. Annu. Conf. Neural Inform. Process. Syst.</conf-name>, <conf-loc>Montreal, Canada</conf-loc>, <year>2021</year>, vol. <volume>34</volume>, pp. <fpage>22419</fpage>&#x2013;<lpage>22430</lpage>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Zhou</surname></string-name>, <string-name><given-names>Z. Q.</given-names> <surname>Ma</surname></string-name>, <string-name><given-names>Q. S.</given-names> <surname>Wen</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Sun</surname></string-name>, and <string-name><given-names>R.</given-names> <surname>Jin</surname></string-name></person-group>, &#x201C;<article-title>FEDformer: Frequency enhanced decomposed transformer for long-term series forecasting</article-title>,&#x201D; in <conf-name>Proc. Int. Conf. Mach. Learn.</conf-name>, <conf-loc>Maryland, USA</conf-loc>, <year>2022</year>, pp. <fpage>27268</fpage>&#x2013;<lpage>27286</lpage>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Nie</surname></string-name>, <string-name><given-names>N. H.</given-names> <surname>Nguyen</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Sinthong</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Kalagnanam</surname></string-name></person-group>, &#x201C;<article-title>A time series is worth 64 words: Long-term forecasting with transformers</article-title>,&#x201D; in <conf-name>Proc. Int. Conf. Learn. Represent.</conf-name>, <conf-loc>Kigali, Rwanda</conf-loc>, <year>2023</year>, pp. <fpage>1</fpage>&#x2013;<lpage>25</lpage>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J. F.</given-names> <surname>Torres</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Hadjout</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Sebaa</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Mart&#x00ED;nez-&#x00C1;lvarez</surname></string-name>, and <string-name><given-names>A.</given-names> <surname>Troncoso</surname></string-name></person-group>, &#x201C;<article-title>Deep learning for time series forecasting: A survey</article-title>,&#x201D; <source>Big Data</source>, vol. <volume>9</volume>, no. <issue>1</issue>, pp. <fpage>3</fpage>&#x2013;<lpage>21</lpage>, <year>2021</year>. doi: <pub-id pub-id-type="doi">10.1089/big.2020.0159</pub-id>; <pub-id pub-id-type="pmid">33275484</pub-id></mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Guo</surname></string-name>, <string-name><given-names>X. M.</given-names> <surname>Kang</surname></string-name>, <string-name><given-names>J. P.</given-names> <surname>Xiong</surname></string-name>, and <string-name><given-names>J. H.</given-names> <surname>Wu</surname></string-name></person-group>, &#x201C;<article-title>A new time series forecasting model based on complete ensemble empirical mode decomposition with adaptive noise and temporal convolutional network</article-title>,&#x201D; <source>Neural Process. Lett.</source>, vol. <volume>55</volume>, no. <issue>4</issue>, pp. <fpage>4397</fpage>&#x2013;<lpage>4417</lpage>, <year>2023</year>. doi: <pub-id pub-id-type="doi">10.1007/s11063-022-11046-7</pub-id>; <pub-id pub-id-type="pmid">36248248</pub-id></mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Limouni</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Yaagoubi</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Bouziane</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Guissi</surname></string-name>, and <string-name><given-names>E. H.</given-names> <surname>Baali</surname></string-name></person-group>, &#x201C;<article-title>Accurate one step and multistep forecasting of very short-term PV power using LSTM-TCN model</article-title>,&#x201D; <source>Renew. Energy</source>, vol. <volume>205</volume>, pp. <fpage>1010</fpage>&#x2013;<lpage>1024</lpage>, <year>2023</year>. doi: <pub-id pub-id-type="doi">10.1016/j.renene.2023.01.118</pub-id>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Guo</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Sun</surname></string-name>, and <string-name><given-names>B. G.</given-names> <surname>Du</surname></string-name></person-group>, &#x201C;<article-title>Multivariable time series forecasting for urban water demand based on temporal convolutional network combining random forest feature selection and discrete wavelet transform</article-title>,&#x201D; <source>Water Resour. Manag.</source>, vol. <volume>36</volume>, no. <issue>9</issue>, pp. <fpage>3385</fpage>&#x2013;<lpage>3400</lpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.1007/s11269-022-03207-z</pub-id>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>P.</given-names> <surname>Hewage</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Temporal convolutional neural (TCN) network for an effective weather forecasting using time-series data from the local weather station</article-title>,&#x201D; <source>Soft Comput.</source>, vol. <volume>24</volume>, pp. <fpage>16453</fpage>&#x2013;<lpage>16482</lpage>, <year>2020</year>. doi: <pub-id pub-id-type="doi">10.1007/s00500-020-04954-0</pub-id>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>W.</given-names> <surname>Cheng</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>High-efficiency chaotic time series prediction based on time convolution neural network</article-title>,&#x201D; <source>Chaos, Solit. Fractals</source>, vol. <volume>152</volume>, pp. <fpage>111304</fpage>, <year>2021</year>. doi: <pub-id pub-id-type="doi">10.1016/j.chaos.2021.111304</pub-id>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C. C.</given-names> <surname>Cai</surname></string-name>, <string-name><given-names>Y. J.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>Z. H.</given-names> <surname>Su</surname></string-name>, <string-name><given-names>T. Q.</given-names> <surname>Zhu</surname></string-name>, and <string-name><given-names>Y. Y.</given-names> <surname>He</surname></string-name></person-group>, &#x201C;<article-title>Short-term electrical load forecasting based on VMD and GRU-TCN hybrid network</article-title>,&#x201D; <source>Appl. Sci.</source>, vol. <volume>12</volume>, no. <issue>13</issue>, pp. <fpage>6647</fpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.3390/app12136647</pub-id>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Rhif</surname></string-name>, <string-name><given-names>A. B.</given-names> <surname>Abbes</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Mart&#x00ED;nez</surname></string-name>, and <string-name><given-names>I. R.</given-names> <surname>Farah</surname></string-name></person-group>, &#x201C;<article-title>Veg-W2TCN: A parallel hybrid forecasting framework for non-stationary time series using wavelet and temporal convolution network model</article-title>,&#x201D; <source>Appl. Soft Comput.</source>, vol. <volume>137</volume>, pp. <fpage>110172</fpage>, <year>2023</year>.</mixed-citation></ref>
<ref id="ref-33"><label>[33]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Sen</surname></string-name>, <string-name><given-names>H. F.</given-names> <surname>Yu</surname></string-name>, and <string-name><given-names>I. S.</given-names> <surname>Dhillon</surname></string-name></person-group>, &#x201C;<article-title>Think globally, act locally: A deep neural network approach to high-dimensional time series forecasting</article-title>,&#x201D; in <conf-name>Proc. Annu. Conf. Neural Inform. Process. Syst.</conf-name>, <conf-loc>Vancouver, Canada</conf-loc>, <year>2019</year>, vol. <volume>32</volume>.</mixed-citation></ref>
<ref id="ref-34"><label>[34]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Fan</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>Y. P.</given-names> <surname>Huang</surname></string-name>, <string-name><given-names>Y. F.</given-names> <surname>Zhu</surname></string-name>, and <string-name><given-names>B. P.</given-names> <surname>Chen</surname></string-name></person-group>, &#x201C;<article-title>Parallel spatio-temporal attention-based TCN for multivariate time series prediction</article-title>,&#x201D; <source>Neural Comput. Appl.</source>, vol. <volume>35</volume>, no. <issue>18</issue>, pp. <fpage>13109</fpage>&#x2013;<lpage>13118</lpage>, <year>2023</year>.</mixed-citation></ref>
<ref id="ref-35"><label>[35]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>J. D.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>W. J.</given-names> <surname>Feng</surname></string-name>, <string-name><given-names>Y. Q.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Yu</surname></string-name>, <string-name><given-names>M. Y.</given-names> <surname>Huang</surname></string-name>, and <string-name><given-names>P. S.</given-names> <surname>Yu</surname></string-name></person-group>, &#x201C;<article-title>Visual domain adaptation with manifold embedded distribution alignment</article-title>,&#x201D; in <conf-name>Proc. 26th ACM Int. Conf. Multimed.</conf-name>, <conf-loc>Seoul, South Korea</conf-loc>, <year>2018</year>, pp. <fpage>402</fpage>&#x2013;<lpage>410</lpage>.</mixed-citation></ref>
<ref id="ref-36"><label>[36]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>H. L.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>S. J.</given-names> <surname>Pan</surname></string-name>, <string-name><given-names>S. Q.</given-names> <surname>Wang</surname></string-name>, and <string-name><given-names>A. C.</given-names> <surname>Kot</surname></string-name></person-group>, &#x201C;<article-title>Domain generalization with adversarial feature learning</article-title>,&#x201D; in <conf-name>Proc. IEEE Conf. Comput. Vis. Pattern Recognit.</conf-name>, <conf-loc>Salt Lake City, Utah, USA</conf-loc>, <year>2018</year>, pp. <fpage>5400</fpage>&#x2013;<lpage>5409</lpage>.</mixed-citation></ref>
<ref id="ref-37"><label>[37]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Ganin</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Domain-adversarial training of neural networks</article-title>,&#x201D; <source>J. Mach. Learn. Res.</source>, vol. <volume>17</volume>, no. <issue>59</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>35</lpage>, <year>2016</year>.</mixed-citation></ref>
<ref id="ref-38"><label>[38]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>W.</given-names> <surname>Fan</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Zhou</surname></string-name>, and <string-name><given-names>Y.</given-names> <surname>Fu</surname></string-name></person-group>, &#x201C;<article-title>Dish-TS: A general paradigm for alleviating distribution shift in time series forecasting</article-title>,&#x201D; in <conf-name>Proc. AAAI Conf. Artif. Intell.</conf-name>, <conf-loc>Washington, USA</conf-loc>, <year>2023</year>, vol. <volume>37</volume>, no. <issue>6</issue>, pp. <fpage>7522</fpage>&#x2013;<lpage>7529</lpage>.</mixed-citation></ref>
<ref id="ref-39"><label>[39]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Kim</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Kim</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Tae</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Park</surname></string-name>, <string-name><given-names>J. H.</given-names> <surname>Choi</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Choo</surname></string-name></person-group>, &#x201C;<article-title>Reversible instance normalization for accurate time-series forecasting against distribution shift</article-title>,&#x201D; in <conf-name>Proc. Int. Conf. Learn. Represent.</conf-name>, <year>2021</year>, pp. <fpage>1</fpage>&#x2013;<lpage>25</lpage>.</mixed-citation></ref>
<ref id="ref-40"><label>[40]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Glorot</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Bordes</surname></string-name>, and <string-name><given-names>Y.</given-names> <surname>Bengio</surname></string-name></person-group>, &#x201C;<article-title>Deep sparse rectifier neural networks</article-title>,&#x201D; in <conf-name>Proc. Fourteenth Int. Conf. Artif. Intell. Stat.</conf-name>, <conf-loc>Fort Lauderdale, Florida, USA</conf-loc>, <year>2011</year>, pp. <fpage>315</fpage>&#x2013;<lpage>323</lpage>.</mixed-citation></ref>
<ref id="ref-41"><label>[41]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Hochreiter</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Schmidhuber</surname></string-name></person-group>, &#x201C;<article-title>Long short-term memory</article-title>,&#x201D; <source>Neural Comput.</source>, vol. <volume>9</volume>, no. <issue>8</issue>, pp. <fpage>1735</fpage>&#x2013;<lpage>1780</lpage>, <year>1997</year>. doi: <pub-id pub-id-type="doi">10.1162/neco.1997.9.8.1735</pub-id>; <pub-id pub-id-type="pmid">9377276</pub-id></mixed-citation></ref>
<ref id="ref-42"><label>[42]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>Z. H.</given-names> <surname>Yue</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>TS2Vec: Towards universal representation of time series</article-title>,&#x201D; in <conf-name>Proc. AAAI Conf. Artif. Intell.</conf-name>, <year>2022</year>, vol. <volume>36</volume>, pp. <fpage>8980</fpage>&#x2013;<lpage>8987</lpage>.</mixed-citation></ref>
</ref-list>
</back></article>