<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">14232</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2020.014232</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>A Parallel Approach to Discords Discovery in Massive Time Series Data</article-title><alt-title alt-title-type="left-running-head">A Parallel Approach to Discords Discovery in Massive Time Series Data</alt-title><alt-title alt-title-type="right-running-head">A Parallel Approach to Discords Discovery in Massive Time Series Data</alt-title>
</title-group>
<contrib-group content-type="authors">
<contrib id="author-1" contrib-type="author" corresp="yes">
<name name-style="western">
<surname>Zymbler</surname>
<given-names>Mikhail</given-names>
</name>
<xref ref-type="aff" rid="aff-1"/>
<email>mzym@susu.ru</email>
</contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western">
<surname>Grents</surname>
<given-names>Alexander</given-names>
</name>
<xref ref-type="aff" rid="aff-1"/>
</contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western">
<surname>Kraeva</surname>
<given-names>Yana</given-names>
</name>
<xref ref-type="aff" rid="aff-1"/>
</contrib>
<contrib id="author-4" contrib-type="author">
<name name-style="western">
<surname>Kumar</surname>
<given-names>Sachin</given-names>
</name>
<xref ref-type="aff" rid="aff-1"/>
</contrib>
<aff id="aff-1"><institution>Department of Computer Science, South Ural State University</institution>, <addr-line>Chelyabinsk, 454080</addr-line>, <country>Russia</country></aff>
</contrib-group><author-notes><corresp id="cor1">&#x002A;Corresponding Author: Mikhail Zymbler. Email: <email>mzym@susu.ru</email></corresp></author-notes>
<pub-date pub-type="epub" date-type="pub" iso-8601-date="2020-11-23">
<day>23</day>
<month>11</month>
<year>2020</year>
</pub-date>
<volume>66</volume>
<issue>2</issue>
<fpage>1867</fpage>
<lpage>1878</lpage>
<history>
<date date-type="received">
<day>08</day>
<month>9</month>
<year>2020</year>
</date>
<date date-type="accepted">
<day>30</day>
<month>9</month>
<year>2020</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2020 Zymbler et al.</copyright-statement>
<copyright-year>2020</copyright-year>
<copyright-holder>Zymbler et al.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_14232.pdf"></self-uri>
<abstract>
<p>A discord is a refinement of the concept of an anomalous subsequence of a time series. As one of the topical issues of time series mining, discord discovery is applied in a wide range of real-world areas (medicine, astronomy, economics, climate modeling, predictive maintenance, energy consumption, etc.). In this article, we propose a novel parallel algorithm for discord discovery on a high-performance cluster with nodes based on many-core accelerators for the case when the time series cannot fit in main memory. We assume that the time series is partitioned across the cluster nodes, and we achieve parallelization both among the cluster nodes and within a single node. Within a cluster node, the algorithm employs a set of matrix data structures to store and index the subsequences of a time series and to provide efficient vectorization of computations on the accelerator. At each node, the algorithm processes its own partition in two phases, namely candidate selection and discord refinement, with each phase requiring one linear scan through the partition. The local discords found are then combined into the global candidate set, which is transmitted to each cluster node. Next, each node refines the global candidate set over its own partition, resulting in the local true discord set. Finally, the global true discord set is constructed as the intersection of the local true discord sets. Experimental evaluation on a real computer cluster with real and synthetic time series shows high scalability of the proposed algorithm.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Time series</kwd>
<kwd>discords discovery</kwd>
<kwd>computer cluster</kwd>
<kwd>many-core accelerator</kwd>
<kwd>vectorization</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>Currently, the discovery of anomalous subsequences in very long time series is a topical issue in a wide spectrum of real-world applications, including medicine, astronomy, economics, climate modeling, predictive maintenance, and energy consumption. In such applications, multi-terabyte time series that cannot fit into main memory are hard to process.</p>
<p>Keogh <italic>et al</italic>. [<xref ref-type="bibr" rid="ref-1">1</xref>] introduced HOTSAX, an anomaly detection algorithm based on the discord concept. A <italic>discord</italic> of a time series can informally be defined as a subsequence that has the largest distance to its nearest non-self match neighbor subsequence. A discord is attractive as an anomaly detector because it requires only one intuitive parameter (the subsequence length), as opposed to most anomaly detection algorithms, which typically require many parameters [<xref ref-type="bibr" rid="ref-2">2</xref>]. HOTSAX, however, assumes that the time series resides in main memory.</p>
<p>Subsequently, Yankov, Keogh <italic>et al</italic>. proposed a disk-aware algorithm (referred to below as DADD, Disk-Aware Discord Discovery) based on the <italic>range discord</italic> concept [<xref ref-type="bibr" rid="ref-3">3</xref>]. For a given range <inline-formula id="ieqn-1">
<alternatives><inline-graphic xlink:href="ieqn-1.png"/><tex-math id="tex-ieqn-1"><![CDATA[$r$]]></tex-math><mml:math id="mml-ieqn-1"><mml:mi>r</mml:mi></mml:math>
</alternatives></inline-formula>, DADD finds all discords at a distance of at least <inline-formula id="ieqn-2">
<alternatives><inline-graphic xlink:href="ieqn-2.png"/><tex-math id="tex-ieqn-2"><![CDATA[$r$]]></tex-math><mml:math id="mml-ieqn-2"><mml:mi>r</mml:mi></mml:math>
</alternatives></inline-formula> from their nearest neighbor. The algorithm requires two linear scans through the time series on a disk. Later, Yankov, Keogh et al. [<xref ref-type="bibr" rid="ref-4">4</xref>] discussed parallelization of DADD based on the MapReduce paradigm. However, in the experimental evaluation, the authors just simulated the above-mentioned parallel implementation on up to eight computers.</p>
<p>Our research is devoted to parallel and distributed algorithms for time series mining. In the previous work [<xref ref-type="bibr" rid="ref-5">5</xref>], we parallelized HOTSAX for many-core accelerators. This article continues our study and contributes as follows. We present a parallel implementation of DADD on the high-performance cluster with the nodes based on many-core accelerators. The original algorithm is extended by a set of index structures to provide an efficient vectorization of computations on each cluster node. We carried out the experiments on the real computer cluster with the real and synthetic time series, which showed a high scalability of our approach.</p>
<p>The rest of the article is organized as follows. In Section 2, we give the formal definitions along with a brief description of DADD. Section 3 provides the brief state of the art literature review. Section 4 presents the proposed methodology. In Section 5, the results of the experimental evaluation of our approach have been provided. Finally, Section 6 summarizes the results obtained and suggests directions for a further research.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Problem Statement and the Serial Algorithm</title>
<sec id="s2_1">
<label>2.1</label>
<title>Notations and Definitions</title>
<p>Below, we follow [<xref ref-type="bibr" rid="ref-4">4</xref>] to give some formal definitions and the statement of the problem.</p>
<p>A <italic>time series</italic> <inline-formula id="ieqn-3">
<alternatives><inline-graphic xlink:href="ieqn-3.png"/><tex-math id="tex-ieqn-3"><![CDATA[$T$]]></tex-math><mml:math id="mml-ieqn-3"><mml:mi>T</mml:mi></mml:math>
</alternatives></inline-formula> is a sequence of real-valued elements: <inline-formula id="ieqn-4">
<alternatives><inline-graphic xlink:href="ieqn-4.png"/><tex-math id="tex-ieqn-4"><![CDATA[$T = \left( {{t_1}, \ldots ,{t_m}} \right)$]]></tex-math><mml:math id="mml-ieqn-4"><mml:mi>T</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</alternatives></inline-formula>, <inline-formula id="ieqn-5">
<alternatives><inline-graphic xlink:href="ieqn-5.png"/><tex-math id="tex-ieqn-5"><![CDATA[${t_i} \in {\rm {\mathbb{R}}}$]]></tex-math><mml:math id="mml-ieqn-5"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow></mml:mrow></mml:math>
</alternatives></inline-formula>. The length of a time series is denoted by <inline-formula id="ieqn-6">
<alternatives><inline-graphic xlink:href="ieqn-6.png"/><tex-math id="tex-ieqn-6"><![CDATA[$\left| T \right|$]]></tex-math><mml:math id="mml-ieqn-6"><mml:mrow><mml:mo>|</mml:mo><mml:mi>T</mml:mi><mml:mo>|</mml:mo></mml:mrow></mml:math>
</alternatives></inline-formula>.</p>
<p>A <italic>subsequence</italic> <inline-formula id="ieqn-7">
<alternatives><inline-graphic xlink:href="ieqn-7.png"/><tex-math id="tex-ieqn-7"><![CDATA[${T_{i,n}}$]]></tex-math><mml:math id="mml-ieqn-7"><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mspace width="thickmathspace"></mml:mspace><mml:mspace width="thickmathspace"></mml:mspace><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math>
</alternatives></inline-formula> of a time series <inline-formula id="ieqn-8">
<alternatives><inline-graphic xlink:href="ieqn-8.png"/><tex-math id="tex-ieqn-8"><![CDATA[$T$]]></tex-math><mml:math id="mml-ieqn-8"><mml:mi>T</mml:mi></mml:math>
</alternatives></inline-formula> is its contiguous subset of <inline-formula id="ieqn-9">
<alternatives><inline-graphic xlink:href="ieqn-9.png"/><tex-math id="tex-ieqn-9"><![CDATA[$n$]]></tex-math><mml:math id="mml-ieqn-9"><mml:mi>n</mml:mi></mml:math>
</alternatives></inline-formula> elements that starts at position <inline-formula id="ieqn-10">
<alternatives><inline-graphic xlink:href="ieqn-10.png"/><tex-math id="tex-ieqn-10"><![CDATA[$i$]]></tex-math><mml:math id="mml-ieqn-10"><mml:mi>i</mml:mi></mml:math>
</alternatives></inline-formula>: <inline-formula id="ieqn-11">
<alternatives><inline-graphic xlink:href="ieqn-11.png"/><tex-math id="tex-ieqn-11"><![CDATA[${T_{i,n}} = \left( {{t_i}, \ldots ,{t_{i + n - 1}}} \right)$]]></tex-math><mml:math id="mml-ieqn-11"><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x003D;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x002B;</mml:mo><mml:mi>n</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</alternatives></inline-formula>, <inline-formula id="ieqn-12">
<alternatives><inline-graphic xlink:href="ieqn-12.png"/><tex-math id="tex-ieqn-12"><![CDATA[$1 \le n \ll m$]]></tex-math><mml:math id="mml-ieqn-12"><mml:mn>1</mml:mn><mml:mo>&#x2264;</mml:mo><mml:mi>n</mml:mi><mml:mo>&#x226A;</mml:mo><mml:mi>m</mml:mi></mml:math>
</alternatives></inline-formula>, <inline-formula id="ieqn-13">
<alternatives><inline-graphic xlink:href="ieqn-13.png"/><tex-math id="tex-ieqn-13"><![CDATA[$1 \le i \le m - n + 1$]]></tex-math><mml:math id="mml-ieqn-13"><mml:mn>1</mml:mn><mml:mo>&#x2264;</mml:mo><mml:mi>i</mml:mi><mml:mo>&#x2264;</mml:mo><mml:mi>m</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi>n</mml:mi><mml:mo>&#x002B;</mml:mo><mml:mn>1</mml:mn></mml:math>
</alternatives></inline-formula>. We denote the set of all subsequences of length <inline-formula id="ieqn-14">
<alternatives><inline-graphic xlink:href="ieqn-14.png"/><tex-math id="tex-ieqn-14"><![CDATA[$n$]]></tex-math><mml:math id="mml-ieqn-14"><mml:mi>n</mml:mi></mml:math>
</alternatives></inline-formula> in <inline-formula id="ieqn-15">
<alternatives><inline-graphic xlink:href="ieqn-15.png"/><tex-math id="tex-ieqn-15"><![CDATA[$T$]]></tex-math><mml:math id="mml-ieqn-15"><mml:mi>T</mml:mi></mml:math>
</alternatives></inline-formula> by <inline-formula id="ieqn-16">
<alternatives><inline-graphic xlink:href="ieqn-16.png"/><tex-math id="tex-ieqn-16"><![CDATA[$S_T^n$]]></tex-math><mml:math id="mml-ieqn-16"><mml:msubsup><mml:mi>S</mml:mi><mml:mi>T</mml:mi><mml:mi>n</mml:mi></mml:msubsup></mml:math>
</alternatives></inline-formula>. Let <inline-formula id="ieqn-17">
<alternatives><inline-graphic xlink:href="ieqn-17.png"/><tex-math id="tex-ieqn-17"><![CDATA[$N$]]></tex-math><mml:math id="mml-ieqn-17"><mml:mi>N</mml:mi></mml:math>
</alternatives></inline-formula> denote the number of subsequences in <inline-formula id="ieqn-18">
<alternatives><inline-graphic xlink:href="ieqn-18.png"/><tex-math id="tex-ieqn-18"><![CDATA[$S_T^n$]]></tex-math><mml:math id="mml-ieqn-18"><mml:msubsup><mml:mi>S</mml:mi><mml:mi>T</mml:mi><mml:mi>n</mml:mi></mml:msubsup></mml:math>
</alternatives></inline-formula>, i.e., <inline-formula id="ieqn-19">
<alternatives><inline-graphic xlink:href="ieqn-19.png"/><tex-math id="tex-ieqn-19"><![CDATA[$N = \left| {S_T^n} \right| = m - n + 1$]]></tex-math><mml:math id="mml-ieqn-19"><mml:mi>N</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:msubsup><mml:mi>S</mml:mi><mml:mi>T</mml:mi><mml:mi>n</mml:mi></mml:msubsup></mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mo>&#x003D;</mml:mo><mml:mi>m</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi>n</mml:mi><mml:mo>&#x002B;</mml:mo><mml:mn>1</mml:mn></mml:math>
</alternatives></inline-formula>.</p>
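<p>As an illustration of the notation above, the following minimal Python sketch enumerates the set of all subsequences of a given length and checks its cardinality. The function name <monospace>subsequences</monospace> and the sample values are hypothetical and are not part of the original algorithm; the code uses 0-based indices, whereas the text uses 1-based positions.</p>
<code language="python"><![CDATA[
```python
def subsequences(T, n):
    """Return all contiguous length-n subsequences of the series T."""
    m = len(T)
    if not 1 <= n <= m:
        raise ValueError("subsequence length must satisfy 1 <= n <= len(T)")
    # Start index i (0-based) corresponds to position i + 1 in the text.
    return [tuple(T[i:i + n]) for i in range(m - n + 1)]


# Hypothetical toy series with m = 6 elements.
T = [3.0, 1.0, 4.0, 1.0, 5.0, 9.0]
n = 3
S = subsequences(T, n)

# The number of subsequences is N = m - n + 1.
N = len(S)
print(N, S[0], S[-1])
```
]]></code>
<p>Materializing every subsequence in this way is only practical for short series held in memory; it is meant to make the index ranges concrete, not to suggest a storage layout.</p>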
<p>A <italic>distance function</italic> for any two subsequences is a nonnegative and symmetric function <inline-formula id="ieqn-20">
<alternatives><inline-graphic xlink:href="ieqn-20.png"/><tex-math id="tex-ieqn-20"><![CDATA[${{\rm {\mathbb R}}^n} \times {{\rm {\mathbb R}}^n} \to {\rm {\mathbb R}}$]]></tex-math><mml:math id="mml-ieqn-20"><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow></mml:mrow><mml:mi>n</mml:mi></mml:msup></mml:mrow><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow></mml:mrow><mml:mi>n</mml:mi></mml:msup></mml:mrow><mml:mo stretchy="false">&#x2192;</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow></mml:mrow></mml:math>
</alternatives></inline-formula>.</p>
<p>Two subsequences <inline-formula id="ieqn-21">
<alternatives><inline-graphic xlink:href="ieqn-21.png"/><tex-math id="tex-ieqn-21"><![CDATA[${T_{i,n}},{T_{j,n}} \in S_T^n$]]></tex-math><mml:math id="mml-ieqn-21"><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mspace width="thickmathspace"></mml:mspace><mml:mspace width="thickmathspace"></mml:mspace><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:msubsup><mml:mi>S</mml:mi><mml:mi>T</mml:mi><mml:mi>n</mml:mi></mml:msubsup></mml:math>
</alternatives></inline-formula> are <italic>non-trivial matches</italic> [<xref ref-type="bibr" rid="ref-6">6</xref>] with respect to a distance function <inline-formula id="ieqn-22">
<alternatives><inline-graphic xlink:href="ieqn-22.png"/><tex-math id="tex-ieqn-22"><![CDATA[$\rm Dist$]]></tex-math><mml:math id="mml-ieqn-22"><mml:mi>D</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi></mml:math>
</alternatives></inline-formula>, if <inline-formula id="ieqn-23">
<alternatives><inline-graphic xlink:href="ieqn-23.png"/><tex-math id="tex-ieqn-23"><![CDATA[$\exists {T_{p,n}} \in S_T^n$]]></tex-math><mml:math id="mml-ieqn-23"><mml:mi mathvariant="normal">&#x2203;</mml:mi><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi>p</mml:mi><mml:mo>,</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:msubsup><mml:mi>S</mml:mi><mml:mi>T</mml:mi><mml:mi>n</mml:mi></mml:msubsup></mml:math>
</alternatives></inline-formula>, <inline-formula id="ieqn-24">
<alternatives><inline-graphic xlink:href="ieqn-24.png"/><tex-math id="tex-ieqn-24"><![CDATA[$i < p < j$]]></tex-math><mml:math id="mml-ieqn-24"><mml:mi>i</mml:mi><mml:mo>&#x003C;</mml:mo><mml:mi>p</mml:mi><mml:mo>&#x003C;</mml:mo><mml:mi>j</mml:mi></mml:math>
</alternatives></inline-formula>, and <inline-formula id="ieqn-25">
<alternatives><inline-graphic xlink:href="ieqn-25.png"/><tex-math id="tex-ieqn-25"><![CDATA[${\rm Dist}\left( {{T_{i,n}},{T_{j,n}}} \right) < {\rm Dist}\left( {{T_{i,n}},{T_{p,n}}} \right)$]]></tex-math><mml:math id="mml-ieqn-25"><mml:mi>D</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mspace width="thickmathspace"></mml:mspace><mml:mspace width="thickmathspace"></mml:mspace><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x003C;</mml:mo><mml:mi>D</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mspace width="thickmathspace"></mml:mspace><mml:mspace width="thickmathspace"></mml:mspace><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi>p</mml:mi><mml:mo>,</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</alternatives></inline-formula>. Let us denote a non-trivial match of a subsequence <inline-formula id="ieqn-26">
<alternatives><inline-graphic xlink:href="ieqn-26.png"/><tex-math id="tex-ieqn-26"><![CDATA[$C \in S_T^n$]]></tex-math><mml:math id="mml-ieqn-26"><mml:mi>C</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msubsup><mml:mi>S</mml:mi><mml:mi>T</mml:mi><mml:mi>n</mml:mi></mml:msubsup></mml:math>
</alternatives></inline-formula> by <inline-formula id="ieqn-27">
<alternatives><inline-graphic xlink:href="ieqn-27.png"/><tex-math id="tex-ieqn-27"><![CDATA[${M_C}$]]></tex-math><mml:math id="mml-ieqn-27"><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mi>C</mml:mi></mml:msub></mml:mrow></mml:math>
</alternatives></inline-formula>.</p>
<p>A subsequence <inline-formula id="ieqn-28">
<alternatives><inline-graphic xlink:href="ieqn-28.png"/><tex-math id="tex-ieqn-28"><![CDATA[$D \in S_T^n$]]></tex-math><mml:math id="mml-ieqn-28"><mml:mi>D</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msubsup><mml:mi>S</mml:mi><mml:mi>T</mml:mi><mml:mi>n</mml:mi></mml:msubsup></mml:math>
</alternatives></inline-formula> is said to be the <italic>most significant discord</italic> in <inline-formula id="ieqn-29">
<alternatives><inline-graphic xlink:href="ieqn-29.png"/><tex-math id="tex-ieqn-29"><![CDATA[$T$]]></tex-math><mml:math id="mml-ieqn-29"><mml:mi>T</mml:mi></mml:math>
</alternatives></inline-formula> if the distance to its nearest non-trivial match is the largest. That is,</p>
<p><disp-formula id="eqn-1">
<label>(1)</label>
<alternatives>
<graphic mimetype="image" mime-subtype="png" xlink:href="eqn-1.png"/><tex-math id="tex-eqn-1"><![CDATA[$$\forall C \in S_T^n\; \min \left( {{\rm Dist}\left( {D,{M_D}} \right)} \right) > \min \left( {{\rm Dist}\left( {C,{M_C}} \right)} \right).$$]]></tex-math><mml:math id="mml-eqn-1" display="block"><mml:mi mathvariant="normal">&#x2200;</mml:mi><mml:mi>C</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msubsup><mml:mi>S</mml:mi><mml:mi>T</mml:mi><mml:mi>n</mml:mi></mml:msubsup><mml:mspace width="thickmathspace"></mml:mspace><mml:mo form="prefix" movablelimits="true">min</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>D</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>D</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mi>D</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x003E;</mml:mo><mml:mo form="prefix" movablelimits="true">min</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>D</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>C</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mi>C</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:math>
</alternatives></disp-formula></p>
<p>A subsequence <inline-formula id="ieqn-30">
<alternatives><inline-graphic xlink:href="ieqn-30.png"/><tex-math id="tex-ieqn-30"><![CDATA[$D \in S_T^n$]]></tex-math><mml:math id="mml-ieqn-30"><mml:mi>D</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msubsup><mml:mi>S</mml:mi><mml:mi>T</mml:mi><mml:mi>n</mml:mi></mml:msubsup></mml:math>
</alternatives></inline-formula> is called the <italic>most significant</italic> <inline-formula id="ieqn-31">
<alternatives><inline-graphic xlink:href="ieqn-31.png"/><tex-math id="tex-ieqn-31"><![CDATA[$k$]]></tex-math><mml:math id="mml-ieqn-31"><mml:mi>k</mml:mi></mml:math>
</alternatives></inline-formula><italic>-th discord</italic> in <inline-formula id="ieqn-32">
<alternatives><inline-graphic xlink:href="ieqn-32.png"/><tex-math id="tex-ieqn-32"><![CDATA[$T$]]></tex-math><mml:math id="mml-ieqn-32"><mml:mi>T</mml:mi></mml:math>
</alternatives></inline-formula> if the distance to its <inline-formula id="ieqn-33">
<alternatives><inline-graphic xlink:href="ieqn-33.png"/><tex-math id="tex-ieqn-33"><![CDATA[$k$]]></tex-math><mml:math id="mml-ieqn-33"><mml:mi>k</mml:mi></mml:math>
</alternatives></inline-formula>-th nearest non-trivial match is the largest.</p>
<p>Given a positive parameter <inline-formula id="ieqn-34">
<alternatives><inline-graphic xlink:href="ieqn-34.png"/><tex-math id="tex-ieqn-34"><![CDATA[$r$]]></tex-math><mml:math id="mml-ieqn-34"><mml:mi>r</mml:mi></mml:math>
</alternatives></inline-formula>, a discord at a distance of at least <inline-formula id="ieqn-35">
<alternatives><inline-graphic xlink:href="ieqn-35.png"/><tex-math id="tex-ieqn-35"><![CDATA[$r$]]></tex-math><mml:math id="mml-ieqn-35"><mml:mi>r</mml:mi></mml:math>
</alternatives></inline-formula> from its nearest neighbor is called a <italic>range discord</italic>, i.e., for a discord <inline-formula id="ieqn-36">
<alternatives><inline-graphic xlink:href="ieqn-36.png"/><tex-math id="tex-ieqn-36"><![CDATA[$D$]]></tex-math><mml:math id="mml-ieqn-36"><mml:mi>D</mml:mi></mml:math>
</alternatives></inline-formula> <inline-formula id="ieqn-37">
<alternatives><inline-graphic xlink:href="ieqn-37.png"/><tex-math id="tex-ieqn-37"><![CDATA[$\min \left( {{\rm Dist}\left( {D,{M_D}} \right)} \right) \ge r$]]></tex-math><mml:math id="mml-ieqn-37"><mml:mo form="prefix" movablelimits="true">min</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>D</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>D</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mi>D</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2265;</mml:mo><mml:mi>r</mml:mi></mml:math>
</alternatives></inline-formula>.</p>
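<p>To make the range discord definition concrete, the following Python sketch performs a naive quadratic in-memory search. It is not DADD and is given only as a reference point under stated assumptions: the function name <monospace>range_discords</monospace> is hypothetical, the distance function is passed in as a parameter, and two subsequences are treated as admissible neighbors only when their start positions differ by at least the subsequence length (a common non-overlap convention, assumed here rather than taken from the definitions above).</p>
<code language="python"><![CDATA[
```python
import math


def range_discords(T, n, r, dist):
    """Naive O(N^2) search for subsequences whose nearest admissible
    neighbor is at distance >= r. Returns (start, nn_distance) pairs."""
    N = len(T) - n + 1
    subs = [T[i:i + n] for i in range(N)]
    found = []
    for i in range(N):
        nn = math.inf
        for j in range(N):
            # ASSUMPTION: neighbors must not overlap the candidate window.
            if abs(i - j) >= n:
                nn = min(nn, dist(subs[i], subs[j]))
        # Skip windows that have no admissible neighbor at all.
        if nn != math.inf and nn >= r:
            found.append((i, nn))
    return found


def plain_euclidean(x, y):
    # Illustrative distance; any nonnegative symmetric function would do.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


# Hypothetical periodic series with a single injected spike at index 10.
T = [0.0, 1.0, 0.0, -1.0] * 5
T[10] = 5.0

found = range_discords(T, n=4, r=4.0, dist=plain_euclidean)
starts = [i for i, _ in found]
print(starts)  # the four windows that cover the spike: [7, 8, 9, 10]
```
]]></code>
<p>Every window that does not cover the spike has an exact non-overlapping copy elsewhere in the series, so its nearest-neighbor distance is zero and it is rejected for any positive range.</p>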
<p>DADD, the original serial disk-based algorithm [<xref ref-type="bibr" rid="ref-4">4</xref>], addresses the discovery of range discords and provides researchers with a procedure for choosing the <inline-formula id="ieqn-38">
<alternatives><inline-graphic xlink:href="ieqn-38.png"/><tex-math id="tex-ieqn-38"><![CDATA[$r$]]></tex-math><mml:math id="mml-ieqn-38"><mml:mi>r</mml:mi></mml:math>
</alternatives></inline-formula> parameter. To accelerate this procedure, our parallel algorithm for many-core accelerators [<xref ref-type="bibr" rid="ref-5">5</xref>], which discovers discords when the time series fits in main memory, can be applied.</p>
<p>When computing the distance between subsequences, DADD requires that the arguments be z-normalized beforehand to have zero mean and unit standard deviation. Here, the <inline-formula id="ieqn-39">
<alternatives><inline-graphic xlink:href="ieqn-39.png"/><tex-math id="tex-ieqn-39"><![CDATA[$z$]]></tex-math><mml:math id="mml-ieqn-39"><mml:mi>z</mml:mi></mml:math>
</alternatives></inline-formula>-normalization of a subsequence <inline-formula id="ieqn-40">
<alternatives><inline-graphic xlink:href="ieqn-40.png"/><tex-math id="tex-ieqn-40"><![CDATA[$C \in S_T^n$]]></tex-math><mml:math id="mml-ieqn-40"><mml:mi>C</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msubsup><mml:mi>S</mml:mi><mml:mi>T</mml:mi><mml:mi>n</mml:mi></mml:msubsup></mml:math>
</alternatives></inline-formula> is defined as a subsequence <inline-formula id="ieqn-41">
<alternatives><inline-graphic xlink:href="ieqn-41.png"/><tex-math id="tex-ieqn-41"><![CDATA[$\hat C = \left( {{{\hat c}_1}, \ldots ,{{\hat c}_n}} \right)$]]></tex-math><mml:math id="mml-ieqn-41"><mml:mrow><mml:mover><mml:mi>C</mml:mi><mml:mo stretchy="false">^</mml:mo></mml:mover></mml:mrow><mml:mo>&#x003D;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mover><mml:mi>c</mml:mi><mml:mo stretchy="false">^</mml:mo></mml:mover></mml:mrow></mml:mrow><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mover><mml:mi>c</mml:mi><mml:mo stretchy="false">^</mml:mo></mml:mover></mml:mrow></mml:mrow><mml:mi>n</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</alternatives></inline-formula> in which</p>
<p><disp-formula id="eqn-2">
<label>(2)</label>
<alternatives>
<graphic mimetype="image" mime-subtype="png" xlink:href="eqn-2.png"/><tex-math id="tex-eqn-2"><![CDATA[$${\hat c_i} = \displaystyle{{{c_i} - \mu } \over \sigma }:\; \mu = \displaystyle{1 \over n}\mathop \sum \nolimits_{i = 1}^n {c_i},\; \sigma = \sqrt {\displaystyle{1 \over n}\mathop \sum \nolimits_{i = 1}^n c_i^2 - {\mu ^2}}.$$]]></tex-math><mml:math id="mml-eqn-2" display="block"><mml:mrow><mml:msub><mml:mrow><mml:mover><mml:mi>c</mml:mi><mml:mo stretchy="false">^</mml:mo></mml:mover></mml:mrow><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x003D;</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03BC;</mml:mi></mml:mrow><mml:mi>&#x03C3;</mml:mi></mml:mfrac></mml:mrow><mml:mo>:</mml:mo><mml:mspace width="thickmathspace"></mml:mspace><mml:mi>&#x03BC;</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac></mml:mrow><mml:msubsup><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mspace width="thickmathspace"></mml:mspace><mml:mi>&#x03C3;</mml:mi><mml:mo>&#x003D;</mml:mo><mml:msqrt><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mi>n</mml:mi></mml:mfrac></mml:mrow><mml:msubsup><mml:mrow><mml:mo 
movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:mo>&#x2061;</mml:mo><mml:msubsup><mml:mi>c</mml:mi><mml:mi>i</mml:mi><mml:mn>2</mml:mn></mml:msubsup><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msup><mml:mi>&#x03BC;</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mstyle></mml:msqrt><mml:mo>.</mml:mo></mml:mstyle></mml:mstyle></mml:math>
</alternatives></disp-formula></p>
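<p>The normalization in Eq. (2) can be sketched in Python as follows. The function name <monospace>z_normalize</monospace> is hypothetical, and two details are assumptions not stated in the text: a constant subsequence (zero standard deviation) is rejected with an error, and a slightly negative variance caused by floating-point rounding is clamped to zero.</p>
<code language="python"><![CDATA[
```python
import math


def z_normalize(c):
    """Z-normalize a subsequence following Eq. (2):
    mu = mean(c), sigma = sqrt(mean(c^2) - mu^2)."""
    n = len(c)
    mu = sum(c) / n
    variance = sum(x * x for x in c) / n - mu * mu
    # ASSUMPTION: clamp rounding noise so that sqrt never sees a negative.
    sigma = math.sqrt(max(variance, 0.0))
    if sigma == 0.0:
        # ASSUMPTION: the text does not say how constant windows are handled.
        raise ValueError("constant subsequence: standard deviation is zero")
    return [(x - mu) / sigma for x in c]


# Hypothetical sample with mean 5 and standard deviation 2.
c_hat = z_normalize([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(c_hat)
```
]]></code>
<p>The one-pass form of the variance used in Eq. (2) is convenient when running sums are maintained, but it can lose precision when the mean is large relative to the spread; that trade-off is outside the scope of this sketch.</p>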
<p>In the original DADD algorithm, the Euclidean distance is used as the distance measure, yet the algorithm can be used with any distance function, which need not be a metric [<xref ref-type="bibr" rid="ref-4">4</xref>]. Given two subsequences <inline-formula id="ieqn-42">
<alternatives><inline-graphic xlink:href="ieqn-42.png"/><tex-math id="tex-ieqn-42"><![CDATA[$X,\; Y \in S_T^n$]]></tex-math><mml:math id="mml-ieqn-42"><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:mspace width="thickmathspace"></mml:mspace><mml:mi>Y</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msubsup><mml:mi>S</mml:mi><mml:mi>T</mml:mi><mml:mi>n</mml:mi></mml:msubsup></mml:math>
</alternatives></inline-formula>, the Euclidean distance between them is calculated as</p>
<p><disp-formula id="eqn-3">
<label>(3)</label>
<alternatives>
<graphic mimetype="image" mime-subtype="png" xlink:href="eqn-3.png"/><tex-math id="tex-eqn-3"><![CDATA[$${\rm ED}\left( {X,Y} \right) = \sqrt {\mathop \sum \nolimits_{i = 1}^n {{\left( {{x_i} - {y_i}} \right)}^2}}.$$]]></tex-math><mml:math id="mml-eqn-3" display="block"><mml:mi>E</mml:mi><mml:mi>D</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>X</mml:mi><mml:mo>,</mml:mo><mml:mi>Y</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x003D;</mml:mo><mml:msqrt><mml:msubsup><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:msqrt><mml:mo>.</mml:mo></mml:math>
</alternatives></disp-formula></p>
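<p>As a minimal illustration of Eq. (3), the distance can be sketched in Python as follows. The function name <monospace>euclidean_distance</monospace> and the use of plain Python sequences are assumptions made for this example only; they are not part of the original algorithm description.</p>
<code language="python"><![CDATA[
```python
import math


def euclidean_distance(x, y):
    """ED(X, Y) from Eq. (3) for two subsequences of the same length n."""
    if len(x) != len(y):
        raise ValueError("subsequences must have the same length n")
    # Sum of squared element-wise differences, then the square root.
    return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))
```
]]></code>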
</sec>
<sec id="s2_2">
<label>2.2</label>
<title>The Original Algorithm</title>
<p>The DADD algorithm [<xref ref-type="bibr" rid="ref-4">4</xref>] works in two phases, candidate selection and discord refinement, each of which requires one linear scan through the time series on disk. <xref ref-type="table" rid="table-3">Algorithm 1</xref> gives the pseudocode of DADD (up to the replacement of the Euclidean distance by an arbitrary distance function). The algorithm takes a time series <inline-formula id="ieqn-43">
<alternatives><inline-graphic xlink:href="ieqn-43.png"/><tex-math id="tex-ieqn-43"><![CDATA[$T$]]></tex-math><mml:math id="mml-ieqn-43"><mml:mi>T</mml:mi></mml:math>
</alternatives></inline-formula> and a range <inline-formula id="ieqn-44">
<alternatives><inline-graphic xlink:href="ieqn-44.png"/><tex-math id="tex-ieqn-44"><![CDATA[$r$]]></tex-math><mml:math id="mml-ieqn-44"><mml:mi>r</mml:mi></mml:math>
</alternatives></inline-formula> as input and outputs the set of discords <inline-formula id="ieqn-45">
<alternatives><inline-graphic xlink:href="ieqn-45.png"/><tex-math id="tex-ieqn-45"><![CDATA[${\rm {\cal C}}$]]></tex-math><mml:math id="mml-ieqn-45"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow></mml:math>
</alternatives></inline-formula>. For a discord <inline-formula id="ieqn-46">
<alternatives><inline-graphic xlink:href="ieqn-46.png"/><tex-math id="tex-ieqn-46"><![CDATA[$c \in {\rm {\cal C}}$]]></tex-math><mml:math id="mml-ieqn-46"><mml:mi>c</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow></mml:math>
</alternatives></inline-formula>, we denote the distance to its nearest neighbor as <inline-formula id="ieqn-47">
<alternatives><inline-graphic xlink:href="ieqn-47.png"/><tex-math id="tex-ieqn-47"><![CDATA[$c.dist$]]></tex-math><mml:math id="mml-ieqn-47"><mml:mi>c</mml:mi><mml:mo>.</mml:mo><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi></mml:math>
</alternatives></inline-formula>.</p>

<table-wrap id="table-3">
<label>Algorithm 1</label>
<caption>
<title>Disk Aware Discord Discovery (in <italic>T</italic>, <italic>r</italic>; out <inline-formula id="ieqn-48">
<alternatives><inline-graphic xlink:href="ieqn-48.png"/><tex-math id="tex-ieqn-48"><![CDATA[${\rm {\cal C}}$]]></tex-math><mml:math id="mml-ieqn-48"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow></mml:math>
</alternatives></inline-formula>)</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
</colgroup>
<tbody>
<tr>
<td>&#x2003;&#x2003;Phase 1. Candidate selection<break/>1: &#x2002;&#x2003;<inline-formula id="ieqn-49">
<alternatives><inline-graphic xlink:href="ieqn-49.png"/><tex-math id="tex-ieqn-49"><![CDATA[${\rm {\cal C}}$]]></tex-math><mml:math id="mml-ieqn-49"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow></mml:math>
</alternatives></inline-formula> &#x2190; <inline-formula id="ieqn-50">
<alternatives><inline-graphic xlink:href="ieqn-50.png"/><tex-math id="tex-ieqn-50"><![CDATA[$\left\{ {{T_{1,n}}} \right\}$]]></tex-math><mml:math id="mml-ieqn-50"><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math>
</alternatives></inline-formula><break/>2: &#x2002;&#x2003;<bold>for all</bold> <inline-formula id="ieqn-51">
<alternatives><inline-graphic xlink:href="ieqn-51.png"/><tex-math id="tex-ieqn-51"><![CDATA[$s \in S_T^n{\rm \setminus }{T_{1,n}}$]]></tex-math><mml:math id="mml-ieqn-51"><mml:mi>s</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msubsup><mml:mi>S</mml:mi><mml:mi>T</mml:mi><mml:mi>n</mml:mi></mml:msubsup><mml:mrow><mml:mo>&#x2216;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math>
</alternatives></inline-formula> <bold>do</bold><break/>3: &#x2003;&#x2003;&#x2002;<italic>isCand</italic> &#x2190; TRUE<break/>4: &#x2002;&#x2003;&#x2002;<bold>for all</bold> <inline-formula id="ieqn-52">
<alternatives><inline-graphic xlink:href="ieqn-52.png"/><tex-math id="tex-ieqn-52"><![CDATA[$c \in {\rm {\cal C}}$]]></tex-math><mml:math id="mml-ieqn-52"><mml:mi>c</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow></mml:math>
</alternatives></inline-formula> <bold>and</bold> <inline-formula id="ieqn-53">
<alternatives><inline-graphic xlink:href="ieqn-53.png"/><tex-math id="tex-ieqn-53"><![CDATA[$c \in {M_s}$]]></tex-math><mml:math id="mml-ieqn-53"><mml:mi>c</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow></mml:math>
</alternatives></inline-formula> <bold>do</bold><break/>5: &#x2002;&#x2003;&#x2002;&#x2002;<bold>if</bold> <inline-formula id="ieqn-54">
<alternatives><inline-graphic xlink:href="ieqn-54.png"/><tex-math id="tex-ieqn-54"><![CDATA[${\rm ED}\left( {s,c} \right) < r$]]></tex-math><mml:math id="mml-ieqn-54"><mml:mrow><mml:mi mathvariant="normal">E</mml:mi><mml:mi mathvariant="normal">D</mml:mi></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>c</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x003C;</mml:mo><mml:mi>r</mml:mi></mml:math>
</alternatives></inline-formula> <bold>then</bold><break/>6: &#x2002;&#x2003;&#x2002;&#x2002;&#x2002;<inline-formula id="ieqn-55">
<alternatives><inline-graphic xlink:href="ieqn-55.png"/><tex-math id="tex-ieqn-55"><![CDATA[${\rm {\cal C}} \leftarrow {\rm {\cal C} \setminus }c$]]></tex-math><mml:math id="mml-ieqn-55"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">&#x2190;</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow><mml:mo>&#x2216;</mml:mo></mml:mrow><mml:mi>c</mml:mi></mml:math>
</alternatives></inline-formula><break/>7: &#x2002;&#x2003;&#x2002;&#x2002;&#x2002;<italic>isCand</italic> &#x2190; FALSE<break/>8: &#x2002;&#x2003;&#x2002;&#x2002;<bold>if</bold> <italic>isCand</italic> <bold>then</bold><break/>9: &#x2002;&#x2003;&#x2002;&#x2002;&#x2002;<inline-formula id="ieqn-56">
<alternatives><inline-graphic xlink:href="ieqn-56.png"/><tex-math id="tex-ieqn-56"><![CDATA[${\rm {\cal C}} \leftarrow {\rm {\cal C}} \cup s$]]></tex-math><mml:math id="mml-ieqn-56"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">&#x2190;</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>&#x222A;</mml:mo><mml:mi>s</mml:mi></mml:math>
</alternatives></inline-formula><break/>10: &#x2003;<bold>return</bold> <inline-formula id="ieqn-57">
<alternatives><inline-graphic xlink:href="ieqn-57.png"/><tex-math id="tex-ieqn-57"><![CDATA[${\rm {\cal C}}$]]></tex-math><mml:math id="mml-ieqn-57"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow></mml:math>
</alternatives></inline-formula></td>
<td>Phase 2. Discord refinement<break/>1: &#x2003;<bold>for all</bold> <inline-formula id="ieqn-58">
<alternatives><inline-graphic xlink:href="ieqn-58.png"/><tex-math id="tex-ieqn-58"><![CDATA[$c \in {\rm {\cal C}}$]]></tex-math><mml:math id="mml-ieqn-58"><mml:mi>c</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow></mml:math>
</alternatives></inline-formula> <bold>do</bold><break/>2: &#x2002;&#x2003;&#x2002;<inline-formula id="ieqn-59">
<alternatives><inline-graphic xlink:href="ieqn-59.png"/><tex-math id="tex-ieqn-59"><![CDATA[$c.dist$]]></tex-math><mml:math id="mml-ieqn-59"><mml:mi>c</mml:mi><mml:mo>.</mml:mo><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi></mml:math>
</alternatives></inline-formula> &#x2190; <inline-formula id="ieqn-60">
<alternatives><inline-graphic xlink:href="ieqn-60.png"/><tex-math id="tex-ieqn-60"><![CDATA[$\infty$]]></tex-math><mml:math id="mml-ieqn-60"><mml:mi mathvariant="normal">&#x221E;</mml:mi></mml:math>
</alternatives></inline-formula><break/>3: &#x2003;<bold>for all</bold> <inline-formula id="ieqn-61">
<alternatives><inline-graphic xlink:href="ieqn-61.png"/><tex-math id="tex-ieqn-61"><![CDATA[$s \in S_T^n$]]></tex-math><mml:math id="mml-ieqn-61"><mml:mi>s</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msubsup><mml:mi>S</mml:mi><mml:mi>T</mml:mi><mml:mi>n</mml:mi></mml:msubsup></mml:math>
</alternatives></inline-formula> <bold>do</bold><break/>4: &#x2002;&#x2003;&#x2002;<bold>for all</bold> <inline-formula id="ieqn-62">
<alternatives><inline-graphic xlink:href="ieqn-62.png"/><tex-math id="tex-ieqn-62"><![CDATA[$c \in {\rm {\cal C}}$]]></tex-math><mml:math id="mml-ieqn-62"><mml:mi>c</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow></mml:math>
</alternatives></inline-formula> <bold>and</bold> <inline-formula id="ieqn-63">
<alternatives><inline-graphic xlink:href="ieqn-63.png"/><tex-math id="tex-ieqn-63"><![CDATA[$c \in {M_s}$]]></tex-math><mml:math id="mml-ieqn-63"><mml:mi>c</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msub><mml:mi>M</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow></mml:math>
</alternatives></inline-formula> <bold>do</bold><break/>5: &#x2002;&#x2003;&#x2002;&#x2002;<bold>if</bold> <inline-formula id="ieqn-64">
<alternatives><inline-graphic xlink:href="ieqn-64.png"/><tex-math id="tex-ieqn-64"><![CDATA[$s = c$]]></tex-math><mml:math id="mml-ieqn-64"><mml:mi>s</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mi>c</mml:mi></mml:math>
</alternatives></inline-formula> <bold>then</bold><break/>6: &#x2002;&#x2003;&#x2002;&#x2002;&#x2002;<bold>continue</bold><break/>7: &#x2002;&#x2003;&#x2002;&#x2002;<italic>d</italic> &#x2190; <inline-formula id="ieqn-65">
<alternatives><inline-graphic xlink:href="ieqn-65.png"/><tex-math id="tex-ieqn-65"><![CDATA[${\rm EarlyAbandonED}\left( {s,c} \right)$]]></tex-math><mml:math id="mml-ieqn-65"><mml:mrow><mml:mi mathvariant="normal">E</mml:mi><mml:mi mathvariant="normal">a</mml:mi><mml:mi mathvariant="normal">r</mml:mi><mml:mi mathvariant="normal">l</mml:mi><mml:mi mathvariant="normal">y</mml:mi><mml:mi mathvariant="normal">A</mml:mi><mml:mi mathvariant="normal">b</mml:mi><mml:mi mathvariant="normal">a</mml:mi><mml:mi mathvariant="normal">n</mml:mi><mml:mi mathvariant="normal">d</mml:mi><mml:mi mathvariant="normal">o</mml:mi><mml:mi mathvariant="normal">n</mml:mi><mml:mi mathvariant="normal">E</mml:mi><mml:mi mathvariant="normal">D</mml:mi></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>c</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</alternatives></inline-formula><break/>8: &#x2002;&#x2003;&#x2002;&#x2002;<bold>if</bold> <inline-formula id="ieqn-66">
<alternatives><inline-graphic xlink:href="ieqn-66.png"/><tex-math id="tex-ieqn-66"><![CDATA[$d < r$]]></tex-math><mml:math id="mml-ieqn-66"><mml:mi>d</mml:mi><mml:mo>&#x003C;</mml:mo><mml:mi>r</mml:mi></mml:math>
</alternatives></inline-formula> <bold>then</bold><break/>9: &#x2002;&#x2003;&#x2002;&#x2002;&#x2002;<inline-formula id="ieqn-67">
<alternatives><inline-graphic xlink:href="ieqn-67.png"/><tex-math id="tex-ieqn-67"><![CDATA[${\rm {\cal C}} \leftarrow {\rm {\cal C} \setminus }c$]]></tex-math><mml:math id="mml-ieqn-67"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">&#x2190;</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow><mml:mo>&#x2216;</mml:mo></mml:mrow><mml:mi>c</mml:mi></mml:math>
</alternatives></inline-formula><break/>10: &#x2003;&#x2002;&#x2002;<bold>else</bold><break/>11: &#x2002;&#x2003;&#x2002;&#x2002;&#x2002;<inline-formula id="ieqn-68">
<alternatives><inline-graphic xlink:href="ieqn-68.png"/><tex-math id="tex-ieqn-68"><![CDATA[$c.dist \leftarrow \min \left( {c.dist,d} \right)$]]></tex-math><mml:math id="mml-ieqn-68"><mml:mi>c</mml:mi><mml:mo>.</mml:mo><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mo stretchy="false">&#x2190;</mml:mo><mml:mo form="prefix" movablelimits="true">min</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>c</mml:mi><mml:mo>.</mml:mo><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mi>d</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</alternatives></inline-formula><break/>12: &#x2003;<bold>return</bold> <inline-formula id="ieqn-69">
<alternatives><inline-graphic xlink:href="ieqn-69.png"/><tex-math id="tex-ieqn-69"><![CDATA[${\rm {\cal C}}$]]></tex-math><mml:math id="mml-ieqn-69"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow></mml:math>
</alternatives></inline-formula></td>
</tr>
</tbody>
</table>
</table-wrap>
<p>In the first phase, the algorithm scans through the time series <inline-formula id="ieqn-70">
<alternatives><inline-graphic xlink:href="ieqn-70.png"/><tex-math id="tex-ieqn-70"><![CDATA[$T$]]></tex-math><mml:math id="mml-ieqn-70"><mml:mi>T</mml:mi></mml:math>
</alternatives></inline-formula>, and for each subsequence <inline-formula id="ieqn-71">
<alternatives><inline-graphic xlink:href="ieqn-71.png"/><tex-math id="tex-ieqn-71"><![CDATA[$s \in S_T^n$]]></tex-math><mml:math id="mml-ieqn-71"><mml:mi>s</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msubsup><mml:mi>S</mml:mi><mml:mi>T</mml:mi><mml:mi>n</mml:mi></mml:msubsup></mml:math>
</alternatives></inline-formula> it checks whether each candidate <inline-formula id="ieqn-72">
<alternatives><inline-graphic xlink:href="ieqn-72.png"/><tex-math id="tex-ieqn-72"><![CDATA[$c$]]></tex-math><mml:math id="mml-ieqn-72"><mml:mi>c</mml:mi></mml:math>
</alternatives></inline-formula> already in the set <inline-formula id="ieqn-73">
<alternatives><inline-graphic xlink:href="ieqn-73.png"/><tex-math id="tex-ieqn-73"><![CDATA[${\rm {\cal C}}$]]></tex-math><mml:math id="mml-ieqn-73"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow></mml:math>
</alternatives></inline-formula> can still be a discord. If a candidate <inline-formula id="ieqn-74">
<alternatives><inline-graphic xlink:href="ieqn-74.png"/><tex-math id="tex-ieqn-74"><![CDATA[$c$]]></tex-math><mml:math id="mml-ieqn-74"><mml:mi>c</mml:mi></mml:math>
</alternatives></inline-formula> fails this check, it is removed from the set. Finally, the new subsequence <inline-formula id="ieqn-75">
<alternatives><inline-graphic xlink:href="ieqn-75.png"/><tex-math id="tex-ieqn-75"><![CDATA[$s$]]></tex-math><mml:math id="mml-ieqn-75"><mml:mi>s</mml:mi></mml:math>
</alternatives></inline-formula> is either added to the candidate set, if it may still be a discord, or discarded. The correctness of this procedure is proved in Yankov et al. [<xref ref-type="bibr" rid="ref-4">4</xref>].</p>
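<p>The candidate selection phase can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the reference implementation: subsequences are identified by their 0-based start index, the whole series is held in memory, and we assume that the condition <italic>c</italic> &#x2208; <italic>M<sub>s</sub></italic> means that the windows of <italic>c</italic> and <italic>s</italic> do not overlap. The names <monospace>ed</monospace> and <monospace>select_candidates</monospace> are hypothetical.</p>
<code language="python"><![CDATA[
```python
import math


def ed(x, y):
    """Euclidean distance between two equal-length subsequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def select_candidates(T, n, r):
    """Phase 1 of Algorithm 1: return start indices of candidate discords."""
    num_subseq = len(T) - n + 1
    candidates = [0]  # C <- {T_{1,n}} (0-based index here)
    for i in range(1, num_subseq):
        s = T[i:i + n]
        is_cand = True
        survivors = []
        for j in candidates:
            # Assumption: c is in M_s when the two windows do not overlap.
            if abs(i - j) >= n and ed(s, T[j:j + n]) < r:
                # c has a neighbour closer than r, so c is dropped,
                # and s cannot be a discord either.
                is_cand = False
            else:
                survivors.append(j)
        candidates = survivors
        if is_cand:
            candidates.append(i)
    return candidates
```
]]></code>
<p>For the toy series [0, 0, 0, 0, 9, 9, 0, 0, 0, 0] with <italic>n</italic> = 2 and <italic>r</italic> = 1, this sketch returns the start indices [3, 4, 5, 7, 8]. The subsequences starting at 7 and 8 are false positives, which is why a second scan over the series is needed.</p>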
<p>In the second phase, the algorithm first sets the distance from each candidate to its nearest neighbor to infinity. Then, the algorithm scans through the time series <inline-formula id="ieqn-76">
<alternatives><inline-graphic xlink:href="ieqn-76.png"/><tex-math id="tex-ieqn-76"><![CDATA[$T$]]></tex-math><mml:math id="mml-ieqn-76"><mml:mi>T</mml:mi></mml:math>
</alternatives></inline-formula>, calculating the distance between each subsequence <inline-formula id="ieqn-77">
<alternatives><inline-graphic xlink:href="ieqn-77.png"/><tex-math id="tex-ieqn-77"><![CDATA[$s \in S_T^n$]]></tex-math><mml:math id="mml-ieqn-77"><mml:mi>s</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msubsup><mml:mi>S</mml:mi><mml:mi>T</mml:mi><mml:mi>n</mml:mi></mml:msubsup></mml:math>
</alternatives></inline-formula> and each candidate <inline-formula id="ieqn-78">
<alternatives><inline-graphic xlink:href="ieqn-78.png"/><tex-math id="tex-ieqn-78"><![CDATA[$c$]]></tex-math><mml:math id="mml-ieqn-78"><mml:mi>c</mml:mi></mml:math>
</alternatives></inline-formula>. Here, when calculating <inline-formula id="ieqn-79">
<alternatives><inline-graphic xlink:href="ieqn-79.png"/><tex-math id="tex-ieqn-79"><![CDATA[${\rm ED}\left( {s,c} \right)$]]></tex-math><mml:math id="mml-ieqn-79"><mml:mrow><mml:mi mathvariant="normal">E</mml:mi><mml:mi mathvariant="normal">D</mml:mi></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>s</mml:mi><mml:mo>,</mml:mo><mml:mi>c</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</alternatives></inline-formula>, the <inline-formula id="ieqn-80">
<alternatives><inline-graphic xlink:href="ieqn-80.png"/><tex-math id="tex-ieqn-80"><![CDATA[${\rm EarlyAbandonED}$]]></tex-math><mml:math id="mml-ieqn-80"><mml:mrow><mml:mi mathvariant="normal">E</mml:mi><mml:mi mathvariant="normal">a</mml:mi><mml:mi mathvariant="normal">r</mml:mi><mml:mi mathvariant="normal">l</mml:mi><mml:mi mathvariant="normal">y</mml:mi><mml:mi mathvariant="normal">A</mml:mi><mml:mi mathvariant="normal">b</mml:mi><mml:mi mathvariant="normal">a</mml:mi><mml:mi mathvariant="normal">n</mml:mi><mml:mi mathvariant="normal">d</mml:mi><mml:mi mathvariant="normal">o</mml:mi><mml:mi mathvariant="normal">n</mml:mi><mml:mi mathvariant="normal">E</mml:mi><mml:mi mathvariant="normal">D</mml:mi></mml:mrow></mml:math>
</alternatives></inline-formula> procedure stops the summation of <inline-formula id="ieqn-81">
<alternatives><inline-graphic xlink:href="ieqn-81.png"/><tex-math id="tex-ieqn-81"><![CDATA[$\mathop \sum \nolimits_{k = 1}^n {\left( {{s_k} - {c_k}} \right)^2}$]]></tex-math><mml:math id="mml-ieqn-81"><mml:msubsup><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:math>
</alternatives></inline-formula> as soon as it reaches <inline-formula id="ieqn-82">
<alternatives><inline-graphic xlink:href="ieqn-82.png"/><tex-math id="tex-ieqn-82"><![CDATA[$k = \ell$]]></tex-math><mml:math id="mml-ieqn-82"><mml:mi>k</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mi>&#x2113;</mml:mi></mml:math>
</alternatives></inline-formula>, where <inline-formula id="ieqn-83">
<alternatives><inline-graphic xlink:href="ieqn-83.png"/><tex-math id="tex-ieqn-83"><![CDATA[$1 \le \ell \le n$]]></tex-math><mml:math id="mml-ieqn-83"><mml:mn>1</mml:mn><mml:mo>&#x2264;</mml:mo><mml:mi>&#x2113;</mml:mi><mml:mo>&#x2264;</mml:mo><mml:mi>n</mml:mi></mml:math>
</alternatives></inline-formula>, for which <inline-formula id="ieqn-84">
<alternatives><inline-graphic xlink:href="ieqn-84.png"/><tex-math id="tex-ieqn-84"><![CDATA[$\mathop \sum \nolimits_{k = 1}^\ell {\left( {{s_k} - {c_k}} \right)^2} \ge c.dis{t^2}$]]></tex-math><mml:math id="mml-ieqn-84"><mml:msubsup><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>&#x2113;</mml:mi></mml:msubsup><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msub><mml:mi>c</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow><mml:mo>&#x2265;</mml:mo><mml:mi>c</mml:mi><mml:mo>.</mml:mo><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mrow><mml:msup><mml:mi>t</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:math>
</alternatives></inline-formula>. If the distance is less than <inline-formula id="ieqn-85">
<alternatives><inline-graphic xlink:href="ieqn-85.png"/><tex-math id="tex-ieqn-85"><![CDATA[$r$]]></tex-math><mml:math id="mml-ieqn-85"><mml:mi>r</mml:mi></mml:math>
</alternatives></inline-formula>, the candidate is a false positive and is permanently removed from <inline-formula id="ieqn-86">
<alternatives><inline-graphic xlink:href="ieqn-86.png"/><tex-math id="tex-ieqn-86"><![CDATA[${\rm {\cal C}}$]]></tex-math><mml:math id="mml-ieqn-86"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow></mml:math>
</alternatives></inline-formula>. Otherwise, if the distance is less than the current value of <inline-formula id="ieqn-87">
<alternatives><inline-graphic xlink:href="ieqn-87.png"/><tex-math id="tex-ieqn-87"><![CDATA[$c.dist$]]></tex-math><mml:math id="mml-ieqn-87"><mml:mi>c</mml:mi><mml:mo>.</mml:mo><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi></mml:math>
</alternatives></inline-formula> (and hence not less than <inline-formula id="ieqn-88">
<alternatives><inline-graphic xlink:href="ieqn-88.png"/><tex-math id="tex-ieqn-88"><![CDATA[$r$]]></tex-math><mml:math id="mml-ieqn-88"><mml:mi>r</mml:mi></mml:math>
</alternatives></inline-formula>, since otherwise the candidate would have been removed), the current distance to the nearest neighbor is updated.</p>
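<p>The refinement phase with early abandoning can be sketched in Python as follows. This is again an illustrative sketch rather than the reference implementation: candidates are given as 0-based start indices, overlapping windows (including the case <italic>s</italic> = <italic>c</italic>) are assumed to lie outside <italic>M<sub>s</sub></italic>, and the names <monospace>early_abandon_ed</monospace> and <monospace>refine_discords</monospace> are hypothetical.</p>
<code language="python"><![CDATA[
```python
import math


def early_abandon_ed(s, c, best_so_far):
    """Euclidean distance that stops summing once it cannot beat best_so_far."""
    limit = best_so_far ** 2
    total = 0.0
    for sk, ck in zip(s, c):
        total += (sk - ck) ** 2
        if total >= limit:
            # The partial sum already reaches c.dist^2, so the full
            # distance cannot improve the nearest-neighbour distance.
            break
    return math.sqrt(total)


def refine_discords(T, n, r, candidates):
    """Phase 2: drop false positives; map start index -> nearest-neighbour distance."""
    dist = {j: math.inf for j in candidates}  # c.dist <- infinity
    for i in range(len(T) - n + 1):
        s = T[i:i + n]
        for j in list(dist):  # iterate over a copy, since entries may be deleted
            if abs(i - j) < n:
                continue  # assumption: overlapping windows are not in M_s
            d = early_abandon_ed(s, T[j:j + n], dist[j])
            if d < r:
                del dist[j]  # false positive
            else:
                dist[j] = min(dist[j], d)
    return dist
```
]]></code>
<p>An abandoned computation returns a value that is at least the current nearest-neighbor distance, which in turn is never below <italic>r</italic>, so early abandoning cannot cause a candidate to be removed by mistake. For the toy series [0, 0, 0, 0, 9, 9, 0, 0, 0, 0] with <italic>n</italic> = 2, <italic>r</italic> = 1 and the candidates [3, 4, 5, 7, 8], only the subsequences starting at 3, 4 and 5 remain.</p>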
</sec>
</sec>
<sec id="s3">
<label>3</label>
<title>Related Work</title>
<p>Since their introduction by Keogh et al. [<xref ref-type="bibr" rid="ref-1">1</xref>], time series discords have come to be considered one of the best techniques for time series anomaly detection [<xref ref-type="bibr" rid="ref-7">7</xref>].</p>
<p>The original HOTSAX algorithm [<xref ref-type="bibr" rid="ref-1">1</xref>] is based on the SAX (Symbolic Aggregate ApproXimation) transformation [<xref ref-type="bibr" rid="ref-8">8</xref>]. Improvements of HOTSAX include <italic>i</italic>SAX [<xref ref-type="bibr" rid="ref-9">9</xref>] and HOT-<italic>i</italic>SAX [<xref ref-type="bibr" rid="ref-10">10</xref>] (indexable SAX), WAT [<xref ref-type="bibr" rid="ref-11">11</xref>] (Haar wavelets instead of SAX), HashDD [<xref ref-type="bibr" rid="ref-12">12</xref>] (use of a hash table instead of the prefix trie), HDD-MBR [<xref ref-type="bibr" rid="ref-13">13</xref>] (application of R-trees), and BitClusterDiscord [<xref ref-type="bibr" rid="ref-14">14</xref>] (clustering of the bit representation of subsequences). However, these algorithms can discover discords only if the time series fits in main memory and, to the best of our knowledge, have no parallel implementations.</p>
<p>Yankov, Keogh et al. [<xref ref-type="bibr" rid="ref-3">3</xref>] later overcame the main memory size limitation by proposing a disk-aware discord discovery algorithm (DADD) based on the <italic>range discord</italic> concept. For a given range <inline-formula id="ieqn-89">
<alternatives><inline-graphic xlink:href="ieqn-89.png"/><tex-math id="tex-ieqn-89"><![CDATA[$r$]]></tex-math><mml:math id="mml-ieqn-89"><mml:mi>r</mml:mi></mml:math>
</alternatives></inline-formula>, DADD finds all discords at a distance of at least <inline-formula id="ieqn-90">
<alternatives><inline-graphic xlink:href="ieqn-90.png"/><tex-math id="tex-ieqn-90"><![CDATA[$r$]]></tex-math><mml:math id="mml-ieqn-90"><mml:mi>r</mml:mi></mml:math>
</alternatives></inline-formula> from their nearest neighbor. The algorithm performs in two phases, namely the candidate selection and discord refinement, with each phase requiring one linear scan through the time series on the disk.</p>
<p>Two works on the parallelization of DADD are worth noting. The DDD (Distributed Discord Discovery) algorithm [<xref ref-type="bibr" rid="ref-15">15</xref>] parallelizes DADD through a Spark cluster [<xref ref-type="bibr" rid="ref-16">16</xref>] and HDFS (Hadoop Distributed File System) [<xref ref-type="bibr" rid="ref-17">17</xref>]. DDD distributes the time series across the HDFS cluster and processes each partition in the memory of a computing node. Unlike DADD, DDD computes distances without using an upper bound for early abandoning, which would have improved the algorithm&#x2019;s performance.</p>
<p>The PDD (Parallel Discord Discovery) algorithm [<xref ref-type="bibr" rid="ref-18">18</xref>] also uses a Spark cluster but transmits a subsequence and its non-trivial matches to one or more computing nodes to calculate the distance between them. A batch of contiguous subsequences is transmitted and processed together to reduce the message passing overhead. PDD does not scale well, since intensive message passing between the cluster nodes significantly degrades the algorithm&#x2019;s performance as the number of nodes increases.</p>
<p>In subsequent work [<xref ref-type="bibr" rid="ref-4">4</xref>], Yankov, Keogh <italic>et al</italic>. discussed a parallel version of DADD based on the MapReduce paradigm (hereinafter referred to as MR-DADD), whose basic idea is as follows. Let the input time series <inline-formula id="ieqn-91">
<alternatives><inline-graphic xlink:href="ieqn-91.png"/><tex-math id="tex-ieqn-91"><![CDATA[$T$]]></tex-math><mml:math id="mml-ieqn-91"><mml:mi>T</mml:mi></mml:math>
</alternatives></inline-formula> be partitioned evenly across <inline-formula id="ieqn-92">
<alternatives><inline-graphic xlink:href="ieqn-92.png"/><tex-math id="tex-ieqn-92"><![CDATA[$P$]]></tex-math><mml:math id="mml-ieqn-92"><mml:mi>P</mml:mi></mml:math>
</alternatives></inline-formula> cluster nodes. Each node performs the selection phase on its own partition with the same <inline-formula id="ieqn-93">
<alternatives><inline-graphic xlink:href="ieqn-93.png"/><tex-math id="tex-ieqn-93"><![CDATA[$r$]]></tex-math><mml:math id="mml-ieqn-93"><mml:mi>r</mml:mi></mml:math>
</alternatives></inline-formula> parameter and produces a distinct candidate set <inline-formula id="ieqn-94">
<alternatives><inline-graphic xlink:href="ieqn-94.png"/><tex-math id="tex-ieqn-94"><![CDATA[${{\rm {\cal C}}^i}$]]></tex-math><mml:math id="mml-ieqn-94"><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mi>i</mml:mi></mml:msup></mml:mrow></mml:math>
</alternatives></inline-formula>. Then the combined candidate set <inline-formula id="ieqn-95">
<alternatives><inline-graphic xlink:href="ieqn-95.png"/><tex-math id="tex-ieqn-95"><![CDATA[${{\rm {\cal C}}_P}$]]></tex-math><mml:math id="mml-ieqn-95"><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mi>P</mml:mi></mml:msub></mml:mrow></mml:math>
</alternatives></inline-formula> is constructed as <inline-formula id="ieqn-96">
<alternatives><inline-graphic xlink:href="ieqn-96.png"/><tex-math id="tex-ieqn-96"><![CDATA[${{\rm {\cal C}}_P} = \cup _{i = 1}^P{{\rm {\cal C}}^i}$]]></tex-math><mml:math id="mml-ieqn-96"><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mi>P</mml:mi></mml:msub></mml:mrow><mml:mo>&#x003D;</mml:mo><mml:msubsup><mml:mo>&#x222A;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>P</mml:mi></mml:msubsup><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mi>i</mml:mi></mml:msup></mml:mrow></mml:math>
</alternatives></inline-formula> and transmitted to each cluster node. Next, each node performs the refinement phase on its own partition, taking <inline-formula id="ieqn-97">
<alternatives><inline-graphic xlink:href="ieqn-97.png"/><tex-math id="tex-ieqn-97"><![CDATA[${{\rm {\cal C}}_P}$]]></tex-math><mml:math id="mml-ieqn-97"><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mi>P</mml:mi></mml:msub></mml:mrow></mml:math>
</alternatives></inline-formula> as an input, and produces the refined candidate set <inline-formula id="ieqn-98">
<alternatives><inline-graphic xlink:href="ieqn-98.png"/><tex-math id="tex-ieqn-98"><![CDATA[${{\rm {\cal C}}^i}$]]></tex-math><mml:math id="mml-ieqn-98"><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mi>i</mml:mi></mml:msup></mml:mrow></mml:math>
</alternatives></inline-formula>. The final discords are given by the set <inline-formula id="ieqn-99">
<alternatives><inline-graphic xlink:href="ieqn-99.png"/><tex-math id="tex-ieqn-99"><![CDATA[${{\rm {\cal C}}_P} = \cap _{i = 1}^P{{\rm {\cal C}}^i}$]]></tex-math><mml:math id="mml-ieqn-99"><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mi>P</mml:mi></mml:msub></mml:mrow><mml:mo>&#x003D;</mml:mo><mml:msubsup><mml:mo>&#x2229;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>P</mml:mi></mml:msubsup><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mi>i</mml:mi></mml:msup></mml:mrow></mml:math>
</alternatives></inline-formula>. In the experimental evaluation, however, the authors only simulated this scheme on up to eight computers, obtaining a near-linear speedup.</p>
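<p>The MR-DADD scheme can be simulated in a single Python process as sketched below. This is a rough illustration under several assumptions and not the implementation evaluated in [<xref ref-type="bibr" rid="ref-4">4</xref>]: every simulated node reads the shared series through global 0-based start indices instead of holding a physical partition, overlapping windows are treated as trivial matches, no early abandoning is used, and the names <monospace>select</monospace>, <monospace>refine</monospace> and <monospace>mr_dadd</monospace> are hypothetical.</p>
<code language="python"><![CDATA[
```python
import math


def ed(x, y):
    """Euclidean distance between two equal-length subsequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def select(T, n, r, lo, hi):
    """Selection phase over subsequences whose start index lies in [lo, hi)."""
    cands = []
    for i in range(lo, hi):
        keep, is_cand = [], True
        for j in cands:
            if abs(i - j) >= n and ed(T[i:i + n], T[j:j + n]) < r:
                is_cand = False  # both j and i have a neighbour closer than r
            else:
                keep.append(j)
        cands = keep + ([i] if is_cand else [])
    return set(cands)


def refine(T, n, r, lo, hi, cands):
    """Refinement phase: keep candidates with no neighbour closer than r in [lo, hi)."""
    alive = set(cands)
    for i in range(lo, hi):
        for j in list(alive):
            if abs(i - j) >= n and ed(T[i:i + n], T[j:j + n]) < r:
                alive.discard(j)
    return alive


def mr_dadd(T, n, r, P):
    """Simulate MR-DADD on P nodes and return start indices of the discords."""
    N = len(T) - n + 1  # number of subsequences
    size = N // P
    # The last node also takes the remainder of the subsequences.
    bounds = [(k * size, N if k == P - 1 else (k + 1) * size) for k in range(P)]
    local = [select(T, n, r, lo, hi) for lo, hi in bounds]  # C^i per node
    combined = set().union(*local)                          # union of all C^i
    refined = [refine(T, n, r, lo, hi, combined) for lo, hi in bounds]
    return set.intersection(*refined)                       # final discords
```
]]></code>
<p>For the toy series [0, 0, 0, 0, 9, 9, 0, 0, 0, 0] with <italic>n</italic> = 2, <italic>r</italic> = 1 and <italic>P</italic> = 2, the first node retains the refined set {1, 3, 4, 5} and the second node retains {3, 4, 5, 7}. Each node alone keeps one false positive that only the other node's partition can rule out, and the intersection yields {3, 4, 5}.</p>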
<p>To conclude this brief review, we should also mention the matrix profile (MP) concept proposed by Keogh et al. [<xref ref-type="bibr" rid="ref-19">19</xref>]. MP is a data structure that annotates a time series and can be applied to a wide range of time series mining problems, including discord discovery, but at a computational cost of <inline-formula id="ieqn-100">
<alternatives><inline-graphic xlink:href="ieqn-100.png"/><tex-math id="tex-ieqn-100"><![CDATA[${\rm {\rm O}}\left( {{m^2}} \right)$]]></tex-math><mml:math id="mml-ieqn-100"><mml:mrow><mml:mrow><mml:mi mathvariant="normal">O</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msup><mml:mi>m</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</alternatives></inline-formula> where <inline-formula id="ieqn-101">
<alternatives><inline-graphic xlink:href="ieqn-101.png"/><tex-math id="tex-ieqn-101"><![CDATA[$m$]]></tex-math><mml:math id="mml-ieqn-101"><mml:mi>m</mml:mi></mml:math>
</alternatives></inline-formula> is the time series length [<xref ref-type="bibr" rid="ref-19">19</xref>,<xref ref-type="bibr" rid="ref-20">20</xref>]. Recent parallel algorithms for MP computation include GPU-STAMP [<xref ref-type="bibr" rid="ref-19">19</xref>] and MP-HPC [<xref ref-type="bibr" rid="ref-21">21</xref>], which target graphics processors through CUDA (Compute Unified Device Architecture) and computer clusters through MPI (Message Passing Interface), respectively.</p>
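<p>To make the quadratic cost concrete, a brute-force computation of nearest-neighbor distances for all subsequences can be sketched in Python as follows. This is a simplified illustration and not one of the cited MP algorithms: it uses the raw Euclidean distance and excludes every overlapping window, both of which are simplifying assumptions, and the name <monospace>naive_matrix_profile</monospace> is hypothetical.</p>
<code language="python"><![CDATA[
```python
import math


def naive_matrix_profile(T, n):
    """Distance from each length-n subsequence to its nearest non-overlapping neighbour."""
    N = len(T) - n + 1  # number of subsequences
    profile = [math.inf] * N
    for i in range(N):
        for j in range(N):
            # Assumption: every window overlapping with i is excluded.
            if abs(i - j) >= n:
                d = math.sqrt(sum((T[i + k] - T[j + k]) ** 2 for k in range(n)))
                profile[i] = min(profile[i], d)
    return profile
```
]]></code>
<p>The position of the largest value in the returned list corresponds to the top discord. The two nested loops over all subsequences are what make the cost quadratic in the series length.</p>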
</sec>
<sec id="s4">
<label>4</label>
<title>Discords Discovery on Computer Cluster with Many-Core Accelerators</title>
<p>Our parallelization employs two levels of parallelism: across cluster nodes and among the threads of a single node. We implemented the first level through partitioning of the input time series and MPI, and the second level through OpenMP. Within a single node, we use a matrix representation of the data to parallelize computations effectively through OpenMP. Below, we describe how these ideas are implemented.</p>
<sec id="s4_1">
<label>4.1</label>
<title>Time Series Representation</title>
<p>To provide parallelism at the level of the cluster nodes, we perform time series partitioning across the nodes as follows. Let <inline-formula id="ieqn-102">
<alternatives><inline-graphic xlink:href="ieqn-102.png"/><tex-math id="tex-ieqn-102"><![CDATA[$P$]]></tex-math><mml:math id="mml-ieqn-102"><mml:mi>P</mml:mi></mml:math>
</alternatives></inline-formula> be the number of nodes in the cluster; then the <inline-formula id="ieqn-103">
<alternatives><inline-graphic xlink:href="ieqn-103.png"/><tex-math id="tex-ieqn-103"><![CDATA[$k$]]></tex-math><mml:math id="mml-ieqn-103"><mml:mi>k</mml:mi></mml:math>
</alternatives></inline-formula>-th partition (<inline-formula id="ieqn-104">
<alternatives><inline-graphic xlink:href="ieqn-104.png"/><tex-math id="tex-ieqn-104"><![CDATA[$0 \le k \le P - 1$]]></tex-math><mml:math id="mml-ieqn-104"><mml:mn>0</mml:mn><mml:mo>&#x2264;</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x2264;</mml:mo><mml:mi>P</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:math>
</alternatives></inline-formula>) of the time series is defined as <inline-formula id="ieqn-105">
<alternatives><inline-graphic xlink:href="ieqn-105.png"/><tex-math id="tex-ieqn-105"><![CDATA[${T_{start,\; len}}$]]></tex-math><mml:math id="mml-ieqn-105"><mml:mrow><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>t</mml:mi><mml:mo>,</mml:mo><mml:mspace width="thickmathspace"></mml:mspace><mml:mi>l</mml:mi><mml:mi>e</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math>
</alternatives></inline-formula> where</p>
<p><disp-formula id="eqn-4">
<label>(4)</label>
<alternatives>
<graphic mimetype="image" mime-subtype="png" xlink:href="eqn-4.png"/><tex-math id="tex-eqn-4"><![CDATA[$$start = k\cdot \left\lfloor\displaystyle{N \over P}\right\rfloor + 1;\; len = \left\{ {\matrix{ {\left\lfloor\displaystyle{N \over P}\right\rfloor + n - 1 + \left( {N\; {\rm mod}\; P} \right),\; \; k = P - 1} \cr {\left\lfloor\displaystyle{N \over P}\right\rfloor + n - 1,\; \; otherwise.} \cr } } \right..$$]]></tex-math><mml:math id="mml-eqn-4" display="block"><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>r</mml:mi><mml:mi>t</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mi>k</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mrow><mml:mo>&#x230A;</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mi>N</mml:mi><mml:mi>P</mml:mi></mml:mfrac></mml:mrow></mml:mstyle><mml:mo>&#x230B;</mml:mo></mml:mrow><mml:mo>&#x002B;</mml:mo><mml:mn>1</mml:mn><mml:mo>;</mml:mo><mml:mspace width="thickmathspace"></mml:mspace><mml:mi>l</mml:mi><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mrow><mml:mrow><mml:mo>&#x230A;</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mi>N</mml:mi><mml:mi>P</mml:mi></mml:mfrac></mml:mrow></mml:mstyle><mml:mo>&#x230B;</mml:mo></mml:mrow><mml:mo>&#x002B;</mml:mo><mml:mi>n</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x002B;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>N</mml:mi><mml:mspace width="thickmathspace"></mml:mspace><mml:mrow><mml:mi mathvariant="normal">m</mml:mi><mml:mi mathvariant="normal">o</mml:mi><mml:mi mathvariant="normal">d</mml:mi></mml:mrow><mml:mspace width="thickmathspace"></mml:mspace><mml:mi>P</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mspace width="thickmathspace"></mml:mspace><mml:mspace 
width="thickmathspace"></mml:mspace><mml:mi>k</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mi>P</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mrow><mml:mo>&#x230A;</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mi>N</mml:mi><mml:mi>P</mml:mi></mml:mfrac></mml:mrow></mml:mstyle><mml:mo>&#x230B;</mml:mo></mml:mrow><mml:mo>&#x002B;</mml:mo><mml:mi>n</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mspace width="thickmathspace"></mml:mspace><mml:mspace width="thickmathspace"></mml:mspace><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>w</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>e</mml:mi><mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:math>
</alternatives></disp-formula></p>
<p>This means that the head of every partition except the first overlaps with the tail of the previous partition by <inline-formula id="ieqn-106">
<alternatives><inline-graphic xlink:href="ieqn-106.png"/><tex-math id="tex-ieqn-106"><![CDATA[$n - 1$]]></tex-math><mml:math id="mml-ieqn-106"><mml:mi>n</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:math>
</alternatives></inline-formula> data points. This overlap prevents the loss of subsequences at the junction of two neighboring partitions. To simplify the presentation of the algorithm, hereinafter in this section, we use the symbol <inline-formula id="ieqn-107">
<alternatives><inline-graphic xlink:href="ieqn-107.png"/><tex-math id="tex-ieqn-107"><![CDATA[$T$]]></tex-math><mml:math id="mml-ieqn-107"><mml:mi>T</mml:mi></mml:math>
</alternatives></inline-formula> and the related notions introduced above to refer to the partition on the current node rather than to the whole input time series.</p>
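<p>As an illustration only, the partitioning rule of Eq. (4) can be sketched in Python as follows. The function name and the sample values are hypothetical and are not part of the original implementation. The sketch keeps the 1-based start positions of Eq. (4); if N is read as the number of subsequences of length n, the last partition ends at data point N + n - 1.</p>
<code language="python"><![CDATA[
```python
def partition_bounds(N, P, n, k):
    """Return (start, length) of the k-th partition per Eq. (4); start is 1-based."""
    if not 0 <= k <= P - 1:
        raise ValueError("k must satisfy 0 <= k <= P - 1")
    base = N // P              # floor(N / P)
    start = k * base + 1
    length = base + n - 1      # n - 1 extra points are shared with the next partition
    if k == P - 1:
        length += N % P        # the last partition absorbs the remainder
    return start, length


# Illustrative values only (assumption): N = 10, P = 3 nodes, discord length n = 3.
N, P, n = 10, 3, 3
bounds = [partition_bounds(N, P, n, k) for k in range(P)]
```
]]></code>
<p>With these sample values, consecutive partitions share exactly n - 1 = 2 data points, which is the overlap described above.</p>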
<p>The time series partition is stored as a matrix of aligned subsequences to enable computations over aligned data with as many auto-vectorizable loops as possible. We avoid unaligned memory access since it can cause inefficient vectorization due to the time overhead of loop peeling [<xref ref-type="bibr" rid="ref-22">22</xref>].</p>
<p>Let us denote the number of floats stored in the VPU (vector processing unit of the many-core accelerator) by <inline-formula id="ieqn-108">
<alternatives><inline-graphic xlink:href="ieqn-108.png"/><tex-math id="tex-ieqn-108"><![CDATA[$widt{h_{VPU}}$]]></tex-math><mml:math id="mml-ieqn-108"><mml:mi>w</mml:mi><mml:mi>i</mml:mi><mml:mi>d</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>V</mml:mi><mml:mi>P</mml:mi><mml:mi>U</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math>
</alternatives></inline-formula>. If the discord length <inline-formula id="ieqn-109">
<alternatives><inline-graphic xlink:href="ieqn-109.png"/><tex-math id="tex-ieqn-109"><![CDATA[$n$]]></tex-math><mml:math id="mml-ieqn-109"><mml:mi>n</mml:mi></mml:math>
</alternatives></inline-formula> is not a multiple of <inline-formula id="ieqn-110">
<alternatives><inline-graphic xlink:href="ieqn-110.png"/><tex-math id="tex-ieqn-110"><![CDATA[$widt{h_{VPU}}$]]></tex-math><mml:math id="mml-ieqn-110"><mml:mi>w</mml:mi><mml:mi>i</mml:mi><mml:mi>d</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>V</mml:mi><mml:mi>P</mml:mi><mml:mi>U</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math>
</alternatives></inline-formula>, then each subsequence is padded with zeroes where the number of zeroes is calculated as <inline-formula id="ieqn-111">
<alternatives><inline-graphic xlink:href="ieqn-111.png"/><tex-math id="tex-ieqn-111"><![CDATA[$pad = widt{h_{VPU}} - \left( {n\; {\rm mod\; }widt{h_{VPU}}} \right)$]]></tex-math><mml:math id="mml-ieqn-111"><mml:mi>p</mml:mi><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mi>w</mml:mi><mml:mi>i</mml:mi><mml:mi>d</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>V</mml:mi><mml:mi>P</mml:mi><mml:mi>U</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>n</mml:mi><mml:mspace width="thickmathspace"></mml:mspace><mml:mrow><mml:mi mathvariant="normal">m</mml:mi><mml:mi mathvariant="normal">o</mml:mi><mml:mi mathvariant="normal">d</mml:mi><mml:mspace width="thickmathspace"></mml:mspace></mml:mrow><mml:mi>w</mml:mi><mml:mi>i</mml:mi><mml:mi>d</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>V</mml:mi><mml:mi>P</mml:mi><mml:mi>U</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</alternatives></inline-formula>. Thus, the aligned (and previously z-normalized) subsequence <inline-formula id="ieqn-112">
<alternatives><inline-graphic xlink:href="ieqn-112.png"/><tex-math id="tex-ieqn-112"><![CDATA[${\tilde T_{i,n}}$]]></tex-math><mml:math id="mml-ieqn-112"><mml:mrow><mml:msub><mml:mrow><mml:mover><mml:mi>T</mml:mi><mml:mo stretchy="false">&#x007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math>
</alternatives></inline-formula> is defined as follows:</p>
<p><disp-formula id="eqn-5">
<label>(5)</label>
<alternatives>
<graphic mimetype="image" mime-subtype="png" xlink:href="eqn-5.png"/><tex-math id="tex-eqn-5"><![CDATA[$${\tilde T_{i,n}} = \left\{ {\matrix{ {({{\tilde T}_{i,n}},\overbrace{0, \ldots ,0}^{pad}),} & {if\;n\;{\rm mod}\;widt{h_{VPU}}\; > \;0} \cr {{\tilde T_{i,n}},} & {otherwise.} \cr } } \right..$$]]></tex-math><mml:math id="mml-eqn-5" display="block"><mml:mrow><mml:msub><mml:mrow><mml:mover><mml:mi>T</mml:mi><mml:mo stretchy="false">&#x007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x003D;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mover><mml:mi>T</mml:mi><mml:mo stretchy="false">&#x007E;</mml:mo></mml:mover></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mover><mml:mrow><mml:mover><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mo>&#x23DE;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>p</mml:mi><mml:mi>a</mml:mi><mml:mi>d</mml:mi></mml:mrow></mml:mover><mml:mo stretchy="false">)</mml:mo><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mi>i</mml:mi><mml:mi>f</mml:mi><mml:mspace width="thickmathspace"></mml:mspace><mml:mi>n</mml:mi><mml:mspace width="thickmathspace"></mml:mspace><mml:mrow><mml:mi mathvariant="normal">m</mml:mi><mml:mi mathvariant="normal">o</mml:mi><mml:mi mathvariant="normal">d</mml:mi></mml:mrow><mml:mspace width="thickmathspace"></mml:mspace><mml:mi>w</mml:mi><mml:mi>i</mml:mi><mml:mi>d</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>V</mml:mi><mml:mi>P</mml:mi><mml:mi>U</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mspace width="thickmathspace"></mml:mspace><mml:mrow><mml:mo>&#x003E;</mml:mo></mml:mrow><mml:mspace width="thickmathspace"></mml:mspace><mml:mn>0</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mover><mml:mi>T</mml:mi><mml:mo stretchy="false">&#x007E;</mml:mo></mml:mover></mml:mrow></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>,</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:mrow><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>r</mml:mi><mml:mi>w</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>e</mml:mi><mml:mo>.</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow><mml:mo>.</mml:mo></mml:math>
</alternatives></disp-formula></p>
<p>The <italic>subsequence matrix</italic> <inline-formula id="ieqn-113">
<alternatives><inline-graphic xlink:href="ieqn-113.png"/><tex-math id="tex-ieqn-113"><![CDATA[$S_T^n \in {{\rm {\mathbb R}}^{N \times \left( {n + pad} \right)}}\;$]]></tex-math><mml:math id="mml-ieqn-113"><mml:msubsup><mml:mi>S</mml:mi><mml:mi>T</mml:mi><mml:mi>n</mml:mi></mml:msubsup><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow></mml:mrow><mml:mrow><mml:mi>N</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>n</mml:mi><mml:mo>&#x002B;</mml:mo><mml:mi>p</mml:mi><mml:mi>a</mml:mi><mml:mi>d</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:mrow><mml:mspace width="thickmathspace"></mml:mspace></mml:math>
</alternatives></inline-formula> is defined as</p>
<p><disp-formula id="eqn-6">
<label>(6)</label>
<alternatives>
<graphic mimetype="image" mime-subtype="png" xlink:href="eqn-6.png"/><tex-math id="tex-eqn-6"><![CDATA[$$S_T^n\left( {i,j} \right) = {\tilde t_{i + j - 1}}.$$]]></tex-math><mml:math id="mml-eqn-6" display="block"><mml:msubsup><mml:mi>S</mml:mi><mml:mi>T</mml:mi><mml:mi>n</mml:mi></mml:msubsup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x003D;</mml:mo><mml:mrow><mml:msub><mml:mrow><mml:mover><mml:mi>t</mml:mi><mml:mo stretchy="false">&#x007E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x002B;</mml:mo><mml:mi>j</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow><mml:mo>.</mml:mo></mml:math>
</alternatives></disp-formula></p>
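<p>The construction of the padded subsequence matrix from Eqs. (5) and (6) can be illustrated by the following Python sketch. It is a minimal sketch under stated assumptions and not the authors' implementation: the function names are hypothetical, each row is z-normalized independently (our reading of the tilde notation), a constant subsequence is mapped to zeros, and plain Python lists stand in for aligned memory.</p>
<code language="python"><![CDATA[
```python
import math


def znormalize(x):
    """Z-normalize a subsequence (zero mean, unit standard deviation)."""
    mean = sum(x) / len(x)
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / len(x))
    if std == 0:
        return [0.0] * len(x)   # assumption: a constant subsequence maps to zeros
    return [(v - mean) / std for v in x]


def pad_size(n, width_vpu):
    """Number of trailing zeroes so that n + pad is a multiple of width_vpu."""
    rem = n % width_vpu
    return width_vpu - rem if rem > 0 else 0


def subsequence_matrix(t, n, width_vpu):
    """Rows are z-normalized subsequences of length n, each padded with zeroes."""
    pad = pad_size(n, width_vpu)
    return [znormalize(t[i:i + n]) + [0.0] * pad
            for i in range(len(t) - n + 1)]


# Illustrative values only (assumption): 6 points, n = 3, VPU width of 4 floats.
S = subsequence_matrix([1, 2, 3, 4, 5, 6], n=3, width_vpu=4)
```
]]></code>
<p>In this example every row has length n + pad = 4, so each row starts at a multiple of the assumed vector width.</p>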
</sec>
<sec id="s4_2">
<label>4.2</label>
<title>Internal Data Layout</title>
<p>The parallel algorithm employs the data structures depicted in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>. In defining the structures that store data in the main memory of a cluster node, we assume that each structure is shared by all threads on which the algorithm runs and that each thread processes its own data segment independently. Let us denote the number of threads employed by the algorithm on a cluster node by <inline-formula id="ieqn-114">
<alternatives><inline-graphic xlink:href="ieqn-114.png"/><tex-math id="tex-ieqn-114"><![CDATA[${p}$]]></tex-math><mml:math id="mml-ieqn-114"><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:math>
</alternatives></inline-formula>, and let <inline-formula id="ieqn-115">
<alternatives><inline-graphic xlink:href="ieqn-115.png"/><tex-math id="tex-ieqn-115"><![CDATA[${iam}$]]></tex-math><mml:math id="mml-ieqn-115"><mml:mrow><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi></mml:mrow></mml:math>
</alternatives></inline-formula> (<inline-formula id="ieqn-116">
<alternatives><inline-graphic xlink:href="ieqn-116.png"/><tex-math id="tex-ieqn-116"><![CDATA[$0 \le iam \le p - 1$]]></tex-math><mml:math id="mml-ieqn-116"><mml:mn>0</mml:mn><mml:mo>&#x2264;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi></mml:mrow><mml:mo>&#x2264;</mml:mo><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:math>
</alternatives></inline-formula>) denote the number of the current thread.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>Data layout of the algorithm</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="fig-1.png"/>
</fig>
<p>The set of discords <inline-formula id="ieqn-117">
<alternatives><inline-graphic xlink:href="ieqn-117.png"/><tex-math id="tex-ieqn-117"><![CDATA[${\rm {\cal C}}$]]></tex-math><mml:math id="mml-ieqn-117"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow></mml:math>
</alternatives></inline-formula> is implemented as an object with two basic attributes, namely <italic>candidate index</italic> and <italic>candidate body</italic>, which store the indices of all potential discord subsequences and the subsequences themselves, respectively.</p>
<p>Let us denote the ratio of the number of candidates selected at a cluster node to the number of all subsequences of the time series by <inline-formula id="ieqn-118">
<alternatives><inline-graphic xlink:href="ieqn-118.png"/><tex-math id="tex-ieqn-118"><![CDATA[$\xi$]]></tex-math><mml:math id="mml-ieqn-118"><mml:mi>&#x03BE;</mml:mi></mml:math>
</alternatives></inline-formula>. The exact value of the <inline-formula id="ieqn-119">
<alternatives><inline-graphic xlink:href="ieqn-119.png"/><tex-math id="tex-ieqn-119"><![CDATA[$\xi$]]></tex-math><mml:math id="mml-ieqn-119"><mml:mi>&#x03BE;</mml:mi></mml:math>
</alternatives></inline-formula> parameter is chosen empirically. In our experiments, <inline-formula id="ieqn-120">
<alternatives><inline-graphic xlink:href="ieqn-120.png"/><tex-math id="tex-ieqn-120"><![CDATA[$\xi = 0.01$]]></tex-math><mml:math id="mml-ieqn-120"><mml:mi>&#x03BE;</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>0.01</mml:mn></mml:math>
</alternatives></inline-formula> was enough to store all candidates. Thus, we denote the number of candidates as <inline-formula id="ieqn-121">
<alternatives><inline-graphic xlink:href="ieqn-121.png"/><tex-math id="tex-ieqn-121"><![CDATA[$L = \left\lceil\xi \cdot N\right\rceil$]]></tex-math><mml:math id="mml-ieqn-121"><mml:mi>L</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mrow><mml:mo>&#x2308;</mml:mo><mml:mi>&#x03BE;</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mi>N</mml:mi><mml:mo>&#x2309;</mml:mo></mml:mrow></mml:math>
</alternatives></inline-formula> and assume that <inline-formula id="ieqn-122">
<alternatives><inline-graphic xlink:href="ieqn-122.png"/><tex-math id="tex-ieqn-122"><![CDATA[$L \ll N$]]></tex-math><mml:math id="mml-ieqn-122"><mml:mi>L</mml:mi><mml:mo>&#x226A;</mml:mo><mml:mi>N</mml:mi></mml:math>
</alternatives></inline-formula>.</p>
<p>The <italic>candidate index</italic> is organized as a matrix <inline-formula id="ieqn-123">
<alternatives><inline-graphic xlink:href="ieqn-123.png"/><tex-math id="tex-ieqn-123"><![CDATA[${\rm {\cal C}}.index \in {{\rm {\mathbb N}}^{p \times L}}$]]></tex-math><mml:math id="mml-ieqn-123"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>.</mml:mo><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">N</mml:mi></mml:mrow></mml:mrow></mml:mrow><mml:mrow><mml:mi>p</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>L</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math>
</alternatives></inline-formula>, which stores indices of candidates in the subsequence matrix <inline-formula id="ieqn-124">
<alternatives><inline-graphic xlink:href="ieqn-124.png"/><tex-math id="tex-ieqn-124"><![CDATA[$S_T^n$]]></tex-math><mml:math id="mml-ieqn-124"><mml:msubsup><mml:mi>S</mml:mi><mml:mi>T</mml:mi><mml:mi>n</mml:mi></mml:msubsup></mml:math>
</alternatives></inline-formula> found by each thread, i.e., the <inline-formula id="ieqn-125">
<alternatives><inline-graphic xlink:href="ieqn-125.png"/><tex-math id="tex-ieqn-125"><![CDATA[$i$]]></tex-math><mml:math id="mml-ieqn-125"><mml:mi>i</mml:mi></mml:math>
</alternatives></inline-formula>-th row keeps the indices of the candidates found by the <inline-formula id="ieqn-126">
<alternatives><inline-graphic xlink:href="ieqn-126.png"/><tex-math id="tex-ieqn-126"><![CDATA[$i$]]></tex-math><mml:math id="mml-ieqn-126"><mml:mi>i</mml:mi></mml:math>
</alternatives></inline-formula>-th thread. Initially, the candidate index is filled with NULL values.</p>
<p>To provide fast access to the candidate index during the selection phase, it is implemented as a deque (double-ended queue) with three attributes, namely <inline-formula id="ieqn-127">
<alternatives><inline-graphic xlink:href="ieqn-127.png"/><tex-math id="tex-ieqn-127"><![CDATA[$count$]]></tex-math><mml:math id="mml-ieqn-127"><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>u</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi></mml:math>
</alternatives></inline-formula>, <inline-formula id="ieqn-128">
<alternatives><inline-graphic xlink:href="ieqn-128.png"/><tex-math id="tex-ieqn-128"><![CDATA[$head$]]></tex-math><mml:math id="mml-ieqn-128"><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:mi>d</mml:mi></mml:math>
</alternatives></inline-formula>, and <inline-formula id="ieqn-129">
<alternatives><inline-graphic xlink:href="ieqn-129.png"/><tex-math id="tex-ieqn-129"><![CDATA[$tail$]]></tex-math><mml:math id="mml-ieqn-129"><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>i</mml:mi><mml:mi>l</mml:mi></mml:math>
</alternatives></inline-formula>. The <italic>deque count</italic> is an array <inline-formula id="ieqn-130">
<alternatives><inline-graphic xlink:href="ieqn-130.png"/><tex-math id="tex-ieqn-130"><![CDATA[${\rm {\cal C}}.count \in {{\rm {\mathbb N}}^p}$]]></tex-math><mml:math id="mml-ieqn-130"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>.</mml:mo><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>u</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">N</mml:mi></mml:mrow></mml:mrow></mml:mrow><mml:mi>p</mml:mi></mml:msup></mml:mrow></mml:math>
</alternatives></inline-formula>, which for each thread keeps the number of non-NULL elements in the respective row of the candidate index matrix. The <italic>deque head</italic> and <italic>tail</italic> are arrays <inline-formula id="ieqn-131">
<alternatives><inline-graphic xlink:href="ieqn-131.png"/><tex-math id="tex-ieqn-131"><![CDATA[${\rm {\cal C}}.head$]]></tex-math><mml:math id="mml-ieqn-131"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>.</mml:mo><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:mi>d</mml:mi></mml:math>
</alternatives></inline-formula> and <inline-formula id="ieqn-132">
<alternatives><inline-graphic xlink:href="ieqn-132.png"/><tex-math id="tex-ieqn-132"><![CDATA[${\rm {\cal C}}.tail \in {{\rm {\mathbb N}}^p}$]]></tex-math><mml:math id="mml-ieqn-132"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>.</mml:mo><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>i</mml:mi><mml:mi>l</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">N</mml:mi></mml:mrow></mml:mrow></mml:mrow><mml:mi>p</mml:mi></mml:msup></mml:mrow></mml:math>
</alternatives></inline-formula>, respectively, which represent second-level indices that, for each thread, keep the column number in <inline-formula id="ieqn-133">
<alternatives><inline-graphic xlink:href="ieqn-133.png"/><tex-math id="tex-ieqn-133"><![CDATA[${\rm {\cal C}}.index$]]></tex-math><mml:math id="mml-ieqn-133"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>.</mml:mo><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>x</mml:mi></mml:math>
</alternatives></inline-formula> of the most recent NULL value and of the least recent non-NULL value, respectively.</p>
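<p>One possible reading of this layout is sketched below in Python. This is a hypothetical illustration and not the authors' data structure: we assume that each row of the candidate index behaves as a fixed-capacity ring buffer, that the head is the column of the next free (NULL) slot, and that the tail is the column of the oldest stored candidate. The class and method names are ours.</p>
<code language="python"><![CDATA[
```python
NULL = None  # stand-in for the NULL marker of the candidate index


class CandidateIndex:
    """Per-thread candidate index with count, head, and tail arrays (a sketch)."""

    def __init__(self, p, L):
        self.L = L
        self.index = [[NULL] * L for _ in range(p)]  # C.index, p rows of L columns
        self.count = [0] * p   # C.count: non-NULL elements per row
        self.head = [0] * p    # C.head: next free column (assumption)
        self.tail = [0] * p    # C.tail: column of the oldest candidate (assumption)

    def push(self, iam, pos):
        """Thread iam records the subsequence index pos as a candidate."""
        if self.count[iam] == self.L:
            raise OverflowError("candidate index row is full")
        h = self.head[iam]
        self.index[iam][h] = pos
        self.head[iam] = (h + 1) % self.L
        self.count[iam] += 1

    def pop_oldest(self, iam):
        """Thread iam discards and returns its least recent candidate."""
        if self.count[iam] == 0:
            raise IndexError("candidate index row is empty")
        t = self.tail[iam]
        pos = self.index[iam][t]
        self.index[iam][t] = NULL
        self.tail[iam] = (t + 1) % self.L
        self.count[iam] -= 1
        return pos


# Illustrative values only (assumption): p = 2 threads, capacity L = 4.
ci = CandidateIndex(p=2, L=4)
ci.push(0, 17)
ci.push(0, 42)
ci.push(1, 5)
oldest = ci.pop_oldest(0)
```
]]></code>
<p>Because each thread touches only its own row and its own entries of the three arrays, the sketch needs no locking, which matches the assumption that each thread processes its own data segment independently.</p>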
<p>Let <inline-formula id="ieqn-134">
<alternatives><inline-graphic xlink:href="ieqn-134.png"/><tex-math id="tex-ieqn-134"><![CDATA[$H$]]></tex-math><mml:math id="mml-ieqn-134"><mml:mi>H</mml:mi></mml:math>
</alternatives></inline-formula> (<inline-formula id="ieqn-135">
<alternatives><inline-graphic xlink:href="ieqn-135.png"/><tex-math id="tex-ieqn-135"><![CDATA[$H < L \ll N$]]></tex-math><mml:math id="mml-ieqn-135"><mml:mi>H</mml:mi><mml:mo>&#x003C;</mml:mo><mml:mi>L</mml:mi><mml:mo>&#x226A;</mml:mo><mml:mi>N</mml:mi></mml:math>
</alternatives></inline-formula>) be the number of candidates selected at a cluster node during the algorithm&#x2019;s first phase. Then the <italic>candidate body</italic> is the matrix <inline-formula id="ieqn-136">
<alternatives><inline-graphic xlink:href="ieqn-136.png"/><tex-math id="tex-ieqn-136"><![CDATA[${\rm {\cal C}}.cand \in {{\rm {\mathbb R}}^{H \times n}}$]]></tex-math><mml:math id="mml-ieqn-136"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>.</mml:mo><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow></mml:mrow></mml:mrow><mml:mrow><mml:mi>H</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math>
</alternatives></inline-formula>, which stores the candidate subsequences themselves. The candidate body is accompanied by an array <inline-formula id="ieqn-137">
<alternatives><inline-graphic xlink:href="ieqn-137.png"/><tex-math id="tex-ieqn-137"><![CDATA[${\rm {\cal C}}.pos \in {{\rm {\mathbb N}}^H}$]]></tex-math><mml:math id="mml-ieqn-137"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>.</mml:mo><mml:mi>p</mml:mi><mml:mi>o</mml:mi><mml:mi>s</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">N</mml:mi></mml:mrow></mml:mrow></mml:mrow><mml:mi>H</mml:mi></mml:msup></mml:mrow></mml:math>
</alternatives></inline-formula>, which stores starting points of candidate subsequences in the input time series.</p>
<p>After the selection phase, all the nodes exchange the candidates found to construct the combined candidate set, so that at each cluster node the candidate body contains potential discords from all the nodes. In the second phase, the algorithm refines the combined candidate set by comparing the parameter <inline-formula id="ieqn-138">
<alternatives><inline-graphic xlink:href="ieqn-138.png"/><tex-math id="tex-ieqn-138"><![CDATA[$r$]]></tex-math><mml:math id="mml-ieqn-138"><mml:mi>r</mml:mi></mml:math>
</alternatives></inline-formula> with the distances between each element of the candidate body and each element of the subsequence matrix.</p>
<p>To parallelize this activity, we process the rows of the subsequence matrix in a segment-wise manner and employ an additional attribute of the candidate body, namely the <italic>bitmap</italic>. The <italic>bitmap</italic> is organized as a matrix <inline-formula id="ieqn-139">
<alternatives><inline-graphic xlink:href="ieqn-139.png"/><tex-math id="tex-ieqn-139"><![CDATA[${\rm {\cal C}}.bitmap \in {{\rm {\cal B}}^{p \times H}}$]]></tex-math><mml:math id="mml-ieqn-139"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>.</mml:mo><mml:mi>b</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>p</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="script">B</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>p</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>H</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math>
</alternatives></inline-formula>, which indicates whether an element of the candidate body has been successfully validated against all elements in a segment of the subsequence matrix. Thus, after the algorithm&#x2019;s second phase, the <inline-formula id="ieqn-140">
<alternatives><inline-graphic xlink:href="ieqn-140.png"/><tex-math id="tex-ieqn-140"><![CDATA[$i$]]></tex-math><mml:math id="mml-ieqn-140"><mml:mi>i</mml:mi></mml:math>
</alternatives></inline-formula>-th element of the candidate body is successfully validated if <inline-formula id="ieqn-141">
<alternatives><inline-graphic xlink:href="ieqn-141.png"/><tex-math id="tex-ieqn-141"><![CDATA[$\wedge _{\ell = 1}^p{\rm {\cal C}}.bitmap\left( {\ell ,i} \right)$]]></tex-math><mml:math id="mml-ieqn-141"><mml:msubsup><mml:mo>&#x2227;</mml:mo><mml:mrow><mml:mi>&#x2113;</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>p</mml:mi></mml:msubsup><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>.</mml:mo><mml:mi>b</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>&#x2113;</mml:mi><mml:mo>,</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</alternatives></inline-formula> is true.</p>
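<p>The final validation rule is a logical AND over the rows of the bitmap, which the following Python sketch illustrates. The function name and the sample bitmap are hypothetical; threads are numbered from 0 here, in line with the thread numbering introduced above.</p>
<code language="python"><![CDATA[
```python
def validated_candidates(bitmap):
    """bitmap[l][i] is True if thread l validated candidate i against its segment.

    A candidate is validated only if every thread validated it.
    """
    p = len(bitmap)
    H = len(bitmap[0])
    return [all(bitmap[l][i] for l in range(p)) for i in range(H)]


# Illustrative values only (assumption): p = 2 threads, H = 3 candidates.
bitmap = [
    [True, True, False],   # results of thread 0
    [True, False, False],  # results of thread 1
]
flags = validated_candidates(bitmap)
```
]]></code>
<p>Since each thread writes only its own row, the rows can be filled concurrently and combined once after the second phase.</p>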
</sec>
<sec id="s4_3">
<label>4.3</label>
<title>Parallel Implementation of the Algorithm</title>
<p>In the implementation, we apply the following parallelization scheme at the level of the cluster nodes. Let the input time series <inline-formula id="ieqn-142">
<alternatives><inline-graphic xlink:href="ieqn-142.png"/><tex-math id="tex-ieqn-142"><![CDATA[$T$]]></tex-math><mml:math id="mml-ieqn-142"><mml:mi>T</mml:mi></mml:math>
</alternatives></inline-formula> be partitioned evenly across <inline-formula id="ieqn-143">
<alternatives><inline-graphic xlink:href="ieqn-143.png"/><tex-math id="tex-ieqn-143"><![CDATA[$P$]]></tex-math><mml:math id="mml-ieqn-143"><mml:mi>P</mml:mi></mml:math>
</alternatives></inline-formula> cluster nodes. Each node performs the selection phase on its own partition with the same threshold parameter <inline-formula id="ieqn-144">
<alternatives><inline-graphic xlink:href="ieqn-144.png"/><tex-math id="tex-ieqn-144"><![CDATA[$r$]]></tex-math><mml:math id="mml-ieqn-144"><mml:mi>r</mml:mi></mml:math>
</alternatives></inline-formula> and produces its own candidate set <inline-formula id="ieqn-145">
<alternatives><inline-graphic xlink:href="ieqn-145.png"/><tex-math id="tex-ieqn-145"><![CDATA[${{\rm {\cal C}}^i}$]]></tex-math><mml:math id="mml-ieqn-145"><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mi>i</mml:mi></mml:msup></mml:mrow></mml:math>
</alternatives></inline-formula>.</p>
<p>Next, in contrast to MR-DADD [<xref ref-type="bibr" rid="ref-4">4</xref>], each node refines its own candidate set <inline-formula id="ieqn-146">
<alternatives><inline-graphic xlink:href="ieqn-146.png"/><tex-math id="tex-ieqn-146"><![CDATA[${{\rm {\cal C}}^i}$]]></tex-math><mml:math id="mml-ieqn-146"><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mi>i</mml:mi></mml:msup></mml:mrow></mml:math>
</alternatives></inline-formula> with respect to the <inline-formula id="ieqn-147">
<alternatives><inline-graphic xlink:href="ieqn-147.png"/><tex-math id="tex-ieqn-147"><![CDATA[$r$]]></tex-math><mml:math id="mml-ieqn-147"><mml:mi>r</mml:mi></mml:math>
</alternatives></inline-formula> value. Indeed, a candidate cannot be a true discord if it is pruned in the refinement phase at even one cluster node. Thus, by the local refinement procedure, we try to reduce each candidate set <inline-formula id="ieqn-148">
<alternatives><inline-graphic xlink:href="ieqn-148.png"/><tex-math id="tex-ieqn-148"><![CDATA[${{\rm {\cal C}}^i}$]]></tex-math><mml:math id="mml-ieqn-148"><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mi>i</mml:mi></mml:msup></mml:mrow></mml:math>
</alternatives></inline-formula> and, in turn, the combined candidate set <inline-formula id="ieqn-149">
<alternatives><inline-graphic xlink:href="ieqn-149.png"/><tex-math id="tex-ieqn-149"><![CDATA[${{\rm {\cal C}}^P} = \cup _{i = 1}^P{{\rm {\cal C}}^i}.$]]></tex-math><mml:math id="mml-ieqn-149"><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mi>P</mml:mi></mml:msup></mml:mrow><mml:mo>&#x003D;</mml:mo><mml:msubsup><mml:mo>&#x222A;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>P</mml:mi></mml:msubsup><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mi>i</mml:mi></mml:msup></mml:mrow><mml:mo>.</mml:mo></mml:math>
</alternatives></inline-formula> In the experiments, this allowed us to reduce the size of the combined candidate set by several times.</p>
<p>Then the combined candidate set <inline-formula id="ieqn-150">
<alternatives><inline-graphic xlink:href="ieqn-150.png"/><tex-math id="tex-ieqn-150"><![CDATA[${{\rm {\cal C}}^P}$]]></tex-math><mml:math id="mml-ieqn-150"><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mi>P</mml:mi></mml:msup></mml:mrow></mml:math>
</alternatives></inline-formula> is constructed and transmitted to each cluster node. Next, each node refines <inline-formula id="ieqn-151">
<alternatives><inline-graphic xlink:href="ieqn-151.png"/><tex-math id="tex-ieqn-151"><![CDATA[${{\rm {\cal C}}^P}$]]></tex-math><mml:math id="mml-ieqn-151"><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mi>P</mml:mi></mml:msup></mml:mrow></mml:math>
</alternatives></inline-formula> over its own partition, and produces the result <inline-formula id="ieqn-152">
<alternatives><inline-graphic xlink:href="ieqn-152.png"/><tex-math id="tex-ieqn-152"><![CDATA[${{\rm {\cal C}}^i}$]]></tex-math><mml:math id="mml-ieqn-152"><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mi>i</mml:mi></mml:msup></mml:mrow></mml:math>
</alternatives></inline-formula>. Finally, the set of true discords is constructed as <inline-formula id="ieqn-153">
<alternatives><inline-graphic xlink:href="ieqn-153.png"/><tex-math id="tex-ieqn-153"><![CDATA[${{\rm {\cal C}}^P} = \cap _{i = 1}^P{{\rm {\cal C}}^i}$]]></tex-math><mml:math id="mml-ieqn-153"><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mi>P</mml:mi></mml:msup></mml:mrow><mml:mo>&#x003D;</mml:mo><mml:msubsup><mml:mo>&#x2229;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>P</mml:mi></mml:msubsup><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mi>i</mml:mi></mml:msup></mml:mrow></mml:math>
</alternatives></inline-formula>.</p>
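<p>The union, per-node refinement, and intersection steps described above can be illustrated with a minimal single-process sketch in Python. It is not the authors' MPI implementation: the helper names (<monospace>sq_dist</monospace>, <monospace>refine</monospace>, <monospace>global_refinement</monospace>), the representation of a partition as a half-open range of subsequence start positions, the absence of normalization, and the toy series are all assumptions made for illustration. Each simulated node first obtains its local discords by brute force (a stand-in for the local selection and refinement), the local sets are united, every node refines the union over its own partition, and the results are intersected.</p>
<code language="python"><![CDATA[
```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def refine(candidates, series, lo, hi, n, r):
    """Keep the candidates with no non-self match closer than r among
    the subsequences whose start positions lie in [lo, hi)."""
    kept = set()
    for c in candidates:
        cand = series[c:c + n]
        pruned = any(
            abs(i - c) >= n and sq_dist(series[i:i + n], cand) < r * r
            for i in range(lo, hi)
        )
        if not pruned:
            kept.add(c)
    return kept


def global_refinement(series, bounds, n, r):
    """Simulate the union / per-node refinement / intersection scheme."""
    # Local discords of each node (brute force over its own partition).
    local_sets = [refine(range(lo, hi), series, lo, hi, n, r)
                  for lo, hi in bounds]
    # Combined candidate set: union of the local sets.
    combined = set().union(*local_sets)
    # Every node refines the combined set over its own partition.
    refined = [refine(combined, series, lo, hi, n, r) for lo, hi in bounds]
    # A true discord must survive on every node.
    return set.intersection(*refined)


# Toy series (hypothetical): a 0/1 pattern with one anomalous value.
series = [0, 1] * 8
series[7] = 9
# Two hypothetical nodes own subsequence starts [0, 7) and [7, 13).
discords = global_refinement(series, [(0, 7), (7, 13)], n=4, r=3)
```
]]></code>
<p>On this toy input, the subsequence starting at position 0 survives on the first node because its only non-self matches there contain the anomalous value, but it is pruned once the second node checks it. This is the situation in which the final intersection is needed.</p>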
<p>The parallel implementations of the candidate selection and refinement phases are depicted in <xref ref-type="table" rid="table-4">Algorithm 2</xref> and <xref ref-type="table" rid="table-5">Algorithm 3</xref>, respectively. To speed up the computations at a cluster node, we omit the square root calculation since this does not change the relative ranking of the candidates (indeed, the square root applied in the <inline-formula id="ieqn-154">
<alternatives><inline-graphic xlink:href="ieqn-154.png"/><tex-math id="tex-ieqn-154"><![CDATA[${\rm ED}$]]></tex-math><mml:math id="mml-ieqn-154"><mml:mrow><mml:mi mathvariant="normal">E</mml:mi><mml:mi mathvariant="normal">D</mml:mi></mml:mrow></mml:math>
</alternatives></inline-formula> function is monotonic and concave).</p>
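<p>The effect of omitting the square root can be checked with a small Python sketch. The function names and sample vectors below are hypothetical and serve only as an illustration: since the square root is monotonically increasing on non-negative numbers, comparing the squared distance with the squared threshold gives the same decision as comparing the distance with the threshold itself, provided the threshold is non-negative.</p>
<code language="python"><![CDATA[
```python
import math


def sq_dist(a, b):
    """Squared Euclidean distance (no square root taken)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def is_match(a, b, r):
    """Threshold test on the Euclidean distance itself."""
    return math.sqrt(sq_dist(a, b)) < r


def is_match_squared(a, b, r):
    """Equivalent test on squared quantities; assumes r is non-negative."""
    return sq_dist(a, b) < r * r


# Hypothetical sample pairs and threshold.
pairs = [
    ([0, 1, 0, 1], [1, 0, 1, 0]),   # squared distance 4
    ([0, 1, 0, 9], [0, 1, 0, 1]),   # squared distance 64
    ([2, 2, 2, 2], [2, 2, 2, 2]),   # squared distance 0
]
r = 3
decisions = [(is_match(a, b, r), is_match_squared(a, b, r)) for a, b in pairs]
```
]]></code>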

<table-wrap id="table-4">
<label>Algorithm 2</label>
<caption>
<title>Parallel Candidate Selection (in <italic>T</italic>, <italic>r</italic>; out <inline-formula id="ieqn-155">
<alternatives><inline-graphic xlink:href="ieqn-155.png"/><tex-math id="tex-ieqn-155"><![CDATA[${\rm {\cal C}}$]]></tex-math><mml:math id="mml-ieqn-155"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow></mml:math>
</alternatives></inline-formula>)</title>
</caption>
<table>
<colgroup>
<col/>
</colgroup>
<tbody>
<tr>
<td>1: <bold>#pragma omp parallel</bold><break/>2: <italic>iam</italic> &#x2190; <bold>omp_get_thread_num()</bold><break/>3: <bold>#pragma omp for</bold><break/>4: <bold>for</bold> <italic>i</italic> <bold>from</bold> 1 <bold>to</bold> <inline-formula id="ieqn-156">
<alternatives><inline-graphic xlink:href="ieqn-156.png"/><tex-math id="tex-ieqn-156"><![CDATA[$N$]]></tex-math><mml:math id="mml-ieqn-156"><mml:mi>N</mml:mi></mml:math>
</alternatives></inline-formula> <bold>do</bold><break/>5: &#x2002;<italic>isCand</italic> &#x2190; TRUE<break/>6: &#x2002;<bold>for</bold> <italic>j</italic> <bold>from</bold> 1 <bold>to</bold> <inline-formula id="ieqn-157">
<alternatives><inline-graphic xlink:href="ieqn-157.png"/><tex-math id="tex-ieqn-157"><![CDATA[${\rm {\cal C}}.tail\left( {iam} \right)$]]></tex-math><mml:math id="mml-ieqn-157"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>.</mml:mo><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>i</mml:mi><mml:mi>l</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</alternatives></inline-formula> <bold>do</bold><break/>7: &#x2002;&#x2002;<bold>if</bold> <inline-formula id="ieqn-158">
<alternatives><inline-graphic xlink:href="ieqn-158.png"/><tex-math id="tex-ieqn-158"><![CDATA[${\rm {\cal C}}.index\left( {iam,j} \right) =$]]></tex-math><mml:math id="mml-ieqn-158"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>.</mml:mo><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x003D;</mml:mo></mml:math>
</alternatives></inline-formula> NULL <bold>or</bold> <inline-formula id="ieqn-159">
<alternatives><inline-graphic xlink:href="ieqn-159.png"/><tex-math id="tex-ieqn-159"><![CDATA[$\left| {{\rm {\cal C}}.index\left( {iam,j} \right) - i} \right| < n$]]></tex-math><mml:math id="mml-ieqn-159"><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>.</mml:mo><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi>i</mml:mi></mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mo>&#x003C;</mml:mo><mml:mi>n</mml:mi></mml:math>
</alternatives></inline-formula> <bold>then</bold><break/>8: &#x2002;&#x2002;&#x2002;<bold>continue</bold><break/>9: &#x2002;&#x2002;<bold>if</bold> <inline-formula id="ieqn-160">
<alternatives><inline-graphic xlink:href="ieqn-160.png"/><tex-math id="tex-ieqn-160"><![CDATA[${\rm E}{{\rm D}^2}\left( {S_T^n\left( {i,\cdot} \right),\; S_T^n\left( {{\rm {\cal C}}.index\left( {iam,j} \right),\cdot} \right)} \right) < {r^2}$]]></tex-math><mml:math id="mml-ieqn-160"><mml:mrow><mml:mi mathvariant="normal">E</mml:mi></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mi mathvariant="normal">D</mml:mi></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>S</mml:mi><mml:mi>T</mml:mi><mml:mi>n</mml:mi></mml:msubsup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mo>&#x22C5;</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mspace width="thickmathspace"></mml:mspace><mml:msubsup><mml:mi>S</mml:mi><mml:mi>T</mml:mi><mml:mi>n</mml:mi></mml:msubsup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>.</mml:mo><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mo>&#x22C5;</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x003C;</mml:mo><mml:mrow><mml:msup><mml:mi>r</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:math>
</alternatives></inline-formula> <bold>then</bold><break/>10: &#x2002;&#x2002;&#x2002;<italic>isCand</italic> &#x2190; FALSE; <inline-formula id="ieqn-161">
<alternatives><inline-graphic xlink:href="ieqn-161.png"/><tex-math id="tex-ieqn-161"><![CDATA[${\rm {\cal C}}.count\left( {iam} \right)$]]></tex-math><mml:math id="mml-ieqn-161"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>.</mml:mo><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>u</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</alternatives></inline-formula> &#x2190; <inline-formula id="ieqn-162">
<alternatives><inline-graphic xlink:href="ieqn-162.png"/><tex-math id="tex-ieqn-162"><![CDATA[${\rm {\cal C}}.count\left( {iam} \right) - 1$]]></tex-math><mml:math id="mml-ieqn-162"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>.</mml:mo><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>u</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:math>
</alternatives></inline-formula><break/>11: &#x2002;&#x2002;&#x2002;<inline-formula id="ieqn-163">
<alternatives><inline-graphic xlink:href="ieqn-163.png"/><tex-math id="tex-ieqn-163"><![CDATA[${\rm {\cal C}}.index\left( {iam,j} \right)$]]></tex-math><mml:math id="mml-ieqn-163"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>.</mml:mo><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</alternatives></inline-formula> &#x2190; NULL; <inline-formula id="ieqn-164">
<alternatives><inline-graphic xlink:href="ieqn-164.png"/><tex-math id="tex-ieqn-164"><![CDATA[${\rm {\cal C}}.head\left( {iam} \right)$]]></tex-math><mml:math id="mml-ieqn-164"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>.</mml:mo><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</alternatives></inline-formula> &#x2190; <italic>j</italic><break/>12: &#x2002;<bold>if</bold> <italic>isCand</italic> <bold>then</bold><break/>13: &#x2002;&#x2002;<inline-formula id="ieqn-165">
<alternatives><inline-graphic xlink:href="ieqn-165.png"/><tex-math id="tex-ieqn-165"><![CDATA[${\rm {\cal C}}.count\left( {iam} \right)$]]></tex-math><mml:math id="mml-ieqn-165"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>.</mml:mo><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>u</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</alternatives></inline-formula> &#x2190; <inline-formula id="ieqn-166">
<alternatives><inline-graphic xlink:href="ieqn-166.png"/><tex-math id="tex-ieqn-166"><![CDATA[${\rm {\cal C}}.count\left( {iam} \right) + 1$]]></tex-math><mml:math id="mml-ieqn-166"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>.</mml:mo><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>u</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x002B;</mml:mo><mml:mn>1</mml:mn></mml:math>
</alternatives></inline-formula><break/>14: &#x2002;&#x2002;<bold>if</bold> <inline-formula id="ieqn-167">
<alternatives><inline-graphic xlink:href="ieqn-167.png"/><tex-math id="tex-ieqn-167"><![CDATA[${\rm {\cal C}}.index\left( {iam,\; {\rm {\cal C}}.head\left( {iam} \right)} \right) =$]]></tex-math><mml:math id="mml-ieqn-167"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>.</mml:mo><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mspace width="thickmathspace"></mml:mspace><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>.</mml:mo><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x003D;</mml:mo></mml:math>
</alternatives></inline-formula> NULL <bold>then</bold><break/>15: &#x2002;&#x2002;&#x2002;<inline-formula id="ieqn-168">
<alternatives><inline-graphic xlink:href="ieqn-168.png"/><tex-math id="tex-ieqn-168"><![CDATA[${\rm {\cal C}}.index\left( {iam,\; {\rm {\cal C}}.head\left( {iam} \right)} \right)$]]></tex-math><mml:math id="mml-ieqn-168"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>.</mml:mo><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mspace width="thickmathspace"></mml:mspace><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>.</mml:mo><mml:mi>h</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</alternatives></inline-formula> &#x2190; <italic>i</italic><break/>16: &#x2002;&#x2002;<bold>else</bold><break/>17: &#x2002;&#x2002;&#x2002;<inline-formula id="ieqn-169">
<alternatives><inline-graphic xlink:href="ieqn-169.png"/><tex-math id="tex-ieqn-169"><![CDATA[${\rm {\cal C}}.index\left( {iam,\; {\rm {\cal C}}.tail\left( {iam} \right)} \right)$]]></tex-math><mml:math id="mml-ieqn-169"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>.</mml:mo><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mspace width="thickmathspace"></mml:mspace><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>.</mml:mo><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>i</mml:mi><mml:mi>l</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</alternatives></inline-formula> &#x2190; <italic>i</italic>; <inline-formula id="ieqn-170">
<alternatives><inline-graphic xlink:href="ieqn-170.png"/><tex-math id="tex-ieqn-170"><![CDATA[${\rm {\cal C}}.tail\left( {iam} \right)$]]></tex-math><mml:math id="mml-ieqn-170"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>.</mml:mo><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>i</mml:mi><mml:mi>l</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</alternatives></inline-formula> &#x2190; <inline-formula id="ieqn-171">
<alternatives><inline-graphic xlink:href="ieqn-171.png"/><tex-math id="tex-ieqn-171"><![CDATA[${\rm {\cal C}}.tail\left( {iam} \right) + 1$]]></tex-math><mml:math id="mml-ieqn-171"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>.</mml:mo><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>i</mml:mi><mml:mi>l</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x002B;</mml:mo><mml:mn>1</mml:mn></mml:math>
</alternatives></inline-formula><break/>18: <bold>return</bold> <inline-formula id="ieqn-172">
<alternatives><inline-graphic xlink:href="ieqn-172.png"/><tex-math id="tex-ieqn-172"><![CDATA[${\rm {\cal C}}$]]></tex-math><mml:math id="mml-ieqn-172"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow></mml:math>
</alternatives></inline-formula></td>
</tr>
</tbody>
</table>
</table-wrap>

<table-wrap id="table-5">
<label>Algorithm 3</label>
<caption>
<title>Parallel Discord Refinement (in <italic>T</italic>, <italic>r</italic>; in out <inline-formula id="ieqn-173">
<alternatives><inline-graphic xlink:href="ieqn-173.png"/><tex-math id="tex-ieqn-173"><![CDATA[${\rm {\cal C}}$]]></tex-math><mml:math id="mml-ieqn-173"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow></mml:math>
</alternatives></inline-formula>)</title>
</caption>
<table>
<colgroup>
<col/>
</colgroup>
<tbody>
<tr>
<td>1: <inline-formula id="ieqn-174">
<alternatives><inline-graphic xlink:href="ieqn-174.png"/><tex-math id="tex-ieqn-174"><![CDATA[${\rm {\cal C}}.bitmap$]]></tex-math><mml:math id="mml-ieqn-174"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>.</mml:mo><mml:mi>b</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>p</mml:mi></mml:math>
</alternatives></inline-formula> &#x2190; TRUE<sub><italic>p &#x00D7; H</italic></sub><break/>2: <bold>#pragma omp parallel</bold><break/>3: <italic>iam</italic> &#x2190; <bold>omp_get_thread_num()</bold><break/>4: <bold>#pragma omp for</bold><break/>5: <bold>for</bold> <italic>i</italic> <bold>from</bold> 1 <bold>to</bold> <inline-formula id="ieqn-175">
<alternatives><inline-graphic xlink:href="ieqn-175.png"/><tex-math id="tex-ieqn-175"><![CDATA[$N$]]></tex-math><mml:math id="mml-ieqn-175"><mml:mi>N</mml:mi></mml:math>
</alternatives></inline-formula> <bold>do</bold><break/>6: &#x2002;<bold>for</bold> <italic>j</italic> <bold>from</bold> 1 <bold>to</bold> <inline-formula id="ieqn-176">
<alternatives><inline-graphic xlink:href="ieqn-176.png"/><tex-math id="tex-ieqn-176"><![CDATA[$H$]]></tex-math><mml:math id="mml-ieqn-176"><mml:mi>H</mml:mi></mml:math>
</alternatives></inline-formula> <bold>do</bold><break/>7: &#x2002;&#x2002;<inline-formula id="ieqn-177">
<alternatives><inline-graphic xlink:href="ieqn-177.png"/><tex-math id="tex-ieqn-177"><![CDATA[${\rm {\cal C}}.bitmap\left( {iam,j} \right)$]]></tex-math><mml:math id="mml-ieqn-177"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>.</mml:mo><mml:mi>b</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</alternatives></inline-formula> &#x2190; <inline-formula id="ieqn-178">
<alternatives><inline-graphic xlink:href="ieqn-178.png"/><tex-math id="tex-ieqn-178"><![CDATA[${\rm {\cal C}}.bitmap\left( {iam,j} \right)$]]></tex-math><mml:math id="mml-ieqn-178"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>.</mml:mo><mml:mi>b</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>m</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</alternatives></inline-formula> <bold>and</bold> <inline-formula id="ieqn-179">
<alternatives><inline-graphic xlink:href="ieqn-179.png"/><tex-math id="tex-ieqn-179"><![CDATA[$\left( {{\rm E}{{\rm D}^2}\left( {S_T^n\left( {i,\cdot} \right),{\rm \; {\cal C}}.cand\left( {j,\cdot} \right)} \right) \ge {r^2}} \right)$]]></tex-math><mml:math id="mml-ieqn-179"><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="normal">E</mml:mi></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mi mathvariant="normal">D</mml:mi></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msubsup><mml:mi>S</mml:mi><mml:mi>T</mml:mi><mml:mi>n</mml:mi></mml:msubsup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mo>&#x22C5;</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mspace width="thickmathspace"></mml:mspace><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>.</mml:mo><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi><mml:mi>d</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mo>&#x22C5;</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2265;</mml:mo><mml:mrow><mml:msup><mml:mi>r</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</alternatives></inline-formula><break/>8: <bold>return</bold> <inline-formula id="ieqn-180">
<alternatives><inline-graphic xlink:href="ieqn-180.png"/><tex-math id="tex-ieqn-180"><![CDATA[${\rm {\cal C}}$]]></tex-math><mml:math id="mml-ieqn-180"><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow></mml:math>
</alternatives></inline-formula></td>
</tr>
</tbody>
</table>
</table-wrap>
<p>In the selection phase, we parallelize the outer loop over the rows of the subsequence matrix, while in the inner loop over the candidates each thread processes its own segment of the candidate index. At the end of the phase, the candidates found by each thread are placed into the candidate body, and all the cluster nodes exchange the resulting candidate bodies using the MPI_Send and MPI_Recv functions to form the combined candidate set, which serves as the input for the second phase.</p>
<p>In the refinement phase, we also parallelize the outer loop over the rows of the subsequence matrix, and in the inner loop over the candidates each thread processes its own segments of the candidate body and bitmap. In this implementation, we do not use the early abandoning technique for the distance calculation, relying on the fact that vectorization of the squared Euclidean distance may yield a greater benefit. At the end of the phase, the column-wise conjunction of the elements in the bitmap matrix yields the set of true discords found by the current cluster node. The intersection of these sets is computed by one of the cluster nodes, to which the remaining nodes send their resulting sets.</p>
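<p>The bitmap-based refinement and the final column-wise conjunction can be sketched in Python as follows. The sketch makes several assumptions that are not taken from the original implementation: threads are simulated by assigning rows in a round-robin manner, candidates are identified by their start positions in the same series, overlapping subsequences are skipped explicitly as self-matches, and the function name <monospace>refine_with_bitmap</monospace> and the toy data are hypothetical.</p>
<code language="python"><![CDATA[
```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def refine_with_bitmap(series, n, r, cand_starts, num_threads=2):
    """Return the candidates that survive refinement over the series."""
    r2 = r * r
    num_rows = len(series) - n + 1
    # One bitmap row per simulated thread, one column per candidate.
    bitmap = [[True] * len(cand_starts) for _ in range(num_threads)]
    for i in range(num_rows):
        iam = i % num_threads  # simulated thread that owns row i
        row = series[i:i + n]
        for j, c in enumerate(cand_starts):
            if abs(c - i) < n:
                continue  # assumed self-match exclusion
            far_enough = sq_dist(row, series[c:c + n]) >= r2
            bitmap[iam][j] = bitmap[iam][j] and far_enough
    # Column-wise conjunction over the rows written by the threads.
    return [c for j, c in enumerate(cand_starts)
            if all(thread_row[j] for thread_row in bitmap)]


# Toy series and candidate positions (hypothetical values).
series = [0, 1] * 8
series[7] = 9
discords = refine_with_bitmap(series, n=4, r=3,
                              cand_starts=[4, 5, 6, 7, 9, 10, 11, 12])
```
]]></code>
<p>Because each simulated thread writes only to its own bitmap row, no synchronization is needed inside the loops, and a candidate is kept only if every row reports TRUE for its column.</p>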
</sec>
</sec>
<sec id="s5">
<label>5</label>
<title>Experiments</title>
<p>We evaluated the proposed algorithm in experiments conducted on the Tornado SUSU computer cluster [<xref ref-type="bibr" rid="ref-23">23</xref>], whose nodes are based on the Intel MIC accelerators [<xref ref-type="bibr" rid="ref-24">24</xref>]. Each cluster node is equipped with an Intel Xeon Phi SE10X accelerator with a peak performance of 1.076 TFLOPS (60 cores at 1.1 GHz with hyper-threading factor <inline-formula id="ieqn-181">
<alternatives><inline-graphic xlink:href="ieqn-181.png"/><tex-math id="tex-ieqn-181"><![CDATA[$4 \times$]]></tex-math><mml:math id="mml-ieqn-181"><mml:mn>4</mml:mn><mml:mo>&#x00D7;</mml:mo></mml:math>
</alternatives></inline-formula>). In the experiments, we investigated the scalability of our approach and compared it with analogs; the results are given below in Sections 5.1 and 5.2, respectively.</p>
<sec id="s5_1">
<label>5.1</label>
<title>The Algorithm&#x2019;s Scalability</title>
<p>In the first series of experiments, we assessed the algorithm&#x2019;s scaled speedup, which is defined as the speedup obtained when the problem size is increased linearly with the number of nodes added to the computer cluster [<xref ref-type="bibr" rid="ref-25">25</xref>]. Applied to our problem, the algorithm&#x2019;s scaled speedup is calculated as</p>
<p><disp-formula id="eqn-7">
<label>(7)</label>
<alternatives>
<graphic mimetype="image" mime-subtype="png" xlink:href="eqn-7.png"/><tex-math id="tex-eqn-7"><![CDATA[$${s_{scaled}} = \displaystyle{{n\cdot P\cdot \left| {{{\rm {\cal C}}_{\left( {P\cdot m} \right)}}} \right|} \over {{t_{P\cdot \left( {P\cdot m} \right)}}}},$$]]></tex-math><mml:math id="mml-eqn-7" display="block"><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>e</mml:mi><mml:mi>d</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x003D;</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:mi>n</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mi>P</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>P</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mi>m</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msub></mml:mrow></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>P</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>P</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mi>m</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:mfrac></mml:mrow><mml:mo>,</mml:mo></mml:mstyle></mml:math>
</alternatives></disp-formula></p>
<p>where <inline-formula id="ieqn-182">
<alternatives><inline-graphic xlink:href="ieqn-182.png"/><tex-math id="tex-ieqn-182"><![CDATA[$n$]]></tex-math><mml:math id="mml-ieqn-182"><mml:mi>n</mml:mi></mml:math>
</alternatives></inline-formula> is the discord length, <inline-formula id="ieqn-183">
<alternatives><inline-graphic xlink:href="ieqn-183.png"/><tex-math id="tex-ieqn-183"><![CDATA[$P$]]></tex-math><mml:math id="mml-ieqn-183"><mml:mi>P</mml:mi></mml:math>
</alternatives></inline-formula> is the number of cluster nodes, <inline-formula id="ieqn-184">
<alternatives><inline-graphic xlink:href="ieqn-184.png"/><tex-math id="tex-ieqn-184"><![CDATA[$m$]]></tex-math><mml:math id="mml-ieqn-184"><mml:mi>m</mml:mi></mml:math>
</alternatives></inline-formula> is a factor of the time series length, <inline-formula id="ieqn-185">
<alternatives><inline-graphic xlink:href="ieqn-185.png"/><tex-math id="tex-ieqn-185"><![CDATA[${{\rm {\cal C}}_{\left( {P\cdot m} \right)}}$]]></tex-math><mml:math id="mml-ieqn-185"><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>P</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mi>m</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msub></mml:mrow></mml:math>
</alternatives></inline-formula> is the set of all candidate discords selected by the algorithm in its first phase from a time series of length <inline-formula id="ieqn-186">
<alternatives><inline-graphic xlink:href="ieqn-186.png"/><tex-math id="tex-ieqn-186"><![CDATA[$P\cdot m$]]></tex-math><mml:math id="mml-ieqn-186"><mml:mi>P</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mi>m</mml:mi></mml:math>
</alternatives></inline-formula> and <inline-formula id="ieqn-187">
<alternatives><inline-graphic xlink:href="ieqn-187.png"/><tex-math id="tex-ieqn-187"><![CDATA[${t_{P\cdot \left( {P\cdot m} \right)}}$]]></tex-math><mml:math id="mml-ieqn-187"><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>P</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>P</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mi>m</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msub></mml:mrow></mml:math>
</alternatives></inline-formula> is the algorithm&#x2019;s run time when the time series is processed on <inline-formula id="ieqn-188">
<alternatives><inline-graphic xlink:href="ieqn-188.png"/><tex-math id="tex-ieqn-188"><![CDATA[$P$]]></tex-math><mml:math id="mml-ieqn-188"><mml:mi>P</mml:mi></mml:math>
</alternatives></inline-formula> nodes.</p>
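<p>As a worked illustration of Eq. (7), the following Python sketch evaluates the scaled speedup from its four ingredients. The function name <monospace>scaled_speedup</monospace> and the numeric inputs are hypothetical and are not measurements from the experiments; they only show how the quantities combine.</p>
<code language="python"><![CDATA[
```python
def scaled_speedup(n, num_nodes, num_candidates, run_time):
    """Evaluate Eq. (7): n * P * |C_(P*m)| / t_P(P*m).

    n              -- discord length
    num_nodes      -- number of cluster nodes, P
    num_candidates -- size of the candidate set found in the first phase
    run_time       -- run time on P nodes (must be positive)
    """
    if run_time <= 0:
        raise ValueError("run_time must be positive")
    return n * num_nodes * num_candidates / run_time


# Hypothetical inputs, chosen only to make the arithmetic easy to follow.
example = scaled_speedup(n=128, num_nodes=4, num_candidates=1000, run_time=2.0)
```
]]></code>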
<p>For the evaluation, we used the ECG time series [<xref ref-type="bibr" rid="ref-26">26</xref>] (see <xref ref-type="table" rid="table-1">Tab. 1</xref> for a summary of the data involved). In the experiments, we discovered discords on up to 128 cluster nodes with the time series factor <inline-formula id="ieqn-189">
<alternatives><inline-graphic xlink:href="ieqn-189.png"/><tex-math id="tex-ieqn-189"><![CDATA[$m = {10^6}$]]></tex-math><mml:math id="mml-ieqn-189"><mml:mi>m</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mrow><mml:msup><mml:mn>10</mml:mn><mml:mn>6</mml:mn></mml:msup></mml:mrow></mml:math>
</alternatives></inline-formula>, and varied the discord&#x2019;s length <inline-formula id="ieqn-190">
<alternatives><inline-graphic xlink:href="ieqn-190.png"/><tex-math id="tex-ieqn-190"><![CDATA[$n$]]></tex-math><mml:math id="mml-ieqn-190"><mml:mi>n</mml:mi></mml:math>
</alternatives></inline-formula> while the range parameter <inline-formula id="ieqn-191">
<alternatives><inline-graphic xlink:href="ieqn-191.png"/><tex-math id="tex-ieqn-191"><![CDATA[$r$]]></tex-math><mml:math id="mml-ieqn-191"><mml:mi>r</mml:mi></mml:math>
</alternatives></inline-formula> was chosen empirically to provide the algorithm&#x2019;s best performance.</p>

<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>Time series involved in the experiments on the algorithm&#x2019;s scalability</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr><th rowspan="2"># cluster nodes, <inline-formula id="ieqn-192">
<alternatives><inline-graphic xlink:href="ieqn-192.png"/><tex-math id="tex-ieqn-192"><![CDATA[$P$]]></tex-math><mml:math id="mml-ieqn-192"><mml:mi>P</mml:mi></mml:math>
</alternatives></inline-formula></th><th rowspan="2">Time series length, <inline-formula id="ieqn-193">
<alternatives><inline-graphic xlink:href="ieqn-193.png"/><tex-math id="tex-ieqn-193"><![CDATA[$P\cdot m$]]></tex-math><mml:math id="mml-ieqn-193"><mml:mi>P</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mi>m</mml:mi></mml:math>
</alternatives></inline-formula></th><th colspan="2"><inline-formula id="ieqn-194">
<alternatives><inline-graphic xlink:href="ieqn-194.png"/><tex-math id="tex-ieqn-194"><![CDATA[$n = 128$]]></tex-math><mml:math id="mml-ieqn-194"><mml:mi>n</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>128</mml:mn></mml:math>
</alternatives></inline-formula>, <inline-formula id="ieqn-195">
<alternatives><inline-graphic xlink:href="ieqn-195.png"/><tex-math id="tex-ieqn-195"><![CDATA[$r = 10$]]></tex-math><mml:math id="mml-ieqn-195"><mml:mi>r</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>10</mml:mn></mml:math>
</alternatives></inline-formula><break/># discords</th><th colspan="2"><inline-formula id="ieqn-196">
<alternatives><inline-graphic xlink:href="ieqn-196.png"/><tex-math id="tex-ieqn-196"><![CDATA[$n = 256$]]></tex-math><mml:math id="mml-ieqn-196"><mml:mi>n</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>256</mml:mn></mml:math>
</alternatives></inline-formula>, <inline-formula id="ieqn-197">
<alternatives><inline-graphic xlink:href="ieqn-197.png"/><tex-math id="tex-ieqn-197"><![CDATA[$r = 16$]]></tex-math><mml:math id="mml-ieqn-197"><mml:mi>r</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>16</mml:mn></mml:math>
</alternatives></inline-formula><break/># discords</th><th colspan="2"><inline-formula id="ieqn-198">
<alternatives><inline-graphic xlink:href="ieqn-198.png"/><tex-math id="tex-ieqn-198"><![CDATA[$n = 512$]]></tex-math><mml:math id="mml-ieqn-198"><mml:mi>n</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>512</mml:mn></mml:math>
</alternatives></inline-formula>, <inline-formula id="ieqn-199">
<alternatives><inline-graphic xlink:href="ieqn-199.png"/><tex-math id="tex-ieqn-199"><![CDATA[$r = 24.5$]]></tex-math><mml:math id="mml-ieqn-199"><mml:mi>r</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>24.5</mml:mn></mml:math>
</alternatives></inline-formula><break/># discords</th>
</tr>
<tr>
<th>candidate,<break/><inline-formula id="ieqn-200">
<alternatives><inline-graphic xlink:href="ieqn-200.png"/><tex-math id="tex-ieqn-200"><![CDATA[$\left| {{{\rm {\cal C}}_{\left( {P\cdot m} \right)}}} \right|$]]></tex-math><mml:math id="mml-ieqn-200"><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>P</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mi>m</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msub></mml:mrow></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:math>
</alternatives></inline-formula></th>
<th>refined,<break/><inline-formula id="ieqn-201">
<alternatives><inline-graphic xlink:href="ieqn-201.png"/><tex-math id="tex-ieqn-201"><![CDATA[$\left| {\rm {\cal C}} \right|$]]></tex-math><mml:math id="mml-ieqn-201"><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:math>
</alternatives></inline-formula></th>
<th>candidate,<break/><inline-formula id="ieqn-202">
<alternatives><inline-graphic xlink:href="ieqn-202.png"/><tex-math id="tex-ieqn-202"><![CDATA[$\left| {{{\rm {\cal C}}_{\left( {P\cdot m} \right)}}} \right|$]]></tex-math><mml:math id="mml-ieqn-202"><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>P</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mi>m</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msub></mml:mrow></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:math>
</alternatives></inline-formula></th>
<th>refined,<break/><inline-formula id="ieqn-203">
<alternatives><inline-graphic xlink:href="ieqn-203.png"/><tex-math id="tex-ieqn-203"><![CDATA[$\left| {\rm {\cal C}} \right|$]]></tex-math><mml:math id="mml-ieqn-203"><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:math>
</alternatives></inline-formula></th>
<th>candidate,<break/><inline-formula id="ieqn-204">
<alternatives><inline-graphic xlink:href="ieqn-204.png"/><tex-math id="tex-ieqn-204"><![CDATA[$\left| {{{\rm {\cal C}}_{\left( {P\cdot m} \right)}}} \right|$]]></tex-math><mml:math id="mml-ieqn-204"><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>P</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mi>m</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msub></mml:mrow></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:math>
</alternatives></inline-formula></th>
<th>refined,<break/><inline-formula id="ieqn-205">
<alternatives><inline-graphic xlink:href="ieqn-205.png"/><tex-math id="tex-ieqn-205"><![CDATA[$\left| {\rm {\cal C}} \right|$]]></tex-math><mml:math id="mml-ieqn-205"><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="script">C</mml:mi></mml:mrow></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:math>
</alternatives></inline-formula></th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>10<sup>6</sup></td>
<td>5,407</td>
<td>772<break/>(14%)</td>
<td>28,916</td>
<td>533<break/>(2%)</td>
<td>17,366</td>
<td>548<break/>(3%)</td>
</tr>
<tr>
<td>2</td>
<td>2&#x00B7;10<sup>6</sup></td>
<td>15,115</td>
<td>3,245<break/>(21%)</td>
<td>15,979</td>
<td>2,124<break/>(13%)</td>
<td>28,376</td>
<td>1,174<break/>(4%)</td>
</tr>
<tr>
<td>4</td>
<td>4&#x00B7;10<sup>6</sup></td>
<td>23,990</td>
<td>3,199<break/>(13%)</td>
<td>29,075</td>
<td>2,098<break/>(14%)</td>
<td>54,281</td>
<td>1,169<break/>(2%)</td>
</tr>
<tr>
<td>8</td>
<td>8&#x00B7;10<sup>6</sup></td>
<td>45,989</td>
<td>3,167<break/>(7%)</td>
<td>53,678</td>
<td>1,992<break/>(4%)</td>
<td>109,890</td>
<td>1,070<break/>(0.9%)</td>
</tr>
<tr>
<td>16</td>
<td>1.6&#x00B7;10<sup>7</sup></td>
<td>117,659</td>
<td>2,779<break/>(2%)</td>
<td>129,528</td>
<td>1,533<break/>(1%)</td>
<td>299,157</td>
<td>1892<break/>(0.3%)</td>
</tr>
<tr>
<td>32</td>
<td>3.2&#x00B7;10<sup>7</sup></td>
<td>374,536</td>
<td>2,666<break/>(0.7%)</td>
<td>294,844</td>
<td>1,288<break/>(0.4%)</td>
<td>732,517</td>
<td>721<break/>(0.09%)</td>
</tr>
<tr>
<td>64</td>
<td>6.4&#x00B7;10<sup>7</sup></td>
<td>587,795</td>
<td>2,517<break/>(0.4%)</td>
<td>707,956</td>
<td>1,036<break/>(0.1%)</td>
<td>1,785,410</td>
<td>570<break/>(0.03%)</td>
</tr>
<tr>
<td>128</td>
<td>1.28&#x00B7;10<sup>8</sup></td>
<td>1,145,661</td>
<td>2,324<break/>(0.2%)</td>
<td>1,541,099</td>
<td>764<break/>(0.05%)</td>
<td>4,743,032</td>
<td>765<break/>(0.02%)</td>
</tr>
</tbody>
</table>
</table-wrap>
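<p>The percentages in the refined columns of Tab. 1 are the share of candidates that remain after refinement. The following minimal sketch recomputes a few of them from the counts listed above; the helper name <monospace>refined_share</monospace> is hypothetical, and we assume that the second table column is the time series length.</p>
<code language="python"><![CDATA[
```python
# (candidates, refined) pairs copied from the r = 10 columns of Tab. 1.
# Keys are the values of the second table column (assumed to be the series length).
counts = {
    1_000_000: (5_407, 772),
    2_000_000: (15_115, 3_245),
    128_000_000: (1_145_661, 2_324),
}


def refined_share(candidates: int, refined: int) -> float:
    """Percentage of candidate discords that remain after the refinement phase."""
    return 100.0 * refined / candidates


shares = {length: refined_share(c, r) for length, (c, r) in counts.items()}
for length, share in shares.items():
    print(f"|T| = {length:,}: {share:.2f}% of the candidates remain after refinement")
```
]]></code>
<p>Rounded to the precision used in the table, these shares agree with the reported 14%, 21%, and 0.2%.</p>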
<p>The results of the experiments are depicted in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>. As can be seen, our algorithm adapts well to a simultaneous increase in the time series length and the number of cluster nodes, and demonstrates linear scaled speedup. As expected, the algorithm scales better for larger values of the discord length, since longer discords impose a higher computational load.</p>
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>The scaled speedup of the algorithm</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="fig-2.png"/>
</fig>
</sec>
<sec id="s5_2">
<label>5.2</label>
<title>Comparison with Analogs</title>
<p>In the second series of experiments, we compared the performance of our algorithm against the analogs considered in Section 3, namely DDD [<xref ref-type="bibr" rid="ref-15">15</xref>], MR-DADD [<xref ref-type="bibr" rid="ref-4">4</xref>], GPU-STAMP [<xref ref-type="bibr" rid="ref-19">19</xref>], and MP-HPC [<xref ref-type="bibr" rid="ref-21">21</xref>]. We omit the PDD algorithm [<xref ref-type="bibr" rid="ref-18">18</xref>] since, in our previous experiments [<xref ref-type="bibr" rid="ref-5">5</xref>], PDD lagged substantially behind our parallel in-memory algorithm due to the overhead of message passing among the cluster nodes.</p>
<p>Throughout the experiments, we used synthetic time series generated according to the Random Walk model [<xref ref-type="bibr" rid="ref-27">27</xref>], since such series were employed by the competitors in their evaluations. For comparison purposes, we used the run times reported by the authors of the respective algorithms. We ran our algorithm on Tornado SUSU with a reduced number of nodes and of cores per node, so that the peak performance of our hardware platform was approximately equal to that of the system on which the corresponding competitor was evaluated.</p>
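<p>For illustration, a random walk series can be produced as the running sum of independent random steps. The sketch below is one possible generator, not the one used in the experiments: the Gaussian step distribution, the zero starting value, the seed, and the function name <monospace>random_walk</monospace> are all assumptions.</p>
<code language="python"><![CDATA[
```python
import random
from typing import List


def random_walk(length: int, seed: int = 0, step_sigma: float = 1.0) -> List[float]:
    """Return a 1-D random walk: x[0] = 0, x[i] = x[i-1] + N(0, step_sigma)."""
    if length <= 0:
        return []
    rng = random.Random(seed)  # private generator, so results are reproducible
    series = [0.0]
    for _ in range(length - 1):
        series.append(series[-1] + rng.gauss(0.0, step_sigma))
    return series


walk = random_walk(1_000, seed=42)
print(len(walk), walk[0])
```
]]></code>
<p>Fixing the seed makes the generated series reproducible, which is convenient when the same input has to be processed with different numbers of nodes.</p>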
<p><xref ref-type="table" rid="table-2">Tab. 2</xref> summarizes the performance of the proposed algorithm compared with the analogs. We can see that our algorithm outperforms its competitors. As expected, the direct analogs DDD and MR-DADD are inferior to our algorithm since they do not employ parallelism within a single cluster node. In turn, the indirect analogs GPU-STAMP and MP-HPC lag behind our algorithm since they are designed to solve the computationally more complex problem of computing the matrix profile, which can be used for discords discovery among many other time series mining problems.</p>

<table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>Comparison of the proposed algorithm with analogs</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr><th colspan="2">Analog</th><th colspan="2">Our algorithm,<break/>hardware</th><th colspan="2">Time series</th><th colspan="2">Run time, s</th>
</tr>
<tr>
<th>Algorithm</th>
<th>Hardware</th>
<th># Cluster nodes</th>
<th># Cores (threads) per node</th>
<th><inline-formula id="ieqn-206">
<alternatives><inline-graphic xlink:href="ieqn-206.png"/><tex-math id="tex-ieqn-206"><![CDATA[$\left| T \right|$]]></tex-math><mml:math id="mml-ieqn-206"><mml:mrow><mml:mo>|</mml:mo><mml:mi>T</mml:mi><mml:mo>|</mml:mo></mml:mrow></mml:math>
</alternatives></inline-formula></th>
<th><inline-formula id="ieqn-207">
<alternatives><inline-graphic xlink:href="ieqn-207.png"/><tex-math id="tex-ieqn-207"><![CDATA[$n$]]></tex-math><mml:math id="mml-ieqn-207"><mml:mi>n</mml:mi></mml:math>
</alternatives></inline-formula></th>
<th>Analog</th>
<th>Our<break/>algorithm</th>
</tr>
</thead>
<tbody>
<tr>
<td>DDD</td>
<td>4 CPU @2.13 GHz</td>
<td>2</td>
<td>4 (16)</td>
<td>10<sup>7</sup></td>
<td>512</td>
<td>5,382</td>
<td>745</td>
</tr>
<tr>
<td>MR-DADD</td>
<td>8 CPU @3.0 GHz</td>
<td>2</td>
<td>8 (32)</td>
<td>10<sup>6</sup></td>
<td>512</td>
<td>240</td>
<td>99</td>
</tr>
<tr>
<td>GPU-STAMP</td>
<td>2880 CUDA cores<break/>@0.745 GHz</td>
<td>2</td>
<td>30 (120)</td>
<td>2<sup>21</sup></td>
<td>256</td>
<td>11,664</td>
<td>83</td>
</tr>
<tr>
<td>MP-HPC</td>
<td>39 CPU @2.6 GHz</td>
<td>4</td>
<td>60 (240)</td>
<td>8&#x00B7;10<sup>5</sup></td>
<td>1000</td>
<td>6,000</td>
<td>32</td>
</tr>
</tbody>
</table>
</table-wrap>
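<p>The run times in Tab. 2 can be turned into ratios of the analog's run time to ours. The sketch below only performs this arithmetic on the values listed in the table; the variable names are hypothetical. The ratios should be read with the caveat stated above: the analogs' run times are those reported by their authors on hardware of approximately, not exactly, equal peak performance.</p>
<code language="python"><![CDATA[
```python
# (analog run time, our run time) in seconds, copied from Tab. 2.
run_times = {
    "DDD": (5_382, 745),
    "MR-DADD": (240, 99),
    "GPU-STAMP": (11_664, 83),
    "MP-HPC": (6_000, 32),
}

# Ratio > 1 means that our algorithm finished faster than the analog.
ratios = {name: analog / ours for name, (analog, ours) in run_times.items()}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x")
```
]]></code>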
</sec>
</sec>
<sec id="s6">
<label>6</label>
<title>Conclusion</title>
<p>In this article, we addressed the problem of discovering anomalous subsequences in a very long time series. Currently, there is a wide spectrum of real-world applications where it is typical to deal with multi-terabyte time series that cannot fit in the main memory: medicine, astronomy, economics, climate modeling, predictive maintenance, energy consumption, and others. In this study, we employed the concept of a discord, that is, a subsequence of the time series that has the largest distance to its nearest non-self match.</p>
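<p>The discord definition can be made concrete with a brute-force search that is quadratic in the series length. This is a minimal sketch for small inputs only, not the algorithm proposed in this article; it assumes that a non-self match is a subsequence that does not overlap the given one and, for brevity, uses the plain Euclidean distance without normalization. The function names and the toy series are hypothetical.</p>
<code language="python"><![CDATA[
```python
import math
from typing import List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain Euclidean distance between two equal-length subsequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_discord(series: List[float], n: int) -> Tuple[int, float]:
    """Return (start index, nearest-neighbor distance) of the top-1 discord."""
    m = len(series) - n + 1  # number of subsequences of length n
    best_pos, best_dist = -1, -1.0
    for i in range(m):
        nearest = math.inf
        for j in range(m):
            if abs(i - j) >= n:  # non-self match: the subsequences do not overlap
                d = euclidean(series[i:i + n], series[j:j + n])
                nearest = min(nearest, d)
        # The discord is the subsequence whose nearest neighbor is farthest away.
        if nearest != math.inf and nearest > best_dist:
            best_pos, best_dist = i, nearest
    return best_pos, best_dist


series = [0.0, 1.0] * 10  # periodic toy series
series[10] = 5.0          # injected anomaly
pos, dist = brute_force_discord(series, n=4)
print(pos, dist)
```
]]></code>
<p>On this toy input, the reported subsequence covers the injected value at index 10. The quadratic number of distance computations is what makes such a direct search impractical for very long time series.</p>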
<p>We proposed a novel parallel algorithm for discords discovery in very long time series on a modern high-performance cluster whose nodes are based on many-core accelerators. Our algorithm utilizes the serial disk-aware algorithm by Yankov, Keogh et al. [<xref ref-type="bibr" rid="ref-4">4</xref>] as a basis. We achieve parallelization among the cluster nodes as well as within a single node. At the level of the cluster nodes, we modified the original parallelization scheme, which allowed us to reduce the number of candidate discords to be processed. Within a single cluster node, we proposed a set of matrix data structures to store and index the subsequences of a time series, and to provide an efficient vectorization of computations on the many-core accelerator.</p>
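<p>The relation between candidate and refined discords can be illustrated by a serial sketch in the spirit of the two-phase scheme of [<xref ref-type="bibr" rid="ref-4">4</xref>]: a first scan collects candidates for a given range <italic>r</italic>, and a second scan discards false positives. This is not the parallel implementation described in this article; the function names, the plain (non-normalized) Euclidean distance, and the toy input are assumptions made for illustration only.</p>
<code language="python"><![CDATA[
```python
import math
from typing import Dict, List


def dist(series: List[float], i: int, j: int, n: int) -> float:
    """Euclidean distance between the length-n subsequences starting at i and j."""
    return math.sqrt(sum((series[i + k] - series[j + k]) ** 2 for k in range(n)))


def select_candidates(series: List[float], n: int, r: float) -> List[int]:
    """Phase 1: keep subsequences with no non-overlapping neighbor closer than r so far."""
    candidates: List[int] = []
    for i in range(len(series) - n + 1):
        is_candidate = True
        for c in list(candidates):  # iterate over a copy, since we remove items
            if abs(i - c) >= n and dist(series, i, c, n) < r:
                candidates.remove(c)   # c has a close neighbor: not a discord
                is_candidate = False   # by symmetry, neither is i
        if is_candidate:
            candidates.append(i)
    return candidates


def refine(series: List[float], n: int, r: float, candidates: List[int]) -> Dict[int, float]:
    """Phase 2: drop false positives; map each survivor to its nearest-neighbor distance."""
    nearest = {c: math.inf for c in candidates}
    for i in range(len(series) - n + 1):
        for c in list(nearest):
            if abs(i - c) < n:
                continue  # skip overlapping (self-match) subsequences
            d = dist(series, i, c, n)
            if d < r:
                del nearest[c]
            else:
                nearest[c] = min(nearest[c], d)
    return nearest


series = [0.0, 1.0] * 10
series[10] = 5.0  # injected anomaly
candidates = select_candidates(series, n=4, r=2.0)
refined = refine(series, n=4, r=2.0, candidates=candidates)
print(len(candidates), sorted(refined))
```
]]></code>
<p>The first scan may keep subsequences whose close neighbors were removed before they could be compared, so the candidate set is generally larger than the refined one, as in the candidate and refined columns of Tab. 1.</p>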
<p>The experimental evaluation on a real computer cluster with real and synthetic time series showed that the proposed algorithm scales linearly, and that its scalability improves under the higher computational load caused by a greater discord length. In addition, the algorithm outperformed the analogs that do not combine a computer cluster with many-core accelerators.</p>
<p>In further studies, we plan to develop versions of the algorithm for computer clusters with GPU nodes.</p>
</sec>
</body>
<back><fn-group>
<fn fn-type="other">
<p><bold>Funding Statement:</bold> This work was financially supported by the Russian Foundation for Basic Research (Grant No. 20-07-00140) and by the Ministry of Science and Higher Education of the Russian Federation (Government Order FENU-2020-0022).</p>
</fn>
<fn fn-type="conflict">
<p><bold>Conflicts of Interest:</bold> The authors declare that they have no conflicts of interest to report regarding the present study.</p>
</fn>
</fn-group>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1">
<label>[1]</label><mixed-citation publication-type="conf-proc">
<person-group person-group-type="author"><string-name>
<given-names>E. J.</given-names> 
<surname>Keogh</surname></string-name>, <string-name>
<given-names>J.</given-names> 
<surname>Lin</surname></string-name> and <string-name>
<given-names>A. W.</given-names> 
<surname>Fu</surname></string-name>
</person-group>, &#x201C;
<article-title>HOT SAX: Efficiently finding the most unusual time series subsequence</article-title>,&#x201D; in <conf-name>Proc. ICDM</conf-name>, 
<conf-loc>Houston, TX, USA</conf-loc>, pp. 
<fpage>226</fpage>&#x2013;
<lpage>233</lpage>, 
<year>2005</year>.</mixed-citation>
</ref>
<ref id="ref-2">
<label>[2]</label><mixed-citation publication-type="conf-proc">
<person-group person-group-type="author"><string-name>
<given-names>E. J.</given-names> 
<surname>Keogh</surname></string-name>, <string-name>
<given-names>S.</given-names> 
<surname>Lonardi</surname></string-name> and <string-name>
<given-names>C. A.</given-names> 
<surname>Ratanamahatana</surname></string-name>
</person-group>, &#x201C;
<article-title>Towards parameter-free data mining</article-title>,&#x201D; in <conf-name>Proc. KDD</conf-name>, 
<conf-loc>Seattle, WA, USA</conf-loc>, pp. 
<fpage>206</fpage>&#x2013;
<lpage>215</lpage>, 
<year>2004</year>.</mixed-citation>
</ref>
<ref id="ref-3">
<label>[3]</label><mixed-citation publication-type="conf-proc">
<person-group person-group-type="author"><string-name>
<given-names>D.</given-names> 
<surname>Yankov</surname></string-name>, <string-name>
<given-names>E. J.</given-names> 
<surname>Keogh</surname></string-name> and <string-name>
<given-names>U.</given-names> 
<surname>Rebbapragada</surname></string-name>
</person-group>, &#x201C;
<article-title>Disk aware discord discovery: Finding unusual time series in terabyte sized datasets</article-title>,&#x201D; in <conf-name>Proc. ICDM</conf-name>, 
<conf-loc>Omaha, NE, USA</conf-loc>, pp. 
<fpage>381</fpage>&#x2013;
<lpage>390</lpage>, 
<year>2007</year>.</mixed-citation>
</ref>
<ref id="ref-4">
<label>[4]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>D.</given-names> 
<surname>Yankov</surname></string-name>, <string-name>
<given-names>E. J.</given-names> 
<surname>Keogh</surname></string-name> and <string-name>
<given-names>U.</given-names> 
<surname>Rebbapragada</surname></string-name>
</person-group>, &#x201C;
<article-title>Disk aware discord discovery: Finding unusual time series in terabyte sized datasets</article-title>,&#x201D; 
<source>Knowledge and Information Systems</source>, vol. 
<volume>17</volume>, no. 
<issue>2</issue>, pp. 
<fpage>241</fpage>&#x2013;
<lpage>262</lpage>, 
<year>2008</year>.</mixed-citation>
</ref>
<ref id="ref-5">
<label>[5]</label><mixed-citation publication-type="conf-proc">
<person-group person-group-type="author"><string-name>
<given-names>M.</given-names> 
<surname>Zymbler</surname></string-name>, <string-name>
<given-names>A.</given-names> 
<surname>Polyakov</surname></string-name> and <string-name>
<given-names>M.</given-names> 
<surname>Kipnis</surname></string-name>
</person-group>, &#x201C;
<article-title>Time series discord discovery on Intel many-core systems</article-title>,&#x201D; in <conf-name>Proc. PCT</conf-name>, 
<conf-loc>Kaliningrad, Russia</conf-loc>, pp. 
<fpage>168</fpage>&#x2013;
<lpage>182</lpage>, 
<year>2019</year>.</mixed-citation>
</ref>
<ref id="ref-6">
<label>[6]</label><mixed-citation publication-type="conf-proc">
<person-group person-group-type="author"><string-name>
<given-names>B. Y.</given-names> 
<surname>Chiu</surname></string-name>, <string-name>
<given-names>E. J.</given-names> 
<surname>Keogh</surname></string-name> and <string-name>
<given-names>S.</given-names> 
<surname>Lonardi</surname></string-name>
</person-group>, &#x201C;
<article-title>Probabilistic discovery of time series motifs</article-title>,&#x201D; in <conf-name>Proc. KDD</conf-name>, 
<conf-loc>Washington, D.C., USA</conf-loc>, pp. 
<fpage>493</fpage>&#x2013;
<lpage>498</lpage>, 
<year>2003</year>.</mixed-citation>
</ref>
<ref id="ref-7">
<label>[7]</label><mixed-citation publication-type="other">
<person-group person-group-type="author"><string-name>
<given-names>V.</given-names> 
<surname>Chandola</surname></string-name>, <string-name>
<given-names>D.</given-names> 
<surname>Cheboli</surname></string-name> and <string-name>
<given-names>V.</given-names> 
<surname>Kumar</surname></string-name>
</person-group>, &#x201C;
<article-title>Detecting anomalies in a time series database</article-title>.&#x201D; 
<comment>Technical Report</comment>, 
<publisher-name>Department of Computer Science and Engineering, University of Minnesota</publisher-name>, 
<publisher-loc>Minneapolis, MN, USA</publisher-loc>, 
<year>2009</year>.</mixed-citation>
</ref>
<ref id="ref-8">
<label>[8]</label><mixed-citation publication-type="conf-proc">
<person-group person-group-type="author"><string-name>
<given-names>J.</given-names> 
<surname>Lin</surname></string-name>, <string-name>
<given-names>E. J.</given-names> 
<surname>Keogh</surname></string-name>, <string-name>
<given-names>S.</given-names> 
<surname>Lonardi</surname></string-name> and <string-name>
<given-names>B. Y.</given-names> 
<surname>Chiu</surname></string-name>
</person-group>, &#x201C;
<article-title>A symbolic representation of time series, with implications for streaming algorithms</article-title>,&#x201D; in <conf-name>Proc. DMKD</conf-name>, 
<conf-loc>San Diego, California, USA</conf-loc>, pp. 
<fpage>2</fpage>&#x2013;
<lpage>11</lpage>, 
<year>2003</year>.</mixed-citation>
</ref>
<ref id="ref-9">
<label>[9]</label><mixed-citation publication-type="conf-proc">
<person-group person-group-type="author"><string-name>
<given-names>J.</given-names> 
<surname>Shieh</surname></string-name> and <string-name>
<given-names>E. J.</given-names> 
<surname>Keogh</surname></string-name>
</person-group>, &#x201C;
<article-title>iSAX: Indexing and mining terabyte sized time series</article-title>,&#x201D; in <conf-name>Proc. KDD</conf-name>, 
<conf-loc>Las Vegas, Nevada, USA</conf-loc>, pp. 
<fpage>623</fpage>&#x2013;
<lpage>631</lpage>, 
<year>2008</year>.</mixed-citation>
</ref>
<ref id="ref-10">
<label>[10]</label><mixed-citation publication-type="conf-proc">
<person-group person-group-type="author"><string-name>
<given-names>H. T. Q.</given-names> 
<surname>Buu</surname></string-name> and <string-name>
<given-names>D. T.</given-names> 
<surname>Anh</surname></string-name>
</person-group>, &#x201C;
<article-title>Time series discord discovery based on iSAX symbolic representation</article-title>,&#x201D; in <conf-name>Proc. KSE</conf-name>, 
<conf-loc>Hanoi, Vietnam</conf-loc>, pp. 
<fpage>11</fpage>&#x2013;
<lpage>18</lpage>, 
<year>2011</year>.</mixed-citation>
</ref>
<ref id="ref-11">
<label>[11]</label><mixed-citation publication-type="conf-proc">
<person-group person-group-type="author"><string-name>
<given-names>A. W.</given-names> 
<surname>Fu</surname></string-name>, <string-name>
<given-names>O. T.</given-names> 
<surname>Leung</surname></string-name>, <string-name>
<given-names>E. J.</given-names> 
<surname>Keogh</surname></string-name> and <string-name>
<given-names>J.</given-names> 
<surname>Lin</surname></string-name>
</person-group>, &#x201C;
<article-title>Finding time series discords based on Haar transform</article-title>,&#x201D; in <conf-name>Proc. ADMA</conf-name>, 
<conf-loc>Xi&#x2019;an, China</conf-loc>, pp. 
<fpage>31</fpage>&#x2013;
<lpage>41</lpage>, 
<year>2006</year>.</mixed-citation>
</ref>
<ref id="ref-12">
<label>[12]</label><mixed-citation publication-type="conf-proc">
<person-group person-group-type="author"><string-name>
<given-names>H. T. T.</given-names> 
<surname>Thuy</surname></string-name>, <string-name>
<given-names>D. T.</given-names> 
<surname>Anh</surname></string-name> and <string-name>
<given-names>V. T. N.</given-names> 
<surname>Chau</surname></string-name>
</person-group>, &#x201C;
<article-title>An effective and efficient hash-based algorithm for time series discord discovery</article-title>,&#x201D; in <conf-name>Proc. NICS</conf-name>, 
<conf-loc>Danang, Vietnam</conf-loc>, pp. 
<fpage>85</fpage>&#x2013;
<lpage>90</lpage>, 
<year>2016</year>.</mixed-citation>
</ref>
<ref id="ref-13">
<label>[13]</label><mixed-citation publication-type="conf-proc">
<person-group person-group-type="author"><string-name>
<given-names>P. M.</given-names> 
<surname>Chau</surname></string-name>, <string-name>
<given-names>B. M.</given-names> 
<surname>Duc</surname></string-name> and <string-name>
<given-names>D. T.</given-names> 
<surname>Anh</surname></string-name>
</person-group>, &#x201C;
<article-title>Discord detection in streaming time series with the support of R-tree</article-title>,&#x201D; in <conf-name>Proc. ACOMP</conf-name>, 
<conf-loc>Ho Chi Minh City, Vietnam</conf-loc>, pp. 
<fpage>96</fpage>&#x2013;
<lpage>103</lpage>, 
<year>2018</year>.</mixed-citation>
</ref>
<ref id="ref-14">
<label>[14]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>G.</given-names> 
<surname>Li</surname></string-name>, <string-name>
<given-names>O.</given-names> 
<surname>Br&#x00E4;ysy</surname></string-name>, <string-name>
<given-names>L.</given-names> 
<surname>Jiang</surname></string-name>, <string-name>
<given-names>Z.</given-names> 
<surname>Wu</surname></string-name> and <string-name>
<given-names>Y.</given-names> 
<surname>Wang</surname></string-name>
</person-group>, &#x201C;
<article-title>Finding time series discord based on bit representation clustering</article-title>,&#x201D; 
<source>Knowledge-Based Systems</source>, vol. 
<volume>54</volume>, pp. 
<fpage>243</fpage>&#x2013;
<lpage>254</lpage>, 
<year>2013</year>.</mixed-citation>
</ref>
<ref id="ref-15">
<label>[15]</label><mixed-citation publication-type="conf-proc">
<person-group person-group-type="author"><string-name>
<given-names>Y.</given-names> 
<surname>Wu</surname></string-name>, <string-name>
<given-names>Y.</given-names> 
<surname>Zhu</surname></string-name>, <string-name>
<given-names>T.</given-names> 
<surname>Huang</surname></string-name>, <string-name>
<given-names>X.</given-names> 
<surname>Li</surname></string-name>, <string-name>
<given-names>X.</given-names> 
<surname>Liu</surname></string-name> <etal>et al.</etal>
</person-group><italic>,</italic> &#x201C;
<article-title>Distributed discord discovery: Spark based anomaly detection in time series</article-title>,&#x201D; in <conf-name>Proc. HPCC, Proc. CSS, Proc. ICESS</conf-name>, 
<conf-loc>New York, NY, USA</conf-loc>, pp. 
<fpage>154</fpage>&#x2013;
<lpage>159</lpage>, 
<year>2015</year>.</mixed-citation>
</ref>
<ref id="ref-16">
<label>[16]</label><mixed-citation publication-type="conf-proc">
<person-group person-group-type="author"><string-name>
<given-names>M.</given-names> 
<surname>Zaharia</surname></string-name>, <string-name>
<given-names>M.</given-names> 
<surname>Chowdhury</surname></string-name>, <string-name>
<given-names>M. J.</given-names> 
<surname>Franklin</surname></string-name>, <string-name>
<given-names>S.</given-names> 
<surname>Shenker</surname></string-name> and <string-name>
<given-names>I.</given-names> 
<surname>Stoica</surname></string-name>
</person-group>, &#x201C;
<article-title>Spark: Cluster computing with working sets</article-title>,&#x201D; in <conf-name>Proc. HotCloud</conf-name>, 
<conf-loc>Boston, MA, USA</conf-loc>, 
<year>2010</year>.</mixed-citation>
</ref>
<ref id="ref-17">
<label>[17]</label><mixed-citation publication-type="conf-proc">
<person-group person-group-type="author"><string-name>
<given-names>S.</given-names> 
<surname>Ghemawat</surname></string-name>, <string-name>
<given-names>H.</given-names> 
<surname>Gobioff</surname></string-name> and <string-name>
<given-names>S.</given-names> 
<surname>Leung</surname></string-name>
</person-group>, &#x201C;
<article-title>The Google file system</article-title>,&#x201D; in <conf-name>Proc. SOSP</conf-name>, 
<conf-loc>Bolton Landing, NY, USA</conf-loc>, pp. 
<fpage>29</fpage>&#x2013;
<lpage>43</lpage>, 
<year>2003</year>.</mixed-citation>
</ref>
<ref id="ref-18">
<label>[18]</label><mixed-citation publication-type="conf-proc">
<person-group person-group-type="author"><string-name>
<given-names>T.</given-names> 
<surname>Huang</surname></string-name>, <string-name>
<given-names>Y.</given-names> 
<surname>Zhu</surname></string-name>, <string-name>
<given-names>Y.</given-names> 
<surname>Mao</surname></string-name>, <string-name>
<given-names>X.</given-names> 
<surname>Li</surname></string-name>, <string-name>
<given-names>M.</given-names> 
<surname>Liu</surname></string-name> <etal>et al.</etal>
</person-group><italic>,</italic> &#x201C;
<article-title>Parallel discord discovery</article-title>,&#x201D; in <conf-name>Proc. PAKDD</conf-name>, 
<conf-loc>Auckland, New Zealand</conf-loc>, pp. 
<fpage>233</fpage>&#x2013;
<lpage>244</lpage>, 
<year>2016</year>.</mixed-citation>
</ref>
<ref id="ref-19">
<label>[19]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>C. M.</given-names> 
<surname>Yeh</surname></string-name>, <string-name>
<given-names>Y.</given-names> 
<surname>Zhu</surname></string-name>, <string-name>
<given-names>L.</given-names> 
<surname>Ulanova</surname></string-name>, <string-name>
<given-names>N.</given-names> 
<surname>Begum</surname></string-name>, <string-name>
<given-names>Y.</given-names> 
<surname>Ding</surname></string-name> <etal>et al.</etal>
</person-group><italic>,</italic> &#x201C;
<article-title>Time series joins, motifs, discords and shapelets: A unifying view that exploits the matrix profile</article-title>,&#x201D; 
<source>Data Mining and Knowledge Discovery</source>, vol. 
<volume>32</volume>, no. 
<issue>1</issue>, pp. 
<fpage>83</fpage>&#x2013;
<lpage>123</lpage>, 
<year>2018</year>.</mixed-citation>
</ref>
<ref id="ref-20">
<label>[20]</label><mixed-citation publication-type="conf-proc">
<person-group person-group-type="author"><string-name>
<given-names>Y.</given-names> 
<surname>Zhu</surname></string-name>, <string-name>
<given-names>C. M.</given-names> 
<surname>Yeh</surname></string-name>, <string-name>
<given-names>Z.</given-names> 
<surname>Zimmerman</surname></string-name>, <string-name>
<given-names>K.</given-names> 
<surname>Kamgar</surname></string-name> and <string-name>
<given-names>E. J.</given-names> 
<surname>Keogh</surname></string-name>
</person-group>, &#x201C;
<article-title>Matrix profile XI: SCRIMP&#x002B;&#x002B;: Time series motif discovery at interactive speeds</article-title>,&#x201D; in <conf-name>Proc. ICDM</conf-name>, 
<conf-loc>Singapore</conf-loc>, pp. 
<fpage>837</fpage>&#x2013;
<lpage>846</lpage>, 
<year>2018</year>.</mixed-citation>
</ref>
<ref id="ref-21">
<label>[21]</label><mixed-citation publication-type="other">
<person-group person-group-type="author"><string-name>
<given-names>G.</given-names> 
<surname>Pfeilschifter</surname></string-name>
</person-group>, &#x201C;
<article-title>Time series analysis with matrix profile on HPC systems</article-title>,&#x201D; 
<comment>Ph.D. dissertation</comment>. 
<publisher-name>Department of Informatics, Technical University of Munich</publisher-name>, 
<publisher-loc>Munich, Bavaria, Germany</publisher-loc>, 
<year>2019</year>.</mixed-citation>
</ref>
<ref id="ref-22">
<label>[22]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>D. F.</given-names> 
<surname>Bacon</surname></string-name>, <string-name>
<given-names>S. L.</given-names> 
<surname>Graham</surname></string-name> and <string-name>
<given-names>O. J.</given-names> 
<surname>Sharp</surname></string-name>
</person-group>, &#x201C;
<article-title>Compiler transformations for high-performance computing</article-title>,&#x201D; 
<source>ACM Computing Surveys</source>, vol. 
<volume>26</volume>, no. 
<issue>4</issue>, pp. 
<fpage>345</fpage>&#x2013;
<lpage>420</lpage>, 
<year>1994</year>.</mixed-citation>
</ref>
<ref id="ref-23">
<label>[23]</label><mixed-citation publication-type="conf-proc">
<person-group person-group-type="author"><string-name>
<given-names>P.</given-names> 
<surname>Kostenetskiy</surname></string-name> and <string-name>
<given-names>P.</given-names> 
<surname>Semenikhina</surname></string-name>
</person-group>, &#x201C;
<article-title>SUSU supercomputer resources for industry and fundamental science</article-title>,&#x201D; in <conf-name>Proc. GloSIC</conf-name>, 
<conf-loc>Chelyabinsk, Russia</conf-loc>, 
<fpage>8570068</fpage>, 
<year>2018</year>.</mixed-citation>
</ref>
<ref id="ref-24">
<label>[24]</label><mixed-citation publication-type="conf-proc">
<person-group person-group-type="author"><string-name>
<given-names>G.</given-names> 
<surname>Chrysos</surname></string-name>
</person-group>, &#x201C;
<article-title>Intel&#x00AE; Xeon Phi coprocessor (codename Knights Corner)</article-title>,&#x201D; in <conf-name>Proc. HCS</conf-name>, 
<conf-loc>Cupertino, CA, USA</conf-loc>, pp. 
<fpage>1</fpage>&#x2013;
<lpage>31</lpage>, 
<year>2012</year>.</mixed-citation>
</ref>
<ref id="ref-25">
<label>[25]</label><mixed-citation publication-type="book">
<person-group person-group-type="author"><string-name>
<given-names>A.</given-names> 
<surname>Grama</surname></string-name>, <string-name>
<given-names>A.</given-names> 
<surname>Gupta</surname></string-name>, <string-name>
<given-names>G.</given-names> 
<surname>Karypis</surname></string-name> and <string-name>
<given-names>V.</given-names> 
<surname>Kumar</surname></string-name>
</person-group>, 
<source>Introduction to Parallel Computing</source>. 
<edition>2nd ed</edition>. 
<publisher-loc>Boston, MA, USA</publisher-loc>: 
<publisher-name>Addison-Wesley</publisher-name>, 
<year>2003</year>.</mixed-citation>
</ref>
<ref id="ref-26">
<label>[26]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>A. L.</given-names> 
<surname>Goldberger</surname></string-name>, <string-name>
<given-names>L. A. N.</given-names> 
<surname>Amaral</surname></string-name>, <string-name>
<given-names>L.</given-names> 
<surname>Glass</surname></string-name>, <string-name>
<given-names>J. M.</given-names> 
<surname>Hausdorff</surname></string-name>, <string-name>
<given-names>P. C.</given-names> 
<surname>Ivanov</surname></string-name> <etal>et al.</etal>
</person-group><italic>,</italic> &#x201C;
<article-title>PhysioBank, PhysioToolkit, and PhysioNet: Components of a new research resource for complex physiologic signals</article-title>,&#x201D; 
<source>Circulation</source>, vol. 
<volume>101</volume>, no. 
<issue>23</issue>, pp. 
<fpage>e215</fpage>&#x2013;
<lpage>e220</lpage>, 
<year>2000</year>.</mixed-citation>
</ref>
<ref id="ref-27">
<label>[27]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>K.</given-names> 
<surname>Pearson</surname></string-name>
</person-group>, &#x201C;
<article-title>The problem of the random walk</article-title>,&#x201D; 
<source>Nature</source>, vol. 
<volume>72</volume>, no. 
<issue>1865</issue>, p. 
<fpage>294</fpage>, 
<year>1905</year>.</mixed-citation>
</ref>
</ref-list>
</back>
</article>