<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CSSE</journal-id>
<journal-id journal-id-type="nlm-ta">CSSE</journal-id>
<journal-id journal-id-type="publisher-id">CSSE</journal-id>
<journal-title-group>
<journal-title>Computer Systems Science &#x0026; Engineering</journal-title>
</journal-title-group>
<issn pub-type="ppub">0267-6192</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">37505</article-id>
<article-id pub-id-type="doi">10.32604/csse.2023.037505</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>An Efficient Way to Parse Logs Automatically for Multiline Events</article-title>
<alt-title alt-title-type="left-running-head">An Efficient Way to Parse Logs Automatically for Multiline Events</alt-title>
<alt-title alt-title-type="right-running-head">An Efficient Way to Parse Logs Automatically for Multiline Events</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author">
<name name-style="western"><surname>Yu</surname><given-names>Mingguang</given-names></name><xref ref-type="aff" rid="aff-1">1</xref><xref ref-type="aff" rid="aff-2">2</xref></contrib>
<contrib id="author-2" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Zhang</surname><given-names>Xia</given-names></name><xref ref-type="aff" rid="aff-1">1</xref><xref ref-type="aff" rid="aff-2">2</xref><email>zhangx@neusoft.com</email></contrib>
<aff id="aff-1"><label>1</label><institution>School of Computer Science and Engineering, Northeastern University</institution>, <addr-line>Shenyang, 110169</addr-line>, <country>China</country></aff>
<aff id="aff-2"><label>2</label><institution>Neusoft Corporation</institution>, <addr-line>Shenyang, 110179</addr-line>, <country>China</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Xia Zhang. Email: <email>zhangx@neusoft.com</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2023</year></pub-date>
<pub-date date-type="pub" publication-format="electronic">
<day>31</day>
<month>3</month>
<year>2023</year>
</pub-date>
<volume>46</volume>
<issue>3</issue>
<fpage>2975</fpage>
<lpage>2994</lpage>
<history>
<date date-type="received">
<day>06</day>
<month>11</month>
<year>2022</year>
</date>
<date date-type="accepted">
<day>06</day>
<month>1</month>
<year>2023</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2023 Yu and Zhang</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Yu and Zhang</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CSSE_37505.pdf"></self-uri>
<abstract>
<p>In order to obtain information or discover knowledge from system logs, the first step is to perform log parsing, whereby unstructured raw logs are transformed into a sequence of structured events. Although comprehensive studies on log parsing have been conducted in recent years, most assume that one event object corresponds to a single-line message. However, in a growing number of scenarios, one event object spans multiple lines in the log, for which parsing methods aimed at single-line events are not applicable. To address this problem, this paper proposes an automated <bold>l</bold>og <bold>p</bold>arsing method for <bold>m</bold>ultiline <bold>e</bold>vents (LPME). LPME finds multiline event objects via iterative scanning, driven by a set of heuristic rules derived from practice. The advantage of LPME is that it proposes a cohesion-based evaluation method for multiline events and a bottom-up search approach that eliminates the need to enumerate all combinations. We analyze the algorithmic complexity of LPME and validate it on four datasets from different backgrounds. Evaluations show that the actual time complexity of LPME for parsing multiline events is close to constant time, which enables it to handle large-scale sample inputs. On the experimental datasets, LPME achieves a recall of 1.0, and its precision is generally above 0.9, which demonstrates the effectiveness of the proposed method.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Log parsing</kwd>
<kwd>log management</kwd>
<kwd>log analysis</kwd>
<kwd>system maintenance</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<sec id="s1_1">
<label>1.1</label>
<title>Background</title>
<p>Modern large-scale information systems continuously generate a substantial volume of log data. These data record the system running state, operation results, business processes, and detailed information on system exceptions. Thus, log analysis techniques have attracted considerable attention from researchers in the past decade. Many distinguished works have emerged, including detecting program running exceptions [<xref ref-type="bibr" rid="ref-1">1</xref>,<xref ref-type="bibr" rid="ref-2">2</xref>], monitoring network failures and traffic [<xref ref-type="bibr" rid="ref-3">3</xref>,<xref ref-type="bibr" rid="ref-4">4</xref>], diagnosing performance bottlenecks [<xref ref-type="bibr" rid="ref-5">5</xref>], and analyzing business [<xref ref-type="bibr" rid="ref-6">6</xref>] and user behavior [<xref ref-type="bibr" rid="ref-7">7</xref>].</p>
<p>Logs are printed by logging statements, such as &#x201C;log.info (...)&#x201D; or &#x201C;print (...)&#x201D;, written by programmers. The contents and formats of logs are essentially unconstrained because virtually no restrictions are placed on them while coding. However, most data mining models used in log analysis require structured input. Therefore, the first step of log analysis is log parsing, in which unstructured log messages in plain text are transformed into structured event objects. <xref ref-type="fig" rid="fig-1">Fig. 1</xref> uses logging code from OpenStack&#x2019;s source as an example to illustrate this printing and parsing process: a resource-request event is obtained from the raw log through parsing.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>Illustrative example of log parsing</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_37505-fig-1.tif"/>
</fig>
<p>Traditional approaches to log parsing rely heavily on manually customized regular expressions. However, the rapid growth in log volume has brought unbearable labor and time costs, and handwritten rules can hardly keep up with the frequent updates of modern software systems. For these reasons, many research efforts on automated log parsing techniques have emerged [<xref ref-type="bibr" rid="ref-8">8</xref>&#x2013;<xref ref-type="bibr" rid="ref-13">13</xref>] to overcome the shortcomings of the manual approach.</p>
</sec>
<sec id="s1_2">
<label>1.2</label>
<title>Motivation</title>
<p>With the rapid development of advanced technologies, such as cloud computing and the Internet of Things [<xref ref-type="bibr" rid="ref-14">14</xref>,<xref ref-type="bibr" rid="ref-15">15</xref>], the logs generated by modern information systems are becoming increasingly complex. In the logs output by complex systems, it is common for multiple lines of text to jointly form one event object. This poses new challenges to log parsing, one of which is that existing log parsing efforts assume that each event maps to a single text line in the log.</p>
<p><xref ref-type="fig" rid="fig-2">Fig. 2</xref> compares single-line event parsing and multiline event parsing with an example. The area with a yellow background in the middle of <xref ref-type="fig" rid="fig-2">Fig. 2</xref> shows a sample of raw logs. If parsed into single-line events, as shown by the upward blue arrow in the figure, the six lines of raw log text yield six separate event objects. However, these six lines are strongly correlated and together record a single event; if they are considered separately, the fragmented information hinders subsequent analysis. For example, the sixth line reports a successful claim. If it is not associated with its context, subsequent analysis cannot answer questions such as which specific parameters the claim used, how much memory was requested, and how many CPU cores were requested. However, if the raw log text is correctly parsed into a single multiline event, as shown by the downward green arrow in the figure, these problems do not arise. Therefore, it is essential to determine how to identify multiline events in complex logs.</p>
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>The difference between single-line event parsing and multiline event parsing</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_37505-fig-2.tif"/>
</fig>
<p>To the best of our knowledge, there is no current research on log parsing that focuses on the problem of multiline event parsing. In order to address this problem, this paper innovatively proposes LPME, an automated <bold>l</bold>og <bold>p</bold>arsing method for <bold>m</bold>ultiline <bold>e</bold>vents. LPME is an iterative scanning algorithm based on the results of single-line text templates, and it employs a set of heuristic rules to identify multiline event objects.</p>
</sec>
<sec id="s1_3">
<label>1.3</label>
<title>Contributions</title>
<p>The contributions of this paper are summarized as follows: 1) We present the problem of multiline event parsing, which is explained based on practical experience. To address this problem, we design a multiline event-oriented parsing method called LPME. 2) We perform a thorough analysis of the complexity of the algorithm and conduct experiments on four datasets from different backgrounds to illustrate the effectiveness and feasibility of LPME.</p>
<p>The rest of the paper is organized as follows: Section 2 examines related research. Section 3 illustrates the details of LPME. Section 4 presents the experimental analysis. Finally, we conclude this paper and discuss future work in Section 5.</p>
</sec>
</sec>
<sec id="s2">
<label>2</label>
<title>Related Work</title>
<p>Log analysis plays a vital role in service maintenance. Log parsing is the first step in automated log analysis [<xref ref-type="bibr" rid="ref-8">8</xref>]. Traditional methods of log parsing rely on handcrafted regular expressions or grok patterns to extract event templates. Although straightforward, manually writing ad hoc rules requires a deep understanding of the logs, and considerable manual effort is required to register different rules for various kinds of logs.</p>
<p>In order to reduce the manual effort devoted to log parsing, many studies have investigated automated log parsing [<xref ref-type="bibr" rid="ref-16">16</xref>]. Xu et al. [<xref ref-type="bibr" rid="ref-17">17</xref>] obtained log templates through system source code analysis; however, in most cases, the source code is inaccessible. Therefore, most existing automated methods favor data-driven approaches to analyzing the log data to obtain templates of the events. Data-driven log parsing techniques can be roughly classified into three main categories: frequent pattern mining, clustering, and heuristic rules [<xref ref-type="bibr" rid="ref-8">8</xref>].</p>
<p>Frequent pattern mining is used to discover frequently occurring line templates from event logs; this approach assumes that each event is described by a single line in the event log and that each line pattern represents a group of similar events. Simple logfile clustering tool (SLCT) [<xref ref-type="bibr" rid="ref-18">18</xref>] is the first log parser to utilize frequent pattern mining. Furthermore, LogCluster [<xref ref-type="bibr" rid="ref-19">19</xref>] is an extension of SLCT that is robust to shifts in token positions. Logram [<xref ref-type="bibr" rid="ref-9">9</xref>] uses n-gram dictionaries to achieve efficient log parsing; it parses log messages into static text and a dynamic variable by counting the number of appearances of each n-gram. Paddy [<xref ref-type="bibr" rid="ref-10">10</xref>] clusters log messages incrementally according to Jaccard similarity and length features. It uses a dynamic dictionary structure to search template candidates efficiently.</p>
<p>The second category is cluster-based approaches, which formulate log parsing as a clustering problem and use various techniques to measure the similarity and distance between two log messages (e.g., log key extraction (LKE) [<xref ref-type="bibr" rid="ref-20">20</xref>], LogSig [<xref ref-type="bibr" rid="ref-21">21</xref>], LogMine [<xref ref-type="bibr" rid="ref-22">22</xref>], and length matters (LenMa) [<xref ref-type="bibr" rid="ref-23">23</xref>]). For example, LKE employs a hierarchical clustering algorithm based on the weighted edit distance between pairwise log messages.</p>
<p>The last category is heuristic approaches, which perform well in terms of accuracy and efficiency. Compared to general text data, log messages have some unique characteristics, which heuristic methods exploit to extract event templates. For example, Drain [<xref ref-type="bibr" rid="ref-24">24</xref>] uses a fixed-depth tree to represent the hierarchical relationship between log messages, where each tree layer defines a rule for grouping log messages (e.g., log message length, preceding tokens, and token similarity). Iterative partitioning log mining (IPLoM) [<xref ref-type="bibr" rid="ref-25">25</xref>] applies an iterative partition strategy that partitions log messages into groups according to the token amount, token position, and mapping relation. Abstracting execution logs (AEL) [<xref ref-type="bibr" rid="ref-26">26</xref>] groups logs by comparing the occurrence counts of constants and variables and then obtains log templates from groups that share the same static components.</p>
<p>In addition to the above three categories, other data-driven methods exist. For example, Spell [<xref ref-type="bibr" rid="ref-11">11</xref>] is an online log parsing method based on the longest common subsequence (LCS) that can extract log templates dynamically in real time. Reference [<xref ref-type="bibr" rid="ref-12">12</xref>] builds a graph in which each node represents a log message and clusters logs according to the word count and Hamming similarity. LogParse [<xref ref-type="bibr" rid="ref-13">13</xref>] transforms the log parsing problem into a word classification problem.</p>
<p>Unfortunately, the above research assumes that one event object corresponds to a single text line. However, as mentioned in Section 1.2, a single event often corresponds to multiple text lines in many complex logs. To the best of our knowledge, there has been no research work on multiline event parsing. Therefore, in this paper, we propose multiline event parsing and design a solution based on existing work to fill this gap.</p>
</sec>
<sec id="s3">
<label>3</label>
<title>Methods</title>
<p>In this section, we first present the main process of LPME and then describe each sub-step separately before finally analyzing the algorithm complexity.</p>
<sec id="s3_1">
<label>3.1</label>
<title>Algorithm Overview</title>
<p>Can the problem of multiline event parsing be solved by simply generalizing existing single-line event-oriented parsing methods? Our answer is no, because multiline event parsing implies a sub-problem of line division: an &#x201C;event&#x201D; can be composed of <italic>x</italic> lines of text, where <italic>x</italic> is indeterminate. Therefore, building on previous related research, this paper proposes LPME, a log parsing method for multiline events. <xref ref-type="fig" rid="fig-3">Fig. 3</xref> illustrates the LPME framework. LPME is a two-stage process with the phases &#x201C;Phase-1: Parsing Templates for Single Lines&#x201D; and &#x201C;Phase-2: Discovering Templates of Multiline&#x201D;. The initial input to LPME, i.e., the &#x201C;Raw Logs Sample,&#x201D; is a continuous <italic>n</italic>-line sample taken from the original log. Phase 1 can process the &#x201C;Raw Logs Sample&#x201D; with any existing single-line parsing method, such as AEL [<xref ref-type="bibr" rid="ref-26">26</xref>] or IPLoM [<xref ref-type="bibr" rid="ref-25">25</xref>], and even online parsing methods, such as Drain [<xref ref-type="bibr" rid="ref-24">24</xref>] or Spell [<xref ref-type="bibr" rid="ref-11">11</xref>]. LPME treats the output of Phase 1 as an intermediate result, which is input into Phase 2 to obtain the multiline templates. Since Phase 1 can reuse existing methods, this paper focuses on Phase 2.</p>
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Framework of LPME</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_37505-fig-3.tif"/>
</fig>
<p>For convenience, we use <italic>S</italic> to represent the &#x201C;Middle Results&#x201D; in <xref ref-type="fig" rid="fig-3">Fig. 3</xref>. <italic>S</italic> is the output of Phase 1 and serves as the input to Phase 2. <italic>S</italic> can be regarded as a sequence of length <italic>n</italic> whose elements correspond one-to-one to the original <italic>n</italic> lines of the &#x201C;Raw Logs Sample&#x201D;. Each element includes the original text line and its single-line template. If sliding windows of size 2, 3, 4, &#x2026;, <italic>n</italic> are used to scan <italic>S</italic> with a step size of 1 and the window fragments obtained during the scan are collected, then <italic>n(n-1)/2</italic> window fragments are obtained. We use the set <italic>W</italic> &#x003D; {<italic>w</italic>| <italic>w</italic> represents all possible sliding window fragments of size 2 to <italic>n</italic>} to describe the result of the above sliding scan. An element <italic>w</italic> in <italic>W</italic> has three key attributes: 1) the single-line template sequence, <italic>subseq</italic>, captured by the sliding window, 2) the start timestamp <italic>ts</italic><sub><italic>s</italic></sub> of the window sequence, and 3) the end timestamp <italic>ts</italic><sub><italic>e</italic></sub> of the window sequence. Each element of <italic>W</italic> can thus be represented as the triplet &#x003C;<italic>subseq, ts</italic><sub><italic>s</italic></sub>, <italic>ts</italic><sub><italic>e</italic></sub>&#x003E;. Collecting the <italic>subseq</italic> attributes of all elements of <italic>W</italic> and removing duplicates yields the set <italic>SET</italic><sub><italic>seqs</italic></sub>. The focus of this paper is to find all <italic>subseq</italic> in <italic>SET</italic><sub><italic>seqs</italic></sub> that are indeed multiline events. The final results can be organized into a hash table <italic>R</italic> whose keys range over {<italic>k</italic> | <italic>k</italic> is an integer and 2 &#x2264; <italic>k</italic> &#x2264; <italic>n</italic>}. <italic>R</italic>[<italic>k</italic>] is a set, each element of which represents a <italic>k</italic>-line event template. A <italic>k</italic>-line event template can be expressed as <italic>template</italic><sub arrange="stack"><italic>k</italic></sub><sup arrange="stack"><italic>i</italic></sup> (<italic>0 &#x2264; i &#x2264; R</italic>[<italic>k</italic>]<italic>.size</italic>), which records the corresponding <italic>subseq</italic> and some attached parameters representing the multiline event template.</p>
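<p>The sliding-window enumeration described above can be sketched as follows. This is a minimal illustration, not the authors&#x2019; implementation; in particular, representing each element of <italic>S</italic> as a (template_id, timestamp) pair is an assumption for the sketch.</p>

```python
def collect_window_fragments(S):
    """Enumerate all sliding-window fragments of size 2..n over S.

    S is assumed to be a list of (template_id, timestamp) pairs, one per
    raw log line (the "Middle Results" of Phase 1).  Returns the list W
    of <subseq, ts_s, ts_e> triplets and the deduplicated set SET_seqs.
    """
    n = len(S)
    W = []
    for size in range(2, n + 1):             # window sizes 2, 3, ..., n
        for start in range(n - size + 1):    # slide with step size 1
            frag = S[start:start + size]
            subseq = tuple(tpl for tpl, _ in frag)
            W.append((subseq, frag[0][1], frag[-1][1]))
    SET_seqs = {subseq for subseq, _, _ in W}
    return W, SET_seqs

# n = 4 single-line templates -> n(n-1)/2 = 6 window fragments
S = [("A", 0.0), ("B", 0.1), ("C", 0.2), ("A", 0.3)]
W, SET_seqs = collect_window_fragments(S)
```

Note that the quadratic number of fragments is exactly why the full enumeration is avoided in practice; the bottom-up search described below visits only a small part of this space.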
<p>From a practical perspective, logs requiring multiline template analysis usually originate from complex systems and are often big data. In practice, we also tend to draw large samples, so <italic>S</italic> may be a huge sequence; for example, its length may be on the order of 10<sup>4</sup> or even 10<sup>5</sup>. In this case, if <italic>R</italic> were collected via full scans, the time consumption of the task would be unbearable. Therefore, we must exploit the actual characteristics of logs to optimize the execution efficiency of the algorithm so that the final approach is applicable in production.</p>
<p>Experience from system log analysis indicates that the single-line templates constituting a multiline event are typically different from other ordinary single-line templates; they generally appear only in multiline events. Therefore, a multiline event template manifests as a cohesive-sequence template, with an a priori characteristic similar to that of frequent itemsets. As a result, if the sequence <italic>subseq</italic> is not part of a multiline event, then no superset of <italic>subseq</italic> is part of a multiline event either. Conversely, if <italic>subseq</italic> is indeed a component of a multiline event, then any subset of <italic>subseq</italic> must also be a component of that multiline event. Moreover, we obtain the corollary that if any subset of <italic>subseq</italic> is not a component of any multiline event, then neither is <italic>subseq</italic>.</p>
<p>Based on the above reasoning, LPME is designed to conduct an iterative layer-by-layer search. The mining results are organized into the hash table <italic>R</italic>. The idea guiding the search is that <italic>R</italic>[<italic>k</italic> &#x002B; 1] must grow from the pre-result <italic>R</italic>[<italic>k</italic>]; that is, <italic>R</italic>[<italic>k</italic>] is fundamental for the derivation and evaluation of <italic>R</italic>[<italic>k</italic> &#x002B; 1].</p>
<p>Next, we detail how candidate templates are evaluated during the iterative process, for which we first define some preliminary concepts. Let <italic>template</italic><sub arrange="stack"><italic>k</italic></sub><sup arrange="stack"><italic>i</italic></sup> be the template to be evaluated. <italic>template</italic><sub arrange="stack"><italic>k</italic></sub><sup arrange="stack"><italic>i</italic></sup> is collected by a sliding window of length <italic>k</italic> and represents an element of the set <italic>W</italic>. In <italic>template</italic><sub arrange="stack"><italic>k</italic></sub><sup arrange="stack"><italic>i</italic></sup>, the single-line sequence <italic>subseq</italic> consists of <italic>s</italic><sub><italic>j</italic></sub> (<italic>j &#x003D; 1, 2, &#x2026; , k</italic>). We define the cohesion support for <italic>template</italic><sub arrange="stack"><italic>k</italic></sub><sup arrange="stack"><italic>i</italic></sup> by component <italic>s</italic><sub><italic>j</italic></sub> as <xref ref-type="disp-formula" rid="eqn-1">Eq. (1)</xref>.</p>
<p><disp-formula id="eqn-1">
<label>(1)</label>
<mml:math id="mml-eqn-1" display="block"><mml:mrow><mml:mi mathvariant="italic">C</mml:mi><mml:mi mathvariant="italic">o</mml:mi><mml:mi mathvariant="italic">h</mml:mi><mml:mi mathvariant="italic">e</mml:mi><mml:mi mathvariant="italic">s</mml:mi><mml:mi mathvariant="italic">i</mml:mi><mml:mi mathvariant="italic">o</mml:mi><mml:mi mathvariant="italic">n</mml:mi><mml:mi mathvariant="italic">S</mml:mi><mml:mi mathvariant="italic">u</mml:mi><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi mathvariant="italic">t</mml:mi><mml:mi mathvariant="italic">e</mml:mi><mml:mi mathvariant="italic">m</mml:mi><mml:mi mathvariant="italic">p</mml:mi><mml:mi mathvariant="italic">l</mml:mi><mml:mi mathvariant="italic">a</mml:mi><mml:mi mathvariant="italic">t</mml:mi></mml:mrow><mml:msubsup><mml:mi>e</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mrow><mml:mi mathvariant="italic">T</mml:mi><mml:mi mathvariant="italic">o</mml:mi><mml:mi mathvariant="italic">t</mml:mi><mml:mi mathvariant="italic">a</mml:mi><mml:mi mathvariant="italic">l</mml:mi><mml:mspace width="thinmathspace" /><mml:mi mathvariant="italic">O</mml:mi><mml:mi mathvariant="italic">c</mml:mi><mml:mi mathvariant="italic">c</mml:mi><mml:mi mathvariant="italic">u</mml:mi><mml:mi mathvariant="italic">r</mml:mi><mml:mi mathvariant="italic">r</mml:mi><mml:mi mathvariant="italic">e</mml:mi><mml:mi mathvariant="italic">n</mml:mi><mml:mi mathvariant="italic">c</mml:mi><mml:mi mathvariant="italic">e</mml:mi></mml:mrow><mml:mspace width="thinmathspace" /><mml:mi>o</mml:mi><mml:mi>f</mml:mi><mml:mspace width="thinmathspace" /><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mspace width="thinmathspace" 
/><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mspace width="thinmathspace" /><mml:mi>R</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mi>k</mml:mi><mml:mo>]</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="italic">T</mml:mi><mml:mi mathvariant="italic">o</mml:mi><mml:mi mathvariant="italic">t</mml:mi><mml:mi mathvariant="italic">a</mml:mi><mml:mi mathvariant="italic">l</mml:mi><mml:mspace width="thinmathspace" /><mml:mi mathvariant="italic">O</mml:mi><mml:mi mathvariant="italic">c</mml:mi><mml:mi mathvariant="italic">c</mml:mi><mml:mi mathvariant="italic">u</mml:mi><mml:mi mathvariant="italic">r</mml:mi><mml:mi mathvariant="italic">r</mml:mi><mml:mi mathvariant="italic">e</mml:mi><mml:mi mathvariant="italic">n</mml:mi><mml:mi mathvariant="italic">c</mml:mi><mml:mi mathvariant="italic">e</mml:mi></mml:mrow><mml:mspace width="thinmathspace" /><mml:mi>o</mml:mi><mml:mi>f</mml:mi><mml:mspace width="thinmathspace" /><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mspace width="thinmathspace" /><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mspace width="thinmathspace" /><mml:mi>S</mml:mi></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
<p>In turn, we can obtain the cohesion coefficient of <italic>template</italic><sub arrange="stack"><italic>k</italic></sub><sup arrange="stack"><italic>i</italic></sup>, i.e., the average of the cohesion supports of its <italic>k</italic> components, according to <xref ref-type="disp-formula" rid="eqn-2">Eq. (2)</xref>.</p>
<p><disp-formula id="eqn-2">
<label>(2)</label>
<mml:math id="mml-eqn-2" display="block"><mml:mrow><mml:mi mathvariant="italic">C</mml:mi><mml:mi mathvariant="italic">o</mml:mi><mml:mi mathvariant="italic">h</mml:mi><mml:mi mathvariant="italic">e</mml:mi><mml:mi mathvariant="italic">s</mml:mi><mml:mi mathvariant="italic">i</mml:mi><mml:mi mathvariant="italic">o</mml:mi><mml:mi mathvariant="italic">n</mml:mi><mml:mi mathvariant="italic">C</mml:mi><mml:mi mathvariant="italic">o</mml:mi><mml:mi mathvariant="italic">e</mml:mi><mml:mi mathvariant="italic">f</mml:mi></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi mathvariant="italic">t</mml:mi><mml:mi mathvariant="italic">e</mml:mi><mml:mi mathvariant="italic">m</mml:mi><mml:mi mathvariant="italic">p</mml:mi><mml:mi mathvariant="italic">l</mml:mi><mml:mi mathvariant="italic">a</mml:mi><mml:mi mathvariant="italic">t</mml:mi></mml:mrow><mml:msubsup><mml:mi>e</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msubsup><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:munderover><mml:mi>C</mml:mi><mml:mrow><mml:mi mathvariant="italic">o</mml:mi><mml:mi mathvariant="italic">h</mml:mi><mml:mi mathvariant="italic">e</mml:mi><mml:mi mathvariant="italic">s</mml:mi><mml:mi mathvariant="italic">i</mml:mi><mml:mi mathvariant="italic">o</mml:mi><mml:mi mathvariant="italic">n</mml:mi><mml:mi mathvariant="italic">S</mml:mi><mml:mi mathvariant="italic">u</mml:mi><mml:mi mathvariant="italic">p</mml:mi></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi mathvariant="italic">t</mml:mi><mml:mi mathvariant="italic">e</mml:mi><mml:mi mathvariant="italic">m</mml:mi><mml:mi mathvariant="italic">p</mml:mi><mml:mi mathvariant="italic">l</mml:mi><mml:mi mathvariant="italic">a</mml:mi><mml:mi mathvariant="italic">t</mml:mi></mml:mrow><mml:msubsup><mml:mi>e</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msubsup><mml:mo>,</mml:mo><mml:msub><mml:mi>s</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mi>k</mml:mi></mml:mfrac></mml:math></disp-formula></p>
<p>Based on the above definition, the main criterion for evaluating candidate <italic>template</italic><sub arrange="stack"><italic>k</italic></sub><sup arrange="stack"><italic>i</italic></sup> can be based on the cohesion coefficient. The core logic is that the higher <italic>CohesionCoef</italic> is, the more likely <italic>template</italic><sub arrange="stack"><italic>k</italic></sub><sup arrange="stack"><italic>i</italic></sup> will be treated as a multiline event template.</p>
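<p>The two measures can be sketched directly from Eqs. (1) and (2). The sketch below is an illustration under an assumption: the occurrence counts for each component <italic>s</italic><sub><italic>j</italic></sub> (inside candidate instances and in all of <italic>S</italic>) are taken as already computed inputs.</p>

```python
def cohesion_sup(occ_in_candidate, occ_in_S):
    """Eq. (1): the fraction of all occurrences of a single-line template
    s_j in S that fall inside instances of the candidate template."""
    return occ_in_candidate / occ_in_S if occ_in_S else 0.0

def cohesion_coef(supports):
    """Eq. (2): the mean cohesion support over the template's components."""
    return sum(supports) / len(supports) if supports else 0.0

# A 3-line candidate whose components occur 10, 10 and 8 times in S,
# of which 10, 9 and 8 occurrences fall inside candidate instances:
sups = [cohesion_sup(10, 10), cohesion_sup(9, 10), cohesion_sup(8, 8)]
coef = cohesion_coef(sups)   # (1.0 + 0.9 + 1.0) / 3
```

A component that also appears frequently outside the candidate drags the coefficient down, which is exactly the cohesion intuition stated above.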
<p><xref ref-type="fig" rid="fig-4">Fig. 4</xref> presents the overall multiline template discovery process, performed layer-by-layer. The process starts with the search and evaluation of the templates in layer <italic>k</italic> &#x003D; 2, and then the search and evaluation of templates in layer <italic>k</italic> (<italic>k</italic> &#x003E; 2) can be executed based on layer <italic>k</italic>&#x2212;1. The final merge step is a bottom-up merge of the preliminary results of the iterative growth to eliminate redundancy due to possible inclusion relationships between adjacent layers.</p>
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>Overall process for multiline template discovery</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_37505-fig-4.tif"/>
</fig>
<p>The core step shown in <xref ref-type="fig" rid="fig-4">Fig. 4</xref> is &#x201C;Search for R[k],&#x201D; which contains two branches for the cases <italic>k &#x003D; 2</italic> and <italic>k &#x003E; 2</italic>. Each branch includes two similar steps, &#x201C;Collect the potential templates&#x201D; and &#x201C;Evaluate and filter the templates,&#x201D; but with subtle differences. The first difference lies in the collection method. When <italic>k</italic> &#x003D; 2, no previous layer results are available, and the collection of items to be evaluated involves all the original adjacent <italic>s</italic><sub><italic>j</italic></sub>, which requires a complete traversal of <italic>S</italic>. When <italic>k</italic> &#x003E; 2, there is no need to completely traverse <italic>S</italic>, since the <italic>k</italic>&#x2212;1 layer results are already available: they are directly expanded to the <italic>k</italic> layer with specific strategies. The second difference lies in the evaluation method. When <italic>k</italic> &#x003D; 2, the cohesion support must be calculated twice separately to obtain the cohesion coefficient, whereas when <italic>k</italic> &#x003E; 2, there is no need to recalculate each cohesion support because the cohesion coefficients of the <italic>k</italic>&#x2212;1 layer results are already known; only one calculation is needed for the growing component.</p>
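<p>The layer-by-layer search can be sketched end-to-end as follows. This is a simplified, hypothetical rendering, not the authors&#x2019; algorithm: single-line templates are reduced to ids, timestamps are omitted, a candidate&#x2019;s occurrences &#x201C;in R[k]&#x201D; are approximated by adjacent runs of the candidate sequence in <italic>S</italic>, and the threshold <italic>theta</italic> is an illustrative parameter.</p>

```python
from collections import Counter

def discover_multiline_templates(S, theta):
    """Layer-by-layer search sketch: build R[2] by a full scan of adjacent
    pairs, then grow R[k] only from the surviving R[k-1] results."""
    n = len(S)
    totals = Counter(S)                 # total occurrences of each line in S

    def run_count(subseq):
        k = len(subseq)
        return sum(1 for i in range(n - k + 1) if tuple(S[i:i + k]) == subseq)

    def coef(subseq):
        c = run_count(subseq)           # occurrences inside candidate runs
        return sum(c / totals[s] for s in subseq) / len(subseq)

    R = {}
    # k = 2: complete traversal of S over all adjacent pairs
    pairs = {tuple(S[i:i + 2]) for i in range(n - 1)}
    R[2] = {p for p in pairs if coef(p) >= theta}
    # k > 2: grow candidates only from the surviving (k-1)-layer results
    k = 3
    while R[k - 1]:
        grown = set()
        for base in R[k - 1]:
            for i in range(n - k + 1):
                if tuple(S[i:i + k - 1]) == base:
                    grown.add(tuple(S[i:i + k]))
        R[k] = {c for c in grown if coef(c) >= theta}
        k += 1
    return {j: v for j, v in R.items() if v}

# A, B, C always co-occur as a run; H is a frequent standalone line.
S = ["H", "A", "B", "C", "H", "A", "B", "C", "H"]
R = discover_multiline_templates(S, theta=0.95)
```

On this toy input, the cohesive run A, B, C survives into R[3] while pairs involving H are filtered out by their low cohesion; the final merge step of Fig. 4 would then remove the R[2] entries subsumed by the three-line template.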
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Detail Processes</title>
<p>This section details the crucial steps of the &#x201C;Search for <italic>R</italic>[<italic>k</italic>]&#x201D; module shown in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>.</p>
<p><bold>Step-A1</bold>: Collect the potential templates belonging to <italic>R</italic>[2] to prepare for the next step of evaluation and screening.</p>
<p>As mentioned previously, <italic>R</italic>[2] is a special stage result in the iterative process, serving as the starting point of the whole iterative exploration. Compared with iterations in the higher layers, the initial collection and determination of the <italic>R</italic>[2] results are slightly different. A more detailed explanation of this step is provided in Algorithm 1, which details the process of scanning <italic>S</italic>. In Algorithm 1, a control parameter, MAX_WINDOW_TS_SPAN, is used in the scan to determine whether two adjacent line templates can possibly join into a two-line template. MAX_WINDOW_TS_SPAN limits the maximum time span between the start and end rows in a multiline event template. The logic is easy to understand: usually, there is no substantial delay between the steps of a single multiline event. According to our experience, MAX_WINDOW_TS_SPAN should be set to 3&#x2013;5 s.</p>
<fig id="fig-8">
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_37505-fig-8.tif"/>
</fig>
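<p>The scan in Step-A1 can be sketched as follows. This is a minimal illustration based on the description above, not a reproduction of Algorithm 1: the representation of <italic>S</italic> as (template, timestamp) pairs and the function name are our assumptions, and the bookkeeping of positions and counts that later steps rely on is omitted.</p>

```python
from collections import Counter

MAX_WINDOW_TS_SPAN = 3.0  # seconds; the paper recommends 3-5 s

def collect_ct2set(S):
    """Collect candidate two-line templates from the pre-parsed sequence S.

    S is assumed to be a list of (line_template, timestamp) pairs produced
    by single-line pre-parsing. Two adjacent lines can only join into a
    two-line candidate if their time span stays within MAX_WINDOW_TS_SPAN.
    """
    ct2set = Counter()
    for (t1, ts1), (t2, ts2) in zip(S, S[1:]):  # sliding window of size 2
        if ts2 - ts1 <= MAX_WINDOW_TS_SPAN:
            ct2set[(t1, t2)] += 1  # count the adjacent template pair
    return ct2set
```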
<p><bold>Step-A2:</bold> Evaluate the candidate set obtained from Step-A1 and acquire the filtered <italic>R</italic>[2].</p>
<p>A detailed description of this step is provided in Algorithm 2. The input of Step-A2 is <italic>ct</italic><sub><italic>2</italic></sub><italic>set</italic>, the output of Step-A1. Since the parameters required for computing the cohesion coefficient are available after Step-A1, Step-A2 does not need to access <italic>S</italic> again; it only needs to traverse <italic>ct</italic><sub><italic>2</italic></sub><italic>set</italic>.</p>
<p>Two necessary conditions exist in the evaluation process. The first condition is whether the occurrence of a multiline template is above the threshold MIN_OCCURRENCE, reflecting that genuine events should exhibit a certain repeatability. If a candidate&#x2019;s statistical occurrence obtained in the preliminary collection is lower than MIN_OCCURRENCE, it is screened out. The second condition concerns the cohesion coefficient of the multiline templates. The higher CohesionCoef is, the more likely it is that the candidate template is a real multiline event template, so MIN_COHESION_COEF is used to screen the multiline candidates. Since the calculation of the cohesion coefficient for two-line templates has no prior basis, the cohesion support must be calculated for each of the two constituent lines. After the cohesion coefficient is obtained, the candidate templates can be filtered according to MIN_COHESION_COEF to obtain the <italic>R</italic>[2] results.</p>
<fig id="fig-9">
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_37505-fig-9.tif"/>
</fig>
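<p>The evaluation in Step-A2 can be sketched as below, assuming (hypothetically) that the cohesion support of a component line is the candidate&#x2019;s occurrence divided by that line&#x2019;s total occurrence in <italic>S</italic>, and that the cohesion coefficient is the minimum of the two supports; the actual definitions in Eq. (2) and Algorithm 2 should be substituted in practice.</p>

```python
MIN_OCCURRENCE = 50       # minimum repeatability of a template
MIN_COHESION_COEF = 0.9   # close to, but less than, the ideal 1.0

def evaluate_ct2set(ct2set, line_counts):
    """Filter two-line candidates into R[2].

    ct2set maps (t1, t2) pairs to occurrence counts (output of Step-A1);
    line_counts maps each single-line template to its total count in S.
    The support/coefficient formulas here are illustrative stand-ins.
    """
    R2 = {}
    for (t1, t2), occ in ct2set.items():
        if occ < MIN_OCCURRENCE:          # not repeatable enough
            continue
        support1 = occ / line_counts[t1]  # cohesion support of first line
        support2 = occ / line_counts[t2]  # cohesion support of second line
        coef = min(support1, support2)
        if coef >= MIN_COHESION_COEF:     # cohesive enough to be one event
            R2[(t1, t2)] = (occ, coef)
    return R2
```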
<p><bold>Step-B1:</bold> Collect all candidate templates that may belong to <italic>R</italic>[<italic>k</italic>] (<italic>k</italic> &#x003E; 2) based on <italic>R</italic>[<italic>k</italic> &#x2212; 1] to prepare for the next step of evaluation and selection.</p>
<p>As mentioned previously, multiline template discovery explores <italic>R</italic>[<italic>k</italic>] based on <italic>R</italic>[<italic>k</italic>&#x2212;1]. Step-A1 and Step-A2 can be regarded as initialization steps; with <italic>R</italic>[2] in hand, the subsequent iterations have their starting conditions. Like the process of collecting and evaluating <italic>R</italic>[2], each subsequent iteration collects the candidates of the current layer and then evaluates them for filtering. The detailed process of the initial collection of <italic>template</italic><sub><italic>k</italic></sub> in the <italic>R</italic>[<italic>k</italic>] (<italic>k</italic> &#x003E; 2) layer is described in Algorithm 3.</p>
<fig id="fig-10">
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_37505-fig-10.tif"/>
</fig>
<p>Compared to the collection of <italic>ct</italic><sub><italic>2</italic></sub><italic>set</italic> in Step-A1, Step-B1 has support from the lower layer <italic>R</italic>[<italic>k</italic>&#x2212;1]. Therefore, a proper growth strategy can replace the full traversal of <italic>S</italic>, which significantly improves execution efficiency.</p>
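<p>The growth strategy of Step-B1 can be sketched as follows: two (<italic>k</italic>&#x2212;1)-line templates whose suffix and prefix overlap are joined into a <italic>k</italic>-line candidate, so no traversal of <italic>S</italic> is needed. This is an illustrative sketch of the prefix/suffix matching idea only; Algorithm 3 additionally counts occurrences against the indexes built in earlier steps, which is omitted here.</p>

```python
def grow_candidates(R_prev):
    """Grow k-line candidates from the (k-1)-layer result R_prev.

    R_prev is a set of (k-1)-tuples of line templates. Templates p and q
    join when p's (k-2)-line suffix equals q's (k-2)-line prefix.
    """
    ctkset = set()
    for p in R_prev:
        for q in R_prev:
            if p[1:] == q[:-1]:           # suffix/prefix overlap
                ctkset.add(p + (q[-1],))  # extend p by q's last line
    return ctkset
```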
<p><bold>Step-B2:</bold> Evaluate the <italic>ct</italic><sub><italic>k</italic></sub><italic>set</italic> output from Step-B1 to obtain the filtered <italic>R</italic>[<italic>k</italic>] (<italic>k</italic> &#x003E; 2) result.</p>
<p>In contrast to the evaluation for <italic>R</italic>[2], we do not need to compute the cohesion support for each component line of the candidate template in <italic>ct</italic><sub><italic>k</italic></sub><italic>set</italic>. Since the <italic>R</italic>[<italic>k</italic>&#x2212;1] result is already determined, for each candidate template in <italic>ct</italic><sub><italic>k</italic></sub><italic>set</italic>, we only need to compute the cohesion support of the last grown line based on the <italic>k</italic>&#x2212;1 layer. A detailed description of this sub-process is provided in Algorithm 4.</p>
<fig id="fig-11">
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_37505-fig-11.tif"/>
</fig>
<p><bold>Step-C:</bold> Review result <italic>R</italic> to eliminate redundant items.</p>
<p>Since <italic>R</italic> is obtained by layer-by-layer growth, there may be situations in which higher-layer templates contain lower-layer templates. Therefore, <italic>R</italic> must be reviewed after its initial acquisition to eliminate redundancy. The situation may be complicated: for example, assume <italic>R</italic>[3] contains <italic>s</italic><sub><italic>u</italic></sub><italic>s</italic><sub><italic>v</italic></sub><italic>s</italic><sub><italic>w</italic></sub> and <italic>R</italic>[4] contains <italic>s</italic><sub><italic>u</italic></sub><italic>s</italic><sub><italic>v</italic></sub><italic>s</italic><sub><italic>w</italic></sub><italic>s</italic><sub><italic>x</italic></sub> and <italic>s</italic><sub><italic>u</italic></sub><italic>s</italic><sub><italic>v</italic></sub><italic>s</italic><sub><italic>w</italic></sub><italic>s</italic><sub><italic>y</italic></sub>. The correct result has two possible orientations: 1) <italic>s</italic><sub><italic>u</italic></sub><italic>s</italic><sub><italic>v</italic></sub><italic>s</italic><sub><italic>w</italic></sub> in <italic>R</italic>[3] is eliminated, and only the two templates in <italic>R</italic>[4] are kept; 2) <italic>s</italic><sub><italic>u</italic></sub><italic>s</italic><sub><italic>v</italic></sub><italic>s</italic><sub><italic>w</italic></sub> in <italic>R</italic>[3] has grounds to be kept independently, so all three templates are kept. To accurately eliminate the redundancy, <italic>R</italic> must be reviewed from the bottom up. The detailed process is described in Algorithm 5.</p>
<fig id="fig-12">
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_37505-fig-12.tif"/>
</fig>
<p>Algorithm 5 describes the process of eliminating from <italic>R</italic> the possible redundancies arising from inclusion relations between adjacent layers. Specifically, for a template in the lower of two adjacent layers, the total number of occurrences of the higher-layer templates extending from it is calculated, and the redundancy of the targeted lower-layer template is determined by comparing this total with the template&#x2019;s own occurrence number. If the two occurrence numbers are close, the targeted lower-layer template is judged redundant; otherwise, it is kept as a valid result. A threshold parameter, GROWTH_FACTOR, is used to control the trade-off when making interlayer redundancy judgments. Notably, the algorithm can benefit from support information obtained in the preceding steps to determine whether the inclusion relation holds. With Step-C, the redundant subsequences in <italic>R</italic> are eliminated from the low to the high layers.</p>
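<p>The interlayer redundancy judgment can be sketched as below, under the assumption that a lower-layer template is redundant when the occurrences of its higher-layer extensions account for nearly all of its own occurrences, with GROWTH_FACTOR as the tolerance; the exact comparison rule of Algorithm 5 may differ.</p>

```python
GROWTH_FACTOR = 0.02  # tolerance for the interlayer redundancy judgment

def review(R):
    """Bottom-up redundancy elimination over R (Step-C sketch).

    R maps a layer number k to a dict {template_tuple: occurrence}.
    A layer-k template is dropped when its layer-(k+1) extensions explain
    (almost) all of its occurrences.
    """
    for k in sorted(R):                  # review from low to high layers
        if k + 1 not in R:
            continue
        for tpl in list(R[k]):
            grown = sum(occ for higher, occ in R[k + 1].items()
                        if higher[:k] == tpl)  # extensions of tpl
            if grown >= R[k][tpl] * (1 - GROWTH_FACTOR):
                del R[k][tpl]            # redundant lower-layer template
    return R
```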
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>Algorithm Parameters</title>
<p>In this subsection, the control parameters involved in LPME are briefly described. LPME has four critical control parameters that make it more widely applicable. Users can configure these parameters with reasonable values for different context environments to drive LPME to produce optimal results. The four parameters belong to two categories: the parameter used to control the capacity of the sliding window, namely, MAX_WINDOW_TS_SPAN, and those used to adjust the evaluation criteria for filtering the initial collection of multiline template candidates, namely, MIN_OCCURRENCE, MIN_COHESION_COEF, and GROWTH_FACTOR.</p>
<p>MAX_WINDOW_TS_SPAN limits the maximum time span between the start and end rows in a multiline event template. Usually, there is no substantial printing latency between the multiple lines recording the same event. According to our experience, MAX_WINDOW_TS_SPAN should be set to 3&#x2013;5 s.</p>
<p>In order to filter out incorrect multiline template candidates, a frequency threshold must be defined according to field experience. This threshold is named MIN_OCCURRENCE. If the evaluated items have a lower frequency than MIN_OCCURRENCE, they will be filtered out because they are not sufficiently representative.</p>
<p>When evaluating multiline event template candidates, the most crucial evaluation indicator is the value of <italic>CohesionCoef</italic>: the higher <italic>CohesionCoef</italic> is, the more likely it is that the candidate template is a real multiline event template. According to the definition in <xref ref-type="disp-formula" rid="eqn-2">Eq. (2)</xref> in Section 3.1, the range of the cohesion support is [0.0, 1.0]. The larger the cohesion support value of the evaluated multiline event, the more likely it is to be true. MIN_COHESION_COEF is used to screen the multiline candidates. However, according to practical experience, the ideal value of 1.0 is not suitable as the real evaluation criterion. Noise interference is inevitable in a production environment. Additionally, the discovery of multiline templates is based on the recognition of single-line templates, and that baseline is usually not 100% exact. Therefore, MIN_COHESION_COEF should be set to a value less than but close to 1.</p>
<p>In the iterative layer-by-layer search for <italic>R</italic>, since the judgment input of layer <italic>k</italic> &#x002B; 1 originates from layer <italic>k</italic>, once the iteration of layer <italic>k</italic> &#x002B; 1 is complete, it is necessary to check whether there are redundant templates within these two adjacent layers. For example, consider the case where there is one template <italic>s</italic><sub><italic>u</italic></sub><italic>s</italic><sub><italic>v</italic></sub><italic>s</italic><sub><italic>w</italic></sub> at layer 3 and two templates <italic>s</italic><sub><italic>u</italic></sub><italic>s</italic><sub><italic>v</italic></sub><italic>s</italic><sub><italic>w</italic></sub><italic>s</italic><sub><italic>x</italic></sub>/<italic>s</italic><sub><italic>u</italic></sub><italic>s</italic><sub><italic>v</italic></sub><italic>s</italic><sub><italic>w</italic></sub><italic>s</italic><sub><italic>y</italic></sub> at layer 4. There is a trade-off in this case: whether <italic>s</italic><sub><italic>u</italic></sub><italic>s</italic><sub><italic>v</italic></sub><italic>s</italic><sub><italic>w</italic></sub> is retained in the final result. We introduce the GROWTH_FACTOR parameter, which takes a value in [0, 1], to control the trade-off when making interlayer redundancy judgments. The smaller GROWTH_FACTOR is, the more likely it is that only the low-layer results are kept; the larger GROWTH_FACTOR is, the more likely it is that only the high-layer results are extended.</p>
</sec>
<sec id="s3_4">
<label>3.4</label>
<title>Complexity Analysis</title>
<p>Based on the explanation of Step-A1 and Step-A2 in Section 3.2, in the computing process for <italic>R</italic> [2], one full scan of <italic>S</italic> is performed using a sliding window of size 2, and a total of (<italic>n</italic>&#x2013;1) calculations are performed. The subsequent computations for <italic>R</italic>[<italic>k</italic>] (<italic>k</italic> &#x003E; 2) do not require another full scan of <italic>S</italic> but are based on the results of <italic>R</italic>[<italic>k</italic>&#x2212;1], which grows by matching the prefixes and suffixes between each pair in <italic>R</italic>[<italic>k</italic>&#x2212;1]. For example, a multiline template, <italic>template</italic><sub><italic>5</italic></sub>, with a length of five, is computed as shown in <xref ref-type="fig" rid="fig-5">Fig. 5</xref>. The figure shows five layers, corresponding to 5 iterations of the layer-by-layer growth calculation. Each dashed box at every layer in <xref ref-type="fig" rid="fig-5">Fig. 5</xref> is an operation of matching and evaluating. The operation can benefit from maintaining the indexes on some required information, including the template prefixes and suffixes, as well as the positions and counts of <italic>s</italic><sub><italic>i</italic></sub> in <italic>S</italic> so that the complexity of the matching and evaluation process can be considered as constant time with order <italic>O</italic>(1). Therefore, in <xref ref-type="fig" rid="fig-5">Fig. 5</xref>, the computation numbers for the &#x003C;<italic>s</italic><sub><italic>1</italic></sub><italic>s</italic><sub><italic>2</italic></sub><italic>s</italic><sub><italic>3</italic></sub><italic>s</italic><sub><italic>4</italic></sub><italic>s</italic><sub><italic>5</italic></sub>&#x003E; template are exactly the number of dashed boxes; in this case, 1 &#x002B; 2 &#x002B; 3 &#x002B; 4 &#x003D; 10. 
Generally, for a multiline template, <italic>template</italic><sub><italic>l</italic></sub>, with length <italic>l</italic>, the number of computations required is 1 &#x002B; 2 &#x002B; 3 &#x002B; &#x00B7; &#x00B7; &#x00B7; &#x002B; (<italic>l</italic>-2) &#x002B; (<italic>l</italic>-1) &#x003D; <italic>l</italic> &#x00D7; (<italic>l</italic>-1)/2. Therefore, the complexity of LPME is not related to the scale of the input <italic>S</italic> but is related to the scale of the output result, including the length l of the multiline template and the number of types of multiline templates <italic>v</italic>. In summary, the time complexity of LPME can be expressed as <inline-formula id="ieqn-55"><mml:math id="mml-ieqn-55"><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mi>l</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>l</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:mspace width="negativethinmathspace" /><mml:mspace width="negativethinmathspace" /><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:math></inline-formula>. From the practical application perspective, the length <italic>l</italic> of the final multiline template is much smaller than the length <italic>n</italic> of <italic>S</italic>, and in most cases, it is at most a few dozen lines. In addition, the number of multiline event types <italic>v</italic> contained in a batch of samples is generally approximately a few dozen. Therefore, the time complexity of the actual execution is on the order of approximately 10<sup>2</sup>. 
Moreover, the length of the input <italic>S</italic> is <italic>n</italic>, and <italic>n</italic> is often on the order of 10<sup>4</sup> or 10<sup>5</sup>. Thus, the time complexity of the actual execution is far less than <italic>O</italic>(<italic>n</italic>). If <italic>l</italic> and <italic>v</italic> are predictable, it is closer to a constant time.</p>
<fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>The computation for a multiline template <italic>template</italic><sub><italic>5</italic></sub></title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_37505-fig-5.tif"/>
</fig>
<p>Concerning space complexity, the total space complexity depends on the size of the intermediate results generated during the traversal process. The intermediate results are the multiline template candidates. For simplicity&#x2019;s sake, each template line is treated as a storage unit, and the space complexity is calculated based on the number of multiline template candidates. The number of multiline template candidates corresponds to the number of dashed boxes in <xref ref-type="fig" rid="fig-5">Fig. 5</xref>. The result is <inline-formula id="ieqn-56"><mml:math id="mml-ieqn-56"><mml:munderover><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:munderover><mml:msub><mml:mi>l</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>l</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:mspace width="negativethinmathspace" /><mml:mspace width="negativethinmathspace" /><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:math></inline-formula>. Therefore, similar to the time complexity, the size of the space occupied by the algorithm is at the same level as the number of final multiline templates present in the input sample. The number of multiline templates in an input sample is extremely limited, generally from a dozen to several dozen. In summary, the space complexity of LPME, on average, is close to the constant space complexity.</p>
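<p>The operation counts above can be checked with a few lines of code; the helper below simply evaluates the sum from the analysis and is not part of LPME itself.</p>

```python
def lpme_cost(lengths):
    """Total matching/evaluation operations for multiline templates of the
    given lengths, per the analysis: sum of l * (l - 1) / 2 over templates."""
    return sum(l * (l - 1) // 2 for l in lengths)

# A five-line template needs 1 + 2 + 3 + 4 = 10 operations, matching the
# number of dashed boxes in Fig. 5.
```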
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Evaluation</title>
<p>In this section, we perform an experimental evaluation of LPME on four real datasets. The experimental results corroborate the theoretical analysis and demonstrate the effectiveness of LPME.</p>
<sec id="s4_1">
<label>4.1</label>
<title>Experimental Setting</title>
<p>The datasets used in our experiments are outlined in <xref ref-type="table" rid="table-1">Table 1</xref>. The essential information of the datasets is as follows.
<list list-type="simple">
<list-item><label>1)</label><p>Windows OS logs are generated by the component-based servicing (CBS) module of Microsoft Windows. The logs record various components&#x2019; loading, updating, and unloading processes.</p>
</list-item>
<list-item><label>2)</label><p>OpenStack logs are generated by OpenStack. The logs record the running status of OpenStack. OpenStack is an open-source cloud computing management platform that manages and controls many computing, storage, and network resources in data centers and provides cloud hosts.</p></list-item>
<list-item><label>3)</label><p>HealthApp logs are generated by a mobile application named HealthApp. The logs trace the running status of this app.</p></list-item>
<list-item><label>4)</label><p>Payment System logs are from the payment system of a commercial bank. These logs record the traces of service calls in the distributed system.</p></list-item>
</list></p>
<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>Summary of the experimental datasets</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th>Log source</th>
<th>Lines</th>
<th>Data size</th>
<th>Time</th>
<th>Accessibility</th>
</tr>
</thead>
<tbody>
<tr>
<td>Windows OS</td>
<td>35,040</td>
<td>4.79 MB</td>
<td>11 days</td>
<td>Public</td>
</tr>
<tr>
<td>OpenStack</td>
<td>52,312</td>
<td>14.7 MB</td>
<td>6 h</td>
<td>Public</td>
</tr>
<tr>
<td>HealthApp</td>
<td>253,395</td>
<td>22.4 MB</td>
<td>10 days</td>
<td>Public</td>
</tr>
<tr>
<td>Payment system</td>
<td>308,388</td>
<td>38.3 MB</td>
<td>48 h</td>
<td>Private</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Datasets 1&#x2013;3 are publicly available. They are obtained from Zhu et al. [<xref ref-type="bibr" rid="ref-8">8</xref>], who reviewed the research in the field of automated log parsing and summarized the datasets used in previous research; from these, we select three representative logs from an operating system, middleware, and an application. In addition to the three public datasets, we test LPME with logs from the payment system of a commercial bank with which we cooperate on a log-driven artificial intelligence for IT operations (AIOPS) project. This dataset cannot be disclosed due to confidentiality. These logs record traces of service calls in a distributed system; the trace of an entire payment transaction consists of multiple single-line logs. Compared to the three publicly available datasets, this dataset is more complex. After a manual review, we obtain the numbers of single-line and multiline templates in these four datasets and list them in <xref ref-type="table" rid="table-2">Table 2</xref>. We have published the multiline templates of the public datasets online [<xref ref-type="bibr" rid="ref-27">27</xref>].</p>
<table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>The statistics of the event templates in the test datasets</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th>Log source</th>
<th>Number of single-line templates</th>
<th>Number of multiline templates</th>
</tr>
</thead>
<tbody>
<tr>
<td>Windows OS</td>
<td>50</td>
<td>2</td>
</tr>
<tr>
<td>OpenStack</td>
<td>43</td>
<td>5</td>
</tr>
<tr>
<td>HealthApp</td>
<td>75</td>
<td>2</td>
</tr>
<tr>
<td>Payment system</td>
<td>104</td>
<td>23</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The settings of the experimental parameters are presented in <xref ref-type="table" rid="table-3">Table 3</xref>. MAX_WINDOW_TS_SPAN, MIN_OCCURRENCE, MIN_COHESION_COEF, and GROWTH_FACTOR are the control parameters of the algorithm, whose values come from empirical optimization. INPUT_SAMPLE_SIZE is the size of the input sample. This parameter should be set appropriately to ensure that the sample contains as many multiline event types as possible.</p>
<table-wrap id="table-3">
<label>Table 3</label>
<caption>
<title>The experimental parameters</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th>Parameters</th>
<th align="center" colspan="4">Experimental datasets</th>
</tr>
<tr>
<th/>
<th>Windows OS</th>
<th>OpenStack</th>
<th>HealthApp</th>
<th>Payment system</th>
</tr>
</thead>
<tbody>
<tr>
<td>MAX_WINDOW_TS_SPAN</td>
<td>3</td>
<td>3</td>
<td>3</td>
<td>3</td>
</tr>
<tr>
<td>MIN_OCCURRENCE</td>
<td>50</td>
<td>60</td>
<td>600</td>
<td>600</td>
</tr>
<tr>
<td>MIN_COHESION_COEF</td>
<td>0.9</td>
<td>0.95</td>
<td>0.98</td>
<td>0.98</td>
</tr>
<tr>
<td>GROWTH_FACTOR</td>
<td>0.01</td>
<td>0.02</td>
<td>0.02</td>
<td>0.02</td>
</tr>
<tr>
<td>INPUT_SAMPLE_SIZE</td>
<td>2,000</td>
<td>2,000</td>
<td>2,000</td>
<td>5,000</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>We use the F1-score to evaluate the effectiveness of LPME. The F1-score is pervasively used in clustering and retrieval algorithm evaluation [<xref ref-type="bibr" rid="ref-28">28</xref>]. Its definition is shown in <xref ref-type="disp-formula" rid="eqn-3">Eq. (3)</xref>.</p>
<p><disp-formula id="eqn-3">
<label>(3)</label>
<mml:math id="mml-eqn-3" display="block"><mml:mi>F</mml:mi><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:mi mathvariant="italic">S</mml:mi><mml:mi mathvariant="italic">c</mml:mi><mml:mi mathvariant="italic">o</mml:mi><mml:mi mathvariant="italic">r</mml:mi><mml:mi mathvariant="italic">e</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>2</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mi mathvariant="italic">P</mml:mi><mml:mi mathvariant="italic">r</mml:mi><mml:mi mathvariant="italic">e</mml:mi><mml:mi mathvariant="italic">c</mml:mi><mml:mi mathvariant="italic">i</mml:mi><mml:mi mathvariant="italic">s</mml:mi><mml:mi mathvariant="italic">i</mml:mi><mml:mi mathvariant="italic">o</mml:mi><mml:mi mathvariant="italic">n</mml:mi></mml:mrow><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mi mathvariant="italic">R</mml:mi><mml:mi mathvariant="italic">e</mml:mi><mml:mi mathvariant="italic">c</mml:mi><mml:mi mathvariant="italic">a</mml:mi><mml:mi mathvariant="italic">l</mml:mi><mml:mi mathvariant="italic">l</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="italic">P</mml:mi><mml:mi mathvariant="italic">r</mml:mi><mml:mi mathvariant="italic">e</mml:mi><mml:mi mathvariant="italic">c</mml:mi><mml:mi mathvariant="italic">i</mml:mi><mml:mi mathvariant="italic">s</mml:mi><mml:mi mathvariant="italic">i</mml:mi><mml:mi mathvariant="italic">o</mml:mi><mml:mi mathvariant="italic">n</mml:mi></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mi mathvariant="italic">R</mml:mi><mml:mi mathvariant="italic">e</mml:mi><mml:mi mathvariant="italic">c</mml:mi><mml:mi mathvariant="italic">a</mml:mi><mml:mi mathvariant="italic">l</mml:mi><mml:mi mathvariant="italic">l</mml:mi></mml:mrow></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
<p>The definitions of precision and recall are given in <xref ref-type="disp-formula" rid="eqn-4">Eqs. (4)</xref> and <xref ref-type="disp-formula" rid="eqn-5">(5)</xref>, respectively.</p>
<p><disp-formula id="eqn-4">
<label>(4)</label>
<mml:math id="mml-eqn-4" display="block"><mml:mrow><mml:mi mathvariant="italic">P</mml:mi><mml:mi mathvariant="italic">r</mml:mi><mml:mi mathvariant="italic">e</mml:mi><mml:mi mathvariant="italic">c</mml:mi><mml:mi mathvariant="italic">i</mml:mi><mml:mi mathvariant="italic">s</mml:mi><mml:mi mathvariant="italic">i</mml:mi><mml:mi mathvariant="italic">o</mml:mi><mml:mi mathvariant="italic">n</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mspace width="negativethinmathspace" /><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mspace width="negativethinmathspace" /><mml:mspace width="negativethinmathspace" /><mml:mrow><mml:mo>(</mml:mo><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>P</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p><disp-formula id="eqn-5">
<label>(5)</label>
<mml:math id="mml-eqn-5" display="block"><mml:mrow><mml:mi mathvariant="italic">R</mml:mi><mml:mi mathvariant="italic">e</mml:mi><mml:mi mathvariant="italic">c</mml:mi><mml:mi mathvariant="italic">a</mml:mi><mml:mi mathvariant="italic">l</mml:mi><mml:mi mathvariant="italic">l</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mspace width="negativethinmathspace" /><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mspace width="negativethinmathspace" /><mml:mspace width="negativethinmathspace" /><mml:mrow><mml:mo>(</mml:mo><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>N</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>In the above formulas, TP (true positive) represents correctly induced multiline events; FP (false positive) represents erroneous inductions that are not real multiline events; FN (false negative) represents results that should be multiline events but are not recognized. The statistics in <xref ref-type="table" rid="table-2">Table 2</xref> are used as the ground truth to calculate the F1-score.</p>
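<p>The metrics of Eqs. (3)&#x2013;(5) can be computed directly from the TP/FP/FN counts, as in this small helper (ours, for illustration):</p>

```python
def f1_score(tp, fp, fn):
    """F1-score per Eqs. (3)-(5): harmonic mean of precision and recall."""
    precision = tp / (tp + fp)  # Eq. (4)
    recall = tp / (tp + fn)     # Eq. (5)
    return 2 * precision * recall / (precision + recall)  # Eq. (3)

# Example: 5 true multiline events all found plus 2 false positives gives
# precision 5/7 and recall 1.0, i.e., F1 = 0.83.
```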

<p>To the best of our knowledge, no other work on automated parsing of multiline events has been published, so no other algorithms with the same purpose are available for comparative analysis. Thus, the experimental work focuses on the effectiveness of LPME for different types and sizes of logs. Theoretically, LPME does not depend on any particular single-line pre-parsing method, and any available single-line template parsing algorithm can be applied. To verify this assumption, for the pre-parsing of single-line logs in LPME, we use AEL [<xref ref-type="bibr" rid="ref-26">26</xref>], IPLoM [<xref ref-type="bibr" rid="ref-25">25</xref>], Drain [<xref ref-type="bibr" rid="ref-24">24</xref>], and Spell [<xref ref-type="bibr" rid="ref-11">11</xref>]. For the implementation of these algorithms, we referenced the published code (available online at https://github.com/logpai/logparser) from [<xref ref-type="bibr" rid="ref-8">8</xref>]. We code the program in Java 8. All the experiments are conducted on a server with an Intel(R) Core(TM) i5-8250U CPU @ 1.60 GHz (up to 1.80 GHz), 16 GB RAM, and Windows 11 installed.</p>
</sec>
<sec id="s4_2">
<label>4.2</label>
<title>Effectiveness</title>
<p>We divided the experiments into four groups according to the pre-parsing method, and each group was validated on the four datasets. The evaluation results are provided in <xref ref-type="table" rid="table-4">Table 4</xref>. As seen from the data in <xref ref-type="table" rid="table-4">Table 4</xref>, the F1-score exceeds 0.9 in most of the experiments. The lower F1 scores within each group occur mainly on the OpenStack dataset because the total number of multiline event types in that dataset is only five, so one or two false-positive results in the parsing output considerably reduce precision. Even in this case, the effectiveness of LPME is sufficient to achieve the target of assisting the manual recognition of multiline events.</p>
<table-wrap id="table-4">
<label>Table 4</label>
<caption>
<title>The evaluation results. The experiments are divided into four groups according to the pre-parsing method, and each group is validated on four datasets</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th>Group no.</th>
<th>Pre-parsing methods</th>
<th>Datasets</th>
<th>F1</th>
<th>Precision</th>
<th>Recall</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td rowspan="4">1</td>
<td rowspan="4">AEL</td>
<td>Windows System</td>
<td>1.00</td>
<td>1.00</td>
<td>1.00</td>
</tr>
<tr>
<td>OpenStack</td>
<td>0.83</td>
<td>0.71</td>
<td>1.00</td>
</tr>
<tr>
<td>HealthApp</td>
<td>1.00</td>
<td>1.00</td>
<td>1.00</td>
</tr>
<tr>
<td>Payment System</td>
<td>0.94</td>
<td>0.88</td>
<td>1.00</td>
</tr>
<tr>
<td rowspan="4">2</td>
<td rowspan="4">IPLoM</td>
<td>Windows System</td>
<td>1.00</td>
<td>1.00</td>
<td>1.00</td>
</tr>
<tr>
<td>OpenStack</td>
<td>0.91</td>
<td>0.83</td>
<td>1.00</td>
</tr>
<tr>
<td>HealthApp</td>
<td>1.00</td>
<td>1.00</td>
<td>1.00</td>
</tr>
<tr>
<td>Payment System</td>
<td>0.96</td>
<td>0.92</td>
<td>1.00</td>
</tr>
<tr>
<td rowspan="4">3</td>
<td rowspan="4">Drain</td>
<td>Windows System</td>
<td>1.00</td>
<td>1.00</td>
<td>1.00</td>
</tr>
<tr>
<td>OpenStack</td>
<td>1.00</td>
<td>1.00</td>
<td>1.00</td>
</tr>
<tr>
<td>HealthApp</td>
<td>1.00</td>
<td>1.00</td>
<td>1.00</td>
</tr>
<tr>
<td>Payment System</td>
<td>0.94</td>
<td>0.88</td>
<td>1.00</td>
</tr>
<tr>
<td rowspan="4">4</td>
<td rowspan="4">Spell</td>
<td>Windows System</td>
<td>1.00</td>
<td>1.00</td>
<td>1.00</td>
</tr>
<tr>
<td>OpenStack</td>
<td>0.91</td>
<td>0.83</td>
<td>1.00</td>
</tr>
<tr>
<td>HealthApp</td>
<td>1.00</td>
<td>1.00</td>
<td>1.00</td>
</tr>
<tr>
<td>Payment System</td>
<td>0.98</td>
<td>0.96</td>
<td>1.00</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>According to the cross-group comparison of the results, different pre-parsing algorithms do not lead to conspicuous differences in the effectiveness of subsequent multiline event parsing. The prerequisite for correctly parsing multiline events is accurate pre-parsing, especially accurate extraction of the single-line templates contained in the multiline events. LPME is independent of the technical route of the chosen pre-parsing algorithm. In our experiments, all four pre-parsing algorithms correctly identified the single-line templates composing the multiline events in each dataset. For the OpenStack dataset, however, precision is lower because some spurious multiline templates are misidentified as true ones.</p>
<p>Overall, LPME achieves acceptable results on the different datasets, sufficient to assist humans in identifying multiline events efficiently and correctly.</p>
</sec>
<sec id="s4_3">
<label>4.3</label>
<title>Performance and Scalability</title>
<p>In the big data era, the log volume is continuously growing. Meanwhile, the INPUT_SAMPLE_SIZE parameter in the above validation experiments is generally set as large as possible to ensure the integrity of the parsing results. This setting places intense pressure on the performance and scalability of log parsing methods, so whether LPME can achieve satisfactory throughput on large-scale parsing samples is critical. To evaluate the scalability of LPME, we gradually increase the input sample size and use several comparison groups to assess the resulting time cost of running LPME. For the Windows System, OpenStack, and HealthApp datasets, INPUT_SAMPLE_SIZE is set to 2, 4, 8, 16, and 32 k. For the payment system dataset, INPUT_SAMPLE_SIZE is set to 5, 10, 20, 40, and 80 k. The experimental results are shown in <xref ref-type="fig" rid="fig-6">Fig. 6</xref>. The time cost recorded in our experiment covers only the multiline event parsing process and does not include pre-parsing. As seen from the data in <xref ref-type="fig" rid="fig-6">Fig. 6</xref>, regardless of the pre-parsing method, the running time of LPME does not increase considerably when the input size doubles; only a slight increase is observed. The main reason for this slight growth is that the larger input scale increases the I/O overhead of the initial load and traversal process. In addition, when comparing different pre-parsing methods on the same dataset, no substantial differences were observed; pre-parsing methods do not considerably affect the runtime of multiline event parsing in LPME. The phenomena observed in the experiment are consistent with the complexity analysis in Section 3.4: the actual running time of LPME correlates not with the size of the input <italic>S</italic> but with the scale of the output.
In our experiments, both the number of multiline template types and their lengths are within an order of 10<sup>2</sup>, so LPME&#x2019;s performance is close to constant time. Another notable point is that, due to the periodicity of logs, an extreme increase in sample size is equivalent to repeating samples. Therefore, once the input samples reach a certain threshold, further increasing the sample size no longer positively impacts the algorithm&#x0027;s effectiveness.</p>
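The measurement protocol above can be sketched as a doubling-size timing harness. This is a minimal illustration, not the paper's code: `parse_multiline_events` is a hypothetical stand-in whose output scale (the set of distinct templates) stays bounded as the periodic input grows, mirroring the behavior the complexity analysis predicts:

```python
import time

def parse_multiline_events(lines):
    # Hypothetical stand-in for LPME's parsing step: the distinct
    # templates (the output) stay bounded even as len(lines) grows,
    # though the initial traversal still touches every line.
    return {line.split()[0] for line in lines}

def measure(sample_sizes, make_sample, runs=10):
    """Average wall-clock time over `runs` repetitions per input size."""
    results = {}
    for n in sample_sizes:
        sample = make_sample(n)
        start = time.perf_counter()
        for _ in range(runs):
            parse_multiline_events(sample)
        results[n] = (time.perf_counter() - start) / runs
    return results

# Periodic logs: doubling the sample mostly repeats existing templates,
# so the output scale (here, 100 distinct keys) does not grow with n.
make_sample = lambda n: [f"evt{i % 100} message body" for i in range(n)]
timings = measure([2000, 4000, 8000, 16000, 32000], make_sample)
```

Under this setup, the per-size averages grow only with the linear traversal cost, which is the same qualitative shape reported in Fig. 6.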
<fig id="fig-6">
<label>Figure 6</label>
<caption>
<title>The scalability test results. The horizontal axis represents different input sizes. The vertical axis represents the running time. The unit of the running time is seconds, and the presented value is the average of ten runs</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_37505-fig-6.tif"/>
</fig>
</sec>
<sec id="s4_4">
<label>4.4</label>
<title>Interference of Noise</title>
<p>Due to factors such as concurrent processes, it cannot be ruled out that some multiline events are interleaved with other unrelated log outputs. Consecutively output multiline events may therefore become discontinuous in the printed log text. We regard this situation as noise disruption and examine the noise tolerance of LPME as follows. We deliberately insert noisy logs into the interior of multiline events to randomly disturb the experimental datasets, controlling the noise level by a probability <italic>p</italic>: the larger <italic>p</italic> is, the more noise is added. For the four datasets, we take <italic>p</italic> &#x003D; 0%, <italic>p</italic> &#x003D; 10%, <italic>p</italic> &#x003D; 15% and <italic>p</italic> &#x003D; 20%. For every configuration, we repeat the test ten times and take the average F1 score for comparison. The results are shown in <xref ref-type="fig" rid="fig-7">Fig. 7</xref>; the result at <italic>p</italic> &#x003D; 0% is the original noise-free result. <xref ref-type="fig" rid="fig-7">Fig. 7</xref> shows that as <italic>p</italic> increases, the F1 score obtained by LPME decreases rapidly. Moreover, the performance of LPME does not degrade linearly with increasing <italic>p</italic>; the decline accelerates. Starting from <italic>p</italic> &#x003D; 15%, LPME can hardly obtain correct parsing results. The reason is that LPME is designed around cohesiveness, and the added noise damages that cohesiveness, so noise can easily cause the algorithm to fail. Although no noise is naturally present in our validation datasets and we created it artificially for these experiments, such noise may exist in other types of system logs. Therefore, this problem will need to be solved in future work.</p>
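The disturbance procedure can be sketched as follows. This is an illustrative implementation of the described protocol, not the paper's code; the traceback lines and the heartbeat noise line are hypothetical examples of a multiline event and an unrelated log message:

```python
import random

def inject_noise(lines, noise_line, p, rng=None):
    """After each original line, insert an unrelated log line with
    probability p, breaking the contiguity of multiline events."""
    rng = rng or random.Random(0)  # fixed seed for reproducible runs
    out = []
    for line in lines:
        out.append(line)
        if rng.random() < p:
            out.append(noise_line)
    return out

# A hypothetical three-line multiline event (a Python traceback)
# interleaved with an unrelated "heartbeat" message at p = 20%.
event = ["Traceback (most recent call last):",
         '  File "app.py", line 3, in run',
         "ValueError: bad input"]
noisy = inject_noise(event, "INFO heartbeat ok", p=0.2)
```

Note that the original lines survive in order; only their contiguity is broken, which is exactly the property the cohesiveness-based search depends on.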
<fig id="fig-7">
<label>Figure 7</label>
<caption>
<title>Testing for noise interference</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_37505-fig-7.tif"/>
</fig>
</sec>
</sec>
<sec id="s5">
<label>5</label>
<title>Conclusion and Future Work</title>
<sec id="s5_1">
<label>5.1</label>
<title>Conclusion and Discussion</title>
<p>Automated log parsing is required for log mining and analysis. However, existing research on automated parsing assumes that each event object corresponds to a single line of log text, which is inconsistent with current application requirements. Building on previous research, this paper proposes LPME, an automated parsing method for multiline events. LPME is a layer-by-layer iterative search algorithm based on heuristic and empirical rules. In addition to the theoretical analysis, we experimentally tested the proposed algorithm on four real datasets: three publicly available datasets and one confidential dataset. The evaluations show that the actual time complexity of LPME when parsing multiline events is close to constant, which enables it to manage large-scale sample inputs. On the experimental datasets, LPME achieves a recall of 1.0, and its precision is generally higher than 0.9. The experimental results corroborate the theoretical analysis and confirm the effectiveness and practicability of LPME.</p>
<p>In addition, we note the following limitations and crucial assumptions. First, as discussed in Section 4.4, LPME currently has limits when dealing with noisy log data: the experimental results show that its performance almost completely dissipates at a noise probability of 15%. Although no naturally occurring noise exists in the datasets used for validation in this paper, various types of noise inevitably exist in other real-world logs, so this issue must be addressed in future work. Second, LPME is essentially an offline batch processing method that requires the user to provide samples, and this paper assumes that the user can select sample logs of appropriate size and content. Although the tests in Section 4.3 show that LPME&#x2019;s execution time scales well with growing sample inputs, so it can handle larger data samples, the results obtained by LPME will be incomplete if the selected samples are not sufficiently representative. Therefore, a more scientific method for guiding users in selecting log samples will be necessary.</p>
</sec>
<sec id="s5_2">
<label>5.2</label>
<title>Future Work</title>
<p>In future work, we will consider the noise interference and sample selection problems mentioned above. We envisage adding a preprocessing module to LPME that preprocesses the full log volume. The purposes of preprocessing are 1) to mark noisy data so that subsequent steps can ignore the noisy log messages and 2) to mark the most reasonable range of log samples, guaranteed to contain all types of log messages without being too large. Machine learning techniques are one possible route for implementing the preprocessing module. In particular, recurrent neural network (RNN) technology with contextual memory, represented by long short-term memory (LSTM) [<xref ref-type="bibr" rid="ref-29">29</xref>,<xref ref-type="bibr" rid="ref-30">30</xref>], naturally matches the sequential characteristics of log message streams and will be one of the technologies worth verifying in the future. These future efforts will enhance LPME&#x2019;s robustness, expand its scope of application, and improve its ease of use.</p>
</sec>
</sec>
</body>
<back>
<sec><title>Funding Statement</title>
<p>The authors received no specific funding for this study.</p>
</sec>
<sec sec-type="COI-statement"><title>Conflicts of Interest</title>
<p>The authors declare that they have no conflicts of interest to report regarding the present study.</p>
</sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Peng</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Sha</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Fu</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>DeepTraLog: Trace-log combined microservice anomaly detection through graph-based deep learning</article-title>,&#x201D; in <conf-name>IEEE/ACM 44th Int. Conf. on Software Engineering (ICSE)</conf-name>, <publisher-loc>Pittsburgh, PA, USA</publisher-loc>, pp. <fpage>623</fpage>&#x2013;<lpage>634</lpage>, <year>2022</year>. </mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Sinha</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Sur</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Sharma</surname></string-name> and <string-name><given-names>A. K.</given-names> <surname>Shrivastava</surname></string-name></person-group>, &#x201C;<article-title>Anomaly detection using system logs: A deep learning approach</article-title>,&#x201D; <source>International Journal of Information Security and Privacy</source>, vol. <volume>16</volume>, no. <issue>1</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>15</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Song</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Meng</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Efficient and robust syslog parsing for network devices in datacenter networks</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>8</volume>, pp. <fpage>30245</fpage>&#x2013;<lpage>30261</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Abolfathi</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Shomorony</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Vahid</surname></string-name> and <string-name><given-names>J. H.</given-names> <surname>Jafarian</surname></string-name></person-group>, &#x201C;<article-title>A game-theoretically optimal defense paradigm against traffic analysis attacks using multipath routing and deception</article-title>,&#x201D; in <conf-name>Proc. of the 27th ACM on Symp. on Access Control Models and Technologies</conf-name>, <publisher-loc>New York, NY, USA</publisher-loc>, pp. <fpage>67</fpage>&#x2013;<lpage>78</lpage>, <year>2022</year>. </mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Ying</surname></string-name> and <string-name><given-names>X.</given-names> <surname>Jia</surname></string-name></person-group>, &#x201C;<article-title>Log data modeling and acquisition in supporting SaaS software performance issue diagnosis</article-title>,&#x201D; <source>International Journal of Software Engineering and Knowledge Engineering</source>, vol. <volume>29</volume>, no. <issue>9</issue>, pp. <fpage>1245</fpage>&#x2013;<lpage>1277</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Mac&#x00E1;k</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Kruzelova</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Chren</surname></string-name> and <string-name><given-names>B.</given-names> <surname>Buhnova</surname></string-name></person-group>, &#x201C;<article-title>Using process mining for Git log analysis of projects in a software development course</article-title>,&#x201D; <source>Education and Information Technologies</source>, vol. <volume>26</volume>, no. <issue>5</issue>, pp. <fpage>5939</fpage>&#x2013;<lpage>5969</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Tao</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Guo</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Shi</surname></string-name> and <string-name><given-names>D.</given-names> <surname>Chu</surname></string-name></person-group>, &#x201C;<article-title>User behavior analysis by cross-domain log data fusion</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>8</volume>, pp. <fpage>400</fpage>&#x2013;<lpage>406</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Zhu</surname></string-name>, <string-name><given-names>S.</given-names> <surname>He</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>P.</given-names> <surname>He</surname></string-name>, <string-name><given-names>Q.</given-names> <surname>Xie</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Tools and benchmarks for automated log parsing</article-title>,&#x201D; in <conf-name>Proc. of the 41st Int. Conf. on Software Engineering: Software Engineering in Practices</conf-name>, <publisher-loc>Montreal, QC, Canada</publisher-loc>, pp. <fpage>121</fpage>&#x2013;<lpage>130</lpage>, <year>2019</year>. </mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Dai</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>C. -S.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Shang</surname></string-name> and <string-name><given-names>T. -H.</given-names> <surname>Chen</surname></string-name></person-group>, &#x201C;<article-title>Logram: Efficient log parsing using n-gram dictionaries</article-title>,&#x201D; <source>IEEE Transactions on Software Engineering</source>, vol. <volume>48</volume>, no. <issue>3</issue>, pp. <fpage>879</fpage>&#x2013;<lpage>892</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Huang</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>C. J.</given-names> <surname>Fung</surname></string-name>, <string-name><given-names>R.</given-names> <surname>He</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Zhao</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Paddy: An event log parsing approach using dynamic dictionary</article-title>,&#x201D; in <conf-name>IEEE/IFIP Network Operations and Management Symp.</conf-name>, <publisher-loc>Budapest, Hungary</publisher-loc>, pp. <fpage>1</fpage>&#x2013;<lpage>8</lpage>, <year>2020</year>. </mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Du</surname></string-name> and <string-name><given-names>F.</given-names> <surname>Li</surname></string-name></person-group>, &#x201C;<article-title>Spell: Online streaming parsing of large unstructured system logs</article-title>,&#x201D; <source>IEEE Transactions on Knowledge and Data Engineering</source>, vol. <volume>31</volume>, no. <issue>11</issue>, pp. <fpage>2213</fpage>&#x2013;<lpage>2227</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Studiawan</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Sohel</surname></string-name> and <string-name><given-names>C.</given-names> <surname>Payne</surname></string-name></person-group>, &#x201C;<article-title>Automatic event log abstraction to support forensic investigation</article-title>,&#x201D; in <conf-name>Proc. of the Australasian Computer Science Week</conf-name>, <publisher-loc>Melbourne, VIC, Australia</publisher-loc>, vol. <volume>1</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>9</lpage>, <year>2020</year>. </mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>W.</given-names> <surname>Meng</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Zaiter</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Chen</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>LogParse: Making log parsing adaptive through word classification</article-title>,&#x201D; in <conf-name>29th Int. Conf. on Computer Communications and Networks</conf-name>, <publisher-loc>Honolulu, HI, USA</publisher-loc>, pp. <fpage>1</fpage>&#x2013;<lpage>9</lpage>, <year>2020</year>. </mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>G.</given-names> <surname>Ayoade</surname></string-name>, <string-name><given-names>A.</given-names> <surname>El-Ghamry</surname></string-name>, <string-name><given-names>V.</given-names> <surname>Karande</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Khan</surname></string-name>, <string-name><given-names>M. F.</given-names> <surname>Alrahmawy</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Secure data processing for IoT middleware systems</article-title>,&#x201D; <source>The Journal of Supercomputing</source>, vol. <volume>75</volume>, no. <issue>8</issue>, pp. <fpage>4684</fpage>&#x2013;<lpage>4709</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Qiu</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Yao</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Jiang</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Guo</surname></string-name> and <string-name><given-names>F.</given-names> <surname>Xu</surname></string-name></person-group>, &#x201C;<article-title>Cloud computing assisted blockchain-enabled internet of things</article-title>,&#x201D; <source>IEEE Transactions on Cloud Computing</source>, vol. <volume>10</volume>, no. <issue>1</issue>, pp. <fpage>247</fpage>&#x2013;<lpage>257</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>El-Masri</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Petrillo</surname></string-name>, <string-name><given-names>Y. -G.</given-names> <surname>Gu&#x00E9;h&#x00E9;neuc</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Hamou-Lhadj</surname></string-name> and <string-name><given-names>A.</given-names> <surname>Bouziane</surname></string-name></person-group>, &#x201C;<article-title>A systematic literature review on automated log abstraction techniques</article-title>,&#x201D; <source>Information and Software Technology</source>, vol. <volume>122</volume>, no. <issue>2</issue>, pp. <fpage>106276</fpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>W.</given-names> <surname>Xu</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Huang</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Fox</surname></string-name>, <string-name><given-names>D. A.</given-names> <surname>Patterson</surname></string-name> and <string-name><given-names>M. I.</given-names> <surname>Jordan</surname></string-name></person-group>, &#x201C;<article-title>Detecting large-scale system problems by mining console logs</article-title>,&#x201D; in <conf-name>Proc. of the 27th Int. Conf. on Machine Learning (ICML-10)</conf-name>, <publisher-loc>Haifa, Israel</publisher-loc>, pp. <fpage>37</fpage>&#x2013;<lpage>46</lpage>, <year>2010</year>. </mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Vaarandi</surname></string-name></person-group>, &#x201C;<article-title>Mining event logs with SLCT and LogHound</article-title>,&#x201D; in <conf-name>IEEE/IFIP Network Operations and Management Symp.: Pervasive Management for Ubiquitous Networks and Services</conf-name>, <publisher-loc>Salvador, Bahia, Brazil</publisher-loc>, pp. <fpage>1071</fpage>&#x2013;<lpage>1074</lpage>, <year>2008</year>. </mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Vaarandi</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Pihelgas</surname></string-name></person-group>, &#x201C;<article-title>LogCluster-a data clustering and pattern mining algorithm for event logs</article-title>,&#x201D; in <conf-name>11th Int. Conf. on Network and Service Management</conf-name>, <publisher-loc>Barcelona, Spain</publisher-loc>, pp. <fpage>1</fpage>&#x2013;<lpage>7</lpage>, <year>2015</year>. </mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>Q.</given-names> <surname>Fu</surname></string-name>, <string-name><given-names>J. -G.</given-names> <surname>Lou</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Wang</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Li</surname></string-name></person-group>, &#x201C;<article-title>Execution anomaly detection in distributed systems through unstructured log analysis</article-title>,&#x201D; in <conf-name>The Ninth IEEE Int. Conf. on Data Mining</conf-name>, <publisher-loc>Miami, Florida, USA</publisher-loc>, pp. <fpage>149</fpage>&#x2013;<lpage>158</lpage>, <year>2009</year>. </mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>L.</given-names> <surname>Tang</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Li</surname></string-name> and <string-name><given-names>C. -S.</given-names> <surname>Perng</surname></string-name></person-group>, &#x201C;<article-title>LogSig: Generating system events from raw textual logs</article-title>,&#x201D; in <conf-name>Proc. of the 20th ACM Conf. on Information and Knowledge Management</conf-name>, <conf-loc>Glasgow, United Kingdom</conf-loc>, pp. <fpage>785</fpage>&#x2013;<lpage>794</lpage>, <year>2011</year>. </mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Hamooni</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Debnath</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Xu</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Jiang</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>LogMine: Fast pattern recognition for log analytics</article-title>,&#x201D; in <conf-name>Proc. of the 25th ACM Int. Conf. on Information and Knowledge Management</conf-name>, <publisher-loc>Indianapolis, IN, USA</publisher-loc>, pp. <fpage>1573</fpage>&#x2013;<lpage>1582</lpage>, <year>2016</year>. </mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>Shima</surname></string-name></person-group>, &#x201C;<article-title>Length matters: Clustering system log messages using length of words</article-title>,&#x201D; <comment>arXiv, 1611.03213</comment>, <fpage>1</fpage>&#x2013;<lpage>10</lpage>, <year>2016</year>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>P.</given-names> <surname>He</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Zhu</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Zheng</surname></string-name> and <string-name><given-names>M. R.</given-names> <surname>Lyu</surname></string-name></person-group>, &#x201C;<article-title>Drain: An online log parsing approach with fixed depth tree</article-title>,&#x201D; in <conf-name>2017 IEEE Int. Conf. on Web Services</conf-name>, <publisher-loc>Honolulu, HI, USA</publisher-loc>, pp. <fpage>33</fpage>&#x2013;<lpage>40</lpage>, <year>2017</year>. </mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Makanju</surname></string-name>, <string-name><given-names>A. N.</given-names> <surname>Zincir-Heywood</surname></string-name> and <string-name><given-names>E. E.</given-names> <surname>Milios</surname></string-name></person-group>, &#x201C;<article-title>A lightweight algorithm for message type extraction in system application logs</article-title>,&#x201D; <source>IEEE Transactions on Knowledge and Data Engineering</source>, vol. <volume>24</volume>, no. <issue>11</issue>, pp. <fpage>1921</fpage>&#x2013;<lpage>1936</lpage>, <year>2012</year>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Z. M.</given-names> <surname>Jiang</surname></string-name>, <string-name><given-names>A. E.</given-names> <surname>Hassan</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Hamann</surname></string-name> and <string-name><given-names>P.</given-names> <surname>Flora</surname></string-name></person-group>, &#x201C;<article-title>An automated approach for abstracting execution logs to execution events</article-title>,&#x201D; <source>Journal of Software Maintenance and Evolution: Research and Practice</source>, vol. <volume>20</volume>, no. <issue>4</issue>, pp. <fpage>249</fpage>&#x2013;<lpage>267</lpage>, <year>2008</year>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><collab>LPME</collab></person-group>, <year>2022</year>. [Online]. Available: <ext-link ext-link-type="uri" xlink:href="https://github.com/yumg/lpme">https://github.com/yumg/lpme</ext-link> </mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>P.</given-names> <surname>He</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Zhu</surname></string-name>, <string-name><given-names>S.</given-names> <surname>He</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Li</surname></string-name> and <string-name><given-names>M. R.</given-names> <surname>Lyu</surname></string-name></person-group>, &#x201C;<article-title>Towards automated log parsing for large-scale log data analysis</article-title>,&#x201D; <source>IEEE Transactions on Dependable and Secure Computing</source>, vol. <volume>15</volume>, no. <issue>6</issue>, pp. <fpage>931</fpage>&#x2013;<lpage>944</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Althobaiti</surname></string-name>, <string-name><given-names>A. A.</given-names> <surname>Alotaibi</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Abdel-Khalek</surname></string-name>, <string-name><given-names>E. M.</given-names> <surname>Abdelrahim</surname></string-name>, <string-name><given-names>R. F.</given-names> <surname>Mansour</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Intelligent data science enabled reactive power optimization of a distribution system, sustainable computing</article-title>,&#x201D; <source>Informatics and Systems</source>, vol. <volume>35</volume>, pp. <fpage>100765</fpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>You</surname></string-name>, <string-name><given-names>Q.</given-names> <surname>Wang</surname></string-name> and <string-name><given-names>C.</given-names> <surname>Sun</surname></string-name></person-group>, &#x201C;<article-title>sBiLSAN: Stacked bidirectional self-attention LSTM network for anomaly detection and diagnosis from system logs</article-title>,&#x201D; in <conf-name>Intelligent Systems and Applications-Proc. of the 2021 Intelligent Systems Conf.</conf-name>, <publisher-loc>Amsterdam, The Netherlands</publisher-loc>, vol. <volume>296</volume>, pp. <fpage>777</fpage>&#x2013;<lpage>793</lpage>, <year>2021</year>. </mixed-citation></ref>
</ref-list>
</back>
</article>