<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">35741</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2023.035741</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Deep Learning Based Cyber Event Detection from Open-Source Re-Emerging Social Data</article-title>
<alt-title alt-title-type="left-running-head">Deep Learning Based Cyber Event Detection from Open-Source Re-Emerging Social Data</alt-title>
<alt-title alt-title-type="right-running-head">Deep Learning Based Cyber Event Detection from Open-Source Re-Emerging Social Data</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Mohammad</surname><given-names>Farah</given-names>
</name><xref ref-type="aff" rid="aff-1">1</xref><email>fsheikh@ksu.edu.sa</email></contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Al-Ahmadi</surname><given-names>Saad</given-names>
</name><xref ref-type="aff" rid="aff-2">2</xref></contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western"><surname>Al-Muhtadi</surname><given-names>Jalal</given-names>
</name><xref ref-type="aff" rid="aff-1">1</xref>
<xref ref-type="aff" rid="aff-2">2</xref></contrib>
<aff id="aff-1"><label>1</label><institution>Center of Excellence in Information Assurance (CoEIA), King Saud University</institution>, <addr-line>Riyadh, 11543</addr-line>, <country>Saudi Arabia</country></aff>
<aff id="aff-2"><label>2</label><institution>College of Computer &#x0026; Information Sciences, King Saud University</institution>, <addr-line>Riyadh, 11543</addr-line>, <country>Saudi Arabia</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Farah Mohammad. Email: <email>fsheikh@ksu.edu.sa</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2023</year></pub-date>
<pub-date date-type="pub" publication-format="electronic"><day>30</day><month>8</month><year>2023</year></pub-date>
<volume>76</volume>
<issue>2</issue>
<fpage>1423</fpage>
<lpage>1438</lpage>
<history>
<date date-type="received"><day>01</day><month>9</month><year>2022</year>
</date>
<date date-type="accepted"><day>12</day><month>11</month><year>2022</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2023 Mohammad, Al-Ahmadi, Al-Muhtadi</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Mohammad, Al-Ahmadi, Al-Muhtadi</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_35741.pdf"></self-uri>
<abstract>
<p>Social media forums have emerged as the most popular form of communication in the modern technology era, allowing people to discuss topics and express their opinions. As a result, the amount of material shared on social media sites keeps growing, and such open data sources contain a wealth of information about threats. The security of deployed software and systems relies heavily on the timely detection of newly emerging threats that can be gleaned from this information. Although several models for detecting cybersecurity events have been presented, extracting security events from the vast amounts of unstructured text in public data sources remains challenging. Most currently available methods concentrate on detecting events that have a high number of dimensions, because the unstructured text in open data sources typically contains a large number of dimensions. However, to react to attacks faster than attackers can launch them, security analysts and information technology operators need to be aware of critical security events as soon as possible, regardless of how often those events are reported. This research presents a new event detection method that can swiftly identify significant security events from open forums such as Twitter. The proposed work identifies new threats and the revival of an attack or related event, independent of the volume of mentions of those events on Twitter. Deep learning is used to extract predictive features from open-source text. The proposed model consists of data collection, data transformation, deep learning-based feature extraction, Latent Dirichlet Allocation (LDA) based medium-level cyber-event detection, and final Google Trends-based high-level cyber-event detection. The proposed technique has been evaluated on several datasets. Experimental results show that the proposed method outperforms existing methods in detecting cyber events, achieving 95.96% accuracy.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Social media</kwd>
<kwd>Twitter</kwd>
<kwd>cyber events</kwd>
<kwd>deep learning</kwd>
</kwd-group>
<funding-group>
<award-group id="awg1">
<funding-source>Center of Excellence in Information Assurance (CoEIA), KSU</funding-source>
</award-group>
</funding-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>The number of cyberattacks and data theft events is rising rapidly. Technological innovation and internet-based platforms have significantly increased the threats to society and the economy [<xref ref-type="bibr" rid="ref-1">1</xref>]. Many software companies and organizations still lack sophisticated security measures and threat detection mechanisms. Estimates indicate that cybercrimes and cyberattacks are becoming more serious and more frequent, and businesses face several difficulties as a result. Machine learning models, tools, and applications have mostly been developed to address this problem; nevertheless, a model that can capture such threats is still needed [<xref ref-type="bibr" rid="ref-2">2</xref>].</p>
<p>Another reason cyberattacks go undetected is the lack of information about what attackers communicate and plan on various forums before carrying out an assault. Messages posted on public forums are therefore seen as a crucial resource for dealing with such situations, and numerous ongoing studies have found that such media have a significant impact. A remarkable study by Khatoon et al. [<xref ref-type="bibr" rid="ref-3">3</xref>] examined the possibility of using Twitter posts to alert the Japanese population to an impending earthquake. The findings indicate that tweets communicated the warning more effectively and more quickly than the official announcements made by the Japan Meteorological Agency (JMA) [<xref ref-type="bibr" rid="ref-4">4</xref>]. This study illustrates a change in human behaviour caused by the use of such tools to gather specific information.</p>
<p>Sharing hacker services, such as disseminating malicious software and software vulnerabilities, has become common practice on social media. Hackers exploit these system flaws to compromise an organization&#x0027;s security network and carry out undesirable actions, including stealing confidential data, spying and espionage, and launching distributed denial-of-service attacks [<xref ref-type="bibr" rid="ref-5">5</xref>]. A prime example occurred on October 14, 2014, when 254 unique software flaws belonging to multiple vendors, including Adobe, Oracle, and Microsoft, were made public on discussion boards [<xref ref-type="bibr" rid="ref-6">6</xref>]. This incident occurred because defenders were unable to access the hackers&#x0027; prior communications.</p>
<p>A substantial body of research has examined Open Source Intelligence (OSINT). These analyses indicate that the cybersecurity industry offers a variety of scenarios based on offensive and defensive methods that may be sufficient to secure a business. At the same time, OSINT also depends on hackers&#x0027; conversations on social media sites. These studies also address Twitter&#x0027;s role in significant cyber events, such as the publication of multiple zero-day distributed denial-of-service (DDoS) vulnerabilities in Microsoft Windows, user reports of various DDoS attacks, the publication of sensitive data, and the origins of ransomware operations [<xref ref-type="bibr" rid="ref-7">7</xref>,<xref ref-type="bibr" rid="ref-8">8</xref>].</p>
<p>Shin et al. [<xref ref-type="bibr" rid="ref-9">9</xref>] noted that open data sources are a valuable source of threat information. Their work highlights that a crucial aspect of securing installed software and systems is the early detection of emerging security threats from such information. They proposed a novel event detection method that, regardless of the volume of mentions, can promptly identify significant security events on Twitter, such as new threats and the revival of an attack or related event. In contrast to earlier methods, their method identifies candidate events among hundreds of occurrences by tracking new and re-emerging words. It then creates events by grouping the tweets associated with these trigger words. With this method, they were able to identify emerging and resurgent threats as early as possible.</p>
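<p>The trigger-word idea described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Shin et al.&#x0027;s implementation: tweets are already tokenized, a term counts as new if it never appeared before the current day, and as re-emerging if it appeared on an older day but was absent for a hypothetical number of dormant days. The function names and thresholds are illustrative.</p>

```python
from collections import Counter


def term_counts(day):
    """Number of tweets in one day that mention each term (counted once per tweet)."""
    return Counter(term for tweet in day for term in set(tweet))


def find_trigger_words(tweets_by_day, dormant_days=2, min_count=1):
    """Return (new, re_emerging) terms for the last day in tweets_by_day.

    tweets_by_day: list of days, each a list of tokenized tweets (lists of terms).
    A term is new if it never appeared before the last day, and re-emerging if it
    appeared on an older day but not during the `dormant_days` days just before it.
    """
    *history, today = [term_counts(day) for day in tweets_by_day]
    split = max(len(history) - dormant_days, 0)
    older, recent = history[:split], history[split:]
    seen_any = set().union(*history)
    seen_older = set().union(*older)
    seen_recent = set().union(*recent)
    new, re_emerging = [], []
    for term, count in today.items():
        if min_count > count:
            continue  # ignore terms mentioned too rarely today
        if term not in seen_any:
            new.append(term)
        elif term in seen_older and term not in seen_recent:
            re_emerging.append(term)
    return sorted(new), sorted(re_emerging)


def group_events(todays_tweets, trigger_words):
    """Create one candidate event per trigger word by grouping the tweets containing it."""
    return {word: [tweet for tweet in todays_tweets if word in tweet] for word in trigger_words}


days = [
    [["linux", "patch"], ["windows", "update"]],
    [["windows", "update"]],
    [["windows", "patch"]],
    [["linux", "malware"], ["linux", "vulnerability"], ["windows", "update"]],
]
new, re_emerging = find_trigger_words(days)
# new == ["malware", "vulnerability"], re_emerging == ["linux"]
events = group_events(days[-1], new + re_emerging)
```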
<p>Another work [<xref ref-type="bibr" rid="ref-10">10</xref>] introduced a system that mines text for information on cybersecurity-related events and uses it to populate a semantic model in preparation for inclusion in a knowledge network of cybersecurity data. The system was trained on a new corpus of 1,000 English news articles from 2017 to 2019 that are richly annotated with event-based labels and cover both cyberattack and vulnerability-related incidents. The model defines five event subtypes and 20 argument types suitable for these events (e.g., file, device, software, money), together with their semantic roles. The system employs various deep neural network techniques and can incorporate rich linguistic features and word embeddings. Their evaluation of each component of the event detection pipeline showed that every subsystem performs effectively.</p>
<p>Another study [<xref ref-type="bibr" rid="ref-11">11</xref>] builds on the common perception that social media serves as a sensor for many societal events, such as epidemics, demonstrations, and elections, and uses social media as a crowdsourced sensor to gather information on ongoing cyberattacks. Their method requires no training or labeled samples and detects a wide range of cyberattacks. A novel query expansion strategy based on convolution kernels and dependency parses was used to model semantic structure and identify crucial event features. Through a large-scale study on Twitter, they also showed that their method reliably recognizes and encodes events, outperforming previous methods.</p>
<p>Many researchers have attempted to extract detailed semantic information about cybersecurity events; however, they have only extracted event arguments that fall within a single sentence [<xref ref-type="bibr" rid="ref-12">12</xref>&#x2013;<xref ref-type="bibr" rid="ref-14">14</xref>]. These studies are therefore limited when the event arguments to be recognized are dispersed across several sentences. To address this, a methodology has been presented for efficiently extracting cybersecurity events at the document level from cybersecurity news, blogs, and announcements. That work formulates document-level event extraction as a sequence tagging problem, with the objective of extracting cybersecurity-related arguments from documents. First, the characters are embedded and word information is added to the character representations. Then, a sliding window technique is used to obtain cross-sentence context information. Finally, the approach predicts the label of each character. Experimental results, obtained with three approaches on a Chinese cybersecurity dataset, show the efficiency of the model.</p>
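<p>The sliding-window step used in that work to obtain cross-sentence context can be illustrated with a minimal sketch. It assumes the document is already split into sentences and uses a hypothetical radius of one sentence on each side; the character-level embedding and labelling model itself is not shown.</p>

```python
def context_windows(sentences, radius=1):
    """Return, for each sentence, the text of a window with `radius` neighbours on each side.

    Windows are clipped at the document boundaries, so the first and last
    sentences get fewer neighbours.
    """
    return [
        " ".join(sentences[max(0, i - radius): i + radius + 1])
        for i in range(len(sentences))
    ]


doc = [
    "Attackers exploited a flaw in the VPN gateway.",
    "The vendor released a patch on Monday.",
    "Customer records were exposed.",
]
windows = context_windows(doc)
```

Each window would then be fed to the tagger, so an argument mentioned in a neighbouring sentence is visible when labelling the current one.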
<p>Natural language processing (NLP) is an important field of artificial intelligence. NLP makes it possible to analyze the interactions between people on social media forums: it uses Artificial Intelligence (AI) algorithms to take social media reviews as a dataset, process them, and convert them into a format suitable for analysis and prediction [<xref ref-type="bibr" rid="ref-15">15</xref>]. Sentiment analysis is one of the most significant approaches for analyzing such data; it determines whether the polarity toward specific entities and events is positive, neutral, or negative. To identify cyber incidents before they occur, this research provides a hybrid sentiment analysis approach combined with a deep learning model. In this article, we develop a neural network-based end-to-end threat intelligence architecture that requires no additional feature engineering or processing pipeline techniques. The proposed technique is composed of data collection, data transformation, deep learning-based feature extraction, and final event detection. It is beneficial for organizations that require a high level of security at an early stage. The main contributions of this study are as follows:
<list list-type="bullet">
<list-item>
<p>The key contribution of this research is a new cyber event detection mechanism that is user-friendly and easily adaptable to any organizational setup.</p></list-item>
<list-item>
<p>A novel approach that combines social page counts and the Google Trends mechanism with LDA improves event detection at both the medium and the high level.</p></list-item>
<list-item>
<p>The experimental evaluation shows that the proposed technique achieves an accuracy of almost 96%, owing to the use of new similarity measures.</p></list-item>
</list></p>
<p>The remainder of the paper is structured as follows: <xref ref-type="sec" rid="s2">Section 2</xref> discusses the proposed methodology, <xref ref-type="sec" rid="s3">Section 3</xref> describes experimental results and discussions and <xref ref-type="sec" rid="s4">Section 4</xref> concludes the proposed research.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Methods and Materials</title>
<p>This section describes the main phases of the proposed approach and the corresponding diagrams. The proposed framework, which detects cyber events from social media data, is shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>. The details of data collection, data transformation, feature extraction, and cyber-event prediction are discussed in the following subsections.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>Proposed framework for cyber event detection</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_35741-fig-1.tif"/>
</fig>
<sec id="s2_1">
<label>2.1</label>
<title>Data Collection</title>
<p>We collected tweets from several sources to identify cyber events. The tweets were gathered directly from social media using the Twitter Application Programming Interface (API) [<xref ref-type="bibr" rid="ref-16">16</xref>]. Twitter [<xref ref-type="bibr" rid="ref-17">17</xref>] offers two distinct APIs. The first is the search API, which retrieves past tweets that satisfy user-specified criteria. The second is the streaming API, which, unlike the search API, keeps a Hypertext Transfer Protocol (HTTP) connection open so that tweets can be collected in real time over an extended period. Data related to different cyber threats and cyber events were gathered using these APIs. <xref ref-type="table" rid="table-1">Table 1</xref> provides a detailed description of the collected data.</p>
<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>Dataset description</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th>#.</th>
<th>Dataset description</th>
<th>URL</th>
<th>Number of reviews</th>
</tr>
</thead>
<tbody>
<tr>
<td>DS-I</td>
<td>Cyber dataset</td>
<td><ext-link ext-link-type="uri" xlink:href="https://shorturl.at/ajrTX">https://shorturl.at/ajrTX</ext-link></td>
<td>2225</td>
</tr>
<tr>
<td>DS-II</td>
<td>Dataset containing cyber threats</td>
<td><ext-link ext-link-type="uri" xlink:href="https://shorturl.at/jkoLW">https://shorturl.at/jkoLW</ext-link></td>
<td>31281</td>
</tr>
<tr>
<td>DS-III</td>
<td>Tweets containing cyber threats</td>
<td><ext-link ext-link-type="uri" xlink:href="https://shorturl.at/celv1">https://shorturl.at/celv1</ext-link></td>
<td>1578</td>
</tr>
<tr>
<td align="center" colspan="4"><bold>Review description</bold></td>
</tr>
<tr>
<td>#.</td>
<td>No. of documents</td>
<td>Vocabulary size</td>
<td>Review contents</td>
</tr>
<tr>
<td>DS-I</td>
<td>45</td>
<td>7,679</td>
<td>Cybercrime and hacking</td>
</tr>
<tr>
<td>DS-II</td>
<td>32</td>
<td>6,323</td>
<td>Harassment, phishing</td>
</tr>
<tr>
<td>DS-III</td>
<td>35</td>
<td>4,231</td>
<td>Hacking, privacy, spam events</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Two different streams were adopted for data collection: a keyword-based stream, and a stream of Twitter accounts of well-known security professionals [<xref ref-type="bibr" rid="ref-17">17</xref>], security news sources, security firms and their research groups, and vulnerability feeds. In addition, we collected the Twitter accounts that security experts of different organizations follow. The first dataset in <xref ref-type="table" rid="table-1">Table 1</xref>, DS-I, was gathered from a BBC news link and contains about 2225 tweets on various cyber-related offenses posted by various communities. The second dataset, DS-II, concerns cyber threats and was taken from the provided website; it includes nearly 31281 tweets about cyber threats, with a vocabulary of almost 48019 words. Finally, the Twitter cyber threat dataset, DS-III, includes a total of 1578 tweets collected from a variety of sources, including Kaggle, Wikipedia, and Twitter.</p>
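<p>The two collection streams can be sketched as a simple filter over incoming tweets. This is an illustrative assumption rather than the authors&#x0027; collector: the keyword list, the watched account names, and the Tweet record are hypothetical, and the actual retrieval through the Twitter API is omitted.</p>

```python
from dataclasses import dataclass


@dataclass
class Tweet:
    author: str
    text: str


# Hypothetical keyword list and watched accounts, for illustration only
KEYWORDS = {"vulnerability", "exploit", "ransomware", "ddos", "malware", "leak"}
WATCHED_ACCOUNTS = {"security_news_feed", "vuln_alerts"}


def tokens(text):
    """Lower-cased tokens with surrounding punctuation and hashtag marks removed."""
    return {tok.strip(".,!?:;#").lower() for tok in text.split()}


def collect(stream, keywords=KEYWORDS, accounts=WATCHED_ACCOUNTS):
    """Split a tweet stream into a keyword-based and an account-based collection."""
    by_keyword, by_account = [], []
    for tweet in stream:
        if not tokens(tweet.text).isdisjoint(keywords):
            by_keyword.append(tweet)
        if tweet.author.lower() in accounts:
            by_account.append(tweet)
    return by_keyword, by_account


stream = [
    Tweet("alice", "New #ransomware strain spreading fast"),
    Tweet("vuln_alerts", "Patch released for router firmware"),
    Tweet("bob", "Lunch was great today"),
]
by_keyword, by_account = collect(stream)
```

A tweet can land in both collections; keeping them separate mirrors the two independent streams described above.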
</sec>
<sec id="s2_2">
<label>2.2</label>
<title>Data Transformation</title>
<p>Classifying tweets makes it possible to separate the several events connected to a term by event type. For instance, if the word &#x201C;linux&#x201D; is found to be re-emerging, several events, such as the discovery of new Linux vulnerabilities and the debut of new Linux malware, may occur on the same day. As part of our strategy, we include broad terms such as &#x201C;attack,&#x201D; &#x201C;hack,&#x201D; and &#x201C;leak&#x201D; in the &#x201C;others&#x201D; keyword list so that critical security-related incidents are not overlooked. After the data have been collected, the tweets are preprocessed and converted into a list of terms using the following predefined rules:
<list list-type="bullet">
<list-item>
<p>Each tweet undergoes named entity recognition (NER) to compile a list of person names, which are subsequently removed from the tweets.</p></list-item>
<list-item>
<p>Part-of-speech (POS) tagging is applied to every tweet to find proper nouns, including virus names, vulnerabilities, company names, and product names.</p></list-item>
<list-item>
<p>Symbols, Hypertext Markup Language (HTML) tags, Uniform Resource Locators (URLs), and Twitter handles are stripped from each tweet. Stop-words, i.e., the most frequent terms that appear in the majority of texts, such as &#x201C;the&#x201D;, &#x201C;a&#x201D;, &#x201C;of&#x201D;, &#x201C;or&#x201D;, and &#x201C;to&#x201D;, are also eliminated.</p></list-item>
<list-item>
<p>Self-promotional tweets are also removed, because many Twitter users abuse their accounts for self-promotion, which generates considerable noise during monitoring.</p></list-item>
<list-item>
<p>Each word is lemmatized so that all of its possible inflected forms can be represented by a single lemma.</p></list-item>
</list></p>
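<p>A minimal sketch of these rules in Python follows, using only the standard library. It is an approximation under stated assumptions: the NER step is replaced by a caller-supplied set of person names, POS tagging is omitted, lemmatization uses a tiny hypothetical lookup table instead of a real lemmatizer, and self-promotion is detected by hypothetical marker phrases.</p>

```python
import re

# Illustrative resources only: a real pipeline would use a full stop-word list,
# an NER tagger for person names, and a proper lemmatizer.
STOP_WORDS = {"the", "a", "an", "of", "or", "to", "and", "in", "on", "is", "are", "was", "for", "by"}
LEMMAS = {"hackers": "hacker", "leaked": "leak", "records": "record",
          "customers": "customer", "says": "say", "attacks": "attack"}
PROMO_MARKERS = ("follow me", "subscribe", "check out my")

URL_RE = re.compile(r"https?://\S+|www\.\S+")
HANDLE_RE = re.compile(r"@\w+")
HTML_TAG_RE = re.compile(r"\x3c[^\x3e]*\x3e")  # HTML tags (\x3c and \x3e are the angle brackets)
SYMBOL_RE = re.compile(r"[^a-z0-9\s]")


def preprocess(tweet, person_names=frozenset()):
    """Return the list of terms kept from one tweet, or None if it is self-promotion."""
    text = tweet.lower()
    if any(marker in text for marker in PROMO_MARKERS):
        return None  # self-promotional tweets are discarded as noise
    for pattern in (URL_RE, HANDLE_RE, HTML_TAG_RE):
        text = pattern.sub(" ", text)
    text = SYMBOL_RE.sub(" ", text)  # drop remaining symbols and punctuation
    terms = []
    for token in text.split():
        if token in STOP_WORDS or token in person_names:
            continue  # remove stop-words and names flagged by the (assumed) NER step
        terms.append(LEMMAS.get(token, token))  # map inflected forms to a single lemma
    return terms
```

For example, preprocess("Follow me for daily security tips") returns None, while a news-style tweet yields lower-cased lemmas with URLs, handles, and tags removed.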
<p>Some of the terms obtained from different tweets after applying the proposed data transformation process are shown in <xref ref-type="table" rid="table-2">Table 2</xref>.</p>
<table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>Resulting terms from transforming data</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
</colgroup>
<tbody>
<tr>
<td><italic>Data lemmatization</italic></td>
<td>[&#x2018;program&#x2019;, &#x2018;security&#x2019;, &#x2018;bug&#x2019;],</td>
</tr>
<tr>
<td/>
<td>[&#x2018;Hotel&#x2019;, &#x2018;malware&#x2019;, &#x2018;program&#x2019;, &#x2018;disclose&#x2019;, &#x2018;customer&#x2019;, &#x2018;individual&#x2019;, &#x2018;data&#x2019;],</td>
</tr>
<tr>
<td/>
<td>[&#x2018;London&#x2019;, &#x2018;information&#x2019;, &#x2018;passport&#x2019;, &#x2018;grab&#x2019;, &#x2018;violation&#x2019;], [&#x2018;person&#x2019;, &#x2018;exposure&#x2019;, &#x2018;collude&#x2019;]</td>
</tr>
<tr>
<td/>
<td>[&#x2018;internet&#x2019;, &#x2018;mafia&#x2019;, &#x2018;organization&#x2019;, &#x2018;freedom&#x2019;],</td>
</tr>
<tr>
<td/>
<td>[&#x2018;coder&#x2019;, &#x2018;criticise&#x2019;, &#x2018;stealing&#x2019;]</td>
</tr>
<tr>
<td/>
<td>[&#x2018;washing&#x2019;],</td>
</tr>
<tr>
<td/>
<td>[&#x2018;worker&#x2019;, &#x2018;cost&#x2019;, &#x2018;fraud&#x2019;],</td>
</tr>
<tr>
<td/>
<td>[&#x2018;leak&#x2019;, &#x2018;record&#x2019;],</td>
</tr>
<tr>
<td/>
<td>[&#x2018;crime&#x2019;, &#x2018;brand&#x2019;, &#x2018;stolen&#x2019;]</td>
</tr>
<tr>
<td><italic>Preprocessing</italic></td>
<td>[&#x2018;program&#x2019;, &#x2018;security&#x2019;, &#x2018;bug&#x2019;],</td>
</tr>
<tr>
<td/>
<td>[&#x2018;hotel&#x2019;, &#x2018;malware&#x2019;, &#x2018;program&#x2019;, &#x2018;disclose&#x2019;, &#x2018;customer&#x2019;, &#x2018;individual&#x2019;, &#x2018;data&#x2019;],</td>
</tr>
<tr>
<td/>
<td>[&#x2018;london&#x2019;, &#x2018;information&#x2019;, &#x2018;passport&#x2019;, &#x2018;grab&#x2019;, &#x2018;violation&#x2019;], [&#x2018;person&#x2019;, &#x2018;exposure&#x2019;, &#x2018;collude&#x2019;]</td>
</tr>
<tr>
<td/>
<td>[&#x2018;internet&#x2019;, &#x2018;mafia&#x2019;, &#x2018;organization&#x2019;, &#x2018;freedom&#x2019;],</td>
</tr>
<tr>
<td/>
<td>[&#x2018;coder&#x2019;, &#x2018;criticise&#x2019;, &#x2018;stealing&#x2019;]</td>
</tr>
<tr>
<td/>
<td>[&#x2018;washing&#x2019;],</td>
</tr>
<tr>
<td/>
<td>[&#x2018;worker&#x2019;, &#x2018;cost&#x2019;, &#x2018;fraud&#x2019;],</td>
</tr>
<tr>
<td/>
<td>[&#x2018;leak&#x2019;, &#x2018;record&#x2019;],</td>
</tr>
<tr>
<td/>
<td>[&#x2018;crime&#x2019;, &#x2018;brand&#x2019;, &#x2018;stolen&#x2019;]</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s2_3">
<label>2.3</label>
<title>Deep Learning-Based Feature Extraction</title>
<p>Following the data transformation process, the feature vectors are extracted using a deep learning-based feature extraction approach. Algorithm 1 shows how the deep learning-based feature extraction process works. Prediction-based embedding models such as word2vec generally outperform conventional count-based representations such as Term Frequency-Inverse Document Frequency (TF-IDF) and other frequency-based models. This work uses the Continuous Bag of Words (CBOW) mechanism, in which the input layer assigns a weight to each transformed word. The input layer is fully connected to a hidden layer whose number of neurons equals the dimension of the word vectors.</p>
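<p>A single CBOW training step can be sketched in plain Python as follows. This is a minimal illustration, not the authors&#x0027; implementation: it uses a full softmax output layer (word2vec implementations usually rely on negative sampling or hierarchical softmax for efficiency), plain stochastic gradient descent, and a hypothetical learning rate and initialization range. Here wi is the input weight matrix (vocabulary size by vector dimension) and wo the output weight matrix (vector dimension by vocabulary size).</p>

```python
import math
import random


def init_weights(vocab_size, dim, seed=0):
    """Randomly initialise the input (V x N) and output (N x V) weight matrices."""
    rng = random.Random(seed)
    wi = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(vocab_size)]
    wo = [[rng.uniform(-0.5, 0.5) for _ in range(vocab_size)] for _ in range(dim)]
    return wi, wo


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cbow_step(wi, wo, context_ids, target_id, lr=0.1):
    """One SGD step of CBOW with a full softmax; updates wi and wo in place, returns the loss."""
    dim, vocab = len(wo), len(wo[0])
    # Hidden layer: average of the context words' input vectors (rows of wi)
    h = [sum(wi[c][j] for c in context_ids) / len(context_ids) for j in range(dim)]
    # Output scores u_k = h . wo[:, k], turned into a probability distribution
    probs = softmax([sum(h[j] * wo[j][k] for j in range(dim)) for k in range(vocab)])
    loss = -math.log(probs[target_id])
    # Prediction error e = y_hat - y, where y is the one-hot target
    err = [p - (1.0 if k == target_id else 0.0) for k, p in enumerate(probs)]
    # Gradient with respect to h, computed before wo is modified
    grad_h = [sum(wo[j][k] * err[k] for k in range(vocab)) for j in range(dim)]
    for j in range(dim):
        for k in range(vocab):
            wo[j][k] -= lr * h[j] * err[k]
    for c in context_ids:
        for j in range(dim):
            wi[c][j] -= lr * grad_h[j] / len(context_ids)
    return loss


wi, wo = init_weights(vocab_size=12, dim=3)
losses = [cbow_step(wi, wo, context_ids=[0, 2], target_id=1) for _ in range(100)]
```

Repeating the step on the same context and target drives the predicted probability of the target word up, so the loss shrinks; after training, the rows of wi serve as the word vectors.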
<fig id="fig-8">
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_35741-fig-8.tif"/>
</fig>
<p>Assume that <bold><italic>V</italic></bold> is the vocabulary size and <bold><italic>N</italic></bold> is the dimension of the word vectors. The hidden layer is computed with the input weight matrix <bold><italic>WI</italic></bold> of size <bold><italic>V</italic></bold> &#x00D7; <bold><italic>N</italic></bold>, where each row corresponds to a vocabulary word. The hidden layer is in turn fully connected to the output layer through a second weight matrix, <bold><italic>WO</italic></bold>, of size <bold><italic>N</italic></bold> &#x00D7; <bold><italic>V</italic></bold>, where each column corresponds to a word in the dictionary. To illustrate this procedure, consider a training tweet that includes the phrases &#x201C;the security leakage,&#x201D; &#x201C;Hacker attack due to weak security,&#x201D; and &#x201C;data lost.&#x201D;</p>
<p>This tweet is treated as having a word count of 12, with each word represented by its index. Assume that the neural network uses 12 input neurons, 12 output neurons, and a hidden layer of three neurons (<bold><italic>N</italic></bold> = 3). The matrices <bold><italic>WI</italic></bold> and <bold><italic>WO</italic></bold> are therefore configured as 12 &#x00D7; 3 and 3 &#x00D7; 12 matrices, respectively.</p>
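<p>The role of the word index can be made concrete with a short sketch using the dimensions of this example (V = 12, N = 3). Multiplying a one-hot input vector by WI simply selects the corresponding row of WI, which is why each row acts as that word&#x0027;s vector. The random initialization, the seed, and the chosen word index are illustrative assumptions.</p>

```python
import random

V, N = 12, 3  # vocabulary size and word-vector dimension from the worked example
rng = random.Random(42)
WI = [[rng.uniform(-0.5, 0.5) for _ in range(N)] for _ in range(V)]  # 12 x 3
WO = [[rng.uniform(-0.5, 0.5) for _ in range(V)] for _ in range(N)]  # 3 x 12


def one_hot(index, size=V):
    """Vector of zeros with a single 1.0 at the word's index."""
    return [1.0 if i == index else 0.0 for i in range(size)]


def vec_mat(x, m):
    """Row vector x (length R) times matrix m (R x C), giving a vector of length C."""
    return [sum(x[r] * m[r][c] for r in range(len(m))) for c in range(len(m[0]))]


x = one_hot(4)      # hypothetical index of one word in the tweet
h = vec_mat(x, WI)  # 3-dimensional hidden vector, equal to row 4 of WI
u = vec_mat(h, WO)  # 12 output scores, one per vocabulary word
```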
<p>As is standard for neural networks, the weights are first initialized randomly, as shown below.</p>
<p><disp-formula id="ueqn-1"><mml:math id="mml-ueqn-1" display="block"><mml:mtable columnalign="left" rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mrow><mml:mtext>WI</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mo>&#x2212;</mml:mo><mml:mn>0.094491</mml:mn><mml:mspace width="1em" /><mml:mo>&#x2212;</mml:mo><mml:mn>0.443977</mml:mn><mml:mspace width="1em" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mn>0.313917</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mo>&#x2212;</mml:mo><mml:mn>0.490796</mml:mn><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mo>&#x2212;</mml:mo><mml:mn>0.229903</mml:mn><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" 
/><mml:mn>0.065460</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mn>0.072921</mml:mn><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mn>0.172246</mml:mn><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mo>&#x2212;</mml:mo><mml:mn>0.357751</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace 
width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mn>0.104514</mml:mn><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mo>&#x2212;</mml:mo><mml:mn>0.463000</mml:mn><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mn>0.079367</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mo>&#x2212;</mml:mo><mml:mn>0.226080</mml:mn><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace 
width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mo>&#x2212;</mml:mo><mml:mn>0.154679</mml:mn><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mo>&#x2212;</mml:mo><mml:mn>0.038422</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mn>0.406115</mml:mn><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mo>&#x2212;</mml:mo><mml:mn>0.192794</mml:mn><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mo>&#x2212;</mml:mo><mml:mn>0.441992</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace 
width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mn>0.181755</mml:mn><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mn>0.088268</mml:mn><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mn>0.277574</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace 
width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mo>&#x2212;</mml:mo><mml:mn>0.055334</mml:mn><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mn>0.491792</mml:mn><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mspace width="thinmathspace" /><mml:mn>0.263102</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p><disp-formula id="ueqn-2"><mml:math id="mml-ueqn-2" display="block"><mml:mtable columnalign="left left left left left left left left" rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mrow><mml:mtext>WO</mml:mtext></mml:mrow><mml:mo>=</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mphantom><mml:mo>&#x2212;</mml:mo></mml:mphantom></mml:mrow><mml:mn>0.023074</mml:mn></mml:mtd><mml:mtd><mml:mn>0.479901</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mphantom><mml:mn>0</mml:mn></mml:mphantom></mml:mrow><mml:mn>0.432148</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mphantom><mml:mo>&#x2212;</mml:mo></mml:mphantom></mml:mrow><mml:mn>0.375480</mml:mn></mml:mtd><mml:mtd><mml:mo>&#x2212;</mml:mo><mml:mn>0.364732</mml:mn></mml:mtd><mml:mtd><mml:mo>&#x2212;</mml:mo><mml:mn>0.119840</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mphantom><mml:mo>&#x2212;</mml:mo></mml:mphantom></mml:mrow><mml:mn>0.266070</mml:mn></mml:mtd><mml:mtd><mml:mo>&#x2212;</mml:mo><mml:mn>0.351000</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mo>&#x2212;</mml:mo><mml:mn>0.368008</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mphantom><mml:mo>&#x2212;</mml:mo></mml:mphantom></mml:mrow><mml:mn>0.424778</mml:mn></mml:mtd><mml:mtd><mml:mo>&#x2212;</mml:mo><mml:mn>0.257104</mml:mn></mml:mtd><mml:mtd><mml:mo>&#x2212;</mml:mo><mml:mn>0.148817</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mphantom><mml:mo>&#x2212;</mml:mo></mml:mphantom></mml:mrow><mml:mn>0.033922</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mphantom><mml:mo>&#x2212;</mml:mo></mml:mphantom></mml:mrow><mml:mn>0.353874</mml:mn></mml:mtd><mml:mtd><mml:mo>&#x2212;</mml:mo><mml:mn>0.144942</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mphantom><mml:mo>&#x2212;</mml:mo></mml:mphantom></mml:mrow><mml:mn>0.130904</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mphantom><mml:mo>&#x2212;</mml:mo></mml:mphantom></mml:mrow><mml:mn>0.422434</mml:mn></mml:mtd><mml:mtd><mml:mn>0.364503</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mphantom><mml:mo>&
#x2212;</mml:mo></mml:mphantom></mml:mrow><mml:mn>0.467865</mml:mn></mml:mtd><mml:mtd><mml:mo>&#x2212;</mml:mo><mml:mn>0.020302</mml:mn></mml:mtd><mml:mtd><mml:mo>&#x2212;</mml:mo><mml:mn>0.423890</mml:mn></mml:mtd><mml:mtd><mml:mo>&#x2212;</mml:mo><mml:mn>0.438777</mml:mn></mml:mtd><mml:mtd><mml:mrow><mml:mphantom><mml:mo>&#x2212;</mml:mo></mml:mphantom></mml:mrow><mml:mn>0.268529</mml:mn></mml:mtd><mml:mtd><mml:mo>&#x2212;</mml:mo><mml:mn>0.446787</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
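<p>For a one-hot input vector <bold><italic>I</italic></bold>, the hidden-layer output is <bold><italic>H</italic></bold> &#x003D; <bold><italic>I</italic></bold> &#x00D7; <bold><italic>WI</italic></bold> &#x003D; [&#x2212;0.490796 &#x2212;0.229903 0.065460]. Because <bold><italic>I</italic></bold> is one-hot, this product simply selects the row of <bold><italic>WI</italic></bold> that corresponds to the input word.</p>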
<p>Here, <bold><italic>H</italic></bold> denotes the hidden-layer output, <bold><italic>I</italic></bold> the input vector, and <bold><italic>WI</italic></bold> the input-to-hidden weight matrix. The hidden layer is then connected to the output layer through the output weight matrix <bold><italic>WO</italic></bold> and an activation function. Each row of <bold><italic>WI</italic></bold> is the word embedding of one vocabulary item and represents one feature. The features produced by word2vec in this way are high-dimensional vectors, so each vector is projected into a two-dimensional representation that reveals hidden relationships between words. For this purpose, the final features are visualized using t-distributed stochastic neighbour embedding (t-SNE).</p>
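<p>The projection from the hidden layer to the output layer can be illustrated with a short Python sketch. It is an illustration, not the authors' implementation: it reuses the hidden-layer vector <bold><italic>H</italic></bold> and the 3 &#x00D7; 8 matrix <bold><italic>WO</italic></bold> given above and assumes a softmax output activation, which the text does not name. The row-selection property of a one-hot input is checked on a small hypothetical matrix <monospace>toy_wi</monospace>.</p>
<preformat preformat-type="code">
```python
import numpy as np

def softmax(z):
    """Numerically stable softmax (assumed output activation)."""
    e = np.exp(z - z.max())
    return e / e.sum()

# Hypothetical 4-word vocabulary with 3-dimensional embeddings (toy values).
toy_wi = np.array([[0.1, 0.2, 0.3],
                   [0.4, 0.5, 0.6],
                   [0.7, 0.8, 0.9],
                   [1.0, 1.1, 1.2]])
one_hot = np.array([0, 0, 1, 0])
# A one-hot input simply selects one row of the input weight matrix.
assert np.allclose(one_hot @ toy_wi, toy_wi[2])

# Hidden-layer output H and output weight matrix WO (3 x 8) from the text.
H = np.array([-0.490796, -0.229903, 0.065460])
WO = np.array([
    [0.023074, 0.479901, 0.432148, 0.375480, -0.364732, -0.119840, 0.266070, -0.351000],
    [-0.368008, 0.424778, -0.257104, -0.148817, 0.033922, 0.353874, -0.144942, 0.130904],
    [0.422434, 0.364503, 0.467865, -0.020302, -0.423890, -0.438777, 0.268529, -0.446787],
])

scores = H @ WO           # one score per vocabulary word (8 values)
probs = softmax(scores)   # assumed softmax over the 8-word vocabulary
print(np.round(probs, 4))
```
</preformat>
<p>Each entry of <monospace>probs</monospace> is the predicted probability of one vocabulary word; in word2vec training, these probabilities are compared with the context words to update <bold><italic>WI</italic></bold> and <bold><italic>WO</italic></bold>.</p>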
</sec>
<sec id="s2_4">
<label>2.4</label>
<title>Medium Level Event Detection</title>
<p>LDA is used for medium-level event detection, where it acts as a filter that retains the words that accurately describe a cyber-event. LDA is built on Bayes&#x0027; theorem, a widely used statistical foundation. Using LDA, we identify groups of cyber events based on their semantic similarity to a given document. In this study, LDA extracts several topics from each tweet document [<xref ref-type="bibr" rid="ref-18">18</xref>]. For cyber-event modeling, we assume that each document in our data collection can be represented as a mixture of several events (topics) and that each event is in turn represented by a distribution over many words.</p>
<p>LDA derives a set of topics from the terms <italic>t</italic><sub>i</sub> observed across a collection of documents. It produces <bold><italic>N</italic></bold> topics, each represented by <bold><italic>Nt</italic></bold> keywords; together, <bold><italic>N</italic></bold> and <bold><italic>Nt</italic></bold> control how specialized the topics are. For each document <italic>d</italic>, LDA yields a topic distribution P(&#x03B8;|<italic>d</italic>), and each topic is a distribution over terms P(<italic>t</italic>|&#x03B8;); the two are combined in <xref ref-type="disp-formula" rid="eqn-1">Eq. (1)</xref>. There, P(<italic>t</italic><sub>i</sub>|<italic>d</italic>) is the probability of term <italic>t</italic><sub>i</sub> in document <italic>d</italic>, &#x03B8;<sub>i</sub> is the topic of that term, P(<italic>t</italic><sub>i</sub>|&#x03B8;<sub>i</sub> &#x003D; <italic>j</italic>) is the probability of term <italic>t</italic><sub>i</sub> in topic <italic>j</italic>, P(&#x03B8;<sub>i</sub> &#x003D; <italic>j</italic>|<italic>d</italic>) is the probability of choosing a term from topic <italic>j</italic> in document <italic>d</italic>, and the sum runs over all <italic>N</italic><sub>&#x03B8;</sub> topics.</p>
<p><disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mi>d</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msubsup><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msubsup><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msub><mml:mi>&#x03B8;</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>j</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>&#x03B8;</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>j</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mi>d</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
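<p>Eq. (1) is a vector-matrix product: if the topic-term probabilities are stacked as rows of a matrix, the term distribution of a document is its topic mixture multiplied by that matrix. The Python sketch below uses hypothetical probabilities for two topics and four terms; the variable names are illustrative only.</p>
<preformat preformat-type="code">
```python
import numpy as np

# Hypothetical topic-term distributions P(t | theta = j): 2 topics x 4 terms.
p_term_given_topic = np.array([
    [0.50, 0.30, 0.15, 0.05],   # topic 1
    [0.05, 0.15, 0.30, 0.50],   # topic 2
])
# Hypothetical document-topic distribution P(theta = j | d) for one document d.
p_topic_given_doc = np.array([0.8, 0.2])

# Eq. (1): P(t_i | d) = sum_j P(t_i | theta_i = j) * P(theta_i = j | d)
p_term_given_doc = p_topic_given_doc @ p_term_given_topic
print(p_term_given_doc)   # [0.41 0.27 0.18 0.14]
```
</preformat>
<p>Because every row of the topic-term matrix and the topic mixture each sum to one, the resulting term distribution also sums to one.</p>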
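<p>LDA estimates the document-topic distribution P(&#x03B8;|<italic>d</italic>) and the topic-term distribution P(<italic>t</italic>|&#x03B8;) from an unlabeled corpus of documents. A Gibbs sampler [<xref ref-type="bibr" rid="ref-19">19</xref>] iterates many times over each word <italic>t</italic><sub>i</sub> in each document <italic>d</italic><sub>i</sub> and samples a new topic <italic>j</italic> for it, conditioned on the current topic assignments of all other words. In the equations below, <italic>C</italic><sup>t&#x03B8;</sup> is the term-topic count matrix, whose entry for term <italic>t</italic><sub>i</sub> and topic <italic>j</italic> is the number of times <italic>t</italic><sub>i</sub> is assigned to <italic>j</italic>; <italic>C</italic><sup>D&#x03B8;</sup> is the document-topic count matrix, whose entry for document <italic>d</italic><sub>i</sub> and topic <italic>j</italic> is the number of words in <italic>d</italic><sub>i</sub> assigned to <italic>j</italic>; &#x03B8;<sub>&#x2212;i</sub> denotes the topic assignments of all words except the current one, whose own assignment is excluded from the counts in <xref ref-type="disp-formula" rid="eqn-2">Eq. (2)</xref>; <italic>T</italic> is the vocabulary size (the number of distinct terms), so that the smoothed term probabilities sum to one; <italic>N</italic><sub>&#x03B8;</sub> is the number of topics; and &#x03B1; and &#x03B2; are Dirichlet smoothing hyperparameters. The posterior probabilities are calculated from these counts as follows.</p>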
<p><disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>&#x03B8;</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>j</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>&#x03B8;</mml:mi><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x221D;</mml:mo><mml:mfrac><mml:mrow><mml:msubsup><mml:mi>C</mml:mi><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mi>&#x03B2;</mml:mi></mml:mrow><mml:mrow><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:munder><mml:msubsup><mml:mi>C</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mi>T</mml:mi><mml:mi>&#x03B2;</mml:mi></mml:mrow></mml:mfrac><mml:mo>&#x00D7;</mml:mo><mml:mfrac><mml:mrow><mml:msubsup><mml:mi>C</mml:mi><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>D</mml:mi><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mi>&#x03B1;</mml:mi></mml:mrow><mml:mrow><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:munder><mml:msubsup><mml:mi>C</mml:mi><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>D</mml:mi><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub><mml:mi>&#x03B1;</mml:mi></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
<p><disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msub><mml:mi>&#x03B8;</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>j</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msubsup><mml:mi>C</mml:mi><mml:mrow><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mi>&#x03B2;</mml:mi></mml:mrow><mml:mrow><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:munder><mml:msubsup><mml:mi>C</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mi>T</mml:mi><mml:mi>&#x03B2;</mml:mi></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
<p><disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:mi>P</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>&#x03B8;</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>j</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msubsup><mml:mi>C</mml:mi><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>D</mml:mi><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mi>&#x03B1;</mml:mi></mml:mrow><mml:mrow><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:munder><mml:msubsup><mml:mi>C</mml:mi><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mi>k</mml:mi></mml:mrow><mml:mrow><mml:mi>D</mml:mi><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msub><mml:mi>N</mml:mi><mml:mrow><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub><mml:mi>&#x03B1;</mml:mi></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
<p>LDA is widely used in natural language processing (NLP) because its iterative procedure produces clear, interpretable topics. As a statistical model, it represents each document as a mixture of topics, and the topic of each word is inferred from the topic assignments of all other words. LDA treats each document as a bag of words; its procedure is outlined below:
<list list-type="bullet">
<list-item>
<p>In the first step, the user specifies the number of topics to be extracted from the input.</p></list-item>
<list-item>
<p>In the second step, each word in a document is given a temporary (initially random) topic. Repeated occurrences of the same term may be assigned different topics. These assignments are provisional and change as the algorithm finds topics that fit each word better.</p></list-item>
<list-item>
<p>In the last step, the algorithm iteratively updates these topic assignments. Each update is based on two criteria: how frequently the word is assigned to each topic across the collection, and how frequently each topic occurs in the given document.</p></list-item>
</list></p>
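<p>The three steps above, together with Eqs. (2) to (4), can be sketched as a collapsed Gibbs sampler in Python with NumPy. This is an illustrative sketch, not the Gensim implementation used in this work (Gensim&#x0027;s <monospace>LdaModel</monospace> uses online variational inference rather than Gibbs sampling). The function name <monospace>lda_gibbs</monospace>, the toy corpus, and the hyperparameter values are hypothetical, and the vocabulary size is assumed to play the role of <italic>T</italic> in the term denominator.</p>
<preformat preformat-type="code">
```python
import numpy as np

def lda_gibbs(docs, vocab_size, n_topics, alpha=0.1, beta=0.01, n_iter=100, seed=0):
    """Collapsed Gibbs sampling for LDA (minimal sketch).

    docs: list of documents, each a list of integer term ids.
    Returns (phi, theta): phi[t, j] = P(t | theta = j), theta[d, j] = P(theta = j | d).
    """
    rng = np.random.default_rng(seed)
    c_t = np.zeros((vocab_size, n_topics))   # term-topic counts C^{t theta}
    c_d = np.zeros((len(docs), n_topics))    # document-topic counts C^{D theta}
    # Step 2: give every word a temporary, random topic.
    z = [rng.integers(n_topics, size=len(doc)) for doc in docs]
    for d, doc in enumerate(docs):
        for t, j in zip(doc, z[d]):
            c_t[t, j] += 1
            c_d[d, j] += 1
    # Step 3: repeatedly resample each word's topic from Eq. (2).
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, t in enumerate(doc):
                j = z[d][i]
                c_t[t, j] -= 1            # exclude the current assignment (theta_{-i})
                c_d[d, j] -= 1
                word_factor = (c_t[t] + beta) / (c_t.sum(axis=0) + vocab_size * beta)
                doc_factor = c_d[d] + alpha   # its denominator does not depend on j
                p = word_factor * doc_factor
                j = rng.choice(n_topics, p=p / p.sum())
                z[d][i] = j
                c_t[t, j] += 1
                c_d[d, j] += 1
    # Eq. (3) and Eq. (4): smoothed point estimates from the final counts.
    phi = (c_t + beta) / (c_t.sum(axis=0) + vocab_size * beta)
    theta = (c_d + alpha) / (c_d.sum(axis=1, keepdims=True) + n_topics * alpha)
    return phi, theta

# Hypothetical toy corpus: term ids 0-2 and 3-5 tend to co-occur.
docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 1, 2], [4, 5, 3, 5, 4]]
phi, theta = lda_gibbs(docs, vocab_size=6, n_topics=2)
print(np.round(theta, 2))
```
</preformat>
<p>The second factor of Eq. (2) has a denominator that is the same for every topic <italic>j</italic>, so the sketch drops it and normalizes the product instead; this leaves the sampling distribution unchanged.</p>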
<p>The structure of LDA is presented in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>, which shows the components of the LDA model:
<list list-type="bullet">
<list-item>
<p>Dirichlet prior on the per-document topic distributions (&#x03B1;)</p>
</list-item>
<list-item>
<p>Topic distribution of document <italic>d</italic> (&#x03B8;<sub>d</sub>)</p></list-item>
<list-item>
<p>Topic assignment of the <italic>n</italic>-th word in document <italic>d</italic> (Z<sub>d,n</sub>)</p></list-item>
<list-item>
<p>Observed word (W<sub>d,n</sub>)</p></list-item>
<list-item>
<p>Topics, i.e., per-topic word distributions (&#x03B2;<sub>k</sub>)</p></list-item>
<list-item>
<p>Dirichlet prior on the topic-word distributions (&#x03B7;)</p></list-item>
</list></p>
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>Complete framework of LDA</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_35741-fig-2.tif"/>
</fig>
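<p>The components in <xref ref-type="fig" rid="fig-2">Fig. 2</xref> correspond to the LDA generative process, sketched below in Python with NumPy under assumed hyperparameter values: each topic &#x03B2;<sub>k</sub> is drawn from a Dirichlet distribution with parameter &#x03B7;, each document&#x0027;s topic mixture &#x03B8;<sub>d</sub> from a Dirichlet distribution with parameter &#x03B1;, each topic assignment Z<sub>d,n</sub> from &#x03B8;<sub>d</sub>, and each observed word W<sub>d,n</sub> from the topic selected by Z<sub>d,n</sub>. The function <monospace>generate_corpus</monospace> is hypothetical and only clarifies what the plate diagram encodes.</p>
<preformat preformat-type="code">
```python
import numpy as np

def generate_corpus(n_docs, doc_len, n_topics, vocab_size, alpha=1.0, eta=0.5, seed=0):
    """Sample a synthetic corpus from the LDA generative process (sketch)."""
    rng = np.random.default_rng(seed)
    # beta_k ~ Dirichlet(eta): one word distribution per topic (n_topics x vocab_size)
    beta = rng.dirichlet(np.full(vocab_size, eta), size=n_topics)
    docs, assignments = [], []
    for _ in range(n_docs):
        theta_d = rng.dirichlet(np.full(n_topics, alpha))      # theta_d ~ Dirichlet(alpha)
        z_d = rng.choice(n_topics, size=doc_len, p=theta_d)     # Z_{d,n} ~ Categorical(theta_d)
        # W_{d,n} ~ Categorical(beta_{Z_{d,n}})
        w_d = [int(rng.choice(vocab_size, p=beta[k])) for k in z_d]
        docs.append(w_d)
        assignments.append(z_d)
    return docs, assignments, beta

docs, z, beta = generate_corpus(n_docs=3, doc_len=8, n_topics=2, vocab_size=10)
print(docs[0])
```
</preformat>
<p>Inference, as in Eqs. (2) to (4), runs this process in reverse: only the words W<sub>d,n</sub> are observed, and &#x03B8;<sub>d</sub>, Z<sub>d,n</sub>, and &#x03B2;<sub>k</sub> are estimated from them.</p>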
<p>LDA was implemented in Python (version 3.4) using the Gensim library, which provides a wrapper for LDA. The implementation involved structuring the input text and finding patterns in it. <xref ref-type="table" rid="table-3">Table 3a</xref> shows the term weights of several topics produced by LDA, together with the coherence scores obtained for different numbers of topics. The topics obtained from the LDA procedure are treated as medium-level events; some of them are shown in <xref ref-type="table" rid="table-3">Table 3b</xref>.</p>
<table-wrap id="table-3">
<label>Table 3</label>
<caption>
<title>(a) Scoring values of the topic terms, (b) Medium level event detection using LDA</title>
</caption>
<table frame="hsides" >
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead valign="top">
<tr>
<th colspan="3">(a)</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td><italic>Score of LDA</italic></td>
<td colspan="2">[(0, &#x2018;0.0.88&#x002A;&#x201C;program&#x201D; &#x002B; 0.026&#x002A;&#x201C;security&#x201D; &#x002B; 0.031&#x002A;&#x201C;bug&#x201D; &#x002B; 0.020&#x002A;&#x201C; hotel&#x201D; &#x002B; 0.020&#x002A;&#x201C;program&#x201D; &#x002B; 0.020&#x002A;&#x201C;error&#x201D; &#x002B; 0.020&#x002A;&#x201C;bug&#x201D; &#x002B; 0.020&#x002A;&#x201C;information&#x201D; &#x002B; 0.020&#x002A;&#x201C;coder&#x201D; &#x002B; 0.020&#x002A;&#x201C;organization&#x201D;), (1, &#x2018;0.041&#x002A;&#x201C;function&#x201D; &#x002B; 0.041&#x002A;&#x201C;break&#x201D; &#x002B; 0.036&#x002A;&#x201C;take&#x201D; &#x002B; 0.031&#x002A;&#x201C;stolen&#x201D; &#x002B; 0.021&#x002A;&#x201C;user&#x201D; &#x002B; 0.021&#x002A;<break/>&#x201C; program&#x201D; &#x002B; 0.021&#x002A;&#x201C;task&#x201D; &#x002B; 0.021&#x002A;&#x201C;hard&#x201D; &#x002B; 0.021&#x002A;&#x201C;internet&#x201D; &#x002B; 0.021&#x002A;&#x201C;card&#x201D;), (2, &#x2018;0.033&#x002A;&#x201C;worker&#x201D; &#x002B; 0.42&#x002A;&#x201C;bug&#x201D; &#x002B; 0.42&#x002A;&#x201C;cost&#x201D; &#x002B; 0.30&#x002A;&#x201C;farad&#x201D; &#x002B; 0.020&#x002A;&#x201C;stolen&#x201D; &#x002B; 0.016&#x002A;&#x201C;crime&#x201D; &#x002B; 0.021&#x002A;&#x201C;brand&#x201D; &#x002B; 0.021&#x002A;&#x201C;stolen&#x201D; &#x002B; 0.021&#x002A;&#x201C;site&#x201D; &#x002B; 0.021&#x002A;&#x201C;penalty&#x201D;), (3, &#x2018;0.031&#x002A;&#x201C;penalty&#x201D; &#x002B; 0.031&#x002A;&#x201C;breach&#x201D; &#x002B; 0.020&#x002A;&#x201C;bug&#x201D; &#x002B; 0.020&#x002A;&#x201C;cheater&#x201D; &#x002B; 0.030&#x002A;&#x201C;brand&#x201D; &#x002B; 0.014&#x002A;&#x201C;farad&#x201D; &#x002B; 0.021&#x002A;&#x201C;take&#x201D; &#x002B; 0.031&#x002A;&#x201C;verify&#x201D; &#x002B; 0.016&#x002A;&#x201C;away&#x201D; &#x002B; 0.021&#x002A;&#x201C;stolen&#x201D;), (4, &#x2018;0.031&#x002A;&#x201C;user&#x201D; &#x002B; 0.031&#x002A;&#x201C;criticise&#x201D; &#x002B; 0.031&#x002A;&#x201C;stealing&#x201D; &#x002B; 
0.031&#x002A;&#x201C;error&#x201D; &#x002B; 0.032&#x002A;&#x201C;farud&#x201D; &#x002B; 0.022&#x002A;&#x201C;program&#x201D; &#x002B; 0.022&#x002A;&#x201C;stolen&#x201D; &#x002B; 0.021&#x002A;&#x201C;bug&#x201D; &#x002B; 0.021&#x002A;&#x201C;worker&#x201D; &#x002B; 0.021&#x002A;&#x201C;cost&#x201D;)]</td>
</tr>
<tr>
<td><italic>Coherence Score</italic></td>
<td colspan="2">Topic N1 &#x003D; 4 and Value of Coherence is 0.6543</td>
</tr>
<tr>
<td/>
<td colspan="2">Topic N2 &#x003D; 8 and Value of Coherence is 0.6942</td>
</tr>
<tr>
<td/>
<td colspan="2">Topic N3 &#x003D; 12 and Value of Coherence is 0.6971</td>
</tr>
<tr>
<td/>
<td colspan="2">Topic N4 &#x003D; 24 and Value of Coherence is 0.621</td>
</tr>
<tr>
<td/>
<td colspan="2">Topic N5 &#x003D; 28 and Value of Coherence is 0.6872</td>
</tr>
<tr>
<td/>
<td colspan="2">Topic N6 &#x003D; 32 and Value of Coherence is 0.6323</td>
</tr>
<tr>
<td colspan="3">(b)</td>
</tr>
<tr>
<td>Tweet ID</td>
<td colspan="2">Identified Medium-Level Event</td>
</tr>
<tr>
<td>T1</td>
<td colspan="2">Organization, London, Freedom</td>
</tr>
<tr>
<td>T2</td>
<td colspan="2">Insider Attack, Phishing</td>
</tr>
<tr>
<td>T3</td>
<td colspan="2">Cryptojacking, Malware</td>
</tr>
<tr>
<td>T4</td>
<td colspan="2">Zero-day, Exploit</td>
</tr>
<tr>
<td>T5</td>
<td colspan="2">Cyber attack, michaelfassbender, colmmeaney, mark halloran, hacking attack</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s2_5">
<label>2.5</label>
<title>Trend Analysis (High Level Event Detection)</title>
<p>Because they are complex and continue to evolve, the cyber events produced by LDA are regarded as medium-level events. Google Trends [<xref ref-type="bibr" rid="ref-20">20</xref>,<xref ref-type="bibr" rid="ref-21">21</xref>] is then used to obtain specific trending events, referred to as high-level cyber-events, using the social page count (SPC) method applied to Google Trends. In SPC, each event obtained from LDA is assigned a weight that reflects its trend on Google, and the event with the largest trending weight is selected as the final cyber-event. SPC uses page counts as a measure of importance, on the premise that more important sites naturally receive more links.</p>
<p><disp-formula id="eqn-5"><label>(5)</label><mml:math id="mml-eqn-5" display="block"><mml:mrow><mml:mi mathvariant="italic">T</mml:mi><mml:mi mathvariant="italic">i</mml:mi><mml:mtext mathvariant="italic">&#x2009;</mml:mtext><mml:mi mathvariant="italic">s</mml:mi><mml:mi mathvariant="italic">c</mml:mi><mml:mi mathvariant="italic">o</mml:mi><mml:mi mathvariant="italic">r</mml:mi><mml:mi mathvariant="italic">e</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>g</mml:mi><mml:mrow><mml:mo>{</mml:mo><mml:mfrac><mml:mrow><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover><mml:mi>W</mml:mi><mml:mi>i</mml:mi></mml:mrow><mml:mrow><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi></mml:mrow></mml:munderover><mml:mi>W</mml:mi><mml:mi>i</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>T</mml:mi><mml:mi>i</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac><mml:mo>}</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>In <xref ref-type="disp-formula" rid="eqn-5">Eq. (5)</xref>, <italic>Ti</italic> is the Internet search phrase and <italic>Wi</italic> is the total number of web pages that contain <italic>Ti</italic>.</p>
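<p>Eq. (5) is not fully specified in the text, so the sketch below illustrates only the final selection step: each medium-level event receives a trending weight, and the event with the largest weight is kept as the high-level event. The interest values are hypothetical, and using the mean Google Trends interest over a time window as the weight is an assumption, not the SPC formula.</p>
<preformat preformat-type="code">
```python
# Hypothetical Google Trends interest values (0-100 scale) for candidate
# medium-level events over the same time window; real values would be
# exported from Google Trends.
interest = {
    "phishing": [40, 55, 62, 70],
    "cryptojacking": [12, 15, 10, 14],
    "zero-day exploit": [30, 28, 35, 33],
}

def trending_weight(series):
    """Assumed weight: mean search interest over the window."""
    return sum(series) / len(series)

weights = {event: trending_weight(s) for event, s in interest.items()}
high_level_event = max(weights, key=weights.get)
print(high_level_event, weights[high_level_event])   # phishing 56.75
```
</preformat>
<p>Replacing <monospace>trending_weight</monospace> with a page-count score such as Eq. (5) leaves the selection logic unchanged.</p>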
</sec>
</sec>
<sec id="s3">
<label>3</label>
<title>Experimental Results</title>
<p>This section discusses the experimental evaluation of the proposed method and its findings. Several experiments were performed to test the viability of the proposed technique. In the first experiment, the coherence and perplexity of the proposed model were computed on various datasets; <xref ref-type="fig" rid="fig-3">Fig. 3</xref> shows the results obtained on the different datasets.</p>
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Performance of the proposed model on different datasets</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_35741-fig-3.tif"/>
</fig>
<p>A second experiment examines the varying relationships between vocabulary words and their occurrences in the documents of the dataset, which cover a variety of topics. This experiment uncovers previously unidentified themes in the collection and annotates the documents with them, revealing the latent topical patterns presented in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>.</p>
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>Identification of cyber-events at the medium level using latent Dirichlet allocation</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_35741-fig-4.tif"/>
</fig>
<p>High-level event detection experiments using Google Trends identified several important event topics. <xref ref-type="fig" rid="fig-5">Fig. 5</xref> shows the results for four distinct topics while they were trending on Google, and <xref ref-type="fig" rid="fig-6">Fig. 6</xref> shows the Google Trends data used to determine word counts and key topic keywords.</p>
<fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>High-level final event</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_35741-fig-5.tif"/>
</fig><fig id="fig-6">
<label>Figure 6</label>
<caption>
<title>Using Google Trends to determine word count and key topic keywords</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_35741-fig-6.tif"/>
</fig>
<p>In addition, we compared our experimental results with those of the IDCNN &#x0026; BiLSTM &#x0026; CRF [<xref ref-type="bibr" rid="ref-22">22</xref>] and SentiStrength &#x0026; VADER &#x0026; ARIMAX [<xref ref-type="bibr" rid="ref-23">23</xref>] methods, as presented in <xref ref-type="table" rid="table-4">Table 4</xref>. The IDCNN &#x0026; BiLSTM &#x0026; CRF approach identifies cyber incidents from tweets collected from Twitter [<xref ref-type="bibr" rid="ref-24">24</xref>&#x2013;<xref ref-type="bibr" rid="ref-28">28</xref>]. It combines natural language processing with deep learning in a multi-task learning framework built from an Iterated Dilated Convolutional Neural Network (IDCNN), a Bidirectional Long Short-Term Memory network (BiLSTM), and a conditional random field (CRF).</p>
<table-wrap id="table-4">
<label>Table 4</label>
<caption>
<title>Comparison with the state of art techniques</title>
</caption>
<table frame="hsides" >
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead valign="top">
<tr>
<th>#</th>
<th>Techniques</th>
<th>Recall</th>
<th>Accuracy</th>
<th>Precision</th>
<th>F1 Score</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td>1</td>
<td>IDCNN &#x0026; BiLSTM &#x0026; CRF</td>
<td>96.98</td>
<td>95.02</td>
<td>94.99</td>
<td>95.97</td>
</tr>
<tr>
<td>2</td>
<td>SentiStrength &#x0026; VADER &#x0026; ARIMAX</td>
<td>96.31</td>
<td>95.56</td>
<td>95.29</td>
<td>95.80</td>
</tr>
<tr>
<td>3</td>
<td>Proposed Technique</td>
<td>97.62</td>
<td>95.96</td>
<td>96.13</td>
<td>96.86</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The SentiStrength &#x0026; VADER &#x0026; ARIMAX approach identifies events by applying sentiment analysis to hacker-forum posts. It also identifies the behaviour patterns associated with cyber incidents by analyzing more than 400,000 posts collected over two years, from January 2020 to January 2022, from more than one hundred different hacker forums.</p>
<p>The proposed cyber event detection method outperforms both baselines on all four metrics in <xref ref-type="table" rid="table-4">Table 4</xref>, by 0.4 to 1.3 percentage points. Existing approaches cannot extract the most pertinent cyber events from a large set of observed events, whereas the proposed method is designed to do so. These findings suggest that using LDA can improve the performance of cyber event detection. <xref ref-type="fig" rid="fig-7">Fig. 7</xref> presents the complete comparison.</p>
<fig id="fig-7">
<label>Figure 7</label>
<caption>
<title>Comparison with existing techniques</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_35741-fig-7.tif"/>
</fig>
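<p>The F1 scores in <xref ref-type="table" rid="table-4">Table 4</xref> can be cross-checked against the reported precision and recall, since F1 is their harmonic mean, F1 = 2PR/(P + R). The short Python script below recomputes them from the table values.</p>
<preformat preformat-type="code">
```python
# Precision and recall (in %) for each technique, as reported in Table 4.
table4 = {
    "IDCNN + BiLSTM + CRF": (94.99, 96.98),
    "SentiStrength + VADER + ARIMAX": (95.29, 96.31),
    "Proposed technique": (96.13, 97.62),
}

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

for name, (p, r) in table4.items():
    print(f"{name}: F1 = {f1(p, r):.2f}")
```
</preformat>
<p>The recomputed values (95.97, 95.80, and 96.87) agree with the reported F1 scores to within 0.01 percentage points.</p>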
</sec>
<sec id="s4">
<label>4</label>
<title>Conclusion</title>
<p>Social media forums, which enable people to converse and express their thoughts, are among the most widely used communication media in the present technological era. As a result, the volume of content published on social media platforms is increasing. These open data sources contain a wealth of information about threats, from which newly emerging threats to the security of software and systems can be detected promptly. This research offers a distinct event detection technique that can quickly recognize cyber events from public forums such as Twitter. The key phases of the proposed model are data collection, data processing, deep-learning-based feature extraction, LDA-based medium-level cyber-event detection, and Google Trends-based high-level cyber-event detection. The proposed approach has been evaluated on several datasets, and the experimental results show that it detects cyber events effectively. In future work, the proposed approach will be extended with new similarity measures and topic modeling techniques.</p>
</sec>
</body>
<back>
<sec><title>Funding Statement</title>
<p>This research work is funded by a grant from the Center of Excellence in Information Assurance (CoEIA), KSU.</p>
</sec>
<sec sec-type="COI-statement"><title>Conflicts of Interest</title>
<p>The authors declare that they have no conflicts of interest to report regarding the present study.</p>
</sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Gu</surname></string-name></person-group>, &#x201C;<article-title>Sharing economy, technological innovation and carbon emissions: Evidence from Chinese cities</article-title>,&#x201D; <source>Journal of Innovation &#x0026; Knowledge</source>, vol. <volume>7</volume>, no. <issue>3</issue>, pp. <fpage>100228</fpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>F. F.</given-names> <surname>Alruwaili</surname></string-name></person-group>, &#x201C;<article-title>Artificial intelligence based threat detection in industrial internet of things environment</article-title>,&#x201D; <source>Computers, Materials &#x0026; Continua</source>, vol. <volume>73</volume>, no. <issue>3</issue>, pp. <fpage>5809</fpage>&#x2013;<lpage>5824</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Khatoon</surname></string-name>, <string-name><given-names>M. A.</given-names> <surname>Alshamari</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Asif</surname></string-name>, <string-name><given-names>M. M.</given-names> <surname>Hasan</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Abdou</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Development of social media analytics system for emergency event detection and crisis management</article-title>,&#x201D; <source>Computers, Materials &#x0026; Continua</source>, vol. <volume>63</volume>, no. <issue>3</issue>, pp. <fpage>3079</fpage>&#x2013;<lpage>3100</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Higuchi</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Kobori</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Lee</surname></string-name> and <string-name><given-names>R. B.</given-names> <surname>Primack</surname></string-name></person-group>, &#x201C;<article-title>Declining phenology observations by the Japan Meteorological Agency</article-title>,&#x201D; <source>Nature Ecology &#x0026; Evolution</source>, vol. <volume>5</volume>, no. <issue>7</issue>, pp. <fpage>886</fpage>&#x2013;<lpage>887</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Bhardwaj</surname></string-name> and <string-name><given-names>V.</given-names> <surname>Mangat</surname></string-name></person-group>, &#x201C;<article-title>Distributed denial of service attacks in cloud: State-of-the-art of scientific and commercial solutions</article-title>,&#x201D; <source>Computer Science Review</source>, vol. <volume>39</volume>, no. <issue>1</issue>, pp. <fpage>100332</fpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Choi</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Nagappan</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Kopyto</surname></string-name> and <string-name><given-names>A.</given-names> <surname>Wexler</surname></string-name></person-group>, &#x201C;<article-title>Pregnant at the start of the pandemic: A content analysis of COVID-19-related posts on online pregnancy discussion boards</article-title>,&#x201D; <source>BMC Pregnancy and Childbirth</source>, vol. <volume>22</volume>, no. <issue>1</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>11</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>N.</given-names> <surname>Chowdhury</surname></string-name> and <string-name><given-names>V.</given-names> <surname>Gkioulos</surname></string-name></person-group>, &#x201C;<article-title>Cyber security training for critical infrastructure protection: A literature review</article-title>,&#x201D; <source>Computer Science Review</source>, vol. <volume>40</volume>, no. <issue>1</issue>, pp. <fpage>100361</fpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>E.</given-names> <surname>Bout</surname></string-name>, <string-name><given-names>V.</given-names> <surname>Loscri</surname></string-name> and <string-name><given-names>A.</given-names> <surname>Gallais</surname></string-name></person-group>, &#x201C;<article-title>How machine learning changes the nature of cyberattacks on IoT networks: A survey</article-title>,&#x201D; <source>IEEE Communications Surveys &#x0026; Tutorials</source>, vol. <volume>24</volume>, no. <issue>1</issue>, pp. <fpage>248</fpage>&#x2013;<lpage>279</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Shin</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Shim</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Moon</surname></string-name></person-group>, &#x201C;<article-title>Cyber security event detection with new and re-emerging words</article-title>,&#x201D; in <conf-name>Proc. of the 15th ACM Asia Conf. on Computer and Communications Security</conf-name>, <publisher-loc>Taipei, Taiwan</publisher-loc>, pp. <fpage>665</fpage>&#x2013;<lpage>678</lpage>, <year>2020</year>. </mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Satyapanich</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Ferraro</surname></string-name> and <string-name><given-names>T.</given-names> <surname>Finin</surname></string-name></person-group>, &#x201C;<article-title>CASIE: Extracting cyber security event information from text</article-title>,&#x201D; in <conf-name>Proc. of the AAAI Conf. on Artificial Intelligence</conf-name>, <publisher-loc>New York, USA</publisher-loc>, pp. <fpage>8749</fpage>&#x2013;<lpage>8757</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>R. P.</given-names> <surname>Khandpur</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Ji</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Jan</surname></string-name> and <string-name><given-names>G.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>Crowdsourcing cyber security: Cyber attack detection using social media</article-title>,&#x201D; in <conf-name>Proc. of the 2017 ACM on Conf. on Information and Knowledge Management</conf-name>, <publisher-loc>Singapore</publisher-loc>, pp. <fpage>1049</fpage>&#x2013;<lpage>1057</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>U.</given-names> <surname>Javed</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Shaukat</surname></string-name>, <string-name><given-names>I. A.</given-names> <surname>Hameed</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Iqbal</surname></string-name>, <string-name><given-names>T. M.</given-names> <surname>Alam</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>A review of content-based and context-based recommendation systems</article-title>,&#x201D; <source>International Journal of Emerging Technologies in Learning (iJET)</source>, vol. <volume>16</volume>, no. <issue>3</issue>, pp. <fpage>274</fpage>&#x2013;<lpage>306</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>U.</given-names> <surname>Naseem</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Khushi</surname></string-name>, <string-name><given-names>S. K.</given-names> <surname>Khan</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Shaukat</surname></string-name> and <string-name><given-names>M. A.</given-names> <surname>Moni</surname></string-name></person-group>, &#x201C;<article-title>A comparative analysis of active learning for biomedical text mining</article-title>,&#x201D; <source>Applied System Innovation</source>, vol. <volume>4</volume>, no. <issue>1</issue>, pp. <fpage>23</fpage>&#x2013;<lpage>33</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>Shaukat</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Iqbal</surname></string-name>, <string-name><given-names>T. M.</given-names> <surname>Alam</surname></string-name>, <string-name><given-names>G. K.</given-names> <surname>Aujla</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Devnath</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>The impact of artificial intelligence and robotics on the future employment opportunities</article-title>,&#x201D; <source>Trends in Computer Science and Information Technology</source>, vol. <volume>5</volume>, no. <issue>1</issue>, pp. <fpage>50</fpage>&#x2013;<lpage>54</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Kang</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Cai</surname></string-name>, <string-name><given-names>C. W.</given-names> <surname>Tan</surname></string-name>, <string-name><given-names>Q.</given-names> <surname>Huang</surname></string-name> and <string-name><given-names>H.</given-names> <surname>Liu</surname></string-name></person-group>, &#x201C;<article-title>Natural language processing (NLP) in management research: A literature review</article-title>,&#x201D; <source>Journal of Management Analytics</source>, vol. <volume>7</volume>, no. <issue>2</issue>, pp. <fpage>139</fpage>&#x2013;<lpage>172</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Barrie</surname></string-name> and <string-name><given-names>J. C. T.</given-names> <surname>Ho</surname></string-name></person-group>, &#x201C;<article-title>academictwitteR: An R package to access the Twitter Academic Research Product Track v2 API endpoint</article-title>,&#x201D; <source>Journal of Open Source Software</source>, vol. <volume>6</volume>, no. <issue>62</issue>, pp. <fpage>3272</fpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Nawaz</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Ali</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Hafeez</surname></string-name> and <string-name><given-names>M. R.</given-names> <surname>Rashid</surname></string-name></person-group>, &#x201C;<article-title>Mining public opinion: A sentiment based forecasting for democratic elections of Pakistan</article-title>,&#x201D; <source>Spatial Information Research</source>, vol. <volume>30</volume>, no. <issue>1</issue>, pp. <fpage>169</fpage>&#x2013;<lpage>181</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>B.</given-names> <surname>Ozyurt</surname></string-name> and <string-name><given-names>M. A.</given-names> <surname>Akcayol</surname></string-name></person-group>, &#x201C;<article-title>A new topic modeling based approach for aspect extraction in aspect based sentiment analysis: SS-LDA</article-title>,&#x201D; <source>Expert Systems with Applications</source>, vol. <volume>168</volume>, no. <issue>1</issue>, pp. <fpage>114231</fpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Park</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Lee</surname></string-name></person-group>, &#x201C;<article-title>Improving the Gibbs sampler</article-title>,&#x201D; <source>Wiley Interdisciplinary Reviews: Computational Statistics</source>, vol. <volume>14</volume>, no. <issue>2</issue>, pp. <fpage>e1546</fpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Aaronson</surname></string-name>, <string-name><given-names>S. A.</given-names> <surname>Brave</surname></string-name>, <string-name><given-names>R. A.</given-names> <surname>Butters</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Fogarty</surname></string-name>, <string-name><given-names>D. W.</given-names> <surname>Sacks</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Forecasting unemployment insurance claims in realtime with Google Trends</article-title>,&#x201D; <source>International Journal of Forecasting</source>, vol. <volume>38</volume>, no. <issue>2</issue>, pp. <fpage>567</fpage>&#x2013;<lpage>581</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Pullan</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Dey</surname></string-name></person-group>, &#x201C;<article-title>Vaccine hesitancy and anti-vaccination in the time of COVID-19: A Google Trends analysis</article-title>,&#x201D; <source>Vaccine</source>, vol. <volume>39</volume>, no. <issue>14</issue>, pp. <fpage>1877</fpage>&#x2013;<lpage>1881</lpage>, <year>2021</year>; <pub-id pub-id-type="pmid">33715904</pub-id></mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Askarizad</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Jinliao</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Jafari</surname></string-name></person-group>, &#x201C;<article-title>The influence of COVID-19 on the societal mobility of urban spaces</article-title>,&#x201D; <source>Cities</source>, vol. <volume>119</volume>, no. <issue>1</issue>, pp. <fpage>103388</fpage>, <year>2021</year>; <pub-id pub-id-type="pmid">36540772</pub-id></mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Yang</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Ling</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Cyber threat intelligence entity extraction based on deep learning and field knowledge engineering</article-title>,&#x201D; in <conf-name>2022 IEEE 25th Int. Conf. on Computer Supported Cooperative Work in Design (CSCWD)</conf-name>, <publisher-loc>Singapore</publisher-loc>, pp. <fpage>406</fpage>&#x2013;<lpage>413</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Borchers</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Rosenberg</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Gibbons</surname></string-name>, <string-name><given-names>M. A.</given-names> <surname>Burchfield</surname></string-name> and <string-name><given-names>C.</given-names> <surname>Fischer</surname></string-name></person-group>, &#x201C;<article-title>To scale or not to scale: Comparing popular sentiment analysis dictionaries on educational Twitter data</article-title>,&#x201D; in <conf-name>Int. Conf. on Educational Data Mining</conf-name>, <publisher-loc>Durham, UK</publisher-loc>, pp. <fpage>1</fpage>&#x2013;<lpage>7</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>S. R.</given-names> <surname>Baker</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Bloom</surname></string-name>, <string-name><given-names>S. J.</given-names> <surname>Davis</surname></string-name> and <string-name><given-names>T.</given-names> <surname>Renault</surname></string-name></person-group>, &#x201C;<article-title>Twitter-derived measures of economic uncertainty</article-title>,&#x201D; [Online]. Available: <ext-link ext-link-type="uri" xlink:href="https://www.policyuncertainty.com">PolicyUncertainty.com</ext-link></mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Birjali</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Kasri</surname></string-name> and <string-name><given-names>A.</given-names> <surname>Beni-Hssane</surname></string-name></person-group>, &#x201C;<article-title>A comprehensive survey on sentiment analysis: Approaches, challenges and trends</article-title>,&#x201D; <source>Knowledge-Based Systems</source>, vol. <volume>226</volume>, no. <issue>1</issue>, pp. <fpage>107134</fpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M. R. R.</given-names> <surname>Rana</surname></string-name>, <string-name><given-names>S. U.</given-names> <surname>Rehman</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Nawaz</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Ali</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Ahmed</surname></string-name></person-group>, &#x201C;<article-title>A conceptual model for decision support systems using aspect based sentiment analysis</article-title>,&#x201D; <source>Proceedings of the Romanian Academy Series A-Mathematics Physics Technical Sciences Information Science</source>, vol. <volume>22</volume>, no. <issue>4</issue>, pp. <fpage>381</fpage>&#x2013;<lpage>390</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J. R.</given-names> <surname>Saura</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Ribeiro-Soriano</surname></string-name> and <string-name><given-names>P. Z.</given-names> <surname>Salda&#x00F1;a</surname></string-name></person-group>, &#x201C;<article-title>Exploring the challenges of remote work on Twitter users&#x0027; sentiments: From digital technology development to a post-pandemic era</article-title>,&#x201D; <source>Journal of Business Research</source>, vol. <volume>142</volume>, no. <issue>1</issue>, pp. <fpage>242</fpage>&#x2013;<lpage>254</lpage>, <year>2022</year>.</mixed-citation></ref>
</ref-list>
</back></article>