<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CSSE</journal-id>
<journal-id journal-id-type="nlm-ta">CSSE</journal-id>
<journal-id journal-id-type="publisher-id">CSSE</journal-id>
<journal-title-group>
<journal-title>Computer Systems Science &#x0026; Engineering</journal-title>
</journal-title-group>
<issn pub-type="ppub">0267-6192</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">19483</article-id>
<article-id pub-id-type="doi">10.32604/csse.2022.019483</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Integrated Approach to Detect Cyberbullying Text: Mobile Device Forensics Data</article-title><alt-title alt-title-type="left-running-head">Integrated Approach to Detect Cyberbullying Text: Mobile Device Forensics Data</alt-title><alt-title alt-title-type="right-running-head">Integrated Approach to Detect Cyberbullying Text: Mobile Device Forensics Data</alt-title>
</title-group>
<contrib-group content-type="authors">
<contrib id="author-1" contrib-type="author" corresp="yes">
<name name-style="western">
<surname>Maria Jones</surname>
<given-names>G.</given-names>
</name>
<xref ref-type="aff" rid="aff-1">1</xref>
<email>joneofarc26@gmail.com</email>
</contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western">
<surname>Godfrey Winster</surname>
<given-names>S.</given-names>
</name>
<xref ref-type="aff" rid="aff-2">2</xref>
</contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western">
<surname>Valarmathie</surname>
<given-names>P.</given-names>
</name>
<xref ref-type="aff" rid="aff-3">3</xref>
</contrib>
<aff id="aff-1">
<label>1</label><institution>Department of Computer Science and Engineering, Saveetha Engineering College</institution>, <addr-line>Chennai, 602105</addr-line>, <country>India</country></aff>
<aff id="aff-2">
<label>2</label><institution>Department of Computer Science and Engineering, School of Computing, SRM Institute of Science and Technology</institution>, <addr-line>Chengalpattu, 603203</addr-line>, <country>India</country></aff>
<aff id="aff-3">
<label>3</label><institution>Department of Information Technology, Saveetha Engineering College</institution>, <addr-line>Chennai, 602105</addr-line>, <country>India</country></aff>
</contrib-group><author-notes><corresp id="cor1">&#x002A;Corresponding Author: G. Maria Jones. Email: <email>joneofarc26@gmail.com</email></corresp></author-notes>
<pub-date pub-type="epub" date-type="pub" iso-8601-date="2021-09-03">
<day>3</day>
<month>9</month>
<year>2021</year>
</pub-date>
<volume>40</volume>
<issue>3</issue>
<fpage>963</fpage>
<lpage>978</lpage>
<history>
<date date-type="received">
<day>14</day>
<month>4</month>
<year>2021</year>
</date>
<date date-type="accepted">
<day>23</day>
<month>5</month>
<year>2021</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2022 Maria Jones, Godfrey Winster and Valarmathie</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Maria Jones, Godfrey Winster and Valarmathie</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CSSE_19483.pdf"></self-uri>
<abstract>
<p>Mobile devices and social networks provide communication opportunities for the younger generation, which also increases their vulnerability and exposure to cybercrime. A recent survey reports that cyberbullying and cyberstalking are a growing problem among young people. This paper focuses on detecting cyberbullying in mobile phone text messages retrieved with the Oxygen Forensics toolkit. We describe the forensic data collection and the annotation of a corpus of suspicious activity, such as cyberbullying, from mobile phones, and carry out a sequence of binary classification experiments to detect cyberbullying. We use forensics techniques together with Machine Learning (ML) and Deep Learning (DL) algorithms to expose suspicious patterns and support forensic investigations, where every piece of evidence contributes to the case. Experiments on a real-time dataset show good results for the detection of cyberbullying content. Among the ML approaches, Random Forest achieves 87&#x0025; accuracy without the SMOTE technique, whereas the F1 score is good when SMOTE is applied. Among the DL algorithms, LSTM achieves 92&#x0025; validation accuracy compared with the Dense and BiLSTM models.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Mobile forensics</kwd>
<kwd>cyberbullying</kwd>
<kwd>machine learning</kwd>
<kwd>investigation model</kwd>
<kwd>suspicious pattern</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>Bullying and stalking are not new phenomena. One study [<xref ref-type="bibr" rid="ref-1">1</xref>] states that traditional bullying is limited in place and time and is predictable, whereas cyberbullying can happen at any time and in any place. With the ubiquity of the Internet and social media, including blogs, social networking sites such as Twitter and Facebook, and instant messaging applications such as WhatsApp, Instagram, and Telegram, people can communicate with anyone irrespective of place and time. Social media has two sides. On the positive side, people can share useful information and establish social relationships. On the negative side, children face an increased risk of threatening messages, cyberbullying, cyberstalking, and similar abuse. The use of advanced techniques to commit cybercrimes makes investigation and evidence collection challenging, so forensics techniques are needed to reveal the offender. The primary use of digital forensics is to reconstruct and retrieve digital data from electronic devices by identifying, capturing, and analyzing it so that it can be used in legal proceedings. Digital information held on electronic devices reflects everyone&#x0027;s activity while working, living, and playing in a virtual environment, and so leaves an electronic trace of our daily lives. Cybercrime is evolving at an unprecedented pace. In recent years, the use of social networking by individuals and companies has risen drastically, driven by rapid growth in smart devices, internet access, and storage capacity. The speed of the Internet and its ability to transfer data, mainly for communication, also open the door for criminals to commit crimes [<xref ref-type="bibr" rid="ref-2">2</xref>].
In the past, criminals usually left traces of a crime as fingerprints or other physical evidence, which required only a short period to investigate. As technology advances rapidly, cybercrimes are also rising exponentially. Most cybercrimes target information about individuals, governments, or corporations.</p>
<p>Data on a computer or network may be modified, merged, or deleted. Digital forensics investigators have difficulty gathering evidence because criminals use false identities. Crimes carried out by such criminals include hacking, spoofing, and phishing. The ultimate goal is to establish the truth; evidence that remains hidden and undiscovered increases the likelihood of future attacks. Digital evidence from sources such as social networking services (WhatsApp, WeChat, Line, Instagram, etc.) includes voice calls, SMS, MMS, audio, and video that can demonstrate a data breach. The investigator should answer six essential questions during the investigation: Who, How, What, Why, When, and Where [<xref ref-type="bibr" rid="ref-3">3</xref>]. The information extracted from the compromised device helps to identify the criminals and supports the legal proceedings. Mobile forensics deals with seizure, acquisition, analysis, and reporting, using tools such as EnCase, Autopsy, AccessData FTK, Oxygen Forensics, and OSForensics to reveal the evidence. Cell phones and smartphones both fall under the category of mobile phones, which are portable devices. Because they are vital to day-to-day activities, they are vulnerable to criminal activity and often form part of a crime scene. Many smart devices contain sensitive user information, including call logs, SMS, MMS, electronic mail, photographs, videos, memos, passwords, web history, and credit/debit card numbers. Device holders use smartphones to communicate, exchange photos, connect to social networks, write blogs, and record audio and video. Owing to advances in technology and transmission, data rates are at their peak [<xref ref-type="bibr" rid="ref-4">4</xref>], which allows most individuals to transfer digital data (e.g., digital video and digital images). Hence, the development of mobile computing and communication technologies gives opportunities to criminals and investigators alike.</p>
<p>Cyberbullying and cyberstalking are the dark side of human nature expressed through technology, especially on social media, so their detection has become a key research area. In this work, we propose a framework for detecting cyberbullying in mobile text messages, using forensics techniques to retrieve the content even if it has been deleted. This work aims to help forensic investigation departments analyze the behavior patterns of victims and offenders. We also apply SMOTE (Synthetic Minority Oversampling Technique) to address the class imbalance problem. Based on features extracted from the messages, we developed ML and DL models for cyberbullying detection. We applied the SMOTE technique to the ML algorithms and a word embedding technique to the Dense, LSTM, and BiLSTM models. The features are supplied to the Logistic Regression, Decision Tree, Random Forest, and XGBoost algorithms.</p>
<p>The paper is organized as follows: Section 2 describes forensics-related work on cyberbullying and cyberstalking along with sentiment analysis. Section 3 describes the architecture and implementation of the integrated method that combines forensics with ML and DL models. Section 4 presents the algorithms used for implementation. Section 5 provides the analysis of experimental outcomes compared to other algorithms with and without the SMOTE technique. Section 6 provides the results and discussion. Finally, Section 7 concludes the study.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Related Works</title>
<p>In this section, we survey the literature on cyberbullying, sentiment analysis of text, and forensic text analysis. We briefly summarize forensics-based models for cyberbullying and cyberstalking in Section 2.1 and sentiment-analysis-based models in Section 2.2.</p>
<sec id="s2_1">
<label>2.1</label>
<title>Forensics in Cyberbullying and Cyberstalking</title>
<p>Many works related to cyberbullying and cyberstalking detection rely on both machine learning and deep learning models. Noora et al. [<xref ref-type="bibr" rid="ref-5">5</xref>] conducted a study based on Behavioural Evidence Analysis (BEA) of cyberstalking cases, applying forensics techniques to 20 cases. They concluded that BEA helped to focus the investigation and enabled a better understanding of victim and offender behavior based on digital evidence. The authors of [<xref ref-type="bibr" rid="ref-6">6</xref>] used crowdsourcing to annotate posts and hashtags from seven social media platforms to generate cyberbullying datasets, and performed their experiments with Support Vector Machine, XGBoost, and CNN models. Ingo et al. [<xref ref-type="bibr" rid="ref-7">7</xref>] presented a framework called Anti Cyberstalking Text-based System (ACTS) for detecting text-based cyberstalking. The framework is designed as a prevention mechanism to analyze, detect, identify, and block communication, and it includes a forensics technique for collecting evidence. Michael et al. [<xref ref-type="bibr" rid="ref-8">8</xref>] categorized text and evaluated it using a rule-based decision formula and a machine learning approach. The authors also used forensic text for deep learning analysis, which helps to identify criminals.</p>
<p>The authors of [<xref ref-type="bibr" rid="ref-9">9</xref>] presented a hybrid ontology technique for collecting forensic data from social networks and intend to extend it with advanced operations in future work. The work in [<xref ref-type="bibr" rid="ref-10">10</xref>] describes a methodology for retrieving information from Microsoft Skype to identify the end-user devices of a VoIP call by analysing the codecs exchanged by the clients during the SIP (Session Initiation Protocol) handshaking phase. The author of [<xref ref-type="bibr" rid="ref-11">11</xref>] used seven machine learning algorithms to trace the file system and identify how files can be manipulated; a comparison of performance measures indicated that neural networks and random forest achieved the highest accuracy among the seven algorithms. The article [<xref ref-type="bibr" rid="ref-12">12</xref>] presents Structural Feature Extraction Methods (SFEM) to detect malicious content in documents through three experimental analyses of machine learning algorithms, and proposes future work on detecting malicious content in Excel and PowerPoint files. The author of [<xref ref-type="bibr" rid="ref-13">13</xref>] presented the MajorClust algorithm to detect suspicious activities in logs, which assists forensic examiners in inspecting log files, and achieved 70.59&#x0025;, 82.21&#x0025;, and 83.14&#x0025; sensitivity, specificity, and accuracy, respectively.</p>
</sec>
<sec id="s2_2">
<label>2.2</label>
<title>Sentiment Analysis</title>
<p>The authors of [<xref ref-type="bibr" rid="ref-14">14</xref>] focused on sentiment analysis to identify the intensity of textual information. The study aimed to classify social content as high extreme, moderate, low extreme, or neutral using classification algorithms. According to the results, Linear SVM achieved 82&#x0025; accuracy and 88&#x0025; accuracy in lexicon validation. Bandeh et al. [<xref ref-type="bibr" rid="ref-15">15</xref>] proposed a cyberbullying framework that generates features from Twitter content and used four machine learning algorithms. The authors compared the proposed approach with baseline machine learning algorithms and reported that the proposed approach produced a good outcome. Vijayaragavan et al. [<xref ref-type="bibr" rid="ref-16">16</xref>] proposed a new classification model for online product reviews with 1811 instances in two classes. Sentiment analysis is used to extract the features, and a fuzzy-based approach is then used to determine product purchases. Sergio et al. [<xref ref-type="bibr" rid="ref-17">17</xref>] applied clustering techniques to the publicly available Enron email dataset to analyze the text, which is helpful for digital investigation. Kashfia et al. [<xref ref-type="bibr" rid="ref-18">18</xref>] described how to detect people&#x0027;s emotions and sentiments from their Twitter posts. Their experimental analysis detected six types of emotions. Junseok et al. [<xref ref-type="bibr" rid="ref-19">19</xref>] proposed a new method of weighting and feature selection for Twitter data using sentiment analysis. In this method, the researchers used the Na&#x00EF;ve Bayes algorithm to estimate the weights, and Multinomial NB was also used to remove words. The final result showed good accuracy compared to the existing method.</p>
<p>Gang et al. [<xref ref-type="bibr" rid="ref-20">20</xref>] presented an Attention-based Bi-directional Long Short-Term Memory with a Convolution layer (AC-BiLSTM) to extract phrases from word embeddings. The final result indicated that AC-BiLSTM performed with good accuracy compared with other algorithms. Tao et al. [<xref ref-type="bibr" rid="ref-21">21</xref>] revealed the spatial patterns of tweet messages and used the Latent Dirichlet Allocation model to classify geo-tagged tweets. Duyu et al. [<xref ref-type="bibr" rid="ref-22">22</xref>] introduced the encoding method and sentiment-level data simultaneously into the word embedding model and applied a hybrid model to capture context and sentiment data, which performed best in all three ways. Lei et al. [<xref ref-type="bibr" rid="ref-23">23</xref>] proposed the SentiDiff algorithm to identify the relationship between information in Twitter messages. The approach is demonstrated on the dataset with a hybrid classifier that combines a sentiment classifier with sentiment reversal prediction. The algorithm achieved between 5.09&#x0025; and 8.38&#x0025; of PR-AUC. Guixian et al. [<xref ref-type="bibr" rid="ref-24">24</xref>] proposed a method to improve the word representation vector by integrating sentiment analysis with TF-IDF. The word vector is given as input to a BiLSTM, and the study is compared with RNN, CNN, LSTM, and Na&#x00EF;ve Bayes. The experimental results showed that the proposed method achieved high accuracy on comments.</p>
</sec>
</sec>
<sec id="s3">
<label>3</label>
<title>The Integrated Model of Forensics with ML and DL Models</title>
<p>The block diagram of the proposed framework is shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>. The proposed methodology involves three main processes: evidence collection, analysis of the text messages, and performance measurement. In the first stage, we collected the evidence from the mobile device in a forensically sound manner. The Oxygen Forensics software was used to collect the digital evidence from a mobile device (Samsung A50). There are three methods of data acquisition: logical, physical, and manual. A total of 944 text messages were collected from the mobile device through logical acquisition. The dataset contains 168 bullying messages and 776 non-bullying messages. We annotated each message with one of two labels, cyberbullying or non-cyberbullying, to carry out the experimental study on the cyberbullying dataset.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>Proposed method for detecting cyberbullying text</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="CSSE_19483-fig-1.png"/>
</fig>
<p>The second stage analyzes the text messages retrieved from the mobile device. During the analysis phase, the text is categorized for binary classification. The evidence retrieved from the Samsung A50 is shown in <xref ref-type="fig" rid="fig-3">Fig. 3</xref>. The dataset is stored in CSV (Comma Separated Value) format and split into training and testing sets in a 70:30 ratio, where each record is a text message. <xref ref-type="table" rid="table-1">Tab. 1</xref> shows a sample conversation. The final stage analyzes the text messages with respect to cyberbullying, using the ML and DL models for training and testing on the corpus. To understand the behavior of cyberbullies in text messages, we ran the models on the dataset to see how they identify bullying conversations. The outcome can help digital forensics investigators identify offender patterns.</p>

<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>Example for chat based cyberbullying</title>
</caption>
<table>
<colgroup><col/><col/><col/><col/>
</colgroup>
<thead>
<tr><th>Line</th><th>Message</th><th>Bully/Not Bully</th><th>Type</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>Me and my friends going to party</td>
<td>Not</td>
<td>Neutral</td>
</tr>
<tr>
<td>2</td>
<td>F&#x002A;ck don&#x0027;t you have any boyfriend.</td>
<td>Bully</td>
<td>Insult</td>
</tr>
<tr>
<td>3</td>
<td>Just kill yourself, I hate</td>
<td>Bully</td>
<td>Curse</td>
</tr>
<tr>
<td>4</td>
<td>I will message you later</td>
<td>Not</td>
<td>Neutral</td>
</tr>
<tr>
<td>5</td>
<td>U wanna be killed</td>
<td>Bully</td>
<td>Threat</td>
</tr>
</tbody>
</table>
</table-wrap>
<sec id="s3_1">
<label>3.1</label>
<title>Dataset Collection</title>
<p>Mobile phones offer features and capabilities that involve sensitive information, such as personal information storage and management, messaging, audio, video, web browsing, and many more. These features vary with the device and its developers, and change with each version update and with the applications installed by users. <xref ref-type="fig" rid="fig-2">Fig. 2</xref> shows the potential evidence that resides in mobile phones:</p>
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>Potential Evidence from mobile devices</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="CSSE_19483-fig-2.png"/>
</fig>
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Steps Involved in Data Preparation</title>
<sec id="s3_2_1">
<label>3.2.1</label>
<title>Pre-Processing</title>
<p>Data pre-processing is an essential step that prepares the raw text data for analysis. Pre-processing aims to facilitate the training and testing process in machine learning algorithms, where the model learns from the data for better results. Some of the steps involved in data pre-processing are discussed below.</p>
</sec>
<sec id="s3_2_2">
<label>3.2.2</label>
<title>Tokenization</title>
<p>Tokenization is the method of breaking text down into smaller units called tokens. During the process, unwanted elements such as punctuation are eliminated. The tokens help to identify and reveal patterns in the text document.</p>
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Evidence retrieved using Oxygen Forensics</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="CSSE_19483-fig-3.png"/>
</fig>
<p>Tokenization can divide a document into paragraphs, paragraphs into sentences, and sentences or phrases into individual words. An example is given in <xref ref-type="table" rid="table-2">Tab. 2</xref>.</p>

<table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>Example for Tokenization</title>
</caption>
<table>
<colgroup><col/><col/><col/>
</colgroup>
<thead>
<tr><th>S.No</th><th>Sentence Without Tokenization</th><th>Sentence With Tokenization</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>Online Chatting can be tricky</td>
<td>&#x2018;Online&#x2019;, &#x2018;Chatting&#x2019;, &#x2018;can&#x2019;, &#x2018;be&#x2019;, &#x2018;tricky&#x2019;</td>
</tr>
<tr>
<td>2</td>
<td>Harassing or threatening someone</td>
<td>&#x2018;Harassing&#x2019;, &#x2018;or&#x2019;, &#x2018;threatening&#x2019;, &#x2018;someone&#x2019;</td>
</tr>
<tr>
<td>3</td>
<td>Pretending to be someone</td>
<td>&#x2018;Pretending&#x2019;, &#x2018;to&#x2019;, &#x2018;be&#x2019;, &#x2018;someone&#x2019;</td>
</tr>
</tbody>
</table>
</table-wrap>
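<p>The tokenization step described in this section can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the implementation used in the study; the function name <monospace>tokenize</monospace> and the regular expression are assumptions chosen for the example.</p>
<preformat preformat-type="code"><![CDATA[
```python
import re


def tokenize(sentence):
    """Split a sentence into word tokens, dropping punctuation."""
    # \w+ matches a run of letters or digits; the optional group keeps
    # contractions such as "don't" together as a single token.
    return re.findall(r"\w+(?:'\w+)?", sentence)


sentences = [
    "Online Chatting can be tricky",
    "Harassing or threatening someone",
    "Pretending to be someone",
]
tokens = [tokenize(s) for s in sentences]
for sentence, words in zip(sentences, tokens):
    print(sentence, "->", words)
```
]]></preformat>
<p>A regular expression is used here only to keep the sketch dependency-free; a dedicated NLP tokenizer would handle emoticons, URLs, and abbreviations more carefully.</p>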
</sec>
<sec id="s3_2_3">
<label>3.2.3</label>
<title>Stemming and Lemmatization</title>
<p>Stemming and lemmatization are processes that reduce inflected words to a root form. The difference is that a stem is not necessarily an actual word, whereas a lemma is always a valid word of the language. Stemming applies a fixed sequence of rule-based steps to each word, which makes it faster than lemmatization. Examples of stemming and lemmatization are given in <xref ref-type="table" rid="table-3">Tab. 3</xref>.</p>
<table-wrap id="table-3">
<label>Table 3</label>
<caption>
<title>Example for Stemming and Lemmatization</title>
</caption>
<table>
<colgroup><col/><col/><col/>
</colgroup>
<thead>
<tr><th>Words</th><th>Stem</th><th>Lemma</th>
</tr>
</thead>
<tbody>
<tr>
<td>Studies</td>
<td>Studi</td>
<td>Study</td>
</tr>
<tr>
<td>Dancing</td>
<td>Danc</td>
<td>Dance</td>
</tr>
<tr>
<td>Beautiful</td>
<td>Beauti</td>
<td>Beautiful</td>
</tr>
<tr>
<td>Corpora</td>
<td>Corpora</td>
<td>Corpus</td>
</tr>
</tbody>
</table>
</table-wrap>
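<p>The contrast between the two techniques can be illustrated with a toy sketch. The suffix rules and the lemma dictionary below are hypothetical and deliberately tiny; they are not Porter&#x0027;s algorithm or a real lemmatizer, and the study does not state which implementation was used. The point is only that a stemmer applies mechanical rules, while a lemmatizer relies on vocabulary knowledge.</p>
<preformat preformat-type="code"><![CDATA[
```python
# Toy suffix rules (hypothetical, far simpler than a real stemmer)
SUFFIX_RULES = [("ies", "i"), ("ing", ""), ("ful", "")]

# Toy lemma dictionary (hypothetical); real lemmatizers use a full vocabulary
LEMMA_TABLE = {"studies": "study", "dancing": "dance", "corpora": "corpus"}


def toy_stem(word):
    """Strip the first matching suffix; the result need not be a real word."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        # Require a few leading characters so very short words are left alone
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)] + replacement
    return word


def toy_lemma(word):
    """Look the word up in the dictionary; fall back to the word itself."""
    word = word.lower()
    return LEMMA_TABLE.get(word, word)


for w in ["Studies", "Dancing", "Corpora"]:
    print(w, toy_stem(w), toy_lemma(w))
```
]]></preformat>
<p>The stemmer leaves the irregular plural unchanged because no suffix rule applies, whereas the dictionary lookup resolves it to its singular form.</p>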
</sec>
<sec id="s3_2_4">
<label>3.2.4</label>
<title>Removing Stopwords, HTML Tags</title>
<p>Stop words are the most commonly used words in any document, such as &#x201C;is&#x201D;, &#x201C;was&#x201D;, &#x201C;the&#x201D;, and &#x201C;a&#x201D;. Stop words are removed because they contribute little meaning to the sentences, as illustrated in <xref ref-type="table" rid="table-4">Tab. 4</xref>. The stop words are often removed from the dataset before training the machine learning and deep learning models, which improves time efficiency and the performance measures.</p>
<table-wrap id="table-4">
<label>Table 4</label>
<caption>
<title>Example for Stopwords and HTML tags</title>
</caption>
<table>
<colgroup><col/><col/>
</colgroup>
<thead>
<tr><th>With Stop Words</th><th>Without Stop Words</th>
</tr>
</thead>
<tbody>
<tr>
<td>Online Chatting can be tricky</td>
<td>Online, Chatting, tricky</td>
</tr>
<tr>
<td>Harassing or threatening someone</td>
<td>Harassing, threatening, someone</td>
</tr>
<tr>
<td>Pretending to be someone</td>
<td>Pretending, someone</td>
</tr>
</tbody>
</table>
</table-wrap>
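<p>A minimal sketch of stop-word and HTML-tag removal is given below. The stop-word list is a small assumption made for this example (practical lists contain hundreds of words), and the helper names <monospace>strip_html</monospace> and <monospace>remove_stop_words</monospace> are illustrative rather than taken from the study.</p>
<preformat preformat-type="code"><![CDATA[
```python
import re

# Small illustrative stop-word list (an assumption for this example)
STOP_WORDS = {"is", "was", "the", "a", "can", "be", "or", "to"}


def strip_html(text):
    """Replace anything that looks like an HTML tag with a space."""
    return re.sub(r"<[^>]+>", " ", text)


def remove_stop_words(text):
    """Return the tokens of `text` that are not stop words."""
    tokens = re.findall(r"\w+(?:'\w+)?", strip_html(text))
    return [t for t in tokens if t.lower() not in STOP_WORDS]


for message in [
    "Online Chatting can be tricky",
    "Harassing or threatening someone",
    "<b>Pretending</b> to be someone",
]:
    print(remove_stop_words(message))
```
]]></preformat>
<p>The comparison is made on the lower-cased token so that capitalized stop words are also removed, while the surviving tokens keep their original case.</p>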
</sec>
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>Feature Extraction</title>
<sec id="s3_3_1">
<label>3.3.1</label>
<title>Term Frequency &#x2013; Inverse Document Frequency (TF-IDF)</title>
<p>TF-IDF is used to weight words by how frequently they occur in a document and how rare they are across the corpus. Term frequency (TF) measures how frequently a word occurs in a document and is formulated as follows,<disp-formula id="ueqn-1">
<!--<alternatives>
<graphic mimetype="image" mime-subtype="png" xlink:href="ueqn-1.png"/><tex-math id="tex-ueqn-1"><![CDATA[$${\rm Tf}({{\rm t, \;d}} ){\rm = \ }{\rm No}{\rm .\ of\ words\ &#x2018;t&#x2019;\;in\ the\ document\ &#x2018;d&#x2019;/total\ no}{\rm .\ of\ words\ in\ &#x2018;d&#x2019;} .$$]]></tex-math>--><mml:math id="mml-ueqn-1" display="block"><mml:mrow><mml:mtext>Tf</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mtext>t</mml:mtext><mml:mo>,</mml:mo><mml:mtext>&#x205F;d</mml:mtext><mml:mo stretchy='false'>)</mml:mo><mml:mtext>=&#x2009;No</mml:mtext><mml:mo>.</mml:mo><mml:mtext>&#x2009;of&#x2009;words&#x2009;</mml:mtext><mml:mo>&#x2018;</mml:mo><mml:mtext>t</mml:mtext><mml:mo>&#x2019;</mml:mo><mml:mtext>&#x205F;in&#x2009;the&#x2009;document&#x2009;</mml:mtext><mml:mo>&#x2018;</mml:mo><mml:mtext>d</mml:mtext><mml:mo>&#x2019;</mml:mo><mml:mtext>/total&#x2009;no</mml:mtext><mml:mo>.</mml:mo><mml:mtext>&#x2009;of&#x2009;words&#x2009;in&#x2009;</mml:mtext><mml:mo>&#x2018;</mml:mo><mml:mtext>d</mml:mtext><mml:mo>&#x2019;</mml:mo><mml:mo>.</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></disp-formula>where,<list list-type="simple"><list-item>
<p>t represents a word (term) present in the document</p></list-item><list-item>
<p>d represents the document.</p></list-item></list></p>
<p>The next term is document frequency (DF), which is similar to term frequency. The difference is that term frequency counts the occurrences of the word &#x2018;t&#x2019; in a document &#x2018;d&#x2019;, whereas document frequency counts the number of documents in which the word &#x2018;t&#x2019; is present. Inverse Document Frequency (IDF) measures how informative a term is. For a small corpus, it is calculated from the size S of the document set and the document frequency DF as follows,<disp-formula id="ueqn-2">
<!--<alternatives>
<graphic mimetype="image" mime-subtype="png" xlink:href="ueqn-2.png"/><tex-math id="tex-ueqn-2"><![CDATA[$${\rm Idf}({\rm t} )= {\rm S}({{\rm Document\ set}} ){\rm /DF}$$]]></tex-math>--><mml:math id="mml-ueqn-2" display="block"><mml:mrow><mml:mtext>Idf</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mtext>t</mml:mtext><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mtext>S</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mtext>Document&#x2009;set</mml:mtext><mml:mo stretchy='false'>)</mml:mo><mml:mtext>/DF</mml:mtext></mml:mrow></mml:math>
<!--</alternatives>--></disp-formula></p>
<p>For a large corpus, the logarithm is taken to dampen the value, as represented below,<disp-formula id="ueqn-3">
<!--<alternatives>
<graphic mimetype="image" mime-subtype="png" xlink:href="ueqn-3.png"/><tex-math id="tex-ueqn-3"><![CDATA[$${\rm Idf}({\rm t} )= {\rm log}({{\rm S/}({{\rm DF\ + \ 1}} )} )$$]]></tex-math>--><mml:math id="mml-ueqn-3" display="block"><mml:mrow><mml:mtext>Idf</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mtext>t</mml:mtext><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mtext>log</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mtext>S/</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mtext>DF&#x2009;+&#x2009;1</mml:mtext><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></disp-formula></p>
<p>Finally, multiplying TF by IDF gives TF-IDF as below,<disp-formula id="ueqn-4">
<!--<alternatives>
<graphic mimetype="image" mime-subtype="png" xlink:href="ueqn-4.png"/><tex-math id="tex-ueqn-4"><![CDATA[$${\rm TF-IDF\ = \ tf}({{\rm t,\;d}} )\;{\rm \ast }\;{\rm log}({{\rm S/}({{\rm DF\ + \ 1}} )} )$$]]></tex-math>--><mml:math id="mml-ueqn-4" display="block"><mml:mrow><mml:mtext>TF &#x2013; IDF&#x2009;=&#x2009;tf</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mtext>t</mml:mtext><mml:mo>,</mml:mo><mml:mtext>&#x205F;d</mml:mtext><mml:mo stretchy='false'>)</mml:mo><mml:mtext>&#x205F;</mml:mtext><mml:mo>&#x2217;</mml:mo><mml:mtext>&#x205F;log</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mtext>S/</mml:mtext><mml:mo stretchy='false'>(</mml:mo><mml:mtext>DF&#x2009;+&#x2009;1</mml:mtext><mml:mo stretchy='false'>)</mml:mo><mml:mo stretchy='false'>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></disp-formula></p>
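<p>The formulas of this section can be checked numerically with a short sketch. It assumes that each document is already a list of pre-processed tokens and that S is the number of documents; the four toy documents are invented for illustration and are not drawn from the study&#x0027;s dataset.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math


def tf(term, doc):
    """Occurrences of `term` in `doc` divided by the number of words in `doc`."""
    return doc.count(term) / len(doc)


def df(term, docs):
    """Number of documents that contain `term`."""
    return sum(1 for doc in docs if term in doc)


def tf_idf(term, doc, docs):
    """tf(t, d) * log(S / (DF + 1)), where S is the number of documents."""
    return tf(term, doc) * math.log(len(docs) / (df(term, docs) + 1))


# Toy corpus (invented); each document is a list of tokens
docs = [
    ["exam", "party", "house"],
    ["exam", "break", "party"],
    ["kill", "hate", "hate"],
    ["message", "later"],
]
print(tf_idf("hate", docs[2], docs))  # frequent in one document, rare overall
print(tf_idf("exam", docs[0], docs))  # appears in half of the documents
```
]]></preformat>
<p>With this form of smoothing, a term that occurs in every document receives a slightly negative weight, because S / (DF + 1) is then smaller than 1. Library implementations often use a different smoothing variant for this reason.</p>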
</sec>
<sec id="s3_3_2">
<label>3.3.2</label>
<title>BOW (Bag of Words)</title>
<p>Bag of words (BOW) converts text into fixed-length vectors by counting the occurrences of terms in the text document. It is one way of extracting features from text for machine learning and deep learning algorithms.</p>
<p>For example, consider two documents;<list list-type="simple"><list-item>
<p>D1: After the exam, let&#x0027;s have a party at my house</p></list-item><list-item>
<p>D2: Today&#x0027;s exam is challenging. Let&#x0027;s have some break and go to a party</p></list-item></list></p>
<p>After eliminating the stop words, the matrix can be formed using unique words from all the documents as given below in <xref ref-type="table" rid="table-5">Tab. 5</xref>.</p>
<table-wrap id="table-5">
<label>Table 5</label>
<caption>
<title>BOW representation</title>
</caption>
<table>
<colgroup><col/><col/><col/><col/><col/><col/><col/><col/>
</colgroup>
<thead>
<tr><th></th><th>After</th><th>Exam</th><th>Party</th><th>Challenging</th><th>Break</th><th>House</th><th>Today&#x0027;s</th>
</tr>
</thead>
<tbody>
<tr>
<td>D1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>0</td>
<td>1</td>
<td>0</td>
</tr>
<tr>
<td>D2</td>
<td>0</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>1</td>
<td>0</td>
<td>1</td>
</tr>
</tbody>
</table>
</table-wrap>
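<p>The BOW construction for the two example documents can be sketched as follows. This is an illustrative sketch only: the stop-word list is a small hypothetical set chosen for this example, and the vocabulary is sorted alphabetically, so the column order differs from the table.</p>
<code language="python"><![CDATA[
```python
def bag_of_words(documents, stop_words):
    """Return (vocabulary, count vectors) for a list of raw text documents."""
    tokenised = []
    for text in documents:
        # Lower-case, split on whitespace and strip simple punctuation.
        words = [w.strip(".,!?").lower() for w in text.split()]
        tokenised.append([w for w in words if w and w not in stop_words])
    # One column per unique remaining word.
    vocabulary = sorted({w for doc in tokenised for w in doc})
    vectors = [[doc.count(term) for term in vocabulary] for doc in tokenised]
    return vocabulary, vectors


docs = [
    "After the exam, let's have a party at my house",
    "Today's exam is challenging. Let's have some break and go to a party",
]
# Hypothetical stop-word list, chosen only for this example.
stop_words = {"the", "let's", "have", "a", "at", "my", "is", "some", "and", "go", "to"}

vocab, vectors = bag_of_words(docs, stop_words)
```
]]></code>
<p>Each document becomes a vector whose length equals the vocabulary size, so documents of different lengths can be passed to the same classifier.</p>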
</sec>
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Considered Algorithms</title>
<sec id="s4_1">
<label>4.1</label>
<title>Machine Learning Algorithms</title>
<p>A comparative analysis was carried out on the cyberbullying dataset using four classifiers: Logistic Regression, Decision Tree, Random Forest and XGBoost. Because the dataset is imbalanced, the SMOTE resampling technique (Synthetic Minority Oversampling Technique), outlined in <xref ref-type="table" rid="table-11">Algorithm 1</xref>, is used to balance it. SMOTE creates synthetic samples for the under-represented class and, because it interpolates new samples rather than duplicating existing ones, it also helps to reduce overfitting. SMOTE is applied to the original dataset for each of the four classifiers, and the evaluation metrics with and without SMOTE are compared to determine which gives the better output. The following algorithms are used for binary classification:<list list-type="simple"><list-item>
<p>Logistic Regression</p></list-item><list-item>
<p>Decision Tree</p></list-item><list-item>
<p>Random Forest</p></list-item><list-item>
<p>XGBoost</p></list-item></list></p>
<table-wrap id="table-11">
<label>Algorithm 1:</label>
<caption>
<title>SMOTE Algorithm</title>
</caption>
<table>
<colgroup>
<col/>
</colgroup>
<tbody>
<tr>
<td>&#x0023;SMOTE Algorithm</td>
</tr>
<tr>
<td>Input: Minority data M, Majority data N, Number of nearest neighbors n</td>
</tr>
<tr>
<td>Output: M, extended with synthetic samples until it matches N in size</td>
</tr>
<tr>
<td>&#x2003;&#x2003;1: For each sample s in M do</td>
</tr>
<tr>
<td>&#x2003;&#x2003;2: Compute the n nearest neighbors of s in M</td>
</tr>
<tr>
<td>&#x2003;&#x2003;3: End for</td>
</tr>
<tr>
<td>&#x2003;&#x2003;4: While |M| &#x003C; |N| do</td>
</tr>
<tr>
<td>&#x2003;&#x2003;5: Choose a sample s from M and one of its nearest neighbors k</td>
</tr>
<tr>
<td>&#x2003;&#x2003;6: Compute the vector (k &#x2212; s) and multiply it by a random number between 0 and 1</td>
</tr>
<tr>
<td>&#x2003;&#x2003;7: Synthetic Data&#x003D;s &#x002B; vector; add it to M</td>
</tr>
<tr>
<td>&#x2003;&#x2003;8: End while</td>
</tr>
<tr>
<td>&#x2003;&#x2003;9: Return M</td>
</tr>
</tbody>
</table>
</table-wrap>
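<p>The oversampling idea can be illustrated with a short, self-contained Python sketch. It follows the standard SMOTE interpolation between a minority sample and one of its nearest minority neighbors, and it is not the code used in this work. The function name, its parameters, the fixed random seed and the toy data are all assumptions made for the example.</p>
<code language="python"><![CDATA[
```python
import math
import random


def smote(minority, majority_size, n_neighbors=5, seed=0):
    """Oversample `minority` (a list of feature vectors) up to `majority_size` rows."""
    rng = random.Random(seed)
    k = min(n_neighbors, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")

    # Pre-compute the k nearest minority neighbours of every original sample.
    neighbours = []
    for i, s in enumerate(minority):
        others = [j for j in range(len(minority)) if j != i]
        others.sort(key=lambda j: math.dist(s, minority[j]))
        neighbours.append(others[:k])

    result = [list(s) for s in minority]
    while len(result) < majority_size:
        i = rng.randrange(len(minority))   # pick an original minority sample
        j = rng.choice(neighbours[i])      # pick one of its nearest neighbours
        gap = rng.random()                 # random number in [0, 1)
        s, nb = minority[i], minority[j]
        # Synthetic point on the segment between the sample and its neighbour.
        result.append([a + gap * (b - a) for a, b in zip(s, nb)])
    return result


# Hypothetical toy data: three minority points, ten majority points.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
balanced = smote(minority, majority_size=10, n_neighbors=2)
```
]]></code>
<p>Because every synthetic point lies on a segment between two existing minority samples, the new data stay inside the region already occupied by the minority class instead of repeating identical rows.</p>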
</sec>
<sec id="s4_2">
<label>4.2</label>
<title>Deep Learning Algorithms</title>
<p>Three deep learning models, Dense, LSTM and BiLSTM, are used to analyze the cyberbullying dataset. In the dense architecture, the first layer is an embedding layer of size 16 with an input length of 50. A pooling layer follows to reduce overfitting. The dense layer uses the &#x201C;relu&#x201D; activation function and is followed by a dropout layer, again to reduce overfitting, and the final output layer uses a sigmoid function.</p>
<p>Long Short-Term Memory (LSTM) is a variant of the Recurrent Neural Network (RNN). Its structure is designed to overcome the long-term dependency and vanishing gradient problems that cause a network to stop learning. In a plain RNN, the repeating module has a simple structure with a single tanh layer, whereas in LSTM the repeating module contains several interacting layers, as shown in <xref ref-type="fig" rid="fig-6">Fig. 6</xref>. In this work, the input series is x&#x003D;[x1, x2, x3,&#x2026;xn], where each x is a word vector, and the hidden vector at time step t is denoted hd<sub>t</sub>. We used the LSTM model to learn the cyberbullying classification. The model first learns the hidden vector from the input series and then generates the target output based on historical data. The key component of LSTM is the cell state cd<sub>t</sub>. The model decides whether information is removed from or added to this memory by using sigmoid gates, which take three forms: the input gate (iP<sub>t</sub>), the forget gate (frt<sub>t</sub>) and the output gate (oP<sub>t</sub>).</p>
<p>The sigmoid of the forget gate is executed first. Through it, the LSTM cell decides how much of the past cell state c<sub>t-1</sub> is significant and should be kept; it then chooses which new data are saved in the cell state cd<sub>t</sub>. This second step has two parts: the input gate iP<sub>t</sub> decides which information is to be updated, and the tanh layer creates a vector of candidate values c&#x0060;t that can be added to the state. The next stage updates the old state c<sub>t-1</sub> to the new cell state: the old state is multiplied by frt<sub>t</sub>, which removes the data chosen to be forgotten, and iP<sub>t</sub>&#x002A;c&#x0060;t is added to the result.</p>
<p>This generates the new cell state. The final stage computes the output of the LSTM cell using a sigmoid layer and a tanh layer. The output is based on the information held in the cell state, filtered by a sigmoid layer that decides which parts of the cell state affect the final result. The cell state is passed through tanh and multiplied by the output of the sigmoid layer, as described in <xref ref-type="disp-formula" rid="eqn-1">Eq. (1)</xref>:</p>
<p><disp-formula id="eqn-1">
<label>(1)</label>
<!--<alternatives>
<graphic mimetype="image" mime-subtype="png" xlink:href="eqn-1.png"/><tex-math id="tex-eqn-1"><![CDATA[\eqalign{ frt_t = \sigma (w_{fk}[h_{t-1},\;x_t] + b_f) \cr ip_t = \sigma (w_{ik}[h_{t-1},\;x_t] + b_i) \cr c^{\backprime}_t = \tanh (w_{ck}[h_{t-1},\;x_t] + b_c) \cr cd_t = frt_t\ast c_{t-1} + ip_t\ast c^{\backprime}_t \cr op_t = \sigma (w_{ok}[h_{t-1},\;x_t] + b_o) \cr hd_t = op_t\ast \tanh (cd_t)} ]]></tex-math>--><mml:math id="mml-eqn-1" display="block"><mml:mtable columnalign='left'>
<mml:mtr><mml:mtd><mml:mi>f</mml:mi><mml:mi>r</mml:mi><mml:msub><mml:mi>t</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi>&#x03C3;</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>f</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>[</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x205F;</mml:mtext><mml:msub><mml:mi>x</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo stretchy='false'>]</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mi>f</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mtd></mml:mtr>
<mml:mtr><mml:mtd><mml:mi>i</mml:mi><mml:msub><mml:mi>p</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi>&#x03C3;</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>[</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x205F;</mml:mtext><mml:msub><mml:mi>x</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo stretchy='false'>]</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mtd></mml:mtr>
<mml:mtr><mml:mtd><mml:msubsup><mml:mi>c</mml:mi><mml:mi>t</mml:mi><mml:mo>&#x2035;</mml:mo></mml:msubsup><mml:mo>=</mml:mo><mml:mi>tanh</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>c</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>[</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x205F;</mml:mtext><mml:msub><mml:mi>x</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo stretchy='false'>]</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mi>c</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mtd></mml:mtr>
<mml:mtr><mml:mtd><mml:mi>c</mml:mi><mml:msub><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi>f</mml:mi><mml:mi>r</mml:mi><mml:msub><mml:mi>t</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>&#x2217;</mml:mo><mml:msub><mml:mi>c</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>i</mml:mi><mml:msub><mml:mi>p</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>&#x2217;</mml:mo><mml:msubsup><mml:mi>c</mml:mi><mml:mi>t</mml:mi><mml:mo>&#x2035;</mml:mo></mml:msubsup></mml:mtd></mml:mtr>
<mml:mtr><mml:mtd><mml:mi>o</mml:mi><mml:msub><mml:mi>p</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi>&#x03C3;</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>o</mml:mi><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo stretchy='false'>[</mml:mo><mml:msub><mml:mi>h</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mtext>&#x205F;</mml:mtext><mml:msub><mml:mi>x</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo stretchy='false'>]</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mi>o</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mtd></mml:mtr>
<mml:mtr><mml:mtd><mml:mi>h</mml:mi><mml:msub><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi>o</mml:mi><mml:msub><mml:mi>p</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>&#x2217;</mml:mo><mml:mi>tanh</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>c</mml:mi><mml:msub><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo stretchy='false'>)</mml:mo></mml:mtd></mml:mtr>
</mml:mtable></mml:math>
<!--</alternatives>--></disp-formula></p>
<p>where <italic>&#x03C3;</italic> is the sigmoid activation function, whose output ranges from 0 to 1, so that information can be removed completely, kept partially or kept completely; c&#x0060;t is the candidate state, computed from the present input and the past hidden state; the input gate iP<sub>t</sub> determines how much of the newly computed candidate state is admitted for the present input; h<sub>t-1</sub> is the hidden state from the previous time step; w and b are the weights and biases; c<sub>t-1</sub> is the previous internal memory cell state; and hd<sub>t</sub> is the output state.</p>
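<p>A single LSTM time step in the sense of Eq. (1) can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the parameter dictionary and its key names are hypothetical, the weights are stored as lists of rows, and the all-zero parameters in the usage lines are chosen so that the result is easy to check by hand. It is not the network trained in this work.</p>
<code language="python"><![CDATA[
```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(weights, bias, vector):
    """Return W.v + b for a weight matrix given as a list of rows."""
    return [sum(w * v for w, v in zip(row, vector)) + b
            for row, b in zip(weights, bias)]


def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; returns the new hidden state and cell state."""
    concat = h_prev + x_t  # the concatenation [h_{t-1}, x_t]
    frt = [sigmoid(z) for z in affine(params["w_f"], params["b_f"], concat)]   # forget gate
    ip = [sigmoid(z) for z in affine(params["w_i"], params["b_i"], concat)]    # input gate
    cand = [math.tanh(z) for z in affine(params["w_c"], params["b_c"], concat)]  # candidate
    op = [sigmoid(z) for z in affine(params["w_o"], params["b_o"], concat)]    # output gate
    # New cell state: keep part of the old state, add part of the candidate.
    cd = [f * c + i * g for f, c, i, g in zip(frt, c_prev, ip, cand)]
    # New hidden state: squashed cell state filtered by the output gate.
    hd = [o * math.tanh(c) for o, c in zip(op, cd)]
    return hd, cd


# Hypothetical parameters: hidden size 1, input size 1, everything set to zero.
zero = {name: [[0.0, 0.0]] for name in ("w_f", "w_i", "w_c", "w_o")}
zero.update({name: [0.0] for name in ("b_f", "b_i", "b_c", "b_o")})

hd_t, cd_t = lstm_step([1.0], [0.0], [1.0], zero)
```
]]></code>
<p>With all parameters at zero every gate outputs 0.5 and the candidate is 0, so the cell state is halved and the hidden state is half of tanh of the new cell state, which makes the role of each gate visible.</p>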
<p>Unlike LSTM, BiLSTM is bidirectional: the input sequence is processed in both the forward and backward directions, as represented in <xref ref-type="fig" rid="fig-7">Fig. 7</xref>. It learns the patterns that come before and after each word with two independent LSTMs, and merges the information gathered from the two directions of a sentence. More precisely, at each time step t, the forward LSTM computes the hidden state fh<sub>t</sub> from the past state fh<sub>t-1</sub> and the vector x<sub>t</sub>. Meanwhile, the backward LSTM computes xh<sub>t</sub> from the following state xh<sub>t&#x002B;1</sub> and the same vector x<sub>t</sub>. The two results are then concatenated to form the final hidden state. As a consequence, the computation time is higher than for LSTM. The final result of the BiLSTM model is given in <xref ref-type="disp-formula" rid="eqn-2">(2)</xref>,<disp-formula id="eqn-2">
<label>(2)</label>
<!--<alternatives>
<graphic mimetype="image" mime-subtype="png" xlink:href="eqn-2.png"/><tex-math id="tex-eqn-2"><![CDATA[$$h_{t} = [fh_{t}, xh_{t}]$$]]></tex-math>--><mml:math id="mml-eqn-2" display="block"><mml:mrow><mml:msub><mml:mi>h</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mo stretchy='false'>[</mml:mo><mml:mi>f</mml:mi><mml:msub><mml:mi>h</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mi>x</mml:mi><mml:msub><mml:mi>h</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo stretchy='false'>]</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></disp-formula></p>
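<p>The concatenation in Eq. (2) can be illustrated with a small Python sketch. The helper below runs any recurrent step function over a sequence in both directions and joins the two states at each time step. The function names are hypothetical, and the running-sum step is only a stand-in for a real LSTM cell.</p>
<code language="python"><![CDATA[
```python
def bidirectional(sequence, step, initial_state):
    """Run `step` forward and backward over `sequence`; concatenate per time step."""
    forward, state = [], initial_state
    for x_t in sequence:
        state = step(x_t, state)
        forward.append(state)

    backward, state = [], initial_state
    for x_t in reversed(sequence):
        state = step(x_t, state)
        backward.append(state)
    backward.reverse()  # align the backward states with the time steps

    # h_t = [fh_t, xh_t]
    return [f + b for f, b in zip(forward, backward)]


def toy_step(x_t, h_prev):
    """Stand-in for an LSTM cell: a running sum of the inputs seen so far."""
    return [h_prev[0] + x_t]


states = bidirectional([1, 2, 3], toy_step, [0])
# states == [[1, 6], [3, 5], [6, 3]]
```
]]></code>
<p>At the first time step the forward state has seen only the first input, while the backward state has already seen the whole sequence, which is why the combined state carries context from both sides.</p>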
</sec>
</sec>
<sec id="s5">
<label>5</label>
<title>Analysis</title>
<sec id="s5.1">
<label>5.1</label>
<title>Performance Measure</title>
<p>The standard measures used to evaluate system performance are Precision, Recall, Accuracy and F1-score. A confusion matrix, used primarily for classification problems, summarises how correct the model&#x0027;s predictions are. Four terms are associated with the confusion matrix, as listed below and in <xref ref-type="table" rid="table-10">Tab. 10</xref>.<list list-type="simple"><list-item>
<p>TPc (True Positive of cyberbullies): both the actual class and the predicted class of the sample are true (1).</p></list-item><list-item>
<p>TNc (True Negative of cyberbullies): both the actual class and the predicted class of the sample are false (0).</p></list-item><list-item>
<p>FPc (False Positive of cyberbullies): the actual class of the sample is false (0) and the predicted class is true (1).</p></list-item><list-item>
<p>FNc (False Negative of cyberbullies): the actual class of the sample is true (1) and the predicted class is false (0).</p></list-item></list></p>
<p>Accuracy is the ratio of correct predictions to all predictions made by the classifier, as given in <xref ref-type="disp-formula" rid="eqn-3">(3)</xref>. Precision is the ratio of correctly predicted positive samples to all samples predicted as positive, as given in <xref ref-type="disp-formula" rid="eqn-4">(4)</xref>. Recall is the ratio of correctly predicted positive samples to all samples that are actually positive, and is calculated by <xref ref-type="disp-formula" rid="eqn-5">(5)</xref>. The F1-score is the harmonic mean of precision and recall, as given in <xref ref-type="disp-formula" rid="eqn-6">(6)</xref>.<disp-formula id="eqn-3">
<label>(3)</label>
<!--<alternatives>
<graphic mimetype="image" mime-subtype="png" xlink:href="eqn-3.png"/><tex-math id="tex-eqn-3"><![CDATA[$$Accuracy(A) = \left\langle {\displaystyle{{TP_c + TN_c} \over {TP_c + FP_c + TN_c + FN_c}}} \right\rangle,\;k\in S_{cb}$$]]></tex-math>--><mml:math id="mml-eqn-3" display="block"><mml:mrow><mml:mi>A</mml:mi><mml:mi>c</mml:mi><mml:mi>c</mml:mi><mml:mi>u</mml:mi><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>c</mml:mi><mml:mi>y</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>A</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mrow><mml:mo>&#x2329;</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mi>c</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mi>T</mml:mi><mml:msub><mml:mi>N</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mi>c</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mi>c</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mi>T</mml:mi><mml:msub><mml:mi>N</mml:mi><mml:mi>c</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:msub><mml:mi>N</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mrow><mml:mo>&#x0232A;</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mtext>&#x205F;</mml:mtext><mml:mi>k</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>c</mml:mi><mml:mi>b</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math>
<!--</alternatives>--></disp-formula><disp-formula id="eqn-4">
<label>(4)</label>
<!--<alternatives>
<graphic mimetype="image" mime-subtype="png" xlink:href="eqn-4.png"/><tex-math id="tex-eqn-4"><![CDATA[$$\Pr ecision(P) = \left\langle {\displaystyle{{TP_c} \over {TP_c + FP_c}}} \right\rangle,\;k\in S_{cb}$$]]></tex-math>--><mml:math id="mml-eqn-4" display="block"><mml:mrow><mml:mi>Pr</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>P</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mrow><mml:mo>&#x2329;</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mi>c</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mrow><mml:mo>&#x0232A;</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mtext>&#x205F;</mml:mtext><mml:mi>k</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>c</mml:mi><mml:mi>b</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math>
<!--</alternatives>--></disp-formula><disp-formula id="eqn-5">
<label>(5)</label>
<!--<alternatives>
<graphic mimetype="image" mime-subtype="png" xlink:href="eqn-5.png"/><tex-math id="tex-eqn-5"><![CDATA[$${\mathop{\rm Re}\nolimits} call(R) = \left\langle {\displaystyle{{TP_c} \over {TP_c + FN_c}}} \right\rangle,\;k\in S_{cb}$$]]></tex-math>--><mml:math id="mml-eqn-5" display="block"><mml:mrow><mml:mtext>Re</mml:mtext><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>l</mml:mi><mml:mo stretchy='false'>(</mml:mo><mml:mi>R</mml:mi><mml:mo stretchy='false'>)</mml:mo><mml:mo>=</mml:mo><mml:mrow><mml:mo>&#x2329;</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mi>c</mml:mi></mml:msub><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:msub><mml:mi>N</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:mfrac></mml:mrow><mml:mo>&#x0232A;</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mtext>&#x205F;</mml:mtext><mml:mi>k</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>c</mml:mi><mml:mi>b</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math>
<!--</alternatives>--></disp-formula><disp-formula id="eqn-6">
<label>(6)</label>
<!--<alternatives>
<graphic mimetype="image" mime-subtype="png" xlink:href="eqn-6.png"/><tex-math id="tex-eqn-6"><![CDATA[$${\rm F1\hbox{-}Score} = \left\langle {\displaystyle{{2\ast PR} \over {P + R}}} \right\rangle,\;k\in S_{cb}$$]]></tex-math>--><mml:math id="mml-eqn-6" display="block"><mml:mrow><mml:mtext>F1-Score</mml:mtext><mml:mo>=</mml:mo><mml:mrow><mml:mo>&#x2329;</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mn>2</mml:mn><mml:mo>&#x2217;</mml:mo><mml:mi>P</mml:mi><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>R</mml:mi></mml:mrow></mml:mfrac></mml:mrow><mml:mo>&#x0232A;</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mtext>&#x205F;</mml:mtext><mml:mi>k</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mi>c</mml:mi><mml:mi>b</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math>
<!--</alternatives>--></disp-formula><xref ref-type="disp-formula" rid="eqn-3">Eqs. (3)</xref>&#x2013;<xref ref-type="disp-formula" rid="eqn-6">(6)</xref> describe how the performance measures are computed for every class k in the dataset, where k belongs to S<sub>cb</sub>, the set of suspicious (cyberbullying) classes.</p>
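<p>The four measures can be computed directly from the confusion-matrix counts, as in the following minimal Python sketch. The function names are hypothetical, and the counts in the usage lines are made-up values chosen so that precision and recall equal 0.75 and 0.125; they are not the counts obtained in this work.</p>
<code language="python"><![CDATA[
```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (0.0 when both are zero)."""
    denom = precision + recall
    return 2 * precision * recall / denom if denom else 0.0


def classification_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall and F1-score from confusion-matrix counts."""
    total = tp + fp + tn + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1_score(precision, recall),
    }


# Made-up counts, for illustration only.
metrics = classification_metrics(tp=21, fp=7, tn=825, fn=147)
```
]]></code>
<p>A precision of 0.75 combined with a recall of 0.125 gives an F1-score of about 0.214, which shows how strongly the harmonic mean penalises a low recall even when accuracy is high.</p>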
</sec>
</sec>
<sec id="s6">
<label>6</label>
<title>Result and Discussion</title>
<p>This work comprises three main stages, and the performance of the ML and DL algorithms is evaluated as explained in this section. The first stage is data collection from smartphones using a mobile forensics toolkit; the acquisition results vary from one forensics toolkit to another. We generated the data with Oxygen Forensics software by performing a logical acquisition. The acquired data were analyzed using NLP, and the SMOTE technique was used to handle the class imbalance. Integrating forensics with machine learning and deep learning helps to analyze user patterns and produces better results. The second stage is text processing, which prepares the data for better performance. Four ML classifiers (Logistic Regression, Decision Tree, Random Forest and XGBoost) and three DL models (Dense, LSTM and BiLSTM) are then applied. In the third stage, performance is measured in terms of Accuracy, Precision, Recall, F1-score and AUC-ROC. Since the data is imbalanced, the SMOTE technique is also applied and its results are compared with those on the actual dataset.</p>
<p>Based on the performance measures in <xref ref-type="table" rid="table-6">Tab. 6</xref> and <xref ref-type="fig" rid="fig-4">Fig. 4</xref>, Random Forest achieves the highest accuracy on the actual data at 87&#x0025;, followed by XGBoost, Decision Tree and Logistic Regression with 85&#x0025;, 85&#x0025; and 84&#x0025;, respectively. With SMOTE, XGBoost reaches the highest accuracy at 82&#x0025;, followed by Random Forest, Logistic Regression and Decision Tree with 80&#x0025;, 79&#x0025; and 71&#x0025;, respectively. In terms of accuracy, the algorithms therefore perform better without SMOTE. In terms of recall, however, SMOTE performs better for every classifier: in Tab. 6, recall with SMOTE is 55&#x0025; for XGBoost, 61&#x0025; for Random Forest, 64&#x0025; for Decision Tree and 68&#x0025; for Logistic Regression. The recall comparison is represented in <xref ref-type="fig" rid="fig-5">Fig. 5</xref>.</p>
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>Comparison of Actual and SMOTE classification</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="CSSE_19483-fig-4.png"/>
</fig>
<fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>Recall measures between SMOTE and without SMOTE</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="CSSE_19483-fig-5.png"/>
</fig>
<fig id="fig-6">
<label>Figure 6</label>
<caption>
<title>Structure of LSTM</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="CSSE_19483-fig-6.png"/>
</fig>

<table-wrap id="table-6">
<label>Table 6</label>
<caption>
<title>Performance measure of proposed work</title>
</caption>
<table>
<colgroup><col/><col/><col/><col/><col/><col/><col/><col/>
</colgroup>
<thead>
<tr><th/><th>Algorithm</th><th>Resampling</th><th>Accuracy</th><th>Precision</th><th>Recall</th><th>F1-score</th><th>AUC-ROC</th>
</tr>
</thead>
<tbody>
<tr>
<td>0</td>
<td>Logistic Regression</td>
<td>actual</td>
<td>0.84</td>
<td>0.750000</td>
<td>0.125000</td>
<td>0.214286</td>
<td>0.746717</td>
</tr>
<tr>
<td>1</td>
<td>Logistic Regression</td>
<td>smote</td>
<td>0.79</td>
<td>0.448819</td>
<td>0.678571</td>
<td>0.540284</td>
<td>0.753835</td>
</tr>
<tr>
<td>2</td>
<td>Decision Tree</td>
<td>actual</td>
<td>0.85</td>
<td>0.577181</td>
<td>0.511905</td>
<td>0.542587</td>
<td>0.773188</td>
</tr>
<tr>
<td>3</td>
<td>Decision Tree</td>
<td>smote</td>
<td>0.71</td>
<td>0.336449</td>
<td>0.642857</td>
<td>0.441718</td>
<td>0.725370</td>
</tr>
<tr>
<td>4</td>
<td>Random Forest</td>
<td>actual</td>
<td>0.87</td>
<td>0.738636</td>
<td>0.386905</td>
<td>0.507812</td>
<td>0.816665</td>
</tr>
<tr>
<td>5</td>
<td>Random Forest</td>
<td>smote</td>
<td>0.80</td>
<td>0.447368</td>
<td>0.607143</td>
<td>0.515152</td>
<td>0.782792</td>
</tr>
<tr>
<td>6</td>
<td>XGBoost</td>
<td>actual</td>
<td>0.85</td>
<td>0.630631</td>
<td>0.416667</td>
<td>0.501792</td>
<td>0.805090</td>
</tr>
<tr>
<td>7</td>
<td>XGBoost</td>
<td>smote</td>
<td>0.82</td>
<td>0.491979</td>
<td>0.547619</td>
<td>0.518310</td>
<td>0.776617</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The final segment of the work deals with deep learning analysis for identifying and classifying cyberbullying data. The data is split into 70&#x0025; for training and 30&#x0025; for testing. Of the three models evaluated, LSTM performed best, with validation accuracy reaching 92&#x0025;, whereas BiLSTM recorded validation accuracies of 68&#x0025; and 89&#x0025;, as presented in <xref ref-type="table" rid="table-8">Tabs. 8</xref> and <xref ref-type="table" rid="table-9">9</xref>; <xref ref-type="fig" rid="fig-9">Figs. 9</xref> and <xref ref-type="fig" rid="fig-10">10</xref>. The Dense model reached about 84&#x0025; validation accuracy, as tabulated in <xref ref-type="table" rid="table-7">Tab. 7</xref> and represented in <xref ref-type="fig" rid="fig-8">Fig. 8</xref>.</p>
<fig id="fig-7">
<label>Figure 7</label>
<caption>
<title>Structure of BiLSTM</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="CSSE_19483-fig-7.png"/>
</fig>
<fig id="fig-8">
<label>Figure 8</label>
<caption>
<title>Performance of Dense classifier</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="CSSE_19483-fig-8.png"/>
</fig>
<fig id="fig-9">
<label>Figure 9</label>
<caption>
<title>Performance of LSTM</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="CSSE_19483-fig-9.png"/>
</fig>
<fig id="fig-10">
<label>Figure 10</label>
<caption>
<title>Performance of BiLSTM</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="CSSE_19483-fig-10.png"/>
</fig>

<table-wrap id="table-7">
<label>Table 7</label>
<caption>
<title>Performance measure for Dense Network</title>
</caption>
<table>
<colgroup><col/><col/><col/><col/>
</colgroup>
<thead>
<tr><th>Training_Loss</th><th>Training_Accuracy</th><th>Validation_Loss</th><th>Validation_Accuracy</th>
</tr>
</thead>
<tbody>
<tr>
<td>0.686263</td>
<td>0.702092</td>
<td>0.676125</td>
<td>0.842809</td>
</tr>
<tr>
<td>0.657547</td>
<td>0.859414</td>
<td>0.631415</td>
<td>0.849498</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="table-8">
<label>Table 8</label>
<caption>
<title>Performance Measure of LSTM</title>
</caption>
<table>
<colgroup><col/><col/><col/><col/>
</colgroup>
<thead>
<tr><th>Training_Loss</th><th>Training_Accuracy</th><th>Validation_Loss</th><th>Validation_Accuracy</th>
</tr>
</thead>
<tbody>
<tr>
<td>0.139294</td>
<td>0.952486</td>
<td>0.258179</td>
<td>0.925485</td>
</tr>
<tr>
<td>0.136586</td>
<td>0.953992</td>
<td>0.293780</td>
<td>0.921271</td>
</tr>
</tbody>
</table>
</table-wrap>

<table-wrap id="table-9">
<label>Table 9</label>
<caption>
<title>Performance Measure of BiLSTM</title>
</caption>
<table>
<colgroup><col/><col/><col/><col/>
</colgroup>
<thead>
<tr><th>Training_Loss</th><th>Training_Accuracy</th><th>Validation_Loss</th><th>Validation_Accuracy</th>
</tr>
</thead>
<tbody>
<tr>
<td>0.672027</td>
<td>0.605992</td>
<td>0.605110</td>
<td>0.680803</td>
</tr>
<tr>
<td>0.412528</td>
<td>0.859732</td>
<td>0.312381</td>
<td>0.892642</td>
</tr>
</tbody>
</table>
</table-wrap>

<table-wrap id="table-10">
<label>Table 10</label>
<caption>
<title>Confusion matrix for performance metrics</title>
</caption>
<table>
<colgroup><col/><col/><col/>
</colgroup>
<thead>
<tr><th rowspan="2">Predicted Value</th><th colspan="2">Actual Value</th>
</tr>
<tr><th>1</th><th>0</th>
</tr>
</thead>
<tbody>
<tr>
<td>Positive (1)</td>
<td>True Positive (TPc)</td>
<td>False Positive (FPc)</td>
</tr>
<tr>
<td>Negative (0)</td>
<td>False Negative (FNc)</td>
<td>True Negative (TNc)</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The experimental results show that applying ML and DL models to forensics data can give better results when investigating cyberbullying cases. In such cases, a physical crime scene and physical evidence are usually absent, so computers, laptops, mobile devices and tablets are the essential sources of evidence. As at a physical crime scene, the offender leaves virtual evidence that can be inferred and analyzed using digital forensics methodology. In many cases the offender is skilled enough to hide these traces; behavioral patterns can then help to distinguish the offender&#x0027;s content from other content.</p>
</sec>
<sec id="s7">
<label>7</label>
<title>Conclusion and Future Work</title>
<p>This study examined the behavioral patterns of cyberbullies in the context of a digital forensics investigation. Combining text analysis and machine learning with forensics data is a new technique for cybercrime investigation and incident response teams. This integrated approach helps to identify the behavioral patterns of victim and offender and can help to solve criminal cases. Internet and social media usage often leads to involvement in cyberbullying and cyberstalking. In this paper, we developed a framework for detecting and identifying the patterns of cyberbullies. A forensics technique was used to retrieve text messages from mobile phones, and the messages were pre-processed using NLP techniques. Four ML models were developed and compared with and without the SMOTE technique. Accuracy reached 87&#x0025; with Random Forest on the actual data, whereas with SMOTE, XGBoost reached the highest accuracy and recall improved for all four classifiers. Three deep learning models were also evaluated, of which LSTM reached the highest validation accuracy compared with Dense and BiLSTM. Future work aims to develop a new mechanism for automatic detection of harassment, threats, hate and stalking content from offenders. With the help of ML and DL models, the investigation team can obtain an accurate pattern of the victim and offender.</p>
</sec>
</body>
<back><fn-group>
<fn fn-type="other">
<p><bold>Funding Statement:</bold> The authors received no specific funding for this study.</p>
</fn>
<fn fn-type="conflict">
<p><bold>Conflicts of Interest:</bold> The authors declare that they have no conflicts of interest to report regarding the present study.</p>
</fn>
</fn-group>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1">
<label>[1]</label><mixed-citation publication-type="other">
<person-group person-group-type="author"><string-name>
<given-names>M.</given-names> <surname>Cepin</surname></string-name>, <string-name>
<given-names>U.</given-names> <surname>Slana</surname></string-name>, <string-name>
<given-names>B.</given-names> <surname>Roggenbuck</surname></string-name>, <string-name>
<given-names>V.</given-names> <surname>Edert</surname></string-name>, <string-name>
<given-names>M.</given-names> <surname>Kaps</surname></string-name>, <string-name>
<given-names>G.</given-names> <surname>Trevisan</surname></string-name> <etal>et al.</etal></person-group> <article-title>How is Cyberbullying different from Traditional bullying?</article-title>. <year>2016</year>. [Online]. Available: <uri xlink:href="http://socialna-akademija.si/joiningforces/3-2-how-is-cyberullying-different-from-traditional-bullying/">http://socialna-akademija.si/joiningforces/3-2-how-is-cyberullying-different-from-traditional-bullying/</uri>.</mixed-citation>
</ref>
<ref id="ref-2">
<label>[2]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>G. M.</given-names> 
<surname>Jones</surname></string-name> and <string-name>
<given-names>S. G.</given-names> 
<surname>Winster</surname></string-name>
</person-group>, &#x201C;
<article-title>Forensics analysis on smart phones using mobile forensics tools</article-title>,&#x201D; 
<source>International Journal of Computational Intelligence Research</source>, vol. 
<volume>13</volume>, no. 
<issue>8</issue>, pp. 
<fpage>1859</fpage>&#x2013;
<lpage>1869</lpage>, 
<year>2017</year>.</mixed-citation>
</ref>
<ref id="ref-3">
<label>[3]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>D.</given-names> 
<surname>Quick</surname></string-name> and <string-name>
<given-names>K. R.</given-names> 
<surname>Choo</surname></string-name>
</person-group>, &#x201C;
<article-title>Pervasive social networking forensics: Intelligence and evidence from mobile device extracts</article-title>,&#x201D; 
<source>Journal of Network and Computer Applications</source>, vol. 
<volume>86</volume>, pp. 
<fpage>24</fpage>&#x2013;
<lpage>33</lpage>, 
<year>2017</year>.</mixed-citation>
</ref>
<ref id="ref-4">
<label>[4]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>G. M.</given-names> 
<surname>Jones</surname></string-name> and <string-name>
<given-names>S. G.</given-names> 
<surname>Winster</surname></string-name>
</person-group>, &#x201C;
<article-title>Analysis of crime report by data analytics using Python</article-title>,&#x201D; in 
<source>Challenges and Applications of Data Analytics in Social Perspectives</source>, <publisher-name>IGI Global</publisher-name>, <comment>Hershey, PA 17033, USA</comment>, pp. 
<fpage>54</fpage>&#x2013;
<lpage>79</lpage>, 
<year>2020</year>.</mixed-citation>
</ref>
<ref id="ref-5">
<label>[5]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>N.</given-names> 
<surname>Al Mutawa</surname></string-name>, <string-name>
<given-names>J.</given-names> 
<surname>Bryce</surname></string-name>, <string-name>
<given-names>V. N. L.</given-names> 
<surname>Franqueira</surname></string-name> and <string-name>
<given-names>A.</given-names> 
<surname>Marrington</surname></string-name>
</person-group>, &#x201C;
<article-title>Forensic investigation of cyberstalking cases using behavioural evidence analysis</article-title>,&#x201D; 
<source>Digital Investigation</source>, vol. 
<volume>16</volume>, pp. 
<fpage>S96</fpage>&#x2013;
<lpage>S103</lpage>, 
<year>2016</year>.</mixed-citation>
</ref>
<ref id="ref-6">
<label>[6]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>D.</given-names> 
<surname>Van Bruwaene</surname></string-name>, <string-name>
<given-names>Q.</given-names> 
<surname>Huang</surname></string-name> and <string-name>
<given-names>D.</given-names> 
<surname>Inkpen</surname></string-name>
</person-group>, &#x201C;
<article-title>A multi-platform dataset for detecting cyberbullying in social media</article-title>,&#x201D; 
<source>Language Resources and Evaluation</source>, vol. <volume>54</volume>, no. <issue>4</issue>, pp. 
<fpage>851</fpage>&#x2013;
<lpage>874</lpage>, 
<year>2020</year>.</mixed-citation>
</ref>
<ref id="ref-7">
<label>[7]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>I.</given-names> 
<surname>Frommholz</surname></string-name>, <string-name>
<given-names>H. M.</given-names> 
<surname>al-Khateeb</surname></string-name>, <string-name>
<given-names>M.</given-names> 
<surname>Potthast</surname></string-name>, <string-name>
<given-names>Z.</given-names> 
<surname>Ghasem</surname></string-name>, <string-name>
<given-names>M.</given-names> 
<surname>Shukla</surname></string-name> and <string-name>
<given-names>E.</given-names> 
<surname>Short</surname></string-name>
</person-group>, &#x201C;
<article-title>On textual analysis and machine learning for cyberstalking detection</article-title>,&#x201D; 
<source>Datenbank Spektrum</source>, vol. 
<volume>16</volume>, pp. 
<fpage>127</fpage>&#x2013;
<lpage>135</lpage>, 
<year>2016</year>.</mixed-citation>
</ref>
<ref id="ref-8">
<label>[8]</label><mixed-citation publication-type="conf-proc">
<person-group person-group-type="author"><string-name>
<given-names>M.</given-names> 
<surname>Spranger</surname></string-name> and <string-name>
<given-names>D.</given-names> 
<surname>Labudde</surname></string-name>
</person-group>, &#x201C;
<article-title>Semantic tools for forensics: Approaches in forensic text analysis</article-title>,&#x201D; in <conf-name>Proc. IMMM</conf-name>, pp. 
<fpage>97</fpage>&#x2013;
<lpage>100</lpage>, 
<year>2013</year>.</mixed-citation>
</ref>
<ref id="ref-9">
<label>[9]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>H.</given-names> 
<surname>Arshad</surname></string-name>, <string-name>
<given-names>A.</given-names> 
<surname>Jantan</surname></string-name>, <string-name>
<given-names>G. K.</given-names> 
<surname>Hoon</surname></string-name> and <string-name>
<given-names>A. S.</given-names> 
<surname>Butt</surname></string-name>
</person-group>, &#x201C;
<article-title>A multilayered semantic framework for integrated forensic acquisition on social media</article-title>,&#x201D; 
<source>Digital Investigation</source>, vol. 
<volume>29</volume>, pp. 
<fpage>147</fpage>&#x2013;
<lpage>158</lpage>, 
<year>2019</year>.</mixed-citation>
</ref>
<ref id="ref-10">
<label>[10]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>M.</given-names> 
<surname>Nicoletti</surname></string-name> and <string-name>
<given-names>M.</given-names> 
<surname>Bernaschi</surname></string-name>
</person-group>, &#x201C;
<article-title>Forensic analysis of Microsoft Skype for Business</article-title>,&#x201D; 
<source>Digital Investigation</source>, vol. 
<volume>29</volume>, pp. 
<fpage>159</fpage>&#x2013;
<lpage>179</lpage>, 
<year>2019</year>.</mixed-citation>
</ref>
<ref id="ref-11">
<label>[11]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>R. M. A.</given-names> 
<surname>Mohammad</surname></string-name> and <string-name>
<given-names>M.</given-names> 
<surname>Alqahtani</surname></string-name>
</person-group>, &#x201C;
<article-title>A comparison of machine learning techniques for file system forensics analysis</article-title>,&#x201D; 
<source>Journal of Information Security and Applications</source>, vol. 
<volume>46</volume>, pp. 
<fpage>53</fpage>&#x2013;
<lpage>61</lpage>, 
<year>2019</year>.</mixed-citation>
</ref>
<ref id="ref-12">
<label>[12]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>A.</given-names> 
<surname>Cohen</surname></string-name>, <string-name>
<given-names>N.</given-names> 
<surname>Nissim</surname></string-name>, <string-name>
<given-names>L.</given-names> 
<surname>Rokach</surname></string-name> and <string-name>
<given-names>Y.</given-names> 
<surname>Elovici</surname></string-name>
</person-group>, &#x201C;
<article-title>SFEM: Structural feature extraction methodology for the detection of malicious office documents using machine learning methods</article-title>,&#x201D; 
<source>Expert Systems with Applications</source>, vol. 
<volume>63</volume>, pp. 
<fpage>324</fpage>&#x2013;
<lpage>343</lpage>, 
<year>2016</year>.</mixed-citation>
</ref>
<ref id="ref-13">
<label>[13]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>H.</given-names> 
<surname>Studiawan</surname></string-name>, <string-name>
<given-names>C.</given-names> 
<surname>Payne</surname></string-name> and <string-name>
<given-names>F.</given-names> 
<surname>Sohel</surname></string-name>
</person-group>, &#x201C;
<article-title>Graph clustering and anomaly detection of access control log for forensic purposes</article-title>,&#x201D; 
<source>Digital Investigation</source>, vol. 
<volume>21</volume>, pp. 
<fpage>76</fpage>&#x2013;
<lpage>87</lpage>, 
<year>2017</year>.</mixed-citation>
</ref>
<ref id="ref-14">
<label>[14]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>M.</given-names> 
<surname>Asif</surname></string-name>, <string-name>
<given-names>A.</given-names> 
<surname>Ishtiaq</surname></string-name>, <string-name>
<given-names>H.</given-names> 
<surname>Ahmad</surname></string-name>, <string-name>
<given-names>H.</given-names> 
<surname>Aljuaid</surname></string-name> and <string-name>
<given-names>J.</given-names> 
<surname>Shah</surname></string-name>
</person-group>, &#x201C;
<article-title>Sentiment analysis of extremism in social media from textual information</article-title>,&#x201D; 
<source>Telematics and Informatics</source>, vol. 
<volume>48</volume>, pp. 
<fpage>1</fpage>&#x2013;
<lpage>20</lpage>, 
<year>2020</year>.</mixed-citation>
</ref>
<ref id="ref-15">
<label>[15]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>B. A.</given-names> 
<surname>Talpur</surname></string-name> and <string-name>
<given-names>D.</given-names> 
<surname>O&#x2019;Sullivan</surname></string-name>
</person-group>, &#x201C;
<article-title>Cyberbullying severity detection: A machine learning approach</article-title>,&#x201D; 
<source>PLOS ONE</source>, vol. <volume>15</volume>, no. <issue>10</issue>, pp. 
<fpage>1</fpage>&#x2013;
<lpage>19</lpage>, 
<year>2020</year>.</mixed-citation>
</ref>
<ref id="ref-16">
<label>[16]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>P.</given-names> 
<surname>Vijayaragavan</surname></string-name>, <string-name>
<given-names>R.</given-names> 
<surname>Ponnusamy</surname></string-name> and <string-name>
<given-names>M.</given-names> 
<surname>Aramudhan</surname></string-name>
</person-group>, &#x201C;
<article-title>An optimal support vector machine based classification model for sentimental analysis of online product reviews</article-title>,&#x201D; 
<source>Future Generation Computer Systems</source>, vol. 
<volume>111</volume>, pp. 
<fpage>234</fpage>&#x2013;
<lpage>240</lpage>, 
<year>2020</year>.</mixed-citation>
</ref>
<ref id="ref-17">
<label>[17]</label><mixed-citation publication-type="book">
<person-group person-group-type="author"><string-name>
<given-names>S.</given-names> 
<surname>Decherchi</surname></string-name>, <string-name>
<given-names>S.</given-names> 
<surname>Tacconi</surname></string-name>, <string-name>
<given-names>J.</given-names> 
<surname>Redi</surname></string-name>, <string-name>
<given-names>F.</given-names> 
<surname>Sangiacomo</surname></string-name>, <string-name>
<given-names>A.</given-names> 
<surname>Leoncini</surname></string-name> <etal>et al.</etal>
</person-group>, &#x201C;<chapter-title>Text clustering for digital forensics analysis</chapter-title>,&#x201D; In: 
<person-group person-group-type="editor"><string-name>
<surname>Herrero</surname> 
<given-names>&#x00C1;.</given-names></string-name>, <string-name>
<surname>Gastaldo</surname> 
<given-names>P.</given-names></string-name>, <string-name>
<surname>Zunino</surname> 
<given-names>R.</given-names></string-name>, <string-name>
<surname>Corchado</surname> 
<given-names>E.</given-names></string-name>
</person-group> (Eds). 
<source>Computational Intelligence in Security for Information Systems. Advances in Intelligent and Soft Computing</source>, vol. 
<volume>63</volume>. <publisher-loc>Heidelberg</publisher-loc>: <publisher-name>Springer</publisher-name>, 
<year>2009</year>.</mixed-citation>
</ref>
<ref id="ref-18">
<label>[18]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>K.</given-names> 
<surname>Sailunaz</surname></string-name> and <string-name>
<given-names>R.</given-names> 
<surname>Alhajj</surname></string-name>
</person-group>, &#x201C;
<article-title>Emotion and sentiment analysis from Twitter text</article-title>,&#x201D; 
<source>Journal of Computational Science</source>, vol. 
<volume>36</volume>, pp. 
<fpage>101003</fpage>, 
<year>2019</year>.</mixed-citation>
</ref>
<ref id="ref-19">
<label>[19]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>J.</given-names> 
<surname>Song</surname></string-name>, <string-name>
<given-names>K. T.</given-names> 
<surname>Kim</surname></string-name>, <string-name>
<given-names>B.</given-names> 
<surname>Lee</surname></string-name>, <string-name>
<given-names>S.</given-names> 
<surname>Kim</surname></string-name> and <string-name>
<given-names>H. Y.</given-names> 
<surname>Youn</surname></string-name>
</person-group>, &#x201C;
<article-title>A novel classification approach based on na&#x00EF;ve Bayes for Twitter sentiment analysis</article-title>,&#x201D; 
<source>KSII Transactions on Internet and Information Systems</source>, vol. 
<volume>11</volume>, no. 
<issue>6</issue>, pp. 
<fpage>2996</fpage>&#x2013;
<lpage>3011</lpage>, 
<year>2017</year>.</mixed-citation>
</ref>
<ref id="ref-20">
<label>[20]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>G.</given-names> 
<surname>Liu</surname></string-name> and <string-name>
<given-names>J.</given-names> 
<surname>Guo</surname></string-name>
</person-group>, &#x201C;
<article-title>Bidirectional LSTM with attention mechanism and convolutional layer for text classification</article-title>,&#x201D; 
<source>Neurocomputing</source>, vol. 
<volume>337</volume>, pp. 
<fpage>325</fpage>&#x2013;
<lpage>338</lpage>, 
<year>2019</year>.</mixed-citation>
</ref>
<ref id="ref-21">
<label>[21]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>B.</given-names> 
<surname>She</surname></string-name> and <string-name>
<given-names>L.</given-names> 
<surname>Duan</surname></string-name>
</person-group>, &#x201C;
<article-title>A systematic spatial and temporal sentiment analysis on geo-tweets</article-title>,&#x201D; <source>IEEE Access</source>, vol. 
<volume>8</volume>, pp. <fpage>8658</fpage>&#x2013;<lpage>8667</lpage>, 
<year>2020</year>.</mixed-citation>
</ref>
<ref id="ref-22">
<label>[22]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>D.</given-names> 
<surname>Tang</surname></string-name>, <string-name>
<given-names>F.</given-names> 
<surname>Wei</surname></string-name>, <string-name>
<given-names>B.</given-names> 
<surname>Qin</surname></string-name>, <string-name>
<given-names>N.</given-names> 
<surname>Yang</surname></string-name>, <string-name>
<given-names>T.</given-names> 
<surname>Liu</surname></string-name> <etal>et al.</etal>
</person-group>, &#x201C;
<article-title>Sentiment embeddings with applications to sentiment analysis</article-title>,&#x201D; 
<source>IEEE Transactions on Knowledge and Data Engineering</source>, vol. 
<volume>28</volume>, no. 
<issue>2</issue>, pp. 
<fpage>1</fpage>&#x2013;
<lpage>14</lpage>, 
<year>2016</year>.</mixed-citation>
</ref>
<ref id="ref-23">
<label>[23]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>L.</given-names> 
<surname>Wang</surname></string-name>, <string-name>
<given-names>J.</given-names> 
<surname>Niu</surname></string-name> and <string-name>
<given-names>S.</given-names> 
<surname>Yu</surname></string-name>
</person-group>, &#x201C;
<article-title>SentiDiff: Combining textual information and sentiment diffusion patterns for Twitter sentiment analysis</article-title>,&#x201D; 
<source>IEEE Transactions on Knowledge and Data Engineering</source>, vol. 
<volume>14</volume>, no. 
<issue>8</issue>, pp. 
<fpage>1</fpage>&#x2013;
<lpage>14</lpage>, 
<year>2019</year>.</mixed-citation>
</ref>
<ref id="ref-24">
<label>[24]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>G.</given-names> 
<surname>Xu</surname></string-name>
</person-group>, &#x201C;
<article-title>Sentiment analysis of comment texts based on BiLSTM</article-title>,&#x201D; 
<source>IEEE Access</source>, vol. 
<volume>7</volume>, pp. 
<fpage>51522</fpage>&#x2013;
<lpage>51532</lpage>, 
<year>2019</year>.</mixed-citation>
</ref>
</ref-list>
</back>
</article>