<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CSSE</journal-id>
<journal-id journal-id-type="nlm-ta">CSSE</journal-id>
<journal-id journal-id-type="publisher-id">CSSE</journal-id>
<journal-title-group>
<journal-title>Computer Systems Science &#x0026; Engineering</journal-title>
</journal-title-group>
<issn pub-type="ppub">0267-6192</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">19300</article-id>
<article-id pub-id-type="doi">10.32604/csse.2022.019300</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Semantic Based Greedy Levy Gradient Boosting Algorithm for Phishing Detection</article-title><alt-title alt-title-type="left-running-head">Semantic Based Greedy Levy Gradient Boosting Algorithm for Phishing Detection</alt-title><alt-title alt-title-type="right-running-head">Semantic Based Greedy Levy Gradient Boosting Algorithm for Phishing Detection</alt-title>
</title-group>
<contrib-group content-type="authors">
<contrib id="author-1" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Sakunthala Jenni</surname><given-names>R.</given-names></name>
<email>sakunthalajenni2021@gmail.com</email>
</contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Shankar</surname><given-names>S.</given-names></name>
</contrib>
<aff id="aff-1"><institution>Department of Computer Science and Engineering, Hindusthan College of Engineering and Technology</institution>, <addr-line>Coimbatore, 641032</addr-line>, <country>India</country></aff>
</contrib-group><author-notes><corresp id="cor1"><label>&#x002A;</label>Corresponding Author: R. Sakunthala Jenni. Email: <email>sakunthalajenni2021@gmail.com</email></corresp></author-notes>
<pub-date pub-type="epub" date-type="pub" iso-8601-date="2021-10-04"><day>4</day>
<month>10</month>
<year>2021</year></pub-date>
<volume>41</volume>
<issue>2</issue>
<fpage>525</fpage>
<lpage>538</lpage>
<history>
<date date-type="received"><day>09</day><month>4</month><year>2021</year></date>
<date date-type="accepted"><day>14</day><month>6</month><year>2021</year></date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2022 Sakunthala and Shankar</copyright-statement>
<copyright-year>2022</copyright-year>
<copyright-holder>Sakunthala and Shankar</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CSSE_19300.pdf"></self-uri>
<abstract>
<p>Distinguishing phishing websites from legitimate ones is a major challenge for web service providers because users find the two difficult to tell apart. Phishing websites also generate additional traffic across the network and spread malware through it, which heightens the need for detection when massive datasets (i.e., big data) are processed. Although boosting mechanisms have been applied to phishing detection, they are prone to significant output errors, specifically because all website features are combined in the training stage. Big data systems also require MapReduce, a popular parallel programming model, to process massive datasets. To address these issues, a probabilistic latent semantic and greedy levy gradient boosting (PLS-GLGB) algorithm for website phishing detection using MapReduce is proposed. A feature selection-based model is provided using a probabilistic intersective latent semantic preprocessing model to minimize errors in website phishing detection. Here, the missing data in each URL are identified and discarded from further processing to ensure data quality. Subsequently, with the preprocessed features (URLs), feature vectors are updated by the greedy levy divergence gradient model, which selects the optimal features in the URL and accurately detects phishing websites. Thus, the greedy levy model efficiently differentiates between phishing and legitimate websites. Experiments are conducted using the PhishTank dataset, one of the largest public corpora of phishing websites. Results show that the PLS-GLGB algorithm outperforms state-of-the-art phishing detection methods and reduces both phishing detection time and detection errors.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Web service providers</kwd>
<kwd>probabilistic intersective</kwd>
<kwd>latent semantic</kwd>
<kwd>greedy levy</kwd>
<kwd>divergence</kwd>
<kwd>gradient</kwd>
<kwd>phishing detection</kwd>
<kwd>big data</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>Web security is an emerging concern in big data settings. Conventionally, web security has been addressed with different methods, such as privacy preservation techniques, hidden Markov models, and reasoning-based strategies. Among the various issues, web phishing is currently of particular interest. Phishing refers to the process of mimicking an official website, such as that of a bank or a social networking site. Phishing detection refers to the process of detecting such activity, and many researchers have designed algorithms for it.</p>
<p>An optimal feature selection and neural network (OFS-NN) was proposed in [<xref ref-type="bibr" rid="ref-1">1</xref>] to detect phishing websites. A feature validity value (FVV) was initially introduced to measure the significance of sensitive features in detecting phishing websites. On the basis of the FVV, an algorithm was developed to select optimal features. Thus, the issue related to overfitting in the neural network was solved.</p>
<p>Finally, the selected features were used to train the neural network, and the resulting optimal classifier detected phishing websites accurately. However, because the sensitive features involved in phishing attacks change continuously, optimal feature selection remained an important issue. In the proposed method, optimal feature selection is performed by inspecting the URL to identify missing data and eliminating the URL accordingly. In this manner, optimal feature selection is ensured even when the sensitive features change continuously.</p>
<p>A lightweight application called CatchPhish was proposed in [<xref ref-type="bibr" rid="ref-2">2</xref>] to predict the legitimacy of a URL without visiting the website. The method used the host name, the complete URL, and term frequency-inverse document frequency features. Finally, phish-hinted words from the suspicious URL were used for classification via a random forest classifier, thereby contributing to accuracy. However, the significance of individual features was not considered. To address this issue, the proposed method retains only significant features through preprocessing with the probabilistic intersective latent semantic preprocessing (PILSP) model.</p>
<p>A machine learning framework for the comprehensive analysis and detection of web phishing was proposed in [<xref ref-type="bibr" rid="ref-3">3</xref>] using decision tree algorithms. Websites were classified as phishing or normal in an exhaustive manner based on certain features, which improved precision and recall. Despite these improvements, phishing attack detection itself received less attention. In the proposed method, the greedy levy divergence gradient (GLDG) model for website phishing detection is developed to attain optimal detection and improve the accuracy rate.</p>
<p>The contributions of the proposed method are as follows:<list list-type="order"><list-item>
<p>We propose a latent semantic preprocessing model (i.e., PILSP) that identifies missing data in the URL and discards URLs with missing data before phishing sites are detected.</p></list-item><list-item>
<p>We design a GLDG model that combines gradient boosting and contrast divergence to increase the detection accuracy of phishing sites.</p></list-item><list-item>
<p>We deploy our method, namely, probabilistic latent semantic and greedy levy gradient boosting (PLS-GLGB), using MapReduce to safeguard users from accessing phishing sites even at big data scale.</p></list-item><list-item>
<p>Experimental evaluations on the complex PhishTank dataset demonstrate the efficiency of the proposed method in terms of phishing detection time, phishing detection overhead, and detection performance measured with a confusion matrix.</p></list-item></list></p>
<p>The remainder of the paper is organized as follows. A detailed review of the prevailing antiphishing methods is presented in Section 2. The proposed method is discussed in Section 3. The experimentation details and the results are presented in Section 4. The deployment and effectiveness of the proposed method are discussed in Section 5. Finally, the conclusion is provided in Section 6.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Related Work</title>
<p>Web security is one of the emerging trends in the big data environment. Among the problems in this environment, web phishing has received growing attention in recent years. In several existing works, web security has been addressed by different methods, such as privacy preservation techniques and Markov models.</p>
<p>Decision tree algorithms have been applied to distinguish between phishing and nonphishing attacks. Despite several antiphishing mechanisms launched by software companies, such as blacklists, heuristic mechanisms, and machine learning-based approaches, not all phishing attacks can be detected at an early stage. In [<xref ref-type="bibr" rid="ref-4">4</xref>], a real-time antiphishing system utilizing seven classification models was presented to improve detection accuracy.</p>
<p>In contrast to conventional visual similarity-based techniques using whitelists, a lightweight approach using visual similarity as a first-level filter was utilized in [<xref ref-type="bibr" rid="ref-5">5</xref>] to detect phishing sites. A survey of different web phishing detection schemes was provided in [<xref ref-type="bibr" rid="ref-6">6</xref>]. Automated page layout-based phishing detection methods were discussed in [<xref ref-type="bibr" rid="ref-7">7</xref>] together with a learning-based aggregation model, thereby contributing to accuracy. Another model based on support vector machines and na&#x00EF;ve Bayes was designed in [<xref ref-type="bibr" rid="ref-8">8</xref>] to efficiently differentiate between phishing and benign instances.</p>
<p>However, all the aforementioned methods are time consuming and rely on static detection rules. In [<xref ref-type="bibr" rid="ref-9">9</xref>], PhishLimiter was designed to perform deep packet inspection integrated with software-defined networking and thereby detect phishing activities in e-mail and web-based communication in a timely manner.</p>
<p>An FVV [<xref ref-type="bibr" rid="ref-10">10</xref>] was first computed to obtain significant features in the preliminary stage, and a detection mechanism was then applied to improve the accuracy rate. In [<xref ref-type="bibr" rid="ref-11">11</xref>], information-based brand authorization techniques were utilized to handle statistical antiphishing. Deep learning techniques were applied in [<xref ref-type="bibr" rid="ref-12">12</xref>] to handle big data.</p>
<p>However, existing antiphishing techniques were designed on the basis of page-related features. For instance, a fast phishing detection model was designed in [<xref ref-type="bibr" rid="ref-13">13</xref>] to reduce detection time. Another method was proposed in [<xref ref-type="bibr" rid="ref-14">14</xref>] to improve the accuracy rate by using principal component analysis and random forest to classify suspicious websites.</p>
<p>A review of machine learning methods for spam and phishing detection was presented in [<xref ref-type="bibr" rid="ref-15">15</xref>]. The sine cosine algorithm was utilized in [<xref ref-type="bibr" rid="ref-16">16</xref>] to detect spam by using an artificial neural network and a multilayer perceptron, thereby minimizing error. A deep belief network was applied in [<xref ref-type="bibr" rid="ref-17">17</xref>], which first identified original features and interaction features and then efficiently distinguished true positives from false positives.</p>
<p>A systematic measure of the infiltration process involved in phishing detection, along with a Chrome extension named Sniff-Phish, was developed in [<xref ref-type="bibr" rid="ref-18">18</xref>], where the computationally intensive model was designed to run in the cloud and detect several types of phishing attacks, thus significantly reducing false positives. Text and image watermarking tools [<xref ref-type="bibr" rid="ref-19">19</xref>] were used to distinguish between phishing attacks and normal traffic. Here, the watermark was applied on the client side and was found to be foolproof.</p>
<p>A companion scheme was introduced in [<xref ref-type="bibr" rid="ref-20">20</xref>] to identify the brands of zero-hour phishing web pages by categorizing the target brand logos in page screenshots. Histogram of oriented gradients features were used to obtain a scale-invariant visual representation of target brand logos.</p>
<p>In [<xref ref-type="bibr" rid="ref-21">21</xref>], a systematic benchmarking study and evaluation were carried out with phishing features on diverse and extensive datasets. The imbalanced nature of phishing attacks posed difficulties for researchers and affected the performance of the detection system. New features were required to stop attackers from fooling detection systems.</p>
<p>Motivated by these works, we propose PLS-GLGB for website phishing detection with the objective of reducing phishing detection time, overhead, and misclassification error as measured via a confusion matrix. The proposed method is described in the following sections.</p>
</sec>
<sec id="s3">
<label>3</label>
<title>PLS-GLGB</title>
<p>The foremost objective of malicious users who employ website phishing is to evade recognition by users. Different materials and methods have been designed to safeguard users from website phishing attacks. In this paper, PLS-GLGB for website phishing detection is presented. Latent semantic analysis is a method used in natural language processing; here, it helps analyze the relationship between a set of features and the objective (i.e., phishing attack detection). Greedy levy gradient boosting is an ensemble learning method used to resolve the exploration/exploitation dilemma. Gradient boosting generates and combines weak prediction models to obtain a strong classification output. The three main processes of the phishing detection technique are preprocessing, feature selection, and classification. <xref ref-type="fig" rid="fig-1">Fig. 1</xref> shows the block diagram of the PLS-GLGB method.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>Block diagram of the PLS-GLGB method</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="CSSE_19300-fig-1.png"/>
</fig>
<p>As illustrated in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>, the PLS-GLGB method involves two steps for timely and accurate web phishing detection, namely, preprocessing and phishing detection. Preprocessing identifies missing data in the URLs provided as input from the PhishTank dataset and is performed by applying the PILSP model.</p>
<p>With this preprocessing model, missing data in the URL are identified and eliminated by means of data maximum likelihood, which ensures data quality during phishing detection. Subsequently, the GLDG model is applied to the preprocessed features for accurate and optimal website phishing detection. The proposed method is described in detail in the following subsections.</p>
<sec id="s3_1">
<label>3.1</label>
<title>PILSP Model</title>
<p>The first step in the design of web phishing detection is to preprocess the given input dataset (i.e., to handle missing data) with the objective of ensuring data quality. A PILSP model is used to obtain computationally efficient preprocessed features by analyzing missing data in the URL path. Thus, only URLs carrying complete information are used for web phishing detection, while those with missing data in the path are eliminated from further processing. This is achieved by mapping the high-dimensional vector of URL parts to a lower-dimensional vector of phish identifiers. The PILSP model is shown in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>.</p>
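<p>The filtering idea in this step can be illustrated with a minimal Python sketch. It is not the PILSP model itself: it assumes that "missing data" simply means an empty path component, and the helper names <monospace>split_url</monospace> and <monospace>drop_incomplete</monospace> as well as the sample URLs are hypothetical.</p>
<code language="python"><![CDATA[
```python
from urllib.parse import urlsplit


def split_url(url):
    """Return (protocol, domain_name, path) for one URL string."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path


def drop_incomplete(urls):
    """Keep only the URLs whose path component carries data."""
    kept = []
    for url in urls:
        _protocol, _domain, path = split_url(url)
        # Assumption: a path is "missing" when it is empty or only "/".
        if path.strip("/"):
            kept.append(url)
    return kept


# Hypothetical sample input; only the first URL has a non-empty path.
sample = [
    "http://example.com/login/verify",
    "http://example.com",
    "https://example.org/",
]
print(drop_incomplete(sample))
```
]]></code>
<p>In the proposed method, the decision about which paths are incomplete is made probabilistically rather than by the fixed rule used in this sketch.</p>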
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>Flow diagram of the PILSP model</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="CSSE_19300-fig-2.png"/>
</fig>
<p>Consider a collection of URLs, <italic>URL</italic>&#x2009;&#x003D;&#x2009;<italic>U</italic><sub>1</sub>, <italic>U</italic><sub>2</sub>, &#x2026;, <italic>U</italic><sub><italic>n</italic></sub>, and a set of three parts &#x007B;<italic>P</italic>, <italic>DN</italic>, <italic>Path</italic>&#x007D; representing the protocol <italic>P</italic>, the domain name <italic>DN</italic>, and the path <italic>Path</italic> that occur in those URLs, where <italic>Path</italic>&#x2009;&#x003D;&#x2009;<italic>P</italic><sub>1</sub>, <italic>P</italic><sub>2</sub>, &#x2026;, <italic>P</italic><sub><italic>n</italic></sub> (only the path is used for analyzing the missing data to ensure data quality). The model then links a latent phish identifier variable <italic>p</italic>&#x2009;&#x003D;&#x2009;<italic>p</italic><sub>1</sub>, <italic>p</italic><sub>2</sub>, &#x2026;, <italic>p</italic><sub><italic>n</italic></sub> with the contingency of each path <italic>Path</italic>&#x2009;&#x003D;&#x2009;<italic>P</italic><sub>1</sub>, <italic>P</italic><sub>2</sub>, &#x2026;, <italic>P</italic><sub><italic>n</italic></sub> in a particular URL. Thus, the PILSP for the path&#x2013;URL contingency is expressed by means of the probability intersection function (PIF) as follows:<disp-formula id="eqn-1"><label>(1)</label>
<mml:math id="mml-eqn-1" display="block"><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>o</mml:mi><mml:mi>b</mml:mi><mml:mspace width="thickmathspace" /><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:msub><mml:mi>P</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>o</mml:mi><mml:mi>b</mml:mi><mml:mspace width="thickmathspace" /><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:munderover><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:mo>&#x2061;</mml:mo><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>o</mml:mi><mml:mi>b</mml:mi><mml:mspace width="thickmathspace" /><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo fence="false" stretchy="false">|</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>o</mml:mi><mml:mi>b</mml:mi><mml:mspace width="thickmathspace" /><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo fence="false" stretchy="false">|</mml:mo><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math>
</disp-formula>In <xref ref-type="disp-formula" rid="eqn-1">Eq. (1)</xref>, <italic>Prob</italic> (<italic>U</italic><sub><italic>i</italic></sub>) is the probability that a path is observed in a given URL <italic>U</italic><sub><italic>i</italic></sub>, <italic>Prob</italic> (<italic>P</italic><sub><italic>j</italic></sub>&#x007C;<italic>p</italic><sub><italic>k</italic></sub>) is the probability of a specific path conditioned on the latent phish identifier variable <italic>p</italic><sub><italic>k</italic></sub>, <italic>Prob</italic> (<italic>p</italic><sub><italic>k</italic></sub>&#x007C;<italic>U</italic><sub><italic>i</italic></sub>) is the probability distribution of a particular URL over the latent phish identifier variable space, and <italic>k</italic> indexes the phish IDs.</p>
<p>The probability <italic>Prob</italic> (<italic>P</italic><sub><italic>j</italic></sub>&#x007C;<italic>p</italic><sub><italic>k</italic></sub>) corresponds to the paths that make up a given phish ID, and the probability <italic>Prob</italic> (<italic>p</italic><sub><italic>k</italic></sub>&#x007C;<italic>U</italic><sub><italic>i</italic></sub>) corresponds to the phish IDs that a given URL belongs to. With the aid of PIF, the parameters <italic>Prob</italic> (<italic>P</italic><sub><italic>j</italic></sub>&#x007C;<italic>p</italic><sub><italic>k</italic></sub>) and <italic>Prob</italic> (<italic>p</italic><sub><italic>k</italic></sub>&#x007C;<italic>U</italic><sub><italic>i</italic></sub>) are estimated by maximizing the data likelihood <italic>ML</italic>, formulated as follows:<disp-formula id="eqn-2"><label>(2)</label>
<mml:math id="mml-eqn-2" display="block"><mml:mi>M</mml:mi><mml:mi>L</mml:mi><mml:mo>=</mml:mo><mml:munder><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x2208;</mml:mo><mml:mi>U</mml:mi><mml:mi>R</mml:mi><mml:mi>L</mml:mi></mml:mrow></mml:munder><mml:mo>&#x2061;</mml:mo><mml:munder><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>&#x2208;</mml:mo><mml:mi>P</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:munder><mml:mo>&#x2061;</mml:mo><mml:mi>n</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:msub><mml:mi>P</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>o</mml:mi><mml:mi>b</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:msub><mml:mi>P</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math>
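</disp-formula>Here, <italic>n</italic>(<italic>U</italic><sub><italic>i</italic></sub>, <italic>P</italic><sub><italic>j</italic></sub>) denotes the number of times path <italic>P</italic><sub><italic>j</italic></sub> occurs in URL <italic>U</italic><sub><italic>i</italic></sub>. By applying Bayes' rule, the expectation (<italic>E</italic>) step involved in maximizing the data likelihood <italic>ML</italic> is mathematically expressed as follows:<disp-formula id="eqn-3"><label>(3)</label>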
<mml:math id="mml-eqn-3" display="block"><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>o</mml:mi><mml:mi>b</mml:mi><mml:mspace width="thickmathspace" /><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo fence="false" stretchy="false">|</mml:mo><mml:mspace width="thickmathspace" /><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:msub><mml:mi>P</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>o</mml:mi><mml:mi>b</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>o</mml:mi><mml:mi>b</mml:mi><mml:mspace width="thickmathspace" /><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo fence="false" stretchy="false">|</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>o</mml:mi><mml:mi>b</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo fence="false" stretchy="false">|</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mi>L</mml:mi><mml:mspace width="thickmathspace" /><mml:mo>&#x2208;</mml:mo><mml:mi>K</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2061;</mml:mo><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>o</mml:mi><mml:mi>b</mml:mi><mml:mo 
stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>M</mml:mi><mml:mi>L</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>o</mml:mi><mml:mi>b</mml:mi><mml:mspace width="thickmathspace" /><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo fence="false" stretchy="false">|</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>M</mml:mi><mml:mi>L</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>o</mml:mi><mml:mi>b</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo fence="false" stretchy="false">|</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>M</mml:mi><mml:mi>L</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:math>
</disp-formula></p>
<p>The maximization (<italic>M</italic>) step, obtained by maximizing the expected data likelihood, is given by the following pair of expressions:<disp-formula id="eqn-4"><label>(4)</label>
<mml:math id="mml-eqn-4" display="block"><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>o</mml:mi><mml:mi>b</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo fence="false" stretchy="false">|</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x2208;</mml:mo><mml:mi>U</mml:mi><mml:mi>R</mml:mi><mml:mi>L</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2061;</mml:mo><mml:mi>n</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:msub><mml:mi>P</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>o</mml:mi><mml:mi>b</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo fence="false" stretchy="false">|</mml:mo><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:msub><mml:mi>P</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x2208;</mml:mo><mml:mi>U</mml:mi><mml:mi>R</mml:mi><mml:mi>L</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2061;</mml:mo><mml:msub><mml:mrow><mml:mo 
movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>M</mml:mi></mml:msub><mml:mo>&#x2208;</mml:mo><mml:mi>P</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2061;</mml:mo><mml:mi>n</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:msub><mml:mi>P</mml:mi><mml:mi>M</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>o</mml:mi><mml:mi>b</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo fence="false" stretchy="false">|</mml:mo><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:msub><mml:mi>P</mml:mi><mml:mi>M</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:math>
</disp-formula><disp-formula id="eqn-5"><label>(5)</label>
<mml:math id="mml-eqn-5" display="block"><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>o</mml:mi><mml:mi>b</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo fence="false" stretchy="false">|</mml:mo><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:msub><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>P</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo>&#x2208;</mml:mo><mml:mi>P</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2061;</mml:mo><mml:mi>n</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:msub><mml:mi>P</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>o</mml:mi><mml:mi>b</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo fence="false" stretchy="false">|</mml:mo><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:msub><mml:mi>P</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:math>
</disp-formula></p>
<p>With the aid of these functions, the pseudo code representation of probabilistic latent preprocessing is as follows:</p>
<fig id="fig-5">
<graphic mimetype="image" mime-subtype="png" xlink:href="CSSE_19300-fig-5.png"/>
</fig>
<p>The probabilistic latent preprocessing algorithm takes each URL with its corresponding path and phish identifier variable as input. Its objective is to identify the paths with missing data, because identifying the missing data simplifies phishing detection. This identification is performed by means of the PIF and the data maximum likelihood. The parameters are estimated iteratively by alternating the <italic>E</italic> step and the <italic>M</italic> step until all the missing data in the phishing IDs are identified. The preprocessing thus improves data quality ahead of web phishing detection.</p>
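<p>As an illustration of <xref ref-type="disp-formula" rid="eqn-5">Eq. (5)</xref>, the following minimal sketch computes Prob(<italic>p</italic><sub><italic>k</italic></sub>|<italic>U</italic><sub><italic>i</italic></sub>) as a count-weighted average of the path-conditional probabilities. The function name, the dictionary layout, and the example paths are hypothetical. Treating <italic>n</italic>(<italic>U</italic><sub><italic>i</italic></sub>) as the sum of the per-path counts when it is not supplied is an assumption, not a statement of the authors' implementation.</p>
<preformat preformat-type="code"><![CDATA[
```python
def prob_phish_given_url(path_counts, cond_probs, url_count=None):
    """Eq. (5): sum_j n(U_i, P_j) * Prob(p_k | U_i, P_j) / n(U_i)."""
    if url_count is None:
        # Assumption: n(U_i) is the total count over all observed paths of U_i.
        url_count = sum(path_counts.values())
    if url_count <= 0:
        raise ValueError("n(U_i) must be positive")
    weighted = sum(n * cond_probs[path] for path, n in path_counts.items())
    return weighted / url_count


# Hypothetical counts n(U_i, P_j) and conditionals Prob(p_k | U_i, P_j).
path_counts = {"/login": 3, "/verify": 1}
cond_probs = {"/login": 0.9, "/verify": 0.5}
p = prob_phish_given_url(path_counts, cond_probs)
print(round(p, 3))
```
]]></preformat>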
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>GLDG (Model) Website Phishing Detection</title>
<p>In the preprocessed phish data (<italic>x</italic><sub>1</sub>, <italic>y</italic><sub>1</sub>), (<italic>x</italic><sub>2</sub>, <italic>y</italic><sub>2</sub>), &#x2026;, (<italic>x</italic><sub><italic>n</italic></sub>, <italic>y</italic><sub><italic>n</italic></sub>), <italic>x</italic><sub><italic>i</italic></sub> belongs to the feature space <italic>X</italic>, and <italic>y</italic><sub><italic>i</italic></sub> belongs to the label set <italic>Y</italic>&#x2009;&#x003D;&#x2009;&#x007B;&#x2009;&#x2212;&#x2009;1,&#x2009;&#x002B;&#x2009;1&#x007D;. Here, the feature space corresponds to the preprocessed URLs (<italic>x</italic><sub>1</sub>&#x2009;&#x2192;&#x2009;<italic>PU</italic><sub>1</sub>, <italic>x</italic><sub>2</sub>&#x2009;&#x2192;&#x2009;<italic>PU</italic><sub>2</sub>, &#x2026;, <italic>x</italic><sub><italic>n</italic></sub>&#x2009;&#x2192;&#x2009;<italic>PU</italic><sub><italic>n</italic></sub>) with a weak hypothesis, <italic>wh</italic><sub><italic>t</italic></sub>:<italic>X</italic>&#x2009;&#x2192;&#x2009;&#x007B;&#x2009;&#x2212;&#x2009;1,&#x2009;&#x002B;&#x2009;1&#x007D;. Under these assumptions, a GLDG model for website phishing detection is used to detect web phish accurately. Gradient boosting generates an ensemble of weak prediction models.</p>
<p>The prediction process starts with a weak model. Contrast divergence attempts to learn the data space and is improved iteratively, with each succeeding model reducing the error of the preceding one. The objective of gradient boosting is to combine weak learning models into a single strong model as follows:<disp-formula id="eqn-6"><label>(6)</label>
<mml:math id="mml-eqn-6" display="block"><mml:mi>F</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>P</mml:mi><mml:mi>U</mml:mi><mml:mi>R</mml:mi><mml:mi>L</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:munderover><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>m</mml:mi></mml:munderover><mml:mo>&#x2061;</mml:mo><mml:msub><mml:mi>&#x03B1;</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mi>C</mml:mi><mml:msub><mml:mi>D</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>P</mml:mi><mml:mi>U</mml:mi><mml:mi>R</mml:mi><mml:mi>L</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math>
</disp-formula>Generally, <italic>CD</italic><sub><italic>j</italic></sub> is the contrast divergence of a specified depth that is continually improved over <italic>j</italic> iterations, <italic>&#x03B1;</italic><sub><italic>j</italic></sub> refers to the regression parameter for that specific iteration, and <italic>m</italic> is the number of iterations.<disp-formula id="eqn-7"><label>(7)</label>
<mml:math id="mml-eqn-7" display="block"><mml:mi>C</mml:mi><mml:msub><mml:mi>D</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>&#x03B8;</mml:mi><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:msup><mml:mi>v</mml:mi><mml:mn>0</mml:mn></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mo>&#x2212;</mml:mo><mml:munder><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:mi>U</mml:mi><mml:mi>R</mml:mi><mml:mi>L</mml:mi></mml:mrow></mml:munder><mml:mo>&#x2061;</mml:mo><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>o</mml:mi><mml:mi>b</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>P</mml:mi><mml:mi>U</mml:mi><mml:mi>R</mml:mi><mml:mi>L</mml:mi><mml:mo fence="false" stretchy="false">|</mml:mo><mml:msup><mml:mi>v</mml:mi><mml:mn>0</mml:mn></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:mi>E</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mi>v</mml:mi><mml:mn>0</mml:mn></mml:msup><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:mi>P</mml:mi><mml:mi>U</mml:mi><mml:mi>R</mml:mi><mml:mi>L</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:mfrac></mml:mrow><mml:mo>+</mml:mo><mml:munder><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:mi>U</mml:mi><mml:mi>R</mml:mi><mml:mi>L</mml:mi></mml:mrow></mml:munder><mml:mo>&#x2061;</mml:mo><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>o</mml:mi><mml:mi>b</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>P</mml:mi><mml:mi>U</mml:mi><mml:mi>R</mml:mi><mml:mi>L</mml:mi><mml:mo fence="false" 
stretchy="false">|</mml:mo><mml:msup><mml:mi>v</mml:mi><mml:mi>j</mml:mi></mml:msup></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:mi>E</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msup><mml:mi>v</mml:mi><mml:mi>j</mml:mi></mml:msup><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:mi>P</mml:mi><mml:mi>U</mml:mi><mml:mi>R</mml:mi><mml:mi>L</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:mstyle></mml:math>
</disp-formula></p>
<p>At each iteration, the model is updated as follows:<disp-formula id="eqn-8"><label>(8)</label>
<mml:math id="mml-eqn-8" display="block"><mml:msub><mml:mi>F</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>P</mml:mi><mml:mi>U</mml:mi><mml:mi>R</mml:mi><mml:mi>L</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:msub><mml:mi>F</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>P</mml:mi><mml:mi>U</mml:mi><mml:mi>R</mml:mi><mml:mi>L</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mi>C</mml:mi><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>P</mml:mi><mml:mi>U</mml:mi><mml:mi>R</mml:mi><mml:mi>L</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math>
</disp-formula><italic>CD</italic><sub><italic>j</italic>&#x002B;1</sub> is selected to minimize the loss function <italic>L</italic> incurred in fitting the current model to a preprocessed URL <italic>PURL</italic>, using the sine&#x2013;cosine function. The loss is the mean square error, which is mathematically expressed as follows:<disp-formula id="eqn-9"><label>(9)</label>
<mml:math id="mml-eqn-9" display="block"><mml:mi>M</mml:mi><mml:mi>S</mml:mi><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:munderover><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:msup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mi>x</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow><mml:mi>n</mml:mi></mml:mfrac></mml:mrow></mml:mstyle></mml:math>
</disp-formula>where, in the mean square error of <xref ref-type="disp-formula" rid="eqn-9">Eq. (9)</xref>, <italic>x</italic><sub><italic>i</italic></sub> and <inline-formula id="ieqn-1">
<mml:math id="mml-ieqn-1"><mml:msubsup><mml:mi>x</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msubsup></mml:math>
</inline-formula> refer to the actual class and the predicted class of the preprocessed URL sample, respectively. The preprocessed URL sample vector that minimizes the objective function is taken as the optimal sample vector and is used to detect web phish. With a preprocessed URL sample vector denoted as <italic>PU</italic><sup><italic>i</italic></sup>, a set of URL vectors is generated at random to form the initial population as follows:<disp-formula id="eqn-10"><label>(10)</label>
<mml:math id="mml-eqn-10" display="block"><mml:mi>P</mml:mi><mml:msup><mml:mi>U</mml:mi><mml:mi>i</mml:mi></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:msup><mml:mi>U</mml:mi><mml:mn>1</mml:mn></mml:msup><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:mspace width="thickmathspace" /><mml:mi>P</mml:mi><mml:msup><mml:mi>U</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:mspace width="thickmathspace" /><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mspace width="thickmathspace" /><mml:mi>P</mml:mi><mml:msup><mml:mi>U</mml:mi><mml:mi>n</mml:mi></mml:msup></mml:mrow></mml:mrow></mml:math>
</disp-formula>Each preprocessed URL sample vector is rationalized via sine and cosine association. Although this association rationalizes the vectors, the search can become trapped in a local optimum because it lacks self-learning capability. To address this issue, a greedy levy technique is applied in addition to the association factor so that website phish detection is optimized with respect to individual URLs. The greedy levy operation allows the primary population to jump out of the local optimum. The preprocessed URL with the association and greedy levy operations is mathematically formulated as follows:<disp-formula id="eqn-11"><label>(11)</label>
<mml:math id="mml-eqn-11" display="block"><mml:mi>P</mml:mi><mml:msubsup><mml:mi>U</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mrow><mml:mi>P</mml:mi><mml:msubsup><mml:mi>U</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi></mml:msubsup><mml:mo>+</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mi>S</mml:mi><mml:mi>I</mml:mi><mml:mi>N</mml:mi><mml:mi>E</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>+</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x2217;</mml:mo><mml:mi>l</mml:mi><mml:mi>e</mml:mi><mml:mi>v</mml:mi><mml:mi>y</mml:mi></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mi>P</mml:mi><mml:msubsup><mml:mi>U</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi></mml:msubsup><mml:mo>+</mml:mo><mml:msub><mml:mi>r</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mi>C</mml:mi><mml:mi>O</mml:mi><mml:mi>S</mml:mi><mml:mi>I</mml:mi><mml:mi>N</mml:mi><mml:mi>E</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>r</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>+</mml:mo><mml:mi>&#x03B8;</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x2217;</mml:mo><mml:mi>l</mml:mi><mml:mi>e</mml:mi><mml:mi>v</mml:mi><mml:mi>y</mml:mi></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:mrow><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow></mml:math>
</disp-formula></p>
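<p>The update in <xref ref-type="disp-formula" rid="eqn-11">Eq. (11)</xref> can be sketched as follows. This is a minimal illustration under stated assumptions rather than the authors' implementation: the levy step is drawn with Mantegna's algorithm, the sine or cosine branch is chosen with equal probability, and the greedy rule keeps a candidate only when it lowers the objective. The function names, the toy objective, and the value of <italic>&#x03B8;</italic> are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
import math
import random


def levy_step(rng, beta=1.5):
    """Draw one heavy-tailed step with Mantegna's algorithm (assumed choice)."""
    sigma_u = (
        math.gamma(1 + beta) * math.sin(math.pi * beta / 2)
        / (math.gamma((1 + beta) / 2) * beta * 2 ** ((beta - 1) / 2))
    ) ** (1 / beta)
    u = rng.gauss(0.0, sigma_u)
    v = rng.gauss(0.0, 1.0)
    while v == 0.0:  # guard against a division by zero
        v = rng.gauss(0.0, 1.0)
    return u / abs(v) ** (1 / beta)


def update_vector(pu, t, max_iter, theta, objective, rng):
    """One step of Eq. (11): PU + r1 * SINE(r1) or COSINE(r1) + theta * levy."""
    r1 = t / max_iter  # ratio of the current iteration to the maximum
    # Assumption: the two branches of Eq. (11) are picked with equal probability.
    trig = math.sin(r1) if rng.random() < 0.5 else math.cos(r1)
    candidate = [x + r1 * trig + theta * levy_step(rng) for x in pu]
    # Greedy rule (assumption): accept the move only if the objective improves.
    return candidate if objective(candidate) < objective(pu) else pu


def sphere(vec):
    """Toy objective standing in for the MSE loss."""
    return sum(x * x for x in vec)


rng = random.Random(0)
start = [1.0, -2.0, 0.5]
pu = list(start)
for t in range(1, 51):
    pu = update_vector(pu, t, 50, 0.1, sphere, rng)
print(sphere(start), sphere(pu))
```
]]></preformat>
<p>Because of the greedy acceptance rule, the objective value never increases from one iteration to the next in this sketch.</p>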
<p>In <xref ref-type="disp-formula" rid="eqn-11">Eq. (11)</xref>, <italic>r</italic><sub>1</sub> refers to the ratio of the current iteration number to the maximum number of iterations, and <italic>&#x03B8;</italic>(<italic>j</italic>)&#x002A;<italic>levy</italic> corresponds to the coefficient of the self-learning factor of the subsequent greedy levy for the preprocessed URL. Finally, the evaluated classifier is as follows:<disp-formula id="eqn-12"><label>(12)</label>
<mml:math id="mml-eqn-12" display="block"><mml:mi>H</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>P</mml:mi><mml:msubsup><mml:mi>U</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mi>S</mml:mi><mml:mi>I</mml:mi><mml:mi>G</mml:mi><mml:mi>N</mml:mi><mml:mspace width="thickmathspace" /><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:munder><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mi>n</mml:mi></mml:mrow></mml:munder><mml:mo>&#x2061;</mml:mo><mml:mi>M</mml:mi><mml:mi>S</mml:mi><mml:mi>E</mml:mi><mml:mo>&#x2217;</mml:mo><mml:mi>w</mml:mi><mml:msub><mml:mi>h</mml:mi><mml:mi>t</mml:mi></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>P</mml:mi><mml:msubsup><mml:mi>U</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</disp-formula>The pseudo code representation of GLDG website phishing detection is as follows:</p>
<fig id="fig-6">
<graphic mimetype="image" mime-subtype="png" xlink:href="CSSE_19300-fig-6.png"/>
</fig>
<p>With the preprocessed URLs (i.e., after the missing data are handled) as the input, the objective is to detect website phishing accurately. Two factors are considered for this objective, namely, sine&#x2013;cosine association and greedy levy. Rationalization is achieved first, followed by individual URL optimality.</p>
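<p>The stagewise structure of <xref ref-type="disp-formula" rid="eqn-6">Eqs. (6)</xref> and <xref ref-type="disp-formula" rid="eqn-8">(8)</xref>, the mean square error of <xref ref-type="disp-formula" rid="eqn-9">Eq. (9)</xref>, and the sign-based decision can be illustrated with the following minimal sketch. It is a generic gradient boosting example and not the GLDG model itself: one-feature regression stumps stand in for the contrast divergence learners, the regression parameter is held constant, and the two URL features (length and number of dots) and their values are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
def fit_stump(xs, residuals):
    """Fit a one-feature regression stump to the residuals (squared error)."""
    best = None
    for f in range(len(xs[0])):
        for thr in sorted({x[f] for x in xs}):
            left = [r for x, r in zip(xs, residuals) if x[f] <= thr]
            right = [r for x, r in zip(xs, residuals) if x[f] > thr]
            lv = sum(left) / len(left) if left else 0.0
            rv = sum(right) / len(right) if right else 0.0
            err = sum(
                (r - (lv if x[f] <= thr else rv)) ** 2
                for x, r in zip(xs, residuals)
            )
            if best is None or err < best[0]:
                best = (err, f, thr, lv, rv)
    _, f, thr, lv, rv = best
    return lambda x: lv if x[f] <= thr else rv


def boost(xs, ys, rounds=10, alpha=0.5):
    """Build F = sum_j alpha * h_j stagewise, as in Eqs. (6) and (8)."""
    learners = []
    scores = [0.0] * len(xs)
    for _ in range(rounds):
        # For squared error, the negative gradient is the residual y - F(x).
        residuals = [y - s for y, s in zip(ys, scores)]
        h = fit_stump(xs, residuals)
        learners.append(h)
        scores = [s + alpha * h(x) for s, x in zip(scores, xs)]
    # The final decision takes the sign of the additive score.
    return lambda x: 1 if sum(alpha * h(x) for h in learners) >= 0 else -1


def mse(actual, predicted):
    """Eq. (9): mean of squared differences between actual and predicted."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical features per URL: [length, number of dots]; +1 marks a phish.
xs = [[10, 1], [12, 1], [55, 4], [60, 5], [15, 2], [70, 6]]
ys = [-1, -1, 1, 1, -1, 1]
model = boost(xs, ys)
preds = [model(x) for x in xs]
print(preds, mse(ys, preds))
```
]]></preformat>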
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>Map Reduce Phase</title>
<p>For each mapper, the subsequent mapper ID (phish_ID) corresponds to the input <italic>key</italic>, and the input <italic>value</italic> refers to a list of values. Two elements constitute the values; the first element identifies the value type, and the second element refers to the data itself. During each epoch, the value <italic>value</italic> corresponds to the output of the reducer in the previous epoch; it includes the updated <italic>U</italic>, <italic>P</italic>, <italic>p</italic>, and their accumulated approximate gradients.</p>
<p>The input dataset obtained from PhishTank (i.e., <uri xlink:href="http://data.phishtank.com/data/online-valid.csv">http://data.phishtank.com/data/online-valid.csv</uri>) is partitioned into a number of disjoint subsets that are stored as grids on a Hadoop Distributed File System (HDFS). After obtaining each key&#x2013;value pair, each mapper loads one subset from the HDFS into memory. Each mapper emits three types of intermediate keys, <italic>&#x03B2;</italic>_<italic>U</italic>, <italic>&#x03B2;</italic>_<italic>P</italic>, and <italic>&#x03B2;</italic>_<italic>p</italic>, that denote the increments of <italic>U</italic><sub><italic>n</italic></sub>, <italic>P</italic><sub><italic>n</italic></sub>, and <italic>p</italic><sub><italic>n</italic></sub>, respectively.</p>
<p>Three reducers are used to train the GLDG model. Each reducer reads one type <inline-formula id="ieqn-2">
<mml:math id="mml-ieqn-2"><mml:mi>&#x03B2;</mml:mi><mml:mi mathvariant="normal">&#x005F;</mml:mi><mml:msub><mml:mi>U</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:math>
</inline-formula>, <inline-formula id="ieqn-3">
<mml:math id="mml-ieqn-3"><mml:mi>&#x03B2;</mml:mi><mml:mi mathvariant="normal">&#x005F;</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:math>
</inline-formula>, or <inline-formula id="ieqn-4">
<mml:math id="mml-ieqn-4"><mml:mi>&#x03B2;</mml:mi><mml:mi mathvariant="normal">&#x005F;</mml:mi><mml:msub><mml:mi>p</mml:mi><mml:mi>n</mml:mi></mml:msub></mml:math>
</inline-formula> of the intermediate key&#x2013;value pairs as input and applies the reduce function, first to calculate the increments (i.e., computationally efficient feature) and then to update the parameter (i.e., website phishing detection). The reducer then emits the mapper ID as the output <italic>key</italic> and the resulting website phishing detection value as the output <italic>value</italic>.</p>
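<p>The data flow described above can be imitated in memory with the following minimal sketch. It is not Hadoop code: the mapper, shuffle, and reducer are plain functions, the record fields <monospace>dU</monospace>, <monospace>dP</monospace>, and <monospace>dp</monospace> are hypothetical placeholders for the per-record increments, and the additive update with a fixed step size is an assumption.</p>
<preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict


def mapper(phish_id, subset):
    """Emit (intermediate key, increment) pairs for one in-memory subset.

    phish_id is the input key; subset is the list of records loaded for it.
    """
    for record in subset:
        # Hypothetical fields holding each record's contribution per parameter.
        yield "beta_U", record["dU"]
        yield "beta_P", record["dP"]
        yield "beta_p", record["dp"]


def shuffle(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(key, increments, current, step=0.1):
    """Accumulate the increments of one key type and update its parameter."""
    return key, current + step * sum(increments)


# Two disjoint subsets keyed by a hypothetical mapper ID.
subsets = {
    1: [{"dU": 0.2, "dP": -0.1, "dp": 0.05}, {"dU": 0.4, "dP": 0.3, "dp": -0.05}],
    2: [{"dU": -0.1, "dP": 0.2, "dp": 0.1}],
}
params = {"beta_U": 1.0, "beta_P": 0.5, "beta_p": 0.0}

pairs = [pair for pid, subset in subsets.items() for pair in mapper(pid, subset)]
updated = dict(reducer(k, v, params[k]) for k, v in shuffle(pairs).items())
print(updated)
```
]]></preformat>
<p>One reducer call per key type mirrors the three reducers of the text; in an epoch-based run, <monospace>updated</monospace> would be fed back as the input value of the next epoch.</p>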
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Evaluation</title>
<p>The PLS-GLGB technique attains lower phishing detection overhead than existing methods because of the application of the probabilistic latent preprocessing algorithm. This algorithm uses two functions, namely, probability intersection and data maximum likelihood, to determine the missing data in the PhishTank dataset. Web phishing is processed only after the missing data are determined, which reduces the overhead incurred in web phishing detection.</p>
<p>In this section, the proposed PLS-GLGB for website phishing detection is evaluated. The method is implemented in Java using the MapReduce parallel programming model with the CloudSim simulator. Version 1.1.2 is adopted for the Hadoop cluster. The performance is assessed by the boosting technique to evaluate the effectiveness of our method. Four metrics, namely, precision, recall, time, and overhead, are used to analyze the results of our method.</p>
<p>The misclassification rate is evaluated by means of a confusion matrix. Comparisons are made with two state-of-the-art methods, namely, OFS-NN [<xref ref-type="bibr" rid="ref-1">1</xref>] and CatchPhish [<xref ref-type="bibr" rid="ref-2">2</xref>], using the PhishTank dataset. <xref ref-type="table" rid="table-1">Tab. 1</xref> presents the features of the phish records used for simulation and their descriptions.</p>
<table-wrap id="table-1"><label>Table 1</label>
<caption>
<title>PhishTank dataset details</title></caption>
<table frame="hsides"><colgroup><col align="left"/><col align="left"/><col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">S. No</th>
<th align="left">Features</th>
<th align="left">Description</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">1</td>
<td align="left">Phish_id</td>
<td align="left">The ID number by which PhishTank refers to the phish</td>
</tr>
<tr>
<td align="left">2</td>
<td align="left">Phish_detail_url</td>
<td align="left">PhishTank detail URL for the phish</td>
</tr>
<tr>
<td align="left">3</td>
<td align="left">url</td>
<td align="left">The phish URL</td>
</tr>
<tr>
<td align="left">4</td>
<td align="left">Submission_time</td>
<td align="left">The date and time at which the phish was reported to PhishTank</td>
</tr>
<tr>
<td align="left">5</td>
<td align="left">Verified</td>
<td align="left">Whether or not the phish has been verified</td>
</tr>
<tr>
<td align="left">6</td>
<td align="left">Verification_time</td>
<td align="left">The date and time at which the phish was verified</td>
</tr>
<tr>
<td align="left">7</td>
<td align="left">Online</td>
<td align="left">Whether or not the phish is online and operational</td>
</tr>
<tr>
<td align="left">8</td>
<td align="left">Target</td>
<td align="left">The name of the company or brand the phish is impersonating</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s5">
<label>5</label>
<title>Discussion</title>
<p>In this section, the performance of the proposed PLS-GLGB method is compared with that of the existing OFS-NN [<xref ref-type="bibr" rid="ref-1">1</xref>] and CatchPhish [<xref ref-type="bibr" rid="ref-2">2</xref>] methods in terms of three measures, namely, phishing detection time, phishing detection overhead, and confusion matrix. Details are provided in the following subsections with the aid of tables and graphs.</p>
<sec id="s5_1">
<label>5.1</label>
<title>Performance Measure of Phishing Detection Time</title>
<p>Phishing detection time refers to the time consumed in detecting website phishing. It is mathematically formulated as follows:<disp-formula id="eqn-13"><label>(13)</label>
<mml:math id="mml-eqn-13" display="block"><mml:mi>P</mml:mi><mml:msub><mml:mi>D</mml:mi><mml:mi>T</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:munderover><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:mo>&#x2061;</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x2217;</mml:mo><mml:mi>T</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mi>e</mml:mi><mml:mspace width="thickmathspace" /><mml:mrow><mml:mi>H</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>P</mml:mi><mml:msubsup><mml:mi>U</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math>
</disp-formula>In <xref ref-type="disp-formula" rid="eqn-13">Eq. (13)</xref>, phishing detection time <italic>PD</italic><sub><italic>T</italic></sub> is inferred from the latent phish identifier variable <italic>p</italic><sub><italic>i</italic></sub> and the time consumed in detecting <inline-formula id="ieqn-5">
<mml:math id="mml-ieqn-5"><mml:mi>T</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mi>e</mml:mi><mml:mspace width="thickmathspace" /><mml:mrow><mml:mi>H</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>P</mml:mi><mml:msubsup><mml:mi>U</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math>
</inline-formula>. It is measured in milliseconds (ms). <xref ref-type="table" rid="table-2">Tab. 2</xref> shows the results of our feature selection. The total time for detecting web phishing with 50 phish_id is 6.75&#x2005;ms, which is lower than the total times of 7.75 and 9.25&#x2005;ms for [<xref ref-type="bibr" rid="ref-1">1</xref>] and [<xref ref-type="bibr" rid="ref-2">2</xref>], respectively.</p>
<p><xref ref-type="fig" rid="fig-3">Fig. 3</xref> shows the phishing detection time with respect to different numbers of latent phish identifier variables ranging between 50 and 150. The graph suggests that the phishing detection time increases with the number of phish_id: an increase in the number of phish_id increases the features considered for phishing detection and therefore the phishing detection time. However, with 50 phish identifiers, PLS-GLGB consumes 6.75&#x2005;ms, whereas [<xref ref-type="bibr" rid="ref-1">1</xref>] and [<xref ref-type="bibr" rid="ref-2">2</xref>] consume 7.75 and 9.25&#x2005;ms, respectively.</p>
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Graphical representation of phishing detection time</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="CSSE_19300-fig-3.png"/>
</fig>
<p>The comparison shows that PLS-GLGB consumes less phishing detection time than [<xref ref-type="bibr" rid="ref-1">1</xref>] and [<xref ref-type="bibr" rid="ref-2">2</xref>]. This finding is attributed to the application of the PILSP model. With this model, path information with missing data is discarded from processing, and only the paths with sufficient information are used for detecting web phishing. Therefore, the web phishing detection time using PLS-GLGB is reduced by 28&#x0025; compared with [<xref ref-type="bibr" rid="ref-1">1</xref>] and 40&#x0025; compared with [<xref ref-type="bibr" rid="ref-2">2</xref>].</p>
</sec>
<sec id="s5_2">
<label>5.2</label>
<title>Performance Measure of Phishing Detection Overhead</title>
<p>Phishing detection overhead refers to the memory incurred during the detection of website phishing. It is mathematically expressed as follows:<disp-formula id="eqn-14"><label>(14)</label>
<mml:math id="mml-eqn-14" display="block"><mml:mi>P</mml:mi><mml:msub><mml:mi>D</mml:mi><mml:mi>O</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:munderover><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:munderover><mml:mo>&#x2061;</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x2217;</mml:mo><mml:mi>M</mml:mi><mml:mi>e</mml:mi><mml:mi>m</mml:mi><mml:mspace width="thickmathspace" /><mml:mrow><mml:mi>H</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>P</mml:mi><mml:msubsup><mml:mi>U</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math>
</disp-formula>In <xref ref-type="disp-formula" rid="eqn-14">Eq. (14)</xref>, phishing detection overhead <italic>PD</italic><sub><italic>O</italic></sub> is inferred from the latent phish identifier variable <italic>p</italic><sub><italic>i</italic></sub> and the memory incurred in detecting <inline-formula id="ieqn-6">
<mml:math id="mml-ieqn-6"><mml:mi>M</mml:mi><mml:mi>e</mml:mi><mml:mi>m</mml:mi><mml:mspace width="thickmathspace" /><mml:mrow><mml:mi>H</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>P</mml:mi><mml:msubsup><mml:mi>U</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msubsup></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:math>
</inline-formula>. It is measured in terms of kilobytes (KB). <xref ref-type="table" rid="table-3">Tab. 3</xref> shows the results of our feature selection in terms of overhead consumed.</p>
<table-wrap id="table-3"><label>Table 3</label>
<caption>
<title>Phishing detection overhead</title></caption>
<table frame="hsides"><colgroup><col align="left"/><col align="left"/><col align="left"/><col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="2">Phish_id</th>
<th align="center" colspan="3">Phishing detection overhead (KB)</th>
</tr>
<tr>
<th align="left">PLS-GLGB</th>
<th align="left">Optimal Feature Selection-Neural Network</th>
<th align="left">CatchPhish</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">50</td>
<td align="left">100</td>
<td align="left">150</td>
<td align="left">200</td>
</tr>
<tr>
<td align="left">100</td>
<td align="left">200</td>
<td align="left">300</td>
<td align="left">400</td>
</tr>
<tr>
<td align="left">150</td>
<td align="left">300</td>
<td align="left">450</td>
<td align="left">600</td>
</tr>
<tr>
<td align="left">200</td>
<td align="left">400</td>
<td align="left">600</td>
<td align="left">700</td>
</tr>
<tr>
<td align="left">250</td>
<td align="left">500</td>
<td align="left">650</td>
<td align="left">800</td>
</tr>
<tr>
<td align="left">300</td>
<td align="left">700</td>
<td align="left">800</td>
<td align="left">950</td>
</tr>
<tr>
<td align="left">350</td>
<td align="left">850</td>
<td align="left">900</td>
<td align="left">1100</td>
</tr>
<tr>
<td align="left">400</td>
<td align="left">1000</td>
<td align="left">1250</td>
<td align="left">1350</td>
</tr>
<tr>
<td align="left">450</td>
<td align="left">1150</td>
<td align="left">1400</td>
<td align="left">1500</td>
</tr>
<tr>
<td align="left">500</td>
<td align="left">1300</td>
<td align="left">1550</td>
<td align="left">1700</td>
</tr>
</tbody>
</table>
</table-wrap>
<p><xref ref-type="fig" rid="fig-4">Fig. 4</xref> shows the phishing detection overhead for 500 different identifiers collected at different submission times and possessing different URLs. The figure shows that the phishing detection overhead increases with the number of phish identifiers considered in the simulations. Increasing the number of phish identifiers increases the number of URLs to be processed and the targets to be reached, thereby increasing the overhead incurred.</p>
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>Graphical representation of phishing detection overhead</title></caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="CSSE_19300-fig-4.png"/>
</fig>
<p>However, the simulations conducted with 50 phish IDs indicated that the overhead incurred was 100 KB using PLS-GLGB, compared with 150 and 200 KB using [<xref ref-type="bibr" rid="ref-1">1</xref>] and [<xref ref-type="bibr" rid="ref-2">2</xref>], respectively. Therefore, the overhead is comparatively lower using PLS-GLGB than using [<xref ref-type="bibr" rid="ref-1">1</xref>] and [<xref ref-type="bibr" rid="ref-2">2</xref>] because of the application of the probabilistic latent preprocessing algorithm. Two functions, namely, PIF and data maximum likelihood, are used to determine the missing data in the PhishTank dataset. Web phishing is processed only after the missing data are determined. Thus, the overall overhead incurred in web phishing using PLS-GLGB is reduced by 23&#x0025; compared with [<xref ref-type="bibr" rid="ref-1">1</xref>] and 35&#x0025; compared with [<xref ref-type="bibr" rid="ref-2">2</xref>].</p>
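<p>The text does not state how the overall reduction percentages are derived. One reading that reproduces the reported 23&#x0025; and 35&#x0025; is the mean of the per-row relative reductions in <xref ref-type="table" rid="table-3">Tab. 3</xref>; the short sketch below checks this reading and should be taken as an assumption about the calculation rather than as the authors' procedure. The variable and function names are hypothetical.</p>
<preformat preformat-type="code"><![CDATA[
```python
# Overhead values (KB) copied from Tab. 3, for 50 to 500 phish IDs.
pls_glgb = [100, 200, 300, 400, 500, 700, 850, 1000, 1150, 1300]
ofs_nn = [150, 300, 450, 600, 650, 800, 900, 1250, 1400, 1550]
catchphish = [200, 400, 600, 700, 800, 950, 1100, 1350, 1500, 1700]


def mean_relative_reduction(proposed, baseline):
    """Average, over the rows, of (baseline - proposed) / baseline in percent."""
    ratios = [(b - p) / b for p, b in zip(proposed, baseline)]
    return 100 * sum(ratios) / len(ratios)


vs_ofs_nn = mean_relative_reduction(pls_glgb, ofs_nn)
vs_catchphish = mean_relative_reduction(pls_glgb, catchphish)
print(round(vs_ofs_nn), round(vs_catchphish))
```
]]></preformat>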
</sec>
<sec id="s5_3">
<label>5.3</label>
<title>Performance Measure of the Confusion Matrix</title>
<p>The confusion matrix corresponds to a table that is utilized to describe the performance of the GLDG classifier on a set of test data for which the true values are known. <xref ref-type="table" rid="table-4">Tab. 4</xref> shows an example of a confusion matrix.</p>
<p>The table (i.e., matrix) shows two probable predicted classes, namely, YES and NO; YES predicts the presence of a phishing attack, and NO indicates no attack. A total of 500 phish IDs are considered for prediction (i.e., 500 phish IDs were tested for the presence of phish attack). Among the 500 cases, the classifier predicted YES 455 times and NO 45 times. In reality, 430 phish IDs in the sample had an attack and 70 phish IDs did not. <xref ref-type="table" rid="table-5">Tab. 5</xref> represents the updated confusion matrix.</p>
<p>Several metrics, such as accuracy, misclassification rate, true positive rate, false positive rate, true negative rate, and precision, are usually evaluated from the confusion matrix for a binary classifier. In our work, misclassification rate is used and mathematically expressed as follows:<disp-formula id="eqn-15"><label>(15)</label>
<mml:math id="mml-eqn-15" display="block"><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:mi>F</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>N</mml:mi></mml:mrow><mml:mi>T</mml:mi></mml:mfrac></mml:mrow></mml:mstyle></mml:math>
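</disp-formula>where FP and FN are the numbers of false positives and false negatives, respectively, and T is the total number of phish IDs tested. Applying <xref ref-type="disp-formula" rid="eqn-15">Eq. (15)</xref> to the three methods gives <inline-formula id="ieqn-7">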
<mml:math id="mml-ieqn-7"><mml:mi>E</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi><mml:mi>g</mml:mi><mml:mspace width="thickmathspace" /><mml:mrow><mml:mi mathvariant="normal">P</mml:mi><mml:mi mathvariant="normal">L</mml:mi><mml:mi mathvariant="normal">S</mml:mi></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:mi mathvariant="normal">G</mml:mi><mml:mi mathvariant="normal">L</mml:mi><mml:mi mathvariant="normal">G</mml:mi><mml:mi mathvariant="normal">B</mml:mi></mml:mrow></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mn>45</mml:mn><mml:mo>+</mml:mo><mml:mn>20</mml:mn></mml:mrow><mml:mrow><mml:mn>500</mml:mn></mml:mrow></mml:mfrac></mml:mrow><mml:mo>=</mml:mo><mml:mn>0.13</mml:mn></mml:math>
</inline-formula>, E(using [<xref ref-type="bibr" rid="ref-1">1</xref>])<inline-formula id="ieqn-8">
<mml:math id="mml-ieqn-8"><mml:mtext>&#xA0;</mml:mtext><mml:mo>=</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mn>60</mml:mn><mml:mo>+</mml:mo><mml:mn>20</mml:mn></mml:mrow><mml:mrow><mml:mn>500</mml:mn></mml:mrow></mml:mfrac></mml:mrow><mml:mo>=</mml:mo><mml:mn>0.16</mml:mn></mml:math>
</inline-formula>, and E(using [<xref ref-type="bibr" rid="ref-2">2</xref>])<inline-formula id="ieqn-9">
<mml:math id="mml-ieqn-9"><mml:mtext>&#xA0;</mml:mtext><mml:mo>=</mml:mo><mml:mrow><mml:mfrac><mml:mrow><mml:mn>80</mml:mn><mml:mo>+</mml:mo><mml:mn>20</mml:mn></mml:mrow><mml:mrow><mml:mn>500</mml:mn></mml:mrow></mml:mfrac></mml:mrow><mml:mo>=</mml:mo><mml:mn>0.20</mml:mn></mml:math>
</inline-formula>. <xref ref-type="table" rid="table-6">Tab. 6</xref> shows the final misclassification rate using the confusion matrix.</p>
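<p>The arithmetic behind these three values can be reproduced with a short Python sketch. The function name <monospace>misclassification_rate</monospace> and the dictionary labels are hypothetical; the error counts are the numerators shown above, and reading the first term as FP and the second as FN for [<xref ref-type="bibr" rid="ref-1">1</xref>] and [<xref ref-type="bibr" rid="ref-2">2</xref>] is an assumption made by analogy with the PLS-GLGB case.</p>
<code language="python"><![CDATA[
```python
def misclassification_rate(fp: int, fn: int, total: int) -> float:
    """Misclassification rate E = (FP + FN) / T."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (fp + fn) / total


TOTAL = 500  # number of phish IDs tested

# (FP, FN) per method; the FP/FN order for the two baselines is assumed.
ERRORS = {
    "PLS-GLGB": (45, 20),
    "OFS-NN [1]": (60, 20),
    "CatchPhish [2]": (80, 20),
}

rates = {
    name: round(misclassification_rate(fp, fn, TOTAL), 2)
    for name, (fp, fn) in ERRORS.items()
}

for name, rate in rates.items():
    print(f"{name}: {rate:.2f}")
```
]]></code>
<p>Because accuracy is the complement of the misclassification rate, the PLS-GLGB value of 0.13 corresponds to an accuracy of 0.87 on the 500 tested phish IDs.</p>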
<p>The misclassification rate was 0.13 using the proposed PLS-GLGB, compared with 0.16 and 0.20 using [<xref ref-type="bibr" rid="ref-1">1</xref>] and [<xref ref-type="bibr" rid="ref-2">2</xref>], respectively. The misclassification rate of the PLS-GLGB method is lower than that of [<xref ref-type="bibr" rid="ref-1">1</xref>] and [<xref ref-type="bibr" rid="ref-2">2</xref>] because of the application of the GLDG website phishing detection algorithm. The contrast divergence function is utilized to master the data space, which in turn minimizes the error. In addition, the sample vector is rationalized via the sine and cosine association, where the primary individual population escapes the local optimum via the greedy Levy operation.</p>
</sec>
</sec>
<sec id="s6">
<label>6</label>
<title>Conclusion</title>
<p>Website phishing is a very common social engineering attack that targets the end users of an institution, and it is considered one of the most dangerous attacks in recent years. Several studies on detecting and mitigating phishing attacks have been conducted. Conventional methods focus on neural network models to address phishing attacks. We proposed PLS-GLGB, which uses MapReduce, as a novel solution for website phishing detection. PLS-GLGB can handle network traffic dynamics containing phishing attacks. It provides a computationally efficient phishing detection mechanism because it utilizes probabilistic latent preprocessing. Specifically, we first obtained computationally efficient preprocessed phish data using this preprocessing model. Then, we built a trustworthy system using GLDG website phishing detection to minimize the misclassification rate computed from the confusion matrix. We theoretically and experimentally evaluated PLS-GLGB, and the experiments showed lower phishing detection time, overhead, and misclassification rate than state-of-the-art works. However, the time consumption and overhead during phishing detection in websites were not reduced to the required level. A deep learning method can be introduced in the future to further enhance the phishing detection performance.</p>
<table-wrap id="table-6"><label>Table 6</label>
<caption>
<title>Misclassification rate (using the confusion matrix)</title></caption>
<table frame="hsides"><colgroup><col align="left"/><col align="left"/><col align="left"/><col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">N&#x2009;&#x003D;&#x2009;500</th>
<th align="left">PLS-GLGB</th>
<th align="left">Optimal Feature Selection-Neural Network</th>
<th align="left">CatchPhish</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">Misclassification rate</td>
<td align="left">0.13</td>
<td align="left">0.16</td>
<td align="left">0.20</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="table-5"><label>Table 5</label>
<caption>
<title>Updated confusion matrix</title></caption>
<table frame="hsides"><colgroup><col align="left"/><col align="left"/><col align="left"/><col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">N&#x2009;&#x003D;&#x2009;500 (phish IDs)</th>
<th align="left">Predicted: NO</th>
<th align="left">Predicted: YES</th>
<th align="left">Total</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left"><bold>Actual: NO</bold></td>
<td align="left">TN&#x2009;&#x003D;&#x2009;25</td>
<td align="left">FP&#x2009;&#x003D;&#x2009;45</td>
<td align="left">70</td>
</tr>
<tr>
<td align="left"><bold>Actual: YES</bold></td>
<td align="left">FN&#x2009;&#x003D;&#x2009;20</td>
<td align="left">TP&#x2009;&#x003D;&#x2009;410</td>
<td align="left">430</td>
</tr>
<tr>
<td align="left"><bold>Total</bold></td>
<td align="left">45</td>
<td align="left">455</td>
<td align="left">500</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="table-4"><label>Table 4</label>
<caption>
<title>Example of a confusion matrix</title></caption>
<table frame="hsides"><colgroup><col align="left"/><col align="left"/><col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">N&#x2009;&#x003D;&#x2009;500 (phish IDs)</th>
<th align="left">Predicted: NO</th>
<th align="left">Predicted: YES</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left"><bold>Actual: NO</bold></td>
<td align="left">25</td>
<td align="left">45</td>
</tr>
<tr>
<td align="left"><bold>Actual: YES</bold></td>
<td align="left">20</td>
<td align="left">410</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="table-2"><label>Table 2</label>
<caption>
<title>Phishing detection time</title></caption>
<table frame="hsides"><colgroup><col align="left"/><col align="left"/><col align="left"/><col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="2">Phish_id</th>
<th align="left" colspan="3">Phishing detection time (ms)</th>
</tr>
<tr>
<th align="left">PLS-GLGB</th>
<th align="left">Optimal Feature Selection-Neural Network</th>
<th align="left">CatchPhish</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">50</td>
<td align="left">6.75</td>
<td align="left">7.75</td>
<td align="left">9.25</td>
</tr>
<tr>
<td align="left">100</td>
<td align="left">8.25</td>
<td align="left">12.45</td>
<td align="left">14.55</td>
</tr>
<tr>
<td align="left">150</td>
<td align="left">11.55</td>
<td align="left">16.26</td>
<td align="left">18.95</td>
</tr>
<tr>
<td align="left">200</td>
<td align="left">13.35</td>
<td align="left">19.15</td>
<td align="left">23.25</td>
</tr>
<tr>
<td align="left">250</td>
<td align="left">14.25</td>
<td align="left">22.35</td>
<td align="left">25.15</td>
</tr>
<tr>
<td align="left">300</td>
<td align="left">19.15</td>
<td align="left">25.55</td>
<td align="left">30.25</td>
</tr>
<tr>
<td align="left">350</td>
<td align="left">21.25</td>
<td align="left">30.15</td>
<td align="left">35.55</td>
</tr>
<tr>
<td align="left">400</td>
<td align="left">23.55</td>
<td align="left">33.25</td>
<td align="left">40.35</td>
</tr>
<tr>
<td align="left">450</td>
<td align="left">25.55</td>
<td align="left">35.55</td>
<td align="left">45.55</td>
</tr>
<tr>
<td align="left">500</td>
<td align="left">30.15</td>
<td align="left">40.35</td>
<td align="left">50.25</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</body>
<back><fn-group>
<fn fn-type="other">
<p><bold>Funding Statement:</bold> The authors received no specific funding for this study.</p>
</fn>
<fn fn-type="conflict">
<p><bold>Conflicts of Interest:</bold> The authors declare no conflicts of interest to report.</p>
</fn>
</fn-group>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>E.</given-names> <surname>Zhu</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Ye</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Li</surname></string-name> and <string-name><given-names>F.</given-names> <surname>Liu</surname></string-name></person-group>, &#x201C;<article-title>OFS-NN: An effective phishing websites detection model based on optimal feature selection and neural network</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>7</volume>, pp. <fpage>73271</fpage>&#x2013;<lpage>73284</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>R. S.</given-names> <surname>Rao</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Vaishnavi</surname></string-name> and <string-name><given-names>A. R.</given-names> <surname>Pais</surname></string-name></person-group>, &#x201C;<article-title>CatchPhish: Detection of phishing websites by inspecting URLs</article-title>,&#x201D; <source>Ambient Intelligence &#x0026; Humanized Computing</source>, vol. <volume>11</volume>, pp. <fpage>813</fpage>&#x2013;<lpage>825</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Cuzzocrea</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Martinelli</surname></string-name> and <string-name><given-names>F.</given-names> <surname>Mercaldo</surname></string-name></person-group>, &#x201C;<article-title>A machine learning framework for supporting intelligent web-phishing detection and analysis</article-title>,&#x201D; in <conf-name>Proc. IDEAS</conf-name>, <conf-loc>New York, NY, USA</conf-loc>, pp. <fpage>1</fpage>&#x2013;<lpage>3</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>O. K.</given-names> <surname>Sahingoz</surname></string-name>, <string-name><given-names>E.</given-names> <surname>Buber</surname></string-name>, <string-name><given-names>O.</given-names> <surname>Demir</surname></string-name> and <string-name><given-names>B.</given-names> <surname>Diri</surname></string-name></person-group>, &#x201C;<article-title>Machine learning based phishing detection from URLs</article-title>,&#x201D; <source>Expert Systems with Applications</source>, vol. <volume>117</volume>, pp. <fpage>345</fpage>&#x2013;<lpage>357</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>R. S.</given-names> <surname>Rao</surname></string-name> and <string-name><given-names>A. R.</given-names> <surname>Pais</surname></string-name></person-group>, &#x201C;<article-title>Two level filtering mechanism to detect phishing sites using lightweight visual similarity approach</article-title>,&#x201D; <source>Ambient Intelligence &#x0026; Humanized Computing</source>, vol. <volume>11</volume>, pp. <fpage>3853</fpage>&#x2013;<lpage>3872</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>G.</given-names> <surname>Varshney</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Misra</surname></string-name> and <string-name><given-names>P. K.</given-names> <surname>Atrey</surname></string-name></person-group>, &#x201C;<article-title>A survey and classification of web phishing detection schemes</article-title>,&#x201D; <source>Security &#x0026; Communication Networks</source>, vol. <volume>9</volume>, no. <issue>18</issue>, pp. <fpage>6266</fpage>&#x2013;<lpage>6284</lpage>, <year>2016</year>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Mao</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Bian</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Tian</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Zhu</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Wei</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Phishing page detection via learning classifiers from page layout feature</article-title>,&#x201D; <source>EURASIP Journal on Wireless Communications &#x0026; Networking</source>, vol. <volume>2019</volume>, no. <issue>43</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>14</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A. A.</given-names> <surname>Orunsolu</surname></string-name>, <string-name><given-names>A. S.</given-names> <surname>Sodiya</surname></string-name> and <string-name><given-names>A. T.</given-names> <surname>Akinwale</surname></string-name></person-group>, &#x201C;<article-title>A predictive model for phishing detection</article-title>,&#x201D; <source>Journal of King Saud University-Computer and Information Sciences</source>, pp. <fpage>1</fpage>&#x2013;<lpage>16</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Chin</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Xiong</surname></string-name> and <string-name><given-names>C.</given-names> <surname>Hu</surname></string-name></person-group>, &#x201C;<article-title>Phishlimiter: A phishing detection and mitigation approach using software-defined networking</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>6</volume>, pp. <fpage>42516</fpage>&#x2013;<lpage>42531</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>E.</given-names> <surname>Zhu</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Ye</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Li</surname></string-name> and <string-name><given-names>F.</given-names> <surname>Liu</surname></string-name></person-group>, &#x201C;<article-title>OFS-NN: An effective phishing website detection model based on optimal feature selection and neural network</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>7</volume>, pp. <fpage>73271</fpage>&#x2013;<lpage>73284</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>G. G.</given-names> <surname>Geng</surname></string-name>, <string-name><given-names>X. D.</given-names> <surname>Lee</surname></string-name> and <string-name><given-names>Y. M.</given-names> <surname>Zhang</surname></string-name></person-group>, &#x201C;<article-title>Combating phishing attacks via brand identity and authorization features</article-title>,&#x201D; <source>Security &#x0026; Communication Networks</source>, vol. <volume>8</volume>, no. <issue>6</issue>, pp. <fpage>888</fpage>&#x2013;<lpage>898</lpage>, <year>2014</year>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Feng</surname></string-name> and <string-name><given-names>C.</given-names> <surname>Yue</surname></string-name></person-group>, &#x201C;<article-title>Visualizing and interpreting RNN models in URL-based phishing detection</article-title>,&#x201D; in <conf-name>Proc. SACMAT</conf-name>, <conf-loc>Barcelona, Spain</conf-loc>, pp. <fpage>13</fpage>&#x2013;<lpage>24</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>W.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Luo</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Zhang</surname></string-name></person-group>, &#x201C;<article-title>PDRCNN: Precise phishing detection with recurrent convolutional neural networks</article-title>,&#x201D; <source>Security &#x0026; Communication Networks</source>, vol. <volume>2019</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>15</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>R. S.</given-names> <surname>Rao</surname></string-name> and <string-name><given-names>A. R.</given-names> <surname>Pais</surname></string-name></person-group>, &#x201C;<article-title>Detection of phishing websites using an efficient feature-based machine learning framework</article-title>,&#x201D; <source>Neural Computing &#x0026; Applications</source>, vol. <volume>31</volume>, pp. <fpage>3851</fpage>&#x2013;<lpage>3873</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Gangavarapu</surname></string-name>, <string-name><given-names>C. D.</given-names> <surname>Jaidhar</surname></string-name> and <string-name><given-names>B.</given-names> <surname>Chanduka</surname></string-name></person-group>, &#x201C;<article-title>Applicability of machine learning in spam and phishing email filtering: Review and approaches</article-title>,&#x201D; <source>Artificial Intelligence Review</source>, vol. <volume>53</volume>, pp. <fpage>5019</fpage>&#x2013;<lpage>5081</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>R. T.</given-names> <surname>Pashiri</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Rostami</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Mahrami</surname></string-name></person-group>, &#x201C;<article-title>Spam detection through feature selection using artificial neural network and sine&#x2013;cosine algorithm</article-title>,&#x201D; <source>Mathematical Sciences</source>, vol. <volume>14</volume>, pp. <fpage>193</fpage>&#x2013;<lpage>199</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>P.</given-names> <surname>Yi</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Guan</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Zou</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Yao</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Wang</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Web phishing detection using a deep learning framework</article-title>,&#x201D; <source>Wireless Communications &#x0026; Mobile Computing</source>, vol. <volume>2018</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>10</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Satheesh Kumar</surname></string-name>, <string-name><given-names>K. G.</given-names> <surname>Srinivasagan</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Ben-Othman</surname></string-name></person-group>, &#x201C;<article-title>Sniff phish: A novel framework for resource intensive computation in cloud to detect email scam</article-title>,&#x201D; <source>Transactions on Emerging Telecommunications Technologies</source>, vol. <volume>30</volume>, no. <issue>6</issue>, pp. <fpage>e3590</fpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Hajiali</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Amirmazlaghani</surname></string-name> and <string-name><given-names>H.</given-names> <surname>Kordestani</surname></string-name></person-group>, &#x201C;<article-title>Preventing phishing attacks using text and image watermarking</article-title>,&#x201D; <source>Concurrency &#x0026; Computation: Practice &#x0026; Experience</source>, vol. <volume>31</volume>, no. <issue>13</issue>, pp. <fpage>e5083</fpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A. S.</given-names> <surname>Bozkir</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Aydos</surname></string-name></person-group>, &#x201C;<article-title>LogoSENSE: A companion HOG based logo detection scheme for phishing webpage and E-mail brand recognition</article-title>,&#x201D; <source>Computers &#x0026; Security</source>, vol. <volume>95</volume>, pp. <fpage>101855</fpage>&#x2013;18, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Aassal</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Baki</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Das</surname></string-name> and <string-name><given-names>R. M.</given-names> <surname>Verma</surname></string-name></person-group>, &#x201C;<article-title>An in-depth benchmarking and evaluation of phishing detection research for security needs</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>8</volume>, pp. <fpage>22170</fpage>&#x2013;<lpage>22192</lpage>, <year>2020</year>.</mixed-citation></ref>
</ref-list>
</back>
</article>