<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CSSE</journal-id>
<journal-id journal-id-type="nlm-ta">CSSE</journal-id>
<journal-id journal-id-type="publisher-id">CSSE</journal-id>
<journal-title-group>
<journal-title>Computer Systems Science &#x0026; Engineering</journal-title>
</journal-title-group>
<issn pub-type="ppub">0267-6192</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">40212</article-id>
<article-id pub-id-type="doi">10.32604/csse.2023.040212</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Performance Analysis of Intelligent Neural-Based Deep Learning System on Rank Images Classification</article-title>
<alt-title alt-title-type="left-running-head">Performance Analysis of Intelligent Neural-Based Deep Learning System on Rank Images Classification</alt-title>
<alt-title alt-title-type="right-running-head">Performance Analysis of Intelligent Neural-Based Deep Learning System on Rank Images Classification</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Siddiqi</surname><given-names>Muhammad Hameed</given-names></name><xref ref-type="aff" rid="aff-1">1</xref><email>mhsiddiqi@ju.edu.sa</email></contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Khan</surname><given-names>Asfandyar</given-names></name><xref ref-type="aff" rid="aff-2">2</xref></contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western"><surname>Khan</surname><given-names>Muhammad Bilal</given-names></name><xref ref-type="aff" rid="aff-2">2</xref></contrib>
<contrib id="author-4" contrib-type="author">
<name name-style="western"><surname>Khan</surname><given-names>Abdullah</given-names></name><xref ref-type="aff" rid="aff-2">2</xref></contrib>
<contrib id="author-5" contrib-type="author">
<name name-style="western"><surname>Alruwaili</surname><given-names>Madallah</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-6" contrib-type="author">
<name name-style="western"><surname>Alanazi</surname><given-names>Saad</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<aff id="aff-1"><label>1</label><institution>College of Computer and Information Sciences, Jouf University</institution>, <addr-line>Sakaka, Aljouf, 73211</addr-line>, <country>Kingdom of Saudi Arabia</country></aff>
<aff id="aff-2"><label>2</label><institution>Institute of Computer Science and Information Technology, ICS/IT FMCS, University of Agriculture</institution>, <addr-line>Peshawar, 25000</addr-line>, <country>Pakistan</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Muhammad Hameed Siddiqi. Email: <email>mhsiddiqi@ju.edu.sa</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic"><year>2023</year></pub-date>
<pub-date date-type="pub" publication-format="electronic"><day>28</day><month>7</month><year>2023</year></pub-date>
<volume>47</volume>
<issue>2</issue>
<fpage>2219</fpage>
<lpage>2239</lpage>
<history>
<date date-type="received"><day>09</day><month>3</month><year>2023</year></date>
<date date-type="accepted"><day>25</day><month>5</month><year>2023</year></date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2023 Siddiqi et al.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Siddiqi et al.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CSSE_40212.pdf"></self-uri>
<abstract>
<p>Internet use has grown worldwide on a daily basis over the last two decades. This growth has been accompanied by a rise in sexual crimes such as sexual misuse, domestic violence, and child pornography. Considerable research has been done on pornographic image detection and classification. Most existing approaches rely on machine learning techniques, which show less accuracy, whereas deep learning models used for classification and detection perform better. Therefore, this research evaluates the performance of intelligent neural-based deep learning models, based on a convolutional neural network (CNN), Visual Geometry Group networks (VGG-16 and VGG-14), and a Residual Network (ResNet-50), trained on an expanded dataset using transfer learning approaches applied in the fully connected layer, to classify rank (Pornographic <italic>vs.</italic> Nonpornographic) images. The simulation results show that VGG-16 without augmented data performed better than the other models used in this study. The VGG-16 model with augmented data reached training and validation accuracies of 0.97 and 0.94 with losses of 0.070 and 0.16; the precision, recall, and f-measure values for explicit and non-explicit images are (0.94, 0.94, 0.94) and (0.94, 0.94, 0.94). Similarly, the VGG-14 model with augmented data reached training and validation accuracies of 0.98 and 0.96 with losses of 0.059 and 0.11; the f-measure, recall, and precision values for explicit and non-explicit images are (0.98, 0.98, 0.98) and (0.98, 0.98, 0.98). The CNN model with augmented data reached training and validation accuracies of 0.776 and 0.78 with losses of 0.48 and 0.46; the f-measure, recall, and precision values for explicit and non-explicit images are (0.80, 0.80, 0.80) and (0.78, 0.79, 0.78). The ResNet-50 model with expanded data reached a training accuracy of 0.89 with a loss of 0.389 and a validation accuracy of 0.86 with a loss of 0.47; the f-measure, recall, and precision values for explicit and non-explicit images are (0.86, 0.97, 0.91) and (0.86, 0.93, 0.89). Without augmented data, the VGG-16 model reached training and validation accuracies of 0.997 and 0.986 with losses of 0.008 and 0.056; the f-measure, recall, and precision values for explicit and non-explicit images are (0.94, 0.99, 0.97) and (0.99, 0.93, 0.96), outperforming the models trained with the augmented dataset in this study.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>VGG-16</kwd>
<kwd>VGG-14</kwd>
<kwd>pornography detection</kwd>
<kwd>expansion</kwd>
<kwd>ResNet-50</kwd>
<kwd>convolution neural network (CNN)</kwd>
<kwd>machine learning</kwd>
</kwd-group>
<funding-group>
<award-group id="awg1">
<funding-source>Deanship of Scientific Research at Jouf University</funding-source>
<award-id>DSR&#x2013;2022&#x2013;RG&#x2013;0101</award-id>
</award-group>
</funding-group>
</article-meta>
</front>
<body>
<sec id="s1"><label>1</label><title>Introduction</title>
<p>The use of the internet has increased worldwide on a daily basis over the last two decades. In modern society, the internet is used to access global information [<xref ref-type="bibr" rid="ref-1">1</xref>]. According to &#x201C;We Are Social and Hootsuite&#x201D; in 2019 [<xref ref-type="bibr" rid="ref-2">2</xref>], 45&#x0025; of the world&#x2019;s population now uses the internet. As the number of internet users grows, communication network resources are used more actively, so resource information is easily shared on the internet at any time, and people can easily promote various information tools and share information on the internet [<xref ref-type="bibr" rid="ref-1">1</xref>]. With the growing number of internet users, misuse of the internet has also increased, and the spread of various pornographic information over the internet is widely reported [<xref ref-type="bibr" rid="ref-3">3</xref>&#x2013;<xref ref-type="bibr" rid="ref-6">6</xref>].</p>
<p>On the other side, the authors in [<xref ref-type="bibr" rid="ref-7">7</xref>] reported that negative content such as pornography has significantly increased with the use of the internet over the past ten years. This negative internet content leads to many moral issues and social problems and affects people&#x2019;s normal lives, especially teenagers [<xref ref-type="bibr" rid="ref-8">8</xref>,<xref ref-type="bibr" rid="ref-9">9</xref>]. Moreover, various studies have described how the increase in negative internet content such as pornography drives sensual crimes, domestic violence, sexual manipulation, and child pornography at an upward rate [<xref ref-type="bibr" rid="ref-10">10</xref>]. Nowadays, the majority of people use different kinds of social media platforms to share information and communicate with the world. Many young children use social media platforms on their smartphones for playing games, during which pornographic material and ads are shared, badly affecting the minds of teenage kids. On the other hand, it is also widely reported that information related to pornographic content is distributed on the internet.</p>
<p>Internet pornography affects many people&#x2019;s lives, especially adolescents, and creates many social problems and moral issues [<xref ref-type="bibr" rid="ref-1">1</xref>]. It is important to recognize internet pornography so that internet resources are used according to proper principles and healthy internet use is promoted. Therefore, a pornographic image recognition model is needed that effectively detects and recognizes pornographic images [<xref ref-type="bibr" rid="ref-7">7</xref>,<xref ref-type="bibr" rid="ref-11">11</xref>,<xref ref-type="bibr" rid="ref-12">12</xref>]. Owing to the developments in computer vision, content-based internet pornographic image recognition has been widely investigated [<xref ref-type="bibr" rid="ref-13">13</xref>]. In the past decades many researchers have worked to resolve this problem; however, many challenges remain in the existing detection methods for pornographic images. First, distinguishing pornographic from non-pornographic images, where pornographic images are treated as category A and normal images as category B, is a binary classification task [<xref ref-type="bibr" rid="ref-1">1</xref>]. Currently, deep learning methods in the field of image recognition have attracted great attention from researchers and achieved great success in image-based pornography detection. For example, to automatically extract pornographic image features, [<xref ref-type="bibr" rid="ref-14">14</xref>] presented multi-layer convolutional neural networks, which show better performance in the field of pornography detection.</p>
<p>Moreover, in [<xref ref-type="bibr" rid="ref-15">15</xref>], the authors developed an efficient model for child pornographic image detection based on a skin tone filter with a novel set of facial features to increase image recognition performance. Similarly, to further improve pornography detection, the authors of [<xref ref-type="bibr" rid="ref-16">16</xref>] presented a new model that combines transfer learning with multiple feature fusion, which gives better performance for negative content image recognition and shows the usefulness of transfer learning. The authors of [<xref ref-type="bibr" rid="ref-17">17</xref>] proposed a human pose model for sexual organ detection. Similarly, in [<xref ref-type="bibr" rid="ref-18">18</xref>], the authors used texture features and a maximum posterior approach for pornographic classification. The authors of [<xref ref-type="bibr" rid="ref-19">19</xref>] proposed a deformable part model and support vector machine (SVM) for sexual organ detection. Moreover, in [<xref ref-type="bibr" rid="ref-20">20</xref>], the authors used a high-level semantic tree model and SVM for pornographic classification, achieving 87.6&#x0025; accuracy. Furthermore, the authors of [<xref ref-type="bibr" rid="ref-21">21</xref>] implemented color features with convolutional neural networks (CNN) to classify pornography and upskirt images, which showed less accuracy. More specifically, in [<xref ref-type="bibr" rid="ref-22">22</xref>], the authors used artificial neural networks (ANN) to increase the reliability of skin detectors, which also did not perform well. Further, the authors of [<xref ref-type="bibr" rid="ref-23">23</xref>] proposed CNN training strategies for skin detection, which perform better than the ANN. Similarly, in [<xref ref-type="bibr" rid="ref-24">24</xref>], the authors implemented a combination of a recurrent neural network (RNN) and a CNN for human skin detection.</p>
<p>Similarly, for Pornographic <italic>vs.</italic> Nonpornographic classification, [<xref ref-type="bibr" rid="ref-14">14</xref>] proposed AlexNet and GoogLeNet models that achieved 94&#x0025; accuracy. Furthermore, the authors of [<xref ref-type="bibr" rid="ref-25">25</xref>] used a GoogLeNet CNN model and a color feature histogram for Pornographic <italic>vs.</italic> Nonpornographic image classification on the NPDI dataset. Likewise, the authors of [<xref ref-type="bibr" rid="ref-26">26</xref>] used CNN with multiple instance learning (MIL) to detect exposed body parts on the NPDI dataset. The authors of [<xref ref-type="bibr" rid="ref-27">27</xref>] proposed a GoogLeNet model for region-based recognition of sexual organs on the NPDI dataset. In [<xref ref-type="bibr" rid="ref-28">28</xref>], the authors proposed CNN and GoogLeNet architectures for video pornography detection. Most of the models mentioned above rely on machine learning techniques, which show less accuracy, while deep learning models used for classification and detection show high performance. Therefore, a new deep learning model is needed for the accurate classification of nudity images. To further enhance deep learning for accurate classification, this paper presents deep learning models for Pornographic <italic>vs.</italic> Nonpornographic classification in images. The deep learning models are based on CNN, VGG-14, VGG-16, and ResNet-50 with expansion, and VGG-16 without expansion, which are used to classify Pornographic <italic>vs.</italic> Nonpornographic images. The main contributions of this study are:
<list list-type="simple">
<list-item><label>&#x25A0;</label><p>This paper proposes ResNet-50, VGG-16, and its variant for the classification of rank images.</p></list-item>
<list-item><label>&#x25A0;</label><p>This paper describes the pre-trained model VGG-16 as a feature extractor for transfer learning on the NPDI images Dataset.</p></list-item>
<list-item><label>&#x25A0;</label><p>Further, the performance of the deep learning models, i.e., CNN, ResNet-50, VGG-14, and VGG-16, is checked using accuracy, loss, precision, recall, and f-measure.</p></list-item>
<list-item><label>&#x25A0;</label><p>The proposed models use transfer learning approaches in the fully connected layer of the network.</p></list-item>
</list></p>
<p>The rest of the paper is organized as follows: Section 2 discusses the related work, Section 3 describes the learning models, Section 4 presents the methodology, Section 5 defines the evaluation parameters, Section 6 explains the results and discussion, and the final section concludes this research.</p>
</sec>
<sec id="s2"><label>2</label><title>Related Work</title>
<p>A lot of existing research has shown that machine learning algorithms perform well and produce more accurate findings than earlier approaches. This section reviews some of the best and most efficient learning strategies, shedding light on earlier developments that numerous researchers have suggested for enhancing the learning efficacy of their networks to obtain favorable and promising results in this category. The early two decades of the 21<sup>st</sup> century have witnessed more information being available on the internet than ever before. The use of the internet is increasing explosively, and as a significant information transmission medium it offers people universal access to searching web images. However, pornographic images are widely available on the internet and, due to the lack of control over information sources, affect our young generation badly. Various studies have been conducted to detect Pornographic <italic>vs.</italic> Nonpornographic images. The authors of [<xref ref-type="bibr" rid="ref-29">29</xref>] suggested a new approach using a Multi-Layer Perceptron combined with fuzzy integral-based information fusion for identifying pornographic images. Similarly, in [<xref ref-type="bibr" rid="ref-30">30</xref>], the authors proposed principal component analysis (PCA) to gain high accuracy and detect the desired area properly with a high rate of feature extraction; the results showed that the proposed method increases the accuracy by up to 4.0&#x0025; and decreases the false positive ratio (FPR) by 20.6&#x0025;. The authors of [<xref ref-type="bibr" rid="ref-27">27</xref>] worked on nude image classification, proposing a local-context-aware network (LocoaNet) to classify nude images. By developing a multi-task learning scheme and employing an obscene detection network, LocoaNet can extract negative features for obscene image classification. The result analysis shows that the model achieved better results with high processing speed on both datasets. The authors of [<xref ref-type="bibr" rid="ref-19">19</xref>] developed a new model for obscene image detection based on traditional detection with shape features: sexual organs are represented using a histogram of gradient-based shapes, and the sexual organ detector is trained by the color-saliency preserved mixture deformable part model (CPMDPM). The evaluation shows that the performance of CPMDPM is superior to shape feature-based detectors. In [<xref ref-type="bibr" rid="ref-18">18</xref>], the authors developed a new model named Adult Content Recognition with Deep Neural Networks (ACRDNN) for adult video classification. ACRDNN is a combination of convolutional networks (ConvNet) and long short-term memory (LSTM). The performance evaluation shows that the proposed model outperforms the state-of-the-art for the detection of obscene images. Furthermore, in [<xref ref-type="bibr" rid="ref-27">27</xref>], the authors designed multiple instance learning (MIL) approaches to detect pornographic images, proposing weighted MIL under the CNN framework. The dataset consisted of 138&#x2005;K obscene images and 205&#x2005;K normal images, and the overall results show that the model achieved 97.52&#x0025; accuracy. The authors of [<xref ref-type="bibr" rid="ref-1">1</xref>] designed a pornographic image and text detection system for gambling websites that uses a decision mechanism. The performance evaluation shows that the model achieved highly satisfactory results, and the framework is feasible and effortless for the detection of such items. Likewise, the authors of [<xref ref-type="bibr" rid="ref-31">31</xref>] proposed a COCO-trained you only look once (YOLOv3) model for nudity detection. In addition, they used CNN models such as AlexNet, GoogLeNet, VGG16, Inception v3, and ResNet for feature extraction. The proposed model classifies images into two classes, nude and normal, and the performance evaluation shows that the YOLO-CNN model achieved high efficiency compared to standard CNN models. <xref ref-type="table" rid="table-1">Table 1</xref> shows the comparative study of previous research in this paper.</p>
<table-wrap id="table-1"><label>Table 1</label><caption><title>Cooperative study</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead valign="top">
<tr>
<th align="left">Existing works</th>
<th align="left">Methodology</th>
<th align="left">Dataset</th>
<th align="left">Result</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-15">15</xref>]</td>
<td align="left">Skin tone filter with a novel set of facial features systems</td>
<td align="left">For child pornographic image detection</td>
<td align="left">80&#x0025; accuracy rate<break/>TP 83&#x0025;</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-16">16</xref>]</td>
<td align="left">Transfer learning with multiple feature fusion, recognition,</td>
<td align="left">And negative content images</td>
<td align="left">With 0.91&#x0025; accuracy</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-17">17</xref>]</td>
<td align="left">Proposed a human pose model for sexual organ detection.</td>
<td align="left">The sexual organ detection dataset</td>
<td align="left">89.50&#x0025; detection rate &#x0026; with an 18.64&#x0025; error rate</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-18">18</xref>]</td>
<td align="left">Maximum posterior approach for pornographic classification.</td>
<td align="left">Pornographic classification</td>
<td align="left">With a 0.8 low error rate</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-19">19</xref>]</td>
<td align="left">Proposed deformable part model and SVM for sexual organ detection.</td>
<td align="left">Sexual organ detection</td>
<td align="left">50&#x0025; higher result in the used models</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-20">20</xref>]</td>
<td align="left">The semantic tree model and SVM</td>
<td align="left">Pornographic classification</td>
<td align="left">Achieved 87.6&#x0025; accuracy.</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-21">21</xref>]</td>
<td align="left">Furthermore, the authors of implementing color features with CNN</td>
<td align="left">Pornography, and upskirt images dataset</td>
<td align="left">Show 90.23&#x0025; accuracy</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-22">22</xref>]</td>
<td align="left">ANN to increase the reliability of detectors skin</td>
<td align="left">detectors skin</td>
<td align="left">Achieved 95.6176&#x0025; TP &#x0026; 0.8795&#x0025; FP rate</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-23">23</xref>]</td>
<td align="left">CNN</td>
<td align="left">skin detectors</td>
<td align="left">Achieved 91&#x0025; accuracy which is better than ANN</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-24">24</xref>]</td>
<td align="left">RNN and CNN</td>
<td align="left">For human skin detection.</td>
<td align="left">Show better performance</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-25">25</xref>]</td>
<td align="left">Google Net, CNN, model and pornographic <italic>vs.</italic> color feature histogram</td>
<td align="left">Pornographic <italic>vs.</italic> nonpornographic classification on the NPDI dataset.</td>
<td align="left">The proposed model reaches 99.31&#x0025;, accuracy which is 2.67&#x0025; higher than the CNN</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-26">26</xref>]</td>
<td align="left">CNN with multiple instance learning (MIL)</td>
<td align="left">To detect exposed body parts on the NPDI dataset.</td>
<td align="left">Show remarkable accuracy with 97.01&#x0025; TP at 1&#x0025; FP,</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-27">27</xref>]</td>
<td align="left">Proposed Google Net model for region-based recognition.</td>
<td align="left">Sexual organs on the NPDI dataset</td>
<td align="left">The claimed accuracy is 97&#x0025; with a rate of true positive and 1&#x0025; with a rate of false positive</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-28">28</xref>]</td>
<td align="left">proposed CNN and Google Net architecture for</td>
<td align="left">Video pornography detection.</td>
<td align="left">97.8&#x0025; is the claimed accuracy along with 64.3&#x0025; error reduction</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-29">29</xref>]</td>
<td align="left">Multi-layer perceptron combine with fuzzy integral-</td>
<td align="left">pornographic images</td>
<td align="left">During training, the claimed accuracy is 93.1&#x0025; in true positive and 8.2&#x0025; in false positive; while in testing, 87.1&#x0025; and 5.4&#x0025; respectively</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-30">30</xref>]</td>
<td align="left">(PCA)</td>
<td align="left">Pornographic images</td>
<td align="left">Increase the accuracy by up to 12&#x0025; and decrease the FP by about 16&#x0025;,</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-31">31</xref>]</td>
<td align="left">YOLOv3 for nudity detection.</td>
<td align="left">Nudity image detection.</td>
<td align="left">The accuracy is improved by 4&#x0025; from 85.5&#x0025; to 89.5&#x0025;. Moreover, the AUC is also increased from 93&#x0025; to 97&#x0025;.</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s3"><label>3</label><title>Learning Models</title>
<sec id="s3_1"><label>3.1</label><title>CNN Overview</title>
<p>The generalized structure of the CNN network was proposed by [<xref ref-type="bibr" rid="ref-32">32</xref>]. Owing to the limits of computing hardware, researchers did not widely use this network for training at first. In the 1990s, the authors of [<xref ref-type="bibr" rid="ref-33">33</xref>] proposed CNNs with gradient-based learning algorithms for handwritten digit classification problems. After attaining successful results in different recognition tasks, much work has been done by different researchers to further improve CNNs toward more state-of-the-art results. <xref ref-type="fig" rid="fig-1">Fig. 1</xref> shows the overall CNN architecture. The CNN architecture is made up of a variety of layers, including fully connected, max-pooling, and convolution layers. Max-pooling layers are in the network&#x2019;s middle level while convolutional layers are located at a low level. By performing convolution operations upon the input images, each node of the convolution layer retrieves the features of the input images [<xref ref-type="bibr" rid="ref-34">34</xref>]. Two Conv2D layers with Relu activation functions, two max-pooling layers, a flatten layer, and two dense layers with Relu and sigmoid activation functions make up the small CNN model&#x2019;s detailed architecture. The first Conv2D layer has a kernel of shape (3&#x2009;&#x00D7;&#x2009;3&#x2009;&#x00D7;&#x2009;3&#x2009;&#x00D7;&#x2009;32) with 32 biases, takes batch input of shape (224, 224, 3), and uses 32 filters with Relu activation followed by a max-pooling layer. Similar to the first Conv2D layer, the second Conv2D layer is followed by a max-pooling layer. Additionally, the first dense layer uses the Relu activation function with a weight matrix of shape (93312&#x2009;&#x00D7;&#x2009;128) and 128 biases. The last dense layer, on the other hand, employs a weight matrix of shape (128&#x2009;&#x00D7;&#x2009;1), a single bias, and a sigmoid activation function. The next model used in the paper is the ResNet-50 model, which is discussed in the next section.</p>
<fig id="fig-1"><label>Figure 1</label><caption><title>The structure of CNN [<xref ref-type="bibr" rid="ref-34">34</xref>]</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_40212-fig-1.tif"/></fig>
</sec>
<sec id="s3_2"><label>3.2</label><title>Residual Network (ResNet-50)</title>
<p>The ResNet network model was developed by He et al. [<xref ref-type="bibr" rid="ref-35">35</xref>,<xref ref-type="bibr" rid="ref-36">36</xref>]. This model introduces ultra-deep networks that solve the vanishing gradient problem found in previous models. Researchers have developed ResNets with many different numbers of layers, such as 34, 50, 101, 152, and even 1,202. One of the most efficient models among all of these is the ResNet-50 network model, which consists of a total of 50 layers: 49 convolution layers and, at the end of the network, one fully connected layer. The entire network has 25.5 million weights and performs 3.9 billion multiply-accumulate operations (MACs). <xref ref-type="fig" rid="fig-2">Fig. 2</xref> depicts the fundamental block diagram of the residual block inside the ResNet architecture. Each basic residual block in the ResNet model contains a residual connection. The output of a residual layer is defined from the output of the preceding layer after various operations, such as convolution with various filter sizes and batch normalization (BN), followed by an activation function such as Relu. However, the operations inside the residual block can vary across the different architectures of residual networks [<xref ref-type="bibr" rid="ref-35">35</xref>].</p>
<fig id="fig-2"><label>Figure 2</label><caption><title>Residual block diagram</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_40212-fig-2.tif"/></fig>
</sec>
</sec>
<sec id="s4"><label>4</label><title>Methodology</title>
<sec id="s4_1"><label>4.1</label><title>Dataset</title>
<p>This study used explicit and non-explicit pornography datasets containing nearly 3,800 obscene and 3,200 non-obscene images, collected from the Explicit Content Detection System [<xref ref-type="bibr" rid="ref-37">37</xref>]. The dataset covers certain classes of obscenity and depicts actors of multiple ethnicities. The non-obscene images were chosen at random from textual search queries like &#x201C;beach,&#x201D; &#x201C;wrestling,&#x201D; and &#x201C;swimming,&#x201D; which return images containing body skin but not nudity or pornography. In total, there are 7,000 frames (images). In this research, 70&#x0025; of the dataset is used for training while the rest is used for testing the models. <xref ref-type="fig" rid="fig-3">Figs. 3</xref> and <xref ref-type="fig" rid="fig-4">4</xref> highlight some statistics and samples of this dataset, respectively. All the images are taken from the following link: <ext-link ext-link-type="uri" xlink:href="https://drive.google.com/drive/folders/1T2lwjWcW3L2DQw27ruwm8hyLT9teIUb9?usp=sharing">https://drive.google.com/drive/folders/1T2lwjWcW3L2DQw27ruwm8hyLT9teIUb9?usp=sharing</ext-link>.</p>
<fig id="fig-3"><label>Figure 3</label><caption><title>Explicit and non-explicit images</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_40212-fig-3.tif"/></fig>
<fig id="fig-4"><label>Figure 4</label><caption><title>Body skin but not nudity images</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_40212-fig-4.tif"/></fig>
</sec>
<sec id="s4_2"><label>4.2</label><title>Processing Image Data</title>
<p>The Keras ImageDataGenerator() preprocessing method is used to set up Python generators that automatically transform image files into preprocessed tensors fed straight into the models during training. On the training dataset, image alteration operations including translation, rotation, and zooming are employed to produce new versions of existing images, and this study added these extra images to the model during training. The generator effectively performs the following tasks for us: it reads the RGB pixel grids from the JPEG data and transforms them into floating-point tensors; the zoom_range parameter zooms the image randomly; and the rotation_range parameter rotates the image by a random angle of up to 10 degrees.</p>
<p>The height_shift_range and width_shift_range parameters translate the image randomly in the vertical or horizontal direction by a fraction of the image&#x2019;s height or width. The shear_range parameter applies shear-based transformations randomly. The rescale parameter maps the pixel values from the range 0&#x2013;255 to the [0, 1] interval, normalizing them. The data expansion operation was applied to the 30&#x0025; testing split. After applying the five data generator functions mentioned in <xref ref-type="table" rid="table-2">Table 2</xref> to the testing data, the total number of augmented images reaches 9,265. The main purpose of applying the data generator functions is to compare the performance of the used models with and without the augmented dataset. <xref ref-type="table" rid="table-2">Table 2</xref> gives the ImageDataGenerator function details; a configuration sketch follows the table.</p>
<table-wrap id="table-2"><label>Table 2</label><caption><title>Image data generator function</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead valign="top">
<tr>
<th align="left">ImageDataGenerator name</th>
<th align="left">Selected value</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Rescale</td>
<td align="left">1/255&#x2009;&#x003D;&#x2009;0.0039</td>
</tr>
<tr>
<td align="left">Width_shift_range</td>
<td align="left">0.2</td>
</tr>
<tr>
<td align="left">Heigh_shift_range</td>
<td align="left">0.2</td>
</tr>
<tr>
<td align="left">Shear_range</td>
<td align="left">0.2</td>
</tr>
<tr>
<td align="left">Zoom_range</td>
<td align="left">0.2</td>
</tr>
<tr>
<td align="left">Rotation_range</td>
<td align="left">10</td>
</tr>
</tbody>
</table>
</table-wrap>
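<p>The following is a sketch of the generator configuration with the values from <xref ref-type="table" rid="table-2">Table 2</xref>; the directory path is a hypothetical placeholder, not the actual dataset location.</p>
<preformat>
# A sketch of the ImageDataGenerator configuration per Table 2; the
# directory path "dataset/train" is a hypothetical placeholder.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

datagen = ImageDataGenerator(
    rescale=1.0 / 255,       # map pixel values from [0, 255] to [0, 1]
    width_shift_range=0.2,   # random horizontal translation
    height_shift_range=0.2,  # random vertical translation
    shear_range=0.2,         # random shear transformation
    zoom_range=0.2,          # random zoom
    rotation_range=10,       # random rotation up to 10 degrees
)

train_generator = datagen.flow_from_directory(
    "dataset/train",         # placeholder path
    target_size=(224, 224),  # resize images to the models' input size
    batch_size=32,
    class_mode="binary",     # explicit vs. non-explicit
)
</preformat>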
</sec>
<sec id="s4_3"><label>4.3</label><title>Processing VGG-16 Model</title>
<p>VGG-16 and VGG-14-based CNN architectures are proposed in this study. The VGG-14 model has 14 layers, while the VGG-16 has 16 layers. The ImageNet database was used to train the models, which have shown good classification performance on various image classification and image recognition datasets. VGG-16 and VGG-14 can be implemented using the Keras framework, a high-level library that supports TensorFlow and Theano backends. The VGG-14 model was built with thirteen convolutional layers and one dense (fully connected) layer, while the VGG-16 was built using 13 convolutional layers and 3 fully connected layers. Each of the 14 layers carries weights that are incrementally transferred up to the dense prediction layer. The convolutional layers use a 3&#x2009;&#x00D7;&#x2009;3 kernel with a Relu activation function for each hidden layer, whereas the pooling layers use a 2&#x2009;&#x00D7;&#x2009;2 window. The SoftMax activation function is used in the dense prediction layer, i.e., the final layer. The fully connected layer receives the bottleneck features from the final activation layer; to cut down on training time, the bottleneck features can be used as pre-training weights. The transfer learning takes place in the fully connected layer. <xref ref-type="fig" rid="fig-5">Fig. 5</xref> displays the hyper-parameters of the suggested model used in our experiment. We made the following adjustments to the VGG-14 model (a code sketch follows the list):
<list list-type="simple">
<list-item><label>&#x25A0;</label><p>Remove the dense fully connected layer, which employs 1000 classes of SoftMax.</p>
</list-item>
<list-item><label>&#x25A0;</label><p>Replace the dense fully connected layer with two sigmoid classes.</p></list-item>
<list-item><label>&#x25A0;</label><p>Keep all CNN layers before the fully connected layers trainable, because we want to train all CNN layers; our modified VGG-14 is also trained on the fully connected layers.</p></list-item>
<list-item><label>&#x25A0;</label><p>To improve the training&#x2019;s efficiency, use the Adam [<xref ref-type="bibr" rid="ref-38">38</xref>] learning algorithm.</p>
</list-item>
<list-item><label>&#x25A0;</label><p>Finally, the Adam optimizer learning rate is 5e-5, and binary cross-entropy is used to evaluate losses.</p></list-item>
</list>
</p>
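<p>The following is a minimal sketch of this transfer-learning setup, assuming Keras&#x2019; built-in ImageNet-pretrained VGG16 as the convolutional base; the 128-unit dense layer in the classifier head is an illustrative assumption.</p>
<preformat>
# A sketch of the transfer-learning setup, assuming the Keras built-in
# VGG16; the dense head size (128 units) is an assumption for illustration.
from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications import VGG16

# Load VGG-16 pre-trained on ImageNet, without its 1000-class SoftMax top.
base = VGG16(weights="imagenet", include_top=False, input_shape=(224, 224, 3))
base.trainable = True  # all CNN layers remain trainable

# Replace the removed dense top with a binary (sigmoid) classifier.
model = models.Sequential([
    base,
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(1, activation="sigmoid"),
])

model.compile(
    optimizer=optimizers.Adam(learning_rate=5e-5),  # Adam, lr = 5e-5
    loss="binary_crossentropy",                     # binary cross-entropy
    metrics=["accuracy"],
)
</preformat>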
<fig id="fig-5"><label>Figure 5</label><caption><title>The architecture of the proposed models</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_40212-fig-5.tif"/></fig>
</sec>
</sec>
<sec id="s5"><label>5</label><title>Evaluation Parameters</title>
<p>The performance of the proposed models is compared with state-of-the-art models in terms of various parameters used by other researchers, which are discussed here. The confusion matrix (<italic>n</italic>&#x2009;&#x00D7;&#x2009;<italic>n</italic>), which might be known as the likelihood table or error matrix, is utilized to describe the performance of any identification analysis. The size of such a matrix relies on the number of classes. The summation of all correctly recovered values (true positives (TP) and true negatives (TN)) is reflected as accurate identification. A true positive means the subject is identified as associated with a class and does belong to that class; a false positive (FP) means the subject is identified as associated with a class but does not belong to that class; and a false negative (FN) means the subject belongs to a class but is not identified as such. The remaining cases are considered rejected, which are the combination of false positives and false negatives (FP&#x2009;&#x002B;&#x2009;FN).</p>
<sec id="s5_1"><label>5.1</label><title>Recall</title>
<p>Recall is known as the true positive rate: the whole number of TP values divided by the sum of the TP and FN values. Recall is a quantitative measure that presents the completeness of the results; a high recall means relevant items are accurately retrieved with high coverage.
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:mrow><mml:mtext>Recall</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mtext>TP</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>TP</mml:mtext></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mtext>FN</mml:mtext></mml:mrow></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
</sec>
<sec id="s5_2"><label>5.2</label><title>Precision</title>
<p>Precision describes the proportion of retrieved values that are truly positive, i.e., how many of the retrieved values are correlated with and belong to the specific class. Precision is measured by dividing the number of TP values by the sum of the TP and FP values.
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:mrow><mml:mtext>Precision</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mtext>TP</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>TP</mml:mtext></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mtext>FP</mml:mtext></mml:mrow></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
</sec>
<sec id="s5_3"><label>5.3</label><title>F-Measure</title>
<p>The F-measure uses the harmonic mean instead of the arithmetic mean. It is calculated from recall and precision as follows.
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:mrow><mml:mtext>F</mml:mtext></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:mtext>Measure</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mn>2</mml:mn><mml:mtext>&#x00A0;</mml:mtext><mml:mo>&#x2217;</mml:mo><mml:mtext>&#x00A0;</mml:mtext><mml:mrow><mml:mtext>Recall</mml:mtext></mml:mrow><mml:mtext>&#x00A0;</mml:mtext><mml:mo>&#x2217;</mml:mo><mml:mtext>&#x00A0;</mml:mtext><mml:mrow><mml:mtext>Precision</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mtext>Recall</mml:mtext></mml:mrow><mml:mtext>&#x00A0;</mml:mtext><mml:mo>+</mml:mo><mml:mtext>&#x00A0;</mml:mtext><mml:mrow><mml:mtext>Precision</mml:mtext></mml:mrow></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
</sec>
<sec id="s5_4"><label>5.4</label><title>Accuracy</title>
<p>Accuracy presents the overall effectiveness of the proposed model in detecting both positive and negative values, i.e., the proportion of all values that are retrieved accurately, as presented in <xref ref-type="disp-formula" rid="eqn-4">Eq. (4)</xref>.
<disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:mrow><mml:mtext>Accuracy</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mrow><mml:mtext>TP</mml:mtext></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mtext>TN</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mtext>TP</mml:mtext></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mtext>TN</mml:mtext></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mtext>FP</mml:mtext></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mtext>FN</mml:mtext></mml:mrow></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
</sec>
</sec>
<sec id="s6"><label>6</label><title>Simulation Results</title>
<p>This section explains the simulation results of the various models used in this study. First, the random hyperparameter search experiment results with the small CNN model are presented. These results are followed by the VGG-16, VGG-14, and ResNet-50 models trained with transfer learning using ImageNet pre-trained weights.</p>
<sec id="s6_1"><label>6.1</label><title>Small CNN</title>
<p>The CNN architecture is the first proposed model of this research; it is a self-defined architecture, which is why we call it small CNN. The small CNN model consists of two convolution layers with max pooling, one flatten layer, and two dense layers. The kernel size is 3&#x2009;&#x00D7;&#x2009;3, the input shape of the image is 224&#x2009;&#x00D7;&#x2009;224, there are 32 feature maps, and Relu is used as the activation function, while the output layer uses a sigmoid activation function. <xref ref-type="table" rid="table-3">Table 3</xref> shows the hyperparameters of the small CNN model and <xref ref-type="table" rid="table-4">Table 4</xref> describes the small CNN architecture.</p>
<table-wrap id="table-3"><label>Table 3</label><caption><title>Hyperparameters value rates</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead valign="top">
<tr>
<th align="left">Hyperparameter</th>
<th align="left">Selected value</th>
<th align="left">Value range</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Convolutional layers</td>
<td align="left">2</td>
<td align="left">&#x007B;1, 2&#x007D;</td>
</tr>
<tr>
<td align="left">Fully connected layers</td>
<td align="left">2</td>
<td align="left">&#x007B;1, 2&#x007D;</td>
</tr>
<tr>
<td align="left">Kernel size</td>
<td align="left">3</td>
<td align="left">&#x007B;3, 5, 7, 9, 11&#x007D;</td>
</tr>
<tr>
<td align="left">Feature maps</td>
<td align="left">32</td>
<td align="left">&#x007B;16, 32, 48, 64&#x007D;</td>
</tr>
<tr>
<td align="left">Learning rate</td>
<td align="left">0.00001</td>
<td align="left">10 &#x2212;5&#x2009;&#x2212;&#x2009;10 &#x2212;4</td>
</tr>
<tr>
<td align="left">Input size</td>
<td align="left">224</td>
<td align="left">&#x007B;64, 96, 128, 224&#x007D;</td>
</tr>
</tbody>
</table>
</table-wrap><table-wrap id="table-4"><label>Table 4</label><caption><title>The architecture of the small CNN</title></caption>
 
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead valign="top">
<tr>
<th align="left">Layer</th>
<th align="left">K</th>
<th align="left">AF</th>
<th align="left">FM</th>
<th align="left">Input size</th>
<th align="left">Output size</th>
<th align="left">Parameters</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Input size</td>
<td align="left"/>
<td align="left"/>
<td align="left"/>
<td align="left">[224&#x2009;&#x00D7;&#x2009;224&#x2009;&#x00D7;&#x2009;3]</td>
<td align="left">[224&#x2009;&#x00D7;&#x2009;224&#x2009;&#x00D7;&#x2009;3]</td>
<td align="left">0</td>
</tr>
<tr>
<td align="left">Conv2d max_pooling2d</td>
<td align="left">3&#x2009;&#x00D7;&#x2009;3</td>
<td align="left">Relu</td>
<td align="left">32</td>
<td align="left">[224&#x2009;&#x00D7;&#x2009;224&#x2009;&#x00D7;&#x2009;3]<break/>[222&#x2009;&#x00D7;&#x2009;222&#x2009;&#x00D7;&#x2009;32]</td>
<td align="left">[222&#x2009;&#x00D7;&#x2009;222&#x2009;&#x00D7;&#x2009;32]<break/>[111&#x2009;&#x00D7;&#x2009;111&#x2009;&#x00D7;&#x2009;32]</td>
<td align="left">896<break/>0</td>
</tr>
<tr>
<td align="left">Conv2d_1 max_poolg2d1</td>
<td align="left">3&#x2009;&#x00D7;&#x2009;3</td>
<td align="left">Relu</td>
<td align="left">32</td>
<td align="left">[111&#x2009;&#x00D7;&#x2009;111&#x2009;&#x00D7;&#x2009;32]<break/>[109&#x2009;&#x00D7;&#x2009;109&#x2009;&#x00D7;&#x2009;32]</td>
<td align="left">[109&#x2009;&#x00D7;&#x2009;109&#x2009;&#x00D7;&#x2009;32]<break/>[54&#x2009;&#x00D7;&#x2009;54&#x2009;&#x00D7;&#x2009;32]</td>
<td align="left">9,248<break/>0</td>
</tr>
<tr>
<td align="left">Flatten</td>
<td align="left"/>
<td align="left"/>
<td align="left"/>
<td align="left">[54&#x2009;&#x00D7;&#x2009;54&#x2009;&#x00D7;&#x2009;32]</td>
<td align="left">93312</td>
<td align="left">0</td>
</tr>
<tr>
<td align="left">Dense</td>
<td align="left"/>
<td align="left">Relu</td>
<td align="left"/>
<td align="left">93312</td>
<td align="left">128</td>
<td align="left">11,944,064</td>
</tr>
<tr>
<td align="left">dense_1</td>
<td align="left"/>
<td align="left">sigmoid</td>
<td align="left"/>
<td align="left">128</td>
<td align="left">1</td>
<td align="left">129</td>
</tr>
<tr>
<td align="left"/>
<td align="left"/>
<td align="left"/>
<td align="left"/>
<td align="left"/>
<td align="left"/>
<td align="left"><bold>11,954,337</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
<p><xref ref-type="fig" rid="fig-6">Fig. 6</xref> displays the CNN model&#x2019;s total accuracy and loss. It demonstrates the training performance of CNN. The learning algorithm is archived once CNN training has been completed. When training starts on the first epoch its accuracy is 0.64 and gradually the accuracy increased with 8 epochs the final accuracy reached 0.776. The early halting policy caused the training to end after 8 epochs. After the last epoch, the model achieved an accuracy for the training of 0.776 and an accuracy for validation of 0.79 with losses of 0.48 &#x0026; 0.46. Similarly, <xref ref-type="table" rid="table-5">Table 5</xref> presents the performance assessment of the CNN on the training dataset. <xref ref-type="table" rid="table-5">Table 5</xref> contains the explicit Images and Non_Clear_Pictures According to testing data in terms of precision, recall, and <italic>f</italic><sub>1</sub>-measure. Here the average clear image precision is 0.81 and the average Non_Clear_Pictures is 0.79 on testing data. The explicit images recall average is 0.81 and the average is Non_Clear_Pictures 0.79 on testing data. F-measure average explicit Images is 0.81 while the average micro is 0.79. While <xref ref-type="fig" rid="fig-6">Figs. 6</xref> and <xref ref-type="fig" rid="fig-7">7</xref> describe the training performance and confusion matrix performance of the CNN model.</p>
<fig id="fig-6"><label>Figure 6</label><caption><title>Training performance of small CNN</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_40212-fig-6.tif"/></fig><table-wrap id="table-5"><label>Table 5</label><caption><title>Precision recall and <italic>f</italic>-measure of the performance of the CNN model</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead valign="top">
<tr>
<th align="left">Dataset</th>
<th align="left">Support</th>
<th align="left">Precision</th>
<th align="left">Recall</th>
<th align="left">F-measure</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Clear pictures</td>
<td align="left">967</td>
<td align="left">0.81</td>
<td align="left">0.81</td>
<td align="left">0.81</td>
</tr>
<tr>
<td align="left">Non_clear pictures</td>
<td align="left">886</td>
<td align="left">0.79</td>
<td align="left">0.80</td>
<td align="left">0.79</td>
</tr>
<tr>
<td align="left">Macro_avg</td>
<td align="left">1853</td>
<td align="left">0.79</td>
<td align="left">0.79</td>
<td align="left">0.79</td>
</tr>
<tr>
<td align="left">Weighted_avg</td>
<td align="left">1853</td>
<td align="left">0.79</td>
<td align="left">0.79</td>
<td align="left">0.79</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="fig-7"><label>Figure 7</label><caption><title>Confusion matrix of small CNN</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_40212-fig-7.tif"/></fig>
</sec>
<sec id="s6_2"><label>6.2</label><title>ResNet-50</title>
<p>One of the most efficient models among all the ResNet models is the ResNet-50 network model. The ResNet-50 model has a total of 50 layers: 49 convolutional layers and, finally, 1 dense layer. For the entire network, there are 25.5&#x2005;M weights and 3.9&#x2005;G MACs, respectively. <xref ref-type="fig" rid="fig-2">Fig. 2</xref> depicts the fundamental block diagram of the residual block in the ResNet architecture; ResNet-50 is a conventional feed-forward network with residual connections. <xref ref-type="table" rid="table-6">Table 6</xref> shows the ResNet-50 performance during training. On the final epoch, the model reached a training accuracy of 0.89 with a loss of 0.389, and a validation accuracy of 0.86 with a loss of 0.47. Similarly, <xref ref-type="table" rid="table-6">Table 6</xref> shows the performance evaluation of the ResNet-50 model, which contains the precision, recall, and <italic>f</italic>-measure of explicit and non-explicit pictures on the testing data. Here the average precision is 0.86 for explicit images and 0.96 for non-explicit pictures on the validation data. Recall is 0.97 on explicit images and 0.83 on non-explicit pictures. Similarly, the <italic>f</italic>-measure is 0.91 on explicit images and 0.89 on non-explicit pictures. The overall accuracy and loss of the ResNet-50 model are shown in <xref ref-type="fig" rid="fig-8">Fig. 8</xref>, while <xref ref-type="fig" rid="fig-9">Fig. 9</xref> shows the confusion matrix of the ResNet-50 model.</p>
<table-wrap id="table-6"><label>Table 6</label><caption><title>Precision recall and <italic>f</italic> measure the performance of the ResNet-50 model</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead valign="top">
<tr>
<th align="left">Dataset</th>
<th align="left">Support</th>
<th align="left">Precision</th>
<th align="left">Recall</th>
<th align="left">F-measure</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">Clear pictures</td>
<td align="left">967</td>
<td align="left">0.87</td>
<td align="left">0.98</td>
<td align="left">0.92</td>
</tr>
<tr>
<td align="left">Non-clear pictures</td>
<td align="left">886</td>
<td align="left">0.97</td>
<td align="left">0.84</td>
<td align="left">0.90</td>
</tr>
<tr>
<td align="left">Macro_avg</td>
<td align="left">1853</td>
<td align="left">0.91</td>
<td align="left">0.90</td>
<td align="left">0.90</td>
</tr>
<tr>
<td align="left">Weighted_avg</td>
<td align="left">1853</td>
<td align="left">0.91</td>
<td align="left">0.90</td>
<td align="left">0.90</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="fig-8"><label>Figure 8</label><caption><title>Training performance of ResNet-50 with expansion data</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_40212-fig-8.tif"/></fig>
<fig id="fig-9"><label>Figure 9</label><caption><title>Training performance of ResNet-50 with expansion data</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_40212-fig-9.tif"/></fig>
</sec>
<sec id="s6_3"><label>6.3</label><title>VGG-16</title>
<p>VGG-16-based CNN architectures are proposed in this study, and a VGG-14 model with 14 layers is further validated. Both models use convolutional weights pre-trained on the ImageNet database. The VGG-14 model was trained with augmented data, while the VGG-16 model was trained both with and without augmented data; these architectures have demonstrated strong classification performance on a variety of image classification and recognition datasets. The VGG-14 and VGG-16 models were implemented with the Keras high-level library running on TensorFlow and Theano backends. The VGG-14 model was constructed from 13 convolutional layers and one dense (fully connected) layer, so that each of the 14 weighted layers feeds forward into the dense prediction layer. The pooling layers employ a 2&#x2009;&#x00D7;&#x2009;2 window, while the convolutional layers use a 3&#x2009;&#x00D7;&#x2009;3 kernel with a ReLU activation function in each hidden layer. The dense prediction layer uses a SoftMax activation and is trained with gradient descent. The bottleneck features of the final activation layer are passed down to the fully connected layer; to reduce training time, pre-trained weights are assigned to the layers that produce the bottleneck features, so transfer learning takes place in the fully connected layers.</p>
<p><xref ref-type="table" rid="table-7">Tables 7</xref> and <xref ref-type="table" rid="table-8">8</xref> show the performance of the VGG-14 and VGG-16 models in terms of loss, accuracy, precision, recall, and f-measure. The VGG-16 model with augmented data reached training and validation accuracies of 0.97 and 0.94 with losses of 0.079 and 0.16; its recall, f-measure, and precision values for explicit and non-explicit images are (0.94, 0.94, 0.94) and (0.94, 0.94, 0.94), respectively. Without augmented data, the VGG-16 model reached training and validation accuracies of 0.997 and 0.986 with losses of 0.008 and 0.056, and its precision, recall, and f-measure values for explicit and non-explicit images are (0.94, 0.99, 0.97) and (0.99, 0.93, 0.97). Similarly, the VGG-14 model reports precision, recall, and <italic>f</italic><sub>1</sub>-measure for explicit and non-explicit images on the testing data: with augmented data it reached training and validation accuracies of 0.98 and 0.96 with losses of 0.059 and 0.11, and its f-measure, recall, and precision values for explicit and non-explicit images are (0.98, 0.98, 0.98) and (0.98, 0.98, 0.98). <xref ref-type="fig" rid="fig-10">Fig. 10</xref> shows the accuracy and loss of the VGG-16 model without the augmented dataset, and <xref ref-type="fig" rid="fig-11">Fig. 11</xref> shows them with the augmented dataset, while <xref ref-type="fig" rid="fig-12">Fig. 12</xref> shows the accuracy and loss of the VGG-14 model with the augmented dataset. <xref ref-type="fig" rid="fig-13">Figs. 13</xref> and <xref ref-type="fig" rid="fig-14">14</xref> present the confusion matrices of the VGG-16 and VGG-14 models, respectively. For the baseline CNN on the testing data, the average precision is 0.80 for explicit images and 0.78 for non-explicit images, the average recall is 0.80 for explicit images and 0.79 for non-explicit images, and the average f-measure is 0.80 for explicit images and 0.78 for non-explicit images.</p>
<p>For ResNet-50, the final training epoch reached a training accuracy of 0.89 with a loss of 0.389 and a validation accuracy of 0.86 with a loss of 0.47. On the testing data, the average precision is 0.86 for explicit images and 0.96 for non-explicit images, the recall is 0.97 for explicit images and 0.83 for non-explicit images, and the f-measure is 0.91 for explicit images and 0.89 for non-explicit images.</p>
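<p>For illustration, the following minimal sketch (in Python, using the Keras API reported above) shows this transfer-learning configuration: a VGG-16 convolutional base pre-trained on ImageNet is frozen so that its bottleneck features feed a newly trained fully connected SoftMax classifier. The input shape, hidden-layer width, and learning rate are illustrative assumptions, not the exact settings used in this study.</p>
<preformat>
# Minimal transfer-learning sketch, assuming TensorFlow 2.x Keras.
from tensorflow.keras.applications import VGG16
from tensorflow.keras import layers, models, optimizers

# Pre-trained convolutional base; include_top=False drops the original
# ImageNet classifier so that only bottleneck features are produced.
conv_base = VGG16(weights="imagenet", include_top=False,
                  input_shape=(224, 224, 3))   # assumed input size
conv_base.trainable = False  # frozen pre-trained weights reduce training time

model = models.Sequential([
    conv_base,
    layers.Flatten(),                       # bottleneck feature vector
    layers.Dense(256, activation="relu"),   # new fully connected layer (assumed width)
    layers.Dense(2, activation="softmax"),  # explicit vs. non-explicit prediction
])

# Gradient descent training; only the new dense layers are updated
# because the convolutional base is frozen.
model.compile(optimizer=optimizers.SGD(learning_rate=1e-3),
              loss="categorical_crossentropy",
              metrics=["accuracy"])
</preformat>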
<table-wrap id="table-7"><label>Table 7</label><caption><title>Accuracy and loss of VGG-16, and VGG-14 models</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead valign="top">
<tr>
<th align="left">Models</th>
<th align="left">Tr-Acc</th>
<th align="left">Val-Acc</th>
<th align="left">Tr-Loss</th>
<th align="left">Val-Loss</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="left">VGG-16 with expansion</td>
<td align="left">0.98</td>
<td align="left">0.95</td>
<td align="left">0.080</td>
<td align="left">0.17</td>
</tr>
<tr>
<td align="left">VGG-14 with expansion</td>
<td align="left">0.99</td>
<td align="left">0.97</td>
<td align="left">0.060</td>
<td align="left">0.12</td>
</tr>
<tr>
<td align="left">VGG-16 Without expansion</td>
<td align="left">0.99</td>
<td align="left">0.98</td>
<td align="left">0.008</td>
<td align="left">0.05</td>
</tr>
<tr>
<td align="left">CNN with expansion</td>
<td align="left">0.78</td>
<td align="left">0.78</td>
<td align="left">0.48</td>
<td align="left">0.46</td>
</tr>
<tr>
<td align="left">ResNet-50 with expansion</td>
<td align="left">0.89</td>
<td align="left">0.86</td>
<td align="left">0.39</td>
<td align="left">0.47</td>
</tr>
</tbody>
</table>
</table-wrap><table-wrap id="table-8"><label>Table 8</label><caption><title>Precision-recall and <italic>f</italic>-measure performance of VGG-16 and VGG-14 models</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead valign="top">
<tr>
<th align="left">Dataset</th>
<th align="left">Support</th>
<th align="left">Precision</th>
<th align="left">Recall</th>
<th align="left">F-measure</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td align="center" colspan="5">VGG-16 with augmentation data</td>
</tr>
<tr>
<td align="left">Clear pictures</td>
<td align="left">967</td>
<td align="left">0.95</td>
<td align="left">0.95</td>
<td align="left">0.95</td>
</tr>
<tr>
<td align="left">Non-clear pictures</td>
<td align="left">886</td>
<td align="left">0.95</td>
<td align="left">0.95</td>
<td align="left">0.95</td>
</tr>
<tr>
<td align="left">Macro_avg</td>
<td align="left">1853</td>
<td align="left">0.94</td>
<td align="left">0.94</td>
<td align="left">0.94</td>
</tr>
<tr>
<td align="left">Weighted_avg</td>
<td align="left">1853</td>
<td align="left">0.94</td>
<td align="left">0.94</td>
<td align="left">0.94</td>
</tr>
<tr>
<td align="center" colspan="5">VGG-14 with expansion</td>
</tr>
<tr>
<td align="left">Clear pictures</td>
<td align="left">969</td>
<td align="left">0.99</td>
<td align="left">0.99</td>
<td align="left">0.99</td>
</tr>
<tr>
<td align="left">Non-clear pictures</td>
<td align="left">886</td>
<td align="left">0.99</td>
<td align="left">0.99</td>
<td align="left">0.99</td>
</tr>
<tr>
<td align="left">Macro_avg</td>
<td align="left">1853</td>
<td align="left">0.98</td>
<td align="left">0.98</td>
<td align="left">0.98</td>
</tr>
<tr>
<td align="left">Weighted_avg</td>
<td align="left">1853</td>
<td align="left">0.98</td>
<td align="left">0.98</td>
<td align="left">0.98</td>
</tr>
<tr>
<td align="center" colspan="5">VGG-16 without expansion</td>
</tr>
<tr>
<td align="left">Clear pictures</td>
<td align="left">969</td>
<td align="left">0.95</td>
<td align="left">0.98</td>
<td align="left">0.98</td>
</tr>
<tr>
<td align="left">Non-clear pictures</td>
<td align="left">886</td>
<td align="left">0.98</td>
<td align="left">0.94</td>
<td align="left">0.96</td>
</tr>
<tr>
<td align="left">Macro_avg</td>
<td align="left">1853</td>
<td align="left">0.97</td>
<td align="left">0.96</td>
<td align="left">0.96</td>
</tr>
<tr>
<td align="left">Weighted_avg</td>
<td align="left">1853</td>
<td align="left">0.96</td>
<td align="left">0.96</td>
<td align="left">0.96</td>
</tr>
<tr>
<td align="center" colspan="5">CNN with expansion</td>
</tr>
<tr>
<td align="left">Clear pictures</td>
<td align="left">969</td>
<td align="left">0.81</td>
<td align="left">0.81</td>
<td align="left">0.81</td>
</tr>
<tr>
<td align="left">Non-clear pictures</td>
<td align="left">886</td>
<td align="left">0.79</td>
<td align="left">0.80</td>
<td align="left">0.79</td>
</tr>
<tr>
<td align="left">Macro_avg</td>
<td align="left">1853</td>
<td align="left">0.96</td>
<td align="left">0.97</td>
<td align="left">0.97</td>
</tr>
<tr>
<td align="left">Weighted_avg</td>
<td align="left">1853</td>
<td align="left">0.96</td>
<td align="left">0.95</td>
<td align="left">0.96</td>
</tr>
<tr>
<td align="center" colspan="5">ResNet-50 with expansion</td>
</tr>
<tr>
<td align="left">Clear pictures</td>
<td align="left">969</td>
<td align="left">0.87</td>
<td align="left">0.98</td>
<td align="left">0.92</td>
</tr>
<tr>
<td align="left">Non-clear pictures</td>
<td align="left">886</td>
<td align="left">0.97</td>
<td align="left">0.84</td>
<td align="left">0.90</td>
</tr>
<tr>
<td align="left">Macro_avg</td>
<td align="left">1853</td>
<td align="left">0.95</td>
<td align="left">0.96</td>
<td align="left">0.97</td>
</tr>
<tr>
<td align="left">Weighted_avg</td>
<td align="left">1853</td>
<td align="left">0.96</td>
<td align="left">0.97</td>
<td align="left">0.96</td>
</tr>
</tbody>
</table>
</table-wrap>
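<p>The per-class precision, recall, and f-measure with the macro and weighted averages reported in <xref ref-type="table" rid="table-8">Table 8</xref> follow the layout of a standard classification report. As a hedged sketch, the snippet below shows how such a report and the confusion matrices of <xref ref-type="fig" rid="fig-13">Figs. 13</xref> and <xref ref-type="fig" rid="fig-14">14</xref> can be computed with scikit-learn; the label arrays here are placeholders, not the actual model outputs of this study.</p>
<preformat>
# Evaluation sketch: per-class precision/recall/f-measure plus
# macro and weighted averages, and the confusion matrix.
from sklearn.metrics import classification_report, confusion_matrix

y_true = [0, 0, 1, 1, 1, 0, 1, 0]  # placeholder labels: 0 = non-clear, 1 = clear
y_pred = [0, 1, 1, 1, 0, 0, 1, 0]  # placeholder model predictions

print(confusion_matrix(y_true, y_pred))  # basis of Figs. 13 and 14
print(classification_report(y_true, y_pred,
      target_names=["Non-clear pictures", "Clear pictures"]))
</preformat>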

<fig id="fig-10"><label>Figure 10</label><caption><title>Training performance of VGG-16 without augmented data</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_40212-fig-10.tif"/></fig>
<fig id="fig-11"><label>Figure 11</label><caption><title>Training performance of VGG-16 with augmented data</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_40212-fig-11.tif"/></fig>
<fig id="fig-12"><label>Figure 12</label><caption><title>Training performance of VGG-14 with expanded data</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_40212-fig-12.tif"/></fig><fig id="fig-13"><label>Figure 13</label><caption><title>Confusion matrix of VGG-16</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_40212-fig-13.tif"/></fig><fig id="fig-14"><label>Figure 14</label><caption><title>Confusion matrix of VGG-14</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_40212-fig-14.tif"/></fig>
</sec>
</sec>
<sec id="s7"><label>7</label><title>Conclusions</title>
<p>Nowadays, the majority of people use different kinds of social media platforms to share information and communicate with the world. Many young children also use these platforms on their smartphones to play games, during which pornographic material and advertisements are shown that badly affect the minds of teenage kids; it is likewise widely reported that pornographic content circulates across the internet, and a variety of research has addressed the detection of pornographic images. Therefore, this research proposed deep learning models based on CNN, VGG-14, and ResNet-50 trained on the augmented dataset, together with a VGG-16 model in which transfer learning occurs in the fully connected layer, trained both with and without the augmented dataset, to classify pornographic <italic>vs.</italic> non-pornographic images. The simulation results show that VGG-16 with augmented data performed better than the other models used in this study, reaching training and validation accuracies of 0.97 and 0.94 with losses of 0.070 and 0.16; its precision, recall, and <italic>f</italic><sub>1</sub>-measure values for explicit and non-explicit images are (0.94, 0.94, 0.94) and (0.94, 0.94, 0.94). Without augmented data, the VGG-16 model reached training and validation accuracies of 0.997 and 0.986 with losses of 0.008 and 0.056, and its precision, recall, and f-measure values for explicit and non-explicit images are (0.94, 0.99, 0.97) and (0.99, 0.93, 0.97). Similarly, the VGG-14 model reports precision, recall, and <italic>f</italic><sub>1</sub>-measure for explicit and non-explicit images on the assessment data; with augmented data it reached training and validation accuracies of 0.98 and 0.96 with losses of 0.050 and 0.11. The overall analysis shows that VGG-16 without augmented data performed best among the models used in this research. In the future, this study will be extended to detect and blur pornographic content in both images and videos, using deep learning to enhance the YOLOv4 model.</p>
</sec>
</body>
<back>
<sec><title>Funding Statement</title>
<p>This work was funded by the Deanship of Scientific Research at Jouf University under Grant Number DSR&#x2013;2022&#x2013;RG&#x2013;0101.</p></sec>
<sec sec-type="COI-statement"><title>Conflicts of Interest</title>
<p>The authors declare that they have no conflicts of interest to report regarding the present study.</p></sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Liang</surname></string-name>, <string-name><given-names>W.</given-names> <surname>He</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Xu</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Yang</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>A pornographic images recognition model based on deep one-class classification with visual attention mechanism</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>8</volume>, pp. <fpage>122709</fpage>&#x2013;<lpage>122721</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="web"><person-group person-group-type="author"><collab>Global digital report</collab></person-group>, <year>2019</year>. [Online]. <comment>Accessed: Jan. 30</comment>, Available. <ext-link ext-link-type="uri" xlink:href="https://wearesocial.com/global-digital-report-2019">https://wearesocial.com/global-digital-report-2019</ext-link></mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D. K.</given-names> <surname>Braun-Courville</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Rojas</surname></string-name></person-group>, &#x201C;<article-title>Exposure to sexually explicit web sites and adolescent sexual attitudes and behaviors</article-title>,&#x201D; <source>Journal of Adolescent Health</source>, vol. <volume>45</volume>, no. <issue>2</issue>, pp. <fpage>156</fpage>&#x2013;<lpage>162</lpage>, <year>2009</year>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J. D.</given-names> <surname>Brown</surname></string-name> and <string-name><given-names>K. L.</given-names> <surname>L&#x2019;Engle</surname></string-name></person-group>, &#x201C;<article-title>X-rated: Sexual attitudes and behaviors associated with US early adolescents&#x2019; exposure to sexually explicit media</article-title>,&#x201D; <source>Communication Research</source>, vol. <volume>36</volume>, no. <issue>1</issue>, pp. <fpage>129</fpage>&#x2013;<lpage>151</lpage>, <year>2009</year>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Wu</surname></string-name> and <string-name><given-names>D. J.</given-names> <surname>Atkin</surname></string-name></person-group>, &#x201C;<article-title>Third person effect and internet pornography in China</article-title>,&#x201D; <source>Telematics and Informatics</source>, vol. <volume>32</volume>, no. <issue>4</issue>, pp. <fpage>823</fpage>&#x2013;<lpage>833</lpage>, <year>2015</year>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>G.</given-names> <surname>Tyson</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Elkhatib</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Sastry</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Uhlig</surname></string-name></person-group>, &#x201C;<article-title>Measurements and analysis of a major adult video portal</article-title>,&#x201D; <source>ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)</source>, <year>2016</year>, vol. <volume>12</volume>, no. <issue>2</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>25</lpage>, <year>2016</year>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>F.</given-names> <surname>Nian</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Xu</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Wu</surname></string-name></person-group>, &#x201C;<article-title>Pornographic image detection utilizing deep convolutional neural networks</article-title>,&#x201D; <source>Neurocomputing</source>, vol. <volume>210</volume>, pp. <fpage>283</fpage>&#x2013;<lpage>293</lpage>, <year>2016</year>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Cohen-Almagor</surname></string-name></person-group>, &#x201C;<article-title>Online child sex offenders: Challenges and counter-measures</article-title>,&#x201D; <source>The Howard Journal of Criminal Justice</source>, vol. <volume>52</volume>, no. <issue>2</issue>, pp. <fpage>190</fpage>&#x2013;<lpage>215</lpage>, <year>2013</year>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M. L.</given-names> <surname>Ybarra</surname></string-name>, <string-name><given-names>V. C.</given-names> <surname>Strasburger</surname></string-name> and <string-name><given-names>K. J.</given-names> <surname>Mitchell</surname></string-name></person-group>, &#x201C;<article-title>Sexual media exposure, sexual behavior, and sexual violence victimization in adolescence</article-title>,&#x201D; <source>Clinical Pediatrics</source>, vol. <volume>53</volume>, no. <issue>13</issue>, pp. <fpage>1239</fpage>&#x2013;<lpage>1247</lpage>, <year>2014</year>; <pub-id pub-id-type="pmid">24928575</pub-id></mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M. L.</given-names> <surname>Ybarra</surname></string-name>, <string-name><given-names>K. L.</given-names> <surname>Mitchell</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Hamburger</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Diener-West</surname></string-name> and <string-name><given-names>P. J.</given-names> <surname>Leaf</surname></string-name></person-group>, &#x201C;<article-title>X-rated material and perpetration of sexually aggressive behavior among children and adolescents</article-title>,&#x201D; <source>Aggressive Behavior</source>, vol. <volume>37</volume>, no. <issue>1</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>18</lpage>, <year>2011</year>; <pub-id pub-id-type="pmid">21046607</pub-id></mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>I. M. A.</given-names> <surname>Agastya</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Setyanto</surname></string-name> and <string-name><given-names>D. O. D.</given-names> <surname>Handayani</surname></string-name></person-group>, &#x201C;<article-title>Convolutional neural network for pornographic images classification</article-title>,&#x201D; in <conf-name>Proc. of 4th Int. Conf. on Advances in Computing, Communication &#x0026; Automation</conf-name>, <publisher-loc>Taylor&#x2019;s University Lakeside Campus, Malaysia</publisher-loc>, pp. <fpage>1</fpage>&#x2013;<lpage>5</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Wehrmann</surname></string-name>, <string-name><given-names>G. S.</given-names> <surname>Sim&#x00F5;es</surname></string-name>, <string-name><given-names>R. C.</given-names> <surname>Barros</surname></string-name> and <string-name><given-names>V. F.</given-names> <surname>Cavalcante</surname></string-name></person-group>, &#x201C;<article-title>Adult content detection in videos with convolutional and recurrent neural networks</article-title>,&#x201D; <source>Neurocomputing</source>, vol. <volume>272</volume>, pp. <fpage>432</fpage>&#x2013;<lpage>438</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M. S.</given-names> <surname>Nadeem</surname></string-name>, <string-name><given-names>V. N.</given-names> <surname>Franqueira</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Zhai</surname></string-name> and <string-name><given-names>F.</given-names> <surname>Kurugollu</surname></string-name></person-group>, &#x201C;<article-title>A survey of deep learning solutions for multimedia visual content analysis</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>7</volume>, pp. <fpage>84003</fpage>&#x2013;<lpage>84019</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Moustafa</surname></string-name></person-group>, &#x201C;<article-title>Applying deep learning to classify pornographic images and videos</article-title>,&#x201D; <comment>ArXiv preprint arXiv: 1511.08899</comment>, <year>2015</year>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>N.</given-names> <surname>Sae-Bae</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Sun</surname></string-name>, <string-name><given-names>H. T.</given-names> <surname>Sencar</surname></string-name> and <string-name><given-names>N. D.</given-names> <surname>Memon</surname></string-name></person-group>, &#x201C;<article-title>Towards automatic detection of child pornography</article-title>,&#x201D; in <conf-name>Proc. IEEE Int. Conf. on Image Processing (ICIP)</conf-name>, <conf-loc>Paris, France</conf-loc>, pp. <fpage>5332</fpage>&#x2013;<lpage>5336</lpage>, <year>2014</year>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Lin</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Qin</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Peng</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>Shao</surname></string-name></person-group>, &#x201C;<article-title>Fine-grained pornographic image recognition with multiple feature fusion transfer learning</article-title>,&#x201D; <source>International Journal of Machine Learning and Cybernetics</source>, vol. <volume>12</volume>, no. <issue>1</issue>, pp. <fpage>73</fpage>&#x2013;<lpage>86</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Shen</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Wei</surname></string-name> and <string-name><given-names>Q.</given-names> <surname>Qian</surname></string-name></person-group>, &#x201C;<article-title>A pornographic image filtering model based on erotic part</article-title>,&#x201D; in <conf-name>Proc. 3rd Int. Congress on Image and Signal Processing</conf-name>, <conf-loc>Yantai, China</conf-loc>, pp. <fpage>2473</fpage>&#x2013;<lpage>2477</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>B.</given-names> <surname>Choi</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Han</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Chung</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Ryou</surname></string-name></person-group>, &#x201C;<article-title>Human body parts candidate segmentation using laws texture energy measures with skin color</article-title>,&#x201D; in <conf-name>Proc. 13th Int. Conf. on Advanced Communication Technology</conf-name>, <conf-loc>Gangwon-Do, South Korea</conf-loc>, pp. <fpage>556</fpage>&#x2013;<lpage>560</lpage>, <year>2011</year>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Tian</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Wei</surname></string-name> and <string-name><given-names>X.</given-names> <surname>Gao</surname></string-name></person-group>, &#x201C;<article-title>Color pornographic image detection based on color-saliency preserved mixture deformable part model</article-title>,&#x201D; <source>Multimedia Tools and Applications</source>, vol. <volume>77</volume>, no. <issue>6</issue>, pp. <fpage>6629</fpage>&#x2013;<lpage>6645</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>L.</given-names> <surname>Lv</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Zhao</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Lv</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Shang</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Yang</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Pornographic images detection using high-level semantic features</article-title>,&#x201D; in <conf-name>Proc. 7th Int. Conf. on Natural Computation</conf-name>, <conf-loc>Shanghai, China</conf-loc>, vol. <volume>2</volume>, pp. <fpage>1015</fpage>&#x2013;<lpage>1018</lpage>, <year>2011</year>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Huang</surname></string-name> and <string-name><given-names>A. W. K.</given-names> <surname>Kong</surname></string-name></person-group>, &#x201C;<article-title>Using a CNN ensemble for detecting pornographic and upskirt images</article-title>,&#x201D; in <conf-name>Proc 8th Int. Conf. on Biometrics Theory, Applications and Systems (BTAS)</conf-name>, <conf-loc>NY, USA</conf-loc>, pp. <fpage>1</fpage>&#x2013;<lpage>7</lpage>, <year>2016</year>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y. T.</given-names> <surname>Alaa</surname></string-name> and <string-name><given-names>A. J.</given-names> <surname>Hamid</surname></string-name></person-group>, &#x201C;<article-title>Increasing the reliability of skin detectors</article-title>,&#x201D; <source>Scientific Research and Essays</source>, vol. <volume>5</volume>, no. <issue>17</issue>, pp. <fpage>2480</fpage>&#x2013;<lpage>2490</lpage>, <year>2010</year>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Kim</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Hwang</surname></string-name> and <string-name><given-names>N. I.</given-names> <surname>Cho</surname></string-name></person-group>, &#x201C;<article-title>Convolutional neural networks and training strategies for skin detection</article-title>,&#x201D; in <conf-name>Proc. IEEE Int. Conf. on Image Processing (ICIP)</conf-name>, <conf-loc>Beijing, China</conf-loc>, pp. <fpage>3919</fpage>&#x2013;<lpage>3923</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Zuo</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Fan</surname></string-name>, <string-name><given-names>E.</given-names> <surname>Blasch</surname></string-name> and <string-name><given-names>H.</given-names> <surname>Ling</surname></string-name></person-group>, &#x201C;<article-title>Combining convolutional and recurrent neural networks for human skin detection</article-title>,&#x201D; <source>IEEE Signal Processing Letters</source>, vol. <volume>24</volume>, no. <issue>3</issue>, pp. <fpage>289</fpage>&#x2013;<lpage>293</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>L.</given-names> <surname>Huang</surname></string-name> and <string-name><given-names>X.</given-names> <surname>Ren</surname></string-name></person-group>, &#x201C;<article-title>Erotic image recognition method of bagging integrated convolutional neural network</article-title>,&#x201D; in <conf-name>Proc. of the 2nd Int. Conf. on Computer Science and Application Engineering</conf-name>, <conf-loc>Hohhot, China</conf-loc>, pp. <fpage>1</fpage>&#x2013;<lpage>7</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Jin</surname></string-name> and <string-name><given-names>X.</given-names> <surname>Tan</surname></string-name></person-group>, &#x201C;<article-title>Pornographic image recognition by strongly-supervised deep multiple instance learning</article-title>,&#x201D; in <conf-name>Proc. IEEE Int. Conf. on Image Processing (ICIP)</conf-name>, <conf-loc>Arizona, USA</conf-loc>, pp. <fpage>4418</fpage>&#x2013;<lpage>4422</lpage>, <year>2016</year>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Jin</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Wang</surname></string-name> and <string-name><given-names>X.</given-names> <surname>Tan</surname></string-name></person-group>, &#x201C;<article-title>Pornographic image recognition via weighted multiple instance learning</article-title>,&#x201D; <source>IEEE Transactions on Cybernetics</source>, vol. <volume>49</volume>, no. <issue>12</issue>, pp. <fpage>4412</fpage>&#x2013;<lpage>4420</lpage>, <year>2018</year>; <pub-id pub-id-type="pmid">30222590</pub-id></mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Perez</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Avila</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Moreira</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Moraes</surname></string-name>, <string-name><given-names>V.</given-names> <surname>Testoni</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Video pornography detection through deep learning techniques and motion information</article-title>,&#x201D; <source>Neurocomputing</source>, vol. <volume>230</volume>, pp. <fpage>279</fpage>&#x2013;<lpage>293</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>S. M.</given-names> <surname>Kia</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Rahmani</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Mortezaei</surname></string-name>, <string-name><given-names>M. E.</given-names> <surname>Moghaddam</surname></string-name> and <string-name><given-names>A.</given-names> <surname>Namazi</surname></string-name></person-group>, &#x201C;<article-title>A novel scheme for intelligent recognition of pornographic images</article-title>,&#x201D; arXiv preprint arXi<italic>v</italic>: 1402.5792., <year>2014</year>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>I. G. P. S.</given-names> <surname>Wijaya</surname></string-name>, <string-name><given-names>I. B. K.</given-names> <surname>Widiartha</surname></string-name> and <string-name><given-names>S. E.</given-names> <surname>Arjarwani</surname></string-name></person-group>, &#x201C;<article-title>Pornographic image recognition based on skin probability and eigenporn of skin ROIs images</article-title>,&#x201D; <source>TELKOMNIKA (Telecommunication Computing Electronics and Control)</source>, vol. <volume>13</volume>, no. <issue>3</issue>, pp. <fpage>985</fpage>&#x2013;<lpage>995</lpage>, <year>2015</year>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>N. A.</given-names> <surname>Dahoul</surname></string-name>, <string-name><given-names>H. A.</given-names> <surname>Karim</surname></string-name>, <string-name><given-names>M. H. L.</given-names> <surname>Abdullah</surname></string-name>, <string-name><given-names>M. F. A.</given-names> <surname>Fauzi</surname></string-name>, <string-name><given-names>A. S. B.</given-names> <surname>Wazir</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Transfer detection of YOLO to focus CNN&#x2019;s attention on nude regions for adult content detection</article-title>,&#x201D; <source>Symmetry</source>, vol. <volume>13</volume>, no. <issue>1</issue>, pp. <fpage>26</fpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>K. F.</given-names> <surname>GI</surname></string-name></person-group>, &#x201C;<article-title>A hierarchical neural network capable of visual pattern recognition</article-title>,&#x201D; <source>Neural Network</source>, vol. <volume>1</volume>, pp. <fpage>90014</fpage>, <year>1989</year>.</mixed-citation></ref>
<ref id="ref-33"><label>[33]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>LeCun</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Bottou</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Bengio</surname></string-name> and <string-name><given-names>P.</given-names> <surname>Haffner</surname></string-name></person-group>, &#x201C;<article-title>Gradient-based learning applied to document recognition</article-title>,&#x201D; <source>Proceedings of the IEEE</source>, vol. <volume>86</volume>, no. <issue>11</issue>, pp. <fpage>2278</fpage>&#x2013;<lpage>2324</lpage>, <year>1998</year>.</mixed-citation></ref>
<ref id="ref-34"><label>[34]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M. Z.</given-names> <surname>Alom</surname></string-name>, <string-name><given-names>T. M.</given-names> <surname>Taha</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Yakopcic</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Westberg</surname></string-name> and <string-name><given-names>P.</given-names> <surname>Sidike</surname></string-name></person-group>, &#x201C;<article-title>A State-of-the-art survey on deep learning theory and architectures</article-title>,&#x201D; <source>Electronics</source>, vol. <volume>8</volume>, no. <issue>3</issue>, pp. <fpage>292</fpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-35"><label>[35]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>He</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Ren</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Sun</surname></string-name></person-group>, &#x201C;<article-title>Deep residual learning for image recognition</article-title>,&#x201D; in <conf-name>Proc. of the IEEE Conf. on Computer Vision and Pattern Recognition</conf-name>, <conf-loc>NV, USA</conf-loc>, pp. <fpage>770</fpage>&#x2013;<lpage>778</lpage>, <year>2016</year>.</mixed-citation></ref>
<ref id="ref-36"><label>[36]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Wu</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Zhong</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>Liu</surname></string-name></person-group>, &#x201C;<article-title>Deep residual learning for image steganalysis</article-title>,&#x201D; <source>Multimedia Tools and Applications</source>, vol. <volume>77</volume>, no. <issue>9</issue>, pp. <fpage>10437</fpage>&#x2013;<lpage>10453</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-37"><label>[37]</label><mixed-citation publication-type="journal"><person-group person-group-type="author">
<string-name><given-names>A. Q.</given-names> <surname>Bhatti</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Umer</surname></string-name>,
<string-name><given-names>S. H.</given-names> <surname>Adil</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Ebrahim</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Nawaz</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Explicit content detection system: An approach towards a safe and ethical environment</article-title>,&#x201D; <source>Applied Computational Intelligence and Soft Computing</source>, vol. <volume>2018</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>13</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-38"><label>[38]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>I. M. A.</given-names> <surname>Agastya</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Setyanto</surname></string-name> and <string-name><given-names>D. O. D.</given-names> <surname>Handayani</surname></string-name></person-group>, &#x201C;<article-title>Convolutional neural network for pornographic images classification</article-title>,&#x201D; in <conf-name>Proc. of the 4th Int. Conf. on Advances in Computing, Communication &#x0026; Automation (ICACCA)</conf-name>, <conf-loc>Taylor&#x2019;s University Lakeside Campus, Malaysia</conf-loc>, pp. <fpage>1</fpage>&#x2013;<lpage>5</lpage>, <year>2018</year>.</mixed-citation></ref>
</ref-list>
</back>
</article>