<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CSSE</journal-id>
<journal-id journal-id-type="nlm-ta">CSSE</journal-id>
<journal-id journal-id-type="publisher-id">CSSE</journal-id>
<journal-title-group>
<journal-title>Computer Systems Science &#x0026; Engineering</journal-title>
</journal-title-group>
<issn pub-type="ppub">0267-6192</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">34475</article-id>
<article-id pub-id-type="doi">10.32604/csse.2023.034475</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Efficient Deep Learning Framework for Fire Detection in Complex Surveillance Environment</article-title><alt-title alt-title-type="left-running-head">Efficient Deep Learning Framework for Fire Detection in Complex Surveillance Environment</alt-title><alt-title alt-title-type="right-running-head">Efficient Deep Learning Framework for Fire Detection in Complex Surveillance Environment</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author">
<name name-style="western"><surname>Dilshad</surname><given-names>Naqqash</given-names></name>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Khan</surname><given-names>Taimoor</given-names></name>
<xref ref-type="aff" rid="aff-2">2</xref>
</contrib>
<contrib id="author-3" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Song</surname><given-names>JaeSeung</given-names></name>
<xref ref-type="aff" rid="aff-1">1</xref><email>jssong@sejong.ac.kr</email>
</contrib>
<aff id="aff-1"><label>1</label><institution>Department of Convergence Engineering for Intelligent Drone, Sejong University</institution>, <addr-line>Seoul, 05006</addr-line>, <country>Korea</country></aff>
<aff id="aff-2"><label>2</label><institution>Department of Computer Science, Islamia College Peshawar</institution>, <addr-line>Peshawar, 25120</addr-line>, <country>Pakistan</country></aff>
</contrib-group><author-notes><corresp id="cor1"><label>&#x002A;</label>Corresponding Author: JaeSeung Song. Email: <email>jssong@sejong.ac.kr</email></corresp></author-notes>
<pub-date date-type="collection" publication-format="electronic"><year>2023</year></pub-date>
<pub-date date-type="pub" publication-format="electronic"><day>17</day><month>1</month><year>2023</year></pub-date>
<volume>46</volume>
<issue>1</issue>
<fpage>749</fpage>
<lpage>764</lpage>
<history>
<date date-type="received"><day>18</day><month>7</month><year>2022</year></date>
<date date-type="accepted"><day>30</day><month>9</month><year>2022</year></date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2023 Dilshad, Khan and Song</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Dilshad, Khan and Song</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CSSE_34475.pdf"></self-uri>
<abstract><p>To prevent economic, social, and ecological damage, fire detection and management at an early stage are significant yet challenging. Although computationally complex networks have been developed, attention has been largely focused on improving accuracy, rather than focusing on real-time fire detection. Hence, in this study, the authors present an efficient fire detection framework termed E-FireNet for real-time detection in a complex surveillance environment. The proposed model architecture is inspired by the VGG16 network, with significant modifications including the entire removal of Block-5 and tweaking of the convolutional layers of Block-4. This results in higher performance with a reduced number of parameters and inference time. Moreover, smaller convolutional kernels are utilized, which are particularly designed to obtain the optimal details from input images, with numerous channels to assist in feature discrimination. In E-FireNet, three steps are involved: preprocessing of collected data, detection of fires using the proposed technique, and, if there is a fire, alarms are generated and transmitted to law enforcement, healthcare, and management departments. Moreover, E-FireNet achieves 0.98 accuracy, 1 precision, 0.99 recall, and 0.99 F1-score. A comprehensive investigation of various Convolutional Neural Network (CNN) models is conducted using the newly created Fire Surveillance SV-Fire dataset. The empirical results and comparison of numerous parameters establish that the proposed model shows convincing performance in terms of accuracy, model size, and execution time.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Deep learning</kwd>
<kwd>drone</kwd>
<kwd>embedded vision</kwd>
<kwd>emergency monitoring</kwd>
<kwd>fire classification</kwd>
<kwd>fire detection</kwd>
<kwd>IoT</kwd>
<kwd>search and rescue</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label><title>Introduction</title>
<p>In the last decade, drones have attracted considerable attention as a remote sensing platform with a wide range of applications, including traffic control, disaster response, crop protection, and satellite image analysis [<xref ref-type="bibr" rid="ref-1">1</xref>,<xref ref-type="bibr" rid="ref-2">2</xref>]. More recently, drones equipped with vision systems have been used for monitoring, perceiving, and analyzing active and passive threats at incident sites, for example, fires, floods, car accidents, and landslide-prone areas [<xref ref-type="bibr" rid="ref-3">3</xref>,<xref ref-type="bibr" rid="ref-4">4</xref>]. Additionally, their small size allows drones to be deployed rapidly and to support mission-critical decisions, enabling better resource allocation and risk reduction. Drones are expected to function in disaster-affected areas, where connections to cloud services may be unreliable and high-end equipment may not be readily accessible. To ensure operational performance and real-time analysis, a high level of autonomy is required. Therefore, autonomous Unmanned Aerial Vehicles (UAVs), as well as Unmanned Ground Vehicles (UGVs), rely on their onboard sensors and embedded microchips to perform tasks rather than sending the data to a central control station. Furthermore, drones can cover a larger area within a shorter time span when automatic route planning is combined with onboard visual sensors and autonomous navigation. However, their limited computational capability and power budget present their own set of challenges [<xref ref-type="bibr" rid="ref-5">5</xref>].</p>
<p>Deep Learning (DL), and CNNs in particular, have been widely adopted for a broad range of computer vision applications, such as activity recognition, person recognition, vehicle recognition, and classification [<xref ref-type="bibr" rid="ref-6">6</xref>&#x2013;<xref ref-type="bibr" rid="ref-10">10</xref>]. In prior research based on transfer learning, a pre-trained CNN was used as a feature extractor, and additional layers were added to perform classification for the task at hand; such techniques outperform standard machine learning methods that use handcrafted features. Even though CNNs have proven to be more effective in classification, their inference time is high on low-power devices, such as drones, which must perform multiple vision tasks simultaneously, because of their high computational requirements. For certain applications, a localized, integrated approach is preferable to cloud processing because of security and privacy concerns [<xref ref-type="bibr" rid="ref-11">11</xref>]. In addition, small CNNs can deliver the accuracy and performance needed for specialized applications where large amounts of data do not exist and computational resources are limited. Moreover, their training process is considerably easier, and they can be conveniently updated over the air owing to their computational efficiency.</p>
<p>Soft computing methods based on Traditional Fire Warning Systems (TFWSs) [<xref ref-type="bibr" rid="ref-12">12</xref>] and optical sensors to prevent flames from spreading have been extensively researched and developed. Various scalar sensors, including ocular sensors, inferno sensors, and smoke sensors placed close to the fire [<xref ref-type="bibr" rid="ref-13">13</xref>], have been used in TFWSs for fire detection. However, scalar sensor-based solutions do not provide additional information regarding the area coverage, level of burning, location of the fire, or size of the fire. Additionally, these sensor systems require human intervention, such as a visit to the fire site in the event of a disaster. To overcome these limitations, various visual sensor-based methods have been proposed [<xref ref-type="bibr" rid="ref-14">14</xref>]. In surveillance systems, for the autonomous observation of fire catastrophes, traditional, DL, and vision-driven techniques play a key role in fire detection [<xref ref-type="bibr" rid="ref-15">15</xref>,<xref ref-type="bibr" rid="ref-16">16</xref>]. These algorithms offer a number of advantages, such as rapid response, low human-intervention requirements, cost-effectiveness, and greater coverage. However, traditional fire detection is difficult and time-consuming because it relies on hand-crafted feature extraction, and the procedures for constructing and evaluating the features are tedious. In particular, early fire monitoring and alarm generation are difficult in traditional approaches because of fluctuating lighting conditions, shadows, reflections, and low detection accuracy. This study applies CNN models, motivated by their potential in several areas, such as fire detection in surveillance footage [<xref ref-type="bibr" rid="ref-17">17</xref>]. 
However, DL relies on an End-to-End (E2E) process for identifying features, which is computationally intensive and needs a considerable volume of training data. In this article, we propose an efficient VGG-based model, E-FireNet, that improves detection accuracy, has fewer parameters, and can be deployed in real scenarios. E-FireNet not only classifies fire and non-fire events but also identifies the Object-of-Interest (OoI) in the image. If the input image depicts a building fire or car fire, the model classifies it as a fire event on that specific object; if no fire is detected by the model, the image is classified as non-fire. Furthermore, the baseline methods use computationally complex models that cannot be deployed on a drone, which is a resource-constrained device with limited processing power. Therefore, a lightweight CNN model that can be deployed on drones is highly desirable. The following are the significant contributions of this study:<list list-type="bullet"><list-item>
<p>Early attempts at fire detection using pre-trained models with numerous parameters showed limited performance when the variation in the data was low. To address these problems, this study presents a CNN-based framework for early fire detection in images captured in diverse surveillance environments. Compared to renowned State-Of-The-Art (SOTA) models, the proposed E-FireNet model shows convincing performance with respect to accuracy (0.98) and time complexity on the Central Processing Unit (CPU) and Graphical Processing Unit (GPU), i.e., 22.17 and 30.58 Frames Per Second (FPS), respectively.</p></list-item><list-item>
<p>As the existing fire datasets include limited scenarios and are monotonous (i.e., they consider only fire and non-fire scenarios), models trained on them generalize poorly. Consequently, this study acquired a wide set of image samples containing real-world events such as building fire, car fire, and non-fire from various web sources, including social media platforms, news sources, Google images, and YouTube videos.</p></list-item><list-item>
<p>To verify the proposed method, this study conducted a comprehensive set of experiments over numerous pre-trained models such as NASNetMobile, MobileNetV1, EfficientNetB0, VGG19, and VGG16 using the newly generated Fire Surveillance (SV-Fire) dataset. Furthermore, to investigate the effectiveness, this article compared the performance of the proposed E-FireNet with the SOTA models with respect to the accuracy, parameters, and FPS.</p></list-item></list></p>
<p>The remainder of this paper is organized as follows. Section 2 reviews the relevant literature, Section 3 describes the proposed methodology, and Section 4 presents the experimental results. Finally, Section 5 summarizes the findings and suggests future directions.</p>
</sec>
<sec id="s2">
<label>2</label><title>Literature Review</title>
<p>Fire is an abnormal event that can cause serious injury and death and destroys precious resources within a short duration. Several techniques have been proposed to monitor and control fire events in cities to save lives and property. However, fire detection in real-time is a challenging task. For instance, a CNN-based technique was proposed in [<xref ref-type="bibr" rid="ref-18">18</xref>], which aims to improve the accuracy and reduce the false-alarm rate. To improve the performance, a pre-trained model with fine-tuning of the uppermost layers was used. Furthermore, experiments were conducted over benchmark datasets, and 94.43% accuracy was achieved. Another technique was proposed in [<xref ref-type="bibr" rid="ref-19">19</xref>] to build a fire detection system with fewer false alarms. Through the transfer learning technique, InceptionV3 achieved excellent performance on test data. Several researchers trained models using satellite-captured images for the classification of fire and normal scenes. The main aim was to extract the region of fire using a local binary pattern to reduce the false detection rate, and 98% accuracy was achieved.</p>
<p>The approach presented in [<xref ref-type="bibr" rid="ref-20">20</xref>] addressed the fire detection problem through classification and segmentation mechanisms. An artificial neural network was built for the binary classification, and 76% accuracy was achieved. In addition, a segmentation method was applied to determine the fire border; for the fire mask, U-Net was used for up- and down-sampling, yielding 92% precision and 84% recall. The technique introduced in [<xref ref-type="bibr" rid="ref-21">21</xref>] uses the You Only Look Once (YOLO) model for flame detection and extracts the visual features from video data frames. To overcome the overfitting problem and achieve efficient performance, augmentation techniques such as rotation, flipping, and brightness adjustment were applied. Another study [<xref ref-type="bibr" rid="ref-22">22</xref>] used a Faster Region-based CNN (Faster R-CNN) to detect fire and normal scenes in an image, and later extracted the spatial features via a CNN; for temporal features, Long Short-Term Memory (LSTM) was employed to classify the target scene.</p>
<p>In a recent study [<xref ref-type="bibr" rid="ref-23">23</xref>], a fast and accurate algorithm was developed for extracting spatial features from surveillance video data. Three different versions of SqueezeNet were analyzed to compare their classification performances. In-depth experiments were conducted, and accuracies of 95.02%, 98.46%, and 98.52% were obtained with SqueezeNet1, SqueezeNet2, and SqueezeNet3, respectively. The technique presented in [<xref ref-type="bibr" rid="ref-24">24</xref>] applied different CNNs such as AlexNet, GoogLeNet, and VGG16 to recognize different events (smoke, non-fire, flame). The experimental results showed that the VGG16 model achieved the best performance. Similarly, in another approach [<xref ref-type="bibr" rid="ref-25">25</xref>], a lightweight CNN model was developed for real-time flame detection, aimed mainly at monitoring and controlling fire at an early stage. A recent study [<xref ref-type="bibr" rid="ref-26">26</xref>] proposed a technique based on a deep saliency network for video-based smoke detection and compared the obtained results with those of Machine Learning (ML) and DL methods. The saliency network aimed to highlight the Region-of-Interest (RoI), i.e., the smoke area in the images. To further improve the model performance, various augmentation techniques were applied to the samples.</p>
<p>Several researchers from diverse backgrounds have applied lightweight models to detect and classify fire in real-time. For instance, the authors of [<xref ref-type="bibr" rid="ref-27">27</xref>] proposed a lightweight CNN for fire detection and classification. Furthermore, they measured the execution time to verify the model&#x2019;s suitability for real-time processing.</p>
<p>Similarly, the authors of [<xref ref-type="bibr" rid="ref-28">28</xref>] introduced a method that could detect fire in real-time in both indoor and outdoor surveillance videos. A multi-expert system was first employed to collect data based on color, shape, and motion analysis. The Bag-of-Words (BoW) approach was then applied for motion representation. In addition, real-time and web-scraped videos of fire were utilized in experiments in which the proposed model achieved better performance. Researchers in [<xref ref-type="bibr" rid="ref-29">29</xref>] introduced a real-time fire detection technique using a fusion algorithm and several sensors (smoke, flame, and temperature) for fire incident detection in indoor and outdoor domains. The reviewed literature is summarized in <xref ref-type="table" rid="table-1">Table 1</xref>.</p>
<table-wrap id="table-1"><label>Table 1</label>
<caption><title>Summary of the reviewed literature, where BD and CD denote benchmark and custom datasets, respectively</title></caption>
<table><colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead valign="top">
<tr>
<th>Ref No.</th>
<th>Description</th>
<th>Dataset/Type</th>
<th>Architecture</th>
<th>Scenario</th>
</tr>
</thead>
<tbody>
<tr>
<td>[<xref ref-type="bibr" rid="ref-18">18</xref>]</td>
<td>Authors proposed a deep learning architecture for surveillance videos, inspired by GoogLeNet.</td>
<td>Foggia/BD</td>
<td>CNN</td>
<td>Outdoor and indoor</td>
</tr>
<tr>
<td>[<xref ref-type="bibr" rid="ref-19">19</xref>]</td>
<td>Authors proposed a novel method for forest fire classification based on satellite images.</td>
<td>NASA worldview/CD</td>
<td>InceptionV3</td>
<td>Outdoor</td>
</tr>
<tr>
<td>[<xref ref-type="bibr" rid="ref-20">20</xref>]</td>
<td>Authors classified the presence and absence of fire in videos based on binary classification.</td>
<td>Fire flame/CD</td>
<td>ANN</td>
<td>Outdoor</td>
</tr>
<tr>
<td>[<xref ref-type="bibr" rid="ref-21">21</xref>]</td>
<td>Authors applied the YOLO model for flame detection and compared the results of YOLO with the SOTA shallow learning method.</td>
<td>Fire flame/CD</td>
<td>YOLO</td>
<td>Indoor</td>
</tr>
<tr>
<td>[<xref ref-type="bibr" rid="ref-22">22</xref>]</td>
<td>Authors used neural networks to detect fire and smoke in indoor and outdoor scenes in real-time using video data.</td>
<td>Fire and smoke/CD</td>
<td>RCNN, LSTM</td>
<td>Outdoor</td>
</tr>
<tr>
<td>[<xref ref-type="bibr" rid="ref-23">23</xref>]</td>
<td>Authors concatenated manual features with DL features to create fast and accurate smoke detection in a forest.</td>
<td>Smoke/CD</td>
<td>DL</td>
<td>Outdoor</td>
</tr>
<tr>
<td>[<xref ref-type="bibr" rid="ref-24">24</xref>]</td>
<td>Authors proposed a novel method to recognize video-based fire and smoke using the DL technique.</td>
<td>Fire and smoke/CD</td>
<td>CNN</td>
<td>Outdoor</td>
</tr>
<tr>
<td>[<xref ref-type="bibr" rid="ref-25">25</xref>]</td>
<td>Authors proposed a novel algorithm of CNN for real-time flame detection by pre-processing the fire videos.</td>
<td>Bilkent Uni. fire/BD</td>
<td>CNN</td>
<td>Indoor and outdoor</td>
</tr>
<tr>
<td>[<xref ref-type="bibr" rid="ref-26">26</xref>]</td>
<td>Authors proposed a method for real-time smoke detection in videos based on a deep saliency network.</td>
<td>Smoke/CD</td>
<td>Saliency Network</td>
<td>Indoor and outdoor</td>
</tr>
<tr>
<td>[<xref ref-type="bibr" rid="ref-27">27</xref>]</td>
<td>Authors presented a CNN architecture for fire detection in a surveillance scenario and compared the proposed method with SOTA techniques.</td>
<td>Foggia and BoWFire/BD</td>
<td>CNN</td>
<td>Indoor and outdoor</td>
</tr>
<tr>
<td>[<xref ref-type="bibr" rid="ref-28">28</xref>]</td>
<td>Authors proposed a technique capable of detecting fire at an early stage in real-time.</td>
<td>Fire flame/CD</td>
<td>Multi-Expert System</td>
<td>Indoor and outdoor</td>
</tr>
<tr>
<td>[<xref ref-type="bibr" rid="ref-29">29</xref>]</td>
<td>Authors developed a fire detection robot based on sensors, such as smoke sensors, temperature semiconductor sensors, and ultraviolet sensors.</td>
<td>N/A</td>
<td>Multi-Sensor IoT System</td>
<td>Indoor</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>In this study, a CNN architecture, E-FireNet, is employed for detecting fire. Several scenarios are covered by the custom SV-Fire dataset, namely a fire in a car, a fire in a building, and non-fire. The outdoor fire conditions considered here differ from those described in the previous literature: prior studies targeted only wildfire scenarios, with fire, smoke, non-fire, and non-smoke classes.</p>
</sec>
<sec id="s3">
<label>3</label><title>Proposed Methodology</title>
<p>The proposed framework involves three main steps. In the first step, the collected fire images are pre-processed to increase the number of samples. In the second step, images of diverse classes are input to the proposed efficient CNN model, which detects the fire and classifies it into the respective class. In the final step, a decision is made based on the predicted label for the given input image. If the predicted label is a fire in a building or a fire in a vehicle, an alert is sent to the nearest emergency response department so that early action can be taken. A detailed pictorial representation of the proposed framework is given in subsection 3.2; the step-wise procedure is presented in Algorithm 1. In the initialization step of Algorithm 1, the drone acquires a Video Stream (VS) and loads a pre-trained Fire Detection Model (FD<sub>M</sub>). When a frame is read, the RoI<sub>S</sub> is extracted and checked for the presence of fire. If it is a non-fire image, the next frame is selected; otherwise, if a fire is detected in the frame, an alert is generated and sent to the emergency department and disaster teams. The following sections briefly discuss each step of the proposed framework.</p>
<fig id="fig-7">
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_34475-fig-7.tif"/>
</fig>
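<p>The control flow of Algorithm 1 can be illustrated with a minimal Python sketch. It is not the authors&#x2019; implementation: the function names <monospace>monitor_stream</monospace>, <monospace>extract_roi</monospace>, <monospace>classify</monospace>, and <monospace>send_alert</monospace> are hypothetical placeholders for the video stream reader, the RoI extraction, the pre-trained fire detection model, and the alert channel, respectively.</p>
<code language="python"><![CDATA[
```python
# Sketch of the alert loop described for Algorithm 1.
# Every name below is a hypothetical placeholder, not part of E-FireNet itself.
FIRE_LABELS = {"building fire", "car fire"}


def monitor_stream(frames, extract_roi, classify, send_alert):
    """Scan frames in order and raise an alert for each frame classified as fire.

    frames      -- iterable of frames read from the video stream (VS)
    extract_roi -- callable returning the region of interest of a frame
    classify    -- callable wrapping the fire detection model (FD_M)
    send_alert  -- callable notifying the emergency department
    """
    alerts = []
    for index, frame in enumerate(frames):
        roi = extract_roi(frame)
        label = classify(roi)
        if label in FIRE_LABELS:
            send_alert(index, label)
            alerts.append((index, label))
        # Non-fire frames need no action; the loop moves on to the next frame.
    return alerts


# Stand-in stream for illustration: each "frame" is already its own label.
demo_frames = ["non-fire", "car fire", "non-fire", "building fire"]
sent_messages = []
demo_alerts = monitor_stream(
    demo_frames,
    extract_roi=lambda frame: frame,
    classify=lambda roi: roi,
    send_alert=lambda i, label: sent_messages.append((i, label)),
)
print(demo_alerts)
```
]]></code>
<p>In a deployment, <monospace>classify</monospace> would wrap the trained network and <monospace>send_alert</monospace> would contact the relevant departments; the loop structure itself stays the same.</p>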
<sec id="s3_1">
<label>3.1</label><title>Pre-processing of the Collected Data</title>
<p>Pre-processing refers to all the alterations [<xref ref-type="bibr" rid="ref-30">30</xref>] performed on the raw data before it is fed to the proposed E-FireNet model, which is an E2E model that detects fire in a complex surveillance environment. To achieve high performance, DL models require a large amount of training data; therefore, augmentation is applied to generate new samples for training. For augmentation, several operations are performed, including changes in alignment, position, and scale, as shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>. The applied position-based augmentation techniques are efficient and convenient for increasing the number of data samples. The experimental results before and after data augmentation are presented in subsection 4.3.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption><title>A variety of geometric transformations were undertaken in order to increase the number of samples in the dataset: (a) normal images, (b) horizontal flip, (c) scaling, and (d and e) rotation at various degrees</title></caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_34475-fig-1.tif"/>
</fig>
<p>Geometric transformations are applied to each normal image to obtain additional images. The input image is flipped horizontally and scaled. In addition, the input image is rotated by 10&#x00B0; and &#x2212;10&#x00B0;, i.e., in both directions. CNN architectures become more robust when trained on a large variety of samples, which improves their classification capability. The model must be familiar with objects of various sizes, their alignment, and all types of data compositions. Therefore, augmentation is used to generate new images that cover these different attributes. With augmented data, the DL model learns the same object from different angles and viewpoints, which leads to better generalization.</p>
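<p>The transformations listed above (horizontal flip, scaling, and rotation by &#x00B1;10&#x00B0;) can be sketched in plain Python on a small grayscale image stored as a list of rows. This is an illustrative sketch only: nearest-neighbour resampling, the zero fill value, the scale factor of 1.2, and the rotation sign convention are assumptions, since the paper does not specify them, and a real pipeline would typically rely on an image library.</p>
<code language="python"><![CDATA[
```python
import math


def hflip(img):
    """Mirror each row, i.e., flip the image horizontally."""
    return [row[::-1] for row in img]


def _resample(img, inverse_map, fill=0):
    """Build an output image by looking up, for every output pixel,
    the source pixel given by inverse_map (nearest neighbour)."""
    h, w = len(img), len(img[0])
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            sy, sx = inverse_map(y - cy, x - cx)
            iy, ix = int(round(sy + cy)), int(round(sx + cx))
            inside = 0 <= iy < h and 0 <= ix < w
            row.append(img[iy][ix] if inside else fill)
        out.append(row)
    return out


def rotate(img, degrees):
    """Rotate about the image centre; the sign convention is an assumption."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return _resample(img, lambda dy, dx: (c * dy - s * dx, s * dy + c * dx))


def scale(img, factor):
    """Zoom about the centre by `factor`, keeping the original image size."""
    return _resample(img, lambda dy, dx: (dy / factor, dx / factor))


def augment(img):
    """Return the five variants of Fig. 1: original, flip, scale, +10 and -10 degrees."""
    return [img, hflip(img), scale(img, 1.2), rotate(img, 10), rotate(img, -10)]


sample = [[4 * r + c for c in range(4)] for r in range(4)]
print(len(augment(sample)))
```
]]></code>
<p>Each call to <monospace>augment</monospace> turns one image into five, which is how such geometric transformations enlarge a training set without collecting new data.</p>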
</sec>
<sec id="s3_2">
<label>3.2</label><title>E-FireNet Framework</title>
<p>To monitor complex video surveillance, CNNs are often used for tasks such as activity and action recognition, anomaly detection, classification, and object detection [<xref ref-type="bibr" rid="ref-31">31</xref>&#x2013;<xref ref-type="bibr" rid="ref-35">35</xref>], as well as a wide range of other identification, medical image diagnosis, video summarization, and segmentation applications [<xref ref-type="bibr" rid="ref-36">36</xref>&#x2013;<xref ref-type="bibr" rid="ref-40">40</xref>]. The CNN architecture comprises three main components: the Convolution Layer (CL), the pooling layer, and the fully connected layer. A deep CNN includes a single input layer and a multitude of hidden, fully connected, and Softmax layers. To build feature maps, deep CNNs use a number of parameters, local receptive fields, and various kernels that highlight the important characteristics of the objects in the picture. For dimensionality reduction, the feature maps are sub-sampled with average, minimum, or maximum pooling.</p>
<p>Selecting a CNN architecture that achieves adequate results for a given situation while keeping the computational complexity in check is challenging. Each CNN has its own set of advantages and disadvantages; for example, the AlexNet and VGG16 architectures are comparatively easy to design and develop. AlexNet was showcased in the ImageNet contest and has become a benchmark architecture for DL. Increasing the number of CLs in a network is considered to enhance performance, as confirmed by the VGG model. As a robust feature extractor that can cope with large datasets and complex background identification tasks, the authors [<xref ref-type="bibr" rid="ref-24">24</xref>] suggested VGG16, a 16-layer architecture with the same filter size throughout and a considerable improvement in classification.</p>
<p>Despite their numerous advantages, VGG19 and VGG16 are not resource-friendly with respect to their overall size and number of training parameters. Architectures such as NASNetMobile, MobileNetV1, and EfficientNetB0 are robust and considerably less costly, as MobileNetV1 and NASNetMobile have been specifically developed for fast inference. Considering real-world implementation, computational cost, and the limitations of present lightweight models, this study proposes an efficient fire detection and classification model, E-FireNet. The proposed framework is presented in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>. Before developing the new E-FireNet framework, the performances of prominent CNN architectures pre-trained on ImageNet, including VGG16, ResNet50, MobileNetV1, and NASNetMobile, were examined. This article focuses on extracting fire regions from visual data. To this end, unlike previous CNNs, the model takes a smaller version of the captured image as input to recognize fire regions effectively. Block-5 of VGG16 was removed entirely to reduce the number of parameters and the training time. Despite its smaller number of parameters, the model achieved higher accuracy and higher FPS than other SOTA models. Moreover, the smaller input size helps to retrieve fine details, so that the classifier can learn more discriminative features.</p>
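<p>To give a sense of how much removing Block-5 saves, the following sketch counts the convolutional parameters of the standard VGG16 layout (3 &#x00D7; 3 kernels with biases) with and without its last block. It is only an approximation of E-FireNet: the modifications made to Block-4 are not reproduced here, so the second figure should be read as an estimate for a plain four-block VGG16, not as the parameter count of the proposed model.</p>
<code language="python"><![CDATA[
```python
def conv_params(in_channels, out_channels, kernel=3):
    """Weights plus biases of one convolutional layer."""
    return kernel * kernel * in_channels * out_channels + out_channels


# Standard VGG16 convolutional layout: (filters, number of conv layers) per block.
VGG16_BLOCKS = [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]


def count_conv_params(blocks, in_channels=3):
    """Sum the parameters of all convolutional layers in the given blocks."""
    total = 0
    for filters, n_layers in blocks:
        for _ in range(n_layers):
            total += conv_params(in_channels, filters)
            in_channels = filters
    return total


full_vgg16 = count_conv_params(VGG16_BLOCKS)
without_block5 = count_conv_params(VGG16_BLOCKS[:4])

print("VGG16 conv parameters:       ", full_vgg16)
print("Blocks 1-4 only:             ", without_block5)
print("Share removed with Block-5:  ", round(1 - without_block5 / full_vgg16, 3))
```
]]></code>
<p>Under these assumptions, Block-5 alone accounts for roughly half of the convolutional parameters, which is consistent with the motivation for removing it.</p>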
<fig id="fig-2">
<label>Figure 2</label>
<caption><title>Proposed E-FireNet framework for fire detection and classification. Initially, pre-processing of the collected data is performed, followed by fire detection using the proposed technique, and finally, in case of fire detection, alarm generation to the law enforcement, healthcare, and management departments</title></caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_34475-fig-2.tif"/>
</fig>
<p>The input image size for the proposed E-FireNet is 128 <bold>&#x00D7;</bold> 128 with 3 channels, and 32 distinct filters for red, green, and blue. To enable deep feature extraction, the number of filters increases with each successive block: 64, 128, 256, and 512 filters are used in the first, second, third, and fourth blocks, respectively. In each layer of the proposed E-FireNet model, a piecewise-linear activation function, the Rectified Linear Unit (ReLU), is applied, which passes the input through unchanged if it is positive and outputs zero otherwise. Subsequently, the output of the fourth block is forwarded to the pooling stage, where Global Average Pooling is applied, and is finally passed to the Softmax layer, which produces a probability distribution over the three classes, namely building fire, car fire, and non-fire. <xref ref-type="table" rid="table-5">Table 5</xref> in subsection 4.3 lists the training parameters of the proposed model.</p>
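<p>The building blocks named above (ReLU, Global Average Pooling, and Softmax over three classes) can be written out in a few lines of plain Python. The sketch below is illustrative and not the authors&#x2019; code; in particular, the helper <monospace>spatial_size</monospace> assumes one 2 &#x00D7; 2 pooling step after each of the four blocks, as in VGG16, which the text does not state explicitly.</p>
<code language="python"><![CDATA[
```python
import math

CLASSES = ["building fire", "car fire", "non-fire"]


def relu(x):
    """Pass positive inputs through unchanged, map everything else to zero."""
    return x if x > 0 else 0.0


def global_average_pool(feature_maps):
    """Reduce each feature map (a list of rows) to its mean value."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]


def softmax(logits):
    """Turn raw class scores into probabilities that sum to one."""
    peak = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def spatial_size(input_size, n_pools):
    """Feature-map side length after n_pools 2x2 poolings (assumed layout)."""
    return input_size // (2 ** n_pools)


# Illustrative scores only; they do not come from a trained model.
probabilities = softmax([2.0, 1.0, 0.1])
predicted = CLASSES[probabilities.index(max(probabilities))]
print(predicted, spatial_size(128, 4))
```
]]></code>
<p>Because Global Average Pooling collapses each feature map to a single number, the classifier head needs far fewer weights than the flattened, fully connected head of the original VGG16.</p>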

</sec>
</sec>
<sec id="s4">
<label>4</label><title>Experimental Results</title>
<p>This section describes the evaluation metrics, the collected dataset, and the graphical outcomes. The experimental setup and performance metrics are described first, followed by a discussion of the SV-Fire dataset and the evaluation of the results. All the models, including the proposed E-FireNet, were trained for a total of 30 epochs with a low learning rate to ensure that each model retained most of its previously learned knowledge. The pre-trained model progressively updates the learning parameters for optimum performance on the intended dataset. Subsection 4.3 compares the proposed model with the SOTA models and lists the main hyper-parameters utilized in these experiments. Based on the results, each model was retrained with its default input size and a batch size of 16, using the Stochastic Gradient Descent (SGD) optimizer with a learning rate of 1e-4 and a momentum of 0.9. The experiments were conducted on an NVIDIA RTX 2060 Super GPU with 32-GB of onboard memory, using the Keras DL framework with a TensorFlow back-end. As shown in the following equations, the performance of the proposed model was assessed using multiple evaluation metrics, including accuracy, precision, recall, and F1-score.</p>
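<p>The role of the two optimizer settings can be illustrated with a small stand-alone sketch of the SGD-with-momentum update, using the learning rate (1e-4) and momentum (0.9) reported above. The update rule shown is the commonly used velocity form; the weight and gradient values are made up for illustration and are unrelated to the actual experiments, which were run with Keras.</p>
<code language="python"><![CDATA[
```python
LEARNING_RATE = 1e-4
MOMENTUM = 0.9


def sgd_momentum_step(weights, grads, velocity,
                      lr=LEARNING_RATE, momentum=MOMENTUM):
    """One SGD step with momentum.

    velocity <- momentum * velocity - lr * gradient
    weight   <- weight + velocity
    """
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_weights = [w + v for w, v in zip(weights, new_velocity)]
    return new_weights, new_velocity


# Toy example with a constant gradient: the step size grows as velocity builds up.
weights, velocity = [1.0], [0.0]
for step in range(3):
    weights, velocity = sgd_momentum_step(weights, [2.0], velocity)
    print(step, weights[0], velocity[0])
```
]]></code>
<p>With such a small learning rate, each step changes the weights only slightly, which is what allows a pre-trained network to keep most of what it has already learned while adapting to the new dataset.</p>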
<sec id="s4_1">
<label>4.1</label><title>Evaluation Metrics</title>
<p>In a classification problem, accuracy is defined as the ratio of correct predictions to the total number of predictions made by the model,</p>
<p><disp-formula id="eqn-1"><label>(1)</label>
<mml:math id="mml-eqn-1" display="block"><mml:mi>A</mml:mi><mml:mi>c</mml:mi><mml:mi>c</mml:mi><mml:mi>u</mml:mi><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>c</mml:mi><mml:mi>y</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>T</mml:mi><mml:mi>N</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>T</mml:mi><mml:mi>N</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mtext>&#x00A0;</mml:mtext></mml:math>
</disp-formula></p>
<p>where TP is True Positive, TN is True Negative, FP is False Positive, and FN is False Negative.</p>
<p>Precision indicates the fraction of the images predicted as fire that truly contain fire. The predicted positives (images predicted to be fire) comprise TP and FP, of which only the TP are actual fire images.</p>
<p><disp-formula id="eqn-2"><label>(2)</label>
<mml:math id="mml-eqn-2" display="block"><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>P</mml:mi></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mtext>&#x00A0;&#x00A0;&#x00A0;</mml:mtext></mml:math>
</disp-formula></p>
<p>Recall indicates the fraction of the actual fire images in the dataset that the model predicted as fire. The actual positives comprise TP and FN, of which the TP are the fire images correctly predicted by the model.</p>
<p><disp-formula id="eqn-3"><label>(3)</label>
<mml:math id="mml-eqn-3" display="block"><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mtext>&#x00A0;&#x00A0;</mml:mtext></mml:math>
</disp-formula></p>
<p>The F1-score is the harmonic mean of precision and recall.</p>
<p><disp-formula id="eqn-4"><label>(4)</label>
<mml:math id="mml-eqn-4" display="block"><mml:mi>F</mml:mi><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>s</mml:mi><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mo>=</mml:mo><mml:mn>2</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>l</mml:mi></mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo>+</mml:mo><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>.</mml:mo><mml:mtext>&#x00A0;</mml:mtext></mml:math>
</disp-formula></p>
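<p>As an illustration of Eqs. (1)&#x2013;(4), the following sketch derives the per-class TP, TN, FP, and FN counts from a multi-class confusion matrix in a one-vs-rest manner and evaluates the four metrics. The confusion matrix values and all function names are hypothetical and serve only as an example; they are not results from this study.</p>
<preformat>
```python
def per_class_counts(confusion, k):
    """Return (TP, TN, FP, FN) for class k, treating all other classes as negative.

    confusion[i][j] is the number of samples of true class i predicted as class j.
    """
    tp = confusion[k][k]
    fn = sum(confusion[k]) - tp                      # true class k, predicted otherwise
    fp = sum(row[k] for row in confusion) - tp       # predicted k, true class differs
    tn = sum(map(sum, confusion)) - tp - fn - fp
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    return (tp + tn) / (tp + tn + fp + fn)           # Eq. (1)


def precision(tp, fp):
    return tp / (tp + fp) if (tp + fp) else 0.0      # Eq. (2)


def recall(tp, fn):
    return tp / (tp + fn) if (tp + fn) else 0.0      # Eq. (3)


def f1_score(p, r):
    return 2 * p * r / (p + r) if (p + r) else 0.0   # Eq. (4)


# Hypothetical 3-class confusion matrix (rows: true class, columns: predicted class),
# ordered as building fire, car fire, non-fire.
confusion = [
    [45, 3, 2],
    [4, 44, 2],
    [1, 1, 48],
]
tp, tn, fp, fn = per_class_counts(confusion, 0)
building_fire_precision = precision(tp, fp)
building_fire_recall = recall(tp, fn)
building_fire_f1 = f1_score(building_fire_precision, building_fire_recall)
```
</preformat>
<p>As a consistency check against Table 4, a precision of 0.98 and a recall of 0.96 (E-FireNet, car fire class, after augmentation) yield an F1-score of approximately 0.97 under Eq. (4), which matches the tabulated value.</p>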
</sec>
<sec id="s4_2">
<label>4.2</label><title>Dataset Collection</title>
<p>Finding appropriate data for evaluation is a difficult and time-consuming process. The authors could not find a publicly accessible dataset that satisfied the requirements for fire detection in buildings and cars. Owing to these specific requirements, a novel SV-Fire dataset was developed by collecting images from a variety of online sources. The major goal was to collect images of fires in buildings and cars. These high-resolution pictures depict different settings and lighting conditions. To make the task more challenging, a non-fire class was added to the dataset, containing photos with an orange and red tint as well as cars painted with fire decals. The overall statistics of SV-Fire are listed in <xref ref-type="table" rid="table-2">Table 2</xref>.</p>
<table-wrap id="table-2"><label>Table 2</label>
<caption><title>Overall statistics of the newly created SV-Fire dataset with a total of 1500 images: 1050 for training, 150 for testing, and 300 for validation</title></caption>
<table><colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Dataset</th>
<th>Training</th>
<th>Testing</th>
<th>Validation</th>
<th>Total</th>
</tr>
</thead>
<tbody>
<tr>
<td>Before augmentation</td>
<td>1050</td>
<td>150</td>
<td>300</td>
<td>1500</td>
</tr>
<tr>
<td>After augmentation</td>
<td>4200</td>
<td>600</td>
<td>1200</td>
<td>6000</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>This article presents several sample images as well as the general statistics of the newly created dataset. The code and dataset are publicly available at the following link: (<ext-link ext-link-type="uri" xlink:href="https://github.com/NaqqashDilshad/E-FireNet">https://github.com/NaqqashDilshad/E-FireNet</ext-link>). The total number of images in the SV-Fire dataset is 1500, while after augmentation the total reaches 6000. There are three subgroups in the SV-Fire dataset: training, validation, and testing. The training set comprises 70% of the total dataset, while the validation set comprises 20%, and the testing set is only 10%. A few instances from the recently collected dataset are presented in <xref ref-type="fig" rid="fig-3">Fig. 3</xref>.</p>
<fig id="fig-3">
<label>Figure 3</label>
<caption><title>Sample images from our newly created SV-Fire dataset. The first, second, and third rows contain building fire, car fire, and non-fire images, respectively. Each row has five pictures. To make the dataset more challenging, images with an orange tint and fire look-alikes were added</title></caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_34475-fig-3.tif"/>
</fig>
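<p>The split sizes reported in Table 2 can be reproduced with the short calculation below. It is a sketch that assumes the 70/20/10 percentages are applied to the 1500 collected images and that augmentation enlarges every subset by the same factor (6000 / 1500 = 4, as implied by Table 2); the augmentation operations themselves are not modeled, and the names used are illustrative.</p>
<preformat>
```python
def split_counts(total, percentages):
    """Return the number of images per subset for the given integer percentages."""
    counts = {name: total * pct // 100 for name, pct in percentages.items()}
    if sum(counts.values()) != total:
        raise ValueError("percentages do not partition the dataset exactly")
    return counts


PERCENTAGES = {"training": 70, "validation": 20, "testing": 10}
TOTAL_BEFORE = 1500
TOTAL_AFTER = 6000

before = split_counts(TOTAL_BEFORE, PERCENTAGES)

# Assumption: each subset grows by the same factor implied by Table 2.
augmentation_factor = TOTAL_AFTER // TOTAL_BEFORE
after = {name: count * augmentation_factor for name, count in before.items()}
```
</preformat>
<p>Under these assumptions, the computed counts agree with the training, validation, and testing columns of Table 2 both before and after augmentation.</p>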
</sec>
<sec id="s4_3">
<label>4.3</label><title>Performance Comparison with SOTA</title>
<p>The proposed model was compared with different pre-trained CNN-based architectures for fire detection. The models were compared with respect to the number of parameters, precision, recall, F1-score, and accuracy, as shown in <xref ref-type="table" rid="table-3">Tables 3</xref> and <xref ref-type="table" rid="table-4">4</xref>. After augmentation, NASNetMobile and MobileNetV1 have the lowest accuracies (59% and 77%, respectively), whereas VGG16, VGG19, and the proposed E-FireNet model achieve high accuracies of 98%, 95%, and 98%, respectively. In addition, a comparison of the proposed model with MobileNetV1 shows that although both models are computationally efficient, they differ markedly in accuracy: E-FireNet is approximately 21 percentage points more accurate than MobileNetV1 (98% vs. 77%).</p>
<table-wrap id="table-3"><label>Table 3</label>
<caption><title>Overview of the comparison of the input size and network training parameters of the proposed E-FireNet with the SOTA models</title></caption>
<table><colgroup>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Model</th>
<th>Input size</th>
<th>Batch size</th>
<th>Parameters (million)</th>
</tr>
</thead>
<tbody>
<tr>
<td>NASNetMobile</td>
<td>224 &#x00D7; 224</td>
<td>16</td>
<td>4.27</td>
</tr>
<tr>
<td>MobileNetV1</td>
<td>224 &#x00D7; 224</td>
<td>16</td>
<td>3.22</td>
</tr>
<tr>
<td>EfficientNetB0</td>
<td>224 &#x00D7; 224</td>
<td>16</td>
<td>4.04</td>
</tr>
<tr>
<td>VGG19</td>
<td>224 &#x00D7; 224</td>
<td>16</td>
<td>139.58</td>
</tr>
<tr>
<td>VGG16</td>
<td>224 &#x00D7; 224</td>
<td>16</td>
<td>134.27</td>
</tr>
<tr>
<td>E-FireNet</td>
<td>128 &#x00D7; 128</td>
<td>16</td>
<td>7.63</td>
</tr>
</tbody>
</table>
</table-wrap><table-wrap id="table-4"><label>Table 4</label>
<caption><title>Evaluation of the proposed model E-FireNet against the SOTA models utilizing the SV-Fire dataset</title></caption>
<table><colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead valign="top">
<tr>
<th rowspan="2">Model</th>
<th rowspan="2">Class</th>
<th colspan="4" align="center">Before augmentation</th>
<th colspan="4" align="center">After augmentation</th>
</tr>
<tr>
<th>Precision</th>
<th>Recall</th>
<th>F1-score</th>
<th>Accuracy</th>
<th>Precision</th>
<th>Recall</th>
<th>F1-score</th>
<th>Accuracy</th>
</tr>
</thead>
<tbody valign="top">
<tr>
<td rowspan="3">NASNetMobile</td>
<td>Car fire</td>
<td>0.55</td>
<td>0.54</td>
<td>0.55</td>
<td rowspan="3">0.59</td>
<td>0.64</td>
<td>0.44</td>
<td>0.52</td>
<td rowspan="3">0.59</td>
</tr>
<tr>
<td>Building fire</td>
<td>0.54</td>
<td>0.73</td>
<td>0.62</td>
<td>0.58</td>
<td>0.58</td>
<td>0.58</td>
</tr>
<tr>
<td>Non-fire</td>
<td>0.69</td>
<td>0.51</td>
<td>0.59</td>
<td>0.56</td>
<td>0.74</td>
<td>0.64</td>
</tr>
<tr>
<td rowspan="3">MobileNetV1</td>
<td>Car fire</td>
<td>0.59</td>
<td>0.54</td>
<td>0.57</td>
<td rowspan="3">0.62</td>
<td>0.75</td>
<td>0.77</td>
<td>0.76</td>
<td rowspan="3">0.77</td>
</tr>
<tr>
<td>Building fire</td>
<td>0.53</td>
<td>0.58</td>
<td>0.55</td>
<td>0.75</td>
<td>0.70</td>
<td>0.73</td>
</tr>
<tr>
<td>Non-fire</td>
<td>0.72</td>
<td>0.72</td>
<td>0.72</td>
<td>0.81</td>
<td>0.84</td>
<td>0.83</td>
</tr>
<tr>
<td rowspan="3">EfficientNetB0</td>
<td>Car fire</td>
<td>0.89</td>
<td>0.88</td>
<td>0.88</td>
<td rowspan="3">0.89</td>
<td>0.94</td>
<td>0.93</td>
<td>0.94</td>
<td rowspan="3">0.95</td>
</tr>
<tr>
<td>Building fire</td>
<td>0.84</td>
<td>0.91</td>
<td>0.87</td>
<td>0.92</td>
<td>0.94</td>
<td>0.93</td>
</tr>
<tr>
<td>Non-fire</td>
<td>0.93</td>
<td>0.88</td>
<td>0.90</td>
<td>0.98</td>
<td>0.97</td>
<td>0.97</td>
</tr>
<tr>
<td rowspan="3">VGG19</td>
<td>Car fire</td>
<td>0.92</td>
<td>0.92</td>
<td>0.92</td>
<td rowspan="3">0.92</td>
<td>0.99</td>
<td>0.88</td>
<td>0.93</td>
<td rowspan="3">0.95</td>
</tr>
<tr>
<td>Building fire</td>
<td>0.91</td>
<td>0.89</td>
<td>0.90</td>
<td>0.90</td>
<td>0.99</td>
<td>0.94</td>
</tr>
<tr>
<td>Non-fire</td>
<td>0.93</td>
<td>0.95</td>
<td>0.94</td>
<td>0.98</td>
<td>0.99</td>
<td>0.99</td>
</tr>
<tr>
<td rowspan="3">VGG16</td>
<td>Car fire</td>
<td>0.91</td>
<td>0.83</td>
<td>0.87</td>
<td rowspan="3">0.90</td>
<td>0.98</td>
<td>0.97</td>
<td>0.98</td>
<td rowspan="3">0.98</td>
</tr>
<tr>
<td>Building fire</td>
<td>0.84</td>
<td>0.91</td>
<td>0.87</td>
<td>0.96</td>
<td>0.98</td>
<td>0.97</td>
</tr>
<tr>
<td>Non-fire</td>
<td>0.95</td>
<td>0.95</td>
<td>0.95</td>
<td>0.99</td>
<td>0.99</td>
<td>0.99</td>
</tr>
<tr>
<td rowspan="3">E-FireNet</td>
<td>Car fire</td>
<td>0.82</td>
<td>0.77</td>
<td>0.80</td>
<td rowspan="3">0.81</td>
<td>0.98</td>
<td>0.96</td>
<td>0.97</td>
<td rowspan="3">0.98</td>
</tr>
<tr>
<td>Building fire</td>
<td>0.71</td>
<td>0.78</td>
<td>0.74</td>
<td>0.95</td>
<td>0.99</td>
<td>0.97</td>
</tr>
<tr>
<td>Non-fire</td>
<td>0.89</td>
<td>0.88</td>
<td>0.88</td>
<td>1.00</td>
<td>0.99</td>
<td>0.99</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="table-5"><label>Table 5</label>
<caption><title>The proposed E-FireNet summary with training parameters</title></caption>
<table><colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Layer (type)</th>
<th>Filter</th>
<th>Kernel size</th>
<th>Stride</th>
<th>Parameters (million)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Conv_1</td>
<td>64</td>
<td>(3, 3)</td>
<td>1</td>
<td>0.001792</td>
</tr>
<tr>
<td>Conv_2</td>
<td>64</td>
<td>(3, 3)</td>
<td>1</td>
<td>0.036928</td>
</tr>
<tr>
<td>Max_Pool</td>
<td>&#x2013;</td>
<td>(3, 3)</td>
<td>2</td>
<td>&#x2013;</td>
</tr>
<tr>
<td>Conv_3</td>
<td>128</td>
<td>(3, 3)</td>
<td>1</td>
<td>0.073856</td>
</tr>
<tr>
<td>Conv_4</td>
<td>128</td>
<td>(3, 3)</td>
<td>1</td>
<td>0.147584</td>
</tr>
<tr>
<td>Max_Pool</td>
<td>&#x2013;</td>
<td>(3, 3)</td>
<td>2</td>
<td>&#x2013;</td>
</tr>
<tr>
<td>Conv_5</td>
<td>256</td>
<td>(3, 3)</td>
<td>1</td>
<td>0.295168</td>
</tr>
<tr>
<td>Conv_6</td>
<td>256</td>
<td>(3, 3)</td>
<td>1</td>
<td>0.59008</td>
</tr>
<tr>
<td>Conv_7</td>
<td>256</td>
<td>(3, 3)</td>
<td>1</td>
<td>0.59008</td>
</tr>
<tr>
<td>Max_Pool</td>
<td>&#x2013;</td>
<td>(3, 3)</td>
<td>2</td>
<td>&#x2013;</td>
</tr>
<tr>
<td>Conv_8</td>
<td>512</td>
<td>(3, 3)</td>
<td>1</td>
<td>1.18016</td>
</tr>
<tr>
<td>Conv_9</td>
<td>512</td>
<td>(3, 3)</td>
<td>1</td>
<td>2.359808</td>
</tr>
<tr>
<td>Conv_10</td>
<td>512</td>
<td>(3, 3)</td>
<td>1</td>
<td>2.359808</td>
</tr>
<tr>
<td>Max_Pool</td>
<td>&#x2013;</td>
<td>(3, 3)</td>
<td>2</td>
<td>&#x2013;</td>
</tr>
<tr>
<td>Global_Avg_Pool</td>
<td>512</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
</tr>
<tr>
<td>Softmax (3)</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>0.001539</td>
</tr>
<tr>
<td>Total parameters</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>7.63</td>
</tr>
</tbody>
</table>
</table-wrap>
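<p>The per-layer parameter counts in Table 5 follow from the standard formula for a 2-D convolution (kernel height &#x00D7; kernel width &#x00D7; input channels &#x00D7; filters, plus one bias per filter). The sketch below recomputes them from the filter counts in Table 5, assuming a 3-channel input, 3 &#x00D7; 3 kernels with biases, and a 3-unit Softmax classifier fed by the 512 globally averaged features; the function names are illustrative.</p>
<preformat>
```python
def conv2d_params(in_channels, filters, kernel=(3, 3)):
    """Weights plus one bias per filter for a standard 2-D convolution."""
    return kernel[0] * kernel[1] * in_channels * filters + filters


def dense_params(in_features, units):
    """Weights plus one bias per unit for a fully connected layer."""
    return in_features * units + units


# Filter counts of Conv_1 .. Conv_10 as listed in Table 5.
CONV_FILTERS = [64, 64, 128, 128, 256, 256, 256, 512, 512, 512]


def efirenet_params(input_channels=3, num_classes=3):
    """Return a dict mapping each trainable layer to its parameter count."""
    per_layer = {}
    channels = input_channels
    for index, filters in enumerate(CONV_FILTERS, start=1):
        per_layer[f"Conv_{index}"] = conv2d_params(channels, filters)
        channels = filters
    # Pooling layers have no trainable parameters; global average pooling
    # leaves one feature per channel for the Softmax classifier.
    per_layer["Softmax"] = dense_params(channels, num_classes)
    return per_layer


layers = efirenet_params()
total = sum(layers.values())
```
</preformat>
<p>Under these assumptions, every per-layer value matches Table 5 (for example, 1792 for Conv_1 and 1539 for the Softmax layer), and the sum is 7,636,803 parameters, which corresponds to the 7.63 million reported in the table when truncated to two decimals.</p>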
<p>A comparison of the proposed model with VGG16 indicates that the results of VGG16 are close to those of the proposed model. However, VGG16 is considerably heavier, with 134.27 million parameters compared with 7.63 million for E-FireNet. The performance of the pre-trained models is listed in <xref ref-type="table" rid="table-4">Table 4</xref>. It can be observed that the pre-trained models achieve high performance with a low false-alarm rate; however, their overall false prediction rate remains high and needs to be reduced. Therefore, this research explored a fine-tuned and pre-trained convolutional neural network architecture (E-FireNet) with respect to accuracy and incorrect predictions. After tuning, E-FireNet attains the best performance among the compared models with fewer false predictions.</p>

<p>The confusion matrix for each SOTA model trained on the custom SV-Fire dataset is depicted in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>. The red diagonal corresponds to the TP, and the color saturation indicates how accurately each class is classified. The proposed E-FireNet exhibits better overall classification accuracy than the SOTA models, although some images in all three categories (building fire, car fire, and non-fire) are misclassified. The training accuracy and training loss graphs are shown in <xref ref-type="fig" rid="fig-5">Fig. 5</xref>; the vertical axis represents the accuracy or loss, whereas the horizontal axis shows the number of epochs. It is evident from <xref ref-type="fig" rid="fig-5">Fig. 5</xref> that E-FireNet is effective for fire detection. As training progresses, the training and validation accuracy curves of the model change as depicted in <xref ref-type="fig" rid="fig-5">Fig. 5a</xref>. The proposed E-FireNet converges at 27 epochs, with the training and validation accuracies reaching 100% and 98%, respectively. Likewise, the training and validation loss values drop to 0.0 and 0.09, respectively, as depicted in <xref ref-type="fig" rid="fig-5">Fig. 5b</xref>.</p>
<fig id="fig-4">
<label>Figure 4</label>
<caption><title>Confusion matrices of the various CNN models against E-FireNet</title></caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_34475-fig-4.tif"/>
</fig><fig id="fig-5">
<label>Figure 5</label>
<caption><title>The proposed E-FireNet training accuracy and training loss. (a) Model accuracy (b) model loss</title></caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_34475-fig-5.tif"/>
</fig>
</sec>
<sec id="s4_4">
<label>4.4</label><title>Time Complexity Analysis</title>
<p>To assess a deep model&#x2019;s effectiveness, its performance and deployment potential must be evaluated in real time across various devices, including a CPU and a GPU. The specifications of the CPU and GPU employed for analyzing the frames per second (FPS) of the proposed E-FireNet model are listed in Section 4. The criterion for real-time applicability is that a model achieving 30 or more FPS is considered optimal for real-world scenarios. The proposed E-FireNet model achieves 22.17 FPS on the CPU and 30.58 FPS on the GPU. <xref ref-type="fig" rid="fig-6">Fig. 6</xref> compares the proposed E-FireNet model in terms of the FPS with several baseline models.</p>
<fig id="fig-6">
<label>Figure 6</label>
<caption><title>Comparison of the proposed E-FireNet with various deep models in terms of FPS</title></caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CSSE_34475-fig-6.tif"/>
</fig>
<p>The experimental results show that, on the CPU and GPU, respectively, NASNetMobile achieves 15.23 and 18.96 FPS, EfficientNetB0 achieves 18.28 and 23.73 FPS, VGG19 achieves 6.81 and 25.32 FPS, VGG16 achieves 7.68 and 26.42 FPS, and MobileNetV1 achieves 22.09 and 30.43 FPS. Compared with these baseline models, the proposed E-FireNet achieves the highest FPS on both devices (22.17 and 30.58). Thus, the proposed E-FireNet model is capable of real-time processing and operation on the GPU, where it exceeds the 30-FPS criterion.</p>
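<p>A simple way to obtain such FPS figures is to time a model&#x2019;s per-frame inference over a batch of frames and divide the frame count by the elapsed time. The sketch below illustrates this and applies the 30-FPS criterion to the values reported in this section. The helper names, the warm-up count, and the timing procedure are assumptions for illustration; the article does not describe its exact measurement protocol.</p>
<preformat>
```python
import time


def measure_fps(process_frame, frames, warmup=2):
    """Average frames per second of process_frame over the given frames."""
    for frame in frames[:warmup]:        # warm-up calls are excluded from timing
        process_frame(frame)
    start = time.perf_counter()
    for frame in frames:
        process_frame(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed if elapsed > 0 else float("inf")


def is_real_time(fps, threshold=30.0):
    """Real-time criterion used in this section: 30 or more FPS."""
    return fps >= threshold


# FPS values as reported in this section (CPU, GPU).
REPORTED_FPS = {
    "NASNetMobile": {"CPU": 15.23, "GPU": 18.96},
    "EfficientNetB0": {"CPU": 18.28, "GPU": 23.73},
    "VGG19": {"CPU": 6.81, "GPU": 25.32},
    "VGG16": {"CPU": 7.68, "GPU": 26.42},
    "MobileNetV1": {"CPU": 22.09, "GPU": 30.43},
    "E-FireNet": {"CPU": 22.17, "GPU": 30.58},
}

real_time_configs = {
    (model, device)
    for model, per_device in REPORTED_FPS.items()
    for device, fps in per_device.items()
    if is_real_time(fps)
}

# Hypothetical stand-in for a model's inference call, used only to exercise the timer.
dummy_fps = measure_fps(lambda frame: sum(frame), [[0.0] * 64 for _ in range(50)])
```
</preformat>
<p>Applied to the reported values, only MobileNetV1 and E-FireNet on the GPU satisfy the 30-FPS criterion, with E-FireNet being marginally faster.</p>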
</sec>
</sec>
<sec id="s5">
<label>5</label><title>Conclusion</title>
<p>To reduce social, environmental, and financial damage, CNN-based smart monitoring systems have been used to classify fire scenes in the early stages. Nevertheless, existing research focuses on enhancing accuracy, and attention to model computation and generalization is limited. Therefore, this study presented an efficient framework (E-FireNet) that accurately classifies fire and non-fire images into their corresponding classes. The proposed E-FireNet achieves the best validation accuracy of 0.98 with a limited number of parameters in comparison to the SOTA models. In addition, for the non-fire class, E-FireNet achieved a precision of 1, a recall of 0.99, and an F1-score of 0.99. Furthermore, the SV-Fire dataset was collected because a dataset with diverse scenarios was not available for evaluating the proposed method. A set of experiments was performed using various CNN models and the proposed model, and their performances were compared in terms of accuracy, parameters, and FPS over two local systems (CPU and GPU) using the test data. Future research aims to expand the current dataset with new classes and apply vision transformers to fire detection.</p>
</sec>
</body>
<back>
<ack>
<p>This work was supported by the Institute for Information &#x0026; Communications Technology Promotion (IITP) grant funded by the Korean government (MSIT) (No.2020-0-00959, Fast Intelligence Analysis HW/SW Engine Exploiting IoT Platform for Boosting On-device AI in 5G Environment).</p>
</ack>
<sec><title>Funding Statement</title>
<p>This work was supported by the Institute for Information &#x0026; Communications Technology Promotion (IITP) grant funded by the <funding-source>Korean government (MSIT)</funding-source> (<award-id>No.2020-0-00959</award-id>).</p>
</sec>
<sec sec-type="COI-statement"><title>Conflicts of Interest</title>
<p>The authors declare that they have no conflicts of interest to report regarding the present study.</p>
</sec>
<ref-list content-type="authoryear"><title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>P.</given-names> <surname>Barmpoutis</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Papaioannou</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Dimitropoulos</surname></string-name> and <string-name><given-names>N.</given-names> <surname>Grammalidis</surname></string-name></person-group>, &#x201C;<article-title>A review on early forest fire detection systems using optical remote sensing</article-title>,&#x201D; <source>Sensors</source>, vol. <volume>20</volume>, no. <issue>22</issue>, pp. <fpage>6442</fpage>&#x2013;<lpage>6468</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>N.</given-names> <surname>Dilshad</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Ullah</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Kim</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Seo</surname></string-name></person-group>, &#x201C;<article-title>LocateUAV: Unmanned aerial vehicle location estimation via contextual analysis in an IoT environment</article-title>,&#x201D; in <source>IEEE Internet of Things Journal</source>, pp. <fpage>1</fpage>, <year>2022</year>. DOI <pub-id pub-id-type="doi">10.1109/JIOT.2022.3162300</pub-id>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>N.</given-names> <surname>Dilshad</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Hwang</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Song</surname></string-name> and <string-name><given-names>N.</given-names> <surname>Sung</surname></string-name></person-group>, &#x201C;<article-title>Applications and challenges in video surveillance via drone: A brief survey</article-title>,&#x201D; in <conf-name>Proc. ICTC</conf-name>, <publisher-loc> Jeju Island, SK</publisher-loc>, pp. <fpage>728</fpage>&#x2013;<lpage>732</lpage>, <year>2020</year>. </mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>F.</given-names> <surname>Khan</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Xu</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Sun</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Maula Khan</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Ahmed</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Recent advances in sensors for fire detection</article-title>,&#x201D; <source>Sensors</source>, vol. <volume>22</volume>, no. <issue>9</issue>, pp. <fpage>3310</fpage>&#x2013;<lpage>3334</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Yar</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Hussain</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Ahmad Khan</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Koundal</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Young Lee</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Vision sensor-based real-time fire detection in resource-constrained IoT environments</article-title>,&#x201D; <source>Computational Intelligence and Neuroscience</source>, vol. <volume>2021</volume>, pp. <fpage>21</fpage>&#x2013;<lpage>36</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>I.</given-names> <surname>Ullah Khan</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Afzal</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Weon Lee</surname></string-name></person-group>, &#x201C;<article-title>Human activity recognition via hybrid deep learning based model</article-title>,&#x201D; <source>Sensors</source>, vol. <volume>22</volume>, no. <issue>1</issue>, pp. <fpage>323</fpage>&#x2013;<lpage>339</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Ullah Khan</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Ul Haq</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Khan</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Muhammad</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Hijji</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Learning to rank: An intelligent system for person reidentification</article-title>,&#x201D; <source>International Journal of Intelligent Systems</source>, vol. <volume>37</volume>, no. <issue>9</issue>, pp. <fpage>5924</fpage>&#x2013;<lpage>5948</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Ullah Khan</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Hussain</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Ullah</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Wook Baik</surname></string-name></person-group>, &#x201C;<article-title>Deep-Reid: Deep features and autoencoder assisted image patching strategy for person re-identification in smart cities surveillance</article-title>,&#x201D; <source>Multimedia Tools and Applications</source>, pp. <fpage>1</fpage>&#x2013;<lpage>22</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>V.</given-names> <surname>Kumar Yadav</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Yadav</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Yadav</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Mittal</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Hussain Wazir</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>A novel reconfiguration technique for improvement of pv reliability</article-title>,&#x201D; <source>Renewable Energy</source>, vol. <volume>182</volume>, pp. <fpage>508</fpage>&#x2013;<lpage>520</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Kumar</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Bansal</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Kumar</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Yadav</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Kumar Gupta</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Renewable energy adoption: Design, development, and assessment of solar tree for the mountainous region</article-title>,&#x201D; <source>International Journal of Energy Research</source>, vol. <volume>46</volume>, no. <issue>2</issue>, pp. <fpage>743</fpage>&#x2013;<lpage>759</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Al Mojamed</surname></string-name></person-group>, &#x201C;<article-title>Smart mina: LoraWAN technology for smart fire detection application for hajj pilgrimage</article-title>,&#x201D; <source>Computer Systems Science and Engineering</source>, vol. <volume>40</volume>, no. <issue>1</issue>, pp. <fpage>259</fpage>&#x2013;<lpage>272</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Khan</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Muhammad</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Mumtaz</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Wook Baik</surname></string-name>, <string-name><given-names>V. H. C.</given-names> <surname>de Albuquerque</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Energy-efficient deep CNN for smoke detection in foggy IoT environment</article-title>,&#x201D; <source>IEEE Internet of Things Journal</source>, vol. <volume>6</volume>, no. <issue>6</issue>, pp. <fpage>9237</fpage>&#x2013;<lpage>9245</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A. S.</given-names> <surname>Almasoud</surname></string-name></person-group>, &#x201C;<article-title>Intelligent deep learning enabled wild forest fire detection system</article-title>,&#x201D; <source>Computer Systems Science and Engineering</source>, vol. <volume>44</volume>, no. <issue>2</issue>, pp. <fpage>1485</fpage>&#x2013;<lpage>1498</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Yin</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Wan</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Yuan</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Xia</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Shi</surname></string-name></person-group>, &#x201C;<article-title>A deep normalization and convolutional neural network for image smoke detection</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>5</volume>, pp. <fpage>18429</fpage>&#x2013;<lpage>18438</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H. A.</given-names> <surname>Hosni Mahmoud</surname></string-name>, <string-name><given-names>A. H.</given-names> <surname>Alharbi</surname></string-name> and <string-name><given-names>N. S.</given-names> <surname>Alghamdi</surname></string-name></person-group>, &#x201C;<article-title>Time-efficient fire detection convolutional neural network coupled with transfer learning</article-title>,&#x201D; <source>Intelligent Automation &#x0026; Soft Computing</source>, vol. <volume>31</volume>, no. <issue>3</issue>, pp. <fpage>1393</fpage>&#x2013;<lpage>1403</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Q.</given-names> <surname>An</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Shi</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Yang</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>A robust fire detection model via convolution neural networks for intelligent robot vision sensing</article-title>,&#x201D; <source>Sensors</source>, vol. <volume>22</volume>, no. <issue>8</issue>, pp. <fpage>2929</fpage>&#x2013;<lpage>2949</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Sharma</surname></string-name>, <string-name><given-names>O.</given-names> <surname>Granmo</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Olsen</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Thomas Fidje</surname></string-name></person-group>, &#x201C;<article-title>Deep convolutional neural networks for fire detection in images</article-title>,&#x201D; in <conf-name>Proc. EANN</conf-name>, <publisher-loc>Athens, GR</publisher-loc>, pp. <fpage>183</fpage>&#x2013;<lpage>193</lpage>, <year>2017</year>. </mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>Muhammad</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Ahmad</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Mehmood</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Rho</surname></string-name> and <string-name><given-names>S. W.</given-names> <surname>Baik</surname></string-name></person-group>, &#x201C;<article-title>Convolutional neural networks-based fire detection in surveillance videos</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>6</volume>, pp. <fpage>18174</fpage>&#x2013;<lpage>18183</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Shanmuga Priya</surname></string-name> and <string-name><given-names>K.</given-names> <surname>Vani</surname></string-name></person-group>, &#x201C;<article-title>Deep learning-based forest fire classification and detection in satellite images</article-title>,&#x201D; in <conf-name>Proc. ICoAC</conf-name>, <publisher-loc>Chennai, IN</publisher-loc>, pp. <fpage>61</fpage>&#x2013;<lpage>65</lpage>, <year>2019</year>. </mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Shamsoshoara</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Afghah</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Razi</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Zheng</surname></string-name>, <string-name><given-names>P. Z.</given-names> <surname>Ful&#x00E9;</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Aerial imagery pile burn detection using deep learning: The flame dataset</article-title>,&#x201D; <source>Computer Networks</source>, vol. <volume>193</volume>, no. <issue>4</issue>, pp. <fpage>108001</fpage>&#x2013;<lpage>108011</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Shen</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Nguyen</surname></string-name> and <string-name><given-names>W.</given-names> <surname>Qi Yan</surname></string-name></person-group>, &#x201C;<article-title>Flame detection using deep learning</article-title>,&#x201D; in <conf-name>Proc. ICCAR</conf-name>, <publisher-loc>Auckland, NZ</publisher-loc>, pp. <fpage>416</fpage>&#x2013;<lpage>420</lpage>, <year>2018</year>. </mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>B.</given-names> <surname>Kim</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Lee</surname></string-name></person-group>, &#x201C;<article-title>A video-based fire detection using deep learning models</article-title>,&#x201D; <source>Applied Sciences</source>, vol. <volume>9</volume>, no. <issue>14</issue>, pp. <fpage>2862</fpage>&#x2013;<lpage>2881</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Peng</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>Real-time forest smoke detection using hand-designed features and deep learning</article-title>,&#x201D; <source>Computers and Electronics in Agriculture</source>, vol. <volume>167</volume>, pp. <fpage>105029</fpage>&#x2013;<lpage>105047</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><given-names>G.</given-names> <surname>Son</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Park</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Yoon</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Song</surname></string-name></person-group>, &#x201C;<chapter-title>Video based smoke and flame detection using convolutional neural network</chapter-title>,&#x201D; in <source>Proc. SITIS</source>, <publisher-loc>Las Palmas, ESP</publisher-loc>, pp. <fpage>365</fpage>&#x2013;<lpage>368</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Zhong</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Shi</surname></string-name> and <string-name><given-names>W.</given-names> <surname>Gao</surname></string-name></person-group>, &#x201C;<article-title>A convolutional neural network-based flame detection method in video sequence</article-title>,&#x201D; <source>Signal Image and Video Processing</source>, vol. <volume>12</volume>, no. <issue>8</issue>, pp. <fpage>1619</fpage>&#x2013;<lpage>1627</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>G.</given-names> <surname>Xu</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>Q.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Lin</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Wang</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Video smoke detection based on deep saliency network</article-title>,&#x201D; <source>Fire Safety Journal</source>, vol. <volume>105</volume>, pp. <fpage>277</fpage>&#x2013;<lpage>285</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>Muhammad</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Khan</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Elhoseny</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Hassan Ahmed</surname></string-name> and <string-name><given-names>S. W.</given-names> <surname>Baik</surname></string-name></person-group>, &#x201C;<article-title>Efficient fire detection for uncertain surveillance environment</article-title>,&#x201D; <source>IEEE Transactions on Industrial Informatics</source>, vol. <volume>15</volume>, no. <issue>5</issue>, pp. <fpage>3113</fpage>&#x2013;<lpage>3122</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>P.</given-names> <surname>Foggia</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Saggese</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Vento</surname></string-name></person-group>, &#x201C;<article-title>Real-time fire detection for video-surveillance applications using a combination of experts based on color, shape and motion</article-title>,&#x201D; <source>IEEE Transactions on Circuits and Systems for Video Technology</source>, vol. <volume>25</volume>, no. <issue>9</issue>, pp. <fpage>1545</fpage>&#x2013;<lpage>1556</lpage>, <year>2015</year>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>R. C.</given-names> <surname>Luo</surname></string-name> and <string-name><given-names>K.</given-names> <surname>Lan Su</surname></string-name></person-group>, &#x201C;<article-title>Autonomous fire-detection system using adaptive sensory fusion for intelligent security robot</article-title>,&#x201D; <source>IEEE/ASME Transactions on Mechatronics</source>, vol. <volume>12</volume>, no. <issue>3</issue>, pp. <fpage>274</fpage>&#x2013;<lpage>281</lpage>, <year>2007</year>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Krizhevsky</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Sutskever</surname></string-name> and <string-name><given-names>G. E.</given-names> <surname>Hinton</surname></string-name></person-group>, &#x201C;<article-title>ImageNet classification with deep convolutional neural networks</article-title>,&#x201D; <source>Communications of the ACM</source>, vol. <volume>60</volume>, no. <issue>6</issue>, pp. <fpage>84</fpage>&#x2013;<lpage>90</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>Muhammad</surname></string-name>, <string-name><surname>Mustaqeem</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Ullah</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Shariq Imran</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Sajjad</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Human action recognition using attention-based LSTM network with dilated CNN features</article-title>,&#x201D; <source>Future Generation Computer Systems</source>, vol. <volume>125</volume>, no. <issue>3</issue>, pp. <fpage>820</fpage>&#x2013;<lpage>830</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>N.</given-names> <surname>Khan</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Ullah</surname></string-name>, <string-name><given-names>I. U.</given-names> <surname>Haq</surname></string-name>, <string-name><given-names>V. G.</given-names> <surname>Menon</surname></string-name> and <string-name><given-names>S. W.</given-names> <surname>Baik</surname></string-name></person-group>, &#x201C;<article-title>SD-Net: Understanding overcrowded scenes in real-time via an efficient dilated convolutional neural network</article-title>,&#x201D; <source>Journal of Real-Time Image Processing</source>, vol. <volume>18</volume>, no. <issue>5</issue>, pp. <fpage>1729</fpage>&#x2013;<lpage>1743</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-33"><label>[33]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>N.</given-names> <surname>Dilshad</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Song</surname></string-name></person-group>, &#x201C;<article-title>Dual-stream siamese network for vehicle re-identification via dilated convolutional layers</article-title>,&#x201D; in <conf-name>Proc. Smart IoT</conf-name>, <publisher-loc>Jeju Island, SK</publisher-loc>, pp. <fpage>350</fpage>&#x2013;<lpage>352</lpage>, <year>2021</year>. </mixed-citation></ref>
<ref id="ref-34"><label>[34]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>U.</given-names> <surname>Ullah Khan</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Dilshad</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Husain Rehmani</surname></string-name> and <string-name><given-names>T.</given-names> <surname>Umer</surname></string-name></person-group>, &#x201C;<article-title>Fairness in cognitive radio networks: models, measurement methods, applications and future research directions</article-title>,&#x201D; <source>Journal of Network and Computer Applications</source>, vol. <volume>73</volume>, pp. <fpage>12</fpage>&#x2013;<lpage>26</lpage>, <year>2016</year>.</mixed-citation></ref>
<ref id="ref-35"><label>[35]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Hussain</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Hussain</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Ullah</surname></string-name> and <string-name><given-names>S. W.</given-names> <surname>Baik</surname></string-name></person-group>, &#x201C;<article-title>Vision transformer and deep sequence learning for human activity recognition in surveillance videos</article-title>,&#x201D; <source>Computational Intelligence and Neuroscience</source>, vol. <volume>2022</volume>, pp. <fpage>22</fpage>&#x2013;<lpage>32</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-36"><label>[36]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Hussain</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Muhammad</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Ullah</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Ullah</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Shariq Imran</surname></string-name> <etal>et al.</etal></person-group><italic>,</italic> &#x201C;<article-title>Anomaly based camera prioritization in large scale surveillance networks</article-title>,&#x201D; <source>Computers, Materials &#x0026; Continua</source>, vol. <volume>70</volume>, no. <issue>2</issue>, pp. <fpage>2171</fpage>&#x2013;<lpage>2190</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-37"><label>[37]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>F.</given-names> <surname>Mehmood</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Ullah</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Ahmad</surname></string-name> and <string-name><given-names>D.</given-names> <surname>Kim</surname></string-name></person-group>, &#x201C;<article-title>Object detection mechanism based on deep learning algorithm using embedded IoT devices for smart home appliances control in CoT</article-title>,&#x201D; <source>Journal of Ambient Intelligence and Humanized Computing</source>, pp. <fpage>1</fpage>&#x2013;<lpage>17</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-38"><label>[38]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Tiwari</surname></string-name> and <string-name><given-names>A.</given-names> <surname>Jain</surname></string-name></person-group>, &#x201C;<article-title>Convolutional capsule network for COVID-19 detection using radiography images</article-title>,&#x201D; <source>International Journal of Imaging Systems and Technology</source>, vol. <volume>31</volume>, no. <issue>2</issue>, pp. <fpage>525</fpage>&#x2013;<lpage>539</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-39"><label>[39]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Jain</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Tiwari</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Choudhury</surname></string-name> and <string-name><given-names>B. K.</given-names> <surname>Dewangan</surname></string-name></person-group>, &#x201C;<article-title>Gradient and statistical features-based prediction system for COVID-19 using chest X-ray images</article-title>,&#x201D; <source>International Journal of Computer Applications in Technology</source>, vol. <volume>66</volume>, no. <issue>3&#x2013;4</issue>, pp. <fpage>362</fpage>&#x2013;<lpage>373</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-40"><label>[40]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Tiwari</surname></string-name> and <string-name><given-names>A.</given-names> <surname>Jain</surname></string-name></person-group>, &#x201C;<article-title>A lightweight capsule network architecture for detection of COVID-19 from lung CT scans</article-title>,&#x201D; <source>International Journal of Imaging Systems and Technology</source>, vol. <volume>32</volume>, no. <issue>2</issue>, pp. <fpage>419</fpage>&#x2013;<lpage>434</lpage>, <year>2022</year>.</mixed-citation></ref>
</ref-list>
</back>
</article>