<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">63468</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2025.063468</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Enhancing Fire Detection with YOLO Models: A Bayesian Hyperparameter Tuning Approach</article-title>
<alt-title alt-title-type="left-running-head">Enhancing Fire Detection with YOLO Models: A Bayesian Hyperparameter Tuning Approach</alt-title>
<alt-title alt-title-type="right-running-head">Enhancing Fire Detection with YOLO Models: A Bayesian Hyperparameter Tuning Approach</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author">
<name name-style="western"><surname>Hoang</surname><given-names>Van-Ha</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Lee</surname><given-names>Jong Weon</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-3" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Park</surname><given-names>Chun-Su</given-names></name><xref ref-type="aff" rid="aff-2">2</xref><xref rid="cor1" ref-type="corresp">&#x002A;</xref><email>cspk@skku.edu</email></contrib>
<aff id="aff-1"><label>1</label><institution>Department of Software, Sejong University</institution>, <addr-line>Seoul, 05006, Republic of Korea</addr-line></aff>
<aff id="aff-2"><label>2</label><institution>Department of Computer Education, Sungkyunkwan University</institution>, <addr-line>Seoul, 03063, Republic of Korea</addr-line></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Chun-Su Park. Email: <email>cspk@skku.edu</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2025</year>
</pub-date>
<pub-date date-type="pub" publication-format="electronic">
<day>19</day><month>05</month><year>2025</year>
</pub-date>
<volume>83</volume>
<issue>3</issue>
<fpage>4097</fpage>
<lpage>4116</lpage>
<history>
<date date-type="received">
<day>15</day>
<month>1</month>
<year>2025</year>
</date>
<date date-type="accepted">
<day>26</day>
<month>3</month>
<year>2025</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2025 The Authors.</copyright-statement>
<copyright-year>2025</copyright-year>
<copyright-holder>Published by Tech Science Press.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_63468.pdf"></self-uri>
<abstract>
<p>Fire can cause significant damage to the environment, economy, and human lives. If fire can be detected early, the damage can be minimized. Advances in technology, particularly in computer vision powered by deep learning, have enabled automated fire detection in images and videos. Several deep learning models have been developed for object detection, including applications in fire and smoke detection. This study focuses on optimizing the training hyperparameters of YOLOv8 and YOLOv10 models using Bayesian Tuning (BT). Experimental results on the large-scale D-Fire dataset demonstrate that this approach enhances detection performance. Specifically, the proposed approach improves the mean average precision at an Intersection over Union (IoU) threshold of 0.5 (mAP50) of the YOLOv8s, YOLOv10s, YOLOv8l, and YOLOv10l models by 0.26, 0.21, 0.84, and 0.63, respectively, compared to models trained with the default hyperparameters. The performance gains are more pronounced in larger models, YOLOv8l and YOLOv10l, than in their smaller counterparts, YOLOv8s and YOLOv10s. Furthermore, YOLOv8 models consistently outperform YOLOv10, with mAP50 improvements of 0.26 for YOLOv8s over YOLOv10s and 0.65 for YOLOv8l over YOLOv10l when trained with BT. These results establish YOLOv8 as the preferred model for fire detection applications where detection performance is prioritized.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Fire detection</kwd>
<kwd>smoke detection</kwd>
<kwd>deep learning</kwd>
<kwd>YOLO</kwd>
<kwd>Bayesian hyperparameter tuning</kwd>
<kwd>hyperparameter optimization</kwd>
<kwd>Optuna</kwd>
</kwd-group>
<funding-group>
<award-group id="awg1">
<funding-source>Information Technology Research Center</funding-source>
<award-id>IITP-2024-RS-2022-00156354</award-id>
</award-group>
<award-group id="awg2">
<funding-source>Ministry of SMEs and Startups</funding-source>
<award-id>RS-2023-00264489</award-id>
</award-group>
</funding-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>Fires and wildfires cause significant damage to property, the environment, and human lives. In the United States, approximately 1.5 million fires occurred in 2022, resulting in 3790 civilian fatalities, 13,250 injuries, and $18 billion in property damage, according to the National Fire Protection Association [<xref ref-type="bibr" rid="ref-1">1</xref>]. This corresponds to one fire every 21 s on average in that year. In 2024, the United States experienced 7734 wildfires, with an average area burned per wildfire of 8299 hectares [<xref ref-type="bibr" rid="ref-2">2</xref>].</p>
<p>Various methods and sensors for early fire detection are essential for minimizing damage and saving lives [<xref ref-type="bibr" rid="ref-3">3</xref>]. Vision sensors, such as cameras, offer several advantages over non-visual techniques like ionization smoke sensors, which detect changes in air ionization levels. Cameras can monitor larger areas, including extensive regions when mounted on drones. They are also easy to install, cost-effective, and adaptable to different environments. Furthermore, camera data can be utilized to analyze fire development, including the speed and direction of spread. Consequently, visual-based fire detection has gained significant attention in recent years.</p>
<p>Early visual-based fire detection methods relied on rule-based or traditional machine learning approaches [<xref ref-type="bibr" rid="ref-4">4</xref>,<xref ref-type="bibr" rid="ref-5">5</xref>], which required significant effort in feature design. In contrast, by learning directly from data, deep learning (DL) [<xref ref-type="bibr" rid="ref-6">6</xref>] approaches have achieved state-of-the-art performance in vision tasks, including object detection [<xref ref-type="bibr" rid="ref-7">7</xref>], and have been applied to fire and smoke detection in images and videos. A comprehensive review [<xref ref-type="bibr" rid="ref-8">8</xref>] highlights the growing adoption of DL, particularly YOLO-based architectures [<xref ref-type="bibr" rid="ref-9">9</xref>], for fire and smoke detection. Unlike two-stage detectors such as Faster R-CNN [<xref ref-type="bibr" rid="ref-10">10</xref>], YOLO follows a single-stage detection approach, enabling significantly faster inference while maintaining strong accuracy. This speed-accuracy tradeoff is crucial for real-time fire detection, where rapid response minimizes damage. For instance, YOLOv3 [<xref ref-type="bibr" rid="ref-11">11</xref>] not only achieves a slightly higher mAP than Faster R-CNN but is also nine times faster, demonstrating its superiority for time-sensitive applications. The YOLO series has evolved through multiple architectural advancements, with some widely used versions including YOLOv5 [<xref ref-type="bibr" rid="ref-12">12</xref>] (2020), YOLOv8 [<xref ref-type="bibr" rid="ref-13">13</xref>] (2023), and YOLOv10 [<xref ref-type="bibr" rid="ref-14">14</xref>] (2024). Basic information about these models, including their performance metrics, is shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>Variants of You Only Look Once (YOLO) versions (v5, v8, v10) compared by number of model parameters, Giga Floating Point Operations (GFLOPs), and mean average precision (mAP) at Intersection over Union (IoU) Thresholds of 0.50 to 0.95 (mAP50-95) on the Common Objects in Context (COCO) dataset [<xref ref-type="bibr" rid="ref-15">15</xref>]. The size of each dot on the chart indicates the model&#x2019;s GFLOPs as an indicator of its computational complexity. One aspect of this study investigates whether the performance of YOLOv8 and YOLOv10 on the D-Fire dataset [<xref ref-type="bibr" rid="ref-16">16</xref>] is consistent with their performance on the COCO dataset</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_63468-fig-1.tif"/>
</fig>
<p>While previous studies have applied YOLO models to fire and smoke detection, most have focused on architectural modifications [<xref ref-type="bibr" rid="ref-17">17</xref>,<xref ref-type="bibr" rid="ref-18">18</xref>] while relying on default or manually selected hyperparameters, or have only employed a basic grid search for hyperparameter tuning [<xref ref-type="bibr" rid="ref-19">19</xref>]. This study proposes an alternative approach to enhance YOLO models for fire and smoke detection by optimizing training hyperparameters using Bayesian Tuning (BT). YOLOv8 and YOLOv10 are utilized in this study due to their recent advancements, widespread adoption in academia and industry, and their implementation within the Ultralytics framework [<xref ref-type="bibr" rid="ref-13">13</xref>], enabling direct comparative analysis. Moreover, as illustrated in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>, YOLOv10 surpasses YOLOv8 in performance on the COCO dataset [<xref ref-type="bibr" rid="ref-15">15</xref>]. However, whether this trend holds for fire and smoke detection remains unclear in the literature. To investigate this, we train YOLOv8 and YOLOv10 models, evaluate their performance, and compare the results on the D-Fire dataset [<xref ref-type="bibr" rid="ref-16">16</xref>], a large-scale dataset for fire and smoke detection.</p>
<p>Our main contributions are summarized as follows:
<list list-type="bullet">
<list-item>
<p>We provide an overview of the Optuna framework [<xref ref-type="bibr" rid="ref-20">20</xref>], outline its general applications, and present a procedure for applying Bayesian tuning to optimize the training parameters of YOLOv8 and YOLOv10 models. This approach enables the identification of optimal training configurations for each model.</p></list-item>
<list-item>
<p>We perform the proposed Bayesian tuning process for the YOLOv8 and YOLOv10 models using the D-Fire dataset [<xref ref-type="bibr" rid="ref-16">16</xref>] to identify the optimal training hyperparameters for each model. We then utilize these optimized hyperparameters to train the models and evaluate their performance on the same dataset.</p></list-item>
<list-item>
<p>We present detailed analyses of the hyperparameter tuning process and the object detection performance of the YOLOv8 and YOLOv10 models on the D-Fire dataset. Additionally, we offer insights into these models, including speed-accuracy trade-offs, generalization capacity evaluated through testing on a different dataset, and detection failure analysis supported by visualizations.</p></list-item>
</list></p>
<p>The rest of this paper is organized as follows: <xref ref-type="sec" rid="s2">Section 2</xref> provides an overview of the D-Fire dataset, the Optuna framework, and related YOLO-based fire and smoke detection studies. <xref ref-type="sec" rid="s3">Section 3</xref> describes our methodology, including evaluation metrics, the YOLO models utilized, and the proposed hyperparameter tuning protocol. <xref ref-type="sec" rid="s4">Section 4</xref> presents the experimental results, while <xref ref-type="sec" rid="s5">Section 5</xref> presents the key conclusions, discusses the limitations of this study, and outlines potential directions for future research.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Related Work</title>
<sec id="s2_1">
<label>2.1</label>
<title>D-Fire Benchmark Dataset</title>
<p>Our method requires extensive image data for hyperparameter tuning, training, and testing the experimental YOLO models in real-world scenarios. To meet these requirements, the D-Fire dataset [<xref ref-type="bibr" rid="ref-16">16</xref>] was selected for its large size and well-defined training, validation, and testing subsets. The D-Fire dataset [<xref ref-type="bibr" rid="ref-16">16</xref>] contains a total of 21,527 images. Each image is manually annotated with bounding boxes categorized into two classes: <italic>fire</italic> and <italic>smoke</italic>. Consequently, four scenarios can occur in an image: (1) an image with fire, (2) an image with smoke, (3) an image with both fire and smoke, and (4) an image without any fire or smoke (referred to as &#x201C;None&#x201D;). Statistical data on image categories and bounding box classes of this dataset are presented in <xref ref-type="fig" rid="fig-2">Fig. 2b</xref>. The D-Fire image dataset was divided by its authors into a training set comprising 17,221 images (80% of the total dataset) and a testing set comprising 4306 images (20%). The training set was further split into five folds for cross-validation, facilitating our proposed Bayesian hyperparameter optimization process. After the models are trained with the optimized parameters, their final performance is evaluated on the testing set.</p>
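<p>As a minimal sketch of the five-fold partitioning described above, the following Python snippet splits image indices into folds of near-equal size. The function name <monospace>k_fold_indices</monospace>, the shuffling seed, and the use of integer indices in place of image files are illustrative assumptions; the snippet is not the exact splitting procedure used in this study.</p>
<code language="python" xml:space="preserve"><![CDATA[
```python
import random


def k_fold_indices(n_items, k=5, seed=0):
    """Yield (train_idx, val_idx) index lists for k-fold cross-validation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # fixed seed for reproducibility
    # Strided slicing gives folds whose sizes differ by at most one.
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, val_idx


# 17,221 is the size of the D-Fire training set reported above.
splits = list(k_fold_indices(17221, k=5))
for fold_id, (train_idx, val_idx) in enumerate(splits):
    print(fold_id, len(train_idx), len(val_idx))
```
]]></code>
<p>With 17,221 images, one validation fold holds 3445 images and the other four hold 3444 each, so every image is used for validation exactly once.</p>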
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>D-Fire dataset for fire and smoke detection. (a) Sample images from the D-Fire dataset, demonstrating variation in resolution, lighting conditions, and sharpness. Bounding box sizes also exhibit significant variability; (b) Statistics of the D-Fire dataset by image category and bounding box classes</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_63468-fig-2.tif"/>
</fig>
<p><bold>Dataset characteristics:</bold> The D-Fire dataset contains a diverse collection of fire/smoke images, exhibiting variations in resolution, lighting conditions, sharpness, surrounding environments, and bounding box sizes, as illustrated by sample images in <xref ref-type="fig" rid="fig-2">Fig. 2a</xref>. However, this dataset lacks a statistical summary of fire/smoke conditions, such as whether incidents occurred in indoor, outdoor, urban, or rural environments. To further characterize the dataset, we utilized FiftyOne [<xref ref-type="bibr" rid="ref-21">21</xref>] to evaluate thematic patterns in fire and smoke conditions based on image uniqueness. A quick analysis was conducted by examining samples in the dataset at different levels of uniqueness, as computed by FiftyOne [<xref ref-type="bibr" rid="ref-21">21</xref>]. The analysis revealed that most images depict outdoor scenes, primarily in mountainous, forested, or grassland regions, with urban settings being less prevalent. Additionally, fire and smoke are often observed at significant distances from the camera. A limitation of this dataset is the scarcity of indoor images. Regarding temporal distribution, daytime images dominate, followed by nighttime images, with a minor fraction captured during pre-evening hours.</p>
<p><bold>Dataset selection:</bold> Previous datasets employed in YOLO-based fire and smoke detection studies have been characterized by being either large but not publicly accessible [<xref ref-type="bibr" rid="ref-22">22</xref>], small in scale [<xref ref-type="bibr" rid="ref-17">17</xref>], or lacking diversity in fire and smoke conditions [<xref ref-type="bibr" rid="ref-23">23</xref>]. In this study, we selected the D-Fire dataset due to its extensive image collection, predefined train-validation-test partitioning, and strong resemblance to real-world surveillance footage. Its high proportion of outdoor images, particularly those captured from a distance, makes it well-suited for developing fire/smoke detection models for public safety applications.</p>
</sec>
<sec id="s2_2">
<label>2.2</label>
<title>Optuna Framework for Hyperparameter Optimization</title>
<p>The performance of DL models is significantly influenced by hyperparameters such as the learning rate and the configuration values of optimizers [<xref ref-type="bibr" rid="ref-24">24</xref>]. A review paper [<xref ref-type="bibr" rid="ref-25">25</xref>] presents several methods for optimizing hyperparameters, including grid search, random search, and Bayesian optimization, providing foundational context for our approach. Although grid search and random search are straightforward to implement, they can be computationally expensive. In contrast, Bayesian optimization [<xref ref-type="bibr" rid="ref-25">25</xref>] is regarded as more efficient, requiring fewer evaluations to identify a suitable set of hyperparameters.</p>
<p>The Optuna framework [<xref ref-type="bibr" rid="ref-20">20</xref>] is a popular tool for hyperparameter optimization that provides an easy-to-use programming interface in Python. It allows for simple parallelization and supports various optimization algorithms. Optuna has demonstrated good performance in hyperparameter optimization tasks, as highlighted in [<xref ref-type="bibr" rid="ref-26">26</xref>]. The general workflow of Optuna is as follows:
<list list-type="simple">
<list-item><label>1.</label><p><bold>Define the Search Space</bold><bold>:</bold> Specify the hyperparameters to optimize and their respective search ranges.</p></list-item>
<list-item><label>2.</label><p><bold>Define the Objective Function</bold><bold>:</bold> Formulate the function to be optimized (e.g., maximize mAP50 of a YOLO model).</p></list-item>
<list-item><label>3.</label><p><bold>Run the Optimization Trials</bold><bold>:</bold> Each trial executes the objective function with a specific set of hyperparameters. The next set of hyperparameters is chosen by a <bold>sampler</bold>, which implements an optimization algorithm (e.g., random search) and, depending on the algorithm, may use the results of previous trials.</p></list-item>
<list-item><label>4.</label><p><bold>Select the Best Hyperparameters</bold><bold>:</bold> After all trials are completed, the optimal set of hyperparameters is the one that yields the best value of the objective function.</p></list-item>
</list></p>
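<p>The four steps above can be sketched in plain Python. This sketch deliberately does not use the Optuna API: it uses a random-search sampler written with the standard library so that the control flow is visible. The hyperparameter names (<monospace>lr0</monospace>, <monospace>momentum</monospace>), their ranges, and the toy objective are hypothetical placeholders; in practice the objective would train a detector and return a validation score such as mAP50.</p>
<code language="python" xml:space="preserve"><![CDATA[
```python
import math
import random

# Step 1: define the search space (hypothetical ranges, for illustration only).
SEARCH_SPACE = {
    "lr0": (1e-5, 1e-1),       # sampled on a log scale
    "momentum": (0.6, 0.98),   # sampled uniformly
}


def sample_params(rng):
    """Stand-in sampler: plain random search that ignores previous trials."""
    lo, hi = SEARCH_SPACE["lr0"]
    lr0 = math.exp(rng.uniform(math.log(lo), math.log(hi)))
    momentum = rng.uniform(*SEARCH_SPACE["momentum"])
    return {"lr0": lr0, "momentum": momentum}


# Step 2: define the objective to maximize.
# This toy surrogate peaks at lr0 = 1e-3 and momentum = 0.9; a real objective
# would train a model with these hyperparameters and return its score.
def objective(params):
    return -(math.log10(params["lr0"]) + 3.0) ** 2 - (params["momentum"] - 0.9) ** 2


def run_study(n_trials=50, seed=0):
    rng = random.Random(seed)
    trials = []
    # Step 3: run the trials, one hyperparameter set per trial.
    for _ in range(n_trials):
        params = sample_params(rng)
        trials.append((objective(params), params))
    # Step 4: select the hyperparameters with the best objective value.
    best_value, best_params = max(trials, key=lambda t: t[0])
    return best_value, best_params, trials


best_value, best_params, trials = run_study()
print(best_value, best_params)
```
]]></code>
<p>Replacing <monospace>sample_params</monospace> with a sampler that conditions on the recorded trials is what distinguishes Bayesian optimization from the random search shown here.</p>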
<p>Optuna provides various samplers that correspond to different optimization algorithms. Common samplers include GridSampler, RandomSampler, and TPESampler, which implement grid search, random search, and Tree-structured Parzen Estimator (TPE) [<xref ref-type="bibr" rid="ref-27">27</xref>], respectively.</p>
</sec>
<sec id="s2_3">
<label>2.3</label>
<title>YOLO-Based Fire and Smoke Detection</title>
<p>Numerous studies have explored YOLO architectures for fire and smoke detection, primarily focusing on architectural modifications while often neglecting hyperparameter optimization. For instance, the work in [<xref ref-type="bibr" rid="ref-17">17</xref>] proposed modifications to YOLOv8 by incorporating an advanced attention mechanism to enhance detection performance but relied on manually selected hyperparameters, which may not be optimal. Additionally, the dataset used in that work is not publicly available and contains only <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:mo>&#x223C;</mml:mo></mml:math></inline-formula>4400 images, significantly fewer than the D-Fire dataset. Similarly, the authors in [<xref ref-type="bibr" rid="ref-18">18</xref>] introduced an enhanced YOLOv8n-based architecture with modifications to the backbone, head, and bounding box loss function to improve performance on the D-Fire dataset. However, they did not optimize hyperparameters and only tested YOLOv8n and YOLOv10n, limiting insights into larger model variants. Other studies utilizing YOLOv10 for fire detection also lacked hyperparameter tuning and employed private or small-scale/less diverse datasets, such as the 6000-image dataset (primarily consisting of indoor images) in [<xref ref-type="bibr" rid="ref-23">23</xref>], or datasets focused on specific scenarios instead of general fire/smoke detection cases, such as the fire ship detection dataset in [<xref ref-type="bibr" rid="ref-28">28</xref>].</p>
<p>A few studies have conducted hyperparameter optimization for YOLO models. The authors in [<xref ref-type="bibr" rid="ref-19">19</xref>] tuned the hyperparameters of YOLOv5 models on the D-Fire dataset [<xref ref-type="bibr" rid="ref-16">16</xref>]; however, they used only a simple grid search strategy to select the best hyperparameters. In contrast, the study in [<xref ref-type="bibr" rid="ref-29">29</xref>] optimized YOLOv8 hyperparameters using a more sophisticated tuning strategy, the one-factor-at-a-time (OFAT) approach, but evaluated it on a publicly available dataset containing only 10,000 images, significantly fewer than the D-Fire dataset utilized in this research.</p>
<p>Overall, research on YOLO-based fire and smoke detection has primarily emphasized architectural modifications, with limited attention to hyperparameter optimization and a lack of performance evaluations across multiple YOLOv8 and YOLOv10 variants on large-scale, publicly available datasets. Our study aims to address these gaps by proposing an advanced hyperparameter optimization technique, Bayesian tuning, to improve the performance of both small and large variants of YOLOv8 and YOLOv10. Additionally, we provide a comprehensive performance comparison of these models using the D-Fire dataset, offering insights into their effectiveness for fire and smoke detection.</p>
</sec>
</sec>
<sec id="s3">
<label>3</label>
<title>Proposed Method for Hyperparameter Tuning</title>
<p>This study focuses on optimizing training hyperparameters to enhance the performance of YOLO models (YOLOv8 and YOLOv10) for the fire/smoke detection task. The Optuna framework [<xref ref-type="bibr" rid="ref-20">20</xref>] is employed to perform Bayesian hyperparameter optimization. The study workflow, illustrated in <xref ref-type="fig" rid="fig-3">Fig. 3</xref>, employs 5-fold cross-validation on the D-Fire dataset training set [<xref ref-type="bibr" rid="ref-16">16</xref>] to identify optimal hyperparameters that maximize the average mAP across folds (see <xref ref-type="sec" rid="s3_2">Section 3.2</xref>). These parameters are then applied to train each model on the full training set, while model performance is assessed using the test set.</p>
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Workflow of this paper</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_63468-fig-3.tif"/>
</fig>
<sec id="s3_1">
<label>3.1</label>
<title>YOLO Models Used in This Study</title>
<p>As mentioned in <xref ref-type="sec" rid="s1">Section 1</xref>, YOLO-based object detectors have evolved through multiple versions. In this study, we focus on two recent versions of YOLO: YOLOv8 [<xref ref-type="bibr" rid="ref-13">13</xref>] and YOLOv10 [<xref ref-type="bibr" rid="ref-14">14</xref>]. Both versions are built on the Ultralytics framework [<xref ref-type="bibr" rid="ref-13">13</xref>], which offers a simple user interface together with good speed and accuracy. The framework facilitates the training, exporting, and deployment of models across various environments and provides multiple model variants with different trade-offs between speed and accuracy. This section presents an overview of the key architectural features of YOLOv8 and YOLOv10, along with their model variants, to contextualize our study. Detailed information on YOLOv8 and YOLOv10 can be found in the original publications [<xref ref-type="bibr" rid="ref-13">13</xref>,<xref ref-type="bibr" rid="ref-14">14</xref>].</p>
<sec id="s3_1_1">
<label>3.1.1</label>
<title>YOLOv8</title>
<p>The network architecture of YOLOv8 [<xref ref-type="bibr" rid="ref-13">13</xref>] (<xref ref-type="fig" rid="fig-4">Fig. 4a</xref>) incorporates several significant changes, including the replacement of the C3 module from YOLOv5 [<xref ref-type="bibr" rid="ref-12">12</xref>] with the new C2f module. Additionally, modifications to the kernel sizes in the backbone and bottleneck layers have been implemented. A key innovation is the introduction of a decoupled head with an anchor-free mechanism, which enhances both detection speed and accuracy.</p>
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>YOLO architectures used in this study. (a) YOLOv8 architecture (reproduced from [<xref ref-type="bibr" rid="ref-30">30</xref>] under the CC BY-NC-ND 4.0 license); (b) YOLOv10 architecture with Dual Label Assignments (adapted from [<xref ref-type="bibr" rid="ref-14">14</xref>] under the CC BY 4.0 license)</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_63468-fig-4.tif"/>
</fig>
<p>YOLOv8 provides five model variants: YOLOv8n (nano), YOLOv8s (small), YOLOv8m (medium), YOLOv8l (large), and YOLOv8x (extra large), offering a range of options based on computational efficiency and performance requirements. In this paper, we utilize the YOLOv8s and YOLOv8l models for experimentation, as they represent two distinct categories: a smaller and faster model with reduced accuracy (YOLOv8s), and a larger and more accurate model with slower processing speed (YOLOv8l).</p>
</sec>
<sec id="s3_1_2">
<label>3.1.2</label>
<title>YOLOv10</title>
<p>In May 2024, YOLOv10 [<xref ref-type="bibr" rid="ref-14">14</xref>] was introduced as an updated version of YOLO, developed by a research team at Tsinghua University and based on the Ultralytics framework [<xref ref-type="bibr" rid="ref-13">13</xref>]. The authors aimed to enhance speed and accuracy through two key design choices: (1) NMS (Non-Maximum Suppression)-free training, which uses Dual Label Assignments with an additional one-to-one head alongside the traditional one-to-many head (as shown in <xref ref-type="fig" rid="fig-4">Fig. 4b</xref>), eliminating the NMS post-processing step and increasing processing speed; and (2) an efficiency-accuracy driven model design that features lightweight classification heads, decoupled downsampling strategies, and a rank-guided block design to reduce layer redundancy [<xref ref-type="bibr" rid="ref-31">31</xref>]. Additionally, YOLOv10 employs larger convolution filters for improved context capture and a partial self-attention mechanism to reduce complexity.</p>
<p>Similar to YOLOv8, YOLOv10 presents multiple model variants: YOLOv10n (nano), YOLOv10s (small), YOLOv10m (medium), YOLOv10b (balanced), YOLOv10l (large), and YOLOv10x (extra large). For this study, YOLOv10s was selected to represent the small and fast model, which is less accurate, while YOLOv10l was chosen to represent the large and accurate model, although it operates at a slower speed.</p>
<p>As shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>, YOLOv10 outperforms YOLOv8 on the COCO dataset [<xref ref-type="bibr" rid="ref-15">15</xref>] in terms of mAP50-95, with fewer parameters and lower GFLOPs. However, its performance advantage over YOLOv8 for fire and smoke detection remains unclear. One objective of this study is to address this gap by comparing the performance of YOLOv8 and YOLOv10 on the D-Fire dataset [<xref ref-type="bibr" rid="ref-16">16</xref>].</p>
</sec>
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Evaluation Metrics</title>
<p>In this study, we employ common evaluation metrics for object detection, including Precision (<italic>P</italic>), Recall (<italic>R</italic>), F1-score, and Mean Average Precision (mAP) for performance assessment. <italic>P</italic> measures the proportion of correctly predicted positive instances among all predicted positives, while <italic>R</italic> evaluates the proportion of true positives relative to all actual positives in the ground truth. The F1-score provides a balance between <italic>P</italic> and <italic>R</italic> by computing their harmonic mean. The formulas for <italic>P</italic>, <italic>R</italic>, and <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:mi>F</mml:mi><mml:mn>1</mml:mn></mml:math></inline-formula> are given as follows:
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:mi>P</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>P</mml:mi></mml:mrow></mml:mfrac><mml:mo>,</mml:mo><mml:mspace width="1em" /><mml:mi>R</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:mfrac><mml:mo>,</mml:mo><mml:mspace width="1em" /><mml:mrow><mml:mtext>F1-score</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mn>2</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mfrac><mml:mrow><mml:mi>P</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>R</mml:mi></mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>R</mml:mi></mml:mrow></mml:mfrac></mml:math></disp-formula>where <italic>TP</italic>, <italic>FP</italic>, and <italic>FN</italic> denote the numbers of true positives, false positives, and false negatives, respectively.</p>
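<p>As a small worked example of Eq. (1), the following Python function computes the three metrics from detection counts. The function name and the convention of returning 0.0 when a denominator is zero are assumptions made for illustration; the counts are invented and are not results from this study.</p>
<code language="python" xml:space="preserve"><![CDATA[
```python
def precision_recall_f1(tp, fp, fn):
    """Compute P, R, and F1 from counts; undefined ratios are returned as 0.0."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


# Illustrative counts only: 80 true positives, 20 false positives, 10 false negatives.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
print(f"P={p:.3f}, R={r:.3f}, F1={f1:.3f}")  # P=0.800, R=0.889, F1=0.842
```
]]></code>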
<p>The primary evaluation metric used in this study is mAP [<xref ref-type="bibr" rid="ref-32">32</xref>]. Initially, Average Precision (AP) is computed for each class as the area under the precision-recall curve, as described in [<xref ref-type="bibr" rid="ref-32">32</xref>]. In the computation of AP, a specific threshold for Intersection over Union (IoU) is established (e.g., 0.25, 0.5, 0.75) to categorize predictions into true positives, false positives, and false negatives. In this context, IoU acts as a metric for assessing the accuracy of predicted bounding boxes in relation to the ground truth bounding box, as defined in <xref ref-type="disp-formula" rid="eqn-2">Eq. (2)</xref>:
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:mi>I</mml:mi><mml:mi>o</mml:mi><mml:mi>U</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0212C;</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0212C;</mml:mi></mml:mrow><mml:mrow><mml:mi>g</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msup><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>A</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0212C;</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msup><mml:mo>&#x2229;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0212C;</mml:mi></mml:mrow><mml:mrow><mml:mi>g</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msup><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mi>A</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>a</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0212C;</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msup><mml:mo>&#x222A;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0212C;</mml:mi></mml:mrow><mml:mrow><mml:mi>g</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msup><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac></mml:math></disp-formula>where <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:msup><mml:mrow><mml:mi>&#x0212C;</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> and <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:msup><mml:mrow><mml:mi>&#x0212C;</mml:mi></mml:mrow><mml:mrow><mml:mi>g</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> represent the predicted and ground truth bounding boxes, respectively.</p>
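<p>Eq. (2) can be illustrated with a short Python function for axis-aligned boxes. The corner format <monospace>(x_min, y_min, x_max, y_max)</monospace> and the function name are assumptions for this sketch; detection frameworks may store boxes in other formats.</p>
<code language="python" xml:space="preserve"><![CDATA[
```python
def iou(box_p, box_gt):
    """IoU of two axis-aligned boxes given as (x_min, y_min, x_max, y_max)."""
    # Corners of the intersection rectangle.
    ix_min = max(box_p[0], box_gt[0])
    iy_min = max(box_p[1], box_gt[1])
    ix_max = min(box_p[2], box_gt[2])
    iy_max = min(box_p[3], box_gt[3])
    # Width and height are clamped at zero when the boxes do not overlap.
    inter = max(0.0, ix_max - ix_min) * max(0.0, iy_max - iy_min)
    area_p = (box_p[2] - box_p[0]) * (box_p[3] - box_p[1])
    area_gt = (box_gt[2] - box_gt[0]) * (box_gt[3] - box_gt[1])
    union = area_p + area_gt - inter
    return inter / union if union > 0 else 0.0


# Two 10 x 10 boxes offset by 5 in each direction: intersection 25, union 175.
score = iou((0, 0, 10, 10), (5, 5, 15, 15))
print(round(score, 4))  # 0.1429
```
]]></code>
<p>At an IoU threshold of 0.5, the prediction in this example would not count as a true positive, since its IoU with the ground truth is 1/7.</p>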
<p>Subsequently, the mAP (at a specified IoU threshold) is calculated as the mean of AP across all classes <inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:mrow><mml:mi>&#x1D49E;</mml:mi></mml:mrow></mml:math></inline-formula> (in our case, <inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:mrow><mml:mi>&#x1D49E;</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:mtext>fire</mml:mtext><mml:mo>,</mml:mo><mml:mtext>smoke</mml:mtext><mml:mo fence="false" stretchy="false">}</mml:mo></mml:math></inline-formula>). In this study, a conventional IoU threshold of 0.50 is used to compute mAP50, as follows:
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:mi>m</mml:mi><mml:mi>A</mml:mi><mml:mi>P</mml:mi><mml:mn>50</mml:mn><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy="false">|</mml:mo><mml:mi>&#x1D49E;</mml:mi><mml:mo stretchy="false">|</mml:mo></mml:mrow></mml:mfrac><mml:munder><mml:mo movablelimits="false">&#x2211;</mml:mo><mml:mrow><mml:mi>c</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mi>&#x1D49E;</mml:mi></mml:mrow></mml:munder><mml:mi>A</mml:mi><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>c</mml:mi></mml:mrow></mml:msub></mml:math></disp-formula></p>
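<p>The two steps above (per-class AP as the area under the precision-recall curve, then the mean over classes) can be sketched as follows. This is an illustrative implementation under stated assumptions: it uses all-point interpolation of the precision envelope, whereas evaluation toolkits may sample the curve at fixed recall levels, and the input format (a list of confidence scores paired with true-positive flags at a fixed IoU threshold) is hypothetical.</p>
<code language="python"><![CDATA[
```python
from typing import Dict, List, Tuple


def average_precision(detections: List[Tuple[float, bool]], num_gt: int) -> float:
    """Area under the precision-recall curve for a single class.

    detections: (confidence, is_true_positive) pairs at a fixed IoU threshold.
    num_gt: number of ground-truth boxes of this class.
    """
    if num_gt == 0:
        return 0.0

    # Rank detections from most to least confident.
    ranked = sorted(detections, key=lambda d: d[0], reverse=True)
    recalls, precisions = [], []
    tp = fp = 0
    for _, hit in ranked:
        if hit:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))

    # Replace each precision by the maximum precision at any higher recall,
    # which makes the curve non-increasing (interpolated envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])

    # Sum the rectangle areas over each recall increment.
    ap, prev_recall = 0.0, 0.0
    for r, p in zip(recalls, precisions):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap


def mean_average_precision(ap_per_class: Dict[str, float]) -> float:
    """mAP: the mean of AP over all classes."""
    return sum(ap_per_class.values()) / len(ap_per_class)
```
]]></code>
<p>With the two classes considered here, <monospace>mean_average_precision({"fire": ap_fire, "smoke": ap_smoke})</monospace> is simply the average of the two per-class AP values.</p>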
<p>For the assessment of computational costs, we use Floating Point Operations (FLOPs) to quantify the number of floating-point operations required for model execution. Specifically, we express the computational cost in terms of Giga Floating Point Operations (GFLOPs), representing billions of FLOPs. An increase in GFLOPs indicates a higher computational cost, while a decrease suggests lower computational demand.</p>
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>Practical Bayesian Hyperparameter Tuning Using Optuna</title>
<p>Our proposed tuning process follows the general workflow of the Optuna framework, as described in <xref ref-type="sec" rid="s2_2">Section 2.2</xref>. At the core of our approach, the Tree-structured Parzen Estimator (TPE) algorithm [<xref ref-type="bibr" rid="ref-27">27</xref>] is employed as the sampler for hyperparameter optimization. TPE is a widely used Bayesian optimization method designed to improve the efficiency of hyperparameter searches [<xref ref-type="bibr" rid="ref-33">33</xref>]. During optimization with the TPE sampler, a set of candidate values is derived for each parameter in each trial from the results of previous trials. These values are then modeled with two Gaussian Mixture Models (GMMs), <inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:mi>l</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> and <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:mi>g</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>, representing two distinct groups:
<list list-type="bullet">
<list-item>
<p>GMM <inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:mi>l</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>: Fitted to the values that yield the best results for the objective function.</p></list-item>
<list-item>
<p>GMM <inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:mi>g</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>: Fitted to the remaining values.</p></list-item>
</list></p>
<p>The next hyperparameter value is selected to maximize the ratio <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:mfrac><mml:mrow><mml:mi>l</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mi>g</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac></mml:math></inline-formula>, thereby guiding the search towards the most promising hyperparameter values.</p>
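<p>To make the selection rule concrete, the following simplified, one-dimensional sketch splits past trials into a best group and a remaining group, models each group with an equal-weight Gaussian mixture, and returns the candidate with the largest density ratio. It is not Optuna's implementation: the fixed bandwidth, the split fraction <monospace>gamma</monospace>, the number of candidates, and all function names are assumptions chosen for illustration.</p>
<code language="python"><![CDATA[
```python
import math
import random
from typing import List, Tuple


def parzen_density(x: float, centers: List[float], bandwidth: float) -> float:
    """Equal-weight Gaussian mixture with one component per observed value."""
    norm = bandwidth * math.sqrt(2.0 * math.pi)
    total = sum(math.exp(-0.5 * ((x - c) / bandwidth) ** 2) for c in centers)
    return total / (len(centers) * norm)


def tpe_suggest(
    history: List[Tuple[float, float]],  # (hyperparameter value, objective score)
    gamma: float = 0.25,                 # assumed fraction of trials treated as "best"
    bandwidth: float = 0.1,              # assumed fixed kernel width
    n_candidates: int = 24,
    seed: int = 0,
) -> float:
    """Propose the next value by maximizing l(x) / g(x) over sampled candidates."""
    if len(history) < 2:
        raise ValueError("need at least two past trials to form both groups")

    rng = random.Random(seed)
    ranked = sorted(history, key=lambda h: h[1], reverse=True)  # higher is better
    n_good = min(len(ranked) - 1, max(1, math.ceil(gamma * len(ranked))))
    good = [x for x, _ in ranked[:n_good]]   # values behind l(x)
    rest = [x for x, _ in ranked[n_good:]]   # values behind g(x)

    # Candidates are drawn from l(x): pick a "good" center, add Gaussian noise.
    candidates = [rng.gauss(rng.choice(good), bandwidth) for _ in range(n_candidates)]

    def ratio(x: float) -> float:
        l_x = parzen_density(x, good, bandwidth)
        g_x = parzen_density(x, rest, bandwidth)
        return l_x / (g_x + 1e-12)  # small constant avoids division by zero

    return max(candidates, key=ratio)
```
]]></code>
<p>On a toy objective that peaks at a single value, the suggestion returned by <monospace>tpe_suggest</monospace> falls near that peak, because candidates there are likely under the best group and unlikely under the remaining group.</p>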
<fig id="fig-9">
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_63468-fig-9.tif"/>
</fig>
<p>Algorithm 1 outlines the proposed hyperparameter optimization protocol, while <xref ref-type="fig" rid="fig-5">Fig. 5</xref> provides the flowchart of this procedure as implemented using the Optuna framework. Due to constraints related to computational resources and time, particularly when conducting 5-fold cross-validation on DL models, we held a specific set of parameters constant during the optimization process. The parameters fixed included:</p>
<p><list list-type="bullet">
<list-item>
<p>Image size (<inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mi>g</mml:mi><mml:mi>s</mml:mi><mml:mi>z</mml:mi></mml:math></inline-formula>): Fixed at <inline-formula id="ieqn-19"><mml:math id="mml-ieqn-19"><mml:mn>640</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>640</mml:mn></mml:math></inline-formula> as this size is a standard choice for YOLO models, providing a balance between detection performance and inference speed. Increasing the image size may enhance accuracy but imposes substantial computational overhead, resulting in slower training and higher resource demands (e.g., GPU memory).</p></list-item>
<list-item>
<p>Optimizer (<inline-formula id="ieqn-20"><mml:math id="mml-ieqn-20"><mml:mi>o</mml:mi><mml:mi>p</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi></mml:math></inline-formula>): Stochastic Gradient Descent (SGD) [<xref ref-type="bibr" rid="ref-34">34</xref>] was chosen for its effectiveness in training deep learning models and its superior generalization compared with alternatives such as Adam, as suggested in [<xref ref-type="bibr" rid="ref-35">35</xref>].</p></list-item>
<list-item>
<p>Training epochs (<inline-formula id="ieqn-21"><mml:math id="mml-ieqn-21"><mml:mi>e</mml:mi><mml:mi>p</mml:mi><mml:mi>o</mml:mi><mml:mi>c</mml:mi><mml:mi>h</mml:mi></mml:math></inline-formula>): Based on preliminary experiments, we determined that 200 epochs were sufficient for training these YOLO models, with early stopping enabled (a patience of 40 epochs, i.e., training halts if the loss does not improve for 40 consecutive epochs).</p></list-item>
</list></p>

<fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>Flowchart illustrating the implementation of Algorithm 1 using the Optuna framework for hyperparameter optimization, with the objective of maximizing the average mAP50 over 5-fold cross-validation</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_63468-fig-5.tif"/>
</fig>
<p>The search for optimized hyperparameters was conducted within well-defined search spaces, which included:
<list list-type="bullet">
<list-item>
<p>The learning rate (<inline-formula id="ieqn-22"><mml:math id="mml-ieqn-22"><mml:mi>l</mml:mi><mml:mi>r</mml:mi></mml:math></inline-formula>): The search space for <inline-formula id="ieqn-23"><mml:math id="mml-ieqn-23"><mml:mi>l</mml:mi><mml:mi>r</mml:mi></mml:math></inline-formula> was defined as <inline-formula id="ieqn-24"><mml:math id="mml-ieqn-24"><mml:mo stretchy="false">[</mml:mo><mml:msup><mml:mn>10</mml:mn><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>3</mml:mn></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mn>10</mml:mn><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo stretchy="false">]</mml:mo></mml:math></inline-formula> with logarithmic sampling, following the recommendations in [<xref ref-type="bibr" rid="ref-36">36</xref>].</p></list-item>
<list-item>
<p>Momentum (<inline-formula id="ieqn-25"><mml:math id="mml-ieqn-25"><mml:mi>m</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi></mml:math></inline-formula>): Sampled from the range <inline-formula id="ieqn-26"><mml:math id="mml-ieqn-26"><mml:mo stretchy="false">[</mml:mo><mml:mn>0.9</mml:mn><mml:mo>,</mml:mo><mml:mn>0.99</mml:mn><mml:mo stretchy="false">]</mml:mo></mml:math></inline-formula> using logarithmic sampling, as suggested by the study [<xref ref-type="bibr" rid="ref-37">37</xref>].</p></list-item>
<list-item>
<p>Weight decay (<inline-formula id="ieqn-27"><mml:math id="mml-ieqn-27"><mml:mi>w</mml:mi><mml:mi>d</mml:mi></mml:math></inline-formula>): The search space for <inline-formula id="ieqn-28"><mml:math id="mml-ieqn-28"><mml:mi>w</mml:mi><mml:mi>d</mml:mi></mml:math></inline-formula> was defined as <inline-formula id="ieqn-29"><mml:math id="mml-ieqn-29"><mml:mo stretchy="false">[</mml:mo><mml:msup><mml:mn>10</mml:mn><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>5</mml:mn></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mn>10</mml:mn><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo stretchy="false">]</mml:mo></mml:math></inline-formula> with logarithmic sampling, following guidelines from the literature [<xref ref-type="bibr" rid="ref-37">37</xref>].</p></list-item>
<list-item>
<p>Batch size (<inline-formula id="ieqn-30"><mml:math id="mml-ieqn-30"><mml:mi>b</mml:mi><mml:mi>s</mml:mi><mml:mi>z</mml:mi></mml:math></inline-formula>): Values of <inline-formula id="ieqn-31"><mml:math id="mml-ieqn-31"><mml:mn>16</mml:mn></mml:math></inline-formula>, <inline-formula id="ieqn-32"><mml:math id="mml-ieqn-32"><mml:mn>32</mml:mn></mml:math></inline-formula>, and <inline-formula id="ieqn-33"><mml:math id="mml-ieqn-33"><mml:mn>64</mml:mn></mml:math></inline-formula> were considered as they are commonly used for training deep learning models and fit within our hardware constraints.</p></list-item>
</list></p>
<p>The objective of this tuning step was to identify, for each experimental YOLO model (YOLOv8s, YOLOv8l, YOLOv10s, and YOLOv10l), the set of hyperparameters that maximizes the average mAP50 across 5-fold cross-validation on the D-Fire image dataset. To this end, the defined hyperparameter space was explored systematically using the TPE algorithm.</p>
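<p>The search space and the cross-validated objective described above can be sketched with the standard library alone. In this sketch, plain random sampling stands in for the TPE sampler, and <monospace>toy_train_and_validate</monospace> is a hypothetical placeholder that returns a synthetic score instead of training a YOLO model; its values are not experimental results. In Optuna, the corresponding draws would typically be expressed with <monospace>trial.suggest_float(..., log=True)</monospace> and <monospace>trial.suggest_categorical</monospace>.</p>
<code language="python"><![CDATA[
```python
import math
import random
from statistics import mean

# Search space from the text; each (low, high) range is sampled log-uniformly.
LOG_RANGES = {"lr": (1e-3, 1e-1), "mon": (0.9, 0.99), "wd": (1e-5, 1e-1)}
BATCH_SIZES = (16, 32, 64)
N_FOLDS = 5


def sample_log_uniform(rng, low, high):
    """Draw a value whose logarithm is uniform on [log(low), log(high)]."""
    value = math.exp(rng.uniform(math.log(low), math.log(high)))
    return min(max(value, low), high)  # guard against rounding at the edges


def sample_hyperparameters(rng):
    """One point of the search space (random sampling stands in for TPE)."""
    params = {name: sample_log_uniform(rng, lo, hi) for name, (lo, hi) in LOG_RANGES.items()}
    params["bsz"] = rng.choice(BATCH_SIZES)
    return params


def cross_validated_score(params, train_and_validate):
    """Objective: mean validation score over the five folds."""
    return mean(train_and_validate(params, fold) for fold in range(N_FOLDS))


def toy_train_and_validate(params, fold):
    """Hypothetical placeholder for training on one fold; the score is synthetic."""
    return 0.80 - 0.02 * abs(math.log10(params["lr"]) + 2.5) + 0.001 * fold


def run_search(n_trials=20, seed=0):
    """Evaluate n_trials sampled configurations and keep the best one."""
    rng = random.Random(seed)
    best_params, best_score = {}, float("-inf")
    for _ in range(n_trials):
        params = sample_hyperparameters(rng)
        score = cross_validated_score(params, toy_train_and_validate)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score
```
]]></code>
<p>Replacing <monospace>toy_train_and_validate</monospace> with a function that trains on four folds and returns the mAP50 on the held-out fold would give the objective used in this tuning step; each evaluation then costs five full training runs, which is why the number of trials is kept small.</p>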
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Experimental Results</title>
<p>The experiments were conducted on a machine running Windows 10 Pro 22H2 with 64 GB of RAM and two NVIDIA GeForce RTX 3090 GPUs, each with 24 GB of VRAM. The NVIDIA CUDA Toolkit version 11.8 was used in conjunction with PyTorch 2.1.0 for training YOLO models. Pretrained weights for the YOLOv8 and YOLOv10 models from the Ultralytics framework [<xref ref-type="bibr" rid="ref-13">13</xref>] served as initial weights during the hyperparameter tuning and final training processes. The hyperparameter optimization process was performed using Optuna 3.6.1 [<xref ref-type="bibr" rid="ref-20">20</xref>] on the D-Fire training set, which is divided into five folds as described in <xref ref-type="sec" rid="s2_1">Section 2.1</xref>. This process follows the procedure outlined in Algorithm 1 in <xref ref-type="sec" rid="s3_3">Section 3.3</xref>, with a total of 20 trials (<inline-formula id="ieqn-34"><mml:math id="mml-ieqn-34"><mml:mrow><mml:mi>&#x1D4AF;</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mn>20</mml:mn></mml:math></inline-formula>) conducted for each YOLO model. The choice of 20 trials is justified by two key factors. First, preliminary tests with YOLOv8s showed no significant mAP improvement beyond 20 trials across five-fold cross-validation, suggesting convergence. Second, computational constraints made exceeding 20 trials impractical: a single YOLOv10l trial with 5-fold cross-validation required around 27 h, and tuning had to be repeated for four models (YOLOv8s, YOLOv8l, YOLOv10s, YOLOv10l). Setting <inline-formula id="ieqn-35"><mml:math id="mml-ieqn-35"><mml:mrow><mml:mi>&#x1D4AF;</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mn>20</mml:mn></mml:math></inline-formula> ensures a fair comparison across models while balancing optimization quality against resource availability, though additional trials could potentially improve results at the cost of significantly increased tuning time.
The relationships between hyperparameters across all trials for each YOLO model are visualized in <xref ref-type="fig" rid="fig-6">Fig. 6</xref>. Tuning results for YOLOv8s, YOLOv8l, YOLOv10s, and YOLOv10l are summarized in <xref ref-type="table" rid="table-1">Table 1</xref>. <xref ref-type="fig" rid="fig-6">Fig. 6</xref> highlights distinct hyperparameter selection patterns: the learning rate (<inline-formula id="ieqn-36"><mml:math id="mml-ieqn-36"><mml:mi>l</mml:mi><mml:mi>r</mml:mi></mml:math></inline-formula>) and the weight decay (<inline-formula id="ieqn-37"><mml:math id="mml-ieqn-37"><mml:mi>w</mml:mi><mml:mi>d</mml:mi></mml:math></inline-formula>) favor smaller values, while momentum (<inline-formula id="ieqn-38"><mml:math id="mml-ieqn-38"><mml:mi>m</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi></mml:math></inline-formula>) and batch size (<inline-formula id="ieqn-39"><mml:math id="mml-ieqn-39"><mml:mi>b</mml:mi><mml:mi>s</mml:mi><mml:mi>z</mml:mi></mml:math></inline-formula>) exhibit broader distributions across their search ranges. More specifically, all models (YOLOv8s, YOLOv8l, YOLOv10s, and YOLOv10l) predominantly favor a small <inline-formula id="ieqn-40"><mml:math id="mml-ieqn-40"><mml:mi>l</mml:mi><mml:mi>r</mml:mi></mml:math></inline-formula>, primarily within the range of 0.001 to 0.01. Similarly, <inline-formula id="ieqn-41"><mml:math id="mml-ieqn-41"><mml:mi>w</mml:mi><mml:mi>d</mml:mi></mml:math></inline-formula> values are generally low, ranging from 0.00001 to 0.01. In contrast, the <inline-formula id="ieqn-42"><mml:math id="mml-ieqn-42"><mml:mi>m</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi></mml:math></inline-formula> parameter varies across its entire search range. Additionally, YOLOv8s, YOLOv8l, and YOLOv10s tend to prefer smaller batch sizes (16 or 32), while YOLOv10l exhibits a preference for the larger batch size of 64.</p>
<fig id="fig-6">
<label>Figure 6</label>
<caption>
<title>The parallel coordinate plot illustrates the interrelationships among the tuning hyperparameters across all Optuna trials for each YOLO model</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_63468-fig-6.tif"/>
</fig><table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>Optimal hyperparameters and average mAP50 values across five folds for each YOLO model after Optuna tuning process</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Network</th>
<th>Learning rate</th>
<th>Batch size</th>
<th>Momentum</th>
<th>Weight decay</th>
<th>Average mAP50 (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td>YOLOv8s [<xref ref-type="bibr" rid="ref-13">13</xref>]</td>
<td>0.006948</td>
<td>16</td>
<td>0.928685</td>
<td>0.000379</td>
<td>78.59</td>
</tr>
<tr>
<td>YOLOv8l [<xref ref-type="bibr" rid="ref-13">13</xref>]</td>
<td>0.001032</td>
<td>16</td>
<td>0.960639</td>
<td>0.008724</td>
<td>79.91</td>
</tr>
<tr>
<td>YOLOv10s [<xref ref-type="bibr" rid="ref-14">14</xref>]</td>
<td>0.001117</td>
<td>16</td>
<td>0.940489</td>
<td>0.001015</td>
<td>78.24</td>
</tr>
<tr>
<td>YOLOv10l [<xref ref-type="bibr" rid="ref-14">14</xref>]</td>
<td>0.002797</td>
<td>64</td>
<td>0.923013</td>
<td>0.000102</td>
<td>79.07</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>As shown in <xref ref-type="table" rid="table-1">Table 1</xref>, although YOLOv10 is the more recent version, the YOLOv8 models tend to perform better on fire and smoke detection on the D-Fire dataset. This observation may seem counterintuitive, given that YOLOv10 achieves higher mAP scores on the COCO dataset, as illustrated in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>. However, model performance is inherently dataset-dependent; a model may excel on one dataset while underperforming on another. The superior performance of YOLOv8 over YOLOv10 in fire and smoke detection has also been reported in other studies [<xref ref-type="bibr" rid="ref-18">18</xref>,<xref ref-type="bibr" rid="ref-38">38</xref>]. The study [<xref ref-type="bibr" rid="ref-18">18</xref>] found that YOLOv8n outperforms YOLOv10n in mAP50 on the D-Fire dataset. Similarly, in [<xref ref-type="bibr" rid="ref-38">38</xref>], YOLOv8s surpasses YOLOv10s in both mAP50 and mAP50-95.</p>

<p>Preliminary tuning results suggest that YOLOv8 models may outperform their YOLOv10 counterparts; however, definitive performance evaluation requires testing on the test set. Therefore, using the optimal hyperparameter sets for each model, all YOLO variants (YOLOv8s, YOLOv8l, YOLOv10s, and YOLOv10l) were trained five times on the complete D-Fire image dataset and subsequently evaluated on the test set. The average performance metrics from these five training sessions are presented in <xref ref-type="table" rid="table-2">Table 2</xref>.</p>
<table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>Performance metrics of selected YOLO models with varying hyperparameter selection strategies on the D-Fire image dataset test set, compared with results from other studies. Each metric represents the mean performance of models derived from five training iterations. Abbreviations for hyperparameter selection methods: Grid Search (GS), Manual Selection (MS), Ultralytics Default (UD), and Bayesian Tuning (BT) &#x003D; our proposed method</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Network</th>
<th>mAP50 (%)</th>
<th><inline-formula id="ieqn-48"><mml:math id="mml-ieqn-48"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">AP</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">fire</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> (%)</th>
<th><inline-formula id="ieqn-49"><mml:math id="mml-ieqn-49"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">AP</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">smoke</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> (%)</th>
<th>Precision (%)</th>
<th>Recall (%)</th>
<th>F1-score</th>
</tr>
</thead>
<tbody>
<tr>
<td>YOLOv5s [<xref ref-type="bibr" rid="ref-19">19</xref>] (GS)</td>
<td>78.15</td>
<td>72.45</td>
<td>83.85</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>0.76</td>
</tr>
<tr>
<td>YOLOv5l [<xref ref-type="bibr" rid="ref-19">19</xref>] (GS)</td>
<td>79.1</td>
<td>72.32</td>
<td>85.88</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>0.78</td>
</tr>
<tr>
<td>GTP-YOLO [<xref ref-type="bibr" rid="ref-39">39</xref>]</td>
<td>78.2</td>
<td>72.6</td>
<td>83.7</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
</tr>
<tr>
<td>ESFD-YOLOv8n <sup>a</sup> [<xref ref-type="bibr" rid="ref-18">18</xref>]</td>
<td>79.4</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>80.1</td>
<td>72.7</td>
<td>&#x2013;</td>
</tr>
<tr>
<td>YOLOv8s (MS) [<xref ref-type="bibr" rid="ref-40">40</xref>]</td>
<td>78.7</td>
<td>72.5</td>
<td>84.9</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
</tr>
<tr>
<td>YOLOv8s [<xref ref-type="bibr" rid="ref-13">13</xref>] (UD)</td>
<td>79.27</td>
<td>72.78</td>
<td>85.75</td>
<td>79.03</td>
<td>74.01</td>
<td>0.764</td>
</tr>
<tr>
<td>YOLOv8s [<xref ref-type="bibr" rid="ref-13">13</xref>] (BT)</td>
<td>79.53</td>
<td>73.18</td>
<td>85.89</td>
<td>78.29</td>
<td>74.33</td>
<td>0.762</td>
</tr>
<tr>
<td>YOLOv8l [<xref ref-type="bibr" rid="ref-13">13</xref>] (UD)</td>
<td>79.64</td>
<td>73.33</td>
<td>85.96</td>
<td>79.99</td>
<td>74.71</td>
<td>0.772</td>
</tr>
<tr>
<td>YOLOv8l [<xref ref-type="bibr" rid="ref-13">13</xref>] (BT)</td>
<td>80.48</td>
<td>74.66</td>
<td>86.3</td>
<td>80.25</td>
<td>74.32</td>
<td>0.772</td>
</tr>
<tr>
<td>YOLOv10s [<xref ref-type="bibr" rid="ref-14">14</xref>] (UD)</td>
<td>79.06</td>
<td>72.79</td>
<td>85.32</td>
<td>80.11</td>
<td>72.97</td>
<td>0.763</td>
</tr>
<tr>
<td>YOLOv10s [<xref ref-type="bibr" rid="ref-14">14</xref>] (BT)</td>
<td>79.27</td>
<td>72.94</td>
<td>85.61</td>
<td>79.94</td>
<td>73.39</td>
<td>0.765</td>
</tr>
<tr>
<td>YOLOv10l [<xref ref-type="bibr" rid="ref-14">14</xref>] (UD)</td>
<td>79.2</td>
<td>72.79</td>
<td>85.61</td>
<td>80.47</td>
<td>73.28</td>
<td>0.767</td>
</tr>
<tr>
<td>YOLOv10l [<xref ref-type="bibr" rid="ref-14">14</xref>] (BT)</td>
<td>79.83</td>
<td>73.2</td>
<td>86.47</td>
<td>79.61</td>
<td>75.03</td>
<td>0.772</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="table-2fn1" fn-type="other">
<p>Note: <sup>a</sup>This work used 10% of the D-Fire dataset through an arbitrary split, while other methods utilized the official test set, which constitutes 20% of the dataset.</p>
</fn>
</table-wrap-foot>
</table-wrap>
<p>For comparative purposes, we also included performance metrics from other studies on the D-Fire dataset [<xref ref-type="bibr" rid="ref-18">18</xref>,<xref ref-type="bibr" rid="ref-19">19</xref>,<xref ref-type="bibr" rid="ref-39">39</xref>,<xref ref-type="bibr" rid="ref-40">40</xref>], as it is essential to evaluate the models&#x2019; performances on the same large-scale dataset. Note that <xref ref-type="table" rid="table-2">Table 2</xref> presents the performance of YOLO models under different hyperparameter selection methods. These methods include:
<list list-type="bullet">
<list-item>
<p>Grid search (GS): Exhaustive search over a predefined set of hyperparameters.</p>
</list-item>
<list-item>
<p>Manual selection (MS): Manual selection of hyperparameters based on prior knowledge or experience.</p></list-item>
<list-item>
<p>Ultralytics default (UD): Default hyperparameters provided by the Ultralytics framework [<xref ref-type="bibr" rid="ref-13">13</xref>].</p></list-item>
<list-item>
<p>Bayesian tuning (BT): Our proposed method.</p></list-item>
</list></p>
<p><bold>Comparison Rules and Significance Test</bold><bold>:</bold> To compare our work with existing studies, we use the mean value of each metric, because the data required for significance tests are unavailable for those studies. To assess the performance of YOLOv8 and YOLOv10 models under different hyperparameter selection strategies, this study employs the Almost Stochastic Order (ASO) test, a statistical significance test originally proposed in [<xref ref-type="bibr" rid="ref-41">41</xref>] and reimplemented in [<xref ref-type="bibr" rid="ref-42">42</xref>] for enhanced usability. ASO is suitable for comparing the performance metrics of two models across multiple runs on the same test set. The test takes as input two lists of scores for a metric <inline-formula id="ieqn-43"><mml:math id="mml-ieqn-43"><mml:mi>x</mml:mi></mml:math></inline-formula>: one for Model A and one for Model B. At a 95% confidence level, it computes the violation ratio <inline-formula id="ieqn-44"><mml:math id="mml-ieqn-44"><mml:mi>&#x03F5;</mml:mi></mml:math></inline-formula>, which quantifies the extent to which Model A stochastically dominates Model B in terms of metric <inline-formula id="ieqn-45"><mml:math id="mml-ieqn-45"><mml:mi>x</mml:mi></mml:math></inline-formula>. If <inline-formula id="ieqn-46"><mml:math id="mml-ieqn-46"><mml:mi>&#x03F5;</mml:mi><mml:mo>&#x003C;</mml:mo><mml:mn>0.5</mml:mn></mml:math></inline-formula>, Model A is considered superior to Model B; otherwise, no significant dominance is established. The smaller <inline-formula id="ieqn-47"><mml:math id="mml-ieqn-47"><mml:mi>&#x03F5;</mml:mi></mml:math></inline-formula> is, the more reliable the conclusion that Model A is superior.</p>
<p><bold>Comparison across YOLO Versions</bold><bold>:</bold> The results reveal consistent trends: YOLOv8 and YOLOv10 models generally outperform their YOLOv5 counterparts and derivatives (e.g., GTP-YOLO backbone) in terms of mAP50. For instance, YOLOv8s (UD) achieved an mAP50 of 79.27, exceeding YOLOv5s (GS) at 78.15 and GTP-YOLO at 78.20. Interestingly, YOLOv5l (GS) achieved the highest F1-score (0.78), with a marginal difference compared to other models (0.76&#x2013;0.77).</p>
<p>Furthermore, <xref ref-type="table" rid="table-3">Table 3</xref>, validated by the ASO test, confirms the findings from <xref ref-type="table" rid="table-1">Table 1</xref>, indicating that YOLOv8 models consistently outperform their YOLOv10 counterparts in terms of mAP50 under identical hyperparameter selection strategies. For instance, YOLOv8s (UD) achieved an mAP50 of 79.27, exceeding YOLOv10s (UD) at 79.06. Similarly, YOLOv8l (BT) attained an mAP50 of 80.48, surpassing YOLOv10l (BT) by 0.65.</p>
<table-wrap id="table-3">
<label>Table 3</label>
<caption>
<title>Comparison of mAP50 and F1 scores for YOLOv8 and YOLOv10. The notation <inline-formula id="ieqn-50"><mml:math id="mml-ieqn-50"><mml:msub><mml:mi>&#x03F5;</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo><mml:mi>B</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> indicates the ASO test results for assessing whether model A is superior to model B based on the selected metrics. mAP50/F1 Diff. denotes the difference in mAP50/F1 score between model A and model B (<italic>A</italic>&#x2212;<italic>B</italic>)</title>
</caption>
<table>
<colgroup>
<col width="20mm"/>
<col width="25mm"/>
<col width="25mm"/>
<col width="15mm"/>
<col width="35mm"/>
<col width="15mm"/>
</colgroup>
<thead>
<tr>
<th>Model (A)</th>
<th>Baseline (B)</th>
<th>mAP50 ASO Test</th>
<th>mAP50 Diff.</th>
<th>F1 ASO Test</th>
<th>F1 Diff.</th>
</tr>
</thead>
<tbody>
<tr>
<td>YOLOv8s (UD)</td>
<td>YOLOv10s (UD)</td>
<td><inline-formula id="ieqn-51"><mml:math id="mml-ieqn-51"><mml:msub><mml:mi>&#x03F5;</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo><mml:mi>B</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0.2833</mml:mn></mml:math></inline-formula></td>
<td>0.21</td>
<td><inline-formula id="ieqn-52"><mml:math id="mml-ieqn-52"><mml:msub><mml:mi>&#x03F5;</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo><mml:mi>B</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0.8287</mml:mn><mml:mo>,</mml:mo></mml:math></inline-formula> <inline-formula id="ieqn-53"><mml:math id="mml-ieqn-53"><mml:mspace width="thinmathspace" /><mml:msub><mml:mi>&#x03F5;</mml:mi><mml:mrow><mml:mi>B</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo><mml:mi>A</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula></td>
<td>&#x2013;</td>
</tr>
<tr>
<td>YOLOv8s (BT)</td>
<td>YOLOv10s (BT)</td>
<td><inline-formula id="ieqn-54"><mml:math id="mml-ieqn-54"><mml:msub><mml:mi>&#x03F5;</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo><mml:mi>B</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0.4141</mml:mn></mml:math></inline-formula></td>
<td>0.26</td>
<td><inline-formula id="ieqn-55"><mml:math id="mml-ieqn-55"><mml:msub><mml:mi>&#x03F5;</mml:mi><mml:mrow><mml:mi>B</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo><mml:mi>A</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0.3987</mml:mn></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-56"><mml:math id="mml-ieqn-56"><mml:mo stretchy="false">(</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mn>0.003</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula></td>
</tr>
<tr>
<td>YOLOv8l (UD)</td>
<td>YOLOv10l (UD)</td>
<td><inline-formula id="ieqn-57"><mml:math id="mml-ieqn-57"><mml:msub><mml:mi>&#x03F5;</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo><mml:mi>B</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0.0703</mml:mn></mml:math></inline-formula></td>
<td>0.44</td>
<td><inline-formula id="ieqn-58"><mml:math id="mml-ieqn-58"><mml:msub><mml:mi>&#x03F5;</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo><mml:mi>B</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0.0045</mml:mn></mml:math></inline-formula></td>
<td>0.005</td>
</tr>
<tr>
<td>YOLOv8l (BT)</td>
<td>YOLOv10l (BT)</td>
<td><inline-formula id="ieqn-59"><mml:math id="mml-ieqn-59"><mml:msub><mml:mi>&#x03F5;</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo><mml:mi>B</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:math></inline-formula></td>
<td>0.65</td>
<td><inline-formula id="ieqn-60"><mml:math id="mml-ieqn-60"><mml:msub><mml:mi>&#x03F5;</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo><mml:mi>B</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo></mml:math></inline-formula> <inline-formula id="ieqn-61"><mml:math id="mml-ieqn-61"><mml:mspace width="thinmathspace" /><mml:msub><mml:mi>&#x03F5;</mml:mi><mml:mrow><mml:mi>B</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo><mml:mi>A</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0.7664</mml:mn></mml:math></inline-formula></td>
<td>&#x2013;</td>
</tr>
</tbody>
</table>
</table-wrap>
<p><bold>Performance Comparison between Model Variants</bold><bold>:</bold> Within the same YOLO version, larger models (L variants) generally outperform their smaller counterparts (S variants) in mAP50 and F1-score, as expected given that their higher parameter count helps them handle complex scenarios. For instance, YOLOv8l (UD) achieved an mAP50 of 79.64 compared with 79.27 for YOLOv8s (UD), an improvement of 0.37 (ASO test &#x003D; 0.0085). Similarly, among the models using Bayesian tuning, YOLOv10l (BT) achieved 79.83 compared with 79.27 for YOLOv10s (BT), an improvement of 0.56 (ASO test &#x003D; 0.07165). Furthermore, both YOLOv8l and YOLOv10l demonstrated higher F1-scores than their smaller counterparts (YOLOv8s and YOLOv10s) when using the same hyperparameter selection strategies, with a slight improvement of less than 0.01 (all ASO tests &#x0003C; 0.011).</p>
<p><bold>Effectiveness of Bayesian Hyperparameter Tuning</bold><bold>:</bold> Bayesian tuning (BT) demonstrated superior performance in terms of mAP50 compared to the MS and UD hyperparameter selection strategies, particularly for larger models. This finding is statistically confirmed by the results presented in <xref ref-type="table" rid="table-4">Table 4</xref>.</p>
<table-wrap id="table-4">
<label>Table 4</label>
<caption>
<title>Comparison of F1 and mAP50 scores for YOLO models using the BT hyperparameter selection strategy vs. those using the UD hyperparameter selection strategy, as evaluated through ASO testing. The notation <inline-formula id="ieqn-62"><mml:math id="mml-ieqn-62"><mml:msub><mml:mi>&#x03F5;</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo><mml:mi>B</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> indicates the ASO test results for assessing whether model A is superior to model B based on the selected metrics. The columns marked with <inline-formula id="ieqn-63"><mml:math id="mml-ieqn-63"><mml:mo stretchy="false">&#x2191;</mml:mo></mml:math></inline-formula> (mAP50 and F1) report the improvement of model A over model B in the corresponding metric</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Model (A)</th>
<th>Baseline (B)</th>
<th>mAP50 ASO Test</th>
<th><inline-formula id="ieqn-64"><mml:math id="mml-ieqn-64"><mml:mo stretchy="false">&#x2191;</mml:mo></mml:math></inline-formula> mAP50</th>
<th>F1 ASO Test</th>
<th><inline-formula id="ieqn-65"><mml:math id="mml-ieqn-65"><mml:mo stretchy="false">&#x2191;</mml:mo></mml:math></inline-formula> F1</th>
</tr>
</thead>
<tbody>
<tr>
<td>YOLOv8s (BT)</td>
<td>YOLOv8s (UD)</td>
<td><inline-formula id="ieqn-66"><mml:math id="mml-ieqn-66"><mml:msub><mml:mi>&#x03F5;</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo><mml:mi>B</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0.0933</mml:mn></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-67"><mml:math id="mml-ieqn-67"><mml:mo stretchy="false">&#x2191;</mml:mo><mml:mn>0.26</mml:mn></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-68"><mml:math id="mml-ieqn-68"><mml:msub><mml:mi>&#x03F5;</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo><mml:mi>B</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo></mml:math></inline-formula> <inline-formula id="ieqn-69"><mml:math id="mml-ieqn-69"><mml:mspace width="thinmathspace" /><mml:msub><mml:mi>&#x03F5;</mml:mi><mml:mrow><mml:mi>B</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo><mml:mi>A</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0.5516</mml:mn></mml:math></inline-formula></td>
<td>&#x2013;</td>
</tr>
<tr>
<td>YOLOv8l (BT)</td>
<td>YOLOv8l (UD)</td>
<td><inline-formula id="ieqn-70"><mml:math id="mml-ieqn-70"><mml:msub><mml:mi>&#x03F5;</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo><mml:mi>B</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-71"><mml:math id="mml-ieqn-71"><mml:mo stretchy="false">&#x2191;</mml:mo><mml:mn>0.84</mml:mn></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-72"><mml:math id="mml-ieqn-72"><mml:msub><mml:mi>&#x03F5;</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo><mml:mi>B</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:msub><mml:mi>&#x03F5;</mml:mi><mml:mrow><mml:mi>B</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo><mml:mi>A</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0.5515</mml:mn></mml:math></inline-formula></td>
<td>&#x2013;</td>
</tr>
<tr>
<td>YOLOv10s (BT)</td>
<td>YOLOv10s (UD)</td>
<td><inline-formula id="ieqn-73"><mml:math id="mml-ieqn-73"><mml:msub><mml:mi>&#x03F5;</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo><mml:mi>B</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0.4384</mml:mn></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-74"><mml:math id="mml-ieqn-74"><mml:mo stretchy="false">&#x2191;</mml:mo><mml:mn>0.21</mml:mn></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-75"><mml:math id="mml-ieqn-75"><mml:msub><mml:mi>&#x03F5;</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo><mml:mi>B</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0.4384</mml:mn></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-76"><mml:math id="mml-ieqn-76"><mml:mo stretchy="false">&#x2191;</mml:mo><mml:mn>0.002</mml:mn></mml:math></inline-formula></td>
</tr>
<tr>
<td>YOLOv10l (BT)</td>
<td>YOLOv10l (UD)</td>
<td><inline-formula id="ieqn-77"><mml:math id="mml-ieqn-77"><mml:msub><mml:mi>&#x03F5;</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo><mml:mi>B</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0.0009</mml:mn></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-78"><mml:math id="mml-ieqn-78"><mml:mo stretchy="false">&#x2191;</mml:mo><mml:mn>0.63</mml:mn></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-79"><mml:math id="mml-ieqn-79"><mml:msub><mml:mi>&#x03F5;</mml:mi><mml:mrow><mml:mi>A</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo><mml:mi>B</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0.0009</mml:mn></mml:math></inline-formula></td>
<td><inline-formula id="ieqn-80"><mml:math id="mml-ieqn-80"><mml:mo stretchy="false">&#x2191;</mml:mo><mml:mn>0.005</mml:mn></mml:math></inline-formula></td>
</tr>
</tbody>
</table>
</table-wrap>
<p>For instance, YOLOv8s (BT) achieved an mAP50 of 79.53, outperforming YOLOv8s (MS) at 78.70 and YOLOv8s (UD) at 79.27. Similarly, YOLOv8l (BT) reached an mAP50 of 80.48, an improvement of 0.84 over YOLOv8l (UD) at 79.64. Larger models showed larger gains: YOLOv8l (BT) and YOLOv10l (BT) improved by 0.84 and 0.63, respectively, over their UD counterparts. Smaller models, such as YOLOv8s (BT) and YOLOv10s (BT), also improved, but only by 0.26 and 0.21 mAP50, respectively. These results indicate that Bayesian tuning is effective, particularly for larger models. The BT approach also shifted the balance between precision and recall. For example, for YOLOv8l (BT) compared with YOLOv8l (UD), precision improved while recall declined slightly. Conversely, for YOLOv10l (BT) compared with YOLOv10l (UD), precision decreased but recall increased. Despite these variations, the F1-score changed only marginally between the Bayesian tuning (BT) and uniform distribution (UD) hyperparameter selection strategies.</p>
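<p>The mAP50 ASO scores in <xref ref-type="table" rid="table-4">Table 4</xref> can be read mechanically. The following is a minimal sketch and not the evaluation code used in this study. It assumes the convention often used with ASO that a score below 0.5 suggests model A is better than model B, together with a stricter cut-off of 0.2 for a strong claim. Both thresholds and the helper name <monospace>interpret_aso</monospace> are illustrative assumptions; only the scores are taken from the table.</p>
<code language="python">
```python
# Illustrative reading of ASO scores. The thresholds are assumptions,
# not values taken from the paper.
STRONG = 0.2  # assumed stricter cut-off for a strong claim
WEAK = 0.5    # assumed conventional cut-off


def interpret_aso(eps_min, strong=STRONG, weak=WEAK):
    """Map an ASO score for 'A better than B' to a verbal verdict."""
    if eps_min > 1.0 or 0.0 > eps_min:
        raise ValueError("the ASO score must lie in [0, 1]")
    if eps_min >= weak:
        return "no evidence that A beats B"
    if eps_min >= strong:
        return "A likely better than B"
    return "A almost stochastically dominant over B"


# mAP50 ASO scores for BT (model A) vs. UD (model B), copied from Table 4.
MAP50_ASO = {
    "YOLOv8s": 0.0933,
    "YOLOv8l": 0.0,
    "YOLOv10s": 0.4384,
    "YOLOv10l": 0.0009,
}

verdicts = {name: interpret_aso(eps) for name, eps in MAP50_ASO.items()}

if __name__ == "__main__":
    for name, verdict in verdicts.items():
        print(f"{name}: score={MAP50_ASO[name]:.4f} -> {verdict}")
```
</code>
<p>Under these assumed thresholds, YOLOv8s, YOLOv8l and YOLOv10l fall in the strongest category, whereas YOLOv10s only clears the weaker cut-off, which matches the observation that the smaller models gain less from BT.</p>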
<p><bold>Trade-off between Detection Speed and Accuracy</bold><bold>:</bold> As mentioned before, larger models (L variants) perform better than smaller models (S variants) but at a higher computational cost. For instance, YOLOv8l (UD) achieved an mAP50 of 79.64, exceeding YOLOv8s (UD) at 79.27 by 0.37, while requiring 126.30 GFLOPs, roughly 4.4 times the computational cost of YOLOv8s (UD) at 28.40 GFLOPs. This trade-off between performance and computational cost is a key consideration when selecting a model for real-world deployment. To gain a deeper understanding of this trade-off, we assessed the performance metrics, inference speeds, model sizes, and GPU memory utilization of YOLOv8 and YOLOv10 models, as presented in <xref ref-type="table" rid="table-5">Table 5</xref>. The trained models were converted to the Open Neural Network Exchange (ONNX) format, a standard for model deployment, and then evaluated on an NVIDIA RTX 3090 GPU. As shown in <xref ref-type="table" rid="table-5">Table 5</xref>, larger models (YOLOv8l, YOLOv10l) achieve higher performance but run roughly 30% slower than smaller models (YOLOv8s, YOLOv10s). Additionally, larger models require significantly more disk storage, which may pose challenges for deployment on edge devices with limited storage capacity. Notably, YOLOv10 models achieve marginally higher inference speeds than their YOLOv8 counterparts while also reducing GPU memory consumption. The choice of model depends on the specific application requirements, including detection accuracy and available computational resources.</p>
<table-wrap id="table-5">
<label>Table 5</label>
<caption>
<title>Comparative analysis of performance, speed and resource usage of ONNX versions of YOLOv8 and YOLOv10 models. Frames per second (FPS) represents the average number of frames processed per second. GPU memory usage and ONNX model size are measured in megabytes (MB)</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Model</th>
<th>mAP50 <sup>a</sup> (%)</th>
<th>FPS <sup>b</sup></th>
<th>GPU memory usage (MB)</th>
<th>ONNX model size (MB)</th>
</tr>
</thead>
<tbody>
<tr>
<td>YOLOv8s</td>
<td>79.53</td>
<td>68.38</td>
<td>865</td>
<td>43.67</td>
</tr>
<tr>
<td>YOLOv8l</td>
<td>80.48</td>
<td>46.23</td>
<td>1379</td>
<td>170.58</td>
</tr>
<tr>
<td>YOLOv10s</td>
<td>79.27</td>
<td>69.11</td>
<td>613</td>
<td>28.43</td>
</tr>
<tr>
<td>YOLOv10l</td>
<td>79.83</td>
<td>48.48</td>
<td>871</td>
<td>95.25</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="table-5fn1" fn-type="other">
<p>Note: <sup>a</sup>: The mAP50 values presented are derived from <xref ref-type="table" rid="table-2">Table 2</xref>, obtained from models optimized using Bayesian Tuning. <sup>b</sup>: FPS was quantified by executing ONNX models on the D-Fire dataset test set and computing the mean. FPS varies with hardware and software configurations. Measurements were conducted using an RTX 3090 GPU, Intel Core i9-12900K CPU, 64 GB RAM, on Windows 10 Pro 64-bit (21H2), with Python 3.9.12, PyTorch 2.1.0&#x002B;cu118, Ultralytics 8.2.74, ONNX 1.16.1, and ONNXRuntime-GPU 1.18.0.</p>
</fn>
</table-wrap-foot>
</table-wrap>
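<p>The speed and resource trade-off can be quantified directly from <xref ref-type="table" rid="table-5">Table 5</xref>. The short sketch below is only an illustration: the values are copied from the table, while the helper name <monospace>compare</monospace> and the choice of derived quantities (mAP50 gain, relative FPS drop, memory ratio, size ratio) are our own assumptions about what is useful to report.</p>
<code language="python">
```python
# Values copied from Table 5: (mAP50 %, FPS, GPU memory MB, ONNX size MB).
TABLE5 = {
    "YOLOv8s": (79.53, 68.38, 865, 43.67),
    "YOLOv8l": (80.48, 46.23, 1379, 170.58),
    "YOLOv10s": (79.27, 69.11, 613, 28.43),
    "YOLOv10l": (79.83, 48.48, 871, 95.25),
}


def compare(small, large, table=TABLE5):
    """Summarize the cost of moving from the small to the large variant."""
    s_map, s_fps, s_mem, s_size = table[small]
    l_map, l_fps, l_mem, l_size = table[large]
    return {
        "map50_gain": round(l_map - s_map, 2),
        "fps_drop_pct": round(100.0 * (s_fps - l_fps) / s_fps, 1),
        "memory_ratio": round(l_mem / s_mem, 2),
        "size_ratio": round(l_size / s_size, 2),
    }


v8 = compare("YOLOv8s", "YOLOv8l")
v10 = compare("YOLOv10s", "YOLOv10l")

if __name__ == "__main__":
    print("YOLOv8  s -> l:", v8)
    print("YOLOv10 s -> l:", v10)
```
</code>
<p>With the tabulated values, moving from the S to the L variant lowers FPS by about 32% for YOLOv8 and about 30% for YOLOv10, while the ONNX file grows by a factor of roughly 3.9 and 3.35, respectively.</p>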
<p><bold>Generalization Capability</bold><bold>:</bold> To assess the generalization capability of our trained models, we evaluate them on a dataset with different characteristics. Specifically, we take YOLOv8s (BT) trained on the D-Fire dataset and test it on the Unmanned Aerial Vehicles (UAVs)-based forest fire database (UAVs-FFDB) [<xref ref-type="bibr" rid="ref-43">43</xref>]. The D-Fire dataset primarily consists of outdoor images captured from a fixed camera viewpoint, whereas the UAVs-FFDB dataset contains images acquired from UAVs in forested areas and therefore exhibits a distinct data distribution. The UAVs-FFDB comprises 15,560 images categorized into four classes: Fire Incident in Evening, Fire Incident in Pre-Evening, Forest Evening without Fire, and Forest Pre-Evening without Fire. Following [<xref ref-type="bibr" rid="ref-44">44</xref>], we allocate 10% of the dataset for testing, resulting in 389 images per class. A new test set is constructed by merging images from the two fire-related classes into a unified &#x201C;Fire&#x201D; class and grouping the remaining two classes into a &#x201C;Non-Fire&#x201D; class, resulting in 778 images per class.</p>
<p>Inference is performed using YOLOv8s (BT) on the new test set. An image is classified as &#x2018;fire&#x2019; if at least one fire-class bounding box is detected; otherwise, it is categorized as &#x2018;non-fire.&#x2019; The accuracy, Fire Detection Rate (FDR), and F1-score are computed as in [<xref ref-type="bibr" rid="ref-44">44</xref>]. The results are presented in <xref ref-type="table" rid="table-6">Table 6</xref>. Our YOLOv8s (BT) achieves an accuracy and FDR of approximately 88%, with an F1-score of 0.88. Although these metrics are lower than those reported in [<xref ref-type="bibr" rid="ref-44">44</xref>], they are nonetheless promising, given that our models were trained exclusively on the D-Fire dataset without exposure to the UAVs-FFDB. In contrast, the study in [<xref ref-type="bibr" rid="ref-44">44</xref>] used the UAVs-FFDB training set, comprising 80% of the total images. This highlights the model&#x2019;s ability to generalize to fire detection in previously unseen environments.</p>
<table-wrap id="table-6">
<label>Table 6</label>
<caption>
<title>Comparison of fire detection performance on the UAVs-FFDB dataset [<xref ref-type="bibr" rid="ref-43">43</xref>] between our YOLOv8s (BT) model, which was trained exclusively on the D-Fire dataset without exposure to UAVs-FFDB, and the AHMHCNN-mCBAM method [<xref ref-type="bibr" rid="ref-44">44</xref>], which was trained on the UAVs-FFDB training set</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Method</th>
<th>Accuracy (%)</th>
<th>Fire detection rate (%)</th>
<th>F1-score</th>
</tr>
</thead>
<tbody>
<tr>
<td>YOLOv8s (BT)&#x2013;ours</td>
<td>88</td>
<td>87.9</td>
<td>0.88</td>
</tr>
<tr>
<td>AHMHCNN-mCBAM [<xref ref-type="bibr" rid="ref-44">44</xref>]</td>
<td>100</td>
<td>100</td>
<td>1</td>
</tr>
</tbody>
</table>
</table-wrap>
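<p>The image-level decision rule used for this evaluation can be sketched as follows. This is a minimal illustration under stated assumptions and not the evaluation script of this study: detections are represented as a list of predicted class names per image, FDR is assumed to be the recall on fire images (the exact definition follows [<xref ref-type="bibr" rid="ref-44">44</xref>]), and the function names and the toy data are hypothetical.</p>
<code language="python">
```python
def classify_image(detected_classes, fire_label="fire"):
    """Image-level rule: 'fire' if at least one fire-class box was detected."""
    return "fire" if fire_label in detected_classes else "non-fire"


def image_level_metrics(y_true, y_pred, positive="fire"):
    """Accuracy, fire detection rate (assumed: recall on fire images) and F1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    accuracy = (tp + tn) / len(pairs)
    fdr = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    f1 = 2 * precision * fdr / (precision + fdr) if precision + fdr else 0.0
    return {"accuracy": accuracy, "fdr": fdr, "f1": f1}


# Toy data (hypothetical): predicted class names for four images.
detections = [["fire", "smoke"], ["smoke"], [], ["fire"]]
truth = ["fire", "fire", "non-fire", "non-fire"]

predictions = [classify_image(d) for d in detections]
metrics = image_level_metrics(truth, predictions)

if __name__ == "__main__":
    print(predictions)
    print(metrics)
```
</code>
<p>Note that under this rule an image with only smoke detections is labelled non-fire, as in the second toy image, because the rule requires a fire-class bounding box.</p>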
<p><bold>Detection Visualizations</bold><bold>:</bold> As shown in <xref ref-type="table" rid="table-2">Table 2</xref>, YOLOv8 models outperform their YOLOv10 counterparts. Consequently, we focus our visual analysis exclusively on YOLOv8s and YOLOv8l to evaluate their effectiveness in fire and smoke detection. Specifically, we present the detection results of these models on several challenging scenes selected from the D-Fire image test set, as shown in <xref ref-type="fig" rid="fig-7">Fig. 7</xref>. The first three rows display scenes that could be confused with fire or smoke; both YOLOv8s and YOLOv8l correctly ignore them and predict neither fire nor smoke. The fourth row presents a low-resolution image of smoke situated far from the camera, which both models successfully detect, with YOLOv8l achieving a higher confidence score. The last two rows depict scenes containing smoke and multiple fire sources. The second-to-last row indicates that YOLOv8l provides better predictions for fires. In the final row, YOLOv8s incorrectly identifies the fire sources near a car as smoke, whereas YOLOv8l accurately detects them. However, both models mistakenly classify a distant yellow light, likely multiple incandescent streetlights from a city, as fire. These observations suggest that further improvements are needed to enhance the models&#x2019; performance across various scenarios.</p>
<fig id="fig-7">
<label>Figure 7</label>
<caption>
<title>Predictions of YOLOv8s and YOLOv8l on the D-Fire image test set, demonstrating their ability to ignore challenging scenes that could be confused with fire or smoke, such as sunsets and incandescent streetlights. Left column: ground truth; middle column: YOLOv8s predictions; right column: YOLOv8l predictions</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_63468-fig-7.tif"/>
</fig>
<p><bold>Detection Failure Analysis</bold><bold>:</bold> As illustrated in <xref ref-type="fig" rid="fig-7">Fig. 7</xref>, our trained YOLO models occasionally produce incorrect predictions. To investigate the underlying causes of these errors, we analyze two trained models, YOLOv8l and YOLOv10l, using the D-Fire test set. We identify failure cases and visualize the results using FiftyOne [<xref ref-type="bibr" rid="ref-21">21</xref>]. Our analysis focuses on two primary failure types: false positives (FP) and false negatives (FN). False positives occur when the model incorrectly detects fire or smoke, while false negatives arise when the model fails to detect actual fire or smoke. Representative failure cases are depicted in <xref ref-type="fig" rid="fig-8">Fig. 8</xref>.</p>
<fig id="fig-8">
<label>Figure 8</label>
<caption>
<title>Representative failure cases of YOLOv8 and YOLOv10 models on the D-Fire dataset test set</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_63468-fig-8.tif"/>
</fig>
<p><italic>False Positive Analysis</italic>: FP cases may result from localization errors or environmental complexities. In FP Case 1, both YOLOv8l and YOLOv10l detect smoke, but the predicted bounding box is significantly smaller than the ground truth. This discrepancy arises because smoke can gradually disperse over a large area with fading intensity, making accurate boundary prediction challenging. Environmental factors also contribute to false positives. In FP Case 2, intense car headlights near an actual fire region mislead the model, causing incorrect predictions. Similarly, in FP Case 3, distant areas with a certain level of gray coloration are mistakenly identified as smoke.</p>
<p><italic>False Negative Analysis</italic>: FN cases often occur due to low image resolution or when the fire or smoke region is too small relative to the entire image. Complex environmental conditions can also lead to missed detections. For example, in FN Case 2, two small smoke plumes are partially occluded by trees and blend into the background, making detection difficult. In FN Case 3, a large smoke area blends with the white sky, making it challenging for the model to detect.</p>
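<p>The distinction between the two failure types can be made concrete with a small matching routine. The sketch below is not the FiftyOne-based analysis used in this study. It is a simplified single-class illustration that assumes boxes in (x1, y1, x2, y2) format, predictions already sorted by confidence, greedy one-to-one matching, and an IoU threshold of 0.5 (the threshold underlying mAP50). The function names and the example boxes are hypothetical.</p>
<code language="python">
```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def count_errors(predictions, ground_truth, iou_threshold=0.5):
    """Greedy one-to-one matching; returns (TP, FP, FN) for one class."""
    unmatched = list(ground_truth)
    tp = 0
    fp = 0
    for box in predictions:
        # Pick the still-unmatched ground-truth box with the highest overlap.
        best = max(unmatched, key=lambda gt: iou(box, gt), default=None)
        if best is not None and iou(box, best) >= iou_threshold:
            unmatched.remove(best)
            tp += 1
        else:
            fp += 1
    # Ground-truth boxes left unmatched are false negatives.
    return tp, fp, len(unmatched)


# Hypothetical example: a predicted smoke box much smaller than the ground truth.
gt_boxes = [(0, 0, 100, 100)]
pred_boxes = [(30, 30, 70, 70)]
overlap = iou(pred_boxes[0], gt_boxes[0])
result = count_errors(pred_boxes, gt_boxes)

if __name__ == "__main__":
    print(f"IoU = {overlap:.2f}, (TP, FP, FN) = {result}")
```
</code>
<p>In this example the IoU is 0.16, so the prediction counts as a false positive and the ground-truth box as a false negative, even though smoke was found. This mirrors FP Case 1, where an undersized bounding box is counted as an error.</p>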
<p>To mitigate detection failures, potential strategies include enhancing the model&#x2019;s architecture to better process complex scenes and developing larger, more diverse, and challenging training datasets. Further details on these strategies are discussed in <xref ref-type="sec" rid="s5">Section 5</xref> as future research directions.</p>
</sec>
<sec id="s5">
<label>5</label>
<title>Conclusion</title>
<p>This study proposes a Bayesian hyperparameter optimization method for selecting optimal training parameters for YOLOv8 and YOLOv10 models, employing the Optuna library for effective hyperparameter search. The models, trained with their optimal hyperparameters on the large-scale D-Fire dataset, achieved better performance than configurations using manually selected or default hyperparameters. Our analysis also shows that YOLOv8 models outperform their YOLOv10 counterparts in fire and smoke detection.</p>
<p>However, the proposed approach has certain limitations. First, while the BT method effectively enhances YOLO model performance, it demands substantial computational resources and extensive time to optimize hyperparameters, particularly when employing 5-fold cross-validation. Additionally, this study did not include an ablation analysis to evaluate the impact of individual hyperparameters on model performance due to constraints in computational resources and time. Such an analysis could offer valuable insights into hyperparameter significance and inform future optimization strategies. Second, since the underlying architecture of the YOLO models remains unmodified, performance is inherently constrained by architectural limitations, such as difficulty in detecting small objects and complex scenes. Nevertheless, a key strength of the proposed hyperparameter tuning technique is its universality, allowing it to be applied across various YOLO model architectures.</p>
<p>As presented in <xref ref-type="sec" rid="s4">Section 4</xref>, the best-performing YOLOv8 model (YOLOv8l) achieved an mAP50 of approximately 80%, indicating potential for further enhancement. Future improvements may be attained through architectural modifications and data augmentation, possibly in conjunction with hyperparameter tuning. Architectural enhancements could involve the integration of attention mechanisms and the design of specific layers to detect small objects, as suggested by [<xref ref-type="bibr" rid="ref-45">45</xref>]. Data augmentation is particularly promising, given that fire and smoke events are rare. Future research could explore the use of generative adversarial networks (GANs) [<xref ref-type="bibr" rid="ref-46">46</xref>] to produce synthetic fire and smoke images, as shown in [<xref ref-type="bibr" rid="ref-47">47</xref>], or incorporate adverse weather conditions&#x2013;such as haze, fog, and nighttime settings&#x2013;into the dataset, as demonstrated in [<xref ref-type="bibr" rid="ref-48">48</xref>], to increase its complexity and enhance model robustness.</p>
</sec>
</body>
<back>
<ack>
<p>We acknowledge the authors of the <ext-link ext-link-type="uri" xlink:href="https://github.com/gaiasd/DFireDataset">D-Fire Dataset</ext-link> for providing access to their dataset and for their assistance in clarifying dataset-related inquiries.</p>
</ack>
<sec>
<title>Funding Statement</title>
<p>This research was supported by the MSIT (Ministry of Science and ICT), Republic of Korea, under the ITRC (Information Technology Research Center) Support Program (<italic>IITP-2024-RS-2022-00156354</italic>) supervised by the IITP (Institute for Information &#x0026; Communications Technology Planning &#x0026; Evaluation). This work was also supported by the Technology Development Program (<italic>RS-2023-00264489</italic>) funded by the Ministry of SMEs and Startups (MSS, Republic of Korea).</p>
</sec>
<sec>
<title>Author Contributions</title>
<p>The authors confirm contribution to the paper as follows: Study conception and design: Van-Ha Hoang; Data curation: Van-Ha Hoang; Analysis and interpretation of results: Van-Ha Hoang, Jong Weon Lee, Chun-Su Park; Funding acquisition: Chun-Su Park, Jong Weon Lee; Writing&#x2014;original draft preparation: Van-Ha Hoang; Writing&#x2014;review and editing: Jong Weon Lee, Chun-Su Park. All authors reviewed the results and approved the final version of the manuscript.</p>
</sec>
<sec sec-type="data-availability">
<title>Availability of Data and Materials</title>
<p>The data that support the findings of this study are openly available in <italic>gaiasd/DFireDataset: D-Fire: an image data set for fire and smoke detection</italic> at <ext-link ext-link-type="uri" xlink:href="https://github.com/gaiasd/DFireDataset">https://github.com/gaiasd/DFireDataset</ext-link> (accessed on 04 March 2025).</p>
</sec>
<sec>
<title>Ethics Approval</title>
<p>Not applicable.</p>
</sec>
<sec sec-type="COI-statement">
<title>Conflicts of Interest</title>
<p>The authors declare no conflicts of interest to report regarding the present study.</p>
</sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="other"><article-title>Fire loss in the United States&#x2014;NFPA Research. [Internet]. [cited 2024 Oct 7]</article-title>. Available from: <ext-link ext-link-type="uri" xlink:href="https://www.nfpa.org/education-and-research/research/nfpa-research/fire-statistical-reports/fire-loss-in-the-united-states">https://www.nfpa.org/education-and-research/research/nfpa-research/fire-statistical-reports/fire-loss-in-the-united-states</ext-link>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="other"><article-title>Annual area burnt per wildfire vs. number of fires, 2024. [Internet]. [cited 2024 Oct 7]</article-title>. Available from: <ext-link ext-link-type="uri" xlink:href="https://ourworldindata.org/grapher/annual-area-burnt-per-wildfire-vs-number-of-fires?tab=table">https://ourworldindata.org/grapher/annual-area-burnt-per-wildfire-vs-number-of-fires?tab=table</ext-link>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Gaur</surname> <given-names>A</given-names></string-name>, <string-name><surname>Singh</surname> <given-names>A</given-names></string-name>, <string-name><surname>Kumar</surname> <given-names>A</given-names></string-name>, <string-name><surname>Kulkarni</surname> <given-names>KS</given-names></string-name>, <string-name><surname>Lala</surname> <given-names>S</given-names></string-name>, <string-name><surname>Kapoor</surname> <given-names>K</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Fire sensing technologies: a review</article-title>. <source>IEEE Sensors J</source>. <year>2019</year>;<volume>19</volume>(<issue>9</issue>):<fpage>3191</fpage>&#x2013;<lpage>202</lpage>. doi:<pub-id pub-id-type="doi">10.1109/JSEN.2019.2894665</pub-id>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Foggia</surname> <given-names>P</given-names></string-name>, <string-name><surname>Saggese</surname> <given-names>A</given-names></string-name>, <string-name><surname>Vento</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Real-time fire detection for video-surveillance applications using a combination of experts based on color, shape, and motion</article-title>. <source>IEEE Trans Circuits Syst Video Technol</source>. <year>2015</year>;<volume>25</volume>(<issue>9</issue>):<fpage>1545</fpage>&#x2013;<lpage>56</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TCSVT.2015.2392531</pub-id>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Toulouse</surname> <given-names>T</given-names></string-name>, <string-name><surname>Rossi</surname> <given-names>L</given-names></string-name>, <string-name><surname>Celik</surname> <given-names>T</given-names></string-name>, <string-name><surname>Akhloufi</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Automatic fire pixel detection using image processing: a comparative analysis of rule-based and machine learning-based methods</article-title>. <source>Signal Image Video Process</source>. <year>2016</year>;<volume>10</volume>:<fpage>647</fpage>&#x2013;<lpage>54</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s11760-015-0789-x</pub-id>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>LeCun</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Bengio</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Hinton</surname> <given-names>G</given-names></string-name></person-group>. <article-title>Deep learning</article-title>. <source>Nature</source>. <year>2015</year>;<volume>521</volume>(<issue>7553</issue>):<fpage>436</fpage>&#x2013;<lpage>44</lpage>. doi:<pub-id pub-id-type="doi">10.1038/nature14539</pub-id>; <pub-id pub-id-type="pmid">26017442</pub-id></mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhao</surname> <given-names>ZQ</given-names></string-name>, <string-name><surname>Zheng</surname> <given-names>P</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>ST</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>X</given-names></string-name></person-group>. <article-title>Object detection with deep learning: a review</article-title>. <source>IEEE Trans Neural Netw Learn Syst</source>. <year>2019</year>;<volume>30</volume>(<issue>11</issue>):<fpage>3212</fpage>&#x2013;<lpage>32</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TNNLS.2018.2876865</pub-id>; <pub-id pub-id-type="pmid">30703038</pub-id></mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Cheng</surname> <given-names>G</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>X</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>C</given-names></string-name>, <string-name><surname>Li</surname> <given-names>X</given-names></string-name>, <string-name><surname>Xian</surname> <given-names>B</given-names></string-name>, <string-name><surname>Yu</surname> <given-names>H</given-names></string-name></person-group>. <article-title>Visual fire detection using deep learning: a survey</article-title>. <source>Neurocomputing</source>. <year>2024</year>;<volume>596</volume>:<fpage>127975</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.neucom.2024.127975</pub-id>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Redmon</surname> <given-names>J</given-names></string-name>, <string-name><surname>Divvala</surname> <given-names>S</given-names></string-name>, <string-name><surname>Girshick</surname> <given-names>R</given-names></string-name>, <string-name><surname>Farhadi</surname> <given-names>A</given-names></string-name></person-group>. <article-title>You only look once: unified, real-time object detection</article-title>. In: <conf-name>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name>; <year>2016</year>; <publisher-loc>Las Vegas, NV, USA</publisher-loc>. p. <fpage>779</fpage>&#x2013;<lpage>88</lpage>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Ren</surname> <given-names>S</given-names></string-name>, <string-name><surname>He</surname> <given-names>K</given-names></string-name>, <string-name><surname>Girshick</surname> <given-names>R</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Faster R-CNN: towards real-time object detection with region proposal networks</article-title>. <source>Adv Neural Inf Process Syst</source>. <year>2015</year>;<volume>28</volume>:<fpage>91</fpage>&#x2013;<lpage>9</lpage>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>P</given-names></string-name>, <string-name><surname>Zhao</surname> <given-names>W</given-names></string-name></person-group>. <article-title>Image fire detection algorithms based on convolutional neural networks</article-title>. <source>Case Stud Therm Eng</source>. <year>2020</year>;<volume>19</volume>:<fpage>100625</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.csite.2020.100625</pub-id>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Jocher</surname> <given-names>G</given-names></string-name></person-group>. <article-title>YOLOv5 by Ultralytics</article-title>; <year>2020</year>. [Internet]. [cited 2024 Oct 7]. Available from: <ext-link ext-link-type="uri" xlink:href="https://github.com/ultralytics/yolov5">https://github.com/ultralytics/yolov5</ext-link>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Jocher</surname> <given-names>G</given-names></string-name>, <string-name><surname>Chaurasia</surname> <given-names>A</given-names></string-name>, <string-name><surname>Qiu</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Ultralytics YOLOv8</article-title>; <year>2023</year>. [Internet]. [cited 2024 Oct 7]. Available from: <ext-link ext-link-type="uri" xlink:href="https://github.com/ultralytics/ultralytics">https://github.com/ultralytics/ultralytics</ext-link>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Wang</surname> <given-names>A</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>H</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>L</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>K</given-names></string-name>, <string-name><surname>Lin</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Han</surname> <given-names>J</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>YOLOv10: real-time end-to-end object detection</article-title>. <comment>arXiv:2405.14458. 2024</comment>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Lin</surname> <given-names>TY</given-names></string-name>, <string-name><surname>Maire</surname> <given-names>M</given-names></string-name>, <string-name><surname>Belongie</surname> <given-names>S</given-names></string-name>, <string-name><surname>Hays</surname> <given-names>J</given-names></string-name>, <string-name><surname>Perona</surname> <given-names>P</given-names></string-name>, <string-name><surname>Ramanan</surname> <given-names>D</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Microsoft COCO: common objects in context</article-title>. In: <conf-name>Computer Vision-ECCV 2014: 13th European Conference</conf-name>; <year>2014 Sep 6&#x2013;12</year>; <publisher-loc>Zurich, Switzerland</publisher-loc>. p. <fpage>740</fpage>&#x2013;<lpage>55</lpage>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>de Ven&#x00E2;ncio</surname> <given-names>PVA</given-names></string-name>, <string-name><surname>Lisboa</surname> <given-names>AC</given-names></string-name>, <string-name><surname>Barbosa</surname> <given-names>AV</given-names></string-name></person-group>. <article-title>An automatic fire detection system based on deep convolutional neural networks for low-power, resource-constrained devices</article-title>. <source>Neural Comput Appl</source>. <year>2022</year>;<volume>34</volume>(<issue>18</issue>):<fpage>15349</fpage>&#x2013;<lpage>68</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s00521-022-07467-z</pub-id>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>T</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>F</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>W</given-names></string-name>, <string-name><surname>Zhao</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Ning</surname> <given-names>W</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>H</given-names></string-name></person-group>. <article-title>Research on fire smoke detection algorithm based on improved YOLOv8</article-title>. <source>IEEE Access</source>. <year>2024</year>;<volume>12</volume>:<fpage>117354</fpage>&#x2013;<lpage>62</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ACCESS.2024.3448608</pub-id>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Mamadaliev</surname> <given-names>D</given-names></string-name>, <string-name><surname>Touko</surname> <given-names>PLM</given-names></string-name>, <string-name><surname>Kim</surname> <given-names>JH</given-names></string-name>, <string-name><surname>Kim</surname> <given-names>SC</given-names></string-name></person-group>. <article-title>ESFD-YOLOv8n: early smoke and fire detection method based on an improved YOLOv8n model</article-title>. <source>Fire</source>. <year>2024</year>;<volume>7</volume>(<issue>9</issue>):<fpage>303</fpage>. doi:<pub-id pub-id-type="doi">10.3390/fire7090303</pub-id>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>de Ven&#x00E2;ncio</surname> <given-names>PVA</given-names></string-name>, <string-name><surname>Campos</surname> <given-names>RJ</given-names></string-name>, <string-name><surname>Rezende</surname> <given-names>TM</given-names></string-name>, <string-name><surname>Lisboa</surname> <given-names>AC</given-names></string-name>, <string-name><surname>Barbosa</surname> <given-names>AV</given-names></string-name></person-group>. <article-title>A hybrid method for fire detection based on spatial and temporal patterns</article-title>. <source>Neural Comput Appl</source>. <year>2023</year>;<volume>35</volume>(<issue>13</issue>):<fpage>9349</fpage>&#x2013;<lpage>61</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s00521-023-08260-2</pub-id>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Akiba</surname> <given-names>T</given-names></string-name>, <string-name><surname>Sano</surname> <given-names>S</given-names></string-name>, <string-name><surname>Yanase</surname> <given-names>T</given-names></string-name>, <string-name><surname>Ohta</surname> <given-names>T</given-names></string-name>, <string-name><surname>Koyama</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Optuna: a next-generation hyperparameter optimization framework</article-title>. In: <conf-name>Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining</conf-name>; <publisher-loc>Anchorage, AK, USA</publisher-loc>. <year>2019</year>. p. <fpage>2623</fpage>&#x2013;<lpage>26</lpage>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Moore</surname> <given-names>BE</given-names></string-name>, <string-name><surname>Corso</surname> <given-names>JJ</given-names></string-name></person-group>. <article-title>FiftyOne. GitHub Note</article-title>; <year>2020</year>. [Internet]. [cited 2024 Oct 7]. Available from: <ext-link ext-link-type="uri" xlink:href="https://github.com/voxel51/fiftyone">https://github.com/voxel51/fiftyone</ext-link>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Talaat</surname> <given-names>FM</given-names></string-name>, <string-name><surname>ZainEldin</surname> <given-names>H</given-names></string-name></person-group>. <article-title>An improved fire detection approach based on YOLO-v8 for smart cities</article-title>. <source>Neural Comput Appl</source>. <year>2023</year>;<volume>35</volume>(<issue>28</issue>):<fpage>20939</fpage>&#x2013;<lpage>54</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s00521-023-08809-1</pub-id>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Huynh</surname> <given-names>TT</given-names></string-name>, <string-name><surname>Nguyen</surname> <given-names>HT</given-names></string-name>, <string-name><surname>Phu</surname> <given-names>DT</given-names></string-name></person-group>. <article-title>Enhancing fire detection performance based on fine-tuned YOLOv10</article-title>. <source>Comput Mater Contin</source>. <year>2024</year>;<volume>81</volume>(<issue>2</issue>):<fpage>2281</fpage>&#x2013;<lpage>98</lpage>. doi:<pub-id pub-id-type="doi">10.32604/cmc.2024.057954</pub-id>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>F</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>W</given-names></string-name>, <string-name><surname>Peng</surname> <given-names>S</given-names></string-name>, <string-name><surname>Zhou</surname> <given-names>J</given-names></string-name></person-group>. <article-title>A survey of convolutional neural networks: analysis, applications, and prospects</article-title>. <source>IEEE Trans Neural Netw Learn Syst</source>. <year>2021</year>;<volume>33</volume>(<issue>12</issue>):<fpage>6999</fpage>&#x2013;<lpage>7019</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TNNLS.2021.3084827</pub-id>; <pub-id pub-id-type="pmid">34111009</pub-id></mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Yu</surname> <given-names>T</given-names></string-name>, <string-name><surname>Zhu</surname> <given-names>H</given-names></string-name></person-group>. <article-title>Hyper-parameter optimization: a review of algorithms and applications</article-title>. <comment>arXiv:2003.05689. 2020</comment>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Shekhar</surname> <given-names>S</given-names></string-name>, <string-name><surname>Bansode</surname> <given-names>A</given-names></string-name>, <string-name><surname>Salim</surname> <given-names>A</given-names></string-name></person-group>. <article-title>A comparative study of hyper-parameter optimization tools</article-title>. In: <conf-name>2021 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE)</conf-name>; <publisher-loc>Brisbane, QLD, Australia</publisher-loc>. <year>2021</year>. p. <fpage>1</fpage>&#x2013;<lpage>6</lpage>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Bergstra</surname> <given-names>J</given-names></string-name>, <string-name><surname>Bardenet</surname> <given-names>R</given-names></string-name>, <string-name><surname>Bengio</surname> <given-names>Y</given-names></string-name>, <string-name><surname>K&#x00E9;gl</surname> <given-names>B</given-names></string-name></person-group>. <article-title>Algorithms for hyper-parameter optimization</article-title>. <source>Adv Neural Inform Process Syst</source>. <year>2011</year>;<volume>24</volume>:<fpage>2546</fpage>&#x2013;<lpage>54</lpage>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Akhmedov</surname> <given-names>F</given-names></string-name>, <string-name><surname>Nasimov</surname> <given-names>R</given-names></string-name>, <string-name><surname>Abdusalomov</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Dehazing algorithm integration with YOLO-v10 for ship fire detection</article-title>. <source>Fire</source>. <year>2024</year>;<volume>7</volume>(<issue>9</issue>):<fpage>332</fpage>. doi:<pub-id pub-id-type="doi">10.3390/fire7090332</pub-id>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Ramos</surname> <given-names>L</given-names></string-name>, <string-name><surname>Casas</surname> <given-names>E</given-names></string-name>, <string-name><surname>Bendek</surname> <given-names>E</given-names></string-name>, <string-name><surname>Romero</surname> <given-names>C</given-names></string-name>, <string-name><surname>Rivas-Echeverr&#x00ED;a</surname> <given-names>F</given-names></string-name></person-group>. <article-title>Hyperparameter optimization of YOLOv8 for smoke and wildfire detection: implications for agricultural and environmental safety</article-title>. <source>Artif Intell Agric</source>. <year>2024</year>;<volume>12</volume>:<fpage>109</fpage>&#x2013;<lpage>26</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.aiia.2024.05.003</pub-id>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Casas</surname> <given-names>E</given-names></string-name>, <string-name><surname>Ramos</surname> <given-names>L</given-names></string-name>, <string-name><surname>Bendek</surname> <given-names>E</given-names></string-name>, <string-name><surname>Rivas-Echeverr&#x00ED;a</surname> <given-names>F</given-names></string-name></person-group>. <article-title>Assessing the effectiveness of YOLO architectures for smoke and wildfire detection</article-title>. <source>IEEE Access</source>. <year>2023</year>;<volume>11</volume>:<fpage>96554</fpage>&#x2013;<lpage>83</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ACCESS.2023.3312217</pub-id>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Feng</surname> <given-names>R</given-names></string-name>, <string-name><surname>Zheng</surname> <given-names>K</given-names></string-name>, <string-name><surname>Huang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Zhao</surname> <given-names>D</given-names></string-name>, <string-name><surname>Jordan</surname> <given-names>M</given-names></string-name>, <string-name><surname>Zha</surname> <given-names>ZJ</given-names></string-name></person-group>. <article-title>Rank diminishing in deep neural networks</article-title>. <source>Adv Neural Inform Process Syst</source>. <year>2022</year>;<volume>35</volume>:<fpage>33054</fpage>&#x2013;<lpage>65</lpage>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Padilla</surname> <given-names>R</given-names></string-name>, <string-name><surname>Netto</surname> <given-names>SL</given-names></string-name>, <string-name><surname>Da Silva</surname> <given-names>EA</given-names></string-name></person-group>. <article-title>A survey on performance metrics for object-detection algorithms</article-title>. In: <conf-name>2020 International Conference on Systems, Signals and Image Processing (IWSSIP)</conf-name>. <publisher-loc>Rio de Janeiro, Brazil</publisher-loc>: <publisher-name>IEEE</publisher-name>; <year>2020</year>. p. <fpage>237</fpage>&#x2013;<lpage>42</lpage>.</mixed-citation></ref>
<ref id="ref-33"><label>[33]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Watanabe</surname> <given-names>S</given-names></string-name></person-group>. <article-title>Tree-structured parzen estimator: understanding its algorithm components and their roles for better empirical performance</article-title>. <comment>arXiv:2304.11127. 2023</comment>.</mixed-citation></ref>
<ref id="ref-34"><label>[34]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Sutskever</surname> <given-names>I</given-names></string-name>, <string-name><surname>Martens</surname> <given-names>J</given-names></string-name>, <string-name><surname>Dahl</surname> <given-names>G</given-names></string-name>, <string-name><surname>Hinton</surname> <given-names>G</given-names></string-name></person-group>. <article-title>On the importance of initialization and momentum in deep learning</article-title>. In: <conf-name>International Conference on Machine Learning</conf-name>. <publisher-loc>Vancouver, BC, Canada</publisher-loc>: <publisher-name>PMLR</publisher-name>; <year>2013</year>. p. <fpage>1139</fpage>&#x2013;<lpage>47</lpage>.</mixed-citation></ref>
<ref id="ref-35"><label>[35]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhou</surname> <given-names>P</given-names></string-name>, <string-name><surname>Feng</surname> <given-names>J</given-names></string-name>, <string-name><surname>Ma</surname> <given-names>C</given-names></string-name>, <string-name><surname>Xiong</surname> <given-names>C</given-names></string-name>, <string-name><surname>Hoi</surname> <given-names>SCH</given-names></string-name>, <string-name><surname>E</surname> <given-names>W</given-names></string-name></person-group>. <article-title>Towards theoretically understanding why SGD generalizes better than Adam in deep learning</article-title>. <source>Adv Neural Inform Process Syst</source>. <year>2020</year>;<volume>33</volume>:<fpage>21285</fpage>&#x2013;<lpage>96</lpage>.</mixed-citation></ref>
<ref id="ref-36"><label>[36]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Goodfellow</surname> <given-names>I</given-names></string-name>, <string-name><surname>Bengio</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Courville</surname> <given-names>A</given-names></string-name></person-group>. <source>Deep learning</source>. <publisher-loc>Cambridge, MA, USA</publisher-loc>: <publisher-name>MIT Press</publisher-name>; <year>2016</year>.</mixed-citation></ref>
<ref id="ref-37"><label>[37]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Smith</surname> <given-names>LN</given-names></string-name></person-group>. <article-title>A disciplined approach to neural network hyper-parameters: part 1&#x2013;learning rate, batch size, momentum, and weight decay</article-title>. <comment>arXiv:1803.09820. 2018</comment>.</mixed-citation></ref>
<ref id="ref-38"><label>[38]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>X</given-names></string-name>, <string-name><surname>Liang</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>Fire-RPG: an urban fire detection network providing warnings in advance</article-title>. <source>Fire</source>. <year>2024</year>;<volume>7</volume>(<issue>7</issue>):<fpage>214</fpage>. doi:<pub-id pub-id-type="doi">10.3390/fire7070214</pub-id>.</mixed-citation></ref>
<ref id="ref-39"><label>[39]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Kim</surname> <given-names>S</given-names></string-name>, <string-name><surname>Jang</surname> <given-names>IS</given-names></string-name>, <string-name><surname>Ko</surname> <given-names>BC</given-names></string-name></person-group>. <article-title>Domain-free fire detection using the spatial-temporal attention transform of the YOLO backbone</article-title>. <source>Pattern Anal Appl</source>. <year>2024</year>;<volume>27</volume>(<issue>2</issue>):<fpage>45</fpage>. doi:<pub-id pub-id-type="doi">10.1007/s10044-024-01267-y</pub-id>.</mixed-citation></ref>
<ref id="ref-40"><label>[40]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Gon&#x00E7;alves</surname> <given-names>LAO</given-names></string-name>, <string-name><surname>Ghali</surname> <given-names>R</given-names></string-name>, <string-name><surname>Akhloufi</surname> <given-names>MA</given-names></string-name></person-group>. <article-title>YOLO-based models for smoke and wildfire detection in ground and aerial images</article-title>. <source>Fire</source>. <year>2024</year>;<volume>7</volume>(<issue>4</issue>):<fpage>140</fpage>. doi:<pub-id pub-id-type="doi">10.3390/fire7040140</pub-id>.</mixed-citation></ref>
<ref id="ref-41"><label>[41]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Dror</surname> <given-names>R</given-names></string-name>, <string-name><surname>Shlomov</surname> <given-names>S</given-names></string-name>, <string-name><surname>Reichart</surname> <given-names>R</given-names></string-name></person-group>. <article-title>Deep dominance-how to properly compare deep neural models</article-title>. In: <conf-name>Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</conf-name>; <year>2019</year>; <publisher-loc>Florence, Italy</publisher-loc>. p. <fpage>2773</fpage>&#x2013;<lpage>85</lpage>.</mixed-citation></ref>
<ref id="ref-42"><label>[42]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Ulmer</surname> <given-names>D</given-names></string-name>, <string-name><surname>Hardmeier</surname> <given-names>C</given-names></string-name>, <string-name><surname>Frellsen</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Deep-significance-easy and meaningful statistical significance testing in the age of neural networks</article-title>. <comment>arXiv:2204.06815. 2022</comment>.</mixed-citation></ref>
<ref id="ref-43"><label>[43]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Mowla</surname> <given-names>MN</given-names></string-name>, <string-name><surname>Asadi</surname> <given-names>D</given-names></string-name>, <string-name><surname>Tekeoglu</surname> <given-names>KN</given-names></string-name>, <string-name><surname>Masum</surname> <given-names>S</given-names></string-name>, <string-name><surname>Rabie</surname> <given-names>K</given-names></string-name></person-group>. <article-title>UAVs-FFDB: a high-resolution dataset for advancing forest fire detection and monitoring using unmanned aerial vehicles (UAVs)</article-title>. <source>Data Brief</source>. <year>2024</year>;<volume>55</volume>:<fpage>110706</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.dib.2024.110706</pub-id>; <pub-id pub-id-type="pmid">39076831</pub-id></mixed-citation></ref>
<ref id="ref-44"><label>[44]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Mowla</surname> <given-names>MN</given-names></string-name>, <string-name><surname>Asadi</surname> <given-names>D</given-names></string-name>, <string-name><surname>Masum</surname> <given-names>S</given-names></string-name>, <string-name><surname>Rabie</surname> <given-names>K</given-names></string-name></person-group>. <article-title>Adaptive hierarchical multi-headed convolutional neural network with modified convolutional block attention for aerial forest fire detection</article-title>. <source>IEEE Access</source>. <year>2025</year>;<volume>13</volume>:<fpage>3412</fpage>&#x2013;<lpage>33</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ACCESS.2024.3524320</pub-id>.</mixed-citation></ref>
<ref id="ref-45"><label>[45]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhou</surname> <given-names>W</given-names></string-name>, <string-name><surname>Cai</surname> <given-names>C</given-names></string-name>, <string-name><surname>Li</surname> <given-names>C</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>H</given-names></string-name>, <string-name><surname>Shi</surname> <given-names>H</given-names></string-name></person-group>. <article-title>AD-YOLO: a real-time YOLO network with Swin Transformer and attention mechanism for airport scene detection</article-title>. <source>IEEE Trans Instrum Meas</source>. <year>2024</year>;<volume>73</volume>:<fpage>5036112</fpage>. doi:<pub-id pub-id-type="doi">10.1109/TIM.2024.3472805</pub-id>.</mixed-citation></ref>
<ref id="ref-46"><label>[46]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Goodfellow</surname> <given-names>I</given-names></string-name>, <string-name><surname>Pouget-Abadie</surname> <given-names>J</given-names></string-name>, <string-name><surname>Mirza</surname> <given-names>M</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>B</given-names></string-name>, <string-name><surname>Warde-Farley</surname> <given-names>D</given-names></string-name>, <string-name><surname>Ozair</surname> <given-names>S</given-names></string-name>, <etal>et al.</etal></person-group> <article-title>Generative adversarial nets</article-title>. In: <source>Advances in neural information processing systems 27</source>. <publisher-loc>Cambridge, MA, USA</publisher-loc>: <publisher-name>MIT Press</publisher-name>; <year>2014</year>. p. <fpage>2672</fpage>&#x2013;<lpage>80</lpage>.</mixed-citation></ref>
<ref id="ref-47"><label>[47]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Nguyen</surname> <given-names>QD</given-names></string-name>, <string-name><surname>Mai</surname> <given-names>ND</given-names></string-name>, <string-name><surname>Nguyen</surname> <given-names>VH</given-names></string-name>, <string-name><surname>Kakani</surname> <given-names>V</given-names></string-name>, <string-name><surname>Kim</surname> <given-names>H</given-names></string-name></person-group>. <article-title>SynFAGnet: a fully automated generative network for realistic fire image generation</article-title>. <source>Fire Technol</source>. <year>2024</year>;<volume>60</volume>(<issue>3</issue>):<fpage>1643</fpage>&#x2013;<lpage>65</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s10694-023-01540-2</pub-id>.</mixed-citation></ref>
<ref id="ref-48"><label>[48]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Yar</surname> <given-names>H</given-names></string-name>, <string-name><surname>Ullah</surname> <given-names>W</given-names></string-name>, <string-name><surname>Khan</surname> <given-names>ZA</given-names></string-name>, <string-name><surname>Baik</surname> <given-names>SW</given-names></string-name></person-group>. <article-title>An effective attention-based CNN model for fire detection in adverse weather conditions</article-title>. <source>ISPRS J Photogramm Remote Sens</source>. <year>2023</year>;<volume>206</volume>:<fpage>335</fpage>&#x2013;<lpage>46</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.isprsjprs.2023.10.019</pub-id>.</mixed-citation></ref>
</ref-list>
</back></article>