<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMES</journal-id>
<journal-id journal-id-type="nlm-ta">CMES</journal-id>
<journal-id journal-id-type="publisher-id">CMES</journal-id>
<journal-title-group>
<journal-title>Computer Modeling in Engineering &#x0026; Sciences</journal-title>
</journal-title-group>
<issn pub-type="epub">1526-1506</issn>
<issn pub-type="ppub">1526-1492</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">73530</article-id>
<article-id pub-id-type="doi">10.32604/cmes.2025.073530</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Side-Scan Sonar Image Synthesis Based on CycleGAN with 3D Models and Shadow Integration</article-title>
<alt-title alt-title-type="left-running-head">Side-Scan Sonar Image Synthesis Based on CycleGAN with 3D Models and Shadow Integration</alt-title>
<alt-title alt-title-type="right-running-head">Side-Scan Sonar Image Synthesis Based on CycleGAN with 3D Models and Shadow Integration</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author">
<name name-style="western"><surname>Kim</surname><given-names>Byeongjun</given-names></name><xref ref-type="aff" rid="aff-1">1</xref><xref ref-type="author-notes" rid="afn1">#</xref></contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Lee</surname><given-names>Seung-Hun</given-names></name><xref ref-type="aff" rid="aff-2">2</xref><xref ref-type="author-notes" rid="afn1">#</xref></contrib>
<contrib id="author-3" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Chang</surname><given-names>Won-Du</given-names></name><xref ref-type="aff" rid="aff-1">1</xref><email>chang@pknu.ac.kr</email></contrib>
<aff id="aff-1"><label>1</label><institution>Department of Artificial Intelligence Convergence, Pukyong National University</institution>, <addr-line>Busan, 48513</addr-line>, <country>Republic of Korea</country></aff>
<aff id="aff-2"><label>2</label><institution>Marine Domain Research Division, Korea Institute of Ocean Science and Technology (KIOST)</institution>, <addr-line>Busan, 49111</addr-line>, <country>Republic of Korea</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Won-Du Chang. Email: <email>chang@pknu.ac.kr</email></corresp>
<fn id="afn1">
<p><sup>#</sup>These authors contributed equally to this work</p>
</fn>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2025</year>
</pub-date>
<pub-date date-type="pub" publication-format="electronic">
<day>26</day><month>11</month><year>2025</year>
</pub-date>
<volume>145</volume>
<issue>2</issue>
<fpage>1237</fpage>
<lpage>1252</lpage>
<history>
<date date-type="received">
<day>19</day>
<month>09</month>
<year>2025</year>
</date>
<date date-type="accepted">
<day>16</day>
<month>10</month>
<year>2025</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2025 The Authors.</copyright-statement>
<copyright-year>2025</copyright-year>
<copyright-holder>Published by Tech Science Press.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMES_73530.pdf"></self-uri>
<abstract>
<p>Side-scan sonar (SSS) is essential for acquiring high-resolution seafloor images over large areas, facilitating the identification of subsea objects. However, military security restrictions and the scarcity of subsea targets limit the availability of SSS data, posing challenges for Automatic Target Recognition (ATR) research. This paper presents an approach that uses Cycle-Consistent Generative Adversarial Networks (CycleGAN) to augment SSS images of key subsea objects, such as shipwrecks, aircraft, and drowning victims. The process begins by constructing 3D models to generate rendered images with realistic shadows from multiple angles. To enhance image quality, a shadow extractor and shadow region loss function are introduced to ensure consistent shadow representation. Additionally, a multi-resolution learning structure enables effective training, even with limited data availability. The experimental results show that the generated data improved object detection accuracy when they were used for training and demonstrated the ability to generate clear shadow and background regions with stability.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Side-scan sonar (SSS)</kwd>
<kwd>cycle-consistent generative adversarial networks (CycleGAN)</kwd>
<kwd>automatic target recognition (ATR)</kwd>
<kwd>sonar imaging</kwd>
<kwd>sample augmentation</kwd>
<kwd>image simulation</kwd>
<kwd>image translation</kwd>
</kwd-group>
<funding-group>
<award-group id="awg1">
<funding-source>National Research Foundation of Korea</funding-source>
<award-id>RS-2024-00334159</award-id>
</award-group>
<award-group id="awg2">
<funding-source>Korea Institute of Ocean Science and Technology</funding-source>
<award-id>PEA0332</award-id>
</award-group>
</funding-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>Water, covering 71% of the Earth&#x2019;s surface, presents unique challenges for exploration since electromagnetic waves, including lasers, are heavily absorbed, limiting their effective range [<xref ref-type="bibr" rid="ref-1">1</xref>]. While airborne bathymetric Light Detection and Ranging (LiDAR) systems offer some capability for marine surveys, they struggle with limited resolution and shallow depth penetration [<xref ref-type="bibr" rid="ref-2">2</xref>]. As a result, sound waves are preferred for underwater applications because they can travel farther and are ideal for tasks such as detailed seafloor mapping, imaging, and communication [<xref ref-type="bibr" rid="ref-3">3</xref>]. Side-scan sonar (SSS), which utilizes sound waves, is a powerful tool for quickly acquiring high-resolution images of the seafloor across large areas, enabling the detection of subsea objects. Although more advanced technologies such as synthetic aperture sonar (SAS) can provide higher resolution and broader coverage [<xref ref-type="bibr" rid="ref-4">4</xref>], their deployment is constrained by high cost and limited accessibility. Consequently, SSS remains the predominant modality for efficient wide-area seafloor imaging and constitutes the focus of this study. However, despite its practicality, SSS data remain difficult to acquire at scale due to military security, cultural heritage restrictions, and the scarcity of underwater objects [<xref ref-type="bibr" rid="ref-5">5</xref>]. Additionally, the accuracy of SSS image interpretation can vary depending on the expertise of surveyors and environmental conditions. In some cases, divers or remotely operated vehicles (ROVs) are necessary to capture optical images for precise identification [<xref ref-type="bibr" rid="ref-6">6</xref>,<xref ref-type="bibr" rid="ref-7">7</xref>].
Automating the analysis of SSS images through artificial intelligence (AI) offers the potential for rapid and expert-level object detection or recognition. This, however, relies on training AI models with large datasets, which are often difficult to acquire.</p>
<p>Generative techniques, such as generative adversarial networks (GANs), offer a practical way to address data scarcity by creating realistic and diverse datasets that support effective model training [<xref ref-type="bibr" rid="ref-8">8</xref>]. These techniques ensure recognition models are well-trained to identify underwater objects, even with limited real-world SSS data. Karjalainen et al. (2019) validated an automatic target recognition (ATR) system using GANs [<xref ref-type="bibr" rid="ref-9">9</xref>], while Reed et al. (2019) synthesized SAS images by combining GANs with an optical renderer [<xref ref-type="bibr" rid="ref-10">10</xref>]. Jiang et al. (2020) developed a GAN-based method for synthesizing multi-frequency SSS images [<xref ref-type="bibr" rid="ref-11">11</xref>], and Tang et al. (2023) leveraged CycleGAN for augmenting SSS image samples [<xref ref-type="bibr" rid="ref-12">12</xref>].</p>
<p>This study presents an SSS image synthesis method based on CycleGAN, with a focus on accurately incorporating shadow characteristics to enhance the quality of generated images. The shadow characteristics in SSS images provide crucial 3D cues about the height and shape of objects [<xref ref-type="bibr" rid="ref-13">13</xref>,<xref ref-type="bibr" rid="ref-14">14</xref>]. These cues are essential for accurate analysis of SSS images, helping to interpret spatial information that might otherwise be lost. Despite the importance of shadows, previous research on synthetic SSS images has not thoroughly addressed how to generate them realistically. To bridge this gap, we employed 3D models to generate SSS images with shadows from multiple angles. A shadow extractor and shadow region loss function were integrated into the CycleGAN framework to ensure correct shadow generation. Additionally, a multi-resolution learning structure was incorporated to facilitate effective training with limited data. Compared to existing approaches that neglect shadow information, the proposed method incorporates accurate shadow characteristics into the CycleGAN framework to enhance the quality of generated SSS images. By leveraging 3D models and integrating a shadow extractor and shadow region loss function, the framework ensures realistic shadow generation, capturing essential spatial details. This improves the model&#x2019;s ability to generate high-quality images and enhances the effectiveness of recognition models, leading to more reliable underwater object detection even with limited training data.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Research Methodology</title>
<sec id="s2_1">
<label>2.1</label>
<title>Data</title>
<p>In this study, style transformation based on CycleGAN was performed to generate SSS images, utilizing datasets from two domains: real-world SSS images and synthetic images for style transformation. The SeabedObjects-KLSG dataset, published by Huo et al., is a publicly available collection of 1190 underwater SSS images gathered over 10 years [<xref ref-type="bibr" rid="ref-15">15</xref>]. These images are categorized into five groups&#x2014;shipwrecks, aircraft, drowning victims, mines, and seafloor&#x2014;to support automatic underwater object detection research. For this study, 402 images from three categories&#x2014;334 shipwrecks, 34 aircraft, and 34 drowning victims&#x2014;were selected, while other categories such as mines and seafloor were excluded. Mines were excluded because their specifications vary widely across different types, and the shadows they cast are generally small, making them less relevant to the objectives of this study. The seafloor category was also omitted, since the focus of this study is on object-level representation rather than background modeling. <xref ref-type="fig" rid="fig-1">Fig. 1</xref> presents sample images from the SeabedObjects-KLSG dataset, showcasing real-world examples of SSS imagery.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>Samples of the SeabedObjects-KLSG dataset</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_73530-fig-1.tif"/>
</fig>
<p>Beyond the SeabedObjects-KLSG dataset, the availability of public SSS datasets containing object instances is extremely limited. Due to security restrictions and data sensitivity, most existing studies have relied on private datasets, which makes reproducibility difficult. Moreover, many works have primarily focused on mine or mine-like objects [<xref ref-type="bibr" rid="ref-16">16</xref>&#x2013;<xref ref-type="bibr" rid="ref-18">18</xref>], while comparatively little attention has been given to objects with more complex structural features. The SeabedObjects-KLSG dataset remains one of the few publicly accessible resources and has therefore been widely adopted in related research [<xref ref-type="bibr" rid="ref-19">19</xref>,<xref ref-type="bibr" rid="ref-20">20</xref>]. Other available datasets mainly contain categories unrelated to this study, including glaciers and walls [<xref ref-type="bibr" rid="ref-21">21</xref>], pipes, mounds, and platforms [<xref ref-type="bibr" rid="ref-22">22</xref>], walls [<xref ref-type="bibr" rid="ref-23">23</xref>], pipelines [<xref ref-type="bibr" rid="ref-24">24</xref>], and seagrass, rocks, and sand [<xref ref-type="bibr" rid="ref-25">25</xref>]. Consequently, the SeabedObjects-KLSG dataset was selected as the most suitable basis for the experiments in this work.</p>
<p>To complement real-world data, a rendering image dataset was created using Blender, a 3D computer graphics software. This dataset includes 3D models of ships, aircraft, and human figures, transformed through rotation, deformation, and other adjustments. Camera angles were varied to capture different perspectives, while light sources were altered to generate diverse shadow effects. The rendering dataset comprises 117 shipwreck images, 104 aircraft images, and 100 drowning victim images. <xref ref-type="fig" rid="fig-2">Fig. 2</xref> illustrates examples from this dataset, demonstrating the variety achieved through controlled 3D rendering.</p>
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>Samples of the rendered image dataset</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_73530-fig-2.tif"/>
</fig>
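<p>The rendering pipeline itself is not specified beyond the use of Blender; as a rough illustration, the per-image variation described above (camera viewpoint and light direction, which together determine the cast shadow) could be sampled as in the sketch below. All parameter names and value ranges here are hypothetical, not the authors&#x2019; settings.</p>

```python
import math
import random

def sample_render_params(seed=None):
    """Draw one randomized rendering configuration: camera viewpoint
    and light direction (which controls the shadow appearance).
    Ranges are illustrative placeholders, not the paper's settings."""
    rng = random.Random(seed)
    yaw = rng.uniform(0, 360)             # camera azimuth, degrees
    pitch = rng.uniform(20, 70)           # camera elevation, degrees
    light_azimuth = rng.uniform(0, 360)   # direction the shadow falls
    light_elevation = rng.uniform(10, 45) # lower light -> longer shadow
    # simple proxy: shadow length grows as the light source gets lower
    shadow_scale = 1.0 / math.tan(math.radians(light_elevation))
    return {"yaw": yaw, "pitch": pitch,
            "light_azimuth": light_azimuth,
            "light_elevation": light_elevation,
            "shadow_scale": shadow_scale}
```

<p>Each sampled configuration would then be applied to a 3D model before rendering, yielding one image of the dataset.</p>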
</sec>
<sec id="s2_2">
<label>2.2</label>
<title>Network Structure</title>
<p>This study implements a style transformation model based on CycleGAN to generate SSS images from rendered images of various 3D models. A limitation of the traditional CycleGAN structure is that, with a small amount of data, the generated images may not sufficiently reflect the characteristics of real SSS images. To address this issue, we introduced a multi-resolution training structure and a shadow area loss function. The multi-resolution training method enables the model to simultaneously learn from images of varying resolutions, allowing the network to learn diverse features effectively. The shadow area loss function compares the shadow regions extracted from the original and generated images, training the network to preserve the original object&#x2019;s shadow shape while accurately reflecting the shadow characteristics of SSS images. The network structure used for generating SSS images based on CycleGAN in this study is presented in <xref ref-type="fig" rid="fig-3">Fig. 3</xref>.</p>
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Training structure of the proposed network for the SSS image domain</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_73530-fig-3.tif"/>
</fig>
</sec>
<sec id="s2_3">
<label>2.3</label>
<title>CycleGAN</title>
<p>CycleGAN is a GAN-based model designed to learn image translation between two different domains. It is composed of two generators and two discriminators, which interact adversarially and progressively learn to generate and to discriminate realistic images. A key advantage of CycleGAN is that it learns the translation between two domains through unsupervised learning, without requiring a one-to-one matched dataset. Additionally, cycle consistency helps minimize distortions and information loss during the translation process, enabling more consistent and reliable image transformations [<xref ref-type="bibr" rid="ref-26">26</xref>]. However, the lack of paired ground-truth supervision does not guarantee the preservation of high-frequency structures (e.g., edges and shadow boundaries). Adversarial loss primarily enforces distribution-level realism, while cycle-consistency loss emphasizes the reversibility of content rather than exact edge fidelity; as a result, generated images may not consistently retain fine details.</p>
<p><xref ref-type="fig" rid="fig-3">Fig. 3</xref> illustrates the training structure for the domain <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow></mml:math></inline-formula> (SSS images) in this study. The generator <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> transforms images from domain <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:mi>r</mml:mi></mml:math></inline-formula> (rendered images) into images in domain <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow></mml:math></inline-formula>, while the generator <inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> converts images from domain <inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow></mml:math></inline-formula> back into images in domain <inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:mi>r</mml:mi></mml:math></inline-formula>. The discriminator <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> determines whether the generated images belong to the real domain <inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:mrow><mml:mtext>s</mml:mtext></mml:mrow></mml:math></inline-formula>, and the discriminator <inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> determines whether the generated images belong to the real domain <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:mi>r</mml:mi></mml:math></inline-formula>. 
Unlike the conventional CycleGAN, the proposed model incorporates additional components tailored to SSS imagery: a k-means&#x2013;based Shadow Extractor that highlights shadow regions, and a Shadow Region Loss that enforces the preservation of these regions during translation. Furthermore, multi-resolution training enables the model to capture both global structure and fine details more effectively. These modifications allow the framework not only to perform domain translation but also to retain the critical high-frequency cues, such as edges and shadow boundaries, which are essential for reliable analysis of SSS data.</p>
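<p>The paper does not detail the Shadow Extractor beyond its k-means basis. A minimal sketch, assuming clustering is applied to raw grayscale intensities and the darkest cluster is taken as the shadow region, might look like this:</p>

```python
import numpy as np

def extract_shadow_mask(image, k=3, iters=20):
    """Cluster pixel intensities with 1-D k-means and return a binary
    mask of the darkest cluster, taken here as the shadow region.
    Assumption: shadows are the lowest-intensity mode of the image."""
    pixels = image.reshape(-1, 1).astype(np.float64)
    # deterministic init: spread centroids over the intensity range
    centroids = np.linspace(pixels.min(), pixels.max(), k)
    labels = np.zeros(len(pixels), dtype=int)
    for _ in range(iters):
        # assign each pixel to its nearest centroid
        labels = np.argmin(np.abs(pixels - centroids), axis=1)
        # recompute centroids from non-empty clusters
        for c in range(k):
            if np.any(labels == c):
                centroids[c] = pixels[labels == c].mean()
    darkest = int(np.argmin(centroids))
    return (labels == darkest).reshape(image.shape)
```

<p>The resulting mask restricts the shadow region loss to the pixels that actually carry shadow information; the number of clusters and the intensity-only feature are assumptions, not the authors&#x2019; stated configuration.</p>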
</sec>
<sec id="s2_4">
<label>2.4</label>
<title>Multi-Resolution Training Structure</title>
<p>Due to the limited number of available SSS images, the conventional CycleGAN structure faced limitations in generating SSS images. To address this limitation, this study proposes a multi-resolution training structure. The multi-resolution learning structure refers to a structure in which the generator and discriminator are trained at multiple resolutions for a single image, allowing for more diverse and efficient learning [<xref ref-type="bibr" rid="ref-27">27</xref>]. This study conducts training at four resolutions: 256 &#x00D7; 256, 128 &#x00D7; 128, 64 &#x00D7; 64, and 32 &#x00D7; 32. <xref ref-type="fig" rid="fig-4">Fig. 4</xref> provides a detailed overview of the generator and discriminator structure used for multi-resolution training.</p>
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>Detailed structure of the generators and discriminators</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_73530-fig-4.tif"/>
</fig>
<p>The generator is based on the U-Net structure employed in the traditional CycleGAN generator, extended so that images at the four specified resolutions are generated from the corresponding decoder layers. The four generated images are passed to the discriminator, which extracts features from each resolution and then combines them to produce a decision value. The real images are likewise converted into the four resolutions to match the discriminator&#x2019;s input format. The discriminator output adopts a PatchGAN structure, in which the image is divided into small patches and each patch is evaluated, yielding an output of shape (1, 30, 30). This enables the model to learn more detailed and localized information across the image [<xref ref-type="bibr" rid="ref-28">28</xref>].</p>
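<p>The input-side preparation of the real images, i.e., converting one image into the four training resolutions, can be sketched with simple average pooling; the network layers that emit the multi-resolution generator outputs are not detailed in the text, so the following only illustrates this preprocessing step.</p>

```python
import numpy as np

def resolution_pyramid(image, sizes=(256, 128, 64, 32)):
    """Downsample a square image to each target size by average
    pooling (assumes each target size divides the image side evenly)."""
    side = image.shape[0]
    pyramid = []
    for s in sizes:
        f = side // s  # pooling factor for this resolution
        pooled = image[:s * f, :s * f].reshape(s, f, s, f).mean(axis=(1, 3))
        pyramid.append(pooled)
    return pyramid
```

<p>Average pooling is one plausible choice; bilinear resizing would serve the same purpose of matching the discriminator&#x2019;s four input resolutions.</p>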
</sec>
<sec id="s2_5">
<label>2.5</label>
<title>Loss Functions Used for Training</title>
<p>In this study, four loss functions&#x2014;adversarial loss, cycle consistency loss, identity loss, and shadow area loss&#x2014;are used to train the model.</p>
<sec id="s2_5_1">
<label>2.5.1</label>
<title>Adversarial Loss</title>
<p>Adversarial loss is the fundamental loss function used in GANs and trains the generator to produce images indistinguishable from the target domain images. In traditional GANs, the discriminator learns to output 1 for real data and 0 for fake data. In this study, however, we employ the Least Squares Generative Adversarial Network (LSGAN) loss function to ensure more stable training and higher-quality image generation [<xref ref-type="bibr" rid="ref-29">29</xref>]. The discriminator is trained to classify real images as genuine and generated images as fake. The formulation is as follows:
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:munder><mml:mo movablelimits="true" form="prefix">min</mml:mo><mml:mrow><mml:mi>D</mml:mi></mml:mrow></mml:munder><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mi>L</mml:mi><mml:mi>S</mml:mi><mml:mi>G</mml:mi><mml:mi>A</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>D</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac><mml:msub><mml:mrow><mml:mi mathvariant="double-struck">E</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mo>&#x223C;</mml:mo><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>d</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>s</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mi>D</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>s</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>]</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac><mml:msub><mml:mrow><mml:mi mathvariant="double-struck">E</mml:mi></mml:mrow><mml:mrow><mml:mi>r</mml:mi><mml:mo>&#x223C;</mml:mo><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>r</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mi>D</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>G</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>r</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>]</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>The generator is trained to ensure the discriminator classifies the generated images as real. Here, <italic>s</italic> represents the SSS image domain, and <italic>r</italic> denotes the rendered image domain. The formulation is as follows:
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:munder><mml:mo movablelimits="true" form="prefix">min</mml:mo><mml:mrow><mml:mi>G</mml:mi></mml:mrow></mml:munder><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mi>L</mml:mi><mml:mi>S</mml:mi><mml:mi>G</mml:mi><mml:mi>A</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>G</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac><mml:msub><mml:mrow><mml:mi mathvariant="double-struck">E</mml:mi></mml:mrow><mml:mrow><mml:mi>r</mml:mi><mml:mo>&#x223C;</mml:mo><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>r</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>r</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mi>D</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>G</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>r</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>]</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>The Adversarial Loss is defined by <xref ref-type="disp-formula" rid="eqn-1">Eqs. (1)</xref> and <xref ref-type="disp-formula" rid="eqn-2">(2)</xref>. Since CycleGAN consists of two generators and two discriminators, the loss is defined for each domain as follows:
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>L</mml:mi><mml:mi>S</mml:mi><mml:mi>G</mml:mi><mml:mi>A</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>G</mml:mi><mml:mo>,</mml:mo><mml:mi>D</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:munder><mml:mo movablelimits="true" form="prefix">min</mml:mo><mml:mrow><mml:mi>D</mml:mi></mml:mrow></mml:munder><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mi>L</mml:mi><mml:mi>S</mml:mi><mml:mi>G</mml:mi><mml:mi>A</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>D</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:munder><mml:mo movablelimits="true" form="prefix">min</mml:mo><mml:mrow><mml:mi>G</mml:mi></mml:mrow></mml:munder><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mi>L</mml:mi><mml:mi>S</mml:mi><mml:mi>G</mml:mi><mml:mi>A</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>G</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
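<p>Eqs. (1) and (2) translate directly into code. The sketch below evaluates both least-squares objectives on arrays of discriminator scores (in practice these would be the PatchGAN output maps, not scalars):</p>

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake):
    # Eq. (1): push scores on real images toward 1
    # and scores on generated images toward 0
    return 0.5 * np.mean((d_real - 1.0) ** 2) + 0.5 * np.mean(d_fake ** 2)

def lsgan_g_loss(d_fake):
    # Eq. (2): the generator wants its outputs judged as real (1)
    return 0.5 * np.mean((d_fake - 1.0) ** 2)
```

<p>A perfectly fooled discriminator scores generated images at 1, driving the generator loss to zero; a perfect discriminator drives its own loss to zero instead.</p>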
</sec>
<sec id="s2_5_2">
<label>2.5.2</label>
<title>Cycle Consistency Loss</title>
<p>Cycle Consistency Loss ensures that a transformed image remains similar to the original image when it is converted back to its original domain. This criterion, when applied to each domain, is represented as follows:
<disp-formula id="ueqn-4"><mml:math id="mml-ueqn-4" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:mi>r</mml:mi><mml:mo stretchy="false">&#x21D2;</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>r</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo stretchy="false">&#x21D2;</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>r</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2248;</mml:mo><mml:mi>r</mml:mi><mml:mo>,</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:mi>s</mml:mi><mml:mo stretchy="false">&#x21D2;</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>s</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo stretchy="false">&#x21D2;</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>s</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2248;</mml:mo><mml:mi>s</mml:mi></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>This minimizes distortions or information loss that may arise during the transformation process, resulting in more consistent and reliable image translations. Cycle Consistency Loss is defined as follows:
<disp-formula id="eqn-5"><label>(5)</label><mml:math id="mml-eqn-5" display="block"><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>c</mml:mi><mml:mi>y</mml:mi><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="double-struck">E</mml:mi></mml:mrow><mml:mrow><mml:mi>r</mml:mi><mml:mo>&#x223C;</mml:mo><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>d</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>r</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:mo>&#x2225;</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>r</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi>r</mml:mi><mml:msub><mml:mo>&#x2225;</mml:mo><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>]</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mi 
mathvariant="double-struck">E</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mo>&#x223C;</mml:mo><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>d</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>s</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:mo>&#x2225;</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>s</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi>s</mml:mi><mml:msub><mml:mo>&#x2225;</mml:mo><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>]</mml:mo></mml:mrow></mml:math></disp-formula></p>
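<p>Given the two generators as callables, Eq. (5) can be written compactly; this is an illustrative NumPy version operating on image arrays, not the training implementation:</p>

```python
import numpy as np

def cycle_consistency_loss(r, s, G_rs, G_sr):
    # Eq. (5): L1 distance after a round trip through both generators,
    # summed over the two translation directions
    loss_r = np.mean(np.abs(G_sr(G_rs(r)) - r))
    loss_s = np.mean(np.abs(G_rs(G_sr(s)) - s))
    return loss_r + loss_s
```

<p>If both round trips reproduce their inputs exactly, the loss vanishes; any residual distortion accumulated during translation contributes linearly.</p>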
</sec>
<sec id="s2_5_3">
<label>2.5.3</label>
<title>Identity Loss</title>
<p>Identity Loss ensures that the generator does not distort or otherwise alter an image that already belongs to its target domain. It is defined as follows:
<disp-formula id="eqn-6"><label>(6)</label><mml:math id="mml-eqn-6" display="block"><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="double-struck">E</mml:mi></mml:mrow><mml:mrow><mml:mi>r</mml:mi><mml:mo>&#x223C;</mml:mo><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>d</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>r</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:mo>&#x2225;</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>r</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi>r</mml:mi><mml:msub><mml:mo>&#x2225;</mml:mo><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>]</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mi 
mathvariant="double-struck">E</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mo>&#x223C;</mml:mo><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>d</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>s</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:mo>&#x2225;</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>s</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi>s</mml:mi><mml:msub><mml:mo>&#x2225;</mml:mo><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>]</mml:mo></mml:mrow></mml:math></disp-formula></p>
</sec>
<sec id="s2_5_4">
<label>2.5.4</label>
<title>Shadow Area Loss</title>
<p>Shadow Area Loss ensures that the generator preserves the shadow shapes of objects in the input image and accurately reflects the overall shadow characteristics of SSS images. A shadow extractor based on the K-Means algorithm is utilized to extract the shadow areas from both the input and generated images for comparison. Based on empirical results, the value of K is set to 8 in this study. The Shadow Area Loss is defined as follows, where SE represents the shadow extractor:
<disp-formula id="eqn-7"><label>(7)</label><mml:math id="mml-eqn-7" display="block"><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mi>h</mml:mi><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>o</mml:mi><mml:mi>w</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="double-struck">E</mml:mi></mml:mrow><mml:mrow><mml:mi>r</mml:mi><mml:mo>&#x223C;</mml:mo><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>d</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>r</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:mo>&#x2225;</mml:mo><mml:mi>S</mml:mi><mml:mi>E</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>r</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi>S</mml:mi><mml:mi>E</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>r</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:msub><mml:mo>&#x2225;</mml:mo><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>]</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mi 
mathvariant="double-struck">E</mml:mi></mml:mrow><mml:mrow><mml:mi>s</mml:mi><mml:mo>&#x223C;</mml:mo><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>d</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>s</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>[</mml:mo><mml:mo>&#x2225;</mml:mo><mml:mi>S</mml:mi><mml:mi>E</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>s</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi>S</mml:mi><mml:mi>E</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>s</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:msub><mml:mo>&#x2225;</mml:mo><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>]</mml:mo></mml:mrow></mml:math></disp-formula></p>
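<p>The shadow extractor can be illustrated with a per-image K-Means clustering of pixel intensities (K = 8, as stated above), taking the darkest cluster as the shadow mask; the extractor used in this work may differ in detail. A self-contained NumPy sketch:</p>

```python
import numpy as np

def kmeans_1d(x, k=8, iters=20, seed=0):
    # Simple 1-D K-Means on flattened pixel intensities
    rng = np.random.default_rng(seed)
    centers = rng.choice(x, size=k, replace=False)
    for _ in range(iters):
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = x[labels == j].mean()
    return labels, centers

def shadow_mask(img, k=8):
    # Cluster intensities and keep the darkest cluster as the "shadow" region
    flat = img.ravel().astype(float)
    labels, centers = kmeans_1d(flat, k)
    darkest = np.argmin(centers)
    return (labels == darkest).reshape(img.shape).astype(float)

def shadow_loss(se, gen_img, ref_img):
    # L1 distance between shadow masks, as in Eq. (7)
    return np.abs(se(gen_img) - se(ref_img)).mean()

img = np.linspace(0.0, 1.0, 64).reshape(8, 8)  # toy intensity image
mask = shadow_mask(img)
```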
<p>The overall loss function of the proposed model is formulated as follows, building upon the losses in <xref ref-type="disp-formula" rid="eqn-3">Eqs. (3)</xref> and <xref ref-type="disp-formula" rid="eqn-5">(5)</xref>&#x2013;<xref ref-type="disp-formula" rid="eqn-7">(7)</xref>:</p>
<p><disp-formula id="eqn-8"><label>(8)</label><mml:math id="mml-eqn-8" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>o</mml:mi><mml:mi>t</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo></mml:mtd><mml:mtd><mml:mspace width="thinmathspace" /><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>l</mml:mi><mml:mi>s</mml:mi><mml:mi>g</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>l</mml:mi><mml:mi>s</mml:mi><mml:mi>g</mml:mi><mml:mi>a</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mi 
mathvariant="normal">&#x03BB;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>c</mml:mi><mml:mi>y</mml:mi><mml:mi>c</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd /><mml:mtd><mml:mi></mml:mi><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="normal">&#x03BB;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>n</mml:mi><mml:mi>t</mml:mi><mml:mi>i</mml:mi><mml:mi>t</mml:mi><mml:mi>y</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="normal">&#x03BB;</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>s</mml:mi><mml:mi>h</mml:mi><mml:mi>a</mml:mi><mml:mi>d</mml:mi><mml:mi>o</mml:mi><mml:mi>w</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>R</mml:mi><mml:mi>S</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>S</mml:mi><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>where <inline-formula id="ieqn-12"><mml:math id="mml-ieqn-12"><mml:msub><mml:mrow><mml:mi 
mathvariant="normal">&#x03BB;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula id="ieqn-13"><mml:math id="mml-ieqn-13"><mml:msub><mml:mrow><mml:mi mathvariant="normal">&#x03BB;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula id="ieqn-14"><mml:math id="mml-ieqn-14"><mml:msub><mml:mrow><mml:mi mathvariant="normal">&#x03BB;</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> are the loss weights for cycle-consistency, identity, and shadow, respectively.</p>
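<p>With the weights reported in Section 3.1 (&#x03BB;<sub>1</sub> = 10, &#x03BB;<sub>2</sub> = 0.4, &#x03BB;<sub>3</sub> = 0.6), the total objective of Eq. (8) reduces to a weighted sum of scalar loss values. A schematic sketch, with hypothetical per-batch loss values:</p>

```python
def total_loss(l_lsgan_s, l_lsgan_r, l_cyc, l_id, l_shadow,
               lam1=10.0, lam2=0.4, lam3=0.6):
    # Weighted sum from Eq. (8); lam* defaults follow Section 3.1
    return l_lsgan_s + l_lsgan_r + lam1 * l_cyc + lam2 * l_id + lam3 * l_shadow

# Hypothetical per-batch loss values for illustration
total = total_loss(0.25, 0.30, 0.10, 0.05, 0.08)
```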
</sec>
</sec>
</sec>
<sec id="s3">
<label>3</label>
<title>Experimental Results</title>
<sec id="s3_1">
<label>3.1</label>
<title>SSS Image Generation</title>
<p>This study conducted experiments using both the SSS and rendered image datasets. The SSS image dataset consisted of 403 images: 334 shipwrecks, 34 aircraft, and 35 drowning victims. The Blender-generated image dataset comprised 321 images: 117 shipwrecks, 104 aircraft, and 100 drowning victims.</p>
<p>The model was trained for 1000 epochs, and data augmentation techniques such as CenterCrop, HorizontalFlip, and VerticalFlip were applied. CenterCrop is a technique that crops the central part of the image to adjust its size, while HorizontalFlip and VerticalFlip flip the image horizontally and vertically, respectively. The input data was normalized with a mean of 0.5 and a standard deviation of 0.5. The learning rate was set to 0.0001, and optimization was performed using Adam with &#x03B2; &#x003D; (0.5, 0.999), consistent with the settings reported in the original CycleGAN study.</p>
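<p>The normalization above maps pixel values from [0, 1] to [&#x2212;1, 1] via (x &#x2212; 0.5)/0.5, and the flips are simple axis reversals. A NumPy sketch of the preprocessing steps (function names are illustrative, not the actual training code):</p>

```python
import numpy as np

def normalize(img, mean=0.5, std=0.5):
    # (x - 0.5) / 0.5 maps the range [0, 1] to [-1, 1]
    return (img - mean) / std

def center_crop(img, size):
    # Crop a size x size window from the image center (CenterCrop)
    h, w = img.shape[-2:]
    top, left = (h - size) // 2, (w - size) // 2
    return img[..., top:top + size, left:left + size]

def horizontal_flip(img):
    return img[..., ::-1]       # reverse the width axis

def vertical_flip(img):
    return img[..., ::-1, :]    # reverse the height axis

x = np.linspace(0.0, 1.0, 16).reshape(4, 4)
y = normalize(center_crop(x, 2))
```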
<p>Weighted loss functions were applied with the following values: <inline-formula id="ieqn-15"><mml:math id="mml-ieqn-15"><mml:msub><mml:mrow><mml:mi mathvariant="normal">&#x03BB;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>10</mml:mn></mml:math></inline-formula>, <inline-formula id="ieqn-16"><mml:math id="mml-ieqn-16"><mml:msub><mml:mrow><mml:mi mathvariant="normal">&#x03BB;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0.4</mml:mn></mml:math></inline-formula>, and <inline-formula id="ieqn-17"><mml:math id="mml-ieqn-17"><mml:msub><mml:mrow><mml:mi mathvariant="normal">&#x03BB;</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0.6</mml:mn></mml:math></inline-formula>. The relatively large weight of &#x03BB;<sub>1</sub> enforced the preservation of structural information during domain translation, thereby mitigating mode collapse as well as geometric and topological distortions. The shadow weight &#x03BB;<sub>3</sub> emphasized the retention of shadow regions, a critical feature of sonar images, whereas &#x03BB;<sub>2</sub> prevented unnecessary modifications to samples that already belong to the target domain. This configuration preserved the structural integrity of the original images while maintaining shadow fidelity and suppressing undesired transformations. The weights were empirically determined through repeated experimentation and refined based on expert feedback from sonar specialists to achieve results that closely resemble real sonar images.</p>
<p>To analyze the impact of each loss weight (<inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:mrow><mml:mi mathvariant="normal">&#x03BB;</mml:mi></mml:mrow></mml:math></inline-formula>), we conducted experiments in which each <inline-formula id="ieqn-19"><mml:math id="mml-ieqn-19"><mml:mrow><mml:mi mathvariant="normal">&#x03BB;</mml:mi></mml:mrow></mml:math></inline-formula> value was individually minimized to 0.1. In addition, experiments without multi-resolution training were performed, with the results presented in <xref ref-type="fig" rid="fig-5">Fig. 5</xref>. When <inline-formula id="ieqn-20"><mml:math id="mml-ieqn-20"><mml:msub><mml:mrow><mml:mi mathvariant="normal">&#x03BB;</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> (cycle-consistency) was minimized, the generated images exhibited severe structural distortions of the objects, demonstrating its importance in preserving overall geometry. When <inline-formula id="ieqn-21"><mml:math id="mml-ieqn-21"><mml:msub><mml:mrow><mml:mi mathvariant="normal">&#x03BB;</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> (identity) was minimized, some regions showed blurred boundaries, highlighting its role in constraining unnecessary changes to samples from the target domain. When <inline-formula id="ieqn-22"><mml:math id="mml-ieqn-22"><mml:msub><mml:mrow><mml:mi mathvariant="normal">&#x03BB;</mml:mi></mml:mrow><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> (shadow) was minimized, the generated outputs exhibited inaccurate shadow orientations and partial degradation of shadow regions, underscoring the necessity of shadow loss for realistic sonar imagery. Finally, disabling multi-resolution training produced images lacking the distinctive characteristics of sonar data, making them clearly distinguishable from real sonar images.</p>
<fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>Qualitative comparison of generated SSS images under different configurations. From left to right: Proposed method, cycle loss min (&#x03BB;<sub>1</sub> &#x003D; 0.1), identity loss min (&#x03BB;<sub>2</sub> &#x003D; 0.1), shadow loss min (&#x03BB;<sub>3</sub> &#x003D; 0.1), without multi-resolution training, and rendered input</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_73530-fig-5.tif"/>
</fig>
<p>The generated images exhibit the features of SSS imagery, as presented in <xref ref-type="fig" rid="fig-6">Fig. 6</xref>. The highlights are caused by the direct reflection of sound waves off objects, and the shadows formed in areas where objects obstruct sound waves are prominently displayed. The overall noise in the images, resulting from various sources of underwater interference, is also effectively represented. These features demonstrate that the model effectively captures the distinctive properties of SSS imagery.</p>
<fig id="fig-6">
<label>Figure 6</label>
<caption>
<title>SSS images generated by the proposed method</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_73530-fig-6.tif"/>
</fig>
<p>To evaluate the generated SSS images against other generative models, we trained existing CycleGAN models&#x2014;Unet128, Unet256, ResNet-06, and ResNet-09&#x2014;and compared their generated outputs with those from the proposed model. When analyzing the outputs of these existing models, we observed that traditional models struggled to preserve object structure, with significant distortions occurring in the background and shadows, resulting in unsuccessful image generation. In some cases, the images lacked typical SSS characteristics, presenting only irregular patterns in place of the original features. In contrast, the model proposed in this study consistently preserved the shapes of the objects, background, and shadows while also effectively capturing key characteristics of SSS images, such as the highlights of object surfaces, shadow details, and noise (<xref ref-type="fig" rid="fig-7">Fig. 7</xref>).</p>
<fig id="fig-7">
<label>Figure 7</label>
<caption>
<title>SSS images generated by different methods</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_73530-fig-7.tif"/>
</fig>
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Computational Efficiency</title>
<p>To evaluate the computational requirements of the proposed model, we compared the number of parameters, multiply-accumulate operations (MACs), and floating-point operations (FLOPs) with those of the baseline CycleGAN. <xref ref-type="table" rid="table-1">Table 1</xref> summarizes the results for an input size of 1 &#x00D7; 256 &#x00D7; 256. The proposed model contains 15.45 M parameters, which is comparable to the 14.12 M parameters of CycleGAN. The total MACs decreased from 59.16 to 17.06 G, and the total FLOPs decreased from 118.32 to 34.11 G, representing reductions of approximately 71.2% in both metrics.</p>
<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>FLOPs and parameters at input size 1 &#x00D7; 256 &#x00D7; 256. Params are in millions (M), MACs and FLOPs are in giga (G)</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th align="center" valign="top">Model</th>
<th align="center" valign="top">Params (M)</th>
<th align="center" valign="top">MACs (G)</th>
<th align="center" valign="top">FLOPs (G)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="middle">Ours-generator</td>
<td align="center" valign="middle">13.38</td>
<td align="center" valign="middle">14.70</td>
<td align="center" valign="middle">29.39</td>
</tr>
<tr>
<td align="center" valign="middle">Ours-discriminator</td>
<td align="center" valign="middle">2.07</td>
<td align="center" valign="middle">2.36</td>
<td align="center" valign="middle">4.72</td>
</tr>
<tr>
<td align="center" valign="middle"><bold>Ours-Total</bold></td>
<td align="center" valign="middle">15.45</td>
<td align="center" valign="middle">17.06</td>
<td align="center" valign="middle">34.11</td>
</tr>
<tr>
<td align="center" valign="middle">CycleGAN-Gen (9 res)</td>
<td align="center" valign="middle">11.36</td>
<td align="center" valign="middle">56.04</td>
<td align="center" valign="middle">112.08</td>
</tr>
<tr>
<td align="center" valign="middle">CycleGAN-Disc (70 &#x00D7; 70)</td>
<td align="center" valign="middle">2.76</td>
<td align="center" valign="middle">3.12</td>
<td align="center" valign="middle">6.23</td>
</tr>
<tr>
<td align="center" valign="middle"><bold>CycleGAN-Total</bold></td>
<td align="center" valign="middle">14.12</td>
<td align="center" valign="middle">59.16</td>
<td align="center" valign="middle">118.32</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>This result is attributed to structural design differences. Conventional CycleGAN applies nine residual blocks on relatively high-resolution feature maps, which considerably increases computational cost despite having a similar number of parameters. Because MACs/FLOPs scale with the spatial dimensions of feature maps and the number of convolutional operations, repeated residual blocks substantially amplify the overall computational load in CycleGAN. In contrast, the proposed model distributes computation across multiple resolutions, executes a substantial portion at lower resolutions, and uses an optimized discriminator that avoids redundant operations. As a result, the proposed framework maintains a similar parameter scale while reducing MACs and FLOPs by more than two-thirds.</p>
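<p>The approximately 71.2% figure follows directly from the totals in Table 1; a quick arithmetic check:</p>

```python
def reduction(baseline, ours):
    # Relative reduction in percent
    return 100.0 * (baseline - ours) / baseline

macs_cut = reduction(59.16, 17.06)    # CycleGAN vs. proposed, total MACs (G)
flops_cut = reduction(118.32, 34.11)  # CycleGAN vs. proposed, total FLOPs (G)
```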
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>Performance Comparison of YOLOv5 Using Generated Images</title>
<p>In this study, we used the YOLOv5 model [<xref ref-type="bibr" rid="ref-30">30</xref>] to evaluate the performance of ATR when trained with SSS images generated by different models, including Unet128, Unet256, ResNet-06, ResNet-09, and the proposed CycleGAN-based method. The experiment was conducted over 200 epochs, using real SSS data split into training and testing sets in a 50:50 ratio. The YOLOv5n model, known for its lightweight structure and fast inference speed, was trained from scratch without pre-trained weights. Following initial training, we augmented the training dataset by adding the 117 shipwreck, 104 aircraft, and 100 drowning victim images generated by each of the models described above. A comparative analysis was then conducted to evaluate the effectiveness of these generated datasets in improving ATR performance.</p>
<p><xref ref-type="table" rid="table-2">Table 2</xref> summarizes the quantitative evaluation results, including object detection performance evaluated with YOLOv5 and image quality assessed using Fr&#x00E9;chet inception distance (FID). The proposed method yielded the highest mAP for shipwreck (0.898) and drowning victim (0.942) detection, demonstrating superior performance in these categories compared with other models. For aircraft detection, ResNet-06 achieved the highest mAP (0.679), while the proposed method obtained a lower mAP of 0.508. To further evaluate image quality, we employed FID as an objective image similarity metric. Since the conventional InceptionV3 model used for FID is trained on natural images and thus unsuitable for sonar imagery evaluation, we replaced it with a ResNet18 model fine-tuned on sonar images. The FID results obtained using this approach are summarized in <xref ref-type="table" rid="table-2">Table 2</xref>. The experimental results show that our method achieved the lowest FID scores for the Ship and Victim classes, while ResNet-06 yielded the lowest FID for the Plane class. Upon further analysis, we observed that the real sonar Plane images used in training contained numerous complex and cluttered signals. These signals likely resulted from structural damage during crashes or from prolonged underwater exposure, which caused corrosion and deformation of the fuselage and wings, producing intricate internal structural patterns. In contrast, the images generated by our method were derived from 3D models without surface damage or corrosion, leading to cleaner representations where the acoustic responses of the fuselage and wings were expressed with reduced noise. This seems to imply that the ResNet-based model, which was trained to emphasize irregular and noisy features, assigned relatively lower FID values to real Plane images compared with our generated counterparts.</p>
<table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>Comparison of object detection performance on real SSS images and FID scores, with training data from different real&#x2013;synthetic groups. Bold and underline are used to highlight the best-performing results for clarity</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th align="center" valign="top" rowspan="2">Group</th>
<th align="center" valign="top" rowspan="2">Target</th>
<th align="center" valign="top" colspan="2">Train data</th>
<th align="center" valign="top" rowspan="2">Test data</th>
<th align="center" valign="top" rowspan="2">mAP@0.5</th>
<th align="center" valign="top" rowspan="2">FID (ResNet18)</th>
</tr>
<tr>
<th align="center" valign="top">Real SSS</th>
<th align="center" valign="top">Generated SSS</th>
</tr>
</thead>
<tbody>
<tr>
<td align="center" valign="middle" rowspan="3">Real SSS</td>
<td align="center" valign="middle">Ship</td>
<td align="center" valign="middle">169</td>
<td align="center" valign="middle">&#x2013;</td>
<td align="center" valign="middle">165</td>
<td align="center" valign="middle">0.795</td>
<td align="center" valign="middle">&#x2013;</td>
</tr>
<tr>
<td align="center" valign="middle">Plane</td>
<td align="center" valign="middle">15</td>
<td align="center" valign="middle">&#x2013;</td>
<td align="center" valign="middle">19</td>
<td align="center" valign="middle">0.129</td>
<td align="center" valign="middle">&#x2013;</td>
</tr>
<tr>
<td align="center" valign="middle">Victim</td>
<td align="center" valign="middle">17</td>
<td align="center" valign="middle">&#x2013;</td>
<td align="center" valign="middle">17</td>
<td align="center" valign="middle">0.592</td>
<td align="center" valign="middle">&#x2013;</td>
</tr>
<tr>
<td align="center" valign="middle" rowspan="3">Unet 128</td>
<td align="center" valign="middle">Ship</td>
<td align="center" valign="middle">169</td>
<td align="center" valign="middle">117</td>
<td align="center" valign="middle">165</td>
<td align="center" valign="middle">0.579</td>
<td align="center" valign="middle">53.3</td>
</tr>
<tr>
<td align="center" valign="middle">Plane</td>
<td align="center" valign="middle">15</td>
<td align="center" valign="middle">104</td>
<td align="center" valign="middle">19</td>
<td align="center" valign="middle">0.110</td>
<td align="center" valign="middle">116.0</td>
</tr>
<tr>
<td align="center" valign="middle">Victim</td>
<td align="center" valign="middle">17</td>
<td align="center" valign="middle">100</td>
<td align="center" valign="middle">17</td>
<td align="center" valign="middle">0.732</td>
<td align="center" valign="middle">186.1</td>
</tr>
<tr>
<td align="center" valign="middle" rowspan="3">Unet 256</td>
<td align="center" valign="middle">Ship</td>
<td align="center" valign="middle">169</td>
<td align="center" valign="middle">117</td>
<td align="center" valign="middle">165</td>
<td align="center" valign="middle">0.793</td>
<td align="center" valign="middle">56.3</td>
</tr>
<tr>
<td align="center" valign="middle">Plane</td>
<td align="center" valign="middle">15</td>
<td align="center" valign="middle">104</td>
<td align="center" valign="middle">19</td>
<td align="center" valign="middle">0.522</td>
<td align="center" valign="middle">96.9</td>
</tr>
<tr>
<td align="center" valign="middle">Victim</td>
<td align="center" valign="middle">17</td>
<td align="center" valign="middle">100</td>
<td align="center" valign="middle">17</td>
<td align="center" valign="middle">0.910</td>
<td align="center" valign="middle">234.5</td>
</tr>
<tr>
<td align="center" valign="middle" rowspan="3">Resnet-06</td>
<td align="center" valign="middle">Ship</td>
<td align="center" valign="middle">169</td>
<td align="center" valign="middle">117</td>
<td align="center" valign="middle">165</td>
<td align="center" valign="middle">0.870</td>
<td align="center" valign="middle">108.3</td>
</tr>
<tr>
<td align="center" valign="middle">Plane</td>
<td align="center" valign="middle">15</td>
<td align="center" valign="middle">104</td>
<td align="center" valign="middle">19</td>
<td align="center" valign="middle">0.679</td>
<td align="center" valign="middle">77.4</td>
</tr>
<tr>
<td align="center" valign="middle">Victim</td>
<td align="center" valign="middle">17</td>
<td align="center" valign="middle">100</td>
<td align="center" valign="middle">17</td>
<td align="center" valign="middle">0.989</td>
<td align="center" valign="middle">215.9</td>
</tr>
<tr>
<td align="center" valign="middle" rowspan="3">Resnet-09</td>
<td align="center" valign="middle">Ship</td>
<td align="center" valign="middle">169</td>
<td align="center" valign="middle">117</td>
<td align="center" valign="middle">165</td>
<td align="center" valign="middle">0.883</td>
<td align="center" valign="middle">108.2</td>
</tr>
<tr>
<td align="center" valign="middle">Plane</td>
<td align="center" valign="middle">15</td>
<td align="center" valign="middle">104</td>
<td align="center" valign="middle">19</td>
<td align="center" valign="middle">0.665</td>
<td align="center" valign="middle">77.9</td>
</tr>
<tr>
<td align="center" valign="middle">Victim</td>
<td align="center" valign="middle">17</td>
<td align="center" valign="middle">100</td>
<td align="center" valign="middle">17</td>
<td align="center" valign="middle">0.897</td>
<td align="center" valign="middle">212.1</td>
</tr>
<tr>
<td align="center" valign="middle" rowspan="3">Ours</td>
<td align="center" valign="middle">Ship</td>
<td align="center" valign="middle">169</td>
<td align="center" valign="middle">117</td>
<td align="center" valign="middle">165</td>
<td align="center" valign="middle">0.898</td>
<td align="center" valign="middle">42.1</td>
</tr>
<tr>
<td align="center" valign="middle">Plane</td>
<td align="center" valign="middle">15</td>
<td align="center" valign="middle">104</td>
<td align="center" valign="middle">19</td>
<td align="center" valign="middle">0.508</td>
<td align="center" valign="middle">95.9</td>
</tr>
<tr>
<td align="center" valign="middle">Victim</td>
<td align="center" valign="middle">17</td>
<td align="center" valign="middle">100</td>
<td align="center" valign="middle">17</td>
<td align="center" valign="middle">0.942</td>
<td align="center" valign="middle">167.1</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s3_4">
<label>3.4</label>
<title>Role of the Shadow Area Loss Function</title>
<p>To evaluate the impact of the shadow area loss function on training, the proposed model was trained with the weight of the shadow area loss function set to 0, and the generated results were compared.</p>
<p>The experimental results, shown in <xref ref-type="fig" rid="fig-8">Fig. 8</xref>, reveal that abnormal shadow patterns appeared in the background in the model without the shadow area loss function. Additionally, the shapes of the object shadows were unstable. These findings indicate that the shadow area loss function plays a crucial role in preserving the shadow shapes of objects in the input images and ensuring that the overall shadow characteristics of SSS images are represented consistently and accurately.</p>
<fig id="fig-8">
<label>Figure 8</label>
<caption>
<title>(<bold>a</bold>) The result of training without the shadow loss function, (<bold>b</bold>) The result of training with the shadow loss function</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_73530-fig-8.tif"/>
</fig>
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Conclusions</title>
<p>This study proposed a CycleGAN-based model for augmenting SSS images, introducing a multi-resolution learning structure, a shadow extractor, and a shadow region loss function. These innovations address limitations in conventional CycleGAN models by ensuring consistent shadow preservation and stable image generation. In particular, the shadow region loss function compares highlight and shadow patterns between the original and generated images, preserving the critical structural information unique to SSS images while realistically representing underwater noise.</p>
<p>Performance evaluation using the YOLOv5 model demonstrated the effectiveness of the proposed model: relative to training on real data alone, mAP@0.5 improved by 10.3 percentage points for shipwreck detection, 37.9 points for aircraft detection, and 35.0 points for drowning victim detection. These results highlight the model&#x2019;s capability to compensate for the scarcity of SSS data, significantly enhancing AI-driven marine exploration and seabed monitoring through the generation of diverse SSS images under various conditions.</p>
<p>However, the study faced limitations in generalization due to the constrained size and variety of the dataset, with aircraft images being particularly underrepresented (34 samples compared to 334 shipwreck images). Future research should prioritize securing larger, more diverse datasets and developing objective, quantitative evaluation metrics. In particular, incorporating data from varied seabed textures, sonar angles, and occlusion scenarios would be valuable for improving the model&#x2019;s robustness and real-world applicability. Furthermore, integrating the model with various marine exploration equipment will be essential to validate its robustness and applicability, thereby advancing its potential for underwater exploration. Building on this, it is also important to consider how the proposed approach could be integrated into real-world operational pipelines.</p>
<p>The synthetic images themselves are not directly integrated into real sonar systems. Instead, they are used to train automatic target recognition (ATR) systems that operate on board sonar platforms. The trained ATR models can subsequently be deployed in autonomous sonar systems, where they detect critical targets for further investigation. They can also provide essential information for applications such as mission planning with Remotely Operated Vehicles (ROVs), autonomous object recognition, and military operations. Nevertheless, potential challenges such as domain drift in operational environments should be carefully addressed to ensure reliable performance when transferring from training datasets to real-world scenarios.</p>
</sec>
</body>
<back>
<ack>
<p>The authors would like to express their sincere gratitude to the members of the Korea Institute of Ocean Science and Technology (KIOST) and the Pattern Recognition &#x0026; Machine Learning Lab at Pukyong National University for their valuable technical advice on sonar data acquisition and preprocessing. The authors also acknowledge the open-access contributors of the SeabedObjects-KLSG dataset for making their side-scan sonar imagery publicly available, which served as an essential foundation for this study. Finally, the authors express their appreciation to all individuals who assisted in the preparation and verification of 3D models and provided valuable insights into sonar image interpretation and dataset validation.</p>
</ack>
<sec>
<title>Funding Statement</title>
<p>This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. RS-2024-00334159), and the Korea Institute of Ocean Science and Technology (KIOST) project entitled &#x201C;Development of Maritime Domain Awareness Technology for Sea Power Enhancement&#x201D; (PEA0332).</p>
</sec>
<sec>
<title>Author Contributions</title>
<p>The authors confirm contribution to the paper as follows: study conception and design: Byeongjun Kim, Won-Du Chang; data collection: Byeongjun Kim; analysis and interpretation of results: Byeongjun Kim, Seung-Hun Lee, Won-Du Chang; draft manuscript preparation: Byeongjun Kim, Seung-Hun Lee; supervision: Won-Du Chang. All authors reviewed the results and approved the final version of the manuscript.</p>
</sec>
<sec sec-type="data-availability">
<title>Availability of Data and Materials</title>
<p>The SeabedObjects-KLSG dataset used in this study is publicly available and can be accessed from the original open-access publication by Huo et al. (2020). This dataset contains real-world side-scan sonar images collected over a 10-year period for underwater object detection research. The synthetic SSS images generated in this study were produced using 3D modeling and a CycleGAN-based style transformation framework. These images were created solely for experimental purposes and are available from the corresponding author upon reasonable request.</p>
</sec>
<sec>
<title>Ethics Approval</title>
<p>Not applicable.</p>
</sec>
<sec sec-type="COI-statement">
<title>Conflicts of Interest</title>
<p>The authors declare no conflicts of interest to report regarding the present study.</p>
</sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Lunkenheimer</surname> <given-names>P</given-names></string-name>, <string-name><surname>Emmert</surname> <given-names>S</given-names></string-name>, <string-name><surname>Gulich</surname> <given-names>R</given-names></string-name>, <string-name><surname>K&#x00F6;hler</surname> <given-names>M</given-names></string-name>, <string-name><surname>Wolf</surname> <given-names>M</given-names></string-name>, <string-name><surname>Schwab</surname> <given-names>M</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Electromagnetic-radiation absorption by water</article-title>. <source>Phys Rev E</source>. <year>2017</year>;<volume>96</volume>(<issue>6</issue>):<fpage>062607</fpage>. doi:<pub-id pub-id-type="doi">10.1103/physreve.96.062607</pub-id>; <pub-id pub-id-type="pmid">29347319</pub-id></mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Yeu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Yee</surname> <given-names>JJ</given-names></string-name>, <string-name><surname>Yun</surname> <given-names>HS</given-names></string-name>, <string-name><surname>Kim</surname> <given-names>KB</given-names></string-name></person-group>. <article-title>Evaluation of the accuracy of bathymetry on the nearshore coastlines of western Korea from satellite altimetry, multi-beam, and airborne bathymetric LiDAR</article-title>. <source>Sensors</source>. <year>2018</year>;<volume>18</volume>(<issue>9</issue>):<fpage>2926</fpage>. doi:<pub-id pub-id-type="doi">10.3390/s18092926</pub-id>; <pub-id pub-id-type="pmid">30177653</pub-id></mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Murad</surname> <given-names>M</given-names></string-name>, <string-name><surname>Sheikh</surname> <given-names>AA</given-names></string-name>, <string-name><surname>Manzoor</surname> <given-names>MA</given-names></string-name>, <string-name><surname>Felemban</surname> <given-names>E</given-names></string-name>, <string-name><surname>Qaisar</surname> <given-names>S</given-names></string-name></person-group>. <article-title>A survey on current underwater acoustic sensor network applications</article-title>. <source>Int J Comput Theory Eng</source>. <year>2015</year>;<volume>7</volume>(<issue>1</issue>):<fpage>51</fpage>&#x2013;<lpage>6</lpage>. doi:<pub-id pub-id-type="doi">10.7763/ijcte.2015.v7.929</pub-id>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>P</given-names></string-name>, <string-name><surname>Cao</surname> <given-names>D</given-names></string-name></person-group>. <article-title>Synthetic aperture image enhancement with near-coinciding nonuniform sampling case</article-title>. <source>Comput Electr Eng</source>. <year>2024</year>;<volume>120</volume>:<fpage>109818</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.compeleceng.2024.109818</pub-id>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Bai</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Bai</surname> <given-names>Q</given-names></string-name></person-group>. <source>Subsea engineering handbook</source>. <publisher-loc>Oxford, UK</publisher-loc>: <publisher-name>Gulf Professional Publishing</publisher-name>; <year>2018</year>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Olejnik</surname> <given-names>C</given-names></string-name></person-group>. <article-title>Visual identification of underwater objects using a ROV-type vehicle: graf Zeppelin wreck investigation</article-title>. <source>Pol Marit Res</source>. <year>2008</year>;<volume>15</volume>(<issue>1</issue>):<fpage>72</fpage>&#x2013;<lpage>9</lpage>. doi:<pub-id pub-id-type="doi">10.2478/v10012-007-0055-4</pub-id>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Neves</surname> <given-names>G</given-names></string-name>, <string-name><surname>Ruiz</surname> <given-names>M</given-names></string-name>, <string-name><surname>Fontinele</surname> <given-names>J</given-names></string-name>, <string-name><surname>Oliveira</surname> <given-names>L</given-names></string-name></person-group>. <article-title>Rotated object detection with forward-looking sonar in underwater applications</article-title>. <source>Expert Syst Appl</source>. <year>2020</year>;<volume>140</volume>(<issue>1</issue>):<fpage>112870</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.eswa.2019.112870</pub-id>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Dai</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Liang</surname> <given-names>H</given-names></string-name>, <string-name><surname>Duan</surname> <given-names>T</given-names></string-name></person-group>. <article-title>Small-sample sonar image classification based on deep learning</article-title>. <source>J Mar Sci Eng</source>. <year>2022</year>;<volume>10</volume>(<issue>12</issue>):<fpage>1820</fpage>. doi:<pub-id pub-id-type="doi">10.3390/jmse10121820</pub-id>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Karjalainen</surname> <given-names>AI</given-names></string-name>, <string-name><surname>Mitchell</surname> <given-names>R</given-names></string-name>, <string-name><surname>Vazquez</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Training and validation of automatic target recognition systems using generative adversarial networks</article-title>. In: <conf-name>Proceedings of the 2019 Sensor Signal Processing for Defence Conference (SSPD); 2019 May 9&#x2013;10</conf-name>; <publisher-loc>New York, NY, USA</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>5</lpage>. doi:<pub-id pub-id-type="doi">10.1109/sspd.2019.8751666</pub-id>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Reed</surname> <given-names>A</given-names></string-name>, <string-name><surname>Gerg</surname> <given-names>ID</given-names></string-name>, <string-name><surname>McKay</surname> <given-names>JD</given-names></string-name>, <string-name><surname>Brown</surname> <given-names>DC</given-names></string-name>, <string-name><surname>Williams</surname> <given-names>DP</given-names></string-name>, <string-name><surname>Jayasuriya</surname> <given-names>S</given-names></string-name></person-group>. <article-title>Coupling rendering and generative adversarial networks for artificial SAS image generation</article-title>. In: <conf-name>Proceedings of the OCEANS, 2019 MTS/IEEE Seattle; 2019 Oct 27&#x2013;31</conf-name>; <publisher-loc>Seattle, WA, USA</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>10</lpage>. doi:<pub-id pub-id-type="doi">10.23919/oceans40490.2019.8962733</pub-id>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Jiang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Ku</surname> <given-names>B</given-names></string-name>, <string-name><surname>Kim</surname> <given-names>W</given-names></string-name>, <string-name><surname>Ko</surname> <given-names>H</given-names></string-name></person-group>. <article-title>Side-scan sonar image synthesis based on generative adversarial network for images in multiple frequencies</article-title>. <source>IEEE Geosci Remote Sens Lett</source>. <year>2020</year>;<volume>18</volume>(<issue>9</issue>):<fpage>1505</fpage>&#x2013;<lpage>9</lpage>. doi:<pub-id pub-id-type="doi">10.1109/LGRS.2020.3005679</pub-id>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Tang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>L</given-names></string-name>, <string-name><surname>Bian</surname> <given-names>S</given-names></string-name>, <string-name><surname>Jin</surname> <given-names>S</given-names></string-name>, <string-name><surname>Dong</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Li</surname> <given-names>H</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>SSS underwater target image samples augmentation based on the cross-domain mapping relationship of images of the same physical object</article-title>. <source>IEEE J Sel Top Appl Earth Obs Remote Sens</source>. <year>2023</year>;<volume>16</volume>:<fpage>6393</fpage>&#x2013;<lpage>410</lpage>. doi:<pub-id pub-id-type="doi">10.1109/JSTARS.2023.3292327</pub-id>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Grz&#x0105;dziel</surname> <given-names>A</given-names></string-name></person-group>. <article-title>The impact of side-scan sonar resolution and acoustic shadow phenomenon on the quality of sonar imagery and data interpretation capabilities</article-title>. <source>Remote Sens</source>. <year>2023</year>;<volume>15</volume>(<issue>23</issue>):<fpage>5599</fpage>. doi:<pub-id pub-id-type="doi">10.3390/rs15235599</pub-id>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zou</surname> <given-names>L</given-names></string-name>, <string-name><surname>Liang</surname> <given-names>B</given-names></string-name>, <string-name><surname>Cheng</surname> <given-names>X</given-names></string-name>, <string-name><surname>Li</surname> <given-names>S</given-names></string-name>, <string-name><surname>Lin</surname> <given-names>C</given-names></string-name></person-group>. <article-title>Sonar image target detection for underwater communication system based on deep neural network</article-title>. <source>Comput Model Eng Sci</source>. <year>2023</year>;<volume>137</volume>(<issue>3</issue>):<fpage>2641</fpage>&#x2013;<lpage>59</lpage>. doi:<pub-id pub-id-type="doi">10.32604/cmes.2023.028037</pub-id>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Huo</surname> <given-names>G</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Li</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Underwater object classification in sidescan sonar images using deep transfer learning and semisynthetic training data</article-title>. <source>IEEE Access</source>. <year>2020</year>;<volume>8</volume>:<fpage>47407</fpage>&#x2013;<lpage>18</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ACCESS.2020.2978880</pub-id>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Pessanha Santos</surname> <given-names>N</given-names></string-name>, <string-name><surname>Moura</surname> <given-names>R</given-names></string-name>, <string-name><surname>Sampaio Torgal</surname> <given-names>G</given-names></string-name>, <string-name><surname>Lobo</surname> <given-names>V</given-names></string-name>, <string-name><surname>de Castro Neto</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Side-scan sonar imaging data of underwater vehicles for mine detection</article-title>. <source>Data Brief</source>. <year>2024</year>;<volume>53</volume>:<fpage>110132</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.dib.2024.110132</pub-id>; <pub-id pub-id-type="pmid">38384311</pub-id></mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>S</given-names></string-name>, <string-name><surname>Li</surname> <given-names>T</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>Side-scan sonar mine-like target detection considering acoustic illumination and shadow characteristics</article-title>. <source>Ocean Eng</source>. <year>2025</year>;<volume>336</volume>:<fpage>121711</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.oceaneng.2025.121711</pub-id>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Bouwman</surname> <given-names>F</given-names></string-name>, <string-name><surname>Ecclestone</surname> <given-names>DW</given-names></string-name>, <string-name><surname>Gabri&#x00EB;lse</surname> <given-names>AL</given-names></string-name>, <string-name><surname>van Oers</surname> <given-names>AM</given-names></string-name></person-group>. <article-title>Synthetic side-scan sonar data for detecting mine-like contacts</article-title>. <source>Artif Intell Secur Def Appl II</source>. <year>2024</year>;<volume>13206</volume>:<fpage>437</fpage>&#x2013;<lpage>41</lpage>. doi:<pub-id pub-id-type="doi">10.1117/12.3031150</pub-id>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Ge</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Ruan</surname> <given-names>F</given-names></string-name>, <string-name><surname>Qiao</surname> <given-names>B</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Zuo</surname> <given-names>X</given-names></string-name>, <string-name><surname>Dang</surname> <given-names>L</given-names></string-name></person-group>. <article-title>Side-scan sonar image classification based on style transfer and pre-trained convolutional neural networks</article-title>. <source>Electronics</source>. <year>2021</year>;<volume>10</volume>(<issue>15</issue>):<fpage>1823</fpage>. doi:<pub-id pub-id-type="doi">10.3390/electronics10151823</pub-id>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Peng</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Li</surname> <given-names>H</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>W</given-names></string-name>, <string-name><surname>Zhu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>L</given-names></string-name>, <string-name><surname>Zhai</surname> <given-names>G</given-names></string-name></person-group>. <article-title>Underwater sonar image classification with image disentanglement reconstruction and zero-shot learning</article-title>. <source>Remote Sens</source>. <year>2025</year>;<volume>17</volume>(<issue>1</issue>):<fpage>134</fpage>. doi:<pub-id pub-id-type="doi">10.3390/rs17010134</pub-id>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Sugiyama</surname> <given-names>S</given-names></string-name>, <string-name><surname>Minowa</surname> <given-names>M</given-names></string-name>, <string-name><surname>Schaefer</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Underwater ice terrace observed at the front of glaciar grey, a freshwater calving glacier in Patagonia</article-title>. <source>Geophys Res Lett</source>. <year>2019</year>;<volume>46</volume>(<issue>5</issue>):<fpage>2602</fpage>&#x2013;<lpage>9</lpage>. doi:<pub-id pub-id-type="doi">10.1029/2018GL081441</pub-id>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Du</surname> <given-names>X</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Song</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Dong</surname> <given-names>L</given-names></string-name>, <string-name><surname>Tao</surname> <given-names>C</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>D</given-names></string-name></person-group>. <article-title>Recognition of underwater engineering structures using CNN models and data expansion on side-scan sonar images</article-title>. <source>J Mar Sci Eng</source>. <year>2025</year>;<volume>13</volume>(<issue>3</issue>):<fpage>424</fpage>. doi:<pub-id pub-id-type="doi">10.3390/jmse13030424</pub-id>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Aubard</surname> <given-names>M</given-names></string-name>, <string-name><surname>Antal</surname> <given-names>L</given-names></string-name>, <string-name><surname>Madureira</surname> <given-names>A</given-names></string-name>, <string-name><surname>&#x00C1;brah&#x00E1;m</surname> <given-names>E</given-names></string-name></person-group>. <article-title>Knowledge distillation in YOLOX-ViT for side-scan sonar object detection</article-title>. <comment>arXiv:2403.09313. 2024</comment>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>&#x00C1;lvarez-Tu&#x00F1;&#x00F3;n</surname> <given-names>O</given-names></string-name>, <string-name><surname>Marnet</surname> <given-names>LR</given-names></string-name>, <string-name><surname>Aubard</surname> <given-names>M</given-names></string-name>, <string-name><surname>Antal</surname> <given-names>L</given-names></string-name>, <string-name><surname>Costa</surname> <given-names>M</given-names></string-name>, <string-name><surname>Brodskiy</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>SubPipe: a submarine pipeline inspection dataset for segmentation and visual-inertial localization</article-title>. In: <conf-name>Proceedings of the OCEANS 2024; 2024 Apr 15&#x2013;18</conf-name>; <publisher-loc>Singapore</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>7</lpage>. doi:<pub-id pub-id-type="doi">10.1109/OCEANS51537.2024.10682150</pub-id>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Burguera</surname> <given-names>A</given-names></string-name>, <string-name><surname>Bonin-Font</surname> <given-names>F</given-names></string-name></person-group>. <article-title>On-line multi-class segmentation of side-scan sonar imagery using an autonomous underwater vehicle</article-title>. <source>J Mar Sci Eng</source>. <year>2020</year>;<volume>8</volume>(<issue>8</issue>):<fpage>557</fpage>. doi:<pub-id pub-id-type="doi">10.3390/jmse8080557</pub-id>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Zhu</surname> <given-names>JY</given-names></string-name>, <string-name><surname>Park</surname> <given-names>T</given-names></string-name>, <string-name><surname>Isola</surname> <given-names>P</given-names></string-name>, <string-name><surname>Efros</surname> <given-names>AA</given-names></string-name></person-group>. <article-title>Unpaired image-to-image translation using cycle-consistent adversarial networks</article-title>. In: <conf-name>Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV); 2017 Oct 22&#x2013;29</conf-name>; <publisher-loc>Venice, Italy</publisher-loc>. p. <fpage>2242</fpage>&#x2013;<lpage>51</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ICCV.2017.244</pub-id>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Sauer</surname> <given-names>A</given-names></string-name>, <string-name><surname>Chitta</surname> <given-names>K</given-names></string-name>, <string-name><surname>M&#x00FC;ller</surname> <given-names>J</given-names></string-name>, <string-name><surname>Geiger</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Projected GANs converge faster</article-title>. <source>Adv Neural Inf Process</source>. <year>2021</year>;<volume>34</volume>:<fpage>17480</fpage>&#x2013;<lpage>92</lpage>. doi:<pub-id pub-id-type="doi">10.5555/3540261.3541598</pub-id>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Isola</surname> <given-names>P</given-names></string-name>, <string-name><surname>Zhu</surname> <given-names>JY</given-names></string-name>, <string-name><surname>Zhou</surname> <given-names>T</given-names></string-name>, <string-name><surname>Efros</surname> <given-names>AA</given-names></string-name></person-group>. <article-title>Image-to-image translation with conditional adversarial networks</article-title>. In: <conf-name>Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2017 Jul 21&#x2013;26</conf-name>; <publisher-loc>Piscataway, NJ, USA</publisher-loc>. p. <fpage>5967</fpage>&#x2013;<lpage>76</lpage>. doi:<pub-id pub-id-type="doi">10.1109/CVPR.2017.632</pub-id>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Mao</surname> <given-names>X</given-names></string-name>, <string-name><surname>Li</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Xie</surname> <given-names>H</given-names></string-name>, <string-name><surname>Lau</surname> <given-names>RYK</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Smolley</surname> <given-names>SP</given-names></string-name></person-group>. <article-title>Least squares generative adversarial networks</article-title>. In: <conf-name>Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV); 2017 Oct 22&#x2013;29</conf-name>; <publisher-loc>Piscataway, NJ, USA</publisher-loc>. p. <fpage>2813</fpage>&#x2013;<lpage>21</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ICCV.2017.304</pub-id>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Redmon</surname> <given-names>J</given-names></string-name>, <string-name><surname>Divvala</surname> <given-names>S</given-names></string-name>, <string-name><surname>Girshick</surname> <given-names>R</given-names></string-name>, <string-name><surname>Farhadi</surname> <given-names>A</given-names></string-name></person-group>. <article-title>You only look once: unified, real-time object detection</article-title>. In: <conf-name>Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016 Jun 27&#x2013;30</conf-name>; <publisher-loc>Las Vegas, NV, USA</publisher-loc>. p. <fpage>779</fpage>&#x2013;<lpage>88</lpage>. doi:<pub-id pub-id-type="doi">10.1109/CVPR.2016.91</pub-id>.</mixed-citation></ref>
</ref-list>
</back></article>