<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="review-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">67915</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2025.067915</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Review</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Deep Multi-Scale and Attention-Based Architectures for Semantic Segmentation in Biomedical Imaging</article-title>
<alt-title alt-title-type="left-running-head">Deep Multi-Scale and Attention-Based Architectures for Semantic Segmentation in Biomedical Imaging</alt-title>
<alt-title alt-title-type="right-running-head">Deep Multi-Scale and Attention-Based Architectures for Semantic Segmentation in Biomedical Imaging</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Harouni</surname><given-names>Majid</given-names></name><xref ref-type="aff" rid="aff-1">1</xref><email>majid.harouni@nih.gov</email></contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Goyal</surname><given-names>Vishakha</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western"><surname>Feldman</surname><given-names>Gabrielle</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-4" contrib-type="author">
<name name-style="western"><surname>Michael</surname><given-names>Sam</given-names></name><xref ref-type="aff" rid="aff-2">2</xref></contrib>
<contrib id="author-5" contrib-type="author">
<name name-style="western"><surname>Voss</surname><given-names>Ty C.</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<aff id="aff-1"><label>1</label><institution>Division of Preclinical Innovation, National Center for Advancing Translational Sciences (NCATS), National Institutes of Health, 9800 Medical Center Drive, Building B</institution>, <addr-line>Rockville, MD 20850</addr-line>, <country>USA</country></aff>
<aff id="aff-2"><label>2</label><institution>Data, Automation, and Predictive Sciences (DAPS), Research Technologies, GSK</institution>, <addr-line>2929 Walnut Street, Ste. 1700, Philadelphia, PA 19104</addr-line>, <country>USA</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Majid Harouni. Email: <email>majid.harouni@nih.gov</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2025</year>
</pub-date>
<pub-date date-type="pub" publication-format="electronic">
<day>29</day><month>08</month><year>2025</year>
</pub-date>
<volume>85</volume>
<issue>1</issue>
<fpage>331</fpage>
<lpage>366</lpage>
<history>
<date date-type="received">
<day>16</day>
<month>5</month>
<year>2025</year>
</date>
<date date-type="accepted">
<day>16</day>
<month>7</month>
<year>2025</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2025 The Authors.</copyright-statement>
<copyright-year>2025</copyright-year>
<copyright-holder>Published by Tech Science Press.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_67915.pdf"></self-uri>
<abstract>
<p>Semantic segmentation plays a foundational role in biomedical image analysis, providing precise information about cellular, tissue, and organ structures in both biological and medical imaging modalities. Traditional approaches often fail in the face of challenges such as low contrast, morphological variability, and densely packed structures. Recent advancements in deep learning have transformed segmentation capabilities through the integration of fine-scale detail preservation, coarse-scale contextual modeling, and multi-scale feature fusion. This work provides a comprehensive analysis of state-of-the-art deep learning models, including U-Net variants, attention-based frameworks, and Transformer-integrated networks, highlighting innovations that improve accuracy, generalizability, and computational efficiency. Key architectural components such as convolution operations, shallow and deep blocks, skip connections, and hybrid encoders are examined for their roles in enhancing spatial representation and semantic consistency. We further discuss the importance of hierarchical and instance-aware segmentation and annotation in interpreting complex biological scenes and multiplexed medical images. By bridging methodological developments with diverse application domains, this paper outlines current trends and future directions for semantic segmentation, emphasizing its critical role in facilitating annotation, diagnosis, and discovery in biomedical research.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Biomedical</kwd>
<kwd>semantic segmentation</kwd>
<kwd>multi-scale feature fusion</kwd>
<kwd>fine- and coarse-scale features</kwd>
<kwd>convolution operations</kwd>
<kwd>shallow and deep blocks</kwd>
<kwd>skip connections</kwd>
</kwd-group>
<funding-group>
<award-group id="awg1">
<funding-source>National Institutes of Health (NIH)</funding-source>
</award-group>
<award-group id="awg2">
<funding-source>NCATS Intramural Fund</funding-source>
</award-group>
</funding-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>Automated cell segmentation is critical for biomedical research, yet challenges such as low contrast, morphological variability, and dense cell clusters hinder traditional approaches. Low cell-to-cell contrast is frequently encountered when analyzing images of densely packed cells with ambiguous boundaries, since adjacent cells often have similar intensities, textures, and features. Deep learning-based segmentation models have advanced segmentation accuracy beyond that achievable with traditional methods by improving feature representation at fine- and coarse-scale level and/or leveraging the information in each to gain multi-scale features.</p>
<p>1) Fine-scale feature representation: To preserve fine-scale information and enhance structural detail in a U-Net architecture, reference [<xref ref-type="bibr" rid="ref-1">1</xref>] uses skip connections to relay feature details from the contracting path to the expansive path, thereby improving the precision of target localization in the modified U-Net architecture. The Center Surround Difference (CSD) algorithm is incorporated into these skip connections: a CSD feature map is generated by applying the CSD algorithm to the encoder layer feature maps. Complex details of cellular structures are captured in [<xref ref-type="bibr" rid="ref-2">2</xref>] by introducing a trainable deep-learning layer, the MaxSigLayer. This layer&#x2019;s dual-window mechanism incorporates spatial and learnable weight components, thus improving contrast and boundary delineation. The Fine-scale Corrective (FCL)-Net model [<xref ref-type="bibr" rid="ref-3">3</xref>] has a Top-down Attentional Guiding (TAG) module, which, when combined with a Pixel-level Weighting module, guides fine-scale feature learning by applying coarse-scale semantic cues.</p>
<p>2) Coarse-scale feature representation: Coarse feature maps capture contextual details, providing a high-level understanding of the scene by emphasizing the category and position of key objects. Typically, an initial coarse-level model, like U-Net, identifies the region of interest (ROI) by capturing contextual information. The extracted ROI is then cropped and processed by a second model for segmentation refinement. These feature maps guide finer feature representations, improving spatial awareness and semantic consistency in deep learning models [<xref ref-type="bibr" rid="ref-4">4</xref>,<xref ref-type="bibr" rid="ref-5">5</xref>]. In [<xref ref-type="bibr" rid="ref-6">6</xref>], a coarse-level model is proposed to segment dendrites from axons and somas, improving the detection of dendritic shafts, spine necks, and spine heads through contextual differentiation. While a traditional segmentation process combines a generating-shrinking neural network with a spatiotemporal parametric modeling method based on functional basis decomposition [<xref ref-type="bibr" rid="ref-7">7</xref>], this multiscale approach utilizes a coarse-scale model from its previous fine-scale step to guide and constrain boundary detection at each stage, ensuring improved segmentation accuracy and structural consistency.</p>
<p>3) Multi-scale feature representation: Fusing coarse-to-fine feature maps can effectively overcome the challenges resulting from low-resolution image data and large variations in the sizes, shapes, and locations of cancer lesions. This hierarchical fusion-based approach enhances feature representation, enabling better detection and segmentation of complex lesion structures [<xref ref-type="bibr" rid="ref-8">8</xref>&#x2013;<xref ref-type="bibr" rid="ref-10">10</xref>]. Different imaging modalities can be used to detect and diagnose cancer lesions, with key selection criteria including cost, sensitivity, radiation exposure, and accessibility. In breast cancer screenings, ultrasound imaging stands out as one of the most cost-effective and easily accessible tools for early cancer detection, offering high sensitivity without exposing patients to radiation [<xref ref-type="bibr" rid="ref-8">8</xref>]. However, poor image quality in ultrasound imaging can lead to blurred boundaries, making it difficult to determine the exact location and size of lesions, and consequently, to assess lesion malignancy [<xref ref-type="bibr" rid="ref-11">11</xref>,<xref ref-type="bibr" rid="ref-12">12</xref>]. Regardless of imaging modalities and organ structures, a fusion-based U-Net architecture can serve as the backbone for mapping coarse-to-fine features for multi-scale feature representation. The architecture broadly addresses four key concerns [<xref ref-type="bibr" rid="ref-13">13</xref>,<xref ref-type="bibr" rid="ref-14">14</xref>]: (1) the convolutional operation, which captures and refines spatial features, (2) the shallow block, which may be used as an early processing stage, (3) the deep block, which enhances feature extraction and semantic understanding, and (4) the skip connection, which preserves fine-grained details by transferring information from the encoder to the decoder, ensuring better segmentation performance.</p>
<p>The fundamental component of convolutional neural networks (CNNs) is the convolutional operation, a mathematical procedure that extracts features from image input using a matrix filter. During this process, filter values are multiplied element-by-element with corresponding input values at each pixel position, followed by a local summation operation that generates feature maps from the image data. These feature maps simplify the input data by highlighting specific features, such as edges, patterns, and textures, which are essential for downstream tasks in semantic segmentation. A limitation of the U-Net architecture is its inability to effectively capture long-range and global semantic information, especially in low-contrast scenarios between the organ and the surrounding environment, due to the inherently local nature of its operations [<xref ref-type="bibr" rid="ref-15">15</xref>]. Rayhan et al. [<xref ref-type="bibr" rid="ref-16">16</xref>] employed attention-guided residual convolutional operations, allowing the model to generate relevant feature maps while maintaining performance even with a considerable increase in network depth. Roy and Ameer [<xref ref-type="bibr" rid="ref-17">17</xref>] introduced the use of Atrous convolutions, also known as dilated convolutions, which apply a standard convolution with an expanded receptive field while preserving feature-map resolution. Fan et al. [<xref ref-type="bibr" rid="ref-18">18</xref>] implemented the Convolution and Self-Attention Paralleling Network (CSAP-UNet), which utilizes an encoder-decoder architecture integrated with two modules, i.e., boundary enhancement and attention fusion. Pavani et al. [<xref ref-type="bibr" rid="ref-19">19</xref>] replaced the conventional U-Net encoder with multiscale feature extraction and deep aggregation pyramid pooling modules to capture multiscale features by applying convolutional operations with kernels of varying sizes for fluid detection in Optical Coherence Tomography (OCT) images.</p>
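<p>The convolutional operation described above can be sketched in a few lines: a filter slides over the image, its values are multiplied element-wise with the underlying pixels, and the products are summed to yield one value of the output feature map. This is an illustrative pure-Python sketch (real models use optimized library kernels, and deep learning frameworks typically implement cross-correlation, as here).</p>

```python
def conv2d(image, kernel):
    """Valid (no-padding) 2D convolution of a grayscale image with a kernel."""
    ih, iw = len(image), len(image[0])
    kh, kw = len(kernel), len(kernel[0])
    out = []
    for r in range(ih - kh + 1):
        row = []
        for c in range(iw - kw + 1):
            acc = 0.0
            # element-wise multiply the filter with the local window, then sum
            for i in range(kh):
                for j in range(kw):
                    acc += image[r + i][c + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out

# A vertical-edge (Sobel-like) filter highlights intensity transitions,
# the kind of low-level feature an early CNN layer can learn.
edge_kernel = [[1, 0, -1],
               [2, 0, -2],
               [1, 0, -1]]
print(conv2d([[5, 5, 5, 5]] * 4, edge_kernel))  # uniform region: all zeros
```

<p>Applying the same filter to an image containing a vertical step produces a strong response at the edge, which is exactly the behavior the feature maps in the text exploit.</p>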
<p>In the shallow block, low-level semantic information is extracted while sufficient object details are retained for accurate localization [<xref ref-type="bibr" rid="ref-20">20</xref>]. However, the details captured by the shallow block may be overly fine-grained at each spatial location and insufficient to summarize the overall feature context [<xref ref-type="bibr" rid="ref-21">21</xref>]. In [<xref ref-type="bibr" rid="ref-22">22</xref>], the residuals produced by the shallow blocks guide the deeper blocks, allowing them to operate with fewer parameters for the removal of small objects in detection tasks. Several modified U-Net models have been developed that utilize shallow feature maps, including the FSOU-Net model [<xref ref-type="bibr" rid="ref-23">23</xref>], which introduces a shallow feature supplement structure, a dual-rotation network [<xref ref-type="bibr" rid="ref-24">24</xref>] that incorporates a shallow strategy, the Spiral Squeeze-and-Excitation and Attention NET [<xref ref-type="bibr" rid="ref-25">25</xref>], which leverages shallow features, and PAMSNet [<xref ref-type="bibr" rid="ref-14">14</xref>], which integrates shallow semantic information for dual-attention fusion.</p>
<p>The purpose of deep blocks is to extract finer details from images and effectively filter out small-scale noise as the convolutional structure becomes progressively deeper [<xref ref-type="bibr" rid="ref-26">26</xref>]. A deep block is built from several layer types, including convolutional, activation, and batch normalization layers. Typically, the backbone of each proposed model is built by stacking several deep blocks, from which features are derived. The original U-Net uses a basic deep block architecture, which becomes difficult to train as depth increases. Several modified U-Nets have been developed that incorporate residual blocks to overcome the limitations of the basic deep block architecture. A hybrid encoder, integrating a ConvNeXt-based Transformer with cross-dimensional long-range spatial-aware attention, is proposed in [<xref ref-type="bibr" rid="ref-27">27</xref>]. Other approaches include the use of deep-based residual blocks [<xref ref-type="bibr" rid="ref-28">28</xref>], Inception-Res-based dense connection blocks [<xref ref-type="bibr" rid="ref-29">29</xref>], and combinations with attention architectures such as the Attention&#x2013;Inception&#x2013;Residual-based U-Net (AIR-UNet) [<xref ref-type="bibr" rid="ref-30">30</xref>] and the Multi-View Attention and Multi-Scale Feature Interaction U-Net (MVSI-Net) [<xref ref-type="bibr" rid="ref-31">31</xref>] for brain tumor detection. Additionally, transformer-based hybrid models such as the Dual-Attention Transformer-Based Hybrid Network [<xref ref-type="bibr" rid="ref-32">32</xref>], Internal and External Dual Attention Network (IEA-Net) [<xref ref-type="bibr" rid="ref-33">33</xref>], and Dual Multi-Scale Attention U-Net (DMSA-UNet) [<xref ref-type="bibr" rid="ref-34">34</xref>] have also been introduced.</p>
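<p>The residual idea that makes deep stacks trainable can be illustrated in miniature: instead of learning a full mapping, a block learns a residual F(x) and adds the input back (y = F(x) + x), which eases gradient flow as depth grows. In this hedged sketch the "transform" is a toy element-wise function standing in for a real conv&#x2013;norm&#x2013;activation sequence, not any of the cited architectures.</p>

```python
def relu(v):
    """Element-wise rectified linear unit over a 1-D feature vector."""
    return [max(0.0, x) for x in v]

def residual_block(x, transform):
    """Apply a transform, add the identity shortcut, then activate."""
    fx = transform(x)
    return relu([a + b for a, b in zip(fx, x)])

# Toy transform standing in for conv -> batch-norm (hypothetical stand-in).
halve = lambda v: [0.5 * x for x in v]

features = [2.0, -4.0, 6.0]
print(residual_block(features, halve))  # [3.0, 0.0, 9.0]
```

<p>Because the shortcut passes the input through unchanged, even a transform that learns nothing (F(x) = 0) leaves the block as a near-identity, which is why stacking many such blocks does not degrade training the way plain deep blocks can.</p>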
<p>The skip connection block is designed to prevent feature map explosion and minimize information loss in the decoder path [<xref ref-type="bibr" rid="ref-35">35</xref>], while also enhancing feature reusability and accelerating gradient propagation in deep networks [<xref ref-type="bibr" rid="ref-36">36</xref>]. Also, this block preserves spatial and boundary information that may be lost during the encoding process [<xref ref-type="bibr" rid="ref-37">37</xref>]. The primary function of a skip connection is to transfer low-level (shallow) features from the encoder sub-network to high-level (deep) features in the decoder sub-network at the same scale. This facilitates the concatenation of contextual semantic information between the two sub-networks, enabling the deep network to effectively fuse coarse-grained and fine-grained feature maps for improved semantic segmentation. Several skip connection blocks have been proposed to facilitate the transfer of coarse-to-fine features, including the dense-insertion-based block in DESCINet [<xref ref-type="bibr" rid="ref-38">38</xref>], multi-scale skip connections in the Star-shaped Window Transformer Reinforced U-Net (SWTRU) [<xref ref-type="bibr" rid="ref-39">39</xref>], information bottleneck-based theory fusion and selective fusion in a dual encoder model [<xref ref-type="bibr" rid="ref-40">40</xref>], a multichannel fusion Transformer skip connection in USCT-UNet [<xref ref-type="bibr" rid="ref-41">41</xref>], the combination of UNet&#x002B;&#x002B; architecture and Mamba-based model in SK-VM&#x002B;&#x002B; [<xref ref-type="bibr" rid="ref-42">42</xref>], and symmetric encoder-decoder-based skip connections [<xref ref-type="bibr" rid="ref-43">43</xref>]. Skip Non-local Attention is utilized in UTSN-Net [<xref ref-type="bibr" rid="ref-44">44</xref>], and skip connections are also employed in the cell structure of Quantum-Inspired Neural Architecture Search (SegQNAS) [<xref ref-type="bibr" rid="ref-45">45</xref>].</p>
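<p>The primary function described above, transferring same-scale encoder features to the decoder and concatenating them, can be sketched as follows. Feature maps are modeled as lists of channels (each channel a 2D list); this is an illustrative sketch of the generic U-Net skip connection, not of any specific cited block.</p>

```python
def skip_concat(encoder_feats, decoder_feats):
    """Channel-wise concatenation of same-resolution feature maps."""
    h_e, w_e = len(encoder_feats[0]), len(encoder_feats[0][0])
    h_d, w_d = len(decoder_feats[0]), len(decoder_feats[0][0])
    # skip connections join encoder and decoder features at the same scale
    assert (h_e, w_e) == (h_d, w_d), "skip connection requires matching scale"
    return encoder_feats + decoder_feats  # fused map: C_e + C_d channels

shallow = [[[1, 2], [3, 4]]]                     # 1 fine-detail encoder channel
deep = [[[9, 9], [9, 9]], [[8, 8], [8, 8]]]      # 2 semantic decoder channels
fused = skip_concat(shallow, deep)
print(len(fused))  # 3 channels feed the next decoder convolution
```

<p>The concatenated map carries both the fine-grained spatial detail from the encoder and the coarse-grained semantics from the decoder, which is exactly the coarse-to-fine fusion the text attributes to skip connections.</p>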
<p>Recent advances in deep learning have substantially enhanced the ability to segment cells and organs within complex tissue environments, where accurate annotation serves as a critical foundation for reliable segmentation [<xref ref-type="bibr" rid="ref-46">46</xref>&#x2013;<xref ref-type="bibr" rid="ref-48">48</xref>]. These methods enable more precise interpretation of multiplexed tissue images, which are vital for understanding cellular composition and spatial organization. While semantic segmentation offers pixel-level classification, it often lacks the capacity to distinguish individual cell or organ instances, a limitation in many biological and clinical applications [<xref ref-type="bibr" rid="ref-49">49</xref>&#x2013;<xref ref-type="bibr" rid="ref-52">52</xref>]. To address this, deep learning models can be employed not only for segmentation but also to assist in the annotation process itself, streamlining image labeling and reducing the time and complexity associated with manual annotation. This review provides a unique, structured analysis of deep learning-based semantic segmentation approaches with a specific focus on multi-scale feature representation strategies, i.e., fine, coarse, and fused coarse-to-fine. Unlike prior reviews that broadly summarize segmentation models, this work dissects architectural components, e.g., convolutional blocks, shallow/deep modules, skip connections, and maps them to their respective contributions in enhancing semantic segmentation performance under challenging biomedical conditions. Key contributions of this work include: (1) a comprehensive classification of models based on their scale-aware design principles; (2) an in-depth discussion of advanced modules such as attention mechanisms, Transformer hybrids, and multi-path encoders; and (3) insights into the role of these architectures in improving annotation efficiency, interpretability, and scalability for biomedical imaging. 
<xref ref-type="sec" rid="s2">Section 2</xref> presents a review of related work on fine-to-coarse semantic segmentation approaches and analyzes various deep learning model architectures. In <xref ref-type="sec" rid="s3">Section 3</xref>, we examine relevant datasets, followed by a concluding discussion in <xref ref-type="sec" rid="s4">Section 4</xref>.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Main Discussion/Analysis</title>
<p>The effective capture and representation of multi-scale features are fundamental to the success of contemporary U-Net deep learning-based semantic segmentation architectures, especially in fields such as medical and biological imaging [<xref ref-type="bibr" rid="ref-31">31</xref>&#x2013;<xref ref-type="bibr" rid="ref-33">33</xref>,<xref ref-type="bibr" rid="ref-52">52</xref>,<xref ref-type="bibr" rid="ref-53">53</xref>]. This section explores the distinct roles of fine-scale, coarse-scale, and multi-scale feature representations, emphasizing their importance in enhancing model robustness and segmentation precision. As depicted in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>, the workflow begins with input data that undergoes preprocessing and augmentation to improve the model&#x2019;s generalization across diverse data variations. The primary focus of this discussion is the integration of fine- and coarse-scale attention mechanisms, and their combination, which are key to improving feature discrimination and contextual understanding. This is supported by advanced convolutional operations, including residual connections, dilated convolutions, and multi-scale feature extractors, along with attention mechanisms that aid efficient feature fusion. The analysis also considers the effectiveness of different architectural modules, such as shallow and deep blocks enhanced with transformers, inception structures, and residual-based mechanisms. Crucially, the role of skip connections is highlighted, particularly those augmented with dense features or transformer-based enhancements, as they are instrumental in preserving spatial and semantic information across layers. By examining these varied strategies for feature extraction and fusion, this work aims to advance deep learning methodologies for complex semantic segmentation tasks, reinforcing the critical role of multi-scale approaches in achieving state-of-the-art performance.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>Overview of medical and biological image semantic segmentation: a workflow from input data to evaluation</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_67915-fig-1.tif"/>
</fig>
<sec id="s2_1">
<label>2.1</label>
<title>Fine-Scale Feature Representation Analysis</title>
<p>As described in [<xref ref-type="bibr" rid="ref-1">1</xref>], the PESA R-CNN is a two-stage instance segmentation model, similar to Mask R-CNN, designed to enhance segmentation performance using three key components: CSD U-Net with pseudo perihematomal edema (PHE) targets, Scale Adaptive RoI Align (SARA), and a densely connected Multi-Scale Segmentation Network (MSSN). In models such as Mask R-CNN [<xref ref-type="bibr" rid="ref-54">54</xref>], DETR [<xref ref-type="bibr" rid="ref-55">55</xref>], RT-DETR [<xref ref-type="bibr" rid="ref-56">56</xref>], and Mask2Former [<xref ref-type="bibr" rid="ref-57">57</xref>&#x2013;<xref ref-type="bibr" rid="ref-59">59</xref>], it has been observed that their object detection capabilities can be used to detect when newly untracked classes appear or when previously tracked entities leave a scene. In the first stage, a weakly supervised CSD U-Net detects hemorrhage and PHE regions, which are used to generate region proposals (RoIs) via the Region Proposal Network. The second input branch extracts feature maps from a ResNet-101 backbone and processes them through the SARA module, which classifies RoIs into three scale-based groups for adaptive alignment. The feature maps, i.e., color, intensity, and orientation, are originally generated using center-surround differences (CSD), which compute intensity contrasts between fine-scale center regions and coarser-scale surround regions. This process mimics neuronal responses in mammals that detect dark centers on bright backgrounds and <italic>vice versa</italic>, producing six rectified feature maps [<xref ref-type="bibr" rid="ref-60">60</xref>,<xref ref-type="bibr" rid="ref-61">61</xref>]. In the second stage, the aligned RoIs are processed by MSSN, where densely connected layers help preserve fine details and minimize information loss. The final segmentation is obtained by integrating outputs from all segmentation networks using pixel-wise addition. 
Additionally, classification and box regression branches refine object classes and bounding box coordinates. Through the integration of SARA and MSSN, the model achieves enhanced detection and localization of hemorrhage patterns of varying sizes in CT scans. A multi-task loss function optimizes classification, box regression, and segmentation jointly, using cross-entropy loss for classification.</p>
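<p>The core of the center-surround difference described above can be sketched as follows: each pixel&#x2019;s fine-scale (center) intensity is contrasted with the mean of a coarser-scale surround, and the result is rectified, echoing neurons that respond to bright centers on dark backgrounds. The cited work builds six such maps across color, intensity, and orientation channels at multiple scales; this hedged sketch shows only the single-channel core operation, with an assumed square surround of radius <monospace>radius</monospace>.</p>

```python
def csd_map(image, radius=1):
    """Rectified center-surround contrast map of a single-channel image."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            # surround = all neighbors in the (2*radius+1)^2 window, minus center
            ring = [image[i][j]
                    for i in range(max(0, r - radius), min(h, r + radius + 1))
                    for j in range(max(0, c - radius), min(w, c + radius + 1))
                    if (i, j) != (r, c)]
            surround = sum(ring) / len(ring)
            out[r][c] = max(0.0, image[r][c] - surround)  # rectified contrast
    return out

img = [[0, 0, 0],
       [0, 9, 0],
       [0, 0, 0]]
print(csd_map(img)[1][1])  # bright center on dark surround pops out: 9.0
```

<p>Dark-center-on-bright responses, the <italic>vice versa</italic> case in the text, would be obtained by rectifying the negated difference instead.</p>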
<p>The MaxSigLayer proposed in [<xref ref-type="bibr" rid="ref-2">2</xref>] introduces a non-linearity mechanism to enhance feature representation for cell segmentation in microscopy images. Whereas ReLU, used as a ramp function, mitigates the vanishing gradient problem and facilitates faster convergence by sustaining larger and more stable gradient values [<xref ref-type="bibr" rid="ref-62">62</xref>], the MaxSigLayer combines maximum values with sigmoid functions to capture fine-grained structural details, improving semantic segmentation accuracy. Designed as a single-layer model trainable in a supervised framework, it operates within a weight-learning block consisting of two MaxSigLayer layers, followed by batch normalization and ReLU activation. Batch normalization is crucial since the layer&#x2019;s weights, initially randomized within [0, 1], are compressed by the Sigmoid function, leading to potential information loss; without it, the weights stop changing after a few iterations, limiting the network&#x2019;s learning capacity. In addition, a combination of a rectified linear unit (ReLU) activation and a batch normalization (BN) layer is commonly represented as a unified function [<xref ref-type="bibr" rid="ref-63">63</xref>]. Experimental evaluations showed that integrating MaxSigLayer within the encoding or preprocessing stages of a U-Net model significantly improved performance. Many works indicate that different preprocessing approaches can improve image quality depending on the complexity of the image data, e.g., image normalization procedures [<xref ref-type="bibr" rid="ref-64">64</xref>], intensity inhomogeneity correction and normalization [<xref ref-type="bibr" rid="ref-65">65</xref>], image resizing [<xref ref-type="bibr" rid="ref-66">66</xref>], contrast enhancement [<xref ref-type="bibr" rid="ref-67">67</xref>], and color unmixing and morphological operators [<xref ref-type="bibr" rid="ref-68">68</xref>]. 
Furthermore, the extended MaxSigNet architecture, which incorporates dilated convolutional layers and edge information maps, demonstrated superior generalization, outperforming state-of-the-art cell segmentation methods. Ablation studies confirmed MaxSigNet&#x2019;s robustness, revealing that even individual network blocks contributed significantly to segmentation accuracy, highlighting its effectiveness in refining segmentation boundaries and its adaptability for broader medical imaging applications.</p>
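<p>Of the preprocessing steps listed above, image intensity normalization is the simplest to illustrate; a min-max rescale to [0, 1] is one common variant. The cited works use more elaborate schemes (intensity inhomogeneity correction, contrast enhancement), so this sketch is only a minimal representative, not their method.</p>

```python
def minmax_normalize(image):
    """Rescale a single-channel image's intensities to the range [0, 1]."""
    flat = [v for row in image for v in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # constant image: avoid division by zero
        return [[0.0] * len(row) for row in image]
    return [[(v - lo) / (hi - lo) for v in row] for row in image]

raw = [[50, 100], [150, 250]]
print(minmax_normalize(raw))  # [[0.0, 0.25], [0.5, 1.0]]
```

<p>Normalizing intensities this way puts images from different acquisition settings on a common scale, which is why such steps precede the segmentation networks discussed here.</p>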
<p>Deep learning-based approaches have significantly improved edge detection by integrating hierarchical feature representations to better detect edges of varying sizes and shapes, as well as to estimate edge density [<xref ref-type="bibr" rid="ref-69">69</xref>]. This task has received significant attention due to its importance in a variety of high-level vision tasks, including semantic segmentation. These approaches are broadly categorized into two main groups [<xref ref-type="bibr" rid="ref-3">3</xref>]: Holistically-nested Edge Detection (HED)-based approaches, which utilize deep supervision to enhance multi-scale feature extraction [<xref ref-type="bibr" rid="ref-70">70</xref>,<xref ref-type="bibr" rid="ref-71">71</xref>], and Feature Pyramid Networks (FPN)-based approaches, which employ feature pyramid networks to aggregate multi-level features. Both strategies are intended to improve edge detection accuracy by incorporating multi-scale context; however, they differ in how they handle feature fusion and refinement [<xref ref-type="bibr" rid="ref-72">72</xref>,<xref ref-type="bibr" rid="ref-73">73</xref>]. HED-based and FPN-based methods primarily focus on multi-scale feature extraction and aggregation for edge detection, but often overlook the limitations of fine-scale branches, leading to increased false positives and suboptimal fusion performance. HED-based approaches, such as those by [<xref ref-type="bibr" rid="ref-74">74</xref>] in 2017, [<xref ref-type="bibr" rid="ref-75">75</xref>] in 2022, [<xref ref-type="bibr" rid="ref-76">76</xref>] in 2024, and [<xref ref-type="bibr" rid="ref-77">77</xref>,<xref ref-type="bibr" rid="ref-78">78</xref>] in 2025, employ deep supervision mechanisms and dilated convolutions to enhance multi-scale representation. 
FPN-based methods, like those by [<xref ref-type="bibr" rid="ref-79">79</xref>] in 2022, [<xref ref-type="bibr" rid="ref-80">80</xref>] in 2023, [<xref ref-type="bibr" rid="ref-81">81</xref>] in 2024, and [<xref ref-type="bibr" rid="ref-82">82</xref>,<xref ref-type="bibr" rid="ref-83">83</xref>] in 2025, use feature pyramid networks to aggregate hierarchical features but may lose fine-level details due to up-sampling artifacts. In contrast, FCL-Net [<xref ref-type="bibr" rid="ref-3">3</xref>] addresses this limitation by enhancing fine-scale feature learning with high-level semantic cues. It introduces a top-down attentional guiding (TAG) module and a pixel-level weighting (PW) module, ensuring that fine-scale branches accurately refine predictions. Unlike [<xref ref-type="bibr" rid="ref-84">84</xref>,<xref ref-type="bibr" rid="ref-85">85</xref>], which combine features in two ways, either using additive fusion to refine details from different layers or applying a dilated pyramid pooling layer with a multi-scale fusion module to blend fine details with deeper, more abstract features, FCL-Net employs an LSTM-based connection to directly encode semantic information into fine-scale learning, overcoming long-propagation-path issues. This approach not only refines fine-scale predictions but also effectively integrates multi-scale information, leveraging both deep supervision and pyramid aggregation strategies.</p>
</sec>
<sec id="s2_2">
<label>2.2</label>
<title>Coarse-Scale Feature Representation Analysis</title>
<p>A point cloud model based on LightConvPoint [<xref ref-type="bibr" rid="ref-86">86</xref>] was trained in [<xref ref-type="bibr" rid="ref-6">6</xref>] until the training loss converged, using random point sampling, mini-batches, and the Adam optimizer. Training samples were augmented with random noise, rotations, flipping, elastic transformations, and anisotropic scaling, with point cloud processing handled using the MorphX package. For dendrite semantic segmentation, a coarse-level model was employed to distinguish between dendrites, axons, and somas by training and testing on high-resolution surface segmentation. A grid search using fixed parameters from the coarse-level model was performed to evaluate the impact of point number and context radius on dendritic inference. The coarse-level morphology model was trained with a batch size of 4, using Dice Loss with class weights (dendrite: 2, combined axon and soma: 1), the Adam optimizer, and an initial learning rate of 2 &#x00D7; 10<sup>&#x2212;3</sup> with a scheduler step size of 100 and a decay rate of 0.996. Input points were normalized to a unit sphere to ensure consistency in training. In [<xref ref-type="bibr" rid="ref-7">7</xref>], a traditional model-based cardiac shape detection method is proposed that emphasizes computational efficiency, enhancing interactive performance, especially with 4-D data. It achieves robustness and noise insensitivity without sacrificing accuracy by gradually reducing model smoothness. The process starts with a coarse initial model to capture the approximate surface shape and detect shape boundaries, which is then refined for increased extraction accuracy.</p>
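<p>The training schedule quoted above, an initial learning rate of 2 &#x00D7; 10<sup>&#x2212;3</sup> with a scheduler step size of 100 and a decay rate of 0.996, is consistent with a step-decay schedule. Whether the cited work used exactly this scheduler is an assumption; the sketch below shows the rule those three numbers imply.</p>

```python
def step_decay_lr(step, base_lr=2e-3, step_size=100, gamma=0.996):
    """Learning rate after `step` training steps under step decay:
    the rate is multiplied by `gamma` once every `step_size` steps."""
    return base_lr * gamma ** (step // step_size)

print(step_decay_lr(0))    # 0.002
print(step_decay_lr(100))  # 0.002 * 0.996 = 0.001992
```

<p>Such a gentle decay (0.4% per 100 steps) keeps the learning rate nearly constant early in training while still shrinking it over long runs.</p>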
<p>However, challenges in accurately distinguishing organ boundaries can significantly degrade segmentation accuracy, posing a major limitation in clinical applications. A fusion-based U-Net model has been proposed to segment lesions in breast ultrasound images, in which a fusion block represents the generated features, including different lesion sizes, aggregated coarse-to-fine information, and high-resolution edge data, within the U-Net architecture. This block is implemented using four key units: (1) a feature-capturing unit that detects various lesion sizes using Atrous Spatial Pyramid Pooling (ASPP) to extract multiscale features, (2) a cascade feature fusion unit that aggregates coarse-to-fine information and high-resolution edge data, (3) a contour-deblurring unit that enhances sharp edge features to reduce boundary blurring, and (4) a refining convolution unit that further processes the outputs of the previous two units to capture the most relevant features for breast density segmentation. Following these units, a clustering-based superpixel algorithm is applied to address noise reduction challenges while preserving boundary context, ensuring more accurate lesion segmentation.</p>
</sec>
<sec id="s2_3">
<label>2.3</label>
<title>Multi-Scale Feature Representation Analysis</title>
<p>Multi-scale feature representation plays a crucial role in enhancing the clinical diagnostic accuracy of tumor boundary segmentation by enabling models to capture both fine-grained anatomical details and global contextual cues [<xref ref-type="bibr" rid="ref-87">87</xref>,<xref ref-type="bibr" rid="ref-88">88</xref>]. For instance, in glioma or brain tumor segmentation, precise delineation of tumor subregions, such as the enhancing core, edema, and necrotic core, is critical for surgical planning and radiotherapy targeting. Models like MVSI-Net and DMSA-UNet, which integrate multi-scale attention and feature interaction modules, have demonstrated improved performance in capturing complex tumor morphologies [<xref ref-type="bibr" rid="ref-31">31</xref>,<xref ref-type="bibr" rid="ref-34">34</xref>]. Clinical studies such as the BraTS Challenge have shown that deep learning models incorporating multi-scale architectures significantly reduce inter-observer variability and improve Dice similarity coefficients in comparison to manual annotation, directly impacting treatment planning and response monitoring. Similarly, in breast ultrasound imaging, multi-scale fusion approaches have improved boundary localization of malignant lesions, aiding in more accurate BI-RADS scoring and biopsy decision-making [<xref ref-type="bibr" rid="ref-89">89</xref>,<xref ref-type="bibr" rid="ref-90">90</xref>]. Incorporating such real-world validations or referencing standardized datasets with proven clinical utility strengthens the translational relevance of the proposed architectural strategies.</p>
<p>Also, multi-scale feature representation enhances segmentation accuracy by integrating coarse-to-fine feature maps, simultaneously addressing challenges presented by low-resolution imaging, lesion variability, and textural complexity. A fusion-based U-Net architecture addresses these challenges through (1) convolutional operations for spatial feature extraction, (2) shallow blocks for early processing, (3) deep blocks for semantic enhancement, and (4) skip connections to preserve fine-grained details, ensuring improved semantic segmentation performance, as explored in the following sub-sections. Sample images illustrating fine-scale and coarse-scale features are shown in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>.</p>
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>Illustration of fine-scale and coarse-scale feature generation within U-Net architectures for image semantic segmentation</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_67915-fig-2.tif"/>
</fig>
<sec id="s2_3_1">
<label>2.3.1</label>
<title>Convolution Operations</title>
<p>The convolutional operations explored in the following studies contain several key advancements in feature extraction and representation learning. Traditional convolution operations focus on local feature extraction, while dilated (Atrous) convolutions and Atrous Spatial Pyramid Pooling (ASPP) expand the receptive field without increasing computational complexity. Multiscale feature extraction and attention mechanisms, such as Squeeze and Excitation (SE) and Self-Attention (SA), enhance the model&#x2019;s ability to capture both local and global contextual information. Hybrid and parallel convolution techniques combine different architectures like CNN and Transformer to utilize the capabilities of both. Residual connections and feature concatenation further improve performance by preserving important features and handling issues like gradient vanishing. Specialized methods for medical image segmentation, such as FAM-U-Net and CSAP-UNet, apply these techniques to enhance segmentation accuracy. Lastly, future directions include optimizing convolutional architectures for efficiency, integrating self-supervised learning, and developing adaptive convolution methods for better resolution preservation and receptive field expansion.</p>
<p><bold><italic>&#x2013;&#x2002;Enhanced Convolution with Residual Connections</italic></bold></p>
<p>An attention-guided residual convolution method (AG-residual) has been introduced to enhance the conventional convolution operation by addressing the gradient vanishing problem and preserving high-resolution spatial details [<xref ref-type="bibr" rid="ref-16">16</xref>]. This approach improves the performance of U-Net models by generating more effective feature maps, even as network depth increases. The AG-residual module consists of two 3 &#x00D7; 3 convolution layers, each followed by batch normalization and ReLU activation. Batch normalization handles internal covariate shifts and regularizes the U-Net model, while ReLU introduces nonlinearity. A shortcut residual connection using a 1 &#x00D7; 1 convolution is applied as an identity mapping, ensuring the preservation of essential features. To further refine these feature maps, a hybrid Triple Attention Module (TAM) is employed, combining spatial, channel-based, and squeeze-and-excitation attention mechanisms to emphasize relevant contextual information. Additionally, a squeeze-and-excitation-based Atrous spatial pyramid pooling (SE-ASPP) module extends the receptive field of convolution filters, capturing semantic information across multiple scales. Together, these modules enhance the model&#x2019;s ability to capture fine-grained details and maintain contextual relevance, making the AG-residual method highly effective for feature extraction in deep neural networks.</p>
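<p>The residual convolution pattern described above (two 3 &#x00D7; 3 convolutions with a 1 &#x00D7; 1 shortcut acting as an identity-style mapping) can be sketched as follows; this is a simplified single-channel NumPy illustration that omits batch normalization and the TAM/SE-ASPP attention modules:</p>

```python
import numpy as np

def conv2d(x, w):
    """'Same' 2-D convolution of a single-channel map with a square kernel."""
    k = w.shape[0]
    xp = np.pad(x, k // 2)
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + k, j:j + k] * w)
    return out

def residual_block(x, w1, w2, w_short):
    """Two 3x3 conv + ReLU stages plus a 1x1 shortcut, as in residual conv units."""
    h = np.maximum(conv2d(x, w1), 0.0)   # conv -> ReLU
    h = np.maximum(conv2d(h, w2), 0.0)   # conv -> ReLU
    shortcut = x * w_short               # a 1x1 conv on one channel is a scaling
    return h + shortcut                  # skip connection preserves the input signal

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 8))
y = residual_block(x, rng.standard_normal((3, 3)) * 0.1,
                   rng.standard_normal((3, 3)) * 0.1, 1.0)
print(y.shape)  # spatial resolution is preserved
```

<p>Even if both convolutions were to output zero, the shortcut still passes the input through unchanged, which is the mechanism that alleviates gradient vanishing as depth grows.</p>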
<p><bold><italic>&#x2013;&#x2002;Dilated/Atrous Convolutions</italic></bold></p>
<p>Atrous convolution (AC), also known as convolution with up-sampled filters or dilated convolution, is a technique that controls the convolution&#x2019;s field of view through a parameter called the rate. The rate determines the spacing between filter coefficients, where a rate of 1 makes Atrous convolution equivalent to a standard convolution. By inserting r&#x2212;1 zeros between filter coefficients (where r is the rate), the filter expands, allowing the convolution to cover a larger receptive field without increasing the number of parameters. This technique is widely used in convolutional neural networks (CNNs) to extract dense features and improve image resolution. Atrous Spatial Pyramid Pooling (ASPP) utilizes Atrous convolution to capture multi-scale contextual information by applying convolutions with different rates, generating feature maps at various scales. For instance, DeepLabv3&#x002B; with a ResNet-50 backbone in [<xref ref-type="bibr" rid="ref-17">17</xref>] employs three parallel Atrous convolutions with rates of 6, 12, and 18, effectively capturing multi-scale features and enhancing the model&#x2019;s capability to extract fine-grained contextual information.</p>
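<p>The rate parameter can be made concrete by constructing the dilated filter explicitly: inserting r &#x2212; 1 zeros between the coefficients of a 3 &#x00D7; 3 kernel enlarges its footprint to k &#x002B; (k &#x2212; 1)(r &#x2212; 1) without adding parameters. A small NumPy sketch (helper name illustrative):</p>

```python
import numpy as np

def dilate_kernel(w, rate):
    """Insert (rate - 1) zeros between filter coefficients (rate 1 == standard conv)."""
    k = w.shape[0]
    k_eff = k + (k - 1) * (rate - 1)   # effective footprint of the dilated filter
    out = np.zeros((k_eff, k_eff))
    out[::rate, ::rate] = w            # original coefficients, spaced `rate` apart
    return out

w = np.ones((3, 3))
for rate in (1, 6, 12, 18):            # the parallel ASPP rates cited above
    print(rate, dilate_kernel(w, rate).shape)
```

<p>At rates 6, 12, and 18 the 3 &#x00D7; 3 filter covers 13 &#x00D7; 13, 25 &#x00D7; 25, and 37 &#x00D7; 37 windows respectively, while the number of nonzero parameters stays at nine.</p>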
<p><bold><italic>&#x2013;&#x2002;Multiscale Feature Extraction and Attention Mechanisms</italic></bold></p>
<p>In convolutional neural networks (CNNs), the convolution operation captures local information by operating within a defined window of the input image. Conversely, the self-attention (SA) mechanism extracts global information by calculating correlations between tokens (non-overlapping patches in Vision Transformers (ViTs)) across all positions in the image. These complementary approaches, i.e., local feature extraction through CNNs and global context modeling via SA, can enhance feature extraction when combined. However, effectively integrating these modules remains a challenge. To address this, a parallel combination of CNN and SA, known as CSAP-UNet, is introduced in [<xref ref-type="bibr" rid="ref-18">18</xref>], where U-Net serves as the backbone. The encoder of CSAP-UNet consists of two parallel branches: one utilizing CNNs to capture local features and the other employing SA to model global dependencies. This parallel architecture enables the model to incorporate both local and global information, which is particularly important for medical image segmentation. Since medical images often originate from specific frequency bands and exhibit non-uniform color channels, adapting U-Net to account for these characteristics is essential. The Attention Fusion Module (AFM) integrates CNN and SA outputs by applying channel and spatial attention in series, effectively merging local and global information. Additionally, a Boundary Enhancement Module (BEM) is incorporated at the shallow layers of the U-Net to improve boundary segmentation, particularly for medical images where precise localization of lesion regions is critical. This module focuses on enhancing attention to pixel-level edge details, thereby improving the accuracy of semantic segmentation in medical imaging tasks. 
Another notable advancement is EFFResNet-ViT [<xref ref-type="bibr" rid="ref-91">91</xref>], a hybrid deep learning model that combines EfficientNet-B0 and ResNet-50 CNN backbones with a ViT module to address the limitations of conventional CNNs in modeling global dependencies. This architecture employs a feature fusion strategy to integrate local and global representations, enhancing classification accuracy across diverse medical imaging tasks. Additionally, EFFResNet-ViT emphasizes interpretability, incorporating Grad-CAM for visual explanation and t-SNE for feature space analysis. Evaluations on brain tumor CE-MRI and retinal image datasets demonstrate the model&#x2019;s potential for accurate and interpretable clinical decision support.</p>
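<p>The complementary local and global branches can be illustrated with a toy NumPy sketch: a scaled dot-product self-attention in which every token attends to all others, next to a windowed local mixer standing in for convolution, fused additively. This is a conceptual illustration, not the CSAP-UNet implementation:</p>

```python
import numpy as np

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(tokens, wq, wk, wv):
    """Global branch: correlations between all token pairs weight the values."""
    q, k, v = tokens @ wq, tokens @ wk, tokens @ wv
    scores = softmax(q @ k.T / np.sqrt(k.shape[1]))   # (n_tokens, n_tokens)
    return scores @ v

def local_average(tokens, window=3):
    """Local branch stand-in: each token mixes only with its neighbours."""
    n = len(tokens)
    out = np.zeros_like(tokens)
    for i in range(n):
        lo, hi = max(0, i - window // 2), min(n, i + window // 2 + 1)
        out[i] = tokens[lo:hi].mean(axis=0)
    return out

rng = np.random.default_rng(0)
tokens = rng.standard_normal((16, 8))   # 16 patches, 8-dim embeddings
w = [rng.standard_normal((8, 8)) for _ in range(3)]
fused = self_attention(tokens, *w) + local_average(tokens)  # additive fusion
print(fused.shape)
```

<p>Each attention row is a probability distribution over all positions, which is exactly what gives the global branch its long-range reach; the local branch never sees beyond its window.</p>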
<p>FAM-U-Net is an advanced variation of the traditional U-Net architecture [<xref ref-type="bibr" rid="ref-19">19</xref>], designed to enhance the accuracy of medical semantic segmentation and improve retinal fluid detection. This architecture replaces the conventional U-Net encoder with Multiscale Feature Extraction (MFE) modules to capture multi-scale information more robustly. Each MFE block generates feature maps using kernels of different sizes, including dilated convolutions with varying rates (1, 2, 4, and 8), which expand the receptive field while maintaining resolution. This multi-path dilation strategy enables the U-Net network to extract fine-grained and contextually relevant features across multiple scales. To further refine feature representations, Squeeze and Excitation (SE) blocks are incorporated to enhance channel-wise attention, focusing on more discriminative features and improving the model&#x2019;s ability to differentiate key structures in medical images. In the decoder path, FAM-U-Net enhances feature map quality by employing Dilated Atrous Pyramid Pooling Modules (DAPPM), which refine feature maps and integrate outputs from the Convolutional Block Attention Module (CBAM) to improve attention-based fusion. This integration enhances segmentation accuracy, particularly for boundary localization and lesion detection. The U-Net backbone in FAM-U-Net maintains the use of repeated convolution layers, pooling, and attention mechanisms, ensuring the preservation of low-level and high-level features throughout the network. Despite having only 1.4 million trainable parameters, FAM-U-Net demonstrates superior performance over traditional U-Net models by efficiently extracting multiscale features, particularly in scenarios involving irregular and complex structures such as fluid boundaries. 
Recent advancements such as DCSSGA-UNet [<xref ref-type="bibr" rid="ref-92">92</xref>] address persistent challenges in biomedical image segmentation by enhancing both spatial and semantic feature integration. This architecture combines a DenseNet201 encoder with channel spatial attention and semantic guidance attention modules to selectively focus on discriminative features and reduce redundancy.</p>
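<p>The Squeeze-and-Excitation recalibration used in several of the architectures above reduces each channel to a scalar by global average pooling, passes the descriptor through a bottleneck MLP, and gates the channels with sigmoid weights. A minimal NumPy sketch (bias terms omitted, weights random for illustration):</p>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def se_block(x, w1, w2):
    """Squeeze-and-Excitation: global average pool, bottleneck MLP, channel gating."""
    z = x.mean(axis=(1, 2))                     # squeeze: (C,) channel descriptor
    s = sigmoid(w2 @ np.maximum(w1 @ z, 0.0))   # excitation: per-channel gates in (0, 1)
    return x * s[:, None, None]                 # recalibrate channels

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))            # (channels, H, W)
w1 = rng.standard_normal((2, 8))                # bottleneck: reduction ratio 4
w2 = rng.standard_normal((8, 2))
y = se_block(x, w1, w2)
print(y.shape)
```

<p>Because the gates lie strictly between 0 and 1, the block can only attenuate channels, emphasizing the discriminative ones relative to the rest rather than amplifying any of them.</p>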
<p><xref ref-type="table" rid="table-1">Table 1</xref> outlines a comparative overview of widely adopted convolutional blocks and attention mechanisms designed to enhance feature representation in semantic segmentation models. These modules, including MFE, SE, and ASPP, are designed to capture contextual information across varying spatial scales. Attention-focused blocks such as SE, CBAM, and AFM emphasize salient spatial and channel-wise features, promoting refined and discriminative learning. Meanwhile, components like AC and ASPP improve receptive field expansion without compromising resolution, and BEM contributes to more precise boundary detection. Collectively, these blocks address critical challenges in segmentation tasks, such as multiscale context integration, attention-guided refinement, and boundary preservation, thereby improving model robustness and accuracy across diverse medical imaging datasets.</p>
<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>Module-wise breakdown of convolution operations for enhanced U-Net architectures</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th align="center">Convolution block</th>
<th align="center">Description</th>
<th align="center">Purpose</th>
<th align="center">Key features</th>
<th align="center">Advantages</th>
</tr>
</thead>
<tbody>
<tr>
<td>MFE</td>
<td>Extract multiscale features using varying kernel sizes and dilation.</td>
<td>Improve accuracy by capturing features at multiple scales.</td>
<td>Multiple kernel sizes, Dilated convolutions, FP channels</td>
<td>Detailed feature extraction and efficient receptive field increase.</td>
</tr>
<tr>
<td>SE</td>
<td>Applies channel attention using Global Average Pooling.</td>
<td>Focus on important channels for improved discrimination.</td>
<td>GAP, Channel-wise attention, Feature refinement</td>
<td>Enhances feature focus and segmentation accuracy.</td>
</tr>
<tr>
<td>CBAM</td>
<td>Applies spatial and channel attention to enhance feature maps.</td>
<td>Focus on relevant spatial/channel features.</td>
<td>Spatial &#x0026; Channel attention, Refinement</td>
<td>Boosts segmentation accuracy with attention.</td>
</tr>
<tr>
<td>AC</td>
<td>Dilated convolutions to expand receptive field without added params.</td>
<td>Capture wider context efficiently.</td>
<td>Rate parameter, dilated convolution</td>
<td>Wider receptive field without resolution loss.</td>
</tr>
<tr>
<td>ASPP</td>
<td>Multiple atrous convolutions at different dilation rates.</td>
<td>Extract multiscale features efficiently.</td>
<td>Dilation rates (6, 12, 18), multiscale extraction</td>
<td>Improves feature capture on complex data.</td>
</tr>
<tr>
<td>AFM</td>
<td>Sequential spatial and channel attention for feature fusion.</td>
<td>Refine features from CNN &#x0026; SA.</td>
<td>Spatial &#x0026; channel attention, fusion</td>
<td>Improves attention-guided segmentation.</td>
</tr>
<tr>
<td>BEM</td>
<td>Enhances edge localization for boundary detail capture.</td>
<td>Improve detection of irregular boundaries.</td>
<td>Pixel-level focus, edge enhancement</td>
<td>Improves precision in complex segmentation.</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s2_3_2">
<label>2.3.2</label>
<title>Shallow Blocks</title>
<p>Shallow blocks in image semantic segmentation play a crucial role in preserving spatial details and capturing fine-grained structures. Enhancements in shallow block design focus on deepening layers to extract richer semantic information while maintaining boundary integrity and small-region targets. Integration with deep feature representations is facilitated through techniques such as skip connections and multi-scale fusion, enabling a seamless combination of fine-grained details with high-level semantics. Optimization strategies, including smaller anchors in region proposal networks and attention mechanisms, further refine shallow block performance, particularly in small object detection and medical or biological image segmentation. However, challenges such as limited receptive fields and potential overfitting in small target detection necessitate further advancements. Future designs should focus on extending receptive fields, refining fusion strategies, and enhancing computational efficiency, as reviewed in the following.</p>
<p>The work presented in [<xref ref-type="bibr" rid="ref-20">20</xref>] introduces a shallow feature map representation strategy to enhance pest detection using Convolutional Neural Networks (CNNs). In the proposed CNN architecture, convolution layers, batch normalization, and ReLU activation are systematically integrated. A specific strategy is employed for shallow layers, where increasing the depth of these layers enables the extraction of richer semantic information, while reducing deep blocks helps preserve spatial details. This design ensures that sufficient semantic features are captured before positional data is lost in deeper layers. To facilitate small object detection, such as pests, a region proposal network adapted from Faster R-CNN [<xref ref-type="bibr" rid="ref-93">93</xref>] is utilized with smaller anchor sizes. Most of the region proposals are generated from shallow layers, guided by two key considerations: (1) deeper shallow layers can extract meaningful semantic features that are critical for classification, and (2) retaining spatial information in the lower layers prevents the loss of important features that may occur in deeper layers. The proposed method uses a ResNet-50 backbone and visualizes feature maps generated by both the proposed approach and a Feature Pyramid Network (FPN) [<xref ref-type="bibr" rid="ref-94">94</xref>] with a global attention module to validate its effectiveness. These visualizations, spanning from shallow to deep layers, reveal that shallow layers in the proposed approach are less affected by background noise compared to FPN, where background interference is more prominent. As the network progresses deeper, it learns to accurately focus on small object locations, such as pests. The proposed globally activated feature pyramid network effectively highlights object regions through lighter activation points while minimizing attention to non-object areas, demonstrating superior performance in small object detection.</p>
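<p>The smaller-anchor strategy can be illustrated by generating the anchor boxes proposed at a single feature-map location; the scales and aspect ratios below are illustrative choices, not the values used in [<xref ref-type="bibr" rid="ref-20">20</xref>]:</p>

```python
import numpy as np

def make_anchors(center, scales, ratios):
    """Generate (x1, y1, x2, y2) anchor boxes at one feature-map location."""
    cx, cy = center
    boxes = []
    for s in scales:
        for r in ratios:
            w, h = s * np.sqrt(r), s / np.sqrt(r)   # equal-area boxes per scale
            boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return np.array(boxes)

# Smaller scales than the Faster R-CNN defaults, suited to tiny objects such as pests.
anchors = make_anchors((32, 32), scales=(8, 16, 32), ratios=(0.5, 1.0, 2.0))
print(anchors.shape)  # 3 scales x 3 aspect ratios = 9 anchors per location
```

<p>Tiling such small anchors over shallow, high-resolution feature maps is what lets the region proposal network keep the spatial precision needed for small objects.</p>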
<p>In [<xref ref-type="bibr" rid="ref-23">23</xref>], a Shallow Feature Supplement Module is introduced to enhance the extraction of fine-grained semantic features by up-sampling shallow semantic information. In the U-Net architecture, features extracted at different stages carry distinct types of semantic information: shallow layers capture more concrete spatial details, while deeper layers encode more abstract semantic representations. To optimize the integration of these features, Feature Supplement and Optimization U-Net (FSOU-Net) is proposed for medical image semantic segmentation, where shallow and deep features are processed separately and optimized for improved performance. In conventional U-Net models, the encoder down-samples input images using max-pooling layers, which reduces the scale of semantic features but often leads to the loss of fine-grained information. This loss is particularly detrimental for tasks requiring precise segmentation of target object boundaries. To address this challenge, FSOU-Net employs a multi-scale shallow feature supplementation technique that enhances the extraction of fine-grained semantic details from shallow layers. This approach improves the model&#x2019;s overall feature representation by preserving spatial information, including target locations and contours. By supplementing fine-grained shallow semantic information and optimizing deep feature representations, FSOU-Net demonstrates improved segmentation performance, especially in boundary detection, compared to the original U-Net. The model&#x2019;s ability to retain fine spatial details while effectively processing deeper semantic information contributes to its enhanced accuracy in medical image segmentation tasks.</p>
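<p>A common way to combine fine shallow detail with abstract deep semantics, resizing one map to the other's resolution and concatenating channels, can be sketched in NumPy. This is a conceptual illustration of the supplementation idea, not the exact FSOU-Net module:</p>

```python
import numpy as np

def upsample_nearest(x, factor):
    """Nearest-neighbour up-sampling of a (C, H, W) feature map."""
    return x.repeat(factor, axis=1).repeat(factor, axis=2)

def supplement(shallow, deep):
    """Bring deep semantics to the shallow resolution and concatenate channels."""
    factor = shallow.shape[1] // deep.shape[1]
    return np.concatenate([shallow, upsample_nearest(deep, factor)], axis=0)

shallow = np.zeros((16, 64, 64))   # early-stage map: fine spatial detail
deep = np.zeros((64, 16, 16))      # late-stage map: abstract semantics
fused = supplement(shallow, deep)
print(fused.shape)  # (80, 64, 64)
```

<p>The fused map keeps the shallow branch's full 64 &#x00D7; 64 resolution, so boundary and contour information survives alongside the up-sampled semantic channels.</p>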
<p>In [<xref ref-type="bibr" rid="ref-25">25</xref>], a shallow feature map is introduced in U-Net architecture to capture essential information, particularly focusing on the boundaries of target objects and the global characteristics of small targets. Shallow feature maps, extracted from the early layers of the U-Net encoder, not only preserve fine-grained spatial details that are often critical for accurately delineating object boundaries and identifying small structures, but also maintain detailed spatial information that would otherwise be lost during down-sampling in deeper layers.</p>
<p>In [<xref ref-type="bibr" rid="ref-14">14</xref>], a model called PAMSNet is introduced for medical image lesion segmentation, designed to enhance feature extraction at shallow stages and improve overall segmentation performance. To achieve this, two key modules are incorporated: 1) Efficient Pyramid Split Attention (EPSA) Module: Integrated into the encoding stage, this module leverages multi-scale feature maps to facilitate pyramidal information fusion. By extracting fine-grained spatial information and enriching contextual details, EPSA enhances the model&#x2019;s capacity to capture critical features for improved lesion segmentation. 2) Spatial Pyramid-Coordinate Attention (SPCA) Module: Placed in the bottleneck layer, SPCA performs weighted feature fusion from different spatial locations. This mechanism improves PAMSNet&#x2019;s ability to focus on key features, capturing fine details of the lesion, texture characteristics, and semantic information in medical images. Additionally, SPCA emphasizes edge and detail information, further enhancing segmentation accuracy. By integrating these modules, PAMSNet refines feature representation and segmentation precision, particularly for capturing lesion boundaries and intricate details in medical images.</p>
</sec>
<sec id="s2_3_3">
<label>2.3.3</label>
<title>Deep Blocks</title>
<p>Deep blocks in semantic segmentation models can be categorized based on their functionality and architectural design. Basic convolutional blocks primarily aid low-level feature extraction, while inception-based blocks employ multiple kernel sizes to capture diverse spatial features. Attention-enhanced blocks, such as Dual Attention and Multi-Scale Attention mechanisms, refine feature selection by leveraging both spatial and channel-wise information. Dense connection blocks, including DenseNet and Hybrid Dense-Inception architectures, improve gradient flow and feature reuse, promoting more efficient learning. Residual blocks, through skip connections, enable the training of deeper networks by reducing vanishing gradient issues. Transformer-based blocks model long-range dependencies via self-attention, while hybrid CNN-Transformer architectures synergistically integrate convolutional inductive biases with global attention mechanisms. Additionally, efficient convolutional blocks, such as depthwise separable convolutions, enhance computational efficiency without compromising performance. These categorizations underscore the continuous evolution of deep learning strategies aimed at improving semantic segmentation accuracy and efficiency in medical and biological imaging as follows.</p>
<p><bold><italic>&#x2013;&#x2002;Transformer-Based Blocks</italic></bold></p>
<p>The CI-UNet architecture is proposed for medical image segmentation in [<xref ref-type="bibr" rid="ref-27">27</xref>] to address the limitations of existing ConvNet and Transformer-based models. The architecture leverages ConvNeXt as its encoder, combining the computational efficiency of CNNs with the superior feature extraction capabilities of Transformers. A key component of CI-UNet is the integration of a four-branch interactive attention module, which captures complex cross-dimensional interactions while incorporating global spatial context. This attention mechanism enhances deep feature representation by simultaneously considering spatial and channel dependencies, effectively closing the attention gaps present in traditional approaches. As a result, CI-UNet demonstrates improved segmentation performance by refining feature extraction and maintaining rich contextual information.</p>
<p><bold><italic>&#x2013;&#x2002;Inception-Based Blocks</italic></bold></p>
<p>In [<xref ref-type="bibr" rid="ref-29">29</xref>], DIU-Net (Dense-Inception U-Net) is proposed to improve segmentation performance across different medical imaging modalities, including retinal blood vessels, lung CT images, and brain tumor MRI scans. DIU-Net is built on the U-Net framework and integrates elements from GoogleNet&#x2019;s Inception-Res module and DenseNet, enhancing both the encoder and decoder paths by incorporating Inception modules and dense connections. The architecture introduces two key components: (1) Inception-Res Block: A modified residual Inception module aggregates feature maps from kernels of different sizes, allowing the network to capture multi-scale features. The inclusion of residual connections enhances learning efficiency and mitigates the gradient vanishing problem. (2) Dense-Inception Block: This block combines Inception modules with dense connections, making the network deeper and wider while preventing gradient vanishing and redundant computations. Batch normalization is applied after each convolution to enhance learning. The middle section of DIU-Net integrates additional Inception layers within the Dense-Inception block, increasing feature complexity while optimizing computational efficiency. These architectural modifications enable DIU-Net to process complex medical image segmentation tasks while maintaining computational feasibility.</p>
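<p>The multi-kernel aggregation idea behind the Inception-Res block can be sketched in single-channel NumPy form; merging branches by averaging and the particular kernel sizes are illustrative simplifications of the concatenation-based module:</p>

```python
import numpy as np

def conv2d(x, w):
    """'Same' 2-D convolution of a single-channel map with a square kernel."""
    k = w.shape[0]
    xp = np.pad(x, k // 2)
    out = np.zeros_like(x, dtype=float)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + k, j:j + k] * w)
    return out

def inception_res_block(x, kernels):
    """Run parallel branches with different kernel sizes, merge, add a residual."""
    branches = np.stack([conv2d(x, w) for w in kernels])  # (n_branches, H, W)
    return branches.mean(axis=0) + x                      # merge + identity skip

rng = np.random.default_rng(0)
x = rng.standard_normal((12, 12))
kernels = [rng.standard_normal((k, k)) * 0.1 for k in (1, 3, 5)]
y = inception_res_block(x, kernels)
print(y.shape)  # multi-scale aggregation preserves spatial resolution
```

<p>Each branch sees the same input at a different receptive field, and the residual term keeps the identity path open, which is what lets such blocks go deep without gradient vanishing.</p>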
<p><bold><italic>&#x2013;&#x2002;Attention-Enhanced Blocks</italic></bold></p>
<p>In [<xref ref-type="bibr" rid="ref-31">31</xref>], MVSI-Net is proposed to enhance feature extraction and segmentation performance by combining a Multi-View Attention (MVA) framework and a Multi-Scale Feature Interaction (MSI) module. Shallow networks primarily capture low-level features, which limit segmentation and detection accuracy, while deeper networks provide better semantic understanding. To address these challenges, MVSI-Net integrates MVA in the final two layers of both the encoder and decoder of the U-Net architecture. The MVA framework refines feature representations by focusing on lesion-related regions, reducing redundancy, and improving target localization. Additionally, the MSI module, incorporated at the bottleneck layer, captures scale-specific features, enabling accurate segmentation of tumor boundaries across varying receptive fields. By combining the MVA framework and MSI module, MVSI-Net effectively integrates attention mechanisms and cross-dimensional feature interactions, enabling precise lesion localization and improving semantic segmentation accuracy for MRI brain tumors.</p>
<p>In [<xref ref-type="bibr" rid="ref-32">32</xref>], DATTNet, a segmentation model designed with deep blocks such as the Dual Attention Module (DAM) and Context Fusion Bridge, is proposed to enhance medical image segmentation. The encoder of DATTNet consists of six stages, each employing a VGG16 sub-block (Conv1&#x2013;Conv6) to progressively extract multi-scale features. The feature maps generated at each stage, containing local information from VGG16, are processed by the DAM, which integrates both Efficient Channel Attention and Spatial Attention to capture global and local feature dependencies. This dual attention mechanism allows the network to focus on relevant features while minimizing redundancy. The Context Fusion Bridge, positioned between the fourth and fifth stages of the encoder and decoder, models correlations between multi-scale features, enabling the fusion of global and local contextual information. To ensure effective integration, the context fusion bridge uses residual addition. Additionally, the decoder incorporates an up-sampling module that doubles the spatial resolution of feature maps while reducing the number of channels, preserving spatial details during reconstruction. By combining these deep network blocks, DATTNet effectively enhances feature representation and achieves high segmentation accuracy in medical image analysis.</p>
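<p>The serial channel-then-spatial attention pattern used by DAM (and by CBAM-style modules generally) can be sketched in NumPy; the parameter-free gates below are deliberate simplifications of the learned attention maps:</p>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(x):
    """Weight each channel by a gate built from its global average response."""
    gate = sigmoid(x.mean(axis=(1, 2)))         # (C,) per-channel gates
    return x * gate[:, None, None]

def spatial_attention(x):
    """Weight each location by a gate built from the cross-channel mean."""
    gate = sigmoid(x.mean(axis=0))              # (H, W) per-location gates
    return x * gate[None, :, :]

def dual_attention(x):
    """Apply channel then spatial attention in series."""
    return spatial_attention(channel_attention(x))

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 16, 16))            # (channels, H, W)
y = dual_attention(x)
print(y.shape)
```

<p>Channel gating decides which feature maps matter; spatial gating then decides where within them to look, so the two stages suppress redundancy along complementary axes.</p>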
<p>IEA-Net is designed to extract both internal and external correlation features from medical images, significantly improving semantic segmentation performance while minimizing computational complexity [<xref ref-type="bibr" rid="ref-33">33</xref>]. The architecture integrates several advanced deep network modules to optimize feature representation. Initially, the input tensor undergoes layer normalization and is processed by the Local-Global Gaussian Weighted Self-Attention (LGGW-SA) module, which prioritizes local regions over distant ones to enhance model performance and reduce computational overhead. The output of LGGW-SA is combined with the input tensor through a skip connection, forming an intermediate feature map. This intermediate map is further refined by the external attention (EA) module, which strengthens inter-sample correlations, producing a second intermediate feature map. A subsequent skip connection merges these intermediate maps to generate the final output of the IEAM module. To prevent feature loss during initial feature extraction, the ICSwR (&#x201C;interleaved convolutional system with residual&#x201D;) module is employed, which offers improved performance compared to conventional convolution operations, playing a critical role in maintaining segmentation accuracy. The EA module, placed after LGGW-SA, enhances the model&#x2019;s capability to capture inter-sample correlations, and its absence results in significant performance degradation, underscoring its importance. By combining these specialized modules, i.e., the IEAM, LGGW-SA, ICSwR, and EA, IEA-Net effectively focuses on essential features, improving segmentation accuracy across multiple datasets. This comprehensive approach balances feature extraction, attention mechanisms, and computational efficiency, setting a new benchmark for medical image semantic segmentation models.</p>
<p>DMSA-UNet, proposed in [<xref ref-type="bibr" rid="ref-34">34</xref>], is a U-shaped architecture that integrates CNNs and Transformers to enhance segmentation performance. The model introduces a Dual Multi-Scale Attention (DMSA) mechanism that improves global attention while maintaining computational efficiency. DMSA leverages multi-scale keys and values to capture richer feature representations, followed by multi-scale spatial attention and multi-scale channel attention to facilitate comprehensive spatial and channel interactions. These mechanisms operate with linear complexity while preserving critical spatial information, ensuring an optimal balance between feature diversity and computational efficiency.</p>
<p>Additionally, DMSA-UNet replaces the context-gated linear unit with a feed-forward network, enabling non-linear representations with localized attention, which further refines feature extraction. Unlike Swin-UNet [<xref ref-type="bibr" rid="ref-10">10</xref>], DMSA-UNet eliminates the deepest convolutional block in the U-Net architecture, reducing noise and enhancing segmentation accuracy. By combining CNNs with Transformer-based multi-scale attention and integrating DMSA into the U-Net framework, DMSA-UNet effectively improves semantic segmentation performance, particularly in capturing fine-grained details while maintaining spatial consistency.</p>
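<p>The linear-complexity trick behind multi-scale keys and values can be sketched generically: keys and values are pooled at several scales, so each query attends to a much shorter sequence. The sketch below is a simplified stand-in for this idea, assuming shared key/value projections and simple average pooling; it is not the exact formulation of [<xref ref-type="bibr" rid="ref-34">34</xref>].</p>

```python
import numpy as np

def pooled_attention(q, kv, pool):
    """Attention with average-pooled keys/values: cost drops from
    O(N^2) to O(N * N/pool), the generic mechanism behind
    multi-scale K/V attention (simplified: K = V = pooled input)."""
    n, d = kv.shape
    m = n // pool
    kv_small = kv[: m * pool].reshape(m, pool, d).mean(axis=1)  # (m, d) pooled K = V
    scores = q @ kv_small.T / np.sqrt(d)                        # (N, m)
    scores = np.exp(scores - scores.max(axis=1, keepdims=True))
    scores /= scores.sum(axis=1, keepdims=True)
    return scores @ kv_small                                    # (N, d)

rng = np.random.default_rng(1)
x = rng.standard_normal((64, 8))    # 64 "pixels", 8 channels
# average two scales, mimicking multi-scale keys/values
out = 0.5 * pooled_attention(x, x, pool=2) + 0.5 * pooled_attention(x, x, pool=4)
print(out.shape)  # (64, 8)
```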
<p><bold><italic>&#x2013;&#x2002;Residual-Enhanced Blocks</italic></bold></p>
<p>In [<xref ref-type="bibr" rid="ref-30">30</xref>], Attention-Inception-Residual U-Net (AIR-UNet) is proposed to address the challenges posed by the variability in tumor characteristics across different imaging modalities, particularly for MRI brain tumor segmentation. AIR-UNet enhances feature propagation and accelerates network convergence by incorporating Inception and Residual blocks into the U-Net architecture. These blocks facilitate the extraction of complex tumor features while maintaining a deep and efficient network. To further refine the segmentation, an attention mechanism is introduced, enabling the model to focus on critical tumor regions, thereby improving segmentation accuracy. AIR-UNet demonstrates superior feature propagation capabilities, effectively handling the vanishing gradient problem and enhancing segmentation performance across key tumor regions, including the whole tumor, tumor core, and enhancing tumor.</p>
<p><xref ref-type="table" rid="table-2">Table 2</xref> provides a comparative overview of deep block strategies employed in recent U-Net-based models designed for medical image semantic segmentation. Each model introduces distinct architectural components, ranging from inception-residual and dense-inception blocks to hybrid CNN-Transformer frameworks, to enhance representation learning. The integration of advanced attention mechanisms, such as cross-dimensional, self-attention, and internal-external correlation learning, facilitates more precise spatial and semantic feature extraction. These innovations collectively improve segmentation accuracy, particularly in delineating complex structures like lesions and tumors, while also addressing challenges related to computational efficiency, feature fusion, and multi-scale context preservation.</p>
<table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>Comparative overview of deep block strategies for medical image semantic segmentation</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th align="center">Model</th>
<th align="center">Key innovations</th>
<th align="center">Architecture</th>
<th align="center">Attention mechanisms</th>
<th align="center">Feature enhancements</th>
</tr>
</thead>
<tbody>
<tr>
<td>PAMSNet [<xref ref-type="bibr" rid="ref-14">14</xref>]</td>
<td>Combination of EPSA &#x0026; SPCA modules</td>
<td>U-Net variant</td>
<td>Weighted feature fusion</td>
<td>Enhances fine-grained spatial &#x0026; semantic feature extraction focusing on lesion features</td>
</tr>
<tr>
<td>CI-UNet [<xref ref-type="bibr" rid="ref-27">27</xref>]</td>
<td>Four-branch interactive attention module</td>
<td>ConvNeXt-based U-Net</td>
<td>Cross-dimensional attention</td>
<td>Captures global spatial context while maintaining computational efficiency and overcomes attention gaps</td>
</tr>
<tr>
<td>DIU-Net [<xref ref-type="bibr" rid="ref-29">29</xref>]</td>
<td>Inception-Res and Dense-Inception blocks</td>
<td>U-Net with Inception &#x0026; DenseNet</td>
<td>Multi-scale feature aggregation</td>
<td>Prevents gradient vanishing, enhances computational efficiency across multiple modalities</td>
</tr>
<tr>
<td>AIR-UNet [<xref ref-type="bibr" rid="ref-30">30</xref>]</td>
<td>Inception and Residual blocks with attention</td>
<td>U-Net variant</td>
<td>Attention-based refinement</td>
<td>Enhances feature propagation &#x0026; network convergence for MRI brain tumors</td>
</tr>
<tr>
<td>MVSI-Net [<xref ref-type="bibr" rid="ref-31">31</xref>]</td>
<td>Combination of MVA &#x0026; MSI modules</td>
<td>U-Net variant</td>
<td>Lesion-focused attention</td>
<td>Improves boundary delineation and segmentation accuracy with cross-dimensional interactions</td>
</tr>
<tr>
<td>DATTNet [<xref ref-type="bibr" rid="ref-32">32</xref>]</td>
<td>DAM &#x0026; Context Fusion Bridge Combination</td>
<td>VGG16-based encoder</td>
<td>Combination of ECA &#x0026; SA</td>
<td>Captures local and global multi-scale feature dependencies</td>
</tr>
<tr>
<td>IEA-Net [<xref ref-type="bibr" rid="ref-33">33</xref>]</td>
<td>Combination of LGGW-SA &#x0026; EA modules</td>
<td>Custom network</td>
<td>Internal-external correlation learning</td>
<td>Reduces computational complexity while improving accuracy with inter-sample feature learning</td>
</tr>
<tr>
<td>DMSA-UNet [<xref ref-type="bibr" rid="ref-34">34</xref>]</td>
<td>DMSA</td>
<td>CNN-Transformer hybrid</td>
<td>Combination of MSSA &#x0026; MSCA</td>
<td>Preserves spatial information while improving global feature extraction with multi-scale attention</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s2_3_4">
<label>2.3.4</label>
<title>Skip Connections</title>
<p>Skip connection methods in deep learning can be broadly categorized into traditional, enhanced, attention-enhanced, transformer-integrated, advanced, and efficient designs. Traditional skip connections maintain spatial information by linking encoder and decoder layers (e.g., U-Net, U-Net&#x002B;&#x002B;), while enhanced versions like Redesigned Full-Scale Skip Connections (RFSC) capture both fine-grained and coarse-grained features. Attention-based approaches, such as non-local and Mamba-based skip connections, focus on important features and suppress noise. Transformer integration, including self-attention mechanisms and feature integration blocks, improves long-range dependencies and feature fusion. Advanced designs incorporate up-sampling, down-sampling, and multi-level skip connections for refined information flow, while efficient strategies reduce computational complexity through dimensionality reduction and parallel operations, optimizing performance without excessive computational cost.</p>
<p><bold><italic>&#x2013;&#x2002;Dense-Enhanced Skip Connections</italic></bold></p>
<p>The SenseNet architecture integrates dense blocks with skip connections to enhance neural network efficiency by reducing computational overhead and memory consumption [<xref ref-type="bibr" rid="ref-35">35</xref>]. By establishing direct connections between early and later layers, SenseNet mitigates exponential parameter growth, facilitates more effective training, and accelerates inference. Within the decoder pathway, deeper feature extraction minimizes reliance on early-layer representations, optimizing hierarchical feature learning. To further regulate feature map complexity and prevent information loss, SenseNet incorporates a DenseNet-BC structure, leveraging bottleneck skip connections inspired by DenseBlock [<xref ref-type="bibr" rid="ref-95">95</xref>]. This design strategy ensures efficient memory utilization while maintaining robust feature propagation throughout the network.</p>
<p>A dense skip connection mechanism [<xref ref-type="bibr" rid="ref-37">37</xref>] is introduced in the U-Net architecture to enhance the preservation of spatial details typically lost during encoding. Unlike conventional U-Net skip connections, which link each decoder layer to a single corresponding encoder layer, the proposed approach fuses information from both the symmetrical encoder layer and all preceding higher-level encoder layers. This fusion ensures that each decoder layer retains both fine-grained details and high-level semantic features through pixel-wise addition. To align feature map dimensions and channels, max pooling and convolution operations are applied before concatenation with upsampled decoder features. The resulting fused feature maps undergo additional convolutional refinement, improving spatial context retention and facilitating multi-scale feature reuse. Mathematically, these dense skip connections integrate feature representations through a series of convolution, pooling, up-sampling, and concatenation operations, forming the foundation of the proposed multi-scale context-aware network architecture.</p>
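<p>The fusion just described, pooling each shallower encoder map down to the decoder&#x2019;s resolution and combining it pixel-wise with the symmetric encoder feature, can be sketched as follows. This is a simplified NumPy illustration: the channel-aligning convolutions of [<xref ref-type="bibr" rid="ref-37">37</xref>] are omitted, and all feature maps are assumed to share the same channel count.</p>

```python
import numpy as np

def max_pool2d(x, k):
    """Non-overlapping k x k max pooling on an (H, W, C) array."""
    h, w, c = x.shape
    return x[: h // k * k, : w // k * k].reshape(h // k, k, w // k, k, c).max(axis=(1, 3))

def dense_skip_fuse(encoder_feats, target_hw):
    """Fuse the symmetric encoder feature with all shallower
    (higher-resolution) encoder features: pool each map to the target
    resolution, then add pixel-wise (simplified dense skip connection)."""
    th, tw = target_hw
    fused = None
    for f in encoder_feats:
        k = f.shape[0] // th
        g = max_pool2d(f, k) if k > 1 else f
        fused = g if fused is None else fused + g
    return fused

rng = np.random.default_rng(2)
feats = [rng.standard_normal((32, 32, 4)),   # shallow, high-resolution level
         rng.standard_normal((16, 16, 4)),
         rng.standard_normal((8, 8, 4))]     # symmetric encoder level
out = dense_skip_fuse(feats, target_hw=(8, 8))
print(out.shape)  # (8, 8, 4)
```

The fused map would then be concatenated with the upsampled decoder features and refined by further convolutions, as the text describes.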
<p><bold><italic>&#x2013;&#x2002;Skip Connections with Enhanced Transformer Integration</italic></bold></p>
<p>SWTRU, proposed in [<xref ref-type="bibr" rid="ref-39">39</xref>], is a symmetric U-shaped network integrating U-Net and Transformer architectures designed to enhance multi-scale feature fusion for medical image segmentation. It employs a Redesigned Full-Scale Skip Connection (RFSC) to effectively capture both fine-grained and coarse-grained spatial features. The encoder, based on a CNN framework, progressively downsamples input images through repeated convolutions, ReLU activations, and max-pooling. At the bottleneck, feature maps are partitioned into non-overlapping patches and processed using a Star-Shaped Window Transformer Block, enabling global self-attention while minimizing computational complexity. To address the increased parameter burden from RFSC and Transformer components, a Filtering Feature Integration Mechanism (FFIM) is introduced, optimizing efficiency by selectively integrating shallow and deep semantic features. The decoder utilizes a Linear Integration Layer to merge feature representations, restore spatial resolution, and generate precise segmentation outputs. By expanding attention regions and improving feature interactions while reducing parameter complexity, SWTRU presents an efficient and scalable solution for high-accuracy medical image segmentation.</p>
<p>For the proposed SIB-UNet model in [<xref ref-type="bibr" rid="ref-40">40</xref>], a skip connection structure is introduced within the U-Net model to aid the decoder in recovering spatial information lost during pooling. While traditional skip connections help bridge this gap, alternative approaches such as ResPath [<xref ref-type="bibr" rid="ref-96">96</xref>,<xref ref-type="bibr" rid="ref-97">97</xref>] and multi-scale fusion [<xref ref-type="bibr" rid="ref-98">98</xref>,<xref ref-type="bibr" rid="ref-99">99</xref>] have been developed to enhance semantic information transfer. However, these methods often rely on additional convolutional layers, increasing the risk of overfitting, particularly in small medical image datasets. To address this challenge, the information bottleneck fusion module is proposed as a skip connection strategy that selectively compresses features, retaining only the most relevant semantic information and reducing overfitting. Grounded in Information Bottleneck theory, this approach filters out irrelevant features during training, ensuring that only essential information is preserved. Furthermore, the incorporation of the variational information bottleneck module refines feature learning through variational inference, effectively managing high-dimensional data. This method enhances the transfer of meaningful semantic features across network layers, improving performance in medical image analysis while reducing overfitting and semantic inconsistencies.</p>
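<p>A variational information bottleneck typically adds a KL-divergence penalty between the encoded feature distribution and a standard normal prior, sampled via the reparameterization trick. The sketch below shows these two standard ingredients in NumPy; it illustrates the general VIB mechanism, not the specific module of [<xref ref-type="bibr" rid="ref-40">40</xref>].</p>

```python
import numpy as np

def vib_kl(mu, log_var):
    """KL( N(mu, diag(exp(log_var))) || N(0, I) ) per sample: the
    compression term a variational information bottleneck adds to the
    task loss to squeeze out irrelevant features."""
    return 0.5 * np.sum(np.exp(log_var) + mu ** 2 - 1.0 - log_var, axis=1)

def reparameterize(mu, log_var, rng):
    """Sample z = mu + sigma * eps, keeping the bottleneck differentiable."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

rng = np.random.default_rng(3)
mu = rng.standard_normal((4, 8))          # 4 samples, 8 latent dims
log_var = rng.standard_normal((4, 8)) * 0.1
kl = vib_kl(mu, log_var)                  # one non-negative penalty per sample
z = reparameterize(mu, log_var, rng)      # compressed features passed onward
print(kl.shape, z.shape)  # (4,) (4, 8)
```

During training, a weighted sum of this KL term and the segmentation loss trades compression against task accuracy.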
<p>The USCT-UNet architecture [<xref ref-type="bibr" rid="ref-41">41</xref>] extends the traditional U-Net to address the semantic gap between the encoder and decoder in segmentation tasks. Instead of conventional direct skip connections, it introduces a U-shaped skip connection (USC) that leverages multichannel feature transformation (MCFT) to refine feature representations and address semantic inconsistencies. The process begins with an input image passing through the encoder, generating feature maps that are embedded and processed within the USC for semantic disambiguation. Concurrently, the highest-level encoder features undergo pooling and convolution before being fed into the decoder to generate additional feature maps. The decoder then integrates outputs from both the USC and its own layers through a spatial-channel cross-attention module, effectively fusing multiscale features to enhance fine-detail recovery. The severity of the semantic gap is managed by adjusting the number of MCFT blocks within the USC, with a higher number employed for greater semantic disparities. The final segmentation output is obtained by applying a convolution operation to the decoder&#x2019;s output. This approach strengthens feature fusion, improves semantic consistency, and enhances segmentation accuracy.</p>
<p><bold><italic>&#x2013;&#x2002;Mamba-Based and U-Shaped-Based for Attention-Enhanced Skip Connections</italic></bold></p>
<p>A Mamba-based skip-connection approach is introduced in [<xref ref-type="bibr" rid="ref-42">42</xref>], leveraging Mamba&#x2019;s capability for long-sequence feature learning within the UNet&#x002B;&#x002B; framework to enhance both high- and low-level feature extraction. Unlike traditional parallel Mamba operations, this method, termed SK-VM&#x002B;&#x002B;, integrates skip connections into UNet&#x002B;&#x002B; using the parallel vision Mamba (PVM) layer. This modification significantly reduces the computational burden, achieving an 86.90% reduction in floating-point operations (FLOPs [<xref ref-type="bibr" rid="ref-100">100</xref>]) and a 79.01% decrease in parameters compared to the original UNet&#x002B;&#x002B; architecture. The PVM layer, central to SK-VM&#x002B;&#x002B;, partitions input features into smaller channels, optimizing computational efficiency as channel numbers increase. The architecture follows a multi-stage design, where each stage comprises multiple PVM layers, and lower-stage features are fused with upsampled features from higher stages. This hierarchical integration not only alleviates computational overhead but also improves segmentation accuracy. Further refinement is achieved through multi-scale supervised learning with a LossNet model [<xref ref-type="bibr" rid="ref-101">101</xref>], which enhances performance by adapting to varying lesion sizes in medical images. Ultimately, SK-VM&#x002B;&#x002B; presents a lightweight yet effective solution for medical image segmentation, balancing computational efficiency with improved segmentation precision.</p>
<p>A hybrid model integrating U-Net and Mask R-CNN is proposed in [<xref ref-type="bibr" rid="ref-43">43</xref>] for brain MRI semantic and instance segmentation. The model incorporates skip connections within a symmetric encoder-decoder structure, similar to the U-Net architecture, to capture both global semantic information and fine-grained feature details essential for accurately identifying small, irregularly shaped tumors. The U-Net component is optimized for semantic segmentation by fusing low-level encoder features with high-level decoder features, enabling precise delineation of the tumor core even in complex cases. Additionally, the Mask R-CNN framework [<xref ref-type="bibr" rid="ref-54">54</xref>], utilizing a region proposal network block with a pre-trained ResNet-50 backbone [<xref ref-type="bibr" rid="ref-102">102</xref>], is employed for instance segmentation. This component generates pixel-wise tumor and edema segmentations, assigning class labels and confidence scores while effectively distinguishing tumors from overlapping background tissues. This dual-architecture approach enhances segmentation accuracy by combining the strengths of semantic and instance segmentation techniques.</p>
<p>UTSN-Net, a U-Net-based model introduced in [<xref ref-type="bibr" rid="ref-44">44</xref>], enhances feature extraction and semantic segmentation by integrating convolutional operations with a deep-layer encoder and a skip non-local attention (SN) module. During encoding, convolutional layers extract low-level features with high-resolution contextual information, which are processed by the SN module to suppress noise while preserving spatial accuracy. A deep Transformer mechanism further enhances feature representation by capturing global context, which is then integrated into deeper feature maps. These globally enriched deep features are combined with shallow, high-resolution features through up-sampling and concatenation operations. The SN module, built on a non-local attention mechanism, refines skip connections by applying attention weights to emphasize critical features and suppress irrelevant information. Within this module, shallow feature maps undergo 1 &#x00D7; 1 convolution to generate query, key, and value matrices, which are used to compute attention scores. These scores capture pixel-wise correlations across the feature map, and their weighted values are used to produce an attention-enhanced feature representation. The resulting feature map, containing both fine-grained spatial details and high-level semantic information, is fused with deeper network features, improving segmentation accuracy by enhancing focus on the region of interest.</p>
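<p>The non-local attention computation described for the SN module, 1&#x00D7;1 convolutions producing query, key, and value maps, pixel-wise attention scores, and a weighted sum of values, can be sketched as follows. On flattened (N, C) features, a 1&#x00D7;1 convolution is just a channel-mixing matrix; the weight shapes and the residual connection here are illustrative assumptions, not the exact design of [<xref ref-type="bibr" rid="ref-44">44</xref>].</p>

```python
import numpy as np

def sn_attention(feat, wq, wk, wv):
    """Non-local attention sketch in the spirit of the SN module:
    channel-mixing matrices stand in for 1x1 convolutions, and
    pixel-wise attention scores re-weight values across the whole map."""
    h, w, c = feat.shape
    x = feat.reshape(h * w, c)
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[1])             # (HW, HW) pixel-wise correlations
    scores = np.exp(scores - scores.max(axis=1, keepdims=True))
    scores /= scores.sum(axis=1, keepdims=True)        # softmax attention weights
    out = scores @ v                                   # attention-enhanced features
    return (x + out).reshape(h, w, c)                  # residual keeps the original detail

rng = np.random.default_rng(4)
f = rng.standard_normal((8, 8, 6))                     # shallow feature map
w = [rng.standard_normal((6, 6)) * 0.1 for _ in range(3)]
out = sn_attention(f, *w)
print(out.shape)  # (8, 8, 6)
```

The O((HW)&#x00B2;) score matrix is what makes non-local attention expensive at high resolution, which is why it is applied to skip connections rather than every layer.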
<p><xref ref-type="table" rid="table-3">Table 3</xref> presents a comparative analysis of recent deep learning models that integrate advanced skip connection designs and feature fusion strategies for medical image semantic segmentation. These models build upon the foundational U-Net architecture by incorporating components such as dense skip connections, variational information bottlenecks (VIB), and multi-scale attention mechanisms to enhance spatial detail preservation, semantic consistency, and overall segmentation accuracy. From hybrid architectures like U-Net &#x002B; Mask R-CNN to transformer-integrated designs such as UTSN-Net and SWTRU, the focus lies in improving feature transfer across encoder-decoder paths while minimizing computational cost. These architectural innovations not only improve model efficiency but also ensure robust performance in segmenting complex anatomical structures.</p>
<table-wrap id="table-3">
<label>Table 3</label>
<caption>
<title>Analysis of skip connection and feature fusion strategies</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th align="center">Model</th>
<th align="center">Key components</th>
<th align="center">Purpose</th>
<th align="center">Advantages</th>
<th align="center">Limitations</th>
</tr>
</thead>
<tbody>
<tr>
<td>SenseNet [<xref ref-type="bibr" rid="ref-35">35</xref>]</td>
<td>Dense skip connections</td>
<td>Reduces computational overhead and memory usage</td>
<td>Prevents exponential parameter growth, enhances training, and speeds up inference</td>
<td>May underperform on complex structures due to reduced early-layer dependency</td>
</tr>
<tr>
<td>U-Net [<xref ref-type="bibr" rid="ref-37">37</xref>]</td>
<td>Dense skip connections</td>
<td>Preserves detailed spatial information during encoding</td>
<td>Enhances spatial context preservation, enables multi-level feature reuse in U-Net-based model</td>
<td>Increases memory usage due to feature fusion from multiple layers</td>
</tr>
<tr>
<td>SIB-UNet [<xref ref-type="bibr" rid="ref-40">40</xref>]</td>
<td>IB skip connections with VIB</td>
<td>Enhances semantic information transfer and prevents overfitting</td>
<td>Reduces redundant features, optimizes feature learning, controls overfitting</td>
<td>Increases risk of information loss due to aggressive feature compression</td>
</tr>
<tr>
<td>USCT-UNet [<xref ref-type="bibr" rid="ref-41">41</xref>]</td>
<td>USC &#x0026; MCFT</td>
<td>Reduces semantic gap between encoder and decoder</td>
<td>Improves feature fusion and segmentation accuracy with spatial-channel cross-attention</td>
<td>Complex integration mechanism increases model size and training complexity</td>
</tr>
<tr>
<td>SK-VM&#x002B;&#x002B; [<xref ref-type="bibr" rid="ref-42">42</xref>]</td>
<td>PVM &#x0026; multi-stage feature fusion</td>
<td>Reduces computational complexity while maintaining segmentation accuracy</td>
<td>Mamba-based U-Net&#x002B;&#x002B; to reduce parameter growth</td>
<td>Complexity of Mamba layer integration and reduced interpretability in lesion-specific tuning</td>
</tr>
<tr>
<td>U-Net &#x002B; Mask R-CNN [<xref ref-type="bibr" rid="ref-43">43</xref>]</td>
<td>ResNet-50-based skip connections</td>
<td>Brain MRI semantic and instance segmentation</td>
<td>Captures fine-grained details in hybrid U-Net &#x002B; Mask R-CNN while enabling pixel-wise classification with confidence scores</td>
<td>Requires dual training pipelines and more GPU memory</td>
</tr>
<tr>
<td>UTSN-Net [<xref ref-type="bibr" rid="ref-44">44</xref>]</td>
<td>Skip non-local attention &#x0026; deep Transformer operations</td>
<td>Improves feature extraction and segmentation performance</td>
<td>Emphasizes key features, suppresses noise, and enhances global context awareness</td>
<td>High computational cost and potential overfitting in low-data scenarios</td>
</tr>
<tr>
<td>SWTRU [<xref ref-type="bibr" rid="ref-39">39</xref>]</td>
<td>RFSC, Star-shaped Window Transformer, FFIM &#x0026; Linear Integration Layer</td>
<td>Improves feature fusion across multiple scales while reducing computational complexity</td>
<td>Enhances multi-scale feature integration, expands attention areas, optimizes computational efficiency</td>
<td>Star-shaped attention may limit full global context modeling in highly irregular shapes</td>
</tr>
</tbody>
</table>
</table-wrap>
  
<p><bold><italic>&#x2013;&#x2002;Discussion and Insights</italic></bold></p>
<p>This section has provided a comprehensive review of multi-scale feature representation strategies across convolutional, shallow, deep, and skip-connection modules; a balanced analysis, however, must also weigh several key trade-offs. Dilated convolutions and ASPP modules effectively expand the receptive field without increasing computational load but may suffer from gridding artifacts and lose fine details. Hybrid CNN-Transformer models like CSAP-UNet in [<xref ref-type="bibr" rid="ref-18">18</xref>] capture both local and global dependencies, offering superior context modeling; however, they typically require more memory and complex training strategies. Attention-based mechanisms, e.g., SE, CBAM, AFM, improve feature discrimination but can introduce redundancy or overfitting, especially in small datasets. Shallow block supplements like FSOU-Net preserve boundary precision but may lack high-level semantics if not fused effectively. Deep blocks with dense or inception connections enhance gradient flow and feature reuse, yet they increase model depth and may hinder real-time performance. Advanced skip connections, e.g., RFSC, VIB, SN-attention, ensure effective feature transfer across scales but may complicate network optimization due to increased parameterization. Collectively, these methods present valuable strategies to overcome scale variability in medical images, but model selection should consider computational cost, dataset size, and clinical application requirements.</p>
<p>Enhanced skip connections, such as dense skip connections and attention-based fusion mechanisms, significantly contribute to semantic consistency and feature reuse in medical image segmentation. Dense skip connections link not only corresponding encoder and decoder layers but also multiple preceding layers, enabling the network to reuse features across scales and improve gradient flow. This facilitates better integration of low-level spatial details with high-level semantic information, preserving fine-grained structures like lesion edges or small anatomical features. Attention-based fusions further refine this process by selectively weighting the importance of transferred features, ensuring that only the most relevant spatial and channel-wise information is emphasized. These enhancements help the network maintain semantic coherence throughout the decoding process, reduce information loss during downsampling, and improve segmentation accuracy, particularly in complex or low-contrast biomedical images. As a result, they enable more reliable delineation of boundaries and better generalization across diverse imaging conditions.</p>
<p>Although the classification of shallow, deep, and skip connection blocks highlights the architectural diversity of U-Net-based models, recent trends indicate a shift toward hybrid architectures that integrate Transformer-based modules with traditional convolutional backbones. Convolutional Neural Networks (CNNs) such as U-Net and DenseNet are well-suited for local feature extraction and are computationally efficient, making them ideal for real-time applications and high-resolution medical imaging in resource-limited settings. However, CNNs inherently struggle to model long-range dependencies, which are essential for capturing global anatomical context, particularly in whole-organ or complex tissue analysis.</p>
<p>In contrast, Transformer-based models like CI-UNet [<xref ref-type="bibr" rid="ref-27">27</xref>], IEA-Net [<xref ref-type="bibr" rid="ref-33">33</xref>], and DMSA-UNet [<xref ref-type="bibr" rid="ref-34">34</xref>] offer superior global context modeling and scale-aware attention mechanisms, improving segmentation accuracy in tasks such as brain tumor localization and whole-slide image analysis. Despite these advantages, Transformers introduce significant computational and memory overhead, limiting their feasibility in low-resource environments or edge devices. Similarly, attention mechanisms, e.g., SE, CBAM, AFM, enhance feature relevance and boundary precision, but may lead to overfitting and redundancy, particularly in small biomedical datasets. Their added complexity must be carefully balanced against the marginal gains in accuracy.</p>
<p>Ultimately, the integration of Transformers and attention mechanisms into U-Net-like architectures enhances both precision and robustness, especially in challenging conditions such as low contrast, variable lesion morphology, or overlapping structures. The synergy between U-Net&#x2019;s skip connections, which preserve fine-grained spatial information, and Transformer modules, which model broader semantic dependencies, results in more accurate and clinically relevant segmentation outcomes. These hybrid models hold great promise for tasks like diagnosis, treatment planning, and disease monitoring, though their deployment must account for task-specific requirements and computational constraints.</p>
</sec>
</sec>
<sec id="s2_4">
<label>2.4</label>
<title>Deep Annotation/Segmentation Models in Histological and Tissue Imaging</title>
<p>Accurate annotation and segmentation in tissue imaging are critical for elucidating cellular organization, functional architecture, and anatomical structures, particularly in multiplexed microscopy and medical imaging. Conventional deep learning models, including U-Net and DeepCell, primarily employ semantic segmentation, which assigns class labels to individual pixels but fails to distinguish between separate object instances. To overcome this limitation, instance segmentation techniques have been developed, enabling cell- or organ-level delineation while preserving spatial relationships. Advances such as Mesmer have demonstrated notable progress by offering a robust segmentation framework alongside large-scale annotated datasets like TissueNet. Despite these advances, the reliance on extensive manual annotations remains a significant bottleneck. This has motivated the exploration of alternative strategies that leverage weak, sparse, or incomplete labels to enhance model generalizability and reduce the burden of exhaustive annotation.</p>
<p><bold><italic>&#x2013;&#x2002;Multiplexed Tissue Images</italic></bold></p>
<p>A human-in-the-loop approach was employed in [<xref ref-type="bibr" rid="ref-50">50</xref>] to annotate a large-scale dataset, wherein the outputs from a deep learning model were iteratively corrected by human experts and fed back into the model for further refinement. Multiplexed imaging plays a critical role in spatial profiling of biological components at the cellular level [<xref ref-type="bibr" rid="ref-103">103</xref>]. However, extracting meaningful information from such images requires precise instance segmentation of individual cells to enable accurate feature extraction. In this context, a deep learning model trained on a diverse dataset such as TissueNet proves highly effective. For multiplexed tissue images, instance segmentation is essential for delineating boundaries of individual cell instances. Using the TissueNet dataset, a deep learning-based model called Mesmer was developed to perform whole-cell and nuclear segmentation. Mesmer is built upon a ResNet50 [<xref ref-type="bibr" rid="ref-104">104</xref>] backbone integrated with a Feature Pyramid Network (FPN) [<xref ref-type="bibr" rid="ref-94">94</xref>], enabling it to predict both nuclear and whole-cell masks. Input images consist of two channels: one representing the nuclear signal and another corresponding to the cytoplasmic or membrane signal. These channels are normalized and processed by the model to generate spatial maps indicating centroids and boundaries of cells and nuclei. These spatial outputs serve as inputs to a watershed algorithm [<xref ref-type="bibr" rid="ref-105">105</xref>], which subsequently generates instance segmentation masks for each cell and nucleus in the image. Notably, the deep learning model does not directly output the final instance masks but rather provides spatial cues that guide the segmentation process. 
This approach is particularly valuable for downstream analyses, where extracted cell-level features from multiplexed images can be projected into low-dimensional spaces for phenotypic profiling and quantitative assessment of biological samples [<xref ref-type="bibr" rid="ref-103">103</xref>].</p>
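<p>The post-processing step described above, turning model-predicted spatial maps into per-cell instance masks, can be illustrated with a toy NumPy/SciPy sketch. Mesmer feeds centroid and boundary maps into a watershed; here, connected-component labeling of a thresholded interior-probability map serves as a deliberately simplified stand-in, adequate only when cells are well separated.</p>

```python
import numpy as np
from scipy import ndimage

def instance_masks_from_maps(interior_prob, threshold=0.5):
    """Convert an interior-probability map into integer instance labels.
    A watershed seeded by predicted centroids (as in Mesmer) handles
    touching cells; connected-component labeling is a simplified proxy."""
    labels, n = ndimage.label(interior_prob > threshold)
    return labels, n

# toy "interior" map with two well-separated cells
prob = np.zeros((10, 10))
prob[1:4, 1:4] = 0.9
prob[6:9, 6:9] = 0.8
labels, n = instance_masks_from_maps(prob)
print(n)  # 2
```

Each positive integer in `labels` then identifies one cell instance, from which per-cell features can be extracted for downstream phenotypic profiling.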
<p><bold><italic>&#x2013;&#x2002;Segmentation Using Weakly Annotated Datasets</italic></bold></p>
<p>Numerous deep learning algorithms have demonstrated effectiveness in cell segmentation, but typically require substantial quantities of high-quality annotated data to achieve optimal performance. This requirement becomes particularly challenging, and costly, when annotations must delineate individual cell instances. To address the annotation burden, several studies have explored unsupervised [<xref ref-type="bibr" rid="ref-106">106</xref>] and weakly supervised learning strategies [<xref ref-type="bibr" rid="ref-107">107</xref>,<xref ref-type="bibr" rid="ref-108">108</xref>]. Unsupervised methods such as [<xref ref-type="bibr" rid="ref-106">106</xref>] have shown performance comparable to state-of-the-art approaches like CellPose [<xref ref-type="bibr" rid="ref-109">109</xref>] and Mesmer [<xref ref-type="bibr" rid="ref-50">50</xref>] in nuclei segmentation. However, their effectiveness varies across datasets, particularly when extended to broader cell segmentation tasks, as measured by F1-score comparisons with CellPose and Mesmer. Weakly supervised methods [<xref ref-type="bibr" rid="ref-107">107</xref>,<xref ref-type="bibr" rid="ref-108">108</xref>], though less annotation-intensive than fully supervised counterparts, still require spatial cues such as centroids or bounding boxes, which are time-consuming to generate at scale. To mitigate these challenges, the authors in [<xref ref-type="bibr" rid="ref-110">110</xref>] proposed an approach leveraging image-level segmentations alongside location-of-interest annotations for individual cells, striking a balance between annotation efficiency and segmentation accuracy.</p>
<p>In [<xref ref-type="bibr" rid="ref-110">110</xref>], the Location Assisted Cell Segmentation System (LACSS) is introduced: a network architecture designed to balance annotation efficiency with segmentation accuracy. LACSS builds upon a Fully Convolutional Network (FCN) framework [<xref ref-type="bibr" rid="ref-111">111</xref>], employing an encoder&#x2013;decoder backbone to extract hierarchical features, which are then passed to a Location Proposal Network (LPN). The LPN predicts locations of interest (LOIs) for individual cells, though it does not estimate object sizes because no size annotations are available. A subsequent segmentation FCN module generates the single-cell segmentations. To improve computational efficiency, segmentation is restricted to localized regions surrounding each LOI, under the assumption that distant pixels are unlikely to belong to the target cell. While LACSS is optimized for datasets with sparse or incomplete annotations, it can also be configured for fully supervised learning. In the supervised setting, the total loss comprises the LPN loss, which quantifies the discrepancy between predicted and ground-truth LOIs, and the segmentation loss. For weakly supervised training, the model combines the LPN loss with a weak-supervision objective that enforces consistency between the image-level and cell-level segmentations, enabling robust performance under limited annotation regimes.</p>
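<p>As a minimal illustrative sketch (not the LACSS implementation from [<xref ref-type="bibr" rid="ref-110">110</xref>]), the weakly supervised objective described above can be written as a localization term plus a consistency term. The Gaussian heatmap target, the mean-squared-error form, and the weighting factor <italic>lam</italic> are assumptions made for illustration:</p>

```python
import numpy as np

def lpn_loss(pred_heatmap, gt_points, sigma=2.0):
    """Localization term: compare the predicted LOI heatmap against a
    Gaussian target rendered from the annotated cell locations (MSE)."""
    target = np.zeros(pred_heatmap.shape, dtype=float)
    yy, xx = np.mgrid[:pred_heatmap.shape[0], :pred_heatmap.shape[1]]
    for r, c in gt_points:
        target += np.exp(-((yy - r) ** 2 + (xx - c) ** 2) / (2 * sigma ** 2))
    target = np.clip(target, 0.0, 1.0)
    return float(np.mean((pred_heatmap - target) ** 2))

def consistency_loss(cell_masks, image_mask):
    """Weak-supervision term: the union of predicted single-cell masks
    should agree with the image-level foreground segmentation."""
    union = np.clip(np.sum(cell_masks, axis=0), 0.0, 1.0)
    return float(np.mean((union - image_mask) ** 2))

def weakly_supervised_loss(pred_heatmap, gt_points, cell_masks, image_mask, lam=1.0):
    """Total weakly supervised objective: LPN loss + weighted consistency."""
    return lpn_loss(pred_heatmap, gt_points) + lam * consistency_loss(cell_masks, image_mask)
```

<p>In the fully supervised setting, the consistency term would be replaced by a conventional segmentation loss against per-cell ground-truth masks.</p>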
<p>To evaluate segmentation performance across anatomical structures, the Dice similarity coefficient distributions reported in [<xref ref-type="bibr" rid="ref-112">112</xref>] are analyzed by organ and model type. The boxplot in <xref ref-type="fig" rid="fig-3">Fig. 3</xref> illustrates the organ-wise variability in segmentation accuracy for the two models, i.e., SAM and MedSAM, applied to abdominal CT images. Overall, the aorta and liver exhibited higher median Dice scores, indicating relatively consistent and accurate segmentation across slices, whereas the kidneys and spleen showed greater interquartile spread and lower median performance. This variability may reflect challenges associated with organ boundary delineation, anatomical variability, or contrast heterogeneity. Notably, both models demonstrated competitive performance across most organs, suggesting robust generalization in multi-organ segmentation tasks. These findings underscore the importance of organ-specific evaluation when benchmarking segmentation models and highlight potential areas for improvement in anatomical precision, particularly for smaller or morphologically complex structures.</p>
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Organ-wise distribution of Dice similarity coefficients for different semantic segmentation models, i.e., SAM and MedSAM. Boxplots illustrate the variability in segmentation accuracy across spleen, kidneys (right and left), liver, and aorta</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_67915-fig-3.tif"/>
</fig>
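<p>The organ-wise distributions above rest on the standard Dice similarity coefficient, 2|A&#x2229;B|/(|A|+|B|). A minimal sketch of the per-organ computation follows; the organ label mapping is a hypothetical example, not the labeling used in [<xref ref-type="bibr" rid="ref-112">112</xref>]:</p>

```python
import numpy as np

def dice_coefficient(pred, gt):
    """Dice = 2|A∩B| / (|A| + |B|); defined as 1.0 when both masks are empty."""
    inter = np.logical_and(pred, gt).sum()
    denom = pred.sum() + gt.sum()
    return 1.0 if denom == 0 else 2.0 * inter / denom

def organwise_dice(pred_labels, gt_labels, organ_ids):
    """Per-organ Dice over labeled maps; organ_ids maps organ name -> label value."""
    return {name: dice_coefficient(pred_labels == lab, gt_labels == lab)
            for name, lab in organ_ids.items()}
```

<p>Collecting these per-slice scores across a test set yields the per-organ distributions visualized in the boxplot.</p>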
</sec>
</sec>
<sec id="s3">
<label>3</label>
<title>Databases</title>
<p>Collaborations among clinical, academic, and industry stakeholders play a pivotal role in advancing innovation within the medical imaging field. High-profile computer vision challenges, such as KUMAR [<xref ref-type="bibr" rid="ref-113">113</xref>], CHAOS [<xref ref-type="bibr" rid="ref-114">114</xref>], CVC-ClinicDB [<xref ref-type="bibr" rid="ref-115">115</xref>], and MoNuSeg [<xref ref-type="bibr" rid="ref-113">113</xref>], which provide standardized datasets and often monetary incentives for competitive analysis, are accelerating large-scale benchmarking and spurring algorithmic innovation. In parallel, universities and hospitals are increasingly releasing annotated datasets across various organ systems to support research efficiency and foster progress in the field. The growing availability of multi-organ datasets derived from clinical imaging modalities like CT, MRI, and ultrasound significantly reduces the barriers to entry for clinically relevant tasks such as tumor segmentation, e.g., BRATS [<xref ref-type="bibr" rid="ref-116">116</xref>], GlaS [<xref ref-type="bibr" rid="ref-117">117</xref>], BUSI [<xref ref-type="bibr" rid="ref-118">118</xref>], and disease classification, e.g., ADNI [<xref ref-type="bibr" rid="ref-119">119</xref>]. Sample result images from CT datasets, i.e., the MICCAI 2021 FLARE Challenge Dataset [<xref ref-type="bibr" rid="ref-120">120</xref>], segmented using VISTA-3D [<xref ref-type="bibr" rid="ref-121">121</xref>], are presented in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>, illustrating the diversity of abdominal organ structures and variations across different CT scans used in the challenge. By minimizing the need for individual research groups to independently curate data, these shared resources enable more rapid and reproducible experimentation. Further dataset details are provided in <xref ref-type="table" rid="table-4">Table 4</xref>. 
As the importance of cross-dataset generalization grows for real-world clinical deployment, the demand for diverse, high-quality datasets is expected to increase accordingly.</p>
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>Visualization of segmentation outputs from CT Datasets using VISTA-3D</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_67915-fig-4a.tif"/>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_67915-fig-4b.tif"/>
</fig>
<table-wrap id="table-4">
<label>Table 4</label>
<caption>
<title>Overview of datasets by organ type and imaging modality</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th align="center">Dataset</th>
<th align="center">Modality</th>
<th align="center">Organ type</th>
<th align="center">Year</th>
<th align="center">Resolution/Size</th>
</tr>
</thead>
<tbody>
<tr>
<td>ADNI [<xref ref-type="bibr" rid="ref-119">119</xref>]</td>
<td>MRI, PET</td>
<td>Alzheimer&#x2019;s brain scan</td>
<td>2023</td>
<td>160 &#x00D7; 160 &#x00D7; 96</td>
</tr>
<tr>
<td>MICCAI 2021 FLARE Challenge Dataset</td>
<td>CT</td>
<td>Abdominal organs</td>
<td>2021</td>
<td>512 &#x00D7; 512</td>
</tr>
<tr>
<td>MiMM_SBILab [<xref ref-type="bibr" rid="ref-53">53</xref>]</td>
<td>Microscopy</td>
<td>Bone marrow</td>
<td>2019</td>
<td>2560 &#x00D7; 1920</td>
</tr>
<tr>
<td>Physionet [<xref ref-type="bibr" rid="ref-122">122</xref>]</td>
<td>CT</td>
<td>Brain</td>
<td>2020</td>
<td>512 &#x00D7; 512</td>
</tr>
<tr>
<td>SyConn2 [<xref ref-type="bibr" rid="ref-6">6</xref>]</td>
<td>Electron microscopy</td>
<td>Brain neuron</td>
<td>2022</td>
<td>482 &#x00D7; 481 &#x00D7; 236</td>
</tr>
<tr>
<td>BRATS2014 [<xref ref-type="bibr" rid="ref-116">116</xref>]</td>
<td>Multi-contrast MRI</td>
<td>Brain tumor</td>
<td>2014</td>
<td>128 &#x00D7; 128 &#x00D7; 128</td>
</tr>
<tr>
<td>UDIAT [<xref ref-type="bibr" rid="ref-123">123</xref>]</td>
<td>Ultrasound</td>
<td>Breast</td>
<td>2017</td>
<td>256 &#x00D7; 256</td>
</tr>
<tr>
<td>BUSI [<xref ref-type="bibr" rid="ref-118">118</xref>]</td>
<td>Ultrasound</td>
<td>Breast</td>
<td>2019</td>
<td>500 &#x00D7; 500 &#x00D7; 780</td>
</tr>
<tr>
<td>TissueNet [<xref ref-type="bibr" rid="ref-50">50</xref>]</td>
<td>Microscopy</td>
<td>Breast cancer, colorectal carcinoma, skin, lymph node, lymphoma, colon, spleen, DCIS, esophagus, lung, pancreas</td>
<td>2022</td>
<td>512 &#x00D7; 512</td>
</tr>
<tr>
<td>KUMAR [<xref ref-type="bibr" rid="ref-113">113</xref>]</td>
<td>Microscopy</td>
<td>Breast, liver, kidney, prostate, bladder, colon, and stomach</td>
<td>2017</td>
<td>1000 &#x00D7; 1000</td>
</tr>
<tr>
<td>MOD [<xref ref-type="bibr" rid="ref-124">124</xref>]</td>
<td>Light microscope</td>
<td>Breast, liver, kidney, prostate, bladder, colon, stomach</td>
<td>2017</td>
<td>1000 &#x00D7; 1000</td>
</tr>
<tr>
<td>CVC-ClinicDB [<xref ref-type="bibr" rid="ref-115">115</xref>]</td>
<td>Colonoscopy</td>
<td>Colon polyp</td>
<td>2012</td>
<td>384 &#x00D7; 288</td>
</tr>
<tr>
<td>CRAG [<xref ref-type="bibr" rid="ref-125">125</xref>]</td>
<td>Microscopy</td>
<td>Colorectal adenocarcinoma gland</td>
<td>2019</td>
<td>1512 &#x00D7; 1516</td>
</tr>
<tr>
<td>GlaS [<xref ref-type="bibr" rid="ref-117">117</xref>]</td>
<td>Microscopy</td>
<td>Colorectal cancer glands</td>
<td>2017</td>
<td>775 &#x00D7; 522</td>
</tr>
<tr>
<td>PanNuke [<xref ref-type="bibr" rid="ref-126">126</xref>]</td>
<td>Microscopy</td>
<td>Epithelial, connective/soft tissue cells, lympho-reticular cells, nervous system cells, and dead cells</td>
<td>2019</td>
<td>256 &#x00D7; 256</td>
</tr>
<tr>
<td>LERA [<xref ref-type="bibr" rid="ref-119">119</xref>]</td>
<td>Radiograph</td>
<td>Foot, knee, ankle</td>
<td>2020</td>
<td>Varying sizes</td>
</tr>
<tr>
<td>MoNuSeg [<xref ref-type="bibr" rid="ref-113">113</xref>]</td>
<td>Microscopy</td>
<td>H&#x0026;E stains from various human organs</td>
<td>2018</td>
<td>1000 &#x00D7; 1000</td>
</tr>
<tr>
<td>CHAOS [<xref ref-type="bibr" rid="ref-114">114</xref>]</td>
<td>CT-MR</td>
<td>Liver, kidney, spleen</td>
<td>2021</td>
<td>512 &#x00D7; 512</td>
</tr>
<tr>
<td>CoNSeP [<xref ref-type="bibr" rid="ref-127">127</xref>]</td>
<td>Microscopy</td>
<td>Nuclei labeled from H&#x0026;E colorectal slides</td>
<td>2018</td>
<td>1000 &#x00D7; 1000</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>A variety of publicly available microscopy datasets have been developed to advance image analysis research involving cellular images. <xref ref-type="fig" rid="fig-5">Fig. 5</xref> shows sample original and mask images across different imaging channels [<xref ref-type="bibr" rid="ref-128">128</xref>]. Among the earliest resources, The Cell Image Library served as a pioneering data repository, offering a broad spectrum of cellular images and enabling early efforts in data sharing among researchers. As the field matured, more specialized datasets emerged: LIVECell [<xref ref-type="bibr" rid="ref-129">129</xref>], for instance, introduced a large-scale collection of manually annotated images from eight distinct cell lines, emphasizing diverse cellular morphologies.</p>
<fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>Visualization of multi-channel microscopy images and ground truth masks</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_67915-fig-5a.tif"/>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_67915-fig-5b.tif"/>
</fig>
<p>Training datasets that span a wide range of tissue types and cell structures are essential for improving model generalizability (see <xref ref-type="fig" rid="fig-5">Fig. 5</xref>). However, earlier image analysis methods often struggled with such heterogeneity due to limited computational capabilities. <xref ref-type="table" rid="table-5">Table 5</xref> provides additional details on dataset diversity across cell types and modalities. Recent advances in machine learning, deep learning, and computer vision have enabled the development of generalist models capable of segmenting a broader array of cell types and structures. Cellpose is a prominent example: trained on diverse image modalities and staining protocols, it demonstrates the effectiveness of cross-dataset training for robust segmentation across varied biological contexts. Concurrently, the Broad Bioimage Benchmark Collection has introduced datasets tailored for image-based profiling, covering multiple cell lines and phenotypic categories and thus facilitating analyses that reflect the diversity inherent in biological assays.</p>
<table-wrap id="table-5">
<label>Table 5</label>
<caption>
<title>Summary of public microscopy datasets by cell type and imaging modality</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th align="center">Dataset</th>
<th align="center">Modality</th>
<th align="center">Cell line</th>
<th align="center">Year</th>
<th align="center">Resolution/Size</th>
</tr>
</thead>
<tbody>
<tr>
<td>Cell image library [<xref ref-type="bibr" rid="ref-130">130</xref>]</td>
<td>Microscopy</td>
<td>Various images from different organisms, cell types, and cellular processes</td>
<td>2010</td>
<td>Various sizes depending on type</td>
</tr>
<tr>
<td>LIVECell [<xref ref-type="bibr" rid="ref-129">129</xref>]</td>
<td>Microscopy</td>
<td>Single cells from following lines: SA172, BT474, BV2, Huh7, MCF7, SHSY5Y, SkBr3, SKOV3</td>
<td>2021</td>
<td>704 &#x00D7; 520</td>
</tr>
<tr>
<td>Cellpose [<xref ref-type="bibr" rid="ref-131">131</xref>]</td>
<td>Various modalities</td>
<td>Cells from various fluorescent markers</td>
<td>2024</td>
<td>Various sizes</td>
</tr>
<tr>
<td>Cell nuclei segmentation [<xref ref-type="bibr" rid="ref-132">132</xref>]</td>
<td>Brightfield microscopy</td>
<td>Cell</td>
<td>2018</td>
<td>256 &#x00D7; 256</td>
</tr>
<tr>
<td>Broad Bioimage Benchmark [<xref ref-type="bibr" rid="ref-133">133</xref>]</td>
<td>Microscopy</td>
<td>Various cell lines</td>
<td>2012</td>
<td>Various sizes depending on cell line</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s4">
<label>4</label>
<title>Future Directions and Open Challenges</title>
<p>Future research in AI-driven diagnostic tools is expected to focus on integrating interactive semantic segmentation to foster collaboration between clinicians and AI across various imaging modalities. Semantic segmentation serves as a vital bridge for computer-aided clinical decision support systems, offering precise anatomical and pathological delineation. One key challenge in improving segmentation performance lies in reducing the computational complexity of feature extraction blocks, such as the feature aggregation and feature selection modules [<xref ref-type="bibr" rid="ref-134">134</xref>]. This includes not only simplifying operations but also minimizing memory usage and computational cost. Additionally, enhancing the ability of shallow and deep blocks to extract both internal and external correlation features remains an open research problem [<xref ref-type="bibr" rid="ref-33">33</xref>]. Such advances are essential for developing scalable models that can extract rich semantic features across different modalities, including dual-model systems as discussed in [<xref ref-type="bibr" rid="ref-135">135</xref>]. Moreover, the scarcity of diverse imaging datasets, especially those representing pathological variations across different tissue types, poses a significant limitation [<xref ref-type="bibr" rid="ref-58">58</xref>]. For instance, developing dedicated datasets with explicit attributes, such as those capturing melanoma through color, texture, and contour features, could aid in the spatial analysis of skin lesions [<xref ref-type="bibr" rid="ref-64">64</xref>]. In imaging modalities like ultrasound, where speckle noise is prevalent, building multi-modality semantic segmentation models using virtual imaging trials is a promising direction. This could support harmonization across imaging types for the same patient characteristics [<xref ref-type="bibr" rid="ref-29">29</xref>,<xref ref-type="bibr" rid="ref-136">136</xref>,<xref ref-type="bibr" rid="ref-137">137</xref>]. Another challenge is the growing complexity of network architectures, which can slow training and hinder scalability, as seen with Dense-Inception blocks.</p>
<p>In medical image semantic segmentation architectures such as U-Net and its variants, fine-scale and coarse-scale feature representations serve complementary roles that together improve segmentation accuracy and robustness. Fine-scale features, extracted in the early encoder layers and transferred via skip connections, capture high-resolution spatial details, such as precise boundaries, textures, and small structures, e.g., capillaries, thin tissue layers. These features are essential for accurate localization and boundary delineation, particularly in tasks like tumor margin identification or small organ segmentation.</p>
<p>In contrast, coarse-scale features, typically learned in deeper encoder layers, encode global semantic context, such as organ shape, location, and inter-structure relationships. This is crucial for disambiguating visually similar regions, suppressing false positives, and maintaining anatomical coherence, especially in low-contrast or noisy images.</p>
<p>By fusing these two feature types through mechanisms like skip connections, attention gates, or feature pyramid networks, U-Net variants can simultaneously retain fine structural accuracy and robust semantic understanding. This fusion enables the network to balance local precision and global context, which is particularly valuable in clinical applications like tumor segmentation, where subtle boundary cues and larger anatomical context must both be interpreted accurately.</p>
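<p>The fusion of fine and coarse scales described above can be sketched on a toy single-channel feature map. This is illustrative only: average pooling, nearest-neighbor upsampling, and channel stacking stand in for the learned encoder, decoder, and skip connection of a real U-Net variant:</p>

```python
import numpy as np

def downsample(x):
    """2x2 average pooling: moves to the coarse, semantic scale."""
    h, w = x.shape
    return x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample(x):
    """Nearest-neighbor upsampling back to the fine resolution."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

def fuse_skip(fine_feat, coarse_feat):
    """U-Net-style fusion: upsample the coarse map and stack it with the
    high-resolution skip features along the channel axis."""
    return np.stack([fine_feat, upsample(coarse_feat)], axis=0)

x = np.arange(16.0).reshape(4, 4)      # toy single-channel feature map
fused = fuse_skip(x, downsample(x))    # shape (2, 4, 4): fine + coarse channels
```

<p>In an actual network, the stacked tensor would pass through further convolutions that learn how to weigh fine boundary detail against coarse anatomical context.</p>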
<p>Models like DMSA-UNet improve global attention by capturing semantic features from both spatial and channel dimensions while maintaining linear computational complexity. However, they often lack support for pre-trained weights. To address this, integrating local and global multi-scale information across multiple stages offers a potential solution [<xref ref-type="bibr" rid="ref-34">34</xref>]. Atrous convolutions can expand the receptive field without increasing the number of trainable parameters, making them an efficient enhancement [<xref ref-type="bibr" rid="ref-138">138</xref>].</p>
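<p>The parameter efficiency of atrous convolutions is easy to verify: a kernel of size k with dilation d spans k + (k - 1)(d - 1) input positions while keeping the same number of trainable weights. A 1-D numpy sketch, illustrative and not tied to any cited implementation:</p>

```python
import numpy as np

def effective_kernel(k, d):
    """Receptive span of a size-k kernel with dilation d; the number of
    trainable weights stays k per dimension regardless of d."""
    return k + (k - 1) * (d - 1)

def dilated_conv1d(x, w, d):
    """Valid 1-D convolution with dilation d: kernel taps are spaced d apart."""
    span = effective_kernel(len(w), d)
    return np.array([np.dot(w, x[i:i + span:d])
                     for i in range(len(x) - span + 1)])
```

<p>A 3-tap kernel thus covers 3, 5, or 9 input positions at dilations 1, 2, or 4, always with only three weights, which is why atrous convolutions enlarge the receptive field at no parameter cost.</p>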
<p>Multi-scale attention networks offer high segmentation accuracy by capturing both local details and global context, but they face several challenges in maintaining computational efficiency across diverse biomedical imaging modalities. First, the incorporation of multiple attention modules, such as spatial, channel, and multi-scale attention, significantly increases the computational and memory demands, which can limit their applicability in real-time clinical settings or on resource-constrained hardware. Second, biomedical imaging modalities vary widely in resolution, contrast, and noise characteristics, e.g., CT vs. ultrasound vs. MRI, making it difficult to design a single attention mechanism or receptive field size that generalizes well across all modalities. Third, processing high-resolution images or volumetric data with multi-scale attention networks often requires down-sampling, which can lead to a loss of fine structural details critical for clinical interpretation. Additionally, overly complex attention architectures may introduce training instability or overfitting, especially when applied to small or imbalanced datasets common in biomedical research. Balancing model complexity, scalability, and generalizability remains a central challenge in deploying multi-scale attention networks effectively in real-world biomedical applications.</p>
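<p>Of the attention variants discussed above, channel attention is the cheapest to illustrate. A squeeze-and-excitation-style sketch follows, where w1 and w2 are illustrative placeholders for learned bottleneck weights:</p>

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def channel_attention(feat, w1, w2):
    """Squeeze-and-excitation style gate: global-average-pool each channel,
    pass the descriptor through a small bottleneck MLP, and rescale channels."""
    squeeze = feat.mean(axis=(1, 2))                      # (C,) global descriptor
    gate = sigmoid(w2 @ np.maximum(w1 @ squeeze, 0.0))    # (C,) weights in (0, 1)
    return feat * gate[:, None, None]
```

<p>Even this lightest gate adds two matrix products per block; spatial and multi-scale attention modules add substantially more, which is the efficiency tension noted above.</p>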
<p>Given that most semantic segmentation approaches discussed rely on multi-scale and multi-level feature representations, employing model parallel training could help manage time complexity and accelerate training [<xref ref-type="bibr" rid="ref-139">139</xref>]. Furthermore, exploring diverse combinations of attention modules, selected based on their unique capabilities and complementary features, could enhance model effectiveness [<xref ref-type="bibr" rid="ref-30">30</xref>,<xref ref-type="bibr" rid="ref-140">140</xref>]. Finally, quantum deep learning approaches [<xref ref-type="bibr" rid="ref-141">141</xref>] represent an emerging frontier, offering promising potential for real-time clinical applications with AI.</p>
</sec>
<sec id="s5">
<label>5</label>
<title>Conclusion</title>
<p>This paper provides a comprehensive review of key advancements in the deep learning models that have revolutionized semantic segmentation, enabling efficient and precise analysis of medical and biological images and high-resolution understanding of complex anatomical and cellular structures. This review has outlined the evolution of semantic segmentation architectures, emphasizing the critical role of fine-to-coarse scale feature representation. Whereas traditional segmentation methods often struggle with densely packed or low-contrast images, deep learning models that exploit multi-scale feature representation can recover structural detail in such data. The integration of multi-scale features, e.g., the combination of DeepLabv3&#x002B; and ResNet-50 [<xref ref-type="bibr" rid="ref-17">17</xref>], DMSA-UNet [<xref ref-type="bibr" rid="ref-34">34</xref>], and the LossNet model [<xref ref-type="bibr" rid="ref-101">101</xref>], enhances a model&#x2019;s ability to simultaneously capture local details, e.g., tissue boundaries and cellular morphology, and the global context necessary for structural coherence and anatomical understanding. Advances in attention mechanisms, residual and transformer-based blocks, and fusion-based U-Net variants, e.g., SWTRU [<xref ref-type="bibr" rid="ref-39">39</xref>], SIB-UNet [<xref ref-type="bibr" rid="ref-40">40</xref>], and multi-scale fusion [<xref ref-type="bibr" rid="ref-98">98</xref>,<xref ref-type="bibr" rid="ref-99">99</xref>], have further improved the precision and adaptability of semantic segmentation models. However, challenges remain, particularly in managing low-contrast boundaries, modality-specific artifacts, e.g., speckle noise in ultrasound images, and the computational demands of increasingly complex networks. Furthermore, the lack of diverse, annotated datasets continues to limit generalizability across patient populations and imaging modalities. 
Future research should prioritize the development of lightweight architectures capable of effectively fusing fine-scale and coarse-scale information, while maintaining computational efficiency. The incorporation of interactive human-AI systems, harmonization across multi-modal inputs, and use of model-parallel training strategies may bridge current performance gaps. Emerging directions, such as virtual imaging trials, hybrid quantum deep learning models, and unsupervised feature refinement, hold promise for real-time, clinically integrated solutions. As segmentation moves toward becoming a foundational element in precision diagnostics and personalized medicine, the capability to reliably analyze and integrate features across multiple scales will be crucial for driving the next wave of innovation.</p>
</sec>
</body>
<back>
<ack>
<p>Not applicable.</p>
</ack>
<sec>
<title>Funding Statement</title>
<p>Open Access funding provided by the National Institutes of Health (NIH). The funding for this project was provided by NCATS Intramural Fund.</p>
</sec>
<sec>
<title>Author Contributions</title>
<p>The authors confirm contribution to the paper as follows: Study Conception and Design: Majid Harouni and Vishakha Goyal; Analysis and Interpretation of Results: Majid Harouni and Vishakha Goyal; Draft Manuscript Preparation: Majid Harouni, Vishakha Goyal and Gabrielle Feldman; Writing, Review, and Editing: Majid Harouni, Vishakha Goyal and Gabrielle Feldman; Supervision: Sam Michael and Ty C. Voss. All authors reviewed the results and approved the final version of the manuscript.</p>
</sec>
<sec sec-type="data-availability">
<title>Availability of Data and Materials</title>
<p>Not applicable.</p>
</sec>
<sec>
<title>Ethics Approval</title>
<p>Not applicable.</p>
</sec>
<sec sec-type="COI-statement">
<title>Conflicts of Interest</title>
<p>The authors declare no conflicts of interest to report regarding the present study.</p>
</sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Chang</surname> <given-names>J</given-names></string-name>, <string-name><surname>Choi</surname> <given-names>I</given-names></string-name>, <string-name><surname>Lee</surname> <given-names>M</given-names></string-name></person-group>. <article-title>PESA R-CNN: perihematomal edema guided scale adaptive R-CNN for hemorrhage segmentation</article-title>. <source>IEEE J Biomed Health Inform</source>. <year>2022</year>;<volume>27</volume>(<issue>1</issue>):<fpage>397</fpage>&#x2013;<lpage>408</lpage>. doi:<pub-id pub-id-type="doi">10.1109/jbhi.2022.3220820</pub-id>; <pub-id pub-id-type="pmid">36350855</pub-id></mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Yazdi</surname> <given-names>R</given-names></string-name>, <string-name><surname>Khotanlou</surname> <given-names>H</given-names></string-name></person-group>. <article-title>MaxSigNet: light learnable layer for semantic cell segmentation</article-title>. <source>Biomed Signal Process Control</source>. <year>2024</year>;<volume>95</volume>(<issue>6</issue>):<fpage>106464</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.bspc.2024.106464</pub-id>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Xuan</surname> <given-names>W</given-names></string-name>, <string-name><surname>Huang</surname> <given-names>S</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Du</surname> <given-names>B</given-names></string-name></person-group>. <article-title>FCL-Net: towards accurate edge detection via fine-scale corrective learning</article-title>. <source>Neural Networks</source>. <year>2022</year>;<volume>145</volume>(<issue>5</issue>):<fpage>248</fpage>&#x2013;<lpage>59</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.neunet.2021.10.022</pub-id>; <pub-id pub-id-type="pmid">34773900</pub-id></mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Oktay</surname> <given-names>O</given-names></string-name>, <string-name><surname>Schlemper</surname> <given-names>J</given-names></string-name>, <string-name><surname>Folgoc</surname> <given-names>LL</given-names></string-name>, <string-name><surname>Lee</surname> <given-names>M</given-names></string-name>, <string-name><surname>Heinrich</surname> <given-names>M</given-names></string-name>, <string-name><surname>Misawa</surname> <given-names>K</given-names></string-name>, <etal>et al.</etal></person-group> <article-title>Attention u-net: learning where to look for the pancreas</article-title>. <comment>arXiv:1804.03999. 2018</comment>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Schlemper</surname> <given-names>J</given-names></string-name>, <string-name><surname>Oktay</surname> <given-names>O</given-names></string-name>, <string-name><surname>Schaap</surname> <given-names>M</given-names></string-name>, <string-name><surname>Heinrich</surname> <given-names>M</given-names></string-name>, <string-name><surname>Kainz</surname> <given-names>B</given-names></string-name>, <string-name><surname>Glocker</surname> <given-names>B</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Attention gated networks: learning to leverage salient regions in medical images</article-title>. <source>Med Image Anal</source>. <year>2019</year>;<volume>53</volume>(<issue>7639</issue>):<fpage>197</fpage>&#x2013;<lpage>207</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.media.2019.01.012</pub-id>; <pub-id pub-id-type="pmid">30802813</pub-id></mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Schubert</surname> <given-names>PJ</given-names></string-name>, <string-name><surname>Dorkenwald</surname> <given-names>S</given-names></string-name>, <string-name><surname>Januszewski</surname> <given-names>M</given-names></string-name>, <string-name><surname>Klimesch</surname> <given-names>J</given-names></string-name>, <string-name><surname>Svara</surname> <given-names>F</given-names></string-name>, <string-name><surname>Mancu</surname> <given-names>A</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>SyConn2: dense synaptic connectivity inference for volume electron microscopy</article-title>. <source>Nat Meth</source>. <year>2022</year>;<volume>19</volume>(<issue>11</issue>):<fpage>1367</fpage>&#x2013;<lpage>70</lpage>. doi:<pub-id pub-id-type="doi">10.1038/s41592-022-01624-x</pub-id>; <pub-id pub-id-type="pmid">36280715</pub-id></mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Stalidis</surname> <given-names>G</given-names></string-name>, <string-name><surname>Maglaveras</surname> <given-names>N</given-names></string-name>, <string-name><surname>Efstratiadis</surname> <given-names>SN</given-names></string-name>, <string-name><surname>Dimitriadis</surname> <given-names>AS</given-names></string-name>, <string-name><surname>Pappas</surname> <given-names>C</given-names></string-name></person-group>. <article-title>Model-based processing scheme for quantitative 4-D cardiac MRI analysis</article-title>. <source>IEEE Transact Inform Technol Biomed</source>. <year>2002</year>;<volume>6</volume>(<issue>1</issue>):<fpage>59</fpage>&#x2013;<lpage>72</lpage>. doi:<pub-id pub-id-type="doi">10.1109/4233.992164</pub-id>; <pub-id pub-id-type="pmid">11936598</pub-id></mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wang</surname> <given-names>K</given-names></string-name>, <string-name><surname>Liang</surname> <given-names>S</given-names></string-name>, <string-name><surname>Zhong</surname> <given-names>S</given-names></string-name>, <string-name><surname>Feng</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Ning</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>Breast ultrasound image segmentation: a coarse-to-fine fusion convolutional neural network</article-title>. <source>Med Phys</source>. <year>2021</year>;<volume>48</volume>(<issue>8</issue>):<fpage>4262</fpage>&#x2013;<lpage>78</lpage>. doi:<pub-id pub-id-type="doi">10.1002/mp.15006</pub-id>; <pub-id pub-id-type="pmid">34053092</pub-id></mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Manh</surname> <given-names>V</given-names></string-name>, <string-name><surname>Jia</surname> <given-names>X</given-names></string-name>, <string-name><surname>Xue</surname> <given-names>W</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>W</given-names></string-name>, <string-name><surname>Mei</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Dong</surname> <given-names>Y</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>An efficient framework for lesion segmentation in ultrasound images using global adversarial learning and region-invariant loss</article-title>. <source>Comput Biol Med</source>. <year>2024</year>;<volume>171</volume>(<issue>1</issue>):<fpage>108137</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.compbiomed.2024.108137</pub-id>; <pub-id pub-id-type="pmid">38447499</pub-id></mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Lin</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Han</surname> <given-names>X</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>K</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>W</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Q</given-names></string-name></person-group>. <article-title>CSwinDoubleU-Net: a double U-shaped network combined with convolution and Swin Transformer for colorectal polyp segmentation</article-title>. <source>Biomed Signal Process Control</source>. <year>2024</year>;<volume>89</volume>(<issue>1</issue>):<fpage>105749</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.bspc.2023.105749</pub-id>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Jiao</surname> <given-names>W</given-names></string-name>, <string-name><surname>Han</surname> <given-names>H</given-names></string-name>, <string-name><surname>Cai</surname> <given-names>Y</given-names></string-name>, <string-name><surname>He</surname> <given-names>H</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>H</given-names></string-name>, <string-name><surname>Ding</surname> <given-names>H</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Cross-modality segmentation of ultrasound image with generative adversarial network and dual normalization network</article-title>. <source>Pattern Recognit</source>. <year>2025</year>;<volume>157</volume>:<fpage>110953</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.patcog.2024.110953</pub-id>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Orlando</surname> <given-names>N</given-names></string-name>, <string-name><surname>Gyacskov</surname> <given-names>I</given-names></string-name>, <string-name><surname>Gillies</surname> <given-names>DJ</given-names></string-name>, <string-name><surname>Guo</surname> <given-names>F</given-names></string-name>, <string-name><surname>Romagnoli</surname> <given-names>C</given-names></string-name>, <string-name><surname>D&#x2019;Souza</surname> <given-names>D</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Effect of dataset size, image quality, and image type on deep learning-based automatic prostate segmentation in 3D ultrasound</article-title>. <source>Phys Med Biol</source>. <year>2022</year>;<volume>67</volume>(<issue>7</issue>):<fpage>074002</fpage>. doi:<pub-id pub-id-type="doi">10.1088/1361-6560/ac5a93</pub-id>; <pub-id pub-id-type="pmid">35240585</pub-id></mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Yin</surname> <given-names>H</given-names></string-name>, <string-name><surname>Shao</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>CFU-Net: a coarse-fine U-Net with multilevel attention for medical image segmentation</article-title>. <source>IEEE Trans Instrum Meas</source>. <year>2023</year>;<volume>72</volume>:<fpage>1</fpage>&#x2013;<lpage>12</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tim.2023.3293887</pub-id>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Feng</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Zhu</surname> <given-names>X</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Li</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Lu</surname> <given-names>H</given-names></string-name></person-group>. <article-title>PAMSNet: a medical image segmentation network based on spatial pyramid and attention mechanism</article-title>. <source>Biomed Signal Process Control</source>. <year>2024</year>;<volume>94</volume>(<issue>6</issue>):<fpage>106285</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.bspc.2024.106285</pub-id>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Yin</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>W</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>L</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>H</given-names></string-name></person-group>. <article-title>CoT-UNet&#x002B;&#x002B;: a medical image segmentation method based on contextual transformer and dense connection</article-title>. <source>Math Biosci Eng</source>. <year>2023</year>;<volume>20</volume>(<issue>5</issue>):<fpage>8320</fpage>&#x2013;<lpage>36</lpage>. doi:<pub-id pub-id-type="doi">10.3934/mbe.2023364</pub-id>; <pub-id pub-id-type="pmid">37161200</pub-id></mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Ahmed</surname> <given-names>MR</given-names></string-name>, <string-name><surname>Ashrafi</surname> <given-names>AF</given-names></string-name>, <string-name><surname>Ahmed</surname> <given-names>RU</given-names></string-name>, <string-name><surname>Shatabda</surname> <given-names>S</given-names></string-name>, <string-name><surname>Islam</surname> <given-names>AM</given-names></string-name>, <string-name><surname>Islam</surname> <given-names>S</given-names></string-name></person-group>. <article-title>DoubleU-NetPlus: a novel attention and context-guided dual U-Net with multi-scale residual feature fusion network for semantic segmentation of medical images</article-title>. <source>Neural Comput Appl</source>. <year>2023</year>;<volume>35</volume>(<issue>19</issue>):<fpage>14379</fpage>&#x2013;<lpage>401</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s00521-023-08493-1</pub-id>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Roy</surname> <given-names>RM</given-names></string-name>, <string-name><surname>Ameer</surname> <given-names>P</given-names></string-name></person-group>. <article-title>Segmentation of leukocyte by semantic segmentation model: a deep learning approach</article-title>. <source>Biomed Signal Process Control</source>. <year>2021</year>;<volume>65</volume>(<issue>3</issue>):<fpage>102385</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.bspc.2020.102385</pub-id>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Fan</surname> <given-names>X</given-names></string-name>, <string-name><surname>Zhou</surname> <given-names>J</given-names></string-name>, <string-name><surname>Jiang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Xin</surname> <given-names>M</given-names></string-name>, <string-name><surname>Hou</surname> <given-names>L</given-names></string-name></person-group>. <article-title>CSAP-UNet: convolution and self-attention paralleling network for medical image segmentation with edge enhancement</article-title>. <source>Comput Biol Med</source>. <year>2024</year>;<volume>172</volume>(<issue>11</issue>):<fpage>108265</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.compbiomed.2024.108265</pub-id>; <pub-id pub-id-type="pmid">38461698</pub-id></mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Pavani</surname> <given-names>PG</given-names></string-name>, <string-name><surname>Biswal</surname> <given-names>B</given-names></string-name>, <string-name><surname>Gandhi</surname> <given-names>TK</given-names></string-name>, <string-name><surname>Kota</surname> <given-names>AR</given-names></string-name></person-group>. <article-title>Robust semantic segmentation of retinal fluids from SD-OCT images using FAM-U-Net</article-title>. <source>Biomed Signal Process Control</source>. <year>2024</year>;<volume>87</volume>(<issue>1&#x2013;2</issue>):<fpage>105481</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.bspc.2023.105481</pub-id>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Liu</surname> <given-names>L</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>R</given-names></string-name>, <string-name><surname>Xie</surname> <given-names>C</given-names></string-name>, <string-name><surname>Li</surname> <given-names>R</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>F</given-names></string-name>, <string-name><surname>Qi</surname> <given-names>L</given-names></string-name></person-group>. <article-title>A global activated feature pyramid network for tiny pest detection in the wild</article-title>. <source>Mach Vision Appl</source>. <year>2022</year>;<volume>33</volume>(<issue>5</issue>):<fpage>76</fpage>. doi:<pub-id pub-id-type="doi">10.1007/s00138-022-01310-0</pub-id>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Su</surname> <given-names>W</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Li</surname> <given-names>K</given-names></string-name>, <string-name><surname>Gao</surname> <given-names>P</given-names></string-name>, <string-name><surname>Qiao</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>Hybrid token transformer for deep face recognition</article-title>. <source>Pattern Recognit</source>. <year>2023</year>;<volume>139</volume>:<fpage>109443</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.patcog.2023.109443</pub-id>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Hu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Shen</surname> <given-names>L</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>G</given-names></string-name></person-group>. <article-title>Squeeze-and-excitation networks</article-title>. In: <conf-name>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name>; <publisher-loc>Salt Lake City, UT, USA</publisher-loc>; <year>2018</year>. p. <fpage>7132</fpage>&#x2013;<lpage>41</lpage>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Tian</surname> <given-names>S</given-names></string-name>, <string-name><surname>Yu</surname> <given-names>L</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>W</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>D</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>J</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>FSOU-Net: feature supplement and optimization U-Net for 2D medical image segmentation</article-title>. <source>Technol Health Care</source>. <year>2023</year>;<volume>31</volume>(<issue>1</issue>):<fpage>181</fpage>&#x2013;<lpage>95</lpage>. doi:<pub-id pub-id-type="doi">10.3233/thc-220174</pub-id>; <pub-id pub-id-type="pmid">35754242</pub-id></mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>You</surname> <given-names>H</given-names></string-name>, <string-name><surname>Yu</surname> <given-names>L</given-names></string-name>, <string-name><surname>Tian</surname> <given-names>S</given-names></string-name>, <string-name><surname>Cai</surname> <given-names>W</given-names></string-name></person-group>. <article-title>DR-Net: dual-rotation network with feature map enhancement for medical image segmentation</article-title>. <source>Complex Intell Syst</source>. <year>2021</year>;<volume>8</volume>(<issue>1</issue>):<fpage>611</fpage>&#x2013;<lpage>23</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s40747-021-00525-4</pub-id>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Xiong</surname> <given-names>L</given-names></string-name>, <string-name><surname>Yi</surname> <given-names>C</given-names></string-name>, <string-name><surname>Xiong</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Jiang</surname> <given-names>S</given-names></string-name></person-group>. <article-title>SEA-NET: medical image segmentation network based on spiral squeeze-and-excitation and attention modules</article-title>. <source>BMC Med Imaging</source>. <year>2024</year>;<volume>24</volume>(<issue>1</issue>):<fpage>17</fpage>. doi:<pub-id pub-id-type="doi">10.21203/rs.3.rs-2988347/v1</pub-id>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Han</surname> <given-names>D</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>L</given-names></string-name></person-group>. <article-title>Notice of retraction: DIEN network: detailed information extracting network for detecting continuous circular capsulorhexis boundaries of cataracts</article-title>. <source>IEEE Access</source>. <year>2020</year>;<volume>8</volume>:<fpage>161571</fpage>&#x2013;<lpage>9</lpage>. doi:<pub-id pub-id-type="doi">10.1109/access.2020.3021490</pub-id>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Wen</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Ma</surname> <given-names>Q</given-names></string-name></person-group>. <article-title>CI-UNet: melding ConvNeXt and cross-dimensional attention for robust medical image segmentation</article-title>. <source>Biomed Eng Lett</source>. <year>2024</year>;<volume>14</volume>(<issue>2</issue>):<fpage>341</fpage>&#x2013;<lpage>53</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s13534-023-00341-4</pub-id>; <pub-id pub-id-type="pmid">38374903</pub-id></mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Murmu</surname> <given-names>A</given-names></string-name>, <string-name><surname>Kumar</surname> <given-names>P</given-names></string-name></person-group>. <article-title>Automated breast nuclei feature extraction for segmentation in histopathology images using Deep-CNN-based gaussian mixture model and color optimization technique</article-title>. <source>Multimed Tools Appl</source>. <year>2025</year>;<volume>2</volume>(<issue>7</issue>):<fpage>1</fpage>&#x2013;<lpage>27</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s11042-025-20676-7</pub-id>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>C</given-names></string-name>, <string-name><surname>Coleman</surname> <given-names>S</given-names></string-name>, <string-name><surname>Kerr</surname> <given-names>D</given-names></string-name></person-group>. <article-title>DENSE-INception U-net for medical image segmentation</article-title>. <source>Comput Methods Programs Biomed</source>. <year>2020</year>;<volume>192</volume>:<fpage>105395</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.cmpb.2020.105395</pub-id>; <pub-id pub-id-type="pmid">32163817</pub-id></mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Sharma</surname> <given-names>V</given-names></string-name>, <string-name><surname>Kumar</surname> <given-names>M</given-names></string-name>, <string-name><surname>Yadav</surname> <given-names>AK</given-names></string-name></person-group>. <article-title>3D AIR-UNet: attention-inception&#x2013;residual-based U-Net for brain tumor segmentation from multimodal MRI</article-title>. <source>Neural Comput Appl</source>. <year>2025</year>:<fpage>1</fpage>&#x2013;<lpage>22</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s00521-025-11105-9</pub-id>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Sun</surname> <given-names>J</given-names></string-name>, <string-name><surname>Hu</surname> <given-names>M</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>X</given-names></string-name>, <string-name><surname>Tang</surname> <given-names>C</given-names></string-name>, <string-name><surname>Lahza</surname> <given-names>H</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>S</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>MVSI-Net: multi-view attention and multi-scale feature interaction for brain tumor segmentation</article-title>. <source>Biomed Signal Process Control</source>. <year>2024</year>;<volume>95</volume>(<issue>4</issue>):<fpage>106484</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.bspc.2024.106484</pub-id>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>M</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>S</given-names></string-name>, <string-name><surname>Han</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Cao</surname> <given-names>H</given-names></string-name>, <string-name><surname>Qiao</surname> <given-names>B</given-names></string-name></person-group>. <article-title>Dual-attention transformer-based hybrid network for multi-modal medical image segmentation</article-title>. <source>Sci Rep</source>. <year>2024</year>;<volume>14</volume>(<issue>1</issue>):<fpage>25704</fpage>. doi:<pub-id pub-id-type="doi">10.1038/s41598-024-76234-y</pub-id>; <pub-id pub-id-type="pmid">39465274</pub-id></mixed-citation></ref>
<ref id="ref-33"><label>[33]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Peng</surname> <given-names>B</given-names></string-name>, <string-name><surname>Fan</surname> <given-names>C</given-names></string-name></person-group>. <article-title>IEA-Net: internal and external dual-attention medical segmentation network with high-performance convolutional blocks</article-title>. <source>J Imaging Inform Med</source>. <year>2025</year>;<volume>38</volume>(<issue>1</issue>):<fpage>602</fpage>&#x2013;<lpage>14</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s10278-024-01217-4</pub-id>; <pub-id pub-id-type="pmid">39105850</pub-id></mixed-citation></ref>
<ref id="ref-34"><label>[34]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>X</given-names></string-name>, <string-name><surname>Fu</surname> <given-names>C</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>W</given-names></string-name>, <string-name><surname>Sham</surname> <given-names>C-W</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>J</given-names></string-name></person-group>. <article-title>DMSA-UNet: dual multi-scale attention makes UNet more strong for medical image segmentation</article-title>. <source>Knowl Based Syst</source>. <year>2024</year>;<volume>299</volume>(<issue>6</issue>):<fpage>112050</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.knosys.2024.112050</pub-id>.</mixed-citation></ref>
<ref id="ref-35"><label>[35]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Lodhi</surname> <given-names>BA</given-names></string-name>, <string-name><surname>Ullah</surname> <given-names>R</given-names></string-name>, <string-name><surname>Imran</surname> <given-names>S</given-names></string-name>, <string-name><surname>Imran</surname> <given-names>M</given-names></string-name>, <string-name><surname>Kim</surname> <given-names>B-S</given-names></string-name></person-group>. <article-title>SenseNet: densely connected, fully convolutional network with bottleneck skip connection for image segmentation</article-title>. <source>IEIE Trans Smart Process Comput</source>. <year>2024</year>;<volume>13</volume>(<issue>4</issue>):<fpage>328</fpage>&#x2013;<lpage>36</lpage>.</mixed-citation></ref>
<ref id="ref-36"><label>[36]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Fan</surname> <given-names>L</given-names></string-name>, <string-name><surname>Li</surname> <given-names>H</given-names></string-name>, <string-name><surname>Bi</surname> <given-names>X</given-names></string-name>, <string-name><surname>Jiang</surname> <given-names>W</given-names></string-name>, <string-name><surname>Ma</surname> <given-names>X</given-names></string-name></person-group>. <article-title>Skip-AttSeqNet: leveraging skip connection and attention-driven Seq2seq model to enhance eye movement event detection in Parkinson&#x2019;s disease</article-title>. <source>Biomed Signal Process Control</source>. <year>2025</year>;<volume>99</volume>(<issue>6</issue>):<fpage>106862</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.bspc.2024.106862</pub-id>.</mixed-citation></ref>
<ref id="ref-37"><label>[37]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Li</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Huang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Jiao</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>Multimodal medical image segmentation using multi-scale context-aware network</article-title>. <source>Neurocomputing</source>. <year>2022</year>;<volume>486</volume>:<fpage>135</fpage>&#x2013;<lpage>46</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.neucom.2021.11.017</pub-id>.</mixed-citation></ref>
<ref id="ref-38"><label>[38]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Silva</surname> <given-names>AQB</given-names></string-name>, <string-name><surname>Gon&#x00E7;alves</surname> <given-names>WN</given-names></string-name>, <string-name><surname>Matsubara</surname> <given-names>ET</given-names></string-name></person-group>. <article-title>DESCINet: a hierarchical deep convolutional neural network with skip connection for long time series forecasting</article-title>. <source>Expert Syst Appl</source>. <year>2023</year>;<volume>228</volume>:<fpage>120246</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.eswa.2023.120246</pub-id>.</mixed-citation></ref>
<ref id="ref-39"><label>[39]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>J</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>X</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>SWTRU: star-shaped window transformer reinforced U-net for medical image segmentation</article-title>. <source>Comput Biol Med</source>. <year>2022</year>;<volume>150</volume>(<issue>2</issue>):<fpage>105954</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.compbiomed.2022.105954</pub-id>; <pub-id pub-id-type="pmid">36122443</pub-id></mixed-citation></ref>
<ref id="ref-40"><label>[40]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>G</given-names></string-name>, <string-name><surname>Qi</surname> <given-names>M</given-names></string-name></person-group>. <article-title>SIB-UNet: a dual encoder medical image segmentation model with selective fusion and information bottleneck fusion</article-title>. <source>Expert Syst Appl</source>. <year>2024</year>;<volume>252</volume>(<issue>10</issue>):<fpage>124284</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.eswa.2024.124284</pub-id>.</mixed-citation></ref>
<ref id="ref-41"><label>[41]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Xie</surname> <given-names>X</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>M</given-names></string-name></person-group>. <article-title>USCT-UNet: rethinking the semantic gap in U-net network from U-shaped skip connections with multichannel fusion transformer</article-title>. <source>IEEE Trans Neural Syst Rehabil Eng</source>. <year>2024</year>;<volume>32</volume>:<fpage>3782</fpage>&#x2013;<lpage>93</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tnsre.2024.3468339</pub-id>; <pub-id pub-id-type="pmid">39325601</pub-id></mixed-citation></ref>
<ref id="ref-42"><label>[42]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wu</surname> <given-names>R</given-names></string-name>, <string-name><surname>Pan</surname> <given-names>L</given-names></string-name>, <string-name><surname>Liang</surname> <given-names>P</given-names></string-name>, <string-name><surname>Chang</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Fang</surname> <given-names>W</given-names></string-name></person-group>. <article-title>SK-VM&#x002B;&#x002B;: mamba assists skip-connections for medical image segmentation</article-title>. <source>Biomed Signal Process Control</source>. <year>2025</year>;<volume>105</volume>:<fpage>107646</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.bspc.2025.107646</pub-id>.</mixed-citation></ref>
<ref id="ref-43"><label>[43]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Amin</surname> <given-names>J</given-names></string-name>, <string-name><surname>Gul</surname> <given-names>N</given-names></string-name>, <string-name><surname>Sharif</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Dual-method for semantic and instance brain tumor segmentation based on UNet and mask R-CNN using MRI</article-title>. <source>Neural Comput Appl</source>. <year>2025</year>:<fpage>1</fpage>&#x2013;<lpage>19</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s00521-025-11013-y</pub-id>.</mixed-citation></ref>
<ref id="ref-44"><label>[44]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>L</given-names></string-name>, <string-name><surname>Zhu</surname> <given-names>B</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>X</given-names></string-name>, <string-name><surname>Ma</surname> <given-names>C</given-names></string-name></person-group>. <article-title>UTSN-net: medical image semantic segmentation model based on skip non-local attention module</article-title>. In: <conf-name>Eighth International Conference on Electronic Technology and Information Science (ICETIS 2023); Dalian, China</conf-name>. <year>2023</year>.</mixed-citation></ref>
<ref id="ref-45"><label>[45]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Carlos</surname> <given-names>G</given-names></string-name>, <string-name><surname>Figueiredo</surname> <given-names>K</given-names></string-name>, <string-name><surname>Hussain</surname> <given-names>A</given-names></string-name>, <string-name><surname>Vellasco</surname> <given-names>M</given-names></string-name></person-group>. <article-title>SegQNAS: quantum-inspired neural architecture search applied to medical image semantic segmentation</article-title>. In: <conf-name>2023 International Joint Conference on Neural Networks (IJCNN); Gold Coast, Australia</conf-name>. <year>2023</year>. p. <fpage>1</fpage>&#x2013;<lpage>8</lpage>.</mixed-citation></ref>
<ref id="ref-46"><label>[46]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Lew</surname> <given-names>CO</given-names></string-name>, <string-name><surname>Harouni</surname> <given-names>M</given-names></string-name>, <string-name><surname>Kirksey</surname> <given-names>ER</given-names></string-name>, <string-name><surname>Kang</surname> <given-names>EJ</given-names></string-name>, <string-name><surname>Dong</surname> <given-names>H</given-names></string-name>, <string-name><surname>Gu</surname> <given-names>H</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>A publicly available deep learning model and dataset for segmentation of breast, fibroglandular tissue, and vessels in breast MRI</article-title>. <source>Sci Rep</source>. <year>2024</year>;<volume>14</volume>(<issue>1</issue>):<fpage>5383</fpage>. doi:<pub-id pub-id-type="doi">10.1038/s41598-024-54048-2</pub-id>; <pub-id pub-id-type="pmid">38443410</pub-id></mixed-citation></ref>
<ref id="ref-47"><label>[47]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Rehman</surname> <given-names>A</given-names></string-name>, <string-name><surname>Harouni</surname> <given-names>M</given-names></string-name>, <string-name><surname>Zogh</surname> <given-names>F</given-names></string-name>, <string-name><surname>Saba</surname> <given-names>T</given-names></string-name>, <string-name><surname>Karimi</surname> <given-names>M</given-names></string-name>, <string-name><surname>Alamri</surname> <given-names>FS</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Detection of lungs tumors in CT scan images using convolutional neural networks</article-title>. <source>IEEE/ACM Trans Comput Biol Bioinform</source>. <year>2023</year>;<volume>21</volume>(<issue>4</issue>):<fpage>769</fpage>&#x2013;<lpage>77</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tcbb.2023.3315303</pub-id>; <pub-id pub-id-type="pmid">37708019</pub-id></mixed-citation></ref>
<ref id="ref-48"><label>[48]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Morse</surname> <given-names>DB</given-names></string-name>, <string-name><surname>Michalowski</surname> <given-names>AM</given-names></string-name>, <string-name><surname>Ceribelli</surname> <given-names>M</given-names></string-name>, <string-name><surname>De Jonghe</surname> <given-names>J</given-names></string-name>, <string-name><surname>Vias</surname> <given-names>M</given-names></string-name>, <string-name><surname>Riley</surname> <given-names>D</given-names></string-name>, <etal>et al.</etal></person-group> <article-title>Positional influence on cellular transcriptional identity revealed through spatially segmented single-cell transcriptomics</article-title>. <source>Cell Systems</source>. <year>2023</year>;<volume>14</volume>(<issue>6</issue>):<fpage>464</fpage>&#x2013;<lpage>81.e7</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.cels.2023.05.003</pub-id>; <pub-id pub-id-type="pmid">37348462</pub-id></mixed-citation></ref>
<ref id="ref-49"><label>[49]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Hu</surname> <given-names>B</given-names></string-name>, <string-name><surname>Ye</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Wei</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Snezhko</surname> <given-names>E</given-names></string-name>, <string-name><surname>Kovalev</surname> <given-names>V</given-names></string-name>, <string-name><surname>Ye</surname> <given-names>M</given-names></string-name></person-group>. <article-title>MLDA-Net: multi-level deep aggregation network for 3D nuclei instance segmentation</article-title>. <source>IEEE J Biomed Health Inform</source>. <year>2025</year>;<volume>29</volume>(<issue>5</issue>):<fpage>3516</fpage>&#x2013;<lpage>25</lpage>. doi:<pub-id pub-id-type="doi">10.1109/jbhi.2025.3529464</pub-id>; <pub-id pub-id-type="pmid">40031026</pub-id></mixed-citation></ref>
<ref id="ref-50"><label>[50]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Greenwald</surname> <given-names>NF</given-names></string-name>, <string-name><surname>Miller</surname> <given-names>G</given-names></string-name>, <string-name><surname>Moen</surname> <given-names>E</given-names></string-name>, <string-name><surname>Kong</surname> <given-names>A</given-names></string-name>, <string-name><surname>Kagel</surname> <given-names>A</given-names></string-name>, <string-name><surname>Dougherty</surname> <given-names>T</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Whole-cell segmentation of tissue images with human-level performance using large-scale data annotation and deep learning</article-title>. <source>Nat Biotechnol</source>. <year>2022</year>;<volume>40</volume>(<issue>4</issue>):<fpage>555</fpage>&#x2013;<lpage>65</lpage>. doi:<pub-id pub-id-type="doi">10.1038/s41587-021-01094-0</pub-id>; <pub-id pub-id-type="pmid">34795433</pub-id></mixed-citation></ref>
<ref id="ref-51"><label>[51]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Goyal</surname> <given-names>V</given-names></string-name>, <string-name><surname>Schaub</surname> <given-names>NJ</given-names></string-name>, <string-name><surname>Voss</surname> <given-names>TC</given-names></string-name>, <string-name><surname>Hotaling</surname> <given-names>NA</given-names></string-name></person-group>. <article-title>Unbiased image segmentation assessment toolkit for quantitative differentiation of state-of-the-art algorithms and pipelines</article-title>. <source>BMC Bioinform</source>. <year>2023</year>;<volume>24</volume>(<issue>1</issue>):<fpage>388</fpage>. doi:<pub-id pub-id-type="doi">10.21203/rs.3.rs-2302693/v1</pub-id>.</mixed-citation></ref>
<ref id="ref-52"><label>[52]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Boutin</surname> <given-names>ME</given-names></string-name>, <string-name><surname>Voss</surname> <given-names>TC</given-names></string-name>, <string-name><surname>Titus</surname> <given-names>SA</given-names></string-name>, <string-name><surname>Cruz-Gutierrez</surname> <given-names>K</given-names></string-name>, <string-name><surname>Michael</surname> <given-names>S</given-names></string-name>, <string-name><surname>Ferrer</surname> <given-names>M</given-names></string-name></person-group>. <article-title>A high-throughput imaging and nuclear segmentation analysis protocol for cleared 3D culture models</article-title>. <source>Sci Rep</source>. <year>2018</year>;<volume>8</volume>(<issue>1</issue>):<fpage>11135</fpage>. doi:<pub-id pub-id-type="doi">10.1038/s41598-018-29169-0</pub-id>; <pub-id pub-id-type="pmid">30042482</pub-id></mixed-citation></ref>
<ref id="ref-53"><label>[53]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Karimi</surname> <given-names>M</given-names></string-name>, <string-name><surname>Harouni</surname> <given-names>M</given-names></string-name>, <string-name><surname>Nasr</surname> <given-names>A</given-names></string-name>, <string-name><surname>Tavakoli</surname> <given-names>N</given-names></string-name></person-group>. <chapter-title>Automatic lung infection segmentation of COVID-19 in CT scan images</chapter-title>. In: <source>Intelligent computing applications for COVID-19</source>. <publisher-name>CRC Press</publisher-name>; <year>2021</year>. p. <fpage>235</fpage>&#x2013;<lpage>53</lpage>.</mixed-citation></ref>
<ref id="ref-54"><label>[54]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>He</surname> <given-names>K</given-names></string-name>, <string-name><surname>Gkioxari</surname> <given-names>G</given-names></string-name>, <string-name><surname>Doll&#x00E1;r</surname> <given-names>P</given-names></string-name>, <string-name><surname>Girshick</surname> <given-names>R</given-names></string-name></person-group>. <article-title>Mask R-CNN</article-title>. In: <conf-name>Proceedings of the IEEE International Conference on Computer Vision</conf-name>; <publisher-loc>Venice, Italy</publisher-loc>; <year>2017</year>. p. <fpage>2961</fpage>&#x2013;<lpage>9</lpage>.</mixed-citation></ref>
<ref id="ref-55"><label>[55]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Hao</surname> <given-names>R</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Du</surname> <given-names>X</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>J</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>L</given-names></string-name></person-group>. <article-title>End-to-end deep learning-based cells detection in microscopic leucorrhea images</article-title>. <source>Microsc Microanal</source>. <year>2022</year>;<volume>28</volume>(<issue>3</issue>):<fpage>732</fpage>&#x2013;<lpage>43</lpage>. doi:<pub-id pub-id-type="doi">10.1017/s1431927622000265</pub-id>; <pub-id pub-id-type="pmid">35232520</pub-id></mixed-citation></ref>
<ref id="ref-56"><label>[56]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Guemas</surname> <given-names>E</given-names></string-name>, <string-name><surname>Routier</surname> <given-names>B</given-names></string-name>, <string-name><surname>Ghelfenstein-Ferreira</surname> <given-names>T</given-names></string-name>, <string-name><surname>Cordier</surname> <given-names>C</given-names></string-name>, <string-name><surname>Hartuis</surname> <given-names>S</given-names></string-name>, <string-name><surname>Marion</surname> <given-names>B</given-names></string-name></person-group>. <article-title>Automatic patient-level recognition of four Plasmodium species on thin blood smear by a real-time detection transformer (RT-DETR) object detection algorithm: a proof-of-concept and evaluation</article-title>. <source>Microbiol Spectr</source>. <year>2024</year>;<volume>12</volume>(<issue>2</issue>):<fpage>e0144023</fpage>. doi:<pub-id pub-id-type="doi">10.1128/spectrum.01440-23</pub-id>; <pub-id pub-id-type="pmid">38171008</pub-id></mixed-citation></ref>
<ref id="ref-57"><label>[57]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Cheng</surname> <given-names>B</given-names></string-name>, <string-name><surname>Misra</surname> <given-names>I</given-names></string-name>, <string-name><surname>Schwing</surname> <given-names>AG</given-names></string-name>, <string-name><surname>Kirillov</surname> <given-names>A</given-names></string-name>, <string-name><surname>Girdhar</surname> <given-names>R</given-names></string-name></person-group>. <article-title>Masked-attention mask transformer for universal image segmentation</article-title>. In: <conf-name>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</conf-name>; <publisher-loc>New Orleans, LA, USA</publisher-loc>; <year>2022</year>.</mixed-citation></ref>
<ref id="ref-58"><label>[58]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Sheng</surname> <given-names>J-C</given-names></string-name>, <string-name><surname>Liao</surname> <given-names>Y-S</given-names></string-name>, <string-name><surname>Huang</surname> <given-names>C-R</given-names></string-name></person-group>. <article-title>Apply masked-attention mask transformer to instance segmentation in pathology images</article-title>. In: <conf-name>2023 Sixth International Symposium on Computer, Consumer and Control (IS3C)</conf-name>; <publisher-loc>Taichung, Taiwan</publisher-loc>; <year>2023</year>. p. <fpage>342</fpage>&#x2013;<lpage>45</lpage>.</mixed-citation></ref>
<ref id="ref-59"><label>[59]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Phoommanee</surname> <given-names>N</given-names></string-name>, <string-name><surname>Andrews</surname> <given-names>PJ</given-names></string-name>, <string-name><surname>Leung</surname> <given-names>TS</given-names></string-name></person-group>. <chapter-title>Segmentation of endoscopy images of the anterior nasal cavity using deep learning</chapter-title>. In: <source>Medical imaging 2024: computer-aided diagnosis</source>. <publisher-name>SPIE</publisher-name>; <year>2024</year>.</mixed-citation></ref>
<ref id="ref-60"><label>[60]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Itti</surname> <given-names>L</given-names></string-name>, <string-name><surname>Koch</surname> <given-names>C</given-names></string-name>, <string-name><surname>Niebur</surname> <given-names>E</given-names></string-name></person-group>. <article-title>A model of saliency-based visual attention for rapid scene analysis</article-title>. <source>IEEE Transact Pattern Anal Mach Intell</source>. <year>1998</year>;<volume>20</volume>(<issue>11</issue>):<fpage>1254</fpage>&#x2013;<lpage>9</lpage>. doi:<pub-id pub-id-type="doi">10.1109/34.730558</pub-id>.</mixed-citation></ref>
<ref id="ref-61"><label>[61]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Leventhal</surname> <given-names>AG</given-names></string-name></person-group>. <article-title>The neural basis of visual function</article-title>; <year>1991</year> <comment>[cited 2025 Jun 16]</comment>. Available from: <ext-link ext-link-type="uri" xlink:href="https://books.google.com/books?id=FaRTxgEACAAJ">https://books.google.com/books?id=FaRTxgEACAAJ</ext-link>.</mixed-citation></ref>
<ref id="ref-62"><label>[62]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Karthik</surname> <given-names>A</given-names></string-name>, <string-name><surname>Hamatta</surname> <given-names>HS</given-names></string-name>, <string-name><surname>Patthi</surname> <given-names>S</given-names></string-name>, <string-name><surname>Krubakaran</surname> <given-names>C</given-names></string-name>, <string-name><surname>Pradhan</surname> <given-names>AK</given-names></string-name>, <string-name><surname>Rachapudi</surname> <given-names>V</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Ensemble-based multimodal medical imaging fusion for tumor segmentation</article-title>. <source>Biomed Signal Process Control</source>. <year>2024</year>;<volume>96</volume>(<issue>1</issue>):<fpage>106550</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.bspc.2024.106550</pub-id>.</mixed-citation></ref>
<ref id="ref-63"><label>[63]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Mustafa</surname> <given-names>S</given-names></string-name>, <string-name><surname>Jaffar</surname> <given-names>A</given-names></string-name>, <string-name><surname>Rashid</surname> <given-names>M</given-names></string-name>, <string-name><surname>Akram</surname> <given-names>S</given-names></string-name>, <string-name><surname>Bhatti</surname> <given-names>SM</given-names></string-name></person-group>. <article-title>Deep learning-based skin lesion analysis using hybrid ResUNet&#x002B;&#x002B; and modified AlexNet-Random Forest for enhanced segmentation and classification</article-title>. <source>PLoS One</source>. <year>2025</year>;<volume>20</volume>(<issue>1</issue>):<fpage>e0315120</fpage>. doi:<pub-id pub-id-type="doi">10.1371/journal.pone.0315120</pub-id>; <pub-id pub-id-type="pmid">39820868</pub-id></mixed-citation></ref>
<ref id="ref-64"><label>[64]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Ergin</surname> <given-names>F</given-names></string-name>, <string-name><surname>Parlak</surname> <given-names>IB</given-names></string-name>, <string-name><surname>Adel</surname> <given-names>M</given-names></string-name>, <string-name><surname>G&#x00FC;l</surname> <given-names>&#x00D6;M</given-names></string-name>, <string-name><surname>Karpouzis</surname> <given-names>K</given-names></string-name></person-group>. <article-title>Noise resilience in dermoscopic image segmentation: comparing deep learning architectures for enhanced accuracy</article-title>. <source>Electronics</source>. <year>2024</year>;<volume>13</volume>(<issue>17</issue>):<fpage>3414</fpage>. doi:<pub-id pub-id-type="doi">10.3390/electronics13173414</pub-id>.</mixed-citation></ref>
<ref id="ref-65"><label>[65]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Elazab</surname> <given-names>A</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>C</given-names></string-name>, <string-name><surname>Gardezi</surname> <given-names>SJS</given-names></string-name>, <string-name><surname>Bai</surname> <given-names>H</given-names></string-name>, <string-name><surname>Hu</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>T</given-names></string-name>, <etal>et al.</etal></person-group> <article-title>GP-GAN: brain tumor growth prediction using stacked 3D generative adversarial networks from longitudinal MR images</article-title>. <source>Neural Netw</source>. <year>2020</year>;<volume>132</volume>:<fpage>321</fpage>&#x2013;<lpage>32</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.neunet.2020.09.004</pub-id>; <pub-id pub-id-type="pmid">32977277</pub-id></mixed-citation></ref>
<ref id="ref-66"><label>[66]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Alebiosu</surname> <given-names>DO</given-names></string-name>, <string-name><surname>Dharmaratne</surname> <given-names>A</given-names></string-name>, <string-name><surname>Lim</surname> <given-names>CH</given-names></string-name></person-group>. <article-title>Improving tuberculosis severity assessment in computed tomography images using novel DAvoU-Net segmentation and deep learning framework</article-title>. <source>Expert Syst Appl</source>. <year>2023</year>;<volume>213</volume>(<issue>5</issue>):<fpage>119287</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.eswa.2022.119287</pub-id>.</mixed-citation></ref>
<ref id="ref-67"><label>[67]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Long</surname> <given-names>F</given-names></string-name></person-group>. <article-title>Microscopy cell nuclei segmentation with enhanced U-Net</article-title>. <source>BMC Bioinform</source>. <year>2020</year>;<volume>21</volume>(<issue>1</issue>):<fpage>8</fpage>. doi:<pub-id pub-id-type="doi">10.1186/s12859-019-3332-1</pub-id>; <pub-id pub-id-type="pmid">31914944</pub-id></mixed-citation></ref>
<ref id="ref-68"><label>[68]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Pan</surname> <given-names>X</given-names></string-name>, <string-name><surname>Li</surname> <given-names>L</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>D</given-names></string-name>, <string-name><surname>He</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>H</given-names></string-name></person-group>. <article-title>An accurate nuclei segmentation algorithm in pathological image based on deep semantic network</article-title>. <source>IEEE Access</source>. <year>2019</year>;<volume>7</volume>:<fpage>110674</fpage>&#x2013;<lpage>86</lpage>. doi:<pub-id pub-id-type="doi">10.1109/access.2019.2934486</pub-id>.</mixed-citation></ref>
<ref id="ref-69"><label>[69]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Sinitca</surname> <given-names>AM</given-names></string-name>, <string-name><surname>Kayumov</surname> <given-names>AR</given-names></string-name>, <string-name><surname>Zelenikhin</surname> <given-names>PV</given-names></string-name>, <string-name><surname>Porfiriev</surname> <given-names>AG</given-names></string-name>, <string-name><surname>Kaplun</surname> <given-names>DI</given-names></string-name>, <string-name><surname>Bogachev</surname> <given-names>MI</given-names></string-name></person-group>. <article-title>Segmentation of patchy areas in biomedical images based on local edge density estimation</article-title>. <source>Biomed Signal Process Control</source>. <year>2023</year>;<volume>79</volume>(<issue>2</issue>):<fpage>104189</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.bspc.2022.104189</pub-id>.</mixed-citation></ref>
<ref id="ref-70"><label>[70]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhao</surname> <given-names>C</given-names></string-name>, <string-name><surname>Lv</surname> <given-names>W</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Yu</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>S</given-names></string-name></person-group>. <article-title>Mms-net: multi-level multi-scale feature extraction network for medical image segmentation</article-title>. <source>Biomed Signal Process Control</source>. <year>2023</year>;<volume>86</volume>(<issue>2</issue>):<fpage>105330</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.bspc.2023.105330</pub-id>.</mixed-citation></ref>
<ref id="ref-71"><label>[71]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Ghosh</surname> <given-names>S</given-names></string-name>, <string-name><surname>Das</surname> <given-names>S</given-names></string-name></person-group>. <article-title>Multi-scale morphology-aided deep medical image segmentation</article-title>. <source>Eng Appl Artif Intell</source>. <year>2024</year>;<volume>137</volume>(<issue>8</issue>):<fpage>109047</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.engappai.2024.109047</pub-id>.</mixed-citation></ref>
<ref id="ref-72"><label>[72]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Jiang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Luo</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Mei</surname> <given-names>T</given-names></string-name>, <string-name><surname>Wen</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Li</surname> <given-names>X</given-names></string-name>, <etal>et al.</etal></person-group> <article-title>Multi-phase and multi-level selective feature fusion for automated pancreas segmentation from CT images</article-title>. In: <conf-name>International Conference on Medical Image Computing and Computer-Assisted Intervention</conf-name>. <publisher-loc>Cham</publisher-loc>: <publisher-name>Springer</publisher-name>; <year>2020 Oct 4&#x2013;8</year>. p. <fpage>460</fpage>&#x2013;<lpage>9</lpage>.</mixed-citation></ref>
<ref id="ref-73"><label>[73]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Kushnure</surname> <given-names>DT</given-names></string-name>, <string-name><surname>Tyagi</surname> <given-names>S</given-names></string-name>, <string-name><surname>Talbar</surname> <given-names>SN</given-names></string-name></person-group>. <article-title>LiM-Net: lightweight multi-level multiscale network with deep residual learning for automatic liver segmentation in CT images</article-title>. <source>Biomed Signal Process Control</source>. <year>2023</year>;<volume>80</volume>:<fpage>104305</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.bspc.2022.104305</pub-id>.</mixed-citation></ref>
<ref id="ref-74"><label>[74]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Cheng</surname> <given-names>R</given-names></string-name>, <string-name><surname>Roth</surname> <given-names>HR</given-names></string-name>, <string-name><surname>Lay</surname> <given-names>N</given-names></string-name>, <string-name><surname>Lu</surname> <given-names>L</given-names></string-name>, <string-name><surname>Turkbey</surname> <given-names>B</given-names></string-name>, <string-name><surname>Gandler</surname> <given-names>W</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Automatic magnetic resonance prostate segmentation by deep learning with holistically nested networks</article-title>. <source>J Med Imaging</source>. <year>2017</year>;<volume>4</volume>(<issue>4</issue>):<fpage>041302</fpage>. doi:<pub-id pub-id-type="doi">10.1117/1.jmi.4.4.041302</pub-id>; <pub-id pub-id-type="pmid">28840173</pub-id></mixed-citation></ref>
<ref id="ref-75"><label>[75]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Chen</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>C</given-names></string-name>, <string-name><surname>Li</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>J</given-names></string-name>, <string-name><surname>Deng</surname> <given-names>H</given-names></string-name></person-group>. <article-title>Automatic segmentation of ovarian follicles using deep neural network combined with edge information</article-title>. <source>Front Reprod Health</source>. <year>2022</year>;<volume>4</volume>:<fpage>877216</fpage>. doi:<pub-id pub-id-type="doi">10.3389/frph.2022.877216</pub-id>; <pub-id pub-id-type="pmid">36303627</pub-id></mixed-citation></ref>
<ref id="ref-76"><label>[76]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wan</surname> <given-names>M</given-names></string-name>, <string-name><surname>Zhu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Che</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Cao</surname> <given-names>X</given-names></string-name>, <string-name><surname>Han</surname> <given-names>X</given-names></string-name>, <string-name><surname>Si</surname> <given-names>X</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>BIF-Net: boundary information fusion network for abdominal aortic aneurysm segmentation</article-title>. <source>Comput Biol Med</source>. <year>2024</year>;<volume>183</volume>(<issue>4</issue>):<fpage>109191</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.compbiomed.2024.109191</pub-id>; <pub-id pub-id-type="pmid">39393127</pub-id></mixed-citation></ref>
<ref id="ref-77"><label>[77]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Youssef</surname> <given-names>D</given-names></string-name>, <string-name><surname>Atef</surname> <given-names>H</given-names></string-name>, <string-name><surname>Gamal</surname> <given-names>S</given-names></string-name>, <string-name><surname>El-Azab</surname> <given-names>J</given-names></string-name>, <string-name><surname>Ismail</surname> <given-names>T</given-names></string-name></person-group>. <article-title>Early breast cancer prediction using thermal images and hybrid feature extraction based system</article-title>. <source>IEEE Access</source>. <year>2025</year>;<volume>13</volume>:<fpage>29327</fpage>&#x2013;<lpage>39</lpage>. doi:<pub-id pub-id-type="doi">10.1109/access.2025.3541051</pub-id>.</mixed-citation></ref>
<ref id="ref-78"><label>[78]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Ryu</surname> <given-names>SM</given-names></string-name>, <string-name><surname>Shin</surname> <given-names>K</given-names></string-name>, <string-name><surname>Shin</surname> <given-names>SW</given-names></string-name>, <string-name><surname>Lee</surname> <given-names>SH</given-names></string-name>, <string-name><surname>Seo</surname> <given-names>SM</given-names></string-name>, <string-name><surname>Koh</surname> <given-names>SH</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Enhanced diagnosis of pes planus and pes cavus using deep learning-based segmentation of weight-bearing lateral foot radiographs: a comparative observer study</article-title>. <source>Biomed Eng Lett</source>. <year>2025</year>;<volume>15</volume>(<issue>1</issue>):<fpage>203</fpage>&#x2013;<lpage>15</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s13534-024-00439-3</pub-id>; <pub-id pub-id-type="pmid">39781051</pub-id></mixed-citation></ref>
<ref id="ref-79"><label>[79]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Hsiao</surname> <given-names>C-H</given-names></string-name>, <string-name><surname>Lin</surname> <given-names>P-C</given-names></string-name>, <string-name><surname>Chung</surname> <given-names>L-A</given-names></string-name>, <string-name><surname>Lin</surname> <given-names>FY-S</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>F-J</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>S-Y</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>A deep learning-based precision and automatic kidney segmentation system using efficient feature pyramid networks in computed tomography images</article-title>. <source>Comput Methods Programs Biomed</source>. <year>2022</year>;<volume>221</volume>(<issue>10225</issue>):<fpage>106854</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.cmpb.2022.106854</pub-id>; <pub-id pub-id-type="pmid">35567864</pub-id></mixed-citation></ref>
<ref id="ref-80"><label>[80]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wang</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Zhu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Fu</surname> <given-names>S</given-names></string-name>, <string-name><surname>Mao</surname> <given-names>S</given-names></string-name>, <string-name><surname>Ye</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>RFPNet: reorganizing feature pyramid networks for medical image segmentation</article-title>. <source>Comput Biol Med</source>. <year>2023</year>;<volume>163</volume>(<issue>1</issue>):<fpage>107108</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.compbiomed.2023.107108</pub-id>; <pub-id pub-id-type="pmid">37321104</pub-id></mixed-citation></ref>
<ref id="ref-81"><label>[81]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>H</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>S</given-names></string-name>, <string-name><surname>Xing</surname> <given-names>L</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Fan</surname> <given-names>R</given-names></string-name></person-group>. <article-title>Expressive feature representation pyramid network for pulmonary nodule detection</article-title>. <source>Multimed Syst</source>. <year>2024</year>;<volume>30</volume>(<issue>6</issue>):<fpage>1</fpage>&#x2013;<lpage>18</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s00530-024-01532-4</pub-id>.</mixed-citation></ref>
<ref id="ref-82"><label>[82]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Chen</surname> <given-names>J</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>R</given-names></string-name>, <string-name><surname>Dong</surname> <given-names>W</given-names></string-name>, <string-name><surname>He</surname> <given-names>H</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>S</given-names></string-name></person-group>. <article-title>HistoNeXt: dual-mechanism feature pyramid network for cell nuclear segmentation and classification</article-title>. <source>BMC Med Imaging</source>. <year>2025</year>;<volume>25</volume>(<issue>1</issue>):<fpage>9</fpage>. doi:<pub-id pub-id-type="doi">10.1186/s12880-025-01550-2</pub-id>; <pub-id pub-id-type="pmid">39773093</pub-id></mixed-citation></ref>
<ref id="ref-83"><label>[83]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Wei</surname> <given-names>B</given-names></string-name>, <string-name><surname>Hao</surname> <given-names>K</given-names></string-name>, <string-name><surname>Gao</surname> <given-names>L</given-names></string-name>, <string-name><surname>Xie</surname> <given-names>R</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>H</given-names></string-name></person-group>. <article-title>Micro-KTNet: microstructure knowledge transfer learning for fiber masterbatch agglomeration recognition</article-title>. <source>J Ind Text</source>. <year>2025</year>;<volume>55</volume>(<issue>1</issue>):<fpage>15280837241307864</fpage>. doi:<pub-id pub-id-type="doi">10.1177/15280837241307864</pub-id>.</mixed-citation></ref>
<ref id="ref-84"><label>[84]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Gao</surname> <given-names>L</given-names></string-name>, <string-name><surname>Zhou</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Shen</surname> <given-names>HT</given-names></string-name>, <string-name><surname>Song</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Bottom-up and top-down: bidirectional additive net for edge detection</article-title>. In: <conf-name>Proceedings of the Twenty-Ninth International Conference on International Joint Conferences on Artificial Intelligence</conf-name>; <year>2021</year>. p. <fpage>594</fpage>&#x2013;<lpage>600</lpage>.</mixed-citation></ref>
<ref id="ref-85"><label>[85]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Teng</surname> <given-names>L</given-names></string-name>, <string-name><surname>Qiao</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Shafiq</surname> <given-names>M</given-names></string-name>, <string-name><surname>Srivastava</surname> <given-names>G</given-names></string-name>, <string-name><surname>Javed</surname> <given-names>AR</given-names></string-name>, <string-name><surname>Gadekallu</surname> <given-names>TR</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>FLPK-BiSeNet: federated learning based on priori knowledge and bilateral segmentation network for image edge extraction</article-title>. <source>IEEE Transact Netw Serv Manag</source>. <year>2023</year>;<volume>20</volume>(<issue>2</issue>):<fpage>1529</fpage>&#x2013;<lpage>42</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tnsm.2023.3273991</pub-id>.</mixed-citation></ref>
<ref id="ref-86"><label>[86]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Boulch</surname> <given-names>A</given-names></string-name>, <string-name><surname>Puy</surname> <given-names>G</given-names></string-name>, <string-name><surname>Marlet</surname> <given-names>R</given-names></string-name></person-group>. <article-title>FKAConv: feature-kernel alignment for point cloud convolution</article-title>. In: <conf-name>Proceedings of the Asian Conference on Computer Vision</conf-name>; <publisher-loc>Kyoto, Japan</publisher-loc>; <year>2020</year>. </mixed-citation></ref>
<ref id="ref-87"><label>[87]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhao</surname> <given-names>B</given-names></string-name>, <string-name><surname>Peng</surname> <given-names>J</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>C</given-names></string-name>, <string-name><surname>Fan</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>K</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>Diagnosis of coronary heart disease through deep learning-based segmentation and localization in computed tomography angiography</article-title>. <source>IEEE Access</source>. <year>2025</year>;<volume>13</volume>(<issue>6</issue>):<fpage>10177</fpage>&#x2013;<lpage>93</lpage>. doi:<pub-id pub-id-type="doi">10.1109/access.2025.3528638</pub-id>.</mixed-citation></ref>
<ref id="ref-88"><label>[88]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Liang</surname> <given-names>S</given-names></string-name>, <string-name><surname>Xue</surname> <given-names>L</given-names></string-name>, <string-name><surname>Zhou</surname> <given-names>K</given-names></string-name>, <string-name><surname>Fan</surname> <given-names>W</given-names></string-name>, <string-name><surname>Cui</surname> <given-names>X</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>MPF-Net: a multi-scale feature learning network enhanced by prior knowledge integration for medical image segmentation</article-title>. <source>Alexandria Eng J</source>. <year>2025</year>;<volume>128</volume>:<fpage>200</fpage>&#x2013;<lpage>12</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.aej.2025.05.058</pub-id>.</mixed-citation></ref>
<ref id="ref-89"><label>[89]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Kisting</surname> <given-names>MA</given-names></string-name>, <string-name><surname>Hinshaw</surname> <given-names>JL</given-names></string-name>, <string-name><surname>Toia</surname> <given-names>GV</given-names></string-name>, <string-name><surname>Ziemlewicz</surname> <given-names>TJ</given-names></string-name>, <string-name><surname>Kisting</surname> <given-names>AL</given-names></string-name>, <string-name><surname>Lee Jr</surname> <given-names>FT</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Artificial intelligence&#x2013;aided selection of needle pathways: proof-of-concept in percutaneous lung biopsies</article-title>. <source>J Vasc Interv Radiol</source>. <year>2024</year>;<volume>35</volume>(<issue>5</issue>):<fpage>770</fpage>&#x2013;<lpage>9</lpage>.<comment>e1</comment>. doi:<pub-id pub-id-type="doi">10.1016/j.jvir.2023.11.016</pub-id>; <pub-id pub-id-type="pmid">38008378</pub-id></mixed-citation></ref>
<ref id="ref-90"><label>[90]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Gong</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Jiang</surname> <given-names>F</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Peng</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Qiu</surname> <given-names>J</given-names></string-name>, <etal>et al.</etal></person-group> <article-title>Exploring the value of multiparametric quantitative magnetic resonance imaging in avoiding unnecessary biopsy in patients with PI-RADS 3&#x2013;4</article-title>. <source>Abdom Radiol</source>. <year>2025</year>:<fpage>1</fpage>&#x2013;<lpage>11</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s00261-025-04901-3</pub-id>; <pub-id pub-id-type="pmid">40137950</pub-id></mixed-citation></ref>
<ref id="ref-91"><label>[91]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Hussain</surname> <given-names>T</given-names></string-name>, <string-name><surname>Shouno</surname> <given-names>H</given-names></string-name>, <string-name><surname>Hussain</surname> <given-names>A</given-names></string-name>, <string-name><surname>Hussain</surname> <given-names>D</given-names></string-name>, <string-name><surname>Ismail</surname> <given-names>M</given-names></string-name>, <string-name><surname>Mir</surname> <given-names>TH</given-names></string-name>, <etal>et al.</etal></person-group> <article-title>EFFResNet-ViT: a fusion-based convolutional and vision transformer model for explainable medical image classification</article-title>. <source>IEEE Access</source>. <year>2025</year>;<volume>13</volume>(<issue>86</issue>):<fpage>54040</fpage>&#x2013;<lpage>68</lpage>. doi:<pub-id pub-id-type="doi">10.1109/access.2025.3554184</pub-id>.</mixed-citation></ref>
<ref id="ref-92"><label>[92]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Hussain</surname> <given-names>T</given-names></string-name>, <string-name><surname>Shouno</surname> <given-names>H</given-names></string-name>, <string-name><surname>Mohammed</surname> <given-names>MA</given-names></string-name>, <string-name><surname>Marhoon</surname> <given-names>HA</given-names></string-name>, <string-name><surname>Alam</surname> <given-names>T</given-names></string-name></person-group>. <article-title>DCSSGA-UNet: biomedical image segmentation with DenseNet channel spatial and Semantic Guidance Attention</article-title>. <source>Knowl Based Syst</source>. <year>2025</year>;<volume>314</volume>(<issue>11</issue>):<fpage>113233</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.knosys.2025.113233</pub-id>.</mixed-citation></ref>
<ref id="ref-93"><label>[93]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Ren</surname> <given-names>S</given-names></string-name>, <string-name><surname>He</surname> <given-names>K</given-names></string-name>, <string-name><surname>Girshick</surname> <given-names>R</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Faster R-CNN: towards real-time object detection with region proposal networks</article-title>. <source>IEEE Transact Pattern Anal Mach Intell</source>. <year>2016</year>;<volume>39</volume>(<issue>6</issue>):<fpage>1137</fpage>&#x2013;<lpage>49</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tpami.2016.2577031</pub-id>; <pub-id pub-id-type="pmid">27295650</pub-id></mixed-citation></ref>
<ref id="ref-94"><label>[94]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Lin</surname> <given-names>T-Y</given-names></string-name>, <string-name><surname>Doll&#x00E1;r</surname> <given-names>P</given-names></string-name>, <string-name><surname>Girshick</surname> <given-names>R</given-names></string-name>, <string-name><surname>He</surname> <given-names>K</given-names></string-name>, <string-name><surname>Hariharan</surname> <given-names>B</given-names></string-name>, <string-name><surname>Belongie</surname> <given-names>S</given-names></string-name></person-group>. <article-title>Feature pyramid networks for object detection</article-title>. In: <conf-name>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name>; <publisher-loc>Honolulu, HI, USA</publisher-loc>; <year>2017</year>. p. <fpage>936</fpage>&#x2013;<lpage>44</lpage>.</mixed-citation></ref>
<ref id="ref-95"><label>[95]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Huang</surname> <given-names>G</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Van Der Maaten</surname> <given-names>L</given-names></string-name>, <string-name><surname>Weinberger</surname> <given-names>KQ</given-names></string-name></person-group>. <article-title>Densely connected convolutional networks</article-title>. In: <conf-name>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name>; <publisher-loc>Honolulu, HI, USA</publisher-loc>; <year>2017</year>. p. <fpage>2261</fpage>&#x2013;<lpage>9</lpage>.</mixed-citation></ref>
<ref id="ref-96"><label>[96]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Le</surname> <given-names>N-M</given-names></string-name>, <string-name><surname>Le</surname> <given-names>D-H</given-names></string-name>, <string-name><surname>Pham</surname> <given-names>V-T</given-names></string-name>, <string-name><surname>Tran</surname> <given-names>T-T</given-names></string-name></person-group>. <article-title>DR-Unet: rethinking the ResUnet&#x002B;&#x002B; architecture with dual ResPath skip connection for nuclei segmentation</article-title>. In: <conf-name>2021 8th NAFOSTED Conference on Information and Computer Science (NICS)</conf-name>; <publisher-loc>Hanoi, Vietnam</publisher-loc>; <year>2021</year>. p. <fpage>194</fpage>&#x2013;<lpage>8</lpage>.</mixed-citation></ref>
<ref id="ref-97"><label>[97]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>J</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>W</given-names></string-name>, <string-name><surname>Feng</surname> <given-names>X</given-names></string-name>, <string-name><surname>Xing</surname> <given-names>G</given-names></string-name>, <string-name><surname>von Deneen</surname> <given-names>KM</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>W</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>A dense connection encoding&#x2013;decoding convolutional neural network structure for semantic segmentation of thymoma</article-title>. <source>Neurocomputing</source>. <year>2021</year>;<volume>451</volume>:<fpage>1</fpage>&#x2013;<lpage>11</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.neucom.2021.04.023</pub-id>.</mixed-citation></ref>
<ref id="ref-98"><label>[98]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>N</given-names></string-name>, <string-name><surname>Li</surname> <given-names>J</given-names></string-name></person-group>. <article-title>MSF-TransUNet: a multi-scale fusion approach for precise cardiac image segmentation</article-title>. In: <conf-name>Proceedings of the 2024 4th International Conference on Artificial Intelligence, Big Data and Algorithms</conf-name>; <publisher-loc>Zhengzhou, China</publisher-loc>; <year>2024</year>. p. <fpage>1139</fpage>&#x2013;<lpage>46</lpage>.</mixed-citation></ref>
<ref id="ref-99"><label>[99]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Yang</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Zeng</surname> <given-names>J</given-names></string-name>, <string-name><surname>Qin</surname> <given-names>P</given-names></string-name>, <string-name><surname>Chai</surname> <given-names>R</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>D</given-names></string-name></person-group>. <article-title>RAU-Net: U-Net network based on residual multi-scale fusion and attention skip layer for overall spine segmentation</article-title>. <source>Mach Vision Appl</source>. <year>2023</year>;<volume>34</volume>(<issue>1</issue>):<fpage>10</fpage>. doi:<pub-id pub-id-type="doi">10.1007/s00138-022-01360-4</pub-id>.</mixed-citation></ref>
<ref id="ref-100"><label>[100]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>F</given-names></string-name>, <string-name><surname>Zhou</surname> <given-names>T</given-names></string-name>, <string-name><surname>Li</surname> <given-names>B</given-names></string-name>, <string-name><surname>He</surname> <given-names>H</given-names></string-name>, <string-name><surname>Ma</surname> <given-names>C</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>T</given-names></string-name>, <etal>et al.</etal></person-group> <article-title>Uncovering prototypical knowledge for weakly open-vocabulary semantic segmentation</article-title>. <source>Adv Neural Inform Process Syst</source>. <year>2023</year>;<volume>36</volume>:<fpage>73652</fpage>&#x2013;<lpage>65</lpage>.</mixed-citation></ref>
<ref id="ref-101"><label>[101]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Zhao</surname> <given-names>X</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>L</given-names></string-name>, <string-name><surname>Lu</surname> <given-names>H</given-names></string-name></person-group>. <article-title>Automatic polyp segmentation via multi-scale subtraction network</article-title>. In: <conf-name>Medical Image Computing and Computer Assisted Intervention&#x2013;MICCAI 2021: 24th International Conference; 2021 Sep 27&#x2013;Oct 1</conf-name>; <publisher-loc>Strasbourg, France</publisher-loc>: <publisher-name>Springer</publisher-name>; <year>2021</year>. p. <fpage>120</fpage>&#x2013;<lpage>30</lpage>.</mixed-citation></ref>
<ref id="ref-102"><label>[102]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Koonce</surname> <given-names>B</given-names></string-name></person-group>. <source>ResNet 50. Convolutional neural networks with Swift for TensorFlow: image recognition and dataset categorization</source>. <publisher-loc>New York, NY, USA</publisher-loc>: <publisher-name>Springer</publisher-name>; <year>2021</year>. p. <fpage>63</fpage>&#x2013;<lpage>72</lpage>.</mixed-citation></ref>
<ref id="ref-103"><label>[103]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Windhager</surname> <given-names>J</given-names></string-name>, <string-name><surname>Zanotelli</surname> <given-names>VRT</given-names></string-name>, <string-name><surname>Schulz</surname> <given-names>D</given-names></string-name>, <string-name><surname>Meyer</surname> <given-names>L</given-names></string-name>, <string-name><surname>Daniel</surname> <given-names>M</given-names></string-name>, <string-name><surname>Bodenmiller</surname> <given-names>B</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>An end-to-end workflow for multiplexed image processing and analysis</article-title>. <source>Nature Protocols</source>. <year>2023</year>;<volume>18</volume>(<issue>11</issue>):<fpage>3565</fpage>&#x2013;<lpage>613</lpage>. doi:<pub-id pub-id-type="doi">10.1038/s41596-023-00881-0</pub-id>; <pub-id pub-id-type="pmid">37816904</pub-id></mixed-citation></ref>
<ref id="ref-104"><label>[104]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>He</surname> <given-names>K</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Ren</surname> <given-names>S</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Deep residual learning for image recognition</article-title>. In: <conf-name>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name>; <publisher-loc>Las Vegas, NV, USA</publisher-loc>; <year>2016</year>. p. <fpage>770</fpage>&#x2013;<lpage>8</lpage>.</mixed-citation></ref>
<ref id="ref-105"><label>[105]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Meyer</surname> <given-names>F</given-names></string-name>, <string-name><surname>Beucher</surname> <given-names>S</given-names></string-name></person-group>. <article-title>Morphological segmentation</article-title>. <source>J Visual Communicat Image Represent</source>. <year>1990</year>;<volume>1</volume>(<issue>1</issue>):<fpage>21</fpage>&#x2013;<lpage>46</lpage>. doi:<pub-id pub-id-type="doi">10.1016/1047-3203(90)90014-m</pub-id>.</mixed-citation></ref>
<ref id="ref-106"><label>[106]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Kochetov</surname> <given-names>B</given-names></string-name>, <string-name><surname>Bell</surname> <given-names>PD</given-names></string-name>, <string-name><surname>Garcia</surname> <given-names>PS</given-names></string-name>, <string-name><surname>Shalaby</surname> <given-names>AS</given-names></string-name>, <string-name><surname>Raphael</surname> <given-names>R</given-names></string-name>, <string-name><surname>Raymond</surname> <given-names>B</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>UNSEG: unsupervised segmentation of cells and their nuclei in complex tissue samples</article-title>. <source>Communicat Biol</source>. <year>2024</year>;<volume>7</volume>(<issue>1</issue>):<fpage>1062</fpage>. doi:<pub-id pub-id-type="doi">10.1101/2023.11.13.566842</pub-id>; <pub-id pub-id-type="pmid">38014263</pub-id></mixed-citation></ref>
<ref id="ref-107"><label>[107]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Nishimura</surname> <given-names>K</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>C</given-names></string-name>, <string-name><surname>Watanabe</surname> <given-names>K</given-names></string-name>, <string-name><surname>Ker</surname> <given-names>DFE</given-names></string-name>, <string-name><surname>Bise</surname> <given-names>R</given-names></string-name></person-group>. <article-title>Weakly supervised cell instance segmentation under various conditions</article-title>. <source>Med Image Anal</source>. <year>2021</year>;<volume>73</volume>(<issue>7</issue>):<fpage>102182</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.media.2021.102182</pub-id>; <pub-id pub-id-type="pmid">34340103</pub-id></mixed-citation></ref>
<ref id="ref-108"><label>[108]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Khoreva</surname> <given-names>A</given-names></string-name>, <string-name><surname>Benenson</surname> <given-names>R</given-names></string-name>, <string-name><surname>Hosang</surname> <given-names>J</given-names></string-name>, <string-name><surname>Hein</surname> <given-names>M</given-names></string-name>, <string-name><surname>Schiele</surname> <given-names>B</given-names></string-name></person-group>. <article-title>Simple does it: weakly supervised instance and semantic segmentation</article-title>. In: <conf-name>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name>; <publisher-loc>Honolulu, HI, USA</publisher-loc>; <year>2017</year>. p. <fpage>1665</fpage>&#x2013;<lpage>74</lpage>.</mixed-citation></ref>
<ref id="ref-109"><label>[109]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Stringer</surname> <given-names>C</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>T</given-names></string-name>, <string-name><surname>Michaelos</surname> <given-names>M</given-names></string-name>, <string-name><surname>Pachitariu</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Cellpose: a generalist algorithm for cellular segmentation</article-title>. <source>Nature Methods</source>. <year>2021</year>;<volume>18</volume>(<issue>1</issue>):<fpage>100</fpage>&#x2013;<lpage>6</lpage>. doi:<pub-id pub-id-type="doi">10.1038/s41592-020-01018-x</pub-id>.</mixed-citation></ref>
<ref id="ref-110"><label>[110]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Shrestha</surname> <given-names>P</given-names></string-name>, <string-name><surname>Kuang</surname> <given-names>N</given-names></string-name>, <string-name><surname>Yu</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Efficient end-to-end learning for cell segmentation with machine generated weak annotations</article-title>. <source>Communicat Biol</source>. <year>2023</year>;<volume>6</volume>(<issue>1</issue>):<fpage>232</fpage>. doi:<pub-id pub-id-type="doi">10.1038/s42003-023-04608-5</pub-id>; <pub-id pub-id-type="pmid">36864076</pub-id></mixed-citation></ref>
<ref id="ref-111"><label>[111]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Long</surname> <given-names>J</given-names></string-name>, <string-name><surname>Shelhamer</surname> <given-names>E</given-names></string-name>, <string-name><surname>Darrell</surname> <given-names>T</given-names></string-name></person-group>. <article-title>Fully convolutional networks for semantic segmentation</article-title>. In: <conf-name>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name>; <publisher-loc>Boston, MA, USA</publisher-loc>; <year>2015</year>. p. <fpage>3431</fpage>&#x2013;<lpage>40</lpage>.</mixed-citation></ref>
<ref id="ref-112"><label>[112]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Kulkarni</surname> <given-names>P</given-names></string-name>, <string-name><surname>Kanhere</surname> <given-names>A</given-names></string-name>, <string-name><surname>Savani</surname> <given-names>D</given-names></string-name>, <string-name><surname>Chan</surname> <given-names>A</given-names></string-name>, <string-name><surname>Chatterjee</surname> <given-names>D</given-names></string-name>, <string-name><surname>Yi</surname> <given-names>PH</given-names></string-name>, <etal>et al.</etal></person-group> <article-title>Anytime, anywhere, anyone: investigating the feasibility of segment anything model for crowd-sourcing medical image annotations</article-title>. <comment>arXiv:2403.15218. 2024</comment>.</mixed-citation></ref>
<ref id="ref-113"><label>[113]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Kumar</surname> <given-names>N</given-names></string-name>, <string-name><surname>Verma</surname> <given-names>R</given-names></string-name>, <string-name><surname>Anand</surname> <given-names>D</given-names></string-name>, <string-name><surname>Zhou</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Onder</surname> <given-names>OF</given-names></string-name>, <string-name><surname>Tsougenis</surname> <given-names>E</given-names></string-name>, <etal>et al.</etal></person-group> <article-title>A multi-organ nucleus segmentation challenge</article-title>. <source>IEEE Transact Med Imag</source>. <year>2019</year>;<volume>39</volume>(<issue>5</issue>):<fpage>1380</fpage>&#x2013;<lpage>91</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TMI.2019.2947628</pub-id>; <pub-id pub-id-type="pmid">31647422</pub-id></mixed-citation></ref>
<ref id="ref-114"><label>[114]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Kavur</surname> <given-names>AE</given-names></string-name>, <string-name><surname>Gezer</surname> <given-names>NS</given-names></string-name>, <string-name><surname>Bar&#x0131;&#x015F;</surname> <given-names>M</given-names></string-name>, <string-name><surname>Aslan</surname> <given-names>S</given-names></string-name>, <string-name><surname>Conze</surname> <given-names>P-H</given-names></string-name>, <string-name><surname>Groza</surname> <given-names>V</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>CHAOS challenge-combined (CT-MR) healthy abdominal organ segmentation</article-title>. <source>Med Image Anal</source>. <year>2021</year>;<volume>69</volume>(<issue>4</issue>):<fpage>101950</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.media.2020.101950</pub-id>; <pub-id pub-id-type="pmid">33421920</pub-id></mixed-citation></ref>
<ref id="ref-115"><label>[115]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Bernal</surname> <given-names>J</given-names></string-name>, <string-name><surname>S&#x00E1;nchez</surname> <given-names>FJ</given-names></string-name>, <string-name><surname>Fern&#x00E1;ndez-Esparrach</surname> <given-names>G</given-names></string-name>, <string-name><surname>Gil</surname> <given-names>D</given-names></string-name>, <string-name><surname>Rodr&#x00ED;guez</surname> <given-names>C</given-names></string-name>, <string-name><surname>Vilari&#x00F1;o</surname> <given-names>F</given-names></string-name></person-group>. <article-title>WM-DOVA maps for accurate polyp highlighting in colonoscopy: validation vs. saliency maps from physicians</article-title>. <source>Comput Med Imag Grap</source>. <year>2015</year>;<volume>43</volume>(<issue>1258</issue>):<fpage>99</fpage>&#x2013;<lpage>111</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.compmedimag.2015.02.007</pub-id>; <pub-id pub-id-type="pmid">25863519</pub-id></mixed-citation></ref>
<ref id="ref-116"><label>[116]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Menze</surname> <given-names>BH</given-names></string-name>, <string-name><surname>Jakab</surname> <given-names>A</given-names></string-name>, <string-name><surname>Bauer</surname> <given-names>S</given-names></string-name>, <string-name><surname>Kalpathy-Cramer</surname> <given-names>J</given-names></string-name>, <string-name><surname>Farahani</surname> <given-names>K</given-names></string-name>, <string-name><surname>Kirby</surname> <given-names>J</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>The multimodal brain tumor image segmentation benchmark (BRATS)</article-title>. <source>IEEE Transact Med Imag</source>. <year>2014</year>;<volume>34</volume>(<issue>10</issue>):<fpage>1993</fpage>&#x2013;<lpage>2024</lpage>.</mixed-citation></ref>
<ref id="ref-117"><label>[117]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Sirinukunwattana</surname> <given-names>K</given-names></string-name>, <string-name><surname>Pluim</surname> <given-names>JP</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>H</given-names></string-name>, <string-name><surname>Qi</surname> <given-names>X</given-names></string-name>, <string-name><surname>Heng</surname> <given-names>P-A</given-names></string-name>, <string-name><surname>Guo</surname> <given-names>YB</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Gland segmentation in colon histology images: the GlaS challenge contest</article-title>. <source>Med Image Anal</source>. <year>2017</year>;<volume>35</volume>(<issue>3</issue>):<fpage>489</fpage>&#x2013;<lpage>502</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.media.2016.08.008</pub-id>; <pub-id pub-id-type="pmid">27614792</pub-id></mixed-citation></ref>
<ref id="ref-118"><label>[118]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Al-Dhabyani</surname> <given-names>W</given-names></string-name>, <string-name><surname>Gomaa</surname> <given-names>M</given-names></string-name>, <string-name><surname>Khaled</surname> <given-names>H</given-names></string-name>, <string-name><surname>Fahmy</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Dataset of breast ultrasound images</article-title>. <source>Data Brief</source>. <year>2020</year>;<volume>28</volume>(<issue>5</issue>):<fpage>104863</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.dib.2019.104863</pub-id>; <pub-id pub-id-type="pmid">31867417</pub-id></mixed-citation></ref>
<ref id="ref-119"><label>[119]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Varma</surname> <given-names>M</given-names></string-name>, <string-name><surname>Lu</surname> <given-names>M</given-names></string-name>, <string-name><surname>Gardner</surname> <given-names>R</given-names></string-name>, <string-name><surname>Dunnmon</surname> <given-names>J</given-names></string-name>, <string-name><surname>Khandwala</surname> <given-names>N</given-names></string-name>, <string-name><surname>Rajpurkar</surname> <given-names>P</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Automated abnormality detection in lower extremity radiographs using deep learning</article-title>. <source>Nat Mach Intell</source>. <year>2019</year>;<volume>1</volume>(<issue>12</issue>):<fpage>578</fpage>&#x2013;<lpage>83</lpage>. doi:<pub-id pub-id-type="doi">10.1038/s42256-019-0126-0</pub-id>.</mixed-citation></ref>
<ref id="ref-120"><label>[120]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Ma</surname> <given-names>J</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Gu</surname> <given-names>S</given-names></string-name>, <string-name><surname>An</surname> <given-names>X</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Ge</surname> <given-names>C</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Fast and low-GPU-memory abdomen CT organ segmentation: the FLARE challenge</article-title>. <source>Med Image Anal</source>. <year>2022</year>;<volume>82</volume>(<issue>1</issue>):<fpage>102616</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.media.2022.102616</pub-id>; <pub-id pub-id-type="pmid">36179380</pub-id></mixed-citation></ref>
<ref id="ref-121"><label>[121]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>He</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Guo</surname> <given-names>P</given-names></string-name>, <string-name><surname>Tang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Myronenko</surname> <given-names>A</given-names></string-name>, <string-name><surname>Nath</surname> <given-names>V</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>Z</given-names></string-name>, <etal>et al.</etal></person-group> <article-title>VISTA3D: a unified segmentation foundation model for 3D medical imaging</article-title>. In: <conf-name>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</conf-name>; <publisher-loc>Seattle, WA, USA</publisher-loc>; <year>2024</year>. p. <fpage>20863</fpage>&#x2013;<lpage>73</lpage>.</mixed-citation></ref>
<ref id="ref-122"><label>[122]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Hssayeni</surname> <given-names>M</given-names></string-name>, <string-name><surname>Croock</surname> <given-names>M</given-names></string-name>, <string-name><surname>Salman</surname> <given-names>A</given-names></string-name>, <string-name><surname>Al-khafaji</surname> <given-names>H</given-names></string-name>, <string-name><surname>Yahya</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Ghoraani</surname> <given-names>B</given-names></string-name></person-group>. <article-title>Intracranial hemorrhage segmentation using a deep convolutional model</article-title>. <source>Data</source>. <year>2020</year>;<volume>5</volume>(<issue>1</issue>):<fpage>14</fpage>.</mixed-citation></ref>
<ref id="ref-123"><label>[123]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Yap</surname> <given-names>MH</given-names></string-name>, <string-name><surname>Pons</surname> <given-names>G</given-names></string-name>, <string-name><surname>Marti</surname> <given-names>J</given-names></string-name>, <string-name><surname>Ganau</surname> <given-names>S</given-names></string-name>, <string-name><surname>Sentis</surname> <given-names>M</given-names></string-name>, <string-name><surname>Zwiggelaar</surname> <given-names>R</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Automated breast ultrasound lesions detection using convolutional neural networks</article-title>. <source>IEEE J Biomed Health Inform</source>. <year>2017</year>;<volume>22</volume>(<issue>4</issue>):<fpage>1218</fpage>&#x2013;<lpage>26</lpage>. doi:<pub-id pub-id-type="doi">10.1109/jbhi.2017.2731873</pub-id>; <pub-id pub-id-type="pmid">28796627</pub-id></mixed-citation></ref>
<ref id="ref-124"><label>[124]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Kumar</surname> <given-names>N</given-names></string-name>, <string-name><surname>Verma</surname> <given-names>R</given-names></string-name>, <string-name><surname>Sharma</surname> <given-names>S</given-names></string-name>, <string-name><surname>Bhargava</surname> <given-names>S</given-names></string-name>, <string-name><surname>Vahadane</surname> <given-names>A</given-names></string-name>, <string-name><surname>Sethi</surname> <given-names>A</given-names></string-name></person-group>. <article-title>A dataset and a technique for generalized nuclear segmentation for computational pathology</article-title>. <source>IEEE Trans Med Imaging</source>. <year>2017</year>;<volume>36</volume>(<issue>7</issue>):<fpage>1550</fpage>&#x2013;<lpage>60</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tmi.2017.2677499</pub-id>; <pub-id pub-id-type="pmid">28287963</pub-id></mixed-citation></ref>
<ref id="ref-125"><label>[125]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Graham</surname> <given-names>S</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>H</given-names></string-name>, <string-name><surname>Gamper</surname> <given-names>J</given-names></string-name>, <string-name><surname>Dou</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Heng</surname> <given-names>P-A</given-names></string-name>, <string-name><surname>Snead</surname> <given-names>D</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>MILD-Net: minimal information loss dilated network for gland instance segmentation in colon histology images</article-title>. <source>Med Image Anal</source>. <year>2019</year>;<volume>52</volume>(<issue>5</issue>):<fpage>199</fpage>&#x2013;<lpage>211</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.media.2018.12.001</pub-id>; <pub-id pub-id-type="pmid">30594772</pub-id></mixed-citation></ref>
<ref id="ref-126"><label>[126]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Gamper</surname> <given-names>J</given-names></string-name>, <string-name><surname>Alemi Koohbanani</surname> <given-names>N</given-names></string-name>, <string-name><surname>Benet</surname> <given-names>K</given-names></string-name>, <string-name><surname>Khuram</surname> <given-names>A</given-names></string-name>, <string-name><surname>Rajpoot</surname> <given-names>N</given-names></string-name></person-group>. <article-title>PanNuke: an open pan-cancer histology dataset for nuclei instance segmentation and classification</article-title>. In: <conf-name>Digital Pathology: 15th European Congress, ECDP 2019; 2019 Apr 10&#x2013;13</conf-name>; <publisher-loc>Warwick, UK</publisher-loc>: <publisher-name>Springer</publisher-name>; <year>2019</year>. </mixed-citation></ref>
<ref id="ref-127"><label>[127]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Graham</surname> <given-names>S</given-names></string-name>, <string-name><surname>Vu</surname> <given-names>QD</given-names></string-name>, <string-name><surname>Raza</surname> <given-names>SEA</given-names></string-name>, <string-name><surname>Azam</surname> <given-names>A</given-names></string-name>, <string-name><surname>Tsang</surname> <given-names>YW</given-names></string-name>, <string-name><surname>Kwak</surname> <given-names>JT</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Hover-net: simultaneous segmentation and classification of nuclei in multi-tissue histology images</article-title>. <source>Med Image Anal</source>. <year>2019</year>;<volume>58</volume>(<issue>7</issue>):<fpage>101563</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.media.2019.101563</pub-id>; <pub-id pub-id-type="pmid">31561183</pub-id></mixed-citation></ref>
<ref id="ref-128"><label>[128]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Pape</surname> <given-names>C</given-names></string-name>, <string-name><surname>Remme</surname> <given-names>R</given-names></string-name>, <string-name><surname>Wolny</surname> <given-names>A</given-names></string-name>, <string-name><surname>Olberg</surname> <given-names>S</given-names></string-name>, <string-name><surname>Wolf</surname> <given-names>S</given-names></string-name>, <string-name><surname>Cerrone</surname> <given-names>L</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Microscopy-based assay for semi-quantitative detection of SARS-CoV-2 specific antibodies in human sera: a semi-quantitative, high throughput, microscopy-based assay expands existing approaches to measure SARS-CoV-2 specific antibody levels in human sera</article-title>. <source>Bioessays</source>. <year>2021</year>;<volume>43</volume>(<issue>3</issue>):<fpage>2000257</fpage>. doi:<pub-id pub-id-type="doi">10.1002/bies.202000257</pub-id>; <pub-id pub-id-type="pmid">33377226</pub-id></mixed-citation></ref>
<ref id="ref-129"><label>[129]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Edlund</surname> <given-names>C</given-names></string-name>, <string-name><surname>Jackson</surname> <given-names>TR</given-names></string-name>, <string-name><surname>Khalid</surname> <given-names>N</given-names></string-name>, <string-name><surname>Bevan</surname> <given-names>N</given-names></string-name>, <string-name><surname>Dale</surname> <given-names>T</given-names></string-name>, <string-name><surname>Dengel</surname> <given-names>A</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>LIVECell&#x2014;A large-scale dataset for label-free live cell segmentation</article-title>. <source>Nat Methods</source>. <year>2021</year>;<volume>18</volume>(<issue>9</issue>):<fpage>1038</fpage>&#x2013;<lpage>45</lpage>. doi:<pub-id pub-id-type="doi">10.1038/s41592-021-01249-6</pub-id>; <pub-id pub-id-type="pmid">34462594</pub-id></mixed-citation></ref>
<ref id="ref-130"><label>[130]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Orloff</surname> <given-names>DN</given-names></string-name>, <string-name><surname>Iwasa</surname> <given-names>JH</given-names></string-name>, <string-name><surname>Martone</surname> <given-names>ME</given-names></string-name>, <string-name><surname>Ellisman</surname> <given-names>MH</given-names></string-name>, <string-name><surname>Kane</surname> <given-names>CM</given-names></string-name></person-group>. <article-title>The cell: an image library-CCDB: a curated repository of microscopy data</article-title>. <source>Nucleic Acids Res</source>. <year>2012</year>;<volume>41</volume>(<issue>D1</issue>):<fpage>D1241</fpage>&#x2013;<lpage>D1250</lpage>. doi:<pub-id pub-id-type="doi">10.1093/nar/gks1257</pub-id>; <pub-id pub-id-type="pmid">23203874</pub-id></mixed-citation></ref>
<ref id="ref-131"><label>[131]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Stringer</surname> <given-names>C</given-names></string-name>, <string-name><surname>Pachitariu</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Cellpose3: one-click image restoration for improved cellular segmentation</article-title>. <source>bioRxiv</source>. <year>2024</year>. doi:<pub-id pub-id-type="doi">10.1101/2024.02.10.579780</pub-id>.</mixed-citation></ref>
<ref id="ref-132"><label>[132]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Caicedo</surname> <given-names>JC</given-names></string-name>, <string-name><surname>Goodman</surname> <given-names>A</given-names></string-name>, <string-name><surname>Karhohs</surname> <given-names>KW</given-names></string-name>, <string-name><surname>Cimini</surname> <given-names>BA</given-names></string-name>, <string-name><surname>Ackerman</surname> <given-names>J</given-names></string-name>, <string-name><surname>Haghighi</surname> <given-names>M</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Nucleus segmentation across imaging experiments: the 2018 Data Science Bowl</article-title>. <source>Nat Methods</source>. <year>2019</year>;<volume>16</volume>(<issue>12</issue>):<fpage>1247</fpage>&#x2013;<lpage>53</lpage>. doi:<pub-id pub-id-type="doi">10.1038/s41592-019-0612-7</pub-id>; <pub-id pub-id-type="pmid">31636459</pub-id></mixed-citation></ref>
<ref id="ref-133"><label>[133]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Ljosa</surname> <given-names>V</given-names></string-name>, <string-name><surname>Sokolnicki</surname> <given-names>KL</given-names></string-name>, <string-name><surname>Carpenter</surname> <given-names>AE</given-names></string-name></person-group>. <article-title>Annotated high-throughput microscopy image sets for validation</article-title>. <source>Nat Methods</source>. <year>2012</year>;<volume>9</volume>(<issue>7</issue>):<fpage>637</fpage>. doi:<pub-id pub-id-type="doi">10.1038/nmeth.2083</pub-id>; <pub-id pub-id-type="pmid">22743765</pub-id></mixed-citation></ref>
<ref id="ref-134"><label>[134]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>S</given-names></string-name>, <string-name><surname>Jiang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>F</given-names></string-name></person-group>. <article-title>FAFS-UNet: redesigning skip connections in UNet with feature aggregation and feature selection</article-title>. <source>Comput Biol Med</source>. <year>2024</year>;<volume>170</volume>(<issue>12</issue>):<fpage>108009</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.compbiomed.2024.108009</pub-id>; <pub-id pub-id-type="pmid">38242013</pub-id></mixed-citation></ref>
<ref id="ref-135"><label>[135]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Sivamurugan</surname> <given-names>J</given-names></string-name>, <string-name><surname>Sureshkumar</surname> <given-names>G</given-names></string-name></person-group>. <article-title>Applying dual models on optimized LSTM with U-net segmentation for breast cancer diagnosis using mammogram images</article-title>. <source>Artif Intell Med</source>. <year>2023</year>;<volume>143</volume>(<issue>5</issue>):<fpage>102626</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.artmed.2023.102626</pub-id>; <pub-id pub-id-type="pmid">37673584</pub-id></mixed-citation></ref>
<ref id="ref-136"><label>[136]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Goshima</surname> <given-names>F</given-names></string-name>, <string-name><surname>Tanaka</surname> <given-names>R</given-names></string-name>, <string-name><surname>Matsumoto</surname> <given-names>I</given-names></string-name>, <string-name><surname>Ohkura</surname> <given-names>N</given-names></string-name>, <string-name><surname>Abe</surname> <given-names>T</given-names></string-name>, <string-name><surname>Segars</surname> <given-names>WP</given-names></string-name>, <etal>et al.</etal></person-group> <chapter-title>Deep learning-based algorithm to segment pediatric and adult lungs from dynamic chest radiography images using virtual patient datasets</chapter-title>. In: <source>Medical imaging 2024: physics of medical imaging</source>. <publisher-loc>San Diego, CA, USA</publisher-loc>: <publisher-name>SPIE</publisher-name>; <year>2024</year>.</mixed-citation></ref>
<ref id="ref-137"><label>[137]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Xia</surname> <given-names>S-J</given-names></string-name>, <string-name><surname>Vancoillie</surname> <given-names>L</given-names></string-name>, <string-name><surname>Sotoudeh-Paima</surname> <given-names>S</given-names></string-name>, <string-name><surname>Zarei</surname> <given-names>M</given-names></string-name>, <string-name><surname>Ho</surname> <given-names>FC</given-names></string-name>, <string-name><surname>Tushar</surname> <given-names>FI</given-names></string-name>, <etal>et al.</etal></person-group> <chapter-title>The role of harmonization: a systematic analysis of various task-based scenarios</chapter-title>. In: <source>Medical imaging 2025: physics of medical imaging</source>. <publisher-loc>San Diego, CA, USA</publisher-loc>: <publisher-name>SPIE</publisher-name>; <year>2025</year>.</mixed-citation></ref>
<ref id="ref-138"><label>[138]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Su</surname> <given-names>R</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>D</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Cheng</surname> <given-names>C</given-names></string-name></person-group>. <article-title>MSU-Net: multi-scale U-Net for 2D medical image segmentation</article-title>. <source>Front Genet</source>. <year>2021</year>;<volume>12</volume>:<fpage>639930</fpage>. doi:<pub-id pub-id-type="doi">10.3389/fgene.2021.639930</pub-id>; <pub-id pub-id-type="pmid">33679900</pub-id></mixed-citation></ref>
<ref id="ref-139"><label>[139]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>X</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>H</given-names></string-name>, <string-name><surname>Qi</surname> <given-names>X</given-names></string-name>, <string-name><surname>Dou</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Fu</surname> <given-names>C-W</given-names></string-name>, <string-name><surname>Heng</surname> <given-names>P-A</given-names></string-name></person-group>. <article-title>H-DenseUNet: hybrid densely connected UNet for liver and tumor segmentation from CT volumes</article-title>. <source>IEEE Trans Med Imaging</source>. <year>2018</year>;<volume>37</volume>(<issue>12</issue>):<fpage>2663</fpage>&#x2013;<lpage>74</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tmi.2018.2845918</pub-id>; <pub-id pub-id-type="pmid">29994201</pub-id></mixed-citation></ref>
<ref id="ref-140"><label>[140]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Sahragard</surname> <given-names>E</given-names></string-name>, <string-name><surname>Farsi</surname> <given-names>H</given-names></string-name>, <string-name><surname>Mohamadzadeh</surname> <given-names>S</given-names></string-name></person-group>. <article-title>Advancing semantic segmentation: enhanced UNet algorithm with attention mechanism and deformable convolution</article-title>. <source>PLoS One</source>. <year>2025</year>;<volume>20</volume>(<issue>1</issue>):<fpage>e0305561</fpage>. doi:<pub-id pub-id-type="doi">10.1371/journal.pone.0305561</pub-id>; <pub-id pub-id-type="pmid">39820812</pub-id></mixed-citation></ref>
<ref id="ref-141"><label>[141]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Ahmed</surname> <given-names>HK</given-names></string-name>, <string-name><surname>Tantawi</surname> <given-names>B</given-names></string-name>, <string-name><surname>Magdy</surname> <given-names>M</given-names></string-name>, <string-name><surname>Sayed</surname> <given-names>GI</given-names></string-name></person-group>. <article-title>Quantumedics: brain tumor diagnosis and analysis based on quantum computing and convolutional neural network</article-title>. In: <conf-name>International Conference on Advanced Intelligent Systems and Informatics</conf-name>. <publisher-loc>Cham</publisher-loc>: <publisher-name>Springer</publisher-name>; <year>2023</year>. p. <fpage>358</fpage>&#x2013;<lpage>67</lpage>.</mixed-citation></ref>
</ref-list>
</back></article>