<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">39929</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2023.039929</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Ship Detection and Recognition Based on Improved YOLOv7</article-title>
<alt-title alt-title-type="left-running-head">Ship Detection and Recognition Based on Improved YOLOv7</alt-title>
<alt-title alt-title-type="right-running-head">Ship Detection and Recognition Based on Improved YOLOv7</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author">
<name name-style="western"><surname>Wu</surname><given-names>Wei</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Li</surname><given-names>Xiulai</given-names></name><xref ref-type="aff" rid="aff-2">2</xref></contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western"><surname>Hu</surname><given-names>Zhuhua</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-4" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Liu</surname><given-names>Xiaozhang</given-names></name><xref ref-type="aff" rid="aff-3">3</xref><email>lxzh@hainanu.edu.cn</email></contrib>
<aff id="aff-1"><label>1</label><institution>School of Information and Communication Engineering, Hainan University</institution>, <addr-line>Haikou, 570228</addr-line>, <country>China</country></aff>
<aff id="aff-2"><label>2</label><institution>School of Cyberspace Security, Hainan University</institution>, <addr-line>Haikou, 570228</addr-line>, <country>China</country></aff>
<aff id="aff-3"><label>3</label><institution>School of Computer Science and Technology, Hainan University</institution>, <addr-line>Haikou, 570228</addr-line>, <country>China</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Xiaozhang Liu. Email: <email>lxzh@hainanu.edu.cn</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2023</year></pub-date>
<pub-date date-type="pub" publication-format="electronic"><day>09</day>
<month>6</month>
<year>2023</year></pub-date>
<volume>76</volume>
<issue>1</issue>
<fpage>489</fpage>
<lpage>498</lpage>
<history>
<date date-type="received"><day>24</day><month>2</month><year>2023</year></date>
<date date-type="accepted"><day>18</day><month>4</month><year>2023</year></date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2023 Wu et al.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Wu et al.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_39929.pdf"></self-uri>
<abstract>
<p>In this paper, an improved YOLOv7 model is proposed to tackle challenges in ship detection and recognition, such as the irregular shapes and varying sizes of ships. The improved model replaces the fixed anchor boxes of the conventional YOLOv7 model with a set of anchor boxes designed from the size distribution of ships in the dataset. This paper also introduces a multi-scale feature fusion module built from Path Aggregation Network (PAN) modules, enabling efficient capture of ship features across different scales. Furthermore, data preprocessing is enhanced with data augmentation techniques, including random rotation, scaling, and cropping, which increase data diversity and robustness. The distribution of positive and negative samples in the dataset is balanced using random sampling to improve training efficiency and generalization. Experimental results on the SeaShips dataset demonstrate that the proposed method outperforms the SSD and original YOLOv7 models in detection accuracy, highlighting the potential of the improved YOLOv7 model for practical applications in the maritime domain.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Ship position prediction</kwd>
<kwd>target detection</kwd>
<kwd>YOLOv7</kwd>
<kwd>data augmentation techniques</kwd>
</kwd-group>
<funding-group>
<award-group id="awg1">
<funding-source>Key R &#x0026; D Project of Hainan Province</funding-source>
<award-id>ZDYF2022GXJS348</award-id>
<award-id>ZDYF2022SHFZ039</award-id>
</award-group>
</funding-group>
</article-meta>
</front>
<body>
<sec id="s1"><label>1</label><title>Introduction</title>
<p>Ship recognition is a technology that analyzes image features, such as color, shape, and texture, and has numerous applications in the field of intelligent transportation [<xref ref-type="bibr" rid="ref-1">1</xref>&#x2013;<xref ref-type="bibr" rid="ref-3">3</xref>]. Due to the development of inland waterway transport modes and the increasing density of inland waterway traffic, automatic ship identification and tracking systems face significant challenges [<xref ref-type="bibr" rid="ref-4">4</xref>,<xref ref-type="bibr" rid="ref-5">5</xref>]. Traditional ship identification methods are no longer sufficient to cope with the complexity of the traffic environment. For instance, these methods cannot identify vessels without information on sailing time, direction, or speed. Additionally, existing vessel identification methods require extensive preprocessing of image data, leading to large data volumes and storage space requirements.</p>
<p>The most commonly used ship recognition technologies in intelligent transportation include the Automatic Identification System (AIS) [<xref ref-type="bibr" rid="ref-6">6</xref>,<xref ref-type="bibr" rid="ref-7">7</xref>], the Global Positioning System (GPS) combined with Light Detection and Ranging (LiDAR) [<xref ref-type="bibr" rid="ref-8">8</xref>,<xref ref-type="bibr" rid="ref-9">9</xref>], and the Human-Aided Inertial System (HAIS) combined with the Electronic Chart Display and Information System (ECDIS) [<xref ref-type="bibr" rid="ref-10">10</xref>,<xref ref-type="bibr" rid="ref-11">11</xref>]. Although these methods, which rely on communication and navigation equipment, have their advantages, they are significantly limited in traffic-intensive waters such as ports because they cannot obtain visual images of ships. Recent research on deep learning-based ship recognition has made progress by automatically extracting ship image features through continuous learning and training, achieving ship recognition in maritime traffic videos and images [<xref ref-type="bibr" rid="ref-12">12</xref>&#x2013;<xref ref-type="bibr" rid="ref-15">15</xref>]. Deep learning-based ship recognition methods mainly adopt two approaches: introducing deep learning algorithms into ship recognition and fully exploiting image information through image pre-processing. Recently, deep learning-based object detectors such as You Only Look Once (YOLO) have achieved remarkable success in ship detection and recognition. However, the standard YOLOv7 model relies on fixed anchor boxes and has difficulty adapting to diverse ship shapes and sizes. Therefore, we propose an improved YOLOv7 model that addresses these challenges and achieves better performance in ship detection and recognition.</p>
<p>The main contributions of this paper are as follows. First, the YOLOv7 model structure is improved by replacing the default anchor boxes with ones better suited to ship detection and by adding a multi-scale feature fusion module to better capture ship features at different scales. Second, data preprocessing techniques such as data augmentation and sample balancing are adopted to increase data diversity and robustness, improve training efficiency, and enhance model generalization. Third, experiments on the SeaShips dataset show that the proposed method achieves higher detection accuracy than the SSD and original YOLOv7 models.</p>
<p>The paper is structured as follows: Section 2 reviews related works on target detection. Section 3 explains the improvements made to the YOLOv7 model. Section 4 outlines the experiment methodology. Section 5 presents the results and discussion. Finally, Section 6 provides a summary of the paper.</p>
</sec>
<sec id="s2"><label>2</label><title>Related Works</title>
<p>Deep learning-based target detection algorithms can be divided into two main categories: two-stage and one-stage algorithms. Two-stage detectors first extract candidate object regions and then classify each region with a CNN. One-stage detectors extract features only once and predict object locations and classes directly. As a result, one-stage detection is faster but may be slightly less accurate than two-stage detection.</p>
<sec id="s2_1"><label>2.1</label><title>Two-Stage Target Detection</title>
<p>The two-stage target detection framework generally consists of two stages: candidate region extraction and detection. In 2014, Girshick et al. [<xref ref-type="bibr" rid="ref-16">16</xref>] at UC Berkeley proposed the Region-based Convolutional Neural Network (R-CNN) algorithm, which significantly outperformed the contemporary OverFeat algorithm [<xref ref-type="bibr" rid="ref-17">17</xref>]. R-CNN uses the Selective Search (SS) algorithm [<xref ref-type="bibr" rid="ref-18">18</xref>] to generate candidate regions; each region is then fed separately into a Convolutional Neural Network (CNN) to extract features. Finally, a Support Vector Machine (SVM) classifies each region and a regressor refines its bounding box. In 2015, He et al. proposed the Spatial Pyramid Pooling Network (SPP-Net) [<xref ref-type="bibr" rid="ref-19">19</xref>], which inserts a spatial pyramid pooling layer between the convolutional layers and the fully connected layers. This layer produces a fixed-length feature vector from feature maps of any size, replacing the cropping and scaling of input regions required by R-CNN. Because the convolutional features are computed once for the whole image rather than once per region, SPP-Net reduces computational cost and accelerates detection. In the same year, Girshick [<xref ref-type="bibr" rid="ref-20">20</xref>] from Microsoft Research proposed Fast R-CNN. Drawing on the SPP-Net concept, Fast R-CNN passes the whole image through the CNN once and uses a Region of Interest (ROI) Pooling layer to map each candidate region to a fixed-size feature map, removing R-CNN&#x2019;s need to crop and scale image regions to the same size. Subsequently, Ren et al. [<xref ref-type="bibr" rid="ref-21">21</xref>] proposed Faster R-CNN together with the Region Proposal Network (RPN) [<xref ref-type="bibr" rid="ref-22">22</xref>]. The RPN shares convolutional features with the detection network and generates candidate regions directly, replacing the slow external proposal step of Fast R-CNN and reducing computational overhead. In 2017, He et al. made another breakthrough in target detection with Mask R-CNN [<xref ref-type="bibr" rid="ref-23">23</xref>], which replaces the ROI Pooling layer with ROI Align and adds a Fully Convolutional Network (FCN) branch that predicts a segmentation mask for each detected object in parallel with bounding box recognition.</p>
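<p>The fixed-length output of spatial pyramid pooling can be illustrated with a short sketch. The Python function below is a minimal, hypothetical example (the name spp_max_pool and the 1&#x00D7;1, 2&#x00D7;2 and 4&#x00D7;4 pyramid levels are illustrative assumptions, not details from this paper); it max-pools a single-channel feature map stored as nested lists into 1 + 4 + 16 = 21 values regardless of the map&#x2019;s height and width. Fast R-CNN&#x2019;s ROI Pooling applies the same idea with a single pyramid level to each candidate region.</p>

```python
import math


def spp_max_pool(feature_map, levels=(1, 2, 4)):
    """Spatial pyramid max pooling of one 2-D feature map (nested lists, any H x W).

    Returns a flat list of sum(n * n for n in levels) values, so the output
    length does not depend on the input height and width.
    """
    h, w = len(feature_map), len(feature_map[0])
    pooled = []
    for n in levels:                      # one n x n grid of bins per pyramid level
        for i in range(n):
            # floor/ceil bin edges cover the whole map and are never empty
            r0, r1 = math.floor(i * h / n), math.ceil((i + 1) * h / n)
            for j in range(n):
                c0, c1 = math.floor(j * w / n), math.ceil((j + 1) * w / n)
                pooled.append(max(feature_map[r][c]
                                  for r in range(r0, r1)
                                  for c in range(c0, c1)))
    return pooled


small = [[float(r * 5 + c) for c in range(5)] for r in range(5)]
large = [[float(r * 13 + c) for c in range(13)] for r in range(9)]
print(len(spp_max_pool(small)), len(spp_max_pool(large)))  # 21 21
```

<p>Both maps, of sizes 5&#x00D7;5 and 9&#x00D7;13, yield 21 values, which is what allows a fully connected layer with a fixed input size to follow the pooling layer.</p>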
</sec>
<sec id="s2_2"><label>2.2</label><title>One-Stage Target Detection</title>
<p>Two-stage target detection methods are often unsuitable for practical applications because they first extract candidate target regions and then detect targets in them, which increases model complexity and computational cost and makes it difficult to meet real-time speed requirements. Consequently, one-stage methods such as the Single Shot MultiBox Detector (SSD) and the YOLO series are widely used in industry for their fast inference, lightweight design, and ease of deployment. YOLO [<xref ref-type="bibr" rid="ref-24">24</xref>] treats target detection as a regression problem: it divides the image into an S&#x2009;&#x00D7;&#x2009;S grid and, for each grid cell, predicts bounding boxes and class probabilities for objects whose centers fall in that cell. Since then, one-stage detectors such as YOLOv2 [<xref ref-type="bibr" rid="ref-25">25</xref>], YOLOv3 [<xref ref-type="bibr" rid="ref-26">26</xref>], YOLOv4 [<xref ref-type="bibr" rid="ref-27">27</xref>], YOLOv5, YOLOv6 [<xref ref-type="bibr" rid="ref-28">28</xref>], and YOLOv7 [<xref ref-type="bibr" rid="ref-29">29</xref>] have been proposed and have gained significant attention. YOLOv3 is one of the more mature and classical target detection networks; it improves on many aspects of YOLO and combines the strengths of network structures such as ResNet and the Feature Pyramid Network (FPN). YOLOv4 systematically analyzes data pre-processing, detection network design, and prediction, and based on this analysis designs an efficient detector that can be trained on a single graphics card. YOLOv5 offers four model sizes to meet the needs of different applications. The SSD [<xref ref-type="bibr" rid="ref-30">30</xref>] series is also representative of one-stage detection. SSD detects objects at different scales using feature maps from different layers: high-resolution feature maps detect small objects, and low-resolution feature maps detect large objects. Numerous improvements to SSD have since been proposed [<xref ref-type="bibr" rid="ref-31">31</xref>&#x2013;<xref ref-type="bibr" rid="ref-34">34</xref>].</p>
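<p>The S&#x2009;&#x00D7;&#x2009;S grid assignment used by YOLO can be made concrete with a small sketch. The functions below are illustrative only (their names are hypothetical); they assume corner-format pixel boxes and the original YOLO settings of S = 7 grid cells per side, B = 2 boxes per cell and C = 20 classes, which give a 7&#x00D7;7&#x00D7;30 output tensor.</p>

```python
def responsible_cell(box, img_w, img_h, s=7):
    """Return (row, col) of the S x S grid cell containing the box centre.

    box is (x_min, y_min, x_max, y_max) in pixels.
    """
    cx = (box[0] + box[2]) / 2.0
    cy = (box[1] + box[3]) / 2.0
    # Clamp so a centre lying exactly on the right or bottom edge maps to the last cell
    col = min(int(cx / img_w * s), s - 1)
    row = min(int(cy / img_h * s), s - 1)
    return row, col


def yolo_v1_output_size(s=7, boxes_per_cell=2, num_classes=20):
    """Each cell predicts B boxes (x, y, w, h, confidence) plus C class probabilities."""
    return s * s * (boxes_per_cell * 5 + num_classes)


print(responsible_cell((100, 200, 300, 400), 448, 448))  # (4, 3)
print(yolo_v1_output_size())                             # 1470
```

<p>Only the responsible cell is trained to predict a given object, which is what lets YOLO produce all detections in a single forward pass.</p>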
</sec>
</sec>
<sec id="s3"><label>3</label><title>Algorithm Description</title>
<sec id="s3_1"><label>3.1</label><title>YOLOv7</title>
<p>The earliest YOLO algorithm (YOLOv1), released in 2015, was one of the first one-stage detectors; it ran much faster than the two-stage methods of the time, although its accuracy was lower than that of the best two-stage detectors. The YOLOv7 network consists of four modules: input, backbone, head and prediction (see <xref ref-type="fig" rid="fig-1">Fig. 1</xref>). In the real world, input images vary in size, so they must be resized to a fixed size before detection. The input module scales the image to the size required by the backbone. The backbone consists of several BConv convolutional layers, Extended Efficient Layer Aggregation Network (E-ELAN) layers and max-pooling convolution (MPConv) layers. The E-ELAN layer keeps the original ELAN architecture and, by guiding the computational blocks of different feature groups to learn more diverse features, improves the learning ability of the network without destroying the original gradient path. The MPConv layer combines a Maxpool layer with BConv layers in two branches. The upper branch halves the spatial resolution (height and width) through Maxpool and halves the number of channels through a BConv layer. The lower branch halves the number of channels with a first BConv layer and halves the spatial resolution with a second, strided BConv layer. Finally, a concatenation (Cat) operation fuses the features of the two branches, improving the feature extraction capability of the network. The head module is a Path Aggregation Feature Pyramid Network (PAFPN), which adds a bottom-up path so that information passes more easily from lower to higher levels, achieving efficient fusion of features at different levels. The prediction module uses Reparameterized Convolution (REP) blocks to adjust the number of channels of the P3, P4 and P5 feature maps, the three scales output by the PAFPN, and finally applies 1&#x2009;&#x00D7;&#x2009;1 convolutions to predict confidence, class and anchor box offsets.</p>
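<p>The input scaling and prediction-head shapes described above can be checked with simple arithmetic. The sketch below assumes a letterboxed square input (the image is resized with its aspect ratio kept and the remainder padded), three anchors per scale, strides of 8, 16 and 32 for P3, P4 and P5, and 4 box offsets + 1 confidence + C class scores per anchor; these are common YOLOv7 defaults assumed here, and the function names are hypothetical. With the six SeaShips categories, each head outputs 3&#x00D7;(5 + 6) = 33 channels.</p>

```python
import math


def letterbox_params(src_w, src_h, target=640):
    """Scale factor, resized size and per-side padding that fit an image into a
    target x target canvas while keeping its aspect ratio (letterboxing)."""
    scale = min(target / src_w, target / src_h)
    new_w, new_h = round(src_w * scale), round(src_h * scale)
    # The leftover space is split evenly between the two sides
    pad_x, pad_y = (target - new_w) / 2, (target - new_h) / 2
    return scale, (new_w, new_h), (pad_x, pad_y)


def head_shapes(in_w, in_h, num_classes, anchors_per_scale=3, strides=(8, 16, 32)):
    """(channels, grid_h, grid_w) of the P3, P4 and P5 prediction outputs."""
    # Per anchor: 4 box offsets + 1 confidence + one score per class
    channels = anchors_per_scale * (5 + num_classes)
    # ceil() mirrors padding the input up to a multiple of the stride
    return {f"P{3 + i}": (channels, math.ceil(in_h / s), math.ceil(in_w / s))
            for i, s in enumerate(strides)}


# A 1920 x 1080 SeaShips frame letterboxed to 640 x 640
print(letterbox_params(1920, 1080))
# Heads for a 960 x 540 input with 6 ship categories
print(head_shapes(960, 540, num_classes=6))
# {'P3': (33, 68, 120), 'P4': (33, 34, 60), 'P5': (33, 17, 30)}
```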
<fig id="fig-1"><label>Figure 1</label><caption><title>YOLOv7 network structure</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_39929-fig-1.tif"/></fig>
</sec>
<sec id="s3_2"><label>3.2</label><title>Improved YOLOv7</title>
<p>First, we improved the YOLOv7 model architecture with the following two measures. <bold>Replacing anchor boxes.</bold> In the conventional YOLOv7 model, the anchor boxes are fixed and are not tailored to targets of different sizes and shapes. To better match the varying sizes and irregular shapes of ships, we designed a set of anchor boxes from the size distribution of ships in the dataset, thereby improving detection accuracy. <bold>Adding a multi-scale feature fusion module.</bold> To better capture ship features at different scales, we added a Path Aggregation Network (PAN) module to the YOLOv7 model to fuse feature maps from different levels, thereby improving the accuracy and robustness of ship detection.</p>
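<p>The paper states that the anchors were designed from the ship size distribution but does not specify the procedure. One widely used option, introduced with YOLOv2, is k-means clustering of ground-truth box widths and heights with 1 - IoU as the distance; the sketch below assumes that approach, with k = 9 by default (three anchors for each of the P3, P4 and P5 heads). The function names are hypothetical, and box sizes should be measured at the network input resolution.</p>

```python
import random


def wh_iou(box, anchor):
    """IoU of two boxes given as (width, height), aligned at the same top-left corner."""
    inter = min(box[0], anchor[0]) * min(box[1], anchor[1])
    return inter / (box[0] * box[1] + anchor[0] * anchor[1] - inter)


def kmeans_anchors(wh, k=9, iters=100, seed=0):
    """Cluster ground-truth (w, h) pairs into k anchors using 1 - IoU as the distance."""
    rng = random.Random(seed)
    centroids = rng.sample(wh, k)          # initial anchors: k random boxes
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for box in wh:
            # Highest IoU is the same as smallest 1 - IoU distance
            best = max(range(k), key=lambda j: wh_iou(box, centroids[j]))
            clusters[best].append(box)
        # New anchor = mean (w, h) of its cluster; keep the old one if the cluster is empty
        new_centroids = [
            (sum(b[0] for b in c) / len(c), sum(b[1] for b in c) / len(c)) if c else centroids[j]
            for j, c in enumerate(clusters)
        ]
        if new_centroids == centroids:     # anchors stopped changing
            break
        centroids = new_centroids
    # Smallest anchors first, so they can be assigned to the highest-resolution head (P3)
    return sorted(centroids, key=lambda a: a[0] * a[1])


wh = [(10, 10), (12, 11), (11, 12), (100, 50), (110, 55), (105, 52)]
print(kmeans_anchors(wh, k=2))
```

<p>Using IoU rather than Euclidean distance keeps large boxes from dominating the clustering, so small and large ships each receive anchors of a suitable shape.</p>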
<p>Second, we improved data preprocessing with the following two methods. <bold>Data augmentation.</bold> To address complex lighting conditions and image noise in maritime environments, we applied data augmentation techniques, including random rotation, random scaling and random cropping, to increase the diversity and robustness of the data. Data augmentation improves the model&#x2019;s generalization ability without requiring additional labeled data. <bold>Dataset balancing.</bold> To address the imbalance between positive and negative samples in the dataset, we used random sampling to reduce the size of the dataset while keeping the numbers of positive and negative samples balanced, thereby improving training efficiency and the model&#x2019;s generalization ability.</p>
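<p>The two preprocessing steps can be sketched as follows. The paper does not define its positive and negative samples or its augmentation parameters, so this sketch assumes that a sample is an image labeled positive if it contains at least one ship, that balancing randomly undersamples the larger group, and that random scaling and cropping must also transform the bounding boxes; random rotation is omitted for brevity. All names are hypothetical.</p>

```python
import random


def random_undersample(samples, is_positive, seed=0):
    """Randomly drop samples from the larger group so positives and negatives are equal in number."""
    rng = random.Random(seed)
    pos = [s for s in samples if is_positive(s)]
    neg = [s for s in samples if not is_positive(s)]
    n = min(len(pos), len(neg))
    balanced = rng.sample(pos, n) + rng.sample(neg, n)
    rng.shuffle(balanced)
    return balanced


def scale_and_crop_boxes(boxes, scale, crop_x, crop_y, crop_w, crop_h, min_size=2.0):
    """Update (x1, y1, x2, y2) boxes after resizing an image by scale and
    cropping the window (crop_x, crop_y, crop_w, crop_h) from the resized image.

    Boxes are clipped to the crop window; boxes narrower or shorter than
    min_size pixels after clipping are dropped.
    """
    out = []
    for x1, y1, x2, y2 in boxes:
        # Resize, then shift into the crop window's coordinate frame
        nx1, ny1 = x1 * scale - crop_x, y1 * scale - crop_y
        nx2, ny2 = x2 * scale - crop_x, y2 * scale - crop_y
        # Clip to the window
        nx1, ny1 = max(0.0, nx1), max(0.0, ny1)
        nx2, ny2 = min(float(crop_w), nx2), min(float(crop_h), ny2)
        if nx2 - nx1 >= min_size and ny2 - ny1 >= min_size:
            out.append((nx1, ny1, nx2, ny2))
    return out


# In training, the scale and crop offsets would be drawn at random for each image
rng = random.Random(1)
scale = rng.uniform(0.5, 1.5)
print(scale_and_crop_boxes([(100, 100, 200, 200)], scale, 0, 0, 400, 400))
```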
</sec>
</sec>
<sec id="s4"><label>4</label><title>Experiment</title>
<p>To evaluate the ship recognition performance of the improved YOLOv7 algorithm, we trained and tested it on the SeaShips dataset.</p>
<p><bold>Introduction to the dataset.</bold> The SeaShips dataset [<xref ref-type="bibr" rid="ref-35">35</xref>] was collected to train and evaluate target detection algorithms for ship detection. It contains 7000 images with a resolution of 1920&#x2009;&#x00D7;&#x2009;1080 pixels, each annotated with the exact ship category and bounding box, as shown in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>. The images were captured by an on-site video surveillance system deployed around Hengqin Island, Zhuhai, China, and cover different ship types, hull sections, scales, viewing angles, lighting conditions and degrees of occlusion in various complex environments.</p>
<fig id="fig-2"><label>Figure 2</label><caption><title>Different ship categories in SeaShips</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_39929-fig-2.tif"/></fig>
<p><bold>Experimental equipment.</bold> The model was trained on the following platform: OS, Windows 10; GPU, NVIDIA GeForce RTX 3080 Ti; CPU, 8&#x00D7; Xeon E5-2686 v4; deep learning framework, PyTorch 1.9.0 with CUDA 11.1.</p>
<p><bold>Experimental parameter settings.</bold> Because of the limitations of the experimental equipment, we scaled the width and height of the input images to half of the original size, i.e., 960&#x2009;&#x00D7;&#x2009;540 pixels. We used the SGD optimizer with a learning rate of 0.01, momentum of 0.9, weight decay of 0.0005 and a batch size of 16, and trained for 200 epochs, alternating 10 training epochs with 1 test epoch.</p>
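<p>To make these settings concrete, the sketch below writes out one step of SGD with momentum and weight decay in the form used by PyTorch&#x2019;s torch.optim.SGD with its default dampening of 0 and without Nesterov momentum; we assume the authors used these defaults. It also lists the epochs after which evaluation runs, assuming that alternating 10 training epochs with 1 test epoch means evaluating every 10 epochs. Plain Python floats stand in for tensors, and the function names are hypothetical.</p>

```python
def sgd_step(params, grads, bufs, lr=0.01, momentum=0.9, weight_decay=0.0005):
    """One SGD step with momentum and L2 weight decay (no dampening, no Nesterov).

    params, grads and bufs are equal-length lists of floats; bufs holds the
    momentum buffers (None before the first step). Returns (params, bufs).
    """
    new_params, new_bufs = [], []
    for p, g, b in zip(params, grads, bufs):
        d = g + weight_decay * p                   # weight decay is added to the gradient
        b = d if b is None else momentum * b + d   # first step initialises the buffer
        new_params.append(p - lr * b)
        new_bufs.append(b)
    return new_params, new_bufs


def eval_epochs(total_epochs=200, eval_every=10):
    """Epochs (counted from 1) after which the model is evaluated on the test set."""
    return list(range(eval_every, total_epochs + 1, eval_every))


p, b = sgd_step([1.0], [0.5], [None])
print(p)                                        # parameter after one step
print(eval_epochs()[:3], len(eval_epochs()))    # [10, 20, 30] 20
```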
<p><bold>Assessment methods.</bold> Evaluation metrics are essential for assessing the performance of a model. In this paper, Precision, Recall, and mAP (the mean of the average precision over all classes) are used as evaluation metrics. All three metrics range from 0 to 1, and values closer to 1 indicate better detection accuracy and model performance. Specifically, the experiments report mAP@0.5, the mean average precision over all classes when a detection is counted as correct only if its Intersection over Union (IoU) with a ground-truth box exceeds 0.5. We use precision rather than accuracy because precision is often more suitable for evaluating models on imbalanced datasets, where positive and negative samples are unequal in number. In such cases accuracy can be misleading: it may be high simply because negative samples are abundant, even when the model identifies positive samples poorly.
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:mrow><mml:mtext mathvariant="italic">Precision</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>P</mml:mi></mml:mrow></mml:mfrac></mml:math></disp-formula>
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:mrow><mml:mtext mathvariant="italic">Recall</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
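<p>Counting TP and FP for mAP@0.5 requires matching detections to ground-truth boxes by IoU. The following sketch assumes the PASCAL VOC convention: detections of one class are processed in descending confidence order, each is compared with the ground-truth box of highest IoU, and it is a TP only if that IoU exceeds 0.5 and the ground truth has not already been matched; otherwise it is an FP. Unmatched ground truths are the FN. The function names are hypothetical.</p>

```python
def iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def match_detections(dets, gts, thr=0.5):
    """Label one class's detections in one image as TP (True) or FP (False).

    dets: list of (score, box); gts: list of ground-truth boxes.
    Returns (flags in descending-score order, number of ground truths).
    """
    used = [False] * len(gts)
    flags = []
    for _score, box in sorted(dets, key=lambda d: d[0], reverse=True):
        ious = [iou(box, gt) for gt in gts]
        # Index of the ground truth with the highest IoU (-1 if the image has none)
        j = max(range(len(gts)), key=lambda k: ious[k]) if gts else -1
        if j >= 0 and ious[j] > thr and not used[j]:
            used[j] = True          # each ground truth can be matched only once
            flags.append(True)
        else:
            flags.append(False)     # low IoU, no ground truth, or duplicate detection
    return flags, len(gts)


gts = [(0, 0, 10, 10)]
dets = [(0.7, (50, 50, 60, 60)), (0.9, (0, 0, 10, 10)), (0.8, (1, 1, 10, 10))]
print(match_detections(dets, gts))  # ([True, False, False], 1)
```

<p>The second detection overlaps the ship well (IoU 0.81) but is a duplicate, so it counts as an FP. To evaluate a whole dataset, the (score, flag) pairs from all images are pooled and re-sorted by score before precision and recall are computed.</p>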
<p>In the above equations, TP (true positives) is the number of samples whose true class is positive and that are predicted as positive; FP (false positives) is the number of samples whose true class is negative but that are predicted as positive; and FN (false negatives) is the number of samples whose true class is positive but that are predicted as negative. The average precision (AP) of a class is the area under its precision-recall curve, plotted with recall and precision as the two axes of a Cartesian coordinate system, and mAP is the mean of the AP values of all classes.</p>
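<p>The sketch below shows how Eqs. (1) and (2) and the area under the P-R curve can be computed from a ranked list of TP/FP flags, such as those produced by IoU matching. It assumes all-point interpolation as in the PASCAL VOC 2010 and later protocol, where precision is first made monotonically non-increasing; the paper does not state which interpolation it uses, and the function names are hypothetical.</p>

```python
def precision_recall(tp_flags, num_gt):
    """Cumulative precision and recall for detections sorted by descending score."""
    precisions, recalls = [], []
    tp = fp = 0
    for is_tp in tp_flags:
        if is_tp:
            tp += 1
        else:
            fp += 1
        precisions.append(tp / (tp + fp))                # Eq. (1)
        recalls.append(tp / num_gt if num_gt else 0.0)   # Eq. (2): TP + FN = number of ground truths
    return precisions, recalls


def average_precision(precisions, recalls):
    """Area under the P-R curve with all-point interpolation."""
    # Sentinel points at recall 0 and recall 1
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]
    # Make precision monotonically non-increasing, sweeping from right to left
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])
    # Sum rectangle areas; steps where recall does not change contribute zero
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


def mean_ap(per_class_ap):
    """mAP: unweighted mean of the per-class AP values."""
    return sum(per_class_ap.values()) / len(per_class_ap)


p, r = precision_recall([True, True, False, True], num_gt=4)
print(average_precision(p, r))                  # 0.6875
print(mean_ap({"class_a": 0.5, "class_b": 1.0}))  # 0.75
```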
</sec>
<sec id="s5"><label>5</label><title>Result and Discussion</title>
<p>The changes in Recall, Precision and mAP during training are shown in <xref ref-type="fig" rid="fig-3">Fig. 3a</xref>, <xref ref-type="fig" rid="fig-3">Fig. 3b</xref> and <xref ref-type="fig" rid="fig-3">Fig. 3c</xref>, respectively. Precision and Recall fluctuated considerably during training, owing to some labeling errors and the uneven data distribution in the experimental dataset; after about 100 epochs, these fluctuations gradually decreased. The fluctuation in mAP gradually decreased after about 50 epochs. After 200 epochs of training, the final mAP of the improved YOLOv7 model was 90.15&#x0025;.</p>
<fig id="fig-3"><label>Figure 3</label><caption><title>Improved YOLOv7 performance evaluation metrics</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_39929-fig-3.tif"/></fig>
<p><xref ref-type="fig" rid="fig-4">Fig. 4</xref> presents the precision-recall (P-R) curves of each model for ship identification. The area under the P-R curve of the improved YOLOv7 model is clearly larger than those of the other two models. Since the area enclosed by a class&#x2019;s P-R curve and the coordinate axes is its average precision (AP), and mAP is the mean of these values over all classes, a larger area indicates better detection performance. These results therefore show that the improved YOLOv7 outperforms the other models in ship identification.</p>
<fig id="fig-4"><label>Figure 4</label><caption><title>P-R curve comparison</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_39929-fig-4.tif"/></fig>
<p><xref ref-type="fig" rid="fig-5">Fig. 5</xref> displays the per-category AP and the mAP of the three models for six ship categories. As shown in the figure, the improved YOLOv7 algorithm reaches an mAP of 90.15&#x0025;, with a particularly high AP of 91.89&#x0025; for small fishing boats, higher than those of the SSD and original YOLOv7 models. This result highlights the advantages of the improved YOLOv7 model in inland waterway ship identification and suggests that it can meet the detection needs of intelligent ship navigation.</p>
<fig id="fig-5"><label>Figure 5</label><caption><title>mAP for each ship category under different algorithms</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_39929-fig-5.tif"/></fig>
<p><xref ref-type="fig" rid="fig-6">Fig. 6</xref> shows some ship detection results obtained by the improved YOLOv7 algorithm. The algorithm accurately detects different types of ships regardless of their size: in <xref ref-type="fig" rid="fig-6">Figs. 6b</xref> and <xref ref-type="fig" rid="fig-6">6d</xref>, it identifies both large and small ships with high confidence. Moreover, it can detect multiple targets in a single image and identify the category of each target separately, as shown in <xref ref-type="fig" rid="fig-6">Fig. 6f</xref>. These examples illustrate the effectiveness and robustness of the proposed algorithm in detecting ships in complex environments.</p>
<fig id="fig-6"><label>Figure 6</label><caption><title>Ship detection results of the improved YOLOv7</title></caption><graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_39929-fig-6.tif"/></fig>
</sec>
<sec id="s6"><label>6</label><title>Conclusion</title>
<p>In summary, the proposed improved YOLOv7 model, with anchor boxes designed for ships, a multi-scale feature fusion module, and enhanced data preprocessing using data augmentation and balanced sampling, shows clear improvements in ship detection and recognition. The experimental results on the SeaShips dataset demonstrate that the improved YOLOv7 model outperforms the SSD and original YOLOv7 models in detection accuracy, reaching a final mAP of 90.15&#x0025;. The model achieves a particularly high AP of 91.89&#x0025; for small fishing boats, making it a suitable option for inland waterway ship identification. In addition, the algorithm accurately detects ships of different types and sizes in complex environments and can identify multiple targets in a single image with high confidence. These findings highlight the potential of the improved YOLOv7 model for practical applications in the maritime domain.</p>
</sec>
</body>
<back>
<sec><title>Funding Statement</title>
<p>This work is supported by the Key R &#x0026; D Project of Hainan Province (Grant No. ZDYF2022GXJS348, ZDYF2022SHFZ039).</p></sec>
<sec sec-type="COI-statement"><title>Conflicts of Interest</title>
<p>The authors declare that they have no conflicts of interest to report regarding the present study.</p></sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Ren</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Yang</surname></string-name>, <string-name><given-names>Q.</given-names> <surname>Zhang</surname></string-name> and <string-name><given-names>Z.</given-names> <surname>Guo</surname></string-name></person-group>, &#x201C;<article-title>Ship recognition based on Hu invariant moments and convolutional neural network for video surveillance</article-title>,&#x201D; <source>Multimedia Tools and Applications</source>, vol. <volume>80</volume>, pp. <fpage>1343</fpage>&#x2013;<lpage>1373</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Yuan</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Weng</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>Yang</surname></string-name></person-group>, &#x201C;<article-title>A high resolution optical satellite image dataset for ship recognition and some new baselines</article-title>,&#x201D; in <conf-name>Proc. ICPRAM</conf-name>, <conf-loc>Porto, Portugal</conf-loc>, pp. <fpage>324</fpage>&#x2013;<lpage>331</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Zhan</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Tan</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Gao</surname></string-name> and <string-name><given-names>R.</given-names> <surname>&#x017D;upan</surname></string-name></person-group>, &#x201C;<article-title>Comparison of two deep learning methods for ship target recognition with optical remotely sensed data</article-title>,&#x201D; <source>Neural Computing and Applications</source>, vol. <volume>33</volume>, pp. <fpage>4639</fpage>&#x2013;<lpage>4649</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>N.</given-names> <surname>MacAulay</surname></string-name></person-group>, &#x201C;<article-title>Molecular mechanisms of brain water transport</article-title>,&#x201D; <source>Nature Reviews Neuroscience</source>, vol. <volume>22</volume>, no. <issue>6</issue>, pp. <fpage>326</fpage>&#x2013;<lpage>344</lpage>, <year>2021</year>; <pub-id pub-id-type="pmid">33846637</pub-id></mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>W.</given-names> <surname>Xu</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Xing</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Wu</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Cui</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Efficient water transport and solar steam generation via radially, hierarchically structured aerogels</article-title>,&#x201D; <source>ACS Nano</source>, vol. <volume>13</volume>, no. <issue>7</issue>, pp. <fpage>7930</fpage>&#x2013;<lpage>7938</lpage>, <year>2019</year>; <pub-id pub-id-type="pmid">31241310</pub-id></mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Fournier</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Casey Hilliard</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Rezaee</surname></string-name> and <string-name><given-names>R.</given-names> <surname>Pelot</surname></string-name></person-group>, &#x201C;<article-title>Past, present, and future of the satellite-based automatic identification system: Areas of applications (2004&#x2013;2016)</article-title>,&#x201D; <source>WMU Journal of Maritime Affairs</source>, vol. <volume>17</volume>, pp. <fpage>311</fpage>&#x2013;<lpage>345</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Yang</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Wu</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Jia</surname></string-name> and <string-name><given-names>K. X.</given-names> <surname>Li</surname></string-name></person-group>, &#x201C;<article-title>How big data enriches maritime research&#x2013;a critical review of automatic identification system (AIS) data applications</article-title>,&#x201D; <source>Transport Reviews</source>, vol. <volume>39</volume>, no. <issue>6</issue>, pp. <fpage>755</fpage>&#x2013;<lpage>773</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>B.</given-names> <surname>Hu</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>Q.</given-names> <surname>Jing</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Lyu</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>Yin</surname></string-name></person-group>, &#x201C;<article-title>Estimation of berthing state of maritime autonomous surface ships based on 3D LiDAR</article-title>,&#x201D; <source>Ocean Engineering</source>, vol. <volume>251</volume>, Art. no. <fpage>111131</fpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Patoliya</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Mewada</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Hassaballah</surname></string-name>, <string-name><given-names>M. A.</given-names> <surname>Khan</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Kadry</surname></string-name></person-group>, &#x201C;<article-title>A robust autonomous navigation and mapping system based on GPS and LiDAR data for unconstraint environment</article-title>,&#x201D; <source>Earth Science Informatics</source>, vol. <volume>15</volume>, pp. <fpage>2703</fpage>&#x2013;<lpage>2715</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Hao</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Zheng</surname></string-name> and <string-name><given-names>Z.</given-names> <surname>Han</surname></string-name></person-group>, &#x201C;<article-title>Automatic generation of water route based on AIS big data and ECDIS</article-title>,&#x201D; <source>Arabian Journal of Geosciences</source>, vol. <volume>14</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>8</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Q.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Zhong</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Shi</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Zhao</surname></string-name> and <string-name><given-names>C.</given-names> <surname>Bai</surname></string-name></person-group>, &#x201C;<article-title>Route planning and tracking for ships based on the ECDIS platform</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>9</volume>, pp. <fpage>71754</fpage>&#x2013;<lpage>71762</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Leclerc</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Tharmarasa</surname></string-name>, <string-name><given-names>M. C.</given-names> <surname>Florea</surname></string-name>, <string-name><given-names>A. C.</given-names> <surname>Boury-Brisset</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Kirubarajan</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Ship classification using deep learning techniques for maritime target tracking</article-title>,&#x201D; in <conf-name>Proc. FUSION</conf-name>, <conf-loc>Salamanca, Spain</conf-loc>, pp. <fpage>737</fpage>&#x2013;<lpage>744</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Cheng</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Zhang</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Deep learning for autonomous ship-oriented small ship detection</article-title>,&#x201D; <source>Safety Science</source>, vol. <volume>130</volume>, Art. no. <fpage>104812</fpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Ren</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Li</surname></string-name> and <string-name><given-names>H.</given-names> <surname>Xu</surname></string-name></person-group>, &#x201C;<article-title>A deep learning model to extract ship size from Sentinel-1 SAR images</article-title>,&#x201D; <source>IEEE Transactions on Geoscience and Remote Sensing</source>, vol. <volume>60</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>14</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Ke</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Xu</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>HOG-ShipCLSNet: A novel deep learning network with HOG feature fusion for SAR ship classification</article-title>,&#x201D; <source>IEEE Transactions on Geoscience and Remote Sensing</source>, vol. <volume>60</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>22</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Girshick</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Donahue</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Darrell</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Malik</surname></string-name></person-group>, &#x201C;<article-title>Rich feature hierarchies for accurate object detection and semantic segmentation</article-title>,&#x201D; in <conf-name>Proc. CVPR</conf-name>, <conf-loc>Columbus, OH, USA</conf-loc>, pp. <fpage>580</fpage>&#x2013;<lpage>587</lpage>, <year>2014</year>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>P.</given-names> <surname>Sermanet</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Eigen</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Mathieu</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Fergus</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>OverFeat: Integrated recognition, localization and detection using convolutional networks</article-title>,&#x201D; in <conf-name>Proc. ICLR</conf-name>, <conf-loc>Scottsdale, AZ, USA</conf-loc>, <year>2013</year>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J. R.</given-names> <surname>Uijlings</surname></string-name>, <string-name><given-names>K. E. A.</given-names> <surname>Van De Sande</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Gevers</surname></string-name> and <string-name><given-names>A. W. M.</given-names> <surname>Smeulders</surname></string-name></person-group>, &#x201C;<article-title>Selective search for object recognition</article-title>,&#x201D; <source>International Journal of Computer Vision</source>, vol. <volume>104</volume>, pp. <fpage>154</fpage>&#x2013;<lpage>171</lpage>, <year>2013</year>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>He</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Ren</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Sun</surname></string-name></person-group>, &#x201C;<article-title>Spatial pyramid pooling in deep convolutional networks for visual recognition</article-title>,&#x201D; <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>, vol. <volume>37</volume>, no. <issue>9</issue>, pp. <fpage>1904</fpage>&#x2013;<lpage>1916</lpage>, <year>2015</year>; <pub-id pub-id-type="pmid">26353135</pub-id></mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Girshick</surname></string-name></person-group>, &#x201C;<article-title>Fast R-CNN</article-title>,&#x201D; in <conf-name>Proc. ICCV</conf-name>, <conf-loc>Santiago, Chile</conf-loc>, pp. <fpage>1440</fpage>&#x2013;<lpage>1448</lpage>, <year>2015</year>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Ren</surname></string-name>, <string-name><given-names>K.</given-names> <surname>He</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Girshick</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Sun</surname></string-name></person-group>, &#x201C;<article-title>Faster R-CNN: Towards real-time object detection with region proposal networks</article-title>,&#x201D; <source>Advances in Neural Information Processing Systems</source>, vol. <volume>28</volume>, pp. <fpage>1137</fpage>&#x2013;<lpage>1149</lpage>, <year>2015</year>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>Q.</given-names> <surname>Fan</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Zhuo</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Tang</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>Tai</surname></string-name></person-group>, &#x201C;<article-title>Few-shot object detection with attention-RPN and multi-relation detector</article-title>,&#x201D; in <conf-name>Proc. CVPR</conf-name>, <conf-loc>Long Beach, CA, USA</conf-loc>, pp. <fpage>4013</fpage>&#x2013;<lpage>4022</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>K.</given-names> <surname>He</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Gkioxari</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Doll&#x00E1;r</surname></string-name> and <string-name><given-names>R.</given-names> <surname>Girshick</surname></string-name></person-group>, &#x201C;<article-title>Mask R-CNN</article-title>,&#x201D; in <conf-name>Proc. ICCV</conf-name>, <conf-loc>Venice, Italy</conf-loc>, pp. <fpage>2961</fpage>&#x2013;<lpage>2969</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Redmon</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Divvala</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Girshick</surname></string-name> and <string-name><given-names>A.</given-names> <surname>Farhadi</surname></string-name></person-group>, &#x201C;<article-title>You only look once: Unified, real-time object detection</article-title>,&#x201D; in <conf-name>Proc. CVPR</conf-name>, <conf-loc>Las Vegas, NV, USA</conf-loc>, pp. <fpage>779</fpage>&#x2013;<lpage>788</lpage>, <year>2016</year>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Redmon</surname></string-name> and <string-name><given-names>A.</given-names> <surname>Farhadi</surname></string-name></person-group>, &#x201C;<article-title>YOLO9000: Better, faster, stronger</article-title>,&#x201D; in <conf-name>Proc. CVPR</conf-name>, <conf-loc>Honolulu, HI, USA</conf-loc>, pp. <fpage>7263</fpage>&#x2013;<lpage>7271</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Redmon</surname></string-name> and <string-name><given-names>A.</given-names> <surname>Farhadi</surname></string-name></person-group>, &#x201C;<article-title>YOLOv3: An incremental improvement</article-title>,&#x201D; <comment>arXiv preprint arXiv:1804.02767</comment>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Bochkovskiy</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Wang</surname></string-name> and <string-name><given-names>H. Y.</given-names> <surname>Mark Liao</surname></string-name></person-group>, &#x201C;<article-title>YOLOv4: Optimal speed and accuracy of object detection</article-title>,&#x201D; <comment>arXiv preprint arXiv:2004.10934</comment>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Jiang</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Weng</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Geng</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>YOLOv6: A single-stage object detection framework for industrial applications</article-title>,&#x201D; <comment>arXiv preprint arXiv:2209.02976</comment>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Bochkovskiy</surname></string-name> and <string-name><given-names>H. Y.</given-names> <surname>Mark Liao</surname></string-name></person-group>, &#x201C;<article-title>YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors</article-title>,&#x201D; <comment>arXiv preprint arXiv:2207.02696</comment>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>W.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Anguelov</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Erhan</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Szegedy</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Reed</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>SSD: Single shot multibox detector</article-title>,&#x201D; in <conf-name>Proc. ECCV</conf-name>, <conf-loc>Amsterdam, The Netherlands</conf-loc>, <conf-date>October 11&#x2013;14, 2016</conf-date>, <publisher-name>Springer</publisher-name>, <comment>Part I</comment>, pp. <fpage>21</fpage>&#x2013;<lpage>37</lpage>, <year>2016</year>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>P.</given-names> <surname>Zhou</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Ni</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Geng</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Hu</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>Xu</surname></string-name></person-group>, &#x201C;<article-title>Scale-transferrable object detection</article-title>,&#x201D; in <conf-name>Proc. CVPR</conf-name>, <conf-loc>Salt Lake City, UT, USA</conf-loc>, pp. <fpage>528</fpage>&#x2013;<lpage>537</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>N.</given-names> <surname>Dvornik</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Shmelkov</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Mairal</surname></string-name> and <string-name><given-names>C.</given-names> <surname>Schmid</surname></string-name></person-group>, &#x201C;<article-title>BlitzNet: A real-time deep network for scene understanding</article-title>,&#x201D; in <conf-name>Proc. ICCV</conf-name>, <conf-loc>Venice, Italy</conf-loc>, pp. <fpage>4154</fpage>&#x2013;<lpage>4162</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-33"><label>[33]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>Q.</given-names> <surname>Zhao</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Sheng</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Tang</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Chen</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>M2Det: A single-shot object detector based on multi-level feature pyramid network</article-title>,&#x201D; in <conf-name>Proc. AAAI</conf-name>, <conf-loc>Honolulu, HI, USA</conf-loc>, vol. <volume>33</volume>, pp. <fpage>9259</fpage>&#x2013;<lpage>9266</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-34"><label>[34]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Kong</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Sun</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Tan</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Liu</surname></string-name> and <string-name><given-names>W.</given-names> <surname>Huang</surname></string-name></person-group>, &#x201C;<article-title>Deep feature pyramid reconfiguration for object detection</article-title>,&#x201D; in <conf-name>Proc. ECCV</conf-name>, <conf-loc>Munich, Germany</conf-loc>, pp. <fpage>169</fpage>&#x2013;<lpage>185</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-35"><label>[35]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Shao</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Wu</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Du</surname></string-name> and <string-name><given-names>C.</given-names> <surname>Li</surname></string-name></person-group>, &#x201C;<article-title>SeaShips: A large-scale precisely annotated dataset for ship detection</article-title>,&#x201D; <source>IEEE Transactions on Multimedia</source>, vol. <volume>20</volume>, no. <issue>10</issue>, pp. <fpage>2593</fpage>&#x2013;<lpage>2604</lpage>, <year>2018</year>.</mixed-citation></ref>
</ref-list>
</back>
</article>