<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">72692</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2025.072692</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Advanced Video Processing and Data Transmission Technology for Unmanned Ground Vehicles in the Internet of Battlefield Things (IoBT)</article-title>
<alt-title alt-title-type="left-running-head">Advanced Video Processing and Data Transmission Technology for Unmanned Ground Vehicles in the Internet of Battlefield Things (IoBT)</alt-title>
<alt-title alt-title-type="right-running-head">Advanced Video Processing and Data Transmission Technology for Unmanned Ground Vehicles in the Internet of Battlefield Things (IoBT)</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author">
<name name-style="western"><surname>Liu</surname><given-names>Tai</given-names></name><xref ref-type="aff" rid="aff-1">1</xref><xref ref-type="aff" rid="aff-2">2</xref></contrib>
<contrib id="author-2" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Ye</surname><given-names>Mao</given-names></name><xref ref-type="aff" rid="aff-2">2</xref><email>yemao@ieee.org</email></contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western"><surname>Wu</surname><given-names>Feng</given-names></name><xref ref-type="aff" rid="aff-3">3</xref></contrib>
<contrib id="author-4" contrib-type="author">
<name name-style="western"><surname>Zhu</surname><given-names>Chao</given-names></name><xref ref-type="aff" rid="aff-2">2</xref></contrib>
<contrib id="author-5" contrib-type="author">
<name name-style="western"><surname>Chen</surname><given-names>Bo</given-names></name><xref ref-type="aff" rid="aff-2">2</xref></contrib>
<contrib id="author-6" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Zhang</surname><given-names>Guoyan</given-names></name><xref ref-type="aff" rid="aff-1">1</xref><email>guoyanzhang@sdu.edu.cn</email></contrib>
<aff id="aff-1"><label>1</label><institution>School of Cyber Science and Technology, Shandong University</institution>, <addr-line>Qingdao, 266237</addr-line>, <country>China</country></aff>
<aff id="aff-2"><label>2</label><institution>Wuhan Zhongyuan Communication Co., Ltd., China Electronics Corporation</institution>, <addr-line>Wuhan, 430205</addr-line>, <country>China</country></aff>
<aff id="aff-3"><label>3</label><institution>System Engineering Institute, Academy of Military Sciences</institution>, <addr-line>Beijing, 100141</addr-line>, <country>China</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Authors: Mao Ye. Email: <email>yemao@ieee.org</email>; Guoyan Zhang. Email: <email>guoyanzhang@sdu.edu.cn</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2026</year>
</pub-date>
<pub-date date-type="pub" publication-format="electronic">
<day>12</day><month>1</month><year>2026</year>
</pub-date>
<volume>86</volume>
<issue>3</issue>
<elocation-id>38</elocation-id>
<history>
<date date-type="received">
<day>01</day>
<month>09</month>
<year>2025</year>
</date>
<date date-type="accepted">
<day>10</day>
<month>10</month>
<year>2025</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2025 The Authors.</copyright-statement>
<copyright-year>2025</copyright-year>
<copyright-holder>Published by Tech Science Press.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_72692.pdf"></self-uri>
<abstract>
<p>With the continuous advancement of unmanned technology in various application domains, the development and deployment of blind-spot-free panoramic video systems have gained increasing importance. Such systems are particularly critical in battlefield environments, where advanced panoramic video processing and wireless communication technologies are essential to enable remote control and autonomous operation of unmanned ground vehicles (UGVs). However, conventional video surveillance systems suffer from several limitations, including limited field of view, high processing latency, low reliability, excessive resource consumption, and significant transmission delays. These shortcomings impede the widespread adoption of UGVs in battlefield settings. To overcome these challenges, this paper proposes a novel multi-channel video capture and stitching system designed for real-time video processing. The system integrates the Speeded-Up Robust Features (SURF) algorithm and the Fast Library for Approximate Nearest Neighbors (FLANN) algorithm to execute essential operations such as feature detection, descriptor computation, image matching, homography estimation, and seamless image fusion. The fused panoramic video is then encoded and assembled to produce a seamless output devoid of stitching artifacts and shadows. Furthermore, H.264 video compression is employed to reduce the data size of the video stream without sacrificing visual quality. Using the Real-Time Streaming Protocol (RTSP), the compressed stream is transmitted efficiently, supporting real-time remote monitoring and control of UGVs in dynamic battlefield environments. Experimental results indicate that the proposed system achieves high stability, flexibility, and low latency. With a wireless link latency of 30 ms, the end-to-end video transmission latency remains around 140 ms, enabling smooth video communication. 
The system can tolerate packet loss rates (PLR) of up to 20% while maintaining usable video quality (with latency around 200 ms). These properties make it well-suited for mobile communication scenarios demanding high real-time video performance.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Unmanned ground vehicle (UGV) communication</kwd>
<kwd>video compression</kwd>
<kwd>packet loss rate (PLR)</kwd>
<kwd>video latency</kwd>
<kwd>video quality</kwd>
</kwd-group>
<funding-group>
<award-group id="awg1">
<funding-source>National Natural Science Foundation of China</funding-source>
<award-id>72334003</award-id>
</award-group>
<award-group id="awg2">
<funding-source>National Key Research and Development Program of China</funding-source>
<award-id>2022YFB2702804</award-id>
</award-group>
<award-group id="awg3">
<funding-source>Shandong Key Research and Development Program</funding-source>
<award-id>2020ZLYS09</award-id>
</award-group>
<award-group id="awg4">
<funding-source>Jinan Program</funding-source>
<award-id>2021GXRC084-2</award-id>
</award-group>
</funding-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<sec id="s1_1">
<label>1.1</label>
<title>Background</title>
<p>With the rapid advancements in communication and multimedia technologies, video surveillance systems have become integral across various industries and sectors [<xref ref-type="bibr" rid="ref-1">1</xref>,<xref ref-type="bibr" rid="ref-2">2</xref>]. For instance, video surveillance systems deployed on unmanned vehicles are tasked with analyzing and processing visual data captured during operation [<xref ref-type="bibr" rid="ref-3">3</xref>]. However, traditional video surveillance systems for UGVs typically rely on single-channel video acquisition methods. These approaches are inadequate for capturing, processing, and analyzing panoramic images surrounding the vehicle. Additionally, they suffer from significant video transmission and processing latency, which severely impacts the real-time user experience [<xref ref-type="bibr" rid="ref-4">4</xref>]. Consequently, the development of techniques to obtain panoramic video images while minimizing processing and transmission latency has emerged as a critical area of research in recent years.</p>
<p>As shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>, the use of unmanned vehicles for video communication has been widely adopted today [<xref ref-type="bibr" rid="ref-5">5</xref>]. Unlike video processing and communication systems in the civilian domain [<xref ref-type="bibr" rid="ref-6">6</xref>], video surveillance systems in tactical scenarios must capture more comprehensive battlefield information while minimizing latency.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>Typical communication scenarios for unmanned vehicles.</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_72692-fig-1.tif"/>
</fig>
<p>For instance, to minimize casualties during military operations, frontline troops often rely on unmanned vehicles for reconnaissance and operations over the frontline terrain. However, a single-direction video perspective fails to provide comprehensive, all-directional video coverage, making it difficult to accurately identify hostile targets, particularly when operating deep within enemy territory. Therefore, all-directional information is essential for accurate decision-making. Typically, unmanned aerial vehicles (UAVs) are used to monitor terrain and enemy formations. However, their wide field of view makes them easily detectable and identifiable [<xref ref-type="bibr" rid="ref-7">7</xref>], limiting their applicability in certain scenarios. For instance, in environments such as mountainous regions or urban warfare, the onboard video surveillance systems of UGVs offer significant advantages. Similarly, panoramic video technology plays a vital role in automotive driver-assistance systems. With the rapid growth in vehicle numbers and lagging development of urban traffic infrastructure, traffic conditions have deteriorated, contributing to a rising number of accidents annually. Many of these accidents result from blind spots and drivers&#x2019; inaccurate perception of inter-vehicle distances. A reliable real-time video transmission system can significantly enhance vehicular safety by mitigating these issues.</p>
</sec>
<sec id="s1_2">
<label>1.2</label>
<title>Motivation and Contributions</title>
<p>This paper investigates the performance of panoramic video stitching and compression technologies in the context of video transmission and communication for UGVs. When an unmanned vehicle operates via a wireless network, its panoramic video mosaicking system employs multiple fixed cameras to capture the same scene from various perspectives [<xref ref-type="bibr" rid="ref-8">8</xref>&#x2013;<xref ref-type="bibr" rid="ref-11">11</xref>]. A critical research challenge lies not only in generating images with a broader field of view than individual images by leveraging inter-image correlations but also in ensuring low-latency video transmission over wireless communication networks. Moreover, in practical applications, the highly complex and variable communication environment often leads to high packet loss rates over wireless links, posing a persistent challenge. Ensuring uninterrupted video communication in highly dynamic battlefield environments with severe packet loss is therefore a critically important problem to address.</p>
<p>To address the aforementioned challenges, our paper makes several key contributions to this field, detailed as follows.
<list list-type="simple">
<list-item><label>(1)</label><p>Real-Time Video Stitching System: We developed a real-time video stitching system that integrates multi-channel video streams using feature extraction, matching, and fusion techniques. The system applies a homography matrix to perform a linear transformation on 3D homogeneous vectors, enabling efficient processing and filtering of spliced video data. This approach produces a seamless panoramic stitching video.</p></list-item>
<list-item><label>(2)</label><p>Block-Based Video Encoding Method: We proposed a block-based video encoding method for video data detection, transformation, and quantification. By leveraging hardware-based encoding and implementing coding control via the Lagrangian algorithm, the system achieves high efficiency without imposing additional CPU load. Moreover, the method ensures timely recovery from data packet loss or dislocation within the sequence set, thereby improving system robustness.</p></list-item>
<list-item><label>(3)</label><p>RTSP-Based Push Flow Server: We designed an RTSP-based streaming server to manage pre-processed video streams and optimize network transmission. This server monitors video readiness and efficiently handles streaming operations, significantly improving video transmission performance.</p></list-item>
</list></p>
<p>The remainder of the paper is organized as follows: In <xref ref-type="sec" rid="s2">Section 2</xref>, we review related work on panoramic video processing and transmission. <xref ref-type="sec" rid="s3_1">Section 3.1</xref> describes the system architecture and its components. In <xref ref-type="sec" rid="s3_3">Section 3.3</xref>, we explain the principles and methods of real-time video stitching. <xref ref-type="sec" rid="s3_4">Sections 3.4</xref> and <xref ref-type="sec" rid="s3_5">3.5</xref> introduce video encoding techniques and efficient video streaming strategies. In <xref ref-type="sec" rid="s4">Section 4</xref>, we present experimental results to validate the effectiveness of the proposed approach. Finally, <xref ref-type="sec" rid="s5">Section 5</xref> concludes the paper and discusses future directions.</p>
</sec>
</sec>
<sec id="s2">
<label>2</label>
<title>Related Work</title>
<p>With the rapid advancement of Internet technology, panoramic video coding and transmission technologies are being increasingly applied across various industries. This growth has driven researchers to explore efficient methods for video processing and transmission, which has led to the development of numerous innovative approaches.</p>
<p>Zhang et al. proposed an acceleration method for calculating optical flow in panoramic video stitching. Their method allows for computations to be performed independently and in parallel, significantly improving processing efficiency [<xref ref-type="bibr" rid="ref-12">12</xref>]. Li et al. introduced a novel flexible super-resolution-based video coding and uploading framework that enhances live video streaming quality under conditions of limited uplink network bandwidth [<xref ref-type="bibr" rid="ref-13">13</xref>]. To address the challenge of insufficient support for 360-degree panoramic videos, Zhao et al. developed a highly versatile 360-degree panoramic SLAM method based on the ORB-SLAM3 system framework [<xref ref-type="bibr" rid="ref-14">14</xref>]. Qiu et al. proposed a multi-azimuth reconstruction algorithm to address the distortion issues in panoramic images, by leveraging the imaging principles of dome cameras [<xref ref-type="bibr" rid="ref-15">15</xref>]. Woo Han et al. introduced a novel deep learning-based network for 360-degree panoramic image inpainting. This approach leverages the conversion of panoramic images from an equirectangular format to a cube map format, enabling inpainting with the effectiveness of single-image inpainting methods [<xref ref-type="bibr" rid="ref-16">16</xref>]. Additionally, Li et al. proposed a transmission scheme based on multi-view switching within the human eye&#x2019;s field of view and demonstrated its effectiveness in reducing network bandwidth usage through a live platform implementation [<xref ref-type="bibr" rid="ref-17">17</xref>].</p>
<p>In the field of video encoding, Chao et al. proposed a novel rate control framework for H.264/Advanced Video Coding (H.264/AVC)-based video coding, which enhances the preservation of gradient-based features such as Scale-Invariant Feature Transform (SIFT) or Speeded-Up Robust Features (SURF) [<xref ref-type="bibr" rid="ref-18">18</xref>]. Manel introduced a Distributed Video Coding (DVC) scheme that incorporates block classification at the decoder, combined with a new residual computing method based on modular arithmetic and simple entropy coding [<xref ref-type="bibr" rid="ref-19">19</xref>]. Wang et al. presented a joint optimization approach for transform and quantization in video coding [<xref ref-type="bibr" rid="ref-20">20</xref>]. Duong et al. proposed learned transforms and entropy coding, which can serve as (non-) linear drop-in replacements or enhancements for linear transforms in existing codecs. These learned transforms can be multi-rate, enabling a single model to function across the entire rate-distortion curve [<xref ref-type="bibr" rid="ref-21">21</xref>]. Ding et al. developed a high-efficiency Deep Feature Coding (DFC) framework that significantly reduces the bitrate of deep features in videos while maintaining retrieval accuracy [<xref ref-type="bibr" rid="ref-22">22</xref>].</p>
<p>In the area of video transmission, Chiu et al. proposed a multidimensional streaming media transmission system consisting of a control center, a client platform, and a multidimensional media producer. The system provides detailed specifications for the login, link, interaction, and logout processes [<xref ref-type="bibr" rid="ref-23">23</xref>]. Wang et al. introduced a reference-frame-cache-based surveillance video transmission system (RSVTS), which enables real-time delivery of wide-angle, high-definition surveillance video over the Internet using multiple rotatable cameras [<xref ref-type="bibr" rid="ref-24">24</xref>]. Bakirci introduced a novel system to tackle the core issues plaguing swarm UAV systems, namely constrained communication range, deficient processing capabilities for real-time tasks, network delays and failures from congestion, and limited operational endurance due to energy mismanagement. The proposed solution employs a suite of salient features, including robust communication, synergistic hardware integration, task allocation, optimized network topology, and efficient routing protocols [<xref ref-type="bibr" rid="ref-25">25</xref>]. Aloman et al. evaluated the performance of three video streaming protocols: MPEG-DASH, RTSP, and RTMP [<xref ref-type="bibr" rid="ref-26">26</xref>]; their results indicate that RTSP is more efficient than MPEG-DASH in initiating video playback, but at the expense of decreased QoE due to packet loss. Conversely, the longer pre-loading time intervals required by MPEG-DASH and RTMP help mitigate the impact of packet loss during transmission, which is reflected in a lower number of re-buffering events for these two protocols.</p>
<p>Building upon a comprehensive analysis of related work, we note that most existing studies are conducted in environments with sufficient wired bandwidth or stable wireless links. These works often do not address the challenges of video transmission in highly complex and dynamic scenarios, such as those involving network interruptions or high packet loss rates. To tackle these challenges, we approach the problem from a global perspective. First, we propose a real-time video stitching system and analyze the stitching process and image fusion algorithm. Next, we design an efficient video encoding scheme. Finally, we develop a video transmission system based on RTSP. Additionally, we investigate video transmission strategies for scenarios in which poor network connectivity results in significant packet loss. This research is particularly relevant for enhancing the surveillance and data transmission capabilities of UGVs.</p>
</sec>
<sec id="s3">
<label>3</label>
<title>Research Methodology</title>
<sec id="s3_1">
<label>3.1</label>
<title>The Architecture of the System Composition</title>
<p>The basic operational workflow of the system involves capturing video through multiple AHD (Analog High Definition) cameras, followed by stitching and encoding the footage, and finally pushing the stream to clients via the RTSP protocol. The video processing software is mainly composed of upper-layer, middle-layer, and lower-layer software. Video acquisition is performed using the V4L2 (Video for Linux 2) driver framework; the video stitching module uses the OpenCV library to stitch multiple video streams; video encoding is performed using the CedarX library; and the video streaming module is responsible for pushing the stream via the RTSP protocol. The software architecture of this hierarchical system is illustrated in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>.</p>
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>Video processing software architecture.</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_72692-fig-2.tif"/>
</fig>
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Video Capture Module</title>
<p>The video capture module is based on the Video for Linux 2 (V4L2) driver framework. V4L2 is a standard framework in the Linux kernel for video capture, which comprises a hardware-independent V4L2 driver core and hardware-specific components such as camera drivers, sensors, and other related elements. The V4L2 driver core handles the registration and management of specific camera drivers, providing a unified device file system interface for Linux user-mode applications to access camera data and control camera parameters. The camera driver is a platform-specific component responsible for video frame processing, while the sensor driver is a camera-specific module dedicated to controlling camera parameters. The video acquisition process is illustrated in <xref ref-type="fig" rid="fig-3">Fig. 3</xref>.</p>
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Video capture process.</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_72692-fig-3.tif"/>
</fig>
<p>As shown in <xref ref-type="fig" rid="fig-3">Fig. 3</xref>, the video capture process consists of several key steps. First, the video device file is opened, and its capabilities (e.g., video and audio inputs) are queried. Next, video capture parameters are configured, including the capture window size, video frame format (such as pixel format, width, and height), frame rate, and rotation. Subsequently, the driver is instructed to allocate frame buffers (typically at least three) for the video stream; the buffer length and offset are retrieved in the kernel space. These frame buffers are then mapped to the user space via memory mapping, enabling direct data access without copying. All allocated buffers are enqueued into the video capture output queue to store the incoming data. Once initiated, the video capture process proceeds as follows: the driver transfers video data to the first available buffer in the input queue and then moves that buffer to the output queue. The application dequeues a buffer containing data, processes the raw video, and returns the buffer to the input queue for reuse in a circular manner. Finally, video capture is terminated, all buffers are released, and the video device is closed.</p>
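<p>The queue/dequeue cycle described above can be modeled with a short, stdlib-only Python sketch (the CaptureRing class and its method names are illustrative stand-ins, not the actual V4L2 ioctl interface):</p>

```python
from collections import deque

class CaptureRing:
    """Toy model of the V4L2 streaming I/O cycle: buffers circulate
    between an input (free) queue and an output (filled) queue."""

    def __init__(self, num_buffers=3):
        # The driver typically allocates at least three frame buffers.
        self.input_queue = deque(range(num_buffers))   # free buffers
        self.output_queue = deque()                    # filled buffers

    def driver_fill(self, frame):
        # Driver writes a captured frame into the first free buffer,
        # then moves that buffer to the output queue.
        buf = self.input_queue.popleft()
        self.output_queue.append((buf, frame))

    def app_dequeue(self):
        # Application takes a filled buffer to process the raw frame.
        return self.output_queue.popleft()

    def app_requeue(self, buf):
        # Processed buffer returns to the input queue for reuse.
        self.input_queue.append(buf)

ring = CaptureRing()
processed = []
for frame_id in ["f0", "f1", "f2", "f3"]:
    ring.driver_fill(frame_id)
    buf, frame = ring.app_dequeue()
    processed.append(frame)       # frames arrive in capture order
    ring.app_requeue(buf)         # circular reuse of the same buffers
```

<p>Note that the real driver fills buffers asynchronously via DMA into memory-mapped user-space buffers; the strictly alternating loop here only illustrates the circular ownership handoff.</p>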
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>Video Real-time Stitching Module</title>
<sec id="s3_3_1">
<label>3.3.1</label>
<title>Stitching Process Analysis</title>
<p>Video stitching is an extension of image stitching techniques, with image stitching serving as its fundamental prerequisite [<xref ref-type="bibr" rid="ref-27">27</xref>]. Consequently, the quality of image stitching directly determines the effectiveness of the resultant video stitching. The core principle involves first extracting individual frames from the source video stream. These frames subsequently undergo a series of image stitching operations, including feature extraction, matching, and fusion. Finally, the processed frames are encoded and compressed back into a seamless video format. This complete video stitching workflow is detailed in Algorithm 1.</p>
<fig id="fig-18">
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_72692-fig-18.tif"/>
</fig>
<p>In this approach, subsequent video frames requiring real-time stitching can be processed with minimal latency, as the time-consuming steps of feature extraction and registration are significantly reduced. This method relies solely on the precomputed perspective transformation matrix for image transformation, stitching, and fusion. Furthermore, the experimental setup utilizes standard cameras for synchronized video capture, which enhances computational efficiency and improves the practical applicability of the proposed system.</p>
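<p>Because only the precomputed homography is applied per frame, the per-frame cost reduces to a warp. A minimal numpy sketch (the function name and the sample matrix are illustrative, not taken from the system) shows how mapping a frame's four corners through H determines the extent of the stitched canvas:</p>

```python
import numpy as np

def warped_bounds(H, width, height):
    """Map the four corners of a width x height frame through a
    precomputed homography H; return the bounding box (x0, y0, x1, y1)."""
    corners = np.array([[0, 0, 1],
                        [width, 0, 1],
                        [width, height, 1],
                        [0, height, 1]], dtype=float).T   # 3 x 4
    mapped = H @ corners
    mapped /= mapped[2]                 # normalize homogeneous coordinates
    xs, ys = mapped[0], mapped[1]
    return xs.min(), ys.min(), xs.max(), ys.max()

# Illustrative H: pure translation placing a 640 x 480 frame 600 px to
# the right on the panorama canvas.
H_shift = np.array([[1.0, 0.0, 600.0],
                    [0.0, 1.0, 0.0],
                    [0.0, 0.0, 1.0]])
bounds = warped_bounds(H_shift, 640, 480)   # → (600.0, 0.0, 1240.0, 480.0)
```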
</sec>
<sec id="s3_3_2">
<label>3.3.2</label>
<title>Image Homography Matrix Calculation</title>
<p>The camera is equipped with an infrared high-definition night-vision module, featuring a 6 mm lens that provides a 60-degree field of view. It supports multiple video encoding formats, including H.264, MJPEG, and YUY2 [<xref ref-type="bibr" rid="ref-28">28</xref>&#x2013;<xref ref-type="bibr" rid="ref-30">30</xref>]. This configuration ensures excellent performance in both indoor and outdoor environments, with accurate color reproduction. The physical structure of the three-camera array is illustrated in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>.</p>
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>Camera assembly structure.</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_72692-fig-4.tif"/>
</fig>
<p>As shown in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>, a system of three cameras is used to capture the images required for stitching. The principle of stitching two images involves first identifying the overlapping region between them and then extracting similar feature points from this area [<xref ref-type="bibr" rid="ref-31">31</xref>]. A series of subsequent processing steps are then applied to complete the stitching. Empirical results indicate that an overlap of 30% or more between images enables more reliable identification of feature points. For cameras with a 60-degree field of view (FOV), an overlapping angle of approximately 20 degrees between adjacent cameras yields optimal results. In <xref ref-type="fig" rid="fig-4">Fig. 4</xref>, this overlapping region is represented by the intersection of the red and blue dashed lines.</p>
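<p>The stated geometry can be checked with two lines of arithmetic: a 20-degree overlap between 60-degree cameras shares about a third of each image, comfortably above the 30% guideline (the 140-degree total coverage below is derived here from the figures above, not stated in the text):</p>

```python
fov = 60        # per-camera field of view, degrees
overlap = 20    # overlap between adjacent cameras, degrees
cameras = 3

overlap_fraction = overlap / fov                          # ~0.33, above the 30% guideline
total_coverage = cameras * fov - (cameras - 1) * overlap  # 140 degrees covered by the array
```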
<p>The geometric transformation for projecting a point from one image plane to another can be modeled as a linear transformation of 3D homogeneous coordinates using homography. This transformation is represented by a 3 &#x00D7; 3 non-singular matrix H, known as the homography matrix. As illustrated in <xref ref-type="fig" rid="fig-5">Fig. 5</xref>, this matrix enables the projection of a point from one projective plane onto another.</p>
<fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>Homography matrix transformation.</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_72692-fig-5.tif"/>
</fig>
<p>The linear transformation between two image planes is defined by the following equation.
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>H</mml:mi><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></disp-formula></p>
<p>A point in one image plane, represented in homogeneous coordinates as <italic>x</italic><sub>1</sub> &#x003D; (<italic>u</italic><sub>1</sub>, <italic>v</italic><sub>1</sub>, 1)<sup>T</sup>, is mapped to a corresponding point in the other plane, <italic>x</italic><sub>2</sub> &#x003D; (<italic>u</italic><sub>2</sub>, <italic>v</italic><sub>2</sub>, 1)<sup>T</sup>, by the following equation:
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:mrow><mml:mo>[</mml:mo><mml:mtable columnalign="left" rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>1</mml:mn></mml:mtd></mml:mtr></mml:mtable><mml:mo>]</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>H</mml:mi><mml:mrow><mml:mo>[</mml:mo><mml:mtable columnalign="left" rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>1</mml:mn></mml:mtd></mml:mtr></mml:mtable><mml:mo>]</mml:mo></mml:mrow></mml:math></disp-formula></p>
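<p>Eq. (2) can be applied directly in a few lines of numpy; for a general H, the product must be renormalized by its third homogeneous coordinate so the result is again of the form (<italic>u</italic><sub>1</sub>, <italic>v</italic><sub>1</sub>, 1). The function and the sample matrix below are illustrative:</p>

```python
import numpy as np

def map_point(H, u2, v2):
    """Project (u2, v2) between image planes via x1 = H x2 (Eqs. (1)-(2))."""
    x1 = H @ np.array([u2, v2, 1.0])
    # Renormalize so the third homogeneous coordinate is 1 again.
    return x1[0] / x1[2], x1[1] / x1[2]

# Illustrative homography: scale by 2 with a translation of (10, 20).
H = np.array([[2.0, 0.0, 10.0],
              [0.0, 2.0, 20.0],
              [0.0, 0.0, 1.0]])
u1, v1 = map_point(H, 5.0, 5.0)   # → (20.0, 30.0)
```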
<p>The homography matrix is derived through a process involving image pre-processing and registration, as detailed below.
<list list-type="simple">
<list-item><label>(1)</label><p>Image Preprocessing</p></list-item>
</list></p>
<p>Image preprocessing constitutes a fundamental and indispensable step in the image stitching pipeline [<xref ref-type="bibr" rid="ref-32">32</xref>]. Ye et al. proposed a systematic image preprocessing framework comprising resizing, denoising, downscaling, binarization, color inversion, and morphological operations. This methodology effectively enhances image quality by suppressing noise and emphasizing salient features essential for subsequent interpretation [<xref ref-type="bibr" rid="ref-33">33</xref>,<xref ref-type="bibr" rid="ref-34">34</xref>]. Variations in image acquisition conditions&#x2014;such as inconsistent illumination and differences in camera performance&#x2014;often introduce artifacts including noise and low contrast in raw images. Furthermore, disparities in shooting distance and focal length may contribute to additional geometric and photometric irregularities. Preprocessing is therefore critical to improve the accuracy and efficiency of feature extraction and matching. Common preparatory operations include grayscale conversion and spatial filtering.</p>
<p>Feature point descriptors are largely invariant to chromatic information. Therefore, converting color images to grayscale prior to processing reduces computational complexity and expedites subsequent registration and fusion.</p>
<p>Images captured by cameras are inevitably corrupted by noise during acquisition, digitization, and transmission. This degradation often leads to a reduction in image quality, adversely affecting perceptual fidelity and compromising visual performance.</p>
<p>In general, the presence of noise degrades image quality, resulting in blurring and the obscuration of salient features. This degradation impedes subsequent image analysis and reduces overall visual fidelity, making direct analysis challenging. It is therefore essential to suppress noise-induced interference, enhance meaningful signal content, and ensure consistency across image sets under uniform constraints. Commonly used filtering methods include smoothing filters, median filters, and Gaussian filters, each offering distinct advantages and limitations. The selection of an appropriate filter depends primarily on the characteristics of the noise present. For implementation, the OpenCV library is employed for image processing tasks; notably, the cvtColor() function is used for grayscale conversion.
<list list-type="simple">
<list-item><label>(2)</label><p>Feature Point Detection and Descriptor Extraction</p></list-item>
</list></p>
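<p>A minimal pure-Python sketch of the preprocessing steps above; in practice OpenCV&#x2019;s cvtColor() and medianBlur() perform these operations, and the function names and the 3 &#x00D7; 3 window here are illustrative:</p>

```python
def to_gray(rgb):
    # Luma grayscale conversion with ITU-R BT.601 weights (as in cvtColor);
    # each pixel is an (r, g, b) triple, each channel in 0..255.
    return [[round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
            for row in rgb]

def median3x3(img):
    # 3x3 median filter: effective against salt-and-pepper noise.
    # Border pixels are copied unchanged for simplicity.
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = sorted(img[y + dy][x + dx]
                            for dy in (-1, 0, 1) for dx in (-1, 0, 1))
            out[y][x] = window[4]  # median of the 9 neighborhood values
    return out
```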
<p>For feature extraction, we adopt the Speeded-Up Robust Features (SURF) algorithm [<xref ref-type="bibr" rid="ref-35">35</xref>]. The OpenCV implementation of SURF comprises six main steps: (a) construction of the Hessian matrix; (b) generation of a scale space using a Gaussian pyramid; (c) initial detection of candidate feature points via non-maximum suppression; (d) precise localization of extreme points; (e) assignment of a dominant orientation to each feature point; and (f) computation of the SURF descriptor. The initial step consists of constructing the Hessian matrix and a scale-space Gaussian pyramid. The Hessian threshold dynamically varies between 300 and 500 based on real-time network conditions; a higher threshold yields faster detection, and its initial value is set to 500. The upright parameter is set to true, meaning rotation invariance is not computed. The number of scale-space octaves (nOctaves) is set to 4. SURF employs the determinant of the Hessian matrix as its feature-response measure, which provides computational efficiency and accelerated feature detection [<xref ref-type="bibr" rid="ref-36">36</xref>]. The Hessian matrix <italic>H</italic> is formed from the second-order partial derivatives of a function <italic>f</italic>(<italic>x, y</italic>). For an image, the Hessian matrix at a given pixel is defined as shown in <xref ref-type="disp-formula" rid="eqn-3">Eq. (3)</xref>.
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:mrow><mml:mtext>H</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>f</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>y</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mtable columnalign="left left" rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mrow><mml:msup><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mrow><mml:mtext>f</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:msup><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mfrac></mml:mstyle></mml:mtd><mml:mtd><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mrow><mml:msup><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mrow><mml:mtext>f</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:mrow><mml:mtext>y</mml:mtext></mml:mrow></mml:mrow></mml:mfrac></mml:mstyle></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mrow><mml:msup><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mrow><mml:mtext>f</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:mrow><mml:mtext>y</mml:mtext></mml:mrow></mml:mrow></mml:mfrac></mml:mstyle></mml:mtd><mml:mtd><mml:mstyle displaystyle="true" 
scriptlevel="0"><mml:mfrac><mml:mrow><mml:msup><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mrow><mml:mtext>f</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:msup><mml:mrow><mml:mtext>y</mml:mtext></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mfrac></mml:mstyle></mml:mtd></mml:mtr></mml:mtable><mml:mo>]</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>The Hessian matrix is computed for each pixel, and its determinant is provided in <xref ref-type="disp-formula" rid="eqn-4">Eq. (4)</xref>.
<disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:mo movablelimits="true" form="prefix">det</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>H</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mrow><mml:mtext>f</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:msup><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mfrac><mml:mo>&#x22C5;</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mrow><mml:mtext>f</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:msup><mml:mrow><mml:mtext>y</mml:mtext></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mfrac><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mrow><mml:mtext>f</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:mrow><mml:mtext>y</mml:mtext></mml:mrow></mml:mrow></mml:mfrac><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
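<p>As a concrete illustration of Eqs. (3) and (4), the determinant can be approximated at a single pixel with second-order central differences; this is a simplified stand-in for the Gaussian-derivative filters SURF actually uses:</p>

```python
def hessian_det(img, x, y):
    # Second-order central differences approximate the partials of Eq. (3);
    # combining them gives the determinant of Eq. (4) at pixel (x, y).
    f = lambda i, j: img[j][i]
    fxx = f(x + 1, y) - 2 * f(x, y) + f(x - 1, y)
    fyy = f(x, y + 1) - 2 * f(x, y) + f(x, y - 1)
    fxy = (f(x + 1, y + 1) - f(x + 1, y - 1)
           - f(x - 1, y + 1) + f(x - 1, y - 1)) / 4
    return fxx * fyy - fxy ** 2
```

A blob-like intensity pattern yields a strongly positive determinant, while a ridge (strong curvature in only one direction) yields a response near zero, which is why the determinant discriminates well between blobs and edges.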
<p>The determinant value serves as a scalar measure derived from the Hessian matrix <italic>H</italic>. Points can be classified according to the sign of this determinant. In the SURF algorithm, the image intensity at a pixel, denoted <italic>L</italic>(<italic>x, y</italic>), corresponds to the function value <italic>f</italic>(<italic>x, y</italic>). The second-order partial derivatives are approximated using filters based on second-order Gaussian derivatives. By convolving the image with these derivative kernels, the components of the Hessian matrix (<italic>L</italic><sub><italic>xx</italic></sub>, <italic>L</italic><sub><italic>xy</italic></sub>, and <italic>L</italic><sub><italic>yy</italic></sub>) are efficiently computed, yielding the scale-dependent matrix in <xref ref-type="disp-formula" rid="eqn-5">Eq. (5)</xref>.
<disp-formula id="eqn-5"><label>(5)</label><mml:math id="mml-eqn-5" display="block"><mml:mrow><mml:mtext>H</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x03C3;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mtable columnalign="left left" rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mtext>L</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>xx</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x03C3;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:msub><mml:mrow><mml:mtext>L</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>xy</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x03C3;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mtext>L</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>xy</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x03C3;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mtd><mml:mtd><mml:msub><mml:mrow><mml:mtext>L</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>yy</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mi mathvariant="normal">&#x03C3;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable><mml:mo>]</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>To ensure scale invariance of the detected feature points, the image is filtered with a Gaussian kernel before the Hessian matrix is constructed. The resulting scale-space representation <italic>L</italic> is given in <xref ref-type="disp-formula" rid="eqn-6">Eq. (6)</xref>.
<disp-formula id="eqn-6"><label>(6)</label><mml:math id="mml-eqn-6" display="block"><mml:mrow><mml:mtext>L</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mtext>G</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x22C5;</mml:mo><mml:mrow><mml:mtext>I</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>The function <italic>L</italic>(<italic>x, t</italic>) denotes the multi-scale representation of an image, obtained by convolving the original image <italic>I</italic>(<italic>x</italic>) with a Gaussian kernel <italic>G</italic>(<italic>t</italic>), where <italic>G</italic>(<italic>t</italic>) is defined in <xref ref-type="disp-formula" rid="eqn-7">Eq. (7)</xref>.
<disp-formula id="eqn-7"><label>(7)</label><mml:math id="mml-eqn-7" display="block"><mml:mrow><mml:mtext>G</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:msup><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mrow><mml:mtext>g</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>t</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2202;</mml:mi><mml:msup><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
<p>In this context, <italic>t</italic> denotes the variance of the Gaussian function and <italic>g</italic>(<italic>t</italic>) represents the Gaussian kernel. Using this formulation, the determinant of the Hessian matrix <italic>H</italic> can be computed for every image pixel, and this response value serves to identify feature points. To improve computational efficiency, Bay et al. proposed replacing <italic>L</italic>(<italic>x</italic>, <italic>t</italic>) with a box-filter approximation. A weight is introduced to balance the error between the exact and approximate values; this weight varies across scales, and the determinant of the approximated matrix <italic>H</italic> can then be expressed as <xref ref-type="disp-formula" rid="eqn-8">Eq. (8)</xref>.
<disp-formula id="eqn-8"><label>(8)</label><mml:math id="mml-eqn-8" display="block"><mml:mo movablelimits="true" form="prefix">det</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mtext>H</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>approx</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mtext>D</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>xx</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:msub><mml:mrow><mml:mtext>D</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>yy</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mn>0.9</mml:mn><mml:msub><mml:mrow><mml:mtext>D</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>xy</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></disp-formula></p>
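<p>The response test of Eq. (8) reduces to a few arithmetic operations per pixel. A sketch, combining it with the Hessian-threshold behavior described above (the function names are illustrative):</p>

```python
def det_h_approx(dxx, dyy, dxy):
    # Eq. (8): Dxx*Dyy - (0.9*Dxy)^2, where the D's are box-filter responses
    # and 0.9 is the weight balancing the box-filter approximation error.
    return dxx * dyy - (0.9 * dxy) ** 2

def is_candidate(dxx, dyy, dxy, hessian_threshold=500):
    # Responses below the Hessian threshold (initially 500 here) are rejected;
    # raising the threshold keeps fewer points and speeds up detection.
    return det_h_approx(dxx, dyy, dxy) > hessian_threshold
```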
<p>The second step involves applying non-maximum suppression to identify candidate feature points. For each pixel, the Hessian response value is compared to those of its 26 neighbors within the three-dimensional scale-space neighborhood. The point is retained as a candidate feature point only if it represents a local extremum (either a maximum or minimum) within this region.</p>
<p>The third step entails the accurate localization of extremal points at sub-pixel precision. This is achieved through three-dimensional linear interpolation across the scale space. Subsequently, points exhibiting response values below a predefined threshold are discarded, retaining only the most salient features.</p>
<p>The fourth step involves assigning a dominant orientation to each feature point. This process begins by computing Haar wavelet responses within a circular region centered at the feature point. Specifically, for a rotating 60-degree sector window, the sums of the horizontal and vertical Haar wavelet responses are accumulated for all points within the sector. The Haar wavelet size is set to 4<italic>s</italic>, where <italic>s</italic> is the characteristic scale of the feature point, producing one vector per sector. The sector is rotated in discrete angular intervals, and the direction yielding the largest vector magnitude is selected as the dominant orientation of the feature point.</p>
<p>The fifth step involves constructing the SURF descriptor. A square region centered on the feature point&#x2014;with side length 20<italic>s</italic>, where <italic>s</italic> is the scale of the feature point&#x2014;is oriented along the dominant direction identified in the fourth step. This region is subdivided into 4 &#x00D7; 4 &#x003D; 16 sub-regions. Within each sub-region, Haar wavelet responses are computed in both the horizontal and vertical directions (aligned with the dominant orientation) at 25 sample pixels. For each sub-region, four values are collected: the sum of the horizontal Haar responses, the sum of their absolute values, the sum of the vertical responses, and the sum of their absolute values. Each feature point is thus represented by a 64-dimensional descriptor (16 &#x00D7; 4), half the length of the SIFT descriptor [<xref ref-type="bibr" rid="ref-37">37</xref>,<xref ref-type="bibr" rid="ref-38">38</xref>], leading to improved computational efficiency and significantly accelerated feature matching.
<list list-type="simple">
<list-item><label>(3)</label><p>FLANN-Based Feature Matching</p></list-item>
</list></p>
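<p>The descriptor layout of the fifth step can be sketched directly: 16 sub-regions, four sums each, giving 64 dimensions. The input responses below are illustrative placeholders for the Haar responses sampled in each sub-region:</p>

```python
def surf_descriptor(subregions):
    # subregions: 16 lists, each holding the 25 (dx, dy) Haar wavelet responses
    # sampled in one cell of the oriented 4x4 grid over the 20s x 20s window.
    assert len(subregions) == 16 and all(len(s) == 25 for s in subregions)
    desc = []
    for samples in subregions:
        dxs = [dx for dx, _ in samples]
        dys = [dy for _, dy in samples]
        # per sub-region: sum dx, sum |dx|, sum dy, sum |dy|
        desc += [sum(dxs), sum(map(abs, dxs)), sum(dys), sum(map(abs, dys))]
    return desc  # 16 sub-regions x 4 sums = 64 dimensions (half of SIFT's 128)
```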
<p>The Fast Library for Approximate Nearest Neighbors (FLANN) is extensively employed for efficient approximate nearest neighbor search in high-dimensional spaces [<xref ref-type="bibr" rid="ref-39">39</xref>]. It offers a suite of algorithms optimized for this task and can automatically select the most appropriate algorithm and optimal parameters for a given dataset. In computer vision, identifying nearest neighbors in high-dimensional feature spaces is often computationally expensive. FLANN provides a faster alternative to conventional matching algorithms for high-dimensional feature matching. After extracting feature points and computing their descriptors using the SURF algorithm, the FLANN matcher is applied to establish correspondences. Based on these matching results, the homography matrix representing the geometric transformation between images is computed.</p>
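<p>A minimal sketch of the matching stage, using an exact brute-force nearest-neighbor search with Lowe&#x2019;s ratio test in place of FLANN&#x2019;s approximate index; the ratio value 0.7 is an assumed, commonly used setting rather than one fixed by this work:</p>

```python
def match_descriptors(query, train, ratio=0.7):
    # Exact nearest-neighbour search with the ratio test; FLANN performs the
    # same search approximately (e.g., with randomized k-d trees) for speed.
    def dist2(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))
    matches = []
    for i, d in enumerate(query):
        ranked = sorted(range(len(train)), key=lambda j: dist2(d, train[j]))
        best, second = ranked[0], ranked[1]
        # Accept only if the best match is clearly closer than the runner-up,
        # filtering ambiguous correspondences before homography estimation.
        if dist2(d, train[best]) < (ratio ** 2) * dist2(d, train[second]):
            matches.append((i, best))
    return matches
```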
</sec>
<sec id="s3_3_3">
<label>3.3.3</label>
<title>Seamless Image Fusion</title>
<p>The objective of image fusion is to seamlessly integrate two images into a common coordinate system. After estimating the homography matrix, the source image is warped into the target image&#x2019;s coordinate system using a perspective transformation, implemented via OpenCV functions. The four corner points of the overlapping region are then computed, and its boundaries are determined. Within this overlapping area, the pixel values are fused by averaging the intensity values from both the warped source image and the target image. Denoting the final stitched image as <italic>I</italic>, the fusion relationship is given by <xref ref-type="disp-formula" rid="eqn-9">Eq. (9)</xref>.
<disp-formula id="eqn-9"><label>(9)</label><mml:math id="mml-eqn-9" display="block"><mml:mrow><mml:mtext>I</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>y</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mtable columnalign="left" rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mtext>I</mml:mtext></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>y</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mtext>&#x00A0;&#x00A0;&#x00A0;</mml:mtext><mml:mspace width="1em" /><mml:mspace width="1em" /><mml:mspace width="1em" /><mml:mspace width="1em" /><mml:mspace width="1em" /><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:msub><mml:mrow><mml:mtext>I</mml:mtext></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac></mml:mstyle><mml:msub><mml:mrow><mml:mtext>I</mml:mtext></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>y</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mstyle displaystyle="true" 
scriptlevel="0"><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac></mml:mstyle><mml:msub><mml:mrow><mml:mtext>I</mml:mtext></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>y</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mtext>&#x00A0;</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:msub><mml:mrow><mml:mtext>I</mml:mtext></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x2229;</mml:mo><mml:msub><mml:mrow><mml:mtext>I</mml:mtext></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mrow><mml:mtext>I</mml:mtext></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>y</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mtext>&#x00A0;&#x00A0;&#x00A0;</mml:mtext><mml:mspace width="1em" /><mml:mspace width="1em" /><mml:mspace width="1em" /><mml:mspace width="1em" /><mml:mspace width="1em" /><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:msub><mml:mrow><mml:mtext>I</mml:mtext></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow></mml:math></disp-formula></p>
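<p>The piecewise rule of Eq. (9) can be sketched as follows, with images represented as sparse pixel maps so the three regions (only <italic>I</italic><sub>1</sub>, the overlap, only <italic>I</italic><sub>2</sub>) are easy to distinguish; this representation is illustrative:</p>

```python
def fuse_average(i1, i2):
    # Eq. (9): copy pixels unique to each image; average intensities in the
    # overlap. Images are dicts mapping (x, y) -> intensity so that the two
    # supports may differ after the perspective warp.
    fused = {}
    for p in set(i1) | set(i2):
        if p in i1 and p in i2:
            fused[p] = 0.5 * i1[p] + 0.5 * i2[p]
        else:
            fused[p] = i1[p] if p in i1 else i2[p]
    return fused
```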
<p>Rather than the direct average of <xref ref-type="disp-formula" rid="eqn-9">Eq. (9)</xref>, the intensities of corresponding pixels in the overlapping region can instead be combined as a weighted average of the warped source image and the target image. Let <italic>I</italic> denote the final fused image and let <italic>I</italic><sub>1</sub> and <italic>I</italic><sub>2</sub> represent the two images to be stitched.</p>
<p>In <xref ref-type="disp-formula" rid="eqn-10">Eq. (10)</xref>, <italic>w</italic><sub>1</sub> denotes the weight assigned to pixels from the left image within the overlapping region of the stitched result, while <italic>w</italic><sub>2</sub> is the weight for the corresponding pixels from the right (target) image, satisfying <italic>w</italic><sub>1</sub> &#x002B; <italic>w</italic><sub>2</sub> &#x003D; 1 with 0 &#x003C; <italic>w</italic><sub>1</sub> &#x003C; 1 and 0 &#x003C; <italic>w</italic><sub>2</sub> &#x003C; 1. With appropriately chosen weights, the transition across the overlapping area is smooth and visible seams in the fused image are effectively eliminated.
<disp-formula id="eqn-10"><label>(10)</label><mml:math id="mml-eqn-10" display="block"><mml:mrow><mml:mtext>I</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>x</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>y</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mtable columnalign="left" rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:msub><mml:mi>I</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mtext>&#x00A0;&#x00A0;</mml:mtext><mml:mspace width="2em" /><mml:mspace width="1em" /><mml:mspace width="1em" /><mml:mspace width="1em" /><mml:mspace width="1em" /><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:msub><mml:mi>I</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mi>I</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mi>I</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mtext>&#x00A0;&#x00A0;</mml:mtext><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:msub><mml:mi>I</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x2229;</mml:mo><mml:msub><mml:mi>I</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:
mi>I</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mtext>&#x00A0;&#x00A0;</mml:mtext><mml:mspace width="2em" /><mml:mspace width="1em" /><mml:mspace width="1em" /><mml:mspace width="1em" /><mml:mspace width="1em" /><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:msub><mml:mi>I</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>Depending on the desired fusion outcome, appropriate weighting functions <italic>w</italic><sub>1</sub> and <italic>w</italic><sub>2</sub> can be selected. In this study, a gradual transition weighting scheme was employed. The two weights are determined based on the width of the overlapping region. Let <italic>W</italic> denote the total width of the overlap. Then, <italic>w</italic><sub>1</sub> decreases linearly from 1 to 0 across the region, while <italic>w</italic><sub>2</sub> increases linearly from 0 to 1, ensuring <italic>w</italic><sub>1</sub> &#x002B; <italic>w</italic><sub>2</sub> &#x003D; 1 throughout. This approach results in a smooth transition between images <italic>I</italic><sub>1</sub> and <italic>I</italic><sub>2</sub> within the overlap, effectively eliminating visible seams and achieving a natural blending effect.</p>
<p>Furthermore, when the pixel value in either <italic>I</italic><sub>1</sub>(<italic>x</italic>, <italic>y</italic>) or <italic>I</italic><sub>2</sub>(<italic>x</italic>, <italic>y</italic>) is too low, that region may appear black. To mitigate this, the gradual in-out weighting method is applied alongside a threshold check. Specifically, if <italic>I</italic><sub>1</sub>(<italic>x</italic>, <italic>y</italic>) &#x003C; <italic>k</italic>, the pixel is discarded by setting <italic>w</italic><sub>1</sub> &#x003D; 0, effectively excluding it from fusion. The threshold <italic>k</italic> can be empirically adjusted based on experimental results to achieve optimal blending performance.</p>
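<p>A sketch of the gradual-transition blend together with the threshold check, applied to one scanline of the overlap; the default threshold <italic>k</italic> = 8 is an assumed illustrative value, to be tuned empirically as described above:</p>

```python
def blend_row(left, right, k=8):
    # Gradual-transition weights across one scanline of the overlap (Eq. (10)):
    # w1 falls linearly from 1 to 0 while w2 = 1 - w1 rises from 0 to 1.
    # A pixel darker than threshold k is excluded by forcing its weight to 0
    # so near-black warp borders do not bleed into the fused result.
    assert len(left) == len(right)
    n = len(left)
    out = []
    for x in range(n):
        w1 = 1 - x / (n - 1) if n > 1 else 0.5
        w2 = 1 - w1
        if left[x] < k:
            w1, w2 = 0.0, 1.0
        elif right[x] < k:
            w1, w2 = 1.0, 0.0
        out.append(w1 * left[x] + w2 * right[x])
    return out
```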
<p>To evaluate the similarity performance after image fusion, we employ the Mean Squared Error (MSE) and Peak Signal-to-Noise Ratio (PSNR) as the quality assessment metrics.
<disp-formula id="eqn-11"><label>(11)</label><mml:math id="mml-eqn-11" display="block"><mml:mi>M</mml:mi><mml:mi>S</mml:mi><mml:mi>E</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mrow><mml:mtext>M</mml:mtext></mml:mrow><mml:mrow><mml:mo>&#x22C5;</mml:mo></mml:mrow><mml:mrow><mml:mtext>N</mml:mtext></mml:mrow></mml:mrow></mml:mfrac><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:munderover><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:munderover><mml:msup><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mtext>I</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>i</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>j</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:mtext>K</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>i</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>j</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:math></disp-formula>
<disp-formula id="eqn-12"><label>(12)</label><mml:math id="mml-eqn-12" display="block"><mml:mi>P</mml:mi><mml:mi>S</mml:mi><mml:mi>N</mml:mi><mml:mi>R</mml:mi><mml:mo>=</mml:mo><mml:mn>10</mml:mn><mml:msub><mml:mi>log</mml:mi><mml:mrow><mml:mn>10</mml:mn></mml:mrow></mml:msub><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mfrac><mml:mrow><mml:mi>M</mml:mi><mml:mi>A</mml:mi><mml:msubsup><mml:mi>X</mml:mi><mml:mrow><mml:mi>I</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mi>S</mml:mi><mml:mi>E</mml:mi></mml:mrow></mml:mfrac><mml:mo>)</mml:mo></mml:mrow></mml:math></disp-formula></p>
<p>In <xref ref-type="disp-formula" rid="eqn-11">Eq. (11)</xref>, <italic>K</italic> denotes the original image, <italic>I</italic> represents the fused image, and <italic>M</italic> and <italic>N</italic> are the numbers of rows and columns of the images, respectively. A smaller MSE value indicates a higher degree of similarity between the images. In <xref ref-type="disp-formula" rid="eqn-12">Eq. (12)</xref>, <italic>MAX</italic><sub><italic>I</italic></sub> is the maximum possible pixel value of the image (255 for 8-bit images). A higher PSNR value indicates less distortion and better quality of the generated image.</p>
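<p>Both metrics are straightforward to compute; a sketch consistent with Eqs. (11) and (12):</p>

```python
import math

def mse(img_i, img_k):
    # Eq. (11): mean squared error over the M x N pixel grid.
    m, n = len(img_i), len(img_i[0])
    return sum((img_i[i][j] - img_k[i][j]) ** 2
               for i in range(m) for j in range(n)) / (m * n)

def psnr(img_i, img_k, max_i=255):
    # Eq. (12): peak signal-to-noise ratio in decibels;
    # identical images give an infinite PSNR.
    e = mse(img_i, img_k)
    return math.inf if e == 0 else 10 * math.log10(max_i ** 2 / e)
```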
</sec>
</sec>
<sec id="s3_4">
<label>3.4</label>
<title>Video Encoding</title>
<p>Video encoding employs H.264, a block-based compression technology that primarily involves prediction, transformation, and quantization [<xref ref-type="bibr" rid="ref-40">40</xref>]. H.264 encoding can be implemented through two primary methods: software-based and hardware-based encoding [<xref ref-type="bibr" rid="ref-41">41</xref>]. Hardware encoding offers greater efficiency without consuming additional CPU resources. The selected hardware platform utilizes the AWVideoEncoder library, which is built upon the CedarX encoding component. This library provides a simplified interface with minimal parameter configuration: only the input and output data formats need to be specified, while the remaining parameters are automatically set to optimized defaults. The upper software layer can initiate encoding through a straightforward function call, abstracting underlying complexities such as memory management, hardware interfaces, platform specifics, and SDK version dependencies. The H.264 hardware encoding process is outlined in Algorithm 2.</p>
<fig id="fig-19">
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_72692-fig-19.tif"/>
</fig>
<p>To mitigate the effects of network packet loss, H.264 improves system resilience through the use of resynchronization mechanisms during decoding. This capability enables the timely recovery of lost or misordered data packets, as illustrated in <xref ref-type="fig" rid="fig-6">Fig. 6</xref>.</p>
<fig id="fig-6">
<label>Figure 6</label>
<caption>
<title>H.264 decode resynchronization.</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_72692-fig-6.tif"/>
</fig>
<p>Given the complexity of the specialized vehicle&#x2019;s operating environment and the presence of numerous moving objects in the video, block-based motion compensation is employed. This technique partitions each frame into multiple macroblocks. The prediction process models the translational motion of these blocks. The displacement of each block is interpolated to sub-pixel accuracy, enabling motion compensation with a precision of up to 1/4 pixel, as depicted in <xref ref-type="fig" rid="fig-7">Fig. 7</xref>.</p>
<fig id="fig-7">
<label>Figure 7</label>
<caption>
<title>H.264 motion compensation design.</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_72692-fig-7.tif"/>
</fig>
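<p>The translational block-matching stage can be sketched as an integer full search minimizing the sum of absolute differences (SAD); real encoders then refine the vector to 1/4-pixel precision with interpolation filters, which is omitted here, and the function name, block size, and search range are illustrative:</p>

```python
def best_motion_vector(ref, cur, bx, by, bs=4, rng=2):
    # Full-search block matching: find the integer displacement (dx, dy)
    # within +/-rng that minimises the SAD between the current macroblock at
    # (bx, by) and the reference frame.
    def sad(dx, dy):
        return sum(abs(cur[by + y][bx + x] - ref[by + y + dy][bx + x + dx])
                   for y in range(bs) for x in range(bs))
    candidates = [(dx, dy)
                  for dy in range(-rng, rng + 1) for dx in range(-rng, rng + 1)
                  if 0 <= bx + dx and bx + dx + bs <= len(ref[0])
                  and 0 <= by + dy and by + dy + bs <= len(ref)]
    return min(candidates, key=lambda v: sad(*v))
```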
<p>Furthermore, the H.264 codec employs a Lagrangian algorithm for rate-distortion optimized selection of the encoding mode, as detailed below [<xref ref-type="bibr" rid="ref-42">42</xref>].
<disp-formula id="eqn-13"><label>(13)</label><mml:math id="mml-eqn-13" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:msup><mml:mi>I</mml:mi><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mrow><mml:mtext>argmin</mml:mtext></mml:mrow><mml:mi>J</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>S</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>I</mml:mtext></mml:mrow><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mi>&#x03BB;</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula><disp-formula id="eqn-14"><label>(14)</label><mml:math id="mml-eqn-14" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:mi>J</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>S</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>I</mml:mtext></mml:mrow><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mi>&#x03BB;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>D</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>S</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>I</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mi>&#x03BB;</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>R</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mtext>S</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mtext>I</mml:mtext></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>In this formulation, <italic>I</italic> denotes the coding mode, <italic>&#x03BB;</italic> represents the Lagrange multiplier, <italic>S</italic> indicates the input sample, and <italic>J</italic>(<italic>S</italic>, <italic>I</italic>/<italic>&#x03BB;</italic>) refers to the Lagrangian cost function. Here, <italic>D</italic>(<italic>S</italic>, <italic>I</italic>) and <italic>R</italic>(<italic>S</italic>, <italic>I</italic>) correspond to the distortion and the bit rate of the encoded bitstream, respectively.</p>
<p>The encoding mode is optimal when the Lagrangian cost function <italic>J</italic>(<italic>S</italic><sub><italic>k</italic></sub>, <italic>I</italic>/<italic>&#x03BB;</italic>) is minimized. For a given sample <italic>S</italic><sub><italic>k</italic></sub>, the resulting bit rate and distortion depend solely on the chosen encoding mode <italic>I</italic><sub><italic>k</italic></sub>.
<disp-formula id="eqn-15"><label>(15)</label><mml:math id="mml-eqn-15" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:mi>J</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mtext>S</mml:mtext></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mrow><mml:mtext>I</mml:mtext></mml:mrow><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mi>&#x03BB;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>J</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mtext>S</mml:mtext></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext>I</mml:mtext></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mi>&#x03BB;</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>
<disp-formula id="eqn-16"><label>(16)</label><mml:math id="mml-eqn-16" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:mi></mml:mi><mml:munder><mml:mo movablelimits="true" form="prefix">min</mml:mo><mml:mrow><mml:mi>I</mml:mi></mml:mrow></mml:munder><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:munderover><mml:mi>J</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mtext>S</mml:mtext></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mrow><mml:mtext>I</mml:mtext></mml:mrow><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mi>&#x03BB;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:munderover><mml:munder><mml:mo movablelimits="true" form="prefix">min</mml:mo><mml:mrow><mml:mi>I</mml:mi></mml:mrow></mml:munder><mml:mi>J</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mrow><mml:mtext>S</mml:mtext></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext>I</mml:mtext></mml:mrow><mml:mrow><mml:mi>k</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mi>&#x03BB;</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>Therefore, by selecting the optimal coding mode for each sample <italic>S</italic><sub><italic>k</italic></sub> &#x2208; <italic>S</italic>, the minimum value of the Lagrangian cost function <italic>J</italic>(<italic>S</italic><sub><italic>k</italic></sub>, <italic>I</italic>/<italic>&#x03BB;</italic>) can be achieved, thereby facilitating precise rate control.</p>
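<p>Eqs. (13)&#x2013;(16) amount to an independent per-sample search: for every sample, the encoder evaluates each candidate mode and keeps the one with the lowest cost <italic>D</italic> + <italic>&#x03BB;</italic> &#x00D7; <italic>R</italic>. The following sketch illustrates that selection; the mode names and the distortion and rate figures are hypothetical placeholders, not values from the H.264 reference encoder.</p>
<preformat>
```python
# Illustrative Lagrangian mode selection (Eqs. 13-16): for each sample,
# pick the mode minimizing J = D + lambda * R. The mode names and the
# (distortion, rate) values below are hypothetical placeholders.

def best_mode(candidates, lam):
    """candidates: dict mode -> (distortion, rate_bits). Returns (mode, cost)."""
    return min(
        ((mode, d + lam * r) for mode, (d, r) in candidates.items()),
        key=lambda t: t[1],
    )

def total_cost(samples, lam):
    """Eq. (16): minimizing the summed cost equals summing per-sample minima."""
    modes, costs = zip(*(best_mode(c, lam) for c in samples))
    return list(modes), sum(costs)

if __name__ == "__main__":
    # Two macroblocks, each with three candidate modes (D, R):
    samples = [
        {"intra16": (120.0, 40), "intra4": (60.0, 95), "skip": (300.0, 1)},
        {"intra16": (80.0, 42), "intra4": (70.0, 90), "skip": (90.0, 1)},
    ]
    # A small lambda favors low distortion; a large lambda favors low rate.
    print(total_cost(samples, lam=0.5))
    print(total_cost(samples, lam=10.0))
```
</preformat>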
</sec>
<sec id="s3_5">
<label>3.5</label>
<title>Video Streaming</title>
<p>After encoding, the video is prepared for transmission. The video streaming module comprises two main components: a push server and an RTSP-based streaming component.</p>
<p>The prerequisite for streaming initiation is the deployment of a streaming media server capable of monitoring incoming video requests and pushing the video stream in real time via Ethernet. The streaming server is implemented using the open-source framework EasyDarwin, which supports protocols including RTSP, HLS, and HTTP. The overall architecture of the push-stream server is illustrated in <xref ref-type="fig" rid="fig-8">Fig. 8</xref>, exhibiting the following characteristics.
<list list-type="simple">
<list-item><label>(1)</label><p>The streaming media server integrates EasyCMS and EasyDarwin to manage and distribute RTSP video streams.</p>
</list-item>
<list-item><label>(2)</label><p>When a client stops playback, the streaming session is terminated, and the allocated bandwidth resources are immediately released.</p></list-item>
</list><fig id="fig-8">
<label>Figure 8</label>
<caption>
<title>Video streaming system architecture.</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_72692-fig-8.tif"/>
</fig></p>
<p>The Real-Time Streaming Protocol (RTSP) is an application-layer protocol designed for controlling the delivery of real-time multimedia data, such as audio and video streams. It supports operations such as pausing and fast-forwarding but does not encapsulate or deliver the data itself. Instead, RTSP provides remote control over the media server, which can select the transport protocol (TCP or UDP) for actual data transmission. When a wireless link connects the client and server, the measured data transmission latency is 30 ms. During video streaming, the system caches the most recently transmitted data. In the event of packet loss caused by wireless link instability, the affected data packets are retransmitted from the cache, thereby minimizing retransmission latency and ensuring continuous playback. The complete video transmission process is outlined in Algorithm 3.</p>
<fig id="fig-20">
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_72692-fig-20.tif"/>
</fig>
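<p>The cache-based retransmission described above can be sketched as a bounded sender-side buffer keyed by sequence number; the class name, buffer capacity, and NACK-driven lookup are illustrative assumptions rather than the actual EasyDarwin implementation.</p>
<preformat>
```python
# Sketch of the sender-side retransmission cache described in the text:
# recently sent packets are kept in a bounded buffer so that packets
# reported lost can be resent from memory instead of being re-encoded.
# The capacity and method names are illustrative assumptions.
from collections import OrderedDict

class RetransmitCache:
    def __init__(self, capacity=256):
        self.capacity = capacity
        self._packets = OrderedDict()  # seq -> payload, oldest first

    def on_send(self, seq, payload):
        """Record every transmitted packet, evicting the oldest when full."""
        self._packets[seq] = payload
        if len(self._packets) > self.capacity:
            self._packets.popitem(last=False)

    def on_nack(self, seq):
        """Receiver reported seq lost: return the cached payload, or None
        if it was already evicted (too old to be worth replaying)."""
        return self._packets.get(seq)

if __name__ == "__main__":
    cache = RetransmitCache(capacity=3)
    for seq in range(5):
        cache.on_send(seq, b"frame-%d" % seq)
    print(cache.on_nack(4))  # still cached
    print(cache.on_nack(0))  # evicted -> None
```
</preformat>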
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Analysis of Experimental Results</title>
<sec id="s4_1">
<label>4.1</label>
<title>Video Stitching Experimental Results</title>
<p>As illustrated in <xref ref-type="fig" rid="fig-9">Fig. 9</xref>, images captured by the camera were acquired and processed using the V4L2 framework and OpenCV library functions (a). The two images were then projected onto the same plane using the computed homography matrix (b). A weighted average method was applied to merge the images (c). After projection of the right image, redundant black regions in the fused result were cropped to produce the final stitched image (d).</p>
<fig id="fig-9">
<label>Figure 9</label>
<caption>
<title>Video image acquisition.</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_72692-fig-9.tif"/>
</fig>
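<p>The planar projection in step (b) maps each pixel of the right image through the computed 3 &#x00D7; 3 homography matrix in homogeneous coordinates. A minimal sketch of that per-point mapping follows; the matrix values are illustrative, not our calibrated homography.</p>
<preformat>
```python
# Applying a 3x3 homography H to a pixel (x, y): the homogeneous
# coordinate (x, y, 1) is multiplied by H and re-normalized by the third
# component. The matrix below is an illustrative example only.

def warp_point(H, x, y):
    xh = H[0][0] * x + H[0][1] * y + H[0][2]
    yh = H[1][0] * x + H[1][1] * y + H[1][2]
    w  = H[2][0] * x + H[2][1] * y + H[2][2]
    return xh / w, yh / w  # perspective divide

if __name__ == "__main__":
    # Pure translation by (100, 0): shifts the right image onto the
    # stitching plane next to the left image.
    H = [[1.0, 0.0, 100.0],
         [0.0, 1.0, 0.0],
         [0.0, 0.0, 1.0]]
    print(warp_point(H, 50, 20))  # -> (150.0, 20.0)
```
</preformat>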
<p>A performance comparison was carried out between image fusion with and without the improved weighted-average algorithm.</p>
<p>As shown in <xref ref-type="fig" rid="fig-10">Fig. 10</xref>, the experimental results demonstrate that the weighted average fusion algorithm effectively eliminates visible stitching seams, thereby significantly enhancing visual quality and practical utility. Prior to applying the weighted average fusion algorithm, as shown in (a), the PSNR value was 25 dB. Following its application, shown in (b), the measured PSNR value increased to 48 dB, indicating higher fidelity in the fused image.</p>
<fig id="fig-10">
<label>Figure 10</label>
<caption>
<title>Comparison of the performance of different image fusion algorithms.</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_72692-fig-10.tif"/>
</fig>
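<p>The weighted-average fusion and the PSNR measurement above can be sketched as follows; the linear weight ramp and the one-dimensional grayscale rows are simplifying assumptions for brevity, with real images applying the same blend per pixel.</p>
<preformat>
```python
# Weighted-average fusion across an overlap region: pixel weights ramp
# linearly with distance into the overlap, removing the hard seam. PSNR
# is then computed against a reference. 1-D grayscale rows are used here
# for brevity; the linear ramp is an illustrative choice.
import math

def blend_rows(left, right, overlap):
    out = list(left[:-overlap])
    for i in range(overlap):
        w = (i + 1) / (overlap + 1)          # 0 -> all left, 1 -> all right
        out.append((1 - w) * left[len(left) - overlap + i] + w * right[i])
    out.extend(right[overlap:])
    return out

def psnr(ref, img, peak=255.0):
    mse = sum((a - b) ** 2 for a, b in zip(ref, img)) / len(ref)
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)

if __name__ == "__main__":
    left  = [100, 100, 100, 100]
    right = [110, 110, 110, 110]
    print(blend_rows(left, right, overlap=2))
```
</preformat>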
</sec>
<sec id="s4_2">
<label>4.2</label>
<title>Video Processing Latency Experimental Results</title>
<p>As shown in <xref ref-type="fig" rid="fig-11">Fig. 11</xref>, the total latency in video transmission originates primarily from video acquisition, encoding, streaming, decoding, rendering, and display. Denote the end-to-end latency as &#x0394;<italic>T</italic>, the initial timestamp as <italic>T</italic><sub>1</sub>, and the timestamp after decoding as <italic>T</italic><sub>2</sub>. The latency is computed as follows:
<disp-formula id="eqn-17"><label>(17)</label><mml:math id="mml-eqn-17" display="block"><mml:mi mathvariant="normal">&#x0394;</mml:mi><mml:mi>T</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></disp-formula></p>
<fig id="fig-11">
<label>Figure 11</label>
<caption>
<title>Procedure for latency testing.</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_72692-fig-11.tif"/>
</fig>
<p>We evaluated the latency of the acquisition, encoding, and streaming processes using four video sequences at 720p resolution and 25 fps. The measured latency for each processing stage is presented in <xref ref-type="fig" rid="fig-12">Fig. 12</xref>.</p>
<fig id="fig-12">
<label>Figure 12</label>
<caption>
<title>Video sender processing latency.</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_72692-fig-12.tif"/>
</fig>
<p>As shown in <xref ref-type="fig" rid="fig-12">Fig. 12</xref>, video processing latency occurs predominantly during the acquisition stage, which is largely constrained by the video frame rate. We further measured the latency introduced during stream retrieval, decoding, and rendering on the client side; the corresponding results are presented in <xref ref-type="fig" rid="fig-13">Fig. 13</xref>.</p>
<fig id="fig-13">
<label>Figure 13</label>
<caption>
<title>Video receiver processing latency.</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_72692-fig-13.tif"/>
</fig>
<p>As shown in <xref ref-type="fig" rid="fig-13">Fig. 13</xref>, image display and rendering account for the majority of the latency at the receiver side. A comparative analysis of the video transmission and processing latency between the open-source solution and the embedded platform-optimized design was conducted, with the results presented in <xref ref-type="fig" rid="fig-14">Fig. 14</xref>.</p>
<fig id="fig-14">
<label>Figure 14</label>
<caption>
<title>Latency comparison: classic vs. new schemes.</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_72692-fig-14.tif"/>
</fig>
<p>As shown in <xref ref-type="fig" rid="fig-14">Fig. 14</xref>, the proposed optimized framework reduces end-to-end video transmission latency by an average of nearly 20 ms compared to the classic scheme. We adopted an ARM-based embedded Allwinner T5 processor. This chip integrates a quad-core Cortex-A53 CPU and GPU, supports multiple video input interfaces, and is compatible with OpenGL ES 1.0/2.0/3.2, Vulkan 1.1, and OpenCL 2.0. A comparative analysis between the classic and proposed methods was conducted in terms of memory usage, CPU utilization, and GPU utilization, with the results summarized in <xref ref-type="table" rid="table-1">Table 1</xref>.</p>
<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>Resource utilization rates under different schemes.</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Parameter</th>
<th>Classic Scheme</th>
<th>Our Scheme</th>
</tr>
</thead>
<tbody>
<tr>
<td>VIRT</td>
<td>8.3 MB</td>
<td>8.2 MB</td>
</tr>
<tr>
<td>RES</td>
<td>5.4 MB</td>
<td>5.3 MB</td>
</tr>
<tr>
<td>CPU (%)</td>
<td>3.1</td>
<td>1.3</td>
</tr>
<tr>
<td>GPU (%)</td>
<td>1.4</td>
<td>3.2</td>
</tr>
<tr>
<td>MEM (%)</td>
<td>0.4</td>
<td>0.3</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>As summarized in <xref ref-type="table" rid="table-1">Table 1</xref>, the proposed solution significantly reduces utilization of general-purpose processing resources on the host platform by efficiently offloading computation to the video processor&#x2019;s GPU, thereby improving overall system performance. To closely emulate real-world conditions, in which factors such as adverse weather, physical obstructions, and platform mobility often lead to unstable transmission channels, packet loss, and performance degradation, we conducted ten latency tests, placing jamming devices in the wireless channel to reproduce different network packet loss rates. Under stable wireless link conditions, with a transmission latency of 30 ms and a channel bandwidth of 20 Mbps, the system&#x2019;s end-to-end latency remained consistently around 140 ms, as illustrated in <xref ref-type="fig" rid="fig-15">Fig. 15</xref>.</p>
<fig id="fig-15">
<label>Figure 15</label>
<caption>
<title>Video latency under different packet loss rates.</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_72692-fig-15.tif"/>
</fig>
<p>As shown in <xref ref-type="fig" rid="fig-15">Fig. 15</xref>, system latency increases with higher packet loss rates during real-time video transmission. This occurs because packet loss triggers the receiver to signal the sender to throttle the transmission rate and prioritize retransmission of lost packets. Consequently, elevated packet loss rates result in increased latency. Since the system continues to process and display successfully received packets while managing retransmissions in parallel, video playback remains uninterrupted. The correlation between packet loss rate and video transmission latency is further detailed in <xref ref-type="fig" rid="fig-16">Fig. 16</xref>.</p>
<fig id="fig-16">
<label>Figure 16</label>
<caption>
<title>Packet loss vs. video transmission performance.</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_72692-fig-16.tif"/>
</fig>
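<p>The retransmission-driven growth of latency with packet loss can be approximated by a first-order model in which each lost packet costs one additional round trip served from the cache; this model, and the base-latency and round-trip figures below, are illustrative assumptions rather than measured system behavior.</p>
<preformat>
```python
# First-order model of the trend in the latency-vs-loss measurements:
# with loss rate p, a packet needs on average 1 / (1 - p) transmissions,
# so the expected extra delay grows as rtt * p / (1 - p). The base
# latency and RTT values are illustrative assumptions.

def expected_latency_ms(base_ms, rtt_ms, loss_rate):
    if not 0 <= loss_rate < 1:
        raise ValueError("loss rate must be in [0, 1)")
    extra_attempts = loss_rate / (1 - loss_rate)  # mean retransmissions
    return base_ms + rtt_ms * extra_attempts

if __name__ == "__main__":
    for p in (0.0, 0.05, 0.10, 0.20):
        print(p, round(expected_latency_ms(140, 60, p), 1))
```
</preformat>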
<p>As shown in <xref ref-type="fig" rid="fig-16">Fig. 16</xref>, the video transmission system maintains operational integrity at packet loss rates of up to 20%, at which point the video transmission latency rises to nearly 200 ms. Beyond this threshold, system performance degrades significantly with further increases in packet loss. Moreover, under identical packet loss conditions, the proposed scheme achieves lower transmission latency than conventional methods. Based on user experience, video transmission quality is rated on a scale from 0 to 10, with higher scores indicating superior quality. The relationship between packet loss, latency, and perceived user experience is further illustrated in <xref ref-type="fig" rid="fig-17">Fig. 17</xref>.</p>
<fig id="fig-17">
<label>Figure 17</label>
<caption>
<title>Video quality assessment under different latency and packet loss conditions.</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_72692-fig-17.tif"/>
</fig>
<p>As shown in <xref ref-type="fig" rid="fig-17">Fig. 17</xref>, lower latency and reduced packet loss rates contribute to enhanced video transmission quality and improved user experience. The proposed solution achieves a video transmission latency below 200 ms and maintains satisfactory video quality (corresponding to a QoS level of 5) even under packet loss rates of up to 20%. In contrast, conventional schemes exceed 200 ms latency once packet loss surpasses 5%, failing to meet user requirements. Therefore, under identical network conditions, our approach delivers superior video quality compared to traditional methods and demonstrates robust performance in adverse network environments typical of real-world battlefield scenarios.</p>

</sec>
</sec>
<sec id="s5">
<label>5</label>
<title>Conclusion</title>
<p>This paper presents an embedded processing platform designed for real-time stitching of multiple video streams. Video data captured from multiple cameras are collected and processed using OpenCV-based algorithms. The processing pipeline involves feature point extraction, homography matrix calculation, perspective transformation, and image fusion, resulting in a seamless panoramic video with an extended field of view. Through optimization of the image fusion algorithm, the stitching quality is significantly improved. The system supports concurrent video input from four Analog High Definition (AHD) cameras and enables hardware-accelerated compression and low-latency transmission of four simultaneous video streams at 720p@25fps. Experimental results show that the overall latency of the embedded surveillance system remains around 140 ms. Even under a packet loss rate of 20%, the system sustains satisfactory video transmission quality, meeting the requirements for unmanned vehicle video communication applications. The proposed system offers key advantages such as low latency, high reliability, minimal distortion, and flexible deployment, making it highly suitable for onboard wireless video communication systems.</p>
<p>Future work will focus on further enhancing the system&#x2019;s resilience to even higher packet loss rates and more dynamic network conditions. By integrating a deep learning algorithm, the system will achieve real-time perception of the current network environment, capturing metrics such as communication distance, latency, link packet loss rate, vehicle speed, and channel bandwidth. These parameters will then be processed comprehensively by the algorithm, and the computational results used to dynamically adjust system parameters, including the retransmission mechanism and the video pixel extraction and processing strategies, thereby improving overall robustness and real-time performance in challenging operational environments.</p>
</sec>
</body>
<back>
<ack>
<p>The professors, experts, and students from the School of Cyber Science and Engineering at Shandong University and the Systems Engineering Research Institute of the Academy of Military Sciences provided in-depth theoretical guidance and experimental assistance for this research. Engineers from the network communication research team of China Electronics Wuhan Zhongyuan Communication Co., Ltd. offered substantial support during the construction of the physical test and verification platform. We would like to express our sincere gratitude to all of them. We also extend our thanks to the experts who reviewed this paper amidst their busy schedules.</p>
</ack>
<sec>
<title>Funding Statement</title>
<p>The work has partly been supported by the National Natural Science Foundation of China (Grant No. 72334003), the National Key Research and Development Program of China (Grant No. 2022YFB2702804), the Shandong Key Research and Development Program (Grant No. 2020ZLYS09), and the Jinan Program (Grant No. 2021GXRC084-2).</p>
</sec>
<sec>
<title>Author Contributions</title>
<p>Tai Liu was primarily responsible for proposing the research framework, conducting algorithm simulations, and drafting the manuscript. Feng Wu focused on the design of high-dynamic application scenarios for unmanned vehicles and the analysis of user requirements. Chao Zhu led the implementation of the algorithm in practical software development. Bo Chen was chiefly engaged in the hardware development of the physical system. Mao Ye provided experimental guidance for the research plan, while Guoyan Zhang was mainly responsible for its theoretical direction. All authors reviewed the results and approved the final version of the manuscript.</p>
</sec>
<sec sec-type="data-availability">
<title>Availability of Data and Materials</title>
<p>The empirical data reported in this study are grounded in actual measurements from hardware prototypes, ensuring their validity and reliability.</p>
</sec>
<sec>
<title>Ethics Approval</title>
<p>This study was approved by the Ethics Committee of Shandong University.</p>
</sec>
<sec sec-type="COI-statement">
<title>Conflicts of Interest</title>
<p>The authors declare no conflicts of interest to report regarding the present study.</p>
</sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Dai</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>K</given-names></string-name></person-group>. <article-title>Integration of multi-channel video and GIS based on LOD</article-title>. In: Proceedings of the <conf-name>2021 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA); 2021 Jun 28&#x2013;30</conf-name>; <publisher-loc>Dalian, China</publisher-loc>. p. <fpage>963</fpage>&#x2013;<lpage>6</lpage>. doi:<pub-id pub-id-type="doi">10.1109/icaica52286.2021.9497915</pub-id>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><collab>IEEE 1857.10-2021</collab></person-group>. <source>IEEE standard for third-generation video coding</source>. <publisher-loc>Piscataway, NJ, USA</publisher-loc>: <publisher-name>IEEE Standards Association</publisher-name>; <year>2022</year>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Simmers</surname> <given-names>E</given-names></string-name>, <string-name><surname>Salman</surname> <given-names>A</given-names></string-name>, <string-name><surname>Day</surname> <given-names>E</given-names></string-name>, <string-name><surname>Oracevic</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Secure and intelligent video surveillance using unmanned aerial vehicles</article-title>. In: Proceedings of the <conf-name>2023 IEEE International Conference on Smart Mobility (SM); 2023 Mar 19&#x2013;21</conf-name>; <publisher-loc>Thuwal, Saudi Arabia</publisher-loc>. p. <fpage>98</fpage>&#x2013;<lpage>103</lpage>. doi:<pub-id pub-id-type="doi">10.1109/SM57895.2023.10112347</pub-id>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Minopoulos</surname> <given-names>G</given-names></string-name>, <string-name><surname>Memos</surname> <given-names>VA</given-names></string-name>, <string-name><surname>Psannis</surname> <given-names>KE</given-names></string-name>, <string-name><surname>Ishibashi</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>Comparison of video codecs performance for real-time transmission</article-title>. In: Proceedings of the <conf-name>2020 2nd International Conference on Computer Communication and the Internet (ICCCI); 2020 Jun 26&#x2013;29</conf-name>; <publisher-loc>Nagoya, Japan</publisher-loc>. p. <fpage>110</fpage>&#x2013;<lpage>4</lpage>. doi:<pub-id pub-id-type="doi">10.1109/iccci49374.2020.9145973</pub-id>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Sharma</surname> <given-names>P</given-names></string-name>, <string-name><surname>Awasare</surname> <given-names>D</given-names></string-name>, <string-name><surname>Jaiswal</surname> <given-names>B</given-names></string-name>, <string-name><surname>Mohan</surname> <given-names>S</given-names></string-name>, <string-name><surname>Abinaya</surname> <given-names>N</given-names></string-name>, <string-name><surname>Darwhekar</surname> <given-names>I</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>On the latency in vehicular control using video streaming over Wi-Fi</article-title>. In: Proceedings of the <conf-name>2020 National Conference on Communications (NCC); 2020 Feb 21&#x2013;23</conf-name>; <publisher-loc>Kharagpur, India</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>6</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ncc48643.2020.9056067</pub-id>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Anitha Kumari</surname> <given-names>RD</given-names></string-name>, <string-name><surname>Udupa</surname> <given-names>N</given-names></string-name></person-group>. <article-title>A study of the evolution of video codec and its future research direction</article-title>. In: Proceedings of the <conf-name>2020 Third International Conference on Advances in Electronics, Computers and Communications (ICAECC); 2020 Dec 11&#x2013;12</conf-name>; <publisher-loc>Bengaluru, India</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>13</lpage>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Balobanov</surname> <given-names>V</given-names></string-name>, <string-name><surname>Balobanov</surname> <given-names>A</given-names></string-name>, <string-name><surname>Potashnikov</surname> <given-names>A</given-names></string-name>, <string-name><surname>Vlasuyk</surname> <given-names>I</given-names></string-name></person-group>. <article-title>Low latency ONM video compression method for UAV control and communication</article-title>. In: Proceedings of the <conf-name>2018 Systems of Signals Generating and Processing in the Field of on Board Communications; 2018 Mar 14&#x2013;15</conf-name>; <publisher-loc>Moscow, Russia</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>5</lpage>. doi:<pub-id pub-id-type="doi">10.1109/SOSG.2018.8350571</pub-id>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>J</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>R</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>G</given-names></string-name>, <string-name><surname>Fan</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Panoramic video live broadcasting system based on global distribution</article-title>. In: Proceedings of the <conf-name>2019 Chinese Automation Congress (CAC); 2019 Nov 22&#x2013;24</conf-name>; <publisher-loc>Hangzhou, China</publisher-loc>. p. <fpage>63</fpage>&#x2013;<lpage>7</lpage>. doi:<pub-id pub-id-type="doi">10.1109/cac48633.2019.8996293</pub-id>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Rasch</surname> <given-names>J</given-names></string-name>, <string-name><surname>Warno</surname> <given-names>V</given-names></string-name>, <string-name><surname>Ptatt</surname> <given-names>J</given-names></string-name>, <string-name><surname>Tischendorf</surname> <given-names>C</given-names></string-name>, <string-name><surname>Marpe</surname> <given-names>D</given-names></string-name>, <string-name><surname>Schwarz</surname> <given-names>H</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>A signal adaptive diffusion filter for video coding using directional total variation</article-title>. In: Proceedings of the <conf-name>2018 25th IEEE International Conference on Image Processing (ICIP); 2018 Oct 7&#x2013;10</conf-name>; <publisher-loc>Athens, Greece</publisher-loc>. p. <fpage>2570</fpage>&#x2013;<lpage>4</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ICIP.2018.8451579</pub-id>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Skupin</surname> <given-names>R</given-names></string-name>, <string-name><surname>Sanchez</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>YK</given-names></string-name>, <string-name><surname>Hannuksela</surname> <given-names>MM</given-names></string-name>, <string-name><surname>Boyce</surname> <given-names>J</given-names></string-name>, <string-name><surname>Wien</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Standardization status of 360 degree video coding and delivery</article-title>. In: Proceedings of the <conf-name>2017 IEEE Visual Communications and Image Processing (VCIP); 2017 Dec 10&#x2013;13</conf-name>; <publisher-loc>St. Petersburg, FL, USA</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>4</lpage>. doi:<pub-id pub-id-type="doi">10.1109/VCIP.2017.8305083</pub-id>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Wang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>S</given-names></string-name></person-group>. <article-title>Equirectangular projection oriented intra prediction for 360-degree video coding</article-title>. In: Proceedings of the <conf-name>2020 IEEE International Conference on Visual Communications and Image Processing (VCIP); 2020 Dec 1&#x2013;4</conf-name>; <publisher-loc>Macau, China</publisher-loc>. p. <fpage>483</fpage>&#x2013;<lpage>6</lpage>. doi:<pub-id pub-id-type="doi">10.1109/vcip49819.2020.9301871</pub-id>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Zhang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Xue</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Peng</surname> <given-names>S</given-names></string-name></person-group>. <article-title>A parallel acceleration method in panoramic video mosaic system based on 5G Internet of Things</article-title>. In: Proceedings of the <conf-name>2020 International Wireless Communications and Mobile Computing (IWCMC); 2020 Jun 15&#x2013;19</conf-name>; <publisher-loc>Limassol, Cyprus</publisher-loc>. p. <fpage>1995</fpage>&#x2013;<lpage>8</lpage>. doi:<pub-id pub-id-type="doi">10.1109/iwcmc48107.2020.9148230</pub-id>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>A</given-names></string-name>, <string-name><surname>Jiang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Zou</surname> <given-names>L</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>Z</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>A super-resolution flexible video coding solution for improving live streaming quality</article-title>. <source>IEEE Trans Multimed</source>. <year>2023</year>;<volume>25</volume>:<fpage>6341</fpage>&#x2013;<lpage>55</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tmm.2022.3207580</pub-id>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Zhao</surname> <given-names>X</given-names></string-name>, <string-name><surname>Hu</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>H</given-names></string-name></person-group>. <article-title>An ORB-SLAM3 autonomous positioning and orientation approach using 360-degree panoramic video</article-title>. In: Proceedings of the <conf-name>2022 29th International Conference on Geoinformatics; 2022 Aug 15&#x2013;18</conf-name>; <publisher-loc>Beijing, China</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>7</lpage>. doi:<pub-id pub-id-type="doi">10.1109/Geoinformatics57846.2022.9963855</pub-id>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Qiu</surname> <given-names>S</given-names></string-name>, <string-name><surname>Li</surname> <given-names>B</given-names></string-name>, <string-name><surname>Cheng</surname> <given-names>K</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Duan</surname> <given-names>G</given-names></string-name>, <string-name><surname>Li</surname> <given-names>F</given-names></string-name></person-group>. <article-title>Multi-directional reconstruction algorithm for panoramic camera</article-title>. <source>Comput Mater Contin</source>. <year>2020</year>;<volume>65</volume>(<issue>1</issue>):<fpage>433</fpage>&#x2013;<lpage>43</lpage>. doi:<pub-id pub-id-type="doi">10.32604/cmc.2020.09708</pub-id>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Han</surname> <given-names>SW</given-names></string-name>, <string-name><surname>Suh</surname> <given-names>DY</given-names></string-name></person-group>. <article-title>A 360-degree panoramic image inpainting network using a cube map</article-title>. <source>Comput Mater Contin</source>. <year>2020</year>;<volume>66</volume>(<issue>1</issue>):<fpage>213</fpage>&#x2013;<lpage>28</lpage>. doi:<pub-id pub-id-type="doi">10.32604/cmc.2020.012223</pub-id>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>J</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>K</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>G</given-names></string-name></person-group>. <article-title>Research on a panoramic video transmission scheme based on multi-view switching within the human eye&#x2019;s field of view</article-title>. In: Proceedings of the <conf-name>2022 34th Chinese Control and Decision Conference (CCDC); 2022 Aug 15&#x2013;17</conf-name>; <publisher-loc>Hefei, China</publisher-loc>. p. <fpage>2286</fpage>&#x2013;<lpage>91</lpage>. doi:<pub-id pub-id-type="doi">10.1109/CCDC55256.2022.10033652</pub-id>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Chao</surname> <given-names>J</given-names></string-name>, <string-name><surname>Huitl</surname> <given-names>R</given-names></string-name>, <string-name><surname>Steinbach</surname> <given-names>E</given-names></string-name>, <string-name><surname>Schroeder</surname> <given-names>D</given-names></string-name></person-group>. <article-title>A novel rate control framework for SIFT/SURF feature preservation in H.264/AVC video compression</article-title>. <source>IEEE Trans Circuits Syst Video Technol</source>. <year>2015</year>;<volume>25</volume>(<issue>6</issue>):<fpage>958</fpage>&#x2013;<lpage>72</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TCSVT.2014.2367354</pub-id>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Manel</surname> <given-names>B</given-names></string-name></person-group>. <article-title>Block-based distributed video coding without channel codes</article-title>. In: Proceedings of the <conf-name>2015 3rd International Conference on Control, Engineering &#x0026; Information Technology (CEIT); 2015 May 25&#x2013;27</conf-name>; <publisher-loc>Tlemcen, Algeria</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>5</lpage>. doi:<pub-id pub-id-type="doi">10.1109/CEIT.2015.7233154</pub-id>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wang</surname> <given-names>M</given-names></string-name>, <string-name><surname>Xie</surname> <given-names>W</given-names></string-name>, <string-name><surname>Xiong</surname> <given-names>J</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>D</given-names></string-name>, <string-name><surname>Qin</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Joint optimization of transform and quantization for high efficiency video coding</article-title>. <source>IEEE Access</source>. <year>2019</year>;<volume>7</volume>:<fpage>62534</fpage>&#x2013;<lpage>44</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ACCESS.2019.2917260</pub-id>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Duong</surname> <given-names>LR</given-names></string-name>, <string-name><surname>Li</surname> <given-names>B</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>C</given-names></string-name>, <string-name><surname>Han</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Multi-rate adaptive transform coding for video compression</article-title>. In: Proceedings of the <conf-name>ICASSP 2023&#x2014;2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP); 2023 Jun 4&#x2013;10</conf-name>; <publisher-loc>Rhodes Island, Greece</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>5</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ICASSP49357.2023.10095879</pub-id>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Ding</surname> <given-names>L</given-names></string-name>, <string-name><surname>Tian</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Fan</surname> <given-names>H</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Huang</surname> <given-names>T</given-names></string-name></person-group>. <article-title>Rate-performance-loss optimization for inter-frame deep feature coding from videos</article-title>. <source>IEEE Trans Image Process</source>. <year>2017</year>;<volume>26</volume>(<issue>12</issue>):<fpage>5743</fpage>&#x2013;<lpage>57</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TIP.2017.2745203</pub-id>; <pub-id pub-id-type="pmid">28858800</pub-id>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Chiu</surname> <given-names>JC</given-names></string-name>, <string-name><surname>Tseng</surname> <given-names>HY</given-names></string-name>, <string-name><surname>Lee</surname> <given-names>ZY</given-names></string-name></person-group>. <article-title>Design of multidimension-media streaming protocol based on RTSP</article-title>. In: Proceedings of the <conf-name>2020 International Computer Symposium (ICS); 2020 Dec 17&#x2013;19</conf-name>; <publisher-loc>Tainan, Taiwan</publisher-loc>. p. <fpage>341</fpage>&#x2013;<lpage>7</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ics51289.2020.00074</pub-id>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wang</surname> <given-names>M</given-names></string-name>, <string-name><surname>Cheng</surname> <given-names>B</given-names></string-name>, <string-name><surname>Yuen</surname> <given-names>C</given-names></string-name></person-group>. <article-title>Joint coding-transmission optimization for a video surveillance system with multiple cameras</article-title>. <source>IEEE Trans Multimed</source>. <year>2018</year>;<volume>20</volume>(<issue>3</issue>):<fpage>620</fpage>&#x2013;<lpage>33</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TMM.2017.2748459</pub-id>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Bakirci</surname> <given-names>M</given-names></string-name></person-group>. <article-title>A novel swarm unmanned aerial vehicle system: incorporating autonomous flight, real-time object detection, and coordinated intelligence for enhanced performance</article-title>. <source>Trait Du Signal</source>. <year>2023</year>;<volume>40</volume>(<issue>5</issue>):<fpage>2063</fpage>&#x2013;<lpage>78</lpage>. doi:<pub-id pub-id-type="doi">10.18280/ts.400524</pub-id>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Aloman</surname> <given-names>A</given-names></string-name>, <string-name><surname>Ispas</surname> <given-names>AI</given-names></string-name>, <string-name><surname>Ciotirnae</surname> <given-names>P</given-names></string-name>, <string-name><surname>Sanchez-Iborra</surname> <given-names>R</given-names></string-name>, <string-name><surname>Cano</surname> <given-names>MD</given-names></string-name></person-group>. <article-title>Performance evaluation of video streaming using MPEG DASH, RTSP, and RTMP in mobile networks</article-title>. In: Proceedings of the <conf-name>2015 8th IFIP Wireless and Mobile Networking Conference (WMNC); 2015 Oct 5&#x2013;7</conf-name>; <publisher-loc>Munich, Germany</publisher-loc>. p. <fpage>144</fpage>&#x2013;<lpage>51</lpage>. doi:<pub-id pub-id-type="doi">10.1109/WMNC.2015.12</pub-id>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Mamta</surname></string-name>, <string-name><surname>Pillai</surname> <given-names>A</given-names></string-name>, <string-name><surname>Punj</surname> <given-names>D</given-names></string-name></person-group>. <article-title>Image splicing detection using retinex based contrast enhancement and deep learning</article-title>. In: Proceedings of the <conf-name>2023 International Conference on Advanced Computing &#x0026; Communication Technologies (ICACCTech); 2023 Dec 23&#x2013;24</conf-name>; <publisher-loc>Banur, India</publisher-loc>. p. <fpage>771</fpage>&#x2013;<lpage>8</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ICACCTech61146.2023.00127</pub-id>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Siqueira</surname> <given-names>I</given-names></string-name>, <string-name><surname>Correa</surname> <given-names>G</given-names></string-name>, <string-name><surname>Grellert</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Complexity and coding efficiency assessment of the versatile video coding standard</article-title>. In: Proceedings of the <conf-name>2021 IEEE International Symposium on Circuits and Systems (ISCAS); 2021 May 22&#x2013;28</conf-name>; <publisher-loc>Daegu, Republic of Korea</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>5</lpage>. doi:<pub-id pub-id-type="doi">10.1109/iscas51556.2021.9401714</pub-id>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Lim</surname> <given-names>WQ</given-names></string-name>, <string-name><surname>Schwarz</surname> <given-names>H</given-names></string-name>, <string-name><surname>Marpe</surname> <given-names>D</given-names></string-name>, <string-name><surname>Wiegand</surname> <given-names>T</given-names></string-name></person-group>. <article-title>Post sample adaptive offset for video coding</article-title>. In: Proceedings of the <conf-name>2019 Picture Coding Symposium (PCS); 2019 Nov 12&#x2013;15</conf-name>; <publisher-loc>Ningbo, China</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>5</lpage>. doi:<pub-id pub-id-type="doi">10.1109/PCS48520.2019.8954544</pub-id>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Shilpa</surname> <given-names>KS</given-names></string-name>, <string-name><surname>Narayan</surname> <given-names>DG</given-names></string-name>, <string-name><surname>Kotabagi</surname> <given-names>S</given-names></string-name>, <string-name><surname>Uma</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Suitability analysis of IEEE 802.15.4 networks for video surveillance</article-title>. In: Proceedings of the <conf-name>2011 International Conference on Computational Intelligence and Communication Networks; 2011 Oct 7&#x2013;9</conf-name>; <publisher-loc>Gwalior, India</publisher-loc>. p. <fpage>702</fpage>&#x2013;<lpage>6</lpage>. doi:<pub-id pub-id-type="doi">10.1109/CICN.2011.153</pub-id>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Chen</surname> <given-names>J</given-names></string-name>, <string-name><surname>Shi</surname> <given-names>X</given-names></string-name></person-group>. <article-title>Image mosaics algorithm based on feature points matching</article-title>. In: Proceedings of the <conf-name>2011 International Conference on Electronics, Communications and Control (ICECC); 2011 Sep 9&#x2013;11</conf-name>; <publisher-loc>Ningbo, China</publisher-loc>. p. <fpage>278</fpage>&#x2013;<lpage>81</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ICECC.2011.6067889</pub-id>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Fang</surname> <given-names>JT</given-names></string-name>, <string-name><surname>Tu</surname> <given-names>YL</given-names></string-name>, <string-name><surname>Yu</surname> <given-names>LP</given-names></string-name>, <string-name><surname>Chang</surname> <given-names>PC</given-names></string-name></person-group>. <article-title>Real-time complexity control for high efficiency video coding</article-title>. In: Proceedings of the <conf-name>2018 IEEE International Conference on Information Communication and Signal Processing (ICICSP); 2018 Sep 28&#x2013;30</conf-name>; <publisher-loc>Singapore</publisher-loc>. p. <fpage>85</fpage>&#x2013;<lpage>9</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ICICSP.2018.8549738</pub-id>.</mixed-citation></ref>
<ref id="ref-33"><label>[33]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Ye</surname> <given-names>M</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>L</given-names></string-name>, <string-name><surname>Milne</surname> <given-names>A</given-names></string-name>, <string-name><surname>Hillier</surname> <given-names>J</given-names></string-name>, <string-name><surname>S&#x00F8;lvsten</surname> <given-names>S</given-names></string-name></person-group>. <article-title>GAN-enabled framework for fire risk assessment and mitigation of building blueprints</article-title>. In: Proceedings of the <conf-name>30th EG-ICE: International Conference on Intelligent Computing in Engineering; 2023 Jul 4&#x2013;7</conf-name>; <publisher-loc>London, UK</publisher-loc>.</mixed-citation></ref>
<ref id="ref-34"><label>[34]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Chen</surname> <given-names>D</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>L</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Lin</surname> <given-names>S</given-names></string-name>, <string-name><surname>Ye</surname> <given-names>M</given-names></string-name>, <string-name><surname>S&#x00F8;lvsten</surname> <given-names>S</given-names></string-name></person-group>. <article-title>Automated fire risk assessment and mitigation in building blueprints using computer vision and deep generative models</article-title>. <source>Adv Eng Inform</source>. <year>2024</year>;<volume>62</volume>(<issue>6</issue>):<fpage>102614</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.aei.2024.102614</pub-id>.</mixed-citation></ref>
<ref id="ref-35"><label>[35]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Shinde</surname> <given-names>PS</given-names></string-name>, <string-name><surname>Dongre</surname> <given-names>YV</given-names></string-name></person-group>. <article-title>Objective video quality assessment based on SURF feature matching</article-title>. In: Proceedings of the <conf-name>2019 5th International Conference on Computing, Communication, Control and Automation (ICCUBEA); 2019 Sep 19&#x2013;21</conf-name>; <publisher-loc>Pune, India</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>6</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ICCUBEA47591.2019.9129297</pub-id>.</mixed-citation></ref>
<ref id="ref-36"><label>[36]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Jadhav</surname> <given-names>D</given-names></string-name>, <string-name><surname>Bhosle</surname> <given-names>U</given-names></string-name></person-group>. <article-title>SURF based video summarization and its optimization</article-title>. In: Proceedings of the <conf-name>2017 International Conference on Communication and Signal Processing (ICCSP); 2017 Apr 6&#x2013;8</conf-name>; <publisher-loc>Chennai, India</publisher-loc>. p. <fpage>1252</fpage>&#x2013;<lpage>7</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ICCSP.2017.8286581</pub-id>.</mixed-citation></ref>
<ref id="ref-37"><label>[37]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Jia</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>W</given-names></string-name>, <string-name><surname>Rong</surname> <given-names>C</given-names></string-name>, <string-name><surname>Zhu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>An improvement in image registration with SIFT features and logistic regression</article-title>. In: Proceedings of the <conf-name>2018 11th International Congress on Image and Signal Processing, Biomedical Engineering and Informatics (CISP-BMEI); 2018 Oct 13&#x2013;15</conf-name>; <publisher-loc>Beijing, China</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>5</lpage>. doi:<pub-id pub-id-type="doi">10.1109/CISP-BMEI.2018.8633020</pub-id>.</mixed-citation></ref>
<ref id="ref-38"><label>[38]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Yang</surname> <given-names>J</given-names></string-name>, <string-name><surname>Pei</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Li</surname> <given-names>B</given-names></string-name>, <string-name><surname>Tao</surname> <given-names>Q</given-names></string-name></person-group>. <article-title>Multivideo mosaic based on SIFT algorithm</article-title>. In: Proceedings of the <conf-name>2011 International Conference on Computer Science and Network Technology; 2011 Dec 24&#x2013;26</conf-name>; <publisher-loc>Harbin, China</publisher-loc>. p. <fpage>1497</fpage>&#x2013;<lpage>501</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ICCSNT.2011.6182249</pub-id>.</mixed-citation></ref>
<ref id="ref-39"><label>[39]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Raheem</surname> <given-names>HA</given-names></string-name>, <string-name><surname>Al-Assadi</surname> <given-names>TA</given-names></string-name></person-group>. <article-title>Video important shot detection based on ORB algorithm and FLANN technique</article-title>. In: Proceedings of the <conf-name>2022 8th International Engineering Conference on Sustainable Technology and Development (IEC); 2022 Feb 23&#x2013;24</conf-name>; <publisher-loc>Erbil, Iraq</publisher-loc>. p. <fpage>113</fpage>&#x2013;<lpage>7</lpage>. doi:<pub-id pub-id-type="doi">10.1109/IEC54822.2022.9807488</pub-id>.</mixed-citation></ref>
<ref id="ref-40"><label>[40]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Choi</surname> <given-names>K</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>J</given-names></string-name>, <string-name><surname>Park</surname> <given-names>MW</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>H</given-names></string-name>, <string-name><surname>Choi</surname> <given-names>W</given-names></string-name>, <string-name><surname>Ikonin</surname> <given-names>S</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Video codec using flexible block partitioning and advanced prediction, transform and loop filtering technologies</article-title>. <source>IEEE Trans Circuits Syst Video Technol</source>. <year>2020</year>;<volume>30</volume>(<issue>5</issue>):<fpage>1326</fpage>&#x2013;<lpage>45</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tcsvt.2020.2971268</pub-id>.</mixed-citation></ref>
<ref id="ref-41"><label>[41]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Saleh</surname> <given-names>MA</given-names></string-name>, <string-name><surname>Tahir</surname> <given-names>NM</given-names></string-name>, <string-name><surname>Hashim</surname> <given-names>H</given-names></string-name></person-group>. <article-title>Coding structure and performance of high efficiency video coding (HEVC) and H.264/AVC</article-title>. In: Proceedings of the <conf-name>2015 IEEE Symposium on Computer Applications &#x0026; Industrial Electronics (ISCAIE); 2015 Apr 12&#x2013;14</conf-name>; <publisher-loc>Langkawi, Malaysia</publisher-loc>. p. <fpage>53</fpage>&#x2013;<lpage>8</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ISCAIE.2015.7298327</pub-id>.</mixed-citation></ref>
<ref id="ref-42"><label>[42]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Wang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Song</surname> <given-names>L</given-names></string-name>, <string-name><surname>Luo</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Xie</surname> <given-names>R</given-names></string-name></person-group>. <article-title>Lagrangian method based rate-distortion optimization revisited for dependent video coding</article-title>. In: Proceedings of the <conf-name>2017 IEEE International Conference on Image Processing (ICIP); 2017 Sep 17&#x2013;20</conf-name>; <publisher-loc>Beijing, China</publisher-loc>. p. <fpage>3021</fpage>&#x2013;<lpage>5</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ICIP.2017.8296837</pub-id>.</mixed-citation></ref>
</ref-list>
</back></article>