<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">IASC</journal-id>
<journal-id journal-id-type="nlm-ta">IASC</journal-id>
<journal-id journal-id-type="publisher-id">IASC</journal-id>
<journal-title-group>
<journal-title>Intelligent Automation &#x0026; Soft Computing</journal-title>
</journal-title-group>
<issn pub-type="epub">2326-005X</issn>
<issn pub-type="ppub">1079-8587</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta><!--type4-->
<article-id pub-id-type="publisher-id">17200</article-id>
<article-id pub-id-type="doi">10.32604/iasc.2021.017200</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Key Frame Extraction of Surveillance Video Based on Frequency Domain Analysis</article-title><alt-title alt-title-type="left-running-head">Key Frame Extraction of Surveillance Video Based on Frequency Domain Analysis</alt-title><alt-title alt-title-type="right-running-head">Key Frame Extraction of Surveillance Video Based on Frequency Domain Analysis</alt-title>
</title-group>
<contrib-group content-type="authors">
<contrib id="author-1" contrib-type="author" corresp="yes">
<name name-style="western">
<surname>Zhang</surname>
<given-names>Yunzuo</given-names>
</name>
<xref ref-type="aff" rid="aff-1">1</xref>
<email>zhangyunzuo888@sina.com</email>
</contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western">
<surname>Zhang</surname>
<given-names>Shasha</given-names>
</name>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western">
<surname>Zhang</surname>
<given-names>Jiayu</given-names>
</name>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
<contrib id="author-4" contrib-type="author">
<name name-style="western">
<surname>Guo</surname>
<given-names>Kaina</given-names>
</name>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
<contrib id="author-5" contrib-type="author">
<name name-style="western">
<surname>Cai</surname>
<given-names>Zhaoquan</given-names>
</name>
<xref ref-type="aff" rid="aff-2">2</xref>
</contrib>
<aff id="aff-1">
<label>1</label><institution>School of Information Science and Technology, Shijiazhuang Tiedao University</institution>, <addr-line>Shijiazhuang, 050043</addr-line>, <country>China</country></aff>
<aff id="aff-2">
<label>2</label><institution>Department of Computer Science and Engineering, Huizhou University</institution>, <addr-line>Huizhou, 516007</addr-line>, <country>China</country></aff>
</contrib-group><author-notes><corresp id="cor1">&#x002A;Corresponding Author: Yunzuo Zhang. Email: 
<email>zhangyunzuo888@sina.com</email></corresp></author-notes>
<pub-date pub-type="epub" date-type="pub" iso-8601-date="2021-04-02">
<day>02</day>
<month>04</month>
<year iso-8601-date="2021">2021</year>
</pub-date>
<volume>29</volume>
<issue>1</issue>
<fpage>259</fpage>
<lpage>272</lpage>
<history>
<date date-type="received">
<day>23</day>
<month>1</month>
<year iso-8601-date="2021">2021</year>
</date>
<date date-type="accepted">
<day>25</day>
<month>3</month>
<year iso-8601-date="2021">2021</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2021 Zhang et al.</copyright-statement>
<copyright-year>2021</copyright-year>
<copyright-holder>Zhang et al.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_IASC_17200.pdf"></self-uri>
<abstract>
<p>Video key frame extraction is an essential step in video analysis and content-based video retrieval, and it also serves as the basis and premise of generating video synopsis. Video key frame extraction distills the meaningful parts of a video by analyzing its content and structure to form a concise and semantically expressive summary. Many research results on key frame extraction have been achieved to date. Nevertheless, because surveillance video lacks the specific structure of news, sports, and similar videos, existing key frame extraction methods are not accurate enough when applied to it directly. Hence, this paper proposes a key frame extraction method for surveillance video based on frequency domain analysis, which obtains the frequency spectrum and phase spectrum by performing a Fourier transform on the surveillance video frames. The frequency domain information of two adjacent frames can accurately reflect changes in both the global and local motion states of the moving target. Experimental results show that the proposed method is correct and effective, and that the extracted key frames capture changes in the global and local motion states of the target more accurately than previous methods.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Frequency domain analysis</kwd>
<kwd>frequency spectrum</kwd>
<kwd>Fourier transform</kwd>
<kwd>local motion states</kwd>
<kwd>key frame extraction</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>Surveillance video plays an increasingly important role in many fields, such as national economic construction and public security information construction. The infrastructure of video surveillance systems has begun to take shape and is still developing rapidly. With thousands of surveillance cameras monitoring and recording around the clock, the amount of video data has exploded, so finding the required information in massive amounts of surveillance video is undoubtedly like searching for a needle in a haystack [<xref ref-type="bibr" rid="ref-1">1</xref>&#x2013;<xref ref-type="bibr" rid="ref-6">6</xref>]. For this reason, we need to express this information concisely and clearly, which calls for key frame extraction.</p>
<p>Key frame extraction is an essential step in video analysis and content-based video retrieval. It is also the basis and premise for generating video synopsis [<xref ref-type="bibr" rid="ref-7">7</xref>&#x2013;<xref ref-type="bibr" rid="ref-9">9</xref>]. Video key frame extraction selects the meaningful parts of a video by analyzing its content and structure, thereby forming a concise, semantically expressive summary that eliminates redundancy, shortens the video, and improves browsing and query efficiency. At present, the technology for extracting key frames from surveillance video has made significant breakthroughs [<xref ref-type="bibr" rid="ref-10">10</xref>&#x2013;<xref ref-type="bibr" rid="ref-18">18</xref>]. Nevertheless, applying existing key frame extraction methods directly to surveillance video results in inaccurate extraction. Unlike news, sports programs, and other videos with structured information, surveillance video has no specific structure, so many key frame extraction algorithms are not suitable for it [<xref ref-type="bibr" rid="ref-19">19</xref>&#x2013;<xref ref-type="bibr" rid="ref-21">21</xref>].</p>
<p>Therefore, this paper proposes a key frame extraction method for surveillance video based on frequency domain analysis [<xref ref-type="bibr" rid="ref-22">22</xref>&#x2013;<xref ref-type="bibr" rid="ref-29">29</xref>]. The method performs frequency domain analysis on each video frame to obtain its frequency spectrum and phase spectrum. Changes in the frequency domain information of adjacent video frames, that is, changes in their frequency spectra and phase spectra, reflect changes in the global and local motion states of moving targets in the video. Using the change in frequency domain information between two adjacent frames therefore makes it possible to accurately reflect changes in the target's motion state and thus to accurately extract key frames.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Frequency Domain Analysis</title>
<p>Frequency-domain processing of an image converts the image into the frequency domain and then modifies or removes frequency domain information to optimize image quality. Two-dimensional frequency domain processing provides a new perspective for target recognition and detection in video images. Problems that are laborious to solve in the spatial domain may be relatively easy to solve in the transform domain, so this approach can offer fresh ideas [<xref ref-type="bibr" rid="ref-30">30</xref>,<xref ref-type="bibr" rid="ref-31">31</xref>]. The main image frequency-domain transform methods include the Fourier transform, the discrete cosine transform, the two-dimensional orthogonal transform, and the wavelet transform. The following introduces the two-dimensional Fourier transform used in this paper and analyzes its feasibility.</p>
<sec id="s2_1">
<label>2.1</label>
<title>Two-Dimensional Discrete Fourier Transform</title>
<p>In general, the formula of the two-dimensional Fourier transform is:</p>
<p><disp-formula id="eqn-1">
<label>(1)</label>
<!--<alternatives><graphic mimetype="image" mime-subtype="png" xlink:href="eqn-1.png"/><tex-math id="tex-eqn-1"><![CDATA[F\left( {u,v} \right) = \mathop \int \nolimits_{ - \infty }^\infty \mathop \int \nolimits_{ - \infty }^\infty f\left( {x,y} \right){e^{ - j2\pi \left( {ux + vy} \right)}}dxdy]]></tex-math>--><mml:math id="mml-eqn-1" display="block"><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x003D;</mml:mo><mml:msubsup><mml:mrow><mml:mo largeop="false">&#x222B;</mml:mo></mml:mrow><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi mathvariant="normal">&#x221E;</mml:mi></mml:mrow><mml:mi mathvariant="normal">&#x221E;</mml:mi></mml:msubsup><mml:mo>&#x2061;</mml:mo><mml:msubsup><mml:mrow><mml:mo largeop="false">&#x222B;</mml:mo></mml:mrow><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi mathvariant="normal">&#x221E;</mml:mi></mml:mrow><mml:mi mathvariant="normal">&#x221E;</mml:mi></mml:msubsup><mml:mo>&#x2061;</mml:mo><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi>j</mml:mi><mml:mn>2</mml:mn><mml:mi>&#x03C0;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mi>x</mml:mi><mml:mo>&#x002B;</mml:mo><mml:mi>v</mml:mi><mml:mi>y</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:mrow><mml:mi>d</mml:mi><mml:mi>x</mml:mi><mml:mi>d</mml:mi><mml:mi>y</mml:mi></mml:math>
<!--</alternatives>--></disp-formula></p>
<p>The inverse formula is:</p>
<p><disp-formula id="eqn-2">
<label>(2)</label>
<!--<alternatives><graphic mimetype="image" mime-subtype="png" xlink:href="eqn-2.png"/><tex-math id="tex-eqn-2"><![CDATA[f\left( {x,y} \right) = \mathop \int \nolimits_{ - \infty }^\infty \mathop \int \nolimits_{ - \infty }^\infty F\left( {u,v} \right){e^{j2\pi \left( {ux + vy} \right)}}dudv]]></tex-math>--><mml:math id="mml-eqn-2" display="block"><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x003D;</mml:mo><mml:msubsup><mml:mrow><mml:mo largeop="false">&#x222B;</mml:mo></mml:mrow><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi mathvariant="normal">&#x221E;</mml:mi></mml:mrow><mml:mi mathvariant="normal">&#x221E;</mml:mi></mml:msubsup><mml:mo>&#x2061;</mml:mo><mml:msubsup><mml:mrow><mml:mo largeop="false">&#x222B;</mml:mo></mml:mrow><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi mathvariant="normal">&#x221E;</mml:mi></mml:mrow><mml:mi mathvariant="normal">&#x221E;</mml:mi></mml:msubsup><mml:mo>&#x2061;</mml:mo><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mi>j</mml:mi><mml:mn>2</mml:mn><mml:mi>&#x03C0;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mi>x</mml:mi><mml:mo>&#x002B;</mml:mo><mml:mi>v</mml:mi><mml:mi>y</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:mrow><mml:mi>d</mml:mi><mml:mi>u</mml:mi><mml:mi>d</mml:mi><mml:mi>v</mml:mi></mml:math>
<!--</alternatives>--></disp-formula></p>
<p>From the above formulas it can be verified that the Fourier transform has the properties of separability, periodicity, conjugate symmetry, linearity, rotation, and scaling.</p>
<p>For an image of size <inline-formula id="ieqn-1">
<!--<alternatives><inline-graphic xlink:href="ieqn-1.tif"/><tex-math id="tex-ieqn-1"><![CDATA[m \times n]]></tex-math>--><mml:math id="mml-ieqn-1"><mml:mi>m</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>n</mml:mi></mml:math>
<!--</alternatives>--></inline-formula>, the equation of the discrete Fourier transform is as follows:</p>
<p><disp-formula id="eqn-3">
<label>(3)</label>
<!--<alternatives><graphic mimetype="image" mime-subtype="png" xlink:href="eqn-3.png"/><tex-math id="tex-eqn-3"><![CDATA[F\left( {u,v} \right) = \displaystyle{1 \over {mn}}\mathop \sum \limits_{x = 0}^{m - 1} \mathop \sum \limits_{y = 0}^{n - 1} f\left( {x,y} \right){e^{ - j2\pi \left( {ux/m + vy/n} \right)}}]]></tex-math>--><mml:math id="mml-eqn-3" display="block"><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x003D;</mml:mo><mml:mstyle scriptlevel="0" displaystyle="true"><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mi>m</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:mfrac></mml:mrow><mml:munderover><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:munderover><mml:mo>&#x2061;</mml:mo><mml:munderover><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>y</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:munderover><mml:mo>&#x2061;</mml:mo><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi>j</mml:mi><mml:mn>2</mml:mn><mml:mi>&#x03C0;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mi>x</mml:mi><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mi>m</mml:mi><mml:mo>&#x002B;</mml:mo><mml:mi>v</mml:mi><mml:mi>y</mml:mi><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:mrow></mml:mstyle></mml:math>
<!--</alternatives>--></disp-formula></p>
<p>Here, the frequency is defined by the variables <inline-formula id="ieqn-2">
<!--<alternatives><inline-graphic xlink:href="ieqn-2.tif"/><tex-math id="tex-ieqn-2"><![CDATA[u]]></tex-math>--><mml:math id="mml-ieqn-2"><mml:mi>u</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> and <inline-formula id="ieqn-3">
<!--<alternatives><inline-graphic xlink:href="ieqn-3.tif"/><tex-math id="tex-ieqn-3"><![CDATA[v]]></tex-math>--><mml:math id="mml-ieqn-3"><mml:mi>v</mml:mi></mml:math>
<!--</alternatives>--></inline-formula>. The frequency domain is the coordinate system formed by <inline-formula id="ieqn-4">
<!--<alternatives><inline-graphic xlink:href="ieqn-4.tif"/><tex-math id="tex-ieqn-4"><![CDATA[F\left( {u,v} \right)]]></tex-math>--><mml:math id="mml-ieqn-4"><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>, with these variables as its frequency coordinates. The spatial domain is the coordinate system defined by <inline-formula id="ieqn-5">
<!--<alternatives><inline-graphic xlink:href="ieqn-5.tif"/><tex-math id="tex-ieqn-5"><![CDATA[f\left( {x,y} \right)]]></tex-math>--><mml:math id="mml-ieqn-5"><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>. Before centering, the low-frequency components of the spectrum lie toward the four corners of the spectrogram. As an example, figures (b) and (c) in <xref ref-type="fig" rid="fig-1">Fig. 1</xref> show the frequency spectrum and phase spectrum of the original image, respectively.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>The Frequency Spectrum and Phase Spectrum of the Image (a) Original Image (b) Frequency Spectrum (c) Phase Spectrum</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="fig-1.png"/>
</fig>
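<p>As a concrete illustration (a minimal NumPy sketch, not the paper's implementation), the discrete transform of Eq. (3) and the spectra it yields can be computed as follows; dividing by the number of pixels reproduces the 1/(mn) factor in Eq. (3):</p>

```python
import numpy as np

def spectra(frame):
    """Frequency spectrum and phase spectrum of a grayscale frame.

    The division by frame.size applies the 1/(mn) normalization
    used in Eq. (3)."""
    F = np.fft.fft2(frame) / frame.size  # 2-D discrete Fourier transform
    magnitude = np.abs(F)                # frequency spectrum |F(u,v)|
    phase = np.angle(F)                  # phase spectrum, the angle of F(u,v)
    return magnitude, phase

# Toy 8x8 "frame" containing a bright square
frame = np.zeros((8, 8))
frame[2:5, 2:5] = 1.0
mag, ph = spectra(frame)
# With this normalization, the zero-frequency term equals the mean gray level.
```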
</sec>
<sec id="s2_2">
<label>2.2</label>
<title>Feasibility Analysis</title>
<p>After the two-dimensional Fourier transform of an image, we obtain its corresponding frequency spectrum and phase spectrum. <xref ref-type="fig" rid="fig-2">Fig. 2</xref> shows the frequency spectrum and phase spectrum of a rectangle.</p>
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>The Frequency Spectrum and Phase Spectrum of the Rectangle (a) Original Image (b) Frequency Spectrum (c) Phase Spectrum</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="fig-2.png"/>
</fig>
<p><xref ref-type="fig" rid="fig-3">Fig. 3</xref> shows the frequency spectrum and phase spectrum of the rectangular after changing the image shape.</p>
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>The Frequency Spectrum and Phase Spectrum of the Rectangle (a) Original Image (b) Frequency Spectrum (c) Phase Spectrum</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="fig-3.png"/>
</fig>
<p>Comparing <xref ref-type="fig" rid="fig-2">Figs. 2</xref> and <xref ref-type="fig" rid="fig-3">3</xref>, we can find that when the rectangle changes, although the shapes of the frequency spectrum and phase spectrum do not change significantly, the values of the image matrix have changed.</p>
<p>According to the properties of the Fourier transform, when the rectangle in the image is rotated, its frequency spectrum and phase spectrum both change. When the rectangle is translated, the frequency spectrum does not change, but the phase spectrum does, as shown in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>.</p>
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>The Frequency Spectrum and Phase Spectrum of the Rectangle (a) Original Image (b) Frequency Spectrum (c) Phase Spectrum</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="fig-4.png"/>
</fig>
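<p>The translation property stated above can be checked directly with a small NumPy sketch (an illustration only, using a circular shift to stand in for translation): the frequency spectrum is unchanged while the phase spectrum is not.</p>

```python
import numpy as np

frame = np.zeros((16, 16))
frame[4:8, 4:8] = 1.0                                # a rectangle
shifted = np.roll(frame, shift=(3, 2), axis=(0, 1))  # pure (circular) translation

F1, F2 = np.fft.fft2(frame), np.fft.fft2(shifted)

# Translation leaves the frequency (magnitude) spectrum unchanged...
same_magnitude = np.allclose(np.abs(F1), np.abs(F2))
# ...but the phase spectrum changes.
phase_changed = not np.allclose(np.angle(F1), np.angle(F2))
```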
<p>Take a moving target as an example: in an actual video, when the target goes from upright to squatting, the frequency spectra and phase spectra of the two video frames are as shown in <xref ref-type="fig" rid="fig-5">Fig. 5</xref>.</p>
<fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>The Frequency Spectrum and Phase Spectrum of Two Video Frames (a) Original Image (b) Frequency Spectrum (c) Phase Spectrum (d) Original Image (e) Frequency Spectrum (f) Phase Spectrum</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="fig-5.png"/>
</fig>
<p><xref ref-type="fig" rid="fig-5">Fig. 5</xref> shows that after the target squats, the frequency spectrum and phase spectrum have changed. Compared with the two frequency spectra that the shape does not have many changes; phase spectra changes can be observed directly from the image.</p>
<p>When a target appears or disappears, the frequency spectrum and phase spectrum will change. Taking the disappearance of the target as an example, <xref ref-type="fig" rid="fig-6">Fig. 6</xref> shows the frequency spectra and phase spectra of the two video frames.</p>
<fig id="fig-6">
<label>Figure 6</label>
<caption>
<title>The Frequency Spectrum and Phase Spectrum of Two Video Frames (a) Original Image (b) Frequency Spectrum (c) Phase Spectrum (d) Original Image (e) Frequency Spectrum (f) Phase Spectrum</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="fig-6.png"/>
</fig>
<p>Comparing the frequency spectra and phase spectra of the two video frames in <xref ref-type="fig" rid="fig-6">Fig. 6</xref>, the differences between the two frames are apparent: the shape of the frequency spectrum has not changed significantly, but its data have. Comparing the frequency spectra and phase spectra of the four video frames in <xref ref-type="fig" rid="fig-6">Figs. 6</xref> and <xref ref-type="fig" rid="fig-5">5</xref>, almost all target movements cause changes in both the frequency spectrum and the phase spectrum. The only exception is pure translation of an otherwise unchanged target, in which case the frequency spectrum does not change but the phase spectrum still does.</p>
<p>During its movement, the target may also make a turn. <xref ref-type="fig" rid="fig-7">Fig. 7</xref> shows the frequency spectra and phase spectra of the two video frames.</p>
<fig id="fig-7">
<label>Figure 7</label>
<caption>
<title>The Frequency Spectrum and Phase Spectrum of Two Video Frames (a) Original Image (b) Frequency Spectrum (c) Phase Spectrum (d) Original Image (e) Frequency Spectrum (f) Phase Spectrum</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="fig-7.png"/>
</fig>
<p>Observing <xref ref-type="fig" rid="fig-7">Fig. 7</xref>, we can find that the frequency spectrum and phase spectrum have changed. Compared with the frequency spectra, the phase spectra vary more obviously.</p>
<p>According to the above analysis, when the moving target in the video changes from static to moving or from moving to static, or stretches, squats, or turns, its global and local motion states change, and the frequency spectrum and phase spectrum change accordingly. Meanwhile, when there are multiple targets in the video, a change in any one of them will also cause a change in the frequency spectrum or phase spectrum. Therefore, it is feasible to use the frequency domain information of adjacent video frames to reflect changes in target motion state in surveillance video. Based on this, the paper proposes a key frame extraction method based on frequency domain analysis.</p>
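<p>The idea of comparing the frequency domain information of adjacent frames to detect motion-state changes can be sketched as follows (an illustrative NumPy example with a hypothetical helper, not the paper's algorithm):</p>

```python
import numpy as np

def phase_mse(prev, curr):
    """Mean squared difference between the phase spectra of two
    consecutive grayscale frames (hypothetical helper)."""
    p_prev = np.angle(np.fft.fft2(prev))
    p_curr = np.angle(np.fft.fft2(curr))
    return np.mean((p_prev - p_curr) ** 2)

# Toy sequence: an empty background, then a "target" appears.
h, w = 16, 16
background = np.zeros((h, w))
with_target = background.copy()
with_target[5:9, 5:9] = 1.0

mse_static = phase_mse(background, background)   # no change in the scene
mse_change = phase_mse(background, with_target)  # target appears
# A motion-state change yields a larger phase-spectrum difference.
```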
</sec>
</sec>
<sec id="s3">
<label>3</label>
<title>Algorithm Description</title>
<p>From the perspective of frequency domain analysis, this paper uses changes in frequency domain information to reflect changes in the target's motion state. Analysis of the principle of the two-dimensional discrete Fourier transform shows that it is reasonable to use frequency domain information to measure target motion changes. <xref ref-type="fig" rid="fig-8">Fig. 8</xref> shows the basic framework of the proposed key frame extraction method.</p>
<fig id="fig-8">
<label>Figure 8</label>
<caption>
<title>The Basic Framework of Video Key Frame Extraction Method</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="fig-8.png"/>
</fig>
<p>From <xref ref-type="fig" rid="fig-8">Fig. 8</xref>, we can find that the following steps of the input surveillance video sequence are as follow:</p>
<p>Step 1: Image preprocessing. Convert the surveillance video sequence to grayscale.</p>
<p>Step 2: Fourier transform. For each surveillance video frame <inline-formula id="ieqn-6">
<!--<alternatives><inline-graphic xlink:href="ieqn-6.tif"/><tex-math id="tex-ieqn-6"><![CDATA[f\left( {x,y} \right)]]></tex-math>--><mml:math id="mml-ieqn-6"><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>, perform a two-dimensional discrete Fourier transform by formula (4) to obtain <inline-formula id="ieqn-7">
<!--<alternatives><inline-graphic xlink:href="ieqn-7.tif"/><tex-math id="tex-ieqn-7"><![CDATA[F\left( {u,v} \right)]]></tex-math>--><mml:math id="mml-ieqn-7"><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>:</p>
<p><disp-formula id="eqn-4">
<label>(4)</label>
<!--<alternatives><graphic mimetype="image" mime-subtype="png" xlink:href="eqn-4.png"/><tex-math id="tex-eqn-4"><![CDATA[F\left( {u,v} \right) = \displaystyle{1 \over {mn}}\mathop \sum \limits_{x = 0}^{m - 1} \mathop \sum \limits_{y = 0}^{n - 1} f\left( {x,y} \right){e^{ - j2\pi \left( {ux/m + vy/n} \right)}}]]></tex-math>--><mml:math id="mml-eqn-4" display="block"><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x003D;</mml:mo><mml:mstyle scriptlevel="0" displaystyle="true"><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mi>m</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:mfrac></mml:mrow><mml:munderover><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>m</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:munderover><mml:mo>&#x2061;</mml:mo><mml:munderover><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>y</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>n</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:munderover><mml:mo>&#x2061;</mml:mo><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi>j</mml:mi><mml:mn>2</mml:mn><mml:mi>&#x03C0;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mi>x</mml:mi><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mi>m</mml:mi><mml:mo>&#x002B;</mml:mo><mml:mi>v</mml:mi><mml:mi>y</mml:mi><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mi>n</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:msup></mml:mrow></mml:mstyle></mml:math>
<!--</alternatives>--></disp-formula></p>
<p>Step 3: Frequency spectrum and phase spectrum. From <inline-formula id="ieqn-8">
<!--<alternatives><inline-graphic xlink:href="ieqn-8.tif"/><tex-math id="tex-ieqn-8"><![CDATA[F\left( {u,v} \right)]]></tex-math>--><mml:math id="mml-ieqn-8"><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>, the Fourier frequency spectrum <inline-formula id="ieqn-9">
<!--<alternatives><inline-graphic xlink:href="ieqn-9.tif"/><tex-math id="tex-ieqn-9"><![CDATA[\left| {F\left( {u,v} \right)} \right|]]></tex-math>--><mml:math id="mml-ieqn-9"><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> is as follows:</p>
<p><disp-formula id="eqn-5">
<label>(5)</label>
<!--<alternatives><graphic mimetype="image" mime-subtype="png" xlink:href="eqn-5.png"/><tex-math id="tex-eqn-5"><![CDATA[\left| {F\left( {u,v} \right)} \right| = {\left[ {{R^2}\left( {u,v} \right) + {I^2}\left( {u,v} \right)} \right]^{1/2}}]]></tex-math>--><mml:math id="mml-eqn-5" display="block"><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mo>&#x003D;</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x002B;</mml:mo><mml:mrow><mml:msup><mml:mi>I</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:mrow><mml:mn>1</mml:mn><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup></mml:mrow></mml:math>
<!--</alternatives>--></disp-formula></p>
<p>In the formula, <inline-formula id="ieqn-10">
<!--<alternatives><inline-graphic xlink:href="ieqn-10.tif"/><tex-math id="tex-ieqn-10"><![CDATA[R\left( {u,v} \right)]]></tex-math>--><mml:math id="mml-ieqn-10"><mml:mi>R</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> and <inline-formula id="ieqn-11">
<!--<alternatives><inline-graphic xlink:href="ieqn-11.tif"/><tex-math id="tex-ieqn-11"><![CDATA[I\left( {u,v} \right)]]></tex-math>--><mml:math id="mml-ieqn-11"><mml:mi>I</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> denote the real and imaginary parts of <inline-formula id="ieqn-12">
<!--<alternatives><inline-graphic xlink:href="ieqn-12.tif"/><tex-math id="tex-ieqn-12"><![CDATA[F\left( {u,v} \right)]]></tex-math>--><mml:math id="mml-ieqn-12"><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>, respectively. From <inline-formula id="ieqn-13">
<!--<alternatives><inline-graphic xlink:href="ieqn-13.tif"/><tex-math id="tex-ieqn-13"><![CDATA[F\left( {u,v} \right)]]></tex-math>--><mml:math id="mml-ieqn-13"><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>, the phase angle <inline-formula id="ieqn-14">
<!--<alternatives><inline-graphic xlink:href="ieqn-14.tif"/><tex-math id="tex-ieqn-14"><![CDATA[\varphi \left( {u,v} \right)]]></tex-math>--><mml:math id="mml-ieqn-14"><mml:mi>&#x03C6;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula> is as follows:</p>
<p><disp-formula id="eqn-6">
<label>(6)</label>
<!--<alternatives><graphic mimetype="image" mime-subtype="png" xlink:href="eqn-6.png"/><tex-math id="tex-eqn-6"><![CDATA[\varphi \left( {u,v} \right) = {\rm arctan}\left[ {\displaystyle{{I\left( {u,v} \right)} \over {R\left( {u,v} \right)}}} \right]]]></tex-math>--><mml:math id="mml-eqn-6" display="block"><mml:mi>&#x03C6;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x003D;</mml:mo><mml:mrow><mml:mi mathvariant="normal">a</mml:mi><mml:mi mathvariant="normal">r</mml:mi><mml:mi mathvariant="normal">c</mml:mi><mml:mi mathvariant="normal">t</mml:mi><mml:mi mathvariant="normal">a</mml:mi><mml:mi mathvariant="normal">n</mml:mi></mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mstyle scriptlevel="0" displaystyle="true"><mml:mrow><mml:mfrac><mml:mrow><mml:mi>I</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mi>R</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></disp-formula></p>
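As a concrete illustration of the two-dimensional Fourier transform step and Eq. (6), the spectra can be computed with NumPy as follows. This is a minimal sketch rather than the authors' implementation; `np.arctan2` is used in place of the plain arctangent so the quadrant of the phase angle is resolved correctly:

```python
import numpy as np

def spectra(frame):
    """Return the frequency (magnitude) spectrum and the phase spectrum
    of a grayscale frame via the 2-D discrete Fourier transform."""
    F = np.fft.fft2(frame.astype(np.float64))  # F(u, v) = R(u, v) + jI(u, v)
    magnitude = np.abs(F)                      # sqrt(R^2 + I^2)
    # Eq. (6): phi = arctan(I / R); arctan2 handles R = 0 and sign quadrants.
    phase = np.arctan2(F.imag, F.real)
    return magnitude, phase
```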
<p>Step 4: Calculate the mean square errors MSE1 and MSE2. From the frequency spectrum and phase spectrum obtained above, we can calculate the mean square error MSE1 between the frequency spectra of adjacent frames and the mean square error MSE2 between the phase spectra of adjacent frames. Assuming that the width and height of each video frame are <inline-formula id="ieqn-15">
<!--<alternatives><inline-graphic xlink:href="ieqn-15.tif"/><tex-math id="tex-ieqn-15"><![CDATA[W]]></tex-math>--><mml:math id="mml-ieqn-15"><mml:mi>W</mml:mi></mml:math>
<!--</alternatives>--></inline-formula> and <inline-formula id="ieqn-16">
<!--<alternatives><inline-graphic xlink:href="ieqn-16.tif"/><tex-math id="tex-ieqn-16"><![CDATA[H]]></tex-math>--><mml:math id="mml-ieqn-16"><mml:mi>H</mml:mi></mml:math>
<!--</alternatives>--></inline-formula>, respectively, the current frame spectrum is <inline-formula id="ieqn-17">
<!--<alternatives><inline-graphic xlink:href="ieqn-17.tif"/><tex-math id="tex-ieqn-17"><![CDATA[f\left( {x,y} \right)]]></tex-math>--><mml:math id="mml-ieqn-17"><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>, and the previous frame spectrum is <inline-formula id="ieqn-18">
<!--<alternatives><inline-graphic xlink:href="ieqn-18.tif"/><tex-math id="tex-ieqn-18"><![CDATA[b\left( {x,y} \right)]]></tex-math>--><mml:math id="mml-ieqn-18"><mml:mi>b</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
<!--</alternatives>--></inline-formula>, then their mean square error MSE1 is defined as:</p>
<p><disp-formula id="eqn-7">
<label>(7)</label>
<!--<alternatives><graphic mimetype="image" mime-subtype="png" xlink:href="eqn-7.png"/><tex-math id="tex-eqn-7"><![CDATA[MSE1 = \displaystyle{1 \over {WH}}\mathop \sum \limits_{x = 0}^{W - 1} \mathop \sum \limits_{y = 0}^{H - 1} {\left| {\left| {f\left( {x,y} \right) - b\left( {x,y} \right)} \right|} \right|^2}\;]]></tex-math>--><mml:math id="mml-eqn-7" display="block"><mml:mi>M</mml:mi><mml:mi>S</mml:mi><mml:mi>E</mml:mi><mml:mn>1</mml:mn><mml:mo>&#x003D;</mml:mo><mml:mstyle scriptlevel="0" displaystyle="true"><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mi>W</mml:mi><mml:mi>H</mml:mi></mml:mrow></mml:mfrac></mml:mrow><mml:munderover><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>W</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:munderover><mml:mo>&#x2061;</mml:mo><mml:munderover><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>y</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mrow><mml:mi>H</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:munderover><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mi>f</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mi>b</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:msup></mml:mrow><mml:mspace width="thickmathspace"></mml:mspace></mml:mstyle></mml:math>
<!--</alternatives>--></disp-formula></p>
<p>Formula (7) is used to calculate both MSE1 and MSE2.</p>
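Eq. (7) translates directly into code. A minimal sketch, assuming `f` and `b` are same-sized spectra (magnitude or phase) of the current and previous frames:

```python
import numpy as np

def mse(f, b):
    """Mean square error between two spectra per Eq. (7):
    MSE = (1 / (W * H)) * sum over x, y of |f(x, y) - b(x, y)|^2."""
    f = np.asarray(f, dtype=np.complex128)
    b = np.asarray(b, dtype=np.complex128)
    h, w = f.shape  # H rows by W columns
    return float(np.sum(np.abs(f - b) ** 2) / (w * h))
```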
<p>Step 5: Weight MSE1 and MSE2. To make changes in the mean square error more evident, we scale MSE1 by a factor of 5. The frequency spectrum contains the grayscale information of the original image, while the phase spectrum contains its edge and overall structure information. Compared with the frequency spectrum, the phase spectrum carries more visual and thus more critical information [<xref ref-type="bibr" rid="ref-32">32</xref>,<xref ref-type="bibr" rid="ref-33">33</xref>]. Because the phase spectrum is more meaningful, we scale MSE2 by a factor of 10. After weighting, abrupt changes in the curve become more evident.</p>
<p>Step 6: Form the MSE curve. According to the above steps, the mean square error MSE of a frame is calculated as:</p>
<p><disp-formula id="eqn-8">
<label>(8)</label>
<!--<alternatives><graphic mimetype="image" mime-subtype="png" xlink:href="eqn-8.png"/><tex-math id="tex-eqn-8"><![CDATA[MSE = 5MSE1 + 10MSE2]]></tex-math>--><mml:math id="mml-eqn-8" display="block"><mml:mi>M</mml:mi><mml:mi>S</mml:mi><mml:mi>E</mml:mi><mml:mo>&#x003D;</mml:mo><mml:mn>5</mml:mn><mml:mi>M</mml:mi><mml:mi>S</mml:mi><mml:mi>E</mml:mi><mml:mn>1</mml:mn><mml:mo>&#x002B;</mml:mo><mml:mn>10</mml:mn><mml:mi>M</mml:mi><mml:mi>S</mml:mi><mml:mi>E</mml:mi><mml:mn>2</mml:mn></mml:math>
<!--</alternatives>--></disp-formula></p>
<p>Calculating the MSE of each frame according to formula (8) forms the MSE curve.</p>
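Putting the above steps together over a whole frame sequence yields the MSE curve. The sketch below is an illustrative reading of Steps 1 through 6, with the weights 5 and 10 of Eq. (8) and the first frame's MSE fixed at 0 as described in Step 8:

```python
import numpy as np

def mse_curve(frames, w_mag=5.0, w_phase=10.0):
    """Form the MSE curve of Eq. (8): MSE = 5*MSE1 + 10*MSE2, where
    MSE1 and MSE2 are the mean square errors (Eq. (7)) between the
    magnitude and phase spectra of adjacent frames, respectively."""
    curve = [0.0]  # the MSE of the first frame is set to 0 (Step 8)
    prev_mag = prev_ph = None
    for frame in frames:
        F = np.fft.fft2(np.asarray(frame, dtype=np.float64))
        mag, ph = np.abs(F), np.angle(F)
        if prev_mag is not None:
            n = mag.size  # W * H
            mse1 = np.sum((mag - prev_mag) ** 2) / n
            mse2 = np.sum((ph - prev_ph) ** 2) / n
            curve.append(w_mag * mse1 + w_phase * mse2)
        prev_mag, prev_ph = mag, ph
    return np.array(curve)
```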
<p>Step 7: Detect the peak points of the curve. We detect the peak points of the formed MSE curve and extract the video frames corresponding to the peaks, together with the first and last frames, as candidate key frames.</p>
<p>Step 8: Extract the final key frames. First, to reduce redundancy among the extracted key frames, we keep only the video frames where the peak value mutates, that is, the frames whose peak value is at least N times that of the previous key frame&#x2019;s peak (starting from the second peak). Since the MSE of the first frame of the video is set to 0, the second frame always appears as a sudden change; to avoid redundancy between the first and second frames, the search starts from the second peak. The candidate set thus comprises the frames at peak mutations together with the first and last frames. The extracted key frames are then optimized using the key frame optimization criterion based on the peak signal-to-noise ratio visual discrimination mechanism [<xref ref-type="bibr" rid="ref-33">33</xref>,<xref ref-type="bibr" rid="ref-34">34</xref>], and the final key frames are determined.</p>
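Steps 7 and 8 (minus the PSNR-based optimization of [33,34]) can be sketched as follows. The three-point local-maximum test and the ratio rule against the previously kept peak are illustrative assumptions about the selection logic, not the authors' exact procedure:

```python
def candidate_keyframes(curve, N=1.0):
    """Detect the peaks of the MSE curve (Step 7) and keep those whose
    value is at least N times the previously kept peak, starting from
    the second peak, plus the first and last frames (Step 8)."""
    peaks = [i for i in range(1, len(curve) - 1)
             if curve[i] > curve[i - 1] and curve[i] > curve[i + 1]]
    kept, prev = [], None
    for i in peaks:
        if prev is None:           # the first peak only sets the baseline,
            prev = curve[i]        # since frames 1 and 2 would be redundant
            continue
        if curve[i] >= N * prev:   # peak mutation relative to the last key frame
            kept.append(i)
            prev = curve[i]
    return sorted({0, len(curve) - 1, *kept})
```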
</sec>
<sec id="s4">
<label>4</label>
<title>Experimental Results and Analysis</title>
<p>To comprehensively evaluate the correctness and effectiveness of the frequency domain-based method, we conducted extensive experiments on public video sets and self-collected videos. Section 4.1 extracts key frames from two typical videos and analyzes the results to verify the correctness of the proposed key frame algorithm. Section 4.2 confirms the effectiveness of the proposed algorithm.</p>
<sec id="s4_1">
<label>4.1</label>
<title>Algorithm Correctness</title>
<p>By observing the experimental results, we find that the frequency domain information can accurately capture changes in both the global and local motion states of the target. This section verifies the correctness of the key frame extraction algorithm based on frequency domain analysis by analyzing the experimental results.</p>
<p>The test video begins with the moving target appearing, continues with the target crouching, and ends with the target standing up. The changes in the target&#x2019;s motion state include changes in the global motion state, such as the appearance of the target, acceleration, and deceleration, as well as changes in the local motion state during walking, such as squatting down while walking and putting the bag in hand on the ground. In this experiment, we set the experimental parameter N &#x003D; 1. <xref ref-type="fig" rid="fig-9">Fig. 9</xref> shows the key frame extraction result, excluding the first and last frames.</p>
<fig id="fig-9">
<label>Figure 9</label>
<caption>
<title>The Key Frame Extraction Result of Test Video by the Proposed Method</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="fig-9.png"/>
</fig>
<p>Observing the key frame extraction results, we find that the proposed method extracts the 13<sup>th</sup> frame of the video, in which the target&#x2019;s feet begin to enter the surveillance scene. In the 29<sup>th</sup> frame, the movement of the target&#x2019;s legs changes significantly. The 37<sup>th</sup>, 41<sup>st</sup>, and 43<sup>rd</sup> frames represent changes in the target&#x2019;s movement speed as it walks. The 55<sup>th</sup>, 65<sup>th</sup>, and 86<sup>th</sup> frames describe the changes in the local motion state of the target as it squats down and stands up.</p>
<p>To further verify the correctness, this experiment selects another video clip for key frame extraction. <xref ref-type="fig" rid="fig-10">Fig. 10</xref> shows the video sequence. In this clip, of the two targets who are fighting, one ends up lying on the ground while the other starts to run away, and a third moving target appears during the escape. In this experiment, we set the experimental parameter N &#x003D; 1.11, and <xref ref-type="fig" rid="fig-10">Fig. 10</xref> shows the key frame extraction result of the test video, excluding the first and last frames.</p>
<p>Observing <xref ref-type="fig" rid="fig-10">Fig. 10</xref>, we find that the method misses some video frames with more prominent peaks, owing to the parameter N and the optimization criterion, and instead extracts video frames with smaller peak values. Nevertheless, the final key frames can still capture the changes in the target&#x2019;s motion state. The key frames extracted from the 3<sup>rd</sup> frame to the 75<sup>th</sup> frame describe the movement of the two targets exchanging positions before the fight, that is, changes in the global motion state. The four frames from frame 87 to frame 135 describe the changes in the targets&#x2019; legs during the fight, that is, variations in the local motion state. In frame 171, the target in white knocks down the target in black and flees, at which point the third moving target appears. The four key frames extracted subsequently describe the global motion state changes in the movement direction and speed of the second and third targets.</p>
<p>The key frame extraction results can be summarized as follows: the frequency domain information captures changes in both the global and local motion states of the target, allowing changes in the local motion state to be extracted accurately. Moreover, the correctness of the proposed key frame extraction method based on frequency domain analysis is verified.</p>
<fig id="fig-10">
<label>Figure 10</label>
<caption>
<title>The Key Frame Extraction Result of Test Video by the Proposed Method</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="fig-10.png"/>
</fig>
</sec>
<sec id="s4_2">
<label>4.2</label>
<title>Algorithm Effectiveness</title>
<p>To verify the effectiveness of the proposed method, this paper compares it with four key frame extraction methods: MA, MTSS, ME, and the center offset-based method (denoted as CO). We set the experimental parameter of the proposed method to N &#x003D; 1 and the experimental parameter of the center offset-based method to N &#x003D; 5. We evaluate the extraction results of the five key frame extraction methods from both subjective and objective perspectives.</p>
<p>We first verify the effectiveness of the proposed method from a subjective perspective. To ensure a fair comparison, the number of key frames extracted from the same video is the same for all methods. Six key frames are extracted from the test video, excluding the first and last frames. <xref ref-type="fig" rid="fig-11">Fig. 11</xref> shows the key frame extraction results on the test video.</p>
<fig id="fig-11">
<label>Figure 11</label>
<caption>
<title>Test Video Key Frame Extraction Result (a) Proposed Method (b) CO (c) MTSS (d) MA (e) ME</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="fig-11.png"/>
</fig>
<p>Observing <xref ref-type="fig" rid="fig-11">Fig. 11</xref>, all five methods can capture the target entering and leaving the scene and putting down the bag. CO, MTSS, and MA can all detect the appearance of the target in a relatively timely manner. However, the method proposed in this paper is more sensitive than CO and the other methods to changes in the local movement of the target&#x2019;s legs, and it clearly describes the changes in the target&#x2019;s motion state while putting down the bag.</p>
<p>The SRD criterion is used to evaluate the effectiveness of the proposed method objectively. <xref ref-type="fig" rid="fig-12">Fig. 12</xref> shows the average SRD of the proposed method and the four comparison methods (CO, MTSS, MA, and ME) on the test video.</p>
<fig id="fig-12">
<label>Figure 12</label>
<caption>
<title>Average SRD of Different Methods</title>
</caption>
<graphic mimetype="image" mime-subtype="png" xlink:href="fig-12.png"/>
</fig>
<p>By observing the average SRD of the different methods, we find that when the key frame extraction ratio is 10%, the average SRD of the proposed method is about 0.5 dB higher than that of CO. When the key frame extraction ratio is 12%, the average SRD of the proposed method is the same as that of CO. In terms of overall trend, the proposed method clearly outperforms MA, MTSS, and ME. Therefore, as far as SRD is concerned, the method based on frequency domain analysis has a significant advantage over the comparison methods, and its ability to reconstruct video frames is stronger.</p>
<p>The correctness of the proposed key frame algorithm is verified by analyzing the results of extracting key frames from the test videos. Meanwhile, the subjective and objective performance of the proposed method and other motion-related key frame extraction methods is compared on the test video to verify the method&#x2019;s effectiveness. The analysis of the experimental results shows that the method based on frequency domain analysis captures changes in the local motion state of the target more accurately than the other methods, realizing precise key frame extraction for surveillance video.</p>
</sec>
</sec>
<sec id="s5">
<label>5</label>
<title>Conclusion</title>
<p>This paper proposed a key frame extraction method based on frequency domain analysis to accurately capture changes in the local motion state of the target. The method first performs a two-dimensional Fourier transform on each video frame to obtain its frequency spectrum and phase spectrum. It then calculates the mean square errors of the frequency spectrum and the phase spectrum between the current frame and the previous frame and weights them separately. The two weighted values are added to obtain the mean square error of the current frame, forming a mean square error curve. Candidate key frames are then extracted from this curve and optimized to obtain the final key frames. Experimental results show that the proposed method is correct and effective, and that the extracted key frames accurately capture the changes in the target&#x2019;s local motion state.</p>
</sec>
</body>
<back><fn-group>
<fn fn-type="other">
<p><bold>Funding Statement: </bold>This work was supported by the National Nature Science Foundation of China (Grant Nos. 61702347, 61772225), Natural Science Foundation of Hebei Province (Grant Nos. F2017210161, F2018210148).</p>
</fn>
<fn fn-type="conflict">
<p><bold>Conflicts of Interest: </bold>The authors declare that they have no conflicts of interest to report regarding the present study.</p>
</fn>
</fn-group>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1">
<label>[1]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>S. E.</given-names> 
<surname>Avila</surname></string-name>, <string-name>
<given-names>A. B.</given-names> 
<surname>Lopes</surname></string-name>, <string-name>
<given-names>L. J.</given-names> 
<surname>Antonio</surname></string-name> and <string-name>
<given-names>A. A.</given-names> 
<surname>Arnaldo</surname></string-name>
</person-group>, &#x201C;
<article-title>VSUMM: A mechanism designed to produce static video summaries and a novel evaluation method</article-title>,&#x201D; 
<source>Pattern Recognition Letters</source>, vol. 
<volume>32</volume>, no. 
<issue>1</issue>, pp. 
<fpage>56</fpage>&#x2013;
<lpage>68</lpage>, 
<year iso-8601-date="2011">2011</year>.</mixed-citation>
</ref>
<ref id="ref-2">
<label>[2]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>P.</given-names> 
<surname>Jiang</surname></string-name> and <string-name>
<given-names>X.</given-names> 
<surname>Qin</surname></string-name>
</person-group>, &#x201C;
<article-title>Key frame based on video summary using visual attention clues</article-title>,&#x201D; 
<source>IEEE MultiMedia</source>, vol. 
<volume>17</volume>, no. 
<issue>2</issue>, pp. 
<fpage>64</fpage>&#x2013;
<lpage>73</lpage>, 
<year iso-8601-date="2010">2010</year>.</mixed-citation>
</ref>
<ref id="ref-3">
<label>[3]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>C. R.</given-names> 
<surname>Huang</surname></string-name>, <string-name>
<given-names>P. C.</given-names> 
<surname>Chung</surname></string-name>, <string-name>
<given-names>D. K.</given-names> 
<surname>Yang</surname></string-name>, <string-name>
<given-names>H. C.</given-names> 
<surname>Chen</surname></string-name> and <string-name>
<given-names>G. J.</given-names> 
<surname>Huang</surname></string-name>
</person-group>, &#x201C;
<article-title>Maximum a posteriori probability estimation for online surveillance video synopsis</article-title>,&#x201D; 
<source>IEEE Trans. on Circuits and Systems for Video Technology</source>, vol. 
<volume>24</volume>, no. 
<issue>8</issue>, pp. 
<fpage>1417</fpage>&#x2013;
<lpage>1429</lpage>, 
<year iso-8601-date="2014">2014</year>.</mixed-citation>
</ref>
<ref id="ref-4">
<label>[4]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>R.</given-names> 
<surname>Zhong</surname></string-name>, <string-name>
<given-names>R.</given-names> 
<surname>Hu</surname></string-name>, <string-name>
<given-names>Z.</given-names> 
<surname>Wang</surname></string-name> and <string-name>
<given-names>S.</given-names> 
<surname>Wang</surname></string-name>
</person-group>, &#x201C;
<article-title>Fast synopsis for moving objects using compressed video</article-title>,&#x201D; 
<source>IEEE Signal Processing Letters</source>, vol. 
<volume>21</volume>, no. 
<issue>7</issue>, pp. 
<fpage>834</fpage>&#x2013;
<lpage>838</lpage>, 
<year iso-8601-date="2014">2014</year>.</mixed-citation>
</ref>
<ref id="ref-5">
<label>[5]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>Y.</given-names> 
<surname>Nie</surname></string-name>, <string-name>
<given-names>C.</given-names> 
<surname>Xiao</surname></string-name>, <string-name>
<given-names>H.</given-names> 
<surname>Sun</surname></string-name> and <string-name>
<given-names>P.</given-names> 
<surname>Li</surname></string-name>
</person-group>, &#x201C;
<article-title>Compact video synopsis via global spatiotemporal optimization</article-title>,&#x201D; 
<source>IEEE Trans. on Visualization and Computer Graphics</source>, vol. 
<volume>19</volume>, no. 
<issue>10</issue>, pp. 
<fpage>1664</fpage>&#x2013;
<lpage>1676</lpage>, 
<year iso-8601-date="2013">2013</year>.</mixed-citation>
</ref>
<ref id="ref-6">
<label>[6]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>W.</given-names> 
<surname>Hu</surname></string-name>, <string-name>
<given-names>N.</given-names> 
<surname>Xie</surname></string-name>, <string-name>
<given-names>X. Z. L.</given-names> 
<surname>Li</surname></string-name> and <string-name>
<given-names>S.</given-names> 
<surname>Maybank</surname></string-name>
</person-group>, &#x201C;
<article-title>A survey on visual content-based video indexing and retrieval</article-title>,&#x201D; 
<source>IEEE Trans. on Systems, Man, and Cybernetics, Part C (Applications and Reviews)</source>, vol. 
<volume>41</volume>, no. 
<issue>6</issue>, pp. 
<fpage>797</fpage>&#x2013;
<lpage>819</lpage>, 
<year iso-8601-date="2011">2011</year>.</mixed-citation>
</ref>
<ref id="ref-7">
<label>[7]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>N.</given-names> 
<surname>Ejaz</surname></string-name>, <string-name>
<given-names>I.</given-names> 
<surname>Mehmood</surname></string-name> and <string-name>
<given-names>S. W. </given-names>
<surname>Baik</surname></string-name>
</person-group>, &#x201C;
<article-title>Efficient visual attention-based framework for extracting key frames from videos</article-title>,&#x201D; 
<source>Signal Processing: Image Communication</source>, vol. 
<volume>28</volume>, no. 
<issue>1</issue>, pp. 
<fpage>34</fpage>&#x2013;
<lpage>44</lpage>, 
<year iso-8601-date="2013">2013</year>.</mixed-citation>
</ref>
<ref id="ref-8">
<label>[8]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>L. J.</given-names> 
<surname>Lai</surname></string-name> and <string-name>
<given-names>Y.</given-names> 
<surname>Yi</surname></string-name>
</person-group>, &#x201C;
<article-title>Key frame extraction based on visual attention model</article-title>,&#x201D; 
<source>Journal of Visual Communication and Image Representation</source>, vol. 
<volume>23</volume>, no. 
<issue>1</issue>, pp. 
<fpage>114</fpage>&#x2013;
<lpage>125</lpage>, 
<year iso-8601-date="2012">2012</year>.</mixed-citation>
</ref>
<ref id="ref-9">
<label>[9]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>S.</given-names> 
<surname>Luo</surname></string-name>, <string-name>
<given-names>S. J.</given-names> 
<surname>Ma</surname></string-name>, <string-name>
<given-names>J.</given-names> 
<surname>Liang</surname></string-name> and <string-name>
<given-names>L. M.</given-names> 
<surname>Pan</surname></string-name>
</person-group>, &#x201C;
<article-title>Method of key frame extraction based on sub-shot cluster</article-title>,&#x201D; 
<source>IEEE Int. Conf. on Progress in Informatics and Computing</source>, vol. 
<volume>31</volume>, no. 
<issue>3</issue>, pp. 
<fpage>348</fpage>&#x2013;
<lpage>352</lpage>, 
<year iso-8601-date="2011">2011</year>.</mixed-citation>
</ref>
<ref id="ref-10">
<label>[10]</label><mixed-citation publication-type="other">
<person-group person-group-type="author"><string-name>
<given-names>H.</given-names> 
<surname>Zeng</surname></string-name> and <string-name>
<given-names>H. H.</given-names> 
<surname>Yang</surname></string-name>
</person-group>, &#x201C;
<article-title>A two phase video keyframe extraction method</article-title>,&#x201D; 
<source>Computer and Modernization</source>, no. 
<volume>6</volume>, pp. 
<fpage>33</fpage>&#x2013;
<lpage>35</lpage>, 
<year iso-8601-date="2011">2011</year>.</mixed-citation>
</ref>
<ref id="ref-11">
<label>[11]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>K. W.</given-names> 
<surname>Sze</surname></string-name>, <string-name>
<given-names>K. M.</given-names> 
<surname>Lam</surname></string-name> and <string-name>
<given-names>G. P.</given-names> 
<surname>Qiu</surname></string-name>
</person-group>, &#x201C;
<article-title>A new key frame representation for video segment retrieval</article-title>,&#x201D; 
<source>IEEE Trans. on Circuits and Systems for Video Technology</source>, vol. 
<volume>15</volume>, no. 
<issue>9</issue>, pp. 
<fpage>1148</fpage>&#x2013;
<lpage>1155</lpage>, 
<year iso-8601-date="2005">2005</year>.</mixed-citation>
</ref>
<ref id="ref-12">
<label>[12]</label><mixed-citation publication-type="conf-proc">
<person-group person-group-type="author"><string-name>
<given-names>W.</given-names> 
<surname>Wolf</surname></string-name>
</person-group>, &#x201C;
<article-title>Key frame selection by motion analysis</article-title>,&#x201D; in <conf-name>IEEE Int. Conf. on Acoustics, Speech and Signal Processing</conf-name>, 
<publisher-loc>Atlanta, GA</publisher-loc>, vol. 
<volume>2</volume>, pp. 
<fpage>1228</fpage>&#x2013;
<lpage>1231</lpage>, 
<year iso-8601-date="1996">1996</year>. </mixed-citation>
</ref>
<ref id="ref-13">
<label>[13]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>T.</given-names> 
<surname>Liu</surname></string-name>, <string-name>
<given-names>H. J.</given-names> 
<surname>Zhang</surname></string-name> and <string-name>
<given-names>F.</given-names> 
<surname>Qi</surname></string-name>
</person-group>, &#x201C;
<article-title>A novel video key frame extraction algorithm based on perceived motion energy model</article-title>,&#x201D; 
<source>IEEE Trans. on Circuits and Systems for Video Technology</source>, vol. 
<volume>13</volume>, no. 
<issue>10</issue>, pp. 
<fpage>1006</fpage>&#x2013;
<lpage>1013</lpage>, 
<year iso-8601-date="2003">2003</year>.</mixed-citation>
</ref>
<ref id="ref-14">
<label>[14]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>T. Y.</given-names> 
<surname>Liu</surname></string-name>, <string-name>
<given-names>X. D.</given-names> 
<surname>Zhang</surname></string-name>, <string-name>
<given-names>J.</given-names> 
<surname>Feng</surname></string-name> and <string-name>
<given-names>K. T.</given-names> 
<surname>Lo</surname></string-name>
</person-group>, &#x201C;
<article-title>Shot reconstruction degree: A novel criterion for key frame selection</article-title>,&#x201D; 
<source>Pattern Recognition Letters</source>, vol. 
<volume>25</volume>, no. 
<issue>1</issue>, pp. 
<fpage>1451</fpage>&#x2013;
<lpage>1457</lpage>, 
<year iso-8601-date="2004">2004</year>.</mixed-citation>
</ref>
<ref id="ref-15">
<label>[15]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>Y. Z.</given-names> 
<surname>Ma</surname></string-name>, <string-name>
<given-names>Y. L.</given-names> 
<surname>Chang</surname></string-name> and <string-name>
<given-names>H.</given-names> 
<surname>Yuan</surname></string-name>
</person-group>, &#x201C;
<article-title>Key-frame extraction based on motion acceleration</article-title>,&#x201D; 
<source>Optical Engineering</source>, vol. 
<volume>47</volume>, no. 
<issue>9</issue>, pp. 
<fpage>1</fpage>&#x2013;
<lpage>3</lpage>, 
<year iso-8601-date="2008">2008</year>.</mixed-citation>
</ref>
<ref id="ref-16">
<label>[16]</label><mixed-citation publication-type="conf-proc">
<person-group person-group-type="author"><string-name>
<given-names>C.</given-names> 
<surname>Li</surname></string-name>, <string-name>
<given-names>Y. T.</given-names> 
<surname>Wu</surname></string-name>, <string-name>
<given-names>S. S.</given-names> 
<surname>Yu</surname></string-name> and <string-name>
<given-names>T.</given-names> 
<surname>Chen</surname></string-name>
</person-group>, &#x201C;
<article-title>Motion-focusing key frame extraction and video summarization for lane surveillance system</article-title>,&#x201D; in <conf-name>Proc. 16th IEEE ICIP</conf-name>, <conf-loc>Cairo</conf-loc>, pp. 
<fpage>4329</fpage>&#x2013;
<lpage>4332</lpage>, 
<year iso-8601-date="2009">2009</year>. </mixed-citation>
</ref>
<ref id="ref-17">
<label>[17]</label><mixed-citation publication-type="conf-proc">
<person-group person-group-type="author"><string-name>
<given-names>T.</given-names> 
<surname>Wang</surname></string-name>, <string-name>
<given-names>Y.</given-names> 
<surname>Wu</surname></string-name> and <string-name>
<given-names>L.</given-names> 
<surname>Chen</surname></string-name>
</person-group>, &#x201C;
<article-title>An approach to video key-frame extraction based on rough set</article-title>,&#x201D; in <conf-name>2007 Int. Conf. on Multimedia and Ubiquitous Engineering</conf-name>, 
<publisher-loc>Seoul</publisher-loc>, pp. 
<fpage>590</fpage>&#x2013;
<lpage>596</lpage>, 
<year iso-8601-date="2007">2007</year>. </mixed-citation>
</ref>
<ref id="ref-18">
<label>[18]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>J. M.</given-names> 
<surname>Zhang</surname></string-name> and <string-name>
<given-names>X. J.</given-names> 
<surname>Jiang</surname></string-name>
</person-group>, &#x201C;
<article-title>Key frame extraction based on particle swarm optimization</article-title>,&#x201D; 
<source>Journal of Computer Applications</source>, vol. 
<volume>31</volume>, no. 
<issue>2</issue>, pp. 
<fpage>358</fpage>&#x2013;
<lpage>361</lpage>, 
<year iso-8601-date="2011">2011</year>.</mixed-citation>
</ref>
<ref id="ref-19">
<label>[19]</label><mixed-citation publication-type="other">
<person-group person-group-type="author"><string-name>
<given-names>S. C.</given-names> 
<surname>Raikwar</surname></string-name>, <string-name>
<given-names>C.</given-names> 
<surname>Bhatnagar</surname></string-name> and <string-name>
<given-names>A. S.</given-names> 
<surname>Jalal</surname></string-name>
</person-group>, &#x201C;
<article-title>A framework for key frame extraction from surveillance video</article-title>,&#x201D; 
<source>Proc. 5th IEEE ICCCT</source>, 
<publisher-loc>Allahabad</publisher-loc>, pp. 
<fpage>297</fpage>&#x2013;
<lpage>300</lpage>, 
<year iso-8601-date="2014">2014</year>.</mixed-citation>
</ref>
<ref id="ref-20">
<label>[20]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>Z.</given-names> 
<surname>Cernekova</surname></string-name>, <string-name>
<given-names>I.</given-names> 
<surname>Pitas</surname></string-name> and <string-name>
<given-names>C.</given-names> 
<surname>Nikou</surname></string-name>
</person-group>, &#x201C;
<article-title>Information theory-based shot cut/fade detection and video summarization</article-title>,&#x201D; 
<source>IEEE Trans. on Circuits and Systems for Video Technology</source>, vol. 
<volume>16</volume>, no. 
<issue>1</issue>, pp. 
<fpage>82</fpage>&#x2013;
<lpage>91</lpage>, 
<year iso-8601-date="2006">2006</year>.</mixed-citation>
</ref>
<ref id="ref-21">
<label>[21]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>Y.</given-names> 
<surname>Zhang</surname></string-name>, <string-name>
<given-names>R.</given-names> 
<surname>Tao</surname></string-name> and <string-name>
<given-names>Y.</given-names> 
<surname>Wang</surname></string-name>
</person-group>, &#x201C;
<article-title>Motion state adaptive video summarization via spatiotemporal analysis</article-title>,&#x201D; 
<source>IEEE Trans. on Circuits and Systems for Video Technology</source>, vol. 
<volume>27</volume>, no. 
<issue>6</issue>, pp. 
<fpage>1340</fpage>&#x2013;
<lpage>1352</lpage>, 
<year iso-8601-date="2017">2017</year>.</mixed-citation>
</ref>
<ref id="ref-22">
<label>[22]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>X. L.</given-names> 
<surname>Fan</surname></string-name>, <string-name>
<given-names>J. J.</given-names> 
<surname>Zhang</surname></string-name>, <string-name>
<given-names>Z. H.</given-names> 
<surname>Jia</surname></string-name> and <string-name>
<given-names>X. S.</given-names> 
<surname>Cai</surname></string-name>
</person-group>, &#x201C;
<article-title>Estimation of water droplet movement direction based on frequency-domain analysis</article-title>,&#x201D; 
<source>China Powder Technology</source>, vol. 
<volume>15</volume>, no. 
<issue>4</issue>, pp. 
<fpage>63</fpage>&#x2013;
<lpage>65</lpage>, 
<year iso-8601-date="2009">2009</year>.</mixed-citation>
</ref>
<ref id="ref-23">
<label>[23]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>Q. R.</given-names> 
<surname>Zhang</surname></string-name>, <string-name>
<given-names>G. C.</given-names> 
<surname>Gu</surname></string-name>, <string-name>
<given-names>H. B.</given-names> 
<surname>Liu</surname></string-name> and <string-name>
<given-names>H. M.</given-names> 
<surname>Xiao</surname></string-name>
</person-group>, &#x201C;
<article-title>Salient region detection using multi-scale analysis in the frequency domain</article-title>,&#x201D; 
<source>Journal of Harbin Engineering University</source>, vol. 
<volume>31</volume>, no. 
<issue>3</issue>, pp. 
<fpage>361</fpage>&#x2013;
<lpage>365</lpage>, 
<year iso-8601-date="2010">2010</year>.</mixed-citation>
</ref>
<ref id="ref-24">
<label>[24]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>X. D.</given-names> 
<surname>Yang</surname></string-name>
</person-group>, &#x201C;
<article-title>Location algorithm for image matching based on frequency domain analysis</article-title>,&#x201D; 
<source>Communication Power Technology</source>, vol. 
<volume>29</volume>, no. 
<issue>4</issue>, pp. 
<fpage>20</fpage>&#x2013;
<lpage>22&#x002B;43</lpage>, 
<year iso-8601-date="2012">2012</year>.</mixed-citation>
</ref>
<ref id="ref-25">
<label>[25]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>R. Y.</given-names> 
<surname>Chen</surname></string-name>, <string-name>
<given-names>L. L.</given-names> 
<surname>Pan</surname></string-name>, <string-name>
<given-names>Y.</given-names> 
<surname>Zhou</surname></string-name> and <string-name>
<given-names>Q. H.</given-names> 
<surname>Lei</surname></string-name>
</person-group>, &#x201C;
<article-title>Image retrieval based on deep feature extraction and reduction with improved CNN and PCA</article-title>,&#x201D; 
<source>Journal of Information Hiding and Privacy Protection</source>, vol. 
<volume>2</volume>, no. 
<issue>2</issue>, pp. 
<fpage>67</fpage>&#x2013;
<lpage>76</lpage>, 
<year iso-8601-date="2020">2020</year>.</mixed-citation>
</ref>
<ref id="ref-26">
<label>[26]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>J.</given-names> 
<surname>Niu</surname></string-name>, <string-name>
<given-names>Y.</given-names> 
<surname>Jiang</surname></string-name>, <string-name>
<given-names>Y.</given-names> 
<surname>Fu</surname></string-name>, <string-name>
<given-names>T.</given-names> 
<surname>Zhang</surname></string-name> and <string-name>
<given-names>N.</given-names> 
<surname>Masini</surname></string-name>
</person-group>, &#x201C;
<article-title>Image deblurring of video surveillance system in rainy environment</article-title>,&#x201D; 
<source>Computers, Materials &#x0026; Continua</source>, vol. 
<volume>65</volume>, no. 
<issue>1</issue>, pp. 
<fpage>807</fpage>&#x2013;
<lpage>816</lpage>, 
<year iso-8601-date="2020">2020</year>.</mixed-citation>
</ref>
<ref id="ref-27">
<label>[27]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>A.</given-names> 
<surname>Gumaei</surname></string-name>, <string-name>
<given-names>M.</given-names> 
<surname>Al-Rakhami</surname></string-name> and <string-name>
<given-names>H.</given-names> 
<surname>AlSalman</surname></string-name>
</person-group>, &#x201C;
<article-title>DL-HAR: Deep learning-based human activity recognition framework for edge computing</article-title>,&#x201D; 
<source>Computers, Materials &#x0026; Continua</source>, vol. 
<volume>65</volume>, no. 
<issue>2</issue>, pp. 
<fpage>1033</fpage>&#x2013;
<lpage>1057</lpage>, 
<year iso-8601-date="2020">2020</year>.</mixed-citation>
</ref>
<ref id="ref-28">
<label>[28]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>W.</given-names> 
<surname>Song</surname></string-name>, <string-name>
<given-names>J.</given-names> 
<surname>Yu</surname></string-name>, <string-name>
<given-names>X.</given-names> 
<surname>Zhao</surname></string-name> and <string-name>
<given-names>A.</given-names> 
<surname>Wang</surname></string-name>
</person-group>, &#x201C;
<article-title>Research on action recognition and content analysis in videos based on DNN and MLN</article-title>,&#x201D; 
<source>Computers, Materials &#x0026; Continua</source>, vol. 
<volume>61</volume>, no. 
<issue>3</issue>, pp. 
<fpage>1189</fpage>&#x2013;
<lpage>1204</lpage>, 
<year iso-8601-date="2019">2019</year>.</mixed-citation>
</ref>
<ref id="ref-29">
<label>[29]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>C.</given-names> 
<surname>Zhu</surname></string-name>, <string-name>
<given-names>Y. K.</given-names> 
<surname>Wang</surname></string-name>, <string-name>
<given-names>D. B.</given-names> 
<surname>Pu</surname></string-name>, <string-name>
<given-names>M.</given-names> 
<surname>Qi</surname></string-name>, <string-name>
<given-names>H.</given-names> 
<surname>Sun</surname></string-name> <etal>et al.</etal>
</person-group>, &#x201C;
<article-title>Multi-modality video representation for action recognition</article-title>,&#x201D; 
<source>Journal on Big Data</source>, vol. 
<volume>2</volume>, no. 
<issue>3</issue>, pp. 
<fpage>95</fpage>&#x2013;
<lpage>104</lpage>, 
<year iso-8601-date="2020">2020</year>.</mixed-citation>
</ref>
<ref id="ref-30">
<label>[30]</label><mixed-citation publication-type="other">
<person-group person-group-type="author"><string-name>
<given-names>F.</given-names> 
<surname>Dou</surname></string-name>
</person-group>, &#x201C;
<article-title>The research of traffic flow detection system based on image frequency spectrum analysis</article-title>,&#x201D; 
<comment>M.S. dissertation</comment>, 
<publisher-name>China University of Petroleum</publisher-name>, 
<publisher-loc>China</publisher-loc>, 
<year iso-8601-date="2017">2017</year>. </mixed-citation>
</ref>
<ref id="ref-31">
<label>[31]</label><mixed-citation publication-type="other">
<person-group person-group-type="author"><string-name>
<given-names>Z. S.</given-names> 
<surname>Tang</surname></string-name>
</person-group>, &#x201C;
<article-title>Image quality assessment based on deep learning, spatial and frequency domain analysis</article-title>,&#x201D; 
<comment>M.S. dissertation</comment>, 
<publisher-name>Xi&#x2019;an University of Technology</publisher-name>, 
<publisher-loc>China</publisher-loc>, 
<year iso-8601-date="2019">2019</year>. </mixed-citation>
</ref>
<ref id="ref-32">
<label>[32]</label><mixed-citation publication-type="book">
<person-group person-group-type="author"><string-name>
<given-names>D. C.</given-names> 
<surname>Ghiglia</surname></string-name> and <string-name>
<given-names>M. D.</given-names> 
<surname>Pritt</surname></string-name>
</person-group>, 
<source>Two-Dimensional Phase Unwrapping: Theory, Algorithms, and Software</source>. 
<publisher-loc>Hoboken, NJ</publisher-loc>: 
<publisher-name>Wiley</publisher-name>, 
<year iso-8601-date="1998">1998</year>.</mixed-citation>
</ref>
<ref id="ref-33">
<label>[33]</label><mixed-citation publication-type="book">
<person-group person-group-type="author"><string-name>
<given-names>R. C.</given-names> 
<surname>Gonzalez</surname></string-name>
</person-group>, 
<source>Digital Image Processing</source>. 
<publisher-loc>Beijing</publisher-loc>: 
<publisher-name>Electronic Industry Press</publisher-name>, 
<year iso-8601-date="2005">2005</year>.</mixed-citation>
</ref>
<ref id="ref-34">
<label>[34]</label><mixed-citation publication-type="journal">
<person-group person-group-type="author"><string-name>
<given-names>G.</given-names> 
<surname>Zhang</surname></string-name>, <string-name>
<given-names>H.</given-names> 
<surname>Sun</surname></string-name>, <string-name>
<given-names>Y.</given-names> 
<surname>Zheng</surname></string-name>, <string-name>
<given-names>G.</given-names> 
<surname>Xia</surname></string-name>, <string-name>
<given-names>L.</given-names> 
<surname>Feng</surname></string-name> <etal>et al.</etal>
</person-group>, &#x201C;
<article-title>Optimal discriminative projection for sparse representation-based classification via bilevel optimization</article-title>,&#x201D; 
<source>IEEE Trans. on Circuits and Systems for Video Technology</source>, vol. 
<volume>30</volume>, no. 
<issue>4</issue>, pp. 
<fpage>1065</fpage>&#x2013;
<lpage>1077</lpage>, 
<year iso-8601-date="2020">2020</year>.</mixed-citation>
</ref>
</ref-list>
</back>
</article>