<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">56318</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2024.056318</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Improving Generalization for Hyperspectral Image Classification: The Impact of Disjoint Sampling on Deep Models</article-title>
<alt-title alt-title-type="left-running-head">Improving Generalization for Hyperspectral Image Classification: The Impact of Disjoint Sampling on Deep Models</alt-title>
<alt-title alt-title-type="right-running-head">Improving Generalization for Hyperspectral Image Classification: The Impact of Disjoint Sampling on Deep Models</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Ahmad</surname><given-names>Muhammad</given-names></name><xref ref-type="aff" rid="aff-1">1</xref><email>mahmad00@gmail.com</email></contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Mazzara</surname><given-names>Manuel</given-names></name><xref ref-type="aff" rid="aff-2">2</xref></contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western"><surname>Distefano</surname><given-names>Salvatore</given-names></name><xref ref-type="aff" rid="aff-3">3</xref></contrib>
<contrib id="author-4" contrib-type="author">
<name name-style="western"><surname>Khan</surname><given-names>Adil Mehmood</given-names></name><xref ref-type="aff" rid="aff-4">4</xref></contrib>
<contrib id="author-5" contrib-type="author">
<name name-style="western"><surname>Altuwaijri</surname><given-names>Hamad Ahmed</given-names></name><xref ref-type="aff" rid="aff-5">5</xref></contrib>
<aff id="aff-1"><label>1</label><institution>Department of Computer Science, National University of Computer and Emerging Sciences</institution>, <addr-line>Chiniot, 35400</addr-line>, <country>Pakistan</country></aff>
<aff id="aff-2"><label>2</label><institution>Institute of Software Development and Engineering, Innopolis University</institution>, <addr-line>Innopolis, 420500</addr-line>, <country>Russia</country></aff>
<aff id="aff-3"><label>3</label><institution>Dipartimento di Matematica e Informatica&#x2014;MIFT, University of Messina</institution>, <addr-line>Messina, 98121</addr-line>, <country>Italy</country></aff>
<aff id="aff-4"><label>4</label><institution>School of Computer Science, University of Hull</institution>, <addr-line>Hull, HU6 7RX</addr-line>, <country>UK</country></aff>
<aff id="aff-5"><label>5</label><institution>Department of Geography, College of Humanities and Social Sciences, King Saud University</institution>, <addr-line>Riyadh, 11451</addr-line>, <country>Saudi Arabia</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Muhammad Ahmad. Email: <email>mahmad00@gmail.com</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2024</year></pub-date>
<pub-date date-type="pub" publication-format="electronic"><day>15</day><month>10</month><year>2024</year></pub-date>
<volume>81</volume>
<issue>1</issue>
<fpage>503</fpage>
<lpage>532</lpage>
<history>
<date date-type="received">
<day>11</day>
<month>7</month>
<year>2024</year>
</date>
<date date-type="accepted">
<day>11</day>
<month>8</month>
<year>2024</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2024 The Authors.</copyright-statement>
<copyright-year>2024</copyright-year>
<copyright-holder>Published by Tech Science Press.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_56318.pdf"></self-uri>
<abstract>
<p>Disjoint sampling is critical for rigorous and unbiased evaluation of state-of-the-art (SOTA) models, e.g., Attention Graph and Vision Transformer. When training, validation, and test sets overlap or share data, the overlap introduces a bias that inflates performance metrics and prevents accurate assessment of a model&#x2019;s true ability to generalize to new examples. This paper presents an innovative disjoint sampling approach for training SOTA models for Hyperspectral Image Classification (HSIC). By separating training, validation, and test data without overlap, the proposed method facilitates a fairer evaluation of how well a model can classify pixels it was not exposed to during training or validation. Experiments demonstrate that the approach significantly improves a model&#x2019;s generalization compared to alternatives that include training and validation data in the test data. A trivial alternative is to test the model on the entire Hyperspectral dataset to generate the ground truth maps; this produces higher accuracy but ultimately results in low generalization performance. Disjoint sampling eliminates data leakage between sets and provides reliable metrics for benchmarking progress in HSIC. It is therefore critical for advancing SOTA models and their real-world application to large-scale land mapping with Hyperspectral sensors. Overall, with the disjoint test set, the deep models achieve 96.36% accuracy on Indian Pines data, 99.73% on Pavia University data, 98.29% on University of Houston data, 99.43% on Botswana data, and 99.88% on Salinas data.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Hyperspectral image classification</kwd>
<kwd>disjoint sampling</kwd>
<kwd>Graph CNN</kwd>
<kwd>spatial-spectral transformer</kwd>
</kwd-group>
<funding-group>
<award-group id="awg1">
<funding-source>King Saud University</funding-source>
<award-id>RSPD2024R848</award-id>
</award-group>
</funding-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>Hyperspectral Imaging (HSI) plays a pivotal role in various domains such as remote sensing [<xref ref-type="bibr" rid="ref-1">1</xref>], earth observation [<xref ref-type="bibr" rid="ref-2">2</xref>,<xref ref-type="bibr" rid="ref-3">3</xref>], urban planning [<xref ref-type="bibr" rid="ref-4">4</xref>], agriculture [<xref ref-type="bibr" rid="ref-5">5</xref>,<xref ref-type="bibr" rid="ref-6">6</xref>], forestry [<xref ref-type="bibr" rid="ref-7">7</xref>], target/object detection [<xref ref-type="bibr" rid="ref-8">8</xref>,<xref ref-type="bibr" rid="ref-9">9</xref>], mineral exploration [<xref ref-type="bibr" rid="ref-10">10</xref>], environmental monitoring [<xref ref-type="bibr" rid="ref-11">11</xref>,<xref ref-type="bibr" rid="ref-12">12</xref>], climate change [<xref ref-type="bibr" rid="ref-13">13</xref>], food processing, bakery products, bloodstain identification, and meat processing.</p>
<p>Hyperspectral (HS) remote sensing plays a crucial role in urban planning by providing detailed insights and tools for efficient and informed decision-making [<xref ref-type="bibr" rid="ref-14">14</xref>]. HS remote sensors capture and analyze high-resolution spectral data across numerous narrow and contiguous spectral bands, offering comprehensive information on the composition and characteristics of urban environments [<xref ref-type="bibr" rid="ref-15">15</xref>]. HS data enables precise identification and mapping of various urban land cover types, such as vegetation, impervious surfaces, and soil [<xref ref-type="bibr" rid="ref-16">16</xref>]. Additionally, HS Imaging (HSI) facilitates the detection and monitoring of changes in land use, vegetation health, and pollution levels within urban areas [<xref ref-type="bibr" rid="ref-17">17</xref>]. These capabilities enhance urban planners&#x2019; ability to assess the impact of urbanization, analyze urban metabolism, and evaluate the effectiveness of sustainability measures. By covering the entire processing chain, from data acquisition to analysis, HS remote sensing serves as a valuable tool for urban planners seeking a deeper understanding of urban environments and their dynamics.</p>
<p>HSI presents both challenges and opportunities for effective classification [<xref ref-type="bibr" rid="ref-18">18</xref>,<xref ref-type="bibr" rid="ref-19">19</xref>]. In recent years, Convolutional Neural Networks (CNNs) [<xref ref-type="bibr" rid="ref-20">20</xref>], Attention Graph [<xref ref-type="bibr" rid="ref-21">21</xref>,<xref ref-type="bibr" rid="ref-22">22</xref>], and Spatial-Spectral Transformers [<xref ref-type="bibr" rid="ref-23">23</xref>,<xref ref-type="bibr" rid="ref-24">24</xref>] have demonstrated remarkable success in various tasks, prompting researchers to explore their potential in HSI analysis [<xref ref-type="bibr" rid="ref-25">25</xref>]. However, achieving robust and reliable classification results requires careful consideration of data sampling techniques [<xref ref-type="bibr" rid="ref-26">26</xref>]. Random sampling for data splitting can lead to several issues. It can result in non-representative training, validation, and test sets, causing models to overfit or underfit. Different random splits produce inconsistent results, making it hard to draw meaningful conclusions [<xref ref-type="bibr" rid="ref-27">27</xref>]. Random sampling offers no control over data distribution, introducing bias in imbalanced datasets [<xref ref-type="bibr" rid="ref-28">28</xref>]. It hinders the reproducibility of experimental results and limits the exploration of data relationships. To address these challenges, disjoint sampling is a crucial yet often overlooked consideration when evaluating spatial-spectral Hyperspectral Image Classification (HSIC) models. As demonstrated by the works [<xref ref-type="bibr" rid="ref-1">1</xref>,<xref ref-type="bibr" rid="ref-22">22</xref>,<xref ref-type="bibr" rid="ref-29">29</xref>&#x2013;<xref ref-type="bibr" rid="ref-33">33</xref>], traditional evaluations using overlapping training and test samples can lead to biased results and unfair assessments of model performance.</p>
<p>Even though several methodologies meticulously employ disjoint sets for training and testing their models, there is a notable inconsistency in how they generate land-cover maps [<xref ref-type="bibr" rid="ref-22">22</xref>,<xref ref-type="bibr" rid="ref-30">30</xref>,<xref ref-type="bibr" rid="ref-32">32</xref>]. Specifically, many of these methods deviate from the disjoint sampling principle by using the entire dataset, including the training samples, to produce the thematic maps. This practice creates a conflict between the reported accuracy and the methodology employed. To resolve this inconsistency, land-cover maps should be generated exclusively from the disjoint test set. Doing so aligns the evaluation with the principles of unbiased model assessment and ensures that the model is confronted with truly unseen data during map generation, giving a more accurate picture of its real-world performance.</p>
<p>Moreover, disjoint sampling is essential for training and evaluating deep models [<xref ref-type="bibr" rid="ref-34">34</xref>,<xref ref-type="bibr" rid="ref-35">35</xref>]. This method involves carefully selecting diverse and representative samples from various regions, land cover types, and environmental conditions to overcome biased or non-representative training data limitations [<xref ref-type="bibr" rid="ref-36">36</xref>]. It ensures the model learns robust features, enhancing classification performance and adaptability to unseen data. Additionally, disjoint sampling facilitates fair and accurate model evaluation by keeping training, validation, and testing samples separate [<xref ref-type="bibr" rid="ref-37">37</xref>]. Furthermore, disjoint sampling is crucial in training SOTA models for HSIC, notably for CNN and Spatial-Spectral Transformer-based models. It enhances generalization, ensures fair evaluation, and enables result interpretability. The use of disjoint training, validation, and test samples is imperative in HSIC for various reasons, such as:</p>
<p><bold>Unbiased Evaluation:</bold> It is crucial to evaluate HSIC models using completely separate and disjoint data for training, validation, and testing in order to properly assess a model&#x2019;s true ability to generalize to new unknown examples [<xref ref-type="bibr" rid="ref-1">1</xref>,<xref ref-type="bibr" rid="ref-29">29</xref>].</p>
<p><bold>Preventing Data Leakage and Mitigating Overfitting:</bold> Maintaining disjoint samples for training, validation, and testing is crucial to obtaining an accurate evaluation of a model&#x2019;s true generalization performance [<xref ref-type="bibr" rid="ref-22">22</xref>,<xref ref-type="bibr" rid="ref-37">37</xref>]. Employing disjoint subsets of the data at each stage of model development is pivotal in augmenting generalization performance. Through iterative training on distinct partitions, the model is compelled to infer underlying patterns shared across diverse examples, rather than being influenced by potentially misleading idiosyncrasies within a single fixed training sample [<xref ref-type="bibr" rid="ref-38">38</xref>,<xref ref-type="bibr" rid="ref-39">39</xref>]. This practice discourages the memorization of irrelevant characteristics specific to individual data samples. Instead, it fosters the capability to effectively process a wider range of presentations, including both seen and unseen examples.</p>
<p><bold><italic>Therefore, considering the above, this paper makes the following contributions:</italic></bold>
<list list-type="order">
<list-item><p>This paper presents a novel approach for disjoint train, validation, and test splits for HSIC. Ensuring the disjoint splits eliminates data leakage between subsets, which can bias performance evaluations. The proposed technique provides a practical implementation for creating disjoint train, validation, and test splits from ground truth data. This allows researchers to obtain unbiased performance evaluations and reliable comparisons between HSIC models.</p></list-item>
<list-item><p>By offering a standardized approach for creating evaluation splits, the proposed technique enhances the reproducibility and transparency of HSIC research. It fosters a more rigorous and standardized evaluation of classification models. The source code can be accessed at: <ext-link ext-link-type="uri" xlink:href="https://github.com/mahmad00/Disjoint-Sampling-for-Hyperspectral-Image-Classification">https://github.com/mahmad00/Disjoint-Sampling-for-Hyperspectral-Image-Classification</ext-link> (accessed on 14 June 2024).</p></list-item>
</list></p>
</sec>
<sec id="s2">
<label>2</label>
<title>Mathematical Formulation</title>
<p>Consider an HSI composed of <italic>B</italic> spectral bands, each with a spatial size of <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:mi>H</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>W</mml:mi></mml:math></inline-formula> pixels. The HSI data cube, denoted as <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:mi>X</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>H</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>W</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>B</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula>, is first divided into overlapping 3D patches [<xref ref-type="bibr" rid="ref-24">24</xref>,<xref ref-type="bibr" rid="ref-40">40</xref>]. Each patch is centered at a spatial location <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:mo stretchy="false">(</mml:mo><mml:mi>&#x03B1;</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x03B2;</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> and covers a spatial extent of <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:mi>P</mml:mi><mml:mi>S</mml:mi><mml:mo>=</mml:mo><mml:mi>S</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>S</mml:mi></mml:math></inline-formula> pixels (PS denotes the patch size) across all <italic>B</italic> bands.
The total number of 3D patches (<italic>N</italic>) extracted from <italic>X</italic>, each an element of <inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:msup><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>S</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>S</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>B</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula>, is given by <inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:mo stretchy="false">(</mml:mo><mml:mi>H</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi>S</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x00D7;</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mi>W</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi>S</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>.
A patch located at <inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:mo stretchy="false">(</mml:mo><mml:mi>&#x03B1;</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x03B2;</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> is represented as <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>&#x03B1;</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x03B2;</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and spans spatially from <inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:mi>&#x03B1;</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mrow><mml:mi>S</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mn>2</mml:mn></mml:mfrac></mml:mstyle></mml:math></inline-formula> to <inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:mi>&#x03B1;</mml:mi><mml:mo>+</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mrow><mml:mi>S</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mn>2</mml:mn></mml:mfrac></mml:mstyle></mml:math></inline-formula> in width and <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:mi>&#x03B2;</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mrow><mml:mi>S</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mn>2</mml:mn></mml:mfrac></mml:mstyle></mml:math></inline-formula> to <inline-formula id="ieqn-12"><mml:math id="mml-ieqn-12"><mml:mi>&#x03B2;</mml:mi><mml:mo>+</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mrow><mml:mi>S</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mn>2</mml:mn></mml:mfrac></mml:mstyle></mml:math></inline-formula> in height. The label of each patch is the label assigned to its central pixel, as described in Algorithm 1.</p>
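<p>The patch extraction described above can be sketched as follows. This is a minimal illustration rather than the authors&#x2019; Algorithm 1: the function name <monospace>extract_patches</monospace> and its arguments are hypothetical, NumPy is assumed, and no border padding is applied, so the number of patches equals (<italic>H</italic> &#x2212; <italic>S</italic> &#x002B; 1) &#x00D7; (<italic>W</italic> &#x2212; <italic>S</italic> &#x002B; 1).</p>
<code language="python"><![CDATA[
```python
import numpy as np


def extract_patches(cube, gt, patch_size):
    """Slide an S x S window over an (H, W, B) cube without padding.

    Returns the patches, the label of each patch's central pixel, and the
    (row, col) position of that central pixel in the original image.
    """
    if patch_size % 2 == 0:
        raise ValueError("patch_size must be odd so a central pixel exists")
    height, width, _ = cube.shape
    half = (patch_size - 1) // 2
    patches, labels, centers = [], [], []
    # Only positions whose full window fits inside the image are visited.
    for row in range(half, height - half):
        for col in range(half, width - half):
            patch = cube[row - half:row + half + 1, col - half:col + half + 1, :]
            patches.append(patch)
            labels.append(gt[row, col])   # patch label = central-pixel label
            centers.append((row, col))
    return np.stack(patches), np.array(labels), np.array(centers)


cube = np.random.rand(8, 7, 4)            # toy cube with H=8, W=7, B=4
gt = np.random.randint(0, 3, size=(8, 7))
patches, labels, centers = extract_patches(cube, gt, patch_size=3)
print(patches.shape)                       # (number of patches, S, S, B)
```
]]></code>
<p>Implementations that need one patch per pixel typically pad the cube borders first; whether Algorithm 1 does so is not stated here, so the sketch keeps to the unpadded count given in the text.</p>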
<fig id="fig-13">
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_56318-fig-13.tif"/>
</fig>
<p>The 3D patches extracted from the HSI are used to generate separate training, validation, and test sets using the proposed splitting algorithm. The key algorithm, titled &#x201C;Disjoint Train, Validation, and Test Split&#x201D;, handles dividing the HSI data into the respective portions. It takes the Ground Truth (GT) labels and ratios for the test and validation sets (teRatio and vrRatio) as inputs. The unique values in the GT labels and their frequency counts are identified, excluding zeros (background pixel labels). An iterative process is then used to create disjoint training, validation, and test sets based on these unique values and their indices. The resulting indices are utilized to extract and organize the corresponding Hyperspectral cubes and labels for each set. This ensures the subsets are separate while maintaining the integrity of spectral classes during model training and evaluation. The algorithm outputs the training, validation, and test samples along with their matching class labels. This partitioning approach contributes to the robustness and reliability of the subsequent analysis.</p>
<fig id="fig-14">
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_56318-fig-14.tif"/>
</fig>
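<p>The splitting procedure described above can be sketched as follows. This is a hedged illustration under stated assumptions, not the authors&#x2019; implementation: the function name <monospace>disjoint_split</monospace> is hypothetical, the arguments <monospace>te_ratio</monospace> and <monospace>vr_ratio</monospace> mirror teRatio and vrRatio, the split is assumed to be drawn per class at random, and NumPy is assumed.</p>
<code language="python"><![CDATA[
```python
import numpy as np


def disjoint_split(gt, te_ratio, vr_ratio, seed=0):
    """Split labeled pixel positions into disjoint train/val/test index sets.

    gt is a 2D array of class labels in which 0 marks background. The ratios
    are the fractions of each class sent to the test and validation sets;
    the remainder of each class goes to training.
    """
    rng = np.random.default_rng(seed)
    flat = gt.ravel()
    classes = np.unique(flat[flat != 0])       # unique labels, background excluded
    train, val, test = [], [], []
    for cls in classes:
        idx = rng.permutation(np.flatnonzero(flat == cls))
        n_test = int(round(te_ratio * idx.size))
        n_val = int(round(vr_ratio * idx.size))
        # Consecutive, non-overlapping slices of one shuffled index array.
        test.append(idx[:n_test])
        val.append(idx[n_test:n_test + n_val])
        train.append(idx[n_test + n_val:])
    return np.concatenate(train), np.concatenate(val), np.concatenate(test)


gt = np.random.default_rng(1).integers(0, 4, size=(20, 20))
tr, va, te = disjoint_split(gt, te_ratio=0.5, vr_ratio=0.25)
# No flat pixel index may appear in more than one set.
assert np.intersect1d(tr, va).size == 0
assert np.intersect1d(tr, te).size == 0
assert np.intersect1d(va, te).size == 0
rows, cols = np.unravel_index(tr, gt.shape)    # back to (row, col) positions
```
]]></code>
<p>The returned flat indices can then be used to select the corresponding patches and labels for each set. In this sketch only the central pixels are kept disjoint; patches centered on neighbouring pixels may still share some of their surrounding pixels.</p>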
<p>Let us consider that <inline-formula id="ieqn-13"><mml:math id="mml-ieqn-13"><mml:mi>n</mml:mi></mml:math></inline-formula>, <inline-formula id="ieqn-14"><mml:math id="mml-ieqn-14"><mml:mi>m</mml:mi></mml:math></inline-formula>, and <inline-formula id="ieqn-15"><mml:math id="mml-ieqn-15"><mml:mi>p</mml:mi></mml:math></inline-formula> represent the finite numbers of labeled training, validation, and test samples, respectively, selected from patch data to form the training, validation, and test sets as shown in <xref ref-type="disp-formula" rid="eqn-1">Eqs. (1)</xref>&#x2013;<xref ref-type="disp-formula" rid="eqn-3">(3)</xref>:
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>T</mml:mi><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>n</mml:mi></mml:msubsup></mml:math></disp-formula>
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:msub><mml:mi>D</mml:mi><mml:mi>V</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>m</mml:mi></mml:msubsup></mml:math></disp-formula>
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>T</mml:mi><mml:mi>E</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>p</mml:mi></mml:msubsup></mml:math></disp-formula>where <inline-formula id="ieqn-16"><mml:math id="mml-ieqn-16"><mml:mi>n</mml:mi></mml:math></inline-formula> and <inline-formula id="ieqn-17"><mml:math id="mml-ieqn-17"><mml:mi>m</mml:mi></mml:math></inline-formula> are the total number of training and validation samples. The remaining <inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:mi>p</mml:mi></mml:math></inline-formula> samples constitute the test set. It is important to note that the intersection of the training set, validation set, and test set is an empty set (<inline-formula id="ieqn-19"><mml:math id="mml-ieqn-19"><mml:mi>&#x03D5;</mml:mi></mml:math></inline-formula>), ensuring the distinctiveness of the samples in each set as shown in Algorithm 2, <xref ref-type="fig" rid="fig-1">Fig. 1</xref> and <xref ref-type="disp-formula" rid="eqn-4">Eq. (4)</xref>.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>Initially, the HSI cube is divided into overlapping 3D patches, as detailed in Algorithm 1 and Stage 1. Each patch is centered at a spatial point and spans a <inline-formula id="ieqn-20"><mml:math id="mml-ieqn-20"><mml:mi>P</mml:mi><mml:mi>S</mml:mi><mml:mo>=</mml:mo><mml:mi>S</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>S</mml:mi></mml:math></inline-formula> pixel extent across all spectral bands. These patches are then utilized in Algorithm 2 to create disjoint training, validation, and test splits based on the geographical locations of the HSI samples, as outlined in Stage 2. The selected samples are fed into various models for feature learning and optimization. The processed features are subsequently passed through a fully connected layer for classification, and the softmax function is applied to generate class probability distributions. These distributions are used to create the final ground truth maps for the disjoint validation, disjoint test, and full HSI test sets, as illustrated in Stage 3. The red dotted lines delineate the stages in the proposed workflow</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_56318-fig-1.tif"/>
</fig>
<p><disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>T</mml:mi><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2229;</mml:mo><mml:msub><mml:mi>D</mml:mi><mml:mi>V</mml:mi></mml:msub><mml:mo>&#x2229;</mml:mo><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>T</mml:mi><mml:mi>E</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>&#x03D5;</mml:mi></mml:math></disp-formula></p>
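<p>Strictly, <xref ref-type="disp-formula" rid="eqn-4">Eq. (4)</xref> only states that no sample belongs to all three sets at once. Assuming, as the splitting procedure suggests, that every labeled sample is assigned to exactly one set, the stronger pairwise condition also holds, and it is this condition that rules out leakage between any two sets:</p>
<p><disp-formula><mml:math display="block"><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>T</mml:mi><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2229;</mml:mo><mml:msub><mml:mi>D</mml:mi><mml:mi>V</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>T</mml:mi><mml:mi>R</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2229;</mml:mo><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>T</mml:mi><mml:mi>E</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>D</mml:mi><mml:mi>V</mml:mi></mml:msub><mml:mo>&#x2229;</mml:mo><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>T</mml:mi><mml:mi>E</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>&#x03D5;</mml:mi></mml:math></disp-formula></p>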
 
<p>As shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>, <bold>Stage 1: 3D Patch Extraction:</bold> The HSI cube is initially divided into overlapping 3D patches. Each patch is centered at a spatial point and spans a <inline-formula id="ieqn-21"><mml:math id="mml-ieqn-21"><mml:mi>P</mml:mi><mml:mi>S</mml:mi><mml:mo>=</mml:mo><mml:mi>S</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>S</mml:mi></mml:math></inline-formula> pixel extent across all spectral bands. This process is outlined in Algorithm 1. <bold>Stage 2: Data Splitting:</bold> The extracted patches are used in Algorithm 2 to create disjoint training, validation, and test splits. This splitting is based on the geographical locations of the HSI samples, ensuring that each set covers distinct areas. <bold>Stage 3: Feature Learning and Classification:</bold> The samples from each split (training, validation, and test) are fed into various models for feature learning and optimization. The learned features are passed through a fully connected layer for classification. The softmax function is then applied to generate class probability distributions. The class probability distributions are used to generate the final ground truth maps for the disjoint validation set, disjoint test set, and the full HSI test set. Each of these stages is marked with red dotted lines in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>, facilitating a clear and systematic replication of the workflow.</p>
<p>The above disjoint samples (as explained in Algorithm 2) are then processed by the baseline 2D, 3D CNN, and Spatial-Spectral Transformer models [<xref ref-type="bibr" rid="ref-41">41</xref>,<xref ref-type="bibr" rid="ref-42">42</xref>]. In a 2D CNN, the input data undergoes convolution with a 2D kernel function, resulting in the computation of the dot product between the input and the kernel function. The kernel is then applied in a strided manner over the input to cover the entire spatial dimension. Subsequently, the convolved features are subjected to an activation function, which introduces non-linearity into the model, aiding in the learning of non-linear features from the data. For 2D convolution, the activation value of the <inline-formula id="ieqn-22"><mml:math id="mml-ieqn-22"><mml:msup><mml:mi>j</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> feature map at spatial location <inline-formula id="ieqn-23"><mml:math id="mml-ieqn-23"><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> in the <inline-formula id="ieqn-24"><mml:math id="mml-ieqn-24"><mml:msup><mml:mi>i</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> layer, denoted by <inline-formula id="ieqn-25"><mml:math id="mml-ieqn-25"><mml:msubsup><mml:mi>v</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula>, can be expressed as shown in <xref ref-type="disp-formula" rid="eqn-5">Eq. (5)</xref>.
<disp-formula id="eqn-5"><label>(5)</label><mml:math id="mml-eqn-5" display="block"><mml:msubsup><mml:mi>v</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mi>&#x2131;</mml:mi></mml:mrow><mml:mstyle scriptlevel="0"><mml:mrow><mml:mo maxsize="2.047em" minsize="2.047em">(</mml:mo></mml:mrow></mml:mstyle><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>&#x03C4;</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>l</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:munderover><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>&#x03C1;</mml:mi><mml:mo>=</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03B3;</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x03B3;</mml:mi></mml:mrow></mml:munderover><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>&#x03C3;</mml:mi><mml:mo>=</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03B4;</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x03B4;</mml:mi></mml:mrow></mml:munderover><mml:msubsup><mml:mi>w</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x03C4;</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x03C3;</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x03C1;</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x00D7;</mml:mo><mml:msubsup><mml:mi>v</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi>&#x03C4;</mml:mi></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>+</mml:mo><mml:mi>&#x03C3;</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>+</mml:mo><mml:mi>&#x03C1;</mml:mi></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mstyle scriptlevel="0"><mml:mrow><mml:mo maxsize="2.047em" 
minsize="2.047em">)</mml:mo></mml:mrow></mml:mstyle></mml:math></disp-formula>where <inline-formula id="ieqn-26"><mml:math id="mml-ieqn-26"><mml:mrow><mml:mi>&#x2131;</mml:mi></mml:mrow></mml:math></inline-formula> represents the activation function and <inline-formula id="ieqn-27"><mml:math id="mml-ieqn-27"><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>l</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> is the number of feature maps at the <inline-formula id="ieqn-28"><mml:math id="mml-ieqn-28"><mml:mo stretchy="false">(</mml:mo><mml:mi>l</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:msup><mml:mo stretchy="false">)</mml:mo><mml:mrow><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> layer, which is also the depth of the kernel <inline-formula id="ieqn-29"><mml:math id="mml-ieqn-29"><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> for the <inline-formula id="ieqn-30"><mml:math id="mml-ieqn-30"><mml:msup><mml:mi>j</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> feature map at the <inline-formula id="ieqn-31"><mml:math id="mml-ieqn-31"><mml:msup><mml:mi>i</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> layer. 
Additionally, <inline-formula id="ieqn-32"><mml:math id="mml-ieqn-32"><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> denotes the bias parameter for the <inline-formula id="ieqn-33"><mml:math id="mml-ieqn-33"><mml:msup><mml:mi>j</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> feature map at the <inline-formula id="ieqn-34"><mml:math id="mml-ieqn-34"><mml:msup><mml:mi>i</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> layer, with <inline-formula id="ieqn-35"><mml:math id="mml-ieqn-35"><mml:mn>2</mml:mn><mml:mi>&#x03B3;</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula> and <inline-formula id="ieqn-36"><mml:math id="mml-ieqn-36"><mml:mn>2</mml:mn><mml:mi>&#x03B4;</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula> representing the width and height of the kernel, respectively. In contrast, the 3D convolutional process convolves 3D input patches with a 3D kernel, computing the sum of the element-wise products between each patch and the kernel [<xref ref-type="bibr" rid="ref-40">40</xref>,<xref ref-type="bibr" rid="ref-43">43</xref>]. The resulting feature maps are then passed through an activation function to introduce non-linearity. The Hybrid model generates the feature maps of the 3D convolutional layer by applying the 3D kernel across the <italic>B</italic> spectral bands obtained after dimensionality reduction in the input layer. 
For the 3D convolutional process, the activation value at spatial location <inline-formula id="ieqn-37"><mml:math id="mml-ieqn-37"><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>z</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> in the <inline-formula id="ieqn-38"><mml:math id="mml-ieqn-38"><mml:msup><mml:mi>i</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> layer and <inline-formula id="ieqn-39"><mml:math id="mml-ieqn-39"><mml:msup><mml:mi>j</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>h</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> feature map can be expressed as in <xref ref-type="disp-formula" rid="eqn-6">Eq. (6)</xref>:
<disp-formula id="eqn-6"><label>(6)</label><mml:math id="mml-eqn-6" display="block"><mml:msubsup><mml:mi>v</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>,</mml:mo><mml:mi>z</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mrow><mml:mi>&#x2131;</mml:mi></mml:mrow><mml:mstyle scriptlevel="0"><mml:mrow><mml:mo maxsize="2.047em" minsize="2.047em">(</mml:mo></mml:mrow></mml:mstyle><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>&#x03C4;</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>l</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mrow></mml:munderover><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>&#x03BB;</mml:mi><mml:mo>=</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mi>v</mml:mi></mml:mrow><mml:mrow><mml:mi>v</mml:mi></mml:mrow></mml:munderover><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>&#x03C1;</mml:mi><mml:mo>=</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03B3;</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x03B3;</mml:mi></mml:mrow></mml:munderover><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>&#x03C3;</mml:mi><mml:mo>=</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03B4;</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x03B4;</mml:mi></mml:mrow></mml:munderover><mml:msubsup><mml:mi>w</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x03C4;</mml:mi></mml:mrow><mml:mrow><mml:mi>&#x03C3;</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x03C1;</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x03BB;</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x00D7;</mml:mo><mml:msubsup><mml:mi>v</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mi>&#x03C4;</mml:mi></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>+</mml:mo><mml:mi>&#x03C3;</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi><mml:mo>+</mml:mo><mml:mi>&#x03C1;</mml:mi><mml:mo>,</mml:mo><mml:mi>z</mml:mi><mml:mo>+</mml:mo><mml:mi>&#x03BB;</mml:mi></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mstyle scriptlevel="0"><mml:mrow><mml:mo maxsize="2.047em" minsize="2.047em">)</mml:mo></mml:mrow></mml:mstyle></mml:math></disp-formula>where all parameters are as defined in <xref ref-type="disp-formula" rid="eqn-5">Eq. (5)</xref>, except <inline-formula id="ieqn-40"><mml:math id="mml-ieqn-40"><mml:mn>2</mml:mn><mml:mi>v</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula>, which is the depth of the 3D kernel along the spectral dimension.</p>
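<p>As a minimal illustration of Eqs. (5) and (6), the NumPy sketch below evaluates the 3D convolution at every valid location for one output feature map. The function name <monospace>conv3d_activation</monospace>, the array layout, the absence of padding, and the default ReLU activation are assumptions made for illustration only; they are not taken from the models evaluated in this work.</p>
<code language="python"><![CDATA[
```python
import numpy as np


def conv3d_activation(prev_maps, kernel, bias, activation=None):
    """Evaluate the 3D convolution of Eq. (6) for one output feature map.

    prev_maps: array (d_prev, X, Y, Z), feature maps of layer i-1
    kernel:    array (d_prev, kx, ky, kz), weights for output map j
               (kx, ky, kz play the role of 2*delta+1, 2*gamma+1, 2*v+1)
    bias:      scalar bias b_{i,j}
    """
    if activation is None:
        # Assumption: ReLU stands in for the generic activation function.
        activation = lambda a: np.maximum(a, 0.0)

    _, X, Y, Z = prev_maps.shape
    _, kx, ky, kz = kernel.shape
    # No padding: only locations where the kernel fits entirely are computed,
    # so output index (x, y, z) corresponds to the kernel's corner position.
    out = np.empty((X - kx + 1, Y - ky + 1, Z - kz + 1))
    for x in range(out.shape[0]):
        for y in range(out.shape[1]):
            for z in range(out.shape[2]):
                patch = prev_maps[:, x:x + kx, y:y + ky, z:z + kz]
                # Sum over tau (input maps) and the three kernel offsets.
                out[x, y, z] = np.sum(kernel * patch) + bias
    return activation(out)
```
]]></code>
<p>With a spectral kernel depth of 1 and an input whose spectral extent is 1, the same sketch reduces to the 2D case of Eq. (5).</p>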
<p>For the Spatial-Spectral Transformer model [<xref ref-type="bibr" rid="ref-44">44</xref>&#x2013;<xref ref-type="bibr" rid="ref-46">46</xref>], consider <inline-formula id="ieqn-41"><mml:math id="mml-ieqn-41"><mml:msub><mml:mi>f</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow><mml:mrow><mml:mi>N</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>D</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> as the input tensor fed into the Transformer, where <italic>N</italic> is the number of patches and <italic>D</italic> is the dimensionality of each patch after convolutional processing. A positional encoding is fused with the input embeddings, enriching the model with spatial arrangement details. The core of the Transformer is the encoder, which comprises multiple layers, each containing a multi-head attention mechanism and a feed-forward network. The attention mechanism enables the model to capture intricate relationships between patches. More specifically, for a given input <inline-formula id="ieqn-42"><mml:math id="mml-ieqn-42"><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, each layer of the Transformer encoder comprises layer normalization, multi-head self-attention, and a Multi-layer Perceptron (MLP).</p>
<p>Given a query matrix <inline-formula id="ieqn-43"><mml:math id="mml-ieqn-43"><mml:mi>Q</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x211B;</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>n</mml:mi><mml:mi>q</mml:mi></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mi>d</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula>, key matrix <inline-formula id="ieqn-44"><mml:math id="mml-ieqn-44"><mml:mi>K</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x211B;</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>n</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mi>d</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula>, value matrix <inline-formula id="ieqn-45"><mml:math id="mml-ieqn-45"><mml:mi>V</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x211B;</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>n</mml:mi><mml:mi>v</mml:mi></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mi>d</mml:mi><mml:mi>v</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula>, and mask matrix <inline-formula id="ieqn-46"><mml:math id="mml-ieqn-46"><mml:mi>M</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x211B;</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>n</mml:mi><mml:mi>q</mml:mi></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mi>n</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula>, where <inline-formula id="ieqn-47"><mml:math 
id="mml-ieqn-47"><mml:msub><mml:mi>n</mml:mi><mml:mi>q</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>n</mml:mi><mml:mi>k</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>n</mml:mi><mml:mi>v</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>d</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:math></inline-formula>, and <inline-formula id="ieqn-48"><mml:math id="mml-ieqn-48"><mml:msub><mml:mi>d</mml:mi><mml:mi>v</mml:mi></mml:msub></mml:math></inline-formula> represent the number of queries, keys, and values, the dimensionality of keys/queries, and the dimensionality of values, respectively. Let us define the weight matrices for each head <inline-formula id="ieqn-49"><mml:math id="mml-ieqn-49"><mml:mi>i</mml:mi></mml:math></inline-formula> as <inline-formula id="ieqn-50"><mml:math id="mml-ieqn-50"><mml:msubsup><mml:mrow><mml:mi>&#x1D4B2;</mml:mi></mml:mrow><mml:mi>i</mml:mi><mml:mi>Q</mml:mi></mml:msubsup><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x211B;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>o</mml:mi><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mi>d</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:msup></mml:math></inline-formula>, <inline-formula id="ieqn-51"><mml:math id="mml-ieqn-51"><mml:msubsup><mml:mrow><mml:mi>&#x1D4B2;</mml:mi></mml:mrow><mml:mi>i</mml:mi><mml:mi>K</mml:mi></mml:msubsup><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x211B;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>o</mml:mi><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mi>d</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:msup></mml:math></inline-formula>, and <inline-formula id="ieqn-52"><mml:math 
id="mml-ieqn-52"><mml:msubsup><mml:mrow><mml:mi>&#x1D4B2;</mml:mi></mml:mrow><mml:mi>i</mml:mi><mml:mi>V</mml:mi></mml:msubsup><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x211B;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>o</mml:mi><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mi>d</mml:mi><mml:mi>v</mml:mi></mml:msub></mml:mrow></mml:msup></mml:math></inline-formula>. The linear transformations are applied as <inline-formula id="ieqn-53"><mml:math id="mml-ieqn-53"><mml:msub><mml:mi>Q</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi>Q</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x1D4B2;</mml:mi></mml:mrow><mml:mi>i</mml:mi><mml:mi>Q</mml:mi></mml:msubsup></mml:math></inline-formula>, <inline-formula id="ieqn-54"><mml:math id="mml-ieqn-54"><mml:msub><mml:mi>K</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi>K</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x1D4B2;</mml:mi></mml:mrow><mml:mi>i</mml:mi><mml:mi>K</mml:mi></mml:msubsup></mml:math></inline-formula>, and <inline-formula id="ieqn-55"><mml:math id="mml-ieqn-55"><mml:msub><mml:mi>V</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi>V</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x1D4B2;</mml:mi></mml:mrow><mml:mi>i</mml:mi><mml:mi>V</mml:mi></mml:msubsup></mml:math></inline-formula>. The attention scores are then computed for each head, and the Softmax is applied to obtain the weighted sum of the values, as shown in <xref ref-type="disp-formula" rid="eqn-7">Eq. (7)</xref>.
<disp-formula id="eqn-7"><label>(7)</label><mml:math id="mml-eqn-7" display="block"><mml:mi>A</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:msub><mml:mi>n</mml:mi><mml:mi>W</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mi>S</mml:mi><mml:mi>o</mml:mi><mml:mi>f</mml:mi><mml:mi>t</mml:mi><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi><mml:mstyle scriptlevel="0"><mml:mrow><mml:mo maxsize="2.047em" minsize="2.047em">(</mml:mo></mml:mrow></mml:mstyle><mml:mfrac><mml:mrow><mml:msub><mml:mi>Q</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:msubsup><mml:mi>K</mml:mi><mml:mi>i</mml:mi><mml:mi>T</mml:mi></mml:msubsup></mml:mrow><mml:msqrt><mml:msub><mml:mi>d</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:msqrt></mml:mfrac><mml:mo>+</mml:mo><mml:mi>M</mml:mi><mml:mstyle scriptlevel="0"><mml:mrow><mml:mo maxsize="2.047em" minsize="2.047em">)</mml:mo></mml:mrow></mml:mstyle><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mi>V</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></disp-formula></p>
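<p>The single-head computation of Eq. (7) can be sketched in NumPy as follows. This is an illustrative sketch rather than the implementation used in this work: the function names are hypothetical, the inputs are assumed to have <italic>d<sub>model</sub></italic> columns so that the products with the projection matrices are defined, the Softmax weights are applied to the projected values, and the mask is assumed to be additive (zero for allowed pairs, a large negative number for blocked pairs).</p>
<code language="python"><![CDATA[
```python
import numpy as np


def softmax(scores, axis=-1):
    """Numerically stable softmax along the given axis."""
    shifted = scores - scores.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def attention_head(Q, K, V, W_q, W_k, W_v, M):
    """One attention head: linear projections, scaled scores, masked softmax."""
    Q_i = Q @ W_q                      # (n_q, d_k)
    K_i = K @ W_k                      # (n_k, d_k)
    V_i = V @ W_v                      # (n_k, d_v); assumes n_v == n_k
    d_k = Q_i.shape[-1]
    # Scaled dot-product scores plus the additive mask M of shape (n_q, n_k).
    weights = softmax(Q_i @ K_i.T / np.sqrt(d_k) + M, axis=-1)
    # Weighted sum of the projected values.
    return weights @ V_i, weights
```
]]></code>
<p>Each row of the returned weight matrix sums to one, and a key whose mask entries are strongly negative receives a weight of approximately zero.</p>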
<p>The outputs from all heads are then concatenated as <inline-formula id="ieqn-56"><mml:math id="mml-ieqn-56"><mml:mrow><mml:mi>&#x0210B;</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mi>C</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mi>&#x0210B;</mml:mi></mml:mrow><mml:mn>1</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mi>&#x0210B;</mml:mi></mml:mrow><mml:mn>2</mml:mn></mml:msub><mml:mo>,</mml:mo><mml:mspace width="thinmathspace" /><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mspace width="thinmathspace" /><mml:msub><mml:mrow><mml:mi>&#x0210B;</mml:mi></mml:mrow><mml:mi>h</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>, and a final linear transformation is applied to obtain <inline-formula id="ieqn-57"><mml:math id="mml-ieqn-57"><mml:msubsup><mml:mrow><mml:mi>&#x0210B;</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mi>l</mml:mi></mml:msubsup></mml:math></inline-formula>. Given an input <inline-formula id="ieqn-58"><mml:math id="mml-ieqn-58"><mml:mi>x</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x211B;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>o</mml:mi><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msup></mml:math></inline-formula>, layer normalization is computed as shown in <xref ref-type="disp-formula" rid="eqn-8">Eq. (8)</xref>.
<disp-formula id="eqn-8"><label>(8)</label><mml:math id="mml-eqn-8" display="block"><mml:mi>N</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>m</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03BC;</mml:mi></mml:mrow><mml:msqrt><mml:msup><mml:mi>&#x03BD;</mml:mi><mml:mn>2</mml:mn></mml:msup><mml:mo>+</mml:mo><mml:mi>&#x03B7;</mml:mi></mml:msqrt></mml:mfrac><mml:mi>&#x03B3;</mml:mi><mml:mo>+</mml:mo><mml:mi>&#x03BB;</mml:mi></mml:math></disp-formula>where <inline-formula id="ieqn-59"><mml:math id="mml-ieqn-59"><mml:mi>&#x03BC;</mml:mi></mml:math></inline-formula> is the mean and <inline-formula id="ieqn-60"><mml:math id="mml-ieqn-60"><mml:msup><mml:mi>&#x03BD;</mml:mi><mml:mn>2</mml:mn></mml:msup></mml:math></inline-formula> is the variance of <inline-formula id="ieqn-61"><mml:math id="mml-ieqn-61"><mml:mi>x</mml:mi></mml:math></inline-formula>, <inline-formula id="ieqn-62"><mml:math id="mml-ieqn-62"><mml:mi>&#x03B7;</mml:mi></mml:math></inline-formula> is a small constant for numerical stability, and <inline-formula id="ieqn-63"><mml:math id="mml-ieqn-63"><mml:mi>&#x03B3;</mml:mi></mml:math></inline-formula> and <inline-formula id="ieqn-64"><mml:math id="mml-ieqn-64"><mml:mi>&#x03BB;</mml:mi></mml:math></inline-formula> are learnable parameters. The information is then processed by an MLP consisting of two linear transformations with a ReLU activation function, as shown in <xref ref-type="disp-formula" rid="eqn-9">Eq. (9)</xref>.
<disp-formula id="eqn-9"><label>(9)</label><mml:math id="mml-eqn-9" display="block"><mml:mi>M</mml:mi><mml:mi>L</mml:mi><mml:mi>P</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>L</mml:mi><mml:mi>U</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mi>x</mml:mi><mml:msub><mml:mrow><mml:mi>&#x1D4B2;</mml:mi></mml:mrow><mml:mn>1</mml:mn></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:msub><mml:mrow><mml:mi>&#x1D4B2;</mml:mi></mml:mrow><mml:mn>2</mml:mn></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>b</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:math></disp-formula>where <inline-formula id="ieqn-65"><mml:math id="mml-ieqn-65"><mml:msub><mml:mrow><mml:mi>&#x1D4B2;</mml:mi></mml:mrow><mml:mn>1</mml:mn></mml:msub><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x211B;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>o</mml:mi><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>f</mml:mi><mml:mi>f</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msup></mml:math></inline-formula>, <inline-formula id="ieqn-66"><mml:math id="mml-ieqn-66"><mml:msub><mml:mi>b</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x211B;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>f</mml:mi><mml:mi>f</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msup></mml:math></inline-formula>, <inline-formula id="ieqn-67"><mml:math id="mml-ieqn-67"><mml:msub><mml:mrow><mml:mi>&#x1D4B2;</mml:mi></mml:mrow><mml:mn>2</mml:mn></mml:msub><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x211B;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>f</mml:mi><mml:mi>f</mml:mi></mml:mrow></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>o</mml:mi><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msup></mml:math></inline-formula>, and <inline-formula id="ieqn-68"><mml:math id="mml-ieqn-68"><mml:msub><mml:mi>b</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x211B;</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>m</mml:mi><mml:mi>o</mml:mi><mml:mi>d</mml:mi><mml:mi>e</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:msup></mml:math></inline-formula>. The output of the multi-head self-attention <inline-formula id="ieqn-69"><mml:math id="mml-ieqn-69"><mml:msubsup><mml:mrow><mml:mi>&#x0210B;</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mi>l</mml:mi></mml:msubsup></mml:math></inline-formula> is added to the input and normalized, as presented in <xref ref-type="disp-formula" rid="eqn-10">Eq. (10)</xref>.
<disp-formula id="eqn-10"><label>(10)</label><mml:math id="mml-eqn-10" display="block"><mml:msubsup><mml:mrow><mml:mover><mml:mrow><mml:mi>&#x0210B;</mml:mi></mml:mrow><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mi>l</mml:mi></mml:msubsup><mml:mo>=</mml:mo><mml:mi>N</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>m</mml:mi><mml:mstyle scriptlevel="0"><mml:mrow><mml:mo maxsize="1.2em" minsize="1.2em">(</mml:mo></mml:mrow></mml:mstyle><mml:msubsup><mml:mrow><mml:mi>&#x0210B;</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mi>l</mml:mi></mml:msubsup><mml:mo>&#x2295;</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0210B;</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>l</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msup><mml:mstyle scriptlevel="0"><mml:mrow><mml:mo maxsize="1.2em" minsize="1.2em">)</mml:mo></mml:mrow></mml:mstyle></mml:math></disp-formula></p>
<p><xref ref-type="disp-formula" rid="eqn-10">Eq. (10)</xref> adds the attention output <inline-formula id="ieqn-70"><mml:math id="mml-ieqn-70"><mml:msubsup><mml:mrow><mml:mi>&#x0210B;</mml:mi></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mi>l</mml:mi></mml:msubsup></mml:math></inline-formula> to the previous layer&#x2019;s output <inline-formula id="ieqn-71"><mml:math id="mml-ieqn-71"><mml:msup><mml:mrow><mml:mi>&#x0210B;</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>l</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula> and normalizes the result. The normalized output is then passed through the MLP:
<disp-formula id="eqn-11"><label>(11)</label><mml:math id="mml-eqn-11" display="block"><mml:msubsup><mml:mrow><mml:mi>&#x0210B;</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mi>L</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mi>M</mml:mi><mml:mi>L</mml:mi><mml:mi>P</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msubsup><mml:mrow><mml:mover><mml:mrow><mml:mi>&#x0210B;</mml:mi></mml:mrow><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mi>l</mml:mi></mml:msubsup><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula></p>
<p><xref ref-type="disp-formula" rid="eqn-11">Eq. (11)</xref> passes the normalized attention output of the <italic>l</italic>-th layer through the MLP, which consists of two linear layers with a ReLU activation function applied between them. The output of the layer is then computed as:
<disp-formula id="eqn-12"><label>(12)</label><mml:math id="mml-eqn-12" display="block"><mml:msup><mml:mrow><mml:mi>&#x0210B;</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mi>N</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>m</mml:mi><mml:mstyle scriptlevel="0"><mml:mrow><mml:mo maxsize="1.2em" minsize="1.2em">(</mml:mo></mml:mrow></mml:mstyle><mml:msubsup><mml:mrow><mml:mi>&#x0210B;</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mi>L</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup><mml:mo>&#x2295;</mml:mo><mml:msubsup><mml:mrow><mml:mover><mml:mrow><mml:mi>&#x0210B;</mml:mi></mml:mrow><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>a</mml:mi><mml:mi>t</mml:mi><mml:mi>t</mml:mi></mml:mrow><mml:mi>l</mml:mi></mml:msubsup><mml:mstyle scriptlevel="0"><mml:mrow><mml:mo maxsize="1.2em" minsize="1.2em">)</mml:mo></mml:mrow></mml:mstyle></mml:math></disp-formula></p>
<p><xref ref-type="disp-formula" rid="eqn-12">Eq. (12)</xref> adds the MLP output <inline-formula id="ieqn-72"><mml:math id="mml-ieqn-72"><mml:msubsup><mml:mrow><mml:mi>&#x0210B;</mml:mi></mml:mrow><mml:mrow><mml:mi>M</mml:mi><mml:mi>L</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> to the normalized attention output and then normalizes the result once more. Afterward, <inline-formula id="ieqn-73"><mml:math id="mml-ieqn-73"><mml:msup><mml:mrow><mml:mi>&#x0210B;</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is flattened into <inline-formula id="ieqn-74"><mml:math id="mml-ieqn-74"><mml:mo stretchy="false">(</mml:mo><mml:msup><mml:mrow><mml:mi>&#x0210B;</mml:mi></mml:mrow><mml:mrow><mml:mi>l</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mi>b</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>h</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>w</mml:mi><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>, where <inline-formula id="ieqn-75"><mml:math id="mml-ieqn-75"><mml:mi>b</mml:mi></mml:math></inline-formula>, <inline-formula id="ieqn-76"><mml:math id="mml-ieqn-76"><mml:mi>h</mml:mi></mml:math></inline-formula>, and <inline-formula id="ieqn-77"><mml:math id="mml-ieqn-77"><mml:mi>w</mml:mi></mml:math></inline-formula> denote the batch size, height, and width, respectively. Finally, a Softmax function is applied to produce the predicted classification maps.</p>
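<p>The sequence of operations in Eqs. (8) to (12) can be summarized by the following NumPy sketch. It is a simplified illustration under stated assumptions: the attention output is taken as a given input, the combination operator in Eqs. (10) and (12) is read as element-wise addition (as the accompanying text describes), both normalizations share the same scale and shift parameters, and all function and argument names are hypothetical.</p>
<code language="python"><![CDATA[
```python
import numpy as np


def layer_norm(x, gamma=1.0, lam=0.0, eta=1e-5):
    """Eq. (8): normalize each row to zero mean and unit variance, then scale and shift."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eta) * gamma + lam


def mlp(x, W1, b1, W2, b2):
    """Eq. (9): two linear maps with a ReLU in between."""
    return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2


def encoder_layer(h_prev, h_att, W1, b1, W2, b2, gamma=1.0, lam=0.0):
    """Residual additions and normalizations of Eqs. (10)-(12).

    h_prev: output of the previous layer, shape (N, d_model)
    h_att:  multi-head attention output for this layer, shape (N, d_model)
    """
    h_att_hat = layer_norm(h_att + h_prev, gamma, lam)   # Eq. (10)
    h_mlp = mlp(h_att_hat, W1, b1, W2, b2)               # Eq. (11)
    return layer_norm(h_mlp + h_att_hat, gamma, lam)     # Eq. (12)
```
]]></code>
<p>With the default scale of 1 and shift of 0, every row of the layer output has approximately zero mean and unit variance, which is a convenient sanity check for such a sketch.</p>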
</sec>
<sec id="s3">
<label>3</label>
<title>Experimental Results and Discussion</title>
<sec id="s3_1">
<label>3.1</label>
<title>Experimental Datasets</title>
<p>To demonstrate the importance of the proposed disjoint sampling procedure in HSIC, the following datasets are used.</p>
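<p>The idea behind the disjoint splits reported for each dataset in this subsection can be illustrated with a minimal per-class sketch. It is not necessarily the exact procedure used in this work: the function name <monospace>disjoint_split</monospace>, the convention that label 0 marks unlabeled background, the random permutation, and the rounding of the split sizes are all assumptions made for illustration.</p>
<code language="python"><![CDATA[
```python
import numpy as np


def disjoint_split(labels, train_frac, val_frac, seed=0):
    """Split labeled pixel indices into disjoint train/val/test sets, class by class.

    labels: 2D integer array of ground-truth labels (0 assumed to be background)
    Returns three 1D arrays of flat pixel indices.
    """
    rng = np.random.default_rng(seed)
    flat = labels.ravel()
    train, val, test = [], [], []
    for c in np.unique(flat):
        if c == 0:  # assumption: label 0 is unlabeled background and is skipped
            continue
        idx = rng.permutation(np.flatnonzero(flat == c))
        n_tr = int(train_frac * idx.size)
        n_va = int(val_frac * idx.size)
        # Consecutive, non-overlapping slices guarantee that no pixel
        # index appears in more than one subset.
        train.append(idx[:n_tr])
        val.append(idx[n_tr:n_tr + n_va])
        test.append(idx[n_tr + n_va:])
    return tuple(np.concatenate(parts) for parts in (train, val, test))
```
]]></code>
<p>Because every labeled pixel index is assigned to exactly one slice, the three returned sets are pairwise disjoint and together cover all labeled pixels.</p>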
<p><bold>The University of Houston:</bold> The University of Houston dataset consists of 144 spectral bands spanning wavelengths from 380 to 1050 nm and covers a spatial region of <inline-formula id="ieqn-78"><mml:math id="mml-ieqn-78"><mml:mn>349</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>1905</mml:mn></mml:math></inline-formula> pixels at a resolution of 2.5 meters per pixel. The dataset contains 15 labeled classes of urban land use and land cover types. The disjoint train, validation, and test samples are presented in <xref ref-type="table" rid="table-1">Table 1</xref> and <xref ref-type="fig" rid="fig-2">Fig. 2</xref>.</p>
<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>University of Houston Dataset: Disjoint sets of training (Tr), validation (Va), and test (Te) samples used to train various SOTA models; their geographical locations (excluding background samples) are illustrated in <xref ref-type="fig" rid="fig-2">Fig. 2</xref></title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th>Class</th>
<th>Tr</th>
<th>Va</th>
<th>Te</th>
<th>Class</th>
<th>Tr</th>
<th>Va</th>
<th>Te</th>
</tr>
</thead>
<tbody>
<tr>
<td>Healthy grass</td>
<td>75</td>
<td>300</td>
<td>876</td>
<td>Road</td>
<td>75</td>
<td>300</td>
<td>877</td>
</tr>
<tr>
<td>Stressed grass</td>
<td>75</td>
<td>301</td>
<td>878</td>
<td>Highway</td>
<td>73</td>
<td>295</td>
<td>859</td>
</tr>
<tr>
<td>Synthetic grass</td>
<td>41</td>
<td>168</td>
<td>488</td>
<td>Railway</td>
<td>74</td>
<td>296</td>
<td>865</td>
</tr>
<tr>
<td>Trees</td>
<td>74</td>
<td>299</td>
<td>871</td>
<td>Parking lot 1</td>
<td>73</td>
<td>296</td>
<td>864</td>
</tr>
<tr>
<td>Soil</td>
<td>74</td>
<td>298</td>
<td>870</td>
<td>Parking lot 2</td>
<td>28</td>
<td>112</td>
<td>329</td>
</tr>
<tr>
<td>Water</td>
<td>19</td>
<td>78</td>
<td>228</td>
<td>Tennis court</td>
<td>25</td>
<td>103</td>
<td>300</td>
</tr>
<tr>
<td>Residential</td>
<td>76</td>
<td>304</td>
<td>888</td>
<td>Running track</td>
<td>39</td>
<td>159</td>
<td>462</td>
</tr>
<tr>
<td>Commercial</td>
<td>74</td>
<td>299</td>
<td>871</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
</tr>
</tbody>
</table>
</table-wrap><fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>The University of Houston Dataset: Geographical locations of the disjoint train, validation, and test samples presented in <xref ref-type="table" rid="table-1">Table 1</xref></title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_56318-fig-2.tif"/>
</fig>
<p><bold>Indian Pines:</bold> The Indian Pines dataset was collected by the Airborne Visible/Infrared Imaging Spectrometer (AVIRIS) over an agricultural site in Northwestern Indiana. It consists of <inline-formula id="ieqn-79"><mml:math id="mml-ieqn-79"><mml:mn>145</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>145</mml:mn></mml:math></inline-formula> pixels with spectral information across 224 narrow bands ranging from 0.4 to 2.5 micrometers. The major land cover types in the scene include agricultural land, forest, highways, a rail line, low-density housing, other built structures, and smaller roads. Because the image was acquired in June, crops such as corn and soybeans were in early stages of growth, with less than 5% coverage. The ground truth designates 16 classes, which are not all mutually exclusive. The number of bands was reduced to 200 by removing wavelengths associated with water absorption. The disjoint train, validation, and test samples are presented in <xref ref-type="table" rid="table-2">Table 2</xref> and <xref ref-type="fig" rid="fig-3">Fig. 3</xref>.</p>
<table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>Indian Pines Dataset: Disjoint sets of training (Tr), validation (Va), and test (Te) samples used to train various SOTA models; their geographical locations (excluding background samples) are illustrated in <xref ref-type="fig" rid="fig-3">Fig. 3</xref></title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th>Class</th>
<th>Tr</th>
<th>Va</th>
<th>Te</th>
<th>Class</th>
<th>Tr</th>
<th>Va</th>
<th>Te</th>
</tr>
</thead>
<tbody>
<tr>
<td>Alfalfa</td>
<td>6</td>
<td>7</td>
<td>33</td>
<td>Oats</td>
<td>3</td>
<td>3</td>
<td>14</td>
</tr>
<tr>
<td>Corn-notill</td>
<td>214</td>
<td>214</td>
<td>1000</td>
<td>Soybean-notill</td>
<td>145</td>
<td>146</td>
<td>681</td>
</tr>
<tr>
<td>Corn-mintill</td>
<td>124</td>
<td>125</td>
<td>581</td>
<td>Soybean-mintill</td>
<td>368</td>
<td>368</td>
<td>1719</td>
</tr>
<tr>
<td>Corn</td>
<td>35</td>
<td>36</td>
<td>166</td>
<td>Soybean-clean</td>
<td>88</td>
<td>89</td>
<td>416</td>
</tr>
<tr>
<td>Grass-pasture</td>
<td>72</td>
<td>72</td>
<td>339</td>
<td>Wheat</td>
<td>30</td>
<td>31</td>
<td>144</td>
</tr>
<tr>
<td>Grass-trees</td>
<td>109</td>
<td>110</td>
<td>511</td>
<td>Woods</td>
<td>189</td>
<td>190</td>
<td>886</td>
</tr>
<tr>
<td>Grass-mowed</td>
<td>4</td>
<td>4</td>
<td>20</td>
<td>Buildings</td>
<td>57</td>
<td>58</td>
<td>271</td>
</tr>
<tr>
<td>Hay-windrowed</td>
<td>71</td>
<td>72</td>
<td>335</td>
<td>Stone-Steel</td>
<td>13</td>
<td>14</td>
<td>66</td>
</tr>
</tbody>
</table>
</table-wrap><fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Indian Pines: Geographical locations of the disjoint train, validation, and test samples presented in <xref ref-type="table" rid="table-2">Table 2</xref></title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_56318-fig-3.tif"/>
</fig>
<p><bold>Pavia University:</bold> The Pavia University dataset was captured using the Reflective Optics System Imaging Spectrometer (ROSIS) and consists of an image with <inline-formula id="ieqn-80"><mml:math id="mml-ieqn-80"><mml:mn>610</mml:mn><mml:mspace width="thinmathspace" /><mml:mo>&#x00D7;</mml:mo><mml:mspace width="thinmathspace" /><mml:mn>340</mml:mn></mml:math></inline-formula> pixels and 115 spectral bands. It has 9 classes of urban materials (asphalt, meadows, gravel, trees, metal sheets, bare soil, bitumen, brick, and shadows), comprising 42,776 labeled samples in total. The disjoint train, validation, and test samples are presented in <xref ref-type="table" rid="table-3">Table 3</xref> and <xref ref-type="fig" rid="fig-4">Fig. 4</xref>.</p>
<table-wrap id="table-3">
<label>Table 3</label>
<caption>
<title>Pavia University Dataset: Disjoint sets of training (Tr), validation (Va), and test (Te) samples used to train the various SOTA models. Their geographical locations (excluding background samples) are illustrated in <xref ref-type="fig" rid="fig-4">Fig. 4</xref></title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th>Class</th>
<th>Tr</th>
<th>Va</th>
<th>Te</th>
<th>Class</th>
<th>Tr</th>
<th>Va</th>
<th>Te</th>
</tr>
</thead>
<tbody>
<tr>
<td>Asphalt</td>
<td>994</td>
<td>995</td>
<td>4642</td>
<td>Soil</td>
<td>754</td>
<td>754</td>
<td>3521</td>
</tr>
<tr>
<td>Meadows</td>
<td>2797</td>
<td>2797</td>
<td>13055</td>
<td>Bitumen</td>
<td>199</td>
<td>200</td>
<td>931</td>
</tr>
<tr>
<td>Gravel</td>
<td>314</td>
<td>315</td>
<td>1470</td>
<td>Bricks</td>
<td>552</td>
<td>552</td>
<td>2578</td>
</tr>
<tr>
<td>Trees</td>
<td>459</td>
<td>460</td>
<td>2145</td>
<td>Shadows</td>
<td>142</td>
<td>142</td>
<td>663</td>
</tr>
<tr>
<td>Painted</td>
<td>201</td>
<td>202</td>
<td>942</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
<td>&#x2013;</td>
</tr>
</tbody>
</table>
</table-wrap><fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>Pavia University Dataset: Geographical locations of the disjoint train, validation, and test samples presented in <xref ref-type="table" rid="table-3">Table 3</xref></title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_56318-fig-4.tif"/>
</fig>
<p><bold>Salinas:</bold> The Salinas dataset was collected using the 224-band AVIRIS sensor over Salinas Valley, California, and is characterized by a high spatial resolution of 3.7 m per pixel. The study area encompasses 512 lines by 217 samples, with the 20 bands obscured by water absorption removed. Land cover types within the dataset include vegetables, bare soils, and vineyard fields. The Salinas ground truth annotates 16 classes. The disjoint train, validation, and test samples are presented in <xref ref-type="table" rid="table-4">Table 4</xref> and <xref ref-type="fig" rid="fig-5">Fig. 5</xref>.</p>
<table-wrap id="table-4">
<label>Table 4</label>
<caption>
<title>Salinas Dataset: Disjoint sets of training (Tr), validation (Va), and test (Te) samples used to train the various SOTA models. Their geographical locations (excluding background samples) are illustrated in <xref ref-type="fig" rid="fig-5">Fig. 5</xref></title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th>Class</th>
<th>Tr</th>
<th>Va</th>
<th>Te</th>
<th>Class</th>
<th>Tr</th>
<th>Va</th>
<th>Te</th>
</tr>
</thead>
<tbody>
<tr>
<td>Weeds 1</td>
<td>301</td>
<td>301</td>
<td>1407</td>
<td>Soil vinyard develop</td>
<td>930</td>
<td>930</td>
<td>4343</td>
</tr>
<tr>
<td>Weeds 2</td>
<td>558</td>
<td>559</td>
<td>2609</td>
<td>Corn weeds</td>
<td>491</td>
<td>492</td>
<td>2295</td>
</tr>
<tr>
<td>Fallow</td>
<td>296</td>
<td>296</td>
<td>1384</td>
<td>Lettuce 4 wk</td>
<td>160</td>
<td>160</td>
<td>748</td>
</tr>
<tr>
<td>Fallow rough plow</td>
<td>209</td>
<td>209</td>
<td>976</td>
<td>Lettuce 5 wk</td>
<td>289</td>
<td>289</td>
<td>1349</td>
</tr>
<tr>
<td>Fallow smooth</td>
<td>401</td>
<td>402</td>
<td>1875</td>
<td>Lettuce 6 wk</td>
<td>137</td>
<td>137</td>
<td>642</td>
</tr>
<tr>
<td>Stubble</td>
<td>593</td>
<td>594</td>
<td>2772</td>
<td>Lettuce 7 wk</td>
<td>160</td>
<td>161</td>
<td>749</td>
</tr>
<tr>
<td>Celery</td>
<td>536</td>
<td>537</td>
<td>2506</td>
<td>Vinyard untrained</td>
<td>1090</td>
<td>1090</td>
<td>5088</td>
</tr>
<tr>
<td>Grapes untrained</td>
<td>1690</td>
<td>1691</td>
<td>7890</td>
<td>Vinyard trellis</td>
<td>271</td>
<td>271</td>
<td>1265</td>
</tr>
</tbody>
</table>
</table-wrap><fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>Salinas Dataset: Geographical locations of the disjoint train, validation, and test samples presented in <xref ref-type="table" rid="table-4">Table 4</xref></title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_56318-fig-5.tif"/>
</fig>
<p><bold>Botswana:</bold> The NASA EO-1 satellite acquired hyperspectral imagery of the Okavango Delta region in Botswana between 2001 and 2004. Its Hyperion sensor collects data at 30 m resolution across 242 bands covering 400&#x2013;2500 nm over a 7.7 km strip. The data analyzed here were acquired on 31 May 2001 and contain 14 land cover classes representing seasonal swamps, occasional swamps, and drier woodlands in the distal delta region. Preprocessing removed the uncalibrated and noisy bands covering water absorption, retaining 145 bands. The disjoint train, validation, and test samples are presented in <xref ref-type="table" rid="table-5">Table 5</xref> and <xref ref-type="fig" rid="fig-6">Fig. 6</xref>.</p>
<table-wrap id="table-5">
<label>Table 5</label>
<caption>
<title>Botswana Dataset: Disjoint sets of training (Tr), validation (Va), and test (Te) samples used to train the various SOTA models. Their geographical locations (excluding background samples) are illustrated in <xref ref-type="fig" rid="fig-6">Fig. 6</xref></title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th>Class</th>
<th>Tr</th>
<th>Va</th>
<th>Te</th>
<th>Class</th>
<th>Tr</th>
<th>Va</th>
<th>Te</th>
</tr>
</thead>
<tbody>
<tr>
<td>Water</td>
<td>40</td>
<td>41</td>
<td>189</td>
<td>Island interior</td>
<td>30</td>
<td>30</td>
<td>143</td>
</tr>
<tr>
<td>Hippo grass</td>
<td>15</td>
<td>15</td>
<td>71</td>
<td>Woodlands</td>
<td>47</td>
<td>47</td>
<td>220</td>
</tr>
<tr>
<td>Floodplain grasses 1</td>
<td>37</td>
<td>38</td>
<td>176</td>
<td>Acacia Shrublands</td>
<td>37</td>
<td>37</td>
<td>174</td>
</tr>
<tr>
<td>Floodplain grasses 2</td>
<td>32</td>
<td>32</td>
<td>151</td>
<td>Acacia Grasslands</td>
<td>45</td>
<td>46</td>
<td>214</td>
</tr>
<tr>
<td>Reeds 1</td>
<td>40</td>
<td>40</td>
<td>189</td>
<td>Short Mopane</td>
<td>27</td>
<td>27</td>
<td>127</td>
</tr>
<tr>
<td>Riparian</td>
<td>40</td>
<td>40</td>
<td>189</td>
<td>Mixed Mopane</td>
<td>40</td>
<td>40</td>
<td>188</td>
</tr>
<tr>
<td>Firescar 2</td>
<td>38</td>
<td>39</td>
<td>182</td>
<td>Exposed soils</td>
<td>14</td>
<td>14</td>
<td>67</td>
</tr>
</tbody>
</table>
</table-wrap><fig id="fig-6">
<label>Figure 6</label>
<caption>
<title>Botswana Dataset: Geographical locations of the disjoint train, validation, and test samples presented in <xref ref-type="table" rid="table-5">Table 5</xref></title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_56318-fig-6.tif"/>
</fig>
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Experimental Settings</title>
<p>This section presents comprehensive experimental settings for various deep learning models, including 3D CNN [<xref ref-type="bibr" rid="ref-20">20</xref>], Hybrid Inception Net [<xref ref-type="bibr" rid="ref-47">47</xref>], 3D Inception Net [<xref ref-type="bibr" rid="ref-48">48</xref>], 2D Inception Net [<xref ref-type="bibr" rid="ref-49">49</xref>], 2D CNN [<xref ref-type="bibr" rid="ref-50">50</xref>], Hybrid CNN [<xref ref-type="bibr" rid="ref-51">51</xref>], Attention Graph CNN [<xref ref-type="bibr" rid="ref-22">22</xref>], and Spatial-Spectral Transformer [<xref ref-type="bibr" rid="ref-24">24</xref>]. Prior to training, overlapping 3D patches are extracted using an <inline-formula id="ieqn-81"><mml:math id="mml-ieqn-81"><mml:mn>8</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>8</mml:mn></mml:math></inline-formula> window, as outlined in Algorithm 1. All models in this study are trained using the Adam optimizer with a learning rate of 0.0001, a decay rate of 1e-06, and a batch size of 56 for 50 epochs. The loss and accuracy trends for all the competing methods are presented in <xref ref-type="fig" rid="fig-7">Fig. 7</xref>.</p>
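<p>The overlapping patch extraction mentioned above can be sketched as follows. This is a minimal illustration and not a reproduction of Algorithm 1: the function name <monospace>extract_patches</monospace>, the zero padding, and the choice to place the reference pixel at offset window // 2 inside an even-sized window are all assumptions made for this example.</p>

```python
def extract_patches(cube, window=8):
    """Return one window x window x bands patch per pixel of a zero-padded cube.

    cube is a nested list indexed as cube[row][col][band].
    Assumption: the patch starts window // 2 pixels above and to the left of
    its reference pixel; Algorithm 1 in the paper may anchor it differently.
    """
    rows, cols, bands = len(cube), len(cube[0]), len(cube[0][0])
    before = window // 2            # padding added on the top and left
    after = window - 1 - before     # padding added on the bottom and right
    zero = [0.0] * bands            # shared zero spectrum, never mutated

    padded = []
    for _ in range(before):
        padded.append([zero] * (cols + window - 1))
    for r in range(rows):
        padded.append([zero] * before + cube[r] + [zero] * after)
    for _ in range(after):
        padded.append([zero] * (cols + window - 1))

    patches = []
    for r in range(rows):
        for c in range(cols):
            # Neighbouring pixels share most of their window, hence "overlapping".
            patch = [row[c:c + window] for row in padded[r:r + window]]
            patches.append(patch)
    return patches


# Toy cube: 3 rows, 4 columns, 2 bands.
cube = [[[float(10 * r + c), 0.0] for c in range(4)] for r in range(3)]
patches = extract_patches(cube, window=8)
print(len(patches), len(patches[0]), len(patches[0][0]), len(patches[0][0][0]))
```

<p>In practice only the patches whose reference pixel carries a class label would be kept, so that background pixels do not enter the training, validation, or test sets.</p>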
<fig id="fig-7">
<label>Figure 7</label>
<caption>
<title>Loss and accuracy trends of all the competing methods on the Indian Pines dataset</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_56318-fig-7.tif"/>
</fig>
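<p>The settings above do not state how the decay of 1e-06 is applied. One common reading, adopted here purely as an assumption, is the inverse-time schedule that older Keras optimizers applied through their decay argument, where the learning rate at a given update step equals the base rate divided by (1 + decay times step). The sketch below evaluates that schedule for a hypothetical training-set size; the sample count is illustrative and is not taken from the paper.</p>

```python
import math


def decayed_learning_rate(base_lr, decay, step):
    """Inverse-time decay: base_lr / (1 + decay * step)."""
    return base_lr / (1.0 + decay * step)


def total_steps(num_train_samples, batch_size, epochs):
    """Number of optimizer updates when every epoch visits all samples once."""
    return math.ceil(num_train_samples / batch_size) * epochs


# Hypothetical training-set size, used only for illustration.
steps = total_steps(num_train_samples=1500, batch_size=56, epochs=50)
final_lr = decayed_learning_rate(base_lr=1e-4, decay=1e-6, step=steps)
print(steps, final_lr)
```

<p>Under this reading, the learning rate changes by well under one percent over the 50 epochs, so training proceeds with an almost constant step size.</p>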
<p>All evaluations were conducted on Google Colab using a Jupyter Notebook. Colab is a cloud-hosted service and therefore requires a fast and stable internet connection. The experiments ran in a Python 3 notebook with a graphics processing unit (GPU), 25 GB of random access memory (RAM), and 358 GB of storage.</p>
<p>All the competing methods are tested with a patch size of <inline-formula id="ieqn-82"><mml:math id="mml-ieqn-82"><mml:mn>8</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>8</mml:mn></mml:math></inline-formula>, using the 15 most informative bands selected through principal component analysis. Each dataset is split into 70% test samples, with the remaining 30% divided equally into training and validation sets (15% each). All models are trained using the Adam optimizer with a learning rate of 0.0001 and a decay of <inline-formula id="ieqn-83"><mml:math id="mml-ieqn-83"><mml:mn>1</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:msup><mml:mn>10</mml:mn><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>6</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> over 50 epochs, with a batch size of 56.</p>
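<p>The 70%/15%/15% split can be checked against the per-class counts in the dataset tables. The sketch below assumes a particular rounding rule, which the text does not state: the test share is rounded up first, and the validation share is the rounded-up half of what remains. The class totals are the row sums of the tables (for example, 124 + 125 + 581 = 830 for Corn-mintill); the function name <monospace>split_counts</monospace> is hypothetical.</p>

```python
def split_counts(n_class):
    """Return (train, validation, test) counts for one class of n_class samples.

    Assumption: the test share is ceil(0.7 * n) and the validation share is
    ceil of half the remainder; the paper states only the percentages.
    """
    test = -(-7 * n_class // 10)        # integer ceiling of 0.7 * n_class
    remainder = n_class - test
    validation = -(-remainder // 2)     # integer ceiling of remainder / 2
    train = remainder - validation
    return train, validation, test


for name, total in [("Corn-mintill", 830), ("Asphalt", 6631), ("Water", 270)]:
    print(name, split_counts(total))
```

<p>With this rule the three example classes yield (124, 125, 581), (994, 995, 4642), and (40, 41, 189), matching the tabulated Tr, Va, and Te values for those rows.</p>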
<p>The 2D CNN model uses four convolutional layers with kernel sizes of <inline-formula id="ieqn-84"><mml:math id="mml-ieqn-84"><mml:mo stretchy="false">(</mml:mo><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>8</mml:mn><mml:mo>,</mml:mo><mml:mn>16</mml:mn><mml:mo>,</mml:mo><mml:mn>32</mml:mn><mml:mo>,</mml:mo><mml:mn>64</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> and same padding, with <inline-formula id="ieqn-85"><mml:math id="mml-ieqn-85"><mml:mo stretchy="false">(</mml:mo><mml:mn>8</mml:mn><mml:mo>,</mml:mo><mml:mn>16</mml:mn><mml:mo>,</mml:mo><mml:mn>32</mml:mn><mml:mo>,</mml:mo><mml:mn>64</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> filters, respectively. Following the convolutional layers, two dense layers are used with a dropout rate of 0.4. Finally, a Softmax classification layer is added, with the number of output units equal to the number of classes in the dataset. The 3D CNN model uses four convolutional layers with kernel sizes of <inline-formula id="ieqn-86"><mml:math id="mml-ieqn-86"><mml:mo stretchy="false">(</mml:mo><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>7</mml:mn><mml:mo>,</mml:mo><mml:mn>5</mml:mn><mml:mo>,</mml:mo><mml:mn>3</mml:mn><mml:mo>,</mml:mo><mml:mn>3</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> with <inline-formula id="ieqn-87"><mml:math id="mml-ieqn-87"><mml:mo stretchy="false">(</mml:mo><mml:mn>8</mml:mn><mml:mo>,</mml:mo><mml:mn>16</mml:mn><mml:mo>,</mml:mo><mml:mn>32</mml:mn><mml:mo>,</mml:mo><mml:mn>64</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> filters, respectively. 
Following the convolutional layers, two dense layers are used, and finally a Softmax classification layer is added, with the number of output units equal to the number of classes in the dataset. The Hybrid CNN model uses three 3D convolutional layers with kernel sizes of <inline-formula id="ieqn-88"><mml:math id="mml-ieqn-88"><mml:mo stretchy="false">(</mml:mo><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>7</mml:mn><mml:mo>,</mml:mo><mml:mn>5</mml:mn><mml:mo>,</mml:mo><mml:mn>3</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>, followed by a reshape layer that transforms the features into 2D so that spatial features can be learned using a <inline-formula id="ieqn-89"><mml:math id="mml-ieqn-89"><mml:mo stretchy="false">(</mml:mo><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>3</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> kernel with 64 filters. Following the convolutional layers, two dense layers are used with a dropout rate of 0.4. Finally, a Softmax classification layer is added, with the number of output units equal to the number of classes in the dataset.</p>
<p>The 2D Inception Net architecture consists of three blocks with the following configurations. In the first block, three 2D convolutional layers are used. The first layer employs a <inline-formula id="ieqn-90"><mml:math id="mml-ieqn-90"><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> kernel with 30 filters, the second layer uses a <inline-formula id="ieqn-91"><mml:math id="mml-ieqn-91"><mml:mo stretchy="false">(</mml:mo><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>3</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> kernel with 20 filters, and the third layer utilizes a <inline-formula id="ieqn-92"><mml:math id="mml-ieqn-92"><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> kernel with 10 filters. In the second block, three 2D convolutional layers are utilized. The first layer has a <inline-formula id="ieqn-93"><mml:math id="mml-ieqn-93"><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> kernel with 40 filters, the second layer employs a <inline-formula id="ieqn-94"><mml:math id="mml-ieqn-94"><mml:mo stretchy="false">(</mml:mo><mml:mn>5</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>5</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> kernel with 20 filters, and the third layer uses a <inline-formula id="ieqn-95"><mml:math id="mml-ieqn-95"><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> kernel with 10 filters. 
The third block begins with a 2D max pooling operation using a <inline-formula id="ieqn-96"><mml:math id="mml-ieqn-96"><mml:mo stretchy="false">(</mml:mo><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>3</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> kernel and the same padding. This is followed by two 2D convolutional layers with <inline-formula id="ieqn-97"><mml:math id="mml-ieqn-97"><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> kernels and the same padding. The filters for these layers are set to 20 and 10, respectively. Afterward, the outputs from all three blocks are concatenated, and a convolutional layer with a <inline-formula id="ieqn-98"><mml:math id="mml-ieqn-98"><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> kernel and 128 filters is applied. Following the convolutional layer, two dense layers are deployed. Finally, a classification layer with Softmax is added with the number of output units corresponding to the number of classes in the dataset.</p>
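<p>The channel bookkeeping of the 2D Inception Net described above can be worked through numerically. The sketch below rests on assumptions that the text does not confirm: the layers inside each block are applied sequentially, every block receives the 15-band patch directly, and each convolution has one bias per filter. The helper names are hypothetical.</p>

```python
def conv2d_params(kernel, in_channels, filters):
    """Weights plus biases of a 2D convolution with a square kernel."""
    return kernel * kernel * in_channels * filters + filters


def branch_params(in_channels, layers):
    """Sum parameters of sequential (kernel, filters) layers.

    Returns (total_parameters, output_channels).
    """
    total = 0
    channels = in_channels
    for kernel, filters in layers:
        total += conv2d_params(kernel, channels, filters)
        channels = filters
    return total, channels


BANDS = 15  # PCA bands fed to every block (from the experimental settings)
branches = [
    [(1, 30), (3, 20), (1, 10)],   # first block
    [(1, 40), (5, 20), (1, 10)],   # second block
    [(1, 20), (1, 10)],            # third block, after parameter-free max pooling
]
results = [branch_params(BANDS, layers) for layers in branches]
concat_channels = sum(channels for _, channels in results)
fusion = conv2d_params(1, concat_channels, 128)
print(results, concat_channels, fusion)
```

<p>Under these assumptions each block ends with 10 channels, so the concatenation carries 30 channels into the final 1 x 1 convolution with 128 filters. Because same padding keeps every feature map at 8 x 8, the three outputs can be concatenated along the channel axis without cropping.</p>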
<p>The 3D Inception Net architecture consists of three blocks with the following configurations. In the first block, three 3D convolutional layers are used. The first layer employs a <inline-formula id="ieqn-99"><mml:math id="mml-ieqn-99"><mml:mo stretchy="false">(</mml:mo><mml:mn>5</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>5</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>7</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> kernel with 30 filters, the second layer uses a <inline-formula id="ieqn-100"><mml:math id="mml-ieqn-100"><mml:mo stretchy="false">(</mml:mo><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>5</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> kernel with 20 filters, and the third layer utilizes a <inline-formula id="ieqn-101"><mml:math id="mml-ieqn-101"><mml:mo stretchy="false">(</mml:mo><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>3</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> kernel with 10 filters and the same padding in all three layers. In the second block, three 3D convolutional layers are utilized. 
The first layer has a <inline-formula id="ieqn-102"><mml:math id="mml-ieqn-102"><mml:mo stretchy="false">(</mml:mo><mml:mn>5</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>5</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>7</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> kernel with 40 filters, the second layer employs a <inline-formula id="ieqn-103"><mml:math id="mml-ieqn-103"><mml:mo stretchy="false">(</mml:mo><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>5</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> kernel with 20 filters, and the third layer uses a <inline-formula id="ieqn-104"><mml:math id="mml-ieqn-104"><mml:mo stretchy="false">(</mml:mo><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>3</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> kernel with 10 filters, with same padding in all three layers. The third block also consists of three 3D convolutional layers: the first uses a <inline-formula id="ieqn-105"><mml:math id="mml-ieqn-105"><mml:mo stretchy="false">(</mml:mo><mml:mn>5</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>5</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>7</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> kernel with 60 filters, the second a <inline-formula id="ieqn-106"><mml:math id="mml-ieqn-106"><mml:mo stretchy="false">(</mml:mo><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>5</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> kernel with 30 filters, and the third a <inline-formula id="ieqn-107"><mml:math id="mml-ieqn-107"><mml:mo stretchy="false">(</mml:mo><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>3</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> kernel with 10 filters, again with same padding in all three layers. 
Afterward, the outputs from all three blocks are concatenated, and a convolutional layer with a <inline-formula id="ieqn-108"><mml:math id="mml-ieqn-108"><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> kernel and 128 filters is applied. Following the convolutional layer, two dense layers are deployed with a dropout rate of 0.4. Finally, a Softmax classification layer is added, with the number of output units equal to the number of classes in the dataset.</p>
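<p>The role of same padding in the 3D Inception blocks can be illustrated by tracing the feature-map shape through one block. The sketch below assumes stride-1 convolutions and an input patch of 8 x 8 pixels with 15 bands; the function names are hypothetical, and the comparison with valid (no) padding is included only for contrast.</p>

```python
def conv_output_size(size, kernel, padding):
    """Output length along one axis for a stride-1 convolution."""
    if padding == "same":
        return size
    return size - kernel + 1  # "valid": no padding


def trace_branch(shape, kernels, padding):
    """Apply successive 3D kernels to (height, width, bands) and list every shape."""
    shapes = [shape]
    for kernel in kernels:
        shape = tuple(conv_output_size(s, k, padding) for s, k in zip(shape, kernel))
        shapes.append(shape)
    return shapes


patch = (8, 8, 15)                            # 8 x 8 window, 15 PCA bands
kernels = [(5, 5, 7), (3, 3, 5), (3, 3, 3)]   # kernels of one Inception block
same_shapes = trace_branch(patch, kernels, "same")
valid_shapes = trace_branch(patch, kernels, "valid")
print(same_shapes)
print(valid_shapes)
```

<p>With same padding every layer keeps the (8, 8, 15) shape, so the three block outputs align for concatenation. Without padding the spatial size would shrink from 8 to 4 to 2 and then to 0, meaning an 8 x 8 patch could not pass through the three kernels at all.</p>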
<p>The Hybrid Inception Net architecture consists of three blocks with the following configurations. In the first block, three 3D convolutional layers are used. The first layer has a <inline-formula id="ieqn-109"><mml:math id="mml-ieqn-109"><mml:mo stretchy="false">(</mml:mo><mml:mn>5</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>5</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>7</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> kernel with 30 filters, the second layer uses a <inline-formula id="ieqn-110"><mml:math id="mml-ieqn-110"><mml:mo stretchy="false">(</mml:mo><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>5</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> kernel with 20 filters, and the third layer employs a <inline-formula id="ieqn-111"><mml:math id="mml-ieqn-111"><mml:mo stretchy="false">(</mml:mo><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>3</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> kernel with 10 filters. The same padding is applied in all three layers. Following the convolutional layers, a reshape layer converts the features from 3D to 2D. Next, a 2D max-pooling layer with a <inline-formula id="ieqn-112"><mml:math id="mml-ieqn-112"><mml:mo stretchy="false">(</mml:mo><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>3</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> window is applied, followed by three 2D convolutional layers. These layers use <inline-formula id="ieqn-113"><mml:math id="mml-ieqn-113"><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> kernels with 16, 32, and 64 filters, respectively. The same padding is used for all three layers. 
The same configuration is repeated for the second and third blocks. In the second block, the 3D convolutional layers have 40, 20, and 10 filters and the 2D convolutional layers have 16, 32, and 64 filters. In the third block, the 3D convolutional layers have 60, 30, and 10 filters and the 2D convolutional layers again have 16, 32, and 64 filters. Afterward, the outputs from all three blocks are concatenated, and a convolutional layer with a <inline-formula id="ieqn-114"><mml:math id="mml-ieqn-114"><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> kernel and 128 filters is applied. Following the convolutional layer, two dense layers are deployed with a dropout rate of 0.4. Finally, a Softmax classification layer is added, with the number of output units equal to the number of classes in the dataset.</p>
<p>The Attention Graph CNN [<xref ref-type="bibr" rid="ref-22">22</xref>] and Spatial-Spectral Transformer [<xref ref-type="bibr" rid="ref-24">24</xref>] models are trained according to the settings specified in their respective papers. The Transformer model, in particular, is used without the wavelet transformation and consists of 4 layers with 8 heads to compute the final maps. A dropout rate of 0.1 is applied to the classification layers. For more detailed information, please refer to the original papers.</p>
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>Qualitative and Quantitative Results and Discussion</title>
<p>This section provides a detailed exploration of the experimental results in comparison to state-of-the-art (SOTA) works published in recent years. While many recent studies present extensive experimental outcomes to highlight the strengths and weaknesses of their approaches, the experimental results in the literature may follow diverse protocols. For instance, the training, validation, and test samples may be selected randomly, and the percentage distribution may be identical. However, the geographical locations of the samples can still differ from model to model, because the models may have been trained, validated, and tested at different times. Comparative models may also have been executed in multiple runs, either sequentially or in parallel, each drawing a new set of training, validation, and test samples with the same number or percentage. Consequently, to ensure a fair comparison between the works proposed in the literature and the current study, it is imperative to employ identical experimental settings and to run all models on the same set of training, validation, and test samples. This approach ensures a consistent and unbiased evaluation of the proposed methodologies against existing benchmarks.</p>
<p>A prevalent concern in much of the recent literature is the overlap between training and test samples. When training and validation samples are randomly selected, with or without considering the point mentioned earlier, the resulting split often contains overlapping samples. This biases the evaluation: the model has already encountered some of the test samples during training and validation, which inflates the accuracy metrics. To prevent this issue, this study ensures that, despite the random selection of samples, the intersection between the training, validation, and test samples is empty for all competing methods. This measure maintains the integrity of the model evaluation process and the reliability of the accuracy assessments.</p>
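<p>The empty-intersection requirement can be made concrete with a small sketch. It is an illustration under stated assumptions and not the authors' implementation: the function name <monospace>disjoint_split</monospace> is hypothetical, label 0 is assumed to mark background pixels, and the per-class rounding (test share rounded up, then half of the remainder rounded up for validation) is one possible choice.</p>

```python
import random


def disjoint_split(labels, seed=0):
    """Split sample indices per class into disjoint train/validation/test lists."""
    rng = random.Random(seed)
    by_class = {}
    for index, label in enumerate(labels):
        if label == 0:          # assumption: label 0 marks background pixels
            continue
        by_class.setdefault(label, []).append(index)

    train, val, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_test = -(-7 * len(indices) // 10)          # ceil of 70%
        n_val = -(-(len(indices) - n_test) // 2)     # ceil of half the remainder
        # Each index lands in exactly one slice, so the sets cannot overlap.
        test.extend(indices[:n_test])
        val.extend(indices[n_test:n_test + n_val])
        train.extend(indices[n_test + n_val:])
    return train, val, test


labels = [0, 1, 1, 2] * 50      # toy flattened label map
train, val, test = disjoint_split(labels)
assert not set(train).intersection(val)
assert not set(train).intersection(test)
assert not set(val).intersection(test)
print(len(train), len(val), len(test))
```

<p>Because the index lists are drawn once and then shared, every competing model can be trained, validated, and tested on exactly the same pixels.</p>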
<p>To ensure a robust and fair evaluation, each HSI dataset is first split into disjoint training, validation, and test sets, following the proposed method. Each model is then trained on the training set and tuned on the validation set to optimize performance. Subsequently, the models are evaluated on the disjoint test set and on the complete HSI dataset to assess their generalization capabilities. The experimental results demonstrate the effectiveness of the proposed method in improving the classification accuracy of HSIC as shown in <xref ref-type="table" rid="table-6">Tables 6</xref>&#x2013;<xref ref-type="table" rid="table-10">10</xref> and <xref ref-type="fig" rid="fig-8">Figs. 8</xref>&#x2013;<xref ref-type="fig" rid="fig-12">12</xref>. Among the deep learning models considered, 3D CNN [<xref ref-type="bibr" rid="ref-20">20</xref>] and Hybrid Inception Net [<xref ref-type="bibr" rid="ref-47">47</xref>] achieve the highest classification accuracy, indicating their suitability for HSIC. Additionally, the results highlight the importance of using a large and diverse training dataset to achieve optimal performance.</p>
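<p>The per-class figures reported for the validation, test, and complete HSI sets are read here as the percentage of each class's samples that are classified correctly. The text does not define the metric explicitly, so this reading is an assumption, and the function below is a hypothetical sketch of it on toy labels.</p>

```python
def per_class_accuracy(y_true, y_pred):
    """Percentage of correctly classified samples for each ground-truth class."""
    totals, correct = {}, {}
    for truth, pred in zip(y_true, y_pred):
        totals[truth] = totals.get(truth, 0) + 1
        if truth == pred:
            correct[truth] = correct.get(truth, 0) + 1
    return {c: 100.0 * correct.get(c, 0) / n for c, n in totals.items()}


# Toy example: class 1 has 7 samples (4 correct), class 2 has 3 samples (2 correct).
y_true = [1] * 7 + [2] * 3
y_pred = [1, 1, 1, 1, 2, 2, 2] + [2, 2, 1]
scores = per_class_accuracy(y_true, y_pred)
print({c: round(s, 2) for c, s in scores.items()})
```

<p>For classes with very few validation or test samples, a single misclassification shifts the score by many percentage points, which is worth keeping in mind when comparing the smallest classes across models.</p>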
<table-wrap id="table-6">
<label>Table 6</label>
<caption>
<title>Indian Pines Dataset: Per-class comparative results of various SOTA models on the disjoint validation (Va) and test (Te) sets. Results on the entire HSI dataset serving as the test set (HSI) are also presented. The comparative methods include 3D CNN [<xref ref-type="bibr" rid="ref-20">20</xref>], Hybrid Inception Net (Hybrid IN) [<xref ref-type="bibr" rid="ref-47">47</xref>], 3D Inception Net (3D IN) [<xref ref-type="bibr" rid="ref-48">48</xref>], 2D Inception Net (2D IN) [<xref ref-type="bibr" rid="ref-49">49</xref>], 2D CNN [<xref ref-type="bibr" rid="ref-50">50</xref>], Hybrid CNN [<xref ref-type="bibr" rid="ref-51">51</xref>], Attention Graph CNN (Attention GCN) [<xref ref-type="bibr" rid="ref-22">22</xref>], and Spatial-Spectral Transformer (SSViT) [<xref ref-type="bibr" rid="ref-24">24</xref>]. The geographical maps for each model for disjoint validation, test, and complete test are presented in <xref ref-type="fig" rid="fig-8">Fig. 8</xref></title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/> 
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Class</th>
<th align="center" colspan="3">2D CNN</th>
<th align="center" colspan="3">3D CNN</th>
<th align="center" colspan="3">Hybrid CNN</th>
<th align="center" colspan="3">2D IN</th>
<th align="center" colspan="3">3D IN</th>
<th align="center" colspan="3">Hybrid IN</th>
<th align="center" colspan="3">Attention GCN</th>
<th align="center" colspan="3">SSViT</th>
</tr>
<tr>
<th></th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
</tr>
</thead>
<tbody>
<tr>
<td>Alfalfa</td>
<td>57.14</td>
<td>69.69</td>
<td>99.87</td>
<td>85.71</td>
<td>75.75</td>
<td>99.92</td>
<td>100.00</td>
<td>96.97</td>
<td>99.99</td>
<td>100.00</td>
<td>93.94</td>
<td>99.98</td>
<td>100.00</td>
<td>90.91</td>
<td>99.97</td>
<td>100.00</td>
<td>93.94</td>
<td>99.98</td>
<td>71.43</td>
<td>60.61</td>
<td>99.86</td>
<td>100.00</td>
<td>90.91</td>
<td>99.97</td>
</tr>
<tr>
<td>Corn-notill</td>
<td>74.76</td>
<td>75.40</td>
<td>78.36</td>
<td>92.52</td>
<td>94.10</td>
<td>94.75</td>
<td>92.53</td>
<td>92.80</td>
<td>93.77</td>
<td>84.58</td>
<td>84.20</td>
<td>86.62</td>
<td>97.66</td>
<td>97.30</td>
<td>97.76</td>
<td>95.33</td>
<td>96.00</td>
<td>96.50</td>
<td>76.64</td>
<td>71.30</td>
<td>75.70</td>
<td>85.51</td>
<td>84.70</td>
<td>87.11</td>
</tr>
<tr>
<td>Corn-mintill</td>
<td>76.80</td>
<td>81.06</td>
<td>82.77</td>
<td>96.00</td>
<td>97.76</td>
<td>97.83</td>
<td>97.60</td>
<td>99.83</td>
<td>99.52</td>
<td>84.80</td>
<td>86.40</td>
<td>88.19</td>
<td>96.80</td>
<td>98.45</td>
<td>98.43</td>
<td>97.60</td>
<td>99.66</td>
<td>99.40</td>
<td>60.80</td>
<td>61.96</td>
<td>66.14</td>
<td>82.40</td>
<td>84.34</td>
<td>86.39</td>
</tr>
<tr>
<td>Corn</td>
<td>58.33</td>
<td>45.78</td>
<td>54.85</td>
<td>94.44</td>
<td>89.16</td>
<td>91.56</td>
<td>94.44</td>
<td>95.18</td>
<td>95.78</td>
<td>88.89</td>
<td>79.52</td>
<td>83.97</td>
<td>100.00</td>
<td>98.80</td>
<td>99.16</td>
<td>88.89</td>
<td>88.55</td>
<td>90.30</td>
<td>63.89</td>
<td>53.61</td>
<td>62.03</td>
<td>58.33</td>
<td>48.80</td>
<td>57.81</td>
</tr>
<tr>
<td>Grass-pasture</td>
<td>94.44</td>
<td>94.10</td>
<td>95.03</td>
<td>97.22</td>
<td>96.76</td>
<td>97.31</td>
<td>98.61</td>
<td>97.94</td>
<td>98.34</td>
<td>95.83</td>
<td>95.28</td>
<td>96.07</td>
<td>98.61</td>
<td>97.05</td>
<td>97.72</td>
<td>95.83</td>
<td>96.76</td>
<td>97.10</td>
<td>90.28</td>
<td>89.97</td>
<td>91.30</td>
<td>94.44</td>
<td>96.17</td>
<td>96.48</td>
</tr>
<tr>
<td>Grass-trees</td>
<td>100.00</td>
<td>99.41</td>
<td>99.58</td>
<td>99.09</td>
<td>99.41</td>
<td>99.45</td>
<td>100.00</td>
<td>99.80</td>
<td>99.86</td>
<td>98.18</td>
<td>99.41</td>
<td>99.32</td>
<td>100.00</td>
<td>99.80</td>
<td>99.86</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>98.18</td>
<td>98.63</td>
<td>98.77</td>
<td>99.09</td>
<td>99.22</td>
<td>99.32</td>
</tr>
<tr>
<td>Grass-mowed</td>
<td>0.00</td>
<td>0.00</td>
<td>7.14</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>75.00</td>
<td>70.00</td>
<td>75.00</td>
<td>75.00</td>
<td>70.00</td>
<td>75.00</td>
<td>25.00</td>
<td>20.00</td>
<td>25.00</td>
<td>25.00</td>
<td>5.00</td>
<td>14.29</td>
<td>75.00</td>
<td>75.00</td>
<td>78.57</td>
</tr>
<tr>
<td>Hay-windrowed</td>
<td>98.61</td>
<td>98.50</td>
<td>98.74</td>
<td>100.00</td>
<td>99.70</td>
<td>99.79</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>98.21</td>
<td>98.74</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>98.61</td>
<td>95.52</td>
<td>96.65</td>
<td>98.61</td>
<td>99.40</td>
<td>99.37</td>
</tr>
<tr>
<td>Oats</td>
<td>0.00</td>
<td>0.00</td>
<td>5.00</td>
<td>66.67</td>
<td>71.43</td>
<td>75.00</td>
<td>66.67</td>
<td>85.71</td>
<td>85.00</td>
<td>33.33</td>
<td>71.43</td>
<td>70.00</td>
<td>0.00</td>
<td>28.57</td>
<td>35.00</td>
<td>0.00</td>
<td>28.57</td>
<td>30.00</td>
<td>0.00</td>
<td>21.43</td>
<td>30.00</td>
<td>0.00</td>
<td>78.57</td>
<td>70.00</td>
</tr>
<tr>
<td>Soybean-notill</td>
<td>75.34</td>
<td>74.44</td>
<td>77.77</td>
<td>92.47</td>
<td>91.34</td>
<td>92.80</td>
<td>91.78</td>
<td>93.10</td>
<td>93.93</td>
<td>86.30</td>
<td>81.64</td>
<td>85.08</td>
<td>91.78</td>
<td>94.42</td>
<td>94.86</td>
<td>88.36</td>
<td>91.48</td>
<td>92.18</td>
<td>81.51</td>
<td>79.00</td>
<td>82.41</td>
<td>82.19</td>
<td>82.97</td>
<td>85.39</td>
</tr>
<tr>
<td>Soybean-mintill</td>
<td>85.05</td>
<td>88.24</td>
<td>89.16</td>
<td>97.83</td>
<td>97.27</td>
<td>97.76</td>
<td>99.73</td>
<td>99.71</td>
<td>99.76</td>
<td>89.40</td>
<td>92.15</td>
<td>92.91</td>
<td>98.10</td>
<td>98.49</td>
<td>98.66</td>
<td>98.91</td>
<td>98.72</td>
<td>98.94</td>
<td>86.14</td>
<td>87.73</td>
<td>89.33</td>
<td>91.58</td>
<td>92.03</td>
<td>93.16</td>
</tr>
<tr>
<td>Soybean-clean</td>
<td>58.42</td>
<td>63.22</td>
<td>67.11</td>
<td>92.13</td>
<td>96.88</td>
<td>96.63</td>
<td>93.26</td>
<td>96.88</td>
<td>96.80</td>
<td>91.01</td>
<td>90.87</td>
<td>92.24</td>
<td>89.89</td>
<td>97.12</td>
<td>96.46</td>
<td>97.75</td>
<td>96.88</td>
<td>97.47</td>
<td>50.56</td>
<td>49.76</td>
<td>56.16</td>
<td>79.78</td>
<td>80.53</td>
<td>83.31</td>
</tr>
<tr>
<td>Wheat</td>
<td>83.87</td>
<td>95.13</td>
<td>94.14</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>96.77</td>
<td>99.31</td>
<td>99.02</td>
<td>100.00</td>
<td>97.92</td>
<td>98.54</td>
<td>100.00</td>
<td>98.61</td>
<td>99.02</td>
<td>96.77</td>
<td>92.36</td>
<td>94.15</td>
<td>93.55</td>
<td>99.31</td>
<td>98.54</td>
</tr>
<tr>
<td>Woods</td>
<td>97.36</td>
<td>97.40</td>
<td>97.78</td>
<td>96.32</td>
<td>97.29</td>
<td>97.55</td>
<td>98.95</td>
<td>99.32</td>
<td>99.37</td>
<td>98.42</td>
<td>99.10</td>
<td>99.13</td>
<td>98.95</td>
<td>98.53</td>
<td>98.81</td>
<td>98.95</td>
<td>99.21</td>
<td>99.30</td>
<td>97.37</td>
<td>97.18</td>
<td>97.63</td>
<td>97.89</td>
<td>96.73</td>
<td>97.39</td>
</tr>
<tr>
<td>Buildings</td>
<td>75.86</td>
<td>81.91</td>
<td>83.67</td>
<td>100.00</td>
<td>97.79</td>
<td>98.45</td>
<td>100.00</td>
<td>98.89</td>
<td>99.22</td>
<td>86.21</td>
<td>92.25</td>
<td>92.49</td>
<td>100.00</td>
<td>98.15</td>
<td>98.70</td>
<td>100.00</td>
<td>97.42</td>
<td>98.19</td>
<td>77.58</td>
<td>75.65</td>
<td>79.53</td>
<td>93.10</td>
<td>94.83</td>
<td>95.34</td>
</tr>
<tr>
<td>Stone-Steel</td>
<td>50.00</td>
<td>83.33</td>
<td>79.56</td>
<td>92.86</td>
<td>96.97</td>
<td>96.77</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>92.86</td>
<td>98.48</td>
<td>97.85</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>78.57</td>
<td>83.33</td>
<td>84.95</td>
<td>92.86</td>
<td>100.00</td>
<td>98.92</td>
</tr>
<tr>
<td><bold>Kappa</bold></td>
<td>79.85</td>
<td>81.91</td>
<td>90.23</td>
<td>95.41</td>
<td>95.75</td>
<td>97.78</td>
<td>96.74</td>
<td>97.36</td>
<td>98.58</td>
<td>89.35</td>
<td>89.88</td>
<td>94.74</td>
<td>96.74</td>
<td>97.38</td>
<td>98.59</td>
<td>96.22</td>
<td>96.68</td>
<td>98.22</td>
<td>79.67</td>
<td>78.29</td>
<td>88.76</td>
<td>87.86</td>
<td>88.29</td>
<td>93.93</td>
</tr>
<tr>
<td><bold>OA</bold></td>
<td>82.33</td>
<td>84.17</td>
<td>93.10</td>
<td>95.97</td>
<td>96.27</td>
<td>98.43</td>
<td>97.14</td>
<td>97.69</td>
<td>99.00</td>
<td>90.64</td>
<td>91.13</td>
<td>96.29</td>
<td>97.14</td>
<td>97.70</td>
<td>99.00</td>
<td>96.69</td>
<td>97.09</td>
<td>98.74</td>
<td>82.20</td>
<td>81.06</td>
<td>92.08</td>
<td>89.34</td>
<td>89.74</td>
<td>95.71</td>
</tr>
<tr>
<td><bold>AA</bold></td>
<td>67.88</td>
<td>71.73</td>
<td>75.66</td>
<td>93.95</td>
<td>93.85</td>
<td>95.97</td>
<td>95.85</td>
<td>97.26</td>
<td>97.58</td>
<td>87.60</td>
<td>89.51</td>
<td>91.04</td>
<td>90.42</td>
<td>91.59</td>
<td>93.06</td>
<td>87.86</td>
<td>87.86</td>
<td>88.96</td>
<td>72.08</td>
<td>70.19</td>
<td>76.18</td>
<td>87.72</td>
<td>87.72</td>
<td>89.19</td>
</tr>
<tr>
<td><bold>Time (s)</bold></td>
<td>1.54</td>
<td>1.68</td>
<td>8.74</td>
<td>0.45</td>
<td>1.39</td>
<td>9.80</td>
<td>0.53</td>
<td>0.81</td>
<td>8.64</td>
<td>0.85</td>
<td>1.39</td>
<td>10.96</td>
<td>1.48</td>
<td>3.14</td>
<td>14.84</td>
<td>0.89</td>
<td>2.68</td>
<td>12.75</td>
<td>1.18</td>
<td>1.38</td>
<td>10.27</td>
<td>1.28</td>
<td>2.24</td>
<td>14.72</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="table-7">
<label>Table 7</label>
<caption>
<title>Pavia University Dataset: Per-class comparative results of various SOTA models on the disjoint validation (Va) and test (Te) sets, together with results obtained when the entire HSI dataset serves as the test set (HSI). The compared methods are 3D CNN [<xref ref-type="bibr" rid="ref-20">20</xref>], Hybrid Inception Net (Hybrid IN) [<xref ref-type="bibr" rid="ref-47">47</xref>], 3D Inception Net (3D IN) [<xref ref-type="bibr" rid="ref-48">48</xref>], 2D Inception Net (2D IN) [<xref ref-type="bibr" rid="ref-49">49</xref>], 2D CNN [<xref ref-type="bibr" rid="ref-50">50</xref>], Hybrid CNN [<xref ref-type="bibr" rid="ref-51">51</xref>], Attention Graph CNN (Attention GCN) [<xref ref-type="bibr" rid="ref-22">22</xref>], and Spatial-Spectral Transformer (SSViT) [<xref ref-type="bibr" rid="ref-24">24</xref>]. The geographical maps produced by each model for the disjoint validation set, the disjoint test set, and the complete HSI test are presented in <xref ref-type="fig" rid="fig-9">Fig. 9</xref>.</title>
</caption>
<table frame="hsides">
<colgroup>
<col/> 
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Class</th>
<th align="center" colspan="3">2D CNN</th>
<th align="center" colspan="3">3D CNN</th>
<th align="center" colspan="3">Hybrid CNN</th>
<th align="center" colspan="3">2D IN</th>
<th align="center" colspan="3">3D IN</th>
<th align="center" colspan="3">Hybrid IN</th>
<th align="center" colspan="3">Attention GCN</th>
<th align="center" colspan="3">SSViT</th>
</tr>
<tr>
<th></th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
</tr>
</thead>
<tbody>
<tr>
<td>Asphalt</td>
<td>99.70</td>
<td>99.87</td>
<td>99.99</td>
<td>99.80</td>
<td>99.78</td>
<td>99.99</td>
<td>100.00</td>
<td>99.83</td>
<td>99.99</td>
<td>98.59</td>
<td>98.97</td>
<td>99.96</td>
<td>99.40</td>
<td>99.05</td>
<td>99.97</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>98.29</td>
<td>98.32</td>
<td>99.94</td>
<td>98.29</td>
<td>98.15</td>
<td>99.94</td>
</tr>
<tr>
<td>Meadows</td>
<td>99.96</td>
<td>99.95</td>
<td>99.96</td>
<td>99.96</td>
<td>100.00</td>
<td>99.99</td>
<td>99.96</td>
<td>99.99</td>
<td>99.99</td>
<td>99.89</td>
<td>99.96</td>
<td>99.96</td>
<td>99.96</td>
<td>100.00</td>
<td>99.99</td>
<td>99.89</td>
<td>99.91</td>
<td>99.92</td>
<td>99.82</td>
<td>99.90</td>
<td>99.89</td>
<td>100.00</td>
<td>98.91</td>
<td>99.94</td>
</tr>
<tr>
<td>Gravel</td>
<td>94.29</td>
<td>94.35</td>
<td>95.19</td>
<td>98.41</td>
<td>97.55</td>
<td>98.04</td>
<td>98.41</td>
<td>98.50</td>
<td>98.71</td>
<td>93.65</td>
<td>93.67</td>
<td>94.62</td>
<td>97.46</td>
<td>97.76</td>
<td>98.05</td>
<td>97.14</td>
<td>97.69</td>
<td>97.95</td>
<td>87.30</td>
<td>85.17</td>
<td>86.04</td>
<td>89.21</td>
<td>88.76</td>
<td>90.52</td>
</tr>
<tr>
<td>Trees</td>
<td>98.91</td>
<td>99.07</td>
<td>99.18</td>
<td>99.13</td>
<td>99.39</td>
<td>99.25</td>
<td>99.13</td>
<td>99.49</td>
<td>99.51</td>
<td>96.96</td>
<td>97.86</td>
<td>98.04</td>
<td>98.26</td>
<td>98.97</td>
<td>99.02</td>
<td>99.78</td>
<td>99.86</td>
<td>99.87</td>
<td>95.43</td>
<td>96.41</td>
<td>96.57</td>
<td>97.39</td>
<td>98.60</td>
<td>98.63</td>
</tr>
<tr>
<td>Painted</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>99.89</td>
<td>99.93</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
</tr>
<tr>
<td>Soil</td>
<td>99.87</td>
<td>99.20</td>
<td>99.42</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>99.34</td>
<td>98.64</td>
<td>98.95</td>
<td>100.00</td>
<td>99.94</td>
<td>99.96</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>99.43</td>
<td>99.60</td>
<td>99.73</td>
<td>99.03</td>
<td>99.28</td>
</tr>
<tr>
<td>Bitumen</td>
<td>98.00</td>
<td>98.82</td>
<td>98.87</td>
<td>99.50</td>
<td>99.36</td>
<td>99.47</td>
<td>99.50</td>
<td>99.79</td>
<td>99.77</td>
<td>96.50</td>
<td>97.96</td>
<td>98.05</td>
<td>99.50</td>
<td>99.79</td>
<td>99.77</td>
<td>100.00</td>
<td>99.68</td>
<td>99.77</td>
<td>94.00</td>
<td>93.34</td>
<td>94.29</td>
<td>97.00</td>
<td>96.24</td>
<td>96.92</td>
</tr>
<tr>
<td>Bricks</td>
<td>96.56</td>
<td>97.05</td>
<td>97.42</td>
<td>98.73</td>
<td>99.19</td>
<td>99.24</td>
<td>99.46</td>
<td>99.34</td>
<td>99.46</td>
<td>96.56</td>
<td>96.47</td>
<td>97.01</td>
<td>99.28</td>
<td>99.50</td>
<td>99.54</td>
<td>99.28</td>
<td>98.80</td>
<td>99.02</td>
<td>96.56</td>
<td>97.40</td>
<td>97.64</td>
<td>94.38</td>
<td>93.91</td>
<td>94.89</td>
</tr>
<tr>
<td>Shadows</td>
<td>100.00</td>
<td>98.79</td>
<td>99.16</td>
<td>99.30</td>
<td>99.40</td>
<td>99.47</td>
<td>99.30</td>
<td>99.85</td>
<td>99.79</td>
<td>100.00</td>
<td>98.49</td>
<td>98.94</td>
<td>100.00</td>
<td>99.85</td>
<td>99.94</td>
<td>100.00</td>
<td>99.70</td>
<td>99.79</td>
<td>99.30</td>
<td>99.25</td>
<td>99.37</td>
<td>99.30</td>
<td>99.55</td>
<td>99.58</td>
</tr>
<tr>
<td><bold>Kappa</bold></td>
<td>98.95</td>
<td>98.95</td>
<td>99.55</td>
<td>99.57</td>
<td>99.60</td>
<td>99.83</td>
<td>99.69</td>
<td>99.73</td>
<td>99.88</td>
<td>98.31</td>
<td>98.40</td>
<td>99.31</td>
<td>99.42</td>
<td>99.48</td>
<td>99.77</td>
<td>99.65</td>
<td>99.62</td>
<td>99.84</td>
<td>97.62</td>
<td>97.61</td>
<td>98.90</td>
<td>97.81</td>
<td>97.69</td>
<td>99.02</td>
</tr>
<tr>
<td><bold>OA</bold></td>
<td>99.21</td>
<td>99.21</td>
<td>99.86</td>
<td>99.67</td>
<td>99.70</td>
<td>99.95</td>
<td>99.77</td>
<td>99.79</td>
<td>99.96</td>
<td>98.72</td>
<td>98.79</td>
<td>99.79</td>
<td>99.56</td>
<td>99.61</td>
<td>99.93</td>
<td>99.74</td>
<td>99.72</td>
<td>99.95</td>
<td>98.21</td>
<td>98.20</td>
<td>99.66</td>
<td>98.39</td>
<td>98.26</td>
<td>99.70</td>
</tr>
<tr>
<td><bold>AA</bold></td>
<td>98.59</td>
<td>98.57</td>
<td>98.80</td>
<td>99.43</td>
<td>99.41</td>
<td>99.52</td>
<td>99.53</td>
<td>99.64</td>
<td>99.69</td>
<td>97.94</td>
<td>97.99</td>
<td>98.38</td>
<td>99.32</td>
<td>99.43</td>
<td>99.58</td>
<td>99.57</td>
<td>99.51</td>
<td>99.59</td>
<td>96.74</td>
<td>96.58</td>
<td>97.04</td>
<td>97.26</td>
<td>97.13</td>
<td>97.74</td>
</tr>
<tr>
<td><bold>Time (s)</bold></td>
<td>1.05</td>
<td>2.91</td>
<td>85.48</td>
<td>1.42</td>
<td>3.52</td>
<td>90.00</td>
<td>0.80</td>
<td>5.47</td>
<td>83.94</td>
<td>1.47</td>
<td>3.81</td>
<td>85.17</td>
<td>2.98</td>
<td>11.53</td>
<td>139.04</td>
<td>2.90</td>
<td>7.13</td>
<td>105.40</td>
<td>1.80</td>
<td>5.01</td>
<td>100.10</td>
<td>2.49</td>
<td>10.61</td>
<td>144.50</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="table-8">
<label>Table 8</label>
<caption>
<title>University of Houston Dataset: Per-class comparative results of various SOTA models on the disjoint validation (Va) and test (Te) sets, together with results obtained when the entire HSI dataset serves as the test set (HSI). The compared methods are 3D CNN [<xref ref-type="bibr" rid="ref-20">20</xref>], Hybrid Inception Net (Hybrid IN) [<xref ref-type="bibr" rid="ref-47">47</xref>], 3D Inception Net (3D IN) [<xref ref-type="bibr" rid="ref-48">48</xref>], 2D Inception Net (2D IN) [<xref ref-type="bibr" rid="ref-49">49</xref>], 2D CNN [<xref ref-type="bibr" rid="ref-50">50</xref>], Hybrid CNN [<xref ref-type="bibr" rid="ref-51">51</xref>], Attention Graph CNN (Attention GCN) [<xref ref-type="bibr" rid="ref-22">22</xref>], and Spatial-Spectral Transformer (SSViT) [<xref ref-type="bibr" rid="ref-24">24</xref>]. The geographical maps produced by each model for the disjoint validation set, the disjoint test set, and the complete HSI test are presented in <xref ref-type="fig" rid="fig-10">Fig. 10</xref>.</title>
</caption>
<table frame="hsides">
<colgroup>
<col/> 
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Class</th>
<th align="center" colspan="3">2D CNN</th>
<th align="center" colspan="3">3D CNN</th>
<th align="center" colspan="3">Hybrid CNN</th>
<th align="center" colspan="3">2D IN</th>
<th align="center" colspan="3">3D IN</th>
<th align="center" colspan="3">Hybrid IN</th>
<th align="center" colspan="3">Attention GCN</th>
<th align="center" colspan="3">SSViT</th>
</tr>
<tr>
<th></th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
</tr>
</thead>
<tbody>
<tr>
<td>Healthy grass</td>
<td>97.33</td>
<td>96.69</td>
<td>99.99</td>
<td>97.00</td>
<td>97.95</td>
<td>99.99</td>
<td>99.33</td>
<td>99.09</td>
<td>99.99</td>
<td>98.67</td>
<td>98.63</td>
<td>99.99</td>
<td>99.33</td>
<td>98.86</td>
<td>99.99</td>
<td>100.00</td>
<td>99.54</td>
<td>99.99</td>
<td>94.33</td>
<td>94.18</td>
<td>99.99</td>
<td>98.00</td>
<td>98.17</td>
<td>99.99</td>
</tr>
<tr>
<td>Stressed grass</td>
<td>99.67</td>
<td>99.20</td>
<td>99.36</td>
<td>99.67</td>
<td>98.86</td>
<td>99.12</td>
<td>99.67</td>
<td>98.97</td>
<td>99.20</td>
<td>99.67</td>
<td>98.97</td>
<td>99.20</td>
<td>99.67</td>
<td>99.32</td>
<td>99.44</td>
<td>99.34</td>
<td>98.41</td>
<td>98.72</td>
<td>98.67</td>
<td>98.41</td>
<td>98.56</td>
<td>99.34</td>
<td>98.75</td>
<td>98.96</td>
</tr>
<tr>
<td>Synthetic grass</td>
<td>98.81</td>
<td>99.39</td>
<td>99.28</td>
<td>95.83</td>
<td>96.31</td>
<td>96.41</td>
<td>97.02</td>
<td>97.95</td>
<td>97.85</td>
<td>95.24</td>
<td>92.83</td>
<td>93.83</td>
<td>98.21</td>
<td>98.16</td>
<td>98.28</td>
<td>98.81</td>
<td>98.57</td>
<td>98.71</td>
<td>97.62</td>
<td>97.54</td>
<td>97.70</td>
<td>97.02</td>
<td>96.11</td>
<td>96.56</td>
</tr>
<tr>
<td>Trees</td>
<td>100.00</td>
<td>99.77</td>
<td>99.84</td>
<td>98.33</td>
<td>98.85</td>
<td>98.79</td>
<td>98.99</td>
<td>99.54</td>
<td>99.44</td>
<td>99.33</td>
<td>98.74</td>
<td>98.95</td>
<td>99.67</td>
<td>99.89</td>
<td>99.84</td>
<td>99.33</td>
<td>99.66</td>
<td>99.60</td>
<td>90.30</td>
<td>89.44</td>
<td>90.03</td>
<td>97.32</td>
<td>97.93</td>
<td>97.91</td>
</tr>
<tr>
<td>Soil</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>99.66</td>
<td>100.00</td>
<td>99.92</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>99.89</td>
<td>99.92</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>99.66</td>
<td>99.77</td>
<td>99.76</td>
<td>100.00</td>
<td>99.89</td>
<td>99.92</td>
</tr>
<tr>
<td>Water</td>
<td>78.21</td>
<td>82.46</td>
<td>82.46</td>
<td>92.31</td>
<td>95.18</td>
<td>94.77</td>
<td>97.44</td>
<td>99.12</td>
<td>98.77</td>
<td>79.49</td>
<td>85.09</td>
<td>84.62</td>
<td>92.31</td>
<td>91.23</td>
<td>92.00</td>
<td>97.44</td>
<td>97.81</td>
<td>97.85</td>
<td>75.64</td>
<td>75.44</td>
<td>76.92</td>
<td>94.87</td>
<td>95.61</td>
<td>95.69</td>
</tr>
<tr>
<td>Residential</td>
<td>95.07</td>
<td>94.14</td>
<td>94.72</td>
<td>96.05</td>
<td>96.28</td>
<td>96.45</td>
<td>98.36</td>
<td>98.54</td>
<td>98.58</td>
<td>95.39</td>
<td>94.26</td>
<td>94.87</td>
<td>98.36</td>
<td>97.75</td>
<td>98.03</td>
<td>98.68</td>
<td>98.09</td>
<td>98.34</td>
<td>84.54</td>
<td>86.26</td>
<td>86.67</td>
<td>92.76</td>
<td>94.37</td>
<td>94.32</td>
</tr>
<tr>
<td>Commercial</td>
<td>85.28</td>
<td>85.19</td>
<td>85.85</td>
<td>97.66</td>
<td>97.70</td>
<td>97.83</td>
<td>97.66</td>
<td>97.82</td>
<td>97.91</td>
<td>99.33</td>
<td>95.64</td>
<td>96.78</td>
<td>94.98</td>
<td>97.82</td>
<td>97.27</td>
<td>96.32</td>
<td>97.36</td>
<td>97.27</td>
<td>81.94</td>
<td>84.27</td>
<td>84.65</td>
<td>93.98</td>
<td>96.90</td>
<td>96.38</td>
</tr>
<tr>
<td>Road</td>
<td>91.67</td>
<td>89.28</td>
<td>90.50</td>
<td>94.33</td>
<td>92.82</td>
<td>93.61</td>
<td>97.33</td>
<td>95.78</td>
<td>96.41</td>
<td>94.33</td>
<td>91.33</td>
<td>92.57</td>
<td>94.00</td>
<td>92.93</td>
<td>93.61</td>
<td>94.33</td>
<td>92.93</td>
<td>93.69</td>
<td>83.00</td>
<td>80.62</td>
<td>82.35</td>
<td>91.66</td>
<td>91.33</td>
<td>91.93</td>
</tr>
<tr>
<td>Highway</td>
<td>93.90</td>
<td>93.83</td>
<td>94.21</td>
<td>98.98</td>
<td>98.72</td>
<td>98.86</td>
<td>95.93</td>
<td>95.81</td>
<td>95.93</td>
<td>97.63</td>
<td>96.97</td>
<td>97.31</td>
<td>99.32</td>
<td>99.30</td>
<td>99.35</td>
<td>99.32</td>
<td>99.30</td>
<td>99.35</td>
<td>91.53</td>
<td>93.95</td>
<td>93.72</td>
<td>97.63</td>
<td>97.21</td>
<td>97.47</td>
</tr>
<tr>
<td>Railway</td>
<td>95.27</td>
<td>96.30</td>
<td>96.28</td>
<td>98.31</td>
<td>98.15</td>
<td>98.30</td>
<td>100.00</td>
<td>99.77</td>
<td>99.84</td>
<td>99.66</td>
<td>99.65</td>
<td>99.68</td>
<td>99.32</td>
<td>99.31</td>
<td>99.35</td>
<td>99.66</td>
<td>98.03</td>
<td>98.54</td>
<td>93.24</td>
<td>91.56</td>
<td>92.47</td>
<td>98.99</td>
<td>98.38</td>
<td>98.62</td>
</tr>
<tr>
<td>Parking lot 1</td>
<td>94.93</td>
<td>95.49</td>
<td>95.62</td>
<td>98.99</td>
<td>99.88</td>
<td>99.68</td>
<td>98.65</td>
<td>99.54</td>
<td>99.35</td>
<td>97.64</td>
<td>98.03</td>
<td>98.05</td>
<td>98.99</td>
<td>99.77</td>
<td>99.59</td>
<td>98.99</td>
<td>99.88</td>
<td>99.68</td>
<td>95.27</td>
<td>93.52</td>
<td>94.32</td>
<td>98.65</td>
<td>99.19</td>
<td>99.10</td>
</tr>
<tr>
<td>Parking lot 2</td>
<td>63.39</td>
<td>62.92</td>
<td>65.25</td>
<td>79.46</td>
<td>81.46</td>
<td>82.09</td>
<td>93.75</td>
<td>96.96</td>
<td>96.38</td>
<td>85.71</td>
<td>88.15</td>
<td>88.27</td>
<td>94.64</td>
<td>94.83</td>
<td>95.10</td>
<td>96.43</td>
<td>96.05</td>
<td>96.38</td>
<td>50.00</td>
<td>52.88</td>
<td>55.01</td>
<td>59.82</td>
<td>68.99</td>
<td>68.65</td>
</tr>
<tr>
<td>Tennis court</td>
<td>97.09</td>
<td>96.67</td>
<td>96.96</td>
<td>99.03</td>
<td>98.00</td>
<td>98.36</td>
<td>99.03</td>
<td>98.67</td>
<td>98.83</td>
<td>90.29</td>
<td>87.00</td>
<td>88.55</td>
<td>99.03</td>
<td>100.00</td>
<td>99.77</td>
<td>96.12</td>
<td>100.00</td>
<td>99.07</td>
<td>90.29</td>
<td>90.00</td>
<td>90.65</td>
<td>97.09</td>
<td>95.67</td>
<td>96.26</td>
</tr>
<tr>
<td>Running track</td>
<td>100.00</td>
<td>99.78</td>
<td>99.85</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>98.92</td>
<td>99.24</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>95.27</td>
<td>92.42</td>
<td>94.09</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
</tr>
<tr>
<td><bold>Kappa</bold></td>
<td>93.91</td>
<td>93.70</td>
<td>96.99</td>
<td>97.00</td>
<td>97.14</td>
<td>98.61</td>
<td>98.29</td>
<td>98.38</td>
<td>99.20</td>
<td>96.85</td>
<td>95.93</td>
<td>98.16</td>
<td>98.05</td>
<td>98.17</td>
<td>99.11</td>
<td>98.38</td>
<td>98.23</td>
<td>99.17</td>
<td>89.35</td>
<td>89.13</td>
<td>94.79</td>
<td>95.44</td>
<td>96.01</td>
<td>98.02</td>
</tr>
<tr>
<td><bold>OA</bold></td>
<td>94.37</td>
<td>94.18</td>
<td>99.88</td>
<td>97.23</td>
<td>97.36</td>
<td>99.94</td>
<td>98.42</td>
<td>98.50</td>
<td>99.97</td>
<td>97.09</td>
<td>96.24</td>
<td>99.92</td>
<td>98.20</td>
<td>98.31</td>
<td>99.96</td>
<td>98.50</td>
<td>98.37</td>
<td>99.97</td>
<td>90.16</td>
<td>89.96</td>
<td>99.79</td>
<td>95.79</td>
<td>96.31</td>
<td>99.92</td>
</tr>
<tr>
<td><bold>AA</bold></td>
<td>92.71</td>
<td>92.74</td>
<td>93.34</td>
<td>96.37</td>
<td>96.68</td>
<td>96.95</td>
<td>98.21</td>
<td>98.50</td>
<td>98.56</td>
<td>95.49</td>
<td>94.94</td>
<td>95.46</td>
<td>97.86</td>
<td>97.94</td>
<td>98.11</td>
<td>98.32</td>
<td>98.37</td>
<td>99.48</td>
<td>88.23</td>
<td>88.02</td>
<td>89.13</td>
<td>94.48</td>
<td>95.23</td>
<td>95.45</td>
</tr>
<tr>
<td><bold>Time (s)</bold></td>
<td>1.63</td>
<td>2.54</td>
<td>320.98</td>
<td>0.76</td>
<td>1.58</td>
<td>323.42</td>
<td>0.47</td>
<td>1.12</td>
<td>309.30</td>
<td>0.63</td>
<td>1.38</td>
<td>362.58</td>
<td>1.78</td>
<td>4.55</td>
<td>542.21</td>
<td>1.24</td>
<td>2.85</td>
<td>449.60</td>
<td>2.07</td>
<td>2.73</td>
<td>342.90</td>
<td>2.04</td>
<td>3.76</td>
<td>533.75</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="table-9">
<label>Table 9</label>
<caption>
<title>Botswana Dataset: Per-class comparative results of various SOTA models on the disjoint validation (Va) and test (Te) sets, together with results obtained when the entire HSI dataset serves as the test set (HSI). The compared methods are 3D CNN [<xref ref-type="bibr" rid="ref-20">20</xref>], Hybrid Inception Net (Hybrid IN) [<xref ref-type="bibr" rid="ref-47">47</xref>], 3D Inception Net (3D IN) [<xref ref-type="bibr" rid="ref-48">48</xref>], 2D Inception Net (2D IN) [<xref ref-type="bibr" rid="ref-49">49</xref>], 2D CNN [<xref ref-type="bibr" rid="ref-52">52</xref>], Hybrid CNN [<xref ref-type="bibr" rid="ref-51">51</xref>], Attention Graph CNN (Attention GCN) [<xref ref-type="bibr" rid="ref-22">22</xref>], and Spatial-Spectral Transformer (SSViT) [<xref ref-type="bibr" rid="ref-24">24</xref>]. The geographical maps produced by each model for the disjoint validation set, the disjoint test set, and the complete HSI test are presented in <xref ref-type="fig" rid="fig-11">Fig. 11</xref>.</title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/> 
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Class</th>
<th align="center" colspan="3">2D CNN</th>
<th align="center" colspan="3">3D CNN</th>
<th align="center" colspan="3">Hybrid CNN</th>
<th align="center" colspan="3">2D IN</th>
<th align="center" colspan="3">3D IN</th>
<th align="center" colspan="3">Hybrid IN</th>
<th align="center" colspan="3">Attention GCN</th>
<th align="center" colspan="3">SSViT</th>
</tr>
<tr>
<th></th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
</tr>
</thead>
<tbody>
<tr>
<td>Water</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>98.94</td>
<td>99.99</td>
<td>97.56</td>
<td>100.00</td>
<td>99.99</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>97.56</td>
<td>99.47</td>
<td>99.99</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
</tr>
<tr>
<td>Hippo grass</td>
<td>53.33</td>
<td>53.52</td>
<td>60.40</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>80.00</td>
<td>50.70</td>
<td>62.38</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
</tr>
<tr>
<td>Floodplain grasses 1</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>89.47</td>
<td>87.50</td>
<td>89.64</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
</tr>
<tr>
<td>Floodplain grasses 2</td>
<td>96.88</td>
<td>99.34</td>
<td>99.07</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>93.75</td>
<td>99.34</td>
<td>98.60</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
</tr>
<tr>
<td>Reeds 1</td>
<td>87.50</td>
<td>92.06</td>
<td>92.19</td>
<td>100.00</td>
<td>99.47</td>
<td>99.63</td>
<td>95.00</td>
<td>97.35</td>
<td>97.40</td>
<td>87.50</td>
<td>90.48</td>
<td>91.45</td>
<td>90.00</td>
<td>94.18</td>
<td>94.42</td>
<td>97.50</td>
<td>100.00</td>
<td>99.63</td>
<td>92.50</td>
<td>95.24</td>
<td>95.54</td>
<td>87.50</td>
<td>93.12</td>
<td>93.31</td>
</tr>
<tr>
<td>Riparian</td>
<td>82.50</td>
<td>82.54</td>
<td>85.13</td>
<td>95.00</td>
<td>94.71</td>
<td>95.54</td>
<td>87.50</td>
<td>92.06</td>
<td>92.57</td>
<td>90.00</td>
<td>95.24</td>
<td>95.17</td>
<td>95.00</td>
<td>95.24</td>
<td>95.91</td>
<td>90.00</td>
<td>94.71</td>
<td>94.80</td>
<td>87.50</td>
<td>86.77</td>
<td>88.85</td>
<td>92.50</td>
<td>94.71</td>
<td>95.17</td>
</tr>
<tr>
<td>Firescar 2</td>
<td>100.00</td>
<td>98.35</td>
<td>98.84</td>
<td>100.00</td>
<td>98.90</td>
<td>99.23</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
</tr>
<tr>
<td>Island interior</td>
<td>96.67</td>
<td>98.60</td>
<td>98.52</td>
<td>100.00</td>
<td>95.10</td>
<td>96.55</td>
<td>100.00</td>
<td>96.50</td>
<td>97.54</td>
<td>100.00</td>
<td>95.80</td>
<td>97.04</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>86.67</td>
<td>93.01</td>
<td>93.10</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
</tr>
<tr>
<td>Woodlands</td>
<td>82.98</td>
<td>85.00</td>
<td>86.94</td>
<td>97.87</td>
<td>96.36</td>
<td>97.13</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>95.45</td>
<td>96.82</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>98.64</td>
<td>99.04</td>
<td>100.00</td>
<td>97.73</td>
<td>98.41</td>
</tr>
<tr>
<td>Acacia Shrublands</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>97.30</td>
<td>88.51</td>
<td>91.53</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
</tr>
<tr>
<td>Acacia Grasslands</td>
<td>93.48</td>
<td>97.66</td>
<td>97.38</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>93.48</td>
<td>95.79</td>
<td>96.07</td>
<td>97.83</td>
<td>99.07</td>
<td>99.02</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>93.48</td>
<td>85.51</td>
<td>88.85</td>
<td>97.83</td>
<td>99.07</td>
<td>99.02</td>
</tr>
<tr>
<td>Short mopane</td>
<td>88.89</td>
<td>87.40</td>
<td>89.50</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>92.59</td>
<td>97.64</td>
<td>97.24</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>85.19</td>
<td>90.55</td>
<td>99.16</td>
<td>100.00</td>
<td>99.21</td>
<td>99.44</td>
</tr>
<tr>
<td>Mixed mopane</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>99.47</td>
<td>99.63</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>95.00</td>
<td>94.15</td>
<td>95.15</td>
<td>100.00</td>
<td>99.47</td>
<td>99.62</td>
</tr>
<tr>
<td>Exposed soils</td>
<td>92.86</td>
<td>89.55</td>
<td>91.58</td>
<td>92.86</td>
<td>97.01</td>
<td>96.84</td>
<td>100.00</td>
<td>100.00</td>
<td>100.00</td>
<td>92.86</td>
<td>92.54</td>
<td>93.68</td>
<td>92.86</td>
<td>95.52</td>
<td>95.79</td>
<td>92.86</td>
<td>97.01</td>
<td>96.84</td>
<td>71.43</td>
<td>70.15</td>
<td>74.74</td>
<td>92.86</td>
<td>97.01</td>
<td>96.84</td>
</tr>
<tr>
<td><bold>Kappa</bold></td>
<td>91.97</td>
<td>92.96</td>
<td>96.88</td>
<td>99.11</td>
<td>98.48</td>
<td>99.39</td>
<td>98.22</td>
<td>98.81</td>
<td>99.44</td>
<td>96.66</td>
<td>97.10</td>
<td>98.72</td>
<td>98.22</td>
<td>98.81</td>
<td>99.44</td>
<td>98.66</td>
<td>99.43</td>
<td>99.70</td>
<td>91.97</td>
<td>90.48</td>
<td>96.02</td>
<td>97.77</td>
<td>98.38</td>
<td>99.26</td>
</tr>
<tr>
<td><bold>OA</bold></td>
<td>92.59</td>
<td>93.51</td>
<td>99.95</td>
<td>99.18</td>
<td>98.60</td>
<td>99.99</td>
<td>98.35</td>
<td>98.90</td>
<td>99.99</td>
<td>96.91</td>
<td>97.32</td>
<td>99.98</td>
<td>98.35</td>
<td>98.90</td>
<td>99.99</td>
<td>98.77</td>
<td>99.47</td>
<td>99.99</td>
<td>92.59</td>
<td>91.23</td>
<td>99.94</td>
<td>97.94</td>
<td>98.51</td>
<td>99.98</td>
</tr>
<tr>
<td><bold>AA</bold></td>
<td>91.08</td>
<td>91.72</td>
<td>92.83</td>
<td>98.98</td>
<td>98.61</td>
<td>98.92</td>
<td>98.58</td>
<td>98.99</td>
<td>99.11</td>
<td>96.89</td>
<td>97.32</td>
<td>97.65</td>
<td>98.26</td>
<td>98.86</td>
<td>98.94</td>
<td>98.60</td>
<td>99.41</td>
<td>99.38</td>
<td>90.70</td>
<td>88.54</td>
<td>90.61</td>
<td>97.91</td>
<td>98.59</td>
<td>98.70</td>
</tr>
<tr>
<td><bold>Time (s)</bold></td>
<td>1.67</td>
<td>0.71</td>
<td>91.08</td>
<td>0.50</td>
<td>0.73</td>
<td>173.25</td>
<td>0.20</td>
<td>0.33</td>
<td>169.01</td>
<td>0.81</td>
<td>0.52</td>
<td>153.69</td>
<td>0.78</td>
<td>1.34</td>
<td>269.21</td>
<td>0.61</td>
<td>1.34</td>
<td>231.29</td>
<td>0.71</td>
<td>0.70</td>
<td>185.74</td>
<td>1.01</td>
<td>1.35</td>
<td>276.85</td>
</tr>
</tbody>
</table>
</table-wrap>
<table-wrap id="table-10">
<label>Table 10</label>
<caption>
<title>Salinas Dataset: Per class comparative results of various SOTA models are showcased on disjoint validation and test sets. Additionally, results on the entire HSI dataset serving as the test set are also presented. The comparative methods include 3D CNN [<xref ref-type="bibr" rid="ref-20">20</xref>], Hybrid Inception Net (Hybrid IN) [<xref ref-type="bibr" rid="ref-47">47</xref>], 3D Inception Net (3D IN) [<xref ref-type="bibr" rid="ref-48">48</xref>], 2D Inception Net (2D IN) [<xref ref-type="bibr" rid="ref-49">49</xref>], 2D CNN [<xref ref-type="bibr" rid="ref-50">50</xref>], Hybrid CNN [<xref ref-type="bibr" rid="ref-51">51</xref>], Attention Graph CNN (Attention GCN) [<xref ref-type="bibr" rid="ref-22">22</xref>], and Spatial-Spectral Transformer [<xref ref-type="bibr" rid="ref-24">24</xref>]. The geographical maps for each model for disjoint validation, test, and complete test are presented in <xref ref-type="fig" rid="fig-12">Fig. 12</xref></title>
</caption>
<table frame="hsides">
<colgroup>
<col align="left"/> 
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Class</th>
<th align="center" colspan="3">2D CNN</th>
<th align="center" colspan="3">3D CNN</th>
<th align="center" colspan="3">Hybrid CNN</th>
<th align="center" colspan="3">2D IN</th>
<th align="center" colspan="3">3D IN</th>
<th align="center" colspan="3">Hybrid IN</th>
<th align="center" colspan="3">Attention GCN</th>
<th align="center" colspan="3">SSViT</th>
</tr>
<tr>
<th></th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
<th>Va</th>
<th>Te</th>
<th>HSI</th>
</tr>
</thead>
<tbody>
<tr>
<td>Weeds 1</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
</tr>
<tr>
<td>Weeds 2</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>99.64</td>
<td>100</td>
<td>99.95</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
</tr>
<tr>
<td>Fallow</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>99.32</td>
<td>99.64</td>
<td>99.65</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>99.66</td>
<td>99.78</td>
<td>99.80</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>98.65</td>
<td>99.71</td>
<td>99.60</td>
<td>99.32</td>
<td>99.71</td>
<td>99.70</td>
</tr>
<tr>
<td>Fallow rough plow</td>
<td>100</td>
<td>99.90</td>
<td>99.93</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>99.80</td>
<td>99.86</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>99.52</td>
<td>100</td>
<td>99.93</td>
<td>98.56</td>
<td>99.28</td>
<td>99.21</td>
<td>99.52</td>
<td>99.90</td>
<td>99.86</td>
</tr>
<tr>
<td>Fallow smooth</td>
<td>97.26</td>
<td>98.19</td>
<td>98.32</td>
<td>99.25</td>
<td>99.20</td>
<td>99.33</td>
<td>98.76</td>
<td>99.20</td>
<td>99.07</td>
<td>97.26</td>
<td>97.55</td>
<td>97.87</td>
<td>99.75</td>
<td>100</td>
<td>99.96</td>
<td>99.75</td>
<td>99.95</td>
<td>99.93</td>
<td>98.01</td>
<td>97.87</td>
<td>98.21</td>
<td>98.76</td>
<td>98.51</td>
<td>98.77</td>
</tr>
<tr>
<td>Stubble</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
</tr>
<tr>
<td>Celery</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>99.81</td>
<td>99.80</td>
<td>99.83</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
</tr>
<tr>
<td>Grapes untrained</td>
<td>99.53</td>
<td>99.21</td>
<td>99.38</td>
<td>99.59</td>
<td>99.54</td>
<td>99.62</td>
<td>99.88</td>
<td>99.68</td>
<td>99.76</td>
<td>98.11</td>
<td>98.18</td>
<td>98.45</td>
<td>99.88</td>
<td>99.85</td>
<td>99.88</td>
<td>99.82</td>
<td>99.67</td>
<td>99.74</td>
<td>98.17</td>
<td>98.16</td>
<td>98.35</td>
<td>97.99</td>
<td>97.68</td>
<td>98.09</td>
</tr>
<tr>
<td>Soil vinyard develop</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>99.98</td>
<td>99.98</td>
<td>100</td>
<td>100</td>
<td>100</td>
</tr>
<tr>
<td>Corn weeds</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>99.96</td>
<td>99.97</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>99.91</td>
<td>99.94</td>
<td>100</td>
<td>100</td>
<td>100</td>
</tr>
<tr>
<td>Lettuce 4wk</td>
<td>99.38</td>
<td>99.73</td>
<td>99.72</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>99.86</td>
<td>99.91</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>98.75</td>
<td>98.80</td>
<td>98.88</td>
<td>100</td>
<td>100</td>
<td>100</td>
</tr>
<tr>
<td>Lettuce 5 wk</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
</tr>
<tr>
<td>Lettuce 6 wk</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>99.28</td>
<td>99.84</td>
<td>99.78</td>
<td>100</td>
<td>100</td>
<td>100</td>
</tr>
<tr>
<td>Lettuce 7 wk</td>
<td>99.38</td>
<td>100</td>
<td>99.91</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
</tr>
<tr>
<td>Vinyard untrained</td>
<td>97.25</td>
<td>96.91</td>
<td>97.37</td>
<td>99.82</td>
<td>99.84</td>
<td>99.86</td>
<td>99.91</td>
<td>99.86</td>
<td>99.89</td>
<td>95.41</td>
<td>95.48</td>
<td>96.15</td>
<td>99.72</td>
<td>99.88</td>
<td>99.88</td>
<td>100</td>
<td>99.72</td>
<td>99.81</td>
<td>95.60</td>
<td>94.99</td>
<td>95.46</td>
<td>97.80</td>
<td>96.76</td>
<td>97.40</td>
</tr>
<tr>
<td>Vinyard trellis</td>
<td>99.63</td>
<td>99.53</td>
<td>99.61</td>
<td>100</td>
<td>99.84</td>
<td>99.89</td>
<td>99.26</td>
<td>99.76</td>
<td>99.72</td>
<td>99.26</td>
<td>99.68</td>
<td>99.67</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>100</td>
<td>99.89</td>
<td>99.13</td>
<td>99.22</td>
<td>100</td>
<td>100</td>
<td>100</td>
</tr>
<tr>
<td><bold>Kappa</bold></td>
<td>99.29</td>
<td>99.23</td>
<td>99.59</td>
<td>99.78</td>
<td>99.81</td>
<td>99.89</td>
<td>99.86</td>
<td>99.85</td>
<td>99.92</td>
<td>98.67</td>
<td>98.72</td>
<td>99.31</td>
<td>99.92</td>
<td>99.94</td>
<td>99.97</td>
<td>99.93</td>
<td>99.88</td>
<td>99.94</td>
<td>98.63</td>
<td>98.60</td>
<td>99.20</td>
<td>99.09</td>
<td>98.88</td>
<td>99.42</td>
</tr>
<tr>
<td><bold>OA</bold></td>
<td>99.36</td>
<td>99.31</td>
<td>99.71</td>
<td>99.80</td>
<td>99.83</td>
<td>99.93</td>
<td>99.88</td>
<td>99.87</td>
<td>99.94</td>
<td>98.81</td>
<td>98.85</td>
<td>99.52</td>
<td>99.93</td>
<td>99.95</td>
<td>99.98</td>
<td>99.94</td>
<td>99.89</td>
<td>99.96</td>
<td>98.77</td>
<td>98.75</td>
<td>99.45</td>
<td>99.19</td>
<td>98.99</td>
<td>99.60</td>
</tr>
<tr>
<td><bold>AA</bold></td>
<td>99.53</td>
<td>99.59</td>
<td>99.64</td>
<td>99.85</td>
<td>99.88</td>
<td>99.89</td>
<td>99.86</td>
<td>99.91</td>
<td>99.90</td>
<td>99.35</td>
<td>99.38</td>
<td>99.47</td>
<td>99.96</td>
<td>99.98</td>
<td>99.98</td>
<td>99.94</td>
<td>99.96</td>
<td>99.96</td>
<td>99.12</td>
<td>99.23</td>
<td>99.29</td>
<td>99.59</td>
<td>99.53</td>
<td>99.61</td>
</tr>
<tr>
<td><bold>Time (s)</bold></td>
<td>1.23</td>
<td>5.56</td>
<td>44.41</td>
<td>1.45</td>
<td>5.56</td>
<td>47.61</td>
<td>0.82</td>
<td>5.56</td>
<td>47.67</td>
<td>2.84</td>
<td>5.67</td>
<td>43.78</td>
<td>5.33</td>
<td>14.52</td>
<td>72.51</td>
<td>2.85</td>
<td>9.15</td>
<td>61.40</td>
<td>3.07</td>
<td>5.93</td>
<td>49.99</td>
<td>2.87</td>
<td>20.92</td>
<td>78.67</td>
</tr>
</tbody>
</table>
</table-wrap>
<fig id="fig-8">
<label>Figure 8</label>
<caption>
<title>Indian Pines Dataset: Land cover maps for disjoint validation, test, and the entire HSI used as a test set are provided. Comprehensive class-wise results can be found in <xref ref-type="table" rid="table-6">Table 6</xref></title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_56318-fig-8.tif"/>
</fig><fig id="fig-9">
<label>Figure 9</label>
<caption>
<title>Pavia University Dataset: Land cover maps for disjoint validation, test, and the entire HSI used as a test set are provided. Comprehensive class-wise results can be found in <xref ref-type="table" rid="table-7">Table 7</xref></title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_56318-fig-9a.tif"/>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_56318-fig-9b.tif"/>
</fig><fig id="fig-10">
<label>Figure 10</label>
<caption>
<title>University of Houston Dataset: Land cover maps for disjoint validation, test, and the entire HSI used as a test set are provided. Comprehensive class-wise results can be found in <xref ref-type="table" rid="table-8">Table 8</xref></title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_56318-fig-10a.tif"/>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_56318-fig-10b.tif"/>
</fig><fig id="fig-11">
<label>Figure 11</label>
<caption>
<title>Botswana Dataset: Land cover maps for disjoint validation, test, and the entire HSI used as a test set are provided. Comprehensive class-wise results can be found in <xref ref-type="table" rid="table-9">Table 9</xref></title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_56318-fig-11a.tif"/>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_56318-fig-11b.tif"/>
</fig><fig id="fig-12">
<label>Figure 12</label>
<caption>
<title>Salinas Dataset: Land cover maps for disjoint validation, test, and the entire HSI used as a test set are provided. Comprehensive class-wise results can be found in <xref ref-type="table" rid="table-10">Table 10</xref></title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_56318-fig-12a.tif"/>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_56318-fig-12b.tif"/>
</fig>
<p>The comparative methods frequently misclassify samples with similar spatial structures, as exemplified by the confusion between the Meadows and Bare Soil classes in the Pavia University dataset, illustrated in <xref ref-type="fig" rid="fig-9">Fig. 9</xref>. For the same reason, the accuracy for the Grapes Untrained class of the Salinas dataset is lower than that of most other classes, as shown in <xref ref-type="table" rid="table-10">Table 10</xref>. In summary, higher accuracy is obtained when more labeled samples are evaluated (the complete HSI dataset used as the test set), as depicted in <xref ref-type="fig" rid="fig-8">Figs. 8</xref>&#x2013;<xref ref-type="fig" rid="fig-12">12</xref> and <xref ref-type="table" rid="table-6">Tables 6</xref>&#x2013;<xref ref-type="table" rid="table-10">10</xref>; however, this elevated accuracy comes with the drawbacks of bias, redundancy, and diminished generalization performance. <xref ref-type="table" rid="table-6">Tables 6</xref>&#x2013;<xref ref-type="table" rid="table-10">10</xref> also report the computational time required to process and evaluate the HSI datasets used in this study. As the tables show, the time generally increases with the number of samples, i.e., from disjoint validation to disjoint test to the complete HSI dataset used as the test set.</p>

</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Statistical Tests</title>
<p>Average, overall, and Kappa accuracy may not always be appropriate measures, especially when there are significant differences in the number of samples in each class within a dataset. To clarify this point, consider the following scenario. Suppose we have a dataset with 90 healthy (positive) individuals and 10 not-healthy (negative) individuals. If a conventional model correctly predicts 90% of individuals as healthy, it might still predict the not-healthy individuals as healthy. What would be the best accuracy in this scenario?</p>
<p>In this setting, a model that labels every individual as healthy produces 90 &#x201C;True Positive&#x201D;, 10 &#x201C;False Positive&#x201D;, 0 &#x201C;True Negative&#x201D;, and 0 &#x201C;False Negative&#x201D; predictions. Thus, the overall accuracy would be 90%, i.e., <inline-formula id="ieqn-115"><mml:math id="mml-ieqn-115"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mrow><mml:mn>90</mml:mn><mml:mo>+</mml:mo><mml:mn>0</mml:mn></mml:mrow><mml:mn>100</mml:mn></mml:mfrac></mml:mstyle><mml:mo>=</mml:mo><mml:mn>0.9</mml:mn></mml:math></inline-formula>. However, the model is highly biased since it predicts all the not-healthy individuals as healthy. In such scenarios, overall and average accuracies can be misleading or misinterpreted, indicating that these measures alone are not sufficient for evaluating a machine learning model. Therefore, it is important to consider additional statistical measures to validate the model beyond simple accuracy metrics.</p>
<p>Several statistical measures can be used to validate the results. For this work, we consider Recall (True Positive Rate or Sensitivity), Precision (Positive Predictive Value, PPV), and the F1 score (the harmonic mean of Precision and Recall). In an ideal scenario, PPV should be 1, which occurs when the numerator and denominator are equal, i.e., when True Positive (TP) equals TP &#x002B; False Positive (FP), making FP equal to 0. As FP increases, PPV decreases, indicating a less reliable model. The same holds for Recall, with False Negative (FN) taking the place of FP. Recall and PPV can be computed as follows:
<disp-formula id="eqn-13"><label>(13)</label><mml:math id="mml-eqn-13" display="block"><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>N</mml:mi></mml:mrow></mml:mfrac></mml:math></disp-formula>
<disp-formula id="eqn-14"><label>(14)</label><mml:math id="mml-eqn-14" display="block"><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi></mml:mrow><mml:mrow><mml:mi>T</mml:mi><mml:mi>P</mml:mi><mml:mo>+</mml:mo><mml:mi>F</mml:mi><mml:mi>P</mml:mi></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
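<p>As an illustrative sketch (not part of the reported experiments), <xref ref-type="disp-formula" rid="eqn-13">Eqs. (13)</xref> and <xref ref-type="disp-formula" rid="eqn-14">(14)</xref> can be applied to the 90/10 scenario introduced at the beginning of this section, assuming a model that labels all 100 individuals as healthy. Taking healthy as the class of interest gives TP &#x003D; 90, FP &#x003D; 10, and FN &#x003D; 0, so that
<disp-formula><mml:math display="block"><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mn>90</mml:mn><mml:mrow><mml:mn>90</mml:mn><mml:mo>+</mml:mo><mml:mn>0</mml:mn></mml:mrow></mml:mfrac><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mtext>&#x00A0;&#x00A0;</mml:mtext><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mn>90</mml:mn><mml:mrow><mml:mn>90</mml:mn><mml:mo>+</mml:mo><mml:mn>10</mml:mn></mml:mrow></mml:mfrac><mml:mo>=</mml:mo><mml:mn>0.9</mml:mn></mml:math></disp-formula></p>
<p>Taking not-healthy as the class of interest instead gives TP &#x003D; 0, FP &#x003D; 0, and FN &#x003D; 10, so that
<disp-formula><mml:math display="block"><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>l</mml:mi><mml:mo>=</mml:mo><mml:mfrac><mml:mn>0</mml:mn><mml:mrow><mml:mn>0</mml:mn><mml:mo>+</mml:mo><mml:mn>10</mml:mn></mml:mrow></mml:mfrac><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:math></disp-formula>
while Precision takes the undefined form 0/0 and is commonly reported as 0 by convention. Evaluating each class in turn therefore exposes the failure on the minority class that the 90% accuracy conceals.</p>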
<p>For a classification model to be effective, both Precision (PPV) and Recall need to be high, which means that both False Positives (FP) and False Negatives (FN) must be low. In addition to Recall and PPV, the F1 score should also be computed, as it combines the two into a single metric that balances them and offers deeper insight into the classifier&#x2019;s generalization performance. The F1 score can be calculated as follows:
<disp-formula id="eqn-15"><label>(15)</label><mml:math id="mml-eqn-15" display="block"><mml:mi>F</mml:mi><mml:mn>1</mml:mn><mml:mtext>&#x00A0;</mml:mtext><mml:mi>s</mml:mi><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mo>=</mml:mo><mml:mn>2</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mfrac><mml:mrow><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>l</mml:mi></mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>i</mml:mi><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mo>+</mml:mo><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>a</mml:mi><mml:mi>l</mml:mi><mml:mi>l</mml:mi></mml:mrow></mml:mfrac></mml:math></disp-formula></p>
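<p>As a worked illustration of <xref ref-type="disp-formula" rid="eqn-15">Eq. (15)</xref> under assumed values (hypothetical, not taken from the experiments), consider a class with Precision &#x003D; 0.9 and Recall &#x003D; 1, as would be obtained for the majority class when a model assigns every sample of a 90/10 dataset to that class:
<disp-formula><mml:math display="block"><mml:mi>F</mml:mi><mml:mn>1</mml:mn><mml:mtext>&#x00A0;</mml:mtext><mml:mi>s</mml:mi><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>e</mml:mi><mml:mo>=</mml:mo><mml:mn>2</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mfrac><mml:mrow><mml:mn>0.9</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>0.9</mml:mn><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:mfrac><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1.8</mml:mn><mml:mn>1.9</mml:mn></mml:mfrac><mml:mo>&#x2248;</mml:mo><mml:mn>0.947</mml:mn></mml:math></disp-formula>
Because the harmonic mean is pulled toward the smaller of its two arguments, a class with low Recall obtains a low F1 score even if its Precision is high, and vice versa.</p>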
<p>A model is considered effective if it achieves high values for PPV, Recall, and the F1 score. These metrics provide a more comprehensive evaluation of the model&#x2019;s performance compared to using accuracy alone. The detailed statistical results are presented in <xref ref-type="table" rid="table-11">Table 11</xref>.</p>
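<p><xref ref-type="table" rid="table-11">Table 11</xref> reports macro averages. Assuming the standard definition, i.e., the unweighted mean of the per-class values (the number of classes <italic>C</italic> and the class index <italic>c</italic> are notation introduced here for illustration only), the macro-averaged Precision can be written as
<disp-formula><mml:math display="block"><mml:msub><mml:mi>P</mml:mi><mml:mtext>macro</mml:mtext></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mi>C</mml:mi></mml:mfrac><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>c</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>C</mml:mi></mml:munderover><mml:msub><mml:mi>P</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:math></disp-formula>
where <italic>P<sub>c</sub></italic> is the Precision of class <italic>c</italic>. The macro-averaged Recall is obtained in the same way from the per-class Recall values. For the F1 score, a common convention, assumed here, is likewise to average the per-class F1 scores. Because every class carries the same weight regardless of its sample count, a poorly classified minority class lowers the macro average noticeably, whereas it may barely affect the overall accuracy.</p>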
<table-wrap id="table-11">
<label>Table 11</label>
<caption>
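<title>Macro-averaged statistical results for various SOTA models on the benchmark hyperspectral datasets over the disjoint test set. P &#x003D; Precision; R &#x003D; Recall; F1 &#x003D; F1-score; IP &#x003D; Indian Pines; PU &#x003D; Pavia University; SA &#x003D; Salinas; UH &#x003D; University of Houston; BS &#x003D; Botswana</title>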
</caption>
<table frame="hsides">
<colgroup>
<col/> 
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Data</th>
<th align="center" colspan="3">2D CNN</th>
<th align="center" colspan="3">3D CNN</th>
<th align="center" colspan="3">Hybrid CNN</th>
<th align="center" colspan="3">2D IN</th>
<th align="center" colspan="3">3D IN</th>
<th align="center" colspan="3">Hybrid IN</th>
<th align="center" colspan="3">Attention GCN</th>
<th align="center" colspan="3">SSViT</th>
</tr>
<tr>
<th></th>
<th>P</th>
<th>R</th>
<th>F1</th>
<th>P</th>
<th>R</th>
<th>F1</th>
<th>P</th>
<th>R</th>
<th>F1</th>
<th>P</th>
<th>R</th>
<th>F1</th>
<th>P</th>
<th>R</th>
<th>F1</th>
<th>P</th>
<th>R</th>
<th>F1</th>
<th>P</th>
<th>R</th>
<th>F1</th>
<th>P</th>
<th>R</th>
<th>F1</th>
</tr>
</thead>
<tbody>
<tr>
<td><bold>IP</bold></td>
<td>0.84</td>
<td>0.84</td>
<td>0.84</td>
<td>0.97</td>
<td>0.94</td>
<td>0.95</td>
<td>0.98</td>
<td>0.97</td>
<td>0.98</td>
<td>0.92</td>
<td>0.90</td>
<td>0.90</td>
<td>0.98</td>
<td>0.92</td>
<td>0.93</td>
<td>0.97</td>
<td>0.88</td>
<td>0.89</td>
<td>0.77</td>
<td>0.70</td>
<td>0.72</td>
<td>0.90</td>
<td>0.88</td>
<td>0.88</td>
</tr>
<tr>
<td><bold>PU</bold></td>
<td>0.99</td>
<td>0.99</td>
<td>0.99</td>
<td>1.00</td>
<td>0.99</td>
<td>1.00</td>
<td>1.00</td>
<td>1.00</td>
<td>1.00</td>
<td>0.98</td>
<td>0.99</td>
<td>0.99</td>
<td>1.00</td>
<td>0.99</td>
<td>0.99</td>
<td>1.00</td>
<td>1.00</td>
<td>1.00</td>
<td>0.98</td>
<td>0.97</td>
<td>0.97</td>
<td>0.98</td>
<td>0.97</td>
<td>0.97</td>
</tr>
<tr>
<td><bold>SA</bold></td>
<td>1.00</td>
<td>1.00</td>
<td>1.00</td>
<td>1.00</td>
<td>1.00</td>
<td>1.00</td>
<td>1.00</td>
<td>1.00</td>
<td>1.00</td>
<td>1.00</td>
<td>0.99</td>
<td>0.99</td>
<td>1.00</td>
<td>1.00</td>
<td>1.00</td>
<td>1.00</td>
<td>1.00</td>
<td>1.00</td>
<td>0.99</td>
<td>0.99</td>
<td>0.99</td>
<td>1.00</td>
<td>1.00</td>
<td>1.00</td>
</tr>
<tr>
<td><bold>UH</bold></td>
<td>0.95</td>
<td>0.93</td>
<td>0.94</td>
<td>0.98</td>
<td>0.97</td>
<td>0.97</td>
<td>0.99</td>
<td>0.99</td>
<td>0.99</td>
<td>0.97</td>
<td>0.95</td>
<td>0.96</td>
<td>0.98</td>
<td>0.98</td>
<td>0.98</td>
<td>0.98</td>
<td>0.98</td>
<td>0.98</td>
<td>0.91</td>
<td>0.88</td>
<td>0.89</td>
<td>0.97</td>
<td>0.95</td>
<td>0.96</td>
</tr>
<tr>
<td><bold>BS</bold></td>
<td>0.95</td>
<td>0.92</td>
<td>0.92</td>
<td>0.98</td>
<td>0.99</td>
<td>0.98</td>
<td>0.99</td>
<td>0.99</td>
<td>0.99</td>
<td>0.97</td>
<td>0.97</td>
<td>0.97</td>
<td>0.99</td>
<td>0.99</td>
<td>0.99</td>
<td>1.00</td>
<td>0.99</td>
<td>0.99</td>
<td>0.92</td>
<td>0.89</td>
<td>0.90</td>
<td>0.99</td>
<td>0.99</td>
<td>0.99</td>
</tr>
</table>
</table-wrap>
</sec>
<sec id="s5">
<label>5</label>
<title>Conclusion</title>
<p>This paper introduced a novel technique for generating disjoint train, validation, and test splits in Hyperspectral Image Classification (HSIC). By efficiently partitioning the ground truth data, the proposed technique ensured unbiased performance evaluations and facilitated reliable comparisons between classification models. It proved to be a valuable tool for creating disjoint splits, guaranteeing that the subsets were representative of the entire dataset and that the classification results were not skewed by data leakage. While the technique demonstrated significant advantages, limitations were acknowledged, and opportunities for further improvement were identified. Future research could investigate alternative data-splitting strategies that incorporate additional factors, such as class imbalance or spatial coherence, to further enhance the representativeness and generalizability of the subsets. Addressing these aspects could lead to the development of even more robust and effective data-splitting techniques for HSIC.</p>
</sec>
</body>
<back>
<ack><p>Not applicable.</p>
</ack>
<sec><title>Funding Statement</title>
<p>The authors extend their appreciation to the Researchers Supporting Project number (RSPD2024R848), King Saud University, Riyadh, Saudi Arabia.</p>
</sec>
<sec><title>Author Contributions</title>
<p>Study conception and design: Muhammad Ahmad, Manuel Mazzara, Salvatore Distefano; Analysis and interpretation of results: Muhammad Ahmad, Hamad Ahmed Altuwaijri, Adil Mehmood Khan; Manuscript preparation: Muhammad Ahmad, Manuel Mazzara, Salvatore Distefano, Adil Mehmood Khan, Hamad Ahmed Altuwaijri. All authors reviewed the results and approved the final version of the manuscript.</p>
</sec>
<sec sec-type="data-availability"><title>Availability of Data and Materials</title>
<p>The data used in this study is publicly available and can be accessed from the corresponding data repository <ext-link ext-link-type="uri" xlink:href="https://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes">https://www.ehu.eus/ccwintco/index.php/Hyperspectral_Remote_Sensing_Scenes</ext-link> (accessed on 2 January 2024). The source code for the experiments conducted in this paper can be accessed at <ext-link ext-link-type="uri" xlink:href="https://github.com/mahmad00/Disjoint-Sampling-for-Hyperspectral-Image-Classification">https://github.com/mahmad00/Disjoint-Sampling-for-Hyperspectral-Image-Classification</ext-link> (accessed on 2 May 2024).</p>
</sec>
<sec><title>Ethics Approval</title>
<p>Not applicable.</p>
</sec>
<sec sec-type="COI-statement"><title>Conflicts of Interest</title>
<p>The authors declare that they have no conflicts of interest to report regarding the present study.</p>
</sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Ahmad</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Hyperspectral image classification-traditional to deep models: A survey for future prospects</article-title>,&#x201D; <source>IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.</source>, vol. <volume>15</volume>, pp. <fpage>968</fpage>&#x2013;<lpage>999</lpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.1109/JSTARS.2021.3133021</pub-id>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Aksoy</surname></string-name>, <string-name><given-names>E.</given-names> <surname>Sertel</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Roscher</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Tanik</surname></string-name>, and <string-name><given-names>N.</given-names> <surname>Hamzehpour</surname></string-name></person-group>, &#x201C;<article-title>Assessment of soil salinity using explainable machine learning methods and Landsat 8 images</article-title>,&#x201D; <source>Int. J. Appl. Earth Obs. Geoinf.</source>, vol. <volume>130</volume>, <year>2024</year>, Art. no. <comment>103879</comment>. doi: <pub-id pub-id-type="doi">10.1016/j.jag.2024.103879</pub-id>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>V.</given-names> <surname>Lodhi</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Chakravarty</surname></string-name>, and <string-name><given-names>P.</given-names> <surname>Mitra</surname></string-name></person-group>, &#x201C;<article-title>Hyperspectral imaging for earth observation: Platforms and instruments</article-title>,&#x201D; <source>J. Indian Inst. Sci.</source>, vol. <volume>98</volume>, no. <issue>4</issue>, pp. <fpage>429</fpage>&#x2013;<lpage>443</lpage>, <year>2018</year>. doi: <pub-id pub-id-type="doi">10.1007/s41745-018-0070-8</pub-id>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Hong</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Yao</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Chanussot</surname></string-name></person-group>, &#x201C;<article-title>HD-Net: High-resolution decoupled network for building footprint extraction via deeply supervised body and boundary decomposition</article-title>,&#x201D; <source>ISPRS J. Photogramm. Remote Sens.</source>, vol. <volume>209</volume>, no. <issue>1</issue>, pp. <fpage>51</fpage>&#x2013;<lpage>65</lpage>, <year>2024</year>. doi: <pub-id pub-id-type="doi">10.1016/j.isprsjprs.2024.01.022</pub-id>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Jiang</surname></string-name>, <string-name><given-names>H.</given-names> <surname>van der Werff</surname></string-name>, <string-name><given-names>F.</given-names> <surname>van Ruitenbeek</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Dijkstra</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Lievens</surname></string-name> and <string-name><given-names>M.</given-names> <surname>van der Meijde</surname></string-name></person-group>, &#x201C;<article-title>The influence of changing moisture content on laboratory acquired spectral feature parameters and mineral classification</article-title>,&#x201D; <source>Int. J. Appl. Earth Obs. Geoinf.</source>, vol. <volume>130</volume>, <year>2024</year>, Art. no. <comment>103884</comment>. doi: <pub-id pub-id-type="doi">10.1016/j.jag.2024.103884</pub-id>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>B.</given-names> <surname>Lu</surname></string-name>, <string-name><given-names>P. D.</given-names> <surname>Dao</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>He</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Shang</surname></string-name></person-group>, &#x201C;<article-title>Recent advances of hyperspectral imaging technology and applications in agriculture</article-title>,&#x201D; <source>Remote Sens.</source>, vol. <volume>12</volume>, no. <issue>16</issue>, <year>2020</year>, Art. no. <comment>2659</comment>. doi: <pub-id pub-id-type="doi">10.3390/rs12162659</pub-id>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Ad&#x00E3;o</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Hyperspectral imaging: A review on UAV-based sensors, data processing and applications for agriculture and forestry</article-title>,&#x201D; <source>Remote Sens.</source>, vol. <volume>9</volume>, no. <issue>11</issue>, <year>2017</year>, Art. no. <comment>1110</comment>. doi: <pub-id pub-id-type="doi">10.3390/rs9111110</pub-id>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Chen</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>CDasXORNet: Change detection of buildings from bi-temporal remote sensing images as an XOR problem</article-title>,&#x201D; <source>Int. J. Appl. Earth Obs. Geoinf.</source>, vol. <volume>130</volume>, no. <issue>1</issue>, <year>2024</year>, Art. no. <comment>103836</comment>. doi: <pub-id pub-id-type="doi">10.1016/j.jag.2024.103836</pub-id>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Hong</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Yao</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Chanussot</surname></string-name></person-group>, &#x201C;<article-title>LRR-Net: An interpretable deep unfolding network for hyperspectral anomaly detection</article-title>,&#x201D; <source>IEEE Trans. Geosci. Remote Sens.</source>, vol. <volume>61</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>12</lpage>, <year>2023</year>. doi: <pub-id pub-id-type="doi">10.1109/TGRS.2023.3279834</pub-id>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>E.</given-names> <surname>Bedini</surname></string-name></person-group>, &#x201C;<article-title>The use of hyperspectral remote sensing for mineral exploration: A review</article-title>,&#x201D; <source>J. Hyperspectr. Remote Sens.</source>, vol. <volume>7</volume>, no. <issue>4</issue>, pp. <fpage>189</fpage>&#x2013;<lpage>211</lpage>, <year>2017</year>. doi: <pub-id pub-id-type="doi">10.29150/jhrs.v7.4.p189-211</pub-id>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Weber</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Hyperspectral imagery for environmental urban planning</article-title>,&#x201D; in <conf-name>IGARSS 2018&#x2014;2018 IEEE Int. Geosci. Remote Sens. Symp.</conf-name>, <publisher-name>IEEE</publisher-name>, <year>2018</year>, pp. <fpage>1628</fpage>&#x2013;<lpage>1631</lpage>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M. B.</given-names> <surname>Stuart</surname></string-name>, <string-name><given-names>A. J.</given-names> <surname>McGonigle</surname></string-name>, and <string-name><given-names>J. R.</given-names> <surname>Willmott</surname></string-name></person-group>, &#x201C;<article-title>Hyperspectral imaging in environmental monitoring: A review of recent developments and technological advances in compact field deployable systems</article-title>,&#x201D; <source>Sensors</source>, vol. <volume>19</volume>, no. <issue>14</issue>, <year>2019</year>, Art. no. <comment>3071</comment>. doi: <pub-id pub-id-type="doi">10.3390/s19143071</pub-id>; <pub-id pub-id-type="pmid">31336796</pub-id>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><given-names>C. B.</given-names> <surname>Pande</surname></string-name> and <string-name><given-names>K. N.</given-names> <surname>Moharir</surname></string-name></person-group>, &#x201C;<chapter-title>Application of hyperspectral remote sensing role in precision farming and sustainable agriculture under climate change: A review</chapter-title>,&#x201D; in <source>Climate Change Impacts on Natural Resources, Ecosystems and Agricultural Systems</source>, <publisher-name>Springer International Publishing</publisher-name>, <year>2023</year>, pp. <fpage>503</fpage>&#x2013;<lpage>520</lpage>. doi: <pub-id pub-id-type="doi">10.1007/978-3-031-19059-9_21</pub-id>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Han</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Huang</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Wang</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Li</surname></string-name></person-group>, &#x201C;<article-title>Trustworthy remote sensing interpretation: Concepts, technologies, and applications</article-title>,&#x201D; <source>ISPRS J. Photogramm. Remote Sens.</source>, vol. <volume>209</volume>, no. <issue>3</issue>, pp. <fpage>150</fpage>&#x2013;<lpage>172</lpage>, <year>2024</year>. doi: <pub-id pub-id-type="doi">10.1016/j.isprsjprs.2024.02.003</pub-id>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><given-names>S. S.</given-names> <surname>Deshpande</surname></string-name> and <string-name><given-names>A. B.</given-names> <surname>Inamdar</surname></string-name></person-group>, <source>Hyperspectral Remote Sensing in Urban Environments</source>. <publisher-loc>UK</publisher-loc>: <publisher-name>Routledge Taylor &#x0026; Francis Group</publisher-name>, <year>2023</year>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>P.</given-names> <surname>Sajadi</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Automated pixel purification for delineating pervious and impervious surfaces in a city using advanced hyperspectral imagery techniques</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>12</volume>, no. <issue>1</issue>, pp. <fpage>82560</fpage>&#x2013;<lpage>82583</lpage>, <year>2024</year>. doi: <pub-id pub-id-type="doi">10.1109/ACCESS.2024.3408805</pub-id>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Chauhan</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<chapter-title>Earth observation applications for urban mapping and monitoring: Research prospects, opportunities and challenges</chapter-title>,&#x201D; in <source>Earth Observation in Urban Monitoring, Earth Observation</source>, <person-group person-group-type="editor"><string-name><given-names>A.</given-names> <surname>Kumar</surname></string-name>, <string-name><given-names>P. K.</given-names> <surname>Srivastava</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Saikia</surname></string-name>, and <string-name><given-names>R. K.</given-names> <surname>Mall</surname></string-name></person-group>, Eds., <publisher-name>Elsevier</publisher-name>, <year>2024</year>, pp. <fpage>197</fpage>&#x2013;<lpage>229</lpage>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Hong</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>SpectralGPT: Spectral remote sensing foundation model</article-title>,&#x201D; <source>IEEE Trans. Pattern Anal. Mach. Intell.</source>, vol. <volume>46</volume>, no. <issue>8</issue>, pp. <fpage>5227</fpage>&#x2013;<lpage>5244</lpage>, <year>2024</year>. doi: <pub-id pub-id-type="doi">10.1109/TPAMI.2024.3362475</pub-id>; <pub-id pub-id-type="pmid">38568772</pub-id>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>I.</given-names> <surname>Ahmad</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Farooque</surname></string-name>, <string-name><given-names>Q.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Hadi</surname></string-name>, and <string-name><given-names>L.</given-names> <surname>Xiao</surname></string-name></person-group>, &#x201C;<article-title>MSTSENet: Multiscale spectral-spatial transformer with squeeze and excitation network for hyperspectral image classification</article-title>,&#x201D; <source>Eng. Appl. Artif. Intell.</source>, vol. <volume>134</volume>, no. <issue>3</issue>, <year>2024</year>, Art. no. <comment>108669</comment>. doi: <pub-id pub-id-type="doi">10.1016/j.engappai.2024.108669</pub-id>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Ahmad</surname></string-name>, <string-name><given-names>A. M.</given-names> <surname>Khan</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Mazzara</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Distefano</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Ali</surname></string-name>, and <string-name><given-names>M. S.</given-names> <surname>Sarfraz</surname></string-name></person-group>, &#x201C;<article-title>A fast and compact 3-D CNN for hyperspectral image classification</article-title>,&#x201D; <source>IEEE Geosci. Remote Sens. Lett.</source>, vol. <volume>19</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>5</lpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.1109/LGRS.2020.3043710</pub-id>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Hong</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Gao</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Yao</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Plaza</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Chanussot</surname></string-name></person-group>, &#x201C;<article-title>Graph convolutional networks for hyperspectral image classification</article-title>,&#x201D; <source>IEEE Trans. Geosci. Remote Sens.</source>, vol. <volume>59</volume>, no. <issue>7</issue>, pp. <fpage>5966</fpage>&#x2013;<lpage>5978</lpage>, <year>2021</year>. doi: <pub-id pub-id-type="doi">10.1109/TGRS.2020.3015157</pub-id>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Jamali</surname></string-name>, <string-name><given-names>S. K.</given-names> <surname>Roy</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Hong</surname></string-name>, <string-name><given-names>P. M.</given-names> <surname>Atkinson</surname></string-name>, and <string-name><given-names>P.</given-names> <surname>Ghamisi</surname></string-name></person-group>, &#x201C;<article-title>Attention graph convolutional network for disjoint hyperspectral image classification</article-title>,&#x201D; <source>IEEE Geosci. Remote Sens. Lett.</source>, <year>2024</year>. doi: <pub-id pub-id-type="doi">10.1109/LGRS.2024.3356422</pub-id>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Yao</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Hong</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Chanussot</surname></string-name></person-group>, &#x201C;<article-title>Extended vision transformer (ExViT) for land use and land cover classification: A multimodal deep learning framework</article-title>,&#x201D; <source>IEEE Trans. Geosci. Remote Sens.</source>, vol. <volume>61</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>15</lpage>, <year>2023</year>. doi: <pub-id pub-id-type="doi">10.1109/TGRS.2023.3284671</pub-id>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Ahmad</surname></string-name>, <string-name><given-names>U.</given-names> <surname>Ghous</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Usama</surname></string-name>, and <string-name><given-names>M.</given-names> <surname>Mazzara</surname></string-name></person-group>, &#x201C;<article-title>Waveformer: Spectral-spatial wavelet transformer for hyperspectral image classification</article-title>,&#x201D; <source>IEEE Geosci. Remote Sens. Lett.</source>, vol. <volume>21</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>5</lpage>, <year>2024</year>. doi: <pub-id pub-id-type="doi">10.1109/LGRS.2024.3441938</pub-id>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Ahmad</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Mazzara</surname></string-name></person-group>, &#x201C;<article-title>SCSNet: Sharpened cosine similarity-based neural network for hyperspectral image classification</article-title>,&#x201D; <source>IEEE Geosci. Remote Sens. Lett.</source>, vol. <volume>21</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>4</lpage>, <year>2024</year>. doi: <pub-id pub-id-type="doi">10.1109/LGRS.2024.3365611</pub-id>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Lu</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Fang</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Fu</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Ding</surname></string-name>, and <string-name><given-names>X.</given-names> <surname>Kang</surname></string-name></person-group>, &#x201C;<article-title>Dual-stream class-adaptive network for semi-supervised hyperspectral image classification</article-title>,&#x201D; <source>IEEE Trans. Geosci. Remote Sens.</source>, vol. <volume>62</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>11</lpage>, <year>2024</year>. doi: <pub-id pub-id-type="doi">10.1109/TGRS.2024.3357455</pub-id>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Ahmad</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Usama</surname></string-name>, <string-name><given-names>A. M.</given-names> <surname>Khan</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Distefano</surname></string-name>, <string-name><given-names>H. A.</given-names> <surname>Altuwaijri</surname></string-name>, and <string-name><given-names>M.</given-names> <surname>Mazzara</surname></string-name></person-group>, &#x201C;<article-title>Spatial spectral transformer with conditional position encoding for hyperspectral image classification</article-title>,&#x201D; <source>IEEE Geosci. Remote Sens. Lett.</source>, vol. <volume>21</volume>, <year>2024</year>, Art. no. <comment>5508205</comment>. doi: <pub-id pub-id-type="doi">10.1109/LGRS.2024.3431188</pub-id>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Tan</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<chapter-title>Data pruning via moving-one-sample-out</chapter-title>,&#x201D; in <source>Advances in Neural Information Processing Systems</source>, <person-group person-group-type="editor"><string-name><given-names>A.</given-names> <surname>Oh</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Naumann</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Globerson</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Saenko</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Hardt</surname></string-name>, and <string-name><given-names>S.</given-names> <surname>Levine</surname></string-name></person-group>, Eds., <publisher-name>Curran Associates, Inc.</publisher-name>, <year>2023</year>, vol. <volume>36</volume>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Ahmad</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>A disjoint samples-based 3D-CNN with active transfer learning for hyperspectral image classification</article-title>,&#x201D; <source>IEEE Trans. Geosci. Remote Sens.</source>, vol. <volume>60</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>16</lpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.1109/TGRS.2022.3209182</pub-id>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Gei&#x00DF;</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Aravena Pelizar</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Schrade</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Brenning</surname></string-name>, and <string-name><given-names>H.</given-names> <surname>Taubenb&#x00F6;ck</surname></string-name></person-group>, &#x201C;<article-title>On the effect of spatially non-disjoint training and test samples on estimated model generalization capabilities in supervised classification with spatial features</article-title>,&#x201D; <source>IEEE Geosci. Remote Sens. Lett.</source>, vol. <volume>14</volume>, no. <issue>11</issue>, pp. <fpage>2008</fpage>&#x2013;<lpage>2012</lpage>, <year>2017</year>. doi: <pub-id pub-id-type="doi">10.1109/LGRS.2017.2747222</pub-id>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Ahmad</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Spatial prior fuzziness pool-based interactive classification of hyperspectral images</article-title>,&#x201D; <source>Remote Sens.</source>, vol. <volume>11</volume>, no. <issue>9</issue>, <year>2019</year>, Art. no. <comment>1136</comment>. doi: <pub-id pub-id-type="doi">10.3390/rs11091136</pub-id>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>P.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Yu</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Li</surname></string-name>, and <string-name><given-names>R.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>TransHSI: A hybrid CNN-transformer method for disjoint sample-based hyperspectral image classification</article-title>,&#x201D; <source>Remote Sens.</source>, vol. <volume>15</volume>, no. <issue>22</issue>, <year>2023</year>, Art. no. <comment>5331</comment>. doi: <pub-id pub-id-type="doi">10.3390/rs15225331</pub-id>.</mixed-citation></ref>
<ref id="ref-33"><label>[33]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Hong</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>B.</given-names> <surname>Zhang</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Yokoya</surname></string-name>, <string-name><given-names>J. A.</given-names> <surname>Benediktsson</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Chanussot</surname></string-name></person-group>, &#x201C;<article-title>Multimodal artificial intelligence foundation models: Unleashing the power of remote sensing big data in Earth observation</article-title>,&#x201D; <source>Innov. Geosci.</source>, vol. <volume>2</volume>, no. <issue>1</issue>, <year>2024</year>, Art. no. <comment>100055</comment>. doi: <pub-id pub-id-type="doi">10.59717/j.xinn-geo.2024.100055</pub-id>.</mixed-citation></ref>
<ref id="ref-34"><label>[34]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Yao</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Hong</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Liu</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Chanussot</surname></string-name></person-group>, &#x201C;<article-title>UCSL: Toward unsupervised common subspace learning for cross-modal image classification</article-title>,&#x201D; <source>IEEE Trans. Geosci. Remote Sens.</source>, vol. <volume>61</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>12</lpage>, <year>2023</year>. doi: <pub-id pub-id-type="doi">10.1109/TGRS.2023.3282951</pub-id>.</mixed-citation></ref>
<ref id="ref-35"><label>[35]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Yao</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Hong</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Li</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Chanussot</surname></string-name></person-group>, &#x201C;<article-title>SpectralMamba: Efficient Mamba for hyperspectral image classification</article-title>,&#x201D; <year>2024</year>, <source>arXiv:2404.08489</source>.</mixed-citation></ref>
<ref id="ref-36"><label>[36]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Hong</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Cross-city matters: A multimodal remote sensing benchmark dataset for cross-city semantic segmentation using high-resolution domain adaptation networks</article-title>,&#x201D; <source>Remote Sens. Environ.</source>, vol. <volume>299</volume>, <year>2023</year>, Art. no. <comment>113856</comment>. doi: <pub-id pub-id-type="doi">10.1016/j.rse.2023.113856</pub-id>.</mixed-citation></ref>
<ref id="ref-37"><label>[37]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Yao</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Semi-active convolutional neural networks for hyperspectral image classification</article-title>,&#x201D; <source>IEEE Trans. Geosci. Remote Sens.</source>, vol. <volume>60</volume>, no. <issue>2</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>15</lpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.1109/TGRS.2022.3230411</pub-id>.</mixed-citation></ref>
<ref id="ref-38"><label>[38]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>U.</given-names> <surname>Ghous</surname></string-name>, <string-name><given-names>M. S.</given-names> <surname>Sarfraz</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Ahmad</surname></string-name>, <string-name><given-names>C.</given-names> <surname>Li</surname></string-name>, and <string-name><given-names>D.</given-names> <surname>Hong</surname></string-name></person-group>, &#x201C;<article-title>(2&#x002B;1)D extreme Xception net for hyperspectral image classification</article-title>,&#x201D; <source>IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.</source>, vol. <volume>17</volume>, pp. <fpage>5159</fpage>&#x2013;<lpage>5172</lpage>, <year>2024</year>. doi: <pub-id pub-id-type="doi">10.1109/JSTARS.2024.3362936</pub-id>.</mixed-citation></ref>
<ref id="ref-39"><label>[39]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><given-names>S. K.</given-names> <surname>Adari</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Alla</surname></string-name></person-group>, &#x201C;<chapter-title>Introduction to machine learning</chapter-title>,&#x201D; in <source>Beginning Anomaly Detection Using Python-Based Deep Learning: Implement Anomaly Detection Applications with Keras and PyTorch</source>. <publisher-loc>Berkeley, CA</publisher-loc>: <publisher-name>Apress</publisher-name>, <year>2024</year>, pp. <fpage>105</fpage>&#x2013;<lpage>134</lpage>.</mixed-citation></ref>
<ref id="ref-40"><label>[40]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S. K.</given-names> <surname>Roy</surname></string-name>, <string-name><given-names>G.</given-names> <surname>Krishna</surname></string-name>, <string-name><given-names>S. R.</given-names> <surname>Dubey</surname></string-name>, and <string-name><given-names>B. B.</given-names> <surname>Chaudhuri</surname></string-name></person-group>, &#x201C;<article-title>HybridSN: Exploring 3-D-2-D CNN feature hierarchy for hyperspectral image classification</article-title>,&#x201D; <source>IEEE Geosci. Remote Sens. Lett.</source>, vol. <volume>17</volume>, no. <issue>2</issue>, pp. <fpage>277</fpage>&#x2013;<lpage>281</lpage>, <year>2020</year>. doi: <pub-id pub-id-type="doi">10.1109/LGRS.2019.2918719</pub-id>.</mixed-citation></ref>
<ref id="ref-41"><label>[41]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Y.</given-names> <surname>Wang</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Yu</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Wen</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Dong</surname></string-name>, and <string-name><given-names>S.</given-names> <surname>Zang</surname></string-name></person-group>, &#x201C;<article-title>Learning a 3-D-CNN and convolution transformers for hyperspectral image classification</article-title>,&#x201D; <source>IEEE Geosci. Remote Sens. Lett.</source>, vol. <volume>21</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>5</lpage>, <year>2024</year>. doi: <pub-id pub-id-type="doi">10.1109/LGRS.2024.3365615</pub-id>.</mixed-citation></ref>
<ref id="ref-42"><label>[42]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>L.</given-names> <surname>Song</surname></string-name>, <string-name><given-names>Z.</given-names> <surname>Feng</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Yang</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Zhang</surname></string-name>, and <string-name><given-names>L.</given-names> <surname>Jiao</surname></string-name></person-group>, &#x201C;<article-title>Interactive spectral-spatial transformer for hyperspectral image classification</article-title>,&#x201D; <source>IEEE Trans. Circuits Syst. Video Technol.</source>, <year>2024</year>. doi: <pub-id pub-id-type="doi">10.1109/TCSVT.2024.3386578</pub-id>.</mixed-citation></ref>
<ref id="ref-43"><label>[43]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Yang</surname></string-name> <etal>et al.</etal></person-group>, &#x201C;<article-title>Synergistic 2D/3D convolutional neural network for hyperspectral image classification</article-title>,&#x201D; <source>Remote Sens.</source>, vol. <volume>12</volume>, no. <issue>12</issue>, <year>2020</year>, Art. no. <comment>2033</comment>. doi: <pub-id pub-id-type="doi">10.3390/rs12122033</pub-id>.</mixed-citation></ref>
<ref id="ref-44"><label>[44]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>W.</given-names> <surname>Liao</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Wang</surname></string-name>, and <string-name><given-names>H.</given-names> <surname>Zhao</surname></string-name></person-group>, &#x201C;<article-title>Hyperspectral image classification using attention-only spatial-spectral network based on transformer</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>12</volume>, pp. <fpage>93677</fpage>&#x2013;<lpage>93688</lpage>, <year>2024</year>. doi: <pub-id pub-id-type="doi">10.1109/ACCESS.2024.3424674</pub-id>.</mixed-citation></ref>
<ref id="ref-45"><label>[45]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Arshad</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Zhang</surname></string-name></person-group>, &#x201C;<article-title>A light-weighted spectral-spatial transformer model for hyperspectral image classification</article-title>,&#x201D; <source>IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.</source>, vol. <volume>17</volume>, pp. <fpage>12008</fpage>&#x2013;<lpage>12019</lpage>, <year>2024</year>. doi: <pub-id pub-id-type="doi">10.1109/JSTARS.2024.3419070</pub-id>.</mixed-citation></ref>
<ref id="ref-46"><label>[46]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Khan</surname></string-name>, <string-name><given-names>T.</given-names> <surname>Arshad</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Ma</surname></string-name>, <string-name><given-names>W.</given-names> <surname>Chen</surname></string-name>, <string-name><given-names>H. F.</given-names> <surname>Zhu</surname></string-name>, and <string-name><given-names>Y. N.</given-names> <surname>Wu</surname></string-name></person-group>, &#x201C;<article-title>Deep spectral spatial feature enhancement through transformer for hyperspectral image classification</article-title>,&#x201D; <source>IEEE Geosci. Remote Sens. Lett.</source>, vol. <volume>21</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>5</lpage>, <year>2024</year>. doi: <pub-id pub-id-type="doi">10.1109/LGRS.2024.3424986</pub-id>.</mixed-citation></ref>
<ref id="ref-47"><label>[47]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>F&#x0131;rat</surname></string-name>, <string-name><given-names>M. E.</given-names> <surname>Asker</surname></string-name>, <string-name><given-names>M. I.</given-names> <surname>Bay&#x0131;nd&#x0131;r</surname></string-name>, and <string-name><given-names>D.</given-names> <surname>Hanbay</surname></string-name></person-group>, &#x201C;<article-title>Hybrid 3D/2D complete inception module and convolutional neural network for hyperspectral remote sensing image classification</article-title>,&#x201D; <source>Neural Process. Lett.</source>, vol. <volume>55</volume>, no. <issue>2</issue>, pp. <fpage>1087</fpage>&#x2013;<lpage>1130</lpage>, <year>2023</year>. doi: <pub-id pub-id-type="doi">10.1007/s11063-022-10929-z</pub-id>.</mixed-citation></ref>
<ref id="ref-48"><label>[48]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Zhang</surname></string-name></person-group>, &#x201C;<article-title>Improved three-dimensional inception networks for hyperspectral remote sensing image classification</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>11</volume>, pp. <fpage>32648</fpage>&#x2013;<lpage>32658</lpage>, <year>2023</year>. doi: <pub-id pub-id-type="doi">10.1109/ACCESS.2023.3262992</pub-id>.</mixed-citation></ref>
<ref id="ref-49"><label>[49]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Xiong</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Yuan</surname></string-name>, and <string-name><given-names>Q.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>AI-NET: Attention inception neural networks for hyperspectral image classification</article-title>,&#x201D; in <conf-name>IGARSS 2018&#x2014;2018 IEEE Int. Geosci. Remote Sens. Symp.</conf-name>, <publisher-name>IEEE</publisher-name>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-50"><label>[50]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Wu</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Hong</surname></string-name>, and <string-name><given-names>J.</given-names> <surname>Chanussot</surname></string-name></person-group>, &#x201C;<article-title>Convolutional neural networks for multimodal remote sensing data classification</article-title>,&#x201D; <source>IEEE Trans. Geosci. Remote Sens.</source>, vol. <volume>60</volume>, pp. <fpage>1</fpage>&#x2013;<lpage>10</lpage>, <year>2022</year>. doi: <pub-id pub-id-type="doi">10.1109/TGRS.2022.3228927</pub-id>.</mixed-citation></ref>
<ref id="ref-51"><label>[51]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Ghaderizadeh</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Abbasi-Moghadam</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Sharifi</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Zhao</surname></string-name>, and <string-name><given-names>A.</given-names> <surname>Tariq</surname></string-name></person-group>, &#x201C;<article-title>Hyperspectral image classification using a hybrid 3D-2D convolutional neural networks</article-title>,&#x201D; <source>IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens.</source>, vol. <volume>14</volume>, pp. <fpage>7570</fpage>&#x2013;<lpage>7588</lpage>, <year>2021</year>. doi: <pub-id pub-id-type="doi">10.1109/JSTARS.2021.3099118</pub-id>.</mixed-citation></ref>
<ref id="ref-52"><label>[52]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>X.</given-names> <surname>Yang</surname></string-name>, <string-name><given-names>Y.</given-names> <surname>Ye</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Li</surname></string-name>, <string-name><given-names>R. Y.</given-names> <surname>Lau</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Zhang</surname></string-name>, and <string-name><given-names>X.</given-names> <surname>Huang</surname></string-name></person-group>, &#x201C;<article-title>Hyperspectral image classification with deep learning models</article-title>,&#x201D; <source>IEEE Trans. Geosci. Remote Sens.</source>, vol. <volume>56</volume>, no. <issue>9</issue>, pp. <fpage>5408</fpage>&#x2013;<lpage>5423</lpage>, <year>2018</year>. doi: <pub-id pub-id-type="doi">10.1109/TGRS.2018.2815613</pub-id>.</mixed-citation></ref>
</ref-list>
</back></article>