<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMES</journal-id>
<journal-id journal-id-type="nlm-ta">CMES</journal-id>
<journal-id journal-id-type="publisher-id">CMES</journal-id>
<journal-title-group>
<journal-title>Computer Modeling in Engineering &#x0026; Sciences</journal-title>
</journal-title-group>
<issn pub-type="epub">1526-1506</issn>
<issn pub-type="ppub">1526-1492</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">68681</article-id>
<article-id pub-id-type="doi">10.32604/cmes.2025.068681</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Augmented Deep-Feature-Based Ear Recognition Using Increased Discriminatory Soft Biometrics</article-title>
<alt-title alt-title-type="left-running-head">Augmented Deep-Feature-Based Ear Recognition Using Increased Discriminatory Soft Biometrics</alt-title>
<alt-title alt-title-type="right-running-head">Augmented Deep-Feature-Based Ear Recognition Using Increased Discriminatory Soft Biometrics</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Jaha</surname><given-names>Emad Sami</given-names></name><email>ejaha@kau.edu.sa</email></contrib>
<aff id="aff-1"><institution>Department of Computer Science, Faculty of Computing and Information Technology, King Abdulaziz University</institution>, <addr-line>Jeddah, 21589</addr-line>, <country>Saudi Arabia</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Emad Sami Jaha. Email: <email>ejaha@kau.edu.sa</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2025</year>
</pub-date>
<pub-date date-type="pub" publication-format="electronic">
<day>30</day><month>09</month><year>2025</year>
</pub-date>
<volume>144</volume>
<issue>3</issue>
<fpage>3645</fpage>
<lpage>3678</lpage>
<history>
<date date-type="received">
<day>04</day>
<month>6</month>
<year>2025</year>
</date>
<date date-type="accepted">
<day>11</day>
<month>8</month>
<year>2025</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2025 The Author.</copyright-statement>
<copyright-year>2025</copyright-year>
<copyright-holder>Published by Tech Science Press.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMES_68681.pdf"></self-uri>
<abstract>
<p>The human ear has been substantiated as a viable nonintrusive biometric modality for identification or verification. Among many feasible techniques for ear biometric recognition, convolutional neural network (CNN) models have recently offered high-performance and reliable systems. However, their performance can still be further improved using the capabilities of soft biometrics, a research question yet to be investigated. This research aims to augment the traditional CNN-based ear recognition performance by adding increased discriminatory ear soft biometric traits. It proposes a novel framework of augmented ear identification/verification using a group of discriminative categorical soft biometrics and deriving new, more perceptive, comparative soft biometrics for feature-level fusion with hard biometric deep features. It conducts several identification and verification experiments for performance evaluation, analysis, and comparison while varying ear image datasets, hard biometric deep-feature extractors, soft biometric augmentation methods, and classifiers used. The experimental work yields promising results, reaching up to 99.94% accuracy and up to 14% improvement using the AMI and AMIC datasets, along with their corresponding soft biometric label data. The results confirm the proposed augmented approaches&#x2019; superiority over their standard counterparts and emphasize the robustness of the new ear comparative soft biometrics over their categorical peers.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Ear recognition</kwd>
<kwd>soft biometrics</kwd>
<kwd>human identification</kwd>
<kwd>human verification</kwd>
<kwd>comparative labeling</kwd>
<kwd>ranking SVM</kwd>
<kwd>deep features</kwd>
<kwd>feature-level fusion</kwd>
<kwd>convolutional neural networks (CNNs)</kwd>
<kwd>deep learning</kwd>
</kwd-group>
<funding-group>
<award-group id="awg1">
<funding-source>King Abdulaziz University</funding-source>
</award-group>
</funding-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>Biometric recognition technologies have served for decades as effective automated solutions for human identification and verification/authentication, especially in security-conscious societies around the globe. Numerous dependable systems and practical applications have been widely deployed, ranging from the personal level, e.g., securing handheld devices, to the universal level, e.g., international border control. Such biometric recognition systems and applications employ a variety of discriminative biometric modalities, which are mostly either physiological, such as a person&#x2019;s fingerprint, iris, face, and ear, or behavioral, such as a person&#x2019;s voice, gait, signature, and keystroke [<xref ref-type="bibr" rid="ref-1">1</xref>]. Notably, physiological and behavioral biometric modalities are also known as traditional hard biometrics [<xref ref-type="bibr" rid="ref-2">2</xref>]. Among these discriminant hard biometrics, the human ear has recently attracted increased research attention, highlighting its efficacy as a nonintrusive biometric modality for many beneficial applications in various scenarios [<xref ref-type="bibr" rid="ref-3">3</xref>].</p>
<p>Several computer vision algorithms have been devoted to ear biometric recognition tasks, utilizing effective machine and deep learning techniques [<xref ref-type="bibr" rid="ref-4">4</xref>]. In particular, deep learning models based on convolutional neural networks (CNNs) have frequently proven more capable than other techniques for comprehensive image analysis, informative representations, and accurate visual recognition [<xref ref-type="bibr" rid="ref-5">5</xref>]. Beyond hard biometrics, soft biometrics are high-level semantic traits recently introduced as a new biometric modality. Soft biometric traits are more robust to changes in viewpoint, pose, occlusion, illumination, and other variable environmental aspects [<xref ref-type="bibr" rid="ref-6">6</xref>]. For instance, age, gender, colors, verbally describable sizes, dimensions, ratios, and many other feasible soft biometrics are less likely to lose value in challenging scenarios, and they can be deployed where traditional biometrics cannot [<xref ref-type="bibr" rid="ref-7">7</xref>]. Furthermore, such soft biometrics can provide highly invariant feature observability, extraction, and generalizability [<xref ref-type="bibr" rid="ref-8">8</xref>].</p>
<p>Although soft biometrics are insufficiently unique and less distinctive when used alone, they are effective as supplementary information when used in conjunction with hard biometrics [<xref ref-type="bibr" rid="ref-6">6</xref>,<xref ref-type="bibr" rid="ref-8">8</xref>]. Therefore, soft biometrics are natural candidates for fusion with hard biometrics to enhance recognition performance. How far they can do so for the ear remains an open research question, which this study addresses. The study mainly aims to propose robust soft biometrics for the person&#x2019;s ear and investigate their capabilities in augmenting the performance of traditional hard biometric deep features extracted using efficient CNN models. Although a few studies have addressed ear soft biometrics [<xref ref-type="bibr" rid="ref-1">1</xref>,<xref ref-type="bibr" rid="ref-9">9</xref>&#x2013;<xref ref-type="bibr" rid="ref-11">11</xref>], the fusion of soft biometrics with deep features to augment ear recognition has, to the best of our knowledge, yet to be explored. The main contributions of this research are as follows:
<list list-type="bullet">
<list-item>
<p>A novel framework for augmenting CNN-based ear biometric recognition by feature-level fusion of hard biometric deep features with proposed increased discriminatory soft biometrics;</p></list-item>
<list-item>
<p>A new group of more perceptive comparative soft biometrics, derived automatically via pairwise comparative labeling, ranking ears by attributes, and mapping to refined relative measurements;</p></list-item>
<list-item>
<p>Several CNN deep-feature-based ear recognition approaches augmented by different combinations of proposed categorical and comparative soft biometric traits;</p></list-item>
<list-item>
<p>Extended performance evaluation, analysis, and comparison of traditional vs. augmented ear biometric identification and verification using various ear image datasets, hard biometric CNN deep-feature extractors, and employed classifiers.</p></list-item>
</list></p>
<p>In the remainder of this paper, <xref ref-type="sec" rid="s2">Section 2</xref> provides a brief background and a review of related studies, <xref ref-type="sec" rid="s3">Section 3</xref> describes the detailed research methodology, <xref ref-type="sec" rid="s4">Section 4</xref> presents the experimental work with performance result analyses and comparisons, and finally, <xref ref-type="sec" rid="s5">Section 5</xref> concludes the paper and indicates potential future research avenues.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Background and Related Studies</title>
<p>The human ear as a nonintrusive biometric modality is still an emerging research field that has attracted less interest than other well-researched and extensively analyzed biometric modalities, such as the human face. Yet, unlike the face, the ear is a rigid organ considered relatively less vulnerable to variations in illumination, pose, occlusion, aging, and visual or emotional states such as facial expressions [<xref ref-type="bibr" rid="ref-5">5</xref>]. Practical biometrics are expected to satisfy several essential requirements, such as universality, measurability, collectability, permanence, and uniqueness, all of which ear biometric traits can satisfy [<xref ref-type="bibr" rid="ref-1">1</xref>]. Furthermore, the ear modality has the additional advantages of being stable over time [<xref ref-type="bibr" rid="ref-12">12</xref>], easy to acquire with little or no user-sensor interaction [<xref ref-type="bibr" rid="ref-3">3</xref>], usable in purely image-based unimodal/multimodal biometric systems [<xref ref-type="bibr" rid="ref-13">13</xref>], and immune to privacy issues and hygiene concerns [<xref ref-type="bibr" rid="ref-5">5</xref>]. Such advantages make ear biometrics acceptable to the public as a typical contactless means to identify/verify a person, involving minimal naked-eye-identifiable information compared to other biometrics like the face [<xref ref-type="bibr" rid="ref-9">9</xref>,<xref ref-type="bibr" rid="ref-12">12</xref>].</p>
<p>Traditional ear biometric recognition can be defined as the automatic process of human recognition using physical ear characteristics [<xref ref-type="bibr" rid="ref-14">14</xref>]. It has been reliably and effectively deployed in diverse applications, including forensics, surveillance, identification, verification, and securing personal devices [<xref ref-type="bibr" rid="ref-3">3</xref>,<xref ref-type="bibr" rid="ref-10">10</xref>]. Unique ear biometric traits can be extracted by analyzing several primary morphological parts of the ear, such as the lobe, scapha, helix, antihelix, tragus, antitragus, and their subcomponents, which were found to be powerful even for challenging tasks like identical twin discrimination or kinship verification [<xref ref-type="bibr" rid="ref-15">15</xref>]. <xref ref-type="fig" rid="fig-11">Fig. A1</xref> in <xref ref-type="app" rid="app-1">Appendix A</xref> illustrates the general anatomical structure of the human ear. From a biometrics perspective, the color-overlaid parts with bold annotations are the principal ear parts, which are significant not only for physical feature extraction as hard biometrics but are also the most discriminative, describable, and comparable, and thus the most suitable for inferring viable soft biometrics. <xref ref-type="table" rid="table-1">Table 1</xref> provides a glossary of key terminology used in the current research context to aid readability and understanding.</p>
<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>Key terminology glossary</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th align="center">Key term</th>
<th align="center">Definition</th>
</tr>
</thead>
<tbody>
<tr>
<td>Hard biometric (VGG19 or ResNet50)</td>
<td>A normalized standard vision-based deep feature as a biometric trait derived using either VGG-19 or ResNet-50</td>
</tr>
<tr>
<td>Soft biometric attribute</td>
<td>A high-level biometric characteristic semantically describable by categorical, relative, or comparative annotation</td>
</tr>
<tr>
<td>Categorical label</td>
<td>A conventional nominal description that represents the category/class of a categorically nameable soft biometric attribute (e.g., [none, black, white, gray, blonde, brown, red, brunette])</td>
</tr>
<tr>
<td>Relative label</td>
<td>A conventional ordinal description that reflects the degree of strength of a relatively measurable soft biometric attribute (e.g., [very small, small, medium, large, very large])</td>
</tr>
<tr>
<td>Comparative label</td>
<td>A conventional pairwise comparative description that reflects the degree of comparison/difference of a relatively comparable/differentiable and rankable soft biometric attribute (e.g., [much smaller, smaller, similar, larger, much larger])</td>
</tr>
<tr>
<td>Categorical soft biometric (SoftCat)</td>
<td>A normalized numerical representation of a soft biometric trait derived from the corresponding descriptive categorical or relative label</td>
</tr>
<tr>
<td>Comparative soft biometric (SoftCmp)</td>
<td>A normalized refined relative measurement of a soft biometric trait inferred from the ranking enforced by the comparative labels per soft biometric attribute</td>
</tr>
<tr>
<td>Categorical and comparative soft biometric fusion (SoftCat&#x0026;Cmp)</td>
<td>A combination of categorical and comparative soft biometric traits for a single biometric recognition task like identification or verification</td>
</tr>
</tbody>
</table>
</table-wrap>
<sec id="s2_1">
<label>2.1</label>
<title>Traditional Ear Biometric Recognition</title>
<p>For human recognition using the ear biometric modality, many computer vision and machine/deep learning algorithms have been devoted to developing robust systems with high accuracy [<xref ref-type="bibr" rid="ref-3">3</xref>]. Most earlier studies approached ear recognition using standard machine learning methods and handcrafted features [<xref ref-type="bibr" rid="ref-16">16</xref>]. However, most recent ear biometrics-related approaches have shifted dramatically from classical feature extraction and handcrafted feature engineering to deep learning models [<xref ref-type="bibr" rid="ref-17">17</xref>]. Moreover, exploiting ear biometric recognition capabilities in multimodal biometric systems has attracted some research interest. For instance, a new multimodal biometric feature-level fusion was introduced for selecting optimized feature sets from ear, iris, palm, and fingerprint, with a kernelized multiclass support vector machine (MSVM) to improve the security and performance of human authentication [<xref ref-type="bibr" rid="ref-13">13</xref>].</p>
<p>Deep learning methods have been employed for effective vision-based ear feature extraction and successful human identification/verification in unimodal/multimodal biometric systems [<xref ref-type="bibr" rid="ref-4">4</xref>]. A multimodal biometric identification approach was proposed, combining transfer learning with sample expansion before feeding face and ear images to the VGG-16 network to enhance accuracy and address the problem of single-sample face and ear datasets [<xref ref-type="bibr" rid="ref-18">18</xref>]. On the other hand, unimodal ear identification was explored on the AMI ear dataset, using a Pix2Pix generative adversarial network (GAN) to augment the ear data by generating corresponding left ear images for right ear images and <italic>vice versa</italic>, in order to improve the EarNet model performance [<xref ref-type="bibr" rid="ref-19">19</xref>]. In [<xref ref-type="bibr" rid="ref-20">20</xref>], a framework was developed based on deep convolutional generative adversarial networks (DCGAN) to enhance the ear recognition performance of AlexNet, VGG-16, and VGG-19 on the benchmark AMI and AWE ear datasets. MDFNet was introduced as an unsupervised single-layer model for ear print recognition [<xref ref-type="bibr" rid="ref-21">21</xref>].</p>
<p>Different CNN-based deep learning architectures have attained exceptional performance and boosted ear identification capabilities, providing reliable deep features to be used as discriminative hard biometric traits [<xref ref-type="bibr" rid="ref-3">3</xref>,<xref ref-type="bibr" rid="ref-4">4</xref>]. Ensemble classifiers for score-level fusion were suggested for improving ear identification on the AMI and IIT Delhi1 ear datasets, using the discrete curvelet transform (DCT) as a machine learning technique alongside the deep learning CNNs ResNet-50, AlexNet, and GoogLeNet [<xref ref-type="bibr" rid="ref-22">22</xref>]. In [<xref ref-type="bibr" rid="ref-5">5</xref>], multiple VGG-based network topologies (VGG-11/13/16/19) were examined in ear identification using the AMI, AMIC, and WPUT datasets. Ablation experiments with comparative analysis were conducted for scratch training, deep feature extraction, fine-tuning, and, eventually, multimodal ensembles, which outperformed the other three strategies by averaging posterior probabilities of the VGG-(13, 16, and 19) configuration. Meanwhile, in [<xref ref-type="bibr" rid="ref-5">5</xref>], training and using each network alone, especially on more challenging datasets like AMIC, yielded much lower performance than the ensembles, which are, however, computationally costly and time-consuming. In a later related study, further CNNs, including AlexNet, VGG-16/19, InceptionV3, ResNet-50/101, and ResNeXt-50/101, were separately tested and compared in unconstrained ear identification using the EarVN1.0 dataset [<xref ref-type="bibr" rid="ref-14">14</xref>]. Another study was conducted on AMI and EarVN1.0, boosting the performance of VGG-16/19, ResNet-50, MobileNet, and EfficientNet-B7 for deep feature extraction through vision-based preprocessing such as zooming, contour detection, and different data augmentations [<xref ref-type="bibr" rid="ref-12">12</xref>]. 
A feature-level fusion method based on channel features and dynamic convolution (CFDCNet) was proposed, built on an adapted DenseNet-121 model, which outperforms the standard DenseNet-121 benchmark performance on the AMI and AWE datasets [<xref ref-type="bibr" rid="ref-23">23</xref>]. Independently training on the left and right ears for side-specific person identification can remarkably enhance ResNet-50 model accuracy, although performance varies between the left and right ears, as they need not be identical [<xref ref-type="bibr" rid="ref-24">24</xref>].</p>
</sec>
<sec id="s2_2">
<label>2.2</label>
<title>Soft Biometrics for Augmented Biometric Recognition</title>
<p>Modern soft biometrics have recently emerged as a new alternative or supplementary means for human recognition [<xref ref-type="bibr" rid="ref-25">25</xref>]. Research efforts have grown rapidly in various domains and scenarios, leading to robust and practical systems for person identification, verification, and retrieval, especially using soft biometrics inferred from the face [<xref ref-type="bibr" rid="ref-26">26</xref>] and body, alongside other body-related supplemental characteristics, such as clothing [<xref ref-type="bibr" rid="ref-7">7</xref>]. A variety of possible soft biometric attributes can be derived to semantically describe different aspects of personal identity: the global aspect, e.g., gender and age; the facial aspect, e.g., eye size and nose length; the body aspect, e.g., height and arm length; and the clothing aspect, e.g., clothes category and sleeve length [<xref ref-type="bibr" rid="ref-25">25</xref>,<xref ref-type="bibr" rid="ref-27">27</xref>]. They can be annotated using any conventional, nameable, and understandable group of semantic labels in different categorical, relative, and comparative labeling forms [<xref ref-type="bibr" rid="ref-8">8</xref>], as in <xref ref-type="table" rid="table-1">Table 1</xref>.</p>

<p>Soft biometric traits have been considerably helpful in empowering biometric systems for various purposes, such as surveillance and forensic applications [<xref ref-type="bibr" rid="ref-28">28</xref>]. In some challenging cases, manually or automatically derived soft biometric traits can be the only observable cues for identity [<xref ref-type="bibr" rid="ref-6">6</xref>]. In other scenarios, they can be viable where traditional vision-based traits alone are impractical or perform poorly due to poor data quality or environmental conditions, e.g., distance, viewpoint, illumination, and occlusions [<xref ref-type="bibr" rid="ref-7">7</xref>].</p>
<p>Soft biometrics have proven to be powerful supplemental traits for augmenting the performance of many standard vision-based hard traits of different biometric modalities. Instead of being relied on alone, standard hard biometrics can be integrated with soft biometrics using different schemes, such as feature-level, score-level, and decision-level fusion, for effective combination of hard-soft biometric information and enhanced recognition [<xref ref-type="bibr" rid="ref-26">26</xref>]. Feature-level fusion offers the closest interaction between facial hard and soft biometrics. It can augment deep CNNs in a challenging scenario where training is limited to front-face images, enabling zero-shot side-face identification and verification [<xref ref-type="bibr" rid="ref-8">8</xref>]. Hard-soft face biometric fusion is feasible both in the original forms and in the cancellable bio-hashed forms of the hard and soft biometrics, where score-level fusion can attain enhanced and prompt face image matching and retrieval in large-scale datasets [<xref ref-type="bibr" rid="ref-29">29</xref>].</p>
<p>Only a few relevant studies have explored soft biometrics in fusion with vision-based hard biometrics using machine learning for human ear biometric identification or verification [<xref ref-type="bibr" rid="ref-1">1</xref>,<xref ref-type="bibr" rid="ref-9">9</xref>,<xref ref-type="bibr" rid="ref-11">11</xref>]. In [<xref ref-type="bibr" rid="ref-1">1</xref>], a set of twenty categorical and thirteen relative ear soft biometric traits was proposed, which augmented identification and verification on AMI-cropped (AMIC) ear images. Different feature-level fusion approaches were applied to combine them with hard biometric features derived using local binary pattern (LBP) and principal component analysis (PCA). For newborn baby identification, four soft biometrics, comprising two categorical traits (gender and blood group) and two relative traits (weight and height), were combined at the score level with different vision-based features to enhance their performance, including PCA, Haar, Fisher linear discriminant analysis (FLDA), independent component analysis (ICA), and geometrical feature extraction (GF) [<xref ref-type="bibr" rid="ref-9">9</xref>].</p>
<p>Another approach explored the potency of skin color, hair color, and mole location as soft biometric traits to improve local Gabor binary pattern (LGBP) identification performance in score-level fusion [<xref ref-type="bibr" rid="ref-11">11</xref>]. The study of [<xref ref-type="bibr" rid="ref-10">10</xref>] was dedicated to statistically investigating potential morphological features of the external ear, which can be candidate soft biometric traits. To the best of our knowledge, the comparative form of ear soft biometrics has not been investigated or analyzed in the existing literature, so the current study is the first to address this gap in the ear biometric domain. This research is motivated by several significant limitations in existing related work. Most studies have focused on face and body modalities for soft biometrics analysis and utilization, either in isolation or in fusion with hard biometrics. In contrast, the ear modality has rarely been considered for soft biometric applications. Additionally, the few studies that do incorporate ear soft biometrics tend to use only categorical and relative forms of labeling, which are often less discriminative than comparative labeling. Another limitation is the lack of automated systems for categorical or comparative soft biometric annotations. There is also an excessive reliance on traditional crowdsourcing methods using human annotators, which, while still the predominant standard practice for acquiring manual labels in face and body soft biometrics, is not ideal.</p>
</sec>
</sec>
<sec id="s3">
<label>3</label>
<title>Research Methodology</title>
<p>This research proposes a novel methodological framework to achieve augmented ear biometric recognition by fusing increased discriminatory soft biometric traits with reliable hard biometric deep features. It investigates the capabilities of effective categorical and more perceptive comparative soft biometrics in enhancing human identification and verification performance. <xref ref-type="fig" rid="fig-1">Fig. 1</xref> shows a brief overview of the proposed research methodology framework and clarifies the process workflow.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>Framework overview of the proposed research methodology</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_68681-fig-1.tif"/>
</fig>
<p>The framework starts with preprocessing an input ear image (<italic>Ear</italic>) from the AMI/AMIC dataset and preparing the categorical/relative label input from the soft biometric label dataset. In the categorical soft biometric extraction phase, SoftCat is extracted as the first normalized soft biometric template, comprising categorical and relative traits. A feature vector of relative traits only is additionally extracted to be mapped to a corresponding comparative feature vector. In the comparative soft biometric extraction phase, random pairwise (<italic>Ear &#x03B1;</italic> and <italic>Ear &#x03B2;</italic>) comparisons are generated between AMI/AMIC training samples only, where their relative labels are fetched and compared per attribute to infer comparative labels. The comparative labels are used as constraints with RankSVM to rank all ear images by each attribute. After mapping, SoftCmp is extracted as the second normalized soft biometric template. The third soft biometric template (SoftCat&#x0026;Cmp) is derived as the combination of SoftCat and SoftCmp. In the hard biometric extraction phase, each input ear image undergoes CNN deep feature extraction using VGG-19 and ResNet-50. The extracted deep-feature vectors are normalized to compose two hard biometric templates (VGG19 and ResNet50), each of which is fused with the three soft biometric templates. The final module evaluates and compares the identification and verification performance of two standard hard biometric approaches, as unaugmented baselines, and six soft biometric-based augmented approaches when using SVM and Softmax classifiers. The framework modules are elaborated in the following sections.</p>
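<p>As a rough illustration of the feature-level fusion step described above, the following minimal Python sketch normalizes a hard biometric template and a soft biometric template and then concatenates them per sample. The min-max normalization, the function names (<monospace>min_max_normalize</monospace>, <monospace>fuse</monospace>), and the toy values are assumptions made for illustration only; the framework states that templates are normalized but this sketch does not claim to reproduce the exact scheme used.</p>
<code language="python"><![CDATA[
```python
def min_max_normalize(rows):
    """Scale each column of a list-of-lists to [0, 1]; constant columns map to 0.0."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    # Guard against division by zero when a column has a single repeated value.
    spans = [(max(c) - min(c)) or 1.0 for c in columns]
    return [[(v - lo) / s for v, lo, s in zip(row, lows, spans)] for row in rows]


def fuse(hard_rows, soft_rows):
    """Feature-level fusion: normalize each template, then concatenate per sample."""
    hard_n = min_max_normalize(hard_rows)
    soft_n = min_max_normalize(soft_rows)
    return [h + s for h, s in zip(hard_n, soft_n)]


# Toy stand-ins (hypothetical values): 3 samples, 4 deep features, 2 soft traits.
hard = [[0.2, 1.5, -0.3, 4.0],
        [0.8, 0.5, 0.1, 2.0],
        [0.5, 1.0, 0.5, 3.0]]
soft = [[1, 3], [2, 3], [3, 5]]

fused = fuse(hard, soft)
print(len(fused), len(fused[0]))  # number of samples, fused vector length
```
]]></code>
<p>In a real pipeline, the normalization statistics would presumably be estimated on the training samples and reused for the test queries; the sketch normalizes a single toy set only to keep the example short.</p>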
<sec id="s3_1">
<label>3.1</label>
<title>Ear Image and Label Datasets</title>
<p>This research used two ear image datasets and a soft biometric label dataset: AMI, its cropped image version AMIC, and their corresponding soft biometric labels. These AMI-based datasets were adopted for their suitability for the objectives and their compliance with the requirements of this research context and methodology, as well as the availability of pre-acquired categorical soft labels. They allowed systematic evaluation and comparison of the proposed augmentation approaches in different biometric recognition scenarios.</p>
<sec id="s3_1_1">
<label>3.1.1</label>
<title>AMI Ear Database</title>
<p>The AMI ear database is a standard image dataset for ear biometric recognition experimentation [<xref ref-type="bibr" rid="ref-30">30</xref>]. It comprises 700 ear image samples, in JPEG format at 492 &#x00D7; 702 pixels, belonging to a hundred male/female subjects aged 19 to 65. Each subject has seven image samples: six of the right ear from different viewpoints, namely &#x201C;front&#x201D;, &#x201C;left&#x201D;, &#x201C;right&#x201D;, &#x201C;up&#x201D;, &#x201C;down&#x201D;, and &#x201C;zoom&#x201D;, and a seventh showing the left ear, annotated as &#x201C;back&#x201D;. In this research, the seventh was excluded, and only the six images representing the same ear per subject were included in all experimental work, since left and right ears are not necessarily identical, as substantiated and recommended for accurate performance by [<xref ref-type="bibr" rid="ref-24">24</xref>]. <xref ref-type="fig" rid="fig-2">Fig. 2</xref> shows these six images for a male subject from the AMI dataset. Of the 600 adopted images, 400 were randomly selected as a training dataset, and the remaining 200 were held out as a testing dataset. That is, each subject had four random images for training and the other two for testing.</p>
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>Human ear image samples from the raw AMI dataset representing six different perspectives for a male subject</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_68681-fig-2.tif"/>
</fig>
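<p>The per-subject split described above (four random images for training and two for testing out of the six retained views) can be sketched as follows. This is only an illustrative sketch: the function name <monospace>split_per_subject</monospace>, the integer subject identifiers, and the fixed random seed are assumptions and are not taken from the original experimental setup.</p>
<code language="python"><![CDATA[
```python
import random

# The six right-ear views retained per subject; the left-ear "back" view is excluded.
VIEWS = ["front", "left", "right", "up", "down", "zoom"]


def split_per_subject(subject_ids, n_train=4, seed=0):
    """Randomly assign n_train views per subject to training and the rest to testing."""
    rng = random.Random(seed)  # hypothetical seed, used only for reproducibility
    train, test = [], []
    for sid in subject_ids:
        views = VIEWS[:]
        rng.shuffle(views)
        train += [(sid, v) for v in views[:n_train]]
        test += [(sid, v) for v in views[n_train:]]
    return train, test


# One hundred subjects, identified here by the hypothetical IDs 1..100.
train, test = split_per_subject(range(1, 101))
print(len(train), len(test))
```
]]></code>
<p>Splitting within each subject, rather than over the pooled 600 images, is what guarantees that every identity appears in both the training and testing sets.</p>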
</sec>
<sec id="s3_1_2">
<label>3.1.2</label>
<title>AMI-Cropped (AMIC) Ear Image Dataset</title>
<p>A dataset derived from AMI, containing a cropped version of all AMI images, has been repeatedly used in the literature and is known as AMI-Cropped (AMIC) [<xref ref-type="bibr" rid="ref-1">1</xref>,<xref ref-type="bibr" rid="ref-5">5</xref>]. In AMIC, each raw AMI image, like those shown in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>, was cropped tightly around the ear. As a result, the cropping process discarded all partial regions of hair, face, and neck around the ear and kept only the ear within a minimal bounding box as the sole region-of-interest (ROI). <xref ref-type="fig" rid="fig-3">Fig. 3</xref> illustrates six cropped AMIC ear images for a female subject. In this research, the adopted AMIC dataset consists of a cropped version of the same original 600 ear image samples in AMI. Likewise, AMIC was split into 400 and 200 random images for training and testing. This dataset was meant to investigate the variability in ear recognition performance in more challenging scenarios, offering more confusable images and minimal usable biometric information.</p>
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Six more challenging cropped ear images from various perspectives for a female subject from the AMIC dataset</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_68681-fig-3.tif"/>
</fig>
</sec>
<sec id="s3_1_3">
<label>3.1.3</label>
<title>AMI-Based Dataset of Ear Categorical Soft Biometric Labels</title>
<p>The AMI-based dataset of ear categorical soft biometric labels [<xref ref-type="bibr" rid="ref-1">1</xref>] was crowdsourced for all image samples of the 100 subjects in the AMI dataset. It consists of 2900 multi-label annotations provided by 666 annotators. The label data were acquired using categorical and relative labeling forms for 33 fine-grained soft biometric attributes of the human ear grouped into eight zones, as listed in <xref ref-type="table" rid="table-15">Table A1</xref> in <xref ref-type="app" rid="app-1">Appendix A</xref>, where the attributes described with the relative label form are shown in bold, as they are used in the next section for inferring comparative soft biometric labels. Each ear image was annotated by two to five crowdsourced annotators, each of whom assigned the most suitable categorical label from a given label group to describe each attribute. Other than &#x201C;can&#x2019;t see&#x201D;, each label was assigned a representative numerical value ranging from 1 to 8. These values were assigned arbitrarily within categorical label groups, whereas within each relative label group they acted as a bipolar scale, giving a compatible numerical rating for each relative label.</p>

<p>In this research, &#x201C;gender&#x201D; was likewise labeled (male/female) and added to the categorical label data as a global 34th soft biometric attribute, appended in italics at the end of <xref ref-type="table" rid="table-15">Table A1</xref>. The raw label dataset was thoroughly analyzed for data cleansing, outlier removal, and noise mitigation. Moreover, from 1934 relevant annotations, the categorical and relative labels were used to derive a unique categorical soft biometric trait per attribute for each ear image sample in the training dataset. Each trait was deduced as the median value of all categorical/relative labels that several annotators provided for that specific ear image sample. The resulting traits for all attributes were used to compose a 34-value feature vector of categorical soft biometrics (SoftCat) for each of the 400 AMI/AMIC training samples. Also, using 200 annotations randomly selected from the 966 relevant ones, a similar feature vector was composed for each of the 200 testing samples; however, it comprises 34 raw categorical/relative labels provided by a single annotator, reflecting their categorical soft biometrics to be used as a query to recognize a person-of-interest.</p>
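<p>The median-based derivation of a training trait can be illustrated with a short sketch. It rests on two assumptions that the text does not specify: a &#x201C;can&#x2019;t see&#x201D; response is represented by <monospace>None</monospace> and skipped, and ties for an even number of annotators are resolved with the lower median so that the trait stays a valid label code. All function and variable names are hypothetical.</p>
<code language="python"><![CDATA[
```python
from statistics import median_low

CANT_SEE = None  # hypothetical marker for a "can't see" response


def aggregate_trait(labels):
    """Median of the visible label values for one attribute of one image.

    median_low is an assumption; the source only states that the median is used.
    """
    visible = [value for value in labels if value is not CANT_SEE]
    if not visible:
        return CANT_SEE
    return median_low(visible)


def build_softcat(annotations):
    """Combine per-annotator label vectors into one trait vector per image.

    annotations: list of equally long label vectors, one per annotator.
    """
    n_attributes = len(annotations[0])
    return [
        aggregate_trait([annotation[k] for annotation in annotations])
        for k in range(n_attributes)
    ]


# Three annotators, three attributes (a real SoftCat vector has 34 values).
example = [[1, 3, CANT_SEE], [2, 3, 4], [5, 1, CANT_SEE]]
print(build_softcat(example))
```
]]></code>
<p>Using the median rather than the mean keeps each trait robust to a single outlying annotator, which is in line with the noise mitigation described above.</p>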

</sec>
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Proposed Ear Comparative Soft Biometrics</title>
<p>Among dozens of practical soft biometric attributes, many are merely nominal and can only be semantically annotated using a group of categorical labels as absolute types. An example is the &#x201C;scapha shape&#x201D; (A6) trait in <xref ref-type="table" rid="table-15">Table A1</xref> in <xref ref-type="app" rid="app-1">Appendix A</xref>, described using the categorical label group (flat, convex, other), which does not reflect any sortable information. In contrast, some other attributes are ordinal and can be further annotated using a group of relative labels as a scale. An example is the &#x201C;scapha size&#x201D; (A5) trait in <xref ref-type="table" rid="table-15">Table A1</xref>, described using the relative label group (very small, small, medium, large, very large), which reflects sortable information about the degree of strength of this attribute. Such ordinal soft biometric attributes are not only relatively describable but also comparable and consequently rankable, which enables the derivation of comparative soft biometrics with increased discriminatory capabilities for augmenting human recognition.</p>

<p><xref ref-type="table" rid="table-2">Table 2</xref> outlines the proposed comparable and differentiable ear soft biometric attributes and corresponding comparative label groups. Thus, thirteen ordinal attributes were adapted, and their descriptive relative labels were extended to their corresponding comparative label form. The resulting comparative labels represent the degree of comparison/difference between two ears per attribute rather than the degree of strength per attribute for a single ear. Each label group comprises five labels, numerically represented consecutively from the lowest to the highest by a 5-point bipolar scale of the integer values from 1 to 5 as their codes, e.g., (much lower &#x003D; 1, lower &#x003D; 2, similar &#x003D; 3, higher &#x003D; 4, much higher &#x003D; 5). The following subsections explain how comparative labels were inferred for soft biometric attributes, how ear image ranking was enforced using the comparative labels, and how comparative soft biometrics were eventually extracted.</p>
<table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>Proposed comparable ear soft biometric attributes along with their comparative labels</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th>Ear zone</th>
<th>ID</th>
<th align="center">Soft biometric attribute</th>
<th align="center">Comparative labels</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="3">General</td>
<td>A1</td>
<td>Ear shape</td>
<td>[much simpler, simpler, similar, more complex, much more complex]</td>
</tr>
<tr>

<td>A2</td>
<td>Width to length ratio</td>
<td>[much lower, lower, similar, higher, much higher]</td>
</tr>
<tr>

<td>A3</td>
<td>Ear coverage</td>
<td>[much lower, lower, similar, higher, much higher]</td>
</tr>
<tr>
<td>Scapha</td>
<td>A5</td>
<td>Scapha size</td>
<td>[much smaller, smaller, similar, larger, much larger]</td>
</tr>
<tr>
<td rowspan="2">Earlobe</td>
<td>A13</td>
<td>Earlobe length</td>
<td>[much shorter, shorter, similar, longer, much longer]</td>
</tr>
<tr>

<td>A14</td>
<td>Earlobe size</td>
<td>[much smaller, smaller, similar, larger, much larger]</td>
</tr>
<tr>
<td rowspan="2">Tragus</td>
<td>A18</td>
<td>Space between tragus &#x0026; antitragus</td>
<td>[much smaller, smaller, similar, larger, much larger]</td>
</tr>
<tr>

<td>A19</td>
<td>Tragus thicknesses</td>
<td>[much thinner, thinner, similar, thicker, much thicker]</td>
</tr>
<tr>
<td>Ear hair</td>
<td>A22</td>
<td>Hair density</td>
<td>[much lower, lower, similar, higher, much higher]</td>
</tr>
<tr>
<td rowspan="4">Skin</td>
<td>A24</td>
<td>Skin moles</td>
<td>[much fewer, fewer, similar, more, much more]</td>
</tr>
<tr>

<td>A25</td>
<td>Skin spots</td>
<td>[much fewer, fewer, similar, more, much more]</td>
</tr>
<tr>

<td>A26</td>
<td>Skin crusts</td>
<td>[much less, less, similar, more, much more]</td>
</tr>
<tr>

<td>A27</td>
<td>Skin tone</td>
<td>[much lighter, lighter, similar, darker, much darker]</td>
</tr>
</tbody>
</table>
</table-wrap>
<sec id="s3_2_1">
<label>3.2.1</label>
<title>Inferring AMI-Based Ear Comparative Soft Biometric Labels</title>
<p>An automatic ear soft biometric labeling technique was developed to infer comparative label information for AMI/AMIC ear images by comparing their relative label information. For each comparable soft biometric attribute in <xref ref-type="table" rid="table-2">Table 2</xref>, multiple pairwise comparisons were generated between ear images to compare their relative labels and assign the most applicable comparative label accordingly. Such an automated process enables mimicking human perception to detect the nuanced difference between a subject pair compared in a particular attribute [<xref ref-type="bibr" rid="ref-6">6</xref>,<xref ref-type="bibr" rid="ref-27">27</xref>]. <xref ref-type="fig" rid="fig-4">Fig. 4</xref> displays an automatic pairwise comparison between two subjects&#x2019; ear images in the AMI training dataset, resulting in thirteen inferred soft biometric comparative labels.</p>
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>Example of nascent comparative labels for ear soft biometric attributes inferred by the proposed automatic comparative labeling technique based on a pairwise comparison between two subjects&#x2019; ear images (<italic>Ear &#x03B1;</italic>) and (<italic>Ear &#x03B2;</italic>)</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_68681-fig-4.tif"/>
</fig>
<p>Each inferred comparative label describes the degree of similarity/dissimilarity between two relative labels of the same soft biometric attribute for two compared training samples of different subjects in AMI/AMIC. For each pairwise comparison per attribute, associated relative labels were fetched from the AMI-based categorical label dataset and compared to assign a comparative label that best reflects the degree of comparison. This comparative labeling was implemented using a function [<xref ref-type="bibr" rid="ref-6">6</xref>], adapted to suit the ear biometric recognition context. It decides the applicable comparative label code (1&#x2013;5) for each attribute <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow></mml:math></inline-formula>, based on the &#x00B1; difference computed between the two representative numerical values of the relative labels <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>&#x03B1;</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>&#x03B2;</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, describing the compared two subjects&#x2019; ears <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:mi>&#x03B1;</mml:mi></mml:math></inline-formula> and <inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:mi>&#x03B2;</mml:mi></mml:math></inline-formula>, as defined in <xref ref-type="disp-formula" rid="eqn-1">Eq. (1)</xref>. 
Notably, each attribute <inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mi mathvariant="double-struck">A</mml:mi></mml:mrow></mml:math></inline-formula> is one of the thirteen comparable attributes listed in <xref ref-type="table" rid="table-2">Table 2</xref> and comparatively labeled in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>.
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:mrow><mml:msub><mml:mrow><mml:mtext>compare</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow></mml:mrow></mml:msub></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>&#x03B1;</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>&#x03B2;</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mtable columnalign="left" rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mn>1</mml:mn><mml:mspace width="1em"></mml:mspace><mml:mrow><mml:mtext>if</mml:mtext></mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>&#x03B1;</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>&#x03B2;</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2264;</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mn>2</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>2</mml:mn><mml:mspace width="1em"></mml:mspace><mml:mrow><mml:mtext>if</mml:mtext></mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>&#x03B1;</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>&#x03B2;</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>3</mml:mn><mml:mspace width="1em"></mml:mspace><mml:mrow><mml:mtext>if&#x00A0;</mml:mtext></mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>&#x03B1;</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>&#x03B2;</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>4</mml:mn><mml:mspace 
width="1em"></mml:mspace><mml:mrow><mml:mtext>if</mml:mtext></mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>&#x03B1;</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>&#x03B2;</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>5</mml:mn><mml:mspace width="1em"></mml:mspace><mml:mrow><mml:mtext>if</mml:mtext></mml:mrow><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>&#x03B1;</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>&#x03B2;</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2265;</mml:mo><mml:mn>2</mml:mn></mml:mtd></mml:mtr></mml:mtable><mml:mo fence="true" stretchy="true" symmetric="true"></mml:mo></mml:mrow></mml:math></disp-formula></p>
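<p>The piecewise rule in <xref ref-type="disp-formula" rid="eqn-1">Eq. (1)</xref> can be written as a short function. The sketch below assumes that the two relative labels are given as integer codes; the helper <monospace>comparative_label</monospace> and the constant <monospace>SCAPHA_SIZE_LABELS</monospace> are hypothetical names introduced only for illustration, with the label names taken from <xref ref-type="table" rid="table-2">Table 2</xref>.</p>
<code language="python"><![CDATA[
```python
# Comparative label names for "Scapha size" (A5), ordered from code 1 to code 5.
SCAPHA_SIZE_LABELS = ["much smaller", "smaller", "similar", "larger", "much larger"]


def compare(l_alpha, l_beta):
    """Map the signed difference of two relative-label values to a code 1..5."""
    diff = l_alpha - l_beta
    if diff <= -2:
        return 1
    if diff == -1:
        return 2
    if diff == 0:
        return 3
    if diff == 1:
        return 4
    return 5  # diff >= 2


def comparative_label(l_alpha, l_beta, label_names):
    """Return the comparative code together with its label name."""
    code = compare(l_alpha, l_beta)
    return code, label_names[code - 1]


# Ear alpha rated "small" (2) against ear beta rated "large" (4).
print(comparative_label(2, 4, SCAPHA_SIZE_LABELS))
```
]]></code>
<p>Because differences of two or more collapse into the outermost codes, the comparative scale stays at five points even though relative labels may differ by up to four steps.</p>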

<p>For each ear pair <italic>&#x03B1;</italic> and <italic>&#x03B2;</italic> to be compared per attribute and used for ranking, two considerations help ensure the practicality and accuracy of learning a ranking function while also mimicking real-life scenarios. First, each (<italic>&#x03B1;</italic>, <italic>&#x03B2;</italic>) pair is selected only from the training dataset to maintain generalizability, holding out the test dataset as unseen data to be recognized. Second, each pair is chosen randomly, without restrictions on which subject is compared with whom or in what order, to avoid bias and to reveal robustness against such randomness.</p>
</sec>
<sec id="s3_2_2">
<label>3.2.2</label>
<title>Ranking Ears by Comparable Soft Biometric Attribute</title>
<p>The ability to compare ears in a soft biometric attribute further enables a more meaningful capability of ranking them by that attribute. Such ranking by comparable attributes is a pivotal process prior to extracting viable comparative soft biometrics. Therefore, numerous pairwise comparisons by attribute were generated and annotated with comparative labels primarily for ranking purposes. The goal of the ranking process was to anchor those comparisons and use them as constraints of resemblance and contrast to enforce the desired ordering per attribute on all training samples [<xref ref-type="bibr" rid="ref-25">25</xref>,<xref ref-type="bibr" rid="ref-31">31</xref>]. Hence, all ears were ranked by each attribute, resulting in a list of ordered subjects per attribute. Based on the imposed ordering rules, the thirteen relative-based categorical soft biometrics, which describe only the comparable attributes, were mapped to their corresponding comparative soft biometrics. The resultant comparative soft biometrics are refined relative measurements describing a subject&#x2019;s ear in relation to all remaining subjects in the dataset.</p>
<p>This research used a soft-margin RankSVM model [<xref ref-type="bibr" rid="ref-31">31</xref>] that was adapted here to rank ears by attribute. The generated comparison information was employed as similarity/dissimilarity constraints and ordering rules to impose the desired ranking per attribute. The role of the RankSVM model was to learn <inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> as a ranking function for each attribute <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mi mathvariant="double-struck">A</mml:mi></mml:mrow></mml:math></inline-formula>. The ranking function <inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> can be expressed as follows:<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:msubsup><mml:mi>w</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msubsup><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></disp-formula>where <inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> denotes a 13-value feature vector of thirteen relative-based categorical soft
biometrics of the <italic>i</italic>-th ear sample being ranked by attribute <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow></mml:math></inline-formula>. The weight coefficient vector is denoted as <inline-formula id="ieqn-12"><mml:math id="mml-ieqn-12"><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> for the ranking function <inline-formula id="ieqn-13"><mml:math id="mml-ieqn-13"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula>, which is being learned to rank all training ear samples by attribute <inline-formula id="ieqn-14"><mml:math id="mml-ieqn-14"><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow></mml:math></inline-formula>. The vector <inline-formula id="ieqn-15"><mml:math id="mml-ieqn-15"><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> can be efficiently deduced from multiple pairwise comparisons in attribute <inline-formula id="ieqn-16"><mml:math id="mml-ieqn-16"><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow></mml:math></inline-formula> between training samples.</p>
<p>The total number of possible pairwise comparisons per attribute is the number of all two-sample combinations of <italic>n</italic> samples, calculated as <italic>n</italic>!/(2!(<italic>n</italic> &#x2212; 2)!) &#x003D; <italic>n</italic>(<italic>n</italic> &#x2212; 1)/2, which gives 79,800 for the 400 training samples. Here, since the AMI/AMIC training dataset has multiple samples per subject, samples of the same subject did not necessarily need to be compared with each other, as comparisons between samples of different subjects contribute more to discriminative biometric recognition. Moreover, generating only a subset of valid pairwise comparisons can be sufficient to learn a successful ranking function [<xref ref-type="bibr" rid="ref-31">31</xref>]. Therefore, a comprehensive subset of about 25% of all possible combinations was generated per attribute. From this subset, only the comparisons satisfying particular criteria were retained, amounting to about 20% of all possible combinations, to learn a robust ranking function. <xref ref-type="table" rid="table-3">Table 3</xref> gives a statistical synopsis of the generated and included/excluded comparative label data for ranking purposes.</p>
<table-wrap id="table-3">
<label>Table 3</label>
<caption>
<title>Statistics of generated pairwise comparisons and inferred comparative labels for ear soft biometric attributes</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th align="center">Comparative label data statistics</th>
<th align="center">Per attribute</th>
<th align="center">For all attributes</th>
<th align="center">%</th>
</tr>
</thead>
<tbody>
<tr>
<td>Possible pairwise comparisons (combinations)</td>
<td>79,800</td>
<td>1,037,400</td>
<td>100%</td>
</tr>
<tr>
<td>Generated comparisons/comparative labels</td>
<td>19,800</td>
<td>257,400</td>
<td>24.8%</td>
</tr>
<tr>
<td>Comparative labels of dominance and similarity comparisons (Satisfy <italic>&#x03B1;</italic> &#x227B; <italic>&#x03B2;</italic> or <italic>&#x03B1;</italic> &#x223C; <italic>&#x03B2;</italic>) &#x002A; for ranking</td>
<td>13,520&#x2013;18,742</td>
<td>201,907</td>
<td>19.5%</td>
</tr>
<tr>
<td>Excluded unsatisfactory comparisons</td>
<td>1,058&#x2013;6,280</td>
<td>55,493</td>
<td>5.3%</td>
</tr>
</tbody>
</table>
<table-wrap-foot><fn id="table-3fn1" fn-type="other"><p>Note: &#x002A; <italic>&#x03B1;</italic> &#x227B; <italic>&#x03B2;</italic> implies ear sample <italic>&#x03B1;</italic> possesses a higher strength of some attribute than ear sample <italic>&#x03B2;</italic>, and <italic>&#x03B1;</italic> &#x223C; <italic>&#x03B2;</italic> implies <italic>&#x03B1;</italic> and <italic>&#x03B2;</italic> ear samples have a similar strength of some attribute.</p>
</fn>
</table-wrap-foot>
</table-wrap>
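<p>The counts in <xref ref-type="table" rid="table-3">Table 3</xref> can be cross-checked with a few lines of arithmetic. The sketch below only recomputes figures already reported in the table from the 400 training samples and the thirteen comparable attributes; the variable and function names are hypothetical.</p>
<code language="python"><![CDATA[
```python
from math import comb

N_TRAIN = 400       # training samples per dataset
N_ATTRIBUTES = 13   # comparable attributes in Table 2

possible_per_attr = comb(N_TRAIN, 2)               # n! / (2!(n - 2)!)
possible_total = possible_per_attr * N_ATTRIBUTES

generated_per_attr = 19_800                        # reported in Table 3
generated_total = generated_per_attr * N_ATTRIBUTES

retained_total = 201_907                           # reported in Table 3
excluded_total = generated_total - retained_total


def share(count):
    """Percentage of all possible comparisons over all attributes."""
    return 100.0 * count / possible_total


print(possible_per_attr, possible_total, generated_total, excluded_total)
print(round(share(generated_total), 1), round(share(retained_total), 1))
```
]]></code>
<p>The recomputed totals agree with the table: 79,800 possible comparisons per attribute, 1,037,400 over all attributes, 257,400 generated, and 55,493 excluded.</p>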
<p>The criteria for selecting a subset of comparisons for ranking were applied for each attribute <inline-formula id="ieqn-17"><mml:math id="mml-ieqn-17"><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mi mathvariant="double-struck">A</mml:mi></mml:mrow></mml:math></inline-formula>, resulting in different numbers of applicable comparative labels per attribute. These criteria simply check whether each comparison belongs to either the <italic>dominance</italic> set <inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> or the <italic>similarity</italic> set <inline-formula id="ieqn-19"><mml:math id="mml-ieqn-19"><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula>; a comparison that belongs to one of them is included, and any other comparison is left out.
<inline-formula id="ieqn-20"><mml:math id="mml-ieqn-20"><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> was created as a set of constraints comprising all dominance comparisons, each of which is a sorted dissimilar pair of ears (<italic>i</italic>, <italic>j</italic>) <inline-formula id="ieqn-21"><mml:math id="mml-ieqn-21"><mml:mo>&#x2208;</mml:mo><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> satisfying <italic>i</italic> <inline-formula id="ieqn-22"><mml:math id="mml-ieqn-22"><mml:mo>&#x227B;</mml:mo></mml:math></inline-formula> <italic>j</italic>, meaning ear <italic>i</italic> possesses a higher degree of strength in attribute <inline-formula id="ieqn-23"><mml:math id="mml-ieqn-23"><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow></mml:math></inline-formula> than ear <italic>j</italic>. 
Conversely, <inline-formula id="ieqn-24"><mml:math id="mml-ieqn-24"><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> was formed as a constraint set of all similarity comparisons, encompassing every similar pair of ears (<italic>i</italic>, <italic>j</italic>) <inline-formula id="ieqn-25"><mml:math id="mml-ieqn-25"><mml:mo>&#x2208;</mml:mo><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> satisfying <italic>i</italic> <inline-formula id="ieqn-26"><mml:math id="mml-ieqn-26"><mml:mo>&#x223C;</mml:mo></mml:math></inline-formula> <italic>j</italic>, namely, ears <italic>i</italic> and <italic>j</italic> possess the same degree of strength in attribute <inline-formula id="ieqn-27"><mml:math id="mml-ieqn-27"><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow></mml:math></inline-formula>. 
Then, <inline-formula id="ieqn-28"><mml:math id="mml-ieqn-28"><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-29"><mml:math id="mml-ieqn-29"><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> were used as pairwise constraints to enforce the desirable ordering and accordingly derive a weight coefficient vector <inline-formula id="ieqn-30"><mml:math id="mml-ieqn-30"><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> for a ranking function <inline-formula id="ieqn-31"><mml:math id="mml-ieqn-31"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula>, as formulated in <xref ref-type="disp-formula" rid="eqn-3">Eq. (3)</xref>. 
For each <inline-formula id="ieqn-32"><mml:math id="mml-ieqn-32"><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula>, the weight coefficients were learned through the iterative constraint-based tuning procedure in the ranking SVM, enforced by the dominance and similarity comparisons.<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:mtable columnalign="left left left left" rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mrow><mml:mrow><mml:mtext>minimize:</mml:mtext></mml:mrow></mml:mrow></mml:mtd><mml:mtd><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msubsup><mml:mi>w</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msup><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:mi>C</mml:mi><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mo>&#x2061;</mml:mo><mml:msubsup><mml:mrow><mml:mi>&#x03BE;</mml:mi></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mrow><mml:mtext>subject to:</mml:mtext></mml:mrow></mml:mrow></mml:mtd><mml:mtd><mml:msubsup><mml:mi>w</mml:mi><mml:mrow><mml:mrow><mml:mi
mathvariant="double-struck">a</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2265;</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msub><mml:mi>&#x03BE;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>;</mml:mo><mml:mi mathvariant="normal">&#x2200;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow></mml:mrow></mml:msub></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd></mml:mtd><mml:mtd><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:msubsup><mml:mi>w</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>T</mml:mi></mml:mrow></mml:msubsup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>|</mml:mo></mml:mrow><mml:mo>&#x2264;</mml:mo><mml:mrow><mml:msub><mml:mi>&#x03BE;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>;</mml:mo><mml:mi mathvariant="normal">&#x2200;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mrow><mml:mrow><mml:mi 
mathvariant="double-struck">a</mml:mi></mml:mrow></mml:mrow></mml:msub></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd></mml:mtd><mml:mtd><mml:mrow><mml:msub><mml:mi>&#x03BE;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x2265;</mml:mo><mml:mn>0</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula>where <italic>C</italic> is the hyperparameter to balance maximizing the margin vs. minimizing the error, which decreases as the ranking better complies with enforced constraints. Unlike standard SVMs, RankSVM aims to separate the differences between data points rather than separating the data points themselves. In this context, the margin is the smallest difference between the nearest two ranks among all ranks of the ear sample points. <inline-formula id="ieqn-33"><mml:math id="mml-ieqn-33"><mml:msub><mml:mi>&#x03BE;</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> denotes the slack variable, gauging the errors in ranking the <italic>i</italic>-th relative to the <italic>j</italic>-th ear sample.</p>
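<p>A minimal sketch of how a weight vector can be learned from dominance and similarity constraints is given below. It is not the solver of [<xref ref-type="bibr" rid="ref-31">31</xref>]: it assumes that the slack variables of <xref ref-type="disp-formula" rid="eqn-3">Eq. (3)</xref> are eliminated, giving a squared-hinge penalty for dominance pairs and a squared penalty for similarity pairs, both weighted by the hyperparameter <italic>C</italic>, and it minimizes this objective with plain gradient descent. The function names, step size, number of iterations, and toy data are all hypothetical.</p>
<code language="python"><![CDATA[
```python
def dot(u, v):
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(u, v))


def fit_rank_weights(X, dominance, similarity, C=1.0, lr=0.05, epochs=2000):
    """Gradient descent on the unconstrained squared-slack ranking objective.

    Objective: 0.5 * ||w||^2
               + C * sum over D of max(0, 1 - w.(x_i - x_j))^2
               + C * sum over S of (w.(x_i - x_j))^2
    dominance: pairs (i, j) meaning sample i ranks above sample j.
    similarity: pairs (i, j) meaning samples i and j rank alike.
    """
    dim = len(X[0])
    w = [0.0] * dim
    for _ in range(epochs):
        grad = list(w)  # gradient of the 0.5 * ||w||^2 term
        for i, j in dominance:
            d = [a - b for a, b in zip(X[i], X[j])]
            slack = max(0.0, 1.0 - dot(w, d))
            for k in range(dim):
                grad[k] -= 2.0 * C * slack * d[k]
        for i, j in similarity:
            d = [a - b for a, b in zip(X[i], X[j])]
            s = dot(w, d)
            for k in range(dim):
                grad[k] += 2.0 * C * s * d[k]
        w = [wk - lr * gk for wk, gk in zip(w, grad)]
    return w


def rank_score(w, x):
    """Ranking function r(x) = w^T x, as in Eq. (2)."""
    return dot(w, x)


# Toy data: sample 2 dominates 1, sample 1 dominates 0, samples 2 and 3 are similar.
X = [[1.0, 0.0], [2.0, 1.0], [3.0, 0.0], [3.0, 1.0]]
dominance_pairs = [(1, 0), (2, 1)]
similarity_pairs = [(2, 3)]
w = fit_rank_weights(X, dominance_pairs, similarity_pairs)
scores = [rank_score(w, x) for x in X]
print(w, scores)
```
]]></code>
<p>On the toy data, the learned weights place almost all importance on the first feature, so the scores respect both dominance pairs and give the similar pair nearly equal ranks. For real 13-value feature vectors and thousands of constraints, the step size would need to be chosen much more carefully, or a dedicated solver used instead.</p>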
</sec>
<sec id="s3_2_3">
<label>3.2.3</label>
<title>Extracting Ear Comparative Soft Biometrics</title>
<p>This research proposed and extracted a novel set of thirteen ear comparative soft biometrics (SoftCmp) to investigate their capabilities in augmenting ear recognition. These comparative traits were automatically derived after inferring comparative labels and then using them in learning to rank by attribute, as described in <xref ref-type="sec" rid="s3_2_1">Sections 3.2.1</xref> and <xref ref-type="sec" rid="s3_2_2">3.2.2</xref>. Each learned ranking function <inline-formula id="ieqn-34"><mml:math id="mml-ieqn-34"><mml:msub><mml:mi>r</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> per attribute and its associated weight coefficient vector <inline-formula id="ieqn-35"><mml:math id="mml-ieqn-35"><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> were used to enforce explicit ordering for all training ear samples by attribute <inline-formula id="ieqn-36"><mml:math id="mml-ieqn-36"><mml:mrow><mml:mi mathvariant="double-struck">a</mml:mi></mml:mrow></mml:math></inline-formula>. This way, each 13-value feature vector of relative-based categorical soft biometrics, derived in <xref ref-type="sec" rid="s3_1_3">Section 3.1.3</xref>, was used to produce a comparative soft biometric trait per attribute using <xref ref-type="disp-formula" rid="eqn-2">Eq. (2)</xref>. Then, the resulting traits were used to compose a new 13-value feature vector of comparative soft biometrics (SoftCmp) for each AMI/AMIC training and testing ear sample. Moreover, further composite soft biometric feature vectors were derived to investigate the full potential of ear recognition augmentation by fusing both categorical and comparative soft biometrics.
Each 34-value categorical feature vector (SoftCat) was concatenated with a corresponding 13-value comparative feature vector (SoftCmp) to compose a new 47-value feature vector combining categorical and comparative soft biometrics (SoftCat&#x0026;Cmp). <xref ref-type="table" rid="table-4">Table 4</xref> elaborates on all three proposed and extracted feature vectors of soft biometric traits used to augment ear identification/verification.</p>
<table-wrap id="table-4">
<label>Table 4</label>
<caption>
<title>Proposed categorical and comparative soft biometric traits to augment ear recognition</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Feature vector</th>
<th>No. of traits</th>
<th>Soft biometric trait description</th>
</tr>
</thead>
<tbody>
<tr>
<td>SoftCat</td>
<td>34</td>
<td>34 categorical soft biometrics, 13 of which are relative-based</td>
</tr>
<tr>
<td>SoftCmp</td>
<td>13</td>
<td>13 comparative soft biometrics as refined relative measures</td>
</tr>
<tr>
<td>SoftCat&#x0026;Cmp</td>
<td>47</td>
<td>Fusion of 34 categorical concatenated with 13 comparative soft biometrics</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s3_2_4">
<label>3.2.4</label>
<title>Significance Analysis of Ear Comparative Soft Biometrics</title>
<p>The analysis of variance (ANOVA) test was used as a standard statistical tool to analyze the viability of the newly proposed comparative soft biometric traits in terms of discriminability and separability, reflecting how significantly each trait contributes to person recognition. <xref ref-type="table" rid="table-5">Table 5</xref> lists the comparative soft biometrics in descending order of significance, i.e., their capability to differentiate between groups and contribute to successful person identification or verification. All proposed comparative soft biometrics attained large F-ratios with correspondingly small <italic>p</italic>-values, which are significant at the <italic>p</italic> &#x003C; 0.05 level. These results emphasize their potential as discriminative soft biometric traits for augmenting hard biometric traits and improving recognition accuracy. Notably, &#x201C;Hair density&#x201D; (A22), &#x201C;Earlobe length&#x201D; (A13), and &#x201C;Earlobe size&#x201D; (A14) were the most significant traits, signifying their efficacy for person recognition purposes.</p>
<table-wrap id="table-5">
<label>Table 5</label>
<caption>
<title>ANOVA-based viability and significance analysis of ear comparative soft biometrics</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>ID</th>
<th>Soft biometric attribute</th>
<th>F-ratio</th>
<th><italic>p</italic>-value</th>
</tr>
</thead>
<tbody>
<tr>
<td>A22</td>
<td>Hair density</td>
<td>849.06</td>
<td>3.70e&#x2212;05</td>
</tr>
<tr>
<td>A13</td>
<td>Earlobe length</td>
<td>684.85</td>
<td>9.10e&#x2212;05</td>
</tr>
<tr>
<td>A14</td>
<td>Earlobe size</td>
<td>546.91</td>
<td>1.97e&#x2212;04</td>
</tr>
<tr>
<td>A1</td>
<td>Ear shape</td>
<td>167.51</td>
<td>1.96e&#x2212;03</td>
</tr>
<tr>
<td>A25</td>
<td>Skin spots</td>
<td>123.37</td>
<td>2.67e&#x2212;03</td>
</tr>
<tr>
<td>A5</td>
<td>Scapha size</td>
<td>121.35</td>
<td>2.71e&#x2212;03</td>
</tr>
<tr>
<td>A18</td>
<td>Space between tragus &#x0026; antitragus</td>
<td>119.86</td>
<td>2.74e&#x2212;03</td>
</tr>
<tr>
<td>A27</td>
<td>Skin tone</td>
<td>114.40</td>
<td>2.85e&#x2212;03</td>
</tr>
<tr>
<td>A26</td>
<td>Skin crusts</td>
<td>68.25</td>
<td>4.09e&#x2212;03</td>
</tr>
<tr>
<td>A19</td>
<td>Tragus thicknesses</td>
<td>46.84</td>
<td>4.94e&#x2212;03</td>
</tr>
<tr>
<td>A2</td>
<td>Width to length ratio</td>
<td>32.73</td>
<td>5.67e&#x2212;03</td>
</tr>
<tr>
<td>A24</td>
<td>Skin moles</td>
<td>13.06</td>
<td>7.18e&#x2212;03</td>
</tr>
<tr>
<td>A3</td>
<td>Ear coverage</td>
<td>7.98</td>
<td>7.78e&#x2212;03</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The F-ratio, together with the resultant <italic>p</italic>-value, was deduced for each trait using the one-way ANOVA as follows [<xref ref-type="bibr" rid="ref-8">8</xref>]:<disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:mrow><mml:mtext>F-ratio</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mrow><mml:mtext>Total between</mml:mtext></mml:mrow><mml:mrow><mml:mstyle displaystyle="false" scriptlevel="0"><mml:mtext>-</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mtext>group variance</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mtext>Total within</mml:mtext></mml:mrow><mml:mrow><mml:mstyle displaystyle="false" scriptlevel="0"><mml:mtext>-</mml:mtext></mml:mstyle></mml:mrow><mml:mrow><mml:mtext>group variance</mml:mtext></mml:mrow></mml:mrow></mml:mfrac><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:munder><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mover><mml:mi>X</mml:mi><mml:mo accent="false">&#x00AF;</mml:mo></mml:mover><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:mover><mml:mi>X</mml:mi><mml:mo accent="false">&#x00AF;</mml:mo></mml:mover><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mspace width="negativethinmathspace"></mml:mspace><mml:mspace width="negativethinmathspace"></mml:mspace><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>K</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:munder><mml:mo 
stretchy="false">(</mml:mo><mml:mrow><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mrow><mml:msub><mml:mover><mml:mi>X</mml:mi><mml:mo accent="false">&#x00AF;</mml:mo></mml:mover><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:msup><mml:mo stretchy="false">)</mml:mo><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>N</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi>K</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac></mml:math></disp-formula>where each group, in the human recognition context, comprises all samples of the same subject (person). <inline-formula id="ieqn-37"><mml:math id="mml-ieqn-37"><mml:msub><mml:mover><mml:mi>X</mml:mi><mml:mo accent="false">&#x00AF;</mml:mo></mml:mover><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the mean of the <italic>i</italic>-th group&#x2019;s observations, <inline-formula id="ieqn-38"><mml:math id="mml-ieqn-38"><mml:mover><mml:mi>X</mml:mi><mml:mo accent="false">&#x00AF;</mml:mo></mml:mover></mml:math></inline-formula> is the overall mean of all groups&#x2019; observations, and <inline-formula id="ieqn-39"><mml:math id="mml-ieqn-39"><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the number of observations in the <italic>i</italic>-th group, which weights that group&#x2019;s squared deviation from the overall mean in the <italic>between-group</italic> sum. <inline-formula id="ieqn-40"><mml:math id="mml-ieqn-40"><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> denotes the <italic>j</italic>-th trait observation of the <italic>i</italic>-th group. <italic>K</italic> is the number of groups (persons), and <italic>N</italic> is the total number of observations across all groups.
Hence, the degrees of freedom are (<inline-formula id="ieqn-41"><mml:math id="mml-ieqn-41"><mml:mi>K</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula>) at the <italic>between-group</italic> level and (<inline-formula id="ieqn-42"><mml:math id="mml-ieqn-42"><mml:mi>N</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mi>K</mml:mi></mml:math></inline-formula>) at the <italic>within-group</italic> level.</p>
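<p>The F-ratio of Eq. (4) can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the implementation used in this research: the function name <monospace>anova_f_ratio</monospace> and the toy data are hypothetical, and each inner list is assumed to hold one subject&#x2019;s observations of a single trait.</p>
<code language="python"><![CDATA[
```python
from statistics import fmean


def anova_f_ratio(groups):
    """One-way ANOVA F-ratio following Eq. (4).

    groups: list of lists; each inner list holds one subject's
    observations of a single soft biometric trait.
    """
    k = len(groups)                            # K: number of groups (subjects)
    n_total = sum(len(g) for g in groups)      # N: total number of observations
    grand_mean = fmean(x for g in groups for x in g)
    group_means = [fmean(g) for g in groups]

    # Between-group sum of squares, each group weighted by its size n_i.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-group sum of squares around each group's own mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    # Divide each sum by its degrees of freedom, then take the ratio.
    return (ss_between / (k - 1)) / (ss_within / (n_total - k))


# Toy data (hypothetical): three subjects with three samples each.
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_ratio = anova_f_ratio(toy_groups)
```
]]></code>
<p>For the toy data, the between-group mean square is 26/2 and the within-group mean square is 6/6, so the sketch yields an F-ratio of 13.</p>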
</sec>
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>Pre-Trained Deep Learning Models for Biometric Feature Extraction</title>
<p>Convolutional neural network (CNN) architectures are well established as versatile deep learning models for various image analysis and computer vision purposes [<xref ref-type="bibr" rid="ref-4">4</xref>,<xref ref-type="bibr" rid="ref-32">32</xref>&#x2013;<xref ref-type="bibr" rid="ref-34">34</xref>]. Pre-trained CNN models can be effectively used as deep-feature extractors for image-based recognition tasks such as human identification/verification. Since they have already been trained on large-scale datasets, their pre-trained weights can be used to initialize the models when transferring learning to a new task, instead of random initialization [<xref ref-type="bibr" rid="ref-5">5</xref>]. In this research, VGG-19 and ResNet-50 models were adapted as deep-feature extractors. The extracted vision-based deep features composed two feature vectors for ear biometric recognition, denoted as VGG19 and ResNet50. The two feature vectors represented the ear hard biometrics to be fused with and augmented by the proposed ear soft biometrics (i.e., SoftCat, SoftCmp, and SoftCat&#x0026;Cmp).</p>
<sec id="s3_3_1">
<label>3.3.1</label>
<title>VGG-19 as Deep-Feature Extractor</title>
<p>The VGG-19 model is a CNN-based deep learning architecture from the Visual Geometry Group (VGG) family, with a uniform structure of nineteen weight layers: sixteen convolutional (conv) and three fully connected (FC) [<xref ref-type="bibr" rid="ref-33">33</xref>]. It was successfully pre-trained for generic image recognition on the sizable ImageNet dataset. Thus, it supports effective transfer learning with slight fine-tuning, even on a small dataset for a new target task [<xref ref-type="bibr" rid="ref-5">5</xref>]. This research adopted the strategy of using the pre-trained VGG-19 as a deep-feature extractor. Hence, the standard architecture was adapted to suit the context of augmenting deep-feature-based ear recognition with soft biometrics. <xref ref-type="table" rid="table-6">Table 6a</xref> presents the architecture components of the VGG-19 adapted and used as a deep-feature extractor.</p>
<table-wrap id="table-6">
<label>Table 6</label>
<caption>
<title>The VGG-19 and ResNet-50 architectures used for hard biometric deep-feature extraction</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th colspan="3">(a) VGG-19 deep-feature extractor</th>
<th colspan="3">(b) ResNet-50 deep-feature extractor</th>
</tr>
<tr>
<th align="center">Layer type</th>
<th align="center">Filter size, depth</th>
<th align="center">Output size</th>
<th align="center">Layer type</th>
<th align="center">Filter size, depth</th>
<th align="center">Output size</th>
</tr>
</thead>
<tbody>
<tr>
<td>Input</td>
<td>&#x2013;</td>
<td>492 &#x00D7; 702 RGB</td>
<td>Input</td>
<td>&#x2013;</td>
<td>492 &#x00D7; 702 RGB</td>
</tr>
<tr>
<td>Block1_conv(1&#x2013;2)</td>
<td>[3 &#x00D7; 3 conv, 64] &#x00D7; 2</td>
<td>492 &#x00D7; 702</td>
<td>Conv1</td>
<td>7 &#x00D7; 7 conv, 64</td>
<td>246 &#x00D7; 351</td>
</tr>
<tr>
<td>Max pool</td>
<td>&#x2013;</td>
<td>246 &#x00D7; 351</td>
<td>Max pool</td>
<td>&#x2013;</td>
<td>123 &#x00D7; 176</td>
</tr>
<tr>
<td rowspan="2">Block2_conv(1&#x2013;2)</td>
<td rowspan="2">[3 &#x00D7; 3 conv, 128] &#x00D7; 2</td>
<td rowspan="2">246 &#x00D7; 351</td>
<td></td>
<td rowspan="3"><inline-formula id="ieqn-43"><mml:math id="mml-ieqn-43"><mml:mrow><mml:mo>[</mml:mo><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mtext>&#x00A0;conv</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mn>64</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mn>3</mml:mn></mml:mrow><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mn>3</mml:mn></mml:mrow><mml:mrow><mml:mtext>&#x00A0;conv</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mn>64</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mtext>&#x00A0;conv</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mn>256</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable><mml:mo>]</mml:mo></mml:mrow><mml:mo>&#x00D7;</mml:mo><mml:mn>3</mml:mn></mml:math></inline-formula></td>
<td rowspan="3">123 &#x00D7; 176</td>
</tr>
<tr>
<td>Conv2_block<break/>(1&#x2013;3)</td>
</tr>
<tr>
<td>Max pool</td>
<td>&#x2013;</td>
<td>123 &#x00D7; 175</td>
<td></td>
</tr>
<tr>
<td rowspan="2">Block3_conv(1&#x2013;4)</td>
<td rowspan="2">[3 &#x00D7; 3 conv, 256] &#x00D7; 4</td>
<td rowspan="2">123 &#x00D7; 175</td>
<td></td>
<td rowspan="3"><inline-formula id="ieqn-44"><mml:math id="mml-ieqn-44"><mml:mrow><mml:mo>[</mml:mo><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mtext>&#x00A0;conv</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mn>128</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mn>3</mml:mn></mml:mrow><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mn>3</mml:mn></mml:mrow><mml:mrow><mml:mtext>&#x00A0;conv</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mn>128</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mtext>&#x00A0;conv</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mn>512</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable><mml:mo>]</mml:mo></mml:mrow><mml:mo>&#x00D7;</mml:mo><mml:mn>4</mml:mn></mml:math></inline-formula></td>
<td rowspan="3">62 &#x00D7; 88</td>
</tr>
<tr>
<td>Conv3_block<break/>(1&#x2013;4)</td>
</tr>
<tr>
<td>Max pool</td>
<td>&#x2013;</td>
<td>61 &#x00D7; 87</td>
<td></td>
</tr>
<tr>
<td rowspan="2">Block4_conv(1&#x2013;4)</td>
<td rowspan="2">[3 &#x00D7; 3 conv, 512] &#x00D7; 4</td>
<td rowspan="2">61 &#x00D7; 87</td>
<td></td>
<td rowspan="3"><inline-formula id="ieqn-45"><mml:math id="mml-ieqn-45"><mml:mrow><mml:mo>[</mml:mo><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mtext>&#x00A0;conv</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mn>256</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mn>3</mml:mn></mml:mrow><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mn>3</mml:mn></mml:mrow><mml:mrow><mml:mtext>&#x00A0;conv</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mn>256</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mtext>&#x00A0;conv</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mn>1024</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable><mml:mo>]</mml:mo></mml:mrow><mml:mo>&#x00D7;</mml:mo><mml:mn>6</mml:mn></mml:math></inline-formula></td>
<td rowspan="3">31 &#x00D7; 44</td>
</tr>
<tr>
<td>Conv4_block<break/>(1&#x2013;6)</td>
</tr>
<tr>
<td>Max pool</td>
<td>&#x2013;</td>
<td>30 &#x00D7; 43</td>
<td></td>
</tr>
<tr>
<td rowspan="2">Block5_conv(1&#x2013;4)</td>
<td rowspan="2">[3 &#x00D7; 3 conv, 512] &#x00D7; 4</td>
<td rowspan="2">30 &#x00D7; 43</td>
<td></td>
<td rowspan="3"><inline-formula id="ieqn-46"><mml:math id="mml-ieqn-46"><mml:mrow><mml:mo>[</mml:mo><mml:mtable rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mtext>&#x00A0;conv</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mn>512</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mn>3</mml:mn></mml:mrow><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mn>3</mml:mn></mml:mrow><mml:mrow><mml:mtext>&#x00A0;conv</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mn>512</mml:mn></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mtext>&#x00A0;conv</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mn>2048</mml:mn></mml:mrow></mml:mtd></mml:mtr></mml:mtable><mml:mo>]</mml:mo></mml:mrow><mml:mo>&#x00D7;</mml:mo><mml:mn>3</mml:mn></mml:math></inline-formula></td>
<td rowspan="3">16 &#x00D7; 22</td>
</tr>
<tr>
<td>Conv5_block<break/>(1&#x2013;3)</td>
</tr>
<tr>
<td>Max pool</td>
<td>&#x2013;</td>
<td>15 &#x00D7; 21</td>
<td></td>
</tr>
<tr>
<td>Global average pool</td>
<td>&#x2013;</td>
<td>512</td>
<td>Global average<break/> pool</td>
<td>&#x2013;</td>
<td>2048</td>
</tr>
</tbody>
</table>
</table-wrap>
  
<p>The ImageNet-based pre-trained weights were used for model initialization. The original input size 224 &#x00D7; 224 was replaced by the full size 492 &#x00D7; 702 of the raw RGB ear data image in the AMI dataset, as it was empirically found to be the optimal size, maintaining the most informative vision-based deep features for biometric recognition on both AMI and AMIC. As such, the AMIC&#x2019;s cropped ear images were rescaled to fit the required input size. The model used 3 &#x00D7; 3 convolution kernels with a stride of 1 and 2 &#x00D7; 2 max pooling operations with a stride of 2 across all five architecture blocks. A one-pixel zero padding was enforced to preserve the spatial dimensions of the output feature map after each convolution. A rectified linear unit (ReLU) activation function was used with each convolution. The last three FC layers were replaced with a global average pooling layer to derive a fixed-size deep-feature output of 512 values, representing the VGG19 feature vector of hard biometrics for each image in both AMI/AMIC training and testing datasets.</p>
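<p>The output sizes listed for the VGG-19 extractor follow from the stated kernel, stride, and padding settings. The sketch below, with hypothetical helper names, applies the usual floor-mode output-size rule to both input dimensions; it assumes unpadded 2 &#x00D7; 2 pooling that rounds odd sizes down, which is consistent with the listed sizes but is not stated explicitly in the text.</p>
<code language="python"><![CDATA[
```python
def conv_out(size, kernel, stride, padding):
    """Output length along one axis for a conv/pool window (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def vgg19_feature_map_sizes(dims, num_blocks=5):
    """Spatial size after each of the five VGG-19 blocks and its max pooling."""
    sizes = []
    for _ in range(num_blocks):
        # 3x3 conv, stride 1, one-pixel zero padding: size is preserved.
        dims = tuple(conv_out(d, kernel=3, stride=1, padding=1) for d in dims)
        # 2x2 max pooling, stride 2, no padding (assumed): size is halved,
        # rounding down when the size is odd.
        dims = tuple(conv_out(d, kernel=2, stride=2, padding=0) for d in dims)
        sizes.append(dims)
    return sizes


vgg_sizes = vgg19_feature_map_sizes((492, 702))
# Global average pooling then reduces the final maps to one value per
# channel, i.e., 512 values for the last VGG-19 block.
```
]]></code>
<p>Starting from 492 &#x00D7; 702, the five pooling steps give 246 &#x00D7; 351, 123 &#x00D7; 175, 61 &#x00D7; 87, 30 &#x00D7; 43, and 15 &#x00D7; 21.</p>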
</sec>
<sec id="s3_3_2">
<label>3.3.2</label>
<title>ResNet-50 as Deep-Feature Extractor</title>
<p>ResNet-50 is a 50-layer version of the residual network (ResNet), a CNN-based deep learning architecture with 49 convolutional layers and one fully connected layer [<xref ref-type="bibr" rid="ref-32">32</xref>]. ResNet-50 was also intensively pre-trained using ImageNet for generic image recognition tasks. It allows learning transfer and generalization by utilizing its reliable pre-trained weights for target task-specific network initialization or fine-tuning. The inherent power of ResNets lies in exploiting increased depth for more perceptive feature extraction while addressing likely vanishing or exploding gradient issues [<xref ref-type="bibr" rid="ref-12">12</xref>], since ResNets are equipped with skip connections that avoid possible loss of image information as the network depth increases [<xref ref-type="bibr" rid="ref-14">14</xref>]. This research adjusted the standard ResNet-50 architecture to transfer its learned deep features to ear biometric recognition to be augmented by soft biometrics. <xref ref-type="table" rid="table-6">Table 6b</xref> presents the ResNet-50 architecture and its modules adapted and employed in this research as a ResNet-based deep-feature extractor.</p>

<p>The adjusted ResNet-based model for AMI and AMIC was likewise initialized using the ImageNet-based pre-trained weights, and the original ResNet-50 input size was changed from 224 &#x00D7; 224 to 492 &#x00D7; 702 because this size was again empirically found to be optimal for representing the ear biometric signature in this research. Accordingly, the AMIC&#x2019;s variable-size cropped images were rescaled to the new input size. As shown in <xref ref-type="table" rid="table-6">Table 6b</xref>, the adapted model used a 7 &#x00D7; 7 kernel with a stride of 2 and three-pixel zero padding in the first conv layer. In the subsequent multi-block conv layers, each block of three convolutions used 1 &#x00D7; 1 kernels with a stride of 1 for the top and bottom convolutions and a 3 &#x00D7; 3 kernel with one-pixel zero padding for the middle convolution. A stride of 1 was used for all 3 &#x00D7; 3 kernels except in Conv3_block1, Conv4_block1, and Conv5_block1, where a stride of 2 was used to downsample the feature maps. The last FC layer was substituted with a global average pooling layer, which derives a fixed-size deep-feature output of 2048 values, representing the ResNet50 feature vector of hard biometrics for each training and testing ear image sample in AMI/AMIC.</p>
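<p>The ResNet-50 output sizes can be traced in the same way from the settings described above. In this sketch the helper names are hypothetical, and the max pooling window (3 &#x00D7; 3, stride 2, one-pixel padding) is an assumption based on the common ResNet configuration, since the text does not state it.</p>
<code language="python"><![CDATA[
```python
def conv_out(size, kernel, stride, padding):
    """Output length along one axis for a conv/pool window (floor mode)."""
    return (size + 2 * padding - kernel) // stride + 1


def resnet50_feature_map_sizes(dims):
    """Spatial size after each ResNet-50 stage for a given input size."""
    trace = {}
    # Conv1: 7x7 kernel, stride 2, three-pixel zero padding (from the text).
    dims = tuple(conv_out(d, kernel=7, stride=2, padding=3) for d in dims)
    trace["conv1"] = dims
    # Max pool: 3x3 window, stride 2, one-pixel padding (ASSUMED setting).
    dims = tuple(conv_out(d, kernel=3, stride=2, padding=1) for d in dims)
    trace["max_pool"] = dims
    # Conv2 stage: every 3x3 convolution uses stride 1, so size is kept.
    trace["conv2"] = dims
    for stage in ("conv3", "conv4", "conv5"):
        # The first block of each stage uses a stride-2 3x3 convolution
        # with one-pixel padding; the remaining blocks keep the size.
        dims = tuple(conv_out(d, kernel=3, stride=2, padding=1) for d in dims)
        trace[stage] = dims
    return trace


resnet_sizes = resnet50_feature_map_sizes((492, 702))
```
]]></code>
<p>Under these assumptions the trace gives 246 &#x00D7; 351 after Conv1, 123 &#x00D7; 176 after max pooling and Conv2, then 62 &#x00D7; 88, 31 &#x00D7; 44, and 16 &#x00D7; 22 after Conv3 to Conv5.</p>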

</sec>
</sec>
<sec id="s3_4">
<label>3.4</label>
<title>Biometric Trait Normalization and Feature-Level Fusion</title>
<p>Using the proposed hard and soft biometric feature extraction methods, for each ear image data sample in both AMI and AMIC datasets, all hard biometric deep features and corresponding soft biometrics were extracted and normalized using min-max normalization to rescale all biometric trait values between zero and one. Across the entire experimental work, the same procedure was applied to all AMI, AMIC, and soft biometrics original datasets by splitting each into two disjoint 66.67% training and 33.33% test subsets.</p>
<p>To augment deep-feature-based ear recognition with soft biometrics, feature-level fusion was adopted to investigate the full capabilities of the proposed soft biometrics. Feature-level fusion was selected over other fusion strategies, e.g., classifier-level and decision-level, because it allows the most interaction and synergy between the augmented hard biometrics and the augmenting soft biometrics in achieving enhanced recognition performance. Moreover, it enables feature viability analysis, benefits from the best qualities of the biometric modalities, and has often proven to be an effective strategy for integrating diverse biometric traits or modalities for significantly improved recognition [<xref ref-type="bibr" rid="ref-34">34</xref>].</p>
<p>Consequently, eight biometric templates were composed for each ear image to be ready for use in further training and testing for ear identification and verification. Two represented unaugmented standard hard biometric templates of VGG-based and ResNet-based deep-feature vectors, which were used as benchmark baselines for performance comparison purposes. The remaining six represented soft biometric-based augmented templates, where each standard template was concatenated with each of the three proposed soft biometric feature vectors, SoftCat, SoftCmp, and SoftCat&#x0026;Cmp. <xref ref-type="table" rid="table-7">Table 7</xref> shows VGG and ResNet unaugmented standard ear biometric templates and their soft-based counterparts augmented by categorical and comparative soft biometrics. These two VGG-based and ResNet-based traditional deep-feature approaches and their six augmented counterparts were evaluated and compared in eight ear biometric identification and verification experiments that will be described in <xref ref-type="sec" rid="s4">Section 4</xref>.</p>
<table-wrap id="table-7">
<label>Table 7</label>
<caption>
<title>Composition of two standard and six augmented-by-soft ear biometric templates</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th align="center">Approach</th>
<th align="center">Trait type</th>
<th align="center">Biometric template (Feature vectors) composition of deep/hard &#x0026; soft traits</th>
<th align="center">No. of traits</th>
</tr>
</thead>
<tbody>
<tr>
<td>VGG19</td>
<td>Standard</td>
<td>512 normalized VGG-19 deep-features</td>
<td>512</td>
</tr>
<tr>
<td>VGG19 &#x002B; SoftCat</td>
<td>Augmented</td>
<td>512 normalized VGG-19 &#x002B; 34 normalized categorical soft traits</td>
<td>546</td>
</tr>
<tr>
<td>VGG19 &#x002B; SoftCmp</td>
<td>Augmented</td>
<td>512 normalized VGG-19 &#x002B; 13 normalized comparative soft traits</td>
<td>525</td>
</tr>
<tr>
<td>VGG19 &#x002B; SoftCat&#x0026;Cmp</td>
<td>Augmented</td>
<td>512 normalized VGG-19 &#x002B; 34 normalized categorical &#x002B; 13 normalized comparative soft traits</td>
<td>559</td>
</tr>
<tr>
<td>ResNet50</td>
<td>Standard</td>
<td>2048 normalized ResNet-50 deep-features</td>
<td>2048</td>
</tr>
<tr>
<td>ResNet50 &#x002B; SoftCat</td>
<td>Augmented</td>
<td>2048 normalized ResNet-50 &#x002B; 34 normalized categorical soft traits</td>
<td>2082</td>
</tr>
<tr>
<td>ResNet50 &#x002B; SoftCmp</td>
<td>Augmented</td>
<td>2048 normalized ResNet-50 &#x002B; 13 normalized comparative soft traits</td>
<td>2061</td>
</tr>
<tr>
<td>ResNet50 &#x002B; SoftCat&#x0026;Cmp</td>
<td>Augmented</td>
<td>2048 normalized ResNet-50 &#x002B; 34 normalized categorical &#x002B; 13 normalized comparative soft traits</td>
<td>2095</td>
</tr>
</tbody>
</table>
</table-wrap>
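<p>Min-max normalization and concatenation-based feature-level fusion can be sketched as follows. The function names and placeholder vectors are hypothetical, the sketch normalizes each feature column over whatever rows it is given, and it does not reproduce how normalization ranges were handled across training and test subsets in this research.</p>
<code language="python"><![CDATA[
```python
def min_max_normalize(rows):
    """Rescale every feature column of `rows` to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        # A constant column has no spread; map it to 0.0 to avoid dividing by zero.
        [(v - lo) / (hi - lo) if hi > lo else 0.0
         for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


def fuse(*feature_vectors):
    """Feature-level fusion: concatenate vectors into one biometric template."""
    fused = []
    for vec in feature_vectors:
        fused.extend(vec)
    return fused


# Placeholder vectors with the feature counts given in the text.
SIZES = {"VGG19": 512, "ResNet50": 2048, "SoftCat": 34, "SoftCmp": 13}
SOFT_SETS = {
    "SoftCat": ("SoftCat",),
    "SoftCmp": ("SoftCmp",),
    "SoftCat&Cmp": ("SoftCat", "SoftCmp"),
}

template_sizes = {}
for hard in ("VGG19", "ResNet50"):
    template_sizes[hard] = SIZES[hard]
    for label, parts in SOFT_SETS.items():
        vectors = [[0.0] * SIZES[hard]] + [[0.0] * SIZES[p] for p in parts]
        template_sizes[hard + " + " + label] = len(fuse(*vectors))

normalized = min_max_normalize([[1.0, 10.0], [3.0, 30.0], [2.0, 10.0]])
```
]]></code>
<p>The resulting template lengths match the trait counts of the eight approaches: 512, 546, 525, and 559 for the VGG-based templates, and 2048, 2082, 2061, and 2095 for the ResNet-based ones.</p>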
</sec>
<sec id="s3_5">
<label>3.5</label>
<title>Effective Classifiers for Ear Recognition</title>
<p>The methodology employed two classifiers, one SVM-based and one Softmax-based, to allow extended analysis of performance variation between machine learning-based and deep learning-based classification. Thus, all proposed augmented approaches were extensively explored and comprehensively assessed using different biometric datasets, scenarios, and tasks with different classification methods.</p>
<sec id="s3_5_1">
<label>3.5.1</label>
<title>SVM-Based Classifier</title>
<p>Support vector machines (SVMs) are reliable and practical methods widely deployed for diverse classification and pattern recognition problems, including human identification and verification [<xref ref-type="bibr" rid="ref-1">1</xref>,<xref ref-type="bibr" rid="ref-8">8</xref>]. Their key role is to find an optimal hyperplane that separates classes with the maximum (hard or soft) margin and minimum misclassification. SVM-based classification can be applied even to data points that are not linearly separable by mapping the classification problem to a higher-dimensional feature space, where they become linearly separable [<xref ref-type="bibr" rid="ref-11">11</xref>,<xref ref-type="bibr" rid="ref-13">13</xref>]. The desired mapping can be accomplished through kernel functions, such as linear, sigmoid, polynomial, and radial basis.</p>
<p>A soft-margin SVM classifier was separately trained for each approach in <xref ref-type="table" rid="table-7">Table 7</xref>, using its designated training AMI/AMIC dataset. The input of each per-approach SVM-based classifier varies with its biometric template size, resulting from the preceding hard/soft biometric feature extraction phases, as shown in <xref ref-type="table" rid="table-7">Table 7</xref>. A grid search with three-fold cross-validation on the training dataset was applied to decide the following optimal model parameter values empirically. The model could choose between four kernel functions: linear, sigmoid, polynomial (poly), and radial basis function (RBF). The regularization hyperparameter <italic>C</italic> was tuned over six logarithmically spaced values from 10<sup>&#x2212;2</sup> to 10<sup>3</sup>, whereas the decision boundary was controlled by the <italic>gamma</italic> (&#x03B3;) hyperparameter, tuned over five values from 10<sup>&#x2212;3</sup> to 10. In addition, weight coefficients ranging from 0.1 to 1.5 were applied to the categorical and comparative soft biometric traits to control the extent to which they contributed to sample representation, feature-level fusion, and thus performance augmentation. Such weight coefficients can be empirically decided to better achieve augmented performance by enforcing a form of regularization on soft biometric traits [<xref ref-type="bibr" rid="ref-8">8</xref>]. They can also balance the influence of the soft traits on the recognition task to exploit their maximum capabilities and avoid possible adverse dominance in the feature space [<xref ref-type="bibr" rid="ref-1">1</xref>].</p>
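<p>The search space described above can be enumerated as in the following sketch. It assumes that the five <italic>gamma</italic> values are also logarithmically spaced, which the text does not state, and it leaves out the soft-trait weight coefficients because their step size is not given. The variable names are illustrative only.</p>
<code language="python"><![CDATA[
```python
from itertools import product

# Four candidate kernel functions named in the text.
kernels = ["linear", "sigmoid", "poly", "rbf"]
# Six logarithmically spaced C values: 10^-2, 10^-1, ..., 10^3.
c_values = [10.0 ** e for e in range(-2, 4)]
# Five gamma values from 10^-3 to 10 (log spacing is an ASSUMPTION).
gamma_values = [10.0 ** e for e in range(-3, 2)]

param_grid = {"kernel": kernels, "C": c_values, "gamma": gamma_values}

# Every combination that an exhaustive grid search would evaluate,
# each one scored by three-fold cross-validation on the training set.
candidates = [dict(zip(param_grid, combo))
              for combo in product(*param_grid.values())]
num_candidates = len(candidates)
```
]]></code>
<p>This yields 4 &#x00D7; 6 &#x00D7; 5 = 120 candidate settings per approach. A dictionary of this shape is also the form accepted as a parameter grid by common grid-search utilities such as scikit-learn&#x2019;s <monospace>GridSearchCV</monospace>, although the text does not name the library that was used.</p>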

</sec>
<sec id="s3_5_2">
<label>3.5.2</label>
<title>Softmax-Based Classifier</title>
<p>Softmax-based classifiers have proven to be high-performing integral components of various deep-learning architectures for generic image recognition [<xref ref-type="bibr" rid="ref-32">32</xref>,<xref ref-type="bibr" rid="ref-33">33</xref>] or biometric template matching and recognition [<xref ref-type="bibr" rid="ref-5">5</xref>,<xref ref-type="bibr" rid="ref-14">14</xref>,<xref ref-type="bibr" rid="ref-18">18</xref>,<xref ref-type="bibr" rid="ref-23">23</xref>]. In this research, a softmax-based feedforward neural network (FFNN) was constructed to be independently trained and used as a classifier for each approach in identification and verification. The Adam optimizer was used with a learning rate of 10<sup>&#x2212;3</sup>, with 100 epochs and a batch size of 10 for VGG-based approaches, and 100 epochs and a batch size of 50 for ResNet-based approaches. The input layer was tailored to each approach, matching the number of features in its biometric template. Next, an FC layer matching the input layer size was added, with ReLU as the activation function, followed by a dropout layer with a 20% rate for regularization. Another FC layer with a hundred neurons was appended to produce class scores (logits) for the hundred unique subjects in the dataset. 
Here, the Softmax activation function played a focal role in accurate classification, and is defined as follows [<xref ref-type="bibr" rid="ref-35">35</xref>]:<disp-formula id="eqn-5"><label>(5)</label><mml:math id="mml-eqn-5" display="block"><mml:mrow><mml:mtext>Softmax</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mfrac><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:msup><mml:mrow><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover><mml:msup><mml:mi>e</mml:mi><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:mrow></mml:msup></mml:mrow></mml:mfrac></mml:math></disp-formula>where <inline-formula id="ieqn-47"><mml:math id="mml-ieqn-47"><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the <italic>i</italic>-th output (logit) of the last FC layer. The numerator is the exponential of <inline-formula id="ieqn-48"><mml:math id="mml-ieqn-48"><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, which is normalized by the sum of the exponentials of <inline-formula id="ieqn-49"><mml:math id="mml-ieqn-49"><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> over all <inline-formula id="ieqn-50"><mml:math id="mml-ieqn-50"><mml:mi>j</mml:mi></mml:math></inline-formula> &#x003D; 1, &#x2026;, <italic>N</italic> classes, so that the outputs form a probability distribution over the classes. <xref ref-type="fig" rid="fig-5">Fig. 5</xref> shows the proposed Softmax-based FFNN trained and used as a robust classifier for ear recognition.</p>
<fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>Softmax-based FFNN, trained and used separately as a classifier for each biometric approach</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_68681-fig-5.tif"/>
</fig>
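<p>As a minimal numerical sketch of Eq. (5), the following Python snippet maps a vector of logits to class probabilities. The function name <monospace>softmax</monospace>, the toy logits, and the max-subtraction step (a common numerical-stability device that leaves the value of Eq. (5) unchanged) are illustrative assumptions and are not taken from the implementation used in this research.</p>
<p><code language="python"><![CDATA[
```python
import math


def softmax(logits):
    """Return softmax probabilities for a list of raw class scores (logits)."""
    # Subtracting the maximum is a standard stability trick (an assumption here,
    # not something stated in the paper); it does not change the result of Eq. (5).
    shift = max(logits)
    exps = [math.exp(x - shift) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical logits for three classes (the paper's last FC layer has 100 outputs).
probs = softmax([2.0, 1.0, 0.1])

# The predicted class is the index with the largest probability.
predicted_class = max(range(len(probs)), key=probs.__getitem__)
```
]]></code></p>
<p>In this toy case, the probabilities sum to one and the class with the largest logit receives the largest probability, which corresponds to the rank-1 decision.</p>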
</sec>
</sec>
<sec id="s3_6">
<label>3.6</label>
<title>Performance Evaluation</title>
<p>Performance evaluation, analysis, and comparison of ear recognition approaches were carried out for two primary target biometric tasks: identification and verification. Therefore, this research used various standard statistical evaluation metrics, along with visual representations suited to each target biometric task, as follows:</p>
<p><bold>Ear identification performance evaluation:</bold> cumulative match characteristic (CMC) curves per experiment, enabling visual representation and comparison of identification performance across the approaches used; area under the CMC curve (CMC-AUC); CMC-based accuracy at rank 1 (R1, the top match), with its 95% confidence interval (CI), and at rank 5 (R5); the average accuracy over the first five ranks (R1&#x2013;R5); precision; and recall.</p>
<p><bold>Ear verification performance evaluation:</bold> receiver operating characteristic (ROC) curves per experiment, which visualize and compare verification performance as the identity accept/reject decision threshold is varied gradually from 0 to 1 along the curve and the true accept rate (TAR) vs. the false accept rate (FAR) is updated accordingly; the area under the ROC curve (ROC-AUC); the equal error rate (EER), at which the false accept rate (FAR) and the false reject rate (FRR) are equal; the resulting verification accuracy, inferred as (1 &#x2212; EER), with its 95% confidence interval (CI); and the decidability index (<italic>d</italic><sup>&#x2032;</sup>), which characterizes the separability between the genuine and impostor score distributions.</p>
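<p>The verification metrics listed above can be sketched as follows. This is a minimal illustration under stated assumptions: higher scores indicate a better match, the EER is approximated at the observed threshold where the gap between FAR and FRR is smallest, and <italic>d</italic><sup>&#x2032;</sup> is computed in its common form as the absolute difference of the genuine and impostor means divided by the square root of the average of their (population) variances. The function names and the toy scores are hypothetical and do not come from the experiments reported here.</p>
<p><code language="python"><![CDATA[
```python
import math


def far_frr(genuine, impostor, threshold):
    """FAR and FRR when a claim is accepted if its score >= threshold."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def equal_error_rate(genuine, impostor):
    """Approximate EER: sweep the observed scores as thresholds and return the
    mean of FAR and FRR at the threshold where |FAR - FRR| is smallest."""
    best_gap, best_eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        far, frr = far_frr(genuine, impostor, t)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


def decidability(genuine, impostor):
    """Decidability index d' = |mu_g - mu_i| / sqrt((var_g + var_i) / 2)."""
    def mean(values):
        return sum(values) / len(values)

    def variance(values):  # population variance (an assumption)
        m = mean(values)
        return sum((v - m) ** 2 for v in values) / len(values)

    pooled = (variance(genuine) + variance(impostor)) / 2
    return abs(mean(genuine) - mean(impostor)) / math.sqrt(pooled)


# Hypothetical similarity scores in [0, 1].
genuine_scores = [0.9, 0.8, 0.7, 0.4]
impostor_scores = [0.1, 0.2, 0.3, 0.6]

eer = equal_error_rate(genuine_scores, impostor_scores)
verification_accuracy = 1.0 - eer
d_prime = decidability(genuine_scores, impostor_scores)
```
]]></code></p>
<p>With dense score sets, interpolating between the two thresholds that bracket the FAR/FRR crossing gives a finer EER estimate than the nearest-threshold approximation used in this sketch.</p>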
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Experimental Work and Results</title>
<p>As proof of concept, various experiments were conducted and analyzed to investigate the capabilities of the proposed ear soft biometrics in augmenting standard deep features for ear recognition. Both identification and verification tasks were tested on the AMI and AMIC datasets, using SVM-based (SVM) and Softmax-based (Softmax) classifiers to evaluate and compare the performance of eight approaches. Two standard (hard biometric) deep-feature approaches, VGG-19 and ResNet-50, served as baselines for benchmarking, along with three proposed counterparts for each baseline that were augmented by different soft biometrics (i.e., SoftCat, SoftCmp, and SoftCat&#x0026;Cmp). The two baselines and their six augmented counterparts, listed in <xref ref-type="table" rid="table-7">Table 7</xref>, were evaluated and compared in each of the eight ear biometric identification and verification experiments.</p>

<p>As such, eight experiments were designed to vary the target biometric recognition task, the classifier, the dataset, and the data challenges posed. In each experiment, the eight approaches were evaluated and compared to explore, characterize, and confirm the efficacy of the proposed augmented approaches over their unaugmented baselines. Their identification and verification performance was evaluated using different metrics and compared from different aspects to assess their superiority in these scenarios. <xref ref-type="table" rid="table-8">Table 8</xref> summarizes all experiments conducted in this research, which are described in detail in the following subsections.</p>
<table-wrap id="table-8">
<label>Table 8</label>
<caption>
<title>Summary of all eight conducted ear recognition experiments and their specifications</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col/>
<col/>
<col/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th align="center" rowspan="2">Recognition task</th>
<th rowspan="2">Dataset</th>
<th rowspan="2">Experiment</th>
<th rowspan="2">Classifier</th>
<th colspan="2">Compared approaches</th>
</tr>
<tr>
<th>Baseline</th>
<th>Proposed</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="4">Identification</td>
<td rowspan="2">AMI</td>
<td>Exp. 1</td>
<td>SVM</td>
<td rowspan="8">For each Exp.: Standard VGG19 and Standard ResNet50</td>
<td rowspan="8">For each Exp.: 3 augmented VGG19 by SoftCat, SoftCmp, and SoftCat&#x0026;Cmp soft biometrics and 3 augmented ResNet50 by SoftCat, SoftCmp, and SoftCat&#x0026;Cmp soft biometrics</td>
</tr>
<tr>
<td>Exp. 2</td>
<td>Softmax</td>
</tr>
<tr>
<td rowspan="2">AMIC</td>
<td>Exp. 3</td>
<td>SVM</td>
</tr>
<tr>
<td>Exp. 4</td>
<td>Softmax</td>
</tr>
<tr>
<td rowspan="4">Verification</td>
<td rowspan="2">AMI</td>
<td>Exp. 5</td>
<td>SVM</td>
</tr>
<tr>
<td>Exp. 6</td>
<td>Softmax</td>
</tr>
<tr>
<td rowspan="2">AMIC</td>
<td>Exp. 7</td>
<td>SVM</td>
</tr>
<tr>
<td>Exp. 8</td>
<td>Softmax</td>
</tr>
</tbody>
</table>
</table-wrap>
<sec id="s4_1">
<label>4.1</label>
<title>Ear Biometric Identification</title>
<p>The first group of experiments (Exp. 1 to 4) addressed the ear biometric identification task, reflecting the scenario deployed in many real-life systems for human identification. As a one-to-many recognition problem, identification uses an unknown ear sample as an unseen query biometric template to probe a gallery [<xref ref-type="bibr" rid="ref-1">1</xref>]. The system should respond to the query by deciding whether this biometric template belongs to a known subject and, if so, retrieving the top-match identity from the enrolled individuals in the gallery/training set. Several standard evaluation metrics suited for identification, described in <xref ref-type="sec" rid="s3_6">Section 3.6</xref>, were used here, together with the improvement rate (Improve), computed as the percentage-point difference in R1 accuracy between the baseline and the augmented approach, to investigate the potency of the proposed soft biometric-based augmented approaches and benchmark them against the traditional hard biometric deep features.</p>
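<p>The rank-based identification metrics can be illustrated with a short sketch. It assumes that each probe yields one matching score per enrolled subject (higher is better) and that the rank of the true identity is one plus the number of subjects scored strictly higher. The CMC-AUC is computed here as the mean of the CMC values over all ranks, which is one common normalization and not necessarily the one behind the CMC-AUC values reported in this research. The function name <monospace>cmc_curve</monospace> and the toy scores are hypothetical.</p>
<p><code language="python"><![CDATA[
```python
def cmc_curve(score_rows, true_labels, max_rank=None):
    """Cumulative match characteristic: for k = 1..max_rank, the fraction of
    probes whose true identity is among the k best-scoring gallery subjects."""
    n_subjects = len(score_rows[0])
    max_rank = max_rank or n_subjects
    hits = [0] * max_rank
    for scores, label in zip(score_rows, true_labels):
        # Rank of the true identity: 1 + number of subjects scored strictly higher.
        rank = 1 + sum(1 for s in scores if s > scores[label])
        if rank <= max_rank:
            hits[rank - 1] += 1
    curve, matched = [], 0
    for count in hits:
        matched += count
        curve.append(matched / len(score_rows))
    return curve


# Hypothetical scores: 4 probes (rows) against 3 enrolled subjects (columns).
scores = [
    [0.7, 0.2, 0.1],  # true subject 0, retrieved at rank 1
    [0.3, 0.4, 0.3],  # true subject 1, retrieved at rank 1
    [0.5, 0.3, 0.2],  # true subject 1, retrieved at rank 2
    [0.6, 0.3, 0.1],  # true subject 2, retrieved at rank 3
]
labels = [0, 1, 1, 2]

cmc = cmc_curve(scores, labels)
rank1_accuracy = cmc[0]
avg_first_two_ranks = sum(cmc[:2]) / 2
cmc_auc = sum(cmc) / len(cmc)  # normalized area under the CMC curve (assumed form)
```
]]></code></p>
<p>The same routine yields R5 as <monospace>cmc[4]</monospace> and the R1&#x2013;R5 average as the mean of the first five entries once at least five subjects are enrolled.</p>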
<sec id="s4_1_1">
<label>4.1.1</label>
<title>Augmented Ear Identification Using AMI Dataset</title>
<p>Exp. 1 and 2 were conducted on the AMI dataset using SVM and Softmax, respectively, for augmented ear identification. <xref ref-type="table" rid="table-9">Table 9</xref> presents the identification performance of Exp. 1 and 2, and <xref ref-type="fig" rid="fig-6">Fig. 6</xref> shows their corresponding CMC curves in (a) and (b), respectively. Overall, in both experiments, the proposed SoftCat&#x0026;Cmp fusion yielded the best augmented identification performance for both VGG19 and ResNet50 with either classifier, improving the baselines&#x2019; R1 accuracy by 2.5 to 5.5 percentage points, as bolded in <xref ref-type="table" rid="table-9">Table 9</xref>.</p>
<table-wrap id="table-9">
<label>Table 9</label>
<caption>
<title>Ear identification performance on AMI dataset using SVM and Softmax-based classifiers</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th align="center" rowspan="2">Dataset &#x0026; classifier</th>
<th align="center" rowspan="2">Approach</th>
<th colspan="5">Accuracy</th>
<th rowspan="2">CMC-AUC</th>
<th rowspan="2">Precision</th>
<th rowspan="2">Recall</th>
</tr>
<tr>

<th align="center">R1</th>
<th align="center">R5</th>
<th align="center">Avg. R1&#x2013;R5</th>
<th align="center">Improve</th>
<th align="center">CI (95%)</th>

</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>VGG19</td>
<td>93.50</td>
<td>96.5</td>
<td>95.70</td>
<td>Baseline</td>
<td>[0.901, 0.969]</td>
<td>98.768</td>
<td>95.83</td>
<td>93.50</td>
</tr>
<tr>
<td/>
<td>VGG19 &#x002B; SoftCat</td>
<td>98.50</td>
<td><bold>99.5</bold></td>
<td>99.00</td>
<td>5.00%</td>
<td>[0.968, 1.000]</td>
<td>98.848</td>
<td>99.00</td>
<td>98.50</td>
</tr>
<tr>
<td></td>
<td>VGG19 &#x002B; SoftCmp</td>
<td>98.00</td>
<td>99.0</td>
<td>98.80</td>
<td>4.50%</td>
<td>[0.961, 0.999]</td>
<td>98.850</td>
<td>98.67</td>
<td>98.00</td>
</tr>
<tr>
<td>(Exp. 1) AMI &#x0026; SVM</td>
<td>VGG19 &#x002B; SoftCat&#x0026;Cmp</td>
<td><bold>99.00</bold></td>
<td><bold>99.5</bold></td>
<td><bold>99.10</bold></td>
<td><bold>5.50%</bold></td>
<td> <bold>[0.976, 1.000]</bold></td>
<td><bold>98.860</bold></td>
<td><bold>99.33</bold></td>
<td><bold>99.00</bold></td>
</tr>
<tr>
<td></td>
<td>ResNet50</td>
<td>96.00</td>
<td>96.5</td>
<td>96.40</td>
<td>Baseline</td>
<td>[0.933, 0.987]</td>
<td>98.650</td>
<td>96.67</td>
<td>96.00</td>
</tr>
<tr>
<td/>
<td>ResNet50 &#x002B; SoftCat</td>
<td>98.00</td>
<td><bold>99.0</bold></td>
<td>98.40</td>
<td>2.00%</td>
<td>[0.961, 0.999]</td>
<td>98.895</td>
<td>98.67</td>
<td>98.00</td>
</tr>
<tr>
<td/>
<td>ResNet50 &#x002B; SoftCmp</td>
<td>98.00</td>
<td>98.5</td>
<td>98.30</td>
<td>2.00%</td>
<td>[0.961, 0.999]</td>
<td>98.895</td>
<td>98.67</td>
<td>98.00</td>
</tr>
<tr>
<td/>
<td>ResNet50 &#x002B; SoftCat&#x0026;Cmp</td>
<td><bold>98.50</bold></td>
<td><bold>99.0</bold></td>
<td><bold>98.60</bold></td>
<td><bold>2.50%</bold></td>
<td><bold>[0.968, 1.000]</bold></td>
<td><bold>98.903</bold></td>
<td><bold>99.00</bold></td>
<td><bold>98.50</bold></td>
</tr>
<tr>
<td></td>
<td>VGG19</td>
<td>95.50</td>
<td>98.5</td>
<td>97.40</td>
<td>Baseline</td>
<td>[0.926, 0.984]</td>
<td>98.888</td>
<td>97.00</td>
<td>95.50</td>
</tr>
<tr>
<td></td>
<td>VGG19 &#x002B; SoftCat</td>
<td>99.00</td>
<td><bold>99.5</bold></td>
<td>99.40</td>
<td>3.50%</td>
<td>[0.976, 1.000]</td>
<td>98.945</td>
<td>99.33</td>
<td>99.00</td>
</tr>
<tr>
<td></td>
<td>VGG19 &#x002B; SoftCmp</td>
<td>98.50</td>
<td><bold>99.5</bold></td>
<td>99.30</td>
<td>3.00%</td>
<td>[0.968, 1.000]</td>
<td>98.943</td>
<td>99.00</td>
<td>98.50</td>
</tr>
<tr>
<td>(Exp. 2) AMI &#x0026; Softmax</td>
<td>VGG19 &#x002B; SoftCat&#x0026;Cmp</td>
<td><bold>99.50</bold></td>
<td><bold>99.5</bold></td>
<td><bold>99.50</bold></td>
<td><bold>4.00%</bold></td>
<td> <bold>[0.985, 1.000]</bold></td>
<td><bold>98.953</bold></td>
<td><bold>99.67</bold></td>
<td><bold>99.50</bold></td>
</tr>
<tr>
<td></td>
<td>ResNet50</td>
<td>96.50</td>
<td>98.5</td>
<td>98.00</td>
<td>Baseline</td>
<td>[0.940, 0.990]</td>
<td>98.848</td>
<td>97.17</td>
<td>96.50</td>
</tr>
<tr>
<td/>
<td>ResNet50 &#x002B; SoftCat</td>
<td>98.50</td>
<td>99.5</td>
<td>99.10</td>
<td>2.00%</td>
<td>[0.968, 1.000]</td>
<td>98.898</td>
<td>99.17</td>
<td>98.50</td>
</tr>
<tr>
<td/>
<td>ResNet50 &#x002B; SoftCmp</td>
<td>99.00</td>
<td><bold>100</bold></td>
<td>99.40</td>
<td>2.50%</td>
<td>[0.976, 1.000]</td>
<td>98.975</td>
<td>99.33</td>
<td>99.00</td>
</tr>
<tr>
<td/>
<td>ResNet50 &#x002B; SoftCat&#x0026;Cmp</td>
<td><bold>99.50</bold></td>
<td><bold>100</bold></td>
<td><bold>99.80</bold></td>
<td><bold>3.00%</bold></td>
<td><bold>[0.985, 1.000]</bold></td>
<td><bold>98.993</bold></td>
<td><bold>99.67</bold></td>
<td><bold>99.50</bold></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="table-9fn1" fn-type="other">
<p>Note: Results in bold are the best-augmented performance per VGG19-based and ResNet50-based approaches.</p>
</fn>
</table-wrap-foot>
</table-wrap><fig id="fig-6">
<label>Figure 6</label>
<caption>
<title>CMC performance of VGG19-based and ResNet50-based baseline and augmented approaches for ear biometric identification on AMI dataset. (<bold>a</bold>) Exp. 1 using the SVM classifier; (<bold>b</bold>) Exp. 2 using the Softmax-based classifier</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_68681-fig-6.tif"/>
</fig>
<p>Focusing on Exp. 1 using SVM, VGG19 &#x002B; SoftCat&#x0026;Cmp achieved the highest R1 accuracy of 99%, followed by ResNet50 &#x002B; SoftCat&#x0026;Cmp and VGG19 &#x002B; SoftCat, both at 98.5%. Although the VGG19-based augmented approaches achieved higher CMC scores at the initial ranks 1&#x2013;5, the ResNet50-based augmented approaches attained higher CMC-AUC scores, as shown in <xref ref-type="fig" rid="fig-6">Fig. 6a</xref>. Still, all augmented approaches demonstrated greater CMC-AUC than the VGG19 and ResNet50 baselines. The SoftCat traits slightly outperformed the SoftCmp traits in augmenting VGG19 and ResNet50 on some metrics, suggesting that the categorical soft biometrics combined more effectively with VGG19 deep features and SVM. Nevertheless, SoftCat and SoftCmp performed more similarly in augmenting ResNet50 in Exp. 1.</p>

<p>In Exp. 2 using Softmax, ResNet50 &#x002B; SoftCat&#x0026;Cmp was the top performer by all measures, with an R1 accuracy of 99.5% and the greatest CMC-AUC of 98.99%. Interestingly, ResNet50 &#x002B; SoftCmp came second, with a better CMC curve and higher CMC-AUC and R5 scores than VGG19 &#x002B; SoftCat&#x0026;Cmp. Besides the top-performing ResNet50 &#x002B; SoftCat&#x0026;Cmp, it was the only augmented approach that achieved 100% accuracy at R5. Here, once again, SoftCat was better at augmenting VGG19 identification performance. In contrast, SoftCmp surpassed SoftCat on every metric in augmenting ResNet50 identification performance, suggesting that the comparative soft biometrics (SoftCmp) integrated more successfully with ResNet50 deep features and Softmax, despite comprising only about a third of the number of SoftCat traits. In <xref ref-type="fig" rid="fig-6">Fig. 6b</xref>, all augmented approaches attained greater CMC and CMC-AUC than their baselines.</p>

<p><xref ref-type="table" rid="table-9">Table 9</xref>, together with <xref ref-type="fig" rid="fig-6">Fig. 6</xref>, shows that soft biometrics effectively augment CNN deep features on AMI. All augmented approaches improve on the baselines in identification. Adding SoftCat to deep features yields slightly better synergy when using the SVM classifier, whereas SoftCmp integrates better with ResNet50 deep features when using the Softmax classifier. The SoftCat&#x0026;Cmp combination achieves the best augmentation because it integrates the strengths of both the Cat and Cmp soft traits.</p>

</sec>
<sec id="s4_1_2">
<label>4.1.2</label>
<title>Augmented Ear Identification Using AMIC Dataset</title>
<p>Two further ear identification experiments, Exp. 3 and 4, were conducted on AMIC, which comprises more challenging cropped ear images. Exp. 3 used SVM as a classifier, whereas Exp. 4 used Softmax. <xref ref-type="table" rid="table-10">Table 10</xref> reports the identification performance of Exp. 3 and 4 on the AMIC dataset, and <xref ref-type="fig" rid="fig-7">Fig. 7</xref> illustrates the CMC curves, enabling a visual performance comparison. Generally, the SoftCat&#x0026;Cmp-based augmented approaches were superior in augmenting the VGG19 and ResNet50 baselines with both the SVM-based and Softmax-based classifiers, reaching their highest R1 accuracy of 93% and 96%, respectively, with improvement rates of up to 14%. All SoftCmp-based augmented approaches, except VGG19 &#x002B; SoftCmp with SVM, outperformed their SoftCat-based counterparts in R1 accuracy, indicating more effective identification augmentation on the more challenging AMIC data.</p>
<table-wrap id="table-10">
<label>Table 10</label>
<caption>
<title>Ear identification performance on AMIC dataset using SVM and Softmax-based classifiers</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th align="center" rowspan="2">Dataset &#x0026; classifier</th>
<th align="center" rowspan="2">Approach</th>
<th colspan="5">Accuracy</th>
<th rowspan="2">CMC-AUC</th>
<th rowspan="2">Precision</th>
<th rowspan="2">Recall</th>
</tr>
<tr>
<th align="center">R1</th>
<th align="center">R5</th>
<th align="center">Avg. R1&#x2013;R5</th>
<th align="center">Improve</th>
<th align="center">CI (95%)</th>

</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>VGG19</td>
<td>75.50</td>
<td>85.0</td>
<td>81.00</td>
<td>Baseline</td>
<td>[0.695, 0.815]</td>
<td>96.353</td>
<td>81.15</td>
<td>75.50</td>
</tr>
<tr>
<td/>
<td>VGG19 &#x002B; SoftCat</td>
<td>87.00</td>
<td>92.0</td>
<td>89.70</td>
<td>11.50%</td>
<td>[0.823, 0.917]</td>
<td>97.460</td>
<td><bold>91.03</bold></td>
<td>87.00</td>
</tr>
<tr>
<td rowspan="2">(Exp. 3) AMIC &#x0026; SVM</td>
<td>VGG19 &#x002B; SoftCmp</td>
<td>86.00</td>
<td>91.5</td>
<td>89.10</td>
<td>10.50%</td>
<td>[0.812, 0.908]</td>
<td>97.465</td>
<td>88.98</td>
<td>86.00</td>
</tr>
<tr>
<td>VGG19 &#x002B; SoftCat&#x0026;Cmp</td>
<td><bold>87.50</bold></td>
<td><bold>92.5</bold></td>
<td><bold>90.80</bold></td>
<td><bold>12.00%</bold></td>
<td> <bold>[0.829, 0.921]</bold></td>
<td><bold>97.668</bold></td>
<td>90.53</td>
<td><bold>87.50</bold></td>
</tr>
<tr>
<td/>
<td>ResNet50</td>
<td>87.00</td>
<td>91.0</td>
<td>89.40</td>
<td>Baseline</td>
<td>[0.823, 0.917]</td>
<td>97.600</td>
<td>88.67</td>
<td>87.00</td>
</tr>
<tr>
<td/>
<td>ResNet50 &#x002B; SoftCat</td>
<td>92.00</td>
<td>97.0</td>
<td>94.90</td>
<td>5.00%</td>
<td>[0.882, 0.958]</td>
<td>98.680</td>
<td>93.58</td>
<td>92.00</td>
</tr>
<tr>
<td/>
<td>ResNet50 &#x002B; SoftCmp</td>
<td>92.50</td>
<td>96.5</td>
<td>94.90</td>
<td>5.50%</td>
<td>[0.888, 0.962]</td>
<td>98.613</td>
<td><bold>95.08</bold></td>
<td>92.50</td>
</tr>
<tr>
<td/>
<td>ResNet50 &#x002B; SoftCat&#x0026;Cmp</td>
<td><bold>93.50</bold></td>
<td><bold>98.0</bold></td>
<td><bold>96.40</bold></td>
<td><bold>6.50%</bold></td>
<td> <bold>[0.901, 0.969]</bold></td>
<td><bold>98.748</bold></td>
<td>94.83</td>
<td><bold>93.50</bold></td>
</tr>
<tr>
<td></td>
<td>VGG19</td>
<td>80.50</td>
<td>92.5</td>
<td>87.40</td>
<td>Baseline</td>
<td>[0.750, 0.860]</td>
<td>97.788</td>
<td>84.23</td>
<td>80.50</td>
</tr>
<tr>
<td/>
<td>VGG19 &#x002B; SoftCat</td>
<td>91.00</td>
<td>97.0</td>
<td>94.80</td>
<td>10.50%</td>
<td>[0.870, 0.950]</td>
<td>98.710</td>
<td>93.65</td>
<td>91.00</td>
</tr>
<tr>
<td></td>
<td>VGG19 &#x002B; SoftCmp</td>
<td>92.00</td>
<td>97.0</td>
<td>95.20</td>
<td>11.50%</td>
<td>[0.882, 0.958]</td>
<td>98.755</td>
<td>93.50</td>
<td>92.00</td>
</tr>
<tr>
<td>(Exp. 4) AMIC &#x0026; Softmax</td>
<td>VGG19 &#x002B; SoftCat&#x0026;Cmp</td>
<td><bold>93.00</bold></td>
<td><bold>98.5</bold></td>
<td><bold>95.90</bold></td>
<td><bold>12.50%</bold></td>
<td> <bold>[0.895, 0.965]</bold></td>
<td><bold>98.775</bold></td>
<td><bold>94.50</bold></td>
<td><bold>93.00</bold></td>
</tr>
<tr>
<td/>
<td>ResNet50</td>
<td>82.00</td>
<td>93.5</td>
<td>90.10</td>
<td>Baseline</td>
<td>[0.767, 0.873]</td>
<td>97.950</td>
<td>84.90</td>
<td>82.00</td>
</tr>
<tr>
<td/>
<td>ResNet50 &#x002B; SoftCat</td>
<td>94.00</td>
<td><bold>99.0</bold></td>
<td>97.30</td>
<td>12.00%</td>
<td>[0.907, 0.973]</td>
<td>98.870</td>
<td>95.83</td>
<td>94.00</td>
</tr>
<tr>
<td/>
<td>ResNet50 &#x002B; SoftCmp</td>
<td>95.00</td>
<td><bold>99.0</bold></td>
<td>97.50</td>
<td>13.00%</td>
<td>[0.920, 0.980]</td>
<td>98.885</td>
<td>96.33</td>
<td>95.00</td>
</tr>
<tr>
<td/>
<td>ResNet50 &#x002B; SoftCat&#x0026;Cmp</td>
<td><bold>96.00</bold></td>
<td><bold>99.0</bold></td>
<td><bold>98.10</bold></td>
<td><bold>14.00%</bold></td>
<td> <bold>[0.933, 0.987]</bold></td>
<td><bold>98.910</bold></td>
<td><bold>97.33</bold></td>
<td><bold>96.00</bold></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="table-10fn1" fn-type="other">
<p>Note: Results in bold are the best-augmented performance per VGG19-based and ResNet50-based approaches.</p>
</fn>
</table-wrap-foot>
</table-wrap><fig id="fig-7">
<label>Figure 7</label>
<caption>
<title>CMC performance of VGG19-based and ResNet50-based baseline and augmented approaches for ear biometric identification on AMIC dataset. (<bold>a</bold>) Exp. 3 using the SVM classifier; (<bold>b</bold>) Exp. 4 using the Softmax-based classifier</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_68681-fig-7.tif"/>
</fig>
<p><xref ref-type="table" rid="table-10">Table 10</xref> and <xref ref-type="fig" rid="fig-7">Fig. 7</xref> confirm that augmenting CNN deep features with soft biometrics contributes significantly to identification. SoftCmp appears more effective than SoftCat in augmenting deep features, reflecting its higher perceptual capabilities on the challenging AMIC data. Their integration, SoftCat&#x0026;Cmp, attains the highest overall improvement. Augmented ResNet50 approaches outperform their augmented VGG19 counterparts, as they incorporate more features that interact more effectively with soft biometrics in the fusion.</p>

<p>Concerning Exp. 3 using SVM, the ResNet50 baseline performed better than the VGG19 baseline, and VGG19 &#x002B; SoftCat&#x0026;Cmp attained the highest performance among the VGG19-based approaches and was the only one to exceed the ResNet50 baseline, as shown in <xref ref-type="table" rid="table-10">Table 10</xref> and <xref ref-type="fig" rid="fig-7">Fig. 7a</xref>. Although VGG19 &#x002B; SoftCat&#x0026;Cmp and ResNet50 &#x002B; SoftCat&#x0026;Cmp were superior among their corresponding VGG19-based and ResNet50-based peers, VGG19 &#x002B; SoftCat and ResNet50 &#x002B; SoftCmp scored the highest precision among the VGG19-based and ResNet50-based approaches, respectively. Based on the CMC curves in <xref ref-type="fig" rid="fig-7">Fig. 7a</xref>, all ResNet50-based augmented approaches surpassed all VGG19-based augmented approaches, with higher CMC-AUC ranging from 98.61% to 98.75%.</p>

<p>In Exp. 4 using Softmax, as observed in <xref ref-type="table" rid="table-10">Table 10</xref> and <xref ref-type="fig" rid="fig-7">Fig. 7b</xref>, the performance improvement was more systematic: the SoftCat&#x0026;Cmp, SoftCmp, and SoftCat traits, in that order, augmented the ResNet50 and then the VGG19 deep features on nearly all metrics. ResNet50 &#x002B; SoftCat&#x0026;Cmp raised R1 identification accuracy from 82% to 96% in a more challenging scenario using only the soft/hard biometric information available in AMIC&#x2019;s cropped ear images. VGG19 &#x002B; SoftCmp and ResNet50 &#x002B; SoftCmp, which were equipped with fewer comparative traits, achieved higher R1 accuracy and CMC-AUC than VGG19 &#x002B; SoftCat and ResNet50 &#x002B; SoftCat, which were augmented with categorical traits numbering around three times the comparative traits.</p>

</sec>
</sec>
<sec id="s4_2">
<label>4.2</label>
<title>Ear Biometric Verification</title>
<p>The second group of experiments (Exp. 5 to 8) addressed the ear biometric verification task, reflecting the scenario deployed in numerous real-life applications for human verification. In this task, an unseen ear sample for a claimed identity is used as a query biometric template to probe a gallery as a one-to-one recognition problem [<xref ref-type="bibr" rid="ref-1">1</xref>]. The system should handle the query by confirming whether the biometric template truly belongs to the claimed subject&#x2019;s identity and deciding whether to accept or reject it based on its similarity to the templates previously enrolled for that claimed subject in the gallery/training set, subject to the chosen confidence level. The standard verification performance metrics described in <xref ref-type="sec" rid="s3_6">Section 3.6</xref>, together with the improvement rate (Improve), computed as the percentage-point difference in (1 &#x2212; EER) between the baseline and the augmented approach, were used here to explore the proposed soft biometric augmentation capabilities and benchmark them against the traditional hard biometric deep features in isolation. The degree of confidence in accepting or rejecting a claimed identity was controlled by varying the decision threshold between 0 and 1 along the ROC curve while recalculating both TAR and FAR for each threshold.</p>
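<p>The improvement rate and the 95% confidence interval can be illustrated with a small worked example. The sketch below takes the EER values listed in <xref ref-type="table" rid="table-11">Table 11</xref> for the VGG19 baseline and VGG19 &#x002B; SoftCat in Exp. 5, and it assumes a normal-approximation (Wald) interval clipped to [0, 1]. The sample size <italic>n</italic> &#x003D; 200 is an assumption, chosen because it reproduces the interval listed for the VGG19 baseline; the exact CI procedure is not stated in this section, and the function names are hypothetical.</p>
<p><code language="python"><![CDATA[
```python
import math


def improvement_points(baseline_pct, augmented_pct):
    """Improvement rate as a percentage-point difference between two accuracies."""
    return round(augmented_pct - baseline_pct, 2)


def wald_interval(accuracy, n, z=1.96):
    """Normal-approximation (Wald) 95% CI for a proportion, clipped to [0, 1]."""
    half_width = z * math.sqrt(accuracy * (1.0 - accuracy) / n)
    return max(0.0, accuracy - half_width), min(1.0, accuracy + half_width)


# EER of the VGG19 baseline and of VGG19 + SoftCat in Exp. 5 (values from Table 11).
baseline_eer, augmented_eer = 0.03404, 0.02035
baseline_acc = 1.0 - baseline_eer    # verification accuracy, 1 - EER
augmented_acc = 1.0 - augmented_eer

improve = improvement_points(100 * baseline_acc, 100 * augmented_acc)

# n = 200 test probes is an assumption, not a figure stated in this section.
low, high = wald_interval(baseline_acc, n=200)
```
]]></code></p>
<p>Under these assumptions, the sketch gives an improvement of 1.37 percentage points and an interval of about [0.941, 0.991], matching the corresponding entries of <xref ref-type="table" rid="table-11">Table 11</xref>. The same functions apply to identification by passing R1 accuracies instead of (1 &#x2212; EER).</p>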
<sec id="s4_2_1">
<label>4.2.1</label>
<title>Augmented Ear Verification Using AMI Dataset</title>
<p>Augmented ear verification was evaluated on the AMI dataset using the SVM classifier in Exp. 5 and the Softmax classifier in Exp. 6. Ear verification performance is reported in <xref ref-type="table" rid="table-11">Table 11</xref> for Exp. 5 and 6, characterizing and comparing all standard and augmented approaches. The corresponding ROC curves are illustrated in <xref ref-type="fig" rid="fig-8">Fig. 8a</xref> for the SVM-based approaches of Exp. 5 and <xref ref-type="fig" rid="fig-8">Fig. 8b</xref> for the Softmax-based approaches of Exp. 6.</p>
<table-wrap id="table-11">
<label>Table 11</label>
<caption>
<title>Ear verification performance on AMI dataset using SVM and Softmax-based classifiers</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th align="center" rowspan="2">Dataset &#x0026; classifier</th>
<th rowspan="2">Approach</th>
<th rowspan="2">ROC-AUC</th>
<th rowspan="2">EER</th>
<th colspan="3">Accuracy</th>
<th align="center" rowspan="2"><italic>d</italic><sup>&#x2032;</sup></th>
</tr>
<tr>

<th>1 &#x2212; EER</th>
<th>Improve</th>
<th>CI (95%)</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>VGG19</td>
<td>99.6802</td>
<td>0.03404</td>
<td>96.60</td>
<td>Baseline</td>
<td>[0.941, 0.991]</td>
<td>0.362</td>
</tr>
<tr>
<td/>
<td>VGG19 &#x002B; SoftCat</td>
<td>99.7511</td>
<td>0.02035</td>
<td>97.96</td>
<td>1.37</td>
<td>[0.960, 0.999]</td>
<td><bold>0.395</bold></td>
</tr>
<tr>
<td rowspan="2">(Exp. 5) AMI &#x0026; SVM</td>
<td>VGG19 &#x002B; SoftCmp</td>
<td><bold>99.8053</bold></td>
<td>0.02020</td>
<td>97.98</td>
<td>1.38</td>
<td>[0.960, 0.999]</td>
<td>0.380</td>
</tr>
<tr>
<td>VGG19 &#x002B; SoftCat&#x0026;Cmp</td>
<td>99.7548</td>
<td><bold>0.01869</bold></td>
<td><bold>98.13</bold></td>
<td><bold>1.54</bold></td>
<td> <bold>[0.963, 1.000]</bold></td>
<td>0.363</td>
</tr>
<tr>
<td/>
<td>ResNet50</td>
<td>99.6639</td>
<td>0.03343</td>
<td>96.66</td>
<td>Baseline</td>
<td>[0.942, 0.992]</td>
<td>0.383</td>
</tr>
<tr>
<td/>
<td>ResNet50 &#x002B; SoftCat</td>
<td>99.8747</td>
<td>0.01606</td>
<td>98.39</td>
<td>1.74</td>
<td>[0.966, 1.000]</td>
<td>0.344</td>
</tr>
<tr>
<td/>
<td>ResNet50 &#x002B; SoftCmp</td>
<td>99.8421</td>
<td>0.01732</td>
<td>98.27</td>
<td>1.61</td>
<td>[0.965, 1.000]</td>
<td><bold>0.436</bold></td>
</tr>
<tr>
<td/>
<td>ResNet50 &#x002B; SoftCat&#x0026;Cmp</td>
<td><bold>99.8893</bold></td>
<td><bold>0.01141</bold></td>
<td><bold>98.86</bold></td>
<td><bold>2.20</bold></td>
<td> <bold>[0.974, 1.000]</bold></td>
<td>0.377</td>
</tr>
<tr>
<td></td>
<td>VGG19</td>
<td>99.9752</td>
<td>0.01061</td>
<td>98.94</td>
<td>Baseline</td>
<td>[0.975, 1.000]</td>
<td>2.730</td>
</tr>
<tr>
<td/>
<td>VGG19 &#x002B; SoftCat</td>
<td>99.9963</td>
<td>0.00712</td>
<td>99.29</td>
<td>0.35</td>
<td>[0.981, 1.000]</td>
<td>3.022</td>
</tr>
<tr>
<td rowspan="2">(Exp. 6) AMI &#x0026; Softmax</td>
<td>VGG19 &#x002B; SoftCmp</td>
<td>99.9959</td>
<td>0.00636</td>
<td>99.36</td>
<td>0.42</td>
<td>[0.982, 1.000]</td>
<td><bold>3.754</bold></td>
</tr>
<tr>
<td>VGG19 &#x002B; SoftCat&#x0026;Cmp</td>
<td><bold>99.9968</bold></td>
<td><bold>0.00616</bold></td>
<td><bold>99.38</bold></td>
<td><bold>2.79</bold></td>
<td> <bold>[0.983, 1.000]</bold></td>
<td>2.759</td>
</tr>
<tr>
<td/>
<td>ResNet50</td>
<td>99.9668</td>
<td>0.01338</td>
<td>98.66</td>
<td>Baseline</td>
<td>[0.971, 1.000]</td>
<td>2.548</td>
</tr>
<tr>
<td/>
<td>ResNet50 &#x002B; SoftCat</td>
<td>99.9797</td>
<td>0.00308</td>
<td>99.69</td>
<td>1.03</td>
<td>[0.989, 1.000]</td>
<td>2.895</td>
</tr>
<tr>
<td/>
<td>ResNet50 &#x002B; SoftCmp</td>
<td>99.9964</td>
<td>0.00495</td>
<td>99.51</td>
<td>0.84</td>
<td>[0.985, 1.000]</td>
<td>2.756</td>
</tr>
<tr>
<td/>
<td>ResNet50 &#x002B; SoftCat&#x0026;Cmp</td>
<td><bold>99.9997</bold></td>
<td><bold>0.00056</bold></td>
<td><bold>99.94</bold></td>
<td><bold>1.28</bold></td>
<td> <bold>[0.996, 1.000]</bold></td>
<td><bold>3.073</bold></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="table-11fn1" fn-type="other">
<p>Note: Results in bold are the best-augmented performance per VGG19-based and ResNet50-based approaches.</p>
</fn>
</table-wrap-foot>
</table-wrap><fig id="fig-8">
<label>Figure 8</label>
<caption>
<title>ROC performance of VGG19-based and ResNet50-based baseline and augmented approaches for ear biometric verification on the AMI dataset. (<bold>a</bold>) Exp. 5 using the SVM classifier; (<bold>b</bold>) Exp. 6 using the Softmax-based classifier</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_68681-fig-8.tif"/>
</fig>
<p>In Exp. 5 using SVM, ResNet50 &#x002B; SoftCat&#x0026;Cmp attained the highest ROC-AUC of 99.89% and accuracy (1 &#x2212; EER) of 98.86%, with the lowest EER of 0.011, among all augmented approaches, whereas ResNet50 &#x002B; SoftCmp obtained the highest <italic>d</italic><sup>&#x2032;</sup> score overall. All proposed soft biometric traits augmented the performance of VGG19; the SoftCmp traits yielded the largest ROC-AUC gain, the SoftCat and then the SoftCmp traits yielded the largest <italic>d</italic><sup>&#x2032;</sup> gains, and their combination in SoftCat&#x0026;Cmp yielded the largest EER reduction. As shown in <xref ref-type="fig" rid="fig-8">Fig. 8a</xref>, the ResNet50-based augmented approaches offered better verification performance, with greater ROC-AUC and lower EER, than the VGG19-based augmented approaches.</p>

<p><xref ref-type="table" rid="table-11">Table 11</xref>, along with <xref ref-type="fig" rid="fig-8">Fig. 8</xref>, provides results supporting the augmentation of deep features with soft biometrics for verification on the AMI dataset. SoftCat&#x0026;Cmp delivers the best augmented verification accuracy by exploiting both the Cat and Cmp traits. All augmented approaches surpass the baselines in ROC-AUC and verification accuracy. Here, SoftCat improves the ResNet50 performance more, whereas SoftCmp improves VGG19 more, owing to the variability posed by different biometric scenarios.</p>

</sec>
<sec id="s4_2_2">
<label>4.2.2</label>
<title>Augmented Ear Verification Using AMIC Dataset</title>
<p>Additional ear verification experiments, Exp. 7 and 8, were performed on the more challenging AMIC data. <xref ref-type="table" rid="table-12">Table 12</xref> provides ear verification performance results for the VGG19-based and ResNet50-based baselines and proposed approaches on the AMIC dataset, using SVM in Exp. 7 and Softmax in Exp. 8 as classifiers. The corresponding ROC curves for Exp. 7 and 8 are shown in <xref ref-type="fig" rid="fig-9">Fig. 9a</xref>,<xref ref-type="fig" rid="fig-9">b</xref>.</p>
<table-wrap id="table-12">
<label>Table 12</label>
<caption>
<title>Ear verification performance on AMIC dataset using SVM and Softmax-based classifiers</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th align="center" rowspan="2">Dataset &#x0026; classifier</th>
<th rowspan="2">Approach</th>
<th rowspan="2">ROC-AUC</th>
<th align="center" rowspan="2">EER</th>
<th colspan="3">Accuracy</th>
<th align="center" rowspan="2"><italic>d</italic><sup>&#x2032;</sup></th>
</tr>
<tr>

<th>1 &#x2212; EER</th>
<th>Improve</th>
<th>CI (95%)</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>VGG19</td>
<td>97.0160</td>
<td>0.09768</td>
<td>90.23</td>
<td>Baseline</td>
<td>[0.861, 0.943]</td>
<td>0.436</td>
</tr>
<tr>
<td/>
<td>VGG19 &#x002B; SoftCat</td>
<td>98.3036</td>
<td>0.06207</td>
<td>93.79</td>
<td>3.56</td>
<td>[0.904, 0.971]</td>
<td>0.442</td>
</tr>
<tr>
<td/>
<td>VGG19 &#x002B; SoftCmp</td>
<td>98.3089</td>
<td><bold>0.05571</bold></td>
<td><bold>94.43</bold></td>
<td><bold>4.20</bold></td>
<td> <bold>[0.913, 0.976]</bold></td>
<td><bold>0.528</bold></td>
</tr>
<tr>
<td>(Exp. 7) AMIC &#x0026;</td>
<td>VGG19 &#x002B; SoftCat&#x0026;Cmp</td>
<td><bold>98.3511</bold></td>
<td>0.05934</td>
<td>94.07</td>
<td>3.83</td>
<td>[0.908, 0.973]</td>
<td>0.417</td>
</tr>
<tr>
<td>SVM</td>
<td>ResNet50</td>
<td>98.5427</td>
<td>0.07045</td>
<td>92.95</td>
<td>Baseline</td>
<td>[0.894, 0.965]</td>
<td>0.391</td>
</tr>
<tr>
<td></td>
<td>ResNet50 &#x002B; SoftCat</td>
<td>99.4578</td>
<td>0.03672</td>
<td>96.33</td>
<td>3.37</td>
<td>[0.937, 0.989]</td>
<td>0.359</td>
</tr>
<tr>
<td/>
<td>ResNet50 &#x002B; SoftCmp</td>
<td>99.4277</td>
<td>0.04242</td>
<td>95.76</td>
<td>2.80</td>
<td>[0.930, 0.986]</td>
<td><bold>0.394</bold></td>
</tr>
<tr>
<td/>
<td>ResNet50 &#x002B; SoftCat&#x0026;Cmp</td>
<td><bold>99.5124</bold></td>
<td><bold>0.03283</bold></td>
<td><bold>96.72</bold></td>
<td><bold>3.76</bold></td>
<td> <bold>[0.943, 0.992]</bold></td>
<td>0.340</td>
</tr>
<tr>
<td></td>
<td>VGG19</td>
<td>99.3582</td>
<td>0.03899</td>
<td>96.10</td>
<td>Baseline</td>
<td>[0.934, 0.988]</td>
<td>2.448</td>
</tr>
<tr>
<td/>
<td>VGG19 &#x002B; SoftCat</td>
<td>99.9140</td>
<td>0.01677</td>
<td>98.32</td>
<td>2.22</td>
<td>[0.965, 1.000]</td>
<td>3.759</td>
</tr>
<tr>
<td></td>
<td>VGG19 &#x002B; SoftCmp</td>
<td>99.9262</td>
<td>0.01359</td>
<td>98.64</td>
<td>2.54</td>
<td>[0.970, 1.000]</td>
<td><bold>5.561</bold></td>
</tr>
<tr>
<td>(Exp. 8) AMIC &#x0026;</td>
<td>VGG19 &#x002B; SoftCat&#x0026;Cmp</td>
<td><bold>99.9416</bold></td>
<td><bold>0.00929</bold></td>
<td><bold>99.07</bold></td>
<td><bold>2.97</bold></td>
<td> <bold>[0.977, 1.000]</bold></td>
<td>4.156</td>
</tr>
<tr>
<td>Softmax</td>
<td>ResNet50</td>
<td>99.5503</td>
<td>0.03197</td>
<td>96.80</td>
<td>Baseline</td>
<td>[0.944, 0.992]</td>
<td>3.021</td>
</tr>
<tr>
<td/>
<td>ResNet50 &#x002B; SoftCat</td>
<td>99.9683</td>
<td>0.01091</td>
<td>98.91</td>
<td>2.11</td>
<td>[0.975, 1.000]</td>
<td>4.841</td>
</tr>
<tr>
<td/>
<td>ResNet50 &#x002B; SoftCmp</td>
<td>99.9727</td>
<td>0.00803</td>
<td>99.20</td>
<td>2.39</td>
<td>[0.980, 1.000]</td>
<td><bold>5.047</bold></td>
</tr>
<tr>
<td/>
<td>ResNet50 &#x002B; SoftCat&#x0026;Cmp</td>
<td><bold>99.9804</bold></td>
<td><bold>0.00657</bold></td>
<td><bold>99.34</bold></td>
<td><bold>2.54</bold></td>
<td> <bold>[0.982, 1.000]</bold></td>
<td>4.826</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="table-12fn1" fn-type="other">
<p>Note: Results in bold are the best-augmented performance per VGG19-based and ResNet50-based approaches.</p>
</fn>
</table-wrap-foot>
</table-wrap><fig id="fig-9">
<label>Figure 9</label>
<caption>
<title>ROC performance of VGG19-based and ResNet50-based baseline and augmented approaches for ear biometric verification on the AMIC dataset. (<bold>a</bold>) Exp. 7 using the SVM classifier; (<bold>b</bold>) Exp. 8 using the Softmax-based classifier</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_68681-fig-9.tif"/>
</fig>
<p>The results in <xref ref-type="table" rid="table-12">Table 12</xref> and <xref ref-type="fig" rid="fig-9">Fig. 9</xref> indicate that SoftCat&#x0026;Cmp provides the greatest augmentation of CNN deep features and, in most cases, the best performance in challenging verification on AMIC. SoftCmp is generally the second-best performer and shows increased discrimination, with the highest genuine-imposter separability (<italic>d</italic><sup>&#x2032;</sup>). SoftCmp traits offer several advantages over SoftCat traits, especially in verification, as they can detect and characterize subtle differences between compared individuals based on specific soft biometric attributes.</p>

<p>In Exp. 7, using SVM, ResNet50 &#x002B; SoftCat&#x0026;Cmp achieved the highest ROC-AUC of 99.51% and accuracy (1 &#x2212; EER) of 96.72%, together with the lowest EER of 0.0328, although ResNet50 &#x002B; SoftCmp scored a higher <italic>d</italic><sup>&#x2032;</sup> of 0.394 and VGG19 &#x002B; SoftCmp scored the highest <italic>d</italic><sup>&#x2032;</sup> of 0.528. Notably, the SoftCmp traits proved effective in augmenting verification performance: VGG19 &#x002B; SoftCmp surpassed VGG19 &#x002B; SoftCat&#x0026;Cmp with a lower EER and a higher <italic>d</italic><sup>&#x2032;</sup> score, while VGG19 &#x002B; SoftCat&#x0026;Cmp retained a slightly larger ROC-AUC. <xref ref-type="fig" rid="fig-9">Fig. 9a</xref> compares the ROC curves of all approaches using SVM on AMIC. It can also be observed that the ResNet50-based approaches outperformed their VGG19-based counterparts in ROC-AUC and EER. This finding suggests that the proposed ResNet50-based augmented approaches are more reliable for robust ear verification systems.</p>
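<p>To make the relation between the EER, accuracy, and "Improve" columns of <xref ref-type="table" rid="table-12">Table 12</xref> concrete, the following minimal Python sketch assumes that accuracy is (1 &#x2212; EER) expressed as a percentage and that the improvement is the accuracy gain of an augmented approach over its own baseline, in percentage points. This reading is inferred from the tabulated values rather than taken from a stated formula, and the function names are hypothetical.</p>
<preformat>
```python
def accuracy_from_eer(eer):
    """Verification accuracy in percent, assumed to be (1 - EER) * 100."""
    return (1.0 - eer) * 100.0


def improvement(baseline_eer, augmented_eer):
    """Accuracy gain over the baseline, in percentage points (assumed)."""
    return accuracy_from_eer(augmented_eer) - accuracy_from_eer(baseline_eer)


# EER values copied from Table 12 (Exp. 7, SVM on AMIC).
resnet50_eer = 0.07045
resnet50_cat_cmp_eer = 0.03283

acc = round(accuracy_from_eer(resnet50_cat_cmp_eer), 2)
gain = round(improvement(resnet50_eer, resnet50_cat_cmp_eer), 2)
print(acc, gain)
```
</preformat>
<p>Under these assumptions, the sketch reproduces the 96.72% accuracy and 3.76 improvement listed for ResNet50 &#x002B; SoftCat&#x0026;Cmp in Exp. 7.</p>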

<p>In Exp. 8, using Softmax, the SoftCat&#x0026;Cmp traits consistently attained the best augmented verification results in ROC-AUC, accuracy (1 &#x2212; EER), and EER. However, SoftCmp achieved the highest genuine/imposter separability, with a <italic>d</italic><sup>&#x2032;</sup> of 5.56 for VGG19 &#x002B; SoftCmp and 5.05 for ResNet50 &#x002B; SoftCmp. This suggests that the SoftCmp traits are more viable discriminative traits for augmenting biometric verification than SoftCat. As shown in <xref ref-type="fig" rid="fig-9">Fig. 9b</xref> and <xref ref-type="table" rid="table-12">Table 12</xref>, ResNet50 &#x002B; SoftCat&#x0026;Cmp achieved the best augmented performance with a 99.98% ROC-AUC, 99.34% accuracy, and 0.0066 EER, followed by ResNet50 &#x002B; SoftCmp and then VGG19 &#x002B; SoftCat&#x0026;Cmp.</p>
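<p>For readers less familiar with the verification metrics discussed above, the following minimal Python sketch shows one common way to estimate the EER and the decidability index <italic>d</italic><sup>&#x2032;</sup> from genuine and imposter match scores. It is an illustration under stated assumptions, not the implementation used in the experiments: higher scores are assumed to indicate a genuine match, <italic>d</italic><sup>&#x2032;</sup> is assumed to follow the widely used definition (absolute difference of the two score means divided by the square root of the average of the two variances), and the function names and toy scores are hypothetical.</p>
<preformat>
```python
import math
import statistics


def equal_error_rate(genuine, impostor):
    """Approximate the EER by sweeping every observed score as a threshold."""
    best_gap, best_eer = None, None
    for t in sorted(set(genuine) | set(impostor)):
        # False accept: an impostor score reaches the threshold.
        far = sum(1 for s in impostor if s >= t) / len(impostor)
        # False reject: a genuine score falls below the threshold.
        frr = sum(1 for s in genuine if t > s) / len(genuine)
        gap = abs(far - frr)
        if best_gap is None or best_gap > gap:
            best_gap, best_eer = gap, (far + frr) / 2
    return best_eer


def d_prime(genuine, impostor):
    """Decidability index: gap between the score means in pooled-std units."""
    mu_g, mu_i = statistics.mean(genuine), statistics.mean(impostor)
    var_g, var_i = statistics.variance(genuine), statistics.variance(impostor)
    return abs(mu_g - mu_i) / math.sqrt((var_g + var_i) / 2)


# Hypothetical toy scores, not taken from AMI or AMIC.
genuine_scores = [0.9, 0.8, 0.4]
impostor_scores = [0.5, 0.2, 0.1]

eer = equal_error_rate(genuine_scores, impostor_scores)
print("EER:", eer, "accuracy (1 - EER):", 1 - eer)
print("d-prime:", d_prime(genuine_scores, impostor_scores))
```
</preformat>
<p>The two metrics capture different properties: the EER depends only on how the score distributions overlap at a single operating threshold, whereas <italic>d</italic><sup>&#x2032;</sup> summarizes how far apart the two distributions lie overall, which is why an approach can lead in one metric but not in the other.</p>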

</sec>
</sec>
<sec id="s4_3">
<label>4.3</label>
<title>Overall Performance Summary and Comparison with Related Studies</title>
<p>The overall ear identification and verification performance for all approaches used across the extended experimental work can be summarized as follows. <xref ref-type="table" rid="table-13">Table 13</xref> highlights selected key performance indicators for ear identification and verification, using both SVM and Softmax classifiers on both AMI and AMIC datasets, where the eight approaches are ranked by their overall performance as measured by all major evaluation metrics. Furthermore, <xref ref-type="fig" rid="fig-10">Fig. 10</xref> visualizes the overall performance comparison of CMC-based metrics for identification and ROC-based metrics for verification and emphasizes the resulting overall rank of all eight approaches. In addition, <xref ref-type="table" rid="table-14">Table 14</xref> compares this research with several related studies.</p>
<table-wrap id="table-13">
<label>Table 13</label>
<caption>
<title>Summary analysis of overall ear identification and verification performance on AMI and AMIC using SVM and Softmax, where approaches are ranked based on key performance indicators</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col/>
<col/>
<col/>
<col align="center"/>
<col/>
<col/>
<col/>
<col align="center"/>
<col/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th align="center" rowspan="3">Approach</th>
<th colspan="4">Identification performance</th>
<th colspan="5">Verification performance</th>
<th align="center" rowspan="3">Overall rank</th>
</tr>
<tr>
<th colspan="3">R1 accuracy (Acc.)</th>
<th align="center" rowspan="2">Max CMC-<break/>AUC</th>
<th colspan="3">(1 &#x2212; EER) accuracy (Acc.)</th>
<th align="center" rowspan="2">Max ROC-<break/>AUC</th>
<th rowspan="2">Max <italic>d</italic><sup>&#x2032;</sup></th>

</tr>
<tr>
<th>Max</th>
<th>Avg.</th>
<th>Improve</th>
<th>Max</th>
<th>Avg.</th>
<th>Improve</th>
</tr>
</thead>
<tbody>
<tr>
<td>VGG19</td>
<td>95.50</td>
<td>86.25</td>
<td>Baseline</td>
<td>98.888</td>
<td>98.94</td>
<td>93.13</td>
<td>Baseline</td>
<td>99.9752</td>
<td>2.730</td>
<td>8th</td>
</tr>
<tr>
<td>VGG19 &#x002B; SoftCat</td>
<td>99.00</td>
<td>93.88</td>
<td>11.50%</td>
<td>98.945</td>
<td>99.29</td>
<td>95.44</td>
<td>3.56%</td>
<td>99.9963</td>
<td>3.022</td>
<td>6th</td>
</tr>
<tr>
<td>VGG19 &#x002B; SoftCmp</td>
<td>98.50</td>
<td>93.63</td>
<td>11.50%</td>
<td>98.943</td>
<td>99.36</td>
<td>98.56</td>
<td>4.20%</td>
<td>99.9959</td>
<td><bold>3.754</bold></td>
<td>5th</td>
</tr>
<tr>
<td>VGG19 &#x002B; SoftCat&#x0026;Cmp</td>
<td><bold>99.50</bold></td>
<td><bold>94.75</bold></td>
<td><bold>12.50%</bold></td>
<td><bold>98.953</bold></td>
<td><bold>99.38</bold></td>
<td><bold>98.03</bold></td>
<td><bold>3.83%</bold></td>
<td><bold>99.9968</bold></td>
<td>2.759</td>
<td><bold>4th</bold></td>
</tr>
<tr>
<td>ResNet50</td>
<td>96.50</td>
<td>90.38</td>
<td>Baseline</td>
<td>98.848</td>
<td>98.66</td>
<td>97.67</td>
<td>Baseline</td>
<td>99.9668</td>
<td>2.548</td>
<td>7th</td>
</tr>
<tr>
<td>ResNet50 &#x002B; SoftCat</td>
<td>98.50</td>
<td>95.63</td>
<td>12.00%</td>
<td>98.898</td>
<td>99.69</td>
<td>98.04</td>
<td>3.37%</td>
<td>99.9797</td>
<td>2.895</td>
<td>3rd</td>
</tr>
<tr>
<td>ResNet50 &#x002B; SoftCmp</td>
<td>99.00</td>
<td>96.13</td>
<td>13.00%</td>
<td>98.975</td>
<td>99.51</td>
<td>99.24</td>
<td>2.80%</td>
<td>99.9964</td>
<td>2.756</td>
<td>2nd</td>
</tr>
<tr>
<td>ResNet50 &#x002B; SoftCat&#x0026;Cmp</td>
<td><bold>99.50</bold></td>
<td><bold>96.88</bold></td>
<td><bold>14.00%</bold></td>
<td><bold>98.993</bold></td>
<td><bold>99.94</bold></td>
<td><bold>99.45</bold></td>
<td><bold>3.76%</bold></td>
<td><bold>99.9997</bold></td>
<td><bold>3.073</bold></td>
<td><bold>1st</bold></td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="table-13fn1" fn-type="other">
<p>Note: Results in bold are the best-augmented performance per VGG19-based and ResNet50-based approaches.</p>
</fn>
</table-wrap-foot>
</table-wrap><fig id="fig-10">
<label>Figure 10</label>
<caption>
<title>Overall performance summary of CMC-based identification metrics and ROC-based verification metrics on AMI and AMIC using SVM and Softmax, where approaches are ranked from the highest to the lowest performance</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_68681-fig-10.tif"/>
</fig><table-wrap id="table-14">
<label>Table 14</label>
<caption>
<title>Comparison with the most relevant studies using machine learning (ML) and deep learning (DL)</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th align="center" rowspan="2">[Ref.]</th>
<th align="center" rowspan="2">Biometric method &#x0026; task</th>
<th align="center" rowspan="2">Dataset</th>
<th rowspan="2">Hard biometrics</th>
<th colspan="3">Soft biometrics</th>
<th rowspan="2">Fusion</th>
<th rowspan="2">Classifier</th>
<th align="center" rowspan="2">R1 Acc./(1 &#x2212; EER)</th>
<th align="center" rowspan="2">Improve</th>
</tr>
<tr>
<th>Categorical</th>
<th>Relative</th>
<th>Comparative</th>

</tr>
</thead>
<tbody>
<tr>
<td>[<xref ref-type="bibr" rid="ref-19">19</xref>]</td>
<td>DL GAN<break/>augmented &#x0026; identification</td>
<td>AMI</td>
<td>EarNet &#x002B; Pix2Pix GAN</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>GAN discriminator</td>
<td>98.0%</td>
<td>5.0%</td>
</tr>
<tr>
<td>[<xref ref-type="bibr" rid="ref-12">12</xref>]</td>
<td>DL vision augmented &#x0026; identification</td>
<td>AMI</td>
<td>ResNet-50</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>FC Linear classifier</td>
<td>99.35%</td>
<td>1.55%</td>
</tr>
<tr>
<td/>
<td/>
<td>EarVN1.0</td>
<td>EfficientNet-B7</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td>98.1%</td>
<td>8.8%</td>
</tr>
<tr>
<td>[<xref ref-type="bibr" rid="ref-20">20</xref>]</td>
<td>DL GAN augmented &#x0026; identification</td>
<td>AMI</td>
<td>DCGAN &#x002B; AlexNet/VGG-(16/19)</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>Softmax</td>
<td>96.0%</td>
<td>0.5%</td>
</tr>
<tr>
<td/>
<td/>
<td>AWE</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td>50.53%</td>
<td>3.28%</td>
</tr>
<tr>
<td>[<xref ref-type="bibr" rid="ref-35">35</xref>]</td>
<td>DL DeepBio &#x0026; identification</td>
<td>AMI</td>
<td>CNN &#x002B; Bi-LSTM</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>Feature-level</td>
<td>Softmax</td>
<td>98.57%</td>
<td>0.47%</td>
</tr>
<tr>
<td>[<xref ref-type="bibr" rid="ref-5">5</xref>]</td>
<td>DL ensembles &#x0026;<break/>identification</td>
<td>AMI</td>
<td>VGG-(13, 16, 19)</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>Score-level</td>
<td>Avg. post probability of Log-Softmax</td>
<td>97.5%</td>
<td>3.21%</td>
</tr>
<tr>
<td/>
<td/>
<td>AMIC</td>
<td>VGG-(11, 13, 16, 19)</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td>93.21%</td>
<td>3.92%</td>
</tr>
<tr>
<td/>
<td/>
<td>WPUT</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td>79.08%</td>
<td>8.03%</td>
</tr>
<tr>
<td>[<xref ref-type="bibr" rid="ref-22">22</xref>]</td>
<td>ML/DL &#x0026; identification</td>
<td>IIT Delhi1</td>
<td>AlexNet</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>Classifier-level</td>
<td>Softmax</td>
<td>94.29%</td>
<td>0.79%</td>
</tr>
<tr>
<td/>
<td/>
<td>AMI</td>
<td>PCA-subsets of ResNet-50</td>
<td/>
<td/>
<td/>
<td/>
<td>Ensemble classifier</td>
<td>99.45%</td>
<td>0.45%</td>
</tr>
<tr>
<td>[<xref ref-type="bibr" rid="ref-23">23</xref>]</td>
<td>DL feature extraction &#x0026; identification</td>
<td>AMI</td>
<td>CFDCNet based on DenseNet-121</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>Feature-level</td>
<td>FC linear classifier</td>
<td>99.7%</td>
<td>2.7%</td>
</tr>
<tr>
<td/>
<td/>
<td>AWE</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td>72.7%</td>
<td>10.7%</td>
</tr>
<tr>
<td>[<xref ref-type="bibr" rid="ref-11">11</xref>]</td>
<td>ML &#x0026; identification</td>
<td>AWE</td>
<td>LGBP</td>
<td>3 traits</td>
<td>&#x00D7;</td>
<td>&#x00D7;</td>
<td>Score-level</td>
<td>Bayesian</td>
<td>59.5%</td>
<td>5.3%</td>
</tr>
<tr>
<td>[<xref ref-type="bibr" rid="ref-9">9</xref>]</td>
<td>ML &#x0026; identification</td>
<td>Newborn Database</td>
<td>Haar wavelet transform</td>
<td>2 traits</td>
<td>2 traits</td>
<td>&#x00D7;</td>
<td>Score-level</td>
<td>Bayesian</td>
<td>90.7%</td>
<td>5.59%</td>
</tr>
<tr>
<td rowspan="3">[<xref ref-type="bibr" rid="ref-1">1</xref>]</td>
<td>ML &#x0026; identification</td>
<td>AMIC</td>
<td>LBP &#x002B; PCA</td>
<td>20 traits</td>
<td>13 traits</td>
<td>&#x00D7;</td>
<td>Feature-level</td>
<td>k-nearest neighbor (KNN)</td>
<td>86.0%</td>
<td>12.0%</td>
</tr>
<tr>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td>SVM</td>
<td>84.5%</td>
<td/>
</tr>
<tr>
<td>verification</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td/>
<td>KNN</td>
<td>92.9%</td>
<td>8.4%</td>
</tr>
<tr>
<td>This study [ours]</td>
<td>DL feature extraction &#x0026; identification</td>
<td>AMI</td>
<td>VGG-19</td>
<td>21 traits</td>
<td>13 traits</td>
<td>13 traits</td>
<td>Feature-level</td>
<td>Softmax</td>
<td>99.5%</td>
<td>4.0%</td>
</tr>
<tr>
<td/>
<td/>
<td/>
<td>ResNet-50</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td>99.5%</td>
<td>3.0%</td>
</tr>
<tr>
<td/>
<td/>
<td>AMIC</td>
<td>VGG-19</td>
<td/>
<td/>
<td/>
<td/>
<td>SVM</td>
<td>93.5%</td>
<td>6.5%</td>
</tr>
<tr>
<td/>
<td/>
<td/>
<td>ResNet-50</td>
<td/>
<td/>
<td/>
<td/>
<td>Softmax</td>
<td>96.0%</td>
<td>14.0%</td>
</tr>
<tr>
<td/>
<td>DL feature extraction &#x0026; verification</td>
<td>AMI</td>
<td>VGG-19</td>
<td>21 traits</td>
<td>13 traits</td>
<td>13 traits</td>
<td>Feature-level</td>
<td>Softmax</td>
<td>99.38%</td>
<td>2.79%</td>
</tr>
<tr>
<td/>
<td/>
<td/>
<td>ResNet-50</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td>99.94%</td>
<td>1.28%</td>
</tr>
<tr>
<td/>
<td/>
<td>AMIC</td>
<td>VGG-19</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td>99.07%</td>
<td>2.97%</td>
</tr>
<tr>
<td/>
<td/>
<td/>
<td>ResNet-50</td>
<td/>
<td/>
<td/>
<td/>
<td/>
<td>99.34%</td>
<td>2.54%</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Based on the overall performance reported in <xref ref-type="table" rid="table-13">Table 13</xref> and <xref ref-type="fig" rid="fig-10">Fig. 10</xref>, regardless of whether SVM or Softmax was used, all six proposed augmented approaches successfully enhanced the deep-feature performance. Consequently, the standard ResNet50 and VGG19 approaches without augmentation ranked last, seventh and eighth overall, respectively, underscoring the effectiveness of all augmented approaches. Whether the AMI or AMIC dataset was used, as shown in <xref ref-type="table" rid="table-13">Table 13</xref>, all maximum R1 accuracy scores were obtained using Softmax as the classifier. As such, Softmax is the better-performing and more effective classifier compared with SVM in such ear recognition problems.</p>

<p>All three ResNet50-based augmented approaches were assigned overall ranks from 1st to 3rd since they outperformed all three VGG19-based approaches, which were assigned lower overall ranks from 4th to 6th. Hence, compared to VGG19, the ResNet50 deep features are more reliable and discriminative in the ear identification/verification context, especially when fused with soft biometrics. VGG19 &#x002B; SoftCat&#x0026;Cmp and ResNet50 &#x002B; SoftCat&#x0026;Cmp were the top performers within their respective VGG19-based and ResNet50-based groups. ResNet50 &#x002B; SoftCat&#x0026;Cmp ranked first among all, owing to its superiority in nearly all evaluations. Thus, SoftCat&#x0026;Cmp can provide the greatest augmentation for VGG19 and ResNet50 deep features by combining both categorical and comparative soft biometric capabilities.</p>
<p>ResNet50 &#x002B; SoftCmp was ranked second-best, indicating that the comparative soft biometrics in SoftCmp are more perceptive and informative than the categorical soft biometrics used in SoftCat. Notably, VGG19 &#x002B; SoftCmp obtained the highest overall <italic>d</italic><sup>&#x2032;</sup> in verification, enabling increased separability between the genuine and imposter populations for more confident accept/reject thresholding. The SoftCmp traits augmented the ResNet50 deep features better, whereas the SoftCat traits were slightly better at augmenting the VGG19 deep features in identification (<xref ref-type="table" rid="table-13">Table 13</xref>). That is, the comparative soft biometric traits (SoftCmp) contribute most to augmenting ResNet50, whereas the categorical soft biometric traits (SoftCat) integrate more effectively with VGG19 for augmenting identification performance.</p>
<p>The primary focus of this research is to investigate the capabilities of newly proposed ear soft biometrics in augmenting the performance of traditional hard biometric CNN-based deep features. Hence, as an initial novel study, all experiments were conducted using the AMI and AMIC datasets along with their available raw soft labels, while varying recognition scenarios, hard biometric deep-feature extractors, soft biometric augmentation methods, and matching classifiers. Through these variations, the performance variability and generalizability were evaluated, with the results consistently supporting the robustness and superiority of the proposed augmented approaches. AMI and AMIC were utilized as different variants of widely used standard benchmark datasets, posing various difficulty levels and challenge aspects [<xref ref-type="bibr" rid="ref-1">1</xref>,<xref ref-type="bibr" rid="ref-3">3</xref>,<xref ref-type="bibr" rid="ref-5">5</xref>,<xref ref-type="bibr" rid="ref-21">21</xref>,<xref ref-type="bibr" rid="ref-23">23</xref>]. They were chosen over others due to the availability of their raw soft labels, making them best suited to the current research context and initial study. However, they can be replaced with any other dataset once soft labels are annotated for all ear images. To maximize dataset generalizability, variations in illumination, occlusion, deformation, and ear accessories pose further challenging conditions to be considered in a forthcoming extended exploration of the proposed framework on other datasets that incorporate such conditions, e.g., WPUT, EarVN1.0, and USTB, for extensive cross-dataset performance validation and comparison.</p>
<p>Several remarks are noteworthy regarding the computational complexity anticipated for the proposed augmented approaches. In all six augmented approaches, both the VGG-19 and ResNet-50 pre-trained models are used merely as backbones for deep feature extraction, which is the most computationally intensive step. Then, the extracted 512 VGG-19-based and 2048 ResNet-50-based deep features are supplemented by fusing them with different combinations of either 13, 34, or 47 soft biometrics. As such, each augmented approach introduces moderate computational complexity, which primarily relies on the conventional processes for deep feature extraction. That is, the additional fusion of low-dimensional soft biometric groups adds only a negligible footprint in comparison. RankSVM is run only once for each of the 13 comparable soft biometric attributes, as a separate prior process needed to learn a ranking function. Its computational cost is considerably reduced by generating only about 20% of all possible pairwise comparisons and using them to enforce the desired ordering, and the cost of applying the learned ranking functions to map relative features into comparative soft biometrics is also negligible. Thus, all newly added processes in the proposed augmented approaches yield a tractable representation that enables scalable ear biometric recognition with minimal impact on runtime. The overall architecture offers high discriminative power, achieving significant performance improvements with manageable memory and computational demands, particularly during identification and verification.</p>
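<p>The dimensional footprint described above can be illustrated with a minimal Python sketch. It assumes that feature-level fusion is a plain concatenation of the deep-feature vector with the soft biometric vector, and that the 13, 34, and 47 soft biometrics correspond to the SoftCmp, SoftCat, and combined groups, respectively; both points are assumptions made for illustration. The helper names, the group label for the combined traits, and the uniform random pair sampling are likewise hypothetical and do not reproduce the RankSVM training itself.</p>
<preformat>
```python
import itertools
import random

# Dimensions quoted in the text; mapping them to group names is an assumption.
DEEP_DIMS = {"VGG19": 512, "ResNet50": 2048}
SOFT_DIMS = {"SoftCmp": 13, "SoftCat": 34, "SoftCat+SoftCmp": 47}


def fuse(deep_features, soft_traits):
    """Feature-level fusion, assumed here to be plain concatenation."""
    return list(deep_features) + list(soft_traits)


def sample_pairs(n_items, fraction=0.2, seed=0):
    """Draw a random subset of all unordered pairs for ranking constraints."""
    pairs = list(itertools.combinations(range(n_items), 2))
    k = round(fraction * len(pairs))
    return random.Random(seed).sample(pairs, k)


# Fused vector length for every backbone and soft-trait combination.
for backbone, d in DEEP_DIMS.items():
    for group, s in SOFT_DIMS.items():
        fused = fuse([0.0] * d, [0.0] * s)
        print(backbone, group, len(fused))

# With 10 hypothetical items there are 45 pairs; about 20% of them is 9.
print(len(sample_pairs(10)))
```
</preformat>
<p>Under these assumptions, the largest soft biometric group lengthens the fused vector by roughly 9% for VGG-19 (47 of 512) and roughly 2% for ResNet-50 (47 of 2048), which is consistent with the small footprint noted above.</p>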
<p>Finally, the current research study was compared with several related studies to gain a deeper understanding and differentiation. <xref ref-type="table" rid="table-14">Table 14</xref> compares the different characteristics and performance aspects of this research with those of the most relevant studies. The comparisons highlight its contributions and advantages over existing literature. They also accentuate the promising ear recognition results of the proposed soft biometric-based augmented approaches, which offer performance competitive with earlier relevant hard and soft biometric approaches utilizing machine learning or deep learning technologies.</p>

</sec>
</sec>
<sec id="s5">
<label>5</label>
<title>Conclusions</title>
<p>This research study introduces a novel framework empowered by increased discriminatory soft biometrics to augment CNN-based ear biometric recognition. The framework extracts a group of fine-grained categorical and newly proposed comparative soft biometrics as more perceptive traits. It also extracts VGG and ResNet deep features as traditional hard biometric traits. It then enforces feature-level fusion of hard biometric deep features with different combinations of categorical and comparative soft biometric traits, resulting in several augmented approaches. It finally conducts multiple human identification and verification experiments to evaluate, analyze, and compare the performance of augmented vs. unaugmented approaches while varying ear image datasets, hard biometric deep-feature extractors, and classifiers.</p>
<p>The findings show that soft biometrics can effectively augment CNN deep features for enhanced ear recognition. Comparative soft biometric traits can offer increased discrimination and augmentation capabilities compared to categorical traits, even when fewer comparative traits are used in isolation. Combining both categorical and comparative soft biometrics and integrating their capabilities can improve recognition even further. The experimental investigation reveals significantly augmented identification and verification with promising performance results, which reach up to 99.94% accuracy and improve by up to 14%.</p>
<p>The limited availability of categorical soft biometrics for large-scale ear image datasets is a possible limitation of this study, making automation of the categorical soft biometric labeling process a priority for future work. Such an initiative is motivated by this study, and its pursuit can be inspired by the practical automatic comparative soft biometric labeling and feature extraction offered by the proposed framework. Another limitation to address in future work is investigating the expected invariance capabilities of the proposed comparative-based soft biometrics on other challenging datasets, such as WPUT, EarVN1.0, and USTB, with various deformations, occlusions, illumination conditions, and ear accessories. The proposed method can also supplement other ear-dedicated methods and be compared with their standard performance.</p>
<p>In future work, the proposed augmented ear biometric approaches can be extended to further applications and more problem-specific biometric scenarios involving person identification, verification, re-identification, and retrieval. They can also contribute to multimodal biometrics, for example as hard-soft ear biometrics used along with face biometrics for augmenting side/profile face recognition. Further extended analysis of potential correlations between the fusible soft and hard features may provide a better understanding and practical insights for future investigations.</p>
</sec>
</body>
<back>
<ack>
<p>The author acknowledges with thanks KAU Endowment (WAQF) and the Deanship of Scientific Research (DSR) at King Abdulaziz University (KAU), Jeddah, Saudi Arabia, for financial support for this research publication. The author would also like to thank Ms. Ghoroub Talal Bostaji for collaborating in an earlier related research study, yielding the raw soft label data utilized in this research.</p>
</ack>
<sec>
<title>Funding Statement</title>
<p>This research was funded by WAQF at King Abdulaziz University, Jeddah, Saudi Arabia.</p>
</sec>
<sec sec-type="data-availability">
<title>Availability of Data and Materials</title>
<p>The datasets used in this article are available in the AMI Ear Database [<xref ref-type="bibr" rid="ref-30">30</xref>] and the AMI-Based AMIC and Ear Categorical Soft Biometric Labels datasets [<xref ref-type="bibr" rid="ref-1">1</xref>].</p>
</sec>
<sec>
<title>Ethics Approval</title>
<p>Not applicable.</p>
</sec>
<sec sec-type="COI-statement">
<title>Conflicts of Interest</title>
<p>The author declares no conflicts of interest to report regarding the present study.</p>
</sec>
<app-group id="appg-1">
<app id="app-1">
<title>Appendix A</title>
<fig id="fig-11">
<label>Figure A1</label>
<caption>
<title>Human ear anatomical structure, where the overlay-colored parts are the most significant for physical feature extraction (hard biometrics) and the most discriminative, describable, and comparable for practical soft biometrics</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMES_68681-fig-11.tif"/>
</fig>
<table-wrap id="table-15">
<label>Table A1</label>
<caption>
<title>Ear soft biometric attributes and their corresponding categorical/relative labels</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th align="center">Ear Zone</th>
<th align="center">ID</th>
<th align="center">Soft biometric attribute</th>
<th align="center">Categorical/Relative labels</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="4">General</td>
<td><bold>A1</bold></td>
<td><bold>Ear shape</bold><sup>1</sup></td>
<td><bold>[very simple, simple, medium, complex, very complex]</bold></td>
</tr>
<tr>

<td><bold>A2</bold></td>
<td><bold>Width to length ratio</bold></td>
<td><bold>[much lower, lower, similar, higher, much higher, can&#x2019;t see]</bold></td>
</tr>
<tr>

<td><bold>A3</bold></td>
<td><bold>Ear coverage</bold></td>
<td><bold>[none, slight, fair, most, all]</bold></td>
</tr>
<tr>

<td>A4</td>
<td>Ear abnormality</td>
<td>[normal, abnormal]</td>
</tr>
<tr>
<td rowspan="4">Scapha</td>
<td><bold>A5</bold></td>
<td><bold>Scapha size</bold></td>
<td><bold>[very small, small, medium, large, very large]</bold></td>
</tr>
<tr>

<td>A6</td>
<td>Scapha shape</td>
<td>[flat, convex, other]</td>
</tr>
<tr>

<td>A7</td>
<td>Scapha piercing</td>
<td>[yes, no, can&#x2019;t see]</td>
</tr>
<tr>

<td>A8</td>
<td>Scapha holes</td>
<td>[none, single, double, multiple]</td>
</tr>
<tr>
<td rowspan="4">Helix</td>
<td>A9</td>
<td>Helix shape</td>
<td>[bent, flat]</td>
</tr>
<tr>

<td>A10</td>
<td>Antihelix shape</td>
<td>[prominent, normal, flat]</td>
</tr>
<tr>

<td>A11</td>
<td>Helix piercing</td>
<td>[yes, no, can&#x2019;t see]</td>
</tr>
<tr>

<td>A12</td>
<td>Helix holes</td>
<td>[none, single, double, multiple]</td>
</tr>
<tr>
<td rowspan="5">Earlobe</td>
<td><bold>A13</bold></td>
<td><bold>Earlobe length</bold></td>
<td><bold>[very short, short, average, long, very long]</bold></td>
</tr>
<tr>

<td><bold>A14</bold></td>
<td><bold>Earlobe size</bold></td>
<td><bold>[small, medium, large]</bold></td>
</tr>
<tr>

<td>A15</td>
<td>Earlobe holes</td>
<td>[none, single, double, multiple]</td>
</tr>
<tr>

<td>A16</td>
<td>Earlobe connected with face</td>
<td>[yes, no, can&#x2019;t see]</td>
</tr>
<tr>

<td>A17</td>
<td>Earlobe piercing</td>
<td>[yes, no, can&#x2019;t see]</td>
</tr>
<tr>
<td rowspan="4">Tragus</td>
<td><bold>A18</bold></td>
<td><bold>Space between tragus &#x0026; antitragus</bold></td>
<td><bold>[small, medium, large]</bold></td>
</tr>
<tr>

<td><bold>A19</bold></td>
<td><bold>Tragus thickness</bold></td>
<td><bold>[thin, average, thick]</bold></td>
</tr>
<tr>

<td>A20</td>
<td>Tragus shape</td>
<td>[curved, straight, other]</td>
</tr>
<tr>

<td>A21</td>
<td>Tragus piercing</td>
<td>[yes, no, can&#x2019;t see]</td>
</tr>
<tr>
<td rowspan="2">Ear hair</td>
<td><bold>A22</bold></td>
<td><bold>Hair density</bold></td>
<td><bold>[none, little, some, much, very much]</bold></td>
</tr>
<tr>

<td>A23</td>
<td>Hair color</td>
<td>[none, black, white, gray, blonde, brown, red, brunette]</td>
</tr>
<tr>
<td rowspan="8">Skin</td>
<td><bold>A24</bold></td>
<td><bold>Skin mole</bold></td>
<td><bold>[none, very few, few, many, too many]</bold></td>
</tr>
<tr>

<td><bold>A25</bold></td>
<td><bold>Skin spots</bold></td>
<td><bold>[none, minimal, fair, marked, prominent]</bold></td>
</tr>
<tr>

<td><bold>A26</bold></td>
<td><bold>Skin crusts</bold></td>
<td><bold>[none, minimal, fair, marked, prominent]</bold></td>
</tr>
<tr>

<td><bold>A27</bold></td>
<td><bold>Skin tone</bold></td>
<td><bold>[very light, light, medium, dark, very dark]</bold></td>
</tr>
<tr>

<td>A28</td>
<td>Skin color</td>
<td>[white, oriental, tanned, brown, black]</td>
</tr>
<tr>

<td>A29</td>
<td>Skin texture</td>
<td>[smooth, wrinkles, can&#x2019;t see]</td>
</tr>
<tr>

<td>A30</td>
<td>Pre-auricular skin tag</td>
<td>[yes, no, can&#x2019;t see]</td>
</tr>
<tr>

<td>A31</td>
<td>Pre-auricular vertical lines</td>
<td>[yes, no, can&#x2019;t see]</td>
</tr>
<tr>
<td rowspan="2">Accessories</td>
<td>A32</td>
<td>Tattoos</td>
<td>[yes, no, can&#x2019;t see]</td>
</tr>
<tr>

<td>A33</td>
<td>Earphone presence</td>
<td>[yes, no, can&#x2019;t see]</td>
</tr>
<tr>
<td><italic>Global</italic></td>
<td><italic>A34</italic></td>
<td><italic>Gender</italic><sup>2</sup></td>
<td>[<italic>male, female</italic>]</td>
</tr>
</tbody>
</table>
<table-wrap-foot>
<fn id="table-15fn1" fn-type="other">
<p>Note: <sup>1</sup>All 13 bolded soft biometric attributes are described using relative labels to derive relative-based categorical soft biometrics. <sup>2</sup>A newly annotated global soft biometric attribute added to the others.</p>
</fn>
</table-wrap-foot>
</table-wrap>
</app>
</app-group>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Bostaji</surname> <given-names>GT</given-names></string-name>, <string-name><surname>Jaha</surname> <given-names>ES</given-names></string-name></person-group>. <article-title>Fine-grained soft ear biometrics for augmenting human recognition</article-title>. <source>Comput Syst Sci Eng</source>. <year>2023</year>;<volume>47</volume>(<issue>2</issue>):<fpage>1571</fpage>&#x2013;<lpage>91</lpage>. doi:<pub-id pub-id-type="doi">10.32604/csse.2023.039701</pub-id>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Nelufule</surname> <given-names>N</given-names></string-name>, <string-name><surname>Mabuza-Hocquet</surname> <given-names>G</given-names></string-name>, <string-name><surname>de Kock</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Circular interpolation techniques towards accurate segmentation of iris biometric images for infants</article-title>. In: <conf-name>Proceedings of the 2020 International SAUPEC/RobMech/PRASA Conference; 2020 Jan 29&#x2013;31</conf-name>; <publisher-loc>Cape Town, South Africa</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>6</lpage>. doi:<pub-id pub-id-type="doi">10.1109/saupec/robmech/prasa48453.2020.9041135</pub-id>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Benzaoui</surname> <given-names>A</given-names></string-name>, <string-name><surname>Khaldi</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Bouaouina</surname> <given-names>R</given-names></string-name>, <string-name><surname>Amrouni</surname> <given-names>N</given-names></string-name>, <string-name><surname>Alshazly</surname> <given-names>H</given-names></string-name>, <string-name><surname>Ouahabi</surname> <given-names>A</given-names></string-name></person-group>. <article-title>A comprehensive survey on ear recognition: databases, approaches, comparative analysis, and open challenges</article-title>. <source>Neurocomputing</source>. <year>2023</year>;<volume>537</volume>(<issue>1&#x2013;3</issue>):<fpage>236</fpage>&#x2013;<lpage>70</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.neucom.2023.03.040</pub-id>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Kamboj</surname> <given-names>A</given-names></string-name>, <string-name><surname>Rani</surname> <given-names>R</given-names></string-name>, <string-name><surname>Nigam</surname> <given-names>A</given-names></string-name></person-group>. <article-title>A comprehensive survey and deep learning-based approach for human recognition using ear biometric</article-title>. <source>Vis Comput</source>. <year>2022</year>;<volume>38</volume>(<issue>7</issue>):<fpage>2383</fpage>&#x2013;<lpage>416</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s00371-021-02119-0</pub-id>; <pub-id pub-id-type="pmid">33907343</pub-id>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Alshazly</surname> <given-names>H</given-names></string-name>, <string-name><surname>Linse</surname> <given-names>C</given-names></string-name>, <string-name><surname>Barth</surname> <given-names>E</given-names></string-name>, <string-name><surname>Martinetz</surname> <given-names>T</given-names></string-name></person-group>. <article-title>Ensembles of deep learning models and transfer learning for ear recognition</article-title>. <source>Sensors</source>. <year>2019</year>;<volume>19</volume>(<issue>19</issue>):<fpage>4139</fpage>. doi:<pub-id pub-id-type="doi">10.3390/s19194139</pub-id>; <pub-id pub-id-type="pmid">31554303</pub-id>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Jaha</surname> <given-names>ES</given-names></string-name>, <string-name><surname>Nixon</surname> <given-names>MS</given-names></string-name></person-group>. <article-title>From clothing to identity: manual and automatic soft biometrics</article-title>. <source>IEEE Trans Inf Forensics Secur</source>. <year>2016</year>;<volume>11</volume>(<issue>10</issue>):<fpage>2377</fpage>&#x2013;<lpage>90</lpage>. doi:<pub-id pub-id-type="doi">10.1109/TIFS.2016.2584001</pub-id>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Hassan</surname> <given-names>B</given-names></string-name>, <string-name><surname>Izquierdo</surname> <given-names>E</given-names></string-name>, <string-name><surname>Piatrik</surname> <given-names>T</given-names></string-name></person-group>. <article-title>Soft biometrics: a survey: benchmark analysis, open challenges and recommendations</article-title>. <source>Multimed Tools Appl</source>. <year>2024</year>;<volume>83</volume>(<issue>5</issue>):<fpage>15151</fpage>&#x2013;<lpage>94</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s11042-021-10622-8</pub-id>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Alsubhi</surname> <given-names>AH</given-names></string-name>, <string-name><surname>Jaha</surname> <given-names>ES</given-names></string-name></person-group>. <article-title>Front-to-side hard and soft biometrics for augmented zero-shot side face recognition</article-title>. <source>Sensors</source>. <year>2025</year>;<volume>25</volume>(<issue>6</issue>):<fpage>1638</fpage>. doi:<pub-id pub-id-type="doi">10.3390/s25061638</pub-id>; <pub-id pub-id-type="pmid">40292688</pub-id>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Tiwari</surname> <given-names>S</given-names></string-name></person-group>. <article-title>Fusion of ear and soft-biometrics for recognition of newborn</article-title>. <source>Signal Image Process</source>. <year>2012</year>;<volume>3</volume>(<issue>3</issue>):<fpage>103</fpage>&#x2013;<lpage>16</lpage>. doi:<pub-id pub-id-type="doi">10.5121/sipij.2012.3309</pub-id>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Purkait</surname> <given-names>R</given-names></string-name></person-group>. <article-title>Application of external ear in personal identification: a somatoscopic study in families</article-title>. <source>Ann Forensic Res Anal</source>. <year>2015</year>;<volume>2</volume>(<issue>1</issue>):<fpage>1015</fpage>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Saeed</surname> <given-names>U</given-names></string-name>, <string-name><surname>Khan</surname> <given-names>MM</given-names></string-name></person-group>. <article-title>Combining ear-based traditional and soft biometrics for unconstrained ear recognition</article-title>. <source>J Electron Imag</source>. <year>2018</year>;<volume>27</volume>(<issue>5</issue>):<fpage>051220</fpage>. doi:<pub-id pub-id-type="doi">10.1117/1.jei.27.5.051220</pub-id>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Mohamed</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Youssef</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Heakl</surname> <given-names>A</given-names></string-name>, <string-name><surname>Zaky</surname> <given-names>AB</given-names></string-name></person-group>. <article-title>Advancing ear biometrics: enhancing accuracy and robustness through deep learning</article-title>. In: <conf-name>Proceedings of the IEEE Intelligent Methods, Systems, and Applications (IMSA); 2024 Jul 13&#x2013;14</conf-name>; <publisher-loc>Giza, Egypt</publisher-loc>. p. <fpage>437</fpage>&#x2013;<lpage>42</lpage>. doi:<pub-id pub-id-type="doi">10.1109/IMSA61967.2024.10652851</pub-id>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Sharma</surname> <given-names>R</given-names></string-name>, <string-name><surname>Sandhu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Bharti</surname> <given-names>V</given-names></string-name></person-group>. <article-title>Improved multimodal biometric security: OAWG-MSVM for optimized feature-level fusion and human authentication</article-title>. In: <conf-name>Proceedings of the 2024 International Conference on Intelligent Systems for Cybersecurity (ISCS); 2024 May 3&#x2013;4</conf-name>; <publisher-loc>Gurugram, India</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>8</lpage>. doi:<pub-id pub-id-type="doi">10.1109/ISCS61804.2024.10581400</pub-id>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Alshazly</surname> <given-names>H</given-names></string-name>, <string-name><surname>Linse</surname> <given-names>C</given-names></string-name>, <string-name><surname>Barth</surname> <given-names>E</given-names></string-name>, <string-name><surname>Martinetz</surname> <given-names>T</given-names></string-name></person-group>. <article-title>Deep convolutional neural networks for unconstrained ear recognition</article-title>. <source>IEEE Access</source>. <year>2020</year>;<volume>8</volume>:<fpage>170295</fpage>&#x2013;<lpage>310</lpage>. doi:<pub-id pub-id-type="doi">10.1109/access.2020.3024116</pub-id>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Toygar</surname> <given-names>&#x00D6;</given-names></string-name>, <string-name><surname>Alqaralleh</surname> <given-names>E</given-names></string-name>, <string-name><surname>Afaneh</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Symmetric ear and profile face fusion for identical twins and non-twins recognition</article-title>. <source>Signal Image Video Process</source>. <year>2018</year>;<volume>12</volume>(<issue>6</issue>):<fpage>1157</fpage>&#x2013;<lpage>64</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s11760-018-1263-3</pub-id>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Emer&#x0161;i&#x010D;</surname> <given-names>&#x017D;</given-names></string-name>, <string-name><surname>&#x0160;truc</surname> <given-names>V</given-names></string-name>, <string-name><surname>Peer</surname> <given-names>P</given-names></string-name></person-group>. <article-title>Ear recognition: more than a survey</article-title>. <source>Neurocomputing</source>. <year>2017</year>;<volume>255</volume>(<issue>3</issue>):<fpage>26</fpage>&#x2013;<lpage>39</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.neucom.2016.08.139</pub-id>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Korichi</surname> <given-names>A</given-names></string-name>, <string-name><surname>Slatnia</surname> <given-names>S</given-names></string-name>, <string-name><surname>Aiadi</surname> <given-names>O</given-names></string-name></person-group>. <article-title>TR-ICANet: a fast unsupervised deep-learning-based scheme for unconstrained ear recognition</article-title>. <source>Arab J Sci Eng</source>. <year>2022</year>;<volume>47</volume>(<issue>8</issue>):<fpage>9887</fpage>&#x2013;<lpage>98</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s13369-021-06375-z</pub-id>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Tomar</surname> <given-names>V</given-names></string-name>, <string-name><surname>Kumar</surname> <given-names>N</given-names></string-name>, <string-name><surname>Deshmukh</surname> <given-names>M</given-names></string-name>, <string-name><surname>Singh</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Single sample face and ear recognition using transfer learning and sample expansion</article-title>. In: <conf-name>Proceedings of the 2nd International Conference on Computer, Electronics, Electrical Engineering &#x0026; Their Applications (IC2E3); 2024 Jun 6&#x2013;7</conf-name>; <publisher-loc>Srinagar, India</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>6</lpage>. doi:<pub-id pub-id-type="doi">10.1109/IC2E362166.2024.10827249</pub-id>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Alomari</surname> <given-names>EAM</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>S</given-names></string-name>, <string-name><surname>Hoque</surname> <given-names>S</given-names></string-name>, <string-name><surname>Deravi</surname> <given-names>F</given-names></string-name></person-group>. <article-title>Ear-based person recognition using Pix2Pix GAN augmentation</article-title>. In: <conf-name>Proceedings of the 2024 International Conference of the Biometrics Special Interest Group (BIOSIG); 2024 Sep 25&#x2013;27</conf-name>; <publisher-loc>Darmstadt, Germany</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>6</lpage>. doi:<pub-id pub-id-type="doi">10.1109/BIOSIG61931.2024.10786744</pub-id>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Khaldi</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Benzaoui</surname> <given-names>A</given-names></string-name></person-group>. <article-title>A new framework for grayscale ear images recognition using generative adversarial networks under unconstrained conditions</article-title>. <source>Evol Syst</source>. <year>2021</year>;<volume>12</volume>(<issue>4</issue>):<fpage>923</fpage>&#x2013;<lpage>34</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s12530-020-09346-1</pub-id>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Aiadi</surname> <given-names>O</given-names></string-name>, <string-name><surname>Khaldi</surname> <given-names>B</given-names></string-name>, <string-name><surname>Saadeddine</surname> <given-names>C</given-names></string-name></person-group>. <article-title>MDFNet: an unsupervised lightweight network for ear print recognition</article-title>. <source>J Ambient Intell Humaniz Comput</source>. <year>2023</year>;<volume>14</volume>(<issue>10</issue>):<fpage>13773</fpage>&#x2013;<lpage>86</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s12652-022-04028-z</pub-id>; <pub-id pub-id-type="pmid">35757492</pub-id>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Sharkas</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Ear recognition with ensemble classifiers: a deep learning approach</article-title>. <source>Multimed Tools Appl</source>. <year>2022</year>;<volume>81</volume>(<issue>30</issue>):<fpage>43919</fpage>&#x2013;<lpage>45</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s11042-022-13252-w</pub-id>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Xu</surname> <given-names>X</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>C</given-names></string-name>, <string-name><surname>Lu</surname> <given-names>L</given-names></string-name></person-group>. <article-title>A feature fusion human ear recognition method based on channel features and dynamic convolution</article-title>. <source>Symmetry</source>. <year>2023</year>;<volume>15</volume>(<issue>7</issue>):<fpage>1454</fpage>. doi:<pub-id pub-id-type="doi">10.3390/sym15071454</pub-id>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Resmi</surname> <given-names>K</given-names></string-name>, <string-name><surname>Raju</surname> <given-names>G</given-names></string-name>, <string-name><surname>Padmanabha</surname> <given-names>V</given-names></string-name>, <string-name><surname>Mani</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Person identification by models trained using left and right ear images independently</article-title>. In: <conf-name>Proceedings of the 1st International Conference on Innovation in Information Technology and Business (ICIITB 2022); 2022 Nov 9&#x2013;10</conf-name>; <publisher-loc>Muscat, Oman. Dordrecht, The Netherlands</publisher-loc>: <publisher-name>Atlantis Press</publisher-name>; <year>2022</year>. p. <fpage>281</fpage>&#x2013;<lpage>8</lpage>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Nixon</surname> <given-names>MS</given-names></string-name>, <string-name><surname>Jaha</surname> <given-names>ES</given-names></string-name></person-group>. <chapter-title>Soft biometrics for human identification</chapter-title>. In: <person-group person-group-type="editor"><string-name><surname>Ahad</surname> <given-names>MAR</given-names></string-name>, <string-name><surname>Mahbub</surname> <given-names>U</given-names></string-name>, <string-name><surname>Turk</surname> <given-names>M</given-names></string-name>, <string-name><surname>Hartley</surname> <given-names>R</given-names></string-name></person-group>, editors. <source>Computer vision</source>. <publisher-loc>New York, NY, USA</publisher-loc>: <publisher-name>Chapman and Hall/CRC</publisher-name>; <year>2024</year>. p. <fpage>33</fpage>&#x2013;<lpage>60</lpage>. doi:<pub-id pub-id-type="doi">10.1201/9781003328957-3</pub-id>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Ezichi</surname> <given-names>S</given-names></string-name></person-group>. <article-title>A comparative study of soft biometric traits and fusion systems for face-based person recognition</article-title>. <source>Int J Image Graph Signal Process</source>. <year>2021</year>;<volume>13</volume>(<issue>6</issue>):<fpage>45</fpage>&#x2013;<lpage>53</lpage>. doi:<pub-id pub-id-type="doi">10.5815/ijigsp.2021.06.05</pub-id>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Jaha</surname> <given-names>ES</given-names></string-name>, <string-name><surname>Nixon</surname> <given-names>MS</given-names></string-name></person-group>. <article-title>Soft biometrics for subject identification using clothing attributes</article-title>. In: <conf-name>Proceedings of the IEEE International Joint Conference on Biometrics; 2014 Sep 29&#x2013;Oct 2</conf-name>; <publisher-loc>Clearwater, FL, USA</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>6</lpage>. doi:<pub-id pub-id-type="doi">10.1109/BTAS.2014.6996278</pub-id>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Becerra-Riera</surname> <given-names>F</given-names></string-name>, <string-name><surname>Morales-Gonz&#x00E1;lez</surname> <given-names>A</given-names></string-name>, <string-name><surname>M&#x00E9;ndez-V&#x00E1;zquez</surname> <given-names>H</given-names></string-name></person-group>. <article-title>A survey on facial soft biometrics for video surveillance and forensic applications</article-title>. <source>Artif Intell Rev</source>. <year>2019</year>;<volume>52</volume>(<issue>2</issue>):<fpage>1155</fpage>&#x2013;<lpage>87</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s10462-019-09689-5</pub-id>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Alshahrani</surname> <given-names>AA</given-names></string-name>, <string-name><surname>Jaha</surname> <given-names>ES</given-names></string-name>, <string-name><surname>Alowidi</surname> <given-names>N</given-names></string-name></person-group>. <article-title>Fusion of hash-based hard and soft biometrics for enhancing face image database search and retrieval</article-title>. <source>Comput Mater Contin</source>. <year>2023</year>;<volume>77</volume>(<issue>3</issue>):<fpage>3489</fpage>&#x2013;<lpage>509</lpage>. doi:<pub-id pub-id-type="doi">10.32604/cmc.2023.044490</pub-id>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Gonzalez</surname> <given-names>E</given-names></string-name></person-group>. <article-title>AMI ear database [Internet]</article-title>. <comment>[cited 2025 Jan 3]</comment>. Available from: <ext-link ext-link-type="uri" xlink:href="https://webctim.ulpgc.es/research_works/ami_ear_database">https://webctim.ulpgc.es/research_works/ami_ear_database</ext-link>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Jaha</surname> <given-names>ES</given-names></string-name></person-group>. <article-title>Comparative semantic document layout analysis for enhanced document image retrieval</article-title>. <source>IEEE Access</source>. <year>2024</year>;<volume>12</volume>:<fpage>150451</fpage>&#x2013;<lpage>67</lpage>. doi:<pub-id pub-id-type="doi">10.1109/access.2024.3479990</pub-id>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>He</surname> <given-names>K</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Ren</surname> <given-names>S</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Deep residual learning for image recognition</article-title>. In: <conf-name>Proceedings of the 29th IEEE Conference on Computer Vision and Pattern Recognition (CVPR); 2016 Jun 27&#x2013;30</conf-name>; <publisher-loc>Las Vegas, NV, USA</publisher-loc>. p. <fpage>770</fpage>&#x2013;<lpage>8</lpage>. doi:<pub-id pub-id-type="doi">10.1109/CVPR.2016.90</pub-id>.</mixed-citation></ref>
<ref id="ref-33"><label>[33]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Simonyan</surname> <given-names>K</given-names></string-name>, <string-name><surname>Zisserman</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Very deep convolutional networks for large-scale image recognition</article-title>. In: <conf-name>Proceedings of the 3rd International Conference on Learning Representations (ICLR 2015); 2015 May 7&#x2013;9</conf-name>; <publisher-loc>San Diego, CA, USA</publisher-loc>.</mixed-citation></ref>
<ref id="ref-34"><label>[34]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Byeon</surname> <given-names>H</given-names></string-name>, <string-name><surname>Raina</surname> <given-names>V</given-names></string-name>, <string-name><surname>Sandhu</surname> <given-names>M</given-names></string-name>, <string-name><surname>Shabaz</surname> <given-names>M</given-names></string-name>, <string-name><surname>Keshta</surname> <given-names>I</given-names></string-name>, <string-name><surname>Soni</surname> <given-names>M</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Artificial intelligence-enabled deep learning model for multimodal biometric fusion</article-title>. <source>Multimed Tools Appl</source>. <year>2024</year>;<volume>83</volume>(<issue>33</issue>):<fpage>80105</fpage>&#x2013;<lpage>28</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s11042-024-18509-0</pub-id>.</mixed-citation></ref>
<ref id="ref-35"><label>[35]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Mahajan</surname> <given-names>A</given-names></string-name>, <string-name><surname>Singla</surname> <given-names>SK</given-names></string-name></person-group>. <article-title>DeepBio: a deep CNN and Bi-LSTM learning for person identification using ear biometrics</article-title>. <source>Comput Model Eng Sci</source>. <year>2024</year>;<volume>141</volume>(<issue>2</issue>):<fpage>1623</fpage>&#x2013;<lpage>49</lpage>. doi:<pub-id pub-id-type="doi">10.32604/cmes.2024.054468</pub-id>.</mixed-citation></ref>
</ref-list>
</back></article>