<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">BIOCELL</journal-id>
<journal-id journal-id-type="nlm-ta">BIOCELL</journal-id>
<journal-id journal-id-type="publisher-id">BIOCELL</journal-id>
<journal-title-group>
<journal-title>BIOCELL</journal-title>
</journal-title-group>
<issn pub-type="epub">1667-5746</issn>
<issn pub-type="ppub">0327-9545</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">25865</article-id>
<article-id pub-id-type="doi">10.32604/biocell.2023.025865</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>SW-Net: A novel few-shot learning approach for disease subtype prediction</article-title><alt-title alt-title-type="left-running-head">SW-Net: A novel few-shot learning approach for disease subtype prediction</alt-title><alt-title alt-title-type="right-running-head">SW-Net for few-shot learning</alt-title>
</title-group>
<contrib-group content-type="authors">
<contrib id="author-1" contrib-type="author">
<name name-style="western"><surname>JI</surname><given-names>YUHAN</given-names></name>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
<contrib id="author-2" contrib-type="author" corresp="yes">
<name name-style="western"><surname>LIANG</surname><given-names>YONG</given-names></name>
<xref ref-type="aff" rid="aff-1">1</xref><email>yongliangresearch@gmail.com</email>
</contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western"><surname>YANG</surname><given-names>ZIYI</given-names></name>
<xref ref-type="aff" rid="aff-2">2</xref>
</contrib>
<contrib id="author-4" contrib-type="author">
<name name-style="western"><surname>AI</surname><given-names>NING</given-names></name>
<xref ref-type="aff" rid="aff-1">1</xref>
</contrib>
<aff id="aff-1"><label>1</label><institution>Faculty of Innovation Engineering, School of Computer Science and Engineering, Macau University of Science and Technology</institution>, <addr-line>Macau, 999078</addr-line>, <country>China</country></aff>
<aff id="aff-2"><label>2</label><institution>Tencent Quantum Lab</institution>, <addr-line>Shenzhen, 518000</addr-line>, <country>China</country></aff>
</contrib-group><author-notes><corresp id="cor1"><label>&#x002A;</label>Address correspondence to: Yong Liang, <email>yongliangresearch@gmail.com</email></corresp></author-notes>
<pub-date date-type="collection" publication-format="electronic"><year>2023</year></pub-date>
<pub-date date-type="pub" publication-format="electronic"><day>02</day><month>01</month><year>2023</year></pub-date>
<volume>47</volume>
<issue>3</issue>
<fpage>569</fpage>
<lpage>579</lpage>
<history>
<date date-type="received"><day>06</day><month>8</month><year>2022</year></date>
<date date-type="accepted"><day>24</day><month>10</month><year>2022</year></date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2023 Ji et al.</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Ji et al.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_BIOCELL_25865.pdf"></self-uri>
<abstract><p>Few-shot learning is becoming increasingly popular in many fields, especially computer vision. This inspires us to introduce few-shot learning to the genomic field, which faces a typical few-shot problem because some tasks have only a limited number of high-dimensional samples. The goal of this study was to investigate the few-shot disease sub-type prediction problem and identify patient subgroups through training on small data. Accurate disease sub-type classification allows clinicians to efficiently deliver investigations and interventions in clinical practice. We propose SW-Net, which simulates the clinical process of extracting shared knowledge from a range of interrelated tasks and generalizing it to unseen data. Our model is built upon a simple baseline, which we modified for genomic data. Support-based initialization of the classifier and transductive fine-tuning techniques were applied in our model to improve prediction accuracy, and an entropy regularization term on the query set was appended to reduce over-fitting. Moreover, to address the high-dimension and high-noise issues, we further extended a feature selection module to adaptively select important features and a sample weighting module to prioritize high-confidence samples. Experiments on simulated data and The Cancer Genome Atlas meta-dataset show that our new baseline model achieves higher prediction accuracy than other competing algorithms.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Few-shot learning</kwd>
<kwd>Disease sub-type classification</kwd>
<kwd>Feature selection</kwd>
<kwd>Deep learning</kwd>
<kwd>Meta-learning</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1"><title>Introduction</title>
<p>Disease sub-type prediction aims to identify patient sub-types, permitting a more accurate assessment of prognosis (<xref ref-type="bibr" rid="ref-32">Saria and Goldenberg, 2015</xref>). Predicting disease sub-types from gene expression data is of great significance in molecular biology (<xref ref-type="bibr" rid="ref-28">Rukhsar <italic>et al</italic>., 2022</xref>). Accurate classification allows more efficient and targeted subsequent therapy (<xref ref-type="bibr" rid="ref-36">Sohn <italic>et al</italic>., 2017</xref>). However, patient genomic data are hard to handle because of the &#x201C;big p, small N&#x201D; issue, i.e., high-dimensional features with a small number of samples (<xref ref-type="bibr" rid="ref-17">Liang <italic>et al</italic>., 2013</xref>). This problem is especially acute for doctors and clinicians when the disease is rare (<xref ref-type="bibr" rid="ref-44">Yoo <italic>et al</italic>., 2021</xref>). Few-shot learning, which aims to deal with the &#x201C;small data&#x201D; issue, has attracted considerable attention, and researchers have made significant progress in many fields, such as computer vision (<xref ref-type="bibr" rid="ref-6">Li <italic>et al</italic>., 2006</xref>; <xref ref-type="bibr" rid="ref-23">Munkhdalai and Yu, 2017</xref>; <xref ref-type="bibr" rid="ref-35">Snell <italic>et al</italic>., 2017</xref>; <xref ref-type="bibr" rid="ref-26">Qiu <italic>et al</italic>., 2018</xref>; <xref ref-type="bibr" rid="ref-22">Mishra <italic>et al</italic>., 2018</xref>; <xref ref-type="bibr" rid="ref-37">Sung <italic>et al</italic>., 2018</xref>). Recently, researchers have explored few-shot learning methods for genomic data and achieved good performance in genomic survival analysis (<xref ref-type="bibr" rid="ref-27">Qiu <italic>et al</italic>., 2020</xref>). This motivates us to introduce few-shot learning to genomic analysis. Our goal in this study was to address the few-shot disease sub-type prediction problem. Traditional machine learning methods consider this problem in isolation; in practice, however, doctors and clinicians take several clinical factors into account simultaneously.</p>
<p>The basic idea of our proposed model is to learn from abundant related tasks and generalize to new classes, namely rare diseases. This mimics the process by which doctors and clinicians learn to predict disease sub-types: the model extracts shared knowledge or experience from a range of interrelated tasks and applies it to new tasks. Although increasingly complex models are being proposed, experiments show that a simple baseline approach can achieve results comparable to more complex methods. The training procedure of our model includes a pre-training stage and a fine-tuning stage, similar to the transfer learning procedure (<xref ref-type="bibr" rid="ref-41">Weiss <italic>et al</italic>., 2016</xref>). In the first stage, we trained a feature extractor and a classifier simultaneously on the base classes. In the fine-tuning stage, we fixed the parameters of the feature extractor and learned a new classifier from the few labeled samples of the new classes. In fact, with a few adjustments to fine-tuning and regularization, a simple baseline method outperforms many other competing algorithms on few-shot sub-type prediction tasks.</p>
<p>Most few-shot models were originally designed for images (<xref ref-type="bibr" rid="ref-39">Vinyals <italic>et al</italic>., 2016</xref>; <xref ref-type="bibr" rid="ref-7">Finn <italic>et al</italic>., 2017</xref>; <xref ref-type="bibr" rid="ref-9">Garcia and Bruna, 2017</xref>; <xref ref-type="bibr" rid="ref-1">Bertinetto <italic>et al</italic>., 2018</xref>; <xref ref-type="bibr" rid="ref-29">Rusu <italic>et al</italic>., 2018</xref>; <xref ref-type="bibr" rid="ref-16">Lee <italic>et al</italic>., 2019</xref>). However, the high dimensionality of genomic data makes prediction more difficult than for images because of the large number of redundant features. To address this issue, our new model appends a feature selection module, first proposed by <xref ref-type="bibr" rid="ref-43">Yang <italic>et al</italic>. (2020)</xref>, to solve the dimensionality issue.</p>
<p>High noise is another challenge for accurate sub-type prediction. Random noise and system bias may lead to overfitting and hurt generalization performance (<xref ref-type="bibr" rid="ref-17">Liang <italic>et al</italic>., 2013</xref>). A common remedy is to assign weights to samples. Opinions vary on the relationship between sample weight and training loss: one view holds that samples with larger training loss should be emphasized, since they are more likely to be hard samples located at the classification boundary. Typical methods include AdaBoost (<xref ref-type="bibr" rid="ref-8">Freund and Schapire, 1997</xref>) and focal loss (<xref ref-type="bibr" rid="ref-18">Lin <italic>et al</italic>., 2020</xref>). Conversely, another approach gives priority to samples with smaller losses because these are more likely to have high confidence. Typical methods include self-paced learning (<xref ref-type="bibr" rid="ref-15">Kumar <italic>et al</italic>., 2010</xref>), iterative reweighting (<xref ref-type="bibr" rid="ref-4">de la Torre and Black, 2003</xref>) and its variants (<xref ref-type="bibr" rid="ref-13">Jiang <italic>et al</italic>., 2014</xref>; <xref ref-type="bibr" rid="ref-40">Wang <italic>et al</italic>., 2017</xref>). Meta-Weight-Net (<xref ref-type="bibr" rid="ref-33">Shu <italic>et al</italic>., 2019</xref>) learns an explicit weighting function adaptively and directly from data. This methodology prioritizes small-loss samples and is especially suitable for heavy-noise scenarios; the rationale is that samples with large losses may have corrupted labels, and reweighting can suppress their influence to a certain degree. Since high noise is a vital problem in gene expression data, we adopted the method of <xref ref-type="bibr" rid="ref-33">Shu <italic>et al</italic>. (2019)</xref> to assign weights to the samples, giving higher weight to low-loss data to suppress the influence of noisy samples.</p>
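To make the small-loss-first weighting concrete, the sketch below assigns each sample a weight that decays with its training loss. This is a hand-crafted NumPy illustration, not the learned weighting network of Shu et al. (2019); the exponential form, the temperature parameter, and the function name are our assumptions.

```python
import numpy as np

def small_loss_weights(losses, temperature=1.0):
    """Give higher weight to low-loss (high-confidence) samples.

    Illustrative stand-in for a learned weighting function: the weight
    decays exponentially with the loss and is normalized over the batch.
    """
    losses = np.asarray(losses, dtype=float)
    w = np.exp(-losses / temperature)
    return w / w.sum()

# A large-loss (potentially mislabeled) sample receives less weight.
w = small_loss_weights([0.1, 0.2, 3.0])
```

In Meta-Weight-Net the mapping from loss to weight is itself a small network optimized by meta-learning, rather than this fixed exponential form.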
<p>In summary, the proposed SW-Net makes the following main contributions.</p>
<p>First, we applied a new baseline method to the few-shot disease sub-type prediction problem. The basic baseline has been widely explored in many fields, especially computer vision; our contribution is to adapt it to molecular biology, in particular the disease subtype prediction problem. We used support-based initialization for the classifier and a transductive fine-tuning technique, and we appended an entropy regularization term on the query set to reduce overfitting.</p>
<p>Second, based on the baseline, we further extended a feature selection module and a sample weighting module to address the high-dimensionality and high-noise issues in few-shot prediction. The extended modules adaptively select important features and give priority to samples with small losses.</p>
<p>Third, experiments show that with support-based initialization and transductive fine-tuning, we can achieve a 2%&#x2013;6% improvement in prediction accuracy. With the appended feature selection and sample weighting modules, we can further achieve a 2%&#x2013;2.5% improvement on The Cancer Genome Atlas (TCGA) meta-dataset.</p>
</sec>
<sec id="s2"><title>Materials and Methods</title>
<p>In this section, we first describe the basic baseline model for few-shot learning. Then, we present the modifications we made to improve its performance. Finally, we elaborate on our extended modules. The model architecture is shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption><title>Structure of SW-Net. We trained a feature selector <italic>g<sub>&#x003C6;</sub></italic>, an embedding function <italic>f</italic><sub>&#x003B8;</sub> and a weighting function <italic>v</italic> on the meta-training dataset in the pre-training stage. In the fine-tuning stage, we trained a new classifier <italic>C(&#x00B7;|W<sub>s</sub>)</italic> with the labeled samples of the support set. All the parameters are fine-tuned transductively.</title></caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="Biocell-47-25865-f001.tif"/>
</fig>
<sec id="s2_1"><title>Problem definition</title>
<p>To formalize the few-shot prediction problem, we need to introduce some notation first. Let <inline-formula id="ieqn-1">
<mml:math id="mml-ieqn-1"><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</inline-formula> represent a labeled sample and its ground-truth label respectively. In the few-shot learning context, we let <inline-formula id="ieqn-2">
<mml:math id="mml-ieqn-2"><mml:mrow><mml:mspace width="thickmathspace" /></mml:mrow><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:msubsup></mml:math>
</inline-formula> and <inline-formula id="ieqn-3">
<mml:math id="mml-ieqn-3"><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mi>q</mml:mi></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>q</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:msubsup></mml:math>
</inline-formula> denote the support and query datasets respectively. <inline-formula id="ieqn-4">
<mml:math id="mml-ieqn-4"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mspace width="thickmathspace" /></mml:mrow></mml:math>
</inline-formula> represents some set of classes. The number of classes |<inline-formula id="ieqn-5">
<mml:math id="mml-ieqn-5"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula>| is called the ways. The number of labeled samples in each class is called a shot. The goal is to train a network <inline-formula id="ieqn-6">
<mml:math id="mml-ieqn-6"><mml:mi>F</mml:mi></mml:math>
</inline-formula> to exploit the support set <inline-formula id="ieqn-7">
<mml:math id="mml-ieqn-7"><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mspace width="thickmathspace" /></mml:mrow></mml:math>
</inline-formula>to predict the labels of samples in the query set by the following formula:</p>
<p><disp-formula id="eqn-1"><label>(1)</label>
<mml:math id="mml-eqn-1" display="block"><mml:mrow><mml:mover><mml:mi>y</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mo>=</mml:mo><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>;</mml:mo><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</disp-formula></p>
<p>where <inline-formula id="ieqn-8">
<mml:math id="mml-ieqn-8"><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mi>q</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula>. A few-shot learning problem also has a meta-training dataset <inline-formula id="ieqn-9">
<mml:math id="mml-ieqn-9"><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mi>m</mml:mi></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:msubsup></mml:math>
</inline-formula>, with abundant data, where <inline-formula id="ieqn-10">
<mml:math id="mml-ieqn-10"><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula>. The set of classes <inline-formula id="ieqn-11">
<mml:math id="mml-ieqn-11"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula> has no overlapping class with <inline-formula id="ieqn-12">
<mml:math id="mml-ieqn-12"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula>. We can take advantage of <inline-formula id="ieqn-13">
<mml:math id="mml-ieqn-13"><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula> to give parameters of the learning model a good initialization.</p>
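The support/query episode construction defined above can be sketched as follows. This is an illustrative NumPy helper under our own naming; the function and its parameters are not part of the paper, but the split mirrors the N-way, K-shot definition of D_s and D_q.

```python
import numpy as np

def sample_episode(X, y, n_way, k_shot, n_query, seed=None):
    """Sample one N-way, K-shot episode from a labeled dataset.

    Picks n_way classes, then k_shot support and n_query query samples
    per class, mirroring the D_s / D_q split in the problem definition.
    """
    rng = np.random.default_rng(seed)
    classes = rng.choice(np.unique(y), size=n_way, replace=False)
    s_idx, q_idx = [], []
    for c in classes:
        idx = rng.permutation(np.flatnonzero(y == c))
        s_idx.extend(idx[:k_shot])
        q_idx.extend(idx[k_shot:k_shot + n_query])
    return (X[s_idx], y[s_idx]), (X[q_idx], y[q_idx])
```

During meta-training, episodes would be drawn from D_m in the same way, with the constraint that meta-training classes C_m never overlap the test classes C_t.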
</sec>
<sec id="s2_2"><title>Baseline</title>
<p>A simple baseline comprises the following steps: pre-training on the meta-training dataset, fine-tuning on the few-shot dataset, and making few-shot predictions (<xref ref-type="bibr" rid="ref-41">Weiss <italic>et al</italic>., 2016</xref>; <xref ref-type="bibr" rid="ref-2">Chen <italic>et al</italic>., 2019</xref>). Our SW-Net follows this basic procedure. In the pre-training stage, we first trained a model with the cross-entropy loss on <inline-formula id="ieqn-18">
<mml:math id="mml-ieqn-18"><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mi>m</mml:mi></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>y</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:msubsup></mml:math>
</inline-formula>. With the training samples in meta-training set classes <inline-formula id="ieqn-19">
<mml:math id="mml-ieqn-19"><mml:mi>x</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mi>m</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula>, we can learn a classifier <inline-formula id="ieqn-20">
<mml:math id="mml-ieqn-20"><mml:mi>C</mml:mi></mml:math>
</inline-formula> and an embedding function <italic>f</italic> that maps the high-dimensional data of a sample to a low-dimensional feature vector, which is used in the next stage. Fine-tuning stage: To make our model adapt well to new classes, we fixed the network parameter <inline-formula id="ieqn-21">
<mml:math id="mml-ieqn-21"><mml:mi>&#x03B8;</mml:mi><mml:mrow><mml:mspace width="thickmathspace" /></mml:mrow></mml:math>
</inline-formula>in the embedding function <inline-formula id="ieqn-22">
<mml:math id="mml-ieqn-22"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>&#x03B8;</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula> (called the backbone) from the pre-training stage, and then learn a new classifier <inline-formula id="ieqn-23">
<mml:math id="mml-ieqn-23"><mml:mi>C</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x22C5;</mml:mo><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</inline-formula>, where <inline-formula id="ieqn-24">
<mml:math id="mml-ieqn-24"><mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula> &#x2208; <inline-formula id="ieqn-25">
<mml:math id="mml-ieqn-25"><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="fraktur">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:msup></mml:mrow><mml:mrow><mml:mspace width="thickmathspace" /></mml:mrow></mml:math>
</inline-formula>is the weight matrix, <inline-formula id="ieqn-26">
<mml:math id="mml-ieqn-26"><mml:mi>d</mml:mi></mml:math>
</inline-formula> represents the dimension of the feature vector, and <inline-formula id="ieqn-27">
<mml:math id="mml-ieqn-27"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula> is the number of output classes. <inline-formula id="ieqn-28">
<mml:math id="mml-ieqn-28"><mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula> is optimized by minimizing cross-entropy loss <inline-formula id="ieqn-29">
<mml:math id="mml-ieqn-29"><mml:mi>L</mml:mi></mml:math>
</inline-formula> with the few samples of support set. The classifier <inline-formula id="ieqn-30">
<mml:math id="mml-ieqn-30"><mml:mi>C</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x22C5;</mml:mo><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</inline-formula> is a softmax classifier, which is built up with a linear layer and a softmax function as shown in <xref ref-type="disp-formula" rid="eqn-2">Eq. (2)</xref>:</p>
<p><disp-formula id="eqn-2"><label>(2)</label>
<mml:math id="mml-eqn-2" display="block"><mml:mi>S</mml:mi><mml:mi>o</mml:mi><mml:mi>f</mml:mi><mml:mi>t</mml:mi><mml:mi>m</mml:mi><mml:mi>a</mml:mi><mml:mi>x</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:msup><mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow><mml:mi>T</mml:mi></mml:msup><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>&#x03B8;</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mi>b</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</disp-formula></p>
<p>Careful initialization of the softmax classifier <inline-formula id="ieqn-31">
<mml:math id="mml-ieqn-31"><mml:mi>C</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x22C5;</mml:mo><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mspace width="thickmathspace" /></mml:math>
</inline-formula>will make this process efficient. We initialized this classifier with the per-class feature means of the support set so that it adapts well.</p>
<p>Making few-shot predictions: In this stage, given a query sample, <inline-formula id="ieqn-32">
<mml:math id="mml-ieqn-32"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>&#x03B8;</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula> obtains the feature vector of the query sample. We then fed it into the softmax classifier to make the final prediction.</p>
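The fine-tuning and prediction stages above can be sketched in NumPy as follows, assuming the backbone f_theta is frozen so that its outputs on the support set are given. This is a minimal illustration of the softmax classifier of Eq. (2) trained by gradient descent on the cross-entropy loss; the function names, learning rate, and step count are our assumptions, not values from the paper.

```python
import numpy as np

def softmax(z):
    # Numerically stable row-wise softmax.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def finetune_classifier(feats, labels, n_class, lr=0.5, steps=200):
    """Fit the linear softmax classifier C(.|W_s) of Eq. (2).

    `feats` are the frozen backbone's outputs on the support set;
    (W, b) are learned by gradient descent on cross-entropy.
    """
    d = feats.shape[1]
    W = np.zeros((d, n_class))
    b = np.zeros(n_class)
    onehot = np.eye(n_class)[labels]
    for _ in range(steps):
        p = softmax(feats @ W + b)
        g = (p - onehot) / len(feats)   # gradient w.r.t. the logits
        W -= lr * feats.T @ g
        b -= lr * g.sum(axis=0)
    return W, b

def predict(feats, W, b):
    # Few-shot prediction: embed the query, then apply the classifier.
    return np.argmax(softmax(feats @ W + b), axis=1)
```

In the full SW-Net, fine-tuning is transductive and includes the entropy regularization term on the query set, both omitted here for brevity.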
</sec>
<sec id="s2_3"><title>Support based initialization</title>
<p>In a few-shot task, let <inline-formula id="ieqn-33">
<mml:math id="mml-ieqn-33"><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mspace width="thickmathspace" /></mml:mrow></mml:math>
</inline-formula>denote the samples in class <inline-formula id="ieqn-34">
<mml:math id="mml-ieqn-34"><mml:mi>c</mml:mi></mml:math>
</inline-formula> of the support set <inline-formula id="ieqn-35">
<mml:math id="mml-ieqn-35"><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula>. For the classifier, the weight and bias are <inline-formula id="ieqn-36">
<mml:math id="mml-ieqn-36"><mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula> &#x2208; <inline-formula id="ieqn-37">
<mml:math id="mml-ieqn-37"><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="fraktur">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mi>d</mml:mi><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:msup></mml:mrow></mml:math>
</inline-formula> and <inline-formula id="ieqn-38">
<mml:math id="mml-ieqn-38"><mml:mi>b</mml:mi></mml:math>
</inline-formula> &#x2208; <inline-formula id="ieqn-39">
<mml:math id="mml-ieqn-39"><mml:mrow><mml:msup><mml:mrow><mml:mrow><mml:mi mathvariant="fraktur">R</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:msup></mml:mrow></mml:math>
</inline-formula>, respectively, with <inline-formula id="ieqn-40">
<mml:math id="mml-ieqn-40"><mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo>,</mml:mo></mml:mrow><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math>
</inline-formula>, where <inline-formula id="ieqn-41">
<mml:math id="mml-ieqn-41"><mml:mrow><mml:msub><mml:mi>C</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula> denotes the number of classes of<inline-formula id="ieqn-42">
<mml:math id="mml-ieqn-42"><mml:mrow><mml:mspace width="thickmathspace" /></mml:mrow><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula>, and each weight vector for a class of <inline-formula id="ieqn-43">
<mml:math id="mml-ieqn-43"><mml:mrow><mml:mspace width="thickmathspace" /></mml:mrow><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula> is a <inline-formula id="ieqn-44">
<mml:math id="mml-ieqn-44"><mml:mi>d</mml:mi></mml:math>
</inline-formula>-dimensional vector. The first modification we perform is to initialize <inline-formula id="ieqn-45">
<mml:math id="mml-ieqn-45"><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula> by the average feature of class <inline-formula id="ieqn-46">
<mml:math id="mml-ieqn-46"><mml:mi>c</mml:mi></mml:math>
</inline-formula>.</p>
<p><disp-formula id="eqn-3"><label>(3)</label>
<mml:math id="mml-eqn-3" display="block"><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow><mml:munder><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:munder><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>&#x03B8;</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mstyle></mml:math>
</disp-formula></p>
<p>Intuitively, we can understand the weight vector <inline-formula id="ieqn-47">
<mml:math id="mml-ieqn-47"><mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula> as a set of prototypes, similar to prototypical networks (<xref ref-type="bibr" rid="ref-35">Snell <italic>et al</italic>., 2017</xref>). Classification is then based on the distances between the input feature and the prototypes, as shown in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>. Moreover, we initialized the bias <inline-formula id="ieqn-48">
<mml:math id="mml-ieqn-48"><mml:mrow><mml:msub><mml:mi>b</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:math>
</inline-formula>. Given the labeled samples of the support set, we further fine-tune <inline-formula id="ieqn-49">
<mml:math id="mml-ieqn-49"><mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula>, <inline-formula id="ieqn-50">
<mml:math id="mml-ieqn-50"><mml:mi>b</mml:mi></mml:math>
</inline-formula>, and <inline-formula id="ieqn-51">
<mml:math id="mml-ieqn-51"><mml:mi>&#x03B8;</mml:mi><mml:mspace width="thickmathspace" /></mml:math>
</inline-formula> by minimizing the cross-entropy classification loss.</p>
<fig id="fig-2">
<label>Figure 2</label>
<caption><title>Vector <italic>W<sub>s</sub></italic> was initialized with the feature mean of each class. For each class, we computed the cosine distances between the input feature vector and the prototype weight vector.</title></caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="Biocell-47-25865-f002.tif"/>
</fig>
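<p>As a minimal illustration, the prototype initialization of Eq. (3) with zero biases can be sketched in NumPy; the function name, and the assumption that the embeddings <italic>f<sub>θ</sub></italic>(<italic>x</italic>) are supplied as precomputed arrays, are ours and not part of the original implementation.</p>

```python
import numpy as np

def init_prototype_weights(features, labels, num_classes):
    """Initialize each classifier weight w_c as the mean embedding of the
    support samples of class c (Eq. 3); the biases b_c start at zero."""
    W = np.zeros((num_classes, features.shape[1]))
    for c in range(num_classes):
        W[c] = features[labels == c].mean(axis=0)  # prototype of class c
    b = np.zeros(num_classes)                      # b_c = 0
    return W, b
```

<p>Both <italic>W<sub>s</sub></italic> and <italic>b</italic> are subsequently fine-tuned together with <italic>θ</italic> on the support set.</p>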
</sec>
<sec id="s2_4"><title>Cosine distance-based classifier</title>
<p>To improve performance, we design the classifier here differently from the linear one used in the basic baseline. <xref ref-type="bibr" rid="ref-2">Chen <italic>et al</italic>. (2019)</xref> compared the effect of Euclidean distance and cosine distance on image datasets and found that cosine distance achieves better performance because of its reduced intra-class variation. For an input feature vector <inline-formula id="ieqn-52">
<mml:math id="mml-ieqn-52"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>&#x03B8;</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</inline-formula>, we compute its cosine distance to each weight vector <inline-formula id="ieqn-53">
<mml:math id="mml-ieqn-53"><mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math>
</inline-formula>. A prediction is made according to the probability that <inline-formula id="ieqn-54">
<mml:math id="mml-ieqn-54"><mml:mi>x</mml:mi><mml:mspace width="thickmathspace" /></mml:math>
</inline-formula>is in class <inline-formula id="ieqn-55">
<mml:math id="mml-ieqn-55"><mml:mi>c</mml:mi></mml:math>
</inline-formula> with <xref ref-type="disp-formula" rid="eqn-4">Eq. (4)</xref>. Operator <inline-formula id="ieqn-56">
<mml:math id="mml-ieqn-56"><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:mspace width="thickmathspace" /></mml:mrow><mml:mo>,</mml:mo></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</inline-formula> denotes the cosine similarity between the input feature vector and a weight vector.</p>
<p><disp-formula id="eqn-4"><label>(4)</label>
<mml:math id="mml-eqn-4" display="block"><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>y</mml:mi><mml:mo>=</mml:mo><mml:mi>c</mml:mi><mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow></mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mi>p</mml:mi><mml:mspace width="thinmathspace" /><mml:mo stretchy="false">(</mml:mo><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>&#x03B8;</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mi>c</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msub><mml:mo>&#x2061;</mml:mo><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mi>p</mml:mi><mml:mspace width="thinmathspace" /><mml:mo stretchy="false">(</mml:mo><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>&#x03B8;</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:msup><mml:mrow><mml:mi>c</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:math>
</disp-formula></p>
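<p>A minimal NumPy sketch of the cosine distance-based prediction of Eq. (4); the function name is illustrative, and we assume the embedding <italic>f<sub>θ</sub></italic>(<italic>x</italic>) is given as a vector.</p>

```python
import numpy as np

def cosine_softmax(feature, W, eps=1e-8):
    """Class probabilities from the cosine similarity between an input
    feature vector and each prototype weight vector w_c (Eq. 4)."""
    f = feature / (np.linalg.norm(feature) + eps)
    Wn = W / (np.linalg.norm(W, axis=1, keepdims=True) + eps)
    sims = Wn @ f                      # sim(f_theta(x), w_c) for every c
    exps = np.exp(sims - sims.max())   # numerically stable softmax
    return exps / exps.sum()
```

<p>Normalizing both vectors makes the logits depend only on direction, which is the reduced intra-class variation effect noted above.</p>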
</sec>
<sec id="s2_5"><title>Transductive fine-tuning</title>
<p>The main idea of transductive learning is to restrict the hypothesis space with samples from the test dataset. Several papers in the few-shot learning field have recently exploited this idea. For example, <xref ref-type="bibr" rid="ref-25">Nichol <italic>et al</italic>. (2018)</xref> adapted batch-normalization parameters to query samples. <xref ref-type="bibr" rid="ref-19">Liu <italic>et al</italic>. (2018)</xref> estimated labels of query samples with label propagation. We denote by <inline-formula id="ieqn-58">
<mml:math id="mml-ieqn-58"><mml:mrow><mml:mi mathvariant="normal">&#x0398;</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mi>&#x03B8;</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mi>b</mml:mi></mml:mrow><mml:mo>}</mml:mo></mml:mrow></mml:math>
</inline-formula> the combined parameters of <inline-formula id="ieqn-59">
<mml:math id="mml-ieqn-59"><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>&#x03B8;</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula> and <inline-formula id="ieqn-60">
<mml:math id="mml-ieqn-60"><mml:mi>C</mml:mi></mml:math>
</inline-formula>. All the parameters <inline-formula id="ieqn-61">
<mml:math id="mml-ieqn-61"><mml:mrow><mml:mi mathvariant="normal">&#x0398;</mml:mi><mml:mspace width="thickmathspace" /></mml:mrow></mml:math>
</inline-formula>are trained together in the fine-tuning stage.</p>
<p>At test time, we added a Shannon entropy penalty term on the query-sample predictions. This is inspired by the semi-supervised learning literature and is closely related to the work of <xref ref-type="bibr" rid="ref-10">Grandvalet and Bengio (2004)</xref>. More recent methods, such as <xref ref-type="bibr" rid="ref-3">Dai <italic>et al</italic>. (2017)</xref> and <xref ref-type="bibr" rid="ref-14">Kipf and Welling (2016)</xref>, would also suit our model, but we used the Shannon entropy penalty for simplicity. We used the unlabeled query samples for transductive learning. <inline-formula id="ieqn-62">
<mml:math id="mml-ieqn-62"><mml:mi>x</mml:mi></mml:math>
</inline-formula> represents a query sample. <inline-formula id="ieqn-63">
<mml:math id="mml-ieqn-63"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x0398;</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x22C5;</mml:mo><mml:mo fence="false" stretchy="false">|</mml:mo><mml:mi>x</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mspace width="thickmathspace" /></mml:mrow></mml:math>
</inline-formula>is the prediction. <inline-formula id="ieqn-64">
<mml:math id="mml-ieqn-64"><mml:mi>H</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x0398;</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x22C5;</mml:mo><mml:mo fence="false" stretchy="false">|</mml:mo><mml:mi>x</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mspace width="thickmathspace" /></mml:math>
</inline-formula> denotes the entropy. Multiple query samples can be processed together to obtain the mean of <inline-formula id="ieqn-65">
<mml:math id="mml-ieqn-65"><mml:mi>H</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x0398;</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x22C5;</mml:mo><mml:mo fence="false" stretchy="false">|</mml:mo><mml:mi>x</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:math>
</inline-formula> over all query samples, and we minimized the cross-entropy classification loss over the labeled support samples. As we seek outputs with a small Shannon entropy <inline-formula id="ieqn-66">
<mml:math id="mml-ieqn-66"><mml:mi>H</mml:mi></mml:math>
</inline-formula>, we introduced it as a regularizer. Thus, transductive fine-tuning solves for</p>
<p><disp-formula id="eqn-5"><label>(5)</label>
<mml:math id="mml-eqn-5" display="block"><mml:mrow><mml:msup><mml:mrow><mml:mi mathvariant="normal">&#x0398;</mml:mi></mml:mrow><mml:mo>&#x2217;</mml:mo></mml:msup></mml:mrow><mml:mo>=</mml:mo><mml:mspace width="thickmathspace" /><mml:mi>arg</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x0398;</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:mfrac></mml:mrow><mml:munder><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:munder><mml:mspace width="negativethinmathspace" /><mml:mspace width="negativethinmathspace" /><mml:mspace width="negativethinmathspace" /><mml:mspace width="negativethinmathspace" /><mml:mspace width="negativethinmathspace" /><mml:mspace width="negativethinmathspace" /><mml:mo>&#x2212;</mml:mo><mml:mspace width="thickmathspace" /><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x0398;</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>y</mml:mi><mml:mo fence="false" stretchy="false">|</mml:mo><mml:mi>x</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>q</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:mfrac></mml:mrow><mml:munder><mml:mrow><mml:mo 
movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mi>q</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:munder><mml:mo>&#x2061;</mml:mo><mml:mi>H</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x0398;</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x22C5;</mml:mo><mml:mo fence="false" stretchy="false">|</mml:mo><mml:mi>x</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mstyle></mml:mstyle></mml:math>
</disp-formula></p>
<p>It is worth noting that the first term uses the samples with labels from the support set <inline-formula id="ieqn-67">
<mml:math id="mml-ieqn-67"><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula>, whereas the second term, which is the regularizer, utilizes the unlabeled samples from the query set <inline-formula id="ieqn-68">
<mml:math id="mml-ieqn-68"><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mi>q</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula>. The two terms can be imbalanced, and a coefficient could be added to the entropy term to control this imbalance. However, we set the coefficient equal to 1 to keep the model simple and avoid tuning an extra hyper-parameter.</p>
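<p>The objective of Eq. (5) can be sketched as follows, assuming the class-probability matrices for the support and query sets have already been computed; the function name is ours.</p>

```python
import numpy as np

def transductive_loss(support_probs, support_labels, query_probs, eps=1e-12):
    """Eq. (5): cross-entropy on the labeled support samples plus the mean
    Shannon entropy of the query predictions (coefficient fixed to 1)."""
    ce = -np.mean(np.log(support_probs[np.arange(len(support_labels)),
                                       support_labels] + eps))
    ent = -np.mean(np.sum(query_probs * np.log(query_probs + eps), axis=1))
    return ce + ent
```

<p>Minimizing the entropy term pushes the query predictions toward confident (low-entropy) outputs without using their labels.</p>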
</sec>
<sec id="s2_6"><title>Feature selection net</title>
<p>We aimed to solve the few-shot disease subtype prediction problem. However, genomic data is hard to handle due to its high dimensionality, as mentioned above. To overcome this issue, we extend our baseline with a feature selection module that screens out the genes that are irrelevant to the disease. For each sample <inline-formula id="ieqn-69">
<mml:math id="mml-ieqn-69"><mml:mi>x</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mi>p</mml:mi></mml:msup></mml:mrow></mml:math>
</inline-formula>, the dimension <italic>p</italic> of the genomic data can be very high. We can utilize a selection vector <italic>&#x03B2;</italic> = (<italic>&#x03B2;</italic><sub>1</sub>, <italic>&#x03B2;</italic><sub>2</sub>, &#x2026;, <italic>&#x03B2;</italic><sub>p</sub>) to get a new representation <inline-formula id="ieqn-71">
<mml:math id="mml-ieqn-71"><mml:msup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mrow><mml:mspace width="thickmathspace" /></mml:mrow></mml:math>
</inline-formula>, which is the element-wise product of <italic>x</italic> and <inline-formula id="ieqn-72">
<mml:math id="mml-ieqn-72"><mml:mi>&#x03B2;</mml:mi></mml:math>
</inline-formula>. This can help us remove useless features.</p>
<p><disp-formula id="eqn-6"><label>(6)</label>
<mml:math id="mml-eqn-6" display="block"><mml:msup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mi>&#x03B2;</mml:mi><mml:mo>&#x2299;</mml:mo><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>&#x03B2;</mml:mi><mml:mi>j</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>]</mml:mo></mml:mrow></mml:math>
</disp-formula></p>
<p>Most regularization methods are based on assumptions about the training data. However, without a deep understanding of the underlying gene expression data, it is not feasible to specify a particular regularization form. Here, we instead set a Softmax layer as the feature selection vector <inline-formula id="ieqn-73">
<mml:math id="mml-ieqn-73"><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03B2;</mml:mi></mml:mrow></mml:mrow></mml:math>
</inline-formula>. The element-wise product then allows the model to adaptively learn a feature weighting from the data.</p>
<p><disp-formula id="eqn-7"><label>(7)</label><mml:math id="mml-eqn-7" display="block"><mml:mtable columnalign='left'><mml:mtr><mml:mtd><mml:msup><mml:mi>x</mml:mi><mml:mo>&#x2032;</mml:mo></mml:msup><mml:mo>=</mml:mo><mml:msub><mml:mi>g</mml:mi><mml:mi>&#x03C6;</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>&#x03B2;</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03C6;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2299;</mml:mo><mml:mi>x</mml:mi></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:msub><mml:mi>&#x03B2;</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03C6;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mi>p</mml:mi><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03C6;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>i</mml:mi></mml:msub><mml:mo>/</mml:mo><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mi>j</mml:mi></mml:munder><mml:mi>e</mml:mi></mml:mstyle><mml:mi>x</mml:mi><mml:mi>p</mml:mi><mml:mtext>&#x2009;</mml:mtext><mml:msub><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03C6;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mi>j</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:mstyle displaystyle='true'><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mi>i</mml:mi></mml:munder><mml:mrow><mml:msub><mml:mi>&#x03B2;</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mstyle><mml:mrow><mml:mo>(</mml:mo><mml:mi>&#x03C6;</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>where <inline-formula id="ieqn-74">
<mml:math id="mml-ieqn-74"><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03C6;</mml:mi></mml:mrow></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msup><mml:mi>R</mml:mi><mml:mi>p</mml:mi></mml:msup></mml:mrow></mml:math>
</inline-formula> represents the parameters of the Softmax layer. Here we can easily embed <inline-formula id="ieqn-75">
<mml:math id="mml-ieqn-75"><mml:mrow><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03C6;</mml:mi></mml:mrow></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math>
</inline-formula> into <xref ref-type="disp-formula" rid="eqn-4">Eq. (4)</xref> and get:</p>
<p><disp-formula id="eqn-8"><label>(8)</label>
<mml:math id="mml-eqn-8" display="block"><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>&#x03C6;</mml:mi><mml:mo>,</mml:mo><mml:mi>&#x03B8;</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>y</mml:mi><mml:mo>=</mml:mo><mml:mi>c</mml:mi><mml:mo fence="false" stretchy="false">|</mml:mo><mml:mi>x</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mrow><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>&#x03B8;</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>g</mml:mi><mml:mi>&#x03C6;</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mi>c</mml:mi></mml:mrow><mml:mrow><mml:mi 
mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msub><mml:mo>&#x2061;</mml:mo><mml:mi>e</mml:mi><mml:mi>x</mml:mi><mml:mi>p</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>s</mml:mi><mml:mi>i</mml:mi><mml:mi>m</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>&#x03B8;</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>g</mml:mi><mml:mi>&#x03C6;</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mrow><mml:msup><mml:mrow><mml:mi>c</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow></mml:mstyle></mml:math>
</disp-formula></p>
<p>Accordingly, <xref ref-type="disp-formula" rid="eqn-3">Eq. (3)</xref> becomes</p>
<p><disp-formula id="eqn-9"><label>(9)</label>
<mml:math id="mml-eqn-9" display="block"><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow><mml:munder><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mi>x</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:munder><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>&#x03B8;</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>g</mml:mi><mml:mi>&#x03C6;</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>x</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mstyle></mml:math>
</disp-formula></p>
<p>This regularization form needs no expert knowledge of the underlying data. <inline-formula id="ieqn-76">
<mml:math id="mml-ieqn-76"><mml:mi>&#x03C6;</mml:mi></mml:math>
</inline-formula> can be learned along with <inline-formula id="ieqn-77">
<mml:math id="mml-ieqn-77"><mml:mrow><mml:mi mathvariant="normal">&#x0398;</mml:mi></mml:mrow></mml:math>
</inline-formula>. We now denote the new combined parameters as <inline-formula id="ieqn-78">
<mml:math id="mml-ieqn-78"><mml:mrow><mml:msup><mml:mrow><mml:mi mathvariant="normal">&#x0398;</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03B8;</mml:mi></mml:mrow></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:msub><mml:mi>W</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow><mml:mo>,</mml:mo><mml:mi>b</mml:mi><mml:mo>,</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x03C6;</mml:mi></mml:mrow></mml:mrow></mml:mrow><mml:mo>}</mml:mo></mml:mrow><mml:mo>.</mml:mo><mml:mrow><mml:mspace width="thickmathspace" /></mml:mrow></mml:math>
</inline-formula>All the parameters <inline-formula id="ieqn-79">
<mml:math id="mml-ieqn-79"><mml:mrow><mml:msup><mml:mrow><mml:mi mathvariant="normal">&#x0398;</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:math>
</inline-formula> are trained in the fine-tuning stage transductively:</p>
<p><disp-formula id="eqn-10"><label>(10)</label>
<mml:math id="mml-eqn-10" display="block"><mml:mtable columnalign="center" rowspacing=".5em" columnspacing="thickmathspace" displaystyle="true"><mml:mtr><mml:mtd><mml:mrow><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mi mathvariant="normal">&#x0398;</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mo>&#x2217;</mml:mo></mml:mrow></mml:msup></mml:mrow><mml:mo>=</mml:mo><mml:mspace width="thickmathspace" /><mml:mrow><mml:mi mathvariant="italic">a</mml:mi><mml:mi mathvariant="italic">r</mml:mi><mml:mi mathvariant="italic">g</mml:mi></mml:mrow><mml:mi>m</mml:mi><mml:mi>i</mml:mi><mml:mrow><mml:msub><mml:mi>n</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x0398;</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:mfrac></mml:mrow><mml:munder><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mi>s</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:munder><mml:mo>&#x2212;</mml:mo><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mi mathvariant="normal">&#x0398;</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>y</mml:mi><mml:mo fence="false" stretchy="false">|</mml:mo><mml:mi>x</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mstyle></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mo>+</mml:mo><mml:mstyle 
displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mrow><mml:msub><mml:mi>N</mml:mi><mml:mi>q</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:mfrac></mml:mrow><mml:munder><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>x</mml:mi><mml:mo>,</mml:mo><mml:mi>y</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mi>q</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:munder><mml:mo>&#x2061;</mml:mo><mml:mi>H</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mrow><mml:msup><mml:mrow><mml:mi mathvariant="normal">&#x0398;</mml:mi></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:mrow></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mo>&#x22C5;</mml:mo><mml:mo fence="false" stretchy="false">|</mml:mo><mml:mi>x</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mstyle></mml:mtd></mml:mtr></mml:mtable></mml:math>
</disp-formula></p>
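<p>A minimal sketch of the feature selection net of Eqs. (6) and (7): a Softmax over the learnable vector <italic>φ</italic> produces non-negative weights <italic>β</italic> summing to 1, which gate the input features element-wise. The function name is illustrative.</p>

```python
import numpy as np

def feature_select(x, phi):
    """Eq. (7): beta_i = exp(phi_i) / sum_j exp(phi_j), then x' = beta * x."""
    e = np.exp(phi - phi.max())   # stable softmax over the p features
    beta = e / e.sum()            # beta_i in [0, 1], sum_i beta_i = 1
    return beta * x               # element-wise feature gating
```

<p>Because <italic>φ</italic> is learned jointly with the other parameters, features with small <italic>β</italic> are effectively screened out without any hand-crafted regularization.</p>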
</sec>
<sec id="s2_7"><title>Sample weighting net</title>
<p>The high noise level in genomic data is another challenging problem. We assign weights to samples to prioritize high-confidence data, hoping to restrain the influence of highly noisy samples. The weight vector <inline-formula id="ieqn-80">
<mml:math id="mml-ieqn-80"><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula> is the weighted representation of all samples for class <italic>c</italic> from the support set,</p>
<p><disp-formula id="eqn-11"><label>(11)</label>
<mml:math id="mml-eqn-11" display="block"><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow><mml:munder><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:munder><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x22C5;</mml:mo><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>&#x03B8;</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>g</mml:mi><mml:mi>&#x03C6;</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mstyle></mml:math>
</disp-formula></p>
<p>where <inline-formula id="ieqn-81">
<mml:math id="mml-ieqn-81"><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula> reflects how much we believe that sample <inline-formula id="ieqn-82">
<mml:math id="mml-ieqn-82"><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula> is clean data. A larger weight <inline-formula id="ieqn-83">
<mml:math id="mml-ieqn-83"><mml:mrow><mml:msub><mml:mi>v</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:math>
</inline-formula> means that the sample is treated as clean with higher confidence.</p>
<p>To determine the <inline-formula id="ieqn-84">
<mml:math id="mml-ieqn-84"><mml:mi>v</mml:mi></mml:math>
</inline-formula>, we modified the method proposed by <xref ref-type="bibr" rid="ref-33">Shu <italic>et al</italic>. (2019)</xref>, which learns a weighting function that assigns different weights to clean and noisy samples. The sample weight <inline-formula id="ieqn-85">
<mml:math id="mml-ieqn-85"><mml:mi>v</mml:mi></mml:math>
</inline-formula> is produced by an MLP network. In the original method, the input of the MLP is the loss of a sample and its output is the weight, as shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>. Since our baseline model treats the support samples as prototypes and does not compute per-sample losses, the feature vector of each sample is used as the input instead of the loss. Thus, <xref ref-type="disp-formula" rid="eqn-11">Eq. (11)</xref> can be rewritten as:</p>
<p><disp-formula id="eqn-12"><label>(12)</label>
<mml:math id="mml-eqn-12" display="block"><mml:mrow><mml:msub><mml:mi>w</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mrow><mml:mfrac><mml:mn>1</mml:mn><mml:mrow><mml:mrow><mml:mo>|</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>s</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>|</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mrow><mml:munder><mml:mrow><mml:mo movablelimits="false">&#x2211;</mml:mo></mml:mrow><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:msub><mml:mi>S</mml:mi><mml:mi>c</mml:mi></mml:msub></mml:mrow></mml:mrow></mml:munder><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mrow><mml:mi mathvariant="script">V</mml:mi></mml:mrow></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>&#x03B8;</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>g</mml:mi><mml:mi>&#x03C6;</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mi>&#x03C9;</mml:mi></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mo>&#x22C5;</mml:mo><mml:mrow><mml:msub><mml:mi>f</mml:mi><mml:mi>&#x03B8;</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>g</mml:mi><mml:mi>&#x03C6;</mml:mi></mml:msub></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mrow><mml:msub><mml:mi>x</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mo>)</mml:mo></mml:mrow></mml:mstyle></mml:math>
</disp-formula></p>
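<p>Eq. (12) can be sketched as follows, assuming the inputs are the already-embedded support features <italic>f<sub>θ</sub></italic>(<italic>g<sub>φ</sub></italic>(<italic>x<sub>i</sub></italic>)); the one-hidden-layer weighting net and its parameter names (W1, b1, W2, b2) are illustrative stand-ins for the learned function <italic>V</italic>(&#x22C5;; <italic>ω</italic>).</p>

```python
import numpy as np

def sample_weight(f, W1, b1, W2, b2):
    """V(f; omega): a one-hidden-layer MLP mapping a feature vector to a
    scalar confidence weight in (0, 1)."""
    h = np.maximum(0.0, f @ W1 + b1)              # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))   # sigmoid output

def weighted_prototype(features, W1, b1, W2, b2):
    """Eq. (12): the prototype w_c is the mean of the support features,
    each scaled by its learned confidence weight v_i."""
    v = np.array([sample_weight(f, W1, b1, W2, b2) for f in features])
    return (v[:, None] * features).mean(axis=0)
```

<p>Noisy samples receive small weights and therefore contribute less to the class prototype.</p>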
</sec>
</sec>
<sec id="s3"><title>Results</title>
<p>To evaluate the performance of our proposed SW-Net, we conducted experiments on both simulated data and the TCGA gene expression dataset. Our SW-Net outperformed conventional machine learning methods and typical few-shot methods.</p>
<sec id="s3_1"><title>Simulated dataset</title>
<p>We constructed the training dataset <inline-formula id="ieqn-86">
<mml:math id="mml-ieqn-86"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="normal">D</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:mspace width="thickmathspace" /></mml:mrow></mml:math>
</inline-formula>and test dataset <inline-formula id="ieqn-87">
<mml:math id="mml-ieqn-87"><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math>
</inline-formula>, whose classes did not overlap. We followed the work of <xref ref-type="bibr" rid="ref-21">Ma and Zhang (2019)</xref> to generate the simulated data. For <inline-formula id="ieqn-88">
<mml:math id="mml-ieqn-88"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="normal">D</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math>
</inline-formula>, we sampled 100 points from each of ten 2-dimensional Gaussian distributions with a shared covariance matrix and ten different means <inline-formula id="ieqn-89">
<mml:math id="mml-ieqn-89"><mml:mi>&#x03BC;</mml:mi></mml:math>
</inline-formula> &#x003D; (2, 2), (6, 6), (0, &#x2212;5), (4, &#x2212;4), (&#x2212;2, 2), (&#x2212;5, 0), (&#x2212;6, 6), (&#x2212;2, &#x2212;9), (&#x2212;5, &#x2212;5), (&#x2212;9, &#x2212;6), respectively as the true features. We then appended 40-dimensional Gaussian irrelevant features with the covariance matrix <inline-formula id="ieqn-90">
<mml:math id="mml-ieqn-90"><mml:mo>&#x2211;</mml:mo><mml:mo>=</mml:mo><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>g</mml:mi></mml:math>
</inline-formula>(10, &#x2026;, 10) and mean <inline-formula id="ieqn-91">
<mml:math id="mml-ieqn-91"><mml:mi>&#x03BC;</mml:mi></mml:math>
</inline-formula> &#x003D; (2.5, &#x2026;, 2.5). Therefore, each sample had 42 features: the two true features plus the 40 irrelevant ones. For <inline-formula id="ieqn-92">
<mml:math id="mml-ieqn-92"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="normal">D</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math>
</inline-formula>, 1000 points were drawn from each of the four Gaussian distributions with the covariance matrix <inline-formula id="ieqn-93">
<mml:math id="mml-ieqn-93"><mml:mo>&#x2211;</mml:mo><mml:mo>=</mml:mo><mml:mi>d</mml:mi><mml:mi>i</mml:mi><mml:mi>a</mml:mi><mml:mi>g</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mspace width="thickmathspace" /></mml:mrow></mml:math>
</inline-formula>and four different means <inline-formula id="ieqn-94">
<mml:math id="mml-ieqn-94"><mml:mrow><mml:mi mathvariant="normal">&#x03BC;</mml:mi></mml:mrow></mml:math>
</inline-formula> &#x003D; (0, 0), (1, 0), (0, 1), (1, 1), as the true features. Then we appended 40-dimensional Gaussian irrelevant features with the same settings as for the training set.</p>
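A minimal numpy sketch of the training-set generation described above. The paper gives the ten true-feature means and the irrelevant-feature covariance diag(10, ..., 10) (i.e., per-feature standard deviation sqrt(10)) with mean 2.5 per feature; the identity covariance for the true features is an assumption, since the text does not state it.

```python
import numpy as np

rng = np.random.default_rng(0)

# Ten true-feature means from the paper's training setting.
means = [(2, 2), (6, 6), (0, -5), (4, -4), (-2, 2),
         (-5, 0), (-6, 6), (-2, -9), (-5, -5), (-9, -6)]

X, y = [], []
for c, mu in enumerate(means):
    # 2 true features per sample (identity covariance assumed here).
    true_feats = rng.multivariate_normal(mu, np.eye(2), size=100)
    # 40 irrelevant features: mean 2.5, variance 10 each.
    noise = rng.normal(loc=2.5, scale=np.sqrt(10), size=(100, 40))
    X.append(np.hstack([true_feats, noise]))
    y.extend([c] * 100)

X, y = np.vstack(X), np.array(y)  # 1000 samples, 42 features each
```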
</sec>
<sec id="s3_2"><title>Implementation details</title>
<p>We compared SW-Net with conventional machine learning methods and two typical meta-learning methods (Prototypical Net and Matching Net). SW-Net was first pre-trained with the training dataset <inline-formula id="ieqn-95">
<mml:math id="mml-ieqn-95"><mml:mrow><mml:msub><mml:mrow><mml:mi mathvariant="normal">D</mml:mi></mml:mrow><mml:mrow><mml:mi>t</mml:mi><mml:mi>r</mml:mi><mml:mi>a</mml:mi><mml:mi>i</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math>
</inline-formula>, which contains 10 classes. Then we randomly selected 1% of the samples from <inline-formula id="ieqn-96">
<mml:math id="mml-ieqn-96"><mml:mrow><mml:msub><mml:mi>D</mml:mi><mml:mrow><mml:mi>t</mml:mi><mml:mi>e</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi></mml:mrow></mml:msub></mml:mrow></mml:math>
</inline-formula> for each of the four classes as the support set, and the remaining samples were placed into the query set. The accuracy of SW-Net was tested over 50 random runs. The conventional machine learning methods were trained on 1% of the test set per class and tested on the remaining samples. The implementation adopts the same settings as the work of <xref ref-type="bibr" rid="ref-21">Ma and Zhang (2019)</xref>.</p>
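The support/query split above can be sketched as follows; this is an illustrative helper (names are hypothetical), drawing 1% of each test class as the support set and leaving the rest as the query set.

```python
import numpy as np

def support_query_split(y, frac=0.01, seed=0):
    """Return boolean masks (support, query): frac of each class is support."""
    rng = np.random.default_rng(seed)
    sup_idx = []
    for c in np.unique(y):
        idx = np.where(y == c)[0]
        k = max(1, int(round(frac * len(idx))))   # at least one support sample
        sup_idx.extend(rng.choice(idx, size=k, replace=False))
    sup = np.zeros(len(y), dtype=bool)
    sup[sup_idx] = True
    return sup, ~sup

# Test setting of the paper: 4 classes, 1000 points each.
y = np.repeat(np.arange(4), 1000)
sup, qry = support_query_split(y)   # 10 support samples per class
```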
</sec>
<sec id="s3_3"><title>Results on different feature dimension settings</title>
<p>To test the feature selection capability of SW-Net, we increased the irrelevant feature dimension to four levels: 100, 500, 1000, and 2000. All other implementation settings were kept the same. The results over 50 random runs with 5-fold cross-validation are shown in <xref ref-type="table" rid="table-1">Tables 1</xref> and <xref ref-type="table" rid="table-2">2</xref>. In the ablation experiment, &#x201C;Baseline&#x201D; denotes the basic model without any modifications; &#x201C;SI&#x201D; denotes Support-based Initialization; &#x201C;SI&#x002B;TF&#x201D; means that both Support-based Initialization and Transductive Fine-tuning were added to the baseline; in &#x201C;SI&#x002B;TF&#x002B;FS&#x201D;, FS denotes the Feature Selection net; and in SW-Net, all modules, including the sample reweighting net, were added to the baseline. SW-Net outperformed all comparison methods, including two typical meta-learning methods and five conventional machine-learning methods. As the dimension increased, the performance gap between SW-Net and the competing methods widened, demonstrating our model&#x2019;s ability to handle high-dimensional data.</p>
<table-wrap id="table-1"><label>TABLE 1</label>
<caption><title>The prediction accuracy by 5-fold cross validation under different feature dimensions</title></caption>
<table><colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Algorithm</th>
<th>100</th>
<th>500</th>
<th>1000</th>
<th>2000</th>
</tr>
</thead>
<tbody>
<tr>
<td>NeuralNet</td>
<td>32.88 &#x00B1; 1.67</td>
<td>26.89 &#x00B1; 0.72</td>
<td>25.39 &#x00B1; 0.88</td>
<td>25.02 &#x00B1; 0.64</td>
</tr>
<tr>
<td>Logistic Regression</td>
<td>42.62 &#x00B1; 1.98</td>
<td>32.80 &#x00B1; 0.98</td>
<td>28.96 &#x00B1; 2.07</td>
<td>27.92 &#x00B1; 0.55</td>
</tr>
<tr>
<td>Random Forest</td>
<td>53.44 &#x00B1; 2.70</td>
<td>29.43 &#x00B1; 2.34</td>
<td>26.22 &#x00B1; 2.31</td>
<td>24.70 &#x00B1; 2.63</td>
</tr>
<tr>
<td>Na&#x00EF;ve Bayes</td>
<td>75.98 &#x00B1; 6.23</td>
<td>55.56 &#x00B1; 5.39</td>
<td>47.17 &#x00B1; 3.77</td>
<td>42.48 &#x00B1; 3.18</td>
</tr>
<tr>
<td>MatchingNet</td>
<td>77.92 &#x00B1; 3.95</td>
<td>70.04 &#x00B1; 5.36</td>
<td>51.24 &#x00B1; 6.88</td>
<td>48.87 &#x00B1; 9.66</td>
</tr>
<tr>
<td>PrototypicalNet</td>
<td>81.49 &#x00B1; 4.60</td>
<td>72.08 &#x00B1; 4.70</td>
<td>54.05 &#x00B1; 9.92</td>
<td>49.66 &#x00B1; 7.79</td>
</tr>
<tr>
<td>SW-Net</td>
<td>87.25 &#x00B1; 4.34</td>
<td>84.38 &#x00B1; 3.83</td>
<td>80.92 &#x00B1; 5.82</td>
<td>77.64 &#x00B1; 5.74</td>
</tr>
</tbody>
</table>
</table-wrap><table-wrap id="table-2"><label>TABLE 2</label>
<caption><title>The prediction accuracy of ablation experiment by 5-fold cross validation</title></caption>
<table><colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Algorithm</th>
<th>100</th>
<th>500</th>
<th>1000</th>
<th>2000</th>
</tr>
</thead>
<tbody>
<tr>
<td>Baseline</td>
<td>67.23 &#x00B1; 1.84</td>
<td>62.19 &#x00B1; 2.71</td>
<td>57.24 &#x00B1; 3.07</td>
<td>45.39 &#x00B1; 2.38</td>
</tr>
<tr>
<td>SI</td>
<td>75.02 &#x00B1; 4.43</td>
<td>63.52 &#x00B1; 4.28</td>
<td>27.22 &#x00B1; 5.45</td>
<td>47.98 &#x00B1; 6.60</td>
</tr>
<tr>
<td>SI&#x002B;TF</td>
<td>79.50 &#x00B1; 3.28</td>
<td>72.29 &#x00B1; 5.30</td>
<td>71.32 &#x00B1; 8.19</td>
<td>64.43 &#x00B1; 6.27</td>
</tr>
<tr>
<td>SI&#x002B;TF&#x002B;FS</td>
<td>83.68 &#x00B1; 3.59</td>
<td>83.50 &#x00B1; 6.55</td>
<td>79.52 &#x00B1; 6.38</td>
<td>73.86 &#x00B1; 4.25</td>
</tr>
<tr>
<td>SW-Net</td>
<td>87.25 &#x00B1; 4.34</td>
<td>84.38 &#x00B1; 3.83</td>
<td>80.92 &#x00B1; 5.82</td>
<td>77.64 &#x00B1; 5.74</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Moreover, we tested SW-Net&#x2019;s ability to select vital features. We chose a representative machine learning method, Logistic Regression, and compared its learned feature weights with those of SW-Net in the 42-dimensional feature setting. <xref ref-type="fig" rid="fig-3">Fig. 3</xref> shows the feature weights learned by logistic regression, and <xref ref-type="fig" rid="fig-4">Fig. 4</xref> shows those learned by SW-Net. For SW-Net, the red bars (true features) are much higher than the blue bars, demonstrating that our model selects the true features better than the conventional method.</p>
<fig id="fig-3">
<label>Figure 3</label>
<caption><title>Learned feature weights by logistic regression on a simulated dataset. The red bar shows the true features.</title></caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="Biocell-47-25865-f003.tif"/>
</fig><fig id="fig-4">
<label>Figure 4</label>
<caption><title>Learned feature weights by SW-Net on a simulated dataset. The red bar shows the true features.</title></caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="Biocell-47-25865-f004.tif"/>
</fig>
</sec>
<sec id="s3_4"><title>Experiments on the cancer genome atlas meta-dataset</title>
<p>TCGA Meta-Dataset: The field of genomics lacks a consistent benchmark dataset. To address this issue, the TCGA Meta-Dataset (<xref ref-type="bibr" rid="ref-31">Samiei <italic>et al</italic>., 2019</xref>) offers a benchmark built from the publicly available clinical data of the TCGA program. It contains 174 tasks, all of which are classification problems, with gene-expression input of 20,530 genes. These tasks cover a variety of clinical problems, such as predicting tumor tissue site and histological type, and are good proxies for developing few-shot algorithms. The task definitions and data can be found at <uri xlink:href="https://github.com/mandanasmi/TCGA_Benchmark">https://github.com/mandanasmi/TCGA_Benchmark</uri>.</p>
<p>Implementation Details: We selected 68 clinical tasks from the benchmark; each task had two classes, and each class had no fewer than 60 samples. To evaluate the performance of SW-Net and the competing methods, we used 80 classes for training and tested on the remaining 56 classes, under both the 5-shot and 1-shot settings. For simplicity, we did not perform a separate hyper-parameter search. All methods used the same backbone network, which consisted of 2 fully connected layers, both with ReLU (<xref ref-type="bibr" rid="ref-24">Nair and Hinton, 2010</xref>) activation. The sizes of the two hidden layers were 6000 and 2000, and the output size was 200. We used the Adam optimizer, and the learning rates were determined by a grid search over [0.001, 0.0005, 0.0001, 0.00005, 0.00001]. A learning rate of 0.0001 was selected for the pre-training stage, and all other methods used the same learning rate of 0.0001. For the fine-tuning stage, an SGD optimizer with a 0.001 learning rate was used.</p>
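The shared backbone described above can be sketched as a plain two-hidden-layer MLP. This is a minimal numpy illustration, not the authors' code; the paper's sizes are 20530 -> 6000 -> 2000 -> 200, and the demo below runs with proportionally scaled-down widths so it executes cheaply while keeping the same structure (ReLU on hidden layers, linear embedding output).

```python
import numpy as np

rng = np.random.default_rng(0)

def make_mlp(sizes):
    # One (W, b) pair per layer; small random init for the sketch.
    return [(rng.normal(scale=0.01, size=(i, o)), np.zeros(o))
            for i, o in zip(sizes[:-1], sizes[1:])]

def forward(params, x):
    h = x
    for k, (W, b) in enumerate(params):
        h = h @ W + b
        if k < len(params) - 1:          # ReLU on hidden layers only
            h = np.maximum(h, 0.0)
    return h                             # linear embedding output

# Scaled-down stand-in for the paper's [20530, 6000, 2000, 200].
params = make_mlp([2053, 600, 200, 20])
emb = forward(params, rng.normal(size=(3, 2053)))   # 3 samples -> 3 embeddings
```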
<p>We kept the backbone the same for all methods. For the conventional methods, we used the implementations in scikit-learn (<uri xlink:href="https://scikit-learn.org/">https://scikit-learn.org/</uri>) for Naive Bayes, Logistic Regression, and Random Forest with default settings. We implemented NeuralNet and AffinityNet with the default settings from the original paper (<xref ref-type="bibr" rid="ref-21">Ma and Zhang, 2019</xref>). For Matching Net, Prototypical Networks, and the baseline method, we followed the implementation by <xref ref-type="bibr" rid="ref-2">Chen <italic>et al</italic>. (2019)</xref>, <uri xlink:href="https://github.com/wyharveychen/CloserLookFewShot">https://github.com/wyharveychen/CloserLookFewShot</uri>. The tasks selected for our experiment can be found at <uri xlink:href="https://drive.google.com/file/d/1cYzuMJKbxWsIZqbwhH1LW0bzfkW_Cc9h/view?usp=sharing">https://drive.google.com/file/d/1cYzuMJKbxWsIZqbwhH1LW0bzfkW_Cc9h/view?usp=sharing</uri>.</p>
</sec>
<sec id="s3_5"><title>Results on the cancer genome atlas meta-dataset</title>
<p>We compared SW-Net against two representative meta-learning algorithms (Matching Net and Prototypical Networks) and conventional learning methods (Logistic Regression, Neural Network, and majority-class prediction). We also conducted an ablation experiment to test the contribution of each component of the proposed model. For the conventional methods, we randomly selected 120 samples per task, taking 80 as training data and using the remaining 40 for testing; each task had two classes. The meta-learning methods and SW-Net were tested under the 5-shot and 1-shot settings. The results are shown in <xref ref-type="table" rid="table-3">Table 3</xref>. The query shot was set to 15 unless otherwise specified. For SW-Net, fine-tuning was performed on one GPU for 30 epochs, with two weight updates per epoch: we first updated with the cross-entropy term on the support samples and then with the Shannon entropy term on the query samples.</p>
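The two per-epoch loss terms above can be sketched in numpy: a cross-entropy loss on the labeled support samples, then a Shannon entropy loss on the unlabeled query samples (the transductive term, which rewards confident query predictions). Only the losses are shown; gradient updates and the optimizer are omitted, and the logits are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)   # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def cross_entropy(logits, labels):
    # Supervised term, computed on the labeled support samples.
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(labels)), labels] + 1e-12))

def shannon_entropy(logits):
    # Transductive term, computed on the unlabeled query samples.
    p = softmax(logits)
    return -np.mean(np.sum(p * np.log(p + 1e-12), axis=1))

support_logits = np.array([[4.0, 0.0], [0.0, 4.0]])   # illustrative
query_logits = np.array([[0.5, 0.5], [2.0, -2.0]])    # illustrative
ce = cross_entropy(support_logits, np.array([0, 1]))
ent = shannon_entropy(query_logits)
```

Minimizing the entropy term pushes query predictions away from the uniform distribution, which is why peaked logits score lower than uncertain ones.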
<table-wrap id="table-3"><label>TABLE 3</label>
<caption><title>Mean accuracy on all TCGA meta-dataset test tasks under 1-shot and 5-shot settings by 5-fold cross validation. Best results highlighted in bold</title></caption>
<table><colgroup>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Algorithm</th>
<th>1-shot</th>
<th>5-shot</th>
</tr>
</thead>
<tbody>
<tr>
<td>Majority</td>
<td colspan="2" align="center">63.28 &#x00B1; 8.35</td>
</tr>
<tr>
<td>Logistic regression</td>
<td colspan="2" align="center">68.06 &#x00B1; 10.26</td>
</tr>
<tr>
<td>Neural network</td>
<td colspan="2" align="center">68.67 &#x00B1; 11.77</td>
</tr>
<tr>
<td>MatchingNet</td>
<td>61.08 &#x00B1; 16.94</td>
<td>70.86 &#x00B1; 12.55</td>
</tr>
<tr>
<td>Prototypical networks</td>
<td>66.56 &#x00B1; 14.36</td>
<td>74.55 &#x00B1; 13.21</td>
</tr>
<tr>
<td>Baseline</td>
<td>59.89 &#x00B1; 13.02</td>
<td>70.31 &#x00B1; 9.88</td>
</tr>
<tr>
<td>SI</td>
<td>61.69 &#x00B1; 14.90</td>
<td>73.44 &#x00B1; 9.01</td>
</tr>
<tr>
<td>SI&#x002B;TF</td>
<td>66.22 &#x00B1; 12.05</td>
<td>78.01 &#x00B1; 8.87</td>
</tr>
<tr>
<td>SI&#x002B;TF&#x002B;FS</td>
<td>66.90 &#x00B1; 11.43</td>
<td>79.93 &#x00B1; 9.92</td>
</tr>
<tr>
<td>SW-Net</td>
<td><bold>70.05 &#x00B1; 9.40</bold></td>
<td><bold>81.03 &#x00B1; 8.58</bold></td>
</tr>
</tbody>
</table>
</table-wrap>
<p>As shown in <xref ref-type="table" rid="table-3">Table 3</xref>, the ablation results occupy the bottom section of the table. With support-based initialization alone, performance was comparable to the other meta-learning algorithms. In the 1-shot experiment, support-based initialization alone led to a minor improvement in accuracy over the other methods. In the 5-shot setting, support-based initialization combined with fine-tuning obtained better results than the other methods.</p>

<p>Transductive fine-tuning yielded nearly a 5% improvement in 1-shot prediction accuracy over support-based initialization, and nearly a 4% improvement in the 5-shot setting. This demonstrates that the unlabeled query samples used in transductive fine-tuning are vital in the few-shot setting. SW-Net brought a further 1%&#x2013;2% improvement in both the 1-shot and 5-shot settings over transductive fine-tuning, showing that the selection vector indeed filtered out useless features and had a positive effect on prediction accuracy.</p>
<p>We further compared SW-Net with the other methods on the lung cancer subtype task and the GBM (glioblastoma multiforme) gene expression subtype task separately, under the 5-shot setting with 5-fold cross-validation. The evaluation criteria were accuracy and area under the ROC curve (AUC). Accuracy results are shown in <xref ref-type="table" rid="table-4">Tables 4</xref> and <xref ref-type="table" rid="table-5">5</xref>. &#x201C;SI&#x201D; denotes Support-based Initialization; &#x201C;SI&#x002B;TF&#x201D; denotes Support-based Initialization plus transductive fine-tuning; &#x201C;SI&#x002B;TF&#x002B;FS&#x201D; adds the Feature Selection net; and &#x201C;SW-Net&#x201D; adds the sample reweighting net to the previous model. In <xref ref-type="fig" rid="fig-5">Fig. 5</xref>, we show the AUC on the lung cancer subtype task and the GBM gene expression subtype task. Support-based initialization improved both AUC and accuracy, and both tasks benefited, to different degrees, from the feature selection and sample reweighting modules.</p>
<table-wrap id="table-4"><label>TABLE 4</label>
<caption><title>Accuracy on lung cancer sub-type task by 5-fold cross validation</title></caption>
<table><colgroup>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Algorithm</th>
<th>Accuracy%</th>
</tr>
</thead>
<tbody>
<tr>
<td>Majority</td>
<td>47.86 &#x00B1; 8.83</td>
</tr>
<tr>
<td>Logistic regression</td>
<td>62.60 &#x00B1; 5.34</td>
</tr>
<tr>
<td>Neural network</td>
<td>64.25 &#x00B1; 1.98</td>
</tr>
<tr>
<td>MatchingNet</td>
<td>73.36 &#x00B1; 10.52</td>
</tr>
<tr>
<td>Prototypical networks</td>
<td>72.56 &#x00B1; 8.22</td>
</tr>
<tr>
<td>AffinityNet</td>
<td>78.20 &#x00B1; 6.76</td>
</tr>
<tr>
<td>Baseline</td>
<td>72.22 &#x00B1; 6.43</td>
</tr>
<tr>
<td>SI</td>
<td>75.25 &#x00B1; 4.01</td>
</tr>
<tr>
<td>SI&#x002B;TF</td>
<td>76.23 &#x00B1; 5.82</td>
</tr>
<tr>
<td>SI&#x002B;TF&#x002B;FS</td>
<td>79.41 &#x00B1; 6.92</td>
</tr>
<tr>
<td>SW-Net</td>
<td><bold>84.55 &#x00B1; 6.78</bold></td>
</tr>
</tbody>
</table>
</table-wrap><table-wrap id="table-5"><label>TABLE 5</label>
<caption><title>Accuracy on the glioblastoma multiforme (GBM) gene expression sub-type task by 5-fold cross validation</title></caption>
<table><colgroup>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Algorithm</th>
<th>Accuracy%</th>
</tr>
</thead>
<tbody>
<tr>
<td>Majority</td>
<td>42.77 &#x00B1; 9.34</td>
</tr>
<tr>
<td>Logistic regression</td>
<td>56.25 &#x00B1; 4.56</td>
</tr>
<tr>
<td>Neural network</td>
<td>60.20 &#x00B1; 6.98</td>
</tr>
<tr>
<td>MatchingNet</td>
<td>69.33 &#x00B1; 8.55</td>
</tr>
<tr>
<td>Prototypical networks</td>
<td>68.40 &#x00B1; 6.51</td>
</tr>
<tr>
<td>AffinityNet</td>
<td>71.05 &#x00B1; 5.89</td>
</tr>
<tr>
<td>Baseline</td>
<td>67.45 &#x00B1; 4.45</td>
</tr>
<tr>
<td>SI</td>
<td>69.25 &#x00B1; 6.08</td>
</tr>
<tr>
<td>SI&#x002B;TF</td>
<td>73.13 &#x00B1; 7.81</td>
</tr>
<tr>
<td>SI&#x002B;TF&#x002B;FS</td>
<td>74.49 &#x00B1; 6.78</td>
</tr>
<tr>
<td>SW-Net</td>
<td><bold>78.78 &#x00B1; 5.89</bold></td>
</tr>
</tbody>
</table>
</table-wrap><fig id="fig-5">
<label>Figure 5</label>
<caption><title>Comparison of Area Under the ROC curve on Lung Cancer task and glioblastoma multiforme (GBM) task.</title></caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="Biocell-47-25865-f005.tif"/>
</fig>
<p><xref ref-type="fig" rid="fig-6">Fig. 6</xref> presents the effect of varying the query shot on the mean accuracy of the tasks for 1 support shot and 5 support shots. In the 1-support-shot experiment, the Shannon entropy penalty term in SW-Net increased prediction accuracy as the query shot increased. This effect was less obvious in the 5-support-shot setting because more labeled data are available in the support set. Interestingly, even 1 query shot achieved a higher result, because our transductive fine-tuning method can adapt to few query samples; a single query shot is enough to benefit from it.</p>
<fig id="fig-6">
<label>Figure 6</label>
<caption><title>Mean accuracy of SW-Net for different query shots and support shots.</title></caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="Biocell-47-25865-f006.tif"/>
</fig>
<p>To further test the feature selection capability of SW-Net, we selected the 20 top-ranked significant genes of the lung cancer sub-type task with SW-Net and drew the Kaplan-Meier (KM) curves (<xref ref-type="bibr" rid="ref-48">Cerami <italic>et al</italic>., 2012</xref>) with cBioPortal (<uri xlink:href="https://www.cbioportal.org">https://www.cbioportal.org</uri>), as shown in <xref ref-type="fig" rid="fig-7">Fig. 7</xref>. Survival analysis of the selected important genes was performed on the Pan-Cancer Atlas dataset (<xref ref-type="bibr" rid="ref-11">Hoadley <italic>et al</italic>., 2018</xref>). The two curves do not intersect, and the log-rank test <italic>p</italic>-value was 4.387e-4. The blue line, which represents the group of patients with no alterations in the selected genes, shows a longer survival time.</p>
<fig id="fig-7">
<label>Figure 7</label>
<caption><title>K-M curves of 20 top-ranked genes of lung cancer selected by SW-Net.</title></caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="Biocell-47-25865-f007.tif"/>
</fig>
<p>Moreover, we experimented on the lung cancer dataset to investigate the significance of the important genes selected by our model. We selected the 50 top-ranked genes and performed enrichment analysis with Metascape (<xref ref-type="bibr" rid="ref-47">Zhou <italic>et al</italic>., 2019</xref>). The databases we used include WikiPathways (<xref ref-type="bibr" rid="ref-34">Slenter <italic>et al</italic>., 2018</xref>) and the Reactome Pathway knowledgebase (<xref ref-type="bibr" rid="ref-5">Fabregat <italic>et al</italic>., 2018</xref>).</p>
<p><xref ref-type="fig" rid="fig-8">Fig. 8</xref> shows that the selected genes are enriched in the &#x201C;non-small cell lung cancer&#x201D; pathway. Signaling by epidermal growth factor receptor (EGFR) and cytokine signaling in the immune system are also related to lung cancer. Tuberculosis, which has been shown to be associated with lung cancer (<xref ref-type="bibr" rid="ref-42">Wu <italic>et al</italic>., 2011</xref>; <xref ref-type="bibr" rid="ref-45">Yu <italic>et al</italic>., 2011</xref>), also appeared in our enrichment analysis. Other enriched pathways, such as fms-like tyrosine kinase 3 (FLT3) signaling and S phase, are associated with the cell cycle (<xref ref-type="bibr" rid="ref-30">Sage <italic>et al</italic>., 2003</xref>). EGF and EGFR play a vital role in cancer proliferation (<xref ref-type="bibr" rid="ref-12">Huang <italic>et al</italic>., 2014</xref>).</p>
<fig id="fig-8">
<label>Figure 8</label>
<caption><title>Enrichment analysis for the 50 top ranked genes by meta-learning with the reweighting method in the lung cancer dataset.</title></caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="Biocell-47-25865-f008.tif"/>
</fig>
</sec>
</sec>
<sec id="s4"><title>Discussion and Conclusion</title>
<p>Most computational methods are developed for one particular clinical task in isolation. For example, <xref ref-type="bibr" rid="ref-38">van Wieringen <italic>et al</italic>. (2009)</xref> worked on survival prediction, and <xref ref-type="bibr" rid="ref-20">Lyu and Haque (2018)</xref> studied tumor cell type classification. This is quite different from the real clinical process, in which clinicians must consider several clinical variables simultaneously; in other words, these tasks are interrelated. A more reliable result can be obtained with comprehensive knowledge about the patient, so it is practical to take related tasks into account for more precise predictions. We therefore utilized a collection of interrelated tasks to build prior knowledge for general prediction. Our SW-Net achieves competitive disease sub-type prediction accuracy compared with traditional methods because it exploits the correlated tasks.</p>
<p>Moreover, the ability of our model to prioritize genes for survival analysis was validated by experiments. We performed gene set enrichment analysis, and the top-ranked genes were enriched in crucial cancer pathways, such as the cell cycle, cell death, interleukin signaling, and cytokine signaling in the immune system. Beyond these well-known cancer pathways, our experiment suggests that viruses can be a potential factor affecting cancer development, which is not yet well studied. For lung cancer, the Epstein-Barr virus infection pathway is enriched, which also suggests that hepatotropic viruses may be associated with lung cancer; recent research has found that hepatotropic viruses are related to advanced non-small cell lung cancer (<xref ref-type="bibr" rid="ref-46">Zapatka <italic>et al</italic>., 2020</xref>).</p>
<p>In conclusion, small data and high noise are crucial problems researchers encounter when analyzing genomic data. To address them, we adopted a modified meta-learning approach with a reweighting strategy: the model can learn from a small number of samples, and the reweighting module suppresses highly noisy samples. We demonstrated that the proposed framework achieves competitive performance compared with traditional methods and other complex models. Finally, experiments show that the proposed method is interpretable: the top-ranked genes of lung cancer are enriched in biological pathways associated with cancers.</p>
<p>The small data issue is a factor that limits many biomedical analyses. Our work further demonstrates the prospect of meta-learning for solving biomedical problems with small data. In the future, we want to explore the applications of meta-learning for other biomedical problems, including cancer subtype prediction, drug discovery, and medical image analysis.</p>
</sec>
</body>
<back>
<sec sec-type="data-availability"><title>Availability of Data and Materials</title>
<p>The datasets generated during and/or analysed during the current study are available from the corresponding author on reasonable request.</p>
</sec>
<sec><title>Author Contribution</title>
<p>Study conception and design: Yuhan Ji and Yong Liang; data collection: Yuhan Ji and Ziyi Yang; analysis and interpretation of results: Yuhan Ji and Ning Ai; draft manuscript preparation: Yuhan Ji, Yong Liang, Ziyi Yang, and Ning Ai. All authors reviewed the results and approved the final version of the manuscript.</p>
</sec>
<sec><title>Ethics Approval</title>
<p>Not applicable.</p>
</sec>
<sec><title>Funding Statement</title>
<p>This work is supported by the <funding-source>Macau Science and Technology Development</funding-source> Funds Grant No. <award-id>0158/2019/A3</award-id> from the Macau Special Administrative Region of the People&#x2019;s Republic of China.</p>
</sec>
<sec sec-type="COI-statement"><title>Conflicts of Interest</title>
<p>The authors declare that they have no conflicts of interest to report regarding the present study.</p>
</sec>
<ref-list content-type="authoryear"><title>References</title>
<ref id="ref-1"><label>Bertinetto <italic>et al</italic>. (2018)</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Bertinetto</surname> <given-names>L</given-names></string-name>, <string-name><surname>Henriques</surname> <given-names>JF</given-names></string-name>, <string-name><surname>Torr</surname> <given-names>PH</given-names></string-name>, <string-name><surname>Vedaldi</surname> <given-names>A</given-names></string-name></person-group> (<year>2018</year>). <article-title>Meta-learning with differentiable closed-form solvers</article-title>. <comment>arXiv preprint arXiv:1805.08136</comment>.</mixed-citation></ref>
<ref id="ref-48"><label>Cerami <italic>et al</italic>. (2012)</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Cerami</surname> <given-names>E</given-names></string-name>, <string-name><surname>Gao</surname> <given-names>J</given-names></string-name>, <string-name><surname>Dogrusoz</surname> <given-names>U</given-names></string-name>, <string-name><surname>Gross</surname> <given-names>B</given-names></string-name>, <string-name><surname>Sumer</surname> <given-names>SO</given-names></string-name>, <string-name><surname>Aksoy</surname> <given-names>B</given-names></string-name>, <string-name><surname>Schultz</surname> <given-names>N</given-names></string-name></person-group> (<year>2012</year>). <article-title>The cBio cancer genomics portal: An open platform for exploring multidimensional cancer genomics data</article-title>. <source>Cancer Discovery</source> <volume>2</volume>: <fpage>401</fpage>&#x2013;<lpage>404</lpage>.</mixed-citation></ref>
<ref id="ref-2"><label>Chen <italic>et al</italic>. (2019)</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Chen</surname> <given-names>WY</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>YC</given-names></string-name>, <string-name><surname>Kira</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>YCF</given-names></string-name>, <string-name><surname>Huang</surname> <given-names>JB</given-names></string-name></person-group> (<year>2019</year>). <article-title>A closer look at few-shot classification</article-title>. <comment>arXiv preprint arXiv:1904.04232</comment>.</mixed-citation></ref>
<ref id="ref-3"><label>Dai <italic>et al</italic>. (2017)</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Dai</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>F</given-names></string-name>, <string-name><surname>Cohen</surname> <given-names>WW</given-names></string-name>, <string-name><surname>Salakhutdinov</surname> <given-names>RR</given-names></string-name></person-group> (<year>2017</year>). <article-title>Good semi-supervised learning that requires a bad gan</article-title>. <conf-name>Proceedings of the 31st International Conference on Neural Information Processing Systems</conf-name>, pp. <fpage>6513</fpage>&#x2013;<lpage>6523</lpage>. <publisher-loc>Long Beach</publisher-loc>.</mixed-citation></ref>
<ref id="ref-4"><label>De La Torre and Black (2003)</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>de la Torre</surname> <given-names>F</given-names></string-name>, <string-name><surname>Black</surname> <given-names>MJ</given-names></string-name></person-group> (<year>2003</year>). <article-title>A framework for robust subspace learning</article-title>. <source>International Journal of Computer Vision</source> <volume>54</volume>: <fpage>117</fpage>&#x2013;<lpage>142</lpage>. DOI <pub-id pub-id-type="doi">10.1023/A:1023709501986</pub-id>.</mixed-citation></ref>
<ref id="ref-5"><label>Fabregat <italic>et al</italic>. (2018)</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Fabregat</surname> <given-names>A</given-names></string-name>, <string-name><surname>Jupe</surname> <given-names>S</given-names></string-name>, <string-name><surname>Matthews</surname> <given-names>L</given-names></string-name>, <string-name><surname>Sidiropoulos</surname> <given-names>K</given-names></string-name>, <string-name><surname>Gillespie</surname> <given-names>M</given-names></string-name>, <string-name><surname>Garapati</surname> <given-names>P</given-names></string-name>, <string-name><surname>Haw</surname> <given-names>R</given-names></string-name>, <string-name><surname>Jassal</surname> <given-names>B</given-names></string-name>, <string-name><surname>Korninger</surname> <given-names>F</given-names></string-name>, <string-name><surname>May</surname> <given-names>B</given-names></string-name></person-group> (<year>2018</year>). <article-title>The reactome pathway knowledgebase</article-title>. <source>Nucleic Acids Research</source> <volume>46</volume>: <fpage>D649</fpage>&#x2013;<lpage>D655</lpage>. DOI <pub-id pub-id-type="doi">10.1093/nar/gkx1132</pub-id>.</mixed-citation></ref>
<ref id="ref-6"><label>Fei-Fei <italic>et al</italic>. (2006)</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>FF</given-names></string-name>, <string-name><surname>Fergus</surname> <given-names>R</given-names></string-name>, <string-name><surname>Perona</surname> <given-names>P</given-names></string-name></person-group> (<year>2006</year>). <article-title>One-shot learning of object categories</article-title>. <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source> <volume>28</volume>: <fpage>594</fpage>&#x2013;<lpage>611</lpage>. DOI <pub-id pub-id-type="doi">10.1109/TPAMI.2006.79</pub-id>.</mixed-citation></ref>
<ref id="ref-7"><label>Finn <italic>et al</italic>. (2017)</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Finn</surname> <given-names>C</given-names></string-name>, <string-name><surname>Abbeel</surname> <given-names>P</given-names></string-name>, <string-name><surname>Levine</surname> <given-names>S</given-names></string-name></person-group> (<year>2017</year>). <article-title>Model-agnostic meta-learning for fast adaptation of deep networks</article-title>. <conf-name>International Conference on Machine Learning</conf-name>, pp. <fpage>1126</fpage>&#x2013;<lpage>1135</lpage>. <publisher-loc>Sydney</publisher-loc>.</mixed-citation></ref>
<ref id="ref-8"><label>Freund and Schapire (1997)</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Freund</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Schapire</surname> <given-names>RE</given-names></string-name></person-group> (<year>1997</year>). <article-title>A decision-theoretic generalization of on-line learning and an application to boosting</article-title>. <source>Journal of Computer and System Sciences</source> <volume>55</volume>: <fpage>119</fpage>&#x2013;<lpage>139</lpage>. DOI <pub-id pub-id-type="doi">10.1006/jcss.1997.1504</pub-id>.</mixed-citation></ref>
<ref id="ref-9"><label>Garcia and Bruna (2017)</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Garcia</surname> <given-names>V</given-names></string-name>, <string-name><surname>Bruna</surname> <given-names>J</given-names></string-name></person-group> (<year>2017</year>). <article-title>Few-shot learning with graph neural networks</article-title>. <comment>arXiv preprint arXiv:1711.04043</comment>.</mixed-citation></ref>
<ref id="ref-10"><label>Grandvalet and Bengio (2004)</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Grandvalet</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Bengio</surname> <given-names>Y</given-names></string-name></person-group> (<year>2004</year>). <article-title>Semi-supervised learning by entropy minimization</article-title>. <conf-name>Proceedings of the 17th International Conference on Neural Information Processing Systems</conf-name>, pp. <fpage>529</fpage>&#x2013;<lpage>536</lpage>. <publisher-loc>Cambridge</publisher-loc>.</mixed-citation></ref>
<ref id="ref-11"><label>Hoadley <italic>et al</italic>. (2018)</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Hoadley</surname> <given-names>KA</given-names></string-name>, <string-name><surname>Yau</surname> <given-names>C</given-names></string-name>, <string-name><surname>Hinoue</surname> <given-names>T</given-names></string-name>, <string-name><surname>Wolf</surname> <given-names>DM</given-names></string-name>, <string-name><surname>Lazar</surname> <given-names>AJ</given-names></string-name> <etal>et al.</etal></person-group> (<year>2018</year>). <article-title>Cell-of-origin patterns dominate the molecular classification of 10,000 tumors from 33 types of cancer</article-title>. <source>Cell</source> <volume>173</volume>: <fpage>291</fpage>&#x2013;<lpage>304</lpage>. DOI <pub-id pub-id-type="doi">10.1016/j.cell.2018.03.022</pub-id>.</mixed-citation></ref>
<ref id="ref-12"><label>Huang <italic>et al</italic>. (2014)</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Huang</surname> <given-names>P</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>X</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>L</given-names></string-name>, <string-name><surname>Zhu</surname> <given-names>B</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Xia</surname> <given-names>J</given-names></string-name></person-group> (<year>2014</year>). <article-title>The role of EGF-EGFR signalling pathway in hepatocellular carcinoma inflammatory microenvironment</article-title>. <source>Journal of Cellular and Molecular Medicine</source> <volume>18</volume>: <fpage>218</fpage>&#x2013;<lpage>230</lpage>. DOI <pub-id pub-id-type="doi">10.1111/jcmm.12153</pub-id>.</mixed-citation></ref>
<ref id="ref-13"><label>Jiang <italic>et al</italic>. (2014)</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Jiang</surname> <given-names>L</given-names></string-name>, <string-name><surname>Meng</surname> <given-names>D</given-names></string-name>, <string-name><surname>Mitamura</surname> <given-names>T</given-names></string-name>, <string-name><surname>Hauptmann</surname> <given-names>AG</given-names></string-name></person-group> (<year>2014</year>). <article-title>Easy samples first: Self-paced reranking for zero-example multimedia search</article-title>. <conf-name>Proceedings of the 22nd ACM International Conference on Multimedia</conf-name>, pp. <fpage>547</fpage>&#x2013;<lpage>556</lpage>. <publisher-loc>Orlando</publisher-loc>.</mixed-citation></ref>
<ref id="ref-14"><label>Kipf and Welling (2016)</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Kipf</surname> <given-names>TN</given-names></string-name>, <string-name><surname>Welling</surname> <given-names>M</given-names></string-name></person-group> (<year>2016</year>). <article-title>Semi-supervised classification with graph convolutional networks</article-title>. <comment>arXiv preprint arXiv:1609.02907</comment>.</mixed-citation></ref>
<ref id="ref-15"><label>Kumar <italic>et al</italic>. (2010)</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Kumar</surname> <given-names>M</given-names></string-name>, <string-name><surname>Packer</surname> <given-names>B</given-names></string-name>, <string-name><surname>Koller</surname> <given-names>D</given-names></string-name></person-group> (<year>2010</year>). <article-title>Self-paced learning for latent variable models</article-title>. <conf-name>Proceedings of the 23rd International Conference on Neural Information Processing Systems</conf-name>, vol. 1, pp. <fpage>1189</fpage>&#x2013;<lpage>1197</lpage>. Vancouver.</mixed-citation></ref>
<ref id="ref-16"><label>Lee <italic>et al</italic>. (2019)</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Lee</surname> <given-names>K</given-names></string-name>, <string-name><surname>Maji</surname> <given-names>S</given-names></string-name>, <string-name><surname>Ravichandran</surname> <given-names>A</given-names></string-name>, <string-name><surname>Soatto</surname> <given-names>S</given-names></string-name></person-group> (<year>2019</year>). <article-title>Meta-learning with differentiable convex optimization</article-title>. <conf-name>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</conf-name>, pp. <fpage>10657</fpage>&#x2013;<lpage>10665</lpage>. <publisher-loc>Long Beach</publisher-loc>.</mixed-citation></ref>
<ref id="ref-17"><label>Liang <italic>et al</italic>. (2013)</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Liang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>C</given-names></string-name>, <string-name><surname>Luan</surname> <given-names>XZ</given-names></string-name>, <string-name><surname>Leung</surname> <given-names>KS</given-names></string-name>, <string-name><surname>Chan</surname> <given-names>TM</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>ZB</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>H</given-names></string-name></person-group> (<year>2013</year>). <article-title>Sparse logistic regression with a L 1/2 penalty for gene selection in cancer classification</article-title>. <source>BMC Bioinformatics</source> <volume>14</volume>: <fpage>1</fpage>&#x2013;<lpage>12</lpage>. DOI <pub-id pub-id-type="doi">10.1186/1471-2105-14-198</pub-id>.</mixed-citation></ref>
<ref id="ref-18"><label>Lin <italic>et al</italic>. (2020)</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Lin</surname> <given-names>TY</given-names></string-name>, <string-name><surname>Goyal</surname> <given-names>P</given-names></string-name>, <string-name><surname>Girshick</surname> <given-names>R</given-names></string-name>, <string-name><surname>He</surname> <given-names>K</given-names></string-name>, <string-name><surname>Doll&#x00E1;r</surname> <given-names>P</given-names></string-name></person-group> (<year>2020</year>). <article-title>Focal loss for dense object detection</article-title>. <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source> <volume>42</volume>: <fpage>318</fpage>&#x2013;<lpage>327</lpage>. DOI <pub-id pub-id-type="doi">10.1109/TPAMI.2018.2858826</pub-id>.</mixed-citation></ref>
<ref id="ref-19"><label>Liu <italic>et al</italic>. (2018)</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Liu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Lichtenberg</surname> <given-names>T</given-names></string-name>, <string-name><surname>Hoadley</surname> <given-names>KA</given-names></string-name>, <string-name><surname>Poisson</surname> <given-names>LM</given-names></string-name>, <string-name><surname>Lazar</surname> <given-names>AJ</given-names></string-name> <etal>et al.</etal></person-group> (<year>2018</year>). <article-title>An integrated TCGA pan-cancer clinical data resource to drive high-quality survival outcome analytics</article-title>. <source>Cell</source> <volume>173</volume>: <fpage>400</fpage>&#x2013;<lpage>416</lpage>. DOI <pub-id pub-id-type="doi">10.1016/j.cell.2018.02.052</pub-id>.</mixed-citation></ref>
<ref id="ref-20"><label>Lyu and Haque (2018)</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Lyu</surname> <given-names>B</given-names></string-name>, <string-name><surname>Haque</surname> <given-names>A</given-names></string-name></person-group> (<year>2018</year>). <article-title>Deep learning based tumor type classification using gene expression data</article-title>. <conf-name>Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. Association for Computing Machinery</conf-name>, pp. <fpage>89</fpage>&#x2013;<lpage>96</lpage>. <publisher-loc>New York</publisher-loc>.</mixed-citation></ref>
<ref id="ref-21"><label>Ma and Zhang (2019)</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Ma</surname> <given-names>T</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>A</given-names></string-name></person-group> (<year>2019</year>). <article-title>AffinityNet: Semi-supervised few-shot learning for disease type prediction</article-title>. <conf-name>Proceedings of the AAAI Conference on Artificial Intelligence</conf-name>, vol. 33<italic>,</italic> pp. <fpage>1069</fpage>&#x2013;<lpage>1076</lpage>. <publisher-loc>Honolulu</publisher-loc>.</mixed-citation></ref>
<ref id="ref-22"><label>Mishra <italic>et al</italic>. (2018)</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Mishra</surname> <given-names>N</given-names></string-name>, <string-name><surname>Rohaninejad</surname> <given-names>M</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>X</given-names></string-name>, <string-name><surname>Abbeel</surname> <given-names>P</given-names></string-name></person-group> (<year>2018</year>). <article-title>A simple neural attentive meta-learner</article-title>. <comment>arXiv preprints, arXiv:1707.03141</comment>.</mixed-citation></ref>
<ref id="ref-23"><label>Munkhdalai and Yu (2017)</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Munkhdalai</surname> <given-names>T</given-names></string-name>, <string-name><surname>Yu</surname> <given-names>H</given-names></string-name></person-group> (<year>2017</year>). <article-title>Meta networks</article-title>. <conf-name>Proceedings of the 34th International Conference on Machine Learning</conf-name>, pp. <fpage>2554</fpage>&#x2013;<lpage>2563</lpage>. <publisher-loc>Sydney</publisher-loc>.</mixed-citation></ref>
<ref id="ref-24"><label>Nair and Hinton (2010)</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Nair</surname> <given-names>V</given-names></string-name>, <string-name><surname>Hinton</surname> <given-names>GE</given-names></string-name></person-group> (<year>2010</year>). <article-title>Rectified linear units improve restricted boltzmann machines</article-title>. <conf-name>Proceedings of the 27th International Conference on International Conference on Machine Learning</conf-name>, pp. <fpage>807</fpage>&#x2013;<lpage>814</lpage>. <publisher-loc>Haifa</publisher-loc>.</mixed-citation></ref>
<ref id="ref-25"><label>Nichol <italic>et al</italic>. (2018)</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Nichol</surname> <given-names>A</given-names></string-name>, <string-name><surname>Achiam</surname> <given-names>J</given-names></string-name>, <string-name><surname>Schulman</surname> <given-names>J</given-names></string-name></person-group> (<year>2018</year>). <article-title>On first-order meta- learning algorithms</article-title>. <comment>arXiv preprint arXiv:1803.02999</comment>.</mixed-citation></ref>
<ref id="ref-26"><label>Qiu <italic>et al</italic>. (2018)</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Qiu</surname> <given-names>YL</given-names></string-name>, <string-name><surname>Zheng</surname> <given-names>H</given-names></string-name>, <string-name><surname>Devos</surname> <given-names>A</given-names></string-name>, <string-name><surname>Selby</surname> <given-names>H</given-names></string-name>, <string-name><surname>Gevaert</surname> <given-names>O</given-names></string-name></person-group> (<year>2018</year>). <article-title>Low-shot learning with imprinted weights</article-title>. <conf-name>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name>, pp. <fpage>5822</fpage>&#x2013;<lpage>5830</lpage>. <publisher-loc>Salt Lake City</publisher-loc>.</mixed-citation></ref>
<ref id="ref-27"><label>Qiu <italic>et al</italic>. (2020)</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Qiu</surname> <given-names>YL</given-names></string-name>, <string-name><surname>Zheng</surname> <given-names>H</given-names></string-name>, <string-name><surname>Devos</surname> <given-names>A</given-names></string-name>, <string-name><surname>Selby</surname> <given-names>H</given-names></string-name>, <string-name><surname>Gevaert</surname> <given-names>O</given-names></string-name></person-group> (<year>2020</year>). <article-title>A meta-learning approach for genomic survival analysis</article-title>. <source>Nature Communications</source> <volume>11</volume>: <fpage>6350</fpage>. DOI <pub-id pub-id-type="doi">10.1038/s41467-020-20167-3</pub-id>.</mixed-citation></ref>
<ref id="ref-28"><label>Rukhsar <italic>et al</italic>. (2022)</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Rukhsar</surname> <given-names>L</given-names></string-name>, <string-name><surname>Bangyal</surname> <given-names>WH</given-names></string-name>, <string-name><surname>Ali Khan</surname> <given-names>MS</given-names></string-name>, <string-name><surname>Ag Ibrahim</surname> <given-names>AA</given-names></string-name>, <string-name><surname>Nisar</surname> <given-names>K</given-names></string-name>, <string-name><surname>Rawat</surname> <given-names>DB</given-names></string-name></person-group> (<year>2022</year>). <article-title>Analyzing RNA-seq gene expression data using deep learning approaches for cancer classification</article-title>. <source>Applied Sciences</source> <volume>12</volume>: <fpage>1850</fpage>. DOI <pub-id pub-id-type="doi">10.3390/app12041850</pub-id>.</mixed-citation></ref>
<ref id="ref-29"><label>Rusu <italic>et al</italic>. (2018)</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Rusu</surname> <given-names>AA</given-names></string-name>, <string-name><surname>Rao</surname> <given-names>D</given-names></string-name>, <string-name><surname>Sygnowski</surname> <given-names>J</given-names></string-name>, <string-name><surname>Vinyals</surname> <given-names>O</given-names></string-name>, <string-name><surname>Pascanu</surname> <given-names>R</given-names></string-name>, <string-name><surname>Osindero</surname> <given-names>S</given-names></string-name>, <string-name><surname>Hadsell</surname> <given-names>R</given-names></string-name></person-group> (<year>2018</year>). <article-title>Meta-learning with latent embedding optimization</article-title>. <comment>arXiv preprint arXiv:1807.05960</comment>.</mixed-citation></ref>
<ref id="ref-30"><label>Sage <italic>et al</italic>. (2003)</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Sage</surname> <given-names>J</given-names></string-name>, <string-name><surname>Miller</surname> <given-names>AL</given-names></string-name>, <string-name><surname>P&#x00E9;rez-Mancera</surname> <given-names>PA</given-names></string-name>, <string-name><surname>Wysocki</surname> <given-names>JM</given-names></string-name>, <string-name><surname>Jacks</surname> <given-names>T</given-names></string-name></person-group> (<year>2003</year>). <article-title>Acute mutation of retinoblastoma gene function is sufficient for cell cycle re-entry</article-title>. <source>Nature</source> <volume>424</volume>: <fpage>223</fpage>&#x2013;<lpage>228</lpage>. DOI <pub-id pub-id-type="doi">10.1038/nature01764</pub-id>.</mixed-citation></ref>
<ref id="ref-31"><label>Samiei <italic>et al</italic>. (2019)</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Samiei</surname> <given-names>M</given-names></string-name>, <string-name><surname>W&#x00FC;rfl</surname> <given-names>T</given-names></string-name>, <string-name><surname>Deleu</surname> <given-names>T</given-names></string-name>, <string-name><surname>Weiss</surname> <given-names>M</given-names></string-name>, <string-name><surname>Dutil</surname> <given-names>F</given-names></string-name>, <string-name><surname>Fevens</surname> <given-names>T</given-names></string-name>, <string-name><surname>Boucher</surname> <given-names>G</given-names></string-name>, <string-name><surname>Lemieux</surname> <given-names>S</given-names></string-name>, <string-name><surname>Cohen</surname> <given-names>JP</given-names></string-name></person-group> (<year>2019</year>). <article-title>The tcga meta-dataset clinical benchmark</article-title>. <comment>arXiv preprint arXiv:1910.08636</comment>.</mixed-citation></ref>
<ref id="ref-32"><label>Saria and Goldenberg (2015)</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Saria</surname> <given-names>S</given-names></string-name>, <string-name><surname>Goldenberg</surname> <given-names>A</given-names></string-name></person-group> (<year>2015</year>). <article-title>Subtyping: What it is and its role in precision medicine</article-title>. <source>IEEE Intelligent Systems</source> <volume>30</volume>: <fpage>70</fpage>&#x2013;<lpage>75</lpage>. DOI <pub-id pub-id-type="doi">10.1109/MIS.2015.60</pub-id>.</mixed-citation></ref>
<ref id="ref-33"><label>Shu <italic>et al</italic>. (2019)</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Shu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Xie</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Yi</surname> <given-names>L</given-names></string-name>, <string-name><surname>Zhao</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Zhou</surname> <given-names>S</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Meng</surname> <given-names>D</given-names></string-name></person-group> (<year>2019</year>). <article-title>Meta-weight-net: Learning an explicit mapping for sample weighting</article-title>. <conf-name>Proceedings of the 33rd International Conference on Neural Information Processing Systems</conf-name>, pp. <fpage>1919</fpage>&#x2013;<lpage>1930</lpage>. <publisher-loc>Vancouver</publisher-loc>.</mixed-citation></ref>
<ref id="ref-34"><label>Slenter <italic>et al</italic>. (2018)</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Slenter</surname> <given-names>DN</given-names></string-name>, <string-name><surname>Kutmon</surname> <given-names>M</given-names></string-name>, <string-name><surname>Hanspers</surname> <given-names>K</given-names></string-name>, <string-name><surname>Riutta</surname> <given-names>A</given-names></string-name>, <string-name><surname>Windsor</surname> <given-names>J</given-names></string-name>, <string-name><surname>Nunes</surname> <given-names>N</given-names></string-name>, <string-name><surname>M&#x00E9;lius</surname> <given-names>J</given-names></string-name>, <string-name><surname>Cirillo</surname> <given-names>E</given-names></string-name>, <string-name><surname>Coort</surname> <given-names>SL</given-names></string-name>, <string-name><surname>Digles</surname> <given-names>D</given-names></string-name></person-group> (<year>2018</year>). <article-title>WikiPathways: A multifaceted pathway database bridging metabolomics to other omics research</article-title>. <source>Nucleic Acids Research</source> <volume>46</volume>: <fpage>D661</fpage>&#x2013;<lpage>D667</lpage>. DOI <pub-id pub-id-type="doi">10.1093/nar/gkx1064</pub-id>.</mixed-citation></ref>
<ref id="ref-35"><label>Snell <italic>et al</italic>. (2017)</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Snell</surname> <given-names>J</given-names></string-name>, <string-name><surname>Swersky</surname> <given-names>K</given-names></string-name>, <string-name><surname>Zemel</surname> <given-names>R</given-names></string-name></person-group> (<year>2017</year>). <article-title>Prototypical networks for few-shot learning</article-title>. <conf-name>Proceedings of the 31st International Conference on Neural Information Processing Systems</conf-name>, pp. <fpage>4080</fpage>&#x2013;<lpage>4090</lpage>. <publisher-loc>Long Beach</publisher-loc>.</mixed-citation></ref>
<ref id="ref-36"><label>Sohn <italic>et al</italic>. (2017)</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Sohn</surname> <given-names>BH</given-names></string-name>, <string-name><surname>Hwang</surname> <given-names>JE</given-names></string-name>, <string-name><surname>Jang</surname> <given-names>HJ</given-names></string-name>, <string-name><surname>Lee</surname> <given-names>HS</given-names></string-name>, <string-name><surname>Oh</surname> <given-names>SC</given-names></string-name> <etal>et al.</etal></person-group> (<year>2017</year>). <article-title>Clinical significance of four molecular subtypes of gastric cancer identified by the cancer genome atlas project</article-title>. <source>Clinical Cancer Research</source> <volume>23</volume>: <fpage>4441</fpage>&#x2013;<lpage>4449</lpage>. DOI <pub-id pub-id-type="doi">10.1158/1078-0432.CCR-16-2211</pub-id>.</mixed-citation></ref>
<ref id="ref-37"><label>Sung <italic>et al</italic>. (2018)</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Sung</surname> <given-names>F</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>L</given-names></string-name>, <string-name><surname>Xiang</surname> <given-names>T</given-names></string-name>, <string-name><surname>Torr</surname> <given-names>PH</given-names></string-name>, <string-name><surname>Hospedales</surname> <given-names>TM</given-names></string-name></person-group> (<year>2018</year>). <article-title>Learning to compare: Relation network for few-shot learning</article-title>. <conf-name>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name>, pp. <fpage>1199</fpage>&#x2013;<lpage>1208</lpage>. <publisher-loc>Salt Lake City</publisher-loc>.</mixed-citation></ref>
<ref id="ref-38"><label>van Wieringen <italic>et al</italic>. (2009)</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>van Wieringen</surname> <given-names>WN</given-names></string-name>, <string-name><surname>Kun</surname> <given-names>D</given-names></string-name>, <string-name><surname>Hampel</surname> <given-names>R</given-names></string-name>, <string-name><surname>Boulesteix</surname> <given-names>AL</given-names></string-name></person-group> (<year>2009</year>). <article-title>Survival prediction using gene expression data: A review and comparison</article-title>. <source>Computational Statistics &#x0026; Data Analysis</source> <volume>53</volume>: <fpage>1590</fpage>&#x2013;<lpage>1603</lpage>. DOI <pub-id pub-id-type="doi">10.1016/j.csda.2008.05.021</pub-id>.</mixed-citation></ref>
<ref id="ref-39"><label>Vinyals <italic>et al</italic>. (2016)</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Vinyals</surname> <given-names>O</given-names></string-name>, <string-name><surname>Blundell</surname> <given-names>C</given-names></string-name>, <string-name><surname>Lillicrap</surname> <given-names>T</given-names></string-name>, <string-name><surname>Wierstra</surname> <given-names>D</given-names></string-name></person-group> (<year>2016</year>). <article-title>Matching networks for one shot learning</article-title>. <conf-name>Proceedings of the 30th International Conference on Neural Information Processing Systems</conf-name>, pp. <fpage>3637</fpage>&#x2013;<lpage>3645</lpage>. <publisher-loc>Red Hook</publisher-loc>.</mixed-citation></ref>
<ref id="ref-40"><label>Wang <italic>et al</italic>. (2017)</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Wang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Kucukelbir</surname> <given-names>A</given-names></string-name>, <string-name><surname>Blei</surname> <given-names>DM</given-names></string-name></person-group> (<year>2017</year>). <article-title>Robust probabilistic modeling with bayesian data reweighting</article-title>. <conf-name>Proceedings of the 34th International Conference on Machine Learning</conf-name>, pp. <fpage>3646</fpage>&#x2013;<lpage>3655</lpage>. <publisher-loc>Sydney</publisher-loc>.</mixed-citation></ref>
<ref id="ref-41"><label>Weiss <italic>et al</italic>. (2016)</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Weiss</surname> <given-names>K</given-names></string-name>, <string-name><surname>Khoshgoftaar</surname> <given-names>TM</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>D</given-names></string-name></person-group> (<year>2016</year>). <article-title>A survey of transfer learning</article-title>. <source>Journal of Big Data</source> <volume>3</volume>: <fpage>1</fpage>&#x2013;<lpage>40</lpage>. DOI <pub-id pub-id-type="doi">10.1186/s40537-016-0043-6</pub-id>.</mixed-citation></ref>
<ref id="ref-42"><label>Wu <italic>et al</italic>. (2011)</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wu</surname> <given-names>CY</given-names></string-name>, <string-name><surname>Hu</surname> <given-names>HY</given-names></string-name>, <string-name><surname>Pu</surname> <given-names>CY</given-names></string-name>, <string-name><surname>Huang</surname> <given-names>N</given-names></string-name>, <string-name><surname>Shen</surname> <given-names>HC</given-names></string-name>, <string-name><surname>Li</surname> <given-names>CP</given-names></string-name>, <string-name><surname>Chou</surname> <given-names>YJ</given-names></string-name></person-group> (<year>2011</year>). <article-title>Pulmonary tuberculosis increases the risk of lung cancer: A population&#x00AD;based cohort study</article-title>. <source>Cancer</source> <volume>117</volume>: <fpage>618</fpage>&#x2013;<lpage>624</lpage>. DOI <pub-id pub-id-type="doi">10.1002/cncr.25616</pub-id>.</mixed-citation></ref>
<ref id="ref-43"><label>Yang <italic>et al</italic>. (2020)</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Yang</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Shu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Liang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Meng</surname> <given-names>D</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>Z</given-names></string-name></person-group> (<year>2020</year>). <article-title>Select-ProtoNet: Learning to select for few-shot disease subtype prediction</article-title>. <comment>arXiv preprint arXiv:2009.00792</comment>.</mixed-citation></ref>
<ref id="ref-44"><label>Yoo <italic>et al</italic>. (2021)</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Yoo</surname> <given-names>TK</given-names></string-name>, <string-name><surname>Choi</surname> <given-names>JY</given-names></string-name>, <string-name><surname>Kim</surname> <given-names>HK</given-names></string-name></person-group> (<year>2021</year>). <article-title>Feasibility study to improve deep learning in OCT diagnosis of rare retinal diseases with few-shot classification</article-title>. <source>Medical &#x0026; Biological Engineering &#x0026; Computing</source> <volume>59</volume>: <fpage>401</fpage>&#x2013;<lpage>415</lpage>. DOI <pub-id pub-id-type="doi">10.1007/s11517-021-02321-1</pub-id>.</mixed-citation></ref>
<ref id="ref-45"><label>Yu <italic>et al</italic>. (2011)</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Yu</surname> <given-names>YH</given-names></string-name>, <string-name><surname>Liao</surname> <given-names>CC</given-names></string-name>, <string-name><surname>Hsu</surname> <given-names>WH</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>HJ</given-names></string-name>, <string-name><surname>Liao</surname> <given-names>WC</given-names></string-name>, <string-name><surname>Muo</surname> <given-names>CH</given-names></string-name>, <string-name><surname>Sung</surname> <given-names>FC</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>CY</given-names></string-name></person-group> (<year>2011</year>). <article-title>Increased lung cancer risk among patients with pulmonary tuberculosis: A population cohort study</article-title>. <source>Journal of Thoracic Oncology</source> <volume>6</volume>: <fpage>32</fpage>&#x2013;<lpage>37</lpage>. DOI <pub-id pub-id-type="doi">10.1097/JTO.0b013e3181fb4fcc</pub-id>.</mixed-citation></ref>
<ref id="ref-46"><label>Zapatka <italic>et al</italic>. (2020)</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zapatka</surname> <given-names>M</given-names></string-name>, <string-name><surname>Borozan</surname> <given-names>I</given-names></string-name>, <string-name><surname>Brewer</surname> <given-names>DS</given-names></string-name>, <string-name><surname>Iskar</surname> <given-names>M</given-names></string-name>, <string-name><surname>Grundhoff</surname> <given-names>A</given-names></string-name>, <string-name><surname>Alawi</surname> <given-names>M</given-names></string-name>, <string-name><surname>Desai</surname> <given-names>N</given-names></string-name>, <string-name><surname>S&#x00FC;ltmann</surname> <given-names>H</given-names></string-name>, <string-name><surname>Moch</surname> <given-names>H</given-names></string-name>, <string-name><surname>Cooper</surname> <given-names>CS</given-names></string-name></person-group> (<year>2020</year>). <article-title>The landscape of viral associations in human cancers</article-title>. <source>Nature Genetics</source> <volume>52</volume>: <fpage>320</fpage>&#x2013;<lpage>330</lpage>. DOI <pub-id pub-id-type="doi">10.1038/s41588-019-0558-9</pub-id>.</mixed-citation></ref>
<ref id="ref-47"><label>Zhou <italic>et al</italic>. (2019)</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhou</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Zhou</surname> <given-names>B</given-names></string-name>, <string-name><surname>Pache</surname> <given-names>L</given-names></string-name>, <string-name><surname>Chang</surname> <given-names>M</given-names></string-name>, <string-name><surname>Khodabakhshi</surname> <given-names>AH</given-names></string-name>, <string-name><surname>Tanaseichuk</surname> <given-names>O</given-names></string-name>, <string-name><surname>Benner</surname> <given-names>C</given-names></string-name>, <string-name><surname>Chanda</surname> <given-names>SK</given-names></string-name></person-group> (<year>2019</year>). <article-title>Metascape provides a biologist-oriented resource for the analysis of systems-level datasets</article-title>. <source>Nature Communications</source> <volume>10</volume>: <fpage>1</fpage>&#x2013;<lpage>10</lpage>. DOI <pub-id pub-id-type="doi">10.1038/s41467-019-09234-6</pub-id>.</mixed-citation></ref>
</ref-list>
</back>
</article>