<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">64969</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2025.064969</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>Graph-Embedded Neural Architecture Search: A Variational Approach for Optimized Model Design</article-title>
<alt-title alt-title-type="left-running-head">Graph-Embedded Neural Architecture Search: A Variational Approach for Optimized Model Design</alt-title>
<alt-title alt-title-type="right-running-head">Graph-Embedded Neural Architecture Search: A Variational Approach for Optimized Model Design</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Hemmi</surname><given-names>Kazuki</given-names></name><xref ref-type="aff" rid="aff-1">1</xref><xref ref-type="aff" rid="aff-2">2</xref><email>henmi-kazuki@aist.go.jp</email></contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Tanigaki</surname><given-names>Yuki</given-names></name><xref ref-type="aff" rid="aff-3">3</xref></contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western"><surname>Hara</surname><given-names>Kaisei</given-names></name><xref ref-type="aff" rid="aff-4">4</xref></contrib>
<contrib id="author-4" contrib-type="author">
<name name-style="western"><surname>Onishi</surname><given-names>Masaki</given-names></name><xref ref-type="aff" rid="aff-1">1</xref><xref ref-type="aff" rid="aff-2">2</xref></contrib>
<aff id="aff-1"><label>1</label><institution>Degree Programs in Systems and Information Engineering, University of Tsukuba</institution>, <addr-line>1-1-1 Tennodai, Tsukuba, Ibaraki, 305</addr-line>-<addr-line>8577</addr-line>, <country>Japan</country></aff>
<aff id="aff-2"><label>2</label><institution>Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology (AIST)</institution>, <addr-line>1-1-1 Umezono, Tsukuba, Ibaraki, 305</addr-line>-<addr-line>8568</addr-line>, <country>Japan</country></aff>
<aff id="aff-3"><label>3</label><institution>Department of Electronics and Information Systems Engineering, Osaka Institute of Technology</institution>, <addr-line>5-16-1 Omiya, Asahi-ku, Osaka, 535</addr-line>-<addr-line>8585</addr-line>, <country>Japan</country></aff>
<aff id="aff-4"><label>4</label><institution>Department of Electronics and Information Engineering, Nagaoka University of Technology</institution>, <addr-line>1603-1, Kamitomioka Nagaoka, Niigata, 940</addr-line>-<addr-line>2188</addr-line>, <country>Japan</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Kazuki Hemmi. Email: <email>henmi-kazuki@aist.go.jp</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2025</year>
</pub-date>
<pub-date date-type="pub" publication-format="electronic">
<day>03</day><month>07</month><year>2025</year>
</pub-date>
<volume>84</volume>
<issue>2</issue>
<fpage>2245</fpage>
<lpage>2271</lpage>
<history>
<date date-type="received">
<day>28</day>
<month>2</month>
<year>2025</year>
</date>
<date date-type="accepted">
<day>28</day>
<month>4</month>
<year>2025</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2025 The Authors.</copyright-statement>
<copyright-year>2025</copyright-year>
<copyright-holder>Published by Tech Science Press.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_64969.pdf"></self-uri>
<abstract>
<p>Neural architecture search (NAS) optimizes neural network architectures to align with specific data and objectives, thereby enabling the design of high-performance models without specialized expertise. However, a significant limitation of NAS is that it requires extensive computational resources and time. Consequently, performing a comprehensive architectural search for each new dataset is inefficient. Given the continuous expansion of available datasets, there is an urgent need to predict the optimal architecture for previously unknown datasets. This study proposes a novel framework that generates architectures tailored to unknown datasets by mapping architectures that have demonstrated effectiveness on existing datasets into a latent feature space. As NAS architectures are inherently represented as graph structures, we employed an encoder-decoder transformation model based on variational graph auto-encoders to perform this latent feature mapping. The encoder-decoder transformation model demonstrates strong capability in extracting features from graph structures, making it particularly well-suited for mapping NAS architectures. By training variational graph auto-encoders on existing high-quality architectures, the proposed method constructs a latent space and facilitates the design of optimal architectures for diverse datasets. Furthermore, to effectively define similarity among architectures, we propose constructing the latent space by incorporating both dataset and task features. Experimental results indicate that our approach significantly enhances search efficiency and outperforms conventional methods in terms of model performance.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Neural architecture search</kwd>
<kwd>automated machine learning</kwd>
<kwd>artificial intelligence</kwd>
<kwd>deep learning</kwd>
<kwd>graph neural network</kwd>
</kwd-group>
<funding-group>
<award-group id="awg1">
<funding-source>New Energy and Industrial Technology Development Organization (NEDO)</funding-source>
<award-id>JPNP18002</award-id>
</award-group>
</funding-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>In recent years, neural architecture search (NAS), which automatically explores optimal neural network architectures customized based on the characteristics of datasets and tasks, has played a crucial role in AutoML. Network architectures have conventionally been designed through trial-and-error handcrafting, an approach that necessitates specialized expertise and substantial time to improve performance.</p>
<p>However, NAS faces the challenge of the long computational times required for architecture searches. Conducting a full architectural search each time a new dataset emerges is highly inefficient. To effectively address the increasingly diverse datasets expected in the future, techniques that can directly predict the optimal architecture for unseen datasets are required. If the optimal architecture could be directly predicted, the architecture generation process would be substantially expedited and the search time markedly reduced.</p>
<p>Various generation methods have been proposed, such as iterative random architecture generation or partial modifications of existing architectures. However, these approaches are constrained by limitations such as the inability to consider multiple candidates simultaneously and the difficulty in discovering high-quality solutions owing to their inherently discrete operations. This study focuses on methods that map architectures into a continuous latent feature space to address these issues. Specifically, we attempted to embed existing high-performance architectures in latent feature spaces, and subsequently estimate and generate architectures suitable for new datasets based on their local neighborhood information. We propose a framework called NAVIGATOR (Neural Architecture search using VarIational Graph AuTO-encodeR), which generates architectures appropriate for unseen datasets by mapping architectures that perform well on existing datasets into a latent feature space.</p>
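<p>As a minimal illustration of this neighborhood-based idea, the sketch below proposes a latent code for an unseen dataset by averaging the codes of its nearest known datasets. All dataset names, descriptor vectors, and dimensions are hypothetical toy values, not quantities from this study, and a trained decoder would still be needed to map the proposed point back to a concrete architecture.</p>

```python
import numpy as np

# Hypothetical latent codes of architectures that performed well on known
# datasets (an encoder such as a VGAE would produce these; values are toys).
latent = {
    "cifar10":  np.array([0.9, 0.1, 0.3]),
    "cifar100": np.array([0.8, 0.2, 0.4]),
    "svhn":     np.array([0.1, 0.9, 0.7]),
}

# Toy dataset descriptors (e.g., class count and resolution, rescaled to 2-D).
features = {
    "cifar10":  np.array([1.0, 0.0]),
    "cifar100": np.array([0.9, 0.1]),
    "svhn":     np.array([0.0, 1.0]),
}

def propose_latent(new_feat, features, latent, k=2):
    """Average the latent codes of the k known datasets whose descriptors
    lie closest to the new dataset's descriptor."""
    nearest = sorted(features, key=lambda n: np.linalg.norm(new_feat - features[n]))[:k]
    return np.mean([latent[n] for n in nearest], axis=0), nearest

# A new dataset whose descriptor resembles CIFAR-10/CIFAR-100 more than SVHN.
z, used = propose_latent(np.array([0.98, 0.02]), features, latent)
```

<p>Here the averaged point z would be handed to the decoder; in the proposed framework the latent space itself is learned, rather than fixed by hand as in this toy.</p>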
<p>We employed an encoder-decoder transformation model for latent feature mapping based on variational graph auto-encoders (VGAE) [<xref ref-type="bibr" rid="ref-1">1</xref>]. VGAE excels at extracting features from graph structures using its encoder and decoder, rendering it particularly suitable for mapping NAS architectures. By minimizing the error between the input architecture and its reconstructed output, the latent representations are trained to capture the essential characteristics of each architecture. By training a VGAE on existing high-quality architectures, NAVIGATOR constructs a latent space and designs optimal architectures for diverse datasets. Furthermore, to define similarity among architectures, we propose constructing the latent space by incorporating dataset features and task features. We demonstrated that the proposed method can generate architectures with performance comparable to that of existing NAS methods, yet with significantly less search time. We also investigated architectural performance in the vicinity of the optimal solution. Considering architecture generation through the latent space as a form of information transfer, the distribution of architectures within this space significantly influences the performance of the method. An overview of NAVIGATOR is presented in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>Schematic of the proposed approach</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_64969-fig-1.tif"/>
</fig>
<p>The preliminary version [<xref ref-type="bibr" rid="ref-2">2</xref>] of this study was presented at the 33rd International Conference on Artificial Neural Networks. The main updates since the previous version include additional experiments evaluating the utility of the proposed method and an extension of the framework to incorporate dataset and task features. New preliminary experiments were also conducted to verify each encoder of NAVIGATOR and the components of the base method. Furthermore, new validations were conducted to confirm the effectiveness of the proposed method.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Related Work</title>
<p>This section reviews related research on NAS methods and graph neural networks (GNNs) relevant to this study.</p>
<sec id="s2_1">
<label>2.1</label>
<title>Neural Architecture Search (NAS)</title>
<p>NAS is a subfield of AutoML dedicated to the automated search for optimal neural network architectures in deep learning. NAS approaches primarily optimize two aspects: the connection patterns among the layers and the selection of operations to search for the optimal architecture. Early NAS methods based on evolutionary algorithms and reinforcement learning [<xref ref-type="bibr" rid="ref-3">3</xref>&#x2013;<xref ref-type="bibr" rid="ref-6">6</xref>] were proposed; however, they required extremely high computational cost and time, prompting the need for more efficient approaches. Given the high computational cost of NAS-based architecture searches, efficient architectural representations were sought, leading to the emergence of cell-based NAS methods designed to reduce the search space. Among the cell-based NAS methods, DARTS [<xref ref-type="bibr" rid="ref-7">7</xref>] and PC-DARTS [<xref ref-type="bibr" rid="ref-8">8</xref>], which enable an efficient architecture search, have been established as representative methods. In these methods, an entire network is defined as a collection of compact modules (cells), with each cell represented as a directed acyclic graph (DAG), thereby enabling the construction of large-scale architectures within a confined search space. Recently, numerous high-accuracy architectures have been discovered using cell-based NAS algorithms [<xref ref-type="bibr" rid="ref-9">9</xref>], and the reuse of the high-performance architectures obtained via NAS is anticipated to be an important topic for future research. In addition to cell-based NAS methods, recent advancements have introduced innovative approaches that enhance both search efficiency and architecture performance. For instance, approaches that integrate NAS with self-adaptive techniques and evolutionary algorithms [<xref ref-type="bibr" rid="ref-10">10</xref>] effectively address challenges such as early convergence, bias, and inefficiency in operation selection. 
These methods contribute to a more robust and flexible architecture search. In addition, techniques that progressively activate partial channel connections across multiple stages have been proposed to balance computational costs and model accuracy. Specialized NAS approaches have also been developed for generative adversarial networks (GANs). Transformer-based NAS methods [<xref ref-type="bibr" rid="ref-11">11</xref>,<xref ref-type="bibr" rid="ref-12">12</xref>] have also demonstrated significant advancements in the structural representation refinement, which were difficult to achieve using earlier techniques. Moreover, incorporating large language models into the NAS framework has shown potential in improving architecture prediction and search guidance [<xref ref-type="bibr" rid="ref-13">13</xref>,<xref ref-type="bibr" rid="ref-14">14</xref>]. Collectively, these developments represent the leading edge of NAS research and highlight promising directions for future studies and practical applications.</p>
</sec>
<sec id="s2_2">
<label>2.2</label>
<title>Differentiable Architecture Search (DARTS)</title>
<p>DARTS is a cell-based NAS method that employs weight sharing and gradient descent over a continuously relaxed search space, thereby enabling an efficient search of architectures. Weight sharing is a one-shot technique that selects architectural candidates within part of the supernet (a model that includes all network architecture candidates in the search space), represented as a directed acyclic graph. <xref ref-type="disp-formula" rid="eqn-1">Eq. (1)</xref> defines the feature of a node in DARTS as a sum over its predecessor nodes.</p>
<p><disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>&#x003C;</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:munder><mml:msup><mml:mi>o</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msup><mml:msup><mml:mi>x</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msup></mml:math></disp-formula></p>
<p><xref ref-type="disp-formula" rid="eqn-2">Eq. (2)</xref> defines the mixed operation used during the search.
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:msup><mml:mrow><mml:mover><mml:mi>o</mml:mi><mml:mo accent="false">&#x00AF;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msup><mml:mo stretchy="false">(</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msup><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>o</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mtext mathvariant="bold">O</mml:mtext></mml:mrow></mml:mrow></mml:munder><mml:mfrac><mml:mrow><mml:mi>exp</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msubsup><mml:mi>&#x03B1;</mml:mi><mml:mi>o</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:msup><mml:mi>o</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mtext mathvariant="bold">O</mml:mtext></mml:mrow></mml:mrow></mml:munder><mml:mi>exp</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msubsup><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:msup><mml:mi>o</mml:mi><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:msup></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac><mml:mi>o</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msup><mml:mi mathvariant="bold-italic">x</mml:mi><mml:mrow><mml:mo 
stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msup><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula></p>
<p>Herein, <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:mrow><mml:mi mathvariant="bold">O</mml:mi></mml:mrow></mml:math></inline-formula> is a candidate operation entering an edge, and the architecture weight of a pair of nodes <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> is a parameter <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:msup><mml:mi mathvariant="bold-italic">&#x03B1;</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula> of dimension <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mtext mathvariant="bold">O</mml:mtext></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow></mml:math></inline-formula>. The architecture weights <inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:mi mathvariant="bold-italic">&#x03B1;</mml:mi></mml:math></inline-formula> represent the significance of each candidate operation in determining the network architecture. When the search is completed, the architecture is determined using <xref ref-type="disp-formula" rid="eqn-3">Eq. (3)</xref>:
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:msup><mml:mrow><mml:mi>o</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:munder><mml:mrow><mml:mrow><mml:mtext>argmax</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mi>o</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mtext mathvariant="bold">O</mml:mtext></mml:mrow></mml:mrow></mml:munder><mml:mo>&#x2061;</mml:mo><mml:msubsup><mml:mi>&#x03B1;</mml:mi><mml:mi>o</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup></mml:math></disp-formula></p>
<p>DARTS searches for two types of cells: normal cells and reduction cells. When the search is completed, the final architecture is generated by stacking multiple cells of these two types.</p>
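<p>The equations above can be made concrete in a few lines. The candidate operations and architecture weights below are toy stand-ins, not the actual DARTS operation set; they are intended only to show the softmax mixture of Eq. (2) and the argmax discretization of Eq. (3).</p>

```python
import numpy as np

# Toy candidate operation set O on one edge (i, j); each maps x -> o(x).
ops = {
    "identity": lambda x: x,
    "double":   lambda x: 2.0 * x,
    "zero":     lambda x: 0.0 * x,
}

# Architecture weights alpha_o^{(i,j)}, one per candidate operation.
alpha = np.array([1.0, 2.0, -1.0])

def mixed_op(x, alpha, ops):
    """Eq. (2): softmax-weighted sum of all candidate operations."""
    w = np.exp(alpha) / np.exp(alpha).sum()
    return sum(wi * op(x) for wi, op in zip(w, ops.values()))

def discretize(alpha, ops):
    """Eq. (3): after the search, keep only the argmax operation."""
    return list(ops)[int(np.argmax(alpha))]

x = np.array([1.0, -1.0])
y = mixed_op(x, alpha, ops)     # continuous relaxation used during search
best = discretize(alpha, ops)   # discrete choice used in the final cell
```

<p>During the search, only the continuous weights alpha are updated by gradient descent; the discrete selection happens once, at the end.</p>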
</sec>
<sec id="s2_3">
<label>2.3</label>
<title>Partial Channel Connections for Memory-Efficient Architecture Search (PC-DARTS)</title>
<p>PC-DARTS, an enhanced version of DARTS, is a gradient-based method that can reduce the memory and computational time required to search network architectures. DARTS requires considerable memory because its search space is wide and contains redundancy. In contrast, PC-DARTS combines edge normalization, which selects stable network connections, with a partial channel connection (<xref ref-type="disp-formula" rid="eqn-4">Eq. (4)</xref>), which passes only part of the channels through the search operation for an efficient search.</p>
<p><inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:msup><mml:mrow><mml:mtext mathvariant="bold">S</mml:mtext></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msup></mml:math></inline-formula> is a channel-sampling mask that assigns 1 to selected channels and 0 to masked channels.
<disp-formula id="eqn-4"><label>(4)</label><mml:math id="mml-eqn-4" display="block"><mml:msubsup><mml:mrow><mml:mi>o</mml:mi></mml:mrow><mml:mrow><mml:mi>P</mml:mi><mml:mi>C</mml:mi></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">x</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">i</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>;</mml:mo><mml:msup><mml:mrow><mml:mtext mathvariant="bold">S</mml:mtext></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msup><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>o</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mtext mathvariant="bold">O</mml:mtext></mml:mrow></mml:mrow></mml:munder><mml:mfrac><mml:mrow><mml:mi>exp</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msubsup><mml:mi>&#x03B1;</mml:mi><mml:mi>o</mml:mi><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup><mml:mo stretchy="false">)</mml:mo></mml:mrow><mml:mrow><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:msup><mml:mi>o</mml:mi><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:msup><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mtext mathvariant="bold">O</mml:mtext></mml:mrow></mml:mrow></mml:munder><mml:mi>exp</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msubsup><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:msup><mml:mi>o</mml:mi><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:msup></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo 
stretchy="false">)</mml:mo></mml:mrow></mml:msubsup><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:mfrac><mml:mi>o</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msup><mml:mrow><mml:mtext mathvariant="bold">S</mml:mtext></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msup><mml:msub><mml:mrow><mml:mtext mathvariant="bold">x</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">i</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>+</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:msup><mml:mrow><mml:mtext mathvariant="bold">S</mml:mtext></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msup><mml:mo stretchy="false">)</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">x</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">i</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></disp-formula></p>
<p>PC-DARTS achieves comparable performance with a search cost of less than 0.1 GPU days, compared with the 1.0 GPU days required by DARTS. Moreover, its enhanced stability allows PC-DARTS to search neural architectures directly on large datasets.</p>
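<p>A minimal sketch of the partial channel connection of Eq. (4): only the channels selected by the mask S pass through the softmax mixture of operations, while the remaining channels bypass it unchanged. The operation set and tensor sizes here are illustrative toys, not those of PC-DARTS.</p>

```python
import numpy as np

def pc_mixed_op(x, S, alpha, ops):
    """Eq. (4): apply the softmax mixture only to the masked-in channels S * x;
    the masked-out channels (1 - S) * x bypass the operation unchanged."""
    w = np.exp(alpha) / np.exp(alpha).sum()
    mixed = sum(wi * op(S * x) for wi, op in zip(w, ops.values()))
    return mixed + (1.0 - S) * x

# Toy operation set and equal architecture weights (softmax -> 0.5 each),
# so the mixture multiplies the selected channels by (0.5 * 1 + 0.5 * 2) = 1.5.
ops = {"identity": lambda x: x, "double": lambda x: 2.0 * x}
alpha = np.array([0.0, 0.0])

x = np.array([1.0, 1.0, 1.0, 1.0])
S = np.array([1.0, 1.0, 0.0, 0.0])  # half of the channels are searched

y = pc_mixed_op(x, S, alpha, ops)   # first half mixed, second half bypassed
```

<p>Because only a fraction of the channels enters the mixture, both the memory footprint and the gradient computation shrink accordingly, which is the source of the efficiency gain described above.</p>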
</sec>
<sec id="s2_4">
<label>2.4</label>
<title>Graph Neural Network (GNN)</title>
<p>A GNN [<xref ref-type="bibr" rid="ref-15">15</xref>] is a deep learning model designed for graph-structured data and has been applied to tasks such as graph classification, edge prediction, and node classification. Because GNNs effectively process any data represented as nodes and edges, they have been applied across various fields including molecular structure analysis in chemistry [<xref ref-type="bibr" rid="ref-16">16</xref>,<xref ref-type="bibr" rid="ref-17">17</xref>] and multibody particle simulations in physics [<xref ref-type="bibr" rid="ref-18">18</xref>,<xref ref-type="bibr" rid="ref-19">19</xref>]. In this study, given that NAS architectures are represented in graph form, GNN methods were employed to process the graph-structured data as inputs. Many GNNs employ the convolution operation of the graph convolutional network (GCN). In particular, the GCN demonstrates high performance by extracting local graph information using the adjacency matrix and degree matrix, as shown in <xref ref-type="disp-formula" rid="eqn-5">Eq. (5)</xref>.
<disp-formula id="eqn-5"><label>(5)</label><mml:math id="mml-eqn-5" display="block"><mml:msubsup><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">N</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>l</mml:mi><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup><mml:mo>=</mml:mo><mml:mi>h</mml:mi><mml:mstyle scriptlevel="0"><mml:mrow><mml:mo maxsize="1.623em" minsize="1.623em">(</mml:mo></mml:mrow></mml:mstyle><mml:msup><mml:mrow><mml:mtext mathvariant="bold">D</mml:mtext></mml:mrow><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:mrow><mml:mtext mathvariant="bold">A</mml:mtext></mml:mrow><mml:msup><mml:mrow><mml:mtext mathvariant="bold">D</mml:mtext></mml:mrow><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msup><mml:msubsup><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">N</mml:mtext></mml:mrow></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>l</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msubsup><mml:msup><mml:mrow><mml:mtext mathvariant="bold">W</mml:mtext></mml:mrow><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>l</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:mrow></mml:msup><mml:mstyle scriptlevel="0"><mml:mrow><mml:mo maxsize="1.623em" minsize="1.623em">)</mml:mo></mml:mrow></mml:mstyle></mml:math></disp-formula></p>
<p>Here, <inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">N</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> represents the node features at the input and output of a layer. <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:mrow><mml:mtext mathvariant="bold">A</mml:mtext></mml:mrow></mml:math></inline-formula> is the matrix obtained by adding the identity matrix to the adjacency matrix (which indicates how edges connect in an undirected graph), <inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:mrow><mml:mtext mathvariant="bold">W</mml:mtext></mml:mrow></mml:math></inline-formula> is a learnable weight matrix, <inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:mrow><mml:mtext mathvariant="bold">D</mml:mtext></mml:mrow></mml:math></inline-formula> is the degree matrix representing node relationships, and <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:mi>h</mml:mi></mml:math></inline-formula> is an activation function (e.g., ReLU or Softmax). Additionally, the graph attention network (GAT) [<xref ref-type="bibr" rid="ref-20">20</xref>] has attracted significant attention as a method that enhances representational power by incorporating an attention mechanism into the GCN to consider the importance between nodes. Additional GNN layers have also been introduced, such as GraphSAGE [<xref ref-type="bibr" rid="ref-21">21</xref>], Chebyshev Networks [<xref ref-type="bibr" rid="ref-22">22</xref>], and Graph Isomorphism Network (GIN) [<xref ref-type="bibr" rid="ref-23">23</xref>].</p>
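<p>Eq. (5) can be sketched directly. The graph, node features, and weight matrix below are toy values, and the activation h is taken to be ReLU; the self-loop addition and symmetric normalization follow the definitions given above.</p>

```python
import numpy as np

def gcn_layer(A, F, W):
    """Eq. (5): F' = h(D^{-1/2} (A + I) D^{-1/2} F W), with h = ReLU.
    A is the raw adjacency matrix of an undirected graph; the identity
    matrix (self-loops) is added here, and D is the resulting degree matrix."""
    A_hat = A + np.eye(A.shape[0])
    d_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(0.0, d_inv_sqrt @ A_hat @ d_inv_sqrt @ F @ W)

# Toy 3-node path graph, 2-dimensional node features, identity weights.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
F = np.eye(3, 2)
W = np.eye(2)

out = gcn_layer(A, F, W)  # one propagation step over the normalized graph
```

<p>Stacking such layers lets each node aggregate information from progressively larger neighborhoods, which is what makes the operation effective at extracting local graph structure.</p>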
<p>Variational graph auto-encoders (VGAE) [<xref ref-type="bibr" rid="ref-24">24</xref>] are models that learn latent representations of graphs using an encoder-decoder framework and serve as a core method in this study. VGAE extends the VAE [<xref ref-type="bibr" rid="ref-25">25</xref>], originally developed for images, to the representation of graphs. The details of the encoder and decoder are provided in the following section.</p>
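<p>A minimal sketch of a VGAE forward pass, under simplifying assumptions (a single GCN-style propagation rather than the two-layer encoder of the original paper, and random toy weights): the encoder yields a mean and log-variance per node, a latent sample is drawn via the reparameterization trick, and the inner-product decoder reconstructs edge probabilities.</p>

```python
import numpy as np

rng = np.random.default_rng(0)

def normalize(A):
    """Symmetrically normalized adjacency with self-loops, as in GCN/VGAE."""
    A_hat = A + np.eye(A.shape[0])
    d = 1.0 / np.sqrt(A_hat.sum(axis=1))
    return A_hat * d[:, None] * d[None, :]

def vgae_step(A, F, W_mu, W_logvar):
    """One simplified VGAE forward pass: GCN-style encoder -> (mu, log sigma^2),
    reparameterized sample Z, inner-product decoder sigmoid(Z Z^T)."""
    H = normalize(A) @ F
    mu, logvar = H @ W_mu, H @ W_logvar
    Z = mu + np.exp(0.5 * logvar) * rng.standard_normal(mu.shape)
    A_rec = 1.0 / (1.0 + np.exp(-Z @ Z.T))  # reconstructed edge probabilities
    return Z, A_rec

# Toy 2-node graph with identity features and random 3-D latent weights.
A = np.array([[0., 1.], [1., 0.]])
F = np.eye(2)
W_mu = rng.standard_normal((2, 3))
W_logvar = rng.standard_normal((2, 3))

Z, A_rec = vgae_step(A, F, W_mu, W_logvar)
```

<p>Training minimizes the reconstruction error between A and A_rec plus a KL regularizer on the latent distribution; in this study the resulting latent space is what the architectures are embedded into.</p>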
<p>The flexible structural representation capability of GNNs enables them to effectively encode complex problem structures, making them well-suited for a diverse array of optimization problems. Notable applications include methods applied to combinatorial optimization [<xref ref-type="bibr" rid="ref-26">26</xref>] and reinforcement learning approaches that incorporate a GNN as the agent&#x2019;s state representation [<xref ref-type="bibr" rid="ref-27">27</xref>]. These studies demonstrate the utility of GNNs in solving various optimization tasks and are closely aligned with the objectives and methodology of the proposed research.</p>
</sec>
<sec id="s2_5">
<label>2.5</label>
<title>Related Approaches</title>
<p>Several approaches similar to the proposed method exist in the literature. The proposed method is characterized by the following five features:
<list list-type="simple">
<list-item><label>(a)</label><p>Reducing computational cost and search time</p></list-item>
<list-item><label>(b)</label><p>Building a latent feature space using VGAE</p></list-item>
<list-item><label>(c)</label><p>Handling continuous architectures</p></list-item>
<list-item><label>(d)</label><p>Directly generating architectures</p></list-item>
<list-item><label>(e)</label><p>Enabling information transfer from existing datasets</p></list-item>
</list></p>
<p>Owing to the substantial time and computational resources required for search, conventional NAS methods face significant challenges, prompting numerous studies aimed at substantially reducing the computational cost (a) [<xref ref-type="bibr" rid="ref-28">28</xref>&#x2013;<xref ref-type="bibr" rid="ref-30">30</xref>]. Furthermore, numerous methods have been proposed to map features extracted from NAS to a latent space [<xref ref-type="bibr" rid="ref-31">31</xref>&#x2013;<xref ref-type="bibr" rid="ref-34">34</xref>]. Among these, only a limited number, such as those in [<xref ref-type="bibr" rid="ref-35">35</xref>&#x2013;<xref ref-type="bibr" rid="ref-37">37</xref>], employ VGAE to extract features, which distinguishes our approach in terms of (b). By employing VGAE, detailed features and properties of the architectures can be effectively captured through graph representations. Other related approaches include methods that construct a predictor to estimate performance [<xref ref-type="bibr" rid="ref-38">38</xref>&#x2013;<xref ref-type="bibr" rid="ref-41">41</xref>] (including graph-based approaches [<xref ref-type="bibr" rid="ref-42">42</xref>&#x2013;<xref ref-type="bibr" rid="ref-44">44</xref>]); however, our study adopts an approach (d) that directly generates architectures from the latent feature space. Moreover, by simultaneously learning the embedded representations of the operations, the proposed method can accommodate both discrete and continuous architectures (c). In terms of leveraging promising prior knowledge to enhance performance (e), the concept is akin to that of transfer learning [<xref ref-type="bibr" rid="ref-45">45</xref>&#x2013;<xref ref-type="bibr" rid="ref-47">47</xref>]. While transfer learning methods primarily focus on transferring dataset information, the proposed method specializes in transferring information about the architectures themselves. 
In contrast to many related approaches that primarily aim to improve accuracy, this study focuses on investigating architectural structures across various datasets. Some related approaches possess only a single distinctive feature, whereas others exhibit several. The proposed method, however, possesses all five features, (a)&#x2013;(e), making it significantly different from related research. All five features are critical for architecture generation, and their combination provides multifaceted advantages that are expected to be applicable in a wide range of applications. Furthermore, a notable aspect of this study is that, by using an improved VGAE, the detailed characteristics of the architectures explored through NAS can be effectively captured. Another distinguishing feature of this study is the construction of architectures that achieve high accuracies across multiple datasets.</p>
<p>In contrast to neural architecture optimization [<xref ref-type="bibr" rid="ref-48">48</xref>] and its subsequent Graph VAE-based extensions [<xref ref-type="bibr" rid="ref-49">49</xref>], NAVIGATOR adopts a substantially different design paradigm. These earlier methods primarily focus on optimizing performance within a single dataset, embedding only the architecture into the latent space; as a result, they often fail to account for task or dataset diversity. By contrast, NAVIGATOR is explicitly designed to maximize transferability across various tasks and datasets. Concretely, it achieves this by separately encoding architectural structure and unstructured task information, which are then integrated in the latent space. This design enables NAVIGATOR to directly generate architectures tailored to previously unseen tasks, thereby demonstrating high flexibility and extensibility. Furthermore, NAVIGATOR&#x2019;s configuration, which clearly separates the roles of each module (architecture encoder, dataset encoder, task encoder, and decoder), makes it particularly well-suited for future improvements and domain-specific applications.</p>
</sec>
</sec>
<sec id="s3">
<label>3</label>
<title>Proposed Approach NAVIGATOR</title>
<p>This study proposes NAVIGATOR, a neural architecture search method that leverages the latent features obtained by integrating NAS and VGAE, thereby enabling both the visualization of the discovered architectural characteristics and the generation of new architectures.</p>
<sec id="s3_1">
<label>3.1</label>
<title>Overview of NAVIGATOR</title>
<p><xref ref-type="fig" rid="fig-1">Fig. 1</xref> shows a schematic overview of the proposed method, and the details of the encoder and decoder are shown in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>. NAVIGATOR consists of three components: NAS, VGAE, and the generating model. The process proceeds according to the procedures described in <bold>Algorithm 1</bold> (NAS: Lines 1&#x2013;5; VGAE: Lines 6&#x2013;15; generating model: Lines 16&#x2013;19). Initially, NAS was applied to a given dataset to obtain an optimized architecture. The number of obtained architectures depends on the type of dataset, and multiple searches were performed with different seeds. Subsequently, the encoder and decoder of the VGAE were employed on the optimized architectures to extract latent features. As shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>, once the latent features have been extracted, clustering can be performed using visualization techniques and new architectures can be generated. For visualization, we used PCA [<xref ref-type="bibr" rid="ref-50">50</xref>], a dimensionality reduction algorithm. Although preliminary experiments tested other dimensionality reduction methods, PCA was adopted in this study because its mechanism aligns well with VGAE and it yielded more visually interpretable results.</p>
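As a concrete illustration of the visualization step, the following is a minimal NumPy sketch of projecting latent feature vectors onto their top two principal components via SVD, as done with PCA above; the latent vectors here are random placeholders, not features produced by NAVIGATOR.

```python
import numpy as np

def pca_project(Z, n_components=2):
    """Project latent features Z (n_samples x n_dims) onto their
    top principal components via SVD, for 2D visualization."""
    Zc = Z - Z.mean(axis=0)                  # center each dimension
    U, S, Vt = np.linalg.svd(Zc, full_matrices=False)
    return Zc @ Vt[:n_components].T          # coordinates in PC space

rng = np.random.default_rng(0)
Z = rng.normal(size=(100, 16))               # e.g. 100 latent vectors
coords = pca_project(Z)
print(coords.shape)                          # (100, 2)
```

The principal components are ordered by explained variance, so the first plotted axis carries the largest spread of the latent features.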
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>Details of encoder &#x0026; decoder. GAT: graph attention network; GCN: graph convolution network; edge classifier: two fully connected layers with a ReLU</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_64969-fig-2.tif"/>
</fig>
<fig id="fig-12">
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_64969-fig-12.tif"/>
</fig>
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Encoder</title>
<p>Let <inline-formula id="ieqn-38"><mml:math id="mml-ieqn-38"><mml:mi>&#x03BC;</mml:mi></mml:math></inline-formula> be the mean of the latent variables, and let <inline-formula id="ieqn-39"><mml:math id="mml-ieqn-39"><mml:mi>&#x03C3;</mml:mi></mml:math></inline-formula> be their variance. As shown in <xref ref-type="disp-formula" rid="eqn-6">Eq. (6)</xref>, the encoder computes <inline-formula id="ieqn-40"><mml:math id="mml-ieqn-40"><mml:mi>&#x03BC;</mml:mi></mml:math></inline-formula> and <inline-formula id="ieqn-41"><mml:math id="mml-ieqn-41"><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mi>&#x03C3;</mml:mi></mml:math></inline-formula> from the graph representing the network architecture discovered by NAS (node features <inline-formula id="ieqn-42"><mml:math id="mml-ieqn-42"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">N</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula>, adjacency matrix <inline-formula id="ieqn-43"><mml:math id="mml-ieqn-43"><mml:mrow><mml:mtext mathvariant="bold">A</mml:mtext></mml:mrow></mml:math></inline-formula>, and edge features <inline-formula id="ieqn-44"><mml:math id="mml-ieqn-44"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">E</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula>).
<disp-formula id="eqn-6"><label>(6)</label><mml:math id="mml-eqn-6" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:mi>&#x03BC;</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mtext>GNN</mml:mtext></mml:mrow><mml:mi>&#x03BC;</mml:mi></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">N</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mrow><mml:mtext mathvariant="bold">A</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">E</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd /><mml:mtd><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mi>&#x03C3;</mml:mi></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mrow><mml:mtext>GNN</mml:mtext></mml:mrow><mml:mi>&#x03C3;</mml:mi></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">N</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mrow><mml:mtext mathvariant="bold">A</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">E</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>Here, <inline-formula id="ieqn-45"><mml:math id="mml-ieqn-45"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">N</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> contains the node features, <inline-formula id="ieqn-46"><mml:math id="mml-ieqn-46"><mml:mrow><mml:mtext mathvariant="bold">A</mml:mtext></mml:mrow></mml:math></inline-formula> is the adjacency matrix for the edges, and <inline-formula id="ieqn-47"><mml:math id="mml-ieqn-47"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">E</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> contains the edge features. The node features <inline-formula id="ieqn-48"><mml:math id="mml-ieqn-48"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">N</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> consist of the input, intermediate, and output nodes in the NAS, with these three node types serving as the input features. For example, when combining the two cell types in PC-DARTS, there were four input nodes, eight intermediate nodes (hyperparameters), and two output nodes. In addition, the edge features <inline-formula id="ieqn-49"><mml:math id="mml-ieqn-49"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">E</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> of the graph were provided as learnable input. The edge features <inline-formula id="ieqn-50"><mml:math id="mml-ieqn-50"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">E</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> depend on the NAS search space.
In PC-DARTS, for instance, <inline-formula id="ieqn-51"><mml:math id="mml-ieqn-51"><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mtext mathvariant="bold">O</mml:mtext></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mn>8</mml:mn></mml:math></inline-formula> operation candidates were used (skip connection, none, average pooling 3 <inline-formula id="ieqn-52"><mml:math id="mml-ieqn-52"><mml:mo>&#x00D7;</mml:mo></mml:math></inline-formula> 3, max pooling 3 <inline-formula id="ieqn-53"><mml:math id="mml-ieqn-53"><mml:mo>&#x00D7;</mml:mo></mml:math></inline-formula> 3, separable convolution 3 <inline-formula id="ieqn-54"><mml:math id="mml-ieqn-54"><mml:mo>&#x00D7;</mml:mo></mml:math></inline-formula> 3, separable convolution 5 <inline-formula id="ieqn-55"><mml:math id="mml-ieqn-55"><mml:mo>&#x00D7;</mml:mo></mml:math></inline-formula> 5, dilated convolution 3 <inline-formula id="ieqn-56"><mml:math id="mml-ieqn-56"><mml:mo>&#x00D7;</mml:mo></mml:math></inline-formula> 3, and dilated convolution 5 <inline-formula id="ieqn-57"><mml:math id="mml-ieqn-57"><mml:mo>&#x00D7;</mml:mo></mml:math></inline-formula> 5). Each operator corresponds to an edge feature in <inline-formula id="ieqn-58"><mml:math id="mml-ieqn-58"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">E</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> that passes through an embedding layer. The embedding layer was also updated during training. By embedding edge features <inline-formula id="ieqn-59"><mml:math id="mml-ieqn-59"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">E</mml:mtext></mml:mrow></mml:mrow></mml:msub></mml:math></inline-formula> through this embedding layer, inter-edge relationships can be more effectively reflected in the latent space. 
Consequently, it becomes possible to capture not only the graph structure but also the edge information more comprehensively.</p>
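To make the input representation concrete, the following NumPy sketch builds a toy triple of node features, adjacency matrix, and embedded edge features for a small cell: one-hot node-type features, a binary adjacency matrix, and edge features obtained by looking up operation ids in an embedding table. The graph topology, operation assignments, and dimensions are illustrative; only the operation list follows the PC-DARTS candidates described above.

```python
import numpy as np

# The eight PC-DARTS operation candidates named in the text.
OPS = ["skip_connect", "none", "avg_pool_3x3", "max_pool_3x3",
       "sep_conv_3x3", "sep_conv_5x5", "dil_conv_3x3", "dil_conv_5x5"]

rng = np.random.default_rng(0)
n_nodes, emb_dim = 5, 4
node_type = np.array([0, 1, 1, 1, 2])        # 0: input, 1: intermediate, 2: output
F_N = np.eye(3)[node_type]                   # one-hot node features (5 x 3)

# Toy DAG: adjacency matrix A with one operation id per edge.
edges = [(0, 1), (0, 2), (1, 3), (2, 3), (3, 4)]
edge_op = [4, 0, 5, 2, 6]                    # illustrative operation choices
A = np.zeros((n_nodes, n_nodes))
for (i, j) in edges:
    A[i, j] = 1.0

# Learnable embedding table: each operation id maps to a dense vector,
# so inter-edge relationships can be reflected in the latent space.
op_embedding = rng.normal(size=(len(OPS), emb_dim))
F_E = op_embedding[edge_op]                  # edge features (n_edges x emb_dim)

print(F_N.shape, int(A.sum()), F_E.shape)
```

In training, `op_embedding` would be updated by backpropagation along with the encoder, as stated in the text.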
<p>In the proposed method, the original VGAE is enhanced by adopting two types of layers: one that uses only GCN layers and another that combines GAT and GCN layers, as shown in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>. In the NAS search space considered in this study, each architecture typically includes multiple inputs and intricate skip connections. With its ability to assign varying importance to connections on a per-node basis through an attention mechanism, the GAT dynamically captures structural heterogeneity. This property effectively extracts architectural features that a GCN, constrained by its fixed receptive field, may not fully capture, thereby contributing to enhanced representational power. However, because attention weights must be computed for every edge, the GAT incurs significant computational cost and increased model complexity. To balance this trade-off, we employ a stepwise configuration that integrates GAT and GCN layers: the initial GAT layer learns non-local and non-uniform relationships, while the subsequent GCN layer efficiently aggregates local structural information within the graph. This hybrid design balances representational expressiveness and computational efficiency. Moreover, it enables gradual control over model complexity and facilitates the learning of latent representations that can flexibly accommodate the diverse architectures within the search space.</p>
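The stepwise GAT-then-GCN configuration can be sketched in plain NumPy as follows. This is a single-head, simplified rendering of the two layer types (edge-masked attention followed by symmetrically normalized aggregation), not the authors' implementation; all shapes and weights are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gat_layer(H, A, W, a):
    """Single-head GAT-style layer: attention weights are computed
    only over existing edges, then used to aggregate neighbors."""
    HW = H @ W                                            # transformed features
    d = HW.shape[1]
    # attention logits e_ij ~ a^T [h_i || h_j], masked to edges of A
    logits = (HW @ a[:d])[:, None] + (HW @ a[d:])[None, :]
    logits = np.where(A > 0, logits, -1e9)
    alpha = softmax(logits, axis=1)                       # per-row edge weights
    return np.maximum(alpha @ HW, 0.0)                    # ReLU

def gcn_layer(H, A, W):
    """GCN-style aggregation with self-loops and symmetric normalization."""
    A_hat = A + np.eye(A.shape[0])
    D_inv_sqrt = np.diag(1.0 / np.sqrt(A_hat.sum(axis=1)))
    return np.maximum(D_inv_sqrt @ A_hat @ D_inv_sqrt @ H @ W, 0.0)

rng = np.random.default_rng(0)
n, d_in, d_hid = 5, 3, 8
H = rng.normal(size=(n, d_in))                            # toy node features
A = (rng.random((n, n)) < 0.4).astype(float)              # toy adjacency
H1 = gat_layer(H, A, rng.normal(size=(d_in, d_hid)), rng.normal(size=2 * d_hid))
H2 = gcn_layer(H1, A, rng.normal(size=(d_hid, d_hid)))
print(H2.shape)                                           # (5, 8)
```

The GAT stage assigns non-uniform weights per connection; the GCN stage then mixes features uniformly over the local neighborhood, matching the division of roles described above.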
<p>The encoder in the proposed method captures topological properties through the VGAE framework, which learns the latent space by directly considering connection relationships among nodes. Specifically, the initial GAT layer extracts structural importance through attention mechanisms, while the subsequent GCN layer aggregates these features across nodes. This process enables the emergence of structural characteristics as spatial proximity in the latent space. Moreover, the proposed method integrates edge features into node-to-node relationships through an embedding layer, thereby enabling the comprehensive retention of both local and global structural information in the latent representation. As a result, the latent space significantly reflects the topological properties of architectures, facilitating representation learning that is semantically and structurally coherent.</p>
<p>We assumed that the latent feature <inline-formula id="ieqn-60"><mml:math id="mml-ieqn-60"><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow></mml:math></inline-formula> follows a normal distribution characterized by the computed <inline-formula id="ieqn-61"><mml:math id="mml-ieqn-61"><mml:mi>&#x03BC;</mml:mi></mml:math></inline-formula> and <inline-formula id="ieqn-62"><mml:math id="mml-ieqn-62"><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mrow><mml:mi>&#x03C3;</mml:mi></mml:mrow></mml:math></inline-formula>, and output <inline-formula id="ieqn-63"><mml:math id="mml-ieqn-63"><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow></mml:math></inline-formula> as shown in <xref ref-type="disp-formula" rid="eqn-7">Eq. (7)</xref>.
<disp-formula id="eqn-7"><label>(7)</label><mml:math id="mml-eqn-7" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:mi>q</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msub><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">N</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mrow><mml:mtext mathvariant="bold">A</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">E</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:munderover><mml:mo>&#x220F;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mi>q</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mi>i</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msub><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">N</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mrow><mml:mtext mathvariant="bold">A</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">E</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd /><mml:mtd><mml:mrow><mml:mtext>with</mml:mtext></mml:mrow><mml:mtext>&#x00A0;&#x00A0;</mml:mtext><mml:mi>q</mml:mi><mml:mo 
stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mi>i</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msub><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">N</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mrow><mml:mtext mathvariant="bold">A</mml:mtext></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">E</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mi>N</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi><mml:mi>m</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mi>i</mml:mi></mml:msub><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msub><mml:mi>&#x03BC;</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msubsup><mml:mi>&#x03C3;</mml:mi><mml:mi>i</mml:mi><mml:mn>2</mml:mn></mml:msubsup><mml:mo stretchy="false">)</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>Here, <italic>N</italic> denotes the number of nodes. The encoder derives the parameters of the normal distribution, and the overall distribution is formed by taking the product of the per-node normal distributions.</p>
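Sampling the latent feature from the per-node normal distribution of Eq. (7) is typically done with the reparameterization trick; a minimal sketch, with placeholder dimensions, is:

```python
import numpy as np

def reparameterize(mu, log_sigma, rng):
    """Sample z ~ Norm(mu, sigma^2) as z = mu + sigma * eps,
    keeping the sampling step differentiable with respect to mu, sigma."""
    eps = rng.normal(size=mu.shape)          # eps ~ Norm(0, 1)
    return mu + np.exp(log_sigma) * eps

rng = np.random.default_rng(0)
mu = np.zeros((5, 16))                       # per-node means from GNN_mu
log_sigma = np.full((5, 16), -1.0)           # per-node log sigma from GNN_sigma
z = reparameterize(mu, log_sigma, rng)
print(z.shape)                               # (5, 16)
```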
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>Decoder</title>
<p>In the decoder, the graph structure is reconstructed from the latent feature <inline-formula id="ieqn-64"><mml:math id="mml-ieqn-64"><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow></mml:math></inline-formula> through the inner product in <xref ref-type="disp-formula" rid="eqn-11">Eq. (11)</xref>. Here, <inline-formula id="ieqn-65"><mml:math id="mml-ieqn-65"><mml:mrow><mml:mover><mml:mrow><mml:mtext mathvariant="bold">A</mml:mtext></mml:mrow><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow></mml:math></inline-formula> is the adjacency matrix of the generated graph.</p>
<p><inline-formula id="ieqn-66"><mml:math id="mml-ieqn-66"><mml:mrow><mml:mover><mml:msub><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mi>E</mml:mi></mml:msub><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow></mml:math></inline-formula> represents the generated edge features, predicted from the latent feature <inline-formula id="ieqn-67"><mml:math id="mml-ieqn-67"><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow></mml:math></inline-formula> using an edge classifier. The decoder comprises an inner product and an edge classifier. By minimizing the loss between the reconstructed and original graphs, the model is optimized to accurately extract latent features from the target graph.</p>
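A simplified sketch of the decoder: the inner-product term of Eq. (11) yields edge probabilities, and a two-layer ReLU classifier (as in Fig. 2) predicts an operation type for each node pair. All weight shapes and the random inputs below are illustrative.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def decode(z, W1, b1, W2, b2):
    """Inner-product reconstruction of the adjacency matrix plus a
    two-layer edge classifier over concatenated node-pair features."""
    A_hat = sigmoid(z @ z.T)                        # edge probabilities
    n, d = z.shape
    # pairwise features [z_i || z_j] for every (i, j) pair
    pair = np.concatenate(
        [np.repeat(z, n, axis=0), np.tile(z, (n, 1))], axis=1)
    h = np.maximum(pair @ W1 + b1, 0.0)             # ReLU hidden layer
    logits = h @ W2 + b2                            # per-pair operation logits
    F_E_hat = logits.argmax(axis=1).reshape(n, n)   # predicted op per edge
    return A_hat, F_E_hat

rng = np.random.default_rng(0)
n, d, n_ops = 5, 16, 8
z = rng.normal(size=(n, d))
A_hat, F_E_hat = decode(z,
                        rng.normal(size=(2 * d, 32)), np.zeros(32),
                        rng.normal(size=(32, n_ops)), np.zeros(n_ops))
print(A_hat.shape, F_E_hat.shape)
```

In training, the argmax would be replaced by a cross-entropy loss over the logits, as described in Section 3.4.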
</sec>
<sec id="s3_4">
<label>3.4</label>
<title>Loss Function</title>
<p>The loss function of the proposed method is the sum of the reconstruction loss (Recon Loss) from the VGAE, the edge class loss, and the Kullback&#x2013;Leibler divergence loss (KL Loss).</p>
<p>Specifically, it is given by <xref ref-type="disp-formula" rid="eqn-8">Eq. (8)</xref> below. <inline-formula id="ieqn-68"><mml:math id="mml-ieqn-68"><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mi>E</mml:mi><mml:mi>d</mml:mi><mml:mi>g</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is a hyperparameter that adjusts the importance of the edge class loss, and <inline-formula id="ieqn-69"><mml:math id="mml-ieqn-69"><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mi>K</mml:mi><mml:mi>L</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is a hyperparameter that adjusts the importance of the KL Loss.
<disp-formula id="eqn-8"><label>(8)</label><mml:math id="mml-eqn-8" display="block"><mml:mi>L</mml:mi><mml:mi>o</mml:mi><mml:mi>s</mml:mi><mml:mi>s</mml:mi><mml:mo>=</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mi>E</mml:mi><mml:mi>d</mml:mi><mml:mi>g</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2217;</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>E</mml:mi><mml:mi>d</mml:mi><mml:mi>g</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mi>K</mml:mi><mml:mi>L</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2217;</mml:mo><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>K</mml:mi><mml:mi>L</mml:mi></mml:mrow></mml:msub></mml:math></disp-formula></p>
<p>Here, <inline-formula id="ieqn-70"><mml:math id="mml-ieqn-70"><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the reconstruction loss of the VGAE, given by <xref ref-type="disp-formula" rid="eqn-9">Eq. (9)</xref>, which uses positive and negative losses. <inline-formula id="ieqn-71"><mml:math id="mml-ieqn-71"><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> brings the reconstructed structure closer to the original graph. <inline-formula id="ieqn-72"><mml:math id="mml-ieqn-72"><mml:msub><mml:mi>W</mml:mi><mml:mi>P</mml:mi></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-73"><mml:math id="mml-ieqn-73"><mml:msub><mml:mi>W</mml:mi><mml:mi>N</mml:mi></mml:msub></mml:math></inline-formula> are hyperparameters that adjust the importance of the positive and negative losses, respectively.
<disp-formula id="eqn-9"><label>(9)</label><mml:math id="mml-eqn-9" display="block"><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>R</mml:mi><mml:mi>e</mml:mi><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mtext mathvariant="bold">A</mml:mtext></mml:mrow></mml:mrow></mml:munder><mml:msub><mml:mi>W</mml:mi><mml:mi>P</mml:mi></mml:msub><mml:mo>&#x2217;</mml:mo><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mi>A</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mi>W</mml:mi><mml:mi>N</mml:mi></mml:msub><mml:mo>&#x2217;</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mi>A</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula></p>
<p>Here, <inline-formula id="ieqn-74"><mml:math id="mml-ieqn-74"><mml:msub><mml:mi>A</mml:mi><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the element of the adjacency matrix of the true graph, and <inline-formula id="ieqn-75"><mml:math id="mml-ieqn-75"><mml:msub><mml:mrow><mml:mover><mml:mi>A</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> is the corresponding element of the adjacency matrix of the generated graph. <inline-formula id="ieqn-76"><mml:math id="mml-ieqn-76"><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>E</mml:mi><mml:mi>d</mml:mi><mml:mi>g</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> represents the edge class loss, which is calculated using the cross-entropy loss to predict the type of each edge. <inline-formula id="ieqn-77"><mml:math id="mml-ieqn-77"><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>E</mml:mi><mml:mi>d</mml:mi><mml:mi>g</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> aligns the generated architectures with the true structure based on the edge types. Although other losses could also be considered, <inline-formula id="ieqn-78"><mml:math id="mml-ieqn-78"><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>E</mml:mi><mml:mi>d</mml:mi><mml:mi>g</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, which is not present in the original VGAE, was newly added in our proposed method to incorporate fine-grained features from the NAS. <inline-formula id="ieqn-79"><mml:math id="mml-ieqn-79"><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>K</mml:mi><mml:mi>L</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> represents the KL divergence between the normal distribution of the latent variables and the prior distribution and is given by <xref ref-type="disp-formula" rid="eqn-10">Eq. (10)</xref>. KL divergence is a metric that measures the difference between two probability distributions. <inline-formula id="ieqn-80"><mml:math id="mml-ieqn-80"><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>K</mml:mi><mml:mi>L</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> helps ensure the continuity of the latent space and stabilizes the training. Here, <italic>N</italic> denotes the dimension of the latent variables (computed using <xref ref-type="disp-formula" rid="eqn-6">Eq. (6)</xref>).
<disp-formula id="eqn-10"><label>(10)</label><mml:math id="mml-eqn-10" display="block"><mml:msub><mml:mi>L</mml:mi><mml:mrow><mml:mi>K</mml:mi><mml:mi>L</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mfrac><mml:mn>1</mml:mn><mml:mn>2</mml:mn></mml:mfrac><mml:munderover><mml:mo>&#x2211;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:munderover><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>+</mml:mo><mml:mi>log</mml:mi><mml:mo>&#x2061;</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msubsup><mml:mi>&#x03C3;</mml:mi><mml:mi>i</mml:mi><mml:mn>2</mml:mn></mml:msubsup><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mi>&#x03BC;</mml:mi><mml:mi>i</mml:mi><mml:mn>2</mml:mn></mml:msubsup><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mi>&#x03C3;</mml:mi><mml:mi>i</mml:mi><mml:mn>2</mml:mn></mml:msubsup><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula></p>
<p>The sum of these three loss terms forms the objective function of our method.
<disp-formula id="eqn-11"><label>(11)</label><mml:math id="mml-eqn-11" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mover><mml:mrow><mml:mtext mathvariant="bold">A</mml:mtext></mml:mrow><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">E</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:munderover><mml:mo>&#x220F;</mml:mo><mml:mrow><mml:mi>i</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:munderover><mml:mo>&#x220F;</mml:mo><mml:mrow><mml:mi>j</mml:mi><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:mrow><mml:mi>N</mml:mi></mml:munderover><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mi>A</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mi>F</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>E</mml:mi><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msub><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext 
mathvariant="bold">z</mml:mtext></mml:mrow><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mi>p</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mover><mml:mi>A</mml:mi><mml:mo stretchy="false">&#x005E;</mml:mo></mml:mover></mml:mrow><mml:mrow><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:msub><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mi>i</mml:mi></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>=</mml:mo><mml:mi>S</mml:mi><mml:mi>i</mml:mi><mml:mi>g</mml:mi><mml:mi>m</mml:mi><mml:mi>o</mml:mi><mml:mi>i</mml:mi><mml:mi>d</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msubsup><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mi>i</mml:mi><mml:mi>T</mml:mi></mml:msubsup><mml:mo>,</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mi>j</mml:mi></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
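The full objective of Eq. (8), combining the reconstruction term of Eq. (9) (written here as a minimized, i.e. negated, log-likelihood) with the edge class loss and the KL term of Eq. (10), can be sketched as follows. The weight values are placeholders, not the paper's settings.

```python
import numpy as np

def total_loss(A, A_hat, edge_logits, edge_labels, mu, log_sigma,
               W_P=1.0, W_N=1.0, W_Edge=1.0, W_KL=0.1):
    """Loss = L_Recon + W_Edge * L_Edge + W_KL * L_KL (Eq. (8))."""
    eps = 1e-9
    # Eq. (9): weighted positive/negative terms, negated to be minimized
    recon = -(W_P * A * np.log(A_hat + eps)
              + W_N * (1 - A) * np.log(1 - A_hat + eps)).sum()
    # Edge class loss: cross-entropy over operation-type logits
    p = np.exp(edge_logits - edge_logits.max(axis=1, keepdims=True))
    p = p / p.sum(axis=1, keepdims=True)
    edge = -np.log(p[np.arange(len(edge_labels)), edge_labels] + eps).mean()
    # Eq. (10): KL divergence to the standard-normal prior
    sigma2 = np.exp(2 * log_sigma)
    kl = -0.5 * (1 + np.log(sigma2) - mu**2 - sigma2).sum()
    return recon + W_Edge * edge + W_KL * kl

rng = np.random.default_rng(0)
A = (rng.random((5, 5)) < 0.3).astype(float)          # toy true adjacency
A_hat = np.clip(rng.random((5, 5)), 0.05, 0.95)       # toy reconstruction
loss = total_loss(A, A_hat,
                  edge_logits=rng.normal(size=(6, 8)),
                  edge_labels=rng.integers(0, 8, size=6),
                  mu=rng.normal(size=(5, 16)) * 0.1,
                  log_sigma=np.full((5, 16), -1.0))
print(loss)
```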
</sec>
<sec id="s3_5">
<label>3.5</label>
<title>Un-Architecture Information Encoder</title>
<p>The proposed method employs an encoder&#x2013;decoder model to extract architectural features. Instead of directly capturing dataset features, the method incorporates them indirectly: NAS searches for the optimal architecture for a given dataset or task, and the discovered architecture is then fed into the model. Nevertheless, a direct feature extractor is desirable for incorporating dataset features into the latent space. As described in the subsequent preliminary experiments, the optimal architecture varied depending on the dataset; factors such as the existence of similar labels and the number of channels in the dataset were found to be significant. To extract dataset features, we used existing pre-trained deep learning models as feature-extractor encoders. The dataset feature representation was then obtained using either the dataset-wide average or a weighted sum. In this study, we adopted CLIP [<xref ref-type="bibr" rid="ref-51">51</xref>], a pre-trained model that offers high performance as a feature extractor. By extracting the dataset features, we anticipate capturing variations influenced by factors such as whether the target images are in color or grayscale.</p>
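The aggregation of per-image embeddings into a single dataset-level feature, by a dataset-wide average or a weighted sum as described above, can be sketched as follows; the embeddings here are random stand-ins for the outputs of a pre-trained model such as CLIP, and the weighting scheme is illustrative.

```python
import numpy as np

def dataset_feature(image_features, weights=None):
    """Aggregate per-image embeddings (n_images x dim) into one
    dataset-level vector via a plain mean or a normalized weighted sum."""
    if weights is None:
        return image_features.mean(axis=0)
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()                               # normalize the weights
    return (w[:, None] * image_features).sum(axis=0)

rng = np.random.default_rng(0)
feats = rng.normal(size=(1000, 512))              # e.g. 512-d embedding per image
f_mean = dataset_feature(feats)                   # dataset-wide average
f_wsum = dataset_feature(feats, weights=rng.random(1000))  # weighted sum
print(f_mean.shape, f_wsum.shape)                 # (512,) (512,)
```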
<p>We also considered the extraction of task features in addition to dataset features. For example, even when the same dataset of human images is used, the required features differ depending on whether the task is to classify gender or emotions (that is, the specific regions of interest in the image change). We hypothesize that this also leads to changes in the optimal architecture. Hence, we examined the optimal architecture for each task in a previous study [<xref ref-type="bibr" rid="ref-52">52</xref>]. The results show that the classification target and primary focus points influence the optimal architecture, indicating the importance of these factors as task features. Accordingly, this study adopted these approaches for the task features. We used the diagonal elements of the Fisher information matrix, computed from gradients on the image recognition task, as task features. The architecture used during training was ResNet18 [<xref ref-type="bibr" rid="ref-53">53</xref>], which is lightweight and has a low computation time. The Fisher information matrix theoretically indicates the sensitivity of the parameters with respect to each task, thereby effectively representing inter-task differences. This property guides the latent space toward meaningful directions that reflect the structural and functional differences of individual tasks, thereby improving the efficiency of neural architecture search. In particular, by explicitly incorporating task-specific information into the latent space, NAVIGATOR can preferentially acquire regions of the latent representation that are more appropriate for individual tasks. Consequently, variations in task features lead to corresponding shifts in the latent space, enabling the rapid identification of architectures optimized for the target task. These choices are based on experimental results reported in the existing literature. Some studies, such as Task2Vec [<xref ref-type="bibr" rid="ref-54">54</xref>] and Editing Models [<xref ref-type="bibr" rid="ref-55">55</xref>], embed tasks as vector representations by handling the weights obtained after fine-tuning. We treat these representations as reference task features that NAVIGATOR can process.</p>
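The diagonal-Fisher statistic can be illustrated with a toy model. The paper computes this diagonal over ResNet18 parameters; the sketch below uses a plain softmax classifier instead, and the function name `fisher_diagonal` is our assumption. The estimate is the per-parameter average of the squared gradient of the log-likelihood.

```python
import numpy as np

def fisher_diagonal(X, y, W):
    """Empirical diagonal of the Fisher information for a softmax classifier.

    Approximates F_kk = E[(d log p(y|x; W) / dW_k)^2] over the samples.
    The paper applies the same statistic to ResNet18 parameters.
    """
    diag = np.zeros_like(W)
    for x, label in zip(X, y):
        logits = x @ W                      # class scores, shape (C,)
        p = np.exp(logits - logits.max())
        p /= p.sum()                        # softmax probabilities
        err = -p
        err[label] += 1.0                   # gradient of log p w.r.t. logits
        grad = np.outer(x, err)             # gradient w.r.t. W, shape (d, C)
        diag += grad ** 2                   # accumulate squared gradients
    return diag / len(X)

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 3))
y = rng.integers(0, 2, size=16)
task_feat = fisher_diagonal(X, y, np.zeros((3, 2))).ravel()   # flattened task feature
```

The flattened diagonal serves as a fixed-length task-feature vector, comparable across tasks that share the same reference architecture.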
<p>The extraction of such unarchitectured information (dataset features and task features) contributes both to the transformation of the latent space and to the optimization and generation of architectures within it. In other words, we embed the existing architectures, datasets, and tasks in the latent space. When a new dataset or task is introduced, its features can be extracted by the encoder, and the distances between these features can be measured to facilitate optimization. In our experiments, we investigated this approach within the domain of previously used similar datasets and examined whether incorporating unarchitectured information into the latent space, and into the architectures generated from it, enhances performance.</p>
</sec>
<sec id="s3_6">
<label>3.6</label>
<title>Generating Model</title>
<p>To generate a new network architecture, the latent features are manipulated according to the following steps:</p>
<p>1. The user identifies a promising latent feature <inline-formula id="ieqn-81"><mml:math id="mml-ieqn-81"><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow></mml:math></inline-formula> for the target task, based on dataset similarity or other criteria.</p>
<p>2. Considering the other latent features, the identified latent feature <inline-formula id="ieqn-82"><mml:math id="mml-ieqn-82"><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow></mml:math></inline-formula> is normalized to the range [0, 1] on a per-dimension basis.</p>
<p>3. The generated direction vector is scaled and added to the corresponding normalized latent feature to produce a new latent feature near the optimal point.</p>
<p>4. The new latent feature generated in Step 3 is denormalized using the inverse of Step 2 and fed into the trained decoder of the VGAE.</p>
<p>5. From the new latent feature <inline-formula id="ieqn-83"><mml:math id="mml-ieqn-83"><mml:msup><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, the network architecture <inline-formula id="ieqn-84"><mml:math id="mml-ieqn-84"><mml:msup><mml:mrow><mml:mtext mathvariant="bold">A</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is generated.</p>
<p>One possible way to select promising latent features is to infer them from dataset features. Assuming that the best latent features can be estimated, the default approach in this study was to manually specify the architecture that achieves the best performance on existing datasets. In Steps 2&#x2013;4, the new latent feature <inline-formula id="ieqn-85"><mml:math id="mml-ieqn-85"><mml:msup><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is generated by multiplying a scaling factor as shown in <xref ref-type="disp-formula" rid="eqn-12">Eq. (12)</xref>. <inline-formula id="ieqn-86"><mml:math id="mml-ieqn-86"><mml:mtext>sgn</mml:mtext></mml:math></inline-formula> is a sign function that returns the sign of the input value; hence, a direction vector with values of either <inline-formula id="ieqn-87"><mml:math id="mml-ieqn-87"><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula> or <inline-formula id="ieqn-88"><mml:math id="mml-ieqn-88"><mml:mn>1</mml:mn></mml:math></inline-formula> per dimension is generated from the uniform distribution <inline-formula id="ieqn-89"><mml:math id="mml-ieqn-89"><mml:mrow><mml:mi>&#x1D4B0;</mml:mi></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mn>0.5</mml:mn><mml:mo>,</mml:mo><mml:mn>0.5</mml:mn><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>.
<disp-formula id="eqn-12"><label>(12)</label><mml:math id="mml-eqn-12" display="block"><mml:mtable columnalign="right left right left right left right left right left right left" rowspacing="3pt" columnspacing="0em 2em 0em 2em 0em 2em 0em 2em 0em 2em 0em" displaystyle="true"><mml:mtr><mml:mtd /><mml:mtd><mml:msub><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>normalize</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:mrow><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mrow><mml:mo movablelimits="true" form="prefix">min</mml:mo></mml:mrow></mml:msub></mml:mrow><mml:mrow><mml:msub><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mrow><mml:mo movablelimits="true" form="prefix">max</mml:mo></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mrow><mml:mo movablelimits="true" form="prefix">min</mml:mo></mml:mrow></mml:msub></mml:mrow></mml:mfrac><mml:mo>,</mml:mo><mml:mspace width="1em" /><mml:mrow><mml:mtext mathvariant="bold">d</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mtext>Sgn</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>&#x1D4B0;</mml:mi></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mo>&#x2212;</mml:mo><mml:mn>0.5</mml:mn><mml:mo>,</mml:mo><mml:mn>0.5</mml:mn><mml:mo stretchy="false">)</mml:mo><mml:mo>)</mml:mo></mml:mrow></mml:mtd></mml:mtr><mml:mtr><mml:mtd /><mml:mtd><mml:msup><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mtext 
mathvariant="bold">z</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>normalize</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mi>&#x03B3;</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mrow><mml:mtext mathvariant="bold">d</mml:mtext></mml:mrow><mml:mo stretchy="false">)</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mrow><mml:mo movablelimits="true" form="prefix">max</mml:mo></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mrow><mml:mo movablelimits="true" form="prefix">min</mml:mo></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mrow><mml:mo movablelimits="true" form="prefix">min</mml:mo></mml:mrow></mml:msub></mml:mtd></mml:mtr></mml:mtable></mml:math></disp-formula></p>
<p>In Step 3, we applied a small perturbation using the uniform random variable and <inline-formula id="ieqn-90"><mml:math id="mml-ieqn-90"><mml:mi>&#x03B3;</mml:mi></mml:math></inline-formula> to obtain a new latent feature near the original one. Because this study repeats multiple trials, the perturbation is kept small so that favorable latent features can be found in the neighborhood of the known good point. <inline-formula id="ieqn-91"><mml:math id="mml-ieqn-91"><mml:mi>&#x03B3;</mml:mi></mml:math></inline-formula> is a hyperparameter. Because the direction vector has a fixed length in the normalized latent feature space, the absolute step size depends only on the value of <inline-formula id="ieqn-92"><mml:math id="mml-ieqn-92"><mml:mi>&#x03B3;</mml:mi></mml:math></inline-formula>. When <inline-formula id="ieqn-93"><mml:math id="mml-ieqn-93"><mml:mi>&#x03B3;</mml:mi></mml:math></inline-formula> is sufficiently small, the newly generated network architecture is expected to share similar properties with the architecture corresponding to the optimal point.</p>
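Steps 2 to 4 and Eq. (12) can be sketched as follows. This is a minimal sketch under stated assumptions: the per-dimension min/max are taken over the stored latent features, and the function name `perturb_latent` is ours, not the paper's.

```python
import numpy as np

def perturb_latent(z, z_all, gamma=0.05, rng=None):
    """Generate a new latent feature near z following Eq. (12).

    z_all: matrix of existing latent features; its per-dimension min/max
           define the normalization.  gamma scales the perturbation.
    """
    rng = np.random.default_rng(rng)
    z_min, z_max = z_all.min(axis=0), z_all.max(axis=0)
    z_norm = (z - z_min) / (z_max - z_min)             # Step 2: normalize to [0, 1]
    d = np.sign(rng.uniform(-0.5, 0.5, size=z.shape))  # direction vector in {-1, +1}
    # Step 3 then Step 4: perturb in normalized space, then denormalize
    return (z_norm + gamma * d) * (z_max - z_min) + z_min

z_all = np.array([[0.0, 0.0], [1.0, 2.0], [2.0, 4.0]])
z_new = perturb_latent(z_all[1], z_all, gamma=0.1, rng=0)
```

Feeding `z_new` to the trained VGAE decoder (Step 5) then yields the new architecture.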
<p>The above describes the method for generating a new latent feature <inline-formula id="ieqn-94"><mml:math id="mml-ieqn-94"><mml:msup><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> when the Un-architecture Information Encoder is not used as part of the generation model. We also propose a method that utilizes the unarchitectured information extracted from each dataset and task to determine the direction and magnitude of generation based on these features.</p>
<p>First, for each dataset, we obtained a high-dimensional dataset feature vector <inline-formula id="ieqn-95"><mml:math id="mml-ieqn-95"><mml:msup><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msup><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mi>d</mml:mi></mml:msub></mml:mrow></mml:msup></mml:math></inline-formula> using a pre-trained feature extractor. Similarly, for task-specific information, we extracted a feature vector <inline-formula id="ieqn-96"><mml:math id="mml-ieqn-96"><mml:msup><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msup><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow><mml:mrow><mml:msub><mml:mi>d</mml:mi><mml:mi>t</mml:mi></mml:msub></mml:mrow></mml:msup></mml:math></inline-formula> using a task encoder. We standardized these feature vectors to the same dimension <inline-formula id="ieqn-97"><mml:math id="mml-ieqn-97"><mml:mi>d</mml:mi></mml:math></inline-formula>, such that <inline-formula id="ieqn-98"><mml:math id="mml-ieqn-98"><mml:msup><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo><mml:msup><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msup><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>. 
Next, if both types of unarchitectured information were used, we define a combined feature vector <inline-formula id="ieqn-99"><mml:math id="mml-ieqn-99"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mtext>combined</mml:mtext></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> by a weighted average as follows:
<disp-formula id="eqn-13"><label>(13)</label><mml:math id="mml-eqn-13" display="block"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>combined</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>&#x03B1;</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:msup><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mn>1</mml:mn><mml:mo>&#x2212;</mml:mo><mml:mi>&#x03B1;</mml:mi><mml:mo stretchy="false">)</mml:mo><mml:mo>&#x22C5;</mml:mo><mml:msup><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msup><mml:mo>,</mml:mo></mml:math></disp-formula></p>
<p>Here, <inline-formula id="ieqn-100"><mml:math id="mml-ieqn-100"><mml:mi>&#x03B1;</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mo stretchy="false">[</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">]</mml:mo></mml:math></inline-formula> is a hyperparameter that adjusts the contribution of the dataset and task features. If only one type of unarchitectured information is available, we set <inline-formula id="ieqn-101"><mml:math id="mml-ieqn-101"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mtext>combined</mml:mtext></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> for the dataset features alone or <inline-formula id="ieqn-102"><mml:math id="mml-ieqn-102"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mtext>combined</mml:mtext></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mi>t</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> for task features alone.</p>
<p>Next, we computed the difference between the latent feature <inline-formula id="ieqn-103"><mml:math id="mml-ieqn-103"><mml:msup><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mo>&#x2217;</mml:mo></mml:msup><mml:mo>&#x2208;</mml:mo><mml:msup><mml:mrow><mml:mi mathvariant="double-struck">R</mml:mi></mml:mrow><mml:mrow><mml:mi>d</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula>, which corresponds to an existing optimal architecture (the &#x201C;good point&#x201D; obtained via NAS and VGAE), and the combined feature <inline-formula id="ieqn-104"><mml:math id="mml-ieqn-104"><mml:msub><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mtext>combined</mml:mtext></mml:mrow></mml:msub></mml:math></inline-formula>. A new candidate latent feature is determined at the intersection of two hyperspheres whose centers are the latent features <inline-formula id="ieqn-105"><mml:math id="mml-ieqn-105"><mml:msubsup><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x2217;</mml:mo></mml:msubsup></mml:math></inline-formula> and <inline-formula id="ieqn-106"><mml:math id="mml-ieqn-106"><mml:msubsup><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mn>2</mml:mn><mml:mo>&#x2217;</mml:mo></mml:msubsup></mml:math></inline-formula> of existing optimal architectures and whose radii are the corresponding distances <inline-formula id="ieqn-107"><mml:math id="mml-ieqn-107"><mml:msub><mml:mi>d</mml:mi><mml:mn>1</mml:mn></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-108"><mml:math id="mml-ieqn-108"><mml:msub><mml:mi>d</mml:mi><mml:mn>2</mml:mn></mml:msub></mml:math></inline-formula> from the combined feature. The intersection of these two hyperspheres yields two new candidate latent features. 
In this study, the following offset vector <inline-formula id="ieqn-109"><mml:math id="mml-ieqn-109"><mml:mi mathvariant="normal">&#x0394;</mml:mi><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow></mml:math></inline-formula> was defined to select a solution consistent with the existing dataset distribution:
<disp-formula id="eqn-14"><label>(14)</label><mml:math id="mml-eqn-14" display="block"><mml:mi mathvariant="normal">&#x0394;</mml:mi><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mn>1</mml:mn></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>combined</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mn>1</mml:mn><mml:mo>&#x2217;</mml:mo></mml:msubsup><mml:mo stretchy="false">)</mml:mo><mml:mo>+</mml:mo><mml:msub><mml:mi>w</mml:mi><mml:mn>2</mml:mn></mml:msub><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mrow><mml:mtext mathvariant="bold">F</mml:mtext></mml:mrow><mml:mrow><mml:mrow><mml:mtext>combined</mml:mtext></mml:mrow></mml:mrow></mml:msub><mml:mo>&#x2212;</mml:mo><mml:msubsup><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mn>2</mml:mn><mml:mo>&#x2217;</mml:mo></mml:msubsup><mml:mo stretchy="false">)</mml:mo><mml:mo>,</mml:mo></mml:math></disp-formula>where the weights <inline-formula id="ieqn-110"><mml:math id="mml-ieqn-110"><mml:msub><mml:mi>w</mml:mi><mml:mi>i</mml:mi></mml:msub></mml:math></inline-formula> are defined based on the distance from each dataset, as follows:
<disp-formula id="eqn-15"><label>(15)</label><mml:math id="mml-eqn-15" display="block"><mml:msub><mml:mi>w</mml:mi><mml:mi>i</mml:mi></mml:msub><mml:mo>=</mml:mo><mml:mfrac><mml:msub><mml:mi>d</mml:mi><mml:mi>j</mml:mi></mml:msub><mml:mrow><mml:munder><mml:mo>&#x2211;</mml:mo><mml:mi>k</mml:mi></mml:munder><mml:msub><mml:mi>d</mml:mi><mml:mi>k</mml:mi></mml:msub></mml:mrow></mml:mfrac><mml:mo>,</mml:mo><mml:mspace width="1em" /><mml:mo stretchy="false">(</mml:mo><mml:mi>i</mml:mi><mml:mo>,</mml:mo><mml:mi>j</mml:mi><mml:mo>&#x2208;</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:mn>1</mml:mn><mml:mo>,</mml:mo><mml:mn>2</mml:mn><mml:mo fence="false" stretchy="false">}</mml:mo><mml:mo>,</mml:mo><mml:mi>i</mml:mi><mml:mo>&#x2260;</mml:mo><mml:mi>j</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></disp-formula></p>
<p>The weighting selects natural interpolation points by strongly reflecting the features of the closer datasets.</p>
<p>This difference measures the extent to which a new dataset or task diverges from an existing optimal architecture. We employed the Euclidean distance as the metric for its intuitiveness; however, alternative distance measures, such as the Mahalanobis distance, can also be applied if necessary.</p>
<p>The new latent feature <inline-formula id="ieqn-111"><mml:math id="mml-ieqn-111"><mml:msup><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> is generated by adding the offset <inline-formula id="ieqn-112"><mml:math id="mml-ieqn-112"><mml:mi mathvariant="normal">&#x0394;</mml:mi><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow></mml:math></inline-formula> to the existing optimal latent representation <inline-formula id="ieqn-113"><mml:math id="mml-ieqn-113"><mml:msup><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mo>&#x2217;</mml:mo></mml:msup></mml:math></inline-formula>.
<disp-formula id="eqn-16"><label>(16)</label><mml:math id="mml-eqn-16" display="block"><mml:msup><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup><mml:mo>=</mml:mo><mml:msup><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mo>&#x2217;</mml:mo></mml:msup><mml:mo>+</mml:mo><mml:mi>&#x03B3;</mml:mi><mml:mo>&#x22C5;</mml:mo><mml:mi mathvariant="normal">&#x0394;</mml:mi><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mo>,</mml:mo></mml:math></disp-formula></p>
<p>where <inline-formula id="ieqn-114"><mml:math id="mml-ieqn-114"><mml:mi>&#x03B3;</mml:mi></mml:math></inline-formula> is a scaling hyperparameter that controls the magnitude of the perturbation.</p>
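Eqs. (13) to (16) can be combined into one short sketch. Note two assumptions of ours beyond the text: the feature vectors are taken to be already reduced to the latent dimension, and the base point for Eq. (16) is chosen as the first optimal latent feature; the function name `generate_latent` is also hypothetical.

```python
import numpy as np

def generate_latent(z1, z2, f_d, f_t, alpha=0.5, gamma=1.0):
    """Shift an optimal latent point using unarchitectured features.

    z1, z2: latent features of two existing optimal architectures.
    f_d, f_t: dataset and task feature vectors, same dimension as the latent space.
    """
    f = alpha * f_d + (1.0 - alpha) * f_t       # Eq. (13): weighted combination
    d1 = np.linalg.norm(f - z1)                 # distance to each optimal point
    d2 = np.linalg.norm(f - z2)
    w1, w2 = d2 / (d1 + d2), d1 / (d1 + d2)     # Eq. (15): closer point weighted more
    dz = w1 * (f - z1) + w2 * (f - z2)          # Eq. (14): offset vector
    return z1 + gamma * dz                      # Eq. (16): new latent feature

z1, z2 = np.array([0.0, 0.0]), np.array([2.0, 0.0])
f_d, f_t = np.array([1.0, 1.0]), np.array([1.0, 3.0])
z_new = generate_latent(z1, z2, f_d, f_t, alpha=0.5, gamma=0.5)   # -> [0.0, 1.0]
```

With equal distances, as in this toy case, the offset is the average of the two differences; in general it tilts toward the closer optimal point.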
<p>If the latent space is pre-normalized (e.g., to the range <inline-formula id="ieqn-115"><mml:math id="mml-ieqn-115"><mml:mo stretchy="false">[</mml:mo><mml:mn>0</mml:mn><mml:mo>,</mml:mo><mml:mn>1</mml:mn><mml:mo stretchy="false">]</mml:mo></mml:math></inline-formula>), an inverse normalization is applied after updating using <xref ref-type="disp-formula" rid="eqn-16">Eq. (16)</xref> to restore the original scale. This approach directly employs the unarchitectured information extracted from existing datasets and tasks to identify new latent features. Consequently, by inputting <inline-formula id="ieqn-116"><mml:math id="mml-ieqn-116"><mml:msup><mml:mrow><mml:mtext mathvariant="bold">z</mml:mtext></mml:mrow><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msup></mml:math></inline-formula> to the trained VGAE decoder, a new network architecture <inline-formula id="ieqn-117"><mml:math id="mml-ieqn-117"><mml:msubsup><mml:mrow><mml:mtext mathvariant="bold">A</mml:mtext></mml:mrow><mml:mi>r</mml:mi><mml:mrow><mml:mi mathvariant="normal">&#x2032;</mml:mi></mml:mrow></mml:msubsup></mml:math></inline-formula> is generated.</p>
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Preliminary Experiments</title>
<p>We present preliminary experiments conducted to inform the use of the dataset encoder and dimensionality reduction methods in NAVIGATOR.</p>
<sec id="s4_1">
<label>4.1</label>
<title>Experimental Settings of Datasets</title>
<p>In this experiment, we employed PC-DARTS to analyze the architectures obtained from multiple datasets. In deep learning, it has been shown that lower input image resolution leads to decreased recognition accuracy [<xref ref-type="bibr" rid="ref-56">56</xref>]. However, in NAS, which searches for the architecture itself, how image resolution affects the resulting network architecture has not been sufficiently analyzed. Therefore, we prepared multiple datasets with varying image resolutions to evaluate their effects on the architecture.</p>
<p>In this experiment, seven datasets were used for analysis: CIFAR-10 [<xref ref-type="bibr" rid="ref-57">57</xref>], MNIST [<xref ref-type="bibr" rid="ref-58">58</xref>], Fashion-MNIST [<xref ref-type="bibr" rid="ref-59">59</xref>], SVHN [<xref ref-type="bibr" rid="ref-60">60</xref>], Oxford-IIIT Pet [<xref ref-type="bibr" rid="ref-61">61</xref>], Oxford 102 Flower [<xref ref-type="bibr" rid="ref-62">62</xref>], and STL-10 [<xref ref-type="bibr" rid="ref-63">63</xref>]. Because each dataset comprises images with different characteristics, we conducted experiments using multiple datasets. <xref ref-type="table" rid="table-1">Table 1</xref> presents the characteristics of the datasets.</p>
<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>Characteristics and splits for each dataset</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Dataset</th>
<th>Image</th>
<th>Feature</th>
<th>Class</th>
<th>Resolution</th>
<th>Channel</th>
<th>Search <inline-formula id="ieqn-118"><mml:math id="mml-ieqn-118"><mml:mrow><mml:mtext mathvariant="bold">w</mml:mtext></mml:mrow></mml:math></inline-formula></th>
<th>Search <inline-formula id="ieqn-119"><mml:math id="mml-ieqn-119"><mml:mi mathvariant="bold-italic">&#x03B1;</mml:mi></mml:math></inline-formula></th>
<th>Train</th>
<th>Val</th>
<th>Search Resolution</th>
</tr>
</thead>
<tbody>
<tr>
<td>CIFAR-10</td>
<td><inline-graphic mime-subtype="tif" xlink:href="CMC_64969-inline-1.tif"/></td>
<td>Animals and Vehicles</td>
<td>10</td>
<td>32</td>
<td>3 (RGB)</td>
<td>25,000</td>
<td>25,000</td>
<td>50,000</td>
<td>10,000</td>
<td>32, 64, 128</td>
</tr>
<tr>
<td>MNIST</td>
<td><inline-graphic mime-subtype="tif" xlink:href="CMC_64969-inline-2.tif"/></td>
<td>Handwritten Digits</td>
<td>10</td>
<td>28</td>
<td>1 (Grayscale)</td>
<td>30,000</td>
<td>30,000</td>
<td>60,000</td>
<td>10,000</td>
<td>28, 64, 128</td>
</tr>
<tr>
<td>Fashion</td>
<td><inline-graphic mime-subtype="tif" xlink:href="CMC_64969-inline-3.tif"/></td>
<td>Clothing</td>
<td>10</td>
<td>28</td>
<td>1 (Grayscale)</td>
<td>30,000</td>
<td>30,000</td>
<td>60,000</td>
<td>10,000</td>
<td>28, 64, 128</td>
</tr>
<tr>
<td>SVHN</td>
<td><inline-graphic mime-subtype="tif" xlink:href="CMC_64969-inline-4.tif"/></td>
<td>House Numbers</td>
<td>10</td>
<td>32</td>
<td>3 (RGB)</td>
<td>36,628</td>
<td>36,629</td>
<td>73,257</td>
<td>26,032</td>
<td>32, 64, 128</td>
</tr>
<tr>
<td>Pet</td>
<td><inline-graphic mime-subtype="tif" xlink:href="CMC_64969-inline-5.tif"/></td>
<td>Dogs and Cats</td>
<td>37</td>
<td><inline-formula id="ieqn-120"><mml:math id="mml-ieqn-120"><mml:mo>&#x2265;</mml:mo><mml:mn>224</mml:mn></mml:math></inline-formula></td>
<td>3 (RGB)</td>
<td>3674</td>
<td>3675</td>
<td>7349</td>
<td>7349</td>
<td>32, 64, 128, 224</td>
</tr>
<tr>
<td>Flower</td>
<td><inline-graphic mime-subtype="tif" xlink:href="CMC_64969-inline-6.tif"/></td>
<td>Flowers</td>
<td>102</td>
<td><inline-formula id="ieqn-121"><mml:math id="mml-ieqn-121"><mml:mo>&#x2265;</mml:mo><mml:mn>224</mml:mn></mml:math></inline-formula></td>
<td>3 (RGB)</td>
<td>6149</td>
<td>1024</td>
<td>6149</td>
<td>1024</td>
<td>32, 64, 128, 224</td>
</tr>
<tr>
<td>STL-10</td>
<td><inline-graphic mime-subtype="tif" xlink:href="CMC_64969-inline-7.tif"/></td>
<td>Animals and Vehicles</td>
<td>10</td>
<td>96</td>
<td>3 (RGB)</td>
<td>4000</td>
<td>4000</td>
<td>8000</td>
<td>5000</td>
<td>32, 64, 128, 224</td>
</tr>
</tbody>
</table>
</table-wrap>
<p>The CIFAR-10 and STL-10 datasets contain visually similar images, whereas both MNIST and Fashion-MNIST are grayscale image datasets. In addition, both MNIST and SVHN consist of digit images. If two datasets share similar image features, the architectures derived from PC-DARTS are expected to exhibit similar characteristics. Moreover, Flower, Pet, and STL-10 have larger original image sizes than the other datasets; to ensure consistency, downsampling was applied to reduce the image resolution according to the evaluation requirements. We hypothesized that upsampling and downsampling yield different architectural characteristics during resolution evaluation, and hence included Flower, Pet, and STL-10 among the datasets. We searched for network architectures with PC-DARTS on all seven datasets, using multiple resolutions for each. <xref ref-type="table" rid="table-1">Table 1</xref> lists the number of images used in each subset of the datasets and the resolutions used for the search. Search <inline-formula id="ieqn-122"><mml:math id="mml-ieqn-122"><mml:mrow><mml:mtext mathvariant="bold">w</mml:mtext></mml:mrow></mml:math></inline-formula> indicates the number of images used to learn network weights during the search phase, while Search <inline-formula id="ieqn-123"><mml:math id="mml-ieqn-123"><mml:mi mathvariant="bold-italic">&#x03B1;</mml:mi></mml:math></inline-formula> denotes the number of images used to learn the architecture itself. Because the memory capacity required for the search process varies with the resolution and the number of images per dataset, we adjusted the resolutions used for each dataset. Moreover, because the initial network values (seeds) might introduce bias, we varied the seed and repeated the architecture search four times for each resolution. This approach enabled us to assess the reliability and stability of the experimental results. To accurately evaluate the effects of the datasets and resolutions on the architectures, we kept the hyperparameters constant for each dataset and resolution, determining them by referring to the image resolutions and the original PC-DARTS paper. <xref ref-type="table" rid="table-2">Table 2</xref> lists the learning rates and architecture learning rates applied during the architecture search.</p>
<table-wrap id="table-2">
<label>Table 2</label>
<caption>
<title>Preliminary experiment: hyperparameters for each dataset during architecture search</title>
</caption>
<table>
<colgroup>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
<col align="center"/>
</colgroup>
<thead>
<tr>
<th align="center">Dataset</th>
<th align="center">Resolution</th>
<th align="center">Search learning rate</th>
<th align="center">Search architecture learning rate</th>
<th align="center">Train learning rate</th>
</tr>
</thead>
<tbody>
<tr>
<td>CIFAR-10, MNIST, Fashion, SVHN</td>
<td>28, 32, 64</td>
<td>0.1</td>
<td><inline-formula id="ieqn-124"><mml:math id="mml-ieqn-124"><mml:mn>6</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:msup><mml:mn>10</mml:mn><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>5</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></td>
<td>0.1</td>
</tr>
<tr>
<td>CIFAR-10, MNIST, Fashion, SVHN</td>
<td>128</td>
<td>0.1</td>
<td><inline-formula id="ieqn-125"><mml:math id="mml-ieqn-125"><mml:mn>6</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:msup><mml:mn>10</mml:mn><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>5</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></td>
<td>0.25</td>
</tr>
<tr>
<td>Flower</td>
<td>32, 64, 128, 224</td>
<td>0.5</td>
<td><inline-formula id="ieqn-126"><mml:math id="mml-ieqn-126"><mml:mn>6</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:msup><mml:mn>10</mml:mn><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>5</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></td>
<td>0.01</td>
</tr>
<tr>
<td>Pet</td>
<td>32, 64, 128, 224</td>
<td>0.1</td>
<td><inline-formula id="ieqn-127"><mml:math id="mml-ieqn-127"><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:msup><mml:mn>10</mml:mn><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>4</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></td>
<td>0.025</td>
</tr>
<tr>
<td>STL-10</td>
<td>32, 64, 128, 224</td>
<td>0.1</td>
<td><inline-formula id="ieqn-128"><mml:math id="mml-ieqn-128"><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:msup><mml:mn>10</mml:mn><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>4</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula></td>
<td>0.01</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
<sec id="s4_2">
<label>4.2</label>
<title>Results and Discussion</title>
<p>In PC-DARTS, the term &#x201C;search&#x201D; refers to exploring both network connectivity patterns and types of operations. We hypothesized that different operations would be selected depending on the dataset and resolution. We counted the number of times each operation was selected for the resulting architectures. Comparing these counts across resolutions and datasets enables a more comprehensive characterization of the architectures. We summed the operations in both the Normal and the Reduction cells and then took the average over four trials.</p>
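The counting procedure above can be sketched as follows. This is an illustrative sketch only: the genotype representation (a list of `(operation, input_node)` pairs per trial, in the style of DARTS genotypes) and the function name `count_operations` are our assumptions.

```python
from collections import Counter

def count_operations(genotypes):
    """Tally how often each operation appears, averaged over trials.

    genotypes: one entry per trial, each a list of (op_name, input_node)
    pairs with Normal and Reduction cell operations pooled together.
    """
    totals = Counter()
    for cell_ops in genotypes:                       # one entry per trial
        totals.update(op for op, _ in cell_ops)
    n_trials = len(genotypes)
    return {op: c / n_trials for op, c in totals.items()}

# Toy example with two trials (operation names follow the PC-DARTS search space):
trials = [
    [("sep_conv_3x3", 0), ("dil_conv_5x5", 1)],
    [("sep_conv_3x3", 1), ("sep_conv_3x3", 2)],
]
avg_counts = count_operations(trials)   # sep_conv_3x3 -> 1.5, dil_conv_5x5 -> 0.5
```

Comparing these averaged counts across datasets and resolutions gives the per-operation profiles plotted in the stacked bar chart.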
<p><xref ref-type="fig" rid="fig-3">Fig. 3</xref> shows a stacked bar chart of the operations comprising the architectures obtained from PC-DARTS, categorized by dataset and resolution. For CIFAR-10, MNIST, Fashion-MNIST, and SVHN, which underwent upsampling, we observed that as the resolution increased, separable convolutions decreased and dilated convolutions increased. In addition, within each convolution type, the share of <inline-formula id="ieqn-129"><mml:math id="mml-ieqn-129"><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>3</mml:mn></mml:math></inline-formula> filters decreased, whereas <inline-formula id="ieqn-130"><mml:math id="mml-ieqn-130"><mml:mn>5</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>5</mml:mn></mml:math></inline-formula> filters became more prevalent. Separable convolutions are computationally efficient, whereas dilated convolutions are designed to cover a larger receptive field. During upsampling, the image size increases while the overall information content stays the same; consequently, narrow convolutions extract fewer features. Therefore, under upsampling conditions, dilated convolutions and <inline-formula id="ieqn-131"><mml:math id="mml-ieqn-131"><mml:mn>5</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>5</mml:mn></mml:math></inline-formula> filters tended to be selected more often because they cover a wider area. In contrast, in Flower, Pet, and STL-10 (where downsampling was predominant), changes in resolution did not significantly affect the chosen operations. These findings suggest that the total amount of information in the image has a limited effect on the operations required; instead, the density of image information appears to influence the spatial range of the extracted features.</p>
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>Number of operations for each architecture obtained from PC-DARTS</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_64969-fig-3.tif"/>
</fig>
<p>The ratio of the frequently selected operations varied depending on the dataset. In the MNIST and Fashion-MNIST datasets, a <inline-formula id="ieqn-132"><mml:math id="mml-ieqn-132"><mml:mn>5</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>5</mml:mn></mml:math></inline-formula> dilated convolution was frequently used. A possible reason for this is that both datasets contained images from a single channel. Compared with datasets containing three channels, those with fewer channels provided less information in the channel dimension, leading to the selection of a <inline-formula id="ieqn-133"><mml:math id="mml-ieqn-133"><mml:mn>5</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>5</mml:mn></mml:math></inline-formula> dilated convolution, which processes a broader spatial area.</p>
<p><xref ref-type="fig" rid="fig-4">Fig. 4</xref> illustrates the latent features of each architecture derived from the NAVIGATOR. The axes represent the coordinates in the 2D PCA plot. The latent features of architectures from the same dataset (same color) tended to cluster together. For some dataset types, even architectures from different datasets have latent features in adjacent regions. In particular, the latent features of architectures from CIFAR-10 and STL-10 lie close to each other. One possible explanation is that CIFAR-10 (airplanes, birds, cars, cats, deer, dogs, horses, frogs, ships, and trucks) and STL-10 (airplanes, birds, cars, cats, bears, dogs, horses, monkeys, ships, and trucks) share eight out of ten labels. They are thus highly similar datasets, differing only in &#x201C;deer/frog&#x201D; versus &#x201C;bear/monkey.&#x201D; This similarity likely results in similar architectures. Additionally, architectures from MNIST and Fashion-MNIST clustered closely in the latent space. Both datasets have only one channel, which probably causes them to converge toward similar architectures. The Flower dataset lies in a distinct region apart from the other datasets, indicating that its architectures also diverged significantly from the others, consistent with the operation count analysis.</p>
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>NAVIGATOR results with latent features of the architecture color-coded by dataset (legend: dataset-resolution)</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_64969-fig-4.tif"/>
</fig>
<p>Our preliminary results confirmed that the optimal architecture varies depending on the dataset, particularly when datasets share similar labels or have the same number of channels. Accordingly, in the dataset encoder of NAVIGATOR, we define dataset features that can capture these properties.</p>
</sec>
<sec id="s4_3">
<label>4.3</label>
<title>Experiments on Dimensionality Reduction</title>
<p>To determine the latent space for visualization, we first applied several representative dimensionality reduction methods: PCA, t-SNE [<xref ref-type="bibr" rid="ref-64">64</xref>], UMAP [<xref ref-type="bibr" rid="ref-65">65</xref>], Isomap [<xref ref-type="bibr" rid="ref-66">66</xref>], MDS [<xref ref-type="bibr" rid="ref-67">67</xref>], and LLE [<xref ref-type="bibr" rid="ref-68">68</xref>]. We projected the training data into two dimensions and present the results in <xref ref-type="fig" rid="fig-5">Fig. 5</xref>. Note that the latent space used in this experiment was derived from the results in the following section; the current findings were used purely to compare the different dimensionality reduction methods. In the figure, red and green points represent samples from different classes. We observed similar but slightly varying patterns of dispersion across the different methods.</p>
<fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>Comparison of dimensionality reduction results from PCA, t-SNE, UMAP, Isomap, MDS, and LLE</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_64969-fig-5.tif"/>
</fig>
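A comparison like the one in Fig. 5 can be reproduced with standard tooling. The sketch below uses scikit-learn for PCA, t-SNE, Isomap, MDS, and LLE (UMAP, from the separate umap-learn package, would be added analogously); the latent features here are random stand-ins for illustration:

```python
# Sketch: projecting latent features to 2D with several reduction methods,
# assuming scikit-learn is available. Z is a stand-in for VGAE latent features.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE, Isomap, MDS, LocallyLinearEmbedding

rng = np.random.default_rng(0)
Z = rng.normal(size=(40, 4))  # (n_samples, latent_dim) placeholder features

methods = {
    "PCA": PCA(n_components=2),
    "t-SNE": TSNE(n_components=2, perplexity=10, random_state=0),
    "Isomap": Isomap(n_components=2, n_neighbors=5),
    "MDS": MDS(n_components=2, random_state=0),
    "LLE": LocallyLinearEmbedding(n_components=2, n_neighbors=5),
}
embeddings = {name: m.fit_transform(Z) for name, m in methods.items()}
for name, emb in embeddings.items():
    print(name, emb.shape)  # each method yields an (n_samples, 2) projection
```

Each 2D embedding can then be scatter-plotted and color-coded by class, as in the figure.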
<p>Based on this comparison, we used PCA in this study. The main reasons for this are as follows: (1) PCA constructs the latent space through linear transformations, making it conceptually compatible with the VGAE latent representation and highly interpretable; (2) compared to other methods, PCA facilitates a clearer visual understanding of inter-class distribution patterns.</p>
</sec>
</sec>
<sec id="s5">
<label>5</label>
<title>Experiments</title>
<p>This study evaluated the effectiveness of the proposed method through experiments on multiple, distinct datasets. To determine the optimal architecture for each dataset, we adopted PC-DARTS as the NAS component of NAVIGATOR. PC-DARTS delivers high accuracy quickly, making it well suited to our experiments, which require testing many architectures. Moreover, the PC-DARTS search space encompasses approximately <inline-formula id="ieqn-134"><mml:math id="mml-ieqn-134"><mml:msup><mml:mn>10</mml:mn><mml:mrow><mml:mn>25</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula> possible architectures; this combination of efficiency and scale makes it ideal under our experimental conditions. In each trial, we used a single NVIDIA Tesla V100 SXM2 16 GB GPU (four such GPUs were employed in parallel for ImageNet).</p>
<sec id="s5_1">
<label>5.1</label>
<title>Experiment I: Visualizing the Output from Latent Features</title>
<p>The objective of Experiment I was to verify whether architectures with similar characteristics can be generated from the latent feature space extracted by the NAVIGATOR. Next, we describe the results obtained by adding noise around specified feature vectors to observe the shapes of the model architectures distributed nearby. This experiment used two datasets: CIFAR-10 and MNIST. Based on the insight that latent features encompass dataset-specific properties, we used the latent features extracted by NAVIGATOR from an architecture optimized for CIFAR-10 and then fine-tuned them. This approach demonstrated that various architectures similar to the original output graph can be generated. In addition, learning on an architecture that connects the two types of cells, rather than on the entire network graph, extracts more detailed features and makes the graph easier to visualize.</p>
<p><xref ref-type="fig" rid="fig-6">Fig. 6</xref> shows a graph of the input architecture and a graph generated from the latent features. The input nodes are shown in blue, the middle nodes in orange, and the output nodes in green. The number of edges matched that of the input graph.</p>
<fig id="fig-6">
<label>Figure 6</label>
<caption>
<title>Graphs generated from input graphs and latent features</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_64969-fig-6.tif"/>
</fig>
<p>First, we observed that the original graph (a) and the reconstructed graph (b) are identical. This finding confirms that the NAVIGATOR encoder and decoder can reconstruct architectures with high fidelity. Next, examining changes in the small perturbation parameter <inline-formula id="ieqn-135"><mml:math id="mml-ieqn-135"><mml:mi>&#x03B3;</mml:mi></mml:math></inline-formula>, we find that at <inline-formula id="ieqn-136"><mml:math id="mml-ieqn-136"><mml:mi>&#x03B3;</mml:mi><mml:mo>=</mml:mo><mml:mn>0.05</mml:mn></mml:math></inline-formula> in (d), two edges differ from the original. Nevertheless, the overall architecture remains highly similar to the original graph (a), and this holds up through (h) at <inline-formula id="ieqn-137"><mml:math id="mml-ieqn-137"><mml:mi>&#x03B3;</mml:mi><mml:mo>=</mml:mo><mml:mn>0.05</mml:mn></mml:math></inline-formula>.</p>
<p>When generating new latent features, we follow the Generating Model algorithm described in <xref ref-type="sec" rid="s3_6">Section 3.6</xref>, which first normalizes the features into the range [0, 1].</p>
<p>Moreover, the direction vector multiplied by <inline-formula id="ieqn-138"><mml:math id="mml-ieqn-138"><mml:mi>&#x03B3;</mml:mi></mml:math></inline-formula> is fixed at unit length; therefore, a small perturbation of <inline-formula id="ieqn-139"><mml:math id="mml-ieqn-139"><mml:mi>&#x03B3;</mml:mi><mml:mo>=</mml:mo><mml:mn>0.05</mml:mn></mml:math></inline-formula> results in an approximately 5% shift from the original values in the latent feature space. This small change explains why the generated architecture in <xref ref-type="fig" rid="fig-6">Fig. 6</xref> remains almost identical to the original architecture.</p>
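The two steps just described (normalizing the latent features to [0, 1], then shifting them by &#x03B3; along a unit-length random direction) can be sketched as follows; this is an illustrative reimplementation under those stated assumptions, not the authors' exact generation code:

```python
# Sketch of the perturbation step: scale latent features to [0, 1], move them
# by gamma along a unit-length random direction, then map back.
import numpy as np

def perturb_latent(z, gamma, rng):
    z_min, z_max = z.min(axis=0), z.max(axis=0)
    z_norm = (z - z_min) / (z_max - z_min)       # normalize each dim to [0, 1]
    d = rng.uniform(-1.0, 1.0, size=z.shape[1])
    d /= np.linalg.norm(d)                       # unit-length direction vector
    z_new = z_norm + gamma * d                   # ~gamma shift in latent space
    return z_new * (z_max - z_min) + z_min       # undo the normalization

rng = np.random.default_rng(2024)
z = rng.normal(size=(5, 4))                      # five 4-dim latent vectors
z_shift = perturb_latent(z, gamma=0.05, rng=rng)
print(np.abs(z_shift - z).max())                 # small shift for small gamma
```

With gamma = 0 the features are returned unchanged, matching the observation that the reconstructed architecture equals the original.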
<p>However, for <inline-formula id="ieqn-140"><mml:math id="mml-ieqn-140"><mml:mi>&#x03B3;</mml:mi><mml:mo>=</mml:mo><mml:mn>0.1</mml:mn></mml:math></inline-formula> and higher, the overall architecture gradually changes while retaining its fundamental features. Notably, the labels for certain edges (represented by colors) differ, indicating that the edge features shifted. In short, increasing <inline-formula id="ieqn-141"><mml:math id="mml-ieqn-141"><mml:mi>&#x03B3;</mml:mi></mml:math></inline-formula> enables the free generation of new architectures whose characteristics diverge from the original graph.</p>
<p>Furthermore, in the proposed method, certain constraints are imposed during NAS-based architecture generation. For instance, the number of edges that can connect to each middle node (e.g., nodes 2&#x2013;5 and 9&#x2013;12) was fixed at two. Additionally, the normal and reduction cells were merged into a single graph per plot. This setup includes certain mandatory fixed edges (for example, from Node 1 to Node 7, or the output edges) while restricting unconstrained connections within the cells. We introduced these rules to replicate the search space used in DARTS and PC-DARTS. For a fair comparison, we applied the same conditions as those used for the input architecture. Note that these constraints can be relaxed if the objective is to generate diverse architectures.</p>
<p><xref ref-type="fig" rid="fig-7">Fig. 7</xref> shows a color map of the architectures generated by the proposed method, where the vertical axis is the value of <inline-formula id="ieqn-142"><mml:math id="mml-ieqn-142"><mml:mi>&#x03B3;</mml:mi></mml:math></inline-formula> and the horizontal axis enumerates five direction vectors generated from uniform random numbers for the same <inline-formula id="ieqn-143"><mml:math id="mml-ieqn-143"><mml:mi>&#x03B3;</mml:mi></mml:math></inline-formula> value. The colors represent edge-connection operations, sorted from the top in order of proximity to the input node. The architectures remain relatively unchanged except at <inline-formula id="ieqn-144"><mml:math id="mml-ieqn-144"><mml:mi>&#x03B3;</mml:mi></mml:math></inline-formula> &#x003D; 1.0; at <inline-formula id="ieqn-145"><mml:math id="mml-ieqn-145"><mml:mi>&#x03B3;</mml:mi></mml:math></inline-formula> &#x003D; 0.0 the architecture is unchanged (because the feature points did not move), and the architectures changed in both the vertical and horizontal directions as the value increased. As shown in <xref ref-type="fig" rid="fig-7">Fig. 7</xref>, these parameters are linked to the architectural changes. When <inline-formula id="ieqn-146"><mml:math id="mml-ieqn-146"><mml:mi>&#x03B3;</mml:mi></mml:math></inline-formula> &#x003D; 1.0, more purple operations (skip connections) were selected, indicating that architectures in regions far from the learned architectures were biased. Such biased architectures arise because, after normalization, moving the latent features by as much as 1.0 places them outside the learned latent space.</p>
<fig id="fig-7">
<label>Figure 7</label>
<caption>
<title>Color representation of operations for each architecture. The vertical axis is the value of <inline-formula id="ieqn-147"><mml:math id="mml-ieqn-147"><mml:mi>&#x03B3;</mml:mi></mml:math></inline-formula>, and the horizontal axis enumerates five direction vectors generated from uniform random numbers for the same <inline-formula id="ieqn-148"><mml:math id="mml-ieqn-148"><mml:mi>&#x03B3;</mml:mi></mml:math></inline-formula> value. Edge connections are not considered. <inline-formula id="ieqn-149"><mml:math id="mml-ieqn-149"><mml:mi>&#x03B3;</mml:mi></mml:math></inline-formula> &#x003D; 0 corresponds to the original and reconstructed architecture. None: red, max pooling 3 <inline-formula id="ieqn-150"><mml:math id="mml-ieqn-150"><mml:mo>&#x00D7;</mml:mo></mml:math></inline-formula> 3: green, average pooling 3 <inline-formula id="ieqn-151"><mml:math id="mml-ieqn-151"><mml:mo>&#x00D7;</mml:mo></mml:math></inline-formula> 3: blue, skip connection: purple, separable convolution 3 <inline-formula id="ieqn-152"><mml:math id="mml-ieqn-152"><mml:mo>&#x00D7;</mml:mo></mml:math></inline-formula> 3: orange, separable convolution 5 <inline-formula id="ieqn-153"><mml:math id="mml-ieqn-153"><mml:mo>&#x00D7;</mml:mo></mml:math></inline-formula> 5: brown, dilated convolution 3 <inline-formula id="ieqn-154"><mml:math id="mml-ieqn-154"><mml:mo>&#x00D7;</mml:mo></mml:math></inline-formula> 3: pink, dilated convolution 5 <inline-formula id="ieqn-155"><mml:math id="mml-ieqn-155"><mml:mo>&#x00D7;</mml:mo></mml:math></inline-formula> 5: gray (same as edge colors in <xref ref-type="fig" rid="fig-6">Fig. 6</xref>)</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_64969-fig-7.tif"/>
</fig>
</sec>
<sec id="s5_2">
<label>5.2</label>
<title>Experiment II: Experimental Settings for the Latent Feature Visualization of Architecture Searched in CIFAR-10 and MNIST</title>
<p>In Experiment II, we aimed to obtain architectures optimized by NAS for two datasets with different properties and then visualize the latent features of the network architectures adapted to each dataset. In NAVIGATOR, we hypothesized that the models input to the VGAE would be distributed in different regions of the latent space depending on the dataset, and we conducted experiments to verify this hypothesis. We used two datasets: CIFAR-10 and MNIST. Each architecture includes both connection patterns and edge types, whereas the number of nodes remained fixed. Using five different seed values, we obtained ten architectures. During the search phase, 50 epochs were used for training. The learning rate was decreased from 0.1 to 0 according to a cosine schedule, the architecture-parameter learning rate was set to 0.0003, and random cropping and horizontal flipping were applied for data augmentation. For the VGAE configuration, 500,000 epochs were used. The learning rate decreased from 0.001 to 0.00025 following a cosine schedule, and Adam was used as the optimizer. The dimensionality of the latent features was set to four, using a seed value of 2024. We also set the loss-function weights <inline-formula id="ieqn-156"><mml:math id="mml-ieqn-156"><mml:msub><mml:mi>W</mml:mi><mml:mi>P</mml:mi></mml:msub></mml:math></inline-formula>, <inline-formula id="ieqn-157"><mml:math id="mml-ieqn-157"><mml:msub><mml:mi>W</mml:mi><mml:mi>N</mml:mi></mml:msub></mml:math></inline-formula>, <inline-formula id="ieqn-158"><mml:math id="mml-ieqn-158"><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mi>E</mml:mi><mml:mi>d</mml:mi><mml:mi>g</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, and <inline-formula id="ieqn-159"><mml:math id="mml-ieqn-159"><mml:msub><mml:mi>W</mml:mi><mml:mrow><mml:mi>K</mml:mi><mml:mi>L</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> to 2.0, 1.0, 0.5, and 1.0, respectively. These hyperparameters were chosen by referring to the original papers on the NAS methods and VGAE.</p>
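How the four weights combine the loss terms can be sketched as below. The exact definitions of the positive-link, negative-link, and edge-feature terms are assumptions standing in for the paper's loss components; only the weight values (2.0, 1.0, 0.5, 1.0) and the standard VAE KL term are taken from the text:

```python
# Sketch: weighted combination of VGAE loss terms, with W_P=2.0, W_N=1.0,
# W_Edge=0.5, W_KL=1.0 as in Experiment II. The loss_* inputs are placeholders
# for the positive-link, negative-link, and edge-feature reconstruction losses.
import math

def weighted_vgae_loss(loss_pos, loss_neg, loss_edge, mu, logvar,
                       w_p=2.0, w_n=1.0, w_edge=0.5, w_kl=1.0):
    # KL divergence of N(mu, exp(logvar)) from N(0, 1), averaged over entries
    kl_terms = [-0.5 * (1 + lv - m * m - math.exp(lv))
                for m, lv in zip(mu, logvar)]
    kl = sum(kl_terms) / len(kl_terms)
    return w_p * loss_pos + w_n * loss_neg + w_edge * loss_edge + w_kl * kl

# mu = logvar = 0 makes the KL term vanish, isolating the weighted sum
total = weighted_vgae_loss(0.25, 0.25, 0.5, mu=[0.0] * 4, logvar=[0.0] * 4)
print(total)  # 2*0.25 + 1*0.25 + 0.5*0.5 + 1*0 = 1.0
```

In a real training loop the same weighted sum would be computed on tensors and backpropagated.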
</sec>
<sec id="s5_3">
<label>5.3</label>
<title>Experiment II: Results and Discussion</title>
<p><xref ref-type="fig" rid="fig-8">Fig. 8</xref> shows the visualization results of the latent features obtained using the proposed method. This figure illustrates dimensionality reduction using PCA; each axis corresponds to a coordinate derived by compressing the high-dimensional data. Owing to the differences in image size and dataset characteristics between CIFAR-10 (RGB) and MNIST (grayscale), the NAS procedure likely identified architectures optimized to capture the distinctive features of each dataset. Moreover, CIFAR-10 has an input size of 32 <inline-formula id="ieqn-160"><mml:math id="mml-ieqn-160"><mml:mo>&#x00D7;</mml:mo></mml:math></inline-formula> 32, whereas MNIST is 28 <inline-formula id="ieqn-161"><mml:math id="mml-ieqn-161"><mml:mo>&#x00D7;</mml:mo></mml:math></inline-formula> 28, leading to different optimal latent features for each dataset. Consequently, distinct trends were observed between the two.</p>
<fig id="fig-8">
<label>Figure 8</label>
<caption>
<title>Visualization of latent features acquired using the proposed method. The axes represent the coordinate axes on a two-dimensional PCA plot</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_64969-fig-8.tif"/>
</fig>
<p>To gain a deeper understanding of how the latent space preserves the graph-structural features of neural architectures, we conducted a correlation analysis between inter-architecture distances in the latent space and distances between their structural features. Specifically, we used the following representative structural features: (1) the degree distribution of node connections, (2) the distribution of operation types, and (3) motif patterns, defined as the frequencies of three-node subgraphs. For each pair of architectures, we computed the distance between these features using the Kullback&#x2013;Leibler divergence. We then assessed the relationship between the structural and latent-space distances by calculating the Spearman correlation coefficient. The results are presented in <xref ref-type="table" rid="table-3">Table 3</xref>. Among the examined structural features, the distribution of operation types shows the strongest correlation with latent-space distances, while the degree distribution indicates a moderate correlation. Although the motif-pattern correlation is comparatively weaker, it remains significantly positive, suggesting that certain structural patterns are effectively captured. These findings suggest that the proposed method successfully preserves the topological properties of neural architectures within the latent space. In other words, the method constructs a latent space that maintains consistency in both semantic meaning and structural aspects.</p>
<table-wrap id="table-3">
<label>Table 3</label>
<caption>
<title>Correlation between latent-space distances and structural feature distances</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Structural feature</th>
<th>Spearman&#x2019;s <inline-formula id="ieqn-162"><mml:math id="mml-ieqn-162"><mml:mi mathvariant="bold-italic">&#x03C1;</mml:mi></mml:math></inline-formula></th>
<th><inline-formula id="ieqn-163"><mml:math id="mml-ieqn-163"><mml:mi mathvariant="bold-italic">p</mml:mi></mml:math></inline-formula>-value</th>
</tr>
</thead>
<tbody>
<tr>
<td>Node degree distribution</td>
<td>0.372</td>
<td><inline-formula id="ieqn-164"><mml:math id="mml-ieqn-164"><mml:mo>&#x003C;</mml:mo><mml:mspace width="negativethinmathspace" /><mml:mn>0.001</mml:mn></mml:math></inline-formula></td>
</tr>
<tr>
<td>Operation type distribution</td>
<td>0.500</td>
<td><inline-formula id="ieqn-165"><mml:math id="mml-ieqn-165"><mml:mo>&#x003C;</mml:mo><mml:mspace width="negativethinmathspace" /><mml:mn>0.001</mml:mn></mml:math></inline-formula></td>
</tr>
<tr>
<td>Motif patterns</td>
<td>0.255</td>
<td>0.011</td>
</tr>
</tbody>
</table>
</table-wrap>
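The analysis behind Table 3 can be sketched as follows. This is an illustrative reimplementation on random toy data (real operation-type distributions and latent features would be substituted), assuming SciPy for the pairwise distances and Spearman's rho:

```python
# Sketch: correlating KL divergences between operation-type distributions
# with Euclidean latent-space distances via Spearman's rho. Toy data only.
import numpy as np
from scipy.spatial.distance import pdist
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n, n_ops, latent_dim = 12, 8, 4
op_dists = rng.dirichlet(np.ones(n_ops), size=n)  # per-architecture op-type distributions
latent = rng.normal(size=(n, latent_dim))         # per-architecture latent features

def kl(p, q, eps=1e-12):
    """KL divergence between two discrete distributions (smoothed)."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Pairwise distances over all (i, j) pairs with i < j, matching pdist's order
struct_d = [kl(op_dists[i], op_dists[j])
            for i in range(n) for j in range(i + 1, n)]
latent_d = pdist(latent)                          # Euclidean latent distances
rho, p_value = spearmanr(struct_d, latent_d)
print(rho, p_value)
```

The same loop, repeated with degree distributions or motif frequencies in place of `op_dists`, yields the other two rows of the table.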
<p>Overall, training the VGAE in NAVIGATOR (for ten architectures over 500,000 epochs) required 14.52 h. Meanwhile, the NAS phase using standard PC-DARTS required approximately 2.4 h of training time per architecture. Hence, by prebuilding a large latent space optimized for multiple datasets, we demonstrate that the proposed method can generate architectures for new datasets in a short amount of time. This highlights the effectiveness of the proposed approach.</p>
</sec>
<sec id="s5_4">
<label>5.4</label>
<title>Experiment III: Experimental Settings for Performance Test of Generated Architecture in NAVIGATOR</title>
<p>In Experiment III, we compared the architectures generated by NAVIGATOR with randomly generated architectures on the USPS and ImageNet datasets. We aimed to evaluate whether the proposed method can produce promising architectures and whether it improves performance over random generation. The newly introduced USPS (United States Postal Service) dataset [<xref ref-type="bibr" rid="ref-69">69</xref>] contains 16 <inline-formula id="ieqn-166"><mml:math id="mml-ieqn-166"><mml:mo>&#x00D7;</mml:mo></mml:math></inline-formula> 16 grayscale images of handwritten digits from 0 to 9 and is frequently used in domain adaptation studies based on MNIST [<xref ref-type="bibr" rid="ref-70">70</xref>&#x2013;<xref ref-type="bibr" rid="ref-72">72</xref>]. Hence, we investigated whether an architecture optimized for a similar dataset can achieve comparable performance on USPS. We retrained four types of architectures and evaluated their accuracy. The first architecture is generated from latent features around MNIST (<inline-formula id="ieqn-167"><mml:math id="mml-ieqn-167"><mml:mi>&#x03B3;</mml:mi><mml:mo>=</mml:mo><mml:mn>0.1</mml:mn></mml:math></inline-formula>). The second is generated from latent features around CIFAR-10 (<inline-formula id="ieqn-168"><mml:math id="mml-ieqn-168"><mml:mi>&#x03B3;</mml:mi><mml:mo>=</mml:mo><mml:mn>0.1</mml:mn></mml:math></inline-formula>). The third and fourth are generated from the MNIST (<inline-formula id="ieqn-169"><mml:math id="mml-ieqn-169"><mml:mi>&#x03B3;</mml:mi><mml:mo>=</mml:mo><mml:mn>0.0</mml:mn></mml:math></inline-formula>) and random latent features, respectively. Based on the results of Experiment I, we set <inline-formula id="ieqn-170"><mml:math id="mml-ieqn-170"><mml:mi>&#x03B3;</mml:mi><mml:mo>=</mml:mo><mml:mn>0.1</mml:mn></mml:math></inline-formula> as the parameter for building latent features around the original: this value retains the characteristics of the <inline-formula id="ieqn-171"><mml:math id="mml-ieqn-171"><mml:mi>&#x03B3;</mml:mi><mml:mo>=</mml:mo><mml:mn>0.0</mml:mn></mml:math></inline-formula> architecture while producing partially different architectures. For both MNIST and CIFAR-10, five sets of latent features exist, and we generated the direction vectors four times (using random values) to construct each model. In total, we evaluated 50 architectures across the four types. For retraining on USPS, we set the number of training epochs to 600 with a batch size of 128. The initial learning rate was 0.025 (reduced to zero following a cosine schedule), momentum was 0.9, weight decay was <inline-formula id="ieqn-172"><mml:math id="mml-ieqn-172"><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:msup><mml:mn>10</mml:mn><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>4</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>, the drop-path probability was 0.3, and we employed cutout.</p>
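The cosine learning-rate schedule used for the USPS retraining (initial rate 0.025 annealed to zero over 600 epochs) follows the standard cosine-annealing formula; a minimal sketch:

```python
# Sketch: cosine learning-rate schedule for USPS retraining
# (lr starts at 0.025 and anneals to 0 over 600 epochs).
import math

def cosine_lr(epoch, total_epochs=600, lr_init=0.025, lr_min=0.0):
    return lr_min + 0.5 * (lr_init - lr_min) * (
        1 + math.cos(math.pi * epoch / total_epochs))

print(cosine_lr(0))    # 0.025 at the start
print(cosine_lr(300))  # ~0.0125 (half the initial rate) at the midpoint
print(cosine_lr(600))  # 0.0 at the end
```

In practice the equivalent built-in scheduler (e.g., PyTorch's `CosineAnnealingLR`) would be used rather than a hand-rolled function.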
<p>Evaluating the effectiveness of NAVIGATOR on datasets with larger image sizes is crucial, given that the USPS images are only 16 <inline-formula id="ieqn-173"><mml:math id="mml-ieqn-173"><mml:mo>&#x00D7;</mml:mo></mml:math></inline-formula> 16. Therefore, Experiment III also included additional experiments on the ImageNet dataset, which is widely used for image classification [<xref ref-type="bibr" rid="ref-73">73</xref>]. In the underlying PC-DARTS approach, models discovered on CIFAR-10 were transferred to ImageNet; accordingly, 20 architectures generated from latent features around CIFAR-10 were evaluated on ImageNet. As the ImageNet images were originally large, they were cropped to 224 <inline-formula id="ieqn-174"><mml:math id="mml-ieqn-174"><mml:mo>&#x00D7;</mml:mo></mml:math></inline-formula> 224 pixels. We also used RandomHorizontalFlip (random left-right flips) and ColorJitter (random changes in brightness and contrast) for data augmentation. Computations were parallelized across four GPUs. For ImageNet retraining, we used 250 epochs, a batch size of 768, and an initial learning rate of 0.5 (reduced to zero using a linear schedule). Momentum was set to 0.9, and weight decay was <inline-formula id="ieqn-175"><mml:math id="mml-ieqn-175"><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:msup><mml:mn>10</mml:mn><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>5</mml:mn></mml:mrow></mml:msup></mml:math></inline-formula>. In addition, label smoothing and a five-epoch warm-up phase were employed. For simplicity, the auxiliary loss was not applied. We selected the hyperparameters for retraining on ImageNet and USPS by referring to the original PC-DARTS study. <xref ref-type="fig" rid="fig-9">Fig. 9</xref> presents examples of the architectures (MNIST Model and CIFAR-10 Model) generated by the proposed method and evaluated in Experiment III.</p>
<fig id="fig-9">
<label>Figure 9</label>
<caption>
<title>Example of an architecture generated by NAVIGATOR. The upper figures are MNIST models and the lower figures are CIFAR-10 models</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_64969-fig-9.tif"/>
</fig>
</sec>
<sec id="s5_5">
<label>5.5</label>
<title>Experiment III: Results and Discussion</title>
<p><xref ref-type="table" rid="table-4">Table 4</xref> presents the results of each NAS method. From <xref ref-type="table" rid="table-4">Table 4</xref>, we can see that the MNIST Model (<inline-formula id="ieqn-176"><mml:math id="mml-ieqn-176"><mml:mi>&#x03B3;</mml:mi><mml:mo>=</mml:mo><mml:mn>0.1</mml:mn></mml:math></inline-formula>) achieved the highest accuracy, reaching a best test accuracy (Max) of 98.11%. In essence, the architecture constructed by referring to latent features optimized on a similar dataset produced the best results. Furthermore, the MNIST Model achieved higher performance than both the Random and CIFAR-10 Models. Even within the MNIST Model, the variant with <inline-formula id="ieqn-177"><mml:math id="mml-ieqn-177"><mml:mi>&#x03B3;</mml:mi><mml:mo>=</mml:mo><mml:mn>0.1</mml:mn></mml:math></inline-formula> outperformed the variant with <inline-formula id="ieqn-178"><mml:math id="mml-ieqn-178"><mml:mi>&#x03B3;</mml:mi><mml:mo>=</mml:mo><mml:mn>0.0</mml:mn></mml:math></inline-formula>. This finding demonstrates the usefulness of our perturbation-based method. At <inline-formula id="ieqn-179"><mml:math id="mml-ieqn-179"><mml:mi>&#x03B3;</mml:mi><mml:mo>=</mml:mo><mml:mn>0.0</mml:mn></mml:math></inline-formula>, the architecture was the one originally optimized for MNIST; although MNIST is similar to USPS, USPS retains its own characteristics, which makes the perturbed approach beneficial. Because the same technique can be applied to new datasets, NAS does not need to be conducted from scratch, leading to a significant reduction in search time. Moreover, in the ImageNet results shown in <xref ref-type="fig" rid="fig-10">Fig. 10</xref>, we found that the NAVIGATOR approach (CIFAR-10 Model, <inline-formula id="ieqn-180"><mml:math id="mml-ieqn-180"><mml:mi>&#x03B3;</mml:mi><mml:mo>=</mml:mo><mml:mn>0.1</mml:mn></mml:math></inline-formula>) achieved the highest accuracy, recording a Best Test Acc. (Max) of 75.43% for Top-1 and 92.60% for Top-5. In the Mobile Setting, where FLOPs were below 600 M, PC-DARTS achieved the highest accuracy; however, the architecture generated by NAVIGATOR remained competitive at a similar performance level. Hence, our proposed method can still produce architectures that rival existing approaches even under relatively low-FLOPs conditions. Achieving high performance while drastically reducing search time is a major advantage of our method. These experimental results demonstrate that the proposed method remains effective on datasets with large image sizes and on large-scale architectures constructed via cell transfer.</p>
<table-wrap id="table-4">
<label>Table 4</label>
<caption>
<title>Results of Experiment III using the USPS dataset</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Architecture</th>
<th>Model Size</th>
<th>Best Test Acc.</th>
<th>Search Time</th>
</tr>
<tr>
<th></th>
<th>(Mean <inline-formula id="ieqn-182"><mml:math id="mml-ieqn-182"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> SD) (M)</th>
<th>(Max <inline-formula id="ieqn-183"><mml:math id="mml-ieqn-183"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> SD) (%)</th>
<th>(h)</th>
</tr>
</thead>
<tbody>
<tr>
<td>CENAS [<xref ref-type="bibr" rid="ref-46">46</xref>]</td>
<td>0.57</td>
<td>89.94</td>
<td>7.4</td>
</tr>
<tr>
<td>ASED [<xref ref-type="bibr" rid="ref-74">74</xref>]</td>
<td>&#x2013;</td>
<td>97.75</td>
<td>153.6</td>
</tr>
<tr>
<td>NASDA [<xref ref-type="bibr" rid="ref-75">75</xref>]</td>
<td>2.70</td>
<td>98.00</td>
<td>7.2</td>
</tr>
<tr>
<td>NAVIGATOR (Random model)</td>
<td>2.15 <inline-formula id="ieqn-184"><mml:math id="mml-ieqn-184"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> 0.28</td>
<td>97.96 <inline-formula id="ieqn-185"><mml:math id="mml-ieqn-185"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> 0.10</td>
<td>0</td>
</tr>
<tr>
<td>NAVIGATOR (CIFAR-10 model, <inline-formula id="ieqn-186"><mml:math id="mml-ieqn-186"><mml:mi>&#x03B3;</mml:mi></mml:math></inline-formula> &#x003D; 0.1)</td>
<td>3.32 <inline-formula id="ieqn-187"><mml:math id="mml-ieqn-187"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> 0.63</td>
<td>98.01 <inline-formula id="ieqn-188"><mml:math id="mml-ieqn-188"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> 0.11</td>
<td>0</td>
</tr>
<tr>
<td>NAVIGATOR (MNIST model, <inline-formula id="ieqn-189"><mml:math id="mml-ieqn-189"><mml:mi>&#x03B3;</mml:mi></mml:math></inline-formula> &#x003D; 0.0)</td>
<td>3.37 <inline-formula id="ieqn-190"><mml:math id="mml-ieqn-190"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> 0.60</td>
<td>98.01 <inline-formula id="ieqn-191"><mml:math id="mml-ieqn-191"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> 0.09</td>
<td>0</td>
</tr>
<tr>
<td>NAVIGATOR (MNIST model, <inline-formula id="ieqn-192"><mml:math id="mml-ieqn-192"><mml:mi>&#x03B3;</mml:mi></mml:math></inline-formula> &#x003D; 0.1)</td>
<td><bold>3.16 <inline-formula id="ieqn-193"><mml:math id="mml-ieqn-193"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> 0.66</bold></td>
<td><bold>98.11 <inline-formula id="ieqn-194"><mml:math id="mml-ieqn-194"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> 0.13</bold></td>
<td><bold>0</bold></td>
</tr>
</tbody>
</table>
</table-wrap><fig id="fig-10">
<label>Figure 10</label>
<caption>
<title>Results of Experiment III using the ImageNet dataset (left: Top-1, right: Top-5); dashed line: 600 M FLOPs (mobile setting)</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_64969-fig-10.tif"/>
</fig>
</sec>
<sec id="s5_6">
<label>5.6</label>
<title>Experiment IV: Performance Test of the Non-Architectural Information Encoder</title>
<p>In Experiment IV, we evaluated an extended version of NAVIGATOR that incorporates non-architectural information into the latent features. We used CLIP for dataset features and ResNet18 for task features. Details of the methodology and the model generation procedure are described in <xref ref-type="sec" rid="s3_5">Sections 3.5</xref> and <xref ref-type="sec" rid="s3_6">3.6</xref>. For our experimental setup, we built the latent feature space using CIFAR-10 and MNIST (as in Experiments I&#x2013;III) and tested it on the USPS dataset, which serves as a previously unseen problem.</p>
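As a rough illustration of the dataset-feature construction described above, a dataset can be summarized by averaging per-image embeddings from a pretrained encoder (CLIP in our experiments). The sketch below is a minimal, self-contained approximation: `toy_encoder` is a fixed random projection standing in for CLIP, and the array shapes and sample sizes are illustrative, not those used in the paper.

```python
import numpy as np

def toy_encoder(images, dim=64, seed=0):
    """Stand-in for a pretrained encoder (e.g., CLIP): a fixed random
    projection from flattened pixels to a `dim`-dimensional embedding."""
    rng = np.random.default_rng(seed)
    flat = images.reshape(len(images), -1)
    proj = rng.standard_normal((flat.shape[1], dim)) / np.sqrt(flat.shape[1])
    return flat @ proj

def dataset_feature(images, n_samples=128, seed=0):
    """Dataset-level feature: mean embedding over a random sample of images."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(images), size=min(n_samples, len(images)), replace=False)
    return toy_encoder(images[idx]).mean(axis=0)

# Two synthetic "datasets" of 16x16 grayscale images
rng = np.random.default_rng(1)
mnist_like = rng.random((500, 16, 16))
usps_like = rng.random((300, 16, 16))
f_a = dataset_feature(mnist_like)
f_b = dataset_feature(usps_like)
```

The resulting fixed-length vectors can then be embedded alongside the architecture features in the shared latent space.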
<p><xref ref-type="fig" rid="fig-11">Fig. 11</xref> visualizes the latent feature spaces obtained in four scenarios: (1) architecture features alone, (2) with additional dataset features, (3) with additional task features, and (4) with both dataset and task features. The figure shows the results of dimensionality reduction using PCA, where each axis corresponds to one of the first two principal components of the compressed multidimensional data. From these results, we observed a particularly pronounced separation when the dataset features were included. However, when task features were included, no significant clustering emerged, suggesting that the influence of the task features is relatively small.</p>
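The PCA projection underlying this visualization can be reproduced in a few lines. The sketch below is a plain NumPy version with synthetic latent vectors standing in for the VGAE output: it centers the feature matrix and projects onto the top two principal directions obtained by SVD.

```python
import numpy as np

def pca_2d(latent):
    """Project latent feature vectors onto their first two principal
    components via SVD of the centered data matrix."""
    centered = latent - latent.mean(axis=0)
    # Rows of vt are principal directions, ordered by singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T

# Synthetic stand-in for VGAE latent features (200 architectures, 16 dims)
rng = np.random.default_rng(0)
latent = rng.standard_normal((200, 16))
coords = pca_2d(latent)  # shape (200, 2): one 2D point per architecture
```

Each row of `coords` then gives the plotting coordinates of one architecture in the two-dimensional view.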
<fig id="fig-11">
<label>Figure 11</label>
<caption>
<title>Visualization of latent features acquired using the proposed method. The axes correspond to the first two principal components of a two-dimensional PCA projection</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="CMC_64969-fig-11.tif"/>
</fig>
<p><xref ref-type="table" rid="table-5">Table 5</xref> presents the model sizes and test accuracies achieved by each architecture on the USPS dataset. Normal refers to the MNIST model (<inline-formula id="ieqn-195"><mml:math id="mml-ieqn-195"><mml:mi>&#x03B3;</mml:mi><mml:mo>=</mml:mo><mml:mn>0.1</mml:mn></mml:math></inline-formula>), which achieved the best results in Experiment III. The table confirms that NAVIGATOR incorporating dataset features achieved the highest performance. We believe that these results depend strongly on the chosen datasets and tasks. In this case, the USPS dataset is grayscale and more similar to MNIST than to CIFAR-10. Consequently, the newly generated latent features likely placed a stronger emphasis on MNIST, producing architectures similar to the MNIST model and thus yielding higher performance. In contrast, architectures incorporating task features exhibited lower performance; we suspect this is because sufficiently discriminative gradients were not found when designing the task features. As this experiment demonstrates, constructing an appropriate encoder is crucial for improving performance. Embedding non-architectural information, which previous architecture feature extraction did not address, into the same space for exploration is a major advantage of NAVIGATOR.</p>
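The dataset-similarity reasoning above (USPS lying closer to MNIST than to CIFAR-10 in feature space) can be made concrete with a cosine distance between dataset-level feature vectors. The sketch below uses hand-made toy vectors purely for illustration; they are not actual CLIP embeddings.

```python
import numpy as np

def cosine_distance(a, b):
    """1 - cosine similarity between two feature vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return 1.0 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Illustrative dataset-level features (toy values, not real embeddings):
mnist = np.array([0.9, 0.1, 0.0, 0.2])
usps = np.array([0.8, 0.2, 0.1, 0.2])     # grayscale digits, close to MNIST
cifar10 = np.array([0.1, 0.9, 0.7, 0.3])  # natural color images, farther away

d_usps_mnist = cosine_distance(usps, mnist)
d_usps_cifar = cosine_distance(usps, cifar10)
```

Under such a measure, a small distance to MNIST would bias the latent-space search toward MNIST-like architectures, consistent with the behavior observed in Table 5.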
<table-wrap id="table-5">
<label>Table 5</label>
<caption>
<title>Results of Experiment IV using the USPS dataset. DF: dataset features; TF: task features</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>Architecture</th>
<th>Model size</th>
<th>Best test Acc.</th>
</tr>
<tr>
<th></th>
<th>(Mean <inline-formula id="ieqn-196"><mml:math id="mml-ieqn-196"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> SD) (M)</th>
<th>(Max <inline-formula id="ieqn-197"><mml:math id="mml-ieqn-197"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> SD) (%)</th>
</tr>
</thead>
<tbody>
<tr>
<td>NAVIGATOR (Normal)</td>
<td>3.16 <inline-formula id="ieqn-198"><mml:math id="mml-ieqn-198"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> 0.66</td>
<td>98.11 <inline-formula id="ieqn-199"><mml:math id="mml-ieqn-199"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> 0.13</td>
</tr>
<tr>
<td>NAVIGATOR (&#x002B; DF)</td>
<td>3.82 <inline-formula id="ieqn-200"><mml:math id="mml-ieqn-200"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> 0.61</td>
<td>98.00 <inline-formula id="ieqn-201"><mml:math id="mml-ieqn-201"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> 0.16</td>
</tr>
<tr>
<td>NAVIGATOR (&#x002B; TF)</td>
<td>3.55 <inline-formula id="ieqn-202"><mml:math id="mml-ieqn-202"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> 0.63</td>
<td>97.90 <inline-formula id="ieqn-203"><mml:math id="mml-ieqn-203"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> 0.11</td>
</tr>
<tr>
<td>NAVIGATOR (&#x002B; DF &#x002B; TF)</td>
<td>3.21 <inline-formula id="ieqn-204"><mml:math id="mml-ieqn-204"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> 0.31</td>
<td>98.11 <inline-formula id="ieqn-205"><mml:math id="mml-ieqn-205"><mml:mo>&#x00B1;</mml:mo></mml:math></inline-formula> 0.15</td>
</tr>
</tbody>
</table>
</table-wrap>
</sec>
</sec>
<sec id="s6">
<label>6</label>
<title>Conclusion</title>
<p>This study proposed NAVIGATOR, a framework that integrates NAS and VGAE to significantly reduce search time. Additionally, by extracting features from datasets and tasks, we introduced a method to measure distances among different datasets and tasks, which further refined the optimization process within the latent feature space. One limitation is that NAVIGATOR&#x2019;s performance may be constrained by the diversity of the initial set of architectures used during the training phase; a future direction is therefore to extend the proposed method to generate architectures in search spaces beyond those covered by the currently trained models. In addition, the dataset and task feature embeddings employed in this study may be insufficient for capturing subtle distinctions between tasks, indicating the need for more accurate and efficient embedding methods in future work. Furthermore, extending NAVIGATOR to accommodate dynamically changing search spaces and user-specified constraints would make it more practical and flexible, enabling multi-objective search. We believe this line of work will advance neural architecture search and foster deeper insight into network design and optimization.</p>
</sec>
</body>
<back>
<ack>
<p>Not applicable.</p>
</ack>
<sec>
<title>Funding Statement</title>
<p>This research was funded by the New Energy and Industrial Technology Development Organization (NEDO), grant number JPNP18002.</p>
</sec>
<sec>
<title>Author Contributions</title>
<p>The authors confirm contribution to the paper as follows: Conceptualization, Kazuki Hemmi; methodology, Kazuki Hemmi, Yuki Tanigaki and Kaisei Hara; data curation, Kazuki Hemmi; writing&#x2014;original draft preparation, Kazuki Hemmi; writing&#x2014;review and editing, Kazuki Hemmi, Yuki Tanigaki, Kaisei Hara and Masaki Onishi. All authors reviewed the results and approved the final version of the manuscript.</p>
</sec>
<sec sec-type="data-availability">
<title>Availability of Data and Materials</title>
<p>The datasets used in this study are available from the authors upon reasonable request. The experimental dataset was obtained from torchvision.datasets, and several models were sourced from Hugging Face.</p>
</sec>
<sec>
<title>Ethics Approval</title>
<p>Not applicable.</p>
</sec>
<sec sec-type="COI-statement">
<title>Conflicts of Interest</title>
<p>The authors declare no conflicts of interest to report regarding the present study.</p>
</sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Kipf</surname> <given-names>TN</given-names></string-name>, <string-name><surname>Welling</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Semi-supervised classification with graph convolutional networks</article-title>. <comment>arXiv:1609.02907</comment>. <year>2016</year>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Hemmi</surname> <given-names>K</given-names></string-name>, <string-name><surname>Tanigaki</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Onishi</surname> <given-names>M</given-names></string-name></person-group>. <article-title>NAVIGATOR-D3: neural architecture search using variational graph auto-encoder toward optimal architecture design for diverse datasets</article-title>. In: <conf-name>International Conference on Artificial Neural Networks</conf-name>; <comment>2024 Sep 17&#x2013;20</comment>; <publisher-loc>Lugano, Switzerland</publisher-loc>; <year>2024</year>. p. <fpage>292</fpage>&#x2013;<lpage>307</lpage>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Zoph</surname> <given-names>B</given-names></string-name>, <string-name><surname>Le</surname> <given-names>QV</given-names></string-name></person-group>. <article-title>Neural architecture search with reinforcement learning</article-title>. In: <conf-name>ICLR 2017</conf-name>; <year>2017 Apr 24&#x2013;26</year>; <publisher-loc>Toulon, France</publisher-loc>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Xu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Jia</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>LB</given-names></string-name>, <string-name><surname>Ai</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>F</given-names></string-name>, <string-name><surname>Lai</surname> <given-names>M</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Large scale tissue histopathology image classification, segmentation, and visualization via deep convolutional activation features</article-title>. <source>BMC Bioinform</source>. <year>2017</year>;<volume>18</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>17</lpage>. doi:<pub-id pub-id-type="doi">10.1186/s12859-017-1685-x</pub-id>; <pub-id pub-id-type="pmid">28549410</pub-id></mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Liu</surname> <given-names>H</given-names></string-name>, <string-name><surname>Simonyan</surname> <given-names>K</given-names></string-name>, <string-name><surname>Vinyals</surname> <given-names>O</given-names></string-name>, <string-name><surname>Fernando</surname> <given-names>C</given-names></string-name>, <string-name><surname>Kavukcuoglu</surname> <given-names>K</given-names></string-name></person-group>. <article-title>Hierarchical representations for efficient architecture search</article-title>. <comment>arXiv:1711.00436</comment>. <year>2017</year>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Real</surname> <given-names>E</given-names></string-name>, <string-name><surname>Moore</surname> <given-names>S</given-names></string-name>, <string-name><surname>Selle</surname> <given-names>A</given-names></string-name>, <string-name><surname>Saxena</surname> <given-names>S</given-names></string-name>, <string-name><surname>Suematsu</surname> <given-names>YL</given-names></string-name>, <string-name><surname>Tan</surname> <given-names>J</given-names></string-name> <etal>et al</etal></person-group>. <article-title>Large-scale evolution of image classifiers</article-title>. In: <conf-name>Proceedings of the 34th International Conference on Machine Learning</conf-name>; <comment>2017 Aug 6&#x2013;11</comment>; <publisher-loc>Sydney, NSW, Australia</publisher-loc>; <year>2017</year>. p. <fpage>2902</fpage>&#x2013;<lpage>11</lpage>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Liu</surname> <given-names>H</given-names></string-name>, <string-name><surname>Simonyan</surname> <given-names>K</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>Darts: differentiable architecture search</article-title>. <comment>arXiv:1806.09055</comment>. <year>2018</year>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Xu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Xie</surname> <given-names>L</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>X</given-names></string-name>, <string-name><surname>Qi</surname> <given-names>G</given-names></string-name>, <string-name><surname>Tian</surname> <given-names>Q</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>PC-DARTS: partial channel connections for memory-efficient architecture search</article-title>. In: <conf-name>2020 International Conference on Learning Representations</conf-name>; <year>2020 Apr 30</year>; <publisher-loc>Addis Ababa, Ethiopia</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>13</lpage>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Wong</surname> <given-names>C</given-names></string-name>, <string-name><surname>Houlsby</surname> <given-names>N</given-names></string-name>, <string-name><surname>Lu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Gesmundo</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Transfer learning with neural AutoML</article-title>. In: <conf-name>NIPS&#x2019;18: Proceedings of the 32nd International Conference on Neural Information Processing Systems</conf-name>; <year>2018 Dec 3&#x2013;8</year>; <publisher-loc>Montreal, QC, Canada</publisher-loc>. p. <fpage>8366</fpage>&#x2013;<lpage>75</lpage>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Xue</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Han</surname> <given-names>X</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Z</given-names></string-name></person-group>. <article-title>Self-adaptive weight based on dual-attention for differentiable neural architecture search</article-title>. <source>IEEE Trans Indus Inform</source>. <year>2024</year>;<volume>20</volume>(<issue>4</issue>):<fpage>6394</fpage>&#x2013;<lpage>403</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tii.2023.3348843</pub-id>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Chen</surname> <given-names>M</given-names></string-name>, <string-name><surname>Peng</surname> <given-names>H</given-names></string-name>, <string-name><surname>Fu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Ling</surname> <given-names>H</given-names></string-name></person-group>. <article-title>Autoformer: searching transformers for visual recognition</article-title>. In: <conf-name>Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision</conf-name>; <year>2021 Oct 11&#x2013;17</year>; <publisher-loc>Montreal, QC, Canada</publisher-loc>. p. <fpage>12270</fpage>&#x2013;<lpage>80</lpage>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Chen</surname> <given-names>M</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>K</given-names></string-name>, <string-name><surname>Ni</surname> <given-names>B</given-names></string-name>, <string-name><surname>Peng</surname> <given-names>H</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>B</given-names></string-name>, <string-name><surname>Fu</surname> <given-names>J</given-names></string-name> <etal>et al</etal></person-group>. <article-title>Searching the search space of vision transformer</article-title>. <source>Adv Neural Inform Process Syst</source>. <year>2021</year>;<volume>34</volume>:<fpage>8714</fpage>&#x2013;<lpage>26</lpage>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Nasir</surname> <given-names>MU</given-names></string-name>, <string-name><surname>Earle</surname> <given-names>S</given-names></string-name>, <string-name><surname>Togelius</surname> <given-names>J</given-names></string-name>, <string-name><surname>James</surname> <given-names>S</given-names></string-name>, <string-name><surname>Cleghorn</surname> <given-names>C</given-names></string-name></person-group>. <article-title>LLMatic: neural architecture search via large language models and quality diversity optimization</article-title>. In: <conf-name>Proceedings of the 2024 Genetic and Evolutionary Computation Conference</conf-name>; <year>2024 Jul 14&#x2013;18</year>; <publisher-loc>Melbourne, VIC, Australia</publisher-loc>. p. <fpage>1110</fpage>&#x2013;<lpage>8</lpage>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Zheng</surname> <given-names>M</given-names></string-name>, <string-name><surname>Su</surname> <given-names>X</given-names></string-name>, <string-name><surname>You</surname> <given-names>S</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>F</given-names></string-name>, <string-name><surname>Qian</surname> <given-names>C</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>C</given-names></string-name> <etal>et al</etal></person-group>. <article-title>Can GPT-4 perform neural architecture search?</article-title> <comment> arXiv: 2304.10970</comment>. <year>2023</year>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Scarselli</surname> <given-names>F</given-names></string-name>, <string-name><surname>Gori</surname> <given-names>M</given-names></string-name>, <string-name><surname>Tsoi</surname> <given-names>AC</given-names></string-name>, <string-name><surname>Hagenbuchner</surname> <given-names>M</given-names></string-name>, <string-name><surname>Monfardini</surname> <given-names>G</given-names></string-name></person-group>. <article-title>The graph neural network model</article-title>. <source>IEEE Trans Neural Netw</source>. <year>2008</year>;<volume>20</volume>(<issue>1</issue>):<fpage>61</fpage>&#x2013;<lpage>80</lpage>. doi:<pub-id pub-id-type="doi">10.1109/tnn.2008.2005605</pub-id>; <pub-id pub-id-type="pmid">19068426</pub-id></mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Jiang</surname> <given-names>D</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Hsieh</surname> <given-names>CY</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>G</given-names></string-name>, <string-name><surname>Liao</surname> <given-names>B</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Z</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Could graph neural networks learn better molecular representation for drug discovery? A comparison study of descriptor-based and graph-based models</article-title>. <source>J Cheminform</source>. <year>2021</year>;<volume>13</volume>(<issue>1</issue>):<fpage>12</fpage>. doi:<pub-id pub-id-type="doi">10.21203/rs.3.rs-79416/v1</pub-id>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Flannery</surname> <given-names>ST</given-names></string-name>, <string-name><surname>Kihara</surname> <given-names>D</given-names></string-name></person-group>. <article-title>Protein docking model evaluation by graph neural networks</article-title>. <source>Front Mol Biosci</source>. <year>2021</year>;<volume>8</volume>(<issue>Suppl 1</issue>):<fpage>647915</fpage>. doi:<pub-id pub-id-type="doi">10.1101/2020.12.30.424859</pub-id>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Sanchez-Gonzalez</surname> <given-names>A</given-names></string-name>, <string-name><surname>Godwin</surname> <given-names>J</given-names></string-name>, <string-name><surname>Pfaff</surname> <given-names>T</given-names></string-name>, <string-name><surname>Ying</surname> <given-names>R</given-names></string-name>, <string-name><surname>Leskovec</surname> <given-names>J</given-names></string-name>, <string-name><surname>Battaglia</surname> <given-names>PW</given-names></string-name></person-group>. <article-title>Learning to simulate complex physics with graph networks</article-title>. In: <conf-name>Proceedings of the 2020 International Conference on Machine Learning</conf-name>; <year>2020 Jul 13&#x2013;18</year>; <publisher-loc>Online</publisher-loc>. p. <fpage>8459</fpage>&#x2013;<lpage>68</lpage>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Shlomi</surname> <given-names>J</given-names></string-name>, <string-name><surname>Battaglia</surname> <given-names>P</given-names></string-name>, <string-name><surname>Vlimant</surname> <given-names>JR</given-names></string-name></person-group>. <article-title>Graph neural networks in particle physics</article-title>. <source>Mach Learn Sci Technol</source>. <year>2020</year>;<volume>2</volume>(<issue>2</issue>):<fpage>021001</fpage>. doi:<pub-id pub-id-type="doi">10.1088/2632-2153/abbf9a</pub-id>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Velickovic</surname> <given-names>P</given-names></string-name>, <string-name><surname>Cucurull</surname> <given-names>G</given-names></string-name>, <string-name><surname>Casanova</surname> <given-names>A</given-names></string-name>, <string-name><surname>Romero</surname> <given-names>A</given-names></string-name>, <string-name><surname>Li&#x00F2;</surname> <given-names>P</given-names></string-name>, <string-name><surname>Bengio</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>Graph attention networks</article-title>. In: <conf-name>The 6th International Conference on Learning Representations</conf-name>; <year>2018 Apr 30&#x2013;May 3</year>; <publisher-loc>Vancouver, BC, Canada</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>12</lpage>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Hamilton</surname> <given-names>W</given-names></string-name>, <string-name><surname>Ying</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Leskovec</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Inductive representation learning on large graphs</article-title>. <source>Adv Neural Inform Process Syst</source>. <year>2017</year>;<volume>30</volume>:<fpage>1025</fpage>&#x2013;<lpage>35</lpage>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Defferrard</surname> <given-names>M</given-names></string-name>, <string-name><surname>Bresson</surname> <given-names>X</given-names></string-name>, <string-name><surname>Vandergheynst</surname> <given-names>P</given-names></string-name></person-group>. <article-title>Convolutional neural networks on graphs with fast localized spectral filtering</article-title>. <source>Adv Neural Inf Process Syst</source>. <year>2016</year>;<volume>29</volume>:<fpage>3844</fpage>&#x2013;<lpage>52</lpage>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Xu</surname> <given-names>K</given-names></string-name>, <string-name><surname>Hu</surname> <given-names>W</given-names></string-name>, <string-name><surname>Leskovec</surname> <given-names>J</given-names></string-name>, <string-name><surname>Jegelka</surname> <given-names>S</given-names></string-name></person-group>. <article-title>How powerful are graph neural networks?</article-title> <comment>arXiv:1810.00826</comment>. <year>2018</year>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Kipf</surname> <given-names>TN</given-names></string-name>, <string-name><surname>Welling</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Variational graph auto-encoders</article-title>. <comment>arXiv:1611.07308</comment>. <year>2016</year>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Kingma</surname> <given-names>DP</given-names></string-name>, <string-name><surname>Welling</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Auto-encoding variational bayes</article-title>. <comment>arXiv:1312.6114</comment>. <year>2013</year>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Cappart</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Ch&#x00E9;telat</surname> <given-names>D</given-names></string-name>, <string-name><surname>Khalil</surname> <given-names>EB</given-names></string-name>, <string-name><surname>Lodi</surname> <given-names>A</given-names></string-name>, <string-name><surname>Morris</surname> <given-names>C</given-names></string-name>, <string-name><surname>Veli&#x010D;kovi&#x0107;</surname> <given-names>P</given-names></string-name></person-group>. <article-title>Combinatorial optimization and reasoning with graph neural networks</article-title>. <source>J Mach Learn Res</source>. <year>2023</year>;<volume>24</volume>(<issue>130</issue>):<fpage>1</fpage>&#x2013;<lpage>61</lpage>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Almasan</surname> <given-names>P</given-names></string-name>, <string-name><surname>Su&#x00E1;rez-Varela</surname> <given-names>J</given-names></string-name>, <string-name><surname>Rusek</surname> <given-names>K</given-names></string-name>, <string-name><surname>Barlet-Ros</surname> <given-names>P</given-names></string-name>, <string-name><surname>Cabellos-Aparicio</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Deep reinforcement learning meets graph neural networks: exploring a routing optimization use case</article-title>. <source>Comput Commun</source>. <year>2022</year>;<volume>196</volume>(<issue>4</issue>):<fpage>184</fpage>&#x2013;<lpage>94</lpage>. doi:<pub-id pub-id-type="doi">10.1016/j.comcom.2022.09.029</pub-id>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Zoph</surname> <given-names>B</given-names></string-name>, <string-name><surname>Vasudevan</surname> <given-names>V</given-names></string-name>, <string-name><surname>Shlens</surname> <given-names>J</given-names></string-name>, <string-name><surname>Le</surname> <given-names>QV</given-names></string-name></person-group>. <article-title>Learning transferable architectures for scalable image recognition</article-title>. In: <conf-name>Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition</conf-name>; <year>2018 Jun 18&#x2013;23</year>; <publisher-loc>Salt Lake City, UT, USA</publisher-loc>. p. <fpage>8697</fpage>&#x2013;<lpage>710</lpage>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Mellor</surname> <given-names>J</given-names></string-name>, <string-name><surname>Turner</surname> <given-names>J</given-names></string-name>, <string-name><surname>Storkey</surname> <given-names>A</given-names></string-name>, <string-name><surname>Crowley</surname> <given-names>EJ</given-names></string-name></person-group>. <article-title>Neural architecture search without training</article-title>. In: <conf-name>2021 International Conference on Machine Learning</conf-name>; <year>2021 Jul 18&#x2013;24</year>; <publisher-loc>Online</publisher-loc>. p. <fpage>7588</fpage>&#x2013;<lpage>98</lpage><comment>.</comment></mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Krishnakumar</surname> <given-names>A</given-names></string-name>, <string-name><surname>White</surname> <given-names>C</given-names></string-name>, <string-name><surname>Zela</surname> <given-names>A</given-names></string-name>, <string-name><surname>Tu</surname> <given-names>R</given-names></string-name>, <string-name><surname>Safari</surname> <given-names>M</given-names></string-name>, <string-name><surname>Hutter</surname> <given-names>F</given-names></string-name></person-group>. <article-title>NAS-Bench-Suite-Zero: accelerating research on zero cost proxies</article-title>. <source>Adv Neural Inform Process Syst</source>. <year>2022</year>;<volume>35</volume>:<fpage>28037</fpage>&#x2013;<lpage>51</lpage>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Friede</surname> <given-names>D</given-names></string-name>, <string-name><surname>Lukasik</surname> <given-names>J</given-names></string-name>, <string-name><surname>Stuckenschmidt</surname> <given-names>H</given-names></string-name>, <string-name><surname>Keuper</surname> <given-names>M</given-names></string-name></person-group>. <article-title>A variational-sequential graph autoencoder for neural architecture performance prediction</article-title>. <comment>arXiv:1912.05317</comment>. <year>2019</year>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Yan</surname> <given-names>S</given-names></string-name>, <string-name><surname>Zheng</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Ao</surname> <given-names>W</given-names></string-name>, <string-name><surname>Zeng</surname> <given-names>X</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Does unsupervised architecture representation learning help neural architecture search?</article-title>. In: <conf-name>NIPS&#x2019;20: 34th International Conference on Neural Information Processing Systems</conf-name>; <year>2020 Dec 6&#x2013;12</year>; <publisher-loc>Vancouver, BC, Canada</publisher-loc>.</mixed-citation></ref>
<ref id="ref-33"><label>[33]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Ning</surname> <given-names>X</given-names></string-name>, <string-name><surname>Zheng</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Zhao</surname> <given-names>T</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>H</given-names></string-name></person-group>. <article-title>A generic graph-based neural architecture encoding scheme for predictor-based NAS</article-title>. In: <conf-name>Proceedings of European Conference on Computer Vision</conf-name>; <year>2020 Aug 23&#x2013;28</year>; <publisher-loc>Glasgow, UK</publisher-loc>. p. <fpage>189</fpage>&#x2013;<lpage>204</lpage>.</mixed-citation></ref>
<ref id="ref-34"><label>[34]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Lee</surname> <given-names>H</given-names></string-name>, <string-name><surname>Hyung</surname> <given-names>E</given-names></string-name>, <string-name><surname>Hwang</surname> <given-names>SJ</given-names></string-name></person-group>. <article-title>Rapid neural architecture search by learning to generate graphs from datasets</article-title>. <comment>arXiv:2107.00860</comment>. <year>2021</year>.</mixed-citation></ref>
<ref id="ref-35"><label>[35]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Chatzianastasis</surname> <given-names>M</given-names></string-name>, <string-name><surname>Dasoulas</surname> <given-names>G</given-names></string-name>, <string-name><surname>Siolas</surname> <given-names>G</given-names></string-name>, <string-name><surname>Vazirgiannis</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Graph-based neural architecture search with operation embeddings</article-title>. In: <conf-name>Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops</conf-name>; <year>2021 Oct 11&#x2013;17</year>; <publisher-loc>Montreal, BC, Canada</publisher-loc>. p. <fpage>393</fpage>&#x2013;<lpage>402</lpage>.</mixed-citation></ref>
<ref id="ref-36"><label>[36]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Lukasik</surname> <given-names>J</given-names></string-name>, <string-name><surname>Friede</surname> <given-names>D</given-names></string-name>, <string-name><surname>Zela</surname> <given-names>A</given-names></string-name>, <string-name><surname>Hutter</surname> <given-names>F</given-names></string-name>, <string-name><surname>Keuper</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Smooth variational graph embeddings for efficient neural architecture search</article-title>. In: <conf-name>2021 International Joint Conference on Neural Networks (IJCNN)</conf-name>; <year>2021 Jul 18&#x2013;22</year>; <publisher-loc>Shenzhen, China</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>8</lpage>.</mixed-citation></ref>
<ref id="ref-37"><label>[37]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Suchop&#x00E1;rov&#x00E1;</surname> <given-names>G</given-names></string-name>, <string-name><surname>Neruda</surname> <given-names>R</given-names></string-name></person-group>. <article-title>Graph embedding for neural architecture search with input-output information</article-title>. In: <conf-name>AutoML Conference Workshop Track</conf-name>; <year>2022</year>; <publisher-loc>Baltimore, MD, USA</publisher-loc>.</mixed-citation></ref>
<ref id="ref-38"><label>[38]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Wen</surname> <given-names>W</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>H</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Li</surname> <given-names>H</given-names></string-name>, <string-name><surname>Bender</surname> <given-names>G</given-names></string-name>, <string-name><surname>Kindermans</surname> <given-names>PJ</given-names></string-name></person-group>. <article-title>Neural predictor for neural architecture search</article-title>. In: <conf-name>Proceedings of European Conference on Computer Vision</conf-name>; <year>2020 Aug 23&#x2013;28</year>; <publisher-loc>Glasgow, UK</publisher-loc>. p. <fpage>660</fpage>&#x2013;<lpage>76</lpage>.</mixed-citation></ref>
<ref id="ref-39"><label>[39]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Chen</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Guo</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>Q</given-names></string-name>, <string-name><surname>Li</surname> <given-names>M</given-names></string-name>, <string-name><surname>Zeng</surname> <given-names>W</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>Y</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Contrastive neural architecture search with neural architecture comparators</article-title>. In: <conf-name>Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition</conf-name>; <year>2021 Jun 20&#x2013;25</year>; <publisher-loc>Nashville, TN, USA</publisher-loc>. p. <fpage>9502</fpage>&#x2013;<lpage>11</lpage>.</mixed-citation></ref>
<ref id="ref-40"><label>[40]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>White</surname> <given-names>C</given-names></string-name>, <string-name><surname>Neiswanger</surname> <given-names>W</given-names></string-name>, <string-name><surname>Savani</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>BANANAS: Bayesian optimization with neural architectures for neural architecture search</article-title>. <source>Proc AAAI Conf Artif Intell</source>. <year>2021</year>;<volume>35</volume>(<issue>12</issue>):<fpage>10293</fpage>&#x2013;<lpage>301</lpage>. doi:<pub-id pub-id-type="doi">10.1609/aaai.v35i12.17233</pub-id>.</mixed-citation></ref>
<ref id="ref-41"><label>[41]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Lukasik</surname> <given-names>J</given-names></string-name>, <string-name><surname>Jung</surname> <given-names>S</given-names></string-name>, <string-name><surname>Keuper</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Learning where to look-generative NAS is surprisingly efficient</article-title>. <comment>arXiv:2203.08734</comment>. <year>2022</year>.</mixed-citation></ref>
<ref id="ref-42"><label>[42]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Agiollo</surname> <given-names>A</given-names></string-name>, <string-name><surname>Omicini</surname> <given-names>A</given-names></string-name></person-group>. <article-title>GNN2GNN: graph neural networks to generate neural networks</article-title>. In: <conf-name>Proceedings of the Thirty-Eighth Conference on Uncertainty in Artificial Intelligence</conf-name>; <year>2022 Aug 1&#x2013;5</year>; <publisher-loc>Eindhoven, The Netherlands</publisher-loc>. p. <fpage>32</fpage>&#x2013;<lpage>42</lpage>.</mixed-citation></ref>
<ref id="ref-43"><label>[43]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Dudziak</surname> <given-names>L</given-names></string-name>, <string-name><surname>Chau</surname> <given-names>T</given-names></string-name>, <string-name><surname>Abdelfattah</surname> <given-names>MS</given-names></string-name>, <string-name><surname>Lee</surname> <given-names>R</given-names></string-name>, <string-name><surname>Kim</surname> <given-names>H</given-names></string-name>, <string-name><surname>Lane</surname> <given-names>ND</given-names></string-name></person-group>. <article-title>BRP-NAS: prediction-based NAS using GCNs</article-title>. <source>Adv Neural Inform Process Syst</source>. <year>2020</year>;<volume>33</volume>:<fpage>10480</fpage>&#x2013;<lpage>90</lpage>.</mixed-citation></ref>
<ref id="ref-44"><label>[44]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Shi</surname> <given-names>H</given-names></string-name>, <string-name><surname>Pi</surname> <given-names>R</given-names></string-name>, <string-name><surname>Xu</surname> <given-names>H</given-names></string-name>, <string-name><surname>Li</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Kwok</surname> <given-names>JT</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>T</given-names></string-name></person-group>. <article-title>Bridging the gap between sample-based and one-shot neural architecture search with BONAS</article-title>. <source>Adv Neural Inform Process Syst</source>. <year>2020</year>;<volume>33</volume>:<fpage>1808</fpage>&#x2013;<lpage>19</lpage>.</mixed-citation></ref>
<ref id="ref-45"><label>[45]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Wistuba</surname> <given-names>M</given-names></string-name></person-group>. <article-title>XferNAS: transfer neural architecture search</article-title>. In: <conf-name>Machine Learning and Knowledge Discovery in Databases: European Conference</conf-name>; <year>2021 Sep 13&#x2013;17</year>; <publisher-loc>Bilbao, Spain</publisher-loc>. p. <fpage>247</fpage>&#x2013;<lpage>62</lpage>.</mixed-citation></ref>
<ref id="ref-46"><label>[46]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Singamsetti</surname> <given-names>M</given-names></string-name>, <string-name><surname>Mahajan</surname> <given-names>A</given-names></string-name>, <string-name><surname>Guzdial</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Conceptual expansion neural architecture search (CENAS)</article-title>. <comment>arXiv:2110.03144</comment>. <year>2021</year>.</mixed-citation></ref>
<ref id="ref-47"><label>[47]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Lu</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Sreekumar</surname> <given-names>G</given-names></string-name>, <string-name><surname>Goodman</surname> <given-names>E</given-names></string-name>, <string-name><surname>Banzhaf</surname> <given-names>W</given-names></string-name>, <string-name><surname>Deb</surname> <given-names>K</given-names></string-name>, <string-name><surname>Boddeti</surname> <given-names>VN</given-names></string-name></person-group>. <article-title>Neural architecture transfer</article-title>. <comment>arXiv:2005.05859</comment>. <year>2020</year>.</mixed-citation></ref>
<ref id="ref-48"><label>[48]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Luo</surname> <given-names>R</given-names></string-name>, <string-name><surname>Tian</surname> <given-names>F</given-names></string-name>, <string-name><surname>Qin</surname> <given-names>T</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>E</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>TY</given-names></string-name></person-group>. <article-title>Neural architecture optimization</article-title>. <source>Adv Neural Inform Process Syst</source>. <year>2018</year>;<volume>31</volume>:<fpage>7816</fpage>&#x2013;<lpage>27</lpage>.</mixed-citation></ref>
<ref id="ref-49"><label>[49]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>J</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>J</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>W</given-names></string-name></person-group>. <article-title>Neural architecture optimization with graph VAE</article-title>. <comment>arXiv:2006.10310</comment>. <year>2020</year>.</mixed-citation></ref>
<ref id="ref-50"><label>[50]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Pearson</surname> <given-names>K</given-names></string-name></person-group>. <article-title>On lines and planes of closest fit to systems of points in space</article-title>. <source>London Edinburgh Dublin Philosoph Magaz J Sci</source>. <year>1901</year>;<volume>2</volume>(<issue>11</issue>):<fpage>559</fpage>&#x2013;<lpage>72</lpage>. doi:<pub-id pub-id-type="doi">10.1080/14786440109462720</pub-id>.</mixed-citation></ref>
<ref id="ref-51"><label>[51]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Radford</surname> <given-names>A</given-names></string-name>, <string-name><surname>Kim</surname> <given-names>JW</given-names></string-name>, <string-name><surname>Hallacy</surname> <given-names>C</given-names></string-name>, <string-name><surname>Ramesh</surname> <given-names>A</given-names></string-name>, <string-name><surname>Goh</surname> <given-names>G</given-names></string-name>, <string-name><surname>Agarwal</surname> <given-names>S</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Learning transferable visual models from natural language supervision</article-title>. In: <conf-name>International Conference on Machine Learning</conf-name>; <year>2021 Jul 18&#x2013;24</year>; <publisher-loc>Online</publisher-loc>. p. <fpage>8748</fpage>&#x2013;<lpage>63</lpage>.</mixed-citation></ref>
<ref id="ref-52"><label>[52]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Watanabe</surname> <given-names>K</given-names></string-name>, <string-name><surname>Hemmi</surname> <given-names>K</given-names></string-name>, <string-name><surname>Onishi</surname> <given-names>M</given-names></string-name></person-group>. <article-title>How neural architecture search optimize architecture with different task?</article-title> <source>Forum Data Eng Inform Manage</source>. <year>2025</year>.</mixed-citation></ref>
<ref id="ref-53"><label>[53]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>He</surname> <given-names>K</given-names></string-name>, <string-name><surname>Zhang</surname> <given-names>X</given-names></string-name>, <string-name><surname>Ren</surname> <given-names>S</given-names></string-name>, <string-name><surname>Sun</surname> <given-names>J</given-names></string-name></person-group>. <article-title>Deep residual learning for image recognition</article-title>. In: <conf-name>Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</conf-name>; <year>2016 Jun 27&#x2013;30</year>; <publisher-loc>Las Vegas, NV, USA</publisher-loc>. p. <fpage>770</fpage>&#x2013;<lpage>8</lpage>.</mixed-citation></ref>
<ref id="ref-54"><label>[54]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Achille</surname> <given-names>A</given-names></string-name>, <string-name><surname>Lam</surname> <given-names>M</given-names></string-name>, <string-name><surname>Tewari</surname> <given-names>R</given-names></string-name>, <string-name><surname>Ravichandran</surname> <given-names>A</given-names></string-name>, <string-name><surname>Maji</surname> <given-names>S</given-names></string-name>, <string-name><surname>Fowlkes</surname> <given-names>CC</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Task2Vec: task embedding for meta-learning</article-title>. In: <conf-name>Proceedings of the IEEE/CVF International Conference on Computer Vision</conf-name>; <year>2019 Oct 27&#x2013;Nov 2</year>; <publisher-loc>Seoul, Republic of Korea</publisher-loc>. p. <fpage>6430</fpage>&#x2013;<lpage>9</lpage>.</mixed-citation></ref>
<ref id="ref-55"><label>[55]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Ilharco</surname> <given-names>G</given-names></string-name>, <string-name><surname>Ribeiro</surname> <given-names>MT</given-names></string-name>, <string-name><surname>Wortsman</surname> <given-names>M</given-names></string-name>, <string-name><surname>Schmidt</surname> <given-names>L</given-names></string-name>, <string-name><surname>Hajishirzi</surname> <given-names>H</given-names></string-name>, <string-name><surname>Farhadi</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Editing models with task arithmetic</article-title>. In: <conf-name>The Eleventh International Conference on Learning Representations</conf-name>; <year>2023 May 1&#x2013;5</year>. <publisher-loc>Kigali, Rwanda</publisher-loc>.</mixed-citation></ref>
<ref id="ref-56"><label>[56]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Sabottke</surname> <given-names>CF</given-names></string-name>, <string-name><surname>Spieler</surname> <given-names>BM</given-names></string-name></person-group>. <article-title>The effect of image resolution on deep learning in radiography</article-title>. <source>Radiol Artif Intell</source>. <year>2020</year>;<volume>2</volume>(<issue>1</issue>):<fpage>e190015</fpage>. doi:<pub-id pub-id-type="doi">10.1148/ryai.2019190015</pub-id>; <pub-id pub-id-type="pmid">33937810</pub-id>.</mixed-citation></ref>
<ref id="ref-57"><label>[57]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Krizhevsky</surname> <given-names>A</given-names></string-name>, <string-name><surname>Hinton</surname> <given-names>G</given-names></string-name></person-group>. <article-title>Learning multiple layers of features from tiny images</article-title> [master&#x2019;s thesis]. <publisher-loc>Toronto, ON, Canada</publisher-loc>: <publisher-name>Department of Computer Science, University of Toronto</publisher-name>; <year>2009</year>.</mixed-citation></ref>
<ref id="ref-58"><label>[58]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>LeCun</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Bottou</surname> <given-names>L</given-names></string-name>, <string-name><surname>Bengio</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Haffner</surname> <given-names>P</given-names></string-name></person-group>. <article-title>Gradient-based learning applied to document recognition</article-title>. <source>Proc IEEE</source>. <year>1998</year>;<volume>86</volume>(<issue>11</issue>):<fpage>2278</fpage>&#x2013;<lpage>324</lpage>. doi:<pub-id pub-id-type="doi">10.1109/5.726791</pub-id>.</mixed-citation></ref>
<ref id="ref-59"><label>[59]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Xiao</surname> <given-names>H</given-names></string-name>, <string-name><surname>Rasul</surname> <given-names>K</given-names></string-name>, <string-name><surname>Vollgraf</surname> <given-names>R</given-names></string-name></person-group>. <article-title>Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms</article-title>. <comment>arXiv:1708.07747</comment>. <year>2017</year>.</mixed-citation></ref>
<ref id="ref-60"><label>[60]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Netzer</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Wang</surname> <given-names>T</given-names></string-name>, <string-name><surname>Coates</surname> <given-names>A</given-names></string-name>, <string-name><surname>Bissacco</surname> <given-names>A</given-names></string-name>, <string-name><surname>Wu</surname> <given-names>B</given-names></string-name>, <string-name><surname>Ng</surname> <given-names>AY</given-names></string-name></person-group>. <article-title>Reading digits in natural images with unsupervised feature learning</article-title>. In: <conf-name>NIPS Workshop on Deep Learning and Unsupervised Feature Learning 2011</conf-name>; <year>2011 Dec 16</year>; <publisher-loc>Granada, Spain</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>9</lpage>.</mixed-citation></ref>
<ref id="ref-61"><label>[61]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Parkhi</surname> <given-names>OM</given-names></string-name>, <string-name><surname>Vedaldi</surname> <given-names>A</given-names></string-name>, <string-name><surname>Zisserman</surname> <given-names>A</given-names></string-name>, <string-name><surname>Jawahar</surname> <given-names>C</given-names></string-name></person-group>. <article-title>Cats and dogs</article-title>. In: <conf-name>2012 IEEE Conference on Computer Vision And Pattern Recognition</conf-name>; <year>2012 Jun 16&#x2013;21</year>; <publisher-loc>Providence, RI, USA</publisher-loc>. p. <fpage>3498</fpage>&#x2013;<lpage>505</lpage>.</mixed-citation></ref>
<ref id="ref-62"><label>[62]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Nilsback</surname> <given-names>ME</given-names></string-name>, <string-name><surname>Zisserman</surname> <given-names>A</given-names></string-name></person-group>. <article-title>Automated flower classification over a large number of classes</article-title>. In: <conf-name>2008 Sixth Indian Conference on Computer Vision, Graphics &#x0026; Image Processing</conf-name>; <year>2008 Dec 16&#x2013;19</year>; <publisher-loc>Bhubaneswar, India</publisher-loc>. p. <fpage>722</fpage>&#x2013;<lpage>9</lpage>.</mixed-citation></ref>
<ref id="ref-63"><label>[63]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Coates</surname> <given-names>A</given-names></string-name>, <string-name><surname>Ng</surname> <given-names>A</given-names></string-name>, <string-name><surname>Lee</surname> <given-names>H</given-names></string-name></person-group>. <article-title>An analysis of single-layer networks in unsupervised feature learning</article-title>. In: <conf-name>Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics</conf-name>; <year>2011 Apr 11&#x2013;13</year>; <publisher-loc>Fort Lauderdale, FL, USA</publisher-loc>. p. <fpage>215</fpage>&#x2013;<lpage>23</lpage>.</mixed-citation></ref>
<ref id="ref-64"><label>[64]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Van der Maaten</surname> <given-names>L</given-names></string-name>, <string-name><surname>Hinton</surname> <given-names>G</given-names></string-name></person-group>. <article-title>Visualizing data using t-SNE</article-title>. <source>J Mach Learn Res</source>. <year>2008</year>;<volume>9</volume>(<issue>86</issue>):<fpage>2579</fpage>&#x2013;<lpage>605</lpage>.</mixed-citation></ref>
<ref id="ref-65"><label>[65]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>McInnes</surname> <given-names>L</given-names></string-name>, <string-name><surname>Healy</surname> <given-names>J</given-names></string-name>, <string-name><surname>Melville</surname> <given-names>J</given-names></string-name></person-group>. <article-title>UMAP: Uniform manifold approximation and projection for dimension reduction</article-title>. <comment>arXiv:1802.03426</comment>. <year>2018</year>.</mixed-citation></ref>
<ref id="ref-66"><label>[66]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Tenenbaum</surname> <given-names>JB</given-names></string-name>, <string-name><surname>de Silva</surname> <given-names>V</given-names></string-name>, <string-name><surname>Langford</surname> <given-names>JC</given-names></string-name></person-group>. <article-title>A global geometric framework for nonlinear dimensionality reduction</article-title>. <source>Science</source>. <year>2000</year>;<volume>290</volume>(<issue>5500</issue>):<fpage>2319</fpage>&#x2013;<lpage>23</lpage>. doi:<pub-id pub-id-type="doi">10.1126/science.290.5500.2319</pub-id>; <pub-id pub-id-type="pmid">11125149</pub-id>.</mixed-citation></ref>
<ref id="ref-67"><label>[67]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Kruskal</surname> <given-names>JB</given-names></string-name></person-group>. <article-title>Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis</article-title>. <source>Psychometrika</source>. <year>1964</year>;<volume>29</volume>(<issue>1</issue>):<fpage>1</fpage>&#x2013;<lpage>27</lpage>. doi:<pub-id pub-id-type="doi">10.1007/bf02289565</pub-id>.</mixed-citation></ref>
<ref id="ref-68"><label>[68]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Roweis</surname> <given-names>ST</given-names></string-name>, <string-name><surname>Saul</surname> <given-names>LK</given-names></string-name></person-group>. <article-title>Nonlinear dimensionality reduction by locally linear embedding</article-title>. <source>Science</source>. <year>2000</year>;<volume>290</volume>(<issue>5500</issue>):<fpage>2323</fpage>&#x2013;<lpage>6</lpage>. doi:<pub-id pub-id-type="doi">10.1126/science.290.5500.2323</pub-id>; <pub-id pub-id-type="pmid">11125150</pub-id>.</mixed-citation></ref>
<ref id="ref-69"><label>[69]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Hull</surname> <given-names>JJ</given-names></string-name></person-group>. <article-title>A database for handwritten text recognition research</article-title>. <source>IEEE Trans Pattern Anal Mach Intell</source>. <year>1994</year>;<volume>16</volume>(<issue>5</issue>):<fpage>550</fpage>&#x2013;<lpage>4</lpage>. doi:<pub-id pub-id-type="doi">10.1109/34.291440</pub-id>.</mixed-citation></ref>
<ref id="ref-70"><label>[70]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Schrod</surname> <given-names>S</given-names></string-name>, <string-name><surname>Lippl</surname> <given-names>J</given-names></string-name>, <string-name><surname>Sch&#x00E4;fer</surname> <given-names>A</given-names></string-name>, <string-name><surname>Altenbuchinger</surname> <given-names>M</given-names></string-name></person-group>. <article-title>FACT: federated adversarial cross training</article-title>. <comment>arXiv:2306.00607</comment>. <year>2023</year>.</mixed-citation></ref>
<ref id="ref-71"><label>[71]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Wang</surname> <given-names>J</given-names></string-name>, <string-name><surname>Chen</surname> <given-names>J</given-names></string-name>, <string-name><surname>Lin</surname> <given-names>J</given-names></string-name>, <string-name><surname>Sigal</surname> <given-names>L</given-names></string-name>, <string-name><surname>de Silva</surname> <given-names>CW</given-names></string-name></person-group>. <article-title>Discriminative feature alignment: improving transferability of unsupervised domain adaptation by Gaussian-guided latent alignment</article-title>. <source>Pattern Recognit</source>. <year>2021</year>;<volume>116</volume>(<issue>1</issue>):<fpage>107943</fpage>. doi:<pub-id pub-id-type="doi">10.1016/j.patcog.2021.107943</pub-id>.</mixed-citation></ref>
<ref id="ref-72"><label>[72]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>French</surname> <given-names>G</given-names></string-name>, <string-name><surname>Mackiewicz</surname> <given-names>M</given-names></string-name>, <string-name><surname>Fisher</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Self-ensembling for visual domain adaptation</article-title>. <comment>arXiv:1706.05208</comment>. <year>2017</year>.</mixed-citation></ref>
<ref id="ref-73"><label>[73]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Russakovsky</surname> <given-names>O</given-names></string-name>, <string-name><surname>Deng</surname> <given-names>J</given-names></string-name>, <string-name><surname>Su</surname> <given-names>H</given-names></string-name>, <string-name><surname>Krause</surname> <given-names>J</given-names></string-name>, <string-name><surname>Satheesh</surname> <given-names>S</given-names></string-name>, <string-name><surname>Ma</surname> <given-names>S</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Imagenet large scale visual recognition challenge</article-title>. <source>Int J Comput Vis</source>. <year>2015</year>;<volume>115</volume>(<issue>3</issue>):<fpage>211</fpage>&#x2013;<lpage>52</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s11263-015-0816-y</pub-id>.</mixed-citation></ref>
<ref id="ref-74"><label>[74]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Muravev</surname> <given-names>A</given-names></string-name>, <string-name><surname>Raitoharju</surname> <given-names>J</given-names></string-name>, <string-name><surname>Gabbouj</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Neural architecture search by estimation of network structure distributions</article-title>. <source>IEEE Access</source>. <year>2021</year>;<volume>9</volume>:<fpage>15304</fpage>&#x2013;<lpage>19</lpage>. doi:<pub-id pub-id-type="doi">10.1109/access.2021.3052996</pub-id>.</mixed-citation></ref>
<ref id="ref-75"><label>[75]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Li</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Peng</surname> <given-names>X</given-names></string-name></person-group>. <article-title>Network architecture search for domain adaptation</article-title>. <comment>arXiv:2008.05706</comment>. <year>2020</year>.</mixed-citation></ref>
</ref-list>
</back></article>