<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xml:lang="en" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">JQC</journal-id>
<journal-id journal-id-type="nlm-ta">JQC</journal-id>
<journal-id journal-id-type="publisher-id">JQC</journal-id>
<journal-title-group>
<journal-title>Journal of Quantum Computing</journal-title>
</journal-title-group>
<issn pub-type="epub">2579-0145</issn>
<issn pub-type="ppub">2579-0137</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">61275</article-id>
<article-id pub-id-type="doi">10.32604/jqc.2025.061275</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>A Genetic Approach to Minimising Gate and Qubit Teleportations for Multi-Processor Quantum Circuit Distribution</article-title>
<alt-title alt-title-type="left-running-head">A Genetic Approach to Minimising Gate and Qubit Teleportations for Multi-Processor Quantum Circuit Distribution</alt-title>
<alt-title alt-title-type="right-running-head">A Genetic Approach to Minimising Gate and Qubit Teleportations for Multi-Processor Quantum Circuit Distribution</alt-title>
</title-group>
<contrib-group>
<contrib id="author-1" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Crampton</surname><given-names>Oliver</given-names></name><xref ref-type="aff" rid="aff-1">1</xref><email>omc2000@hw.ac.uk</email></contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Promponas</surname><given-names>Panagiotis</given-names></name><xref ref-type="aff" rid="aff-1">1</xref><xref ref-type="aff" rid="aff-2">2</xref></contrib>
<contrib id="author-3" contrib-type="author">
<name name-style="western"><surname>Chen</surname><given-names>Richard</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-4" contrib-type="author">
<name name-style="western"><surname>Polakos</surname><given-names>Paul</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<contrib id="author-5" contrib-type="author">
<name name-style="western"><surname>Tassiulas</surname><given-names>Leandros</given-names></name><xref ref-type="aff" rid="aff-2">2</xref></contrib>
<contrib id="author-6" contrib-type="author">
<name name-style="western"><surname>Samuel</surname><given-names>Louis</given-names></name><xref ref-type="aff" rid="aff-1">1</xref></contrib>
<aff id="aff-1"><label>1</label><institution>Cisco Systems UK, Bedfont Lakes</institution>, <addr-line>London, TW14 8HA</addr-line>, <country>UK</country></aff>
<aff id="aff-2"><label>2</label><institution>Department of Electrical Engineering, Yale University</institution>, <addr-line>New Haven, CT 06511</addr-line>, <country>USA</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Oliver Crampton. Email: <email>omc2000@hw.ac.uk</email></corresp>
</author-notes>
<pub-date date-type="collection" publication-format="electronic">
<year>2025</year></pub-date>
<pub-date date-type="pub" publication-format="electronic">
<day>21</day>
<month>03</month>
<year>2025</year>
</pub-date>
<volume>7</volume>
<issue>1</issue>
<fpage>1</fpage>
<lpage>15</lpage>
<history>
<date date-type="received">
<day>21</day>
<month>11</month>
<year>2024</year>
</date>
<date date-type="accepted">
<day>20</day>
<month>2</month>
<year>2025</year>
</date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2025 The Authors.</copyright-statement>
<copyright-year>2025</copyright-year>
<copyright-holder>Published by Tech Science Press.</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_JQC_61275.pdf"></self-uri>
<abstract>
<p>Distributed Quantum Computing (DQC) provides a means for scaling available quantum computation by interconnecting multiple quantum processor units (QPUs). A key challenge in this domain is efficiently allocating logical qubits from quantum circuits to the physical qubits within QPUs, a task known to be NP-hard. Traditional approaches, primarily focused on graph partitioning strategies, have sought to reduce the number of required Bell pairs for executing non-local CNOT operations, a form of gate teleportation. However, these methods have limitations in terms of efficiency and scalability. Addressing this, our work jointly considers gate and qubit teleportations introducing a novel meta-heuristic algorithm to minimise the network cost of executing a quantum circuit. By allowing dynamic reallocation of qubits along with gate teleportations during circuit execution, our method significantly enhances the overall efficacy and potential scalability of DQC frameworks. In our numerical analysis, we demonstrate that integrating qubit teleportations into our genetic algorithm for optimizing circuit blocking reduces the required resources, specifically the number of EPR pairs, compared to traditional graph partitioning methods. Our results, derived from both benchmark and randomly generated circuits, show that as circuit complexity increases&#x2014;demanding more qubit teleportations&#x2014;our approach effectively optimises these teleportations throughout the execution, thereby enhancing performance through strategic circuit partitioning. This is a step forward in the pursuit of a global quantum compiler which will ultimately enable the efficient use of a &#x2018;quantum data center&#x2019; in the future.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Distributed quantum computing</kwd>
<kwd>optimisation</kwd>
<kwd>teleportation</kwd>
<kwd>heuristic</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1">
<label>1</label>
<title>Introduction</title>
<p>Quantum computers can, in principle, perform tasks that have previously been impossible or highly inefficient on classical computers [<xref ref-type="bibr" rid="ref-1">1</xref>], such as factoring large numbers using Shor&#x2019;s algorithm [<xref ref-type="bibr" rid="ref-2">2</xref>,<xref ref-type="bibr" rid="ref-3">3</xref>] or simulating quantum systems [<xref ref-type="bibr" rid="ref-4">4</xref>,<xref ref-type="bibr" rid="ref-5">5</xref>]. However, the scaling of monolithic quantum processors towards doing useful, error-free computations is difficult to achieve [<xref ref-type="bibr" rid="ref-6">6</xref>]. To address this, companies such as IBM are exploring inter-connected distributed quantum processor units (QPUs), which require quantum networking to execute complex circuits at scale. Similar to classical distributed computing, the execution of a quantum circuit may be performed across many smaller quantum computers. In this scenario, a quantum network is needed to perform the necessary operations between each QPU.</p>
<p>Quantum circuits are a visual way to represent the temporal order of the single- and multi-qubit gates that implement an algorithm. Quantum gates are unitary operations that act on a logical qubit state <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mi>&#x03C8;</mml:mi><mml:mo fence="false" stretchy="false">&#x27E9;</mml:mo><mml:mo>=</mml:mo><mml:mi>&#x03B1;</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mn>0</mml:mn><mml:mo fence="false" stretchy="false">&#x27E9;</mml:mo><mml:mo>+</mml:mo><mml:mi>&#x03B2;</mml:mi><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mn>1</mml:mn><mml:mo fence="false" stretchy="false">&#x27E9;</mml:mo></mml:math></inline-formula>. A logical qubit is the unit of information used in a quantum computation and may be constructed from many physical qubits inside a QPU to perform error correction. A more detailed review of quantum information and quantum computing can be found in the seminal text by Nielsen et al. [<xref ref-type="bibr" rid="ref-7">7</xref>]. In Distributed Quantum Computing (DQC) there are three important types of operation: single-qubit gates (e.g., Pauli rotation, Hadamard), local CNOT (cx) gates, whose control and target qubits lie within the same processor, and non-local CNOT gates (also called telegates), whose control and target qubits are stored in different QPUs. The Hadamard gate, Pauli gates, and CNOT operations form a universal set for quantum computation [<xref ref-type="bibr" rid="ref-8">8</xref>], meaning that any arbitrary unitary transformation of a quantum state can be expressed using only these gates [<xref ref-type="bibr" rid="ref-9">9</xref>]. Henceforth, we assume that all circuits have been decomposed into this universal set of operations. The CNOT gate is a two-qubit operator in which one qubit (the control) determines whether a NOT is applied to the other (the target); it is given by the following matrix:
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:mi>C</mml:mi><mml:mi>N</mml:mi><mml:mi>O</mml:mi><mml:mi>T</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mtable columnalign="left left left left" rowspacing="4pt" columnspacing="1em"><mml:mtr><mml:mtd><mml:mn>1</mml:mn></mml:mtd><mml:mtd><mml:mn>0</mml:mn></mml:mtd><mml:mtd><mml:mn>0</mml:mn></mml:mtd><mml:mtd><mml:mn>0</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>0</mml:mn></mml:mtd><mml:mtd><mml:mn>1</mml:mn></mml:mtd><mml:mtd><mml:mn>0</mml:mn></mml:mtd><mml:mtd><mml:mn>0</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>0</mml:mn></mml:mtd><mml:mtd><mml:mn>0</mml:mn></mml:mtd><mml:mtd><mml:mn>0</mml:mn></mml:mtd><mml:mtd><mml:mn>1</mml:mn></mml:mtd></mml:mtr><mml:mtr><mml:mtd><mml:mn>0</mml:mn></mml:mtd><mml:mtd><mml:mn>0</mml:mn></mml:mtd><mml:mtd><mml:mn>1</mml:mn></mml:mtd><mml:mtd><mml:mn>0</mml:mn></mml:mtd></mml:mtr></mml:mtable><mml:mo>]</mml:mo></mml:mrow></mml:math></disp-formula></p>
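<p>As a quick check of Eq. (1), the following minimal sketch applies the matrix to each computational basis state. It assumes the basis ordering |00&#x27E9;, |01&#x27E9;, |10&#x27E9;, |11&#x27E9; with the control qubit written first; the helper names <monospace>apply</monospace>, <monospace>basis</monospace> and <monospace>cnot_on_label</monospace> are illustrative and not part of any library.</p>
<code language="python"><![CDATA[
```python
# Assumed basis ordering: |00>, |01>, |10>, |11>, control qubit first.
CNOT = [
    [1, 0, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 0, 1],
    [0, 0, 1, 0],
]

LABELS = ["00", "01", "10", "11"]


def apply(matrix, state):
    """Multiply a 4x4 matrix by a length-4 state vector."""
    return [sum(m * s for m, s in zip(row, state)) for row in matrix]


def basis(label):
    """Return the computational basis vector for a two-bit label."""
    return [1 if name == label else 0 for name in LABELS]


def cnot_on_label(label):
    """Apply CNOT to a basis state and return the label of the result."""
    out = apply(CNOT, basis(label))
    return LABELS[out.index(1)]


# The target bit flips only when the control bit is 1.
truth_table = {label: cnot_on_label(label) for label in LABELS}
for source, result in truth_table.items():
    print(f"|{source}> -> |{result}>")
```
]]></code>
<p>The resulting table leaves |00&#x27E9; and |01&#x27E9; unchanged and exchanges |10&#x27E9; and |11&#x27E9;, and applying the matrix twice returns any state to itself.</p>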
<p>To execute a distributed algorithm, the logical qubits must be mapped to physical qubits in the QPUs and a means for performing operations between non-local qubits must be established. Two types of non-local operation can occur: qubit teleportation and gate teleportation. Qubit teleportation is the process of transferring a logical qubit state from one QPU to another via the use of a Bell state (EPR pair) [<xref ref-type="bibr" rid="ref-10">10</xref>] and some classical communication. Gate teleportation, on the other hand, does not move the qubit state but allows one qubit to control the operation on another, distant qubit. The circuits required to perform both qubit teleportation and a teleported controlled operation between two distant logical qubits [<xref ref-type="bibr" rid="ref-11">11</xref>] are shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>. Both operations make use of Bell states, which need to be distributed by the network (one half to each QPU). This is a costly procedure, hence the need to minimise the number of teleportations, both to reduce the load on the network and to maximise the likelihood of successfully executing a quantum circuit before the qubits decohere.</p>
<fig id="fig-1">
<label>Figure 1</label>
<caption>
<title>Circuits that implement (a) a teleportation operation of a state <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mi>&#x03C8;</mml:mi><mml:mo fence="false" stretchy="false">&#x27E9;</mml:mo></mml:math></inline-formula> from QPU 1 to QPU 2, and (b) a teleportation of a CNOT operation between a control qubit and target qubit that are stored in different QPUs. Both operations require a Bell state <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:mrow><mml:mo stretchy="false">|</mml:mo></mml:mrow><mml:mi mathvariant="normal">&#x03A6;</mml:mi><mml:mo>+</mml:mo><mml:mo fence="false" stretchy="false">&#x27E9;</mml:mo></mml:math></inline-formula> as well as the transmission of classical bits that correspond to the outcome of qubit measurements</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="JQC_61275-fig-1.tif"/>
</fig>
<p>When executing a quantum circuit on a single QPU, it&#x2019;s crucial for the compiler to dynamically map logical qubits to neighboring physical positions. This mapping allows gate operations by enabling direct interactions. Numerous studies have addressed the challenges and solutions related to quantum circuit compilation (e.g., [<xref ref-type="bibr" rid="ref-12">12</xref>&#x2013;<xref ref-type="bibr" rid="ref-14">14</xref>]) in the case of a single QPU. Our work, however, assumes a fully connected architecture for QPUs, where each qubit can directly interact with any other. A similar assumption would be the existence of an efficient compiler that handles the qubit mapping inside each QPU separately. Such assumptions allow us to abstract away the constraints of the compilation problem, focusing instead on optimising the network operations necessary for distributed quantum computation. Assuming only gate teleportations as a means towards DQC, previous works have used various heuristic methods to minimise solely the number of non-local (controlled) operations within a circuit [<xref ref-type="bibr" rid="ref-15">15</xref>&#x2013;<xref ref-type="bibr" rid="ref-19">19</xref>].</p>
<p>Although previous works have primarily considered the minimisation of gate teleportations, this work sets out to jointly consider gate and qubit teleportations as enablers of DQC. Thus, the network cost is associated with the Bell pairs requested by both teleportation operations. Recently, the authors in [<xref ref-type="bibr" rid="ref-20">20</xref>] introduced a graph partitioning framework that incorporates qubit teleportations into network cost calculations. While their work also recognizes the importance of qubit teleportations in DQC, it differs from ours in its constraint of equalizing operation counts across QPUs. Our study, in contrast, views the generation of Bell pairs for network operations as the primary limiting factor, thereby allowing for more flexible QPU operation allocations. Reference [<xref ref-type="bibr" rid="ref-21">21</xref>] employs Quadratic Unconstrained Binary Optimisation to minimise the network cost assuming only qubit teleportations. In that work, the authors divide a quantum circuit into predetermined slices such that each slice can be run without the need for gate teleportations. Finally, in [<xref ref-type="bibr" rid="ref-22">22</xref>], the authors employ a window-based partitioning of a quantum circuit that also considers qubit teleportations. However, the optimisation over qubit teleportations is realized through a tuning parameter that determines how &#x201C;hard&#x201D; it should be for a qubit to migrate to a different QPU. In contrast, in our work, we optimise the slicing of the circuit by allowing gate teleportations within each slice and qubit teleportations across slices to facilitate distributed quantum computation.</p>
<p>Specifically, this paper addresses the problem of modeling and minimising the network cost of executing a quantum circuit in a DQC framework. Allowing both gate and qubit teleportations, we dynamically allocate the logical qubits to physical qubits in the quantum processors so that a quantum algorithm can be executed in a distributed manner. Since both teleportation operations require a Bell pair, our goal is to minimise the number of teleportations needed to complete the execution. For this purpose, we propose a novel meta-heuristic called Optimised Distributed Quantum Circuit Execution via Meta-Heuristic Approach (ODQC-MHA) that uses a genetic algorithm. Our method significantly enhances the overall efficacy and potential scalability of the DQC framework by dynamically allocating the qubits across distributed QPUs.</p>
<p>The rest of the paper is organised as follows. In <xref ref-type="sec" rid="s2">Section 2</xref> we introduce the logical qubit allocation problem for static assignment within a monolithic QPU as well as a straightforward partitioning heuristic. <xref ref-type="sec" rid="s3">Section 3</xref> introduces qubit teleportations to the qubit mapping problem and describes our meta-heuristic for solving this problem approximately. <xref ref-type="sec" rid="s4">Section 4</xref> presents the performance of our meta-heuristic on benchmark circuits and randomly generated circuits. Finally, <xref ref-type="sec" rid="s5">Section 5</xref> concludes the paper.</p>
</sec>
<sec id="s2">
<label>2</label>
<title>Qubit Allocation to Minimise Gate Teleportations</title>
<p>Focusing only on gate teleportations as the means towards DQC, one way to decrease the network cost is to leverage graph partitioning algorithms [<xref ref-type="bibr" rid="ref-23">23</xref>&#x2013;<xref ref-type="bibr" rid="ref-25">25</xref>] on an appropriately generated graph. In this section, we describe such a process for the case of two and multiple QPUs (<xref ref-type="sec" rid="s2_2">Sections 2.2</xref> and <xref ref-type="sec" rid="s2_3">2.3</xref>, respectively).</p>
<p>In this study, we operate under the assumption that there is complete connectivity both between and within the QPUs. This means we overlook any compilation within a QPU and presume that, in a QPU that is not fully connected, the compiler handles the swaps required to bring interacting qubits adjacent to one another. Research has been done on simulating circuits on realistic, constrained processor architectures by minimising the number of swaps required to execute controlled operations [<xref ref-type="bibr" rid="ref-26">26</xref>]. In practice, constraints on the connectivity within a processor are likely to exist; one would then need a global compiler to maximise the ability to execute a circuit on NISQ devices.</p>
<sec id="s2_1">
<label>2.1</label>
<title>Minimising Gate Teleportation: Model &#x0026; Problem Description</title>
<p>In the graph representation of a circuit, denoted <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:mi>G</mml:mi><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mi>E</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>, <italic>V</italic> represents the set of <italic>n</italic> qubits and <italic>E</italic> defines the connections between qubits. The edge-weight function
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:mi>c</mml:mi><mml:mo>&#x003A;</mml:mo><mml:mi>V</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mi>V</mml:mi><mml:mo stretchy="false">&#x2192;</mml:mo><mml:msub><mml:mrow><mml:mi mathvariant="double-struck">N</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:math></disp-formula>where <inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:msub><mml:mrow><mml:mi mathvariant="double-struck">N</mml:mi></mml:mrow><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> denotes the set of natural numbers including zero, gives the number of controlled operations between qubits <italic>u</italic> and <italic>v</italic>. Therefore, <inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:mi>c</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>u</mml:mi><mml:mo>,</mml:mo><mml:mi>v</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mn>0</mml:mn></mml:math></inline-formula> indicates the absence of an edge, and thus of a CNOT gate, between <italic>u</italic> and <italic>v</italic>.
The cost of a partition, <inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:mi>c</mml:mi><mml:mi>o</mml:mi><mml:mi>s</mml:mi><mml:mi>t</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>, is defined as the sum of all weights <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:mrow><mml:mi>c</mml:mi></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mrow><mml:mi>u</mml:mi></mml:mrow><mml:mo>,</mml:mo><mml:mrow><mml:mi>v</mml:mi></mml:mrow><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> for which <inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:mi>u</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:mi>v</mml:mi><mml:mo>&#x2208;</mml:mo><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>, that is, for which the two qubits belong to different partitions. The aim is to find <italic>k</italic> partitions of the graph, each of size at most <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:mi>v</mml:mi><mml:mo>=</mml:mo><mml:mi>n</mml:mi><mml:mrow><mml:mo>/</mml:mo></mml:mrow><mml:mi>k</mml:mi></mml:math></inline-formula>, such that the total weight of the edges between partitions is minimised, thus reducing the number of non-local controlled operations and the number of Bell pairs required. The constraint of almost equal partitions serves to minimise the maximum number of physical qubits needed from any one QPU. Minimising the number of non-local operations is crucial because entanglement is a costly resource, and distributing it across a network of QPUs requires extra time steps and is error-prone [<xref ref-type="bibr" rid="ref-27">27</xref>].
A schematic of the graph partitioning method is shown in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>.</p>
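<p>The model above can be made concrete with a short sketch. It builds the edge-weight function <italic>c</italic> from a list of CNOT gates and evaluates the cost of a given allocation as the total weight of edges whose endpoints sit on different QPUs. The circuit, the two allocations, and the function names <monospace>interaction_weights</monospace> and <monospace>cut_cost</monospace> are hypothetical and chosen only for illustration.</p>
<code language="python"><![CDATA[
```python
from collections import Counter


def interaction_weights(cnots):
    """Count controlled operations per unordered qubit pair, i.e. c(u, v)."""
    weights = Counter()
    for control, target in cnots:
        weights[tuple(sorted((control, target)))] += 1
    return weights


def cut_cost(weights, assignment):
    """Sum c(u, v) over pairs whose qubits are assigned to different QPUs."""
    return sum(
        count
        for (u, v), count in weights.items()
        if assignment[u] != assignment[v]
    )


# Hypothetical 4-qubit circuit given as (control, target) pairs.
cnots = [(0, 1), (1, 0), (0, 1), (2, 3), (1, 2)]
weights = interaction_weights(cnots)

# Two candidate allocations, each mapping qubit -> QPU index.
good = {0: 0, 1: 0, 2: 1, 3: 1}
bad = {0: 0, 1: 1, 2: 0, 3: 1}

print("non-local CNOTs (good):", cut_cost(weights, good))
print("non-local CNOTs (bad): ", cut_cost(weights, bad))
```
]]></code>
<p>Under the first allocation only the single CNOT between qubits 1 and 2 is non-local, whereas the second allocation makes every CNOT non-local, so each of its five gates would consume a Bell pair.</p>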
<fig id="fig-2">
<label>Figure 2</label>
<caption>
<title>Example of the initial qubit allocation process via graph partitioning to minimise non-local operations between 3 processors of sizes (6, 8, 12). The edge weight is signified by the line colour; darker lines represent a higher number of non-local operations</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="JQC_61275-fig-2.tif"/>
</fig>
</sec>
<sec id="s2_2">
<label>2.2</label>
<title>K-L Algorithm for 2 QPUs</title>
<p>One commonly used heuristic method for bi-partitioning a weighted graph is the Kernighan-Lin (K-L) algorithm [<xref ref-type="bibr" rid="ref-23">23</xref>]. The K-L algorithm takes the weighted graph <inline-formula id="ieqn-12"><mml:math id="mml-ieqn-12"><mml:mi>G</mml:mi><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mi>V</mml:mi><mml:mo>,</mml:mo><mml:mi>E</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> with edge-weight function <italic>c</italic>, together with an initial bi-partition. In each pass, it repeatedly selects the pair of unlocked vertices <inline-formula id="ieqn-13"><mml:math id="mml-ieqn-13"><mml:msub><mml:mi>u</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-14"><mml:math id="mml-ieqn-14"><mml:msub><mml:mi>v</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:msub><mml:mi>V</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> whose swap gives the maximum cost improvement, swaps them, and locks them in place, continuing with further pairs until all vertices are locked. The best configuration encountered is chosen, and the algorithm is run again until a close-to-optimal configuration is found. The downside of the K-L algorithm is that it can only be used to bi-partition a graph. In the next section we propose a similarly simple heuristic for partitioning a graph into any number of partitions <italic>k</italic>. This extension enables the division of a quantum circuit&#x2019;s logical qubits across multiple QPUs, beyond just two, when available. For all cases where the K-L algorithm is used in this study, a sufficiently high number of iterations is used to ensure that the algorithm is allowed to &#x201C;converge&#x201D;.</p>
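<p>To make the swap-and-lock procedure concrete, the following is a minimal pure-Python sketch of the textbook K-L pass. It is not the implementation used in this study. The weights are assumed to be stored in a dictionary keyed by sorted qubit pairs, vertices are assumed to be integers, and all function names are illustrative.</p>
<code language="python"><![CDATA[
```python
def weight(w, u, v):
    """Symmetric lookup of c(u, v); missing pairs have weight zero."""
    return w.get((min(u, v), max(u, v)), 0)


def cut_cost(w, part_a, part_b):
    """Total weight of the edges crossing the bi-partition."""
    return sum(weight(w, a, b) for a in part_a for b in part_b)


def swap_gain(w, d, a, b):
    """Reduction in cut cost obtained by exchanging a and b."""
    return d[a] + d[b] - 2 * weight(w, a, b)


def kl_pass(w, part_a, part_b):
    """Run one K-L pass; return (new_a, new_b, cost_reduction)."""
    a_set, b_set = set(part_a), set(part_b)
    # d[v] = (weight to the other side) - (weight to v's own side).
    d = {}
    for own, other in ((a_set, b_set), (b_set, a_set)):
        for v in own:
            external = sum(weight(w, v, x) for x in other)
            internal = sum(weight(w, v, x) for x in own if x != v)
            d[v] = external - internal
    free_a, free_b = sorted(a_set), sorted(b_set)
    swaps, gains = [], []
    while free_a and free_b:
        # Pick the unlocked pair whose swap reduces the cut the most.
        pairs = [(a, b) for a in free_a for b in free_b]
        a, b = max(pairs, key=lambda p: swap_gain(w, d, p[0], p[1]))
        gains.append(swap_gain(w, d, a, b))
        swaps.append((a, b))
        free_a.remove(a)  # lock both vertices
        free_b.remove(b)
        # Update d for the remaining vertices as if a and b were swapped.
        for x in free_a:
            d[x] += 2 * weight(w, x, a) - 2 * weight(w, x, b)
        for y in free_b:
            d[y] += 2 * weight(w, y, b) - 2 * weight(w, y, a)
    # Keep only the prefix of swaps with the best cumulative gain.
    best_k, best_total, running = 0, 0, 0
    for k, gain in enumerate(gains, start=1):
        running += gain
        if running > best_total:
            best_k, best_total = k, running
    for a, b in swaps[:best_k]:
        a_set = (a_set - {a}) | {b}
        b_set = (b_set - {b}) | {a}
    return a_set, b_set, best_total


def kernighan_lin(w, part_a, part_b):
    """Repeat passes until a pass yields no further improvement."""
    while True:
        part_a, part_b, reduction = kl_pass(w, part_a, part_b)
        if reduction == 0:
            return part_a, part_b


# Hypothetical weights: qubits 0-1 and 2-3 interact heavily.
w = {(0, 1): 3, (2, 3): 3, (0, 2): 1, (1, 3): 1}
start_a, start_b = {0, 2}, {1, 3}
final_a, final_b = kernighan_lin(w, start_a, start_b)
print(cut_cost(w, start_a, start_b), "->", cut_cost(w, final_a, final_b))
```
]]></code>
<p>In this toy instance the initial split separates both heavily interacting pairs; a single pass brings each pair onto the same side, and the next pass finds no improving swap and terminates.</p>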
</sec>
<sec id="s2_3">
<label>2.3</label>
<title>Greedy Partitioning Algorithm in the Case of Multiple QPUs</title>
<p>In this section we describe the Greedy Partitioning Algorithm (GPA) for the case of multiple QPUs, a straightforward and practical heuristic for distributing the qubits of circuits with varying numbers of qubits and depths across any number and size of QPUs. The heuristic works by always contracting the largest-weight edge to build supernodes, each representing a QPU, one by one. Once a supernode (QPU) has been filled to capacity, none of the nodes within it can be swapped later in the algorithm. At each step of the heuristic, the largest-weight edge adjacent to the supernode is contracted, with no look-ahead. This method is computationally inexpensive and so can be used as a quick heuristic for qubit allocation within our proposed meta-heuristic (introduced in <xref ref-type="sec" rid="s3">Section 3</xref>). The pseudo-code for GPA is shown in Algorithm 1. The heuristic is performed once to produce a good allocation of qubits to processors, in which the number of interactions between processors is reduced. The algorithm is implemented in Python and utilises the NetworkX framework to perform the edge contractions. Here, an edge contraction (also called &#x2018;vertex identification&#x2019;) is the process of producing a graph in which two nodes <italic>v</italic><sub>1</sub> and <italic>v</italic><sub>2</sub> are replaced with a single node, <italic>v</italic>, such that <italic>v</italic> is adjacent to the union of the nodes to which <italic>v</italic><sub>1</sub> and <italic>v</italic><sub>2</sub> were originally adjacent.</p>
<fig id="fig-8">
<graphic mimetype="image" mime-subtype="tif" xlink:href="JQC_61275-fig-8.tif"/>
</fig>
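<p>The greedy idea described above can be sketched without a graph library as follows. This is an illustration based on the prose description only, not a transcription of Algorithm 1. It assumes that contracting an edge sums the weights of parallel edges, so that the edge between a supernode and an outside qubit carries the total weight between that qubit and all members of the supernode. The names <monospace>greedy_partition</monospace>, <monospace>attraction</monospace> and <monospace>heaviest_edge</monospace> are hypothetical.</p>
<code language="python"><![CDATA[
```python
def weight(w, u, v):
    """Symmetric lookup of c(u, v); missing pairs have weight zero."""
    return w.get((min(u, v), max(u, v)), 0)


def attraction(w, q, supernode):
    """Weight of the (contracted) edge between qubit q and the supernode."""
    return sum(weight(w, q, member) for member in supernode)


def heaviest_edge(w, q, unassigned):
    """Largest edge weight from q to any other unassigned qubit."""
    return max((weight(w, q, o) for o in unassigned if o != q), default=0)


def greedy_partition(w, qubits, capacities):
    """Fill QPUs one by one; return a dict mapping qubit -> QPU index."""
    qubits = list(qubits)
    if sum(capacities) < len(qubits):
        raise ValueError("not enough physical qubits for the circuit")
    unassigned = sorted(qubits)
    assignment = {}
    for qpu, capacity in enumerate(capacities):
        supernode = []
        while unassigned and len(supernode) < capacity:
            if supernode:
                # Contract the heaviest edge adjacent to the supernode.
                best = max(unassigned,
                           key=lambda q: attraction(w, q, supernode))
            else:
                # Seed the supernode with an endpoint of the heaviest edge.
                best = max(unassigned,
                           key=lambda q: heaviest_edge(w, q, unassigned))
            unassigned.remove(best)
            supernode.append(best)
            assignment[best] = qpu  # locked once placed, no look-ahead
    return assignment


# Hypothetical 5-qubit interaction graph and two QPUs of sizes 2 and 3.
weights = {(0, 1): 5, (1, 2): 1, (2, 3): 4, (3, 4): 2, (0, 4): 1}
assignment = greedy_partition(weights, range(5), (2, 3))
print(assignment)
```
]]></code>
<p>Because each choice is final, the quality of the result depends on the order in which the QPUs are filled; this is the price paid for the low computational cost noted above.</p>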
</sec>
</sec>
<sec id="s3">
<label>3</label>
<title>Minimising Remote Operations: Model &#x0026; Problem Description</title>
<sec id="s3_1">
<label>3.1</label>
<title>Optimised Distributed Quantum Circuit Execution via Meta-Heuristic Approach (ODQC-MHA)&#x2014;High Level Design</title>
<p>In this section, we introduce the ODQC-MHA framework by describing its high-level design. ODQC-MHA allows the circuit to be analysed in blocks of varying size, using any algorithm for <italic>k</italic>-way partitioning of a graph within each block. For each block, we attempt to minimise the number of gate teleportations while allowing qubit teleportations between blocks to reallocate the logical qubits when needed. Finding the optimal blocking is challenging because of the many possible combinations; this is the problem our heuristic addresses. Note that each block&#x2019;s allocation is computed with no information about the previous block&#x2019;s allocation, so a meta-heuristic is required to minimise the gate and qubit teleportations together. The high-level description of the proposed framework is illustrated in <xref ref-type="fig" rid="fig-3">Fig. 3</xref>. Given the vast search space comprising various circuit partitions and qubit placements, the approach combines any partitioning algorithm for intra-block qubit placements (this can be K-L, GPA, or any other procedure) with a genetic algorithm that accounts for the qubit teleportations between blocks. The proposed genetic algorithm evaluates the utility of a given circuit partition based on the placements suggested by the particular partitioning algorithm used, with the optimisation objective of minimising the network cost. Note that this is not a joint optimisation but a meta-heuristic that uses the output of some heuristic (K-L, the greedy algorithm, etc.) to explore the solution space more thoroughly. Graph partitioning is NP-hard for each block, so the blocking model proposed in this paper is hard to solve exactly and calls for a heuristic.
In the next sections, a genetic algorithm is proposed to approximate an optimal &#x2018;blocking&#x2019; of the circuit that minimises the total number of Bell pairs required for qubit and gate teleportations.</p>
<fig id="fig-3">
<label>Figure 3</label>
<caption>
<title>A high-level overview of the proposed framework (ODQC-MHA) for blocking/partitioning quantum circuits. The circuit is broken into blocks of arbitrary size (number of layers). Within each block, logical qubit allocation is performed using graph partitioning methods in order to reduce the number of Bell pairs required for non-local operations. After this, between each pair of consecutive blocks, qubit teleportations are performed to reallocate the logical qubits according to each block&#x2019;s allocation</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="JQC_61275-fig-3.tif"/>
</fig>
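<p>The trade-off that the blocking exploits can be illustrated with a small sketch that evaluates the network cost of a given blocking. It assumes one Bell pair per non-local CNOT inside a block and one Bell pair per logical qubit whose QPU changes between consecutive non-empty blocks; the second rule is a simplification made for illustration. The circuit, the function names, and the brute-force <monospace>toy_partitioner</monospace>, which stands in for K-L or GPA on two QPUs holding two qubits each, are all hypothetical.</p>
<code language="python"><![CDATA[
```python
def block_weights(layers):
    """Build c(u, v) for one block from its layers of (control, target)."""
    w = {}
    for layer in layers:
        for control, target in layer:
            key = (min(control, target), max(control, target))
            w[key] = w.get(key, 0) + 1
    return w


def gate_teleportations(w, assignment):
    """Non-local CNOTs inside a block under a fixed allocation."""
    return sum(c for (u, v), c in w.items() if assignment[u] != assignment[v])


def qubit_teleportations(previous, current):
    """Qubits that must move to a different QPU between two blocks."""
    return sum(1 for q in current if previous[q] != current[q])


def blocking_cost(layers, depths, partitioner):
    """Bell pairs needed when the circuit is cut into blocks of given depth."""
    if sum(depths) != len(layers):
        raise ValueError("block depths must sum to the circuit depth")
    total, start, previous = 0, 0, None
    for depth in depths:
        if depth == 0:
            continue  # empty blocks contribute nothing
        w = block_weights(layers[start:start + depth])
        assignment = partitioner(w)
        total += gate_teleportations(w, assignment)
        if previous is not None:
            total += qubit_teleportations(previous, assignment)
        previous, start = assignment, start + depth
    return total


# Stand-in partitioner: try the three balanced splits of four qubits.
CANDIDATES = [
    {0: 0, 1: 0, 2: 1, 3: 1},
    {0: 0, 2: 0, 1: 1, 3: 1},
    {0: 0, 3: 0, 1: 1, 2: 1},
]


def toy_partitioner(w):
    return min(CANDIDATES, key=lambda a: gate_teleportations(w, a))


# Hypothetical circuit whose interaction pattern changes halfway through.
layers = [[(0, 1), (2, 3)], [(0, 1)], [(0, 2), (1, 3)], [(0, 2)]]
one_block = blocking_cost(layers, [4], toy_partitioner)
two_blocks = blocking_cost(layers, [2, 2], toy_partitioner)
print("single block:", one_block, "| two blocks:", two_blocks)
```
]]></code>
<p>With a single block, three CNOTs are non-local whichever split is chosen. Cutting the circuit after the second layer makes every CNOT local at the price of moving two qubits, so the total drops from three Bell pairs to two. A blocking containing an empty block, such as depths (2, 0, 2), has the same cost as its non-empty blocks alone.</p>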
</sec>
<sec id="s3_2">
<label>3.2</label>
<title>Optimised Distributed Quantum Circuit Execution via Meta-Heuristic Approach (ODQC-MHA)&#x2014;Genetic Algorithm</title>
<p>Genetic algorithms mimic natural evolution by evolving solutions to problems through a process of selection, mutation, and crossover [<xref ref-type="bibr" rid="ref-28">28</xref>]. They start with a diverse population of individuals, where each individual&#x2019;s &#x201C;genotype&#x201D; encodes a potential solution, and its &#x201C;phenotype&#x201D; (its performance or fitness) reflects the solution&#x2019;s effectiveness. Over successive generations, individuals with higher fitness are more likely to pass their genes to the next generation, allowing the algorithm to &#x201C;naturally select&#x201D; increasingly effective solutions. In our approach, we use a genetic algorithm to optimise the distribution of computational tasks in a quantum computing network, specifically aiming to minimise the number of Bell pairs required for quantum communication. The core of our algorithm is a population of candidate solutions, denoted <inline-formula id="ieqn-15"><mml:math id="mml-ieqn-15"><mml:mi>P</mml:mi><mml:mo>=</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>p</mml:mi><mml:mrow><mml:mi>N</mml:mi></mml:mrow></mml:msub><mml:mo fence="false" stretchy="false">}</mml:mo></mml:math></inline-formula>, where each candidate solution <italic>p</italic><sub><italic>i</italic></sub> represents a potential way of dividing the target quantum circuit into distinct blocks.</p>
<p>Each candidate solution <italic>p</italic> &#x2208; <italic>P</italic> is characterized by its genotype, <inline-formula id="ieqn-16"><mml:math id="mml-ieqn-16"><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, which in our model is a sequence of integers <inline-formula id="ieqn-17"><mml:math id="mml-ieqn-17"><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mo fence="false" stretchy="false">{</mml:mo><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:msub><mml:mo fence="false" stretchy="false">}</mml:mo></mml:math></inline-formula>, where <inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mi>K</mml:mi></mml:mrow></mml:msub><mml:mo>&#x2208;</mml:mo><mml:mrow><mml:mi mathvariant="double-struck">N</mml:mi></mml:mrow></mml:math></inline-formula>, and <italic>K</italic> represents a predefined maximum number of blocks for the quantum circuit. Here, <italic>g</italic><sub><italic>i</italic></sub> signifies the depth (i.e., the number of layers) of the circuit within block <italic>i</italic>. Note that the search space grows exponentially with <italic>K</italic>, so a larger value of <italic>K</italic> allows more combinations of circuit blockings to be checked by the algorithm. The configuration of these blocks is subject to the constraint that the sum of all <italic>g</italic><sub><italic>i</italic></sub> values must equal the total number of layers in the quantum circuit. Any <italic>g</italic><sub><italic>i</italic></sub> is permitted to be zero, indicating a block that is empty and thus does not contribute to the overall division of the circuit.
For instance, a genotype <inline-formula id="ieqn-19"><mml:math id="mml-ieqn-19"><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> &#x003D; [12, 42, 64, 38, 203, 0, 34] represents a circuit partitioned into blocks with respective depths of 12, 42, 64, 38, 203, and 34. Although in this specific example we allow up to 7 blocks for the circuit, this genotype, <inline-formula id="ieqn-20"><mml:math id="mml-ieqn-20"><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, utilizes only 6 of them.</p>
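<p>As an illustration of the genotype encoding described above, the following minimal Python sketch converts a list of block depths into layer ranges and checks the sum constraint. The function name decode_genotype and the half-open (start, end) range convention are assumptions made for this example and are not taken from the ODQC-MHA implementation.</p>
<code language="python"><![CDATA[
```python
from typing import List, Tuple


def decode_genotype(genotype: List[int], total_layers: int) -> List[Tuple[int, int]]:
    """Turn block depths into half-open (start, end) layer ranges.

    Empty blocks (depth 0) are skipped, mirroring the rule that they do not
    contribute to the division of the circuit.
    """
    if any(g < 0 for g in genotype):
        raise ValueError("block depths must be non-negative")
    if sum(genotype) != total_layers:
        raise ValueError("block depths must sum to the circuit depth")
    ranges = []
    start = 0
    for depth in genotype:
        if depth == 0:
            continue  # empty block: ignored
        ranges.append((start, start + depth))
        start += depth
    return ranges


# The example genotype from the text: 7 slots, 6 of them used.
example = [12, 42, 64, 38, 203, 0, 34]
blocks = decode_genotype(example, total_layers=sum(example))
```
]]></code>
<p>For the example genotype, the sketch yields six ranges covering 393 layers in total, with the zero entry dropped.</p>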
<p>The phenotype, <inline-formula id="ieqn-21"><mml:math id="mml-ieqn-21"><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, associated with an individual <italic>p</italic> &#x2208; <italic>P</italic> with genotype <inline-formula id="ieqn-22"><mml:math id="mml-ieqn-22"><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, quantifies the total number of Bell pairs required for the candidate solution&#x2019;s block configuration. This total encapsulates both gate teleportations within individual blocks and qubit teleportations across the network. The phenotype thus serves as a measure of the solution&#x2019;s effectiveness in optimising quantum communication. To evaluate the phenotype and thus the viability of each candidate solution, we introduce a fitness function, evaluateFitness: P &#x2192; N, that maps the individual to a natural number. In our case, this function counts the total number of Bell pairs, and hence teleportations, needed for the configuration under consideration.</p>
<p>Through the iterative processes of selection, crossover, and mutation, our genetic algorithm seeks to evolve the population towards configurations that minimise Bell pair usage, thereby enhancing the efficiency and feasibility of DQC tasks. By continuously refining the genotypes within the population based on their fitness scores, the algorithm drives towards an optimal or near-optimal distribution of computational loads and quantum communication requirements across the network. This framework not only provides a method for optimising quantum network configurations but also offers insights into the trade-offs between computational depth and quantum communication resources, laying the groundwork for further innovations in DQC architectures.</p>
</sec>
<sec id="s3_3">
<label>3.3</label>
<title>Optimised Distributed Quantum Circuit Execution via Meta-Heuristic Approach (ODQC-MHA)&#x2014;A Detailed Description</title>
<p>The Optimised Distributed Quantum Circuit Execution via Meta-Heuristic Approach (ODQC-MHA) makes use of a genetic algorithm formalism to find a minimum number of total Bell pairs required for a distributed execution of a given quantum circuit by finding an arrangement of block lengths that minimises the total cost. This section provides a detailed description of the ODQC-MHA components that were abstracted away in <xref ref-type="fig" rid="fig-3">Fig. 3</xref>, complementing also the overview provided in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>.</p>
<fig id="fig-4">
<label>Figure 4</label>
<caption>
<title>Structure of genetic algorithm. GenerateIndividuals(): create N lists, of a given size, representing the number of layers per block of a circuit. selection(): &#x2018;Hall of Fame&#x2019; selection process ensures that the best individual to ever exist is chosen as the optimal. Crossover(): randomly chooses two individuals from the mating pool to create each new generation of superior individuals. Mutate: individuals are randomly chosen to mutate, mutate 1 occurs every mutation and mutate 2 occurs with probability 1/10 for each mutation. HoF Winner: the best individual that has existed throughout the generations is selected as the optimal &#x2018;blocking&#x2019; solution</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="JQC_61275-fig-4.tif"/>
</fig>
<sec id="s3_3_1">
<label>3.3.1</label>
<title>Efficient Qubit Teleportation Decisions</title>
<p>For each individual in the genetic algorithm, the quantum circuit is segmented into blocks based on the number of layers specified by the individual&#x2019;s genotype. To optimise the allocation of qubits across multiple QPUs, a method such as graph partitioning (e.g., GPA) is employed. This step aims to find an approximately optimal distribution of qubits over the available processors for each block. Given the allocation for each block, the challenge arises in transitioning qubits from their configuration in one block to the next. This transition is not straightforward due to the flexibility in QPU assignments: any given allocation might correspond to any QPU. This leads to a complex problem, especially as the number of QPUs increases, where finding the most efficient mapping between allocations in consecutive blocks becomes computationally intensive. To address this, we construct a bipartite graph <inline-formula id="ieqn-23"><mml:math id="mml-ieqn-23"><mml:mi>G</mml:mi><mml:mo>=</mml:mo><mml:mo stretchy="false">(</mml:mo><mml:mi>I</mml:mi><mml:mo>,</mml:mo><mml:mi>J</mml:mi><mml:mo>,</mml:mo><mml:mi>L</mml:mi><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula>, where nodes in the set <italic>I</italic> represent the processor allocations in block <italic>i</italic>, and nodes in the disjoint set <italic>J</italic> represent the allocations in block <italic>i</italic> &#x002B; 1 (<xref ref-type="fig" rid="fig-5">Fig. 5</xref>). The edges in this graph, <italic>L</italic>, denote the potential mappings between allocations, with weights reflecting the minimum number of qubit differences (and thus qubit teleportations) required for each mapping. By negating these weights and applying a maximum weighted matching algorithm [<xref ref-type="bibr" rid="ref-29">29</xref>], we identify the mapping that minimises the number of qubit teleportations needed for reallocation between blocks.
Because maximum weighted matching runs in polynomial time, this approach avoids enumerating every possible relabelling of QPUs between consecutive blocks.</p>
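<p>The mapping problem described above can be illustrated with a small Python sketch. For brevity it enumerates all relabellings with itertools.permutations rather than using the maximum weighted matching algorithm employed in this work, so it is only practical for a handful of QPUs. The function name best_relabelling and the cost definition (the number of qubits in a new partition that are not already on the chosen QPU) are assumptions made for this example.</p>
<code language="python"><![CDATA[
```python
from itertools import permutations
from typing import List, Set, Tuple


def best_relabelling(prev: List[Set[int]], nxt: List[Set[int]]) -> Tuple[Tuple[int, ...], int]:
    """Return (assignment, cost).

    assignment[j] is the index of the QPU (as labelled in the previous block)
    that should host partition nxt[j]; cost is the number of qubits that must
    be teleported under that assignment.
    """
    if len(prev) != len(nxt):
        raise ValueError("both blocks must use the same number of QPUs")
    best_perm, best_cost = None, None
    for perm in permutations(range(len(prev))):
        # A qubit in nxt[j] that is not already on QPU perm[j] needs one teleportation.
        cost = sum(len(nxt[j] - prev[perm[j]]) for j in range(len(nxt)))
        if best_cost is None or cost < best_cost:
            best_perm, best_cost = perm, cost
    return best_perm, best_cost


# Three QPUs, nine qubits: the partitioner returned the same groups in a different order.
prev_alloc = [{0, 1, 2}, {3, 4, 5}, {6, 7, 8}]
next_alloc = [{3, 4, 6}, {0, 1, 5}, {2, 7, 8}]
assignment, teleports = best_relabelling(prev_alloc, next_alloc)
```
]]></code>
<p>In this toy case, keeping the partition labels as returned would move six qubits, whereas the best relabelling moves only three (qubits 2, 5, and 6).</p>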
<fig id="fig-5">
<label>Figure 5</label>
<caption>
<title>Bipartite graph representing the reallocation of qubits between multiple QPUs. There are multiple ways to map a given allocation to a QPU</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="JQC_61275-fig-5.tif"/>
</fig>
</sec>
<sec id="s3_3_2">
<label>3.3.2</label>
<title>Components of the Genetic Algorithm</title>
<p>Initialisation (generateIndividuals)&#x2014;first, the initial population is generated. To that end, we create homogeneous lists of a given length, which corresponds to the maximum number of blocks available in the solution. We then apply the mutate function (described later) to each individual 100,000 times to produce diverse genotypes for the initial population. It is important that these individuals are highly varied because of the size and complexity of the solution space.</p>
<p>Evaluation of Fitness (evaluateFitness)&#x2014;the fitness function evaluates the efficiency of a given qubit allocation and teleportation scheme. It does so by summing the total number of gate teleportations within each block (determined by the initial qubit allocations) and the qubit teleportations between blocks (as optimised by the bipartite graph matching). The objective of the genetic algorithm is to minimise this sum, thereby reducing the overall quantum communication and computation overhead in the DQC framework. The evaluation process effectively quantifies the &#x201C;cost&#x201D; of a particular configuration of qubit allocations and transitions, guiding the genetic algorithm toward solutions that optimise the use of quantum resources. By focusing on minimising the combined total of gate and qubit teleportations, the algorithm seeks configurations that offer the best balance between computational efficiency and the practical constraints of DQC. This structured approach allows for a clear understanding of how qubit teleportation and allocation decisions impact the overall efficiency of quantum computing operations, providing a solid basis for optimising DQC architectures.</p>
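<p>The cost being minimised can be sketched in a few lines of Python. The sketch below assumes that the per-block allocations have already been computed by some partitioner, that each nonlocal CNOT and each moved qubit consumes exactly one Bell pair, and that QPU labels are already aligned between consecutive blocks. The name count_bell_pairs and the data layout are hypothetical and chosen for illustration only.</p>
<code language="python"><![CDATA[
```python
from typing import Dict, List, Tuple

Gate = Tuple[int, int]  # (control, target) of a CNOT


def count_bell_pairs(blocks: List[List[Gate]], allocations: List[Dict[int, int]]) -> int:
    """Gate teleportations inside blocks plus qubit teleportations between blocks.

    allocations[b][q] is the QPU holding qubit q during block b.
    """
    if len(blocks) != len(allocations):
        raise ValueError("one allocation is needed per block")
    total = 0
    previous = None
    for gates, alloc in zip(blocks, allocations):
        if not gates:
            continue  # empty blocks are ignored
        # One Bell pair per CNOT whose qubits sit on different QPUs.
        total += sum(1 for c, t in gates if alloc[c] != alloc[t])
        if previous is not None:
            # One Bell pair per qubit that changes QPU between blocks.
            total += sum(1 for q in alloc if previous[q] != alloc[q])
        previous = alloc
    return total


# Toy circuit on 4 qubits and 2 QPUs whose interaction pattern changes halfway.
first = [(0, 1), (0, 1), (2, 3), (1, 2)]
second = [(0, 2), (0, 2), (0, 2), (1, 3), (1, 3)]
alloc_a = {0: 0, 1: 0, 2: 1, 3: 1}
alloc_b = {0: 0, 2: 0, 1: 1, 3: 1}

unblocked = count_bell_pairs([first + second], [alloc_a])
blocked = count_bell_pairs([first, second], [alloc_a, alloc_b])
```
]]></code>
<p>In this toy example, a single allocation for the whole circuit costs six Bell pairs, whereas splitting the circuit into two blocks costs three (one gate teleportation and two qubit teleportations), which is the kind of trade-off the fitness function is intended to capture.</p>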
<p>Crossover Function&#x2014;the crossover function is the process of generating offspring from the selected parent genes. These offspring are generated such that they share some elements from either parent. The crossover function used is a simple two-point crossover which chooses a subset of random size from each pair of parents and swaps them. Since the genotype&#x2019;s values must sum up to the total number of layers in the quantum circuit, we employ a rebalancing routine to enforce that constraint after the crossover.</p>
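<p>A two-point crossover followed by a repair step might look as follows. The rebalancing routine is not specified in the text, so the rebalance function below is only one possible choice (an assumption for this sketch): it adds any missing layers to the first block and removes any excess layers from the blocks in order until the sum constraint holds again.</p>
<code language="python"><![CDATA[
```python
import random
from typing import List, Tuple


def rebalance(child: List[int], total: int) -> List[int]:
    """Hypothetical repair step: restore sum(child) == total with no negative depths."""
    child = list(child)
    diff = total - sum(child)
    i = 0
    while diff != 0:
        if diff > 0:
            child[i] += diff  # too few layers: add the shortfall
            diff = 0
        else:
            take = min(child[i], -diff)  # too many layers: remove what this block can give
            child[i] -= take
            diff += take
        i = (i + 1) % len(child)
    return child


def two_point_crossover(a: List[int], b: List[int], total: int,
                        rng: random.Random) -> Tuple[List[int], List[int]]:
    """Swap the slice between two random cut points, then rebalance both children."""
    if len(a) != len(b):
        raise ValueError("parents must have the same length")
    lo, hi = sorted(rng.sample(range(len(a) + 1), 2))
    child_a = a[:lo] + b[lo:hi] + a[hi:]
    child_b = b[:lo] + a[lo:hi] + b[hi:]
    return rebalance(child_a, total), rebalance(child_b, total)


parent_a = [10, 20, 30, 40]
parent_b = [25, 25, 25, 25]
child_a, child_b = two_point_crossover(parent_a, parent_b, total=100, rng=random.Random(0))
```
]]></code>
<p>Whatever repair rule is chosen, the property that matters is that every child remains a valid genotype: the same length as its parents, no negative depths, and a total equal to the circuit depth.</p>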
<p>Mutation Function&#x2014;The mutation function, applied with a predefined probability of occurrence, randomly chooses two indices <italic>i</italic> and <italic>j</italic>, <italic>i</italic> &#x2260; <italic>j</italic>, in the individual. Given a mutation constant <italic>c</italic>, a fair coin is flipped and <italic>c</italic> is either added to <inline-formula id="ieqn-24"><mml:math id="mml-ieqn-24"><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and subtracted from <inline-formula id="ieqn-25"><mml:math id="mml-ieqn-25"><mml:msub><mml:mi>g</mml:mi><mml:mrow><mml:mi>j</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> or <italic>vice versa</italic>. In addition, each mutation has a probability <italic>p</italic> of introducing a zero element into the individual, by setting the value at a randomly chosen index to zero and spreading that value among the remaining indices.</p>
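<p>The two mutation moves can be sketched in Python as follows. Two details are assumptions of this sketch rather than statements about the actual implementation: the transfer is capped so that no block depth becomes negative, and the value freed by the zeroing move is spread as evenly as possible over the remaining indices. The default-free parameter p_zero plays the role of the probability <italic>p</italic>; the value 0.1 used in the example follows the caption of <xref ref-type="fig" rid="fig-4">Fig. 4</xref>.</p>
<code language="python"><![CDATA[
```python
import random
from typing import List


def mutate(genotype: List[int], c: int, p_zero: float, rng: random.Random) -> List[int]:
    """Sum-preserving mutation of a list of block depths."""
    g = list(genotype)
    i, j = rng.sample(range(len(g)), 2)  # two distinct indices
    if rng.random() < 0.5:               # fair coin: choose the direction of the transfer
        i, j = j, i
    shift = min(c, g[j])                 # assumption: never drive a depth below zero
    g[i] += shift
    g[j] -= shift
    if rng.random() < p_zero:
        # Second move: empty one block and spread its layers over the others.
        k = rng.randrange(len(g))
        freed, g[k] = g[k], 0
        others = [m for m in range(len(g)) if m != k]
        share, extra = divmod(freed, len(others))
        for pos, m in enumerate(others):
            g[m] += share + (1 if pos < extra else 0)
    return g


mutated = mutate([12, 42, 64, 38, 203, 0, 34], c=5, p_zero=0.1, rng=random.Random(0))
```
]]></code>
<p>Because both moves only shift layers between blocks, the mutated genotype always satisfies the sum constraint and needs no separate repair step.</p>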
<p>Selection Process&#x2014;The selection process entirely replaces the parental population, which requires the selection procedure to be stochastic and to allow the same individual to be selected more than once. We used the selTournament() function from the DEAP framework, which was found to be a suitable selection function for searching the solution space thoroughly.</p>
</sec>
<sec id="s3_3_3">
<label>3.3.3</label>
<title>Overview</title>
<p>The genetic algorithm adjusts the size of each block (between consecutive blocks, qubit teleportations move the qubits from one allocation to the next) in order to minimise the total number of Bell pairs required. The size of a block is allowed to go to zero, in which case it is ignored in the calculation of teleportations. In other words, the optimal &#x2018;blocking&#x2019; of the circuit may contain fewer blocks than the initial candidate solution. Note that this meta-heuristic can use any algorithm for the allocation of qubits inside a given block to calculate the Bell pairs needed for the gate teleportations. For example, with just two QPUs one could use the K-L algorithm, whereas in a multi-QPU framework the proposed GPA is more suitable. In the next section, we implement ODQC-MHA using the DEAP Python framework [<xref ref-type="bibr" rid="ref-30">30</xref>] to execute the genetic algorithm with these definitions and to converge on a close-to-optimal blocking of the circuit, i.e., one that requires the minimum combined number of gate and qubit teleportations for a given quantum circuit. Hereafter, the term ODQC-MHA(K-L) refers to the meta-heuristic that applies the K-L algorithm to each block. Conversely, ODQC-MHA(GPA) denotes the variant where the GPA is employed for graph partitioning in each block.</p>
</sec>
</sec>
</sec>
<sec id="s4">
<label>4</label>
<title>Performance Evaluation</title>
<sec id="s4_1">
<label>4.1</label>
<title>Pre-Processing</title>
<p>In this section, we describe the steps taken to analyse benchmark circuits to evaluate the performance of ODQC-MHA. Each circuit analysed is represented in a QASM (Quantum Assembly) [<xref ref-type="bibr" rid="ref-31">31</xref>] file that contains all of the logical instruction information in order. The QASM file is parsed, and the circuit is then represented as a directed acyclic graph (DAG). The DAG shows the dependencies of each gate and allows us to determine which operations can occur simultaneously (in the same layer). Given the DAG, we can divide the circuit, by layer, into the blocks given by an individual&#x2019;s genotype <italic>G</italic><sub><italic>p</italic></sub>, as explained previously. For each block, an interaction matrix is constructed using the QASM instructions; this matrix is then used to build a network [<xref ref-type="bibr" rid="ref-32">32</xref>] graph object to which standard graph partitioning algorithms can be applied. ODQC-MHA can be applied to any circuit of any size, although the execution time scales with the number of qubits and the maximum number of blocks. Note that the bottleneck here is the number of qubits in the circuit, as this determines the size of the graph to be partitioned.</p>
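<p>The layering and interaction-counting steps described above can be illustrated with a short Python sketch. It assumes that the circuit has already been parsed into an ordered list of CNOT gates, places each gate in the earliest layer allowed by its qubit dependencies, and counts how often each pair of qubits interacts. The names layer_gates and interaction_weights are hypothetical, and single-qubit gates are omitted for simplicity.</p>
<code language="python"><![CDATA[
```python
from collections import Counter
from typing import Dict, List, Tuple

Gate = Tuple[int, int]  # (control, target) of a CNOT


def layer_gates(gates: List[Gate]) -> List[List[Gate]]:
    """Place each gate in the earliest layer after all earlier gates on its qubits."""
    next_free: Dict[int, int] = {}  # first layer in which each qubit is available
    layers: List[List[Gate]] = []
    for control, target in gates:
        layer = max(next_free.get(control, 0), next_free.get(target, 0))
        if layer == len(layers):
            layers.append([])
        layers[layer].append((control, target))
        next_free[control] = next_free[target] = layer + 1
    return layers


def interaction_weights(gates: List[Gate]) -> Counter:
    """Count the CNOTs acting on each unordered pair of qubits."""
    return Counter(tuple(sorted(gate)) for gate in gates)


circuit = [(0, 1), (2, 3), (1, 2), (0, 3), (0, 1)]
layers = layer_gates(circuit)
weights = interaction_weights(circuit)
```
]]></code>
<p>Here the five gates fall into three layers, and the pair (0, 1) receives weight 2. The weights for the gates of one block would then serve as edge weights of the graph handed to the partitioning algorithm.</p>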
<p>For each comparison, the mean number of combined gate and qubit teleportations obtained with <italic>ODQC-MHA</italic> over multiple trials on each circuit is taken as the average performance of the algorithm. The ratio of this mean value to the number of teleportations when using K-L (for two QPUs) and GPA (for &#x003E; 2 QPUs), with no circuit blocking, is presented as the percentage improvement of ODQC-MHA. The maximum number of QPUs considered in this study is three, because the main comparison presented is with the commonly used K-L algorithm, which applies only to the two-QPU case. For &#x003E; 2 QPUs, ODQC-MHA is agnostic to the choice of graph partitioning algorithm used within each block, provided it is efficient.</p>
</sec>
<sec id="s4_2">
<label>4.2</label>
<title>Non-Random Quantum Circuits (2 QPUs)</title>
<p>Initially, we ran ODQC-MHA(K-L) on quantum circuits from QASMBench [<xref ref-type="bibr" rid="ref-33">33</xref>]. Information about the benchmark circuits used is shown in <xref ref-type="table" rid="table-1">Table 1</xref>. The results of this analysis are shown in <xref ref-type="fig" rid="fig-6">Fig. 6a</xref>, showing the percentage improvement over using K-L for the entire circuit, i.e., with no qubit teleportations. We compare three configurations of ODQC-MHA(K-L), with a maximum allowed number of blocks (MAB) of 10, 50, and 100.</p>
<table-wrap id="table-1">
<label>Table 1</label>
<caption>
<title>Benchmark quantum circuits for Section 4.2</title>
</caption>
<table>
<colgroup>
<col/>
<col/>
<col/>
<col/>
<col/>
<col/>
</colgroup>
<thead>
<tr>
<th>ID</th>
<th>Circuit name</th>
<th>Qubits</th>
<th>Depth (CX only)</th>
<th>Unary gates</th>
<th>CX gates</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>Adder_n118</td>
<td>118</td>
<td>4</td>
<td>1107</td>
<td>845</td>
</tr>
<tr>
<td>2</td>
<td>sym9_146</td>
<td>12</td>
<td>91</td>
<td>180</td>
<td>148</td>
</tr>
<tr>
<td>3</td>
<td>cycle10_2_110</td>
<td>12</td>
<td>3386</td>
<td>3402</td>
<td>2648</td>
</tr>
<tr>
<td>4</td>
<td>Inc_237</td>
<td>16</td>
<td>3463</td>
<td>5983</td>
<td>4636</td>
</tr>
<tr>
<td>5</td>
<td>cm85a_209</td>
<td>14</td>
<td>3818</td>
<td>6428</td>
<td>4986</td>
</tr>
<tr>
<td>6</td>
<td>rd84_253</td>
<td>12</td>
<td>4466</td>
<td>7698</td>
<td>5960</td>
</tr>
<tr>
<td>7</td>
<td>root_255</td>
<td>13</td>
<td>5354</td>
<td>9666</td>
<td>7493</td>
</tr>
<tr>
<td>8</td>
<td>mlp4_245</td>
<td>16</td>
<td>6190</td>
<td>10,620</td>
<td>8232</td>
</tr>
<tr>
<td>9</td>
<td>clip_206</td>
<td>14</td>
<td>10,734</td>
<td>19,055</td>
<td>14,772</td>
</tr>
<tr>
<td>10</td>
<td>Dist_223</td>
<td>13</td>
<td>11,911</td>
<td>21,422</td>
<td>16,624</td>
</tr>
</tbody>
</table>
</table-wrap><fig id="fig-6">
<label>Figure 6</label>
<caption>
<title>(a) Comparison of the percentage improvement of ODQC-MHA(K-L) on benchmark circuits for varying maximum allowed number of blocks (MAB) (10, 50, 100), across 2 QPUs. The benchmark circuits are ordered by increasing circuit depth. Each data point is the mean of 100 executions of ODQC-MHA(K-L); (b) Comparison of ODQC-MHA(K-L) for varying maximum allowed number of blocks (10, 50, 100), to ODQC-MHA(K-L) for 1 block on randomly generated circuits, distributed over 2 processors. Each data point is the mean value of multiple runs of ODQC-MHA(K-L) on different randomly generated 16-qubit circuits. The percentage difference is plotted</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="JQC_61275-fig-6.tif"/>
</fig>
<p>While the improvement varies across the circuits, we see a clear trend. For the shallowest circuits, 10 MAB shows the largest improvement and 100 MAB the smallest; in the central region, 50 MAB shows the most improvement; and at the higher end of circuit depth, 100 MAB performs best. We identify a likely reason for this trend: for smaller circuits it is possible to &#x2018;overblock&#x2019;, that is, it becomes hard to converge on a solution, on average, because of the starting size of an individual genotype. We believe this analysis on small benchmark circuits is somewhat arbitrary, because the performance of the heuristic depends strongly on the distribution of CNOT gates. In the next section, we discuss the average performance of the algorithm using large randomly generated circuits.</p>
<p>The inconsistent performance between different configurations across different circuits suggests a limitation with the design of ODQC-MHA. Future work might include optimising the allowed size of an individual, the population size, and the number of generations, depending on the circuit at hand. This should allow the algorithm to search the given solution space more efficiently and prevent converging on local minima.</p>
</sec>
<sec id="s4_3">
<label>4.3</label>
<title>Random Quantum Circuits (2 QPUs)</title>
<p>To demonstrate the effectiveness of ODQC-MHA on average, we ran qubit allocation on randomly generated circuits containing only CNOT gates, which are the only two-qubit (and hence potentially nonlocal) gates considered by the algorithm. This demonstrates the average behaviour on large circuits where the distribution of CNOT gates tends towards homogeneity. In principle, any bias in the distribution of CNOTs should be exploited by a good heuristic method.</p>
<p>First, we generated random 16-qubit circuits with varying numbers of CNOT gates, in the same range as the benchmark circuits used. We ran ODQC-MHA(K-L) to allocate across 2 QPUs for each circuit. This analysis was done 100 times, and the average performance was plotted in <xref ref-type="fig" rid="fig-6">Fig. 6b</xref>. This was done for different initial configurations of ODQC-MHA(K-L), allowing for a maximum number of blocks (length of genotype) of 10, 50 and 100. On average, the genetic algorithm converged on a solution requiring fewer Bell pairs across all the randomly generated circuits, with a general trend towards smaller improvement for an increasing number of gates. However, we again observed the trend explained before: there is a region in which each configuration performs best. This effect is shown dramatically in the first point (left). This indicates that ODQC-MHA(K-L) requires further optimisation to select an optimal number of blocks. Due to resource limitations, we were unable to determine the point beyond which increasing the maximum allowed number of blocks no longer yields an improvement.</p>

<p>Next, we generated random circuits of 8, 16, and 32 qubits with up to 100,000 CX gates, in order to test the performance of ODQC-MHA(K-L) on larger circuits. The results are plotted in <xref ref-type="fig" rid="fig-7">Fig. 7a</xref>. We saw a similar improvement trend across different numbers of gates. Interestingly, the reduction in the number of Bell pairs increases with the number of qubits. This could be because, for the same number of CNOT gates spread across more qubits, the interactions are more sparsely distributed and the circuit is likely to have a shorter depth.</p>
<fig id="fig-7">
<label>Figure 7</label>
<caption>
<title>(a) Comparison of ODQC-MHA(K-L) to K-L on randomly generated circuits with varying numbers of qubits (8, 16, 32), across 2 QPUs; (b) Comparison of ODQC-MHA(GPA) to GPA on randomly generated 16-qubit circuits, for different maximum allowed numbers of blocks (10, 50, 100)</title>
</caption>
<graphic mimetype="image" mime-subtype="tif" xlink:href="JQC_61275-fig-7.tif"/>
</fig>
</sec>
<sec id="s4_4">
<label>4.4</label>
<title>Random Quantum Circuits (3 QPUs)</title>
<p>Finally, to demonstrate the performance of ODQC-MHA using a different heuristic for graph partitioning within each block, we performed the same analysis but distributed each circuit across 3 QPUs using ODQC-MHA(GPA), with the results presented in <xref ref-type="fig" rid="fig-7">Fig. 7b</xref>. Here we observe a similar trend as for 2 QPUs. However, the first value for a maximum of 100 allowed blocks shows anomalously high improvement. We attribute this to the fact that the smallest circuits have larger variance in the distribution of CX gates, so the genetic algorithm may get &#x2018;lucky&#x2019; on certain circuit configurations. In the future, this analysis could be extended to more than 3 QPUs.</p>

</sec>
<sec id="s4_5">
<label>4.5</label>
<title>Discussion</title>
<p>In the limit that the number of gates is large, we would expect the performance of the K-L algorithm at allocating the qubits across two processors to tend to that of randomly allocating the qubits across two processors. This would mean that half of the CNOT gates are executed remotely. However, for any finite circuit depth, the optimal solution will always require fewer than half of the CNOT gates to be executed remotely. This also applies to our meta-heuristic, because the genetic algorithm can always find a block length for which K-L performs better than half and so, if allowed to execute properly, will be able to exploit sections of the circuit where K-L performs well at allocating the qubits.</p>
</sec>
</sec>
<sec id="s5">
<label>5</label>
<title>Conclusion</title>
<p>Compiling circuits for DQC will be paramount to the future scaling of quantum computers within a &#x2018;quantum data center&#x2019;. In this paper, we addressed the problem of qubit allocation during compilation, using a meta-heuristic which optimises for the minimum number of qubit and gate teleportations. The results when comparing our method to the standard method of graph partitioning using the K-L algorithm show a significant improvement.</p>
</sec>
</body>
<back>
<ack>
<p>The authors would like to thank Luca Della Chiesa at Cisco for his valuable insights and feedback during this work.</p>
</ack>
<sec>
<title>Funding Statement</title>
<p>The authors received no specific funding for this study.</p>
</sec>
<sec>
<title>Author Contributions</title>
<p>The authors confirm contribution to the paper as follows: study conception and design: Oliver Crampton, Panagiotis Promponas, Paul Polakos, Louis Samuel; data collection: Oliver Crampton, Panagiotis Promponas, Richard Chen, Paul Polakos, Louis Samuel; analysis and interpretation of results: Oliver Crampton, Panagiotis Promponas, Richard Chen, Paul Polakos, Louis Samuel; draft manuscript preparation: Oliver Crampton, Panagiotis Promponas, Richard Chen, Leandros Tassiulas. All authors reviewed the results and approved the final version of the manuscript.</p>
</sec>
<sec sec-type="data-availability">
<title>Availability of Data and Materials</title>
<p>Data not available due to commercial restrictions.</p>
</sec>
<sec>
<title>Ethics Approval</title>
<p>Not applicable.</p>
</sec>
<sec sec-type="COI-statement">
<title>Conflicts of Interest</title>
<p>The authors declare no conflicts of interest to report regarding the present study.</p>
</sec>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Grover</surname> <given-names>LK</given-names></string-name></person-group>. <article-title>A fast quantum mechanical algorithm for database search</article-title>. In: <conf-name>Proceedings of the Twenty-Eighth Annual ACM Symposium on Theory of Computing</conf-name>; <year>1996 May 22&#x2013;24</year>; <publisher-loc>Philadelphia, PA, USA</publisher-loc>. p. <fpage>212</fpage>&#x2013;<lpage>9</lpage>. doi:<pub-id pub-id-type="doi">10.1145/237814.237866</pub-id>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Shor</surname> <given-names>PW</given-names></string-name></person-group>. <article-title>Polynomial-time algorithms for prime factorization and discrete logarithms on a quantum computer</article-title>. <source>SIAM Rev</source>. <year>1999</year>;<volume>41</volume>(<issue>2</issue>):<fpage>303</fpage>&#x2013;<lpage>32</lpage>. doi:<pub-id pub-id-type="doi">10.1137/S0036144598347011</pub-id>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Skosana</surname> <given-names>U</given-names></string-name>, <string-name><surname>Tame</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Demonstration of Shor&#x2019;s factoring algorithm for N &#x003D; 21 on IBM quantum processors</article-title>. <source>Sci Rep</source>. <year>2021</year>;<volume>11</volume>(<issue>1</issue>):<fpage>16599</fpage>. doi:<pub-id pub-id-type="doi">10.1038/s41598-021-95973-w</pub-id>; <pub-id pub-id-type="pmid">34400695</pub-id></mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Brown</surname> <given-names>KL</given-names></string-name>, <string-name><surname>Munro</surname> <given-names>WJ</given-names></string-name>, <string-name><surname>Kendon</surname> <given-names>VM</given-names></string-name></person-group>. <article-title>Using quantum computers for quantum simulation</article-title>. <source>Entropy</source>. <year>2010</year>;<volume>12</volume>(<issue>11</issue>):<fpage>2268</fpage>&#x2013;<lpage>307</lpage>. doi:<pub-id pub-id-type="doi">10.3390/e12112268</pub-id>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Daley</surname> <given-names>AJ</given-names></string-name>, <string-name><surname>Bloch</surname> <given-names>I</given-names></string-name>, <string-name><surname>Kokail</surname> <given-names>C</given-names></string-name>, <string-name><surname>Flannigan</surname> <given-names>S</given-names></string-name>, <string-name><surname>Pearson</surname> <given-names>N</given-names></string-name>, <string-name><surname>Troyer</surname> <given-names>M</given-names></string-name>, <etal>et al.</etal></person-group> <article-title>Practical quantum advantage in quantum simulation</article-title>. <source>Nature</source>. <year>2022</year>;<volume>607</volume>(<issue>7920</issue>):<fpage>667</fpage>&#x2013;<lpage>76</lpage>. doi:<pub-id pub-id-type="doi">10.1038/s41586-022-04940-6</pub-id>; <pub-id pub-id-type="pmid">35896643</pub-id></mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Van Meter</surname> <given-names>R</given-names></string-name>, <string-name><surname>Devitt</surname> <given-names>SJ</given-names></string-name></person-group>. <article-title>The path to scalable distributed quantum computing</article-title>. <source>Computer</source>. <year>2016</year>;<volume>49</volume>(<issue>9</issue>):<fpage>31</fpage>&#x2013;<lpage>42</lpage>. doi:<pub-id pub-id-type="doi">10.1109/MC.2016.291</pub-id>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Nielsen</surname> <given-names>MA</given-names></string-name>, <string-name><surname>Chuang</surname> <given-names>IL</given-names></string-name></person-group>. <source>Quantum computation and quantum information</source>. <publisher-loc>Cambridge, UK</publisher-loc>: <publisher-name>Cambridge University Press</publisher-name>; <year>2001</year>. doi:<pub-id pub-id-type="doi">10.1017/CBO9780511976667</pub-id>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Deutsch</surname> <given-names>D</given-names></string-name></person-group>. <article-title>Quantum theory, the Church&#x2013;Turing principle and the universal quantum computer</article-title>. <source>Proc R Soc Lond. A Math Phys Sci</source>. <year>1985</year>;<volume>400</volume>(<issue>1818</issue>):<fpage>97</fpage>&#x2013;<lpage>117</lpage>. doi:<pub-id pub-id-type="doi">10.1098/rspa.1985.0070</pub-id>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Barenco</surname> <given-names>A</given-names></string-name>, <string-name><surname>Bennett</surname> <given-names>CH</given-names></string-name>, <string-name><surname>Cleve</surname> <given-names>R</given-names></string-name>, <string-name><surname>DiVincenzo</surname> <given-names>DP</given-names></string-name>, <string-name><surname>Margolus</surname> <given-names>N</given-names></string-name>, <string-name><surname>Shor</surname> <given-names>P</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Elementary gates for quantum computation</article-title>. <source>Phys Rev A</source>. <year>1995</year>;<volume>52</volume>(<issue>5</issue>):<fpage>3457</fpage>. doi:<pub-id pub-id-type="doi">10.1103/PhysRevA.52.3457</pub-id>; <pub-id pub-id-type="pmid">9912645</pub-id></mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Einstein</surname> <given-names>A</given-names></string-name>, <string-name><surname>Podolsky</surname> <given-names>B</given-names></string-name>, <string-name><surname>Rosen</surname> <given-names>N</given-names></string-name></person-group>. <article-title>Can quantum-mechanical description of physical reality be considered complete?</article-title> <source>Phys Rev</source>. <year>1935</year>;<volume>47</volume>(<issue>10</issue>):<fpage>777</fpage>. doi:<pub-id pub-id-type="doi">10.1103/PhysRev.47.777</pub-id>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Bennett</surname> <given-names>CH</given-names></string-name>, <string-name><surname>Brassard</surname> <given-names>G</given-names></string-name>, <string-name><surname>Crepeau</surname> <given-names>C</given-names></string-name>, <string-name><surname>Jozsa</surname> <given-names>R</given-names></string-name>, <string-name><surname>Peres</surname> <given-names>A</given-names></string-name>, <string-name><surname>Wootters</surname> <given-names>WK</given-names></string-name></person-group>. <article-title>Teleporting an unknown quantum state via dual classical and Einstein-Podolsky-Rosen channels</article-title>. <source>Phys Rev Lett</source>. <year>1993</year>;<volume>70</volume>(<issue>13</issue>):<fpage>1895</fpage>. doi:<pub-id pub-id-type="doi">10.1103/PhysRevLett.70.1895</pub-id>; <pub-id pub-id-type="pmid">10053414</pub-id>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Botea</surname> <given-names>A</given-names></string-name>, <string-name><surname>Kishimoto</surname> <given-names>A</given-names></string-name>, <string-name><surname>Marinescu</surname> <given-names>R</given-names></string-name></person-group>. <article-title>On the complexity of quantum circuit compilation</article-title>. In: <conf-name>Proceedings of the International Symposium on Combinatorial Search</conf-name>; <year>2018 Jul 14&#x2013;15</year>; <publisher-loc>Stockholm, Sweden</publisher-loc>. Vol. <volume>9</volume>, p. <fpage>138</fpage>&#x2013;<lpage>42</lpage>. doi:<pub-id pub-id-type="doi">10.1609/socs.v9i1.18463</pub-id>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Moro</surname> <given-names>L</given-names></string-name>, <string-name><surname>Paris</surname> <given-names>MG</given-names></string-name>, <string-name><surname>Restelli</surname> <given-names>M</given-names></string-name>, <string-name><surname>Prati</surname> <given-names>E</given-names></string-name></person-group>. <article-title>Quantum compiling by deep reinforcement learning</article-title>. <source>Commun Phys</source>. <year>2021</year>;<volume>4</volume>(<issue>1</issue>):<fpage>178</fpage>. doi:<pub-id pub-id-type="doi">10.1038/s42005-021-00684-3</pub-id>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Zhu</surname> <given-names>P</given-names></string-name>, <string-name><surname>Cheng</surname> <given-names>X</given-names></string-name>, <string-name><surname>Guan</surname> <given-names>Z</given-names></string-name></person-group>. <article-title>An exact qubit allocation approach for NISQ architectures</article-title>. <source>Quantum Inf Process</source>. <year>2020</year>;<volume>19</volume>(<issue>11</issue>):<fpage>391</fpage>. doi:<pub-id pub-id-type="doi">10.1007/s11128-020-02901-4</pub-id>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Houshmand</surname> <given-names>M</given-names></string-name>, <string-name><surname>Mohammadi</surname> <given-names>Z</given-names></string-name>, <string-name><surname>Zomorodi-Moghadam</surname> <given-names>M</given-names></string-name>, <string-name><surname>Houshmand</surname> <given-names>M</given-names></string-name></person-group>. <article-title>An evolutionary approach to optimizing teleportation cost in distributed quantum computation</article-title>. <source>Int J Theor Phys</source>. <year>2020</year>;<volume>59</volume>(<issue>4</issue>):<fpage>1315</fpage>&#x2013;<lpage>29</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s10773-020-04409-0</pub-id>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Daei</surname> <given-names>O</given-names></string-name>, <string-name><surname>Navi</surname> <given-names>K</given-names></string-name>, <string-name><surname>Zomorodi-Moghadam</surname> <given-names>M</given-names></string-name></person-group>. <article-title>Optimized quantum circuit partitioning</article-title>. <source>Int J Theor Phys</source>. <year>2020</year>;<volume>59</volume>(<issue>12</issue>):<fpage>3804</fpage>&#x2013;<lpage>20</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s10773-020-04633-8</pub-id>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Mao</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Liu</surname> <given-names>Y</given-names></string-name>, <string-name><surname>Yang</surname> <given-names>Y</given-names></string-name></person-group>. <article-title>Qubit allocation for distributed quantum computing</article-title>. In: <conf-name>IEEE INFOCOM 2023-IEEE Conference on Computer Communications</conf-name>; <year>2023 May 17&#x2013;20</year>; <publisher-loc>New York City, NY, USA</publisher-loc>. doi:<pub-id pub-id-type="doi">10.1109/INFOCOM53939.2023.10228915</pub-id>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Finigan</surname> <given-names>W</given-names></string-name>, <string-name><surname>Cubeddu</surname> <given-names>M</given-names></string-name>, <string-name><surname>Lively</surname> <given-names>T</given-names></string-name>, <string-name><surname>Flick</surname> <given-names>J</given-names></string-name>, <string-name><surname>Narang</surname> <given-names>P</given-names></string-name></person-group>. <article-title>Qubit allocation for noisy intermediate-scale quantum computers</article-title>. <comment>arXiv:1810.08291. 2018</comment>. doi:<pub-id pub-id-type="doi">10.48550/arXiv.1810.08291</pub-id>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Ash-Saki</surname> <given-names>A</given-names></string-name>, <string-name><surname>Alam</surname> <given-names>M</given-names></string-name>, <string-name><surname>Ghosh</surname> <given-names>S</given-names></string-name></person-group>. <article-title>QURE: qubit re-allocation in noisy intermediate-scale quantum computers</article-title>. In: <conf-name>Proceedings of the 56th Annual Design Automation Conference 2019</conf-name>; <year>2019 Jun 2&#x2013;6</year>; <publisher-loc>Las Vegas, NV, USA</publisher-loc>. p. <fpage>1</fpage>&#x2013;<lpage>6</lpage>. doi:<pub-id pub-id-type="doi">10.1145/3316781.3317888</pub-id>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Davis</surname> <given-names>MG</given-names></string-name>, <string-name><surname>Chung</surname> <given-names>J</given-names></string-name>, <string-name><surname>Englund</surname> <given-names>D</given-names></string-name>, <string-name><surname>Kettimuthu</surname> <given-names>R</given-names></string-name></person-group>. <article-title>Towards distributed quantum computing by qubit and gate graph partitioning techniques</article-title>. <comment>arXiv:2310.03942. 2023</comment>. doi:<pub-id pub-id-type="doi">10.48550/arXiv.2310.03942</pub-id>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Bandic</surname> <given-names>M</given-names></string-name>, <string-name><surname>Prielinger</surname> <given-names>L</given-names></string-name>, <string-name><surname>Nu&#x00DF;lein</surname> <given-names>J</given-names></string-name>, <string-name><surname>Ovide</surname> <given-names>A</given-names></string-name>, <string-name><surname>Rodrigo</surname> <given-names>S</given-names></string-name>, <string-name><surname>Abadal</surname> <given-names>S</given-names></string-name>, <etal>et al</etal></person-group>. <article-title>Mapping quantum circuits to modular architectures with QUBO</article-title>. <comment>arXiv:2305.06687. 2023</comment>. doi:<pub-id pub-id-type="doi">10.1109/QCE57702.2023.00094</pub-id>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Nikahd</surname> <given-names>E</given-names></string-name>, <string-name><surname>Mohammadzadeh</surname> <given-names>N</given-names></string-name>, <string-name><surname>Sedighi</surname> <given-names>M</given-names></string-name>, <string-name><surname>Zamani</surname> <given-names>MS</given-names></string-name></person-group>. <article-title>Automated window-based partitioning of quantum circuits</article-title>. <source>Phys Scr</source>. <year>2021</year>;<volume>96</volume>(<issue>3</issue>):<fpage>035102</fpage>. doi:<pub-id pub-id-type="doi">10.1088/1402-4896/abd57c</pub-id>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Kernighan</surname> <given-names>BW</given-names></string-name>, <string-name><surname>Lin</surname> <given-names>S</given-names></string-name></person-group>. <article-title>An efficient heuristic procedure for partitioning graphs</article-title>. <source>Bell Syst Tech J</source>. <year>1970</year>;<volume>49</volume>(<issue>2</issue>):<fpage>291</fpage>&#x2013;<lpage>307</lpage>. doi:<pub-id pub-id-type="doi">10.1002/j.1538-7305.1970.tb01770.x</pub-id>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Fiduccia</surname> <given-names>CM</given-names></string-name>, <string-name><surname>Mattheyses</surname> <given-names>RM</given-names></string-name></person-group>. <chapter-title>A linear-time heuristic for improving network partitions</chapter-title>. In: <source>Papers on twenty-five years of electronic design automation</source>. <publisher-loc>New York, NY, USA</publisher-loc>: <publisher-name>Association for Computing Machinery</publisher-name>; <year>1988</year>. p. <fpage>241</fpage>&#x2013;<lpage>7</lpage>. doi:<pub-id pub-id-type="doi">10.1145/62882.62910</pub-id>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>Karypis</surname> <given-names>G</given-names></string-name>, <string-name><surname>Kumar</surname> <given-names>V</given-names></string-name></person-group>. <article-title>Multilevel k-way hypergraph partitioning</article-title>. In: <conf-name>Proceedings of the 36th Annual ACM/IEEE Design Automation Conference</conf-name>; <year>1999 Jun 21&#x2013;25</year>; <publisher-loc>New Orleans, LA, USA</publisher-loc>. p. <fpage>343</fpage>&#x2013;<lpage>8</lpage>. doi:<pub-id pub-id-type="doi">10.1145/309847.309954</pub-id>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Childs</surname> <given-names>AM</given-names></string-name>, <string-name><surname>Schoute</surname> <given-names>E</given-names></string-name>, <string-name><surname>Unsal</surname> <given-names>CM</given-names></string-name></person-group>. <article-title>Circuit transformations for quantum architectures</article-title>. <comment>arXiv:1902.09102. 2019</comment>. doi:<pub-id pub-id-type="doi">10.48550/arXiv.1902.09102</pub-id>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Abelem</surname> <given-names>A</given-names></string-name>, <string-name><surname>Towsley</surname> <given-names>D</given-names></string-name>, <string-name><surname>Vardoyan</surname> <given-names>G</given-names></string-name></person-group>. <article-title>Quantum internet: the future of internetworking</article-title>. <comment>arXiv:2305.00598. 2023</comment>. doi:<pub-id pub-id-type="doi">10.48550/arXiv.2305.00598</pub-id>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><surname>Katoch</surname> <given-names>S</given-names></string-name>, <string-name><surname>Chauhan</surname> <given-names>SS</given-names></string-name>, <string-name><surname>Kumar</surname> <given-names>V</given-names></string-name></person-group>. <article-title>A review on genetic algorithm: past, present, and future</article-title>. <source>Multimed Tools Appl</source>. <year>2021</year>;<volume>80</volume>(<issue>5</issue>):<fpage>8091</fpage>&#x2013;<lpage>126</lpage>. doi:<pub-id pub-id-type="doi">10.1007/s11042-020-10139-6</pub-id>; <pub-id pub-id-type="pmid">33162782</pub-id>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Schrijver</surname> <given-names>A</given-names></string-name></person-group>. <source>Combinatorial optimization: polyhedra and efficiency</source>. <publisher-loc>Berlin/Heidelberg, Germany</publisher-loc>: <publisher-name>Springer</publisher-name>; <year>2003</year>. Vol. <volume>24</volume>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><surname>De Rainville</surname> <given-names>FM</given-names></string-name>, <string-name><surname>Fortin</surname> <given-names>FA</given-names></string-name>, <string-name><surname>Gardner</surname> <given-names>MA</given-names></string-name>, <string-name><surname>Parizeau</surname> <given-names>M</given-names></string-name>, <string-name><surname>Gagne</surname> <given-names>C</given-names></string-name></person-group>. <article-title>DEAP: a Python framework for evolutionary algorithms</article-title>. In: <conf-name>Proceedings of the 14th Annual Conference Companion on Genetic and Evolutionary Computation</conf-name>; <year>2012 Jul 7&#x2013;11</year>; <publisher-loc>Philadelphia, PA, USA</publisher-loc>. p. <fpage>85</fpage>&#x2013;<lpage>92</lpage>. doi:<pub-id pub-id-type="doi">10.1145/2330784.2330799</pub-id>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><string-name><surname>Cross</surname> <given-names>AW</given-names></string-name>, <string-name><surname>Bishop</surname> <given-names>LS</given-names></string-name>, <string-name><surname>Smolin</surname> <given-names>JA</given-names></string-name>, <string-name><surname>Gambetta</surname> <given-names>JM</given-names></string-name></person-group>. <article-title>Open quantum assembly language</article-title>. <comment>arXiv:1707.03429. 2017</comment>. doi:<pub-id pub-id-type="doi">10.48550/arXiv.1707.03429</pub-id>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><surname>Hagberg</surname> <given-names>A</given-names></string-name>, <string-name><surname>Swart</surname> <given-names>P</given-names></string-name>, <string-name><surname>Schult</surname> <given-names>DA</given-names></string-name></person-group>. <source>Exploring network structure, dynamics, and function using NetworkX</source>. <publisher-loc>Los Alamos, NM, USA</publisher-loc>: <publisher-name>Los Alamos National Lab. (LANL)</publisher-name>; <year>2008</year>. doi:<pub-id pub-id-type="doi">10.25080/TCWV9851</pub-id>.</mixed-citation></ref>
<ref id="ref-33"><label>[33]</label><mixed-citation publication-type="other"><person-group person-group-type="author"><collab>PNNL</collab></person-group>. <article-title>QASMBench: a low-level OpenQASM benchmark suite for NISQ evaluation and simulation</article-title> [Internet]. [cited 2025 Jan 1]. Available from: <ext-link ext-link-type="uri" xlink:href="https://github.com/pnnl/QASMBench">https://github.com/pnnl/QASMBench</ext-link>.</mixed-citation></ref>
</ref-list>
</back></article>