<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Publishing DTD v1.1 20151215//EN" "http://jats.nlm.nih.gov/publishing/1.1/JATS-journalpublishing1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" article-type="research-article" dtd-version="1.1">
<front>
<journal-meta>
<journal-id journal-id-type="pmc">CMC</journal-id>
<journal-id journal-id-type="nlm-ta">CMC</journal-id>
<journal-id journal-id-type="publisher-id">CMC</journal-id>
<journal-title-group>
<journal-title>Computers, Materials &#x0026; Continua</journal-title>
</journal-title-group>
<issn pub-type="epub">1546-2226</issn>
<issn pub-type="ppub">1546-2218</issn>
<publisher>
<publisher-name>Tech Science Press</publisher-name>
<publisher-loc>USA</publisher-loc>
</publisher>
</journal-meta>
<article-meta>
<article-id pub-id-type="publisher-id">32849</article-id>
<article-id pub-id-type="doi">10.32604/cmc.2023.032849</article-id>
<article-categories>
<subj-group subj-group-type="heading">
<subject>Article</subject>
</subj-group>
</article-categories>
<title-group>
<article-title>A Coprocessor Architecture for 80/112-bit Security Related Applications</article-title>
<alt-title alt-title-type="left-running-head">A Coprocessor Architecture for 80/112-bit Security Related Applications</alt-title>
<alt-title alt-title-type="right-running-head">A Coprocessor Architecture for 80/112-bit Security Related Applications</alt-title>
</title-group>
<contrib-group content-type="authors">
<contrib id="author-1" contrib-type="author" corresp="yes">
<name name-style="western"><surname>Rashid</surname><given-names>Muhammad</given-names></name><email>mfelahi@uqu.edu.sa</email></contrib>
<contrib id="author-2" contrib-type="author">
<name name-style="western"><surname>Alotaibi</surname><given-names>Majid</given-names></name></contrib>
<aff id="aff-1"><label>1</label><institution>Department of Computer Engineering, Umm Al-Qura University</institution>, <addr-line>Makkah, 21955</addr-line>, <country>Saudi Arabia</country></aff>
</contrib-group>
<author-notes>
<corresp id="cor1"><label>&#x002A;</label>Corresponding Author: Muhammad Rashid. Email: <email>mfelahi@uqu.edu.sa</email></corresp>
</author-notes>
<pub-date publication-format="print" date-type="pub" iso-8601-date="2022-12-15"><day>15</day>
<month>12</month>
<year>2022</year></pub-date>
<volume>74</volume>
<issue>3</issue>
<fpage>6849</fpage>
<lpage>6865</lpage>
<history>
<date date-type="received"><day>31</day><month>5</month><year>2022</year></date>
<date date-type="accepted"><day>09</day><month>9</month><year>2022</year></date>
</history>
<permissions>
<copyright-statement>&#x00A9; 2023 Rashid and Alotaibi</copyright-statement>
<copyright-year>2023</copyright-year>
<copyright-holder>Rashid and Alotaibi</copyright-holder>
<license xlink:href="https://creativecommons.org/licenses/by/4.0/">
<license-p>This work is licensed under a <ext-link ext-link-type="uri" xlink:type="simple" xlink:href="https://creativecommons.org/licenses/by/4.0/">Creative Commons Attribution 4.0 International License</ext-link>, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</license-p>
</license>
</permissions>
<self-uri content-type="pdf" xlink:href="TSP_CMC_32849.pdf"></self-uri>
<abstract>
<p>We propose a flexible key-authentication coprocessor architecture for 80/112-bit security-related applications over the <inline-formula id="ieqn-1"><mml:math id="mml-ieqn-1"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> field by employing the Elliptic-curve Diffie Hellman (ECDH) protocol. For flexibility, a serial input/output interface is used to load/produce secret, public, and shared keys sequentially. Moreover, to reduce the hardware resources and to achieve a reasonable time for cryptographic computations, we propose a finite field digit-serial multiplier architecture using combined shift and accumulate techniques. Furthermore, two finite-state-machine controllers are used to perform efficient control functionalities. The proposed coprocessor architecture over <inline-formula id="ieqn-2"><mml:math id="mml-ieqn-2"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-3"><mml:math id="mml-ieqn-3"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is described in Verilog and then implemented on a Xilinx Virtex-7 field-programmable-gate-array (FPGA) device. 
For <inline-formula id="ieqn-4"><mml:math id="mml-ieqn-4"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-5"><mml:math id="mml-ieqn-5"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, the proposed flexible coprocessor uses 1351 and 1789 slices, achieves clock frequencies of 250 and 235&#x2005;<italic>MHz</italic>, computes one public key in 40.50 and 79.20&#x2005;&#x03BC;s, and generates one shared key in 81.00 and 158.40&#x2005;&#x03BC;s, respectively. Similarly, the consumed power over <inline-formula id="ieqn-6"><mml:math id="mml-ieqn-6"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-7"><mml:math id="mml-ieqn-7"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is 0.91 and 1.37&#x2005;<italic>mW</italic>, respectively. The proposed coprocessor architecture outperforms state-of-the-art ECDH designs in terms of hardware resources.</p>
</abstract>
<kwd-group kwd-group-type="author">
<kwd>Coprocessor</kwd>
<kwd>design</kwd>
<kwd>key-authentication</kwd>
<kwd>wireless sensor nodes</kwd>
<kwd>RFID</kwd>
<kwd>ECDH</kwd>
<kwd>FPGA</kwd>
</kwd-group>
</article-meta>
</front>
<body>
<sec id="s1"><label>1</label><title>Introduction</title>
<p>With the rapid growth of technology, millions of users interact with the internet through IoT devices, and this enormous connectivity raises security threats [<xref ref-type="bibr" rid="ref-1">1</xref>&#x2013;<xref ref-type="bibr" rid="ref-3">3</xref>]. Security services can be provided by either symmetric or asymmetric (public-key) cryptographic algorithms. Comparatively, the latter offers increased security because two distinct keys are involved in the cryptographic computation(s) [<xref ref-type="bibr" rid="ref-2">2</xref>]. In contrast, symmetric algorithms/protocols need a single key. Moreover, each cryptographic algorithm (symmetric or public-key) uses different message and key lengths to achieve a given security level [<xref ref-type="bibr" rid="ref-4">4</xref>,<xref ref-type="bibr" rid="ref-5">5</xref>]. For 80-bit symmetric-key security, Rivest-Shamir-Adleman (RSA) and Elliptic Curve Cryptography (ECC) require 1024-bit and 160-bit key lengths, respectively [<xref ref-type="bibr" rid="ref-4">4</xref>,<xref ref-type="bibr" rid="ref-5">5</xref>]. Similarly, for 112-bit security, RSA and ECC require 2048-bit and 224-bit key lengths. For security equivalent to AES-128, RSA and ECC need 3072-bit and 256-bit key lengths. Consequently, for a similar security level, ECC is an attractive option as it offers lower bandwidth, lower computational/processing effort, lower power consumption, and lower area cost [<xref ref-type="bibr" rid="ref-6">6</xref>].</p>
<p>ECC is commonly described by a four-layer model. The uppermost layer, known as the protocol layer, determines the execution of (i) encryption/decryption, (ii) signature-generation/verification, (iii) key-authentication, etc. For the computation of these operations, the most frequently used protocols are Elliptic-curve Diffie Hellman (ECDH) [<xref ref-type="bibr" rid="ref-7">7</xref>], Elliptic-curve Digital Signature Algorithm (ECDSA) [<xref ref-type="bibr" rid="ref-8">8</xref>] and Elliptic-curve Menezes Qu&#x2013;Vanstone (ECMQV) [<xref ref-type="bibr" rid="ref-9">9</xref>]. The ECMQV, ECDSA and ECDH protocols are responsible for encryption/decryption, signature-generation/verification and key-authentication, respectively. To implement these protocols (ECDSA, ECDH and ECMQV), point multiplication (PM), the third layer of the ECC model, must be executed. Moreover, in Elliptic curves, PM is the most computationally intensive operation [<xref ref-type="bibr" rid="ref-6">6</xref>,<xref ref-type="bibr" rid="ref-10">10</xref>&#x2013;<xref ref-type="bibr" rid="ref-13">13</xref>]. The implementation of PM depends on the layer two operations, i.e., point addition (PA) and point doubling (PD). These operations (PA and PD) in turn depend on layer one, whose operations are finite field (FF) addition, multiplication, squaring, inversion and reduction.</p>
<p>In addition to the ECC layer model, the prime, i.e., <inline-formula id="ieqn-8"><mml:math id="mml-ieqn-8"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>P</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, and binary, i.e., <inline-formula id="ieqn-9"><mml:math id="mml-ieqn-9"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, fields are available choices for implementations, where <italic>m</italic> denotes the field size or supported key length. Comparatively, the prime field is more appealing for software implementations (e.g., on microcontrollers), while binary fields are preferred because they are efficiently accelerated on hardware platforms such as field-programmable-gate-arrays (FPGAs) and application-specific-integrated-circuits (ASICs) [<xref ref-type="bibr" rid="ref-6">6</xref>,<xref ref-type="bibr" rid="ref-11">11</xref>&#x2013;<xref ref-type="bibr" rid="ref-15">15</xref>]. Because FPGAs offer reconfigurability, ease of availability in the market, low development cost, etc., we have selected the <inline-formula id="ieqn-10"><mml:math id="mml-ieqn-10"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> field for implementation on FPGA in this work.</p>
<p>Over the <inline-formula id="ieqn-11"><mml:math id="mml-ieqn-11"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> field, the National Institute of Standards and Technology (NIST) [<xref ref-type="bibr" rid="ref-16">16</xref>] has defined various key lengths, i.e., 163, 233, 283, 409, and 571, for implementations. NIST is an American organization that is responsible for standardizing new cryptographic primitives to ensure secure communications. The 163-bit and 233-bit key lengths are sufficient to secure applications that require 80-bit or 112-bit security [<xref ref-type="bibr" rid="ref-4">4</xref>,<xref ref-type="bibr" rid="ref-17">17</xref>]. Therefore, the objective of this work is to protect the cryptographic applications that require 80/112-bit security by designing and implementing an Elliptic-curve processor for key-authentication using ECDH over <inline-formula id="ieqn-12"><mml:math id="mml-ieqn-12"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> with <inline-formula id="ieqn-13"><mml:math id="mml-ieqn-13"><mml:mi>m</mml:mi><mml:mo>=</mml:mo><mml:mn>163</mml:mn></mml:math></inline-formula> and <inline-formula id="ieqn-14"><mml:math id="mml-ieqn-14"><mml:mn>233</mml:mn></mml:math></inline-formula> on FPGA.</p>
<p>Several applications demand higher security. One example is the fourth industrial revolution (also named industry 4.0), which brings rapid growth in technology, industries and societal patterns due to the demand for increasing interconnectivity of several devices over the unsecured internet. Moreover, industry 4.0 emphasizes the automation of numerous applications to facilitate daily human life [<xref ref-type="bibr" rid="ref-14">14</xref>]. More specifically, in the case of digitalization, automation requires higher security, e.g., key authentication or key agreement. For example, for radio-frequency-identification-network (RFID) applications, key authentication is essential when scanning the bar codes on different products in shopping malls [<xref ref-type="bibr" rid="ref-18">18</xref>&#x2013;<xref ref-type="bibr" rid="ref-22">22</xref>]. Automotive vehicles are another application where authentication is critical to start secure communication [<xref ref-type="bibr" rid="ref-23">23</xref>]. Generally, these include intra- or inter-mobile communications with several devices, e.g., vehicle-to-phone, vehicle-to-vehicle, phone-to-phone, etc. The term intra refers to wired/wireless communication inside the sensing network, while inter refers to communication with embedded devices outside the sensing network. <xref ref-type="fig" rid="fig-1">Fig. 1</xref> illustrates the intra-mobile connectivity of several devices, where Node1, Node2 and Node3 are the wireless sensor nodes (WSN) that connect several embedded devices with the gateway.</p>
<fig id="fig-1"><label>Figure 1</label><caption><title>Intra-mobile connectivity of several devices [<xref ref-type="bibr" rid="ref-23">23</xref>]</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_32849-fig-1.png"/></fig>
<p>To achieve higher security, hardware-based implementations are more suitable than software-based implementations. For instance, an ECC design is described in [<xref ref-type="bibr" rid="ref-17">17</xref>] where an FPGA-based sensor node is presented. The authors have targeted prime and binary fields with supported key lengths of 192 and 163, respectively. Moreover, their design is compliant with the IEEE802.15.4 standard. To reduce the hardware resources, they have reused the embedded resources of the utilized FPGA, i.e., Xilinx Artix-7.</p>
<p>Some ASIC and FPGA designs of ECC for RFID applications are described in [<xref ref-type="bibr" rid="ref-18">18</xref>,<xref ref-type="bibr" rid="ref-21">21</xref>]. In [<xref ref-type="bibr" rid="ref-18">18</xref>], an efficient architecture of ECC over <inline-formula id="ieqn-15"><mml:math id="mml-ieqn-15"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> for RFID applications is discussed. The synthesis results are reported for UMC 0.13&#x2005;&#x03BC;m Complementary Metal Oxide Semiconductor (CMOS) technology. Several optimization techniques are used: (i) a new finite field inversion method is adopted to minimize the hardware resources, (ii) a coordinate-changing technique is used to reduce the complexity and the computational time, (iii) a shift register design is used to minimize the area of the employed register files, and (iv) clock gating is used to reduce the power consumption. Recently in [<xref ref-type="bibr" rid="ref-21">21</xref>], an ECC-based processor over <inline-formula id="ieqn-16"><mml:math id="mml-ieqn-16"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> for RFID applications, aimed at low latency and low area, is presented. Flexibility is a further strength of their design. They have used three shift buffers to serially load the input parameters for two purposes: first to achieve low latency and then to provide flexibility. Moreover, the area is further optimized by reusing the hardware in the inversion computation. The synthesis results are reported on various 7-series FPGA devices.</p>
<p>ECC-based hardware accelerators specific to wireless sensor nodes on ASIC and FPGA platforms are described in [<xref ref-type="bibr" rid="ref-24">24</xref>&#x2013;<xref ref-type="bibr" rid="ref-28">28</xref>]. In [<xref ref-type="bibr" rid="ref-24">24</xref>], a new ECC-based protocol, together with a coprocessor hardware design for key distribution in wireless sensor nodes, is presented over <inline-formula id="ieqn-17"><mml:math id="mml-ieqn-17"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. Moreover, an 8-bit serial interface is discussed to load/collect the inputs/outputs to/from the coprocessor design. On a Spartan-6 FPGA device, their coprocessor architecture takes 33.6&#x2005;&#x03BC;s for one PM computation running at 33.3&#x2005;<italic>MHz</italic>. Similarly, a flexible design for several NIST recommended curves (substituting the reduction unit) is proposed in [<xref ref-type="bibr" rid="ref-25">25</xref>]. This partial reconfiguration provides the flexibility of their design and is accomplished on a Spartan-3 FPGA device over the <inline-formula id="ieqn-18"><mml:math id="mml-ieqn-18"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-19"><mml:math id="mml-ieqn-19"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>571</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> fields. They have connected standard motes with the FPGA for visualization purposes while performing the actual cryptographic computations on the standard motes. 
An ECC-based integrated hardware architecture for wireless sensor nodes is presented in [<xref ref-type="bibr" rid="ref-26">26</xref>] over <inline-formula id="ieqn-20"><mml:math id="mml-ieqn-20"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> on a Kintex-7 FPGA. Apart from using a Secure Hash Algorithm (SHA) or the Advanced Encryption Standard (AES) for authentication purposes, their design implements an Elliptic-curve based message authentication code (MAC) for efficient reuse of FPGA resources. Similarly, a PM implementation of ECC over <inline-formula id="ieqn-21"><mml:math id="mml-ieqn-21"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>112</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-22"><mml:math id="mml-ieqn-22"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> on different FPGA devices is provided in [<xref ref-type="bibr" rid="ref-27">27</xref>], where a Montgomery PM algorithm is employed for securing WSN. Recently in [<xref ref-type="bibr" rid="ref-28">28</xref>], an Ed25519 curve (an Edwards curve, a specialized form of Elliptic curves) is utilized to implement the ECDH operation for secure key-agreement over <inline-formula id="ieqn-23"><mml:math id="mml-ieqn-23"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mi>P</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> with <inline-formula id="ieqn-24"><mml:math id="mml-ieqn-24"><mml:mi>P</mml:mi><mml:mo>=</mml:mo><mml:mn>160</mml:mn></mml:math></inline-formula> on two distinct nodes of MoTE-ECC.</p>
<p>The most recent published ECC designs for securing several other cryptographic applications are described in [<xref ref-type="bibr" rid="ref-11">11</xref>,<xref ref-type="bibr" rid="ref-13">13</xref>,<xref ref-type="bibr" rid="ref-29">29</xref>,<xref ref-type="bibr" rid="ref-30">30</xref>]. In [<xref ref-type="bibr" rid="ref-11">11</xref>], a two-stage pipelined design is reported over <inline-formula id="ieqn-25"><mml:math id="mml-ieqn-25"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-26"><mml:math id="mml-ieqn-26"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> for PM execution to secure cryptographic applications, such as smart cards, that require an optimal throughput and low area. Here, optimal throughput means executing the cryptographic operation in a reasonable time. Pipeline registers are employed to shorten the critical path of their design and thereby improve the operational frequency, which results in lower computational time. A reduced-area ECC design using the Lopez Dahab algorithm over <inline-formula id="ieqn-27"><mml:math id="mml-ieqn-27"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> on various FPGA devices is described in [<xref ref-type="bibr" rid="ref-13">13</xref>]. Recently, in [<xref ref-type="bibr" rid="ref-29">29</xref>], a Number Theoretic Transform (NTT) is utilized to enhance the performance of the PM process. 
A highly efficient design for 8-bit AVR-based sensor nodes is presented in [<xref ref-type="bibr" rid="ref-30">30</xref>].</p>
<p>Existing hardware accelerators of ECC for wireless sensor nodes and RFID applications concentrate on optimizing hardware resources and decreasing power consumption [<xref ref-type="bibr" rid="ref-13">13</xref>,<xref ref-type="bibr" rid="ref-18">18</xref>,<xref ref-type="bibr" rid="ref-21">21</xref>,<xref ref-type="bibr" rid="ref-24">24</xref>&#x2013;<xref ref-type="bibr" rid="ref-28">28</xref>]. A schoolbook multiplication method is frequently employed in the literature as it reduces the hardware resources and achieves lower power consumption. Besides minimal hardware resources and low power consumption, the computational time (latency or throughput) is also important so that the cryptographic keys are exchanged in a reasonable time. It is important to note that the performance of the polynomial multiplier determines the performance of the entire ECDH protocol, as the protocol requires frequent polynomial multiplications. The commonly used Karatsuba multiplier, as employed in [<xref ref-type="bibr" rid="ref-13">13</xref>], requires more hardware resources and is therefore not feasible for wireless sensor nodes and RFID applications. The schoolbook multiplication method of [<xref ref-type="bibr" rid="ref-21">21</xref>] is expensive in terms of computational time. Therefore, a multiplier that achieves both low area and high performance is needed to meet the requirements of wireless sensor nodes and RFID-related applications. To address these issues, our contributions are as follows:
<list list-type="simple">
<list-item><label>&#x02022;</label><p><bold>Coprocessor architecture:</bold> We have proposed a key-authentication coprocessor architecture for 80/112-bit security-related applications over <inline-formula id="ieqn-28"><mml:math id="mml-ieqn-28"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-29"><mml:math id="mml-ieqn-29"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> using the ECDH protocol.</p></list-item>
<list-item><label>&#x02022;</label><p><bold>Flexibility:</bold> In our proposed coprocessor architecture, the flexibility is offered using a serial interface by placing input/output buffers to load/produce <italic>x</italic> and <italic>y</italic> coordinates of secret, public, and shared keys sequentially (bit-by-bit).</p></list-item>
<list-item><label>&#x02022;</label><p><bold>Polynomial multiplication architecture:</bold> To reduce the hardware resources and to achieve a reasonable time for cryptographic computations, we have proposed a finite field digit-serial multiplier architecture over <inline-formula id="ieqn-30"><mml:math id="mml-ieqn-30"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> field using a shift and accumulate technique. Our digit-serial multiplication operates on a digit length of 24-bits.</p></list-item>
<list-item><label>&#x02022;</label><p><bold>Control blocks:</bold> Finally, two finite-state-machines (FSM) are implemented to efficiently compute the public and shared keys.</p></list-item>
</list></p>
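<p>The digit-serial shift and accumulate idea named in the contributions can be illustrated with a short software model. This is only a sketch under stated assumptions, not the hardware datapath of the proposed coprocessor: the function names are hypothetical, the operand <italic>b</italic> is consumed most-significant digit first, and the NIST reduction polynomial for the 163-bit binary field is used as the default modulus.</p>
<code language="python"><![CDATA[
```python
# Hypothetical software model of a digit-serial GF(2^m) multiplier.
# A polynomial is stored as a Python int: bit i is the coefficient of x^i.

M = 163
# NIST reduction polynomial for the 163-bit binary field:
# x^163 + x^7 + x^6 + x^3 + 1
POLY = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1
DIGIT = 24  # digit length stated in the text


def reduce_poly(value, m=M, poly=POLY):
    """Reduce a binary polynomial modulo the field polynomial."""
    for bit in range(value.bit_length() - 1, m - 1, -1):
        if (value >> bit) & 1:
            value ^= poly << (bit - m)
    return value


def mul_digit(a, digit, d=DIGIT):
    """Carry-less product of polynomial a with one d-bit digit."""
    acc = 0
    for j in range(d):
        if (digit >> j) & 1:
            acc ^= a << j
    return acc


def digit_serial_mul(a, b, m=M, poly=POLY, d=DIGIT):
    """Multiply a by b, consuming one d-bit digit of b per iteration."""
    num_digits = -(-m // d)  # ceil(m / d)
    mask = (1 << d) - 1
    acc = 0
    for i in range(num_digits - 1, -1, -1):
        digit = (b >> (i * d)) & mask
        acc = reduce_poly(acc << d, m, poly)   # shift by one digit
        acc ^= mul_digit(a, digit, d)          # accumulate partial product
        acc = reduce_poly(acc, m, poly)
    return acc


def bit_serial_mul(a, b, m=M, poly=POLY):
    """Reference schoolbook product, used only for cross-checking."""
    acc = 0
    for i in range(m):
        if (b >> i) & 1:
            acc ^= a << i
    return reduce_poly(acc, m, poly)
```
]]></code>
<p>With a 24-bit digit, this model needs 7 iterations for <italic>m</italic> = 163 and 10 iterations for <italic>m</italic> = 233, compared with <italic>m</italic> iterations for a bit-serial schoolbook loop; the per-iteration logic grows with the digit size, which is the area/time trade-off the digit-serial approach targets.</p>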
<p>The proposed coprocessor architecture is described in Verilog and then implemented on a Xilinx Virtex-7 FPGA. Over <inline-formula id="ieqn-31"><mml:math id="mml-ieqn-31"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-32"><mml:math id="mml-ieqn-32"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, the proposed coprocessor architecture uses 1351 and 1789 slices and the maximum operational frequency is 250 and 235&#x2005;MHz, respectively. Similarly, the time required to compute one public key is 40.50 and 79.20&#x2005;&#x03BC;s and the time for one shared key generation is 81.00 and 158.40&#x2005;&#x03BC;s. The power consumption of our architecture over <inline-formula id="ieqn-33"><mml:math id="mml-ieqn-33"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-34"><mml:math id="mml-ieqn-34"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is 0.91 and 1.37&#x2005;<italic>mW</italic>, respectively. The achieved results show that the proposed architecture is suitable for securing applications that require 80/112-bit protection.</p>
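<p>The reported timings can be related to clock cycles with simple arithmetic. The sketch below assumes, which the text does not state explicitly, that each reported time was measured at the reported maximum clock frequency; the helper name and the table layout are hypothetical.</p>
<code language="python"><![CDATA[
```python
# Figures quoted in the text, keyed by field size m:
# (frequency in MHz, public-key time in microseconds, shared-key time in microseconds)
REPORTED = {
    163: (250, 40.50, 81.00),
    233: (235, 79.20, 158.40),
}


def implied_cycles(freq_mhz, time_us):
    """Clock cycles implied by a frequency and a duration (MHz * us = cycles)."""
    return round(freq_mhz * time_us)


def summarize(reported=REPORTED):
    """Return {m: (public-key cycles, shared-key cycles)} under the stated assumption."""
    return {
        m: (implied_cycles(f, t_pub), implied_cycles(f, t_shared))
        for m, (f, t_pub, t_shared) in reported.items()
    }
```
]]></code>
<p>Under that assumption, the public-key computation corresponds to 10125 cycles for <italic>m</italic> = 163 and 18612 cycles for <italic>m</italic> = 233, and the shared-key figures are exactly twice these values.</p>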
<p>The rest of this article is organized as follows. Section 2 presents the relevant background. The proposed coprocessor architecture is presented in Section 3. The achieved results and performance comparison are discussed in Section 4. The article is concluded in Section 5.</p>
</sec>
<sec id="s2"><label>2</label><title>Related Background</title>
<p>This section describes the essential mathematical background required for the computation of operations of ECC.</p>
<p><underline>Key-authentication protocol (ECDH):</underline> As discussed in Section 1, the ECDH protocol (associated with the uppermost layer of the ECC model) is required to perform key agreement or key-authentication between two sensor nodes. Consider, as an example of the key agreement mechanism, the three nodes (Node1, Node2 and Node3) shown in <xref ref-type="fig" rid="fig-1">Fig. 1</xref>. If Node1 wants to communicate with Node2, the ECDH steps are: (i) Node1 and Node2 use the same ECC configuration for the required setup, (ii) each node (Node1 and Node2) computes a PM to generate its public key, (iii) the two nodes (Node1 and Node2 in this example) exchange the generated public keys, and (iv) each node computes a second PM to generate the shared key. For mathematical structures and additional descriptions of the ECDH protocol, we refer readers/designers to [<xref ref-type="bibr" rid="ref-6">6</xref>,<xref ref-type="bibr" rid="ref-14">14</xref>].</p>
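<p>The four ECDH steps can be traced in a small software model. The sketch below is a toy, not the proposed hardware: it assumes a deliberately tiny field GF(2<sup>8</sup>) with the polynomial x<sup>8</sup> + x<sup>4</sup> + x<sup>3</sup> + x + 1, a hypothetical curve y<sup>2</sup> + xy = x<sup>3</sup> + x<sup>2</sup> + 1, made-up secret keys, and plain double-and-add in affine coordinates rather than the Montgomery algorithm adopted in this work. All function names are illustrative.</p>
<code language="python"><![CDATA[
```python
# Toy ECDH over a tiny binary field (illustrative parameters, not secure).
M, POLY = 8, 0x11B      # GF(2^8), f(x) = x^8 + x^4 + x^3 + x + 1
A, B = 1, 1             # curve: y^2 + xy = x^3 + A*x^2 + B


def fmul(a, b):
    """Field multiplication: shift-and-add with interleaved reduction."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        if a >> M:
            a ^= POLY
        b >>= 1
    return r


def finv(a):
    """Field inversion as a^(2^M - 2) by square-and-multiply."""
    r, e = 1, 2 ** M - 2
    while e:
        if e & 1:
            r = fmul(r, a)
        a = fmul(a, a)
        e >>= 1
    return r


def on_curve(P):
    if P is None:                      # None models the point at infinity
        return True
    x, y = P
    x2 = fmul(x, x)
    return fmul(y, y) ^ fmul(x, y) == fmul(x2, x) ^ fmul(A, x2) ^ B


def add(P, Q):
    """Affine point addition and doubling on the binary curve."""
    if P is None:
        return Q
    if Q is None:
        return P
    (x1, y1), (x2, y2) = P, Q
    if x1 == x2:
        if y1 != y2 or x1 == 0:        # Q = -P, or P has order 2
            return None
        lam = x1 ^ fmul(y1, finv(x1))  # doubling slope
        x3 = fmul(lam, lam) ^ lam ^ A
        return (x3, fmul(x1, x1) ^ fmul(lam, x3) ^ x3)
    lam = fmul(y1 ^ y2, finv(x1 ^ x2))
    x3 = fmul(lam, lam) ^ lam ^ x1 ^ x2 ^ A
    return (x3, fmul(lam, x1 ^ x3) ^ x3 ^ y1)


def scalar_mult(k, P):
    """Point multiplication k*P by left-to-right double-and-add."""
    R = None
    for bit in bin(k)[2:]:
        R = add(R, R)
        if bit == "1":
            R = add(R, P)
    return R


def find_base_point(min_order=50):
    """Step (i): search for a shared base point of reasonably large order."""
    for x in range(1, 2 ** M):
        for y in range(2 ** M):
            P = (x, y)
            if not on_curve(P):
                continue
            order, R = 1, P
            while R is not None:
                R, order = add(R, P), order + 1
            if order >= min_order:
                return P, order
    raise ValueError("no suitable base point found")


G, n = find_base_point()
d1, d2 = 13, 29                                      # hypothetical secret keys
Q1, Q2 = scalar_mult(d1, G), scalar_mult(d2, G)      # step (ii): public keys
# step (iii): Node1 sends Q1 to Node2 and receives Q2
S1, S2 = scalar_mult(d1, Q2), scalar_mult(d2, Q1)    # step (iv): shared keys
```
]]></code>
<p>Both nodes arrive at the same point because d1&#x00B7;(d2&#x00B7;G) and d2&#x00B7;(d1&#x00B7;G) are the same multiple of G; each node performs two point multiplications in total, one per key.</p>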
<p><underline>Point multiplication over <inline-formula id="ieqn-35"><mml:math id="mml-ieqn-35"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>:</underline> Each layer in ECC requires different algorithms for implementation. PM computes <italic>Q</italic> = <italic>k</italic>.<italic>P</italic>, i.e., the addition of <italic>k</italic> copies of a point <italic>P</italic>, through a sequence of PA and PD operations, where <italic>k</italic> is the scalar (key). Several PM algorithms are available in the literature. According to [<xref ref-type="bibr" rid="ref-14">14</xref>,<xref ref-type="bibr" rid="ref-15">15</xref>,<xref ref-type="bibr" rid="ref-28">28</xref>,<xref ref-type="bibr" rid="ref-30">30</xref>,<xref ref-type="bibr" rid="ref-31">31</xref>], the Double and Add algorithm is more convenient for unified models of ECC, e.g., Edwards, Huff, Twisted Edwards, etc. The Lopez Dahab PM algorithm is an attractive choice for achieving instruction-level parallelism for performance improvement. Because the Montgomery PM algorithm uses similar finite field operations for the computation of PA and PD, it is suitable for side-channel resistant implementations of ECC. A comparison of various PM algorithms is presented in [<xref ref-type="bibr" rid="ref-2">2</xref>]. In short, we have preferred the Montgomery PM algorithm (Algorithm 1) to target a side-channel-attack-protected hardware implementation of ECC for wireless sensor nodes and RFID applications.</p>
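<p>The regular structure that makes the Montgomery algorithm attractive can be seen in a short sketch of the ladder loop. This is a generic illustration under stated assumptions: the group operations are passed in as the hypothetical callables <italic>add</italic> and <italic>double</italic>, and plain integers stand in for curve points so that the result is easy to check; the projective-coordinate formulas of Algorithm 1 are not modelled here.</p>
<code language="python"><![CDATA[
```python
def montgomery_ladder(k, P, add, double):
    """Compute k*P for k >= 1 using one add and one double per key bit.

    add(a, b) and double(a) are the group operations (assumed callables).
    """
    bits = bin(k)[2:]             # most significant bit first, bits[0] == "1"
    r1, r2 = P, double(P)         # invariant: r2 - r1 == P
    for bit in bits[1:]:
        if bit == "1":
            r1, r2 = add(r1, r2), double(r2)
        else:
            r1, r2 = double(r1), add(r1, r2)
    return r1


def int_add(a, b):
    """Stand-in group operation: integer addition."""
    return a + b


def int_double(a):
    """Stand-in group operation: integer doubling."""
    return 2 * a
```
]]></code>
<p>For example, <italic>montgomery_ladder(k, 7, int_add, int_double)</italic> returns 7<italic>k</italic>. Both branches execute exactly one addition and one doubling, so the sequence of operations does not depend on the key bits; only the roles of the two registers are swapped.</p>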
<statement id="st1" content-type="algorithm">
<label>Algorithm 1:</label>
<title>Montgomery ECPM Algorithm [<xref ref-type="bibr" rid="ref-11">11</xref>]</title>
<p><bold>Input:</bold> <inline-formula id="ieqn-36"><mml:math id="mml-ieqn-36"><mml:mi>k</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>k</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>k</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>k</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> with <inline-formula id="ieqn-37"><mml:math id="mml-ieqn-37"><mml:msub><mml:mi>k</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula>, <inline-formula id="ieqn-38"><mml:math id="mml-ieqn-38"><mml:mi>P</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> &#x220A; <inline-formula id="ieqn-39"><mml:math id="mml-ieqn-39"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> <bold>Output:</bold> <inline-formula id="ieqn-40"><mml:math id="mml-ieqn-40"><mml:mi>Q</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>q</mml:mi></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>q</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mi>k</mml:mi><mml:mo>.</mml:mo><mml:mi>P</mml:mi></mml:math></inline-formula></p>
<p><inline-formula id="ieqn-41"><mml:math id="mml-ieqn-41"><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula id="ieqn-42"><mml:math id="mml-ieqn-42"><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula>, <inline-formula id="ieqn-43"><mml:math id="mml-ieqn-43"><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mrow><mml:mn>4</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:mi>b</mml:mi></mml:math></inline-formula>, <inline-formula id="ieqn-44"><mml:math id="mml-ieqn-44"><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula> &#x2192; (affine to projective coordinate conversion)</p>
<p><italic>for (</italic><inline-formula id="ieqn-45"><mml:math id="mml-ieqn-45"><mml:mi>i</mml:mi><mml:mtext>&#x00A0;from&#x00A0;</mml:mtext><mml:mi>m</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>2</mml:mn><mml:mtext>&#x00A0;down to&#x00A0;</mml:mtext><mml:mn>0</mml:mn></mml:math></inline-formula><italic>) do</italic> &#x2192; (point multiplication in projective coordinates)</p>
<p><italic>if (</italic><inline-formula id="ieqn-46"><mml:math id="mml-ieqn-46"><mml:msub><mml:mi>k</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula><italic>)</italic></p>
<p>&#x2002;&#x2002;&#x2002;&#x2002;&#x2002;&#x2002;<inline-formula id="ieqn-47"><mml:math id="mml-ieqn-47"><mml:mi>P</mml:mi><mml:mi>A</mml:mi><mml:mi>D</mml:mi><mml:mi>D</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula></p>
<p>&#x2002;&#x2002;&#x2002;&#x2002;&#x2002;&#x2002;<inline-formula id="ieqn-48"><mml:math id="mml-ieqn-48"><mml:mi>P</mml:mi><mml:mi>D</mml:mi><mml:mi>B</mml:mi><mml:mi>L</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula></p>
<p><italic>else</italic></p>
<p>&#x2002;&#x2002;&#x2002;&#x2002;&#x2002;&#x2002;<inline-formula id="ieqn-49"><mml:math id="mml-ieqn-49"><mml:mi>P</mml:mi><mml:mi>A</mml:mi><mml:mi>D</mml:mi><mml:mi>D</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula></p>
<p>&#x2002;&#x2002;&#x2002;&#x2002;&#x2002;&#x2002;<inline-formula id="ieqn-50"><mml:math id="mml-ieqn-50"><mml:mi>P</mml:mi><mml:mi>D</mml:mi><mml:mi>B</mml:mi><mml:mi>L</mml:mi><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula></p>
<p><italic>end if</italic></p>
<p><italic>end for</italic></p>
<p><inline-formula id="ieqn-51"><mml:math id="mml-ieqn-51"><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>q</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mfrac></mml:mstyle></mml:math></inline-formula>, <inline-formula id="ieqn-52"><mml:math id="mml-ieqn-52"><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>q</mml:mi></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:mfrac></mml:mstyle><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mo>[</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:msubsup><mml:mrow><mml:mi>x</mml:mi></mml:mrow><mml:mrow><mml:mi>p</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup><mml:mo>+</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mo>]</mml:mo></mml:mrow><mml:msup><mml:mrow><mml:mo>(</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>)</mml:mo></mml:mrow><mml:mrow><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msup><mml:mo>+</mml:mo><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> &#x2192; (reconversion)</p>
</statement>
<p>The inputs to Algorithm 1 are (i) an initial point <italic>P</italic> with <italic>x</italic> and <italic>y</italic> coordinates, i.e., <inline-formula id="ieqn-53"><mml:math id="mml-ieqn-53"><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-54"><mml:math id="mml-ieqn-54"><mml:msub><mml:mi>y</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, and (ii) a scalar multiplier <inline-formula id="ieqn-55"><mml:math id="mml-ieqn-55"><mml:mi>k</mml:mi></mml:math></inline-formula>. The sequence <inline-formula id="ieqn-56"><mml:math id="mml-ieqn-56"><mml:msub><mml:mi>k</mml:mi><mml:mrow><mml:mi>n</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:mo>&#x2026;</mml:mo><mml:mo>,</mml:mo><mml:msub><mml:mi>k</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>,</mml:mo><mml:msub><mml:mi>k</mml:mi><mml:mrow><mml:mn>0</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> is the bit string of the scalar multiplier, from the most significant to the least significant bit. The outputs are the <italic>x</italic> and <italic>y</italic> coordinates of the resulting point <italic>Q</italic>. The <inline-formula id="ieqn-57"><mml:math id="mml-ieqn-57"><mml:mi>P</mml:mi><mml:mi>A</mml:mi><mml:mi>D</mml:mi><mml:mi>D</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> and <inline-formula id="ieqn-58"><mml:math id="mml-ieqn-58"><mml:mi>P</mml:mi><mml:mi>D</mml:mi><mml:mi>B</mml:mi><mml:mi>L</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> functions correspond to the PA and PD instructions. For the <inline-formula id="ieqn-59"><mml:math id="mml-ieqn-59"><mml:mi>i</mml:mi><mml:mi>f</mml:mi></mml:math></inline-formula> and <inline-formula id="ieqn-60"><mml:math id="mml-ieqn-60"><mml:mi>e</mml:mi><mml:mi>l</mml:mi><mml:mi>s</mml:mi><mml:mi>e</mml:mi></mml:math></inline-formula> branches, the sequence of instructions is shown in <xref ref-type="table" rid="table-1">Table 1</xref>.</p>
<table-wrap id="table-1"><label>Table 1</label><caption><title>Number of instructions for <inline-formula id="ieqn-189"><mml:math id="mml-ieqn-189"><mml:mrow><mml:mi mathvariant="bold-italic">P</mml:mi><mml:mi mathvariant="bold-italic">A</mml:mi><mml:mi mathvariant="bold-italic">D</mml:mi><mml:mi mathvariant="bold-italic">D</mml:mi></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> and <inline-formula id="ieqn-190"><mml:math id="mml-ieqn-190"><mml:mrow><mml:mi mathvariant="bold-italic">P</mml:mi><mml:mi mathvariant="bold-italic">D</mml:mi><mml:mi mathvariant="bold-italic">B</mml:mi><mml:mi mathvariant="bold-italic">L</mml:mi></mml:mrow><mml:mo stretchy="false">(</mml:mo><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> functions of Algorithm 1</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left"><inline-formula id="ieqn-191"><mml:math id="mml-ieqn-191"><mml:mi>I</mml:mi><mml:mi>n</mml:mi><mml:mi>s</mml:mi><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th align="left"><inline-formula id="ieqn-192"><mml:math id="mml-ieqn-192"><mml:mi>P</mml:mi><mml:mi>A</mml:mi><mml:mi>D</mml:mi><mml:mi>D</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula></th>
<th align="left"><inline-formula id="ieqn-193"><mml:math id="mml-ieqn-193"><mml:mi>I</mml:mi><mml:mi>n</mml:mi><mml:mi>s</mml:mi><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula></th>
<th align="left"><inline-formula id="ieqn-194"><mml:math id="mml-ieqn-194"><mml:mi>P</mml:mi><mml:mi>D</mml:mi><mml:mi>B</mml:mi><mml:mi>L</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula></th>
<th align="left">Cost of finite field operations</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left"><inline-formula id="ieqn-195"><mml:math id="mml-ieqn-195"><mml:mi>I</mml:mi><mml:mi>n</mml:mi><mml:mi>s</mml:mi><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td align="left"><inline-formula id="ieqn-196"><mml:math id="mml-ieqn-196"><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td align="left"><inline-formula id="ieqn-197"><mml:math id="mml-ieqn-197"><mml:mi>I</mml:mi><mml:mi>n</mml:mi><mml:mi>s</mml:mi><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td align="left"><inline-formula id="ieqn-198"><mml:math id="mml-ieqn-198"><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mi>Z</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula></td>
<td align="left" rowspan="7">Total instructions&#x2009;&#x003D;&#x2009;14 (7 for PA and 7 for PD): 3 additions, 5 squarings and 6 multiplications in the finite field</td>
</tr>
<tr>
<td align="left"><inline-formula id="ieqn-199"><mml:math id="mml-ieqn-199"><mml:mi>I</mml:mi><mml:mi>n</mml:mi><mml:mi>s</mml:mi><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td align="left"><inline-formula id="ieqn-200"><mml:math id="mml-ieqn-200"><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td align="left"><inline-formula id="ieqn-201"><mml:math id="mml-ieqn-201"><mml:mi>I</mml:mi><mml:mi>n</mml:mi><mml:mi>s</mml:mi><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td align="left"><inline-formula id="ieqn-202"><mml:math id="mml-ieqn-202"><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mi>Z</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula></td>
</tr>
<tr>
<td align="left"><inline-formula id="ieqn-203"><mml:math id="mml-ieqn-203"><mml:mi>I</mml:mi><mml:mi>n</mml:mi><mml:mi>s</mml:mi><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td align="left"><inline-formula id="ieqn-204"><mml:math id="mml-ieqn-204"><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td align="left"><inline-formula id="ieqn-205"><mml:math id="mml-ieqn-205"><mml:mi>I</mml:mi><mml:mi>n</mml:mi><mml:mi>s</mml:mi><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mn>3</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td align="left"><inline-formula id="ieqn-206"><mml:math id="mml-ieqn-206"><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:mi>b</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
</tr>
<tr>
<td align="left"><inline-formula id="ieqn-207"><mml:math id="mml-ieqn-207"><mml:mi>I</mml:mi><mml:mi>n</mml:mi><mml:mi>s</mml:mi><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mn>4</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td align="left"><inline-formula id="ieqn-208"><mml:math id="mml-ieqn-208"><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td align="left"><inline-formula id="ieqn-209"><mml:math id="mml-ieqn-209"><mml:mi>I</mml:mi><mml:mi>n</mml:mi><mml:mi>s</mml:mi><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mn>4</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td align="left"><inline-formula id="ieqn-210"><mml:math id="mml-ieqn-210"><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula></td>
</tr>
<tr>
<td align="left"><inline-formula id="ieqn-211"><mml:math id="mml-ieqn-211"><mml:mi>I</mml:mi><mml:mi>n</mml:mi><mml:mi>s</mml:mi><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td align="left"><inline-formula id="ieqn-212"><mml:math id="mml-ieqn-212"><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mi>T</mml:mi></mml:mrow><mml:mrow><mml:mn>1</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula></td>
<td align="left"><inline-formula id="ieqn-213"><mml:math id="mml-ieqn-213"><mml:mi>I</mml:mi><mml:mi>n</mml:mi><mml:mi>s</mml:mi><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mn>5</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td align="left"><inline-formula id="ieqn-214"><mml:math id="mml-ieqn-214"><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
</tr>
<tr>
<td align="left"><inline-formula id="ieqn-215"><mml:math id="mml-ieqn-215"><mml:mi>I</mml:mi><mml:mi>n</mml:mi><mml:mi>s</mml:mi><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mn>6</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td align="left"><inline-formula id="ieqn-216"><mml:math id="mml-ieqn-216"><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>x</mml:mi><mml:mrow><mml:mi>p</mml:mi></mml:mrow></mml:msub><mml:mo>&#x00D7;</mml:mo><mml:msub><mml:mi>Z</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td align="left"><inline-formula id="ieqn-217"><mml:math id="mml-ieqn-217"><mml:mi>I</mml:mi><mml:mi>n</mml:mi><mml:mi>s</mml:mi><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mn>6</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td align="left"><inline-formula id="ieqn-218"><mml:math id="mml-ieqn-218"><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msubsup><mml:mrow><mml:mi>X</mml:mi></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msubsup></mml:math></inline-formula></td>
</tr>
<tr>
<td align="left"><inline-formula id="ieqn-219"><mml:math id="mml-ieqn-219"><mml:mi>I</mml:mi><mml:mi>n</mml:mi><mml:mi>s</mml:mi><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mn>7</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td align="left"><inline-formula id="ieqn-220"><mml:math id="mml-ieqn-220"><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td align="left"><inline-formula id="ieqn-221"><mml:math id="mml-ieqn-221"><mml:mi>I</mml:mi><mml:mi>n</mml:mi><mml:mi>s</mml:mi><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mn>7</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
<td align="left"><inline-formula id="ieqn-222"><mml:math id="mml-ieqn-222"><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>=</mml:mo><mml:msub><mml:mi>X</mml:mi><mml:mrow><mml:mn>2</mml:mn></mml:mrow></mml:msub><mml:mo>+</mml:mo><mml:msub><mml:mi>T</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula></td>
</tr>
</tbody>
</table>
</table-wrap>
<p>Columns one and two give, for PA, the instruction index (i.e., <inline-formula id="ieqn-61"><mml:math id="mml-ieqn-61"><mml:mi>I</mml:mi><mml:mi>n</mml:mi><mml:mi>s</mml:mi><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>) and the corresponding operation, respectively. Similarly, columns three and four give the instruction index and the corresponding finite field operation for PD. The last column summarizes the cost of the PA and PD instructions in terms of finite field operations.</p>
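<p>To make the data flow of Algorithm 1 and <xref ref-type="table" rid="table-1">Table 1</xref> concrete, the following Python sketch strings the seven PA and seven PD instructions into the ladder loop and then applies the reconversion step. It is an illustrative software model under stated assumptions, not the hardware description of this work: the function names (<monospace>gf_mul</monospace>, <monospace>gf_sqr</monospace>, <monospace>gf_inv</monospace>, <monospace>padd</monospace>, <monospace>pdbl</monospace>, <monospace>montgomery_pm</monospace>) are hypothetical, the field is assumed to be GF(2<sup>163</sup>) with the NIST reduction polynomial <italic>x</italic><sup>163</sup> + <italic>x</italic><sup>7</sup> + <italic>x</italic><sup>6</sup> + <italic>x</italic><sup>3</sup> + 1, the inversion by repeated squaring and multiplication is our assumption, and the sample inputs in the last lines are arbitrary values rather than a standardized base point.</p>
<code language="python"><![CDATA[
```python
M = 163
# Assumed reduction polynomial: x^163 + x^7 + x^6 + x^3 + 1 (NIST binary field)
F = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def gf_mul(a, b):
    """Shift-and-add multiplication in GF(2^M) with interleaved reduction."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        b >>= 1
        a <<= 1
        if (a >> M) & 1:
            a ^= F
    return r


def gf_sqr(a):
    return gf_mul(a, a)


def gf_inv(a):
    """a^(2^M - 2) by repeated squaring (assumed method); requires a != 0."""
    r = 1
    for _ in range(M - 1):
        a = gf_sqr(a)
        r = gf_mul(r, a)
    return r


def padd(X1, Z1, X2, Z2, xp):
    """PA column of Table 1: returns the updated (X1, Z1)."""
    Z1 = gf_mul(X2, Z1)   # Inst1
    X1 = gf_mul(X1, Z2)   # Inst2
    T1 = X1 ^ Z1          # Inst3
    X1 = gf_mul(X1, Z1)   # Inst4
    Z1 = gf_sqr(T1)       # Inst5
    T1 = gf_mul(xp, Z1)   # Inst6
    X1 = X1 ^ T1          # Inst7
    return X1, Z1


def pdbl(X2, Z2, b):
    """PD column of Table 1: returns the updated (X2, Z2)."""
    Z2 = gf_sqr(Z2)       # Inst1
    T1 = gf_sqr(Z2)       # Inst2
    T1 = gf_mul(b, T1)    # Inst3
    X2 = gf_sqr(X2)       # Inst4
    Z2 = gf_mul(X2, Z2)   # Inst5
    X2 = gf_sqr(X2)       # Inst6
    X2 = X2 ^ T1          # Inst7
    return X2, Z2


def montgomery_pm(k, xp, yp, b):
    """Algorithm 1: returns the affine coordinates (xq, yq) of k.P."""
    # Affine to projective coordinate conversion
    X1, Z1 = xp, 1
    X2, Z2 = gf_sqr(gf_sqr(xp)) ^ b, gf_sqr(xp)
    # Scan the key bits below the leading 1, most significant first
    for i in range(k.bit_length() - 2, -1, -1):
        if (k >> i) & 1:
            X1, Z1 = padd(X1, Z1, X2, Z2, xp)
            X2, Z2 = pdbl(X2, Z2, b)
        else:
            X2, Z2 = padd(X2, Z2, X1, Z1, xp)
            X1, Z1 = pdbl(X1, Z1, b)
    # Reconversion to affine coordinates
    xq = gf_mul(X1, gf_inv(Z1))
    t = gf_mul(X1 ^ gf_mul(xp, Z1), X2 ^ gf_mul(xp, Z2))
    t ^= gf_mul(gf_sqr(xp) ^ yp, gf_mul(Z1, Z2))
    t = gf_mul(t, gf_inv(gf_mul(xp, gf_mul(Z1, Z2))))
    yq = gf_mul(xp ^ xq, t) ^ yp
    return xq, yq


if __name__ == "__main__":
    # Arbitrary illustrative inputs (not a standardized base point)
    xq, yq = montgomery_pm(3, 0x2, 0x5, 0x1)
    print(hex(xq), hex(yq))
```
]]></code>
<p>In this model every loop iteration issues the same 14 instructions of <xref ref-type="table" rid="table-1">Table 1</xref>; the key bit only decides which register pair is passed to PA and which to PD.</p>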
</sec>
<sec id="s3"><label>3</label><title>Proposed Architecture</title>
<p>Our proposed design is presented in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>. It contains (i) a control unit, (ii) input and output buffers and (iii) an ECC unit. The related details of these blocks are as follows.</p>
<fig id="fig-2"><label>Figure 2</label><caption><title>Proposed elliptic curve processor architecture</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_32849-fig-2.png"/></fig>
<sec id="s3_1"><label>3.1</label><title>Control Unit</title>
<p>The control unit generates the control signals for the input/output buffers and the ECC unit. It has three states: (i) LIP, (ii) SKG and (iii) LOP, which are described below.</p>
<p><underline>LIP:</underline> This state is responsible for loading the input parameters, i.e., the <italic>x</italic> and <italic>y</italic> coordinates of an input point <italic>P</italic>, and the <italic>x</italic> and <italic>y</italic> coordinates of the public key of another node. The latter are needed by the ECC unit to generate a shared key. After loading the input parameters, the control unit sets the <inline-formula id="ieqn-62"><mml:math id="mml-ieqn-62"><mml:mi>l</mml:mi><mml:mi>o</mml:mi><mml:mi>a</mml:mi><mml:msub><mml:mi>d</mml:mi><mml:mrow><mml:mi>d</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> signal (not shown in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>) to 1 so that the ECC unit starts generating either the public or the shared key, depending on the step of the ECDH protocol.</p>

<p><underline>SKG:</underline> The ECDH protocol requires two PM operations. The first PM generates the <italic>x</italic> and <italic>y</italic> coordinates of the public key. The second PM generates the <italic>x</italic> and <italic>y</italic> coordinates of the shared key. Therefore, the SKG state waits until the <italic>x</italic> and <italic>y</italic> coordinates of either the public or the shared key have been generated. Once the required key is available, the control unit sets the <inline-formula id="ieqn-63"><mml:math id="mml-ieqn-63"><mml:mi>K</mml:mi><mml:msub><mml:mi>G</mml:mi><mml:mrow><mml:mi>d</mml:mi><mml:mi>o</mml:mi><mml:mi>n</mml:mi><mml:mi>e</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> signal (not shown in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>) to 1.</p>

<p><underline>LOP:</underline> The purpose of the LOP state is to place the <italic>x</italic> and <italic>y</italic> coordinates of the public or shared key on the output pins (i.e., <inline-formula id="ieqn-64"><mml:math id="mml-ieqn-64"><mml:mi>k</mml:mi><mml:mo>.</mml:mo><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> and <inline-formula id="ieqn-65"><mml:math id="mml-ieqn-65"><mml:mi>k</mml:mi><mml:mo>.</mml:mo><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>) of the proposed processor architecture.</p>
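<p>The three control states described above can be sketched as a small state machine. This is only a behavioral illustration: the names <monospace>State</monospace>, <monospace>next_state</monospace> and <monospace>run</monospace> are hypothetical, the two flags are modeled as plain inputs even though the control unit itself drives them, and the behavior after LOP (holding the state) is an assumption, since the text does not describe what follows the output phase.</p>
<code language="python"><![CDATA[
```python
from enum import Enum, auto


class State(Enum):
    LIP = auto()  # load the input parameters
    SKG = auto()  # wait while the ECC unit generates the public or shared key
    LOP = auto()  # place the key coordinates on the output pins


def next_state(state, load_done, kg_done):
    """Return the next control state for the given flag values (sketch)."""
    if state is State.LIP:
        return State.SKG if load_done else State.LIP
    if state is State.SKG:
        return State.LOP if kg_done else State.SKG
    return State.LOP  # assumption: LOP is held until an external restart


def run(flags):
    """Apply one (load_done, kg_done) pair per clock cycle; return the trace."""
    state = State.LIP
    trace = [state]
    for load_done, kg_done in flags:
        state = next_state(state, load_done, kg_done)
        trace.append(state)
    return trace


if __name__ == "__main__":
    for s in run([(0, 0), (1, 0), (0, 0), (0, 1)]):
        print(s.name)
```
]]></code>
<p>Since ECDH needs two PM operations, a full key exchange would pass through this LIP, SKG, LOP sequence twice in this model, once for the public key and once for the shared key.</p>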
</sec>
<sec id="s3_2"><label>3.2</label><title>Input/Output Buffers</title>
<p>The input buffer block comprises three <inline-formula id="ieqn-66"><mml:math id="mml-ieqn-66"><mml:mi>m</mml:mi></mml:math></inline-formula>-bit buffers (not shown in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>) to load the secret key and the <italic>x</italic> and <italic>y</italic> coordinates of the public and shared keys serially (one bit at a time). It takes serial inputs and concatenates them to generate <inline-formula id="ieqn-67"><mml:math id="mml-ieqn-67"><mml:mi>m</mml:mi></mml:math></inline-formula>-bit outputs. Similarly, the output buffer block uses two <inline-formula id="ieqn-68"><mml:math id="mml-ieqn-68"><mml:mi>m</mml:mi></mml:math></inline-formula>-bit buffers (not shown in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>) to serially output the <italic>x</italic> and <italic>y</italic> coordinates of the generated public or shared key. For both the input and output buffers, <italic>m</italic> clock cycles are required to load or produce an <inline-formula id="ieqn-69"><mml:math id="mml-ieqn-69"><mml:mi>m</mml:mi></mml:math></inline-formula>-bit value. The proposed architecture is flexible in that all data (the private key and the coordinates of the public &#x0026; shared keys) are loaded from outside the ECC unit, as shown in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>.</p>
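<p>The serial loading described above can be modeled as a serial-in, parallel-out buffer that accepts one bit per clock cycle. The sketch below is a behavioral illustration only. The class name <monospace>SerialInBuffer</monospace> and the helper <monospace>load_serially</monospace> are hypothetical, and the least-significant-bit-first order is an assumption because the text does not state the bit order.</p>
<code language="python"><![CDATA[
```python
class SerialInBuffer:
    """m-bit serial-in, parallel-out buffer (behavioral sketch)."""

    def __init__(self, m):
        self.m = m
        self.value = 0
        self.count = 0

    @property
    def full(self):
        return self.count == self.m

    def clock(self, bit):
        """Shift in one bit per clock cycle (assumed least significant bit first)."""
        if not self.full:
            self.value |= (bit & 1) << self.count
            self.count += 1
        return self.full


def load_serially(word, m):
    """Feed an m-bit word bit by bit; return (assembled value, cycles used)."""
    buf = SerialInBuffer(m)
    cycles = 0
    while not buf.full:
        buf.clock((word >> cycles) & 1)
        cycles += 1
    return buf.value, cycles


if __name__ == "__main__":
    value, cycles = load_serially((1 << 162) | 0x2F, 163)
    print(hex(value), cycles)
```
]]></code>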

</sec>
<sec id="s3_3"><label>3.3</label><title>ECC Unit</title>
<p>The ECC unit contains (i) a storage system, (ii) an arithmetic and logic unit (ALU) and (iii) a controller (ECC CNTRL), as shown in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>. Moreover, for routing purposes, a <inline-formula id="ieqn-70"><mml:math id="mml-ieqn-70"><mml:mn>4</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula> multiplexer is used between the storage system and the ALU. As shown in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>, it selects one operand from the storage system or from the ECC parameters, i.e., <inline-formula id="ieqn-71"><mml:math id="mml-ieqn-71"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>x</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, <inline-formula id="ieqn-72"><mml:math id="mml-ieqn-72"><mml:msub><mml:mi>P</mml:mi><mml:mrow><mml:mi>y</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula>, and <italic>b</italic>, to provide an input to the ALU. The architectural details of the storage system, ALU and ECC controller blocks are given below.</p>

<sec id="s3_3_1"><label>3.3.1</label><title>Storage System (RegFile)</title>
<p>A register file of size <inline-formula id="ieqn-73"><mml:math id="mml-ieqn-73"><mml:mn>6</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mi>m</mml:mi></mml:math></inline-formula> is used as memory to store the initial, intermediate and final values of the ECC unit. It contains two <inline-formula id="ieqn-74"><mml:math id="mml-ieqn-74"><mml:mn>6</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula> multiplexers and one <inline-formula id="ieqn-75"><mml:math id="mml-ieqn-75"><mml:mn>1</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>6</mml:mn></mml:math></inline-formula> demultiplexer. The two multiplexers are used to read two <inline-formula id="ieqn-76"><mml:math id="mml-ieqn-76"><mml:mi>m</mml:mi></mml:math></inline-formula>-bit operands, and the demultiplexer is used to update the memory contents. The related control signals (not shown in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>) are generated by the ECC controller.</p>

</sec>
<sec id="s3_3_2"><label>3.3.2</label><title>Arithmetic and Logic Unit (ALU)</title>
<p>The ALU, shown in pink in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>, contains an adder (ADD), a squarer (SQR), a multiplier (MULT) and two reduction (RED) blocks, one placed after SQR and one after MULT. Moreover, for routing purposes, a <inline-formula id="ieqn-77"><mml:math id="mml-ieqn-77"><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula> multiplexer selects the result to be written to the storage system. In the <inline-formula id="ieqn-78"><mml:math id="mml-ieqn-78"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> field, addition is performed with bitwise Exclusive-OR operations. The SQR unit in <xref ref-type="fig" rid="fig-2">Fig. 2</xref> is implemented by inserting a &#x2018;0&#x2019; bit after every data bit, as in the hardware accelerators of [<xref ref-type="bibr" rid="ref-11">11</xref>,<xref ref-type="bibr" rid="ref-13">13</xref>,<xref ref-type="bibr" rid="ref-19">19</xref>].</p>
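<p>The ADD and SQR operations described above can be illustrated with a few lines of Python. The sketch covers only the unreduced squaring, i.e., the value that would enter the RED block; the reduction itself is not modeled. The function names are hypothetical, and <monospace>clmul</monospace> is a plain carry-less reference multiplier included only to check that zero insertion indeed equals squaring over GF(2).</p>
<code language="python"><![CDATA[
```python
def gf2_add(a, b):
    """Addition of two GF(2) polynomials: bitwise exclusive-OR."""
    return a ^ b


def gf2_square_unreduced(a):
    """Insert a '0' after every data bit: bit i of a moves to position 2*i."""
    r = 0
    i = 0
    while a:
        if a & 1:
            r |= 1 << (2 * i)
        a >>= 1
        i += 1
    return r


def clmul(a, b):
    """Reference carry-less (polynomial) multiplication without reduction."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


if __name__ == "__main__":
    a = 0b1011
    print(bin(gf2_square_unreduced(a)), bin(clmul(a, a)))
```
]]></code>
<p>Because squaring needs no logic beyond rewiring the input bits, its cost in hardware is dominated by the reduction block that follows it.</p>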

<p>Polynomial multiplication determines the performance of the PM architecture [<xref ref-type="bibr" rid="ref-2">2</xref>,<xref ref-type="bibr" rid="ref-11">11</xref>,<xref ref-type="bibr" rid="ref-17">17</xref>&#x2013;<xref ref-type="bibr" rid="ref-19">19</xref>,<xref ref-type="bibr" rid="ref-21">21</xref>,<xref ref-type="bibr" rid="ref-27">27</xref>,<xref ref-type="bibr" rid="ref-28">28</xref>,<xref ref-type="bibr" rid="ref-30">30</xref>&#x2013;<xref ref-type="bibr" rid="ref-32">32</xref>]. Several architectures for multiplying two <inline-formula id="ieqn-79"><mml:math id="mml-ieqn-79"><mml:mi>m</mml:mi></mml:math></inline-formula>-bit polynomials have been presented in the literature [<xref ref-type="bibr" rid="ref-11">11</xref>,<xref ref-type="bibr" rid="ref-17">17</xref>&#x2013;<xref ref-type="bibr" rid="ref-19">19</xref>,<xref ref-type="bibr" rid="ref-21">21</xref>,<xref ref-type="bibr" rid="ref-27">27</xref>,<xref ref-type="bibr" rid="ref-28">28</xref>,<xref ref-type="bibr" rid="ref-30">30</xref>]. These include (i) bit-serial, (ii) bit-parallel, (iii) digit-serial and (iv) digit-parallel approaches. Moreover, some systolic polynomial multiplication designs are described in [<xref ref-type="bibr" rid="ref-33">33</xref>&#x2013;<xref ref-type="bibr" rid="ref-35">35</xref>]. In this context, bit-serial designs are the most appropriate for low-area and power-efficient architectures. Their drawback is the computational cost, as a bit-serial multiplier needs <italic>m</italic> clock cycles to multiply two <inline-formula id="ieqn-80"><mml:math id="mml-ieqn-80"><mml:mi>m</mml:mi></mml:math></inline-formula>-bit operands. For high-speed cryptographic applications such as network servers, bit-parallel and digit-parallel multipliers are more attractive choices as they take a single clock cycle per polynomial multiplication [<xref ref-type="bibr" rid="ref-11">11</xref>,<xref ref-type="bibr" rid="ref-36">36</xref>,<xref ref-type="bibr" rid="ref-37">37</xref>]. However, their higher hardware resource utilization and power consumption limit their use in wireless sensor nodes and RFID applications. Digit-serial multipliers balance area and computational cost (throughput). They take <inline-formula id="ieqn-81"><mml:math id="mml-ieqn-81"><mml:mi>a</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mi>b</mml:mi><mml:mi>c</mml:mi></mml:mfrac></mml:mstyle></mml:math></inline-formula> cycles for one polynomial multiplication, where <italic>a</italic> is the total number of digits, <italic>b</italic> is the operand length and <italic>c</italic> is the digit size. Therefore, digit-serial polynomial multipliers are the more attractive alternative for the targeted applications, and our MULT uses a digit-serial architecture.</p>
<p><underline>Proposed digit-serial multiplier architecture:</underline> Our proposed digit-serial polynomial multiplication architecture (24-bit digits) is shown in green in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>. A 24-bit digit size is selected to balance computational cost against hardware resource utilization. A longer digit length reduces the required clock cycles but utilizes more hardware resources and consumes more power, which is not feasible for wireless sensor nodes and RFID applications [<xref ref-type="bibr" rid="ref-19">19</xref>,<xref ref-type="bibr" rid="ref-21">21</xref>]. For this reason, we have employed a 24-bit digit size in the multiplication architecture of <xref ref-type="fig" rid="fig-2">Fig. 2</xref>, where two <inline-formula id="ieqn-82"><mml:math id="mml-ieqn-82"><mml:mi>m</mml:mi></mml:math></inline-formula>-bit polynomials, i.e., <italic>A</italic> and <italic>B</italic>, are input to the proposed multiplier. The <inline-formula id="ieqn-83"><mml:math id="mml-ieqn-83"><mml:mi>m</mml:mi></mml:math></inline-formula>-bit polynomial <italic>B</italic> is stored in an <inline-formula id="ieqn-84"><mml:math id="mml-ieqn-84"><mml:mi>m</mml:mi></mml:math></inline-formula>-bit buffer. Then, to initiate a polynomial multiplication in the first cycle, 24 bits of polynomial <italic>B</italic>, starting from the least-significant bit (LSB) side, are loaded into buffer <inline-formula id="ieqn-85"><mml:math id="mml-ieqn-85"><mml:mi>B</mml:mi><mml:mn>0</mml:mn></mml:math></inline-formula> for multiplication using the <inline-formula id="ieqn-86"><mml:math id="mml-ieqn-86"><mml:mi>M</mml:mi><mml:mn>0</mml:mn></mml:math></inline-formula> multiplier. The size of <inline-formula id="ieqn-87"><mml:math id="mml-ieqn-87"><mml:mi>B</mml:mi><mml:mn>0</mml:mn></mml:math></inline-formula> is also 24 bits. 
In the next cycle, the next 24 bits of polynomial <italic>B</italic> are loaded into <inline-formula id="ieqn-88"><mml:math id="mml-ieqn-88"><mml:mi>B</mml:mi><mml:mn>0</mml:mn></mml:math></inline-formula> for multiplication. After the second multiplication of 24 bits of polynomial <italic>B</italic> with the <inline-formula id="ieqn-89"><mml:math id="mml-ieqn-89"><mml:mi>m</mml:mi></mml:math></inline-formula>-bit polynomial <italic>A</italic>, the newly generated result is accumulated with the previous result. This process continues until all the 24-bit digits of polynomial <italic>B</italic> have been multiplied with the <inline-formula id="ieqn-90"><mml:math id="mml-ieqn-90"><mml:mi>m</mml:mi></mml:math></inline-formula>-bit input polynomial <italic>A</italic>. Finally, the resultant polynomial has a length of <inline-formula id="ieqn-91"><mml:math id="mml-ieqn-91"><mml:mn>2</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mi>m</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula> bits. The computational cost of our multiplier is <inline-formula id="ieqn-92"><mml:math id="mml-ieqn-92"><mml:mi>a</mml:mi><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mi>b</mml:mi><mml:mi>c</mml:mi></mml:mfrac></mml:mstyle><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula> cycles, where <italic>b</italic> is the operand length (163 and 233 in this work), <italic>c</italic> is the digit length (24 bits in this work) and <italic>b</italic>/<italic>c</italic> is the number of digits. The additional clock cycle is needed to load the <inline-formula id="ieqn-93"><mml:math id="mml-ieqn-93"><mml:mi>m</mml:mi></mml:math></inline-formula>-bit polynomial <italic>B</italic> into the buffer.</p>
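<p>The digit-serial procedure described above can be sketched in software as follows. This is an illustrative Python model, not the Verilog implementation: polynomials over GF(2) are held as integers, and the names <monospace>digit_serial_mul</monospace> and <monospace>DIGIT_BITS</monospace> are assumptions introduced only for this sketch. Each loop iteration corresponds to one multiplication cycle; the extra cycle for loading <italic>B</italic> into its buffer is not modeled.</p>
<preformat><![CDATA[
```python
DIGIT_BITS = 24  # digit size c used in the text


def digit_serial_mul(a, b, m, c=DIGIT_BITS):
    """Multiply two m-bit GF(2) polynomials (stored as ints), one c-bit digit of b per step.

    Returns (product, steps); the product has at most 2*m - 1 bits.
    """
    mask = (1 << c) - 1
    acc = 0
    steps = 0
    for offset in range(0, m, c):        # least-significant digit first
        digit = (b >> offset) & mask     # models the contents of the 24-bit buffer B0
        partial = 0
        for i in range(c):               # carry-less product of a with one digit
            if (digit >> i) & 1:
                partial ^= a << i
        acc ^= partial << offset         # accumulate with the previous results
        steps += 1
    return acc, steps


# (x^3 + x + 1) * (x^2 + x) = x^5 + x^4 + x^3 + x
product, steps = digit_serial_mul(0b1011, 0b110, m=4)
print(bin(product), steps)  # 0b111010 1
```
]]></preformat>
<p>For a 163-bit operand the loop in this sketch runs seven times, one pass per 24-bit digit.</p>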

<p>When squaring one <inline-formula id="ieqn-94"><mml:math id="mml-ieqn-94"><mml:mi>m</mml:mi></mml:math></inline-formula>-bit polynomial or multiplying two <inline-formula id="ieqn-95"><mml:math id="mml-ieqn-95"><mml:mi>m</mml:mi></mml:math></inline-formula>-bit polynomials, the proposed SQR and MULT units, respectively, produce polynomials of length <inline-formula id="ieqn-96"><mml:math id="mml-ieqn-96"><mml:mn>2</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mi>m</mml:mi><mml:mo>&#x2212;</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula> bits. Therefore, a reduction is needed to obtain an <inline-formula id="ieqn-97"><mml:math id="mml-ieqn-97"><mml:mi>m</mml:mi></mml:math></inline-formula>-bit polynomial. The RED block in <xref ref-type="fig" rid="fig-2">Fig. 2</xref> is implemented using the NIST-defined reduction algorithms. For the corresponding reduction algorithms over <inline-formula id="ieqn-98"><mml:math id="mml-ieqn-98"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-99"><mml:math id="mml-ieqn-99"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, we refer readers to [<xref ref-type="bibr" rid="ref-6">6</xref>,<xref ref-type="bibr" rid="ref-37">37</xref>]. Moreover, in Algorithm 1, the reconversion from projective to affine coordinates shows that a finite field inversion operation is needed. The inversion block is not shown in <xref ref-type="fig" rid="fig-2">Fig. 2</xref>. We have used the Itoh-Tsujii inversion algorithm, which was proposed in 1988; its mathematical formulation is described in [<xref ref-type="bibr" rid="ref-38">38</xref>]. 
In implementation, it requires repeated squaring and multiplication operations [<xref ref-type="bibr" rid="ref-11">11</xref>,<xref ref-type="bibr" rid="ref-13">13</xref>,<xref ref-type="bibr" rid="ref-21">21</xref>]. Over <inline-formula id="ieqn-100"><mml:math id="mml-ieqn-100"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-101"><mml:math id="mml-ieqn-101"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, the corresponding Itoh-Tsujii inversion algorithms are presented in [<xref ref-type="bibr" rid="ref-39">39</xref>,<xref ref-type="bibr" rid="ref-5">5</xref>], respectively. The Itoh-Tsujii algorithm in our design of <xref ref-type="fig" rid="fig-2">Fig. 2</xref> is implemented by sharing the hardware resources of the SQR and MULT blocks, as in [<xref ref-type="bibr" rid="ref-5">5</xref>,<xref ref-type="bibr" rid="ref-39">39</xref>]. This also allows us to reduce the hardware cost of our proposed design.</p>
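<p>As an illustration of how reduction and Itoh-Tsujii inversion reuse the same squaring and multiplication primitives, a minimal Python sketch over GF(2<sup>163</sup>) is given below. It assumes the NIST reduction polynomial x<sup>163</sup> + x<sup>7</sup> + x<sup>6</sup> + x<sup>3</sup> + 1 and uses a plain bit-by-bit reduction loop, not the word-oriented NIST routine. The addition chain 1, 2, 4, 5, 10, 20, 40, 80, 81, 162 is one common choice and is an assumption here; the chain used in the cited designs may differ. All function names are hypothetical.</p>
<preformat><![CDATA[
```python
M = 163
# NIST reduction polynomial for GF(2^163): x^163 + x^7 + x^6 + x^3 + 1
F = (1 << 163) | (1 << 7) | (1 << 6) | (1 << 3) | 1


def clmul(a, b):
    """Carry-less product of two GF(2) polynomials stored as ints."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r


def reduce_mod_f(p):
    """Reduce a polynomial of up to 2*M - 1 bits to M bits modulo F."""
    for i in range(p.bit_length() - 1, M - 1, -1):
        if (p >> i) & 1:
            p ^= F << (i - M)   # cancels bit i, folds it into lower bits
    return p


def fmul(a, b):
    """Field multiplication: polynomial product followed by reduction."""
    return reduce_mod_f(clmul(a, b))


def fsqr_n(a, n):
    """Square a field element n times."""
    for _ in range(n):
        a = fmul(a, a)
    return a


def itoh_tsujii_inverse(a):
    """Return (a^-1, multiplication count) using a^-1 = (a^(2^(M-1) - 1))^2."""
    chain = [1, 2, 4, 5, 10, 20, 40, 80, 81, 162]   # assumed addition chain for M - 1
    beta = {1: a}                                   # beta[k] = a^(2^k - 1)
    mults = 0
    for prev, k in zip(chain, chain[1:]):
        step = k - prev                             # equals prev (doubling) or 1 (increment)
        beta[k] = fmul(fsqr_n(beta[prev], step), beta[step])
        mults += 1
    return fsqr_n(beta[M - 1], 1), mults


a = 0x1234567890ABCDEF
inv, mults = itoh_tsujii_inverse(a)
print(fmul(a, inv), mults)  # 1 9
```
]]></preformat>
<p>With this chain the sketch performs nine field multiplications; every other step is a squaring, which is why the SQR and MULT blocks can be shared for inversion.</p>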

</sec>
<sec id="s3_3_3"><label>3.3.3</label><title>Dedicated Controller (ECC CNTRL) and Clock Cycles Calculation</title>
<p>The <inline-formula id="ieqn-102"><mml:math id="mml-ieqn-102"><mml:mi>E</mml:mi><mml:mi>C</mml:mi><mml:mi>C</mml:mi><mml:mrow><mml:mtext mathvariant="italic">CNTRL</mml:mtext></mml:mrow></mml:math></inline-formula> unit is responsible for generating the control signals for the routing multiplexers (i.e., <inline-formula id="ieqn-103"><mml:math id="mml-ieqn-103"><mml:mi>M</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mn>4</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-104"><mml:math id="mml-ieqn-104"><mml:mi>M</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:mn>3</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>) and the MULT unit. Moreover, it communicates with the control unit block after the computation of the <italic>x</italic> and <italic>y</italic> coordinates of the public and shared keys. It consists of 88 states. State 0 is an idle state. The details of the other states are as follows.
<list list-type="simple">
<list-item><label>&#x02022;</label><p><underline>Affine to projective coordinates conversion:</underline> The affine to projective conversion is performed in states 1 to 6. Each state requires one clock cycle, so a total of six clock cycles are needed for this conversion.</p></list-item>
<list-item><label>&#x02022;</label><p><underline>PM in projective coordinates:</underline> Columns two and four of <xref ref-type="table" rid="table-1">Table 1</xref> show that the <inline-formula id="ieqn-105"><mml:math id="mml-ieqn-105"><mml:mi>P</mml:mi><mml:mi>A</mml:mi><mml:mi>D</mml:mi><mml:mi>D</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> and <inline-formula id="ieqn-106"><mml:math id="mml-ieqn-106"><mml:mi>P</mml:mi><mml:mi>D</mml:mi><mml:mi>B</mml:mi><mml:mi>L</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> functions each involve seven instructions (i.e., <inline-formula id="ieqn-107"><mml:math id="mml-ieqn-107"><mml:mi>I</mml:mi><mml:mi>n</mml:mi><mml:mi>s</mml:mi><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mn>1</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula> to <inline-formula id="ieqn-108"><mml:math id="mml-ieqn-108"><mml:mi>I</mml:mi><mml:mi>n</mml:mi><mml:mi>s</mml:mi><mml:msub><mml:mi>t</mml:mi><mml:mrow><mml:mn>7</mml:mn></mml:mrow></mml:msub></mml:math></inline-formula>). Hence, a total of fourteen instructions are needed to compute one PA and one PD operation. State seven is a conditional state that checks the value of the key bit. If <inline-formula id="ieqn-109"><mml:math id="mml-ieqn-109"><mml:msub><mml:mi>k</mml:mi><mml:mrow><mml:mi>i</mml:mi></mml:mrow></mml:msub></mml:math></inline-formula> in Algorithm 1 is 1, then the PA and PD operations of the <inline-formula id="ieqn-110"><mml:math id="mml-ieqn-110"><mml:mi>i</mml:mi><mml:mi>f</mml:mi></mml:math></inline-formula> part are computed (during states eight to twenty-one); otherwise, the <inline-formula id="ieqn-111"><mml:math id="mml-ieqn-111"><mml:mi>e</mml:mi><mml:mi>l</mml:mi><mml:mi>s</mml:mi><mml:mi>e</mml:mi></mml:math></inline-formula> part is executed (during states twenty-two to thirty-five). 
The last column in <xref ref-type="table" rid="table-1">Table 1</xref> shows that six, three and five instructions are required for finite field multiplication, addition and squaring, respectively. Each addition and squaring operation requires only one clock cycle. On the other hand, each finite field multiplication takes <inline-formula id="ieqn-112"><mml:math id="mml-ieqn-112"><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mi>b</mml:mi><mml:mi>c</mml:mi></mml:mfrac></mml:mstyle><mml:mo>+</mml:mo><mml:mn>1</mml:mn></mml:math></inline-formula> clock cycles. Therefore, six multiplications take <inline-formula id="ieqn-113"><mml:math id="mml-ieqn-113"><mml:mn>6</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mi>b</mml:mi><mml:mi>c</mml:mi></mml:mfrac></mml:mstyle><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> clock cycles.</p>
</list-item>
<list-item><label>&#x02022;</label><p><underline>Reconversion from projective to affine coordinates</underline>: Once the processor completes the <inline-formula id="ieqn-114"><mml:math id="mml-ieqn-114"><mml:mi>f</mml:mi><mml:mi>o</mml:mi><mml:mi>r</mml:mi></mml:math></inline-formula> loop statement in Algorithm 1, the reconversion is computed during states thirty-six to eighty-eight. The reconversion portion of Algorithm 1 also incorporates the finite field inversion operation. Over the <inline-formula id="ieqn-115"><mml:math id="mml-ieqn-115"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> field, each finite field inversion requires <italic>m</italic> squares and nine multiplications, so the computational cost is <inline-formula id="ieqn-116"><mml:math id="mml-ieqn-116"><mml:mn>9</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mi>b</mml:mi><mml:mi>c</mml:mi></mml:mfrac></mml:mstyle><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mi>m</mml:mi></mml:math></inline-formula> clock cycles. Similarly, over the <inline-formula id="ieqn-117"><mml:math id="mml-ieqn-117"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> field, each finite field inversion requires <italic>m</italic> squares and ten multiplications. 
In this case, the computational cost is <inline-formula id="ieqn-118"><mml:math id="mml-ieqn-118"><mml:mn>10</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mi>b</mml:mi><mml:mi>c</mml:mi></mml:mfrac></mml:mstyle><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mi>m</mml:mi></mml:math></inline-formula> clock cycles.</p></list-item>
</list></p>
</sec>
</sec>
<sec id="s3_4"><label>3.4</label><title>Total Clock Cycle Calculations</title>
<p>The total clock cycles of our proposed processor architecture over <inline-formula id="ieqn-119"><mml:math id="mml-ieqn-119"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-120"><mml:math id="mml-ieqn-120"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> are calculated using <xref ref-type="disp-formula" rid="eqn-1">Eqs. (1)</xref> and <xref ref-type="disp-formula" rid="eqn-2">(2)</xref>, respectively.
<disp-formula id="eqn-1"><label>(1)</label><mml:math id="mml-eqn-1" display="block"><mml:mrow><mml:mtext mathvariant="italic">Total</mml:mtext></mml:mrow><mml:mrow><mml:mtext mathvariant="italic">cycles</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mn>6</mml:mn><mml:mo>+</mml:mo><mml:mi>m</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mn>6</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mi>m</mml:mi><mml:mi>c</mml:mi></mml:mfrac></mml:mstyle><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo>}</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mn>8</mml:mn><mml:mo>]</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mn>2</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mn>9</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mi>m</mml:mi><mml:mi>c</mml:mi></mml:mfrac></mml:mstyle><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo>}</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mi>m</mml:mi><mml:mo>]</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mn>360</mml:mn></mml:math></disp-formula>
<disp-formula id="eqn-2"><label>(2)</label><mml:math id="mml-eqn-2" display="block"><mml:mrow><mml:mtext mathvariant="italic">Total</mml:mtext></mml:mrow><mml:mrow><mml:mtext mathvariant="italic">cycles</mml:mtext></mml:mrow><mml:mo>=</mml:mo><mml:mn>6</mml:mn><mml:mo>+</mml:mo><mml:mi>m</mml:mi><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mn>6</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mi>m</mml:mi><mml:mi>c</mml:mi></mml:mfrac></mml:mstyle><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo>}</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mn>8</mml:mn><mml:mo>]</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mn>2</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mn>10</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mi>m</mml:mi><mml:mi>c</mml:mi></mml:mfrac></mml:mstyle><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo>}</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mi>m</mml:mi><mml:mo>]</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mn>450</mml:mn></mml:math></disp-formula></p>
<p>In <xref ref-type="disp-formula" rid="eqn-1">Eqs. (1)</xref> and <xref ref-type="disp-formula" rid="eqn-2">(2)</xref>, <italic>m</italic> denotes the targeted field length (i.e., 163 or 233) and <italic>c</italic> denotes the digit length (24 bits). Additional details are given below.
<list list-type="simple">
<list-item><label>&#x02022;</label><p><underline>Affine to projective coordinates conversion:</underline> The constant 6 before the square brackets gives the clock cycles for the affine to projective conversion.</p></list-item>
<list-item><label>&#x02022;</label><p><underline>PM in projective coordinates:</underline> In <xref ref-type="disp-formula" rid="eqn-1">Eqs. (1)</xref> and <xref ref-type="disp-formula" rid="eqn-2">(2)</xref>, <inline-formula id="ieqn-121"><mml:math id="mml-ieqn-121"><mml:mn>6</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mi>m</mml:mi><mml:mi>c</mml:mi></mml:mfrac></mml:mstyle><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo>}</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mn>8</mml:mn></mml:math></inline-formula> gives the clock cycles for one loop iteration of the PM computation in projective coordinates. If we substitute <inline-formula id="ieqn-122"><mml:math id="mml-ieqn-122"><mml:mi>m</mml:mi><mml:mo>=</mml:mo><mml:mn>163</mml:mn></mml:math></inline-formula> and <inline-formula id="ieqn-123"><mml:math id="mml-ieqn-123"><mml:mi>c</mml:mi><mml:mo>=</mml:mo><mml:mn>24</mml:mn></mml:math></inline-formula> in <xref ref-type="disp-formula" rid="eqn-1">Eq. (1)</xref>, with <italic>m</italic>/<italic>c</italic> rounded up to 7 digits, then 48 clock cycles are required to compute the six multiplication instructions of the <inline-formula id="ieqn-124"><mml:math id="mml-ieqn-124"><mml:mi>P</mml:mi><mml:mi>A</mml:mi><mml:mi>D</mml:mi><mml:mi>D</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> and <inline-formula id="ieqn-125"><mml:math id="mml-ieqn-125"><mml:mi>P</mml:mi><mml:mi>D</mml:mi><mml:mi>B</mml:mi><mml:mi>L</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> functions of Algorithm 1. An additional 8 clock cycles are needed for the addition and squaring instructions. 
Similarly, if we use <inline-formula id="ieqn-126"><mml:math id="mml-ieqn-126"><mml:mi>m</mml:mi><mml:mo>=</mml:mo><mml:mn>233</mml:mn></mml:math></inline-formula> and <inline-formula id="ieqn-127"><mml:math id="mml-ieqn-127"><mml:mi>c</mml:mi><mml:mo>=</mml:mo><mml:mn>24</mml:mn></mml:math></inline-formula> in <xref ref-type="disp-formula" rid="eqn-2">Eq. (2)</xref>, with <italic>m</italic>/<italic>c</italic> rounded up to 10 digits, then 66 clock cycles are needed to compute the six multiplication instructions of the <inline-formula id="ieqn-128"><mml:math id="mml-ieqn-128"><mml:mi>P</mml:mi><mml:mi>A</mml:mi><mml:mi>D</mml:mi><mml:mi>D</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> and <inline-formula id="ieqn-129"><mml:math id="mml-ieqn-129"><mml:mi>P</mml:mi><mml:mi>D</mml:mi><mml:mi>B</mml:mi><mml:mi>L</mml:mi><mml:mo stretchy="false">(</mml:mo><mml:mo stretchy="false">)</mml:mo></mml:math></inline-formula> functions, and 8 additional clock cycles are needed for the addition and squaring instructions. Therefore, over <inline-formula id="ieqn-130"><mml:math id="mml-ieqn-130"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-131"><mml:math id="mml-ieqn-131"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, the total cycles for one iteration of the loop statement of Algorithm 1 are 56 and 74, respectively. 
Then, the required cycles for the <italic>m</italic> loop iterations are 9128 (<inline-formula id="ieqn-132"><mml:math id="mml-ieqn-132"><mml:mn>163</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>56</mml:mn></mml:math></inline-formula>) and 17242 (<inline-formula id="ieqn-133"><mml:math id="mml-ieqn-133"><mml:mn>233</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mn>74</mml:mn></mml:math></inline-formula>), respectively.</p></list-item>
<list-item><label>&#x02022;</label><p><underline>Reconversion from projective to affine coordinates:</underline> As shown in the reconversion part of Algorithm 1, two finite field inversions are involved in the reconversion step. Therefore, over <inline-formula id="ieqn-134"><mml:math id="mml-ieqn-134"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-135"><mml:math id="mml-ieqn-135"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, the <inline-formula id="ieqn-136"><mml:math id="mml-ieqn-136"><mml:mn>2</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mn>9</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mi>m</mml:mi><mml:mi>c</mml:mi></mml:mfrac></mml:mstyle><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo>}</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mi>m</mml:mi><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-137"><mml:math id="mml-ieqn-137"><mml:mn>2</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mo>[</mml:mo><mml:mn>10</mml:mn><mml:mo>&#x00D7;</mml:mo><mml:mrow><mml:mo>{</mml:mo><mml:mrow><mml:mo>(</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mi>m</mml:mi><mml:mi>c</mml:mi></mml:mfrac></mml:mstyle><mml:mo>)</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mn>1</mml:mn><mml:mo>}</mml:mo></mml:mrow><mml:mo>+</mml:mo><mml:mi>m</mml:mi><mml:mo>]</mml:mo></mml:mrow></mml:math></inline-formula> portions of <xref ref-type="disp-formula" rid="eqn-1">Eqs. (1)</xref> and <xref ref-type="disp-formula" rid="eqn-2">(2)</xref> determine the inversion cost. Substituting the corresponding values of <italic>m</italic> and <italic>c</italic> gives 468 and 682 clock cycles, respectively. An additional 360 and 450 cycles are needed to compute the remaining instructions of the reconversion portion of Algorithm 1. Therefore, over <inline-formula id="ieqn-138"><mml:math id="mml-ieqn-138"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-139"><mml:math id="mml-ieqn-139"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, the total clock cycle requirements for reconversion are 828 and 1132, respectively.</p></list-item>
</list>
In summary, for one PM execution, the clock cycle requirements of our proposed architecture over <inline-formula id="ieqn-140"><mml:math id="mml-ieqn-140"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-141"><mml:math id="mml-ieqn-141"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> are 10125 and 18613, respectively. Since the ECDH protocol requires two PM computations, the clock cycles for shared key generation are 20250 and 37226, respectively.</p>
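<p>The loop term of the cycle equations can be checked with a short Python sketch. It assumes that the fraction <italic>m</italic>/<italic>c</italic> is rounded up to a whole number of digits, which is the reading that reproduces the 48 and 66 multiplication cycles quoted above; the function names are hypothetical. Only the per-iteration and loop totals are evaluated here.</p>
<preformat><![CDATA[
```python
import math


def mult_cycles(m, c=24):
    """Cycles for one digit-serial multiplication: ceil(m / c) digits plus one load cycle."""
    return math.ceil(m / c) + 1


def loop_cycles(m, c=24, n_mult=6, n_add_sqr=8):
    """Return (cycles per loop iteration, cycles for all m iterations)."""
    per_iteration = n_mult * mult_cycles(m, c) + n_add_sqr
    return per_iteration, m * per_iteration


print(loop_cycles(163))  # (56, 9128)
print(loop_cycles(233))  # (74, 17242)
```
]]></preformat>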
</sec>
</sec>
<sec id="s4"><label>4</label><title>Results and Comparisons</title>
<sec id="s4_1"><label>4.1</label><title>Results</title>
<p>To describe the implementation results, we have first provided the simulation waveform in Section 4.1.1. After that, the implementation results are reported in Section 4.1.2. Finally, the schematic waveform after the circuit place and route is shown in Section 4.1.3.</p>
<sec id="s4_1_1"><label>4.1.1</label><title>Simulation Waveform</title>
<p>The simulation waveform over <inline-formula id="ieqn-142"><mml:math id="mml-ieqn-142"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is shown in <xref ref-type="fig" rid="fig-3">Fig. 3</xref>. It confirms that the proposed coprocessor architecture successfully generates the shared key value when the public and private/secret keys are input to the system. The generated shared key value could be used to perform key authentication or encryption and decryption between two wireless sensor nodes.</p>
<fig id="fig-3"><label>Figure 3</label><caption><title>RTL simulation waveform (captured on Vivado 2019.2)</title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_32849-fig-3.png"/></fig>
</sec>
<sec id="s4_1_2"><label>4.1.2</label><title>Implementation Results</title>
<p>Our proposed coprocessor architecture over <inline-formula id="ieqn-143"><mml:math id="mml-ieqn-143"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-144"><mml:math id="mml-ieqn-144"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is implemented in Verilog using the Vivado IDE (Integrated Design Environment) tool. The implementation targets a 28&#x2005;nm Virtex-7 (xc7vx485tffg1157-1) FPGA. The input parameters have been selected from the NIST standard document [<xref ref-type="bibr" rid="ref-16">16</xref>]. The results after place-and-route are provided in <xref ref-type="table" rid="table-2">Table 2</xref>. The field length (<inline-formula id="ieqn-145"><mml:math id="mml-ieqn-145"><mml:mi>m</mml:mi></mml:math></inline-formula>) is presented in column one. Columns two, three and four present the slices, LUTs and FFs, respectively. The clock frequency is presented in column five. The total number of required clock cycles (CCs) and the latency (in <inline-formula id="ieqn-146"><mml:math id="mml-ieqn-146"><mml:mrow><mml:mi>&#x03BC;</mml:mi></mml:mrow><mml:mi>s</mml:mi></mml:math></inline-formula>) for public key generation are given in columns six and seven, respectively. Similarly, the clock cycles (CCs) and latency (in <inline-formula id="ieqn-147"><mml:math id="mml-ieqn-147"><mml:mrow><mml:mi>&#x03BC;</mml:mi></mml:mrow><mml:mi>s</mml:mi></mml:math></inline-formula>) values for shared key generation are given in columns eight and nine, respectively. 
Finally, the consumed power (in <inline-formula id="ieqn-148"><mml:math id="mml-ieqn-148"><mml:mi>m</mml:mi><mml:mi>W</mml:mi></mml:math></inline-formula>) is provided in the last column. The area and frequency values are obtained from the Vivado IDE tool. The clock cycles are calculated using <xref ref-type="disp-formula" rid="eqn-1">Eqs. (1)</xref> and <xref ref-type="disp-formula" rid="eqn-2">(2)</xref>, as described in Section 3.4. The latency values are calculated using <xref ref-type="disp-formula" rid="eqn-3">Eq. (3)</xref>. To obtain the power values, we have used the Vivado Power Analysis tool [<xref ref-type="bibr" rid="ref-40">40</xref>].
<disp-formula id="eqn-3"><label>(3)</label><mml:math id="mml-eqn-3" display="block"><mml:mrow><mml:mtext mathvariant="italic">Latency</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mrow><mml:mi>&#x03BC;</mml:mi></mml:mrow><mml:mi>s</mml:mi><mml:mo>)</mml:mo></mml:mrow><mml:mo>=</mml:mo><mml:mstyle displaystyle="true" scriptlevel="0"><mml:mfrac><mml:mrow><mml:mrow><mml:mtext mathvariant="italic">Clock</mml:mtext></mml:mrow><mml:mrow><mml:mtext mathvariant="italic">Cycles</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>C</mml:mi><mml:mi>C</mml:mi><mml:mi>s</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow><mml:mrow><mml:mrow><mml:mtext mathvariant="italic">Frequency</mml:mtext></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:mi>M</mml:mi><mml:mi>H</mml:mi><mml:mi>z</mml:mi><mml:mo>)</mml:mo></mml:mrow></mml:mrow></mml:mfrac></mml:mstyle></mml:math></disp-formula></p>
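<p>As a quick numerical illustration of the latency formula, the Python sketch below divides a clock cycle count by a frequency in MHz to obtain microseconds. The function name <monospace>latency_us</monospace> is hypothetical, and the inputs are the clock cycle counts from Section 3.4 at an assumed operating frequency of 250 MHz.</p>
<preformat><![CDATA[
```python
def latency_us(clock_cycles, freq_mhz):
    """Latency in microseconds: clock cycles divided by frequency in MHz."""
    return clock_cycles / freq_mhz


print(latency_us(10125, 250))  # 40.5
print(latency_us(20250, 250))  # 81.0
```
]]></preformat>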
<table-wrap id="table-2"><label>Table 2</label><caption><title>Results of our proposed architecture over <inline-formula id="ieqn-223"><mml:math id="mml-ieqn-223"><mml:mrow><mml:mi mathvariant="bold-italic">G</mml:mi><mml:mi mathvariant="bold-italic">F</mml:mi></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-224"><mml:math id="mml-ieqn-224"><mml:mrow><mml:mi mathvariant="bold-italic">G</mml:mi><mml:mi mathvariant="bold-italic">F</mml:mi></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> on Virtex-7 FPGA</title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left" rowspan="2"><inline-formula id="ieqn-225"><mml:math id="mml-ieqn-225"><mml:mi>m</mml:mi></mml:math></inline-formula></th>
<th align="left" colspan="3">Area Utilizations</th>
<th align="left" rowspan="2">Freq. (<italic>MHz</italic>)</th>
<th align="left" colspan="2">Public Key</th>
<th align="left" colspan="2">Shared Key</th>
<th align="left" rowspan="2">Power (<inline-formula id="ieqn-226"><mml:math id="mml-ieqn-226"><mml:mi>m</mml:mi><mml:mi>W</mml:mi></mml:math></inline-formula>)</th>
</tr>
<tr>
<th align="left">Slices</th>
<th align="left">LUTs</th>
<th align="left">FFs</th>
<th align="left">CCs</th>
<th align="left">Lat. (<inline-formula id="ieqn-227"><mml:math id="mml-ieqn-227"><mml:mrow><mml:mrow><mml:mi>&#x03BC;</mml:mi></mml:mrow><mml:mi mathvariant="normal">s</mml:mi></mml:mrow></mml:math></inline-formula>)</th>
<th align="left">CCs</th>
<th align="left">Lat. (<inline-formula id="ieqn-228"><mml:math id="mml-ieqn-228"><mml:mrow><mml:mrow><mml:mi>&#x03BC;</mml:mi></mml:mrow><mml:mi mathvariant="normal">s</mml:mi></mml:mrow></mml:math></inline-formula>)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left"><inline-formula id="ieqn-229"><mml:math id="mml-ieqn-229"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula></td>
<td align="left">1351</td>
<td align="left">5403</td>
<td align="left">1306</td>
<td align="left">250</td>
<td align="left">10125</td>
<td align="left">40.50</td>
<td align="left">20250</td>
<td align="left">81.00</td>
<td align="left">0.91</td>
</tr>
<tr>
<td align="left"><inline-formula id="ieqn-230"><mml:math id="mml-ieqn-230"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula></td>
<td align="left">1789</td>
<td align="left">7156</td>
<td align="left">1864</td>
<td align="left">235</td>
<td align="left">18613</td>
<td align="left">79.20</td>
<td align="left">37226</td>
<td align="left">158.40</td>
<td align="left">1.37</td>
</tr>
</tbody>
</table>
<table-wrap-foot><fn id="tfn1_1"><p>Note: Lat. is the computation time (latency). CCs is the number of clock cycles.</p></fn>
</table-wrap-foot>
</table-wrap>
<p>Due to the different field lengths (i.e., <inline-formula id="ieqn-149"><mml:math id="mml-ieqn-149"><mml:mn>163</mml:mn></mml:math></inline-formula> and <inline-formula id="ieqn-150"><mml:math id="mml-ieqn-150"><mml:mn>233</mml:mn></mml:math></inline-formula>), the proposed architecture over <inline-formula id="ieqn-151"><mml:math id="mml-ieqn-151"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> utilizes 1351 slices, 5403 LUTs and 1306 FFs, which are 0.75 (ratio of 1351 over 1789), 0.75 (ratio of 5403 over 7156) and 0.70 (ratio of 1306 over 1864) times the resources of the design implemented over the <inline-formula id="ieqn-152"><mml:math id="mml-ieqn-152"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> field, respectively. The use of a coprocessor implementation style and a digit-serial finite field multiplier results in maximum frequencies of 250 and 235&#x2005;<italic>MHz</italic> over <inline-formula id="ieqn-153"><mml:math id="mml-ieqn-153"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-154"><mml:math id="mml-ieqn-154"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, respectively. 
By employing different optimization techniques such as pipelining, parallelism and scheduling for PA and PD instructions of columns two and four of <xref ref-type="table" rid="table-1">Table 1</xref>, the clock frequency of our architecture could be improved for high-speed cryptographic applications.</p>

<p>Apart from the hardware resources and operational frequency, our design requires 10125 and 20250 clock cycles for one public key and one shared key computation over <inline-formula id="ieqn-155"><mml:math id="mml-ieqn-155"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, respectively. Similarly, for public and shared key generation over <inline-formula id="ieqn-156"><mml:math id="mml-ieqn-156"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, the clock cycle cost of our architecture is 18613 and 37226, respectively. With some area (hardware resources) overhead, the clock cycle count could be reduced by employing bit-parallel or digit-parallel finite field multipliers inside the ALU of our coprocessor architecture. The computational cost in terms of latency is 40.50 and 81.00&#x2005;&#x03BC;s over <inline-formula id="ieqn-157"><mml:math id="mml-ieqn-157"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> for one public key and one shared key generation, respectively. For the same operations, the latency values over <inline-formula id="ieqn-158"><mml:math id="mml-ieqn-158"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> are 79.20 and 158.40&#x2005;&#x03BC;s. 
As expected, the computational cost (in terms of both CCs and latency) increases with the increase in the binary field length (i.e., from <inline-formula id="ieqn-159"><mml:math id="mml-ieqn-159"><mml:mn>163</mml:mn></mml:math></inline-formula> to <inline-formula id="ieqn-160"><mml:math id="mml-ieqn-160"><mml:mn>233</mml:mn></mml:math></inline-formula>). The latency of the proposed design could be improved by (i) reducing the clock cycles and (ii) maximizing the clock frequency.</p>
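<p>The latency values in <xref ref-type="table" rid="table-2">Table 2</xref> follow from dividing the clock cycle count by the clock frequency. A minimal Python sketch of this arithmetic, using the cycle counts and frequencies reported above, is given below. The dictionary layout and the helper name <monospace>latency_us</monospace> are illustrative assumptions and are not part of the implemented design.</p>
<code language="python">
```python
# Clock cycles and frequencies copied from Table 2 (Virtex-7).
# The data layout and the helper name are hypothetical.
TABLE_2 = {
    163: {"freq_mhz": 250, "cc_public": 10125, "cc_shared": 20250},
    233: {"freq_mhz": 235, "cc_public": 18613, "cc_shared": 37226},
}


def latency_us(clock_cycles, freq_mhz):
    """Latency in microseconds: clock cycles divided by frequency in MHz."""
    return clock_cycles / freq_mhz


for m, row in TABLE_2.items():
    public = latency_us(row["cc_public"], row["freq_mhz"])
    shared = latency_us(row["cc_shared"], row["freq_mhz"])
    print(f"m = {m}: public key {public:.2f} us, shared key {shared:.2f} us")
```
</code>
<p>With these inputs the sketch agrees with the reported latencies to within rounding. It also makes visible that the shared-key cycle counts in <xref ref-type="table" rid="table-2">Table 2</xref> are exactly twice the public-key counts, so at a fixed frequency the latency can only be lowered by reducing the cycle count.</p>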
<p>The use of a digit-serial multiplier with a small digit size of 24 bits results in a low power consumption of 0.91 and 1.37&#x2005;<italic>mW</italic> over <inline-formula id="ieqn-161"><mml:math id="mml-ieqn-161"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-162"><mml:math id="mml-ieqn-162"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, respectively. A smaller digit length lowers the power consumption at the cost of additional clock cycles [<xref ref-type="bibr" rid="ref-11">11</xref>,<xref ref-type="bibr" rid="ref-41">41</xref>]. The hardware resources and power consumption of our proposed design could be reduced further by employing a bit-serial multiplication architecture as used in [<xref ref-type="bibr" rid="ref-21">21</xref>].</p>
</sec>
<sec id="s4_1_3"><label>4.1.3</label><title>Schematic Layout</title>
<p>The circuit layout of our proposed coprocessor architecture over <inline-formula id="ieqn-163"><mml:math id="mml-ieqn-163"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is shown in <xref ref-type="fig" rid="fig-4">Fig. 4</xref>. It shows that the proposed coprocessor architecture is routable on our selected Virtex-7 (xc7vx485tffg1157-1) FPGA without DRC (design rule check) or timing violations.</p>
<fig id="fig-4"><label>Figure 4</label><caption><title>Circuit layout of the proposed coprocessor architecture over <inline-formula id="ieqn-188"><mml:math id="mml-ieqn-188"><mml:mrow><mml:mi mathvariant="bold-italic">G</mml:mi><mml:mi mathvariant="bold-italic">F</mml:mi></mml:mrow><mml:mrow><mml:mo mathvariant="bold" stretchy="false">(</mml:mo><mml:msup><mml:mn mathvariant="bold">2</mml:mn><mml:mrow><mml:mn mathvariant="bold">163</mml:mn></mml:mrow></mml:msup><mml:mo mathvariant="bold" stretchy="false">)</mml:mo></mml:mrow></mml:math></inline-formula></title></caption><graphic mimetype="image" mime-subtype="png" xlink:href="CMC_32849-fig-4.png"/></fig>
</sec>
</sec>
<sec id="s4_2"><label>4.2</label><title>Comparisons</title>
<p>The comparison with the state of the art is shown in <xref ref-type="table" rid="table-3">Table 3</xref>. The reference design and publication year are given in column one. The implemented binary field length along with the cryptographic operation is given in column two. Column three presents the targeted FPGA device. In column four, the value before the parentheses is the number of FPGA slices, while the value inside the parentheses is the number of FPGA LUTs. The operational frequency (in <inline-formula id="ieqn-164"><mml:math id="mml-ieqn-164"><mml:mi>M</mml:mi><mml:mi>H</mml:mi><mml:mi>z</mml:mi></mml:math></inline-formula>) and latency (in <inline-formula id="ieqn-165"><mml:math id="mml-ieqn-165"><mml:mrow><mml:mi>&#x03BC;</mml:mi></mml:mrow><mml:mi>s</mml:mi></mml:math></inline-formula>) values are presented in columns five and six, respectively. Moreover, we have used the symbol &#x2018;&#x2013;&#x2019; in <xref ref-type="table" rid="table-3">Table 3</xref> where the relevant information is not given.</p>
<table-wrap id="table-3"><label>Table 3</label><caption><title>Comparison to most relevant state-of-the-art architectures over <inline-formula id="ieqn-231"><mml:math id="mml-ieqn-231"><mml:mrow><mml:mi mathvariant="bold-italic">G</mml:mi><mml:mi mathvariant="bold-italic">F</mml:mi></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mi mathvariant="bold-italic">m</mml:mi></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula></title></caption>
<table frame="hsides">
<colgroup>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
<col align="left"/>
</colgroup>
<thead>
<tr>
<th align="left">Ref &#x0023;./Year</th>
<th align="left"><inline-formula id="ieqn-232"><mml:math id="mml-ieqn-232"><mml:mrow><mml:mi mathvariant="bold-italic">G</mml:mi><mml:mi mathvariant="bold-italic">F</mml:mi></mml:mrow><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mi mathvariant="bold-italic">m</mml:mi></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>/Op</th>
<th align="left">Device</th>
<th align="left">Slices/(LUTs)</th>
<th align="left">Freq. (<inline-formula id="ieqn-233"><mml:math id="mml-ieqn-233"><mml:mi mathvariant="bold-italic">M</mml:mi><mml:mi mathvariant="bold-italic">H</mml:mi><mml:mi mathvariant="bold-italic">z</mml:mi></mml:math></inline-formula>)</th>
<th align="left">Lat. (<inline-formula id="ieqn-234"><mml:math id="mml-ieqn-234"><mml:mi mathvariant="bold-italic">&#x03BC;</mml:mi><mml:mi mathvariant="bold-italic">s</mml:mi></mml:math></inline-formula>)</th>
</tr>
</thead>
<tbody>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-11">11</xref>]/2019</td>
<td align="left"><inline-formula id="ieqn-235"><mml:math id="mml-ieqn-235"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>/ECPM</td>
<td align="left">Virtex-7</td>
<td align="left">5120/(&#x2013;)</td>
<td align="left">357</td>
<td align="left">15.78</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-21">21</xref>]/2021</td>
<td align="left"><inline-formula id="ieqn-236"><mml:math id="mml-ieqn-236"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>/ECPM</td>
<td align="left">Virtex-5</td>
<td align="left">&#x2013;/(1786)</td>
<td align="left">909</td>
<td align="left">2.88</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-32">32</xref>]/2021</td>
<td align="left"><inline-formula id="ieqn-237"><mml:math id="mml-ieqn-237"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>/ElGamal</td>
<td align="left">Stratix-II</td>
<td align="left">&#x2013;/(&#x2013;)</td>
<td align="left">187</td>
<td align="left">4.91</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-39">39</xref>]/2017</td>
<td align="left"><inline-formula id="ieqn-238"><mml:math id="mml-ieqn-238"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>/ECPM</td>
<td align="left">Virtex-7</td>
<td align="left">3657/(&#x2013;)</td>
<td align="left">135</td>
<td align="left">25.31</td>
</tr>
<tr>
<td align="left">This work</td>
<td align="left"><inline-formula id="ieqn-239"><mml:math id="mml-ieqn-239"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>/ECPM</td>
<td align="left">Virtex-7</td>
<td align="left">1351/(5403)</td>
<td align="left">250</td>
<td align="left">40.50</td>
</tr>
<tr>
<td/>
<td align="left"><inline-formula id="ieqn-240"><mml:math id="mml-ieqn-240"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>/ECPM</td>
<td align="left">Virtex-7</td>
<td align="left">1789/(7156)</td>
<td align="left">235</td>
<td align="left">79.20</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-14">14</xref>]/2022</td>
<td align="left"><inline-formula id="ieqn-241"><mml:math id="mml-ieqn-241"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>/ECDH</td>
<td align="left">Virtex-7</td>
<td align="left">5102/(&#x2013;)</td>
<td align="left">318</td>
<td align="left">31.08</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-17">17</xref>]/2013</td>
<td align="left"><inline-formula id="ieqn-242"><mml:math id="mml-ieqn-242"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>/ECDH</td>
<td align="left">Artix-7</td>
<td align="left">603/(&#x2013;)</td>
<td align="left">10</td>
<td align="left">167.60</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-24">24</xref>]/2015</td>
<td align="left"><inline-formula id="ieqn-243"><mml:math id="mml-ieqn-243"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>/ECGDH-1</td>
<td align="left">Spartan-6</td>
<td align="left">&#x2013;/(13663)</td>
<td align="left">33</td>
<td align="left">33.60</td>
</tr>
<tr>
<td align="left">[<xref ref-type="bibr" rid="ref-27">27</xref>]/2016</td>
<td align="left"><inline-formula id="ieqn-244"><mml:math id="mml-ieqn-244"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>/Enc/Dec</td>
<td align="left">Artix-7</td>
<td align="left">8847/(&#x2013;)</td>
<td align="left">229</td>
<td align="left">2.49 &#x0026; 2.50 <inline-formula id="ieqn-245"><mml:math id="mml-ieqn-245"><mml:mi>m</mml:mi><mml:mi>s</mml:mi></mml:math></inline-formula></td>
</tr>
<tr>
<td align="left">This work</td>
<td align="left"><inline-formula id="ieqn-246"><mml:math id="mml-ieqn-246"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>/ECDH</td>
<td align="left">Artix-7</td>
<td align="left">1389/(5556)</td>
<td align="left">247</td>
<td align="left">81.98</td>
</tr>
<tr><td/>
<td align="left"><inline-formula id="ieqn-247"><mml:math id="mml-ieqn-247"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>/ECDH</td>
<td align="left">Spartan-6</td>
<td align="left">1413/(5652)</td>
<td align="left">231</td>
<td align="left">87.66</td>
</tr>
<tr><td/>
<td align="left"><inline-formula id="ieqn-248"><mml:math id="mml-ieqn-248"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>/ECDH</td>
<td align="left">Virtex-7</td>
<td align="left">1789/(7156)</td>
<td align="left">235</td>
<td align="left">158.40</td>
</tr>
</tbody>
</table>
<table-wrap-foot><fn id="tfn2_1"><p>Note: Op denotes the elliptic curve operation, Freq. the frequency and Lat. the latency. The design of [<xref ref-type="bibr" rid="ref-17">17</xref>] uses 2 and 21 BRAMs of 36 and 18&#x2005;kb, respectively. Additionally, it uses 38 DSP48A1 slices. ECGDH-1 is the elliptic curve group Diffie-Hellman key exchange mechanism.</p></fn>
</table-wrap-foot>
</table-wrap>
<sec id="s4_2_1"><label>4.2.1</label><title>Comparison with ECPM Designs</title>
<p>On Virtex-7 over <inline-formula id="ieqn-166"><mml:math id="mml-ieqn-166"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, the proposed architecture consumes 2.86 (ratio of 5120 with 1789) times fewer slices than [<xref ref-type="bibr" rid="ref-11">11</xref>]. The reason is our use of a digit-serial multiplier with a 24-bit digit, whereas a 32-bit digit size is used in a parallel way in [<xref ref-type="bibr" rid="ref-11">11</xref>]. This comparison indicates that longer digits require more hardware resources. Additionally, the digit-parallel multiplication approach with a digit length of 32 bits results in fewer clock cycles, which ultimately improves the latency value in [<xref ref-type="bibr" rid="ref-11">11</xref>]. The use of 2-stage pipelining improves the clock frequency in [<xref ref-type="bibr" rid="ref-17">17</xref>].</p>
<p>As shown in <xref ref-type="table" rid="table-3">Table 3</xref>, the architecture of [<xref ref-type="bibr" rid="ref-21">21</xref>] over <inline-formula id="ieqn-167"><mml:math id="mml-ieqn-167"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> on Virtex-5 utilizes fewer FPGA LUTs and takes less computation time than the proposed design. The reason is that [<xref ref-type="bibr" rid="ref-21">21</xref>] computes only the PM operation of ECC, while our design implements the ECDH protocol for key authentication. Because the objective of our work is to generate the shared key for wireless sensor nodes and RFID applications, the proposed architecture operates at up to 250&#x2005;<italic>MHz</italic>, while the architecture of [<xref ref-type="bibr" rid="ref-21">21</xref>] operates at 909&#x2005;<italic>MHz</italic>, as its intent is to optimize only the PM operation.</p>

<p>Apart from the hardware resources and timing results, the power consumption of [<xref ref-type="bibr" rid="ref-21">21</xref>] is 0.73&#x2005;<italic>mW</italic> for one PM execution. In our work, 0.91&#x2005;<italic>mW</italic> is consumed for one shared key generation using the ECDH protocol. Furthermore, our design utilizes the Montgomery PM algorithm for the implementation of the ECDH protocol of ECC as it is inherently secure against timing and power analysis attacks. On the other hand, the Lopez-Dahab PM algorithm is used in [<xref ref-type="bibr" rid="ref-21">21</xref>]. In the Lopez-Dahab PM algorithm, swapping between the PA and PD computations is needed whenever the inspected key bit is 1. This swapping requires additional clock cycles, so the execution time depends on the key, which shows that the architecture of [<xref ref-type="bibr" rid="ref-21">21</xref>] is not secure against timing and power analysis attacks. To summarize, our architecture is protected against timing and power analysis attacks, and its power consumption is comparable to that of [<xref ref-type="bibr" rid="ref-21">21</xref>].</p>
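<p>The regular structure of the Montgomery PM algorithm can be illustrated with a short Python sketch. It is only a sketch under stated assumptions: integers modulo a small prime stand in for elliptic-curve points, the modulus 1009 and all function names are hypothetical, and the code does not model the datapath of the proposed coprocessor. It shows that every key bit triggers exactly one PA and one PD, whatever the value of the bit.</p>
<code language="python">
```python
MODULUS = 1009  # hypothetical toy modulus, NOT a curve parameter


def point_add(a, b):
    """Stand-in for elliptic-curve point addition (PA)."""
    return (a + b) % MODULUS


def point_double(a):
    """Stand-in for elliptic-curve point doubling (PD)."""
    return (2 * a) % MODULUS


def montgomery_ladder(k, p):
    """Return (k*p, number of PA calls, number of PD calls)."""
    r0, r1 = 0, p           # invariant: r1 - r0 == p (mod MODULUS)
    pa_calls = pd_calls = 0
    for bit in bin(k)[2:]:  # key bits, most significant first
        if bit == "1":
            r0, r1 = point_add(r0, r1), point_double(r1)
        else:
            r0, r1 = point_double(r0), point_add(r0, r1)
        pa_calls += 1       # one PA was issued for this key bit
        pd_calls += 1       # one PD was issued for this key bit
    return r0, pa_calls, pd_calls


# Two 6-bit keys with different Hamming weights need the same work.
for key in (0b100000, 0b111111):
    result, pa, pd = montgomery_ladder(key, 7)
    print(f"key={key:06b} result={result} PA={pa} PD={pd}")
```
</code>
<p>A constant operation count per key bit is a necessary property of the ladder rather than a complete countermeasure. The Python branch on the key bit is not itself constant-time, and a hardware implementation would still have to keep the operand selection independent of the key.</p>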
<p>The Stratix-II design of [<xref ref-type="bibr" rid="ref-32">32</xref>] achieves an operational frequency of 187&#x2005;<italic>MHz</italic>, which is 1.33 (ratio of 250 with 187) times lower than that of our Virtex-7 implementation over <inline-formula id="ieqn-168"><mml:math id="mml-ieqn-168"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>. In other words, our design is 1.33 times faster in terms of clock frequency. However, our architecture requires more computational time as we have described a flexible design, while a dedicated PM architecture is discussed in [<xref ref-type="bibr" rid="ref-32">32</xref>]. A comparison of area and power values is not possible as the relevant information is not presented in the reference design. On Virtex-7 over <inline-formula id="ieqn-169"><mml:math id="mml-ieqn-169"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, our architecture utilizes 2.70 (ratio of 3657 with 1351) times fewer FPGA slices than [<xref ref-type="bibr" rid="ref-39">39</xref>]. The reason is the use of a digit-serial multiplier with a digit size of 24 bits in our work, while a bit-parallel Karatsuba multiplier is implemented in [<xref ref-type="bibr" rid="ref-39">39</xref>]. The bit-parallel multiplier results in fewer clock cycles, which eventually improves the latency value in [<xref ref-type="bibr" rid="ref-39">39</xref>]. Moreover, our design is 1.85 (ratio of 250 with 135) times faster in terms of operational frequency. As with [<xref ref-type="bibr" rid="ref-32">32</xref>], a power comparison is not possible because the corresponding information is not reported in the reference design.</p>
</sec>
<sec id="s4_2_2"><label>4.2.2</label><title>Comparison to Key-Authentication Architectures</title>
<p>The most recent design of [<xref ref-type="bibr" rid="ref-14">14</xref>] for key authentication using the ECDH protocol over <inline-formula id="ieqn-170"><mml:math id="mml-ieqn-170"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> on a Virtex-7 FPGA uses 2.85 (ratio of 5102 with 1789) times more slices than our work. On the other hand, the design of [<xref ref-type="bibr" rid="ref-14">14</xref>] is 5.09 (ratio of 158.40 with 31.08) times faster in terms of computational time than our architecture. One reason is the use of a bit-parallel Karatsuba multiplier in the datapath of [<xref ref-type="bibr" rid="ref-14">14</xref>], while our design employs a digit-serial multiplication approach. Another reason is its use of 2-stage pipelining to shorten the critical path, which increases the clock frequency at the cost of an area overhead. A power comparison is not possible as the corresponding information is not given in [<xref ref-type="bibr" rid="ref-14">14</xref>].</p>
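<p>The area and latency ratios quoted in this comparison can be recomputed directly from the entries of <xref ref-type="table" rid="table-3">Table 3</xref>. The following Python sketch does so for the 233-bit ECDH rows. The variable names and the helper <monospace>ratio</monospace> are illustrative assumptions, and only values printed in the table are used.</p>
<code language="python">
```python
# Entries copied from Table 3 (Virtex-7, 233-bit field, ECDH rows).
# Variable and function names are hypothetical.
ref14 = {"slices": 5102, "latency_us": 31.08}
this_work = {"slices": 1789, "latency_us": 158.40}


def ratio(larger, smaller):
    """How many times 'larger' exceeds 'smaller'."""
    return larger / smaller


area_gain = ratio(ref14["slices"], this_work["slices"])
latency_cost = ratio(this_work["latency_us"], ref14["latency_us"])

print(f"slices: {area_gain:.2f} times fewer in this work")
print(f"latency: {latency_cost:.2f} times longer in this work")
```
</code>
<p>The two ratios come out at roughly 2.85 and 5.1, which makes the trade-off explicit: the digit-serial datapath saves slices at the expense of computation time.</p>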
<p>The efficient implementation of [<xref ref-type="bibr" rid="ref-17">17</xref>] for key authentication using the ECDH protocol over <inline-formula id="ieqn-171"><mml:math id="mml-ieqn-171"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> on Artix-7 uses 603 slices, fewer than our work (1389). On the other hand, the architecture of [<xref ref-type="bibr" rid="ref-17">17</xref>] also uses 2 and 21 BRAMs of 36 and 18&#x2005;kb, respectively. In addition, it utilizes 38 DSP48A1 FPGA slices. Our design does not use BRAMs, as we implemented the RegFile as an array of registers to hold the initial, intermediate and final results. Therefore, a fair area comparison is challenging. Apart from the hardware resources, our architecture is 24.7 (ratio of 247 with 10) times faster in terms of clock frequency. Moreover, our architecture requires 2.04 (ratio of 167.60 with 81.98) times lower computational time (latency). As far as the power consumption is concerned, our architecture is 29.62 (ratio of 40&#x2005;<italic>mW</italic> with 1.35&#x2005;<italic>mW</italic>) times more efficient than [<xref ref-type="bibr" rid="ref-17">17</xref>]. The potential reason for the higher power consumption and computational time in [<xref ref-type="bibr" rid="ref-17">17</xref>] is its support for various cryptographic algorithms such as SHA (Secure Hash Algorithm) for secure hashing, while our design is specific to the ECDH protocol.</p>
<p>On the same Spartan-6 device, the design of [<xref ref-type="bibr" rid="ref-24">24</xref>] over <inline-formula id="ieqn-172"><mml:math id="mml-ieqn-172"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> for ECDH implementation is 2.41 (ratio of 13663 with 5652) times less area efficient than our work. This is due to the employment of several finite field multipliers in their architecture. On the other hand, we have utilized a single serial multiplier. Moreover, the proposed design provides a speedup of 7 (ratio of 231 with 33) as far as the operational frequency is concerned. In terms of latency, the proposed design has a 2.60 (ratio of 87.66 with 33.60) times higher computational cost. The cause is the parallelism obtained with multiple finite field multipliers in [<xref ref-type="bibr" rid="ref-24">24</xref>]. The dynamic power in [<xref ref-type="bibr" rid="ref-24">24</xref>] at 33&#x2005;<italic>MHz</italic> is 571&#x2005;<italic>mW</italic>, which is 435.8 (ratio of 571&#x2005;<italic>mW</italic> with 1.31&#x2005;<italic>mW</italic>) times higher than in this work. The Artix-7 design of [<xref ref-type="bibr" rid="ref-27">27</xref>] over <inline-formula id="ieqn-173"><mml:math id="mml-ieqn-173"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> uses more slices (8847) than our design (1389). The reason is the additional encryption and decryption operations along with the ECDH protocol implementation, while we have considered only the ECDH protocol for shared key computation. Due to the simpler datapath in our design, the operational frequency is 1.07 (ratio of 247 with 229) times higher. 
A latency comparison is not possible, as their architecture reports encryption and decryption times while we report the time for one shared key generation without the encryption and decryption operations.</p>
</sec>
</sec>
</sec>
<sec id="s5"><label>5</label><title>Conclusions</title>
<p>This article has proposed a flexible coprocessor key-authentication architecture for 80/112-bit security-related applications over <inline-formula id="ieqn-174"><mml:math id="mml-ieqn-174"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> with <inline-formula id="ieqn-175"><mml:math id="mml-ieqn-175"><mml:mi>m</mml:mi><mml:mo>=</mml:mo><mml:mn>163</mml:mn></mml:math></inline-formula> and <inline-formula id="ieqn-176"><mml:math id="mml-ieqn-176"><mml:mn>233</mml:mn></mml:math></inline-formula> using an ECDH protocol. The flexibility is achieved by using a serial input/output interface to load/produce secret, public, and shared keys. Moreover, a finite field digit-serial multiplier architecture with a digit size of 24-bits is proposed using shift and accumulate methods. Two FSM controllers have been implemented to efficiently generate the control signals. The implementation results are reported on Xilinx Virtex-7 FPGA. Over <inline-formula id="ieqn-177"><mml:math id="mml-ieqn-177"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-178"><mml:math id="mml-ieqn-178"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>, the utilized hardware resources in terms of FPGA slices are 1351 and 1789. For similar key lengths, the operational clock frequency is 250 and 235&#x2005;<italic>MHz</italic>. 
The time required to compute one public key over <inline-formula id="ieqn-179"><mml:math id="mml-ieqn-179"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-180"><mml:math id="mml-ieqn-180"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is 40.50 and 79.20&#x2005;&#x03BC;s, respectively. Similarly, the time for one shared key generation is 81.00 and 158.40&#x2005;&#x03BC;s. The consumed power over <inline-formula id="ieqn-181"><mml:math id="mml-ieqn-181"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> and <inline-formula id="ieqn-182"><mml:math id="mml-ieqn-182"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> is 0.91 and 1.37&#x2005;<italic>mW</italic>, respectively. Consequently, the proposed architecture requires fewer FPGA slices than most of the compared state-of-the-art ECDH designs.</p>
</sec>
</body>
<back>
<fn-group>
<fn fn-type="other"><p><bold>Funding Statement:</bold> This project has received funding from the NSTIP Strategic Technologies program under Grant Number 14-415 ELE1448-10, King Abdul Aziz City of Science and Technology of the Kingdom of Saudi Arabia.</p></fn>
<fn fn-type="conflict"><p><bold>Conflicts of Interest:</bold> The authors declare that they have no conflicts of interest to report regarding the present study.</p></fn>
</fn-group>
<ref-list content-type="authoryear">
<title>References</title>
<ref id="ref-1"><label>[1]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Rana</surname></string-name>, <string-name><given-names>Q.</given-names> <surname>Mamun</surname></string-name> and <string-name><given-names>R.</given-names> <surname>Islam</surname></string-name></person-group>, &#x201C;<article-title>Lightweight cryptography in IoT networks: A survey</article-title>,&#x201D; <source>Future Generation Computer Systems</source>, vol. <volume>129</volume>, pp. <fpage>77</fpage>&#x2013;<lpage>89</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-2"><label>[2]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Rashid</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Imran</surname></string-name>, <string-name><given-names>A. R.</given-names> <surname>Jafri</surname></string-name> and <string-name><given-names>T. F.</given-names> <surname>Al-Somani</surname></string-name></person-group>, &#x201C;<article-title>Flexible architectures for cryptographic algorithms: A systematic literature review</article-title>,&#x201D; <source>Journal of Circuits Systems and Computers (JCSC)</source>, vol. <volume>28</volume>, no. <issue>3</issue>, pp. <fpage>35</fpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-3"><label>[3]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>E.</given-names> <surname>Anaya</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Patel</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Shah</surname></string-name>, <string-name><given-names>V.</given-names> <surname>Shah</surname></string-name> and <string-name><given-names>Y.</given-names> <surname>Cheng</surname></string-name></person-group>, &#x201C;<article-title>A performance study on cryptographic algorithms for IoT devices</article-title>,&#x201D; in <conf-name>Proc. of the Tenth ACM Conf. on Data and Application Security and Privacy</conf-name>, <conf-loc>New York, USA</conf-loc>, pp. <fpage>159</fpage>&#x2013;<lpage>161</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-4"><label>[4]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>Miri</surname></string-name></person-group>, &#x201C;<source>Advanced Security and Privacy for RFID Technologies</source>,&#x201D; <publisher-loc>Hershey, PA</publisher-loc>: <publisher-name>IGI Global</publisher-name>, pp. <fpage>1</fpage>&#x2013;<lpage>342</lpage>, <year>2013</year>. [Online]. Available: <uri xlink:href="https://www.igi-global.com/gateway/book/72161">https://www.igi-global.com/gateway/book/72161</uri>.</mixed-citation></ref>
<ref id="ref-5"><label>[5]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Imran</surname></string-name> and <string-name><given-names>F.</given-names> <surname>Shehzad</surname></string-name></person-group>, &#x201C;<article-title>FPGA based crypto processor for elliptic curve point multiplication (ECPM) over</article-title> <inline-formula id="ieqn-183"><mml:math id="mml-ieqn-183"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>233</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula>,&#x201D; <source>International Journal for Information Security Research (IJISR)</source>, vol. <volume>7</volume>, pp. <fpage>706</fpage>&#x2013;<lpage>713</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-6"><label>[6]</label><mixed-citation publication-type="book"><person-group person-group-type="author"><string-name><given-names>D.</given-names> <surname>Hankerson</surname></string-name>, <string-name><given-names>A. J.</given-names> <surname>Menezes</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Vanstone</surname></string-name></person-group>, &#x201C;<source>Guide to Elliptic Curve Cryptography</source>,&#x201D; <publisher-loc>Henderson, NV, USA</publisher-loc>: <publisher-name>Springer</publisher-name>, pp. <fpage>1</fpage>&#x2013;<lpage>311</lpage>, <year>2004</year>. [Online]. Available: <uri xlink:href="https://link.springer.com/book/10.1007/b97644">https://link.springer.com/book/10.1007/b97644</uri>.</mixed-citation></ref>
<ref id="ref-7"><label>[7]</label><mixed-citation publication-type="web"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Housley</surname></string-name></person-group>, &#x201C;<source>Use of the Elliptic Curve Diffie-Hellman Key Agreement Algorithm with X25519 and X448 in the Cryptographic Message Syntax (CMS)</source>,&#x201D; RFC 8418, pp. <fpage>1</fpage>&#x2013;<lpage>18</lpage>, <year>2018</year>. [Online]. Available: <uri xlink:href="https://www.rfc-editor.org/info/rfc8418">https://www.rfc-editor.org/info/rfc8418</uri>.</mixed-citation></ref>
<ref id="ref-8"><label>[8]</label><mixed-citation publication-type="web"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Pornin</surname></string-name></person-group>, &#x201C;<source>Deterministic Usage of the Digital Signature Algorithm (DSA) and Elliptic Curve Digital Signature Algorithm (ECDSA)</source>,&#x201D; RFC 6979, pp. <fpage>1</fpage>&#x2013;<lpage>79</lpage>, <year>2013</year>. [Online]. Available: <uri xlink:href="https://www.rfc-editor.org/info/rfc6979">https://www.rfc-editor.org/info/rfc6979</uri>.</mixed-citation></ref>
<ref id="ref-9"><label>[9]</label><mixed-citation publication-type="web"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Turner</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Brown</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Yiu</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Housley</surname></string-name> and <string-name><given-names>T.</given-names> <surname>Polk</surname></string-name></person-group>, &#x201C;<source>Elliptic Curve Cryptography Subject Public Key Information</source>,&#x201D; RFC 5480, pp. <fpage>1</fpage>&#x2013;<lpage>20</lpage>, <year>2009</year>. [Online]. Available: <uri xlink:href="https://www.rfc-editor.org/info/rfc5480">https://www.rfc-editor.org/info/rfc5480</uri>.</mixed-citation></ref>
<ref id="ref-10"><label>[10]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>N.</given-names> <surname>Pirotte</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Vliegen</surname></string-name>, <string-name><given-names>L.</given-names> <surname>Batina</surname></string-name> and <string-name><given-names>N.</given-names> <surname>Mentens</surname></string-name></person-group>, &#x201C;<article-title>Design of a fully balanced ASIC coprocessor implementing complete addition formulas on weierstrass elliptic curves</article-title>,&#x201D; in <conf-name>21st Euromicro Conf. on Digital System Design (DSD)</conf-name>, <conf-loc>Prague, Czech Republic</conf-loc>, pp. <fpage>545</fpage>&#x2013;<lpage>552</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-11"><label>[11]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Imran</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Rashid</surname></string-name>, <string-name><given-names>A. R.</given-names> <surname>Jafri</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Kashif</surname></string-name></person-group>, &#x201C;<article-title>Throughput/area optimised pipelined architecture for elliptic curve crypto processor</article-title>,&#x201D; <source>IET Computers &#x0026; Digital Techniques</source>, vol. <volume>13</volume>, no. <issue>5</issue>, pp. <fpage>361</fpage>&#x2013;<lpage>368</lpage>, <year>2019</year>.</mixed-citation></ref>
<ref id="ref-12"><label>[12]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>B.</given-names> <surname>Rashidi</surname></string-name></person-group>, &#x201C;<article-title>Low-cost and fast hardware implementations of point multiplication on binary edwards curves</article-title>,&#x201D; in <conf-name>Iranian Conf. on Electrical Engineering (ICEE)</conf-name>, <conf-loc>Mashhad, Iran</conf-loc>, pp. <fpage>17</fpage>&#x2013;<lpage>22</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-13"><label>[13]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Imran</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Rashid</surname></string-name> and <string-name><given-names>I.</given-names> <surname>Shafi</surname></string-name></person-group>, &#x201C;<article-title>Lopez dahab based elliptic crypto processor (ECP) over</article-title> <inline-formula id="ieqn-184"><mml:math id="mml-ieqn-184"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mn>163</mml:mn></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> <article-title>for low-area applications on FPGA</article-title>,&#x201D; in <conf-name>2018 Int. Conf. on Engineering and Emerging Technologies (ICEET)</conf-name>, <conf-loc>Lahore, Pakistan</conf-loc>, pp. <fpage>1</fpage>&#x2013;<lpage>6</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-14"><label>[14]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Rashid</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Kumar</surname></string-name>, <string-name><given-names>S. Z.</given-names> <surname>Khan</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Bahkali</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Alhomoud</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Throughput/area optimized architecture for elliptic-curve diffie-hellman protocol</article-title>,&#x201D; <source>Applied Sciences</source>, vol. <volume>12</volume>, no. <issue>8</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>18</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-15"><label>[15]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>J.</given-names> <surname>Vliegen</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Mentens</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Genoe</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Braeken</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Kubera</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>A compact FPGA-based architecture for elliptic curve cryptography over prime fields</article-title>,&#x201D; in <conf-name>21st IEEE Int. Conf. on Application-Specific Systems, Architectures and Processors</conf-name>, <conf-loc>Rennes, France</conf-loc>, pp. <fpage>313</fpage>&#x2013;<lpage>316</lpage>, <year>2010</year>.</mixed-citation></ref>
<ref id="ref-16"><label>[16]</label><mixed-citation publication-type="web"><person-group person-group-type="author"><collab>NIST</collab></person-group>, &#x201C;<source>Recommended Elliptic Curves for Federal Government Use</source>,&#x201D; FIPS PUB 186-2: USA, pp. <fpage>1</fpage>&#x2013;<lpage>70</lpage>, <year>1999</year>. [Online]. Available: <uri xlink:href="https://csrc.nist.gov/csrc/media/publications/fips/186/2/archive/2000-01-27/documents/fips186-2.pdf">https://csrc.nist.gov/csrc/media/publications/fips/186/2/archive/2000-01-27/documents/fips186-2.pdf</uri>.</mixed-citation></ref>
<ref id="ref-17"><label>[17]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A.</given-names> <surname>De la Piedra</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Braeken</surname></string-name> and <string-name><given-names>A.</given-names> <surname>Touhafi</surname></string-name></person-group>, &#x201C;<article-title>Extending the IEEE 802.15.4 security suite with a compact implementation of the NIST P-192/B-163 elliptic curves</article-title>,&#x201D; <source>Sensors</source>, vol. <volume>13</volume>, no. <issue>8</issue>, pp. <fpage>9704</fpage>&#x2013;<lpage>9728</lpage>, <year>2013</year>.</mixed-citation></ref>
<ref id="ref-18"><label>[18]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>D.</given-names> <surname>Liu</surname></string-name>, <string-name><given-names>X.</given-names> <surname>Zou</surname></string-name>, <string-name><given-names>H.</given-names> <surname>Lin</surname></string-name> and <string-name><given-names>J.</given-names> <surname>Cheng</surname></string-name></person-group>, &#x201C;<article-title>Design of an elliptic curve cryptography processor for RFID tag chips</article-title>,&#x201D; <source>Sensors</source>, vol. <volume>14</volume>, no. <issue>10</issue>, pp. <fpage>17883</fpage>&#x2013;<lpage>17904</lpage>, <year>2014</year>.</mixed-citation></ref>
<ref id="ref-19"><label>[19]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Khan</surname></string-name>, <string-name><given-names>W. K.</given-names> <surname>Lee</surname></string-name> and <string-name><given-names>S. O.</given-names> <surname>Hwang</surname></string-name></person-group>, &#x201C;<article-title>A flexible gimli hardware implementation in FPGA and its application to RFID authentication protocols</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>9</volume>, pp. <fpage>105327</fpage>&#x2013;<lpage>105340</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-20"><label>[20]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>A. S. R.</given-names> <surname>Oliveira</surname></string-name>, <string-name><given-names>N. B.</given-names> <surname>Carvalho</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Santos</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Boaventura</surname></string-name>, <string-name><given-names>R. F.</given-names> <surname>Cordeiro</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>All-digital RFID readers: An RFID reader implemented on an FPGA chip and/or embedded processor</article-title>,&#x201D; <source>IEEE Microwave Magazine</source>, vol. <volume>22</volume>, no. <issue>3</issue>, pp. <fpage>18</fpage>&#x2013;<lpage>24</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-21"><label>[21]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Rashid</surname></string-name>, <string-name><given-names>S. S.</given-names> <surname>Jamal</surname></string-name>, <string-name><given-names>S. Z.</given-names> <surname>Khan</surname></string-name>, <string-name><given-names>A. R.</given-names> <surname>Alharbi</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Aljaedi</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Elliptic-curve crypto processor for RFID applications</article-title>,&#x201D; <source>Applied Sciences</source>, vol. <volume>11</volume>, no. <issue>15</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>16</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-22"><label>[22]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>T. D. P.</given-names> <surname>Bai</surname></string-name>, <string-name><given-names>K. M.</given-names> <surname>Raj</surname></string-name> and <string-name><given-names>S. A.</given-names> <surname>Rabara</surname></string-name></person-group>, &#x201C;<article-title>Elliptic curve cryptography based security framework for internet of things (IoT) enabled smart card</article-title>,&#x201D; in <conf-name>2017 World Congress on Computing and Communication Technologies (WCCCT)</conf-name>, <conf-loc>Tiruchirappalli, India</conf-loc>, pp. <fpage>43</fpage>&#x2013;<lpage>46</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-23"><label>[23]</label><mixed-citation publication-type="web"><person-group person-group-type="author"><string-name><given-names>C.</given-names> <surname>Ankita</surname></string-name></person-group>, &#x201C;<source>Wireless Sensor Networks</source>,&#x201D; electroSome, <year>2013</year>. [Online]. Available: <uri xlink:href="https://electrosome.com/wireless-sensor-networks/#google_vignette">https://electrosome.com/wireless-sensor-networks/#google_vignette</uri>.</mixed-citation></ref>
<ref id="ref-24"><label>[24]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>L.</given-names> <surname>Parrilla</surname></string-name>, <string-name><given-names>D. P.</given-names> <surname>Morales</surname></string-name>, <string-name><given-names>J. A.</given-names> <surname>L&#x00F3;pez-Villanueva</surname></string-name>, <string-name><given-names>J. A.</given-names> <surname>L&#x00F3;pez-Ramos</surname></string-name> and <string-name><given-names>J. A.</given-names> <surname>&#x00C1;lvarez-Bermejo</surname></string-name></person-group>, &#x201C;<article-title>Hardware implementation of a new ECC key distribution protocol for securing wireless sensor networks</article-title>,&#x201D; in <conf-name>2015 Conf. on Design of Circuits and Integrated Systems (DCIS)</conf-name>, <conf-loc>Estoril, Portugal</conf-loc>, pp. <fpage>1</fpage>&#x2013;<lpage>6</lpage>, <year>2015</year>.</mixed-citation></ref>
<ref id="ref-25"><label>[25]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Peter</surname></string-name>, <string-name><given-names>O.</given-names> <surname>Stecklina</surname></string-name>, <string-name><given-names>J.</given-names> <surname>Portilla</surname></string-name>, <string-name><given-names>E.</given-names> <surname>de la Torre</surname></string-name>, <string-name><given-names>P.</given-names> <surname>Langendoerfer</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>Reconfiguring crypto hardware accelerators on wireless sensor nodes</article-title>,&#x201D; in <conf-name>6th IEEE Annual Communications Society Conf. on Sensor, Mesh and Ad Hoc Communications and Networks Workshops</conf-name>, <conf-loc>Rome, Italy</conf-loc>, pp. <fpage>1</fpage>&#x2013;<lpage>3</lpage>, <year>2009</year>.</mixed-citation></ref>
<ref id="ref-26"><label>[26]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>P.</given-names> <surname>Jilna</surname></string-name>, <string-name><given-names>P. P.</given-names> <surname>Deepthi</surname></string-name> and <string-name><given-names>U. K.</given-names> <surname>Jayaraj</surname></string-name></person-group>, &#x201C;<article-title>Optimized hardware design and implementation of EC based key management scheme for WSN</article-title>,&#x201D; in <conf-name>10th Int. Conf. for Internet Technology and Secured Transactions (ICITST)</conf-name>, <conf-loc>London, UK</conf-loc>, pp. <fpage>164</fpage>&#x2013;<lpage>169</lpage>, <year>2015</year>.</mixed-citation></ref>
<ref id="ref-27"><label>[27]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>G.</given-names> <surname>Leelavathi</surname></string-name>, <string-name><given-names>K.</given-names> <surname>Shaila</surname></string-name> and <string-name><given-names>K. R.</given-names> <surname>Venugopal</surname></string-name></person-group>, &#x201C;<article-title>Elliptic curve cryptography implementation on FPGA using montgomery multiplication for equal key and data size over</article-title> <inline-formula id="ieqn-185"><mml:math id="mml-ieqn-185"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> <article-title>for wireless sensor networks</article-title>,&#x201D; in <conf-name>IEEE Region 10 Conf. (TENCON)</conf-name>, <conf-loc>Singapore</conf-loc>, pp. <fpage>468</fpage>&#x2013;<lpage>471</lpage>, <year>2016</year>.</mixed-citation></ref>
<ref id="ref-28"><label>[28]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Das</surname></string-name> and <string-name><given-names>Z.</given-names> <surname>Wang</surname></string-name></person-group>, &#x201C;<article-title>ED25519: A new secure compatible elliptic curve for mobile wireless networks security</article-title>,&#x201D; <source>Jordanian Journal of Computers and Information Technology (JJCIT)</source>, vol. <volume>8</volume>, no. <issue>1</issue>, pp. <fpage>57</fpage>&#x2013;<lpage>71</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-29"><label>[29]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>U.</given-names> <surname>Gulen</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Baktir</surname></string-name></person-group>, &#x201C;<article-title>Elliptic curve cryptography for wireless sensor networks using the number theoretic transform</article-title>,&#x201D; <source>Sensors</source>, vol. <volume>20</volume>, no. <issue>5</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>16</lpage>, <year>2020</year>.</mixed-citation></ref>
<ref id="ref-30"><label>[30]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S. C.</given-names> <surname>Seo</surname></string-name> and <string-name><given-names>H.</given-names> <surname>Seo</surname></string-name></person-group>, &#x201C;<article-title>Highly efficient implementation of NIST-compliant koblitz curve for 8-bit AVR-based sensor nodes</article-title>,&#x201D; <source>IEEE Access</source>, vol. <volume>6</volume>, pp. <fpage>67637</fpage>&#x2013;<lpage>67652</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-31"><label>[31]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>Z.</given-names> <surname>Razali</surname></string-name>, <string-name><given-names>N.</given-names> <surname>Muslim</surname></string-name>, <string-name><given-names>S.</given-names> <surname>Kahar</surname></string-name>, <string-name><given-names>F.</given-names> <surname>Yunos</surname></string-name> and <string-name><given-names>K.</given-names> <surname>Mohamed</surname></string-name></person-group>, &#x201C;<article-title>Improved point 5P formula for twisted edwards curve in projective coordinate over prime field</article-title>,&#x201D; in <conf-name>Int. Conf. on Decision Aid Sciences and Applications (DASA)</conf-name>, <conf-loc>Chiangrai, Thailand</conf-loc>, pp. <fpage>498</fpage>&#x2013;<lpage>502</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-32"><label>[32]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>R.</given-names> <surname>Amiri</surname></string-name> and <string-name><given-names>O.</given-names> <surname>Elkeelany</surname></string-name></person-group>, &#x201C;<article-title>FPGA design of elliptic curve cryptosystem (ECC) for isomorphic transformation and EC ElGamal encryption</article-title>,&#x201D; <source>IEEE Embedded Systems Letters</source>, vol. <volume>13</volume>, no. <issue>2</issue>, pp. <fpage>65</fpage>&#x2013;<lpage>68</lpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-33"><label>[33]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Devi</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Mahajan</surname></string-name> and <string-name><given-names>D.</given-names> <surname>Bagai</surname></string-name></person-group>, &#x201C;<article-title>A low complexity bit parallel polynomial basis systolic multiplier for general irreducible polynomials and trinomials</article-title>,&#x201D; <source>Microelectronics Journal</source>, vol. <volume>115</volume>, pp. <fpage>105163</fpage>, <year>2021</year>.</mixed-citation></ref>
<ref id="ref-34"><label>[34]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S.</given-names> <surname>Devi</surname></string-name>, <string-name><given-names>R.</given-names> <surname>Mahajan</surname></string-name> and <string-name><given-names>D.</given-names> <surname>Bagai</surname></string-name></person-group>, &#x201C;<article-title>Low complexity design of bit parallel polynomial basis systolic multiplier using irreducible polynomials</article-title>,&#x201D; <source>Egyptian Informatics Journal</source>, vol. <volume>23</volume>, no. <issue>1</issue>, pp. <fpage>105</fpage>&#x2013;<lpage>112</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-35"><label>[35]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>S. E.</given-names> <surname>Mathe</surname></string-name> and <string-name><given-names>L.</given-names> <surname>Boppana</surname></string-name></person-group>, &#x201C;<article-title>Bit-parallel systolic multiplier over</article-title> <inline-formula id="ieqn-186"><mml:math id="mml-ieqn-186"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> <article-title>for irreducible trinomials with ASIC and FPGA implementations</article-title>,&#x201D; <source>IET Circuits, Devices &#x0026; Systems</source>, vol. <volume>12</volume>, no. <issue>4</issue>, pp. <fpage>315</fpage>&#x2013;<lpage>325</lpage>, <year>2018</year>.</mixed-citation></ref>
<ref id="ref-36"><label>[36]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Thirumoorthi</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Heidarpur</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Mirhassani</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Khalid</surname></string-name></person-group>, &#x201C;<article-title>An optimized m-term karatsuba-like binary polynomial multiplier for finite field arithmetic</article-title>,&#x201D; <source>IEEE Transactions on Very Large Scale Integration (VLSI) Systems</source>, vol. <volume>30</volume>, no. <issue>5</issue>, pp. <fpage>603</fpage>&#x2013;<lpage>614</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-37"><label>[37]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>H.</given-names> <surname>Kumar</surname></string-name>, <string-name><given-names>M.</given-names> <surname>Rashid</surname></string-name>, <string-name><given-names>A.</given-names> <surname>Alhomoud</surname></string-name>, <string-name><given-names>S. Z.</given-names> <surname>Khan</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Bahkali</surname></string-name> <etal>et al.,</etal></person-group> &#x201C;<article-title>A scalable digit-parallel polynomial multiplier architecture for NIST-standardized binary elliptic curves</article-title>,&#x201D; <source>Applied Sciences</source>, vol. <volume>12</volume>, no. <issue>9</issue>, pp. <fpage>1</fpage>&#x2013;<lpage>18</lpage>, <year>2022</year>.</mixed-citation></ref>
<ref id="ref-38"><label>[38]</label><mixed-citation publication-type="journal"><person-group person-group-type="author"><string-name><given-names>T.</given-names> <surname>Itoh</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Tsujii</surname></string-name></person-group>, &#x201C;<article-title>A fast algorithm for computing multiplicative inverses in</article-title> <inline-formula id="ieqn-187"><mml:math id="mml-ieqn-187"><mml:mi>G</mml:mi><mml:mi>F</mml:mi><mml:mrow><mml:mo>(</mml:mo><mml:msup><mml:mn>2</mml:mn><mml:mrow><mml:mi>m</mml:mi></mml:mrow></mml:msup><mml:mo>)</mml:mo></mml:mrow></mml:math></inline-formula> <article-title>using normal bases</article-title>,&#x201D; <source>Information and Computation</source>, vol. <volume>78</volume>, no. <issue>3</issue>, pp. <fpage>171</fpage>&#x2013;<lpage>177</lpage>, <year>1988</year>.</mixed-citation></ref>
<ref id="ref-39"><label>[39]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Imran</surname></string-name>, <string-name><given-names>I.</given-names> <surname>Shafi</surname></string-name>, <string-name><given-names>A. R.</given-names> <surname>Jafri</surname></string-name> and <string-name><given-names>M.</given-names> <surname>Rashid</surname></string-name></person-group>, &#x201C;<article-title>Hardware design and implementation of ECC based crypto processor for low-area-applications on FPGA</article-title>,&#x201D; in <conf-name>Int. Conf. on Open Source Systems &#x0026; Technologies (ICOSST)</conf-name>, <conf-loc>Lahore, Pakistan</conf-loc>, pp. <fpage>54</fpage>&#x2013;<lpage>59</lpage>, <year>2017</year>.</mixed-citation></ref>
<ref id="ref-40"><label>[40]</label><mixed-citation publication-type="web"><person-group person-group-type="author"><collab>Xilinx</collab></person-group>, &#x201C;<source>Power Analysis and Optimization</source>,&#x201D; AMD Xilinx, UG907: USA, pp. <fpage>1</fpage>&#x2013;<lpage>112</lpage>, <year>2016</year>. [Online]. Available: <uri xlink:href="https://docs.xilinx.com/v/u/2016.2-English/ug907-vivado-power-analysis-optimization">https://docs.xilinx.com/v/u/2016.2-English/ug907-vivado-power-analysis-optimization</uri>.</mixed-citation></ref>
<ref id="ref-41"><label>[41]</label><mixed-citation publication-type="conf-proc"><person-group person-group-type="author"><string-name><given-names>M.</given-names> <surname>Imran</surname></string-name>, <string-name><given-names>Z. U.</given-names> <surname>Abideen</surname></string-name> and <string-name><given-names>S.</given-names> <surname>Pagliarini</surname></string-name></person-group>, &#x201C;<article-title>An open-source library of large integer polynomial multipliers</article-title>,&#x201D; in <conf-name>24th Int. Symp. on Design and Diagnostics of Electronic Circuits &#x0026; Systems (DDECS)</conf-name>, <conf-loc>Vienna, Austria</conf-loc>, pp. <fpage>145</fpage>&#x2013;<lpage>150</lpage>, <year>2021</year>.</mixed-citation></ref>
</ref-list>
</back>
</article>






