Towards Public Integrity Audition for Cloud-IoT Data Based on Blockchain

With the rapid development of the Internet of Things (IoT), the volume of data generated by IoT systems is increasing quickly. To relieve the pressure of data management and storage, more and more enterprises and individuals prefer to integrate cloud services with IoT systems, so that IoT data can be outsourced to a cloud server. Since the cloud service provider (CSP) is not fully trusted, a variety of methods have been proposed to deal with the problem of data integrity checking. In traditional data integrity audition schemes, the task of data auditing is usually performed by a Third Party Auditor (TPA) that is assumed to be trustworthy. In practice, however, the TPA is not as trustworthy as assumed, so these schemes suffer from the underlying problem of a single point of failure. Moreover, most traditional schemes are built on RSA or bilinear map techniques, which incur heavy computation and communication costs. To overcome these shortcomings, we propose a novel data integrity checking scheme for cloud-IoT data based on blockchain and a homomorphic hash function. In our scheme, the tags of all data blocks are computed by a homomorphic hash function and stored in the blockchain. Moreover, each step of the data integrity checking process is signed by the party that performs it, and the signatures are stored in the blockchain through smart contracts. As a result, every action taken during data integrity checking can be traced and audited, which greatly improves the security of the scheme. Furthermore, our scheme supports batch-audition for multiple data challenges. We formalize the system model of our scheme and give the concrete construction. Detailed performance analyses demonstrate that our proposed scheme is efficient and practical without requiring the TPA to be trusted.


Introduction
The Internet of Things [1] connects a variety of devices such as smartphones, sensors and smartwatches to the Internet. As a result, many IoT-based applications such as smart homes, smart cities and body networks have become popular and widely available, promoting the progress of human society [2][3][4]. With the fast development of IoT techniques [5][6][7], the data generated by IoT systems is increasing so significantly that traditional methods of data storage cannot meet the data management requirements of IoT systems. Therefore, many enterprises have to outsource their huge IoT data to cloud servers [8,9]. By renting a cloud storage service, the IoT data owner's burden of data storage and supervision is greatly reduced. However, the cloud service provider (CSP) is only semi-trusted by the user. Because the CSP completely controls the sensitive IoT data, the security and privacy of cloud-IoT data must be properly addressed [10][11][12]. Consequently, checking the integrity of cloud-IoT data is necessary and crucial for the effectiveness of IoT applications.
A trivial solution for cloud-IoT data integrity audition is to download the data and check it locally. However, this simple solution is not practical because the volume of IoT data is normally very large. To address the problem, many data integrity auditing schemes for cloud-IoT data have been proposed [8][9][10][11][12][13][14][15][16][17][18][19][20][21][22]. However, these traditional schemes have two main problems: (1) The TPA is assumed to be trustworthy, but in real application scenarios the TPA is not completely trusted. (2) The RSA or bilinear map techniques used by most of these schemes are computationally expensive, so the schemes suffer from a performance bottleneck. Both problems impede the use of data integrity auditing schemes in practice.
Blockchain provides a new way to check the integrity of cloud-IoT data [23]. Being decentralized, traceable and immutable, blockchain satisfies the needs of cloud-IoT data integrity checking. Transactions recorded in the blockchain cannot be tampered with or forged. Thus, storing important audition information in the blockchain can not only improve audition performance, but also effectively expose false audition results returned by an untrusted TPA.
The main contributions of this manuscript are summarized as follows:
(1) We present a blockchain-based data integrity audition scheme for cloud-IoT data. We describe the system model and the security model of our scheme, and present all of its algorithms in detail.
(2) We prove the security of our new scheme. In our scheme, the audition results returned by the TPA can themselves be verified, so that attacks from an untrusted TPA can be resisted. Moreover, the CSP cannot forge a data integrity proof to deceive the TPA.
(3) We give a performance comparison and analysis against multiple schemes and conduct various simulation experiments. The experimental results show that our scheme reduces the computational and communication overhead significantly.

Related Works
Security is the basic and also the most important requirement for data stored on a cloud server. By auditing data integrity, a user can discover data corruption and loss in time and take effective measures to deal with them. The first data integrity audition protocol was presented by Ateniese et al. [13,14] in 2007. They made use of the MAC technique to design two schemes for auditing the integrity of data on remote servers. However, the communication and computational overheads of these two schemes are very large. Chen et al. [15] presented a provable data possession (PDP) model to verify the integrity of data stored on remote servers. They proposed the concept of blockless verification, realized by homomorphic verifiable tags, to drastically reduce I/O cost. However, these schemes are only suitable for static data and do not support dynamic data operations. A subsequent scheme aimed to enhance scalability; however, this scheme was proved insecure [16]. To overcome the security problem, Yan et al. [17,18] proposed improved dynamic PDP schemes, which designed a new data structure to support dynamic operations such as data insertion, deletion and update. Shen et al. [19,20] concentrated on preserving the privacy of authenticators. Zhu et al. [21,22] focused on the problem of preserving data privacy: they used a random masking technique to obscure user data when generating proofs. To eliminate certificate management, Yan et al. [23,24] proposed an identity-based public group data checking scheme that preserves the privacy of the data owner. The scheme hides the identity of the data owner in the integrity proof, so that the TPA verifies the proof without knowing the owner of the challenged data. Zhang et al. utilized lattice techniques to propose a scheme based on identity-based encryption for secure cloud storage [25,26]. To improve security, Li et al. [27,28] presented a PDP scheme based on certificateless cryptography for data shared within a group, in which the trusted group owner is designated to be the PKG. Ming et al. [29,30] presented PDP schemes for data integrity checking that were constructed on certificateless cryptography and realized user privacy protection. However, all these schemes delegate the TPA to audit data integrity on behalf of data owners. Since the TPA is not really trustworthy, there is a security risk that the TPA may return wrong information to the data owner.
To solve the problems above, many blockchain-based schemes have been proposed recently. Liu et al. [31] stored the hash values of data in the blockchain ledger, so that the data owner could check the audition result with a smart contract. However, this scheme cannot resist replay attacks. Yang et al. [32] made use of MHT to store all proofs, making the behaviors of the CSP and data owners accountable and traceable. Yu et al. [33] used blockchain as a data channel to avoid the security threats of the TPA, but the user cost is so high that the scheme is not practical. Wang et al. [34] proposed a data integrity scheme for cloud-IoT data based on blockchain and bilinear mapping, which introduced a provable update mechanism to support dynamic IoT data. Wang et al. [35,36] proposed concrete private blockchain-based schemes that also preserve the client's privacy. To address the centralization problem of the TPA, Dong et al. [37] presented a secure data integrity checking scheme based on consortium blockchain, which also designed a punishment mechanism for a TPA that fails to send the audit result in time. Chen et al. [38] described a blockchain-based PDP scheme that realizes a decentralized cloud storage framework and also provides dynamic operations for outsourced data. Chen et al. [39,40] considered distributing the workload to IoT edge nodes to make the scheme more practical; they developed a stochastic blockchain in which only a limited number of nodes can generate block tags. Huang et al. [41] presented a blockchain-based collaborative verification framework for cloud data storage. They use consensus nodes in place of the single TPA to perform data audition, which prevents entities from cheating each other.

Blockchain Technology
Blockchain is essentially a decentralized database in which all transactions in an untrusted network are recorded. A blockchain contains a set of blocks that are linked as a growing list. Each block records cryptographic information such as the hash value of the previous block, a timestamp and transaction data. Every block is linked to its predecessor by that predecessor's hash value. All nodes keep one consistent ledger with the same transaction records, which cannot be updated or deleted. Therefore, all transactions that occur in the network can be trusted without a centralized third-party authority. Moreover, the records on the blockchain are transparent to all nodes, so anyone can access the data in the blockchain. Fig. 1 shows the basic structure of blockchain.
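The hash linking described above can be illustrated with a minimal Python sketch. It is not the ledger used in the scheme: the block fields (`prev_hash`, `timestamp`, `transactions`), the use of SHA-256 and the helper names are assumptions chosen for illustration, and consensus among nodes is omitted entirely.

```python
import hashlib
import json


def block_hash(block):
    """Hash a block's full content (assumed encoding: sorted-key JSON, SHA-256)."""
    payload = json.dumps(block, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain, transactions, timestamp):
    """Append a block that stores the hash of the previous block."""
    prev = block_hash(chain[-1]) if chain else "0" * 64  # placeholder for the first block
    chain.append({"prev_hash": prev, "timestamp": timestamp, "transactions": transactions})
    return chain


def chain_is_valid(chain):
    """Check that every block still points at the current hash of its predecessor."""
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1]) for i in range(1, len(chain)))
```

Changing any block that already has a successor breaks the stored `prev_hash` of that successor, which is why recorded transactions cannot be silently modified. The newest block is only protected once another block is appended after it.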

Homomorphic Hash Function
The homomorphic hash function [42,43] is secure and efficient, which makes it suitable for constructing data possession proofs such as those in [15] and [17]. We first describe the definition of the homomorphic hash function, denoted by $H(\cdot)$, in this section.
First, set two basic security parameters $\lambda_p$ and $\lambda_q$, then randomly select two large primes $p$ and $q$ with $|p| = \lambda_p$, $|q| = \lambda_q$ and $q \mid (p-1)$. Suppose message $M$ consists of $n$ bit strings:
$$M = (m_1, m_2, \cdots, m_n), \quad m_i \in Z_q.$$
Choose $n$ random values of order $q$ from $Z_p^*$ to form a vector $G = [g_1, g_2, \cdots, g_n]$. The homomorphic hash function is then defined as:
$$H(M) = \prod_{i=1}^{n} g_i^{m_i} \bmod p.$$
The message $M$ is thus compressed to one short string by the homomorphic hash function. If any part of the message $M$ is changed, the hash value of the message changes too. Due to this property, the hash function $H(\cdot)$ can be used to audit the integrity of the message. Moreover, the homomorphic feature of $H(\cdot)$ helps to reduce the communication cost greatly. Suppose there are two messages $M_i$ and $M_j$, both of which are split into $n$ bit strings:
$$M_i = (m_{i1}, m_{i2}, \cdots, m_{in}), \quad M_j = (m_{j1}, m_{j2}, \cdots, m_{jn}).$$
With $M_i + M_j = (m_{i1} + m_{j1}, \cdots, m_{in} + m_{jn}) \bmod q$, the homomorphic property of $H(\cdot)$ can be confirmed as:
$$H(M_i + M_j) = \prod_{k=1}^{n} g_k^{m_{ik} + m_{jk}} \bmod p = \prod_{k=1}^{n} g_k^{m_{ik}} \cdot \prod_{k=1}^{n} g_k^{m_{jk}} \bmod p = H(M_i) \times H(M_j) \bmod p.$$
The property also holds for any number of messages.
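The definition and the homomorphic property can be checked numerically with a small Python sketch. The toy primes $p = 2039$ and $q = 1019$ (so that $q \mid (p-1)$), the seeded generator selection and the function names are assumptions made for illustration only; a real deployment would use large primes matching the security parameters.

```python
import random

# Toy parameters, assumed for illustration only (far too small to be secure).
P = 2039  # prime p
Q = 1019  # prime q with q | (p - 1), since 2039 - 1 = 2 * 1019


def make_generators(n, seed=0):
    """Pick n elements of Z_p^* whose order is exactly q."""
    rng = random.Random(seed)
    gens = []
    while len(gens) < n:
        h = rng.randrange(2, P - 1)
        g = pow(h, (P - 1) // Q, P)  # maps h into the subgroup of order q
        if g != 1:
            gens.append(g)
    return gens


def hhash(message, gens):
    """H(M) = prod_i g_i^{m_i} mod p for M = (m_1, ..., m_n)."""
    assert len(message) == len(gens)
    acc = 1
    for g, m in zip(gens, message):
        acc = (acc * pow(g, m % Q, P)) % P
    return acc


def add_messages(a, b):
    """Component-wise sum of two messages, reduced mod q."""
    return [(x + y) % Q for x, y in zip(a, b)]
```

For any two messages `a` and `b` of length `n`, `hhash(add_messages(a, b), G)` equals `hhash(a, G) * hhash(b, G) % P`, because every `g_i` has order `q` and exponents can therefore be reduced mod `q`.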

System Model
Our proposed scheme comprises four entities: the data collector (DC), the CSP, the TPA and the Blockchain. The data collector, CSP and TPA all join the Blockchain through smart contracts designed beforehand. The system model is illustrated in Fig. 2. The roles of the entities and the interactions between them are described as follows:
Data collectors (DC) are the entities that generate or collect large volumes of original IoT data. DC is the owner of these data. After collecting the data, DC splits it into several data blocks and generates a tag for each block. DC then outsources the blocks to CSP and uploads the tags to the Blockchain through the corresponding smart contracts.
CSP supplies data storage and management services. DC rents CSP's service and outsources data to CSP. CSP maintains DC's data and responds to data integrity challenges from TPA.
TPA audits the integrity of the data stored on CSP. TPA submits random data integrity challenges to CSP. By checking the correctness of the proof returned from CSP, TPA obtains the audition result for the data and reports it to DC.
Blockchain is an entity that works as a trusted open database in the system. DC, CSP and TPA can read information from the Blockchain and can also store information in it. All interactions between DC, CSP, TPA and the Blockchain are conducted through smart contracts.

Security Assumption
In our system, the Blockchain is assumed to be trustworthy: it stores and maintains the ledger honestly. The CSP, however, is assumed to be semi-trusted. Namely, CSP executes the audition protocol honestly, but may deceive TPA with forged proofs. Likewise, TPA is also considered to be semi-trusted, because it may be tempted by illegal profit to give a fake audition result to the data collector. Therefore, the security of our scheme should cover two aspects: the first is to resist attacks from an untrusted CSP that generates forged proofs, and the second is to resist attacks from an untrusted TPA that lies to the data collector with fake audition results. According to Refs. [24][25][26][27][28][29][30], we mainly consider three security threats brought by CSP as follows:
Forgery attack: CSP forges a data integrity proof to deceive TPA.
Replay attack: CSP sends previous valid proofs to bypass the current challenge.
However, because CSP only stores user data, the three attacks on our scheme are essentially the same one: CSP generates the proof with wrong data. No matter how the wrong data is produced, the results of the three attacks are the same. Based on this analysis, we define the security of our scheme against the attacks of CSP as:
Definition 1: Our blockchain-based data integrity checking scheme is secure if the CSP cannot generate a valid proof to pass the integrity audition without the real data.
Strictly speaking, there is no direct method for the data collector to verify the truth of the audition result returned from TPA, because the data collector is completely outside the audition process. However, if the audition result from TPA can itself be audited, an untrusted TPA will handle the audition result more carefully, especially when a fake audition result carries a large compensation penalty. Therefore, the security of our scheme against attacks from an untrusted TPA can be defined as:
Definition 2: Our blockchain-based data integrity auditing scheme can resist the attacks from TPA if the data audition result reported by TPA can be verified.

Outline of Our Scheme
Our blockchain-based auditing scheme for cloud-IoT data consists of five algorithms, which are described as follows:
Setup: This algorithm is responsible for generating the system parameters $p$, $q$, $n$ and $G$ for the homomorphic hash function $H(\cdot)$.
TagGen: This algorithm computes a tag $T_i$ for each block $m_i$ by the homomorphic hash function $H(\cdot)$.
Challenge: TPA uses this algorithm to output a data integrity challenge $chal$.
ProofGen: This algorithm outputs a data integrity proof $P$ for the challenge $chal$.
Audit: TPA calls this algorithm to verify the correctness of $P$. If $P$ passes the verification, the algorithm outputs '1', otherwise it outputs '0'.

The Proposed Scheme
In this section, we give the detailed construction of our blockchain-based data integrity checking scheme.
Setup: DC sets two security parameters $\lambda_p$ and $\lambda_q$, then selects two large primes $p$ and $q$ with $|p| = \lambda_p$, $|q| = \lambda_q$ and $q \mid (p-1)$. DC chooses $n$ random values from $Z_p^*$ to compose a vector $G = [g_1, g_2, \cdots, g_n]$, where every value $g_i$ has order $q$. A secure and efficient signature scheme $Sig$ is selected to sign all behaviors in the process of data integrity checking. DC chooses a signing key pair $(dssk, dspk)$. Likewise, CSP chooses a signing key pair $(cssk, cspk)$, and TPA, who offers the data audition service to DC, has a signing key pair $(tssk, tspk)$.
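TagGen: Suppose $M$, identified by $Fid$, is the data to be outsourced to CSP. DC splits $M$ into $a$ blocks, denoted as $M = (m_1, m_2, \cdots, m_a)$, and further splits each block into $n$ sectors: $m_i = (m_{i1}, m_{i2}, \cdots, m_{in})$. For each block $m_i$ ($1 \le i \le a$), DC computes the tag $T_i$ by the homomorphic hash function $H(\cdot)$:
$$T_i = H(m_i) = \prod_{j=1}^{n} g_j^{m_{ij}} \bmod p.$$
DC then outsources the blocks to CSP and uploads the tags to the blockchain through the corresponding smart contract.
Challenge: TPA randomly selects the indices of the blocks to be checked and sends the challenge $chal = \{s_1, s_2, \cdots, s_x\}$ to CSP.
ProofGen: Upon receiving $chal = \{s_1, s_2, \cdots, s_x\}$, CSP gets the corresponding blocks $\{m_{s_1}, m_{s_2}, \cdots, m_{s_x}\}$ and computes their sector-wise sum $\bar{m} = (\bar{m}_1, \cdots, \bar{m}_n)$ with $\bar{m}_j = \sum_{i=1}^{x} m_{s_i j} \bmod q$. So the proof can be computed as:
$$P = H(\bar{m}) = \prod_{j=1}^{n} g_j^{\bar{m}_j} \bmod p.$$
Then CSP returns $\{P, \sigma^{P}_{CSP}\}$ to TPA, where $\sigma^{P}_{CSP} = Sig_{cssk}(P)$ is the signature on $P$.
Audit: Upon receiving the proof $P$, TPA first verifies the correctness of $\sigma^{P}_{CSP} = Sig_{cssk}(P)$ with the public signing key $cspk$. If $\sigma^{P}_{CSP}$ is valid, TPA accesses the blockchain and gets the tags of all challenged blocks. TPA checks the following equation:
$$P = \prod_{i=1}^{x} T_{s_i} \bmod p. \qquad (3)$$
If Eq. (3) holds, the algorithm sets $R_{au} = 1$, otherwise $R_{au} = 0$. Then, TPA returns $R_{au}$ to DC and uploads $\{chal, R_{au}, P, \sigma^{P}_{CSP}, \sigma^{P}_{TPA}\}$ to the blockchain, where $\sigma^{P}_{TPA} = Sig_{tssk}(chal, R_{au}, P, \sigma^{P}_{CSP})$ is the signature on $(chal, R_{au}, P, \sigma^{P}_{CSP})$. The correctness of the Audit algorithm can be confirmed as:
$$\prod_{i=1}^{x} T_{s_i} = \prod_{i=1}^{x} \prod_{j=1}^{n} g_j^{m_{s_i j}} = \prod_{j=1}^{n} g_j^{\sum_{i=1}^{x} m_{s_i j}} = H(\bar{m}) = P \pmod{p}.$$
Our scheme also supports batch-auditing, which means that multiple data files can be audited at once. Suppose $t$ different data files $M^1, M^2, \ldots, M^t$ are outsourced to CSP. TPA can audit all these files with one challenge. The updated ProofGen and Audit algorithms are described as follows:
ProofGen: Upon receiving $chal = \{s_1, s_2, \cdots, s_x\}$, CSP gets all the corresponding blocks from all files, which can be denoted as $\{m^1_{s_1}, m^1_{s_2}, \cdots, m^1_{s_x}\}, \ldots, \{m^t_{s_1}, m^t_{s_2}, \cdots, m^t_{s_x}\}$, and computes the sector-wise sum $\bar{m}$ with $\bar{m}_j = \sum_{k=1}^{t} \sum_{i=1}^{x} m^k_{s_i j} \bmod q$. Then the proof is computed as:
$$P = H(\bar{m}) = \prod_{j=1}^{n} g_j^{\bar{m}_j} \bmod p.$$
Audit: Upon receiving the proof $P$, TPA gets all the corresponding tags $T^k_{s_i}$ from the blockchain and checks the following equation:
$$P = \prod_{k=1}^{t} \prod_{i=1}^{x} T^k_{s_i} \bmod p. \qquad (5)$$
If Eq. (5) holds, the algorithm outputs '1', otherwise it outputs '0'.
The new Audit can be verified as:
$$\prod_{k=1}^{t} \prod_{i=1}^{x} T^k_{s_i} = \prod_{k=1}^{t} \prod_{i=1}^{x} \prod_{j=1}^{n} g_j^{m^k_{s_i j}} = \prod_{j=1}^{n} g_j^{\sum_{k=1}^{t} \sum_{i=1}^{x} m^k_{s_i j}} = H(\bar{m}) = P \pmod{p}.$$

Security Proof and Performance Analysis

Security Proof
In this section we prove that our new blockchain-based scheme is secure against all the attacks defined in Section 3.2.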
Theorem 1: If the homomorphic hash function is collision free, our blockchain-based data integrity checking scheme is secure.
Proof: With the challenge $chal = \{s_1, s_2, \cdots, s_x\}$, we assume that CSP successfully deceives TPA with a forged proof $P'$ in which a challenged block $m$ is changed to $m'$ with $m' \ne m$. According to ProofGen, $P' = H(m_{s_1} + \cdots + m' + \cdots + m_{s_x})$. Since $P'$ passes the audition, and the tags stored in the blockchain are the hashes of the original blocks, there must be $P' = H(m_{s_1}) \times \cdots \times H(m) \times \cdots \times H(m_{s_x})$. Due to the homomorphic property of $H(\cdot)$, the equation above can be deduced to:
$$H(m_{s_1}) \times \cdots \times H(m') \times \cdots \times H(m_{s_x}) = H(m_{s_1}) \times \cdots \times H(m) \times \cdots \times H(m_{s_x}).$$
Thus, it is easy to get $H(m') = H(m)$ with $m' \ne m$, which contradicts the collision-free property of the homomorphic hash function $H(\cdot)$. Therefore, Theorem 1 is proved.
Theorem 2: Our scheme is secure against attacks from TPA if the signature scheme $Sig$ selected for our scheme is secure.
Proof: From the ProofGen algorithm, we can see that each proof is signed by CSP with the signature scheme $Sig$. With the signature $\sigma^{P}_{CSP}$, TPA can ensure that the proof $P$ is generated by CSP. According to the Audit algorithm, TPA uploads all the values used in this challenge-response process to the blockchain after checking the correctness of the proof $P$. Moreover, TPA signs all these values with the signature scheme $Sig$ to get the signature $\sigma^{P}_{TPA}$, which is stored in the blockchain too. With these values, the data collector can audit TPA's audition behavior by replaying the challenge process.
Specifically, the data collector randomly chooses one record $\{chal, R_{au}, P, \sigma^{P}_{CSP}, \sigma^{P}_{TPA}\}$ from the blockchain, then checks the validity of $\sigma^{P}_{TPA}$ with the public signing key $tspk$ of TPA. If $\sigma^{P}_{TPA}$ passes the verification, there is no doubt that all these values were generated by TPA. The data collector takes $chal$ from the record and sends it to CSP to get the integrity proof $P$. Finally, the data collector calls the Audit algorithm to verify the correctness of $P$. If the verification result is not equal to $R_{au}$, the data collector concludes that TPA lied earlier. With this evidence, the data collector can claim a large compensation from TPA and terminate the cooperation with TPA. Therefore, if the signature scheme $Sig$ is secure, our scheme can resist the attacks from TPA.
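The data collector's re-check of a TPA record can be sketched as follows. This is a hedged illustration, not the paper's implementation: the record is modeled as a plain dictionary, signature verification and the request to CSP are passed in as hypothetical callables (`verify_tpa_sig`, `fetch_proof`), the tag values in any usage are placeholders, and the audit check is assumed to compare the proof with the product of the challenged tags.

```python
def audit(tags, chal, proof, p):
    """Assumed Audit check: proof must equal the product of the challenged tags mod p."""
    expected = 1
    for s in chal:
        expected = (expected * tags[s]) % p
    return 1 if proof == expected else 0


def recheck_record(record, tags, p, fetch_proof, verify_tpa_sig):
    """Replay one blockchain record and compare the outcome with the TPA's reported result.

    record         -- dict with keys "chal", "R_au", "P" (hypothetical layout)
    fetch_proof    -- callable(chal) -> proof, standing in for a fresh request to CSP
    verify_tpa_sig -- callable(record) -> bool, standing in for Sig verification with tspk
    """
    if not verify_tpa_sig(record):
        return "invalid-signature"  # the record cannot be attributed to TPA
    fresh_proof = fetch_proof(record["chal"])
    result = audit(tags, record["chal"], fresh_proof, p)
    return "consistent" if result == record["R_au"] else "tpa-lied"
```

Because the record is signed by TPA and stored immutably, a `"tpa-lied"` outcome gives the data collector evidence that TPA itself reported the contradicted result.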

Performance Evaluation
We present the performance analysis of our scheme in this section. Let $E$, $P$, $C_{add}$ and $C_{mul}$ denote the costs of exponentiation, pairing, addition and multiplication respectively, which have different values in different experimental environments. The computational costs of the four algorithms are summarized below. DC runs the Setup algorithm to generate the parameters for the homomorphic hash. Because the values of $p$, $q$ and $G$ are selected randomly, the computational cost of Setup cannot be determined exactly. However, according to [12], the average time of Setup is very low. Moreover, Setup runs only once in the system, so it has little impact on the performance of the whole system. Suppose a block is cut into $|n|$ sectors; the computation cost of TagGen is then $|n| \cdot (E + C_{mul})$. ProofGen costs $|n| \cdot (E + C_{mul} + C_{add})$ and Audit costs $|c| \cdot C_{mul}$, where $|c|$ denotes the total number of challenged blocks for one integrity audition.
To demonstrate the effectiveness of our scheme, we compare it with two existing blockchain-based schemes in Tab. 1, in which $|b|$ denotes the number of data blocks, $|n|$ denotes the number of sectors in one data block and $|c|$ denotes the number of challenged blocks for one integrity audition.
In [35] and [37], a data block is not divided further into sectors, which means each data block has only one sector. Therefore, the computational costs of these two schemes depend only on $|b|$ and $|c|$. In our scheme, however, each block is split into $|n|$ sectors, and the value $|n|$ strongly affects the performance, especially in the phases of tag generation and proof generation. At first glance, our scheme costs more than the other two schemes because of the value $|n|$. In fact, our scheme can handle data blocks that are $|n|$ times longer than those in the schemes of [35] and [37]. If we compare the three schemes at the same level with $|n| = 1$, it is easy to see that our scheme has the best performance.
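The cost expressions for TagGen, ProofGen and Audit can be turned into a small calculator, which makes it easy to see how the costs scale with $|n|$ and $|c|$. The unit costs passed in are placeholders in arbitrary units, not measurements from the paper, and the function names are hypothetical.

```python
def tag_gen_cost(n, e, c_mul):
    """Cost of tagging one block with n sectors: n * (E + C_mul)."""
    return n * (e + c_mul)


def proof_gen_cost(n, e, c_mul, c_add):
    """Cost of generating one proof: n * (E + C_mul + C_add)."""
    return n * (e + c_mul + c_add)


def audit_cost(c, c_mul):
    """Cost of one audit over c challenged blocks: c * C_mul."""
    return c * c_mul


if __name__ == "__main__":
    # Placeholder unit costs (arbitrary units), chosen only to show the scaling.
    E, C_MUL, C_ADD = 100, 2, 1
    for c in (100, 200, 400):
        print(c, proof_gen_cost(4, E, C_MUL, C_ADD), audit_cost(c, C_MUL))
```

In these expressions the proof generation cost does not depend on the number of challenged blocks, while the audit cost grows linearly with it but involves only multiplications.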
The communication cost of DC is $|b| \cdot |n| \cdot |q| + |b| \cdot |p| + 2(|Sig| + |Fid| + |ID_{DC}|)$, which mainly consists of the data $M$ and all tags. To verify data integrity, TPA sends a challenge of size $4|c|$ to CSP, and CSP returns the proof $\{P, \sigma^{P}_{CSP}\}$, whose length is $|p| + |Sig|$. It is easy to see that the communication cost of our scheme is very low, especially in the process of data integrity checking. Tab. 2 shows that the scheme of [35] is the most efficient of the three schemes in the tag-generation step, and that our scheme is a little more expensive than that of [35] but more efficient than the scheme in [37]. Further, 0.236 seconds for dealing with 1M data is practical for real applications.
Next, we conduct experiments to evaluate the proof generation performance of the three schemes. We set up a total of 2000 blocks in each scheme and keep the other parameters the same as in the first experiment. The experimental results are shown in Fig. 3.
From Fig. 3 we can see that the time cost of the schemes in [35] and [37] grows very quickly, whereas in our scheme the time cost of this phase stays almost constant. When the number of challenged blocks is less than about 170, our scheme takes longer than the other two schemes. However, as Fig. 3 shows, as the number of challenged blocks increases, the time costs of the schemes in [35] and [37] rapidly surpass that of our scheme. Generally speaking, to get a more accurate integrity audition result, the number of challenged blocks in one audition should be more than 460 [13]. Therefore, our scheme is very efficient in real applications.
Due to the expensive pairing operations, the scheme of [37] incurs a heavy cost in this phase, which is much larger than that of the scheme in [35] and ours. We further compare the performance of the scheme in [35] with our scheme, and the result is shown in Fig. 5. It is easy to see that the performance difference between the scheme of [35] and our scheme is still large, and that it grows fast as the number of challenged blocks increases.
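The communication cost expressions for DC, the challenge and the proof can be evaluated with a short Python sketch. All sizes in the usage are hypothetical and assumed to be expressed in one common unit (bytes here); the paper does not state the concrete lengths, so the numbers only illustrate how the terms combine.

```python
def dc_upload_size(b, n, q_len, p_len, sig_len, fid_len, id_len):
    """|b|*|n|*|q| + |b|*|p| + 2*(|Sig| + |Fid| + |ID_DC|): data, tags and signed metadata."""
    return b * n * q_len + b * p_len + 2 * (sig_len + fid_len + id_len)


def challenge_size(c):
    """Challenge of size 4*|c| for c challenged block indices."""
    return 4 * c


def proof_size(p_len, sig_len):
    """Proof {P, signature}: |p| + |Sig|."""
    return p_len + sig_len


if __name__ == "__main__":
    # Hypothetical lengths in bytes, for illustration only.
    print(dc_upload_size(b=2000, n=50, q_len=20, p_len=128, sig_len=64, fid_len=4, id_len=4))
    print(challenge_size(460), proof_size(p_len=128, sig_len=64))
```

The proof size is independent of the number of challenged blocks, so only the challenge itself grows with $|c|$ during integrity checking.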

Conclusion
In this paper, a blockchain-based cloud-IoT data integrity auditing scheme is proposed. The scheme makes use of a homomorphic hash function to generate tags for data blocks and stores all the tags in the blockchain. The homomorphic feature of the tags improves the efficiency of proof generation and integrity audition. The blockchain ensures the security and immutability of all tags, which avoids most of the threats in previous schemes. We prove the security of our scheme, and the performance evaluation results show that it is efficient and practical. In future work, we will focus on extending the scheme to support dynamic data operations, which is another attractive feature for secure cloud storage.
Conflicts of Interest: The authors declare that they have no conflicts of interest to report regarding the present study.