OPPR: An Outsourcing Privacy-Preserving JPEG Image Retrieval Scheme with Local Histograms in Cloud Environment

With the wide application of imaging technology, the volume of image data that may contain private information is growing fast. Because local devices often lack sufficient computing power and storage space, many people hand these images over to cloud servers for management. However, storing plaintext images in the cloud is unsafe, so encryption before uploading becomes a necessary step to reduce the risk of privacy leakage. Encryption, in turn, hinders the efficient use of images, especially in Content-Based Image Retrieval (CBIR). This paper proposes an outsourcing privacy-preserving JPEG CBIR scheme. We design a JPEG format-compatible encryption method that causes no file size expansion. We first combine multiple adjacent 8 × 8 DCT coefficient blocks into big-blocks. Then, random scrambling and stream encryption are applied to the binary code of the DCT coefficients to protect the privacy of the JPEG image. The tasks of extracting features from encrypted images and retrieving similar images are carried out by the cloud server. The group index histograms of DCT coefficients are extracted from the encrypted big-blocks, and a global vector is then produced to represent the JPEG image with the aid of the bag-of-words (BOW) model. The security analysis and experimental results show that the proposed scheme has strong security and good retrieval performance.


Introduction
Along with the gradual maturity and wide application of multimedia technology and imaging devices, a vast number of HD images are produced by enterprises and individuals every day. As local users usually do not have enough storage for these images, they choose to outsource them to cloud servers. The rise of cloud computing in the IT industry has drawn local users' attention to cloud products such as AWS, Microsoft Azure, Google Cloud Service, Baidu Net disk and Ding Net disk, which offer rich computing and storage resources.
While people benefit from cloud services, they also face the hazard of privacy leakage or network violence. Outsourcing privacy-rich images to the cloud means that the images may be attacked. Recently, a technology news website reported that thousands of private Zoom videos had been exposed on public websites [1]. Bloomberg reported that a Google cloud server leaked the information of 120 million people, which alerted the public to the importance of information security [2]. Such events indicate that the security of cloud servers needs to be improved. The most common measure is to encrypt the images before uploading them to the cloud. However, encrypting the images affects image processing, including image retrieval. In addition, it is not easy to extract effective features while maintaining good security.
In general, searchable image encryption methods are of two types: feature-encryption based schemes and image-encryption based schemes. In the first type, image features are extracted before image encryption, which increases the computing cost of local users. In image-encryption based schemes, when image owners want to retrieve similar images, they only need to encrypt the images and upload them to the cloud. Feature extraction, index building and image search are done by the cloud server, reducing the user's workload.
Contribution: We design an outsourcing secure JPEG image retrieval scheme in this paper. This format-compatible scheme achieves good retrieval performance and strong security without causing file size expansion. The contributions are as follows:
1) A specifically designed image encryption method is presented, consisting of four steps, i.e., VLI binary code encryption, quantization tables encryption, inter-big-block permutation and intra-big-block permutation. This encryption method protects the image content well, supports the efficient extraction of image features, keeps the format compatible and causes no file size expansion. In this way, image owners only need to encrypt the image and upload it, while other work such as feature extraction and search is left to the cloud, which greatly relieves the image owners.
2) The local histograms of the group index of DCT coefficients can be directly calculated as local feature vectors by the cloud server, without any communication with the image owner. With these local feature vectors, the cloud server can generate a global feature vector for each image using the bag-of-words (BOW) model. Then, the similarity between images can be measured by calculating the Manhattan distance between the global feature vectors. The experimental results show that our scheme reaches better retrieval performance than the state-of-the-art schemes, not only in the JPEG domain but also in the spatial domain.

Related Works
Content-Based Image Retrieval (CBIR) liberates people from the tedious work of using text annotation to retrieve images. CBIR schemes search for similar images by calculating the similarities between image features and achieve high retrieval accuracy [3][4]. However, CBIR schemes in the plaintext domain cannot avoid privacy disclosure, so researchers have recently proposed many CBIR techniques in the ciphertext domain. The existing privacy-preserving CBIR (PPCBIR) schemes are of two types: the feature-encryption based schemes and the image-encryption based schemes.
In the first type (the feature-encryption based image retrieval schemes), image owners need to extract the features and encrypt both the images and their features before uploading them to the cloud. Lu et al. [5] made the first attempt at image retrieval in the encrypted domain based on CBIR technologies. They used order-preserving encryption and hash functions to avoid the privacy disclosure caused by the cloud server analyzing the image content. A secure indexing framework was also developed to ensure good search results. In [6], Lu et al. proposed three encryption methods, all of which can avoid the disclosure of image privacy. The disadvantage is that the image retrieval accuracy is not high enough when these protection methods are used. To improve the retrieval accuracy, a homomorphic encryption method was proposed in [7] to protect the image features. This encryption method achieves good retrieval accuracy and security, but increases the communication cost between the image owner and the cloud server. In [8], a large-scale encrypted image retrieval scheme was proposed by Weng. The scheme utilized robust hash values with certain omissions to improve the security, and the level of security can vary with different policies. In [9], Xia et al. used the earth mover's distance to calculate the similarity between SIFT features, with locality-sensitive hashing used to reduce the time complexity. In [10], Xia et al. used MPEG-7 descriptors to represent the images. A secure KNN method and watermarking were jointly used to keep the images safe, and locality-sensitive hashing was utilized to improve the efficiency of the retrieval system. In [11], Yuan et al. also utilized secure KNN to avoid revealing sensitive information, and they built a tree index to increase the search efficiency. They attempted to outsource the tree index to the cloud, but this places a heavier communication burden on the image owner.
The feature extraction of the above schemes is done by the image owner. To reduce the workload of image owners, many image-encryption based schemes have been proposed recently. The image owner only needs to encrypt the image and upload it, and the task of feature extraction is left to the cloud server. It is therefore very important to design a simple and effective encryption method that still allows features to be extracted from the encrypted images. Xu et al. [12] presented a privacy-preserving image search scheme using orthogonal decomposition. The AES algorithm is applied to encrypt the AC coefficients, which protects the image content, and the rest of the information is used for similarity calculation. Bernardo et al. [13] designed a novel image encryption framework to secure the CBIR service. They protect the image privacy with pixel value substitution and encryption, and global color histograms are used for the encrypted image retrieval. In [14], Xia put forward a novel scheme using three encryption methods together to protect the privacy of image content. Benefiting from the BOW model, the image search accuracy was improved with the help of visual words produced by clustering. However, image encryption in the spatial domain destroys the correlation between pixels and results in inefficient compression, which is not conducive to the storage and transmission of encrypted images. In [15], Zhang et al. encrypted the image during the JPEG compression process using stream encryption, which guarantees format compatibility and does not affect the compression of the encrypted images. Cheng et al. used a Markov model and SVM in [16], and utilized the statistics of run-length and value pairs in each 8 × 8 block in [17] for image retrieval. The schemes in [15][16][17] all encrypt in the JPEG domain, so they achieve good compression performance for encrypted JPEG images. However, these schemes directly extract global Markov features for image searching, so their retrieval accuracy is not very high.
We design a novel secure JPEG image retrieval framework in this paper. Stream encryption and random scrambling are applied together to avoid privacy leaks, while keeping the format compatible and causing no file expansion. The group index histograms of DCT coefficients are extracted from the encrypted big-blocks as local features by the cloud server, and the BOW model is used to improve the retrieval accuracy.

JPEG Compression
The image encryption method proposed in this paper is carried out in the process of JPEG compression. Because of the high compression ratio and good image quality, JPEG becomes popular for image storing and transmitting [18]. Here we briefly introduce the process of JPEG image compression.

1. Color space transformation. First of all, the image is transformed from the RGB color space into the YUV color space, which represents the image through luminance and chrominance components.
Steps 2 to 4 cover the downsampling of the chrominance components, the DCT of each 8 × 8 block, and quantization, which yield the quantized DCT coefficients.
5. Intermediate encoding. The quantized coefficients are scanned, generating a one-dimensional vector in each 8 × 8 block. The difference between the quantized DC coefficient of the current block and that of the previous block is calculated. The quantized AC coefficients up to the last nonzero coefficient are encoded as run-length and value pairs.
6. Entropy encoding. According to Tab. 1, a group index and a VLI binary code can be produced from the value in each pair. As a result, the quantized DCT coefficients are encoded as triplets of run-length, group index, and VLI binary code. The run-length and the group index are later encoded into Huffman binary code according to the Huffman tables.
In the end, the binary streams converted from the quantized DCT coefficients and other image information compose the JPEG file, as shown in Fig. 1. SOI and EOI are the start and end markers of a JPEG image file. The header section contains information such as the size of the image. In this scheme, we bundle non-overlapping 8 × 8 image blocks into big-blocks.
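The mapping from a coefficient value to its group index and VLI binary code can be sketched as follows. This is a minimal illustration in Python, assuming that Tab. 1 follows the usual baseline JPEG convention (the group index is the bit length of the absolute value, and negative values are stored in ones' complement form). The function names are illustrative and do not come from the paper.

```python
def group_index(value: int) -> int:
    """Group index (size category): number of bits needed for |value|."""
    return abs(value).bit_length()


def vli_bits(value: int) -> str:
    """VLI bit string of a coefficient value (empty string for 0)."""
    size = group_index(value)
    if size == 0:
        return ""
    # Positive values are written as-is; negative values as value + 2**size - 1.
    code = value if value > 0 else value + (1 << size) - 1
    return format(code, f"0{size}b")


def vli_value(bits: str) -> int:
    """Inverse of vli_bits: recover the coefficient value from its bit string."""
    if not bits:
        return 0
    code = int(bits, 2)
    # A leading 1 marks a positive value, a leading 0 a negative one.
    return code if bits[0] == "1" else code - (1 << len(bits)) + 1
```

Under this convention the length of the VLI bit string always equals the group index, which is the quantity the later feature extraction relies on.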

Figure 1: The structure of JPEG image files

Bag-of-Words Model
We use feature vectors to represent the image, and calculate similarities between images through the distance between the feature vectors. The BOW model consists of three steps:
1. Local feature generation. There are many kinds of local features for image retrieval in the plaintext domain. In our scheme, we extract the histogram of the VLI code length (i.e., the group index) at different JPEG frequency positions in each big-block as the local feature.
2. Vocabulary construction. All the local features extracted from the images are clustered, and the cluster centers are defined as visual words. In this scheme, we use k-means for clustering.
3. Global feature generation. Finally, every local feature is represented by its nearest visual word. In this way, each image can be represented by a histogram of visual words, which is the global feature used for retrieval. In addition, normalization eliminates the influence of image size.

System Model
There are two roles in the system model of this paper: the image owner and the cloud server, as shown in Fig. 2. First, the image owner encrypts the images before uploading them. After receiving the encrypted images, the cloud server stores them and extracts their features. When the image owner wants to retrieve similar images, the query image is encrypted and uploaded to the cloud server. The feature of the query image is also extracted by the cloud server and matched against the features of the images stored in the cloud. Eventually, the images similar to the query image are returned to the image owner for decryption. In the proposed scheme, we present a set of image encryption methods to prevent the disclosure of image information. In addition, we design a local feature extraction method performed by the cloud server. The local features of the encrypted images are then clustered in the BOW model to generate global features for better retrieval accuracy.

Image Encryption
In order to prevent the disclosure of image information, we present a four-step encryption method for the image owner, i.e., VLI binary code encryption, quantization tables encryption, inter-big-block permutation, and intra-big-block permutation. As a preparation, we produce the secret keys for the later encryption process.

Secret Key Production
In the initial stage, the image owner holds a secret key. Next, the image owner utilizes a pseudorandom function and a pseudorandom permutation generator to generate the encryption keys from this secret key and the image's identity. The sizes of the generated keys are determined by the VLI binary code, the number of big-blocks, and the number of 8 × 8 blocks.
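The key production step can be illustrated with a short Python sketch. The paper does not name a concrete pseudorandom function or permutation generator, so everything below is an assumption made for illustration: HMAC-SHA256 in counter mode stands in for the pseudorandom function, a seeded shuffle stands in for the permutation generator, and the names `prf_bits`, `prp_order` and the label format are hypothetical.

```python
import hashlib
import hmac
import random


def prf_bits(master_key: bytes, label: bytes, n_bits: int) -> list[int]:
    """Expand HMAC-SHA256(master_key, label || counter) into n_bits key-stream bits."""
    out: list[int] = []
    counter = 0
    while len(out) < n_bits:
        block = hmac.new(master_key, label + counter.to_bytes(4, "big"),
                         hashlib.sha256).digest()
        for byte in block:
            for shift in range(7, -1, -1):
                out.append((byte >> shift) & 1)
        counter += 1
    return out[:n_bits]


def prp_order(master_key: bytes, label: bytes, n_items: int) -> list[int]:
    """Derive a keyed ordering of range(n_items)."""
    seed = hmac.new(master_key, label, hashlib.sha256).digest()
    order = list(range(n_items))
    # random.Random is NOT cryptographically secure; it only stands in
    # for a real pseudorandom permutation generator in this sketch.
    random.Random(seed).shuffle(order)
    return order
```

Including the image identity in `label` (for example `b"img-0001|vli"`) gives each image its own key stream and permutations, which matches the per-image keys assumed in the security analysis.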

VLI Binary Code Encryption
As mentioned in 3.1, by decoding the JPEG image, the VLI binary code can be extracted from each of the three components. Each VLI binary code is then encrypted by an XOR operation with key-stream bits of the same length, so the length of every VLI binary code is unchanged.
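A minimal Python sketch of this step is given below, assuming the VLI codes are held as bit strings and the key stream as a list of bits consumed in order. The function name and data layout are illustrative assumptions, not the paper's implementation.

```python
def encrypt_vli_codes(codes: list[str], key_stream: list[int]) -> list[str]:
    """XOR each VLI bit string with the next key-stream bits of the same length."""
    total = sum(len(code) for code in codes)
    if len(key_stream) < total:
        raise ValueError("key stream is shorter than the total VLI length")
    encrypted = []
    pos = 0
    for code in codes:
        chunk = key_stream[pos:pos + len(code)]
        encrypted.append("".join(str(int(bit) ^ k) for bit, k in zip(code, chunk)))
        pos += len(code)
    return encrypted
```

Because XOR is its own inverse, calling the same function on the encrypted codes with the same key stream restores the originals. In baseline JPEG every bit string of a given length is a valid VLI code of that group, so the group index of each coefficient survives the encryption.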

Quantization Tables Encryption
As described in 3.1, the two quantization tables in the JPEG header can be obtained by decoding. We encrypt the quantization tables by stream encryption to prevent privacy leakage. It is worth noting that the encryption of the quantization tables does not affect the feature extraction.

Big-Block Permutation
In this scheme, to protect the image privacy while supporting feature extraction, we assemble adjacent 8 × 8 blocks into big-blocks in each component for permutation, as shown in Fig. 3. Due to the downsampling in the JPEG compression process, the U and V components are half the size of the Y component in both height and width. Therefore, the big-blocks in the U and V components are also half the size of those in the Y component in both height and width. The big-blocks of each component are then reordered by a pseudorandom permutation generated from the secret key.

8 × 8 Block Permutation
In order to further protect the image content, we permute the order of the 8 × 8 blocks within each big-block in each component, again using a pseudorandom permutation generated from the secret key. The image owner encrypts all of his images by the above steps. Then, the encrypted image set can be uploaded to the cloud server.
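The two permutation steps can be sketched with NumPy as follows. The layout is an assumption made for illustration: one component is stored as an array of shape (n, m, 64), i.e., n big-blocks, each holding m blocks of 64 coefficients. The permutations are passed in as index arrays and would in practice be derived from the secret key.

```python
import numpy as np


def permute_bigblocks(blocks: np.ndarray, order) -> np.ndarray:
    """Inter-big-block permutation: reorder the n big-blocks of one component."""
    return blocks[np.asarray(order)]


def permute_within_bigblocks(blocks: np.ndarray, orders) -> np.ndarray:
    """Intra-big-block permutation: reorder the m blocks inside every big-block.

    orders has shape (n, m); row i is the ordering applied to big-block i.
    """
    orders = np.asarray(orders)
    return np.take_along_axis(blocks, orders[:, :, None], axis=1)
```

Decryption applies the inverse orderings in reverse sequence, which can be obtained with `np.argsort(orders, axis=1)` and `np.argsort(order)`. Both steps only move whole 8 × 8 blocks, so the set of blocks inside each big-block is unchanged.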

Image Feature Extraction
In our proposed scheme, feature extraction and retrieval are done by the cloud server, which greatly reduces the burden on the image owner. The feature extraction method in this scheme contains four steps. First, we preprocess the image data. Local histogram features are then calculated from the big-blocks. Next, the local histogram features are clustered into visual words using the k-means algorithm. Finally, every image produces a normalized occurrence histogram of visual words to represent the image.

Data Preprocessing
Even after the previous four steps of encryption by the image owner, the group index histogram of each big-block of the encrypted JPEG image at different frequency positions is the same as that of the corresponding big-block of the plaintext image. This invariant information can be used for image search.
By decoding the encrypted JPEG image, we can get the quantized DCT coefficients in the Y, U, and V components. Then, according to Tab. 1, we convert the three components into three group index matrices. Finally, we truncate the values of the three group index matrices according to their probability distribution to reduce redundancy.

Local Feature Generation
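We extract the group index histograms at different frequency positions in each component to represent local features. For one component of a big-block, the local feature is calculated as

$h(t, g) = \sum_{j=1}^{m} \delta\big(G_j(t) = g\big), \quad t = 1, \dots, 64, \quad g = 0, \dots, T,$

where $m$ is the number of 8 × 8 blocks in the big-block, $G_j(t)$ is the truncated group index at frequency position $t$ of the $j$-th block, $T$ is the truncation parameter, and $\delta(x) = 1$ if $x$ holds, else $\delta(x) = 0$.
For each big-block, we concatenate the features of the three components to generate a feature vector with 64 × 3 × (1 + T) elements, which is defined as the local feature. We set the truncation parameter T = 8 in this scheme.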
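The local feature computation can be sketched in Python with NumPy. This is a hedged illustration: it assumes that truncation simply caps the group index at the truncation parameter (named `T` here) and that the coefficients of one component of a big-block are stored as an (m, 64) array. The function names are hypothetical.

```python
import numpy as np


def truncated_group_index(coeffs, T: int = 8) -> np.ndarray:
    """Group index (bit length of |value|) of each coefficient, capped at T."""
    mag = np.abs(np.asarray(coeffs, dtype=np.int64))
    g = np.zeros(mag.shape, dtype=np.int64)
    for k in range(T):
        # Counting the thresholds 2**k reached by |value| gives min(bit length, T).
        g += mag >= (1 << k)
    return g


def local_feature(bigblock_coeffs, T: int = 8) -> np.ndarray:
    """Histogram of truncated group indices at each of the 64 frequency positions.

    bigblock_coeffs: (m, 64) quantized coefficients of one component.
    Returns a vector with 64 * (T + 1) counts.
    """
    g = truncated_group_index(bigblock_coeffs, T)
    hist = np.zeros((64, T + 1), dtype=np.int64)
    for t in range(64):
        hist[t] = np.bincount(g[:, t], minlength=T + 1)
    return hist.ravel()
```

Concatenating `local_feature` over the Y, U, and V components of a big-block gives 64 × 3 × (1 + T) = 1728 elements for T = 8. Reordering the rows of `bigblock_coeffs` leaves the result unchanged, which is why the intra-big-block permutation does not disturb the feature.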

Vocabulary Generation
In this way, each encrypted image can be represented by a set of local features. We cluster all of the local features in the whole image database and produce k cluster centers, where k is a tunable parameter. We define the k cluster centers as the visual words.
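Vocabulary generation can be illustrated with a plain Lloyd-style k-means in NumPy. This is only a sketch under simple assumptions (random initial centers drawn from the data, a fixed number of iterations, squared Euclidean distance). It is not the implementation used in the experiments, and it builds the full distance matrix in memory, so it suits small examples only.

```python
import numpy as np


def build_vocabulary(features, k: int, n_iter: int = 20, seed: int = 0) -> np.ndarray:
    """Cluster local features into k visual words with a basic k-means loop."""
    features = np.asarray(features, dtype=np.float64)
    rng = np.random.default_rng(seed)
    start = rng.choice(len(features), size=k, replace=False)
    centers = features[start].copy()
    for _ in range(n_iter):
        # Squared Euclidean distance from every feature to every center: (n, k).
        dist = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members):  # keep the old center if a cluster becomes empty
                centers[j] = members.mean(axis=0)
    return centers
```

For a database-scale vocabulary (the experiments use up to thousands of visual words), a mini-batch or library implementation of k-means would be the more practical choice.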

Histogram Calculation
Given the vocabulary, each local feature of an image is replaced by its nearest visual word. Each image then produces a global feature by calculating the normalized occurrence histogram of the visual words.
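The histogram step can be sketched as follows in Python with NumPy. Two details are assumptions made for illustration: the nearest visual word is found by squared Euclidean distance, and the histogram is normalized to sum to one. The function name is hypothetical.

```python
import numpy as np


def global_feature(local_features, vocabulary) -> np.ndarray:
    """Normalized occurrence histogram of the nearest visual words of one image."""
    local_features = np.asarray(local_features, dtype=np.float64)
    vocabulary = np.asarray(vocabulary, dtype=np.float64)
    if len(local_features) == 0:
        raise ValueError("an image needs at least one local feature")
    # Squared Euclidean distance from every local feature to every visual word.
    dist = ((local_features[:, None, :] - vocabulary[None, :, :]) ** 2).sum(axis=2)
    words = dist.argmin(axis=1)
    hist = np.bincount(words, minlength=len(vocabulary)).astype(np.float64)
    return hist / hist.sum()
```

Normalizing by the number of local features makes images with different numbers of big-blocks comparable, in line with the remark that normalization removes the influence of image size.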

Image Search
The search for similar images is outsourced to the cloud server without any communication with the image owner. After receiving a query image encrypted with the same four methods as described in Section 4.1, the cloud server extracts the local features of the encrypted query image through the methods described in Section 4.2, and generates a global feature according to the visual words of the image set. Finally, the global feature of the encrypted query image is compared with the other global features in the image dataset by calculating the Manhattan distance:

$D(F_q, F_i) = \sum_{j=1}^{k} \left| F_q(j) - F_i(j) \right|,$

where $F_q$ and $F_i$ represent the global features of the query image and of the $i$-th encrypted image in the database, respectively, both of which are k-dimensional vectors. The smaller the Manhattan distance between the global features, the more similar the corresponding images. In this way, a certain number of the most similar images are sent to the image owner, who decrypts them using the secret key.
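The ranking step on the cloud server can be sketched in a few lines of Python with NumPy. The function and argument names are illustrative, and the database is assumed to be a matrix with one k-dimensional global feature per row.

```python
import numpy as np


def rank_by_manhattan(query, database, top_n: int):
    """Return the indices and distances of the top_n closest database features."""
    query = np.asarray(query, dtype=np.float64)
    database = np.asarray(database, dtype=np.float64)
    # Manhattan (L1) distance between the query and every database row.
    dists = np.abs(database - query).sum(axis=1)
    order = np.argsort(dists, kind="stable")[:top_n]
    return order, dists[order]
```

The server would return the encrypted images at the indices in `order`. Only encrypted data and the derived histograms are involved in this computation.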

Security Analysis
In our scheme, the cloud server completes the tedious work of feature extraction and image retrieval. The honest-but-curious cloud server, which is the only potential adversary, might analyze the contents of the encrypted images to gain private information. We analyze the security of the proposed scheme under the ciphertext-only attack (COA) model. As each image corresponds to a unique key, the only threat is a brute-force attack.
Summary of information leakage. To support feature extraction and the comparison of encrypted images, the cloud server learns some information about the encrypted image, such as the image size, the size of the big-blocks, the number of big-blocks, the length of the VLI binary code and the size of the quantization tables.
Theorem 1. Against an honest-but-curious probabilistic polynomial time (PPT) adversary under the COA model, the encrypted image is computationally secure, and the security strength is determined by the total length of the VLI binary code, the lengths of the two quantization tables, the size of the big-blocks, and the number of big-blocks in an image.
Proof. Because different encrypted images correspond to different keys, the only threat is a brute-force attack, so the security strength depends on the total length of the secret keys. First, the XOR operation is applied to the VLI binary code, so this part contributes the total length of the VLI binary code. Second, the XOR operation is applied to the two quantization tables, which contributes the sum of their lengths. Third, the inter-big-block permutation is applied; its complexity is the base-2 logarithm of the number of possible orderings of the big-blocks. Finally, the intra-big-block permutation is applied; its complexity is the corresponding logarithmic term for the 8 × 8 blocks inside a big-block, accumulated over the three components and all big-blocks. These four parts add up to the security strength.
End of Proof.
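To get a feel for the magnitudes involved, the four parts of the proof can be added up numerically. The Python sketch below rests on stated assumptions: the strength is counted in bits, each stream-encrypted part contributes its length, and each permutation contributes the base-2 logarithm of its number of possible orderings. The function and parameter names are hypothetical and the exact closed form in the theorem may differ.

```python
import math


def log2_factorial(n: int) -> float:
    """log2(n!) computed through the log-gamma function to avoid huge integers."""
    return math.lgamma(n + 1) / math.log(2)


def strength_bits(vli_len: int, qtable_lens, n_bigblocks: int,
                  blocks_per_bigblock) -> float:
    """Approximate brute-force search space in bits (illustrative only).

    blocks_per_bigblock: number of 8x8 blocks in one big-block, per component.
    """
    stream = vli_len + sum(qtable_lens)
    inter = log2_factorial(n_bigblocks)
    intra = n_bigblocks * sum(log2_factorial(m) for m in blocks_per_bigblock)
    return stream + inter + intra
```

Even for small images the stream-encryption terms dominate, since the VLI binary code of a typical JPEG file is many thousands of bits long.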

Evaluation of Experimental Results
In order to evaluate the performance of the proposed scheme, we carry out experiments and analyze the results in four aspects, i.e., image encryption effectiveness, time consumption, retrieval accuracy, and expansion of file size. All tests are run on the Inria Holidays dataset [19], which consists of 1491 color images of different sizes in 500 classes. The Inria Holidays dataset provides a Python evaluation package to calculate mAP and has been used by many image retrieval schemes, which facilitates a fair comparison of retrieval accuracy. We implement the scheme in MATLAB R2018b on an Ubuntu 18 system with 64 GB of RAM.

Image Encryption Effectiveness
To protect the privacy of users' images, four encryption steps are applied in the proposed scheme. Fig. 5 shows the visual effect of each step alone and of their combination. The VLI binary code encryption (b) and the inter-big-block permutation (d) protect the image content well, whereas the quantization tables encryption (c) only disturbs the color information, and the intra-big-block permutation (e) alone easily leaks the image content. In sum, the image content is well protected by the combination of the four encryption steps (f).

Time Consumptions
The time consumption of each step is recorded here: image encryption (Tab. 2), local feature generation (Tab. 3), k-means clustering (Tab. 4), global feature production (Tab. 5), and search (Tab. 6).
It can be seen that the time spent on each step of the experiment is modest. Local feature generation and clustering are performed automatically by the cloud server, so they do not affect the user.

Retrieval Accuracy
We use mAP (mean average precision) to measure the accuracy of image retrieval, with the aid of the Python evaluation package of the Inria Holidays image set. We test the mAP values on the Y, U, and V components individually and jointly. We also vary parameters such as the number of cluster centers (k) and the size of the big-block. As shown in Tab. 7, the highest mAP is reached when k is set to 3000 and the big-block size is set to 64 or 96. In addition, we compare our mAP values with those of other schemes, as shown in Tab. 8.
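For reference, mean average precision can be computed as in the Python sketch below. It uses the common non-interpolated definition of average precision. The official Inria Holidays evaluation package may differ in details, so this is an illustration of the metric rather than a reimplementation of that package, and the function names are hypothetical.

```python
def average_precision(ranked_ids, relevant_ids) -> float:
    """Non-interpolated average precision of one ranked result list."""
    relevant = set(relevant_ids)
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, item in enumerate(ranked_ids, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank  # precision at this recall point
    return total / len(relevant)


def mean_average_precision(rankings: dict, ground_truth: dict) -> float:
    """Mean of the average precision over all queries."""
    scores = [average_precision(rankings[q], ground_truth[q]) for q in rankings]
    return sum(scores) / len(scores)
```

Here `rankings` maps each query to its ranked list of returned image identifiers and `ground_truth` maps each query to the identifiers of its truly similar images.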

Expansion of File Size
All four encryption steps of this scheme operate on the JPEG bitstream, including VLI binary code encryption, inter-big-block permutation, quantization tables encryption, and intra-big-block permutation. Therefore, the storage size of the image is not changed by the encryption, which is confirmed by the data in Tab. 9.

Tab. 8 (partial): [12] 0.56040; IES [13] 0.54564; Cheng et al. [16] 0.54187; Cheng et al. [17] 0.36000
Tab. 9 (partial): [13] 20.02; BOEW-YUV [14] 17.97

Conclusions
In this paper, we propose a novel encrypted JPEG image retrieval framework, which can guarantee security, efficiency and accuracy. Stream encryption and scrambling encryption are used together to encrypt the images, ensuring format compatibility and causing no change in the image file size. Each big-block is represented as a local group-index histogram by the cloud server for similarity measurement. We also use the BOW model to cluster these local features and generate a global feature for each encrypted image, which yields good retrieval accuracy. In the proposed framework, the cloud server provides image storage, feature generation, and image retrieval, reducing the operations required of the image owner. In future work, it would be valuable to use stronger encryption methods to improve security while preserving the retrieval accuracy.