## Hyunjo Lee* , Youngho Song* and Jae-Woo Chang*## |

ID | Range of segments |

1 | 0 ≤ x < 0.74 |

2 | 0.74 ≤ x < 2.475 |

3 | 2.475 ≤ x < 3.808 |

4 | 3.808 ≤ x < 5.543 |

5 | 5.543 ≤ x < 7.119 |

6 | 7.119 ≤ x < 8.526 |

7 | 8.526 ≤ x < 10.32 |

8 | 10.324 ≤ x < 11.730 |

9 | 11.730 ≤ x < 4π |

To improve the performance of the exact matching query, we propose a periodic function based transformation algorithm by using the Eq. (4). By analyzing the data to be encrypted, we select a periodic function, i.e., *E*, such that *E(x _{1}) ≠ E(x_{2})* when

We propose a signature generation algorithm to improve the query processing performance. Our signature generation algorithm has three parts. (i) It divides the range of group transformation, segment transformation, and function transformation values into *2 ^{n}2^{m}* and

For the segment transformation values, we can make segments by differentiating the Eq. (6) and set the signature values for the segments as shown in Fig. 7(a). For the function transformation values, we divide the signature areas and set their signature values as shown in Fig. 7(b). For example, assume that the transformation values *{Group(x), Segment(x), Periodic(x)}* for the original data x are 324, 321, and 0.654, respectively. If the size of group for the original data is 70, the value 324 is included in the fifth data group. So we can set the signature of the group as 00000101 which means 5. And by using the Fig. 7, we can set the signature of 321 to 0011 while setting that of 0.654 to 1101. So by concatenating all signatures, the final encrypted signature for *x* is 0000010100111101.

In this section, we present the performance comparison of our GOPES with the existing POPIS [9], with respect to the degree of data privacy and query processing time. We implement our GOPES by using the periodic function shown in Eq. (6). For the encrypted query processing, all the datasets containing {key, value} pairs have to be encrypted. Here, in order to enforce the data security for keys and values, we use two different types of data encryption schemes, i.e., order-preserving encryption scheme (our GOPES) and Advanced Encryption Standard (AES) [10], respectively. The flow of the encrypted query processing is as follows. At first, a date owner encrypts the whole data with two schemes, i.e., keys with GOPES and values with AES-256. Then the data owner sends the encrypted {key, value} pairs to a service provider. Second, an authenticated user encrypts his/her query and issues it to the service provider. Third, the service provider processes the encrypted query without any decryption. Then the service provider returns the candidate result set to the query issuer. Finally, the user decrypts the received candidates and obtains the final result. For performance analysis, an exact matching query and a range query are used. We generate a dataset using the US Census data [11], containing 2 GB. Since our dataset includes sensitive attributes, like age, incomes, and jobs, we encrypt the data of sensitive attributes by using the POPIS and our GOPES. We also use 100 queries being randomly generated from our dataset. Table 2 shows our experimental environment.

Table 2.

Item | Spec |

CPU | Intel Core2 Quad, Q6600 2.4 GHz |

Memory | 2 GB |

OS | Windows 7 Enterprise K |

Compiler | Microsoft Visual Studio 2010 |

To evaluate data privacy against order matching attacks, we use two encryption functions as shown in Fig. 8. Here, the size of data group in each encryption function is 10. The encryption function for OPES have a cycle of 4π and the size of encryption group is 10. Thus, if min(*g _{i}*) < min(

When we use two encryption functions in Fig. 8, Fig. 9(a) shows the data distributions of the original data, the encrypted data using POPIS, and the encrypted one using our GOPES. Specially, Fig. 9(b) shows the detailed data distribution of the encrypted data using our GOPES. Here, the range of the original dataset is 0 ≤ *x* ≤ 100.

For our experiment, we measure the probability of data exposure against the order matching attack as follows.

1. An attacker obtains all the data of a specific column, containing sensitive data in the original database. An attacker also obtains all the encrypted data corresponding to all the data of the specific column.

2. Assuming that the encryption function is *E(x)*, the ith original data is Orii and the ith encrypted data is Enci. We sort both the original data and the encrypted data and make a pair of <Orii, Enci> based on their sorted order.

3. We count the number of pairs being correctly matched. A correctly matched pair is a pair of <Ori* _{i}*, Enc

4. For all correctly matched pairs, we measure the number of the indistinguishable pairs that satisfy both Ori* _{i}* ≠ Ori

5. We finally calculate the probability of data exposure against the order matching attack by using both the number of the correctly matched pairs and that of the indistinguishable pairs.

For example, there are two original data A and B that have the same encryption value *E(A)*. In this case, the attacker cannot clearly make a decision on whether *E(A)* is generated by using the original data A or not. As a result, the probability of data exposure is decreased while the number of the indistinguishable pairs increases. Table 3 shows the result of the order matching attack. Since the order of the encrypted data in POPIS is the same as that of the original data, the ratio of the number of the correctly matched pairs over the total number of pairs is 100% and the average number of the indistinguishable data pairs is 0. Thus, POPIS is weak to the order matching attack because the probability of data exposure against the order matching attack is 100%. However, in our GOPES, the ratio of the number of the correctly matched pairs over the total number of pairs is reduced to 31.16%. This is because our GOPES preserves the order of each data group, but not each data, by using the group transformation. In addition, our GOPES shuffles the order of the encrypted data based on the function transformation. Since our GOPES sets the same encryption value for the original data in the same segment by using our segment transformation, the average number of the indistinguishable data pairs is increased to 2.56. Since our GOPES shows the 12.18% probability of data exposure against the order matching attack, our GOPES outperforms the POPIS in terms of the data privacy against the order matching attack.

Table 3.

Encryption scheme | Ratio of matched pairs over total number of pairs (%) | Average number of the indistinguishable pairs | Probability of data exposure against order matching attack (%) |

POPIS | 100 | 0 | 100 |

GOPES | 31.16 | 2.56 | 12.18 |

To evaluate data privacy against data count attacks, if an attacker can infer the range of the encrypted data for an original data *x*, we count the number of the encrypted data falling into the range as the exposure of the original data. Using data counting, Fig. 10 shows the data frequency of the original data, the encrypted data using POPIS, and the encrypted one using our GOPES. In Fig. 10, it is shown that the frequency of the original data is very similar to that of the encrypted data using POPIS. To analyze the similarity between the frequency of the original data and that of the encrypted data, we measure the t-test by using Eq. (7). Here, *x _{i}* means the average value of dataset

If t is greater than the *p*-value, the frequency of the encrypted data is different from that of the original data. Otherwise, the frequency of the encrypted data is the same as that of the original data. Since the *t* statistic and the *p*-value are 0.19 and 0.85 in POPIS, the data frequency of the encrypted data using POPIS is the same as that of the original data. However, since the *t* statistic and the *p*-value are 6.441 and 2.52×10^{-8} in GOPES, the frequency of the encrypted data using GOPES is different from that of the original data. This is because our GOPES generates the same encryption value for the original data in the same segment, by using our segment transformation algorithm. As a result, our GOPES outperforms the existing POPIS in terms of the data privacy against the data count attack.

Table 4.

Encryption scheme | Comparator | t statistics | p-value |

POPIS | Original data | 0.19 | 0.85 |

GOPES | Original data | 6.441 | 2.52×10^{-8} |

Fig. 11 shows the encrypted query processing times of both POPIS and GOPES. In case of the exact matching query, our GOPES requires 0.036 seconds for the query processing time while POPIS requires 0.037 seconds. In case of the range query, our GOPES requires 0.045 seconds for the query processing time while POPIS requires 0.048 seconds. There are two reasons why GOPES outperforms POPIS. First, GOPES can reduce the size of the candidate set to be transmitted by using both the periodic function and periodic interval values, while POPIS requires a large candidate set to be transmitted as the range of the noise is increased for high data protection. Second, GOPES can reduce the search time of the encrypted key by using bit operations, while POPIS requires long key search time by using numeric operations.

To support advanced user-customized services in cloud computing, companies build the analyzed data of users by using outsourced databases. But, the data of users contain sensitive personal information that may infringe the privacy of users. To protect the sensitive data, it is required to encrypt the database prior to outsourcing it to the service provider. However, it is inefficient to perform query processing on the encrypted data by using the existing data encryption schemes. To overcome this limitation, we proposed data encryption scheme, called GOPES, which can support efficient query processing over encrypted data. Since our GOPES can preserve the order of each data group by generating the signatures of the encrypted data, it can protect data from both order matching attacks and counting attacks. We also showed that our GOPES achieved a higher degree of data privacy protection than the existing POPIS.

This work was partly supported by Institute for Information & communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) (No. R0113-17-0005, Development of an Unified Data Engineering Technology for Large-scale Transaction Processing and Real-time Complex Analytics) and this work was also supported by Basic Science Research Program through the National Research Foundation of Korea (NRF) funded by the Ministry of Education, Science and Technology (No. 2016R1D1A3B03935298).

He is a researcher and adjunct instructor in the Chonbuk National University. He received the B.S., M.S., and Ph.D, degrees in Computer Engineering from Chonbuk National University in 2006, 2008, and 2014, respectively. His research interests include spatial network database, parallel query processing, and encrypted query processing

He is a professor in the Department of Information and Technology, Chonbuk National University, Korea from 1991. He received the B.S. degrees in Computer Engineering from Seoul National University in 1984. He received the M.S. and Ph.D. degrees in Computer Engineering from Korea Advanced Institute of Science and Technology (KAIST) in 1986 and 1991, respectively. During 1996–1997, he stayed in University of Minnesota for visiting scholar. And during 2003–2004, he worked for Penn State University (PSU) as a visiting professor. His research interests include sensor networks, spatial network database, context awareness and storage system.

- 1 H. Hacigumus, S. Mehrotra, B. Iyer, "Providing database as a service," in
*Proceedings 18th International Conference on Data Engineering*, San Jose, CA, 2002;pp. 29-38. doi:[[[10.1109/ICDE.2002.994695]]] - 2 C. Curino, E. P. Jones, R. A. Popa, N. Malviya, E. Wu, S. Madden, H. Balakrishnan, N. Zeldovich, "Relational cloud: a database-as-a-service for the cloud," in
*Proceedings of the 5th Biennial Conference on Innovative Data Systems Research*, Asilomar, CA, 2011;pp. 235-240. custom:[[[http://people.csail.mit.edu/nickolai/papers/curino-relcloud-cidr.pdf]]] - 3 R. Buyya, C. S. Yeo, S. Venugopal, J. Broberg, I. Brandic, "Cloud computing and emerging IT platforms: vision, hype, and reality for delivering computing as the 5th utility,"
*Future Generation Computer Systems, 2009*, vol. 25, no. 6, pp. 599-616. doi:[[[10.1016/j.future.2008.12.001]]] - 4 Y. Zhu, G. J. Ahn, H. Hu, S. S. Yau, H. G. An, C. J. Hu, "Dynamic audit services for outsourced storages in clouds,"
*IEEE Transactions on Services Computing, 2013*, vol. 6, no. 2, pp. 227-238. doi:[[[10.1109/TSC.2011.51]]] - 5 R. Agrawal, J. Kiernan, R. Srikant, Y. Xu, "Order preserving encryption for numeric data," in
*Proceedings of the 2004 ACM SIGMOD International Conference on Management of Data*, Paris, France, 2004;pp. 563-574. doi:[[[10.1145/1007568.1007632]]] - 6 L. Xiao, I. L. Yen, "Security analysis for order preserving encryption schemes," in
*Proceedings of 2012 46th Annual Conference on Information Sciences and Systems (CISS)*, Princeton, NJ, 2012;pp. 1-6. doi:[[[10.1109/CISS.2012.6310814]]] - 7
*L. Xiao, I. L. Yen, and D. T. Huynh, 2012;*, https://eprint.iacr.org/2012/192.pdf - 8 D. H. Yum, D. S. Kim, J. S. Kim, P. J. Lee, S. J. Hong, "Order-preserving encryption for non-uniformly distributed plaintexts,"
*in Information Security Applications. Heidelberg: Springer2011,*, pp. 84-97. doi:[[[10.1007/978-3-642-27890-7_7]]] - 9 D. Liu, S. Wang, "Programmable order-preserving secure index for encrypted database query," in
*Proceedings of 2012 IEEE 5th International Conference on Cloud Computing*, Honolulu, HI, 2012;pp. 502-509. doi:[[[10.1109/CLOUD.2012.65]]] - 10
*National Institute of Standards and Technology, 2001;*, https://csrc.nist.gov/publications/detail/fips/197/final - 11
*US Census Bureau (Online). Available:*, https://www.census.gov/