## Liming Zhou* and Yingzi Shan**## |

Slicing pieces (M) | 2 | 3 | … | R |
---|---|---|---|---|

Probability (P) | [TeX:] $$p_{1}$$ | [TeX:] $$p_{2}$$ | … | [TeX:] $$p_{R-1}$$ |

After slicing the private data, node keeps one of the pieces itself and encrypts the remaining pieces. The [TeX:] $$M-1$$ [TeX:] $$M-1$$ pieces are then transmitted to the neighbor nodes by node i with h hops. We assume that node [TeX:] $$i^{\prime}$$ s set of neighbor nodes is [TeX:] $$\Omega_{i} \text { and } h=1 . d_{i j}$$ indicates that the information is sent from node i to node j. If node i does not transmit any slice to node j, [TeX:] $$j, d_{i j}=0.$$ Therefore,

and the final aggregation result in the BS is expressed by

where [TeX:] $$d_{v}=0, \forall j \notin \Omega_{i}.$$ Fig. 2 shows the slicing phase.

Each leaf node slices its private data to neighbors and waits for some time until it receives all slicing messages from other nodes. After that, each leaf node uses the shared key with the sender to decrypt the received piece. Then the leaf nodes aggregate all the received pieces and its own remaining piece. Meanwhile, intermediate nodes mix the received pieces and their private data to generate a new result. For instance, [TeX:] $$g_{5}=d_{45}+d_{55}+d_{65}$$ shows that node 5 receives two pieces from neighbors and gets [TeX:] $$g_{5},$$ which includes the received messages [TeX:] $$$d_{45}$ and $d_{65}$$$ and its own remaining data [TeX:] $$d_{55}.$$ Other nodes do not transmit pieces to node 5. Fig. 3 shows the details of the mixing process.

Leaf nodes aggregate the received pieces into new results, which are encrypted. The encrypted data is then sent to the intermediate nodes, which then receive and decrypt the encrypted messages. They also generate the new results. The new results in the intermediate nodes consist of slicing data [TeX:] $$d_{i j}$$ received from their neighbor nodes, sensed data [TeX:] $$k_{i}$$ they collected, and mixing results [TeX:] $$g_{i}.$$ The results are then sent to their parents by the intermediate nodes until the results are sent to the base station. Fig. 4 shows the aggregation process.

In the HEEPP scheme, the intermediate nodes need to wait a long time to receive results from their child nodes with the data query mechanism. In order to decrease the waiting time, we improve and present a data confirmation mechanism. When an intermediate child node completely transmits the aggregation results to its parent node, it can take the initiative to send a confirmation message. Upon receiving a confirmation message, an intermediate node does not wait for the intermediate child node. If an intermediate node does not receive a confirmation message, it can use the data query mechanism. Therefore, unlike other schemes, the data confirmation mechanism can increase the success rate of data transmission and maintain the accuracy of the aggregation messages but can also decrease the waiting time to receive the aggregation results. Algorithm 1 describes the data aggregation phase of the EBPP scheme.

For simulation experiments, there are many simulation platforms such as TOSSIM [22], NS-3 [23], and J-Sim [24]. In this section, our simulation is based on TOSSIM. We randomly deploy 400 sensor nodes in a square area of 20 m × 20 m. Tables 2 and 3 show the probability of the number of slicing pieces based on the remaining energy in one node, where the values of R are presented as R = 3 and R = 5, respectively. According to the simulations, we compare the performance of SMART, EEHA, and HEEPP with the EBPP scheme in terms of communication overhead and privacy preservation.

For SMART, EEHA, HEEPP, and EBPP, the communication overhead is shown in Fig. 5 with R = 3 under different time intervals. Similar to Fig. 5, Fig. 6 shows the performance of different schemes with R = 5. According to the simulation results, the EBPP scheme sends fewer messages and saves more communication overhead than other schemes. In SMART, the original message is divided into pieces in a node. Then the rest of the R – 1 pieces are sent to the selected neighbors. We assume that the network has N nodes to collect information. Thus, there are [TeX:] $$N \cdot R$$ exchanged messages in the SMART scheme. Unlike the SMART scheme, there are different functions between the leaf nodes and the intermediate nodes in the remaining three schemes. We assume that the percentage of intermediate nodes is , so there are [TeX:] $$\rho \cdot N$$ intermediate nodes. In other words, the intermediate nodes send [TeX:] $$\rho \cdot N$$ messages to their parents. Thus, different schemes have different number of leaf nodes among the EEHA, HEEPP, and EBPP schemes. For EEHA, leaf nodes transmit [TeX:] $$(1-\rho) \cdot N \cdot R$$ pieces to neighbors. In the HEEPP scheme, a leaf node divides an original message into M pieces with probability [TeX:] $$p_{M}.$$ As such, we assume that expectation [TeX:] $$E(M) \quad \text { is } \quad \sum_{M=2}^{R} M \cdot p_{M-1}.$$ Therefore, leaf nodes send [TeX:] $$(1-\rho) \cdot N \cdot \sum_{1 / L=2}^{R} M \cdot p_{M-1}$$ pieces to neighbors, where [TeX:] $$\sum_{M=2}^{R} M \cdot p_{\mathcal{M}-1}=E(M).$$ Compared to the HEEPP scheme, a leaf node divides an original message into M pieces with probability [TeX:] $$p_{\mathbf{M}}^{*}$$ based on the remaining energy in the EBPP scheme. Thus, we assume that expectation [TeX:] $$E^{*}(M) \text { is } \sum_{N=2}^{R} M \cdot p_{M-1}^{*}.$$ As such, leaf nodes send [TeX:] $$(1-\rho) \cdot N \cdot \sum_{M=2}^{R} M \cdot p_{M-1}^{*}$$ pieces to neighbors, where [TeX:] $$\sum_{M=2}^{R} M \cdot p_{M-1}^{*}=E^{*}(M).$$ Therefore, the communication overhead of four schemes is given by

From [6], the relationship among the SMART, EEHA, and HEEPP schemes in terms of communication overhead is [TeX:] $$\mathrm{HEEPP}<\mathrm{EEHA}<\mathrm{SMART} .$$ Let r be the ratio of communication overhead between HEEPP and EBPP, which can be given by

For the HEEPP scheme, each node slices the fixed number of pieces. According to the remaining energy, however, each node decides the number of slicing data with probability [TeX:] $$p_{i}$$ in EBPP. When a leaf node has low remaining energy, the probability of few slicing data increases. Thus, expectation [TeX:] $$E^{*}(M)$$ in EBPP is smaller than expectation [TeX:] $$E(M)$$ in HEEPP. As such, the value of r is less than 1 [TeX:] $$(r<1).$$ In other words, EBPP incurs less communication overhead than HEEPP. For example, in HEEPP, the number of slicing data maintains uniform distribution. When the value of R is 5, we assume that the probability of uniform distribution is 0.25. For the EBPP scheme, the parameters are shown in Table 3. As such, the expectations are shown as

[TeX:] $$\begin{aligned} <E(M)=0.25 \times 2+0.25 \times 3+0.25 \times 4+0.25 \times 5=3.5\\ <E^{*}(M)=0.35 \times 2+0.3 \times 3+0.2 \times 4+0.15 \times 5=3.15 \end{aligned}$$

Therefore, [TeX:] $$E^{*}(M)<E(M).$$ In other words, when the expectation value is smaller, the number of exchanged information is smaller among nodes. The communication overhead in EBPP is smaller than that in HEEPP.

From [4] and [6], eavesdropping and colluding can cause privacy disclosure. Meanwhile, according to [4] and [6], we get [TeX:] $$p_{\text {overhear}}=p_{\text {collude}}=q,$$ where q is the probability of privacy leaks. [TeX:] $$P(q)$$ indicates whether the scheme can efficiently protect privacy, which can be approximated by

where [TeX:] $$d_{\max }$$ is the maximum in-degree in a network and [TeX:] $$p(\text {in}-\text {degree})$$ is the probability that the in-degree of a node is k .

Figs. 7 and 8 show the privacy preservation performance with R = 3 and R = 5, respectively, for different schemes. Compared with the other three schemes, the EBPP scheme can protect private data better. Owing to the fewer messages exchanged compared to other schemes, it is difficult for the adversary to obtain all sliced pieces of one node. Meanwhile, when the value of M is uncertain based on the remaining energy in one node, the adversary cannot monitor and obtain detailed information.

In order to reduce communication overhead and preserve sensitive data, we have proposed a privacy-preserving, energy-saving data aggregation scheme in this paper. We have improved the aggregation tree construction phase and the slicing phase and increased the data confirmation mechanism. Each leaf node can divide its private data based on its remaining energy. Meanwhile, the leaf nodes transmit the pieces to the neighbors with high remaining energy. Our simulation results show that this method is more efficient in balancing energy dissipation, prolonging the network lifetime, and providing privacy preservation compared to the existing scheme.

This work was supported by the NSFC (No. 61402015), the Science and Technology Development Plan Project of Henan Province (No. 172102210189), the Research Fund Project of Henan University (No. 2016YBZR019), and the Key Scientific and Technological Project of Henan Province (No. 192102210277).

He received Ph.D. degree in State Key Laboratory of Networking and Switch Technology from Beijing University of Posts and Telecommunications in 2015. He is an associate professor in the School of Computer and Information Engineering, Henan University, from 2015. His research interests include information security, cryptography and security in Internet of Things.

He received Ph.D. degree in State Key Laboratory of Networking and Switch Technology from Beijing University of Posts and Telecommunications in 2015. He is an associate professor in the School of Computer and Information Engineering, Henan University, from 2015. His research interests include information security, cryptography and security in Internet of Things. She received M.S. degree from Henan University of Economics and Law in 2012. She is a lecturer in the Department of Finance, Yellow River Conservancy Technical Institute, from 2013. Her current research interests include cryptology, information security.

- 1 S. Fu, J. Ma, H. Li, C. Wang,
*Energy-balanced seperating algorithm for cluster-based data aggregation in wireless sensor networks*, vol. 9, no. 1, 2013.doi:[[[10.1155/2013/570805]]] - 2 E. Zeydan, D. Kivanc, C. Comaniciu, U. Tureli, "Energy efficient routing for correlated data in wireless sensor networks,"
*Ad Hoc Networks*, vol. 10, no. 6, pp. 962-975, 2012.custom:[[[-]]] - 3 J. T. Meng, J. R. Yuan, S. Z. Feng, Y. J. Wei, "An energy efficient clustering scheme for data aggregation in wireless sensor networks,"
*Journal of Computer Science and Technology*, vol. 28, no. 3, pp. 564-573, 2013.doi:[[[10.1007/s11390-013-1356-y]]] - 4 W. He, X. Liu, H. Nguyen, K. Nahrstedt, T. Abdelzaher, "PDA: privacy-preserving data aggregation in wireless sensor networks," in
*Proceeding of the 26th IEEE International Conference on Computer Communications (INFOCOM)*, Barcelona, Spain, 2007;pp. 2045-2053. custom:[[[-]]] - 5 H. Li, K. Lin, K. Li, "Energy-efficient and high-accuracy secure data aggregation in wireless sensor networks,"
*Computer Communications*, vol. 34, no. 4, pp. 591-597, 2011.doi:[[[10.1016/j.comcom.2010.02.026]]] - 6 C. X. Liu, Y. Liu, Z. J. Zhang, Z. Y. Cheng, "High energy-efficient and privacy-preserving secure data aggregation for wireless sensor networks,"
*International Journal of Communication Systems*, vol. 26, no. 3, pp. 380-394, 2013.doi:[[[10.1002/dac.2412]]] - 7 S. Ozdemir, M. Peng, Y. Xiao, "PRDA: polynomial regression‐based privacy‐preserving data aggregation for wireless sensor networks,"
*Wireless Communications and Mobile Computing*, vol. 15, no. 4, pp. 615-628, 2015.custom:[[[-]]] - 8 M. Elhoseny, X. Yuan, H. K. El-Minir, A. M. Riad, "An energy efficient encryption method for secure dynamic WSN,"
*Security and Communication Networks*, vol. 9, no. 13, pp. 2024-2031, 2016.doi:[[[10.1002/sec.1459]]] - 9 M. Elhoseny, H. Elminir, A. Riad, X. Yuan, "A secure data routing schema for WSN using elliptic curve cryptography and homomorphic encryption,"
*Journal of King Saud University-Computer and Information Sciences*, vol. 28, no. 3, pp. 262-275, 2016.custom:[[[-]]] - 10 S. Ji, J. S., He, Y. Pan, Y. Li, "Continuous data aggregation and capacity in probabilistic wireless sensor networks,"
*Journal of Parallel and Distributed Computing*, vol. 73, no. 6, pp. 729-745, 2013.doi:[[[10.1016/j.jpdc.2013.02.005]]] - 11 L. Yu, J. Li, S. Cheng, S. Xiong, H. Shen, "Secure continuous aggregation in wireless sensor networks,"
*IEEE Transactions on Parallel and Distributed Systems*, vol. 25, no. 3, pp. 762-774, 2014.doi:[[[10.1109/TPDS.2013.63]]] - 12 T. Wang, X. Qin, Y. Ding, L. Liu, Y. Luo, "Privacy-preserving and energy-efficient continuous data aggregation algorithm in wireless sensor networks,"
*Wireless Personal Communications*, vol. 98, no. 1, pp. 665-684, 2018.doi:[[[10.1007/s11277-017-4889-5]]] - 13 J. Zhang, J. Zhu, Z. Jia, X. Yan, "A secret confusion based energy-saving and privacy-preserving data aggregation algorithm,"
*Chinese Journal of Electronics*, vol. 26, no. 4, pp. 740-746, 2017.custom:[[[-]]] - 14 X. Zhao, J. Zhu, X. Liang, S. Jiang, Q. Chen, "Lightweight and integrity-protecting oriented data aggregation scheme for wireless sensor networks,"
*IET Information Security*, vol. 11, no. 2, pp. 82-88, 2017.doi:[[[10.1049/iet-ifs.2015.0387]]] - 15 C. Chen, Y. Lin, Y. Lin, H. Sun, "RCDA: recoverable concealed data aggregation for data integrity in wireless sensor networks,"
*IEEE Transactions on Parallel and Distributed Systems*, vol. 23, no. 4, pp. 727-734, 2012.doi:[[[10.1109/TPDS.2011.219]]] - 16 B. Gupta, D. P. Agrawal, S. Yamaguchi,
*Handbook of Research on Modern Cryptographic Solutions for Computer and Cyber Security*, PA: IGI Global, Hershey, 2016.custom:[[[-]]] - 17 C. Stergiou, K. E. Psannis, B. G. Kim, B. Gupta, "Secure integration of IoT and cloud computing,"
*Future Generation Computer Systems*, vol. 78, pp. 964-975, 2018.doi:[[[10.1016/j.future.2016.11.031]]] - 18 A. P. Plageras, K. E. Psannis, C. Stergiou, H. Wang, B. B. Gupta, "Efficient IoT-based sensor big data collection: processing and analysis in smart buildings,"
*Future Generation Computer Systems*, vol. 82, pp. 349-357, 2018.custom:[[[-]]] - 19 C. T. Li, M. S. Hwang, "A lightweight anonymous routing protocol without public key en/decryptions for wireless ad hoc networks,"
*Information Sciences*, vol. 181, no. 23, pp. 5333-5347, 2011.doi:[[[10.1016/j.ins.2011.07.014]]] - 20 N. T. T. Huyen, M. Jo, T. D. Nguyen, E. N. Huh, "A beneficial analysis of deployment knowledge for key distribution in wireless sensor networks,"
*Security and Communication Networks*, vol. 5, no. 5, pp. 485-495, 2012.doi:[[[10.1002/sec.337]]] - 21 D. Du, H. Xiong, H. Wang, "An efficient key management scheme for wireless sensor networks,"
*International Journal of Distributed Sensor Networks*, vol. 8, no. 1, 2012.doi:[[[10.1155/2012/406254]]] - 22 P. Levis, N. Lee, M. Welsh, D. Culler, "Tossim: accurate and scalable simulation of entire TinyOS applications," in
*Proceedings of the 1st International Conference on Embedded Networked Sensor Systems*, Los Angeles, CA, 2003;pp. 126-137. custom:[[[-]]] - 23 M. A. Alsmirat, Y. Jararweh, I. Obaidat, B. B. Gupta, "Automated wireless video surveillance: an evaluation framework,"
*Journal of Real-Time Image Processing*, vol. 13, no. 3, pp. 527-546, 2017.doi:[[[10.1007/s11554-016-0631-x]]] - 24 N. Cao, P. Liu, G. Li, C. Zhang, S. Cao, G. Cao, M. Yan, B. B. Gupta, "Evaluation models for the nearest closer routing protocol in wireless sensor networks,"
*IEEE Access*, vol. 6, pp. 77043-77054, 2018.custom:[[[-]]]