1. Introduction
Wireless sensor networks (WSNs) consist of various sensor nodes that are used for gathering data from the physical world to a sink or a base station. With the limited energy of sensors, data aggregation algorithms can efficiently decrease the communication overhead of the network and maintain balanced energy consumption among all sensors; this in turn can increase the utilization of sensor nodes and extend the network lifetime [1-3].
Although some existing schemes can gather the data to the sink with different protocols, they incur a lot of communication overhead. The lifetime of various applications is seriously affected by the unbalanced energy consumption. In order to solve this problem, we need to study further the efficient method of data aggregation. The Slice-Mix-AggRegaTe scheme proposed in [4], called SMART, is based on slicing and assembling technology. In this scheme, each node i divides its data into K pieces. Node i then holds one of the K pieces. The other (K – 1) pieces are subsequently encrypted and transmitted to its (K – 1) neighbors. Afterward, node i receives pieces from its neighbors and applies the shared key to decrypt these pieces. Then the intermediate sensor nodes aggregate all pieces of data into new results and transmit the results to the base station, which can protect the private data from being monitored by the adversary. Nonetheless, the SMART scheme generates considerable communication overhead and decreases the lifetime of the sensor network. Li et al. [5] proposed an energy-efficient, high-accuracy (EEHA) scheme for secure data aggregation. This scheme divides the nodes into two types of nodes, which include leaf nodes and intermediate nodes. Both of them have different operations. The leaf nodes slice their data and mix new results to protect data privacy. The intermediate nodes receive data pieces from leaf nodes and mix them with their privacy data into a new result. Compared with SMART, the EEHA scheme can save much more energy. Nonetheless, the leaf nodes still generate a lot of communication overhead. A HEEPP scheme [6] is proposed to develop the slicing and assembling method in the leaf nodes. In HEEPP, the leaf nodes randomly slice the original data with probability p. Therefore, the number of pieces in a node is changed. It also develops a data query method to ensure data accuracy. Compared with EEHA, leaf nodes generate fewer sliced data in HEEPP. Therefore, HEEPP has lower energy consumption than SMART and EEHA.
In this paper, a privacy-preserving, energy-saving data aggregation scheme (EBPP) is presented to improve the slicing and assembling technology based on energy-balanced technology. According to its remaining energy and that of its neighbors, each node decides the number of sliced data. Meanwhile, we improve the query mechanism to decrease the waiting time and maintain the accuracy of aggregation data. Compared with the SMART, EEHA, and HEEPP schemes, sensor nodes reduce the communication overhead and save more energy in our scheme. Therefore, it can efficiently extend the network lifetime in our scheme.
The rest of the paper is organized as follows. We describe the related works and previously proposed techniques for data aggregation in Section 2. Section 3 discusses the system model and the security model. We then present our privacy-preserving, energy-saving data aggregation scheme in Section 4. Through the communication overhead analysis and the security analysis, the experimental results and comparisons are presented in Section 5. Finally, we have the conclusions in Section 6.
2. Related Works
In various sensor networks, secure data aggregation has received great attention in recent years. In this section, we describe the previously proposed technologies that were designed to protect private data and establish energy-efficient topologies to save energy.
For the preservation of data privacy, a data aggregation strategy based on polynomial functions is proposed in [7]. Sensor nodes use the coefficient of polynomial functions instead of the real data. Their scheme can reduce the communication overhead and protect data privacy. A data aggregation method with dynamic sensor clusters is proposed in [8], which uses the elliptic curve cryptography algorithm (ECC). They use ECC to generate binary string as encryption keys. Their method can prevent an adver¬sary from gaining the original data. On the other hand, [9] proposed an encryption scheme based on ECC and homomorphic encryption to protect data privacy in WSNs.
In order to solve the snapshot data aggregation problem and the continuous data aggregation problem, Ji et al. [10] proposed two continuous data collection algorithms in probabilistic WSNs. For secure, continuous aggregation, a scheme for detecting false temporal variation patterns is proposed in [11]. The scheme only checks parts of the aggregation results and verifies the temporal variation patterns. PECDA is proposed in [12], which can protect data privacy through inexpensive encryption techniques. The scheme can also remove redundant data efficiently and reduce the communication overhead.
The secret confusion data aggregation strategy proposed in [13] can efficiently save energy and protect privacy. In the confusion phase, the scheme uses positive-negative pairs as well as a confusion factor to confuse real data and their sources. In the aggregation phase, in order to reduce network traffic and message collision, the scheme uses a positive-negative neutralization strategy and a time slice allocation method. The LIPDA proposed in [14] protects private complex data with homomorphic encryption. This scheme uses the integrity verification method to maintain the reliability of data. The RCDA proposed in [15] can recover the sensing data from the aggregated data in the base station and reduce the transmission overhead. Meanwhile, Gupta et al. [16] proposed many modern cryptographic solutions for computer and cyber security.
In addition, the recently developed Internet of Things (IoT) is closely related to cloud computing and big data [17,18]. Therefore, data aggregation can be widely used in IoT, such as smart cities, smart buildings, pollution monitoring, forest fire prevention, military applications and so on. As such, data aggregation can save the energy of the entire network and improve the accuracy of data collection. Our scheme can also be deployed in these applications, and it can reduce communication consumption.
3. Network and Adversary Models
3.1 Network Model
Sensor networks include different types of sensor nodes that have been deployed to monitor the environment or collect data and send information to the sink in an area. In sensor networks, each sensor sends data to its neighboring nodes within its radio range.
In this paper, we construct an aggregation tree that includes the leaf nodes, the intermediate nodes, and the base station (BS). In the network, all sensor nodes need to send messages to the BS. The intermediate nodes mainly manage the leaf nodes. Meanwhile, they receive the mixing data transmitted from the leaf nodes and generate new results. Then the intermediate nodes send the aggregation results to their parents until the BS receives the results. The leaf nodes slice their private data into several pieces with the slicing and assembling technology and transmit the pieces to different neighbors. The leaf nodes not only mix their remaining one piece and the received data to get a new result but also transmit the new result to their parents. In this paper, we mainly concentrate on additive aggregation functions that are simply used in data aggregation.
3.2 Adversary Model
In wireless sensor networks, we assume that an adversary is a motivated, funded attacker who wants to eavesdrop on the communication and obtain sensitive data information. The adversary has unlimited energy resource, enough computation power, and sufficient data storage. He/she can use the leaked sensitive data to threaten the sensor networks, such as health monitoring networks. For data aggregation, the adversary tries to generate fake messages and send them back to the user.
We assume that the adversary randomly stays in the network and constantly monitors and eavesdrops on the communication among sensors. He/she wants to find the location information of leaf nodes. Upon compromising a leaf node, the adversary will gain private data, and the compromised node will transmit fake messages to the user. Thus, we focus on protecting data privacy and preventing the adversary from eavesdropping in WSNs.
4. Privacy-Preserving, Energy-Saving Data Aggregation Scheme
In this section, we give the basic idea of EBPP. In our scheme, the slicing and assembling technology can efficiently prevent the adversary from getting private data. We also assume that each node uses secret keys to encrypt the transmitted data so that the adversary will find it difficult to gain the content of transmitted packets. There are many key pre-distribution protocols to protect the messages’ security. Many key pre-distribution protocols can be applied to protect data security [19-21]. This way, the adversary cannot use the content to trace the object. The EBPP scheme is presented in detail. There are five phases for forming the EBPP scheme: constructing the aggregation tree, slicing, mixing, aggregation data, and data confirmation mechanism.
4.1 Aggregation Tree Construction
Similar to the HEEPP [6] scheme, we construct an aggregation tree that includes the leaf nodes, the intermediate nodes, and the BS. The leaf nodes slice and assemble pieces of data, and the results are transmitted to the parents. The intermediate nodes are mainly responsible for the management of the leaf nodes. They also collect the aggregation messages. Thus, when the scheme increases the proportion of intermediate nodes, the communication overhead will decrease.
Nonetheless, we improve the initial deployment phase, which is different from SMART, EEHA, and HEEPP. Before being deployed in the network, each node constructs its own neighbor table and energy table. The energy table of one node records its remaining energy and that of its neighbors. Sensor nodes should have two parameters set: NORMAL_CHILDREN and ADDITIONAL_NUM. NORMAL_ CHILDREN is usually the maximum number of child nodes. In particular cases, nodes can use ADDITIONAL_NUM, which is the number of additional child nodes. For instance, if one intermediate node is removed from the network, and its leaf nodes should search a new intermediate node, other intermediate nodes can use ADDITIONAL_NUM to add additional leaf nodes. In Fig. 1, the set of leaf nodes includes node 4 to node 10, which slice and mix their pieces. The set of intermediate nodes includes nodes 1, 2, and 3
Aggregation tree construction.
4.2 Slicing
In this step, we adopt and improve the slicing technique based on the energy-balanced mechanism, which is different from SMART, EEHA, and HEEPP. We assume that node [TeX:] $$i(i=1, \cdots, N)$$ gains the sensor data, which is denoted by [TeX:] $$k_{i}.$$ Then data [TeX:] $$k_{i}$$ is randomly divided into pieces by leaf node . In other words, the sum of pieces is equal to sensor data [TeX:] $$k_{i}.$$ Nonetheless, the intermediate nodes do not slice their data in the slicing phase. The number of pieces in a leaf node is a change value that is randomly distributed based on the node's remaining energy with certain probability [TeX:] $$p_{i}.$$ When a leaf node has low remaining energy, the number of slicing pieces has a high probability of decreasing. Table 1 shows the probability of the number of slicing pieces based on the remaining energy in one node. Therefore,
Probability of the number of slicing pieces based on the remaining energy in one node
After slicing the private data, node keeps one of the pieces itself and encrypts the remaining pieces. The [TeX:] $$M-1$$ [TeX:] $$M-1$$ pieces are then transmitted to the neighbor nodes by node i with h hops. We assume that node [TeX:] $$i^{\prime}$$ s set of neighbor nodes is [TeX:] $$\Omega_{i} \text { and } h=1 . d_{i j}$$ indicates that the information is sent from node i to node j. If node i does not transmit any slice to node j, [TeX:] $$j, d_{i j}=0.$$ Therefore,
and the final aggregation result in the BS is expressed by
where [TeX:] $$d_{v}=0, \forall j \notin \Omega_{i}.$$ Fig. 2 shows the slicing phase.
4.3 Mixing
Each leaf node slices its private data to neighbors and waits for some time until it receives all slicing messages from other nodes. After that, each leaf node uses the shared key with the sender to decrypt the received piece. Then the leaf nodes aggregate all the received pieces and its own remaining piece. Meanwhile, intermediate nodes mix the received pieces and their private data to generate a new result. For instance, [TeX:] $$g_{5}=d_{45}+d_{55}+d_{65}$$ shows that node 5 receives two pieces from neighbors and gets [TeX:] $$g_{5},$$ which includes the received messages [TeX:] $$$d_{45}$ and $d_{65}$$$ and its own remaining data [TeX:] $$d_{55}.$$ Other nodes do not transmit pieces to node 5. Fig. 3 shows the details of the mixing process.
4.4 Aggregation
Leaf nodes aggregate the received pieces into new results, which are encrypted. The encrypted data is then sent to the intermediate nodes, which then receive and decrypt the encrypted messages. They also generate the new results. The new results in the intermediate nodes consist of slicing data [TeX:] $$d_{i j}$$ received from their neighbor nodes, sensed data [TeX:] $$k_{i}$$ they collected, and mixing results [TeX:] $$g_{i}.$$ The results are then sent to their parents by the intermediate nodes until the results are sent to the base station. Fig. 4 shows the aggregation process.
4.5 Data Confirmation Mechanism
In the HEEPP scheme, the intermediate nodes need to wait a long time to receive results from their child nodes with the data query mechanism. In order to decrease the waiting time, we improve and present a data confirmation mechanism. When an intermediate child node completely transmits the aggregation results to its parent node, it can take the initiative to send a confirmation message. Upon receiving a confirmation message, an intermediate node does not wait for the intermediate child node. If an intermediate node does not receive a confirmation message, it can use the data query mechanism. Therefore, unlike other schemes, the data confirmation mechanism can increase the success rate of data transmission and maintain the accuracy of the aggregation messages but can also decrease the waiting time to receive the aggregation results. Algorithm 1 describes the data aggregation phase of the EBPP scheme.
5. Performance Evaluation
For simulation experiments, there are many simulation platforms such as TOSSIM [22], NS-3 [23], and J-Sim [24]. In this section, our simulation is based on TOSSIM. We randomly deploy 400 sensor nodes in a square area of 20 m × 20 m. Tables 2 and 3 show the probability of the number of slicing pieces based on the remaining energy in one node, where the values of R are presented as R = 3 and R = 5, respectively. According to the simulations, we compare the performance of SMART, EEHA, and HEEPP with the EBPP scheme in terms of communication overhead and privacy preservation.
Probability of the number of slicing pieces based on the remaining energy in one node (R = 3)
Probability of the number of slicing pieces based on the remaining energy in one node (R = 5)
5.1 Energy Consumption Analysis
For SMART, EEHA, HEEPP, and EBPP, the communication overhead is shown in Fig. 5 with R = 3 under different time intervals. Similar to Fig. 5, Fig. 6 shows the performance of different schemes with R = 5. According to the simulation results, the EBPP scheme sends fewer messages and saves more communication overhead than other schemes. In SMART, the original message is divided into pieces in a node. Then the rest of the R – 1 pieces are sent to the selected neighbors. We assume that the network has N nodes to collect information. Thus, there are [TeX:] $$N \cdot R$$ exchanged messages in the SMART scheme. Unlike the SMART scheme, there are different functions between the leaf nodes and the intermediate nodes in the remaining three schemes. We assume that the percentage of intermediate nodes is , so there are [TeX:] $$\rho \cdot N$$ intermediate nodes. In other words, the intermediate nodes send [TeX:] $$\rho \cdot N$$ messages to their parents. Thus, different schemes have different number of leaf nodes among the EEHA, HEEPP, and EBPP schemes. For EEHA, leaf nodes transmit [TeX:] $$(1-\rho) \cdot N \cdot R$$ pieces to neighbors. In the HEEPP scheme, a leaf node divides an original message into M pieces with probability [TeX:] $$p_{M}.$$ As such, we assume that expectation [TeX:] $$E(M) \quad \text { is } \quad \sum_{M=2}^{R} M \cdot p_{M-1}.$$ Therefore, leaf nodes send [TeX:] $$(1-\rho) \cdot N \cdot \sum_{1 / L=2}^{R} M \cdot p_{M-1}$$ pieces to neighbors, where [TeX:] $$\sum_{M=2}^{R} M \cdot p_{\mathcal{M}-1}=E(M).$$ Compared to the HEEPP scheme, a leaf node divides an original message into M pieces with probability [TeX:] $$p_{\mathbf{M}}^{*}$$ based on the remaining energy in the EBPP scheme. Thus, we assume that expectation [TeX:] $$E^{*}(M) \text { is } \sum_{N=2}^{R} M \cdot p_{M-1}^{*}.$$ As such, leaf nodes send [TeX:] $$(1-\rho) \cdot N \cdot \sum_{M=2}^{R} M \cdot p_{M-1}^{*}$$ pieces to neighbors, where [TeX:] $$\sum_{M=2}^{R} M \cdot p_{M-1}^{*}=E^{*}(M).$$ Therefore, the communication overhead of four schemes is given by
From [6], the relationship among the SMART, EEHA, and HEEPP schemes in terms of communication overhead is [TeX:] $$\mathrm{HEEPP}<\mathrm{EEHA}<\mathrm{SMART} .$$ Let r be the ratio of communication overhead between HEEPP and EBPP, which can be given by
Data acquisition from IoTs to MEC server using a UAV.
Communication overhead in different schemes (R = 5).
For the HEEPP scheme, each node slices the fixed number of pieces. According to the remaining energy, however, each node decides the number of slicing data with probability [TeX:] $$p_{i}$$ in EBPP. When a leaf node has low remaining energy, the probability of few slicing data increases. Thus, expectation [TeX:] $$E^{*}(M)$$ in EBPP is smaller than expectation [TeX:] $$E(M)$$ in HEEPP. As such, the value of r is less than 1 [TeX:] $$(r<1).$$ In other words, EBPP incurs less communication overhead than HEEPP. For example, in HEEPP, the number of slicing data maintains uniform distribution. When the value of R is 5, we assume that the probability of uniform distribution is 0.25. For the EBPP scheme, the parameters are shown in Table 3. As such, the expectations are shown as
Therefore, [TeX:] $$E^{*}(M)<E(M).$$ In other words, when the expectation value is smaller, the number of exchanged information is smaller among nodes. The communication overhead in EBPP is smaller than that in HEEPP.
5.2 Privacy Analysis
From [4] and [6], eavesdropping and colluding can cause privacy disclosure. Meanwhile, according to [4] and [6], we get [TeX:] $$p_{\text {overhear}}=p_{\text {collude}}=q,$$ where q is the probability of privacy leaks. [TeX:] $$P(q)$$ indicates whether the scheme can efficiently protect privacy, which can be approximated by
where [TeX:] $$d_{\max }$$ is the maximum in-degree in a network and [TeX:] $$p(\text {in}-\text {degree})$$ is the probability that the in-degree of a node is k .
Figs. 7 and 8 show the privacy preservation performance with R = 3 and R = 5, respectively, for different schemes. Compared with the other three schemes, the EBPP scheme can protect private data better. Owing to the fewer messages exchanged compared to other schemes, it is difficult for the adversary to obtain all sliced pieces of one node. Meanwhile, when the value of M is uncertain based on the remaining energy in one node, the adversary cannot monitor and obtain detailed information.
Privacy preservation in different schemes (R = 3).
Privacy preservation in different schemes (R = 5).
6. Conclusion
In order to reduce communication overhead and preserve sensitive data, we have proposed a privacy-preserving, energy-saving data aggregation scheme in this paper. We have improved the aggregation tree construction phase and the slicing phase and increased the data confirmation mechanism. Each leaf node can divide its private data based on its remaining energy. Meanwhile, the leaf nodes transmit the pieces to the neighbors with high remaining energy. Our simulation results show that this method is more efficient in balancing energy dissipation, prolonging the network lifetime, and providing privacy preservation compared to the existing scheme.
Acknowledgement
This work was supported by the NSFC (No. 61402015), the Science and Technology Development Plan Project of Henan Province (No. 172102210189), the Research Fund Project of Henan University (No. 2016YBZR019), and the Key Scientific and Technological Project of Henan Province (No. 192102210277).