1. Introduction
Wireless sensor network (WSN) technologies have attracted considerable attention from researchers because of their potential use in various fields, e.g., military, environment, and city applications. However, the realization of WSNs has been slow because of the high initial cost of installing sensors. Thus, a new service paradigm, crowdsensing, has recently emerged to reduce the initial cost of deploying WSNs and to provide services even where installing sensors is difficult or impossible. Crowdsensing is a system that collects sensing information from members of the public who carry sensor-equipped devices such as smartphones and smart watches, and provides services to users who require the sensing information [1,2]. In particular, a crowdsensing service can be operated easily in a crowded city where many people carry sensors, which makes it essential for implementing urban-based Internet services [3,4].
As an application of urban-based Internet services, a smart parking system provides saturation information as well as parking lot information, i.e., location, the number of accommodated vehicles, and parking fee. In most existing smart parking systems, saturation information is provided through sensors installed at each parking space in the entire parking lot [5]. This method can provide accurate saturation information for the entire parking lot. However, it cannot be used in parking lots without installed sensors, and the initial cost of sensor installation is high; thus, smart parking systems have low penetration into small parking lots. Using crowdsensing, parking lot saturation information can be provided based on the participation of parking lot users, without installing sensors. In our previous work [6], we designed a rough architecture of a crowdsensing-based smart parking system to replace existing parking systems, focusing only on removing private information from sensing data.
In the system of [6], it is assumed that each sensing data provider supplies the service provider server with the saturation of 4 to 6 parking spaces on average through photograph information. Accordingly, for the service provider server to provide the actual saturation information of the entire parking lot to the user, multiple sensing data are required. The amount of required sensing data increases in proportion to the size of the parking lot (the number of parking spaces). Moreover, since the parking lot saturation changes rapidly, the amount of sensing data required to provide the actual saturation information of the entire parking lot in real time increases further. An increase in the amount of sensing data that the server requires from users means an increase in the compensation budget paid to the sensing data providers. Therefore, in reality, it is difficult to obtain the amount of sensing data required to calculate the actual saturation of the entire parking lot. Thus, to operate a crowdsensing-based smart parking system in practice, it is desirable to predict the saturation of the whole parking lot from sensing data that include the saturation information of only a part of the parking lot.
In this paper, we propose a method to process sensing data provided by a sensing information provider and convert it into saturation information. In particular, we propose a prediction model that uses regression analysis to predict the total saturation based on the saturation information of a part of the parking lot provided by the user. That is, we design in detail the information processing module of the service provider system, based on the rough architecture of the crowdsensing-based smart parking system from our previous work. We train the prediction model with real sensing data gathered from a specific parking lot, and evaluate the performance of the prediction model to show its efficiency and the feasibility of the system.
This paper is organized as follows. Section 2 presents related work. In Section 3, we propose a prediction model for smart parking system. Section 4 shows the results of prediction model learning. Section 5 evaluates our proposed model, and Section 6 concludes this paper.
2. Related Work
2.1 Crowdsensing
Fig. 1 shows the flow from information request to information provision in a crowdsensing service. The steps are as follows: (1) ‘User A’, who needs information, sends a request for information to the server. (2) The server sends a request for sensing data to ‘User B’, who can provide the information required by ‘User A’. (3) ‘User B’ senses the data and (4) sends it to the server. (5) The server processes the sensing data provided by ‘User B’ and produces the information needed by ‘User A’. (6) The server transmits the processed information to ‘User A’.
If it is difficult or impossible to install a sensor on a system that needs to use sensor information, crowdsensing can be used to implement the system without installing a sensor. There are two types of crowdsensing: participatory crowdsensing and persistent crowdsensing. In participatory crowdsensing, a request is sent to the user when information is needed, and the user who receives the request decides whether to participate [7]. In this case, the user providing the sensing data obtains compensation for the sensing data from the server [8]. In persistent crowdsensing, a server continuously collects information from users [9].
Fig. 1. Crowdsensing service flow.
In the smart parking system proposed in this paper, participatory crowdsensing is applied: a request to provide sensing data is sent to users located where the sensing data is needed, and the parking lot saturation information is predicted and provided based on the sensing data they supply [6].
2.2 Research Trends of Crowdsensing
Crowdsensing technologies have attracted much attention as an alternative to WSNs, which have a large initial facility cost, and related research is proceeding actively. Here we introduce research trends in crowdsensing technologies: incentive mechanisms for realizing crowdsensing, security problems in assuring data reliability and protecting personal information involved in sensing data sharing, and related applications.
2.2.1 Incentive mechanism
As an open issue of crowdsensing, the incentive mechanism is important for motivating users to act as workers who provide sensing services. That is, in the real world, users might not provide sensing data if the crowdsensing system does not pay an adequate incentive. Moreover, if every user who provides sensing data gets the same amount of incentive, users would not try to provide high-quality sensing data. Thus, the researchers in [10] proposed an incentive mechanism based on a quality-driven auction, and included a probabilistic model to evaluate the reliability of the submitted data through Wi-Fi fingerprint-based indoor localization. Meanwhile, if users cannot get a satisfactory incentive, they will not provide data. The incentive mechanism QUOIN proposed in [11] is designed for application requirements and guarantees that each participant achieves a satisfactory level of profit. Similarly, in [12,13], an adequate incentive is decided by estimating data quality, and participants are paid differently depending on the estimation results. In [14], the effort needed to collect sensing data and the trustworthiness of users are suggested as a basis for deciding the amount of incentive. In contrast, some studies consider ways to operate an unpaid crowdsensing system [15].
2.2.2 Security
There are two main security issues in crowdsensing: assuring data reliability and preserving the privacy of sensing data providers. Firstly, a crowdsensing system produces meaningful information from the sensing data provided by participants, and thus the success of the crowdsensing service depends on the reliability of the data. Moreover, if some users have a malicious purpose, e.g., causing the crowdsensing system to malfunction or obtaining an ill-gotten incentive, the crowdsensing system cannot provide meaningful information using the sensing data. The researchers in [16] proposed a framework, FIDC, to defend against collusion attacks, in which malicious participants collaboratively send fake information to mislead the system, and to improve data credibility. In [17], the researchers focused on using a trust management system to detect Sybil attackers, who falsify multiple identities and negatively influence the effectiveness of sensing data. That is, many researchers have concentrated on designing methods to block invalid data provision and assure data reliability through data quality estimation and user trust management.
Secondly, much of the sensing data in a crowdsensing system includes the private information of data providers, e.g., location [18], provider information [19], etc. In a crowdsensing system without proper privacy preservation mechanisms, a user would not participate in sensing even for a substantial incentive. Thus, many researchers [18-21] concentrate on countermeasures for privacy issues so that users participate in providing sensing data. The mechanism proposed in [20] utilizes random sampling based on a privacy auction for data in order to guarantee the data privacy of participants.
2.2.3 Application
Crowdsensing systems can be applied to various sensing data-based systems, especially in cases where it is hard to install sensors and build a sensor network, e.g., smart city [22], wireless LAN monitoring [23,24], data recognition [25], etc. vCity Map [22] visualizes the city environment for two kinds of information: sound and road conditions. Pazl [23] provides an indoor WiFi monitoring system with individual measurements taken from participant phones. Similarly, the researchers in [24] presented crowdsensing-based urban WiFi characterization, i.e., the presence of deployed WiFi APs, used channels, and location. The authors of [25] designed an algorithm for finding regions of interest in mobile crowdsensing by utilizing cycles of crowd-querying and feedback.
In this paper, we design a crowdsensing-based smart parking system, and provide a saturation prediction model of the parking lot from the image information of participants. Table 1 summarizes the main research.
Table 1. Summary of main research
2.3 Regression Analysis
Regression analysis is an analytical method that builds and evaluates a model representing the relationship between two or more variables [26]. The variables to be predicted through regression analysis are called dependent variables, and the factor variables that affect dependent variables are called independent variables. In simple regression analysis, one independent variable is used to predict the value of a dependent variable. In multiple regression analysis, two or more independent variables are used to predict the value of a dependent variable. If the relationship between the independent variable and the dependent variable forms a nonlinear curve, the dependent variable can be predicted using a polynomial model of the form of Eq. (1), where [TeX:] $$y$$ denotes the dependent variable, [TeX:] $$x$$ is the independent variable, [TeX:] $$w_{i}$$ [TeX:] $$(0 \leq i \leq d)$$ are coefficients, and [TeX:] $$d$$ is the degree of the polynomial. The degree [TeX:] $$d$$ is decided depending on the relationship between the independent and dependent variables.
[TeX:] $$y=w_{0}+w_{1} x+w_{2} x^{2}+\cdots+w_{d} x^{d}=\sum_{i=0}^{d} w_{i} x^{i} \qquad (1)$$
In this paper, we build a time-based polynomial regression model that predicts the total saturation of the parking lot with time as the independent variable. For example, the model predicts the saturation of ParkingLotA at 20:00 as 0.8 (a saturation of 1 means full). We then create a sensing data-based linear multiple regression model, Eq. (2), with the saturation value [TeX:] $$x_{1}$$ predicted by the time-based regression model, the saturation degree [TeX:] $$x_{2}$$ of the part of the parking lot included in the sensing data, and the number of parking spaces [TeX:] $$x_{3}$$ in the sensing data. The sensing data-based linear multiple regression model predicts the degree of saturation [TeX:] $$y$$ of the parking lot based on the user-provided sensing data. The coefficients [TeX:] $$w_{i}$$ [TeX:] $$(0 \leq i \leq 3)$$ are weights that indicate the effect of each independent variable on the dependent variable.
[TeX:] $$y=w_{0}+w_{1} x_{1}+w_{2} x_{2}+w_{3} x_{3} \qquad (2)$$
Using the generated regression model, we predict the saturation of the parking lot based on the sensing data provided by the user (Provider in Fig. 1) and provide the saturation information to the user (Consumer in Fig. 1). The performance of the generated regression model is evaluated by the mean squared error (MSE) value [27], which represents the error between the value predicted by the regression model and the actual saturation, and the R-squared ([TeX:] $$\mathrm{R}^{2}$$) score, which represents how well the values predicted by the regression model explain the actual saturation [28].
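The two metrics can be written out directly from their standard definitions. The following is a minimal sketch in plain Python, not the scikit-learn routines used later in this paper; the function names are chosen here for illustration.

```python
from typing import Sequence


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r2_score(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """R^2 = 1 - SS_res / SS_tot; 1.0 is a perfect fit, 0.0 matches predicting the mean."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot
```

A lower MSE and an R² score closer to 1 both indicate that the predicted saturation tracks the actual saturation more closely.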
2.4 Previous Work [6]
Fig. 2 is a system structure proposed in [6], as our previous work, to provide crowdsensing based smart parking system. The system consists of two entities: users who provide sensing data or use information, and a service provider that provides and operates a service. The user application includes an Information Sensing module, an Information Requesting/Using module for requesting information provided to the service provider or for using the provided information, an Information Processing module for removing personal information from the sensing data, and an Incentive Managing module that manages compensation for sensing data. The server of service provider is composed of a User Managing module, an Information Processing module for detecting an empty parking space and saturation prediction, and an Incentive Managing module for managing the compensation paid to the sensing data provider.
The processing flow between two entities is shown in Fig. 3. The service user application periodically sends location information to the server. (1) When the server of service provider receives a request for parking information from a user A, (2) it searches for users in the corresponding area, and (3) requests information provision. (4) (5) User B requested to provide information removes personal information (i.e., image obfuscation process) such as a license plate before transmitting the photographed parking lot image to the server. (6) The server calculates the position of the empty parking space and the parking lot saturation information through the received information, and (7) provides the information to the user A.
Based on the previous work [6], in this paper, we propose a prediction model that the Information Processing module of the service provider server uses to predict the total saturation based on sensing data supplied from the users. We also evaluate the performance of the prediction model and show its feasibility.
3. Information Processing Module for Prediction
In this paper, we focus on designing an Information Processing module in service provider server on smart parking system of our previous work [6]. Fig. 4 shows the structure of the information processing module, including a parking Block Size Decision module for determining the number of parking spaces included in one parking block, a Data Processing module for converting the sensing data into data for predicting saturation, and a Saturation Prediction module for predicting the total saturation based on the data processed in the data processing module.
Fig. 4. Information Processing module structure.
3.1 Block Size Decision
By separating the parking lots into several blocks with a certain number of parking spaces, it is possible to specify more precisely which block the empty parking space is in. Moreover, by separating blocks, different saturation patterns can be predicted and users can be guided to approach parking blocks with low saturation.
The smaller the number of parking spaces included in one parking block, the more accurately the position of a vacant parking space can be provided to the user. However, if the number of parking spaces included in one parking block is too small, the amount of sensing data needed to predict the total saturation increases, losing the benefit of using the prediction model. The effect of the number of parking spaces per parking block on prediction performance is evaluated in Section 5.1.
3.2 Data Processing
The user provides the server with a partial image of the parking lot. The server detects the block number and the parking status in the provided image. The cascade detection algorithm used in [6] is used for the detection of block number and parking status.
The structure of the sensing data detected from the image provided by the user is shown in Fig. 5. The value of the timestamp is the time at which the information was provided. The block indicates the block to which the parking spaces shown in the image belong. The value of the length is the number of parking spaces included in the image. The value of the state is a bit string indicating the parking status of the parking spaces included in the image. Sensing data providers may be asked to transmit information that can serve as a reference point, such as a number on a parking pillar or on a wall. Thus, the state bits can be aligned with the reference position information. An empty parking space is expressed as the state value 0, and a parking space in which a vehicle is parked is represented by 1. The value of the saturation is the degree of saturation of the portion of the parking lot shown in the image.
Eq. (3) gives the method of calculating the value of the saturation [TeX:] $$S$$. [TeX:] $$l$$ is the number of parking spaces included in the image and is equal to the value of the length field in the sensing data structure. [TeX:] $$a_{n}$$ is the [TeX:] $$n^{th}$$ element of the bit string representing the parking state of the parking spaces included in the image, i.e., the [TeX:] $$n^{th}$$ bit of the state field in the sensing data structure.
[TeX:] $$S=\frac{1}{l} \sum_{n=1}^{l} a_{n} \qquad (3)$$
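As an illustration, the sensing data structure and the saturation calculation of Eq. (3) could be sketched as follows. The field names follow the description above; representing the timestamp as an integer and deriving length and saturation from the state bit string (rather than storing them) are assumptions made here, not details given in the paper.

```python
from dataclasses import dataclass


@dataclass
class SensingData:
    timestamp: int  # time the image was provided (integer encoding is an assumption)
    block: int      # parking block to which the photographed spaces belong
    state: str      # bit string: '0' = empty space, '1' = occupied space

    def __post_init__(self) -> None:
        # Reject empty strings and anything other than the two state symbols.
        if not self.state or set(self.state) - {"0", "1"}:
            raise ValueError("state must be a non-empty string of '0' and '1'")

    @property
    def length(self) -> int:
        """Number of parking spaces included in the image (l in Eq. (3))."""
        return len(self.state)

    @property
    def saturation(self) -> float:
        """Fraction of occupied spaces: S = (1/l) * sum of the state bits."""
        return self.state.count("1") / self.length
```

For example, a state of `"10110"` describes five spaces of which three are occupied, so the saturation is 3/5.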
If the sensing data provided by several users cover the same part of the parking lot, the accuracy of the prediction result may be degraded, because the saturation would be predicted from data that differ from the actual parking state. Therefore, redundancy is eliminated by using the state field value of the sensing data structure. Fig. 6 illustrates a method of eliminating redundancy for sensing data A and sensing data B provided by different users. The bit strings of the sensing data are aligned with the reference position information (e.g., parking lot numbers A0, A1, and so on). In Fig. 6(a), when the state bit string of sensing data A is included in the state bit string of sensing data B, sensing data A is deleted. In Fig. 6(b), if the state bit string of sensing data A overlaps the state bit string of sensing data B, the overlapping portion is deleted from the state bit string of sensing data B and the two are combined. In this way, the data processing module eliminates redundancy and keeps the data up to date if changes occur within a short time.
Fig. 6. Examples of removing duplicate sensing data. (a) Case in which one string is included in another string, (b) Case in which an existing string overlaps a new string.
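The two cases can be sketched as a small merge routine. This is one possible reading of the description, under stated assumptions: each sensing datum is represented as a hypothetical pair of the index of its first parking space (derived from the reference position) and its state bit string, and in case (b) the bits of A are kept on the overlapping spaces because the overlap is removed from B.

```python
from typing import List, Tuple

# (index of the first parking space, state bit string); this layout is an assumption.
Segment = Tuple[int, str]


def merge_pair(a: Segment, b: Segment) -> List[Segment]:
    """Remove redundancy between sensing data A and B aligned on parking-space indices."""
    a_start, a_state = a
    b_start, b_state = b
    a_end = a_start + len(a_state)  # half-open interval [start, end)
    b_end = b_start + len(b_state)

    # Case (a): A lies entirely inside B, so A is deleted.
    if b_start <= a_start and a_end <= b_end:
        return [b]
    # No shared spaces: both data are kept unchanged.
    if a_end <= b_start or b_end <= a_start:
        return [a, b]
    # Case (b): the overlapping portion is cut from B and the rest is joined to A.
    if a_start <= b_start:
        return [(a_start, a_state + b_state[a_end - b_start:])]
    return [(b_start, b_state[: a_start - b_start] + a_state)]
```

With A covering spaces 0 to 3 and B covering spaces 2 to 5, the two shared spaces are dropped from B and the result is a single six-space string.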
3.3 Saturation Prediction
The hourly saturation pattern differs according to the location of the parking lot. For example, a parking lot in a residential area shows high saturation before and after work hours on weekdays, and the parking lot of a large mart shows high saturation on weekend afternoons. Therefore, the prediction model uses the location of the parking lot and the time as variables. The location of the parking lot is set based on the location specified in the data supply request. The time is the value of the timestamp field included in the sensing data. Since a prediction model using only the location and time depends on previously collected data, it cannot accurately predict the saturation in real time. Therefore, to obtain accurate prediction results based on real-time data, the saturation degree of the part of the parking lot included in the sensing data is also used as a variable. This saturation degree is set to the saturation field value of the sensing data. The server provides the user with the block number and the saturation prediction for each parking block with an empty parking space. The real-time accuracy of the provided information can also be evaluated and reported according to how much of it is based on real-time sensing data.
4. Prediction Model Learning
The prediction model of this system consists of a polynomial regression model that predicts the degree of saturation with time as a variable, and a linear regression model that predicts the degree of saturation of the part of the parking lot not included in the sensing data from the information on the part that is included. The time-based prediction model predicts the degree of saturation observed at a specific time in a specific place, and the sensing data-based prediction model then increases the accuracy. The learning data consist of the hourly saturation degree of 52 parking spaces in total, collected hourly for 2 days at the same place. We use TensorFlow [29] to generate the sensing data-based prediction model with an artificial neural network. In the artificial neural network, we set the saturation degree of the part included in the sensing data, the size of the sensing data, and the time-based predicted saturation as the input variables, and the saturation degree of the part not included in the sensing data as the dependent variable. Using the gradient descent algorithm, we train until the error value drops below 1e-5. We also use scikit-learn [30] to calculate the accuracy of the values predicted by the generated prediction model, i.e., the [TeX:] $$\mathrm{R}^{2}$$ and MSE values.
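The training loop described above can be illustrated without TensorFlow. The sketch below is a plain-Python stand-in, not the network used in this paper: it fits the linear model of Eq. (2) by batch gradient descent and stops once the MSE falls below a target. The learning rate, the epoch cap, and the scaling of the sensing data size to the range [0, 1] (for example, dividing by the 52 spaces of the lot) are assumptions introduced here to keep the updates stable.

```python
from typing import List, Sequence, Tuple

# (x1: time-based predicted saturation, x2: partial saturation, x3: scaled data size)
Sample = Tuple[float, float, float]


def predict(weights: Sequence[float], x: Sample) -> float:
    """Linear model y = w0 + w1*x1 + w2*x2 + w3*x3."""
    return weights[0] + sum(w * xi for w, xi in zip(weights[1:], x))


def mse_loss(weights: Sequence[float], xs: Sequence[Sample], ys: Sequence[float]) -> float:
    return sum((predict(weights, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_gradient_descent(
    xs: Sequence[Sample],
    ys: Sequence[float],
    lr: float = 0.1,          # assumed learning rate; features are expected in [0, 1]
    target: float = 1e-5,     # stop once the MSE drops below this value
    max_epochs: int = 20000,  # assumed safety cap, not taken from the paper
) -> Tuple[List[float], float, int]:
    """Return (weights, final MSE, number of epochs run)."""
    weights = [0.0, 0.0, 0.0, 0.0]
    n = len(xs)
    for epoch in range(max_epochs):
        errors = [predict(weights, x) - y for x, y in zip(xs, ys)]
        loss = sum(e * e for e in errors) / n
        if loss < target:
            return weights, loss, epoch
        # Gradient of the MSE with respect to the bias and the three weights.
        grads = [2.0 * sum(errors) / n]
        for j in range(3):
            grads.append(2.0 * sum(e * x[j] for e, x in zip(errors, xs)) / n)
        weights = [w - lr * g for w, g in zip(weights, grads)]
    return weights, mse_loss(weights, xs, ys), max_epochs
```

Whether the target error is reached within the epoch cap depends on the data, so the function also reports the final loss and the number of epochs actually run.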
Fig. 7 is a graph showing the actual hourly saturation at the same place and the time-based polynomial regression model learned from the actual saturation information. The prediction of the degree of saturation according to time is relatively accurate in the time zones where the daily change in saturation is not large. However, the accuracy is lower in the time zones where the saturation pattern differs slightly from day to day. In this case, a linear regression model based on the user-provided sensing data can be used together to obtain a more accurate saturation prediction.
The degree of the polynomial used in the time-based polynomial regression model is determined by the MSE value and the [TeX:] $$\mathrm{R}^{2}$$ score of the learned regression model. The MSE value and [TeX:] $$\mathrm{R}^{2}$$ score according to the degree of the polynomial are shown in Fig. 8. When the degree is 5 or higher, the lowest MSE value and the highest [TeX:] $$\mathrm{R}^{2}$$ score are obtained. The higher the degree of the polynomial, the more time it takes to predict the degree of saturation. Therefore, it is advantageous to select a lower degree if the change in the MSE and [TeX:] $$\mathrm{R}^{2}$$ values is small. The degree of the polynomial is set to 5 in the time-based polynomial regression model proposed in this paper.
Fig. 8. MSE and [TeX:] $$\mathrm{R}^{2}$$ according to the degree of the polynomial.
The sensing data-based prediction model predicts the degree of saturation of the parking spaces not included in the sensing data, using as variables the saturation degree predicted by the time-based prediction model, the degree of saturation of the part of the parking lot included in the sensing data, and the number of parking spaces included in the sensing data.
Fig. 9 shows the MSE value according to the size of the sensing data (the number of parking spaces included in the sensing data) input to the sensing data-based prediction model. Each result is the average over 10 repetitions of sensing data generation, learning, and MSE calculation of the predicted values. A sensing data size of 0 means that the saturation degree is predicted using only the time-based prediction model. The MSE value when the sensing data-based prediction model is used together is lower than the MSE value when only the time-based prediction model is used. Moreover, the MSE value decreases as the number of parking spaces included in the sensing data increases; that is, the sensing data compensate for the shortcomings of the time-based prediction model alone. This means that the prediction accuracy is higher when the sensing data-based prediction model is used together than when only the time-based prediction model is used.
Fig. 9. MSE of time-based and sensing data-based models.
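The selection rule, preferring a lower degree when the gain from a higher one is small, can be sketched as a simple helper. The function name and the tolerance parameter are hypothetical; the paper does not state a numeric threshold for what counts as a small change.

```python
from typing import Dict


def choose_degree(mse_by_degree: Dict[int, float], tolerance: float) -> int:
    """Return the lowest degree whose MSE is within `tolerance` of the best MSE."""
    if not mse_by_degree:
        raise ValueError("at least one candidate degree is required")
    best = min(mse_by_degree.values())
    return min(d for d, mse in mse_by_degree.items() if mse - best <= tolerance)
```

Given the MSE measured for each candidate degree, the helper returns the cheapest model that is practically as accurate as the best one, which mirrors the reasoning used here to settle on a fixed degree.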
5. Performance Evaluation
In this section, we evaluate the performance of our system model depending on the block size and whether or not the outlier is removed.
5.1 Parking Block Size
Fig. 10 is a graph showing the MSE value and the required minimum data quantity according to the number of parking spaces included in one parking block. The whole parking lot has 52 parking spaces in total, and the results are shown for separation into 1, 2, and 4 parking blocks, respectively. The minimum amount of sensing data required to predict the overall parking saturation increases in proportion to the number of parking blocks. The MSE value of the saturation prediction result is lowest in the model where the entire parking lot is divided into two parking blocks.
Fig. 10. Block size-regression performance.
If the parking lot is not divided (52 parking spaces per parking block), the MSE value of the prediction model is relatively high, while the total saturation can be predicted with only one sensing data. If the entire parking lot is divided into 4 parking blocks (13 parking spaces per parking block), the MSE value of the prediction model is relatively low, but the minimum amount of sensing data required to predict the total saturation is high. The lowest MSE value was obtained when the entire parking lot was divided into 2 parking blocks (26 parking spaces per parking block). In the case of 2 parking blocks, the minimum number of sensing data required to predict the total parking lot saturation was also smaller than in the case of 4 parking blocks. If the user participation rate is low, the parking block size with the lowest MSE value of the prediction model can be selected within the range that satisfies the required data quantity.
In general, the MSE value of the prediction model can be expected to decrease as the parking lot is separated into blocks. In addition, when the parking lot is divided into a plurality of blocks, it is possible to specify the position of a vacant parking space more precisely and provide this information to the user. Therefore, it is more efficient to separate the parking lot into multiple parking blocks if the user participation rate is sufficient.
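The selection rule in the last sentence can be sketched as follows. The function and its input layout are hypothetical: each candidate maps a number of parking blocks to the MSE of its prediction model and the minimum number of sensing data it requires, both of which would come from measurements such as those in Fig. 10.

```python
from typing import Dict, Optional, Tuple


def choose_block_count(
    candidates: Dict[int, Tuple[float, int]],  # blocks -> (model MSE, minimum data required)
    available_data: int,                       # sensing data obtainable at the current participation rate
) -> Optional[int]:
    """Pick the block count with the lowest MSE among those whose data requirement is met."""
    feasible = {k: v for k, v in candidates.items() if v[1] <= available_data}
    if not feasible:
        return None  # not enough sensing data for any candidate division
    return min(feasible, key=lambda k: feasible[k][0])
```

When only a single sensing datum can be collected, the function falls back to the undivided lot; as participation grows, finer divisions with a lower MSE become eligible.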
5.2 Outlier Removal
In this subsection, we compare the results with and without outlier removal. The results are obtained with 2,000 sensing data where the block size is 52. Outlier removal is performed by deleting the top 5% of the generated data, ranked by the difference between the degree of partial saturation included in the data and the degree of actual total saturation. Fig. 11 is a graph showing the MSE value of the regression model learned from the learning data with and without the outliers. The results are averaged over 10 repetitions. The MSE value when the outliers are removed is smaller than the value when they are not.
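The removal step can be sketched as follows, assuming each training sample is a hypothetical pair of partial saturation and total saturation. Using the absolute difference as the ranking key and rounding the number of dropped samples down are choices made here for illustration.

```python
from typing import List, Tuple

# (partial saturation seen in the sensing data, total saturation of the block)
TrainingSample = Tuple[float, float]


def remove_outliers(samples: List[TrainingSample], fraction: float = 0.05) -> List[TrainingSample]:
    """Drop the `fraction` of samples with the largest |total - partial| difference."""
    n_drop = int(len(samples) * fraction)  # rounded down
    if n_drop == 0:
        return list(samples)
    # Indices ordered from the largest difference to the smallest.
    order = sorted(
        range(len(samples)),
        key=lambda i: abs(samples[i][1] - samples[i][0]),
        reverse=True,
    )
    dropped = set(order[:n_drop])
    # Keep the remaining samples in their original order.
    return [s for i, s in enumerate(samples) if i not in dropped]
```

With 2,000 samples and the default fraction, the 100 samples whose partial saturation is furthest from the total saturation would be discarded before training.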
If the size of the parking block is small, the probability of a large difference between the degree of total saturation and the degree of partial saturation is relatively low. Therefore, higher prediction accuracy can be expected by using the prediction model learned based on the learning data from which the outlier values are removed. However, if the size of the parking block is large, the difference between the degree of total saturation and the degree of partial saturation is relatively high. Therefore, prediction accuracy can be lowered by using prediction models learned based on learning data from which outliers have been removed. In this case, it is safe not to apply the manipulation to the learning data because the difference between the MSE values of the prediction model in which the outliers are removed and the prediction model in which they are not is not large.
6. Conclusion
In this paper, we proposed a method of predicting total saturation based on sensing data in the information processing module of a crowdsensing-based smart parking system. We implemented a prediction model and performed the prediction model learning. The comparison of MSE between the time-based prediction model and the combined prediction model showed that adding the sensing data-based prediction model yields higher prediction accuracy, because the sensing data supply information that the time-based prediction model lacks. As future research, we will design the compensation mechanism of the proposed system and develop a realistic crowdsensing-based smart parking system. For example, even if users do not consciously participate in sensing data provisioning, the application can use devices that can collect sensing data, e.g., the black box or rear camera of a vehicle, CCTV, etc. By setting vehicle cameras to periodically send photos when the car is parked in a public parking lot, users will be able to receive incentives for crowdsensing.
Acknowledgement
This research was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No.2018R1A2B6009620).