## Syed Nazir Hussain*, Azlan Abd Aziz, Md. Jakir Hossen, Nor Azlina Ab Aziz, G. Ramana Murthy and Fajaruddin Bin Mustakim

Time intervals | RMSE (CNN) | RMSE (LSTM) | RMSE (CNN-LSTM) |
---|---|---|---|
7 | 9281 | 13356 | 5284 |
15 | 12886 | 10109 | 4908 |
30 | 10888 | 9827 | 6844 |
Average | 11018 | 11097 | 5678 |
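For reference, RMSE is the square root of the mean squared difference between predicted and observed values. The sketch below is a minimal illustration and not the authors' evaluation code: the function names `rmse` and `column_average` are hypothetical, and the per-interval scores are copied from the table above only to show how the Average row follows from them.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


# Per-interval RMSE scores copied from the table (7, 15, 30 time-steps).
scores = {
    "CNN": [9281, 12886, 10888],
    "LSTM": [13356, 10109, 9827],
    "CNN-LSTM": [5284, 4908, 6844],
}


def column_average(values):
    """Arithmetic mean of one model's RMSE scores."""
    return sum(values) / len(values)


averages = {model: column_average(vals) for model, vals in scores.items()}
best_model = min(averages, key=averages.get)  # model with the lowest mean RMSE
```

Dropping the fractional part of each computed mean gives the values in the Average row, and `best_model` identifies the hybrid CNN-LSTM as the model with the lowest average error.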

We also find that the hybrid CNN-LSTM model is less computationally expensive and fits the input dataset well when trained on the three prediction ranges of 7, 15, and 30 time-steps. The fitness of the CNN-LSTM model is determined by evaluating the training and validation loss plots over the epochs run during training. When the model's fit function is called after compilation, it returns the training history, including the training and validation losses. These losses are calculated at every epoch, which allows us to diagnose how well the model fits the input dataset.

The fitness of a DNN model on a given input dataset is primarily classified into three categories: good-fit, over-fit, and under-fit. If the training and validation losses both decrease and stabilize at around the same point, the model is considered a good fit. If the validation loss decreases to a certain level and afterwards starts increasing, the model is considered over-fit. Furthermore, if the validation loss is higher than the training loss and still tends to improve, the model is considered under-fit.
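The three categories can be turned into a rough rule of thumb. The sketch below is an illustrative heuristic and not the procedure used in the experiments, where fitness was judged from the loss plots. It assumes a Keras-style history dictionary with per-epoch `loss` and `val_loss` lists (the shape of the `history` attribute on the object that Keras `model.fit` returns when validation data is supplied). The function name `diagnose_fit` and the `tail` and `tol` thresholds are hypothetical choices.

```python
def diagnose_fit(history, tail=5, tol=0.05):
    """Label a training run as 'good-fit', 'over-fit', or 'under-fit'.

    history: dict with per-epoch 'loss' and 'val_loss' lists.
    tail:    number of final epochs used to judge the trend (assumed).
    tol:     relative tolerance for 'stabilized' and 'same point' (assumed).
    """
    loss, val = history["loss"], history["val_loss"]
    if len(loss) != len(val) or len(val) < tail + 1:
        raise ValueError("need matching loss curves longer than `tail`")

    # Over-fit: validation loss bottomed out earlier and has risen since.
    best = min(val)
    if val.index(best) < len(val) - tail and val[-1] > best * (1 + tol):
        return "over-fit"

    # Under-fit: validation loss sits above training loss and is still falling.
    still_improving = val[-tail - 1] - val[-1] > tol * val[-tail - 1]
    if val[-1] > loss[-1] and still_improving:
        return "under-fit"

    # Good-fit: both curves have flattened and ended close to each other.
    flat = abs(val[-tail - 1] - val[-1]) <= tol * val[-tail - 1]
    close = abs(val[-1] - loss[-1]) <= tol * max(val[-1], loss[-1])
    return "good-fit" if flat and close else "inconclusive"
```

A run that matches none of the three descriptions is reported as `inconclusive` rather than forced into a category, since real loss curves are often noisy and are better inspected visually.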

Fig. 12 shows how the hybrid CNN-LSTM model fits each of the three time-step prediction ranges. The training and validation losses of the hybrid CNN-LSTM model indicate a good fit on each of the three ranges. The model reaches the maximum of 120 epochs only for the 30-day prediction range, as shown in Fig. 9. For the remaining prediction ranges, the 7-day and 15-day forecast intervals, the number of epochs is below 100, suggesting that the model is computationally less expensive and fits the input training dataset well.

Overall, the average prediction error of the CNN-LSTM model across the three time intervals (RMSE of 5678) is lower than that of the single CNN (11018) and LSTM (11097) models. Furthermore, the average computational time of the CNN-LSTM model, measured in seconds, is also lower than that of the single CNN and LSTM models, as shown in Fig. 12(c).

Fig. 13 presents a boxplot of each model's training computation time for the three prediction ranges. The experiment indicates that the hybrid CNN-LSTM sequence model can be used to predict large gaps of missing values in our proposed framework.

Predicting large gaps of missing values in an electricity consumption time-series dataset is difficult because of nonlinear data patterns that contain abrupt shifts. Recurrent neural network models in deep learning, such as LSTM and the hybrid CNN-LSTM, are explicitly designed to predict sequence data such as time series. These models preserve contextual information from previous input observations in their memory cells and effectively map an input to an output.

The average RMSE values of each model on the three time-steps are exceptionally high. The major problem is the lack of observed time-series data available to forecast and fill the specified range of missing values. Traditional statistical and machine learning methods used in forecasting time-series data, such as the autoregressive integrated moving average (ARIMA) [30] and LSSVM [31], would not be appropriate in this case, because these models perform well when the time-series data patterns involve some sort of trend and seasonality.

Furthermore, the proposed framework is compatible with other time-series datasets that contain missing values, not only electricity load data. The prediction ranges specified in this framework could be extended according to requirements. The prediction accuracy of the framework could be further enhanced by incorporating multiple input features to support multivariate rather than univariate forecasting.

Missing values are one of the significant data quality issues in IoT-based SHS. If the collected data is missing or incomplete, it could eventually be useless. This paper proposed a novel framework for predicting large gaps of missing values observed in IoT-based home appliance electricity consumption time-series datasets. Missing values in the SHS appliance datasets were observed in gaps of 7, 15, and 30 time-steps, so we defined these as the prediction ranges of the framework. The proposed framework automatically detects, predicts, and reconstructs the missing values within the defined ranges. A hybrid CNN-LSTM neural network is used in the framework to forecast the detected ranges of missing values. After sampling, the historical input data used to train the CNN-LSTM model is four times the length of the detected range.
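The detection step and the four-to-one ratio between training history and gap length can be sketched as follows. This is a simplified illustration under stated assumptions, not the framework's implementation. The helper names are hypothetical, and a gap is assumed to be assigned to the smallest defined range that covers it, which the text does not specify. The sampling step is omitted, and missing values inside the history window are not handled.

```python
import math

PREDICTION_RANGES = (7, 15, 30)  # ranges named in the text
HISTORY_FACTOR = 4               # history is four times the detected range


def find_gaps(series):
    """Return (start, length) for every run of consecutive missing values."""
    gaps, start = [], None
    for i, value in enumerate(series):
        missing = value is None or (isinstance(value, float) and math.isnan(value))
        if missing and start is None:
            start = i                      # a new gap begins here
        elif not missing and start is not None:
            gaps.append((start, i - start))  # the gap ended at i - 1
            start = None
    if start is not None:                  # series ends inside a gap
        gaps.append((start, len(series) - start))
    return gaps


def assign_range(gap_length):
    """Smallest defined prediction range that covers the gap, else None."""
    for steps in PREDICTION_RANGES:
        if gap_length <= steps:
            return steps
    return None


def history_window(series, gap_start, steps):
    """Observations immediately before the gap, up to 4 x steps long."""
    begin = max(0, gap_start - HISTORY_FACTOR * steps)
    return series[begin:gap_start]
```

For a detected 7-step gap, `history_window` returns up to 28 preceding observations, which a forecasting model would then use to predict the 7 missing values before they are written back into the series.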

A comparison experiment was performed to analyze the forecasting performance of the CNN-LSTM model against its single variants, CNN and LSTM, on the defined ranges of missing values. Crucial hyperparameters were tuned to determine the best combination for each model's training configuration in this experiment. The CNN-LSTM model was observed to produce a lower prediction error on average and to take less time to train on the input datasets. The experimental results confirm the selection of a hybrid CNN-LSTM neural network as the predictive model for our proposed framework.

He received a B.S. degree in Computer Science from Usman Institute of Technology, affiliated with Hamdard University, Karachi, Pakistan, in 2017. Since November 2019, he has been with the Faculty of Engineering at Multimedia University, Malaysia, as an M.Eng.Sci. candidate. His current research interests include data analysis and data quality improvement. He previously worked as a software engineer for Golpik Inc., a software company.

He received a B.Tech. degree from Acharya Nagarjuna University, Andhra Pradesh, India, in 1990, an M.Tech. degree from G.B. Pant University of Agriculture & Technology, Uttar Pradesh, India, in 1993, and a Ph.D. from Multimedia University, Malaysia, and secured a grant from TM R&D Telekom Malaysia in 2019. Currently, he is working as a professor in the E.C.E. Department of Alliance College of Engineering and Design at Alliance University, Bangalore, India. His main research interests include VLSI, embedded systems, nanotechnology, memory optimization, low-power design, FPGA, and evolutionary algorithms.

He received a bachelor’s degree in Civil Engineering from UiTM and subsequently received an M.Sc. in Construction Management from Universiti Teknologi Malaysia (UTM). He was awarded a Ph.D. (Engineering) in Scientific and Engineering Simulation from the Nagoya Institute of Technology (NIT), Japan. He is currently appointed as a consultant on the project entitled "TMR Asynchronous V2V with NLoS Vehicular Sensing (V2V)", funded by TM R&D at Multimedia University.

- 1 N. H. Motlagh, M. Mohammadrezaei, J. Hunt, B. Zakeri, "Internet of Things (IoT) and the energy sector," *Energies*, 2020, vol. 13, no. 2, 2049. doi:10.3390/en1304
- 2 M. A. Rahman, A. T. Asyhari, "The emergence of Internet of Things (IoT): connecting anything, anywhere," vol. 8, 2019, no. 2, 2004. doi:10.3390/computers800
- 3 C. Paul, A. Ganesh, C. Sunitha, "An overview of IoT based smart homes," in *Proceedings of 2018 2nd International Conference on Inventive Systems and Control (ICISC)*, Coimbatore, India, 2018, pp. 43-46.
- 4 T. Banerjee, A. Sheth, "IoT quality control for data and application needs," *IEEE Intelligent Systems*, vol. 32, no. 2, pp. 68-73, 2017. doi:10.1109/MIS.2017.35
- 5 H. Kang, "The prevention and handling of the missing data," *Korean Journal of Anesthesiology*, vol. 64, no. 5, pp. 402-406, 2013.
- 6 A. N. Baraldi, C. K. Enders, "An introduction to modern missing data analyses," *Journal of School Psychology*, vol. 48, no. 1, pp. 5-37, 2010.
- 7 D. Sovilj, E. Eirola, Y. Miche, K. M. Bjork, R. Nian, A. Akusok, A. Lendasse, "Extreme learning machine for missing data using multiple imputations," *Neurocomputing*, vol. 174, pp. 220-231, 2016. doi:10.1016/j.neucom.2015.03.108
- 8 P. E. Bunney, A. N. Zink, A. A. Holm, C. J. Billington, C. M. Kotz, "Orexin activation counteracts decreases in nonexercise activity thermogenesis (NEAT) caused by high-fat diet," *Physiology & Behavior*, vol. 176, pp. 139-148, 2017.
- 9 J. Poulos, R. Valle, "Missing data imputation for supervised learning," *Applied Artificial Intelligence*, vol. 32, no. 2, pp. 186-196, 2018. doi:10.1080/08839514.2018.1448143
- 10 X. Xu, W. Chong, S. Li, A. Arabo, J. Xiao, "MIAEC: missing data imputation based on the evidence chain," *IEEE Access*, vol. 6, pp. 12983-12992, 2018. doi:10.1109/ACCESS.2018.2803755
- 11 I. Lana, I. I. Olabarrieta, M. Velez, J. Del Ser, "On the imputation of missing data for road traffic forecasting: new insights and novel techniques," *Transportation Research Part C: Emerging Technologies*, vol. 90, pp. 18-33, 2018.
- 12 S. Sridevi, S. Rajaram, C. Parthiban, S. SibiArasan, C. Swadhikar, "Imputation for the analysis of missing values and prediction of time series data," in *Proceedings of 2011 International Conference on Recent Trends Information Technology (ICRTIT)*, Chennai, India, 2011, pp. 1158-1163.
- 13 T. A. Mohamed, N. El Gayar, A. F. Atiya, in *ANNPR 2014: Artificial Neural Networks in Pattern Recognition*, Cham, Switzerland: Springer, pp. 93-104, 2014.
- 14 S. F. Wu, C. Y. Chang, S. J. Lee, "Time series forecasting with missing values," in *Proceedings of 2015 1st International Conference on Industrial Networks and Intelligent Systems (INISCom)*, Tokyo, Japan, 2015, pp. 151-156.
- 15 A. S. Dhevi, "Imputing missing values using inverse distance weighted interpolation for time series data," in *Proceedings of 2014 6th International Conference on Advanced Computing (ICoAC)*, Chennai, India, 2014, pp. 255-259.
- 16 E. P. Caillault, A. Lefebvre, A. Bigand, "Dynamic time warping-based imputation for univariate time series data," *Pattern Recognition Letters*, vol. 139, pp. 139-147, 2020. doi:10.1016/j.patrec.2017.08.019
- 17 N. Bokde, M. W. Beck, F. M. Alvarez, K. Kulat, "A novel imputation methodology for time series based on pattern sequence forecasting," *Pattern Recognition Letters*, vol. 116, pp. 88-96, 2018.
- 18 Z. Ding, G. Mei, S. Cuomo, Y. Li, N. Xu, "Comparison of estimating missing values in IoT time series data using different interpolation algorithms," *International Journal of Parallel Programming*, vol. 48, pp. 534-548, 2020.
- 19 A. Chaudhry, W. Li, A. Basri, F. Patenaude, "A method for improving imputation and prediction accuracy of highly seasonal univariate data with large periods of missingness," *Wireless Communications and Mobile Computing*, vol. 2019, no. 4039758, 2019. doi:10.1155//4039758
- 20 T. Kim, W. Ko, J. Kim, "Analysis and impact evaluation of missing data imputation in day-ahead PV generation forecasting," *Applied Sciences*, vol. 9, no. 1, 2019. doi:10.3390/app9010204
- 21 N. Al-Milli, W. Almobaideen, "Hybrid neural network to impute missing data for IoT applications," in *Proceedings of 2019 IEEE Jordan International Joint Conference on Electrical Engineering and Information Technology (JEEIT)*, Amman, Jordan, 2019, pp. 121-125.
- 22 K. Zor, O. Celik, O. Timur, H. B. Yildirim, A. Teke, "Simple approaches to missing data for energy forecasting applications," in *Proceedings of the 16th International Conference on Clean Energy (ICCE)*, Gazimagusa, Turkey, 2018.
- 23 X. Cao, S. Dong, Z. Wu, Y. Jing, "A data-driven hybrid optimization model for short-term residential load forecasting," in *Proceedings of 2015 IEEE International Conference on Computer and Information Technology; Ubiquitous Computing and Communications; Dependable, Autonomic and Secure Computing; Pervasive Intelligence and Computing*, Liverpool, UK, 2015, pp. 283-287.
- 24 T. Y. Kim, S. B. Cho, "Predicting residential energy consumption using CNN-LSTM neural networks," *Energy*, vol. 182, pp. 72-81, 2019.
- 25 X. Shao, C. S. Kim, P. Sontakke, "Accurate deep model for electricity consumption forecasting using multi-channel and multi-scale feature fusion CNN–LSTM," *Energies*, vol. 13, no. 8, 2020. doi:10.3390/en13081881
- 26 J. Du Preez, S. F. Witt, "Univariate versus multivariate time series forecasting: an application to international tourism demand," *International Journal of Forecasting*, vol. 19, no. 3, pp. 435-451, 2003.
- 27 K. Yan, X. Wang, Y. Du, N. Jin, H. Huang, H. Zhou, "Multi-step short-term power consumption forecasting with a hybrid deep learning strategy," *Energies*, vol. 11, no. 11, 2018. doi:10.3390/en11113089
- 28 M. Massaoudi, S. S. Refaat, I. Chihi, M. Trabelsi, H. Abu-Rub, F. S. Oueslati, "Short-term electric load forecasting based on data-driven deep learning techniques," in *Proceedings of the 46th Annual Conference of the IEEE Industrial Electronics Society (IECON)*, Singapore, 2020, pp. 2565-2570.
- 29 M. Li, M. Soltanolkotabi, and S. Oymak, 2019 (Online). Available: https://arxiv.org/abs/1903.11680
- 30 C. Nichiforov, I. Stamatescu, I. Fagarasan, G. Stamatescu, "Energy consumption forecasting using ARIMA and neural network models," in *Proceedings of 2017 5th International Symposium on Electrical and Electronics Engineering (ISEEE)*, Galati, Romania, 2017, pp. 1-4.
- 31 F. Kaytez, M. C. Taplamacioglu, E. Cam, F. Hardalac, "Forecasting electricity consumption: a comparison of regression analysis, neural networks and least squares support vector machines," *International Journal of Electrical Power & Energy Systems*, vol. 67, pp. 431-438, 2015.