Ji Su Park* and Jong Hyuk Park**
Enhanced Machine Learning Algorithms: Deep Learning, Reinforcement Learning, and Q-Learning
Abstract: In recent years, machine learning algorithms have been continuously used and extended in various fields, such as facial recognition, signal processing, personal authentication, and stock prediction. In particular, algorithms such as deep learning, reinforcement learning, and Q-learning are continuously being improved. Among these algorithms, deep learning is expanding particularly rapidly. Nevertheless, machine learning algorithms have not yet been applied in several fields, such as personal authentication technology, which is an essential tool in the digital information era; gait recognition technology, which is a promising biometric; and technology for solving state-space problems. Therefore, this paper examines how deep learning, reinforcement learning, and Q-learning, which are typical machine learning algorithms, are being improved and extended in various fields, such as agricultural technology, personal authentication, wireless networks, games, biometric recognition, and image recognition.
Keywords: Deep Learning, Machine Learning, Reinforcement Learning, Q-Learning
1. Introduction
In recent years, with the advancement of machine learning, studies on machine learning algorithms, such as deep learning, reinforcement learning, and Q-learning, have been continuously developing. Machine learning is one of the oldest technologies, but its development continued until the 1990s. However, machine learning has made remarkable progress in recent years owing to deep learning, and it continues to improve. In deep learning, artificial neural networks designed in a multilayered structure are pre-trained through unsupervised learning. Therefore, learning can be performed effectively even when the neural network is deepened. Deep learning has produced many achievements in the fields of image recognition, speech recognition, and translation. It is now used in various fields to solve complex problems that posed difficulties in early research.
Reinforcement learning is an area of machine learning. In a given environment, an agent (a computer program that is the subject of reinforcement learning) recognizes the current state and selects, among the selectable actions, an action or action sequence that maximizes reward. Such learning is used in various fields, such as game theory, control theory, information theory, simulation optimization, and genetic algorithms. Q-learning, one of the reinforcement learning techniques, is a model-free learning method. It learns the optimal policy by learning the Q-function in a finite Markov decision process. The Q-function predicts the expected utility of taking a given action in a given state.
This paper considers the improvement and extension of algorithms in a variety of fields, including agriculture, personal authentication, wireless networks, games, biometrics, and image recognition. Accordingly, it comprehensively discusses the following technologies: improved defect detection algorithm, survey of deep learning in agriculture techniques, next-generation personal authentication scheme, rate adaptation with Q-learning, evaluation of operational security, reinforcement learning activation functions, fast leaf recognition and retrieval, intelligent management and service, gait recognition, intelligent on-demand routing protocol, Thangka image inpainting algorithm, classroom roll-call system, localization algorithm, hybrid genetic ant colony optimization (ACO) algorithm, visual saliency detection, heuristic and statistical prediction algorithms, object augmentation scheme, and dynamic action space handling method. This paper mainly aims to give researchers quick access to popular and current research.
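As an illustration of how a Q-function can be learned without a model, the following is a minimal sketch of the standard tabular Q-learning update. The state names, action names, and the values of the learning rate `alpha` and discount factor `gamma` are hypothetical and are not taken from any of the papers discussed here.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Apply one tabular Q-learning update and return the new Q(state, action)."""
    # Best value currently expected from the next state (0.0 for unseen pairs).
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future value.
    target = reward + gamma * best_next
    # Move the current estimate a fraction alpha toward the target.
    q[(state, action)] += alpha * (target - q[(state, action)])
    return q[(state, action)]


# Hypothetical two-state, two-action example (names are illustrative only).
q_table = defaultdict(float)
ACTIONS = ["left", "right"]
first = q_update(q_table, "s0", "right", reward=1.0, next_state="s1",
                 actions=ACTIONS, alpha=0.5)
second = q_update(q_table, "s0", "right", reward=1.0, next_state="s1",
                  actions=ACTIONS, alpha=0.5)
```

Repeating the same rewarded transition moves the estimate for ("s0", "right") step by step toward the reward, which is the sense in which the Q-function comes to predict the expected utility of an action in a state.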
2. Enhanced Machine Learning Algorithms
Ma et al. proposed an improved defect detection algorithm for jean fabric based on an optimized Gabor filter with two-dimensional (2D) image entropy and a loss evaluation function. Two algorithms are compared to verify the effectiveness of the proposed algorithm. The paper selected common denim fabric sample images, which include a normal image, weft missing, hole breaking, and oil pollution. The experimental results showed that the average detection rate for common denim defects is more than 91.25%.
Ren et al. surveyed deep neural network-based work in the agriculture domain over the last 5 years and investigated 32 research contributions that apply deep learning techniques to this domain. The survey covered different types of deep neural network architectures in agriculture and the current state-of-the-art methods. The authors found that deep learning performed better than other technologies and concluded that deep learning will receive more attention and broader application in future research.
Yang proposed an electroencephalography (EEG)-based password system as a next-generation personal authentication system. The paper pointed out that such a system is difficult to use in practice due to the limitations of current brain-computer interface technology, and argued that a practical EEG-based authentication system should be constructed using dry electrodes. However, the technical limitations of interpreting EEG signals acquired with dry electrodes were regarded as an obstacle to the use of the EEG-based encryption system. To overcome this problem, the paper combined EEG signals with a deep learning technique, namely multinomial classification with one-hot encoding.
Cho proposed a reinforcement learning agent based on Q-learning to control the data transmission rates of nodes in carrier sensing multiple access with collision avoidance (CSMA/CA)-based wireless networks. The paper used the ns3-gym framework to simulate reinforcement learning and investigated the effects of the parameters of Q-learning on the performance of the reinforcement learning agent. The experimental results showed that the proposed reinforcement learning agent adequately adjusts the modulation and coding scheme (MCS) levels according to changes in the network, and achieves a high throughput comparable to that of existing data transmission rate adaptation schemes.
Wang et al. proposed a method for evaluating the safe operation of high-speed rail passenger stations based on fuzzy analysis of interval numbers, which combines the interval eigenvalue method and the fuzzy analysis method. Interval numbers were used to express the evaluation indices in the evaluation process and to reduce the uncertainty and ambiguity of the evaluation elements. In this way, the objectivity and reliability of the evaluation results were improved, and a new idea was provided for evaluating the operational safety management of high-speed rail passenger stations.
Lee proposed an agent that uses a reinforcement learning algorithm and a neural network to evaluate which activation function gives the best result when the agent learns a game through reinforcement learning in a 2D racing game environment. The paper evaluated the activation functions by switching them in the network and measured the reward, the output of the advantage function, and the output of the loss function during training and testing. The experimental results identified the best activation function for the agent to learn the game, and the difference between the best and the worst was 35.4%.
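For readers unfamiliar with Gabor filters, the following minimal sketch builds a real-valued Gabor kernel from the textbook definition (a Gaussian envelope multiplied by a cosine carrier). It does not reproduce the parameter optimization by 2D image entropy and loss evaluation described in the paper; the function name and all parameter values are assumptions chosen only for illustration.

```python
import math


def gabor_kernel(size, sigma, theta, lambd, gamma=0.5, psi=0.0):
    """Return a size x size real Gabor kernel as a list of rows (size should be odd)."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Rotate the coordinates by the filter orientation theta.
            xr = x * math.cos(theta) + y * math.sin(theta)
            yr = -x * math.sin(theta) + y * math.cos(theta)
            # Gaussian envelope with aspect ratio gamma.
            envelope = math.exp(-(xr ** 2 + (gamma * yr) ** 2) / (2.0 * sigma ** 2))
            # Cosine carrier with wavelength lambd and phase offset psi.
            carrier = math.cos(2.0 * math.pi * xr / lambd + psi)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


# Illustrative parameter values only.
kernel = gabor_kernel(size=5, sigma=2.0, theta=0.0, lambd=4.0)
```

Convolving a fabric image with a bank of such kernels at several orientations and wavelengths yields texture responses in which irregular regions can stand out; how those parameters are selected is the part the paper optimizes.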
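The combination of one-hot encoding and multinomial classification mentioned above can be sketched as follows. This is a generic illustration of one-hot targets with a softmax output and cross-entropy loss, not the network used in the paper; the number of classes and the logits are hypothetical.

```python
import math


def one_hot(label, num_classes):
    """Encode an integer class label as a one-hot vector."""
    vec = [0.0] * num_classes
    vec[label] = 1.0
    return vec


def softmax(logits):
    """Convert raw scores to class probabilities (shifted for numerical stability)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, probs):
    """Cross-entropy between a one-hot target and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0.0)


# Hypothetical 4-class example with uninformative (all-equal) logits.
target = one_hot(2, 4)
probs = softmax([0.0, 0.0, 0.0, 0.0])
loss = cross_entropy(target, probs)
```

With all-equal logits, each class receives probability 0.25, so the loss equals ln 4; training lowers this value by raising the probability of the class marked in the one-hot target.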
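A Q-learning agent of this kind has to trade off exploring different MCS levels against exploiting the level that currently looks best. One common way to do this is epsilon-greedy selection, sketched below. Whether the paper uses this exact policy is an assumption; the Q-values and the function name `choose_mcs` are hypothetical.

```python
import random


def choose_mcs(q_row, epsilon, rng):
    """Pick an MCS index: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        # Exploration: try a uniformly random MCS level.
        return rng.randrange(len(q_row))
    # Exploitation: take the level with the highest estimated value.
    return max(range(len(q_row)), key=lambda i: q_row[i])


# Hypothetical Q-values for three MCS levels in the current network state.
rng = random.Random(0)
q_row = [0.2, 0.9, 0.5]
greedy_choice = choose_mcs(q_row, epsilon=0.0, rng=rng)
explore_choice = choose_mcs(q_row, epsilon=1.0, rng=rng)
```

With `epsilon=0.0` the agent always picks the highest-valued level (index 1 here), whereas with `epsilon=1.0` it always picks at random; intermediate values are among the Q-learning parameters whose effect on performance can be studied.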
Xu and Zhang proposed a novel multi-scale angular description method that uses the golden ratio for fast plant leaf recognition and retrieval. Plant leaf species recognition from images is difficult because of the large intra-class variation and the small distances between different species. Thus, the authors developed an angular description method for leaf contours that uses a new scale generation rule to address this difficulty. The evaluation results show that the proposed method has a faster computation time than the existing method and has high recognition and retrieval accuracy.
Wang et al. introduced a design scheme for an Internet-based intelligent gas valve management and service system. The design adds a sensor and a General Packet Radio Service (GPRS) module to the existing gas valve, and adds a networking function between the gas valve and the server using wireless packet communication technology. The authors suggested that the proposed method is more convenient and efficient than existing gas valve management and service projects.
Wen proposed an improved convolutional neural network (CNN) based on the Gabor filter for gait recognition. A Gabor filter-based gait feature extraction layer was inserted into an existing CNN and used to extract gait features from gait silhouette images. A metric learning technique was used to calculate the distance between two gaits, and a k-nearest neighbor (KNN) classifier was used to classify the gaits. The experimental results showed that the proposed method reaches state-of-the-art performance in terms of correct recognition rate on the OULP and CASIA-B datasets.
Ye et al. pointed out that node characteristics in ad hoc networks, such as random movement, limited energy, self-protection, and limited transmission capacity, have a significant influence on the communication link configuration of the network, and proposed IAODV, an intelligent decision routing strategy based on the AODV algorithm. To compare the IAODV and AODV algorithms, three evaluation indices were adopted: packet loss rate, packet transfer rate, and routing overhead. The experimental results show that IAODV has better performance and better adaptability in ad hoc networks with frequent node mobility.
The Thangka image is a Buddhist painting produced as a hanging painting in the form of a scroll. The inpainting methods used for Thangka images are not ideal for restoring contour curves from high-frequency information. Therefore, Yao proposed a new image inpainting algorithm that uses edge structural constraints and wavelet transform coefficients. This algorithm separates the damaged Thangka image into low- and high-frequency subgraphs by using the wavelet transform. The low-frequency subgraphs are restored using an improved fast marching method. In addition, the extracted and repaired edge contour information is used to constrain structure inpainting of the high-frequency subgraphs. The author shows that the inpainting accuracy is superior by comparing the improved method with three existing methods.
Zhu et al. proposed a new face recognition model based on game theory for classroom roll call by using CNNs, which have outstanding performance in the field of face recognition. This model uses multiple face images as inputs and constructs a student identity list by identifying each face with a confidence score. The optimization goal is then determined by tracking the faces with the same identity or low confidence. The established reliability value and the authentication strategy for IDs are used to increase recognition accuracy. The error rate is markedly reduced by using this method compared with the existing schemes based on deep neural networks.
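The final classification step described above can be illustrated with a minimal k-nearest neighbor sketch. Plain Euclidean distance is used here as a stand-in for the learned metric in the paper, and the 2D feature vectors and subject labels are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return the majority label among the k training vectors closest to query."""
    # Sort the training samples by their distance to the query feature vector.
    by_distance = sorted(train, key=lambda item: math.dist(item[0], query))
    # Vote among the labels of the k nearest samples.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Hypothetical gait feature vectors for two subjects, "A" and "B".
train = [
    ((0.0, 0.0), "A"), ((0.0, 1.0), "A"), ((1.0, 0.0), "A"),
    ((5.0, 5.0), "B"), ((5.0, 6.0), "B"), ((6.0, 5.0), "B"),
]
pred_a = knn_predict(train, (0.5, 0.5), k=3)
pred_b = knn_predict(train, (5.5, 5.5), k=3)
```

In practice the feature vectors would come from the CNN and the distance from the learned metric, but the voting logic stays the same.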
In a distance vector hop (DV-Hop) wireless sensor network, if wireless sensor node A can communicate directly with nodes B and C, then the hop counts from A to B and from A to C are considered to be the same. However, the actual distance between nodes A and B may differ from that between nodes A and C. This causes a difference between the actual distance and the distance expected from the hop count, thereby resulting in a node location error in the distance vector hop wireless sensor network. Accordingly, Zhao and Zhang reduced the error rate by modifying the distance estimation method used to determine the location of the wireless sensor node. Specifically, different communication capabilities are initially set for each node, and the difference between the expected and actual distances between anchor nodes of the wireless sensor network is calculated.
The ant colony optimization (ACO) algorithm is a classic metaheuristic optimization algorithm, but it is easily trapped in local minima and has a slow convergence speed. Wang et al. proposed a new hybrid ACO algorithm, namely CG-ACO, to address these problems. CG-ACO incorporates genetic algorithms and cloud models into ACO to find an initial solution with good performance and optimal parameters. In the experimental results, the proposed CG-ACO shows superior performance to other algorithms.
Truong and Kim proposed a threshold-based approach to detect saliency in infrared images. Visual saliency detection is essential in many vision-based applications, but the number of saliency detection methods for infrared images is limited. In the proposed approach, the input image is thresholded into several Boolean maps, and the initial saliency map is calculated as the weighted sum of the generated Boolean maps. Morphological operations and Gaussian filters are then used to refine the initial map and produce the final high-quality saliency map.
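To make the source of the error concrete, the following sketch shows the distance estimate of the classic distance vector hop scheme, in which an anchor derives an average hop size from its known distances and hop counts to other anchors. This is the baseline that the paper modifies, not the authors' improved method; the coordinates and hop counts are hypothetical.

```python
import math


def average_hop_size(anchor, others):
    """Classic hop-size estimate for one anchor node.

    others is a list of (position, hop_count) pairs for the other anchors.
    """
    # Sum of true anchor-to-anchor distances divided by the sum of hop counts.
    total_distance = sum(math.dist(anchor, pos) for pos, _ in others)
    total_hops = sum(hops for _, hops in others)
    return total_distance / total_hops


def estimate_distance(hop_size, hops_to_anchor):
    """Distance an unknown node assumes to an anchor that is hops_to_anchor away."""
    return hop_size * hops_to_anchor


# Hypothetical layout: anchors at (30, 40) and (0, 30), reached in 5 and 3 hops.
hop_size = average_hop_size((0.0, 0.0), [((30.0, 40.0), 5), ((0.0, 30.0), 3)])
estimated = estimate_distance(hop_size, 2)
```

Because every hop is assigned the same length (10 units here), two neighbors at different physical distances receive the same estimate, which is the discrepancy between hop-count distance and actual distance described above.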
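For background, the two core rules of the classic ant system, on which ACO variants build, can be sketched as follows: a probabilistic choice of the next move weighted by pheromone and heuristic desirability, and a pheromone update with evaporation. This is not the CG-ACO algorithm itself; the parameter names `alpha`, `beta`, and `rho` follow common usage, and all values are hypothetical.

```python
def transition_probabilities(pheromone, heuristic, alpha=1.0, beta=2.0):
    """Probability of choosing each candidate edge in the classic ant system."""
    # Each edge is weighted by pheromone^alpha * heuristic^beta.
    weights = [(t ** alpha) * (h ** beta) for t, h in zip(pheromone, heuristic)]
    total = sum(weights)
    return [w / total for w in weights]


def evaporate_and_deposit(pheromone, deposits, rho=0.5):
    """Evaporate a fraction rho of the pheromone, then add the new deposits."""
    return [(1.0 - rho) * t + d for t, d in zip(pheromone, deposits)]


# Hypothetical example with three candidate edges.
probs = transition_probabilities([1.0, 1.0, 1.0], [1.0, 2.0, 1.0])
updated = evaporate_and_deposit([1.0, 1.0, 1.0], [0.0, 1.0, 0.0])
```

Since edges that already look good attract more ants and therefore more pheromone, the search can lock onto a local optimum early, which is the weakness that hybrid approaches aim to mitigate.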
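The idea of combining several thresholded Boolean maps into an initial map can be sketched as below. Equal weights are assumed here because the weighting used in the paper is not given in this summary, and the refinement by morphological operations and Gaussian filtering is omitted; the image and thresholds are hypothetical.

```python
def boolean_map_sum(image, thresholds, weights=None):
    """Weighted sum of Boolean maps obtained by thresholding a grayscale image."""
    if weights is None:
        # Assumption: equal weights when none are supplied.
        weights = [1.0 / len(thresholds)] * len(thresholds)
    height, width = len(image), len(image[0])
    result = [[0.0] * width for _ in range(height)]
    for thr, w in zip(thresholds, weights):
        for r in range(height):
            for c in range(width):
                # A pixel is "on" in this Boolean map if it exceeds the threshold.
                if image[r][c] > thr:
                    result[r][c] += w
    return result


# Hypothetical 2 x 2 infrared image with intensities in the range 0-255.
image = [[10, 200], [120, 30]]
initial_map = boolean_map_sum(image, thresholds=[50, 100, 150])
```

Pixels that stay above many thresholds, such as the hot pixel with intensity 200, accumulate the highest score, whereas background pixels remain at zero.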
Accurate prediction technology is important because algorithms that predict the behavior of residents in smart spaces have recently been used to provide smart services. Malik et al. surveyed prediction algorithms, covering statistical, heuristic, and hybrid techniques for prediction in smart environments, such as smart homes, farms, and health.
Jang and Ko proposed a real-time mobile augmented reality (AR) framework. In the Internet of Things (IoT) environment, pervasive AR technology has recently been used to efficiently search for necessary information regarding in-store products. However, if the augmented object is modified, then such an object cannot be replaced in real time. In the proposed framework, after the optimal object to be augmented is selected based on object similarity comparison, the augmented object is efficiently managed by using a distributed metadata service according to user requirements. As a result, the proposed framework provides a better user experience than the existing functions.
Woo and Sung proposed a reinforcement learning method that dynamically adjusts the action space, which can shorten the learning time and reduce the state space. This method does not allocate state space for actions that are not performed or that are learned to be non-optimal. Moreover, this method addresses the state-space problem by applying deep reinforcement learning. The experimental results reveal that the state space of the proposed method is reduced by approximately 0.33% of that of Q-learning. However, the existing Q-learning reduced the cost and time required for learning with similar results.
3. Conclusion
This paper features 18 high-quality articles following a rigorous review process. This paper reviewed the technologies developed in various research fields, such as improved defect detection algorithm, survey of deep learning in agriculture techniques, next-generation personal authentication scheme, rate adaptation with Q-learning, evaluation of operational security, reinforcement learning activation functions, fast leaf recognition and retrieval, intelligent management and service, gait recognition, intelligent on-demand routing protocol, Thangka image inpainting algorithm, classroom roll-call system, localization algorithm, hybrid genetic ACO algorithm, visual saliency detection, heuristic and statistical prediction algorithms, object augmentation scheme, and dynamic action space handling method.
Ji Su Park https://orcid.org/0000-0001-9003-1131
He received his B.S. and M.S. degrees in Computer Science from Korea National Open University, Korea, in 2003 and 2005, respectively, and his Ph.D. degree in Computer Science Education from Korea University in 2013. He is currently a Professor in the Department of Computer Science and Engineering at Jeonju University in Korea. His research interests are in grid computing, mobile cloud computing, cloud computing, distributed systems, computer education, and AIoT. He serves as an associate editor of Human-centric Computing and Information Sciences (HCIS) by Springer and The Journal of Information Processing Systems (JIPS) by KIPS. He has also served as the chair, program committee chair, or organizing committee chair at international conferences and workshops. He has received "best paper" awards from the CSA2018 conferences and "outstanding service" awards from CUTE2019 and BIC2020.
James J. (Jong Hyuk) Park https://orcid.org/0000-0003-1831-0309
He received Ph.D. degrees from the Graduate School of Information Security, Korea University, Korea and the Graduate School of Human Sciences of Waseda University, Japan. Dr. Park served as a research scientist at the R&D Institute, Hanwha S&C Co. Ltd., Korea from December 2002 to July 2007, and as a professor at the Department of Computer Science and Engineering, Kyungnam University, Korea from September 2007 to August 2009. He is currently employed as a professor at the Department of Computer Science and Engineering and the Department of Interdisciplinary Bio IT Materials, Seoul National University of Science and Technology (SeoulTech), Korea. Dr. Park has published about 200 research papers in international journals and conferences. He has also served as the chair, program committee chair, or organizing committee chair at many international conferences and workshops. He is a founding steering chair of various international conferences including MUE, FutureTech, CSA, UCAWSN, etc. He is employed as editor-in-chief of Human-centric Computing and Information Sciences (HCIS) by Springer, The Journal of Information Processing Systems (JIPS) by KIPS, and the Journal of Convergence (JoC) by KIPS CSWRG. He is also the associate editor or editor of fourteen international journals, including eight journals indexed by SCI(E). In addition, he has been employed as a guest editor for various international journals by such publishers as Springer, Elsevier, Wiley, Oxford University Press, Hindawi, Emerald, and Inderscience. Dr. Park's research interests include security and digital forensics, human-centric ubiquitous computing, context awareness, and multimedia services. He has received "best paper" awards from the ISA-08 and ITCS-11 conferences and "outstanding leadership" awards from IEEE HPCC-09, ICA3PP-10, IEEE ISPA-11, and PDCAT-11. Furthermore, he received an "outstanding research" award from SeoulTech in 2014. Also, Dr. Park's research interests include human-centric ubiquitous computing, vehicular cloud computing, information security, digital forensics, secure communications, multimedia computing, etc. He is a member of the IEEE, IEEE Computer Society, KIPS, and KMMS.