Article Information
Corresponding Author: Wei Song* (sw@ncut.edu.cn)
Wei Song*, Dept. of Digital Media Technology, North China University of Technology, Beijing, China, sw@ncut.edu.cn
Shuanghui Zou*, Dept. of Digital Media Technology, North China University of Technology, Beijing, China, shuanghui_zou@sina.com
Yifei Tian**, Dept. of Computer and Information Science, University of Macau, Macau, China, tianyifei0000@sina.com
Su Sun*, Dept. of Digital Media Technology, North China University of Technology, Beijing, China, sunsu71@foxmail.com
Simon Fong**, Dept. of Computer and Information Science, University of Macau, Macau, China, ccfong@umac.mo
Kyungeun Cho***, Dept. of Multimedia Engineering, Dongguk University, Seoul, Korea, cke@dongguk.edu
Lvyang Qiu***, Dept. of Multimedia Engineering, Dongguk University, Seoul, Korea, lvyang_Qiu@foxmail.com
Received: August 31, 2018
Revision received: October 2, 2018
Accepted: October 22, 2018
Published (Print): December 31, 2018
Published (Electronic): December 31, 2018
1. Introduction
Unmanned driving, which integrates visual computing, localization, and artificial intelligence technologies, allows computers to operate motor vehicles automatically, safely, and reliably. Through its mounted sensors, an unmanned ground vehicle (UGV) can perceive its surrounding environment and obtain vehicle location, terrain, and obstacle information. In this way, the vehicle maintains a safe distance from obstacles and avoids traffic accidents, ensuring the safety of urban residents.
Environment perception and reconstruction studies apply multi-sensor fusion technology to provide driving awareness for the navigation of mobile robots in unknown environments. Traditionally, researchers have used video sequences or stereo images to sense the color and depth information of the surrounding environment for terrain perception. Because video cameras are easily affected by changes in illumination and are limited by their resolution, the perception accuracy is unstable, especially in poorly lit environments. Laser scanners, which are not affected by the illumination conditions in an environment, achieve higher accuracy in measuring three-dimensional (3D) distances and perceiving object shapes, but they do not provide color information. Thus, the integration of laser scanners and video cameras has been widely researched in the domain of 3D terrain perception and reconstruction [1].
Owing to the large size of sensed 3D point clouds, it is difficult to model the surrounding environment in real time using traditional CPU-based terrain reconstruction techniques [2]. Considering the real-time performance, accuracy, and intuitiveness of 3D terrain reconstruction, in this paper we propose a CPU-GPU hybrid system of environment perception and 3D reconstruction for UGVs based on the integration of multiple sensors. After the original datasets, including 3D point clouds, image sequences, and motion information, are collected, intrinsic and extrinsic calibration processes are implemented to estimate the sensors' measurement parameters, positions, and rotation states. The local coordinates of the point clouds are transformed into global coordinates in the terrain model based on localization performed by an inertial measurement unit (IMU). A connected component labeling algorithm is applied to cluster objects as a driving awareness interface for path planning in urban environments [3]. The terrain model is composed of a texture mesh and colored particles, which represent the ground surface and non-ground objects, respectively. In addition, GPU-based parallel programming technology is utilized to increase the computing speed of the environment perception and reconstruction tasks.
The remainder of this paper is organized as follows. Section 2 provides an overview of related work. Section 3 introduces the proposed CPU-GPU hybrid system of environment perception and 3D reconstruction for UGV. Section 4 evaluates the performance of the proposed methods. Section 5 concludes the paper.
2. Related Work
Currently, multiple sensors are widely applied in 3D terrain reconstruction to collect environment information for UGVs. Sukumar et al. [4] proposed a 3D terrain reconstruction system that combined 3D laser scanners, two-dimensional (2D) cameras, a global positioning system (GPS), and an IMU to perceive and reconstruct the surrounding terrain. By separating raw datasets into ground surface and non-ground objects, Huber et al. [5] created a colored ground mesh and colored point clouds to represent a photorealistic terrain model in a virtual environment.
To register large-scale point clouds within limited memory, Elseberg et al. [6] utilized an octree to store and compress massive data. Redundant points between successive point clouds were removed effectively, enabling large-scale terrain reconstruction with limited storage capacity. However, as mobile robots continue to explore outdoor environments, the terrain model keeps expanding and eventually exceeds the capacity of the octree data structure. Gingras et al. [7] applied a mesh simplification algorithm to reduce the number of grids in large-scale ground models. By reconstructing unstructured terrain models with an irregular triangular mesh, large-scale point clouds were registered into the terrain model effectively. However, these methods performed terrain reconstruction point by point, which made it very difficult to achieve real-time terrain reconstruction owing to low computational speed.
Object segmentation and tracking is the primary task of the intelligent cognition process, and it provides a reliable basis for intelligent obstacle avoidance and path planning. Golovinskiy and Funkhouser [8] applied a k-nearest neighbors graph algorithm to separate the foreground and background. By using a min-cut algorithm, objects were segmented from the point clouds. However, it was difficult to update the k-nearest neighbors graph in real time when large-scale point clouds were sensed and registered into the terrain model. To actualize real-time object segmentation, Douillard et al. [9] presented a segmentation strategy based on voxel grids of 3D points. In this strategy, a terrain mesh was first constructed from the 3D point clouds, in which ground points were extracted by computing the gradient field of the mesh model. Thereafter, a cluster-all method was applied to separate and classify the non-ground points. Himmelsbach et al. [10] proposed a bottom-up method to segment objects, in which fitted line segments searched from the point clouds were utilized to determine the ground points. Subsequently, non-ground points were segmented into individual clusters based on their spatial connectivity. Wang et al. [11] voxelized non-ground points and clustered interconnected voxels as target objects. They utilized the principal component analysis algorithm to determine the eigenvectors of voxel clusters instead of discrete non-ground points. The eigenvectors and eigenvalues described the shape features, thus reducing the memory consumption of object description. To achieve real-time object segmentation and tracking, Wang et al. [12] first projected LiDAR (light detection and ranging) point clouds collected by mobile unmanned vehicles onto a rasterized horizontal plane to conduct segmentation. Then, a support vector machine (SVM) was exploited to classify objects, and a Kalman filter algorithm was implemented to track the different objects. Broggi et al. [13] used a stereo camera to capture point clouds surrounding an unmanned car, and these point clouds were grouped into several clusters by using a flood fill method. A linear Kalman filter was employed to analyze the movement and posture of obstacles, which were classified as moving or stationary. The iterative and traversal computation required to search for neighboring points and analyze the large-scale datasets reduced the computational speed of these methods [14].
In view of these shortcomings of existing 3D terrain perception and reconstruction methods, in the present work we describe a CPU-GPU hybrid system of environment perception and 3D reconstruction that meets the real-time and intuitiveness requirements of UGVs.
3. The Environment Perception and 3D Reconstruction System
Herein, we propose an environment perception and 3D reconstruction system based on multi-sensor integration. The multi-sensor datasets, including 3D point clouds, 2D images, and the motion information of the mobile vehicle, are processed and integrated to realize environment perception and 3D terrain reconstruction. As shown in Fig. 1, the system contains three main modules: multi-sensor data collection and pre-processing, environment perception, and 3D reconstruction. The multi-sensor data collection and pre-processing module implements the collection, processing, and fusion of multi-sensor terrain data. The environment perception module uses the LiDAR and gyroscope to filter the traversable ground surface and detect non-ground obstacles in the surrounding environment. The 3D reconstruction module performs real-time, high-definition reconstruction of the ground and non-ground objects by using a texture mesh model and a colored particle model to represent the real environment intuitively in a virtual environment.
Fig. 1. The proposed system framework of environment perception and reconstruction.
3.1 Multi-Sensor Data Collection and Pre-processing Module
In the multi-sensor data collection and pre-processing module, the system obtains global environment data of the unmanned vehicle. Owing to inconsistencies in the spatial position and rotation angle between the LiDAR and the cameras, direct projection of the point clouds onto the video images leads to distorted mapping. By using a calibration board, calibration between the cameras and the LiDAR adjusts the projection matrix to provide a 3D terrain model with accurate color information.
After calibrating the multiple sensors, the system utilizes them to collect raw datasets for real-time environment perception and large-scale 3D terrain reconstruction. The local coordinates of the point clouds are converted into global coordinates based on the position and motion datasets of the UGV, which are sensed by the IMU. By using the translation and rotation matrices, the 3D point clouds in different coordinate systems are transformed into a global coordinate system. The areas repeatedly scanned by the LiDAR and cameras introduce redundant data, leading to considerable waste of storage resources. Thus, redundant points and noise are removed following a deduplication principle: in a repeatedly covered voxel, only one point is registered in the terrain model, so that 3D reconstruction is realized with low memory consumption.
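As a concrete illustration of this registration step, the following minimal CUDA sketch transforms points from the vehicle-local frame to the global frame using an IMU-derived pose and then keeps only one point per voxel. The pose layout, the 0.1 m voxel size, the spatial-hash key, and all identifiers are illustrative assumptions rather than the system's actual implementation.

```cuda
// Illustrative sketch only (assumed names and parameters, not the authors' code).
#include <cmath>
#include <cstdio>
#include <vector>
#include <unordered_set>
#include <cuda_runtime.h>

struct Pose { float R[9]; float t[3]; };   // row-major rotation matrix and translation from the IMU

// g = R * p + t : convert one LiDAR point from the local frame to the global frame.
__global__ void localToGlobal(const float3* local, float3* global, int n, Pose pose)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float3 p = local[i];
    global[i].x = pose.R[0]*p.x + pose.R[1]*p.y + pose.R[2]*p.z + pose.t[0];
    global[i].y = pose.R[3]*p.x + pose.R[4]*p.y + pose.R[5]*p.z + pose.t[1];
    global[i].z = pose.R[6]*p.x + pose.R[7]*p.y + pose.R[8]*p.z + pose.t[2];
}

// CPU-side deduplication: only the first point falling into a voxel is registered.
std::vector<float3> deduplicate(const std::vector<float3>& pts, float voxel)
{
    std::unordered_set<long long> seen;   // simple spatial hash; collisions are tolerated in this sketch
    std::vector<float3> kept;
    for (const float3& p : pts) {
        long long ix = (long long)std::floor(p.x / voxel);
        long long iy = (long long)std::floor(p.y / voxel);
        long long iz = (long long)std::floor(p.z / voxel);
        long long key = (ix * 73856093LL) ^ (iy * 19349663LL) ^ (iz * 83492791LL);
        if (seen.insert(key).second) kept.push_back(p);
    }
    return kept;
}

int main()
{
    // Two nearly coincident points and one distant point in the local frame.
    std::vector<float3> h_local = { {1.00f, 0.f, 2.00f}, {1.02f, 0.f, 2.03f}, {5.f, 0.f, -1.f} };
    int n = (int)h_local.size();
    Pose pose = { {1,0,0, 0,1,0, 0,0,1}, {10.f, 0.f, 3.f} };   // identity rotation, pure translation

    float3 *d_local, *d_global;
    cudaMalloc(&d_local,  n * sizeof(float3));
    cudaMalloc(&d_global, n * sizeof(float3));
    cudaMemcpy(d_local, h_local.data(), n * sizeof(float3), cudaMemcpyHostToDevice);
    localToGlobal<<<(n + 255) / 256, 256>>>(d_local, d_global, n, pose);

    std::vector<float3> h_global(n);
    cudaMemcpy(h_global.data(), d_global, n * sizeof(float3), cudaMemcpyDeviceToHost);
    std::vector<float3> registered = deduplicate(h_global, 0.1f);   // 0.1 m voxel: assumed resolution
    printf("registered %zu of %d points\n", registered.size(), n);  // the duplicated point is dropped
    cudaFree(d_local);  cudaFree(d_global);
    return 0;
}
```

Splitting the work in this way, with the per-point transform on the GPU and the voxel bookkeeping on the CPU, is only one possible division of labor; a fully GPU-resident hash table would serve the same deduplication purpose.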
3.2 Environment Perception Module
The environment perception module utilizes the data collected by the mounted LiDAR and IMU to realize environment perception, which involves the following functions: ground and non-ground segmentation, non-ground object clustering, spatial distribution feature extraction of non-ground objects, and driving awareness map generation. Fig. 2 presents a sequential diagram of the proposed environment perception module, in which the data flow and functions of the CPU and GPU procedures are shown. The corresponding variables are explained in Table 1.
Fig. 2. CPU-GPU sequential diagram of the proposed environment perception module.
First, the converted global point clouds stored in CPU memory are acquired. By using a height threshold, the non-ground points are separated and copied to GPU memory to implement object clustering in parallel. The non-ground points stored in GPU memory are projected onto the x-z plane to generate a 2D histogram map through a rasterizing process. The histogram map counts the non-ground sensing points above the area covered by each grid cell in the map [15]. A flag map is initialized to mark the validity of the point counts recorded in each cell of the histogram map. The label map is initialized and updated to indicate connected components of grid cells in the histogram map through several iterations in GPU memory. Thus, from the clustering result in the label map, point labels are obtained by inverse mapping to identify individual objects in the environment. The point labels are copied from GPU to CPU memory as reference information about the non-ground objects, which are rendered in the 3D reconstruction module. In the spatial distribution feature extraction function, the point labels and non-ground points in GPU memory are traversed to extract and compute the distribution features of the non-ground objects. The object features stored in GPU memory are then copied into CPU memory. In the driving awareness map generation function, ground points and non-ground points are rendered based on the object features as the result of environment perception [16].
Table 1. Explanation of variables in Fig. 2
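To make the GPU clustering step concrete, the following CUDA sketch shows one way to label connected components on the rasterized histogram map: every occupied cell starts with its own index as its label and repeatedly adopts the smallest label among its occupied 4-neighbors until no label changes. The kernel names, grid size, and 4-neighborhood are assumptions for illustration; the paper's actual kernels may differ.

```cuda
// Illustrative label-propagation sketch (assumed kernels, not the paper's exact implementation).
#include <cstdio>
#include <cuda_runtime.h>

// One propagation pass: each occupied cell takes the minimum label of itself
// and its occupied 4-neighbors; `changed` is set when any label shrinks.
__global__ void propagateLabels(const int* flag, int* label, int* changed, int W, int H)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= W || y >= H) return;
    int idx = y * W + x;
    if (!flag[idx]) return;                               // cell holds no non-ground points

    int best = label[idx];
    const int dx[4] = {1, -1, 0, 0};
    const int dy[4] = {0, 0, 1, -1};
    for (int k = 0; k < 4; ++k) {
        int nx = x + dx[k], ny = y + dy[k];
        if (nx < 0 || nx >= W || ny < 0 || ny >= H) continue;
        int nIdx = ny * W + nx;
        if (flag[nIdx] && label[nIdx] < best) best = label[nIdx];
    }
    if (best < label[idx]) {
        label[idx] = best;
        atomicExch(changed, 1);                           // another pass is required
    }
}

// Relaunch the pass until the labels reach a fixed point (one label per connected component).
void labelComponents(const int* d_flag, int* d_label, int W, int H)
{
    int* d_changed;
    cudaMalloc(&d_changed, sizeof(int));
    dim3 block(16, 16), grid((W + 15) / 16, (H + 15) / 16);
    int h_changed = 1;
    while (h_changed) {
        h_changed = 0;
        cudaMemcpy(d_changed, &h_changed, sizeof(int), cudaMemcpyHostToDevice);
        propagateLabels<<<grid, block>>>(d_flag, d_label, d_changed, W, H);
        cudaMemcpy(&h_changed, d_changed, sizeof(int), cudaMemcpyDeviceToHost);
    }
    cudaFree(d_changed);
}

int main()
{
    const int W = 4, H = 3;
    int h_flag[W * H]  = { 1,1,0,0,                       // two separate blobs of occupied cells
                           0,1,0,1,
                           0,0,0,1 };
    int h_label[W * H];
    for (int i = 0; i < W * H; ++i) h_label[i] = i;       // each cell starts as its own label

    int *d_flag, *d_label;
    cudaMalloc(&d_flag,  sizeof(h_flag));
    cudaMalloc(&d_label, sizeof(h_label));
    cudaMemcpy(d_flag,  h_flag,  sizeof(h_flag),  cudaMemcpyHostToDevice);
    cudaMemcpy(d_label, h_label, sizeof(h_label), cudaMemcpyHostToDevice);

    labelComponents(d_flag, d_label, W, H);

    cudaMemcpy(h_label, d_label, sizeof(h_label), cudaMemcpyDeviceToHost);
    for (int y = 0; y < H; ++y) {                         // prints two clusters, labeled 0 and 7
        for (int x = 0; x < W; ++x)
            printf("%3d", h_flag[y * W + x] ? h_label[y * W + x] : -1);
        printf("\n");
    }
    cudaFree(d_flag);  cudaFree(d_label);
    return 0;
}
```

Per-point object labels can then be recovered, as described above, by mapping each non-ground point back to its grid cell and reading that cell's label.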
3.3 3D Reconstruction Module
The 3D reconstruction module performs four main functions: multi-sensor integration, texture mesh generation for the ground surface, colored particle generation for non-ground object modeling, and 3D terrain representation in a virtual environment. GPU-based parallel programming technology is applied to accelerate the 3D reconstruction process. The CPU and GPU functions of the 3D reconstruction module are described in the sequential diagram shown in Fig. 3. The corresponding variables are explained in Table 2.
In the multi-sensor integration function, global point clouds are acquired from CPU memory and segmented into non-ground and ground points, which are stored in GPU memory. The ground points are registered into a ground mesh so that the large-scale discrete ground points are removed and represented by a texture mesh. A calibration method is used to adjust the projection matrix between the global point clouds and the image sequences. In the texture mesh generation process, the image sequences are copied from CPU to GPU memory, and the ground mesh is mapped onto the image sequences using the calibrated projection matrix. The image pixels onto which the triangles of the ground mesh are mapped are registered as the texture data of the ground mesh. The generated ground texture mesh is copied from GPU to CPU memory to realize real-time rendering in the virtual environment. The non-ground objects are represented by colored particles, which are generated by mapping the non-ground points onto the image sequences to acquire color information. In the 3D terrain representation process, the texture mesh and the colored particles stored in CPU memory are rendered as the ground surface and objects, respectively.
Fig. 3. CPU-GPU sequential diagram of the proposed 3D reconstruction module.
Table 2. Explanation of variables in Fig. 3
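As a sketch of the color mapping used for the particle model, the kernel below projects each non-ground point into one camera image with a 3x4 projection matrix P = K[R|t] obtained from the LiDAR-camera calibration and samples the pixel color at the projected location. The row-major matrix layout, the uchar3 image buffer, the grey fallback color, and all identifiers are assumptions made for illustration, not the authors' implementation.

```cuda
// Illustrative sketch (assumed interface, not the authors' implementation).
#include <cuda_runtime.h>

// Color one non-ground point by projecting it into the camera image.
__global__ void colorizePoints(const float3* pts, uchar3* colors, int n,
                               const uchar3* image, int width, int height,
                               const float* P)            // 12 floats: row-major 3x4 projection matrix
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float3 p = pts[i];

    // Homogeneous projection: [u*w, v*w, w]^T = P * [x, y, z, 1]^T
    float uw = P[0]*p.x + P[1]*p.y + P[2]*p.z  + P[3];
    float vw = P[4]*p.x + P[5]*p.y + P[6]*p.z  + P[7];
    float w  = P[8]*p.x + P[9]*p.y + P[10]*p.z + P[11];

    colors[i] = make_uchar3(128, 128, 128);                // fallback grey for points the camera cannot see
    if (w <= 0.f) return;                                  // point lies behind the camera
    int u = (int)(uw / w);
    int v = (int)(vw / w);
    if (u < 0 || u >= width || v < 0 || v >= height) return;
    colors[i] = image[v * width + u];                      // sample the pixel color for this particle
}
```

The same projection can supply texture coordinates for the vertices of the ground mesh, and a launch such as `colorizePoints<<<(n + 255) / 256, 256>>>(...)` follows the pattern of the earlier sketches.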
4. Experiments
As shown in Fig. 4, the UGV was equipped with a LiDAR to collect point cloud data, three GC655 VGA CCD cameras to photograph the surroundings, and an MTi-G-700 IMU to report the position and rotation information of the vehicle. The CUDA programming method for parallel computing was used in our experiments. The system was executed on a computer with a 3.20 GHz Intel quad-core CPU, a GeForce GT 770 graphics card, and 4 GB of RAM. The system utilized the Direct3D software development kit to visualize the object segmentation and 3D reconstruction results.
Fig. 4. Multiple sensors mounted on the UGV: (a) LiDAR, (b) CCD camera, (c) IMU, and (d) multiple sensor integration.
Fig. 5 shows the surrounding point clouds perceived by the LiDAR mounted on the UGV. Using the proposed ground segmentation technique, the ground points were segmented, shown in green, and recognized as traversable regions. As shown in Fig. 6, the connected component labeling algorithm divided the non-ground points into several distinct objects, shown in different colors, which are considered obstacles. In this way, the UGV could plan its path effectively.
The system adopted the CUDA GPU programming technique to realize parallel computation in the three proposed modules. The object segmentation speeds of the proposed CPU-GPU hybrid system and a CPU-based method are compared in Fig. 7. In each frame, the reconstructed terrain model contained nearly 100,000 3D points, which are difficult to process in real time using sequential CPU computation. Our proposed method achieved more than 31.3 frames per second (fps) on average, which satisfied the real-time processing requirement of the UGV.
Fig. 5. Segmentation result of ground and non-ground points.
Fig. 6. Ground segmentation and object clustering result in LiDAR point clouds.
Fig. 7. Object segmentation speed performance of the proposed CPU-GPU hybrid system and the CPU-based method.
Fig. 8. High-resolution terrain reconstruction results using (a) texture mesh and (b) colored particles.
In the system, the ground surface and non-ground objects were reconstructed using a high-resolution texture mesh and colored particles, respectively, as shown in Fig. 8. The speed of environment perception and high-resolution reconstruction was more than 23.0 fps. Thus, the proposed system can effectively solve the problem of limited computing speed and satisfy the requirements for real-time processing of large-scale datasets.
5. Conclusion
In this paper, we introduced a CPU-GPU hybrid system of environment perception and 3D reconstruction for UGVs, whose data flows and functions were designed based on sequential diagrams. The system contains three main modules, namely, data collection and pre-processing, environment perception, and 3D terrain reconstruction. After removing redundant and noise data, the pre-processing module registered the LiDAR point clouds and video sequences into a global terrain model. The environment perception module segmented the ground surface from the point clouds and clustered the non-ground points into individual objects by using a connected component labeling algorithm. In the 3D terrain reconstruction module, the projection matrices of the LiDAR and the three cameras were calibrated to project the local point clouds onto the video image sequences accurately. The terrain model was reconstructed with a texture mesh and colored particles to represent the surrounding environment intuitively. Our proposed method is applicable to domains such as intelligent surveying and mapping, robot vision, and 3D modeling.
Acknowledgement
This research was supported by the Beijing New Star Project of Interdisciplinary Science and Technology (No. XXJC201709), by the National Natural Science Foundation of China (No. 61503005), by the NCUT "The Belt and Road" Talent Training Base Project, by the NCUT "Yuyou" Project, by the Ministry of Science and ICT, Korea, under the Information Technology Research Center support program (No. IITP-2018-2013-1-00684) supervised by the Institute for Information & communications Technology Promotion (IITP), and by the Science and Technology Project of Beijing Municipal Education Commission (No. KM2015-10009006).