Article Information
Corresponding Author: Ai-jun Xu*, xuaj1976@163.com
Ting-ting Yang*, Zhejiang Agriculture and Forestry University, Hangzhou, China; Zhejiang Provincial Key Laboratory of Forestry Intelligent Monitoring and Information Technology, Hangzhou, China; Key Laboratory of State Forestry and Grassland Administration on Forestry Sensing Technology and Intelligent Equipment, Hangzhou, China, 917251944@qq.com
Su-yin Zhou*, Zhejiang Agriculture and Forestry University, Hangzhou, China; Zhejiang Provincial Key Laboratory of Forestry Intelligent Monitoring and Information Technology, Hangzhou, China; Key Laboratory of State Forestry and Grassland Administration on Forestry Sensing Technology and Intelligent Equipment, Hangzhou, China, zsy197733@163.com
Ai-jun Xu*, Zhejiang Agriculture and Forestry University, Hangzhou, China; Zhejiang Provincial Key Laboratory of Forestry Intelligent Monitoring and Information Technology, Hangzhou, China; Key Laboratory of State Forestry and Grassland Administration on Forestry Sensing Technology and Intelligent Equipment, Hangzhou, China, xuaj1976@163.com
Jian-xin Yin*, Zhejiang Agriculture and Forestry University, Hangzhou, China; Zhejiang Provincial Key Laboratory of Forestry Intelligent Monitoring and Information Technology, Hangzhou, China; Key Laboratory of State Forestry and Grassland Administration on Forestry Sensing Technology and Intelligent Equipment, Hangzhou, China, 10262029@qq.com
Received: December 7 2018
Revision received: August 16 2019
Accepted: September 26 2019
Published (Print): December 31 2020
Published (Electronic): December 31 2020
1. Introduction
In recent years, digital image processing based on machine vision has been widely used for various purposes, such as organ segmentation [1], fruit segmentation [2], insect detection [3], unmanned traffic control systems [4], and fingerprint identification [5]. The image segmentation methods used in these applications can be roughly classified into six classes: (1) Threshold approaches rely on the gray-scale values in an image and compare the value of each pixel, in turn, with a threshold. Examples include threshold segmentation and the Otsu method [6,7]. Segmentation that relies solely on gray values, without color information, does not work well. (2) Region-based image segmentation methods divide pixels with similar properties into the same region [8,9]. Under-segmentation or over-segmentation is likely to occur with these methods. (3) Edge-based approaches locate edge points from the maxima of the first derivative of the image or the zero crossings of the second derivative. Edge-based techniques include the Canny method [10,11] and the Roberts method [12]. These methods are more suitable for segmenting regular objects and less effective for irregular objects. (4) Graph theory provides a mathematical approach that maps objects in an image onto a graph or sub-graph, as in the graph cut method [13] and the grab cut method [14]. These interactive methods segment well but are cumbersome to operate. (5) Deep learning approaches use convolutional neural networks to learn features of the target and thereby achieve semantic segmentation [15-17]. These methods obtain segmentation results more intelligently, but they need massive training data and more powerful computer hardware. (6) Other approaches are based on specific theories, such as wavelet transform [18] or clustering methods [19].
The mean shift algorithm [20] is a typical clustering method that does not require any prior knowledge and relies entirely on the sample points in the feature space to calculate the value of the density function. Thus, we adopted the mean shift algorithm to segment trees from images.
The mean shift, an effective algorithm for estimating a probability density gradient, has been used widely for image segmentation and other applications [21-23]. Scholars have previously applied the mean shift algorithm to image clustering: the image pixels are converted to sampling points in the feature space, so that clustering in the feature space amounts to segmenting the image space. Usually, the image’s gray-scale value is selected to constitute the feature space [24]. In practice, however, relying on gray-scale values alone cannot fully meet the complex requirements of image segmentation. Therefore, many scholars have continued to seek other approaches to optimize image segmentation [25-30]. Although researchers have made some advances in simple image segmentation, problems remain when segmenting tree images. Tree images collected in the natural environment contain a great deal of background noise, such as houses, roads, and ground. Moreover, the branches and leaves of a tree produce many small hollows and obvious texture details in the image, which makes tree images very different from images of other types of natural objects, such as a brain or lung. The texture features and noise cannot be ignored when segmenting the image. Therefore, a meaningful and reasonable image pre-processing technique is necessary when using the mean shift algorithm to segment a tree image.
2. Materials and Methods
2.1 Materials and Data
The image data were obtained from the natural environment in Hangzhou, Zhejiang Province, on the southeast coast of China (119.72°E, 30.23°N), in autumn 2018. To verify the feasibility and accuracy of our method, we gathered a variety of tree images as input data. We used a mobile phone camera to collect 31 images of trees of different shapes and species at a distance of 2–8 m. Each image has a resolution of 3120×4160 pixels. The characteristics of the tree images were as follows:
1) Illumination conditions: photographs were taken under a variety of lighting conditions, including direct sunlight, back-lighting, and shading, among others.
2) Tree species: we selected trees of different species with relatively luxuriant canopies.
3) Background features: photographs included uncut grass, roads, houses, ground, pedestrians, and other background features.
2.2 Overview of the Tree Image Segmentation Method
Fig. 1 provides a block diagram that summarizes the proposed algorithm. Considering the noise and the gaps in the canopies, we first abstracted each tree image by smoothing the background and reducing the canopy gaps, which gave the abstracted salient tree image. Then, the optimized bandwidth parameters of the clustering algorithm were calculated using spatial location and gray-scale features. Combining these with the Gaussian kernel function, we obtained the clustering results. Finally, based on the filled clustering results and the region of interest (ROI), the final tree segmentation image was obtained via mathematical morphology processing.
Fig. 1. Block diagram for the proposed algorithm.
A summary of the entire tree segmentation method based on image abstraction is shown below:
1) Image abstraction: We selected an arbitrary image from our image set. The image was processed using bilateral filtering and an image pyramid, which produced the abstracted salient image.
2) Tree image clustering: For the spatial point [TeX:] $$x$$, we iteratively searched for its modal point in the joint feature space of the spatial domain and the gray domain. Then each pixel of the image was set to a new value using the flood fill method, and regions were connected by specific rules. The criteria for the realization of region connectivity were as follows:
(a) The space distance between two adjacent regions was less than [TeX:] $$h_{s}$$;
(b) The gray distance between two adjacent regions was less than [TeX:] $$h_{r}$$; and
(c) The minimum number of pixels of a region was set as [TeX:] $$M$$. If the number of pixels in a single region was fewer than [TeX:] $$M$$, the region was merged into adjacent regions.
3) ROI extraction of the tree image: From the filled clustered image, the ROI of the tree image was extracted using the color histogram of the image.
4) Mathematical morphology processing: An opening operation was applied repeatedly to remove the small noise found in the image and smooth the edges of the tree. Then a closing operation was used to fill the scattered hollows in the image of the branches and leaves of the target trees.
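The morphological clean-up in step 4 can be illustrated with a minimal sketch on a binary mask. The 3×3 square structuring element, the border handling (pixels outside the image are ignored), and the function names are assumptions made for illustration; the paper does not specify them.

```python
def _morph(mask, k, reducer):
    """Apply `reducer` (all -> erosion, any -> dilation) over a (2k+1)x(2k+1) square.

    Pixels outside the image are simply ignored (an assumed border rule).
    """
    rows, cols = len(mask), len(mask[0])
    return [[int(reducer(mask[yy][xx]
                         for yy in range(max(0, y - k), min(rows, y + k + 1))
                         for xx in range(max(0, x - k), min(cols, x + k + 1))))
             for x in range(cols)] for y in range(rows)]


def erode(mask, k=1):
    return _morph(mask, k, all)


def dilate(mask, k=1):
    return _morph(mask, k, any)


def opening(mask, k=1):
    # Erosion followed by dilation: removes specks smaller than the square.
    return dilate(erode(mask, k), k)


def closing(mask, k=1):
    # Dilation followed by erosion: fills hollows smaller than the square.
    return erode(dilate(mask, k), k)


def clean_mask(mask, k=1):
    # Opening first (noise removal), then closing (hollow filling), as in step 4.
    return closing(opening(mask, k), k)
```

On a mask that contains a solid block with a one-pixel hollow plus an isolated noise pixel, `clean_mask` removes the noise pixel and fills the hollow while leaving the block outline unchanged.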
2.3 Image Abstraction
To blur the background and smooth the canopy hollows of the tree image, we abstracted the pixels of the original image from multiple perspectives, taking the features of the tree image into account. This approach reduced the segmentation difficulties caused by both the features of the tree itself and its background.
Bilateral filtering is a nonlinear filtering technique in which the kernel function considers not only the Euclidean distance between pixels but also their radiometric difference in the range domain (e.g., the degree of similarity between the pixels in the convolution kernel and the central pixel, color intensity, and depth distance). As the tree is generally shown in the middle of the image in the data set, other objects located around the trees were smoothed while their edge information was well preserved, in keeping with the filter characteristics. Therefore, iterative bilateral filtering was used for spatial smoothing to reduce the influence of background noise. To describe the degree of image smoothing and denoising quantitatively, we introduced the number of filtering iterations, [TeX:] $$t (t \in \mathrm{N})$$, with the same spatial and range values used in every iteration.
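As a rough illustration of iterated bilateral filtering, the following sketch filters a small gray-scale image stored as a list of lists. The Gaussian form of both weights, the window radius, and the parameter names `sigma_s` and `sigma_r` are assumptions for illustration and are not taken from the paper.

```python
import math


def bilateral_filter(img, radius=2, sigma_s=2.0, sigma_r=25.0):
    """One pass of a gray-scale bilateral filter with Gaussian weights."""
    rows, cols = len(img), len(img[0])
    out = [[0.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            centre = img[y][x]
            num = den = 0.0
            for yy in range(max(0, y - radius), min(rows, y + radius + 1)):
                for xx in range(max(0, x - radius), min(cols, x + radius + 1)):
                    d2 = (yy - y) ** 2 + (xx - x) ** 2    # spatial distance
                    r2 = (img[yy][xx] - centre) ** 2      # gray-level distance
                    w = math.exp(-d2 / (2 * sigma_s ** 2) - r2 / (2 * sigma_r ** 2))
                    num += w * img[yy][xx]
                    den += w
            out[y][x] = num / den
    return out


def iterate_bilateral(img, t, **kwargs):
    """Apply the filter t times with identical spatial and range parameters."""
    for _ in range(t):
        img = bilateral_filter(img, **kwargs)
    return img
```

Because the range weight collapses for pixels on the far side of a strong edge, repeated passes flatten small gray-level fluctuations inside a region while the edge itself stays sharp.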
Although the previous step reduced the background texture interference to a certain extent, the image features of the tree itself still needed to be processed. Accordingly, we used the image pyramid method to smooth the image further. First, the smoothed image was down-sampled using the Gaussian pyramid. The first layer of the pyramid was convolved with a Gaussian kernel, and all even rows and columns were deleted, so that the second layer had a quarter of the pixels of the original image. As Fig. 2 shows, the image was filtered before being down-sampled. This interlaced sampling reduced the impact of the canopy gaps and foliage texture. Next, an upward reconstruction was used to restore the resolution of the down-sampled image: the even rows and columns were filled with zero gray values, and the same convolution kernel was then applied to the enlarged image to estimate the values of the “lost” pixels. By applying these two steps iteratively, we obtained an abstracted salient tree image. Similarly, the image pyramid layer, [TeX:] $$l \ (l \in \mathrm{N})$$ (with one “layer” defined as one down-sampling and one upward reconstruction), was introduced to measure the degree of image blurring.
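One pyramid "layer" (one down-sampling followed by one upward reconstruction) can be sketched as follows. The 5-tap binomial kernel, the reflected borders, and the gain of 2 per axis after zero filling are common choices assumed here; the paper does not state its kernel. Images are assumed to be at least 3×3.

```python
KERNEL = [1 / 16, 4 / 16, 6 / 16, 4 / 16, 1 / 16]  # assumed binomial kernel


def _reflect(i, n):
    """Mirror an out-of-range index back into [0, n) without repeating the edge."""
    if i < 0:
        return -i
    if i >= n:
        return 2 * n - 2 - i
    return i


def _blur_rows(img, gain):
    n = len(img[0])
    return [[gain * sum(k * row[_reflect(x + d - 2, n)] for d, k in enumerate(KERNEL))
             for x in range(n)] for row in img]


def _transpose(img):
    return [list(col) for col in zip(*img)]


def gaussian_blur(img, gain=1.0):
    # Separable filtering: rows first, then columns; the gain is applied per axis.
    return _transpose(_blur_rows(_transpose(_blur_rows(img, gain)), gain))


def pyr_down(img):
    # Blur, then keep every second row and column (a quarter of the pixels).
    blurred = gaussian_blur(img)
    return [row[::2] for row in blurred[::2]]


def pyr_up(img, shape):
    # Zero-fill the missing rows and columns, then blur to estimate them.
    rows, cols = shape
    up = [[0.0] * cols for _ in range(rows)]
    for y, row in enumerate(img):
        for x, v in enumerate(row):
            up[2 * y][2 * x] = v
    return gaussian_blur(up, gain=2.0)


def abstract_layers(img, l):
    """Apply l pyramid layers; the output keeps the input resolution."""
    shape = (len(img), len(img[0]))
    for _ in range(l):
        img = pyr_up(pyr_down(img), shape)
    return img
```

With this kernel a flat image passes through a layer unchanged, whereas pixel-scale texture such as a checkerboard is averaged out, which mirrors the intended suppression of canopy gaps and foliage texture.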
Fig. 2. Image abstraction process, where [TeX:] $$a \times b$$ is the image size.
2.4 Adaptive Mean Shift Algorithm
Mean shift is an iterative algorithm based on kernel density estimation. The core of the mean shift technique is clustering the sample points in feature space. Let [TeX:] $$x_{i} \in R^{d}, i=1,2, \ldots, n$$ denote the set of feature vectors in a [TeX:] $$d$$-dimensional feature space. The density function at a point [TeX:] $$x$$ can be estimated by kernel function [TeX:] $$K(x)$$ and bandwidth [TeX:] $$h$$:
[TeX:] $$\hat{f}(x)=\frac{1}{n h^{d}} \sum_{i=1}^{n} K\left(\frac{x-x_{i}}{h}\right)$$
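The mean shift vector is given as
[TeX:] $$m_{h}(x)=\frac{\sum_{i=1}^{n} x_{i} g\left(\left\|\frac{x-x_{i}}{h}\right\|^{2}\right)}{\sum_{i=1}^{n} g\left(\left\|\frac{x-x_{i}}{h}\right\|^{2}\right)}-x$$
where [TeX:] $$g(x)=-K^{\prime}(x)$$.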
In this paper, the spatial and gray feature vectors are used to estimate the corresponding feature bandwidths [TeX:] $$h_{s}$$ and [TeX:] $$h_{r}$$:
where [TeX:] $$x_{s}$$ and [TeX:] $$x_{r}$$ denote the spatial part and the gray part of a feature vector, respectively. [TeX:] $$C$$ is the normalization constant. The [TeX:] $$K(x)$$ profile kernel is named [TeX:] $$k(x)$$.
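Written with the symbols defined above, a joint spatial-range kernel of the kind commonly used in mean shift segmentation takes the following form. This is the usual textbook expression, given here as an assumption rather than a quotation: the exponents on [TeX:] $$h_{s}$$ and [TeX:] $$h_{r}$$ assume a two-dimensional spatial domain and a one-dimensional gray domain, and they affect only the normalization.

[TeX:] $$K_{h_{s}, h_{r}}(x)=\frac{C}{h_{s}^{2} h_{r}} k\left(\left\|\frac{x_{s}}{h_{s}}\right\|^{2}\right) k\left(\left\|\frac{x_{r}}{h_{r}}\right\|^{2}\right)$$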
Then the adaptive mean shift translation vector based on multidimensional features is given as follows:
where [TeX:] $$x_{i}^{s}$$ and [TeX:] $$x_{i}^{r}$$ denote the spatial and gray feature vectors, respectively, of the sampling points near [TeX:] $$x$$.
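A minimal sketch of the joint spatial-range mean shift iteration is given below. It assumes a Gaussian profile for both domains, truncates the search window at a radius of [TeX:] $$h_{s}$$ pixels, and uses hypothetical function names; each step replaces the current point by the weighted mean of nearby pixels, with weights given by the product of the spatial and gray kernels.

```python
import math


def mean_shift_mode(img, y0, x0, h_s, h_r, max_iter=50, tol=1e-3):
    """Follow the joint spatial-range mean shift from pixel (y0, x0) to its mode."""
    rows, cols = len(img), len(img[0])
    y, x, g = float(y0), float(x0), float(img[y0][x0])
    radius = int(math.ceil(h_s))  # truncated window (an assumption of this sketch)
    for _ in range(max_iter):
        cy, cx = int(round(y)), int(round(x))
        sum_w = sum_y = sum_x = sum_g = 0.0
        for yy in range(max(0, cy - radius), min(rows, cy + radius + 1)):
            for xx in range(max(0, cx - radius), min(cols, cx + radius + 1)):
                d_s = ((yy - y) ** 2 + (xx - x) ** 2) / h_s ** 2   # spatial term
                d_r = ((img[yy][xx] - g) / h_r) ** 2               # gray term
                w = math.exp(-0.5 * d_s) * math.exp(-0.5 * d_r)
                sum_w += w
                sum_y += w * yy
                sum_x += w * xx
                sum_g += w * img[yy][xx]
        if sum_w == 0.0:
            break
        ny, nx, ng = sum_y / sum_w, sum_x / sum_w, sum_g / sum_w
        shift = abs(ny - y) + abs(nx - x) + abs(ng - g)
        y, x, g = ny, nx, ng
        if shift < tol:
            break
    return y, x, g


def mean_shift_filter(img, h_s, h_r):
    """Replace every pixel by the gray value of the mode it converges to."""
    return [[mean_shift_mode(img, y, x, h_s, h_r)[2] for x in range(len(img[0]))]
            for y in range(len(img))]
```

Pixels on the same side of a strong edge converge to similar gray values, so the filtered image is close to piecewise constant, which is what the later flood fill step relies on.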
An adaptive mean shift algorithm is used to determine the proper bandwidth. The spatial bandwidth [TeX:] $$h_{s}$$ affects not only the mis-segmentation rate of the image but also the running speed of the algorithm, through the number of iterations. We calculate the spatial bandwidth using the method proposed in [31]. First, the initial value of the spatial bandwidth [TeX:] $$H_{0}$$ is set to 8 and the increment [TeX:] $$s$$ to 3, so that [TeX:] $$h_{i}=h_{i-1}+s$$. When [TeX:] $$\Sigma \sigma_{j}<0.7 n_{i}\left(j=1,2, \ldots, n_{i}\right)$$, the spatial bandwidth [TeX:] $$h_{s}$$ is set to [TeX:] $$h_{i}$$. In other words, when the number of sampling points whose gray values are similar to that of the smoothed point falls below 70% of all [TeX:] $$n_{i}$$ sampling points, the corresponding [TeX:] $$h_{i}$$ is the required spatial bandwidth [TeX:] $$h_{s}$$. Only sampling points whose gray values are similar to that of the smoothed point are counted as belonging to the spatial bandwidth [TeX:] $$h_{s}$$. The condition under which the sampling point gray value is similar to the smoothed point gray value is [TeX:] $$\left|h_{j}-h_{0}\right| \leq 8$$, where [TeX:] $$h_{j}$$ denotes the H component value of the sampling point, [TeX:] $$h_{0}$$ denotes the H component value of the smoothed point, and the indicator function [TeX:] $$\sigma_{j}$$ is 1 if [TeX:] $$\left|h_{j}-h_{0}\right| \leq 8$$, and 0 otherwise.
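The step-wise search for the spatial bandwidth can be sketched as below. The square window, the cap `h_max` that guarantees termination on uniform images, and the non-circular treatment of the H component are assumptions of this sketch; the details of the method in [31] may differ.

```python
def spatial_bandwidth(hue, y0, x0, h_init=8, step=3, ratio=0.7, tol=8, h_max=None):
    """Grow a window around (y0, x0) until fewer than `ratio` of its pixels are similar."""
    rows, cols = len(hue), len(hue[0])
    if h_max is None:
        h_max = max(rows, cols)   # assumed upper bound so the loop always stops
    centre = hue[y0][x0]
    h = h_init
    while h < h_max:
        total = similar = 0
        for y in range(max(0, y0 - h), min(rows, y0 + h + 1)):
            for x in range(max(0, x0 - h), min(cols, x0 + h + 1)):
                total += 1
                # indicator: 1 if the H component is within `tol` of the centre value
                similar += abs(hue[y][x] - centre) <= tol
        if similar < ratio * total:
            return h
        h += step
    return h_max
```

For a pixel at the centre of a homogeneous 25×25 patch, the window keeps growing through 8, 11, and 14, and stops at 17, the first size at which the patch covers less than 70% of the window.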
The bandwidth of the gray feature, [TeX:] $$h_{r}$$, is an important parameter when smoothing an image. Generally, there is a global optimal fixed-range bandwidth and an adaptive-range bandwidth. In this paper, the plug-in rule method (insertion rule) [30] is used to obtain the adaptive [TeX:] $$h_{r}$$:
where [TeX:] $$d$$ denotes the dimension of the feature space, [TeX:] $$n$$ denotes the amount of data, [TeX:] $$\sigma_{j}$$ denotes the standard deviation, [TeX:] $$G_{x}(x=1,2, \ldots, n)$$ denotes the gray value of each pixel of the image, and [TeX:] $$\bar{G}_{x}$$ denotes the average gray value of the image.
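A widely used plug-in rule of this kind is the normal-reference rule, which can be written with the symbols above as follows, with [TeX:] $$\bar{G}_{x}$$ standing for the average gray value. Whether the method in [30] uses exactly these constants is an assumption here; the expression is offered only to make the roles of [TeX:] $$d$$, [TeX:] $$n$$, and [TeX:] $$\sigma_{j}$$ concrete.

[TeX:] $$h_{r}=\left(\frac{4}{d+2}\right)^{\frac{1}{d+4}} n^{-\frac{1}{d+4}} \sigma_{j}, \quad \sigma_{j}=\sqrt{\frac{1}{n-1} \sum_{x=1}^{n}\left(G_{x}-\bar{G}_{x}\right)^{2}}$$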
As the image abstraction method smoothed the original tree image and reduced the noise influence of small particles to a certain extent, the two bandwidths of the local adaptive [TeX:] $$h_{s}$$ and [TeX:] $$h_{r}$$ based on image abstraction achieved a compromise between over-smoothing and under-smoothing in the process of smoothing the texture.
3. Results
3.1 Algorithm Evaluation
To measure the accuracy of the segmentation, we evaluated our algorithm via segmentation accuracy (SA), over-segmentation rate (OR), and under-segmentation rate (UR), as shown below.
where [TeX:] $$R_{s}$$ denotes the ground truth, and [TeX:] $$T_{s}$$ denotes the area segmented by our method, so that [TeX:] $$\left|R_{s}-T_{s}\right|$$ denotes the number of pixels segmented mistakenly. [TeX:] $$O_{s}$$ denotes the number of pixels that should not be included in the segmentation result but were counted nonetheless. [TeX:] $$U_{s}$$ is the number of pixels that should have been included in the segmentation result but were not counted.
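The three measures can be computed from a ground-truth mask and a predicted mask as sketched below. The formulas used, SA = (1 − |R_s − T_s| / R_s) × 100, OR = O_s / (R_s + O_s) × 100, and UR = U_s / (R_s + O_s) × 100, with R_s and T_s taken as pixel counts, are one common set of definitions assumed here for illustration; they are not confirmed by the text above.

```python
def segmentation_metrics(truth, pred):
    """Return (SA, OR, UR) in percent for two binary masks of equal size.

    The formulas are assumed definitions, see the accompanying text.
    """
    r_s = t_s = o_s = u_s = 0
    for truth_row, pred_row in zip(truth, pred):
        for tv, pv in zip(truth_row, pred_row):
            r_s += 1 if tv else 0              # ground-truth pixels
            t_s += 1 if pv else 0              # segmented pixels
            o_s += 1 if (pv and not tv) else 0  # wrongly included
            u_s += 1 if (tv and not pv) else 0  # wrongly excluded
    if r_s == 0:
        raise ValueError("ground truth mask is empty")
    sa = (1 - abs(r_s - t_s) / r_s) * 100
    over = o_s / (r_s + o_s) * 100
    under = u_s / (r_s + o_s) * 100
    return sa, over, under
```

For example, with 10 ground-truth pixels, of which 8 are recovered, plus 1 extra pixel, the counts are R_s = 10, T_s = 9, O_s = 1, and U_s = 2.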
3.2 Evaluation with Different Abstraction Levels
The method of tree image abstraction was validated using a filtering iteration time ([TeX:] $$t$$) from 0 to 15 and an image pyramid layer ([TeX:] $$l$$) from 0 to 10. Fig. 3 (which is based on one sample image only) shows subjectively the clustering results of our algorithm at different levels of image abstraction. From Fig. 4 (evaluation of the segmentation results of the crown and trunk of 31 sample images at different abstraction levels), it is clear that the degree of image abstraction affected image segmentation. Additionally, when [TeX:] $$l$$ or [TeX:] $$t$$ was too small or too large, the clustering results were not satisfactory, but they were more satisfactory with [TeX:] $$t=10, l=5$$ or 10. Fig. 4 shows objectively the segmentation results of our algorithm based on different tree abstraction images. The SA was higher and more stable with [TeX:] $$t$$ ranging from 5 to 9 and [TeX:] $$l$$ from 0 to 10, and the SA decreased as [TeX:] $$t$$ increased further. Furthermore, the crown segmentation was affected more by image abstraction than the trunk segmentation.
Fig. 3. Clustering results at different abstraction levels. From left to right: [TeX:] $$t$$ is 0, 5, 10, and 15, respectively. From top to bottom: [TeX:] $$l$$ is 0, 5, and 10, respectively.
Fig. 4. Evaluation of the crown and trunk segmentation results, where [TeX:] $$t$$ ranges from 0 to 15 and [TeX:] $$l$$ is 0, 5, or 10: (a) segmentation accuracy, (b) over-segmentation rate, and (c) under-segmentation rate. The solid line shows the segmentation results of the crown, and the dashed line shows the segmentation results of the trunk.
3.3 Segmentation Results based on Best Image Abstraction
Next, we validated the proposed algorithm on the tree image segmentation data. The segmentation results based on the best image abstraction ([TeX:] $$t=10$$ and [TeX:] $$l=5$$) are shown in Fig. 5 (which is based on just one sample image). The following steps describe our procedure.
Fig. 5. Segmentation results of a tree image: (a) original tree image; (b) part of (a); (c) abstracted salient tree image; (d) part of (c); (e) clustering result; (f) filled clustering result; (g) ROI extraction result; and (h) segmentation result.
Step 1 (Searching the modal points): The spatial vector and the gray-scale vector are combined into a “space-color” field. Based on the optimal values [TeX:] $$h_{s}=29.8$$ and [TeX:] $$h_{r}=19.9$$, mean shift filtering is performed to obtain the tree clustering image. As a result of this filtering, [TeX:] $$m$$ modal points are obtained, and all pixels that converge to the same modal point are grouped into one cluster, giving [TeX:] $$\left\{Q_{p}\right\}, p=1,2, \ldots, m$$, with [TeX:] $$m \ll n$$, where [TeX:] $$m$$ is the number of clustering regions and [TeX:] $$n$$ is the data volume.
Step 2 (Combining similar areas and merging small areas): Step 1 smooths the image by classifying its pixels. In Step 2, the clustering result is filled using the flood fill function: the gray-scale value of the pixels in each area is set to a new value, and regions are connected according to the rules given in Section 2.2.
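A simplified version of this step is sketched below: a breadth-first flood fill labels 4-connected regions whose neighbouring pixels differ by less than a gray tolerance, and regions with fewer than `min_size` pixels are then merged into the adjacent region with the closest mean gray value. The 4-connectivity, the neighbour-to-neighbour comparison, and the closest-mean merge rule are assumptions for illustration; the spatial distance criterion is omitted.

```python
from collections import deque

NEIGHBOURS = ((-1, 0), (1, 0), (0, -1), (0, 1))


def flood_fill_labels(img, h_r):
    """Label 4-connected regions; neighbours join if they differ by less than h_r."""
    rows, cols = len(img), len(img[0])
    labels = [[-1] * cols for _ in range(rows)]
    n_labels = 0
    for sy in range(rows):
        for sx in range(cols):
            if labels[sy][sx] != -1:
                continue
            labels[sy][sx] = n_labels
            queue = deque([(sy, sx)])
            while queue:
                y, x = queue.popleft()
                for dy, dx in NEIGHBOURS:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols and labels[ny][nx] == -1
                            and abs(img[ny][nx] - img[y][x]) < h_r):
                        labels[ny][nx] = n_labels
                        queue.append((ny, nx))
            n_labels += 1
    return labels, n_labels


def merge_small_regions(img, labels, min_size):
    """Merge regions smaller than min_size into the adjacent region of closest mean."""
    rows, cols = len(img), len(img[0])
    changed = True
    while changed:
        changed = False
        stats = {}  # label -> [gray sum, pixel count]
        for y in range(rows):
            for x in range(cols):
                entry = stats.setdefault(labels[y][x], [0.0, 0])
                entry[0] += img[y][x]
                entry[1] += 1
        for lab, (total, count) in stats.items():
            if count >= min_size:
                continue
            adjacent = set()
            for y in range(rows):
                for x in range(cols):
                    if labels[y][x] != lab:
                        continue
                    for dy, dx in NEIGHBOURS:
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and labels[ny][nx] != lab:
                            adjacent.add(labels[ny][nx])
            if not adjacent:
                continue  # the region covers the whole image
            mean = total / count
            target = min(adjacent, key=lambda n: abs(stats[n][0] / stats[n][1] - mean))
            for y in range(rows):
                for x in range(cols):
                    if labels[y][x] == lab:
                        labels[y][x] = target
            changed = True
            break  # statistics are stale after a merge, so recompute them
    return labels
```

Each merge removes one region, so the loop ends once every remaining region has at least `min_size` pixels or only one region is left.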
Step 3 (Segmenting the tree image): Based on the filled clustering image, the gray-scale thresholds of the canopy and trunk are set, and the ROI areas are extracted using a color code table (crown with R = 49, G = 204, B = 52, and trunk with R = 191, G = 23, B = 130). The final result of the adaptive mean shift method (AMSM) for tree segmentation based on image abstraction is obtained through iterative application of the erosion and dilation operations of mathematical morphology to provide further denoising.
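Extracting the ROI from the filled image by color code can be sketched as a simple exact-match lookup. The two RGB triples come from the text above; the image layout (rows of RGB tuples), the exact-match rule, and the function name are assumptions for illustration.

```python
CROWN_RGB = (49, 204, 52)
TRUNK_RGB = (191, 23, 130)


def roi_mask(filled, colour):
    """Return a binary mask that is 1 where the filled image equals `colour`."""
    return [[1 if tuple(px) == colour else 0 for px in row] for row in filled]
```

The resulting crown and trunk masks are the inputs to the morphological denoising described in this step.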
4. Discussion
We also compared the performance of our proposed algorithm with other published methods that have been applied to tree image segmentation. The method in [32] employed only color features for tree image segmentation in simple scenes. Zhao et al. [33] used fractional dimension-based color to segment tree images. Jiang [34] used color feature regions, a spatial clustering method, and color information to segment tree images into two parts, the trunk and crown, but the method was suited only to simple scenes. Ding et al. [35] utilized the grab cut algorithm to segment tree images. In addition, our own previous tree segmentation method based on the graph cut algorithm segmented the image interactively [36]. Fig. 6 presents the results of some of these segmentation methods visually.
Fig. 6. Comparison of segmentation results from four methods. From left to right: original image, ground truth, segmentation results by color feature, segmentation results by color and fractional dimension method, segmentation results using the grab cut algorithm, segmentation results using the graph cut algorithm, and our proposed AMSM algorithm.
Table 1. Average segmentation accuracy for 31 tree images
Table 1 provides an objective comparison of our algorithm with the average accuracy of the other methods of tree image segmentation based on best image abstraction. For our algorithm, the average values of SA, OR, and UR of the crown were 91.21%, 3.54%, and 9.85%, respectively. The average values of SA, OR, and UR of the trunk were 92.78%, 8.16%, and 7.93%, respectively. These results were better than those obtained by the color feature, fractional dimension, or grab cut methods. Compared with the graph cut algorithm, our method was more automatic, efficient, and accurate.
5. Conclusion and Future Works
In this paper, we proposed a new tree segmentation algorithm based on a combination of image abstraction and the adaptive mean shift algorithm. First, we analyzed the challenges involved in tree image segmentation, and we proposed spatial smoothing and image pyramid operations to reduce the influence of unnecessary background information and canopy gaps on clustering. The degree of image abstraction was measured by the filtering iteration time ([TeX:] $$t$$) and the image pyramid layer ([TeX:] $$l$$). A comparison of segmentation results was made using different abstraction levels, with [TeX:] $$t$$ in the range of 0–15 and [TeX:] $$l$$ in the range of 0–10. We found that the segmentation results were better when [TeX:] $$l=5$$ and [TeX:] $$t$$ ranged from 4 to 8. Then, to eliminate the need to set the parameters manually, we utilized an adaptive mean shift segmentation algorithm that used the corresponding bandwidth solution methods in both the spatial and range domains. Spatial location and gray-scale features were introduced, with their bandwidths obtained by step detection and the plug-in rule (insertion rule) method, respectively. The flood fill method was then employed to fill the results of clustering to highlight the region of interest. To prove the effectiveness of tree image abstraction on image clustering, we compared different abstraction levels and achieved the optimal clustering results.
With the experimental data set including various types of tree images, our algorithm achieved an average SA of more than 90%. In comparison with two segmentation algorithms using color and fractional dimension-based color, our proposed segmentation algorithm showed relatively high SA and finer tree edge information. Moreover, our method did not require human marking and demonstrated a higher degree of accuracy for tree image segmentation than prior approaches. The research is also significant for the segmentation of other targets and for measuring target size from image information. However, when the target tree is far away from the camera or its surrounding environment is more complex, the SA is reduced. To address this problem, in the next step we will use deep learning to learn tree features such as shape, texture, edge, and color. In future work, we will also improve the performance of the method and use the segmentation results for tree measurements. For example, we are now trying to calculate tree height, crown, and diameter at breast height (DBH) from the segmentation results, which may be useful for forest surveys.
Acknowledgement
This work was supported by the National Natural Science Foundation of China (No. 31670641), Zhejiang Science and Technology Key R&D Program Funded Project (No. 2018C02013) and Zhejiang Public Welfare Project (No. LGN21C160004).