1. Introduction
Owing to the limited depth of field of optical lenses, it is very difficult for a visible-light imaging system to image clearly all the objects in a scene that lie at widely different distances [1]. However, through multi-focus image fusion methods, the targets at different focal distances in the same scene can be merged into a new image in which all targets are clear, for human observation or computer processing [2].
At present, most research on multi-focus image fusion operates at the pixel level. The algorithms can be roughly divided into two categories: spatial-domain fusion and transform-domain fusion. For the latter, the source images are first processed with some form of multi-scale geometric transform (wavelet transform, contourlet transform, shearlet transform, etc.) [3]. A fusion rule is then used to combine the transformed coefficients into fused transform coefficients, and finally the inverse transform is performed to obtain the fused image [4]. Multi-scale transform fusion achieves good results because it resembles the coarse-to-fine recognition process of the human visual system [5]. However, this kind of method has two obvious shortcomings: (1) sharpness is lost to varying degrees; (2) the selection of fusion rules is complex, and there is no unified, effective fusion rule for multi-scale transforms. To address these problems, this paper presents a new multi-focus image fusion method. The experimental results show that the method loses no sharpness and avoids the complicated selection of fusion rules.
2. The Fusion Method Proposed in This Paper
The block diagram of the fusion method is shown in Fig. 1.
Fig. 1. Fusion method diagram of this paper.
According to the block diagram, the main fusion steps are as follows (a code skeleton follows the list):
1. Use the lifting stationary wavelet transform (LSWT) to quickly obtain an initial fused image for the next step.
2. Smooth the initial fused image, then perform normalized cut segmentation on it.
3. Decompose each source image with the nonsubsampled contourlet transform (NSCT) to obtain coefficient matrices of the same size as the source images.
4. For each segmented region, compute the sum of the absolute values of the high-frequency coefficients and select the region with the larger value as the corresponding region of the fused image.
5. Traverse all segmented regions to obtain the final fused image.
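To make the data flow concrete, the following Python sketch composes these five steps. It is illustrative only: `lswt_initial_fusion`, `ncut_bipartition`, and `select_regions` refer to the helper functions sketched in Sections 2.1-2.3 below, and the Gaussian smoothing width is an assumed value, not a parameter from the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def fuse_multifocus(img_a, img_b):
    # Step 1: fast initial fusion via the stationary wavelet transform
    # (lswt_initial_fusion is sketched in Section 2.1).
    initial = lswt_initial_fusion(img_a, img_b)
    # Step 2: smooth the initial fusion, scale it to [0, 1], and segment it
    # with a normalized cut (ncut_bipartition is sketched in Section 2.2;
    # sigma = 2.0 is an assumed smoothing width).
    smoothed = gaussian_filter(initial, sigma=2.0) / 255.0
    labels = ncut_bipartition(smoothed)
    # Steps 3-5: decompose the sources, compare the high-frequency energy of
    # each region, and assemble the fused image (select_regions, Section 2.3).
    return select_regions(img_a, img_b, labels)
```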
2.1 Initial Fusion
The LSWT is a flexible stationary wavelet transform that can use linear or nonlinear predictors while guaranteeing the reversibility of the transform. Unlike the classical wavelet transform, the LSWT is translation invariant and offers better real-time performance. After LSWT decomposition, each sub-band image has the same size as the source image, which makes the correspondence between sub-band images easy to establish and is very conducive to implementing the fusion algorithm [6]. Therefore, this paper selects the LSWT for the initial fusion.
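As a concrete illustration of this step, the following Python sketch performs the initial fusion with PyWavelets. Two assumptions are made: `pywt.swt2` is an ordinary (non-lifting) stationary wavelet transform used here as a stand-in for the LSWT, since both are shift invariant with same-size sub-bands; and the averaging/maximum-modulus rules follow the simple rules described for the LSWT baseline in Section 3. The paper's own experiments were run in MATLAB.

```python
import numpy as np
import pywt  # PyWavelets: swt2 is a (non-lifting) stationary wavelet transform

def lswt_initial_fusion(img_a, img_b, wavelet="db1", level=2):
    """Initial fusion: average the approximations, keep max-modulus details.
    Image sides must be divisible by 2**level for swt2."""
    ca = pywt.swt2(img_a.astype(float), wavelet, level=level)
    cb = pywt.swt2(img_b.astype(float), wavelet, level=level)
    fused = []
    for (a_lo, a_hi), (b_lo, b_hi) in zip(ca, cb):
        lo = 0.5 * (a_lo + b_lo)  # simple averaging in the low-frequency part
        hi = tuple(np.where(np.abs(ah) >= np.abs(bh), ah, bh)  # max modulus
                   for ah, bh in zip(a_hi, b_hi))
        fused.append((lo, hi))
    return pywt.iswt2(fused, wavelet)
```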
2.2 Normalized Cut
The normalized cut [7] is a graph-partitioning criterion that seeks the highest similarity within each subgraph and the smallest similarity between subgraphs. The normalized cut algorithm divides the initial fused image into different regions.
First, let w(i, j) be the similarity between pixel i and pixel j. Its value depends mainly on the brightness, relative distance, texture, and edges between the two pixels; in this paper, only pixel brightness and relative distance are considered. The smaller the brightness difference between the two pixels, or the closer their spatial distance, the larger w(i, j) is. The weight is defined, following the standard formulation of [7], as:

$$w(i,j)=\exp\left(-\frac{\left\|F_i-F_j\right\|_2^2}{\sigma_I^2}\right)\times\begin{cases}\exp\left(-\dfrac{\left\|X_i-X_j\right\|_2^2}{\sigma_X^2}\right), & \left\|X_i-X_j\right\|_2<r\\[4pt] 0, & \text{otherwise}\end{cases} \quad (1)$$

where $X_i$ and $X_j$ represent the spatial positions of pixel i and pixel j, respectively, $\left\|X_i-X_j\right\|_2^2$ denotes the spatial distance between the two pixels, $F_i$ and $F_j$ denote the pixel values of pixel i and pixel j, respectively, and $\sigma_I$, $\sigma_X$, and $r$ are scale parameters.
According to [7], minimizing the normalized cut is equivalent to solving the generalized eigenvalue problem:

$$(D-W)y=\lambda D y \quad (2)$$

where W is the n×n symmetric similarity matrix defined in Eq. (1) and D is the diagonal degree matrix with $D(i,i)=d_i=\sum_j w(i,j)$. Taking y as the indicator vector, the eigenvector corresponding to the second smallest eigenvalue of Eq. (2) yields the partition into the disjoint subsets A and B.
The segmentation steps in this paper are as follows (a code sketch follows the list):
1. According to Eq. (1), calculate the similarity between every pair of pixels in the image to be segmented, i.e., the weight of the edge connecting the two points, and build the corresponding undirected graph.
2. Solve for the eigenvector corresponding to the second smallest eigenvalue in Eq. (2).
3. Using an appropriate threshold, split this eigenvector into two parts, dividing the image into two regions A and B.
4. Determine whether further segmentation is needed. If so, return to step 2 and iterate; otherwise, stop.
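The following Python/NumPy sketch implements one bipartition along these lines. It is illustrative only: the parameters `r`, `sigma_i`, `sigma_x` and the median threshold in step 3 are assumed values, the input image is expected to be scaled to [0, 1], and the generalized problem of Eq. (2) is solved through the symmetrically normalized Laplacian.

```python
import numpy as np
from scipy.sparse import coo_matrix, diags
from scipy.sparse.linalg import eigsh

def ncut_bipartition(img, r=5, sigma_i=0.1, sigma_x=4.0):
    """One normalized-cut bipartition of a [0, 1]-scaled grayscale image.
    r, sigma_i, sigma_x and the median threshold are assumed values."""
    h, w = img.shape
    n = h * w
    idx = np.arange(n).reshape(h, w)
    vals = img.astype(float).ravel()
    rows, cols, data = [], [], []
    # Eq. (1): connect each pixel to its neighbours within radius r.
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            d2 = dy * dy + dx * dx
            if d2 == 0 or d2 > r * r:
                continue
            src = idx[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)].ravel()
            dst = idx[max(0, dy):h + min(0, dy), max(0, dx):w + min(0, dx)].ravel()
            wgt = (np.exp(-(vals[src] - vals[dst]) ** 2 / sigma_i ** 2)
                   * np.exp(-d2 / sigma_x ** 2))
            rows.append(src); cols.append(dst); data.append(wgt)
    W = coo_matrix((np.concatenate(data),
                    (np.concatenate(rows), np.concatenate(cols))),
                   shape=(n, n)).tocsr()
    d = np.asarray(W.sum(axis=1)).ravel()
    # Eq. (2): (D - W) y = lambda * D y, solved via the normalized Laplacian.
    d_inv_sqrt = diags(1.0 / np.sqrt(d + 1e-12))
    L = d_inv_sqrt @ (diags(d) - W) @ d_inv_sqrt
    evals, evecs = eigsh(L, k=2, which="SM")          # two smallest eigenpairs
    y = d_inv_sqrt @ evecs[:, np.argsort(evals)[1]]   # second-smallest eigenvector
    return (y > np.median(y)).reshape(h, w)           # split into regions A and B
```

Note that `eigsh(..., which="SM")` is slow for large graphs, so this sketch is practical only on small or downsampled images.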
2.3 Fusion Rules
An NSCT decomposition of each source image is performed to obtain its high-frequency detail components and low-frequency approximation component. The high-frequency detail components carry the gradient information of the source image: the larger their absolute value, the stronger the change of the corresponding pixel values in the source image, and the sharper the region. Therefore, this paper computes, for each segmented region of each source image, the sum of the absolute values of the high-frequency NSCT decomposition coefficients to represent the sharpness of that region in the source image; a larger value indicates a sharper block. The sharpness measurements for the k-th region in source images A and B are defined as:

$$S_A^k=\sum_{(i,j)\in RO_A^k}\left|H_A(i,j)\right|,\qquad S_B^k=\sum_{(i,j)\in RO_B^k}\left|H_B(i,j)\right| \quad (3)$$

where $RO_A^k$ and $RO_B^k$ represent the k-th region in the source images A and B, respectively, and $H_A$ and $H_B$ denote the high-frequency NSCT coefficients of A and B.
The sharpness of the corresponding regions of the two images is then compared; the region with the larger measurement is sharper in its source image than in the other, and is selected directly as the corresponding region of the fused image:

$$RO_F^k=\begin{cases}RO_A^k, & S_A^k\geq S_B^k\\ RO_B^k, & S_A^k<S_B^k\end{cases} \quad (4)$$

where $RO_F^k$ represents the k-th region of the fused image F. Traversing all segmented regions yields the final fused image.
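A minimal sketch of this selection rule is given below. Since no standard NSCT implementation is available in Python, a Laplacian high-pass filter stands in for the NSCT high-frequency sub-bands purely for illustration; with a real NSCT, `high_a` and `high_b` would be the absolute high-frequency coefficient maps of Eq. (3).

```python
import numpy as np
from scipy.ndimage import laplace

def select_regions(img_a, img_b, labels):
    """Region-wise selection of Eqs. (3)-(4); the Laplacian high-pass below is
    an illustrative stand-in for the NSCT high-frequency sub-bands."""
    high_a = np.abs(laplace(img_a.astype(float)))
    high_b = np.abs(laplace(img_b.astype(float)))
    fused = np.empty(img_a.shape, dtype=float)
    for k in np.unique(labels):          # traverse every segmented region
        mask = labels == k
        # Eq. (3): region sharpness = sum of absolute high-frequency values.
        if high_a[mask].sum() >= high_b[mask].sum():
            fused[mask] = img_a[mask]    # Eq. (4): keep the sharper source
        else:
            fused[mask] = img_b[mask]
    return fused
```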
3. Experimental Results
In order to verify the effectiveness of the proposed algorithm for multi-focus image fusion, we carried out simulation experiments on two groups of multi-focus source images in MATLAB. All source images were 512×512 in size with 256 gray levels. For comparison, the lifting stationary wavelet transform (LSWT), NSCT, and Shearlet fusion methods were adopted. The LSWT method uses a two-level decomposition (l = 2), with simple averaging as the fusion rule in the low-frequency part and the maximum-modulus rule in the high-frequency part. The NSCT uses a 4-level decomposition, with the "dmaxflat" directional decomposition filter and the "maxflat" scale decomposition filter, and the decomposition orders are 2, 3, 3, and 4; its fusion rules are the same as for LSWT. The Shearlet method uses the parameters given in [8].
Fig. 2. The contrast of fusion results for multi-focus images (group 1): (a) original image 1, (b) original image 2, (c) LSWT method, (d) NSCT method, (e) Shearlet method, and (f) the proposed method.
The first group of multi-focus image fusion results is shown in Fig. 2, in which Fig. 2(a) is the original clock image with the right part in focus and Fig. 2(b) is the original clock image with the left part in focus. Fig. 2(c)-(f) show the fusion results of the LSWT, NSCT, Shearlet, and proposed methods, respectively. Owing to its translation invariance, LSWT suppresses distortion well in image fusion, but because of its simple selection rule, the sharpness of the images to be fused is not well preserved: in the result, the writing on the dial is very blurred, although the method runs fast.
The NSCT fusion result retains more edge information than LSWT, mainly because the NSCT is more powerful than the wavelet transform at representing image edges. The Shearlet transform, thanks to the unrestricted direction of the shearing operation, can analyze the image in more directions; exploiting its fast coefficient decay and its ability to approximate image edges at any scale, it yields clearer edge detail and better contrast. Building on the LSWT initial fusion, the proposed method selects the sharp regions of the source images through a simple normalized cut and introduces the NSCT. The algorithm is simple and loses almost no sharpness; therefore, it achieves the highest definition and the best edge detail.
Fig. 3. The contrast of fusion results for multi-focus images (group 2): (a) original image 1, (b) original image 2, (c) LSWT method, (d) NSCT method, (e) Shearlet method, and (f) the proposed method.
The second group of multi-focus image fusion results is shown in Fig. 3, where Fig. 3(a) and (b) are the original images and Fig. 3(c)-(f) show the fusion results of the LSWT, NSCT, Shearlet, and proposed methods, respectively. All of these methods largely eliminate the focus difference of the source images and obtain an image in which both foreground and background are clearer. However, careful comparison shows that the proposed algorithm has the best fusion performance. The LSWT result is less sharp and exhibits more obvious spurious information than the source images, especially in richly textured areas. In the NSCT result, thanks to translation invariance, false information is well suppressed. Compared with the proposed method, the Shearlet result has noticeably lower contrast, and its fused targets are not clear. The proposed algorithm also operates in the translation-invariant NSCT domain, which suppresses false artifacts well. Overall, the proposed method has clear advantages in both definition and artifact suppression.
Fig. 4 compares local crops of the images obtained by all the fusion methods. It can be clearly seen that the result of the proposed method not only preserves the edge characteristics but also overcomes edge oscillation, achieving the best effect.
In order to objectively evaluate the quality of the fused images, this paper selects information entropy (IE), mutual information (MI), peak signal-to-noise ratio (PSNR), edge information retention ($Q^{AB/F}$), and execution time T (in seconds) as evaluation indicators.
Fig. 4. The contrast of fusion results for local zoomed images: (a) LSWT method, (b) NSCT method, (c) Shearlet method, and (d) the proposed method.
Image information entropy is an important indicator of how rich the image information is, defined as:

$$IE=-\sum_{i=0}^{L-1}p_i\log_2 p_i \quad (5)$$

where L is the number of gray levels of the image F and $p_i$ is the distribution probability of gray level i.
Mutual information measures the statistical correlation between two images. The mutual information between the reference image A and the fused image F is:

$$MI_{AF}=\sum_{i=0}^{L-1}\sum_{j=0}^{L-1}h_{AF}(i,j)\log_2\frac{h_{AF}(i,j)}{h_A(i)\,h_F(j)} \quad (6)$$

where $h_{AF}(i,j)$ is the normalized joint histogram of images A and F, $h_A(i)$ and $h_F(j)$ are the normalized histograms of the two images, and L is the number of gray levels. Similarly, the mutual information between image B and the fused image F is denoted $MI_{BF}$. The result of image fusion is evaluated by the sum of the mutual information between each source image and the fused image, that is:

$$MI=MI_{AF}+MI_{BF} \quad (7)$$
The peak signal-to-noise ratio mainly measures the fidelity of an image and is defined as:

$$PSNR=10\log_{10}\frac{255^2}{MSE} \quad (8)$$

where MSE is the mean squared error between the reference image and the fused image. The higher the PSNR value, the better the fusion effect.
Edge information retention ($Q^{AB/F}$), proposed by Xydeas and Petrovic [9], evaluates the amount of edge information transferred from the source images to the fused image.
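For reference, the three simpler indicators can be computed for 8-bit images as in the following sketch of the standard definitions in Eqs. (5)-(8); this is not the authors' evaluation code, and $Q^{AB/F}$ is omitted because it requires the full Xydeas-Petrovic edge model.

```python
import numpy as np

def information_entropy(img):
    """IE of Eq. (5) for an 8-bit image."""
    p = np.bincount(img.ravel().astype(np.uint8), minlength=256) / img.size
    p = p[p > 0]                          # drop empty bins before the log
    return float(-(p * np.log2(p)).sum())

def mutual_information(a, f):
    """MI_AF of Eq. (6) from the normalized joint histogram of A and F."""
    h_af, _, _ = np.histogram2d(a.ravel(), f.ravel(),
                                bins=256, range=[[0, 256], [0, 256]])
    h_af /= h_af.sum()
    h_a, h_f = h_af.sum(axis=1), h_af.sum(axis=0)   # marginal histograms
    nz = h_af > 0
    return float((h_af[nz] * np.log2(h_af[nz] / np.outer(h_a, h_f)[nz])).sum())

def psnr(ref, f):
    """PSNR of Eq. (8); 255 is the peak value of an 8-bit image."""
    mse = np.mean((ref.astype(float) - f.astype(float)) ** 2)
    return float(10.0 * np.log10(255.0 ** 2 / mse))
```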
It can be seen from Table 1 that the proposed method is the best on all four indicators, IE, MI, PSNR, and $Q^{AB/F}$. Moreover, it consumes only slightly more running time than the LSWT method, far less than the other two multi-scale geometric analysis fusion methods. This shows that the proposed method also has an advantage in computational complexity.
4. Conclusion
In this paper, aiming at the characteristics of multi-focus images, an improved fusion method is proposed. First, the lifting stationary wavelet transform is used to quickly obtain an initial fused image. After a simple normalized cut of the initial fused image, the sum of the absolute values of the high-frequency NSCT coefficients is used as the fusion rule. The experimental results show that the method loses no sharpness and avoids the complex selection of fusion rules.
Acknowledgement
This work was supported by the Chongqing Municipal Education Commission Foundation and Frontier Research Project (No. KJ1500635), the National Natural Science Foundation of China (No. 31501229), the Initial Scientific Research Fund of Young Teachers in Chongqing Technology and Business University (No. 2013-56-06), and the Chongqing Natural Science Foundation for Fundamental Science and Frontier Technologies (csct2015jcyjA40014, cstc2018jcyjAX0483).