1. Introduction
Infrared sensors play an important role in capturing real scenes, but the resulting images are characterized by low resolution and a low signal-to-noise ratio. Visible light sensors can reflect scene details clearly under suitable conditions, but their imaging is susceptible to natural conditions such as illumination and weather [1].
Currently, infrared and visible light image fusion is mostly based on various improved multi-scale analysis methods. Meng et al. [2] proposed an infrared and visible image fusion method based on object region detection and the non-subsampled contourlet transform (NSCT); with the proposed model, objects are clarified and details are preserved in the fused image while visual artifacts are suppressed. Zhu et al. [3] proposed an infrared and visible image fusion method based on an improved multi-scale top-hat transform; this model highlights the targets of infrared images and preserves the details of visible images, and in this respect it performs better than certain conventional multi-scale transform methods. Liu et al. [4] proposed an image fusion method combining multi-scale geometric analysis with sparse representation (SR) and obtained better results than either SR or multi-scale geometric analysis alone. Zhang et al. [5] proposed a multi-exposure image fusion algorithm based on time-domain median filtering. Although traditional multi-scale methods have a certain robustness to noise, their performance degrades rapidly when the noise becomes severe. In general, the structural components of infrared and visible light images mainly depict the basic contour structure of the image and are the most susceptible to noise interference; if the fusion rule can filter noise while maintaining the image edges, a clear structural fusion result can be obtained. For the texture components, discarding the components at the smallest scales reduces the effect of noise on the fusion results.
Based on this, this paper proposes an infrared and visible light image fusion method based on variational multi-scale decomposition. The infrared and visible light images are first decomposed into structural components and texture components. The structural components are fused by means of guided filtering. The texture components at the two smallest scales are discarded, and phase consistency, sharpness, and brightness information are combined to construct the fusion weights for the remaining texture components. Finally, the fused structural and texture component information is added together to obtain the fusion result. This method effectively alleviates the influence of noise during fusion while maintaining a clear edge contour structure and achieving a good filtering effect. A block diagram of the fusion method is shown in Fig. 1.
Fig. 1. Block diagram of the proposed method.
2. Variational Multi-scale Decomposition
Variational multi-scale decomposition is frequently adopted to recover real images from noisy images and is an important research topic in the domain of image processing. It can be seen as a decomposition of the image $f$ into $f=u+v$, where $u$ mainly represents the structural component of the image and $v$ mainly represents the texture component [6]. For the input infrared light image $f^{A}$ with the initial scale $\lambda=0.0005$, the following energy functional is defined:
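The equation itself could not be recovered from the source; a standard total variation (ROF-type) functional consistent with the symbols defined below, with the scale parameter $\lambda$ weighting the regularization term, would be:

$$\left(f_{u_{\lambda}}^{A}, f_{v_{\lambda}}^{A}\right)=\underset{f^{A}=u+v}{\arg \inf }\left\{\lambda \int|\nabla u|\, dx+\|v\|_{L^{2}}^{2}\right\}$$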
where $\lambda$ denotes the initial decomposition scale, $f_{u_{\lambda}}^{A}$ denotes the structural component of the infrared light image at the scale $\lambda$, and $f_{v_{\lambda}}^{A}$ denotes the texture component of the infrared light image at the scale $\lambda$. $\arg\inf(\cdot)$ returns the argument that minimizes the energy functional. $f_{u}^{A}$ is the structural component and $f_{v}^{A}$ the texture component of the input infrared light image, and the residual of the image can be expressed as:
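The residual equation is likewise missing from the source; the natural reading, consistent with the component definitions above, is:

$$f_{r_{\lambda}}^{A}=f^{A}-f_{u_{\lambda}}^{A}-f_{v_{\lambda}}^{A}$$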
According to the above rule, the decomposition is performed repeatedly to establish a multi-scale decomposition, yielding the structural components, texture components, and residual component of the infrared light image:
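The multi-scale form is also missing from the source; a reading consistent with the component sums given below, assuming a dyadic scale progression $\lambda_{i}=2^{i} \lambda$ (an assumption, as the source does not state the progression), is:

$$f^{A}=\sum_{i=0}^{m} f_{u_{i}}^{A}+\sum_{i=0}^{m} f_{v_{i}}^{A}+f_{r_{m}}^{A}$$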
Here, m denotes the number of decomposition layers, $f_{u_{m}}^{A}$ and $f_{v_{m}}^{A}$ are the structural and texture components of the infrared light image at the m-th decomposition layer, and $f_{r_{m}}^{A}$ is the residual component of the input infrared image after m-layer decomposition. Similarly, by applying the above steps to the visible light image, the corresponding structural components $f_{u_{m}}^{B}$, texture components $f_{v_{m}}^{B}$, and residual component $f_{r_{m}}^{B}$ of the visible light image can be obtained in the same form.
By discarding the texture components at the two smallest scales, the effect of noise on the fused image is reduced. The overall structural components $f_{u}^{A}$ and $f_{u}^{B}$ of the infrared and visible light images are $f_{u}^{A}=\sum_{i=0}^{m} f_{u_{i}}^{A}$ and $f_{u}^{B}=\sum_{i=0}^{m} f_{u_{i}}^{B}$, respectively, and the overall texture components $f_{v}^{A}$ and $f_{v}^{B}$ are $f_{v}^{A}=\sum_{i=2}^{m} f_{v_{i}}^{A}$ and $f_{v}^{B}=\sum_{i=2}^{m} f_{v_{i}}^{B}$, respectively.
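The decomposition can be illustrated concretely with the following Python sketch, which uses scikit-image's Chambolle total-variation solver as the ROF minimizer. The function name, the fine-to-coarse peeling scheme, and the dyadic scale progression are assumptions for illustration, not the paper's reference implementation.

```python
import numpy as np
from skimage.restoration import denoise_tv_chambolle  # TV (ROF-type) solver

def variational_multiscale(f, m=4, lam0=0.0005):
    """Sketch of the variational multi-scale decomposition of Section 2.

    At each level the current structure image is smoothed again with a
    doubled scale; the part removed by the smoothing is kept as the
    texture component of that level (level 0 is the finest scale).
    """
    u, v = [], []
    cur, lam = np.asarray(f, dtype=np.float64), lam0
    for i in range(m + 1):
        u_i = denoise_tv_chambolle(cur, weight=lam)  # structure at scale i
        v.append(cur - u_i)                          # texture at scale i
        u.append(u_i)
        cur, lam = u_i, 2.0 * lam                    # smooth further at a coarser scale
    r = cur  # coarsest remainder, playing the role of the residual component
    # Overall texture used by the fusion stage: drop the two finest
    # (most noise-prone) scales i = 0, 1, as described above.
    f_v = np.sum(v[2:], axis=0)
    return u, v, r, f_v
```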
3. Structural Component Fusion
Guided filtering is a filtering algorithm based on a local linear model, proposed by He et al. [7] in 2013. It is a local multi-point filter and an effective edge-preserving filter. Its main advantages are that it maintains image edges while filtering and that its algorithmic complexity is linear, making it efficient. Therefore, it is an ideal choice for constructing the fusion rule for the structural components.
It is assumed that, in a window $\omega_{k}$ of size $(2 r+1) \times(2 r+1)$, the output O of the filter is represented as a linear transformation of the guidance image I:
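The equation is missing from the source; the standard local linear model of He et al. [7], with P denoting the input image to be filtered, is:

$$O_{i}=a_{k} I_{i}+b_{k}, \quad \forall i \in \omega_{k}, \qquad a_{k}=\frac{\frac{1}{|\omega|} \sum_{i \in \omega_{k}} I_{i} P_{i}-\mu_{k} \bar{P}_{k}}{\sigma_{k}^{2}+\zeta}, \qquad b_{k}=\bar{P}_{k}-a_{k} \mu_{k}$$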
In the fusion process, the structural components at different scales are fused separately to obtain the fused structural components at each scale. Let $f_{u}^{A}$ and $f_{u}^{B}$ be the structural components of the infrared and visible light images at a certain scale; $f_{u}^{A}$ and $f_{u}^{B}$ are then compared pixel by pixel to obtain the weight maps $w_{1}$ and $w_{2}$:
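The comparison rule itself is missing from the source; a typical per-pixel choice consistent with the description would be:

$$w_{1}(x, y)=\begin{cases}1, & \left|f_{u}^{A}(x, y)\right| \geq\left|f_{u}^{B}(x, y)\right| \\ 0, & \text { otherwise }\end{cases}, \qquad w_{2}(x, y)=1-w_{1}(x, y)$$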
The weight maps obtained by this per-pixel comparison are noisy and not well aligned with the edges. Therefore, guided filtering is performed on the weight maps $w_{1}$ and $w_{2}$, with $f_{u}^{A}$ and $f_{u}^{B}$ acting as the guidance images for the infrared and visible branches respectively, to obtain the new weight maps $m_{1}$ and $m_{2}$:
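The corresponding equation is missing; writing $G_{r, \zeta}(\cdot, \cdot)$ for guided filtering of a weight map under a guidance image (notation introduced here, not from the source):

$$m_{1}=G_{r, \zeta}\left(w_{1}, f_{u}^{A}\right), \qquad m_{2}=G_{r, \zeta}\left(w_{2}, f_{u}^{B}\right)$$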
The filter kernel is expressed as:
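The kernel equation is missing from the source; the explicit kernel of the guided filter given by He et al. [7] is:

$$W_{i j}(I)=\frac{1}{|\omega|^{2}} \sum_{k:(i, j) \in \omega_{k}}\left(1+\frac{\left(I_{i}-\mu_{k}\right)\left(I_{j}-\mu_{k}\right)}{\sigma_{k}^{2}+\zeta}\right)$$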
where r is the radius of the guided filter window and is set to r=3, $\zeta$ is the regularization parameter of the guided filter and is set to $\zeta=10^{-6}$, and $\mu_{k}$ and $\sigma_{k}^{2}$ denote the mean and variance of the guidance image within the window, respectively. With the new weight maps $m_{1}$ and $m_{2}$, the fused structural component $f_{u}$ is obtained:
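The fusion equation is missing; the natural weighted combination, assuming the filtered weights are normalized to sum to one, is:

$$f_{u}=m_{1} f_{u}^{A}+m_{2} f_{u}^{B}$$

As a concrete illustration, the following Python sketch implements this structural fusion step with a box-filter-based guided filter. The absolute-value comparison rule and the weight normalization are the assumptions stated above, not the paper's reference implementation.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(I, P, r=3, zeta=1e-6):
    """Guided filter (He et al. [7]): filter P under guidance image I."""
    mean = lambda x: uniform_filter(x, size=2 * r + 1)  # box mean over the window
    mu_I, mu_P = mean(I), mean(P)
    var_I = mean(I * I) - mu_I ** 2          # sigma_k^2 of the guidance image
    cov_IP = mean(I * P) - mu_I * mu_P
    a = cov_IP / (var_I + zeta)              # per-window linear coefficients
    b = mu_P - a * mu_I
    return mean(a) * I + mean(b)             # O = mean(a) * I + mean(b)

def fuse_structure(fu_A, fu_B, r=3, zeta=1e-6):
    """Sketch of the structural component fusion of Section 3."""
    w1 = (np.abs(fu_A) >= np.abs(fu_B)).astype(np.float64)  # assumed comparison rule
    w2 = 1.0 - w1
    m1 = guided_filter(fu_A, w1, r, zeta)    # refine weights, guided by f_u^A
    m2 = guided_filter(fu_B, w2, r, zeta)    # refine weights, guided by f_u^B
    s = m1 + m2 + 1e-12                      # normalize so the weights sum to one
    return (m1 / s) * fu_A + (m2 / s) * fu_B
```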
4. Texture Component Fusion
The texture components contain a large amount of texture and detail information of the image. Considering the relationship between the strength of the current subband coefficient and the neighboring subband coefficients in the same region, phase consistency information is computed on the subbands of the infrared and visible light texture components, and it is combined with definition (sharpness) and image brightness information to calculate the fusion weights of the texture components.
First, for the texture component subbands of the infrared and visible light images, the phase consistency information is obtained as:
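The equation is missing from the source; a phase congruency measure of the standard (Kovesi-style) form, consistent with the symbols defined below, would be:

$$M_{A}^{m}(x, y)=\frac{\sum_{\theta} E_{A, \theta}^{m}(x, y)}{\varepsilon+\sum_{\theta} \sum_{n} S_{A, n, \theta}^{m}(x, y)}, \qquad M_{B}^{m}(x, y)=\frac{\sum_{\theta} E_{B, \theta}^{m}(x, y)}{\varepsilon+\sum_{\theta} \sum_{n} S_{B, n, \theta}^{m}(x, y)}$$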
where m is the decomposition scale of the texture component subband, and $M_{A}^{m}(x, y)$ and $M_{B}^{m}(x, y)$ are the phase consistency information of the infrared and visible light images at scale m at the point $(x, y)$; $E_{A, \theta}^{m}(x, y)$ and $E_{B, \theta}^{m}(x, y)$ denote the local energy information in the Fourier domain of the texture component subbands of the infrared and visible light images at scale m and direction angle $\theta$ at the point $(x, y)$; $S_{A, n, \theta}^{m}(x, y)$ and $S_{B, n, \theta}^{m}(x, y)$ denote the local amplitude information in the Fourier domain of the infrared and visible light images at the point $(x, y)$ in the subband at scale m and direction angle $\theta$; and $\varepsilon$ is a very small positive value adopted to prevent the denominator from being zero. A sliding window $\phi$ is used; as the window traverses the image, let $(x, y)$ denote its center. The definition information of the texture component subbands of the visible and infrared light images within the window is calculated as:
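The definition (sharpness) equation is missing from the source; one common choice consistent with the description, aggregating a per-pixel clarity measure $C^{m}\left(x_{0}, y_{0}\right)$ (here assumed to be the gradient energy of the subband) over the window, would be:

$$D_{A}^{m}(x, y)=\sum_{\left(x_{0}, y_{0}\right) \in \phi} C_{A}^{m}\left(x_{0}, y_{0}\right), \qquad C_{A}^{m}\left(x_{0}, y_{0}\right)=\left|\nabla I_{h, A}^{m}\left(x_{0}, y_{0}\right)\right|^{2}$$

with $D_{B}^{m}$ and $C_{B}^{m}$ defined analogously; the aggregate notation $D$ is introduced here for clarity and is not from the source.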
$I_{h, A}^{m}(x, y)$ and $I_{h, B}^{m}(x, y)$ are the texture component subbands at scale m of the infrared and visible light images at the point (x, y), respectively. $C_{A}^{m}\left(x_{0}, y_{0}\right)$ and $C_{B}^{m}\left(x_{0}, y_{0}\right)$ denote the amount of definition information of the texture component subbands at scale m at the pixel $\left(x_{0}, y_{0}\right)$, where $\left(x_{0}, y_{0}\right)$ is an arbitrary point within the sliding window $\phi$. In this paper, based on empirical experience, the window $\phi$ is set to $11 \times 11$.
Through a comprehensive analysis of phase consistency together with factors such as definition and image brightness, the rules for the subband fusion of the texture components are obtained as follows. The activity level of a texture component subband is defined as:
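The equation is missing from the source; a product form consistent with the exponents and brightness term described below would be:

$$H_{A}^{m}(x, y)=\left[M_{A}^{m}(x, y)\right]^{\alpha_{1}} \cdot\left[C_{A}^{m}(x, y)\right]^{\beta_{1}} \cdot\left|I_{A}^{m}(x, y)\right|^{\gamma_{1}}$$

with $H_{B}^{m}$ defined analogously.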
$H_{A}^{m}(x, y)$ and $H_{B}^{m}(x, y)$ denote the activity levels of the texture components of the infrared and visible light images at scale m, and $\left|I_{A}^{m}(x, y)\right|^{\gamma_{1}}$ and $\left|I_{B}^{m}(x, y)\right|^{\gamma_{1}}$ denote the brightness of the texture component subbands. $\alpha_{1}$, $\beta_{1}$, and $\gamma_{1}$ denote the weights of the phase consistency, sharpness, and image brightness information, respectively. Based on empirical values, the weight coefficients in the experiments were set to 0.001, 1, and 1, respectively.
Then the weights for fusing the texture components of the infrared and visible light images are calculated:
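The weight equation is missing from the source; a typical selection rule consistent with the surrounding definitions, with $w^{m}$ introduced here as the fusion weight (not the source's notation), would be:

$$w^{m}(x, y)=\begin{cases}1, & Q S_{A}^{m, p}(x, y)>Q S_{B}^{m, p}(x, y) \\ 0, & \text { otherwise }\end{cases}$$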
$Q S_{A}^{m, p}(x, y)$ and $Q S_{B}^{m, p}(x, y)$ are the numbers of active pixels within the sliding window centered at the point (x, y) at scale m of the infrared and visible light images, where $x \times y$ is the size of the sliding window, and the expressions of $Q S_{A}^{m, p}(x, y)$ and $Q S_{B}^{m, p}(x, y)$ are:
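The expressions are missing from the source; counting, over the sliding window, the pixels at which each image is the more active one (with $[\cdot]$ the indicator function) gives:

$$Q S_{A}^{m, p}(x, y)=\sum_{(a, b) \in \phi}\left[H_{A}^{m}(a, b)>H_{B}^{m}(a, b)\right], \qquad Q S_{B}^{m, p}(x, y)=\sum_{(a, b) \in \phi}\left[H_{B}^{m}(a, b) \geq H_{A}^{m}(a, b)\right]$$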
Finally, the texture component fusion is performed:
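The fusion equation is missing from the source; a selection consistent with the weight defined above would be:

$$f_{v}^{m}(x, y)=w^{m}(x, y) I_{A}^{m}(x, y)+\left[1-w^{m}(x, y)\right] I_{B}^{m}(x, y)$$

The following Python sketch illustrates this texture fusion rule at one scale. The phase consistency maps MA and MB are taken as inputs (they can be computed with any phase congruency implementation); the gradient-energy clarity measure and the counting rule are the assumptions stated above.

```python
import numpy as np
from scipy.ndimage import uniform_filter, sobel

def fuse_texture_subband(IA, IB, MA, MB, win=11, a1=0.001, b1=1.0, g1=1.0):
    """Sketch of the texture subband fusion rule of Section 4."""
    def clarity(I):  # assumed definition measure: windowed gradient energy
        g2 = sobel(I, axis=0) ** 2 + sobel(I, axis=1) ** 2
        return uniform_filter(g2, size=win)

    # Activity level: phase consistency, clarity, and brightness terms
    HA = MA ** a1 * clarity(IA) ** b1 * np.abs(IA) ** g1
    HB = MB ** a1 * clarity(IB) ** b1 * np.abs(IB) ** g1
    # Fraction of window pixels where each image is the more active one
    QSA = uniform_filter((HA > HB).astype(np.float64), size=win)
    QSB = 1.0 - QSA
    w = (QSA > QSB).astype(np.float64)       # select the more active source
    return w * IA + (1.0 - w) * IB
```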
5. Fusion Results
In this experiment, the simulation platform used an Intel Xeon E3-1231 v3 CPU @ 3.40 GHz with 16 GB of memory, running Windows 7, and the programming language was MATLAB 2010a. Due to the limited length of the paper, only one set of infrared and visible images is adopted to present the experimental results. Gaussian white noise with standard deviations of 10 and 20 was artificially added to all images. Transform-domain methods naturally introduce a denoising process into the fusion process and thus have a certain denoising capability; therefore, the proposed method is mainly compared with transform-domain fusion methods. Three representative transform-domain methods are compared with the proposed method, namely the NSCT fusion method, the SR fusion method, and the Shearlet transform (ST) fusion method. The numbers of decomposition levels of the discrete wavelet transform (DWT), NSCT, and ST are set to 4, and the db4 wavelet basis is used in the DWT method. In the NSCT method, the directional filter is set to "vk", the pyramid decomposition filter is set to "pyrexc", and the numbers of directions of the four decomposition layers are 4, 8, 8, and 16, respectively. In the SR method, the image block size is set to $8 \times 8$, the reconstruction error is set to 0.1, K-singular value decomposition (K-SVD) is used to train the dictionary, and the dictionary size is 256. In terms of fusion rules, the NSCT, SR, and ST methods adopt the fusion rules in [8], [9], and [10], respectively.
Fig. 2 shows the experimental results when the noise standard deviation is 10. Fig. 2(a) and 2(b) are the original images of the third group of samples to be fused; Fig. 2(c) and 2(d) show the third group of samples after Gaussian white noise with a standard deviation of 10 is added; Fig. 2(e)–2(h) show the fusion results of the three comparison methods and of the proposed method. It can be seen that, since the amount of added noise is small, the fusion results of the several methods are all relatively ideal: they basically maintain the edges and details, and all display a certain capacity for noise suppression. The results of the NSCT method, the Shearlet method, and the proposed method are relatively smooth and give relatively better visual effects.
Fig. 2. Comparison of the third group of fusion results (noise standard deviation is 10): (a) original visible light image, (b) original infrared light image, (c) noisy visible light image, (d) noisy infrared light image, (e) NSCT method fusion result, (f) SR method fusion result, (g) Shearlet method fusion result, and (h) fusion result of the proposed method.
To highlight the noise suppression effect of the proposed method, the standard deviation of the added Gaussian white noise is increased to 20 while everything else remains unchanged, as shown in Fig. 3. Since serious noise cannot be effectively suppressed by most existing multi-scale fusion methods, the proposed method adopts guided filtering to set the fusion rules for the structural components, which effectively removes noise pollution at the edges, and discards the texture components at the two smallest scales; it therefore shows the best noise suppression performance among the compared methods. It can be seen that the fusion results of the NSCT, SR, and ST methods are very poor, with noise particles heavily contaminating the structural edges and texture information of the fusion results. The fusion result of the proposed method obviously outperforms the other transform-domain fusion methods: it effectively suppresses the noise while retaining the edge and texture information, and it has the best visual effect. This can be seen clearly in the partially enlarged views of the fusion results of the four methods (Fig. 4): the image obtained by the proposed method looks significantly smoother than those obtained by the other methods, and its edge and texture information is much better preserved. In summary, the visual effect of the proposed method is better than that of traditional transform-domain methods in the presence of noise pollution, especially when the noise standard deviation reaches 20; that is, the advantage of the proposed fusion method is more prominent when the original images are seriously polluted by noise.
Fig. 3. Comparison of the third group of fusion results (noise standard deviation is 20): (a) original visible light image, (b) original infrared light image, (c) noisy visible light image, (d) noisy infrared light image, (e) NSCT method fusion result, (f) SR method fusion result, (g) Shearlet method fusion result, and (h) fusion result of the proposed method.
Fig. 4. Partially enlarged views of the fusion results in Fig. 3.
In terms of objective evaluation, five common fusion indices are adopted to evaluate the objective quality of the various fusion methods, covering information theory (the mutual information index), image features (the gradient-based fusion index), structure (the fusion index based on structural similarity), and human visual sensitivity. The objective evaluation results of the quality of the image fusion results are shown in Table 1.
From Table 1, it can be seen that when Gaussian white noise with a standard deviation of 10 is added, the proposed method shows some advantages in the fusion index comparison, followed by the SR fusion method and the NSCT fusion method. When the noise is increased to a standard deviation of 20, however, the advantages of the proposed method become obvious in terms of the gradient, structural similarity, visual sensitivity, and mutual information indices. The merits of the proposed method over the traditional transform-domain methods are obvious when dealing with noise-contaminated source images. Meanwhile, since the complexity of the guided filter is linear regardless of the window size, the method is efficient. Considering both subjective and objective evaluations, the proposed method is applicable to the fusion of noisy infrared and visible light images and has certain advantages over traditional transform-domain fusion methods.
6. Conclusions
In this paper, an infrared and visible light image fusion method based on variational multi-scale decomposition is proposed. Guided filtering is used to fuse the structural components, and the fusion weight rules for the texture components are constructed from phase consistency, definition, and brightness information. Compared with traditional infrared and visible image fusion methods, the proposed method not only effectively overcomes noise interference in the fusion process but also obtains better texture details while maintaining the edge structure, and it achieves good subjective and objective quality. However, the overall algorithm is time-consuming, and it is difficult to meet high real-time requirements; future work therefore needs to further improve the computational efficiency of the algorithm.
Acknowledgement
This work was supported by the National Natural Science Foundation of China (No. 31501229, 61861025), the Chongqing Natural Science Foundation for Fundamental Science and Frontier Technologies (No. cstc2018jcyjAX0483), the Key Laboratory of Chongqing Technology and Business University (No. KFJJ2019076), and the Science and Technology Research Program of the Chongqing Education Commission of China (No. KJQN201900821).