1. Introduction
Single-channel blind source separation (SCBSS) is a technique used to isolate source signals from a single-channel mixed signal [1]. The mixing methods of such signals mainly include linear mixing and convolutional mixing, and SCBSS plays a vital role in denoising and restoration processes across various fields, including medical research [2], image processing [3], speech processing [4], video processing [5], and traffic signals [6].
Historically, traditional SCBSS algorithms have primarily addressed the linear mixing mode, for example nonnegative matrix factorization (NMF) [7] and independent component analysis (ICA) [8]. These methods, grounded in the principles of linear equations, are proficient at recovering source signals from linearly mixed signals. However, they fall short when recovering source signals from convolutionally mixed signals, which present a more complex challenge than linear mixing. To address this gap, researchers have further proposed regression-based methods [9], which learn the complex mapping between the mixed signal and the source signals through the robust learning capabilities of deep neural networks. Despite their effectiveness, a significant limitation of these methods is their dependency on a predefined convolution mixing matrix: any alteration in the mixing matrix renders a trained model incapable of separating a new test set, highlighting a critical area for further research and development in SCBSS.
Several lines of work have shaped SCBSS. Fan et al. [10] and Stoller et al. [11] proposed blind deconvolution algorithms to recover the source signal. Building upon this, Lin and Gao [12] introduced a blind source separation (BSS) algorithm combined with high-order spectra; while it exhibits some capability in separating convolutionally mixed signals, it is marked by high computational demands and inefficiency. Meanwhile, convolutional neural networks [13] and fully connected neural networks [14] have been explored for separating mixed source signals and facilitating blind deconvolution. In audio, recurrent neural networks [15] have shown proficiency in separating speech from mixed noise, and autoencoders [16] have been employed for supervised separation of source signals. After the generative adversarial network (GAN) was proposed in 2014 [17], Subakan and Smaragdis [18] presented a GAN-based SCBSS algorithm in the audio field; however, this method requires prior knowledge of the mixing matrix type and assumes that the mixing matrix and the source signal share the same distribution for training. Addressing the challenge of an unknown mixing matrix, Kong et al. [19] proposed a synthesis-decomposition (S-D) algorithm utilizing deep convolutional generative adversarial networks (DCGAN). This approach, which does not require prior knowledge of the convolutional mixing matrix, has achieved notable success, but it handles only source image separation for convolutionally mixed images. For the separation of multichannel source signals, Liu et al. [20] estimated the source signal and mixing matrix through a reconstruction approach based on the minimum error of the observation signal and a Bayesian maximum a posteriori estimation method.
In practical composite image restoration, source images are often combined through various mixing methods. To solve this problem, a closed-loop triple generative adversarial network (TriGAN) structure is constructed in this paper, grounded in the dual learning concept. It learns the mapping relationship between the composite image and the source images, overcoming the disparities in the underlying mathematical models of source image separation caused by different mixing methods. The discriminator continuously feeds information back to the generator until the generator reaches the optimal solution. Unlike previous models, TriGAN's discriminator calculates the loss at the granularity of 1 × 1 pixel blocks, using the least-squares method to measure the difference between pixel blocks of the generated image and the source image. In this way, SCBSS of differently mixed images is realized, thereby enhancing source image restoration. The rest of the paper is organized as follows. Section 2 presents mathematical models of the two main mixing approaches in SCBSS. Section 3 elucidates the functioning of the TriGAN discriminator and outlines its training procedure. Section 4 demonstrates the efficiency of the proposed discriminator under various conditions, and the experimental results in its second part reveal the effectiveness of TriGAN. Section 5 gives the conclusion.
2. Mathematical Model of SCBSS
Through extensive research efforts over the years, SCBSS models can be classified into two types: the linear mixing model and the convolution mixing model. The linear mixing model is widely recognized as the more common and mature method within the SCBSS domain. Notably, other complex mixing methods can be transformed into this model through mathematical transformation. The algorithm presented in this paper focuses on scenarios involving two source images, denoted as [TeX:] $$S=\left\{S_1, S_2\right\}.$$ These source images [TeX:] $$S_1 \text{ and } S_2$$ undergo a linear addition process to form a linearly mixed image X. The mathematical model representing this linear combination is expressed as follows:

[TeX:] $$X=A S,$$

where the mixing matrix is represented by A.
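As a minimal illustration of this model (not code from the paper), the following sketch forms a linear mixture of two grayscale source images; the mixing coefficients in A are assumed values:

```python
import numpy as np

# Illustrative sketch: linearly mixing two grayscale source images
# S1, S2 with an assumed 1x2 mixing matrix A, giving X = AS.
rng = np.random.default_rng(0)
S1 = rng.random((28, 28))    # stand-ins for the two source images
S2 = rng.random((28, 28))

A = np.array([0.6, 0.4])     # assumed mixing coefficients
X = A[0] * S1 + A[1] * S2    # pixel-wise linear combination
```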
Contrasting with linear mixing, convolution mixing presents a more complex scenario in SCBSS. The key distinction lies in its mixing approach: rather than a straightforward linear relationship, convolution mixing involves a matrix convolution operation. When the mixed image is the sole observational image X, the convolutionally mixed SCBSS mathematical model can be expressed as:

[TeX:] $$X=\propto * S, \quad S \in R^d,$$

where the symbol "*" denotes the convolution operation, [TeX:] $$R^d$$ is the Euclidean space, and [TeX:] $$\propto$$ denotes the convolution mixing matrix. When only the observed image X is available, an SCBSS algorithm normally solves for the remaining unknowns among the mixing matrix A, the convolution mixing matrix [TeX:] $$\propto$$, and the source image S by assuming one of them. This process often presupposes knowledge of the mixing matrix type, utilizing it to resolve for the source image S.
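A corresponding sketch of convolutive mixing is shown below (again illustrative, not from the paper); the two blur kernels k1 and k2 are assumed stand-ins for the convolution mixing matrix:

```python
import numpy as np
from scipy.signal import convolve2d

# Illustrative sketch: a convolutive mixture X = k1*S1 + k2*S2,
# where "*" is 2-D convolution and k1, k2 are assumed kernels.
rng = np.random.default_rng(0)
S1, S2 = rng.random((28, 28)), rng.random((28, 28))
k1 = np.ones((3, 3)) / 9.0            # box blur (assumption)
k2 = np.array([[0., 1., 0.],
               [1., 2., 1.],
               [0., 1., 0.]]) / 6.0   # smoothing kernel (assumption)
X = convolve2d(S1, k1, mode="same") + convolve2d(S2, k2, mode="same")
```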
The solution process varies with the type of the mixing matrix, and this variability poses a challenge for SCBSS algorithms in separating source images under multiple mixing methods. This paper endeavors to transcend these mathematical model constraints of SCBSS. By leveraging the inherent feature information of the blended image, the study aims to surmount the variations inherent in mixing methods and accomplish comprehensive source image separation, thereby enhancing the versatility of the SCBSS algorithm in the realm of image restoration.
3. Proposed TriGAN
Building upon the foundation of the dual learning generative adversarial network (DualGAN), TriGAN consists of three GANs and involves three image domains, diverging from the traditional approach of handling the direct transformation between two image domains. TriGAN employs a cyclic network structure to facilitate the learning of mapping relationships across these domains: from the visible image domain X to source image domain [TeX:] $$S_1,$$ from source image domain [TeX:] $$S_1$$ to source image domain [TeX:] $$S_2,$$ and from source image domain [TeX:] $$S_2$$ back to the visible image domain X.
The generators of TriGAN retain the structural essence of the original GAN, but the operational mode of the discriminators is redefined. The generators, denoted [TeX:] $$G_{X \rightarrow S_1}, G_{S_1 \rightarrow S_2}, \text { and } G_{S_2 \rightarrow X},$$ are mirrored by their corresponding discriminators [TeX:] $$D_{X \rightarrow S_1}, D_{S_1 \rightarrow S_2}, \text { and } D_{S_2 \rightarrow X}.$$ The network architecture of the three generators is identical, with each generator having the same number of downsampling and upsampling layers. These mirrored downsampling and upsampling layers, together with skip connections, form a U-shaped structure that facilitates the sharing of low-level information between the input and the generated image, enabling rapid convergence of the generators. The three discriminators likewise share one structure; however, instead of using the DualGAN discriminators, the new discriminator operates over each pixel block of the whole image, creating a loss between blocks. This captures high-frequency features more effectively on a pixel-by-pixel basis, making full use of the texture, color, and style information inherent in the visible image.
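To make the per-pixel discriminator concrete, here is a hedged PyTorch sketch of a discriminator built entirely from 1 × 1 convolutions, so that it emits one decision per pixel; the layer widths are assumptions, not the paper's specification:

```python
import torch
import torch.nn as nn

# Sketch of a discriminator that scores every 1x1 pixel block, in the
# spirit of the per-pixel design described above (layer widths assumed).
class PixelDiscriminator(nn.Module):
    def __init__(self, in_channels=1):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=1),  # 1x1 convs only:
            nn.LeakyReLU(0.2, inplace=True),            # the receptive field
            nn.Conv2d(64, 128, kernel_size=1),          # never exceeds 1 pixel
            nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(128, 1, kernel_size=1),           # one score per pixel
        )

    def forward(self, x):
        return self.net(x)  # shape (N, 1, H, W): a decision per pixel

D = PixelDiscriminator()
scores = D(torch.randn(4, 1, 28, 28))  # 28x28 input -> 28x28 score map
```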
To help the deep network learn the mapping relationship between two image domains more effectively, the discriminators [TeX:] $$D_{X \rightarrow S_1}, D_{S_1 \rightarrow S_2}, \text { and } D_{S_2 \rightarrow X}$$ adopt the core principle of the least-squares method to calculate the error between each pixel of the generated image F and the real image R. The total error over the pixel blocks is calculated during training as follows:

[TeX:] $$E=\sum_i\left(F_i-R_i\right)^2,$$

where the real image [TeX:] $$R \in\left\{X, S_1, S_2\right\},$$ F is the image produced by the corresponding one of the three generators, and [TeX:] $$F_i \text{ and } R_i$$ denote their i-th pixel values.
At initialization, the generator produces images from a random Gaussian distribution, so the errors occur randomly and fluctuate around the true value. The smaller the total error, the closer the generated image is to the true value; the minimal total error is obtained where the derivative of E with respect to R equals 0.
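The following minimal sketch (an assumption-free restatement of the formula above, not the paper's code) computes this total squared error between a generated image and a real image:

```python
import numpy as np

# Minimal sketch of the per-pixel least-squares total error
# E = sum_i (F_i - R_i)^2 between generated image F and real image R.
def total_squared_error(F: np.ndarray, R: np.ndarray) -> float:
    return float(np.sum((F - R) ** 2))
```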
TriGAN uses this concept as its loss function, replacing the loss function of DualGAN. Following the least-squares formulation, the objective functions of TriGAN for each discriminator D and its generator G can be expressed as follows:

[TeX:] $$\min _D V(D)=\frac{1}{2} \mathbb{E}_{R \sim p_{\text {data }}}\left[(D(R)-b)^2\right]+\frac{1}{2} \mathbb{E}_{F \sim p_g}\left[(D(F)-a)^2\right],$$

[TeX:] $$\min _G V(G)=\frac{1}{2} \mathbb{E}_{F \sim p_g}\left[(D(F)-c)^2\right],$$

where a and b are the codings for the generated and real data, respectively, and c is the value that the generator wants the discriminator to assign to generated data.
In the discriminator's objective function, the real data and the generated data are encoded as b and a, respectively. The discriminators [TeX:] $$D_{X \rightarrow S_1}, D_{S_1 \rightarrow S_2}, \text { and } D_{S_2 \rightarrow X}$$ calculate the loss per pixel, and once their objective functions reach an optimal value, the generators are fine-tuned to create images increasingly akin to the domains [TeX:] $$X, S_1, S_2.$$ The new discriminator's loss function captures the distance of an image from the decision boundary, assigning more distant data a penalty term proportional to that distance. Therefore, for the discriminator's gradient to converge to zero, the generated image must closely approximate the real image's position. By replacing DualGAN's loss function with this method, TriGAN mitigates instability issues. Training in TriGAN begins by fixing the generator and training the discriminator to minimize [TeX:] $$V(D).$$
The calculated optimal solution of the discriminator is:

[TeX:] $$D^*(x)=\frac{b p_{\text {data }}(x)+a p_g(x)}{p_{\text {data }}(x)+p_g(x)},$$

where [TeX:] $$p_{\text {data }} \text{ and } p_g$$ denote the distributions of the real and generated data, respectively.
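A hedged sketch of these objectives in PyTorch is given below; the codings a (fake), b (real), and c (the value G wants D to output) follow the common choice a = 0, b = 1, c = 1, which is an assumption rather than a value stated in the paper:

```python
import torch

# Hedged sketch of the least-squares objectives above; the coding
# values are assumptions, not taken from the paper.
a, b, c = 0.0, 1.0, 1.0

def d_loss(D, real, fake):
    # 0.5*E[(D(R)-b)^2] + 0.5*E[(D(F)-a)^2], averaged over every pixel score
    return 0.5 * ((D(real) - b) ** 2).mean() \
         + 0.5 * ((D(fake.detach()) - a) ** 2).mean()

def g_loss(D, fake):
    # 0.5*E[(D(F)-c)^2]: penalizes fakes in proportion to their distance
    # from the decision boundary, so the generator's gradient never saturates
    return 0.5 * ((D(fake) - c) ** 2).mean()
```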
Once the discriminator attains its optimal state, it is fixed, and the generator is trained until its objective function also reaches an optimal solution.
The training procedure for TriGAN is summarized in Algorithm 1.
Algorithm 1. TriGAN training procedure
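The sketch below restates the alternating procedure of Algorithm 1 as described in the text (fix the generators, train the discriminators, then the reverse) for the cyclic X → S1 → S2 → X structure. All names, the optimizer, and the hyperparameters are illustrative assumptions; it relies on the `d_loss`/`g_loss` sketches above and a loader yielding (x, s1, s2) batches:

```python
import itertools
import torch

# Hedged sketch of the alternating TriGAN training loop; names and
# hyperparameters are assumptions, not the paper's implementation.
def train_trigan(gens, discs, loader, epochs=100, lr=2e-4):
    g_opt = torch.optim.Adam(itertools.chain(*[g.parameters() for g in gens]), lr=lr)
    d_opt = torch.optim.Adam(itertools.chain(*[d.parameters() for d in discs]), lr=lr)
    G_x_s1, G_s1_s2, G_s2_x = gens
    D_x_s1, D_s1_s2, D_s2_x = discs
    for _ in range(epochs):
        for x, s1, s2 in loader:
            f1, f2 = G_x_s1(x), G_s1_s2(s1)   # fakes for the S1, S2 domains
            fx = G_s2_x(s2)                   # fake for X, closing the loop
            # 1) fix the generators, update the three discriminators
            d_opt.zero_grad()
            (d_loss(D_x_s1, s1, f1) + d_loss(D_s1_s2, s2, f2)
             + d_loss(D_s2_x, x, fx)).backward()
            d_opt.step()
            # 2) fix the discriminators, update the three generators
            g_opt.zero_grad()
            (g_loss(D_x_s1, G_x_s1(x)) + g_loss(D_s1_s2, G_s1_s2(s1))
             + g_loss(D_s2_x, G_s2_x(s2))).backward()
            g_opt.step()
    return gens
```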
4. Experimental Results
In this paper, the MNIST dataset, an ancient Chinese character image dataset [21], and the RESIDE dataset were employed to ascertain the efficacy of the TriGAN model. Each experiment was repeated 50 times, and the average of the training results was used for analysis.
4.1 Proposed Discriminator Works
To verify the efficiency with which the discriminator of TriGAN calculates the error between the generated image F and the real image R at a granularity of 1 × 1 pixels, experiments were conducted on the MNIST dataset. The normalized image size was 28 × 28 pixels. The image samples were divided into six cases of 1 × 1, 2 × 2, 4 × 4, 7 × 7, 16 × 16, and 28 × 28 pixels, in order to compare the correlation between the separated images and the source images and to evaluate the efficiency of TriGAN under different image sample sizes. The correlation between [TeX:] $$S \text { and } S^{\prime}$$ is:

[TeX:] $$\rho\left(S, S^{\prime}\right)=\frac{\left|\sum_i S_i S_i^{\prime}\right|}{\sqrt{\sum_i S_i^2 \sum_i S_i^{\prime 2}}},$$

where [TeX:] $$S_i \text { and } S_i^{\prime}$$ are the pixel values of the source image and the separated image, respectively.
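For reference, a minimal sketch of this correlation measure follows; since the formula itself is a normalized cross-correlation reconstructed from context, treat the exact form as an assumption:

```python
import numpy as np

# Sketch of the correlation between a source image S and its
# separated estimate Sp (normalized cross-correlation; form assumed).
def correlation(S: np.ndarray, Sp: np.ndarray) -> float:
    num = np.abs(np.sum(S * Sp))
    den = np.sqrt(np.sum(S ** 2) * np.sum(Sp ** 2))
    return float(num / den)
```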
Table 1 shows the correlation between [TeX:] $$S \text { and } S^{\prime}$$ for the 1 × 1, 2 × 2, 4 × 4, 7 × 7, 16 × 16, and 28 × 28 pixel cases. The 1 × 1 approach yields a mean correlation of 0.9072, while the 2 × 2, 4 × 4, 7 × 7, 16 × 16, and 28 × 28 approaches achieve progressively lower mean correlations of 0.8970, 0.7769, 0.5659, 0.1895, and 0.1180, respectively.
Table 1. Correlation between source signals [TeX:] $$S_1, S_2$$ and separated signals [TeX:] $$S_1^{\prime}, S_2^{\prime}$$ under different image sample division units
Stronger correlation indicates a higher degree of image similarity. Table 1 illustrates that the separated image S′ has the highest correlation with the source image S when the image samples are divided into 1 × 1 pixel units for the loss calculation, and that the correlations of the two separated images are similar for a given division unit size. As the division unit size increases, the correlation decreases. This suggests that the pixel-level loss calculation employed by the TriGAN discriminator is more effective than the traditional GAN's global loss calculation and DualGAN's texture-based loss calculation.
In addition to the change in the working unit of the TriGAN discriminator, its loss function has also changed. To verify that the new loss function is effective, an experiment was conducted in which the original loss function of DualGAN was replaced with the new one; this substitution resulted in a significant improvement in the restoration of synthetic images. In this section, the proposed loss function is applied in DualGAN to solve image-to-image translation. Experiments were carried out on the RESIDE dataset as a starting point for this challenging problem, demonstrating the effectiveness of the proposed discriminator. The REalistic Single Image DEhazing (RESIDE) dataset is a large-scale resource designed to enable fair evaluation and comparison of single image dehazing algorithms. The experimental results are shown in Fig. 1.
Fig. 1 shows the image-to-image separation experiment on the RESIDE dataset (campsite scene): [TeX:] $$\mathrm{I}_{\mathrm{com}}(1)(2)(3)(4)$$ the composite images, [TeX:] $$\mathrm{I}_{\mathrm{sou}}(1)(2)(3)(4)$$ the source images, [TeX:] $$\mathrm{I}_{\mathrm{DGs}}(1)(2)(3)(4)$$ the images separated by DualGAN [22], [TeX:] $$\mathrm{I}_{\mathrm{NDs}}(1)(2)(3)(4)$$ the images separated by the proposed new discriminator working globally, and [TeX:] $$\mathrm{I}_{\mathrm{NDs}} 1 \times 1(1)(2)(3)(4)$$ the images separated by the proposed new discriminator working on 1 × 1 pixel units.
The first and second columns of Fig. 1 show the composite and source images from the RESIDE dataset. The third column shows the images separated by DualGAN, which has been notably successful in image-to-image translation between two image domains. The fourth column shows the new discriminator proposed in this paper working alone on a global scale. Fig. 1 indicates that the new discriminator generates images more closely resembling the source images than the original DualGAN.
The new discriminator addresses the issues of poor image generation quality and unstable training. The traditional loss function in DualGAN provides no further gradient to the generator once the discriminator judges its images to be real, even when these images are still notably different from the real image. Utilizing the least-squares method, the new approach calculates the distance of an image from the decision boundary and assigns more distant data a penalty term proportional to that distance. For the discriminator's gradient to approach zero, the generator is thus compelled to produce images that lie closer to the real image's location, as shown in Fig. 2.
Fig. 1. Restoration results for composite images.
Fig. 2. Proximity of fake samples to real samples and the loss function decision boundary.
Fig. 3. PSNR and SSIM results.
Fig. 3 shows the results of the image-to-image separation experiment on 500 synthetic images based on the RESIDE dataset. A total of four methods were compared:
1) DualGAN [22]: This method uses the standard DualGAN approach for image separation.
2) New D with 28 × 28 unit: Here, the original DualGAN discriminator is replaced with a new discriminator, operating at a 28 × 28 pixel unit scale (global operation).
3) New D with 14 × 14 unit: This approach also involves replacing DualGAN's original discriminator, but with the discriminator's operational unit modified to 14 × 14 pixels.
4) New D with 1 × 1 unit: The final method features the new discriminator with an operational unit of 1 × 1 pixel, as proposed in this paper.
4.2 TriGAN Separates Different Mixed Images
TriGAN can separate both convolutionally mixed and linearly mixed images; the separation results are shown in Figs. 4 and 5. Experiments were conducted on 600 pairs of images randomly selected from the MNIST dataset, which consists of handwritten digit images and their corresponding labels, covering the 10 Arabic numerals 0 to 9. These images were blended into 600 convolutionally mixed images and 600 linearly mixed images. After applying TriGAN to separate the mixed images, the peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) were used to evaluate the deviation of each separated image from its source image, and the averages of PSNR and SSIM over the 600 groups of experiments were calculated.
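A hedged sketch of this evaluation step is shown below; it assumes float images scaled to [0, 1] and an iterable of (source, separated) pairs, and the pairing logic is illustrative rather than the paper's code:

```python
import numpy as np
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

# Sketch: average PSNR/SSIM between each separated image and its source
# over all test pairs (float images in [0, 1] assumed).
def evaluate(pairs):
    psnrs, ssims = [], []
    for src, sep in pairs:  # (source image, separated image)
        psnrs.append(peak_signal_noise_ratio(src, sep, data_range=1.0))
        ssims.append(structural_similarity(src, sep, data_range=1.0))
    return float(np.mean(psnrs)), float(np.mean(ssims))
```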
Average PSNR and average SSIM of different methods, based on 600 pairs of convolutionally/linearly mixed images
Average PSNR and average SSIM of different methods, based on 600 pairs of ancient Chinese character images
Fig. 4 shows the convolutionally mixed image experiment on the MNIST dataset: [TeX:] $$\mathrm{I}_{\mathrm{cm}}(1)(2)(3)$$ the convolutionally mixed images, [TeX:] $$\mathrm{I}_{\mathrm{sou1}}(1)(2)(3)$$ the source image [TeX:] $$S_1,$$ [TeX:] $$\mathrm{I}_{\mathrm{sep1}}(1)(2)(3)$$ the separated image [TeX:] $$S_1^{\prime},$$ [TeX:] $$\mathrm{I}_{\mathrm{sou2}}(1)(2)(3)$$ the source image [TeX:] $$S_2,$$ and [TeX:] $$\mathrm{I}_{\mathrm{sep2}}(1)(2)(3)$$ the separated image [TeX:] $$S_2^{\prime}.$$ Fig. 5 shows the linearly mixed image experiment on the MNIST dataset: [TeX:] $$\mathrm{I}_{\mathrm{lm}}(1)(2)(3)$$ the linearly mixed images, with the remaining rows defined as in Fig. 4.
Fig. 4. Convolutionally mixed image restoration for the MNIST dataset.
Fig. 5. Linearly mixed image restoration for the MNIST dataset.
Image restoration represents a significant application of SCBSS. Yin et al. [21] presented the restoration of ancient Chinese characters using SCBSS and created a specialized dataset of ancient Chinese characters. Fig. 6 shows sample data, comprising five sets of ancient Chinese character image sets randomly selected from the database; each training set has 4,096 images, 2,048 each of ancient Chinese characters and occlusions, and each test set has 512 images, an equal split of 256 each for ancient Chinese characters and occlusions. This paper leverages the ancient Chinese character dataset to restore ancient Chinese character images, separating the sources from these images and contrasting the results with the single-channel blind deconvolution algorithm based on deep convolutional generative adversarial networks (DCSS) proposed by Yin et al. [21].
Fig. 6. A sample of the ancient Chinese character datasets: [TeX:] $$\mathrm{I}_{\mathrm{anc}}$$ ancient Chinese character image, [TeX:] $$\mathrm{I}_{\mathrm{occ}}$$ occlusion image, [TeX:] $$\mathrm{I}_{\mathrm{com}}$$ composite image.
Fig. 7. Restoration results on the ancient Chinese character datasets.
Fig. 7 shows the experiment on the ancient Chinese character datasets, illustrating the following: [TeX:] $$\mathrm{I}_{\mathrm{cha}}(1)(2)(3)$$ is the ancient Chinese character image; [TeX:] $$\mathrm{I}_{\mathrm{sou1}}(1)(2)(3)$$ is the ancient Chinese character source image; [TeX:] $$\mathrm{I}_{\mathrm{DCs1}}(1)(2)(3)$$ is the ancient Chinese character image separated by DCSS; [TeX:] $$\mathrm{I}_{\mathrm{our1}}(1)(2)(3)$$ is the ancient Chinese character image separated by the proposed method; [TeX:] $$\mathrm{I}_{\mathrm{sou2}}(1)(2)(3)$$ is the occlusion source image; [TeX:] $$\mathrm{I}_{\mathrm{DCs2}}(1)(2)(3)$$ is the occlusion image separated by DCSS; and [TeX:] $$\mathrm{I}_{\mathrm{our2}}(1)(2)(3)$$ is the occlusion image separated by the proposed method.
5. Conclusion
In this paper, a novel closed-loop TriGAN structure is constructed. It attempts to surpass the constraints of the SCBSS mathematical model by making full use of the inherent feature information of the blended image, overcoming the inherent variation among mixing methods, and completing the separation of the source images; the experimental results demonstrate the generality of the SCBSS algorithm in image restoration. A new discriminator is used to calculate the pixel-level loss of the generated image. This methodology enables the separation of source images from a single blended image without prior knowledge of the mixing matrix, a notable breakthrough in the field. The experimental results show that the algorithm is applicable to both convolutionally mixed and linearly mixed images and outperforms other blind source separation algorithms. In addition, it has yielded exceptional results in the restoration of ancient Chinese characters, significantly improving their restoration effect.