## Hung-Ngoc Nguyen*, Cheol-Hong Kim**, and Jong-Myon Kim*

| Filter | Coefficient | 2's complement representation | CSDBE representation |
|---|---|---|---|
| H(z) low-pass | −0.1294 | 1.110111101110000 | $0.00\bar{1}0000\bar{1}00\bar{1}0000$ |
| | 0.2241 | 0.001110010101111 | $0.0100\bar{1}010\bar{1}0\bar{1}000\bar{1}$ |
| | 0.8365 | 0.110101100010010 | $1.00\bar{1}0\bar{1}0\bar{1}00010010$ |
| | 0.4830 | 0.011110111010010 | $0.10000\bar{1}00\bar{1}010010$ |
| G(z) high-pass | −0.4830 | 1.100001000101110 | $0.\bar{1}000010010\bar{1}00\bar{1}0$ |
| | 0.8365 | 0.110101100010010 | $1.00\bar{1}0\bar{1}0\bar{1}00010010$ |
| | −0.2241 | 1.110001101010001 | $0.0\bar{1}0010\bar{1}01010001$ |
| | −0.1294 | 1.110111101110000 | $0.00\bar{1}0000\bar{1}00\bar{1}0000$ |

In the CSDBE column, $\bar{1}$ denotes a digit of −1.
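The two representation columns can be regenerated from the decimal coefficients. The Python sketch below assumes that each coefficient is scaled by $2^{15}$ and truncated toward zero (this reproduces every bit pattern in the table, but the authors' exact quantization step is not stated) and that the CSDBE digits coincide with standard canonical signed-digit (non-adjacent form) digits. The helper names are hypothetical, and the letter `n` stands for an overbarred ($-1$) digit.

```python
# Sketch: regenerate the 2's complement and CSD columns for the Db2 taps.
# Assumptions: 15 fractional bits, truncation toward zero, and CSDBE digits equal
# to standard CSD (non-adjacent form) digits. Helper names are hypothetical.

FRAC_BITS = 15
LOW_PASS = (-0.1294, 0.2241, 0.8365, 0.4830)
HIGH_PASS = (-0.4830, 0.8365, -0.2241, -0.1294)


def quantize(c: float) -> int:
    """Scale by 2**15 and truncate toward zero (int() truncates)."""
    return int(c * (1 << FRAC_BITS))


def twos_complement(q: int, width: int = 16) -> str:
    """16-bit two's complement pattern written as sign bit, '.', then 15 fraction bits."""
    bits = format(q & ((1 << width) - 1), f"0{width}b")
    return bits[0] + "." + bits[1:]


def csd_digits(q: int) -> list[int]:
    """Non-adjacent-form digits of q in {-1, 0, +1}, least significant first."""
    digits = []
    while q != 0:
        if q % 2:              # odd: pick +1 or -1 so that the next digit is forced to 0
            d = 2 - (q % 4)    # q % 4 == 1 -> +1, q % 4 == 3 -> -1 (Python % is non-negative)
            q -= d
        else:
            d = 0
        digits.append(d)
        q //= 2                # exact, because q is even at this point
    return digits


def csd_string(q: int) -> str:
    """Format as 'I.FFF...' with 15 fraction digits; 'n' marks a -1 digit."""
    d = csd_digits(q) + [0] * (FRAC_BITS + 1)
    sym = {1: "1", 0: "0", -1: "n"}
    frac = "".join(sym[d[k]] for k in range(FRAC_BITS - 1, -1, -1))
    return sym[d[FRAC_BITS]] + "." + frac


for name, taps in (("H(z)", LOW_PASS), ("G(z)", HIGH_PASS)):
    for c in taps:
        q = quantize(c)
        print(f"{name} {c:+.4f}  {twos_complement(q)}  {csd_string(q)}")
```

Because the non-adjacent form of an integer is unique, any correct CSD routine yields the same digit strings; the sign of a negative coefficient simply flips every nonzero digit.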

In this study, each input sample is multiplied by several filter coefficients simultaneously. We propose an advanced functional sharing (AFS) technique that further reduces the hardware resources required for the convolution. AFS reuses the same functional operators (i.e., shared shifters and adders) across the coefficient products to save on-chip hardware resources. The products of an input sample $X$ with the low-pass coefficients $h_l$ can be written in CSDBE form as follows:

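$Y_0 = X \times (-0.1294) = X \times 0.00\bar{1}0000\bar{1}00\bar{1}0000_{\mathrm{CSDBE}}$

$Y_1 = X \times 0.2241 = X \times 0.0100\bar{1}010\bar{1}0\bar{1}000\bar{1}_{\mathrm{CSDBE}}$

$Y_2 = X \times 0.8365 = X \times 1.00\bar{1}0\bar{1}0\bar{1}00010010_{\mathrm{CSDBE}}$

$Y_3 = X \times 0.4830 = X \times 0.10000\bar{1}00\bar{1}010010_{\mathrm{CSDBE}}$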

By setting $C_0 = X \gg 4$, $C_1 = X \gg 3$, $C_2 = X \gg 2$, $C_3 = X \gg 1$, and $C_4 = X \gg 0$, we obtain the shared terms $B_0 = X \times 0.1001_{\mathrm{CSDBE}} = C_0 + C_3$, $B_1 = X \times 0.101_{\mathrm{CSDBE}} = C_1 + C_3$, $B_2 = X \times 0.\bar{1}01_{\mathrm{CSDBE}} = C_1 - C_3$, and $B_3 = X \times 1.00\bar{1}_{\mathrm{CSDBE}} = C_4 - C_1$ (i.e., $0.5625X$, $0.625X$, $-0.375X$, and $0.875X$, respectively). Combining these terms gives:

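$Y_0 = -C_1 - (B_0 \gg 7)$,

$Y_1 \approx C_2 + (B_2 \gg 4) - (B_1 \gg 8)$,

$Y_2 = B_3 - (B_1 \gg 4) + (B_0 \gg 10)$,

$Y_3 = C_3 - (B_0 \gg 5) + (B_0 \gg 10)$.

Each shift aligns a shared term with the nonzero CSDBE digits of the corresponding coefficient; for example, $B_0 \gg 7 = X(2^{-8} + 2^{-11})$ supplies the $2^{-8}$ and $2^{-11}$ digits of $Y_0$. The $2^{-15}$ digit of $Y_1$ is not generated, which leaves an error of $2^{-15}X$ (about $3 \times 10^{-5}X$).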
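The shift-and-add decomposition can be checked numerically. The Python sketch below builds the shared terms $C_k$ and $B_k$ for an integer sample and forms $Y_0$ to $Y_3$ with shift amounts that place each $B_k$ on the nonzero CSDBE digits listed in the table. As an assumption made only so that every right shift is exact, the sample is first scaled by $2^{15}$; a hardware datapath with fewer fractional bits would truncate instead. The function name is hypothetical.

```python
# Sketch of the AFS shift-add products for the Db2 low-pass taps.
# Assumption: the sample is pre-scaled by 2**15 so every right shift below is exact.

FRAC = 15
Q15_TAPS = (-4240, 7343, 27410, 15826)   # table taps h_0..h_3 scaled by 2**15


def afs_products(x: int) -> tuple[int, int, int, int]:
    """Return (Y_0, Y_1, Y_2, Y_3), each scaled by 2**15, using only shifts and adds."""
    xs = x << FRAC
    c0, c1, c2, c3, c4 = xs >> 4, xs >> 3, xs >> 2, xs >> 1, xs
    # Shared sub-expressions: 4 adders
    b0 = c0 + c3          # 0.5625 X
    b1 = c1 + c3          # 0.625 X
    b2 = c1 - c3          # -0.375 X
    b3 = c4 - c1          # 0.875 X
    # Coefficient products: 7 adders, counting -c1 - t as one adder with a negated output
    y0 = -c1 - (b0 >> 7)                 # -(2^-3 + 2^-8 + 2^-11) X
    y1 = c2 + (b2 >> 4) - (b1 >> 8)      # (2^-2 - 2^-5 + 2^-7 - 2^-9 - 2^-11) X, no 2^-15 digit
    y2 = b3 - (b1 >> 4) + (b0 >> 10)     # (1 - 2^-3 - 2^-5 - 2^-7 + 2^-11 + 2^-14) X
    y3 = c3 - (b0 >> 5) + (b0 >> 10)     # (2^-1 - 2^-6 - 2^-9 + 2^-11 + 2^-14) X
    return y0, y1, y2, y3


for x in (1, -7, 300):
    ys = afs_products(x)
    # Difference from exact multiplication by the quantized taps, in units of 2**-15
    print(x, [y - q * x for y, q in zip(ys, Q15_TAPS)])
```

Running it shows that $Y_0$, $Y_2$, and $Y_3$ equal the quantized coefficients times $X$ exactly, while $Y_1$ exceeds $7343X$ by $X$ (in units of $2^{-15}$) because its last digit is not generated.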

The output of the shift-add convolution is obtained by accumulating the delayed products, $Y_{\mathrm{low}}(n) = Y_0(n) + Y_1(n-1) + Y_2(n-2) + Y_3(n-3)$, as in Eq. (1). Fig. 7 presents a hardware implementation of the AFS technique; for this FIR filter, the coefficient multiplications can be synthesized with only 11 adders (four for $B_0$ to $B_3$ and seven for $Y_0$ to $Y_3$) and no multipliers. The efficiency of the proposed design is evaluated against other designs with the same architecture, and the performance of the system for a 1024-point DWPT computation is analyzed in the next section.
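A sketch of how per-sample products are combined into the filter output: the Python model below implements a transposed-form FIR, in which each product of the current sample enters a delay chain so that the output equals the direct convolution $y(n) = \sum_k h_k x(n-k)$. It is an illustrative assumption, not the authors' RTL: integer multiplies by the Q15 taps from the table stand in for the AFS shift-add block, and the function names are hypothetical.

```python
# Sketch: transposed-form 4-tap FIR. Each new sample is multiplied by every tap at
# once (plain multiplies stand in for the AFS shift-add block), and the partial
# products are summed through a chain of delay registers.

H_Q15 = (-4240, 7343, 27410, 15826)   # table taps h_0..h_3 scaled by 2**15


def fir_transposed(samples: list[int], taps=H_Q15) -> list[int]:
    regs = [0] * (len(taps) - 1)          # registers between the adders
    out = []
    for x in samples:
        products = [h * x for h in taps]  # Y_0 .. Y_3 for the current sample
        out.append(products[0] + regs[0])
        # reg[k] <- Y_{k+1}(n) + reg[k+1]; updating in increasing k reads the old reg[k+1]
        for k in range(len(regs) - 1):
            regs[k] = products[k + 1] + regs[k + 1]
        regs[-1] = products[-1]
    return out


def fir_direct(samples: list[int], taps=H_Q15) -> list[int]:
    """Reference: y(n) = sum_k h_k x(n - k) with zero initial state."""
    return [sum(taps[k] * samples[n - k] for k in range(len(taps)) if n - k >= 0)
            for n in range(len(samples))]


impulse = [1, 0, 0, 0, 0]
print(fir_transposed(impulse))   # [-4240, 7343, 27410, 15826, 0]
```

Comparing `fir_transposed` with `fir_direct` on any integer sequence gives identical outputs; the transposed form only moves the delays from the input side into the adder chain, which is why one shared set of products per sample is enough.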

The proposed pipelined DWPT processor for 1024-point computation with five-level decomposition, together with the aforementioned designs, is implemented on a Virtex-7 XC7VX485T FPGA board using the Xilinx Vivado Design Suite (XVDS) for functional testing, timing simulation, and synthesis. The designs are described in Verilog HDL. The implementation is simulated in a hardware environment, and its results are compared with MATLAB to confirm the accuracy of the system. The test input is a 1024-point frame of an acoustic emission (AE) signal sampled at 1 MHz. Because a five-level DWPT is used, the output consists of 2^{5} = 32 sub-bands. Accuracy is measured by the mean squared error (MSE) with respect to the MATLAB results, which serve as the baseline. The average MSE is about 10^{−5}, indicating that the proposed design computes the DWPT with high accuracy.
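The accuracy check can be approximated in software. The Python sketch below runs a five-level Db2 DWPT on one 1024-sample frame twice, once in floating point (standing in for the MATLAB baseline) and once in fixed point with 10 fractional bits, and then reports the MSE over all 32 sub-bands. The random test frame, periodic boundary extension, keep-even downsampling, natural (not frequency-ordered) band order, and truncation after each filter are all assumptions, so the sketch is not expected to reproduce the reported MSE exactly.

```python
import math
import random

# Daubechies-2 analysis filters in the order used in the coefficient table.
S3, S2 = math.sqrt(3.0), math.sqrt(2.0)
LO = [(1 - S3) / (4 * S2), (3 - S3) / (4 * S2), (3 + S3) / (4 * S2), (1 + S3) / (4 * S2)]
HI = [(-1) ** (k + 1) * LO[3 - k] for k in range(4)]   # -0.4830, 0.8365, -0.2241, -0.1294

FRAC = 10  # fractional bits of the fixed-point path (assumption)


def analyze(x, taps, frac=None):
    """Periodic convolution with `taps`, keeping even outputs (downsample by 2).
    With `frac`, x and taps are integers scaled by 2**frac and each sum is
    truncated back to `frac` fractional bits by an arithmetic right shift."""
    n = len(x)
    out = []
    for m in range(n // 2):
        acc = sum(h * x[(2 * m - k) % n] for k, h in enumerate(taps))
        out.append(acc >> frac if frac is not None else acc)
    return out


def dwpt(x, levels, lo, hi, frac=None):
    """Full wavelet packet tree: every band is split at every level (natural order)."""
    bands = [x]
    for _ in range(levels):
        bands = [half for band in bands
                 for half in (analyze(band, lo, frac), analyze(band, hi, frac))]
    return bands


def to_fixed(v):
    return round(v * (1 << FRAC))


random.seed(1)
frame = [random.uniform(-1.0, 1.0) for _ in range(1024)]   # stand-in for one AE frame

ref = dwpt(frame, 5, LO, HI)                                # floating-point baseline
fix = dwpt([to_fixed(v) for v in frame], 5,
           [to_fixed(h) for h in LO], [to_fixed(h) for h in HI], frac=FRAC)

errors = [f / (1 << FRAC) - r for fb, rb in zip(fix, ref) for f, r in zip(fb, rb)]
mse = sum(e * e for e in errors) / len(errors)
print(f"{len(ref)} sub-bands x {len(ref[0])} coefficients, MSE = {mse:.2e}")
```

With periodic extension the Db2 packet transform is orthogonal, so the total energy of the 32 floating-point sub-bands equals that of the input frame, which is a convenient sanity check on the reference path.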

The proposed Db2-based DWPT IP core is designed around the FFP architecture and takes advantage of the CSDBE and AFS techniques. Data streams use a signed fixed-point format with a 16-bit word length for input/output data and 10-bit precision (i.e., 10 fractional bits) for internal computation. Fig. 8 presents the gate-level RTL schematic of the five-level pipelined DWPT processor using a Db2-based FIR filter. At each decomposition level j (j = 0, 1, …, L − 1, where L is the number of levels), there are 2^{j} WFPE cells for sub-band analysis, so the five-level processor contains 1 + 2 + 4 + 8 + 16 = 31 WFPE cells. Fig. 9 shows the synthesized hardware of a WFPE cell at the first decomposition level. The WFPE cells are based on an efficient transposed-form structure, which improves memory resource utilization in the hardware implementation.
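The number format and the cell count can be illustrated with a small Python sketch. It assumes that the 16-bit words carry 10 fractional bits (a signed Q5.10-style format) with round-to-nearest and saturation on overflow; the text does not state the rounding or overflow behavior, so those choices and the helper names are hypothetical.

```python
# Sketch of the number format and WFPE cell count (assumptions stated in the lead-in).

WORD_BITS = 16
FRAC_BITS = 10
LEVELS = 5


def wfpe_cells_per_level(levels: int) -> list[int]:
    """2**j WFPE cells at decomposition level j = 0 .. levels-1."""
    return [2 ** j for j in range(levels)]


def to_fixed(value: float, word: int = WORD_BITS, frac: int = FRAC_BITS) -> int:
    """Round to `frac` fractional bits and saturate to a signed `word`-bit integer."""
    lo, hi = -(1 << (word - 1)), (1 << (word - 1)) - 1
    return max(lo, min(hi, round(value * (1 << frac))))


def from_fixed(q: int, frac: int = FRAC_BITS) -> float:
    return q / (1 << frac)


cells = wfpe_cells_per_level(LEVELS)
print(cells, sum(cells))                    # [1, 2, 4, 8, 16] 31
print(to_fixed(0.8365), from_fixed(to_fixed(0.8365)))
print(to_fixed(100.0), to_fixed(-100.0))    # saturates to 32767 and -32768
```

Under this assumed format the representable range is roughly ±32 with a resolution of $2^{-10} \approx 9.8 \times 10^{-4}$.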

Table 2 summarizes the resource utilization of the above architecture and compares its hardware complexity with that of conventional designs. Conventional architectures typically consume more logic and memory resources, such as flip-flops (FFs), look-up tables (LUTs), memory LUTs, block RAMs, and DSP blocks. The proposed AFS and CSDBE-based DWPT processor uses fewer resources and requires no dedicated DSP blocks, because convolution in the Db2-based FIR filters for DWPT decomposition needs only configurable logic blocks (CLBs) and distributed memory. Overall, the proposed design combining the CSDBE and AFS techniques achieves better hardware resource utilization than conventional designs.

Table 2.

| Resource | Conventional design | Count | Savings (%) | Count | Savings (%) | Available (XC7VX485T) |
|---|---|---|---|---|---|---|
| # of registers (CLB flip-flops) | 6,441 | 3,328 | 48.33 | 3,319 | 48.47 | 607,200 |
| # of LUTs | 42,242 | 26,866 | 36.39 | 18,903 | 55.25 | 303,600 |
| # of memory LUTs | 7,400 | 4,840 | 34.59 | 4,840 | 34.59 | 130,800 |
| # of block RAMs | 4 | 2 | 50 | 2 | 50 | 1,030 |
| # of block DSPs | 8 | 0 | 100 | 0 | 100 | 2,800 |
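The savings columns in Table 2 follow from (baseline − used) / baseline × 100. The Python check below recomputes them from the raw counts and adds the device utilization of the second compared design. Reading the first numeric column as the baseline, the next two count columns as the compared designs, and the last column as the XC7VX485T capacity is an interpretation of the flattened table, and the names are hypothetical.

```python
# Counts from Table 2: (baseline, first compared design, second compared design, available)
TABLE2 = {
    "registers": (6441, 3328, 3319, 607200),
    "LUTs": (42242, 26866, 18903, 303600),
    "memory LUTs": (7400, 4840, 4840, 130800),
    "block RAMs": (4, 2, 2, 1030),
    "block DSPs": (8, 0, 0, 2800),
}


def savings(baseline: int, used: int) -> float:
    """Percentage reduction relative to the baseline column."""
    return 100.0 * (baseline - used) / baseline


def utilization(used: int, available: int) -> float:
    """Share of the device resource consumed, in percent."""
    return 100.0 * used / available


for name, (base, first, second, avail) in TABLE2.items():
    print(f"{name:12s} savings {savings(base, first):6.2f}% / {savings(base, second):6.2f}%"
          f"  device use {utilization(second, avail):5.2f}%")
```

The LUT saving of the first compared design evaluates to 36.3998%, which the table lists as 36.39; the other entries agree to two decimals.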

In this paper, we presented an efficient implementation of a five-level pipelined DWPT processor using FIR filter banks based on the Db2 mother wavelet. The proposed AFS and CSDBE-based DWPT processor was verified on a Virtex-7 FPGA board using the XVDS tool. The design is based on an efficient transposed-form structure, which halves its computational complexity while also achieving significant hardware resource savings in the FPGA implementation. The proposed design exploits the FFP architecture and improves performance by employing both the CSDBE algorithm and the AFS technique. Experimental results showed that the proposed design achieves better hardware resource utilization than conventional designs while maintaining high DWPT accuracy.

This research was supported by the Leading Human Resource Training Program of Regional Neo Industry through the National Research Foundation of Korea, funded by the Ministry of Science, ICT, and Future Planning (No. NRF-2016H1D5A1910564). It was also funded in part by the Korea Institute of Energy Technology Evaluation and Planning (KETEP) and the Ministry of Trade, Industry, & Energy (MOTIE) of the Republic of Korea (No. 20172510102130).

He received his B.S. and M.Sc. degrees in Electronics, Telecommunications, and Computer Engineering from the University of Science in Ho Chi Minh City, Vietnam, in 2007 and 2011, respectively. He is currently a Ph.D. student in Electrical, Electronics, and Computer Engineering at the University of Ulsan, Ulsan, Korea. His research interests include FPGA implementation of efficient signal processing algorithms, embedded system design, and multi-core architecture.

He received his B.S., M.S., and Ph.D. degrees in Computer Engineering from Seoul National University, Seoul, Korea, in 1998, 2000, and 2006, respectively. He is an Associate Professor of Electronics and Computer Engineering at Chonnam National University, Gwangju, Korea. His research interests include multi-core architecture and embedded systems.

He received a B.S. in Electrical Engineering from Myongji University in Yongin, Korea, in 1995, an M.S. in Electrical and Computer Engineering from the University of Florida in Gainesville, FL, USA, in 2000, and a Ph.D. in Electrical and Computer Engineering from the Georgia Institute of Technology in Atlanta, GA, USA, in 2005. He is a Professor of IT Convergence at the University of Ulsan, Korea. His research interests include multimedia processing, digital watermarking, multimedia specific processor architecture, parallel processing, and embedded systems. He is a member of IEEE and the IEEE Computer Society.
