1. Introduction
With the continuous development of society, communication has become an integral part of people’s lives. Wireless communication exchanges information by exploiting the fact that electromagnetic signals propagate freely through space; it is convenient to use, scales well, and is inexpensive. It does not require large-scale supporting infrastructure, has low installation and maintenance costs, and adapts well to the environment. With the rapid development of science and technology, wireless communication, which was initially used to exchange voice information between users, has gradually been applied to cluster communication, satellite communications, mobile video, and other areas. However, wireless communication also has some drawbacks, such as weak anti-interference ability, slow transmission speed, limited bandwidth, and limited transmission distance.
Single carrier frequency domain equalization (SC-FDE) is an effective anti-multipath technique. Compared with orthogonal frequency division multiplexing (OFDM), SC-FDE has the advantages of a low peak-to-average power ratio and insensitivity to carrier phase. In recent years, SC-FDE has attracted wide attention and has been widely applied in wireless communication systems [1]. In an SC-FDE system, timing synchronization errors cause not only amplitude and phase distortion of the signal but also inter-symbol interference, which seriously degrades system performance. Timing synchronization has therefore always been a focus of research. With the rapid development of digital processing chips such as field programmable gate arrays (FPGAs) and digital signal processors (DSPs), the hardware realization of complex signal processing algorithms has become practical. Implementing an SC-FDE timing synchronization algorithm on an FPGA therefore has strong engineering significance.
SC-FDE timing synchronization has been studied extensively in the literature, and the algorithms fall into three main categories: those based on the cyclic prefix (CP), those based on a special structure, and those based on a training sequence. The CP-based method in [2] exhibits a plateau in its timing metric, so its precision is low, and it may fail entirely in a multipath channel. The authors of [3] exploit a conjugate symmetric structure for timing synchronization, but the method performs poorly at low signal-to-noise ratio (SNR) and is complex to implement on an FPGA. In [4], the authors use the good autocorrelation and cross-correlation properties of constant amplitude zero autocorrelation (CAZAC) sequences to perform timing and frequency offset estimation, but the timing synchronization performance is very susceptible to frequency offset. In [5], the training sequence is constructed from a CAZAC sequence with a repetitive time-domain structure and is weighted by a pseudo-noise (PN) sequence, so that the timing function has a very sharp peak. However, the PN weighting destroys the repetition of the training sequence, which degrades frequency offset estimation in multipath channels. The authors of [6] use two different CAZAC sequences for timing synchronization and also obtain quite sharp peaks, but the weighting operation on the CAZAC sequences is too complex to implement on an FPGA. Tsai et al. [7] designed a weighted least squares algorithm that jointly estimates the carrier frequency offset and the timing offset under multipath fading channel conditions. The algorithm performs well in multipath channels, but its complexity is high. Lv et al. [8] proposed a multi-level estimation algorithm with coarse and fine synchronization stages, in which the least squares (LS) algorithm quickly provides coarse synchronization and a coarse frequency offset estimate, and the maximum likelihood (ML) algorithm then produces the final fine synchronization result. Simulation results show that the algorithm is well suited to achieving high-accuracy synchronization in fast time-varying multipath channels. Schmidl and Cox [9] use the correlation of repeated CAZAC sequences as the metric function; the FPGA implementation is relatively simple, but the synchronization accuracy is low.
Building on the timing synchronization algorithm in [9] and the data block structure recommended by IEEE 802.16 [10], this paper designs a timing synchronization algorithm and implements it on an FPGA platform. In the implementation, sliding window accumulation, quantization, and amplitude simplification are used to reduce the computational complexity and hardware overhead of the timing synchronization algorithm. The experimental results verify the feasibility of the synchronization algorithm [11] in an actual hardware environment.
2. SC-FDE System Overview
In this paper, the three-path SUI-3 channel model recommended by IEEE 802.16 is used in the simulation [12]. Its maximum multipath delay spread is 0.9 μs, which is within the guard interval duration [TeX:] $$N_{g}.$$ The specific parameters are shown in Table 1.
Fig. 1 shows the frame structure of SC-FDE. The same unique word (UW) suffix is appended to each data block to form a symbol [13,15], so each symbol is ideally periodic. When the maximum delay [TeX:] $$N_{h}$$ of the channel is within the UW length [TeX:] $$N_{g},$$ that is, [TeX:] $$N_{h}<N_{g},$$ the interference of the previous symbol with the current data block can be eliminated. The UW sequence consists of CAZAC sequences, which can be obtained by Eq. (1). CAZAC sequences have good correlation properties and a flat, broadband frequency response, so they can be used as the UW suffix and for frequency offset tracking and frequency domain equalization [15].
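The two CAZAC properties mentioned above can be checked numerically. The sketch below assumes a Zadoff-Chu sequence, which is one common CAZAC family; the construction in Eq. (1) may differ, and the function names, root index, and length of 64 are illustrative choices rather than values taken from the paper.

```python
import cmath
import math

def zadoff_chu(length, root=1):
    """Return a Zadoff-Chu sequence; `root` must be coprime with `length`."""
    if math.gcd(root, length) != 1:
        raise ValueError("root must be coprime with length")
    parity = length % 2  # 1 for odd lengths, 0 for even lengths
    return [cmath.exp(-1j * math.pi * root * n * (n + parity) / length)
            for n in range(length)]

def periodic_autocorrelation(seq, lag):
    """Circular autocorrelation of `seq` at the given lag."""
    size = len(seq)
    return sum(seq[i] * seq[(i + lag) % size].conjugate() for i in range(size))

uw = zadoff_chu(64)  # 64 samples is an assumed, illustrative length
peak = abs(periodic_autocorrelation(uw, 0))
worst_sidelobe = max(abs(periodic_autocorrelation(uw, lag)) for lag in range(1, 64))
print(f"peak = {peak:.1f}, worst sidelobe = {worst_sidelobe:.2e}")
```

Every sample has unit magnitude (constant amplitude), and the circular autocorrelation vanishes, up to rounding error, at every nonzero lag (zero autocorrelation), which is what makes such a sequence attractive as a UW and as a training symbol.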
The block diagram of the SC-FDE system is shown in Fig. 2. In the transmitter path, the binary input data is first convolutionally encoded. After coding, the binary values are mapped to quadrature amplitude modulation (QAM) symbols, and a guard period is added between successive blocks. After pulse shaping (root-raised-cosine pulses) and digital-to-analog conversion, the resulting I/Q signals are up-converted to RF. In the receiver path, after the RF front end and analog-to-digital conversion, the receiver uses the special frame structure for time-frequency synchronization and channel estimation. The received data, with the prefix removed, is then transformed into the frequency domain, and the channel state information (CSI) obtained by channel estimation is used for frequency domain equalization. Finally, the equalized data is converted back to the time domain, and QAM demapping and Viterbi decoding are performed [16].
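Assuming that the transmitted data is [TeX:] $$x_{n}$$ and the channel impulse response is [TeX:] $$h_{n},$$ when the number of symbols is less than the length of the impulse response, each received data symbol [TeX:] $$y_{n}$$ may be expressed as:
[TeX:] $$y_{n}=x_{n} \otimes h_{n}+\mathcal{V}_{n} \quad (2)$$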
where [TeX:] $$\mathcal{V}_{n}$$ is the additive white Gaussian noise and [TeX:] $$\otimes$$ denotes convolution. After the FFT, with [TeX:] $$X_{k}$$ denoting the FFT of [TeX:] $$x_{n},$$ the frequency-domain received signal [TeX:] $$Y_{k}$$ is expressed as:
[TeX:] $$Y_{k}=H_{k} X_{k}+V_{k} \quad (3)$$
where [TeX:] $$H_{k}$$ is the channel frequency response and [TeX:] $$V_{k}$$ is the noise in the frequency domain. Assuming that both time-frequency synchronization and channel estimation are ideal and that a decision feedback equalization structure is used, the signal before the decision is:
where [TeX:] $$W_{k}$$ is the coefficient of the feedforward frequency-domain filter and [TeX:] $$C_{k}$$ is the coefficient of the time-domain feedback filter [17].
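The frequency-domain relation between transmitted and received blocks can be illustrated with a small noise-free sketch. It assumes circular convolution (which the UW guard interval provides when the channel is shorter than the guard) and, for brevity, uses a one-tap zero-forcing linear equalizer instead of the decision feedback structure described above. The block contents and the three channel taps are made-up values.

```python
import cmath

def dft(x):
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(X):
    n = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def circular_convolve(x, h):
    n = len(x)
    return [sum(h[m] * x[(t - m) % n] for m in range(len(h))) for t in range(n)]

# Hypothetical QPSK block and 3-tap channel (illustrative values only).
x = [1+1j, 1-1j, -1+1j, -1-1j, 1+1j, -1-1j, 1-1j, -1+1j]
h = [1.0, 0.4+0.2j, 0.1j]

y = circular_convolve(x, h)            # noise-free received block
H = dft(h + [0] * (len(x) - len(h)))   # channel frequency response H_k
Y = dft(y)                             # equals H_k * X_k bin by bin
x_hat = idft([Y[k] / H[k] for k in range(len(x))])  # one-tap zero-forcing FDE
```

With noise present, a zero-forcing equalizer amplifies noise in bins where the channel response is small, which is one motivation for MMSE or decision feedback designs.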
3. Timing Synchronization Scheme
3.1 Preamble Sequence Structure for Timing Synchronization
As shown in Fig. 3, the preamble sequence contains a short preamble and a long preamble. The short preamble consists of eight repeated short training symbols A, each of which has 32 sampling points. The entire short preamble has 256 sampling points and supports automatic gain control (AGC), coarse synchronization, coarse frequency offset estimation, and other functions [18]. The long preamble consists of four repeated long training symbols C, each of which has 64 sampling points. The entire long preamble has 256 sampling points and can be used for fine synchronization, fine frequency offset estimation, and channel estimation. Both the short and long preambles are composed of CAZAC sequences.
Preamble sequence of SC-FDE.
3.2 Coarse Synchronization Algorithm
Coarse synchronization can be realized by the delay autocorrelation algorithm, whose block diagram is shown in Fig. 4. Window C computes the correlation between the received signal and a copy of it delayed by D samples. The delay [TeX:] $$z^{-D}$$ equals the period of the short preamble, so D = 32. Window P computes the energy of the received signal over the correlation window. The resulting energy is used to normalize the decision statistic, so that the decision variable [TeX:] $$m_{n}$$ is independent of the received power.
Block diagram of delay autocorrelation algorithm.
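The delay correlation [TeX:] $$C_{n}$$ is computed as follows, where [TeX:] $$r_{n}$$ denotes the received sample at time n:
[TeX:] $$C_{n}=\sum_{k=0}^{L-1} r_{n+k} r_{n+k+D}^{*} \quad (5)$$
Eq. (5) is the correlation between L received samples and the L samples separated from them by a delay of D.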
The value of the received signal energy [TeX:] $$P_{n}$$ can be expressed as:
The decision variable [TeX:] $$m_{n}$$ of the delay correlation algorithm is:
At low SNR, large random noise in the channel may push the decision variable [TeX:] $$m_{n}$$ above the preset threshold, so that the receiver misjudges the arrival of data. Therefore, to reduce the false alarm probability and improve the reliability of coarse synchronization at low SNR, a length-retention requirement is added to the delay correlation algorithm. As shown in Fig. 5, the decision variable [TeX:] $$m_{n}$$ must stay above the preset threshold for a certain number of sampling periods before coarse synchronization is declared complete, which avoids the effect of large random noise.
Simulation diagram of coarse synchronization algorithm.
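The delay correlation with length retention can be sketched in floating point as follows. The forms used here, a delay correlation of L products r[n+k]·conj(r[n+k+D]), a window energy summed over the delayed samples, and a decision variable equal to |C_n|/P_n, are assumptions chosen to be consistent with the description in this section. The random-phase short symbol, the noise levels, and the 300 leading noise samples are illustrative.

```python
import cmath
import random

def coarse_sync(r, D=32, L=32, threshold=0.5, hold=50):
    """Return the first index n at which m_n has exceeded `threshold`
    for `hold` consecutive samples, or None if that never happens."""
    run = 0
    for n in range(len(r) - D - L + 1):
        c = sum(r[n + k] * r[n + k + D].conjugate() for k in range(L))  # delay correlation
        p = sum(abs(r[n + k + D]) ** 2 for k in range(L))               # window energy
        m = abs(c) / p if p > 0 else 0.0                                # decision variable
        run = run + 1 if m > threshold else 0
        if run >= hold:
            return n
    return None

def noise(count, sigma):
    return [complex(random.gauss(0, sigma), random.gauss(0, sigma)) for _ in range(count)]

random.seed(0)
# Stand-in short training symbol: 32 unit-amplitude samples with random phase.
symbol_a = [cmath.exp(2j * cmath.pi * random.random()) for _ in range(32)]
preamble = [s + w for s, w in zip(symbol_a * 8, noise(256, 0.05))]
received = noise(300, 0.1) + preamble + noise(100, 0.1)
detect_index = coarse_sync(received)
print("coarse synchronization declared at sample", detect_index)
```

Because a run of 50 consecutive threshold crossings is required, an isolated noise spike cannot trigger detection; the price is a fixed detection latency of about 50 samples after the metric first rises.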
3.3 Fine Synchronization Algorithm
The coarse synchronization algorithm based on delay correlation and length retention can only provide a rough estimate of the start of the received data, while the exact timing of the received data is done by fine synchronization. The fine synchronization algorithm is implemented by the cross-correlation between the received data and the local sequence. The correlation coefficient is obtained as follows:
where the superscript * denotes conjugation and M is the correlation length, which determines the performance of the fine synchronization algorithm. The larger M is, the better the synchronization performance, but the greater the amount of computation. Here M = 64, the length of the long training symbol.
The peak of [TeX:] $$\left|C_{k}\right|$$ marks the end point of a long training symbol; using this feature, the end points of all long training symbols in the long preamble can be found. The time of the last peak of [TeX:] $$\left|C_{k}\right|$$ is the end point of the long preamble.
Before timing synchronization, the received signal is not corrected for frequency offset, because the crystal oscillators of typical hardware are sufficiently stable. Simulation shows that the cross-correlation method achieves precise synchronization under a small frequency offset. If the frequency offset is very large, it can be estimated and compensated using the short training symbols after coarse synchronization. For the additive white Gaussian noise (AWGN) channel with an SNR of 10 dB, the MATLAB simulation results for the amplitude of the correlation coefficient between the received signal and the locally known long training symbol are shown in Fig. 6. As can be seen from Fig. 6, the received signal contains 300 noise samples before the data symbol arrives, and four peaks are obtained by cross-correlating the long training sequence with the received data. From the number and order of the peaks, the synchronization position can be obtained.
Simulation diagram of fine synchronization algorithm.
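The four-peak behaviour described above can be reproduced with a short floating-point sketch. It assumes that the window of M = 64 samples ends at index k, so that a peak marks the end of a long training symbol, and it uses an even-length Zadoff-Chu sequence as a stand-in for the long training symbol. The noise level (roughly 10 dB SNR) and the threshold of 0.6·M are illustrative choices.

```python
import cmath
import math
import random

def long_training_symbol(length=64):
    # Even-length Zadoff-Chu sequence with root 1 (a stand-in CAZAC sequence).
    return [cmath.exp(-1j * math.pi * n * n / length) for n in range(length)]

def correlation_magnitude(r, local, k):
    """|C_k| for the window of len(local) samples that ends at index k."""
    M = len(local)
    start = k - M + 1
    return abs(sum(r[start + m] * local[m].conjugate() for m in range(M)))

def find_peaks(r, local, threshold):
    M = len(local)
    return [k for k in range(M - 1, len(r))
            if correlation_magnitude(r, local, k) > threshold]

def noise(count, sigma=0.22):
    return [complex(random.gauss(0, sigma), random.gauss(0, sigma)) for _ in range(count)]

random.seed(1)
local = long_training_symbol()
long_preamble = [s + w for s, w in zip(local * 4, noise(256))]
received = noise(300) + long_preamble + noise(50)  # 300 noise samples precede the data
peaks = find_peaks(received, local, threshold=0.6 * len(local))
print(peaks)
```

Between the peaks the window straddles two identical symbols, so the correlation reduces to the circular autocorrelation of the CAZAC sequence, which is close to zero; this is why the peaks are isolated single samples.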
4. Timing Synchronization of FPGA Design
4.1 FPGA Design of Coarse Synchronization Algorithm
As shown in Eq. (9), the coarse synchronization decision variable [TeX:] $$m_{n}$$ must not only exceed the threshold [TeX:] $$T_{h}$$ but also stay above it for a certain length of time. Based on empirical values from multiple simulations, the retention time T in this paper is set to 50.
A large number of complex multiplications are used in the synchronization process. As shown in Eq. (10), a conventional complex multiplication requires four real multipliers. To reduce the number of multipliers, Eq. (10) is converted into Eqs. (11) and (12). As shown in Fig. 7, a complex multiplication can then be realized with only three multipliers and some additional additions and subtractions.
Complex multiplication structure.
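One well-known way to trade a multiplier for extra adders is shown below. This particular factorization is an assumption; Eqs. (11) and (12) may use a different but equivalent arrangement, and the function names are illustrative.

```python
def complex_multiply_4(a, b, c, d):
    """(a + jb)(c + jd) with four real multiplications."""
    return a * c - b * d, a * d + b * c

def complex_multiply_3(a, b, c, d):
    """(a + jb)(c + jd) with three real multiplications and five additions."""
    k1 = c * (a + b)
    k2 = a * (d - c)
    k3 = b * (c + d)
    return k1 - k3, k1 + k2  # (real part, imaginary part)

print(complex_multiply_3(3, 4, 1, 2))  # → (-5, 10)
```

Expanding the terms gives k1 − k3 = ac − bd and k1 + k2 = ad + bc, so the result is identical to the four-multiplier form. In fixed-point hardware the pre-additions widen the multiplier inputs by one bit, which is the cost of saving a multiplier.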
Correlation value accumulation sums the correlation values over the length of the delay window. To reduce hardware overhead, sliding window accumulation is used in the hardware implementation. The sliding window structure is shown in Fig. 8.
Structure of sliding window accumulation algorithm.
First, the correlation values are fed into a shift register of depth 32, which provides the data delay. The current correlation value and the correlation value delayed by 32 stages are then sent to the accumulation window. The sliding window accumulation equation is as follows:
[TeX:] $$C_{n+1}=C_{n}+r_{n+D} r_{n+2 D}^{*}-r_{n} r_{n+D}^{*} \quad (13)$$
In Eq. (13), [TeX:] $$r_{n+D} r_{n+2 D}^{*}$$ is the current correlation value, and [TeX:] $$r_{n} r_{n+D}^{*}$$ is the correlation value delayed by 32 stages.
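The saving from sliding window accumulation can be seen by comparing it with direct summation. The sketch below assumes a window length equal to the delay D = 32 and models the depth-32 shift register with a `deque`; the test signal is an arbitrary integer-valued sequence so that both methods can be compared exactly.

```python
from collections import deque

def direct_delay_correlation(r, D=32):
    """Recompute all D products for every output sample."""
    return [sum(r[n + k] * r[n + k + D].conjugate() for k in range(D))
            for n in range(len(r) - 2 * D + 1)]

def sliding_delay_correlation(r, D=32):
    """One new product, one addition and one subtraction per output sample."""
    window = deque()  # models the depth-D shift register
    acc = 0j
    out = []
    for i in range(len(r) - D):
        q = r[i] * r[i + D].conjugate()  # current correlation value
        window.append(q)
        acc += q
        if len(window) > D:
            acc -= window.popleft()      # correlation value delayed by D stages
        if len(window) == D:
            out.append(acc)
    return out

signal = [complex((3 * i) % 7 - 3, (5 * i) % 11 - 5) for i in range(100)]
assert sliding_delay_correlation(signal) == direct_delay_correlation(signal)
```

The direct form needs D complex multiplications per output, whereas the sliding form needs one, at the cost of storing D previous products.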
The calculation of [TeX:] $$\left|C_{n}\right|$$ in Eq. (9) is not well suited to hardware implementation, so let [TeX:] $$C_{n}=a_{n}+j b_{n}$$ and apply the following approximation:
After this approximation, the estimated value of [TeX:] $$\left|C_{n}\right|$$ is slightly larger than the actual value, so the threshold [TeX:] $$T_{h}$$ must be adjusted slightly.
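Two common multiplier-free magnitude estimates are sketched below for comparison. Which form the approximation above actually takes is not assumed here; both are standard choices, and both function names are illustrative.

```python
import math

def magnitude_abs_sum(a, b):
    """|a| + |b|: never below the true magnitude, up to about 41% above it."""
    return abs(a) + abs(b)

def magnitude_max_half_min(a, b):
    """max + min/2 using a one-bit right shift (integer inputs)."""
    big, small = max(abs(a), abs(b)), min(abs(a), abs(b))
    return big + (small >> 1)

for a, b in [(3, 4), (100, 0), (70, 70), (120, 35)]:
    exact = math.hypot(a, b)
    print(a, b, round(exact, 1), magnitude_abs_sum(a, b), magnitude_max_half_min(a, b))
```

In exact arithmetic, max + min/2 overestimates the true magnitude by at most about 12%; with integer truncation it can fall marginally below the true value for very small operands. Either way the estimate is biased upward on average, which is why the threshold needs a small adjustment.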
Directly implementing Eq. (9) on the FPGA requires an additional divider. Since the value of [TeX:] $$T_{h}$$ is predetermined, converting Eq. (9) into Eq. (15) avoids the division. To save multipliers, let [TeX:] $$T_{h}=0.5,$$ so that multiplying [TeX:] $$P_{n}$$ by 0.5 can be achieved by a right shift of one bit.
where >> denotes the right shift operation.
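The division-free comparison can be sketched on integers as follows, assuming the test takes the form "magnitude estimate greater than the window energy shifted right by one bit". The function name and argument names are illustrative.

```python
def exceeds_half(c_mag, p):
    """Division-free test of c_mag / p > 0.5 for non-negative integers."""
    return c_mag > (p >> 1)

# Exhaustive check against the exact integer condition 2 * c_mag > p.
for p in range(1, 200):
    for c_mag in range(0, 200):
        assert exceeds_half(c_mag, p) == (2 * c_mag > p)
```

For integer operands the shift loses nothing: when the energy is odd, the truncated half-value still yields the same comparison result, so no divider and no multiplier is needed for the threshold test.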
Based on the above analysis, the coarse synchronization hardware structure is shown in Fig. 9. It mainly includes five modules: data buffer, main control, delay correlation energy calculation, correlation window energy calculation, and frame search. The data buffer module caches the input data awaiting detection. The main control module issues control instructions to the data buffer module according to the current state of the system and the output of the frame search module. The delay correlation energy calculation, correlation window energy calculation, and frame search modules form the main body of the delay correlation algorithm; they complete coarse synchronization and feed the result back to the main control module.
Structure diagram of coarse synchronous hardware.
4.2 FPGA Design of Fine Synchronization Algorithm
When implementing algorithms for fine synchronization, there are two points to consider based on the complexity of the hardware circuit:
- Peak search is usually implemented as a maximum-value search, which requires relatively complicated logic and control circuits in hardware. To simplify the hardware design, this paper uses a preset threshold instead: when the value of [TeX:] $$\left|C_{k}\right|$$ exceeds the threshold, a peak is declared.
- The fine synchronization algorithm requires 64 complex multipliers, which occupy valuable multiplier resources on the FPGA and also limit the operating speed. To eliminate these multipliers and increase the system speed, the received signal is quantized to {+1, -1} in the hardware circuit.
Fig. 10 shows the MATLAB simulation results for the amplitude of the correlation coefficient between the quantized received signal and the known long training symbol in the AWGN channel with an SNR of 10 dB. As before, the received signal contains 300 noise samples before the data symbol arrives. After quantization, the noise floor is slightly higher, but the peaks are still obvious.
Based on the above analysis, the hardware structure of fine synchronization is shown in Fig. 11. The hardware implementation of the fine synchronization algorithm can be divided into three parts: quantization, matched filtering, and symbol output.
Correlation value accumulation is the first part of the matched filtering module; it accumulates the correlation between the received data and the local long training symbol. First, the quantized received signal is shifted into the shift register and then multiplied by the 64 samples of the local training symbol. In the hardware implementation, the quantized received data is limited to four cases: [TeX:] $$1+j, 1-j,-1+j \text { and }-1-j.$$ The multiplication can therefore be implemented with additions and subtractions alone, which saves hardware overhead. If a sample of the local long training symbol is a + jb, the cross-correlation operations corresponding to the four cases are:
Simulation diagram of the fine synchronization algorithm with quantization.
Fine synchronization hardware implementation diagram.
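The multiplier-free correlation term can be sketched as follows. It assumes that the quantizer keeps only the signs of the real and imaginary parts and that the correlation multiplies the quantized sample by the conjugate of the local sample a + jb; the case table in the paper may be arranged differently, and the function names are illustrative.

```python
def quantize(sample):
    """Keep only the signs of the real and imaginary parts."""
    return (1 if sample.real >= 0 else -1, 1 if sample.imag >= 0 else -1)

def correlation_term(q, a, b):
    """(qr + j*qi) * conj(a + jb) using only additions and subtractions."""
    qr, qi = q
    re = (a if qr > 0 else -a) + (b if qi > 0 else -b)  # qr*a + qi*b
    im = (a if qi > 0 else -a) - (b if qr > 0 else -b)  # qi*a - qr*b
    return re, im

def quantized_correlation(received, local):
    """Accumulate the multiplier-free terms over one window."""
    acc_re = acc_im = 0
    for sample, (a, b) in zip(received, local):
        re, im = correlation_term(quantize(sample), a, b)
        acc_re += re
        acc_im += im
    return acc_re, acc_im

# The four cases for a local sample a + jb = 3 + 2j:
for q in [(1, 1), (1, -1), (-1, 1), (-1, -1)]:
    print(q, correlation_term(q, 3, 2))
```

Under these assumptions the four cases reduce to (a+b) + j(a−b), (a−b) − j(a+b), (−a+b) + j(a+b), and −(a+b) + j(b−a), so each of the 64 taps costs two adders instead of a complex multiplier.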
As can be seen from Fig. 12, quantizing the received signal has a slight effect on timing synchronization performance at low SNR. Over 2,000 independent trials, the simulation results show that the timing accuracy exceeds 85% in AWGN multipath channels at an SNR of 3 dB and reaches almost 100% at SNRs of 8 dB and above. The impact of quantization on synchronization performance is therefore negligible at moderate and high SNR.
5. Simulation Results
On the basis of the above analysis and discussion, the design is written in the Quartus II development software, and the timing synchronization algorithm is then simulated in ModelSim.
Comparison of synchronization performance before and after quantization.
ModelSim simulation result of coarse synchronization module.
The coarse synchronization simulation is shown in Fig. 13. DataARe and DataAIm are the real and imaginary parts of the current data, DataBRe and DataBIm are the real and imaginary parts of the data delayed by 32 stages, SumDelayCorrelationMagnituder is the delay correlation energy value [TeX:] $$\left|C_{n}\right|,$$ and SumMagnituder is the correlation window energy value [TeX:] $$P_{n}.$$ In Eq. (9), [TeX:] $$m_{n}$$ is the decision variable and [TeX:] $$T_{h}$$ is the threshold. BufferDetection uses a shift register to record the consecutive samples at which [TeX:] $$m_{n}$$ exceeds the threshold [TeX:] $$T_{h}.$$ When BufferDetection = 50'h3ffffffffffff, the decision variable [TeX:] $$m_{n}$$ has stayed above the preset threshold of 0.5 for 50 consecutive sampling periods, which satisfies the length-retention condition in Fig. 5. At this point, coarse synchronization is judged to be complete.
The fine synchronization simulation is shown in Fig. 14, where DataInRe and DataInIm are the real and imaginary parts of the input data, QuantizationRe and QuantizationIm are the real and imaginary parts of the quantized data, CorrelationSumRe and CorrelationSumIm are the accumulated correlation between the quantized input data and the 64 samples of the local long training symbol, and STS_end_counter is the number of detected peaks. When STS_end_counter = 4, the four detected peaks correspond to the four peaks in Fig. 10, and fine synchronization is judged to be complete. The long training symbols and data symbols are then output serially.
The hardware design is based on Altera’s Cyclone V series 5CGXFC5C6F27C7N chip, and the resources occupied are shown in Table 2.
ModelSim simulation results of the fine synchronization module.
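The length-retention check with a 50-bit shift register can be modelled in a few lines. The register name follows the BufferDetection signal in the simulation; the update rule (shift left, insert the comparison result, keep 50 bits) is an assumption about how the shift method works.

```python
HOLD = 50
FULL = (1 << HOLD) - 1  # fifty ones, i.e. 50'h3ffffffffffff

def update(buffer_detection, above_threshold):
    """Shift in one comparison result and keep the lowest 50 bits."""
    return ((buffer_detection << 1) | int(above_threshold)) & FULL

buffer_detection = 0
for _ in range(49):
    buffer_detection = update(buffer_detection, True)
locked_after_49 = buffer_detection == FULL   # False: one sample short
buffer_detection = update(buffer_detection, True)
locked_after_50 = buffer_detection == FULL   # True: 50 consecutive hits
print(hex(FULL), locked_after_49, locked_after_50)
```

A single sample below the threshold inserts a zero that takes 50 further shifts to leave the register, so the all-ones pattern is reached only after 50 consecutive samples above the threshold, without any explicit counter or reset logic.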
Algorithm major resource occupancy
Table 2 shows that the timing algorithm occupies 4 DSP blocks, an occupancy rate of 3%. This shows that the algorithm saves considerable hardware resources and is suitable for practical engineering applications.
6. Conclusion
In this paper, the coarse and fine synchronization algorithms for SC-FDE timing synchronization are analyzed, and an FPGA design scheme for the timing synchronization algorithm is given. To save hardware resources and increase the operating speed, sliding window accumulation, quantization, and amplitude simplification are used in the hardware implementation. As can be seen from Fig. 12, the timing accuracy exceeds 85% in AWGN multipath channels at an SNR of 3 dB and reaches almost 100% at SNRs of 8 dB and above. The simulation results verify that the synchronization algorithm performs well in an actual hardware environment. The timing synchronization algorithm implemented on the FPGA in this paper is better suited to high-SNR scenarios. Future work will improve the synchronization algorithm and implement, on the FPGA hardware platform, a version with higher timing accuracy at low SNR.
Acknowledgement
This work was supported by the National Key Technology Research and Development Program of the Ministry of Science and Technology of China (No. 2015BAK05B01).