## Suyuan Ji*, Chao Chen* and Yu Zhang**

Parameter | Specification |
---|---|
Channel bandwidth (MHz) | 2.5 |
Sampling interval ([TeX:] $$\mu \mathrm{s}$$) | 0.4 |
Mapping method | 16-QAM |
Data rate (Mbps) | 10 |
Data block length, [TeX:] $$N_{\text {total}}$$ | 256 |
FFT size, [TeX:] $$N_{F F T}$$ | 256 |
UW length, [TeX:] $$N_{g}\left(T_{g}\right)$$ | 32 |
Frame length | 1,000 symbols |
Multipath delay, t ([TeX:] $$\mu \mathrm{s}$$) | [0 0.4 0.9] |
Omnidirectional antenna power, P (dB) | [0 -5 -10] |
Doppler frequency shift, [TeX:] $$f_{m}$$ (Hz) | [0.4 0.3 0.5] |

Fig. 1 shows the frame structure of SC-FDE. The same unique word (UW) suffix is appended to each data block to form a symbol [13,15], so every symbol is ideally periodic. When the maximum channel delay [TeX:] $$N_{h}$$ is shorter than the UW length [TeX:] $$N_{g}$$, that is, [TeX:] $$N_{h}<N_{g}$$, the interference of the previous symbol with the current data block can be eliminated. The UW consists of CAZAC sequences, which can be obtained from Eq. (1). CAZAC sequences have good correlation properties and a flat, wideband frequency response, so they can be used for the UW suffix, frequency offset tracking, and frequency domain equalization [15].
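As an illustration of the CAZAC property, the following sketch generates a Zadoff-Chu sequence, one widely used CAZAC family, and checks its constant amplitude and zero circular autocorrelation. Whether Eq. (1) has exactly this form is an assumption, and the function names `zadoff_chu` and `circular_autocorrelation` are hypothetical.

```python
import numpy as np

def zadoff_chu(length: int, root: int = 1) -> np.ndarray:
    """Zadoff-Chu sequence (assumed CAZAC example); root must be coprime with length."""
    if np.gcd(root, length) != 1:
        raise ValueError("root and length must be coprime")
    n = np.arange(length)
    if length % 2 == 0:
        phase = np.pi * root * n * n / length
    else:
        phase = np.pi * root * n * (n + 1) / length
    return np.exp(1j * phase)

def circular_autocorrelation(seq: np.ndarray) -> np.ndarray:
    """Circular autocorrelation computed through the FFT."""
    spectrum = np.fft.fft(seq)
    return np.fft.ifft(spectrum * np.conj(spectrum))

uw = zadoff_chu(32)                   # 32 samples, matching the UW length in the table
acf = circular_autocorrelation(uw)
print(np.abs(uw).max(), np.abs(acf[0]), np.abs(acf[1:]).max())
```

Every sample has unit magnitude, the zero-lag autocorrelation equals the sequence length, and all other lags vanish up to rounding error, which is what makes such a sequence attractive as a UW and as a training symbol.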

The SC-FDE system block diagram is shown in Fig. 2. In the transmitter path, the binary input data is first convolutionally encoded. After coding, the binary values are mapped to quadrature amplitude modulation (QAM) symbols, and a guard period is added between successive blocks. After pulse shaping (root-raised-cosine pulses) and digital-to-analog conversion, the resulting I/Q signals are up-converted to RF. In the receiver path, after the RF front end and analog-to-digital conversion, the receiver uses a special frame structure for time-frequency synchronization and channel estimation. The received data, with the guard period removed, is then transformed into the frequency domain, and the channel state information (CSI) obtained by channel estimation is used for frequency domain equalization. Finally, the equalized data is converted back to the time domain, and QAM demapping and Viterbi decoding are performed [16].

Assume that the transmitted data is [TeX:] $$x_{n}$$ and the channel impulse response is [TeX:] $$h_{n}$$. When the number of symbols is less than the length of the impulse response, each received data symbol may be expressed as:

[TeX:] $$y_{n}=x_{n} \otimes h_{n}+\mathcal{V}_{n} \quad (2)$$

where [TeX:] $$\mathcal{V}_{n}$$ is additive white Gaussian noise and [TeX:] $$\otimes$$ denotes convolution. After the FFT, the frequency domain received signal [TeX:] $$Y_{k}$$ is expressed in terms of [TeX:] $$X_{k}$$, the FFT of [TeX:] $$x_{n}$$, as:

[TeX:] $$Y_{k}=H_{k} X_{k}+V_{k} \quad (3)$$

where [TeX:] $$H_{k}$$ is the channel frequency response and [TeX:] $$V_{k}$$ is the noise in the frequency domain. Assuming that both time-frequency synchronization and channel estimation are ideal and that a decision feedback equalization structure is used, the signal before the decision is:

where [TeX:] $$W_{k}$$ is the feedforward frequency domain filter and [TeX:] $$C_{k}$$ is the coefficient of the feedback time domain filter [17].
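The frequency domain model can be checked numerically. The sketch below is a noise-free illustration under stated assumptions: one FFT block is taken to be data plus UW suffix (224 + 32 samples), the 3-tap channel `h` and the UW sequence are hypothetical, and a simple zero-forcing equalizer is used in place of the decision feedback structure described above.

```python
import numpy as np

rng = np.random.default_rng(0)
n_fft, n_uw = 256, 32                       # FFT size and UW length from the parameter table
n_data = n_fft - n_uw                       # assumed split: data + UW fill one FFT block

uw = np.exp(1j * np.pi * np.arange(n_uw) ** 2 / n_uw)        # stand-in CAZAC-like UW
levels = np.array([-3, -1, 1, 3])
data = rng.choice(levels, n_data) + 1j * rng.choice(levels, n_data)   # 16-QAM points
block = np.concatenate([data, uw])          # one symbol: data block + UW suffix

h = np.array([1.0, 0.5 + 0.2j, 0.25j])      # hypothetical channel, shorter than the UW

# The previous symbol also ends with the UW, so the UW precedes the current block
# and plays the role of a cyclic prefix.
tx = np.concatenate([uw, block])
rx = np.convolve(tx, h)[n_uw:n_uw + n_fft]  # received block, noise omitted

X = np.fft.fft(block)
H = np.fft.fft(h, n_fft)
Y = np.fft.fft(rx)

x_hat = np.fft.ifft(Y / H)                  # zero-forcing equalization (not the DFE)
print(np.allclose(Y, H * X), np.allclose(x_hat[:n_data], data))
```

Because the same UW sits before and after the data, the linear channel convolution looks circular over the FFT window, so each frequency bin is scaled by a single complex gain and can be equalized independently.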

As shown in Fig. 3, the preamble sequence contains a short preamble and a long preamble. The short preamble consists of 8 repeated short training symbols A, each with 32 sampling points. The entire short preamble therefore has 256 sampling points and supports automatic gain control (AGC), coarse synchronization, coarse frequency offset estimation, and other functions [18]. The long preamble consists of four repeated long training symbols C, each with 64 sampling points. The entire long preamble also has 256 sampling points and is used for fine synchronization, fine frequency offset estimation, and channel estimation. Both the short and long preambles are composed of CAZAC sequences.

Coarse synchronization can be realized by the delay autocorrelation algorithm, whose block diagram is shown in Fig. 4. Window C computes the correlation between the received signal and the received signal delayed by D samples. The delay [TeX:] $$z^{-D}$$ equals the period of the short preamble, so D = 32. Window P calculates the energy of the received signal over the correlation window. This energy is used to normalize the decision statistic, so that the decision variable [TeX:] $$m_{n}$$ is independent of the received power.

The value of delay correlation [TeX:] $$C_{n}$$ is as follows:

Eq. (5) is the correlation between the L most recently received samples and the L samples received D sampling periods earlier.

The value of the received signal energy [TeX:] $$P_{n}$$ can be expressed as:

The decision variable [TeX:] $$m_{n}$$ of the delay correlation algorithm is:

At low SNR, large random noise in the channel may push the decision variable [TeX:] $$m_{n}$$ above the preset threshold, so that the receiver cannot correctly judge the arrival of data. Therefore, to reduce the false alarm probability and improve the reliability of coarse synchronization at low SNR, a length-retention requirement is added to the delay correlation algorithm. As shown in Fig. 5, the decision variable [TeX:] $$m_{n}$$ must remain above the preset threshold for a certain number of sampling periods before coarse synchronization is declared complete, which avoids the effect of large random noise.
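The delay correlation with length retention can be sketched as follows. This is a minimal model, not the authors' implementation: it assumes the decision variable has the form |C_n| / P_n, uses a Zadoff-Chu-like stand-in for the short training symbol, and the threshold 0.5 and hold length 50 are taken from the values quoted later in the paper. All function names are hypothetical.

```python
import numpy as np

def delay_correlation_metric(r, delay=32, window=32):
    """Assumed decision variable m_n = |C_n| / P_n for each n where the windows fit."""
    r = np.asarray(r, dtype=complex)
    prod = r[:-delay] * np.conj(r[delay:])           # r_n * conj(r_{n+D})
    energy = np.abs(r[delay:]) ** 2                  # |r_{n+D}|^2
    kernel = np.ones(window)
    c = np.convolve(prod, kernel, mode="valid")      # windowed correlation C_n
    p = np.convolve(energy, kernel, mode="valid")    # windowed energy P_n
    return np.abs(c) / np.maximum(p, 1e-12)

def first_sustained_crossing(metric, threshold=0.5, hold=50):
    """First index at which the metric has stayed above threshold for `hold` samples."""
    run = 0
    for n, value in enumerate(metric):
        run = run + 1 if value > threshold else 0
        if run >= hold:
            return n
    return None

rng = np.random.default_rng(1)
short_sym = np.exp(1j * np.pi * np.arange(32) ** 2 / 32)    # stand-in short training symbol
preamble = np.tile(short_sym, 8)                            # 8 x 32 = 256 samples
rx = np.concatenate([np.zeros(300), preamble, np.zeros(200)]).astype(complex)
noise_std = np.sqrt(0.1 / 2)                                # 10 dB SNR for a unit-power signal
rx += noise_std * (rng.standard_normal(rx.size) + 1j * rng.standard_normal(rx.size))

n_detect = first_sustained_crossing(delay_correlation_metric(rx))
print(n_detect)
```

In the noise-only region the metric occasionally spikes but does not stay high, whereas inside the repeated preamble it sits near S/(S+N), so the hold requirement trades a short detection delay for a much lower false alarm rate.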

The coarse synchronization algorithm based on delay correlation and length retention provides only a rough estimate of the start of the received data; the exact timing is determined by fine synchronization. The fine synchronization algorithm is implemented by cross-correlating the received data with the local sequence. The correlation coefficient is obtained as follows:

where the superscript * denotes conjugation and M is the correlation length, which determines the performance of the fine synchronization algorithm: the larger M is, the better the synchronization performance but the greater the amount of computation. Here M = 64, the length of the long training symbol.

A peak of [TeX:] $$\left|C_{k}\right|$$ marks the end point of a long training symbol, so the end points of all long training symbols in the long preamble can be found from the peaks. The time of the last peak of [TeX:] $$\left|C_{k}\right|$$ is the end point of the long preamble.
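A small numerical sketch of this cross-correlation is given below. It assumes the local sequence is the conjugated term, uses a Zadoff-Chu-like stand-in for the long training symbol, places 300 noise samples before the preamble, and picks an illustrative threshold of half the ideal peak; none of these choices is confirmed by the paper.

```python
import numpy as np

def correlation_magnitude(rx, local):
    """|C_k| with C_k = sum_m rx[k + m] * conj(local[m]) for every full overlap."""
    return np.abs(np.correlate(rx, local, mode="valid"))

M = 64                                                    # length of one long training symbol
long_sym = np.exp(1j * np.pi * np.arange(M) ** 2 / M)     # stand-in CAZAC long training symbol

rng = np.random.default_rng(2)
rx = np.concatenate([np.zeros(300), np.tile(long_sym, 4), np.zeros(100)]).astype(complex)
rx += np.sqrt(0.05) * (rng.standard_normal(rx.size) + 1j * rng.standard_normal(rx.size))

magnitude = correlation_magnitude(rx, long_sym)
peak_starts = np.flatnonzero(magnitude > 0.5 * M)         # illustrative threshold
peak_ends = peak_starts + M - 1                           # last sample of each training symbol
preamble_end = int(peak_ends[-1])
print(peak_starts.tolist(), preamble_end)
```

`np.correlate` indexes each output by the first sample of the window, so `M - 1` is added to obtain the end point of each training symbol; the last end point marks the end of the long preamble.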

Before timing synchronization, the received signal is not corrected for frequency offset, because the crystal oscillators of typical hardware are sufficiently stable. Simulation shows that the cross-correlation method achieves precise synchronization under a small frequency offset. If the frequency offset is very large, it can be estimated and compensated using the short training symbols after coarse synchronization. For an additive white Gaussian noise (AWGN) channel with an SNR of 10 dB, the MATLAB simulation results for the amplitude of the correlation coefficient between the received signal and the locally known long training symbol are shown in Fig. 6. As can be seen from Fig. 6, the received signal contains 300 noise samples before the data symbols arrive, and 4 peaks are obtained by cross-correlating the long training sequence with the received data. From the number and order of the peaks, the synchronization position can be obtained.

As shown in Eq. (9), the coarse synchronization decision variable [TeX:] $$m_{n}$$ must not only exceed the threshold [TeX:] $$T_{h}$$ but also remain above it for a certain length of time. Based on empirical values from multiple simulations, the retention time T in this paper is 50 sampling periods.

A large number of complex multipliers are used in the synchronization process. As shown in Eq. (10), the usual implementation of a complex multiplication requires 4 real multipliers. To reduce the number of multipliers, we rewrite Eq. (10) as Eqs. (11) and (12). As shown in Fig. 7, a complex multiplication can then be implemented with only three multipliers and some additional additions and subtractions.
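One well-known way to trade a multiplier for extra adders is shown below. It is a sketch of the general three-multiplier technique; the exact grouping of terms in Eqs. (11) and (12) may differ, and the function name is hypothetical.

```python
def complex_multiply_3(a, b, c, d):
    """Return (real, imag) of (a + jb)(c + jd) using three real multiplications."""
    k1 = c * (a + b)        # shared product
    k2 = a * (d - c)
    k3 = b * (c + d)
    real = k1 - k3          # equals a*c - b*d
    imag = k1 + k2          # equals a*d + b*c
    return real, imag

# (3 + 4j)(5 - 2j) = 23 + 14j
print(complex_multiply_3(3, 4, 5, -2))
```

The cost is three additions before the multipliers and two after them, which on an FPGA is usually cheaper than a fourth hardware multiplier.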

The correlation value accumulation module accumulates the correlation values within the delay window. To reduce hardware overhead, a sliding window accumulation is used in the hardware implementation. The sliding window structure is shown in Fig. 8.

First, the correlation values are fed into a shift register of depth 32, which provides the data delay. The current correlation value and the correlation value delayed by 32 stages are then sent to the accumulation window. The sliding window accumulation equation is as follows:

In Eq. (13), [TeX:] $$r_{n+D}^{*} r_{n+2 D}^{*}$$ is the current correlation value, and [TeX:] $$r_{n}^{*} r_{n+D}^{*}$$ is the correlation value delayed by 32 stages.
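The sliding window idea can be modelled in a few lines: instead of re-adding 32 products for every new sample, the newest product is added and the product leaving the shift register is subtracted. The class below is an illustrative software model with a hypothetical name, not the hardware description itself.

```python
from collections import deque

class SlidingWindowAccumulator:
    """Running sum over the last `depth` values, updated with one add and one subtract."""

    def __init__(self, depth=32):
        self.delay_line = deque([0] * depth, maxlen=depth)   # models the shift register
        self.total = 0

    def push(self, value):
        oldest = self.delay_line[0]          # value delayed by `depth` stages
        self.delay_line.append(value)        # shifting in drops the oldest entry
        self.total += value - oldest
        return self.total

acc = SlidingWindowAccumulator(depth=4)
print([acc.push(v) for v in [1, 2, 3, 4, 5, 6]])   # [1, 3, 6, 10, 14, 18]
```

The same update works for complex correlation values, so the per-sample cost stays constant regardless of the window length.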

The calculation of [TeX:] $$\left|C_{n}\right|$$ in Eq. (9) is not well suited to hardware implementation, so let [TeX:] $$C_{n}=a_{n}+j b_{n}$$ and apply the following approximation:

After this simplification, the estimated value of [TeX:] $$\left|C_{n}\right|$$ is slightly larger than the actual value, so the threshold [TeX:] $$T_{h}$$ must be adjusted slightly.

Directly implementing Eq. (9) on the FPGA requires an additional divider. Since the value of [TeX:] $$T_{h}$$ is predetermined, converting Eq. (9) into Eq. (15) avoids the division. To save multipliers, let [TeX:] $$T_{h}=0.5$$, so that multiplying [TeX:] $$P_{n}$$ by 0.5 reduces to a one-bit right shift.

where >> denotes the right shift operation.
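The division-free comparison can be illustrated with integer arithmetic. In the sketch below the magnitude approximation max + min/2 is an assumption (a common multiplier-free choice that tends to overestimate the true magnitude); the paper's own approximation may differ, and both function names are hypothetical.

```python
def approx_magnitude(re: int, im: int) -> int:
    """Multiplier-free estimate of |re + j*im|: max(|re|, |im|) + min(|re|, |im|) / 2."""
    big = max(abs(re), abs(im))
    small = min(abs(re), abs(im))
    return big + (small >> 1)               # the halving is a one-bit right shift

def above_threshold(c_re: int, c_im: int, p: int) -> bool:
    """|C_n| / P_n > 0.5 rewritten without a divider as |C_n| > (P_n >> 1)."""
    return approx_magnitude(c_re, c_im) > (p >> 1)

print(approx_magnitude(30, 40))             # 55, true magnitude is 50
print(above_threshold(30, 40, 100))         # True
print(above_threshold(3, 4, 100))           # False
```

Moving the energy term to the right-hand side removes the divider, and choosing a threshold that is a power of two removes the remaining multiplier.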

Based on the above analysis, the coarse synchronization hardware structure is shown in Fig. 9. It mainly comprises five modules: data buffer, main control, delay correlation energy calculation, correlation window energy calculation, and frame search. The data buffer module caches the input data awaiting detection. The main control module issues control instructions to the data buffer module according to the current system state and the output of the frame search module. The delay correlation energy calculation, correlation window energy calculation, and frame search modules form the main body of the delay correlation algorithm; they complete coarse synchronization and feed the result back to the main control module.

When implementing algorithms for fine synchronization, there are two points to consider based on the complexity of the hardware circuit:

- Peak search is usually implemented as a maximum search, which requires more complicated logic and control circuits in hardware. To simplify the hardware design, this paper uses a threshold instead: when the value of [TeX:] $$\left|C_{k}\right|$$ exceeds the preset threshold, a peak is declared.

- The fine synchronization algorithm requires 64 complex multipliers, which consume valuable multiplier resources on the FPGA and also limit the operating speed. To eliminate these multipliers and increase the system speed, the real and imaginary parts of the received signal are each quantized to {+1, -1} in the hardware circuit.
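The second point can be illustrated with a short sketch. It assumes that the correlation multiplies the quantized sample by the conjugate of the local sample a + jb (which term is conjugated is an assumption), and the function names are hypothetical.

```python
import numpy as np

def quantize_sign(r):
    """Keep only the sign of the real and imaginary parts: outputs are +/-1 +/- j."""
    r = np.asarray(r, dtype=complex)
    re = np.where(r.real >= 0, 1, -1)
    im = np.where(r.imag >= 0, 1, -1)
    return re + 1j * im

def multiply_by_conj_local(q_re, q_im, a, b):
    """(q_re + j*q_im) * conj(a + jb) for q_re, q_im in {+1, -1}, using only add/subtract."""
    real = (a if q_re > 0 else -a) + (b if q_im > 0 else -b)
    imag = (a if q_im > 0 else -a) - (b if q_re > 0 else -b)
    return real, imag

print(quantize_sign([0.3 - 2j, -1 + 0.1j]).tolist())
print(multiply_by_conj_local(1, 1, 3, -2))      # (1 + j) * conj(3 - 2j) = 1 + 5j
```

Each of the four quantized values selects a fixed combination of sign changes on a and b, so the 64 complex multipliers of the matched filter collapse into adders and sign selection.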

Fig. 10 shows the MATLAB simulation results for the amplitude of the correlation coefficient between the quantized received signal and the known long training symbol in an AWGN channel with an SNR of 10 dB. As before, the received signal includes 300 noise samples before the data symbols arrive. After quantization, the noise floor is slightly higher, but the peaks are still obvious.

Based on the above analysis, the hardware structure of fine synchronization is shown in Fig. 11. The hardware implementation of the fine synchronization algorithm can be divided into three parts: quantization, matched filtering, and symbol output.

Correlation value accumulation is the first part of the matched filtering module; it accumulates the correlation between the received data and the local long training symbol. First, the quantized received signal is shifted into the shift register and then multiplied by the 64 samples of the local training symbol. In the hardware implementation, the quantized received data takes only four values: [TeX:] $$1+j, 1-j,-1+j \text { and }-1-j.$$ The multiplication can therefore be implemented with additions only, which saves hardware. If a sample of the local long training symbol is a + jb, the cross-correlation operations corresponding to the four cases are:

As can be seen from Fig. 12, quantizing the received signal has a slight effect on timing synchronization performance at low SNR. Over 2,000 independent trials, the simulation results show that the timing accuracy exceeds 85% in AWGN multipath channels at SNR = 3 dB. At SNR = 8 dB or more, the timing accuracy in AWGN multipath channels reaches almost 100%. The impact of quantization on synchronization performance is therefore negligible.

On the basis of the above analysis and discussion, the design is implemented in the Quartus II development software, and the timing synchronization algorithm is then simulated in ModelSim.

The coarse synchronization simulation is shown in Fig. 13. DataARe and DataAIm are the real and imaginary parts of the current data, and DataBRe and DataBIm are the real and imaginary parts of the data delayed by 32 stages. SumDelayCorrelationMagnituder is the delay correlation energy value [TeX:] $$\left|C_{n}\right|$$ and SumMagnituder is the correlation window energy value [TeX:] $$P_{n}.$$ In Eq. (9), [TeX:] $$m_{n}$$ is the decision variable and [TeX:] $$T_{h}$$ is the threshold. BufferDetection is a shift register that tracks the consecutive samples for which [TeX:] $$m_{n}$$ is greater than the threshold [TeX:] $$T_{h}.$$ When BufferDetection = 50'h3ffffffffffff, the decision variable [TeX:] $$m_{n}$$ has remained above the preset threshold of 0.5 for 50 sampling periods, which satisfies the length-retention condition of Fig. 5. At this point, coarse synchronization is judged to be complete.

The fine synchronization simulation is shown in Fig. 14, where DataInRe and DataInIm are the real and imaginary parts of the input data, QuantizationRe and QuantizationIm are the real and imaginary parts of the quantized data, CorrelationSumRe and CorrelationSumIm are the accumulated correlation between the quantized input data and the 64 samples of the local long training symbol, and STS_end_counter is the number of detected peaks. When STS_end_counter = 4, the 4 detected peaks correspond to the four peaks in Fig. 10, and fine synchronization is judged to be complete. After that, the long training symbols and data symbols are output serially.

The hardware design is based on Altera’s Cyclone V series 5CGXFC5C6F27C7N chip, and the resources it occupies are shown in Table 2.
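The behaviour of the BufferDetection register described for Fig. 13 can be modelled in software as a 50-bit shift register that receives one comparison result per sample. The sketch below is an assumed model of that behaviour, with hypothetical function names, rather than the RTL itself.

```python
HOLD_BITS = 50
ALL_ONES = (1 << HOLD_BITS) - 1              # 50 ones, i.e. 0x3ffffffffffff

def shift_in(buffer: int, above: bool) -> int:
    """Shift the newest comparison result into the 50-bit register."""
    return ((buffer << 1) | int(above)) & ALL_ONES

def coarse_sync_index(flags):
    """Index of the sample at which the last 50 comparison results are all ones."""
    buffer = 0
    for n, above in enumerate(flags):
        buffer = shift_in(buffer, above)
        if buffer == ALL_ONES:
            return n
    return None

flags = [True] * 49 + [False] + [True] * 50  # a 49-sample burst, one miss, then 50 hits
print(hex(ALL_ONES), coarse_sync_index(flags))   # 0x3ffffffffffff 99
```

A single sample below the threshold inserts a zero that takes 50 shifts to leave the register, so only an uninterrupted run of 50 hits produces the all-ones pattern.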

Table 2.

Resource | Resource occupation |
---|---|
Logic utilization (in ALMs) | 1027/29080 (3%) |
Registers | 1272 |
Memory bits | 3262/4567040 (<1%) |
DSP blocks | 4/150 (3%) |

Table 2 shows that the timing synchronization algorithm occupies 4 DSP blocks, an occupancy rate of 3%. The algorithm therefore saves considerable hardware resources and is well suited to practical engineering applications.

In this paper, the coarse and fine synchronization algorithms for SC-FDE timing synchronization are analyzed, and an FPGA design scheme for the timing synchronization algorithm is given. To save hardware resources and increase the operating speed, sliding window accumulation, quantization, and amplitude simplification are used in the hardware implementation. As can be seen from the simulation results in Fig. 12, the timing accuracy exceeds 85% in AWGN multipath channels at SNR = 3 dB and reaches almost 100% at SNR = 8 dB or more. The simulation results verify that the synchronization algorithm performs well in the actual hardware environment. The timing synchronization algorithm implemented on the FPGA in this paper is better suited to high-SNR scenarios. Future work will improve the synchronization algorithm and implement a version with higher timing accuracy at low SNR on the FPGA hardware platform.

He received M.S. and Ph.D. degrees from Communication University of China in 2006 and 2010, respectively. He worked in the Engineering Center of Radio and Television Digital Education Ministry of Communication University of China, and became an associate researcher in March 2014. His current research interests include wireless communication systems, wireless digital broadcasting systems, and audio/image processing.

He received an M.S. degree from Beijing University of Technology in 2011. He worked in the Communication and Technology Bureau of Xinhua News Agency. His current research interests include the design and application of wireless multimedia communication technology, and streaming media video coding and transmission technology in the news and media industry.

- 1 J. Coon, J. Siew, M. Beach, A. Nix, S. Armour, J. McGeehan, "A comparison of MIMO-OFDM and MIMO-SCFDE in WLAN environments," in *Proceedings of IEEE Global Telecommunications Conference (GLOBECOM'03) (IEEE Cat. No. 03CH37489)*, San Francisco, CA, 2003, pp. 3296-3301.
- 2 J. J. Van De Beek, M. Sandell, M. Isaksson, P. O. Borjesson, "Low-complex frame synchronization in OFDM systems," in *Proceedings of the 4th IEEE International Conference on Universal Personal Communications*, Tokyo, Japan, 1995, pp. 982-986.
- 3 Y. Guo, G. Liu, J. Ge, "A novel time and frequency synchronization scheme for OFDM systems," *IEEE Transactions on Consumer Electronics*, vol. 54, no. 2, pp. 321-325, 2008. doi:10.1109/TCE.2008.4560093
- 4 J. Meng, G. Kang, "A novel OFDM synchronization algorithm based on CAZAC sequence," in *Proceedings of 2010 International Conference on Computer Application and System Modeling*, Taiyuan, China, 2010, pp. 634-637.
- 5 G. Ren, Y. Chang, H. Zhang, H. Zhang, "Synchronization method based on a new constant envelop preamble for OFDM systems," *IEEE Transactions on Broadcasting*, vol. 51, no. 1, pp. 139-143, 2005. doi:10.1109/TBC.2004.842520
- 6 Y. Zhu, H. Zhang, Y. Luo, "An OFDM timing and frequency synchronization algorithm based on CAZAC sequence," *Computer Simulation*, vol. 26, no. 11, pp. 130-133, 2009.
- 7 P. Y. Tsai, H. Y. Kang, T. D. Chiueh, "Joint weighted least-squares estimation of carrier-frequency offset and timing offset for OFDM systems over multipath fading channels," *IEEE Transactions on Vehicular Technology*, vol. 54, no. 1, pp. 211-223, 2005. doi:10.1109/TVT.2004.838891
- 8 T. Lv, H. Li, J. Chen, "Joint estimation of symbol timing and carrier frequency offset of OFDM signals over fast time-varying multipath channels," *IEEE Transactions on Signal Processing*, vol. 53, no. 12, pp. 4526-4535, 2005. doi:10.1109/TSP.2005.859233
- 9 T. M. Schmidl, D. C. Cox, "Robust frequency and timing synchronization for OFDM," *IEEE Transactions on Communications*, vol. 45, no. 12, pp. 1613-1621, 1997. doi:10.1109/26.650240
- 10 *IEEE Standard for Local and Metropolitan Area Networks - Part 16: Air Interface for Fixed Broad-Band Wireless Access Systems*, IEEE 802.16-REVd/D5 as IEEE 802.16-2004, 2004.
- 11 W. Nie, H. Jin, H. Yan, "Time synchronization algorithm for MIMO-OFDM system," *Communication Technology*, vol. 49, no. 3, pp. 374-377, 2016.
- 12 M. Xia, D. Rouseff, J. A. Ritcey, X. Zou, C. Polprasert, W. Xu, "Underwater acoustic communication in a highly refractive environment using SC–FDE," *IEEE Journal of Oceanic Engineering*, vol. 39, no. 3, pp. 491-499, 2013. doi:10.1109/JOE.2013.2257232
- 13 X. Liao, Y. Bai, "Improved symbol timing synchronization algorithms for SC-FDE systems," in *Proceedings of 2013 3rd International Conference on Consumer Electronics, Communications and Networks*, Xianning, China, 2013, pp. 363-366.
- 14 C. Feng, J. Zhang, Y. Zhang, M. Xia, "A novel timing synchronization method for MIMO OFDM systems," in *Proceedings of IEEE Vehicular Technology Conference*, Singapore, 2008, pp. 913-917.
- 15 IEEE 802.16 Broadband Wireless Access Working Group, 2003; http://www.ieee802.org/16/tga/docs/80216a-03_01.pdf
- 16 S. Yoshizawa, H. Tanimoto, T. Saito, "SC-FDE vs OFDM: Performance comparison in shallow-sea underwater acoustic communication," in *Proceedings of 2016 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS)*, Phuket, Thailand, 2016, pp. 1-5.
- 17 C. Chen, M. Zhao, W. Chen, "Timing synchronization for SC-FDE," *Journal of Zhejiang University (Engineering Science)*, vol. 41, no. 3, pp. 445-449, 2007.
- 18 M. Huemer, H. Witschnig, J. Hausner, "Unique word based phase tracking algorithms for SC/FDE-systems," in *Proceedings of IEEE Global Telecommunications Conference (IEEE Cat. No. 03CH37489)*, San Francisco, CA, 2003, pp. 70-74.