Wei-Xin Yao* , Dan Yang** , Gui-Fu Lu** and Jun Wang**A Fast Rough Mode Decision Algorithm for HEVCAbstract: HEVC is the high efficiency video coding standard, which provides better coding efficiency contrasted with the other video coding standard. But at the same time the computational complexity increases drastically. Thirty-five kinds of intra-prediction modes are defined in HEVC, while 9 kinds of intra prediction modes are defined in H.264/AVC. This paper proposes a fast rough mode decision (RMD) algorithm which adopts the smoothness of the up-reference pixels and the left-reference pixels to decrease the computational complexity. The three step search method is implemented in RMD process. The experimental results compared with HM13.0 indicate that the proposed algorithm can save 39.7% of the encoding time, while Bjontegaard delta bitrate (BDBR) is increased slightly by 1.35% and Bjontegaard delta peak signal-to-noise ratio (BDPSNR) loss is negligible. Keywords: HEVC , Intra Prediction , Rough Mode Decision , Video Coding 1. IntroductionHigh efficiency video coding (HEVC) standard uses various coding tool to enhance compression performance. Among them, intra prediction introduces 35 angular prediction modes and a flexible coding unit (CU) size based on quadtree. Coding tree unit (CTU) is defined in HEVC, whose size is 64×64. As root of encoding unit, CTU is divided into smaller CUs. A CU is further divided into prediction unit (PU) and transform unit (TU) [1]. To decide the optimal prediction mode and CU size, HEVC makes use of rate distortion optimization (RDO). However, HEVC increases computational burden compared with H.264. At present, a research hotspot is focused on the research of fast intra prediction mode. The research works are lying in two aspects: fast intra prediction mode decision algorithms (FIPMD) and the fast CU or (PU) size decision algorithms [2]. FIPMD algorithms decrease the number of candidate modes before compute RD cost. In [3], the authors proposed improved FIPMD based on gradient. The prediction mode is decided by the texture direction, which is denoted by the intensity gradient. In [4], progressive rough mode search (PRMS) which is based on the calculation of Hadamard cost is proposed. Rate-distortion optimized quantization (RDOQ) is implemented by fewer effective candidates, which are selected by PRMS. In [5], edge of the PU was also used to speed-up the best selection procedure. In order to obtain the minimal direction differences, Wang and Siu [6] made use of the features of every directional prediction mode to calculate the strength of directional errors of initial spatial domain. Na et al. [7] made use of spatial correlation to detect edges. The candidate modes are adaptively chosen by the edge map. Lee et al. [8] analyzed the probability distribution of local binary patterns to obtain textural information of the most probable mode. The fast CU or PU size decision algorithms can skip unnecessary CU size partition [9-14]. In [9], the proposed method makes use of motion homogeneity to decide CU depth range. Unnecessary depth levels do not need to be traversed. In [10], according to regular updates of the statistical parameter, CU depth decision was early terminated. In [11], a novel CU size decision strategy is presented by the authors, which is based on the local and global edge complexities. In order to reduce complexity of the screen content coding, Duanmu et al. [12] proposed machine learning based on CU statistics and sub-CU homogeneity. Yu et al. [13] adopted convolution neural network to reduce the amount of CU/PU candidate modes, which decrease the complexity of the real-time hardwired encoder. According to the depth of adjacent CUs, CU size decision was early terminated. In terms of correlation between the current layer mode and the higher layer mode in PU, the authors propose a new fast PU mode decision algorithm [14]. In this paper, we analyze the reference pixels of the PU to exclude some angular intra prediction modes before intra prediction. The amount of candidate modes is decreased from 35 to 19. The three step search method is implemented by 19 modes. The proposed algorithm has many advantages such as easy implement, good performance and high robustness. The rest of the paper is arranged as followed. A brief review of intra coding process is described in Section 2. In Section 3, a fast rough mode decision algorithm is presented. Experiments and the corresponding result analysis of the proposed algorithm are presented in Section 4. Finally, some inspiring conclusions are given in Section 5. 2. INTRA-PREDICTION IN HEVCHEVC makes use of quadtree structure to split large code unit (LCU) into CU sizes ranging from 8×8 to 64×64. The numbers denote depth levels in Fig. 1. Depth 3, 2, 1, and 0 correspond to the CU size of 8×8, 16×16, 32×32, and 64×64, respectively. A CU is divided into PUs, which are used as basic units for intra-prediction. According to the adjacent decoded pixels, the predicted pixels are obtained by linear interpolation. Two PU sizes are M×M and 2M×2M. Therefore, the PU sizes range from 4×4 to 64×64. The CUs are also divided into TUs according to quadtree structure. TU is used for transformation and quantization. The TU sizes range from 32×32 to 4×4. Fig. 2 shows the process of dividing a CU of 32×32 size into PU and TU. 2.1 Rough Mode DecisionHEVC supports 35 intra-prediction modes which are shown in Fig. 3. The order number 0 and 1 represent plana mode and DC mode. The order numbers from 2 to 34 represent 33 angular prediction modes. The N most possible candidate modes are obtained by the rough mode decision (RMD). In the RMD processes, the Hadamard cost and the bit cost of 35 prediction modes are computed. N is related to PU size. N is 8, 8, 3, 3, and 3, respectively corresponding to the PU size of 4×4, 8×8, 16×16, 32×32, and 64×64. The N lowest Hadamard cost is selected to a subset of candidate, as defined in Eq. (1) [15].
where Hadamard transformation is a generalized Fourier transformation and SATD denotes the sum of absolute Hadamard transformed residual signal, Rmode represents bit costs for the prediction mode, and is the Lagrange multiplier. After the RMD processes, we apply RDO to select the optimal prediction mode for PU. RDO is implemented in the subset of candidate. The least RDO cost is selected by function (2).
SSE represents the sum of squared difference of the predicted against original PU. R and λ denote bit-rate for encoding process and Lagrange parameter, respectively. 2.2 Most Probable ModesIntra-prediction mode is fast decided by Zhao et al. [16]. According to spatial correlation, most probable modes (MPMs) of the current PU have a great correlation with the prediction modes of neighboring PUs. The optimal prediction mode of PU exists in the MPMs, which is very probable. Even in different types of test sequences, this probability fluctuation is very small. MPMs are inserted into the subset of candidate if modes of above and left neighboring PUs are not included in the subset of candidate. RDO is applied to N+MPM modes. 3. Fast Rough Mode DecisionHowever, 35 kinds of prediction mode are calculated in RMD process, which requires too much time. It is required that a fast RMD algorithm is proposed. Fig. 4 shows the reference pixels for the 4×4 block. When the reference pixels are the same, the current PU block is a smoothing block. Planar mode is selected, and the other modes are skipped. The Planar mode adopts the mean of linear interpolation in horizontal and vertical directions to obtain the prediction values of PUs. DC mode adopts the mean of reference pixels to obtain the prediction values of PUs. DC mode and planar mode are suitable for blocks containing smooth texture. However, planar mode is appropriate for blocks with gradient texture. The planar mode maintains the maximum continuity of block boundaries. We skip the unnecessary modes based on smoothness of the reference pixels. We can define the smoothness of the reference pixels as following [17].
(4)[TeX:] $$V a r_{l e f t}=\frac{1}{2 N+1} \sum_{K=0}^{2 N}\left(f_{k}^{l e f t}-U_{l e f t}\right)^{2}$$where Uup denotes average of the up-reference pixels, Uleft denotes average of the left-reference pixels, N represents size of the current block, [TeX:] $$f_{k}^{u p}$$ denotes the up-reference pixel associated with index K, and [TeX:] $$f_{k}^{l e f t}$$ denoes left-reference pixel associated with index K. If Varup is equal to Varleft, Planar mode is selected for the current block. If Varup is smaller than Varleft, the left-reference pixels are smoother than the up-reference pixels and RMD process is performed by 0–18 prediction modes. The three step search method is implemented by using [TeX:] $$0,1,2+4 \delta$$, where [TeX:] $$\delta \in[0,4] \text { or } \delta \in[4,8]$$. The subset of candidate mode is selected roughly by using step size 4. Then the subset of candidate mode is expanded by using step size 2 and 1. Taken 16×16 as an example, 0–18 prediction modes are selected according to smoothness of the reference pixels. Firstly, step size 4 for search is performed in range of mode index 0–18, which constituted the subset of candidate mode. Hadamard cost is calculated by using the subset and the ascending order of Hadamard cost is {10, 6, 14, 18, 2, 1, 0}. We select the three best mode as 1{10, 6, 14}. Secondly, the neighboring modes of two distances are added in {10,6,14}. Then, the subset is updated as {8, 10, 6, 4, 12, 14, 16, 18, 2, 1, 0} and selects the three best mode as 2{8,10,6}. At last, the neighboring modes of one distance are added in {8, 10, 6} and the subset is updated as {8, 9, 10, 7, 6, 11, 5, 4, 12, 14, 16, 18, 2, 1, 0}. RDO process is implemented by 3{8, 9, 10}. The three step search process is shown in the Fig. 5. 4. Experimental ResultsIt is performed on HM13.0 reference software that fast mode decision proposed. The experiment environment is Intel Core i7-6500 2.5 GHz and RAM of 4 GB. This paper makes use of 100 frames of the video sequences to experiment. It is presented by JCJ-VC the video sequences, containing five classes. Different QPs (37, 32, 27, 22) are tested in the experiment. To evaluate performance of the proposed algorithm, the parameters are measured according to Bjontegaard delta bitrate (BDBR), Bjontegaard delta peak signal-to-noise ratio (BDPSNR) and encoding time saving. BDBR and BDPSNR are used to represent the average differences. Encoding time saving represents the total time reduction in percentage compared with HM13.0. Encoding time saving is defined by formula (5).
where THM and Tprop present the encoding time which is required by HM13.0 and the proposed algorithm. Table 1 shows that experiment results compared between Gao algorithm and the proposed algorithm. The fast rough mode decision algorithm can save time from 30.0 to 39.7, while it increases BDBR by between 0.56 and 2.64, and decreases BDPSNR by between 0.01 and 1.01. Gao algorithm [18] increases less BDBR than our algorithm, but our algorithm can save more time. It is negligible that BDBR increase and BDPSNR loss. Table 1.
We can see the RD performances of the proposed algorithm are close to that of HM13.0 from Fig. 6. The encoding efficiency of the proposed algorithm is approximately equal to that of HM13.0, while the proposed algorithm can save more time. 5. ConclusionThe paper proposes the fast RMD decision algorithm. The proposed algorithm is implemented by using two methods. Firstly, we compare the upper and left reference pixels to decrease the amount of candidate modes from 35 to 19. Secondly, the three step search method is implemented in 19 candidate modes to reduce computational complexity. It is the final result that 12–15 modes are performed in RMD process instead of 35 modes. Our algorithm can save a lot of time, while RD performance is approximately unchanged. AcknowledgementThis work is supported by the National Natural Science Foundation of China (No. 61572033) and Key Projects of Natural Science Research Fund of Anhui Province (No. KJ2015A311), and Provincial Natural Science Research General Project of Anhui higher Education Promotion Plan in 2014 (No. TSKJ2014B07). BiographyGui-Fu Luhttps://orcid.org/0000-0001-9047-6579He received the B.S. degree in 1997 from Hefei University of Technology in China, the M.S. degree in 2004 from Hangzhou Institute of Electronics Engineering, and the Ph.D. degree in 2012 from Nanjing University of Science and Information, Since 2004, he has been teaching in the School of Computer Science and Information, Anhui Polytechnic University, Wuhu, China. His research interests include computer vision, digital image processing and pattern recognition. BiographyReferences
|