3. Proposed Scheme
3.1 Method Motivation
There is a strong correlation between adjacent coding blocks of video images due to their successive texture characteristics. For example, in HEVC a 32×32 coding block can be divided into four 16×16 coding blocks. If one or two of these 16×16 coding blocks are further divided into 8×8 coding blocks, the remaining 16×16 coding blocks are also likely to be divided into 8×8 coding blocks. Table 1 reports the average partition probability for a current CU when one or more of its adjacent coding blocks are partitioned, measured at different depths over 12 HEVC official test sequences. These sequences cover six classes with different resolutions: Class A (2560×1600), Class B (1920×1080), Class C (1280×720), Class D (1024×768), Class E (832×480), and Class F (416×240); their frame rate and frame number are set to 25 and 100 in our experiments, respectively.
Table 1. Average statistical probability of partition for a current CU
From Table 1, we can see that there is indeed a strong partition correlation between adjacent CUs due to the successive texture characteristics of video images. Once the adjacent coding sub-blocks are partitioned, the partition probability for the current CU is up to 79.1% at depth 1, drops to 49.5% at depth 2, and falls to only 37.3% at depth 3.
The motivation of our proposed scheme comes from these depth-dependent partition statistics, combined with a careful analysis of the standard CU size partition algorithm in HEVC. We propose a fast CU size decision algorithm based on neighborhood prediction, which quickly determines the optimal partition mode for the current CU from the partition information of its adjacent CUs at the same depth. This removes many unnecessary partition and prediction operations for the current CU and therefore saves considerable computational complexity in HEVC.
3.2 Realizing Principle
The standard intra prediction algorithm in HEVC does not take the partition information of adjacent CUs into account. It independently executes prediction and partition operations for all 85 CUs in an LCU, calculates their RDCost values, and finally keeps the best partition mode. This process performs many unnecessary CU partition and prediction operations, leading to high computational complexity for HEVC.
The realizing principle of our proposed scheme is to use the partition information of neighboring CUs to quickly determine the partition mode of the current CU through neighborhood prediction. Specifically, we use the partition information of the left, up, and left-up CUs, together with a partition threshold, to quickly decide whether the current CU at each depth needs to be further divided, exploiting the strong correlation between adjacent CUs. This removes many unnecessary prediction and partition operations for the current CU and therefore saves much computational complexity in HEVC. Fig. 2 shows the realizing principle of our proposed scheme.
Fig. 2. Realizing principle of our proposed scheme.
Fig. 3 shows the flowchart of our proposed scheme. For a current CU, we first read its adjacent CUs, then calculate the prediction threshold, and finally execute or terminate the standard partition algorithm according to this prediction threshold.
In Fig. 3, 'Read its adjacent CUs' reads the adjacent CUs of the current CU; 'Calculate prediction threshold' computes the prediction threshold from those adjacent CUs using the proposed formula (1); 'Execute the standard partition algorithm' runs the standard partition algorithm when the prediction threshold requires it; and 'Terminate the standard partition algorithm' skips it otherwise.
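To make this flow concrete, the following minimal Python sketch mirrors the boxes of Fig. 3. All names here are hypothetical: compute_threshold stands in for formula (1) defined in Section 3.3, and standard_partition stands in for the unmodified HM partition routine.

# Illustrative sketch of the Fig. 3 flow; all names are hypothetical.
# compute_threshold implements formula (1) over the adjacent CUs, and
# standard_partition is the unmodified HM partition routine.
def process_cu(cu, depth, compute_threshold, partition_thresholds, standard_partition):
    t = compute_threshold(cu)                 # read adjacent CUs, apply formula (1)
    if t > partition_thresholds[depth]:
        return standard_partition(cu, depth)  # execute the standard partition algorithm
    return cu                                 # terminate: keep the current CU undivided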
Fig. 3. Flowchart of our proposed scheme.
3.3 Realizing Process
3.3.1 Define prediction threshold
From Table 1 above, we can see that the partition correlation at depth 1 is stronger than that at depth 2, which in turn is stronger than that at depth 3. As the depth increases, the partition correlation between adjacent CUs becomes weaker and weaker.
To realize the principle above, we first define a prediction partition threshold that determines whether the current CU needs to be partitioned, computed from the degree of partition correlation of its adjacent CUs at the given depth, as shown in the following formula:
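[TeX:] $$T_{\text {depth}}=\omega_{L-U} \cdot b_{L-U}+\omega_{L} \cdot b_{L}+\omega_{U} \cdot b_{U}$$ (1)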
where [TeX:] $$T_{\text {depth}}$$ is the prediction partition threshold; [TeX:] $$b_{L-U}, b_{L}, \text { and } b_{U}$$ denote the left-up, left, and up CUs of the current prediction CU, each set to 1 when the corresponding CU is partitioned and to 0 otherwise; and [TeX:] $$\omega_{L-U}, \omega_{L}, \text { and } \omega_{U}$$ are their segmentation weight coefficients. These weight coefficients are chosen according to the traversal priority and the importance of [TeX:] $$b_{L-U}, b_{L}, \text { and } b_{U}$$ to the current CU. Based on these principles and on tests over various test sequences, the weight coefficients of [TeX:] $$b_{L-U}, b_{L}, \text { and } b_{U}$$ are finally set to 0.2, 0.4, and 0.4, respectively, which produced better experimental results than other settings in our experiments.
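For example, if [TeX:] $$b_{L}=b_{U}=1$$ and [TeX:] $$b_{L-U}=0$$, then [TeX:] $$T_{\text {depth}}=0.4+0.4=0.8$$.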
3.3.2 Design segmentation strategy
After defining the partition threshold, we design a different segmentation strategy for each depth, based on the partition threshold, to determine whether the current CU needs to be further partitioned. The specific segmentation strategies are as follows.
From the partition statistics in Table 1, we can see that, at depth 1, the statistical partition probability for the current CU reaches about 80% when one or more of [TeX:] $$b_{L}, b_{U}, \text { and } b_{L-U}$$ are partitioned. In other words, once any of [TeX:] $$b_{L}, b_{U}, \text { and } b_{L-U}$$ is partitioned, the partition probability for the current CU is very high. So at depth 1 we design our segmentation strategy as follows:
The current CU executes partition operations when one or more of [TeX:] $$b_{L}, b_{U}, \text { and } b_{L-U}$$ are partitioned; otherwise it does not. Accordingly, we set the partition threshold to 0, and the segmentation strategy can be expressed by the following formula:
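[TeX:] $$b_{c}=\left\{\begin{array}{ll}{1,} & {T_{\text {depth}}>0} \\ {0,} & {\text {otherwise}}\end{array}\right.$$ (2)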
where [TeX:] $$b_{c}$$ denotes the current prediction CU block: [TeX:] $$b_{c}=1$$ means that the current prediction CU block is to be partitioned, and [TeX:] $$b_{c}=0$$ means that it is not.
From the partition statistics in Table 1, we can also see that, at depth 2, the statistical partition probability for the current CU is about 50% when one of [TeX:] $$b_{L}, b_{U} \text { and } b_{L-U}$$ is partitioned, and higher than 50% when two or more of them are partitioned. So at depth 2 we design our segmentation strategy as follows:
The current CU executes partition operations when at least one of [TeX:] $$b_{L} \text { and } b_{U}$$ is partitioned; a partitioned [TeX:] $$b_{L-U}$$ alone, with its weight of only 0.2, does not exceed the threshold. To realize this requirement, we set 0.3 as our partition threshold, and the partition strategy can be expressed as follows:
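[TeX:] $$b_{c}=\left\{\begin{array}{ll}{1,} & {T_{\text {depth}}>0.3} \\ {0,} & {\text {otherwise}}\end{array}\right.$$ (3)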
where [TeX:] $$b_{c}$$ again denotes the current prediction CU block: [TeX:] $$b_{c}=1$$ means that the current prediction CU block is to be partitioned, and [TeX:] $$b_{c}=0$$ means that it is not.
At depth 3, the partition probability for the current CU with the Size N×N mode is only about 38% when one of [TeX:] $$b_{L}, b_{U}, \text { and } b_{L-U}$$ uses the Size N×N mode as its optimal partition mode. So the segmentation strategy is designed as follows:
The current CU executes partition operations with the Size N×N mode when two or more of [TeX:] $$b_{L}, b_{U}, \text { and } b_{L-U}$$ are partitioned with the N×N mode; otherwise it does not. To meet this condition, we set 0.5 as our partition threshold, and the segmentation strategy can be expressed as follows:
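[TeX:] $$b_{c}=\left\{\begin{array}{ll}{1,} & {T_{\text {depth}}>0.5} \\ {0,} & {\text {otherwise}}\end{array}\right.$$ (4)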
where [TeX:] $$b_{c}$$ denotes the current prediction CU block: [TeX:] $$b_{c}=1$$ means that the current prediction CU block needs to be partitioned, and [TeX:] $$b_{c}=0$$ means that it does not.
Note that at depth 0 the statistical partition probability is very high, almost reaching 1, when one or more of [TeX:] $$b_{L}, b_{U} \text { and } b_{L-U}$$ are partitioned, so we do not optimize this depth in this paper. At depth 3, since the smallest 8×8 CU has only two prediction modes (PART_N×N and PART_2N×2N), we only decide whether the current CU uses the PART_N×N mode by judging whether its neighboring CUs of the same size use PART_N×N as their optimal prediction mode. Our segmentation strategy further specifies that if only [TeX:] $$b_{L-U}$$ is partitioned, the current CU does not execute partition operations.
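The complete per-depth decision rule can be summarized by the following minimal Python sketch, written under our stated assumptions: each neighbor is represented only by a 0/1 partition flag (for depth 3, the flag means "uses PART_N×N as its optimal mode"), and all function names are illustrative rather than taken from the HM reference software.

def prediction_threshold(b_left_up, b_left, b_up):
    # Formula (1) with the weights 0.2, 0.4, 0.4 from Section 3.3.1.
    return 0.2 * b_left_up + 0.4 * b_left + 0.4 * b_up

def should_partition(depth, b_left_up, b_left, b_up):
    # Formulas (2)-(4): compare formula (1) against the per-depth threshold.
    if depth == 0:
        return True  # depth 0 is not optimized; its partition probability is near 1
    if b_left_up == 1 and b_left == 0 and b_up == 0:
        return False  # a partitioned left-up neighbor alone never triggers a split
    thresholds = {1: 0.0, 2: 0.3, 3: 0.5}
    return prediction_threshold(b_left_up, b_left, b_up) > thresholds[depth]

# Depth 2 with only the left neighbor partitioned: 0.4 > 0.3, so partition.
assert should_partition(2, b_left_up=0, b_left=1, b_up=0)
# Depth 3 with only the up neighbor using PART_NxN: 0.4 <= 0.5, so no NxN split.
assert not should_partition(3, b_left_up=0, b_left=0, b_up=1)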
3.3.3 Design realizing process
Fig. 4 shows the realizing process of our proposed scheme, which mainly includes the following six components: Obtain the coding CU, Judge the depth of CU, Read its adjacent CUs, Calculate prediction threshold, Compare prediction threshold and partition threshold, and Execute/Terminate the standard partition algorithm.
Here, 'Obtain the coding CU' fetches the current coding CU; 'Judge the depth of CU' determines the depth at which the current CU is located; 'Read its adjacent CUs' reads the adjacent CUs of the current CU; 'Calculate prediction threshold' computes the prediction threshold from the adjacent CUs using formula (1); 'Compare prediction threshold and partition threshold' computes the difference between the prediction threshold and the partition threshold; and 'Execute/Terminate the standard partition algorithm' runs or skips the standard partition algorithm according to that difference.
Fig. 4. Realizing process of our proposed scheme.
3.4 Realizing Steps
Since an LCU in HEVC is divided by a recursive quadtree, our proposed method optimizes the CU partitioning process in depth order from large to small. The detailed realizing steps of our proposed scheme can be summarized as follows:
Step 1: Obtain the current prediction CU and its depth;
Step 2: Read the partition modes of [TeX:] $$b_{L}, b_{U}, \text { and } b_{L-U}$$ of the current CU at depth 3, and judge whether each of them uses the PART_N×N mode as its optimal partition mode. Then calculate the prediction threshold using formula (1) and compare it with the partition threshold using formula (4). If the difference is greater than 0, the current CU needs to be further divided with the PART_N×N mode; otherwise it does not.
Step 3: Calculate the total RdCost value of [TeX:] $$b_{L}, b_{U}, \text { and } b_{L-U}$$ and the current CU, and compare it with the RdCost of the whole coding block constituted by them at depth 3 to judge whether the whole coding block needs to be further partitioned. The whole coding block executes the partition operations when the former is smaller than the latter; otherwise it does not.
Step 4: Execute the same partition strategy for [TeX:] $$b_{L} \text { and } b_{U}$$ by repeating steps 2 and 3 above, then obtain and save their partition mode information.
Step 5: Obtain the partition modes of [TeX:] $$b_{L}, b_{U}, \text { and } b_{L-U}$$ at depth 2, then calculate the prediction threshold using formula (1) and compare it with the partition threshold using formula (3). If the difference is greater than 0, the current prediction block needs to be further partitioned; otherwise it does not.
Step 6: Calculate the total RdCost value of [TeX:] $$b_{L}, b_{U}, \text { and } b_{L-U}$$ and the current CU, and compare it with the RdCost of the whole coding block constituted by them at depth 2 to judge whether the whole coding block needs to be further partitioned. The whole coding block executes the partition operations when the former is smaller than the latter; otherwise it does not.
Step 7: Perform the same partition operations for [TeX:] $$b_{L} \text { and } b_{U}$$ by repeating steps 2 to 6, then calculate and save their partition mode information.
Step 8: Obtain the partition mode information of [TeX:] $$b_{L}, b_{U}, \text { and } b_{L-U}$$ at depth 1, then calculate the prediction threshold using formula (1) and compare it with the partition threshold using formula (2). If the difference is greater than 0, the current prediction block needs to be further partitioned; otherwise it does not.
Step 9: Calculate the total RdCost value of [TeX:] $$b_{L}, b_{U}, \text { and } b_{L-U}$$ and the current CU, and compare it with the RdCost of the whole coding block at depth 0 to judge whether the whole coding block needs to be further partitioned. It executes the partition operations when the former is smaller than the latter; otherwise it does not.
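The RD-cost check used in Steps 3, 6, and 9 amounts to the comparison sketched below. The RdCost values themselves come from the encoder, so this snippet (with illustrative names and values) only shows the decision, under the convention that the partition is kept when the four sub-blocks are cheaper in total than the undivided block.

def keep_partition(sub_block_costs, whole_block_cost):
    # Keep the split only if the four sub-blocks' total RdCost is smaller
    # than the RdCost of the undivided whole block (Steps 3, 6, and 9).
    return sum(sub_block_costs) < whole_block_cost

# Illustrative RdCost values: splitting is cheaper in total, so keep the split.
print(keep_partition([120.0, 95.5, 101.2, 88.3], 450.0))  # True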
3.5 Application Instance
We take the partitioning of one LCU and the determination of its optimal partition mode as an example to illustrate the detailed realizing process of our proposed scheme, as shown in Fig. 5.
Step 1: Obtain a current prediction CU and its depth;
Step 2: Read the partition mode information of [TeX:] $$b_{L} (\mathrm{B}), b_{U} (\mathrm{C}), \text { and } b_{L-U} (\mathrm{A})$$ for the current CU D at depth 3, and judge whether CUs B, C, and A use the PART_N×N mode as their optimal prediction mode. Then calculate the prediction threshold using formula (1) and compare it with the partition threshold using formula (4). If the difference is greater than 0, the current CU needs to be further partitioned with the PART_N×N mode; otherwise it does not.
Step 3: Calculate the total RdCost value of [TeX:] $$b_{L} (\mathrm{B}), b_{U} (\mathrm{C}), b_{L-U} (\mathrm{A})$$, and the current CU D, and compare it with the RdCost of the whole coding block E constituted by them to judge whether block E needs to be further partitioned. Block E needs to be partitioned when the former is smaller than the latter; otherwise it does not.
Step 4: Execute the same partition strategy for [TeX:] $$b_{L} (\mathrm{F}) \text { and } b_{U} (\mathrm{G})$$ by repeating steps 2 and 3 above, then obtain and save their partition information.
Step 5: Obtain the partition mode information of [TeX:] $$b_{L} (\mathrm{F}), b_{U} (\mathrm{G}), \text { and } b_{L-U} (\mathrm{E})$$ at depth 2, then calculate the prediction threshold using formula (1) and compare it with the partition threshold using formula (3). If the difference is greater than 0, the current CU H needs to be further partitioned; otherwise it does not.
Step 6: Calculate the total RdCost value of [TeX:] $$b_{L} (\mathrm{F}), b_{U} (\mathrm{G}), b_{L-U} (\mathrm{E})$$, and the current CU H, and compare it with the RdCost of the whole coding block I constituted by them at depth 2 to judge whether block I needs to be further partitioned. Block I executes the partition operations when the former is smaller than the latter; otherwise it does not.
Step 7: Execute the same partition operations for [TeX:] $$b_{L} (\mathrm{J}) \text { and } b_{U} (\mathrm{K})$$ by repeating steps 2 to 6 above, then calculate and save their partition information.
Step 8: Obtain the partition mode information of [TeX:] $$b_{L} (\mathrm{J}), b_{U} (\mathrm{K}), \text { and } b_{L-U} (\mathrm{I})$$ at depth 1, then calculate the prediction threshold using formula (1) and compare it with the partition threshold using formula (2). If the difference is greater than 0, the current prediction block L needs to be further partitioned; otherwise it does not.
Step 9: Calculate the total RdCost value of [TeX:] $$b_{L} (\mathrm{J}), b_{U} (\mathrm{K}), b_{L-U} (\mathrm{I})$$, and the current CU L, and compare it with the RdCost of the whole LCU at depth 0 to judge whether the LCU needs to be further partitioned. The LCU executes the partition operations when the former is smaller than the latter; otherwise it does not.
Fig. 5. Fast CU size partition optimization instance with our proposed scheme.
4. Experimental Results and Analysis
To verify the effectiveness of our proposed algorithm, we perform experiments in this section covering two parts: building the experimental environment and testing the experimental results.
4.1 Building the Experimental Environment
The simulation was conducted on an Intel 2-GHz processor with 2 GB of memory running Windows XP. The test conditions, configurations, and test sequences in our experiment follow the JCT-VC common test conditions [20]. We use the 12 HEVC official test sequences described above, covering six classes with different resolutions: Class A (2560×1600), Class B (1920×1080), Class C (1280×720), Class D (1024×768), Class E (832×480), and Class F (416×240), as our experimental test objects. Their frame rate and frame number are set to 25 and 100, respectively. Other general configuration parameters are shown in Table 2.
Table 2. General configuration parameters
In our experiment, the BD-rate and the coding time saving rate are taken as comparison indicators. The BD-rate [21] measures the loss of coding efficiency, while the coding time saving rate measures the reduction in computational complexity for HEVC; a higher coding time saving rate indicates a greater reduction in computational complexity. Two statistical measures, the average and the standard deviation (STD), are used to evaluate our method: the average of the BD-rate and coding time saving rate over repeated random experiments reflects the central tendency of each method under the same conditions, while the STD reflects its robustness. The coding time saving rate can be expressed as follows:
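[TeX:] $$\text {Time saving}=\frac{\text {coding time}_{HM16.1}-\text {coding time}_{our}}{\text {coding time}_{HM16.1}} \times 100 \%$$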
where [TeX:] $$coding \text{ } time_{HM16.1}$$ and [TeX:] $$coding\text{ }time_{our}$$ represent the total encoding time of the standard reference software HM16.1 and of our proposed algorithm, respectively.
In our experiment, two comparison schemes, the standard HM reference scheme [22] and Ha's scheme [9], are selected for comparison with our proposed method in terms of BD-rate and coding time saving rate. The standard HM reference scheme exhaustively traverses the candidate partitions to determine the optimal partition mode for the current CU; Ha's scheme mainly uses the texture similarity of neighboring CUs to quickly determine the optimal partition mode; our proposed method mainly uses neighborhood prediction over neighboring CUs at each depth, with a prediction threshold, to quickly determine the optimal partition mode for the current CU.
4.2 Testing the Experimental Results
In this subsection, we test the coding time saving rate and the BD-rate increase of our method against the standard HM reference scheme [22] and Ha's scheme [9]. The related experimental results are shown in Tables 3 and 4.
Table 3 shows the coding time saving rate and BD-rate for different test sequences with QP=27, and Table 4 shows the same results with QP=37. From Tables 3 and 4, we can clearly see that our proposed method performs well in coding time compared with the standard reference software HM16.1 under the same test conditions, saving about 19.0% of the coding time while adding only 0.102% BD-rate. Ha's scheme achieves a similar coding time saving of about 20.0%, but adds about 2.06% BD-rate. The standard reference software HM16.1 shows the worst coding time of the three schemes. Tables 3 and 4 also show that our proposed method has better STD values for both BD-rate and coding time saving rate than Ha's scheme under the same test conditions. Fig. 6 shows the coding time saving rate for the Traffic test sequence at different QP values, and Fig. 7 shows it for the BQMall test sequence. From Figs. 6 and 7, we can see that our proposed method maintains a good coding time saving rate across QP values compared with the other two methods.
Table 3. Coding time saving rate and BD-rate for different test sequences with QP=27
Table 4. Coding time saving rate and BD-rate for different test sequences with QP=37
Fig. 6. Coding time saving rate for the Traffic sequence.
Fig. 7. Coding time saving rate for the BQMall sequence.
The main reasons are as follows. Our proposed scheme uses the partition information of neighboring CUs at each depth to quickly determine the optimal partition mode for the current CU, which removes many unnecessary prediction and partition operations; this saves about 19% of the computational complexity while adding only about 0.102% BD-rate. Ha's scheme saves about 20% of the computational complexity because its texture-based technique removes many unnecessary computations for the current CU by exploiting texture similarity, but it adds about 2.06% BD-rate. The standard reference software HM16.1 costs the most coding time because it independently executes prediction and partition operations for every CU to find the optimal partition mode, which involves many unnecessary CU partition and prediction operations and therefore leads to the highest computational complexity for HEVC.
From Figs. 6 and 7, we can also observe that the coding time saving rate rises slightly as the QP value increases for both our proposed algorithm and Ha's scheme, which shows that the two schemes save more coding time at high QP values than at low QP values under the same test conditions. Similar results are reported for Ha's scheme in [9]. The main reason is that, at high QP values, more CUs are left unpartitioned, so the early-termination decisions of the two schemes take effect more often and save more encoding time. The standard reference software HM16.1 does not exhibit this phenomenon.