

Tao Yan and Qian Zhang

Base View Rate Control Algorithm Based on VVC for 3D Video Sequences

Abstract: A new generation of versatile video coding (VVC) standards was released in 2020; however, existing rate control algorithms for three-dimensional (3D) video coding based on VVC do not consider the effects of scene switching in actual video applications. When scene switching occurs, the quality of encoded images degrades owing to the unreasonable allocation of coding resources. To solve this problem, this study proposes a 3D video base view rate control algorithm based on scene switching detection using image structural similarity. First, scene switching is detected before coding by combining the structural similarity between frames with its variation trend. The base view target bits are then estimated by the bit allocation model using view weight factors. When scene switching occurs, the rate control parameters and coding structure are adjusted in time. Experimental results show that the proposed algorithm not only tracks the target bit rate accurately but also significantly improves the reconstructed image quality in the case of scene switching and ensures that the decoder input buffer does not overflow or underflow.

Keywords: Scene Switching, Structure Similarity, VVC, 3D Video Coding

1. Introduction

Versatile video coding (VVC) is the latest coding standard after high efficiency video coding (HEVC), which integrates the most advanced video compression technology [1-5]. However, three-dimensional video coding based on VVC (3D-VVC) has not been studied. With the rise and rapid development of 3D technology, multi-view video has attracted great attention from academia and industry, both domestically and abroad, because it offers more vivid images and richer visual experiences. It provides dynamic scenes and simulates multiple viewing angles to enable users to see more realistic video scenes. A multi-view original video sequence is captured by multiple cameras placed horizontally that simultaneously shoot the same scene, and each camera outputs corresponding camera parameters for coding. This method is widely used [6-8].

Encoders based on 3D-HEVC do not accurately predict the number of bits generated by B frames, and an effective rate control algorithm for layered B frames is lacking. To solve the problem of inter-dependence among hierarchical layers in the time domain, Liu et al. [9] introduced a fixed hierarchical weight factor rate control algorithm based on a hierarchical B frame structure on the basis of JVT-G012, which greatly improved the average peak signal-to-noise ratio (PSNR) of whole video sequences; their algorithm realizes bit allocation for each B frame in two steps. Encoding bitrate control plays a crucial role in the entire technical system of 3D video. Recently, the international research community has conducted many studies in this field and achieved remarkable results [10-13]. Scene switching changes the characteristics of video sequences; if bits are evenly distributed across different scenes, the bit rate of fast-moving scenes will be too low, while relatively flat scenes will have a surplus, thus wasting coding resources and degrading video quality.

The compression coding of texture videos and depth maps affects the quality of 3D video. Moreover, the reasonable allocation of code rate between texture and depth directly affects video quality [14-17]. In multi-view video, the non-base views refer to the base view, and in a single-view video player, only the base view is played; the base view is therefore the most important view in multi-view video. A high-quality base view must be ensured, and thus base view rate control is necessary. Some methods adopt a full search over quantization parameters (QPs) with pre-coding to obtain the optimal quantization step size for texture and depth coding. The full search method can achieve the best virtual view rendering quality, but its computational complexity is several times that of currently popular methods [18-20]. In 3D video coding, inter-view bit allocation is undoubtedly the core aspect of bitrate control, and various bit allocation strategies are employed in industrial applications. Methods based on rate-distortion optimization (RDO) attempt to strike a balance between coding efficiency and video distortion. By analyzing the effect of scene switching at different frame positions on video sequence coding, this paper presents an accurate scene switching detection method with only slightly more computation and provides a corresponding rate control method. Traditional scene switching detection algorithms [21-23] can be categorized into those based on gray-level or histogram detection and those based on block matching. In addition, for video coding standards, scene switching detection algorithms based on coded data exist. For detected scene switching points, previous studies [24,25] have suggested optimizing the rate control strategy according to the scene information and modifying the coding parameters to improve coding efficiency. As 3D videos become widely applied, the requirements for bitrate control are increasing. However, when dealing with complex and changeable scenarios as well as different motion characteristics, existing bitrate control methods often struggle to allocate bits accurately, thereby degrading video quality and wasting bitrate. Many scholars, both domestically and abroad, have proposed optimal resource allocation among viewpoints in multi-view rate control and improved its rate-distortion performance. Considering that inter-view prediction can directly transmit distortion in the base view to the related views, a joint rate-distortion model of multi-view texture video was established [26-28].

Based on research on 3D-HEVC bit rate control algorithms, and considering the complexity of coded frames and scene switching in images, this study proposes a scene switching detection method and a corresponding bit rate control method; no such method has yet been studied for VVC-based 3D video coding. Hence, this paper proposes a code rate control algorithm based on scene switching, applied to a 3D-VVC simulcast platform. First, the scene switching frame is detected by calculating the variation trend of the luminance difference between neighboring frames; after determining that the current frame is a scene switching frame, the coding parameters are adjusted in time to maintain video smoothness and improve coding quality, while maintaining the accuracy of the output bit rate.

2. Scene Switching Detection

Because of the characteristics of multi-view video, a two-dimensional or stereo video rate control model cannot be applied directly, and significant differences exist among multi-view coding rate control models. Our previous rate control algorithm predicts the complexity of the current encoded frame from the previous reference frame in the control flow. Because scene switching destroys the temporal correlation between adjacent images, it may lead to incorrect operation of the rate controller, which in turn may cause the buffer to overflow or underflow. At such a moment, to prevent the degradation of coding efficiency caused by motion compensation failure, many areas in the scene adopt intra-frame coding, which inevitably increases the bit rate.

Usually, the occurrence of scene switching in adjacent frames can be detected by calculating the mean square error between all macroblocks in the current frame and the macroblocks in different positions in a certain search window in the reference frame. The essence of this method is similar to motion estimation in MPEG, and this method can detect scene switching effectively. However, it requires excessive computations. Even if only every P-th frame is detected and a fast algorithm, similar to motion estimation, is adopted, the additional computation can cause a large computational burden.

In practice, the structural similarity index measure (SSIM) algorithm is employed to find the degree of similarity between two images, which is also known as the structural similarity of images. This method assumes that [TeX:] $$P_i$$ denotes the scene switching probability of frame i, [TeX:] $$\Delta {cof} f_{i, j}$$ is the correlation between frames i and j, and θ is the threshold:

(1)
[TeX:] $$P_i=\left|\Delta \operatorname{cof} f_{i, i-1}-\Delta \operatorname{cof} f_{i, i+1}\right| \geq \theta$$

where θ is an experimentally determined threshold and [TeX:] $$\Delta {cof} f_{i, j}$$ is the correlation between frames i and j. We use the inter-frame similarity to determine whether scene switching occurs in 3D videos. In this study, existing algorithms were used to obtain the similarity between images. The similarity metric [TeX:] $$\Delta {cof} f_k$$ of the current frame in the encoded GOP (group of pictures) is defined as

(2)
[TeX:] $$\Delta {Coff}(A, B)=(L(A, B))^\phi \cdot(C(A, B))^{\varphi} \cdot(S(A, B))^\gamma$$

where L(A,B), C(A,B), and S(A,B) represent comparisons of the luminance, contrast, and structural information of the two images, respectively, and [TeX:] $$\phi, \varphi, \gamma$$ adjust the proportions of the luminance, contrast, and structural information, respectively.
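As a concrete illustration (not code from the paper), the detection rule of Eqs. (1)-(2) can be sketched in Python using whole-frame statistics and unit exponents; the stabilizing constants c1 and c2 follow the conventional SSIM choices and are our own assumption, as are the function names and the example threshold:

```python
import numpy as np

def coff(a, b, c1=6.5025, c2=58.5225):
    """Similarity Delta_coff(A, B) = L * C * S (Eq. 2), with unit exponents
    and whole-frame statistics; c1, c2, c3 are SSIM-style stabilizers."""
    a = a.astype(np.float64)
    b = b.astype(np.float64)
    mu_a, mu_b = a.mean(), b.mean()
    var_a, var_b = a.var(), b.var()
    cov = ((a - mu_a) * (b - mu_b)).mean()
    c3 = c2 / 2.0
    L = (2 * mu_a * mu_b + c1) / (mu_a**2 + mu_b**2 + c1)         # luminance
    C = (2 * np.sqrt(var_a * var_b) + c2) / (var_a + var_b + c2)  # contrast
    S = (cov + c3) / (np.sqrt(var_a * var_b) + c3)                # structure
    return L * C * S

def is_scene_switch(prev, cur, nxt, theta=0.3):
    """Eq. (1): flag frame i when the similarity to the previous frame
    and to the next frame differ by at least the threshold theta."""
    return abs(coff(cur, prev) - coff(cur, nxt)) >= theta
```

When the current frame still belongs to the old scene, its similarity to the previous frame is high while its similarity to the next (new-scene) frame is low, so the absolute difference exceeds the threshold.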

3. Base View Rate Control Algorithm

Considering that scene switching affects 3D video viewing, we reallocate bits to improve bit rate control accuracy when scene switching occurs in a video. Our previous research shows that a code rate control algorithm based on a proportional-integral-derivative (PID) controller can better control code rate fluctuation and thereby improve image quality, making it possible to track the target code rate more accurately, obtain high spatial domain reconstructed image quality, and simultaneously improve the temporal domain reconstructed image quality. Moreover, for cases of scene switching, a corresponding control strategy is adopted in the PID controller. Multi-viewpoint video must satisfy the requirements of both multi-viewpoint and single-viewpoint video players. In multi-viewpoint video, the non-base viewpoints refer to the base viewpoint, and because the base viewpoint video is played in the single-viewpoint video player, the base viewpoint is the most important viewpoint in multi-viewpoint video. Hence, high base viewpoint quality is critical, and base viewpoint bit rate control is therefore necessary. In addition to predicting the optimal coded QP for the current frame or coding block by using the correlation between the time and space domains, in 3D-VVC, the bit rate of the non-base viewpoints is controlled based on inter-viewpoint weights. Owing to space limitations, the specific algorithm for basic unit layer bit rate control is not presented in this paper. The main algorithms for viewpoint layer and frame layer bit rate control are described next.

3.1 Bit Allocation between Viewpoints

The improved adaptive V-GOP video coding takes a V-GOP as its coding unit, and the video frame information within a V-GOP exhibits great redundancy. V-GOP coding can be divided into fixed frame number and variable frame number coding modes. The fixed frame number mode does not consider scene switching, which degrades coding performance. Variable frame number coding is also called adaptive V-GOP video coding. Adaptive V-GOP algorithms are also based on a fixed frame number; however, when scene switching occurs, the adaptive algorithm splits the video frames and starts a new V-GOP. This easily produces very small V-GOPs (for example, a V-GOP with only one frame), and allocating a V-GOP to such small groups of frames incurs a large coding cost. Therefore, an improved adaptive V-GOP video coding method is proposed.

The target bit allocation in the V-GOP layer rate control of the 3D video coding viewpoint layer is not always reasonable. Previously, the impact of the bits remaining from the last V-GOP was limited by a threshold M. The viewpoint layer bit allocation [TeX:] $$T_{V-G o p}(i)$$ is given by Eq. (3):

(3)
[TeX:] $$T_{V-G o p}(i)=\min \left(\frac{u(i)}{F_r} \times N_{V-g o p}(i) \times N_{\text {view }}+T_{V-G o p}^*(i-1),\ \frac{u(i)}{F_r} \times N_{V-g o p}(i) \times N_{\text {view }}+M\right)$$
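Eq. (3) adds the remaining bits of the last V-GOP to the bandwidth-based target, but caps the carry-over at the threshold M. A minimal sketch under that reading (variable names are ours, not the paper's):

```python
def vgop_target_bits(u_i, f_r, n_frames, n_views, carryover, m):
    """Eq. (3): bandwidth-based target u/F_r * N_frames * N_views,
    plus the last V-GOP's remaining bits, capped by the threshold m."""
    base = u_i / f_r * n_frames * n_views
    return min(base + carryover, base + m)
```

Because both arguments of the `min` share the same base term, this is equivalent to `base + min(carryover, m)`: a large surplus from the previous V-GOP cannot inflate the current target beyond the threshold.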

Previous results show that incremental PID is well-suited for scene-switching bit allocation [29,30]. Incremental PID control focuses on the effect of error accumulation on the system, and the output of the controller is only an increment of the control quantity [TeX:] $$\Delta E_{P I D}(t),$$ which is given by the control equation:

(4)
[TeX:] $$\Delta E_{P I D}(t)=k_P \cdot \Delta e(t)+k_I \cdot e(t)+k_D \cdot[\Delta e(t)-\Delta e(t-1)]$$

where [TeX:] $$k_P, k_I, \text { and } k_D$$ are the proportional, integral, and differential coefficients, respectively; e(t) is the error signal; and [TeX:] $$\Delta e(t)$$ is given by

(5)
[TeX:] $$\Delta e(t)=e(t)-e(t-1)$$

An advantage of the incremental PID algorithm is that it does not require repeated accumulation when calculating the output increment, which significantly reduces the amount of computation required. The output increment is related only to the last few error samples. Therefore, the computation is small, which satisfies the real-time requirements of the control system.
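The incremental form of Eqs. (4)-(5) can be sketched as follows; the controller keeps only the last two error samples, so no running error sum is maintained (an illustrative sketch, not the paper's implementation):

```python
class IncrementalPID:
    """Incremental PID (Eqs. 4-5): the output is the control increment
    Delta_E(t), computed from only the last three error samples."""

    def __init__(self, kp, ki, kd):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.e1 = 0.0  # e(t-1)
        self.e2 = 0.0  # e(t-2)

    def step(self, e):
        de = e - self.e1            # Delta_e(t)      (Eq. 5)
        de_prev = self.e1 - self.e2  # Delta_e(t-1)
        out = self.kp * de + self.ki * e + self.kd * (de - de_prev)  # Eq. 4
        self.e2, self.e1 = self.e1, e
        return out
```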

Experimental results show that the above results are not ideal. Therefore, we investigated the introduction of an incremental PID control algorithm with V-GOP layer rate control. The demand function is defined as the total number of bits that can be used for stream transmission, according to the actual bandwidth of the channel, and the sum of the actual number of bits used by the encoded V-GOP is taken as a feedback function. Eq. (6) shows the error function of incremental PID control in the V-GOP layer:

(6)
[TeX:] $$e_g(i)=\sum_{k=0}^i \frac{u(k)}{F_r} \times N_{V-g o p}(k) \times N_{\text {view }}-\sum_{k=0}^i A(k)$$

where A(k) is the actual number of bits used for the k-th V-GOP code, u(k) is the bit rate of the channel, and [TeX:] $$F_r$$ is the frame rate. The error increment is

(7)
[TeX:] $$\Delta e_g(i)=e_g(i)-e_g(i-1)$$

According to incremental PID control, the target bit allocation of the V-GOP layer is given by

(8)
[TeX:] $$T_{V-\text { Gop }}(i)= \begin{cases}\frac{u(i)}{F_r} \times N_{V-g o p}(i) \times N_{\text {view }} \cdot \lambda_i & i<3 \\ T_{V-\text { Gop }}(i-1)+g_p \cdot \Delta e_g(i-1)+g_i \cdot e_g(i-1)+g_d \cdot\left[\Delta e_g(i-1)-\Delta e_g(i-2)\right] & i \geq 3\end{cases}$$

where [TeX:] $$\lambda_i$$ denotes the weight factor of each viewpoint GOP, and [TeX:] $$g_p, g_i, \text { and } g_d$$ are the proportional, integral, and differential coefficients, respectively. The selection of these parameters directly affects the control effect and robustness of the controller. The traditional PID controller can be effectively combined with fuzzy control, modeling an engineer's empirical knowledge as rules and obtaining the optimal PID parameters via fuzzy reasoning.
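Putting Eqs. (6)-(8) together, the V-GOP targets might be computed as below. The gains g_p, g_i, g_d and the weight lam are illustrative placeholders (the paper does not list tuned values), and the actual bits of previously encoded V-GOPs are assumed to be known from the encoder:

```python
def vgop_targets(u, f_r, n_frames, n_views, actual_bits, lam,
                 g_p=0.8, g_i=0.05, g_d=0.1):
    """Sketch of Eq. (8) driven by the V-GOP error of Eqs. (6)-(7).
    u[k]: channel bit rate for V-GOP k; actual_bits[k]: bits it consumed."""
    demand = [u[k] / f_r * n_frames * n_views for k in range(len(u))]
    e, targets = [], []
    for i in range(len(u)):
        # e_g(i): cumulative demand minus cumulative actual bits (Eq. 6)
        e.append(sum(demand[:i + 1]) - sum(actual_bits[:i + 1]))
        if i < 3:
            targets.append(demand[i] * lam)          # first branch of Eq. 8
        else:
            de1 = e[i - 1] - e[i - 2]                # Delta_e_g(i-1), Eq. 7
            de2 = e[i - 2] - e[i - 3]                # Delta_e_g(i-2)
            targets.append(targets[i - 1] + g_p * de1 + g_i * e[i - 1]
                           + g_d * (de1 - de2))      # second branch of Eq. 8
    return targets
```

When the encoder exactly consumes the demanded bits, the error stays at zero and the target simply carries over from one V-GOP to the next.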

3.2 Frame Layer Rate Control based on Scene Detection

Because the model parameters in rate control are predicted from previously encoded frames, when the attributes of the images suddenly change, the model parameters lose effectiveness, which leads to inaccurate rate control. Meanwhile, parameter updating is a slowly varying process; failing to adjust the parameters in time therefore affects the rate control accuracy of subsequent frames.

In frame-level code rate allocation for 3D video coding, the previous algorithm allocates bits per frame using the target buffer capacity, frame rate, and actual buffer size, without accounting for the residual energy of the current coded frame; this is likely to cause image quality degradation or frame skipping. We therefore use motion content in the frame layer and adopt the motion content complexity factor as the key factor for frame-layer bit allocation. For simplicity, the B-frame case is not treated separately, and the frame-layer bit allocation based on the motion complexity code rate control algorithm still uses the current buffer values. For the I-frame and first B-frame of the first GOP and the first P-frames of other GOPs in each V-GOP, [TeX:] $$Q P_0$$ is used for encoding, and target bits need not be allocated. For the other B-frames (or P-frames), the target bits F(j) are allocated, weighted by [TeX:] $$F_r(j) \text { and } \Delta T_{b u f, P I D} \text { : }$$

(9)
[TeX:] $$F(j)=\Delta {cof} f_j \cdot \Delta T_{b u f, P I D}+\left(1-\Delta {cof} f_j\right) \cdot F_r(j)$$

The incremental digital PID controller omits the accumulation operation, which would otherwise require a large amount of arithmetic; only the last few error samples affect the increment of the control quantity, which avoids error accumulation and improves calculation accuracy. Methods based on visual perception focus on allocating bits according to the visual characteristics of the human eye. By contrast, emerging content-adaptive methods emphasize the dynamic allocation of resources according to the real-time content of the video:

(10)
[TeX:] $$\Delta T_{b u f, P I D}=\eta \cdot\left(k_P \cdot \Delta e(t)+k_I \cdot e(t)+k_D \cdot[\Delta e(t)-\Delta e(t-1)]\right)+\theta \cdot\left(T b l(t)-B_c(t-1)\right)$$

where [TeX:] $${Tbl}(t)$$ is the buffer target fullness and [TeX:] $$B_c(t-1)$$ is the actual buffer occupancy after encoding the previous P-frame of the current GOP.
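A sketch of the buffer adjustment of Eq. (10); all gains here (k_p, k_i, k_d, eta, theta) are illustrative placeholders rather than values from the paper:

```python
def buffer_adjustment(e_t, e_t1, e_t2, tbl, b_c_prev,
                      k_p=0.5, k_i=0.05, k_d=0.1, eta=0.6, theta=0.4):
    """Eq. (10): an incremental-PID term on the bit-count error plus a
    correction pulling the buffer toward its target fullness tbl."""
    de = e_t - e_t1          # Delta_e(t)
    de_prev = e_t1 - e_t2    # Delta_e(t-1)
    pid = k_p * de + k_i * e_t + k_d * (de - de_prev)
    return eta * pid + theta * (tbl - b_c_prev)
```

With zero error history the adjustment reduces to the buffer term alone, so an under-full buffer (occupancy below target) yields a positive bit increment.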

In the original MV-HEVC frame layer bit rate allocation, bits are allocated only according to the complexity characteristics of the texture content. The above allocation mainly adds consideration for texture blocks in complex scenes, especially those containing both foreground and background. For a current coding block that has no depth map changes and belongs to a smooth block, depth complexity need not be considered. To improve the previous algorithm by using the residual energy of the encoded frames, the frame-level target bit allocation of Eq. (11) is proposed:

(11)
[TeX:] $$F_r(j)=F_r^{\prime}(j-1) \cdot \frac{\sum_{l=1}^L W(l) \cdot 2^n}{\sum_{l=1}^L \Delta {cof} f_l \cdot F D(l)+\sum_{l=1}^L W_B(l) \cdot\left(2^n-1\right)}+F_j$$

In the above equation, [TeX:] $$F_j$$ represents the bits consumed by the header information of the j-th frame, n represents the temporal layer of the current frame, W(l) represents the complexity weight of each frame, [TeX:] $$W_B(l)$$ represents the weight of the B-frame, FD(l) denotes the temporal activity of the l-th encoded frame, and [TeX:] $$\Delta {cof} f_l$$ represents the similarity of each frame.
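The frame-level allocation of Eqs. (9) and (11) can be sketched as follows. The helper names are hypothetical, and the weights, activities, and header-bit counts are assumed to be supplied by the encoder:

```python
def frame_target_bits(coff_j, delta_t_buf, f_r_j):
    """Eq. (9): blend the buffer-driven estimate and the complexity-driven
    estimate, weighted by the inter-frame similarity Delta_coff_j."""
    return coff_j * delta_t_buf + (1.0 - coff_j) * f_r_j

def residual_weighted_target(f_prev, w, w_b, fd, coff, n, f_header):
    """Sketch of Eq. (11): scale the previous frame's target by the ratio of
    complexity weights to similarity-weighted temporal activity plus the
    B-frame weight term, then add the frame header bits f_header."""
    num = sum(w_l * 2**n for w_l in w)
    den = (sum(c * f for c, f in zip(coff, fd))
           + sum(wb * (2**n - 1) for wb in w_b))
    return f_prev * num / den + f_header
```

In Eq. (9), a high similarity (no scene change) makes the buffer-driven term dominate, while a low similarity shifts weight to the complexity-driven estimate F_r(j).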

However, because the frames before and after a scene switch are no longer temporally correlated, a large amount of coding resources would be wasted if subsequent frames used the pre-switching frame as the reference frame. Because the independently encoded I-frame is an important frame in encoding, serving as the reference frame for other B-frames or P-frames, the quality of the reference frame directly determines the quality of the other encoded frames. Therefore, in this study, the current GOP is ended early, the current scene switching frame is coded as an I-frame, and the following frames do not use frames before the scene change as reference frames. To avoid increasing complexity, the existing algorithms in the reference software are adopted for CTU-level bit allocation in this study.

4. Experimental Test

To verify the performance of the proposed algorithm, a 3D-VVC simulcast platform was built by extending the H.266/VVC encoder reference model (VTM), and the video sequence of each viewpoint was encoded as a single sequence with independent coding between viewpoints. The experiment was carried out in two steps. Test sequences were first tested for scene switching detection before encoding. For simplicity, the scene switching positions in the sequences were placed at multiples of the GOP length, so that the algorithm can be implemented by simply changing the configuration file. Subsequently, basic viewpoint bit allocation and the corresponding adjustment of coding parameters and configurations were performed based on inter-viewpoint importance weighting. In this study, only two viewpoints were considered, only basic viewpoint bitrate control data are provided, and scene switching is treated as a special case. Five video sequences with different properties were used for simulation at different target bit rates. These five sequences were all synthetic sequences, generated by resampling and splicing different source sequences; during sampling, the resolution of each sequence was unified. The sequences with scene switching, PoznanHall2-GT_Fly-Shank (S1), PoznanHall2-Shank-Dancer (S2), GT_Fly-Dancer-PoznanHall2 (S3), Balloons-Newspaper-Kendo (S4), and Kendo-Newspaper-Balloons (S5), were each spliced from three different source sequences, as shown in Table 1. For example, sequence S1 is the splicing of the PoznanHall2, GT_Fly, and Shank sequences; the scenes alternate, and each scene has a different complexity. When the resolutions of the three source sequences differed, the later sequences were resampled to the resolution of the previous sequence before being spliced and synthesized alternately.

In this study, PSNR and rate control accuracy were used to measure the performance of the rate control algorithm. In evaluating the objective quality of an image, the average PSNR change of the Y component is taken as the index: if the PSNR change is positive, the image quality is considered improved; if it is negative, the image quality is considered degraded. In evaluating the accuracy of the output bit rate, the bit error rate is taken as the index; a smaller bit error rate implies a more accurate output bit rate. In this study, the rate control algorithm was used to optimize the prediction structure, whereas our previous algorithm did not optimize the prediction structure. The basic unit layer rate control is similar to that of the previous algorithm.

Table 1 shows the experimental comparison of the coding performance between the proposed rate control algorithm and multi-view rate control algorithms with average bit allocation between views and at the frame level, for different synthetic sequences under several groups of different target rates [5,31]. As shown in Table 1, over the test sequences, the proposed algorithm improved the BDPSNR by an average of 0.22 dB compared with previous algorithms. Moreover, compared with the 3D-VVC simulcast algorithm with uniform bit distribution, the method in [5] increased the bit rate by 0.96% on average, whereas our algorithm saved 6.22% of the bit rate.

Table 1.

Experimental results of basic view rate control for 3D video sequences

                              Suehring [5]        TIP [31]            Proposed
Sequence                      BDPSNR    BDBR      BDPSNR    BDBR      BDPSNR    BDBR
                              (dB)      (%)       (dB)      (%)       (dB)      (%)
S1 Merged241_1920×1088         0.04     -1.49      0.25     -9.62      0.45    -16.98
S2 Merged567_1024×768         -0.03      0.93     -0.03      0.57     -0.25      5.86
S3 Merged123_1920×1088         0.00      5.69      0.59    -14.28      0.77    -21.08
S4 Merged134_1920×1088         0.03     -1.28     -0.43     18.54     -0.09      7.33
Avg.                           0.01      0.96      0.10     -1.20      0.22     -6.22

BDPSNR=Bjontegaard delta peak signal-to-noise ratio, BDBR=Bjontegaard delta bit rate.

Fig. 1 shows the PSNR fluctuation patterns of the different synthesized sequences, S1 and S2, at several sets of different target bit rates. In this figure, two rate control algorithms are given to encode the PSNR fluctuation diagram of each frame. Compared with the average bit allocation algorithm, the proposed rate control algorithm achieved a smaller PSNR fluctuation ratio, stable quality changes before and after frames, and better visual effects.

Fig. 1.

Objective quality fluctuations for synthetic sequences: (a) S1 sequence and (b) S2 sequence.

Table 2 shows the experimental comparison of the coding performance between the proposed rate control algorithm and multi-view rate control algorithms with average bit allocation between views and at the frame level, for different synthetic sequences under several groups of different target rates. The results in Table 2 demonstrate that the proposed algorithm improved BDPSNR by an average of 0.279 dB compared with the previous algorithm [5], and by an average of 0.282 dB compared with the fixed code rate allocation algorithm, confirming the effectiveness of the proposed algorithm. Furthermore, compared with the 3D-VVC simulcast algorithm with uniform bit distribution, the bit rate increased by 0.710% on average in [5], whereas our algorithm saved 7.012% of the bit rate (8.087% relative to [5]).

These results can be attributed to the proposed algorithm accounting for scene switching and allocating more bits through I-frame coding, so that frame skipping is avoided. In this way, the PSNR values of some frames are relatively low, but the coding quality of test sequence S2 is improved as a whole. The rate control algorithm in our previous method did not consider video scene switching. When video scene switching occurs, the adaptive algorithm separates the video frames at the switching point and generates a new V-GOP for coding. This easily produces very small V-GOPs (such as a V-GOP with only one frame), but assigning a V-GOP to these small groups of frames incurs a large coding cost.

Table 2.

Experimental results of basic view rate control for 3D video sequences

            Suehring [5]             Proposed                 Proposed
            vs. 3D-VVC Simulcast     vs. 3D-VVC Simulcast     vs. Suehring [5]
Sequence    BDPSNR     BDBR          BDPSNR     BDBR          BDPSNR     BDBR
            (dB)       (%)           (dB)       (%)           (dB)       (%)
S1           0.038     -1.492         0.449     -16.980        0.424     -16.476
S2          -0.034      0.927        -0.247       5.858       -0.164       3.903
S3          -0.001      5.688         0.770     -21.083        0.694     -23.088
S4           0.022     -0.289         0.530     -10.185        0.512     -10.175
S5           0.031     -1.282        -0.090       7.332       -0.071       5.402
Avg.         0.011      0.710         0.282      -7.012        0.279      -8.087

5. Conclusion

Because the characteristics of video sequences change under scene switching, if bits are evenly distributed across different scenes, the bit rates of fast-moving scenes will be too low, whereas relatively flat scenes will have a surplus, resulting in wasted coding resources and degraded video quality. A limitation of our comparison is that the basic viewpoint bit allocation of the comparison algorithm is relatively small and the test platforms differ. The sequence scene switching used in this study was placed at multiples of the GOP length, so the algorithm can be implemented simply by changing the configuration file. Based on a full analysis of the influence of scene switching on video coding, the proposed rate control handles scene transitions well by adaptively adjusting the parameters and changing the coding structure. Moreover, the algorithm allocates bit rate resources more reasonably, improves video quality, and reduces coding complexity, thus effectively improving coding efficiency. For scene switching processing, this study mainly addressed the special case of abrupt transitions; gradual transitions require further research and experimental verification. Future research can focus on describing scene complexity more accurately and applying the algorithm to actual video sequences with scene switching.

Conflict of Interest

Biography

Tao Yan
https://orcid.org/0000-0002-8304-8733

He received the Ph.D. degree in communication and information systems from Shanghai University, Shanghai, China, in 2010. He has been with the faculty of the School of Information Engineering, Putian University, where he is currently a professor. His major research interests include multiview high-efficiency video coding, rate control, and video codec optimization. He currently presides over a National Natural Science Foundation project.

Biography

Qian Zhang
https://orcid.org/0000-0003-0760-9241

She is an associate professor at Shanghai Normal University, China. She received her Ph.D. from Shanghai University, China. Her research interests include video processing.

References

  • 1 Z. Peng, L. Shen, Q. Ding, X. Dong, and L. Zheng, "Block-dependent partition decision for fast intra coding of VVC," IEEE Transactions on Consumer Electronics, vol. 70, no. 1, pp. 277-289, 2024. https://doi.org/10. 1109/TCE.2023.3324794doi:[[[10.1109/TCE.2023.3324794]]]
  • 2 L. He, X. He, S. Xiong, Z. Zhao, H. Xiao, and H. Chen, "Efficient rate control in versatile video coding with adaptive spatial–temporal bit allocation and parameter updating," IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 6, pp. 2920-2934, 2023. https://doi.org/10.1109/TCSVT.2022.3224723doi:[[[10.1109/TCSVT.2022.3224723]]]
  • 3 K. Choi, T. Van Le, Y . Choi, and J. Y . Lee, "Low-complexity intra coding in versatile video coding," IEEE Transactions on Consumer Electronics, vol. 68, no. 2, pp. 119-126, 2022. https://doi.org/10.1109/TCE.2022. 3145397doi:[[[10.1109/TCE.2022.3145397]]]
  • 4 Y . Mao, M. Wang, S. Wang, and S. Kwong, "High efficiency rate control for versatile video coding based on composite Cauchy distribution," IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 4, pp. 2371-2384, 2022. https://doi.org/10.1109/TCSVT.2021.3093315doi:[[[10.1109/TCSVT.2021.3093315]]]
  • 5 K. Suehring, VVC Software VTM-22.0," 2023 (Online). Available: https://vcgit.hhi.fraunhofer.de/yueli/VVC Software_VTM/-/tree/VTM-22.0.custom:[[[https://vcgit.hhi.fraunhofer.de/yueli/VVCSoftware_VTM/-/tree/VTM-22.0]]]
  • 6 J. R. Lin, M. J. Chen, C. H. Yeh, Y . C. Chen, L. J. Kau, C. Y . Chang, and M. H. Lin, "Visual perception based algorithm for fast depth intra coding of 3D-HEVC," IEEE Transactions on Multimedia, vol. 24, pp. 17071720, 2022. https://doi.org/10.1109/TMM.2021.3070106doi:[[[10.1109/TMM.2021.3070106]]]
  • 7 T. Li, L. Yu, H. Wang, and Z. Kuang, "A bit allocation method based on inter-view dependency and spatiotemporal correlation for multi-view texture video coding," IEEE Transactions on Broadcasting, vol. 67, no. 1, pp. 159-173, 2021. https://doi.org/10.1109/TBC.2020.3028340doi:[[[10.1109/TBC.2020.3028340]]]
  • 8 L. Shen, K. Li, G. Feng, P. An, and Z. Liu, "Efficient intra mode selection for depth-map coding utilizing spatiotemporal, inter-component and inter-view correlations in 3D-HEVC," IEEE Transactions on Image Processing, vol. 27, no. 9, pp. 4195-4206, 2018. https://doi.org/10.1109/TIP.2018.2837379doi:[[[10.1109/TIP.2018.2837379]]]
  • 9 Y. Liu, Z. G. Li, and Y. C. Soh, "Rate control of H.264/AVC scalable extension," IEEE Transactions on Circuits and Systems for Video Technology, vol. 18, no. 1, pp. 116-121, 2008. https://doi.org/10.1109/TCSVT.2007.903325
  • 10 S. Tan, S. Ma, S. Wang, S. Wang, and W. Gao, "Inter-view dependency-based rate control for 3D-HEVC," IEEE Transactions on Circuits and Systems for Video Technology, vol. 27, no. 2, pp. 337-351, 2017. https://doi.org/10.1109/TCSVT.2015.2511878
  • 11 R. Abolfathi, H. Roodaki, and S. Shirmohammadi, "A novel rate control method for free-viewpoint video in MV-HEVC," in Proceedings of 2019 International Conference on Computing, Networking and Communications (ICNC), Honolulu, HI, USA, 2019, pp. 582-587. https://doi.org/10.1109/ICCNC.2019.8685633
  • 12 P. J. Lee and Y. C. Lai, "Perceptual awareness rate control for multi-view video encoder in stereoscopic display," Journal of Display Technology, vol. 9, no. 7, pp. 552-560, 2013. https://doi.org/10.1109/JDT.2012.2237382
  • 13 T. Y. Chung, J. Y. Sim, and C. S. Kim, "Bit allocation algorithm with novel view synthesis distortion model for multiview video plus depth coding," IEEE Transactions on Image Processing, vol. 23, no. 8, pp. 3254-3267, 2014. https://doi.org/10.1109/TIP.2014.2327801
  • 14 J. Lei, X. He, H. Yuan, F. Wu, N. Ling, and C. Hou, "Region adaptive R-λ model-based rate control for depth maps coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 6, pp. 1390-1405, 2018. https://doi.org/10.1109/TCSVT.2017.2658024
  • 15 H. Yuan, S. Kwong, X. Wang, W. Gao, and Y. Zhang, "Rate distortion optimized inter-view frame level bit allocation method for MV-HEVC," IEEE Transactions on Multimedia, vol. 17, no. 12, pp. 2134-2146, 2015. https://doi.org/10.1109/TMM.2015.2477682
  • 16 J. Chen, L. Chen, H. Zeng, C. H. Hsia, T. Wang, and K. K. Ma, "3D-gradient guided rate control model for screen content video coding," IEEE Transactions on Multimedia, vol. 25, pp. 7930-7942, 2023. https://doi.org/10.1109/TMM.2022.3232020
  • 17 S. Hu, S. Kwong, Y. Zhang, and C. C. J. Kuo, "Rate-distortion optimized rate control for depth map-based 3-D video coding," IEEE Transactions on Image Processing, vol. 22, no. 2, pp. 585-594, 2013. https://doi.org/10.1109/TIP.2012.2219549
  • 18 Y. Chen, M. Wang, S. Wang, Z. Ni, and S. Kwong, "A CTU-level screen content rate control for low-delay versatile video coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 9, pp. 5227-5241, 2023. https://doi.org/10.1109/TCSVT.2023.3243225
  • 19 F. Shao, G. Jiang, W. Lin, M. Yu, and Q. Dai, "Joint bit allocation and rate control for coding multi-view video plus depth based 3D video," IEEE Transactions on Multimedia, vol. 15, no. 8, pp. 1843-1854, 2013. https://doi.org/10.1109/TMM.2013.2269897
  • 20 M. Hofbauer, C. B. Kuhn, G. Petrovic, and E. Steinbach, "Preprocessor rate control for adaptive multi-view live video streaming using a single encoder," IEEE Transactions on Circuits and Systems for Video Technology, vol. 32, no. 8, pp. 5551-5565, 2022. https://doi.org/10.1109/TCSVT.2022.3142403
  • 21 J. T. Fang, C. C. Chan, S. R. Hsu, and P. C. Chang, "Robust rate control mechanism against scene change for H.264/AVC," in Proceedings of 2013 IEEE International Symposium on Consumer Electronics (ISCE), Hsinchu, Taiwan, 2013, pp. 241-242. https://doi.org/10.1109/ISCE.2013.6570206
  • 22 J. Qin, H. Bai, and Y. Zhao, "Rate control algorithm in HEVC based on scene-change detection," in Proceedings of 2019 Data Compression Conference (DCC), Snowbird, UT, USA, 2019, p. 600. https://doi.org/10.1109/DCC.2019.00112
  • 23 H. Yang, L. Shen, Y. Yang, and W. Lin, "A novel rate control scheme for video coding in HEVC-SCC," IEEE Transactions on Broadcasting, vol. 66, no. 2, pp. 333-345, 2020. https://doi.org/10.1109/TBC.2019.2954062
  • 24 Z. Liu, L. Wang, X. Li, and X. Ji, "Optimize x265 rate control: an exploration of lookahead in frame bit allocation and slice type decision," IEEE Transactions on Image Processing, vol. 28, no. 5, pp. 2558-2573, 2019. https://doi.org/10.1109/TIP.2018.2887200
  • 25 M. Zhou, X. Wei, S. Kwong, W. Jia, and B. Fang, "Rate control method based on deep reinforcement learning for dynamic video sequences in HEVC," IEEE Transactions on Multimedia, vol. 23, pp. 1106-1121, 2021. https://doi.org/10.1109/TMM.2020.2992968
  • 26 J. Huo, X. Zhou, H. Yuan, S. Wan, and F. Yang, "Fast rate-distortion optimization for depth maps in 3-D video coding," IEEE Transactions on Broadcasting, vol. 69, no. 1, pp. 21-32, 2023. https://doi.org/10.1109/TBC.2022.3192992
  • 27 M. H. Hyun, B. Lee, and M. Kim, "A VVC intra rate control with small bit fluctuations using a Lagrange multiplier adjustment," IEEE Transactions on Multimedia, vol. 26, pp. 6811-6821, 2024. https://doi.org/10.1109/TMM.2024.3355633
  • 28 J. Lin, A. Huang, T. Zhao, X. Wang, and S. Kwong, "λ-domain VVC rate control based on Nash equilibrium," IEEE Transactions on Circuits and Systems for Video Technology, vol. 33, no. 7, pp. 3477-3487, 2023. https://doi.org/10.1109/TCSVT.2022.3231335
  • 29 L. Shen, Z. Liu, Z. Zhang, and X. Shi, "Rate control based on an incremental proportional-integral-differential algorithm," Optical Engineering, vol. 46, no. 7, article no. 077002, 2007. https://doi.org/10.1117/1.2754303
  • 30 D. Fani and M. Rezaei, "Novel PID-fuzzy video rate controller for high-delay applications of the HEVC standard," IEEE Transactions on Circuits and Systems for Video Technology, vol. 28, no. 6, pp. 1379-1389, 2018. https://doi.org/10.1109/TCSVT.2017.2669214
  • 31 F. Liu and Z. Chen, "Multi-objective optimization of quality in VVC rate control for low-delay video coding," IEEE Transactions on Image Processing, vol. 30, pp. 4706-4718, 2021. https://doi.org/10.1109/TIP.2021.3072225