Journal of Zhejiang University

Frontiers of Information Technology & Electronic Engineering 2018 Vol.19 No.3 P.459-470

Long-term prediction for hierarchical-B-picture-based coding of video with repeated shots

Author(s): Xu-guang Zuo, Lu Yu
Affiliation(s): 1. Zhejiang Provincial Key Laboratory of Information Processing, Communication and Networking (IPCAN), Institute of Information and Communication Engineering, Zhejiang University, Hangzhou 310027, China
Corresponding email(s): yul@zju.edu.cn
Key Words: High Efficiency Video Coding (HEVC), Long-term temporal correlation, Long-term prediction, Hierarchical B-picture structure


Xu-guang Zuo, Lu Yu. Long-term prediction for hierarchical-B-picture-based coding of video with repeated shots[J]. Frontiers of Information Technology & Electronic Engineering, 2018, 19(3): 459-470.

@article{title="Long-term prediction for hierarchical-B-picture-based coding of video with repeated shots",
author="Xu-guang Zuo, Lu Yu",
journal="Frontiers of Information Technology & Electronic Engineering",
volume="19",
number="3",
pages="459-470",
year="2018",
publisher="Zhejiang University Press & Springer",
doi="10.1631/FITEE.1601552"
}

%0 Journal Article
%T Long-term prediction for hierarchical-B-picture-based coding of video with repeated shots
%A Xu-guang Zuo
%A Lu Yu
%J Frontiers of Information Technology & Electronic Engineering
%V 19
%N 3
%P 459-470
%@ 2095-9184
%D 2018
%I Zhejiang University Press & Springer
%DOI 10.1631/FITEE.1601552

TY - JOUR
T1 - Long-term prediction for hierarchical-B-picture-based coding of video with repeated shots
A1 - Xu-guang Zuo
A1 - Lu Yu
JO - Frontiers of Information Technology & Electronic Engineering
VL - 19
IS - 3
SP - 459
EP - 470
SN - 2095-9184
Y1 - 2018
PB - Zhejiang University Press & Springer
DO - 10.1631/FITEE.1601552
ER -


Abstract: The latest video coding standard, High Efficiency Video Coding (HEVC), achieves much higher coding efficiency than previous video coding standards. In particular, its hierarchical B-picture prediction structure removes temporal redundancy among neighboring frames remarkably well. In practice, videos available to consumers, such as TV series, movies, and talk shows, usually contain many repeated shots. According to our observations, when these videos are encoded by HEVC with the hierarchical B-picture structure, the temporal correlation within each shot is well exploited, but the long-term correlation between repeated shots is not used. We propose a long-term prediction (LTP) scheme that exploits the long-term temporal correlation between correlated shots in a video. The long-term reference (LTR) frames of a source video are chosen by clustering similar shots and extracting representative frames, and a modified hierarchical B-picture coding structure based on an LTR frame is introduced to support long-term temporal prediction. An adaptive quantization method is further designed for LTR frames to improve the overall video coding efficiency. Experimental results show that the new coding scheme achieves a coding gain of up to 22.86%.

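Chinese title (translated): A long-term prediction coding scheme based on the hierarchical B-picture structure for video with repeated shots

Chinese summary (translated): The latest video coding standard, High Efficiency Video Coding (HEVC), achieves far higher coding efficiency than previous video coding standards. In particular, when the hierarchical B-picture prediction structure is used, HEVC removes the temporal redundancy between neighboring pictures remarkably well. In practice, videos watched by consumers, such as TV series, movies, and talk shows, usually contain many repeated shots. According to our observations, when these videos are encoded by HEVC with the hierarchical B-picture structure, the temporal correlation within each shot is well exploited. However, the long-term correlation between repeated shots is not used. We propose a long-term prediction scheme to exploit the long-term temporal correlation between repeated shots in a video. First, similar shots in the source video are clustered and representative pictures are extracted from them as long-term reference frames. Then, the hierarchical B-picture coding structure is modified using the long-term reference frames to enable long-term temporal prediction. Finally, an adaptive quantization method is designed for the long-term reference frames to improve the overall video coding efficiency. Experiments show that the proposed coding scheme achieves a coding performance gain of up to 22.86%.

Key words (translated): High Efficiency Video Coding (HEVC); Long-term temporal correlation; Long-term prediction; Hierarchical B-picture structure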
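
The abstract says that LTR frames are chosen by clustering similar shots and extracting representative frames, but it does not give the features or the clustering algorithm. The sketch below is only an illustration under stated assumptions: each shot is summarized by a small feature vector (for example, a normalized color histogram), the shots are grouped with plain k-means, and the shot closest to each cluster mean is taken as the source of that cluster's representative. The function names, the feature values, and the choice of k-means are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence


def squared_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(features: List[List[float]], k: int, iterations: int = 20) -> List[int]:
    """Plain k-means; returns one cluster label per shot feature vector."""
    # Assumption: deterministic seeding with the first k shots.
    centroids = [list(f) for f in features[:k]]
    labels = [0] * len(features)
    for _ in range(iterations):
        # Assignment step: attach every shot to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(f, centroids[c]))
            for f in features
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [f for f, lab in zip(features, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def pick_representatives(features: List[List[float]], labels: List[int]) -> Dict[int, int]:
    """For each cluster, return the index of the shot closest to the cluster mean."""
    reps: Dict[int, int] = {}
    for c in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == c]
        mean = [sum(col) / len(idx) for col in zip(*(features[i] for i in idx))]
        reps[c] = min(idx, key=lambda i: squared_distance(features[i], mean))
    return reps


# Hypothetical features for six shots alternating between two camera setups,
# as in a dialogue scene (A, B, A, B, A, B).
features = [
    [0.90, 0.10], [0.10, 0.90], [0.80, 0.20],
    [0.20, 0.80], [0.85, 0.15], [0.15, 0.85],
]
labels = kmeans(features, k=2)
reps = pick_representatives(features, labels)
print(labels, reps)
```

In this toy input the alternating shots fall into two clusters, so only two representative shots would need to supply LTR frames. How the paper actually measures shot similarity and picks the frame inside a shot is not stated in the abstract.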


