
Biography

Dr. Dong received the B.Eng. and M.Eng. degrees, both in Information Engineering, from Zhejiang University, Hangzhou, China, in 2002 and 2005, respectively, and the Ph.D. degree in Electronic Engineering from The Chinese University of Hong Kong, Hong Kong, China, in 2009, where she then worked as a Postdoctoral Fellow for a year. In 2011, she joined InterDigital Communications, U.S.A., as a Staff Engineer. She left the position in 2014 and spent the next two and a half years as a stay-at-home mom. In 2016, she joined Qualcomm Technologies, Inc., U.S.A., as a Senior Staff Engineer. From 2003 to 2009, she was an active participant in Chinese multimedia standardization, with successful submissions to the AVS workgroup. From 2011 to 2014, she was engaged in the standardization of HEVC and its extensions. Her current research interests include high-efficiency video coding and processing.

LinkedIn Profile | Google Scholar Profile | Resume (PDF)


Academic Qualifications

Aug. 2005 - Dec. 2009 Ph.D. Degree in Electronic Engineering The Chinese University of Hong Kong, China
Sept. 2002 - Mar. 2005 M.Eng. Degree in Information Engineering Zhejiang University, China
Sept. 1998 - Jul. 2002 B.Eng. Degree in Information Engineering Zhejiang University, China

Theses

1 "Analysis, Coding, and Processing for High-Definition Videos," Ph.D. Thesis, The Chinese University of Hong Kong. Dec. 2009.
Thesis Advisor: Prof. King Ngi NGAN. [Abstract] [BibTeX] [Full Text] [Oral]

Abstract: High-Definition (HD) videos are becoming increasingly popular in many applications. This thesis analyzes the characteristics of HD videos and accordingly develops appropriate coding and processing techniques for hybrid video coding.

Firstly, the characteristics of HD videos are studied quantitatively. The results show that HD videos are distinguished from lower-resolution videos by higher spatial correlation and a distinctive power spectral density (PSD), whose energy is mainly distributed along the vertical and horizontal directions.

Secondly, two techniques for HD video coding are developed based on the aforementioned analysis results. To exploit the spatial property, 2-D order-16 transforms are proposed to code the more highly correlated signals more efficiently. Specifically, two series of 2-D order-16 integer transforms, named the modified integer cosine transform (MICT) and the non-orthogonal integer cosine transform (NICT), are developed to provide different trade-offs between performance and complexity. Based on the property of the special PSD, a parametric interpolation filter (PIF) is proposed for motion-compensated prediction (MCP). Not only can PIF track the non-stationary statistics of video signals, as the related work shows, but it also represents interpolation filters by parameters instead of individual coefficients, thus resolving the conflict between coefficient accuracy and side-information size. The experimental results show that the two proposed coding techniques significantly outperform their equivalents in the state-of-the-art international video coding standards.

Thirdly, interlaced HD videos are studied, and to satisfy different delay constraints, two real-time de-interlacing algorithms are proposed specifically for H.264-coded videos. They adapt to local activities according to the syntax element (SE) values. Accuracy analysis is also introduced to deal with the disparity between the SE values and the real motions and textures. The de-interlacers provide better visual quality than the commonly used ones and can de-interlace 1080i sequences in real time on PCs.


BibTeX entry:
@PHDTHESIS{2009_dong_phdthesis,
author = {J. Dong},
title = {Analysis, Coding, and Processing for High-Definition Videos},
school = {The Chinese University of Hong Kong},
year = {2009},
month = {Dec.}
}
2 "Context-based Entropy Coding in Video Compression," M.Eng. Thesis, Zhejiang University. Mar. 2005.
Thesis Advisor: Prof. Lu YU. [Abstract]

Abstract: Entropy coding is a basic and critical technology in video compression. Whatever video system is used, all syntax elements must be converted into binary symbols, with their statistical redundancy removed, by entropy coding.

As a form of lossless coding, the entropy coding used in video compression includes scalar lossless coding, vector lossless coding, and conditional lossless coding. In this thesis, the theoretical limits of the three kinds of lossless coding are introduced, showing that the latter two can achieve higher coding efficiency than scalar lossless coding.

In the area of video compression, Huffman coding and arithmetic coding are widely used. Huffman coding converts a fixed number of symbols into a variable-length codeword, while arithmetic coding converts a variable number of symbols into a variable-length codeword. Both methods are probability-model-based, and both can reach the entropy bound asymptotically. Because of its higher coding efficiency and ease of adaptation, arithmetic coding is a better choice than Huffman coding, as long as the computation involved is acceptable.
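The fixed-to-variable mapping that Huffman coding performs can be sketched in a few lines of Python (a minimal illustration only; real codecs use VLC tables designed offline):

```python
import heapq
from collections import Counter

def huffman_code(freqs):
    """Build a prefix-free Huffman code from symbol frequencies.

    Minimal sketch of scalar lossless coding: each symbol maps to a
    variable-length codeword, and frequent symbols get shorter codes.
    """
    # Heap entries: (subtree frequency, tie-breaker, {symbol: partial code})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        # Prefix '0' onto one subtree's codes and '1' onto the other's
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

code = huffman_code(Counter("aaaabbbccd"))
```

Here the most frequent symbol `a` receives the shortest codeword, and no codeword is a prefix of another, which is what makes the stream uniquely decodable.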

In earlier video coding standards, such as MPEG-2/4 and H.261/H.263, syntax elements are compressed by scalar lossless Huffman coding. To improve coding efficiency, the Baseline and Extended Profiles of the latest international standard, H.264, employ conditional Huffman coding, known as CAVLC (Context-based Adaptive Variable-Length Coding). CAVLC treats contexts as different conditions, uses multiple VLC tables for the different conditioning states, and greatly improves coding efficiency.

To avoid using multiple VLC tables and to further improve coding efficiency, the Main Profile of H.264 adopts an alternative entropy coding scheme, known as CABAC (Context-based Adaptive Binary Arithmetic Coding). The arithmetic coding engine it employs can adapt to changes in the source statistics simply by updating the symbol probability table. Compared with CAVLC, CABAC saves 9%-14% in bitrate but incurs higher computational complexity.
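The adaptation behavior can be illustrated with a toy probability update; this exponential smoothing is a simplified stand-in for CABAC's table-driven probability state machine, and the rate constant is illustrative rather than taken from the standard:

```python
def update_probability(p_one, bit, rate=0.05):
    """One adaptation step of a binary probability model.

    CABAC keeps a per-context estimate of each bin's probability and
    refreshes it after every coded bin. This exponential update is a
    toy stand-in for the standard's table-driven state machine.
    """
    target = 1.0 if bit == 1 else 0.0
    return p_one + rate * (target - p_one)

# Feed a skewed binary source: the estimate drifts toward the true
# frequency of ones (about 0.8 here), which is what lets the arithmetic
# coder approach the source entropy.
p = 0.5
for bit in [1, 1, 1, 0, 1, 1, 1, 1, 0, 1] * 20:
    p = update_probability(p, bit)
```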

The national standard AVS (China's Audio Video Coding Standard) employs 2-D joint variable-length coding to remove the redundancy between the level and run in transformed coefficient blocks: the pair (level, run) is regarded as a joint event, its probability distribution is estimated, and 2-D VLC tables are designed accordingly. This method is more suitable for parallel implementation than CAVLC and CABAC. Though 2-D joint variable-length coding is widely used in many international video coding standards, many existing methods are not conditional lossless codings. The method used in AVS, however, is a conditional lossless coding: it regards the asymptotically changing (level, run) statistics as contexts and improves coding efficiency effectively.
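The (level, run) joint-event formation described above can be sketched as follows (a simplified illustration; real coders operate on zigzag-scanned transform blocks and additionally signal an end-of-block):

```python
def run_level_pairs(coeffs):
    """Group a scanned coefficient list into (level, run) joint events.

    'run' counts the zeros preceding each nonzero 'level', and each
    pair is treated as a single symbol for 2-D joint VLC.
    """
    pairs, run = [], 0
    for c in coeffs:
        if c == 0:
            run += 1          # accumulate the zero run
        else:
            pairs.append((c, run))
            run = 0           # reset after emitting a joint event
    return pairs

pairs = run_level_pairs([5, 0, 0, -2, 0, 1, 0, 0])
```

Trailing zeros produce no pair; in a real codec they are covered by the end-of-block symbol.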


Book Chapter

1 Jie Dong and King Ngi Ngan, "Present and future video coding standards," in Intelligent Multimedia Communication: Techniques and Applications, Springer-Verlag, ISBN 978-3-642-11685-8, Jan. 2010, pp. 75-124. [Abstract] [BibTeX] [Full Text] [Errata]

Abstract: Video coding systems may differ greatly from each other, as they consist of complicated functional modules, and each module can be realized using different techniques. For interoperability in communication, video coding standards are developed to constrain the various video coding systems, so that manufacturers can successfully interwork with each other by producing compliant encoders and decoders, while still having the freedom to develop competitive and innovative products.

A standard defines a coded representation, the syntax, which describes the video in a compressed form. In other words, a standard only specifies the output of the encoder, i.e., the input of the decoder, rather than the codec itself. Although the standard also provides a method of decoding the syntax to reconstruct the video, manufacturers are free to develop alternative decoders as long as they can decode the syntax and produce the same result as that in the standard.

Video coding standards have evolved over about 20 years, driven by applications and advances in hardware capability. This chapter begins with an introduction to the block-based hybrid coding scheme, which is essentially the core of all video coding standards. In Section 2, the past video coding standards are briefly reviewed, including H.261, MPEG-1, MPEG-2/H.262, H.263, and MPEG-4. The latest standard, H.264/AVC, is the emphasis of this chapter and is introduced in Section 3, including the new technical developments and the profiles favoring a wide range of applications. The recently finalized amendments on scalable video coding (SVC) and multiview video coding (MVC) are another two highlights. Section 4 presents the Audio and Video Coding Standard (AVS) of China, which has received much attention throughout the world even though it is a national standard. Finally, Section 5 introduces the topics currently studied intensively in the standardization groups, which may become key techniques in future coding standards.



BibTeX entry:
@INBOOK{2010_dong_chapter,
chapter = {Present and future video coding standards},
pages = {75-124},
title = {Intelligent Multimedia Communication: Techniques and Applications},
publisher = {Springer-Verlag},
year = {2010},
editor = {C. W. Chen and Z. Li and S. Lian},
author = {J. Dong and K. N. Ngan},
month = {Jan.},
doi = {10.1007/978-3-642-11686-5_3}
}

Journal Papers

1 Jie Dong and Yan Ye, "Adaptive downsampling for high-definition video coding," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 24, No. 3, pp. 480-488, Mar. 2014. [Abstract] [BibTeX] [Full Text]

Abstract: Previous research has shown that downsampling prior to encoding and upsampling after decoding can improve the rate-distortion (R-D) performance compared with directly coding the original video using standard technologies, e.g., JPEG and H.264/AVC, especially at low bit rates. This paper proposes a practical algorithm to find the optimal downsampling ratio that balances the distortions caused by downsampling and coding, thus achieving the overall optimal R-D performance. Given the optimal sampling ratio, dedicated filters for down- and up-sampling are also designed. Simulations show this algorithm improves the R-D performance over a wide range of bit rates.
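The core trade-off, resampling distortion that grows with the downsampling ratio against coding distortion that shrinks at a given bit budget, can be illustrated by measuring the resampling error alone (a toy 1-D sketch with plain decimation and linear interpolation, not the paper's dedicated filters):

```python
def downsample(signal, ratio):
    # Keep every ratio-th sample (plain decimation; the paper instead
    # designs dedicated filters for the chosen ratio)
    return signal[::ratio]

def upsample(samples, ratio, length):
    # Linear interpolation back onto the original sampling grid
    out = []
    for i in range(length):
        x = i / ratio
        lo = min(int(x), len(samples) - 1)
        hi = min(lo + 1, len(samples) - 1)
        frac = x - lo
        out.append((1 - frac) * samples[lo] + frac * samples[hi])
    return out

def resample_mse(signal, ratio):
    # Distortion introduced by resampling alone, before any coding loss
    rec = upsample(downsample(signal, ratio), ratio, len(signal))
    return sum((a - b) ** 2 for a, b in zip(signal, rec)) / len(signal)

sig = [(i * 7) % 13 for i in range(64)]  # toy signal with high-frequency content
mse_1 = resample_mse(sig, 1)  # ratio 1: lossless
mse_2 = resample_mse(sig, 2)  # ratio 2: resampling distortion appears
```

The optimal-ratio search in the paper balances this resampling distortion against the coding distortion at the target bitrate.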

BibTeX entry:
@ARTICLE{2014_dong_tcsvt,
author = {J. Dong and Y. Ye},
title = {Adaptive downsampling for high-definition video coding},
journal = {IEEE Transactions on Circuits and Systems for Video Technology},
year = {2014},
volume = {24},
pages = {480-488},
number = {3},
month = {Mar.}
}
2 Jie Dong and King Ngi Ngan, "Two-layer directional transform for high performance video coding," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 22, No. 4, pp. 619-625, Apr. 2012. [Abstract] [BibTeX] [Full Text]

Abstract: This paper presents a directional transform scheme for coding inter-prediction errors in block-based hybrid video coding. It proposes a two-layer transform structure, where the first layer uses discrete wavelet transform (DWT) to compact the residue energy to the LL band and then the second layer uses 2-D non-separable directional transforms to deal with the arbitrary edge directions in the four subbands. By doing this, the edges in a macroblock (MB) are efficiently compacted to a few coefficients and at the same time the overhead used to indicate the transform directions is affordable. Experimental results show that the proposed scheme provides PSNR gain up to 0.46 dB, compared with H.264/AVC High Profile.

BibTeX entry:
@ARTICLE{2012_dong_tcsvt,
author = {J. Dong and K. N. Ngan},
title = {Two-layer directional transform for high performance video coding},
journal = {IEEE Transactions on Circuits and Systems for Video Technology},
year = {2012},
volume = {22},
pages = {619-625},
number = {4},
month = {Apr.}
}
3 Jie Dong and King Ngi Ngan, "Adaptive pre-interpolation filter for high efficiency video coding," Journal of Visual Communication and Image Representation, Special Issue on Emerging Techniques for High Performance Video Coding, Vol. 22, Issue 8, pp. 697-703, Nov. 2011. [Abstract] [BibTeX] [Full Text]

Abstract: The proposed interpolation filter comprises two concatenated filters: the adaptive pre-interpolation filter (APIF) and the normative interpolation filter in H.264/AVC. The former is applied only to the integer pixels in the reference frames; the latter generates all the sub-position samples, supported by the output of APIF. The convolution of APIF and the standard filter minimizes the motion prediction error on a frame basis. APIF preserves the merits of the adaptive interpolation filter (AIF) and the adaptive loop filter (ALF) in the key technical area (KTA) software while overcoming their drawbacks. The experimental results show that APIF outperforms either AIF or ALF. Compared with the joint use of AIF and ALF, APIF provides comparable performance but has much lower complexity.

BibTeX entry:
@ARTICLE{2011_dong_jvcir,
author = {J. Dong and K. N. Ngan},
title = {Adaptive pre-interpolation filter for high efficiency video coding},
journal = {Journal of Visual Communication and Image Representation, Special Issue on Emerging Techniques for High Performance Video Coding},
year = {2011},
volume = {22},
pages = {697-703},
issue = {8},
month = {Nov.}
}
4 Jie Dong and King Ngi Ngan, "Parametric interpolation filter for HD video coding," IEEE Transactions on Circuits and Systems for Video Technology, Vol.20, No.12, pp.1892-1897, Dec. 2010. [Abstract] [BibTeX] [Full Text]

Abstract: Recently, the adaptive interpolation filter (AIF) for motion-compensated prediction (MCP) has received increasing attention. This paper studies the existing AIF techniques and points out that the trade-off between two conflicting aspects, the accuracy of the coefficients and the size of the side information, is the major obstacle to improving the performance of AIF techniques that code the filter coefficients individually. To overcome this obstacle, a parametric interpolation filter (PIF) is proposed for MCP, which represents interpolation filters by a function determined by five parameters instead of by individual coefficients. The function is designed based on the fact that the high-frequency energy of HD video sources is mainly distributed along the vertical and horizontal directions; the parameters are calculated to minimize the energy of the prediction error. The experimental results show that PIF significantly outperforms the existing AIF techniques and approaches the efficiency of the optimal filter.

BibTeX entry:
@ARTICLE{2010b_dong_tcsvt,
author = {J. Dong and K. N. Ngan},
title = {Parametric interpolation filter for {HD} video coding},
journal = {IEEE Transactions on Circuits and Systems for Video Technology},
year = {2010},
volume = {20},
pages = {1892-1897},
number = {12},
month = {Dec.}
}
5 Jie Dong and King Ngi Ngan, "Real-time de-interlacing for H.264 coded HD videos," IEEE Transactions on Circuits and Systems for Video Technology, Vol.20, No.8, pp.1144-1149, Aug. 2010. [Abstract] [BibTeX] [Full Text]

Abstract: It is very challenging to de-interlace HD videos in real time, as both good visual quality and low complexity should be fulfilled, which, however, are conflicting goals. To resolve the conflict, a syntax-based de-interlacer is proposed specifically for H.264-coded videos, in which the values of the syntax elements (SE) in H.264 bitstreams, such as the macroblock (MB) type, intra prediction modes, and motion vectors (MV), are used to detect edges and motion, estimate the MV, and select proper local de-interlacing algorithms accordingly. A verification mechanism is also proposed to ensure the referred SE values are reliable. The experimental results show the proposed de-interlacer provides better visual quality than common ones and can de-interlace 1080i sequences in real time on PCs.

BibTeX entry:
@ARTICLE{2010a_dong_tcsvt,
author = {J. Dong and K. N. Ngan},
title = {Real-time de-interlacing for {H}.264 coded {HD} videos},
journal = {IEEE Transactions on Circuits and Systems for Video Technology},
year = {2010},
volume = {20},
pages = {1144-1149},
number = {8},
month = {Aug.}
}
6 Jie Dong, King Ngi Ngan, Chi Keung Fong, and Wai Kuen Cham, "2D order-16 integer transforms for HD video coding," IEEE Transactions on Circuits and Systems for Video Technology, Vol.19, No.10, pp.1462-1474, Oct. 2009. [Abstract] [BibTeX] [Full Text] [Comments from Wong and Siu]

Abstract: In this paper, the spatial properties of high-definition (HD) videos are investigated based on a large set of HD video sequences. Compared with lower-resolution videos, the prediction errors of HD videos have higher correlation. Hence, we propose using 2-D order-16 transforms for HD video coding, which are expected to exploit this spatial property more efficiently, and specifically propose two types of 2-D order-16 integer transforms: the nonorthogonal integer cosine transform (ICT) and the modified ICT. The former resembles the discrete cosine transform (DCT) and is approximately orthogonal; the transform error introduced by its nonorthogonality is proven to be negligible. The latter modifies the structure of the DCT matrix and is inherently orthogonal, no matter what the values of the matrix elements are. Both types allow selecting matrix elements more freely by relaxing the orthogonality constraint and can provide performance comparable with that of the DCT. Each type is integrated into the audio video coding standard (AVS) enhanced profile and the H.264 high profile, respectively, and used adaptively as an alternative to the 2-D order-8 transform according to local activities. At the same time, much effort has been devoted to further reducing the complexity of the 2-D order-16 transforms; for the modified ICT in particular, a fast algorithm is developed and extended to a universal approach. Experimental results show that the 2-D order-16 transforms provide significant performance improvement for both the AVS enhanced profile and the H.264 high profile, which means they can be efficient coding tools, especially for HD video coding.
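The orthogonality property these designs negotiate can be checked directly at small scale on the order-4 integer cosine transform used in H.264/AVC, shown here only as a well-known analogue of the order-16 case:

```python
# The order-4 integer cosine transform matrix from H.264/AVC.
C = [
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
]

def gram(m):
    """Compute M * M^T; a diagonal result means the rows are orthogonal."""
    n = len(m)
    return [[sum(m[i][k] * m[j][k] for k in range(n)) for j in range(n)]
            for i in range(n)]

g = gram(C)
```

Every off-diagonal entry of `g` is zero, so the integer matrix is orthogonal up to a per-row scaling (the diagonal 4, 10, 4, 10), which in practice is absorbed into quantization; the order-16 MICT keeps this structural orthogonality while freeing the element values.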

BibTeX entry:
@ARTICLE{2009_dong_tcsvt,
author = {J. Dong and K. N. Ngan and C. K. Fong and W. K. Cham},
title = {2{D} order-16 integer transforms for {HD} video coding},
journal = {IEEE Transactions on Circuits and Systems for Video Technology},
year = {2009},
volume = {19},
pages = {1462-1474},
number = {10},
month = {Oct.}
}
7 Cixun Zhang, Lu Yu, Jian Lou, Wai Kuen Cham, and Jie Dong, "The technique of prescaled integer transform: concept, design and applications," IEEE Transactions on Circuits and Systems for Video Technology, Vol. 18, No. 1, pp. 84-97, Jan. 2008. [Abstract] [BibTeX] [Full Text]

Abstract: The integer cosine transform (ICT) is adopted by H.264/AVC for its bit-exact implementation and significant complexity reduction compared to the discrete cosine transform (DCT), with an impact on peak signal-to-noise ratio (PSNR) of less than 0.02 dB. In this paper, a new technique, named the prescaled integer transform (PIT), is proposed. With PIT, while all the merits of ICT are kept, the implementation complexity of the decoder is further reduced compared to the corresponding conventional ICT, which is especially important and beneficial for implementation on low-end processors. Since not all PIT kernels are good with respect to coding efficiency, design rules that lead to good PIT kernels are considered in this paper. Different types of PIT and their target applications are examined. Both fixed block-size transform and adaptive block-size transform (ABT) schemes of PIT are also studied. Experimental results show that no penalty in performance is observed with PIT when the kernels employed are derived from the design rules. Up to 0.2 dB of improvement in PSNR for all-intra frame coding compared to H.264/AVC can be achieved, and the subjective quality is also slightly improved when the PIT scheme is carefully designed. Using the same concept, a variation of PIT, the post-scaled integer transform, can also potentially be designed to simplify the encoder in some special applications. PIT has been adopted in the audio video coding standard (AVS), China's national coding standard.

BibTeX entry:
@ARTICLE{2008_zhang_tcsvt,
author = {C. Zhang and L. Yu and J. Lou and W. K. Cham and J. Dong},
title = {The technique of prescaled integer transform: concept, design and applications},
journal = {IEEE Transactions on Circuits and Systems for Video Technology},
year = {2008},
volume = {18},
pages = {84-97},
number = {1},
month = {Jan.}
}
8 Feng Yi, Qichao Sun, Jie Dong, and Lu Yu, "Low-complexity tools in AVS Part 7," Journal of Computer Science and Technology, Vol. 21, No. 3, pp. 345-353, May 2006. [Abstract] [BibTeX] [Full Text]

Abstract: The Audio Video coding Standard (AVS) is established by the AVS Working Group of China. The main goal of AVS Part 7 is to provide high compression performance with relatively low complexity for mobility applications. There are three main low-complexity tools: the deblocking filter, context-based adaptive 2D-VLC, and direct intra prediction. These tools are presented and analyzed in turn. Finally, we compare the performance and decoding speed of AVS Part 7 and the H.264 Baseline Profile. The analysis and results indicate that AVS Part 7 achieves similar performance at lower cost.

BibTeX entry:
@ARTICLE{2006_yi_jcst,
author = {F. Yi and Q. Sun and J. Dong and L. Yu},
title = {Low-complexity tools in {AVS} {P}art 7},
journal = {Journal of Computer Science and Technology},
year = {2006},
volume = {21},
pages = {345-353},
number = {3},
month = {May}
}

Conference Papers

1 Yuwen He, Yan Ye, and Jie Dong, "Robust 3D LUT estimation method for SHVC color gamut scalability," 2014 IEEE Visual Communications and Image Processing (VCIP2014), Valletta, Malta, Dec.7-10, 2014. [Abstract] [BibTeX] [Full Text]

Abstract: Color gamut scalability (CGS) in the scalable extensions of High Efficiency Video Coding (SHVC) supports scalable coding with multiple layers in different color spaces. A base layer conveying HDTV video in the BT.709 color space with an enhancement layer conveying UHDTV video in the BT.2020 color space is identified as a practical use case for CGS. Efficient CGS coding can be achieved using a 3D look-up table (LUT) based color conversion process. This paper proposes a robust 3D LUT parameter estimation method that estimates the 3D LUT parameters globally using the least squares method. The problems of matrix sparsity and uneven sample distribution are carefully handled to improve the stability and accuracy of the estimation process. Simulation results confirm that the proposed 3D LUT estimation method significantly improves coding performance compared with other gamut conversion methods.
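Once the LUT parameters are estimated, applying the color conversion amounts to trilinear interpolation over the lattice. A minimal sketch (the function name and the scalar-valued LUT are hypothetical simplifications; a real CGS LUT maps an RGB triplet to a triplet):

```python
def trilinear_lookup(lut, size, r, g, b):
    """Map one sample through a size^3 color LUT with trilinear interpolation.

    Inputs lie in [0, 1]; lut[i][j][k] stores the converted value at
    that lattice node. Scalar output is used here for brevity.
    """
    def locate(v):
        x = v * (size - 1)
        lo = min(int(x), size - 2)  # clamp so lo + 1 stays on the lattice
        return lo, x - lo
    (ri, rf), (gi, gf), (bi, bf) = locate(r), locate(g), locate(b)
    acc = 0.0
    # Blend the 8 surrounding lattice nodes with trilinear weights
    for dr, wr in ((0, 1 - rf), (1, rf)):
        for dg, wg in ((0, 1 - gf), (1, gf)):
            for db, wb in ((0, 1 - bf), (1, bf)):
                acc += wr * wg * wb * lut[ri + dr][gi + dg][bi + db]
    return acc

# A LUT that stores each node's red coordinate: the lookup reproduces r.
n = 5
lut = [[[i / (n - 1) for _ in range(n)] for _ in range(n)] for i in range(n)]
out = trilinear_lookup(lut, n, 0.3, 0.9, 0.1)
```

The estimation problem in the paper is the inverse: choosing the node values so that this interpolated output best matches the target-gamut samples in the least-squares sense.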

BibTeX entry:
@CONFERENCE{2014_he_vcip,
author = {Y. He and Y. Ye and J. Dong},
title = {Robust 3{D} {LUT} Estimation Method for {SHVC} Color Gamut Scalability},
booktitle = {Proceedings of 2014 IEEE Visual Communications and Image Processing (VCIP2014)},
year = {2014},
address = {Valletta, Malta},
month = {Dec.}
}
2 Thorsten Laude, Xiaoyu Xiu, Jie Dong, Yuwen He, Yan Ye, and Jörn Ostermann, "Scalable extension of HEVC using enhanced inter-layer prediction," 2014 IEEE International Conference on Image Processing (ICIP2014), Paris, France, Oct.27-30, 2014. [Abstract] [BibTeX] [Full Text]

Abstract: In Scalable High Efficiency Video Coding (SHVC), inter-layer prediction efficiency may be degraded because much high-frequency information can be removed during 1) the down-sampling/up-sampling process and 2) the base layer coding/quantization process. In this paper, we present a method to enhance the quality of the inter-layer reference (ILR) picture by combining the high-frequency information from enhancement layer temporal reference pictures with the low-frequency information from the up-sampled base layer picture. Experimental results show that on average a 3.9% weighted BD-rate gain is achieved compared to SHM-2.0 under the SHVC common test conditions.

BibTeX entry:
@CONFERENCE{2014_laude_icip,
author = {T. Laude and X. Xiu and J. Dong and Y. He and Y. Ye and J. Ostermann},
title = {Scalable extension of {HEVC} using enhanced inter-layer prediction},
booktitle = {Proceedings of 2014 IEEE International Conference on Image Processing (ICIP2014)},
year = {2014},
address = {Paris, France},
month = {Oct.}
}
3 Srinivas Gudumasu, Yuwen He, Yan Ye, Yong He, Eun-Seok Ryu, Jie Dong, and Xiaoyu Xiu, "Real-time SHVC software decoding with multi-threaded parallel processing," 2014 Applications of Digital Image Processing XXXVII, San Diego, USA, Aug. 18-21, 2014. [Abstract] [BibTeX] [Full Text]

Abstract: This paper proposes a parallel decoding framework for scalable HEVC (SHVC). Various optimization technologies are implemented on the basis of the SHVC reference software SHM-2.0 to achieve real-time decoding speed for the two-layer spatial scalability configuration. The SHVC decoder complexity is analyzed with profiling information. The decoding process at each layer and the up-sampling process are designed in parallel and scheduled by a high-level application task manager. Within each layer, multi-threaded decoding is applied to accelerate the layer decoding speed. Entropy decoding, reconstruction, and in-loop processing are pipelined with multiple threads based on groups of coding tree units (CTU). A group of CTUs is treated as a processing unit in each pipeline stage to achieve a better trade-off between parallelism and synchronization. The motion compensation, inverse quantization, and inverse transform modules are further optimized with SSE4 SIMD instructions. Simulations on a desktop with an Intel i7-2600 processor running at 3.4 GHz show that the parallel SHVC software decoder is able to decode 1080p spatial 2x at up to 60 frames per second (fps) and 1080p spatial 1.5x at up to 50 fps for bitstreams generated under the SHVC common test conditions of the JCT-VC standardization group. The decoding performance at various bitrates with different optimization technologies and different numbers of threads is compared in terms of decoding speed and resource usage, including processor and memory.

BibTeX entry:
@CONFERENCE{2014_gudumasu_spie,
author = {S. Gudumasu and Y. He and Y. Ye and Y. He and E.-S. Ryu and J. Dong and X. Xiu},
title = {Real-time {SHVC} software decoding with multi-threaded parallel processing},
booktitle = {Proceedings of 2014 Applications of Digital Image Processing XXXVII},
year = {2014},
address = {San Diego, USA},
month = {Aug.}
}
4 Chia-Ming Tsai, Yuwen He, Jie Dong, Yan Ye, Xiaoyu Xiu, and Yong He, "Joint-layer encoder optimization for HEVC scalable extensions," 2014 Applications of Digital Image Processing XXXVII, San Diego, USA, Aug. 18-21, 2014. [Abstract] [BibTeX] [Full Text]

Abstract: Scalable video coding provides an efficient solution for supporting video playback on heterogeneous devices under various channel conditions in heterogeneous networks. SHVC is the latest scalable video coding standard, based on the HEVC standard. To improve enhancement layer coding efficiency, inter-layer prediction, including texture and motion information generated from the base layer, is used for enhancement layer coding. However, the overall performance of the SHVC reference encoder is not fully optimized, because the rate-distortion optimization (RDO) processes in the base and enhancement layers are considered independently. It is difficult to directly extend existing joint-layer optimization methods to SHVC due to the complicated coding tree block splitting decisions and the in-loop filtering process (e.g., de-blocking and sample adaptive offset (SAO) filtering) in HEVC. To solve these problems, a joint-layer optimization method is proposed that adjusts the quantization parameter (QP) to optimally allocate the bit resource between layers. Furthermore, to allocate resources more appropriately, the proposed method also considers the viewing probability of the base and enhancement layers according to the packet loss rate. Based on the viewing probability, a novel joint-layer R-D cost function is proposed for joint-layer RDO encoding. The QP values of those coding tree units (CTUs) belonging to lower layers referenced by higher layers are decreased accordingly, and the QP values of the remaining CTUs are increased to keep the total bits unchanged. Finally, the QP values with the minimal joint-layer R-D cost are selected to match the viewing probability. The proposed method was applied to the third temporal level (TL-3) pictures in the Random Access configuration. Simulation results demonstrate that the proposed joint-layer optimization method improves coding performance by 1.3% for these TL-3 pictures compared to the SHVC reference encoder without joint-layer optimization.

BibTeX entry:
@CONFERENCE{2014_tsai_spie,
author = {C.-M. Tsai and Y. He and J. Dong and Y. Ye and X. Xiu and Y. He},
title = {Joint-layer encoder optimization for {HEVC} scalable extensions},
booktitle = {Proceedings of 2014 Applications of Digital Image Processing XXXVII},
year = {2014},
address = {San Diego, USA},
month = {Aug.}
}
5 Thorsten Laude, Xiaoyu Xiu, Jie Dong, Yuwen He, Yan Ye, and Jörn Ostermann, "Improved inter-layer prediction for the scalable extensions of HEVC," 2014 Data Compression Conference (DCC 2014), Snowbird, USA, Mar. 26-28, 2014. [Abstract] [BibTeX] [Full Text]

Abstract: In Scalable High Efficiency Video Coding (SHVC), inter-layer prediction efficiency may be degraded because high-frequency information is removed in 1) the down-sampling/up-sampling process and 2) the base layer coding/quantization process. In this paper, we present a method to enhance the quality of the inter-layer reference (ILR) picture by combining the high-frequency information from enhancement layer temporal reference pictures with the low-frequency information from the up-sampled base layer picture. Experimental results show that average BD-rate gains of 2.7%, 7.1%, and 8.1% for the Y, U, and V components, respectively, are achieved compared to SHM-2.0 under the SHVC common test conditions.

BibTeX entry:
@CONFERENCE{2014_laude_dcc,
author = {T. Laude and X. Xiu and J. Dong and Y. He and Y. Ye and J. Ostermann},
title = {Improved inter-layer prediction for the scalable extensions of {HEVC}},
booktitle = {Proceedings of 2014 Data Compression Conference (DCC 2014)},
year = {2014},
address = {Snowbird, USA},
month = {Mar.}
}
6 Jie Dong, Yan Ye, and Yuwen He, "Cross-plane chroma enhancement for SHVC inter-layer prediction," 30th Picture Coding Symposium (PCS2013), San Jose, USA, Dec. 8-10, 2013. [Abstract] [BibTeX] [Slides] [Full Text]

Abstract: This paper proposes a cross-plane chroma enhancement (CPCE) scheme to enhance the chroma planes of the inter-layer reference (ILR) pictures for the scalable extensions of HEVC (SHVC), the ongoing scalable video coding project in JCT-VC. The CPCE scheme restores the blurred edges and textures in the chroma planes using the corresponding information from the luma plane. Experimental results under the SHVC common test conditions show average BD-rate reductions for the Cb and Cr chroma planes of as much as 7.5% and 8.5%, respectively, compared with SHM-1.0.

BibTeX entry:
@CONFERENCE{2013_dong_pcs,
author = {J. Dong and Y. Ye and Y. He},
title = {Cross-plane chroma enhancement for {SHVC} inter-layer prediction},
booktitle = {Proceedings of 2013 Picture Coding Symposium (PCS2013)},
year = {2013},
address = {San Jose, USA},
month = {Dec.}
}
7 Jie Dong and Yan Ye, "Adaptive downsampling for high-definition video coding," 2012 IEEE International Conference on Image Processing (ICIP2012), Orlando, USA, Sept. 30-Oct. 3, 2012. [Abstract] [BibTeX] [Poster] [Full Text]

Abstract: Previous research has shown that downsampling prior to encoding and upsampling after decoding can improve the rate-distortion (R-D) performance compared with directly coding the original video using standard coding technologies, e.g., JPEG and H.264/AVC, especially at low bit rates. This paper proposes a practical algorithm to find the optimal downsampling ratio that balances the distortions caused by downsampling and coding, thus achieving the overall optimal R-D performance. Given the optimal sampling ratio, dedicated filters for down- and up-sampling are also designed. Simulations show this algorithm improves the R-D performance over a wide range of bit rates, e.g., from 1.0 dB at high bit rates to 2.5 dB at low bit rates.
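The resampling distortion (one of the two terms the algorithm balances against coding distortion) can be probed with a toy round-trip chain. The 2×2 averaging and nearest-neighbour filters below are simple stand-ins for the dedicated filters designed in the paper:

```python
import numpy as np

def downsample2(img):
    """2:1 downsampling by 2x2 averaging (a simple stand-in for the
    dedicated down-sampling filter designed in the paper)."""
    h, w = img.shape
    return img[:h - h % 2, :w - w % 2].reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def upsample2(img):
    """2x nearest-neighbour upsampling (again a simple stand-in)."""
    return np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)

def psnr(ref, rec, peak=255.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((ref - rec) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak ** 2 / mse)

# A smooth (low-frequency) picture survives the 2:1 round trip almost
# unchanged, which is why downsampling helps at low bit rates: the rate
# saved outweighs the resampling distortion.
ramp = np.tile(np.linspace(0.0, 255.0, 64), (64, 1))
print(psnr(ramp, upsample2(downsample2(ramp))))
```

For detail-rich content the same round trip loses much more, which is what pushes the optimal ratio toward full resolution at high bit rates.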

BibTeX entry:
@CONFERENCE{2012_dong_icip,
author = {J. Dong and Y. Ye},
title = {Adaptive downsampling for high-definition video coding},
booktitle = {Proceedings of 2012 IEEE International Conference on Image Processing (ICIP2012)},
year = {2012},
address = {Orlando, USA},
month = {Oct.}
}
8 Jie Dong and Yan Ye, "Core transform design for high efficiency video coding," 2012 Applications of Digital Image Processing XXXV, San Diego, USA, Aug. 12-16, 2012. [Abstract] [BibTeX] [Slides] [Full Text]

Abstract: High Efficiency Video Coding (HEVC) is the next generation video coding standard currently being developed by the Joint Collaborative Team on Video Coding (JCT-VC). It employs various coding unit sizes of 2^K x 2^K, where K is a positive integer with typical values from 3 to 6; it also uses transform sizes up to 32x32. This raises interest in high-performance higher-order integer transforms with low computational requirements. This paper presents approaches to designing order-N (N = 4, 8, 16, 32) integer transforms whose special symmetry structures enable matrix factorization. The proposed set of high-order integer transforms with well-selected elements demonstrates excellent coding performance compared with the core transform design in HEVC.
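The symmetry structure such designs exploit is easiest to see on the order-4 case. The matrix below is the HEVC order-4 core transform, shown here only to illustrate the general properties (even rows symmetric, odd rows antisymmetric, rows orthogonal with nearly equal norms):

```python
import numpy as np

# Order-4 integer core transform of HEVC (rows are basis vectors).
G = np.array([
    [64,  64,  64,  64],
    [83,  36, -36, -83],
    [64, -64, -64,  64],
    [36, -83,  83, -36],
])

# Even-indexed rows are symmetric and odd-indexed rows antisymmetric,
# the structure that enables the even-odd (butterfly) factorization.
assert all((G[i] == G[i, ::-1]).all() for i in (0, 2))
assert all((G[i] == -G[i, ::-1]).all() for i in (1, 3))

# Near-orthogonality: G @ G.T is close to a scaled identity, i.e. the
# rows are mutually orthogonal with (almost) equal norms.
gram = G @ G.T
assert np.all(gram - np.diag(np.diag(gram)) == 0)  # exactly orthogonal rows
print(np.diag(gram))  # → [16384 16370 16384 16370]
```

The slight spread in row norms is absorbed by the quantization scaling, so the integer matrix can be applied with pure integer arithmetic.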

BibTeX entry:
@CONFERENCE{2012_dong_spie,
author = {J. Dong and Y. Ye},
title = {Core transform design for high efficiency video coding},
booktitle = {Proceedings of 2012 Applications of Digital Image Processing XXXV},
year = {2012},
address = {San Diego, USA},
month = {Aug.}
}
9 Jie Dong and King Ngi Ngan, "Adaptive pre-interpolation filter for motion-compensated prediction," 2011 IEEE International Symposium on Circuits and Systems (ISCAS2011), Rio de Janeiro, Brazil, May 15-18, 2011. [Abstract] [BibTeX] [Slides] [Full Text]

Abstract: The proposed interpolation filter comprises two concatenated filters: the adaptive pre-interpolation filter (APIF) and the normative interpolation filter in H.264/AVC. The former is applied only to the integer pixels in the reference frames; the latter generates all the sub-position samples, supported by the output of APIF. The convolution of APIF and the standard filter minimizes the motion prediction error on a frame basis. APIF preserves the merits of the adaptive interpolation filter (AIF) and the adaptive loop filter (ALF) in the key technical area (KTA) software while overcoming their drawbacks. The experimental results show that APIF achieves performance comparable to or better than the joint use of AIF and ALF.
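The equivalence of the two-filter cascade to a single convolved filter, which the frame-level optimization relies on, is just associativity of convolution. The APIF coefficients below are invented for illustration (the real filter is derived per frame by minimizing the prediction error); the 6-tap filter is the H.264/AVC normative half-pel filter:

```python
import numpy as np

# H.264/AVC normative 6-tap half-pel interpolation filter (normalized).
h264_halfpel = np.array([1, -5, 20, 20, -5, 1]) / 32.0

# A hypothetical adaptive pre-interpolation filter; coefficients are
# made up here purely to demonstrate the cascade property.
apif = np.array([-0.05, 1.10, -0.05])

x = np.random.default_rng(0).standard_normal(64)  # one line of reference pixels

# Filtering with APIF first and the standard filter second ...
cascade = np.convolve(np.convolve(x, apif, mode="full"), h264_halfpel, mode="full")
# ... equals filtering once with the convolution of the two filters.
single = np.convolve(x, np.convolve(apif, h264_halfpel), mode="full")
assert np.allclose(cascade, single)
```

This is why optimizing APIF alone can shape the overall (combined) interpolation response without touching the normative filter.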

BibTeX entry:
@CONFERENCE{2011_dong_iscas,
author = {J. Dong and K. N. Ngan},
title = {Adaptive pre-interpolation filter for motion-compensated prediction},
booktitle = {Proceedings of 2011 IEEE International Symposium on Circuits and Systems (ISCAS2011)},
year = {2011},
address = {Rio de Janeiro, Brazil},
month = {May}
}
10 Jie Dong and King Ngi Ngan, "Parametric interpolation filter for motion compensated prediction," 2009 IEEE International Conference on Image Processing (ICIP2009), Cairo, Egypt, Nov. 7-10, 2009. [Abstract] [BibTeX] [Poster] [Full Text] [Errata]

Abstract: Recently, the adaptive interpolation filter (AIF) has received increasing attention for motion-compensated prediction (MCP). Existing methods code the filter coefficients individually, so coefficient accuracy and side-information size are in conflict. This paper studies the trade-off between these two conflicting aspects and proposes the parametric interpolation filter (PIF), which represents filters by five parameters instead of individual coefficients and approximates the optimal filter by tuning the parameters. The experimental results show that PIF approaches the efficiency of the optimal filter and outperforms the related work.

BibTeX entry:
@CONFERENCE{2009_dong_icip,
author = {J. Dong and K. N. Ngan},
title = {Parametric interpolation filter for motion compensated prediction},
booktitle = {Proceedings of 2009 IEEE International Conference on Image Processing (ICIP2009)},
year = {2009},
address = {Cairo, Egypt},
month = {Nov.}
}
11 Jie Dong and King Ngi Ngan, "An adaptive and parallel scheme for HD video de-interlacing," 2008 IEEE International Conference on Multimedia and Expo (ICME2008), Hannover, Germany, Jun. 23-26, 2008. [Abstract] [BibTeX] [Slides] [Full Text]

Abstract: It is very challenging to de-interlace HD videos in real time, as both high efficiency and low complexity must be fulfilled, which, however, are conflicting goals. This paper presents a de-interlacer that resolves this conflict specifically for H.264-coded videos. It adapts to spatially and temporally local activities by making full use of the syntax element (SE) values in bitstreams, which provide many hints about the motion and texture of video sequences. Accuracy analysis is also introduced to deal with the disparity between the SE values and the real motion and texture. The experimental results show the proposed de-interlacer provides better visual quality than common ones and can de-interlace 1080i sequences in real time on PCs.
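A minimal pixel-domain sketch of the motion-adaptive idea is shown below. Note the paper instead drives the adaptation with H.264 syntax elements decoded from the bitstream, which avoids exactly this kind of per-pixel motion measurement; the threshold and difference test here are illustrative assumptions:

```python
import numpy as np

def motion_adaptive_deinterlace(top_field, prev_frame, thresh=8.0):
    """Fill the missing lines of a top field: weave (copy from the previous
    frame) where the scene is locally static, otherwise bob (average the
    lines above and below). The pixel-difference motion measure is a crude
    stand-in for the syntax-element-based analysis in the paper."""
    h2, w = top_field.shape               # a field has half the frame height
    frame = np.empty((2 * h2, w))
    frame[0::2] = top_field               # even lines come from the field
    for y in range(1, 2 * h2, 2):         # reconstruct the odd lines
        above = frame[y - 1]
        below = frame[y + 1] if y + 1 < 2 * h2 else frame[y - 1]
        spatial = (above + below) / 2.0   # bob
        temporal = prev_frame[y]          # weave
        motion = np.abs(above - prev_frame[y - 1])
        frame[y] = np.where(motion < thresh, temporal, spatial)
    return frame
```

On a fully static scene the weave branch is always taken and the previous frame is reproduced exactly; on moving regions the spatial branch avoids combing artifacts.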

BibTeX entry:
@CONFERENCE{2008_dong_icme,
author = {J. Dong and K. N. Ngan},
title = {An adaptive and parallel scheme for {HD} video de-interlacing},
booktitle = {Proceedings of 2008 IEEE International Conference on Multimedia and Expo (ICME2008)},
year = {2008},
address = {Hannover, Germany},
month = {Jun.}
}
12 Jie Dong, King Ngi Ngan, Chi Keung Fong, and Wai Kuen Cham, "A universal approach to developing fast algorithm for simplified order-16 ICT," 2007 IEEE International Symposium on Circuits and Systems (ISCAS2007), New Orleans, USA, May 27-30, 2007. [Abstract] [BibTeX] [Slides] [Full Text]

Abstract: The simplified order-16 Integer Cosine Transform (ICT) has proven to be an efficient coding tool, especially for High-Definition (HD) video coding, and is much simpler than the ICT and the Discrete Cosine Transform (DCT). To further reduce the computational complexity, this paper proposes a universal approach to developing a fast algorithm mainly for, but not restricted to, the simplified order-16 ICT. The fast algorithm developed by the proposed approach involves only additions and shifts and can save about 90% of the computational time compared with matrix multiplication.

BibTeX entry:
@CONFERENCE{2007_dong_iscas,
author = {J. Dong and K. N. Ngan and C. K. Fong and W. K. Cham},
title = {A universal approach to developing fast algorithm for simplified order-16 {ICT}},
booktitle = {Proceedings of 2007 IEEE International Symposium on Circuits and Systems (ISCAS2007)},
year = {2007},
address = {New Orleans, USA},
month = {May}
}
13 Jie Dong and King Ngi Ngan, "16×16 integer cosine transform for HD video coding," the Seventh IEEE Pacific Rim Conference on Multimedia (PCM2006), Hangzhou, China, Nov. 2-4, 2006. [Abstract] [BibTeX] [Slides] [Full Text]

Abstract: High-Definition (HD) videos often contain rich details as well as large homogeneous regions. To exploit such a property, Variable Block-size Transforms (VBT) should be in place so that the transform block size can adapt to local activities. In this paper, we propose a 16×16 Integer Cosine Transform (ICT) for HD video coding, which is simple and efficient. This 16×16 ICT is integrated into the AVS Zengqiang Profile and used adaptively as an alternative to the 8×8 ICT. Experimental results show that the 16×16 transform can be a very efficient coding tool, especially for HD video coding.

BibTeX entry:
@CONFERENCE{2006_dong_pcm,
author = {J. Dong and K. N. Ngan},
title = {16$\times$16 integer cosine transform for {HD} video coding},
booktitle = {Proceedings of the Seventh IEEE Pacific-Rim Conference on Multimedia (PCM2006)},
year = {2006},
address = {Hangzhou, China},
month = {Nov.}
}
14 Jie Dong, Jian Lou, Cixun Zhang, and Lu Yu, "A new approach to compatible adaptive block-size transforms," 2005 Visual Communications and Image Processing (VCIP2005), Beijing, China, Jul. 23-25, 2005. [Abstract] [BibTeX] [Slides] [Full Text]

Abstract: Adaptive Block-size Transforms (ABT) have been widely used in image/video coding, since they exploit the maximum feasible signal length for transform coding. However, if the transforms in an ABT coding system are Integer Cosine Transforms (ICT), not only separate transform units but also different scaling matrices are required, which consume a vast amount of resources in practical implementations. In this paper, a new approach to compatible ABT is presented, by which the 8×8, 8×4, 4×8 and 4×4 transforms can be processed in one transform unit. Furthermore, with the Pre-scaled Integer Transform (PIT), the compatibility of the scaling matrices, especially for the 8×4 and 4×8 ICT, can be achieved, and only a single scaling matrix is required. Simulation results and analysis reveal that this approach greatly saves hardware resources and makes the implementation of ABT much easier without loss of performance.

BibTeX entry:
@CONFERENCE{2005_dong_vcip,
author = {J. Dong and J. Lou and C. Zhang and L. Yu},
title = {A new approach to compatible adaptive block-size transforms},
booktitle = {Proceedings of SPIE Visual Communication and Image Process (VCIP2005)},
year = {2005},
address = {Beijing, China},
month = {Jul.}
}
15 Lu Yu, Feng Yi, Jie Dong, and Cixun Zhang, "Overview of AVS-Video: tools, performance and complexity," 2005 Visual Communications and Image Processing (VCIP2005), Beijing, China, Jul. 23-25, 2005. [Abstract] [BibTeX] [Full Text]

Abstract: The Audio Video coding Standard (AVS) is established by the Chinese working group of the same name. AVS-video is an application-driven coding standard: AVS Part 2 targets high-definition digital video broadcasting and high-density storage media, while AVS Part 7 targets low-complexity, low-resolution mobile applications. Integer transform, intra- and inter-picture prediction, in-loop deblocking filtering and context-based two-dimensional variable length coding are the major compression tools in AVS-video, which are well tuned for the target applications. It achieves performance similar to H.264/AVC at lower cost.

BibTeX entry:
@CONFERENCE{2005_yu_vcip,
author = {L. Yu and F. Yi and J. Dong and C. Zhang},
title = {Overview of {AVS-V}ideo: tools, performance and complexity},
booktitle = {Proceedings of SPIE Visual Communication and Image Process (VCIP2005)},
year = {2005},
address = {Beijing, China},
month = {Jul.}
}
16 Dianfu Li, Lu Yu, and Jie Dong, "A decoder architecture for advanced video coding standard," 2005 Visual Communications and Image Processing (VCIP2005), Beijing, China, Jul. 23-25, 2005. [Abstract] [BibTeX] [Full Text]

Abstract: In this paper, we describe a VLSI video decoder architecture for AVS (Audio Video coding Standard). The system architecture, as well as the design of the major function-specific processing units (Variable Length Decoder, Deblocking Filter), is discussed. Analyzing the architecture of the decoder system and the features of each processing unit, we develop a system controller combining centralized and decentralized control schemes, which provides highly efficient communication between the processing units and minimizes the size of the interconnection buffers. A bus-arbitration algorithm named the Token Ring algorithm is designed to control the allocation of the SDRAM bus. This algorithm avoids conflicts on the bus and reduces the internal buffer size, and its control logic is simple. Our simulation shows that this architecture can meet the requirement of real-time decoding at AVS Jizhun Profile, Level 4.0, without a high cost in hardware and clock rate. Moreover, some design ideas in the AVS decoder can be extended to H.264 because of the similarity between the two video coding standards.

BibTeX entry:
@CONFERENCE{2005_li_vcip,
author = {D. Li and L. Yu and J. Dong},
title = {A decoder architecture for advanced video coding standard},
booktitle = {Proceedings of SPIE Visual Communication and Image Process (VCIP2005)},
year = {2005},
address = {Beijing, China},
month = {Jul.}
}
17 Cixun Zhang, Jian Lou, Lu Yu, Jie Dong, and Wai Kuen Cham, "The techniques of pre-scaled integer transform," 2005 IEEE International Symposium on Circuits and Systems (ISCAS2005), Kobe, Japan, May 23-26, 2005. [Abstract] [BibTeX] [Full Text]

Abstract: The Integer Cosine Transform (ICT) is adopted by H.264/AVC for its bit-exact implementation and significant complexity reduction compared to the Discrete Cosine Transform (DCT), with an impact on peak signal-to-noise ratio (PSNR) of less than 0.02 dB. In this paper, a new technique, named the Pre-Scaled Integer Transform (PIT), is proposed. With PIT, the implementation complexity is further reduced compared to the conventional ICT, especially for low-end processors, while all the merits of the ICT are kept. Extensive experiments show that no obvious penalty in performance is observed, but rather a slight gain in PSNR is obtained by using PIT when the integer transform matrix used meets certain requirements.
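The idea that PIT builds on, folding the transform's scaling into quantization, can be sketched with the H.264/AVC order-4 integer transform. This is a toy illustration of the principle, not the PIT algorithm itself:

```python
import numpy as np

# H.264/AVC-style order-4 integer transform (exact integer arithmetic).
C = np.array([
    [1,  1,  1,  1],
    [2,  1, -1, -2],
    [1, -1, -1,  1],
    [1, -2,  2, -1],
])

def forward_transform(block):
    """Core integer transform: implementable with adds/shifts, no scaling."""
    return C @ block @ C.T

# Per-coefficient scaling factors that make C equivalent to an orthonormal
# transform; the row norms of C are sqrt(4) and sqrt(10).
norms = np.sqrt((C ** 2).sum(axis=1))
E = np.outer(1.0 / norms, 1.0 / norms)

def quantize(coeffs, qstep):
    """The scaling E is folded into quantization, so the transform stage
    stays purely integer (the idea PIT pushes further, toward a single
    scaling matrix shared across block shapes)."""
    return np.round(coeffs * E / qstep)

block = np.arange(16, dtype=float).reshape(4, 4)
coeffs = forward_transform(block)
q = quantize(coeffs, qstep=8.0)

# The scaled coefficients match those of the orthonormal transform.
Cn = C / norms[:, None]
assert np.allclose(coeffs * E, Cn @ block @ Cn.T)
```

Because E only multiplies each coefficient, it costs nothing extra once merged with the quantizer's division.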

BibTeX entry:
@CONFERENCE{2005_zhang_iscas,
author = {C. Zhang and J. Lou and L. Yu and J. Dong and W. K. Cham},
title = {The techniques of pre-scaled integer transform},
booktitle = {Proceedings of 2005 IEEE International Symposium on Circuits and Systems (ISCAS2005)},
year = {2005},
address = {Kobe, Japan},
month = {May}
}
Back To Top

Patents

1 US 8548065, Parametric interpolation filter for motion compensated prediction
2 US 8165211, Method and apparatus of de-interlacing video
3 US 8228983, Method and device for order-16 integer transform from order-8 integer cosine transform
4 CN 2005100493122, Method and apparatus for encoding and decoding the intra prediction mode in video or image compression
5 CN 2005100493137, Method and apparatus for joint variable-length encoding and decoding in digital signal processing
6 CN 2005100490020, Method and apparatus for deblocking filtering of video or images
7 CN 2004100535826, Method and apparatus for quasi-energy-conserving transforms in video or image compression
8 CN 2004100254108, Method and apparatus for scanning transform coefficients in video or image compression
9 CN 2003101094845, Information-entropy-preserving encoding method and apparatus
10 CN 2003101094991, Information-entropy-preserving decoding method and apparatus
11 CN 2003101084701, Method and apparatus for scanning transform coefficient blocks in video encoding and decoding
Back To Top