4K Ultra HD Video Decoder for H.265/HEVC

Figure 1: Split system pipeline for HEVC decoder to save 24 KB of Coeff SRAM and address variable DRAM latency.

High Efficiency Video Coding (HEVC)[1], the latest video standard, uses larger, variable-sized coding units and longer interpolation filters than H.264/AVC to better exploit redundancy in video signals. These techniques enable a 50% decrease in bitrate at the cost of increased computational complexity, external memory bandwidth, and, for ASIC implementations, on-chip SRAM in the video codec. To handle all coding unit sizes from 8×8 to 64×64, we develop a variable-sized system pipeline[2]. The pipeline is split into two groups, one performing entropy decoding and the other performing all pixel processing, as shown in Figure 1. First-in, first-out (FIFO) buffering between the two groups lets the chip tolerate variable DRAM latency. Further, the FIFO is sized for transform units, which are smaller than coding units, saving 24 KB of SRAM. Similarly, the prediction engine processes pixels in 32×32 blocks, saving another 36 KB of SRAM. The HEVC inverse transform has 8× the logic complexity and 16× the memory complexity of its H.264/AVC counterpart, owing to 32×32 transforms and higher bit-precision multiplications. This work uses mathematical properties of the inverse transform to enable an area-efficient implementation based on multiple constant multiplications[3]. The prediction and inverse transform engines support all 25 prediction unit types, including symmetric and asymmetric partitions, and all 8 transform unit types of square and non-square sizes. A high-throughput cache and a DRAM-latency-aware memory map reduce the large DRAM bandwidth requirement by 67% and save 122 mW of DRAM access power. The chip is built for the HEVC Working Draft 4 Low Complexity configuration and occupies 1.77 mm² in 40 nm CMOS. It performs 4K Ultra HD 30 fps video decoding at 200 MHz while consuming 1.19 nJ/pixel of normalized system power.
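The throughput figures above can be cross-checked with a little arithmetic. The sketch below is illustrative only: it assumes a 3840×2160 frame for "4K Ultra HD" (the frame size is not stated in the text), and the function names are hypothetical. It derives the pixel rate, the average number of pixels the decoder must produce per clock cycle at 200 MHz, and the average power implied by the 1.19 nJ/pixel figure.

```python
# Back-of-the-envelope check of the decoder's throughput figures.
# Assumption: "4K Ultra HD" means a 3840x2160 frame (not stated in the text).
WIDTH, HEIGHT = 3840, 2160
FPS = 30                       # frames per second, from the text
CLOCK_HZ = 200e6               # 200 MHz clock, from the text
ENERGY_PER_PIXEL_J = 1.19e-9   # 1.19 nJ/pixel, from the text


def pixel_rate(width: int, height: int, fps: int) -> int:
    """Pixels that must be decoded per second."""
    return width * height * fps


def pixels_per_cycle(rate: float, clock_hz: float) -> float:
    """Average pixels the pipeline must produce each clock cycle."""
    return rate / clock_hz


def implied_power_w(rate: float, energy_per_pixel_j: float) -> float:
    """Average power implied by a per-pixel energy figure."""
    return rate * energy_per_pixel_j


rate = pixel_rate(WIDTH, HEIGHT, FPS)
print(f"pixel rate:       {rate / 1e6:.1f} MPixel/s")
print(f"pixels per cycle: {pixels_per_cycle(rate, CLOCK_HZ):.3f}")
print(f"implied power:    {implied_power_w(rate, ENERGY_PER_PIXEL_J) * 1e3:.0f} mW")
```

Under these assumptions the pixel rate is about 249 MPixel/s, which agrees with the figure in the title of reference [2], and the pipeline must sustain roughly 1.24 pixels per cycle on average.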

  1. G. J. Sullivan, J.-R. Ohm, W.-J. Han, and T. Wiegand, “Overview of the High Efficiency Video Coding (HEVC) standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 22, no. 12, pp. 1649-1668, Dec. 2012.
  2. C.-T. Huang, M. Tikekar, C. Juvekar, V. Sze, and A. Chandrakasan, “A 249MPixel/s HEVC video-decoder chip for Quad Full HD applications,” Digest of Technical Papers IEEE International Solid-State Circuits Conference (ISSCC), pp. 162-163, Feb. 2013.
  3. M. Potkonjak, M. B. Srivastava, and A. P. Chandrakasan, “Multiple constant multiplications: efficient and versatile framework and algorithms for exploring common subexpression elimination,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 15, no. 2, pp. 151-156, Feb. 1996.