In HEVC, deblocking filtering (DF) accounts for about 20% of the time consumed by video compression. In a typical parallel DF scheme, a set of horizontal and vertical edges is processed using deblocking filters. In conventional parallel DF schemes, deblocking filters may be applied to the same edge more than once. Moreover, some edges are assigned to cores for filtering even though they are not designated to be filtered. As a result, the parallel hardware architecture in use requires more on-chip memory modules. These inefficiencies increase computational complexity and degrade HEVC performance. In this paper, an optimized parallel DF scheme for HEVC is proposed using graphics processing units (GPUs). The proposed scheme outperforms competing schemes in reducing the decoding time of all frames of video sequences, with average speed-up factors of 2.83 and 2.45 in the all-intra and low-delay video coding configuration modes, respectively. The proposed scheme does not change the rate-distortion between the decoded video sequences and their original sequences.

Additional Metadata
Keywords: Deblocking filtering, GPU, HEVC, Parallel processing, Video coding
Persistent URL: dx.doi.org/10.1007/s11042-017-4876-6
Journal: Multimedia Tools and Applications
Citation: Fouad, M.M. (Mohamed M.), & Dansereau, R. (2017). An optimized parallel order scheme of the deblocking filtering process for enhancing the performance of the HEVC standard using GPUs. Multimedia Tools and Applications, 1–26. doi:10.1007/s11042-017-4876-6