Recently, many deep learning based researches are conducted to explore the potential quality improvement of compressed videos. These methods mostly utilize either the spatial or temporal information to perform frame-level video enhancement. However, they fail in combining different spatial-temporal information to adaptively utilize adjacent patches to enhance the current patch and achieve limited enhancement performance especially on scene-changing and strong-motion videos. To overcome these limitations, we propose a patch-wise spatial-temporal quality enhancement network which firstly extracts spatial and temporal features, then recalibrates and fuses the obtained spatial and temporal features. Specifically, we design a temporal and spatial-wise attention-based feature distillation structure to adaptively utilize the adjacent patches for distilling patch-wise temporal features. For adaptively enhancing different patch with spatial and temporal information, a channel and spatial-wise attention fusion block is proposed to achieve patch-wise recalibration and fusion of spatial and temporal features. Experimental results demonstrate our network achieves peak signal-to-noise ratio improvement, 0.55 - 0.69 dB compared with the compressed videos at different quantization parameters, outperforming state-of-the-art approach.
Download full-text PDF |
Source |
---|---|
http://dx.doi.org/10.1109/TIP.2021.3092949 | DOI Listing |
Enter search terms and have AI summaries delivered each week - change queries or unsubscribe any time!