Screen Content Video Quality Assessment: Subjective and Objective Study

Shan Cheng1, Huanqiang Zeng1, Jing Chen1, Junhui Hou2, Jianqing Zhu1, and Kai-Kuang Ma3

1School of Information Science and Engineering, Huaqiao University, Fujian, China

2Department of Computer Science, Hong Kong, China

3School of Electrical and Electronic Engineering, Nanyang Technological University, Singapore

Fig. 1. An illustration of the first frame of each of the 16 reference screen content videos (SCVs) in our SCVD, named: (a) "Online Education", (b) "Animation", (c) "Dynamic Charts", (d) "Boxes", (e) "Magazine", (f) "Mobile Phone", (g) "Game", (h) "Map", (i) "Comic & Video", (j) "News Broadcasting", (k) "Meeting", (l) "Slides", (m) "Software Installation", (n) "Word Documents", (o) "Weather", and (p) "Webpage".

Subjective Study: Proposed Screen Content Video Database (SCVD)

Fig. 2. The MOS distribution of all 800 distorted SCVs in our established SCVD, shown as (a) a scatter plot and (b) a histogram.

We construct the first large-scale video quality assessment (VQA) database specifically for SCVs, called the screen content video database (SCVD). The SCVD provides 16 reference SCVs, 800 distorted SCVs, and their corresponding subjective scores. The distorted SCVs are generated from each reference SCV with 10 distortion types and 5 degradation levels per type. Each distorted SCV is rated by at least 32 subjects in the subjective test. The 16 reference SCVs are created through screen recording with careful consideration of the special characteristics and application scenarios of SCVs. From the spatial perspective, they cover not only a diverse mixture of contents (e.g., text, graphics, symbols, patterns, and natural scenes) but also a wide variety of proportions and spatial distributions of discontinuous-tone and continuous-tone content. From the temporal perspective, they cover a wide range of motion activities, including motionless, moderate, and fast motion, abrupt screen changes, etc. From the application perspective, they cover various scenarios, such as online education, screen sharing, advertisement, live games, animation, and conferencing. All SCVs are 10 seconds long at a frame rate of 30 frames per second, and none contain audio. They are stored in raw, uncompressed, progressive-scan YUV format at a resolution of 1920×1080.
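The composition described above fixes the database size directly: 16 references × 10 distortion types × 5 levels. A tiny sketch makes the enumeration explicit (the file-name pattern below is purely hypothetical, not the SCVD's actual naming scheme):

```python
# Hypothetical enumeration of the SCVD layout; the file-name
# pattern is illustrative only, not the database's actual scheme.
n_refs, n_types, n_levels = 16, 10, 5

distorted = [
    f"ref{r:02d}_type{d:02d}_lvl{l}.yuv"
    for r in range(1, n_refs + 1)
    for d in range(1, n_types + 1)
    for l in range(1, n_levels + 1)
]

print(len(distorted))  # 16 x 10 x 5 = 800 distorted SCVs
```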

Objective Study: Proposed Spatiotemporal Gabor Feature Tensor-Based Model (SGFTM) for the SCVs

Fig. 3. The framework of proposed spatiotemporal Gabor feature tensor-based model (SGFTM) for evaluating the perceptual quality of the SCVs.

We propose a novel full-reference VQA (FR-VQA) model, called the spatiotemporal Gabor feature tensor-based model (SGFTM), to objectively evaluate the perceptual quality of distorted SCVs through spatiotemporal visual feature representations extracted by 3D Gabor filters. The overall framework of the proposed SGFTM is shown in Fig. 3 and consists of three stages: (1) Spatiotemporal feature tensor extraction: 3D Gabor filters oriented along the horizontal (i.e., x-axis), vertical (i.e., y-axis), and temporal (i.e., t-axis) directions are used to extract features. (2) Spatiotemporal feature tensor similarity measurement: each pair of feature tensors, one computed from the reference SCV and the other from the distorted SCV, is compared to yield a spatiotemporal similarity measurement. For example, the two spatial feature tensors (SFTS), obtained from the reference and distorted SCVs respectively, are compared to arrive at the spatial similarity tensor (SST); likewise, the temporal similarity tensor (TST) is generated from the corresponding pair of temporal feature tensors (SFTT). (3) Spatiotemporal feature tensor pooling: the final SGFTM score is computed by the proposed spatiotemporal feature tensor pooling process.
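The three stages can be sketched numerically. Everything below is an illustrative assumption rather than SGFTM's exact formulation: the Gabor kernel parameters, the FFT-based circular convolution, the SSIM-style similarity map, and the plain averaging used for pooling are all stand-ins chosen to show the pipeline shape.

```python
import numpy as np

def gabor_3d(size=7, sigma=2.0, freq=0.25, axis=0):
    """3D Gabor kernel with its sinusoidal carrier along one axis (0=x, 1=y, 2=t)."""
    r = np.arange(size) - size // 2
    x, y, t = np.meshgrid(r, r, r, indexing="ij")
    envelope = np.exp(-(x**2 + y**2 + t**2) / (2.0 * sigma**2))
    carrier = np.cos(2.0 * np.pi * freq * (x, y, t)[axis])
    k = envelope * carrier
    return k - k.mean()  # zero-DC: flat regions give no response

def filter_3d(volume, kernel):
    """Stage 1: feature extraction via FFT-based circular convolution (sketch only)."""
    return np.real(np.fft.ifftn(np.fft.fftn(volume) * np.fft.fftn(kernel, volume.shape)))

def tensor_similarity(feat_ref, feat_dis, c=1e-3):
    """Stage 2: SSIM-style pointwise similarity between two feature tensors,
    averaged into a scalar (each pointwise term is at most 1)."""
    sim = (2.0 * feat_ref * feat_dis + c) / (feat_ref**2 + feat_dis**2 + c)
    return float(sim.mean())

# Toy reference and mildly distorted volumes, axes ordered (x, y, t).
rng = np.random.default_rng(0)
ref = rng.random((16, 16, 16))
dis = ref + 0.05 * rng.standard_normal(ref.shape)

# Stage 3: pool the per-orientation similarities; a naive average stands in
# for SGFTM's actual pooling process.
scores = []
for axis in (0, 1, 2):  # x-, y-, and t-oriented filters
    k = gabor_3d(axis=axis)
    scores.append(tensor_similarity(filter_3d(ref, k), filter_3d(dis, k)))
quality = sum(scores) / len(scores)
```

Comparing a reference against itself yields a similarity of exactly 1, and mild distortion pulls the pooled score below 1, matching the intuition behind the SST/TST comparison.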


We have decided to make the dataset available to the research community free of charge. You can download the screen content video database as well as the supporting file. If you use these videos in your research, we kindly ask that you cite the paper listed below.

Shan Cheng, Huanqiang Zeng, Jing Chen, Junhui Hou, Jianqing Zhu, and Kai-Kuang Ma, "Screen Content Video Quality Assessment: Subjective and Objective Study", IEEE Transactions on Image Processing, vol. 29, no. 10, pp. 8636-8651, Aug. 2020.

You can download the database from Baidu Netdisk or Dubox.

You can also download the paper as well as the supporting file via IEEE Xplore: Download Paper.

To obtain the download password for the database, please download and sign the Agreement File, then scan and send the signed agreement to us.


Copyright (c) 2021 The Huaqiao University

All rights reserved. Permission is hereby granted, without written agreement and without license or royalty fees, to use, copy, and distribute this database (the videos, the results, and the source files) and its documentation for non-commercial research and educational purposes only, provided that this copyright notice appears in its entirety in all copies of the database, and that its original source, the Smart Visual Information Processing Laboratory (SmartVIPLab) at Huaqiao University, is acknowledged in any publication that reports research using this database.

Contact Us

If you have any questions, please feel free to contact us.
