High Dynamic Range (HDR) and Wide Color Gamut (WCG) content is now mainstream across content creation, with playback supported on millions of devices. A reliable way of evaluating HDR systems is therefore essential. One common approach is to measure the color errors introduced along the imaging pipeline. Unfortunately, the performance of color difference metrics has mostly been evaluated on databases composed of simple test patches rather than natural imagery. Key differences between the two are that test patches typically involve lower spatial frequencies and non-contiguous regions, while natural imagery contains much higher frequencies, masking due to texture, and contiguous-color-region effects and gradients.
We therefore evaluate several color difference metrics on five publicly available HDR databases consisting of natural images and subjective scores. The databases focus on different distortions, and together they cover a wide variety of both luminance and chromatic distortions. There are lower-frequency distortions resulting from tone-mapping and gamut-mapping operations, as well as higher-frequency distortions resulting from compression artifacts introduced by schemes such as JPEG, JPEG-XT, JPEG2000, and HEVC. While perceptually dominated by luminance distortions, the compressed images also contain physical chromatic distortions due to chroma subsampling and the different processing applied to the Y, Cb, and Cr signals. Since it is desirable to evaluate as many images as possible, a total of 64 source images and 672 distorted images, rated by 94 observers, were evaluated across all databases.
The color difference metrics we analyze include CIE94 and CIEDE2000, both based on the CIE L*a*b* color space. In addition, we analyzed newer metrics derived for HDR applications: DEITP, based on the ICTCP color space, and DEz, based on the Jzazbz color space. Since color quality is generally agreed not to change significantly when motion is added, we used still-image databases for the evaluation.
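As a concrete illustration of one of the HDR metrics, the following is a minimal NumPy sketch of the DEITP (ΔEITP) computation as specified in ITU-R BT.2124: linear BT.2020 RGB in absolute cd/m² is converted to ICTCP via the BT.2100 LMS and PQ (SMPTE ST 2084) transforms, and the Euclidean distance is scaled by 720. This is our own illustrative sketch, not the exact implementation used in the study; function names are ours.

```python
import numpy as np

# PQ (SMPTE ST 2084) inverse-EOTF constants
M1, M2 = 2610 / 16384, 2523 / 4096 * 128
C1, C2, C3 = 3424 / 4096, 2413 / 4096 * 32, 2392 / 4096 * 32

def pq_inverse_eotf(Y):
    """Encode absolute luminance (cd/m^2, 0-10000) to a PQ non-linear signal."""
    y = np.clip(np.asarray(Y, dtype=float) / 10000.0, 0.0, 1.0) ** M1
    return ((C1 + C2 * y) / (1.0 + C3 * y)) ** M2

def delta_e_itp(rgb1, rgb2):
    """DEITP (ITU-R BT.2124) between two linear BT.2020 RGB colors
    expressed in absolute cd/m^2."""
    def to_ictcp(rgb):
        r, g, b = np.asarray(rgb, dtype=float)
        # BT.2020 RGB -> LMS (BT.2100 integer coefficients / 4096)
        lms = np.array([
            (1688 * r + 2146 * g + 262 * b) / 4096,
            (683 * r + 2951 * g + 462 * b) / 4096,
            (99 * r + 309 * g + 3688 * b) / 4096,
        ])
        lp, mp, sp = pq_inverse_eotf(lms)  # PQ-encoded L'M'S'
        i = 0.5 * lp + 0.5 * mp
        ct = (6610 * lp - 13613 * mp + 7003 * sp) / 4096
        cp = (17933 * lp - 17390 * mp - 543 * sp) / 4096
        return i, ct, cp

    i1, ct1, cp1 = to_ictcp(rgb1)
    i2, ct2, cp2 = to_ictcp(rgb2)
    # BT.2124 rescales the Ct axis: T = 0.5 * Ct
    return 720.0 * np.sqrt((i1 - i2) ** 2
                           + (0.5 * (ct1 - ct2)) ** 2
                           + (cp1 - cp2) ** 2)
```

A value of ΔEITP = 1 is intended to correspond roughly to a just-noticeable difference, which is what makes the metric attractive for HDR/WCG evaluation.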
To quantify performance, we use four standard evaluation procedures: root mean square error (RMSE), Pearson linear correlation coefficient (PLCC), Spearman rank-order correlation coefficient (SROCC), and outlier ratio (OR). The metrics based on color spaces derived for HDR were the best performers across the different databases, but neither of the two performed best on every database. The databases have different experimental conditions and display specifications, and analysis is currently underway to understand how these differing conditions relate to which HDR color space metric performs best.
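The four procedures can be sketched in a few lines of NumPy. The function names and the per-image confidence-interval form of the outlier ratio below are our illustrative choices (exact definitions vary across recommendations, and the tie handling in the rank computation is simplified):

```python
import numpy as np

def rmse(mos, pred):
    """Root mean square error between subjective scores and predictions."""
    mos, pred = np.asarray(mos, float), np.asarray(pred, float)
    return float(np.sqrt(np.mean((mos - pred) ** 2)))

def plcc(mos, pred):
    """Pearson linear correlation coefficient."""
    return float(np.corrcoef(mos, pred)[0, 1])

def srocc(mos, pred):
    """Spearman rank-order correlation = Pearson correlation of the ranks.
    (Simplified: does not average ranks over ties.)"""
    rank = lambda x: np.argsort(np.argsort(np.asarray(x))).astype(float)
    return plcc(rank(mos), rank(pred))

def outlier_ratio(mos, pred, ci):
    """Fraction of images whose prediction error exceeds the per-image
    confidence interval of the subjective score (one common definition)."""
    mos, pred, ci = (np.asarray(a, float) for a in (mos, pred, ci))
    return float(np.mean(np.abs(mos - pred) > ci))
```

In practice, metric outputs are usually passed through a fitted monotonic (e.g. logistic) mapping before computing RMSE and PLCC, so that a metric is not penalized for a nonlinear but consistent relationship with the subjective scores; SROCC is invariant to any monotonic mapping.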
Technical Depth of Presentation
Our presentation will be a mix. We plan to start with fundamentals and build toward intermediate and advanced material in a way that keeps audience members who began at the fundamental level engaged. Because the presentation builds up gradually, all audience members should be able to take something away from it.
What Attendees will Benefit Most from this Presentation
Our presentation is geared toward a mix of audience members. It will include technical information of interest to Engineers/Technologists, as well as higher-level concepts and demonstrations that will be engaging for Executives/Managers.
Take-Aways from this Presentation
We will give practical guidance and information about using color difference metrics to evaluate HDR/WCG image quality. We hope the audience will gain a better understanding of traditional and state-of-the-art color difference metrics and how their performance scales from simple test patches to optically captured and computer-generated imagery. Furthermore, our analysis should give the audience a better idea of which color difference metrics are more reliable and perceptually accurate, especially when dealing with HDR/WCG content.