Inferring Point Cloud Quality via Graph Similarity
Abstract
Objective quality estimation of media content plays a vital role in a wide range of applications. Although numerous metrics exist for 2D images and videos, comparable metrics are missing for 3D point clouds with unstructured and non-uniformly distributed points. In this paper, we propose <inline-formula><tex-math notation="LaTeX">${\sf GraphSIM}$</tex-math><alternatives><mml:math><mml:mi mathvariant="sans-serif">GraphSIM</mml:mi></mml:math><inline-graphic xlink:href="yang-ieq1-3047083.gif"/></alternatives></inline-formula>, a metric to accurately and quantitatively predict the human perception of point clouds with superimposed geometry and color impairments. The human visual system is more sensitive to high spatial-frequency components (e.g., contours and edges) and weighs local structural variations more than individual point intensities. Motivated by this fact, we use the graph signal gradient as a quality index to evaluate point cloud distortions. Specifically, we first extract geometric <italic>keypoints</italic> by resampling the reference point cloud geometry to form an object skeleton. Then, we construct <italic>local graphs</italic> centered at these keypoints for both the reference and distorted point clouds. Next, we compute three <italic>moments of color gradients</italic> between the centered keypoint and all other points in the same local graph to obtain a <italic>local significance</italic> similarity feature. Finally, we obtain the similarity index by pooling the local graph significance across all color channels and averaging across all graphs. We evaluate <inline-formula><tex-math notation="LaTeX">${\sf GraphSIM}$</tex-math><alternatives><mml:math><mml:mi mathvariant="sans-serif">GraphSIM</mml:mi></mml:math><inline-graphic xlink:href="yang-ieq2-3047083.gif"/></alternatives></inline-formula> on two large, independent point cloud assessment datasets that cover a wide range of impairments (e.g., re-sampling, compression, and additive noise).
<inline-formula><tex-math notation="LaTeX">${\sf GraphSIM}$</tex-math><alternatives><mml:math><mml:mi mathvariant="sans-serif">GraphSIM</mml:mi></mml:math><inline-graphic xlink:href="yang-ieq3-3047083.gif"/></alternatives></inline-formula> provides state-of-the-art performance for all distortions, with noticeable gains in predicting the subjective mean opinion score (MOS) in comparison with the point-wise distance-based metrics adopted in standardized reference software. Ablation studies further show that <inline-formula><tex-math notation="LaTeX">${\sf GraphSIM}$</tex-math><alternatives><mml:math><mml:mi mathvariant="sans-serif">GraphSIM</mml:mi></mml:math><inline-graphic xlink:href="yang-ieq4-3047083.gif"/></alternatives></inline-formula> generalizes to various scenarios with consistent performance when its key modules and parameters are adjusted. Models and associated materials will be made available at <uri>https://njuvision.github.io/GraphSIM</uri> or <uri>http://smt.sjtu.edu.cn/papers/GraphSIM</uri>.
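The pipeline described in the abstract (keypoints, local graphs, color-gradient moments, similarity pooling) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: farthest point sampling stands in for the paper's high-frequency resampling, the Gaussian edge weighting is an illustrative choice, and all function names (`graphsim_sketch`, `gradient_moments`, `knn_indices`) are hypothetical.

```python
import numpy as np

def knn_indices(points, query, k):
    """Indices of the k points nearest to a query location."""
    d = np.linalg.norm(points - query, axis=1)
    return np.argsort(d)[:k]

def gradient_moments(points, colors, center, neighbors):
    """Per-channel moments (mass, mean, variance) of distance-weighted
    color gradients between a local-graph center and its neighbors.
    The Gaussian edge weighting is an illustrative choice."""
    diffs = colors[neighbors] - colors[center]                      # (k, 3) color gradients
    dists = np.linalg.norm(points[neighbors] - points[center], axis=1)
    sigma = dists.mean() + 1e-8
    w = np.exp(-dists**2 / (2 * sigma**2))                          # graph edge weights
    g = w[:, None] * diffs                                          # weighted gradients
    return np.concatenate([np.abs(g).sum(0), g.mean(0), g.var(0)])  # 9 features

def graphsim_sketch(ref_pts, ref_col, dis_pts, dis_col,
                    n_keypoints=32, k=16, eps=1e-8):
    """Keypoints -> local graphs -> gradient moments -> SSIM-style
    feature similarity, pooled over color channels and graphs."""
    # 1) Keypoints via farthest point sampling on the reference geometry
    #    (a placeholder for the paper's resampling-based skeleton).
    key = [0]
    d = np.linalg.norm(ref_pts - ref_pts[0], axis=1)
    for _ in range(n_keypoints - 1):
        key.append(int(d.argmax()))
        d = np.minimum(d, np.linalg.norm(ref_pts - ref_pts[key[-1]], axis=1))
    sims = []
    for ki in key:
        # 2) Local graphs around the same keypoint in both clouds.
        rn = knn_indices(ref_pts, ref_pts[ki], k + 1)
        dn = knn_indices(dis_pts, ref_pts[ki], k + 1)
        # 3) Color-gradient moment features for each local graph.
        fr = gradient_moments(ref_pts, ref_col, rn[0], rn[1:])
        fd = gradient_moments(dis_pts, dis_col, dn[0], dn[1:])
        # 4) SSIM-style similarity of the two feature vectors.
        sims.append(np.mean((2 * fr * fd + eps) / (fr**2 + fd**2 + eps)))
    return float(np.mean(sims))
```

By construction the score equals 1 when the distorted cloud is identical to the reference and decreases as local color structure around the keypoints diverges; the actual metric's resampling, graph construction, and pooling details differ.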
Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence