Synthetic Visual Genome 2: Extracting Large-scale Spatio-Temporal Scene Graphs from Videos — arXiv2