SpatialRGPT: Grounded Spatial Reasoning in Vision Language Models — arXiv2