Benchmarking the Generality of Vision-Language-Action Models — arXiv2