Gaia2: Benchmarking LLM Agents on Dynamic and Asynchronous Environments — arXiv2