Accelerating Pre-training of Multimodal LLMs via Chain-of-Sight — arXiv2