ModelGrow: Continual Text-to-Video Pre-training with Model Expansion and Language Understanding Enhancement — arXiv2