Boosting Multi-modal Model Performance with Adaptive Gradient Modulation — arXiv2