Scaling Laws for Optimal Data Mixtures — arXiv2