SlimPajama-DC: Understanding Data Combinations for LLM Training — arXiv2