Gradient Diversity: a Key Ingredient for Scalable Distributed Learning (arXiv)