Exploiting Sparsity in Pruned Neural Networks to Optimize Large Model Training — arXiv2