Software-Hardware Co-design for Fast and Scalable Training of Deep Learning Recommendation Models — arXiv2