Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation — arXiv2