The smooth output assumption, and why deep networks are better than wide ones — arXiv2