Unsupervised Multimodal Representation Learning across Medical Images and Reports — arXiv2