A reinforcement learning approach for VQA validation: an application to diabetic macular edema grading — arXiv2