Look and Listen: A Multi-modality Late Fusion Approach to Scene Classification for Autonomous Machines — arXiv2