Dual-modality seq2seq network for audio-visual event localization — arXiv2