SpeechBERT: Cross-Modal Pre-trained Language Model for End-to-end Spoken Question Answering
Authors
Abstract
While end-to-end models for spoken language understanding tasks have been explored recently, there is still no end-to-end model for spoken question answering (SQA), a task in which conventional cascade approaches (speech recognition followed by text question answering) suffer catastrophically from recognition errors. Meanwhile, pre-trained language models such as BERT have been highly successful in text question answering. To bring this advantage of pre-trained language models to SQA, we propose SpeechBERT, a cross-modal transformer-based pre-trained language model. Our model outperforms conventional approaches on a dataset containing both correctly and incorrectly recognized answers, and our experimental results demonstrate the potential of end-to-end SQA models.
Venue: arXiv (preprint)
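To make the idea of a cross-modal transformer for SQA concrete, the sketch below is a minimal, hypothetical PyTorch example, not the authors' published SpeechBERT architecture: all class and parameter names, the feature dimensions, and the span-prediction head are assumptions. It embeds audio segment features and question tokens into one shared space, runs a single transformer encoder over both modalities, and predicts an extractive answer span over the audio positions.

```python
# Illustrative sketch only (assumed design, not the paper's exact model):
# a BERT-style encoder shared across text tokens and audio segment features,
# with a span head for extractive spoken question answering.
import torch
import torch.nn as nn

class CrossModalQAEncoder(nn.Module):
    def __init__(self, vocab_size=30522, d_model=256, n_heads=4,
                 n_layers=4, audio_feat_dim=39, max_len=512):
        super().__init__()
        self.token_emb = nn.Embedding(vocab_size, d_model)    # text question tokens
        self.audio_proj = nn.Linear(audio_feat_dim, d_model)  # audio features -> shared space
        self.modality_emb = nn.Embedding(2, d_model)          # 0 = text, 1 = audio
        self.pos_emb = nn.Embedding(max_len, d_model)
        layer = nn.TransformerEncoderLayer(
            d_model, n_heads, dim_feedforward=4 * d_model, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, n_layers)
        self.span_head = nn.Linear(d_model, 2)                # start/end logits per position

    def forward(self, question_ids, audio_feats):
        # question_ids: (B, Lq) token ids; audio_feats: (B, La, audio_feat_dim)
        q = self.token_emb(question_ids)
        a = self.audio_proj(audio_feats)
        x = torch.cat([q, a], dim=1)                          # joint text+audio sequence
        mod = torch.cat([torch.zeros(q.size(1), dtype=torch.long),
                         torch.ones(a.size(1), dtype=torch.long)]).to(x.device)
        pos = torch.arange(x.size(1), device=x.device)
        x = x + self.modality_emb(mod) + self.pos_emb(pos)
        h = self.encoder(x)
        start_logits, end_logits = self.span_head(h).split(1, dim=-1)
        return start_logits.squeeze(-1), end_logits.squeeze(-1)

# Toy usage: 2 questions of 8 tokens over 20 audio word segments (MFCC-like features).
model = CrossModalQAEncoder()
q_ids = torch.randint(0, 30522, (2, 8))
feats = torch.randn(2, 20, 39)
start, end = model(q_ids, feats)
print(start.shape, end.shape)  # torch.Size([2, 28]) for each
```

The key design point this sketch illustrates is that a single shared encoder attends jointly over both modalities, so answer spans can be located directly in the audio without an intermediate transcript, which is what allows an end-to-end model to avoid cascading speech recognition errors.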