Applying Image Recognition to Automatic Speech Recognition: Determining Suitability of Spectrograms for Training a Deep Neural Network for Speech
Recognition

In speech recognition, Neural Networks are used to recognise the sequence of phonemes in an audio signal. These networks are trained on audio data pre-processed into some (type of) spectral vector. We present an alternative method that pre-processes speech utterances into visual representations, called spectrograms, and train a neural network suitable for image recognition to identify phonemes. The resulting network was able to classify 99.73% of a set of vowels containing samples of ‘iy’, ‘ah’ and ‘uw’ correctly, 91.87% of a set of vowels containing samples ‘iy’, ‘ih’ and ‘eh’, and 75.97% of the full dataset of twelve vowels. These results show that using image recognition in automatic speech recognition is worth further investigating.

Keywords

Speech Recognition, Neural Network, Spectrogram, Image Recognition,

URI

https://studenttheses.uu.nl/handle/20.500.12932/27440

Applying Image Recognition to Automatic Speech Recognition: Determining Suitability of Spectrograms for Training a Deep Neural Network for Speech Recognition

Files

Publication date

Authors

DOI

Document Type

Metadata

Collections

License

Abstract

Keywords

Citation

URI