Applying Image Recognition to Automatic Speech Recognition: Determining Suitability of Spectrograms for Training a Deep Neural Network for Speech Recognition
Publication date
Authors
DOI
Document Type
Bachelor Thesis
Metadata
Show full item recordCollections
License
CC-BY-NC-ND
Abstract
In speech recognition, Neural Networks are used to recognise the sequence of phonemes in an audio signal. These networks are trained on audio data pre-processed into some (type of) spectral vector. We present an alternative method that pre-processes speech utterances into visual representations, called spectrograms, and train a neural network suitable for image recognition to identify phonemes. The resulting network was able to classify 99.73% of a set of vowels containing samples of ‘iy’, ‘ah’ and ‘uw’ correctly, 91.87% of a set of vowels containing samples ‘iy’, ‘ih’ and ‘eh’, and 75.97% of the full dataset of twelve vowels. These results show that using image recognition in automatic speech recognition is worth further investigating.
Keywords
Speech Recognition, Neural Network, Spectrogram, Image Recognition,