In the first part of this talk, we describe a method for training acoustic models on audio with imperfect transcriptions. Approximate labels, such as closed captions, are more common than carefully transcribed speech, so it is useful to be able to train on them. We present an iterative algorithm that attempts to improve such transcriptions by inserting or removing filler words and pauses before they are passed to the training process.
In the second part of the talk, we describe a language model adaptation technique based on information retrieval, which aims to improve recognition performance on topic-oriented TED lectures, together with a new technique for implicit language model interpolation.
Note: This seminar will be held in English if required.