MIT System: Learning Spoken Language

System learns to distinguish words and phonetic components without human annotation of training data.

MIT researchers have developed a new machine-learning system that can learn to distinguish spoken words and, unlike its predecessors, can also learn to distinguish lower-level phonetic units, such as syllables and phonemes.

Every language has its own collection of phonemes, or the basic phonetic units from which spoken words are composed. The English language has somewhere between 35 and 45. Knowing a language’s phonemes can make it much easier for automated systems to learn to interpret speech.
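As a purely illustrative sketch of that idea (the word list and ARPAbet-style transcriptions below are hand-picked examples, not material from the MIT paper), the following Python snippet shows how spoken words decompose into sequences drawn from a small phoneme inventory:

```python
# Toy illustration: English words as sequences of phonemes (ARPAbet-style symbols).
# These transcriptions are illustrative examples, not output of the MIT system.
PRONUNCIATIONS = {
    "cat":     ["K", "AE", "T"],
    "speech":  ["S", "P", "IY", "CH"],
    "phoneme": ["F", "OW", "N", "IY", "M"],
}

# The inventory of distinct phonemes used by this tiny lexicon.
inventory = sorted({p for phones in PRONUNCIATIONS.values() for p in phones})

for word, phones in PRONUNCIATIONS.items():
    print(f"{word!r} -> {' '.join(phones)}")
print(f"Phoneme inventory size for this toy lexicon: {len(inventory)}")
```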

This new system could aid in the development of speech-processing systems for languages that are not widely spoken and don’t have the benefit of decades of linguistic research on their phonetic systems.

It could also help make speech-processing systems more portable, since information about lower-level phonetic units could help iron out distinctions between different speakers’ pronunciations.

Unlike the machine-learning systems that led to, say, the speech-recognition algorithms on today’s smartphones, the MIT researchers’ system is unsupervised, which means it acts directly on raw speech files: it doesn’t depend on the laborious hand-annotation of its training data by human experts, and hence could prove much easier to extend to new sets of training data and new languages.
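To make the idea of unsupervised learning from unlabeled audio concrete, here is a minimal Python sketch. It is not the MIT system’s algorithm; it stands in with MFCC features and k-means clustering, and the file name, feature choice, and cluster count are illustrative assumptions only:

```python
# Minimal sketch of discovering phonetic-like categories from raw audio with no labels.
# NOTE: this is NOT the MIT system's method; it only illustrates the general idea
# of unsupervised learning from unlabeled speech. "speech.wav" is a placeholder path.
import librosa
from sklearn.cluster import KMeans

# Load raw audio with no transcripts or annotations of any kind.
audio, sample_rate = librosa.load("speech.wav", sr=16000)

# Represent the waveform as a sequence of short-time acoustic feature frames (MFCCs).
mfccs = librosa.feature.mfcc(y=audio, sr=sample_rate, n_mfcc=13).T  # shape: (frames, 13)

# Cluster the frames; each cluster acts as a crude, automatically discovered
# "phonetic unit". 40 is an assumed cluster count, roughly the size of an
# English phoneme inventory.
labels = KMeans(n_clusters=40, random_state=0, n_init=10).fit_predict(mfccs)

# The resulting label sequence segments the audio into runs of unit IDs,
# produced without any human annotation.
print(labels[:50])
```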

Finally, the system could offer some insights into human speech acquisition. “When children learn a language, they don’t learn how to write first,” says Chia-ying Lee, who completed her PhD in computer science and engineering at MIT last year and is first author on the paper. “They just learn the language directly from speech. By looking at patterns, they can figure out the structures of language. That’s pretty much what our paper tries to do.”


For more information, please visit: www.mit.edu
