Event

PhD defence of Loren Lugosch - Deep Neural Networks for Voice Control

Monday, May 15, 2023, 13:00 to 15:00

McConnell Engineering Building, Room 603, 3480 rue University, Montreal, QC, H3A 0E9, CA

Abstract

Voice control systems enable people to control their computers by speaking to them. After a review of the state-of-the-art in sequence modeling, speech recognition, and language understanding using deep learning, this thesis describes a number of contributions to the art of voice control. The first contribution is a study of large-scale semi-supervised learning through pseudo-labeling for massively multilingual speech recognition. The second contribution is a study of the use of autoregressive models for conditional computation with neural networks, using speech recognition as a test case. The third contribution is a method for training end-to-end spoken language understanding models using speech synthesis. The fourth contribution is a crowdsourced dataset, Timers and Such, for spoken language understanding involving numbers, along with baseline experimental results and open-source software infrastructure for using the dataset. The fifth contribution is our part in the design and implementation of SpeechBrain, an open-source software toolkit for speech processing. Finally, using some of the tools and techniques developed earlier in the thesis, we propose a simplified and unified approach to voice control in which the entire traditional pipeline, composed of an automatic speech recognition subsystem, a natural language understanding subsystem, and human-programmed control logic, is subsumed within a single deep neural network.

Loren Lugosch

Loren Lugosch is a machine learning researcher currently working at Apple. He is completing a PhD in electrical engineering at McGill University and the Mila Quebec AI Institute under the supervision of Prof. Brett H. Meyer and Prof. Derek Nowrouzezahrai. He has worked on problems in the domains of audio, language, signal processing, and artificial intelligence. Before his PhD, he worked as a research engineer at Fluent.ai. He is originally from Houston, Texas, and now lives in Boston, Massachusetts.