Tutorial 10: Bayesian Learning for Speech and Language Processing

Presented by

Shinji Watanabe, Jen-Tzung Chien

Abstract

In this tutorial, we will present recent studies on Bayesian learning for speech and language processing, focusing mainly on Bayesian acoustic and language models. In general, speech and language processing draws on extensive knowledge of statistical models. Both acoustic and language models are essential parts of modern speech recognition systems, and models learned from real-world data exhibit considerable complexity, ambiguity, and uncertainty. Modeling this uncertainty is crucial for regularizing the models and achieving robust speech recognition.
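To make the underlying idea concrete, the following generic sketch (notation ours, not taken from the tutorial materials) contrasts prediction with a single point estimate against Bayesian prediction, where uncertainty about the model parameters is marginalized out rather than ignored:

    % Generic illustration; X denotes training data, x a new observation,
    % and \Lambda the model parameters (all notation assumed here).
    % Point-estimate prediction:
    \hat{\Lambda}_{\mathrm{ML}} = \arg\max_{\Lambda} p(X \mid \Lambda), \qquad
    p(x \mid X) \approx p(x \mid \hat{\Lambda}_{\mathrm{ML}})
    % Bayesian predictive distribution, marginalizing over the parameter posterior:
    p(x \mid X) = \int p(x \mid \Lambda)\, p(\Lambda \mid X)\, d\Lambda, \qquad
    p(\Lambda \mid X) \propto p(X \mid \Lambda)\, p(\Lambda)

The integral averages over all parameter settings consistent with the data, which is the source of the regularization effect mentioned above.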

We will introduce applications of variational Bayesian (VB) techniques to acoustic modeling and speaker adaptation. The VB estimation of continuous-density hidden Markov models will be formulated and applied to the automatic determination of model topologies. We will also introduce a new representation of acoustic features based on a set of state-dependent basis vectors. The resulting Bayesian sensing hidden Markov models are reliably estimated from heterogeneous training data through a hybrid of dictionary learning and basis representation.
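As a rough, self-contained analogue of how VB estimation can determine model complexity automatically, the sketch below fits a variational Bayesian Gaussian mixture (a GMM rather than a CDHMM, using scikit-learn's implementation rather than the presenters' formulation; the toy data, prior settings, and weight threshold are all our assumptions). The variational posterior concentrates the mixture weights on roughly the number of components the data support, so the effective model structure is selected by the data:

    from sklearn.mixture import BayesianGaussianMixture
    import numpy as np

    rng = np.random.default_rng(0)
    # Toy 2-D "acoustic feature" data drawn from 3 true Gaussian components.
    X = np.vstack([
        rng.normal(loc=m, scale=0.3, size=(200, 2))
        for m in ([0, 0], [3, 0], [0, 3])
    ])

    # Deliberately over-provision with 10 components; the sparse prior on the
    # mixture weights lets VB prune the components the data do not need.
    vb_gmm = BayesianGaussianMixture(
        n_components=10,
        weight_concentration_prior_type="dirichlet_process",
        weight_concentration_prior=0.01,
        max_iter=500,
        random_state=0,
    ).fit(X)

    effective = int(np.sum(vb_gmm.weights_ > 1e-2))
    print("posterior mixture weights:", np.round(vb_gmm.weights_, 3))
    print("effective number of components:", effective)
    # The final variational lower bound can also be compared across candidate
    # model structures, analogous to topology selection for CDHMMs.
    print("variational lower bound:", vb_gmm.lower_bound_)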

In language modeling, we will address topic models and present a Dirichlet class language model, which projects the sequence of history words onto a latent class space and maximizes the marginal likelihood over the class uncertainties expressed by Dirichlet priors, thereby establishing a Bayesian class-based language model. In addition, we will extend Bayesian learning to nonstationary source separation and present an online Gaussian process with incrementally updated priors. Finally, the tutorial will address Bayesian nonparametric approaches to unsupervised structural learning and their potential applications in speech and language processing.
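The following schematic (our own notation, not the exact Dirichlet class language model equations) illustrates the kind of marginalization described above: the history h = w_{i-n+1}, ..., w_{i-1} is mapped to latent classes c whose proportions theta carry a Dirichlet prior with hyperparameter alpha, and the class uncertainty is integrated out when predicting the next word:

    % Schematic only; h, c, \theta, \alpha are assumed notation.
    p(w_i \mid h) \;=\; \int \Big[ \sum_{c} p(w_i \mid c)\, p(c \mid h, \theta) \Big]
        \, p(\theta \mid \alpha)\, d\theta,
    \qquad p(\theta \mid \alpha) = \mathrm{Dir}(\theta \mid \alpha)

Maximizing this marginal likelihood over the training corpus, rather than committing to a single hard class assignment per history, is what makes the class-based model Bayesian.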

The first presenter (Shinji Watanabe) will focus mainly on the basic theory of the Bayesian approach to general pattern recognition and machine learning problems, along with some applications to speech recognition. The second presenter (Jen-Tzung Chien) will describe advanced topics and future directions for Bayesian speech and language processing.

Speaker Biography

Shinji Watanabe received his B.S., M.S., and Dr. Eng. degrees from Waseda University, Tokyo, Japan, in 1999, 2001, and 2006, respectively. From 2001 to 2011, he was a research scientist at NTT Communication Science Laboratories, Kyoto, Japan. From January to March 2009, he was a visiting scholar at the Georgia Institute of Technology in Biing-Hwang (Fred) Juang's laboratory. Since 2011, he has been with Mitsubishi Electric Research Laboratories (MERL), Cambridge, MA. His research interests include Bayesian learning, pattern recognition, and speech and spoken language processing. He is a member of the Acoustical Society of Japan (ASJ), the Institute of Electronics, Information and Communication Engineers (IEICE), and the IEEE. He received the Awaya Award from the ASJ in 2003, the Paper Award from the IEICE in 2004, the Itakura Award from the ASJ in 2006, and the TELECOM System Technology Award from the Telecommunications Advancement Foundation in 2006.

Jen-Tzung Chien received his Ph.D. degree in electrical engineering from National Tsing Hua University, Hsinchu, Taiwan, in 1997. Since 1997, he has been with the Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan, Taiwan, where he is currently a Professor. He has held visiting researcher positions at Panasonic Technologies Inc., Santa Barbara, CA; the Tokyo Institute of Technology, Tokyo, Japan; the Georgia Institute of Technology, Atlanta, GA; Microsoft Research Asia, Beijing, China; and the IBM T. J. Watson Research Center, Yorktown Heights, NY. His research interests include Bayesian learning, speech recognition, information retrieval, and blind source separation. Dr. Chien is a senior member of the IEEE Signal Processing Society. He serves as an associate editor of the IEEE Signal Processing Letters and as guest editor of a special issue on Deep Learning for Speech and Language Processing of the IEEE Transactions on Audio, Speech, and Language Processing. He received the Young Investigator Award (Ta-You Wu Memorial Award) from the National Science Council (NSC), Taiwan, in 2003, the Research Award for Junior Research Investigators from Academia Sinica, Taiwan, in 2004, and the NSC Distinguished Research Awards in 2006 and 2010.