Tutorial 9: Deep Learning and Its Applications in Signal Processing

Presented by

Dong Yu and Li Deng

Abstract

Deep learning refers to a class of machine learning techniques, developed mainly since 2006, in which many layers of non-linear information processing stages, or hierarchical architectures, are exploited. It has become increasingly important thanks to the decreasing cost of hardware, increasing processing power, and its powerful modeling ability. It has recently been applied to many signal processing areas, such as image, video, audio, speech, and text, and has produced surprisingly good results, especially in speech recognition.

What makes deep models different from shallow models? What are the difficulties in learning deep models? How can deep models generate and discriminate natural signals (e.g., image, audio, speech, and text)? How can we learn the massive number of parameters in these deep models? These questions will be answered in this tutorial. The main objectives of this tutorial are to provide a concise and intuitive overview of deep learning technology in signal-processing-oriented technical language, and to demonstrate its design thinking in tasks including image generation, image/object classification, speech feature coding, and speech recognition. We review the state of the art in three related areas: (a) deep generative models, (b) deep discriminative models, and (c) applications of deep learning and practical considerations. More specifically, Li Deng will cover the restricted Boltzmann machine (RBM), deep belief network (DBN), and deep stacking network (DSN), and Dong Yu will cover energy-based models, the factored or gated Boltzmann machine, recurrent neural network (RNN), sum-product network (SPN), deep neural network (DNN), and the relationships among these models. The emphasis of the tutorial is on conveying the intuition behind this powerful tool and on presenting case studies that illustrate its practical use.

The tutorial is targeted at researchers and students who want to get up to speed on the basic concepts and major tools in deep learning, and at practitioners who want a concise, intuitive overview of the state of the art in this emerging area. No specific prerequisites are assumed, although familiarity with artificial neural networks and machine learning (especially graphical models) would help in understanding the material. We also plan to adjust the coverage and content of the tutorial depending on the level and interests of the audience.

Speaker Biography

Dong Yu is a researcher at Microsoft Research and a senior member of the IEEE. His research interests are mainly in speech processing, with a focus on applications of machine learning techniques in speech recognition. He has published more than 90 papers in these areas, has given many talks at conferences and universities, and is the inventor/coinventor of more than 40 granted/pending patents. He has served as an associate editor of IEEE Signal Processing Magazine (2009-2011) and as the lead guest editor of the IEEE Transactions on Audio, Speech, and Language Processing special issue on deep learning for speech and language processing (2010-2011).

Li Deng is a principal researcher at Microsoft Research and an affiliate professor in the Department of Electrical Engineering at the University of Washington, Seattle. His research areas include automatic speech and speaker recognition, spoken language identification and understanding, speech-to-speech translation, machine translation, language modeling, statistical methods and machine learning, neural information processing, deep-structured learning, machine intelligence, audio and acoustic signal processing, statistical signal processing and digital communication, human speech production and perception, acoustic phonetics, auditory speech processing, auditory physiology and modeling, noise robust speech processing, speech synthesis and enhancement, multimedia signal processing, and multimodal human-computer interaction. In these areas, he has published over 300 refereed papers in leading journals and conferences and 3 books. He was elected by ISCA (International Speech Communication Association) as its Distinguished Lecturer for 2010-2011. He has been granted over 45 US or international patents in acoustics/audio, speech/language technology, and other fields of signal processing. He has received numerous awards and honors bestowed by IEEE, ISCA, ASA, Microsoft, and other organizations. He is a Fellow of the Acoustical Society of America, a Fellow of the IEEE, and a Fellow of the International Speech Communication Association (ISCA). He serves on the Board of Governors of the IEEE Signal Processing Society (2008-2010), and as Editor-in-Chief of IEEE Signal Processing Magazine (2009-2011). He has been appointed Editor-in-Chief of IEEE Transactions on Audio, Speech & Language Processing for 2012-2014, and is General Chair of ICASSP-2013.