Plenary Talks - Chin-Hui Lee

From Signal Processing to Information Extraction of Speech: A New Perspective on Automatic Speech Recognition

Dr. Chin-Hui Lee

Professor, Georgia Institute of Technology, USA

Abstract

The field of automatic speech recognition (ASR) has enjoyed more than 30 years of technological advances, due to the extensive utilization of the hidden Markov model (HMM) framework and a concentrated effort by the community to make available a vast amount of language resources. State-of-the-art ASR systems achieve high recognition accuracy for well-formed utterances in many languages by decoding speech into the most likely sequence of words among all possible sentences represented by a finite-state network (FSN) approximation of all the knowledge sources required by the task constraints. However, the ASR problem is still far from solved, because not all information available in the speech knowledge hierarchy can be directly integrated into the FSN to improve ASR performance and enhance system robustness. It is believed that some of these knowledge-insufficiency issues can be partially addressed by processing techniques that take advantage of the full set of acoustic and language information in speech. On the other hand, it has long been postulated that human speech recognition (HSR) determines the linguistic identity of a sound based on detected evidence that exists at various levels of the speech knowledge hierarchy, ranging from acoustic phonetics to syntax and semantics. This calls for a bottom-up knowledge integration framework that links speech processing with information extraction: spotting speech cues with a bank of attribute detectors, weighing and combining the resulting acoustic evidence to form cognitive hypotheses, and validating these hypotheses until a consistent recognition decision can be reached. The recently proposed ASAT (automatic speech attribute transcription) framework is an attempt to mimic some HSR capabilities with asynchronous speech event detection followed by bottom-up speech knowledge integration and verification. In the last few years it has demonstrated new potential in detection-based speech processing and information extraction.
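To make the detect-combine-verify idea concrete, the following minimal Python sketch combines frame-level posteriors from a hypothetical bank of attribute detectors into phone-level evidence scores. All attribute names, phone-to-attribute mappings, and score values are illustrative assumptions for exposition, not the actual ASAT implementation.

# Minimal sketch of bottom-up evidence combination in the spirit of the
# ASAT paradigm. All names and values below are hypothetical.
import math

# Hypothetical frame-level posteriors from a bank of attribute detectors,
# one score per speech attribute per frame (values in (0, 1)).
attribute_posteriors = {
    "voiced":    [0.90, 0.80, 0.70],
    "nasal":     [0.10, 0.20, 0.10],
    "fricative": [0.05, 0.10, 0.20],
}

# Hypothetical mapping from phones to attributes that should be
# present (1) or absent (0) when the phone is spoken.
phone_attributes = {
    "m": {"voiced": 1, "nasal": 1, "fricative": 0},
    "s": {"voiced": 0, "nasal": 0, "fricative": 1},
    "a": {"voiced": 1, "nasal": 0, "fricative": 0},
}

def phone_evidence(phone, frame):
    """Combine detector scores into a log-evidence score for one phone,
    treating the attribute detectors as conditionally independent."""
    log_score = 0.0
    for attr, present in phone_attributes[phone].items():
        p = attribute_posteriors[attr][frame]
        # Use the posterior when the attribute should be present,
        # and its complement when it should be absent.
        log_score += math.log(p if present else 1.0 - p)
    return log_score

# Rank candidate phones for frame 0 by combined bottom-up evidence;
# a verification stage would then test the top hypotheses for consistency.
for phone in sorted(phone_attributes, key=lambda ph: -phone_evidence(ph, 0)):
    print(phone, round(phone_evidence(phone, 0), 3))

In a full detection-based system, such phone-level evidence would feed higher layers of the knowledge hierarchy (lexical, syntactic, semantic) and a verification stage, rather than being used directly for a final decision.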

This presentation is intended to illustrate new possibilities for signal processing researchers to contribute to ASR by linking the processing of raw signals with the extraction of multiple layers of useful speech information. By organizing this probabilistic evidence from the speech knowledge hierarchy according to the ASAT paradigm, and integrating it into the already-powerful, top-down HMM framework, we can facilitate a knowledge-rich, data-driven collaborative framework that will lower entry barriers to ASR research, further enhance the capabilities, and alleviate some of the limitations of current state-of-the-art ASR systems.

Speaker Biography

Chin-Hui Lee is a professor in the School of Electrical and Computer Engineering at the Georgia Institute of Technology. Dr. Lee received the B.S. degree in Electrical Engineering from National Taiwan University, Taipei, in 1973, the M.S. degree in Engineering and Applied Science from Yale University, New Haven, in 1977, and the Ph.D. degree in Electrical Engineering with a minor in Statistics from the University of Washington, Seattle, in 1981.

Dr. Lee started his professional career at Verbex Corporation, Bedford, MA, where he was involved in research on connected word recognition. In 1984, he became affiliated with Digital Sound Corporation, Santa Barbara, where he engaged in research and product development in speech coding, speech synthesis, speech recognition, and signal processing for the DSC-2000 Voice Server. Between 1986 and 2001, he was with Bell Laboratories, Murray Hill, New Jersey, where he became a Distinguished Member of Technical Staff and Director of the Dialogue Systems Research Department. His research interests include multimedia communication, multimedia signal and information processing, speech and speaker recognition, speech and language modeling, spoken dialogue processing, adaptive and discriminative learning, biometric authentication, and information retrieval. From August 2001 to August 2002, he was a visiting professor in the School of Computing at the National University of Singapore. In September 2002, he joined the faculty of the Georgia Institute of Technology.

Prof. Lee has participated actively in professional societies. He is a member of the IEEE Signal Processing Society (SPS), the IEEE Communications Society, and the International Speech Communication Association (ISCA). From 1991 to 1995, he was an associate editor of the IEEE Transactions on Signal Processing and the IEEE Transactions on Speech and Audio Processing. During the same period, he served as a member of the ARPA Spoken Language Coordination Committee. From 1995 to 1998, he was a member of the SPS Speech Processing Technical Committee, serving as its chairman from 1997 to 1998. In 1996, he helped establish the SPS Multimedia Signal Processing Technical Committee, of which he is a founding member.

Dr. Lee is a Fellow of the IEEE, has published more than 350 papers, and holds 25 patents. He received the SPS Senior Award in 1994 and the SPS Best Paper Award in 1997 and 1999. In 1997, he was awarded the prestigious Bell Labs President's Gold Award for his contributions to the Lucent Speech Processing Solutions product. Dr. Lee often gives invited lectures to a wide international audience. In 2000, he was named one of the six Distinguished Lecturers by the IEEE Signal Processing Society. He was also named one of ISCA's two inaugural Distinguished Lecturers for 2007-2008. He received the 2006 IEEE SPS Technical Achievement Award for "Exceptional Contributions to the Field of Automatic Speech Recognition". More recently, he was awarded the 2012 ISCA Medal for "pioneering and seminal contributions to the principles and practices of automatic speech and speaker recognition, including fundamental innovations in adaptive learning, discriminative training and utterance verification".