Technical Program

Paper Detail

Paper: SP-P3.5
Session: Adaptation in ASR
Location: Poster Area A
Session Time: Tuesday, March 27, 16:30 - 18:30
Presentation Time: Tuesday, March 27, 16:30 - 18:30
Presentation: Poster
Topic:
Paper Title: A LINEAR PROJECTION APPROACH TO ENVIRONMENT MODELING FOR ROBUST SPEECH RECOGNITION
Authors: Yu Tsao, Academia Sinica, Taiwan; Chien-Lin Huang, Shigeki Matsuda, Chiori Hori, Hideki Kashioka, National Institute of Information and Communications Technology, Japan
Abstract: Use of a linear projection (LP) function to transform multiple sets of acoustic models into a single set of acoustic models is proposed for characterizing testing environments for robust automatic speech recognition. The LP function extends the linear regression (LR) function used in maximum likelihood linear regression (MLLR) and maximum a posteriori linear regression (MAPLR) by incorporating local information in the ensemble acoustic space to enhance the environment modeling capacity. To estimate the nuisance parameters of the LP function, we developed maximum likelihood LP (MLLP) and maximum a posteriori LP (MAPLP) and derived a set of integrated prior (IP) densities for MAPLP. The IP densities integrate multiple knowledge sources: the training set, previously seen speech data, the current utterance, and a prepared tree structure. We evaluated the proposed MLLP and MAPLP on the Aurora-2 database in an unsupervised model adaptation mode. Experimental results show that the LP function outperforms the LR function with both ML- and MAP-based estimates over different test conditions. Moreover, because the MAP-based estimate handles overfitting well, MAPLP clearly improves on MLLP. Compared to the baseline result, MAPLP provides a significant 10.99% word error rate reduction.
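
As a rough illustration of the contrast drawn in the abstract, the sketch below applies an MLLR-style LR function to one mean vector and then a projection over several stacked mean vectors, one per model set. The stacked form, the function names (lr_transform, lp_transform), and all numeric values are assumptions made for illustration only. The abstract does not give the exact form of the LP function, and no parameter estimation (MLLP or MAPLP) is shown.

```python
import numpy as np


def lr_transform(mean, A, b):
    """MLLR-style linear regression: affine map of a single mean vector."""
    return A @ mean + b


def lp_transform(means, W, b):
    """Hypothetical linear projection (an assumed form, not the paper's
    definition): maps P stacked mean vectors, one per model set, to a
    single mean vector."""
    stacked = np.concatenate(means)  # shape (P * D,)
    return W @ stacked + b


D, P = 3, 2  # feature dimension and number of model sets (illustrative)
rng = np.random.default_rng(0)
means = [rng.normal(size=D) for _ in range(P)]

# LR with an identity matrix and zero bias leaves the mean unchanged.
lr_out = lr_transform(means[0], np.eye(D), np.zeros(D))

# LP with equal-weight identity blocks reduces to averaging the P means.
W = np.hstack([0.5 * np.eye(D)] * P)  # shape (D, P * D)
lp_out = lp_transform(means, W, np.zeros(D))
```

In this assumed form the LR function sees one model set at a time, while the projection matrix W can weight every model set in the ensemble when producing the single output mean.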