Speechocean Workshop at ICASSP

Tuesday, March 27, 2012
Presenter: Mr. Xianfeng Cheng, Manager, Speechocean

Session 1: 13:30 - 16:30
Session Title: Huge Data Resources Building for Industrial R&D of Human Language Technology

The quality and size of data play a key role in model training and testing in industrial research and development of human language technologies. However, because such data is costly and time-consuming to build, huge data resources have become valuable assets. Effective ways of obtaining these resources will not only significantly improve the quality of the technologies but also speed up their commercialization. Some effective solutions for collecting and processing huge data, developed over years of experience in building such resources, have been highly praised by customers for their significant reductions in time and cost.

In this seminar, following a general introduction to Speechocean, we will discuss several solutions for building huge data resources that take into account practical industrial development demands in speech technology, web search, natural language computing (NLC), machine translation (MT), pattern recognition, and other fields. We will also share several notable data sets in 40+ languages as examples.

Highlights of the presentation include:

  1. Building huge speech data resources, and solutions for speech technologies (TTS and ASR).
  2. Special large text data resources and solutions for NLC and MT technologies.
  3. A special solution for web data processing for search technology.
  4. Large image data resources and processing for pattern recognition technologies.
  5. Other linguistic resources for industrial R&D in human language technology (HLT).

About the Presenter:

Xianfeng Cheng joined Speechocean in 2005 as a project manager for data collection and processing in speech, web data, and other fields across many languages, and has led many large international projects. He is currently a senior manager responsible for business cooperation with global customers, exploring effective solutions for huge data resources.