How to Borrow a Language

Pascale Fung
Department of Electronic & Computer Engineering

The Hong Kong University of Science & Technology, Clear Water Bay, Kowloon, Hong Kong

Wednesday, July 10, 2013
3:00 p.m., ICSI Lecture Hall

 

Abstract:

Pascale Fung, "How to Borrow a Language," available at http://youtu.be/oSjheILgLOI

In this talk, I will give an overview of our latest work on the challenges of processing multilingual speech, especially low-resource speech such as non-standard languages and mixed-code speech. Non-standard languages such as Cantonese Chinese have little to no transcribed text for training language models. Similarly, for mixed-code speech, where a bilingual speaker mixes two languages in the same sentence, it is difficult to obtain enough mixed-code data to train the language model. As in our previous work on multilingual acoustic model training, it is desirable to borrow texts from a resource-rich language to train the low-resource language model. The key question is how to borrow a language in such a cross-lingual language modeling framework. We show in this talk that making use of the syntactic relation between the source and target languages brings significant improvement to cross-lingual language modeling over conventional approaches such as language model interpolation or simply translating the lexicon. We show that, by placing syntactic constraints at certain phrase boundaries, we can make better use of the statistics of resource-rich languages for speech recognition of the low-resource language.
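For readers unfamiliar with the interpolation baseline mentioned above, the following is a minimal sketch of linear language model interpolation over toy unigram tables. The function name, the word labels, the probabilities, and the weight are all made up for illustration; the talk does not specify how its baseline was configured.

```python
def interpolate(p_low, p_rich, lam):
    """Linearly mix two word-probability tables.

    p_low  : probabilities estimated from the small low-resource corpus
    p_rich : probabilities estimated from the resource-rich corpus
    lam    : weight on the low-resource model, between 0 and 1
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    vocab = set(p_low) | set(p_rich)
    # A word missing from one table contributes probability 0 from that table.
    return {
        w: lam * p_low.get(w, 0.0) + (1.0 - lam) * p_rich.get(w, 0.0)
        for w in vocab
    }


# Hypothetical toy distributions (not data from the talk).
p_low = {"word_a": 0.5, "word_b": 0.5}
p_rich = {"word_a": 0.2, "word_b": 0.3, "word_c": 0.5}

mixed = interpolate(p_low, p_rich, lam=0.6)
```

Because both inputs sum to one and the weights sum to one, the mixture is again a probability distribution. Interpolation of this kind only reweights word statistics; it applies no syntactic constraints, which is the limitation the abstract contrasts its approach against.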

Our approach uses a decoder based on weighted finite-state transducers (WFSTs) that recognizes speech by integrating the acoustic model, the pronunciation model, and the cross-lingual language model. For Cantonese speech recognition, we improved the system performance significantly, with up to 12.5% relative character error rate reduction and 18.5% relative BLEU score improvement. For mixed-code Chinese-English speech recognition, we reduced the combined word error rate by 5.3% on lecture speech and by 5.1% on conversational speech over baseline interpolation models.
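As a reading aid for the relative figures above, here is a small sketch of how a relative error rate reduction is conventionally computed. The baseline and improved values are hypothetical, chosen only so the arithmetic is easy to follow; the abstract does not report absolute error rates.

```python
def relative_reduction(baseline, improved):
    """Fraction of the baseline error that was removed (lower is better)."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - improved) / baseline


# Hypothetical absolute character error rates, in percent (not from the talk).
baseline_cer = 40.0
improved_cer = 35.0

rel = relative_reduction(baseline_cer, improved_cer)
print(f"{rel:.1%} relative reduction")
```

In this made-up case the absolute drop is 5 points while the relative reduction is 12.5%, which is why the two kinds of figure should not be compared directly. For a score where higher is better, such as BLEU, the analogous quantity divides the gain over the baseline by the baseline score.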

Bio:

Professor Fung received her PhD in computer science from Columbia University in 1997. She received her MSc in computer science from Columbia in 1993 and her BS in electrical engineering from Worcester Polytechnic Institute in Massachusetts in 1988.

She is one of the founding faculty members of the Human Language Technology Center (HLTC) at HKUST, Director of InterACT@HKUST, and the founding chair of the Women Faculty Association at HKUST. Professor Fung was a research affiliate with AT&T Research Laboratories, formerly with Bell Laboratories (Florham Park and Murray Hill, New Jersey), from 1993 to 1997. From 1991 to 1992, she was Associate Scientist at BBN Systems & Technologies (Cambridge, Mass.). She was a visiting researcher at LIMSI, Centre National de la Recherche Scientifique (France) in 1991. From 1989 to 1991, she was a research student in the Department of Information Science, Kyoto University (Japan).

Professor Fung is an Associate Editor for the IEEE Transactions on Audio, Speech and Language Processing, the IEEE Signal Processing Letters, the ACM Transactions on Speech and Language Processing, and the Transactions of the Association for Computational Linguistics. She is Program Chair for the 2013 Annual Meeting of the Association for Computational Linguistics (ACL), Area Chair for the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) from 2009 to 2014, Area Coordinator for Interspeech 2011 and 2010, Technical Chair for the 2009 IEEE Workshop on Automatic Speech Recognition and Understanding, and Chair of the 2009 ACL 2nd Workshop on Building and Using Comparable Corpora. She has been Area Chair for EACL 2012, NAACL 2011, ACL 2008, and ACL 2004. She was Chair (1999) and Area Chair (2004) for EMNLP. She is a Senior Member of the IEEE, a Committee Member of the IEEE Signal Processing Society Speech and Language Technology Committee (SLTC), and a Board Member of the ACL SIGDAT. She has been a panelist and reviewer for the US National Science Foundation, the French National Science Foundation, and the Hong Kong Research Grants Council.