Arlo's publications

Publications

Presentations

Multi-Layer Perceptrons for Speech Recognition (Intel/ParLab, 2008)
Historical note: This is a "neural network" approach, which was considered taboo at the time.

Beyond WER: How to Evaluate Speech Technologies (SpeechTek, 2018)
tl;dr: There are a lot of problems with the industry-standard benchmark.

Papers

M-Y. Hwang, G. Peng, M. Ostendorf, W. Wang, A. Faria, and A. Heidel
Building A Highly Accurate Mandarin Speech Recognizer with Language-Independent Technologies and Language-Dependent Modules
IEEE Transactions on Audio, Speech, and Language Processing. 2009.

D. Vergyri, A. Mandal, W. Wang, A. Stolcke, J. Zheng, M. Graciarena, D. Rybach, C. Gollan, R. Schlüter, K. Kirchhoff, A. Faria, and N. Morgan
Development of the SRI/Nightingale Arabic ASR System
Proc. of Interspeech. 2008.

J. Chong, Y. Yi, A. Faria, S. Rajagopalan, K. Keutzer
Data-Parallel Large Vocabulary Continuous Speech Recognition on Graphics Processors
Workshop on Emerging Applications and Many-core Architecture (EAMA). 2008.

A. Faria and N. Morgan
Corrected Tandem Features for Acoustic Model Training
Intl. Conf. Acoustics, Speech, Signal Processing (ICASSP). 2008.

A. Faria and N. Morgan
When a Mismatch Can Be Good: Large Vocabulary Speech Recognition Trained with Idealized Tandem Features
Proc. ACM Symposium on Applied Computing (SAC). 2008.

M-Y. Hwang, G. Peng, W. Wang, A. Faria, and A. Heidel
Building a Highly Accurate Mandarin Speech Recognizer
Proc. Automatic Speech Recognition and Understanding (ASRU). 2007.

S. Petrov, A. Faria, P. Michaillat, A. Berg, A. Stolcke, D. Klein, and J. Malik
Detecting Categories in News Video Using Acoustic, Speech, and Image Features
Proc. TREC Video Retrieval Workshop (TRECVID). 2006.

A. Faria
Accent Classification for Speech Recognition
Proc. Machine Learning for Multimodal Interaction (MLMI), LNCS 3869. 2005.

A. Faria and D. Gelbart
Efficient Pitch-based Estimation of VTLN Warp Factors
Proc. Eurospeech. 2005.

Course-related

An Investigation of Tandem MLP Features for ASR (EE 225D: Spring 2007)
MapReduce: Distributed Computing for Machine Learning (CS 262A: Fall 2006)
HMMs for ASR (CS 188: Spring 2006)
HMM-GMM Acoustic Models for Speech Recognition (CS 281A: Fall 2005)
Estimation of Glottal Source Parameters from Diverse Signals (Ling 113: Spring 2004)
Applied Phonetics: Portuguese Text-to-Speech (Ling 110: Spring 2003)