Recent publications of the ICSI Speech Group
This page contains papers by the Speech Group at ICSI. Most papers are available by FTP, or through other hyperlinks. All papers are in Acrobat PDF (.pdf) format. For the most up-to-date list of our publications, please visit the ICSI publications page.
You might also be interested in non-speech papers on the vector microprocessor architecture, which we previously used to accelerate speech recognition and training. This was developed as part of the Realization Group, which was the previous incarnation of the current Speech Group.
2005
-
Toward Joint Segmentation and Classification of Dialog Acts in Multiparty
Meetings
(preliminary version)
M. Zimmermann, Y. Liu, E. Shriberg, and A. Stolcke
Proc. MLMI, Edinburgh, 2005.
-
Robust Speaker Segmentation for Meetings: The ICSI-SRI Spring 2005 Diarization System
X. Anguera, C. Wooters, B. Peskin, and M. Aguilo
Proc. NIST MLMI Meeting Recognition Workshop, Edinburgh, 2005.
-
Further Progress in Meeting Recognition: The ICSI-SRI Spring
2005 Speech-to-Text Evaluation System
A. Stolcke, X. Anguera, K. Boakye, O. Cetin, F. Grezl, A. Janin, A. Mandal,
B. Peskin, C. Wooters, and J. Zheng
Proc. NIST MLMI Meeting Recognition Workshop, Edinburgh, 2005.
-
Efficient Pitch-based Estimation of VTLN Warp Factors
A. Faria and D. Gelbart
To appear in Proc. Eurospeech, Lisbon, Sep. 2005.
-
Automatic Data Selection for MLP-based Feature Extraction for ASR
Carmen Pelaez-Moreno, Qifeng Zhu, Barry Chen, and Nelson Morgan
To appear in Proc. Eurospeech, Lisbon, Sep. 2005.
-
Improved MLP Structures for Data-Driven Feature Extraction for ASR
Q. Zhu, B. Chen, F. Grezl, and N. Morgan
To appear in Proc. Eurospeech, Lisbon, Sep. 2005.
-
Using MLP Features in SRI's Conversational Speech Recognition System
Q. Zhu, A. Stolcke, B. Y. Chen, and N. Morgan
To appear in Proc. Eurospeech, Lisbon, Sep. 2005.
-
Does Active Learning Help Automatic Dialog Act Tagging in Meeting Data?
A. Venkataraman, Y. Liu, E. Shriberg, and A. Stolcke
To appear in Proc. Eurospeech, Lisbon, Sep. 2005.
-
Speaker Recognition in the Text-Independent Domain Using Keyword Hidden Markov Models
Kofi A. Boakye
M.S. Thesis, University of California at Berkeley, May 2005.
-
The Sequential GMM: A Gaussian Mixture Model Based Speaker Verification
System that Captures Sequential Information
Stephen James Stafford
M.S. Thesis, University of California at Berkeley, May 2005.
-
Learning Discriminant Narrow-Band Temporal Patterns for Automatic
Recognition of Conversational Telephone Speech
Barry Y. Chen
Ph.D. Thesis, University of California at Berkeley, May 2005.
-
Speaker Detection Without Models
D. Gillick, S. Stafford, and B. Peskin
Proc. ICASSP, Philadelphia, March 2005.
-
Tonotopic Multi-Layered Perceptron: A Neural Network for Learning
Long-Term Temporal Features for Speech Recognition
B. Y. Chen, Q. Zhu, and N. Morgan
Proc. ICASSP, Philadelphia, March 2005.
-
Improved Phonetic Speaker Recognition Using Lattice Decoding
A. Hatch, B. Peskin, and A. Stolcke
Proc. ICASSP, Philadelphia, March 2005.
-
Automatic Dialog Act Segmentation and Classification in Multiparty
Meetings
J. Ang, Y. Liu, and E. Shriberg
Proc. ICASSP, Philadelphia, March 2005.
-
Structural Metadata Research in the EARS Program
Y. Liu, E. Shriberg, A. Stolcke, B. Peskin, J. Ang, D. Hillard,
M. Ostendorf, M. Tomalin, P. Woodland, and M. Harper
Proc. ICASSP, Philadelphia, March 2005.
-
The 2004 ICSI-SRI-UW Meeting Recognition System
C. Wooters, N. Mirghafori, A. Stolcke, T. Pirinen,
I. Bulyko, D. Gelbart, M. Graciarena, S. Otterson, B. Peskin,
and M. Ostendorf
Lecture Notes in Computer Science,
Volume 3361, Jan 2005, Pages 196 - 208.
2004
-
The ICSI Meeting Corpus: Close-talking and Far-field, Multi-channel
Transcriptions for Speech and Language Researchers
Jane A. Edwards
LREC 2004, Workshop on Compiling and Processing
Spoken Language Corpora,
Lisbon, Portugal, May 2004.
-
Incorporating Tandem/HATs MLP Features into SRI's Conversational
Speech Recognition System
Q. Zhu, A. Stolcke, B. Y. Chen, and N. Morgan
Proceedings of the EARS RT-04F Workshop,
Palisades, New York, November 2004.
-
Towards Robust Speaker Segmentation: The ICSI-SRI Fall 2004
Diarization System
C. Wooters, J. Fung, B. Peskin, and X. Anguera
Proceedings of the EARS RT-04F Workshop,
Palisades, New York, November 2004.
-
Auditory-based Automatic Speech Recognition
Werner Hemmert, Marcus Holmberg, and David Gelbart
Proc. ISCA Tutorial and Research Workshop on Statistical and Perceptual Audio Processing,
Jeju, Korea, October 2004.
-
Vocabulary and Language Model Adaptation using Information
Retrieval
Brigitte Bigi, Yan Huang, Renato De Mori
Proc. Intl. Conf. Spoken Language Processing,
Jeju, Korea, October 2004.
-
Learning Long-Term Temporal Features in LVCSR Using Neural Networks
Barry Chen, Qifeng Zhu, and Nelson Morgan
Proc. Intl. Conf. Spoken Language Processing,
Jeju, Korea, October 2004.
-
On using MLP features in LVCSR
Q. Zhu, B. Chen, N. Morgan, and A. Stolcke
Proc. Intl. Conf. Spoken Language Processing,
Jeju, Korea, October 2004.
-
The ICSI-SRI-UW Metadata Extraction System
Y. Liu, E. Shriberg, A. Stolcke, D. Hillard, M. Ostendorf, B. Peskin, and
M. Harper
Proc. Intl. Conf. Spoken Language Processing,
Jeju, Korea, October 2004.
-
Using Machine Learning to Cope with Imbalanced Classes in Natural Speech:
Evidence from Sentence Boundary and Disfluency Detection
Y. Liu, E. Shriberg, A. Stolcke, and M. Harper
Proc. Intl. Conf. Spoken Language Processing,
Jeju, Korea, October 2004.
-
From Switchboard to Meetings:
Development of the 2004 ICSI-SRI-UW Meeting Recognition System
N. Mirghafori, A. Stolcke, C. Wooters, T. Pirinen, I. Bulyko,
D. Gelbart, M. Graciarena, S. Otterson, B. Peskin, and M. Ostendorf
Proc. Intl. Conf. Spoken Language Processing,
Jeju, Korea, October 2004.
-
Time delay based failure-robust direction of arrival estimation
T. Pirinen and J. Yli-Hietanen
IEEE SAM 2004,
Sitges, Barcelona, Spain, July 2004.
-
Identifying Agreement and Disagreement in Conversational Speech:
Use of Bayesian Networks to Model Pragmatic Dependencies
M. Galley, K. McKeown, J. Hirschberg, and E. Shriberg
Proc. ACL,
Barcelona, July 2004.
-
Comparing and Combining Generative and Posterior Probability Models:
Some Advances in Sentence Boundary Detection in Speech
Y. Liu, A. Stolcke, E. Shriberg, and M. Harper
Proc. Conf. on Empirical Methods in Natural Language Processing,
Barcelona, Spain, July 2004.
-
Text-Constrained Speaker Recognition on a Text-Independent Task
K. Boakye and B. Peskin
Odyssey 2004 - The Speaker and Language Recognition Workshop,
Toledo, Spain, June 2004.
-
The ICSI Meeting Project: Resources and Research
A. Janin, J. Ang, S. Bhagat, R. Dhillon, J. Edwards,
J. Macias-Guarasa, N. Morgan, B. Peskin, E. Shriberg, A. Stolcke,
C. Wooters, B. Wrede
NIST ICASSP 2004 Meeting Recognition Workshop, Montreal, May 2004.
-
Progress in Meeting Recognition:
The ICSI-SRI-UW Spring 2004 Evaluation System
A. Stolcke, C. Wooters, N. Mirghafori, T. Pirinen, I. Bulyko,
D. Gelbart, M. Graciarena, S. Otterson, B. Peskin, and M. Ostendorf
NIST ICASSP 2004 Meeting Recognition Workshop, Montreal, May 2004.
-
Detection and compensation of sensor malfunction in time delay
based direction of arrival estimation
T. Pirinen, J. Yli-Hietanen, P. Pertilä and A. Visa
In Proc. IEEE ISCAS, Vancouver, May 2004.
-
Improving Automatic Sentence Boundary Detection with Confusion Networks
D. Hillard, M. Ostendorf, A. Stolcke, Y. Liu, and E. Shriberg
Proc. HLT-NAACL Conference, April-May 2004, Boston.
-
The ICSI Meeting Recorder Dialog Act (MRDA) Corpus
E. Shriberg, R. Dhillon, S. Bhagat, J. Ang, and H. Carvey
Proc. HLT-NAACL SIGDIAL Workshop, April-May 2004, Boston.
-
Meeting Recorder Project: Dialog Act Labeling Guide
R. Dhillon, S. Bhagat, H. Carvey, and E. Shriberg
ICSI Technical Report TR-04-002
-
TRAPping Conversational Speech: Extending TRAP/Tandem approaches to
conversational telephone speech recognition
N. Morgan, B. Y. Chen, Q. Zhu, and A. Stolcke
Proc. IEEE ICASSP,
Montreal, May 2004.
-
Parameterization of the Score Threshold for a Text-Dependent Adaptive Speaker Verification System
N. Mirghafori and M. Hebert
Proc. IEEE ICASSP,
Montreal, May 2004.
-
Desperately Seeking Impostors: Data-Mining for Competitive Impostor Testing in a Text-Dependent Speaker Verification System
M. Hebert and N. Mirghafori
Proc. IEEE ICASSP,
Montreal, May 2004.
-
Direct Modeling of Prosody: An Overview of Applications in Automatic Speech Processing
E. Shriberg and A. Stolcke
Proc. International Conference on Speech Prosody,
Nara, Japan, March 2004.
-
Prosody Modeling for Automatic Speech Recognition and Understanding
E. Shriberg and A. Stolcke
Mathematical Foundations of Speech and Language Modeling,
M. Johnson, M. Ostendorf, S. Khudanpur, R. Rosenfeld (eds.),
Volume 138 in IMA Volumes in Mathematics and its Applications, pp. 105-114,
Springer-Verlag.
-
Scaling up: learning large-scale recognition methods from small-scale recognition tasks
N. Morgan, B. Chen, Q. Zhu, and A. Stolcke
Special Workshop in Maui (SWIM), paper 218
-
Show what you know: musings on the reporting of negative results in speech recognition research
H. Hermansky and N. Morgan
Journal of Negative Results in Speech and Audio Sciences
2004 Issue
2003
-
A Robust Speaker Clustering Algorithm
J. Ajmera and C. Wooters
Proc. IEEE Speech Recognition and Understanding Workshop,
St. Thomas, U.S. Virgin Islands, Dec. 2003.
-
The Relationship Between Dialogue Acts and Hot Spots in Meetings
B. Wrede and E. Shriberg
Proc. IEEE Speech Recognition and Understanding Workshop,
St. Thomas, U.S. Virgin Islands, Dec. 2003.
-
Pitch-based Vocal Tract Length Normalization
Arlo Faria
ICSI Technical Report TR-03-001
-
Detection Of Agreement vs. Disagreement In Meetings: Training With
Unlabeled Data
D. Hillard, M. Ostendorf, and E. Shriberg
Proc. HLT-NAACL Conference, Edmonton, Canada, May 2003
-
Automatic disfluency identification in conversational speech using
multiple knowledge sources
Y. Liu, E. Shriberg, and A. Stolcke
EUROSPEECH 2003, Geneva, September 2003
-
Spotting "Hot Spots" in Meetings: Human Judgements and
Prosodic Cues
B. Wrede and E. Shriberg
EUROSPEECH 2003, Geneva, September 2003
-
Learning Discriminative Temporal Patterns in Speech: Development of
Novel TRAPS-Like Classifiers
B. Chen, S. Chang, and S. Sivadas
EUROSPEECH 2003, Geneva, September 2003
-
Far-field ASR on Inexpensive Microphones
L. Docio-Fernandez, D. Gelbart, and N. Morgan
EUROSPEECH 2003, Geneva, September 2003
-
Feature Transformations and Combinations for Improving ASR Performance
P. Somervuo, B. Chen, Q. Zhu
EUROSPEECH 2003, Geneva, September 2003
-
Data-Driven Speaker and Subword Unit Clustering in Speech Processing
Micha Hersch
EPFL Diploma Thesis, ICSI, March 2003
-
Automatically Generated Prosodic Cues to Lexically Ambiguous Dialog
Acts in Multiparty Meetings
S. Bhagat, H. Carvey, E. Shriberg
ICPhS 2003, Barcelona, August 2003
-
The SuperSID Project: Exploiting high-level information for
high-accuracy speaker recognition
D. Reynolds, W. Andrews, J. Campbell, J. Navratil, B. Peskin,
A. Adami, Q. Jin, D. Klusacek, J. Abramson, R. Mihaescu, J. Godfrey,
D. Jones, and B. Xiang
ICASSP-2003, Hong Kong, April 2003
-
Using prosodic and conversational features for high-performance
speaker recognition: Report from JHU WS'02
B. Peskin, J. Navratil, J. Abramson, D. Jones, D. Klusacek,
D. Reynolds, and B. Xiang
ICASSP-2003, Hong Kong, April 2003
-
The ICSI Meeting Corpus
A. Janin, D. Baron, J. Edwards, D. Ellis, D. Gelbart,
N. Morgan, B. Peskin, T. Pfau, E. Shriberg, A. Stolcke, C. Wooters
ICASSP-2003, Hong Kong, April 2003
-
Meetings about meetings: research at ICSI on speech in multiparty conversations
N. Morgan, D. Baron, S. Bhagat, H. Carvey, R. Dhillon, J. Edwards,
D. Gelbart, A. Janin, A. Krupski, B. Peskin, T. Pfau, E. Shriberg,
A. Stolcke, and C. Wooters
ICASSP-2003, Hong Kong, April 2003
-
Experiments With Linear And Nonlinear Feature Transformations In
HMM Based Phone Recognition
Panu Somervuo
ICASSP-2003, Hong Kong, April 2003
2002
-
Prosodic Cues For Emotion Recognition In Communicator Dialogs
Jeremy C. Ang
M.S. Thesis, University of California at Berkeley, December 2002.
-
A Syllable, Articulatory-Feature, and Stress-Accent Model of Speech Recognition
Shuangyu Chang
Ph.D. Thesis, University of California at Berkeley, September 2002.
-
Prosody-Based Automatic Detection of Annoyance and Frustration in
Human-Computer Dialog
J. Ang, R. Dhillon, A. Krupski, E. Shriberg, and A. Stolcke
ICSLP-2002, Denver, Colorado, USA, September 2002.
-
Automatic Punctuation and Disfluency Detection in Multi-Party Meetings
Using Prosodic and Lexical Cues
D. Baron, E. Shriberg, and A. Stolcke
ICSLP-2002, Denver, Colorado, USA, September 2002.
-
Qualcomm-ICSI-OGI Features for ASR
A. Adami, L. Burget, S. Dupont, H. Garudadri, F. Grezl, H. Hermansky, P. Jain, S. Kajarekar, N. Morgan, and S. Sivadas
ICSLP-2002, Denver, Colorado, USA, September 2002.
-
Improving Word Accuracy with Gabor Feature Extraction
Michael Kleinschmidt and David Gelbart
The accompanying web page is here.
ICSLP-2002, Denver, Colorado, USA, September 2002.
-
Spectro-temporal Gabor Features as a Front End for Automatic Speech Recognition
Michael Kleinschmidt
Forum Acusticum 2002, Seville, Spain, September 2002.
-
Double the Trouble: Handling Noise and Reverberation in Far-Field Automatic Speech Recognition
David Gelbart and Nelson Morgan
There is additional information here.
ICSLP-2002, Denver, Colorado, USA, September 2002.
-
Speech Modeling Using Variational Bayesian Mixture of Gaussians
Panu Somervuo
ICSLP-2002, Denver, Colorado, USA, September 2002.
-
Prosody-Based Automatic Detection of Punctuation and Interruption Events
in the ICSI Meeting Recorder Corpus
D. Baron
M.S. Thesis, University of California at Berkeley, May 2002.
-
Using Prosodic and Lexical Information for Speaker Identification
F. Weber, L. Manganaro, B. Peskin, E. Shriberg
ICASSP-2002, Orlando, Florida, USA, May 2002.
-
Hierarchical Tandem Feature Extraction
S. Sivadas and H. Hermansky
ICASSP-2002, Orlando, Florida, USA, May 2002.
-
A New Speaker Change Detection Method for Two-Speaker Segmentation
A. Adami, S. Kajarekar and H. Hermansky
ICASSP-2002, Orlando, Florida, USA, May 2002.
-
Reducing the Effect of Room Acoustics on Human-Computer Interaction
David Gelbart
Avios-2002, San Jose, California, USA, May 2002.
2001
-
Multispeaker Speech Activity Detection
for the ICSI Meeting Recorder
T. Pfau, D. Ellis, and A. Stolcke
Proceedings Automatic Speech Recognition and Understanding
Workshop (ASRU), Trento, Italy, December 2001.
-
Evaluating Long-term Spectral Subtraction for Reverberant ASR
David Gelbart and Nelson Morgan
There is a correction and additional information here.
ASRU-2001, Madonna di Campiglio, Italy, December 2001.
-
Relating Frame Accuracy with Word Error in Hybrid ANN-HMM ASR
Michael Shire
Eurospeech-2001, Aalborg, September 2001.
-
Can Prosody Aid the Automatic Processing of Multi-Party Meetings?
Evidence from Predicting Punctuation, Disfluencies, and Overlapping
Speech
Elizabeth Shriberg, Andreas Stolcke, and Don Baron
ISCA Tutorial and Research Workshop on Prosody in Speech Recognition and
Understanding, Red Bank, NJ, October 2001.
-
SpeechCorder, The Portable Meeting Recorder
Adam Janin and Nelson Morgan
Workshop on Hands-Free Speech Communication
Kyoto, Japan, April 2001.
-
Meeting Recorder
Adam Janin
Avios, San Jose, April 2001.
-
Robust ASR front-end using spectral-based and discriminant features:
experiments on the Aurora tasks
Carmen Benitez, Lukas Burget, Barry Chen, Stephane Dupont, Hari
Garudadri, Hynek Hermansky, Pratibha Jain, Sachin Kajarekar, Sunil Sivadas
Eurospeech-2001, Aalborg, September 2001.
-
Observations on Overlap: Findings and Implications for Automatic Processing
of Multi-Party Conversation
Elizabeth Shriberg, Andreas Stolcke, and Don Baron
Eurospeech-2001, Aalborg, September 2001.
-
An Elitist Approach to Articulatory-Acoustic Feature Classification.
Chang, S., Greenberg, S., Wester, M.
Eurospeech-2001, Aalborg, September 2001.
-
From Here to Utility - Melding Phonetic Insight with
Speech Technology.
Greenberg, S.
Eurospeech-2001, Aalborg, September 2001.
-
Whither Speech Technology? - A Twenty-First Century
Perspective.
Greenberg, S.
Eurospeech-2001, Aalborg, September 2001.
-
The Relation Between Speech
Intelligibility and the Complex Modulation Spectrum.
Greenberg, S., Arai, T.
Eurospeech-2001, Aalborg, September 2001.
-
Vowel Height is Intimately Associated with Stress Accent in Spontaneous American English Discourse.
Hitchcock, L., Greenberg, S.
Eurospeech-2001, Aalborg, September 2001.
-
A Dutch Treatment of an Elitist Approach to Articulatory-Acoustic Feature Classification.
Wester, M., Greenberg, S., Chang, S.
Eurospeech-2001, Aalborg, September 2001.
-
Multi-Stream ASR trained with Heterogeneous Reverberant Environments
Michael L. Shire
ICASSP-2001, Salt Lake City, May 2001.
-
Global Posterior Probability Estimates as Confidence Measures in an Automatic Speech Recognition System
Warner Warren
ICASSP-2001, Salt Lake City, May 2001.
-
The Meeting Project at ICSI
Nelson Morgan, Don Baron, Jane Edwards, Dan Ellis, David Gelbart,
Adam Janin, Thilo Pfau, Elizabeth Shriberg, and Andreas Stolcke
Human Language Technologies Conference, San Diego, March 2001
-
Speech Intelligibility Derived From
Asynchronous Processing of Auditory-Visual Information.
Grant, K.W., Greenberg, S.
AVSP Workshop, 2001.
-
Corpus Variation and Parser Performance
Daniel Gildea
Empirical Methods in Natural Language Processing, Pittsburgh, June 2001
-
Word-Level Confidence Estimation for Automatic
Speech Recognition
Andy Hatch
M.S. Thesis, University of California at Berkeley, August 2001.
2000
-
Using mutual information to design feature combinations
Dan Ellis and Jeff Bilmes
ICSLP-2000, Beijing, October 2000
-
Decoding speech in the presence of other sound sources
Jon Barker, Martin Cooke, and Dan Ellis
ICSLP-2000, Beijing, October 2000
-
Using acoustic condition clustering to improve acoustic change detection on Broadcast News
Javier Ferreiros Lopez and Dan Ellis
ICSLP-2000, Beijing, October 2000
-
Consonant discrimination in elicited and spontaneous speech: A case for signal-adaptive front ends in ASR
Kemal Sönmez, Madelaine Plauché, Elizabeth Shriberg, and Horacio Franco
ICSLP-2000, Beijing, October 2000
-
On data-derived temporal processing in speech feature extraction
Michael Shire and Barry Chen
ICSLP-2000, Beijing, October 2000
-
Automatic Phonetic Transcription of Spontaneous Speech (American English)
Shawn Chang, Lokendra Shastri, and Steven Greenberg
ICSLP-2000, Beijing, October 2000
-
A comparison of data-derived and knowledge-based modeling of pronunciation variation
Mirjam Wester and Eric Fosler-Lussier
ICSLP-2000, Beijing, October 2000
-
Automatic Labeling of Semantic Roles
Daniel Gildea and Daniel Jurafsky
ACL-2000, Hong Kong, October 2000, pp. 512-520
-
Tandem connectionist feature stream extraction for conventional HMM systems
Hynek Hermansky, Dan Ellis, and Sangita Sharma
ICASSP-2000, Istanbul, June 2000, III-1635-1638
-
Feature extraction using non-linear transformation for robust speech recognition on the Aurora database
Sangita Sharma, Dan Ellis, Sachin Kajarekar, Pratibha Jain, and Hynek Hermansky
ICASSP-2000, Istanbul, June 2000, II-1117-1120
-
Data-driven RASTA filters in reverberation
Mike Shire, Barry Chen
ICASSP-2000, Istanbul, June 2000, III-1627-1630
-
Improved recognition by combining different features and different systems
D.P.W. Ellis
Proc. AVIOS-2000, San Jose, May 2000
-
Stream combination before and/or after the acoustic model
D.P.W. Ellis
ICSI Technical Report, TR-00-007, Berkeley, CA.
-
An introduction to the diagnostic evaluation of the Switchboard-corpus automatic speech recognition systems
Steven Greenberg, Shawn Chang, and Joy Hollenback
NIST Speech Transcription Workshop, College Park, MD, May 16-19, 2000
-
Prosodic stress revisited: Reassessing the role of fundamental frequency
Rosaria Silipo and Steven Greenberg
NIST Speech Transcription Workshop, College Park, MD, May 16-19, 2000
-
The uninvited guest: Information's role in guiding the production of spontaneous speech
Steven Greenberg and Eric Fosler-Lussier
Crest Workshop on Models of Speech Production: Motor Planning and Articulatory Modelling, Kloster Seeon, Germany, May 1-4, 2000
-
Linguistic dissection of Switchboard-corpus automatic speech recognition
systems
Steven Greenberg and Shawn Chang
ISCA Workshop on Automatic Speech Recognition: Challenges for the New
Millennium, Paris, 2000.
-
Discriminant Training of Front-End and Acoustic Modeling Stages to
Heterogeneous Acoustic Environments for Multi-stream Automatic Speech
Recognition
Michael Shire
PhD Dissertation, University of California at Berkeley, Fall 2000.
1999
-
Effects of Speaking Rate and Word Frequency on Conversational Pronunciations
Eric Fosler-Lussier and Nelson Morgan
Speech Communication 29 2-4, pp. 137-157
-
Using knowledge to organize sound: The prediction-driven approach to computational auditory scene analysis and its application to speech/nonspeech mixtures
Dan Ellis
Speech Communication 27 3-4, pp. 281-298
-
Contextual word and syllable pronunciation models
Eric Fosler-Lussier
ASRU-99, Keystone CO, December 1999
-
Combined speech and speaker recognition with speaker-adapted connectionist models
Dominique Genoud, Dan Ellis and Nelson Morgan
ASRU-99, Keystone CO, December 1999
-
Multi-Level Decision Trees for Static and Dynamic Pronunciation Models
Eric Fosler-Lussier
Eurospeech-99, Budapest, pp. I-463-466
-
Multi-stream speech recognition: Ready for prime time?
Adam Janin, Dan Ellis and Nelson Morgan
Eurospeech-99, Budapest, pp. II-591-594
-
Speech/music discrimination based on posterior probability features
Gethin Williams and Dan Ellis
Eurospeech-99, Budapest, pp. II-687-690
-
Temporal constraints on speech intelligibility as deduced from exceedingly sparse spectral representations
Rosaria Silipo, Steven Greenberg and Takayuki Arai
Eurospeech-99, Budapest, pp. VI-2687-2690
Accompanying sound files
-
Data-driven modulation filter design under adverse acoustic conditions and using phonetic and syllabic units
Michael L. Shire
Eurospeech-99, Budapest, pp. III-1123-1126
-
Topic-based language models using EM
Daniel Gildea and Thomas Hofmann
Eurospeech-99, Budapest, pp. V-2167-2170
-
Sooner or Later: Exploring Asynchrony in Multi-Band Speech Recognition
Nikki Mirghafori and Nelson Morgan
Eurospeech-99, Budapest, pp. II-595-598
-
Dynamic Pronunciation Models for Automatic Speech Recognition
Eric Fosler-Lussier
PhD Dissertation, University of California at Berkeley, August 1999.
-
Forms of English function words - Effects of disfluencies, turn position,
age and sex, and predictability
Alan Bell, Daniel Jurafsky, Eric Fosler-Lussier, Cynthia Girand and Daniel Gildea
Int. Cong. of Phonetic Sciences, San Francisco, August 1999, pp. 1:395-398
-
Incorporating contextual phonetics into automatic speech recognition
Eric Fosler-Lussier, Steven Greenberg and Nelson Morgan
Int. Cong. of Phonetic Sciences, San Francisco, August 1999, pp. 1:611-614
-
Statistical Acoustic Indications of Coarticulation
Katrin Kirchhoff and Jeff Bilmes
Int. Cong. of Phonetic Sciences, San Francisco, August 1999, pp. 3:1729-1732
-
Syllable Detection and Segmentation Using Temporal Flow Neural Networks
Lokendra Shastri, Shuangyu Chang and Steven Greenberg
Int. Cong. of Phonetic Sciences, San Francisco, August 1999, pp. 3:1721-1724
-
Automatic Transcription of Prosodic Stress for Spontaneous English Discourse
Rosaria Silipo and Steven Greenberg
Int. Cong. of Phonetic Sciences, San Francisco, August 1999, pp. 3:2351-2354
-
Natural Statistical Models for Automatic Speech Recognition
Jeff Bilmes
PhD Dissertation, University of California at Berkeley, May 1999.
-
Buried Markov models for speech recognition
Jeff Bilmes
ICASSP-99, Phoenix, pp. II-713-716
-
Size matters: An empirical study of neural network training for large vocabulary continuous speech recognition
Dan Ellis and Nelson Morgan
ICASSP-99, Phoenix, pp. II-1013-1016
-
Dynamic classifier combinations in hybrid speech recognition systems using utterance-level confidence values
Katrin Kirchhoff and Jeff Bilmes
ICASSP-99, Phoenix, pp. II-693-696
-
Using Boosting to Improve a Hybrid HMM/Neural Network Speech Recognizer
Holger Schwenk
ICASSP-99, Phoenix, pp. II-1009-1012
-
Not just what, but also when: Guided automatic pronunciation modeling for Broadcast News
Eric Fosler-Lussier and Gethin Williams
DARPA Broadcast News Transcription and Understanding Workshop, Herndon VA, February 28, 1999
-
Reducing errors by increasing the error rate: MLP Acoustic Modeling for Broadcast News Transcription
Nelson Morgan, Dan Ellis, Eric Fosler-Lussier, Adam Janin, and Brian Kingsbury
DARPA Broadcast News Transcription and Understanding Workshop, Herndon VA, February 28, 1999
-
An Overview of the SPRACH System for the Transcription of Broadcast News
Gary Cook, James Christie, Dan Ellis, Eric Fosler-Lussier, Yoshi Gotoh, Brian Kingsbury, Nelson Morgan, Steve Renals, Tony Robinson, and Gethin Williams
DARPA Broadcast News Transcription and Understanding Workshop, Herndon VA, February 28, 1999
1998
-
Speech Recognition with Dynamic Bayesian Networks
Geoff Zweig
PhD Dissertation, University of California at Berkeley, Spring 1998.
-
Speech intelligibility in the presence of cross-channel spectral asynchrony
Takayuki Arai and Steven Greenberg
ICASSP-98, Seattle, pp. 933-936
-
Data-Driven Extensions to HMM Statistical Dependencies
Jeff Bilmes
ICSLP-98, Sydney, Australia, pp. 69-72
-
Maximum Mutual Information Based Reduction Strategies for
Cross-Correlation based Joint Distributional Modeling
Jeff Bilmes
ICASSP-98, Seattle, pp. 469-472
-
The auditory organization of speech in listeners and machines
Martin Cooke and Dan Ellis
ICSI Technical Report, TR-98-016, Berkeley, CA.
-
Midlevel representations for computational auditory scene analysis: The weft element
Dan Ellis and David Rosenthal
In Computational Auditory Scene Analysis, D.F. Rosenthal, and H.G. Okuno, eds., Lawrence Erlbaum, pp. 257-272
-
Effects of Speaking Rate and Word Predictability on Conversational
Pronunciations
Eric Fosler-Lussier and Nelson Morgan
ESCA Workshop on Modeling Pronunciation for ASR, May 1998
-
Recognition in a new key - Towards a science of spoken
language
Steven Greenberg
ICASSP-98, Seattle, pp. 1041-1045
-
Speaking in shorthand - A syllable-centric perspective for
understanding pronunciation variation
Steven Greenberg
ESCA Workshop
on Modeling Pronunciation Variation for Automatic Speech Recognition,
Kerkrade, Netherlands, pp. 47-56
-
Speech intelligibility is highly tolerant of
cross-channel spectral asynchrony
Steven Greenberg and Takayuki Arai
Joint Meeting of the
Acoustical Society of America and the International Congress on Acoustics,
Seattle, pp. 2677-2678
-
Speech intelligibility derived from
exceedingly sparse spectral information
Steven Greenberg, Takayuki Arai, and Rosaria Silipo
ICSLP-98, Sydney, Australia, pp. 74-77
-
Robust speech recognition using the modulation spectrogram
Brian Kingsbury, Nelson Morgan and Steven Greenberg
Speech Communication 25 pp. 117-132
-
Perceptually-inspired signal processing strategies for robust speech recognition in reverberant environments
Brian Kingsbury
PhD Dissertation, University of California at Berkeley, December 1998.
-
A Multi-Band Approach to Automatic Speech Recognition
Nikki Mirghafori
PhD Dissertation, University of California at
Berkeley, December 1998. Reprinted as ICSI Technical Report,
TR-99-004, Berkeley, CA, January 1999.
-
Combining Connectionist Multi-Band
and Full-Band Probability Streams for Speech Recognition of Natural Numbers
Nikki Mirghafori and Nelson Morgan
ICSLP-98, Sydney, Australia, pp. 743-746
-
Transmissions and Transitions: A
Study of Two Common Assumptions in Multi-Band ASR
Nikki Mirghafori and Nelson Morgan
ICASSP-98, Seattle, pp. 713-716
-
Combining Multiple Estimators of Speaking Rate
Nelson Morgan and Eric Fosler-Lussier
ICASSP-98, Seattle, pp. 729-732
-
Incorporating Information from Syllable-length Time Scales into
Automatic Speech Recognition
Su-Lin Wu
Ph.D. Thesis, UC Berkeley, Spring 1998, ICSI Technical Report TR-98-014.
-
Incorporating Information from Syllable-length Time Scales into
Automatic Speech Recognition
Su-Lin Wu, Brian Kingsbury, Nelson Morgan, and Steven Greenberg
ICASSP-98, Seattle, pp. 721-724
-
Performance
improvements through combining phone- and syllable-length information
in automatic speech recognition
Su-Lin Wu, Brian Kingsbury, Nelson Morgan, and Steven Greenberg
ICSLP-98, Sydney, Australia, pp. 854-857
1997
- The temporal properties of spoken Japanese are similar to those of English
Takayuki Arai and Steven Greenberg
Eurospeech-97, Rhodes, vol. 2 pp. 1011-1014
- Speech Recognition using On-line Estimation of Speaking Rate
Nelson Morgan, Eric Fosler and Nikki Mirghafori
Eurospeech-97, Rhodes, vol. 4 pp. 2079-2082
- On the origins of speech intelligibility in the real world
Steven Greenberg
ESCA workshop of Robust Speech Recog., Pont-a-Mousson, pp. 23-32
-
Robust features and environmental compensation: A few comments
Nelson Morgan
ESCA workshop of Robust Speech Recog., Pont-a-Mousson, pp. 43-44
- Improving ASR performance for reverberant speech
Brian Kingsbury, Nelson Morgan and Steven Greenberg
ESCA workshop of Robust Speech Recog., Pont-a-Mousson, pp. 87-90
- A Space-Time theory of Pitch and Timbre based on Cortical Expansion of the Cochlea Traveling Wave Delay
Steven Greenberg, D. Poeppel and T. Roberts
XIth Int. Symp. on Hearing, Grantham
- The modulation spectrogram: In pursuit of an invariant representation of speech
Steven Greenberg and Brian Kingsbury
ICASSP-97, Munich, vol. 3 pp. 1647-1650
- Recognizing reverberant speech with RASTA-PLP
Brian Kingsbury and Nelson Morgan
ICASSP-97, Munich, vol. 2 pp. 1259-1262
- Integrating syllable boundary information into speech recognition
Su-Lin Wu, Michael Shire, Steven Greenberg and Nelson Morgan
ICASSP-97, Munich, vol. 2 pp. 987-990
- The Weft: A representation for periodic sounds
Dan Ellis
ICASSP-97, Munich, vol. 2 pp. 1307-1310
- Computational Auditory Scene Analysis exploiting Speech-Recognition knowledge
Dan Ellis
IEEE workshop on Apps. of Sig. Proc. to Aud. & Acous.,
Mohonk
- Joint Distributional Modeling with Cross-Correlation
Based Features
Jeff Bilmes
ASRU-97, Santa Barbara, pp. 148-155
1996
- Towards Robustness to Fast Speech in ASR
Nikki Mirghafori, Eric Fosler and Nelson Morgan
ICASSP-96, Atlanta
- REMAP - Experiments with speech recognition
Yochai Konig, Hervé Bourlard and Nelson Morgan
ICASSP-96, Atlanta
- On Reversing the Generation Process in Optimality Theory
Eric Fosler
ACL-96, Santa Cruz
- Automatic Learning of Word Pronunciation from Data
Eric Fosler, Mitch Weintraub, Steven Wegmann, Yu-Huang Kao, Sanjeev
Khudanpur, Charles Galles and Murat Saraclar
ICSLP-96, Philadelphia
- Stochastic perceptual speech models with durational dependence
Jeff Bilmes, Nelson Morgan, Su-Lin Wu and Hervé Bourlard
ICSLP-96, Philadelphia
- Insights into spoken language gleaned from phonetic transcriptions of the Switchboard corpus
Steven Greenberg, Joy Hollenback and Dan Ellis
ICSLP-96, Philadelphia
- Prediction-driven computational auditory scene analysis for dense sound mixtures
Dan Ellis
ESCA workshop on Aud. Basis of Speech Percept., Keele '96
- Understanding Speech Understanding
Steven Greenberg
ESCA workshop on Aud. Basis of Speech Percept., Keele '96
1995
- Digit Recognition with Stochastic Perceptual Models
Nelson Morgan, Su-Lin Wu, and Hervé Bourlard
Eurospeech-95, Madrid
- Building Multiple Pronunciation Models for Novel Words using Exploratory Computational Phonology
Gary Tajchman, Eric Fosler and Daniel Jurafsky
Eurospeech-95, Madrid
- REMAP: Recursive Estimation and Maximization of A Posteriori probabilities in connectionist speech recognition
Hervé Bourlard, Yochai Konig and Nelson Morgan
Eurospeech-95, Madrid
- Fast Speakers in Large Vocabulary Continuous Speech Recognition: Analysis & Antidotes
Nikki Mirghafori, Eric Fosler and Nelson Morgan
Eurospeech-95, Madrid
- Stochastic Perceptual Models of Speech
Nelson Morgan, Hervé Bourlard, Steven Greenberg, Hynek Hermansky, and
Su-Lin Wu
ICASSP-95, Detroit
- Using A Stochastic Context-Free Grammar as a Language Model for Speech Recognition
Daniel Jurafsky, Chuck Wooters, Jonathan Segal, Andreas Stolcke, Eric
Fosler, Gary Tajchman and Nelson Morgan
ICASSP-95, Detroit
- SPAM: Experiments with Digit Recognition
Nelson Morgan, Su-Lin Wu, and Hervé Bourlard
Speech Research Symposium '95
- Remap modeling for connectionist speech recognition
Yochai Konig, Hervé Bourlard and Nelson Morgan
Speech Research Symposium '95
- Learning Phonological Rule Probabilities from Speech Corpora with Exploratory Computational Phonology
Gary Tajchman, Daniel Jurafsky and Eric Fosler
ACL-95, Boston
- REMAP: Recursive Estimation and Maximization of A Posteriori Probabilities - Application to Transition-Based Connectionist Speech Recognition
Yochai Konig, Hervé Bourlard and Nelson Morgan
NIPS-95
- Why Is ASR Harder For Fast Speech And What Can We Do About It?
Nikki Mirghafori, Eric Fosler and Nelson Morgan
IEEE Snowbird workshop '95
- Transition-based statistical training for ASR
Nelson Morgan, Yochai Konig, Su-Lin Wu and Hervé Bourlard
IEEE Snowbird workshop '95
- An Introduction to Hybrid HMM/Connectionist Continuous Speech Recognition
Nelson Morgan and Hervé Bourlard
IEEE Signal Processing Magazine, pp. 25-42, May 1995
1994
- The Berkeley Restaurant Project
Daniel Jurafsky, Chuck Wooters, Gary Tajchman, Jonathan Segal, Andreas
Stolcke, Eric Fosler and Nelson Morgan
ICSLP-94
- Multiple-Pronunciation Lexical Modeling in a Speaker Independent Speech Understanding System
Chuck Wooters and Andreas Stolcke
ICSLP-94
- Stochastic Perceptual Auditory-Event-Based Models for Speech Recognition
Nelson Morgan, Hervé Bourlard, Steven Greenberg and Hynek Hermansky
ICSLP-94
- Modeling Dynamics in Connectionist Speech Recognition - the Time Index Model
Yochai Konig and Nelson Morgan
ICSLP-94
- Integrating Experimental Models of Syntax, Phonology, and Accent/Dialect in a Speech Recognizer
Daniel Jurafsky, Chuck Wooters, Gary Tajchman, Jonathan Segal, Andreas
Stolcke and Nelson Morgan
AAAI-94
- Parallel Training of MLP Probability Estimators for Speech Recognition: A Gender-Based Approach
Nikki Mirghafori, Nelson Morgan and Hervé Bourlard
NNSP-94, Greece
1993
- Connectionist Speech Recognition: A Hybrid Approach
Hervé Bourlard and Nelson Morgan
Kluwer Press, 1993
1992
wooters@icsi.berkeley.edu - $Date: 2007/11/01 00:10:33 $ GMT