# ICSI_SPEECH_FAQ
#
# First thoughts on the FAQ set that we should have at the ICSI Speech group.
# 2000-05-17 dpwe@icsi.berkeley.edu
# $Header: /n/www/export/htdocs/speech/faq/RCS/ICSI_SPEECH_FAQ,v 1.37 2002/10/18 00:44:19 gelbart Exp gelbart $
#
# ? How much of this is redundant with the comp.speech.research FAQ?
#
# ? Seems to break into theoretical and practical questions.  Good to
#   link them, but need to mark them as different?

# Blank lines are currently ignored by the parser (as are comments).
# Category headings are detected by starting in the first column;
# component questions have whitespace in the first column.

Meta
    What is covered in this FAQ?
    What documentation resources are available at ICSI?
    How do I add a new question to the FAQ?
    How do I add a new answer to the FAQ?

Basics
    What is speech recognition?
    What are the basic approaches to speech recognition?
    Why do we use connectionist rather than GMM?
    What are the different speech corpora at ICSI or elsewhere?
    Tell me more about the ICSI/Speech file system resources.
    What disk space can I use? How can I get more?
    What computational resources are available at ICSI?
    What are the principal programs used to do speech recognition research at ICSI?
    How do I search for relevant research publications?
    How do I write a script for general use?
    How do I compile the ICSI tools at other sites?
    What is SPRACHcore?
    Which C/C++ compilers are available?
    How do I use run-command to run programs on idle machines?

File/data formats
    I found this file. What is it?
    What are the wavfile data formats, and how can I manipulate wavfiles?
    What are the feature data formats?
    What are the training target data formats?
    What are the neural net data formats?
    What are the posterior probability data formats?
    What are the HMM model data formats?
    What are phi files? How do I build one?
    What are the dictionary data formats?
    What are the grammar data formats?
    What are the label data formats?
    What are the reference transcript data formats?
    What about phoneset definitions and files?
    What are lattices?
    What other data formats are there?
    How should I structure the directories for my new task?

Signal processing and audio
    How is the SNR of a speech example defined?
    How do I convert a time in seconds into a frame index?
    How can I simulate different acoustic conditions?
    How can I make acoustic measurements?

Features
    What are features? What are their desirable properties?
    What features are commonly used?
    How do you calculate rasta and/or plp features?
    How do you calculate MSG features?
    What kinds of normalization are there? How do you calculate them?
    What are delta features? How do you calculate them?
    How can I create my own novel features?
    How do different features compare performancewise?
    How can I run the SRI front-end standalone?
# What situations are they best suited for (spectral, cepstral, plain, plp, rasta, modspec/msg)?

Neural nets
    What is the function of the neural net?
    What kinds of neural nets are there?
    How are neural nets trained?
    How long and how many epochs does it take to train a net?
    How do I apply a previously-trained neural net to some data?
    Tell me about the SPERT boards.
    Tell me about the MultiSPERT systems.
    How does neural net size affect performance?
    How does neural net randomization affect performance?
    What is posterior combination?
    What other kinds of combination are possible?

Training a recognizer
    What does it mean to train a speech recognizer?
    I just got this new data. How can I start training from it?
    How do I get target labels to use in training?
    What is forced alignment?
    How can I use the alignment files produced by the SRI trainer?
    What is embedded training?

Grammars and dictionaries
    What are grammars for?
    How do I build a bigram grammar for Y0?
    How do I build an n-gram grammar for noway or chronos?
    What are dictionaries for?
    How do I build a dictionary?

Decoding
    What is decoding?
    What are the decoders we use at ICSI?
    How do I decode...
    How do I decode with Y0?
    How do I decode with Noway?
    How do I decode with Chronos?
    How do I decode with HTK???
    How do I balance insertions and deletions?
    What is the format of Noway acoustic scores?
    How should I test for significant differences between systems?

Tangential topics
    How do I prepare and print a poster?