ICSI Speech FAQ:
2.8 What are the principal programs used to do speech recognition research at ICSI?

Answer by: dpwe+gelbart - 2000-07-19, 2000-11-26


This page gives a brief list of the most important programs used in speech recognition research at ICSI. Many are locally written, with source in /u/drspeech/src. Most entries link to the relevant man page. Please extend at will.

qnstrn
The quicknet neural-net training program.

qnsfwd
Neural-net forward pass with quicknet (for recognition runs).

qncopy
Feature file manipulation using the quicknet run-time processing (deltas, normalization etc.).

qnnorm
Calculates the global normalization constants for a pfile feature archive.

feacat
Universal feature-file conversion and modification (overlaps with qncopy).

labcat
Universal label-file conversion and modification (based on feacat).

linecat
Takes the same arguments as feacat and labcat, but operates on the lines of ASCII text files (e.g. filename lists).

sndplay
Plays a soundfile over the local audio hardware. Attempts to guess the correct format.

sndcat
Implements a range of conversions between different sound data and file formats.

feacalc
Standard feature calculation: converts sets of wav files into Rasta or PLP feature archives.

rasta
Classic feature calculation program for Rasta-PLP. Provides the core routines to feacalc, but has a more awkward command-line interface.

y0
Venerable Viterbi HMM stack decoder. Use is now deprecated, mainly because it uses archaic file formats.

noway
Steve Renals's start-synchronous HMM decoder. Our favorite decoder - it handles all the right file formats, has lots of options, and works well.

chronos
The superfast time-first HMM decoder from Tony Robinson/Softsound. This works best for Broadcast News, but is less flexible than noway, and, incidentally, proprietary.

wordscore
The old, reliable dynamic-programming-based word-error-rate (WER) calculation program written at ICSI. More complex tasks use the much more sophisticated package based around NIST's sclite.

dr_embed
Brian Kingsbury's integrated training/testing/realignment script, with which he did so many Numbers tests. Deprecated because it uses y0.

xwaves
The xwaves program can be used for graphical display and analysis of audio files. It is part of the ESPS/waves+ package from Entropic. Online documentation for the package is available using the program exman. A linked page explains how to set up your environment variables to use the ESPS/waves+ package.

wavesurfer.tcl
The wavesurfer.tcl program can be used for graphical display and analysis of audio files. The executable is 'wavesurfer.tcl' (if it is not found, put /u/drspeech/share/bin in your path). If there is a library problem, try 'setenv TCLLIBPATH "$SPEECH_DIR/share/lib $SPEECH_DIR/$SPEECH_ARCH/lib"' where $SPEECH_DIR is '/u/drspeech' and $SPEECH_ARCH is 'sun4-sunos5' or whatever architecture you are on (it may not presently be installed on other architectures, but the software is portable).
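
For readers unfamiliar with the dynamic-programming idea behind word-error-rate scoring (see the wordscore entry above), here is a minimal Python sketch. It is not wordscore's code and does not reproduce its options or file formats; the function name word_error_rate and the whitespace tokenization are assumptions made purely for illustration. It computes the minimum number of word substitutions, deletions and insertions needed to turn the reference into the hypothesis, divided by the reference length.

```python
def word_error_rate(reference, hypothesis):
    """Illustrative WER: word-level edit distance / number of reference words.

    Hypothetical helper for explanation only; not the wordscore algorithm.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (initially none) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        # Aligning i reference words against an empty hypothesis costs
        # i deletions.
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)
```

For example, word_error_rate("one two three four", "one three four five") counts one deletion ("two") and one insertion ("five") against four reference words. Real scoring tools also report the alignment itself and per-error-type counts, which this sketch omits.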



Generated by build-faq-index on Tue Mar 24 16:18:14 PDT 2009