ICSI Speech FAQ:
2.8 What are the principal programs used to do speech recognition research at ICSI?

Answer by: dpwe+gelbart - 2000-07-19, 2000-11-26

This page gives a brief list of the most important programs used in speech recognition research at ICSI. Many are locally written, with source in /u/drspeech/src. Most entries link to the relevant man page. Please extend at will.

The quicknet neural-net training program.

Neural-net forward pass with quicknet (for recognition runs).

Feature file manipulation using the quicknet run-time processing (deltas, normalization etc.).

Calculates the global normalization constants for a pfile feature archive.
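To make "global normalization constants" concrete, here is a minimal sketch in Python. It assumes the constants are a per-dimension mean and standard deviation taken over every frame of every utterance; whether the ICSI tool uses exactly this definition (for example, population rather than sample variance) is an assumption. The function name and the list-of-lists input are hypothetical, and the sketch does not read the pfile format.

```python
import math


def global_norm_constants(utterances):
    """Return (means, stds), one value per feature dimension.

    `utterances` is a hypothetical in-memory stand-in for a feature
    archive: a list of utterances, each a list of frames, each frame a
    list of floats of the same length.
    """
    n_frames = 0
    sums = None
    sq_sums = None
    for frames in utterances:
        for frame in frames:
            if sums is None:
                # Size the accumulators from the first frame seen.
                sums = [0.0] * len(frame)
                sq_sums = [0.0] * len(frame)
            for d, x in enumerate(frame):
                sums[d] += x
                sq_sums[d] += x * x
            n_frames += 1
    if n_frames == 0:
        raise ValueError("no frames found")
    means = [s / n_frames for s in sums]
    # Population standard deviation (an assumption); clamp tiny negative
    # variances caused by floating-point rounding.
    stds = [
        math.sqrt(max(q / n_frames - m * m, 0.0))
        for q, m in zip(sq_sums, means)
    ]
    return means, stds
```

Each feature value x in dimension d would then typically be normalized as (x - means[d]) / stds[d] before being presented to the net.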

Universal feature-file conversion and modification (overlaps with qncopy).

Universal label-file conversion and modification (based on feacat).

Takes the same arguments as feacat and labcat, but operates on the lines of ASCII text files (e.g. filename lists).

Plays a soundfile over the local audio hardware. Attempts to guess the correct format.

Implements a range of conversions between different sound data and file formats.

Standard feature calculation: converts sets of wav files into Rasta or PLP feature archives.

Classic feature calculation program for Rasta-PLP. Provides the core routines to feacalc, but has a more awkward command-line interface.

Venerable Viterbi HMM stack decoder. Use is now deprecated, mainly because it uses archaic file formats.

Steve Renals's start-synchronous HMM decoder. Our favorite decoder: it handles all the right file formats, has lots of options, and works well.

The superfast time-first HMM decoder from Tony Robinson/Softsound. This works best for Broadcast News, but is less flexible than noway, and, incidentally, proprietary.

The old, reliable dynamic-programming-based word-error-rate (WER) calculation program written at ICSI. More complex tasks use the much more sophisticated package based around NIST's sclite.
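For readers new to the idea, dynamic-programming WER scoring can be sketched as a word-level edit distance. The Python below is a generic illustration, not the ICSI program or sclite: the function name is hypothetical, substitutions, deletions and insertions are all assumed to cost 1, and no text normalization is applied.

```python
def word_error_rate(reference, hypothesis):
    """Return (substitutions + deletions + insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # consumed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "one two four" against the reference "one two three four" gives one deletion out of four reference words. Real scoring tools also report the alignment and the separate substitution, deletion and insertion counts, which this sketch omits.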

Brian Kingsbury's integrated training/testing/realignment script, with which he did so many Numbers tests. Deprecated because it uses Y0.

The xwaves program can be used for graphical display and analysis of audio files. It is part of the ESPS/waves+ package from Entropic. Online documentation for the package is available using the program exman. Your environment variables must be set up before you can use the ESPS/waves+ package.

The wavesurfer.tcl program can be used for graphical display and analysis of audio files. The executable is 'wavesurfer.tcl'; if it is not found, add /u/drspeech/share/bin to your path. If there is a library problem, try 'setenv TCLLIBPATH "$SPEECH_DIR/share/lib $SPEECH_DIR/$SPEECH_ARCH/lib"', where $SPEECH_DIR is '/u/drspeech' and $SPEECH_ARCH is 'sun4-sunos5' or whatever architecture you are on. (It may not presently be installed on other architectures, but the software is portable.)


Generated by build-faq-index on Tue Mar 24 16:18:14 PDT 2009