recogviz - Visualizing the speech recognition process


recogviz is an [incr Tcl] program that displays a number of stages of the recognition process. It was originally developed as a spin-off from the BeRP demo, but is oriented towards inspecting offline-calculated data rather than live speech input.

The intended operation is as follows. You configure the script (eventually through a config file, I suppose) to use the feature file, classifier net and decoder parameters that you are researching. You then run the program, and it displays all the utterance IDs in your pfile (which it reads from the listfile you pointed it to). You can then click on different utterance IDs, and it will display the results of running one or more recognizers on that utterance.

For each recognizer, it first shows a spectrogram of the original sound (simple FFT-based, with fixed parameters not related to the actual speech features). Below that is the time scale in seconds, then a grayscale image showing the actual feature frames fed to the neural net forward pass (rather uninformative for decorrelated cepstral-style features). Below that is a representation of the posterior probabilities emitted by the forward pass, which feed into the decoder. Finally, when the decoder finishes, it generates backtrace information, which is displayed as phone and word alignments.


How to use it

The code currently resides in /u/dpwe/projects/recogviz/. The executable is recogviz (a shell script which normalizes the environment then launches the [incr Tcl] windowing shell), and you run it with something like:

    ./recogviz defaults=./defaults.def recog1=./recog1.def

defaults.def and recog1.def are parameter files (actually pure Tcl source) that define the recognizers to be used; the defaults file is intended to define the stuff that mostly stays the same, whereas the recog files define the very specific things.

Specifying recog2=... will cause the program to display a second recognizer underneath the first, feeding the same utterance to both (assuming the pfiles both conform to the same list file!).

Thus, for this demonstration, the defaults file contains the following:

Then the recog definition file contains the few specifications needed to define a particular recognition strategy:

Notice how the Tcl variable DPWE is defined in the defaults file but used in the recog file; this is OK because they're actually just executed by the Tcl interpreter one after the other.
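
As a minimal sketch of why this ordering works, the fragment below evaluates two made-up parameter scripts one after the other in a single Tcl interpreter. Only the variable name DPWE comes from the description above; the other parameter names (listfile, featfile), the paths, and the values are hypothetical stand-ins, not the real recogviz settings.

```tcl
# Hypothetical stand-in for the contents of defaults.def.
# DPWE is the only name taken from the text; its value here is invented.
set defaultsText {
    set DPWE /some/base/dir
    set listfile $DPWE/data/utts.list
}

# Hypothetical stand-in for the contents of recog1.def.
# It refers to DPWE without defining it.
set recogText {
    set featfile $DPWE/data/feats.pfile
}

# Evaluate the two scripts one after the other in the same interpreter,
# much as sourcing the two files in sequence would.
eval $defaultsText
eval $recogText

puts $featfile   ;# prints /some/base/dir/data/feats.pfile
```

Swapping the two eval lines makes the second script fail with a "no such variable" error for DPWE, which is why anything shared has to be set in the defaults file.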

These parameter files are just examples; if necessary, you could completely redefine the forward-pass and decoder invocations. But this structure should cover a number of interesting instances. Why not have a go!


Updated: $Date: 1997/10/03 03:40:16 $
Dan Ellis <dpwe@icsi.berkeley.edu>
International Computer Science Institute, Berkeley CA