ICSI Speech FAQ:
3.4 What are the training target data formats?

Answer by: dpwe - 2000-07-27


A neural net training with qnstrn essentially finds a net to map between an input feature archive and a corresponding set of target class labels. These labels consist of a small integer (typically in the range 0-50) for each frame in the feature archive, interpreted as an index into a specific phoneset file (from /u/drspeech/data/phonesets), which defines the actual monophone class to which that frame was assigned during labeling. Since every training requires a target file, there are a large number of these target label archives knocking around.
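
As a concrete illustration, here is a minimal Python sketch of the index-to-phone mapping. It assumes a phoneset file that simply lists one phone symbol per line, with the line order giving the index; the real files under /u/drspeech/data/phonesets may carry extra header information, and the label values shown are made up.

def load_phoneset(path):
    """Return a list mapping label index -> phone symbol."""
    with open(path) as f:
        return [line.split()[0] for line in f if line.strip()]

phones = load_phoneset("icsi56.phset")    # hypothetical local copy of a phoneset
frame_labels = [3, 3, 3, 17, 17, 42]      # one small integer per feature frame
print([phones[i] for i in frame_labels])  # the corresponding phone symbols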

(n.b. the target files described on this page hold almost exactly the same data as phone label files, whose format is described on the label file formats page. The difference is in how they are used: label files are for editing, display, scoring etc., whereas target files are designed specifically for use in trainings.)

In the old days, both the features and the targets used to be stored in the same file. This made sense, since the two sets of data had to be exactly conformal - we need a target frame for each feature frame, the same number of utterances, the same order etc. However, this turned out to be not such a great idea, since the features and the labels for a given corpus may take several forms independently. For instance, there may be several different feature calculation methods that you want to try out with the same target labels. More seriously, you may want to try trainings, based on the same features, with several different labellings. This frequently arises in embedded training, where the neural net from an initial training is used to relabel the training set (through forced alignment). We certainly don't want multiple copies of the feature archive, identical except for different labels, since feature archives are the largest single component of our filesystem. We used to rewrite the feature files with new labels as they were updated, but this is messy and dangerous: how can you be sure which labels are currently resident in the feature archive?

So now we have separate files for the labels, and just rely on good management practice to ensure that they exactly match the intended feature files in terms of utterance lengths etc. (qnstrn will tell you soon enough if they don't). The target label files come in two main types. For the historical reasons above, labels can be stored in pfiles, which is in principle the same file format as the feature archives, except that a label-only pfile has the peculiar characteristic of containing zero columns of features (and just the single optional column of labels). These pfiles are often given the extension ".pflab" to distinguish them from regular feature archives. Even so, pfiles are horribly inefficient for this purpose, needing 12 bytes for each frame.
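
To see where the 12 bytes per frame comes from, and how such a file might be read: in the usual QuickNet pfile layout (stated here as an assumption - check the real header before relying on it), a large ASCII header is followed by one record per frame made of big-endian 32-bit words: sentence index, frame index, then the feature and label columns. With zero feature columns and one label column, that is three words, i.e. 12 bytes, per frame. A hedged Python sketch:

import struct

HEADER_SIZE = 32768       # assumed size of the ASCII pfile header
BYTES_PER_FRAME = 12      # 3 big-endian 32-bit words: sentence, frame, label

def read_pflab(path, num_frames):
    """Yield (sentence, frame, label) triples from a label-only pfile.
    num_frames should be taken from the pfile header; reading blindly to EOF
    would also pick up any index data stored after the frame records."""
    with open(path, "rb") as f:
        f.seek(HEADER_SIZE)
        for _ in range(num_frames):
            yield struct.unpack(">iii", f.read(BYTES_PER_FRAME))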

More recently, the ilab (ICSI label) format has been devised. This uses run-length encoding to store sequences of identical labels (as frequently occur) in a compact manner, and the resulting files are often about 1/30th the size of the same information stored in a pflab file. ilab files use the extension *.ilab.
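
The saving comes directly from the run-length idea: phone labels typically stay constant over many consecutive frames, so storing one (label, run length) pair per segment is far cheaper than one record per frame. The following sketch shows the principle only - it is not the actual ilab byte layout.

from itertools import groupby

def run_length_encode(frame_labels):
    """Collapse a per-frame label sequence into (label, run_length) pairs."""
    return [(lab, sum(1 for _ in run)) for lab, run in groupby(frame_labels)]

labels = [3] * 40 + [17] * 25 + [42] * 35    # 100 frames, three phone segments
print(run_length_encode(labels))             # [(3, 40), (17, 25), (42, 35)]
# 100 frames in a pflab would cost 1200 bytes; three runs cost only a small
# fraction of that, which is where the large size reduction comes from.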

Different label file formats can be interconverted with labcat. Particularly handy is the -phonesetfile option, which allows the ascii formats to show the phone symbols rather than the numerical indices. Thus you can construct idioms such as the following, which displays the phone sequence for a particular utterance of a label file:


labcat -phonesetfile /u/drspeech/data/phonesets/icsi56.phset \
   -ipf ilab -opf ascii \
   -sr 0 /u/drspeech/data/aurora/label/a2-clean/a2-train-rand-i1.ilab \
| awk '{print $3}' | uniq | awk '{printf "%s ",$1;}'
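
Reading the pipeline left to right: labcat dumps the utterance selected with -sr 0 in ascii, with phone symbols substituted for the numeric labels thanks to -phonesetfile; the first awk picks out the phone label column (the third field), uniq collapses the runs of identical per-frame labels into one entry per phone segment, and the final awk prints the resulting phone sequence on a single line.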

