ICSI Speech FAQ:
3.8 What are phi files? How do I build one?

Answer by: dpwe - 2000-08-15


phi files are data files describing the phone set and phone models to be used in decoding. They are required for chronos, and are supposedly supported as an option by noway, although there have been problems getting them to work with noway.

The phi file format was defined by Cambridge. Recognizing that phone model files were being used only to define minimum durations and exit probabilities, the phi file makes these parameters explicit. phi files also specify the phoneset being used and, implicitly, the correspondence between phones and the elements in the posterior probability frames fed to the decoder. There is more information on the abbotDoc page on phone models.

You can build a phi file, including estimates of the phone priors, the minimum durations and the exit probabilities, using the SoftSound program states2phi. It takes as input a set of target labels (from a previous alignment) plus a "tab" file, which is simply the list of phone symbols, one per line, in the order in which they occur in the posterior frames. A typical use of states2phi might look like this (from /u/drspeech/data):

labcat -opf pre aurora/label/a2-plp+msg-2/a2-train-rand.ilab \
| states2phi -ninp 3 -phntab phonesets/icsi56.tab - aurora/lex/digits.phi

(The -ninp 3 option lets states2phi correctly skip the three pad bytes inserted between labels in the pre file format, which pads frames to multiples of four bytes, with the first byte in each frame being the target/EOS flag.)

Warning: chronos (and noway?) get very upset if the prior on any phone is too small: the 16-bit integer representation of the log-probability can 'wrap' around, giving total nonsense results. This can easily happen if certain phones in your phoneset (which may be general purpose) don't occur in your task (which may be small-vocabulary). Thus, after you have built your phi file, you should go in manually and raise any priors (the first numerical column) that are smaller than 0.000200 to exactly 0.000200.
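
If hand-editing is tedious, the flooring step can be scripted. The following is only a rough sketch in Python, not a SoftSound tool. It assumes the phi file is plain text with whitespace-separated fields and that the first field on a line that parses as a number is the prior; the function name floor_priors is made up for illustration. Compare the output against the original file before using it, since header lines or other layouts may need special handling.

```python
import re

PRIOR_FLOOR = 0.000200  # floor recommended in the FAQ text above


def _is_number(token):
    """True if the token parses as a float."""
    try:
        float(token)
        return True
    except ValueError:
        return False


def floor_priors(lines, floor=PRIOR_FLOOR):
    """Raise the first numeric field of each line to `floor` if it is smaller.

    ASSUMPTION: the first numeric field on a line is the phone prior.
    Lines without any numeric field are passed through unchanged.
    """
    out = []
    for line in lines:
        # Keep the whitespace runs so the original column layout survives.
        parts = re.split(r"(\s+)", line)
        for i, token in enumerate(parts):
            if token and not token.isspace() and _is_number(token):
                if float(token) < floor:
                    parts[i] = "%.6f" % floor
                break  # only the first numeric column is treated as the prior
        out.append("".join(parts))
    return out
```

A typical use would be to read the phi file with readlines(), pass the list through floor_priors, and write the result to a new file, leaving the original untouched.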

Note that there are two styles of phi files, selected by the -icsi flag to states2phi. The style determines how the (crude) duration modeling is done through repeated states and exit probability variation. We've had mixed results; it's probably worth trying both.

Converting phi files to noway-format phone model files (gelbart, 12/10/02)

This email from Dan explains it:
From dpwe@ee.columbia.edu Tue Dec 10 16:48:06 2002
Date: Tue, 19 Nov 2002 21:39:09 -0500
From: Dan Ellis 
To: David Gelbart 
Subject: Re: noway with phi files 

Dave - 

I never got noway to work with phi files.  I believe it's a known bug,
and it's probably something quite simple.

/u/drspeech/sun4-sunos5/bin/softsound/phi2models

does a pretty good job.  I use -icsi -sil_index 54 (or whatever) 

Then you also have to construct a priors file; I do this by hand based
on the phi file (one number per line which is the prior column from
the phi file).

The phi2models tool will create an interword-pause model, which is required by noway. It shares the acoustic model of the silence phone (thus the need for the -sil_index argument to phi2models, I guess). I don't think a prior needs to be assigned to interword-pause; I think it shares the silence phone's prior.

I have had trouble with phi2models when -sil_index was not zero. It did not seem to respect the value of -sil_index that I gave it: it kept setting the acoustic model of interword-pause to model 0 in the noway phone model file that it created. I worked around this by manually editing the noway phone model file. I am still not sure why this happened or whether I addressed it the right way.
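
The priors file described in the email above (one number per line, taken from the prior column of the phi file) could also be generated rather than typed by hand. This is a minimal sketch in Python under the same assumption as before: each phone line is whitespace-separated and its first numeric field is the prior. The function name extract_priors is hypothetical, and only the phone-entry lines of the phi file should be passed in, because any other line containing a number would also contribute an entry.

```python
def extract_priors(phi_lines):
    """Return the first numeric field of each line, as text, in file order.

    ASSUMPTION: the first numeric field on a phone line is the prior.
    Lines with no numeric field contribute nothing.
    """
    priors = []
    for line in phi_lines:
        for token in line.split():
            try:
                float(token)
            except ValueError:
                continue  # not a number (e.g. the phone symbol), keep looking
            priors.append(token)
            break  # take only the first numeric column
    return priors
```

Writing "\n".join(extract_priors(lines)) plus a final newline to a file gives one prior per line, in the same order as the phones in the phi file.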


Generated by build-faq-index on Tue Mar 24 16:18:15 PDT 2009