ICSI Speech FAQ:
3.13 What about phoneset definitions and files?

Answer by: dpwe - 2000-08-02


Our neural net classifiers estimate posterior probabilities for a set of monophones. These numbers show up in separate columns of the net's LNA output, which raises the problem of establishing which column in the file corresponds to which phone symbol. An equivalent statement of the problem is how to interpret the numerical indices in target files in terms of the phone symbols used in phone label files. This information is typically defined by a phoneset file (*.phset), and the few standard sets we use are in /u/drspeech/data/phonesets. The basic format simply starts with the total number of phone symbols, then lists them in order along with the numerical index to which each corresponds (in sequence from 0 to N-1).
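As a rough illustration of that format, here is a minimal Python sketch of a reader. The exact layout is an assumption on my part (a count on the first line, then one whitespace-separated "symbol index" pair per line); check it against a real *.phset file before relying on it. The names parse_phset and symbol_to_index are hypothetical helpers, not part of any ICSI tool.

```python
from typing import Dict, List


def parse_phset(text: str) -> List[str]:
    """Return a list in which position i holds the phone symbol for index i.

    ASSUMED layout (not verified against real *.phset files):
      line 1:       N, the total number of phone symbols
      lines 2..N+1: "<symbol> <index>", with indices running 0..N-1 in order
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty phoneset definition")

    count = int(lines[0])
    entries = lines[1:]
    if len(entries) != count:
        raise ValueError(f"header says {count} phones, found {len(entries)}")

    symbols: List[str] = []
    for expected, line in enumerate(entries):
        fields = line.split()
        if len(fields) != 2:
            raise ValueError(f"cannot parse entry: {line!r}")
        symbol, index = fields[0], int(fields[1])
        # Indices must run in sequence from 0 to N-1.
        if index != expected:
            raise ValueError(f"expected index {expected}, got {index}")
        symbols.append(symbol)
    return symbols


def symbol_to_index(symbols: List[str]) -> Dict[str, int]:
    """Invert the list: phone symbol -> column / target index."""
    return {sym: i for i, sym in enumerate(symbols)}


# Toy three-phone set, made up purely for illustration.
TOY_PHSET = """3
h# 0
aa 1
iy 2
"""

toy_symbols = parse_phset(TOY_PHSET)
toy_lookup = symbol_to_index(toy_symbols)
```

With a list like toy_symbols, column i of the LNA output (or index i in a target file) corresponds to toy_symbols[i], and toy_lookup goes the other way, from a label-file symbol to its index.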

The phoneset definition is a fundamental attribute of the recognizer, since all the acoustic models and pronunciations depend on it, so it would be nice if we could use just one for all tasks. In fact that's not possible, since different dialects (and different languages) need different phone sets. But even beyond that, historical factors have left us with several slightly different phonesets for basically equivalent tasks. The ones you might come across include:

icsi56
For a while, this was almost the only phoneset used at ICSI. It's large, including some rare phones like /q/ (glottal stop) and /nx/ (nasal flap). You can spot target label files indexed into this set because /h#/ (background) is number 54.
cam54
Much like icsi56, except without /q/ (maps to ? /tcl/) and /nx/ (maps to ? /n/ or possibly /dx/), and in a different order (stops and bursts are at the end rather than the beginning, and, critically, /h#/ is 0 instead of 54). This was what Cambridge was using for Broadcast News, so we use it in our Broadcast News-derived large-vocabulary work too.
timit61
The even more comprehensive phone set defined for the original TIMIT task. Includes all of icsi56, plus /eng/ (syllabic /ng/, maps to /ng/), /ux/ (rounded /iy/, that French thing, maps to /uw/), /epi/ (epenthetic stop, conventionally mapped to /h#/), /pau/ (short pause, maps to /h#/) and /ax-h/ (aspirated schwa, maps to /ax/). /h#/ is 57 in this set; you won't come across it often.
beep45
A 45-phone set used at Cambridge for the original British English work (esp. Tony's BEEP dictionary).
bbc42
Cut-down version of BEEP used in BBC Broadcast News recognition in the THISL project. Three diphthongs in beep45 were discarded and mapped as follows: /ea/ -> /eh/ /ax/, /ia/ -> /ih/ /ax/, /ua/ -> /uh/ /ax/ (although this last one mostly ends up as /uh/).
cmu39
Commonly used minimal English phoneset. We don't actually use this or have a phoneset file for it. Doesn't distinguish stops from bursts (i.e. no /tcl/, just /t/) which saves a lot of phones (this is also why the British English sets are smaller).

Except as noted, each name above corresponds to a *.phset file in /u/drspeech/data/phonesets.
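The mappings mentioned in the list above can be written down as small lookup tables. The following Python sketch only encodes the correspondences this answer states outright (the timit61 extras, the three beep45 diphthongs, and the /h#/ indices); the uncertain cam54 mappings for /q/ and /nx/ are deliberately left out. All names here are hypothetical, chosen for illustration, and none of this is taken from an existing ICSI script.

```python
from typing import Dict, List, Tuple

# timit61 symbols that are not in icsi56, folded onto icsi56 symbols
# as described in the list above.
TIMIT61_TO_ICSI56: Dict[str, str] = {
    "eng": "ng",
    "ux": "uw",
    "epi": "h#",
    "pau": "h#",
    "ax-h": "ax",
}

# beep45 diphthongs that were discarded in bbc42, each expanded to two phones.
# The text notes that /ua/ mostly ends up as plain /uh/ in practice;
# this table uses the nominal two-phone mapping.
BEEP45_TO_BBC42: Dict[str, Tuple[str, ...]] = {
    "ea": ("eh", "ax"),
    "ia": ("ih", "ax"),
    "ua": ("uh", "ax"),
}

# Index of /h#/ (background) in each set, where the text gives one.
BACKGROUND_INDEX: Dict[str, int] = {
    "icsi56": 54,
    "cam54": 0,
    "timit61": 57,
}


def fold_timit61(labels: List[str]) -> List[str]:
    """Map a timit61 label sequence onto icsi56 symbols, one-for-one."""
    return [TIMIT61_TO_ICSI56.get(phone, phone) for phone in labels]


def expand_beep45(labels: List[str]) -> List[str]:
    """Rewrite a beep45 phone sequence using only bbc42 symbols."""
    out: List[str] = []
    for phone in labels:
        out.extend(BEEP45_TO_BBC42.get(phone, (phone,)))
    return out


def phonesets_with_background_at(index: int) -> List[str]:
    """List the phonesets (of those above) whose /h#/ has this index."""
    return sorted(name for name, i in BACKGROUND_INDEX.items() if i == index)
```

For example, fold_timit61(["pau", "ux", "t"]) gives ["h#", "uw", "t"], and phonesets_with_background_at(0) gives ["cam54"], which is the quick check for telling cam54 targets apart from icsi56 ones.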

The same information is held in the more modern phi file, which also defines the basic structure of the per-phone HMM models.



Generated by build-faq-index on Tue Mar 24 16:18:15 PDT 2009