ICSI Speech FAQ:
9.5 What is the format of Noway acoustic scores?

Answer by: gelbart - 2001-05-04


The word acoustic scores output by noway (for example in lattices) puzzled me at first. Notice from the lattice fragment below that the acoustic score (in the a= field) can be either positive or negative. A probability is never negative and a log probability is never positive, so neither interpretation fits.


J=0 S=0 E=1 W=[uh] v=0 a=-4.43322 r=-0.00427245
J=2 S=1 E=3 W=<SIL> v=0 a=1.74243 r=0

I received clarification on this from Eric Fosler-Lussier:

It actually is log10(acoustic score).  Just for the record:

J=link #
S=start node
E=end node
W=word
v=pronunciation version
a=acoustic score (log10)
r=pronunciation score (log10)

The pronunciation probability is always between 0 and 1, so you'll
always get a zero or negative score in this field (exactly 0 when the
probability is 1, as in the <SIL> line above).
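To make the log10 encoding concrete, here is a small Python sketch that splits an arc line like the two in the fragment above into its fields and converts a= and r= back to the linear domain. It assumes the whitespace-separated key=value layout shown in the fragment; the function name and dictionary keys are hypothetical, not part of noway.

```python
def parse_lattice_line(line):
    """Split a noway lattice arc line such as
    'J=0 S=0 E=1 W=[uh] v=0 a=-4.43322 r=-0.00427245' into a dict.
    ASSUMPTION: fields are whitespace-separated key=value pairs."""
    fields = {}
    for token in line.split():
        key, _, value = token.partition("=")  # split on the first '=' only
        fields[key] = value
    arc = {
        "link": int(fields["J"]),
        "start": int(fields["S"]),
        "end": int(fields["E"]),
        "word": fields["W"],
        "pron_version": int(fields["v"]),
        "acoustic_log10": float(fields["a"]),
        "pron_log10": float(fields["r"]),
    }
    # Undo the log10 to see the raw (linear-domain) values.
    arc["acoustic_linear"] = 10.0 ** arc["acoustic_log10"]
    arc["pron_prob"] = 10.0 ** arc["pron_log10"]
    return arc


for text in ["J=0 S=0 E=1 W=[uh] v=0 a=-4.43322 r=-0.00427245",
             "J=2 S=1 E=3 W=<SIL> v=0 a=1.74243 r=0"]:
    arc = parse_lattice_line(text)
    print(arc["word"], arc["acoustic_linear"], arc["pron_prob"])
```

For the <SIL> arc the linear acoustic value comes out at roughly 55, well above 1, while its pronunciation probability is exactly 1.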

However, the acoustic probability can be greater than 1. This is
because it's not really a probability but a scaled likelihood,
P(Q|X)/P(Q), rather than the likelihood P(X|Q). That is, we're
accumulating the following:

  \sum_{i=1}^{n} \log_{10} \frac{P(q_i \mid x_i)}{P(q_i)}

where x_1..x_n are the acoustic vectors and {q_1..q_n} is the (best)
state path of the word hypothesis. By Bayes' rule each ratio
P(q_i|x_i)/P(q_i) equals P(x_i|q_i)/P(x_i). The upshot is that,
because we don't factor in the probability of the acoustics (P(X)),
you can get values greater than one (particularly when a word has an
unlikely phone in it, since a small prior P(q_i) in the denominator
inflates the ratio).
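A minimal Python sketch of the per-frame log10 ratios log10(P(q_i|x_i)/P(q_i)) accumulated along the path shows why a rare phone pushes the score positive. It assumes the posterior and prior for each best-path state are already available; the function name and the numbers are made up for illustration.

```python
import math


def log10_scaled_likelihood(posteriors, priors):
    """Sum over frames of log10(P(q_i|x_i) / P(q_i)).
    posteriors[i]: network posterior for best-path state q_i at frame i
    priors[i]:     prior probability P(q_i) of that state"""
    return sum(math.log10(post / prior)
               for post, prior in zip(posteriors, priors))


# Hypothetical numbers: three frames of a rare phone (prior 0.01)
# that the network is fairly confident about (posterior 0.5).
rare = log10_scaled_likelihood([0.5, 0.5, 0.5], [0.01, 0.01, 0.01])
# The same posteriors for a common phone (prior 0.6).
common = log10_scaled_likelihood([0.5, 0.5, 0.5], [0.6, 0.6, 0.6])
print(rare, common)
```

The rare phone gives 3 * log10(50), about 5.1, a positive score, while the common phone gives a small negative one, even though the network outputs are identical.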

And also from Steve Renals, the principal author of noway:

The short answer is that acoustic scores are products of scaled
likelihoods and transition probabilities (with some fudge factors).
Scaled likelihoods are not probabilities and can be greater than 1,
hence the values you see.

The details:

If PhLP is the log scaled likelihood of state (phone) q at time t:

  PhLP(q,t) = acoustic_scale*log10(out(q,t)) - log10(prior(q))
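A direct transcription of PhLP into Python might look like the following. It is a sketch only: it assumes acoustic_scale multiplies just the log10 network output (not the prior term), and that out and prior are plain probabilities; the function name is hypothetical.

```python
import math


def phone_log_prob(out, prior, acoustic_scale=1.0):
    """PhLP(q,t): log10 scaled likelihood of state q at frame t.
    out:   network output (posterior estimate) for q at t
    prior: prior probability of q
    ASSUMPTION: acoustic_scale multiplies only the log10(out) term."""
    return acoustic_scale * math.log10(out) - math.log10(prior)


print(phone_log_prob(0.5, 0.01))
```

With out = 0.5, prior = 0.01 and acoustic_scale = 1, PhLP = log10(50), about 1.7, which is positive even though both inputs are probabilities.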

and TrLP is the log transition probability from state q1 to state q2:

if(q2==EXIT-STATE && q1!=ENTRY-STATE)
 TrLP(q1, q2) = log10(tprob(q1,q2)*phone_deletion_penalty*duration_scale)
else
 TrLP(q1, q2) = log10(tprob(q1,q2)*duration_scale)
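The two TrLP cases can be sketched in Python as below. The tprob callable, the ENTRY/EXIT labels, and the default factor values of 1.0 are hypothetical stand-ins; noway's actual data structures are not described here.

```python
import math

# Hypothetical labels for the null entry and exit states.
ENTRY, EXIT = "ENTRY", "EXIT"


def transition_log_prob(q1, q2, tprob,
                        phone_deletion_penalty=1.0, duration_scale=1.0):
    """TrLP(q1, q2) following the two cases above.
    tprob is a hypothetical callable returning the HMM transition
    probability from q1 to q2."""
    p = tprob(q1, q2) * duration_scale
    # The penalty applies only when leaving a phone from a real state.
    if q2 == EXIT and q1 != ENTRY:
        p *= phone_deletion_penalty
    return math.log10(p)


def half(a, b):
    # Toy transition model: every transition has probability 0.5.
    return 0.5


print(transition_log_prob("s3", EXIT, half, phone_deletion_penalty=0.1))
```

Because log10(a*b*c) = log10(a) + log10(b) + log10(c), the phone deletion penalty and duration scale act as constant additive offsets on the log score.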

then, if a word spans times t to t+n with state (phone) path
q(t ... t+n), WdLP, the log acoustic score of that word, is the value
written into lattices:

  WdLP = \sum_{i=t}^{t+n} [PhLP(q(i), i) + TrLP(q(i), q(i+1))]

This is slightly complicated by entry and exit states, but since
  TrLP(ENTRY-STATE,FIRST-STATE) = 0
  TrLP(LAST-STATE,EXIT-STATE) = x  (typically x = log10(0.5))
  TrLP(EXIT-STATE(q), ENTRY-STATE(q')) = 0
we have the "right number" of transitions.
(ENTRY and EXIT are null states, FIRST and LAST are real states).
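Putting PhLP, TrLP and the entry/exit convention together, the WdLP sum can be sketched in Python as follows. This is an illustration under stated assumptions, not noway's code: the state names, dictionaries and EXIT label are hypothetical, acoustic_scale is assumed to multiply only the log10 output, and the phone deletion penalty and duration scale are taken as 1.

```python
import math

EXIT = "EXIT"  # hypothetical label for the word-final null state


def word_log_score(path, outputs, priors, tprob, acoustic_scale=1.0):
    """Sketch of WdLP for one word spanning len(path) frames.

    path:    best state path q(t), ..., q(t+n), one real state per frame
    outputs: outputs[i] = network output out(q(t+i), t+i)
    priors:  dict mapping state -> prior(state)
    tprob:   dict mapping (from_state, to_state) -> transition probability

    The ENTRY -> FIRST transition has TrLP = 0 and is left out; the last
    frame's outgoing transition is LAST -> EXIT. Penalty and duration
    factors are omitted (taken as 1) to keep the example small."""
    total = 0.0
    for i, q in enumerate(path):
        phlp = acoustic_scale * math.log10(outputs[i]) - math.log10(priors[q])
        nxt = path[i + 1] if i + 1 < len(path) else EXIT
        trlp = math.log10(tprob[(q, nxt)])
        total += phlp + trlp
    return total


# Hypothetical three-frame word with states "a", "a", "b".
tp = {("a", "a"): 0.5, ("a", "b"): 0.5, ("b", EXIT): 0.5}
score = word_log_score(["a", "a", "b"], [0.5, 0.5, 0.5],
                       {"a": 0.1, "b": 0.1}, tp)
print(score)
```

Each frame contributes log10(5) + log10(0.5) = log10(2.5), so WdLP = 3 * log10(2.5), about 1.19. Three frames pair with three outgoing transitions (two within the word plus LAST -> EXIT), which is the "right number" mentioned above.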



Generated by build-faq-index on Tue Mar 24 16:18:17 PDT 2009