ICSI Speech FAQ:
5.1 What are features? What are their desirable properties?

Answer by: dpwe - 2000-05-26

The statistical pattern recognition approach to speech recognition is based on the idea that, even though no two utterances of the same words are exactly alike, they share distinctive characteristics that may be learned by a suitably sophisticated pattern classification system. In its simplest form, such a system partitions a multidimensional space into regions, one for each possible token - for instance, the phone units identified by linguists.
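
As a rough illustration of partitioning a space into one region per token, here is a minimal nearest-centroid sketch. Everything in it is an assumption made for the example: the two-dimensional points are invented, the labels "aa" and "iy" are just stand-in phone names, and the function names are hypothetical. Real recognizers use far richer classifiers; this only shows the idea of learned regions.

```python
def train_centroids(examples):
    """Average the vectors seen for each label to get one centroid per class."""
    sums, counts = {}, {}
    for label, vec in examples:
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def classify(centroids, vec):
    """Return the label whose centroid is closest to vec.

    The set of points assigned to a given label is that label's region,
    so the centroids implicitly partition the whole space.
    """
    return min(centroids, key=lambda label: squared_distance(centroids[label], vec))


# Invented toy data: (label, 2-D vector). Not real speech measurements.
TRAINING = [
    ("aa", [7.0, 1.2]),
    ("aa", [6.6, 1.0]),
    ("iy", [2.8, 2.3]),
    ("iy", [3.0, 2.1]),
]

centroids = train_centroids(TRAINING)
print(classify(centroids, [6.5, 1.3]))
print(classify(centroids, [3.1, 2.0]))
```

A new point never seen in training is still assigned to a token, because every point in the space falls inside some region. That is what lets the classifier cope with utterances that are never exactly alike.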

Speech is represented in its most basic form as waveform data. However, it turns out that building a classifier that works on this representation directly is pretty difficult. Alternative representations, derived from the waveform data by a more-or-less simple sequence of signal processing operations, can give the pattern recognizer a much easier task, leading to a more successful system. These specialized representations are known as features, and there is considerable research into the most successful form for them, which depends on the nature of speech, on any corruption that might attach to the speech, and on the characteristics of the statistical classifier to be used.
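
To make the point about waveforms versus features concrete, here is a small sketch. It assumes two deliberately simple per-frame measurements, log energy and zero-crossing rate, chosen only because they are easy to compute; they are not presented as the features any particular recognizer uses. The frame length, hop size, sample rate and the `tone` helper are likewise assumptions for the example. Two tones that differ only in phase look very different sample by sample, yet their features come out almost identical.

```python
import math


def frame_features(samples, frame_len=200, hop=100):
    """Return one (log_energy, zero_crossing_rate) pair per overlapping frame."""
    feats = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / frame_len
        log_energy = math.log(energy + 1e-10)  # small floor avoids log(0)
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0)
        )
        feats.append((log_energy, crossings / (frame_len - 1)))
    return feats


def tone(freq_hz, phase, n=800, rate=8000):
    """Synthetic sine wave standing in for a speech waveform (assumption)."""
    return [math.sin(2 * math.pi * freq_hz * t / rate + phase) for t in range(n)]


a = tone(440.0, 0.0)
b = tone(440.0, 1.0)  # same tone, shifted in phase

# Largest sample-by-sample difference between the two raw waveforms.
waveform_gap = max(abs(x - y) for x, y in zip(a, b))

# Largest difference between corresponding feature values.
feats_a = frame_features(a)
feats_b = frame_features(b)
feature_gap = max(
    max(abs(ea - eb), abs(za - zb))
    for (ea, za), (eb, zb) in zip(feats_a, feats_b)
)

print("waveform gap:", waveform_gap)
print("feature gap:", feature_gap)
```

The waveform gap is close to the full amplitude of the signal, while the feature gap is small. A classifier working on the features sees the two signals as neighbours in its space, which is the easier task described above.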

Given this role, as the representation of the basic waveform data in a space that makes statistical classification easier, we can list some desirable properties for feature sets:


Generated by build-faq-index on Tue Mar 24 16:18:15 PDT 2009