How do we recognize the speech of those familiar to us? We might
call on any of a huge web of interrelated features, including the
basic "sound" of a speaker's voice in terms of timbre, pitch, and
acoustic spectrum, but we might also be aided by a characteristic
accent or a distinctive laugh or turn of phrase. Depending on the
context and the acoustic conditions, certain features may be
particularly valuable -- recognizing the melody of a speaker's voice
even if the words can't be distinguished, or the choice and
articulation of an opening greeting when picking up the telephone, or
even the language from a written transcription of a meeting when the
acoustics are not available. Clearly, we as humans draw on many
different types of information at many different levels, which
gives us a singularly robust and adaptive mechanism for identifying
the speakers we know.

Yet most automatic speaker recognition systems today rely entirely on
low-level acoustic features, extracted from the speech signal every
10-20 milliseconds and encoded in a series of frames generally modeled
as independent events without recourse to temporal evolution (beyond,
potentially, simple local difference parameters). The goal of
this project is to explore higher-level features (prosodic patterns,
pronunciation preferences, word usage, speaker idiosyncrasies, etc.) to
aid in recognizing and distinguishing between speakers.
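
To make the low-level baseline concrete, the sketch below cuts a signal into
frames at a 10 ms step, computes one feature per frame, and appends a simple
local difference parameter. It is an illustration under stated assumptions,
not the project's front end: the 25 ms window, the log-energy feature (real
systems typically use spectral or cepstral features), the synthetic test
signal, and all function names are hypothetical choices made for this example.

```python
import math


def frame_signal(samples, sample_rate, frame_ms=25.0, hop_ms=10.0):
    """Split samples into overlapping frames, one frame every hop_ms ms."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    hop_len = int(round(sample_rate * hop_ms / 1000.0))
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames


def log_energy(frame):
    """Stand-in low-level feature: log of the mean squared amplitude."""
    power = sum(x * x for x in frame) / len(frame)
    return math.log(power + 1e-10)  # small floor avoids log(0) on silence


def deltas(values):
    """Local difference: half the gap between the next and previous frame.

    Indices are clamped at the edges, so the first and last frames reuse
    their own value in place of the missing neighbour.
    """
    n = len(values)
    out = []
    for t in range(n):
        prev_value = values[max(t - 1, 0)]
        next_value = values[min(t + 1, n - 1)]
        out.append((next_value - prev_value) / 2.0)
    return out


# Synthetic one-second signal (assumed 8 kHz): a 200 Hz tone that gets louder.
sample_rate = 8000
signal = [
    math.sin(2 * math.pi * 200 * n / sample_rate) * (n / sample_rate)
    for n in range(sample_rate)
]

frames = frame_signal(signal, sample_rate)
energies = [log_energy(f) for f in frames]
# Each frame is described only by its own value and a local difference.
features = list(zip(energies, deltas(energies)))
```

Each entry of `features` describes a single frame and at most its immediate
neighbours. Nothing in this representation captures longer-range structure
such as prosodic contours or word choice, which is the gap the higher-level
features are meant to address.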

The research plan consists of two main "feature discovery"
tracks: one focused on the exploration of features motivated by
existing linguistic constructs and expert-guided feature extraction,
the other on the purely data-driven discovery of characteristic speaker
"performances" as sequences in spectro-temporal space, independent of
such linguistic constructs. We believe that pairing low-risk mining
of expert-guided features with highly exploratory, higher-risk
data-driven feature discovery provides the best framework for this
research: it offers a natural contrast for assessing progress and an
opportunity to combine two very different families of features to
improve overall performance.