Date: Mon, 13 Nov 2006
From: David Gelbart
To: Tony Ezzat, Jake Bouvrie, Ken Schutte

Compared to conventional MFCC / PLP feature extraction, I see
Michael's Gabor filter approach as (1) a way to capture
spectrotemporal modulations and (2) a way for feature extraction to
involve longer time spans.  The work at ICSI did not continue past
2003 and I did not reach any final conclusions [about the use of Gabor
filters in the recognition of conversational speech].
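
To make points (1) and (2) concrete, here is a minimal sketch of a 2-D
Gabor kernel over a time-frequency patch, written as a Gaussian envelope
times a complex sinusoid.  This is a common textbook form and is only
assumed here; it is not necessarily the exact definition used in the
ICSLP 02 paper.  The function name and all parameter names are
hypothetical.

```python
import cmath
import math


def gabor_kernel(n_frames, n_bands, omega_t, omega_f, sigma_t, sigma_f):
    """Return a 2-D complex Gabor kernel as a list of rows (bands x frames).

    omega_t, omega_f: modulation frequencies in radians per frame / per band.
    sigma_t, sigma_f: Gaussian envelope widths in frames / bands.
    All names are illustrative assumptions, not taken from the paper.
    """
    t0, f0 = n_frames // 2, n_bands // 2  # kernel centre
    kernel = []
    for f in range(n_bands):
        row = []
        for t in range(n_frames):
            dt, df = t - t0, f - f0
            # Gaussian envelope localizes the filter in time and frequency.
            envelope = math.exp(-(dt * dt / (2 * sigma_t ** 2)
                                  + df * df / (2 * sigma_f ** 2)))
            # Complex sinusoid selects one spectrotemporal modulation.
            carrier = cmath.exp(1j * (omega_t * dt + omega_f * df))
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel
```

Nonzero omega_t and omega_f together give a filter tuned to a diagonal
(spectrotemporal) pattern, and a larger n_frames with a larger sigma_t
lets a single feature span more frames than a conventional short-window
feature does.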

[...]

As far as where the work at ICSI ended up...

In the Kleinschmidt and Gelbart ICSLP 02 paper, Gabor filters were successfully introduced
into a version of the Qualcomm-ICSI-OGI Aurora front end that did not include TRAPS features.
The final version of the QIO front end (Adami et al ICSLP 02 paper) boosted performance by
including TRAPS features.  It would have been a natural next step to try to further improve
performance by adding Gabor filters to that, or to perform some controlled comparisons between
QIO with TRAPS and QIO with Gabor filters. (Comparing the ICSLP 02 results is somewhat unfair
because, as I recall, the TRAPS implementation in the Adami et al paper was constrained to be
nearly causal by the rules of the ETSI Aurora standards committee, and we did not apply the
same rules to the Gabor filter implementation.)  I decided not to pursue that, not for
scientific reasons but because of changes in our funding mix and limits on my available time
and energy.

By the way, if I recall correctly the Aurora 2 data was used as development data for the
Kleinschmidt and Gelbart ICSLP 02 paper, in that a number of different Gabor filter sets were
tried on Aurora 2 and only the best performing sets were used in the paper.  As I recall, on
Aurora 3 we only tried the filter sets mentioned in the paper.  Thus the Aurora 3 results are
better measurements of generalization performance than the Aurora 2 results.

The web page you found also mentions an ICSI experiment with conversational speech, and that
the experiment was not successful but "we were using a Gabor filter set which had not been
optimized for conversational telephone speech or for use together with PLP and HATS."  For
optimization for use with PLP and HATS, we could have found various Gabor filter sets using
FFNN and then measured the performance of all of them in a full speech recognition system also
using PLP and HATS, and chosen the best set.  Another possibility would have been to try and
integrate PLP and HATS into the FFNN process, e.g., by placing PLP and HATS features
permanently in part of the feature vector used by FFNN and allowing other components of the
feature vector to be chosen by the usual FFNN Gabor filter selection process.  I did not
pursue any of this, again not for scientific reasons.
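
The second possibility above can be sketched roughly as follows.  This is a simplified
substitution-style selection loop written only for illustration, not the actual FFNN
algorithm: it keeps a set of "locked" features (standing in for PLP and HATS) permanently in
the feature vector and repeatedly swaps out the least useful of the remaining "free" slots.
The function name, its parameters and the scoring callback are all hypothetical.

```python
import random


def select_with_locked(locked, candidates, n_free, score_fn, n_iters=50, seed=0):
    """Choose n_free candidates to accompany features that are always present.

    locked:     features kept permanently in the vector (e.g. PLP, HATS).
    candidates: pool of features available for the free slots.
    score_fn:   callable taking a feature list, returning a score (higher is
                better); in practice this would train and test a classifier.
    Assumes 1 <= n_free <= len(candidates).
    """
    rng = random.Random(seed)
    free = rng.sample(candidates, n_free)
    best_free = list(free)
    best_score = score_fn(list(locked) + free)
    for _ in range(n_iters):
        unused = [c for c in candidates if c not in free]
        if not unused or not free:
            break
        # Leave-one-out scores: how well do we do without each free feature?
        loo = [score_fn(list(locked) + free[:i] + free[i + 1:])
               for i in range(len(free))]
        # The feature whose removal hurts least is treated as least relevant.
        weakest = max(range(len(free)), key=lambda i: loo[i])
        # Replace it with a randomly drawn unused candidate.
        free[weakest] = rng.choice(unused)
        score = score_fn(list(locked) + free)
        if score > best_score:
            best_free, best_score = list(free), score
    return list(locked) + best_free, best_score
```

Because the locked features are part of every call to score_fn, a candidate is only kept if
it helps in combination with them, which is the point of integrating PLP and HATS into the
selection rather than selecting the Gabor filters in isolation.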

Regards,
David

