Date: Mon, 13 Nov 2006
From: David Gelbart
To: Tony Ezzat, Jake Bouvrie, Ken Schutte

Compared to conventional MFCC / PLP feature extraction, I see Michael's Gabor filter approach as (1) a way to capture spectrotemporal modulations and (2) a way for feature extraction to involve longer time spans. The work at ICSI did not continue past 2003 and I did not reach any final conclusions [about the use of Gabor filters in the recognition of conversational speech].

[...]

As far as where the work at ICSI ended up... In the Kleinschmidt and Gelbart ICSLP 02 paper, Gabor filters were successfully introduced into a version of the Qualcomm-ICSI-OGI Aurora front end that did not include TRAPS features. The final version of the QIO front end (Adami et al. ICSLP 02 paper) boosted performance by including TRAPS features. It would have been a natural next step to try to further improve performance by adding Gabor filters to that, or to perform some controlled comparisons between QIO with TRAPS and QIO with Gabor filters. (Comparing the ICSLP 02 results is somewhat unfair because, as I recall, the TRAPS implementation in the Adami et al. paper was constrained to be nearly causal by the rules of the ETSI Aurora standards committee, and we did not apply the same rules to the Gabor filter implementation.) I decided not to pursue that, not for scientific reasons but because of changes in our funding mix and limits on my available time and energy.

By the way, if I recall correctly, the Aurora 2 data was used as development data for the Kleinschmidt and Gelbart ICSLP 02 paper, in that a number of different Gabor filter sets were tried on Aurora 2 and only the best-performing sets were used in the paper. As I recall, on Aurora 3 we only tried the filter sets mentioned in the paper. Thus the Aurora 3 results are better measurements of generalization performance than the Aurora 2 results.

The web page you found also mentions an ICSI experiment with conversational speech, and that the experiment was not successful but "we were using a Gabor filter set which had not been optimized for conversational telephone speech or for use together with PLP and HATS." For optimization for use with PLP and HATS, we could have found various Gabor filter sets using FFNN, measured the performance of all of them in a full speech recognition system that also used PLP and HATS, and chosen the best set. Another possibility would have been to try to integrate PLP and HATS into the FFNN process, e.g., by placing PLP and HATS features permanently in part of the feature vector used by FFNN and allowing the other components of the feature vector to be chosen by the usual FFNN Gabor filter selection process. I did not pursue any of this, again not for scientific reasons.

Regards,
David