Automatic Speech Recognition with an Adaptation Model Motivated by Auditory Processing

This web page accompanies the IEEE Transactions on Speech and Audio Processing paper by Marcus Holmberg, Werner Hemmert, and David Gelbart. It provides a more detailed performance breakdown than is available in the paper. In the future, we may also use this page for updates or other additional information. Please contact the authors if you have any questions.

Update (November 2008): Sheng-Chiuan Chiou of National Sun Yat-Sen University re-ran our baseline MFCC Aurora 3 experiments. In the Danish well-matched condition, he obtained a word accuracy of 78.6%, whereas our paper reported 92.1%. So far, we have not been able to determine the reason for this difference. His results were identical or nearly identical to ours for the remaining two Danish conditions, as well as for all three conditions for German, Finnish and Spanish. (Of those 11 cases, the word accuracies were identical to ours to two decimal places in eight, and differed by less than 0.05% in the remaining three.)

To present the Aurora 2 results, we have used the usual Aurora 2 spreadsheet format and added plots of performance vs. SNR. The reference against which relative performance is measured is the MFCC baseline used in the paper: 13 cepstral coefficients (including C0) calculated using the ETSI Distributed Speech Recognition reference source code, with delta and double-delta features calculated within HTK using two frames of past context and two frames of future context. Where we present relative performance results for Wiener filtered speech, the reference also uses Wiener filtered speech. For example, the relative performance results for the front end using Wiener filtering and RASTA are calculated relative to the results for the front end using only Wiener filtering.
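For readers unfamiliar with the delta computation mentioned above, the following is a minimal sketch of a regression-style delta with two frames of context on each side. It assumes the standard HTK regression formula and that frames at the edges of the utterance are replicated; the function name `delta` and the parameter `window` are hypothetical names chosen for illustration, and this is not the code used for the experiments.

```python
def delta(frames, window=2):
    """Regression-style delta features.

    frames: list of feature vectors (lists of floats), one per frame.
    window: frames of past and of future context (assumed to be 2 here).
    Edge frames are replicated, which is an assumption of this sketch.
    """
    n = len(frames)
    # Normalization term: 2 * sum of theta^2 for theta = 1..window
    denom = 2.0 * sum(theta * theta for theta in range(1, window + 1))
    out = []
    for t in range(n):
        vec = []
        for k in range(len(frames[t])):
            acc = 0.0
            for theta in range(1, window + 1):
                ahead = frames[min(t + theta, n - 1)][k]   # clamp at last frame
                behind = frames[max(t - theta, 0)][k]      # clamp at first frame
                acc += theta * (ahead - behind)
            vec.append(acc / denom)
        out.append(vec)
    return out


# Toy input: one coefficient per frame, increasing linearly over five frames.
ramp = [[0.0], [1.0], [2.0], [3.0], [4.0]]
deltas = delta(ramp)          # first-order (delta) features
double_deltas = delta(deltas)  # second-order (double-delta) features
```

Under these assumptions, double-delta features are obtained by applying the same function to the delta features, as in the last line of the sketch.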

For the Aurora 3 results, we present the performance of each method relative both to a plain MFCC baseline and to a front end using Wiener filtering. The REFERENCE MFCC page of the Aurora 3 results also presents the relative performance of MFCCs using Wiener filtering compared to MFCCs without Wiener filtering.
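As a rough illustration of how a relative performance figure of this kind can be computed, the sketch below assumes that relative performance means the relative reduction in word error rate with respect to the reference, which is the common convention. The spreadsheets themselves remain authoritative for the exact calculation and averaging. The function name and the accuracy values are hypothetical and are not results from the paper.

```python
def relative_improvement(accuracy, reference_accuracy):
    """Relative reduction in word error rate, in percent.

    Both arguments are word accuracies in percent. The reference is
    whichever baseline the comparison is made against (for example,
    plain MFCCs, or MFCCs with Wiener filtering).
    """
    reference_error = 100.0 - reference_accuracy
    if reference_error <= 0.0:
        raise ValueError("reference accuracy must be below 100%")
    return 100.0 * (accuracy - reference_accuracy) / reference_error


# Hypothetical numbers: a reference at 90.0% and a tested front end at 92.0%.
# The error rate drops from 10.0% to 8.0%, a 20% relative reduction.
improvement = relative_improvement(92.0, 90.0)
```

Measuring against the Wiener filtered reference rather than the plain MFCC baseline only changes the value passed as `reference_accuracy`.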

First spreadsheet for Table 1 (Recognition scores for frame-level adaptation processing with FFT-based features, without Wiener filtering): HTML format and Microsoft Excel format.

Second spreadsheet for Table 1 (Recognition scores for frame-level adaptation processing with FFT-based features, with Wiener filtering): HTML format and Microsoft Excel format.

Spreadsheet for Table 2: HTML format and Microsoft Excel format.