Speech encoding in a model of peripheral auditory processing: Quantitative assessment by means of automatic speech recognition

This web page accompanies the Speech Communication paper by Marcus Holmberg, David Gelbart, and Werner Hemmert.

We may update this page in the future with references to related work published after our paper, or with other news of interest to our readers.

After work on this paper had begun, we experimented with the size of the acoustic model and found that performance improved when we increased it to six states per word (one state for the pause model) and eight diagonal-covariance Gaussians per state. Further increases gave only minor improvements. For consistency with earlier experiments, we used a smaller size in this paper, but we used the larger size in the INTERSPEECH 2008 submission by Wang et al.
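To make "acoustic model size" concrete: in a continuous-density HMM, each state scores a feature frame with a weighted mixture of Gaussians, and with diagonal covariances each Gaussian needs only a mean and a variance per feature dimension. The following is a minimal Python sketch of how one such state scores a frame, plus a count of output-distribution parameters. It is an illustration, not our HTK configuration; the function names and the 39-dimensional feature size in the usage note are hypothetical.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a Gaussian with diagonal covariance (per-dimension variances)."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )

def log_state_likelihood(x, weights, means, variances):
    """log sum_k w_k * N(x; mean_k, diag(var_k)) for one HMM state.

    Uses the log-sum-exp trick so that very small likelihoods do not underflow.
    """
    terms = [math.log(w) + log_gaussian_diag(x, m, v)
             for w, m, v in zip(weights, means, variances)]
    top = max(terms)
    return top + math.log(sum(math.exp(t - top) for t in terms))

def num_output_parameters(states, mixtures, dim):
    """Parameters in the state output distributions of one word model:
    per Gaussian, `dim` means + `dim` variances + 1 mixture weight.
    Transition probabilities are not counted."""
    return states * mixtures * (2 * dim + 1)
```

With a hypothetical 39-dimensional feature vector, `num_output_parameters(6, 8, 39)` gives 3792 output-distribution parameters per word model, which shows how quickly the model grows as states and Gaussians are added.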

For more detailed information about our Noisy ISOLET corpus, as well as the tools needed to reproduce the corpus and a downloadable copy of the HTK configuration and scripts that we used for automatic speech recognition, click here.

For the SPRACHcore software package, which contains the pfile_gaussian tool that we used for feature gaussianization, click here.
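Feature gaussianization transforms each feature dimension so that its distribution is approximately standard normal. The sketch below shows one simple way to do this in Python, mapping each value through its empirical rank and the inverse normal CDF. It is an assumption for illustration only, not a description of how pfile_gaussian is implemented, and the function names are hypothetical.

```python
from statistics import NormalDist

def gaussianize_column(values):
    """Map one feature dimension to approximately standard-normal values
    using empirical ranks (rank-based sketch; tied values get distinct outputs)."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    std_normal = NormalDist()  # mean 0, standard deviation 1
    out = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n keeps the quantile strictly inside (0, 1)
        out[i] = std_normal.inv_cdf((rank + 0.5) / n)
    return out

def gaussianize(frames):
    """Gaussianize each dimension of a list of feature vectors independently."""
    columns = list(zip(*frames))
    gauss_cols = [gaussianize_column(list(col)) for col in columns]
    return [list(row) for row in zip(*gauss_cols)]
```

Because only the ranks matter, any monotonic distortion of a feature dimension yields the same output, which is one reason this kind of transform is attractive for robustness.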

For a web page giving additional results related to the ICASSP 2000 paper by Sharma, Ellis et al., "Feature extraction using non-linear transformation for robust speech recognition on the Aurora database", which we cite in our bibliography, click here.