Multi-Microphone Signal Processing for Speech Enhancement

This page provides source code for several blind multi-microphone speech enhancement techniques. They were implemented by Marc Ferras while pursuing his Master's thesis on multi-microphone signal processing for automatic speech recognition in meeting rooms. The techniques were evaluated on the ICSI Meeting Recorder Digits (MRD) Corpus, which can be downloaded here. The Beamformit beamforming toolkit may also be of interest.

This work was carried out during Ferras's visit to ICSI in 2004-2005 and was funded by the Augmented Multiparty Interaction (AMI) training program. (David Gelbart's involvement as an unofficial co-advisor was funded by the SmartWeb project.)


Delay-and-Sum

Delay-and-Sum (DS) operates by coherently averaging the multi-microphone speech signals. Coherence is achieved by focusing on one acoustic source, which is specified by its estimated Time-Differences of Arrival (TDOAs). Sources whose delays do not match these TDOAs are averaged incoherently and are therefore attenuated.
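The coherent averaging described above can be sketched in a few lines of Python. This is a minimal illustration and not the MATLAB code linked below: it assumes integer delays in samples, equal-length channels, and zero padding at the edges, and the name delay_and_sum is hypothetical.

```python
def delay_and_sum(channels, delays):
    """Average the channels after undoing each channel's lag.

    channels: list of equal-length lists of samples, one per microphone.
    delays:   delays[m] is the number of samples by which channel m lags
              the reference channel (assumed to be an integer).
    """
    n = len(channels[0])
    out = [0.0] * n
    for signal, d in zip(channels, delays):
        for t in range(n):
            src = t + d  # read ahead by d samples to undo the lag
            if 0 <= src < n:  # samples outside the frame count as zero
                out[t] += signal[src]
    return [v / len(channels) for v in out]


# A pulse reaches microphone 0 at sample 3 and microphone 1 two samples later.
mic0 = [0, 0, 0, 1, 0, 0, 0, 0]
mic1 = [0, 0, 0, 0, 0, 1, 0, 0]
aligned = delay_and_sum([mic0, mic1], [0, 2])     # focused on the source
misaligned = delay_and_sum([mic0, mic1], [0, 0])  # wrong TDOA
```

With the correct delays the pulse keeps its full amplitude, while with the wrong delays it is spread over two samples at half amplitude each.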

TDOAs were estimated between a specified reference microphone signal and each of the other microphone signals. Two estimators were implemented: non-weighted cross-correlation (NW) and PHAT-weighted Generalized Cross-Correlation (GCC-PHAT).
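As a rough sketch of the two estimators, the Python code below computes a circular cross-correlation in the frequency domain and optionally applies the PHAT weighting, which keeps only the phase of the cross-spectrum. It is an illustration under simplifying assumptions (single frame, integer lags, a naive DFT instead of an FFT), and the names dft and estimate_tdoa are hypothetical rather than taken from the linked source code.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) discrete Fourier transform (adequate for short frames)."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        s = sum(x[t] * cmath.exp(sign * 2j * cmath.pi * k * t / n)
                for t in range(n))
        out.append(s / n if inverse else s)
    return out


def estimate_tdoa(ref, sig, phat=True, eps=1e-12):
    """Return the lag in samples by which sig lags ref (negative if it leads)."""
    n = len(ref)
    R, S = dft(ref), dft(sig)
    cross = [s * r.conjugate() for s, r in zip(S, R)]  # cross-spectrum
    if phat:
        # PHAT weighting: discard the magnitude, keep only the phase.
        cross = [c / (abs(c) + eps) for c in cross]
    cc = [v.real for v in dft(cross, inverse=True)]  # circular correlation
    peak = max(range(n), key=lambda i: cc[i])
    return peak if peak <= n // 2 else peak - n  # map upper half to negative lags


ref = [0, 0, 1.0, 0.5, -0.3, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]
sig = [0, 0, 0, 0, 0, 1.0, 0.5, -0.3, 0, 0, 0, 0, 0, 0, 0, 0]  # ref delayed by 3
lag_phat = estimate_tdoa(ref, sig, phat=True)
lag_nw = estimate_tdoa(ref, sig, phat=False)
```

Because the correlation is circular, frames should be zero-padded in practice so that large lags do not wrap around.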

  • NW-DS MATLAB source code (1/1)
  • PHAT-DS MATLAB source code (1/1)


Phase-Error Based Filtering

Phase-error based filtering (PBF) performs time-frequency masking in the STFT domain. For each pair of input frames, the phase-error spectrum is computed and used to modulate the amplitude spectrum. High phase error yields lower masking values, and vice versa. This attenuates the frequency bins that show time-alignment mismatch, which is assumed to be caused by reverberation and noise.
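The masking step can be illustrated for a single pair of time-aligned STFT frames. In this sketch the mask shape 1/(1 + gamma * theta^2), the value of gamma, and the function names are assumptions chosen for illustration; the references below give the forms actually used in the literature.

```python
import cmath


def phase_error_mask(X1, X2, gamma=5.0):
    """Per-bin mask in (0, 1]: 1 for zero phase error, smaller as it grows.

    X1, X2: complex spectra of two time-aligned frames (same length).
    gamma:  assumed tuning constant controlling how fast the mask decays.
    """
    masks = []
    for a, b in zip(X1, X2):
        theta = cmath.phase(a * b.conjugate())  # phase error in [-pi, pi]
        masks.append(1.0 / (1.0 + gamma * theta * theta))
    return masks


def pbf_frame(X1, X2, gamma=5.0):
    """Apply the mask to the average of the two spectra."""
    masks = phase_error_mask(X1, X2, gamma)
    return [m * 0.5 * (a + b) for m, a, b in zip(masks, X1, X2)]


# Bin 0: identical phase. Bin 1: 90 degrees apart. Bin 2: opposite phase.
X1 = [1 + 0j, 1j, 2 + 0j]
X2 = [1 + 0j, 1 + 0j, -2 + 0j]
mask = phase_error_mask(X1, X2)
enhanced = pbf_frame(X1, X2)
```

Bins where the two channels agree in phase pass unchanged, and the mask decreases monotonically as the phase error grows.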

For further reading on this technique, please refer to:

  • [1] P. Aarabi and G. Shi, "Phase-based dual-microphone robust speech enhancement", IEEE Transactions on Systems, Man and Cybernetics, vol. 34, August 2004.
  • [2] C. Y. Lai and P. Aarabi, "Multiple-microphone time-varying filters for robust speech recognition", Proc. ICASSP, 2004.

PBF was implemented using only PHAT-derived TDOAs because the technique is highly sensitive to time-alignment errors.

  • PHAT-PBF MATLAB source code (Time-alignment, 1/1)
  • PHAT-PBF MATLAB source code (PBF, 1/2)


Correlation Shaping

Correlation shaping (CS) aims to reshape the autocorrelation function of the input signal by means of linear filtering. For dereverberation, linear prediction analysis is performed first so that reverberation can be handled more effectively. A delta function is set as the target autocorrelation function.
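To make the delta-function target concrete, the sketch below measures how far a signal's normalized autocorrelation is from a delta. It illustrates only the kind of mismatch that a shaping filter would try to reduce, not the adaptive filter itself or the linear prediction step, and the function names and the squared-error measure are assumptions.

```python
def autocorrelation(x, max_lag):
    """Autocorrelation for lags 0..max_lag, normalized so that r[0] == 1.

    Assumes x has nonzero energy and max_lag < len(x).
    """
    r = [sum(x[t] * x[t + lag] for t in range(len(x) - lag))
         for lag in range(max_lag + 1)]
    return [v / r[0] for v in r]


def delta_mismatch(x, max_lag):
    """Sum of squared autocorrelation values at nonzero lags.

    A delta-shaped autocorrelation gives 0; echoes and reverberation
    add energy at nonzero lags and increase the value.
    """
    r = autocorrelation(x, max_lag)
    return sum(v * v for v in r[1:])


dry = [1.0, 0, 0, 0, 0, 0, 0, 0]       # single pulse
echoed = [1.0, 0, 0, 0.6, 0, 0, 0, 0]  # same pulse plus an echo at lag 3
dry_cost = delta_mismatch(dry, 4)
echoed_cost = delta_mismatch(echoed, 4)
```

The single pulse already has a delta autocorrelation, so its mismatch is zero, while the echo produces a peak at lag 3 and a positive mismatch.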

For further reading on this technique, please refer to:

  • [3] B. W. Gillespie, "Strategies for improving audible quality and speech recognition accuracy of reverberant speech", Ph.D. thesis, University of Washington, 2002.
  • [4] B. Gillespie and L. Atlas, "Strategies for Improving Audible Quality and Speech Recognition Accuracy of Reverberant Speech", Proceedings of the 2003 IEEE ICASSP, 2003.

Multi-channel CS was implemented as an off-line adaptive technique, as opposed to the on-line adaptation described in [3] and [4]. PHAT-derived TDOAs were used for shaping filter initialization.

  • PHAT-CS MATLAB source code (Time-alignment, 1/1)
  • PHAT-CS MATLAB source code (Correlation Shaping, 1/2)
  • PHAT-CS C source code (Correlation Shaping Gradient, 1/3)

Please note that all files must be downloaded, and csgrad.c must be compiled for your platform by running

         mcc csgrad.c

at the MATLAB command line.