Journals
1. Wordless Sounds: Robust Speaker Diarization using Privacy-Preserving Audio Representations
SHK Parthasarathi, H Bourlard and D Gatica-Perez
In IEEE Transactions on Audio, Speech, and Language Processing, 2012.
Abstract
This paper investigates robust privacy-sensitive audio features for speaker diarization in multiparty conversations: i.e., a set of audio features having low linguistic information for speaker diarization in single and multiple distant microphone scenarios. We systematically investigate the Linear Prediction (LP) residual. (...) Next, we propose a supervised framework using a deep neural architecture for deriving privacy-sensitive audio features...
[Read more - PDF]
2. Privacy-Sensitive Audio Features for Speech/Nonspeech Detection
SHK Parthasarathi, D Gatica-Perez, H Bourlard and M M.-Doss
In IEEE Transactions on Audio, Speech, and Language Processing, 19(8), 2011.
Abstract
The goal of this paper is to investigate features for speech/nonspeech detection (SND) having low linguistic information from the speech signal. Towards this, we present a comprehensive study of privacy-sensitive features for SND in multiparty conversations. Our study investigates three different approaches to privacy-sensitive features...
[Read more - PDF]
3. Robustness of Group Delay Representations for Noisy Speech Signals
SHK Parthasarathi, P Rajan, and H A Murthy
In International Journal of Speech Technology (IJST, Springer), 14(4), 2011.
Abstract
This paper demonstrates the robustness of group delay based features to additive noise. First, we analytically show the robustness of group delay based representations. The analysis makes use of the fact that, for minimum-phase signals, the group delay function can be represented in terms of the cepstral coefficients of the log-magnitude spectrum. Such a representation results in the speech spectrum dominating over the noise spectrum, both at low and high SNRs...
[Read more - PDF]
Conferences
1. LP Residual Features for Robust, Privacy-Sensitive Speaker Diarization
SHK Parthasarathi, H Bourlard and D Gatica-Perez
Proceedings of Interspeech 2011. Florence, Italy. Aug 2011.
Abstract
We present a comprehensive study of linear prediction residual for speaker diarization on single and multiple distant microphone conditions in privacy-sensitive settings, a requirement to analyze a wide range of spontaneous conversations. Two representations of the residual are compared, namely real-cepstrum and MFCC, with the latter performing better...
[Read more - PDF]
2. Evaluating the Robustness of Privacy-Sensitive Audio Features for Speech Detection in Personal Audio Log Scenarios
SHK Parthasarathi, M M.-Doss, H Bourlard and D Gatica-Perez
Proceedings of ICASSP 2010. Dallas, US. March 2010.
Abstract
Personal audio logs are often recorded in multiple environments. This poses challenges for robust front-end processing, including speech/nonspeech detection (SND). Motivated by this, we investigate the robustness of four different privacy-sensitive features for SND, namely energy, zero crossing rate, spectral flatness, and kurtosis...
[Read more - PDF]
3. Speaker Change Detection with Privacy-Preserving Audio Cues
SHK Parthasarathi, M M.-Doss, D Gatica-Perez and H Bourlard
Proceedings of ICMI-MLMI 2009. MIT Media Lab, Cambridge, US. Nov 2009.
Abstract
In this work we investigate a set of privacy-sensitive audio features for speaker change detection (SCD) in multiparty conversations. These features are based on three different principles: characterizing the excitation source information using linear prediction residual, characterizing subband spectral information shown to contain speaker information, and characterizing the general shape of the spectrum...
[Read more - PDF]
4. Investigating privacy-sensitive features for speech detection in multiparty conversations
SHK Parthasarathi, M M.-Doss, H Bourlard and D Gatica-Perez
Proceedings of Interspeech 2009. Brighton, UK. September 2009.
Abstract
We investigate four different privacy-sensitive features, namely energy, zero crossing rate, spectral flatness, and kurtosis, for speech detection in multiparty conversations. We liken this scenario to a meeting room and define our datasets and annotations accordingly. The temporal context of these features is modeled...
[Read more - PDF]
5. Robustness of Phase based Features for Speaker Recognition
P Rajan, SHK Parthasarathi, and H A Murthy
Proceedings of Interspeech 2009. Brighton, UK. September 2009.
Abstract
This work demonstrates the robustness of group-delay based features for speech processing. An analysis of group delay functions is presented which shows that these features retain formant structure even in noise. Furthermore, a speaker verification task performed on the NIST 2003 database shows lower error rates...
[Read more - PDF]
6. Exploiting contextual information for speech/non-speech detection
SHK Parthasarathi, P Motlicek, and H Hermansky
Proceedings of TSD 2008, LNCS/LNAI series, Springer-Verlag. Brno, Czech Republic. September 2008.
Abstract
In this paper, we investigate the effect of temporal context for speech/non-speech detection (SND). It is shown that even a simple feature such as full-band energy, when employed with a large-enough context, shows promise for further investigation...
[Read more - PDF]
7. A Data-driven Approach to Speech/Non-speech Detection
SHK Parthasarathi, P Motlicek, and H Hermansky
Idiap Research Institute research report. Martigny, Switzerland.
Abstract
We present a data-driven approach to weighting the temporal context of signal energy to be used in a simple speech/non-speech detector (SND). The optimal weights are obtained using linear discriminant analysis (LDA). Regularization is performed to handle numerical issues inherent to the usage of correlated features...
[Read more - PDF]
8. A Pattern Recognition Approach to VAD using modified group delay
P Rajan, SHK Parthasarathi, and H A Murthy
Proceedings of NCC 2008. IIT Bombay, India. January 2008.
Abstract
This paper explores the use of phase-based features (in particular, group delay) for voice activity detection (VAD). We establish via theoretical analysis the robustness of the group delay function in noise. Based on this, we extract group delay based features and pose the VAD problem as a two-class classification task...
[Read more - PDF]
9. Design and Development of a Text-To-Speech Synthesizer for Indian Languages
Y R Venugopalakrishna, SHK Parthasarathi, S Thomas, K Bommepally, K Jayanthi, H Raghavan, S Murarka, H A Murthy
Proceedings of NCC 2008. IIT Bombay, India. January 2008.
Abstract
This paper describes the design and implementation of a unit selection based text-to-speech synthesizer with syllables and polysyllables as units of concatenation. The choice of syllable as a unit for Indian languages is appropriate as Indian languages are syllable-centered. Although syllable-based synthesis does not require significant prosodic modification...
[Read more - PDF]
10. Voice Activity Detection using Group Delay Processing on Buffered Short-term Energy
SHK Parthasarathi, P Rajan, and H A Murthy
Proceedings of NCC 2007. IIT Kanpur, India. January 2007.
Abstract
In this paper, we present an algorithm for Voice Activity Detection (VAD) in speech signals using the minimum phase group delay function. The proposed method considers a buffer consisting of contiguous frames of the given signal and computes the short-term energy (STE) for that buffer. By appending a surrogate signal to STE and viewing the resultant signal as a positive part of the magnitude spectrum of an arbitrary signal...
[Read more - PDF]
11. Robust Voice Activity Detection using Group Delay Functions
SHK Parthasarathi, P Rajan, and H A Murthy
Proceedings of IEEE ICIT 2006. IIT Bombay, India. December 2006.
Abstract
In this paper, we present an algorithm for Voice Activity Detection (VAD) in speech signals with very low SNR. In the proposed algorithm, the short-term energy of the speech signal is viewed as the positive frequency part of the magnitude spectrum of a minimum phase signal. The group delay of this signal is then computed. The speech regions of the signal are characterized by well-defined peaks in the group delay spectrum...
[Read more - PDF]
