The Meeting Recorder project aims to develop technologies that would enable a palm-sized device to make a useful record of a meeting through speech recognition, automatic segmentation and information retrieval. Although the final outcome of such a project is still rather vague, we have begun to collect a corpus of meetings, with recordings made simultaneously on head-worn microphones (for optimal speech recognition) and on a mock-up of a 'speechcorder' PDA equipped with two low-cost microphones, as seen in the picture. We will produce transcriptions of these recordings, and hope ultimately to have tens of hours of real meetings with sample-aligned recordings from close-talking mics, from high-quality PZM ambient mics, and from the dummy PDA, along with speaker- and time-tagged word-level transcripts. We can use this corpus to develop speech recognition algorithms that cope better with the noise picked up by microphones that are not head-worn.
One of the novel technical problems posed by the prospect of automatic speech recognition within such a device is tracking the contributions of each individual speaker. A considerable amount of research has been done on speaker change detection, but it has largely been applied to mono signals recorded from news broadcasts. Meetings pose a much harder problem owing to the informal, overlapping speaking style (and the added noise picked up by the microphones), but we do have the potential advantage of two (or perhaps more) microphones from which to recover spatial cues to speaker identity.
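To illustrate the kind of spatial cue that two microphones can provide, here is a minimal sketch in Python (assuming NumPy is available) that estimates the time difference with which a sound arrives at the two channels by finding the peak of their cross-correlation. It is an illustration only and not necessarily the method used in this project; the function name `estimate_delay` and the `max_lag` parameter are hypothetical, and the test signal is synthetic noise rather than meeting data.

```python
import numpy as np

def estimate_delay(left, right, max_lag):
    """Return the lag in samples at which `right` best matches `left`.

    A positive result means the sound reached the `left` channel first.
    Both channels must have the same length (an assumption of this sketch).
    """
    left = np.asarray(left, dtype=float)
    right = np.asarray(right, dtype=float)
    # Remove any DC offset so it does not dominate the correlation.
    left = left - left.mean()
    right = right - right.mean()
    n = len(left)
    lags = np.arange(-max_lag, max_lag + 1)
    # Score for lag k is the sum over t of left[t] * right[t + k],
    # restricted to the samples where both indices are valid.
    scores = [
        np.dot(left[max(0, -k): n - max(0, k)],
               right[max(0, k): n - max(0, -k)])
        for k in lags
    ]
    return int(lags[int(np.argmax(scores))])

# Synthetic example: the same noise source, arriving 3 samples later
# on the right channel than on the left.
rng = np.random.default_rng(0)
source = rng.standard_normal(2000)
delay = 3
left_channel = source
right_channel = np.concatenate([np.zeros(delay), source[:-delay]])
print(estimate_delay(left_channel, right_channel, max_lag=10))
```

A talker sitting closer to one microphone would give a consistently signed delay, so tracking this value over short frames is one plausible cue to who is speaking. Real meeting recordings add reverberation and overlapping speech, which this sketch does not address.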
The pages in this directory describe some of my investigations with this data, most of them concerned with isolating the different sources, along with some other information. Specifically, there are pages on:
More pages will be added as the project progresses.