For a speech recognition system to work well, it must be trained on transcribed audio from the domain in which it will be used. To establish a baseline and to control for factors such as noise, reverberation, and overlapping speakers, our test subjects will wear head-mounted microphones during corpus collection. We will simultaneously record each meeting with desktop omni-directional microphones and PDA-mounted microphones. This setup yields high-quality acoustic data for training, along with degraded acoustics similar to those of the eventual target application.
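As an aside, the relationship between the clean close-talking channel and the degraded distant channels can be illustrated with a toy sketch. The convolution-plus-noise model, function names, and sample values below are illustrative assumptions for exposition, not the project's actual signal processing:

```python
import random

def simulate_far_field(clean, impulse_response, noise_level=0.01, seed=0):
    """Toy model of a distant microphone: convolve a clean close-talking
    signal with a room impulse response, then add a little noise.
    (Hypothetical illustration only, not the Meeting Recorder pipeline.)"""
    rng = random.Random(seed)
    n = len(clean) + len(impulse_response) - 1
    degraded = [0.0] * n
    # Direct-form convolution: each clean sample excites the whole response.
    for i, s in enumerate(clean):
        for j, h in enumerate(impulse_response):
            degraded[i + j] += s * h
    return [x + rng.uniform(-noise_level, noise_level) for x in degraded]

clean = [0.0, 1.0, 0.5, -0.5, 0.0]   # toy close-talking waveform
rir = [1.0, 0.0, 0.3]                # direct path plus one small echo
degraded = simulate_far_field(clean, rir)
```

Recording both channel types in parallel sidesteps the need for such simulation: the corpus contains real matched pairs of clean and degraded audio.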
We are in the process of collecting the first such corpus. The domain consists of meetings at the International Computer Science Institute (ICSI), including speech recognition and natural language processing meetings. We are instrumenting a meeting room at ICSI with the equipment described above: head-mounted microphones for each participant, desktop omni-directional microphones, and PDA-mounted microphones.
Initially, we plan to record and hand-transcribe roughly 40 hours of speech recognition and natural language processing meetings. Later, we plan to record meetings in other domains, such as hardware design.
For more details on the Meeting Recorder audio hardware, see http://www.icsi.berkeley.edu/~dpwe/research/mtgrcdr/setup.html.
For more details on the Meeting Recorder audio software, see http://www.icsi.berkeley.edu/~dpwe/research/mtgrcdr/rcd-sw.html.
For an example transcript including audio (converted into crude HTML), see http://www.icsi.berkeley.edu/~janin/mix0123.html.