The THISL project is a European Union-funded research collaboration between ICSI and several European laboratories, including Sheffield University (the managing partner) and BBC R&D. The goal of the project is to investigate providing speech access to a large speech database, specifically an archive of BBC news broadcasts. The idea is to give journalists rapid access to large audio databases for which no transcriptions exist: automatic speech recognition (ASR) generates less-than-perfect transcriptions, and information retrieval (IR) finds the documents that best match a given query. Spoken queries are themselves recognized, and the form of each query is analyzed with natural language processing (NLP). The overall block diagram of the archive and query system is shown below:
ICSI's role in this project has included basic speech recognition of broadcast news, work on nonspeech identification, language modelling, and developing a stand-alone, speech-input demonstrator, running as a front-end to the IR engine developed by Sheffield. The ICSI THISL GUI, developed in Tcl/Tk, is shown in the screenshot below:
The panel in the top left controls live speech input and shows the level being received from the microphone. When an utterance is detected, it is passed to the speech recognizer, which returns a best hypothesis, shown in the "Recog:" field of the panel below, together with a lattice of alternative word hypotheses. The NLP module, written by Thomson-CSF, attempts to find standard question forms in the lattice and tags any keywords it finds. The query found by the NLP module is shown in the "Parsed:" field and in the parse tree, where the keywords are drawn in red.
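The best hypothesis and the lattice are closely related: the 1-best transcript can be read off the lattice as its highest-scoring path. The sketch below shows that search as dynamic programming over a directed acyclic graph. It assumes a hypothetical lattice format (the `Arc` tuple, with one combined log-probability per word and node ids numbered in time order); the recognizer's actual lattice format and scoring are not described here.

```python
from collections import defaultdict
from typing import NamedTuple


class Arc(NamedTuple):
    """One word hypothesis in a (hypothetical) word lattice."""
    start: int     # source node id
    end: int       # destination node id
    word: str
    score: float   # combined acoustic + language-model log-probability


def best_path(arcs: list[Arc], initial: int, final: int) -> tuple[list[str], float]:
    """Return the highest-scoring word sequence from `initial` to `final`.

    Assumes node ids are numbered in topological (time) order, i.e. every
    arc goes from a lower id to a higher one, so one forward pass suffices.
    """
    outgoing: dict[int, list[Arc]] = defaultdict(list)
    for arc in arcs:
        outgoing[arc.start].append(arc)

    best = {initial: 0.0}       # best total log-probability reaching each node
    back: dict[int, Arc] = {}   # arc on the best path into each node
    for node in sorted(set(outgoing) | {initial}):
        if node not in best:
            continue            # node cannot be reached from `initial`
        for arc in outgoing[node]:
            candidate = best[node] + arc.score
            if candidate > best.get(arc.end, float("-inf")):
                best[arc.end] = candidate
                back[arc.end] = arc

    if final not in best:
        raise ValueError("final node is unreachable from the initial node")

    # Trace back from the final node to recover the word sequence.
    words = []
    node = final
    while node != initial:
        arc = back[node]
        words.append(arc.word)
        node = arc.start
    return words[::-1], best[final]


if __name__ == "__main__":
    arcs = [
        Arc(0, 1, "tell", -1.0), Arc(0, 1, "sell", -2.5),
        Arc(1, 2, "me", -0.5),
        Arc(2, 3, "about", -0.75),
        Arc(3, 4, "elections", -1.25), Arc(3, 4, "selections", -2.0),
    ]
    print(best_path(arcs, 0, 4))  # → (['tell', 'me', 'about', 'elections'], -3.5)
```

A single forward pass works because arcs only move forward in time, and one back-pointer per node is enough to recover the 1-best path; searching the lattice for alternative keywords or N-best lists needs more bookkeeping than this.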
These are passed on to the right-hand portion of the display, which constitutes the information retrieval part. The query at the top (either typed in or supplied by the speech recognizer) is filtered against a "stop list" to remove common words, then handed to the IR engine, which returns a list of documents containing some or all of the terms. These 'documents' are actually the transcriptions of segments of a broadcast news audio database. Summaries of these 'hits' are shown in the second panel; clicking on one of these lines displays the full transcript in the lower panel. The user can then click anywhere in that transcript to start playback of the original audio from the point that was clicked.
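The retrieval and playback steps can be sketched as follows. This is a minimal illustration, not the Sheffield IR engine: the stop list is a tiny hypothetical one, segments are ranked simply by how many distinct query terms they contain (the engine's actual weighting scheme is not described here), and click-to-playback assumes hypothetical per-word start times from the recognizer, with the transcript displayed as words separated by single spaces. The names `STOP_WORDS`, `tokenize`, `query_terms`, `rank_segments`, and `playback_start` are all illustrative.

```python
import re
from bisect import bisect_right

# Hypothetical, deliberately tiny stop list; real systems use far longer ones.
STOP_WORDS = {"a", "about", "an", "and", "are", "for", "in", "is", "me",
              "of", "on", "tell", "the", "to", "was", "what"}


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def query_terms(query: str) -> set[str]:
    """Remove stop-listed words, leaving the content terms of the query."""
    return {t for t in tokenize(query) if t not in STOP_WORDS}


def rank_segments(query: str, segments: dict[str, str]) -> list[tuple[str, int]]:
    """Rank transcript segments by the number of distinct query terms they contain.

    Segments that match no term are dropped; ties keep the input order.
    """
    terms = query_terms(query)
    scored = []
    for seg_id, transcript in segments.items():
        hits = len(terms & set(tokenize(transcript)))
        if hits:
            scored.append((seg_id, hits))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def playback_start(timed_words: list[tuple[str, float]], char_offset: int) -> float:
    """Map a click position in the displayed transcript to an audio start time.

    `timed_words` holds (word, start_seconds) pairs; the transcript is assumed
    to be displayed as these words joined by single spaces.
    """
    starts = []                  # character offset where each word begins
    pos = 0
    for word, _ in timed_words:
        starts.append(pos)
        pos += len(word) + 1     # +1 for the separating space
    index = max(bisect_right(starts, char_offset) - 1, 0)
    return timed_words[index][1]


if __name__ == "__main__":
    segments = {
        "seg1": "the prime minister spoke about the elections today",
        "seg2": "elections in russia were held on sunday",
        "seg3": "weather for the weekend",
    }
    print(rank_segments("Tell me about the elections in Russia", segments))
    # → [('seg2', 2), ('seg1', 1)]
    timed = [("elections", 12.0), ("in", 12.6), ("russia", 12.8)]
    print(playback_start(timed, 14))  # click inside "russia" → 12.8
```

Counting matched terms mirrors the description of documents "containing some or all of the terms"; a production engine would normally also weight rarer terms more heavily, for example with a tf-idf or probabilistic weighting scheme.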
The demo is installed locally at ICSI and can be run from Linux and Solaris workstations with the command ThislGui.