BeRP is a medium-vocabulary, speaker-independent spontaneous continuous speech understanding system currently under development at ICSI. BeRP functions as a knowledge consultant whose domain is the restaurants in the city of Berkeley. The system serves as a testbed for several research projects, including robust feature extraction, connectionist phonetic likelihood estimation, automatic induction of multiple pronunciation lexicons, foreign accent detection and modeling, advanced language models, and lip-reading.
The BeRP system operates as a mixed-initiative query system. The system prompts the user for information in order to fill the database query slots, but the user is not required to respond to the particular question asked. When most of the query slots (such as restaurant type, day of week, meal wanted, or expense) are filled, the system provides the user with a list of possible restaurants. The user can then ask for more information about any restaurant on the list.
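The slot-filling behavior described above can be illustrated with a minimal sketch. It is not the BeRP implementation: the class `QueryState`, its methods, the slot identifiers (adapted from the examples in the text), and the threshold of three filled slots standing in for "most" are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Slot names adapted from the examples in the text; the real slot set is an assumption.
SLOTS = ["restaurant_type", "day_of_week", "meal", "expense"]


@dataclass
class QueryState:
    """Hypothetical dialogue state: one optional value per query slot."""
    slots: Dict[str, Optional[str]] = field(
        default_factory=lambda: {name: None for name in SLOTS}
    )

    def update(self, parsed: Dict[str, str]) -> None:
        # Accept any known slot the user mentions, whether or not it
        # answers the question that was asked (mixed initiative).
        for name, value in parsed.items():
            if name in self.slots:
                self.slots[name] = value

    def missing(self) -> List[str]:
        return [name for name in SLOTS if self.slots[name] is None]

    def ready(self, min_filled: int = 3) -> bool:
        # "Most" slots filled is modeled as an assumed threshold of 3 out of 4.
        return len(SLOTS) - len(self.missing()) >= min_filled

    def next_prompt(self) -> Optional[str]:
        # The system asks about the first unfilled slot, if there is one.
        unfilled = self.missing()
        return unfilled[0] if unfilled else None
```

In this sketch the system would first prompt for `restaurant_type`, yet a reply that supplies only the meal and the day is still accepted, and the query is issued once `ready()` returns true.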
The BeRP recognizer consists of six components: the RASTA-PLP feature extractor, a multi-layer perceptron phonetic likelihood estimator, a Viterbi decoder called Y0, an HMM pronunciation lexicon, a bigram or stochastic context-free grammar language model, and the natural language backend, which includes a database of restaurants. The system runs on a SPARCstation, although for speed we usually offload the phonetic likelihood estimation to special-purpose neural network hardware.
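To make the decoding step concrete, the following is a textbook log-domain Viterbi search over a small HMM. It is only a sketch of the general technique; the internals of Y0 are not described here, and the function name `viterbi`, its argument layout, and the two-state toy model are assumptions. The per-frame scores stand in for the output of the phonetic likelihood estimator.

```python
import math
from typing import List, Sequence

NEG_INF = float("-inf")


def viterbi(
    log_init: Sequence[float],
    log_trans: Sequence[Sequence[float]],
    log_obs: Sequence[Sequence[float]],
) -> List[int]:
    """Return the most likely state sequence.

    log_init[s]     : log probability of starting in state s
    log_trans[p][s] : log probability of moving from state p to state s
    log_obs[t][s]   : log likelihood of frame t given state s
    """
    n_states = len(log_init)
    # score[s] is the best log score of any path ending in state s so far.
    score = [log_init[s] + log_obs[0][s] for s in range(n_states)]
    backptrs: List[List[int]] = []
    for frame in log_obs[1:]:
        new_score, ptr = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda p: score[p] + log_trans[p][s])
            new_score.append(score[best_prev] + log_trans[best_prev][s] + frame[s])
            ptr.append(best_prev)
        score = new_score
        backptrs.append(ptr)
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


# Toy left-to-right model with two states and three frames (made-up numbers).
toy_init = [0.0, NEG_INF]
toy_trans = [[math.log(0.5), math.log(0.5)], [NEG_INF, 0.0]]
toy_obs = [
    [math.log(0.9), math.log(0.1)],
    [math.log(0.2), math.log(0.8)],
    [math.log(0.1), math.log(0.9)],
]
toy_path = viterbi(toy_init, toy_trans, toy_obs)  # [0, 1, 1]
```

Working in the log domain avoids numerical underflow on long utterances, which is why the sketch adds scores instead of multiplying probabilities.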
The BeRP corpus, collected at ICSI, currently contains approximately 7500 sentences, with a vocabulary of 1500 words, comprising 6.4 hours of speech. The current system, trained on 4786 sentences and tested on 364 sentences, achieves a 21% word error rate.
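For readers unfamiliar with the metric, the sketch below computes word error for a single sentence, assuming the standard definition: the minimum number of word substitutions, insertions, and deletions needed to turn the reference into the hypothesis, divided by the reference length. The function name `word_error_rate` and the example sentences are illustrative and are not taken from the BeRP corpus. A corpus-level figure is normally obtained by summing errors and reference words over all test sentences before dividing.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] is the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, scoring the hypothesis "i want a cheap food" against the reference "i want cheap thai food" gives two errors over five reference words, a word error of 40%.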