Computational Psycholinguistics Research in Dan Jurafsky's Lab

Together with my students and colleagues, I have been investigating the hypothesis that humans act as probabilistic reasoners when they process natural language. We have been building probabilistic and computational models of psycholinguistic results on human language processing, including syntactic comprehension ("sentence processing"), disambiguation, and lexical production. For example, we perform psychological experiments and conduct corpus research to study the nature of human knowledge of syntax: what kinds of lexical and syntactic structures do people store in their mental grammars? We are especially interested in the probabilistic nature of these structures: the ways that the probabilities of different senses of a word or the frequencies of different syntactic structures affect human processing, and how these probabilities may be estimated from corpora or from experiments. In our most recent work, we have begun to ask questions about the underlying causes of this probabilistic behavior.
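As a rough illustration of what "estimating structure probabilities from corpora" can mean in practice, the sketch below counts verb subcategorization frames in a toy set of observations and turns the counts into relative frequencies. The data and frame labels are invented for illustration; real estimates would come from a parsed corpus.

```python
from collections import Counter, defaultdict

# Toy "corpus": (verb, subcategorization frame) observations.  These pairs are
# invented purely for illustration; real work would extract them from a
# parsed or hand-annotated corpus.
observations = [
    ("remember", "NP"), ("remember", "S"), ("remember", "NP"),
    ("know", "S"), ("know", "S"), ("know", "NP"),
    ("put", "NP PP"), ("put", "NP PP"),
]

frame_counts = defaultdict(Counter)
for verb, frame in observations:
    frame_counts[verb][frame] += 1

def frame_probability(verb, frame):
    """Relative-frequency estimate of P(frame | verb)."""
    counts = frame_counts[verb]
    total = sum(counts.values())
    return counts[frame] / total if total else 0.0

print(frame_probability("remember", "NP"))  # 2/3
print(frame_probability("know", "S"))       # 2/3
```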

Bayesian Models of Sentence Processing: Srini Narayanan and I are very interested in what we have called the Bayesian model of sentence processing, which claims that human sentence interpretation proceeds by computing the probabilities of the different possible interpretations of an ambiguous sentence and ranking them by those probabilities. See especially Jurafsky (1996), Narayanan and Jurafsky (1998), and Narayanan and Jurafsky (2001). We are currently working on a journal-length exposition of these ideas.
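A minimal sketch of the ranking idea is given below: each candidate interpretation gets a score proportional to a prior times a likelihood, and the interpretations are sorted by their normalized scores. The numbers are hypothetical; a real model would derive them from construction and lexical statistics.

```python
# Hedged mini-example of ranking interpretations of an ambiguous sentence by
# probability, in the spirit of a Bayesian sentence-processing account.
# All probability values below are invented for illustration.

def rank_interpretations(interpretations):
    """Score each interpretation by prior * likelihood, normalize, and sort."""
    total = sum(p["prior"] * p["likelihood"] for p in interpretations.values())
    scored = [
        (name, (p["prior"] * p["likelihood"]) / total)
        for name, p in interpretations.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)

# "The horse raced past the barn fell": main-verb vs. reduced-relative reading.
readings = {
    "main_verb":        {"prior": 0.92, "likelihood": 0.30},
    "reduced_relative": {"prior": 0.08, "likelihood": 0.70},
}

for name, prob in rank_interpretations(readings):
    print(f"{name}: {prob:.2f}")
```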

Probabilistic Verb Argument Structure: Doug Roland, Susanne Gahl, Lise Menn, Srini Narayanan, and I are interested in evidence for the human use of "verb-argument probabilities". See Jurafsky (1996) and Narayanan and Jurafsky (1998) for arguments that verb-argument probabilities play a role in garden-path sentences. Roland and Jurafsky (2002) argued that these probabilities must be kept at the level of the semantic lemma. More recently, Susanne Gahl, Lise Menn, colleagues, and I have shown that aphasics also seem to make use of these verb-argument probabilities (Gahl et al., in press).
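To make the garden-path connection concrete, here is a hedged sketch of how a verb-argument probability such as P(transitive | verb) could modulate support for a reduced-relative versus a main-verb reading. The transitivity values and the simple multiplicative combination are assumptions for illustration, not results from our papers.

```python
# Illustrative P(transitive | verb) values; not corpus estimates.
verb_transitive_prob = {
    "raced": 0.08,  # rarely transitive -> reduced-relative reading disfavored
    "found": 0.62,  # often transitive -> reduced-relative reading easier
}

def reduced_relative_support(verb, construction_prior=0.08):
    """Combine a (hypothetical) construction prior with P(transitive | verb)."""
    return construction_prior * verb_transitive_prob[verb]

def main_verb_support(verb, construction_prior=0.92):
    """Support for the main-verb reading, using P(intransitive | verb)."""
    return construction_prior * (1.0 - verb_transitive_prob[verb])

for verb in ("raced", "found"):
    rr = reduced_relative_support(verb)
    mv = main_verb_support(verb)
    print(f"{verb}: P(reduced relative) ~ {rr / (rr + mv):.2f}")
```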

Probabilities in Lexical Production: We are interested in the way that probabilistic knowledge plays a role in human lexical production. When they produce words, humans seem to shorten words that have a higher probability. We have used evidence about such word shortening to explore what sorts of probabilities are kept in the mental grammar. See, for example, Jurafsky, Bell, Gregory, and Raymond (2000) and a number of related papers, which argue, based on evidence from phonological reduction, that the human production grammar must store probabilistic relations between words. More recently, Michelle Gregory's dissertation studied potential causes for this probabilistic reduction effect. She showed that probabilistic reduction is partly due to speaker-specific shortening of repeated words, and partly due to speakers shortening uninformative words that hearers can easily interpret (Gregory 2001; Gregory, Healy, and Jurafsky, submitted).
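The sketch below shows one simple way to operationalize "higher-probability words are shorter": estimate a word's predictability from its context with a bigram relative frequency and compare it against the word's duration. The counts and durations are invented; real studies of this kind use large speech corpora.

```python
import math
from collections import Counter

# Toy illustration of probabilistic reduction: all counts and durations below
# are hypothetical, chosen only to show the expected direction of the effect
# (more predictable -> shorter).
bigram_counts = Counter({("sort", "of"): 90, ("kind", "of"): 80, ("proud", "of"): 3})
unigram_counts = Counter({"sort": 100, "kind": 95, "proud": 10})

def bigram_prob(prev_word, word):
    """Relative-frequency estimate of P(word | prev_word)."""
    return bigram_counts[(prev_word, word)] / unigram_counts[prev_word]

# Hypothetical observed durations (ms) of "of" in each context.
durations_ms = {("sort", "of"): 45, ("kind", "of"): 48, ("proud", "of"): 110}

for context, duration in durations_ms.items():
    logp = math.log(bigram_prob(*context))
    print(f"{' '.join(context)!r}: log P = {logp:.2f}, duration = {duration} ms")
```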