Computational Psycholinguistics Research in Dan Jurafsky's Lab
Together with my students and colleagues, I have been investigating the
hypothesis that humans act as probabilistic reasoners when they
process natural language. We have been
building probabilistic and computational models of psycholinguistic
results on human language processing, including syntactic comprehension
("sentence processing"), disambiguation, and lexical production.
For example, we perform
psychological experiments and do corpus research to study the nature of
human knowledge of syntax: what kinds of lexical and syntactic
structures do people store in their mental grammars? We are especially
interested in the probabilistic nature of these structures: the ways
that the probabilities of different senses of a word or the frequencies
of different syntactic structures affect human processing, and how
these probabilities can be estimated from corpora or measured experimentally.
In our most recent work, we have started to ask questions about
the underlying causes of this probabilistic behavior.
Bayesian Models of Sentence Processing:
Srini Narayanan and I are very interested in what we have called the Bayesian
Model of sentence processing, which claims that human
sentence interpretation proceeds by computing probabilities
of different possible interpretations of ambiguous sentences,
and ranking them by their probabilities. See especially
Jurafsky (1996),
Narayanan and Jurafsky (1998), and
Narayanan and Jurafsky (2001). We are currently working on
a journal-length exposition of these ideas.
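As a rough illustration of the kind of computation this model assumes, here is a small Python sketch that ranks two candidate parses of a garden-path sentence by their posterior probabilities. The candidate parses, priors, and likelihoods are invented for illustration only; they are not taken from the papers above.

    # A minimal, hypothetical sketch of Bayesian interpretation ranking.
    # All probabilities below are invented for illustration.

    def rank_interpretations(candidates):
        """Rank candidate parses of an ambiguous sentence by posterior probability.

        Each candidate pairs a prior (e.g., from a probabilistic grammar) with
        the likelihood of the observed words under that parse; by Bayes' rule
        the posterior is proportional to prior * likelihood.
        """
        scored = {name: prior * likelihood
                  for name, (prior, likelihood) in candidates.items()}
        total = sum(scored.values())
        return sorted(((name, score / total) for name, score in scored.items()),
                      key=lambda pair: pair[1], reverse=True)

    # "The horse raced past the barn fell": main-verb vs. reduced-relative reading.
    candidates = {
        "main-verb parse of 'raced'":        (0.92, 0.05),  # frequent structure, poor fit with 'fell'
        "reduced-relative parse of 'raced'": (0.08, 0.90),  # rare structure, consistent with 'fell'
    }

    for name, posterior in rank_interpretations(candidates):
        print(f"{posterior:.3f}  {name}")

On these made-up numbers the reduced-relative reading wins once the final verb is taken into account, which is the flavor of probabilistic reranking the Bayesian model is meant to capture.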
Probabilistic Verb Argument Structure:
Doug Roland, Susanne Gahl, Lise Menn, Srini Narayanan and I are interested in evidence for
the human use of "verb-argument probabilities".
See
Jurafsky (1996)
and
Narayanan and Jurafsky (1998)
for arguments that verb-argument probabilities play a role
in garden path sentences.
Roland and Jurafsky (2002)
argued that these probabilities must be kept at the level of the semantic
lemma. More recently, Susanne Gahl, Lise Menn, other colleagues, and I
have shown that aphasic speakers also seem to make use of these verb-argument
probabilities (Gahl et al., in press). A sketch of how such probabilities
can be estimated follows below.
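To make the notion concrete, here is a small Python sketch of how verb-argument (subcategorization) probabilities might be estimated from corpus counts by maximum likelihood. The verbs, frames, and counts are invented for illustration and are not drawn from Roland and Jurafsky (2002).

    # Hypothetical corpus counts of (verb lemma, argument frame) pairs.
    from collections import defaultdict

    frame_counts = {
        ("remember", "NP object"):    120,
        ("remember", "S complement"):  80,
        ("forget",   "NP object"):     60,
        ("forget",   "S complement"):  40,
        ("forget",   "infinitive"):    20,
    }

    def frame_probabilities(counts):
        """Return P(frame | verb) by maximum-likelihood estimation."""
        totals = defaultdict(int)
        for (verb, _), n in counts.items():
            totals[verb] += n
        return {(verb, frame): n / totals[verb]
                for (verb, frame), n in counts.items()}

    for (verb, frame), p in sorted(frame_probabilities(frame_counts).items()):
        print(f"P({frame} | {verb}) = {p:.2f}")

In a Bayesian account of garden-path effects, probabilities of this kind serve as one source of evidence for or against a candidate parse.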
Probabilities in Lexical Production:
We are interested in the way that probabilistic knowledge
plays a role in human lexical production. When they produce
words, humans seem to shorten words that have a higher probability.
We have used evidence about such word shortening to explore
what sorts of probabilities are being kept in the mental grammar.
See, for example, Jurafsky, Bell, Gregory, and Raymond (2000)
and a number of related papers
which argue, based on evidence from phonological reduction,
that the human production grammar must store
probabilistic relations between words.
More recently, Michelle Gregory's dissertation studied
potential causes for this probabilistic reduction effect.
She showed that probabilistic reduction is partly due to speaker-specific
shortening of repeated words, and partly due to speakers shortening
uninformative words that hearers can easily interpret
(Gregory 2001; Gregory, Healy, and Jurafsky, submitted).
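As an illustration of the kind of predictability measure involved in these reduction studies, here is a small Python sketch that estimates the conditional probability of a word given the preceding word from a toy corpus; words with higher conditional probability are the ones predicted to be phonetically reduced. The corpus and word pairs are invented for illustration only.

    # Hypothetical sketch: word predictability from bigram counts.
    from collections import Counter
    import math

    corpus = "i do not know what i want to do but i want to go".split()

    unigrams = Counter(corpus)
    bigrams = Counter(zip(corpus, corpus[1:]))

    def cond_prob(word, prev):
        """Maximum-likelihood estimate of P(word | preceding word)."""
        return bigrams[(prev, word)] / unigrams[prev]

    # Higher-probability (more predictable) words are predicted to be shorter.
    for prev, word in [("want", "to"), ("do", "not"), ("i", "want")]:
        p = cond_prob(word, prev)
        print(f"P({word} | {prev}) = {p:.2f}   log prob = {math.log(p):.2f}")

Conditional probabilities along these lines are one example of the probabilistic relations between words that, on the evidence from phonological reduction, the production grammar appears to store.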