Probability in Computational Linguistics Seminar
                               (PiCLS)

PiCLS is an informal reading group seminar concerned with all aspects
of probabilistic modeling of language.  It features discussions of the
research literature, presentations of work-in-progress, and invited
talks.

PiCLS meets on Wednesdays, 11:15am, in ICSI Conference Room 6A.
All are welcome.  Send mail to picls-request@icsi.berkeley.edu to
receive announcements of upcoming readings and talks.
Hard copies of readings are available in Room 560 at ICSI.


                 Schedule of past and upcoming sessions


5/18/93		A Practical Part-of-Speech Tagger
		by Doug Cutting et al. (Xerox PARC)
		in Proc. 3rd Conf. Applied Natural Lang. (ACL), Trento, Italy.
	        *** compressed PostScript available as parc-targger.ps.Z

5/25/93		A Statistical Approach to Machine Translation
		by Peter Brown et al. (IBM Yorktown Heights)
		in Computational Linguistics, 16(2), 79-85, 1990.

6/1/93		Hidden Markov Model Induction by Bayesian Model Merging
		by A. Stolcke and S. Omohundro (ICSI)
		in C. L. Giles et al. (eds.), Advances in Neural Information
		Processing Systems 5, Morgan Kaufmann, 1993.
	        *** compressed PostScript available as stolcke-nips5.ps.Z

6/8/93		Class-Based n-gram Models of Natural Language
		by P. Brown et al. (IBM Yorktown)
		in Computational Linguistics, 18(4), 467-479, 1992.
	
6/15/93		A comparison of the enhanced Good-Turing and deleted
		estimation methods for estimating probabilities of
		English bigrams
		by K. W. Church and W. A. Gale (Bell Labs)
		in Computer Speech and Language, 5, 19-54, 1991.

6/22/93		Inside-Outside Reestimation from Partially Bracketed Corpora
		by F. Pereira and Y. Schabes
		in ACL-92 Proceedings, 128-135.

6/29/93		Talk by Dekai Wu, HKUST, on English/Chinese modelling.
		Abstract:

		  We describe our experience with automatic alignment
		  of sentences in parallel English-Chinese text.  The
		  talk will touch on three related topics: (1) progress
		  on the HKUST English-Chinese Parallel Bilingual
		  Corpus; (2) experiments addressing the applicability
		  of Gale and Church's (1991) length-based statistical
		  method to the task of alignment involving a
		  non-Indo-European language; and (3) an improved
		  method that also incorporates domain-specific lexical
		  cues.

7/6/93		An Estimate of an Upper Bound for the Entropy of English
		by P. Brown et al. (IBM Yorktown)
		in Computational Linguistics, 18(1), 31-40, 1992.

7/13/93		An Algorithm for Estimating the Parameters of Unrestricted
		Hidden Stochastic Context-Free Grammars
		by J. Kupiec (Xerox PARC)
		TR SSL-91-60, PARC System Sciences Lab, 1991.

7/20/93		A Bayesian-Network Approach to Lexical Disambiguation
		by Eizirik, Barbosa and Mendes (UFRJ, Brazil)
		in Cognitive Science 17, 257-283, 1993.

7/27/93		Talk by Julian Kupiec, Xerox PARC, on
		"Hidden Markov Models for Linguistic Analysis". Abstract:

		  In addition to their applications to spoken
		  language, hidden Markov models can also be applied
		  usefully to written language.  Training
		  algorithms that account for their `hidden' aspect
		  enable parameter estimation to be done using
		  ordinary unlabelled text.  In the talk I will
		  review two applications, namely part-of-speech
		  tagging and context-free parsing.  Their
		  practicality and range of application will be
		  discussed.

8/3/93		Yochai Koenig (ICSI/UCB) talks about acoustical modelling
		involving HMMs, experiments with time-indexed HMMs,
		and his research plans.

8/10/93		Marti Hearst (UCB) talks about her thesis work.

		 Her work includes a computationally viable method for
		 discovering the subtopic structure of full-length
		 texts, as well as an algorithm for assigning main
		 topic and subtopic categories, from a fixed set of
		 categories, to the text.  Subtopic structure should be
		 useful for supporting new information retrieval
		 paradigms applicable to long texts (as opposed to
		 abstracts and short newswire articles).

8/17/93		Andreas Stolcke (UCB/ICSI) on Earley parsing with
		stochastic context-free grammars.

		 Using a mix of several old ideas by various people,
		 one can compute both substring probabilities and
		 prefix probabilities in an on-line manner while
		 Earley-parsing a string left-to-right, for SCFGs of
		 (almost) arbitrary format.  The algorithm closely
		 follows Earley's original non-probabilistic version,
		 and is thus efficient on special subclasses of
		 grammars.

		*** compressed PostScript available as stolcke-earley.ps.Z

8/24/93		Efficiency, Robustness and Accuracy in Picky Chart Parsing
		by David Magerman (Stanford) and Carl Weir (Paramax),
		in ACL-92 Proceedings, 40-47.

         *** New time for Fall 1993: Wednesdays, 11:15 am ***

9/8/93		On the Relationship between Complexity and Entropy
		for Markov Chains and Regular Languages
		by Wentian Li (Santa Fe Institute)
		Complex Systems 5, 381-399, 1991.

9/15/93		Bayesian Learning of Gaussian Mixture Densities for
		Hidden Markov Models
		by J.-L. Gauvain and C.-H. Lee (Bell Labs)
		DARPA Speech and Natural Language Workshop 1991,
		pp. 272-277.

9/22/93		Computation of Probabilities for an Island-Driven Parser
		by A. Corazza, R. De Mori, R. Gretter and G. Satta,
		IEEE PAMI 13(9), 936-950, 1991.

9/29/93		Jonathan Segal on closed-form computation of expected number
		of substring occurrences and n-gram probabilities from
		stochastic context-free grammars.

10/6/93		No meeting

10/13/93	Applying Probability Measures to Abstract Languages
		by T. L. Booth and R. A. Thompson
		IEEE Trans. Comp., Vol C-22 (5), 442-450, 1973.

10/20/93	Generalized Probabilistic LR Parsing of Natural Language
		(Corpora) with Unification-based Grammars
		by T. Briscoe and J. Carroll
		Computational Linguistics 19(1), 25-59, 1993.

(tentative)	Compression, Information Theory, and Grammars:
		A Unified Approach
		by A. Bookstein and S. T. Klein, ACM Transactions on
		Information Systems, 8(1), 27-49, Jan 1990.