Corpora: Re: Unsupervised learning

Andrew McCallum (mccallum@sandbox.jprc.com)
Wed, 10 Mar 1999 12:15:17 -0500

From: Jose Maria Gomez Hidalgo
Date: Wed, 10 Mar 1999 18:01:48 +0100

> I would like to know about attempts to build classifiers through
> unsupervised learning, or to integrate other information sources in a
> supervised learning-based classifier. The only one I am aware of is the
> one by Yang and Chute [1].

Integrating supervised and unsupervised learning has been a focus of mine and several others at CMU and elsewhere. There was a NIPS workshop on the subject ("Integrating Supervised and Unsupervised Learning" http://www.cs.cmu.edu/~mccallum/supunsup).

Here are some examples of supervised/unsupervised learning applied to text classification:

"Learning to Classify Text from Labeled and Unlabeled Documents"
Kamal Nigam, Andrew McCallum, Sebastian Thrun and Tom Mitchell.
AAAI-98
http://www.cs.cmu.edu/~mccallum/papers/emcat-aaai98.ps.gz

A longer version of the above, to appear in the Machine Learning Journal:
http://www.cs.cmu.edu/~knigam/papers/emcat-mlj99.ps

"Employing EM in Pool-Based Active Learning for Text Classification"
Andrew McCallum and Kamal Nigam.
Proc. of International Conference on Machine Learning (ICML-98)
http://www.cs.cmu.edu/~mccallum/papers/emactive-icml98.ps.gz

Shrinkage can also be seen as unsupervised learning, in that it uses EM to "cluster" words into different ancestors in the hierarchy. Here is a paper on using shrinkage in a hierarchy of classes to improve document classification:

"Improving Text Classification by Shrinkage in a Hierarchy of Classes"
Andrew McCallum, Ronald Rosenfeld, Tom Mitchell and Andrew Ng.
ICML-98.
http://www.cs.cmu.edu/~mccallum/papers/hier-icml98.ps.gz

We also use unsupervised learning and unlabeled data to classify research papers into the 70-leaf topic hierarchy in Cora, a search engine over computer science research papers (www.cora.justresearch.com). A paper describing Cora is:

"Building Domain-Specific Search Engines with Machine Learning Techniques"
Andrew McCallum, Kamal Nigam, Jason Rennie and Kristie Seymore.
AAAI-99 Spring Symposium.
http://www.cs.cmu.edu/~mccallum/papers/cora-aaaiss99.ps.gz
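
For readers who want a feel for how labeled and unlabeled documents can be combined with EM, here is a toy illustration. It is only a minimal sketch of the general idea (train naive Bayes on the labeled documents, then alternate between probabilistically labeling the unlabeled documents and retraining on everything), not the exact algorithm from the papers listed above. The function names (train_nb, posterior, em_nb), the toy documents, and the Laplace smoothing are assumptions made for this example.

```python
import math
from collections import Counter


def train_nb(docs, weights, classes, vocab):
    """M-step: estimate class priors and per-class word probabilities from
    (possibly fractional) class memberships, with Laplace smoothing."""
    prior, word_prob = {}, {}
    total = sum(sum(w.values()) for w in weights)
    for c in classes:
        mass = sum(w[c] for w in weights)
        prior[c] = (mass + 1.0) / (total + len(classes))
        counts = Counter()
        for doc, w in zip(docs, weights):
            for tok in doc:
                counts[tok] += w[c]
        denom = sum(counts.values()) + len(vocab)
        word_prob[c] = {t: (counts[t] + 1.0) / denom for t in vocab}
    return prior, word_prob


def posterior(doc, prior, word_prob, classes):
    """E-step for one document: P(class | doc) under naive Bayes.
    Tokens outside the training vocabulary are ignored."""
    logp = {}
    for c in classes:
        logp[c] = math.log(prior[c]) + sum(
            math.log(word_prob[c][t]) for t in doc if t in word_prob[c]
        )
    m = max(logp.values())  # subtract the max for numerical stability
    expd = {c: math.exp(v - m) for c, v in logp.items()}
    z = sum(expd.values())
    return {c: v / z for c, v in expd.items()}


def em_nb(labeled, unlabeled, classes, iters=10):
    """Train on labeled docs, then refine with EM over the unlabeled docs.
    With iters=0 this is plain supervised naive Bayes."""
    vocab = {t for d, _ in labeled for t in d} | {t for d in unlabeled for t in d}
    lab_docs = [d for d, _ in labeled]
    # Labeled documents keep hard (0/1) class memberships throughout.
    lab_w = [{c: 1.0 if c == y else 0.0 for c in classes} for _, y in labeled]
    prior, word_prob = train_nb(lab_docs, lab_w, classes, vocab)
    for _ in range(iters):
        # E-step: soft-label each unlabeled document with the current model.
        unl_w = [posterior(d, prior, word_prob, classes) for d in unlabeled]
        # M-step: retrain on labeled plus soft-labeled documents.
        prior, word_prob = train_nb(
            lab_docs + unlabeled, lab_w + unl_w, classes, vocab
        )
    return prior, word_prob


# Hypothetical toy data: one labeled document per class, four unlabeled ones.
classes = ["sports", "politics"]
labeled = [
    (["ball", "goal", "team"], "sports"),
    (["vote", "senate", "law"], "politics"),
]
unlabeled = [
    ["goal", "team", "striker"],
    ["team", "striker", "coach"],
    ["senate", "law", "election"],
    ["law", "election", "ballot"],
]

if __name__ == "__main__":
    prior, word_prob = em_nb(labeled, unlabeled, classes, iters=5)
    print(posterior(["striker", "coach"], prior, word_prob, classes))
```

In this toy setup the words "striker" and "coach" never occur in a labeled document, so a classifier trained on the labeled data alone cannot tell the two classes apart for them. They co-occur with "goal" and "team" in the unlabeled documents, and the EM iterations let the model pick up that association.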