Efficient Clustering of High-Dimensional Data Sets with Application to Reference Matching

Andrew McCallum
WhizBang! Labs - Research
4616 Henry Street
Pittsburgh, PA...

View or download from source: http://www.kamalnigam.com/papers/canopy-kdd00.pdf

Abstract: Many important problems involve clustering large datasets. Although naive implementations of clustering are computationally expensive, there are established efficient techniques for clustering when the dataset has either (1) a limited number of clusters, (2) a low feature dimensionality, or (3) a small number of data points. However, there has been much less work on methods of efficiently clustering datasets that are large in all three ways at once: for example, having millions of data points that exist in many thousands of dimensions representing many thousands of clusters. We present a new technique for clustering these large, high-dimensional datasets. The key idea involves using a cheap, approximate distance measure to efficiently...
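The abstract breaks off before saying how the cheap measure is used. The sketch below assumes one plausible reading, hinted at by the source filename (canopy-kdd00.pdf) but not confirmed by the text shown here: the cheap distance groups the data into overlapping subsets ("canopies") using a loose threshold `t_loose` and a tight threshold `t_tight`, and the expensive distance is then evaluated only on pairs that share a canopy. The function names, the thresholds, and the single-link merging used as the "expensive" step are all hypothetical illustrations, not the paper's algorithm.

```python
from itertools import combinations
from typing import Any, Callable, Dict, List, Sequence, Set, Tuple

Dist = Callable[[Any, Any], float]


def build_canopies(points: Sequence[Any], cheap_dist: Dist,
                   t_loose: float, t_tight: float) -> List[List[int]]:
    """Group point indices into overlapping canopies using only the cheap distance.

    Assumption: t_loose >= t_tight. Every point ends up in at least one canopy.
    """
    if t_loose < t_tight:
        raise ValueError("t_loose must be >= t_tight")
    remaining = list(range(len(points)))  # indices still on the candidate list
    canopies: List[List[int]] = []
    while remaining:
        center = remaining[0]
        # Every listed point within the loose threshold joins this canopy,
        # so a point may belong to several canopies.
        members = [i for i in remaining
                   if i == center or cheap_dist(points[center], points[i]) < t_loose]
        canopies.append(members)
        # The center and every point within the tight threshold leave the list
        # and can no longer start or join later canopies.
        remaining = [i for i in remaining[1:]
                     if cheap_dist(points[center], points[i]) >= t_tight]
    return canopies


def candidate_pairs(canopies: List[List[int]]) -> Set[Tuple[int, int]]:
    """Only pairs that share at least one canopy are compared expensively."""
    pairs: Set[Tuple[int, int]] = set()
    for members in canopies:
        pairs.update(combinations(sorted(members), 2))
    return pairs


def cluster_within_canopies(points: Sequence[Any], canopies: List[List[int]],
                            expensive_dist: Dist, threshold: float) -> List[List[int]]:
    """Hypothetical expensive step: single-link merging over candidate pairs only."""
    parent = list(range(len(points)))  # union-find forest over point indices

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in candidate_pairs(canopies):
        if expensive_dist(points[a], points[b]) < threshold:
            parent[find(a)] = find(b)

    groups: Dict[int, List[int]] = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


if __name__ == "__main__":
    pts = [0.0, 0.5, 1.0, 10.0, 10.4, 20.0]
    dist = lambda a, b: abs(a - b)  # stand-in for both the cheap and the expensive metric
    canopies = build_canopies(pts, dist, t_loose=2.0, t_tight=1.0)
    print(canopies)                        # [[0, 1, 2], [2], [3, 4], [5]]
    print(len(candidate_pairs(canopies)))  # 4 of the 15 possible pairs
    print(cluster_within_canopies(pts, canopies, dist, threshold=0.6))
    # [[0, 1, 2], [3, 4], [5]]
```

In this toy run only 4 of the 15 possible pairs ever reach the expensive metric, and point 2 falls into two canopies, so the cheap pass narrows the work without forcing a hard early partition. In a real high-dimensional setting the cheap distance would be something substantially faster than the expensive one; here a single function plays both roles purely to keep the example small.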
BibTeX entry:

  @misc{whizbang-efficient,
    author = "Andrew McCallum",
    title = "Efficient Clustering of High-Dimensional Data Sets with Application to Reference Matching"
  }

Citations (may not include all citations):

  1. On entropy maximization principle - Akaike