Peter Norvig (Google), High Order Bit at Web 2.0

Some random bits scribbled by Jeremy Zawodny.

Statistical machine translation. Taking text in one language and conveying its information in another. You need to grok the syntax and semantics of both, a big dictionary, etc. Google has access to lots of CPU and lots of text, so they took a statistical approach using word pairs, phrases, etc. Example of a news story translated from Arabic to English.

Named entity extraction (people, companies, products, etc.). Lots of relationships to find in the text they've got. They started with simple patterns in "easy" sentences: if a sentence contains "such as", they use it. That helps them extract facts like "HP is a computer manufacturer."

Word clusters are next. They build a Bayesian network of words and word clusters.

On-line demo time. Interactive use of word clusters, using "george bush" and "john kerry". Amusing results. "That's what the web says."

See Also: My Web 2.0 post archive for coverage of all the other sessions I attended.

Posted by jzawodn at October 07, 2004 11:13 AM

Reader Comments

Ole Aamot said:

Efficient clustering of words has already been done. McCallum, Andrew Kachites. "Bow: A toolkit for statistical language modeling, text retrieval, classification and clustering." http://www.cs.cmu.edu/~mccallum/bow. 1996.

on May 18, 2005 04:33 AM
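The notes from the talk don't describe Google's translation system in detail, but the basic idea of learning word pairs from lots of parallel text can be illustrated with the textbook IBM Model 1, which uses expectation-maximization (EM) to estimate how likely each foreign word is as a translation of a given English word. This is a sketch of that classic model, not Google's implementation. The function name `ibm_model1`, the three-sentence German/English toy corpus, and dropping the usual NULL word are all assumptions made to keep it short.

```python
from collections import defaultdict


def ibm_model1(pairs, iterations=10):
    """Estimate word-translation probabilities t(f | e) with EM (IBM Model 1).

    pairs: list of (foreign_tokens, english_tokens) sentence pairs.
    Simplification (assumption): no NULL word, so every foreign word
    must align to some English word in its own sentence.
    """
    foreign_vocab = {f for fs, _ in pairs for f in fs}
    # Start with t(f | e) uniform over the foreign vocabulary.
    t = defaultdict(lambda: 1.0 / len(foreign_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected count of (f, e) alignments
        total = defaultdict(float)  # expected count of e being aligned to
        # E-step: split each foreign word's alignment across the English
        # words in its sentence, in proportion to the current t(f | e).
        for fs, es in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    share = t[(f, e)] / norm
                    count[(f, e)] += share
                    total[e] += share
        # M-step: renormalize so that t(f | e) sums to 1 over f for each e.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


if __name__ == "__main__":
    # Hypothetical toy parallel corpus (German, English).
    corpus = [
        ("das Haus".split(), "the house".split()),
        ("das Buch".split(), "the book".split()),
        ("ein Buch".split(), "a book".split()),
    ]
    t = ibm_model1(corpus)
    for e in ["the", "house", "book", "a"]:
        candidates = {f: p for (f, e2), p in t.items() if e2 == e}
        best = max(candidates, key=candidates.get)
        print(f"{e} -> {best} ({candidates[best]:.2f})")
```

No sentence pair ever says which word maps to which; the probabilities sharpen only because "das" keeps showing up next to "the" and "Buch" next to "book". That reliance on co-occurrence across many sentences is why having lots of text matters so much for this kind of approach.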
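The "such as" trick from the talk is what's usually called a Hearst pattern: a phrase like "computer manufacturers such as HP, Dell and Apple" implies that each listed name is an instance of the class named before "such as". Here's a minimal sketch using a single regular expression. It is not Google's extractor. The names `extract_is_a` and `singular` are hypothetical, and the code assumes the class is one or two lowercase words, each name is a single capitalized token, and dropping a trailing "s" is good enough for singularizing.

```python
import re

# Hearst-style "CLASS such as NAME, NAME and NAME" pattern.
# Assumptions (illustrative only): the class is one or two lowercase words,
# and each listed name is a single capitalized token.
SUCH_AS = re.compile(
    r"\b((?:[a-z]+ )?[a-z]+) such as "
    r"([A-Z][\w&-]*(?:(?:,\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+)[A-Z][\w&-]*)*)"
)
# Separators between listed names: ", ", ", and ", ", or ", " and ", " or ".
SEPARATOR = re.compile(r",\s*(?:and\s+|or\s+)?|\s+(?:and|or)\s+")


def singular(noun_phrase: str) -> str:
    """Very naive singularization: drop one trailing 's'."""
    return noun_phrase[:-1] if noun_phrase.endswith("s") else noun_phrase


def extract_is_a(text: str) -> list[tuple[str, str]]:
    """Return (entity, class) pairs found via the 'such as' pattern."""
    facts = []
    for match in SUCH_AS.finditer(text):
        category = singular(match.group(1))
        for name in SEPARATOR.split(match.group(2)):
            facts.append((name, category))
    return facts


if __name__ == "__main__":
    sample = "The show floor had computer manufacturers such as HP, Dell and Apple."
    for entity, category in extract_is_a(sample):
        print(f"{entity} is a {category}.")
```

On the sample sentence this yields one "X is a computer manufacturer." fact per company. Real text breaks it quickly (multi-word names, irregular plurals, lists that run on past the names), which is presumably part of why you'd start with "easy" sentences.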