Information Retrieval & Extraction (a subtopic of Applications)
How in the world can anyone find just the right bit of information that they need, out of the available ocean of information, an ocean that continues to expand at an astonishing rate? Our accustomed systems of retrieving particular bits of information no longer fill the needs of many people. Searching traditional indexes of print publications has been aided by computerized databases, but still usually requires time-consuming serial searching of one database after the other, and then moving on to other methods of searching for internet sources. And what if the information being sought is a sound bite? A video clip? Yesterday's e-mail exchange between respected scientists? Artificial intelligence may hold the key to information retrieval in an age where widely different formats contain the information being sought, and the universe of knowledge is simply too big and growing too rapidly for successful searching to proceed at a human's slow speed.

Spying an intelligent search engine - Innovation in Web search using artificial intelligence may lead to the day when you could expect the Web to do the tedious tasks for you. By Stefanie Olsen. CNET News.com (August 18, 2006). "Search is like oxygen for many people now, and considering Google's breakthroughs in Web document analysis, supercomputing and Internet advertising, it can be easy to think this is as good as it gets. But some entrepreneurs in artificial intelligence (AI) say that Google is not the end of history. Rather, its techniques are a baseline of where we're headed next. For example, one day people will be able to search for the plot of a novel, or list all the politicians who said something negative about the environment in the last five years, or find out where to buy an umbrella just spotted on the street. Techniques in AI such as natural language, object recognition and statistical machine learning will begin to stoke the imagination of Web searchers once again.
'This is the beginning for the Web being at work for you in a smart way, and taking on the tedious tasks for you,' said Alain Rappaport, CEO and founder of Medstory, a search engine for medical information that went into public beta in July. 'The Web and the amount of information is growing at such a pace that it's an imperative to build an intelligent system that leverages knowledge and exploits it efficiently for people,' he added. ... Rappaport said one of the more recent progressions in AI has been in moving from relying on humans to catalog connections between various data to programming computers to do the work, or what he calls the automation of knowledge structure. Tom Mitchell, chair of machine learning at Carnegie Mellon University, calls it machine learning for statistical language processing, or learning algorithms that allow computers to read text. ... Technologies like speech recognition will fuel advances. ... The field of AI called computer vision, which encompasses facial detection and recognition, is coming of age for several reasons." AI Knows It’s Out There. Red Herring (August 22, 2005 print issue). "Intelligent Search ... More upscale, with costs in the hundreds of thousands of dollars, are the intelligent search systems sold by InQuira of San Bruno, California. The systems are based on natural language processing, a branch of AI that enables the system to comprehend what a person is really asking, at least if the question is posed in standard English. 'Pointing customers at documents does not approach the productivity of being able to understand a request and pull the right paragraph up to their screen,' says Bob Macdonald, chief marketing officer at InQuira."
Information Service Agent Research. "The Information Service Agent Lab at Simon Fraser University develops novel techniques for interactive information gathering and integration. The research applies artificial intelligence planning and learning techniques and database technologies to create knowledge bases from large collections of dynamically changing, potentially inconsistent and heterogeneous data sources, permitting users access to information at the right abstraction level." Projects. Software Agents Group, MIT Media Lab. Wide-ranging approaches to information retrieval that include user profiling, information filtering, privacy, recommender systems, communityware, negotiation mechanisms and coordination. Q&A: Yahoo’s Ron Brachman - Yahoo hires the former director of one of DARPA's most important units to expand its research team. Red Herring (December 15, 2005). "Many, especially in the academic community, look to artificial intelligence as a key factor in the evolution of search. Mr. Brachman shared his views on this subject and others with Red Herring. Q: People in the academic search community often say companies don’t use AI enough to improve search. Do you agree? A: Companies like Yahoo are already using AI technologies. They don’t make a public fuss about it. For example, with expert systems, such as those which can help in data mining … or in search, aspects of AI matter. Q: Could you give us an example of where AI could improve search? ..." Inside Google - From the Labs, Google Labs [audio]. Presentation by Peter Norvig at the 2005 O'Reilly Emerging Technology Conference. Available from IT Conversations. "Google has expanded from searching webpages to searching videos, books, places and even files on your own desktop. This expansion is made possible through Google's understanding and classification of information, facilitated by the application of algorithms in the domains of Machine Learning, Natural Language Processing and Artificial Intelligence.
... Peter Norvig is the Director of Search Quality at Google Inc. He is a Fellow and Councilor of the American Association for Artificial Intelligence and co-author of Artificial Intelligence: A Modern Approach, the leading textbook in the field."
Information Agents Group at the Information Sciences Institute, University of Southern California. CIRES - Content Based Image REtrieval System developed by Qasim Iqbal at the Computer and Vision Research Center (CVRC) in the Department of Electrical and Computer Engineering at The University of Texas at Austin. " CIRES is a robust content-based image retrieval system based upon a combination of higher-level and lower-level vision principles. Higher-level analysis uses perceptual organization, inference and grouping principles to extract semantic information describing the structural content of an image. Lower-level analysis employs a channel energy model to describe image texture, and utilizes color histogram techniques. ... The system is able to serve queries ranging from scenes of purely natural objects such as vegetation, trees, sky, etc. to images containing conspicuous structural objects such as buildings, towers, bridges, etc." Be sure to check out the sample queries. CIIR. The Center for Intelligent Information Retrieval at UMass. "The scope of the CIIR's work is broad and goes significantly beyond traditional areas of information retrieval such as search strategies and information filtering. The research includes both low-level systems issues such as the design of protocols and architectures for distributed search, as well as more human-centered topics such as user interface design, visualization and data mining with text, and multimedia retrieval."
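The color histogram technique mentioned in the CIRES description above can be sketched in a few lines. This is a minimal illustration, not CIRES's implementation: the "images" are invented flat lists of color names, and similarity is plain histogram intersection:

```python
from collections import Counter

def histogram(pixels):
    """Normalized color histogram of an image given as color names."""
    n = len(pixels)
    return {color: count / n for color, count in Counter(pixels).items()}

def similarity(h1, h2):
    """Histogram intersection: 1.0 means identical color distributions."""
    return sum(min(h1.get(c, 0.0), h2.get(c, 0.0)) for c in set(h1) | set(h2))

forest = histogram(["green", "green", "green", "brown"])
meadow = histogram(["green", "green", "brown", "brown"])
sky = histogram(["blue", "blue", "blue", "white"])

print(similarity(forest, meadow))  # 0.75: mostly similar palettes
print(similarity(forest, sky))     # 0.0: no colors in common
```

A real system bins millions of RGB pixel values rather than color names, and combines this lower-level score with texture and structural cues, as the CIRES description outlines.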
Seeking Better Web Searches - Deluged with superfluous responses to online queries, users will soon benefit from improved search engines that deliver customized results. By Javed Mostafa. Scientific American (February 2005). "New search engines are improving the quality of results by delving deeper into the storehouse of materials available online, by sorting and presenting those results better, and by tracking your long-term interests so that they can refine their handling of new information requests. In the future, search engines will broaden content horizons as well, doing more than simply processing keyword queries typed into a text box. They will be able to automatically take into account your location--letting your wireless PDA, for instance, pinpoint the nearest restaurant when you are traveling. New systems will also find just the right picture faster by matching your sketches to similar shapes. They will even be able to name that half-remembered tune if you hum a few bars." Smart Search. By David Pacchioli. Research|PennState (May 2003; Volume 24, Issue 2). "[Lee] Giles, the David Reese professor of information sciences and technology at Penn State, has devoted his career to finding better ways to get at information, to wring the most out of it, to marshal it efficiently. His background is in artificial intelligence, a field for which the processing of oceans of information is practically raison d'etre. ... The ultimate goal, Giles says, is to create search engines that incorporate artificial intelligence. ... A prime example [of a niche search engine] is CiteSeer, a tool that Giles and Steve Lawrence created for the field of computer and information science. CiteSeer crawls the growing body of computer-science literature available on the Web and ignores everything else. Because the amount of information it finds relevant is relatively small, it can offer users important features that generic engines can’t."
From data storage to information retrieval. By Tony Rose, Vice-Chair, BCS Information Retrieval Specialist Group. BCS Annual Review 2006. "[M]uch IR research effort in recent years has been directed toward developing more sophisticated representation models and matching algorithms, often based around natural language processing (NLP) techniques. NLP technology can provide many of the basic building blocks for advanced search, such as:
* Summarisation: the ability to produce a coherent summary or abstract of a document;
* Named entity recognition: the ability to identify key conceptual units within a document, such as the names of people, places, companies, etc;
* Topic detection and tracking: the ability to follow different themes in a changing news feed;
* Word sense disambiguation: the ability to differentiate the particular senses a word may have, e.g. 'bank' as in 'the edge of a river' and 'bank' as in 'financial institution';
* Information extraction: a combination of the above and other techniques to enable specific patterns or facts to be extracted from text or other unstructured data (sometimes referred to as text mining);
* Machine translation: the ability to translate one natural language to another.
Yet despite many recent successes in NLP research (and the subsequent over-inflated claims of many search technology providers), we are still a long way from the Holy Grail of understanding the conceptual content of a document. Consequently, the many information professionals who rely on such tools will have to wait a little longer for an answer to their prayers, and the numerous artificial intelligence (AI) researchers around the world need not fear for their jobs just yet."
Academia's quest for the ultimate search tool. By Stefanie Olsen. CNET News.com (August 15, 2005).
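Of the NLP building blocks listed above, word sense disambiguation is the easiest to sketch. Below is a simplified Lesk-style heuristic that picks the sense whose dictionary gloss shares the most words with the surrounding context; the two "bank" glosses echo the article's own example, and everything else is invented for illustration:

```python
SENSES = {
    # gloss text for each sense of "bank" (paraphrasing the example above)
    "river_bank": "the sloping land at the edge of a river or stream",
    "financial_bank": "a financial institution that accepts deposits and makes loans",
}

def disambiguate(context):
    """Pick the sense whose gloss overlaps most with the context words."""
    context_words = set(context.lower().split())
    return max(SENSES, key=lambda s: len(context_words & set(SENSES[s].split())))

print(disambiguate("we fished from the bank of the river"))   # river_bank
print(disambiguate("the bank approved deposits and loans"))   # financial_bank
```

Production systems use far larger sense inventories (e.g. WordNet glosses) and weighted overlap, but the core idea of matching context against sense descriptions is the same.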
"The University of California at Berkeley is creating an interdisciplinary center for advanced search technologies and is in talks with search giants including Google to join the project, CNET News.com has learned. ... The principal areas of focus: privacy, fraud, multimedia search and personalization. ... The success of the $5 billion-a-year search-advertising business is fueling Internet research and development in many ways. ... The search problems of today are different from those of five years ago. ... Jaime Carbonell, director of CMU's Language Technologies Institute, said his research team is perfecting a technology for personalized search that would solve some of the privacy concerns surrounding the wide-scale collection of sensitive data, such as names and query histories. ... CMU is also working under a government grant on a longer-term project called Javelin, focused on question-and-answer search technology. ... The universities of Texas and Pennsylvania are also exploring different approaches to the same problem. Stanford continues in its role as a breeding ground for search projects. ... Stanford associate professor Andrew Ng, among others, is working on artificial-intelligence techniques for extracting knowledge from text in a search index. ... Stanford, the Massachusetts Institute of Technology and many other universities are working to solve problems presented by the library of tomorrow, which will be largely digitized. Sifting through and organizing billions of digital documents will require new search technology." ... and here are some more articles from our AI in the news collection:
Learning Probabilistic User Profiles. By Mark Ackerman et al. AI Magazine 18(2): Summer 1997, 47-56. Applications for finding interesting web sites and notifying users of changes. The Web as a Database: New Extraction Technologies and Content Management. Katherine C. Adams (2001). Online Magazine; Volume 25, Number 2. "Information extraction research in the United States has a fascinating history. It is a product of the Cold War. In the late 1980s, a number of academic and industrial research sites were working on extracting information from naval messages in projects sponsored by the U.S. Navy. To compare the performance of these software systems, the Message Understanding Conferences (MUC) were started. These conferences were the first large-scale effort to evaluate natural language processing (NLP) systems and they continue to this day." Moving Up the Information Food Chain. By Oren Etzioni. AI Magazine 18(2): Summer 1997, 11-18. A look at deploying softbots on the World Wide Web. When the web starts thinking for itself. By David Green. vnunet's Ebusinessadvisor (December 20, 2002). "The so-called semantic web is an extension of the current web in which data is given meaning through the use of a series of technologies. ... Ontologies provide a deeper level of meaning by providing equivalence relations between terms (i.e. term A on my web page is expressing the same concept as term B on your web page). An ontology is a file that formally defines relations among terms, for example, a taxonomy and set of inference rules. By providing such 'dictionaries of meaning' (in philosophy ontology means 'nature of existence') ontologies can improve the accuracy of web searches by allowing a search program to seek out pages that refer to a specific concept rather than just a particular term as they do now. While XML, RDF and ontologies provide the basic infrastructure of the semantic web, it is intelligent agents that will realise its power."
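The ontology idea in the Green article above, letting a search program seek out pages that express a concept rather than a literal term, can be sketched with a toy equivalence table. The terms, documents, and function below are invented for illustration:

```python
# term A expresses the same concept as term B (a toy equivalence relation)
EQUIVALENT = {
    "car": "automobile",
    "automobile": "car",
}

DOCS = {
    1: "automobile repair and maintenance tips",
    2: "history of the bicycle",
}

def concept_search(term):
    """Return ids of documents containing the term or an equivalent term."""
    targets = {term, EQUIVALENT.get(term, term)}
    return [doc_id for doc_id, text in DOCS.items() if targets & set(text.split())]

print(concept_search("car"))  # [1]: matched via the car/automobile equivalence
```

On the semantic web proper, such relations would be declared in RDF/OWL (e.g. an equivalence axiom readable by any agent) rather than a local dictionary, which is what lets independent sites share one "dictionary of meaning."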
Is There an Intelligent Agent in Your Future? By James A. Hendler (1999). (This wonderful paper received the AAAI-2000 Effective Expository Writing Award.) Savvysearch... By Adele Howe and Daniel Dreilinger (1997). AI Magazine 18(2): 19-25. Description of a metasearch engine that learns which search engines to query. Designing Systems That Adapt to Their Users. An AAAI-02 Tutorial by Anthony Jameson, Joseph Konstan, and John Riedl. "Personalized recommendation of products, documents, and collaborators has become an important way of meeting user needs in commerce, information provision, and community services, whether on the web, through mobile interfaces, or through traditional desktop interfaces. This tutorial first reviews the types of personalized recommendation that are being used commercially and in research systems. It then systematically presents and compares the underlying AI techniques, including recent variants and extensions of collaborative filtering, demographic and case-based approaches, and decision-theoretic methods. The properties of the various techniques will be compared within a general framework, so that participants learn how to match recommendation techniques to applications and how to combine complementary techniques." Microsoft Research seeks better search. By Michael Kanellos. CNET News (April 17, 2003). "Microsoft Research is plugging away at one of the growing dilemmas in computing: so much data, so little time. Scientists in the Redmond, Wash.-based software giant's labs are experimenting with new types of search and user interface technology that will let individuals and businesses tap into the vast amounts of data on the Internet, or inside their own computers, that increasingly will be impractical or impossible to find." 18th century theory is new force in computing. By Michael Kanellos. ZDNet (February 19, 2003). IBM aims to get smart about AI. By Michael Kanellos. CNET News (January 20, 2003). "In the coming months, IBM will unveil technology that it believes will vastly improve the way computers access and use data by unifying the different schools of thought surrounding artificial intelligence. The Unstructured Information Management Architecture (UIMA) is an XML-based data retrieval architecture under development at IBM."
The Hidden Web. By Henry Kautz, Bart Selman, and Mehul Shah. AI Magazine 18(2): Summer 1997, 27-36. A project that helps users locate experts on the Web. Lifestyle Finder: Intelligent User Profiling Using Large-Scale Demographic Data. By Bruce Krulwich. AI Magazine 18(2): Summer 1997, 37-45. In Search of a Lost Melody - Computer assisted music: identification and retrieval. By Kjell Lemstrom. Finnish Music Quarterly Magazine 3-4/2000. The Search Engine That Could. Reported by Spencer Michels. The NewsHour (PBS; November 29, 2002). Also available in audio and video formats. Hear/see Larry Page and Sergey Brin, co-founders of Google, Skip Battle, the new CEO at Ask Jeeves, and others. Diagnosing Delivery Problems in the White House Information-Distribution System. By Mark Nahabedian and Howard Shrobe. AI Magazine 17(4): Winter 1996, 21-30. Use of AI in selective information distribution.
Search engines try to find their sound. By Stefanie Olsen. CNET News (May 27, 2004). "Most 'spiders' that crawl and index the Web are effectively blind to audio and video content, making NPR's highly regarded radio programming all but invisible to mainstream search engines. ... Consumers armed with broadband connections at home are driving new demand for multimedia content and setting off a new wave of technology development among search engine companies eager to extend their empires from the static world of text to the dynamic realm of video and audio. ... Most ambitiously of all, a handful [of search engines] are bent on searching inside the files to extract meaning and relevance by examining audio and video features directly. StreamSage is starting to make waves with its audio and video search technology, introduced late last year. The Washington, D.C.-based company developed software after roughly three years of research that uses speech recognition technology to transcribe audio and video. It then uses contextual analysis to understand the language and parse the themes of the content. As a result, it can generate a kind of table of contents for the topics discussed in the files." The Revolution in Legal Information Retrieval or: The Empire Strikes Back. By Erich Schweighofer (1999). The Journal of Information, Law and Technology 1999(1). "The issue is how to deal with the Artificial Intelligence (AI)-hard problem of making sense of the mass of legal information." Text Mining Technology - Turning Information Into Knowledge. A white paper from IBM (1998), Daniel Tkach, editor.
The Role of Intelligent Systems in the National Information Infrastructure. An American Association for Artificial Intelligence Policy Report. Edited by Daniel S. Weld. A cure for info overload. ACM Special Interest Group on Information Retrieval (SIGIR). "ACM SIGIR addresses issues ranging from theory to user demands in the application of computers to the acquisition, organization, storage, retrieval, and distribution of information." Be sure to check out their collection of Information Retrieval Resources. Brainboost Answer Engine. "Brainboost uses Machine Learning and Natural Language Processing techniques to go the extra mile, by actually answering questions, in plain English." The British Computer Society Information Retrieval Specialist Group. CMU Text Learning Group. "Our goal is to develop new machine learning algorithms for text and hypertext data. Applications of these algorithms include information filtering systems for the Internet, and software agents that make decisions based on text information." Among their many projects you'll find:
HP SpeechBot - audio search using speech recognition. From Hewlett-Packard.
Introduction to Information Extraction Technology. IJCAI-99 Tutorial by Douglas E. Appelt and David Israel, Artificial Intelligence Center, SRI International. In addition to the notes from the tutorial, you'll find these collections of links: Research Projects and Systems, Papers, and Resources and Tools for building information extraction systems. MARVEL: "The Intelligent Information Management Department at IBM Research is developing a multimedia analysis and retrieval system called MARVEL. MARVEL helps organize the large and growing amounts of multimedia data (e.g., video, images, audio) by using machine learning techniques to automatically label its content. The system recently won the Wall Street Journal 2004 Innovation Award in the multimedia category." A demo is available. The National Centre for Text Mining (NaCTeM). "We provide text mining services in response to the requirements of the UK academic community. Our initial focus is on applications in the biological and medical domains, where the major successes in the mining of scientific texts have so far occurred."
"NewsInEssence is a system for finding and summarizing clusters of related news articles from multiple sources on the Web. It is under development by the CLAIR group at the University of Michigan." You can see it in action here. "START, the world's first Web-based question answering system, has been on-line and continuously operating since December, 1993. It has been developed by Boris Katz and his associates of the InfoLab Group at the MIT Computer Science and Artificial Intelligence Laboratory. Unlike information retrieval systems (e.g., search engines), START aims to supply users with 'just the right information,' instead of merely providing a list of hits." SUMMARIST: Automated Text Summarization project from The Natural Language Processing group at the Information Sciences Institute of the University of Southern California (USC/ISI). "Summarization is a hard problem of Natural Language Processing because, to do it properly, one has to really understand the point of a text. This requires semantic analysis, discourse processing, and inferential interpretation (grouping of the content using world knowledge)." Sun Microsystems Labs: Conceptual Indexing for Precision Content Retrieval. "How often have you failed to find what you wanted in an online search because the words you used failed to match words in the material that you needed? Concept-based retrieval systems attempt to reach beyond the standard keyword approach of simply counting the words from your request that occur in a document. The Conceptual Indexing Project is developing techniques that use knowledge of concepts and their interrelationships to find correspondences between the concepts in your request and those that occur in text passages. Our goal is to improve the convenience and effectiveness of online information access. 
The central focus of this project is the 'paraphrase problem,' in which the words used in a query are different from, but conceptually related to, those in material that you need." Transinsight's GoPubMed, "an ontology-based search engine for the life sciences. In contrast to classical search engines it can answer questions using its background knowledge."
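Sun's "paraphrase problem" above, where the words in a query are different from but conceptually related to those in the material you need, can be sketched by matching concept identifiers instead of surface words. The concept table, passages, and names below are invented for illustration:

```python
CONCEPTS = {
    # surface term -> shared concept id (a toy conceptual index)
    "physician": "MD", "doctor": "MD",
    "attorney": "LAW", "lawyer": "LAW",
}

PASSAGES = [
    "see a doctor about persistent headaches",
    "hire a lawyer before signing the contract",
]

def concepts_in(text):
    """Concept ids for every known surface term in the text."""
    return {CONCEPTS[w] for w in text.split() if w in CONCEPTS}

def conceptual_retrieve(query):
    """Return passages that share at least one concept with the query."""
    wanted = concepts_in(query)
    return [p for p in PASSAGES if wanted & concepts_in(p)]

print(conceptual_retrieve("consult a physician"))  # finds the "doctor" passage
```

A keyword engine counting literal matches would score the "doctor" passage zero for this query; mapping both sides into a shared concept space is what lets the right paragraph surface.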
Aluri, Rao, and Donald E. Riggs, editors. 1990. Expert Systems in Libraries. Norwood, NJ: Ablex Pub. Corp. Association of Research Libraries. 1991. Expert Systems in ARL Libraries. Washington, DC: ARL. Davies, Peter. 1991. Artificial Intelligence: Its Role in the Information Industry. Medford, NJ: Learned Information, Inc. Ford, Nigel. 1991. Expert Systems and Artificial Intelligence: An Information Manager's Guide. London: Library Association Pub. Hovy, Eduard and Dragomir Radev, Cochairs. Intelligent Text Summarization: Papers from the 1998 Spring Symposium. Technical Report SS-98-06. American Association for Artificial Intelligence, Menlo Park, California. Jacobs, Paul S., editor. 1992. Text-Based Intelligent Systems: Current Research and Practice in Information Extraction and Retrieval. Hillsdale, NJ: L. Erlbaum Associates. Jones, Karen Sparck. 1999. Information Retrieval and Artificial Intelligence. Artificial Intelligence 114(1-2): 257-281. Kautz, Henry, ed. 1998. Recommender Systems: Papers from the AAAI Workshop. Technical Report WS-98-08. American Association for Artificial Intelligence, Menlo Park, California. "Over the past few years a new kind of application, the 'recommender system,' has appeared, based on a synthesis of ideas from artificial intelligence, human-computer interaction, sociology, information retrieval, and the technology of the WWW. Recommender systems assist and augment the natural process of relying on friends, colleagues, publications, and other sources to make the choices that arise in everyday life. Examples of the kinds of questions that could be answered by a recommender system include: What kind of car should I buy? What web-pages would I find most interesting? What people in my company would be best assigned to a particular project team?" Lyons, Daniel. 1997. The Buzz About Firefly. The New York Times Magazine (June 29, 1997): 36-37+. Maybury, Mark T., editor. 1993. Intelligent Multimedia Interfaces.
Menlo Park and Cambridge: AAAI Press/MIT Press. This book covers the ground where artificial intelligence, multimedia computing, information retrieval and human-computer interfaces all overlap. Michelson, Avra. 1991. Expert Systems Technology and its Implication for Archives. Washington, DC: National Archives and Records Administration. Special Libraries Association. 1991. Expert Systems and Library Applications: An SLA Information Kit. Washington, DC: Special Libraries Assn. van Rijsbergen, Keith. 1979. Information Retrieval, 2nd Edition. London: Butterworths. Verity, John W. 1997. Coaxing Meaning Out of Raw Data. Business Week (February 3, 1997): 134+.