San José State University


Introduction to Information Retrieval

The popup window showing the Typology of an information system illustrates the goal and process of information retrieval. The goal is to link a person with the document which contains the information he or she wants. (A document can be any information-bearing entity -- text, image, or 3-dimensional object; see Michael Buckland's 1997 article on documents, in which he argues that under certain circumstances, an antelope can be a document.)

Linking a person with a useful document requires that the document be stored in such a way that it can be identified as relevant for that specific purpose. The Typology highlights some of the dimensions of the task. The collection of documents must be organized in some way that will allow intellectual access to their contents. (Even before that, a collection of documents must be assembled, but that's a different class!) Organizing information involves what is called "representation" - providing a way to represent the intellectual content of the work. In print libraries, this is often done with subject headings in a catalog - looking up a subject heading will allow you to find a list of the works that are about that subject. In an electronic environment, it is often done by making the actual contents of the document searchable. There are many other ways of creating a searchable representation of the document, often called a "surrogate," and these methods will occupy much of this class.
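To make the idea of a surrogate concrete, here is a minimal Python sketch of a subject-heading catalog. The records, titles, and headings are invented for illustration, and `build_subject_index` is a hypothetical helper, not part of any real cataloging system.

```python
from collections import defaultdict

# Invented catalog records: each one is a surrogate standing in for a full document.
catalog = [
    {"id": 1, "title": "Work A", "subjects": ["information retrieval"]},
    {"id": 2, "title": "Work B", "subjects": ["documents", "information science"]},
    {"id": 3, "title": "Work C", "subjects": ["cataloging", "information science"]},
]

def build_subject_index(records):
    """Map each subject heading to the ids of the works filed under it."""
    index = defaultdict(list)
    for record in records:
        for heading in record["subjects"]:
            index[heading].append(record["id"])
    return dict(index)

subject_index = build_subject_index(catalog)
print(subject_index["information science"])  # prints [2, 3]
```

Looking up a heading returns the list of works filed under it, which is roughly what a print catalog does. The index never touches the full text of the works; it only knows what the surrogate records say.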

Once a system is in place for representing the documents in the collection for searching, an actual search has to be conducted. This requires that the person looking for information be able to express the information need appropriately for the information retrieval system - knowing whether to search for U.S., US, U.S.A., United States, or United States of America, for instance. Making the problem more complex, people often need information before they know enough about the subject to identify exactly what kind of information will be helpful.
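One way a system might help with the vocabulary problem is to map variant forms onto a single preferred term before searching. The sketch below assumes a small hand-built variant table (`PREFERRED_TERM`, a hypothetical name); a real system would need a far larger controlled vocabulary.

```python
# Hypothetical variant table, built by hand for this example only.
PREFERRED_TERM = {
    "u.s.": "united states",
    "us": "united states",
    "u.s.a.": "united states",
    "united states": "united states",
    "united states of america": "united states",
}

def normalize_query(query):
    """Lowercase the query, collapse extra spaces, and swap in the preferred term if one is listed."""
    key = " ".join(query.lower().split())
    return PREFERRED_TERM.get(key, key)

print(normalize_query("U.S.A."))                    # prints united states
print(normalize_query("United States of America"))  # prints united states
```

Note the cost of this shortcut: after lowercasing, the abbreviation "US" can no longer be told apart from the pronoun "us". A table like this also does nothing for the harder problem in the paragraph above, where the searcher does not yet know what to ask for.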

Once the wish or need for information is expressed correctly to the system, some mechanism is needed for matching the documents to the search query - this is the search engine, the technique for determining which documents match and which do not. The matching documents are then retrieved for the searcher, either as the full document or as information about the document so that it may be found.
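The matching step can be sketched in a few lines of Python. This example assumes the simplest possible rule, a Boolean AND over whole words, and uses three invented one-line documents; real search engines use many other techniques.

```python
def tokenize(text):
    """Split text into a set of lowercase words (no stemming or punctuation handling)."""
    return set(text.lower().split())

def match(query, documents):
    """Return the ids of documents that contain every word in the query."""
    terms = tokenize(query)
    return [doc_id for doc_id, text in documents.items() if terms <= tokenize(text)]

# Invented sample collection, keyed by document id.
documents = {
    "d1": "antelope herds of the savanna",
    "d2": "the antelope as a document",
    "d3": "cataloging a museum document",
}

print(match("antelope document", documents))  # prints ['d2']
```

Here the function returns document ids rather than the documents themselves, which corresponds to retrieving information about the document so that it may be found. Under this rule a document either matches or it does not; there is no notion of one document matching better than another.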

Meadow, Boyce, and Kraft (2000) emphasize the importance of selectivity - only some of the documents in the collection are relevant to a given request, and not all the relevant documents are equally relevant. One of the important features of an information retrieval system is its ability to aggregate and discriminate - to find all and only those documents which will meet the information need. Understanding the factors which improve and degrade this ability is another important dimension of this class.
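"All and only" is commonly measured with two numbers that the paragraph above does not name: recall (how much of the relevant material was found) and precision (how much of what was found is relevant). The sketch below assumes that someone has already judged which documents are relevant; the document ids are invented.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search, given ids retrieved and ids judged relevant."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Invented example: the system returned four documents; three in the collection are relevant.
precision, recall = precision_recall(["d1", "d2", "d3", "d4"], ["d2", "d4", "d5"])
print(precision)  # prints 0.5, since 2 of the 4 retrieved documents are relevant
print(recall)     # 2 of the 3 relevant documents were retrieved
```

A system that retrieves everything gets perfect recall and poor precision, and a system that retrieves one sure hit gets the reverse. These set-based measures also treat relevance as yes or no, so they do not capture the point that relevant documents are not all equally relevant.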

So - an information retrieval system includes people needing information, information stored in some kind of document, and processes for getting the two together. One of the primary jobs of information professionals is designing these systems.

Key concepts


Buckland, Michael K. (1997). "What is a 'document'?" Journal of the American Society for Information Science 48, 804-9.

Meadow, Charles T., Boyce, Bert R., and Kraft, Donald H. (2000). Text Information Retrieval Systems, 2nd ed. San Diego: Academic Press.