San José State University


Introduction to the Class

Information retrieval is central to library and information science. Whether you are designing new services or new systems, understanding how people seek information and how it can be provided to them is fundamental to professional work. This class concentrates primarily on systems rather than services, but the two are interrelated in many ways. You'll find the concepts from this class in various guises throughout the LIS curriculum. A diagram of the relationships between 202 and other classes shows one view of how important information retrieval is in our curriculum. LIBR 202 has three central concerns: the organization of information, the design of systems that organize information, and the retrieval of information from those systems. Essential to all three is an understanding of people's information needs and of how people go about seeking information to fulfill those needs.

How information can be retrieved depends entirely on how it has been stored. In fact, this field used to be called "information storage and retrieval," but over time the name has been shortened, and you'll now often find both activities referred to simply as "information retrieval." About half of this class takes an intensive look at information storage; the other half covers how to retrieve information given the way it has been stored. The relationship works in reverse as well: understanding how people retrieve information should inform our decisions about storage.

We'll begin the class by defining information retrieval, examining how IR systems work, and surveying types of search engines and the different ways they can match documents with queries.
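As a small illustration of what "matching documents with queries" can mean, here is a minimal Python sketch contrasting exact-match Boolean retrieval (a document matches only if it contains every query term) with a simple best-match ranking. The toy collection and the function names `build_index`, `boolean_and`, and `ranked` are hypothetical, and counting matched terms is a deliberately crude stand-in for the weighting schemes real search engines use.

```python
# Hypothetical toy collection: document IDs mapped to their text.
docs = {
    1: "information retrieval systems and search engines",
    2: "library services and information seeking behavior",
    3: "search engines rank documents by relevance",
}

def build_index(docs):
    """Build an inverted index: each term maps to the set of doc IDs containing it."""
    index = {}
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_id)
    return index

def boolean_and(index, query):
    """Exact-match Boolean AND: return IDs of documents containing every query term."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result

def ranked(index, query):
    """Best-match alternative: order documents by how many query terms they contain
    (ties broken by document ID); a hypothetical, simplified scoring rule."""
    scores = {}
    for term in query.lower().split():
        for doc_id in index.get(term, set()):
            scores[doc_id] = scores.get(doc_id, 0) + 1
    return sorted(scores, key=lambda d: (-scores[d], d))

index = build_index(docs)
print(boolean_and(index, "search engines"))  # {1, 3}
print(ranked(index, "information search"))   # [1, 2, 3]
```

The Boolean query returns an unordered set and nothing outside it, while the ranked version returns every document that shares at least one term, best matches first.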

With that basic understanding of what an information retrieval system can do, we'll turn to what goes into IR systems: the actual documents and document surrogates that are stored there to be retrieved. Surrogates must describe both the documents themselves and their content.

The goal of describing documents is twofold: to provide enough information to distinguish unique intellectual works (for instance, to separate the book "Organizational Behavior" by Philip Applewhite from the book "Organizational Behavior" by Robert Vecchio, or to distinguish a lantern slide of the first building on a university campus from a photograph of the same building), and to aggregate related intellectual works (for instance, the paperback and hardcover editions of a book, or an audiotape of a particular book and the movie based on it). The rules for description have to be surprisingly precise to accomplish these goals, and we'll look at several examples.

Representation of content or subject is even more complex than description. Content can be represented by natural language (terms derived directly from titles, abstracts, or the documents themselves), or by various controlled vocabularies (descriptors or subject headings), or by a classification system. Each of these three techniques has advantages and disadvantages, which we'll discuss. They can be used singly or in combination. Again, we'll look at several examples of each of these ways of representing content.

LIBR 202 places a great deal of emphasis on how people search existing information systems. Both library science and information science have strong research traditions in information-seeking behavior. The more we understand about how people look for information and why they are looking for it, the better we can assist them, both in providing professional service and in designing systems. Using Dialog, various OPACs, and Internet search engines as models, we'll discuss analytic and browsing strategies. An important dimension of this discussion is how the systems support or hinder these strategies.

Another major topic will be how to evaluate information retrieval systems. We'll talk a little about the history of evaluation in information retrieval and how it is done today. Precision and recall, both calculated from judgments of relevance, are the best measures we have for assessing how well a system performs.
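To make the two measures concrete: precision is the fraction of retrieved documents that are relevant, and recall is the fraction of relevant documents that were retrieved. The short Python sketch below computes both from sets of document IDs. The function name `precision_recall` and the sample relevance judgments are hypothetical, and returning 0.0 when a set is empty is a convention chosen here, not a universal rule.

```python
def precision_recall(retrieved, relevant):
    """Compute precision and recall from sets of document IDs.

    precision = |retrieved ∩ relevant| / |retrieved|
    recall    = |retrieved ∩ relevant| / |relevant|
    Empty sets yield 0.0 (a convention assumed for this sketch).
    """
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical search: 10 documents retrieved; 8 documents in the
# collection were judged relevant, 4 of which were retrieved.
retrieved = set(range(1, 11))               # documents 1..10
relevant = {2, 4, 6, 8, 12, 14, 16, 18}     # relevance judgments
p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.40 recall=0.50
```

Note that recall requires knowing every relevant document in the collection, which is one reason relevance judgments are so central to evaluation.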

We'll also begin to consider how IR systems can be improved. The design of a system's interface determines how well a user will be able to exploit the functionality the system provides. Using what we know about search strategies and information-seeking behavior, we can identify specific elements of good interface design. New developments in IR and advanced system features will also be introduced. This topic brings together the material on systems and their functionality with the material on how people use them.

