LIBR 246-06
LIBR 246-14
Information Technology Tools and Applications – Advanced Topic: Web/Text/Data Mining for LIS
Fall 2013 Greensheet

Dr. Geoffrey Z. Liu
Other contact information: telephone: (408) 924-2467
Office location: Clark Hall 418L, SJSU Campus
Office Hours: Email, Blackboard IM, and in-person by appointment

D2L & Blackboard Collaborate (Elluminate) Information: Class activities will be carried out in both the D2L and Blackboard Collaborate (formerly Elluminate) systems. Students will be automatically enrolled in the D2L class based on MySJSU registration and therefore do not need to enroll manually. The course site will be available one week before the start of the semester.

Course Description

This course is an introduction to the emerging field of text/data/web mining from the practical perspective of library and information services. Emphasis is on basic theoretical concepts, major approaches and techniques, the mining process, and the use of software tools. Web content mining, text mining, and data mining are covered in sequence. Students learn theoretical basics and practical skills by conducting topical research in groups and completing one multi-stage individual project.

The main objective is to learn the skills needed to use existing text/data mining software tools to extract information and knowledge for improving information services and developing intelligence.

Course Requirements

Students' performance in this class will be evaluated on the basis of the following assignments:

  • One group project of topical research  -- SLO #1, #2, #3
    • Presentation (5%)
    • Written report (10%)
  • Individual project of web/text/data mining (60%, 10% for each stage) -- SLO #4, #5
    • Installation/configuration of RapidMiner
    • Extraction of web content (text)
    • Preprocessing of texts
    • Document clustering
    • Statistical analysis (correlation & ANOVA)
    • Advanced data mining
  • Online discussion -- SLO #1, #2
    • Self introduction (2%)
    • Leading/moderating one thread (4%)
    • Participation by responding (4%)
  • Final exam (15%)

At the start of the semester, students will be randomly assigned to groups (ideally of five members) to complete the topical research project. Each group will collaborate on a different topic, reading and summarizing book chapters and reviewing recent scholarly publications. As part of the topical research, students will hold online discussions within their groups to share findings and comment on related issues. Each member will lead/moderate the online discussion of one aspect of the group's research topic and will be responsible for writing the corresponding sections of the group's research report. Each group will be scheduled to give a presentation in Blackboard Collaborate at a different point in the semester. Attendance at presentation sessions is optional; sessions will be recorded and archived for later viewing.

Detailed instructions for each stage of the individual mining project will be provided on the D2L class site, along with other course materials.

All written work should be professionally prepared following APA style and the established conventions of academic writing, free of grammatical errors and spelling mistakes. Tutorials, assistance, and resources for improving academic writing skills are available at the Writing Resources Center.

It is the students' responsibility to submit their work electronically and to keep an electronic copy of it until the final grade is issued.

Course Calendar
(The brief calendar below is tentative. A final, more detailed version will be provided on the D2L class site.)


Weekly Topic | Assignment/Tasks (all due by 11:30 pm)
Orientation | Blackboard Collaborate
Introduction | Post self-introduction
Survey of text/data mining software | IP-1: Installing RapidMiner
Overview of RapidMiner | GP: Choice of topic
Web mining (content and structure) |
NLP and preprocessing of texts | IP-2: Web page extraction
Statistical text mining |
Entity-relation extraction | IP-3: Text preprocessing
Data set: attributes vs. variables |
Statistical data analysis | IP-4: Document clustering
Data transformation & modeling |
Transaction log analysis | IP-5: Statistical analysis
Reporting/presentation of findings |
Competitive intelligence | IP-6: Data mining
Ethical and social issues |
Final exam (online) |

IP = individual project stage; GP = group project.


Students' work will be evaluated according to the following specific criteria:

  • Basic content as required (70%);
  • Originality and creativity (20%);
  • Quality of writing (format, rigor, and clarity) (10%).

Letter grades will be assigned to all assignments; the final exam is scored on a 100-point scale. The standard SJSU SLIS Grading Scale will be used to convert letter grades to points, and vice versa, to calculate a proportionate point total for the final grade. No extra credit is offered for additional work to make up for a missed assignment.

Late submissions will not be accepted unless appropriate documentation of a legitimate cause for the delay is provided. Requests for a deadline extension will be treated the same as requests for an Incomplete, in accordance with university/school policy.

Software Requirement

  • Microsoft Excel (version 2009 or later, included in Microsoft Office)
  • RapidMiner (open-source application, free to install)

Course Workload Expectations

Success in this course is based on the expectation that students will spend, for each unit of credit, a minimum of forty-five hours over the length of the course (normally 3 hours per unit per week with 1 of the hours used for lecture) for instruction or preparation/studying or course related activities including but not limited to internships, labs, clinical practica. Other course structures will have equivalent workload expectations as described in the syllabus.

Instructional time may include but is not limited to:
Working on posted modules or lessons prepared by the instructor; discussion forum interactions with the instructor and/or other students; making presentations and getting feedback from the instructor; attending office hours or other synchronous sessions with the instructor.

Student time outside of class:
In any seven-day period, a student is expected to be academically engaged through submitting an academic assignment; taking an exam or an interactive tutorial, or computer-assisted instruction; building websites, blogs, databases, social media presentations; attending a study group; contributing to an academic online discussion; writing papers; reading articles; conducting research; engaging in small group work.

Course Prerequisites

LIBR 202. Other prerequisites may be added depending on content.

Student Learning Outcomes

Upon successful completion of the course, students will be able to:

  1. Define the basic syntax of coding PHP programs.
  2. Use HTML forms with PHP.
  3. Use standard PHP functions and be able to write their own custom functions.
  4. Demonstrate a basic understanding of MySQL and be able to use it in a PHP program.
  5. Build and maintain a small Web application.
  6. Identify the features of JavaScript.
  7. Incorporate JavaScript/JScript into HTML using current versions of popular Internet browsers.
  8. Identify the types of data and operators in JavaScript.
  9. Incorporate variables in JavaScript.
  10. Declare functions and add objects along with their methods and properties in JavaScript.
  11. Manage JavaScript events by using event handlers.
  12. Create interactive HTML forms by applying the properties and methods of form objects and elements.
  13. Implement loops in JavaScript programs.
  14. Manipulate the images displayed on a Web page.
  15. Identify how information about a Web page is stored.
  16. Identify the functions of cookie attributes; create and manipulate cookies.
  17. Identify information provided by navigator object properties.
  18. Manipulate strings using String object methods.

Core Competencies (Program Learning Outcomes)

LIBR 246 supports the following core competencies:

  1. E: Design, query and evaluate information retrieval systems.
  2. G: Demonstrate understanding of basic principles and standards involved in organizing information, including classification, cataloging, metadata, or other systems.
  3. H: Demonstrate proficiency in identifying, using, and evaluating current and emerging information and communication technologies.


Required Textbooks:

  • Han, J., & Kamber, M. (2005). Data mining: Concepts and techniques (2nd ed.). Morgan Kaufmann. Available through Amazon: 1558609016

Recommended Textbooks:

  • Feldman, R., & Sanger, J. (2006). The text mining handbook: Advanced approaches in analyzing unstructured data. Cambridge University Press. Available through Amazon: 0521836573
  • Hofmann, M., & Klinkenberg, R. (Eds.). (2013). RapidMiner: Data mining use cases and business analytics applications. Chapman & Hall/CRC. Available through Amazon: 1482205491
  • Markov, Z., & Larose, D. (2007). Data mining the web: Uncovering patterns in web content, structure, and usage. John Wiley & Sons, Inc. Available through Amazon: 0471666556

Grading Scale

The standard SJSU School of Information Grading Scale is utilized for all iSchool courses:

97 to 100 A
94 to 96 A minus
91 to 93 B plus
88 to 90 B
85 to 87 B minus
82 to 84 C plus
79 to 81 C
76 to 78 C minus
73 to 75 D plus
70 to 72 D
67 to 69 D minus
Below 67 F
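As a rough illustration of the arithmetic only (not an official iSchool tool), the percentage weights from the Course Requirements list can be combined with the score bands above. This is a minimal sketch that assumes each component has already been converted to a 0 to 100 score (the letter-to-point conversion is not spelled out in this greensheet) and that a fractional total falls into the lower band rather than being rounded up; `WEIGHTS`, `SCALE`, `weighted_total`, and `letter_grade` are hypothetical names.

```python
# Hypothetical sketch: weights (in percent) from the Course Requirements list.
# Assumes every component score is already on a 0-100 scale.
WEIGHTS = {
    "group_presentation": 5,
    "group_report": 10,
    "individual_project": 60,  # six stages, 10% each
    "self_introduction": 2,
    "moderating_thread": 4,
    "participation": 4,
    "final_exam": 15,
}

# (lower bound, letter) pairs from the Grading Scale, highest band first.
SCALE = [
    (97, "A"), (94, "A-"), (91, "B+"), (88, "B"), (85, "B-"),
    (82, "C+"), (79, "C"), (76, "C-"), (73, "D+"), (70, "D"), (67, "D-"),
]


def weighted_total(scores):
    """Combine 0-100 component scores into a 0-100 course total."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS) / 100


def letter_grade(total):
    """Return the first band whose lower bound the total reaches.

    Assumption: fractional totals are not rounded, so 96.5 maps to A-.
    """
    for lower, letter in SCALE:
        if total >= lower:
            return letter
    return "F"  # below 67
```

For example, scoring 90 on every component except an 80 on the final exam gives a total of 88.5, which `letter_grade` places in the B band (88 to 90).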


In order to provide consistent guidelines for assessment for graduate level work in the School, these terms are applied to letter grades:

  • C represents Adequate work; a grade of "C" counts for credit for the course;
  • B represents Good work; a grade of "B" clearly meets the standards for graduate level work;
    For core courses in the MLIS program (not MARA), namely INFO 200, INFO 202, and INFO 204, the iSchool requires that students earn a B in the course. If the grade is lower than B (B- or lower) after the first attempt, you will be placed on administrative probation and must repeat the class the following semester. If, on the second attempt, you do not pass the class with a grade of B or better (not B-, but B), you will be disqualified.
  • A represents Exceptional work; a grade of "A" will be assigned for outstanding work only.

Students are advised that it is their responsibility to maintain a 3.0 Grade Point Average (GPA).

University Policies

General Expectations, Rights and Responsibilities of the Student

As members of the academic community, students accept both the rights and responsibilities incumbent upon all members of the institution. Students are encouraged to familiarize themselves with SJSU's policies and practices pertaining to the procedures to follow if and when questions or concerns about a class arise. See University Policy S90-5. More detailed information on a variety of related topics is available in the SJSU catalog. In general, it is recommended that students begin by seeking clarification or discussing concerns with their instructor. If such a conversation is not possible, or if it does not address the issue, it is recommended that the student contact the Department Chair as a next step.

Dropping and Adding

Students are responsible for understanding the policies and procedures about add/drop, grade forgiveness, etc. Refer to the Catalog Policies section of the current semester's catalog. Add/drop deadlines can be found in the current academic year calendars document on the Academic Calendars webpage. See also the Late Drop Policy. Students should be aware of the current deadlines and penalties for dropping classes.

Information about the latest changes and news is available at the Advising Hub.

Consent for Recording of Class and Public Sharing of Instructor Material

University Policy S12-7 requires students to obtain the instructor's permission to record the course, and requires that the following items be included in the syllabus:

  • "Common courtesy and professional behavior dictate that you notify someone when you are recording him/her. You must obtain the instructor's permission to make audio or video recordings in this class. Such permission allows the recordings to be used for your private, study purposes only. The recordings are the intellectual property of the instructor; you have not been given any rights to reproduce or distribute the material."
    • It is suggested that the syllabus include the instructor's process for granting permission, whether in writing or orally and whether for the whole semester or on a class by class basis.
    • In classes where active participation of students or guests may be on the recording, permission of those students or guests should be obtained as well.
  • "Course material developed by the instructor is the intellectual property of the instructor and cannot be shared publicly without his/her approval. You may not publicly share or upload instructor generated material for this course such as exam questions, lecture notes, or homework solutions without instructor consent."

Academic integrity

Your commitment, as a student, to learning is evidenced by your enrollment at San Jose State University. The University Academic Integrity Policy F15-7 requires you to be honest in all your academic course work. Faculty members are required to report all infractions to the Office of Student Conduct and Ethical Development. See the Student Conduct and Ethical Development website for more information.

Campus Policy in Compliance with the Americans with Disabilities Act

If you need course adaptations or accommodations because of a disability, or if you need to make special arrangements in case the building must be evacuated, please make an appointment with me as soon as possible, or see me during office hours. Presidential Directive 97-03 requires that students with disabilities requesting accommodations register with the Accessible Education Center (AEC) to establish a record of their disability.
