Information Technology Tools and Applications – Advanced Topic: Web/Text/Data Mining for LIS
Spring 2015 Greensheet
Text/Data Mining Tools
Canvas Information: Courses will be available beginning January 22nd, 12:01am PST unless you are taking an intensive or a one unit or two unit class that starts on a different day. In that case the class will open at 12:01am PST on the first day that the class meets.
You will be enrolled into the Canvas site automatically.
Class activities will be carried out in both the Canvas and Blackboard Collaborate (previously known as Elluminate) systems.
This course is an introduction to web, text, and data mining from the perspective of library and information services. Students will learn basic concepts, approaches, and practical techniques of web/text/data mining by conducting group topical research and completing one individual mining project (consisting of ten stages/exercises) with RapidMiner (free data mining software with extensions for web/text processing).
Students' performance in this class will be evaluated on the basis of the following assignments:
- Self introduction (5%) -- SLO #1, #2
- Group topical research (25%) -- SLO #1, #2, #3
  - Online discussion in group forums (10%) -- SLO #1, #2
    - Leading/moderating a thread
    - Participation by responding
  - Written group report (15%)
- Individual mining project (10 stages, 7% each) -- SLO #4, #5
  - RapidMiner installation/configuration | Excel data import/export
  - Reading/writing text files | Correlation analysis
  - Web crawling and text content extraction | Preprocessing of texts
  - Document clustering | Constructing a random set for model training
  - Building | testing a data model (NN/Bayesian classifier)
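As a rough illustration of how the weights listed above combine into a course total, the following Python sketch may help. The dictionary keys and the `final_score` function are hypothetical names chosen for this example, and treating a missing assignment as zero is an assumption; the instructor's actual calculation may differ.

```python
# Assignment weights (percent of the final grade) as listed above.
# Key names are hypothetical labels chosen for this sketch.
WEIGHTS = {
    "self_introduction": 5,
    "group_discussion": 10,   # part of the 25% group topical research
    "group_report": 15,       # part of the 25% group topical research
}
# Ten individual mining project stages at 7% each.
WEIGHTS.update({f"project_stage_{i}": 7 for i in range(1, 11)})


def final_score(scores):
    """Return the weighted course total on a 0-100 scale.

    `scores` maps assignment names to points out of 100.
    Assumption: an assignment missing from `scores` counts as 0.
    """
    total = sum(WEIGHTS[name] * scores.get(name, 0.0) for name in WEIGHTS)
    return total / 100.0


if __name__ == "__main__":
    # A student scoring 90 on every assignment.
    example = {name: 90.0 for name in WEIGHTS}
    print(sum(WEIGHTS.values()), final_score(example))
```

The weights sum to 100, so the ten project stages together account for 70% of the course total.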
At the start of the semester, students will be randomly assigned to groups (ideally of five) to complete the group topical research and conduct the related online discussion. Each group will choose a topic of focus and work collaboratively to produce a final group report by reading and summarizing book chapters and reviewing recent scholarly publications. As part of the topical research project, students will participate in group online discussion to share findings and to comment on related issues. Each member will lead/moderate one thread in the group's forum and be responsible for writing up the related sections of the group's final report.
Detailed instructions on each of the ten stages of the individual mining project will be provided in the Canvas class site, along with other course materials.
For both the group topical research and the individual mining project, students are encouraged (but not required) to record presentations in Blackboard Collaborate. The recordings may later be used as competency evidence in the LIBR 289 e-portfolio or as a demonstration of skills in job interviews.
All written work should be professionally prepared following the APA editorial style and established conventions of academic writing, free of grammatical errors and spelling mistakes. Tutorials, assistance, and resources for improving academic writing skills are available at the Writing Resources Center.
It is the students' responsibility to submit and maintain electronic versions of their work until the final grade is issued.
(Brief and tentative. A final/extensive version will be provided in the Canvas class site.)
(All due by 11:30pm)
|Orientation (Blackboard Collaborate)||9:00-11:30am|
GD-1: Self introduction start
Lab Session 1: Introduction to RapidMiner
(All together, 9:00-11:45am)
Ex-1 completed, Ex-2 start
GR/GD: Proposing topic for group research
|Survey of software tools||
Ex-2 due, Ex-3 start
GD-1 done, GR/GD topic approval
|Statistical data analysis (for mining)||
Ex-3 due, Ex-4 start
GD-2: Concepts/principles start
|Web mining (content and structure)||
Ex-4 due, Ex-5 start
Lab Session** 2: Web crawling & extraction
|GD-2 done, GD-3: Current developments start|
|NLP and text preprocessing||Ex-5 due, Ex-6 start|
Lab Session 3: Text preprocessing
|GD-3 done, GD-4: application in LIS start|
|Statistical text mining||Ex-6 due, Ex-7 start|
Lab Session 4: Document clustering
|GD-4 done, GD-5: ethical/social issues start|
|Data transformation & modeling||
Ex-7 due, Ex-8 start
Lab Session 5: Bayesian/NN data modeling
Ex-8 due, Ex-9 start
|Transaction log analysis (web analytics)||GR: written report due|
|Competitive intelligence||Ex-9 due, Ex-10 start|
|Conclusion: Student recording of presentations (optional)||Ex-10 due|
* Ex-#: Stage of individual mining project (exercises); GR: Group topical research; GD-#: group online discussion.
** Lab sessions 2-5 are individual meetings for one-on-one tutoring. Students will sign up for hourly time slots on the specified day(s), either by marking a shared Google doc or via a Doodle poll.
Students' work will be evaluated according to the following specific criteria:
- Basic content as required (70%);
- Originality and creativity (20%);
- Quality of writing (format, rigor, and clarity) (10%).
Letter grades will be assigned to all assignments, and online discussion will be graded quantitatively based on class/group average counts of postings. The Standard SJSU SLIS Grading Scale will be used to translate letter grades to points and vice versa to calculate a proportionate total of points for the final grade. No extra credit is offered for additional work to make up for missed assignments.
Late submissions will not be accepted unless appropriate documentation of a legitimate cause for the delay (such as an unexpected medical emergency and/or personal hardship) is provided. Requests for deadline extensions will be treated the same as requests for an Incomplete, in accordance with university/school policy.
- Microsoft Excel (version 2009 or later, included in Microsoft Office)
- Screen capture tool (such as the Windows Snipping Tool)
- RapidMiner (free, open-source application)
Course Workload Expectations
Success in this course is based on the expectation that students will spend, for each unit of credit, a minimum of forty-five hours over the length of the course (normally 3 hours per unit per week with 1 of the hours used for lecture) for instruction or preparation/studying or course related activities including but not limited to internships, labs, clinical practica. Other course structures will have equivalent workload expectations as described in the syllabus.
Instructional time may include but is not limited to:
Working on posted modules or lessons prepared by the instructor; discussion forum interactions with the instructor and/or other students; making presentations and getting feedback from the instructor; attending office hours or other synchronous sessions with the instructor.
Student time outside of class:
In any seven-day period, a student is expected to be academically engaged through submitting an academic assignment; taking an exam or an interactive tutorial, or computer-assisted instruction; building websites, blogs, databases, social media presentations; attending a study group; contributing to an academic online discussion; writing papers; reading articles; conducting research; engaging in small group work.
LIBR 202; other prerequisites may be added depending on content.
Student Learning Outcomes
Upon successful completion of the course, students will be able to:
- Describe key concepts and terminologies in the field of text, data, and Web mining.
- Describe major approaches and techniques of text, data, and Web mining.
- Discuss the roles of text, data, and Web mining in intelligence and knowledge discovery.
- Use a software tool to accomplish a reasonably sophisticated text, data, or Web mining task.
- Integrate, summarize, and report the findings of mining research.
Core Competencies (Program Learning Outcomes)
LIBR 246 supports the following core competencies:
- E Design, query and evaluate information retrieval systems.
- G Demonstrate understanding of basic principles and standards involved in organizing information, including classification, cataloging, metadata, or other systems.
- H Demonstrate proficiency in identifying, using, and evaluating current and emerging information and communication technologies.
- I Use service concepts, principles, and techniques to connect individuals or groups with accurate, relevant, and appropriate information.
- Feldman, R., & Sanger, J. (2006). The text mining handbook: Advanced approaches in analyzing unstructured data. New York: Cambridge University Press. Available through Amazon: 0521836573
- Han, J., & Kamber, M. (2005). Data mining: Concepts and techniques (2nd ed.). San Francisco, CA: Morgan Kaufmann. Available through Amazon: 1558609016
- Hofmann, M., & Klinkenberg, R. (Eds.) (2013). RapidMiner: Data mining use cases and business analytics applications. Boca Raton, FL: Chapman & Hall/CRC. Available through Amazon: 1482205491
- Markov, Z., & Larose, D. (2007). Data mining the web: Uncovering patterns in web content, structure, and usage. Hoboken, NJ: John Wiley & Sons, Inc. Available through Amazon: 0471666556
- Zanasi, A. (2007). Text mining and its applications to intelligence, CRM and knowledge management. Billerica, MA: WIT Press. Available through Amazon: 1845641310
The standard SJSU School of Information Grading Scale is utilized for all iSchool courses:
|97 to 100||A|
|94 to 96||A minus|
|91 to 93||B plus|
|88 to 90||B|
|85 to 87||B minus|
|82 to 84||C plus|
|79 to 81||C|
|76 to 78||C minus|
|73 to 75||D plus|
|70 to 72||D|
|67 to 69||D minus|
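The scale above can be read as a simple lookup from points to letter grade. The Python sketch below is one way to express it; the `letter_grade` function name is hypothetical, and two points are assumptions not stated in the table: scores below 67 are returned as "F", and fractional scores fall into the band whose lower bound they have reached (so 96.5 is treated as "A minus").

```python
# Lower bound of each band, taken from the grading scale table above,
# ordered from highest to lowest.
GRADE_SCALE = [
    (97, "A"),
    (94, "A minus"),
    (91, "B plus"),
    (88, "B"),
    (85, "B minus"),
    (82, "C plus"),
    (79, "C"),
    (76, "C minus"),
    (73, "D plus"),
    (70, "D"),
    (67, "D minus"),
]


def letter_grade(points):
    """Translate a 0-100 point total into a letter grade."""
    if not 0 <= points <= 100:
        raise ValueError("points must be between 0 and 100")
    for lower_bound, letter in GRADE_SCALE:
        if points >= lower_bound:
            return letter
    # Assumption: the table lists no band below 67, so "F" is used here.
    return "F"


if __name__ == "__main__":
    for pts in (100, 93, 88, 67, 50):
        print(pts, letter_grade(pts))
```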
In order to provide consistent guidelines for assessment for graduate level work in the School, these terms are applied to letter grades:
- C represents Adequate work; a grade of "C" counts for credit for the course;
- B represents Good work; a grade of "B" clearly meets the standards for graduate level work;
For core courses in the MLIS program (not MARA), namely INFO 200, INFO 202, and INFO 204, the iSchool requires that students earn a B in the course. If the grade is less than B (B- or lower) after the first attempt, you will be placed on administrative probation and must repeat the class the following semester. If, on the second attempt, you do not pass the class with a grade of B or better (not B- but B), you will be disqualified.
- A represents Exceptional work; a grade of "A" will be assigned for outstanding work only.
Students are advised that it is their responsibility to maintain a 3.0 Grade Point Average (GPA).
General Expectations, Rights and Responsibilities of the Student
As members of the academic community, students accept both the rights and responsibilities incumbent upon all members of the institution. Students are encouraged to familiarize themselves with SJSU's policies and practices pertaining to the procedures to follow if and when questions or concerns about a class arise. See University Policy S90-5 at http://www.sjsu.edu/senate/docs/S90-5.pdf. More detailed information on a variety of related topics is available in the SJSU catalog at http://info.sjsu.edu/web-dbgen/catalog/departments/LIS.html. In general, it is recommended that students begin by seeking clarification or discussing concerns with their instructor. If such conversation is not possible, or if it does not serve to address the issue, it is recommended that the student contact the Department Chair as a next step.
Dropping and Adding
Students are responsible for understanding the policies and procedures about add/drop, grade forgiveness, etc. Refer to the current semester's Catalog Policies section at http://info.sjsu.edu/static/catalog/policies.html. Add/drop deadlines can be found on the current academic year calendars document on the Academic Calendars webpage at http://www.sjsu.edu/provost/services/academic_calendars/. The Late Drop Policy is available at http://www.sjsu.edu/aars/policies/latedrops/policy/. Students should be aware of the current deadlines and penalties for dropping classes.
Information about the latest changes and news is available at the Advising Hub at http://www.sjsu.edu/advising/.
Consent for Recording of Class and Public Sharing of Instructor Material
University Policy S12-7, http://www.sjsu.edu/senate/docs/S12-7.pdf, requires students to obtain instructor's permission to record the course and the following items to be included in the syllabus:
- "Common courtesy and professional behavior dictate that you notify someone when you are recording him/her. You must obtain the instructor's permission to make audio or video recordings in this class. Such permission allows the recordings to be used for your private, study purposes only. The recordings are the intellectual property of the instructor; you have not been given any rights to reproduce or distribute the material."
- It is suggested that the syllabus include the instructor's process for granting permission, whether in writing or orally and whether for the whole semester or on a class by class basis.
- In classes where active participation of students or guests may be on the recording, permission of those students or guests should be obtained as well.
- "Course material developed by the instructor is the intellectual property of the instructor and cannot be shared publicly without his/her approval. You may not publicly share or upload instructor generated material for this course such as exam questions, lecture notes, or homework solutions without instructor consent."
Your commitment, as a student, to learning is evidenced by your enrollment at San Jose State University. The University Academic Integrity Policy F15-7 at http://www.sjsu.edu/senate/docs/F15-7.pdf requires you to be honest in all your academic course work. Faculty members are required to report all infractions to the office of Student Conduct and Ethical Development. The Student Conduct and Ethical Development website is available at http://www.sjsu.edu/studentconduct/.
Campus Policy in Compliance with the Americans with Disabilities Act
If you need course adaptations or accommodations because of a disability, or if you need to make special arrangements in case the building must be evacuated, please make an appointment with me as soon as possible, or see me during office hours. Presidential Directive 97-03 at http://www.sjsu.edu/president/docs/directives/PD_1997-03.pdf requires that students with disabilities requesting accommodations must register with the Accessible Education Center (AEC) at http://www.sjsu.edu/aec to establish a record of their disability.