ISDA 140-10
Big Data Analytics and Management
Spring 2022 Syllabus
Dr. Prateek Jain
E-mail
Office: Virtual
Phone: (408) 924-2490
Office Hours: Virtual office hours; telephone and in-person advising by appointment.
Canvas Information: Courses will be available beginning January 26, 6 a.m. PT, unless you are taking an intensive, one-unit, or two-unit class that starts on a different day. In that case, the class will open on the first day that the class meets.
You will be enrolled in the Canvas site automatically.
Course Description
Big data technologies and trends, including large-scale databases, the map-reduce paradigm, big data mining, and big data platforms. The focus is on hands-on learning with tools such as Splunk and Scala/Hadoop, applied to real-world data sets.
Assignments
This schedule and the related dates, readings, and assignments are tentative and subject to change with fair notice. Any changes will be announced in a timely manner in class and on the course website in the Canvas Learning Management System. Students are responsible for consulting the most up-to-date and detailed version of the reading material and syllabus, which will be posted on the course website.
Detailed information on assignments, including the research paper grading rubric, will be provided on the course Canvas site.
Other Relevant Information:
Students are encouraged and expected to participate in discussions on Canvas and to work in small groups on a final project.
Course Schedule
(tentative; subject to minor changes with fair notice)
Week/Dates |
Topic(s) |
Instructional Content (Lectures, Reading, PPTs, Links, etc.) |
Assignments |
1 01/26/22-01/30/22 |
Introduction to Big Data
- What is data?
- What is big data?
- Why we need big data analytics
- Presence of big data in various industries, e.g., social media and sports analytics |
- Lecture video
- Furht, B., & Villanustre, F. (2016). Introduction to big data. In Big data technologies and applications (pp. 3-11). Springer, Cham.
- Big Data Analytics: What It Is, How It Works, Benefits, and Challenges |
Canvas introduction |
2 01/31/22-02/06/22 |
Big Data Visualization
- Overview and applications |
- Lecture video
- Ware, C. (2020). Information Visualization: Perception for Design (4th ed.). Morgan Kaufmann. |
Project teams assigned; Canvas discussion post due 02/05/22 |
3 02/07/22- |
Big Database Management
- Data lakes, data fabric, and cloud |
- Lecture video
- Introduction to cloud computing
- Miloslavskaya, N., & Tolstoy, A. (2016). Big Data, Fast Data and Data Lake Concepts. Procedia Computer Science, 88, 300-305. |
Assignment 1 and
|
4 02/14/22-02/20/22 |
Big Database Management
- NoSQL databases |
- Lecture video
- Atzeni, P., Bugiotti, F., Cabibbo, L., & Torlone, R. (2020). Data modeling in the NoSQL world. Computer Standards & Interfaces, 67, 103149.
- Ben Hamadou, H., Ghozzi, F., Péninou, A., & Teste, O. (2019). Schema-independent querying for heterogeneous collections in NoSQL document stores. Information Systems, 85, 48-67. |
Canvas discussion |
5 02/21/22-02/27/22 |
MapReduce and Distributed Computing
- Hadoop and HDFS |
- Lecture video
- Bengfort, B., & Kim, J. (2016). Data analytics with Hadoop: An introduction for data scientists. O'Reilly Media.
- Li, F., Ooi, B. C., Özsu, M. T., & Wu, S. (2014). Distributed data management using MapReduce. ACM Computing Surveys (CSUR), 46(3), 1-42. |
Canvas discussion |
6 02/28/22-03/06/22 |
Hadoop Ecosystem
- Spark/Scala |
- Lecture video
- Karau, H., Konwinski, A., Wendell, P., & Zaharia, M. (2015). Learning Spark. O'Reilly Media. |
Project proposal |
7 03/07/22-03/13/22 |
Big Data Mining
- Big data mining theories and applications
- Using Weka for big data mining |
- Lecture
- Wu, X., Zhu, X., Wu, G.-Q., & Ding, W. (2014). Data mining with big data. IEEE Transactions on Knowledge and Data Engineering, 26(1), 97-107.
- Witten, I. H., Frank, E., & Hall, M. A. (2011). Data Mining: Practical Machine Learning Tools and Techniques. Amsterdam: Morgan Kaufmann. ISBN: 978-0-12-374856-0 (recommended; please access the ebook through the library) |
|
8 03/14/22-03/20/22 |
Big Data Platforms
- AWS and Google Cloud
- Migrating to the cloud
- Serverless, scaling, and automation |
- Lecture
- Venema, W. (2020). Building Serverless Applications with Google Cloud Run: A Real-World Guide to Building Production-Ready Services. Retrieved from https://books.google.com/books?id=Q_E8zQEACAAJ (recommended; please access the ebook through the library) |
Assignment 2
|
9 03/21/22-03/27/22 |
Introduction to Splunk |
Lecture |
|
10 03/28/22-04/03/22 |
Splunk – Advanced features |
Lecture |
Assignment 3 |
11 04/04/22-04/10/22 |
Machine Learning
|
Lecture
|
|
12 04/11/22-04/17/22 |
Machine Learning Part II |
Lecture
|
Project |
13 04/18/22-04/24/22 |
Advanced PyTorch
- Advanced deployment scenarios |
Lecture |
|
14 04/25/22-05/01/22 |
Big Data Management Case Studies: Opportunities & Challenges |
Lecture
|
Assignment 4
|
15 05/02/22-05/08/22 |
Big Data Governance and Security |
Lecture
|
Case Study Due
|
16 05/09/22-05/16/22 |
Project Presentation |
|
Final Project |
Course Workload Expectations
Success in this course is based on the expectation that students will spend, for each unit of credit, a minimum of forty-five hours over the length of the course (normally 3 hours per unit per week, with 1 of those hours used for lecture) on instruction, preparation/studying, or course-related activities, including but not limited to internships, labs, and clinical practica. Other course structures will have equivalent workload expectations as described in the syllabus.
Instructional time may include but is not limited to:
Working on posted modules or lessons prepared by the instructor; discussion forum interactions with the instructor and/or other students; making presentations and getting feedback from the instructor; attending office hours or other synchronous sessions with the instructor.
Student time outside of class:
In any seven-day period, a student is expected to be academically engaged through submitting an academic assignment; taking an exam, an interactive tutorial, or computer-assisted instruction; building websites, blogs, databases, or social media presentations; attending a study group; contributing to an academic online discussion; writing papers; reading articles; conducting research; or engaging in small group work.
Course Prerequisites
ISDA 111.
Course Learning Outcomes
Upon successful completion of the course, students will be able to:
- Describe and explain the main technologies and trends in big data work, specifically data visualization, large-scale database management, map-reduce paradigm, big data mining, and big data platforms.
- Demonstrate proficiency in using Splunk/Scala/Hadoop to solve big data analytical problems.
- Interpret and communicate big data analysis and visualization results appropriately, effectively and accurately.
- Discuss, articulate and compare various big data managerial issues (e.g., data governance and privacy) in different organizational settings.
- Make informed and strategic decisions in the presence of large-scale data sets.
- Information Science and Data Analytics SLO 2: Identify and apply appropriate data management strategies, carry out relevant analyses, interpret and apply the results to inform understanding and solve specific problems in context; and communicate analysis and visualization results appropriately to a diverse non-technical audience.
- Information Science and Data Analytics SLO 3: Demonstrate proficiency in the computing skills needed to support information and data analysis, including prototype building and scripting for working with structured data (data that is clearly defined and easily searchable) and unstructured data (data that is not easily searchable such as email, audio, video, and social media postings).
SLOs & PLOs
ISDA 140 supports:
- SLO 2 and SLO 3 support the following Information and Data Science Program Learning Outcomes (PLOs):
- PLO 1: Apply information and data science concepts and methods by thinking critically and creatively to conceptualize and solve real world problems.
- PLO 2: Demonstrate an understanding of the data lifecycle, including data curation and stewardship, distributed computing, and the data pipeline ecosystem.
Textbooks
There are no textbooks for this course.
Grading Scale
The standard SJSU School of Information Grading Scale is utilized for all iSchool courses:
97 to 100 | A |
94 to 96 | A minus |
91 to 93 | B plus |
88 to 90 | B |
85 to 87 | B minus |
82 to 84 | C plus |
79 to 81 | C |
76 to 78 | C minus |
73 to 75 | D plus |
70 to 72 | D |
67 to 69 | D minus |
Below 67 | F |
In order to provide consistent guidelines for assessment for graduate level work in the School, these terms are applied to letter grades:
- C represents Adequate work; a grade of "C" counts for credit for the course;
- B represents Good work; a grade of "B" clearly meets the standards for graduate-level work, or for undergraduate work (for BS-ISDA);
For core courses in the MLIS program (INFO 200, INFO 202, and INFO 204; this does not apply to MARA, Informatics, or BS-ISDA), the iSchool requires that students earn a B in the course. If the grade is less than B (B- or lower) after the first attempt, you will be placed on administrative probation. You must repeat the class if you wish to stay in the program. If, on the second attempt, you do not pass the class with a grade of B or better (not B- but B), you will be disqualified.
- A represents Exceptional work; a grade of "A" will be assigned for outstanding work only.
Graduate students are advised that it is their responsibility to maintain a 3.0 Grade Point Average (GPA). Undergraduates must maintain a 2.0 Grade Point Average (GPA).
University Policies
Per University Policy S16-9, university-wide policy information relevant to all courses, such as academic integrity and accommodations, will be available on the Office of Graduate and Undergraduate Programs' Syllabus Information web page: https://www.sjsu.edu/curriculum/courses/syllabus-info.php. Make sure to visit this page and familiarize yourself with these university policies and resources.
To request an accommodation in a class, please contact the Accessible Education Center and register via the MyAEC portal.