ISDA 140-10
Big Data Analytics and Management
Spring 2022 Syllabus

Dr. Prateek Jain
E-mail
Office: Virtual
Phone: (408) 924-2490
Office Hours: Virtual office hours. Telephone and in-person advising by appointment



Canvas Information: Courses will be available beginning 26th January, 6 am PT unless you are taking an intensive or a one-unit or two-unit class that starts on a different day. In that case, the class will open on the first day that the class meets.

You will be enrolled in the Canvas site automatically.

Course Description

Big data technologies and trends, including large-scale databases, the map-reduce paradigm, big data mining, and big data platforms. Focus is on hands-on learning with tools such as Splunk and Scala/Hadoop as applied to real-world data sets.

Assignments

This schedule and the related dates, readings, and assignments are tentative and subject to change with fair notice. Any changes will be announced in due time in class and on the course's website in the Canvas Learning Management System. Students are obliged to consult the most updated and detailed version of the reading material and syllabus, which will be posted on the course's website.

Detailed information on assignments, including the research paper grading rubric, will be provided on the course Canvas site.

Other Relevant Information:

Students are encouraged and expected to engage in discussions facilitated via Canvas and also in small groups for a final project.

Course Schedule

(tentative; subject to minor changes with fair notice)

Columns: Week/Module; Dates; Topic(s); Instructional Content (Lectures, Reading, PPTs, Links, etc.); Assignments, Assessments, and Due Dates; CLOs

Week 1: 01/26/22-01/30/22
Topic(s): Introduction to Big Data
- What is Data
- What is Big Data
- Why we need big data analytics
- Presence of big data in various industries, e.g. social media, sports analytics
Instructional Content:
- Lecture video
- Furht, B., & Villanustre, F. (2016). Introduction to big data. In Big data technologies and applications (pp. 3-11). Springer, Cham.
- Big Data Analytics: What It Is, How It Works, Benefits, And Challenges
Assignments/Due Dates: Canvas introduction post due 01/29/2022
CLOs: 1-5

Week 2: 01/31/22-02/06/22
Topic(s): Big Data Visualization
- Overview and applications
Instructional Content:
- Lecture video
- Ware, C. (2020). Information Visualization: Perception for Design (4th edition). Morgan Kaufmann.
Assignments/Due Dates: Project teams assigned; Canvas discussion post due 02/05/2022
CLOs: 1, 3

Week 3: 02/07/22-02/13/22
Topic(s): Big Database Management
- Data lakes/data fabric/cloud
Instructional Content:
- Lecture video
- Introduction to data lakes
- Introduction to data fabric
- Introduction to cloud computing
- Miloslavskaya, Natalia, & Tolstoy, Alexander. (2016). Big Data, Fast Data and Data Lake Concepts. Procedia Computer Science, 88, 300-305.
Assignments/Due Dates: Assignment 1 and report due 02/12/2022
CLOs: 1-5

Week 4: 02/14/22-02/20/22
Topic(s): Big Database Management
- NoSQL databases
Instructional Content:
- Lecture video
- Atzeni, Paolo, Bugiotti, Francesca, Cabibbo, Luca, & Torlone, Riccardo. (2020). Data modeling in the NoSQL world. Computer Standards & Interfaces, 67, 103149.
- Ben Hamadou, Hamdi, Ghozzi, Faiza, Péninou, André, & Teste, Olivier. (2019). Schema-independent querying for heterogeneous collections in NoSQL document stores. Information Systems, 85, 48-67.
Assignments/Due Dates: Canvas discussion due 02/19/2022
CLOs: 1-5

Week 5: 02/21/22-02/27/22
Topic(s): MapReduce and Distributed Computing
- Hadoop & HDFS
Instructional Content:
- Lecture video
- Bengfort, B., & Kim, J. (2016). Data analytics with Hadoop: An introduction for data scientists. O'Reilly Media.
- Li, Feng, Ooi, Beng Chin, Özsu, M. Tamer, & Wu, Sai. (2014). Distributed data management using MapReduce. ACM Computing Surveys (CSUR), 46(3), 1-42.
Assignments/Due Dates: Canvas discussion post due 02/26/2022
CLOs: 1-5

Week 6: 02/28/22-03/06/22
Topic(s): Hadoop Ecosystem
- Spark/Scala
Instructional Content:
- Lecture video
- Introduction to Scala
- Karau, Holden, Konwinski, Andy, Wendell, Patrick, & Zaharia, Matei. (2015). Learning Spark. O'Reilly.
Assignments/Due Dates: Project proposal due 03/05/2022
CLOs: 1-5

Week 7: 03/07/22-03/13/22
Topic(s): Big Data Mining
- Big data mining theories and applications
- Using Weka for big data mining
Instructional Content:
- Lecture
- Wu, X., Zhu, X., Wu, G.-Q., & Ding, W. (2014). Data mining with big data. IEEE Transactions on Knowledge and Data Engineering, 26(1), 97-107.
- Witten, I. H., Frank, E., & Hall, M. A. (2011). Data Mining: Practical Machine Learning Tools and Techniques. Amsterdam: Morgan Kaufmann. ISBN: 978-0-12-374856-0 (Recommended, please access ebook from library)

Week 8: 03/14/22-03/20/22
Topic(s): Big Data Platforms
- AWS and Google Cloud
- Migrating to the cloud
- Serverless, scaling, and automation
Instructional Content:
- Lecture
- Google Cloud tutorial
- AWS Educate
- Venema, W. (2020). Building Serverless Applications with Google Cloud Run: A Real-World Guide to Building Production-Ready Services. Retrieved from https://books.google.com/books?id=Q_E8zQEACAAJ (Recommended, please access ebook from library)
Assignments/Due Dates: Assignment 2 due 03/19/2022
CLOs: 1-5

Week 9: 03/21/22-03/27/22
Topic(s): Introduction to Splunk
Instructional Content:
- Lecture

Week 10: 03/28/22-04/03/22
Topic(s): Splunk: Advanced features
Instructional Content:
- Lecture
Assignments/Due Dates: Assignment 3 due 04/03/2022
CLOs: 1, 2

Week 11: 04/04/22-04/10/22
Topic(s): Machine Learning
- Traditional methods, neural networks
- Browser-based and device-based models with PyTorch
- Data pipelines with PyTorch data services
Instructional Content:
- Lecture

Week 12: 04/11/22-04/17/22
Topic(s): Machine Learning Part II
Instructional Content:
- Lecture
Assignments/Due Dates: Project Milestone II due
CLOs: 1-5

Week 13: 04/18/22-04/24/22
Topic(s): Advanced PyTorch
- Advanced deployment scenarios
Instructional Content:
- Lecture

Week 14: 04/25/22-05/01/22
Topic(s): Big Data Management Case Studies: Opportunities & Challenges
Instructional Content:
- Lecture
- Khan, N., Yaqoob, I., Hashem, I. A. T., Inayat, Z., Mahmoud Ali, W. K., Alam, M., … Gani, A. (2014). Big Data: Survey, Technologies, Opportunities, and Challenges. The Scientific World Journal, 2014, 712826. doi:10.1155/2014/712826
Assignments/Due Dates: Assignment 4 due 04/23/2022
CLOs: 1-5

Week 15: 05/02/22-05/08/22
Topic(s): Big Data Governance and Security
Instructional Content:
- Lecture
- Morabito, V. (2015). Big Data Governance. In V. Morabito (Ed.), Big Data and Analytics: Strategic and Organizational Impacts (pp. 83-104). doi:10.1007/978-3-319-10665-6_5
- Kacha, L., & Zitouni, A. (2018). An Overview on Data Security in Cloud Computing. ArXiv, abs/1812.09053.
Assignments/Due Dates: Case Study due
CLOs: 1-5

Week 16: 05/09/22-05/16/22
Topic(s): Project Presentation
Assignments/Due Dates: Final Project Report and Demo due
CLOs: 1-5

Course Workload Expectations

Success in this course is based on the expectation that students will spend, for each unit of credit, a minimum of forty-five hours over the length of the course (normally 3 hours per unit per week with 1 of the hours used for lecture) for instruction or preparation/studying or course related activities including but not limited to internships, labs, clinical practica. Other course structures will have equivalent workload expectations as described in the syllabus.
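
As a rough illustration of the arithmetic in this expectation, the sketch below computes the minimum total hours and the typical weekly hours for a given number of units. The function names and the example unit count are hypothetical; this syllabus does not state how many units the course carries.

```python
# Illustrative sketch of the workload arithmetic; not an official calculator.
# The number of units is an input because the syllabus does not state it.
MIN_HOURS_PER_UNIT = 45     # minimum hours per unit over the whole course
WEEKLY_HOURS_PER_UNIT = 3   # "normally 3 hours per unit per week"


def minimum_course_hours(units: int) -> int:
    """Minimum total hours expected over the length of the course."""
    return MIN_HOURS_PER_UNIT * units


def typical_weekly_hours(units: int) -> int:
    """Typical weekly hours, including 1 lecture hour per unit."""
    return WEEKLY_HOURS_PER_UNIT * units


# Hypothetical example: a 3-unit course
print(minimum_course_hours(3), typical_weekly_hours(3))
```

For a hypothetical 3-unit course this works out to at least 135 hours in total, or about 9 hours per week.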

Instructional time may include but is not limited to:
Working on posted modules or lessons prepared by the instructor; discussion forum interactions with the instructor and/or other students; making presentations and getting feedback from the instructor; attending office hours or other synchronous sessions with the instructor.

Student time outside of class:
In any seven-day period, a student is expected to be academically engaged through submitting an academic assignment; taking an exam, an interactive tutorial, or computer-assisted instruction; building websites, blogs, databases, or social media presentations; attending a study group; contributing to an academic online discussion; writing papers; reading articles; conducting research; or engaging in small group work.

Course Prerequisites

ISDA 140 has no prerequisite requirements.

Course Learning Outcomes

Upon successful completion of the course, students will be able to:

  1. Describe and explain the main technologies and trends in big data work, specifically data visualization, large-scale database management, map-reduce paradigm, big data mining, and big data platforms.
  2. Demonstrate proficiency in using Splunk/Scala/Hadoop to solve big data analytical problems.
  3. Interpret and communicate big data analysis and visualization results appropriately, effectively and accurately.
  4. Discuss, articulate and compare various big data managerial issues (e.g., data governance and privacy) in different organizational settings.
  5. Make informed and strategic decisions with the presence of large-scale data sets.

SLOs & PLOs

ISDA 140 supports:

  1. Information Science and Data Analytics SLO 2: Identify and apply appropriate data management strategies, carry out relevant analyses, interpret and apply the results to inform understanding and solve specific problems in context; and communicate analysis and visualization results appropriately to a diverse non-technical audience.
  2. Information Science and Data Analytics SLO 3: Demonstrate proficiency in the computing skills needed to support information and data analysis, including prototype building and scripting for working with structured data (data that is clearly defined and easily searchable) and unstructured data (data that is not easily searchable such as email, audio, video, and social media postings).

SLO 2 and SLO 3 support the following Information and Data Science Program Learning Outcomes (PLOs):

  1. PLO 1: Apply information and data science concepts and methods by thinking critically and creatively to conceptualize and solve real world problems.
  2. PLO 2: Demonstrate an understanding of the data lifecycle, including data curation and stewardship, distributed computing, and the data pipeline ecosystem.

Textbooks

No Textbooks For This Course.

Grading Scale

The standard SJSU School of Information Grading Scale is utilized for all iSchool courses:

97 to 100 A
94 to 96 A minus
91 to 93 B plus
88 to 90 B
85 to 87 B minus
82 to 84 C plus
79 to 81 C
76 to 78 C minus
73 to 75 D plus
70 to 72 D
67 to 69 D minus
Below 67 F
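
To check where a numeric score falls on this scale, the ranges above can be written as a simple lookup. This is an illustrative sketch only: the function name `letter_grade` is hypothetical, and it assumes scores are whole-number percentages, since the scale does not say how fractional scores are rounded.

```python
# Illustrative sketch of the grading scale above; not an official calculator.
# Assumption: scores are whole-number percentages from 0 to 100.
GRADE_FLOORS = [
    (97, "A"), (94, "A minus"), (91, "B plus"), (88, "B"),
    (85, "B minus"), (82, "C plus"), (79, "C"), (76, "C minus"),
    (73, "D plus"), (70, "D"), (67, "D minus"),
]


def letter_grade(score: int) -> str:
    """Return the letter grade whose range contains the given score."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # Floors are listed from highest to lowest, so the first match wins.
    for floor, letter in GRADE_FLOORS:
        if score >= floor:
            return letter
    return "F"


print(letter_grade(92))
```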

 

In order to provide consistent guidelines for assessment for graduate level work in the School, these terms are applied to letter grades:

  • C represents Adequate work; a grade of "C" counts for credit for the course;
  • B represents Good work; a grade of "B" clearly meets the standards for graduate level work or undergraduate (for BS-ISDA);
    For core courses in the MLIS program (not MARA, Informatics, BS-ISDA) — INFO 200, INFO 202, INFO 204 — the iSchool requires that students earn a B in the course. If the grade is less than B (B- or lower) after the first attempt you will be placed on administrative probation. You must repeat the class if you wish to stay in the program. If - on the second attempt - you do not pass the class with a grade of B or better (not B- but B) you will be disqualified.
  • A represents Exceptional work; a grade of "A" will be assigned for outstanding work only.

Graduate Students are advised that it is their responsibility to maintain a 3.0 Grade Point Average (GPA). Undergraduates must maintain a 2.0 Grade Point Average (GPA).

University Policies

Per University Policy S16-9, university-wide policy information relevant to all courses, such as academic integrity and accommodations, is available on the Office of Graduate and Undergraduate Programs' Syllabus Information web page at: https://www.sjsu.edu/curriculum/courses/syllabus-info.php. Make sure to visit this page and to review and be familiar with these university policies and resources.

In order to request an accommodation in a class please contact the Accessible Education Center and register via the MyAEC portal.
