Analytical Methodologies for Big Data (KSE526)
Jae-Gil Lee (Office: E2 2203, Phone: x 1617, E-mail: jaegil(at)kaist.ac.kr)
- Time and Place:
10:30 a.m. ~ 12:00 p.m. Monday and Wednesday, E2 1122
- Facebook Group: 2018 KSE526
- Course Summary:
This course covers basic analytical methodologies for big data, which are vital to data scientists. Big data analytics calls for extending existing algorithms so that they scale to big data. The instructor will first teach MapReduce, a representative framework for processing big data, and then the methodologies for adapting data mining algorithms to MapReduce. Students will also learn how to implement those algorithms using Apache Hadoop. As a result, students will acquire the basic capabilities needed to design algorithms for big data analytics.
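As a rough illustration of the MapReduce model mentioned in the summary, the sketch below counts words in three phases (map, shuffle, reduce) using only the standard Java library. It is not course material and does not use the Apache Hadoop API; the class name WordCountSketch and all method names are hypothetical, and everything runs in memory on a single machine.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical single-machine simulation of the MapReduce phases (not Hadoop code).
public class WordCountSketch {

    // Map phase: emit a (word, 1) pair for every word in one input line.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by key (sorted by key via TreeMap).
    public static Map<String, List<Integer>> shuffle(List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> grouped = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            grouped.computeIfAbsent(pair.getKey(), k -> new ArrayList<>()).add(pair.getValue());
        }
        return grouped;
    }

    // Reduce phase: sum the counts collected for one key.
    public static int reduce(String key, List<Integer> values) {
        int sum = 0;
        for (int value : values) {
            sum += value;
        }
        return sum;
    }

    // Driver: run map over every line, shuffle, then reduce each key group.
    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        Map<String, Integer> result = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(emitted).entrySet()) {
            result.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("big data needs big tools", "data mining on big data");
        System.out.println(wordCount(lines));
    }
}
```

Running main on the two sample lines prints {big=3, data=3, mining=1, needs=1, on=1, tools=1}. In a real framework the map and reduce calls would run in parallel on different machines, which is why each phase here only sees its own input.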
- Prerequisites:
- Data Mining and Knowledge Discovery (KSE525) or an equivalent course
- Java programming skills (this is a programming-intensive course)
- Textbooks:
- Main textbook: Tom White, Hadoop: The Definitive Guide, 4th edition, O'Reilly, 2015.
- Main textbook: Mahmoud Parsian, Data Algorithms: Recipes for Scaling Up with Hadoop and Spark, O'Reilly, 2015.
- Auxiliary textbook: Donald Miner and Adam Shook, MapReduce Design Patterns: Building Effective Algorithms and Analytics for Hadoop and Other Systems, O'Reilly, 2013.
- Course Requirements:
- Three programming assignments
- Grading Policy:
- Midterm exam: 30% (class time on October 17)
- Final exam: 30% (class time on December 12)
- Programming assignments: 30% (late submission penalty: 20%)
- Class activity (quizzes and/or impromptu questions): 10%
- Class participation: optional (1 point is deducted for each absence after the third)
- Teaching Materials:
- Introduction (August 27, 29): download
- MapReduce Basics (September 3, 5): download
- Hadoop: The Definitive Guide, Ch. 1, 2 (September 10, 12): download
- Hadoop: The Definitive Guide, Ch. 3, 4, 5 (September 17, 19, October 1): part1 download part2 download
- Hadoop: The Definitive Guide, Ch. 6, 7 (October 8, 10): part1 download practice part2 download
- Hadoop: The Definitive Guide, Ch. 8 (October 22): download
- Hadoop: The Definitive Guide, Ch. 10 & Microsoft Azure (October 24): practice
- MapReduce Data Mining Algorithms (October 29, 31, November 5): part1 download part2 download
- MapReduce Graph Algorithms (November 7, 12): part1 download part2 download
- MapReduce Design Patterns (November 14, 19): download
- Deep Learning and Big Data Part 1 (November 21, 26): download
- Deep Learning and Big Data Part 2 (December 3, 5): download
- Programming Assignments:
- Video Lectures:
Students enrolled in this course can watch the recorded video lectures, which are available here.
- Teaching Assistants:
- Hwanjun Song (E-mail: songhwanjun(at)kaist.ac.kr)
- Sejin Kim (E-mail: ksj614(at)kaist.ac.kr)
- Minseok Kim (E-mail: minseokkim(at)kaist.ac.kr)
- Syllabus: download