Data Mining and Knowledge Discovery (KSE525)
Jae-Gil Lee (Office: E2 2203, Phone: x 1617, E-mail: jaegil(at)kaist.ac.kr)
- Time and Place:
10:30 a.m. ~ 12:00 p.m. Monday and Wednesday, E2 1501
- Facebook Group: 2018 KSE525
- Course Summary:
Data mining plays an important role in discovering useful knowledge from huge amounts of data. This course teaches the basic concepts and methods of data mining. Specifically, it covers frequent patterns and associations, classification and prediction, and cluster analysis. The main goal is to give students broad knowledge of various data mining methods without confining them to a specific domain. The course is intended as a prerequisite for advanced data mining courses and is thus suitable for both undergraduate and graduate students. Students will come to understand how data mining can be exploited to discover useful knowledge.
- Main textbook: Jiawei Han, Micheline Kamber, and Jian Pei, Data Mining: Concepts and Techniques, 3rd edition, Morgan Kaufmann, 2011.
- Auxiliary textbook: Luis Torgo, Data Mining with R: Learning with Case Studies, Chapman and Hall/CRC, 2010.
- Auxiliary textbook: Norman S. Matloff, The Art of R Programming: A Tour of Statistical Software Design, No Starch Press, 2011.
- Auxiliary textbook: John D. Kelleher, Brian Mac Namee, and Aoife D'Arcy, Fundamentals of Machine Learning for Predictive Data Analytics, MIT Press, 2015.
- Grading Policy:
- Midterm exam: 30%
- Final exam: 40%
- Assignments: 20% (late-submission penalty: 20%)
- Project: 10%
- Class participation: optional (1 point is deducted for each absence beyond the third)
- Teaching Materials:
- Introduction (February 26, 28): download
- Getting to Know Your Data (March 5, 7): download
- Data Preprocessing (March 12, 14): download
- Association Analysis (Basic) (March 19, 21, 26): download
- Association Analysis (Advanced) (March 28): TBD
- Basics of R Programming (April 2, 4): TBD
- Classification (Decision Tree) (April 9, 11, 23): TBD
- Classification (Bayes, Lazy) (April 25): TBD
- Classification (SVM, Ensemble) (April 30, May 2, 9): TBD
- Clustering (Basic 1) (May 14, 16): TBD
- Clustering (Basic 2) (May 21, 23, 28): TBD
- Case Studies Using R (May 30): TBD
- Conclusion (June 4): TBD
- Additional Materials: full list
- Online Lectures:
Students enrolled in this course can watch the recorded video lectures, which are available here.
- Assignment #1 (due: 10:30 a.m. on March 28)
- Assignment #2: TBD (will be released on March 28)
- Assignment #3: TBD (will be released on April 25)
- Assignment #4: TBD (will be released on May 9)
- Project: TBD (will be released on May 23)
- Teaching Assistants:
- Hwanjun Song (E-mail: songhwanjun(at)kaist.ac.kr)
- Susik Yoon (E-mail: susikyoon(at)kaist.ac.kr)
- Sejin Kim (E-mail: ksj614(at)kaist.ac.kr)
- Syllabus: download