Data Mining and Knowledge Discovery (KSE525)

Spring 2017


  1. Instructor:
    Jae-Gil Lee (Office: E2 2203, Phone: x 1617, E-mail: jaegil(at)kaist.ac.kr)
  2. Time and Place:
    10:30 a.m. ~ 12:00 p.m. Monday and Wednesday, E2 1501
  3. Facebook Group: 2017 KSE525 
  4. Course Summary:
    Data mining plays an important role in discovering useful knowledge from huge amounts of data. This course teaches the basic concepts and methods of data mining, specifically frequent patterns and associations, classification and prediction, and cluster analysis. The main goal of this course is to give students a broad knowledge of various data mining methods without confining it to a specific domain. The course is intended as a prerequisite for advanced data mining courses and is therefore suitable for both undergraduate and graduate students. Students will learn how data mining can be exploited to discover useful knowledge.
  5. Textbooks:
    • Main textbook: Jiawei Han, Micheline Kamber, and Jian Pei, Data Mining: Concepts and Techniques, 3rd edition, Morgan Kaufmann, 2011.
    • Auxiliary textbook: Luis Torgo, Data Mining with R: Learning with Case Studies, Chapman and Hall/CRC, 2010.
    • Auxiliary textbook: Norman S. Matloff, The Art of R Programming: A Tour of Statistical Software Design, No Starch Press, 2011.
    • Auxiliary textbook: John D. Kelleher, Brian Mac Namee, and Aoife D'Arcy, Fundamentals of Machine Learning for Predictive Data Analytics, MIT Press, 2015.
  6. Grading Policy:
    • Midterm exam: 30%
    • Final exam: 40%
    • Assignments: 20% (late penalty: 20%)
    • Project: 10%
    • Class participation: optional (1 point is deducted for each absence beyond the third)
  7. Teaching Materials:
    • Introduction (February 27, March 6): download
    • Getting to Know Your Data (March 8, 13): download
    • Data Preprocessing (March 15, 20): download
    • Association Analysis (Basic) (March 22, 27, 29): download
    • Association Analysis (Advanced) (April 3): download
    • Basics of R Programming (April 5, 10): download
    • Classification (Decision Tree) (April 12, 24, 26): download
    • Classification (Bayes, Lazy) (May 1): download
    • Classification (SVM, Ensemble) (May 8, 10, 15): download
    • Clustering (Basic 1) (May 17, 22): download
    • Clustering (Basic 2) (May 24, 29, 31): download
    • Case Studies Using R (June 5): download
    • Conclusion (June 7): download
  8. Additional Materials: full list
  9. Online Lectures:
    Students enrolled in this course can watch the recorded video lectures, which are available here.
  10. Assignments:
  11. Project: view
  12. Teaching Assistants:
    • Sundong Kim (E-mail: sundong.kim(at)kaist.ac.kr)
    • Hwanjun Song (E-mail: songhwanjun(at)kaist.ac.kr)
    • Susik Yoon (E-mail: susikyoon(at)kaist.ac.kr)
  13. Syllabus: download