MATH 5800-031 (MATH5671): Financial Data Mining and Big Data Analytics


The financial industry in particular, and most companies in general, have been accumulating data for years and mining it to drive their financial decisions. Data sets are now extremely large and keep growing exponentially, to the point where they become prohibitive for traditional machine learning and data mining methods.

This course introduces standard machine learning and data mining algorithms with financial applications, and prepares students to work with large data sets. In the first part, students learn data pre-processing, sampling, statistical tests, and standard machine learning and data mining algorithms, such as logistic regression, decision trees, neural networks, SVM, Naïve Bayes, KNN, K-means clustering, agglomerative clustering, association rules, content-based filtering, and collaborative filtering, to unlock value in financial data. An overview of deep learning is also given. Assignments involve developing code in MATLAB, R, and Python. In the second part, relational database systems are introduced using MySQL. Students learn Big Data tools such as MapReduce, Spark, Hive, and Pig, and develop machine learning models using the MLlib library.

The course uses external educational materials such as books, code, videos, and websites to support teaching and accelerate student learning. Students are expected to spend a significant amount of time outside the classroom digesting the assigned materials.

This 3-credit course may be taken as an elective for Applied Financial Math and for Actuarial Science. It does not, however, assume previous knowledge of finance.


No textbook is required. Lecture slides and handouts are provided by the instructor.

Reference Books

  1. P.-N. Tan, M. Steinbach, V. Kumar, Introduction to Data Mining.
  2. C. M. Bishop, Pattern Recognition and Machine Learning, Springer Science+Business Media, LLC, 2006.
  3. R. O. Duda, P. E. Hart, D. G. Stork, Pattern Classification, 2nd Edition, Wiley-Interscience, 2001.
  4. S. Theodoridis, K. Koutroumbas, Pattern Recognition, 4th Edition.
  5. J. Han, M. Kamber, J. Pei, Data Mining: Concepts and Techniques.
  6. D. T. Larose, C. D. Larose, Discovering Knowledge in Data: An Introduction to Data Mining.
  7. G. Dougherty, Pattern Recognition and Classification: An Introduction.
  8. B. Kovalerchuk, E. Vityaev, Data Mining in Finance: Advances in Relational and Hybrid Methods (The Springer International Series in Engineering and Computer Science).
  9. S. Chakrabarti, Data Mining: Know It All.