Data Mining and Visualization (DMV)

Description

This course deals with the discovery of structure in data. In many cases, data analysis starts with data delivered "as is", without much additional information and without a well-defined task in mind such as "separate class A from class B". The sheer volume of most datasets rules out manual analysis, so data exploration approaches must be used. The course first introduces the basics of data visualization, presenting the main families of visualizations and the "dos and don'ts" of good data visualization. The rest of the course is dedicated to a thorough presentation of pattern mining, a subfield of data mining. The goal of pattern mining is to find regularities in data, revealing the large and small structures that organize it. Pattern mining approaches explore a huge combinatorial space and therefore rely on elegant algorithms to complete in a reasonable amount of time. They also have to cope with the underlying data, borrowing on the one hand techniques from databases and systems for efficient data access, and on the other hand techniques from statistics and information theory to select the most relevant patterns to show.

Keywords

Pattern mining, frequent itemset mining, frequent subgraph mining, pattern set mining, generic and declarative data mining, interactive data mining, data visualization

Prerequisites

Master's-level (Computer Science) knowledge of algorithmics; an interest in data exploration.

Contents

  • Introduction to Knowledge Discovery in Data
  • Data visualization
  • Basic pattern mining: frequent patterns in tables, sequences, graphs
  • Information-theoretic pattern set mining
  • Pattern mining in a supervised setting: discriminant pattern mining, subgroup discovery
  • Declarative and interactive pattern mining

Learning outcomes

  • Understand the main challenges of Knowledge Discovery in Data, and the process used to address them
  • Understand the main data visualization techniques, and when to apply each of them
  • Understand in which cases pattern mining can help discover knowledge in data, how the algorithms operate, and how to interpret the patterns found
  • Understand the notion of "pattern set", and how it captures the main structure of the data
  • Understand how subgroup discovery can uncover patterns in numerical data, and how to interpret its results

Teaching staff

Alexandre Termier (course coordinator), Peggy Cellier, Fernando Argelaguet