Data Mining N Essay

Submitted By QWERT02468
Words: 3142
Pages: 13

Data Mining:
Concepts and Techniques
(3rd ed.)

Jiawei Han, Micheline Kamber, and Jian Pei
University of Illinois at Urbana-Champaign &
Simon Fraser University
©2011 Han, Kamber & Pei. All rights reserved.
Adapted for CSE 347-447, Lecture 1b, Spring 2015


Introduction n Why Data Mining?

n

What Is Data Mining?

n

A Multi-Dimensional View of Data Mining

n

What Kind of Data Can Be Mined?

n

What Kinds of Patterns Can Be Mined?

n

What Technologies Are Used?

n

What Kind of Applications Are Targeted?

n

Major Issues in Data Mining

n

A Brief History of Data Mining and Data Mining Society

n

Summary
2

Why Data Mining? n The Explosive Growth of Data: from terabytes to petabytes n Data collection and data availability n Automated data collection tools, database systems, Web, computerized society

n

Major sources of abundant data n Business: Web, e-commerce, transactions, stocks, …

n

Science: Remote sensing, bioinformatics, scientific simulation, …

n

Society and everyone: news, digital cameras, YouTube

n

We are drowning in data, but starving for knowledge!

n

“Necessity is the mother of invention”—Data mining—Automated analysis of massive data sets
3

Evolution of Sciences: New Data Science Era

- Before 1600: Empirical science
- 1600-1950s: Theoretical science
  - Each discipline has grown a theoretical component. Theoretical models often motivate experiments and generalize our understanding.
- 1950s-1990s: Computational science
  - Over the last 50 years, most disciplines have grown a third, computational branch (e.g. empirical, theoretical, and computational ecology, or physics, or linguistics).
  - Computational science traditionally meant simulation. It grew out of our inability to find closed-form solutions for complex mathematical models.
- 1990-now: Data science
  - The flood of data from new scientific instruments and simulations
  - The ability to economically store and manage petabytes of data online
  - The Internet and computing Grid that make all these archives universally accessible
  - Scientific info. management, acquisition, organization, query, and visualization tasks scale almost linearly with data volumes
  - Data mining is a major new challenge!

Jim Gray and Alex Szalay, The World Wide Telescope: An Archetype for Online Science, Comm. ACM, 45(11): 50-54, Nov. 2002

Introduction n Why Data Mining?

n

What Is Data Mining?

n

A Multi-Dimensional View of Data Mining

n

What Kind of Data Can Be Mined?

n

What Kinds of Patterns Can Be Mined?

n

What Technologies Are Used?

n

What Kind of Applications Are Targeted?

n

Major Issues in Data Mining

n

A Brief History of Data Mining and Data Mining Society

n

Summary
5

What Is Data Mining? n Data mining (knowledge discovery from data) n Extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amount of data

n

n

Alternative names n n

Data mining: a misnomer?
Knowledge discovery (mining) in databases (KDD), knowledge extraction, data/pattern analysis, data archeology, data dredging, information harvesting, business intelligence, etc.

Watch out: Is everything “data mining”? n Simple search and query processing

n

(Deductive) expert systems
6
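The distinction between simple query processing and data mining can be made concrete with a small sketch. The toy transactions, the function names `query` and `discover_rules`, and the confidence threshold below are all illustrative assumptions, not part of the lecture. A query answers a question the user already knows to ask. The mining step searches for implicit "A -> B" regularities that nobody specified in advance.

```python
from itertools import permutations

# Hypothetical toy transactions (illustrative only, not from the lecture).
transactions = [
    {"beer", "diapers", "milk"},
    {"beer", "diapers"},
    {"bread", "milk"},
    {"beer", "bread", "diapers"},
]

def query(item):
    """Simple search: return the transactions that contain a given item."""
    return [t for t in transactions if item in t]

def discover_rules(min_confidence=0.9):
    """Mining: look for rules 'A -> B' that were not asked for explicitly.

    Confidence of A -> B is the fraction of transactions containing A
    that also contain B.
    """
    items = sorted(set().union(*transactions))
    rules = {}
    for a, b in permutations(items, 2):
        with_a = [t for t in transactions if a in t]
        if not with_a:
            continue
        confidence = sum(b in t for t in with_a) / len(with_a)
        if confidence >= min_confidence:
            rules[(a, b)] = confidence
    return rules

beer_rows = query("beer")   # a lookup: the user named the item
rules = discover_rules()    # a discovery: the data names the pattern
```

On this toy data the query returns the three transactions containing beer, while the mining step surfaces the rule pair beer -> diapers and diapers -> beer. A realistic miner would also apply a minimum support threshold and avoid enumerating every candidate pair.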

Knowledge Discovery (KDD) Process

- This is a view from typical database systems and data warehousing communities
- Data mining plays an essential role in the knowledge discovery process

[Figure: the KDD process. Databases → Data Cleaning and Data Integration → Data Warehouse → Selection → Task-relevant Data → Data Mining → Pattern Evaluation]
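The stages named on the KDD process slide (cleaning, integration, selection, mining, pattern evaluation) can be sketched as a chain of small functions. This is a minimal sketch under stated assumptions: the two toy "stores", the record layout, the function names, and the 0.6 support threshold are all hypothetical, and counting co-occurring item pairs merely stands in for whatever mining algorithm a real system would use.

```python
from collections import Counter
from itertools import combinations

# Hypothetical source databases: one duplicate record, one with missing data.
store_a = [
    {"id": 1, "items": ["milk", "bread"]},
    {"id": 2, "items": ["milk", "diapers", "beer"]},
    {"id": 2, "items": ["milk", "diapers", "beer"]},  # duplicate
]
store_b = [
    {"id": 3, "items": ["bread", "diapers", "beer"]},
    {"id": 4, "items": None},                          # missing value
    {"id": 5, "items": ["milk", "bread", "diapers", "beer"]},
]

def clean(records):
    """Data cleaning: drop records with missing items or a repeated id."""
    seen, out = set(), []
    for r in records:
        if r["items"] and r["id"] not in seen:
            seen.add(r["id"])
            out.append(r)
    return out

def integrate(*sources):
    """Data integration: merge cleaned sources into one 'warehouse' list."""
    return [r for src in sources for r in src]

def select(warehouse, min_items=2):
    """Selection: keep task-relevant data (baskets with enough items)."""
    return [set(r["items"]) for r in warehouse if len(r["items"]) >= min_items]

def mine_pairs(baskets):
    """Data mining: count how often each pair of items occurs together."""
    counts = Counter()
    for b in baskets:
        counts.update(combinations(sorted(b), 2))
    return counts

def evaluate(counts, n_baskets, min_support=0.6):
    """Pattern evaluation: keep pairs whose support meets the threshold."""
    return {p: c / n_baskets for p, c in counts.items()
            if c / n_baskets >= min_support}

warehouse = integrate(clean(store_a), clean(store_b))
baskets = select(warehouse)
patterns = evaluate(mine_pairs(baskets), len(baskets))
```

Each function maps to one box in the figure, so the output of one stage is the input of the next. On this toy data only the pair (beer, diapers) survives evaluation, with support 0.75.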

Example: A Web Mining Framework

- Web mining usually involves
  - Data cleaning
  - Data integration from multiple sources
  - Warehousing the data
  - Data cube construction
  - Data selection for data mining
  - Data mining
  - Presentation of the mining results
  - Patterns and