Data Mining, 2ed: Concepts, Models, Methods and Algorithms

Mehmed Kantardzic

ISBN: 9788126570348

552 pages

INR 629


This book reviews state-of-the-art methodologies and techniques for analyzing enormous quantities of raw data in high-dimensional data spaces to extract new information for decision making. Its goal is to provide a single introductory source, organized systematically, that guides readers through the analysis of large data sets by explaining the basic concepts, models, and methodologies developed in recent decades.

Preface to the Second Edition

Preface to the First Edition


1 Data-Mining Concepts

1.1 Introduction  

1.2 Data-Mining Roots  

1.3 Data-Mining Process  

1.4 Large Data Sets

1.5 Data Warehouses for Data Mining

1.6 Business Aspects of Data Mining: Why a Data-Mining Project Fails

1.7 Organization of This Book

1.8 Review Questions and Problems

1.9 References for Further Study


2 Preparing the Data

2.1 Representation of Raw Data

2.2 Characteristics of Raw Data

2.3 Transformation of Raw Data

2.4 Missing Data

2.5 Time-Dependent Data

2.6 Outlier Analysis

2.7 Review Questions and Problems

2.8 References for Further Study


3 Data Reduction

3.1 Dimensions of Large Data Sets

3.2 Feature Reduction

3.3 Relief Algorithm

3.4 Entropy Measure for Ranking Features

3.5 PCA

3.6 Value Reduction

3.7 Feature Discretization: ChiMerge Technique

3.8 Case Reduction

3.9 Review Questions and Problems

3.10 References for Further Study


4 Learning from Data

4.1 Learning Machine

4.2 SLT

4.3 Types of Learning Methods

4.4 Common Learning Tasks

4.5 SVMs

4.6 kNN: Nearest Neighbor Classifier

4.7 Model Selection versus Generalization

4.8 Model Estimation

4.9 90% Accuracy: Now What?

4.10 Review Questions and Problems

4.11 References for Further Study


5 Statistical Methods

5.1 Statistical Inference

5.2 Assessing Differences in Data Sets

5.3 Bayesian Inference

5.4 Predictive Regression


5.6 Logistic Regression

5.7 Log-Linear Models

5.8 LDA

5.9 Review Questions and Problems

5.10 References for Further Study


6 Decision Trees and Decision Rules

6.1 Decision Trees

6.2 C4.5 Algorithm: Generating a Decision Tree

6.3 Unknown Attribute Values

6.4 Pruning Decision Trees

6.5 C4.5 Algorithm: Generating Decision Rules

6.6 CART Algorithm & Gini Index

6.7 Limitations of Decision Trees and Decision Rules

6.8 Review Questions and Problems

6.9 References for Further Study


7 Artificial Neural Networks

7.1 Model of an Artificial Neuron

7.2 Architectures of ANNs

7.3 Learning Process

7.4 Learning Tasks Using ANNs

7.5 Multilayer Perceptrons (MLPs)

7.6 Competitive Networks and Competitive Learning

7.7 SOMs

7.8 Review Questions and Problems

7.9 References for Further Study


8 Ensemble Learning

8.1 Ensemble-Learning Methodologies

8.2 Combination Schemes for Multiple Learners

8.3 Bagging and Boosting

8.4 AdaBoost

8.5 Review Questions and Problems

8.6 References for Further Study


9 Cluster Analysis

9.1 Clustering Concepts

9.2 Similarity Measures

9.3 Agglomerative Hierarchical Clustering

9.4 Partitional Clustering

9.5 Incremental Clustering

9.6 DBSCAN Algorithm

9.7 BIRCH Algorithm

9.8 Clustering Validation

9.9 Review Questions and Problems

9.10 References for Further Study


10 Association Rules

10.1 Market-Basket Analysis

10.2 Algorithm Apriori

10.3 From Frequent Itemsets to Association Rules

10.4 Improving the Efficiency of the Apriori Algorithm

10.5 FP Growth Method

10.6 Associative-Classification Method

10.7 Multidimensional Association-Rules Mining

10.8 Review Questions and Problems

10.9 References for Further Study


11 Web Mining and Text Mining

11.1 Web Mining

11.2 Web Content, Structure and Usage Mining

11.3 HITS and LOGSOM Algorithms

11.4 Mining Path-Traversal Patterns

11.5 PageRank Algorithm

11.6 Text Mining

11.7 Latent Semantic Analysis (LSA)

11.8 Review Questions and Problems

11.9 References for Further Study


12 Advances in Data Mining

12.1 Graph Mining

12.2 Temporal Data Mining

12.3 Spatial Data Mining (SDM)

12.4 Distributed Data Mining (DDM)

12.5 Correlation Does Not Imply Causality

12.6 Privacy, Security, and Legal Aspects of Data Mining

12.7 Review Questions and Problems

12.8 References for Further Study


13 Genetic Algorithms

13.1 Fundamentals of GAs

13.2 Optimization Using GAs

13.3 A Simple Illustration of a GA

13.4 Schemata

13.5 TSP

13.6 Machine Learning Using GAs

13.7 GAs for Clustering

13.8 Review Questions and Problems

13.9 References for Further Study


14 Fuzzy Sets and Fuzzy Logic

14.1 Fuzzy Sets

14.2 Fuzzy-Set Operations

14.3 Extension Principle and Fuzzy Relations

14.4 Fuzzy Logic and Fuzzy Inference Systems

14.5 Multifactorial Evaluation

14.6 Extracting Fuzzy Models from Data

14.7 Data Mining and Fuzzy Sets

14.8 Review Questions and Problems

14.9 References for Further Study


15 Visualization Methods

15.1 Perception and Visualization

15.2 Scientific Visualization and Information Visualization

15.3 Parallel Coordinates

15.4 Radial Visualization

15.5 Visualization Using Self-Organizing Maps (SOMs)

15.6 Visualization Systems for Data Mining

15.7 Review Questions and Problems

15.8 References for Further Study  


Appendix A

A.1 Data-Mining Journals

A.2 Data-Mining Conferences

A.3 Data-Mining Forums / Blogs

A.4 Data Sets

A.5 Commercially and Publicly Available Tools

A.6 Web Site Links


Appendix B: Data-Mining Applications

B.1 Data Mining for Financial Data Analysis

B.2 Data Mining for the Telecommunications Industry

B.3 Data Mining for the Retail Industry

B.4 Data Mining in Health Care and Biomedical Research

B.5 Data Mining in Science and Engineering

B.6 Pitfalls of Data Mining