Data Analytics

Description

The goal of this book is to provide a smooth transition from traditional data analytics to recent algorithms for massive data analysis, including real-time analytics. It focuses on concepts, principles, and techniques applicable to any technology environment and industry, and establishes a baseline that can be enhanced further by additional real-world experience. This book aims to be a ready reckoner for both novices and professionals working in the field. A whole section is devoted to classical supervised methods of analysis such as regression, time series, and Bayesian analysis. Recent topics in clustering and data stream analysis are covered later. Emphasis is on newer tools like MapReduce and NoSQL. A comprehensive discussion of real-time analytics is included.

About the Authors

Dr. Radha Shankarmani is currently working as Professor and Head of the Department of Information Technology, Sardar Patel Institute of Technology, Mumbai. Prof. Shankarmani holds a Master's degree in Computer Science and Engineering from NIT, Trichy and a Bachelor's degree in Electronics and Communication Engineering from PSG College of Technology. She has more than 20 years of teaching experience and 4 years of industry experience, during which she has held designations such as Programmer, Software Engineer and Manager.

Dr. M. Vijayalakshmi is Professor of Information Technology at VES Institute of Technology, Mumbai. She is currently also the Vice Principal of the college. During her career at VESIT, she has served on the syllabus board of Mumbai University for BE in the Computer Science and Information Technology departments. She has made several contributions to national and international conferences in the fields of Data Mining and Big Data Analytics, and has conducted several workshops in data mining related fields. Her areas of research include Databases, Data Mining, Business Intelligence and designing new algorithms for Big Data Analytics.

Table of Contents

Preface
About the Authors
Syllabus
Contents

Chapter 1 Introduction to Big Data
1.1 Introduction
1.2 Big Data Characteristics
1.3 Types of Big Data
1.4 Challenges of Traditional Systems
1.5 Web Data
1.6 Evolution of Analytic Scalability
1.7 When to use OLTP, MPP and Hadoop?
1.8 Grid Computing
1.9 Cloud Computing
1.10 MapReduce
1.11 Fault Tolerance
1.12 Analytic Processes and Tools
1.13 Analysis Versus Reporting
1.14 Statistical Concepts

Chapter 2 Data Analysis
2.1 Introduction
2.2 Data Analysis
2.3 Importance of Data Analysis
2.4 Data Analytics Applications
2.5 Regression Modelling Techniques
2.6 Bayesian Modelling, Inference and Bayesian Networks
2.7 Support Vector Machines and Kernel Methods
2.8 Time Series Analysis
2.9 Rule Induction
2.10 Sequential Cover Algorithm

Chapter 3 Neural Networks
3.1 Biological Neuron
3.2 Learning and Generalization
3.3 Competitive Learning
3.4 Principal Component Analysis and Neural Networks
3.5 Fuzzy Logic

Chapter 4 Mining Data Streams
4.1 Introduction
4.2 Data Stream Management Systems
4.3 Data Stream Mining
4.4 Examples of Data Stream Applications
4.5 Stream Queries
4.6 Issues in Data Stream Query Processing
4.7 Sampling in Data Streams
4.8 Filtering Streams
4.9 Counting Distinct Elements in a Stream
4.10 Estimating Moments
4.11 Querying on Windows: Counting Ones in a Window
4.12 Decaying Windows
4.13 Real-Time Analytics Platform (RTAP)

Chapter 5 Frequent Itemsets and Clustering
5.1 Introduction to Frequent Itemsets
5.2 Market-Basket Model
5.3 Algorithm for Finding Frequent Itemsets
5.4 Handling Larger Datasets in Main Memory
5.5 Limited Pass Algorithms
5.6 Counting Frequent Items in a Stream
5.7 Introduction to Clustering
5.8 Overview of Clustering Techniques
5.9 Hierarchical Clustering
5.10 Partitioning Methods
5.11 The CURE Algorithm
5.12 Clustering High-Dimensional Data
5.13 CLIQUE
5.14 Frequent Pattern-Based Clustering Methods
5.15 Clustering Streams

Chapter 6 Frameworks and Visualization
6.1 Introduction
6.2 Introduction to Hadoop
6.3 What is Hadoop?
6.4 Core Components of Hadoop
6.5 Hadoop Ecosystem
6.6 Physical Architecture
6.7 Hadoop Limitations
6.8 Hive
6.9 MapReduce and the New Software Stack
6.10 MapReduce
6.11 Algorithms Using MapReduce
6.12 What is NoSQL?
6.13 NoSQL Business Drivers
6.14 NoSQL Case Studies
6.15 NoSQL Data Architectural Patterns
6.16 Variations of NoSQL Architectural Patterns
6.17 Using NoSQL to Manage Big Data
6.18 Visualizations

Summary
Review Questions