Statistical Data Analytics: Foundations for Data Mining, Informatics and Knowledge Discovery

Author : Walter W. Piegorsch
Price : Rs 999.00
ISBN 13 : 9788126567324
ISBN 10 : 8126567325
Pages : 488
Type : Paperbound


The use and application of data mining and 'big data' increasingly take center stage in our modern, data-saturated society, due to explosions in automated data collection, advances in computing power, online and social media development, and interactive/linkable software. This book provides a coherent, technical introduction to the statistical analytics needed to support decision- and policy-making based on these amassing data. It prepares students and interested practitioners to explore and analyze large databases in a variety of application areas, including business & finance, public health, engineering, environmental & climate science, astronomy & astrophysics, and many others. Each featured method is illustrated with real-world examples, and links are provided to open-source R computer code for implementation of the methodology. The text features extensive exercises, making it suitable for use as a course sourcebook.




Part I Background: Introductory Statistical Analytics

1 Data analytics and data mining

1.1 Knowledge discovery: finding structure in data  

1.2 Data quality versus data quantity  

1.3 Statistical modeling versus statistical description  


2 Basic probability and statistical distributions

2.1 Concepts in probability

2.2 Multiple random variables

2.3 Univariate families of distributions


3 Data manipulation

3.1 Random sampling

3.2 Data types

3.3 Data summarization

3.4 Data diagnostics and data transformation

3.5 Simple smoothing techniques


4 Data visualization and statistical graphics

4.1 Univariate visualization

4.2 Bivariate and multivariate visualization


5 Statistical inference

5.1 Parameters and likelihood

5.2 Point estimation

5.3 Interval estimation

5.4 Testing hypotheses

5.5 Multiple inferences


Part II Statistical Learning and Data Analytics

6 Techniques for supervised learning: simple linear regression

6.1 What is "supervised learning?"

6.2 Simple linear regression

6.3 Regression diagnostics

6.4 Weighted least squares (WLS) regression

6.5 Correlation analysis


7 Techniques for supervised learning: multiple linear regression

7.1 Multiple linear regression

7.2 Polynomial regression

7.3 Feature selection

7.4 Alternative regression methods

7.5 Qualitative predictors: ANOVA models


8 Supervised learning: generalized linear models

8.1 Extending the linear regression model

8.2 Technical details for GLiMs

8.3 Selected forms of GLiMs


9 Supervised learning: classification

9.1 Binary classification via logistic regression

9.2 Linear discriminant analysis (LDA)

9.3 k-Nearest neighbor classifiers

9.4 Tree-based methods

9.5 Support vector machines


10 Techniques for unsupervised learning: dimension reduction

10.1 Unsupervised versus supervised learning

10.2 Principal component analysis

10.3 Exploratory factor analysis

10.4 Canonical correlation analysis


11 Techniques for unsupervised learning: clustering and association

11.1 Cluster analysis

11.2 Association rules/market basket analysis


A Matrix manipulation

A.1 Vectors and matrices

A.2 Matrix algebra

A.3 Matrix inversion

A.4 Quadratic forms  

A.5 Eigenvalues and eigenvectors

A.6 Matrix factorizations

A.7 Statistics via matrix operations


B Brief introduction to R

B.1 Data entry and manipulation

B.2 A turbo-charged calculator

B.3 R functions

B.4 R packages




Primary Market : Undergraduate and graduate students studying statistical techniques for visualizing, summarizing, and analyzing large collections of data in modern science and society.


Secondary Market : Practitioners in 'big data' applications (e.g., finance, engineering, medicine, computing) who require basic training or a refresher in statistical methods for knowledge discovery.


Walter W. Piegorsch is a Professor of Mathematics at the University of Arizona and the Director of Statistical Research & Education at its BIO5 Institute for Collaborative Bioresearch. Professor Piegorsch is an experienced and highly regarded author and editor. He has co-authored one previous book for Wiley, and is a founding and current co-Editor for Wiley's StatsRef: Statistics Reference Online, a comprehensive online reference resource which covers the fundamentals and applications of statistical theory, methods and practice. He has also been on the editorial board of many scientific journals, and served as joint-Editor of the Journal of the American Statistical Association (Theory and Methods Section).