Spark in Action

Petar Zecevic, Marko Bonaci

ISBN: 9789351199489

472 pages

INR 849


Big data systems distribute data sets across clusters of machines, making it a challenge to efficiently query, stream, and interpret them. Spark can help. It is a processing system designed specifically for distributed data. It provides easy-to-use interfaces, along with the performance you need for production-quality analytics and machine learning. Spark 2 adds improved programming APIs, better performance, and countless other upgrades. Spark in Action teaches you the theory and skills you need to effectively handle batch and streaming data using Spark. You’ll get comfortable with the Spark CLI as you work through a few introductory examples.

Part 1 First Steps

1 Introduction to Apache Spark

2 Spark fundamentals

3 Writing Spark applications

4 The Spark API in depth


Part 2 Meet the Spark Family

5 Sparkling queries with Spark SQL

6 Ingesting data with Spark Streaming

7 Getting smart with MLlib

8 ML: classification and clustering

9 Connecting the dots with GraphX


Part 3 Spark Ops

10 Running Spark

11 Running on a Spark standalone cluster

12 Running on YARN and Mesos


Part 4 Bringing It Together

13 Case study: real-time dashboard

14 Deep learning on Spark with H2O