Big Data systems distribute data sets across clusters of machines, making it a challenge to efficiently query, stream, and interpret them. Spark can help. It is a processing system designed specifically for distributed data. It provides easy-to-use interfaces, along with the performance you need for production-quality analytics and machine learning. Spark 2 also adds improved programming APIs, better performance, and countless other upgrades. Spark in Action teaches you the theory and skills you need to effectively handle batch and streaming data using Spark. You’ll get comfortable with the Spark CLI as you work through a few introductory examples.
Part 1 First Steps
1 Introduction to Apache Spark
2 Spark fundamentals
3 Writing Spark applications
4 The Spark API in depth
Part 2 Meet the Spark Family
5 Sparkling queries with Spark SQL
6 Ingesting data with Spark Streaming
7 Getting smart with MLlib
8 ML: classification and clustering
9 Connecting the dots with GraphX
Part 3 Spark Ops
10 Running Spark
11 Running on a Spark standalone cluster
12 Running on YARN and Mesos
Part 4 Bringing It Together
13 Case study: real-time dashboard
14 Deep learning on Spark with H2O
For experienced programmers with some background in Big Data or machine learning.
Petar Zecevic and Marko Bonaci are seasoned developers heavily involved in the Spark community.