Editorial Description

Before you can build analytics tools to gain quick insights, you first need to know how to process data in real time. With this practical guide, developers familiar with Apache Spark will learn how to put this in-memory framework to use for streaming data. You will discover how Spark lets you write streaming jobs in almost the same way you write batch jobs.

Authors Gerard Maas and François Garillot help you explore the theoretical underpinnings of Apache Spark. The book is organized in two parts that compare and contrast the two streaming APIs Spark now supports: the original Spark Streaming library and the newer Structured Streaming API.

Learn fundamental stream-processing concepts and examine different streaming architectures.
Explore Structured Streaming through practical examples, and learn different aspects of stream processing in detail.
Create and operate streaming jobs and applications with Spark Streaming, and integrate Spark Streaming with other Spark APIs.
Learn advanced Spark Streaming techniques, including approximation algorithms and machine learning algorithms.
Compare Apache Spark to other stream-processing projects, including Apache Storm, Apache Flink, and Apache Kafka Streams.

Table of Contents

Foreword
Preface

Part I. Fundamentals of Stream Processing with Apache Spark

1. Introducing Stream Processing
    What Is Stream Processing?
    Batch Versus Stream Processing
    The Notion of Time in Stream Processing
    The Factor of Uncertainty
    Some Examples of Stream Processing
    Scaling Up Data Processing
    MapReduce
    The Lesson Learned: Scalability and Fault Tolerance
    Distributed Stream Processing
    Stateful Stream Processing in a Distributed System
    Introducing Apache Spark
    The First Wave: Functional APIs
    The Second Wave: SQL
    A Unified Engine
    Spark Components
    Spark Streaming
    Structured Streaming
    Where Next?

2. Stream-Processing Model
    Sources and Sinks
    Immutable Streams Defined from One Another
    Transformations and Aggregations
    Window Aggregations
    Tumbling Windows
    Sliding Windows
    Stateless and Stateful Processing
    Stateful Streams
    An Example: Local Stateful Computation in Scala
    A Stateless Definition of the Fibonacci Sequence as a Stream Transformation
    Stateless or Stateful Streaming
    The Effect of Time
    Computing on Timestamped Events
    Timestamps as the Provider of the Notion of Time
    Event Time Versus Processing Time
    Computing with a Watermark
    Summary

3. Streaming Architectures
    Components of a Data Platform
    Architectural Models
    The Use of a Batch-Processing Component in a Streaming Application
    Referential Streaming Architectures
    The Lambda Architecture
    The Kappa Architecture
    Streaming Versus Batch Algorithms
    Streaming Algorithms Are Sometimes Completely Different in Nature
    ……