This blog post introduces the three streaming frameworks that are bundled in the Hortonworks Data Platform (HDP) – Apache Storm, Spark Streaming, and Kafka Streams – and focuses on the topology supervision features offered to the topologies (aka workflows) running with, or within, these particular frameworks. This post does not attempt to fully described each framework nor does it provide examples of their usage via working code. The goal is to develop an understanding of what, if any, services are available to help with lifecyle events, scalability, management, and monitoring.

Table of Contents

The Frameworks

...

Kafka Streams are tightly coupled with Kafka the Kafka’s messaging platform; especially the streaming input data. Kafka Streams is intentionally designed to fit into any Java or Scala application which gives it plenty of flexibility, but offers no inherent lifecycle, scaling, management, or monitoring features.

...

Apache Spark’s streaming frameworks allow for a variety of input and output data technologies. Spark Streaming apps are themselves Spark applications who, in a Hadoop cluster at least, run under YARN which provides good answers to coverage for many of the lifecycle and management features. The Spark framework addresses a number of the scaling and monitoring needs.

...

	Kafka Streams	Spark Streaming	Apache Storm
Lifecycle Events
Start	RYO	Submitter	Submitter
Stop	RYO	Patterns available	Available
Pause/Restart	RYO	Not available	Available
Scalability
Initial Parallelization	RYO	Parameterized	Parameterized
Runtime Elasticity	RYO	Auto-scaling based on properties for min/max number of executors	No auto-scaling but each component can be scaled +/- individualyindividually
Management
Resource Availability	No inherent resources	Managed	Managed
Failure Restart	RYO	Automatic	Automatic
Monitoring
Topology UI	RYO	Combined with all Spark jobs	Centralized
Integration	RYO	JMX	JMX

...

Let me start by pointing out that it looks like Kafka Streams is “all bad”… it surely isn’tbad”, but that’s not the case. It is build around the concept of writing and deploying standard applications and consciously does not want be part of a runtime framework. Due to that and the focus of this blog post, it should be obvious why it scored so low on these features. The RYO (Roll Your Own – aka “custom”) callouts I gave are likely a badge of honor to the folks who are bringing us this framework.

Kafka Streams also has a lot of early interest and I surely would not discount it for a second. The biggest issue for those teams who stand up a decent sized Hadoop/Spark cluster is that you don’t get to take advantage of all those nodes to run a your Kafka Streams app apps on. You’ll need to size out what you’ll need is needed for each application and ensure that needed resources are available to run your app apps on.

On the other end of the spectrum, one would think that will an almost perfect green checkmark score on the features identified that it Storm would be a no-brainer. Storm is the grandpa of the streaming engines and its event-level isolation provide something the other microbatch frameworks can’t do. This maturity shines through in all of these supervision features, but on the other hand it is the least “exciting” of the frameworks for folks starting their streaming journey in 2019. If you need to get something into production asap and you just need to know it works – all day long and every day… then go with Storm!

This brings me to my personal recommendation (and of Spark Streaming. Note that this comes from a guy who really does love Apache Storm and values the simplicity & flexibility of Kafka Streams) of Spark Streaming. There is simply too much excitement & focus around Spark in general and the ability to transition applications between batch and streaming paradigms with minimal coding make it a no-brainerclose the case. It is still maturing, but its alignment with YARN help it score high on many of these supervision-oriented features.

Versions Compared

Old Version 5

New Version 6

Key

The Frameworks

Page Comparison

Versions Compared

Old Version 5

New Version 6

Key

The Frameworks