Spark Cheat Sheet

Spark test wordcount:
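A classic word-count smoke test, runnable in spark-shell (the input path is a placeholder; point it at any text file):

```scala
// spark-shell smoke test: count word occurrences in a text file.
// "input.txt" is a placeholder path.
val counts = sc.textFile("input.txt")
  .flatMap(line => line.split("\\s+"))
  .filter(_.nonEmpty)
  .map(word => (word, 1))
  .reduceByKey(_ + _)

// Show the ten most frequent words.
counts.sortBy(_._2, ascending = false).take(10).foreach(println)
```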

Dynamic Resource Allocation;
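The usual knobs, as a spark-defaults.conf sketch (values are illustrative; the external shuffle service is the classic prerequisite on pre-3.0 clusters):

```
# spark-defaults.conf -- illustrative values
spark.dynamicAllocation.enabled        true
spark.dynamicAllocation.minExecutors   2
spark.dynamicAllocation.maxExecutors   20
spark.shuffle.service.enabled          true
```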

Integration details about Elasticsearch and Spark (RDD, Spark SQL, and Streaming) can be found at
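A minimal Spark SQL sketch using the elasticsearch-hadoop connector (the artifact must be on the classpath; `es.nodes`, the index names, and the column are placeholders):

```scala
// Requires the elasticsearch-hadoop (elasticsearch-spark) artifact on the classpath.
import org.elasticsearch.spark.sql._

// "localhost:9200" and the index names are placeholders.
val df = spark.read
  .format("org.elasticsearch.spark.sql")
  .option("es.nodes", "localhost:9200")
  .load("my-index")

// saveToEs comes from the org.elasticsearch.spark.sql implicits.
df.filter($"status" === "active").saveToEs("filtered-index")
```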

Yes, you can partition your JDBC DataFrame reads as described in 
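A sketch of a partitioned JDBC read (URL, table, credentials, and column are placeholders): Spark issues `numPartitions` parallel queries, splitting `partitionColumn`'s range between `lowerBound` and `upperBound`.

```scala
// Partitioned JDBC read: parallelized across numPartitions tasks.
// URL, table, user, and column names below are placeholders.
val df = spark.read.format("jdbc")
  .option("url", "jdbc:postgresql://dbhost:5432/mydb")
  .option("dbtable", "orders")
  .option("user", "etl")
  .option("password", sys.env("DB_PASSWORD"))
  .option("partitionColumn", "order_id") // must be numeric, date, or timestamp
  .option("lowerBound", "1")
  .option("upperBound", "1000000")
  .option("numPartitions", "8")
  .load()
```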
spark.sql.shuffle.partitions (default: 200) is the property to modify when you know better (or are experimenting with) how many reducers Spark SQL should use for join and aggregation operations, as referenced in and 
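Setting it is a one-liner (64 is just an illustrative value):

```scala
// Tune the reducer count for joins/aggregations (default 200).
spark.conf.set("spark.sql.shuffle.partitions", "64")

// Equivalent SQL form:
spark.sql("SET spark.sql.shuffle.partitions=64")
```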

A great write-up on integrating Spark Streaming to consume data from NiFi via Remote Process Groups; 
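The receiver side of that pattern, sketched with the nifi-spark-receiver artifact (the NiFi URL and output-port name are placeholders):

```scala
// Requires the nifi-spark-receiver artifact on the classpath.
import org.apache.nifi.remote.client.SiteToSiteClient
import org.apache.nifi.spark.NiFiReceiver
import org.apache.spark.storage.StorageLevel
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(sc, Seconds(10))

// URL and port name are placeholders; the port is the NiFi output port
// that the remote process group exposes to Spark.
val conf = new SiteToSiteClient.Builder()
  .url("http://nifi-host:8080/nifi")
  .portName("Data for Spark")
  .buildConfig()

val packets = ssc.receiverStream(new NiFiReceiver(conf, StorageLevel.MEMORY_ONLY))
packets.map(p => new String(p.getContent)).print()

ssc.start()
```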

(Databricks blog post) Deep Dive into Spark SQL's Catalyst Optimizer;
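Catalyst's work is easy to inspect from the shell; a quick sketch of dumping the plans it produces:

```scala
val df = spark.range(1000)
  .filter($"id" % 2 === 0)
  .groupBy(($"id" % 10).as("bucket"))
  .count()

// extended = true prints the parsed, analyzed, and optimized logical plans
// plus the physical plan Catalyst selected.
df.explain(true)
```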

Francois' blog post about Adaptive Query Execution (i.e. intelligently selecting # of reducers) and other performance concepts;
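The Spark 3.x switches for AQE, as a spark-defaults.conf sketch (values are illustrative):

```
# Spark 3.x adaptive query execution -- illustrative values
spark.sql.adaptive.enabled                      true
spark.sql.adaptive.coalescePartitions.enabled   true
spark.sql.adaptive.skewJoin.enabled             true
```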

Cloudera blog on UDF and UDAF development;
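A sketch of both flavors (function names are placeholders; the UDAF uses the Spark 3 `Aggregator` style rather than the deprecated `UserDefinedAggregateFunction`):

```scala
import org.apache.spark.sql.expressions.Aggregator
import org.apache.spark.sql.{Encoder, Encoders}

// Scalar UDF: register once, call from SQL.
spark.udf.register("squared", (x: Long) => x * x)
spark.sql("SELECT squared(id) FROM range(5)").show()

// UDAF, Spark 3 style: a typed Aggregator wrapped for untyped use.
val sumAgg = new Aggregator[Long, Long, Long] {
  def zero: Long = 0L
  def reduce(acc: Long, x: Long): Long = acc + x
  def merge(a: Long, b: Long): Long = a + b
  def finish(acc: Long): Long = acc
  def bufferEncoder: Encoder[Long] = Encoders.scalaLong
  def outputEncoder: Encoder[Long] = Encoders.scalaLong
}
spark.udf.register("my_sum", org.apache.spark.sql.functions.udaf(sumAgg))
```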

Good stuff from Ranga Reddy shows that the pre-3.0 hint was only for broadcast (and didn't require the broadcast to actually happen, which sounds like a proper "hint" to me), and from 3.0 onward we get 4 types of join hints
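The four Spark 3.0+ hints are BROADCAST (a.k.a. BROADCASTJOIN / MAPJOIN), MERGE, SHUFFLE_HASH, and SHUFFLE_REPLICATE_NL; a sketch of both syntaxes (table and DataFrame names are placeholders):

```scala
// SQL form: hint the dimension side of the join.
spark.sql("""
  SELECT /*+ BROADCAST(d) */ f.id, d.name
  FROM facts f JOIN dims d ON f.dim_id = d.id
""")

// DataFrame form: same idea via .hint on one side.
largeDf.join(smallDf.hint("shuffle_hash"), "id")
```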

A good article about joins;