Spark Cheat Sheet

Spark test wordcount: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.4/bk_spark-component-guide/content/run-spark2-sample-apps.html

Dynamic Resource Allocation; https://community.hortonworks.com/content/supportkb/49510/how-to-enable-dynamic-resource-allocation-in-spark.html

Integration details about Elasticsearch and Spark (RDD, Spark SQL, and Streaming) can be found at https://www.elastic.co/guide/en/elasticsearch/hadoop/current/spark.html.

Yes, you can split a JDBC DataFrame read across multiple partitions by setting partitionColumn, lowerBound, upperBound, and numPartitions, as described in https://stackoverflow.com/questions/41085238/what-is-the-meaning-of-partitioncolumn-lowerbound-upperbound-numpartitions-pa
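
As a rough illustration of what those four options mean, here is a simplified model in plain Python of how a numeric range could be cut into per-partition WHERE clauses. The function name jdbc_partition_predicates is made up for this sketch, and the stride arithmetic is an assumption (Spark's own calculation differs in detail between versions). The point it tries to show is that the bounds only set the stride and do not filter rows: the first and last partitions are open-ended.

```python
def jdbc_partition_predicates(column, lower_bound, upper_bound, num_partitions):
    """Simplified model (not Spark's actual code) of per-partition WHERE clauses.

    Assumes integer bounds and upper_bound - lower_bound >= num_partitions.
    """
    if num_partitions <= 1 or lower_bound >= upper_bound:
        return [None]  # one partition, no predicate: read the whole table

    # Assumption: a plain integer stride; real Spark computes this more carefully.
    stride = max(1, (upper_bound - lower_bound) // num_partitions)

    predicates = []
    current = lower_bound
    for i in range(num_partitions):
        # First partition has no lower bound, last partition has no upper bound,
        # so rows outside [lower_bound, upper_bound) are still read.
        lo = f"{column} >= {current}" if i != 0 else None
        current += stride
        hi = f"{column} < {current}" if i != num_partitions - 1 else None
        if hi is None:
            predicates.append(lo)
        elif lo is None:
            predicates.append(f"{hi} OR {column} IS NULL")
        else:
            predicates.append(f"{lo} AND {hi}")
    return predicates


def read_partitioned(spark, url, table):
    """How the options are passed to a real SparkSession (url and table are placeholders)."""
    return (spark.read.format("jdbc")
            .option("url", url)
            .option("dbtable", table)
            .option("partitionColumn", "id")  # hypothetical numeric column
            .option("lowerBound", 0)
            .option("upperBound", 100)
            .option("numPartitions", 4)
            .load())


for clause in jdbc_partition_predicates("id", 0, 100, 4):
    print(clause)
```
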
spark.sql.shuffle.partitions (default 200) controls how many reducers Spark SQL uses for join and aggregation operations. Change it when you know better (or are experimenting); see https://spark.apache.org/docs/latest/sql-performance-tuning.html and https://stackoverflow.com/questions/33297689/number-reduce-tasks-spark
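
A minimal sketch of changing that property at runtime from PySpark. The helper name set_shuffle_partitions and the value 50 are just examples for illustration; spark is assumed to be an existing SparkSession.

```python
def set_shuffle_partitions(spark, num_partitions):
    """Set spark.sql.shuffle.partitions on a session and return the stored value.

    `spark` is assumed to be a SparkSession (anything with conf.set / conf.get works).
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    # Runtime SQL configs are stored as strings.
    spark.conf.set("spark.sql.shuffle.partitions", str(num_partitions))
    return spark.conf.get("spark.sql.shuffle.partitions")


def demo():
    """Example usage; requires pyspark, so it is defined here but not called."""
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("shuffle-partitions-demo").getOrCreate()
    print(set_shuffle_partitions(spark, 50))
    # Joins and aggregations run after this point use 50 shuffle partitions.
    spark.stop()
```

The same property can also be passed at submit time with --conf spark.sql.shuffle.partitions=50.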

A great write-up on integrating Spark Streaming to consume data from NiFi via Remote Process Groups; https://community.hortonworks.com/articles/12708/nifi-feeding-data-to-spark-streaming.html

(databricks blog post) Deep Dive into Spark SQL's Catalyst Optimizer; https://databricks.com/blog/2015/04/13/deep-dive-into-spark-sqls-catalyst-optimizer.html

Francois' blog post about Adaptive Query Execution (e.g. intelligently selecting the number of reducers) and other performance concepts; https://blog.cloudera.com/how-does-apache-spark-3-0-increase-the-performance-of-your-sql-workloads/

Cloudera blog on UDF and UDAF development; https://blog.cloudera.com/working-with-udfs-in-apache-spark/

Good stuff from Ranga Reddy

https://spark.apache.org/docs/3.0.0/sql-ref-syntax-qry-select-hints.html shows that before 3.0 the only join hint was broadcast, and that Spark is not required to actually honor it (sounds like a good "hint" to me). From 3.0 onward we get 4 types of join hints: BROADCAST, MERGE, SHUFFLE_HASH, and SHUFFLE_REPLICATE_NL.
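
A small sketch of what the Spark 3.0+ hint syntax looks like. The helper hinted_join_sql and the table names fact and dim are made up for illustration; it only builds the SQL text, which you would then pass to spark.sql(...).

```python
# The four join hint names from the Spark 3.0 hints page.
JOIN_HINTS = ("BROADCAST", "MERGE", "SHUFFLE_HASH", "SHUFFLE_REPLICATE_NL")


def hinted_join_sql(hint, hinted_table, left, right, key):
    """Build an equi-join query carrying a join hint for one of the tables."""
    hint = hint.upper()
    if hint not in JOIN_HINTS:
        raise ValueError(f"unknown join hint: {hint}")
    return (
        f"SELECT /*+ {hint}({hinted_table}) */ * "
        f"FROM {left} JOIN {right} ON {left}.{key} = {right}.{key}"
    )


# Hypothetical tables: a large `fact` table joined to a small `dim` table.
print(hinted_join_sql("broadcast", "dim", "fact", "dim", "id"))

# DataFrame API equivalent, assuming DataFrames named fact and dim:
#   fact.join(dim.hint("broadcast"), "id")
```

It is still only a hint: Spark can fall back to another strategy when the requested one does not apply to the join.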

Good article about joining; https://towardsdatascience.com/the-art-of-joining-in-spark-dcbd33d693c