NiFi Cheat Sheet
Pretty solid NiFi intro blog post >> https://www.freecodecamp.org/news/nifi-surf-on-your-dataflow-4f3343c50aa2/
NiFi (and other HDF/CDF components) SE training deck >> https://docs.google.com/presentation/d/1EuxZeintuYUzAQ83mfTJcAdFfPgP2DxoUl4lLvptueE/edit?usp=sharing (Cloudera-internal)
UpdateAttribute Processor
Implementing If/Then/Else logic >> https://digitalplumber.me/2018/01/15/the-power-of-the-advanced-button-on-nifis-updateattribute-processor/
Deleting attributes >> https://community.cloudera.com/t5/Support-Questions/I-want-to-delete-attributes-after-pulling-the-data-from/td-p/223510
ETL-ish references
Example of extracting text using the ExtractText and EvaluateJsonPath processors >> https://community.hortonworks.com/questions/173829/nifi-extract-text-from-json.html
Slowly Changing Dimensions (SCD) >> https://community.hortonworks.com/articles/48843/slowly-changing-dimensions-on-hadoop-part-1.html
Removing columns, filtering rows, changing field values >> https://community.hortonworks.com/articles/66861/nifi-etl-removing-columns-filtering-rows-changing.html
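For readers new to SCD, here is a minimal Type 2 sketch in plain Python (not NiFi, Hive, or the linked article's code). When a tracked value changes, the current row is closed out with an end date and a new current row is appended; unchanged keys are left alone. The `DimRow` fields and the `apply_scd2` name are hypothetical.

```python
from dataclasses import dataclass, replace
from datetime import date
from typing import Dict, List, Optional


@dataclass(frozen=True)
class DimRow:
    key: str                  # business/natural key (hypothetical)
    value: str                # the tracked attribute (hypothetical)
    valid_from: date
    valid_to: Optional[date]  # None means "still open"
    current: bool


def apply_scd2(dim: List[DimRow], updates: Dict[str, str], as_of: date) -> List[DimRow]:
    """Apply {key: new_value} updates to a dimension using SCD Type 2 rules."""
    current_by_key = {r.key: r for r in dim if r.current}
    result: List[DimRow] = []
    for row in dim:
        new_value = updates.get(row.key)
        if row.current and new_value is not None and new_value != row.value:
            # Close out the old version instead of overwriting it.
            result.append(replace(row, valid_to=as_of, current=False))
        else:
            result.append(row)
    for key, value in updates.items():
        cur = current_by_key.get(key)
        if cur is None or cur.value != value:
            # New key, or changed value: append a fresh current version.
            result.append(DimRow(key, value, as_of, None, True))
    return result
```

Re-applying an update whose value already matches the current row leaves the table unchanged, which is what keeps repeated batch loads idempotent.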
Data enrichment with lookups
UpdateRecord Processor
Blog about Wait/Notify processors >> http://ijokarumawak.github.io/nifi/2017/02/02/nifi-notify-batch/
Pausing a particular FlowFile for a period of time before it is processed again. Some options:
Use ExecuteScript to penalize the FlowFile
Use UpdateAttribute to set/increment a time and a RouteOnAttribute to see if the time limit has expired
Just slow down the run schedule for the processors
Use a Wait processor that has no corresponding Notify processor, so each FlowFile simply waits until the Expiration Duration elapses and is then routed to "expired"
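The UpdateAttribute/RouteOnAttribute option can be modeled as below. It assumes a hypothetical attribute `delay.start`, stamped once by UpdateAttribute with something like `${delay.start:replaceNull(${now():toNumber()})}`, and a RouteOnAttribute dynamic property (named, say, `expired`) set to `${now():toNumber():minus(${delay.start}):ge(60000)}`. FlowFiles that don't match go to `unmatched`, which you would loop back until the delay has passed. This Python is only a model of that decision logic, not NiFi code; the 60-second delay is an arbitrary example.

```python
import time
from typing import Dict

DELAY_MS = 60_000  # hypothetical pause length (60 s)


def now_ms() -> int:
    # Epoch milliseconds, like ${now():toNumber()}
    return int(time.time() * 1000)


def update_attribute(attrs: Dict[str, str], now: int) -> Dict[str, str]:
    # Mirrors ${delay.start:replaceNull(...)}: stamp only if not already set.
    out = dict(attrs)
    out.setdefault("delay.start", str(now))
    return out


def route_on_attribute(attrs: Dict[str, str], now: int, delay_ms: int = DELAY_MS) -> str:
    # Mirrors ${now():toNumber():minus(${delay.start}):ge(60000)}
    elapsed = now - int(attrs["delay.start"])
    return "expired" if elapsed >= delay_ms else "unmatched"
```

Because unmatched FlowFiles keep cycling through the loop, giving the looping processor a non-zero run schedule keeps it from spinning.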
Links
Load-balancing a list of files with ListSFTP and FetchSFTP >> https://community.hortonworks.com/articles/97773/how-to-retrieve-files-from-a-sftp-server-using-nif.html
NiFi 1.8+ solution that does not require a Remote Process Group (RPG) >> https://blogs.apache.org/nifi/entry/load-balancing-across-the-cluster
Scheduling & invoking a flow as a batch activity
NiFi & Kafka integration, including how NiFi and Kafka partitioning line up >> https://community.hortonworks.com/content/kbentry/57262/integrating-apache-nifi-and-apache-kafka.html
NiFi Expression Language Cheat Sheet >> https://www.nifi.rocks/documents/nifi-expression-language-cheat-sheet.pdf
Retry Loop Template >> https://community.cloudera.com/t5/Support-Questions/NiFi-Flowfile-retries/td-p/171724
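The retry-loop idea boils down to a counter attribute plus a limit check. Below is a minimal Python model of that logic, assuming a hypothetical `retry.count` attribute incremented by an UpdateAttribute on the failure path (e.g. `${retry.count:replaceNull(0):plus(1)}`) and a RouteOnAttribute that gives up at a hypothetical limit of 3. It sketches the pattern only; it is not the linked template.

```python
from typing import Dict, Tuple

MAX_RETRIES = 3  # hypothetical limit


def increment_retry(attrs: Dict[str, str]) -> Dict[str, str]:
    # Mirrors UpdateAttribute: treat a missing counter as 0, then add 1.
    out = dict(attrs)
    out["retry.count"] = str(int(out.get("retry.count", "0")) + 1)
    return out


def route(attrs: Dict[str, str], max_retries: int = MAX_RETRIES) -> str:
    # Mirrors RouteOnAttribute: stop looping once the counter hits the limit.
    return "give_up" if int(attrs["retry.count"]) >= max_retries else "retry"


def simulate_always_failing() -> Tuple[int, Dict[str, str]]:
    # Every attempt fails, so the FlowFile loops until route() says give_up.
    attrs: Dict[str, str] = {}
    attempts = 0
    while True:
        attempts += 1
        attrs = increment_retry(attrs)
        if route(attrs) == "give_up":
            return attempts, attrs
```

With a limit of 3, a FlowFile that always fails is attempted three times before it leaves the loop.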
RDBMS integration
Database extract blog posting >> https://www.batchiq.com/database-extract-with-nifi.html
Database extract HCC Q&A >> https://community.hortonworks.com/questions/138406/how-to-extract-text-from-generatetablefetcha-and-i.html
Ingesting new DB tables >> https://community.hortonworks.com/articles/108718/ingesting-rdbms-data-as-new-tables-arrive-automagi.html
Custom processor development >> https://community.hortonworks.com/content/kbentry/4318/build-custom-nifi-processor.html
Calling external scripts (STDOUT output becomes a FlowFile that the rest of the flow can use)
The original ExecuteProcess (point to a script location)
ExecuteStreamCommand; sends FlowFile content to the process's standard input and writes its standard output back as FlowFile content, so your code should read from stdin and write to stdout
ExecuteSparkInteractive (Spark code is set in the processor and run through a Livy session)
Some newer "experimental" options
InvokeScriptedProcessor; you can use JavaScript, Groovy, Jython, Lua, or JRuby to create a Processor implementation (code can be in the processor or point to a file)
ExecuteScript; supports Clojure, ECMAScript, Groovy, Lua, Python (Jython), or Ruby (JRuby); code can be in the processor or point to a file
ExecuteGroovyScript (code in the processor)
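Because ExecuteStreamCommand pipes FlowFile content to the command's stdin and turns its stdout into the new content, a script only needs to be a stdin-to-stdout filter. A minimal Python sketch, assuming the FlowFile content is a single JSON object; the `processed_by` field and the `enrich.py` filename are hypothetical. You would set the processor's Command Path to `python3` and pass the script path in Command Arguments.

```python
import json
import sys


def transform(text: str) -> str:
    # Parse the incoming FlowFile content and tag it (hypothetical field).
    record = json.loads(text)
    if not isinstance(record, dict):
        raise ValueError("expected a single JSON object")
    record["processed_by"] = "external-script"
    return json.dumps(record)


def main() -> int:
    data = sys.stdin.read()  # FlowFile content arrives on stdin
    try:
        sys.stdout.write(transform(data))  # stdout becomes the new content
    except ValueError as exc:  # json.JSONDecodeError is a ValueError
        sys.stderr.write(f"enrich.py: {exc}\n")
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(main())
```

Writing errors to stderr and exiting nonzero gives the flow something to route on; ExecuteStreamCommand records the exit code in the `execution.status` attribute.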
"Repositories" Details and Configuration
Apache NiFi In Depth >> https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html
Administration Guide, FlowFile Repository section >> https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#flowfile-repository
HDF/NiFi best practices for setting up a high-performance NiFi installation >> https://community.hortonworks.com/articles/7882/hdfnifi-best-practices-for-setting-up-a-high-perfo.html
Thoughts on Yield Duration and Penalty Duration settings for processors >> https://medium.com/@ben2460/nifi-scheduling-a522a1c9e740
Controller Services scoping/availability >> https://community.hortonworks.com/articles/90259/understanding-controller-service-availability-in-a.html
SDLC with NiFi Registry
Quick intro; https://www.youtube.com/watch?v=X_qhRVChjZY
Longer story; https://www.youtube.com/watch?v=yKmVBTeZS4c
From the docs; https://nifi.apache.org/docs/nifi-registry-docs/html/getting-started.html
Integrating with Git
pretty good overview article; https://medium.com/@abdelkrim.hadjidj/fdlc-towards-flow-development-life-cycle-with-nifi-registries-82e1ee866fab
Example of using InvokeHTTP processor to read/write to Dropbox REST services; https://pierrevillard.com/2016/03/13/get-data-from-dropbox-using-apache-nifi/
Hosting REST services ON NiFi
Offloading (flowfiles from) a node >> https://community.cloudera.com/t5/Community-Articles/Offload-NiFi-Cluster-Nodes-using-the-UI-NiFi-1-8-0/ta-p/249070
Atlas Integration