Pretty solid NiFi intro blog post >> https://www.freecodecamp.org/news/nifi-surf-on-your-dataflow-4f3343c50aa2/
NiFi (and other HDF/CDF components) SE training deck >> https://docs.google.com/presentation/d/1EuxZeintuYUzAQ83mfTJcAdFfPgP2DxoUl4lLvptueE/edit?usp=sharing (Cloudera-internal)
UpdateAttribute Processor
- Implementing If/Then/Else logic >> https://digitalplumber.me/2018/01/15/the-power-of-the-advanced-button-on-nifis-updateattribute-processor/
- Deleting attributes >> https://community.cloudera.com/t5/Support-Questions/I-want-to-delete-attributes-after-pulling-the-data-from/td-p/223510
ETL-ish references
- Example of extracting text using the ExtractText and EvaluateJsonPath processors >> https://community.hortonworks.com/questions/173829/nifi-extract-text-from-json.html
- Slowly Changing Dimensions (SCD) >> https://community.hortonworks.com/articles/48843/slowly-changing-dimensions-on-hadoop-part-1.html
- Removing columns, filtering rows, changing field values >> https://community.hortonworks.com/articles/66861/nifi-etl-removing-columns-filtering-rows-changing.html
- Data enrichment with lookups
UpdateRecord Processor
- https://community.hortonworks.com/articles/189642/update-the-contents-of-flowfile-by-using-updaterec.html
- https://stackoverflow.com/questions/53665273/apache-nifi-parse-data-with-updaterecord-processor
Blog about Wait/Notify processors >> http://ijokarumawak.github.io/nifi/2017/02/02/nifi-notify-batch/
Load-balancing a list of files with ListSFTP and FetchSFTP >> https://community.hortonworks.com/articles/97773/how-to-retrieve-files-from-a-sftp-server-using-nif.html
- NiFi 1.8+ solution that avoids the need for a Remote Process Group (RPG) >> https://blogs.apache.org/nifi/entry/load-balancing-across-the-cluster
Scheduling & invoking a flow as a batch activity
- https://community.hortonworks.com/questions/41532/nifi-job-control-and-date-parameter.html?childToView=41538#answer-41538
- https://community.hortonworks.com/questions/53172/how-can-we-schedule-nifi-data-flow.html?childToView=53163#answer-53163
- https://community.hortonworks.com/questions/41532/nifi-job-control-and-date-parameter.html?childToView=148974#comment-148974
Aligning NiFi & Kafka partitioning >> https://community.hortonworks.com/content/kbentry/57262/integrating-apache-nifi-and-apache-kafka.html
NiFi Expression Language Cheat Sheet >> https://www.nifi.rocks/documents/nifi-expression-language-cheat-sheet.pdf
Retry Loop Template >> https://community.cloudera.com/t5/Support-Questions/NiFi-Flowfile-retries/td-p/171724
RDBMS integration
- Database extract blog posting >> https://www.batchiq.com/database-extract-with-nifi.html
- Database extract HCC Q&A >> https://community.hortonworks.com/questions/138406/how-to-extract-text-from-generatetablefetcha-and-i.html
- Ingesting new DB tables >> https://community.hortonworks.com/articles/108718/ingesting-rdbms-data-as-new-tables-arrive-automagi.html
Custom processor development >> https://community.hortonworks.com/content/kbentry/4318/build-custom-nifi-processor.html
Leveraging external scripts (whatever the script writes to STDOUT becomes FlowFile content for use in the rest of the flow)
- The original ExecuteProcess (point to a script location)
- ExecuteStreamCommand; sends the FlowFile content to the process's standard input and writes the process's standard output back into the FlowFile content, so your code should read from stdin and write to stdout
- ExecuteSparkInteractive (Spark code is set in the processor)
- Some newer "experimental" options
- InvokeScriptedProcessor; you can use JavaScript, Groovy, Jython, Lua, or JRuby to create a Processor implementation (code can be in the processor or point to a file)
- ExecuteScript; Clojure, ECMAScript, Groovy, Lua, Python, Ruby (code can be in the processor or point to a file)
- ExecuteGroovyScript (code in the processor)
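Quick sketch of the stdin/stdout contract that ExecuteStreamCommand expects from an external script. This is only an illustration: the upper-casing transform and the function names are made up, and real logic would replace `transform`.

```python
import sys


def transform(text: str) -> str:
    """Hypothetical transformation: upper-case every line of the FlowFile content."""
    return "".join(line.upper() for line in text.splitlines(keepends=True))


def main(stdin=sys.stdin, stdout=sys.stdout) -> None:
    # ExecuteStreamCommand pipes the incoming FlowFile content to stdin...
    content = stdin.read()
    # ...and whatever is written to stdout becomes the outgoing FlowFile content.
    stdout.write(transform(content))
    stdout.flush()


if __name__ == "__main__":
    main()
```

To try it, the processor's Command Path would point at the Python interpreter and Command Arguments at wherever the script is saved (the location is up to you). Keep diagnostics on stderr so they do not end up in the FlowFile content.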
"Repositories" Details and Configuration
- https://nifi.apache.org/docs/nifi-docs/html/nifi-in-depth.html
- https://nifi.apache.org/docs/nifi-docs/html/administration-guide.html#flowfile-repository
- https://community.cloudera.com/t5/Community-Articles/Understanding-how-NiFi-s-Content-Repository-Archiving-works/ta-p/249418
HDF/NIFI Best practices for setting up a high performance NiFi installation >> https://community.hortonworks.com/articles/7882/hdfnifi-best-practices-for-setting-up-a-high-perfo.html
Thoughts on Yield Duration and Penalty Duration settings for processors >> https://medium.com/@ben2460/nifi-scheduling-a522a1c9e740
Controller Services scoping/availability >> https://community.hortonworks.com/articles/90259/understanding-controller-service-availability-in-a.html
SDLC with NiFi Registry
- Quick intro; https://www.youtube.com/watch?v=X_qhRVChjZY
- Longer story; https://www.youtube.com/watch?v=yKmVBTeZS4c
- From the docs; https://nifi.apache.org/docs/nifi-registry-docs/html/getting-started.html
- Integrating with Git
Example of using InvokeHTTP processor to read/write to Dropbox REST services; https://pierrevillard.com/2016/03/13/get-data-from-dropbox-using-apache-nifi/
Hosting REST services ON NiFi
- https://community.hortonworks.com/articles/55080/create-a-restful-for-nifi-walmart-case-study.html
- https://community.hortonworks.com/questions/104718/handlehttprequest-configure-to-process-millions-of.html
Offloading (flowfiles from) a node >> https://community.cloudera.com/t5/Community-Articles/Offload-NiFi-Cluster-Nodes-using-the-UI-NiFi-1-8-0/ta-p/249070
Atlas Integration
- https://blog.cloudera.com/hdf-3-1-blog-series-part-6-introducing-nifi-atlas-integration/
- https://community.cloudera.com/t5/Community-Articles/End-to-End-Atlas-Lineage-with-Nifi-Spark-Hive/ta-p/248221
- https://docs.cloudera.com/HDPDocuments/HDF3/HDF-3.4.1.1/installing-hdf/content/configure_nifi_for_atlas_integration.html