Learn & Blog

Non-prioritized ideas for upcoming blog posts

  • Cross-Tech
    • DbVisualizer with Hive and Spark
    • Streaming blog series (Storm, Spark, Kafka, Flink, others?)
    • Showing SM across NiFi, Kafka, Flink and Spark
    • IntelliJ's Big Data Tools
  • NiFi
    • NUC Cluster
    • Alternative F/C/P Repository options
    • ListenHTTP processor
    • InvokeHTTP processor
    • Retry loop w/wait
    • Record processors
    • Hive Streaming processor
    • Wait & Notify processors
    • ETL Example
      • File ready trigger
      • Enrichment lookups
      • Validation checks
      • Transformation
    • Variables (and Properties)
      • Integrate w/Registry?
      • Sensitive Values
    • Spark integration
    • Use with MiNiFi and EFM
    • Test drive IoT data simulator
    • Custom processor development
    • Using NiFi to monitor NiFi
  • Flink
    • Learn SOMETHING
  • Spark
    • JustEnough Python / Scala
    • Learn NumPy and pandas
    • Connecting to Spark SQL, possibly as suggested here.
    • DFs always use Parquet?
    • Lambda performance issues with Datasets vs. DataFrames
    • Port Streever's data generator (fix MR one?)
    • Create "clean checkpoint dir" spark-submit option
  • Kafka
    • Kafka Streams examples
    • Kafka-backed Hive tables
    • Explore Streams Messaging Manager
    • Explore KSOY (Kafka Streams on YARN)
  • Hive
    • Explain plan analysis
    • Materialized view rewriting examples
  • HBase
    • Phoenix
    • OpenGA analysis
      • Shell
      • Phoenix
  • Solr
    • OpenGA analysis
  • Hadoop
    • HDFS
      • S3 with HDFS/Hive/Spark