Prioritized upcoming blog posts
- Capture some/most/all of the examples from my upcoming DevNexus preso, Transformation Processing Smackdown: Spark vs Hive vs Pig
- Create Spark RDD & DataFrame incantations of the "calculate salary statistics" postings associated with the Open Georgia Analysis
Non-prioritized ideas for upcoming blog posts
- What are "good" TeraSort numbers (and are they good for anything?)
- How to use (and review) the YARN "distributed shell" app
- Fix MOYA (not Slider) bug for Hadoop 2.6
- Sample Slider application
- Exploring Apache Hive's SQL authorization (grants, roles & other fun stuff)
- How does Apache Ranger handle Hive roles? (It doesn't!)
- Hive's export/import operations (note to self; tracking in O.F.)
- Test drive of Hive 0.14's CRUD operations (note to self; tracking in O.F.)
- Local DataNode disk balancing options (note to self; tracking in O.F.)
- Typical data ingestion workflow (note to self; tracking in O.F.)
  - Sqoop'ing some data
  - Transformation/enrichment with Pig
  - Accessing it from Hive
  - Pulling it all together with Oozie
  - Best practices for location/naming/structure of code & config for all components of the workflow
  - Maybe a redo of this workflow using Cascading?
- Recap of my Summit preso, if only to provide links to the deck and recording; loosely based on http://hortonworks.com/blog/four-step-strategy-incremental-updates-hive/
- Using snapshots with an archiving solution
- Support Hadoop vendors like you'd support PBS
- Change hostnames & IPs of all hosts in an HDP cluster
- HBase via JDBC (using Phoenix)
- HBase via JDBC (part deaux; just using Hive)
- Pig schema reuse (and why it's not really that good)
- Connecting to SparkSQL, possibly as suggested here.
- Managing Kafka offsets automagically
- Playing with HBase versioning (including deleting a range of cells)
If you have some things you'd like to see, please share them in the comments.
If comments are unavailable below, please see the red notes in the left-hand nav.