presenting at hadoop summit (archiving evolving databases in hive)

What a humbling experience to have the opportunity to present at the 2015 Hadoop Summit conference in San Jose.  I've done a decent number of user group presentations over the years, and have even presented Hadoop topics to audiences as big as 500, but this was the first time I had spoken at a major industry conference, and I had a blast.  It was just cool to have a "presenter" badge and to have my name in all of the conference literature.

My topic was Mutable Data in Hive's Immutable World, and here is the synopsis that appears in the conference agenda.

Going beyond Hive's sweet spot of time-series immutable data, another popular utilization is replicating RDBMS schemas. This "active archive" use case's intention is not to capture every single change, but to update the current view of the source system at regular intervals. This breakout session will compare/contrast full-refresh & delta-processing approaches as well as present advanced strategies for truly "big" data. Those strategies not only parallelize their processing, but leverage Hive's partitioning to intelligently target the smallest amount of data possible to improve performance and scalability. Hive 14's INSERT, UPDATE, and DELETE statements will also be explored.
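
For a taste of that last point, here is a minimal sketch of what those Hive 0.14 ACID statements look like; the table and column names are hypothetical, and the session would also need Hive's transaction manager and related ACID settings enabled.

-- hypothetical "active archive" table; Hive ACID operations require ORC storage,
-- bucketing, and the transactional table property
CREATE TABLE customer_archive (
  id    INT,
  name  STRING,
  city  STRING
)
PARTITIONED BY (source_system STRING)
CLUSTERED BY (id) INTO 4 BUCKETS
STORED AS ORC
TBLPROPERTIES ('transactional'='true');

-- apply a changed row from the source RDBMS
UPDATE customer_archive
   SET city = 'Atlanta'
 WHERE source_system = 'crm' AND id = 42;

-- remove a row that was deleted at the source
DELETE FROM customer_archive
 WHERE source_system = 'crm' AND id = 99;

Filtering on the partition column keeps Hive from touching more data than it has to, which is the "target the smallest amount of data possible" idea the synopsis alludes to.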

I've uploaded my slides to SlideShare.  Please use http://www.slideshare.net/lestermartin/mutable-data-in-hives-immutable-world if the preview below is giving you trouble.

I was lucky enough to find out that Jennifer Knight took some "action shots" of me during my presentation.

Yep, as that last one shows, I was talking about BIG data!