Hadoop World 2013 (reflections from NYC)

I was lucky enough to make it to Hadoop World in New York City this year. Thanks to the Cloudera team for giving me a pass to the event, and to my boss for picking up the travel tab (he hasn't seen my expense report yet; those rooms aren't cheap in Midtown!). The whole thing reminded me of JavaOne the first time I went, back in 2001. There was so much excitement around new technologies.

In fact, there was a strong desire to learn more, coupled with (and I'm absolutely not picking on anyone here!) a big gap in skills out in the field. It reminded me of when many folks were getting excited about internet development in the mid-90s and then Java in the late-90s. That excitement is contagious!! And to stick with the internet and Java comparisons, I'm absolutely sure Hadoop and all the current (and upcoming) ecosystem tools are here to stay.

It is also crystal clear that Hadoop YARN offers, in a nutshell, so much more extensibility and control going forward. Hortonworks gave a good now/soon/eventually presentation on YARN that was, for me personally, one of the most informative sessions of the event. I really liked one line I heard from them: "YARN is the OS of the Hadoop cluster." This really is cool technology, and decoupling the workload type from MapReduce is a giant step forward.

As for the rest of the sessions and keynotes, I'd put them at about 60/40 in favor of the good ones. One of my favorite messages came from keynote presenter Jim Kaskade (Infochimps CEO), who pushed past the technology and made it clear that "big data starts with the application," then followed it up with "don't have a big data reference architecture, have some use cases."

There were certainly some misses. Not that they were bad presentations, but there were a few I'm not sure I would have attended if they had been at a local user group. Sometimes being a former Facebook/Twitter/LinkedIn/Yahoo/whatever employee isn't enough; you still need a good topic. Oh, and rolling your own resource negotiator on top of Yet Another Resource Negotiator just creates a YA-YA-RN in my book (either that, or I just didn't get the REEF presentation).

I did enjoy Doug Cutting's keynote on the final day. He was asked to make some predictions about the future of Hadoop. As any good programmer would, he worked through all the disclaimers and reasoning before getting to the meat of the answer. Finally, he got there, and he said the magic words that, if all goes according to plan, will take Hadoop past its current analytics and OLAP focus. Doug declared "even transactions possible on Hadoop." He then echoed this with an even clearer decree: "it's inevitable that we'll see just about every kind of workload be moved to this platform; even OLTP."

This is powerful stuff! I'm very excited about Doug's line "we are in the middle of a revolution in data processing; revolutions are scary" and his summary quote, "the future for data is Hadoop."

If you are interested, feel free to check out the day-by-day blogging I did on my company's intranet. I exported those posts to PDF: Day1.pdf, Day2.pdf, and Day3.pdf.