I was just explaining to a colleague today how Hadoop 2.0 (aka YARN, which stands for Yet Another Resource Negotiator) differs from Hadoop 1.0. Today's "core Hadoop" consists of HDFS and MapReduce, and each has its own master & worker daemon processes: NameNode & DataNode for HDFS and JobTracker & TaskTracker for MapReduce. This itself makes sense as HDFS and MapReduce are focused on two different things.
HDFS offers redundant, reliable storage while the tightly-coupled MapReduce framework concentrates on data processing and the cluster resource management that is needed for such a scalable platform. This model works well when we only have applications that layer on top of MapReduce such as the open-source Pig & Hive frameworks and the commercial offering from Datameer, but these tools will only perform as fast as the underlying batch-oriented MapReduce layer will allow.
There are other offerings that are bringing near real-time responsiveness to the Hadoop "ecosystem". One of the most established ones is HBase. In projects like this one an entirely new set of daemons comes into play to address the necessary data processing requirements as well as cluster resource management. This model starts to become a nightmare on two fronts as more and more Hadoop tools & frameworks become available.
First, we have multiple teams (open source and commercial) building the same kinds of software to handle operating in a clustered environment which simply leads to way too many implementations of the same general problem. Second, it isn't hard to imagine that we could quickly have a number of different sets of master & worker daemons starting to run on all the machines in our cluster. Hadoop 2.0 / YARN is here to help with both of these concerns.
YARN attacks the underlying resource management problem for all and features an interface that allows data processing frameworks to plug into this shared functionality. MapReduce and HBase in Hadoop 2.0 will sit on top of YARN (w/o requiring app developers to rewrite anything). As always, a picture is worth a thousand words and Hortonworks has presented a few in their Hadoop YARN write-up. In fact, their synopsis is even better than mine, but I thought I'd give it a try anyway.
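To make the "shared resource management that frameworks plug into" idea concrete, here is a toy sketch in Python. It is not YARN's API. The class name, the slot-counting model, and the framework labels are all made up for illustration. It only shows the shape of the idea: one manager owns the cluster's capacity and different frameworks ask it for containers instead of each running its own scheduler.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResourceManager:
    """Hypothetical stand-in for a shared cluster resource manager (not YARN's real API)."""
    free_slots: dict  # node name -> number of free container slots
    grants: list = field(default_factory=list)

    def request(self, framework: str, slots: int) -> list:
        """Grant up to `slots` containers to `framework`, first-fit across nodes."""
        granted = []
        for node in sorted(self.free_slots):
            while self.free_slots[node] > 0 and len(granted) < slots:
                self.free_slots[node] -= 1
                granted.append((framework, node))
        self.grants.extend(granted)
        return granted


# Two different frameworks share the same pool of capacity.
rm = ToyResourceManager(free_slots={"node1": 2, "node2": 2})
batch = rm.request("mapreduce", 3)    # hypothetical batch framework
online = rm.request("hbase-like", 2)  # hypothetical low-latency framework

print(batch)
print(online)
```

In this toy run the second framework asks for two containers but only gets the one that is left, which is exactly the kind of decision you want made in one place rather than by competing sets of daemons.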
Hadoop 2.0 is coming fast and has a great opportunity to be the "app fabric" of tomorrow as a wise man recently predicted to me. If you have an app or framework that needs to scale to the level of where Hadoop is going (i.e. thousands of nodes), then this is the time to see how you can unwind some of your own cluster management code and take advantage of YARN yourself. If you need some help doing it -- drop me a line as that would be a fun project!!
While working my way through Eric Sammer's Hadoop Operations book I came across this call-out from Chapter 9.
On "Reboot It" Syndrome
The propensity for rebooting hosts or restarting daemons without any form of investigation is the opposite of everything discussed thus far. This particular form of disease was born out of a different incarnation of the 80/20 rule, one in which 80% of the users have no desire or need to understand what the problem is or why it exists. As administrators, we exist within the 20% for whom this is not –nor can we allow it to become-- the case. By defaulting to just restarting things, you're defaulting to the nuclear option, obliterating all information and opportunities to learn from an experience. Without meaningful experience, preventative care simply isn't possible.
Consider for a moment what would happen if doctors opted for the medical version of "reboot it". Maybe they'd just cut off anything that stopped working.
Amen!! I remember when I first encountered this mindset. I had just completed one of the six-month (evenings & weekends) UNIX/C "retooling" programs at SMU that were popular back in the early-mid 1990s. I learned a LOT about UNIX in that program from an awesome, but curt and crusty, UNIX administrator named Bobby. With that bald head, long ponytail, anti-government rhetoric, and the wild stories he told about the "early" days of UNIX, I'm sure he was "off the grid" back then and hasn't been back on it yet. I was coming from a mainframe development background at the time. That high-uptime environment, coupled with learning early about the bragging rights that came from running the uptime command, solidified in my brain that systems should (and could!) be run for a long time without a restart.
The skills I learned from those courses (not to mention the tenacity I showed to learn anything and everything I could about web development, mostly CGI back in those days) helped me land a web developer job. I got hired at an ISP that was about to go national and compete against companies like AOL, MSN and NetZero (yes... this was the 90s; anyone remember AltaVista?) and we had a big operations team. The ops team was split about 50/50 between UNIX and Windows administrators and they were all on the same big floor. Although the cube farm layout was almost identical, it was VERY easy to spot which side was which.
As you walked from the elevator through the door that was square in the middle of this large open area, you could immediately see the contrast in styles. On the left was the Windows team. Everyone looked 16 (I'm sure they were in their 20s!) and they all had short & stylish haircuts and wore polo shirts and khaki pants. They were all so eager to be walking around and talking to each other. It was like a scene out of Stepford Wives, except they were all dudes. By contrast, the UNIX side looked like a dark scene from a Tolkien novel. Most of the fluorescent lights in the suspended ceiling were unscrewed and the primary illumination was from the 21" CRTs most of these admins had on their desks. Some of the folks even rigged up "ceilings" for their cubes made of flattened cardboard boxes that their servers were shipped in. Long scraggly hair (mostly down) and unshaven faces were the norm.
What really stood out was the big "here there be dragons" sign!! I really did love hanging out on that side of the floor and I learned a lot from these guys.
I only take you down memory lane as this wild variance in administrative "styles" was where I first encountered the "Reboot It" Syndrome that Eric was describing. Most of these young Windows administrators had never seen another platform in their life beyond a PC, and the UNIX guys had been running mission-critical systems for years. That maturity and craftsmanship was evident in the pride they took in the operational behavior, and yes... UPTIME, of the servers in their charge. Conversely, this Windows administration team was quick to declare "reboot it!" at the first sign of any trouble.
They even came up with alternative phrases such as "kick it", the ever popular "restart it", and the one that tried to make it sound like a desired process: an "environmental refresh".
Hey... don't get me wrong... machines do need to be restarted sometimes, but executing the nuclear option shouldn't be the first step in your analysis. Many of these young administrators probably went on to develop their skills and experiences to a senior level, but deep down I'm almost positive most still suffer from "Reboot It" Syndrome.
So, the next time your crappy AT&T U-Verse DVR freezes up go ahead and pull the plug on it to give it that much-needed "environmental refresh". But, if a critical Hadoop daemon in your cluster is throwing some wild Exceptions... PLEASE take a few minutes to investigate first, because restarting it is most likely just going to produce a small outage and then return you to the undesired state you were hoping to fix with "magic". Or, at least tell me you did this before you Reboot It!!
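If "do some investigation" sounds like too much work, even a few lines of Python that tally which exceptions show up in a log beats nothing. This is a minimal sketch, not a real diagnostic tool: the function name, the regex, and especially the sample log lines are made up for illustration and are not actual Hadoop output. In practice you would feed it lines read from whatever log file your daemon writes.

```python
import re
from collections import Counter

# Matches dotted Java-style names ending in "Exception" or "Error",
# e.g. java.io.IOException. A rough heuristic, not a full log parser.
EXCEPTION_NAME = re.compile(
    r"\b((?:[A-Za-z_]\w*\.)*[A-Z]\w*(?:Exception|Error))\b"
)


def summarize_exceptions(lines):
    """Count how often each exception class name appears in the given lines."""
    counts = Counter()
    for line in lines:
        counts.update(EXCEPTION_NAME.findall(line))
    return counts


# Made-up placeholder lines; NOT real daemon output.
sample_lines = [
    "WARN java.io.IOException: placeholder message one",
    "INFO nothing unusual on this line",
    "ERROR java.lang.OutOfMemoryError: placeholder message two",
    "WARN java.io.IOException: placeholder message three",
]

summary = summarize_exceptions(sample_lines)
for name, count in summary.most_common():
    print(f"{count:>3}  {name}")
```

Save the summary (and the log itself) somewhere before you restart anything, so the evidence survives the "environmental refresh".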
My current employer has been doing some decent renovations at our offices and I stumbled into a "library" on one of the newly jazzed up floors. You know, one of those rooms with a bunch of bookshelves where everyone puts all of the old & crappy books they just don't want anymore. The last time I was in this room I saw a book with a catchy title: The No Asshole Rule. I can't say that I have read it, but based on the 4-star review average on Amazon I'm assuming it is a good book. It did, however, remind me of something from my last job.
We had a long no-hire period and I got excited when we finally had the opportunity to hire some fresh new faces due to some openings one of our senior leaders created. My enthusiasm simmered down a bit when this gentleman pontificated that we needed to only hire folks that "fit in with our culture" and he absolutely declared, "no assholes or prima donnas". As for me... I'm thinking I want a few assholes and prima donnas on my team!
The senior leader advocating this approach failed to realize a couple of things. First, the culture of "sameness" and (faux) "can-do attitudes" we had simply forced the staff to only talk among themselves, and not to their leaders, about the things bothering them in the organization. This makes it difficult for leaders to fix things that aren't coming to their attention. Of course, any leader worth their salt would feel a bit awkward if NOBODY was EVER coming to them with ANY TYPE OF COMPLAINT.
I'm not advocating a season of Total Drama Island, but seriously, when was the last time that everyone on your team was perfectly happy about everything? I believe that you need a healthy mix of different kinds of folks to get a stellar team. While I've been lucky to lead (and be part of) a couple of high performing teams where everyone's skills were almost identical and nobody had an attitude at all, there are some great teams who are great because they have a couple of assholes and/or prima donnas in their ranks.
Assholes and prima donnas can only get away with being the folks they are if they perform, and they usually perform so well that they know it -- hence, they turn up the "attitude". Heck, even General George S. Patton said, "all very successful commanders are prima donnas and must be so treated." He also admitted being a prima donna himself and I'm sure I'm not alone when I think that he was called an asshole a couple of times!
Assholes stir things up when things need stirring. If they see something stupid happening, they call it out. They don't have time for nonsense. I want an asshole, or two, on my project instead of a bunch of "yes men". Prima donnas are the superstars of your team. Let them puff up their feathers. Let the other team members know that if they rock, then you'll put up with their crap, too!
With all of that said, I'm proud to say I'm a bit of an asshole, too, and I'm surely a prima donna. I challenge you to look within yourself and the folks you admire in your organization to see if they just might be assholes and/or prima donnas. Are they the ones driving the bus? Is your organization better off because of them? Despite their problems, would you ever want them to leave?