a robust set of hadoop master nodes (it is hard to swing it with two machines)

Let me start this blog post by saying clearly that I'm not suggesting you shouldn't stand up a Hadoop cluster just because you've only been allocated two hosts to serve as master nodes; it makes great sense to get started on whatever you have!  I am, however, saying that you will either need to break the mantra of never collocating master and worker daemons on the same host, or live without the evolving HA capabilities inherent to many of the key master processes.  Unfortunately, as the tweet below indicates, you have to spend a few bucks to get beyond a POC-sized cluster if you want something hardened enough that you'll feel comfortable running your business on it.

At a bare minimum, I'm thinking that three master nodes, if they are beefy enough, would let you run a basic HA configuration and keep all the master processes on dedicated machines.  The following high-level layout might make sense for a vanilla install of a modern Hadoop distribution with this number of nodes.

master01 / master02 / master03

Active Daemons (spread across the three nodes):
  • Active NameNode
  • Hive Services
  • Oozie
  • HBase Master
  • Storm
  • Resource Manager
  • Falcon

Backup Daemons (each on a different node than its active counterpart):
  • Resource Manager Failover
  • Passive NameNode
  • Secondary Hive

HA/Failover "Helpers" (one of each on every node):
  • ZooKeeper
  • JournalNode

Admin & Monitoring:
  • Ganglia Server
  • Ambari
  • Nagios Server
  • Ambari Backup
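As a concrete illustration of how those JournalNode and ZooKeeper "helpers" wire into NameNode HA, here is a minimal hdfs-site.xml sketch.  The nameservice name `mycluster` is a placeholder of my own, and mapping the JournalNode quorum onto master01–master03 is just an assumption that follows the layout above; this is a sketch, not a prescriptive config.

```xml
<!-- hdfs-site.xml: minimal NameNode HA sketch ("mycluster" and the
     masterNN host names are illustrative placeholders) -->
<property>
  <name>dfs.nameservices</name>
  <value>mycluster</value>
</property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>nn1,nn2</value>
</property>
<property>
  <!-- the JournalNode quorum, one instance per master node above -->
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://master01:8485;master02:8485;master03:8485/mycluster</value>
</property>
<property>
  <!-- let the ZKFC processes and the ZooKeeper ensemble drive failover -->
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
```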

This illustration is not a cookbook, as there are almost always good conversations to be had about how to spread the master processes for a particular cluster.  It is more a way of showing that there are many moving parts, and that your cluster becomes more resilient when the master processes have enough nodes to spread across.  I'm also generalizing quite a bit to keep the level of detail from overshadowing the general theme of this post.

If you are wondering what an ideal number of master nodes would be, so that you don't have to revisit the master tier while you are actively growing the worker nodes, my experience would suggest 5-7.  That head count lets you go wider on the ZooKeeper and/or JournalNode instances if you decide that's best for your cluster, while still allowing a single machine to be almost solely devoted to each of the core daemons.  Again, if you only have budget for six nodes in total, then expanding the master nodes is something to think about another day.
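One reason to grow ZooKeeper and the JournalNodes in odd increments: both are majority-quorum systems, so an even-sized ensemble tolerates no more failures than the odd size below it.  A quick sketch of that arithmetic (plain Python, nothing Hadoop-specific assumed):

```python
def tolerated_failures(ensemble_size: int) -> int:
    """A majority-quorum service (ZooKeeper, the QJM JournalNodes) stays
    available as long as a strict majority of members are up, so it
    tolerates floor((n - 1) / 2) simultaneous member failures."""
    return (ensemble_size - 1) // 2

# 3, 5, and 7 are the usual ensemble sizes precisely because adding a
# fourth or sixth member buys no additional fault tolerance.
for n in range(3, 8):
    print(f"{n} members -> tolerates {tolerated_failures(n)} failure(s)")
```

This is why the jump from a 3-node master tier to a 5- or 7-node one is meaningful: you go from surviving one ZooKeeper/JournalNode loss to surviving two or three.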

If you're wondering how many master nodes I've seen in a single cluster, it would have to be at one of my clients, who is running in the 100s of worker nodes.  When we were designing the cluster layout, they let me know I had 24 nodes designated to be master nodes (and that's excluding any edge/ingestion nodes, too).  I was blown away, and I talked them back down to 15, which is PLENTY, as you can see from the following layout.

master01 - master05 (one core service per node):
  • master01: Ambari
  • master02: Active NameNode
  • master03: Resource Manager
  • master04: Hive Services
  • master05: HBase Master

master06 - master10 (standby and backup daemons, spread across the five nodes):
  • Reserved for HBase Backup
  • Reserved for Storm Backup
  • Ambari Backup
  • Reserved for Falcon Backup
  • Passive NameNode
  • Resource Manager Failover
  • Secondary Hive
  • Reserved for Oozie Backup

master11 - master15 (each also carrying a ZooKeeper + JournalNode pair):
  • master11: Ganglia, ZooKeeper, JournalNode
  • master12: Nagios, ZooKeeper, JournalNode
  • master13: Oozie, ZooKeeper, JournalNode
  • master14: Storm, ZooKeeper, JournalNode
  • master15: Falcon, ZooKeeper, JournalNode
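With five nodes carrying ZooKeeper (master11 through master15 above), the ensemble definition in zoo.cfg would look roughly like the following.  The ports and paths are the stock ZooKeeper defaults, and mapping server IDs 1-5 onto these hostnames is my own assumption from the layout, not the client's actual config.

```ini
# zoo.cfg sketch for a five-node ensemble (default ports/paths assumed)
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=master11:2888:3888
server.2=master12:2888:3888
server.3=master13:2888:3888
server.4=master14:2888:3888
server.5=master15:2888:3888
```

Each node's `dataDir` would also need a `myid` file containing its server number.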

With this many master nodes, you can be ready for the additional HA features as they arrive for the remaining services.  It also gives you almost unlimited flexibility to adapt to multiple server failures.  While not many clusters will be able to have this many master nodes, proper planning and budgeting are essential to make sure your cluster not only fits in your budget, but delivers on the SLAs you require of it.

I'd love to hear about how you laid out your Hadoop cluster – feel free to add a comment and let me know about your configuration.