...

If you are wondering what an ideal number of master nodes would be so that you would not have to revisit them while you are actively growing the worker nodes, then my experience would suggest 5-7.  This allows you to go wider on the ZooKeeper and/or JournalNode instances if you decide that is best for your cluster, while allowing a single machine to be almost solely devoted to each of the core daemons.  Again, if you only have budget for six nodes, then expanding the master nodes is something to think about another day.

If you're wondering how many master nodes I've seen in a single cluster, the most would have to be at one of my clients, which is running in the 100s of worker nodes.  When we were designing the cluster layout they let me know I had 24 nodes designated to be master nodes (this is excluding any edge/ingestion nodes, too).  I was blown away and I talked them back down to 15, which is PLENTY as you can see by the following layout.

master01
  • Ambari
master02
  • Active NameNode
master03
  • Resource Manager
master04
  • Hive Services
master05
  • HBase Master
master06 master07 master08 master09 master10
  • Reserved for HBase Backup
  • Reserved for Storm Backup
  • Ambari Backup
  • Reserved for Falcon Backup
  • Passive NameNode
  • Resource Manager Failover
  • Secondary Hive
  • Reserved for Oozie Backup
master11
  • Ganglia
  • ZooKeeper
  • JournalNode
master12
  • Nagios
  • ZooKeeper
  • JournalNode
master13
  • Oozie
  • ZooKeeper
  • JournalNode
master14
  • Storm
  • ZooKeeper
  • JournalNode
master15
  • Falcon
  • ZooKeeper
  • JournalNode
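
To make the "go wider on ZooKeeper and/or JournalNode" point concrete, here is a minimal Python sketch that counts the quorum members in the last row of the layout (master11 through master15) and works out how many of them can fail before the quorum is lost. The three-services-per-node grouping is my reading of the layout, and the names `LAYOUT`, `tolerated_failures`, and `quorum_report` are made up for illustration. The only rule it relies on is the usual majority-quorum one: an ensemble of n members keeps working with up to (n - 1) / 2 of them down, rounded down.

```python
from collections import Counter

# Last row of the layout (master11-master15). The grouping of three
# services per node is an assumption read off the layout above.
LAYOUT = {
    "master11": ["Ganglia", "ZooKeeper", "JournalNode"],
    "master12": ["Nagios", "ZooKeeper", "JournalNode"],
    "master13": ["Oozie", "ZooKeeper", "JournalNode"],
    "master14": ["Storm", "ZooKeeper", "JournalNode"],
    "master15": ["Falcon", "ZooKeeper", "JournalNode"],
}


def tolerated_failures(members: int) -> int:
    """Majority quorum: n members survive floor((n - 1) / 2) failures."""
    return (members - 1) // 2 if members > 0 else 0


def quorum_report(layout, services=("ZooKeeper", "JournalNode")):
    """Return {service: (instance_count, tolerated_failures)}."""
    counts = Counter(s for node in layout.values() for s in node)
    return {s: (counts[s], tolerated_failures(counts[s])) for s in services}


if __name__ == "__main__":
    for service, (n, f) in quorum_report(LAYOUT).items():
        print(f"{service}: {n} instances, tolerates {f} failed")
```

With five instances of each, both quorums can lose two machines and keep going; a three-node ensemble would only survive one loss.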

With this many master nodes one can be ready for the additional HA features when they arrive for all services.  It also gives you almost unlimited flexibility to adapt to multiple server failures.  While not many clusters will be able to have this many master nodes, proper planning and budgeting are essential to make sure your cluster not only fits in your budget, but delivers on the SLAs you require of it.

...