
Let me start this blog post by saying clearly that I'm not suggesting you shouldn't stand up a Hadoop cluster if you have only allocated two hosts to serve as master nodes; it makes great sense to get started on whatever you have!  I am, however, saying that you will need to break the usual rule of not collocating master & worker daemons on the same host, and/or you will have to live without the evolving HA capabilities inherent to many of the key master processes.  Unfortunately, as the tweet below indicates, you have to spend a few bucks to get out of the POC cluster mindset if you want something you'll be running your business on.

At a bare minimum, I'm thinking that three master nodes, if they are sufficiently "beefy", could let you run a basic HA configuration and allow all the master processes to live on dedicated machines.  The following (high-level) configuration might make sense for a vanilla install of a modern Hadoop distribution.

master01 | master02 | master03
Active Daemons
  • Active NameNode
  • Hive Services
  • Oozie
  • HBase Master
  • Storm
  • Resource Manager
  • Falcon
Backup Daemons
  • Resource Manager Failover
  • Passive NameNode
  • Secondary Hive
HA/Failover "Helper"
  • ZooKeeper
  • JournalNode
  • ZooKeeper
  • JournalNode
  • ZooKeeper
  • JournalNode
Admin & Monitoring
  • Ganglia Server
  • Ambari
  • Nagios Server
  • Ambari Backup

This illustration is not a cookbook, as there are almost always good conversations to be had about how to spread the master processes for a particular cluster.  It is more a way to express that there are many moving parts, and that you will have a resilient cluster when your master processes have enough nodes to be spread across.  I'm even generalizing quite a bit to keep the level of detail from overshadowing the general theme of this post.
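One way to sanity-check a layout like the one above is a few lines of Python. This is only a minimal sketch: the `PLACEMENT` mapping, the `HA_PAIRS` list, and both helper functions are hypothetical names, and the host assignments are illustrative assumptions rather than a recommendation for any particular cluster.

```python
# Hypothetical three-master placement. The host assignments below are
# illustrative assumptions, not a prescribed layout.
PLACEMENT = {
    "master01": {"Active NameNode", "ZooKeeper", "JournalNode"},
    "master02": {"Resource Manager", "Passive NameNode", "ZooKeeper", "JournalNode"},
    "master03": {"Resource Manager Failover", "ZooKeeper", "JournalNode"},
}

# Active/standby pairs that should never share a host.
HA_PAIRS = [
    ("Active NameNode", "Passive NameNode"),
    ("Resource Manager", "Resource Manager Failover"),
]


def hosts_running(placement, daemon):
    """Return the set of hosts that run the given daemon."""
    return {host for host, daemons in placement.items() if daemon in daemons}


def colocated_pairs(placement, pairs):
    """Return the HA pairs whose active and standby daemons share a host."""
    return [
        (active, standby)
        for active, standby in pairs
        if hosts_running(placement, active) & hosts_running(placement, standby)
    ]


# An empty list means every active daemon is on a different host than its standby.
print(colocated_pairs(PLACEMENT, HA_PAIRS))
```

With only two master hosts the same check still passes, but a third ZooKeeper and JournalNode would then have to live on a worker host, which is exactly the collocation trade-off described at the top of this post.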

If you are wondering what an ideal number of master nodes would be, so that you do not have to revisit them while you are actively growing the worker nodes, my experience suggests 5-7.  That would allow you to go wider on the ZooKeeper and/or JournalNode instances if you decided that was best for your cluster.  Again, if you've only got budget for six nodes total, then expanding the master nodes is something to think about another day.
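To make "going wider" concrete: both a ZooKeeper ensemble and a set of JournalNodes need a majority of their members up, so the number of host failures they can ride out follows from simple arithmetic. The sketch below just works that out; the function names are my own and nothing here talks to a real cluster.

```python
def quorum_size(n):
    """Smallest majority of an n-member ensemble."""
    return n // 2 + 1


def tolerated_failures(n):
    """How many members can be lost while a majority is still up."""
    return (n - 1) // 2


for n in (3, 5, 7):
    print(f"{n} nodes: quorum {quorum_size(n)}, tolerates {tolerated_failures(n)} failure(s)")
# 3 nodes: quorum 2, tolerates 1 failure(s)
# 5 nodes: quorum 3, tolerates 2 failure(s)
# 7 nodes: quorum 4, tolerates 3 failure(s)
```

Note that an even count buys nothing extra: four members tolerate one failure, the same as three, which is why these ensembles are normally sized at odd numbers.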

 

If you were able to allocate five master nodes, a much more evenly spread configuration starts to emerge.

master01 | master02 | master03 | master04 | master05
