...

SIDEBAR: On that last one, I even ran into an organization that had formalized a policy requiring ALL servers to be restarted every 90 days. Knowing firsthand where that concept started, and fully understanding the Unix/Linux mindset of running a box forever, I pointed them to my "just reboot it (when was that ever a good idea?)" posting. (wink)

I'm certainly not suggesting that patching is optional. There are very solid reasons for doing it that still make it a requirement. I'm simply suggesting that (especially early in your Hadoop journey) you rethink how this process will happen on the hosts that make up your Hadoop cluster. The obvious, easy answer is to take an outage: stop all Hadoop services, perform the OS patching as it is done today, and then restart all services to end the outage. In most shops this can be done quickly, but it still requires a service outage, which we all want to avoid if possible.

Up until now, my strong advice has been to apply OS-level patches and updates only when performing a Hadoop platform maintenance activity such as an upgrade. The intention is to take advantage of the downtime that an upgrade already requires and, in fact, to encourage platform architects to always be thinking about the next upgrade, especially given the level of innovation (and fixes!) that Hadoop is still undergoing.

Regardless of how you introduce OS patching, no production cluster should ever be upgraded without appropriate testing in at least one pre-production environment, covering not just Hadoop itself but also the specific use cases and applications that sit on top of it. Aligning OS patching with this cycle can keep an unwanted side effect, such as an updated dependent artifact, from slipping into the environment, which is a real risk when OS patching is done separately. This model also forces the OS patches to get some real testing, which, in my experience, they almost never get.

...