UPDATE: Feb 5, 2015

The component referred to as "XA Secure (Apache Argus)" has settled on its final open-source name and is now called Apache Ranger.  I've had some very positive hands-on experiences with Ranger since this blog post was written, and I am very enthusiastic about its place in the Hadoop security stack.  Check out the Apache project or Hortonworks product page for more information on Ranger.

There are plenty of folks who will tell you "Hadoop isn't secure!", but that's not the whole story.  Yes, as Merv Adrian points out, Hadoop in its most bare-bones mode does have a number of security concerns.  Fortunately, as with the rest of Hadoop, layers upon layers of software work in concert to provide security shields that address specific requirements.

Hadoop started out with little thought given to security.  There was a problem to be solved, and the "users" of the earliest incarnations of Hadoop all worked together, and all trusted each other.  Fortunately for all of us, especially those of us who made career bets on this technology, Hadoop acceptance and adoption have been growing by leaps and bounds, which only makes security that much more important.  One of the earliest approaches to Hadoop security was to simply "wall it off".  Yep, wrap it up with network security and only let a few trusted folks in.  Of course, then you needed to keep letting a few more folks in, and from that approach of network isolation came the ever-present edge node (aka gateway server, ingestion node, etc.) that almost every cluster employs today.  But wait... I'm getting ahead of myself.

...

The Apache Knox gateway provides a software layer intended to perform this perimeter security function.  It has a pluggable, provider-based mechanism for integrating a customer's existing AAA mechanisms.  Knox does not yet support every operation needed to completely replace the traditional edge node, but the project's roadmap addresses the missing functionality.  Its REST API extends the reach to different types of clients and eliminates the need to SSH to a fully configured "Hadoop client" to interact with the Hadoop cluster (or several clusters, as Knox can front more than one).
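
To make the REST access point concrete, here is a minimal sketch of how a client might address WebHDFS through Knox without any Hadoop client installed.  The host name, topology name, and credentials are hypothetical placeholders; port 8443 and the "gateway" path are the values Knox commonly ships with, but your deployment may differ.  HTTP Basic authentication is also an assumption here, since the actual mechanism depends on which authentication provider the gateway is configured with.

```python
import base64
import urllib.request
from urllib.parse import quote, urlencode


def knox_webhdfs_url(gateway_host, topology, hdfs_path, op,
                     gateway_port=8443, gateway_path="gateway"):
    """Build a WebHDFS URL that is routed through a Knox gateway."""
    # Each cluster (Knox calls it a topology) gets its own URL prefix,
    # which is how a single gateway can front several clusters.
    base = f"https://{gateway_host}:{gateway_port}/{gateway_path}/{topology}"
    return f"{base}/webhdfs/v1{quote(hdfs_path)}?{urlencode({'op': op})}"


def knox_request(url, username, password):
    """Prepare a GET request carrying HTTP Basic credentials (an assumption)."""
    raw = f"{username}:{password}".encode("utf-8")
    token = base64.b64encode(raw).decode("ascii")
    return urllib.request.Request(url, headers={"Authorization": f"Basic {token}"})


if __name__ == "__main__":
    # Hypothetical gateway host and topology name.
    url = knox_webhdfs_url("knox.example.com", "default", "/tmp", "LISTSTATUS")
    print(url)
    # Sending the request needs a reachable gateway and valid credentials:
    # with urllib.request.urlopen(knox_request(url, "guest", "secret")) as resp:
    #     print(resp.read().decode("utf-8"))
```

The point of the sketch is that the client only needs HTTPS and a URL; all of the cluster-specific configuration stays behind the gateway.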

What are people doing in this space?  Early adopters have already deployed Knox, but the majority of clusters still rely heavily on traditional edge nodes.  Almost all of the customers I work with are clearly interested, and I expect rapid adoption of this technology.

Where do I go for more info?  http://www.dummies.com/how-to/content/edge-nodes-in-hadoop-clusters.html and http://knox.apache.org/

...