Tip

This blog posting's content was originally on the Build a Virtualized 5-Node Hadoop 2.0 Cluster wiki page, but it made sense to refactor it into a blog posting given its short shelf-life in the ever-evolving HDP stack.

This write-up captures the steps required to stand up a 5-node HDP2 (Hortonworks Data Platform) Hadoop 2.0/YARN cluster (2 master nodes and 3 worker nodes) running on CentOS 6.4, all executing within VirtualBox.  Of course, you'll need a beefy host machine to run all five guest machines.  To help me out, the good folks at Hortonworks outfitted me with a MacBook Pro that has a 2.3 GHz i7 processor, 16GB of RAM, and a 500GB SSD.  Yes... I know... I'm lucky!

...

  • In the VirtualBox UI:
    • Name the host 5N-HDP2-M1 (more on the naming convention later).
    • Set the memory to 2GB and create a 20GB (dynamically allocated) hard drive.
    • For the Network options, set Adapter 1 to use NAT and set Adapter 2 to use the Host-only Adapter identified as vboxnet0, which will allow the host OS to access the CentOS VM.  We'll move this second adapter from DHCP to a static IP address later.
  • During the CentOS installation setup:
    • When it is time to select the Hostname, use "m1.hdp2" (again, more on the naming convention later).
    • On that same screen, click the Configure Network button, edit each of the two listed "System ethX" network connections, and select the Connect automatically checkbox for both.
    • For simplicity's sake (i.e., we're going to end up with a lot of passwords!), just use "hadoop" for the root password.
    • Stick with the default "Minimal installation" type. 
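If you'd rather script the VirtualBox UI steps above, they can be approximated with VBoxManage. The sketch below is a small Python wrapper that builds the commands and only prints them unless dry_run is turned off. The VM name, 2GB of memory, 20GB dynamically allocated disk, NAT on Adapter 1, and host-only vboxnet0 on Adapter 2 come from the steps above; the RedHat_64 OS type, the .vdi file name, and the "SATA" controller name are assumptions for this sketch, and attaching the CentOS install ISO is left out entirely.

```python
import shlex
import subprocess

# Values from the walkthrough: VM name, 2GB RAM, 20GB disk, vboxnet0.
VM_NAME = "5N-HDP2-M1"
MEMORY_MB = 2048          # 2GB of RAM
DISK_MB = 20 * 1024       # 20GB disk; VBoxManage takes sizes in MB
HOSTONLY_IF = "vboxnet0"  # host-only network used for Adapter 2


def vboxmanage_commands(name=VM_NAME):
    """Return the VBoxManage invocations, as argument lists, for one VM."""
    # Assumption: hypothetical disk file name; pass an absolute path
    # if you want to control exactly where the image is created.
    disk = f"{name}.vdi"
    return [
        # Assumption: RedHat_64 as the guest OS type for CentOS 6.4.
        ["VBoxManage", "createvm", "--name", name,
         "--ostype", "RedHat_64", "--register"],
        # Adapter 1 uses NAT; Adapter 2 is host-only on vboxnet0.
        ["VBoxManage", "modifyvm", name, "--memory", str(MEMORY_MB),
         "--nic1", "nat", "--nic2", "hostonly",
         "--hostonlyadapter2", HOSTONLY_IF],
        # "Standard" is VirtualBox's dynamically allocated disk variant.
        ["VBoxManage", "createhd", "--filename", disk,
         "--size", str(DISK_MB), "--variant", "Standard"],
        # Assumption: a SATA controller named "SATA" holds the disk.
        ["VBoxManage", "storagectl", name, "--name", "SATA",
         "--add", "sata", "--controller", "IntelAhci"],
        ["VBoxManage", "storageattach", name, "--storagectl", "SATA",
         "--port", "0", "--device", "0", "--type", "hdd",
         "--medium", disk],
    ]


def run(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run(vboxmanage_commands())
```

Keeping the commands as argument lists (rather than one shell string) avoids quoting problems, and the dry-run default lets you review everything before VirtualBox is touched. Calling vboxmanage_commands("5N-HDP2-M2") and so on would produce the same shape for the other nodes.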
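Once CentOS is installed, it can be worth confirming that the installer choices actually stuck. On CentOS 6, the Connect automatically checkbox shows up as ONBOOT=yes in each /etc/sysconfig/network-scripts/ifcfg-ethX file, and the chosen hostname is stored as HOSTNAME in /etc/sysconfig/network. The Python sketch below assumes those standard locations; the function names are hypothetical helpers for illustration, and the parser only handles simple KEY=value lines, not full shell syntax.

```python
from pathlib import Path

# Assumption: standard CentOS 6 locations for these settings.
NETWORK_FILE = Path("/etc/sysconfig/network")
IFCFG_DIR = Path("/etc/sysconfig/network-scripts")


def parse_sysconfig(text):
    """Parse simple KEY=value lines, skipping blanks/comments, dropping quotes."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def check_install(network_text, ifcfg_texts, hostname="m1.hdp2"):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    actual = parse_sysconfig(network_text).get("HOSTNAME")
    if actual != hostname:
        problems.append(f"HOSTNAME is {actual!r}, expected {hostname!r}")
    # "Connect automatically" corresponds to ONBOOT=yes for each interface.
    for name, text in sorted(ifcfg_texts.items()):
        onboot = parse_sysconfig(text).get("ONBOOT", "no").lower()
        if onboot != "yes":
            problems.append(f"{name}: ONBOOT={onboot}, expected yes")
    return problems


if __name__ == "__main__":
    if not NETWORK_FILE.exists():
        print("No /etc/sysconfig/network here; nothing to check.")
    else:
        ifcfgs = {p.name: p.read_text() for p in IFCFG_DIR.glob("ifcfg-eth*")}
        problems = check_install(NETWORK_FILE.read_text(), ifcfgs)
        for problem in problems:
            print(problem)
        if not problems:
            print("All checks passed.")
```

If an interface reports ONBOOT=no, the connection was not set to connect automatically and that adapter will stay down after a reboot until the setting is changed.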

...