Tip |
---|
This content originally lived on the Build a Virtualized 5-Node Hadoop 2.0 Cluster wiki page. It made more sense as a blog post, since changes in the ever-evolving HDP stack give it a short shelf life. |

This write-up captures the steps required to stand up a 5-node HDP2 (Hortonworks Data Platform) Hadoop 2.0/YARN cluster (2 master nodes and 3 worker nodes) running on CentOS 6.4, all executing within VirtualBox. Of course, you'll need a beefy host machine to run all 5 guest machines. To help me out, the good folks at Hortonworks outfitted me with a MacBook Pro that has a 2.3 GHz i7 processor, 16GB of RAM, and a 500GB SSD. Yes... I know... I'm lucky!
...
- In the VirtualBox UI:
  - Name the host 5N-HDP2-M1. More on the naming convention later.
  - Set the memory to 2GB and create a 20GB (dynamically allocated) hard drive.
  - For the Network options, set Adapter 1 to use NAT and set Adapter 2 to use the Host-only Adapter identified as vboxnet0, which will allow the host OS to access the CentOS VM. We'll move this second adapter from DHCP to static later.
- During the CentOS installation setup:
  - When it is time to select the Hostname, use "m1.hdp2" (again, more on the naming convention later).
  - On that same screen, click the Configure Network button to edit each of the two listed "System ethX" network connections and select the Connect automatically checkbox for both.
  - For simplicity's sake (i.e., we're going to end up with a lot of passwords!), just use "hadoop" for the root password.
  - Stick with the default "Minimal installation" type.
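
If you'd rather script the VirtualBox side than click through the UI, the settings above map fairly directly onto the `VBoxManage` command line. The snippet below is only a rough sketch, not what I actually ran: the helper name `build_vbox_commands`, the `RedHat_64` OS type, and the disk file name are my own assumptions, and `createmedium` is the subcommand name in newer VirtualBox releases (older ones call it `createhd`). It only builds and prints the commands, and it leaves out attaching the disk and the CentOS ISO to the VM.

```python
import shlex


def build_vbox_commands(vm_name="5N-HDP2-M1", memory_mb=2048,
                        disk_mb=20 * 1024, hostonly_if="vboxnet0",
                        disk_path=None):
    """Return VBoxManage argument lists mirroring the UI settings above.

    Nothing is executed here; the caller decides whether to run them.
    """
    # Hypothetical disk file name; put it wherever you keep your VMs.
    disk_path = disk_path or f"{vm_name}.vdi"
    return [
        # Create and register the VM (the OS type is an assumption for CentOS 6.4 64-bit).
        ["VBoxManage", "createvm", "--name", vm_name,
         "--ostype", "RedHat_64", "--register"],
        # 2GB of memory, Adapter 1 on NAT, Adapter 2 on the host-only network.
        ["VBoxManage", "modifyvm", vm_name,
         "--memory", str(memory_mb),
         "--nic1", "nat",
         "--nic2", "hostonly", "--hostonlyadapter2", hostonly_if],
        # 20GB disk; the "Standard" variant is dynamically allocated.
        ["VBoxManage", "createmedium", "disk", "--filename", disk_path,
         "--size", str(disk_mb), "--format", "VDI", "--variant", "Standard"],
    ]


if __name__ == "__main__":
    # Print the commands for review instead of running them.
    for cmd in build_vbox_commands():
        print(shlex.join(cmd))
```

To actually apply them, you could pass each list to `subprocess.run(cmd, check=True)` once you've confirmed the flags against your VirtualBox version.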
...