building a virtualized 5-node HDP 2.0 cluster (all within a mac)

This blog posting's content was originally on the Build a Virtualized 5-Node Hadoop 2.0 Cluster wiki page, but it just made sense to refactor it into a blog posting given the short shelf life it has due to changes in the ever-evolving HDP stack.

This write-up is designed to capture the steps required to stand up a 5-node HDP2 (Hortonworks Data Platform) Hadoop 2.0/YARN cluster (with 2 master nodes & 3 worker nodes) running on CentOS 6.4 – all executing within VirtualBox.  Of course, you'll need a beefy host machine to run all five of these guest machines.  To help me out, the good folks at Hortonworks outfitted me with a MacBook Pro that has a 2.3 GHz i7 processor, 16GB of RAM, and a 500GB SSD.  Yes... I know... I'm lucky!

HDP 2.0.9.0 was utilized for this tutorial.  Documentation for the most current version of HDP is available at http://docs.hortonworks.com/.

When we are done, we will have a cluster like the visualization below.

This isn't a 15-minute effort for the faint of heart.  In keeping with the Earl of Wisdom to the right, which suggests that you bring some level of sys admin, networking, debugging, root-cause analysis, and just plain "hunting" skills before you even try this, this write-up will NOT present a series of mindless screenshots.

Additionally, I CAN GUARANTEE that there will be some bumps along the way that will require you to do some digging on your own.  You can raise concerns via the comments section at the bottom of the page and I'll try to help you unjam any problems you can't work through.

That said, let's get started!!

First Master Node

After you get VirtualBox set up, we need to get a base install of CentOS going that we can use as a baseline and then tweak for the other nodes.  As we'll be doing a "Minimal Install", we only need the Disk1 ISO, which you can get here.

Minimal Installation using DHCP

As mentioned above, some of this is an "explore as you go" exercise and I surely don't want to give you screenshots of every single step of setting up CentOS to run within a VirtualBox VM.  Fortunately, there are good examples out there in the wild and some quick Google searching yields good links.  For this exercise, I suggest taking a look at the write-up at http://webees.me/installing-centos-6-4-in-virtualbox-and-setting-up-host-only-networking/.  It is not perfect (and the author, like me, leaves some things out), but it should get you going.  The page at http://extr3metech.wordpress.com/2012/10/25/centos-6-3-installation-in-virtual-box-with-screenshots/ is also worth reviewing before starting this step.

Here are some specific items, above and beyond the basics identified in the earlier tutorials, that need to be addressed during the setup because of our need to build a multi-node Hadoop cluster.

  • In the VirtualBox UI (a VBoxManage command-line equivalent of these UI steps is sketched just after this list):
    • Name the host 5N-HDP2-M1.  More on the naming convention later.
    • Set the memory to 2GB and create a 20GB (dynamically allocated) hard drive.
    • For the Network options, set Adapter 1 to use NAT and set Adapter 2 to use the Host-only Adapter identified as vboxnet0 which will allow the host OS to access the CentOS VM.  We'll move this second adapter from DHCP to static later.
  • During the CentOS installation setup:
    • When it is time to select the Hostname, use "m1.hdp2" (again, more on the naming convention later).
    • On that same screen, click the Configure Network button to edit each of the two listed "System ethX" network connections and select the Connect automatically checkbox for both. 
    • For simplicity's sake (i.e. we're going to end up with a lot of passwords!) just use "hadoop" for the root password.
    • Stick with the default "Minimal installation" type. 
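
If you prefer the command line over clicking through the VirtualBox UI, a VBoxManage sketch along the following lines should produce an equivalent VM.  Treat it as a rough equivalent of the UI steps above rather than gospel: the disk and ISO file names/paths are assumptions, so point them at wherever your files actually live.

# create and register the VM shell
VBoxManage createvm --name "5N-HDP2-M1" --ostype RedHat_64 --register
# 2GB of memory, NAT on Adapter 1, host-only (vboxnet0) on Adapter 2
VBoxManage modifyvm "5N-HDP2-M1" --memory 2048 --nic1 nat --nic2 hostonly --hostonlyadapter2 vboxnet0
# 20GB dynamically allocated disk attached via a SATA controller
VBoxManage createhd --filename 5N-HDP2-M1.vdi --size 20480
VBoxManage storagectl "5N-HDP2-M1" --name SATA --add sata --controller IntelAHCI
VBoxManage storageattach "5N-HDP2-M1" --storagectl SATA --port 0 --device 0 --type hdd --medium 5N-HDP2-M1.vdi
# attach the CentOS minimal ISO so the first boot drops into the installer
VBoxManage storageattach "5N-HDP2-M1" --storagectl SATA --port 1 --device 0 --type dvddrive --medium CentOS-6.4-x86_64-minimal.iso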

If all goes well, then you'll end up with a CLI-only Linux box that you can now log into.

CentOS release 6.4 (Final)
Kernel 2.6.32-358.el6.x86_64 on an x86_64

m1 login: root
Password: 
[root@m1 ~]# _

At this point, we're using DHCP so run ifconfig to see what IP address is being used (my eth1 connection reported 192.168.56.102, but yours may vary).  Then use a terminal program from your host to make sure you can log into the newly created virtual machine.  And once there, ping an external server to verify it can reach back out.

hw10653:~ lmartin$ ssh root@192.168.56.102
root@192.168.56.102's password: 
Last login: Thu Jan 16 15:34:32 2014 from 192.168.56.1
[root@m1 ~]# 
[root@m1 ~]# ping www.google.com
PING www.google.com (74.125.227.113) 56(84) bytes of data.
64 bytes from dfw06s16-in-f17.1e100.net (74.125.227.113): icmp_seq=1 ttl=63 time=46.0 ms
64 bytes from dfw06s16-in-f17.1e100.net (74.125.227.113): icmp_seq=2 ttl=63 time=43.2 ms
^C
--- www.google.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1531ms
rtt min/avg/max/mdev = 43.230/44.615/46.001/1.401 ms
[root@m1 ~]# 

Configure Static IP

Edit /etc/sysconfig/network-scripts/ifcfg-eth1 to make the following changes:

  • Make sure ONBOOT is set to yes (it will be if you selected the Connect automatically checkbox during installation, as mentioned above)
  • Change NM_CONTROLLED from yes to no
  • Change BOOTPROTO from dhcp to none
  • Add the following lines
    • IPADDR=192.168.56.41
    • NETMASK=255.255.255.0
  • Remove the lines that start with the following
    • UUID
    • HWADDR

Edit /etc/sysconfig/network-scripts/ifcfg-eth0 and remove the line starting with UUID.  Run rm /etc/udev/rules.d/70-persistent-net.rules to flush out the existing network settings and then issue a shutdown -r now to reboot the machine.
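
Before rebooting, it is worth eyeballing the result of those edits.  The resulting ifcfg-eth1 should look roughly like the following (yours may carry a few additional lines left over from the installer, but the values we just set should match).

DEVICE=eth1
TYPE=Ethernet
ONBOOT=yes
NM_CONTROLLED=no
BOOTPROTO=none
IPADDR=192.168.56.41
NETMASK=255.255.255.0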

After the box has restarted, try to ping 192.168.56.41 (the virtual machine "m1" that we just built), ssh to it, and once there ping back to the host OS (vboxnet0 identifies it as 192.168.56.1) to verify it is routable in & out.  Also, verify that you can still ping addresses out in the wild.

hw10653:~ lmartin$ ping 192.168.56.41
PING 192.168.56.41 (192.168.56.41): 56 data bytes
64 bytes from 192.168.56.41: icmp_seq=0 ttl=64 time=0.297 ms
64 bytes from 192.168.56.41: icmp_seq=1 ttl=64 time=0.345 ms
^C
--- 192.168.56.41 ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 0.297/0.321/0.345/0.024 ms
hw10653:~ lmartin$ 
hw10653:~ lmartin$ ssh root@192.168.56.41
root@192.168.56.41's password: 
Last login: Sat Jan 18 03:28:27 2014 from 192.168.56.1
[root@m1 ~]# 
[root@m1 ~]# ping 192.168.56.1
PING 192.168.56.1 (192.168.56.1) 56(84) bytes of data.
64 bytes from 192.168.56.1: icmp_seq=1 ttl=64 time=0.201 ms
64 bytes from 192.168.56.1: icmp_seq=2 ttl=64 time=0.230 ms
^C
--- 192.168.56.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1428ms
rtt min/avg/max/mdev = 0.201/0.215/0.230/0.020 ms
[root@m1 ~]# 
[root@m1 ~]# ping www.google.com
PING www.google.com (74.125.227.115) 56(84) bytes of data.
64 bytes from dfw06s16-in-f19.1e100.net (74.125.227.115): icmp_seq=1 ttl=63 time=48.0 ms
64 bytes from dfw06s16-in-f19.1e100.net (74.125.227.115): icmp_seq=2 ttl=63 time=105 ms
^C
--- www.google.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1481ms
rtt min/avg/max/mdev = 48.053/76.825/105.598/28.773 ms
[root@m1 ~]# 

Hadoop Installation Prep

Before we create the additional nodes using this one as the base, there are some additional steps we should take now so we won't have to repeat them for each VM.  Later in this document we will utilize the Ambari documentation to set up our cluster, but there are some items within that documentation that are useful at this time.

SSH Setup

First, run yum install openssh-clients and then use ssh localhost to verify it is working.

Next, follow the instructions at http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.9.0/bk_using_Ambari_book/content/ambari-chap1-5-2.html to allow password-less SSH connections.  

[root@m1 ~]# ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa): 
Enter passphrase (empty for no passphrase): 
Enter same passphrase again: 
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
23:4a:e9:9f:c9:ea:91:1f:9b:bf:98:e2:16:af:0c:24 root@m1.hdp2
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|                 |
|                 |
|     .           |
|E . o . S        |
| o o.o . .       |
|  . =o.          |
|   oo=.O         |
|   +*+@.o.       |
+-----------------+
[root@m1 ~]# cd .ssh
[root@m1 .ssh]# ls -l
total 12
-rw-------. 1 root root 1675 Jan 18 04:15 id_rsa
-rw-r--r--. 1 root root  394 Jan 18 04:15 id_rsa.pub
-rw-r--r--. 1 root root  391 Jan 18 04:09 known_hosts
[root@m1 .ssh]# 
[root@m1 .ssh]# cat id_rsa.pub >> authorized_keys
[root@m1 .ssh]# ls -l
total 16
-rw-r--r--. 1 root root  394 Jan 18 04:17 authorized_keys
-rw-------. 1 root root 1675 Jan 18 04:15 id_rsa
-rw-r--r--. 1 root root  394 Jan 18 04:15 id_rsa.pub
-rw-r--r--. 1 root root  391 Jan 18 04:09 known_hosts
[root@m1 .ssh]# 
[root@m1 .ssh]# chmod 600 authorized_keys
[root@m1 .ssh]# ls -l
total 16
-rw-------. 1 root root  394 Jan 18 04:17 authorized_keys
-rw-------. 1 root root 1675 Jan 18 04:15 id_rsa
-rw-r--r--. 1 root root  394 Jan 18 04:15 id_rsa.pub
-rw-r--r--. 1 root root  391 Jan 18 04:09 known_hosts
[root@m1 .ssh]# 
[root@m1 .ssh]# ssh localhost
Last login: Sat Jan 18 04:09:59 2014 from localhost
[root@m1 ~]# 

Lastly, create a file named config in /root/.ssh that contains StrictHostKeyChecking no as the only line.

[root@m1 ~]# cd /root/.ssh
[root@m1 .ssh]# echo 'StrictHostKeyChecking no' >> config
[root@m1 .ssh]# cat config
StrictHostKeyChecking no
[root@m1 .ssh]# 

Disable Key Security Options

Security-Enhanced Linux (SELinux) is a "good thing" and I'm not recommending you skip it in your "real" environments, but for our purposes let's just disable it.  To do this, simply edit /etc/selinux/config and change the SELINUX value from enforcing to disabled.
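
If you don't feel like firing up vi for a one-line change, a sed one-liner like the following makes the same edit (assuming the file still has the stock SELINUX=enforcing line).

sed -i 's/^SELINUX=enforcing$/SELINUX=disabled/' /etc/selinux/config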

The Linux firewall, iptables, is also a "good thing", but let's make our environment easier to install into by disabling it.  Run the following commands.

[root@m1 ~]# chkconfig iptables off
[root@m1 ~]# chkconfig ip6tables off

Reboot and make sure you see the following output, which confirms iptables is not running at system startup.

[root@m1 ~]# service iptables status
iptables: Firewall is not running.
[root@m1 ~]# 
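
Since this reboot is also the one that picks up the SELinux change from the previous step, it's a good time to confirm that change took as well.

getenforce        # should now report: Disabled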

Setup NTP Service

The clocks on all nodes in the cluster will need to be in sync and the NTP service will take care of this for us.  Execute yum install ntp to install this service.  Then type the following commands.

[root@m1 ~]# chkconfig ntpd on
[root@m1 ~]# service ntpd start
Starting ntpd:                                             [  OK  ]
[root@m1 ~]# 
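
As an optional sanity check, give ntpd a minute or two and then run the command below; it should list a handful of pool servers the daemon is syncing against (the stock CentOS ntp.conf points at the public centos.pool.ntp.org servers, so this also exercises the VM's outbound connectivity).

ntpq -p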

Flush Networking Rules

Perform the following as the final step in our preparation of a base VM that will be used to create the other nodes; it is one last flush of the networking rules before cloning.

[root@m1 ~]# cd /etc/udev/rules.d
[root@m1 rules.d]# rm 70-persistent-net.rules 
rm: remove regular file `70-persistent-net.rules'? y
[root@m1 rules.d]# 

Pre-Configure Hosts

When finished we will have the following 5 hosts in our cluster.

VirtualBox Name    OS Hostname    IP Address
5N-HDP2-M1         m1.hdp2        192.168.56.41
5N-HDP2-M2         m2.hdp2        192.168.56.42
5N-HDP2-W1         w1.hdp2        192.168.56.51
5N-HDP2-W2         w2.hdp2        192.168.56.52
5N-HDP2-W3         w3.hdp2        192.168.56.53

Update Hosts File

Since we will not be relying on DNS for name resolution, we need to add the lines below (as described in http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.9.0/bk_using_Ambari_book/content/ambari-chap1-5-5.html) to /etc/hosts.

192.168.56.41    m1.hdp2
192.168.56.42    m2.hdp2
192.168.56.51    w1.hdp2
192.168.56.52    w2.hdp2
192.168.56.53    w3.hdp2

Obviously, we've only created the first of these servers so far, but we'll build out the rest of them shortly.
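
Once all five VMs actually exist (i.e. after you have worked through the Remaining Nodes section below), a quick loop like this, run from any node, is a handy way to confirm that every entry in /etc/hosts resolves and answers.

for h in m1 m2 w1 w2 w3; do ping -c 1 $h.hdp2; done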

Set the Hostname

Run the following commands and make sure the output is as shown.

[root@m1 etc]# hostname m1.hdp2
[root@m1 etc]# hostname -f
m1.hdp2
[root@m1 etc]# 
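
The installer should have already written the hostname you chose to /etc/sysconfig/network, which is what makes it stick across reboots.  If hostname -f ever comes back with something other than m1.hdp2 after a restart, that file is the first place to look; on CentOS 6 it should contain something like the following.

NETWORKING=yes
HOSTNAME=m1.hdp2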

Create Node Appliance

Now that we have a solid baseline, let's use VirtualBox to create an "appliance" that can be used as an import when standing up the subsequent nodes.  Select Export Appliance... from the File pulldown and then highlight the 5N-HDP2-M1 VM before clicking on Continue.  Change the filename of the export to be 5N-HDP2-template.ova (stick with the default OVF 1.0 format) and click on Continue (make note of the directory this file will be created in).  In the Appliance settings screen, change the Name to be 5N-HDP2-template and click on the Export button.
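
If you'd rather skip the wizard, the command-line equivalent should be roughly the following, run from the host Mac (the output path is simply wherever you want the .ova to land).

VBoxManage export 5N-HDP2-M1 --output 5N-HDP2-template.ova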

Remaining Nodes

For the second master node and all three of the worker nodes, repeat the steps in this section.

NOTE: All examples will show the VirtualBox Name, OS Hostname, and IP Address of the second master node.  For the worker nodes, replace them with the appropriate values as detailed in the Pre-Configure Hosts section earlier in this document.

Import Template Appliance

From the VirtualBox File pulldown, select Import Appliance..., choose the 5N-HDP2-template.ova created previously, and click the Continue button.  On the Appliance settings screen, edit the Name field to represent the appropriate VirtualBox Name (example: 5N-HDP2-M2).  Select the Reinitialize the MAC address of all network cards checkbox and click on Import.

Modify Node-Specific Settings

Make sure the 5N-HDP2-M1 VM is not running before starting up each newly imported machine, so as to avoid having two machines bound to the same static IP address of 192.168.56.41.

Power up the new VM, log into it locally (root password is still 'hadoop'), and then make the following changes to configure the appropriate hostname and IP address for each additional node.

  • Modify HWADDR in /etc/sysconfig/network-scripts/ifcfg-eth0 to match the MAC Address that is set for the Adapter 1 interface, as shown in the example below.

  • Modify IPADDR in /etc/sysconfig/network-scripts/ifcfg-eth1 to be the appropriate IP address (example: 192.168.56.42).
  • Modify HOSTNAME in /etc/sysconfig/network to be the correct fully-qualified hostname (example: m2.hdp2).
  • Set the hostname as described in the earlier 'Set the Hostname' section (example: run hostname m2.hdp2 and verify by typing hostname -f to see it displayed).
  • Delete the /etc/udev/rules.d/70-persistent-net.rules file.
  • Execute shutdown -r now to reboot the VM.
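
If you'd rather script those tweaks than edit the files by hand, a rough sketch like the following covers everything except the HWADDR change, which still has to be copied from the VirtualBox UI (the values shown are for the second master node; swap in the right IP address and hostname for each worker).

NEWIP=192.168.56.42
NEWHOST=m2.hdp2
sed -i "s/^IPADDR=.*/IPADDR=$NEWIP/" /etc/sysconfig/network-scripts/ifcfg-eth1
sed -i "s/^HOSTNAME=.*/HOSTNAME=$NEWHOST/" /etc/sysconfig/network
hostname $NEWHOST
rm -f /etc/udev/rules.d/70-persistent-net.rules
shutdown -r now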

Ensure Connectivity Between Nodes

The 5N-HDP2-M1 VM can now be brought back up for connectivity testing, as there will no longer be a collision on the 192.168.56.41 address.

Make an ssh connection to the new cluster node from the host OS and, once there, ping the first master node and any other operational cluster nodes, as well as an internet address.  Then verify that the correct hostname is assigned to the newly created node.

hw10653:~ lmartin$ ssh root@192.168.56.42
root@192.168.56.42's password: 
Last login: Sat Jan 18 22:21:24 2014 from 192.168.56.1
[root@m2 ~]# 
[root@m2 ~]# ping 192.168.56.1
PING 192.168.56.1 (192.168.56.1) 56(84) bytes of data.
64 bytes from 192.168.56.1: icmp_seq=1 ttl=64 time=0.176 ms
64 bytes from 192.168.56.1: icmp_seq=2 ttl=64 time=0.249 ms
^C
--- 192.168.56.1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1440ms
rtt min/avg/max/mdev = 0.176/0.212/0.249/0.039 ms
[root@m2 ~]# 
[root@m2 ~]# ping m1.hdp2
PING m1.hdp2 (192.168.56.41) 56(84) bytes of data.
64 bytes from m1.hdp2 (192.168.56.41): icmp_seq=1 ttl=64 time=1.16 ms
64 bytes from m1.hdp2 (192.168.56.41): icmp_seq=2 ttl=64 time=0.394 ms
^C
--- m1.hdp2 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1431ms
rtt min/avg/max/mdev = 0.394/0.777/1.160/0.383 ms
[root@m2 ~]# 
[root@m2 ~]# ping www.google.com
PING www.google.com (173.194.115.49) 56(84) bytes of data.
64 bytes from dfw06s40-in-f17.1e100.net (173.194.115.49): icmp_seq=1 ttl=63 time=54.8 ms
64 bytes from dfw06s40-in-f17.1e100.net (173.194.115.49): icmp_seq=2 ttl=63 time=144 ms
^C
--- www.google.com ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1517ms
rtt min/avg/max/mdev = 54.879/99.851/144.824/44.973 ms
[root@m2 ~]# 
[root@m2 ~]# hostname -f
m2.hdp2
[root@m2 ~]# 

Now verify the password-less SSH connectivity is working between nodes.

hw10653:~ lmartin$ ssh root@192.168.56.42
root@192.168.56.42's password: 
Last login: Sat Jan 18 22:21:34 2014 from 192.168.56.1
[root@m2 ~]# 
[root@m2 ~]# ssh m1.hdp2
Last login: Sat Jan 18 22:09:13 2014 from m2.hdp2
[root@m1 ~]# 
[root@m1 ~]# ssh m2.hdp2
Last login: Sat Jan 18 22:32:01 2014 from 192.168.56.1
[root@m2 ~]# 

Rinse, Lather, and Repeat...

Go back to the beginning of this Remaining Nodes section and repeat all of the steps for each cluster node that still needs to be created, configured, and tested.

Install Cluster via Ambari

Now it is time to get down to the business at hand: setting up a Hadoop cluster.  We'll make it easy by using the Ambari installer.  The instructions are at http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.9.0/bk_using_Ambari_book/content/ambari-chap1-1.html.  Fortunately, we did much of the "prep" work already.

Perform all the CLI operations on m1.hdp2, as this is the box where we will be running Ambari.  After you run yum install wget, you can pick up at http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.9.0/bk_using_Ambari_book/content/ambari-chap2-1.html and drop the bits someplace like /tmp.  Just keep rolling; the instructions are rock-solid.  For our simple case, take all the defaults and skip the optional steps.
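
For reference, the commands in that chapter follow roughly this shape; the exact repo URL changes with the Ambari release, so copy it from the documentation page above rather than from here (the <version> placeholder is deliberately left unfilled).

cd /tmp
wget http://public-repo-1.hortonworks.com/ambari/centos6/1.x/updates/<version>/ambari.repo
cp ambari.repo /etc/yum.repos.d
yum install ambari-server
ambari-server setup     # take the defaults
ambari-server start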

Don't stop ambari-server as shown in http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.9.0/bk_using_Ambari_book/content/ambari-chap2-3.html, as we're ready to start up the Ambari UI, which should now be available at http://192.168.56.41:8080.  As http://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.0.9.0/bk_using_Ambari_book/content/ambari-chap3-1_2x.html states, the username/password is admin/admin.
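
One small practical note before the wizard: the SSH Private Key that the Install Options screen will ask for in a moment is just the contents of /root/.ssh/id_rsa on m1, which you can dump for copy/paste with the command below.

cat /root/.ssh/id_rsa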

When asked, name the cluster 'hdp2' and continue to follow the documentation.  On the Install Options screen, paste the contents of /root/.ssh/id_rsa, including the "BEGIN" and "END" lines, into the SSH Private Key textbox so the screen looks like the following.

After you click on the Register and Confirm button you will see the following Warning screen.

Just click OK.  After all hosts have been registered, uncheck Nagios, Ganglia, and HBase on the Choose Services page, and clear the Limited Functionality Warning message that surfaces.

The Assign Masters screen defaulted to deploying the NameNode, SNameNode, and ResourceManager as visualized at the beginning of this document.  If yours looks different than the layout below, then adjust it to fit this approach.

Configure the Assign Slaves and Clients screen to look like the following.

To clear the error messages on the Customize Services screen, select the Hive tab and type in "hadoop" (twice) for the Default Password field.  Then do the exact same for the Oozie tab.  Take the default options on everything else and press OK.  Then press Deploy on the Review page (again, I hope you're following along in the Hortonworks docs).  Now let the Install, Start and Test screen do its magic.

BTW, there is a LOT to be done so it will take some time.  In fact, some time-outs may occur and surface as failures, like the ones in the screenshot below that I received.

Hit the Retry button!  I was lucky; I only had to do that once before getting a screen full of green bars, but it did take a looooooooooooooong time to finish and I heard the fans on my MacBook Pro spinning up quite loudly.

Lastly, we want the base Ambari daemons to kick up at machine startup time, so do the following.

  • On m1.hdp2 (i.e. where Ambari itself runs), type chkconfig ambari-server on.
  • On all machines, including m1.hdp2, type chkconfig ambari-agent on.
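
Since password-less SSH is already wired up between the nodes, an optional shortcut is to take care of the agents everywhere from a single shell on m1.hdp2.

for h in m1 m2 w1 w2 w3; do ssh $h.hdp2 'chkconfig ambari-agent on'; done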

There you have it... as http://192.168.56.41:8080 shows, you have a fully functional cluster with 2 master nodes and 3 worker nodes!!