
This is a sister post to installing hdp 2.2 with ambari 2.0 (moving to the azure cloud), but this time using AWS' EC2 IaaS servers.  Just as Azure has options such as HDInsight, AWS offers EMR as an easy-to-deploy Hadoop option.  This post is focused on using another cloud provider's IaaS offering to help someone who is planning on deploying the full HDP stack (in the cloud or on-prem), so I'll stick to AWS' EC2 offering.

Like with the Azure cluster build-out, I'm going to violate my own advice from a robust set of hadoop master nodes (it is hard to swing it with two machines) and create a cluster across three nodes with all three being masters AND workers.  I repeat, this approach is NOT what I'd recommend for any "real" cluster.  I am mainly thinking of the costs I'm about to incur, and this setup will still provide much, much more storage & compute capacity than I could ever get running a cluster of VirtualBox VMs on my mac.  That out of the way, let's get rolling!

First up is to log into https://aws.amazon.com/.  If you don't have an account already, check to see if Amazon is offering some free services (hey, nothing wrong with getting something for free!).  Once logged in, go to the EC2 Dashboard; one way to get there from the main AWS page is to select Services >> EC2.  You should see something like the following.

Let's run some CentOS 6.x servers since the HDP 2.2 supported OS list does not cover CentOS 7.  To get started, click on the Launch Instance button above.  Then select AWS Marketplace, type "CentOS 6" in the search box and press <Enter>.  You can then click the Select button as shown below.

For my cluster, I went with the "d2.xlarge" instance type as it'll give me a few disks per node as well as enough CPU cores and memory to set up a simple cluster.

Be sure to click on Next: Configure Instance Details, and NOT on Review and Launch, so we can set up the disks the way we need them to be.  On the Step 3 screen, I entered "3" in the Number of instances text box (i.e. I want 3 servers) and clicked on Next: Add Storage.  On the Step 4 screen, as shown below, toggle the three Type column pulldowns from "Instance Store N" to "EBS", enter "1024" for the Size column, choose "SSD" for the Volume Type and select the Delete on Termination checkbox for all four storage devices before clicking on Next: Tag Instance.

BTW, yes, 8GB is woefully small for a root file system on a host running HDP, but it'll do for the exercise of installing HDP.  Visit these instructions for how to change it after creation.  On the Step 5 screen, just select Next: Configure Security Group.  On Step 6, select the "Create a new security group" radio button next to the Assign a security group label and enter something like "ec2testSecurityGroup" as the Security group name before ignoring the IP address warning and clicking on Review and Launch.
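For anyone who would rather script the storage settings than click through the console, here is a minimal sketch of the three 1TB data volumes expressed as the JSON document that the AWS CLI accepts for block device mappings.  The helper name ebs_mappings and the device names /dev/sdb through /dev/sdd are my assumptions for illustration (the console may assign different names), and "gp2" is the general purpose SSD volume type.

```shell
# Sketch only: emit a block-device-mappings JSON document for EBS data volumes.
# Usage: ebs_mappings SIZE_GB VOLUME_TYPE LETTER...
# ebs_mappings is a hypothetical helper; /dev/sdX device names are assumptions.
ebs_mappings() {
  size_gb=$1
  vol_type=$2
  shift 2
  sep=""
  printf '['
  for letter in "$@"; do
    # One mapping per volume, flagged to be deleted when the instance terminates
    printf '%s{"DeviceName":"/dev/sd%s","Ebs":{"VolumeSize":%d,"VolumeType":"%s","DeleteOnTermination":true}}' \
      "$sep" "$letter" "$size_gb" "$vol_type"
    sep=","
  done
  printf ']\n'
}

# Three 1024 GB SSD volumes, matching the Step 4 settings described above
ebs_mappings 1024 gp2 b c d
```

The output could then be handed to aws ec2 run-instances through its --block-device-mappings option, alongside --count 3 and --instance-type d2.xlarge.  The AMI ID, key pair and security group would still need to be supplied for your own account.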

After you perform a quick review on Step 7, click on the Launch button at the bottom of this screen.  You'll then get prompted with the following lightbox where you can use a .pem file that you created before or build a new one.  I chose to create a new one called "ec2test".

As indicated above, be sure to click on Download Key Pair and save this file as you'll need it later to SSH to the boxes.  Then you can click on Launch Instances.  You will then get some help finding these instances, but to find them later just use the Services >> EC2 navigation approach presented earlier.  You should now see a 3 Running Instances link near the top of the page and clicking on that should show you something like the following (I went ahead and added some friendly identifiers in the Name column, too).

Now we should log into the boxes and make sure they are running.  On the same screen as shown above, select the first instance ("ec2node2.hdp22" in my example) and click on the Connect button, which will pop up another lightbox telling you how to connect.  Here's that in action on my mac.

HW10653:Amazon lmartin$ ls -l
total 8
-rw-r-----@ 1 lmartin  staff  1692 Jun 29 23:46 ec2test.pem
HW10653:Amazon lmartin$ chmod 400 ec2test.pem
HW10653:Amazon lmartin$ ssh -i ec2test.pem root@52.27.108.113
[root@ip-172-31-7-169 ~]# whoami
root
[root@ip-172-31-7-169 ~]# hostname
ip-172-31-7-169
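If you'd rather not repeat that by hand for each box, a small dry-run loop can print the same check for every node.  This is just a sketch: check_cmd is a hypothetical helper, and only the first address comes from the session above.  The other two are placeholders from the documentation address range, so swap in the public IPs shown on your own Instances screen.

```shell
# Sketch only: print the ssh command that would run `hostname` on a node.
# Usage: check_cmd KEY_FILE HOST   (check_cmd is a hypothetical helper)
check_cmd() {
  printf 'ssh -i %s root@%s hostname\n' "$1" "$2"
}

# 52.27.108.113 is from the session above; the 203.0.113.x addresses are
# placeholders, not real cluster nodes.
for host in 52.27.108.113 203.0.113.11 203.0.113.12; do
  check_cmd ec2test.pem "$host"   # dry run: prints the command only
done
```

Running each printed line should return that node's internal hostname, like the ip-172-31-7-169 value seen above.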

After you check all three boxes, it is time to set up the three 1TB drives we identified earlier for each node.  Good sources of information on this process are here and here.  The following are the play-by-play steps based on the first of the two provided links.  Specifically, the To make a volume available steps from the first link and then Step 7 from the second link.
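As a rough illustration of what making an EBS volume available usually involves (a sketch under assumptions, not the exact listing from the links), the helper below prints the commands for one volume without running them.  The device names /dev/xvdf through /dev/xvdh and the /grid/N mount points are assumptions, so check the lsblk output on your own node before running anything, since mkfs wipes whatever is on the device.

```shell
# Sketch only: print (not run) the commands to prepare one EBS volume.
# Usage: plan_volume DEVICE MOUNT_POINT   (plan_volume is a hypothetical helper)
plan_volume() {
  dev=$1
  mnt=$2
  printf 'file -s %s\n' "$dev"        # output of just "data" means no file system yet
  printf 'mkfs -t ext4 %s\n' "$dev"   # destroys any existing data on the device
  printf 'mkdir -p %s\n' "$mnt"
  printf 'mount %s %s\n' "$dev" "$mnt"
  # Persist the mount across reboots; nofail lets the box boot if the volume is missing
  printf 'echo "%s %s ext4 defaults,nofail 0 2" >> /etc/fstab\n' "$dev" "$mnt"
}

echo 'lsblk'   # first, list the block devices to find the real device names
# Assumed device names and mount points; adjust to match the lsblk output
plan_volume /dev/xvdf /grid/1
plan_volume /dev/xvdg /grid/2
plan_volume /dev/xvdh /grid/3
```

Printing the plan first makes it easy to eyeball the device names on each of the three nodes before executing the commands as root.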
