
...

Despite my staunch recommendation of a robust set of Hadoop master nodes with strong node stereotypes (it is hard to swing that with only two machines), I've decided to start this cluster out with only three nodes total.  Yes, I'm going to make them all masters AND workers.  If it wasn't clear... that's NOT what I'd recommend for any "real" cluster.  I'm mainly thinking of the costs I'm about to incur, and even so this will still be much, much more than I usually run in my VirtualBox VMs on my Mac.

So... let's get started!

[Screenshot: serverOptions.jpeg]

As you see above, I decided to use the OpenLogic CentOS 6.6 VM image that Azure has.  I identified the hostname as hdp22-node1 and I'm betting you can guess what the other two names will be.  I then let it get provisioned and requested for it to be started.  From the next screenshot, you can find the domain name for this server.

[Screenshot: sshConnectionPortDetail.png]

You can also find the specific port needed to make an SSH connection, which let me connect from my Mac as shown below.
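For reference, the connection attempt looks something like the sketch below.  The public port (57832) and the azureuser account are made-up placeholders — read the real values off your VM's endpoint details, since Azure maps a random public port to the VM's port 22.

```shell
# Placeholders: 57832 and azureuser are hypothetical; substitute the values
# shown in your VM's endpoint details in the Azure portal.
# ssh -p 57832 azureuser@hdp22-node1.cloudapp.net

# Dry-run check: -G prints the options ssh would use without connecting,
# handy for confirming the port and user before you actually log in.
ssh -G -p 57832 azureuser@hdp22-node1.cloudapp.net | grep -E '^(user|port) '
```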

...

Now it is time to pull up the Ambari UI, which for me is at a link like http://hdp22-node1.cloudapp.net:8080 — but you should NOT be able to access it yet.  This is because we need to create an Azure "endpoint" to allow this traffic through.  Back in the Azure Portal, click into your Ambari server, then All settings and Endpoints.

[Screenshot: findEndpoints]

Then Add a new one as shown below and click OK to save it.

[Screenshot: addEndpoint]

Now you should be able to get in!!

[Screenshot: Ambari splash page]

Now you can resume with the Launching the Ambari Install Wizard step, which should be the next one in the instructions after the bullet list items from above.  Remember to use the "internal" FQDNs you got from the hostname -f output as well as the private key data from the /root/.ssh/id_rsa file you created earlier.
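Gathering those two pieces of input for the wizard is quick; here is a small convenience sketch (the key path matches the one created earlier in the walkthrough):

```shell
# Run on each node: prints the "internal" FQDN to list on the wizard's
# Install Options page.
hostname -f

# Run on the Ambari server: the private key contents to paste into the wizard.
# Guarded so this line is a harmless no-op anywhere the key doesn't exist.
[ ! -r /root/.ssh/id_rsa ] || cat /root/.ssh/id_rsa
```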

[Screenshot: hosts]

I did get hit with a warning about the ntpd service not running, but I ignored it since these Azure-hosted VMs clearly look to be in lock-step on time sync.  I was also warned that Transparent Huge Pages (THP) was enabled, and I ignored that too because, well, frankly this is still just a test cluster and I needed to keep moving forward with my install (wink).  On the Choose Services wizard step, I deselected HBase as it isn't in my upcoming use cases and these VMs are already going to be taxed trying to be both masters and workers.  I spread out the master servers as shown below.
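If you would rather clear those two warnings than ignore them, the CentOS 6-era fixes are short.  The service/chkconfig and THP lines are left commented since they need root on the actual nodes; the last command just inspects the current THP state:

```shell
# Start ntpd now and enable it on boot (run as root on each node):
# service ntpd start && chkconfig ntpd on

# Disable THP until the next reboot (run as root on each node):
# echo never > /sys/kernel/mm/transparent_hugepage/enabled

# Inspect the current THP setting; the bracketed word is the active mode.
cat /sys/kernel/mm/transparent_hugepage/enabled 2>/dev/null \
  || echo "THP interface not present on this kernel"
```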

[Screenshot: allMasters2]

With the (again, not recommended!) strategy of making each node both a master and a worker, I simply checked all boxes on the Assign Slaves and Clients screen.  As usual, there are some things that have to be addressed on the Customize Services screen.  On the Hive, Oozie, and Knox tabs you are required to select a password for each of these components.  There are some other changes that need to be made as well.  For those properties that can (and should!) support multiple directories, Ambari tries to help out.  In most cases it adds the desired /grid/[1-4] mount points, but it usually also brings in the "special" Azure filesystem, too.  The following table identifies the properties that need some attention prior to moving forward.

...
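As a concrete (hypothetical) illustration of that cleanup: stripping the extra Azure mount out of a comma-separated directories property looks like the sketch below.  The /mnt/resource path is an assumption standing in for whatever "special" filesystem shows up in your list — verify against your own mounts before editing anything.

```shell
# Hypothetical pre-filled value; /mnt/resource stands in for the extra Azure
# filesystem Ambari tends to pick up alongside the /grid mounts.
dirs="/grid/1/hadoop/hdfs/data,/grid/2/hadoop/hdfs/data,/mnt/resource/hadoop/hdfs/data"

# Keep only the /grid mount points, rejoined comma-separated — the value you
# would paste back into the Ambari property.
echo "$dirs" | tr ',' '\n' | grep '^/grid/' | paste -sd, -
```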

When I finally moved past this screen, I got the following warnings (ignore the NM one, as I backed that change out later), which basically tell us to let the NN and NM processes have more memory.  We just don't have it, and that should not be a problem for the kinds of limited workloads I'll be running on this cluster.

[Screenshot: memoryWarnings.jpg]

After a somewhat longer install process than I was expecting, I found out that only the DataNode running on the same node as the NameNode was actually functioning correctly.  After some searching, I realized this is likely due to Azure's use of DHCP, and I found this StackOverflow article that helped.  Using Ambari (i.e., Services > HDFS > Config > Custom hdfs-site > Add Property...), I was able to add the following KVP.

...

/etc/ambari-agent/conf/ambari-agent.ini:

[server]
hostname=REMOVE_THE_IP_AND_REPLACE_WITH_FQDN

After making that change in all three nodes' config files and issuing the ambari-agent restart command on each, Ambari could then communicate with the hosts and successfully started up the cluster. 
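A scripted version of that edit might look like the sketch below — demonstrated on a scratch copy so it is safe to dry-run anywhere.  On the real nodes the file is /etc/ambari-agent/conf/ambari-agent.ini, the IP shown is a made-up stand-in, and you would follow up with ambari-agent restart:

```shell
# Scratch copy standing in for /etc/ambari-agent/conf/ambari-agent.ini;
# 10.11.12.13 is a made-up stand-in for the IP the installer wrote there.
ini=$(mktemp)
printf '[server]\nhostname=10.11.12.13\n' > "$ini"

# Swap the IP for this node's FQDN (the same edit described above).
sed -i "s/^hostname=.*/hostname=$(hostname -f)/" "$ini"

# Show the result, then clean up the scratch file.
grep '^hostname=' "$ini"
rm -f "$ini"
```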

[Screenshot: fullClusterRunning.png]

I am a bit concerned there are still some lingering IP-oriented problems, and the cost already seems a bit prohibitive for me to spend too much time with the cluster operational, but I will share more problems and their resolutions as I encounter them.  With this HDP cluster running on Azure, I'll wrap up another (mildly successful) Hadoop blog post!!