Links & Cheat Sheets for Hadoop & Big Data

http://127.0.0.1:8888/ (logins: HUE is hue/1111, AMBARI is admin/admin)

ssh root@127.0.0.1 -p 2222 (default password is "hadoop")

scp -P 2222 example.jar mruser@127.0.0.1:/home/mruser/mrJars (password is "Sprint2000" for the Linux user, or "hadoop" for a user created in Hue)

If you change /etc/sysconfig/network-scripts/ifcfg-eth0, also rm /etc/udev/rules.d/70-persistent-net.rules so the MAC-to-interface mapping is regenerated on the next boot

Benchmarking and Stress Testing a Hadoop Cluster
Teragen & Terasort on HDP (the argument is the number of 100-byte rows to generate; 10,000,000,000 rows would net you 1TB of data)
- HDP 1.3.2 (correctly states that 100000000 gives you 10GB)
- HDP 2.0.9.0 (incorrectly states that 10000 gives you 10GB; that is really only 1MB)
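
The row arithmetic above is easy to get wrong, so here is a minimal sketch that derives the teragen argument from a target size. It assumes decimal units (1GB = 1,000,000,000 bytes), which is what the figures above use. The helper name teragen_rows, the EXAMPLES_JAR placeholder and the /tmp output paths are made up for illustration; the examples jar lives in a different place depending on the HDP version.

```shell
#!/bin/sh
# Each teragen row is 100 bytes, so rows = target bytes / 100.
ROW_BYTES=100

# teragen_rows BYTES -> prints the number of rows needed for that many bytes
teragen_rows() {
    echo $(( $1 / ROW_BYTES ))
}

teragen_rows 1000000            # 1MB  -> 10000
teragen_rows 10000000000        # 10GB -> 100000000
teragen_rows 1000000000000      # 1TB  -> 10000000000

# Hypothetical invocation; EXAMPLES_JAR and the output paths are placeholders.
# EXAMPLES_JAR=/path/to/hadoop-mapreduce-examples.jar
# hadoop jar "$EXAMPLES_JAR" teragen "$(teragen_rows 10000000000)" /tmp/teragen-10gb
# hadoop jar "$EXAMPLES_JAR" terasort /tmp/teragen-10gb /tmp/terasort-10gb
```

The hadoop lines are left commented out because they start real MapReduce jobs and need the jar path filled in first.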

Kill a Hadoop job:

yarn application -kill $ApplicationId

You can list all application IDs with:

yarn application -list
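
The two commands can be chained when several applications need to go at once. This is only a sketch: it assumes each application row in the list output starts with an ID of the form application_..., and the function names extract_app_ids and kill_all_apps are made up here. Only the -list and -kill options shown above are used.

```shell
# Print only the application IDs from `yarn application -list` output on stdin.
# Assumption: the ID is the first field and begins with "application_".
extract_app_ids() {
    awk '$1 ~ /^application_/ { print $1 }'
}

# Kill every application that the list command reports.
kill_all_apps() {
    yarn application -list 2>/dev/null | extract_app_ids | while read -r app_id; do
        yarn application -kill "$app_id"
    done
}

# Usage (destructive, so left commented out):
# kill_all_apps
```

Running `yarn application -list | extract_app_ids` on its own first is a cheap way to see what would be killed.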

dfs.datanode.max.transfer.threads - default is 1024, but raise it to >= 4096, or >= 16K for HBase
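
To see what a node is currently configured with, hdfs getconf can print the value. A rough sketch follows; the function names threads_ok and check_transfer_threads are made up, and 16K is assumed to mean 16384. Note that getconf reads the local configuration files, so it reports what the node would use after a restart, not necessarily what a running DataNode has loaded.

```shell
# Succeeds when VALUE ($1) is at least MINIMUM ($2).
threads_ok() {
    [ "$1" -ge "$2" ]
}

# Read the configured value and compare it to a minimum (default 4096).
check_transfer_threads() {
    minimum=${1:-4096}
    current=$(hdfs getconf -confKey dfs.datanode.max.transfer.threads) || return 1
    if threads_ok "$current" "$minimum"; then
        echo "OK: $current >= $minimum"
    else
        echo "LOW: $current < $minimum"
    fi
}

# Usage:
# check_transfer_threads 4096
# check_transfer_threads 16384   # assuming 16K means 16384
```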

Lester Martin