HDP & Core Hadoop Cheat Sheet
Hortonworks Sandbox
http://127.0.0.1:8888/ (HUE: hue/1111 and AMBARI: admin/admin)
ssh root@127.0.0.1 -p 2222 (default password is "hadoop")
scp -P 2222 example.jar mruser@127.0.0.1:/home/mruser/mrJars (password is "Sprint2000" for the Linux user and "hadoop" for a user created in Hue)
Networking & VirtualBox
If you change /etc/sysconfig/network-scripts/ifcfg-eth0, also run rm /etc/udev/rules.d/70-persistent-net.rules. That file pins interface names to MAC addresses, so a stale copy can leave the NIC under a name other than eth0.
HDP File Locations
Benchmarking & Performance/Scalability Testing
- Benchmarking and Stress Testing a Hadoop Cluster
- Teragen & Terasort on HDP (the argument is the number of 100-byte rows to generate; 10,000,000,000 rows would net you 1TB of data)
- HDP 1.3.2 (correctly states that 100000000 rows would give you 10GB)
- HDP 2.0.9.0 (incorrectly states that 10000 rows would give you 10GB; that is really only 1MB)
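The row arithmetic above can be checked with a small shell helper. This is only a sketch: teragen_rows is a hypothetical function name, sizes are decimal (1GB = 10^9 bytes), and the jar variable and output directory in the commented command are assumptions that vary by install.

```shell
#!/bin/sh
# Each TeraGen row is 100 bytes, so rows = target bytes / 100.
# teragen_rows is a hypothetical helper, not part of Hadoop.
teragen_rows() {
    echo $(( $1 / 100 ))
}

teragen_rows 10000000000       # 10GB target -> prints 100000000
teragen_rows 1000000000000     # 1TB target  -> prints 10000000000

# Assumed invocation; point EXAMPLES_JAR at the MapReduce examples jar of your install:
# hadoop jar "$EXAMPLES_JAR" teragen "$(teragen_rows 10000000000)" /tmp/teragen-out
```

Working from a byte target rather than typing the row count by hand avoids the off-by-several-zeros mistake noted for the HDP 2.0.9.0 doc.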
Repo Help
Apache Ambari
- Examples of Ambari REST API
- Ambari Shell for CLI commands; https://cwiki.apache.org/confluence/display/AMBARI/Ambari+Shell
Other Stuff
- Script to iterate down a dir tree and copy everything into HDFS; http://one-line-it.blogspot.com/2013/05/hadoop-copy-directly-to-hdfs-from.html
- How to set the number of mappers and reducers of Hadoop in command line
- Details on HDFS Balancer command; http://www.swiss-scalability.com/2013/08/hadoop-hdfs-balancer-explained.html
- Oracle JDK 7 Archive Download Page
- SAP HANA and HDP Integration; http://hortonworks.com/wp-content/uploads/2013/09/Demo-Tutorial-Leveraging_SAP_HANA__HDP_Jan_2014.pdf
- Creating & registering custom Ambari alerts; https://community.hortonworks.com/articles/38149/how-to-create-and-register-custom-ambari-alerts.html
- Managing Hadoop DR with distcp and snapshots; https://community.cloudera.com/t5/Community-Articles/Managing-Hadoop-DR-with-distcp-and-snapshots/ta-p/248362
YARN
Kill a Hadoop job:
yarn application -kill $ApplicationId
List application IDs (by default only applications in the SUBMITTED, ACCEPTED, or RUNNING states) with:
yarn application -list
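The two commands can be chained to kill applications by name. A minimal sketch, assuming the listing prints one application per row with the ID in the first column; app_ids is a hypothetical helper, and "wordcount" is just an example pattern.

```shell
#!/bin/sh
# app_ids is a hypothetical helper: it reads `yarn application -list` output
# on stdin and prints the ID (first column) of every application row that
# matches the given pattern. Header lines do not start with "application_".
app_ids() {
    awk -v pat="$1" '$1 ~ /^application_/ && $0 ~ pat { print $1 }'
}

# Assumed usage on a cluster node (kills every listed application matching "wordcount"):
# yarn application -list | app_ids wordcount | xargs -n 1 yarn application -kill
```

Run the pipeline without the xargs stage first to confirm which IDs would be killed.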
Random Notes
dfs.datanode.max.transfer.threads - default is 1024; raise it to at least 4096, or to at least 16K for HBase
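A quick way to check a node against these thresholds, as a sketch: threads_ok is a hypothetical helper, 16K is taken to mean 16384, and hdfs getconf reports the value from the local client configuration, which may differ from what a running DataNode loaded.

```shell
#!/bin/sh
# threads_ok is a hypothetical helper: it succeeds when the value ($1) is at
# least the minimum ($2, defaulting to the 4096 suggested in the note above).
threads_ok() {
    [ "$1" -ge "${2:-4096}" ]
}

# Assumed usage on a node with the HDFS client configuration installed:
# current=$(hdfs getconf -confKey dfs.datanode.max.transfer.threads)
# threads_ok "$current" 16384 || echo "raise dfs.datanode.max.transfer.threads (now $current)"
```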