Creating a New HDFS User
For a simple environment such as the one described in Build a Virtualized 5-Node Hadoop 2.0 Cluster, first create a new Linux user and set a password for the account.
useradd -m -s /bin/bash theNEWusername
passwd theNEWusername
Then create an HDFS home directory for the new user.
su hdfs
hdfs dfs -mkdir /user/theNEWusername
hdfs dfs -chown theNEWusername /user/theNEWusername
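The two steps above can be wrapped in one small script. This is only a sketch under a few assumptions: `create_hdfs_user` and `run` are hypothetical helper names, the script is run as root, and `sudo -u hdfs` is used in place of the interactive `su hdfs` shown above (which assumes sudo is installed). Setting `DRY_RUN=1` prints the commands instead of executing them.

```shell
#!/usr/bin/env bash
# Sketch: create a Linux account plus its HDFS home directory.
# create_hdfs_user and run are hypothetical helper names, not part of Hadoop.

# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

create_hdfs_user() {
  local user="$1"
  if [ -z "$user" ]; then
    echo "usage: create_hdfs_user USERNAME" >&2
    return 1
  fi
  # Linux account with a home directory and bash as the login shell.
  run useradd -m -s /bin/bash "$user"
  # Prompts interactively for the new password.
  run passwd "$user"
  # HDFS home directory, created and chowned as the hdfs superuser.
  # Assumption: sudo is available; the notes above use "su hdfs" instead.
  run sudo -u hdfs hdfs dfs -mkdir "/user/$user"
  run sudo -u hdfs hdfs dfs -chown "$user" "/user/$user"
}
```

To preview the commands without touching the system, set `DRY_RUN=1` and call `create_hdfs_user theNEWusername`.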
Hortonworks Sandbox
http://127.0.0.1:8888/ (HUE: hue/1111 and AMBARI: admin/admin)
ssh root@127.0.0.1 -p 2222 (default password is "hadoop")
scp -P 2222 example.jar mruser@127.0.0.1:/home/mruser/mrJars (password is "Sprint2000" for the Linux user, or "hadoop" for a user created in Hue)
Networking & VirtualBox
If you change /etc/sysconfig/network-scripts/ifcfg-eth0, also run rm /etc/udev/rules.d/70-persistent-net.rules so that udev regenerates the interface rules on the next boot.
HDP File Locations
Binaries: /usr/lib/SERVICENAME
Configuration: /etc/SERVICENAME/conf
Logs: /var/log/SERVICENAME
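As a quick illustration of the layout above, the following sketch expands SERVICENAME into the three paths. `hdp_paths` is a hypothetical helper name; it only prints the paths and does not check that they exist on a given node.

```shell
# hdp_paths SERVICENAME: print the binaries, configuration, and log
# locations for a service, following the layout listed above.
# hdp_paths is a hypothetical helper, not an HDP tool.
hdp_paths() {
  local service="$1"
  echo "binaries=/usr/lib/$service"
  echo "config=/etc/$service/conf"
  echo "logs=/var/log/$service"
}
```

For example, `hdp_paths hive` lists /usr/lib/hive, /etc/hive/conf, and /var/log/hive.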
Benchmarking & Performance/Scalability Testing
- Benchmarking and Stress Testing a Hadoop Cluster
- Teragen & Terasort on HDP (the argument is the number of 100-byte rows to generate; 10,000,000,000 rows would net you 1 TB of data)
- HDP 1.3.2 (correctly states that 100000000 would give you 10 GB)
- HDP 2.0.9.0 (incorrectly states that 10000 would give you 10 GB; that is really only 1 MB)
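The row arithmetic behind those figures is easy to get wrong, so here is a small sketch of it. `teragen_rows` and `teragen_rows_for_gb` are hypothetical helper names, and decimal units (1 GB = 10^9 bytes) are assumed to match the numbers quoted above.

```shell
# teragen_rows BYTES: number of rows needed for BYTES of output,
# given that each Teragen row is 100 bytes.
teragen_rows() {
  echo $(( $1 / 100 ))
}

# teragen_rows_for_gb GB: same calculation starting from decimal gigabytes
# (assumption: 1 GB = 1,000,000,000 bytes).
teragen_rows_for_gb() {
  teragen_rows $(( $1 * 1000000000 ))
}
```

With these helpers, `teragen_rows_for_gb 10` gives 100000000 and `teragen_rows_for_gb 1000` gives 10000000000, while 10000 rows corresponds to only 1,000,000 bytes (1 MB). The result is the first argument to teragen, followed by the HDFS output directory.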
Repo Help
Other Stuff
- Script to iterate down a dir tree and copy everything into HDFS; http://one-line-it.blogspot.com/2013/05/hadoop-copy-directly-to-hdfs-from.html
- How to set the number of mappers and reducers of Hadoop in command line
- Details on HDFS Balancer command; http://www.swiss-scalability.com/2013/08/hadoop-hdfs-balancer-explained.html
- Oracle JDK 7 Archive Download Page
- SAP HANA and HDP Integration; http://hortonworks.com/wp-content/uploads/2013/09/Demo-Tutorial-Leveraging_SAP_HANA__HDP_Jan_2014.pdf
Random Notes
dfs.datanode.max.transfer.threads - default is 1024; raise it to at least 4096, or to at least 16K when running HBase
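A quick way to see what a node is currently configured with is `hdfs getconf -confKey`. The sketch below wraps it in a hypothetical helper, `check_transfer_threads`, that compares the value against a minimum (4096 if none is given). Note that this reads the configuration files on the machine where it runs, so it may not reflect what a running DataNode actually loaded.

```shell
# check_transfer_threads [MINIMUM]: report whether the locally configured
# dfs.datanode.max.transfer.threads is at least MINIMUM (default 4096).
# check_transfer_threads is a hypothetical helper, not part of Hadoop.
check_transfer_threads() {
  local minimum="${1:-4096}"
  local current
  # Returns 2 if the hdfs command itself fails.
  current="$(hdfs getconf -confKey dfs.datanode.max.transfer.threads)" || return 2
  if [ "$current" -lt "$minimum" ]; then
    echo "dfs.datanode.max.transfer.threads=$current is below $minimum"
    return 1
  fi
  echo "dfs.datanode.max.transfer.threads=$current is OK"
}
```

On an HBase node, for instance, `check_transfer_threads 16384` would flag a value that is still at the default.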