Simple Hadoop Cluster User Provisioning Process (simple = without PAM or Kerberos)
These instructions are for "simple" Hadoop clusters that have no sophisticated PAM and/or Kerberos integration. They are ideal for the HDP Sandbox or other such "simple" setups, like the one described in building a virtualized 5-node HDP 2.0 cluster (all within a mac), that rely on "local" users.
For all command examples, replace $theNEWusername with the username being created.
Edge Node / Client Gateway
On the box(es) the user will SSH into to use the CLI tools (this does NOT have to be a dedicated machine; on the Sandbox, for example, there is only one machine), log in as root and run the following commands to create a local account and set its password.
useradd -m -s /bin/bash $theNEWusername
passwd $theNEWusername
Then create an HDFS home directory for the new user.
su - hdfs
hdfs dfs -mkdir /user/$theNEWusername
hdfs dfs -chown $theNEWusername /user/$theNEWusername
hdfs dfs -chmod -R 755 /user/$theNEWusername
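The edge-node steps above can be wrapped into a small helper. This is a sketch of mine, not part of the article: the function name and the DRY_RUN switch are hypothetical, and with DRY_RUN=1 it only prints each command so the run can be reviewed before anything is executed.

```shell
#!/bin/sh
# Hypothetical wrapper (names and DRY_RUN switch are mine, not the article's):
# performs the edge-node provisioning steps for one user.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$@"          # preview mode: print the command instead of running it
  else
    "$@"
  fi
}

provision_edge_user() {
  u="$1"
  # local Linux account with a bash login shell and a home directory
  run useradd -m -s /bin/bash "$u"
  run passwd "$u"
  # HDFS home directory, created and chowned as the hdfs superuser
  run su - hdfs -c "hdfs dfs -mkdir /user/$u"
  run su - hdfs -c "hdfs dfs -chown $u /user/$u"
  run su - hdfs -c "hdfs dfs -chmod -R 755 /user/$u"
}
```

A dry run such as `DRY_RUN=1 provision_edge_user $theNEWusername` prints the five commands without touching the system.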
Master & Worker Nodes
On the remaining cluster nodes (if any), the new user just needs to exist. There is no need to set a password, as these CLI users will not log into any of these hosts directly.
useradd $theNEWusername
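Since only the account needs to exist on the master and worker nodes, this step lends itself to a loop over SSH. A minimal sketch, with hypothetical host names and assuming passwordless root SSH between nodes; the function only emits the commands so they can be reviewed (or piped to `sh`) rather than run blindly.

```shell
#!/bin/sh
# Hypothetical helper (host names and the root-SSH assumption are mine):
# emit one "ssh ... useradd" command per remaining cluster node.
provision_on_hosts() {
  u="$1"; shift
  for h in "$@"; do
    echo "ssh root@$h useradd $u"
  done
}

# Review the commands first, then pipe to sh to actually run them:
#   provision_on_hosts $theNEWusername master1 worker1 worker2 | sh
```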
User Validation
To validate, users can SSH into the edge node with their new credentials and run the following commands to verify that they can manipulate content on HDFS. Note: just as a Linux user can use "~" to reference their home directory, the FS Shell treats relative references (i.e., nothing before the initial file or folder name) as the equivalent of "~/", meaning everything is based on the user's home folder in HDFS.
hdfs dfs -put /etc/group groupList.txt
hdfs dfs -ls /user/$theNEWusername
hdfs dfs -cat groupList.txt
hdfs dfs -rm -skipTrash /user/$theNEWusername/groupList.txt
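The validation steps above can also be bundled into a small smoke test that stops at the first failure. This is a sketch of mine, not from the article; the `HDFS` variable defaults to the real `hdfs` client but can be pointed at a stub (e.g. `HDFS=echo`) to preview the calls offline.

```shell
#!/bin/sh
# Hypothetical smoke test for a freshly provisioned HDFS home directory.
# HDFS defaults to the real client; override it (e.g. HDFS=echo) to dry-run.
HDFS="${HDFS:-hdfs}"

validate_hdfs_home() {
  u="$1"
  # relative path, so this lands in /user/$u/groupList.txt
  $HDFS dfs -put /etc/group groupList.txt || return 1
  $HDFS dfs -ls "/user/$u" || return 1
  $HDFS dfs -cat groupList.txt >/dev/null || return 1
  $HDFS dfs -rm -skipTrash "/user/$u/groupList.txt" || return 1
  echo "HDFS home validation passed for $u"
}
```

Running `validate_hdfs_home $theNEWusername` as the new user prints a success message only if every step worked.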