Simple Hadoop Cluster User Provisioning Process (simple = without PAM or Kerberos)

These instructions are for "simple" Hadoop clusters that have no PAM and/or Kerberos integration and instead rely on "local" users.  They are ideal for the HDP Sandbox or other such "simple" setups, like the one described in building a virtualized 5-node HDP 2.0 cluster (all within a mac).

For all command examples, replace $theNEWusername with the username being created.
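
Alternatively, if you are working in a bash session, you can set a shell variable once so the examples can be pasted as-is.  This is just a convenience sketch; "alice" is only a placeholder username, and note that "su -" starts a fresh login shell, so the variable would need to be set again after switching users.

theNEWusername=alice     # example value only; use the real username being created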

Edge Node / Client Gateway

On the box(es) where the user will SSH in and use the CLI tools (this does NOT have to be a dedicated machine; on the Sandbox, for example, there is only one machine), log in as root and run the following commands to create a local account and set its password.

useradd -m -s /bin/bash $theNEWusername
passwd $theNEWusername

Then create an HDFS home directory for the new user.

su - hdfs
hdfs dfs -mkdir /user/$theNEWusername
hdfs dfs -chown $theNEWusername /user/$theNEWusername
hdfs dfs -chmod -R 755 /user/$theNEWusername
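
Before handing the account over, it can be worth a quick sanity check that the home directory exists and is owned by the new user (run while still in the hdfs user's shell, then exit back to root).

hdfs dfs -ls /user | grep $theNEWusername   # the owner column should show the new username
exit                                        # leave the hdfs login shell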

Master & Worker Nodes

On the remaining cluster nodes (if any), the new user just needs to be present.  There is no need to set a password, as these CLI users will not need to log into any of these hosts directly.

useradd $theNEWusername
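
If there are many nodes, a small loop run from a host with root SSH access can save some typing.  This is only a sketch: it assumes passwordless root SSH to each node and a hypothetical hosts.txt file listing the remaining hostnames, one per line.

for host in $(cat hosts.txt); do            # hosts.txt is a hypothetical list of node hostnames
  ssh root@"$host" "useradd $theNEWusername"
done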

User Validation

To validate, users can SSH into the edge node with their new credentials and run the following commands to verify that they can manipulate content in HDFS.  Note: whereas in Linux a user can use "~" to reference their home directory, the FS Shell treats relative references (i.e., nothing before the initial file or folder name) as equivalent to "~/", meaning everything is resolved against the user's home folder in HDFS.

hdfs dfs -put /etc/group groupList.txt
hdfs dfs -ls /user/$theNEWusername
hdfs dfs -cat groupList.txt
hdfs dfs -rm -skipTrash /user/$theNEWusername/groupList.txt
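
As a quick illustration of the relative-path behavior described above, the following two listings are equivalent because the FS Shell resolves a missing or relative path against the user's HDFS home directory.

hdfs dfs -ls                                # no path given: lists /user/$theNEWusername
hdfs dfs -ls /user/$theNEWusername          # same listing via the absolute path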