So... time to eat some crow. I had a customer who is automating the user onboarding process for his Hadoop cluster and wanted to know if he could use a Linux account besides hdfs to create an HDFS user home directory and set the appropriate permissions (see "Creating a New HDFS User" in my Hadoop Cheat Sheet). I told him he was out of luck and that was just the way it was going to be.
To make matters worse, my "you must switch to hdfs to create the home directory and change the owner" advice is actually wrong. You can just switch to the newly created user and keep on keeping on.
[root@sandbox ~]# useradd nonadminuser
[root@sandbox ~]# su nonadminuser
[nonadminuser@sandbox root]$ hdfs dfs -mkdir /user/nonadminuser
[nonadminuser@sandbox root]$ hdfs dfs -chgrp nonadminuser /user/nonadminuser
[nonadminuser@sandbox root]$ hdfs dfs -ls /user
... rm'd some lines ...
drwxr-xr-x - nonadminuser nonadminuser 0 2014-08-14 00:01 /user/nonadminuser
... rm'd some lines ...
If you want to have a process that doesn't involve switching to any other user, then please read on.
Thinking about it a bit later, I realized I had never actually run this one down. Navigating through the Hadoop site got me to http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html#The_Super-User which told me what I've been espousing all along: the user that starts up the NameNode (NN) is the superuser. Then I saw it: the phrase that let me know I was wrong in my reply...
In addition, the administrator may identify a distinguished group using a configuration parameter. If set, members of this group are also super-users.
Doh! I was definitely wrong in my thinking and reply to my customer. Hey, only the second time this month, but we have half a month to go!!
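To make the quoted rule concrete, here is a minimal sketch of it as a shell function. It is only a toy model and never contacts HDFS; the function name is_superuser and its argument order are made up for illustration.

```shell
#!/usr/bin/env bash
# Toy model of the superuser rule from the Permissions Guide (hypothetical helper).
# Usage: is_superuser USER NN_USER SUPERGROUP [GROUP_OF_USER...]
is_superuser() {
  local user="$1" nn_user="$2" supergroup="$3"
  shift 3
  # Rule 1: the identity that started the NameNode is the superuser.
  [ "$user" = "$nn_user" ] && return 0
  # Rule 2: members of the configured distinguished group are superusers too.
  local g
  for g in "$@"; do
    [ "$g" = "$supergroup" ] && return 0
  done
  return 1
}
```

For example, `is_superuser cat hdfs animals animals` succeeds once animals is the configured group, while the same call with a supergroup of hdfs fails.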
Let's see this in action. First, we need a test bed to work from. Let's use hdfs to create a test directory and then lock down the permissions to only the hdfs user.
[root@sandbox ~]# su hdfs
[hdfs@sandbox root]$ hdfs dfs -mkdir /testSuperUser
[hdfs@sandbox root]$ hdfs dfs -mkdir /testSuperUser/testDirectory
[hdfs@sandbox root]$ hdfs dfs -ls /testSuperUser
Found 1 items
drwxr-xr-x - hdfs hdfs 0 2014-08-13 22:42 /testSuperUser/testDirectory
[hdfs@sandbox root]$ hdfs dfs -chmod 700 /testSuperUser/testDirectory
[hdfs@sandbox root]$ hdfs dfs -ls /testSuperUser
Found 1 items
drwx------ - hdfs hdfs 0 2014-08-13 22:42 /testSuperUser/testDirectory
Now let's create an animals group with two users in it: cat and bat.
[hdfs@sandbox root]$ exit
exit
[root@sandbox ~]# groupadd animals
[root@sandbox ~]# useradd -ganimals cat
[root@sandbox ~]# useradd -ganimals bat
[root@sandbox ~]# lid -g animals
cat(uid=1021)
bat(uid=1022)
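If lid is not installed, the same membership check can be sketched with id, which is available on most Linux systems. The helper name user_in_group is hypothetical, and the sketch only reflects what the local operating system reports on the host where it runs.

```shell
#!/usr/bin/env bash
# Succeeds when USER belongs to GROUP according to the local operating system.
# `id -Gn` lists primary and supplementary groups, so accounts created with
# `useradd -g` (like cat and bat above) are included.
user_in_group() {
  local user="$1" group="$2"
  # One group per line, then an exact, fixed-string, whole-line match.
  id -Gn "$user" 2>/dev/null | tr ' ' '\n' | grep -qxF -- "$group"
}

# Example: user_in_group cat animals && echo "cat is in animals"
```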
Then make sure they can't do anything that requires superuser access.
[root@sandbox ~]# su cat
[cat@sandbox root]$ hdfs dfs -ls /testSuperUser
Found 1 items
drwx------ - hdfs hdfs 0 2014-08-13 22:42 /testSuperUser/testDirectory
[cat@sandbox root]$ hdfs dfs -chgrp bogus /testSuperUser/testDirectory
chgrp: changing ownership of '/testSuperUser/testDirectory': Permission denied
No joy, but that is as expected. The instructions at http://hadoop.apache.org/docs/r2.4.1/hadoop-project-dist/hadoop-hdfs/HdfsPermissionsGuide.html#Configuration_Parameters let me know I need to make sure there is a dfs.permissions.superusergroup key-value pair in hdfs-site.xml. This parameter can be found in Ambari at Services > HDFS > Configs > Advanced > dfs.permissions.superusergroup. For my Hortonworks Sandbox this value is set to hdfs. At first this seemed to align with the fact that unless you do a -chgrp, your newly created items have the group set to hdfs on this little pseudo-cluster. I did find out later that even with a different superusergroup identified, the owning group stayed as hdfs, which fits the Permissions Guide's rule that a new item takes the group of its parent directory.
[cat@sandbox root]$ exit
exit
[root@sandbox ~]# su turtle
[turtle@sandbox root]$ hdfs dfs -put /etc/group groups.txt
[turtle@sandbox root]$ hdfs dfs -ls
Found 1 items
-rw-r--r-- 1 turtle hdfs 1033 2014-08-13 23:12 groups.txt
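For anyone who prefers the command line to Ambari, the configured group and a user's HDFS-side groups can be checked with the stock hdfs CLI. The sketch below wraps `hdfs getconf -confKey` and `hdfs groups`; the function names are hypothetical, and it assumes `hdfs groups USER` prints one line of the form "USER : group1 group2". Note that `hdfs getconf` reads the configuration files visible to the client, which may differ from what a running NameNode has loaded.

```shell
#!/usr/bin/env bash
# Print the superuser group named in the local Hadoop configuration.
superuser_group() {
  hdfs getconf -confKey dfs.permissions.superusergroup
}

# Print the groups HDFS resolves for USER, one per line.
# Assumption: `hdfs groups USER` prints "USER : group1 group2 ...".
hdfs_groups_of() {
  hdfs groups "$1" | sed 's/^[^:]*:[[:space:]]*//' | tr -s ' ' '\n'
}

# Succeeds when USER is in the configured superuser group.
# Returns 2 when the group cannot be determined.
in_superuser_group() {
  local sg
  sg="$(superuser_group)" || return 2
  [ -n "$sg" ] || return 2
  hdfs_groups_of "$1" | grep -qxF -- "$sg"
}

# Example: in_superuser_group cat && echo "cat has superuser rights"
```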
After I changed the "superuser" group to animals, I could make the change that I wanted to earlier.
[turtle@sandbox root]$ exit
exit
[root@sandbox ~]# su cat
[cat@sandbox root]$ hdfs dfs -ls /testSuperUser
Found 1 items
drwx------ - hdfs hdfs 0 2014-08-13 22:42 /testSuperUser/testDirectory
[cat@sandbox root]$ hdfs dfs -chgrp bogus /testSuperUser/testDirectory
[cat@sandbox root]$ hdfs dfs -ls /testSuperUser
Found 1 items
drwx------ - hdfs bogus 0 2014-08-13 22:42 /testSuperUser/testDirectory
I also did not screw up the fact that hdfs is my true superuser, as shown by my "old" HDFS home directory process.
[cat@sandbox root]$ exit
exit
[root@sandbox ~]# useradd user1
[root@sandbox ~]# su hdfs
[hdfs@sandbox root]$ hdfs dfs -mkdir /user/user1
[hdfs@sandbox root]$ hdfs dfs -ls /user
... rm'd some lines ...
NOTICE THAT THE GROUP STILL DEFAULTS TO hdfs, NOT animals
drwxr-xr-x - hdfs hdfs 0 2014-08-13 23:49 /user/user1
[hdfs@sandbox root]$ hdfs dfs -chown user1 /user/user1
[hdfs@sandbox root]$ hdfs dfs -chgrp user1 /user/user1
[hdfs@sandbox root]$ hdfs dfs -ls /user
... rm'd some lines ...
drwxr-xr-x - user1 user1 0 2014-08-13 23:49 /user/user1
Which can now also be done as a "real" user if set up appropriately.
[hdfs@sandbox root]$ exit
exit
[root@sandbox ~]# useradd user2
[root@sandbox ~]# su bat
[bat@sandbox root]$ hdfs dfs -mkdir /user/user2
[bat@sandbox root]$ hdfs dfs -ls /user
... rm'd some lines ...
NOTICE THAT THE GROUP STILL DEFAULTS TO hdfs, NOT animals
drwxr-xr-x - user1 user1 0 2014-08-13 23:49 /user/user1
drwxr-xr-x - bat hdfs 0 2014-08-13 23:55 /user/user2
[bat@sandbox root]$ hdfs dfs -chown user2 /user/user2
[bat@sandbox root]$ hdfs dfs -chgrp user2 /user/user2
[bat@sandbox root]$ hdfs dfs -ls /user
... rm'd some lines ...
drwxr-xr-x - user1 user1 0 2014-08-13 23:49 /user/user1
drwxr-xr-x - user2 user2 0 2014-08-13 23:55 /user/user2
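For an automated onboarding process, the steps bat ran above could be wrapped in a small helper. This is only a sketch under a few assumptions: the function name hdfs_home_cmds is made up, the new Linux account has a group of the same name (as user2 does above), and whoever runs the printed commands is hdfs or a member of the superuser group. The function prints the commands rather than running them, so the output can be reviewed first.

```shell
#!/usr/bin/env bash
# Print the HDFS commands that give USER a home directory under /user.
# Nothing is executed here; pipe the output to `sh` to run it.
hdfs_home_cmds() {
  local user="$1"
  # Refuse empty names and anything outside a conservative character set,
  # since the output is meant to be fed to a shell.
  case "$user" in
    ''|*[!a-z0-9_-]*) echo "invalid user name: $user" >&2; return 1 ;;
  esac
  local home="/user/${user}"
  printf 'hdfs dfs -mkdir %s\n' "$home"
  # OWNER:GROUP sets both in one call, replacing the separate -chown and -chgrp.
  printf 'hdfs dfs -chown %s:%s %s\n' "$user" "$user" "$home"
}

# Example (run as a member of the superuser group): hdfs_home_cmds user2 | sh
```

Printing instead of executing keeps the helper easy to dry-run inside a larger provisioning script.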
As usual, there are many ways to skin this cat and this simple property is the gateway to those choices. For many, the simple model of just adding the desired Linux user(s) to the existing "superusers" group may be the way to go. If you are using this today, or might just do so, I'd love to hear your actual, or planned, approach.