As is always the case, never be so sure of yourself that you won’t listen to others or take a few minutes to validate your understanding with a simple test. And yes, enjoy the big piece of humble pie when it is served to you.

These findings only raised more questions as I thought about them, such as what happens with the cat, cp, mv, and rm commands, so I tested them out. The answers were less than desirable, but they did fit in line with the findings above. I was going to publish a second blog post to follow this up, as I’m a big believer that blog postings (unlike wiki pages; check out https://martin.atlassian.net/wiki/spaces/lestermartin/blog/2012/11/18/8749120/enterprise+2.0+book+review+using+web+2.0+technologies+within+organizations if you don’t know why!) are immutable. But since I had already been proven wrong once (it seems HDFS is not as immutable as I thought), and these additional research findings are additive and do not alter the content above, I decided to just add them here.

Can the COPYING File be Read? YES.

While the put is in progress, the file cannot be read under its final name, but the in-flight COPYING file can.

training@cmhost:~$ ./putFile.sh & 
[1] 10138
Fri Mar 22 11:22:21 PDT 2019

training@cmhost:~$ hdfs dfs -ls /tmp/ngramsAthruE
Found 1 items
-rw-r--r--   3 training supergroup 1818572288 2019-03-22 11:22 /tmp/ngramsAthruE/aTHRUe.txt._COPYING_

training@cmhost:~$ hdfs dfs -cat /tmp/ngramsAthruE/aTHRUe.txt
cat: `/tmp/ngramsAthruE/aTHRUe.txt': No such file or directory

training@cmhost:~$ hdfs dfs -cat /tmp/ngramsAthruE/aTHRUe.txt._COPYING_

aflually_ADV	2004	1	1
aflually_ADV	2006	2	2
aflually_ADV	2008	1	1
afluente_.	1923	2	2
afluente_.	1924	5	1
afluente_.	1926	1	1
aflcat: Filesystem closed

training@cmhost:~$ hdfs dfs -ls /tmp/ngramsAthruE
Found 1 items
-rw-r--r--   3 training supergroup 2415919104 2019-03-22 11:22 /tmp/ngramsAthruE/aTHRUe.txt._COPYING_
training@cmhost:~$ Fri Mar 22 11:23:34 PDT 2019

[1]+  Done                    ./putFile.sh

training@cmhost:~$ hdfs dfs -ls /tmp/ngramsAthruE
Found 1 items
-rw-r--r--   3 training supergroup 7498258729 2019-03-22 11:23 /tmp/ngramsAthruE/aTHRUe.txt
training@cmhost:~$ 
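
Because a partially written file is readable under its COPYING name, a consumer that scans a directory may want to skip those entries. The following is only a minimal sketch: the helper name complete_files is hypothetical, and it assumes the listing format shown above (path in the last column, no whitespace in paths). It filters text only and does not talk to HDFS itself.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `hdfs dfs -ls` output on stdin and print only
# the paths that do NOT carry the in-flight ._COPYING_ suffix.
complete_files() {
  # Entry lines have 8 fields (perms, replication, owner, group, size,
  # date, time, path); the "Found N items" header has fewer and is skipped.
  # Assumption: paths contain no whitespace.
  awk 'NF >= 8 && $NF !~ /\._COPYING_$/ { print $NF }'
}
```

Typical use would be to pipe the output of hdfs dfs -ls /tmp/ngramsAthruE into complete_files and only read the paths it prints.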

Can the COPYING File be Copied? YES.

Although one of my test runs produced an exception, the results below confirm that the in-flight COPYING file can be copied. The copy is a snapshot: it contains only what had been written at the time of the operation, not the full file.

training@cmhost:~$ ./putFile.sh &
[1] 18298
Fri Mar 22 11:53:02 PDT 2019

training@cmhost:~$ hdfs dfs -ls /tmp/ngramsAthruE
Found 1 items
-rw-r--r--   3 training supergroup  402653184 2019-03-22 11:53 /tmp/ngramsAthruE/aTHRUe.txt._COPYING_

training@cmhost:~$ hdfs dfs -cp /tmp/ngramsAthruE/aTHRUe.txt._COPYING_ /tmp/ngramsAthruE/inflight-copy.txt

training@cmhost:~$ Fri Mar 22 11:54:37 PDT 2019

[1]+  Done                    ./putFile.sh

training@cmhost:~$ hdfs dfs -ls /tmp/ngramsAthruE
Found 2 items
-rw-r--r--   3 training supergroup 7498258729 2019-03-22 11:54 /tmp/ngramsAthruE/aTHRUe.txt
-rw-r--r--   3 training supergroup 1225386496 2019-03-22 11:53 /tmp/ngramsAthruE/inflight-copy.txt
training@cmhost:~$
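
Since copying the in-flight file yields a truncated snapshot, one way to avoid it is to wait until the final file name exists before copying. The sketch below is an illustration under assumptions, not a recommended tool: the function name wait_for_final, its retry parameters, and the HDFS_BIN override variable are all hypothetical. It relies on hdfs dfs -test -e, which exits 0 when the path exists.

```shell
#!/usr/bin/env bash
# Hypothetical helper: poll until the final file name exists in HDFS,
# i.e. until the writing client has finished and dropped the
# ._COPYING_ suffix. HDFS_BIN is an assumption of this sketch so the
# command can be swapped out.
wait_for_final() {
  local path="$1" tries="${2:-60}" delay="${3:-5}" i=0
  while [ "$i" -lt "$tries" ]; do
    # `hdfs dfs -test -e` returns 0 when the path exists.
    if "${HDFS_BIN:-hdfs}" dfs -test -e "$path"; then
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out waiting for $path" >&2
  return 1
}
```

For example, wait_for_final /tmp/ngramsAthruE/aTHRUe.txt followed by the hdfs dfs -cp would only copy once the put has completed. Note that if someone renames the in-flight file in the meantime, the final name never appears and the helper simply times out.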

Can the COPYING File be Moved/Renamed? YES.

Much to my surprise, this caused no problems at all: the client kept writing, and the completed, full-sized file kept the name it had been renamed to.

training@cmhost:~$ ./putFile.sh &
[1] 24698
Fri Mar 22 12:02:37 PDT 2019

training@cmhost:~$ hdfs dfs -ls /tmp/ngramsAthruE
Found 1 items
-rw-r--r--   3 training supergroup  536870912 2019-03-22 12:02 /tmp/ngramsAthruE/aTHRUe.txt._COPYING_

training@cmhost:~$ hdfs dfs -mv /tmp/ngramsAthruE/aTHRUe.txt._COPYING_ /tmp/ngramsAthruE/inflight-move.txt
training@cmhost:~$ hdfs dfs -ls /tmp/ngramsAthruE
Found 1 items
-rw-r--r--   3 training supergroup 2013265920 2019-03-22 12:02 /tmp/ngramsAthruE/inflight-move.txt

training@cmhost:~$ hdfs dfs -ls /tmp/ngramsAthruE
Found 1 items
-rw-r--r--   3 training supergroup 4026531840 2019-03-22 12:02 /tmp/ngramsAthruE/inflight-move.txt

Fri Mar 22 12:03:54 PDT 2019

[1]+  Done                    ./putFile.sh

training@cmhost:~$ hdfs dfs -ls /tmp/ngramsAthruE
Found 1 items
-rw-r--r--   3 training supergroup 7498258729 2019-03-22 12:03 /tmp/ngramsAthruE/inflight-move.txt
training@cmhost:~$

Can the COPYING File be Deleted? YES.

Sadly, it can. Worse, it causes havoc for the client writing the file: the in-progress put fails with a LeaseExpiredException and nothing is left in the directory.

training@cmhost:~$ ./putFile.sh &
[1] 11965
Fri Mar 22 12:15:20 PDT 2019

training@cmhost:~$ hdfs dfs -ls /tmp/ngramsAthruE
Found 1 items
-rw-r--r--   3 training supergroup  536870912 2019-03-22 12:15 /tmp/ngramsAthruE/aTHRUe.txt._COPYING_

training@cmhost:~$ hdfs dfs -rm -skipTrash /tmp/ngramsAthruE/aTHRUe.txt._COPYING_
Deleted /tmp/ngramsAthruE/aTHRUe.txt._COPYING_
training@cmhost:~$ 19/03/22 12:15:35 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /tmp/ngramsAthruE/aTHRUe.txt._COPYING_ (inode 56597): File does not exist. Holder DFSClient_NONMAPREDUCE_-1649048187_1 does not have any open files.
	at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:3820)
  ...  ...  ...  STACK TRACE LINES RM'D  ...  ...  ... 
	at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:790)
put: No lease on /tmp/ngramsAthruE/aTHRUe.txt._COPYING_ (inode 56597): File does not exist. Holder DFSClient_NONMAPREDUCE_-1649048187_1 does not have any open files.

Fri Mar 22 12:15:36 PDT 2019

[1]+  Done                    ./putFile.sh

training@cmhost:~$ hdfs dfs -ls /tmp/ngramsAthruE
training@cmhost:~$ 
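
Given that deleting the in-flight file breaks the writer, a cleanup script could refuse to remove anything that still carries the COPYING suffix. This is a minimal sketch under assumptions: the wrapper name safe_rm and the HDFS_BIN override variable are hypothetical. It only inspects the literal arguments, so a recursive delete of the parent directory or a glob expanded on the HDFS side would bypass it.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: refuse to delete a path that still carries the
# in-flight ._COPYING_ suffix, since removing it makes the writing
# client fail. HDFS_BIN is an assumption of this sketch so the command
# can be swapped out.
safe_rm() {
  local path
  for path in "$@"; do
    case "$path" in
      *._COPYING_)
        echo "refusing to delete in-flight file: $path" >&2
        return 1
        ;;
    esac
  done
  # No in-flight names among the arguments: hand off to the real command.
  "${HDFS_BIN:-hdfs}" dfs -rm "$@"
}
```

With this in place, safe_rm /tmp/ngramsAthruE/aTHRUe.txt._COPYING_ would print the refusal and return non-zero instead of deleting the file.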

How Do I Feel About All of This?

I guess it doesn’t matter that I don’t like it… It is what it is!! Glad I know NOW!

Good luck and happy Hadooping.