Learning something new every day (seems HDFS is not as immutable as I thought)

Well... it seems the old adage about learning something new every day just kicked me in the pants. I have always stood firm on the statement that HDFS files are immutable, but I also told folks that you could "game the system" by appending to a file. Of course, I still held the immutability line, because I believed the newly added data (on the end of the file) was simply captured in additional blocks.

Then today, the Big Data Bear (aka Laurent Weichberger) pointed me to the article at http://www.waitingforcode.com/hdfs/append-and-truncate-in-hdfs/read that indicates HDFS allows for actual block-level mutability.  I thought, well... let's try it out.

First I created a new (tiny) file and verified that it had only one block.

[maria_dev@sandbox ~]$ hdfs dfs -mkdir mutability-test
[maria_dev@sandbox ~]$ hdfs dfs -put /etc/hosts mutability-test/hosts.txt
[maria_dev@sandbox ~]$ hdfs dfs -cat mutability-test/hosts.txt
127.0.0.1    localhost
::1    localhost ip6-localhost ip6-loopback
fe00::0    ip6-localnet
ff00::0    ip6-mcastprefix
ff02::1    ip6-allnodes
ff02::2    ip6-allrouters
172.17.0.2    sandbox.hortonworks.com
[maria_dev@sandbox ~]$ hdfs fsck mutability-test/hosts.txt
Connecting to namenode via http://sandbox.hortonworks.com:50070/fsck?ugi=maria_dev&path=%2Fuser%2Fmaria_dev%2Fmutability-test%2Fhosts.txt
FSCK started by maria_dev (auth:SIMPLE) from /172.17.0.2 for path /user/maria_dev/mutability-test/hosts.txt at Tue Oct 03 19:16:01 UTC 2017
.Status: HEALTHY
 Total size:    185 B
 Total dirs:    0
 Total files:   1
 Total symlinks:                 0
 Total blocks (validated):       1 (avg. block size 185 B)
 Minimally replicated blocks:    1 (100.0 %)
 Over-replicated blocks:         0 (0.0 %)
 Under-replicated blocks:        0 (0.0 %)
 Mis-replicated blocks:          0 (0.0 %)
 Default replication factor:     1
 Average block replication:      1.0
 Corrupt blocks:                 0
 Missing replicas:               0 (0.0 %)
 Number of data-nodes:           1
 Number of racks:                1
FSCK ended at Tue Oct 03 19:16:01 UTC 2017 in 1 milliseconds


The filesystem under path '/user/maria_dev/mutability-test/hosts.txt' is HEALTHY
[maria_dev@sandbox ~]$ 
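If you want to repeat this check without eyeballing the whole fsck report, the block count can be pulled out of the summary line. The sketch below is a hypothetical helper (`fsck_block_count` is my own name, not part of Hadoop) and it assumes the "Total blocks (validated):" wording shown in the output above; other Hadoop versions may format the summary differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `hdfs fsck` summary output on stdin and print
# the number from the "Total blocks (validated):" line.
# Assumes the summary format shown above (HDP sandbox era Hadoop 2.x).
fsck_block_count() {
  # Split on ":" so $2 holds "   1 (avg. block size 185 B)",
  # then take the first whitespace-separated token from it.
  awk -F: '/Total blocks \(validated\)/ { split($2, a, " "); print a[1] }'
}

# Usage against a live cluster (requires the hdfs CLI on the PATH):
#   hdfs fsck mutability-test/hosts.txt 2>/dev/null | fsck_block_count
```

Running the same pipeline before and after the append makes it easy to compare the two counts side by side.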

Then I appended the same Linux file to the end of the new HDFS file and verified that all of the content was there.

[maria_dev@sandbox ~]$ hdfs dfs -appendToFile /etc/hosts mutability-test/hosts.txt
[maria_dev@sandbox ~]$ hdfs dfs -cat mutability-test/hosts.txt
127.0.0.1    localhost
::1    localhost ip6-localhost ip6-loopback
fe00::0    ip6-localnet
ff00::0    ip6-mcastprefix
ff02::1    ip6-allnodes
ff02::2    ip6-allrouters
172.17.0.2    sandbox.hortonworks.com
127.0.0.1    localhost
::1    localhost ip6-localhost ip6-loopback
fe00::0    ip6-localnet
ff00::0    ip6-mcastprefix
ff02::1    ip6-allnodes
ff02::2    ip6-allrouters
172.17.0.2    sandbox.hortonworks.com
[maria_dev@sandbox ~]$

Then came the moment of truth: I checked how many blocks the file was made of.

[maria_dev@sandbox ~]$ hdfs fsck mutability-test/hosts.txt
Connecting to namenode via http://sandbox.hortonworks.com:50070/fsck?ugi=maria_dev&path=%2Fuser%2Fmaria_dev%2Fmutability-test%2Fhosts.txt
FSCK started by maria_dev (auth:SIMPLE) from /172.17.0.2 for path /user/maria_dev/mutability-test/hosts.txt at Tue Oct 03 19:18:30 UTC 2017
.Status: HEALTHY
 Total size:    370 B
 Total dirs:    0
 Total files:   1
 Total symlinks:                 0
 Total blocks (validated):       1 (avg. block size 370 B)
 Minimally replicated blocks:    1 (100.0 %)
 Over-replicated blocks:         0 (0.0 %)
 Under-replicated blocks:        0 (0.0 %)
 Mis-replicated blocks:          0 (0.0 %)
 Default replication factor:     1
 Average block replication:      1.0
 Corrupt blocks:                 0
 Missing replicas:               0 (0.0 %)
 Number of data-nodes:           1
 Number of racks:                1
FSCK ended at Tue Oct 03 19:18:30 UTC 2017 in 1 milliseconds


The filesystem under path '/user/maria_dev/mutability-test/hosts.txt' is HEALTHY
[maria_dev@sandbox ~]$ 

No surprise that the file was twice as long now, but lo and behold... it is (still) only made up of ONE block! I guess I'd better update the immutability speech I deliver almost every week. See... we do learn something new every day!!
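A bit of block arithmetic shows why a single block is what you would expect here. The sketch below computes ceil(size / blocksize); `expected_blocks` is a hypothetical helper, and the 134217728-byte (128 MB) default is an assumption about the `dfs.blocksize` setting, so check your own cluster's configuration. Since both 185 B and 370 B fit comfortably in one block, the result is consistent with the append writing into the existing, partially filled last block instead of starting a new one.

```shell
#!/usr/bin/env bash
# Hypothetical helper: number of blocks needed for a file of SIZE bytes,
# given a block size of BLOCKSIZE bytes (ceiling division).
# The 134217728 (128 MB) default is an ASSUMED dfs.blocksize value.
expected_blocks() {
  local size=$1
  local blocksize=${2:-134217728}
  if [ "$size" -eq 0 ]; then
    echo 0    # an empty HDFS file has no blocks
  else
    echo $(( (size + blocksize - 1) / blocksize ))
  fi
}

expected_blocks 185   # original file: prints 1
expected_blocks 370   # after the append: prints 1
```

Only once the appended data pushed the file past the block size would a second block be needed.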