...
```
hdfs dfs -ls /warehouse/tablespace/managed/hive/
drwxrwx---+  - hive hadoop  0 2019-12-12 07:38 /wwa/t/m/h/try_it
```
...
There is no need to look at the ORC files for this change as it is similar to what we saw in transaction #3.
Txn 5 & 6: UPDATE Single Row to Change Partition
A student in a recent class asked if it would be possible to run an UPDATE statement that changed the partition virtual column value. I wasn't sure and could argue it both ways. So, like most things in Hadoop, the best way to answer a question like this is to just TRY IT!
```
UPDATE try_it SET prt = 'p3' WHERE a_val = 'noise' AND prt = 'p1';
Error: Error while compiling statement: FAILED: SemanticException [Error 10292]:
Updating values of partition columns is not supported (state=42000,code=10292)
```
Well, as you can see, NO JOY!! We could, however, just run two separate transactions; one to delete the row and one to add it back in the new partition. But then we no longer have a single atomic transaction, and the responsibility for data integrity falls to the application, or person, to make sure both are committed.
```
DELETE FROM try_it WHERE a_val = 'noise' AND prt = 'p1';
INSERT INTO try_it VALUES (1,'noise','bogus2','p3');
select * from try_it;
+------------+---------------+---------------+-------------+
| try_it.id  | try_it.a_val  | try_it.b_val  | try_it.prt  |
+------------+---------------+---------------+-------------+
| 2          | noise         | bogus3        | p2          |
| 3          | noise         | bogus2        | p3          |
| 1          | noise         | bogus2        | p3          |
+------------+---------------+---------------+-------------+
3 rows selected (0.227 seconds)
```
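The integrity risk of splitting the change across two transactions can be sketched outside of Hive. The toy `Table` class below is purely illustrative (not a Hive API, and the row values are hypothetical); the point is that each statement commits on its own, so a failure landing between them loses the row:

```python
# Toy illustration (NOT a Hive API): each method commits independently,
# just like running DELETE and INSERT as two separate Hive transactions.
class Table:
    def __init__(self, rows):
        self.rows = list(rows)

    def delete(self, pred):           # "transaction" 1: commits on its own
        self.rows = [r for r in self.rows if not pred(r)]

    def insert(self, row):            # "transaction" 2: commits on its own
        self.rows.append(row)

t = Table([(1, "noise", "bogus", "p1"), (2, "noise", "bogus3", "p2")])
t.delete(lambda r: r[1] == "noise" and r[3] == "p1")
# If the process dies right here, the delete is already committed but the
# insert never happens -- the "moved" row is simply gone.
t.insert((1, "noise", "bogus2", "p3"))
print(t.rows)  # [(2, 'noise', 'bogus3', 'p2'), (1, 'noise', 'bogus2', 'p3')]
```

A MERGE-style single transaction would remove that window entirely, which is why the atomicity caveat above matters.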
Here are the applicable HDFS details for these two transactions.
```
drwxrwx---+  - hive hadoop    0 2019-12-12 09:57 /wa/t/m/h/try_it/prt=p1/delete_delta_0000005_0000005_0000
-rw-rw----+  3 hive hadoop    1 2019-12-12 09:57 /wa/t/m/h/try_it/prt=p1/delete_delta_0000005_0000005_0000/_orc_acid_version
-rw-rw----+  3 hive hadoop  733 2019-12-12 09:57 /wa/t/m/h/try_it/prt=p1/delete_delta_0000005_0000005_0000/bucket_00000
drwxrwx---+  - hive hadoop    0 2019-12-12 10:06 /wa/t/m/h/try_it/prt=p3/delta_0000006_0000006_0000
-rw-rw----+  3 hive hadoop    1 2019-12-12 10:06 /wa/t/m/h/try_it/prt=p3/delta_0000006_0000006_0000/_orc_acid_version
-rw-rw----+  3 hive hadoop  813 2019-12-12 10:06 /wa/t/m/h/try_it/prt=p3/delta_0000006_0000006_0000/bucket_00000
```
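Notice how the directory names above encode the transaction range that produced them: the delete from transaction 5 lands in `delete_delta_0000005_0000005_0000` under `prt=p1`, while the insert from transaction 6 lands in `delta_0000006_0000006_0000` under `prt=p3`. A small parsing sketch (a hypothetical helper, not part of Hive) makes the `delta_<min>_<max>_<stmt>` naming pattern explicit:

```python
import re

# Hypothetical helper (not part of Hive): pull apart the ACID directory
# naming convention delta_<min>_<max>_<stmt> / delete_delta_<min>_<max>_<stmt>.
DELTA_RE = re.compile(r"(delete_delta|delta)_(\d+)_(\d+)_(\d+)$")

def parse_delta(path):
    m = DELTA_RE.search(path)
    if not m:
        return None
    kind, lo, hi, stmt = m.groups()
    return {"kind": kind, "min_txn": int(lo), "max_txn": int(hi), "stmt": int(stmt)}

print(parse_delta("/wa/t/m/h/try_it/prt=p1/delete_delta_0000005_0000005_0000"))
# {'kind': 'delete_delta', 'min_txn': 5, 'max_txn': 5, 'stmt': 0}
print(parse_delta("/wa/t/m/h/try_it/prt=p3/delta_0000006_0000006_0000"))
# {'kind': 'delta', 'min_txn': 6, 'max_txn': 6, 'stmt': 0}
```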
Parting Thoughts
These ACID transaction capabilities were mentioned way back in my Hadoop Summit talk (archiving evolving databases in hive) a few years ago, and they provide a much cleaner develop-time way to address the need for mutable data in Hive's (previously) immutable world.
A concern that surfaces is small files (see small files and hadoop's hdfs (bonus: an inode formula)): they cause not just "Namenode pressure", but also force processing to read potentially lots and lots of small(ish) delta files instead of a few big(ger) "base" files. To make it worse, this work cannot be done in parallel for a given partition, because these edit files must be walked in order for Hive to present an accurate representation of the end result.
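That ordering requirement can be sketched conceptually. The event replay below is an illustration of the idea only, not Hive's actual ORC ACID reader: because a later delete can cancel an earlier insert for the same row, the deltas for a partition must be applied in transaction order, which is what defeats parallelism within the partition.

```python
# Conceptual sketch of why deltas must be replayed in transaction order
# (illustration only, NOT Hive's actual ORC ACID reader).
# Each event is (txn_id, op, row_id, row); op is "insert" or "delete".
def apply_deltas(base, events):
    rows = dict(base)                                # row_id -> row
    for txn_id, op, row_id, row in sorted(events):   # walk in txn-id order
        if op == "insert":
            rows[row_id] = row
        elif op == "delete":
            rows.pop(row_id, None)
    return rows

events = [
    (6, "insert", "r1", (1, "noise", "bogus2", "p3")),  # txn 6: re-insert in p3
    (1, "insert", "r1", (1, "noise", "bogus", "p1")),   # txn 1: original row
    (5, "delete", "r1", None),                          # txn 5: delete from p1
]
# Replayed in txn order (1, 5, 6) the row survives in p3; if the delete
# were wrongly applied last, the row would disappear entirely.
print(apply_deltas({}, events))  # {'r1': (1, 'noise', 'bogus2', 'p3')}
```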
Some of these concerns are addressed by Hive's delta file compaction (minor and major) processing, and another big helper is to leverage the MERGE command to lump many changes into a single transaction. Both of these topics will be presented in future blog posts.