...
```
INSERT INTO try_it VALUES (2, 'noise', 'bogus', 'p2'), (3, 'noise', 'bogus', 'p3');

select * from try_it;
+------------+---------------+---------------+-------------+
| try_it.id  | try_it.a_val  | try_it.b_val  | try_it.prt  |
+------------+---------------+---------------+-------------+
| 1          | noise         | bogus         | p1          |
| 2          | noise         | bogus         | p2          |
| 3          | noise         | bogus         | p3          |
+------------+---------------+---------------+-------------+
3 rows selected (0.193 seconds)
```
Verify that both the p2 and p3 partitions now have delta directories/files, each containing the changes belonging to transaction #2.
```
$ hdfs dfs -ls -R /wa/t/m/h/try_it
drwxrwx---+  - hive hadoop    0 2019-12-12 07:43 /wa/t/m/h/try_it/prt=p1
drwxrwx---+  - hive hadoop    0 2019-12-12 07:43 /wa/t/m/h/try_it/prt=p1/delta_0000001_0000001_0000
-rw-rw----+  3 hive hadoop    1 2019-12-12 07:43 /wa/t/m/h/try_it/prt=p1/delta_0000001_0000001_0000/_orc_acid_version
-rw-rw----+  3 hive hadoop  788 2019-12-12 07:43 /wa/t/m/h/try_it/prt=p1/delta_0000001_0000001_0000/bucket_00000
drwxrwx---+  - hive hadoop    0 2019-12-12 08:15 /wa/t/m/h/try_it/prt=p2
drwxrwx---+  - hive hadoop    0 2019-12-12 08:15 /wa/t/m/h/try_it/prt=p2/delta_0000002_0000002_0000
-rw-rw----+  3 hive hadoop    1 2019-12-12 08:15 /wa/t/m/h/try_it/prt=p2/delta_0000002_0000002_0000/_orc_acid_version
-rw-rw----+  3 hive hadoop  788 2019-12-12 08:15 /wa/t/m/h/try_it/prt=p2/delta_0000002_0000002_0000/bucket_00000
drwxrwx---+  - hive hadoop    0 2019-12-12 08:15 /wa/t/m/h/try_it/prt=p3
drwxrwx---+  - hive hadoop    0 2019-12-12 08:15 /wa/t/m/h/try_it/prt=p3/delta_0000002_0000002_0000
-rw-rw----+  3 hive hadoop    1 2019-12-12 08:15 /wa/t/m/h/try_it/prt=p3/delta_0000002_0000002_0000/_orc_acid_version
-rw-rw----+  3 hive hadoop  796 2019-12-12 08:15 /wa/t/m/h/try_it/prt=p3/delta_0000002_0000002_0000/bucket_00000
```
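The delta directory names themselves encode the range of transactions (write ids) they cover, in the form `delta_<minWriteId>_<maxWriteId>_<statementId>`; that is why both the p2 and p3 partitions carry a `delta_0000002_0000002_0000` folder for transaction #2. A minimal Python sketch of that naming convention (the parsing helper is our own illustration, not a Hive API):

```python
# Parse an ACID delta directory name of the form
# delta_<minWriteId>_<maxWriteId>_<statementId> into its numeric parts.
def parse_delta(name):
    _, min_write_id, max_write_id, stmt_id = name.split("_")
    return int(min_write_id), int(max_write_id), int(stmt_id)

print(parse_delta("delta_0000002_0000002_0000"))  # → (2, 2, 0)
```

Single-statement transactions produce deltas where the min and max write ids are equal, as seen here.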
Pull down the delta files and inspect them.
```
[hive@ip-172-30-8-35 ~]$ hdfs dfs -get /wa/t/m/h/try_it/prt=p2/delta_0000002_0000002_0000/bucket_00000 add2-p2
[hive@ip-172-30-8-35 ~]$ hdfs dfs -get /wa/t/m/h/try_it/prt=p3/delta_0000002_0000002_0000/bucket_00000 add2-p3
[hive@ip-172-30-8-35 ~]$ ls -l add2-*
-rw-r--r--. 1 hive hadoop 788 Dec 12 09:22 add2-p2
-rw-r--r--. 1 hive hadoop 796 Dec 12 09:22 add2-p3
$ java -jar orc-tools-1.5.1-uber.jar data add2-p2
Processing data file add2-p2 [length: 788]
{"operation":0,"originalTransaction":2,"bucket":536870912,"rowId":0,"currentTransaction":2,
"row":{"id":2,"a_val":"noise","b_val":"bogus"}}
$ java -jar orc-tools-1.5.1-uber.jar data add2-p3
Processing data file add2-p3 [length: 796]
{"operation":0,"originalTransaction":2,"bucket":536870912,"rowId":0,"currentTransaction":2,
"row":{"id":3,"a_val":"noise","b_val":"bogus"}}
```
You can now see that both the p2 and p3 record additions are linked to `"originalTransaction":2`. To reiterate, transaction #2 spanned multiple partitions, and each partition's delta directory/file was properly aligned with the same transaction.
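As an aside, the large `"bucket":536870912` value in the dumps above is not a literal bucket number; Hive packs a codec version, the bucket id, and a statement id into one integer. A sketch of that decoding, assuming the V1 bit layout (top 3 bits = codec version, bits 16–27 = bucket id, low 12 bits = statement id; this layout is our reading of Hive's BucketCodec, not something shown in the output itself):

```python
# Decode a Hive ACID encoded bucket value (assumed BucketCodec V1 layout:
# bits 29-31 = version, bits 16-27 = bucket id, bits 0-11 = statement id).
def decode_bucket(encoded):
    version = encoded >> 29
    bucket_id = (encoded >> 16) & 0xFFF
    stmt_id = encoded & 0xFFF
    return version, bucket_id, stmt_id

print(decode_bucket(536870912))  # → (1, 0, 0)
```

Under that reading, both rows landed in bucket 0 of statement 0, which matches the `bucket_00000` file names in the listing.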
Txn 3: UPDATE Multiple Rows Across Multiple Partitions
Updates are tricky with Hive transactions because no real in-place update occurs. Instead, Hive deletes the record to be updated and then performs a net-new insert reflecting what the updated record should look like at the end of the SQL statement. This section shows what that looks like behind the scenes.
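Conceptually, a single UPDATE becomes a pair of row-level events under the same new transaction: a delete event that references the original row's identity (its originalTransaction, bucket, and rowId) and an insert event that carries the new row image. A hypothetical model of that pairing (the field names mirror the ORC ACID columns dumped earlier; the operation codes 0 = insert and 2 = delete, and the updated value shown, are assumptions for illustration):

```python
# Hypothetical illustration (not Hive code) of how one UPDATE is stored:
# a delete event tombstones the old row by its unique identity, and a
# separate insert event writes the new row image under the same transaction.
update_txn = 3  # the transaction performing the UPDATE

delete_event = {
    "operation": 2,                      # assumed code for delete
    "originalTransaction": 2,            # txn that created the row being updated
    "bucket": 0,
    "rowId": 0,
    "currentTransaction": update_txn,
    "row": None,                         # delete events carry no row data
}
insert_event = {
    "operation": 0,                      # assumed code for insert
    "originalTransaction": update_txn,   # re-inserted row is "born" in txn 3
    "bucket": 0,
    "rowId": 0,
    "currentTransaction": update_txn,
    "row": {"id": 2, "a_val": "changed", "b_val": "bogus"},  # hypothetical new image
}

# Both halves of the update share one currentTransaction.
print(delete_event["currentTransaction"] == insert_event["currentTransaction"])  # → True
```

The key takeaway is that the delete half points backward (to the transaction that originally wrote the row) while the insert half points at the updating transaction itself.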
To make the use case more interesting, we'll have the update span records in multiple partitions so that we can see behavior similar to the prior use case, with a single transaction number spanning the affected partitions.
Let’s start off with the SQL.