...
```
INSERT INTO try_it VALUES (2, 'noise', 'bogus', 'p2'), (3, 'noise', 'bogus', 'p3');

select * from try_it;
+------------+---------------+---------------+-------------+
| try_it.id  | try_it.a_val  | try_it.b_val  | try_it.prt  |
+------------+---------------+---------------+-------------+
| 1          | noise         | bogus         | p1          |
| 2          | noise         | bogus         | p2          |
| 3          | noise         | bogus         | p3          |
+------------+---------------+---------------+-------------+
3 rows selected (0.193 seconds)
```
Verify that both the p2 and p3 partitions now have delta directories/files, each containing the changes belonging to transaction #2.
```
$ hdfs dfs -ls -R /wa/t/m/h/try_it
drwxrwx---+  - hive hadoop    0 2019-12-12 07:43 /wa/t/m/h/try_it/prt=p1
drwxrwx---+  - hive hadoop    0 2019-12-12 07:43 /wa/t/m/h/try_it/prt=p1/delta_0000001_0000001_0000
-rw-rw----+  3 hive hadoop    1 2019-12-12 07:43 /wa/t/m/h/try_it/prt=p1/delta_0000001_0000001_0000/_orc_acid_version
-rw-rw----+  3 hive hadoop  788 2019-12-12 07:43 /wa/t/m/h/try_it/prt=p1/delta_0000001_0000001_0000/bucket_00000
drwxrwx---+  - hive hadoop    0 2019-12-12 08:15 /wa/t/m/h/try_it/prt=p2
drwxrwx---+  - hive hadoop    0 2019-12-12 08:15 /wa/t/m/h/try_it/prt=p2/delta_0000002_0000002_0000
-rw-rw----+  3 hive hadoop    1 2019-12-12 08:15 /wa/t/m/h/try_it/prt=p2/delta_0000002_0000002_0000/_orc_acid_version
-rw-rw----+  3 hive hadoop  788 2019-12-12 08:15 /wa/t/m/h/try_it/prt=p2/delta_0000002_0000002_0000/bucket_00000
drwxrwx---+  - hive hadoop    0 2019-12-12 08:15 /wa/t/m/h/try_it/prt=p3
drwxrwx---+  - hive hadoop    0 2019-12-12 08:15 /wa/t/m/h/try_it/prt=p3/delta_0000002_0000002_0000
-rw-rw----+  3 hive hadoop    1 2019-12-12 08:15 /wa/t/m/h/try_it/prt=p3/delta_0000002_0000002_0000/_orc_acid_version
-rw-rw----+  3 hive hadoop  796 2019-12-12 08:15 /wa/t/m/h/try_it/prt=p3/delta_0000002_0000002_0000/bucket_00000
```
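The delta directory names themselves encode the range of transactions (write ids) they cover, in the form `delta_<minWriteId>_<maxWriteId>_<statementId>`; that is why both the p2 and p3 partitions carry a `delta_0000002_0000002_0000` folder for transaction #2. A minimal Python sketch of that naming convention (the parsing helper is our own illustration, not a Hive API):

```python
# Parse an ACID delta directory name of the form
# delta_<minWriteId>_<maxWriteId>_<statementId> into its numeric parts.
def parse_delta(name):
    _, min_write_id, max_write_id, stmt_id = name.split("_")
    return int(min_write_id), int(max_write_id), int(stmt_id)

print(parse_delta("delta_0000002_0000002_0000"))  # → (2, 2, 0)
```

Single-statement transactions produce deltas where the min and max write ids are equal, as seen here.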
Pull down the delta files and inspect them.
```
[hive@ip-172-30-8-35 ~]$ hdfs dfs -get /wa/t/m/h/try_it/prt=p2/delta_0000002_0000002_0000/bucket_00000 add2-p2
[hive@ip-172-30-8-35 ~]$ hdfs dfs -get /wa/t/m/h/try_it/prt=p3/delta_0000002_0000002_0000/bucket_00000 add2-p3
[hive@ip-172-30-8-35 ~]$ ls -l add2-*
-rw-r--r--. 1 hive hadoop 788 Dec 12 09:22 add2-p2
-rw-r--r--. 1 hive hadoop 796 Dec 12 09:22 add2-p3
$ java -jar orc-tools-1.5.1-uber.jar data add2-p2
Processing data file add2-p2 [length: 788]
{"operation":0,"originalTransaction":2,"bucket":536870912,"rowId":0,"currentTransaction":2,
"row":{"id":2,"a_val":"noise","b_val":"bogus"}}
$ java -jar orc-tools-1.5.1-uber.jar data add2-p3
Processing data file add2-p3 [length: 796]
{"operation":0,"originalTransaction":2,"bucket":536870912,"rowId":0,"currentTransaction":2,
"row":{"id":3,"a_val":"noise","b_val":"bogus"}}
```
You can now see that both the p2 and p3 record additions are linked to `"originalTransaction":2`. To reiterate, transaction #2 spanned multiple partitions, and each partition's delta directory/file was properly aligned with the same transaction.
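As an aside, the large `"bucket":536870912` value in the dumps above is not a literal bucket number; Hive packs a codec version, the bucket id, and a statement id into one integer. A sketch of that decoding, assuming the V1 bit layout (top 3 bits = codec version, bits 16–27 = bucket id, low 12 bits = statement id; this layout is our reading of Hive's BucketCodec, not something shown in the output itself):

```python
# Decode a Hive ACID encoded bucket value (assumed BucketCodec V1 layout:
# bits 29-31 = version, bits 16-27 = bucket id, bits 0-11 = statement id).
def decode_bucket(encoded):
    version = encoded >> 29
    bucket_id = (encoded >> 16) & 0xFFF
    stmt_id = encoded & 0xFFF
    return version, bucket_id, stmt_id

print(decode_bucket(536870912))  # → (1, 0, 0)
```

Under that reading, both rows landed in bucket 0 of statement 0, which matches the `bucket_00000` file names in the listing.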
Txn 3: UPDATE Multiple Rows Across Multiple Partitions
Updates are tricky with Hive transactions because no real in-place update occurs. Instead, Hive deletes the record to be updated and then performs a net-new insert reflecting what the updated record should look like at the end of the SQL statement. This section shows what that looks like behind the scenes.
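Conceptually, a single UPDATE becomes a pair of row-level events under the same new transaction: a delete event that references the original row's identity (its originalTransaction, bucket, and rowId) and an insert event that carries the new row image. A hypothetical model of that pairing (the field names mirror the ORC ACID columns dumped earlier; the operation codes 0 = insert and 2 = delete, and the updated value shown, are assumptions for illustration):

```python
# Hypothetical illustration (not Hive code) of how one UPDATE is stored:
# a delete event tombstones the old row by its unique identity, and a
# separate insert event writes the new row image under the same transaction.
update_txn = 3  # the transaction performing the UPDATE

delete_event = {
    "operation": 2,                      # assumed code for delete
    "originalTransaction": 2,            # txn that created the row being updated
    "bucket": 0,
    "rowId": 0,
    "currentTransaction": update_txn,
    "row": None,                         # delete events carry no row data
}
insert_event = {
    "operation": 0,                      # assumed code for insert
    "originalTransaction": update_txn,   # re-inserted row is "born" in txn 3
    "bucket": 0,
    "rowId": 0,
    "currentTransaction": update_txn,
    "row": {"id": 2, "a_val": "changed", "b_val": "bogus"},  # hypothetical new image
}

# Both halves of the update share one currentTransaction.
print(delete_event["currentTransaction"] == insert_event["currentTransaction"])  # → True
```

The key takeaway is that the delete half points backward (to the transaction that originally wrote the row) while the insert half points at the updating transaction itself.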
To make the use case more interesting, we'll have the update span records in multiple partitions so that we can see behavior similar to the prior use case, with a single transaction number spanning the affected partitions.
Let’s start off with the SQL.