Hive ACID Transactions with Partitions (A Behind-the-Scenes Perspective)

Ever since Hive Transactions have surfaced, and especially since Apache Hive 3 was released, I’ve been meaning to capture a behind-the-scenes look at the underlying delta ORC files that are created; and yes, compacted. If you are new to Hive’s ACID transactions, then the first link in this post as well as the Understanding Hive ACID Transaction Table blog posting are great places to start.

Bonus points to those who remember what ACID stands for – add a comment at the bottom of this posting if you know! If you don’t, it might be time to review RDBMS fundamentals.

 

Transactional Table DDL

Let’s create a transactional table with some Data Definition Language to test our use cases out on.

CREATE TABLE try_it (id int, a_val string, b_val string)
  PARTITIONED BY (prt string) STORED AS ORC;

desc try_it;
+--------------------------+------------+----------+
|         col_name         | data_type  | comment  |
+--------------------------+------------+----------+
| id                       | int        |          |
| a_val                    | string     |          |
| b_val                    | string     |          |
| prt                      | string     |          |
|                          | NULL       | NULL     |
| # Partition Information  | NULL       | NULL     |
| # col_name               | data_type  | comment  |
| prt                      | string     |          |
+--------------------------+------------+----------+

Check to make sure the HDFS file structure was created.

hdfs dfs -ls /warehouse/tablespace/managed/hive/
drwxrwx---+  - hive hadoop  0 2019-12-12 07:38 /wa/t/m/h/try_it

The /warehouse/tablespace/managed/hive/ path is abbreviated as /wa/t/m/h/ in the above snippet and in the remainder of this blog posting.

DML Use Cases

Let’s explore some CRUD (Create, Retrieve, Update, Delete) use cases as expressed in Data Manipulation Language.

Txn 1: INSERT Single Row

INSERT INTO try_it VALUES (1, 'noise', 'bogus', 'p1');

select * from try_it;
+------------+---------------+---------------+-------------+
| try_it.id  | try_it.a_val  | try_it.b_val  | try_it.prt  |
+------------+---------------+---------------+-------------+
| 1          | noise         | bogus         | p1          |
+------------+---------------+---------------+-------------+
1 row selected (0.515 seconds)

Verify that the p1 partition now has a delta file and that it only includes changes belonging to transaction #1 (see the delta_0000001_0000001 indicator).

Pull down this delta file and use the knowledge from https://martin.atlassian.net/wiki/spaces/lestermartin/blog/2019/12/12/1397686273 to inspect it.

You can see the single row we added into the p1 partition is present. You’ll also notice it is annotated as coming from transaction #1 as indicated by "currentTransaction":1.

Txn 2: INSERT Multiple Rows Across Multiple Partitions

Insert statements allow multiple rows to be added at once and they all belong to a single ACID transaction. This use case is to exercise that, but to make it a bit more fun we can span more than one partition.
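The exact statement from the original run isn’t reproduced here, but a representative multi-row insert (the values are illustrative, not the originals) that lands rows in the p2 and p3 partitions within a single transaction would look something like this:

```sql
-- one INSERT statement = one ACID transaction, even across partitions
INSERT INTO try_it VALUES
  (2, 'noise', 'bogus', 'p2'),
  (3, 'noise', 'bogus', 'p3');
```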

Verify that both the p2 and p3 partitions now have delta file directories/files which each contain changes belonging to transaction #2.

Pull down the delta files and inspect them.

You can now see that both the p2 and p3 record additions are linked to "currentTransaction":2. To reiterate, transaction #2 spanned multiple partitions and each partition’s delta folder/file was properly aligned with the same transaction.

Txn 3: UPDATE Multiple Rows Across Multiple Partitions

Updates are tricky with Hive Transactions as there is no real in-place update occurring. Basically, Hive deletes the record to be updated and then does a net-new insert to account for what the updated record should look like at the end of the SQL statement. This section will show what this looks like behind-the-scenes.

To make the use case more interesting, we’ll make the update span records in multiple partitions so that we can see a similar behavior to the prior use case of a particular transaction number spanning these affected partitions.

Let’s start off with the SQL.
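The original statement isn’t shown here; a sketch of an update that touches rows in all three partitions (the predicate and new column value are illustrative assumptions) could be:

```sql
-- no partition predicate, so matching rows in p1, p2 and p3
-- are all rewritten in one ACID transaction
UPDATE try_it SET b_val = 'changed' WHERE a_val = 'noise';
```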

Verify that all three partitions are modified by each having delete_delta_ and delta_ directories.

Let’s just focus on p2’s files as the other two partitions are basically the same thing.

First, look at the delete file, which shows a "currentTransaction":3 delete of the record created earlier in transaction #2 (note the "originalTransaction":2).

The delta file then shows a new "currentTransaction":3 record, which is the projection of what the just-deleted record looks like after the update statement’s modifications.

Txn 4: UPDATE Single Row (Leveraging Partitioning)

This use case is just calling out that we should include the partition virtual column in the update statement’s predicate as much as possible to make Hive’s job much easier, by only looking in the folders that can possibly be affected instead of walking the full table’s contents.
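As a sketch (the exact statement and values here are assumptions, not the original run), a partition-pruned update looks like:

```sql
-- the prt = 'p2' predicate lets Hive touch only the p2 folder
UPDATE try_it SET b_val = 'p2-only' WHERE id = 2 AND prt = 'p2';
```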

In this example, without the partition condition we would have updated all three partitions again. Make sure only the p2 partition shows delete_delta_0000004_0000004_ and delta_0000004_0000004_ folders.

There is no need to look at the ORC files for this change as it is similar to what we saw in transaction #3.

Txn 5 & 6: UPDATE Single Row to Change Partition

A student in a recent class asked if it would be possible to run an update statement that changed the partition virtual column value. I wasn’t sure and could argue it both ways. So, like most things in Hadoop, the best way to answer a question like this is to just TRY IT!
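The statement that was tried isn’t shown above; an attempt along these lines (row values and target partition are illustrative) is the kind of thing Hive rejects, since partition columns cannot be the target of a SET clause:

```sql
-- Hive 3 rejects this with a SemanticException;
-- updating partition columns is not supported
UPDATE try_it SET prt = 'p4' WHERE id = 1 AND prt = 'p1';
```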

Well, as you can see, NO JOY!! We could just run two separate transactions; one to delete the row and one to re-add it in the new partition. But then we do not have a single atomic transaction, and the responsibility for data integrity would fall to the application, or person, to make sure both are committed.
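A sketch of that two-statement workaround (row values and partition names are illustrative) would be:

```sql
-- transaction A: remove the row from its current partition
DELETE FROM try_it WHERE id = 1 AND prt = 'p1';

-- transaction B: re-insert the same data under the new partition value
INSERT INTO try_it VALUES (1, 'noise', 'bogus', 'p4');
```

If anything fails between the two statements, the row is simply gone, which is exactly the data-integrity burden described above.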

Here are the applicable HDFS details for these two transactions.

Parting Thoughts

These ACID transaction capabilities were mentioned way back during my https://martin.atlassian.net/wiki/spaces/lestermartin/blog/2015/06/11/41123865 talk a few years ago and provide a much cleaner develop-time way to address the need for mutable data in Hive’s (previously) immutable world.

A concern that surfaces is that https://martin.atlassian.net/wiki/spaces/lestermartin/blog/2014/07/11/26148906 causes not just “Namenode pressure”, but also forces the processing to read potentially lots and lots of small(ish) delta files instead of a few big(ger) “base” files. To make it worse, this work cannot be done in parallel for a given partition due to the need to walk these edit files in order so that Hive can present an accurate representation of what the end results should be.

Some of these concerns are addressed by Hive’s compaction processing, and another big helper is to leverage the MERGE command to lump many changes into a single transaction. Both of these topics will be presented in future blog posts.
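One such existing mechanism is compaction, which merges accumulated delta files back into larger base files; assuming default transaction settings, it can even be requested manually per partition:

```sql
-- merge the delta files for one partition into a new base
ALTER TABLE try_it PARTITION (prt = 'p1') COMPACT 'major';

-- watch the compaction queue and progress
SHOW COMPACTIONS;
```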