Ever since Hive Transactions surfaced, and especially since Apache Hive 3 was released, I’ve been meaning to capture a behind-the-scenes look at the underlying ORC files that are created and, yes, compacted. If you are new to Hive’s ACID transactions, then the first link in this post as well as the Understanding Hive ACID Transaction Table blog posting are great places to start.

Transactional Table DDL

Let’s create a transactional table to test our use cases against.

Code Block
languagesql
CREATE TABLE try_it (id int, a_val string, b_val string)
  PARTITIONED BY (prt string)
  STORED AS ORC;

DESC try_it;

+--------------------------+------------+----------+
|         col_name         | data_type  | comment  |
+--------------------------+------------+----------+
| id                       | int        |          |
| a_val                    | string     |          |
| b_val                    | string     |          |
| prt                      | string     |          |
|                          | NULL       | NULL     |
| # Partition Information  | NULL       | NULL     |
| # col_name               | data_type  | comment  |
| prt                      | string     |          |
+--------------------------+------------+----------+

Check to make sure the HDFS file structure was created.

Code Block
languagebash
hdfs dfs -ls /warehouse/tablespace/managed/hive/
drwxrwx---+  - hive hadoop          0 2019-12-12 07:38 /w/t/m/h/try_it
Info

The /warehouse/tablespace/managed/hive/ path is abbreviated as /w/t/m/h/ in the above snippet and in the remainder of this blog posting.
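If you want your own listings to line up with the shortened paths shown here, one option is to pipe the output through a small filter. This is only a sketch: the `abbrev` function name is made up for illustration, it assumes `sed` is available on your edge node, and it is not necessarily how the output in this post was produced.

```shell
# Hypothetical helper: shorten the managed-warehouse prefix on each line read from stdin.
abbrev() {
  sed 's|/warehouse/tablespace/managed/hive/|/w/t/m/h/|'
}

# Example usage on a cluster (left commented out because it needs a live HDFS):
# hdfs dfs -ls /warehouse/tablespace/managed/hive/ | abbrev
```

The filter only rewrites the displayed text; any path you pass to `hdfs dfs` still has to be the full one.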