Ever since Hive transactions surfaced, and especially since Apache Hive 3 was released, I’ve been meaning to capture a behind-the-scenes look at the underlying ORC files that are created and, yes, compacted. If you are new to Hive’s ACID transactions, the first link in this post as well as the Understanding Hive ACID Transaction Table blog posting are great places to start.
Transactional Table DDL
Let’s create a transactional table to test our use cases out on.
```
CREATE TABLE try_it (id int, a_val string, b_val string)
PARTITIONED BY (prt string) STORED AS ORC;

desc try_it;
+--------------------------+------------+----------+
|         col_name         | data_type  | comment  |
+--------------------------+------------+----------+
| id                       | int        |          |
| a_val                    | string     |          |
| b_val                    | string     |          |
| prt                      | string     |          |
|                          | NULL       | NULL     |
| # Partition Information  | NULL       | NULL     |
| # col_name               | data_type  | comment  |
| prt                      | string     |          |
+--------------------------+------------+----------+
```
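Note that the DDL above does not name any transactional properties. On Hive 3, managed ORC tables are created as full ACID transactional tables by default, which is what this walkthrough relies on. As a sketch, the equivalent table with the property spelled out explicitly (useful on older Hive versions, where the default may differ) would look like this:

```sql
-- Explicitly declare a full ACID transactional table;
-- on Hive 3 a managed ORC table gets this property by default.
CREATE TABLE try_it (id int, a_val string, b_val string)
PARTITIONED BY (prt string)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
```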
Check to make sure the HDFS directory structure was created.
```
hdfs dfs -ls /warehouse/tablespace/managed/hive/
drwxrwx---+  - hive hadoop  0 2019-12-12 07:38 /w/t/m/h/try_it
```
Info: The /warehouse/tablespace/managed/hive/ path is abbreviated as /w/t/m/h/ in the above snippet and in the remainder of this blog posting.
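To confirm that the table really did come up as a transactional table (an assumption the rest of this walkthrough depends on), the table properties can be inspected from Hive; a minimal sketch:

```sql
-- Inspect the properties Hive assigned to the table;
-- on Hive 3 a managed ORC table typically shows
-- transactional=true among the listed properties.
SHOW TBLPROPERTIES try_it;
```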