Ever since Hive transactions surfaced, and especially since Apache Hive 3 was released, I’ve been meaning to capture a behind-the-scenes look at the underlying ORC files that are created and, yes, compacted. If you are new to Hive’s ACID transactions, the first link in this post as well as the Understanding Hive ACID Transaction Table blog posting are great places to start.
Transactional Table DDL
Let’s create a transactional table to test our use cases out on.
```
CREATE TABLE try_it (id int, a_val string, b_val string)
PARTITIONED BY (prt string) STORED AS ORC;

desc try_it;
+--------------------------+------------+----------+
|         col_name         | data_type  | comment  |
+--------------------------+------------+----------+
| id                       | int        |          |
| a_val                    | string     |          |
| b_val                    | string     |          |
| prt                      | string     |          |
|                          | NULL       | NULL     |
| # Partition Information  | NULL       | NULL     |
| # col_name               | data_type  | comment  |
| prt                      | string     |          |
+--------------------------+------------+----------+
```
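Note that the DDL above does not name any transactional properties. On Hive 3, managed ORC tables are created as full ACID transactional tables by default, which is what this walkthrough relies on. As a sketch, the equivalent table with the property spelled out explicitly (useful on older Hive versions, where the default may differ) would look like this:

```sql
-- Explicitly declare a full ACID transactional table;
-- on Hive 3 a managed ORC table gets this property by default.
CREATE TABLE try_it (id int, a_val string, b_val string)
PARTITIONED BY (prt string)
STORED AS ORC
TBLPROPERTIES ('transactional'='true');
```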
Check to make sure the HDFS directory structure was created.
```
hdfs dfs -ls /warehouse/tablespace/managed/hive/
drwxrwx---+  - hive hadoop  0 2019-12-12 07:38 /w/t/m/h/try_it
```
Info: The /warehouse/tablespace/managed/hive/ path is abbreviated as /w/t/m/h/ in the above snippet and in the remainder of this blog posting.
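To confirm that the table really did come up as a transactional table (an assumption the rest of this walkthrough depends on), the table properties can be inspected from Hive; a minimal sketch:

```sql
-- Inspect the properties Hive assigned to the table;
-- on Hive 3 a managed ORC table typically shows
-- transactional=true among the listed properties.
SHOW TBLPROPERTIES try_it;
```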