This is a quick blog post showing how minor and major compaction of Hive transactional tables occurs. Let's start from the state that the hive acid transactions with partitions (a behind the scenes perspective) post leaves us in. Here it is!
```bash
$ hdfs dfs -ls -R /wa/t/m/h/try_it
drwxrwx---+ - hive hadoop 0 2019-12-12 09:57 /wa/t/m/h/try_it/prt=p1
drwxrwx---+ - hive hadoop 0 2019-12-12 09:34 /wa/t/m/h/try_it/prt=p1/delete_delta_0000003_0000003_0000
-rw-rw----+ 3 hive hadoop 1 2019-12-12 09:34 /wa/t/m/h/try_it/prt=p1/delete_delta_0000003_0000003_0000/_orc_acid_version
-rw-rw----+ 3 hive hadoop 733 2019-12-12 09:34 /wa/t/m/h/try_it/prt=p1/delete_delta_0000003_0000003_0000/bucket_00000
drwxrwx---+ - hive hadoop 0 2019-12-12 09:57 /wa/t/m/h/try_it/prt=p1/delete_delta_0000005_0000005_0000
-rw-rw----+ 3 hive hadoop 1 2019-12-12 09:57 /wa/t/m/h/try_it/prt=p1/delete_delta_0000005_0000005_0000/_orc_acid_version
-rw-rw----+ 3 hive hadoop 733 2019-12-12 09:57 /wa/t/m/h/try_it/prt=p1/delete_delta_0000005_0000005_0000/bucket_00000
drwxrwx---+ - hive hadoop 0 2019-12-12 07:43 /wa/t/m/h/try_it/prt=p1/delta_0000001_0000001_0000
-rw-rw----+ 3 hive hadoop 1 2019-12-12 07:43 /wa/t/m/h/try_it/prt=p1/delta_0000001_0000001_0000/_orc_acid_version
-rw-rw----+ 3 hive hadoop 788 2019-12-12 07:43 /wa/t/m/h/try_it/prt=p1/delta_0000001_0000001_0000/bucket_00000
drwxrwx---+ - hive hadoop 0 2019-12-12 09:34 /wa/t/m/h/try_it/prt=p1/delta_0000003_0000003_0000
-rw-rw----+ 3 hive hadoop 1 2019-12-12 09:34 /wa/t/m/h/try_it/prt=p1/delta_0000003_0000003_0000/_orc_acid_version
-rw-rw----+ 3 hive hadoop 816 2019-12-12 09:34 /wa/t/m/h/try_it/prt=p1/delta_0000003_0000003_0000/bucket_00000
drwxrwx---+ - hive hadoop 0 2019-12-12 09:45 /wa/t/m/h/try_it/prt=p2
drwxrwx---+ - hive hadoop 0 2019-12-12 09:34 /wa/t/m/h/try_it/prt=p2/delete_delta_0000003_0000003_0000
-rw-rw----+ 3 hive hadoop 1 2019-12-12 09:34 /wa/t/m/h/try_it/prt=p2/delete_delta_0000003_0000003_0000/_orc_acid_version
-rw-rw----+ 3 hive hadoop 727 2019-12-12 09:34 /wa/t/m/h/try_it/prt=p2/delete_delta_0000003_0000003_0000/bucket_00000
drwxrwx---+ - hive hadoop 0 2019-12-12 09:45 /wa/t/m/h/try_it/prt=p2/delete_delta_0000004_0000004_0000
-rw-rw----+ 3 hive hadoop 1 2019-12-12 09:44 /wa/t/m/h/try_it/prt=p2/delete_delta_0000004_0000004_0000/_orc_acid_version
-rw-rw----+ 3 hive hadoop 733 2019-12-12 09:44 /wa/t/m/h/try_it/prt=p2/delete_delta_0000004_0000004_0000/bucket_00000
drwxrwx---+ - hive hadoop 0 2019-12-12 08:15 /wa/t/m/h/try_it/prt=p2/delta_0000002_0000002_0000
-rw-rw----+ 3 hive hadoop 1 2019-12-12 08:15 /wa/t/m/h/try_it/prt=p2/delta_0000002_0000002_0000/_orc_acid_version
-rw-rw----+ 3 hive hadoop 788 2019-12-12 08:15 /wa/t/m/h/try_it/prt=p2/delta_0000002_0000002_0000/bucket_00000
drwxrwx---+ - hive hadoop 0 2019-12-12 09:34 /wa/t/m/h/try_it/prt=p2/delta_0000003_0000003_0000
-rw-rw----+ 3 hive hadoop 1 2019-12-12 09:34 /wa/t/m/h/try_it/prt=p2/delta_0000003_0000003_0000/_orc_acid_version
-rw-rw----+ 3 hive hadoop 816 2019-12-12 09:34 /wa/t/m/h/try_it/prt=p2/delta_0000003_0000003_0000/bucket_00000
drwxrwx---+ - hive hadoop 0 2019-12-12 09:45 /wa/t/m/h/try_it/prt=p2/delta_0000004_0000004_0000
-rw-rw----+ 3 hive hadoop 1 2019-12-12 09:44 /wa/t/m/h/try_it/prt=p2/delta_0000004_0000004_0000/_orc_acid_version
-rw-rw----+ 3 hive hadoop 815 2019-12-12 09:44 /wa/t/m/h/try_it/prt=p2/delta_0000004_0000004_0000/bucket_00000
drwxrwx---+ - hive hadoop 0 2019-12-12 10:06 /wa/t/m/h/try_it/prt=p3
drwxrwx---+ - hive hadoop 0 2019-12-12 09:34 /wa/t/m/h/try_it/prt=p3/delete_delta_0000003_0000003_0000
-rw-rw----+ 3 hive hadoop 1 2019-12-12 09:34 /wa/t/m/h/try_it/prt=p3/delete_delta_0000003_0000003_0000/_orc_acid_version
-rw-rw----+ 3 hive hadoop 727 2019-12-12 09:34 /wa/t/m/h/try_it/prt=p3/delete_delta_0000003_0000003_0000/bucket_00000
drwxrwx---+ - hive hadoop 0 2019-12-12 08:15 /wa/t/m/h/try_it/prt=p3/delta_0000002_0000002_0000
-rw-rw----+ 3 hive hadoop 1 2019-12-12 08:15 /wa/t/m/h/try_it/prt=p3/delta_0000002_0000002_0000/_orc_acid_version
-rw-rw----+ 3 hive hadoop 796 2019-12-12 08:15 /wa/t/m/h/try_it/prt=p3/delta_0000002_0000002_0000/bucket_00000
drwxrwx---+ - hive hadoop 0 2019-12-12 09:34 /wa/t/m/h/try_it/prt=p3/delta_0000003_0000003_0000
-rw-rw----+ 3 hive hadoop 1 2019-12-12 09:34 /wa/t/m/h/try_it/prt=p3/delta_0000003_0000003_0000/_orc_acid_version
-rw-rw----+ 3 hive hadoop 807 2019-12-12 09:34 /wa/t/m/h/try_it/prt=p3/delta_0000003_0000003_0000/bucket_00000
drwxrwx---+ - hive hadoop 0 2019-12-12 10:06 /wa/t/m/h/try_it/prt=p3/delta_0000006_0000006_0000
-rw-rw----+ 3 hive hadoop 1 2019-12-12 10:06 /wa/t/m/h/try_it/prt=p3/delta_0000006_0000006_0000/_orc_acid_version
-rw-rw----+ 3 hive hadoop 813 2019-12-12 10:06 /wa/t/m/h/try_it/prt=p3/delta_0000006_0000006_0000/bucket_00000
```
As that blog post, and the directory listing above, show, a total of six ACID transactions occurred across three partitions to get to this point.
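As a quick sanity check on that count, here is a small Python sketch (my own, not from the original post) that parses the delta directory names from the listing above. Directory names follow the pattern delta_<min>_<max>_<statement> (or delete_delta_...), so the distinct write IDs and partitions fall right out:

```python
# Parse delta/delete_delta directory names from the "hdfs dfs -ls -R"
# listing above to recover the set of transactions and partitions.
import re

listing = """
/wa/t/m/h/try_it/prt=p1/delete_delta_0000003_0000003_0000
/wa/t/m/h/try_it/prt=p1/delete_delta_0000005_0000005_0000
/wa/t/m/h/try_it/prt=p1/delta_0000001_0000001_0000
/wa/t/m/h/try_it/prt=p1/delta_0000003_0000003_0000
/wa/t/m/h/try_it/prt=p2/delete_delta_0000003_0000003_0000
/wa/t/m/h/try_it/prt=p2/delete_delta_0000004_0000004_0000
/wa/t/m/h/try_it/prt=p2/delta_0000002_0000002_0000
/wa/t/m/h/try_it/prt=p2/delta_0000003_0000003_0000
/wa/t/m/h/try_it/prt=p2/delta_0000004_0000004_0000
/wa/t/m/h/try_it/prt=p3/delete_delta_0000003_0000003_0000
/wa/t/m/h/try_it/prt=p3/delta_0000002_0000002_0000
/wa/t/m/h/try_it/prt=p3/delta_0000003_0000003_0000
/wa/t/m/h/try_it/prt=p3/delta_0000006_0000006_0000
"""

# Match the partition name and the minimum write ID in each directory name.
pat = re.compile(r"prt=(p\d+)/(?:delete_)?delta_(\d+)_\d+_\d+")
txns = sorted({int(m.group(2)) for m in pat.finditer(listing)})
parts = sorted({m.group(1) for m in pat.finditer(listing)})
print(txns)   # [1, 2, 3, 4, 5, 6] -- six transactions
print(parts)  # ['p1', 'p2', 'p3'] -- three partitions
```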
As https://cwiki.apache.org/confluence/display/Hive/Configuration+Properties#ConfigurationProperties-Compactor shows, we need to make sure hive.compactor.initiator.on is set to true for the compactor to run.
Minor Compaction
This type of compaction is scheduled automatically once the number of delta directories passes the value set in the hive.compactor.delta.num.threshold property, but you can also trigger it on demand.
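To make the scheduling rule concrete, here is a deliberately simplified sketch (mine, not the actual Hive Initiator code, which also weighs delta sizes) of the delta-count check, assuming the documented default threshold of 10:

```python
# Simplified model of the compaction Initiator's delta-count rule:
# queue a minor compaction once a partition accumulates more delta
# directories than hive.compactor.delta.num.threshold (default 10).
def needs_minor_compaction(num_delta_dirs: int, threshold: int = 10) -> bool:
    """Return True when the delta-directory count exceeds the threshold."""
    return num_delta_dirs > threshold

# Partition p1 above has only 4 delta/delete_delta directories, so
# automatic minor compaction would not fire yet -- which is why we
# trigger it on demand instead.
print(needs_minor_compaction(4))   # False
print(needs_minor_compaction(11))  # True
```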
```sql
ALTER TABLE try_it COMPACT 'minor';
ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask.
You must specify a partition to compact for partitioned tables
```
This error makes a helpful point: for a partitioned table, compaction must be run against a specific partition. Let's try it again on a single partition, and be sure to wait until it completes!
```sql
ALTER TABLE try_it partition (prt='p1') COMPACT 'minor';
show compactions;
+---------------+-----------+----------+------------+--------+------------+-----------+----------------+---------------+-------------------------+
| compactionid  | dbname    | tabname  | partname   | type   | state      | workerid  | starttime      | duration      | hadoopjobid             |
+---------------+-----------+----------+------------+--------+------------+-----------+----------------+---------------+-------------------------+
| CompactionId  | Database  | Table    | Partition  | Type   | State      | Worker    | Start Time     | Duration(ms)  | HadoopJobId             |
| 1             | default   | try_it   | prt=p1     | MINOR  | succeeded  | ---       | 1576145642000  | 179000        | job_1575915931720_0012  |
+---------------+-----------+----------+------------+--------+------------+-----------+----------------+---------------+-------------------------+
2 rows selected (0.031 seconds)
```
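The timestamp columns in that output are a bit opaque: starttime is epoch milliseconds and duration is milliseconds. A tiny helper (my own, not part of Hive) makes them readable, using the exact values from the row above:

```python
# Convert the SHOW COMPACTIONS epoch-millisecond fields into
# human-readable values.
from datetime import datetime, timezone

start_ms, duration_ms = 1576145642000, 179000  # values from the row above
start = datetime.fromtimestamp(start_ms / 1000, tz=timezone.utc)
print(start.isoformat())             # 2019-12-12T10:14:02+00:00
print(f"{duration_ms / 1000:.0f}s")  # 179s, i.e. about 3 minutes
```

So the compaction kicked off shortly after our last transaction and ran for about three minutes, which lines up with the 10:16 timestamps on the compacted files below.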
Let’s look at the file system again for the p1 partition.
```bash
$ hdfs dfs -ls -R /wa/t/m/h/try_it/prt=p1
drwxrwx---+ - hive hadoop 0 2019-12-12 10:16 /wa/t/m/h/try_it/prt=p1/delete_delta_0000001_0000005
-rw-rw----+ 3 hive hadoop 1 2019-12-12 10:16 /wa/t/m/h/try_it/prt=p1/delete_delta_0000001_0000005/_orc_acid_version
-rw-rw----+ 3 hive hadoop 654 2019-12-12 10:16 /wa/t/m/h/try_it/prt=p1/delete_delta_0000001_0000005/bucket_00000
drwxrwx---+ - hive hadoop 0 2019-12-12 10:16 /wa/t/m/h/try_it/prt=p1/delta_0000001_0000005
-rw-rw----+ 3 hive hadoop 1 2019-12-12 10:16 /wa/t/m/h/try_it/prt=p1/delta_0000001_0000005/_orc_acid_version
-rw-rw----+ 3 hive hadoop 670 2019-12-12 10:16 /wa/t/m/h/try_it/prt=p1/delta_0000001_0000005/bucket_00000
```
Since this partition was changed by transaction #s 1, 3, and 5, we now see the delta directories rolled together into versions spanning those transaction #s. Let's verify that the delta and delete_delta directories each hold the rolled-up details in a single file.
```bash
$ hdfs dfs -get /wa/t/m/h/try_it/prt=p1/delete_delta_0000001_0000005/bucket_00000 p1Minor-delete_delta
$ java -jar orc-tools-1.5.1-uber.jar data p1Minor-delete_delta
Processing data file p1Minor-delete_delta [length: 654]
{"operation":2,"originalTransaction":1,"bucket":536870912,"rowId":0,"currentTransaction":3,
"row":null}
{"operation":2,"originalTransaction":3,"bucket":536870912,"rowId":0,"currentTransaction":5,
"row":null}
$ hdfs dfs -get /wa/t/m/h/try_it/prt=p1/delta_0000001_0000005/bucket_00000 p1Minor-delta
$ java -jar orc-tools-1.5.1-uber.jar data p1Minor-delta
Processing data file p1Minor-delta [length: 670]
{"operation":0,"originalTransaction":1,"bucket":536870912,"rowId":0,"currentTransaction":1,
"row":{"id":1,"a_val":"noise","b_val":"bogus"}}
{"operation":0,"originalTransaction":3,"bucket":536870912,"rowId":0,"currentTransaction":3,
"row":{"id":1,"a_val":"noise","b_val":"bogus2"}}
```
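It is worth pausing on how a reader reconciles those two files. The following Python sketch is my simplification of the ORC ACID read-side merge (the real reader also respects transaction visibility): insert events are matched against delete events on the (originalTransaction, bucket, rowId) key, using the exact records dumped above:

```python
# Reconcile the compacted delta with its delete_delta: an inserted row
# is invisible if a delete event carries the same
# (originalTransaction, bucket, rowId) key.
deletes = [  # operation=2 events from delete_delta_0000001_0000005
    {"originalTransaction": 1, "bucket": 536870912, "rowId": 0},
    {"originalTransaction": 3, "bucket": 536870912, "rowId": 0},
]
inserts = [  # operation=0 events from delta_0000001_0000005
    {"originalTransaction": 1, "bucket": 536870912, "rowId": 0,
     "row": {"id": 1, "a_val": "noise", "b_val": "bogus"}},
    {"originalTransaction": 3, "bucket": 536870912, "rowId": 0,
     "row": {"id": 1, "a_val": "noise", "b_val": "bogus2"}},
]

deleted_keys = {(d["originalTransaction"], d["bucket"], d["rowId"])
                for d in deletes}
visible = [i["row"] for i in inserts
           if (i["originalTransaction"], i["bucket"], i["rowId"])
           not in deleted_keys]
print(visible)  # [] -- both rows ever written to p1 were later deleted
```

Both insert events are cancelled by a matching delete event, so a query against the p1 partition sees no rows from these files, which is exactly what the delete in transaction 5 left behind.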