Trino Compared to Other Big Data SQL Engines

WORK IN PROGRESS (FOR SURE!)

Parallel Processing Concepts

| Concept | Trino | Hive on Tez | Spark SQL |
|---|---|---|---|
| Validated string of SQL | Query | Query | Query |
| Parallelized slice of input data | Split | Block | Partition |
| Individual operations | Operator | ?Operator? | ?Function? |
| Pipeline of narrow operations working on a slice of data | Task (can be broken into more than one "Driver" internally) | Mapper or Reducer | Task |
| Collection of same-type tasks running in parallel | Stage (logically called a Fragment) | Map or Reduce Stage | Stage |
| Transfer of data between stages caused by a wide operation | Exchange | Shuffle | Shuffle |
| Series of interrelated stages that produce the final results of a query | Query Plan | DAG | Job |
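
One way to see these terms on a real query is to ask each engine for its plan. The statements below are a minimal sketch, assuming a hypothetical table `orders` with a `status` column (neither is from this page). The GROUP BY is a wide operation, so each plan should show a transfer of data between stages. The exact plan text varies by engine version.

```sql
-- Hypothetical table and column names (orders, status), for illustration only.

-- Trino: the distributed plan is printed as fragments, one per stage,
-- with exchanges marking where data moves between them.
EXPLAIN (TYPE DISTRIBUTED)
SELECT status, count(*) FROM orders GROUP BY status;

-- Hive on Tez: the plan lists the Tez vertices (Map / Reducer)
-- and the edges that connect them.
EXPLAIN
SELECT status, count(*) FROM orders GROUP BY status;

-- Spark SQL (assuming 3.0 or later for the FORMATTED option):
-- the physical plan contains an Exchange node where the shuffle happens.
EXPLAIN FORMATTED
SELECT status, count(*) FROM orders GROUP BY status;
```

Comparing the three outputs side by side is a quick way to check the vocabulary in the table against what each engine actually reports.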

Architectural Concepts

 

 

 

Join Types

 

 

Query Optimizations

 

 

SQL Differences

| Action | Trino | Hive |
|---|---|---|
| List the name of the underlying data file that a particular row is sourced from | select "$path", colX from table; | select INPUT__FILE__NAME, colX from table; |
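
A common use of the file-name column is counting rows per underlying file, for example to spot small files. The queries below are a sketch under assumptions: `my_table` is a hypothetical table name, and the Trino table is read through a connector that exposes the hidden `"$path"` column (the Hive connector does).

```sql
-- Hypothetical table name (my_table), for illustration only.

-- Trino: "$path" is a hidden column and must be double-quoted.
SELECT "$path" AS file_path, count(*) AS row_count
FROM my_table
GROUP BY "$path"
ORDER BY row_count;

-- Hive: INPUT__FILE__NAME is a virtual column (two underscores between words).
SELECT INPUT__FILE__NAME AS file_path, count(*) AS row_count
FROM my_table
GROUP BY INPUT__FILE__NAME
ORDER BY row_count;
```

Neither column appears in `select *`, so it has to be named explicitly in both engines.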