Trino Compared to Other Big Data SQL Engines
WORK IN PROGRESS (FOR SURE!)
Parallel Processing Concepts
Concept | Trino | Hive on Tez | Spark SQL |
---|---|---|---|
Validated string of SQL | Query | Query | Query |
Parallelized slice of input data | Split | Block | Partition |
Individual operation | Operator | Operator (unconfirmed) | Function (unconfirmed) |
Pipeline of narrow operations working on a slice of data | Task (can be broken internally into more than one “Driver”) | Mapper or Reducer | Task |
Collection of tasks of the same type running in parallel | Stage (logically called a Fragment) | Map or Reduce Stage | Stage |
Transfer of data between stages caused by a wide operation | Exchange | Shuffle | Shuffle |
Series of interrelated stages that produces the final results of a query | Query Plan | DAG | Job |
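
The mapping above is easier to follow with a concrete query. Below is a toy, single-process Python sketch of how `SELECT key, count(*) FROM t WHERE value > 0 GROUP BY key` might move through splits, tasks, a hash-partitioned exchange, and a final stage.

All of the following are hypothetical choices made for illustration:
- the function names,
- the round-robin split assignment,
- the CRC32 partitioning,
- the `(key, value)` row shape.

None of it is Trino, Hive, or Spark code. Real engines run tasks on separate workers and stream data between them.

```python
import zlib
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # hypothetical row shape: (key, value)


def make_splits(rows: List[Row], split_size: int) -> List[List[Row]]:
    """Cut the input into fixed-size slices (Trino: splits)."""
    return [rows[i:i + split_size] for i in range(0, len(rows), split_size)]


def source_task(splits: List[List[Row]]) -> Counter:
    """One task of the source stage: a pipeline of narrow operators
    (filter -> partial aggregate) run over each assigned split.
    Each split gets its own pass, loosely like a Trino Driver."""
    partial: Counter = Counter()
    for split in splits:
        for key, value in split:
            if value > 0:          # filter operator
                partial[key] += 1  # partial aggregation: count per key
    return partial


def exchange(partials: List[Counter], n_partitions: int) -> List[List[Tuple[str, int]]]:
    """Hash-repartition partial results by key (Trino: exchange,
    Hive/Spark: shuffle) so each key reaches exactly one downstream task."""
    buckets: List[List[Tuple[str, int]]] = [[] for _ in range(n_partitions)]
    for partial in partials:
        for key, count in partial.items():
            # crc32 keeps routing stable across runs, unlike Python's hash()
            buckets[zlib.crc32(key.encode()) % n_partitions].append((key, count))
    return buckets


def final_task(bucket: List[Tuple[str, int]]) -> Dict[str, int]:
    """One task of the final stage: merge the partial counts it received."""
    totals: Dict[str, int] = defaultdict(int)
    for key, count in bucket:
        totals[key] += count
    return dict(totals)


def run_query(rows: List[Row], split_size: int = 2,
              n_source_tasks: int = 2, n_final_tasks: int = 2) -> Dict[str, int]:
    """Toy plan for: SELECT key, count(*) FROM t WHERE value > 0 GROUP BY key."""
    splits = make_splits(rows, split_size)
    # Source stage: hand splits to tasks round-robin.
    assignments = [splits[i::n_source_tasks] for i in range(n_source_tasks)]
    partials = [source_task(a) for a in assignments]
    # Exchange between the source stage and the final stage.
    buckets = exchange(partials, n_final_tasks)
    # Final stage: tasks own disjoint key sets, so their outputs just concatenate.
    result: Dict[str, int] = {}
    for bucket in buckets:
        result.update(final_task(bucket))
    return result
```

Here is a worked example. Take `rows = [("a", 1), ("b", 2), ("a", 3), ("c", -1), ("b", 5), ("a", 0)]`. Then `sorted(run_query(rows).items())` returns `[('a', 2), ('b', 2)]`.

Key `a` is counted by both source tasks. Its two partial counts are routed to the same final task by the exchange, which is the reason a wide operation such as `GROUP BY` needs an exchange or shuffle at all.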
Architectural Concepts
Join Types
Query Optimizations
SQL Differences
Action | Trino | Hive |
---|---|---|
List the name of the underlying data file that a particular row is sourced from | `"$path"` hidden column (e.g. Hive connector) | `INPUT__FILE__NAME` virtual column |