What is the difference between "ORDER BY" and "SORT BY" in Hive?
"ORDER BY" sorts the entire result set globally, so the final output is in total order.
To guarantee this, all the data is sent to a single reducer, sorted there, and then the sorted result is returned.
Because one reducer does all the sorting, "ORDER BY" can be slow and resource-intensive and may not be suitable for large datasets in Hive.
Example
SELECT name, age
FROM users
ORDER BY age;
"SORT BY" is used for sorting the data within each reducer task.
It does not perform a global sort across all data but sorts the data within each reducer independently.
"SORT BY" is better suited to large datasets in Hive, because the sorting work is spread across multiple reducers.
Example
SELECT name, age
FROM users
SORT BY age;
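
To make the difference concrete, here is a small Python simulation (not Hive code). It is only a sketch: the sample rows, the function names order_by and sort_by, and the round-robin split of rows across reducers are all assumptions made for illustration. Hive decides for itself which reducer each row goes to.

```python
from operator import itemgetter

# Hypothetical sample rows in the shape (name, age).
ROWS = [("Ann", 34), ("Bob", 21), ("Cal", 45),
        ("Dee", 19), ("Eve", 28), ("Fay", 52)]


def order_by(rows):
    """Mimic ORDER BY: every row goes to one reducer, which sorts them all."""
    return sorted(rows, key=itemgetter(1))


def sort_by(rows, num_reducers):
    """Mimic SORT BY: each reducer sorts only the rows it received."""
    # Round-robin assignment is an assumption for this sketch only.
    partitions = [rows[i::num_reducers] for i in range(num_reducers)]
    return [sorted(part, key=itemgetter(1)) for part in partitions]


if __name__ == "__main__":
    print("ORDER BY:", order_by(ROWS))
    for i, part in enumerate(sort_by(ROWS, 2)):
        print(f"SORT BY, reducer {i}:", part)
```

With two simulated reducers, each reducer's output is sorted by age, but reading the outputs one after the other does not give a globally sorted list. That is the behaviour to expect from "SORT BY" whenever more than one reducer is used.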