Summary: in this tutorial, you’ll learn how to use the SQL PARTITION BY
clause to divide a result into multiple partitions on which a window function can operate.
Introduction to the SQL PARTITION BY clause #
In SQL, a window function allows you to calculate across table rows that are somehow related to the current row.
A window function uses the OVER
clause to define the window or a set of rows on which a window function operates. The OVER
clause includes optional PARTITION BY
and ORDER BY
clause.
Here’s the basic syntax of a window function that uses PARTITION BY
and ORDER BY
clauses:
window_function (expression) OVER (
PARTITION BY
column1,
column2
ORDER BY
column3,
column4
)
Code language: SQL (Structured Query Language) (sql)
If you omit the PARTION BY
clause, the window function treats the whole result set as a single partition.
When you use the PARTITION BY
clause, it divides the result set into multiple partitions. The window function will operate on each partition independently.
Suppose we have the following salary_reports
table:
state | job | salary |
---|---|---|
California | IT | 150000.00 |
California | Marketing | 130000.00 |
Texas | IT | 100000.00 |
Texas | Marketing | 80000.00 |
SQL script for creating the salary_reports table
DROP TABLE IF EXISTS salary_reports;
CREATE TABLE salary_reports (
state varchar(255) NOT NULL,
job varchar(255) NOT NULL,
salary DEC(11, 2)
);
INSERT INTO
salary_reports (state, job, salary)
VALUES
('California', 'IT', 150000),
('California', 'Marketing', 130000),
('Texas', 'IT', 100000),
('Texas', 'Marketing', 80000);
SELECT
*
FROM
salary_reports;
Code language: SQL (Structured Query Language) (sql)
The following query uses the AVG()
window function to calculate the average salary across states and jobs:
SELECT
state,
job,
salary,
ROUND(AVG(salary) OVER (), 2) average_salary
FROM
salary_reports
ORDER BY
state,
job;
Code language: SQL (Structured Query Language) (sql)
Output:
state | job | salary | average_salary
------------+-----------+-----------+----------------
California | IT | 150000.00 | 115000.00
California | Marketing | 130000.00 | 115000.00
Texas | IT | 100000.00 | 115000.00
Texas | Marketing | 80000.00 | 115000.00
Code language: SQL (Structured Query Language) (sql)
In this example, the AVG
function calculates the average salary of all salaries in four rows:
(150,000 + 130,000 + 100,000 + 80,000) / 4
Code language: SQL (Structured Query Language) (sql)
Partitioning data by jobs #
To calculate average salaries by jobs across states, you can use the PARITION BY
clause to divide the result set into partitions:
SELECT
state,
job,
salary,
ROUND(AVG(salary) OVER (PARTITION BY job), 2) average_salary
FROM
salary_reports
ORDER BY
state,
job;
Code language: SQL (Structured Query Language) (sql)
Output:
state | job | salary | average_salary
------------+-----------+-----------+----------------
California | IT | 150000.00 | 125000.00
California | Marketing | 130000.00 | 105000.00
Texas | IT | 100000.00 | 125000.00
Texas | Marketing | 80000.00 | 105000.00
Code language: SQL (Structured Query Language) (sql)
Note that the query uses the ROUND()
function to round the average values to numbers with two decimal places.
In this example, the PARITION BY
clause divides the result set by the values in the job columns. Since we have two jobs (IT
& Marketing
), the PARTITION BY
clause divides the result set into two partitions:
The first partition:
state | job | salary
------------+-----------+-----------
California | IT | 150000.00
Texas | IT | 100000.00
Code language: SQL (Structured Query Language) (sql)
The second partition:
state | job | salary
------------+-----------+-----------
California | Marketing | 130000.00
Texas | Marketing | 80000.00
Code language: SQL (Structured Query Language) (sql)
The AVG()
window function calculates the average salary of each partition and returns 125,000
for the first partition:
(150,000 + 100,000) / 2 = 125,000
Code language: SQL (Structured Query Language) (sql)
And 105,000
for the second partition:
(100,000 + 80,000) /2 = 105,000
Code language: SQL (Structured Query Language) (sql)
Partitioning data by states #
The following query uses the PARTITION BY
clause to calculate the average salaries by states across jobs:
SELECT
state,
job,
salary,
ROUND(AVG(salary) OVER (PARTITION BY state), 2) average_salary
FROM
salary_reports
ORDER BY
state,
job;
Code language: SQL (Structured Query Language) (sql)
Output:
state | job | salary | average_salary
------------+-----------+-----------+----------------
California | IT | 150000.00 | 140000.00
California | Marketing | 130000.00 | 140000.00
Texas | IT | 100000.00 | 90000.00
Texas | Marketing | 80000.00 | 90000.00
Code language: SQL (Structured Query Language) (sql)
In this example, the PARITTION BY
clause divides the result set by the values in the state
column.
Since there are two states California
and Texas
, the PARTITION BY
clause divides the result set into two partitions, each per state:
First partition:
state | job | salary
------------+-----------+-----------
California | IT | 150000.00
California | Marketing | 130000.00
Code language: SQL (Structured Query Language) (sql)
Second partition:
state | job | salary | average_salary
------------+-----------+-----------+----------------
Texas | IT | 100000.00 | 90000.00
Texas | Marketing | 80000.00 | 90000.00
Code language: SQL (Structured Query Language) (sql)
The AVG()
window function calculates the average salary for each partition. It returns 140,000
for the first partition and 90,000
for the second one.
Partitioning data by departments #
We’ll use the employees
table from the sample database.
data:image/s3,"s3://crabby-images/a38ca/a38ca3130eb5e3f63f3fbc0c1dbeca8457c78e1a" alt="SQL PARTITION BY"
The following query uses the SUM
window function with a PARTITION BY
clause to retrieve the employee salary along with the total salary of the current employee’s department:
SELECT
first_name,
last_name,
department_name,
salary,
SUM(salary) OVER (
PARTITION BY department_name
) department_salary
FROM
employees e
INNER JOIN departments d ON d.department_id = e.department_id
ORDER BY
department_name;
Code language: SQL (Structured Query Language) (sql)
Output:
first_name | last_name | department_name | salary | department_salary
-------------+-------------+------------------+----------+-------------------
William | Gietz | Accounting | 8300.00 | 20300.00
Shelley | Higgins | Accounting | 12000.00 | 20300.00
Jennifer | Whalen | Administration | 4400.00 | 4400.00
Steven | King | Executive | 24000.00 | 58000.00
Neena | Kochhar | Executive | 17000.00 | 58000.00
Lex | De Haan | Executive | 17000.00 | 58000.00
...
Code language: SQL (Structured Query Language) (sql)
How the query works:
- First, the
PARTITION BY
clause divides the rows in theemployees
table by departments. Employees within the same department belong to the same partition. - Second, the
SUM()
function calculates the total salary of employees in each department.
Summary #
- Use the
PARTITION BY
clause to divide a result set into multiple partitions.