SQL PARTITION BY

Summary: in this tutorial, you’ll learn how to use the SQL PARTITION BY clause to divide a result into multiple partitions on which a window function can operate.

Introduction to the SQL PARTITION BY clause #

In SQL, a window function allows you to calculate across table rows that are somehow related to the current row.

A window function uses the OVER clause to define the window or a set of rows on which a window function operates. The OVER clause includes optional PARTITION BY and ORDER BY clause.

Here’s the basic syntax of a window function that uses PARTITION BY and ORDER BY clauses:

window_function (expression) OVER (
  PARTITION BY
    column1,
    column2
  ORDER BY
    column3,
    column4
)Code language: SQL (Structured Query Language) (sql)

If you omit the PARTION BY clause, the window function treats the whole result set as a single partition.

When you use the PARTITION BY clause, it divides the result set into multiple partitions. The window function will operate on each partition independently.

Suppose we have the following salary_reports table:

statejobsalary
CaliforniaIT150000.00
CaliforniaMarketing130000.00
TexasIT100000.00
TexasMarketing80000.00
SQL script for creating the salary_reports table
DROP TABLE IF EXISTS salary_reports;

CREATE TABLE salary_reports (
  state varchar(255) NOT NULL,
  job varchar(255) NOT NULL,
  salary DEC(11, 2)
);

INSERT INTO
  salary_reports (state, job, salary)
VALUES
  ('California', 'IT', 150000),
  ('California', 'Marketing', 130000),
  ('Texas', 'IT', 100000),
  ('Texas', 'Marketing', 80000);

SELECT
  *
FROM
  salary_reports;Code language: SQL (Structured Query Language) (sql)

The following query uses the AVG() window function to calculate the average salary across states and jobs:

SELECT
  state,
  job,
  salary,
  ROUND(AVG(salary) OVER (), 2) average_salary
FROM
  salary_reports
ORDER BY
  state,
  job;Code language: SQL (Structured Query Language) (sql)

Try it

Output:

   state    |    job    |  salary   | average_salary
------------+-----------+-----------+----------------
 California | IT        | 150000.00 |      115000.00
 California | Marketing | 130000.00 |      115000.00
 Texas      | IT        | 100000.00 |      115000.00
 Texas      | Marketing |  80000.00 |      115000.00Code language: SQL (Structured Query Language) (sql)

In this example, the AVG function calculates the average salary of all salaries in four rows:

(150,000 + 130,000 + 100,000 + 80,000) / 4Code language: SQL (Structured Query Language) (sql)

Partitioning data by jobs #

To calculate average salaries by jobs across states, you can use the PARITION BY clause to divide the result set into partitions:

SELECT
  state,
  job,
  salary,
  ROUND(AVG(salary) OVER (PARTITION BY job), 2) average_salary
FROM
  salary_reports
ORDER BY
  state,
  job;Code language: SQL (Structured Query Language) (sql)

Try it

Output:

   state    |    job    |  salary   | average_salary
------------+-----------+-----------+----------------
 California | IT        | 150000.00 |      125000.00
 California | Marketing | 130000.00 |      105000.00
 Texas      | IT        | 100000.00 |      125000.00
 Texas      | Marketing |  80000.00 |      105000.00Code language: SQL (Structured Query Language) (sql)

Note that the query uses the ROUND() function to round the average values to numbers with two decimal places.

In this example, the PARITION BY clause divides the result set by the values in the job columns. Since we have two jobs (IT & Marketing), the PARTITION BY clause divides the result set into two partitions:

The first partition:

   state    |    job    |  salary
------------+-----------+-----------
 California | IT        | 150000.00
 Texas      | IT        | 100000.00Code language: SQL (Structured Query Language) (sql)

The second partition:

   state    |    job    |  salary
------------+-----------+-----------
 California | Marketing | 130000.00
 Texas      | Marketing |  80000.00Code language: SQL (Structured Query Language) (sql)

The AVG() window function calculates the average salary of each partition and returns 125,000 for the first partition:

(150,000 + 100,000) / 2 = 125,000Code language: SQL (Structured Query Language) (sql)

And 105,000 for the second partition:

(100,000 + 80,000) /2 = 105,000Code language: SQL (Structured Query Language) (sql)

Partitioning data by states #

The following query uses the PARTITION BY clause to calculate the average salaries by states across jobs:

SELECT
  state,
  job,
  salary,
  ROUND(AVG(salary) OVER (PARTITION BY state), 2) average_salary
FROM
  salary_reports
ORDER BY
  state,
  job;Code language: SQL (Structured Query Language) (sql)

Try it

Output:

   state    |    job    |  salary   | average_salary
------------+-----------+-----------+----------------
 California | IT        | 150000.00 |      140000.00
 California | Marketing | 130000.00 |      140000.00
 Texas      | IT        | 100000.00 |       90000.00
 Texas      | Marketing |  80000.00 |       90000.00Code language: SQL (Structured Query Language) (sql)

In this example, the PARITTION BY clause divides the result set by the values in the state column.

Since there are two states California and Texas, the PARTITION BY clause divides the result set into two partitions, each per state:

First partition:

   state    |    job    |  salary
------------+-----------+-----------
 California | IT        | 150000.00
 California | Marketing | 130000.00Code language: SQL (Structured Query Language) (sql)

Second partition:

   state    |    job    |  salary   | average_salary
------------+-----------+-----------+----------------
 Texas      | IT        | 100000.00 |       90000.00
 Texas      | Marketing |  80000.00 |       90000.00Code language: SQL (Structured Query Language) (sql)

The AVG() window function calculates the average salary for each partition. It returns 140,000 for the first partition and 90,000 for the second one.

Partitioning data by departments #

We’ll use the employees table from the sample database.

SQL PARTITION BY

The following query uses the SUM window function with a PARTITION BY clause to retrieve the employee salary along with the total salary of the current employee’s department:

SELECT
  first_name,
  last_name,
  department_name,
  salary,
  SUM(salary) OVER (
    PARTITION BY department_name
  ) department_salary
FROM
  employees e
  INNER JOIN departments d ON d.department_id = e.department_id
ORDER BY
  department_name;Code language: SQL (Structured Query Language) (sql)

Try it

Output:

 first_name  |  last_name  | department_name  |  salary  | department_salary
-------------+-------------+------------------+----------+-------------------
 William     | Gietz       | Accounting       |  8300.00 |          20300.00
 Shelley     | Higgins     | Accounting       | 12000.00 |          20300.00
 Jennifer    | Whalen      | Administration   |  4400.00 |           4400.00
 Steven      | King        | Executive        | 24000.00 |          58000.00
 Neena       | Kochhar     | Executive        | 17000.00 |          58000.00
 Lex         | De Haan     | Executive        | 17000.00 |          58000.00
...Code language: SQL (Structured Query Language) (sql)

How the query works:

  • First, the PARTITION BY clause divides the rows in the employees table by departments. Employees within the same department belong to the same partition.
  • Second, the SUM() function calculates the total salary of employees in each department.

Summary #

  • Use the PARTITION BY clause to divide a result set into multiple partitions.
Was this tutorial helpful ?