Recursive Queries in PostgreSQL

5 min readJan 4, 2025

Recursive queries are one of PostgreSQL’s most versatile tools, enabling developers to work with hierarchical or tree-structured data effortlessly. From traversing organizational charts to exploring dependency graphs or solving mathematical problems, recursive queries unlock powerful capabilities for modern applications.

In this extensive guide, we’ll take you from the very basics of recursive queries to advanced techniques, including handling cycles, limiting recursion depth, and exploring alternative solutions. By the end, you’ll be ready to tackle even the most complex hierarchical data challenges.

What Are Recursive Queries?

Recursive queries allow you to process data that is self-referential or hierarchical by repeatedly applying logic until a specific condition is met. PostgreSQL implements recursive queries using the WITH RECURSIVE clause, which extends Common Table Expressions (CTEs).

Why Use Recursive Queries?

To traverse parent-child relationships, adjacency lists, or graphs.
To calculate iterative sequences like factorials or Fibonacci numbers.
To extract insights from hierarchical structures such as directories, organizational charts, or threaded conversations.

Syntax and Structure of Recursive Queries

A recursive query consists of:

Anchor Query: Defines the starting point for recursion.
Recursive Query: Repeats a process based on the results of previous iterations.
Termination Condition: Stops the recursion when no more rows are added.

General Syntax:

WITH RECURSIVE cte_name AS (
    -- Anchor query
    initial_query
UNION ALL
    -- Recursive query
    recursive_query
)
SELECT * FROM cte_name;

Anchor Query: Generates the base result set.
Recursive Query: Expands the result set iteratively.
UNION ALL: Combines the results of the anchor and recursive queries.

Getting Started: A Simple Example

Problem:

Calculate the factorial of a number (e.g., 5).

Query:

WITH RECURSIVE factorial(n, result) AS (
    -- Anchor query: Start with 1
    SELECT 1, 1
UNION ALL
    -- Recursive query: Multiply the current result by the next number
    SELECT n + 1, result * (n + 1)
    FROM factorial
    WHERE n < 5
)
SELECT * FROM factorial;

Output:

Real-World Examples

Example 1: Threaded Conversations (Tweets)

Problem:

Retrieve the ancestry of a tweet up to the root.

Dataset:

Query:

WITH RECURSIVE tweet_hierarchy AS (
    -- Anchor query
    SELECT tweet_id, parent_id
    FROM tweets
    WHERE tweet_id = 6
UNION ALL
    -- Recursive query
    SELECT t.tweet_id, t.parent_id
    FROM tweets t
    INNER JOIN tweet_hierarchy th
    ON t.tweet_id = th.parent_id
)
SELECT * FROM tweet_hierarchy;

Output:

Outputs:

Click on Tweet 6:

Ancestry chain: 6 → 5 → 2 → 1

2. Click on Tweet 1:

1 (Root tweet only)

3. Click on Tweet 5:

5 → 2 → 1

Example 2: Organizational Hierarchy

Problem:

Retrieve all employees reporting (directly or indirectly) to a specific manager.

Dataset:

Query:

WITH RECURSIVE employee_hierarchy AS (
    -- Anchor query
    SELECT id, name, manager_id
    FROM employees
    WHERE id = 1
UNION ALL
    -- Recursive query
    SELECT e.id, e.name, e.manager_id
    FROM employees e
    INNER JOIN employee_hierarchy eh
    ON e.manager_id = eh.id
)
SELECT * FROM employee_hierarchy;

Output:

Advanced Techniques: Handling Cycles and Limiting Recursion Depth

1. Handling Cycles: Avoid Infinite Loops

What Are Cycles?

Cycles occur when data forms a loop, causing infinite recursion.

Solution: Use a Path Column

Track visited nodes to avoid revisiting them.

WITH RECURSIVE hierarchy AS (
    -- Anchor query
    SELECT id, parent_id, ARRAY[id] AS path
    FROM hierarchy_table
    WHERE id = 1
UNION ALL
    -- Recursive query
    SELECT h.id, h.parent_id, path || h.id
    FROM hierarchy_table h
    INNER JOIN hierarchy ON h.parent_id = hierarchy.id
    WHERE h.id <> ALL(path)   -- Avoid cycles
)
SELECT * FROM hierarchy;

2. Limiting Recursion Depth

Problem:

Deeply nested structures can consume excessive resources.

Solution: Add a Depth Column

Track recursion depth and stop at a specified limit.

WITH RECURSIVE limited_hierarchy AS (
    -- Anchor query
    SELECT id, parent_id, 1 AS depth
    FROM hierarchy_table
    WHERE id = 1
UNION ALL
    -- Recursive query
    SELECT h.id, h.parent_id, lh.depth + 1
    FROM hierarchy_table h
    INNER JOIN limited_hierarchy lh ON h.parent_id = lh.id
    WHERE lh.depth < 5 -- Limit recursion depth
)
SELECT * FROM limited_hierarchy;

3. Combining Cycle Prevention and Depth Limiting

Combine these techniques for robust recursion.

WITH RECURSIVE hierarchy AS (
    -- Anchor query
    SELECT id, parent_id, ARRAY[id] AS path, 1 AS depth
    FROM hierarchy_table
    WHERE id = 1
UNION ALL
    -- Recursive query
    SELECT h.id, h.parent_id, path || h.id, hierarchy.depth + 1
    FROM hierarchy_table h
    INNER JOIN hierarchy ON h.parent_id = hierarchy.id
    WHERE h.id <> ALL(path)
      AND hierarchy.depth < 5
)
SELECT * FROM hierarchy;

Performance Optimization for Recursive Queries in PostgreSQL

Recursive queries can become computationally expensive, especially when working with large datasets or deep hierarchies. Here are several techniques to optimize their performance effectively:

1. Use Indexes

Proper indexing is crucial for improving query performance by reducing the number of rows scanned.

Key Index Recommendations:

Primary Index: Ensure the id column is indexed for fast lookups.
Foreign Key Index: Index the parent_id column (or equivalent) to speed up joins in the recursive query.

Example:

CREATE INDEX idx_parent_id ON hierarchy_table(parent_id);
CREATE INDEX idx_id ON hierarchy_table(id);

Benefits:

Speeds up anchor queries and recursive joins by enabling faster lookups and filtering.

2. Optimize Recursive Query Logic

Avoid UNION if not necessary: Use UNION ALL instead of UNION to prevent PostgreSQL from removing duplicate rows, which can be computationally expensive.
Simplify joins: Minimize the number of tables involved in recursive joins to reduce complexity.

Example:

WITH RECURSIVE hierarchy AS (
    SELECT id, parent_id
    FROM hierarchy_table
    WHERE id = 1
UNION ALL
    SELECT h.id, h.parent_id
    FROM hierarchy_table h
    INNER JOIN hierarchy ON h.parent_id = hierarchy.id
)
SELECT DISTINCT * FROM hierarchy;

Conclusion

Recursive queries in PostgreSQL are a powerful way to handle hierarchical data. By mastering basic syntax, handling cycles, limiting recursion depth, and exploring alternative solutions, you can effectively manage complex datasets and unlock the full potential of PostgreSQL.

Recursive Queries in PostgreSQL

What Are Recursive Queries?

Why Use Recursive Queries?

Syntax and Structure of Recursive Queries

General Syntax:

Getting Started: A Simple Example

Problem:

Query:

Output:

Real-World Examples

Example 1: Threaded Conversations (Tweets)

Problem:

Dataset:

Query:

Output:

Outputs:

Example 2: Organizational Hierarchy

Problem:

Dataset:

Query:

Output:

Advanced Techniques: Handling Cycles and Limiting Recursion Depth

1. Handling Cycles: Avoid Infinite Loops

What Are Cycles?

Solution: Use a Path Column

2. Limiting Recursion Depth

Problem:

Solution: Add a Depth Column

3. Combining Cycle Prevention and Depth Limiting

Performance Optimization for Recursive Queries in PostgreSQL

1. Use Indexes

Key Index Recommendations:

Example:

2. Optimize Recursive Query Logic

Example:

Conclusion

Written by Talaat Magdy

No responses yet