home writings projects

Common Table Expressions (CTEs): Writing Cleaner, More Maintainable SQL

Nov 14, 2024

·

8 min read

tl;dr: Learn how to use CTEs to simplify complex queries, improve code readability, and solve recursive problems in SQL. Complete guide with practical examples.

Common Table Expressions (CTEs) are one of SQL’s most underutilized features. They allow you to write cleaner, more maintainable queries while solving complex problems that would otherwise require temporary tables or convoluted subqueries.

What Are CTEs?

A CTE is a temporary named result set that exists only for the duration of a single query. Think of it as a way to break down complex queries into logical, reusable chunks.

WITH cte_name AS (
  SELECT column1, column2
  FROM table_name
  WHERE condition
)
SELECT * FROM cte_name;

Basic Example

Let’s start simple. Here’s a query without CTEs:

-- Without CTE: Messy subquery
SELECT
  u.name,
  u.email,
  order_summary.total_orders,
  order_summary.total_spent
FROM users u
JOIN (
  SELECT
    user_id,
    COUNT(*) as total_orders,
    SUM(amount) as total_spent
  FROM orders
  WHERE status = 'completed'
  GROUP BY user_id
) order_summary ON u.id = order_summary.user_id;

Now with a CTE:

-- With CTE: Clean and readable
WITH order_summary AS (
  SELECT
    user_id,
    COUNT(*) as total_orders,
    SUM(amount) as total_spent
  FROM orders
  WHERE status = 'completed'
  GROUP BY user_id
)
SELECT
  u.name,
  u.email,
  os.total_orders,
  os.total_spent
FROM users u
JOIN order_summary os ON u.id = os.user_id;

Much cleaner, right?

Multiple CTEs

You can chain multiple CTEs together, building complex logic step by step:

WITH
  -- Step 1: Get active users
  active_users AS (
    SELECT id, name, email
    FROM users
    WHERE is_active = true
  ),

  -- Step 2: Calculate user statistics
  user_stats AS (
    SELECT
      user_id,
      COUNT(*) as order_count,
      SUM(amount) as total_spent,
      AVG(amount) as avg_order_value
    FROM orders
    WHERE user_id IN (SELECT id FROM active_users)
    GROUP BY user_id
  ),

  -- Step 3: Find high-value customers
  vip_users AS (
    SELECT user_id
    FROM user_stats
    WHERE total_spent > 1000 OR order_count > 10
  )

-- Final query
SELECT
  au.name,
  au.email,
  us.order_count,
  us.total_spent,
  us.avg_order_value
FROM active_users au
JOIN user_stats us ON au.id = us.user_id
WHERE au.id IN (SELECT user_id FROM vip_users)
ORDER BY us.total_spent DESC;

This step-by-step approach makes complex queries easier to:

  • Understand - Each CTE has a clear purpose
  • Debug - Test each CTE independently
  • Maintain - Modify specific parts without breaking everything

Recursive CTEs

Recursive CTEs are perfect for hierarchical data like organization charts, category trees, or graph traversal.

Example: Employee Hierarchy

WITH RECURSIVE employee_hierarchy AS (
  -- Anchor: Start with CEO (no manager)
  SELECT
    id,
    name,
    manager_id,
    1 as level,
    name as path
  FROM employees
  WHERE manager_id IS NULL

  UNION ALL

  -- Recursive: Add employees at each level
  SELECT
    e.id,
    e.name,
    e.manager_id,
    eh.level + 1,
    eh.path || ' > ' || e.name
  FROM employees e
  JOIN employee_hierarchy eh ON e.manager_id = eh.id
)
SELECT
  id,
  name,
  level,
  path as org_chart
FROM employee_hierarchy
ORDER BY level, name;

Output:

id  | name           | level | org_chart
----|----------------|-------|---------------------------
1   | Alice (CEO)    | 1     | Alice (CEO)
2   | Bob            | 2     | Alice (CEO) > Bob
3   | Carol          | 2     | Alice (CEO) > Carol
4   | Dave           | 3     | Alice (CEO) > Bob > Dave
5   | Eve            | 3     | Alice (CEO) > Carol > Eve

Example: Category Tree

WITH RECURSIVE category_tree AS (
  -- Anchor: Top-level categories
  SELECT
    id,
    name,
    parent_id,
    0 as depth,
    name as full_path
  FROM categories
  WHERE parent_id IS NULL

  UNION ALL

  -- Recursive: Child categories
  SELECT
    c.id,
    c.name,
    c.parent_id,
    ct.depth + 1,
    ct.full_path || ' / ' || c.name
  FROM categories c
  JOIN category_tree ct ON c.parent_id = ct.id
)
SELECT * FROM category_tree ORDER BY full_path;

Real-World Use Cases

1. Running Totals / Cumulative Sums

WITH daily_sales AS (
  SELECT
    DATE(created_at) as sale_date,
    SUM(amount) as daily_total
  FROM orders
  GROUP BY DATE(created_at)
),
cumulative_sales AS (
  SELECT
    sale_date,
    daily_total,
    SUM(daily_total) OVER (ORDER BY sale_date) as cumulative_total
  FROM daily_sales
)
SELECT * FROM cumulative_sales;

2. Finding Gaps in Sequences

WITH invoice_numbers AS (
  SELECT invoice_number
  FROM invoices
  ORDER BY invoice_number
),
gaps AS (
  SELECT
    invoice_number + 1 as gap_start,
    LEAD(invoice_number) OVER (ORDER BY invoice_number) - 1 as gap_end
  FROM invoice_numbers
)
SELECT gap_start, gap_end
FROM gaps
WHERE gap_end IS NOT NULL AND gap_start <= gap_end;

3. Deduplication

WITH ranked_records AS (
  SELECT
    *,
    ROW_NUMBER() OVER (
      PARTITION BY email
      ORDER BY created_at DESC
    ) as rn
  FROM users
)
SELECT * FROM ranked_records WHERE rn = 1;

4. Pagination with Total Count

WITH
  total AS (
    SELECT COUNT(*) as total_count
    FROM products
    WHERE category = 'electronics'
  ),
  page_data AS (
    SELECT *
    FROM products
    WHERE category = 'electronics'
    ORDER BY name
    LIMIT 20 OFFSET 0
  )
SELECT
  pd.*,
  t.total_count
FROM page_data pd
CROSS JOIN total t;

CTEs vs Subqueries vs Temporary Tables

FeatureCTESubqueryTemp Table
Readability✅ Excellent❌ Poor✅ Good
Reusability✅ Multiple refs❌ No✅ Yes
Recursion✅ Yes❌ No❌ No
Performance⚠️ Similar to subquery⚠️ Can be slow✅ Can be indexed
ScopeSingle querySingle querySession
Persistence❌ No❌ No✅ Yes

Performance Considerations

CTEs are primarily for readability, not performance. The database often optimizes them similarly to subqueries.

When CTEs Might Be Slower

-- This CTE will be evaluated multiple times
WITH user_orders AS (
  SELECT * FROM orders WHERE user_id = 123
)
SELECT COUNT(*) FROM user_orders
UNION ALL
SELECT SUM(amount) FROM user_orders;

-- Better: Materialize in temp table for multiple uses
CREATE TEMP TABLE user_orders AS
  SELECT * FROM orders WHERE user_id = 123;

Optimization Tips

  1. Use MATERIALIZED (PostgreSQL 12+)
WITH MATERIALIZED user_summary AS (
  SELECT user_id, COUNT(*) as order_count
  FROM orders
  GROUP BY user_id
)
SELECT * FROM user_summary;
  1. Filter early
-- Good: Filter before CTE
WITH recent_orders AS (
  SELECT * FROM orders
  WHERE created_at > '2024-01-01'  -- Filter here
)
SELECT * FROM recent_orders WHERE status = 'completed';

-- Bad: Filter after CTE
WITH all_orders AS (
  SELECT * FROM orders  -- Processes all rows
)
SELECT * FROM all_orders
WHERE created_at > '2024-01-01' AND status = 'completed';

Common Pitfalls

1. Over-complicating Simple Queries

-- Bad: Unnecessary CTE
WITH users_cte AS (
  SELECT * FROM users WHERE id = 123
)
SELECT * FROM users_cte;

-- Good: Just query directly
SELECT * FROM users WHERE id = 123;

2. Recursive CTE Without Termination

-- Dangerous: Infinite recursion
WITH RECURSIVE infinite AS (
  SELECT 1 as n
  UNION ALL
  SELECT n + 1 FROM infinite  -- Never stops!
)
SELECT * FROM infinite;

-- Safe: Add termination condition
WITH RECURSIVE safe AS (
  SELECT 1 as n
  UNION ALL
  SELECT n + 1 FROM safe WHERE n < 100  -- Stops at 100
)
SELECT * FROM safe;

3. Assuming CTEs Are Cached

CTEs are not automatically cached. Each reference might re-execute the query.

Best Practices

  1. Use meaningful names - active_users is better than cte1
  2. Add comments - Explain complex logic
  3. Break down complex queries - One logical step per CTE
  4. Test incrementally - Query each CTE independently
  5. Consider MATERIALIZED - For expensive CTEs used multiple times
  6. Limit recursion depth - Always add termination conditions
  7. Format consistently - Make nested CTEs easy to read

Example: Complete Analytics Query

WITH
  -- Get date range
  date_range AS (
    SELECT
      CURRENT_DATE - INTERVAL '30 days' as start_date,
      CURRENT_DATE as end_date
  ),

  -- Active users in period
  active_users AS (
    SELECT DISTINCT user_id
    FROM orders
    WHERE created_at >= (SELECT start_date FROM date_range)
  ),

  -- User statistics
  user_metrics AS (
    SELECT
      o.user_id,
      COUNT(*) as order_count,
      SUM(o.amount) as total_spent,
      AVG(o.amount) as avg_order_value,
      MAX(o.created_at) as last_order_date
    FROM orders o
    WHERE o.user_id IN (SELECT user_id FROM active_users)
    GROUP BY o.user_id
  ),

  -- Customer segments
  segments AS (
    SELECT
      user_id,
      CASE
        WHEN total_spent > 1000 THEN 'VIP'
        WHEN total_spent > 500 THEN 'Premium'
        ELSE 'Regular'
      END as segment
    FROM user_metrics
  )

-- Final report
SELECT
  s.segment,
  COUNT(*) as user_count,
  AVG(um.order_count) as avg_orders,
  AVG(um.total_spent) as avg_spent,
  SUM(um.total_spent) as segment_revenue
FROM segments s
JOIN user_metrics um ON s.user_id = um.user_id
GROUP BY s.segment
ORDER BY segment_revenue DESC;

Conclusion

CTEs are a powerful tool for writing maintainable SQL. They excel at:

  • Breaking complex queries into logical steps
  • Improving code readability
  • Solving recursive problems
  • Making queries easier to debug and maintain

While they don’t always improve performance, the readability and maintainability benefits make them invaluable for complex queries.

Start using CTEs in your next query—your future self (and your teammates) will thank you.


Further Reading:

  • Window functions with CTEs
  • Optimizing recursive queries
  • Advanced CTE patterns for analytics

Practice: Try refactoring your most complex SQL query using CTEs. You’ll be surprised how much clearer it becomes!