Common Table Expressions (CTEs): Writing Cleaner, More Maintainable SQL
·
8 min read
tl;dr: Learn how to use CTEs to simplify complex queries, improve code readability, and solve recursive problems in SQL. Complete guide with practical examples.
Common Table Expressions (CTEs) are one of SQL’s most underutilized features. They allow you to write cleaner, more maintainable queries while solving complex problems that would otherwise require temporary tables or convoluted subqueries.
What Are CTEs?
A CTE is a temporary named result set that exists only for the duration of a single query. Think of it as a way to break down complex queries into logical, reusable chunks.
WITH cte_name AS (
SELECT column1, column2
FROM table_name
WHERE condition
)
SELECT * FROM cte_name;
Basic Example
Let’s start simple. Here’s a query without CTEs:
-- Without CTE: Messy subquery
SELECT
u.name,
u.email,
order_summary.total_orders,
order_summary.total_spent
FROM users u
JOIN (
SELECT
user_id,
COUNT(*) as total_orders,
SUM(amount) as total_spent
FROM orders
WHERE status = 'completed'
GROUP BY user_id
) order_summary ON u.id = order_summary.user_id;
Now with a CTE:
-- With CTE: Clean and readable
WITH order_summary AS (
SELECT
user_id,
COUNT(*) as total_orders,
SUM(amount) as total_spent
FROM orders
WHERE status = 'completed'
GROUP BY user_id
)
SELECT
u.name,
u.email,
os.total_orders,
os.total_spent
FROM users u
JOIN order_summary os ON u.id = os.user_id;
Much cleaner, right?
Multiple CTEs
You can chain multiple CTEs together, building complex logic step by step:
WITH
-- Step 1: Get active users
active_users AS (
SELECT id, name, email
FROM users
WHERE is_active = true
),
-- Step 2: Calculate user statistics
user_stats AS (
SELECT
user_id,
COUNT(*) as order_count,
SUM(amount) as total_spent,
AVG(amount) as avg_order_value
FROM orders
WHERE user_id IN (SELECT id FROM active_users)
GROUP BY user_id
),
-- Step 3: Find high-value customers
vip_users AS (
SELECT user_id
FROM user_stats
WHERE total_spent > 1000 OR order_count > 10
)
-- Final query
SELECT
au.name,
au.email,
us.order_count,
us.total_spent,
us.avg_order_value
FROM active_users au
JOIN user_stats us ON au.id = us.user_id
WHERE au.id IN (SELECT user_id FROM vip_users)
ORDER BY us.total_spent DESC;
This step-by-step approach makes complex queries easier to:
- Understand - Each CTE has a clear purpose
- Debug - Test each CTE independently
- Maintain - Modify specific parts without breaking everything
Recursive CTEs
Recursive CTEs are perfect for hierarchical data like organization charts, category trees, or graph traversal.
Example: Employee Hierarchy
WITH RECURSIVE employee_hierarchy AS (
-- Anchor: Start with CEO (no manager)
SELECT
id,
name,
manager_id,
1 as level,
name as path
FROM employees
WHERE manager_id IS NULL
UNION ALL
-- Recursive: Add employees at each level
SELECT
e.id,
e.name,
e.manager_id,
eh.level + 1,
eh.path || ' > ' || e.name
FROM employees e
JOIN employee_hierarchy eh ON e.manager_id = eh.id
)
SELECT
id,
name,
level,
path as org_chart
FROM employee_hierarchy
ORDER BY level, name;
Output:
id | name | level | org_chart
----|----------------|-------|---------------------------
1 | Alice (CEO) | 1 | Alice (CEO)
2 | Bob | 2 | Alice (CEO) > Bob
3 | Carol | 2 | Alice (CEO) > Carol
4 | Dave | 3 | Alice (CEO) > Bob > Dave
5 | Eve | 3 | Alice (CEO) > Carol > Eve
Example: Category Tree
WITH RECURSIVE category_tree AS (
-- Anchor: Top-level categories
SELECT
id,
name,
parent_id,
0 as depth,
name as full_path
FROM categories
WHERE parent_id IS NULL
UNION ALL
-- Recursive: Child categories
SELECT
c.id,
c.name,
c.parent_id,
ct.depth + 1,
ct.full_path || ' / ' || c.name
FROM categories c
JOIN category_tree ct ON c.parent_id = ct.id
)
SELECT * FROM category_tree ORDER BY full_path;
Real-World Use Cases
1. Running Totals / Cumulative Sums
WITH daily_sales AS (
SELECT
DATE(created_at) as sale_date,
SUM(amount) as daily_total
FROM orders
GROUP BY DATE(created_at)
),
cumulative_sales AS (
SELECT
sale_date,
daily_total,
SUM(daily_total) OVER (ORDER BY sale_date) as cumulative_total
FROM daily_sales
)
SELECT * FROM cumulative_sales;
2. Finding Gaps in Sequences
WITH invoice_numbers AS (
SELECT invoice_number
FROM invoices
ORDER BY invoice_number
),
gaps AS (
SELECT
invoice_number + 1 as gap_start,
LEAD(invoice_number) OVER (ORDER BY invoice_number) - 1 as gap_end
FROM invoice_numbers
)
SELECT gap_start, gap_end
FROM gaps
WHERE gap_end IS NOT NULL AND gap_start <= gap_end;
3. Deduplication
WITH ranked_records AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY email
ORDER BY created_at DESC
) as rn
FROM users
)
SELECT * FROM ranked_records WHERE rn = 1;
4. Pagination with Total Count
WITH
total AS (
SELECT COUNT(*) as total_count
FROM products
WHERE category = 'electronics'
),
page_data AS (
SELECT *
FROM products
WHERE category = 'electronics'
ORDER BY name
LIMIT 20 OFFSET 0
)
SELECT
pd.*,
t.total_count
FROM page_data pd
CROSS JOIN total t;
CTEs vs Subqueries vs Temporary Tables
| Feature | CTE | Subquery | Temp Table |
|---|---|---|---|
| Readability | ✅ Excellent | ❌ Poor | ✅ Good |
| Reusability | ✅ Multiple refs | ❌ No | ✅ Yes |
| Recursion | ✅ Yes | ❌ No | ❌ No |
| Performance | ⚠️ Similar to subquery | ⚠️ Can be slow | ✅ Can be indexed |
| Scope | Single query | Single query | Session |
| Persistence | ❌ No | ❌ No | ✅ Yes |
Performance Considerations
CTEs are primarily for readability, not performance. The database often optimizes them similarly to subqueries.
When CTEs Might Be Slower
-- This CTE will be evaluated multiple times
WITH user_orders AS (
SELECT * FROM orders WHERE user_id = 123
)
SELECT COUNT(*) FROM user_orders
UNION ALL
SELECT SUM(amount) FROM user_orders;
-- Better: Materialize in temp table for multiple uses
CREATE TEMP TABLE user_orders AS
SELECT * FROM orders WHERE user_id = 123;
Optimization Tips
- Use MATERIALIZED (PostgreSQL 12+)
WITH MATERIALIZED user_summary AS (
SELECT user_id, COUNT(*) as order_count
FROM orders
GROUP BY user_id
)
SELECT * FROM user_summary;
- Filter early
-- Good: Filter before CTE
WITH recent_orders AS (
SELECT * FROM orders
WHERE created_at > '2024-01-01' -- Filter here
)
SELECT * FROM recent_orders WHERE status = 'completed';
-- Bad: Filter after CTE
WITH all_orders AS (
SELECT * FROM orders -- Processes all rows
)
SELECT * FROM all_orders
WHERE created_at > '2024-01-01' AND status = 'completed';
Common Pitfalls
1. Over-complicating Simple Queries
-- Bad: Unnecessary CTE
WITH users_cte AS (
SELECT * FROM users WHERE id = 123
)
SELECT * FROM users_cte;
-- Good: Just query directly
SELECT * FROM users WHERE id = 123;
2. Recursive CTE Without Termination
-- Dangerous: Infinite recursion
WITH RECURSIVE infinite AS (
SELECT 1 as n
UNION ALL
SELECT n + 1 FROM infinite -- Never stops!
)
SELECT * FROM infinite;
-- Safe: Add termination condition
WITH RECURSIVE safe AS (
SELECT 1 as n
UNION ALL
SELECT n + 1 FROM safe WHERE n < 100 -- Stops at 100
)
SELECT * FROM safe;
3. Assuming CTEs Are Cached
CTEs are not automatically cached. Each reference might re-execute the query.
Best Practices
- Use meaningful names -
active_usersis better thancte1 - Add comments - Explain complex logic
- Break down complex queries - One logical step per CTE
- Test incrementally - Query each CTE independently
- Consider MATERIALIZED - For expensive CTEs used multiple times
- Limit recursion depth - Always add termination conditions
- Format consistently - Make nested CTEs easy to read
Example: Complete Analytics Query
WITH
-- Get date range
date_range AS (
SELECT
CURRENT_DATE - INTERVAL '30 days' as start_date,
CURRENT_DATE as end_date
),
-- Active users in period
active_users AS (
SELECT DISTINCT user_id
FROM orders
WHERE created_at >= (SELECT start_date FROM date_range)
),
-- User statistics
user_metrics AS (
SELECT
o.user_id,
COUNT(*) as order_count,
SUM(o.amount) as total_spent,
AVG(o.amount) as avg_order_value,
MAX(o.created_at) as last_order_date
FROM orders o
WHERE o.user_id IN (SELECT user_id FROM active_users)
GROUP BY o.user_id
),
-- Customer segments
segments AS (
SELECT
user_id,
CASE
WHEN total_spent > 1000 THEN 'VIP'
WHEN total_spent > 500 THEN 'Premium'
ELSE 'Regular'
END as segment
FROM user_metrics
)
-- Final report
SELECT
s.segment,
COUNT(*) as user_count,
AVG(um.order_count) as avg_orders,
AVG(um.total_spent) as avg_spent,
SUM(um.total_spent) as segment_revenue
FROM segments s
JOIN user_metrics um ON s.user_id = um.user_id
GROUP BY s.segment
ORDER BY segment_revenue DESC;
Conclusion
CTEs are a powerful tool for writing maintainable SQL. They excel at:
- Breaking complex queries into logical steps
- Improving code readability
- Solving recursive problems
- Making queries easier to debug and maintain
While they don’t always improve performance, the readability and maintainability benefits make them invaluable for complex queries.
Start using CTEs in your next query—your future self (and your teammates) will thank you.
Further Reading:
- Window functions with CTEs
- Optimizing recursive queries
- Advanced CTE patterns for analytics
Practice: Try refactoring your most complex SQL query using CTEs. You’ll be surprised how much clearer it becomes!