Learn how to design normalized database schemas and write efficient queries using joins, relationships, and advanced PostgreSQL techniques. From one-to-many to polymorphic relationships with real-world examples.

SQL joins are the backbone of relational databases. Yet many engineers struggle with them—not because joins are inherently complex, but because they're often taught in isolation, divorced from the real-world schema design decisions that make joins necessary.
This article takes a different approach. We'll build a realistic e-commerce database from scratch, explore every relationship type you'll encounter in production systems, and show you how to query them efficiently. By the end, you'll understand not just how to write joins, but why certain relationships exist and when to use them.
Whether you're optimizing slow queries, designing a new schema, or preparing for a system design interview, this guide covers the practical patterns that matter.
Before diving into joins, let's establish why they exist. Relational databases enforce normalization—the practice of organizing data to minimize redundancy and maintain data integrity.
Consider storing customer information. You could duplicate a customer's address in every order row, but that creates problems: every address change must touch many rows, storage is wasted on copies, and the copies inevitably drift out of sync.
Joins solve this by letting you split data across tables and reassemble it on demand. The cost? You need to understand relationships.
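To make that concrete, here is a minimal sketch of the two designs (table names are illustrative, not part of the schema we build later). The denormalized version repeats the address on every order; the normalized version stores it once and joins on demand:

```sql
-- Denormalized: the address is copied into every order row.
CREATE TABLE orders_denormalized (
    id SERIAL PRIMARY KEY,
    customer_name VARCHAR(255),
    customer_address VARCHAR(255),  -- repeated per order; copies can drift apart
    total_amount DECIMAL(10, 2)
);

-- Normalized: the address lives in one place.
CREATE TABLE customers_normalized (
    id SERIAL PRIMARY KEY,
    name VARCHAR(255),
    address VARCHAR(255)
);

CREATE TABLE orders_normalized (
    id SERIAL PRIMARY KEY,
    customer_id INTEGER REFERENCES customers_normalized(id),
    total_amount DECIMAL(10, 2)
);
```

Updating an address now touches exactly one row, at the cost of a join when you need the address alongside an order.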
A one-to-one relationship means each row in table A corresponds to exactly one row in table B, and vice versa.
Real-world example: A user has exactly one profile, and a profile belongs to exactly one user.
CREATE TABLE users (
id SERIAL PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE user_profiles (
id SERIAL PRIMARY KEY,
user_id INTEGER UNIQUE NOT NULL REFERENCES users(id) ON DELETE CASCADE,
bio TEXT,
avatar_url VARCHAR(255),
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
The UNIQUE constraint on user_id enforces the one-to-one relationship. Without it, multiple profiles could reference the same user.
Query example:
SELECT
u.id,
u.email,
p.bio,
p.avatar_url
FROM users u
LEFT JOIN user_profiles p ON u.id = p.user_id
WHERE u.id = 42;
We use LEFT JOIN here because a user might not have a profile yet. If we used INNER JOIN, users without profiles would be excluded.
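Because user_id is unique, profile writes can also be made idempotent. A sketch of an upsert that creates or updates a profile in one statement (the literal values are placeholders):

```sql
-- Insert a profile, or update it if the user already has one.
-- The ON CONFLICT target works because user_id carries a UNIQUE constraint.
INSERT INTO user_profiles (user_id, bio, avatar_url)
VALUES (42, 'Coffee enthusiast', 'https://example.com/avatar.png')
ON CONFLICT (user_id)
DO UPDATE SET
    bio = EXCLUDED.bio,
    avatar_url = EXCLUDED.avatar_url,
    updated_at = CURRENT_TIMESTAMP;
```

This avoids the read-then-write race of checking for an existing profile in application code.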
One-to-many is the most common relationship. One row in table A can have multiple rows in table B, but each row in B belongs to exactly one row in A.
Real-world example: A customer has many orders, but each order belongs to one customer.
CREATE TABLE customers (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INTEGER NOT NULL REFERENCES customers(id) ON DELETE CASCADE,
order_date TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
total_amount DECIMAL(10, 2),
status VARCHAR(50) DEFAULT 'pending'
);
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
The index on customer_id is crucial. Without it, queries filtering by customer become full table scans.
Query example:
SELECT
c.id,
c.name,
c.email,
o.id AS order_id,
o.order_date,
o.total_amount,
o.status
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE c.id = 5
ORDER BY o.order_date DESC;
This returns one row per order. If a customer has 100 orders, you get 100 rows. This is correct behavior—it's called "fan-out" and is expected in one-to-many joins.
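If you want one row per customer instead, PostgreSQL can fold the fan-out into a JSON array. A sketch using json_agg against the same tables:

```sql
SELECT
    c.id,
    c.name,
    COALESCE(
        json_agg(
            json_build_object('order_id', o.id, 'total', o.total_amount)
            ORDER BY o.order_date DESC
        ) FILTER (WHERE o.id IS NOT NULL),
        '[]'
    ) AS orders
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE c.id = 5
GROUP BY c.id, c.name;
```

The FILTER clause keeps customers with no orders from producing a [null] array; they get an empty array instead.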
Many-to-many means rows in table A can relate to multiple rows in table B, and vice versa. You need a junction table (also called a join table or bridge table) to represent this.
Real-world example: Products belong to many categories, and categories contain many products.
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
price DECIMAL(10, 2) NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE categories (
id SERIAL PRIMARY KEY,
name VARCHAR(100) NOT NULL UNIQUE,
description TEXT
);
CREATE TABLE product_categories (
product_id INTEGER NOT NULL REFERENCES products(id) ON DELETE CASCADE,
category_id INTEGER NOT NULL REFERENCES categories(id) ON DELETE CASCADE,
PRIMARY KEY (product_id, category_id)
);
CREATE INDEX idx_product_categories_category_id ON product_categories(category_id);
The junction table has a composite primary key ensuring no duplicate relationships. The second index allows efficient queries from the category side.
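Linking a product to a category is a plain insert into the junction table. A sketch that attaches one product to two categories (the ids are placeholders):

```sql
-- Attach product 10 to categories 3 and 7.
-- The composite primary key rejects duplicates; DO NOTHING makes retries safe.
INSERT INTO product_categories (product_id, category_id)
VALUES (10, 3), (10, 7)
ON CONFLICT (product_id, category_id) DO NOTHING;
```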
Query example:
SELECT
p.id,
p.name,
p.price,
c.id AS category_id,
c.name AS category_name
FROM products p
LEFT JOIN product_categories pc ON p.id = pc.product_id
LEFT JOIN categories c ON pc.category_id = c.id
WHERE p.id = 10;
Reverse query (categories with products):
SELECT
c.id,
c.name,
p.id AS product_id,
p.name AS product_name,
p.price
FROM categories c
LEFT JOIN product_categories pc ON c.id = pc.category_id
LEFT JOIN products p ON pc.product_id = p.id
WHERE c.id = 3
ORDER BY p.name;
Polymorphic relationships allow a single table to relate to multiple different tables. This is common in systems with shared behavior across different entity types.
Real-world example: Comments can be attached to posts, products, or reviews. Instead of separate tables for each comment type, use a single comments table with a type indicator.
CREATE TABLE posts (
id SERIAL PRIMARY KEY,
title VARCHAR(255) NOT NULL,
content TEXT,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE products (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
price DECIMAL(10, 2)
);
CREATE TABLE reviews (
id SERIAL PRIMARY KEY,
title VARCHAR(255) NOT NULL,
rating INTEGER CHECK (rating >= 1 AND rating <= 5)
);
CREATE TABLE comments (
id SERIAL PRIMARY KEY,
content TEXT NOT NULL,
commentable_type VARCHAR(50) NOT NULL,
commentable_id INTEGER NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
CHECK (commentable_type IN ('post', 'product', 'review'))
);
CREATE INDEX idx_comments_polymorphic ON comments(commentable_type, commentable_id);
The commentable_type and commentable_id columns form a polymorphic key. The index on both columns is essential for performance.
Query example (fetch comments for a specific post):
SELECT
p.id,
p.title,
p.content,
c.id AS comment_id,
c.content AS comment_text,
c.created_at
FROM posts p
LEFT JOIN comments c ON p.id = c.commentable_id
AND c.commentable_type = 'post'
WHERE p.id = 7
ORDER BY c.created_at DESC;
The AND c.commentable_type = 'post' clause is critical. Without it, you'd get comments from other entity types.
Warning
Polymorphic relationships trade flexibility for complexity. Each query requires filtering by type. Consider whether a union of separate tables or a more normalized design might be clearer.
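One such alternative, sketched below with a hypothetical comments_strict table, gives each parent its own nullable foreign key plus a CHECK that exactly one is set. It is more verbose, but the database can then enforce referential integrity—something the commentable_type/commentable_id pair cannot do:

```sql
CREATE TABLE comments_strict (
    id SERIAL PRIMARY KEY,
    content TEXT NOT NULL,
    post_id INTEGER REFERENCES posts(id) ON DELETE CASCADE,
    product_id INTEGER REFERENCES products(id) ON DELETE CASCADE,
    review_id INTEGER REFERENCES reviews(id) ON DELETE CASCADE,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    -- Exactly one parent must be set (num_nonnulls is PostgreSQL 9.6+).
    CHECK (num_nonnulls(post_id, product_id, review_id) = 1)
);
```

With real foreign keys, a deleted post takes its comments with it automatically, and no comment can point at a row that doesn't exist.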
A table can reference itself, useful for hierarchical data like organizational structures or threaded comments.
Real-world example: Employees have managers (who are also employees).
CREATE TABLE employees (
id SERIAL PRIMARY KEY,
name VARCHAR(255) NOT NULL,
email VARCHAR(255) UNIQUE NOT NULL,
manager_id INTEGER REFERENCES employees(id) ON DELETE SET NULL,
department VARCHAR(100),
salary DECIMAL(10, 2)
);
CREATE INDEX idx_employees_manager_id ON employees(manager_id);
Query example (fetch employee with manager details):
SELECT
e.id,
e.name,
e.email,
e.department,
m.id AS manager_id,
m.name AS manager_name,
m.email AS manager_email
FROM employees e
LEFT JOIN employees m ON e.manager_id = m.id
WHERE e.id = 15;
Recursive query (fetch entire reporting chain):
WITH RECURSIVE reporting_chain AS (
-- Base case: start with the employee
SELECT
id,
name,
manager_id,
1 AS level
FROM employees
WHERE id = 15
UNION ALL
-- Recursive case: get the manager's manager, etc.
SELECT
e.id,
e.name,
e.manager_id,
rc.level + 1
FROM employees e
INNER JOIN reporting_chain rc ON e.id = rc.manager_id
WHERE rc.level < 10 -- Prevent infinite loops
)
SELECT * FROM reporting_chain
ORDER BY level;This recursive CTE walks up the organizational hierarchy, showing each employee's chain of command.
Returns only rows where the join condition matches in both tables.
SELECT
c.name,
o.id AS order_id,
o.total_amount
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
WHERE o.total_amount > 100;
This returns only customers who have orders over $100. Customers without orders are excluded.
Returns all rows from the left table, with matching rows from the right table. Non-matching rows have NULL values for right table columns.
SELECT
c.name,
COUNT(o.id) AS order_count
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;
This returns all customers, even those with zero orders (order_count will be 0).
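A related pattern finds customers with no orders at all. Two equivalent sketches—an anti-join and a NOT EXISTS:

```sql
-- Anti-join: keep only the left rows that found no match on the right.
SELECT c.id, c.name
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
WHERE o.id IS NULL;

-- NOT EXISTS reads closer to the intent and optimizes well in PostgreSQL.
SELECT c.id, c.name
FROM customers c
WHERE NOT EXISTS (
    SELECT 1 FROM orders o WHERE o.customer_id = c.id
);
```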
The opposite of LEFT JOIN. Returns all rows from the right table.
SELECT
c.name,
o.id AS order_id
FROM customers c
RIGHT JOIN orders o ON c.id = o.customer_id;
In practice, RIGHT JOIN is rarely used. You can always rewrite it as a LEFT JOIN by swapping table order.
Returns all rows from both tables, with NULLs where matches don't exist.
SELECT
c.id,
c.name,
o.id AS order_id,
o.total_amount
FROM customers c
FULL OUTER JOIN orders o ON c.id = o.customer_id;
Useful for finding unmatched records in either table.
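For example, a sketch that keeps only the unmatched rows on each side (orphaned orders should be impossible with the foreign key in place, but the pattern generalizes to tables without one):

```sql
SELECT c.id AS customer_id, o.id AS order_id
FROM customers c
FULL OUTER JOIN orders o ON c.id = o.customer_id
WHERE c.id IS NULL   -- orders with no matching customer
   OR o.id IS NULL;  -- customers with no orders
```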
Returns the Cartesian product—every row from the left table paired with every row from the right table.
SELECT
c.name,
p.name
FROM customers c
CROSS JOIN products p;
This returns 1,000 rows if you have 10 customers and 100 products. Use sparingly—it's easy to accidentally create massive result sets.
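CROSS JOIN does have legitimate uses, typically generating a complete grid to report against. A sketch that pairs every customer with every month of 2024 so that months with zero orders still appear in the output (the date range is illustrative):

```sql
SELECT
    c.id AS customer_id,
    m.month,
    COUNT(o.id) AS orders_in_month
FROM customers c
CROSS JOIN generate_series(
    '2024-01-01'::DATE, '2024-12-01'::DATE, INTERVAL '1 month'
) AS m(month)
LEFT JOIN orders o
    ON o.customer_id = c.id
    AND date_trunc('month', o.order_date) = m.month
GROUP BY c.id, m.month
ORDER BY c.id, m.month;
```

Here the Cartesian product is the point: the grid guarantees a row for every customer-month pair before the LEFT JOIN fills in the counts.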
Let's put multiple relationship types to work against the e-commerce schema built above—customers and orders (one-to-many), orders and order_items, products and categories (many-to-many), and reviews tying customers to products:
Fetch customers with their order count and average order value:
SELECT
c.id,
c.name,
c.email,
COUNT(DISTINCT o.id) AS total_orders,
COALESCE(AVG(o.total_amount), 0) AS avg_order_value,
COALESCE(SUM(o.total_amount), 0) AS lifetime_value
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name, c.email
ORDER BY lifetime_value DESC;
The DISTINCT in COUNT(DISTINCT o.id) prevents double-counting if a customer appears multiple times due to other joins.
Find customers with more than 5 orders:
SELECT
c.id,
c.name,
COUNT(o.id) AS order_count
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name
HAVING COUNT(o.id) > 5
ORDER BY order_count DESC;
Use HAVING for conditions on aggregated values, not WHERE (which filters before aggregation).
Get products with their category list and average rating:
SELECT
p.id,
p.name,
p.price,
STRING_AGG(DISTINCT c.name, ', ') AS categories,
ROUND(AVG(r.rating)::NUMERIC, 2) AS avg_rating,
COUNT(DISTINCT r.id) AS review_count
FROM products p
LEFT JOIN product_categories pc ON p.id = pc.product_id
LEFT JOIN categories c ON pc.category_id = c.id
LEFT JOIN reviews r ON p.id = r.product_id
GROUP BY p.id, p.name, p.price
ORDER BY avg_rating DESC NULLS LAST;
STRING_AGG concatenates category names into a single string; DISTINCT guards it, and the review count, against the row multiplication caused by joining categories and reviews together. NULLS LAST puts products without reviews at the end.
Rank products by sales within each category:
SELECT
c.name AS category,
p.name AS product,
SUM(oi.quantity) AS units_sold,
ROW_NUMBER() OVER (PARTITION BY c.id ORDER BY SUM(oi.quantity) DESC) AS rank_in_category
FROM categories c
INNER JOIN product_categories pc ON c.id = pc.category_id
INNER JOIN products p ON pc.product_id = p.id
LEFT JOIN order_items oi ON p.id = oi.product_id
GROUP BY c.id, c.name, p.id, p.name
ORDER BY c.name, rank_in_category;
ROW_NUMBER() OVER (PARTITION BY c.id ORDER BY ...) assigns a rank within each category.
CREATE TABLE orders (
id SERIAL PRIMARY KEY,
customer_id INTEGER NOT NULL REFERENCES customers(id),
total_amount DECIMAL(10, 2)
);
-- Missing index on customer_id!
Without an index, queries filtering by customer_id scan the entire table. Always index foreign keys:
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
SELECT c.name, COUNT(o.id) AS order_count
FROM customers c
INNER JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;
This returns only customers with orders. If you want all customers (including those with zero orders), use LEFT JOIN.
SELECT
c.name,
o.id,
oi.id
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
LEFT JOIN order_items oi ON o.id = oi.order_id;
If a customer has 10 orders and each order has 5 items, you get 50 rows (10 × 5). This is correct for this query, but it's easy to accidentally multiply rows. Always verify your result set size.
SELECT
c.id,
c.name,
COUNT(o.id) AS order_count,
COUNT(r.id) AS review_count
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
LEFT JOIN reviews r ON c.id = r.customer_id
GROUP BY c.id, c.name;
This double-counts because each order-review combination creates a row. The fix is to aggregate separately:
SELECT
c.id,
c.name,
(SELECT COUNT(*) FROM orders WHERE customer_id = c.id) AS order_count,
(SELECT COUNT(*) FROM reviews WHERE customer_id = c.id) AS review_count
FROM customers c;
Or pre-aggregate each side in a derived table before joining, so each customer matches at most one row per side. (Window functions over the joined rows would not help here: COUNT(o.id) OVER (PARTITION BY c.id) still counts every order-review combination, and PostgreSQL doesn't allow DISTINCT inside window aggregates.)
SELECT
c.id,
c.name,
COALESCE(o.cnt, 0) AS order_count,
COALESCE(r.cnt, 0) AS review_count
FROM customers c
LEFT JOIN (SELECT customer_id, COUNT(*) AS cnt FROM orders GROUP BY customer_id) o
ON c.id = o.customer_id
LEFT JOIN (SELECT customer_id, COUNT(*) AS cnt FROM reviews GROUP BY customer_id) r
ON c.id = r.customer_id;
SELECT c.id, c.name
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
LEFT JOIN order_items oi ON o.id = oi.order_id;
If a customer has 1 order with 5 items, you get 5 rows for that customer. Use DISTINCT if you want unique customers:
SELECT DISTINCT c.id, c.name
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
LEFT JOIN order_items oi ON o.id = oi.order_id;
EXPLAIN ANALYZE
SELECT
c.name,
COUNT(o.id) AS order_count
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
GROUP BY c.id, c.name;
Look for sequential scans on large tables—they indicate missing indexes. Nested loops are fine for small result sets but problematic at scale.
If you frequently query customer lifetime value, consider storing it in the customers table:
ALTER TABLE customers ADD COLUMN lifetime_value DECIMAL(10, 2) DEFAULT 0;
-- Update via trigger or batch job
UPDATE customers c
SET lifetime_value = COALESCE(
(SELECT SUM(total_amount) FROM orders WHERE customer_id = c.id),
0
);
Trade-off: You must keep this value in sync. Use triggers or scheduled jobs.
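One way to keep it in sync, sketched here with a hypothetical trigger function (name and details are illustrative), recomputes the value whenever a row in orders changes:

```sql
-- Hypothetical trigger to keep customers.lifetime_value in sync.
CREATE OR REPLACE FUNCTION refresh_lifetime_value() RETURNS TRIGGER AS $$
DECLARE
    cid INTEGER;
BEGIN
    -- OLD is the only row available on DELETE; NEW on INSERT/UPDATE.
    IF TG_OP = 'DELETE' THEN
        cid := OLD.customer_id;
    ELSE
        cid := NEW.customer_id;
    END IF;
    UPDATE customers
    SET lifetime_value = COALESCE(
        (SELECT SUM(total_amount) FROM orders WHERE customer_id = cid), 0)
    WHERE id = cid;
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER trg_orders_lifetime_value
AFTER INSERT OR UPDATE OR DELETE ON orders
FOR EACH ROW EXECUTE FUNCTION refresh_lifetime_value();
```

A trigger keeps the value exact at the cost of extra write overhead on every order change; a scheduled batch job trades freshness for cheaper writes.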
CREATE MATERIALIZED VIEW customer_metrics AS
SELECT
c.id,
c.name,
COUNT(DISTINCT o.id) AS total_orders,
COALESCE(SUM(o.total_amount), 0) AS lifetime_value,
COALESCE(AVG(r.rating), 0) AS avg_product_rating
FROM customers c
LEFT JOIN orders o ON c.id = o.customer_id
LEFT JOIN reviews r ON c.id = r.customer_id
GROUP BY c.id, c.name;
CREATE INDEX idx_customer_metrics_lifetime_value ON customer_metrics(lifetime_value DESC);
-- Refresh periodically
REFRESH MATERIALIZED VIEW customer_metrics;
Materialized views store query results on disk. They're stale until refreshed but much faster than computing on-the-fly.
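A plain REFRESH locks the view against reads while it rebuilds. If readers must not block, PostgreSQL supports a concurrent refresh, which requires a unique index on the view:

```sql
-- Required before the first concurrent refresh.
CREATE UNIQUE INDEX idx_customer_metrics_id ON customer_metrics(id);

-- Rebuilds without blocking readers, who see the old contents until it finishes.
REFRESH MATERIALIZED VIEW CONCURRENTLY customer_metrics;
```

The concurrent variant is slower overall, since it diffs the new results against the old, but it keeps the view queryable throughout.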
Foreign keys are join conditions. Without indexes, joins become full table scans.
-- Good: Foreign key name indicates the relationship
ALTER TABLE orders ADD CONSTRAINT fk_orders_customer_id
FOREIGN KEY (customer_id) REFERENCES customers(id);
-- Good: Junction table name is descriptive
CREATE TABLE product_categories (
product_id INTEGER,
category_id INTEGER
);
Eliminate transitive dependencies. If column A depends on B, and B depends on C, then A shouldn't be in the same table as C.
-- Good: Surrogate key
CREATE TABLE products (
id SERIAL PRIMARY KEY,
sku VARCHAR(50) UNIQUE,
name VARCHAR(255)
);
-- Avoid: Natural key that might change
CREATE TABLE products (
sku VARCHAR(50) PRIMARY KEY,
name VARCHAR(255)
);
Surrogate keys are stable and efficient for joins. Natural keys (like SKU) can change, breaking relationships.
COMMENT ON TABLE customers IS 'Core customer records';
COMMENT ON COLUMN customers.email IS 'Unique email address, used for login';
COMMENT ON TABLE orders IS 'Customer orders with timestamps and totals';
COMMENT ON COLUMN orders.customer_id IS 'Foreign key to customers table';
Use \d+ table_name in psql to view comments.
A query that's fast with 1,000 rows might be slow with 1 million. Always test with production-like data.
In OLAP systems (data warehouses), denormalization is standard. Joins are expensive at scale, so data is pre-aggregated.
If you're using MongoDB or similar, embedding related data is often better than joins. Document databases don't have efficient joins.
For high-frequency queries, pre-computed aggregations (materialized views, caches) beat joins every time.
If data lives in separate services, joins happen in application code, not SQL. Consider API calls or event-driven synchronization.
SQL joins are powerful because they enforce data integrity while keeping storage efficient. The key is understanding your relationships—one-to-one, one-to-many, many-to-many, polymorphic, and self-referential—and choosing the right join type for each query.
Start with a normalized schema, index your foreign keys, and use EXPLAIN ANALYZE to catch performance issues early. As your system grows, denormalization and materialized views become tools, not shortcuts.
The e-commerce schema we built covers most real-world patterns. Use it as a template for your own designs, and you'll write efficient queries that scale.
Next steps: Build a schema for your domain, write queries against it, and analyze the execution plans. The patterns you learn will transfer to any relational database.