SQL Practice
The Classic Self-Join (Manager-Employee Hierarchy): Given an Employees table with columns emp_id, emp_name, and manager_id, write a query to display all employees along with their manager's name. How do you ensure employees who do not have a manager (like the CEO) are still included in the output?
SQL Query (Best Version)
SELECT
e.emp_id,
e.emp_name AS employee_name,
m.emp_name AS manager_name
FROM employees e
LEFT JOIN employees m
ON e.manager_id = m.emp_id;How to Approach (Step-by-Step Thinking)
- Understand the Self-Join:
- The table is joined with itself.
- e = Employee table (for the employee)
- m = Employee table (for the manager)
- Join Condition:
- Join on e.manager_id = m.emp_id
- Use LEFT JOIN (Important):
- To include employees who have no manager (like CEO, where manager_id IS NULL).
- INNER JOIN would exclude the CEO.
Cheat Codes / Interview Tips
- Always use LEFT JOIN when you want to include employees without managers.
- If interviewer asks for "only employees who have managers" → use INNER JOIN.
- Common follow-up: "How to show 'No Manager' instead of NULL?"SQLCOALESCE(m.emp_name, 'No Manager') AS manager_name
- Performance Tip: Ensure there is an index on manager_id.
Expected Output Example
| emp_id | employee_name | manager_name |
|---|---|---|
| 1 | CEO | No Manager |
| 2 | Alice | CEO |
| 3 | Bob | Alice |
Multi-Table Left Join Verification: Given three tables: Customers (cust_id, name), Accounts (acc_id, cust_id, acc_type), and Transactions (tx_id, acc_id, amount), write a query to list all customers who have a bank account but have executed zero transactions in the last 6 months.
Here's a strong, interview-ready answer for this SQL question:
SQL Query (Best Version)
SELECT
c.cust_id,
c.name
FROM customers c
INNER JOIN accounts a ON c.cust_id = a.cust_id
LEFT JOIN transactions t
ON a.acc_id = t.acc_id
AND t.tx_date >= DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
WHERE t.tx_id IS NULL
GROUP BY c.cust_id, c.name;How to Approach (Step-by-Step Thinking)
- Identify Customers with Accounts:
- Use INNER JOIN between customers and accounts (must have at least one account).
- Check for No Recent Transactions:
- Use LEFT JOIN with transactions table.
- Add date condition in ON clause: tx_date >= DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
- Filter Customers with Zero Transactions:
- WHERE t.tx_id IS NULL → This is the key trick for "zero transactions".
- Avoid Duplicates:
- Use GROUP BY c.cust_id, c.name
Cheat Codes / Interview Tips
- Key Trick: Put the date filter in the ON clause of LEFT JOIN (not in WHERE). This is very important.
- Use DATE_SUB(CURDATE(), INTERVAL 6 MONTH) for "last 6 months".
- Always use GROUP BY when joining multiple tables to avoid duplicate rows.
- If interviewer asks for "customers with no accounts" → change INNER JOIN to LEFT JOIN and check a.acc_id IS NULL.
Alternative Version (Using NOT EXISTS - More Efficient)
SELECT
c.cust_id,
c.name
FROM customers c
INNER JOIN accounts a ON c.cust_id = a.cust_id
WHERE NOT EXISTS (
SELECT 1
FROM transactions t
WHERE t.acc_id = a.acc_id
AND t.tx_date >= DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
);This version is often preferred in interviews because it's more efficient.
Unmatched Row Isolation (Anti-Join): Write a query to find all records in a Customers table that do not have any corresponding records in the Accounts table. Show three different architectural ways to achieve this (LEFT JOIN with IS NULL, NOT IN, and NOT EXISTS) and explain their internal execution plan differences.
Unmatched Row Isolation (Anti-Join)
Problem: Find all customers who do not have any accounts.
Three Different Ways
1. LEFT JOIN with IS NULL (Most Recommended)
SELECT
c.cust_id,
c.name
FROM Customers c
LEFT JOIN Accounts a ON c.cust_id = a.cust_id
WHERE a.acc_id IS NULL;Execution Plan:
- Does a left outer join and filters where right side is null.
- Generally the best performance with proper indexes.
2. NOT IN
SELECT
cust_id,
name
FROM Customers
WHERE cust_id NOT IN (SELECT cust_id FROM Accounts);Execution Plan:
- Executes the subquery first, then checks exclusion.
- Can be slow if subquery returns many rows or has nulls.
3. NOT EXISTS (Often Best for Performance)
SELECT
c.cust_id,
c.name
FROM Customers c
WHERE NOT EXISTS (
SELECT 1
FROM Accounts a
WHERE a.cust_id = c.cust_id
);Execution Plan:
- Correlated subquery, stops as soon as one match is found.
- Usually the most efficient for this type of anti-join.
How to Approach (Step-by-Step)
- Understand you need Anti-Join (customers without accounts).
- Three standard patterns:
- LEFT JOIN + IS NULL → Most readable
- NOT IN → Simple but can have performance issues with nulls
- NOT EXISTS → Best performance for large tables
- Always use proper indexes on join columns (cust_id).
Cheat Codes / Interview Tips
- Preferred Answer: Start with LEFT JOIN + IS NULL (most readable), then mention NOT EXISTS is better for performance.
- NOT IN can give wrong results if subquery has NULL values.
- NOT EXISTS is usually the fastest for large datasets.
- Always mention indexing on join columns.
Write SQL to fetch top 5 customers with total successful transactions in the last 30 days. For what columns will you add index and why ?
Only the transactions table given, transaction table has tx_id, tx_date, cust_id, status, amount
SQL Query
SELECT
cust_id,
COUNT(*) AS successful_tx_count,
SUM(amount) AS total_amount
FROM transactions
WHERE tx_date >= DATE_SUB(CURDATE(), INTERVAL 30 DAY)
AND status = 'SUCCESS'
GROUP BY cust_id
ORDER BY successful_tx_count DESC, total_amount DESC
LIMIT 5;How to Approach (Step-by-Step)
- Filter transactions from last 30 days using tx_date.
- Filter only successful transactions (status = 'SUCCESS').
- Group by cust_id.
- Calculate count and sum.
- Sort by count descending, then by amount descending.
- Limit to top 5.
Indexing Strategy (Very Important in Interview)
For this query, I would create the following indexes:
| Column Combination | Index Type | Why? |
|---|---|---|
| (tx_date, status, cust_id) | Composite | Covers WHERE + GROUP BY + ORDER BY |
| (status, tx_date, cust_id) | Composite | If status filter is more selective |
| cust_id | Single | For GROUP BY and future joins |
Best Index for this query:
CREATE INDEX idx_tx_date_status_cust ON transactions(tx_date, status, cust_id);Why this index?
- tx_date for range filter (last 30 days)
- status for equality filter ('SUCCESS')
- cust_id for grouping
This is called a covering index for this query.
Cheat Codes / Interview Tips
- Always mention composite index on filter + group by columns.
- Mention EXPLAIN to verify index usage.
- Talk about cardinality (e.g., status has low cardinality, so put it after tx_date).
- If interviewer asks for performance: "Index on (tx_date, status) will help avoid full table scan."
Cross-Join Data Generation: How do you generate a matrix of every possible combination between a Branches table and a FinancialProducts table for a quarterly performance tracking template?
SQL Query (Cross-Join for Matrix Generation)
SELECT
b.branch_id,
b.branch_name,
p.product_id,
p.product_name,
'Q2 2026' AS quarter, -- You can make this dynamic
NULL AS performance_score, -- Placeholder for later update
NULL AS revenue,
NULL AS customer_count
FROM Branches b
CROSS JOIN FinancialProducts p
ORDER BY b.branch_id, p.product_id;How to Approach (Step-by-Step)
- Understand the requirement: You need every possible combination of Branch and Financial Product.
- Use CROSS JOIN: This is the key — it generates Cartesian product (every row from first table with every row from second table).
- Add static columns: Quarter, placeholders for performance metrics.
- Order the result for readability.
Cheat Codes / Interview Tips
- CROSS JOIN is the correct choice when you want all combinations.
- Alternative: INNER JOIN with ON 1=1 (same effect, but CROSS JOIN is clearer).
- For performance with large tables:
- Add LIMIT during development.
- Consider materializing the result into a temporary table.
- Common follow-up: "How to generate for multiple quarters?"SQL
CROSS JOIN (SELECT 'Q1 2026' AS quarter UNION SELECT 'Q2 2026') q
Why we are setting as NULL? How we can update later ?
Full Picture (Step by Step):
Step 1: Create the Template Table (branch_product_matrix)
This is the main matrix table we generate using CROSS JOIN.
CREATE TABLE branch_product_matrix (
branch_id INT,
branch_name VARCHAR(100),
product_id INT,
product_name VARCHAR(100),
quarter VARCHAR(20),
performance_score DECIMAL(10,2) DEFAULT NULL,
revenue DECIMAL(15,2) DEFAULT NULL,
customer_count INT DEFAULT NULL,
PRIMARY KEY (branch_id, product_id, quarter)
);Step 2: Populate the Matrix (using CROSS JOIN)
INSERT INTO branch_product_matrix (branch_id, branch_name, product_id, product_name, quarter)
SELECT
b.branch_id,
b.branch_name,
p.product_id,
p.product_name,
'Q2 2026' AS quarter
FROM Branches b
CROSS JOIN FinancialProducts p;Step 3: Update Performance Data Later
UPDATE branch_product_matrix m
JOIN quarterly_performance p
ON m.branch_id = p.branch_id
AND m.product_id = p.product_id
AND m.quarter = p.quarter
SET m.performance_score = p.score,
m.revenue = p.revenue,
m.customer_count = p.customer_count;Summary:
- branch_product_matrix = Template table (created once, holds all combinations)
- quarterly_performance = Source table that contains actual performance data (comes later)
We first create the full structure (matrix), then fill in the values as data becomes available.
Self-Join for Consecutive Event Tracking: In a LoginAudit table (user_id, login_time, ip_address), write a query to detect fraud cases where the same user logged in from two different IP addresses within less than 5 minutes of each other.
SQL Query (Best Version)
SELECT
a.user_id,
a.login_time AS first_login,
b.login_time AS second_login,
a.ip_address AS first_ip,
b.ip_address AS second_ip,
TIMESTAMPDIFF(MINUTE, a.login_time, b.login_time) AS time_diff_minutes
FROM LoginAudit a
INNER JOIN LoginAudit b
ON a.user_id = b.user_id
AND a.login_time < b.login_time
AND TIMESTAMPDIFF(MINUTE, a.login_time, b.login_time) < 5
AND a.ip_address != b.ip_address
ORDER BY a.user_id, a.login_time;How to Approach (Step-by-Step)
- Self-Join on user_id
- Ensure a.login_time < b.login_time (to avoid duplicate pairs)
- Check time difference < 5 minutes using TIMESTAMPDIFF(MINUTE, ...)
- Check different IP addresses (a.ip_address != b.ip_address)
- Order by user and time for readability
Cheat Codes / Interview Tips
- Always use a.login_time < b.login_time in self-join to avoid duplicate pairs (A-B and B-A).
- TIMESTAMPDIFF(MINUTE, a.login_time, b.login_time) < 5 is the key condition.
- Alternative using NOT EXISTS or Window functions for more advanced versions.
- Mention performance: Index on (user_id, login_time) is very important.
Alternative Version (Using Window Function - More Modern)
WITH LoginWithPrev AS (
SELECT
user_id,
login_time,
ip_address,
LAG(ip_address) OVER (PARTITION BY user_id ORDER BY login_time) AS prev_ip,
LAG(login_time) OVER (PARTITION BY user_id ORDER BY login_time) AS prev_login_time
FROM LoginAudit
)
SELECT
user_id,
prev_login_time,
login_time,
prev_ip,
ip_address
FROM LoginWithPrev
WHERE prev_ip IS NOT NULL
AND ip_address != prev_ip
AND TIMESTAMPDIFF(MINUTE, prev_login_time, login_time) < 5;