SQL Practice

The Classic Self-Join (Manager-Employee Hierarchy): Given an Employees table with columns emp_id, emp_name, and manager_id, write a query to display all employees along with their manager's name. How do you ensure employees who do not have a manager (like the CEO) are still included in the output?

SQL Query (Best Version)

SQL
SELECT 
    e.emp_id,
    e.emp_name AS employee_name,
    m.emp_name AS manager_name
FROM employees e
LEFT JOIN employees m 
    ON e.manager_id = m.emp_id;

How to Approach (Step-by-Step Thinking)

  1. Understand the Self-Join:
    • The table is joined with itself.
    • e = Employee table (for the employee)
    • m = Employee table (for the manager)
  2. Join Condition:
    • Join on e.manager_id = m.emp_id
  3. Use LEFT JOIN (Important):
    • To include employees who have no manager (like CEO, where manager_id IS NULL).
    • INNER JOIN would exclude the CEO.

Cheat Codes / Interview Tips

  • Always use LEFT JOIN when you want to include employees without managers.
  • If interviewer asks for "only employees who have managers" → use INNER JOIN.
  • Common follow-up: "How to show 'No Manager' instead of NULL?"
    SQL
     
    COALESCE(m.emp_name, 'No Manager') AS manager_name 
     
  • Performance Tip: Ensure there is an index on manager_id.

Expected Output Example

emp_idemployee_namemanager_name
1CEONo Manager
2AliceCEO
3BobAlice

Multi-Table Left Join Verification: Given three tables: Customers (cust_id, name), Accounts (acc_id, cust_id, acc_type), and Transactions (tx_id, acc_id, amount), write a query to list all customers who have a bank account but have executed zero transactions in the last 6 months. 

Here's a strong, interview-ready answer for this SQL question:


SQL Query (Best Version)

SQL
SELECT 
    c.cust_id,
    c.name
FROM customers c
INNER JOIN accounts a ON c.cust_id = a.cust_id
LEFT JOIN transactions t 
    ON a.acc_id = t.acc_id 
    AND t.tx_date >= DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
WHERE t.tx_id IS NULL
GROUP BY c.cust_id, c.name;

How to Approach (Step-by-Step Thinking)

  1. Identify Customers with Accounts:
    • Use INNER JOIN between customers and accounts (must have at least one account).
  2. Check for No Recent Transactions:
    • Use LEFT JOIN with transactions table.
    • Add date condition in ON clause: tx_date >= DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
  3. Filter Customers with Zero Transactions:
    • WHERE t.tx_id IS NULL → This is the key trick for "zero transactions".
  4. Avoid Duplicates:
    • Use GROUP BY c.cust_id, c.name

Cheat Codes / Interview Tips

  • Key Trick: Put the date filter in the ON clause of LEFT JOIN (not in WHERE). This is very important.
  • Use DATE_SUB(CURDATE(), INTERVAL 6 MONTH) for "last 6 months".
  • Always use GROUP BY when joining multiple tables to avoid duplicate rows.
  • If interviewer asks for "customers with no accounts" → change INNER JOIN to LEFT JOIN and check a.acc_id IS NULL.

Alternative Version (Using NOT EXISTS - More Efficient)

SQL
SELECT 
    c.cust_id,
    c.name
FROM customers c
INNER JOIN accounts a ON c.cust_id = a.cust_id
WHERE NOT EXISTS (
    SELECT 1 
    FROM transactions t 
    WHERE t.acc_id = a.acc_id 
      AND t.tx_date >= DATE_SUB(CURDATE(), INTERVAL 6 MONTH)
);

This version is often preferred in interviews because it's more efficient.

Unmatched Row Isolation (Anti-Join): Write a query to find all records in a Customers table that do not have any corresponding records in the Accounts table. Show three different architectural ways to achieve this (LEFT JOIN with IS NULL, NOT IN, and NOT EXISTS) and explain their internal execution plan differences.  

Unmatched Row Isolation (Anti-Join)

Problem: Find all customers who do not have any accounts.


Three Different Ways

1. LEFT JOIN with IS NULL (Most Recommended)

SQL
SELECT 
    c.cust_id,
    c.name
FROM Customers c
LEFT JOIN Accounts a ON c.cust_id = a.cust_id
WHERE a.acc_id IS NULL;

Execution Plan:

  • Does a left outer join and filters where right side is null.
  • Generally the best performance with proper indexes.

2. NOT IN

SQL
SELECT 
    cust_id,
    name
FROM Customers
WHERE cust_id NOT IN (SELECT cust_id FROM Accounts);

Execution Plan:

  • Executes the subquery first, then checks exclusion.
  • Can be slow if subquery returns many rows or has nulls.

3. NOT EXISTS (Often Best for Performance)

SQL
SELECT 
    c.cust_id,
    c.name
FROM Customers c
WHERE NOT EXISTS (
    SELECT 1 
    FROM Accounts a 
    WHERE a.cust_id = c.cust_id
);

Execution Plan:

  • Correlated subquery, stops as soon as one match is found.
  • Usually the most efficient for this type of anti-join.

How to Approach (Step-by-Step)

  1. Understand you need Anti-Join (customers without accounts).
  2. Three standard patterns:
    • LEFT JOIN + IS NULL → Most readable
    • NOT IN → Simple but can have performance issues with nulls
    • NOT EXISTS → Best performance for large tables
  3. Always use proper indexes on join columns (cust_id).

Cheat Codes / Interview Tips

  • Preferred Answer: Start with LEFT JOIN + IS NULL (most readable), then mention NOT EXISTS is better for performance.
  • NOT IN can give wrong results if subquery has NULL values.
  • NOT EXISTS is usually the fastest for large datasets.
  • Always mention indexing on join columns.

Write SQL to fetch top 5 customers with total successful transactions in the last 30 days. For what columns will you add index and why ?  
 
Only the transactions table given, transaction table has tx_id, tx_date, cust_id, status, amount

SQL Query

SQL
SELECT 
    cust_id,
    COUNT(*) AS successful_tx_count,
    SUM(amount) AS total_amount
FROM transactions
WHERE tx_date >= DATE_SUB(CURDATE(), INTERVAL 30 DAY)
  AND status = 'SUCCESS'
GROUP BY cust_id
ORDER BY successful_tx_count DESC, total_amount DESC
LIMIT 5;

How to Approach (Step-by-Step)

  1. Filter transactions from last 30 days using tx_date.
  2. Filter only successful transactions (status = 'SUCCESS').
  3. Group by cust_id.
  4. Calculate count and sum.
  5. Sort by count descending, then by amount descending.
  6. Limit to top 5.

Indexing Strategy (Very Important in Interview)

For this query, I would create the following indexes:

Column CombinationIndex TypeWhy?
(tx_date, status, cust_id)CompositeCovers WHERE + GROUP BY + ORDER BY
(status, tx_date, cust_id)CompositeIf status filter is more selective
cust_idSingleFor GROUP BY and future joins

Best Index for this query:

SQL
CREATE INDEX idx_tx_date_status_cust ON transactions(tx_date, status, cust_id);

Why this index?

  • tx_date for range filter (last 30 days)
  • status for equality filter ('SUCCESS')
  • cust_id for grouping

This is called a covering index for this query.


Cheat Codes / Interview Tips

  • Always mention composite index on filter + group by columns.
  • Mention EXPLAIN to verify index usage.
  • Talk about cardinality (e.g., status has low cardinality, so put it after tx_date).
  • If interviewer asks for performance: "Index on (tx_date, status) will help avoid full table scan."

Cross-Join Data Generation: How do you generate a matrix of every possible combination between a Branches table and a FinancialProducts table for a quarterly performance tracking template?

SQL Query (Cross-Join for Matrix Generation)

SQL
SELECT 
    b.branch_id,
    b.branch_name,
    p.product_id,
    p.product_name,
    'Q2 2026' AS quarter,           -- You can make this dynamic
    NULL AS performance_score,      -- Placeholder for later update
    NULL AS revenue,
    NULL AS customer_count
FROM Branches b
CROSS JOIN FinancialProducts p
ORDER BY b.branch_id, p.product_id;

How to Approach (Step-by-Step)

  1. Understand the requirement: You need every possible combination of Branch and Financial Product.
  2. Use CROSS JOIN: This is the key — it generates Cartesian product (every row from first table with every row from second table).
  3. Add static columns: Quarter, placeholders for performance metrics.
  4. Order the result for readability.

Cheat Codes / Interview Tips

  • CROSS JOIN is the correct choice when you want all combinations.
  • Alternative: INNER JOIN with ON 1=1 (same effect, but CROSS JOIN is clearer).
  • For performance with large tables:
    • Add LIMIT during development.
    • Consider materializing the result into a temporary table.
  • Common follow-up: "How to generate for multiple quarters?"
    SQL
    CROSS JOIN (SELECT 'Q1 2026' AS quarter UNION SELECT 'Q2 2026') q

Why we are setting as NULL? How we can update later ?

Full Picture (Step by Step):

Step 1: Create the Template Table (branch_product_matrix)

This is the main matrix table we generate using CROSS JOIN.

SQL
CREATE TABLE branch_product_matrix (
    branch_id INT,
    branch_name VARCHAR(100),
    product_id INT,
    product_name VARCHAR(100),
    quarter VARCHAR(20),
    performance_score DECIMAL(10,2) DEFAULT NULL,
    revenue DECIMAL(15,2) DEFAULT NULL,
    customer_count INT DEFAULT NULL,
    PRIMARY KEY (branch_id, product_id, quarter)
);

Step 2: Populate the Matrix (using CROSS JOIN)

SQL
INSERT INTO branch_product_matrix (branch_id, branch_name, product_id, product_name, quarter)
SELECT 
    b.branch_id,
    b.branch_name,
    p.product_id,
    p.product_name,
    'Q2 2026' AS quarter
FROM Branches b
CROSS JOIN FinancialProducts p;

Step 3: Update Performance Data Later

SQL
UPDATE branch_product_matrix m
JOIN quarterly_performance p 
    ON m.branch_id = p.branch_id 
   AND m.product_id = p.product_id 
   AND m.quarter = p.quarter
SET m.performance_score = p.score,
    m.revenue = p.revenue,
    m.customer_count = p.customer_count;

Summary:

  • branch_product_matrix = Template table (created once, holds all combinations)
  • quarterly_performance = Source table that contains actual performance data (comes later)

We first create the full structure (matrix), then fill in the values as data becomes available.

Self-Join for Consecutive Event Tracking: In a LoginAudit table (user_id, login_time, ip_address), write a query to detect fraud cases where the same user logged in from two different IP addresses within less than 5 minutes of each other. 

SQL Query (Best Version)

SQL
SELECT 
    a.user_id,
    a.login_time AS first_login,
    b.login_time AS second_login,
    a.ip_address AS first_ip,
    b.ip_address AS second_ip,
    TIMESTAMPDIFF(MINUTE, a.login_time, b.login_time) AS time_diff_minutes
FROM LoginAudit a
INNER JOIN LoginAudit b 
    ON a.user_id = b.user_id
    AND a.login_time < b.login_time
    AND TIMESTAMPDIFF(MINUTE, a.login_time, b.login_time) < 5
    AND a.ip_address != b.ip_address
ORDER BY a.user_id, a.login_time;

How to Approach (Step-by-Step)

  1. Self-Join on user_id
  2. Ensure a.login_time < b.login_time (to avoid duplicate pairs)
  3. Check time difference < 5 minutes using TIMESTAMPDIFF(MINUTE, ...)
  4. Check different IP addresses (a.ip_address != b.ip_address)
  5. Order by user and time for readability

Cheat Codes / Interview Tips

  • Always use a.login_time < b.login_time in self-join to avoid duplicate pairs (A-B and B-A).
  • TIMESTAMPDIFF(MINUTE, a.login_time, b.login_time) < 5 is the key condition.
  • Alternative using NOT EXISTS or Window functions for more advanced versions.
  • Mention performance: Index on (user_id, login_time) is very important.

Alternative Version (Using Window Function - More Modern)

SQL
WITH LoginWithPrev AS (
    SELECT 
        user_id,
        login_time,
        ip_address,
        LAG(ip_address) OVER (PARTITION BY user_id ORDER BY login_time) AS prev_ip,
        LAG(login_time) OVER (PARTITION BY user_id ORDER BY login_time) AS prev_login_time
    FROM LoginAudit
)
SELECT 
    user_id,
    prev_login_time,
    login_time,
    prev_ip,
    ip_address
FROM LoginWithPrev
WHERE prev_ip IS NOT NULL
  AND ip_address != prev_ip
  AND TIMESTAMPDIFF(MINUTE, prev_login_time, login_time) < 5;