--------------------------------------------------------------------------------------------------
Interview Questions 2026
--------------------------------------------------------------------------------------------------
Explain the internal structural updates made to HashMap performance when handling severe bucket collisions (Treeifying threshold vs. Detreeifying threshold). How does it behave if the key's hashCode() always returns a constant?
The Core Collision Mechanic
=> When multiple keys map to the same bucket array index, Java switches from a linked list to a balanced Red-Black Tree to keep search times fast.
=> This structural upgrade changes the worst-case lookup time from linear time $O(N)$ down to logarithmic time $O(\log N)$.
The Two Critical Thresholds
=> Treeifying Threshold (8): When an insert pushes a bucket's linked list past 8 nodes AND the table capacity is at least 64, the bucket is converted into a Red-Black Tree. If the capacity is below 64, the map resizes (doubles the table) instead.
=> Detreeifying Threshold (6): When a resize splits a tree bucket and one of the resulting halves holds 6 or fewer nodes, that half is converted back into a plain linked list to save memory. The gap between 6 and 8 prevents a bucket from flipping back and forth between the two forms.
Constant HashCode Behavior Trap
=> If hashCode() always returns a constant value, every single entry you put into the map will land in the exact same bucket.
Here is how the HashMap reacts under the hood:
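=> The Structure: Entries chain up as a linked list in that single bucket. Adding the 9th entry calls treeifyBin(), but with the default capacity of 16 the table is still below 64, so the map resizes instead (16 → 32 → 64). Only once the table has 64 slots (around the 11th entry with default settings) does the bucket become a Red-Black Tree.
=> The equals() Dependency: Even as a tree, the map must still confirm a match with equals(). Because all hashes are identical, the tree can only order keys with compareTo(), which it uses if the key class implements Comparable.
=> The Worst-Case Performance: With Comparable keys, lookups stay at $O(\log N)$. Without Comparable, the tree has no usable ordering among equal hashes, so a lookup may have to search both subtrees and call equals() on many nodes, which degrades towards $O(N)$.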
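Key point: the default load factor is 0.75, so a HashMap resizes once its size exceeds capacity × 0.75 (12 entries for the default capacity of 16).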
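The constant-hashCode case above can be reproduced with a small sketch. BadKey is a hypothetical key class invented for this illustration; it implements Comparable so that a treeified bucket has an ordering to work with. The sketch only shows that the map stays correct; it does not inspect the internal bucket structure.

```java
import java.util.HashMap;
import java.util.Map;

class ConstantHashDemo {
    // Hypothetical key type: every instance lands in the same bucket.
    static final class BadKey implements Comparable<BadKey> {
        final int id;

        BadKey(int id) { this.id = id; }

        @Override public int hashCode() { return 42; } // constant on purpose

        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        // Gives a treeified bucket a way to order keys with equal hashes.
        @Override public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    // Inserts n colliding keys; all of them end up in one bucket.
    static Map<BadKey, String> fill(int n) {
        Map<BadKey, String> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey(i), "v" + i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, String> map = fill(10_000);
        System.out.println(map.size());
        System.out.println(map.get(new BadKey(9_999)));
    }
}
```

Removing the Comparable implementation keeps the map correct but makes lookups in the single bucket noticeably slower as n grows.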
Walk through the complete lifecycle of an object in the JVM Heap. How do generational garbage collectors (like G1 GC or ZGC) minimize STW (Stop-The-World) pauses for highly transactional backend systems?
Part 1: The Complete Lifecycle of an Object
1. Allocation in the Eden Space (Young generation)
=> Every time your Spring Boot application runs the new keyword, the object is allocated inside the Eden Space of the Young Generation.
=> Short-lived objects like local transaction variables or validation DTOs are created and die right here during high-frequency processing.
2. Aging inside the Survivor Spaces (S0 / S1)
=> When the Eden space fills up, the JVM triggers a Minor GC.
=> Live objects are identified, and dead objects are purged (that is, their memory is reclaimed).
=> Surviving objects are copied to Survivor Space 0 (S0), and their age counter is set to 1.
=> Ping-Pong: On the next minor cycle, live objects from Eden and S0 are copied to Survivor Space 1 (S1). This alternating copy continues back and forth, which keeps the young generation compacted.
3. Promotion to the Old Generation
=> Each time an object survives a Minor GC cycle, its age counter ticks upward.
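=> Once its age reaches the tenuring threshold (-XX:MaxTenuringThreshold, at most 15; the JVM may pick a lower value dynamically), it is promoted to the Old Generation. Objects can also be promoted early if the Survivor space is too small to hold them.
=> Long-lived infrastructural objects like your database connection pools, configuration properties, or active cached entities live here until they become unreachable.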
Part 2: Minimizing STW Pauses in Production
=> Traditional garbage collectors freeze your entire backend application execution (Stop-The-World pause) to scan and clean memory, causing critical request timeouts.
=> Modern collectors (G1 GC, ZGC) are designed to shrink these latency spikes rather than pause for the whole heap.
G1 GC (Garbage-First Garbage Collector)
=> G1 splits the heap into many equal-sized regions (typically around 2,000 of them, each 1 MB to 32 MB) instead of using one contiguous block per generation.
=> Instead of scanning the entire Old Generation in one pause, it marks live objects concurrently while your application threads run.
=> It calculates which regions contain the most dead data ("Garbage-First") and evacuates only those regions. Evacuation itself is still a Stop-The-World pause, but G1 sizes each collection to fit the pause-time goal (-XX:MaxGCPauseMillis, 200 ms by default).
ZGC (Z Garbage Collector - Generational)
=> ZGC has been production-ready since JDK 15; its generational mode arrived in JDK 21 (enabled there with -XX:+UseZGC -XX:+ZGenerational).
=> If an application thread loads a reference to an object that the GC is currently relocating, a small piece of JIT-injected code (a "load barrier") intercepts the load, fixes the pointer on the fly, and lets the application thread proceed.
=> Marking and relocation run concurrently alongside your application threads.
=> Because object relocation happens concurrently, Stop-The-World pauses are designed to stay below 1 millisecond and do not grow with heap size, whether the heap is 4 GB or 100 GB.
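To see which collector a given JVM is actually running and how much work it has done, the standard management beans can be queried. This is a minimal sketch; GcProbe is a name chosen for illustration. Note that getCollectionTime() reports accumulated collection time as the collector defines it, which is not always identical to Stop-The-World pause time.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcProbe {
    // One line per collector bean registered by the running JVM.
    static List<String> describeCollectors() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", totalTimeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        describeCollectors().forEach(System.out::println);
    }
}
```

Running it once with -XX:+UseG1GC and once with -XX:+UseZGC shows different bean names, which is a quick way to confirm the flags took effect.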
How do Virtual Threads (Project Loom, Java 21+) differ from platform threads regarding memory footprint and OS thread scheduling? When should you not use them in a Spring Boot application?
Memory Footprint: Platform vs. Virtual
Platform Threads (OS Threads):
=> Each java.lang.Thread is a thin wrapper mapped 1:1 onto an OS thread.
=> Each one reserves a large, fixed-size stack (typically 1 MB by default, tunable with -Xss), regardless of whether the thread is actively executing code or sitting idle.
Virtual Threads (Project Loom):
=> These are lightweight, user-mode threads managed entirely by the Java Virtual Machine (JVM) runtime instead of the OS.
=> They start with a minimal footprint of only a few hundred bytes to a few kilobytes, dynamically resizing their stack in the JVM heap as needed.
The Scale Impact:
=> A system can comfortably run millions of concurrent virtual threads on the same hardware that would run out of memory (OOM) after spawning just a few thousand platform threads.
OS Thread Scheduling Mechanics
Platform Threads:
=> Scheduled directly by the Operating System kernel using pre-emptive scheduling.
=> When a thread performs a blocking I/O operation (like waiting for a database response), the OS forces a context switch. This requires a costly transition into kernel mode to save and swap CPU registers.
Virtual Threads:
=> Scheduled by the JVM onto a dedicated internal ForkJoinPool whose worker threads act as "carrier threads."
=> When a virtual thread hits a blocking I/O call (e.g., an HTTP call or a JDBC query), the JVM intercepts the block, detaches ("unmounts") the virtual thread from its carrier thread, and parks its stack on the heap. The carrier OS thread is then free to run other virtual threads.
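The unmounting behaviour can be observed with a small sketch that requires Java 21 or later. VirtualThreadDemo and runBlockingTasks are illustrative names; Thread.sleep stands in for blocking I/O. Each task blocks for 50 ms, yet thousands of them finish in roughly that time because a sleeping virtual thread releases its carrier.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class VirtualThreadDemo {
    // Starts one virtual thread per task and returns how many tasks completed.
    static int runBlockingTasks(int taskCount) {
        AtomicInteger completed = new AtomicInteger();
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < taskCount; i++) {
                executor.submit(() -> {
                    try {
                        Thread.sleep(50); // simulated blocking call; the carrier is released
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                    completed.incrementAndGet();
                });
            }
        } // close() waits for all submitted tasks to finish
        return completed.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int done = runBlockingTasks(10_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done + " tasks in " + elapsedMs + " ms");
    }
}
```

The same loop with a fixed pool of a few hundred platform threads would take many times longer, since each sleeping task would hold an OS thread.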
When NOT to Use Virtual Threads in Spring Boot
=> Virtual threads are designed explicitly to scale applications bound by frequent, blocking network and disk I/O.
=> You should strictly avoid them or use caution in these three scenarios:
1. CPU-Bound Workloads
=> If your application code spends its time running heavy calculations, processing cryptography algorithms, or parsing massive JSON chunks, virtual threads offer no performance benefit. They cannot bypass raw hardware limits; CPU-bound tasks require dedicated, continuous physical cores (Platform Threads).
2. Synchronized Blocks and Pinned Threads
=> If your application or a third-party library blocks inside a synchronized block or method, or executes native code via JNI, the virtual thread becomes pinned to its carrier OS thread for the duration of the blocking operation. (For synchronized this applies to Java 21 to 23; JDK 24 removed most synchronized pinning, while native frames still pin.)
synchronized (lock) {
    // TRAP: Virtual thread pins the OS thread here during this blocking call
    String data = httpClient.fetchData();
}
=> When pinned, the JVM cannot unmount the thread, which defeats the purpose of Loom and can starve the carrier pool. On affected JDK versions, replace such blocks with explicit ReentrantLock instances.
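The ReentrantLock replacement mentioned above could look like the following sketch. GuardedFetcher and fetchData are hypothetical stand-ins for the httpClient call in the snippet; the sleep simulates a blocking remote call.

```java
import java.util.concurrent.locks.ReentrantLock;

class GuardedFetcher {
    private final ReentrantLock lock = new ReentrantLock();
    private int calls; // guarded by lock

    // Hypothetical stand-in for a blocking remote call.
    private String fetchData() {
        try {
            Thread.sleep(10);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while fetching", e);
        }
        return "payload-" + (++calls);
    }

    // A virtual thread blocked inside this section can unmount from its
    // carrier, because the lock is a java.util.concurrent lock, not a monitor.
    String fetchExclusively() {
        lock.lock();
        try {
            return fetchData();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println(new GuardedFetcher().fetchExclusively());
    }
}
```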
3. Over-use of ThreadLocals
=> Virtual threads support ThreadLocal variables, but because you will be spawning millions of them, keeping massive or complex objects bound to thread contexts will rapidly bloat your JVM heap.
=> If you must pass context across millions of virtual threads, consider Scoped Values (a preview API in Java 21, finalized in a later JDK release).
Implement a thread-safe, high-throughput custom Cache using ReadWriteLock or ConcurrentHashMap. Explain the difference between segment locking (older Java) and CAS (Compare-And-Swap) operations used now.
=> For a thread-safe, high-throughput custom cache, I would prefer ConcurrentHashMap in most cases because it provides excellent concurrency with minimal locking overhead.
=> However, depending on the access pattern, ReentrantReadWriteLock + HashMap can also be useful. Let me explain both approaches with code and then the internal difference.
Best Choice: Using ConcurrentHashMap (Recommended for High Throughput)
import java.util.Iterator;
import java.util.concurrent.ConcurrentHashMap;

public class CustomCache<K, V> {
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    private final long maxSize; // Soft upper bound on the number of entries

    public CustomCache(long maxSize) {
        this.maxSize = maxSize;
    }

    public V get(K key) {
        return cache.get(key); // Reads never block
    }

    public void put(K key, V value) {
        cache.put(key, value);
        // Best-effort size-based eviction: put + evict is not atomic, and the
        // victim is arbitrary (not LRU; it may even be the entry just added).
        // For real LRU, use LinkedHashMap + removeEldestEntry behind a lock.
        if (cache.mappingCount() > maxSize) {
            Iterator<K> it = cache.keySet().iterator();
            if (it.hasNext()) { // Guard: another thread may have emptied the map
                cache.remove(it.next());
            }
        }
    }

    public void remove(K key) {
        cache.remove(key);
    }

    public void clear() {
        cache.clear();
    }
}
Why this is high-throughput:
=> Reads never block, inserts into an empty bucket use CAS, and other writes lock only the single bucket they touch.
=> Supports atomic operations like computeIfAbsent(), merge(), compute().
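The atomic operations listed above are what make a "load once" cache safe without explicit locks. The sketch below is one possible shape, with LoadingCache, loader and loadCount as illustrative names: computeIfAbsent guarantees that the loader runs at most once per key even if several threads ask for the same missing key at the same time.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Function;

class LoadingCache<K, V> {
    private final ConcurrentHashMap<K, V> cache = new ConcurrentHashMap<>();
    private final Function<K, V> loader;                      // how to compute a missing value
    private final AtomicInteger loads = new AtomicInteger();  // counts real loader calls

    LoadingCache(Function<K, V> loader) {
        this.loader = loader;
    }

    // Check-then-load happens atomically for the given key.
    V get(K key) {
        return cache.computeIfAbsent(key, k -> {
            loads.incrementAndGet();
            return loader.apply(k);
        });
    }

    int loadCount() {
        return loads.get();
    }

    public static void main(String[] args) {
        LoadingCache<String, Integer> lengths = new LoadingCache<>(String::length);
        System.out.println(lengths.get("hello"));
        System.out.println(lengths.get("hello"));
        System.out.println("loader calls: " + lengths.loadCount());
    }
}
```

The loader should be short and must not touch the same map, because it runs while the bucket is locked.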
Alternative: Using ReentrantReadWriteLock (When you need strong read-write separation)
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class CustomCacheWithRWLock<K, V> {
    private final Map<K, V> cache = new HashMap<>();
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    private final ReentrantReadWriteLock.ReadLock readLock = rwLock.readLock();
    private final ReentrantReadWriteLock.WriteLock writeLock = rwLock.writeLock();

    public V get(K key) {
        readLock.lock();
        try {
            return cache.get(key);
        } finally {
            readLock.unlock();
        }
    }

    public void put(K key, V value) {
        writeLock.lock();
        try {
            cache.put(key, value);
        } finally {
            writeLock.unlock();
        }
    }
}
When to choose this:
=> Read-heavy workload with occasional writes.
=> You need strong consistency guarantees.
Internal Working – Segment Locking (Old) vs CAS (Modern)
Old ConcurrentHashMap (Java 7 and below) – Segment Locking:
=> The map was divided into 16 segments by default (configurable).
=> Each segment had its own lock (ReentrantLock).
=> Multiple threads could write concurrently only if they were modifying different segments.
=> Limitation: Maximum concurrency was limited to the number of segments (16). If all threads hit the same segment → contention.
Modern ConcurrentHashMap (Java 8+) – CAS + Fine-grained Locking:
=> Uses array of buckets instead of segments.
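=> Reads are lock-free (volatile reads of the table). Inserting the first node into an empty bucket uses CAS (Compare-And-Swap), a lock-free atomic CPU instruction.
=> When the bucket already contains nodes (a collision or an update), or while a bucket is being moved during a resize, the writer takes synchronized on that bucket's head node only (very fine-grained).
=> A bucket's linked list is converted to a Red-Black Tree when it reaches 8 nodes and the table has at least 64 slots.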
Key Advantages of CAS approach:
=> Much higher concurrency — multiple threads can update different buckets without any locking.
=> Better scalability on multi-core systems.
=> Lower overhead than segment locking.
Interview Tip:
=> From Java 8 onwards, ConcurrentHashMap moved from coarse-grained segment locking to a CAS-based approach, which significantly improved throughput under high contention.
=> Distributed Cache: When several service instances must share the same cached data, use a distributed cache such as Redis instead of a per-instance local cache.
Trade-offs Summary
| Approach | Read Concurrency | Write Concurrency | Complexity | Best For |
|---|---|---|---|---|
| ConcurrentHashMap | Excellent | Excellent | Low | Most high-throughput cases |
| ReentrantReadWriteLock | Excellent | Low (single) | Medium | Read-heavy + strong consistency |
| Synchronized HashMap | Poor | Poor | Low | Simple cases only |
=> In one of my projects, I used ConcurrentHashMap for caching user session and rate-limit data because it gave us very high TPS during peak transaction hours. For cases where I needed explicit read-write separation, I used ReentrantReadWriteLock.
What is CAS (Compare And Swap) ?
=> CAS is a low-level atomic operation provided by the CPU/hardware that allows you to update a memory location only if its current value matches an expected value.
=> It is the foundation of lock-free and wait-free algorithms.
How CAS Works (Step by Step)
boolean compareAndSwap(MemoryLocation var, ExpectedValue, NewValue) {
    if (var == ExpectedValue) {   // Compare
        var = NewValue;           // Swap
        return true;              // Success
    }
    return false;                 // Failed (someone else modified it)
}
=> Atomicity Guarantee: The entire "Compare + Swap" happens as a single, indivisible operation at the hardware level. No other thread can see or modify the value in between.
Real Java Example
=> Inside the JDK, CAS is implemented with Unsafe intrinsics (sun.misc.Unsafe historically, jdk.internal.misc.Unsafe and VarHandle since Java 9); application code uses it through the Atomic classes:
AtomicInteger counter = new AtomicInteger(5);
// compareAndSet is implemented using CAS internally
int expected = counter.get();                                    // expected = 5
boolean success = counter.compareAndSet(expected, expected + 1); // If current == 5, set to 6
Common Usage Pattern (Optimistic Locking):
public void increment() {
    int oldValue;
    do {
        oldValue = counter.get(); // Read current value
    } while (!counter.compareAndSet(oldValue, oldValue + 1));
    // The loop retries if another thread changed the value in between
}
=> This is called Optimistic Concurrency — assume no conflict, and retry if there is one.
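A small harness makes the retry loop's guarantee visible: several threads increment the same counter with no lock, and no update is lost. CasCounter and runConcurrently are illustrative names; in production code AtomicInteger.incrementAndGet() already does this for you.

```java
import java.util.concurrent.atomic.AtomicInteger;

class CasCounter {
    private final AtomicInteger value = new AtomicInteger();

    // Optimistic update: read, compute, and retry if someone else got there first.
    void increment() {
        int current;
        do {
            current = value.get();
        } while (!value.compareAndSet(current, current + 1));
    }

    int get() {
        return value.get();
    }

    // Runs the given number of threads, each incrementing the shared counter.
    static int runConcurrently(int threads, int incrementsPerThread) {
        CasCounter counter = new CasCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < incrementsPerThread; j++) {
                    counter.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(runConcurrently(4, 100_000)); // prints 400000
    }
}
```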
Why is CAS Important?
=> Allows lock-free algorithms → No thread blocking/suspension.
=> Much better performance under low-to-medium contention.
=> Avoids problems of traditional locking (deadlocks, priority inversion, context switching).
Used Heavily in:
=> AtomicInteger, AtomicLong, AtomicReference
=> ConcurrentHashMap (Java 8+)
=> ConcurrentLinkedQueue, ConcurrentSkipListMap
=> Lock-free data structures
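As an example of a lock-free data structure built on CAS, here is a minimal sketch of a Treiber stack. It is a classic textbook design, not code taken from the JDK; TreiberStack is an illustrative name. Every push and pop swings the head pointer with compareAndSet and retries on conflict.

```java
import java.util.concurrent.atomic.AtomicReference;

class TreiberStack<T> {
    private static final class Node<T> {
        final T value;
        final Node<T> next;

        Node(T value, Node<T> next) {
            this.value = value;
            this.next = next;
        }
    }

    private final AtomicReference<Node<T>> head = new AtomicReference<>();

    void push(T value) {
        Node<T> oldHead;
        Node<T> newHead;
        do {
            oldHead = head.get();
            newHead = new Node<>(value, oldHead);
        } while (!head.compareAndSet(oldHead, newHead)); // retry if head moved
    }

    // Returns null when the stack is empty.
    T pop() {
        Node<T> oldHead;
        do {
            oldHead = head.get();
            if (oldHead == null) {
                return null;
            }
        } while (!head.compareAndSet(oldHead, oldHead.next)); // retry if head moved
        return oldHead.value;
    }

    public static void main(String[] args) {
        TreiberStack<Integer> stack = new TreiberStack<>();
        stack.push(1);
        stack.push(2);
        System.out.println(stack.pop()); // prints 2
    }
}
```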
Key tip to remember CAS:
=> Think of a shared Google Doc with no locks: each writer's save is accepted only if the document is still at the version they last read; otherwise they reload the latest version and retry their change.
In ConcurrentHashMap, when does it fall back to synchronized on a single node?
=> In ConcurrentHashMap (Java 8+), reads are lock-free and the first insert into an empty bucket is done with CAS.
=> It uses synchronized on a single node (the bucket head) whenever a write targets a bucket that already holds at least one node, because such an update cannot be completed with a single CAS.
When Exactly Does It Fall Back to synchronized?
=> When inserting a second or subsequent node into the same bucket (hash collision).
=> When a write traverses the bucket's list or tree for put, replace, remove, compute, etc.
=> While a resize is in progress: each bucket is locked while it is migrated, and other writer threads help with the migration.
=> During treeifyBin() / untreeify() (converting list ↔ red-black tree).
=> When compute(), merge(), or computeIfAbsent() act on a non-empty bucket, since they must read and write atomically.
Key Point:
Even when it falls back to synchronized, the lock is held only on one bucket head node, not the entire map. This is why it scales so well.
| Scenario | Mechanism Used | Locking? | Concurrency Level |
|---|---|---|---|
| Empty bucket (first insert) | CAS | No lock | Extremely high |
| Update of an existing key | synchronized on bin head | Yes (fine-grained) | High (per bucket) |
| Hash collision (list/tree) | synchronized on bin head | Yes (fine-grained) | High (per bucket) |
| Resize / Treeification | synchronized per bin + helper threads | Yes (per bucket) | Good |
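From the caller's side, the per-bucket locking in the table is invisible; what matters is that read-modify-write methods such as merge() are atomic per key. The sketch below (HitCounter and hammer are illustrative names) has several threads update the same key, the worst case for contention, and still counts every update.

```java
import java.util.concurrent.ConcurrentHashMap;

class HitCounter {
    private final ConcurrentHashMap<String, Long> hits = new ConcurrentHashMap<>();

    // Atomic per key: insert 1 if absent, otherwise add 1 to the current value.
    void record(String key) {
        hits.merge(key, 1L, Long::sum);
    }

    long count(String key) {
        return hits.getOrDefault(key, 0L);
    }

    // All threads update the same key on purpose.
    static long hammer(int threads, int perThread) {
        HitCounter counter = new HitCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.record("same-key");
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting", e);
            }
        }
        return counter.count("same-key");
    }

    public static void main(String[] args) {
        System.out.println(hammer(4, 50_000)); // prints 200000
    }
}
```

Writing the same logic as get() followed by put() would lose updates, because the two calls are not atomic together.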
Does falling back to synchronized at the node level mean falling back to the old (pre-Java 8) segment-level locking approach?
=> No, it is NOT falling back to the old segmental locking approach.
=> Pre-Java 8 (Segment Locking): Coarse-grained locking → Maximum 16 concurrent writers (limited scalability).
=> Java 8+ (CAS + Node-level synchronized): Much finer-grained locking → Theoretically, as many concurrent writers as there are buckets (16 by default, growing with capacity).
| Aspect | Java 7 and Below (Segment Locking) | Java 8+ (Current Approach) | Winner |
|---|---|---|---|
| Locking Unit | Entire Segment (16 segments by default) | Single Bucket Head Node | Java 8+ |
| Max Concurrent Writers | 16 (one per segment) | Up to number of buckets (hundreds or thousands) | Java 8+ |
| Primary Mechanism | ReentrantLock on each segment | CAS (lock-free) most of the time | Java 8+ |
| Fallback Mechanism | Always locked at segment level | synchronized only on one bucket when needed | Java 8+ |
| Scalability | Limited | Excellent | Java 8+ |
| Memory Overhead | Higher (each segment had its own lock + HashMap) | Lower | Java 8+ |
Exact Condition for treeifyBin() in ConcurrentHashMap (Java 8+)
Treeification (converting a linked list into a Red-Black Tree) happens only if BOTH conditions are satisfied:
=> The number of nodes in that bucket (binCount) ≥ TREEIFY_THRESHOLD, which is 8.
=> The table capacity (length of the internal array) ≥ MIN_TREEIFY_CAPACITY, which is 64.
If the table is still small (< 64), the map resizes the table instead of treeifying, which spreads the entries across more buckets and usually reduces collisions.
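The two-condition rule can be written down as a tiny decision function. This is an illustrative model of the rule as described here, not the JDK source; TreeifyDecision, Action and decide are names made up for the sketch, while the two constants carry the values quoted in the text.

```java
class TreeifyDecision {
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    enum Action { NONE, RESIZE, TREEIFY }

    // What happens to a bucket after an insert, given its node count and the table length.
    static Action decide(int binCount, int tableLength) {
        if (binCount < TREEIFY_THRESHOLD) {
            return Action.NONE;            // short list: leave it alone
        }
        return tableLength < MIN_TREEIFY_CAPACITY
                ? Action.RESIZE            // small table: spread entries out first
                : Action.TREEIFY;          // large table: convert the list to a tree
    }

    public static void main(String[] args) {
        System.out.println(decide(8, 16)); // prints RESIZE
        System.out.println(decide(8, 64)); // prints TREEIFY
        System.out.println(decide(3, 64)); // prints NONE
    }
}
```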
When Would You Actually Use a ReadWriteLock Wrapping a Standard HashMap over ConcurrentHashMap for a Cache?
=> You would only choose ReadWriteLock wrapping a standard HashMap in one specific architectural scenario: Massive, Atomic Bulk Mutations.
=> If your business logic requires you to periodically wipe the entire cache, refresh all keys simultaneously, or perform multiple dependent updates that must be isolated together, a ReadWriteLock is perfect.
// Example: Complete Cache Eviction/Refresh
// (internalMap is a plain HashMap; every read goes through the matching readLock)
public void refreshEntireCache(Map<K, V> newData) {
    writeLock.lock();
    try {
        internalMap.clear();         // Readers are blocked while the write lock is held,
        internalMap.putAll(newData); // so none can see the map in a half-empty state
    } finally {
        writeLock.unlock();
    }
}
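For completeness, the refresh method can be placed in a self-contained class. This is a sketch under the assumption that every read goes through the read lock; RefreshableCache is an illustrative name, and the field names mirror the snippet above.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

class RefreshableCache<K, V> {
    private final Map<K, V> internalMap = new HashMap<>();
    private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
    private final ReentrantReadWriteLock.ReadLock readLock = rwLock.readLock();
    private final ReentrantReadWriteLock.WriteLock writeLock = rwLock.writeLock();

    V get(K key) {
        readLock.lock();
        try {
            return internalMap.get(key);
        } finally {
            readLock.unlock();
        }
    }

    // Replaces the whole content; readers see either the old or the new data set.
    void refreshEntireCache(Map<K, V> newData) {
        writeLock.lock();
        try {
            internalMap.clear();
            internalMap.putAll(newData);
        } finally {
            writeLock.unlock();
        }
    }

    public static void main(String[] args) {
        RefreshableCache<String, Integer> cache = new RefreshableCache<>();
        Map<String, Integer> fresh = new HashMap<>();
        fresh.put("a", 1);
        cache.refreshEntireCache(fresh);
        System.out.println(cache.get("a")); // prints 1
    }
}
```

With ConcurrentHashMap, clear() followed by putAll() would be two separate steps, and a reader could observe the empty map in between.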
In which specific scenario, with an example, would I choose ReentrantLock over ConcurrentHashMap?
=> Use ReentrantLock / ReentrantReadWriteLock when you need to perform compound operations or multi-step logic that must be atomic across multiple data structures or multiple keys.
=> ConcurrentHashMap is excellent for single-key operations, but it does not provide atomicity for operations involving multiple keys or multiple maps.
Specific Scenarios Where ReentrantLock is Better
Here are the most practical scenarios (especially relevant for your Payment Wallet System):
1. Compound Actions / Multi-Key Operations (Most Common Reason)
Scenario: You need to update two or more related entries atomically.
Example in Payment Wallet:
=> Deduct the amount from the Balance Map
=> Add the same amount to the Transaction Log Map
=> Update the User's Last Transaction Timestamp
These must be all or nothing to maintain consistency.
// Using ReentrantLock - Correct & Safe
private final ReentrantLock lock = new ReentrantLock();
private final Map<String, BigDecimal> balanceMap = new HashMap<>();
private final Map<String, Transaction> txnLogMap = new HashMap<>();

public void processTransaction(String userId, BigDecimal amount) {
    lock.lock();
    try {
        // Treat an unknown user as a zero balance instead of failing with a NullPointerException
        BigDecimal current = balanceMap.getOrDefault(userId, BigDecimal.ZERO);
        if (current.compareTo(amount) < 0) {
            throw new InsufficientBalanceException();
        }
        balanceMap.put(userId, current.subtract(amount)); // Update 1
        txnLogMap.put(userId, new Transaction(...));      // Update 2
        updateLastActivity(userId);                       // Update 3
    } finally {
        lock.unlock();
    }
}
Why ConcurrentHashMap fails here?
Even if you use balanceMap and txnLogMap as ConcurrentHashMap, the three operations are not atomic. Another thread can see inconsistent state (e.g., balance deducted but transaction not logged).
2. Read-Heavy + Complex Write Logic
Use ReentrantReadWriteLock when:
=> Reads are very frequent (90%+)
=> Writes are infrequent but complex (need multiple steps or calculations)
Example :
private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
private final Map<String, UserPortfolio> portfolioMap = new HashMap<>();

public UserPortfolio getPortfolio(String userId) {
    rwLock.readLock().lock();
    try {
        return portfolioMap.get(userId);
    } finally {
        rwLock.readLock().unlock();
    }
}

public void rebalancePortfolio(String userId) { // Complex write
    rwLock.writeLock().lock();
    try {
        UserPortfolio p = portfolioMap.get(userId);
        // Multiple complex calculations + updates
        calculateNewAllocation(p);
        updateMultipleAssets(p);
        validateCompliance(p);
        portfolioMap.put(userId, p);
    } finally {
        rwLock.writeLock().unlock();
    }
}
=> ConcurrentHashMap cannot give you this read-write separation easily.
3. When You Need Custom Locking Strategy / Fairness
=> You want fair locking (FIFO order) to prevent thread starvation.
=> You need tryLock(timeout) or lockInterruptibly().
=> You need multiple Condition objects (different wait conditions on same lock).
4. When Protecting Multiple Data Structures Together
=> One lock protecting Map + List + Counter + Some Object.
=> Example: Updating cache + statistics + audit log together.
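The "multiple Condition objects" point from scenario 3 is easiest to see in a bounded buffer: producers wait on one condition, consumers on another, both tied to the same lock. This is the classic textbook pattern; BoundedBuffer and sumThroughBuffer are illustrative names.

```java
import java.util.ArrayDeque;
import java.util.Queue;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

class BoundedBuffer<T> {
    private final Queue<T> items = new ArrayDeque<>();
    private final int capacity;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notFull = lock.newCondition();  // producers wait here
    private final Condition notEmpty = lock.newCondition(); // consumers wait here

    BoundedBuffer(int capacity) {
        this.capacity = capacity;
    }

    void put(T item) throws InterruptedException {
        lock.lock();
        try {
            while (items.size() == capacity) {
                notFull.await();
            }
            items.add(item);
            notEmpty.signal(); // wake one waiting consumer only
        } finally {
            lock.unlock();
        }
    }

    T take() throws InterruptedException {
        lock.lock();
        try {
            while (items.isEmpty()) {
                notEmpty.await();
            }
            T item = items.remove();
            notFull.signal(); // wake one waiting producer only
            return item;
        } finally {
            lock.unlock();
        }
    }

    // Demo: a producer thread pushes 0..count-1 through a buffer of size 2.
    static int sumThroughBuffer(int count) {
        BoundedBuffer<Integer> buffer = new BoundedBuffer<>(2);
        Thread producer = new Thread(() -> {
            try {
                for (int i = 0; i < count; i++) {
                    buffer.put(i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();
        int sum = 0;
        try {
            for (int i = 0; i < count; i++) {
                sum += buffer.take();
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while consuming", e);
        }
        return sum;
    }
}
```

With synchronized there is only one wait set per monitor, so producers and consumers would have to share notifyAll() and wake each other unnecessarily.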
ReentrantLock and ReentrantReadWriteLock
=> They are not the same.
1. ReentrantLock
=> It is a single exclusive lock (just like synchronized, but more powerful).
=> Only one thread can hold the lock at any time (whether reading or writing).
Use Case:
=> When you need to protect a critical section where only one thread should execute at a time.
private final ReentrantLock lock = new ReentrantLock();

public void updateBalance(String userId, BigDecimal amount) {
    lock.lock(); // Only ONE thread can enter
    try {
        // complex logic: check balance, deduct, log transaction, update stats
    } finally {
        lock.unlock();
    }
}
2. ReentrantReadWriteLock
=> It is a pair of locks in one object
=> ReadLock → Shared lock (Multiple threads can hold it simultaneously)
=> WriteLock → Exclusive lock (Only one thread can hold it)
=> Designed specifically for read-heavy workloads.
private final Map<String, BigDecimal> cache = new HashMap<>(); // guarded by rwLock
private final ReentrantReadWriteLock rwLock = new ReentrantReadWriteLock();
private final ReentrantReadWriteLock.ReadLock readLock = rwLock.readLock();
private final ReentrantReadWriteLock.WriteLock writeLock = rwLock.writeLock();

public BigDecimal getBalance(String userId) {
    readLock.lock(); // Many threads can read at the same time
    try {
        return cache.get(userId);
    } finally {
        readLock.unlock();
    }
}

public void updateBalance(String userId, BigDecimal amount) {
    writeLock.lock(); // Only ONE thread can write + blocks all readers
    try {
        // update cache
    } finally {
        writeLock.unlock();
    }
}
| Feature | ReentrantLock | ReentrantReadWriteLock |
|---|---|---|
| Number of Locks | 1 (Exclusive) | 2 (Read + Write) |
| Multiple Readers Allowed? | No | Yes |
| Reader + Writer at same time? | No | No (Writer blocks readers) |
| Best For | General mutual exclusion | Read-heavy workloads |
| Performance | Good | Better when reads >> writes |
| Fairness Support | Yes | Yes |
| Multiple Conditions | Yes (newCondition()) | Yes (on write lock) |
| Complexity | Simpler | Slightly more complex |
When to Choose Which? (Your Payment Wallet Context)
- Use ReentrantLock:
=> When operations are mostly writes or complex compound actions.
=> When you need strong consistency across multiple steps.
- Use ReentrantReadWriteLock:
=> When you have high read traffic (e.g., checking wallet balance, transaction history, user profile) and occasional writes (e.g., after payment).
=> This lets reads run in parallel, which usually gives the best throughput when reads far outnumber writes.
Real Example for You: In a Payment Wallet:
- getBalance(), getTransactionHistory() → should use ReadLock (thousands of users checking at once).
- deductAmount(), addMoney() → should use WriteLock.
Reentrant vs synchronized
| Feature | synchronized | ReentrantLock | Winner |
|---|---|---|---|
| Type | Language keyword | Java Class (explicit) | - |
| Lock Acquisition | Implicit (automatic) | Explicit (lock() + manual unlock()) | ReentrantLock |
| Unlocking | Automatic (when block ends) | Manual (must call unlock() in finally) | synchronized |
| Reentrancy | Yes (same thread can re-acquire) | Yes (same thread can re-acquire) | Tie |
| Fairness | No (unfair) | Yes (new ReentrantLock(true)) | ReentrantLock |
| Try Lock with Timeout | Not possible | tryLock(long time, TimeUnit unit) | ReentrantLock |
| Interruptible Lock | Not possible | lockInterruptibly() | ReentrantLock |
| Multiple Conditions | Only one implicit condition (wait/notify) | Multiple Condition objects | ReentrantLock |
| Performance | Comparable on modern JVMs, sometimes slightly better in low contention | Comparable, with more features | synchronized (low contention, marginally) |
| Exception Handling | Automatic release on exception | Must handle in finally block | synchronized |
| Monitoring / Debugging | Limited | Better (methods like isLocked(), getQueueLength(), getHoldCount()) | ReentrantLock |
When to Prefer ReentrantLock (Advanced / Production Cases)
Use ReentrantLock when you need:
- Fairness (FIFO ordering to prevent starvation).
- Timed lock acquisition (tryLock(timeout, unit)).
- Interruptible locking.
- Multiple wait conditions (like different producer-consumer conditions).
- Better debugging in complex systems.
Real Example (Payment Wallet):
private final ReentrantLock walletLock = new ReentrantLock(true); // fair lock

// tryLock(timeout, unit) can be interrupted while waiting, so the exception is declared
public void transfer(String fromUser, String toUser, BigDecimal amount) throws InterruptedException {
    if (walletLock.tryLock(2, TimeUnit.SECONDS)) { // Timeout
        try {
            // Check balance, deduct, credit, log transaction
        } finally {
            walletLock.unlock();
        }
    } else {
        throw new LockAcquisitionTimeoutException();
    }
}
=> NOTE: Always call unlock() in a finally block with ReentrantLock; otherwise an exception can leave the lock held forever, and every other thread waiting on it will block indefinitely.
Have you ever faced a situation where synchronized was not enough?
Yes. In one flow, we needed a fair lock to prevent starvation of certain payment processing threads, and we also needed a timeout while acquiring the lock. synchronized could not provide that, so we used ReentrantLock.
Explain Class Loaders
=> Class Loaders are a fundamental part of the Java Virtual Machine (JVM), responsible for dynamically loading Java classes into memory at runtime.
=> Java uses a parent-first delegation model.
=> The JVM loads classes lazily, on first use, which conserves memory and shortens application startup.
1. The Three Subsystem Phases (Class Loading Phases)
When the JVM encounters a class reference, the ClassLoader subsystem processes it through three structural phases:
Loading: Locates the binary bytecode (.class file) matching the fully qualified class name, stores the class metadata in Metaspace, and creates the corresponding java.lang.Class object on the heap.
Linking: Prepares the class for execution. This involves:
=> Verification (ensuring bytecode complies with JVM safety constraints)
=> Preparation (allocating memory for static fields and initializing them to default values)
=> Resolution (converting symbolic references in the constant pool into direct references)
Initialization: Executes static initializers and assigns the actual configured values to static variables.
2. The Built-in Hierarchy (Types of Class Loaders - BPA)
Java utilizes a strict, built-in hierarchy of ClassLoaders, where each loader has a dedicated location from which it pulls bytecode
| ClassLoader | Source Location | Description |
|---|---|---|
| Bootstrap ClassLoader | rt.jar (Java 8 and earlier); core modules such as java.base (Java 9+) | The foundational loader, implemented in native code inside the JVM. It loads core Java API classes (like java.lang.*, java.util.*). Calling .getClassLoader() on these returns null. |
| Platform ClassLoader (formerly Extension) | lib/ext (Java 8 and earlier); selected Java SE and JDK modules (Java 9+) | Loads platform classes that are not handled by the Bootstrap loader (for example java.sql). |
| Application ClassLoader (System) | Application -classpath / -cp | Loads the classes from your application's classpath, including your own compiled source code and third-party Maven/Gradle dependencies. |
3. Delegation Model (Most Important Concept): Three Delegation Principles
A. Delegation Principle
=> When a ClassLoader receives a request to load a class, it delegates the request upward to its parent ClassLoader first, before attempting to locate the file itself.
=> Example: If your application needs java.lang.String, the Application ClassLoader passes the request up to the Platform, which passes it to the Bootstrap loader. The Bootstrap loader finds it in the core runtime libraries and loads it. Your application cannot accidentally override core Java classes.
B. Visibility Principle
=> A child ClassLoader can see classes loaded by its parent ClassLoader, but a parent ClassLoader cannot see classes loaded by its child.
=> Example: Classes loaded by the Bootstrap loader are visible to your application code, but the Bootstrap loader cannot see or reference custom DTOs sitting in your application classpath.
C. Uniqueness Principle
=> A class loaded by a parent ClassLoader will never be re-loaded by a child ClassLoader. This guarantees that a single, globally unambiguous definition of a class exists across the runtime context.
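The hierarchy and the delegation direction can be inspected at runtime by walking getParent() from any class's loader. A minimal sketch, with LoaderChain and chainOf as illustrative names; the exact loader class names printed depend on the JDK version and on how the application was launched.

```java
import java.util.ArrayList;
import java.util.List;

class LoaderChain {
    // Lists the loaders from the one that defined 'type' up to the bootstrap loader.
    static List<String> chainOf(Class<?> type) {
        List<String> chain = new ArrayList<>();
        ClassLoader loader = type.getClassLoader();
        while (loader != null) {
            chain.add(loader.getClass().getName());
            loader = loader.getParent();
        }
        chain.add("bootstrap (represented as null)");
        return chain;
    }

    public static void main(String[] args) {
        System.out.println(chainOf(LoaderChain.class)); // application → platform → bootstrap
        System.out.println(chainOf(String.class));      // bootstrap only
    }
}
```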
4. Production Reality & Custom ClassLoaders
Why would you bypass or extend the default Class Loader delegation model? When do you create Custom Class Loaders?
=> At a senior level, I rarely create custom class loaders because the default delegation model is well-designed for most enterprise applications.
=> However, there are specific scenarios where extending or bypassing the default behavior becomes necessary. Let me explain the why and when.
=> We can extend ClassLoader and override findClass() for dynamic loading (plugins, secure environments, etc.).
Key Scenarios Where I Would Create Custom Class Loaders
Here are the real-world cases I consider:
- Plugin Architecture / Extensible Systems
Example: Payment Gateway plugins in a Wallet system.
Different vendors provide JARs with same class names but different implementations.
Solution: Create a separate URLClassLoader for each plugin so they don’t interfere with each other.
- Multiple Versions of Same Library (Dependency Hell)
When two modules need different versions of the same library (e.g., one uses Jackson 2.12, another needs 2.15).
Default class loader would cause conflicts. A custom class loader per module solves this.
- Hot Deployment / Dynamic Code Loading
In low-downtime systems, load new versions of classes without restarting the JVM.
Used in some trading/fintech platforms.
- Security / Sandboxing
Run untrusted code (user-uploaded scripts or third-party extensions) in a restricted environment.
Custom class loader can filter or instrument bytecode.
- Loading Classes from Non-Standard Sources
Load classes from Database, Network, S3, or encrypted files.
Example: Loading customer-specific business rules stored in DB.
- OSGi-like Modular Systems
Although rare now (Kubernetes + microservices replaced many), still relevant in some legacy large systems.
import java.net.URL;
import java.net.URLClassLoader;

public class CustomPluginClassLoader extends URLClassLoader {
    public CustomPluginClassLoader(URL[] urls, ClassLoader parent) {
        super(urls, parent);
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        synchronized (getClassLoadingLock(name)) { // Same lock the default implementation uses
            // Child-first delegation: reverses the default parent-first order
            Class<?> loaded = findLoadedClass(name);
            if (loaded == null) {
                try {
                    loaded = findClass(name); // Try this loader's own URLs first
                } catch (ClassNotFoundException e) {
                    return super.loadClass(name, resolve); // Then fall back to the parent chain
                }
            }
            if (resolve) {
                resolveClass(loaded);
            }
            return loaded;
        }
    }
    // Override findClass() for custom byte loading if needed
}
Risks & Considerations (Senior Thinking)
=> Memory Leaks: Old class loaders must be properly GC’ed (very common production issue).
=> Class Cast Exceptions: The same class loaded by two different loaders is treated as two different types.
=> Debugging Complexity: Stack traces and monitoring become harder.
=> Performance Overhead.
How Class Loaders are used in Spring Boot (Very Important for your level)
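=> When packaged as a WAR and deployed to an external Tomcat, the application is loaded by Tomcat’s per-webapp class loader (WebappClassLoader or equivalent); with an embedded server, Spring Boot configures the equivalent loader itself.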
=> Spring Boot Loader (in fat JARs) uses a special class loader to load classes from nested JARs.
=> DevTools uses a custom RestartClassLoader for fast reload during development.
=> In microservices, different services can load different versions of the same library thanks to class loader isolation.
ClassNotFoundException vs NoClassDefFoundError
1. ClassNotFoundException
=> Type: Checked Exception (extends ReflectiveOperationException → Exception)
=> When it occurs: When an application tries to load a class dynamically at runtime and the class is not found.
Common Causes:
=> Class.forName("com.example.SomeClass")
=> ClassLoader.loadClass()
=> Using reflection
=> Missing dependency in a plugin or dynamically loaded JAR
public void loadClassDynamically() {
    try {
        Class<?> clazz = Class.forName("com.example.SomeClass"); // Compiler forces handling
    } catch (ClassNotFoundException e) {
        // This block executes only at runtime if class is missing
        // Handle gracefully - maybe log and disable feature
        e.printStackTrace();
    }
}
=> Key Point: This is recoverable in most cases. You can catch it and handle it (e.g., fallback logic).
2. NoClassDefFoundError
=> Type: Error (subclass of LinkageError → Error)
=> When it occurs: The class was available during compile time, but is missing at runtime.
Common Causes:
=> Missing JAR in production classpath (while it was present during compilation).
=> Version conflict / transitive dependency issues.
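=> Static initializer (static {} block) throws an exception: the first use of the class fails with ExceptionInInitializerError, and every later use fails with NoClassDefFoundError ("Could not initialize class").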
=> Class was removed after compilation (e.g., shading issues, partial deployment).
Example :
// Compile time: OK
SomeClass obj = new SomeClass();
// Runtime: NoClassDefFoundError if SomeClass is missing
| Feature | ClassNotFoundException | NoClassDefFoundError |
|---|---|---|
| Category | Checked Exception (Thrown at Runtime) | Error (Unrecoverable) (Thrown at Runtime) |
| Occurs When | Dynamic loading fails (forName, reflection) | Class was present at compile-time, missing at runtime |
| Recoverable? | Yes (can be caught & handled) | Generally No (indicates serious deployment issue) |
| Common Scenario | Plugin systems, custom class loaders | Missing JARs, classpath issues in production |
| Exception vs Error | Exception (programmer can handle) | Error (usually JVM-level problem) |
| Root Cause | Class not found by class loader | Class definition missing, or class failed to link/initialize, at runtime |
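To make the difference concrete, here is a minimal sketch using only the JDK. The class names (ClassLoadingFailureDemo, Broken) and com.example.DoesNotExist are made up for illustration. It shows a dynamic lookup that fails with ClassNotFoundException, and a class whose static initializer fails, which surfaces as an Error on every use.

```java
class ClassLoadingFailureDemo {

    // Hypothetical class whose static initialization always fails.
    static class Broken {
        // Not a compile-time constant, so reading it triggers class initialization.
        static final int VALUE = compute();

        private static int compute() {
            throw new IllegalStateException("static init failed");
        }
    }

    // Returns the simple name of whatever is thrown when Broken is used.
    static String useBroken() {
        try {
            return "value=" + Broken.VALUE;
        } catch (Throwable t) {
            return t.getClass().getSimpleName();
        }
    }

    // Dynamic lookup of a class that is not on the classpath.
    static String loadMissing() {
        try {
            Class.forName("com.example.DoesNotExist");
            return "loaded";
        } catch (ClassNotFoundException e) {
            return "ClassNotFoundException";
        }
    }

    public static void main(String[] args) {
        System.out.println(loadMissing());
        System.out.println(useBroken()); // first use of Broken
        System.out.println(useBroken()); // any later use of Broken
    }
}
```

The first use of Broken reports ExceptionInInitializerError; later uses report NoClassDefFoundError, which is why the original root cause is often found much earlier in the logs than the error you are looking at.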
Most Common Causes in My Project (Payment Wallet System) for NoClassDefFoundError
- Missing Transitive Dependencies in Docker (Very Common)
- Multi-module Maven dependency not properly resolved
- Version conflict between Spring Boot / Spring Cloud dependencies
- mvn clean package not done properly before Docker build
- In Docker, only target/*.jar is copied but some dependencies are not included
Virtual Threads (Java 19/21): How do Virtual Threads differ from platform threads? How do they improve throughput for I/O-bound Spring Boot applications?
=> Virtual Threads (Java 21 – Project Loom) are a game-changing feature, especially for I/O-bound applications like Spring Boot microservices.
Core Difference: Platform Threads vs Virtual Threads
Platform Threads (Traditional):
=> These are the classic OS-managed threads we have been using since Java 1.
=> Each platform thread maps 1:1 with an OS kernel thread.
=> Creating a platform thread is expensive (memory + OS call).
Virtual Threads (Lightweight Threads):
=> Introduced in Java 21 (preview in 19/20).
=> They are managed by the JVM, not the OS.
=> Extremely cheap to create — you can create millions of virtual threads with very low overhead.
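=> They follow M:N mapping: many (M) virtual threads are multiplexed onto a small pool of (N) carrier threads, which are ordinary platform threads.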
| Aspect | Platform Threads | Virtual Threads |
|---|---|---|
| Managed By | Operating System | JVM |
| Memory Footprint | High (~1–2 MB per thread) | Very Low (few KB) |
| Creation Cost | Expensive | Very Cheap |
| Thread Limit | Few thousand (practical limit) | Millions possible |
| Blocking Behavior | Blocks OS thread | Does not block carrier thread |
| Use Case | CPU-bound + I/O-bound | Best for I/O-bound |
Platform Thread: [Java Thread] ── 1:1 ──> [OS Kernel Thread] (Heavy)
Virtual Thread: [VThread1]
[VThread2] ── M:N ──> [Carrier Thread] ──> [OS Kernel Thread]
[VThread3]
-----------------------------------------------------------------------------------------------------------------------
Virtual Threads → M:N Mapping
[ Application Code ]
↓
┌──────────────────────────────┐
│ Multiple Virtual Threads │ ← Thousands / Millions (JVM Managed)
│ (Very Lightweight) │
└──────────────┬───────────────┘
│ M : N
▼
┌──────────────────────────────┐
│ Few Carrier Threads │ ← Platform Threads (Limited)
│ (ForkJoinPool) │
└──────────────┬───────────────┘
│ 1 : 1
▼
┌──────────────────────────────┐
│ OS Kernel Threads │ ← Actual OS Threads
└──────────────────────────────┘
-----------------------------------------------------------------------------------------------------------------------
How Virtual Threads Improve Throughput in I/O-Bound Apps
The real magic happens during blocking I/O (DB calls, Redis, External APIs, Kafka, etc.):
How Platform Threads Handle I/O Blocking:
- The OS Kernel Thread enters BLOCKED / WAITING state.
- It cannot execute any code until the I/O operation completes.
- The OS Scheduler removes the blocked thread from the CPU and loads a different ready thread onto the same CPU core.
- This involves saving the state (registers, program counter, etc.) of the old thread and loading the new one → this is costly in terms of CPU cycles.
How Virtual Threads Handle I/O Blocking:
When a Virtual Thread performs a blocking I/O operation (e.g., waiting for DB response, Redis, external API call, etc.):
- The JVM detects the blocking call.
- It unmounts the Virtual Thread from its current Carrier Thread.
- The Virtual Thread’s execution state (stack frames) is parked in heap memory.
- The Carrier Thread (which is a Platform Thread) becomes free immediately.
- The Carrier Thread then picks up and runs another Virtual Thread from the queue.
- When the I/O response arrives (network packet or DB result), the JVM remounts the original Virtual Thread onto an available Carrier Thread and resumes execution exactly from where it left off.
Key Advantage:
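- The Carrier Thread is normally not blocked during I/O operations. The main exception is pinning: for example, blocking inside a synchronized block on JDK 21 to 23, or inside native code, keeps the carrier occupied.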
- No expensive OS-level context switches are needed for every blocking call.
- This allows a small number of Carrier Threads to efficiently handle a very large number of Virtual Threads.
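A minimal sketch of the idea, assuming JDK 21 or later. The class name VirtualThreadDemo and the 50 ms sleep that stands in for a blocking I/O call are illustrative only. Each task gets its own virtual thread; while a task sleeps, its carrier thread is free to run other tasks.

```java
import java.time.Duration;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class VirtualThreadDemo {

    // Runs taskCount "blocking" tasks, one virtual thread per task, and returns how many completed.
    static int runBlockingTasks(int taskCount) {
        AtomicInteger completed = new AtomicInteger();
        // ExecutorService is AutoCloseable since JDK 19; close() waits for submitted tasks to finish.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < taskCount; i++) {
                executor.submit(() -> {
                    Thread.sleep(Duration.ofMillis(50)); // stand-in for a DB or HTTP call
                    completed.incrementAndGet();
                    return null; // returning a value makes this a Callable, so sleep() may throw
                });
            }
        }
        return completed.get();
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        int done = runBlockingTasks(10_000);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(done + " tasks finished in about " + elapsedMs + " ms");
    }
}
```

With a fixed platform-thread pool of, say, 50 threads, the same 10,000 sleeping tasks would be processed 50 at a time; here they all wait concurrently.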
In my TCS project, the most performance-critical part was the nightly Spring Batch job in the fiber-logic-service. This job processed large volumes of site audit data, performed complex route calculations, checked inventory/port availability, and assigned fiber ports. Initially, it was taking more than 2 hours, and sometimes failing under load.
We had already optimized it significantly by:
- Using Redis caching for master data
- Reducing database roundtrips
- Adding parallel processing using CompletableFuture in safe, independent steps of the batch processing
While working on this optimization, I realized that a major bottleneck was the traditional platform thread pool used by Spring Batch and @Async methods. Since the job involved a lot of I/O operations (calling other microservices via Kafka, querying Oracle DB, Redis calls, etc.), many threads were getting blocked waiting for responses.
Improvement in I/O-bound applications: Virtual Threads dramatically improve throughput because we can create thousands (or even millions) of them cheaply. In the context of my batch job, instead of tuning a limited thread pool and worrying about thread exhaustion, we could process many more audit records concurrently with much lower memory usage and simpler code.
Although the production codebase was on Java 11 + Spring Boot 2.7, this experience with Virtual Threads has prepared me well for modern Java 21+ projects. I’m confident it would give significant gains in high I/O scenarios like payment processing or batch reconciliation systems.
| Term | What it is | Managed By | Backed by OS Thread? |
|---|---|---|---|
| Virtual Thread | Lightweight user thread | JVM | No |
| Carrier Thread | A Platform Thread that "carries" virtual threads | JVM | Yes |
| Platform Thread | Traditional Java thread (Thread) | JVM + OS | Yes |
| OS Kernel Thread | Actual OS-level thread | Operating System | - |
CompletableFuture: How would you orchestrate three asynchronous microservice calls where the third call depends on the combined results of the first two?
In my Verizon Site Audit and Transport Fiber Logic System project at TCS, I used CompletableFuture extensively while optimizing the nightly Spring Batch job in the fiber-logic-service.
A typical scenario was during bulk fiber port assignment and route calculation. For every site audit record, I needed to:
- Call Service A: Fetch site master data + feasibility details
- Call Service B: Check real-time inventory/port availability
- Call Service C: Perform final route calculation + port assignment (this depended on results of A & B)
Here’s how I orchestrated it using CompletableFuture:
public void processSiteAudit(SiteAuditRecord record) {
    // 1. Run Service A and Service B in parallel
    CompletableFuture<FeasibilityData> feasibilityFuture = CompletableFuture.supplyAsync(
            () -> siteAuditService.getFeasibilityData(record.getSiteId()), executor);
    CompletableFuture<InventoryData> inventoryFuture = CompletableFuture.supplyAsync(
            () -> inventoryService.checkPortAvailability(record.getSiteId()), executor);

    // 2. Wait for both parallel calls and then call Service C
    CompletableFuture.allOf(feasibilityFuture, inventoryFuture)
            .thenApply(v -> {
                FeasibilityData feasibility = feasibilityFuture.join();
                InventoryData inventory = inventoryFuture.join();
                // Business validations
                validateFeasibilityAndInventory(feasibility, inventory);
                return new CombinedContext(feasibility, inventory);
            })
            .thenAcceptAsync(combinedContext -> {
                // Service C - Depends on both A & B
                routeCalculationService.performRouteAndPortAssignment(record, combinedContext);
            }, executor)
            // Enforce strict production SLA timeouts
            .orTimeout(3, TimeUnit.SECONDS)
            .exceptionally(ex -> {
                log.error("Failed processing site {}", record.getSiteId(), ex);
                // Retry logic or compensation
                return null;
            });
}
Why this design?
- supplyAsync() + custom executor → Controlled parallelism.
- allOf() → Efficiently waits for multiple independent calls.
- thenApply() / thenAcceptAsync() → Clean chaining for dependent steps.
- Proper exception handling.
This approach helped me reduce the overall batch processing time significantly (from 2+ hours to 35-45 minutes) by safely parallelizing independent I/O calls.
In modern Spring Boot 3.x projects, I would further enhance this using Virtual Threads for even better throughput.
For exactly two futures, thenCombine() is the most direct option. allOf() + thenApply() is the pattern that generalizes to three or more futures, which is why I showed it above.
| Method | Use Case | Best When |
|---|---|---|
| thenCombine() | Combine exactly two futures | Only 2 async calls |
| allOf() + thenApply() | Combine 2 or more futures | 3+ async calls (most real cases) |
CompletableFuture<FeasibilityData> feasibilityFuture = CompletableFuture.supplyAsync(...);
CompletableFuture<InventoryData> inventoryFuture = CompletableFuture.supplyAsync(...);
// Using thenCombine for two parallel calls
feasibilityFuture.thenCombine(inventoryFuture, (feasibility, inventory) -> {
// Combine logic
validate(...);
return new CombinedContext(feasibility, inventory);
})
.thenAcceptAsync(combined -> {
// Third dependent call
routeCalculationService.performRouteAndPortAssignment(record, combined);
}, executor);
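The fragments above depend on project classes, so here is a self-contained sketch of the same shape that runs on a plain JDK (9+ for orTimeout). The three static methods are hypothetical stand-ins for Service A, B, and C, and plain strings replace the real DTOs.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

class OrchestrationDemo {

    // Hypothetical stand-ins for the three remote calls.
    static String fetchFeasibility(String siteId) { return "feasible:" + siteId; }
    static String checkInventory(String siteId)   { return "ports:" + siteId; }
    static String assignRoute(String feasibility, String inventory) {
        return "route[" + feasibility + "," + inventory + "]";
    }

    static String process(String siteId, Executor executor) {
        // A and B start immediately and run in parallel on the supplied executor.
        CompletableFuture<String> feasibility =
                CompletableFuture.supplyAsync(() -> fetchFeasibility(siteId), executor);
        CompletableFuture<String> inventory =
                CompletableFuture.supplyAsync(() -> checkInventory(siteId), executor);

        return feasibility
                // Runs only after both A and B have completed.
                .thenCombine(inventory, (f, i) -> new String[] {f, i})
                // C depends on the combined result.
                .thenApplyAsync(pair -> assignRoute(pair[0], pair[1]), executor)
                .orTimeout(3, TimeUnit.SECONDS)
                .exceptionally(ex -> "failed: " + ex)
                .join(); // blocking here only to keep the demo simple
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            System.out.println(process("S1", pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

In a real service I would return the CompletableFuture instead of calling join(), so the calling thread is not held while the chain runs.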
Diagnosing High Latency Caused by Long Stop-the-World (GC) Pauses
If I encounter high latency due to long Stop-the-World pauses, I follow a systematic approach.
My Diagnosis Steps:
- Enable Detailed GC Logging First
I immediately add this JVM flag and restart the application (or enable it dynamically if possible):
-Xlog:gc*:file=gc.log:time,level,tags
This gives me complete visibility into every GC event.
- Connect JVisualVM and Monitor Live
I connect JVisualVM to the running JVM and do the following:
Go to the Monitor tab → Check overall Heap usage and Metaspace.
Go to the Visual GC tab (a plugin) → Check Old Gen occupancy and the frequency and duration of GC activity (especially Full GCs).
Go to the Threads tab → Check if many threads are in BLOCKED or WAITING state during pauses.
Take a Heap Dump and a Thread Dump while the pause is happening.
- Analyze the GC Behavior in JVisualVM
I look for these specific patterns:
Very high Promotion Rate from Young Gen to Old Gen.
Frequent Full GCs with long pause times.
Sudden spikes in Old Generation usage.
Long Remark or Cleanup phases in G1GC.
- Take Heap Dump and Analyze
While still in JVisualVM, I take a heap dump and open it in Eclipse MAT to identify memory leaks or objects retaining large amounts of memory.
- Correlate with Application Logs
I check application logs at the exact timestamps of long pauses to identify which operation (batch job, high-traffic API endpoint, etc.) triggered the issue.
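Alongside GC logs, the JVM exposes basic collector counters through the standard management API. The following is a minimal sketch (the class name GcStatsDemo is made up) that prints, per collector, how many collections ran and the accumulated collection time. Note that for concurrent collectors this time is not the same as pause time, so I treat it as a rough signal and rely on the GC log for exact pauses.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

class GcStatsDemo {

    // One line per registered collector, e.g. the young and old collectors of G1.
    static List<String> snapshot() {
        List<String> lines = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add(gc.getName()
                    + ": collections=" + gc.getCollectionCount()
                    + ", accumulatedTimeMs=" + gc.getCollectionTime());
        }
        return lines;
    }

    public static void main(String[] args) {
        snapshot().forEach(System.out::println);
    }
}
```

Sampling this periodically (or exporting it through a metrics library) makes a sudden jump in collection count or time easy to line up with application log timestamps.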
Production Safety Practice
I always configure the JVM in production with:
-XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/app/dumps/
This way, if there's a serious issue (like OutOfMemoryError), a .hprof file is automatically generated. I then analyze it offline using JVisualVM or Eclipse MAT. This approach minimizes risk to the live system.
Difference between G1GC and ZGC
| Aspect | G1GC | ZGC |
|---|---|---|
| Pause Time Goal | Targets < 200ms | Targets < 10ms |
| STW Pauses | Yes (Young/Mixed collections, Remark, Full GC) | Very short (most work is concurrent) |
| Best For | Balanced workloads | Ultra low-latency applications |
In my TCS Verizon project, the nightly batch job sometimes faced long pauses. I used the above approach: GC logs plus offline heap dump analysis using JVisualVM.
If I were working on a real-time system (like payment processing), I would prefer ZGC for near-zero pause times.
Concurrent Collections : How does ConcurrentHashMap achieve thread safety without locking the entire map? Compare its internal mechanics to Hashtable.
ConcurrentHashMap is one of the most important concurrent collections in Java. It achieves high thread safety and scalability by avoiding locking the entire map, unlike the older Hashtable.
How ConcurrentHashMap Achieves Thread Safety (Java 8+)
Instead of using a single global lock, ConcurrentHashMap uses a fine-grained locking + CAS strategy:
- The internal table is an array of buckets (bins).
- Reads (get) take no lock; they rely on volatile reads of the table and nodes.
- Inserting into an empty bucket uses CAS (Compare-And-Swap), a lock-free atomic operation.
- When the bucket already has entries (hash collision), it locks only the head node of that specific bucket using synchronized on the Node.
- If a bucket’s linked list grows beyond 8 nodes (and table size ≥ 64), it converts to a Red-Black Tree for O(log n) performance. This treeification also happens under the bucket-level lock.
This design allows multiple threads to write to different buckets simultaneously, giving excellent concurrency.
Key methods like computeIfAbsent(), merge(), and putIfAbsent() are atomic per key.
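A short sketch of what per-key atomicity buys. The class name ConcurrentCounterDemo and the key "site-1" are illustrative. Several tasks increment the same counter through merge(); no update is lost even though there is no explicit lock in the calling code.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

class ConcurrentCounterDemo {

    // Increments one shared counter from several concurrent tasks and returns the final value.
    static int countConcurrently(int tasks, int incrementsPerTask) {
        ConcurrentHashMap<String, Integer> counts = new ConcurrentHashMap<>();
        CompletableFuture<?>[] futures = new CompletableFuture<?>[tasks];
        for (int t = 0; t < tasks; t++) {
            futures[t] = CompletableFuture.runAsync(() -> {
                for (int i = 0; i < incrementsPerTask; i++) {
                    // merge() performs the read-modify-write atomically for this key.
                    counts.merge("site-1", 1, Integer::sum);
                }
            });
        }
        CompletableFuture.allOf(futures).join();
        return counts.get("site-1");
    }

    public static void main(String[] args) {
        System.out.println(countConcurrently(8, 10_000));
    }
}
```

The same loop written as counts.put(key, counts.get(key) + 1) would lose updates, because get and put are two separate operations.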
| Aspect | Hashtable | ConcurrentHashMap (Java 8+) |
|---|---|---|
| Locking Strategy | Locks the entire map for every operation | Fine-grained (bucket-level) + CAS |
| Concurrency | Very Poor (only one thread at a time) | Very High (multiple threads on different buckets) |
| Performance | Slow under multi-threaded access | Excellent scalability |
| Internal Structure | Single lock on the whole object | Array of buckets with CAS + synchronized on node |
| Null Keys/Values | Does not allow null keys/values | Does not allow null keys/values |
| Iterator Behavior | Fail-fast | Weakly consistent (does not throw ConcurrentModificationException) |
| Use Case | Only for very simple, low-concurrency cases | Default choice for high-throughput systems |
Hashtable is essentially a synchronized HashMap — it uses the synchronized keyword on almost every method, which becomes a major bottleneck.
Real Project Example (TCS Verizon Project)
In the fiber-logic-service at TCS, we used ConcurrentHashMap heavily for caching master data (site inventory, port availability) that was accessed by the nightly Spring Batch job and multiple API threads.
Using Hashtable would have created severe contention during peak batch processing. With ConcurrentHashMap, multiple threads could read and update different site entries concurrently without blocking the entire cache. This was one of the key reasons we could reduce batch processing time from 2+ hours to 35-45 minutes.
=> In modern applications, I always prefer ConcurrentHashMap over any synchronized map for better scalability. I only use ReentrantReadWriteLock when I need explicit read-write separation across multiple data structures.
NOTE :
ConcurrentHashMap does NOT allow null keys or null values.
| Operation | HashMap | ConcurrentHashMap | Reason |
|---|---|---|---|
| put(null, value) | Allowed | Throws NullPointerException | Null keys not allowed |
| put(key, null) | Allowed | Throws NullPointerException | Null values not allowed |
| get(null) | Allowed | Throws NullPointerException | Null keys not allowed |
| containsKey(null) | Allowed | Throws NullPointerException | Null keys not allowed |
Unlike HashMap, ConcurrentHashMap does not permit null keys or null values. This design choice avoids ambiguity in concurrent scenarios. If I need to store null-like values, I usually use a sentinel object (e.g., a special EMPTY constant).
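A minimal sketch of the sentinel approach, with made-up names (NullSentinelDemo, EMPTY). The sentinel lets the cache distinguish "we looked this up and there is no value" from "this key was never cached", which a plain null cannot express in a ConcurrentHashMap.

```java
import java.util.concurrent.ConcurrentHashMap;

class NullSentinelDemo {

    // Marker stored in place of null; compared by identity only.
    private static final Object EMPTY = new Object();

    private final ConcurrentHashMap<String, Object> cache = new ConcurrentHashMap<>();

    void put(String key, String valueOrNull) {
        cache.put(key, valueOrNull == null ? EMPTY : valueOrNull);
    }

    boolean isCached(String key) {
        return cache.containsKey(key);
    }

    // Returns null when the key is absent or when the cached value is the EMPTY marker.
    String get(String key) {
        Object value = cache.get(key);
        return (value == null || value == EMPTY) ? null : (String) value;
    }

    // Shows that ConcurrentHashMap itself rejects a null value.
    static boolean rejectsNullValue() {
        try {
            new ConcurrentHashMap<String, String>().put("k", null);
            return false;
        } catch (NullPointerException expected) {
            return true;
        }
    }

    public static void main(String[] args) {
        NullSentinelDemo demo = new NullSentinelDemo();
        demo.put("site-1", null);
        System.out.println(demo.isCached("site-1") + " " + demo.get("site-1"));
        System.out.println(rejectsNullValue());
    }
}
```

Storing Optional<String> as the value type is a common alternative when a typed API is preferred over an Object marker.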
Spring Boot Internals: Explain the lifecycle of a Spring Bean. How does @SpringBootApplication use auto-configuration underneath?
1. Spring Bean Lifecycle in Spring Boot (Easy to Remember)
In Spring Boot, the bean lifecycle follows these main phases:
- Instantiation Spring creates the bean object using the constructor.
- Dependency Injection Spring injects all dependencies (Constructor Injection is preferred in modern Spring Boot).
- Aware Interfaces (Optional) If the bean implements any *Aware interfaces (like ApplicationContextAware, BeanNameAware), Spring calls them.
- Pre-Initialization (BeanPostProcessor) Spring calls all BeanPostProcessor beans → postProcessBeforeInitialization().
- Initialization
Spring executes initialization logic in this order:
- @PostConstruct annotated method
- afterPropertiesSet() (if implements InitializingBean)
- Custom initMethod (if defined in @Bean)
- Post-Initialization (BeanPostProcessor) Spring calls postProcessAfterInitialization() on BeanPostProcessors (commonly used by AOP, proxies, etc.).
- Bean is Ready The bean is fully initialized and put into the ApplicationContext for use.
- Destruction (on Application Shutdown)
- @PreDestroy annotated method
- destroy() (if implements DisposableBean)
- Custom destroyMethod
Most Important Points for Interview (Easy Version)
- Creation Phase: Constructor → Dependency Injection
- Initialization Phase: @PostConstruct → InitializingBean → initMethod
- Destruction Phase: @PreDestroy → DisposableBean → destroyMethod
- BeanPostProcessors run before and after initialization (very important for proxies, validation, etc.)
Pro Tip I follow in projects: I prefer using Constructor Injection + @PostConstruct for initialization logic because it is clean and Spring Boot friendly.
2. How @SpringBootApplication Enables Auto-Configuration
@SpringBootApplication is a meta-annotation that does three important things:
@SpringBootApplication =
@SpringBootConfiguration
+ @EnableAutoConfiguration
+ @ComponentScan
The Magic – @EnableAutoConfiguration:
- It imports the AutoConfigurationImportSelector class.
- This selector reads the list of auto-configuration classes from META-INF/spring/org.springframework.boot.autoconfigure.AutoConfiguration.imports (Spring Boot 2.7+) or from META-INF/spring.factories (older versions) in the JARs on the classpath.
- It finds all auto-configuration classes (e.g., DataSourceAutoConfiguration, RedisAutoConfiguration, KafkaAutoConfiguration, etc.).
- Each auto-configuration class is loaded conditionally using annotations like:
- @ConditionalOnClass
- @ConditionalOnProperty
- @ConditionalOnMissingBean
- @ConditionalOnBean, etc.
This is why just adding spring-boot-starter-web or spring-boot-starter-data-jpa automatically configures Tomcat, DataSource, EntityManager, etc.
In my TCS Verizon project, I heavily relied on this mechanism. We used multiple Spring Boot starters (Web, Batch, Kafka, Redis, JPA), and auto-configuration handled most of the boilerplate setup. I only overrode specific beans when we needed custom configurations (e.g., custom ThreadPoolTaskExecutor or RedisTemplate).
JPA/Hibernate N+1 Problem: What causes the N+1 select problem? What are the three best ways to resolve it in a production Spring Boot app?
What Causes the N+1 Problem?
It happens when you fetch a parent entity along with its associated child collections, but Hibernate executes:
- 1 query to fetch the parent entities, and
- N additional queries (one for each parent) to fetch the associated child collections.
Example in Payment Wallet / Verizon-like project:
// Bad - Triggers N+1
List<User> users = userRepository.findAll(); // 1 query
for (User user : users) {
    System.out.println(user.getTransactions().size()); // N queries (one per user)
}
This results in 1 + N queries instead of 1, causing severe performance degradation under load.
Three Best Ways to Solve N+1 in Production Spring Boot Apps
Here are the three most effective solutions I use in real projects, in order of preference:
1. Fetch Join (Most Recommended)
@Query("SELECT u FROM User u LEFT JOIN FETCH u.transactions WHERE u.id IN :ids")
List<User> findUsersWithTransactions(@Param("ids") List<Long> ids);
Cons: Can lead to a Cartesian product when fetching multiple collections.
2. EntityGraph (Clean & Modern Approach)
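@EntityGraph(attributePaths = {"transactions", "transactions.details"})
@Query("SELECT u FROM User u") // explicit query: the method name alone is not a valid derived query
List<User> findAllWithTransactions();
Supports nested relationships through dotted attribute paths (or subgraphs with @NamedEntityGraph).
My go-to method in Spring Boot 2.7+ projects.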
3. Hibernate Batch Fetching (Global / Per-Entity)
# application.properties (global default)
spring.jpa.properties.hibernate.default_batch_fetch_size=100
// OR per association, in the entity class
@BatchSize(size = 100)
@OneToMany(mappedBy = "user")
private List<Transaction> transactions;
Reduces the N child queries to roughly N / batch_size.
Production Best Practices I Follow
- Always use EntityGraph or Fetch Join for critical endpoints.
- Enable spring.jpa.properties.hibernate.generate_statistics=true in dev/staging to detect N+1 issues early.
- Use DTO projections (via JPQL Constructor expressions or MapStruct) when I don’t need the full entity graph.
- Monitor with Actuator + Micrometer + Database query logs in production.
In my TCS Verizon project, the batch job was suffering from severe N+1 issues while loading site master data + port inventory. We solved it using @EntityGraph and default_batch_fetch_size, which contributed significantly to reducing the job time from 2+ hours to 35-45 minutes.
_____________________________________________________________________
What are the multiple ways to create threads ?
1. Traditional Ways (Not Recommended in Production)
=> Extending the Thread class
=> Implementing the Runnable interface
// Example - Not recommended
Thread t1 = new Thread(() -> System.out.println("Hello"));
t1.start();
=> These are simple but not suitable for production because they don’t provide thread pooling, proper resource management, or graceful shutdown
2. Executor Framework (Recommended in Production)
=> This is the standard way to create and manage threads in enterprise applications.
ExecutorService executor = Executors.newFixedThreadPool(10);
// or better
ThreadPoolTaskExecutor customExecutor = new ThreadPoolTaskExecutor();
3. CompletableFuture (Modern & Clean Way)
CompletableFuture.supplyAsync(() -> performTask(), executor);
4. Virtual Threads (Java 21+ - Latest Approach)
ExecutorService virtualExecutor = Executors.newVirtualThreadPerTaskExecutor();
Real Project Example (TCS Verizon)
=> In my Verizon Site Audit and Transport Fiber Logic System project at TCS, we had a nightly Spring Batch job that needed to process thousands of site records.
=> I used CompletableFuture with a custom ThreadPoolTaskExecutor (ExecutorService) to run feasibility check and inventory check in parallel for each record. This helped us significantly reduce the batch processing time from 2 hours to 35-45 minutes.
What's the difference between ExecutorService and a custom executor?
=> ExecutorService is not a class; it is a standard Java interface (java.util.concurrent).
=> A custom executor is a concrete, explicitly configured executor instance that we define ourselves, for example a ThreadPoolExecutor or Spring's ThreadPoolTaskExecutor.
| Term | What it is | Type |
|---|---|---|
| ExecutorService | A standard Java Interface | Interface |
| Custom Executor | Your own configured thread pool (e.g., ThreadPoolTaskExecutor) | Implementation |
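What if we don't define a custom executor?
=> Plain Spring Framework (@EnableAsync with no executor bean) falls back to SimpleAsyncTaskExecutor.
=> Spring Boot auto-configures a ThreadPoolTaskExecutor (bean name applicationTaskExecutor) with 8 core threads and an unbounded queue by default, so tasks can pile up in memory under load.
SimpleAsyncTaskExecutor - behavior
=> Creates a new thread every time (no pooling).
=> Can cause high memory usage and OutOfMemoryError under load.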
Custom Executor (What I prefer) - behavior
@Bean(name = "taskExecutor")
public Executor taskExecutor() { // java.util.concurrent.Executor
    ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
    executor.setCorePoolSize(10);    // Minimum threads always alive
    executor.setMaxPoolSize(50);     // Maximum threads during peak load (used once the queue is full)
    executor.setQueueCapacity(100);  // How many tasks can wait
    executor.setThreadNamePrefix("App-Executor-");
    // Task rejection: run the task on the submitting thread instead of dropping it
    executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
    // Graceful shutdown: let queued tasks finish, waiting up to 30 seconds
    executor.setWaitForTasksToCompleteOnShutdown(true);
    executor.setAwaitTerminationSeconds(30);
    executor.initialize();
    return executor; // ThreadPoolTaskExecutor implements Executor, not ExecutorService
}
I always prefer creating a custom executor (a configured ThreadPoolTaskExecutor) instead of relying on the default executor because:
=> It gives full control over the number of threads (corePoolSize, maxPoolSize).
=> We can define a proper queue to handle backpressure.
=> We can set a good rejection policy (CallerRunsPolicy).
=> It allows graceful shutdown
=> In my TCS Verizon project, I created a custom batchExecutor for the Spring Batch job. This helped us safely run multiple parallel calls using CompletableFuture without overwhelming the system
Have you worked on ExecutorService or CompletableFuture ?
=> Yes, I have worked extensively on both ExecutorService and CompletableFuture in my previous project.
=> In my TCS Verizon Site Audit and Transport Fiber Logic System project, we had a very heavy nightly Spring Batch job that needed to process thousands of site audit records efficiently.
=> Each record required multiple independent I/O calls — for example:
=> Fetching site feasibility data (fetch from site-audit-service)
=> Checking port/inventory availability (fetch from inventory-service)
=> These calls were independent, but the third service (fiber-logic-service) depended on these 2 service calls. So I used CompletableFuture along with a custom ExecutorService to run the 2 service calls in parallel and then combine for the 3rd service call
How I Implemented It:
Step 1: Configuration (Custom Executor implementation)
@Configuration
@EnableAsync
public class AsyncConfig {
    @Bean(name = "batchExecutor")
    public Executor batchExecutor() { // java.util.concurrent.Executor
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(15);    // Minimum threads always alive
        executor.setMaxPoolSize(40);     // Maximum threads during peak load
        executor.setQueueCapacity(200);  // How many tasks can wait
        executor.setThreadNamePrefix("App-Executor-");
        // Task rejection: run the task on the submitting thread (CallerRunsPolicy)
        executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
        executor.initialize();
        return executor; // ThreadPoolTaskExecutor implements Executor, not ExecutorService
    }
}
Step 2: Usage in Service
@Service
public class FiberLogicBatchService {
    @Autowired
    @Qualifier("batchExecutor")
    private Executor batchExecutor; // Injected as Executor

    public void processSiteRecord(SiteAuditRecord record) {
        // Running two independent calls in parallel
        CompletableFuture<FeasibilityData> feasibilityFuture =
                CompletableFuture.supplyAsync(() ->
                        siteService.getFeasibilityData(record.getSiteId()),
                        batchExecutor);
        CompletableFuture<InventoryData> inventoryFuture =
                CompletableFuture.supplyAsync(() ->
                        inventoryService.checkPortAvailability(record.getSiteId()),
                        batchExecutor);
        // Combine both results
        CompletableFuture.allOf(feasibilityFuture, inventoryFuture)
                .thenAcceptAsync(v -> {
                    // This runs after both calls are done
                    FeasibilityData fd = feasibilityFuture.join();
                    InventoryData id = inventoryFuture.join();
                    routeCalculationService.performAssignment(record, fd, id);
                }, batchExecutor);
    }
}
=> We created a custom thread pool using ThreadPoolTaskExecutor.
=> We expose it as java.util.concurrent.Executor, which is all CompletableFuture needs. ThreadPoolTaskExecutor does not implement ExecutorService; if one is required, getThreadPoolExecutor() returns the underlying ThreadPoolExecutor.
=> We inject it using @Qualifier("batchExecutor").
=> We pass this executor to CompletableFuture.supplyAsync(..., executor) to control which thread pool runs the task.
Why batchExecutor() instead of taskExecutor() ?
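=> Both are configured and work the same way. The one technical difference is that @Async falls back to a bean named taskExecutor when it cannot find a unique TaskExecutor bean.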
=> The name is just for readability and clarity.
=> taskExecutor → General purpose (used for many things).
=> batchExecutor → Clearly tells that this executor is mainly used for Batch Job processing.
=> You can use any name. I used batchExecutor because it was related to the Spring Batch job in the TCS Verizon project.
How we are handling task rejection ?
executor.setRejectedExecutionHandler(new ThreadPoolExecutor.CallerRunsPolicy());
=> The above line handles the task rejection - CallerRunsPolicy
=> This line tells the ThreadPoolExecutor what to do when a new task is submitted, but:
=> All threads are busy (maxPoolSize reached), and
=> The queue is also full (queueCapacity reached)
=> This situation is called task rejection.
=> CallerRunsPolicy is one of the Rejection Policies in Java.
=> Instead of discarding the task or throwing an exception, it runs the task on the caller thread (the thread that submitted the task).
Simple Example:
Imagine your thread pool is fully loaded (all 50 threads busy + queue full).
=> A new task comes → Normally it would be rejected.
=> With CallerRunsPolicy → The current thread (the one trying to submit the task) will execute the task itself synchronously.
=> This acts as a self-throttling / backpressure mechanism.
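The behavior can be reproduced with a deliberately tiny pool on a plain JDK. This is a sketch with a made-up class name (CallerRunsDemo), using one worker thread and a queue of size one so that the third submission is rejected and handed back to the caller.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class CallerRunsDemo {

    // Returns true if the rejected task was executed by the submitting thread.
    static boolean rejectedTaskRunsOnCaller() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.CallerRunsPolicy());
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocker = () -> {
            try {
                release.await(); // keeps the worker busy until the demo is done
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        AtomicReference<Thread> ranOn = new AtomicReference<>();
        try {
            pool.execute(blocker); // occupies the only worker thread
            pool.execute(blocker); // fills the only queue slot
            // Pool and queue are full: CallerRunsPolicy runs this one right here, synchronously.
            pool.execute(() -> ranOn.set(Thread.currentThread()));
            return ranOn.get() == Thread.currentThread();
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("ran on caller: " + rejectedTaskRunsOnCaller());
    }
}
```

Because the third execute() call does not return until the task has finished, the submitter is slowed down for exactly as long as the pool is saturated, which is the backpressure effect described above.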
Why do we use CallerRunsPolicy() in Production?
Advantages:
=> Prevents task loss.
=> Slows down the incoming request rate naturally (because caller thread is busy executing the task).
=> Very useful in high-traffic systems like Payment processing.
Disadvantages:
=> The caller thread gets blocked → can increase response time for that particular request.
Final Recommendation
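In most Spring Boot production applications (especially Payment/Fintech), CallerRunsPolicy() is a safe default because no task is silently dropped. The trade-off is that the submitting thread (for example, a request thread) is held up while it runs the task.
Does CompletableFuture require an ExecutorService?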
=> No, CompletableFuture does not strictly require an ExecutorService, but in production, we should always pass one.
1. Without passing Executor (Default Behavior)
CompletableFuture.supplyAsync(() -> fetchData()); // No Executor passed
=> It uses the common ForkJoinPool (ForkJoinPool.commonPool()) internally.
=> This is the same pool used by parallelStream().
=> Risk: All async CompletableFuture calls that do not pass an executor share this one pool across the entire JVM → blocking tasks can cause thread starvation in high-traffic applications.
2. With ExecutorService (Recommended in Production - best practice)
CompletableFuture.supplyAsync(() -> fetchData(), executorService);
=> This gives better control over thread count, queue size, and prevents one heavy operation from starving the entire application.
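A small sketch to see which pool actually runs the task. The class name ExecutorChoiceDemo and the io-pool- thread name prefix are made up for illustration. It returns the name of the thread that executed the supplier in each case.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

class ExecutorChoiceDemo {

    // Explicit executor: the task runs on one of our own named threads.
    static String threadNameWithCustomPool() {
        AtomicInteger seq = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(
                2, r -> new Thread(r, "io-pool-" + seq.incrementAndGet()));
        try {
            return CompletableFuture
                    .supplyAsync(() -> Thread.currentThread().getName(), pool)
                    .join();
        } finally {
            pool.shutdown();
        }
    }

    // No executor: the JDK picks the thread (normally a ForkJoinPool.commonPool worker).
    static String threadNameWithDefaultPool() {
        return CompletableFuture
                .supplyAsync(() -> Thread.currentThread().getName())
                .join();
    }

    public static void main(String[] args) {
        System.out.println("custom:  " + threadNameWithCustomPool());
        System.out.println("default: " + threadNameWithDefaultPool());
    }
}
```

Named threads also make thread dumps and logs much easier to read when diagnosing which workload is saturating which pool.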
Streams Performance: How does a .parallelStream() decide thread allocation? What are the risks of using it in a high-traffic Spring Boot application?
=> .parallelStream() uses the Fork/Join framework under the hood, specifically the common ForkJoinPool.
How .parallelStream() Decides Thread Allocation
A. Spliterator Data Partitioning
=> Before any threads are assigned, the source collection is evaluated by a Spliterator.
=> The Spliterator recursively splits the data structure in half (using a divide-and-conquer strategy) until the chunks are small enough to be processed independently.
=> Note: Array-based collections (like ArrayList) split very cheaply because their sizes are known and elements can be accessed via an index, so a split is just index arithmetic.
=> Link-based structures (like LinkedList) split terribly because the JVM has to traverse the nodes sequentially just to split them, diminishing parallel advantages.
B. The Common Pool Sizing Strategy
=> By default, the common pool's parallelism level is derived from the hardware (it can be overridden with the java.util.concurrent.ForkJoinPool.common.parallelism system property):
Parallelism Level = Available CPU Cores - 1
=> If your production server has 8 CPU cores, the common pool runs up to 7 worker threads, and the thread that invoked the terminal operation also takes part in the work, for a total of 8 active threads.
=> The threads execute tasks using a Work-Stealing Algorithm.
=> Work-Stealing Algorithm: each worker keeps its own deque of pending tasks. If Thread A runs out of work while Thread B still has pending chunks, Thread A takes ("steals") a task from the opposite end of Thread B's deque to keep CPU utilization high.
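The sizing rule can be checked on any machine with a few lines. This is a small sketch; the class name CommonPoolDemo and the summed range are arbitrary choices, and the printed parallelism depends on the host and on any system property override.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.stream.LongStream;

class CommonPoolDemo {

    /** Sums 1..n on a parallel stream, which is executed by the common pool plus the calling thread. */
    static long parallelSum(long n) {
        return LongStream.rangeClosed(1, n).parallel().sum();
    }

    public static void main(String[] args) {
        System.out.println("available cores         = " + Runtime.getRuntime().availableProcessors());
        System.out.println("common pool parallelism = " + ForkJoinPool.getCommonPoolParallelism());
        System.out.println("sum(1..1_000_000)       = " + parallelSum(1_000_000));
    }
}
```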
Risks of Using parallelStream() in High-Traffic Spring Boot Applications
1. Thread Starvation (Biggest Risk)
=> Since all parallel streams share the common ForkJoinPool, heavy usage in one request can starve other requests.
=> In a high-traffic application, this can cause severe latency spikes.
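2. No Per-Operation Pool Control
=> There is no supported API to choose or size the thread pool for an individual parallel stream; every stream competes for the same fixed set of common-pool workers.
=> Many concurrent requests each starting parallel streams can lead to contention, high CPU usage, and context switching.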
3. Thread Safety Issues
=> If the operations inside the stream are not thread-safe (e.g., updating shared mutable state), you will get race conditions.
4. Blocking Operations Problem
=> If any operation inside the parallel stream does blocking I/O (DB call, external API, Redis), it can block the ForkJoin worker threads, reducing overall throughput.
5. Ordering & Predictability
=> forEach() on a parallel stream does not preserve encounter order; use forEachOrdered() (which is slower) when order matters.
6. Debugging Difficulty
=> Stack traces become harder to read due to fork-join execution.
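For risk 3 (thread safety), a minimal sketch of the usual fix: let the stream build the result through collect() and do not mutate a shared collection from the worker threads. The class name ParallelCollectDemo and the squaring workload are assumptions for illustration; the unsafe variant is shown only as a comment because its outcome is unpredictable.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ParallelCollectDemo {

    static List<Integer> squares(int n) {
        // Unsafe variant (do not do this): a shared ArrayList mutated by several worker threads.
        //   List<Integer> shared = new ArrayList<>();
        //   IntStream.range(0, n).parallel().forEach(i -> shared.add(i * i));
        // It can lose elements or throw, because ArrayList is not thread-safe.

        // Safe variant: each worker fills its own container and the framework merges them.
        return IntStream.range(0, n)
                .parallel()
                .map(i -> i * i)
                .boxed()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(squares(10_000).size());
    }
}
```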
My Recommendation / Best Practice
In high-traffic Spring Boot applications, I avoid using parallelStream() in request threads.
Instead, I prefer:
=> Custom ThreadPoolExecutor with proper configuration.
=> CompletableFuture with a managed executor.
=> Virtual Threads (Java 21+) for I/O-bound work.
=> I only use parallelStream() for CPU-intensive, independent, short-lived operations (like bulk data processing in batch jobs) where I can afford to use the common pool.
Short-Circuiting: Explain how findFirst() vs findAny() behaves in a parallel stream.
=> Both findFirst() and findAny() are short-circuiting terminal operations — meaning they stop processing the stream as soon as they find a matching element.
=> However, their behavior differs significantly when used with parallel streams.
Key differences
| Method | Sequential Stream | Parallel Stream Behavior | Performance in Parallel |
|---|---|---|---|
| findFirst() | Returns the first matching element | Respects encounter order (returns leftmost element) | Slower (needs coordination) |
| findAny() | Returns any matching element | Returns any matching element (no order guarantee) | Faster |
findFirst() (Deterministic & Ordered):
=> This method guarantees that it will always return the first element in the stream's encounter order, regardless of whether it is run sequentially or in parallel (provided the stream has an encounter order).
=> In parallel streams, even though processing happens in multiple threads, the framework ensures order is maintained.
=> This coordination makes it slightly slower in parallel execution.
findAny() (Non-deterministic & Unordered):
=> This method makes no encounter-order promise. It may return any element that matches the condition, typically whichever one a worker thread finds first.
=> In parallel streams, whichever thread finds a matching element first can return it immediately.
=> This makes it more performant in parallel scenarios.
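The difference is easy to observe. In this sketch the data (1..100,000), the filter (multiples of 1000) and the class name FindFirstVsFindAny are arbitrary assumptions: findFirst() always yields 1000, while findAny() may yield any multiple of 1000 and can change from run to run.

```java
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class FindFirstVsFindAny {

    static final List<Integer> NUMBERS =
            IntStream.rangeClosed(1, 100_000).boxed().collect(Collectors.toList());

    /** Deterministic: the leftmost match in encounter order. */
    static Optional<Integer> first() {
        return NUMBERS.parallelStream().filter(n -> n % 1000 == 0).findFirst();
    }

    /** Non-deterministic: some match, not necessarily the leftmost one. */
    static Optional<Integer> any() {
        return NUMBERS.parallelStream().filter(n -> n % 1000 == 0).findAny();
    }

    public static void main(String[] args) {
        System.out.println("findFirst -> " + first().orElseThrow());
        System.out.println("findAny   -> " + any().orElseThrow());
    }
}
```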
Real Project Example (TCS Verizon)
"In the nightly batch job of the Verizon Fiber Logic System, I had to check if any site had a critical port availability issue.
I used findAny() with parallelStream() because order didn’t matter — I just needed to know if any critical site existed. This gave better performance.
If I needed the first site in sequence (e.g., for priority processing), I would use findFirst()."
Summary
=> Use findFirst() when order matters, and findAny() when performance matters more in parallel streams.
=> If you are using parallel streams and simply need to grab any valid match, choose findAny(); if you only need to know whether a match exists, anyMatch() expresses that directly. Only use findFirst() if the position of the element matters to the business logic.
How does findFirst() in a parallel stream determine which match is the earliest while processing in parallel?
The Core Mechanic: The Fork-Join Task Tree
When you invoke a parallel stream on an array or ArrayList, the JVM's Spliterator recursively cuts the collection in half. This creates a binary Task Tree structure in memory. Each task knows exactly who its Left Child (earlier elements) and Right Child (later elements) are.
Let's map the example list [3, 7, 11, 40, 15, 22, 13, 30] onto 4 parallel threads:
                 [ Root Task ]
                /             \
        [Task Left]         [Task Right]
         /      \            /      \
   Thread 1  Thread 2   Thread 3  Thread 4
    [3, 7]   [11, 40]   [15, 22]  [13, 30]
Thread 1 gets elements at the very front: [3, 7]
Thread 2 gets the next batch: [11, 40] (contains 40, which matches the filter)
Thread 3 gets: [15, 22] (contains 15, which matches the filter)
Thread 4 gets: [13, 30] (contains 30, which matches the filter)
Step-by-Step Architecture: How They Coordinate
Here is a step-by-step trace of how the results are reconciled when Thread 3 finishes first (this follows the OpenJDK implementation; it is an internal detail, not a documented contract):
Step A: Thread 3 Finishes First
Thread 3 finishes processing [15, 22] and finds 15.
It records 15 as the local result of its leaf task.
Because this is .findFirst(), that result cannot be published as the final answer yet. The task checks whether it is the leftmost node of the tree. It is not: the subtree handled by Threads 1 and 2 covers earlier elements. The result therefore stays parked in the task node. The worker thread itself does not block; it is free to pick up other fork-join work.
Step B: The Global Short-Circuit Optimization
Even though 15 cannot be returned yet, finding it lets the framework cancel every task that covers later elements. Thread 3's leaf marks the tasks to its right as cancelled, which tells Thread 4: "A match already exists at a position earlier than yours. You can skip your work."
Thread 4's task stops as soon as it checks the cancellation flag.
30 is discarded.
Step C: Thread 2 Finishes Second
Now, Thread 2 finishes processing [11, 40] and finds 40.
Thread 2's task is the right child of [Task Left], so it is not the leftmost node either, and its result also has to wait for its left sibling. Thread 1 then finishes processing [3, 7], finds no match, and reports back null.
Step D: Tree Traversal and Resolution
Because Thread 1 returned null, Thread 2's result (40) becomes the result of the entire [Task Left] subtree.
[Task Left] passes 40 up to the [Root Task]. When the [Root Task] completes, it inspects its children from left to right: [Task Left] has provided 40, and [Task Right] (from Thread 3) has provided 15.
Because [Task Left] represents the earlier section of the original array, the Root Task picks 40, discards Thread 3's 15, and the stream pipeline terminates.
Summary
The stream framework tracks encounter order by mirroring the parallel execution in a binary task tree (FindOps.FindTask in OpenJDK). When a collection is split, each task keeps references to its parent and to its left and right children.
If a task handling a right-hand chunk finishes first and finds a match, it cancels all tasks further to its right, but its own result stays buffered in its node until the tasks to its left have reported. No thread blocks while waiting; the result is simply not final yet.
The tree resolves from the bottom up: a parent node accepts a right child's result only if the left child completed with no match. This bookkeeping, plus the work that left-hand tasks must still finish after a later match has already been found, is what makes .findFirst() slower than .findAny() in parallel execution.
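The "left result wins" rule can be modelled with a plain RecursiveTask. This is a simplified illustration, not the JDK's FindTask: it has no cancellation of later tasks, the class name LeftmostMatchTask is made up, and the filter n % 5 == 0 is an assumption that happens to match the elements marked as matches in the example (40, 15, 30).

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;
import java.util.function.Predicate;

class LeftmostMatchTask extends RecursiveTask<Integer> {
    private final List<Integer> data;
    private final int from;
    private final int to;                      // half-open range [from, to)
    private final Predicate<Integer> filter;

    LeftmostMatchTask(List<Integer> data, int from, int to, Predicate<Integer> filter) {
        this.data = data;
        this.from = from;
        this.to = to;
        this.filter = filter;
    }

    @Override
    protected Integer compute() {
        if (to - from <= 2) {                  // leaf: scan the chunk left to right
            for (int i = from; i < to; i++) {
                if (filter.test(data.get(i))) {
                    return data.get(i);
                }
            }
            return null;                       // no match in this chunk
        }
        int mid = (from + to) >>> 1;
        LeftmostMatchTask left = new LeftmostMatchTask(data, from, mid, filter);
        LeftmostMatchTask right = new LeftmostMatchTask(data, mid, to, filter);
        right.fork();                          // right half may run on another worker
        Integer leftResult = left.compute();
        Integer rightResult = right.join();
        // The parent accepts the right result only when the left subtree found nothing.
        return leftResult != null ? leftResult : rightResult;
    }

    static Integer findFirst(List<Integer> data, Predicate<Integer> filter) {
        return ForkJoinPool.commonPool().invoke(new LeftmostMatchTask(data, 0, data.size(), filter));
    }

    public static void main(String[] args) {
        System.out.println(findFirst(List.of(3, 7, 11, 40, 15, 22, 13, 30), n -> n % 5 == 0));
    }
}
```

With the example list, the leaves report null, 40, 15 and 30; the merge rule turns that into 40 for the left subtree, 15 for the right subtree, and 40 at the root, regardless of which leaf finished first.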
Custom Collectors: How do you implement a custom Collector to group a list of objects into a specialized map structure?
1. The Core Architecture: The Collector<T, A, R> Interface
T: The type of input elements from the stream.
A: The type of the intermediate mutable accumulator container (e.g., your specialized map structure)
R: The final result type returned by the collector pipeline
2. The 5 Essential Lifecycle Methods
=> supplier(): Creates and initializes a fresh, mutable container instance.
=> accumulator(): Defines how a single stream element is incorporated into that container instance.
=> combiner(): Defines how two independent, partially filled container objects are merged together (critical for safety during .parallelStream() execution).
=> finisher(): Performs the final transformations on the container object before returning it to the application (or defaults to an identity function if no transformation is required).
=> characteristics(): A set of configuration flags (e.g., IDENTITY_FINISH, UNORDERED) that optimize internal stream execution.
3. Concrete Production Scenario & Code
import java.util.*;
import java.util.stream.Collector;

public class TransactionCustomCollector {
    // T = Transaction (stream element type)
    // A = R = TreeMap<String, List<Transaction>> (Currency -> List<Transaction>)
    public static Collector<Transaction,
                            TreeMap<String, List<Transaction>>,
                            TreeMap<String, List<Transaction>>> toSpecializedCurrencyMap() {
        return Collector.of(
            // 1. Supplier: Initialize our specialized Map structure
            TreeMap::new,
            // 2. Accumulator: Group elements manually into the map
            (map, transaction) ->
                map.computeIfAbsent(transaction.getCurrency(), k -> new ArrayList<>())
                   .add(transaction),
            // 3. Combiner: Merge two partial maps during parallel execution
            (map1, map2) -> {
                map2.forEach((currency, transactions) ->
                    map1.computeIfAbsent(currency, k -> new ArrayList<>())
                        .addAll(transactions));
                return map1;
            },
            // 4. Finisher: Wrap the inner lists as unmodifiable before delivery
            map -> {
                map.replaceAll((currency, list) -> Collections.unmodifiableList(list));
                return map;
            }
            // 5. Characteristics: none. IDENTITY_FINISH does not apply because of the
            //    custom finisher, and UNORDERED is left out so each list keeps encounter order.
        );
    }
}

// Simple domain data carrier (a record, Java 16+). Declare it in its own file or keep it non-public.
record Transaction(String transactionId, String currency, double amount) {
    public String getCurrency() { return currency; }
}
Usage :
TreeMap<String, List<Transaction>> result = transactions.stream()
.collect(TransactionCustomCollector.toSpecializedCurrencyMap());
Simple Breakdown (Easy to Remember)
| Component | What it does | Why needed? |
|---|---|---|
| Supplier | Creates empty TreeMap | Starting point |
| Accumulator | Adds transaction to correct list by currency | Main logic |
| Combiner | Merges two maps (used in parallel streams) | Required for parallel |
| Finisher | Makes inner lists immutable | Safety |
Intermediate vs Terminal: Explain lazy evaluation in Streams. How does the JVM optimize pipeline execution?
Lazy Evaluation is one of the core features of Java Streams. It means intermediate operations are not executed immediately — they only build a pipeline. The actual processing happens only when a terminal operation is invoked.
Intermediate vs Terminal Operations
- Intermediate Operations → Lazy (Return another Stream) Example: filter(), map(), flatMap(), sorted(), distinct(), limit(), peek()
- Terminal Operations → Eager (Trigger execution) Example: collect(), forEach(), reduce(), findFirst(), count(), anyMatch()
How Lazy Evaluation Works + JVM Optimizations
The Streams library (more than the JVM itself) optimizes pipelines through two important concepts:
- Vertical Processing (Pipelining): Instead of processing all elements horizontally (first apply filter to the entire list, then map), the pipeline pushes one element through every stage before moving to the next element. This avoids intermediate collections and improves cache efficiency.
- Stateless Operations
Most intermediate operations (filter, map, etc.) are stateless: they don't depend on previously seen elements. Stateful operations such as sorted() and distinct() have to remember elements, and sorted() must consume its whole input before it can emit anything. Stateless stages allow the pipeline to:
=> Execute operations in a single pass.
=> Easily split the stream for parallel processing.
=> Apply short-circuiting (e.g., findFirst(), limit() stop early).
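Laziness and vertical processing can be made visible by counting how often the filter runs. A small sketch with made-up data (the numbers 1 to 8) and a hypothetical class name LazyPipelineDemo:

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyPipelineDemo {

    static final AtomicInteger FILTER_CALLS = new AtomicInteger();

    /** Builds the pipeline only; nothing is executed until a terminal operation is called. */
    static Stream<Integer> buildPipeline() {
        return Stream.of(1, 2, 3, 4, 5, 6, 7, 8)
                .filter(n -> {
                    FILTER_CALLS.incrementAndGet();
                    return n % 2 == 0;
                })
                .map(n -> n * 10)
                .limit(2);
    }

    public static void main(String[] args) {
        FILTER_CALLS.set(0);
        Stream<Integer> pipeline = buildPipeline();
        System.out.println("filter calls before terminal op: " + FILTER_CALLS.get()); // 0

        List<Integer> result = pipeline.collect(Collectors.toList());
        System.out.println("result: " + result);                                      // [20, 40]
        System.out.println("filter calls after terminal op:  " + FILTER_CALLS.get()); // 4
    }
}
```

The filter runs only for 1, 2, 3 and 4: each element travels through filter, map and limit before the next one is pulled, and limit(2) stops the pipeline once the second match (4) has passed.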
Real Example
List<String> currencies = transactions.stream()
    .filter(t -> t.getAmount() > 10000) // Lazy + Stateless
    .map(Transaction::getCurrency) // Lazy + Stateless
    .distinct() // Lazy + Stateful (remembers values already seen)
    .limit(10) // Lazy + Short-circuit
    .collect(Collectors.toList()); // Terminal → Execution starts
Real Project Example (TCS Verizon)
"In the Verizon Fiber Logic batch job, we had thousands of site records. I applied multiple filters and transformations using Streams. Thanks to lazy + vertical processing, records that failed early filters were never processed further. This optimization, along with stateless operations, helped reduce the batch processing time significantly from 2+ hours to 35-45 minutes."
Strong Closing Line:
Lazy evaluation combined with vertical processing and stateless operations allows the JVM to optimize memory usage, enable short-circuiting, and support efficient parallel execution.
FlatMap vs Map: When would you choose flatMap over map in a real-world nested JSON payload transformation?
=> map() and flatMap() are both intermediate operations in Java Streams, but they are used in very different scenarios, especially when dealing with nested data structures like JSON payloads.
=> Use .map() when you have a 1-to-1 transformation. You take one object and transform it into exactly one other object, maintaining a flat structure.
=> Use .flatMap() when you have a 1-to-Many transformation involving nested collections. It transforms each element into a stream of its own, and then flattens all those nested sub-streams into a single, unified data channel.
Real-World Example (From Payment Wallet System)
Suppose we receive a nested JSON payload from a payment gateway:
{
"payments": [
{
"txnId": "TXN001",
"items": [
{"product": "Gold", "amount": 5000},
{"product": "Silver", "amount": 2000}
]
},
{
"txnId": "TXN002",
"items": [
{"product": "Platinum", "amount": 15000}
]
}
]
}
Using map() (Incorrect for flattening):
List<List<Item>> nestedItems = payments.stream()
.map(Payment::getItems) // Returns Stream<List<Item>>
.collect(Collectors.toList()); // Result: List of Lists
Using flatMap() (Correct Choice):
List<Item> allItems = payments.stream()
.flatMap(payment -> payment.getItems().stream()) // Flatten nested lists
.collect(Collectors.toList());
Fail-Fast vs Fail-Safe: Differentiate between ArrayList and CopyOnWriteArrayList iterators under concurrent modification.
| Iterator Behavior | Fail-Fast (ArrayList) | Fail-Safe (CopyOnWriteArrayList) |
|---|---|---|
| Modification Detection | Detects structural modification on the next iterator call (best effort) | Does not detect it; the iterator never throws on modification |
| Exception | Throws ConcurrentModificationException | No exception |
| How it works | Uses modCount (modification count) | Works on a snapshot (copy) of the collection |
| Performance | Faster, low memory | Slower (copy on write), higher memory usage |
| Use Case | Single-threaded or properly synchronized code | High read, low write concurrent scenarios |
Detailed Explanation
1. ArrayList Iterator (Fail-Fast)
List<String> list = new ArrayList<>();
list.add("A");
list.add("B");
Iterator<String> it = list.iterator();
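new Thread(() -> list.add("C")).start(); // Another thread modifies the list concurrently
while(it.hasNext()) {
    System.out.println(it.next()); // May throw ConcurrentModificationException (timing-dependent)
}
=> On every next() call, the iterator compares the list's modCount (structural modification counter) with the value it captured when it was created.
=> If modCount has changed → throws ConcurrentModificationException. Detection is best-effort, so it is a debugging aid and not a thread-safety guarantee.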
2. CopyOnWriteArrayList Iterator (Fail-Safe)
List<String> list = new CopyOnWriteArrayList<>();
list.add("A");
list.add("B");
Iterator<String> it = list.iterator();
new Thread(() -> list.add("C")).start(); // Modification happens
while(it.hasNext()) {
System.out.println(it.next()); // No exception, works on snapshot
}
When to Choose Which?
- Use ArrayList (Fail-Fast) → When you control synchronization or it's single-threaded.
- Use CopyOnWriteArrayList (Fail-Safe) → When you have high read concurrency and infrequent writes.
Note: CopyOnWriteArrayList has high write cost (full copy), so use it only when writes are rare.
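The threaded examples above depend on timing. A deterministic, single-threaded sketch shows the same two behaviors; the class and method names here are hypothetical.

```java
import java.util.ArrayList;
import java.util.ConcurrentModificationException;
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

class IteratorBehaviorDemo {

    /** Adds an element after the iterator was created, then drains the iterator. */
    static List<String> iterateWhileAdding(List<String> list) {
        list.add("A");
        list.add("B");
        Iterator<String> it = list.iterator();
        list.add("C");                         // structural modification after iterator creation
        List<String> seen = new ArrayList<>();
        while (it.hasNext()) {
            seen.add(it.next());
        }
        return seen;
    }

    /** ArrayList: the first next() notices the changed modCount and throws. */
    static boolean arrayListFailsFast() {
        try {
            iterateWhileAdding(new ArrayList<>());
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    /** CopyOnWriteArrayList: the iterator keeps reading the array it was created on. */
    static List<String> copyOnWriteSnapshot() {
        return iterateWhileAdding(new CopyOnWriteArrayList<>());
    }

    public static void main(String[] args) {
        System.out.println("ArrayList threw CME: " + arrayListFailsFast());      // true
        System.out.println("COW iterator saw:    " + copyOnWriteSnapshot());     // [A, B]
    }
}
```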
Identity vs Equality: Why must you override hashCode() alongside equals() when using custom objects as keys in a HashSet?
Identity means reference equality, checked by the == operator (whether two references point to the same object in memory).
Equality means logical equality — defined by the equals() method.
Java has a very important contract between equals() and hashCode():
If two objects are equal according to equals(), then they must return the same hashCode() value.
Why we must override both when using custom objects in HashSet / HashMap
HashSet (and HashMap) works in two steps for performance:
- It calls hashCode() to find the bucket where the object should go.
- Then it uses equals() to check for duplicates inside that bucket.
If you override only equals() but not hashCode():
- Two logically equal objects can produce different hash codes.
- They may land in different buckets.
- HashSet will treat them as different objects → duplicates will be allowed, breaking the Set contract.
The Scenario
Imagine you have a custom User object, and you override equals() to state that two users are identical if they share the same userId. However, you do not override hashCode().
User user1 = new User("USER-99");
User user2 = new User("USER-99");
HashSet<User> userSet = new HashSet<>();
userSet.add(user1);
userSet.add(user2);
What Happens in Memory?
Adding user1: The JVM invokes the default hashCode() on user1. It returns an identity-based integer that has nothing to do with userId, say 104857. Java places user1 inside Bucket #5.
Adding user2: The JVM invokes the default hashCode() on user2. Because user2 is a completely distinct object instance in memory, it returns a totally different hash code, say 987654. Java calculates its index and routes it straight to Bucket #12.
The Glitch: Because the HashSet checks Bucket #12 and finds it empty, it happily inserts user2. It never even calls your custom .equals() method because it never looks inside Bucket #5!
Your set now contains duplicate keys (user1 and user2), breaking the core architectural guarantee of a HashSet.
🛠️ Visualizing the Bucket Separation
user1.hashCode() -> 104857 (Bucket 5) --> [ User("USER-99") ]
user2.hashCode() -> 987654 (Bucket 12) --> [ User("USER-99") ]
Result: Duplicate objects bypass HashSet uniqueness validation because they land in different buckets.
How to Correctly Override Them
import java.util.Objects;
public class User {
private String userId;
public User(String userId) {
this.userId = userId;
}
@Override
public boolean equals(Object o) {
if (this == o) return true;
if (o == null || getClass() != o.getClass()) return false;
User user = (User) o;
return Objects.equals(userId, user.userId);
}
@Override
public int hashCode() {
// Enforces that identical userIds produce identical hash integers
return Objects.hash(userId);
}
}
Failing to override hashCode() alongside equals() violates the fundamental contract of the Java collections framework and breaks the data integrity of hashing structures like HashSet and HashMap.
A HashSet uses a two-step lookup sequence to identify duplicates. It first evaluates the object's hashCode() to route it to a specific bucket, and only invokes .equals() against entries already in that bucket whose hash matches.
If we only override equals(), two logically identical objects will retain their default, identity-based hash codes. Consequently, they will almost always hash to different buckets inside the map. Because the structure never compares these two instances with each other, it bypasses our equality check entirely and inserts duplicate objects into the set, breaking its core uniqueness guarantee.
Therefore, as a strict rule, any fields used to establish business equality in equals() must be included in the calculation of hashCode() to ensure equal objects consistently target the same hash bucket.
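The rule can be checked with two small key classes. BrokenUser and FixedUser are hypothetical names used only for this sketch; the broken set size is reported as "almost always 2" because identity hash codes could collide in theory.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

class EqualsHashCodeDemo {

    /** Overrides equals() only; hashCode() is inherited from Object. */
    static final class BrokenUser {
        final String userId;
        BrokenUser(String userId) { this.userId = userId; }

        @Override
        public boolean equals(Object o) {
            return o instanceof BrokenUser && Objects.equals(userId, ((BrokenUser) o).userId);
        }
    }

    /** Overrides both methods using the same field. */
    static final class FixedUser {
        final String userId;
        FixedUser(String userId) { this.userId = userId; }

        @Override
        public boolean equals(Object o) {
            return o instanceof FixedUser && Objects.equals(userId, ((FixedUser) o).userId);
        }

        @Override
        public int hashCode() {
            return Objects.hash(userId);
        }
    }

    static int brokenSetSize() {
        Set<BrokenUser> set = new HashSet<>();
        set.add(new BrokenUser("USER-99"));
        set.add(new BrokenUser("USER-99"));
        return set.size();                     // almost always 2: the duplicate slips in
    }

    static int fixedSetSize() {
        Set<FixedUser> set = new HashSet<>();
        set.add(new FixedUser("USER-99"));
        set.add(new FixedUser("USER-99"));
        return set.size();                     // 1: equal keys share a hash code
    }

    public static void main(String[] args) {
        System.out.println("equals() only:         " + brokenSetSize());
        System.out.println("equals() + hashCode(): " + fixedSetSize());
    }
}
```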
Closing line :
Whenever you override equals(), you must override hashCode() using the same fields to maintain the contract. Failing to do so breaks collections like HashSet, HashMap, and can cause subtle bugs in production
Method References: How do method references (Class::method) work dynamically under the hood compared to standard lambda expressions?
While a lambda expression and a method reference look like interchangeable syntax, they compile slightly differently.
Both compile to an invokedynamic instruction whose bootstrap method, LambdaMetafactory, creates the functional-interface implementation at runtime (in OpenJDK, by generating a small class in memory), so no inner .class file is written to disk. However, for a lambda expression the compiler also generates a private synthetic method (named like lambda$main$0) inside the enclosing class to hold the lambda body, so each call passes through that extra method.
A method reference usually skips that step. Because it points at an already compiled method, the compiler typically emits no synthetic method, and LambdaMetafactory links the functional interface to a MethodHandle for the target method directly.
This gives a slightly shorter call stack. In practice the JIT compiler usually inlines the synthetic lambda method as well, so any speed advantage of method references is marginal.
=> While method references (Class::method) and lambda expressions are functionally interchangeable, there are subtle differences in how the compiler and JVM handle them.
Technical reality:
- Both lambdas and method references use invokedynamic + LambdaMetafactory at runtime.
- Both get their implementation class generated in memory at runtime (no .class files on disk); these are not java.lang.reflect.Proxy classes.
The efficiency argument above is partially correct but easy to overstate:
Method References are slightly more efficient because:
- The compiler does not need to generate an extra synthetic helper method to wrap the lambda body.
- It can directly create a MethodHandle pointing to the existing method.
- This leads to a slightly cleaner call stack and marginally better JIT inlining opportunities.
Performance Difference:
- In most cases, the difference is very small / negligible.
- In hot paths (high-frequency code), method references can be marginally faster, but the gap is usually small enough that only a careful microbenchmark (for example, with JMH) can show it.
- The real benefit of method references is readability and maintainability, not raw performance.
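The synthetic helper method can be observed with reflection. This sketch assumes a javac-style compiler, which names lambda bodies lambda$...; the class name LambdaShapeDemo and the two fields are illustrative only.

```java
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

class LambdaShapeDemo {

    // Lambda: the compiler moves the body into a private synthetic method of this class.
    static final Function<String, Integer> AS_LAMBDA = s -> s.length();

    // Method reference: points at String.length() directly, no helper method needed.
    static final Function<String, Integer> AS_METHOD_REF = String::length;

    /** Lists the compiler-generated lambda body methods declared in this class. */
    static List<String> syntheticLambdaMethods() {
        List<String> names = new ArrayList<>();
        for (Method m : LambdaShapeDemo.class.getDeclaredMethods()) {
            if (m.isSynthetic() && m.getName().startsWith("lambda$")) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(AS_LAMBDA.apply("abc") + " " + AS_METHOD_REF.apply("abc"));
        System.out.println("synthetic lambda methods: " + syntheticLambdaMethods());
    }
}
```

Only one helper shows up even though the class defines two functions, which matches the point above: the lambda needs a generated method, the method reference does not.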
In my projects, I prefer method references whenever possible because the code becomes cleaner and more expressive. For example, in the Verizon batch job, I used siteRecords.forEach(this::validateAndProcessSite) instead of a lambda.
While there is a minor performance advantage due to direct method binding, the main reason I choose them is improved code readability.
Optional Anti-patterns: What are the performance and clean-code downsides of using Optional.get() or passing Optional as method parameters?
=> While Optional is a great tool to represent nullable values explicitly, misusing it can lead to code smells, bugs, and even performance issues. There are two very common anti-patterns I actively avoid.
1. Using Optional.get() (Most Dangerous Anti-Pattern)
Why it's bad:
- Optional.get() throws NoSuchElementException if the value is absent.
- It defeats the whole purpose of Optional — which is to force the developer to handle the absent case.
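- It leads to unexpected NoSuchElementException failures in production, which are no easier to diagnose than the NullPointerException that Optional was meant to prevent.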
Bad Code:
Optional<User> userOpt = userRepository.findById(id);
User user = userOpt.get(); // Dangerous!
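Better Alternatives:
// Option A: Provide a default value (the argument is always evaluated, even when a value is present)
User user = userOpt.orElse(new User());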
// Option B: Compute default lazily
User user = userOpt.orElseGet(() -> createDefaultUser());
// Option C: Throw meaningful exception
User user = userOpt.orElseThrow(() -> new UserNotFoundException(id));
// Option D: Functional style
userOpt.ifPresent(user -> processUser(user));
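One performance detail behind Options A and B: orElse() evaluates its argument even when the Optional holds a value, while orElseGet() calls the supplier only when the value is absent. A minimal sketch with a hypothetical default factory (createDefaultName) and a counter to make the difference visible:

```java
import java.util.Optional;
import java.util.concurrent.atomic.AtomicInteger;

class OptionalDefaultsDemo {

    static final AtomicInteger DEFAULTS_CREATED = new AtomicInteger();

    /** Stands in for an expensive default, such as building an object or hitting a database. */
    static String createDefaultName() {
        DEFAULTS_CREATED.incrementAndGet();
        return "guest";
    }

    static int defaultsCreatedByOrElse() {
        DEFAULTS_CREATED.set(0);
        Optional.of("alice").orElse(createDefaultName());                          // default built anyway
        return DEFAULTS_CREATED.get();
    }

    static int defaultsCreatedByOrElseGet() {
        DEFAULTS_CREATED.set(0);
        Optional.of("alice").orElseGet(OptionalDefaultsDemo::createDefaultName);   // supplier not called
        return DEFAULTS_CREATED.get();
    }

    public static void main(String[] args) {
        System.out.println("orElse created defaults:    " + defaultsCreatedByOrElse());    // 1
        System.out.println("orElseGet created defaults: " + defaultsCreatedByOrElseGet()); // 0
    }
}
```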
2. Passing Optional as Method Parameter (Clean Code Anti-Pattern)
Why it's bad:
- It makes the API unclear — the method signature doesn't clearly communicate whether null is acceptable.
- Forces the caller to create unnecessary Optional objects.
- Violates the "Optional should be used for return types only" principle (as per Brian Goetz, Java Language Architect).
Bad Code:
public void updateUser(Optional<User> userOpt) { ... } // Anti-pattern
Better Approach:
public void updateUser(User user) {
    Objects.requireNonNull(user, "User cannot be null");
    // business logic
}
=> I treat Optional as a return type only. I avoid Optional.get() completely and never pass Optional as method parameters. This improves code clarity, reduces bugs, and maintains good performance by avoiding unnecessary object creation.
Memory Footprint: What happens to the underlying memory structure when you create millions of tiny, short-lived String objects inside a Stream loop?
=> This is a classic scenario that can cause significant memory pressure and GC overhead if not handled properly.
What Happens Under the Hood
When you create millions of tiny, short-lived String objects inside a Stream (especially in a loop or map() operation):
- Object Creation
Every String is an immutable object.
Each operation like new String(), string concatenation (+), or String.valueOf() typically creates a new object on the heap.
- Memory Allocation Pattern
All these short-lived objects are allocated in the Young Generation (Eden space).
Since they are short-lived (they die right after the stream operation), they should ideally be collected by Minor GC.
- Pressure on Garbage Collector
Millions of objects → Eden space fills up very quickly.
This triggers frequent Minor GCs (Stop-the-World pauses in most collectors).
If the allocation rate is very high, objects that are still reachable when a Minor GC runs are copied to Survivor space and can be promoted early, which can lead to:
High promotion rate to Old Generation.
Eventually, Full GCs with long pauses.
- Additional Overhead in Streams
In parallel streams, multiple threads create objects simultaneously → even faster Eden filling.
Each String is two heap objects, the String itself plus its backing char array (a byte array in Java 9+), so the per-object overhead is significant for tiny strings.
Real-World Example
// Bad - High memory churn
List<String> result = items.stream()
    .map(item -> item.getName() + "_" + item.getId() + "_processed") // Creates a new String (plus backing array) per item
    .collect(Collectors.toList());
Not an improvement:
// String.format() parses the format string on every call and usually allocates more than plain concatenation
List<String> result = items.stream()
    .map(item -> String.format("%s_%s_processed", item.getName(), item.getId()))
    .collect(Collectors.toList());
Better Approach: avoid materializing millions of Strings at all. For extreme cases, use a traditional loop that appends each value to one shared StringBuilder (or writes it straight to the output).
=> Creating millions of short-lived Strings leads to high allocation rate in Eden space, frequent Minor GCs, and potential promotion to Old Gen. Always be mindful of object creation inside hot streams or loops.
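A minimal sketch of the single-buffer idea, assuming the end goal is one block of text and not a List<String>. The Item record is a hypothetical stand-in for the item type used above, and the capacity estimate is a rough guess.

```java
import java.util.List;

class SingleBufferDemo {

    /** Hypothetical item type standing in for the `item` objects in the example above. */
    record Item(String name, int id) {}

    /** Appends every line to one StringBuilder, so no per-item String is created. */
    static String renderAll(List<Item> items) {
        StringBuilder out = new StringBuilder(items.size() * 24); // rough capacity estimate
        for (Item item : items) {
            out.append(item.name())
               .append('_')
               .append(item.id())
               .append("_processed")
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(renderAll(List.of(new Item("a", 1), new Item("b", 2))));
    }
}
```

Whether this is worth the loss of readability depends on measured allocation rates; for a few thousand elements the stream version is usually fine.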
What happens behind the scenes when SpringApplication.run() is called?
=> SpringApplication.run() is the main entry point of a Spring Boot application. It does a significant amount of work to bootstrap the entire application.
Step-by-Step Process Behind SpringApplication.run()
When you execute SpringApplication.run(MyApplication.class, args), Spring Boot performs the following steps:
- Create SpringApplication Instance
=> Reads the main class (annotated with @SpringBootApplication).
=> Detects the application type (Servlet, Reactive, or Non-web).
- Prepare Environment
=> Loads application.properties / application.yml.
=> Loads environment variables, command-line arguments, and profiles.
=> Prepares ConfigurableEnvironment.
- Create ApplicationContext
=> Creates AnnotationConfigServletWebServerApplicationContext (for servlet web apps) or the appropriate context for the detected application type.
- Load Auto-Configuration
=> This is the heart of Spring Boot.
=> Reads the auto-configuration class names from META-INF/spring/org.springframework.boot.autoconfigure.AutoConfiguration.imports (Spring Boot 2.7+) or from META-INF/spring.factories (older versions).
=> Considers well over a hundred auto-configuration classes (e.g., DataSourceAutoConfiguration, JpaRepositoriesAutoConfiguration, RedisAutoConfiguration, etc.).
=> Applies conditions like @ConditionalOnClass, @ConditionalOnProperty, @ConditionalOnMissingBean, etc., so only the matching ones contribute beans.
- Component Scanning
=> Scans for @Component, @Service, @Repository, @Controller, @Configuration, etc., starting from the main class package.
- Refresh the ApplicationContext
=> BeanFactoryPostProcessors are executed. Component scanning and auto-configuration actually run at this point, while the configuration classes are processed; user-defined beans are registered before auto-configurations so that @ConditionalOnMissingBean can back off.
=> All non-lazy singleton beans are instantiated and initialized (@PostConstruct, InitializingBean, etc.).
- Start Embedded Server
=> Starts Tomcat (default), Jetty, or Undertow.
- Execute Runners
=> Runs all CommandLineRunner and ApplicationRunner beans.
- Publish ApplicationReadyEvent
=> Application is fully started and ready to accept traffic.
SpringApplication.run() does three major things:
- Sets up the environment and properties.
- Performs Auto-Configuration + Component Scanning.
- Creates beans, starts the embedded server, and runs any custom runners.
Bean Lifecycle: Walk through how Spring instantiates, configures, and destroys a prototype-scoped bean vs a singleton bean.
=> Spring manages bean lifecycle differently based on the scope. The two most important scopes are Singleton (default) and Prototype.
1. Singleton Scope (Default)
=> Dependency Injection & Configuration: Dependencies are injected once.
=> Initialization: @PostConstruct, InitializingBean, init-method called only once.
=> Usage: Same instance returned for every request.
=> Destruction: Called only once when the ApplicationContext is closed (@PreDestroy, DisposableBean, destroy-method).
Lifecycle Summary (Singleton): One-time creation → One-time initialization → Long-lived → One-time destruction
2. Prototype Scope (@Scope("prototype"))
=> Dependency Injection & Configuration: Dependencies are injected fresh for every new instance.
=> Initialization: @PostConstruct, InitializingBean, etc., are executed every time a new instance is created.
=> Usage: New object returned on every request.
=> Destruction: Spring does NOT call destroy methods (@PreDestroy, DisposableBean) automatically for prototype beans.
| Scope | When is the bean created? | How many instances? | Created at Application Start? |
|---|---|---|---|
| Singleton (Default) | Once, when ApplicationContext starts (eager) | Only 1 instance | Yes (by default) |
| Prototype | Every time the bean is requested | New instance every time | No |
=> Prototype beans are lazily created — only when someone asks for it (via dependency injection or applicationContext.getBean()).
Example :
@Component
@Scope("prototype")
public class PaymentValidator {
public PaymentValidator() {
System.out.println("PaymentValidator instance created");
}
}
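// Pitfall: injecting a prototype bean directly into a singleton
@Service
public class PaymentService {
    private final PaymentValidator paymentValidator; // Injected once; the same instance is reused for the singleton's lifetime
    public PaymentService(PaymentValidator paymentValidator) { // Spring creates one new validator for this injection point
        this.paymentValidator = paymentValidator;
    }
}
// Fix: ask the container for a fresh instance on every use
@Service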
public class PaymentService {
private final ObjectFactory<PaymentValidator> validatorFactory;
public PaymentService(ObjectFactory<PaymentValidator> validatorFactory) {
this.validatorFactory = validatorFactory;
}
public void processPayment() {
PaymentValidator validator = validatorFactory.getObject(); // New instance each time
validator.validate(...);
}
}
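The difference between the two PaymentService variants above can be sketched without Spring. This is only an analogy: a java.util.function.Supplier stands in for ObjectFactory<PaymentValidator>, and the class names (PrototypeLookupSketch, DirectHolder, FactoryHolder) are made up for illustration.

```java
import java.util.function.Supplier;

// Plain-Java analogy, not Spring: a Supplier plays the role of ObjectFactory.
class PrototypeLookupSketch {

    static class Validator { }

    // Mirrors direct injection: the instance is captured once in the constructor.
    static class DirectHolder {
        private final Validator validator;
        DirectHolder(Validator validator) { this.validator = validator; }
        Validator use() { return validator; }
    }

    // Mirrors ObjectFactory injection: a fresh instance is requested on every call.
    static class FactoryHolder {
        private final Supplier<Validator> factory;
        FactoryHolder(Supplier<Validator> factory) { this.factory = factory; }
        Validator use() { return factory.get(); }
    }

    public static void main(String[] args) {
        DirectHolder direct = new DirectHolder(new Validator());
        System.out.println(direct.use() == direct.use());         // true: same instance

        FactoryHolder viaFactory = new FactoryHolder(Validator::new);
        System.out.println(viaFactory.use() == viaFactory.use()); // false: new instance per call
    }
}
```

The design point is that a singleton only sees a new prototype instance if it asks the container again; holding the reference in a field freezes it.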
If some Spring beans are not initialized, how can you identify and fix the problem?
If some Spring beans are not getting initialized, I follow a systematic debugging approach.
Step-by-Step Identification
1. Check Application Startup Logs
=> Look for exceptions like NoSuchBeanDefinitionException, BeanCreationException, or UnsatisfiedDependencyException.
=> Search for keywords: "Failed to instantiate", "No qualifying bean", "Consider defining a bean".
2. Enable Debug Logging
application.properties
logging.level.org.springframework.beans=DEBUG
logging.level.org.springframework.context=DEBUG
3. Use Actuator (Best Way)
=> Add Spring Boot Actuator and hit:
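=> GET /actuator/beans → Lists every bean in the context with its scope, type, and dependencies. The endpoint must be exposed first (e.g., management.endpoints.web.exposure.include=beans).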
4. Common Causes I Check First
=> Component Scanning Issue → Bean is not in the scanned package.
=> @Component / @Service / @Repository missing on the class.
=> @Configuration class not scanned or missing @Bean method.
=> Circular Dependency → Spring fails to create the bean.
=> Conditional Annotations (@ConditionalOnProperty, @ConditionalOnBean, etc.) not satisfied.
=> Profile-specific beans — wrong active profile.
5. Fix Strategies
=> For Component Scanning: Move the class under the main package or use @ComponentScan("com.myapp").
=> Explicit Bean Definition: Add @Bean method in a @Configuration class.
=> Fix Circular Dependency: Use constructor injection carefully or @Lazy.
=> Check Conditions: Verify property values or other beans required by @Conditional.
=> Use ApplicationContext for debugging (temporarily):
@Autowired
private ApplicationContext context;
// Then check
System.out.println(context.containsBean("myBeanName"));
Real Project Example
=> In my Payment Wallet System, the RedisTemplate bean was not getting initialized. After checking logs, I found it was due to a missing @Configuration on the Redis config class. Once I added it and ensured it was under component scan, the bean was created successfully.
=> I always start with logs and Actuator’s /actuator/beans endpoint. Most bean initialization issues are related to component scanning, missing annotations, or conditional logic.
Circular Dependency: How does Spring resolve circular dependencies between two singleton beans using its three-stage cache mechanism?
=> Spring can resolve circular dependencies between singleton beans using a mechanism called the Three-Level Cache. This is one of the advanced internals of Spring. Since Spring Boot 2.6, circular references are prohibited by default and must be re-enabled with spring.main.allow-circular-references=true.
What is a Circular Dependency (A → B → A)?
@Service
public class A {
private final B b; // A depends on B
public A(B b) { this.b = b; }
}
@Service
public class B {
private final A a; // B depends on A
public B(A a) { this.a = a; }
}
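=> This creates a circular dependency: A → B → A. As written, with constructor injection on both sides, Spring cannot resolve it; the three-level cache applies when at least one side uses setter or field injection.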
Spring’s Three-Level Cache Mechanism
=> Spring uses three maps (caches) to handle this situation during bean creation:
| Cache Level | Map Name | Purpose |
|---|---|---|
| Level 1 (Singleton Cache) | singletonObjects | Fully initialized beans |
| Level 2 (Early Cache) | earlySingletonObjects | Early references (partially created beans) |
| Level 3 (Factory Cache) | singletonFactories | Object factories to create early references |
Step-by-Step Resolution Process
- Spring starts creating Bean A.
=> It instantiates A (constructor only, dependencies not yet injected) and puts an ObjectFactory for A into singletonFactories (Level 3). The factory can produce an early reference of A.
- A needs B → Spring tries to create Bean B.
- While creating B, B needs A.
=> Spring first checks Level 1 (singletonObjects) → Not found.
=> Then checks Level 2 (earlySingletonObjects) → Not found.
=> Then checks Level 3 (singletonFactories) → Found the factory for A.
=> Spring calls the factory to get an early reference (a partially created A), puts it into Level 2, and removes the factory from Level 3.
- B gets the early reference of A and completes its creation.
- B is added to Level 1 (fully created).
- A continues creation, gets the fully created B, completes, and is moved to Level 1.
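The lookup order in these steps can be sketched as a small simulation. It is a heavily simplified model, assuming field injection and two hard-coded beans; real Spring adds locking, AOP proxy handling, and post-processors, and the class and method names here are illustrative only.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Simplified simulation of the three caches for two beans, A and B.
class ThreeLevelCacheSketch {
    static class A { B b; }
    static class B { A a; }

    static final Map<String, Object> singletonObjects = new HashMap<>();             // Level 1
    static final Map<String, Object> earlySingletonObjects = new HashMap<>();        // Level 2
    static final Map<String, Supplier<Object>> singletonFactories = new HashMap<>(); // Level 3

    // Lookup order: Level 1, then Level 2, then Level 3 (promoting the result to Level 2).
    static Object getSingleton(String name) {
        Object bean = singletonObjects.get(name);
        if (bean == null) bean = earlySingletonObjects.get(name);
        if (bean == null) {
            Supplier<Object> factory = singletonFactories.remove(name);
            if (factory != null) {
                bean = factory.get();
                earlySingletonObjects.put(name, bean);
            }
        }
        return bean;
    }

    static A createA() {
        A a = new A();                          // instantiate before injecting anything
        singletonFactories.put("a", () -> a);   // expose a factory for the early reference
        a.b = createB();                        // populating A triggers creation of B
        promote("a", a);
        return a;
    }

    static B createB() {
        B b = new B();
        singletonFactories.put("b", () -> b);
        b.a = (A) getSingleton("a");            // resolved from Level 3, moved to Level 2
        promote("b", b);
        return b;
    }

    // Move a finished bean to Level 1 and clear it from the lower levels.
    static void promote(String name, Object bean) {
        singletonObjects.put(name, bean);
        earlySingletonObjects.remove(name);
        singletonFactories.remove(name);
    }
}
```

Calling createA() once yields an A whose B points back to the same A, with both beans ending up in Level 1 and the other two maps empty.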
Important Limitations
- This mechanism relies on setter or field injection, because the bean must already be instantiated before an early reference to it can be exposed.
- If both beans use constructor injection, the cycle cannot be resolved (Spring throws BeanCurrentlyInCreationException). Marking one constructor parameter @Lazy is a common workaround.
- It only works for singleton scoped beans. Prototype beans cannot resolve circular dependencies this way.
=> In my Payment Wallet System, I had a circular dependency between PaymentService and NotificationService. I resolved it by changing one of them to use setter injection instead of constructor injection, allowing Spring’s three-level cache to handle the circular reference smoothly.
Strong Closing Line :
Spring’s three-level cache is a clever mechanism that allows early reference exposure to break circular dependencies, but the best design is to avoid circular dependencies altogether by refactoring the architecture.
Transactional Rollback: Why does a @Transactional annotation fail to roll back changes when a method calls another transaction-marked method within the same class?
Root Cause
@Transactional works using Spring AOP (Proxy-based mechanism).
- When you call a @Transactional method from outside the class (through dependency injection), the call goes through the Spring Proxy.
- The proxy intercepts the call, starts the transaction, and applies commit/rollback logic.
- However, when a method calls another method in the same class (self-invocation), the call bypasses the proxy entirely.
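Because the proxy is bypassed, the inner method's @Transactional settings (propagation, isolation, rollback rules) are silently ignored. The inner call simply runs in whatever transaction the caller already has, or in none at all if the caller is not transactional, which is the case where nothing gets rolled back.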
Example of the Problem
@Service
public class PaymentService {
    @Transactional
    public void transferMoney(Long fromUser, Long toUser, BigDecimal amount) {
        debit(fromUser, amount); // ← Self-invocation → Proxy bypassed
        credit(toUser, amount); // ← Self-invocation → Proxy bypassed
    }
    @Transactional // No effect: private method, reached only through this
    private void debit(Long userId, BigDecimal amount) {
        // Update wallet balance
    }
    @Transactional // No effect: private method, reached only through this
    private void credit(Long userId, BigDecimal amount) {
        // Credit wallet balance
    }
}
Best Solutions
- Extract inner methods to another bean (Recommended)
- Self-injection (Inject the same bean as proxy)
- Use TransactionTemplate for programmatic transactions
=> @Transactional only applies when the method is invoked through the Spring AOP proxy. Self-invocation inside the same class bypasses the proxy, so the inner method's transaction settings never take effect.
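The proxy bypass can be demonstrated with a plain JDK dynamic proxy. This is a sketch, not Spring's actual interceptor: the InvocationHandler below merely records which calls it sees, where Spring's transaction interceptor would begin and commit a transaction. The Payments interface and class names are hypothetical.

```java
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

// JDK dynamic proxy demo: the handler stands in for a transaction interceptor.
class SelfInvocationSketch {
    interface Payments {
        void transferMoney();
        void debit();
    }

    static class PaymentsImpl implements Payments {
        public void transferMoney() {
            debit(); // plain this.debit(): never passes through the proxy
        }
        public void debit() { }
    }

    // Returns the names of the methods the proxy actually intercepted.
    static List<String> interceptedCalls() {
        List<String> intercepted = new ArrayList<>();
        Payments target = new PaymentsImpl();
        Payments proxy = (Payments) Proxy.newProxyInstance(
                Payments.class.getClassLoader(),
                new Class<?>[] { Payments.class },
                (p, method, methodArgs) -> {
                    intercepted.add(method.getName()); // a transaction would start here
                    return method.invoke(target, methodArgs);
                });
        proxy.transferMoney();
        return intercepted;
    }

    public static void main(String[] args) {
        System.out.println(interceptedCalls()); // prints [transferMoney]
    }
}
```

Only transferMoney is recorded; the nested debit call runs on the target object directly, which is why extracting the inner method into a separate bean restores interception.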
Transaction Propagation: Explain the behavioral difference between REQUIRED and REQUIRES_NEW propagation levels.
=> Transaction propagation defines how a @Transactional method behaves when it is called from another transactional method.
=> The two most important levels are REQUIRED (default) and REQUIRES_NEW.
1. Propagation.REQUIRED (Default)
=> If a transaction is already active, it joins that transaction
=> If no transaction exists, it creates a new one
=> Key Point: All methods participate in the same transaction.
=> Example :
@Transactional(propagation = Propagation.REQUIRED)
public void transferMoney(...) {
debit(...); // Joins the same transaction
credit(...); // Joins the same transaction
}
2. Propagation.REQUIRES_NEW
=> Always creates a new, independent transaction.
=> If an outer transaction exists, it suspends the outer transaction temporarily.
=> The inner transaction commits or rolls back independently.
=> Example :
@Transactional(propagation = Propagation.REQUIRES_NEW)
public void logAudit(String message) {
// This will commit even if outer transaction rolls back
}
| Scenario | REQUIRED | REQUIRES_NEW |
|---|---|---|
| No active transaction | Creates new | Creates new |
| Inside existing transaction | Joins existing transaction | Suspends outer, creates new independent transaction |
| Inner method rolls back | Rolls back entire transaction | Only the inner transaction rolls back, provided the outer method catches the exception; if it propagates, the outer rolls back too |
| Outer method rolls back | Rolls back everything | Outer rolls back, inner stays committed |
| Use Case | Normal business logic (e.g., money transfer) | Audit logging, notifications, side effects |
Real Project Example (TCS + Payment Wallet)
In my Payment Wallet System, I used REQUIRES_NEW for audit logging inside the money transfer flow. Even if the main transfer transaction failed and rolled back, the audit log should still be saved. So I marked the saveAuditLog() method with Propagation.REQUIRES_NEW.
In the TCS Verizon project, most batch operations used the default REQUIRED since we wanted atomic behavior across multiple steps.
Strong Closing Line:
REQUIRED is used when you want everything in one transaction. REQUIRES_NEW is used when you need an operation to succeed independently, regardless of the outer transaction's outcome.
Hibernate Caching: Differentiate between First-Level Cache and Second-Level Cache. What are the pitfalls of caching highly dynamic transactional data?
1. First-Level Cache (Session Cache)
- Caches entities loaded within the same session.
- Guarantees object identity — same database row always returns the same Java object instance in one session.
- Supports dirty checking and automatic flushing.
2. Second-Level Cache (SessionFactory Cache)
Behavior:
- Stores entities in a shared cache.
- Can be used by multiple sessions.
- Requires a cache provider (Ehcache, Caffeine, Redis, Infinispan, etc.).
| Feature | First-Level Cache | Second-Level Cache |
|---|---|---|
| Scope | One Session | Entire Application (SessionFactory) |
| Enabled by Default | Yes | No |
| Lifetime | Until Session closes | Until the SessionFactory closes or the entry is evicted or expires |
| Stores | Managed entity instances | Dehydrated entity state (field values), not object instances |
| Concurrency | Safe (single session) | Needs careful configuration |
| Use Case | Normal operations within one transaction | Read-heavy, slowly changing data |
Pitfalls of Caching Highly Dynamic Transactional Data
Caching data that changes frequently (e.g., wallet balance, stock prices, order status) in Second-Level Cache is highly risky:
- Stale Data — Users see outdated information.
- Lost Updates / Inconsistent State — Multiple transactions can overwrite each other.
- Complex Cache Invalidation — Very difficult to keep cache in sync with database in real-time.
- Increased Memory Pressure — Highly dynamic data causes frequent cache evictions and churn.
- Performance Degradation — Frequent cache misses or invalidations can hurt more than they help.
Real Example from My Projects:
- In the Payment Wallet System, I never enabled Second-Level Cache for wallet balances or transactions. Instead, I used Redis with proper TTL and distributed locking for high-read scenarios.
- In the TCS Verizon project, I enabled Second-Level Cache only for master data (site types, port configurations) that changed very rarely.
I use First-Level Cache by default. I enable Second-Level Cache only for read-heavy, slowly changing reference data. For transactional data, I prefer direct database queries with proper indexing or external caches like Redis with explicit invalidation.
What is HikariCP Connection Pool?
HikariCP is the default and most popular connection pooling library used in Spring Boot applications.
- It is known for being extremely fast, lightweight, and reliable.
- Spring Boot automatically uses HikariCP when you add spring-boot-starter-jdbc or spring-boot-starter-data-jpa.
- It manages and reuses database connections efficiently instead of creating a new connection for every request.
Connection Pool Starvation: Symptoms & Fixes
Connection Pool Starvation means all connections in the pool are busy, and new requests have to wait or fail.
Common Symptoms (What you will observe)
- Application becomes very slow (high latency).
- Requests start timing out (especially database calls).
- Logs show warnings like:
HikariPool-1 - Connection is not available, request timed out after 30000ms
Waiting for connection from pool
- High number of threads in WAITING / BLOCKED state (visible in thread dump).
- Database CPU is not high, but application response time is poor.
- Spring Boot Actuator /actuator/metrics shows hikaricp.connections.active close to maximumPoolSize.
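The symptoms above can be reproduced in miniature. The sketch below is a toy model, not HikariCP: a Semaphore stands in for the fixed set of connections, and tryBorrow's timeout plays the role of connection-timeout. All names are illustrative.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// Toy pool: each permit represents one pooled connection.
class PoolStarvationSketch {
    private final Semaphore permits;

    PoolStarvationSketch(int maximumPoolSize) {
        this.permits = new Semaphore(maximumPoolSize, true);
    }

    // Waits up to timeoutMs for a free connection; false means the request timed out.
    boolean tryBorrow(long timeoutMs) {
        try {
            return permits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Returns a connection to the pool.
    void giveBack() {
        permits.release();
    }

    public static void main(String[] args) {
        PoolStarvationSketch pool = new PoolStarvationSketch(2);
        pool.tryBorrow(50);                      // connection 1, held by a slow query
        pool.tryBorrow(50);                      // connection 2, held by a slow query
        System.out.println(pool.tryBorrow(50));  // false: pool exhausted, caller times out
        pool.giveBack();                         // one slow query finishes
        System.out.println(pool.tryBorrow(50));  // true: the returned connection is reused
    }
}
```

The third caller fails even though the "database" is idle, which matches the pattern of low database CPU combined with poor application response time.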
How to Fix HikariCP Connection Pool Exhaustion
Here’s the practical approach I follow:
1. Immediate Diagnosis
# Add these properties
spring.datasource.hikari.maximum-pool-size=20
spring.datasource.hikari.minimum-idle=5
spring.datasource.hikari.connection-timeout=30000
spring.datasource.hikari.idle-timeout=600000
spring.datasource.hikari.max-lifetime=1800000
2. Root Cause Analysis & Fixes
- Increase Pool Size (Temporary Fix)
# or 50, depending on DB capacity (in .properties files a comment must be on its own line)
spring.datasource.hikari.maximum-pool-size=30
- Fix the Real Problem (Most Important)
Long-running queries / transactions → Optimize queries or add indexes.
Missing @Transactional boundaries → Transactions staying open too long.
Not closing connections properly → Use try-with-resources or Spring-managed transactions.
Too many concurrent users → Implement proper backpressure or rate limiting.
3. Monitoring
- Use Actuator + Prometheus to monitor:
hikaricp.connections.usage
hikaricp.connections.active
hikaricp.connections.pending
Real Project Example (TCS Verizon):
"In the Verizon project, during heavy batch runs, we faced connection pool starvation. The batch job was opening too many connections without proper transaction boundaries. We fixed it by increasing maximum-pool-size temporarily and more importantly, by optimizing the batch processing logic and using proper @Transactional(readOnly = true) for read-heavy steps."
Spring Boot Autoconfiguration: How does @EnableAutoConfiguration work using spring.factories or conditional annotations (@ConditionalOnClass)?
=> @EnableAutoConfiguration is the heart of Spring Boot’s magic. It automatically configures infrastructure beans based on the libraries present in the classpath
How @EnableAutoConfiguration Works
@SpringBootApplication is a meta-annotation that includes three things:
@SpringBootApplication =
@SpringBootConfiguration
+ @EnableAutoConfiguration
+ @ComponentScan
@EnableAutoConfiguration does the following:
- It imports AutoConfigurationImportSelector.
- This selector scans the classpath for auto-configuration metadata.
- It reads the list of auto-configuration classes from:
Spring Boot 2.7+ / 3.x: META-INF/spring/org.springframework.boot.autoconfigure.AutoConfiguration.imports
Older Spring Boot: META-INF/spring.factories (2.7 still reads both files; 3.x no longer reads auto-configurations from spring.factories)
Example (spring.factories - Old Style)
org.springframework.boot.autoconfigure.EnableAutoConfiguration=\
org.springframework.boot.autoconfigure.jdbc.DataSourceAutoConfiguration,\
org.springframework.boot.autoconfigure.orm.jpa.HibernateJpaAutoConfiguration,\
org.springframework.boot.autoconfigure.data.redis.RedisAutoConfiguration
Role of Conditional Annotations
Auto-configuration classes are not applied blindly. They use @Conditional annotations:
- @ConditionalOnClass(RedisOperations.class) → Apply only if Redis is on the classpath.
- @ConditionalOnMissingBean → Apply only if no bean of this type already exists.
- @ConditionalOnProperty → Apply based on property value.
- @ConditionalOnBean, @ConditionalOnExpression, etc.
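The idea behind @ConditionalOnClass can be sketched in plain Java: probe the classpath for a class by name and only configure the feature if it is found. This is an analogy under that assumption, not Spring Boot's actual condition implementation; the helper name isPresent and the missing class name are hypothetical.

```java
// Plain-Java analogy of a classpath check such as the one behind @ConditionalOnClass.
class OnClassConditionSketch {

    // True if the named class can be loaded; static initializers are not run.
    static boolean isPresent(String className) {
        try {
            Class.forName(className, false, OnClassConditionSketch.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // A JDK class is always available, so a configuration guarded by it would apply.
        System.out.println(isPresent("java.util.List"));                        // true
        // A library that is not on the classpath makes the condition fail.
        System.out.println(isPresent("com.example.hypothetical.MissingDriver")); // false
    }
}
```

Checking by name rather than by class literal matters: referencing a missing class directly would fail while loading the configuration class itself.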
Real Example (RedisAutoConfiguration)
@Configuration
@ConditionalOnClass(RedisOperations.class) // Key condition
@EnableConfigurationProperties(RedisProperties.class)
public class RedisAutoConfiguration {
@Bean
@ConditionalOnMissingBean
public RedisTemplate<Object, Object> redisTemplate(...) {
// auto-configured RedisTemplate
}
}
Strong closing line :
@EnableAutoConfiguration + spring.factories (or .imports file) + smart conditional annotations allow Spring Boot to intelligently configure the application based on dependencies, making development much faster while keeping the code clean
Dirty Reads vs Non-Repeatable Reads: Explain how database isolation levels (Read Committed vs Repeatable Read) map to Spring transaction isolation settings.
Key Concepts
- Dirty Read: A transaction reads data that has been modified but not yet committed by another transaction. If the other transaction rolls back, you’ve read garbage data.
- Non-Repeatable Read: A transaction reads the same row twice and gets different values because another transaction updated and committed the data in between.
Isolation Levels Comparison
| Isolation Level | Dirty Read | Non-Repeatable Read | Phantom Read | Spring Constant | Use Case |
|---|---|---|---|---|---|
| READ_UNCOMMITTED | Yes | Yes | Yes | Isolation.READ_UNCOMMITTED | Rarely used |
| READ_COMMITTED | No | Yes | Yes | Isolation.READ_COMMITTED | Most common (database default in PostgreSQL, Oracle, SQL Server; MySQL InnoDB defaults to REPEATABLE_READ) |
| REPEATABLE_READ | No | No | Yes | Isolation.REPEATABLE_READ | Payments, Finance |
| SERIALIZABLE | No | No | No | Isolation.SERIALIZABLE | Highest consistency |
Spring Mapping
// Good balance. Spring's own default is Isolation.DEFAULT, which defers to the database's default level
@Transactional(isolation = Isolation.READ_COMMITTED)
public void normalOperation() { ... }
// For critical operations
@Transactional(isolation = Isolation.REPEATABLE_READ)
public void transferMoney() {
// Balance check + update must be consistent
}
Optimistic vs Pessimistic Locking: When would you use a version column (@Version) over an explicit row-level database lock for concurrent entity updates?
=> Optimistic Locking and Pessimistic Locking are two strategies to handle concurrent updates.
=> I prefer Optimistic Locking using JPA’s @Version in most cases.
| Aspect | Optimistic Locking (@Version) | Pessimistic Locking (LockModeType.PESSIMISTIC_WRITE) |
|---|---|---|
| Approach | Assume no conflict, check at commit time | Lock the row immediately when reading |
| Performance | High (no locking during read) | Lower (row is locked for other transactions) |
| Scalability | Excellent | Poor under high concurrency |
| Use Case | High read, occasional write | Rare cases with very high conflict probability |
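| Exception on Conflict | OptimisticLockException (surfaced by Spring as ObjectOptimisticLockingFailureException) | Blocks until the lock is released; may time out or deadlock |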
When I Prefer @Version (Optimistic Locking)
I use @Version in most production scenarios because:
- Better Performance & Scalability — No long-held locks.
- Simpler Code — Just add a version field.
- Handles most real-world conflicts gracefully with retry logic.
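@Entity
public class Wallet {
    @Id
    private Long id;
    private BigDecimal balance;
    @Version
    private Long version; // JPA automatically manages this
    // Getters and setters omitted
}
@Service
public class WalletService {
    private final WalletRepository walletRepository;
    private final TransactionTemplate transactionTemplate;
    public WalletService(WalletRepository walletRepository, TransactionTemplate transactionTemplate) {
        this.walletRepository = walletRepository;
        this.transactionTemplate = transactionTemplate;
    }
    // Deliberately not @Transactional: the version check fails at flush/commit and marks
    // that transaction rollback-only, so every attempt needs its own fresh transaction.
    public void debit(Long walletId, BigDecimal amount) {
        int maxRetries = 3;
        for (int attempt = 1; attempt <= maxRetries; attempt++) {
            try {
                transactionTemplate.executeWithoutResult(status -> {
                    Wallet wallet = walletRepository.findById(walletId)
                            .orElseThrow(() -> new WalletNotFoundException(walletId));
                    if (wallet.getBalance().compareTo(amount) < 0) {
                        throw new InsufficientBalanceException();
                    }
                    wallet.setBalance(wallet.getBalance().subtract(amount));
                    walletRepository.save(wallet);
                }); // Version check happens at flush/commit, when this block completes
                return; // Success
            } catch (OptimisticLockingFailureException ex) { // Spring's translation of OptimisticLockException
                if (attempt == maxRetries) {
                    throw new RuntimeException("Failed after " + maxRetries + " retries", ex);
                }
                try {
                    Thread.sleep(50L * attempt); // Optional: small backoff
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // Preserve the interrupt and stop retrying
                    throw new RuntimeException("Interrupted while retrying", ie);
                }
            }
        }
    }
}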
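What the @Version check does at flush time can be sketched with an in-memory map. This is an assumption-laden model, not JPA: the update method behaves like an UPDATE statement with "WHERE id = ? AND version = ?", and a false return corresponds to zero rows updated. Row and VersionCheckSketch are illustrative names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// In-memory model of a versioned row and a compare-and-set update.
class VersionCheckSketch {
    // Immutable snapshot of a row: balance plus version counter.
    record Row(long balance, long version) { }

    private final Map<Long, Row> table = new ConcurrentHashMap<>();

    void insert(long id, long balance) {
        table.put(id, new Row(balance, 0));
    }

    Row read(long id) {
        return table.get(id);
    }

    // Succeeds only if the stored row still equals the snapshot that was read earlier.
    boolean update(long id, Row readSnapshot, long newBalance) {
        Row next = new Row(newBalance, readSnapshot.version() + 1);
        return table.replace(id, readSnapshot, next); // false: someone else updated first
    }

    public static void main(String[] args) {
        VersionCheckSketch db = new VersionCheckSketch();
        db.insert(1L, 100);
        Row tx1 = db.read(1L);                      // both transactions read version 0
        Row tx2 = db.read(1L);
        System.out.println(db.update(1L, tx1, 70)); // true: version moves from 0 to 1
        System.out.println(db.update(1L, tx2, 50)); // false: stale version, must re-read and retry
    }
}
```

The losing writer never overwrites the winner's change; it has to re-read the row and retry, which is exactly what the retry loop is for.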
Real Project Examples
- TCS Verizon Project: For site port assignment (highly concurrent), I initially tried pessimistic locking but faced performance issues and deadlocks. Switched to @Version + retry logic, which scaled much better.
When I Would Use Pessimistic Locking Instead
- When conflict probability is very high (e.g., flash sale inventory).
- When business logic requires strong consistency and retrying is not acceptable.
- Short critical sections (e.g., select ... for update).
Rule of Thumb (Senior View):
"I default to Optimistic Locking with @Version for better scalability and simplicity. I only use Pessimistic Locking when the chance of conflict is high and business rules demand immediate locking."
Async Context: How does @Async behave under the hood? What happens if you do not declare a custom task executor configuration?
=> @Async is Spring’s declarative way to execute methods asynchronously. It is built on top of Spring AOP (Proxy mechanism).
How @Async Works Under the Hood
- When Spring detects the @Async annotation on a method, it creates a proxy around the bean.
- When the method is called from outside the class, the proxy intercepts the call.
- Instead of executing the method on the caller thread, the proxy submits the task to a TaskExecutor (thread pool).
- The caller thread returns immediately (non-blocking), while the actual method runs in the background thread.
This makes the method execution asynchronous.
What Happens If You Do NOT Declare a Custom Task Executor?
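If you don't configure any custom executor, the fallback depends on the setup. Plain Spring Framework (just @EnableAsync, no Boot) falls back to SimpleAsyncTaskExecutor. Spring Boot instead auto-configures a ThreadPoolTaskExecutor bean named applicationTaskExecutor (core size 8 and an unbounded queue by default, tunable via spring.task.execution.pool.*).
Problems with SimpleAsyncTaskExecutor:
- It creates a new thread for every @Async call (no thread pooling or reuse).
- There is no limit on the number of threads by default.
- No queue, no rejection policy, and poor shutdown behavior.
- Under high load, it can easily cause thread exhaustion and OutOfMemoryError.
Problem with Spring Boot's default pool: the queue is unbounded, so under sustained load tasks pile up in memory and latency grows while the pool stays at 8 threads.
These defaults are acceptable only in low-traffic or development environments.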
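The effect of bounding both the pool and the queue can be sketched with java.util.concurrent.ThreadPoolExecutor, which is what Spring's ThreadPoolTaskExecutor wraps. The sizes below (2 threads, queue of 1) are deliberately tiny illustrative values, and rejectedCount is a hypothetical helper.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Bounded pool: 2 threads, queue capacity 1, excess work is rejected instead of piling up.
class BoundedExecutorSketch {

    // Submits blocking tasks and returns how many the pool refused to accept.
    static int rejectedCount(int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 2, 0L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.AbortPolicy());
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    pool.execute(() -> {
                        try {
                            release.await(); // simulate slow work until released
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        }
                    });
                } catch (RejectedExecutionException e) {
                    rejected++; // pool and queue are both full
                }
            }
        } finally {
            release.countDown(); // let the accepted tasks finish
            pool.shutdown();
        }
        return rejected;
    }

    public static void main(String[] args) {
        // 2 tasks run, 1 waits in the queue, the remaining 2 are rejected.
        System.out.println(rejectedCount(5)); // prints 2
    }
}
```

Rejection is a feature here: the caller learns immediately that the system is saturated, instead of the backlog growing silently. In production a CallerRunsPolicy is a common alternative that applies backpressure rather than failing.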
=> Always configure a custom ThreadPoolTaskExecutor when using @Async in production. Relying on the defaults is risky because nothing bounds the amount of concurrent or queued work.
Lazy Initialization Exceptions: What causes a LazyInitializationException outside of a Hibernate session, and how do you cleanly structure your DTO layer to prevent it?
Root Cause
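- By default, collection associations (@OneToMany, @ManyToMany) are loaded lazily; @ManyToOne and @OneToOne are lazy only when declared with FetchType.LAZY.
- The data is only fetched when you access it.
- Once the transaction ends, the Hibernate Session is closed (unless Open Session in View keeps it open for the web request; Spring Boot enables spring.jpa.open-in-view by default).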
- If you try to access the lazy collection later (in Controller, DTO mapping, JSON serialization, etc.), Hibernate throws LazyInitializationException.
public User getUser(Long id) {
return userRepository.findById(id).get(); // Session active here
}
// Later in Controller
User user = userService.getUser(1L);
user.getOrders().size(); // ← Exception! Session is closed
Clean Solution: DTO Layer Structure
I follow this strict layered approach to prevent this issue:
1. Never expose Entities outside Service Layer
@Service
public class UserService {
@Transactional(readOnly = true)
public UserResponse getUserWithOrders(Long userId) {
User user = userRepository.findByIdWithOrders(userId); // Fetch Join or @EntityGraph
// Map to DTO inside active transaction
return UserResponse.fromEntity(user);
}
}
2. DTO Class (Clean Mapping)
public class UserResponse {
private Long id;
private String name;
private List<OrderSummary> orders;
public static UserResponse fromEntity(User user) {
UserResponse dto = new UserResponse();
dto.setId(user.getId());
dto.setName(user.getName());
// Safe because we are still inside the transaction
dto.setOrders(user.getOrders().stream()
.map(OrderSummary::fromEntity)
.collect(Collectors.toList()));
return dto;
}
}
Best Practices I Follow
- Use Fetch Join or @EntityGraph in Repository to load required associations.
- Always map Entity → DTO inside the @Transactional method.
- Never return Entities from Service to Controller.
- Use MapStruct for complex mappings in larger projects.
In my Payment Wallet System, I strictly followed this pattern for User, Wallet, and Transaction entities to avoid LazyInitializationException completely.
Kafka Exactly-Once Semantics (EOS): How do you configure a Kafka producer and consumer to guarantee exactly-once processing?
=> Exactly-Once Semantics (EOS) in Kafka ensures that a message is processed exactly once, even in case of failures and retries. This is critical in financial systems like payments.
1. Producer Configuration (Idempotent + Transactional)
spring.kafka.producer:
  bootstrap-servers: localhost:9092
  key-serializer: org.apache.kafka.common.serialization.StringSerializer
  value-serializer: org.springframework.kafka.support.serializer.JsonSerializer
  # For Exactly-Once
  acks: all
  retries: 2147483647
  transaction-id-prefix: wallet-tx- # Must be unique per instance
  # Native Kafka client settings without a dedicated Spring Boot key go under "properties"
  properties:
    enable.idempotence: true
    max.in.flight.requests.per.connection: 5
public void sendPaymentEvent(PaymentEvent event) {
kafkaTemplate.executeInTransaction(operations -> {
operations.send("payment-topic", event.getTransactionId(), event);
return true;
});
}
2. Consumer Configuration
spring.kafka.consumer:
  bootstrap-servers: localhost:9092
  group-id: wallet-processing-group
  auto-offset-reset: earliest
  enable-auto-commit: false
  # Critical for Exactly-Once
  isolation-level: read_committed # Only read committed transactions
Consumer Code:
@KafkaListener(topics = "payment-topic", groupId = "wallet-processing-group")
@Transactional
public void processPayment(PaymentEvent event) {
// Business logic - save to DB, update wallet etc.
paymentService.process(event);
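// Offsets are committed only on success when the listener container runs this method
// inside a Kafka transaction (i.e., it is configured with a Kafka transaction manager)
}
Key Requirements for End-to-End Exactly-Once
- Producer must be transactional (transaction-id-prefix)
- Consumer must use isolation.level=read_committed
- Consumed offsets must be committed as part of the producer's Kafka transaction (read-process-write). A JPA/JDBC @Transactional on its own does not cover the offsets.
- Kafka's guarantee covers Kafka-to-Kafka flows. A database write in the listener is a separate transaction, so make it idempotent (e.g., a unique constraint on the transaction ID).
- The broker's transaction state log must be available. On a single-broker dev setup, set transaction.state.log.replication.factor=1 and transaction.state.log.min.isr=1.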
What is the use of group.id in Kafka Consumer?
group.id is one of the most important properties in a Kafka consumer.
Purpose of group.id
It identifies a group of consumers that work together to consume messages from one or more topics. Kafka uses this group ID to:
- Distribute Partitions
Kafka divides a topic into multiple partitions.
All consumers with the same group.id form a Consumer Group.
Kafka automatically distributes the partitions among the consumers in the group (each partition is consumed by only one consumer in the group).
- Manage Offset Tracking
Kafka tracks the last consumed offset at the Consumer Group level.
This allows the group to resume from the correct position if a consumer restarts or fails.
- Enable Parallel Processing
If you have 10 partitions and 5 consumers in the same group → each consumer processes 2 partitions in parallel.
"Kafka Partitions are the fundamental unit of parallelism and scalability in Kafka."
How Partition Count Affects Consumer Group Scalability
- Each partition of a topic can be consumed by only one consumer in a consumer group at any given time.
- The maximum number of active consumers that can process messages in parallel = Number of partitions.
Example:
- If a topic has 12 partitions, you can have up to 12 consumers in the same consumer group actively consuming messages in parallel.
- Each consumer gets assigned one or more partitions.
- This allows horizontal scaling of your consumer application.
Key Rule:
Partition count = Maximum achievable parallelism for that consumer group.
What Happens If Consumer Count Exceeds Partition Count?
If you run more consumers than the number of partitions (e.g., 15 consumers for a topic with 12 partitions):
- Only 12 consumers will be active (each consuming one partition).
- The extra 3 consumers will remain idle — they will not receive any partitions.
- Kafka’s consumer group coordinator will still perform rebalancing, but no additional throughput is gained.
- You are essentially wasting resources (idle instances consuming CPU/memory).
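The partition-to-consumer arithmetic can be sketched with a simple round-robin assignment. This is only an illustration: Kafka's real assignors (range, round-robin, sticky, cooperative sticky) differ in detail, and the class and method names here are made up.

```java
import java.util.ArrayList;
import java.util.List;

// Round-robin model of distributing partitions over the consumers of one group.
class PartitionAssignmentSketch {

    // Element i of the result holds the partition numbers assigned to consumer i.
    static List<List<Integer>> assign(int partitions, int consumers) {
        List<List<Integer>> assignment = new ArrayList<>();
        for (int c = 0; c < consumers; c++) {
            assignment.add(new ArrayList<>());
        }
        for (int p = 0; p < partitions; p++) {
            assignment.get(p % consumers).add(p); // each partition has exactly one owner
        }
        return assignment;
    }

    // Number of consumers that ended up with no partition at all.
    static long idleConsumers(int partitions, int consumers) {
        return assign(partitions, consumers).stream().filter(List::isEmpty).count();
    }

    public static void main(String[] args) {
        System.out.println(assign(10, 5).get(0));   // prints [0, 5]: two partitions per consumer
        System.out.println(idleConsumers(12, 15));  // prints 3: extra consumers stay idle
        System.out.println(idleConsumers(12, 12));  // prints 0: maximum useful parallelism
    }
}
```

Idle consumers are not entirely useless: they act as hot standbys that pick up partitions after a rebalance if an active consumer fails.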
Best SQL Query (Robust & Modern)
SELECT DISTINCT salary
FROM (
SELECT salary,
DENSE_RANK() OVER (ORDER BY salary DESC) as salary_rank
FROM Employee
) ranked
WHERE salary_rank = 3;
Spring Data JPA Equivalent
Best Approach: Using @Query (Native)
public interface EmployeeRepository extends JpaRepository<Employee, Long> {
@Query(value = """
SELECT DISTINCT salary
FROM (
SELECT salary,
DENSE_RANK() OVER (ORDER BY salary DESC) as salary_rank
FROM employee
) ranked
WHERE salary_rank = 3
""", nativeQuery = true)
Optional<BigDecimal> findThirdHighestSalary();
}
| Feature | RANK() | DENSE_RANK() |
|---|---|---|
| Duplicate Handling | Leaves gaps in ranking | No gaps — continuous ranking |
| Behavior on Ties | Same value gets same rank, next rank skips | Same value gets same rank, next rank is immediate |
| Use Case | When you want to show "gaps" (traditional ranking) | When you want dense/continuous ranking |
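The difference in the table can be made concrete by computing both rankings by hand. The sketch below mimics RANK() and DENSE_RANK() over a descending sort in plain Java; it is an illustration of the semantics, not how a database implements them, and the names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Computes {rank, denseRank} pairs for salaries sorted in descending order.
class RankSketch {

    static List<int[]> ranks(List<Integer> salaries) {
        List<Integer> sorted = new ArrayList<>(salaries);
        sorted.sort(Comparator.reverseOrder());
        List<int[]> result = new ArrayList<>();
        int rank = 0;
        int dense = 0;
        for (int i = 0; i < sorted.size(); i++) {
            boolean tie = i > 0 && sorted.get(i).equals(sorted.get(i - 1));
            if (!tie) {
                rank = i + 1; // RANK(): position in the sorted list, so ties leave gaps
                dense++;      // DENSE_RANK(): next consecutive number, no gaps
            }
            result.add(new int[] { rank, dense });
        }
        return result;
    }

    public static void main(String[] args) {
        // Salaries 300, 200, 200, 100 give RANK 1, 2, 2, 4 and DENSE_RANK 1, 2, 2, 3.
        for (int[] r : ranks(List.of(200, 300, 100, 200))) {
            System.out.println("rank=" + r[0] + " dense_rank=" + r[1]);
        }
    }
}
```

With RANK(), "WHERE salary_rank = 3" would match nothing in this data set, which is why DENSE_RANK() is the safer choice for "nth highest" queries.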
Best Solution (Using Window Function)
-- Delete older duplicate records, keep the latest one based on created_at
DELETE FROM employee e
WHERE e.id IN (
SELECT id
FROM (
SELECT id,
ROW_NUMBER() OVER (PARTITION BY email
ORDER BY created_at DESC, id DESC) as row_num
FROM employee
) t
WHERE t.row_num > 1
);
Explanation (Step by Step)
- Inner Subquery:
PARTITION BY email → Groups rows by email (duplicates).
ORDER BY created_at DESC, id DESC → Ranks latest record as 1.
ROW_NUMBER() → Assigns unique rank within each partition.
- Outer Query:
Deletes all rows where row_num > 1 (i.e., older duplicates).
Production Best Practice
Before running the delete query, it's good to first identify the duplicates:
SELECT * FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY email ORDER BY created_at DESC, id DESC) as rn -- same ordering as the DELETE, so the preview matches
FROM employee
) t
WHERE rn > 1;
SELECT
e.employee_id,
e.name,
e.department_id,
d.department_name,
dept.employee_count
FROM employee e
JOIN department d ON e.department_id = d.department_id
JOIN (
SELECT
department_id,
COUNT(*) AS employee_count
FROM employee
GROUP BY department_id
HAVING COUNT(*) > 20
) dept ON e.department_id = dept.department_id
ORDER BY e.department_id, e.name;
Complex Joins & Aggregations: Given an Orders table and a Customers table, find the names of customers who have placed more than 5 orders in the last 30 days along with their total spend.
Best SQL Query
SELECT
c.name,
COUNT(o.order_id) AS order_count,
SUM(o.amount) AS total_spend
FROM customers c
JOIN orders o ON c.customer_id = o.customer_id
WHERE o.order_date >= DATE_SUB(CURDATE(), INTERVAL 30 DAY)
GROUP BY c.customer_id, c.name
HAVING COUNT(o.order_id) > 5
ORDER BY total_spend DESC;
Explanation (Step-by-Step)
- JOIN between customers and orders on customer_id.
- Filter orders placed in the last 30 days using DATE_SUB(CURDATE(), INTERVAL 30 DAY).
- GROUP BY customer to aggregate data.
- HAVING clause filters customers who have placed more than 5 orders.
- Show name, order_count, and total_spend (sum of amount).
Advanced Querying: Write an optimal SQL query to find the nth highest salary or the latest updated record per employee partition without using built-in rank functions.
Write an optimal SQL query to find the nth highest salary
Note : They mentioned to not use built-in rank functions. It means we should not use DENSE_RANK(), RANK() or ROW_NUMBER()
-- Nth Highest Salary (without RANK functions)
SELECT
e1.employee_id,
e1.name,
e1.salary
FROM employees e1
WHERE (
SELECT COUNT(DISTINCT e2.salary)
FROM employees e2
WHERE e2.salary > e1.salary -- compare against all employees, not the same employee_id
) = 1; -- n-1 distinct higher salaries: 1 gives the 2nd highest, use 2 for the 3rd highest
latest updated record per employee partition without using built-in rank functions
-- Latest Record per Employee (without RANK)
SELECT e.*
FROM employees e
WHERE e.updated_at = (
SELECT MAX(e2.updated_at)
FROM employees e2
WHERE e2.employee_id = e.employee_id
);
Handling Traffic Spikes: If your Spring Boot service experiences a sudden spike from 1,500 concurrent users to 15,000, what immediate infrastructure and application-level adjustments will you implement?
=> I would act immediately on two fronts — Application Level and Infrastructure Level — while keeping the system stable.
Immediate Actions (Prioritized)
Application Level (First 5–10 minutes)
- Thread Pool Tuning: Increase maxPoolSize and queueCapacity in my custom ThreadPoolTaskExecutor. If on Java 21+ with Spring Boot 3.2+, consider enabling Virtual Threads (spring.threads.virtual.enabled=true, or Executors.newVirtualThreadPerTaskExecutor() for custom executors). This can help I/O-heavy services, but it does not add database connections or downstream capacity.
- Circuit Breaker + Rate Limiting: Activate Resilience4j Circuit Breaker and Rate Limiter on external calls and non-critical endpoints to prevent cascading failures.
- Connection Pool: Increase HikariCP maximum-pool-size (e.g., from 20 → 50) only if the database's connection limit allows it across all replicas, and monitor for starvation.
- Caching: Temporarily increase cache TTL for read-heavy endpoints (Redis/Caffeine) to reduce database load.
- Async Offloading: Ensure non-critical operations (notifications, analytics, logging) are fully offloaded using @Async.
Infrastructure Level (Parallel)
- Horizontal Scaling: Immediately increase replicas (Kubernetes) or instances (EC2). Trigger HPA (Horizontal Pod Autoscaler) with aggressive thresholds.
- Load Balancer: Check and scale the Load Balancer / Ingress if needed. Enable connection draining.
- Database: Scale read replicas if using RDS/Aurora. Kill slow queries if any.
- Monitoring & Alerting: Closely watch Prometheus metrics (CPU, memory, thread pool usage, connection pool, latency, error rate).
Final thought :
Prevention is better — I always keep auto-scaling rules, proper thread/connection pool configuration, and Circuit Breakers ready. During a spike, the priority is quick horizontal scaling + protecting downstream systems while buying time to optimize the application.
Observability & Diagnostics: A production API is microservices-based, and a specific user request is failing intermittently. How do you trace this bug end-to-end? (Expect questions on Distributed Tracing via OpenTelemetry, Zipkin, or ELK stacks).
"When a user request is failing intermittently in a microservices architecture, I follow the Three Pillars of Observability (Logs, Metrics, and Traces), with a strong focus on Distributed Tracing."
Step-by-Step End-to-End Tracing Approach
Reproduce & Capture Context
- Ask for the failing traceId / correlationId (or userId + timestamp).
- If not available, enable request correlation immediately.
Distributed Tracing (Most Important)
- I use OpenTelemetry (standard now) for instrumentation.
- All services are instrumented with OpenTelemetry Java agent or SDK.
- Traces are sent to Zipkin or Jaeger (or Tempo + Grafana).
Flow:
- A unique traceId is generated at the entry point (API Gateway or first service).
- Every downstream call (via RestTemplate, WebClient, Feign, Kafka) propagates the trace context (using traceparent header).
- I can search the trace in Zipkin by traceId and see the full call chain with latency breakdown per service.
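The propagation step in the flow above can be sketched in plain Java. This is a minimal illustration of the W3C traceparent header layout (version, 32-hex trace ID, 16-hex parent span ID, flags), not what the OpenTelemetry agent does internally; the TraceContext class and its method names are assumptions made up for this example.

```java
import java.security.SecureRandom;

// Illustrative sketch only: TraceContext is a hypothetical helper, not an OpenTelemetry class.
final class TraceContext {
    private static final SecureRandom RANDOM = new SecureRandom();

    final String traceId; // 32 hex chars, shared by every span of one request
    final String spanId;  // 16 hex chars, unique per hop

    TraceContext(String traceId, String spanId) {
        this.traceId = traceId;
        this.spanId = spanId;
    }

    // Entry point (gateway or first service): no incoming header, so start a new trace.
    static TraceContext newRoot() {
        return new TraceContext(randomHex(16), randomHex(8));
    }

    // Before each downstream call: keep the traceId, generate a fresh spanId.
    TraceContext newChild() {
        return new TraceContext(traceId, randomHex(8));
    }

    // Header value layout: version-traceId-parentSpanId-flags ("01" means sampled).
    String toTraceparent() {
        return "00-" + traceId + "-" + spanId + "-01";
    }

    // Receiving side: recover the context from the incoming header.
    static TraceContext fromTraceparent(String header) {
        String[] parts = header.split("-");
        if (parts.length != 4 || parts[1].length() != 32 || parts[2].length() != 16) {
            throw new IllegalArgumentException("Malformed traceparent: " + header);
        }
        return new TraceContext(parts[1], parts[2]);
    }

    private static String randomHex(int bytes) {
        byte[] buf = new byte[bytes];
        RANDOM.nextBytes(buf);
        StringBuilder sb = new StringBuilder(bytes * 2);
        for (byte b : buf) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Because every hop keeps the same traceId, searching by that one value in the tracing backend or in the logs returns the whole call chain.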
Correlated Logging (ELK Stack)
- All services log with the same traceId and spanId.
- Using ELK (Elasticsearch + Logstash + Kibana) or Loki + Grafana.
- I can search all logs across services for a specific traceId to see errors, exceptions, and timing.
Metrics & Monitoring
- Prometheus + Grafana for service-level metrics (error rate, latency p99, throughput).
- Look for spikes in error rate or latency in the failing service.
Root Cause Analysis
- Check trace for slow downstream calls, database queries, or external API timeouts.
- Check for circuit breaker trips, retries exhaustion, or optimistic lock failures.
- Analyze thread dumps / heap dumps if needed.
Real Project Experience
In my Payment Wallet System:
- We faced intermittent failures in the payment flow.
- Using OpenTelemetry + Zipkin, I could trace a request and found that one downstream bank gateway was occasionally slow (high latency in one span).
- Combined with ELK logs using the same traceId, I identified it was due to occasional network blips.
- We added better retry + circuit breaker + fallback, which resolved the issue.
In TCS Verizon Project:
- We used correlation IDs in logs and basic distributed tracing to debug issues in the batch job when one service was slow.
Fork-Join Framework: How does the Fork-Join framework work, and when would you prefer it over a normal ThreadPoolExecutor?
"The Fork-Join Framework is a specialized thread pool introduced in Java 7 designed for divide-and-conquer type of problems. It is highly efficient for parallel processing of large tasks that can be broken down recursively."
How Fork-Join Framework Works
It is based on the work-stealing algorithm and uses a special pool called ForkJoinPool.
Core Idea:
- A large task is forked (split) into smaller subtasks.
- These subtasks are executed in parallel.
- When the subtasks finish, their results are joined to produce the final result.
Internal Mechanics:
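- ForkJoinPool creates a number of worker threads (new ForkJoinPool() defaults to the number of available processors; the shared ForkJoinPool.commonPool() defaults to one fewer).
- Each worker thread has its own double-ended queue (Deque) of tasks. The owner pushes and pops at one end (LIFO), which keeps recently forked, cache-warm subtasks local.
- When a thread runs out of work, it steals a task from the opposite end of another thread's queue (this is called work-stealing), so it tends to take the oldest and largest pending subtask.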
- This minimizes idle time and maximizes CPU utilization on multi-core systems.
Main Classes:
- RecursiveTask<V> → For tasks that return a value.
- RecursiveAction → For tasks that don’t return a value.
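A small RecursiveTask sketch may make the fork/join cycle concrete. It sums an array by splitting it in half until a chunk is small enough to add up directly. The class name RangeSumTask and the threshold of 10,000 elements are illustrative assumptions, not tuned values.

```java
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Illustrative divide-and-conquer sum; the names and the threshold are assumptions.
class RangeSumTask extends RecursiveTask<Long> {
    private static final int THRESHOLD = 10_000; // below this size, sum sequentially

    private final long[] data;
    private final int from; // inclusive
    private final int to;   // exclusive

    RangeSumTask(long[] data, int from, int to) {
        this.data = data;
        this.from = from;
        this.to = to;
    }

    @Override
    protected Long compute() {
        if (to - from <= THRESHOLD) {
            long sum = 0;
            for (int i = from; i < to; i++) {
                sum += data[i];
            }
            return sum;
        }
        int mid = (from + to) >>> 1;
        RangeSumTask left = new RangeSumTask(data, from, mid);
        RangeSumTask right = new RangeSumTask(data, mid, to);
        left.fork();                     // queue the left half; an idle worker may steal it
        long rightSum = right.compute(); // work on the right half in the current thread
        return left.join() + rightSum;   // wait for the left half and combine
    }

    static long sum(long[] data) {
        return ForkJoinPool.commonPool().invoke(new RangeSumTask(data, 0, data.length));
    }
}
```

Forking one half and computing the other in the current thread avoids parking a worker that could be doing useful work.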
When I Prefer Fork-Join over Normal ThreadPoolExecutor
| Scenario | Prefer Normal ThreadPoolExecutor | Prefer Fork-Join |
|---|---|---|
| Task Type | Independent, I/O-bound tasks | Recursive, CPU-intensive, splittable tasks |
| Workload | General async work | Divide-and-conquer (e.g., parallel array processing) |
| Parallelism Style | Simple parallel execution | Recursive splitting + work-stealing |
I prefer Fork-Join when:
- The problem can be recursively divided into smaller similar subtasks.
- The workload is CPU-bound (not I/O-bound).
- I want maximum CPU utilization through work-stealing.
Real Project Example (TCS Verizon):
"In the Verizon Fiber Logic System project, the nightly batch job processed thousands of site audit records. Some parts involved complex route calculations that could be recursively split (dividing large lists of sites into smaller chunks). While we primarily used CompletableFuture with a custom thread pool, I experimented with Fork-Join for pure CPU-intensive computations within the batch. The work-stealing mechanism helped achieve better CPU utilization compared to a standard thread pool."
Simple Summary (Good Closing):
"I use a normal ThreadPoolExecutor for general asynchronous and I/O-bound tasks. I prefer Fork-Join when the problem naturally fits the divide-and-conquer pattern because of its efficient work-stealing algorithm and better performance on multi-core systems for recursive workloads."
Explain the trade-offs of using EntityGraph versus Join Fetches.
"Both @EntityGraph and JOIN FETCH are used to solve the N+1 Select Problem by eagerly loading associated entities. However, they have different strengths, trade-offs, and ideal use cases."
Comparison Table
| Aspect | @EntityGraph | JOIN FETCH | Winner |
|---|---|---|---|
| Readability | Excellent (clean & declarative) | Moderate (query becomes long) | EntityGraph |
| Reusability | High (can be reused across methods) | Low (defined per query) | EntityGraph |
| Flexibility | Very good (supports subgraphs for nested relations) | Good for simple cases | EntityGraph |
| Performance | Good | Slightly better in simple cases | Slight edge to JOIN FETCH |
| Multiple Collections | Same risk: a graph with several collections still generates joins and multiplies rows | Risk of Cartesian Product | Neither |
| Maintenance | Easy (centralized definition) | Harder (duplicated queries) | EntityGraph |
| Dynamic Fetching | Easy (attribute paths) | Requires changing JPQL | EntityGraph |
Key Trade-offs
EntityGraph Advantages:
- Much cleaner code.
- Reusable across multiple repository methods.
- Excellent support for nested relationships using subgraph.
- Easier to maintain in large codebases.
EntityGraph Disadvantages:
- Slight overhead in some edge cases.
- Less control over the exact generated SQL.
- Same Cartesian Product risk as JOIN FETCH when several collections are fetched in one graph.
JOIN FETCH Advantages:
- Very explicit and powerful for complex one-off queries.
- Sometimes produces more optimal SQL.
JOIN FETCH Disadvantages:
- Leads to code duplication.
- High risk of Cartesian Product when fetching multiple collections.
My Practical Decision Rule
- Use @EntityGraph → Default choice for most cases (especially in Service/Repository layers).
- Use JOIN FETCH → Only when I need very specific control or when EntityGraph doesn't give the desired SQL.
Real Project Examples
Payment Wallet System:
- I heavily used @EntityGraph for fetching User → Wallet → Transactions and Payment → PaymentItems.
- Example (Java):
@EntityGraph(attributePaths = {"wallet", "wallet.transactions"})
Optional<User> findWithWalletById(Long id);
TCS Verizon Project:
- For complex batch reporting queries involving multiple collections, I sometimes used JOIN FETCH when EntityGraph didn't produce the optimal query plan.
Final Recommendation (Senior Touch):
"I prefer @EntityGraph in most production scenarios because it keeps the code clean and maintainable. I fall back to JOIN FETCH only when I need fine-grained control over the generated SQL or face performance issues with EntityGraph."
Removing Embedded Dependencies: How do you remove the default Tomcat server dependency from spring-boot-starter-web and switch it to Undertow or Jetty for high-performance requirements?
"By default, spring-boot-starter-web includes Tomcat as the embedded servlet container. To switch to Undertow (my preferred choice for high-performance) or Jetty, we need to explicitly exclude Tomcat and add the desired server."
1. Switching to Undertow (Best for High Throughput)
Maven (pom.xml)
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-web</artifactId>
    <exclusions>
        <exclusion>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-tomcat</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<!-- Add Undertow -->
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-undertow</artifactId>
</dependency>
Gradle (build.gradle)
implementation('org.springframework.boot:spring-boot-starter-web') {
    exclude group: 'org.springframework.boot', module: 'spring-boot-starter-tomcat'
}
implementation 'org.springframework.boot:spring-boot-starter-undertow'
2. Switching to Jetty
Same pattern: keep the Tomcat exclusion on spring-boot-starter-web and add Jetty instead of Undertow:
<dependency>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-jetty</artifactId>
</dependency>
Verification
After the change, check the startup logs. You should see a line like this (the exact wording varies by Spring Boot version):
Undertow started on port(s): 8080 (http)
Instead of:
Tomcat started on port(s): 8080 (http)
Real Project Experience
"In my Payment Wallet System, which needed to handle high concurrent payment requests, I switched from Tomcat to Undertow. Undertow gave better throughput and lower memory footprint under load. The change was straightforward using the exclusion method, and it improved our p99 latency significantly during peak hours."
Additional Tip (Senior Touch):
"Undertow is generally my first choice for high-performance microservices because it is non-blocking and has excellent memory efficiency. Jetty is also good when I need strong Servlet 4.0 / async support."
High-Throughput IoT Ingestion: "VVDN handles millions of hardware events. How would you design a Java-based backend ingestion layer to process 50k events per second without dropping packets?"
To handle 50,000 events per second reliably without dropping packets, I would design a highly decoupled, non-blocking, and horizontally scalable ingestion layer using the following architecture:
High-Level Design
IoT Devices → Load Balancer → Kafka (High Partition Topic) → Ingestion Consumers → Processing Layer
Key Design Choices for 50k+ EPS (Events Per Second)
1. Entry Point – Non-Blocking Ingestion
=> Use Netty or Spring WebFlux (Reactor Netty) as the ingestion endpoint (MQTT or HTTP).
=> Accept the event and immediately push it to Kafka — do not do any heavy processing in the HTTP/MQTT handler.
=> This ensures the ingestion layer can accept events at very high speed.
2. Kafka as the Backbone
=> Create a topic with high number of partitions (e.g., 64 to 128 partitions).
=> Use high replication factor (3) and acks=all + enable.idempotence=true on producers.
=> Kafka acts as a durable buffer: if consumers fall behind, events wait in the log (within the topic's retention limits) instead of being dropped.
3. Consumer Layer (High Throughput Processing)
=> Use Kafka Consumer Groups with multiple consumers.
=> Run consumers with Virtual Threads (Java 21+) or high-configured ThreadPoolTaskExecutor.
=> Set concurrency in @KafkaListener so that the total number of consumer threads across all instances does not exceed the partition count; any extra threads sit idle.
4. Core Configuration for High Throughput
# application.yml
spring:
  kafka:
    consumer:
      group-id: iot-ingestion-group
      auto-offset-reset: earliest
      enable-auto-commit: false
      max-poll-records: 5000 # Important for high throughput
      isolation-level: read_committed
Consumer Code:
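@KafkaListener(topics = "iot-events",
        groupId = "iot-ingestion-group",
        concurrency = "64") // Per instance; keep the total across instances <= partition count
public void handleEvent(IoTEvent event) {
    // Minimal processing here, push to another topic or DB asynchronously.
    // Caution: with an async hand-off the offset can be committed before processEvent
    // finishes, so a crash may lose in-flight events. For no-loss delivery, process
    // synchronously or acknowledge manually only after the work completes.
    CompletableFuture.runAsync(() -> processEvent(event), virtualExecutor);
}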
Critical Techniques to Avoid Packet Loss
=> Use Virtual Threads (Java 21+) for massive concurrency with low memory.
=> Implement proper backpressure and bounded queues.
=> Use Dead Letter Queue (DLQ) for failed events.
=> Monitor consumer lag aggressively.
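The backpressure and bounded-queue point above can be sketched with a plain ArrayBlockingQueue. This is a minimal illustration, assuming a hypothetical BoundedEventBuffer wrapper: when the buffer stays full, the producer is told so explicitly and can slow down, reject the request, or route the event to a DLQ, rather than silently dropping it.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;

// Illustrative wrapper; the class name and the timeout policy are assumptions.
class BoundedEventBuffer<E> {
    private final BlockingQueue<E> queue;

    BoundedEventBuffer(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    // Returns false if the buffer stayed full for the whole timeout.
    // The caller must then apply backpressure (reject, retry later, or send to a DLQ).
    boolean offer(E event, long timeoutMillis) {
        try {
            return queue.offer(event, timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }

    // Returns the oldest buffered event, or null if the buffer is empty.
    E poll() {
        return queue.poll();
    }
}
```

An unbounded queue would hide the overload until the heap runs out; the fixed capacity turns overload into a visible, handleable signal.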
My Experience
While I worked on the Verizon Fiber Logic System at TCS, we handled large volumes of audit events using a similar pattern with Kafka and parallel processing. For true 50k+ EPS IoT scale, I would extend this with Netty-based ingestion, high-partition Kafka topics, and Virtual Threads for processing, so that events are buffered rather than dropped even during traffic spikes.
State Management: "If your application communicates via persistent device connections (like MQTT or WebSockets), how do you maintain state and distribute the connection load across instances?"
Key Techniques:
On connection establishment → Store device metadata + instance ID in Redis.
On message received → Check Redis for current state and process accordingly.
Use Redis Pub/Sub or Kafka to broadcast important events (device disconnected, status changed) to other instances.
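The connection registry described above can be sketched as follows. A ConcurrentHashMap stands in for Redis here so the example runs on its own; in a real deployment the map would live in Redis so every instance sees the same state. The class and method names are assumptions for illustration.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch: the in-memory map is a stand-in for a shared Redis hash.
class DeviceConnectionRegistry {
    // deviceId -> instanceId that currently holds the live connection
    private final Map<String, String> owners = new ConcurrentHashMap<>();

    // Called when a device connects (or reconnects) to an instance.
    void onConnect(String deviceId, String instanceId) {
        owners.put(deviceId, instanceId);
    }

    // Remove the entry only if this instance still owns it. A late disconnect event
    // from an old instance must not erase a newer connection made elsewhere.
    void onDisconnect(String deviceId, String instanceId) {
        owners.remove(deviceId, instanceId);
    }

    // Used to route a command to the instance that holds the device's connection.
    Optional<String> ownerOf(String deviceId) {
        return Optional.ofNullable(owners.get(deviceId));
    }
}
```

The conditional remove matters when a device reconnects through the load balancer to a different instance before the old connection's close event has been processed.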
Design Patterns in Production: "Explain where you have actively implemented structural or behavioral design patterns (e.g., Strategy or Factory patterns) to solve a complex business requirement in your past project."
"Yes, I have actively used several design patterns in production to solve real business problems. One of the clearest examples is from my Payment Wallet System."
Pattern Used: Strategy Pattern (Behavioral)
Business Problem: We had to support multiple payment gateways (Razorpay, PhonePe, Paytm, Stripe, etc.), each with different integration logic, request formats, response handling, and error codes. The business requirement was to make the system easily extensible whenever a new gateway is added, without changing core payment logic.
Solution: Strategy Pattern
I created a PaymentGatewayStrategy interface:
public interface PaymentGatewayStrategy {
    PaymentResponse process(PaymentRequest request);
    boolean supports(String gatewayCode);
    String getGatewayName();
}
Concrete Strategies:
- RazorpayGatewayStrategy
- PhonePeGatewayStrategy
- StripeGatewayStrategy
Context Class:
@Service
public class PaymentGatewayContext {
    // Spring injects every PaymentGatewayStrategy bean into this list
    private final List<PaymentGatewayStrategy> strategies;
    public PaymentGatewayContext(List<PaymentGatewayStrategy> strategyList) {
        this.strategies = List.copyOf(strategyList);
    }
    public PaymentResponse executePayment(PaymentRequest request) {
        // Select by gateway code through supports(), not by display name
        String code = request.getGatewayCode();
        return strategies.stream()
                .filter(s -> s.supports(code))
                .findFirst()
                .orElseThrow(() -> new UnsupportedGatewayException(code))
                .process(request);
    }
}
Benefits Achieved:
- Open-Closed Principle → Adding a new gateway only requires a new strategy class.
- Clean separation of concerns.
- Easy to unit test each gateway independently.
Another Example: Factory Pattern (in TCS Verizon Project)
In the Verizon project, we had complex fiber route calculation logic with different algorithms based on site type (Urban, Rural, Metro). I used Factory Pattern to return the appropriate RouteCalculator strategy based on site category.
Closing Line (Senior Touch):
"I strongly believe design patterns should be used to solve real business complexity, not for the sake of using them. In both my projects, Strategy and Factory patterns helped me make the system more maintainable, testable, and extensible."
Mutex vs. Semaphore: "Explain the practical difference between a Mutex and a Semaphore. In what scenario would you use a CountDownLatch or a CyclicBarrier over standard locks?"
=> Mutex and Semaphore are both used for synchronization, but they solve different problems.
| Feature | Mutex | Semaphore |
|---|---|---|
| Purpose | Ensures only one thread can access a resource at a time | Controls access to a limited number of resources |
| Permits / Count | Always 1 | Can be N (counting semaphore) |
| Ownership | Must be released by the same thread that acquired it | Can be released by any thread |
| Use Case | Protecting critical sections (e.g., updating balance) | Limiting concurrent access (e.g., DB connection pool, rate limiting) |
=> Mutex = One key to a single room. Only one person can enter, and only that person can unlock it.
=> Semaphore = Multiple keys to a limited number of rooms. Up to N people can enter at the same time.
When to Use CountDownLatch or CyclicBarrier over Standard Locks
CountDownLatch: Used when one or more threads need to wait for N other threads to finish their work. Not reusable: once the count reaches zero, it stays open.
Example (Payment Wallet): Before processing a high-value payment, I need to wait for 3 independent checks (Fraud, Balance, KYC) to complete.
CountDownLatch latch = new CountDownLatch(3);
// Start 3 threads for checks
latch.await(); // Main thread waits
CyclicBarrier: Similar to CountDownLatch, but reusable. All threads wait for each other to reach a common barrier point, then all proceed together.
Example: In batch processing where multiple threads process different chunks of data and need to synchronize at checkpoints (e.g., after every 10,000 records).
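The three-check CountDownLatch example above can be filled out into a runnable sketch. The check names and the performCheck placeholder are assumptions; real checks would call the fraud, balance, and KYC services.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative sketch; PrePaymentChecks and performCheck are hypothetical names.
class PrePaymentChecks {

    // Returns true only if all three checks finished in time and all of them passed.
    static boolean runChecks(long timeoutMillis) {
        String[] checks = {"fraud", "balance", "kyc"};
        CountDownLatch latch = new CountDownLatch(checks.length);
        AtomicInteger passed = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(checks.length);
        try {
            for (String check : checks) {
                pool.submit(() -> {
                    try {
                        if (performCheck(check)) {
                            passed.incrementAndGet();
                        }
                    } finally {
                        latch.countDown(); // always count down, even if the check throws
                    }
                });
            }
            // The calling thread waits here until the count reaches zero or the timeout expires.
            boolean finished = latch.await(timeoutMillis, TimeUnit.MILLISECONDS);
            return finished && passed.get() == checks.length;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            pool.shutdownNow();
        }
    }

    // Placeholder for a real remote check.
    static boolean performCheck(String name) {
        return true;
    }
}
```

Counting down in a finally block is the important detail: a check that throws would otherwise leave the waiting thread blocked until the timeout.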
Question: Write a thread-safe, bounded Custom In-Memory Cache implementation from scratch using ConcurrentHashMap and explicit locks, with LRU eviction.
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;
// Note: this version guards a LinkedHashMap with a ReentrantLock instead of using
// ConcurrentHashMap, because access-order tracking mutates the map even on get().
public class LRUCache<K, V> {
    private final int capacity;
    private final Map<K, V> cache;
    private final ReentrantLock lock = new ReentrantLock();
    public LRUCache(int capacity) {
        this.capacity = capacity;
        // LinkedHashMap with accessOrder = true makes it LRU
        this.cache = new LinkedHashMap<K, V>(capacity, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > LRUCache.this.capacity; // Auto-evict LRU
            }
        };
    }
    public V get(K key) {
        lock.lock();
        try {
            return cache.get(key);
        } finally {
            lock.unlock();
        }
    }
    public void put(K key, V value) {
        lock.lock();
        try {
            cache.put(key, value);
        } finally {
            lock.unlock();
        }
    }
    public int size() {
        lock.lock(); // LinkedHashMap is not thread-safe, so even size() takes the lock
        try {
            return cache.size();
        } finally {
            lock.unlock();
        }
    }
}
Usage Example
LRUCache<String, Integer> cache = new LRUCache<>(3); // Capacity of 3 entries
cache.put("A", 1);
cache.put("B", 2);
cache.put("C", 3);
cache.get("A"); // A becomes Most Recently Used
cache.put("D", 4); // This will evict B (LRU)
System.out.println(cache.size()); // Output: 3
"This is a classic distributed transaction problem. When we move to database-per-service, we cannot use traditional ACID transactions across services. The standard solution is the Saga Pattern."
Saga Pattern Overview
A Saga is a sequence of local transactions. If one step fails, we execute compensating transactions (rollback actions) for the previous successful steps.
There are two types:
- Orchestration Saga (Centralized) — I used this in my project.
- Choreography Saga (Event-Driven) — More decoupled.
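The core rule of an orchestrated Saga (run steps in order and, on failure, undo the completed steps in reverse) can be sketched generically. This is a simplified in-process illustration with hypothetical class names; a production orchestrator would also persist its progress so it can resume after a crash.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative sketch; SagaStep and SagaOrchestrator are hypothetical names.
class SagaStep {
    final String name;
    final Runnable action;       // the local transaction
    final Runnable compensation; // undoes the action if a later step fails

    SagaStep(String name, Runnable action, Runnable compensation) {
        this.name = name;
        this.action = action;
        this.compensation = compensation;
    }
}

class SagaOrchestrator {
    private final List<SagaStep> steps = new ArrayList<>();

    SagaOrchestrator addStep(String name, Runnable action, Runnable compensation) {
        steps.add(new SagaStep(name, action, compensation));
        return this;
    }

    // Returns true if every step succeeded. On the first failure, compensates the
    // steps that completed (most recent first) and returns false.
    boolean execute() {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.action.run();
                completed.push(step);
            } catch (RuntimeException ex) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation.run();
                }
                return false;
            }
        }
        return true;
    }
}
```

The failed step itself is not compensated, since it never completed; only the steps recorded on the stack are undone.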
How I Implemented It in My Payment Wallet System
For the P2P Money Transfer flow (Wallet Debit → Payment Processing → Notification):
Orchestrator Service (transaction-service) coordinates the Saga:
// No @Transactional here: a local database transaction cannot span remote service calls.
public void transferMoney(TransferRequest req) {
    boolean walletDebited = false;
    try {
        // Step 1: Debit Wallet
        walletClient.debit(req);
        walletDebited = true;
        // Step 2: Process Payment
        paymentGatewayClient.process(req);
        // Step 3: Send Notification
        notificationClient.send(req);
    } catch (Exception ex) {
        // Compensation Logic: undo only the steps that actually completed
        if (walletDebited) {
            compensateTransfer(req); // Refund, revert changes
        }
        throw ex;
    }
}
private void compensateTransfer(TransferRequest req) {
    walletClient.creditBack(req); // Compensating transaction
    // Other rollback steps...
}
Key Design Decisions
- Idempotency Keys — Every request has a unique key to prevent duplicate processing during retries.
- Compensation Logic — Each service exposes a compensating endpoint (e.g., creditBack).
- Resilience4j — Used Circuit Breaker + Retry on downstream calls.
- Distributed Locking — Used Redis to lock wallet during transfer to prevent race conditions.
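The idempotency-key decision above can be illustrated with a small sketch. A ConcurrentHashMap stands in for the shared store (Redis or a database table in practice), and the class name is an assumption. A retried request with the same key gets the stored result back instead of running the operation again.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Illustrative sketch; the in-memory map is a stand-in for a shared, durable store.
class IdempotentExecutor<R> {
    private final Map<String, R> results = new ConcurrentHashMap<>();

    // Runs the operation at most once per key; later calls with the same key
    // return the first result. The operation must return a non-null value,
    // because computeIfAbsent does not record null results.
    R execute(String idempotencyKey, Supplier<R> operation) {
        return results.computeIfAbsent(idempotencyKey, key -> operation.get());
    }
}
```

For example, two calls with key "txn-42" debit the wallet once, and the second call simply receives the first call's response.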
Advantages I Got
- Maintained data consistency across services without distributed transactions.
- Easy to debug and monitor the full flow in one orchestrator.
- Graceful handling of partial failures.
In my TCS Verizon project, we faced similar challenges with multi-step batch processing. The Saga pattern helped me think about compensation logic even though we were not fully microservices at that time.
Closing Line:
"Saga Pattern with proper compensation and idempotency is the standard way to handle distributed transactions in microservices. I prefer Orchestration Saga for complex flows like payments because it gives better control and visibility."
CQRS & Data Syncing: "How do you implement the Command Query Responsibility Segregation (CQRS) pattern to separate high-frequency writes from heavy reads, and how do you handle asynchronous data sync between them using Kafka?"
"In my Payment Wallet System, I used CQRS to separate the write model (for money transfers) from the read model (for balance checks and transaction history) because the read load was much higher than writes."
CQRS Design in My Project
1. Write Side (Command Model)
All writes go through this model.
Strong consistency using @Transactional + Optimistic Locking (@Version).
Example: wallet-service and transaction-service handle all write operations.
2. Read Side (Query Model)
Optimized for high read throughput.
3. Asynchronous Data Sync using Kafka
Whenever a write happens, the write service publishes an event to Kafka. A separate consumer updates the read model.
Example from wallet-service (Write Side):
@Transactional
public void debit(DebitRequest req) {
    // findById returns an Optional; fail fast if the wallet does not exist
    Wallet wallet = walletRepository.findById(req.getWalletId()).orElseThrow();
    wallet.debit(req.getAmount());
    walletRepository.save(wallet);
    // Publish event for read model sync.
    // Caveat: this send is not part of the DB transaction, so a rollback after it can leave
    // the read model ahead of the write model; a transactional outbox is the usual remedy.
    kafkaTemplate.send("wallet-events",
            new WalletUpdatedEvent(wallet.getId(), wallet.getBalance(), wallet.getLastUpdated()));
}
Read Model Updater (Consumer - Separate Component):
@KafkaListener(topics = "wallet-events")
public void updateReadModel(WalletUpdatedEvent event) {
    // Update Redis cache
    redisTemplate.opsForValue().set("wallet:" + event.getWalletId(), event.getBalance());
    // Update Elasticsearch for history
    elasticsearchService.updateWalletSummary(event);
}
Benefits I Got
- Write side optimized for consistency.
- Read side optimized for speed (no joins, pre-aggregated data).
- Eventual consistency with very low lag.
- Independent scaling — I could scale read replicas separately from write instances.
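One detail that eventual consistency depends on is that the read side tolerates duplicate and out-of-order events. A minimal sketch, assuming each event carries a monotonically increasing version per wallet (the version field, the minor-unit balance, and the class names are assumptions, and a ConcurrentHashMap stands in for Redis):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Illustrative sketch of an idempotent read-model projection.
final class WalletSnapshot {
    final long version;      // assumed to increase with every write to the wallet
    final long balanceMinor; // balance in minor units (for example paise or cents)

    WalletSnapshot(long version, long balanceMinor) {
        this.version = version;
        this.balanceMinor = balanceMinor;
    }
}

class WalletReadModel {
    private final Map<Long, WalletSnapshot> byWalletId = new ConcurrentHashMap<>();

    // Applies an event only if it is newer than the snapshot already stored,
    // so replays and late arrivals cannot overwrite fresher data.
    void apply(long walletId, long version, long balanceMinor) {
        byWalletId.merge(walletId, new WalletSnapshot(version, balanceMinor),
                (current, incoming) -> incoming.version > current.version ? incoming : current);
    }

    // Returns the last known balance, or null if no event has been seen for this wallet.
    Long balanceOf(long walletId) {
        WalletSnapshot snapshot = byWalletId.get(walletId);
        return snapshot == null ? null : snapshot.balanceMinor;
    }
}
```

Keying the Kafka messages by wallet ID keeps events for one wallet in a single partition, which makes reordering rare, but the version check still protects against redelivery.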
In TCS Verizon project, I used a similar pattern for site data: write to Oracle and sync read model to Elasticsearch for fast reporting.
Strong Closing Line:
"CQRS with Kafka-based async syncing allows me to optimize read and write sides independently while maintaining eventual consistency. This pattern is extremely useful in high-traffic systems like payments or IoT event processing."
Do We Need Reentrant Locks in CQRS?
Short Answer: No, not usually in the read model.
Why?
- Write Model: Uses database-level locking (@Version + transactions) — no need for ReentrantLock in most cases.
- Read Model: Since it's denormalized and eventually consistent, we usually don't need locks. Redis and Elasticsearch are thread-safe for their operations.
- The only place I might use ReentrantLock is in very specific in-memory caches if I'm maintaining a local cache in the service (rare in CQRS).
In my Payment Wallet project:
- No ReentrantLock was needed because we used Redis for fast reads and Kafka for syncing.
- The separation itself removes the need for complex locking across services.
What is Elasticsearch? Why can we not use Redis for the same purpose?
Elasticsearch is a distributed search and analytics engine. It is designed for fast full-text search, complex queries, and real-time analytics on large amounts of data.
It is commonly used for:
- Search functionality (like searching transactions, products, users)
- Log analysis (ELK Stack)
- Analytics and reporting
Why can we not use Redis for the same purpose?
Redis is excellent for caching and simple key-value lookups, but it is not designed for complex search and analytics like Elasticsearch.
Main Differences:
- Redis is best for fast, simple access (get by key, cache, sessions, leaderboards). Core Redis has no full-text search; search requires an add-on module such as RediSearch and is more limited.
- Elasticsearch is built for advanced search — full-text search, fuzzy search, filtering by multiple fields, aggregations, date range queries, etc.
In my Payment Wallet System:
- I use Redis for fast balance checks (simple key-value).
- I use Elasticsearch for searching transaction history with multiple filters (date range, amount, status, etc.).
Many systems use both: Redis for caching and Elasticsearch for search.
Monolith to Microservices Evolution: "As a senior engineer, what criteria or architectural guidelines do you use to draw clean domain boundaries and break a monolithic application down into microservices without causing tight coupling?"
"Breaking a monolith into microservices is not just about splitting code — it’s about identifying bounded contexts and ensuring loose coupling with high cohesion inside each service."
My Main Criteria for Defining Boundaries
I follow these guidelines:
- Business Domain & Bounded Context I use Domain-Driven Design (DDD) to identify clear business capabilities. Each microservice should represent a single bounded context with its own domain language.
- Single Responsibility A service should have one clear business responsibility. If a service does too many things, it becomes a distributed monolith.
- Data Ownership Each service should own its own database. I avoid shared databases as much as possible to prevent tight coupling.
- Change Frequency & Team Ownership Services that change together should stay together. I try to assign services to independent teams.
- Loose Coupling Services should communicate through events (Kafka) or well-defined APIs (REST/Feign) with clear contracts. I avoid direct database access between services.
Practical Example from My Payment Wallet System
When I designed the system, I broke it down based on these principles:
- user-service → User management and authentication
- wallet-service → Wallet balance and money management
- transaction-service → Money transfer and transaction history
- notification-service → Sending emails and SMS
Each service has its own database. They communicate through Kafka events and Feign Clients with proper resilience (Circuit Breaker + Retry). This kept the services loosely coupled while maintaining clear domain boundaries.
In the TCS Verizon project, although it was not fully microservices, I applied similar thinking by separating site audit logic from fiber route calculation logic.
How I Avoid Tight Coupling
- Use asynchronous events (Kafka) for inter-service communication whenever possible.
- Define clear API contracts and use API Gateway for routing.
- Implement Circuit Breaker and Fallbacks to prevent cascading failures.
- Avoid shared libraries for domain logic — each service should own its domain.
Final Thought:
"I always start with business capabilities and bounded contexts rather than technical layers. The goal is to create services that are independent, scalable, and owned by a single team."
Circuit Breaker Mechanics: "Service A calls Service B synchronously. If Service B encounters sudden latency spikes, how do you configure Resilience4j thresholds (Closed to Open states) to prevent thread starvation in Service A?"
"When Service A calls Service B synchronously, latency spikes in B can cause thread starvation in A because the calling threads get blocked waiting for responses. Resilience4j Circuit Breaker helps by failing fast when problems are detected."
How I Configure Resilience4j
I use a combination of Circuit Breaker + TimeLimiter + Retry with these thresholds:
resilience4j:
  circuitbreaker:
    instances:
      serviceB:
        failureRateThreshold: 50 # Open circuit at 50% failures
        slowCallRateThreshold: 60 # Also open circuit when 60% of calls are slow
        slowCallDurationThreshold: 2s # Calls taking >2s are slow
        slidingWindowSize: 10 # Monitor last 10 calls
        waitDurationInOpenState: 30s # How long circuit stays open
        permittedNumberOfCallsInHalfOpenState: 3 # Test calls when recovering
  timelimiter:
    instances:
      serviceB:
        timeoutDuration: 3s # Fail fast after 3 seconds
  retry:
    instances:
      serviceB:
        maxAttempts: 3
        waitDuration: 500ms
How This Prevents Thread Starvation
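- TimeLimiter ensures no caller waits more than 3 seconds. It only times out the returned future, so the HTTP client also needs connect/read timeouts to release the blocked worker thread itself.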
- Circuit Breaker opens quickly when latency or failures are detected.
- When circuit is Open, calls fail immediately with fallback (no waiting).
- This frees up threads in Service A quickly.
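To show what the Closed → Open → Half-Open transitions mean mechanically, here is a deliberately simplified count-based breaker in plain Java. It is not Resilience4j's implementation: it tracks failures only (no slow-call rate), allows a single trial call in Half-Open, and every name in it is an assumption. The clock is injected so the behaviour can be exercised without real waiting.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Simplified illustration of circuit breaker states; not the Resilience4j implementation.
class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int windowSize;              // number of recent calls to evaluate
    private final double failureRateThreshold; // for example 0.5 for 50%
    private final long openMillis;             // how long to stay OPEN
    private final LongSupplier clock;          // current time in milliseconds
    private final Deque<Boolean> window = new ArrayDeque<>(); // true = failed call
    private State state = State.CLOSED;
    private long openedAt;

    SimpleCircuitBreaker(int windowSize, double failureRateThreshold,
                         long openMillis, LongSupplier clock) {
        this.windowSize = windowSize;
        this.failureRateThreshold = failureRateThreshold;
        this.openMillis = openMillis;
        this.clock = clock;
    }

    synchronized State state() {
        if (state == State.OPEN && clock.getAsLong() - openedAt >= openMillis) {
            state = State.HALF_OPEN; // wait time elapsed: allow a trial call
        }
        return state;
    }

    <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state() == State.OPEN) {
            return fallback.get(); // fail fast: the remote call is not attempted
        }
        try {
            T result = action.get();
            record(false);
            return result;
        } catch (RuntimeException ex) {
            record(true);
            return fallback.get();
        }
    }

    private synchronized void record(boolean failed) {
        if (state == State.OPEN) {
            return; // result of a call that started before the breaker opened
        }
        if (state == State.HALF_OPEN) {
            window.clear();
            if (failed) { trip(); } else { state = State.CLOSED; }
            return;
        }
        window.addLast(failed);
        if (window.size() > windowSize) {
            window.removeFirst();
        }
        long failures = window.stream().filter(Boolean::booleanValue).count();
        if (window.size() == windowSize
                && (double) failures / windowSize >= failureRateThreshold) {
            trip();
        }
    }

    private void trip() {
        state = State.OPEN;
        openedAt = clock.getAsLong();
        window.clear();
    }
}
```

While the breaker is Open, the caller's thread returns immediately with the fallback, which is exactly what protects Service A's thread pool.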
Fallback Method Example:
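@CircuitBreaker(name = "serviceB", fallbackMethod = "fallbackForB")
@TimeLimiter(name = "serviceB")
public CompletableFuture<String> callServiceB(...) {
    // @TimeLimiter only applies to asynchronous return types such as CompletableFuture
    return CompletableFuture.supplyAsync(() -> restTemplate.getForObject(...));
}
// The fallback must return the same type as the protected method
public CompletableFuture<String> fallbackForB(Exception ex) {
    log.warn("Service B is slow or down. Returning fallback.");
    return CompletableFuture.completedFuture("Fallback response");
}
Real Project Experience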
In my Payment Wallet System, Service A (transaction-service) called Service B (wallet-service) synchronously. During peak load, wallet-service had latency spikes. I configured Resilience4j with the above settings. This prevented thread starvation in transaction-service and allowed us to return "PENDING" status to users while retrying in background.
Key Takeaway:
"The combination of TimeLimiter and Circuit Breaker is critical. TimeLimiter prevents long waits, and Circuit Breaker stops calling the slow service entirely after a threshold, protecting the calling service from thread exhaustion."
API Gateway Execution: "What cross-cutting concerns (like OAuth2/JWT token validation, dynamic routing, and token-bucket rate limiting) do you explicitly offload to an API Gateway instead of handling them inside individual microservices?"
"I always offload cross-cutting concerns to the API Gateway to keep microservices clean, focused, and loosely coupled. In my Payment Wallet System, the API Gateway (Spring Cloud Gateway) handled several important responsibilities."
Key Cross-Cutting Concerns I Offload to API Gateway
- OAuth2 / JWT Token Validation
  - Centralized authentication and authorization.
  - The gateway validates JWT signature, expiry, and issuer.
  - It extracts userId, roles, and permissions and forwards them as headers (X-User-Id, X-Roles) to downstream services.
  - Downstream services do not validate the token again and rely on the headers. This is only safe when services are unreachable except through the gateway and the gateway strips any client-supplied X-User-Id / X-Roles headers.
- Dynamic Routing
  - The gateway routes requests to the correct microservice using Eureka service discovery.
  - It supports path-based routing (e.g., /api/wallets/** → wallet-service).
- Rate Limiting (Token Bucket)
  - Implemented token-bucket rate limiting at the gateway level using Spring Cloud Gateway's RequestRateLimiter filter (Redis-backed). Resilience4j's RateLimiter is an alternative, but it refreshes permits in fixed cycles rather than as a token bucket.
  - This protects the entire system from abuse and sudden spikes without burdening individual services.
- Other Concerns
  - CORS configuration
  - Request logging and correlation ID propagation
  - Basic security headers and SSL termination
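The token-bucket idea behind gateway rate limiting can be sketched in a few lines. This is a single-process illustration with an injected nanosecond clock, not the gateway's actual filter; a real gateway keeps the bucket state in a shared store so that all gateway instances enforce one limit. The class name and constructor parameters are assumptions.

```java
import java.util.function.LongSupplier;

// Illustrative single-process token bucket; names and parameters are assumptions.
class TokenBucket {
    private final long capacity;          // maximum burst size
    private final double refillPerSecond; // steady-state request rate
    private final LongSupplier nanoClock; // for example System::nanoTime
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(long capacity, double refillPerSecond, LongSupplier nanoClock) {
        this.capacity = capacity;
        this.refillPerSecond = refillPerSecond;
        this.nanoClock = nanoClock;
        this.tokens = capacity; // start full so an initial burst is allowed
        this.lastRefillNanos = nanoClock.getAsLong();
    }

    // Returns true and consumes one token if the request is allowed.
    synchronized boolean tryAcquire() {
        long now = nanoClock.getAsLong();
        double refill = (now - lastRefillNanos) * refillPerSecond / 1_000_000_000.0;
        tokens = Math.min(capacity, tokens + refill); // never exceed the burst capacity
        lastRefillNanos = now;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false; // the gateway would answer HTTP 429 here
    }
}
```

Capacity controls how large a burst is tolerated, while the refill rate sets the sustained throughput per client.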
Benefits of Offloading to API Gateway
- Microservices remain clean and focused on business logic.
- Changes in authentication or rate limiting logic only need to be done in one place.
- Better security — sensitive validation happens before reaching internal services.
- Easier monitoring and observability (all requests pass through one point).
In my Payment Wallet System, the API Gateway was responsible for JWT validation, rate limiting, and routing. This made the individual services (wallet-service, transaction-service, etc.) much simpler and focused only on their domain logic.
Strong Closing Line:
"I treat the API Gateway as the front door of the system. Offloading cross-cutting concerns there keeps microservices lightweight, secure, and maintainable."
Concurrent Balance Updates (Race Conditions): "Two users execute an API call to deduct from the exact same wallet or account balance at the exact same millisecond. How do you prevent a race condition at the application and database layers?"
"This is a very common race condition in payment systems. I prevent it using a combination of application-level and database-level controls."
Application Layer Prevention
I use Redis Distributed Locking to ensure only one thread can modify a wallet at a time.
Example Code:
@Autowired
private RedissonClient redisson;

public void debit(Long walletId, BigDecimal amount) throws InterruptedException {
    RLock lock = redisson.getLock("wallet:" + walletId);
    // Wait up to 3s to acquire the lock; the lease expires automatically after 10s
    if (!lock.tryLock(3, 10, TimeUnit.SECONDS)) {
        throw new WalletLockedException("Wallet is being updated by another request");
    }
    try {
        // Critical section: findById returns Optional in Spring Data, so unwrap it
        Wallet wallet = walletRepository.findById(walletId).orElseThrow();
        if (wallet.getBalance().compareTo(amount) < 0) {
            throw new InsufficientBalanceException();
        }
        wallet.setBalance(wallet.getBalance().subtract(amount));
        walletRepository.save(wallet);
    } finally {
        // Guard the unlock in case the 10s lease already expired
        if (lock.isHeldByCurrentThread()) {
            lock.unlock();
        }
    }
}
Database Layer Prevention
I use Optimistic Locking with @Version column.
@Entity
public class Wallet {
    @Id
    private Long id;
    private BigDecimal balance;
    @Version
    private Long version; // Hibernate manages this
}
If two transactions try to update the same wallet, the second one will throw OptimisticLockException. I catch it and retry the operation.
Full Strategy I Used in My Payment Wallet System
- Redis Distributed Lock for coarse-grained protection.
- @Version Optimistic Locking for fine-grained database safety.
- Idempotency key to handle retries safely.
- Retry logic (Resilience4j) with exponential backoff.
This combination prevented race conditions even during high concurrent debit requests.
In the TCS Verizon project, I faced similar issues during concurrent port assignments and used optimistic locking with retry logic.
Strong Closing Line:
"I combine distributed locking at the application level with optimistic locking at the database level. This gives both safety and good performance under high concurrency."
JPA/Hibernate Locking Trade-offs: "What are the strict performance and blocking trade-offs of implementing Optimistic Locking (@Version column) versus Pessimistic Locking (SELECT ... FOR UPDATE) under heavy database write load?"
"Optimistic and Pessimistic Locking are two strategies to handle concurrent updates. I prefer Optimistic Locking in most high-write scenarios, but both have clear trade-offs."
Optimistic Locking (@Version)
- Uses a version column that Hibernate automatically manages and increments on every update.
- No database lock is held during the transaction.
- Conflict is detected only when the UPDATE is flushed, typically at commit time (OptimisticLockException).
Performance & Blocking Trade-offs:
- High Performance — No blocking during read or processing.
- Low Contention — Multiple transactions can read and prepare updates simultaneously.
- Retry Overhead — On conflict, you must retry the entire transaction.
- Best For — High read, moderate write load.
Pessimistic Locking (SELECT ... FOR UPDATE)
- Acquires a database row lock immediately when reading the row.
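- Other transactions trying to lock or write the same row are blocked until the lock is released. On MVCC databases such as MySQL InnoDB, PostgreSQL, and Oracle, plain non-locking reads still proceed.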
Performance & Blocking Trade-offs:
- Higher Blocking — Other transactions wait, which can cause thread starvation and deadlocks.
- Lower Throughput under heavy concurrent writes.
- Guaranteed Consistency — No conflicts at commit time.
- Best For — Low concurrency, high conflict probability scenarios.
My Practical Choice in Projects
In my Payment Wallet System (high concurrent writes on wallet balance):
- I used Optimistic Locking with @Version column.
- It gave much better scalability and lower blocking.
- I handled conflicts with retry logic (3 attempts with backoff).
In the TCS Verizon project (concurrent port assignments):
- I initially tried Pessimistic Locking but faced frequent deadlocks and performance issues.
- Switched to Optimistic Locking + retry, which performed much better under load.
Final Recommendation:
"Under heavy write load, I strongly prefer Optimistic Locking with proper retry logic for better scalability and lower blocking. I use Pessimistic Locking only when business rules demand immediate strong consistency and conflict probability is very high."
What is Pessimistic Locking?
Pessimistic Locking is a strategy where you assume conflicts will happen, so you lock the database row as soon as you read it.
- You use SELECT ... FOR UPDATE (in SQL) or LockModeType.PESSIMISTIC_WRITE in JPA.
- Other transactions trying to update the same row, or read it with a lock, are blocked until your transaction finishes.
- It guarantees strong consistency but can cause performance issues under high concurrency.
Real-life analogy: You lock the door while you are inside the room so no one else can enter.
What is Optimistic Locking?
Optimistic Locking is a strategy where you assume conflicts are rare, so you do not lock the row while reading.
- You add a @Version column in the entity.
- When you update the record, Hibernate checks if the version is still the same.
- If another transaction updated it first, Hibernate throws OptimisticLockException.
- You then retry the operation.
Real-life analogy: You don’t lock the door, but when you come back, you check if someone changed anything inside. If yes, you redo your work.
Quick Summary
- Pessimistic = Lock early, assume conflict (safe but slower).
- Optimistic = Lock late, assume no conflict (faster but needs retry logic).
Pessimistic Locking (Simple Explanation)
- When you read a record (e.g., wallet balance), you immediately lock the row in the database.
- No other transaction can update or lock that same row until you finish your transaction (plain reads may still see the last committed value, depending on the database).
- It is called "Pessimistic" because you are pessimistic (assume conflict will happen).
Example: You want to deduct 1000 from a wallet.
- You read the balance → Database locks the row.
- You check if balance is enough.
- You update the balance.
- You commit → Lock is released.
During steps 1 to 4, no one else can modify this wallet.
Optimistic Locking (Simple Explanation)
- You do not lock anything when you read.
- You just read the balance and note down the current version number.
- When you want to update, you send the version number back.
- If the version number is still the same, update is allowed.
- If the version number changed (someone else updated it), you get an error and retry.
Example: You want to deduct 1000 from a wallet.
- You read the balance (version = 5). No lock.
- You check if balance is enough.
- You try to update with version = 5.
- If version is still 5 → Success.
- If version is now 6 (someone else updated) → Error → Retry.
Simple Summary
- Pessimistic = Lock the row early (when reading) → Safe but slow under high load.
- Optimistic = No lock when reading → Fast, but you may have to retry if conflict happens.
In your Payment Wallet project, Optimistic Locking with @Version is usually better because it is faster and scales better.
Version Number IS Stored in the Database
When you use @Version for Optimistic Locking, the version number is stored as a column in your database table.
Example:
@Entity
public class Wallet {
    @Id
    private Long id;
    private BigDecimal balance;
    @Version
    private Long version; // ← This column exists in the database
}
The version column is a real column in your wallet table.
How it Works
- When you read a wallet, Hibernate also reads the current version (e.g., 5).
- When you update, you send the same version (5) back.
- Hibernate checks: "Is the version still 5 in the database?"
- If yes → Update happens and version becomes 6.
- If no (someone else updated it) → OptimisticLockException is thrown.
Simple Summary
- Version number is stored in the database.
- It is automatically managed by Hibernate.
- You don’t need to set it manually.
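The version check described above can be simulated without a database. The sketch below is a plain-Java stand-in, assuming a single in-memory row; VersionedWalletRow and OptimisticDebit are hypothetical names, and the synchronized method plays the role of the SQL statement "UPDATE ... WHERE id = ? AND version = ?" that Hibernate issues.

```java
import java.math.BigDecimal;

// In-memory stand-in for one row of the wallet table (hypothetical, for illustration only).
final class VersionedWalletRow {
    private BigDecimal balance;
    private long version;

    VersionedWalletRow(BigDecimal balance) { this.balance = balance; }

    synchronized BigDecimal balance() { return balance; }
    synchronized long version() { return version; }

    // Mimics: UPDATE wallet SET balance = ?, version = version + 1 WHERE id = ? AND version = ?
    // Returns true when the row was updated, false when the version no longer matches.
    synchronized boolean updateIfVersionMatches(BigDecimal newBalance, long expectedVersion) {
        if (version != expectedVersion) {
            return false; // another writer updated the row first
        }
        balance = newBalance;
        version++;
        return true;
    }
}

final class OptimisticDebit {
    // Read, validate, then write with the version that was read; retry on conflict.
    static boolean debit(VersionedWalletRow row, BigDecimal amount, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            long seenVersion;
            BigDecimal seenBalance;
            synchronized (row) { // read balance and version as one consistent snapshot
                seenVersion = row.version();
                seenBalance = row.balance();
            }
            if (seenBalance.compareTo(amount) < 0) {
                throw new IllegalStateException("Insufficient balance");
            }
            if (row.updateIfVersionMatches(seenBalance.subtract(amount), seenVersion)) {
                return true;
            }
        }
        return false; // gave up after maxAttempts conflicts
    }
}
```

With JPA the retry loop stays in application code, while the compare-and-increment is done by the database through the version predicate in the UPDATE statement.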
Object-Oriented Design (LLD Scenarios): "How do you apply GoF design patterns (like the Strategy Pattern for dynamic rule processing or the Chain of Responsibility for multi-step processing) to keep an enterprise component extensible?"
GoF - Gang of Four
"I heavily use GoF design patterns to keep my code extensible, maintainable, and open for future changes without modifying existing code. Two patterns I frequently apply are Strategy and Chain of Responsibility."
1. Strategy Pattern (For Dynamic Rule Processing)
When I use it:
- When I have multiple algorithms or rules that can be swapped at runtime.
Example from My Payment Wallet System:
I had different validation rules for payments (amount limit, daily limit, KYC status, fraud check, etc.).
Instead of writing if-else blocks, I used Strategy Pattern:
public interface PaymentValidationStrategy {
    ValidationResult validate(PaymentRequest request);
}
@Component
public class AmountLimitStrategy implements PaymentValidationStrategy { ... }
@Component
public class FraudCheckStrategy implements PaymentValidationStrategy { ... }
Context Class:
@Service
public class PaymentValidationService {
    private final List<PaymentValidationStrategy> strategies;

    public PaymentValidationService(List<PaymentValidationStrategy> strategies) {
        this.strategies = strategies;
    }

    public ValidationResult validate(PaymentRequest request) {
        for (PaymentValidationStrategy strategy : strategies) {
            ValidationResult result = strategy.validate(request);
            if (!result.isValid()) {
                return result;
            }
        }
        return ValidationResult.success();
    }
}
This made it very easy to add new validation rules without changing the main service.
2. Chain of Responsibility Pattern (For Multi-Step Processing)
When I use it:
- When I have a sequence of steps where each step can either process the request or pass it to the next handler.
Example from My Payment Wallet System:
The money transfer flow had multiple steps (validate, debit, call gateway, notify, log).
I used Chain of Responsibility:
public interface TransferHandler {
    void handle(TransferRequest request);
    void setNext(TransferHandler next);
}
@Component
public class DebitHandler implements TransferHandler { ... }
@Component
public class GatewayHandler implements TransferHandler { ... }
This made the flow very flexible: I could easily add or remove steps.
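Since the handler bodies above are elided, here is a minimal self-contained sketch of how such a chain could be linked and run. AbstractTransferHandler, ValidateHandler, TransferChain, and the completedSteps field are hypothetical names for illustration; the handlers only record their names instead of doing real work.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical request type; it only records which steps ran.
final class TransferRequest {
    final List<String> completedSteps = new ArrayList<>();
}

interface TransferHandler {
    void handle(TransferRequest request);
    void setNext(TransferHandler next);
}

// Base class that holds the link to the next handler and forwards the request.
abstract class AbstractTransferHandler implements TransferHandler {
    private TransferHandler next;

    @Override
    public void setNext(TransferHandler next) { this.next = next; }

    @Override
    public void handle(TransferRequest request) {
        process(request);
        if (next != null) {
            next.handle(request); // pass the request down the chain
        }
    }

    protected abstract void process(TransferRequest request);
}

final class ValidateHandler extends AbstractTransferHandler {
    @Override protected void process(TransferRequest r) { r.completedSteps.add("validate"); }
}

final class DebitHandler extends AbstractTransferHandler {
    @Override protected void process(TransferRequest r) { r.completedSteps.add("debit"); }
}

final class GatewayHandler extends AbstractTransferHandler {
    @Override protected void process(TransferRequest r) { r.completedSteps.add("gateway"); }
}

final class TransferChain {
    // Links handlers in the given order and returns the head of the chain.
    static TransferHandler link(TransferHandler... handlers) {
        for (int i = 0; i < handlers.length - 1; i++) {
            handlers[i].setNext(handlers[i + 1]);
        }
        return handlers[0];
    }
}
```

Adding or removing a step then means changing only the argument list passed to TransferChain.link, not the handlers themselves.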
Why I Use These Patterns:
- They follow Open-Closed Principle — open for extension, closed for modification.
- Makes the system easy to extend when new requirements come.
- Improves testability and maintainability.
In my TCS Verizon project, I used Strategy Pattern for different route calculation algorithms based on site type, which made the system easily extensible when new rules were added.
Strong Closing Line:
"I use Strategy for interchangeable algorithms and Chain of Responsibility for multi-step workflows. These patterns help me keep the system extensible without modifying existing code whenever new business rules are added."
Major Design Patterns
Here are the most important design patterns explained simply with examples from your Payment Wallet System project.
1. Strategy Pattern
- Purpose: To allow different algorithms or behaviors to be swapped easily.
- Example: You had different payment validation rules (amount limit, fraud check, daily limit, KYC). Instead of writing big if-else blocks, you created separate strategy classes: AmountLimitStrategy, FraudCheckStrategy, KycValidationStrategy. The main service just calls the strategies one by one.
2. Factory Pattern
- Purpose: To create objects without exposing the creation logic.
- Example: You had multiple payment gateways (Razorpay, PhonePe, Stripe). You created a PaymentGatewayFactory that returns the correct gateway object based on the gateway code passed by the user. The main payment service doesn't need to know how to create each gateway.
3. Builder Pattern
- Purpose: To create complex objects step by step.
- Example: When creating a PaymentRequest object, you had many optional fields (coupon code, metadata, notes, etc.). Instead of a constructor with 10 parameters, you used a Builder to set only the needed fields in a clean way.
4. Chain of Responsibility Pattern
- Purpose: To pass a request along a chain of handlers until one handles it.
- Example: The money transfer flow had multiple steps (validate user, check balance, call gateway, send notification). You created a chain of handlers where each step either processes the request or passes it to the next handler.
5. Observer Pattern
- Purpose: One object notifies multiple other objects when something changes.
- Example: When a payment is successful, the TransactionService notifies NotificationService and AnalyticsService automatically. The transaction service doesn't need to know who is listening — it just publishes an event.
6. Singleton Pattern
- Purpose: Ensure only one instance of a class exists.
- Example: The RedisService or IdempotencyService is created only once and shared across the application.
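Of the patterns listed above, the Builder is the one most easily shown in a few lines. The sketch below assumes a simplified PaymentRequest with two required and two optional fields; all field names are illustrative and not taken from the actual project.

```java
import java.math.BigDecimal;

// Hypothetical PaymentRequest; field names are illustrative.
final class PaymentRequest {
    private final long walletId;      // required
    private final BigDecimal amount;  // required
    private final String couponCode;  // optional, may be null
    private final String notes;       // optional, may be null

    private PaymentRequest(Builder b) {
        this.walletId = b.walletId;
        this.amount = b.amount;
        this.couponCode = b.couponCode;
        this.notes = b.notes;
    }

    static Builder builder(long walletId, BigDecimal amount) {
        return new Builder(walletId, amount);
    }

    long walletId() { return walletId; }
    BigDecimal amount() { return amount; }
    String couponCode() { return couponCode; }
    String notes() { return notes; }

    static final class Builder {
        private final long walletId;
        private final BigDecimal amount;
        private String couponCode;
        private String notes;

        private Builder(long walletId, BigDecimal amount) {
            this.walletId = walletId;
            this.amount = amount;
        }

        Builder couponCode(String couponCode) { this.couponCode = couponCode; return this; }
        Builder notes(String notes) { this.notes = notes; return this; }

        // Validation lives in build() so an invalid request can never be constructed.
        PaymentRequest build() {
            if (amount == null || amount.signum() <= 0) {
                throw new IllegalArgumentException("amount must be positive");
            }
            return new PaymentRequest(this);
        }
    }
}
```

A caller sets only what it needs, for example PaymentRequest.builder(42L, new BigDecimal("250.00")).couponCode("WELCOME10").build(), and the resulting object is immutable.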
Debugging 100% CPU Spikes: "If a production application instance suddenly spikes to 100% CPU utilization, walk me through your exact step-by-step diagnostic workflow using Thread Dumps (jstack)."
"When a production Spring Boot application suddenly spikes to 100% CPU, I follow this exact step-by-step workflow using both JVisualVM and jstack."
Step-by-Step Diagnostic Workflow
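- Confirm the Issue: Use top -H -p <pid> or htop to confirm the Java process is consuming high CPU, and note the IDs of the hottest threads. On JDKs whose dumps print nid in hexadecimal, convert each thread ID to hex so it can be matched against the nid=0x... field in the dump.
- Take Thread Dumps using jstack (Command Line): Run the following command multiple times (4–5 dumps, 8–10 seconds apart):
jstack -l <pid> > thread_dump1.txt
- Use JVisualVM for Visual Analysis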
- Connect JVisualVM to the running process (local or remote via JMX).
- Go to the Threads tab.
- Take thread dumps directly from JVisualVM (it gives a visual view).
- Look for threads in RUNNABLE state consuming high CPU.
- Analyze the Dumps
- Look for the most frequent stack traces.
- Identify if many threads are stuck in the same method (e.g., bad regex, heavy computation, infinite loop).
- Check for GC thrashing or lock contention.
- Root Cause & Fix
- If it's a hot method → Optimize the code.
- If it's GC related → Tune JVM heap or GC algorithm.
- If it's a deadlock → Analyze lock contention.
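A step that helps when analyzing the dumps is matching the thread IDs reported by top -H against the dump entries. Many JDK releases print the native thread ID as hexadecimal in the nid field, so the decimal ID from top has to be converted first; if your dump shows a decimal nid, no conversion is needed. The helper below is a small sketch of that lookup; HotThreadFinder is a hypothetical name and the dump format is assumed to follow the usual "... nid=0x4e2f runnable" header layout.

```java
import java.util.ArrayList;
import java.util.List;

final class HotThreadFinder {
    // Builds the token to search for, e.g. thread id 20015 becomes "nid=0x4e2f".
    static String toNid(long linuxThreadId) {
        return "nid=0x" + Long.toHexString(linuxThreadId);
    }

    // Returns the thread header lines of the dump that belong to the given native thread id.
    static List<String> findThreadHeaders(String threadDump, long linuxThreadId) {
        String nid = toNid(linuxThreadId);
        List<String> matches = new ArrayList<>();
        for (String line : threadDump.split("\\R")) {
            // Match the whole token so that 0x4e2 does not also match 0x4e2f.
            if (line.contains(nid + " ") || line.endsWith(nid)) {
                matches.add(line);
            }
        }
        return matches;
    }
}
```

The stack frames printed directly under a matching header show what the hot thread was executing when the dump was taken.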
Real Project Example
"In my Payment Wallet System, we had a sudden 100% CPU spike. Using jstack and JVisualVM, I found many threads stuck in a bad regex pattern while parsing user input. I optimized the regex and added caching, which brought CPU usage back to normal immediately."
Strong Closing Line:
"I use jstack for quick command-line dumps in production and JVisualVM for visual analysis. Taking multiple dumps and focusing on the most repeated stack traces helps me find the root cause quickly."
Memory Leak Profiling (OOM): "A container runs smoothly for 3 days and then crashes with a java.lang.OutOfMemoryError: Java heap space. How do you analyze an .hprof heap dump using Eclipse MAT to isolate the root cause?"
"When a container crashes with OutOfMemoryError: Java heap space after running for days, I use Eclipse Memory Analyzer Tool (MAT) to analyze the .hprof heap dump."
Step-by-Step Analysis Workflow Using Eclipse MAT
- Enable Heap Dump Generation: Add JVM arguments: -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/app/dumps/
- Open the .hprof File in MAT: Open the generated heap dump file in Eclipse MAT.
- Run Leak Suspects Report: MAT automatically runs the "Leak Suspects Report". It highlights the biggest objects and possible memory leaks.
- Analyze Dominator Tree: Look at the Dominator Tree to see which objects are retaining the most memory. Check for large collections (HashMap, ArrayList) that keep growing without eviction.
- Check Histogram View: Look for classes with unusually high instance count (e.g., millions of String or byte[] objects).
- Common Root Causes I Look For
- Static collections holding references.
- ThreadLocal variables not cleaned up.
- Unclosed resources (Streams, Connections).
- Caching without proper eviction (unbounded HashMap).
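The last root cause above, an unbounded cache, can be fixed with size-based eviction. A minimal sketch using only the JDK is shown below; BoundedLruCache is a hypothetical class name, and the approach relies on LinkedHashMap's access-order mode together with removeEldestEntry().

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical bounded cache: evicts the least recently used entry once maxEntries is exceeded.
final class BoundedLruCache<K, V> extends LinkedHashMap<K, V> {
    private final int maxEntries;

    BoundedLruCache(int maxEntries) {
        super(16, 0.75f, true); // accessOrder = true keeps entries in least-recently-used order
        this.maxEntries = maxEntries;
    }

    // Called by LinkedHashMap after each insertion; returning true drops the eldest entry.
    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > maxEntries;
    }
}
```

LinkedHashMap is not thread-safe, so a shared cache built this way would need external synchronization (for example Collections.synchronizedMap) or a dedicated caching library.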
Real Project Experience
"In my Payment Wallet System, we had an OOM issue after running for several days. Using Eclipse MAT on the heap dump, I found that a HashMap in the caching layer was not evicting old entries properly. I first fixed it by adding size-based eviction using LinkedHashMap with removeEldestEntry(), and later switched to Caffeine for a bounded cache with built-in size-based eviction."
Do you have exposure to any cloud platform?
"Yes, although my previous organization (TCS) used on-premise physical servers for the Verizon project, I recently deployed my Payment Wallet System (the GitHub project) on AWS to gain hands-on cloud experience."
Step-by-Step Deployment to AWS (What You Can Say)
- Containerization: I dockerized all microservices using multi-stage Dockerfiles and created a docker-compose.yml for local testing.
- Push Images to ECR: I created repositories in Amazon ECR (Elastic Container Registry) and pushed all Docker images there.
- Deploy using ECS (Fargate)
- Created an ECS Cluster.
- Created Task Definitions for each service (user-service, wallet-service, transaction-service, notification-service, api-gateway, eureka-server).
- Created Services for each task definition with desired number of tasks.
- Used Fargate (serverless) so I didn’t have to manage EC2 instances.
- Networking & Load Balancing
- Used Application Load Balancer (ALB) in front of the API Gateway.
- Configured target groups and listeners for routing.
- Service Discovery
- Used AWS Cloud Map or kept Eureka Server running in ECS for service discovery.
- Database & Cache
- Used Amazon RDS for databases.
- Used Amazon ElastiCache (Redis) for caching and distributed locking.
- Monitoring
- Enabled CloudWatch for logs and metrics.
- Set up basic alarms for CPU and memory.
Simple Version You Can Say:
"For my Payment Wallet System, I first dockerized all microservices. Then I pushed the images to Amazon ECR. I deployed them on AWS ECS with Fargate for serverless containers. I used Application Load Balancer in front of the API Gateway for traffic distribution. For the database I used RDS and for caching I used ElastiCache Redis. I also enabled CloudWatch for monitoring."
How do you deploy your Spring Boot applications? Have you worked with cloud containers?
"I deploy Spring Boot applications using Docker containers and cloud platforms. Although in my previous organization (TCS) we deployed on client physical servers and VMs, I have hands-on experience with cloud containers through my personal projects."
My Deployment Process
- Containerization: I create a Dockerfile for each microservice using multi-stage builds to keep the image size small and secure.
- Local Testing: I use docker-compose.yml to run the entire system locally (Eureka, API Gateway, all microservices, Redis, etc.).
- Cloud Deployment (AWS)
- Push Docker images to Amazon ECR (Elastic Container Registry).
- Deploy using AWS ECS with Fargate (serverless containers).
- Use Application Load Balancer in front of the API Gateway for traffic distribution.
- Use RDS for databases and ElastiCache for Redis.
- Monitoring: I use CloudWatch for logs and basic metrics.
In my Payment Wallet System project, I deployed all microservices (user-service, wallet-service, transaction-service, notification-service, api-gateway) on AWS ECS Fargate using Docker. This gave me practical experience with cloud containers, auto-scaling, and production deployment.
Although I haven’t worked with Azure or GCP yet, I am comfortable with AWS container services and can quickly learn other platforms.
In your previous company, how did you manage database passwords or API keys?
In TCS, for the Verizon project, we were working on on-premise physical servers and VMs, so the approach was traditional.
We stored database passwords and API keys in the application.properties file.
How we did it:
We added the credentials directly in the properties file like this:
spring.datasource.url=jdbc:oracle:thin:@//host:port/service
spring.datasource.username=verizon_user
spring.datasource.password=actual_password_here
For sensitive values, we used Jasypt encryption so that the password was not visible in plain text.
Example:
spring.datasource.password=ENC(encrypted_value_here)
We placed this properties file in the deployed application folder on the WebLogic server. The client’s infrastructure team managed the final production values and access to the servers.
This was the standard way we handled secrets in the on-premise environment.
When you use Jasypt encryption in application.properties, the decryption happens automatically when the application starts on the physical server.
How Decryption Works:
You store the encrypted password in the properties file like this:
spring.datasource.password=ENC(encrypted_value_here)
On the physical server, when you start the application, you pass the master key (decryption password) as a JVM argument:
java -Djasypt.encryptor.password=your_master_secret_key -jar your-app.jar
As the Spring Boot application starts, Jasypt automatically decrypts all values wrapped in ENC(...) using this master key.
After decryption, Spring uses the actual password to connect to the database.
Important Notes from TCS Practice:
- The master key (your_master_secret_key) was usually provided by the client’s infrastructure team.
- It was never stored in the code or properties file.
- It was passed through secure deployment scripts or environment variables on the WebLogic server.
How do you securely manage database passwords or API keys in a cloud environment?
"I never hardcode database passwords, API keys, or any sensitive information in the code or properties files. I use cloud-native secret management tools."
My Approach in Cloud Environment
- AWS Secrets Manager (My Preferred Choice)
- I store all secrets (DB credentials, Redis password, JWT secret key, third-party API keys) in AWS Secrets Manager.
- At application startup, the service fetches the secrets using the AWS SDK.
- How I Fetch Secrets in Spring Boot
@Value("${spring.datasource.password}")
private String dbPassword; // This value comes from AWS Secrets Manager
// Or using AWS SDK directly
String secret = secretsManagerClient.getSecretValue(
        GetSecretValueRequest.builder()
                .secretId("payment-db-credentials")
                .build()
).secretString();
- Environment Variables in ECS/Fargate
- In AWS ECS, I configure the task definition to inject secrets as environment variables from Secrets Manager.
Security Benefits:
- Secrets are encrypted at rest and in transit.
- No secrets in Git or Docker images.
- Easy secret rotation without redeploying the application.
Real Project Experience
"In my Payment Wallet System deployed on AWS, I stored database credentials and Redis auth tokens in AWS Secrets Manager. The ECS tasks fetched these secrets at runtime. This ensured no sensitive data was present in the code or Docker images."
"In my previous organization (TCS), we deployed Spring Boot applications on on-premise physical servers and VMs managed by the client (Verizon)."
Deployment Process:
- We built the application using Maven (mvn clean package).
- The generated JAR file was deployed on WebLogic Server.
- Configuration (like database passwords) was done through WebLogic console or externalized properties files.
- The deployment was managed through Jenkins CI/CD pipeline for build and deployment.
- Monitoring was done using WebLogic console and client-provided tools.
It was a traditional on-premise setup without containers or cloud platforms.
Handling High Concurrency & Race Conditions: How do you ensure absolute data consistency when multiple concurrent user threads or systems attempt to update the exact same account balance or inventory row at the exact same millisecond?
"This is a classic race condition problem, especially in payment or inventory systems. I ensure data consistency using a combination of application-level and database-level techniques."
My Approach
Distributed Locking (Application Layer): I use a Redis distributed lock (Redisson) to ensure only one thread can modify a particular record at a time.
Example:
RLock lock = redisson.getLock("wallet:" + walletId);
if (lock.tryLock(3, 10, TimeUnit.SECONDS)) {
    try {
        // Critical section: read, validate, update
        // findById returns Optional in Spring Data, so unwrap it
        Wallet wallet = walletRepository.findById(walletId).orElseThrow();
        wallet.debit(amount);
        walletRepository.save(wallet);
    } finally {
        lock.unlock();
    }
}
Optimistic Locking (Database Layer): I use JPA’s @Version column. If two transactions try to update the same record, the second one throws OptimisticLockException, which I catch and retry.
Idempotency: Every request carries a unique idempotencyKey. I check this key first to prevent duplicate processing.
Real Project Example
In my Payment Wallet System, for wallet debit/credit operations, I used Redis Distributed Lock + @Version Optimistic Locking. This combination successfully prevented race conditions even when multiple users tried to deduct from the same wallet at the exact same time.
In the TCS Verizon project, I faced similar issues during concurrent port assignments and used optimistic locking with retry logic.
Strong Closing Line:
"I combine distributed locking at the application level with optimistic locking at the database level. This gives both safety and good performance under high concurrency."
API Idempotency: If an automated client service or payment gateway retries a failed REST API call, how do you enforce strict idempotency at the architectural layer to prevent duplicate transaction processing?
"Idempotency is critical in payment systems to prevent duplicate transactions when retries happen due to network issues or timeouts. I enforce strict idempotency at the architectural level using a combination of unique keys and database checks."
How I Implement Idempotency
- Unique Idempotency Key: Every request from the client must carry a unique idempotencyKey (e.g., a UUID generated by the client, or clientRequestId + userId + timestamp, as long as the same value is resent on every retry).
- Check Before Processing: In the service layer, I first check if a request with this key has already been processed.
- Store Processing Status: I store the request status in a dedicated table (idempotency_records) with columns: idempotency_key, status, response, created_at. A unique constraint on idempotency_key is needed so that two concurrent requests with the same key cannot both pass the check.
Example Code:
@Transactional
public PaymentResponse processPayment(PaymentRequest req) {
    // Step 1: Check if already processed
    Optional<IdempotencyRecord> existing = idempotencyRepo.findByKey(req.getIdempotencyKey());
    if (existing.isPresent()) {
        return buildResponseFromExisting(existing.get());
    }
    // Step 2: Process the payment
    PaymentRecord payment = executePayment(req);
    // Step 3: Save idempotency record
    saveIdempotencyRecord(req.getIdempotencyKey(), "SUCCESS", payment);
    return buildSuccessResponse(payment);
}
Real Project Example
In my Payment Wallet System, I implemented idempotency for the money transfer endpoint. Even if the payment gateway or client retried the same request due to network issues, the system processed it only once because of the idempotency key check.
This prevented duplicate debits and credits.
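The check-then-process flow above also has to hold when two retries arrive at the same moment. The sketch below shows that property with an in-memory map standing in for the idempotency_records table; IdempotentExecutor is a hypothetical name, and ConcurrentHashMap.computeIfAbsent plays the role that a unique constraint on idempotency_key would play in a real database.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Supplier;

// In-memory stand-in for the idempotency_records table (hypothetical, for illustration only).
final class IdempotentExecutor<R> {
    private final ConcurrentMap<String, R> responsesByKey = new ConcurrentHashMap<>();

    // Runs the action at most once per key, even under concurrent calls.
    // Later calls with the same key get the stored response back instead of re-executing.
    // The action must return a non-null response.
    R execute(String idempotencyKey, Supplier<R> action) {
        return responsesByKey.computeIfAbsent(idempotencyKey, key -> action.get());
    }
}
```

In a multi-instance deployment the map would be replaced by the shared database table or Redis, since an in-process map only protects a single JVM.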
Strong Closing Line:
"Idempotency is enforced by making every critical API request identifiable with a unique key and checking its status before processing. This ensures exactly-once processing even with retries."
Microservices Resilience: In a distributed environment, if a downstream microservice experiences high load or sudden latency spikes, how do you prevent cascading failures and thread starvation in your calling service?
"In a distributed microservices environment, a slow or overloaded downstream service can easily cause cascading failures and thread starvation in the calling service. I prevent this using Resilience4j with a combination of Circuit Breaker, TimeLimiter, and Retry."
How I Handle It
- TimeLimiter: I set a timeout on the downstream call so that threads don’t wait indefinitely.
- Circuit Breaker: If the downstream service has high failure rate or latency, the circuit opens and fails fast, returning a fallback response immediately.
- Retry with Backoff: For transient errors, I retry with exponential backoff and jitter.
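The "exponential backoff and jitter" in the last item can be made concrete with a few lines of arithmetic. This is a generic sketch of the "full jitter" variant, not Resilience4j's own implementation; the Backoff class and its parameter names are assumptions for the example.

```java
import java.util.concurrent.ThreadLocalRandom;

final class Backoff {
    // Upper bound of the delay before the given retry (attempt starts at 1):
    // base * 2^(attempt - 1), capped at capMillis. Intended for realistic base values.
    static long maxDelayMillis(int attempt, long baseMillis, long capMillis) {
        int shift = Math.min(attempt - 1, 30); // bound the shift so the factor stays small
        return Math.min(capMillis, baseMillis * (1L << shift));
    }

    // "Full jitter": a uniformly random delay in [0, maxDelay], so that many clients
    // retrying at once do not hit the recovering service in synchronized waves.
    static long jitteredDelayMillis(int attempt, long baseMillis, long capMillis) {
        long max = maxDelayMillis(attempt, baseMillis, capMillis);
        return ThreadLocalRandom.current().nextLong(max + 1);
    }
}
```

With a 100 ms base and a 5 s cap, the upper bounds for the first three retries are 100 ms, 200 ms, and 400 ms, and every later retry is capped at 5 s.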
Example from My Payment Wallet System:
@CircuitBreaker(name = "walletService", fallbackMethod = "walletFallback")
@Retry(name = "walletService") // maxAttempts is configured in application.yml, not on the annotation
@TimeLimiter(name = "walletService")
public CompletableFuture<WalletResponse> debitWallet(TransferRequest req) {
    // @TimeLimiter needs an async return type such as CompletableFuture
    return CompletableFuture.supplyAsync(() -> walletClient.debit(req));
}
public CompletableFuture<WalletResponse> walletFallback(TransferRequest req, Throwable ex) {
    log.warn("Wallet service is slow or down. Returning fallback.", ex);
    return CompletableFuture.completedFuture(new WalletResponse("PENDING", "Will be processed shortly"));
}
Configuration (application.yml):
resilience4j:
  circuitbreaker:
    instances:
      walletService:
        failureRateThreshold: 50
        slowCallRateThreshold: 60
        slowCallDurationThreshold: 2s
        waitDurationInOpenState: 30s
  retry:
    instances:
      walletService:
        maxAttempts: 3
  timelimiter:
    instances:
      walletService:
        timeoutDuration: 3s
Real Project Experience
In my Payment Wallet System, the transaction-service calls wallet-service synchronously. During peak hours, wallet-service had latency spikes. With the above configuration, the circuit breaker opened quickly, threads in transaction-service were not starved, and we returned "PENDING" status to the user while retrying in the background.
This prevented cascading failures and improved overall system stability.
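To show why an open circuit protects the caller's threads, here is a deliberately simplified circuit breaker in plain Java. It is not how Resilience4j works internally (Resilience4j evaluates failure and slow-call rates over a sliding window); this sketch opens after a number of consecutive failures, and SimpleCircuitBreaker with its parameters is a hypothetical name for illustration.

```java
import java.util.function.LongSupplier;
import java.util.function.Supplier;

final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int failureThreshold;    // consecutive failures before opening
    private final long openDurationMillis; // how long to fail fast before a trial call
    private final LongSupplier clockMillis; // injectable time source
    private State state = State.CLOSED;
    private int consecutiveFailures;
    private long openedAt;

    SimpleCircuitBreaker(int failureThreshold, long openDurationMillis, LongSupplier clockMillis) {
        this.failureThreshold = failureThreshold;
        this.openDurationMillis = openDurationMillis;
        this.clockMillis = clockMillis;
    }

    synchronized State state() { return state; }

    // Simplification: the whole call is synchronized, which serializes callers.
    synchronized <T> T call(Supplier<T> action, Supplier<T> fallback) {
        if (state == State.OPEN) {
            if (clockMillis.getAsLong() - openedAt < openDurationMillis) {
                return fallback.get(); // fail fast without touching the downstream service
            }
            state = State.HALF_OPEN;   // wait time elapsed: allow one trial call
        }
        try {
            T result = action.get();
            consecutiveFailures = 0;
            state = State.CLOSED;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (state == State.HALF_OPEN || consecutiveFailures >= failureThreshold) {
                state = State.OPEN;
                openedAt = clockMillis.getAsLong();
            }
            return fallback.get();
        }
    }
}
```

While the breaker is open, the downstream call is never attempted, so no caller thread can block on the slow service during that window.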
Strong Closing Line:
"The combination of TimeLimiter and Circuit Breaker is critical. TimeLimiter prevents long waits, and Circuit Breaker stops calling the slow service entirely after a threshold, protecting the calling service from thread exhaustion."
Distributed Transactions (Saga Pattern): When you enforce a Database-per-Service architecture, how do you manage transactions that span multiple services? What are the trade-offs between Orchestration and Choreography when designing compensating rollback actions?
"When we move to a Database-per-Service architecture, we cannot use traditional ACID transactions across services. The standard solution is the Saga Pattern — a sequence of local transactions where each step has a compensating action if something fails later."
How I Manage Distributed Transactions
In my Payment Wallet System, for the money transfer flow (Wallet Debit → Payment Processing → Notification), I used Saga Pattern with compensation logic.
- Each service performs its local transaction.
- If any step fails, the orchestrator or other services trigger compensating actions (e.g., credit back the wallet if payment gateway fails).
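The compensation rule in the two points above can be sketched as a tiny in-process orchestrator. This is an illustration under simplifying assumptions: steps are plain Runnables instead of remote service calls, and SagaStep and SagaOrchestrator are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

final class SagaStep {
    final String name;
    final Runnable action;       // the local transaction of one service
    final Runnable compensation; // undoes the action if a later step fails

    SagaStep(String name, Runnable action, Runnable compensation) {
        this.name = name;
        this.action = action;
        this.compensation = compensation;
    }
}

final class SagaOrchestrator {
    // Runs the steps in order. If one fails, compensates the already completed
    // steps in reverse order and reports failure.
    static boolean run(List<SagaStep> steps) {
        Deque<SagaStep> completed = new ArrayDeque<>();
        for (SagaStep step : steps) {
            try {
                step.action.run();
                completed.push(step);
            } catch (RuntimeException e) {
                while (!completed.isEmpty()) {
                    completed.pop().compensation.run();
                }
                return false;
            }
        }
        return true;
    }
}
```

In a real system each action and compensation would be a call or message to another service, and the orchestrator would persist its progress so that compensation can resume after a crash.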
Trade-offs Between Orchestration and Choreography
Orchestration Saga (Centralized):
- One dedicated service (Saga Orchestrator) coordinates all steps.
- It calls each service and manages rollback if any step fails.
Advantages:
- Easy to understand and debug the full flow.
- Centralized compensation logic.
- Good for complex flows with conditions.
Disadvantages:
- Orchestrator can become a bottleneck.
- Tighter coupling to the orchestrator.
Choreography Saga (Decentralized / Event-Driven):
- No central orchestrator.
- Services communicate through events. Each service listens to events and reacts.
Advantages:
- Better decoupling and scalability.
- Services are independent.
Disadvantages:
- Harder to track the full flow.
- Compensation logic is scattered across services.
- Debugging is more complex.
My Choice in Projects
In my Payment Wallet System, I used Orchestration Saga for the core money transfer flow because reliability and traceability were more important. The orchestrator managed the entire flow and compensation.
For non-critical flows like notifications, I used Choreography.
Strong Closing Line:
"I choose Orchestration Saga when business logic is complex and traceability is important. I choose Choreography when I want maximum decoupling and independent scaling of services."
CQRS Implementation: How do you implement the Command Query Responsibility Segregation (CQRS) pattern to separate high-frequency writes from heavy reads, and how do you handle asynchronous data sync between them?
"In my Payment Wallet System, I used CQRS to separate the write model (for money transfers) from the read model (for balance checks and transaction history) because the read load was much higher than writes."
How I Implemented CQRS
1. Write Model (Command Side)
- Used JPA Entities (Wallet, Transaction) with MySQL database.
- All writes go through this model.
- Strong consistency using @Transactional + Optimistic Locking (@Version).
2. Read Model (Query Side)
- Used a denormalized read model stored in Redis (for fast balance check) and Elasticsearch (for transaction history and search).
- This model is optimized for fast reads (no joins, pre-aggregated data).
3. Asynchronous Data Sync
- Whenever a write happens, the write service publishes an event to Kafka.
- A separate Read Model Updater (Kafka Consumer) listens to these events and updates the read model.
Example from wallet-service (Write Side):
@Transactional
public void debit(DebitRequest req) {
    // Assuming a Spring Data repository, findById returns Optional; fail fast if the wallet is missing
    Wallet wallet = walletRepository.findById(req.getWalletId())
            .orElseThrow(() -> new IllegalArgumentException("Wallet not found: " + req.getWalletId()));
    wallet.debit(req.getAmount());
    walletRepository.save(wallet);
    // Publish event for read model sync.
    // Note: the Kafka send is not covered by the database transaction.
    kafkaTemplate.send("wallet-events",
            new WalletUpdatedEvent(wallet.getId(), wallet.getBalance()));
}
Read Model Updater (Consumer):
@KafkaListener(topics = "wallet-events")
public void updateReadModel(WalletUpdatedEvent event) {
    // Update Redis cache
    redisTemplate.opsForValue().set("wallet:" + event.getWalletId(), event.getBalance());
    // Update Elasticsearch for history
    elasticsearchService.updateWalletSummary(event);
}
Benefits I Achieved
- Write side optimized for consistency.
- Read side optimized for speed (no joins, pre-aggregated data).
- Eventual consistency with very low lag.
- Independent scaling — I could scale read replicas separately from write instances.
Strong Closing Line:
"CQRS with Kafka-based async syncing allows me to optimize read and write sides independently while maintaining eventual consistency. This pattern is extremely useful in high-traffic systems like payments."
Data Redundancy vs. Coupling: How do you determine the boundaries of microservices using Domain-Driven Design (DDD) to prevent creating a "distributed monolith"? When is storing duplicate local data acceptable over making synchronous REST calls?
Recommended Answer:
"Determining microservice boundaries is one of the most important decisions in microservices architecture. I use Domain-Driven Design (DDD) to define clear Bounded Contexts and avoid creating a distributed monolith."
How I Use DDD for Boundaries
I look for:
- Business Capabilities: Each microservice should represent a clear business capability with its own ubiquitous language.
- Data Ownership: Each service owns its own database. Shared databases create tight coupling.
- Change Frequency: Services that change together should stay together.
- Team Ownership: Ideally, one team should own one or two related services.
When I Accept Data Duplication (Over Synchronous Calls)
I prefer duplication when:
- The data is read-heavy and needs low latency.
- The source data changes slowly.
- Strong consistency is not required (eventual consistency is acceptable).
Example from My Payment Wallet System:
- wallet-service stores basic user info (name, email) locally for fast balance display.
- When user details change in user-service, it publishes an event.
- wallet-service updates its local copy asynchronously.
This avoids synchronous calls for every balance check, improving performance.
I only make synchronous calls when strong consistency is required (e.g., during money transfer).
Strong Closing Line:
"I use DDD to define bounded contexts and accept controlled data duplication for read performance. The goal is to keep services independent while maintaining business consistency through events."
Zero-Trust Cloud Security: How do your cloud-deployed Java applications securely communicate with cloud databases and external third-party systems without hardcoding credentials, access keys, or passwords in your code repository?
"I follow Zero-Trust principles — never hardcode any credentials, access keys, or passwords in the code or repository. I use cloud-native secret management tools and fetch them at runtime."
My Approach
- AWS Secrets Manager: All sensitive information (database credentials, Redis password, third-party API keys, JWT secret) is stored in AWS Secrets Manager.
- Runtime Fetching: The application fetches secrets at startup using the AWS SDK.
- IAM Roles: The application uses IAM roles with least privilege instead of hardcoded access keys.
Example in Spring Boot:
String dbPassword = secretsManagerClient.getSecretValue(
        GetSecretValueRequest.builder()
                .secretId("payment-db-credentials")
                .build()
).secretString();
In my Payment Wallet System deployed on AWS, I stored all secrets in AWS Secrets Manager and fetched them at runtime in ECS tasks. This ensured no sensitive data was present in the code or Docker images.
Strong Closing Line:
"I never commit secrets to Git. I use cloud secret managers and IAM roles to fetch credentials at runtime. This approach is secure, scalable, and follows Zero-Trust security."
Cloud Secret Management Throttling: If a Spring Boot microservice calls an external cloud secret manager on every single API request, what infrastructure bottleneck occurs, and how do you optimize it?
"Calling the cloud secret manager on every single API request is a serious anti-pattern. It leads to API throttling, increased latency, higher costs, and can become a single point of failure."
The Bottleneck
- Cloud secret managers (AWS Secrets Manager, Azure Key Vault, etc.) have strict rate limits.
- High-frequency calls will cause throttling errors (for example, ThrottlingException from AWS Secrets Manager or HTTP 429 Too Many Requests from Azure Key Vault).
- Every request adds extra network hop and latency.
- It wastes resources and can overload the secret manager service.
How I Optimize It
I fetch secrets once at application startup and cache them in memory.
Example using the AWS SDK for Java v2 in a Spring configuration class:
@Configuration
public class SecretsConfig {
    // Evaluated once when the bean is created, so the secret manager is called only at startup
    @Bean
    public String dbPassword(SecretsManagerClient client) {
        return client.getSecretValue(
                GetSecretValueRequest.builder()
                        .secretId("payment-db-credentials")
                        .build()
        ).secretString();
    }
}
I also use @RefreshScope or a scheduled task to refresh secrets periodically (e.g., every 30 minutes) without restarting the application.
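The periodic refresh idea can be sketched without any framework as a time-based cache around the secret lookup. The `CachedSecret` class is hypothetical, and the `loader` stands in for the real secret manager call; the clock is injected only so the behaviour can be tested.

```java
import java.time.Duration;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Caches a secret in memory and reloads it only after the TTL has elapsed.
class CachedSecret {
    private final Supplier<String> loader;   // e.g. a call to the secret manager
    private final long ttlMillis;
    private final LongSupplier clock;        // injectable time source (milliseconds)
    private String value;
    private long loadedAt;

    CachedSecret(Supplier<String> loader, Duration ttl, LongSupplier clock) {
        this.loader = loader;
        this.ttlMillis = ttl.toMillis();
        this.clock = clock;
    }

    // Returns the cached value, calling the loader only on first use or after expiry.
    synchronized String get() {
        long now = clock.getAsLong();
        if (value == null || now - loadedAt >= ttlMillis) {
            value = loader.get();
            loadedAt = now;
        }
        return value;
    }
}
```

In production the clock would typically be `System::currentTimeMillis`, and request handlers would call `get()` instead of the secret manager, so the remote call happens at most once per TTL window.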
Real Project Experience
In my Payment Wallet System, I fetched all secrets (database credentials, Redis password, JWT key) once at startup using AWS Secrets Manager. This completely avoided throttling and kept API latency low.
Strong Closing Line:
"Secrets should be fetched once at startup and cached. Calling the secret manager on every request is an anti-pattern that causes throttling and performance issues."
Distributed Tracing & Log Aggregation: How do you track, correlate, and debug a single end-to-end user request as it traverses across multiple microservices and network boundaries in a cloud ecosystem?
"I use Distributed Tracing and Log Aggregation to track a single user request across multiple microservices."
My Approach
- Distributed Tracing (OpenTelemetry + Zipkin)
- I instrument all services with OpenTelemetry Java agent.
- A unique traceId and spanId are generated at the entry point (API Gateway).
- Every downstream call (REST, Feign, Kafka) propagates the trace context automatically.
- Log Correlation
- All logs include the traceId in the log message.
- I use ELK Stack (Elasticsearch + Logstash + Kibana) or Loki + Grafana to search logs by traceId.
- Visualization
- In Zipkin or Jaeger, I can see the full call chain with latency breakdown per service.
- Debugging Workflow
- Get the failing traceId from the user or logs.
- Search in Zipkin to see the full flow.
- Search logs in ELK using the same traceId to find errors.
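To make the propagation step concrete, here is a minimal sketch of the W3C Trace Context `traceparent` header value that carries the traceId and spanId between services. The OpenTelemetry agent handles this automatically; the `TraceContext` class below is hypothetical and only illustrates the format (version, 32-hex trace ID, 16-hex span ID, flags).

```java
import java.security.SecureRandom;

// Minimal W3C "traceparent" value: version-traceId-spanId-flags.
class TraceContext {
    private static final SecureRandom RANDOM = new SecureRandom();
    final String traceId; // 32 lowercase hex characters, shared by the whole request
    final String spanId;  // 16 lowercase hex characters, unique per hop

    TraceContext(String traceId, String spanId) {
        this.traceId = traceId;
        this.spanId = spanId;
    }

    // Called at the entry point (for example, the API Gateway).
    static TraceContext newRoot() {
        return new TraceContext(randomHex(16), randomHex(8));
    }

    // Called before each downstream call: same traceId, new spanId.
    TraceContext newChild() {
        return new TraceContext(traceId, randomHex(8));
    }

    // "01" marks the trace as sampled.
    String toHeader() {
        return "00-" + traceId + "-" + spanId + "-01";
    }

    static TraceContext fromHeader(String header) {
        String[] parts = header.split("-");
        if (parts.length != 4 || parts[1].length() != 32 || parts[2].length() != 16) {
            throw new IllegalArgumentException("Malformed traceparent: " + header);
        }
        return new TraceContext(parts[1], parts[2]);
    }

    private static String randomHex(int bytes) {
        byte[] buf = new byte[bytes];
        RANDOM.nextBytes(buf);
        StringBuilder sb = new StringBuilder(bytes * 2);
        for (byte b : buf) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }
}
```

Because every hop keeps the same traceId, searching logs or Zipkin by that one value returns the full request path.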
Real Project Experience
In my Payment Wallet System, I used OpenTelemetry + Zipkin for distributed tracing. When a payment request failed, I could trace the exact service and step where it failed, even across 5+ microservices.
This made debugging intermittent issues much faster.
Strong Closing Line:
"Distributed tracing with OpenTelemetry and correlated logs in ELK/Loki allows me to see the complete picture of a request across all services, making debugging in complex microservices environments much more efficient."
Container Sizing vs. JVM Memory: When deploying a Spring Boot application on a cloud container service, how do you configure the JVM Heap size properties relative to the cloud host's memory limit to prevent automated OS process termination (like an OOMKilled exit)?
"This is a very common cause of sudden container crashes in cloud environments. The key is to configure JVM heap size to leave enough memory for the operating system and other processes."
My Configuration Strategy
Leave Headroom for OS: Never set JVM heap size to the full container memory limit. I usually set JVM heap to 60-75% of the container memory limit.
Recommended JVM Flags
-XX:MaxRAMPercentage=70      # Use 70% of container memory for heap
-XX:InitialRAMPercentage=50
-XX:MinRAMPercentage=50
-XX:+UseContainerSupport     # Important for containers (enabled by default since JDK 10 and 8u191)
Example for a 2GB container:
- Heap size will be around 1.4GB (70%).
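The arithmetic behind that figure can be checked with a few lines of Java. The `HeapSizing` helper is hypothetical and only approximates what the JVM computes from `MaxRAMPercentage`; the actual ceiling of a running JVM can be read from `Runtime.getRuntime().maxMemory()`.

```java
// Rough estimate of the heap ceiling implied by -XX:MaxRAMPercentage.
class HeapSizing {
    // Returns the approximate maximum heap in MiB, rounded down.
    static long maxHeapMiB(long containerLimitMiB, double maxRamPercentage) {
        return (long) (containerLimitMiB * maxRamPercentage / 100.0);
    }

    // Memory left for metaspace, thread stacks, direct buffers, and the OS.
    static long headroomMiB(long containerLimitMiB, double maxRamPercentage) {
        return containerLimitMiB - maxHeapMiB(containerLimitMiB, maxRamPercentage);
    }
}
```

For a 2048 MiB container at 70%, `maxHeapMiB` gives 1433 MiB of heap and `headroomMiB` gives 615 MiB for everything else.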
Real Project Experience
In my Payment Wallet System deployed on AWS ECS Fargate, I set MaxRAMPercentage=70 for all services. This prevented OOMKilled crashes even during traffic spikes, as the OS had enough memory for other processes.
Strong Closing Line:
"Always leave 25-40% of container memory for the OS and native memory. Using MaxRAMPercentage is the modern and recommended way to configure JVM in containers."
Serverless vs. Container Workloads: What are the performance, cold-start latency, and cost trade-offs of deploying business logic inside Serverless Functions versus long-running containerized services?
"Both Serverless (e.g., AWS Lambda) and Containerized services (e.g., ECS Fargate, EKS) have different strengths. I choose based on workload characteristics."
Trade-offs
Serverless Functions (AWS Lambda):
- Performance: Excellent for sporadic or bursty workloads.
- Cold-Start Latency: Noticeable delay (100 ms to a few seconds) on first invocation.
- Cost: Pay-per-execution — very cheap for low traffic, expensive at very high sustained load.
- Best For: Event-driven, infrequent tasks (notifications, scheduled jobs).
Long-Running Containerized Services:
- Performance: Consistent, low latency after warm-up.
- Cold-Start Latency: None (always running).
- Cost: Fixed cost based on allocated resources — better for steady high traffic.
- Best For: High-frequency APIs, real-time processing, stateful services.
My Decision Framework
In my Payment Wallet System:
- I used long-running containers (AWS ECS Fargate) for core services like payment processing and wallet management because they needed consistent low latency.
- I used Serverless (Lambda) for non-critical tasks like sending notifications and analytics.
Strong Closing Line:
"I use Serverless for sporadic, event-driven workloads and containers for high-frequency, latency-sensitive services. The choice depends on traffic pattern, latency requirements, and cost optimization."
High Availability & DR Failover: How do you architect a multi-region deployment for a Java backend application to ensure zero downtime and handle database replication lag during a disaster recovery (DR) failover?
"To achieve high availability and zero downtime, I design a multi-region active-passive architecture with proper failover mechanisms."
My Architecture Approach
- Multi-Region Deployment: Deploy the application in two regions (e.g., Mumbai and Singapore). Use Route 53 with health checks and failover routing.
- Load Balancing: Use Global Accelerator or Route 53 latency-based routing to direct traffic to the nearest healthy region.
- Database Replication: Use Amazon RDS Cross-Region Read Replicas or Aurora Global Database. For failover, promote the replica to primary in the secondary region.
- Handling Database Replication Lag: Monitor replication lag using CloudWatch. Use application-level eventual consistency for non-critical reads. For critical writes, route to the primary region until failover is complete.
- Zero Downtime Strategy: Blue-Green deployment or Canary releases. Health checks at load balancer level. Graceful shutdown of old instances.
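The replication-lag rule above can be sketched as a small routing decision. The `ReadRouter` class, the `ReadTarget` enum, and the lag threshold are hypothetical; in practice the observed lag would come from a monitoring metric, and all writes would still go to the primary.

```java
// Where a read should be served from.
enum ReadTarget { PRIMARY, REPLICA }

// Sends non-critical reads to the replica only while its lag is acceptable.
class ReadRouter {
    private final long maxLagMillis;

    ReadRouter(long maxLagMillis) {
        this.maxLagMillis = maxLagMillis;
    }

    ReadTarget route(boolean criticalRead, long observedLagMillis) {
        if (criticalRead) {
            return ReadTarget.PRIMARY;           // must see the latest committed data
        }
        return observedLagMillis <= maxLagMillis
                ? ReadTarget.REPLICA             // eventual consistency is acceptable
                : ReadTarget.PRIMARY;            // replica too stale, fall back
    }
}
```

A balance check during a transfer would be treated as critical, while a transaction-history page could tolerate the replica.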
Real Project Experience
In my Payment Wallet System, I designed a multi-region setup using AWS Route 53 and Aurora Global Database. During a simulated DR drill, I promoted the secondary region with minimal lag (under 1 second) and switched traffic using Route 53 failover routing.
Strong Closing Line:
"The key is combining Route 53 for traffic routing, Aurora Global Database for low-lag replication, and application-level resilience to achieve near-zero downtime during failover."
Sprint Estimation and Blockers: Walk through how you evaluate complexity during sprint planning. What do you do if you realize mid-sprint that a critical technical component cannot be delivered on time?
"I use a structured approach for estimation and handle blockers proactively to protect sprint commitments."
How I Evaluate Complexity During Sprint Planning
- Story Breakdown: I break user stories into technical tasks and identify unknowns.
- Complexity Factors
- Technical complexity (new technology, integration points)
- Domain complexity
- Dependency on other teams
- Testing effort
- Performance / Security requirements
- Estimation Technique: I use Planning Poker with Fibonacci points. I consider worst-case scenarios and add buffer for unknowns.
What I Do if I Realize Mid-Sprint a Critical Component Cannot Be Delivered on Time
- Immediate Transparency: Raise it in the daily standup and inform the Product Owner immediately.
- Impact Analysis: Assess the impact on sprint goal and dependent stories.
- Mitigation Options
- Reprioritize stories.
- Split the story into smaller deliverable parts.
- Bring in additional help or pair programming.
- Negotiate scope with PO.
- Lessons Learned: Document the root cause in the retrospective to improve future estimations.
Real Project Experience
In my Payment Wallet System project, during one sprint, I realized a complex integration with an external payment gateway would take longer than estimated. I informed the PO early, split the story, delivered the core functionality, and moved the integration to the next sprint. This helped us meet the sprint goal and maintain trust.
Strong Closing Line:
"I believe in early transparency and proactive mitigation. Good estimation comes with experience, and handling blockers well is what keeps the team delivering reliably."
Managing Technical Friction: As a Tech Lead, how do you handle a scenario where two senior developers on your team are completely deadlocked on a major architectural or low-level design approach?
"As a Tech Lead, I treat such deadlocks as an opportunity to improve the design rather than a conflict to resolve quickly."
My Step-by-Step Approach
- Listen to Both Sides: I schedule a focused meeting and let both developers present their approaches with pros and cons.
- Focus on Business Goals: I bring the discussion back to the business requirements, timeline, and non-functional needs (scalability, maintainability, performance).
- Data-Driven Decision: I encourage quick PoCs (Proof of Concepts) or spike stories to compare both approaches objectively.
- Decision Making: If no clear winner, I make the final call based on long-term maintainability and team consensus, explaining my reasoning clearly.
- Post-Decision: I ensure the chosen approach is documented and the team moves forward with full support.
Real Project Experience
In my Payment Wallet System project, two senior developers had a deadlock on whether to use Saga Orchestration or Choreography for the money transfer flow. I facilitated a session where both presented their views. We did a small PoC for both, and finally chose Orchestration for better traceability. Both developers felt heard and supported the final decision.
Strong Closing Line:
"I believe healthy technical disagreements lead to better solutions. My role as Tech Lead is to facilitate open discussion, use data, and make timely decisions while keeping the team motivated."
Handling Changing Requirements: How do you manage technical delivery when a Product Owner or Client Business Analyst requests major changes to a feature requirement in the middle of an active sprint?
"Changing requirements mid-sprint is common in agile environments. I handle it by protecting the team while maintaining a good relationship with the Product Owner."
My Step-by-Step Approach
- Acknowledge and Understand: I listen carefully to the change request and clarify the business impact.
- Impact Analysis: I quickly evaluate the technical impact on the current sprint (effort, dependencies, testing).
- Discuss with PO: I present the impact honestly: what can be delivered in the current sprint and what needs to move to the next sprint.
- Decision Making: If the change is critical, I negotiate scope reduction or move some stories to the next sprint. I never let major changes derail the sprint goal without PO agreement.
- Documentation: I update the story in Jira with the new requirements and note the change for retrospective.
Real Project Experience
In my Payment Wallet System project, the Product Owner once requested a major change in the transfer flow mid-sprint. I showed the impact on timeline and suggested splitting the story. We delivered the core functionality in the current sprint and moved the new requirement to the next sprint. This kept the team focused and maintained trust with the PO.
Strong Closing Line:
"I protect the sprint goal while being flexible. Clear communication and impact analysis help me balance business needs with technical delivery."
Balancing Technical Debt with Feature Delivery: How do you convince non-technical business stakeholders or project managers to allocate sprint capacity for code refactoring and technical debt reduction?
"I treat technical debt as a business risk and communicate it in business terms rather than technical jargon."
My Approach
- Speak in Business Language
I explain technical debt using business impact:
- Slower feature delivery
- Higher bug rates
- Increased maintenance cost
- Risk of production outages
- Quantify the Cost
I show data:
- "Last sprint, we spent 40% of time working around legacy code."
- "Refactoring this module will reduce future development time by 30%."
- Show Business Value
I link refactoring to:
- Faster time-to-market for new features
- Better user experience
- Reduced operational risk
- Propose Incremental Approach: I suggest allocating 10-20% of sprint capacity for technical debt every sprint instead of big-bang refactoring.
Real Project Experience
In my Payment Wallet System project, I convinced the Product Owner to allocate 15% of sprint capacity for refactoring the transaction service. I showed how the current code was slowing down new feature delivery. After two sprints, we reduced bug count and improved development speed significantly.
Strong Closing Line:
"I always frame technical debt as a business investment that pays off in faster delivery and lower risk. Regular small investments in refactoring keep the codebase healthy without impacting feature delivery."
Mentoring and Code Standards: How do you ensure that junior or mid-level developers on your team strictly adhere to the defined code quality metrics and architectural standards without micromanaging them?
"I believe in building a culture of quality rather than enforcing rules through micromanagement."
My Approach
- Clear Standards: I define clear, documented code standards and architectural guidelines (using tools like SonarQube, Checkstyle, and architecture decision records).
- Automated Enforcement: I set up a CI/CD pipeline with automated checks (SonarQube quality gates, linting, architecture tests) so feedback is immediate and objective.
- Mentoring through Pairing and Reviews: I encourage pair programming for critical modules and do constructive code reviews focusing on learning rather than criticism.
- Regular Knowledge Sharing: I conduct weekly tech huddles and code walkthrough sessions to explain why certain standards exist.
- Ownership Culture: I give developers ownership of modules and make them responsible for maintaining quality in their areas.
Real Project Experience
In my Payment Wallet System project, I set up SonarQube quality gates and automated architecture tests in the CI pipeline. I also conducted regular code review sessions where we discussed best practices. This helped junior developers internalize the standards without me having to micromanage them.
Strong Closing Line:
"I focus on automation, mentoring, and building ownership. When developers understand the 'why' behind the standards, they naturally follow them."
Catching Technical Debt in PR Reviews: As a Technical Lead conducting peer reviews on a Pull Request (PR), what specific structural flaws, anti-patterns, or architectural liabilities cause you to reject the code?
"As a Technical Lead, I review PRs not just for correctness but for long-term maintainability. I reject or request major changes when I see the following issues:"
Key Reasons I Reject or Request Changes
- Violation of Clean Architecture / Layering: Mixing concerns (e.g., controller doing business logic or data access).
- Tight Coupling: Direct dependency on concrete implementations instead of interfaces. Hardcoded dependencies.
- Duplicated Code: Repeated logic that should be extracted into a shared utility or service.
- Lack of Proper Error Handling: Swallowing exceptions or returning generic error messages.
- Missing Tests: No unit or integration tests for critical logic.
- Performance Issues: N+1 queries, inefficient loops, or missing indexing hints.
- Security Concerns: Missing input validation or exposing sensitive data.
- Not Following Team Standards: Inconsistent naming, missing documentation, or not using established patterns.
Real Project Experience
In my Payment Wallet System project, I rejected several PRs where developers mixed business logic in controllers or created tight coupling with external services. I used these reviews as mentoring opportunities to explain the long-term impact.
Strong Closing Line:
"I reject PRs when I see issues that will create technical debt or maintenance nightmares in the future. My goal is to keep the codebase clean and scalable."
Enterprise Cross-Cutting Concerns: How do you design a reusable corporate-wide reference template or custom Spring Boot Starter to enforce standardized global exception handling and structured JSON logging across teams?
"To enforce consistency across teams, I create a custom Spring Boot Starter that provides reusable components for cross-cutting concerns like global exception handling and structured logging."
My Design Approach
- Custom Spring Boot Starter: I create a separate Maven module called company-spring-boot-starter.
- Global Exception Handling
- A @ControllerAdvice class with standardized error response format.
- Custom exceptions mapped to proper HTTP status codes and error codes.
- Structured JSON Logging
- A custom Logback configuration with structured JSON output.
- Include correlation ID, trace ID, and service name in every log.
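- Auto-Configuration: The starter uses @AutoConfiguration (available from Spring Boot 2.7 and listed in META-INF/spring/org.springframework.boot.autoconfigure.AutoConfiguration.imports) to automatically register these components when the starter is included in any microservice.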
Real Project Experience
In my Payment Wallet System, I created a custom starter that provided standardized exception handling and JSON logging. All microservices included this starter, ensuring consistent error responses and log format across services.
Strong Closing Line:
"A custom Spring Boot Starter is the best way to enforce enterprise-wide standards while keeping individual services clean and focused on business logic."
Object-Oriented Design (LLD Scenarios): How do you design real-world systems (like a Movie Ticket Booking System, an ATM, or a Parking Lot) using clean class structures, SOLID principles, and appropriate Gang of Four (GoF) design patterns?
"I approach Low-Level Design by focusing on SOLID principles and GoF patterns to create clean, extensible, and maintainable systems."
General Approach
- Identify Core Entities and Relationships
- Apply SOLID Principles
- Single Responsibility, Open-Closed, Liskov Substitution, Interface Segregation, Dependency Inversion.
- Use Appropriate GoF Patterns
- Strategy, Factory, Observer, Decorator, etc.
Example: Movie Ticket Booking System
Key Classes:
- Movie, Theater, Show, Seat, Booking, User
Patterns Used:
- Strategy Pattern for different payment methods.
- Factory Pattern for creating different ticket types.
- Observer Pattern for sending booking confirmation notifications.
SOLID Application:
- Each class has single responsibility.
- Payment strategies are interchangeable.
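As an illustration of the Strategy Pattern for payment methods, here is a minimal sketch. All class names (`PaymentStrategy`, `CardPayment`, `UpiPayment`, `BookingCheckout`) and the returned receipt strings are hypothetical; real implementations would call a payment gateway.

```java
import java.util.Map;

// Strategy: one interface, interchangeable payment implementations.
interface PaymentStrategy {
    String pay(String bookingId, long amountMinorUnits);
}

class CardPayment implements PaymentStrategy {
    public String pay(String bookingId, long amountMinorUnits) {
        return "CARD:" + bookingId + ":" + amountMinorUnits;
    }
}

class UpiPayment implements PaymentStrategy {
    public String pay(String bookingId, long amountMinorUnits) {
        return "UPI:" + bookingId + ":" + amountMinorUnits;
    }
}

// Depends only on the abstraction (Dependency Inversion); adding a new
// payment method needs a new class, not a change here (Open-Closed).
class BookingCheckout {
    private final Map<String, PaymentStrategy> strategies;

    BookingCheckout(Map<String, PaymentStrategy> strategies) {
        this.strategies = strategies;
    }

    String checkout(String bookingId, long amountMinorUnits, String method) {
        PaymentStrategy strategy = strategies.get(method);
        if (strategy == null) {
            throw new IllegalArgumentException("Unsupported payment method: " + method);
        }
        return strategy.pay(bookingId, amountMinorUnits);
    }
}
```

Supporting a new method such as a wallet payment would mean adding one class and one map entry, leaving `BookingCheckout` untouched.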
API Contract Versioning: How do you manage API contract changes and versioning in production without breaking backward compatibility for existing downstream enterprise clients?
"Managing API versioning is critical when you have enterprise clients that cannot upgrade immediately. I follow a structured approach to maintain backward compatibility."
My Versioning Strategy
- URL Versioning I use URL-based versioning (e.g., /api/v1/users, /api/v2/users).
- Backward Compatibility Rules
- Never remove or change existing fields in responses.
- Add new fields with default values.
- Make new request fields optional.
- Deprecation Policy
- Mark old versions as deprecated with proper headers.
- Give clients 3–6 months notice before retiring old versions.
- API Gateway Role
- The API Gateway routes requests to the correct versioned service.
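The backward compatibility rules can be sketched with two request contracts that share one implementation. Everything here is hypothetical, including the class names, the 1% fee, and the coupon discount; the point is only that v2 adds an optional field and v1 behaves exactly as before.

```java
import java.util.Optional;

// v1 contract: left unchanged so existing clients keep working.
class TransferRequestV1 {
    final long amount;

    TransferRequestV1(long amount) {
        this.amount = amount;
    }
}

// v2 contract: adds an optional field; when absent, behaviour matches v1.
class TransferRequestV2 {
    final long amount;
    final Optional<String> couponCode;

    TransferRequestV2(long amount, String couponCode) {
        this.amount = amount;
        this.couponCode = Optional.ofNullable(couponCode);
    }
}

class TransferHandler {
    // Hypothetical rule for illustration: 1% fee, and any coupon takes 10% off the fee.
    long feeV2(TransferRequestV2 req) {
        long baseFee = req.amount / 100;
        return req.couponCode.isPresent() ? baseFee - baseFee / 10 : baseFee;
    }

    // The v1 endpoint adapts to the v2 logic with the new field left empty.
    long feeV1(TransferRequestV1 req) {
        return feeV2(new TransferRequestV2(req.amount, null));
    }
}
```

Keeping a single implementation behind both versions avoids the two endpoints drifting apart while v1 is still supported.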
Real Project Experience
In my Payment Wallet System, I started with /api/v1/transactions/transfer. When I added new features (coupon support), I created /api/v2/transactions/transfer. Old clients continued using v1 without any change.
Strong Closing Line:
"I always prioritize backward compatibility by using versioning, adding new fields instead of changing existing ones, and maintaining proper deprecation policies. This ensures existing clients are not broken when new features are added."
Dependency and Core Framework Upgrades: When a critical security vulnerability (like a Log4j or Spring framework exploit) is flagged, how do you plan and execute a mass dependency upgrade across multiple decoupled repositories?
"When a critical security vulnerability is flagged, I treat it as a high-priority security incident and follow a structured upgrade process."
My Step-by-Step Approach
- Impact Assessment
- Identify all repositories and services using the vulnerable dependency.
- Create a Centralized Upgrade Plan
- Prioritize services based on exposure and business criticality.
- Create a shared upgrade branch or ticket for tracking.
- Test in Isolation
- Upgrade in a development environment first.
- Run full regression tests and performance tests.
- Phased Rollout
- Start with non-critical services.
- Use canary releases for critical services.
- Monitor closely after deployment.
- Automated Tools
- Use Dependabot or Renovate for automated PRs.
- Run security scans regularly with tools like OWASP Dependency Check.
Real Project Experience
In my Payment Wallet System, when a Spring Security vulnerability was flagged, I coordinated upgrades across all microservices. I created a shared upgrade plan, tested thoroughly, and rolled out changes in phases. This ensured zero downtime and quick resolution.
Strong Closing Line:
"I treat security vulnerabilities as high-priority incidents and use a structured, phased approach with thorough testing to ensure safe mass upgrades across repositories."
IAM Roles & Secrets Manager (Security): How does your Spring Boot application securely fetch database passwords or API keys from AWS Secrets Manager at runtime without hardcoding them in application.yml or properties files?
"I never hardcode any sensitive information in application.yml or properties files. I fetch them dynamically at runtime from AWS Secrets Manager using IAM Roles."
My Approach
- IAM Role: The ECS task or EC2 instance is assigned an IAM Role with permission to read from Secrets Manager (least privilege).
- Fetch at Runtime: I use AWS SDK to fetch secrets during application startup.
- Spring Boot Integration
@Configuration
public class SecretsConfig {
    private final SecretsManagerClient secretsManagerClient;

    public SecretsConfig() {
        // create() uses the default credential chain, which picks up the IAM role of the ECS task or EC2 instance
        this.secretsManagerClient = SecretsManagerClient.create();
    }

    @Bean
    public String dbPassword() {
        return secretsManagerClient.getSecretValue(
                GetSecretValueRequest.builder()
                        .secretId("payment-db-credentials")
                        .build()
        ).secretString();
    }
}
In application.yml, I only keep non-sensitive config.
Real Project Experience
In my Payment Wallet System deployed on AWS ECS, I used AWS Secrets Manager to store database credentials and Redis password. The application fetched them at startup using the IAM role attached to the ECS task. This ensured no secrets were present in the code or Docker images.
Strong Closing Line:
"This approach follows Zero-Trust security — secrets are never in code, fetched at runtime, and access is controlled via IAM roles."
EC2 vs. Lambda (Compute Trade-offs): What are the architectural, performance, and "Cold Start" latency trade-offs of deploying a heavy Java Spring Boot microservice inside an AWS Lambda (Serverless) function versus running it on a continuous AWS EC2 instance?
"Both AWS Lambda and EC2 have different strengths. I choose based on workload characteristics."
Trade-offs
AWS Lambda (Serverless):
- Architectural: Event-driven, auto-scaling, no server management.
- Performance: Good for sporadic workloads. Can scale to thousands of concurrent executions.
- Cold Start Latency: Significant cold start (1-5 seconds for Java Spring Boot) because the JVM needs to initialize.
- Cost: Pay-per-execution — very cheap for low traffic, expensive for sustained high load.
- Best For: Infrequent or bursty workloads (notifications, scheduled jobs, low-traffic APIs).
AWS EC2 (Continuous Instance):
- Architectural: Traditional, always running, full control over the server.
- Performance: Consistent low latency after startup. No cold start.
- Cold Start Latency: None (always warm).
- Cost: Fixed cost based on instance size — better for steady high traffic.
- Best For: High-frequency APIs, real-time processing, stateful services, heavy Spring Boot applications.
My Decision Framework
In my Payment Wallet System, I used EC2 / ECS Fargate for core services (payment processing, wallet management) because they needed consistent low latency. I used Lambda for non-critical tasks like sending notifications.
Strong Closing Line:
"I use Lambda for sporadic, event-driven workloads and EC2/Fargate for high-frequency, latency-sensitive services. The choice depends on traffic pattern, latency requirements, and cost optimization."
ElastiCache for Redis (Caching Pattern): How do you design a high-performance, distributed cache-aside architecture using AWS ElastiCache for Redis to offload read-heavy traffic from your primary database, and how do you handle cache expiration (TTL)?
"I use a Cache-Aside Pattern with AWS ElastiCache for Redis to offload read-heavy traffic from the primary database."
Cache-Aside Architecture
- Read Flow
- Application first checks Redis cache for the data.
- If found (cache hit) → Return immediately.
- If not found (cache miss) → Query the database, store in Redis with TTL, and return.
- Write Flow
- Update the database first.
- Then invalidate or update the corresponding cache entry.
Example Code:
public WalletResponse getWalletBalance(Long walletId) throws JsonProcessingException {
    String cacheKey = "wallet:" + walletId;
    // Check cache
    String cached = redisTemplate.opsForValue().get(cacheKey);
    if (cached != null) {
        return objectMapper.readValue(cached, WalletResponse.class);
    }
    // Cache miss - query DB (assuming a Spring Data repository, findById returns Optional)
    Wallet wallet = walletRepository.findById(walletId)
            .orElseThrow(() -> new IllegalArgumentException("Wallet not found: " + walletId));
    WalletResponse response = WalletResponse.fromEntity(wallet);
    // Store in cache with TTL
    redisTemplate.opsForValue().set(cacheKey, objectMapper.writeValueAsString(response), 5, TimeUnit.MINUTES);
    return response;
}
Cache Expiration (TTL)
- I set a reasonable TTL (e.g., 5-10 minutes) based on data volatility.
- For critical data like wallet balance, I use shorter TTL + active invalidation on write.
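The "short TTL + active invalidation on write" combination can be sketched in plain Java as below. This is a minimal illustration, not the project's implementation: TtlCache is a hypothetical in-memory stand-in for Redis, the Map plays the role of the wallet table, and the 1-minute TTL is an assumed value.

```java
import java.time.Duration;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.LongSupplier;

/** Hypothetical in-memory stand-in for Redis: each entry expires after its TTL. */
class TtlCache {
    private static final class Entry {
        final String value;
        final long expiresAtMillis;
        Entry(String value, long expiresAtMillis) {
            this.value = value;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> entries = new ConcurrentHashMap<>();
    private final LongSupplier clock; // injectable so expiry can be exercised without sleeping

    TtlCache(LongSupplier clock) {
        this.clock = clock;
    }

    Optional<String> get(String key) {
        Entry e = entries.get(key);
        if (e == null) {
            return Optional.empty();
        }
        if (clock.getAsLong() >= e.expiresAtMillis) {
            entries.remove(key, e); // lazily evict the expired entry
            return Optional.empty();
        }
        return Optional.of(e.value);
    }

    void set(String key, String value, Duration ttl) {
        entries.put(key, new Entry(value, clock.getAsLong() + ttl.toMillis()));
    }

    void delete(String key) {
        entries.remove(key);
    }
}

/** Cache-aside reads with a short TTL, plus invalidation on every write. */
class WalletBalanceCache {
    private static final Duration BALANCE_TTL = Duration.ofMinutes(1); // assumed "short" TTL

    private final TtlCache cache;
    private final Map<Long, String> database; // stand-in for the wallet table
    int databaseReads = 0; // exposed only so cache hits can be observed

    WalletBalanceCache(TtlCache cache, Map<Long, String> database) {
        this.cache = cache;
        this.database = database;
    }

    String getBalance(long walletId) {
        String key = "wallet:" + walletId;
        Optional<String> cached = cache.get(key);
        if (cached.isPresent()) {
            return cached.get(); // cache hit
        }
        databaseReads++; // cache miss: go to the source of truth
        String balance = database.get(walletId);
        if (balance != null) {
            cache.set(key, balance, BALANCE_TTL);
        }
        return balance;
    }

    void updateBalance(long walletId, String newBalance) {
        database.put(walletId, newBalance); // 1. write to the database first
        cache.delete("wallet:" + walletId); // 2. then drop the stale cache entry
    }
}
```

Deleting the key on write, rather than overwriting it, leaves the next read to repopulate the cache from the database. The TTL then acts only as a safety net for writes that bypass this code path.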
Real Project Experience
In my Payment Wallet System, I used Redis (ElastiCache) for caching wallet balances and transaction summaries. This reduced database load significantly during peak hours.
Strong Closing Line:
"Cache-Aside with Redis + proper TTL and invalidation strategy helps me achieve high read performance while keeping the primary database load low."
RDS Connection Pools & Alerting (Database Performance): If your application encounters connection timeouts or high latency under peak transactional load, how do you cross-verify AWS CloudWatch metrics for RDS against your application’s internal HikariCP connection pool settings?
"When I see connection timeouts or high latency under peak load, I cross-verify AWS CloudWatch metrics for RDS with my application’s HikariCP settings to find the root cause."
My Diagnostic Approach
- Check CloudWatch Metrics for RDS
- DatabaseConnections — Number of active connections.
- FreeableMemory and CPUUtilization.
- ReadLatency and WriteLatency.
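- ConnectionAttempts and LoginFailures (published for Aurora MySQL; on other engines, look for failed connection attempts in the database error log).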
- Check Application HikariCP Metrics
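- hikaricp.connections.active (connections currently in use)
- hikaricp.connections.pending (threads waiting for a connection)
- hikaricp.connections.max (configured pool ceiling)
- hikaricp.connections.usage (how long connections are held)
- hikaricp.connections.creation (time taken to open new connections)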
- Cross-Verification
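- If RDS shows DatabaseConnections close to max_connections, but HikariCP shows low active connections → The connections are open but idle, or held by other clients. Check the total demand across the fleet (instances × maximum-pool-size) against max_connections before changing the pool.
- If HikariCP shows active connections near max with threads pending → The pool is exhausted. Leaked connections also look like this, because a connection that is never returned stays counted as active. Rule out a leak first, then optimize queries or increase the HikariCP pool size.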
- Common Fixes
- Increase HikariCP maximum-pool-size.
- Optimize slow queries.
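- Set connection-timeout and leak-detection-threshold in HikariCP. A validation query (connection-test-query) is only needed for legacy drivers that do not support JDBC4 isValid().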
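The cross-verification step can be written down as a small decision function. The sketch below is only an illustration under stated assumptions: the 90% "near limit" threshold, the class names, and the diagnosis labels are all hypothetical, and hikaricp.connections.pending (threads waiting for a connection) is used as an extra input.

```java
/** Metrics sampled at the same moment from CloudWatch and from one application instance. */
class PoolSnapshot {
    final int rdsConnections;    // CloudWatch DatabaseConnections
    final int rdsMaxConnections; // the database engine's max_connections setting
    final int hikariActive;      // hikaricp.connections.active
    final int hikariMax;         // hikaricp.connections.max
    final int hikariPending;     // hikaricp.connections.pending

    PoolSnapshot(int rdsConnections, int rdsMaxConnections,
                 int hikariActive, int hikariMax, int hikariPending) {
        this.rdsConnections = rdsConnections;
        this.rdsMaxConnections = rdsMaxConnections;
        this.hikariActive = hikariActive;
        this.hikariMax = hikariMax;
        this.hikariPending = hikariPending;
    }
}

/** Hypothetical labels for where to look next. */
enum Diagnosis {
    POOL_EXHAUSTED,            // this pool is the bottleneck: slow queries, leaks, or undersized pool
    CONNECTIONS_HELD_BUT_IDLE, // the database is full, but this pool is not doing the work
    DATABASE_AT_LIMIT,         // both sides are saturated
    LOOK_ELSEWHERE             // neither limit is close: check latency, CPU, and memory instead
}

class ConnectionDiagnosis {
    static final double NEAR_LIMIT = 0.9; // assumed threshold, tune per system

    static Diagnosis diagnose(PoolSnapshot s) {
        boolean dbNearLimit = s.rdsConnections >= NEAR_LIMIT * s.rdsMaxConnections;
        boolean poolNearLimit = s.hikariActive >= NEAR_LIMIT * s.hikariMax;

        if (dbNearLimit && poolNearLimit) {
            return Diagnosis.DATABASE_AT_LIMIT;
        }
        if (poolNearLimit && s.hikariPending > 0) {
            return Diagnosis.POOL_EXHAUSTED;
        }
        if (dbNearLimit) {
            return Diagnosis.CONNECTIONS_HELD_BUT_IDLE;
        }
        return Diagnosis.LOOK_ELSEWHERE;
    }

    /** Upper bound on connections the whole fleet can open against the database. */
    static int fleetDemand(int instances, int maxPoolSizePerInstance) {
        return instances * maxPoolSizePerInstance;
    }
}
```

The fleetDemand check is worth running before raising the pool size: for example, 12 instances with a pool of 10 can open up to 120 connections, which already exceeds a max_connections of 100.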
Real Project Experience
In my Payment Wallet System, we faced connection timeouts during peak hours. I checked CloudWatch and saw that the RDS connection count was high. I then checked the HikariCP metrics and found the pool was exhausted. I increased the pool size and optimized the slow queries, which resolved the issue.
Strong Closing Line:
"I always cross-verify CloudWatch RDS metrics with application-level HikariCP metrics to identify whether the bottleneck is at the database or application connection pool."
SQS Messaging & Idempotency (Event-Driven Architecture): If a Java service processes a message from an AWS SQS (Simple Queue Service) queue but crashes before deleting it, how do you configure SQS Visibility Timeouts and enforce application-layer Idempotency to handle the duplicate message retry safely?
"This is a classic problem in event-driven systems. I handle it using SQS Visibility Timeout and application-level idempotency."
How I Handle It
- SQS Visibility Timeout
- When a message is received, SQS hides it from other consumers for a specific time (Visibility Timeout).
- I set it to a value higher than the expected processing time (e.g., 5-10 minutes).
- If the consumer crashes before deleting the message, it becomes visible again after the timeout and will be retried.
- Application-Level Idempotency
- Every message must carry a unique messageId or idempotencyKey.
- The service checks if the message has already been processed before executing business logic.
Example Code:
@SqsListener("payment-events-queue")
public void processPaymentEvent(PaymentEvent event) {
    // Step 1: Check idempotency
    if (idempotencyService.isAlreadyProcessed(event.getMessageId())) {
        log.info("Message {} already processed. Skipping.", event.getMessageId());
        return; // Returning normally lets the listener acknowledge the duplicate
    }
    try {
        // Step 2: Business logic
        processPayment(event);
        // Step 3: Mark as processed
        idempotencyService.markAsProcessed(event.getMessageId());
        // Delete message from SQS (if using manual deletion)
        // sqsTemplate.deleteMessage(...);
    } catch (Exception ex) {
        log.error("Failed to process message {}", event.getMessageId(), ex);
        // Rethrow. Assuming the listener acknowledges on successful return (the
        // Spring Cloud AWS default), swallowing the exception would delete the
        // message instead of letting SQS retry it after the visibility timeout.
        throw ex;
    }
}
Real Project Experience
In my Payment Wallet System, I used SQS for event-driven flows. I set a visibility timeout of 10 minutes and enforced idempotency using Redis. This ensured that even if a consumer crashed, the message was safely retried without duplicate processing.
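One caveat with a separate "check, then mark" sequence is that two consumers can both pass the check before either marks the message. A common way to close that gap is an atomic claim, which in Redis corresponds to SET with the NX option. The sketch below shows the idea in plain Java, with ConcurrentHashMap.putIfAbsent standing in for Redis. The class names are hypothetical and this is not the project's actual code.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Consumer;

/** In-memory stand-in for a Redis-backed idempotency store (illustrative only). */
class IdempotencyStore {
    private final Map<String, String> keys = new ConcurrentHashMap<>();

    /** Atomically claims the id; returns true only for the first caller. */
    boolean tryClaim(String messageId) {
        return keys.putIfAbsent(messageId, "IN_PROGRESS") == null;
    }

    void markDone(String messageId) {
        keys.put(messageId, "DONE");
    }

    /** Releases an unfinished claim so that a redelivery can retry. */
    void release(String messageId) {
        keys.remove(messageId, "IN_PROGRESS");
    }
}

class IdempotentHandler {
    private final IdempotencyStore store;
    private final Consumer<String> businessLogic;

    IdempotentHandler(IdempotencyStore store, Consumer<String> businessLogic) {
        this.store = store;
        this.businessLogic = businessLogic;
    }

    /** Returns true if the business logic ran, false if the message was a duplicate. */
    boolean handle(String messageId, String payload) {
        if (!store.tryClaim(messageId)) {
            return false; // duplicate delivery: skip without side effects
        }
        try {
            businessLogic.accept(payload);
            store.markDone(messageId);
            return true;
        } catch (RuntimeException ex) {
            store.release(messageId); // allow the SQS redelivery to try again
            throw ex;                 // propagate so the message is not acknowledged
        }
    }
}
```

With real Redis, the claim would also need an expiry so that a consumer that crashes mid-processing does not block the redelivery forever. A TTL close to the visibility timeout is a reasonable starting assumption.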
Strong Closing Line:
"The combination of SQS Visibility Timeout and application-level idempotency ensures safe retry without duplicate processing."