project-payment-wallet-system-deep-dive

Interview Readiness - payment-wallet-system

Day	Focus Area	Duration	What We Will Do
1	Project Overview + Storytelling	1 – 1.5 hrs	- Strong 2–3 min project introduction - High-level architecture explanation (with accurate diagram) - Tech stack justification - Your role & overall narrative
2	Core Technical Deep Dive	1.5 – 2 hrs	- Database design & entities - Microservices split & responsibilities - Key APIs & Feign Client usage - Add Money, Balance, Withdraw flows - P2P Transfer flow (Saga with synchronous calls)
3	Saga Pattern, Concurrency & Transactions	1.5 – 2 hrs	- Detailed P2P Transfer Saga (step-by-step with compensation) - Redis Distributed Locking implementation - Idempotency & Consistency handling - Failure scenarios & rollback logic
4	Security, Resilience & Scalability	1.5 hrs	- JWT Authentication at API Gateway - Resilience4j (Circuit Breaker, Retry, Rate Limiting) - Concurrency & Performance considerations - Docker setup & deployment
5	Challenges, Improvements + Full Mock Interview	1.5 – 2 hrs	- Major challenges faced & STAR stories - Future enhancements (honest & realistic) - Common + tricky follow-up questions - Full mock interview + feedback

--------------------------------------------------------------------------------------------------------

Payment Wallet System - Deep Dive (Most Important)

You must be ready to explain this project for 10-15 minutes.

Architecture Overview:

Multi-module Microservices architecture (User Service, Wallet Service, Transaction Service, Notification Service)
Used Spring Boot 3 + Spring Cloud
Inter-service communication via OpenFeign
Service Discovery using Eureka
API Gateway for routing and rate limiting

Key Features & Your Contribution:

Saga Pattern (Orchestration based) for distributed P2P money transfers
Redis Distributed Locking to prevent double spending / concurrency issues
Resilience4j → Circuit Breaker, Retry, Bulkhead, Fallback
JWT Authentication + API Gateway security
Wallet creation, balance enquiry, transaction history

Technical Decisions & Trade-offs:

Why Saga Pattern instead of 2PC?
Why Redis for locking?
How did you handle partial failures?
Database choice (relational; H2 in this project) and why not NoSQL?

Challenges Faced:

Handling distributed transactions
Concurrency during money transfer
Performance & scalability

--------------------------------------------------------------------------------------------------------

What does SpringApplication.run() do in Spring Boot?

=> SpringApplication.run() is the main entry point of a Spring Boot application. It is responsible for bootstrapping (starting) the entire Spring Boot application.

@SpringBootApplication
public class MyApplication {
    public static void main(String[] args) {
        SpringApplication.run(MyApplication.class, args); // bootstraps the whole application
    }
}

What happens when SpringApplication.run() is called?

1. Creates Spring ApplicationContext

=> This is the heart of Spring — the Inversion of Control (IoC) container that manages all beans.

2. Enables Auto-Configuration

=> Automatically configures beans based on the libraries (starters) present in your classpath.

=> Example: If spring-boot-starter-web is added → it auto-configures Tomcat server, DispatcherServlet, etc.

3. Component Scanning

=> Scans your project packages for annotations like @Component, @Service, @Repository, @Controller, @Configuration, etc.

4. Starts the Embedded Server

=> For web applications, it starts an embedded Tomcat (or Jetty/Undertow) server on port 8080 by default.

5. Initializes and Wires All Beans

=> Creates all managed beans and injects dependencies (@Autowired)

6. Runs ApplicationRunners / CommandLineRunners

=> Executes any custom startup logic you have.

7. Keeps the Application Running

=> For web applications, the embedded server's non-daemon threads keep the JVM alive, so the application doesn’t shut down immediately after startup.

Summary

SpringApplication.run() = "Start the complete Spring Boot application with all its magic (auto-configuration, embedded server, dependency injection, etc.)"

How are you encoding and saving the user password? Which password encoder are you using in this project?

=> In this Payment Wallet System, user passwords are securely hashed before saving to the database using Spring Security's PasswordEncoder

=> I am using BCryptPasswordEncoder (the most commonly recommended encoder in Spring Boot applications).

=> The raw password coming from the request is encoded using passwordEncoder.encode(rawPassword)

=> The encoded (hashed) password is then saved in the User entity in the database.

=> During login, we use passwordEncoder.matches(rawPassword, encodedPassword) for verification

Why BCrypt ?

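=> It is a strong, one-way hashing algorithm that adds a random salt to every hash and has a configurable cost (strength) factor, which makes brute-force attacks slow.

=> Spring Security recommends it for password storage. NoOpPasswordEncoder, by contrast, keeps passwords in plain text and is meant only for testing.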

Why did you override doFilterInternal() in JwtAuthenticationFilter?

=> I extended OncePerRequestFilter and overrode the doFilterInternal() method because this is the standard and recommended way in Spring Security to create a custom filter for JWT authentication

=> OncePerRequestFilter guarantees that the filter logic runs only once per request, even if the request passes through the servlet filter chain more than once (for example, on an internal forward).

Overriding doFilterInternal() allows me to write custom logic to:

=> Extract the JWT token from the incoming request.

=> Validate the token.

=> Set the authenticated user in the SecurityContext so that later parts of the application (controllers, services) can access the current user via @AuthenticationPrincipal or SecurityContextHolder.

=> This filter runs before the request reaches the controllers, enabling centralized authentication at the API Gateway level.

Why "Authorization" Header and "Bearer " Prefix?

=> I am using the Authorization header with the Bearer prefix because this is the industry standard for sending JWT tokens

=> Authorization Header: This is the conventional HTTP header used for passing authentication credentials.

=> Bearer Prefix: The word "Bearer" indicates a bearer token, meaning whoever presents (bears) the token is granted access. The actual JWT comes after it, separated by a space.

=> Example :

Authorization: Bearer eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.xxxxx.yyyyy

In the code, I check,

=> If the header exists and starts with "Bearer "

=> Then I extract the token by removing the "Bearer " prefix.

=> This approach is widely used (PhonePe, Razorpay, most modern APIs) and makes the API clean and secure.
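The header check described above can be sketched in plain Java. This is a minimal illustration, not the project's actual filter code: the class name BearerTokenExtractor and the method extract are hypothetical, and the real filter would read the header from the incoming HTTP request.

```java
import java.util.Optional;

/** Hypothetical helper; the class and method names are illustrative only. */
final class BearerTokenExtractor {
    private static final String PREFIX = "Bearer ";

    private BearerTokenExtractor() { }

    /** Returns the token after the "Bearer " prefix, or empty if the header is missing or malformed. */
    static Optional<String> extract(String authorizationHeader) {
        if (authorizationHeader == null || !authorizationHeader.startsWith(PREFIX)) {
            return Optional.empty();
        }
        String token = authorizationHeader.substring(PREFIX.length()).trim();
        if (token.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(token);
    }
}
```

Returning an Optional forces the caller to handle the "no token" case explicitly, which is where the filter would simply continue without setting an authenticated user.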

What is the duty of SecurityFilterChain?

=> SecurityFilterChain is like the security blueprint of the entire application.

It tells Spring Security:

=> Which URLs need protection?

=> Which filter should run?

=> How authentication and authorization should work?

Can you elaborate on Sessions in general? Which session creation approach does this project follow?

Sessions in General (Simple Explanation)

HTTP Session is a mechanism to store user-specific data on the server side across multiple requests.

Why do we need Sessions?

HTTP is stateless — each request is independent. The server doesn’t remember anything about the previous request.
After a user logs in, we need to remember that the user is authenticated for subsequent requests.
Sessions solve this by creating a session ID (usually stored in a cookie called JSESSIONID).

Traditional Session Flow:

User logs in → Server creates a session and stores user details (userId, roles, etc.) in memory / database / Redis.
Server sends back JSESSIONID cookie to the client.
Client sends this cookie in every future request.
Server looks up the session using the ID and knows who the user is.

Problems with Traditional Sessions in Microservices:

Sticky sessions (user must always go to same server instance).
High memory usage on server.
Difficult to scale horizontally.
Not suitable for mobile apps or distributed systems.

Which Session Creation Approach is this Project Following?

=> In this Payment Wallet System, we are not using traditional HTTP Sessions.
Instead, we are following a Stateless approach using JWT (JSON Web Tokens)

=> In SecurityConfig.java, I have explicitly set:

http.sessionManagement(session ->
        session.sessionCreationPolicy(SessionCreationPolicy.STATELESS)
);

Meaning of STATELESS:

=> The server does not create or maintain any session on the server side

=> After successful login, the server returns a JWT token to the client.

=> The client (Postman, mobile app, etc.) sends this JWT token in the Authorization: Bearer <token> header for every subsequent request

=> The JwtAuthenticationFilter validates the token and sets the user in SecurityContext for that request only.

=> This approach is much better for microservices architecture because it is scalable, works well with API Gateway, and doesn’t require session replication across services.

Can you explain the user-service endpoint login flow in high level?

=> Client (Postman / REST client) sends a POST request to /api/users/login with the credentials in the request body and the required headers.

=> Request reaches the API Gateway first

=> As per SecurityConfig, the login endpoint is public, so the JWT filter is skipped (i.e., no authentication is required)

=> Request is routed to User Service

=> The login endpoint in User Controller receives the request and calls the service method userService.generateToken(loginRequestDto)

=> User Service Impl validates the user and, if validation succeeds, returns the token to the controller.

=> User Controller returns the token in the response body to the client

Key points

=> Login is stateless (JWT-based, no server-side session).

=> Password is never stored in plain text (BCrypt hashing).

=> Centralized JWT generation and validation.

=> Login endpoint is public (permitted in SecurityFilterChain).

What is the benefit of Centralized JWT Authentication at API Gateway vs Individual Service-Level JWT Auth?

Endpoint Login flow :

=> Login request reaches the API Gateway first.
=> Since login is a public endpoint, the Gateway allows it without JWT validation.
=> Request is routed to User Service.
=> User Service validates the email/phone and password.
=> If credentials are correct, User Service generates a new JWT token.
=> User Service returns the JWT token to the client.

For All Other Protected Requests (after login):

=> Client sends the JWT token in Authorization: Bearer <token> header.
=> Request first reaches the API Gateway.
=> SecurityConfig - JwtAuthenticationFilter in the Gateway validates the JWT token.
=> After successful validation, the Gateway extracts user details (userId, username, etc.) and adds them to the request headers (e.g., X-User-Id, X-Username).
=> Gateway then routes the request to the respective service (Wallet, Transaction, etc.).
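=> The downstream services do not validate the JWT again. They simply read the user details from the headers and proceed with business logic. This assumes the services are reachable only through the Gateway, otherwise these headers could be forged.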

Benefit

=> The biggest advantage is that JWT validation logic is written and executed only once (at the API Gateway). All other services are relieved from repeating the same validation code.

Why did you create CustomUserDetailsService and override loadUserByUsername()?

=> Spring Security uses the UserDetailsService interface internally during authentication (especially for login).

=> The default implementation doesn't know about my custom User entity (which has fields like email, phone, password, status, etc.).

=> By creating CustomUserDetailsService, I tell Spring Security how to load user details from my database when someone tries to log in

=> That's why I created the class CustomUserDetailsService and overrode the method loadUserByUsername

JpaRepository and CrudRepository are often confused. What is the relation between them, and which common CrudRepository methods do not need to be declared in an interface that extends JpaRepository?

Relationship Between CrudRepository and JpaRepository

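=> CrudRepository is the parent (super) interface

=> CrudRepository comes from Spring Data Commons. It provides basic CRUD (Create, Read, Update, Delete) operations.

=> JpaRepository is a richer, JPA-specific interface that inherits everything from CrudRepository.

Hierarchy (Spring Data 2.x, used with Spring Boot 2) :

CrudRepository
↓
PagingAndSortingRepository
↓
JpaRepository

Hierarchy (Spring Data 3.x, used with Spring Boot 3) : PagingAndSortingRepository no longer extends CrudRepository. JpaRepository instead extends ListCrudRepository (a CrudRepository whose finders return List) and ListPagingAndSortingRepository.

=> When we extend JpaRepository, we automatically get all methods from CrudRepository + extra methods (like pagination, sorting, flushing, etc.).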

Common Functions from CrudRepository (We Don’t Need to Declare)

Method	Purpose
`save(S entity)`	Save or Update entity
`saveAll(Iterable<S> entities)`	Save multiple entities
`findById(ID id)`	Find by primary key
`existsById(ID id)`	Check if entity exists
`findAll()`	Get all records
`findAllById(Iterable<ID> ids)`	Find multiple by IDs
`count()`	Count total records
`deleteById(ID id)`	Delete by ID
`delete(T entity)`	Delete single entity
`deleteAll(Iterable<T> entities)`	Delete multiple
`deleteAll()`	Delete all records

What is the job of JwtUtil, and why did you use a JWT secret key?

Job of JwtUtil:

=> Generate JWT Token — During login, it creates a signed JWT token containing user details (userId, username, roles, expiration time).

=> Validate Token — Checks whether the token is valid, not expired, and properly signed.

=> Extract User Information — Pulls out claims like userId, username, etc., from the token so the application knows who the current user is.

Why JWT Secret Key?

=> The secret key is used to digitally sign the JWT token.

=> It ensures the token cannot be tampered with or forged by anyone.

=> The same secret key is used during validation to verify the token’s authenticity.

Summary

=> JwtUtil is responsible for creating, validating, and reading JWT tokens using a secret key for security
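To make the role of the secret key concrete, here is a minimal sketch of HMAC-SHA256 signing and verification using only the JDK. It is not the project's JwtUtil and not a full JWT library: the class name HmacTokenSketch and its methods are hypothetical, and it deliberately skips claim parsing and expiry checks, which a real implementation would need.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.util.Base64;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical stand-in for JwtUtil; shows only the signing idea. */
final class HmacTokenSketch {
    private static final Base64.Encoder B64 = Base64.getUrlEncoder().withoutPadding();

    private final byte[] secret;

    HmacTokenSketch(String secret) {
        this.secret = secret.getBytes(StandardCharsets.UTF_8);
    }

    /** Builds header.payload.signature, where the signature covers "header.payload". */
    String sign(String payloadJson) {
        String header = B64.encodeToString(
                "{\"alg\":\"HS256\",\"typ\":\"JWT\"}".getBytes(StandardCharsets.UTF_8));
        String payload = B64.encodeToString(payloadJson.getBytes(StandardCharsets.UTF_8));
        String signingInput = header + "." + payload;
        return signingInput + "." + B64.encodeToString(hmac(signingInput));
    }

    /** Recomputes the signature with the same secret and compares in constant time. */
    boolean isSignatureValid(String token) {
        int lastDot = token.lastIndexOf('.');
        if (lastDot <= 0) {
            return false; // not in header.payload.signature form
        }
        byte[] expected = hmac(token.substring(0, lastDot));
        byte[] actual;
        try {
            actual = Base64.getUrlDecoder().decode(token.substring(lastDot + 1));
        } catch (IllegalArgumentException e) {
            return false; // signature part is not valid Base64URL
        }
        return MessageDigest.isEqual(expected, actual);
    }

    private byte[] hmac(String data) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(secret, "HmacSHA256"));
            return mac.doFinal(data.getBytes(StandardCharsets.UTF_8));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("HmacSHA256 is unavailable", e);
        }
    }
}
```

Because the signature depends on both the token content and the secret, changing a single character of the payload, or verifying with a different secret, makes isSignatureValid return false.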

Why did you use Request DTO & Response DTO instead of Entity directly?

Separation of Concerns

=> Entity is designed for database persistence (JPA annotations, relationships, internal fields), while DTOs are designed for API communication. This keeps the layers clean.

Security

=> Entities often contain sensitive fields (like password, internal flags, or database-specific fields).

=> Using DTOs helps me hide sensitive data from the client. For example, I never send the password in any response.

Flexibility & API Evolution

=> API requirements frequently change (adding new fields, changing structure, combining data from multiple entities).

=> DTOs allow me to shape the data exactly as the client needs without modifying the Entity.

=> It protects the internal database model from breaking changes.

Better Control & Maintainability

=> I can apply different validations on DTOs using @Valid

=> I can use ModelMapper or manual mapping to convert between Entity and DTO.

=> It reduces tight coupling between the Controller and the Database layer.

Summary

=> Using DTOs helps me follow the Separation of Concerns principle, improves security by hiding internal fields, and gives flexibility to evolve the API without touching the database entities.
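As a small illustration of the security point, the sketch below maps an entity to a response DTO by hand. The class names UserEntitySketch and UserResponseDtoSketch and their fields are assumptions made for this example; the project itself may use ModelMapper for the same step.

```java
/** Hypothetical entity; field names are assumptions for illustration. */
final class UserEntitySketch {
    final long id;
    final String username;
    final String email;
    final String passwordHash; // internal field, must never reach the client

    UserEntitySketch(long id, String username, String email, String passwordHash) {
        this.id = id;
        this.username = username;
        this.email = email;
        this.passwordHash = passwordHash;
    }
}

/** Hypothetical response DTO: exposes only what the client needs. */
final class UserResponseDtoSketch {
    final long id;
    final String username;
    final String email;

    private UserResponseDtoSketch(long id, String username, String email) {
        this.id = id;
        this.username = username;
        this.email = email;
    }

    /** Manual mapping; the password hash is simply never copied. */
    static UserResponseDtoSketch from(UserEntitySketch user) {
        return new UserResponseDtoSketch(user.id, user.username, user.email);
    }

    @Override
    public String toString() {
        return "UserResponseDto{id=" + id + ", username=" + username + ", email=" + email + "}";
    }
}
```

Since the DTO has no password field at all, a later change to the entity cannot accidentally leak the hash through the API.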

Just explain some major design patterns in general. You may point out example from this project

1. Singleton Pattern (Creational)

=> Purpose: Ensures a class has only one instance and provides global access to it.

=> In this project : JwtUtil, PasswordEncoder, ModelMapper etc. are managed as Singleton beans by Spring

=> Spring registers classes annotated with @Component, @Service, @Controller, @RestController, @Configuration, etc., and objects returned from @Bean methods, as singleton-scoped beans by default.
This is the Spring container's version of the Singleton Design Pattern: one instance per ApplicationContext, rather than one per JVM as in the classic pattern. (You can change it to other scopes (prototype, request, session, etc.) using the @Scope annotation.)

=> Benefit : Saves memory and ensures consistent behavior.

2. Builder Pattern (Creational)

=> Purpose: Allows creating complex objects step by step.

=> In this project : I have not used the Builder Pattern explicitly with @Builder annotation.
I have primarily used Lombok’s @Data annotation on DTOs and Entities for generating getters, setters, and other utility methods

=> While Builder Pattern is useful for complex object construction, I kept the DTOs simple in this project as the request/response objects are relatively straightforward.

=> Benefit : Clean and readable object creation

3. Factory Pattern (Creational)

=> Purpose: Creates objects without exposing the creation logic.

=> The methods annotated with @Bean in @Configuration classes are Spring’s way of implementing the Factory Pattern

=> In this project : For example, in ModelMapperConfig, the modelMapper() method acts as a factory method that creates and returns a ModelMapper instance

Aspect	Explanation
Factory Pattern	The `@Bean` method itself is acting as a Factory. It is responsible for creating the object (`new ModelMapper()`).
Singleton Pattern	By default, Spring creates and manages the bean returned by `@Bean` method as a Singleton. Only one instance is created and shared throughout the application.

@Bean
public ModelMapper modelMapper() {
    return new ModelMapper();
}

=> The method modelMapper() is acting as a Factory Method — it is responsible for creating and returning a new ModelMapper object.

=> This is the essence of the Factory Pattern: hiding the creation logic and providing a method to get the object.

=> However, because it is a @Bean method inside a @Configuration class, Spring manages the returned object as a Singleton by default.

=> Summary : This is an example of the Factory Pattern, because the @Bean method is responsible for creating and returning the ModelMapper object.
At the same time, Spring registers it as a Singleton bean by default. This default singleton behavior comes from Spring’s ApplicationContext, though the @Configuration annotation helps in defining the bean properly.

4. Strategy Pattern (Behavioral)

=> Purpose: Defines a family of algorithms, encapsulates each one, and makes them interchangeable.

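=> Example in project: PasswordEncoder is a strategy interface. The login code depends only on PasswordEncoder, and BCryptPasswordEncoder is one interchangeable implementation that could be swapped without touching the callers.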

5. Template Method Pattern (Behavioral)

=> Purpose: Defines the skeleton of an algorithm in a base class, letting subclasses override specific steps.

=> Where used: JwtAuthenticationFilter extends OncePerRequestFilter. The parent class defines the filter flow, and you override doFilterInternal().
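The Template Method idea can be sketched without Spring. The classes below are hypothetical and only mimic the role of OncePerRequestFilter: the request is modelled as a plain Map, and the filter chain is left out for brevity.

```java
import java.util.Map;

/** Hypothetical base class mimicking the role of OncePerRequestFilter; not Spring's code. */
abstract class OncePerRequestSketch {
    private static final String ALREADY_FILTERED = "sketch.alreadyFiltered";

    /** Template method: the fixed skeleton, which subclasses cannot override. */
    final void doFilter(Map<String, Object> request) {
        if (request.containsKey(ALREADY_FILTERED)) {
            return; // this request was already handled once, so skip the custom step
        }
        request.put(ALREADY_FILTERED, Boolean.TRUE);
        doFilterInternal(request);
    }

    /** The single step that subclasses customise. */
    protected abstract void doFilterInternal(Map<String, Object> request);
}

/** Hypothetical subclass standing in for JwtAuthenticationFilter. */
final class JwtFilterSketch extends OncePerRequestSketch {
    int invocations = 0;

    @Override
    protected void doFilterInternal(Map<String, Object> request) {
        invocations++;
        Object header = request.get("Authorization");
        if (header instanceof String && ((String) header).startsWith("Bearer ")) {
            request.put("authenticated", Boolean.TRUE); // token validation is omitted here
        }
    }
}
```

The parent owns the "run once" bookkeeping, while the subclass supplies only the authentication step, which is the division of labour the pattern describes.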

6. Adapter Pattern (Structural)

=> Purpose: Allows incompatible interfaces to work together

=> Where used: Feign Clients act as adapters between your services and external/internal REST calls.

7. Facade Pattern (Structural)

=> Purpose: Provides a simplified interface to a complex subsystem.

=> Where used: TransactionService acts as a facade for the entire P2P transfer Saga (hiding debit, credit, compensation, notification complexity).

8. Observer Pattern (Behavioral)

=> When to use? When one object needs to notify other objects about changes.

=> Common example: Parking Lot notifying when it's full.

=> I have not used this pattern in this project

Project payment-wallet-system introduction

=> One of my key projects is the Payment Wallet System, a distributed microservices-based digital wallet application built with Spring Boot and Spring Cloud
=> I designed and developed this system to handle essential wallet operations such as user registration, adding money, balance inquiry, and P2P money transfers
=> The architecture includes four main microservices: User Service, Wallet Service, Transaction Service, and Notification Service, along with an API Gateway and Eureka Server for service discovery
=> For the critical P2P money transfer feature, I implemented the Saga Pattern using synchronous Feign Client calls between services, along with compensation logic to handle failures gracefully
=> To manage concurrency and prevent issues like double-spending, I used Redis Distributed Locking in the Wallet Service
=> Security is managed centrally through JWT authentication at the API Gateway level. I also incorporated Resilience4j for implementing Circuit Breaker, Retry, and Rate Limiting patterns.

=> The entire application is fully Dockerized using Docker Compose.

=> This project strengthened my expertise in building reliable distributed systems, managing cross-service transactions, and applying production-grade resilience and security practices

High-Level Architecture Explanation

=> API Gateway: Single entry point for all client requests. Handles JWT authentication and rate limiting centrally.

=> Service Discovery: Eureka Server allows services to register and discover each other dynamically.

=> Inter-service Communication: Synchronous calls using OpenFeign clients.

=> Concurrency Control: Redis is used for distributed locks, especially during money transfers.

=> Resilience: Resilience4j is configured on Feign clients for fault tolerance.

=> Deployment: All services are containerized and managed via Docker Compose.

Tech stack justification

=> Spring Boot 3 + Spring Cloud: Excellent for building maintainable microservices with built-in support for cloud patterns.

=> OpenFeign: Clean, declarative way to make inter-service REST calls.

=> Redis: Best suited for fast, distributed locking to ensure data consistency during concurrent transfers.

=> Resilience4j: Lightweight and native integration with Spring Boot for implementing Circuit Breaker, Retry, etc.

=> JWT at Gateway: Centralized security avoids repeating auth logic in every service.

Database Design & Entities

=> The project uses an H2 in-memory database (one per service)

Main Entities

User Service:
User entity (id, username, password, email, fullName, userStatus, etc.)

Wallet Service:
Wallet entity (id, userId, balance, currency, status, etc)

Transaction Service:
Transaction entity (id, transactionId, fromWalletId, toWalletId, amount, status, transactionDate, etc)

NOTE

=> Each service has its own database (Database-per-Service pattern).

=> Relationships are maintained via userId references (no direct foreign keys across services).

Microservices Split & Responsibilities

Service	Port	Responsibility
User Service	8081	User registration, login, profile management
Wallet Service	8082	Wallet creation, add money, balance check, debit/credit operations
Transaction Service	8083	P2P transfer initiation, transaction recording, status management
Notification Service	8084	Sending notifications to users (for now, printed to the console for demo purposes)
API Gateway	8080	Routing, JWT Authentication, Rate Limiting
Eureka Server	8761	Service registration & discovery

NOTE

=> When we call through the API Gateway, JWT authentication is handled at the API Gateway (centralized auth)

=> When we call through an individual service's Swagger UI, JWT authentication is handled at the individual service level

Key APIs & Feign Client Usage

Important Endpoints

=> POST /api/users/register
=> POST /api/users/login
=> POST /api/wallets/me/add-money
=> POST /api/transactions/me/transfer

Feign Clients Used

=> Transaction Service calls Wallet Service (for debit & credit)

=> Transaction Service calls Notification Service

=> Wallet Service may call User Service for validation

Core Flows

User Service

=> Client -> API Gateway -> User Service

=> Register the user (POST /api/users/register)

=> Generate the login token for the registered user (POST /api/users/login)

Wallet Service

=> Client -> API Gateway -> Wallet Service

=> With the login token in the header, add money to the wallet (POST /api/wallets/me/add-money)

Transaction Service

=> Client -> API Gateway -> Transaction Service

=> With the login token in the header, transfer money to any other wallet (POST /api/transactions/me/transfer)

=> On transaction success or failure, Transaction Service calls the Notification Service

=> If any step fails, compensation (refund) is triggered as per the synchronous Saga pattern

Explain your project technically

=> In this Payment Wallet System, I followed the Database per Service pattern using H2. Each microservice owns its data

=> The core flow for P2P transfer uses Saga Pattern implemented synchronously via OpenFeign clients

=> When a transfer is initiated in Transaction Service, it sequentially calls Wallet Service to debit the sender, then credit the receiver, with compensation logic to reverse the debit if credit fails

=> I used a Redis Distributed Lock in Wallet Service so that concurrent debit/credit operations on the same wallet are serialized, even across multiple service instances

=> Resilience4j is configured on Feign clients to handle service failures gracefully
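The locking idea mentioned above can be illustrated with an in-memory stand-in. This sketch does not talk to Redis: a ConcurrentHashMap plays the role of the key space, and tryLock imitates the semantics of Redis `SET key token NX PX ttl` (set only if absent, with an expiry). The class name WalletLockSketch, the key prefix, and the method names are all assumptions for illustration.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** In-memory stand-in for a Redis lock; names and key format are hypothetical. */
final class WalletLockSketch {
    private static final class Entry {
        final String token;
        final long expiresAtMillis;

        Entry(String token, long expiresAtMillis) {
            this.token = token;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentMap<String, Entry> locks = new ConcurrentHashMap<>();

    /** Returns an owner token if the lock was acquired, or null if an unexpired lock is held. */
    String tryLock(String walletId, long ttlMillis) {
        long now = System.currentTimeMillis();
        String token = UUID.randomUUID().toString();
        Entry fresh = new Entry(token, now + ttlMillis);
        // Atomically take the lock only if it is absent or already expired.
        Entry winner = locks.compute("lock:wallet:" + walletId,
                (key, current) -> (current == null || current.expiresAtMillis <= now) ? fresh : current);
        return winner == fresh ? token : null;
    }

    /** Releases the lock only if the caller still owns it (token comparison before delete). */
    boolean unlock(String walletId, String token) {
        String key = "lock:wallet:" + walletId;
        Entry current = locks.get(key);
        return current != null && current.token.equals(token) && locks.remove(key, current);
    }
}
```

The owner token matters: without it, a slow caller whose lock has expired could delete a lock that now belongs to someone else. With real Redis, the compare-and-delete step is usually done atomically, for example in a small Lua script.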

How are the services communicating with each other?

=> In this Payment Wallet System, services primarily communicate with each other synchronously using OpenFeign Clients

=> API Gateway acts as the single entry point for all external client requests. It routes the requests to the appropriate backend services after performing JWT authentication and rate limiting.

=> For inter-service communication, I have defined Feign Client interfaces in the services that need to call other services. For example:

=> Transaction Service has Feign Clients to call Wallet Service (for debit and credit operations) and Notification Service

=> Services discover each other dynamically using Eureka Service Discovery

=> All inter-service calls are RESTful (HTTP) via Feign, which provides a clean, declarative way to make these calls

=> I have also configured Resilience4j on these Feign Clients to handle failures gracefully with Circuit Breaker, Retry, and Fallback mechanisms

=> This synchronous approach gives better control over the transaction flow, especially for the Saga Pattern implementation in P2P transfers.

Why Synchronous (Feign)?

=> It was simpler to implement and debug for this project.

=> It also gives immediate feedback on success/failure, which is important for financial transactions

Service Discovery ?

=> I used service discovery because every service registers itself with Eureka Server, so Feign Clients don’t need hard-coded URLs.

No Asynchronous Communication ?

=> Currently, there is no Kafka or message queue. All communication is request-response style

What is Kafka and What is the use of Kafka?

What is Kafka?

=> Apache Kafka is a distributed streaming platform (also called a message broker or event streaming system)

=> It is designed to handle high volumes of real-time data streams reliably and scalably.
=> It works on a Publish-Subscribe model:

Producers publish messages (events) to topics.
Consumers subscribe to those topics and process the messages asynchronously.
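The publish-subscribe model can be shown with a toy in-memory broker. This is not Kafka's API and it delivers events synchronously, without persistence or partitions; the class InMemoryBrokerSketch and the topic name used with it are hypothetical and exist only to show how producers and consumers stay unaware of each other.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

/** Toy broker illustrating topics, producers and consumers; not Kafka's API. */
final class InMemoryBrokerSketch {
    private final Map<String, List<Consumer<String>>> subscribers = new ConcurrentHashMap<>();

    /** A consumer registers interest in a topic. */
    void subscribe(String topic, Consumer<String> consumer) {
        subscribers.computeIfAbsent(topic, t -> new CopyOnWriteArrayList<>()).add(consumer);
    }

    /** A producer publishes to a topic without knowing who is listening. */
    void publish(String topic, String event) {
        List<Consumer<String>> consumers =
                subscribers.getOrDefault(topic, Collections.<Consumer<String>>emptyList());
        for (Consumer<String> consumer : consumers) {
            consumer.accept(event);
        }
    }
}
```

Each subscriber on a topic receives every event published to it, and publishing to a topic with no subscribers is simply a no-op.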

What is the use of Kafka?

=> Asynchronous Communication — Services communicate without waiting for each other.
=> Event-Driven Architecture — Different services can react to events independently.
=> High Throughput — Can handle millions of messages per second.
=> Decoupling Services — Producer and Consumer services don’t need to know about each other.
=> Real-time Data Processing — Used for notifications, logging, analytics, etc.
=> Fault Tolerance & Durability — Messages are persisted and can be replayed if needed.

Common Kafka use Cases in Fintech

=> Sending notifications
=> Processing payment events
=> Logging transactions
=> Updating multiple services when a transfer happens (e.g., Wallet + Transaction + Analytics)

What are the pros and cons of using Feign vs Kafka?

Feign (Synchronous HTTP Communication):

Pros:

Simpler to implement and debug — request-response model makes the flow easy to trace.
Immediate feedback on success or failure, which is very important for financial transactions like money transfers.
Predictable, step-by-step control during the Saga flow (I can decide the next step based on the response), although a Saga gives eventual rather than strict ACID consistency across services.
Easier to maintain for smaller to medium-scale systems.
No additional infrastructure needed (no message broker).

Cons:

Tight coupling between services (caller waits for response).
Can lead to cascading failures if not handled properly (I mitigated this using Resilience4j Circuit Breaker and Retry).
Higher latency in chain of calls (e.g., Transaction → Wallet → Notification).
Not ideal for very high throughput or decoupled event-driven systems.

Kafka (Asynchronous Event-Driven):

Pros:

Loose coupling — services don’t wait for each other.
Better scalability and resilience (services can process events at their own pace).
Excellent for eventual consistency and high-throughput scenarios.
Supports complex workflows and multiple consumers for the same event.

Cons:

More complex to implement and debug (requires proper idempotency, ordering guarantees, DLQ handling).
Higher operational overhead (managing Kafka cluster, partitions, consumers).
Eventual consistency can be risky in payment systems if not designed carefully.
Added latency due to async nature.

Aspect	Feign (Current)	Kafka
Communication Style	Synchronous	Asynchronous
Coupling	Tighter	Loose
Consistency	Immediate result per call	Eventual
Complexity	Lower	Higher
Debugging	Easier	Harder
Scalability	Good	Excellent
Use Case in Project	P2P Transfer, Add Money	Notifications, Future enhancements

Why did you choose Feign in this project over Kafka?

=> For a wallet system where money movement must be reliable and consistent, the synchronous Saga approach with Feign gave me better control and simpler compensation logic.

=> However, in the future, I plan to introduce Kafka for non-critical flows like notifications and for making the system more event-driven

How do you handle latency or timeouts in Feign calls?

=> In my Payment Wallet System, I handle latency and timeouts in Feign calls primarily through Resilience4j patterns that I have configured on the Feign Clients

=> I have applied Retry mechanism on Feign clients. If a service call fails due to temporary latency or timeout, Resilience4j automatically retries the request a configured number of times with a delay

=> I have also implemented Circuit Breaker pattern. If a particular service (e.g., Wallet Service) starts responding slowly or times out repeatedly, the Circuit Breaker opens after a threshold, preventing further calls for a cooldown period and returning a fallback response immediately. This stops cascading failures

=> For fallback, I have defined fallback methods that return graceful responses or trigger compensation logic in the Saga flow

=> Additionally, since all inter-service calls go through Feign Clients with Eureka discovery, I rely on default HTTP timeouts provided by Feign + Resilience4j configuration to prevent hanging requests

=> This combination ensures the system remains responsive even when some services experience high latency
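The retry behaviour described above can be sketched with a small hand-rolled helper. This is not Resilience4j's API; the class RetrySketch, the method callWithRetry, and its parameters are hypothetical and only show the control flow of "try, wait, try again".

```java
import java.util.function.Supplier;

/** Hand-rolled retry helper for illustration; this is not Resilience4j's API. */
final class RetrySketch {
    private RetrySketch() { }

    /** Calls the supplier up to maxAttempts times, sleeping waitMillis between failed attempts. */
    static <T> T callWithRetry(Supplier<T> call, int maxAttempts, long waitMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and retry if attempts remain
                if (attempt < maxAttempts) {
                    sleep(waitMillis);
                }
            }
        }
        throw last; // every attempt failed, so surface the last error
    }

    private static void sleep(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("Interrupted while waiting to retry", e);
        }
    }
}
```

One caution: retrying is only safe when the remote operation is idempotent. A debit that actually succeeded but timed out on the response would be applied twice unless the call carries something like an idempotency key.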

Why this approach

=> In financial systems like wallet transfers, we cannot afford long waits, so Circuit Breaker helps fail fast, while Retry handles transient issues like network blips

=> In the current implementation, I have configured Resilience4j for Circuit Breaker and Retry. However, I have not set very fine-grained custom timeout values (like connect-timeout or read-timeout) in application.yml yet. This is something I would tune further.
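The fail-fast behaviour of a circuit breaker can also be sketched in plain Java. Again, this is not Resilience4j's API: CircuitBreakerSketch, its threshold, and its cooldown are hypothetical, and the whole call is synchronized for simplicity, which a production breaker would avoid.

```java
import java.util.function.Supplier;

/** Minimal hand-rolled breaker for illustration; this is not Resilience4j's API. */
final class CircuitBreakerSketch {
    private final int failureThreshold;
    private final long openMillis;
    private int consecutiveFailures = 0;
    private long openedAt = -1L; // -1 means the circuit is closed

    CircuitBreakerSketch(int failureThreshold, long openMillis) {
        this.failureThreshold = failureThreshold;
        this.openMillis = openMillis;
    }

    /** Runs the remote call, or returns the fallback immediately while the circuit is open. */
    synchronized <T> T call(Supplier<T> remoteCall, Supplier<T> fallback) {
        long now = System.currentTimeMillis();
        if (openedAt >= 0 && now - openedAt < openMillis) {
            return fallback.get(); // open: fail fast without touching the remote service
        }
        try {
            T result = remoteCall.get(); // closed, or a trial call after the cooldown
            consecutiveFailures = 0;
            openedAt = -1L;
            return result;
        } catch (RuntimeException e) {
            consecutiveFailures++;
            if (consecutiveFailures >= failureThreshold) {
                openedAt = now; // trip (or re-trip) the breaker
            }
            return fallback.get();
        }
    }

    synchronized boolean isOpen() {
        return openedAt >= 0 && System.currentTimeMillis() - openedAt < openMillis;
    }
}
```

Once the threshold is reached, further calls return the fallback without invoking the failing service at all, which is what stops a slow dependency from tying up the caller.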

What is P2P? Why do we call it P2P transfer instead of just transfer?

=> P2P stands for Peer-to-Peer

In the context of digital wallets and fintech applications, we specifically use the term P2P Transfer to clearly indicate the nature of the transaction:

P2P Transfer = Money is being transferred directly from one individual/user to another individual/user within the same wallet system.
It is a person-to-person transfer.

Example:

You have ₹5000 in your wallet.
You send ₹1000 to your friend’s phone number / user ID.
This is called P2P Transfer.

Difference from Other Types of Transfers

Type	Meaning	Example
P2P Transfer	User → User (Wallet to Wallet)	Send money to friend
Add Money	Bank Account → Wallet	Recharge your wallet
Withdraw	Wallet → Bank Account	Withdraw to your bank
Merchant Payment	User → Merchant/Business	Pay at a shop via UPI

=> In this project, we call it P2P Transfer because it stands for Peer-to-Peer transfer.
It refers to transferring money directly from one user’s wallet to another user’s wallet within the system

=> This is different from other flows like 'Add Money' (which comes from a bank account) or merchant payments

Can you explain the Add Money vs P2P Transfer flow difference?

In my Payment Wallet System, Add Money and P2P Transfer are two completely different flows with different purposes, services involved, and complexity levels.

1. Add Money Flow:

Purpose: User adds money from an external source (simulated) into their own wallet.
Endpoint: POST /api/wallets/me/add-money
Service Involved: Primarily Wallet Service only.
Steps:
=> API Gateway → Wallet Service (after JWT validation)
=> Wallet Service checks the user’s wallet.
=> Increases the wallet balance.
=> Sends the add-money notification by calling Notification Service (in the current version, no transaction record is created in Transaction Service for Add Money).
Complexity: Simple and straightforward. No distributed transaction or compensation needed.
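No Redis Lock: Not used in the current code, since it is a single-wallet credit inside a local transaction. Note that a credit can still race with a concurrent P2P debit on the same wallet unless the row is protected by database-level locking, so taking the same per-user lock (or using optimistic locking) would be the safer design.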

2. P2P Transfer Flow (Peer-to-Peer):

Purpose: User sends money from their wallet to another user’s wallet.
Endpoint: POST /api/transactions/me/transfer
Services Involved: Transaction Service + Wallet Service + Notification Service
Steps (Saga Pattern):
=> Transaction Service receives the request and creates a pending transaction.
=> Calls Wallet Service (Feign) → Debit sender’s wallet.
=> If debit is successful, calls Wallet Service again → Credit receiver’s wallet.
=> Updates transaction status.
=> Calls Notification Service (currently only the sender is notified).
Key Features: Uses Saga Pattern with compensation logic + Redis Distributed Locking on Wallet Service to handle concurrency safely.
Complexity: Much higher because it involves multiple services and money movement in opposite directions.

Key Differences

Aspect	Add Money	P2P Transfer
Purpose	Load money into own wallet	Send money to another user
Main Service	Wallet Service	Transaction Service (orchestrates)
Services Involved	1–2	3+ (Transaction, Wallet, Notification)
Distributed Transaction	Not needed	Yes (Saga Pattern + Compensation)
Concurrency Control	Not required	Redis Distributed Lock
Complexity	Simple	Complex
Failure Handling	Basic	Compensation (refund) logic

How do you handle data consistency across services?

In this Payment Wallet System, we follow the Database-per-Service pattern, where each microservice has its own H2 database. Because of this, a single local ACID transaction cannot span services, and I chose not to use a distributed transaction protocol like 2PC. Instead, I managed consistency differently based on the flow.

For P2P Transfer (Most Critical Flow):

I used the Saga Pattern with compensation logic.
The Transaction Service coordinates the flow:
- Creates a pending transaction record.
- Calls Wallet Service to debit the sender (with Redis Distributed Lock).
- Calls Wallet Service to credit the receiver.
- Updates transaction status and calls Notification Service.
If any step fails (especially after debit), compensation is triggered — the sender’s money is refunded.
Redis Distributed Locking serializes debit/credit operations on the same wallet, which prevents lost updates and double-spending under concurrency.

For Add Money Flow:

Consistency is much simpler because it is mostly handled within the Wallet Service.
It updates the wallet balance inside a @Transactional method.
It then calls Notification Service to send the add money notification.
Note: Currently, we are not creating a transaction record in the Transaction Service for Add Money. This is a simplification in the current version of the project.

Overall, the system achieves eventual consistency for cross-service operations. The Saga Pattern with compensation is the main mechanism for the complex money movement scenarios, while simpler operations like Add Money rely on local transactions and Resilience4j for reliability.

What is 2PC (Two-Phase Commit)?

2PC stands for Two-Phase Commit. It is a traditional protocol used in distributed systems to achieve strong consistency across multiple databases/services.

How Does 2PC Work?

It has two phases:

Prepare Phase (Voting Phase):
=> The coordinator asks all participating services/databases: "Are you ready to commit this transaction?"
=> Each service locks the required resources and replies Yes or No.
Commit Phase (Decision Phase):
=> If all services reply Yes, the coordinator tells everyone to Commit.
=> If even one service replies No, the coordinator tells everyone to Rollback.
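
The two phases above can be sketched in a few lines of Java. This is only an illustration of the protocol's shape, not code from the project (which does not use 2PC): the Participant interface, RecordingParticipant, and TwoPhaseCommitCoordinator are hypothetical names, and a real coordinator would also need durable logging and timeouts.

```java
import java.util.List;

class TwoPhaseCommitCoordinator {
    /** Returns true only if every participant voted yes and was told to commit. */
    static boolean run(List<? extends Participant> participants) {
        boolean allYes = true;
        for (Participant p : participants) {       // prepare (voting) phase
            if (!p.prepare()) {
                allYes = false;
            }
        }
        for (Participant p : participants) {       // commit (decision) phase
            if (allYes) {
                p.commit();
            } else {
                p.rollback();
            }
        }
        return allYes;
    }

    public static void main(String[] args) {
        RecordingParticipant wallet = new RecordingParticipant(true);
        RecordingParticipant ledger = new RecordingParticipant(false);
        boolean committed = run(List.of(wallet, ledger));
        System.out.println(committed + " " + wallet.state + " " + ledger.state);
    }
}

/** A participant in the protocol; this interface is illustrative only. */
interface Participant {
    boolean prepare();   // phase 1: lock resources and vote yes (true) or no (false)
    void commit();       // phase 2: make the prepared change permanent
    void rollback();     // phase 2: undo and release locks
}

/** In-memory participant that records the last instruction it received. */
class RecordingParticipant implements Participant {
    private final boolean voteYes;
    String state = "INIT";

    RecordingParticipant(boolean voteYes) { this.voteYes = voteYes; }

    @Override public boolean prepare() {
        state = voteYes ? "PREPARED" : "REFUSED";
        return voteYes;
    }
    @Override public void commit()   { state = "COMMITTED"; }
    @Override public void rollback() { state = "ROLLED_BACK"; }
}
```

Every participant holds its locks from prepare until the decision arrives, which is the blocking behaviour the comparison with Saga refers to.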

In my Payment Wallet System, I did not use 2PC because of the following reasons:

Aspect	2PC (Traditional)	My Approach (Saga Pattern)
Consistency	Strong ACID consistency	Eventual Consistency
Availability	Lower (all services must be up)	Higher
Performance	Slow (due to locking + coordination)	Better
Complexity	High (needs XA transactions)	Moderate
Use in Microservices	Not recommended	Commonly used
Failure Handling	Global rollback	Compensation logic per service

Main Problems with 2PC in Microservices
=> It creates tight coupling between services.
=> If one service is slow or down, the entire transaction gets blocked.
=> It reduces availability (CAP Theorem trade-off).
=> Modern microservices prefer Saga Pattern instead.

Explain P2P Transfer Saga flow (Synchronous)

Here’s the step-by-step flow as implemented in the code:

Endpoint: POST /api/transactions/me/transfer

Flow Steps:

Transaction Service receives the request.
=> Validates the request.
=> Creates a Transaction record with status PENDING.
=> Calls Wallet Service (via Feign) to debit the sender.
Wallet Service (Debit):
=> Acquires Redis Distributed Lock using the sender’s userId.
=> Checks sufficient balance.
=> Deducts the amount from sender’s wallet.
=> Saves the updated wallet.
=> Releases the lock.
=> Returns success.
Transaction Service:
=> If debit is successful, calls Wallet Service again to credit the receiver.
Wallet Service (Credit):
=> Acquires Redis Distributed Lock on receiver’s userId.
=> Adds the amount to receiver’s wallet.
=> Saves the wallet.
=> Releases the lock.
Transaction Service:
=> Updates the transaction status to COMPLETED.
=> Calls Notification Service to send a notification to the sender's userId.
=> Note: For now, the notification message is only printed to the console for demo purposes, and only the sender's userId is included in the recipient list (toList).
Compensation Logic (If anything fails):
=> If credit fails after successful debit → Transaction Service calls the refund operation on Wallet Service to return the amount to the sender.
=> Transaction status is marked as FAILED.
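
A minimal sketch of this debit, credit, and compensation sequence in plain Java follows. It is not the project's actual code: WalletClient stands in for the Feign client, InMemoryWallets replaces the real Wallet Service, and the method names (debit, credit, refund) are assumptions. Locking and the PENDING record are left out to keep the compensation path visible.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

class TransferSaga {
    private final WalletClient wallets;

    TransferSaga(WalletClient wallets) { this.wallets = wallets; }

    /** Runs debit then credit; refunds the sender if the credit step throws. */
    TransferStatus transfer(long senderId, long receiverId, BigDecimal amount) {
        // Step 1 (not modeled here): persist the transaction as PENDING.
        try {
            wallets.debit(senderId, amount);
        } catch (RuntimeException e) {
            return TransferStatus.FAILED;          // nothing was debited, nothing to undo
        }
        try {
            wallets.credit(receiverId, amount);
        } catch (RuntimeException e) {
            wallets.refund(senderId, amount);      // compensation for the earlier debit
            return TransferStatus.FAILED;
        }
        return TransferStatus.COMPLETED;
    }

    public static void main(String[] args) {
        InMemoryWallets wallets = new InMemoryWallets();
        wallets.balances.put(1L, new BigDecimal("5000"));
        wallets.failCredit = true;                 // simulate Wallet Service failing on credit
        TransferStatus status = new TransferSaga(wallets).transfer(1L, 2L, new BigDecimal("1000"));
        System.out.println(status + " sender=" + wallets.balances.get(1L));
    }
}

enum TransferStatus { PENDING, COMPLETED, FAILED }

/** Stand-in for the Feign client to Wallet Service; method names are assumptions. */
interface WalletClient {
    void debit(long userId, BigDecimal amount);
    void credit(long userId, BigDecimal amount);
    void refund(long userId, BigDecimal amount);
}

/** In-memory wallet store used only to exercise the saga. */
class InMemoryWallets implements WalletClient {
    final Map<Long, BigDecimal> balances = new HashMap<>();
    boolean failCredit = false;

    @Override public void debit(long userId, BigDecimal amount) {
        BigDecimal balance = balances.getOrDefault(userId, BigDecimal.ZERO);
        if (balance.compareTo(amount) < 0) {
            throw new IllegalStateException("Insufficient balance");
        }
        balances.put(userId, balance.subtract(amount));
    }

    @Override public void credit(long userId, BigDecimal amount) {
        if (failCredit) {
            throw new IllegalStateException("Wallet Service unavailable");
        }
        balances.merge(userId, amount, BigDecimal::add);
    }

    @Override public void refund(long userId, BigDecimal amount) {
        balances.merge(userId, amount, BigDecimal::add);
    }
}
```

The sketch also shows the weak spot: if the refund call itself throws, or the process dies between debit and credit, nothing in this call stack repairs the sender's balance.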

Redis Distributed Locking Implementation

Where it is used: In WalletService during debit and credit operations.

Key Points to Mention:

Prevents race conditions and double-spending when multiple transfers happen simultaneously for the same user.
Uses RedisTemplate.
Lock is acquired with a timeout to avoid deadlocks.
Lock is always released in finally block.

Explain P2P Transfer Saga & Concurrency Control

=> For P2P money transfer, I implemented Saga Pattern using synchronous Feign calls.

=> The flow starts in Transaction Service, which creates a pending transaction and then calls Wallet Service to debit the sender. After successful debit, it calls Wallet Service again to credit the receiver. Finally, it updates the transaction status and notifies the users.

=> To handle concurrency safely, I used Redis Distributed Locking in the Wallet Service for both debit and credit operations. This ensures that only one operation can modify a user’s wallet balance at a time.

=> For failure scenarios, I have implemented compensation logic — if the receiver’s credit fails after sender’s debit, the amount is refunded back to the sender. This helps maintain eventual consistency across services.

What happens if the system crashes after debit but before credit?

In my current implementation of P2P Transfer using Synchronous Saga Pattern:

If the system crashes after successful debit but before credit, the money will be deducted from the sender’s wallet but not credited to the receiver.
The transaction record in Transaction Service will likely remain in PENDING state (depending on where the crash occurs).

Current Handling in the Code:

The compensation (refund) logic is not fully automatic in case of a sudden system crash (e.g., JVM crash, pod restart, or network partition).
However, I have implemented basic compensation logic inside the try-catch blocks in Transaction Service. If the credit call fails due to exception, it calls the refund method on Wallet Service to return the money to the sender.
The Redis lock's lease time only guarantees that a crashed instance does not keep the wallet locked forever; it does not restore the missing credit.

Limitations (Honest Answer): Currently, there is no background job or saga recovery mechanism (like a cron job that checks for long-pending transactions and triggers compensation).

How I Would Improve It (Senior Touch): In a production system, I would add:

Outbox Pattern or Saga Log table to track saga steps.
A scheduled job that periodically checks for stuck PENDING transactions older than X minutes and triggers compensation (refund).
Distributed Tracing (Zipkin) to easily debug such partial failures.

This scenario is exactly why I chose to mention Saga Pattern with compensation — it allows us to move towards eventual consistency even in failure cases.
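
As a rough idea of what the scheduled recovery job would do, the sketch below selects PENDING transfers older than a cutoff. TransferRecord and StuckTransferScanner are hypothetical names, and the records live in a plain list rather than the Transaction Service database; in a Spring service this logic would typically be invoked from a method annotated with @Scheduled.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

class StuckTransferScanner {
    /** Returns the PENDING records created more than maxAge before now. */
    static List<TransferRecord> findStuck(List<TransferRecord> all, Instant now, Duration maxAge) {
        Instant cutoff = now.minus(maxAge);
        List<TransferRecord> stuck = new ArrayList<>();
        for (TransferRecord record : all) {
            if ("PENDING".equals(record.status) && record.createdAt.isBefore(cutoff)) {
                stuck.add(record);
            }
        }
        return stuck;
    }

    public static void main(String[] args) {
        Instant now = Instant.now();
        List<TransferRecord> all = List.of(
                new TransferRecord("t1", now.minus(Duration.ofMinutes(20)), "PENDING"),
                new TransferRecord("t2", now.minus(Duration.ofMinutes(2)), "PENDING"));
        for (TransferRecord record : findStuck(all, now, Duration.ofMinutes(10))) {
            System.out.println("needs recovery: " + record.id);
        }
    }
}

/** Simplified transaction row; field names are assumptions. */
class TransferRecord {
    final String id;
    final Instant createdAt;
    String status;   // PENDING, COMPLETED or FAILED

    TransferRecord(String id, Instant createdAt, String status) {
        this.id = id;
        this.createdAt = createdAt;
        this.status = status;
    }
}
```

Finding the stuck rows is the easy half. To decide between refunding and completing, the job also needs to know whether the debit actually happened, which is why a Saga Log (or Outbox) and an idempotent refund operation go together with it.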

What is the Outbox Pattern?

The Outbox Pattern is a design pattern used in microservices to reliably publish outgoing messages/events in a distributed system, especially when you need a guarantee that a message is eventually sent whenever the database transaction commits, and never sent when it rolls back.

Simple Analogy:

Think of it like writing a letter and putting it in your "Outbox" folder. Even if your computer crashes after saving the letter, the letter is still there to be sent later.

How Outbox Pattern Works:

Same Database Transaction:
=> When your service does a database operation (e.g., debit wallet), you also insert a record into an Outbox table in the same transaction.
=> Example: "Send Credit Event" or "Trigger Compensation".
Background Processor:
=> A separate background job (scheduled task) continuously reads from the Outbox table.
=> It publishes the events/messages to Kafka / RabbitMQ / or calls other services.
=> Once successfully published, it marks the outbox record as PROCESSED or deletes it.
Benefits:
=> Guarantees "At Least Once" delivery.
=> Prevents message loss even if the service crashes after DB commit.
=> Maintains consistency between database state and external actions.
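
The mechanics above can be sketched with in-memory stand-ins. This is an assumption-laden illustration, not project code: WalletStore plays the role of the service's database (in a real service the balance update and the outbox insert would share one local transaction), and OutboxRelay is the background processor, with the publisher passed in as a plain Consumer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class OutboxRelay {
    /**
     * Publishes unprocessed entries in order and marks each one processed only
     * after the publisher returns normally. Stops at the first failure so the
     * entry is retried on the next run. Returns the number published.
     */
    static int publishPending(List<OutboxEntry> outbox, Consumer<String> publisher) {
        int published = 0;
        for (OutboxEntry entry : outbox) {
            if (entry.processed) {
                continue;
            }
            try {
                publisher.accept(entry.payload);
            } catch (RuntimeException e) {
                break;                      // leave the entry for the next poll
            }
            entry.processed = true;
            published++;
        }
        return published;
    }

    public static void main(String[] args) {
        WalletStore store = new WalletStore(5000);
        store.debitAndEnqueue(1000, "CREDIT_RECEIVER:456:1000");
        int sent = publishPending(store.outbox, payload -> System.out.println("published " + payload));
        System.out.println(sent + " event(s) sent, balance=" + store.balance);
    }
}

class OutboxEntry {
    final String payload;
    boolean processed = false;

    OutboxEntry(String payload) { this.payload = payload; }
}

/** Stand-in for the wallet table plus the outbox table in the same database. */
class WalletStore {
    long balance;
    final List<OutboxEntry> outbox = new ArrayList<>();

    WalletStore(long balance) { this.balance = balance; }

    /** Both writes succeed or neither does; a real service would use one DB transaction. */
    void debitAndEnqueue(long amount, String event) {
        if (balance < amount) {
            throw new IllegalStateException("Insufficient balance");
        }
        balance -= amount;
        outbox.add(new OutboxEntry(event));
    }
}
```

Because an entry can be published and the process can crash before it is marked processed, the same event may be delivered twice. That is the "At Least Once" guarantee, and it means the receiving side has to be idempotent.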

In Context of Payment Wallet System:

When I mentioned "Outbox Pattern or Saga Log table", I meant:

Currently, if a crash happens after debit but before credit, you may lose track of the saga state.
With Outbox Pattern, after a successful debit, you would insert a record in an Outbox table saying "Need to credit receiver".
Even if the service crashes, the background processor will later pick it up and complete the credit (or compensation).

Alternative (Simpler): A Saga Log Table in Transaction Service that records every step of the saga (DEBITED, CREDIT_STARTED, CREDITED, etc.). A background job can scan for incomplete sagas and take action

The Outbox Pattern is a reliable way to publish events after a database transaction. Instead of directly calling other services, we insert records into an outbox table in the same transaction. A background processor then publishes these events.

In my current project, I don’t have the Outbox Pattern implemented yet. That’s why in crash scenarios (after debit but before credit), there is a risk of inconsistency. Implementing Outbox Pattern or a Saga Log table with a recovery job is one of the improvements I plan for making the Saga more robust.

What is Distributed Tracing (Zipkin) ?

=> Distributed Tracing is a technique to monitor and debug requests as they travel across multiple microservices.

=> Zipkin is a tool used to collect and visualize these traces.

=> In a monolith, it's easy to debug because everything is in one application. But in microservices (like this Payment Wallet System), one user request (e.g., P2P Transfer) may go through API Gateway → Transaction Service → Wallet Service → Notification Service.

=> Distributed Tracing helps you see the full journey of that request across all services.

What is Zipkin?

Zipkin is a popular open-source distributed tracing system. It helps collect, store, and visualize traces (the journey of requests).

How Distributed Tracing with Zipkin Works:

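Instrumentation: Each service adds a small tracing library. On Spring Boot 3 (which this project uses) that is Micrometer Tracing with a Zipkin reporter; Spring Cloud Sleuth filled this role on Spring Boot 2.x and does not support Boot 3.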
Trace ID Generation: When a request enters the system (usually at API Gateway), a unique Trace ID is generated.
Propagation: This Trace ID is passed along with every inter-service call (Feign, RestTemplate, etc.).
Data Collection: Each service reports timing and metadata (span) to Zipkin.
Visualization: You can search by Trace ID in Zipkin UI and see:
=> How much time each service took
=> Which service failed or was slow
=> Full end-to-end flow

Why is it Useful for this Project?

Especially for scenarios like:

Crash after debit but before credit
Slow Feign calls
Partial failures in Saga Pattern

With Zipkin, you can search the Trace ID and clearly see exactly where the request failed or got delayed.

=> In my current project, I have not integrated Zipkin yet.

=> But it is in my Future Enhancements list because it would greatly help in debugging partial failures in the Saga flow — for example, identifying exactly at which step (debit, credit, or notification) a request failed and why.

=> It helps in performance analysis, bottleneck identification, and root cause analysis in distributed systems

OpenTelemetry over Zipkin ?

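=> OpenTelemetry is a modern, vendor-neutral alternative to Zipkin-specific instrumentation. The two are complementary rather than exclusive: OpenTelemetry produces the traces, and Zipkin can remain the backend and UI.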

Aspect	Zipkin	OpenTelemetry (OTel)
Scope	Only Distributed Tracing	Full Observability (Traces + Metrics + Logs)
Approach	Focused tracing tool	Vendor-neutral standard
Instrumentation	Mostly manual	Strong support for auto-instrumentation
Flexibility	Limited	Can export to Zipkin, Jaeger, Prometheus, etc.
Current Industry Trend	Older technology	De-facto standard in 2025-2026
Complexity	Simpler for pure tracing	Slightly more complex but more powerful

=> OpenTelemetry is not just an alternative — it is becoming the standard for observability

=> Many teams now use OpenTelemetry to generate traces and then export them to Zipkin (or Jaeger, or Grafana Tempo, etc.) for visualization

So, I would like to use OpenTelemetry in this project as a future enhancement:

=> Use OpenTelemetry for instrumentation in all the Spring Boot services

=> Export traces to Zipkin (to keep using the Zipkin UI)

How does Redis Distributed Lock work in the code ?

In my Payment Wallet System, I have implemented Redis Distributed Lock in the Wallet Service to handle concurrency during debit and credit operations in P2P transfers.

Purpose: It prevents double-spending and race conditions when multiple transfer requests try to modify the same user’s wallet balance at the same time.

How it is Implemented:

I use Redis as the centralized lock store (since it's fast and supports distributed locking).
I used RedisTemplate
In the WalletService, before performing debit or credit:
=> The service tries to acquire a lock using a unique key (usually based on userId, e.g., "wallet:lock:" + userId).
=> The lock has a timeout (lease time) to prevent permanent locking if something goes wrong.
=> Only if the lock is acquired, the balance update operation proceeds.
=> After the operation (success or failure), the lock is released in the finally block to avoid deadlocks.

Key Benefits in this Project:

Ensures that debit and credit operations on the same wallet run one at a time, even under concurrent requests.
Works across multiple instances of Wallet Service (horizontal scaling).

This is a critical part of making the Saga Pattern safe in a distributed environment.
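
The acquire, update, and release-in-finally steps can be sketched without a Redis server. In the sketch below, LeaseLockStore is a hypothetical in-memory stand-in for what Redis provides through SET key token NX PX leaseMillis (with RedisTemplate, the acquire step would usually be opsForValue().setIfAbsent(key, token, timeout)). The owner token and the check on release are an assumption about good practice, not a statement about the project's code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;
import java.util.function.LongSupplier;

class GuardedWallet {
    private final LeaseLockStore locks;
    long balance;   // minor units, kept as a long for brevity

    GuardedWallet(LeaseLockStore locks, long balance) {
        this.locks = locks;
        this.balance = balance;
    }

    /** Debits only while holding the per-user lock; returns false if busy or short of funds. */
    boolean debit(long userId, long amount) {
        String key = "wallet:lock:" + userId;
        String token = locks.tryAcquire(key, 5_000);
        if (token == null) {
            return false;                       // wallet is busy
        }
        try {
            if (balance < amount) {
                return false;
            }
            balance -= amount;
            return true;
        } finally {
            locks.release(key, token);          // always runs, even on exceptions
        }
    }

    public static void main(String[] args) {
        LeaseLockStore store = new LeaseLockStore(System::currentTimeMillis);
        GuardedWallet wallet = new GuardedWallet(store, 5000);
        System.out.println(wallet.debit(1L, 1000) + " balance=" + wallet.balance);
    }
}

/** In-memory stand-in for a Redis lock store; the clock is injectable for testing. */
class LeaseLockStore {
    private static final class Entry {
        final String token;
        final long expiresAtMillis;
        Entry(String token, long expiresAtMillis) {
            this.token = token;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final Map<String, Entry> locks = new HashMap<>();
    private final LongSupplier clock;

    LeaseLockStore(LongSupplier clock) { this.clock = clock; }

    /** Returns an owner token if acquired, or null while another lease is still live. */
    synchronized String tryAcquire(String key, long leaseMillis) {
        long now = clock.getAsLong();
        Entry current = locks.get(key);
        if (current != null && current.expiresAtMillis > now) {
            return null;
        }
        String token = UUID.randomUUID().toString();
        locks.put(key, new Entry(token, now + leaseMillis));
        return token;
    }

    /** Releases only if the caller still owns a live lease (compare-and-delete). */
    synchronized boolean release(String key, String token) {
        Entry current = locks.get(key);
        if (current == null || current.expiresAtMillis <= clock.getAsLong()
                || !current.token.equals(token)) {
            return false;   // lease expired or the lock now belongs to someone else
        }
        locks.remove(key);
        return true;
    }
}
```

The token matters once lease times are involved: without it, a slow holder whose lease has already expired could delete a lock that another request has since acquired.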

Why didn’t you use Orchestration Saga?

In this Payment Wallet System, I implemented a Synchronous Saga coordinated by the Transaction Service instead of a full Orchestration Saga with a dedicated orchestrator service.

Reason for not using Orchestration Saga:

Simplicity & Faster Development: Creating a separate dedicated Saga Orchestrator Service would have added extra complexity (additional service, state management, more Feign calls). Since this was a personal learning project, I chose a simpler approach where Transaction Service itself acts as the coordinator.

Better Control for Financial Transactions: With the current synchronous approach, I have direct control over the sequence: debit → credit → update status. This gives immediate feedback on success/failure, which is very important for money movement.

Lower Operational Overhead: Orchestration Saga usually requires maintaining saga state in a separate database/table and handling complex rollback workflows from a central place. My current implementation keeps the logic inside the Transaction Service with compensation methods, making it easier to understand and debug.

Current Trade-off: While the current implementation works well, it has some coupling because Transaction Service directly calls Wallet and Notification services. In a true Orchestration Saga, we would have a dedicated orchestrator that only manages the flow without containing business logic.

Future Plan: If the system scales further, I would consider moving to Orchestration Saga (with a dedicated service) or Choreography Saga using Kafka for better decoupling and resilience.

Summary

=> Chose Synchronous Saga (coordinated by Transaction Service) → Simpler, easier to control, faster to implement.

=> Did not choose Orchestration Saga → To avoid extra service + complexity for this project scope.

Are Orchestration Saga and Choreography Saga the same?

No, Orchestration Saga and Choreography Saga are NOT the same.

Aspect	Orchestration Saga	Choreography Saga
Control	Centralized	Decentralized
Who manages the flow?	A dedicated Saga Orchestrator service	No central controller — services react to events
Communication Style	Command-based (Orchestrator tells services what to do)	Event-based (Services publish events)
Coupling	Tighter (services depend on orchestrator)	Looser (services are independent)
Complexity	Easier to understand and manage sequence	Slightly harder to track full flow
Failure Handling	Orchestrator handles rollback/compensation	Each service does its own compensation
Debugging	Easier (flow is in one place)	Harder (flow is distributed across services)
Scalability	Can become bottleneck	Better scalability

=> Orchestration Saga → Like a Project Manager who tells everyone what to do step by step and coordinates everything.

=> Choreography Saga → Like a group of dancers performing together without a central instructor — each dancer reacts to the music (events) independently.

In the context of this project payment-wallet-system,

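=> I used neither a dedicated orchestrator service nor Choreography.

=> I implemented a Synchronous Saga where Transaction Service acts as the coordinator. Conceptually this is still a form of orchestration (one component drives the steps and the compensation); the difference is that the orchestrator logic lives inside Transaction Service instead of a separate service.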

Compensation Logic (Rollback) in P2P Transfer

This is one of the most important parts interviewers focus on.

Current Implementation:

The main compensation logic is present in the TransactionService.
If the debit succeeds but credit fails, the system attempts to refund the amount back to the sender.

Flow of Compensation:

Debit operation on sender’s wallet succeeds.
Credit operation on receiver’s wallet fails (exception thrown).
Transaction Service catches the exception and calls a refund method on Wallet Service for the sender.
Transaction status is marked as FAILED.
Notification is sent to the sender about the failure.

Important Note from Code:

Compensation is reactive (only triggered on exception), not proactive or background-based.
There is no Saga Log / Outbox table to recover from crashes after debit.

Idempotency Handling

Current Status in Code: Limited / Basic idempotency.

The project does not have strong idempotency keys (like requestId or transactionId check before processing).
If the same transfer request is retried (due to network issue), it may be processed again, leading to a duplicate debit/credit. The Redis lock only serializes concurrent requests; it does not detect a retry.
This is a known gap in the current implementation.

Strong Combined Answer for Saga + Concurrency (Full Version)

For P2P transfers, I implemented a Synchronous Saga Pattern coordinated by the Transaction Service.

Step-by-step flow:

Transaction Service creates a PENDING transaction record.
Calls Wallet Service to debit the sender (protected by Redis Distributed Lock).
On successful debit, calls Wallet Service to credit the receiver (again with Redis Lock).
Updates transaction status to COMPLETED.
Calls Notification Service (currently notifies only the sender).

Concurrency Control: I used Redis Distributed Lock in Wallet Service for both debit and credit operations. The lock is acquired using a key like wallet:lock:{userId} with a lease time. It is always released in the finally block.

Compensation Logic: If credit fails after debit, the Transaction Service triggers a refund to the sender. However, in case of sudden crashes after debit, there is currently no automatic recovery job.

This design gives good control but trades off some resilience for simplicity.

Major challenges you faced in Saga + Concurrency (with STAR story)

STAR Answer

Situation: In the Payment Wallet System, the most critical feature was P2P money transfer. Since we were using a microservices architecture with separate databases, ensuring money consistency across services became challenging. The biggest issue was handling concurrent transfers and partial failures in the Saga flow.

Task: I needed to implement a reliable P2P transfer using Saga Pattern such that:

No double-spending happens even under high concurrency.
If any step fails (especially after debit), the system should not leave money in an inconsistent state.
The solution should be simple yet production-like.

Action:

I chose a Synchronous Saga Pattern coordinated by the Transaction Service instead of full Orchestration or Choreography.
Implemented Redis Distributed Locking in the Wallet Service for both debit and credit operations. I used a lock key like wallet:lock:{userId} with proper lease time and ensured the lock is always released in the finally block to prevent deadlocks.
Added compensation logic — if the receiver’s credit failed after successful sender debit, the system automatically triggers a refund to the sender.
Configured Resilience4j (Circuit Breaker + Retry) on all Feign calls to handle transient failures.
Added proper logging and exception handling across services to trace failures.

Result:

Successfully prevented race conditions and double-spending during concurrent transfers.
The compensation logic worked for most exception scenarios (e.g., service down during credit).
The P2P transfer became reliable enough for demonstration.
However, I identified a remaining gap: in case of sudden crashes after debit but before credit, there is no automatic recovery (no background saga recovery job yet).

Key Learning / Senior Touch: "This experience taught me the practical challenges of implementing distributed transactions in microservices. It reinforced why many fintech systems either use Orchestration Saga with proper state management or move towards event-driven Choreography with Kafka. I have noted this as a key improvement area: adding a Saga recovery mechanism using Outbox Pattern or scheduled jobs."

How does Redis Distributed Lock handle deadlocks / timeouts?

In my project, Redis Distributed Lock is implemented in the Wallet Service to protect debit and credit operations during P2P transfers.

How it Handles Deadlocks and Timeouts:

Lease Time (Auto Expiry):
=> When acquiring the lock, I set a lease time (expiration time). Even if the service crashes or the thread hangs, the lock will automatically expire after the lease time. This prevents permanent deadlocks.

TryLock with Timeout:
=> Instead of blocking indefinitely, lock acquisition uses a tryLock-style call with a wait timeout. If the lock is not acquired within the specified time, the operation fails gracefully with a proper error message (e.g., "Wallet is busy, please try again later").

Lock Release in Finally Block:
=> The lock is always released in a finally block. This ensures that even if an exception occurs during debit/credit, the lock is released, preventing deadlocks.

Unique Lock Key:
=> Lock key is generated per user, e.g., wallet:lock:{userId}. This makes the lock granular.

Current Implementation Note: The project uses Redis (via RedisTemplate) for distributed locking. While I have implemented basic safeguards, I have not used Redisson (which provides more advanced features like automatic lock renewal). This is something I would upgrade for production.

Key talking points:

Deadlock Prevention: Lease time + finally block is the main defense.
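Timeout Handling: The tryLock(waitTime, leaseTime, TimeUnit) pattern helps avoid indefinite waiting. That exact signature is Redisson's RLock API; with plain RedisTemplate the same effect comes from retrying setIfAbsent(key, value, timeout) until a wait deadline passes.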
Limitation: In the current code, lock renewal (watchdog) is not implemented, so long-running operations could lose the lock prematurely.
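
A bounded wait like this can be written as a small retry loop around a single non-blocking acquire attempt. The helper below is a hypothetical sketch (BoundedLockWait and acquireWithin are made-up names); the attempt is passed in as a BooleanSupplier, so in the real service it could wrap a RedisTemplate setIfAbsent call, while here it is any function that returns true on success.

```java
import java.util.function.BooleanSupplier;

class BoundedLockWait {
    /**
     * Calls attempt until it returns true or waitMillis has elapsed.
     * Sleeps retryMillis between attempts. Returns false on timeout or interrupt.
     */
    static boolean acquireWithin(BooleanSupplier attempt, long waitMillis, long retryMillis) {
        long start = System.nanoTime();
        long waitNanos = waitMillis * 1_000_000L;
        while (true) {
            if (attempt.getAsBoolean()) {
                return true;
            }
            if (System.nanoTime() - start >= waitNanos) {
                return false;                        // give up: "wallet is busy"
            }
            try {
                Thread.sleep(retryMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();  // preserve the interrupt flag
                return false;
            }
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // The simulated lock becomes free on the third attempt.
        boolean acquired = acquireWithin(() -> ++calls[0] >= 3, 1_000, 10);
        System.out.println("acquired=" + acquired + " after " + calls[0] + " attempts");
    }
}
```

The wait time and the lease time are separate knobs: the first bounds how long a caller queues for the lock, the second bounds how long a holder can keep it.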

What idempotency improvements are needed?

Currently, the project has limited idempotency support, which is one of the important areas for improvement in a payment system.

Current State in the Code:

There is no explicit idempotency key (like requestId, idempotencyKey, or clientTransactionId) being passed from the client or checked in the backend.
For P2P Transfer, if the same request is retried (due to network timeout or client retry), there is a risk of duplicate debit/credit because the system does not check whether a transaction with the same details was already processed.
Redis Distributed Lock helps reduce the chance of race conditions, but it does not solve the idempotency problem for retried requests.
Add Money flow also lacks idempotency protection.

Why This is Important: In payment systems, clients (mobile apps) often retry requests automatically. Without idempotency, this can lead to double deduction of money — which is a critical bug in fintech applications.

Improvements I Would Make:

Introduce Idempotency Key:
=> Require clients to send a unique idempotencyKey (UUID) in the request header or body for transfer and add-money APIs.

Store Processed Keys:
=> In Transaction Service, maintain a table or Redis set of processed idempotency keys with TTL (e.g., 24 hours).

Check Before Processing:
=> At the start of transfer flow, check if the idempotency key already exists.
=> If yes → Return the previous response (success or failure) without re-processing.
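=> If no → Reserve the key atomically (e.g., Redis SET NX with TTL) before starting the Saga, then store the final result against it, so a retry that arrives mid-flight is not processed a second time.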

Combine with Transaction ID:
=> Make the external transactionId unique and use it for deduplication.

This would make the system much safer for real-world usage. Implementing proper idempotency is high on my priority list for future enhancements.
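
The check-then-process idea can be sketched as follows, assuming an in-memory map in place of the Redis set or database table. IdempotentExecutor is a hypothetical helper, not part of the project; it uses ConcurrentHashMap.computeIfAbsent so that the check and the reservation happen as one atomic step.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

class IdempotentExecutor {
    // Key -> stored result. A real store would also expire entries (TTL).
    private final ConcurrentHashMap<String, String> results = new ConcurrentHashMap<>();

    /**
     * Runs action at most once per key. Later calls with the same key get the
     * stored result back without running the action again.
     */
    String execute(String idempotencyKey, Supplier<String> action) {
        return results.computeIfAbsent(idempotencyKey, key -> action.get());
    }

    public static void main(String[] args) {
        IdempotentExecutor executor = new IdempotentExecutor();
        int[] runs = {0};
        Supplier<String> transfer = () -> {
            runs[0]++;                 // stands in for the debit + credit saga
            return "COMPLETED";
        };
        String first = executor.execute("550e8400-e29b-41d4-a716-446655440000", transfer);
        String retry = executor.execute("550e8400-e29b-41d4-a716-446655440000", transfer);
        System.out.println(first + " " + retry + " runs=" + runs[0]);
    }
}
```

Two caveats apply to this sketch. If the action throws, nothing is stored and a retry will run it again, so a failed transfer should be returned as a result rather than thrown. And an in-process map only protects a single instance; with several Transaction Service instances the store has to be shared (Redis or a unique column in the database).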

So, are you saying the user should send their own unique idempotency key?

What is an Idempotency Key?

An Idempotency Key is a unique identifier sent by the client (mobile app / frontend) along with the request. Its purpose is to ensure that even if the same request is sent multiple times, the server processes it only once.

Simple Real-World Example:

Imagine you are doing a P2P transfer of ₹1000.

Without Idempotency Key:

You click "Send" → Request goes to server.
Server starts processing (debit + credit).
Due to network issue, you get a timeout/error.
You click "Send" again (or app auto-retries).
Server processes it again → Money gets deducted twice. (Bad!)

With Idempotency Key:

Before sending, the mobile app generates a unique ID (UUID).
User clicks "Send" → Request includes this key: { "amount": 1000, "toUserId": 123, "idempotencyKey": "abc-xyz-789" }
Server receives it and checks:
- "Have I seen this key before?"
- If No → Process the transfer normally and save this key.
- If Yes → Return the previous result (success or failure) without processing again

Who Should Send the Idempotency Key?

The client (Mobile App / Web App / Postman) should generate and send it.
Backend (your services) should check and store it.

This is common practice in payment systems (PhonePe, Razorpay, Stripe, etc.).

How It Would Work in This Project:

Request Example (the client generates idempotencyKey):

POST /api/transactions/me/transfer
{
  "receiverId": 456,
  "amount": 500,
  "description": "Dinner money",
  "idempotencyKey": "550e8400-e29b-41d4-a716-446655440000"
}

In Transaction Service:

First check if this idempotencyKey was already processed.
If yes → Return cached response.
If no → Reserve the key, proceed with the Saga, then store the final result (success or failure) against the key.

Why Care About Idempotency Key Even Though This is Backend-Only Project?

Even if your project currently has no frontend/mobile app, you should still care about idempotency for the following important reasons:

1. Real-World Payment Systems Always Face Retries

Even in backend systems, duplicate requests can come from:

=> Network timeouts (client gets timeout but request reached server)
=> API Gateway retries
=> Load balancer retries
=> Postman / testing tools sending request twice by mistake
=> Future mobile/web clients (when this project is consumed)

In fintech, duplicate money deduction is a critical bug. So interviewers expect you to think about this.

2. Backend Should Be Idempotent by Design

=> A good backend should be idempotent regardless of who is calling it. It’s a backend responsibility to protect itself from duplicate processing.

=> You don’t fully depend on the client. The client just helps by sending a key.

3. Current Risk in the Project

In current code:

=> If someone (Postman, another service, or future app) calls the /transfer API twice with the same details in quick succession, there is a high chance of double debit/credit.
=> Redis Lock helps only with simultaneous requests, not with retries that come after a few seconds.

Suppose the client (say, Postman) does not send a unique idempotency key. How can the backend detect a retry or duplicate request?

How Backend Can Handle Duplicates Without Client-Sent Idempotency Key

Even without the client sending a key, you can still implement partial idempotency using the data available in the request.

Best Approaches (in order of effectiveness):

Use Combination of Fields as Natural Idempotency Key (Recommended for your project)
=> Create a composite key from the request data.
=> Example for P2P Transfer:
senderUserId + receiverUserId + amount + createdTime (within last 5-10 mins)
How to implement:
=> In TransactionService, before starting the Saga, build a key like this (Java):
// requestTimeWindow = request time truncated to the chosen window (e.g., a 10-minute bucket)
String duplicateCheckKey = "transfer:" + fromUserId + ":" + toUserId + ":" + amount + ":" + requestTimeWindow;
=> Check in Redis (with TTL of 10-30 minutes) whether this key already exists.
=> If exists → Treat it as duplicate → Return previous result or error ("Transfer already processed").
=> If not → Process the transfer and store the key in Redis.
Use Database Unique Constraint
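=> Add a unique constraint in the Transaction table on (senderId, receiverId, amount, timeBucket), where timeBucket is createdAt truncated to the chosen window. A plain unique constraint cannot express "within N minutes" on a raw timestamp, so the bucket has to be stored as its own column.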
=> This prevents duplicate inserts at database level.
Generate Idempotency Key in Backend (If client doesn't send)
=> Backend can generate its own key based on request content (using hash of important fields).
=> But this is less reliable than client-generated key.

Summary

=> Even if the client (like Postman) does not send an idempotency key, I can still handle duplicate requests in the backend by creating a natural idempotency key from the request payload.

=> For example, in P2P transfer, I can combine senderUserId + receiverUserId + amount + a short time window (last 10 minutes) and store this composite key in Redis with TTL.

=> Before processing any transfer, I check if this key already exists in Redis. If it does, I treat it as a duplicate request and return the previous status instead of processing again.

=> This approach provides decent protection even without client cooperation, although the gold standard is still to have the client send a unique idempotencyKey.

Overall pros/cons of current Saga approach ?

In this Payment Wallet System, I implemented a Synchronous Saga Pattern coordinated by the Transaction Service using Feign Clients. Here are the major pros and cons of this approach

Pros (Advantages)

Simplicity & Ease of Understanding The entire flow is sequential and easy to trace. Transaction Service clearly controls the steps (debit → credit → update status).
Better Control Over Transaction Flow Immediate feedback after each step (debit success/failure) allows easy decision making for the next step.
Simpler Compensation Logic Compensation (refund) can be triggered immediately in the same call stack if credit fails after debit.
No Extra Infrastructure Needed No requirement for Kafka or message brokers, which reduced complexity and operational overhead.
Easier Debugging Since everything is synchronous, I can follow the request flow easily using logs and Resilience4j fallbacks.
Faster Development Suitable for this project scope and good for learning distributed transaction concepts.

Cons (Disadvantages)

Tight Coupling Transaction Service is directly dependent on Wallet Service and Notification Service. If Wallet Service is down, the entire transfer fails.
Cascading Failure Risk Slow or failing services can block the whole request chain (mitigated partially by Resilience4j).
Limited Resilience in Crash Scenarios If the system crashes after debit but before credit, there is currently no automatic recovery mechanism (no Saga Log or background recovery job).
Scalability Limitation Synchronous calls can become a bottleneck under very high load.
Lack of Strong Idempotency The current design is vulnerable to duplicate processing if the same request is retried.
Not Fully Event-Driven Less flexible for future extensions compared to Choreography-based Saga.

Overall, the current synchronous Saga approach was a good trade-off for this project — it gave me reliability with manageable complexity. However, as the system grows, I plan to evolve it by either introducing a dedicated Orchestrator Service or moving towards event-driven Choreography Saga using Kafka for better decoupling and resilience.

How do you handle millions of requests?

Currently, this Payment Wallet System is designed as a learning / mid-scale application. It is not yet production-ready to handle millions of requests per day, but it has some foundational patterns that support scalability. Here’s an honest breakdown:

What the Current System Has for Handling High Load:

Rate Limiting at API Gateway
=> I have implemented Rate Limiting using Resilience4j at the Gateway level. This protects the system from being overwhelmed by too many requests from a single IP or user.
Resilience4j Patterns
=> Circuit Breaker prevents cascading failures when any service is overloaded.
=> Retry mechanism handles temporary spikes.
=> Bulkhead (if configured) can isolate resources.
Redis Distributed Locking
=> Used in Wallet Service for debit/credit operations. This ensures correctness even under concurrent load, though it can become a contention point at very high scale.
Horizontal Scaling Possible
=> Services are registered with Eureka, so we can run multiple instances of each service.
=> Docker Compose setup makes it easier to scale services.
Caching
=> Basic Redis caching is used for wallet balance (eviction on write).
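
The "eviction on write" caching mentioned in the last point can be sketched like this. It is an illustrative cache-aside example under stated assumptions: BalanceCacheSketch is a hypothetical class, and two in-memory maps stand in for the wallet table and for Redis.

```java
import java.math.BigDecimal;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical cache-aside sketch: read through the cache, evict on every write.
class BalanceCacheSketch {
    private final Map<Long, BigDecimal> database = new ConcurrentHashMap<>(); // stand-in for the wallet table
    private final Map<Long, BigDecimal> cache = new ConcurrentHashMap<>();    // stand-in for Redis
    int databaseReads = 0; // demo counter only, not thread-safe

    void createWallet(long userId, BigDecimal balance) {
        database.put(userId, balance);
    }

    // Read path: serve from cache if present, otherwise load from the database and cache it.
    BigDecimal getBalance(long userId) {
        BigDecimal cached = cache.get(userId);
        if (cached != null) {
            return cached;
        }
        databaseReads++;
        BigDecimal fromDb = database.get(userId);
        if (fromDb != null) {
            cache.put(userId, fromDb);
        }
        return fromDb;
    }

    // Write path: update the database first, then evict the stale cache entry.
    void credit(long userId, BigDecimal amount) {
        database.merge(userId, amount, BigDecimal::add);
        cache.remove(userId);
    }
}
```

Evicting instead of updating the cached value keeps the write path simple: the next read reloads the fresh balance from the database.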

Current Limitations (Honest Answer):

H2 Database — Not suitable for high throughput. It’s in-memory and single instance.
Synchronous Feign Calls — Can cause latency and cascading effects under high load.
No Asynchronous Processing — Everything is synchronous, which limits throughput.
No Advanced Caching / Read Replicas — Heavy read operations (balance check) can hit the database.
Single Redis Instance — Can become a bottleneck for locking at very high concurrency.

How I Would Scale It for Millions of Requests:

To handle millions of requests, I would implement the following improvements:

Replace H2 with PostgreSQL + Read Replicas for better read scalability.
Introduce Kafka for asynchronous processing (especially for notifications and non-critical steps).
Improve Caching Strategy — Heavy use of Redis Cache for balance checks with proper invalidation.
Database Sharding by userId or walletId for future growth.
Move critical money movement to eventual consistency using better Saga + Outbox Pattern.
Deploy on Kubernetes with Horizontal Pod Autoscaler (HPA) based on CPU/load.
Use Redisson with multi-node Redis cluster for better distributed locking.

In the current version, the system can comfortably handle thousands of requests per minute with proper scaling of instances, but for millions, we need architectural evolution toward higher decoupling and asynchronous processing.

What is Database Sharding and how would you achieve it?

What is Database Sharding?

"Database Sharding is a horizontal scaling technique where we split a large database into multiple smaller databases (called shards) so that the load is distributed across them.

Instead of one big database handling all the data, we divide the data based on a shard key (e.g., userId, walletId, or region). Each shard contains only a subset of the data."

Why Sharding is Needed in Your Project?

In your current Payment Wallet System:

All data (users, wallets, transactions) is stored in H2 (single database per service).
As the number of users and transactions grows to millions, a single database becomes a bottleneck for both read and write operations.

How Would You Implement Sharding in This Project?

Shard Key Choice:

Best shard key for this wallet system would be userId (or walletId since it's 1:1 with user).

How I Would Implement It:

Choose Sharding Strategy:
=> Range-based Sharding: Users with userId 1–1M in Shard 1, 1M+1 to 2M in Shard 2, etc.
=> Hash-based Sharding (Recommended): Use hash(userId) % number_of_shards to decide which shard a user belongs to.
Changes in Architecture:
=> Replace single H2 with multiple PostgreSQL instances (one per shard).
=> Create a Shard Router / Database Router layer that decides which shard to route the request to, based on userId.
Implementation Approach in Your Services:
=> In Wallet Service and Transaction Service, instead of directly using one DataSource, use a routing DataSource.
=> Example logic:
Java
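int shardNumber = Math.floorMod(userId.hashCode(), TOTAL_SHARDS); // floorMod never returns a negative value, unlike Math.abs(hash) % n when hash is Integer.MIN_VALUE. Route to database: "wallet-db-" + shardNumber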
=> Use Spring AbstractRoutingDataSource for dynamic routing.
Handling Cross-Shard Operations:
=> P2P Transfer between users in different shards is complex.
=> Solution: Use Saga Pattern + Distributed Transactions (or eventual consistency) across shards.
Future Tech Stack:
=> PostgreSQL with Citus extension (for easier sharding)
=> Or use managed solutions like Amazon RDS + Sharding, CockroachDB, or YugabyteDB.
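
The hash-based routing step above can be sketched in plain Java. ShardRouter is a hypothetical class, and the "wallet-db-" naming follows the example logic earlier in this answer. In a Spring application, a subclass of AbstractRoutingDataSource could return such a key from determineCurrentLookupKey(); that wiring is assumed here and not shown.

```java
// Hypothetical shard router: maps a userId to a shard number and a data source key.
class ShardRouter {
    private final int totalShards;

    ShardRouter(int totalShards) {
        if (totalShards <= 0) {
            throw new IllegalArgumentException("totalShards must be positive");
        }
        this.totalShards = totalShards;
    }

    // floorMod always returns a value in [0, totalShards), even for negative hash codes.
    int shardFor(Long userId) {
        return Math.floorMod(userId.hashCode(), totalShards);
    }

    // Key that a routing DataSource could use to look up the target database.
    String dataSourceKeyFor(Long userId) {
        return "wallet-db-" + shardFor(userId);
    }
}
```

One design caveat: with plain modulo routing, changing the number of shards remaps most users to a different shard, so resharding needs a data migration plan (or a consistent hashing scheme).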

Pros & Cons of Sharding (Senior Touch)

Pros:

Excellent horizontal scalability
Better performance for high traffic

Cons:

Increased complexity (cross-shard queries are hard)
More operational effort
Joins across shards become difficult

Summary

Database Sharding is splitting one large database into multiple smaller ones using a shard key like userId. In my current project, all data is in single H2 instances per service. For future growth to millions of users, I would shard the Wallet and Transaction databases by userId using hash-based routing. I would implement a routing layer using Spring’s AbstractRoutingDataSource and move to PostgreSQL. This would allow horizontal scaling while keeping the existing Saga and Redis patterns intact.

What is Kubernetes and why do you need it in your project?

What is Kubernetes?

Kubernetes (also known as K8s) is an open-source container orchestration platform. It automates the deployment, scaling, and management of containerized applications.

Think of it as a smart manager for containers (like Docker containers).

Simple Analogy:

Docker = Creates individual containers (like cars).
Kubernetes = Manages hundreds or thousands of those cars — decides which ones run where, scales them up/down, restarts failed ones, balances traffic, etc.

Why Kubernetes is Important (Especially for this Project):

In this Payment Wallet System, we currently run services using Docker Compose. This works fine for local development or small scale.

When the system needs to handle real production traffic (thousands to millions of requests), we need Kubernetes because it provides:

Automatic Scaling — Horizontal Pod Autoscaler (HPA) can automatically increase or decrease the number of pods (instances) based on CPU usage, memory, or custom metrics.
High Availability — If one pod crashes, Kubernetes automatically starts a new one.
Load Balancing — Distributes traffic evenly across all running instances.
Service Discovery — Pods can easily find each other.
Rolling Updates & Rollbacks — Deploy new versions without downtime.
Self-Healing — Restarts failed containers automatically.
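
To make the Automatic Scaling point concrete: the Kubernetes documentation describes the core HPA rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below applies only that rule plus min/max clamping. HpaReplicaCalculator is a hypothetical name, and the real controller also applies a tolerance band and stabilization windows that are left out here.

```java
// Hypothetical calculator for the core HPA scaling rule (tolerance and stabilization omitted).
class HpaReplicaCalculator {

    // currentMetric and targetMetric use the same unit, e.g. average CPU utilization in percent.
    static int desiredReplicas(int currentReplicas, double currentMetric, double targetMetric,
                               int minReplicas, int maxReplicas) {
        int desired = (int) Math.ceil(currentReplicas * (currentMetric / targetMetric));
        // Clamp to the configured replica bounds.
        return Math.max(minReplicas, Math.min(maxReplicas, desired));
    }
}
```

For example, 4 Wallet Service pods running at 90% average CPU against a 60% target would be scaled to 6 pods, as long as 6 is within the configured bounds.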

Summary

Kubernetes is a container orchestration platform that automates the deployment, scaling, and management of containerized microservices.

In my current project, I am using Docker Compose for local deployment. In production, I would deploy all services (User, Wallet, Transaction, Notification, API Gateway) on Kubernetes. I would use Horizontal Pod Autoscaler (HPA) to automatically scale the number of pods based on CPU utilization or request load. This ensures the system can handle traffic spikes reliably, especially during high transaction periods like festivals or salary days.

Which database would you choose for a real production application, with scalability in mind, among H2, MySQL, NoSQL, Oracle SQL, and PostgreSQL?

=> For this Payment Wallet System in real production, my top recommendation would be PostgreSQL.

Ranking for Scalability + Production Suitability:

Database	Suitability for Production	Recommendation	Reason
PostgreSQL	Excellent	Best Choice	Best balance
MySQL	Very Good	Strong Alternative	Good, but less advanced features
NoSQL (MongoDB/Cassandra)	Moderate	Not Recommended	Consistency issues
Oracle SQL	Excellent	Good but costly	Very expensive
H2	Not Suitable	Only for Dev	In-memory, not scalable

Why PostgreSQL?

Excellent Scalability
=> Supports Horizontal Scaling (Read Replicas) and Vertical Scaling very well.
=> Advanced partitioning and sharding support (especially with Citus extension).

Strong ACID Compliance
=> Critical for a Payment/Wallet system where money consistency is non-negotiable.

Better Concurrency Handling
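=> Uses MVCC (Multi-Version Concurrency Control), so readers do not block writers and writers do not block readers, which helps under high concurrent read/write load. (MySQL's InnoDB engine also uses MVCC, so this is a strength of PostgreSQL but not a differentiator by itself.)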

Rich Features
=> JSON support, powerful indexing, full-text search, advanced window functions, etc.
=> Great for complex queries involving transactions and analytics.

Open Source & Cost Effective
=> Free, no licensing cost (unlike Oracle), and widely supported on AWS, GCP, Azure.

When I would choose others?

MySQL → If the team is more comfortable with MySQL or using AWS Aurora MySQL.
NoSQL → Only for non-financial parts (e.g., notifications, audit logs, analytics). Not suitable for core wallet & transaction data.
Oracle → Only if the company already has Oracle licenses and enterprise support needs.
H2 → Strictly for local development/testing.

What's the difference between Horizontal Scaling and Vertical Scaling ?

Feature	Vertical Scaling	Horizontal Scaling
Meaning	Scaling Up — Making a single machine more powerful	Scaling Out — Adding more machines
How it Works	Increase CPU, RAM, Storage of one server	Add more servers/instances working together
Example	Upgrading EC2 from t2.micro (1 vCPU, 1 GiB) to t3.large (2 vCPU, 8 GiB)	Running 10 instances of t2.micro behind a Load Balancer
Cost	Can get expensive quickly	More cost effective at large scale
Limitation	Limited by maximum hardware capacity of one machine	Almost unlimited (you can add hundreds of machines)
Complexity	Simpler to implement	More complex (needs load balancer, service discovery, etc.)
Downtime	Usually requires restart	Can be done with zero downtime
Best For	Small to medium applications	Large, high-traffic, production systems

Simple Real-World Analogy:

=> Vertical Scaling → Buying a bigger truck to carry more load.

=> Horizontal Scaling → Buying multiple normal trucks to carry the load together.

In Context of this Payment Wallet System:

=> Currently, the project is running locally using Docker Compose.

=> I have not deployed it on AWS yet.

=> However, I am planning to deploy it on AWS EC2 (t2.micro or t3.micro instance) using Docker Compose in the next few days.

=> I chose EC2 because it is simple, cost-effective (Free Tier eligible), and a good starting point to demonstrate cloud deployment of a multi-microservice application.

Why use Redis Distributed Locking in this project? Explain with a real-time scenario.

Purpose (in simple practical terms):

The main purpose of using Redis Distributed Lock is to prevent race conditions and double-spending during money transfers.

Real Practical Scenario:

Imagine a user has ₹1000 in their wallet.

The user (or due to network issue) clicks the "Transfer ₹500" button twice very quickly.
Without locking, both requests can reach the Wallet Service at almost the same time.

What can happen without Lock?

Both requests check balance (₹1000 > ₹500) → Both get approved.
Both requests deduct ₹500.
Result: Wallet balance becomes ₹0 instead of ₹500 → Double deduction (User loses extra ₹500).

This is called a Race Condition.

How Redis Distributed Lock Solves It:

=> Before doing any debit or credit operation, the Wallet Service tries to acquire a lock on that particular user’s wallet. (Sender wallet)
=> Only one request can hold the lock at a time.
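=> The second request either fails fast or waits until the first one completes (debit + credit) and releases the lock, depending on how the lock is acquired.
=> This ensures that the balance check and update happen atomically (as one single safe operation).

So yes, it prevents double deduction from rapid double clicks as long as the second request is rejected while the lock is held. If the second request waits instead, it runs after the first one against the updated balance, so complete duplicate protection also needs an idempotency check.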

The Redis lock is held only for 60 seconds (or whatever timeout you configured), right? Will that be enough to prevent an accidental double click?

=> Yes, Redis lock usually has a timeout (lease time). 60 seconds is quite common and more than enough to prevent accidental double clicks.

=> In most implementations (including typical ones for this kind of project), the Redis lock is acquired with a lease time like 10–60 seconds.

Why 60 seconds (or even 30 seconds) is sufficient:

=> A normal transfer operation (debit + credit) should complete in less than 1 second (usually 100-500 milliseconds).

=> Accidental double clicks or fast retries happen within milliseconds to 2–3 seconds.

=> The lock only needs to hold for the duration of one complete operation.

=> So even if you set the lock timeout to 10–30 seconds, it is more than enough to safely handle double clicks, network retries, etc.

What happens if the operation takes too long?

=> If the lock expires before the operation finishes (very rare), another request could acquire the lock → risk of race condition.

=> Good practice: Always set a reasonable lease time (e.g., 30 seconds) and release the lock as soon as the operation is done (in finally block).
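
The lease-time and release-in-finally practice can be sketched as below. This is an in-memory stand-in, not Redis or Redisson code: LeaseLockSketch and its methods are hypothetical, and the caller passes the current time explicitly so the expiry behaviour is easy to follow. The owner token mirrors the common Redis practice of storing a unique value per lock holder, so that a holder whose lease has already expired cannot release someone else's lock.

```java
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory lock with a lease time and an owner token (stand-in for a Redis lock).
class LeaseLockSketch {
    private static final class Holder {
        final String token;
        final long expiresAtMillis;

        Holder(String token, long expiresAtMillis) {
            this.token = token;
            this.expiresAtMillis = expiresAtMillis;
        }
    }

    private final ConcurrentHashMap<String, Holder> locks = new ConcurrentHashMap<>();

    // Returns an owner token if the lock was free or expired, otherwise null.
    String tryLock(String key, long leaseMillis, long nowMillis) {
        Holder mine = new Holder(UUID.randomUUID().toString(), nowMillis + leaseMillis);
        Holder winner = locks.compute(key, (k, current) ->
                (current == null || current.expiresAtMillis <= nowMillis) ? mine : current);
        return winner == mine ? mine.token : null;
    }

    // Releases the lock only if the caller still owns it.
    boolean unlock(String key, String token) {
        final boolean[] released = {false};
        locks.computeIfPresent(key, (k, current) -> {
            if (current.token.equals(token)) {
                released[0] = true;
                return null; // returning null removes the entry
            }
            return current;
        });
        return released[0];
    }

    // Runs the work under the lock and always releases it in finally.
    boolean runExclusively(String key, long leaseMillis, long nowMillis, Runnable work) {
        String token = tryLock(key, leaseMillis, nowMillis);
        if (token == null) {
            return false; // another request holds the lock
        }
        try {
            work.run();
            return true;
        } finally {
            unlock(key, token);
        }
    }
}
```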

Consider a user who has Rs. 5000 in his wallet.
He transfers Rs. 1000 to his friend.
Due to a network issue or some other problem, he clicks the Send button again.
But he clicks it after 60 seconds. Since he has more than enough balance and the Redis lock timeout (60 seconds) has already passed, he will be debited a second time. Would you blame the user, or would you increase the Redis lock timeout?

=> In the current implementation, if the user clicks the "Send" button again after the Redis lock timeout (e.g., after 60 seconds), then yes, the second transfer will be processed and money will be debited again.

=> Cannot fully blame the user, because network issues, slow responses, or auto-retry mechanisms in mobile apps often cause such duplicate requests.

Why This Happens?

=> Redis Lock only protects during the short duration of the operation (while the lock is held).

=> Once the lock is released (normally within a second, when the operation finishes) or expires (after 30–60 seconds), a new request can acquire the lock again.

=> There is no mechanism currently to detect that "this same transfer was already attempted".

Correct Solution (What should happen):

=> Even if the user clicks the button multiple times (with delay), the system should process the transfer only once.

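Fix at infrastructure level:

=> We can use the Redisson client's watchdog mechanism (automatic lock renewal). Note that this only stops the lock from expiring while an operation is still running; it does not detect a delayed duplicate request.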

Best Fix (Recommended):

=> Use an Idempotency Key (client-generated) + check in Redis/Database before processing.

=> Or use a combination of transactionId / fromUserId + toUserId + amount + time window for duplicate detection.

=> In a production system, I would implement an idempotency key so that even delayed duplicate requests are safely rejected or returned with the previous result.
This is one of the key improvements I have noted for this project.

=> I would recommend Idempotency Key from Client side
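
A minimal sketch of the idempotency key check, assuming the client sends the key with each request. IdempotentTransferHandler is a hypothetical class, results are simplified to a String, and a ConcurrentHashMap stands in for the Redis or database store. A production version would also need a TTL and an in-progress state.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical handler: runs a transfer at most once per idempotency key.
class IdempotentTransferHandler {
    private final ConcurrentHashMap<String, String> resultByKey = new ConcurrentHashMap<>();

    // First call for a key executes the transfer and stores its result.
    // Later calls with the same key return the stored result without executing again.
    // If the transfer throws, nothing is stored, so the client can safely retry.
    String handle(String idempotencyKey, Supplier<String> transfer) {
        return resultByKey.computeIfAbsent(idempotencyKey, key -> transfer.get());
    }
}
```

With this in place, the delayed second click from the scenario above returns the first transfer's result instead of debiting again, no matter how many seconds have passed.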

Does this Redis distributed lock handle concurrency?

=> Yes, The Redis Distributed Lock is primarily used for handling concurrency.

=> Concurrency = Multiple requests trying to access/modify the same wallet at the same time.

Practical Example:

User A has ₹5000 in wallet.

Request 1: Transfer ₹2000 to User B
Request 2: Transfer ₹3000 to User C (Both requests arrive almost simultaneously — e.g., user clicked fast or network retry)

Without Lock:

Both requests check balance → Both see ₹5000
Both deduct money → Balance becomes negative or incorrect

With Redis Distributed Lock:

Request 1 acquires the lock → Proceeds with debit
Request 2 waits (or fails quickly) until Request 1 completes and releases the lock
Only after Request 1 finishes, Request 2 can acquire the lock and proceed

This ensures only one operation modifies the wallet balance at any given time.
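
The effect of the lock can be demonstrated inside a single JVM. In this sketch a per-wallet ReentrantLock stands in for the Redis distributed lock (an assumption for runnability; it would not protect across multiple service instances). LockedWalletSketch is a hypothetical class, and amounts are whole rupees stored as long for brevity.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical wallet where check-and-debit runs under a per-wallet lock.
class LockedWalletSketch {
    private final ConcurrentHashMap<Long, Long> balances = new ConcurrentHashMap<>();
    private final ConcurrentHashMap<Long, ReentrantLock> locks = new ConcurrentHashMap<>();

    void open(long userId, long balance) {
        balances.put(userId, balance);
    }

    long balanceOf(long userId) {
        return balances.get(userId);
    }

    // Balance check and update happen while holding the wallet's lock.
    boolean debit(long userId, long amount) {
        ReentrantLock lock = locks.computeIfAbsent(userId, id -> new ReentrantLock());
        lock.lock();
        try {
            long current = balances.get(userId);
            if (current < amount) {
                return false; // insufficient balance
            }
            balances.put(userId, current - amount);
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Fires several debits at once and returns how many succeeded.
    static int runConcurrentDebits(LockedWalletSketch wallet, long userId, long amount, int threadCount) {
        AtomicInteger successes = new AtomicInteger();
        Thread[] threads = new Thread[threadCount];
        for (int i = 0; i < threadCount; i++) {
            threads[i] = new Thread(() -> {
                if (wallet.debit(userId, amount)) {
                    successes.incrementAndGet();
                }
            });
            threads[i].start();
        }
        for (Thread thread : threads) {
            try {
                thread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return successes.get();
    }
}
```

With ₹5000 in the wallet and 8 simultaneous debits of ₹1000, exactly 5 succeed and the balance ends at ₹0; it never goes negative.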

Have you used a Load Balancer in this project? Are you aware of Load Balancing?

=> Not explicitly a dedicated Load Balancer like AWS ELB, Nginx, or HAProxy.
However, I have used client-side load balancing provided by Spring Cloud.

=> I am using Spring Cloud Gateway + Eureka Service Discovery.

=> When one service (e.g., Transaction Service) calls another service (e.g., Wallet Service) using OpenFeign, Spring Cloud internally uses Spring Cloud LoadBalancer (previously Ribbon) to distribute the request across multiple instances of the target service.

=> This is called Client-Side Load Balancing.

=> So, while there is no centralized Load Balancer, basic load balancing capability exists through Eureka + Spring Cloud LoadBalancer.

Are you aware of Load Balancing?

=> Yes, I am well aware of Load Balancing.

Load Balancing is the process of distributing incoming network traffic across multiple servers/instances to ensure:

=> No single server gets overwhelmed

=> Better availability and fault tolerance

=> Improved performance

Types of Load Balancing:

=> Client-Side Load Balancing → Client decides which instance to call (Used in this project via Spring Cloud).

=> Server-Side Load Balancing → A dedicated component (like AWS ALB, Nginx) sits in front and distributes traffic.
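
Client-side load balancing can be illustrated with a small round-robin chooser. This is only a sketch of the idea: RoundRobinChooser and the instance addresses are hypothetical, and in the project the instance list would come from Eureka and the selection would be done by Spring Cloud LoadBalancer, not by hand-written code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical round-robin chooser: the caller picks the next instance itself.
class RoundRobinChooser {
    private final List<String> instances;
    private final AtomicInteger counter = new AtomicInteger();

    RoundRobinChooser(List<String> instances) {
        if (instances.isEmpty()) {
            throw new IllegalArgumentException("at least one instance is required");
        }
        this.instances = new ArrayList<>(instances);
    }

    // floorMod keeps the index valid even after the counter overflows to a negative value.
    String next() {
        int index = Math.floorMod(counter.getAndIncrement(), instances.size());
        return instances.get(index);
    }
}
```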

As per the current implementation, how many requests per millisecond can this project handle?

=> As per current implementation, this project is NOT designed or tested for high throughput.

=> Requests per millisecond: Very low (roughly 0.1 to 0.4 requests per millisecond at best under ideal conditions; this is an estimate, not a load-tested figure).

Why So Low?

=> H2 In-Memory Database — Not optimized for high concurrency.

=> Synchronous Feign Calls — Each P2P transfer makes multiple sequential calls (Transaction → Wallet → Wallet → Notification).

=> Redis Distributed Lock — Creates contention under high concurrency.

=> Single Instance — Everything runs on one machine (local).

=> No Caching Strategy for reads (except basic wallet cache eviction).

Practical Numbers (Rough Estimate):

Scenario	Estimated Capacity
Local Machine (Development)	50 – 200 requests per second
Single EC2 t3.micro	100 – 400 requests per second
Per Millisecond	~0.1 to 0.4 requests/ms
P2P Transfer (Complex flow)	Much lower (due to multiple calls + lock)

=> Currently, this project is a learning/portfolio application, not production-optimized.
With H2 database, synchronous calls, and single instance setup, it can roughly handle 100–300 requests per second under light load

=> For real production scale (thousands of RPS), I would need to introduce PostgreSQL with read replicas, asynchronous processing (Kafka), better caching, and horizontal scaling with Kubernetes.

Scaling Roadmap for Payment Wallet System (High Level Design)

Scaling Roadmap for Payment Wallet System

(From Current State → High Scale Production)

Current State (As Implemented)

Single instance deployment using Docker Compose
H2 in-memory database
Synchronous communication via Feign Client
Basic Redis Distributed Lock
Suitable for learning/demo only (few hundred RPS max)

Phase-wise Scaling Strategy

Phase 1: Basic Production Readiness (500 – 5,000 RPS)

Replace H2 with PostgreSQL (with HikariCP connection pooling)
Implement proper Redis Caching for read-heavy operations (wallet balance)
Enable Horizontal Scaling — Run multiple instances of each service
Introduce AWS Application Load Balancer (ALB) or Nginx
Improve Idempotency using idempotency keys
Add basic monitoring (Spring Boot Actuator + Prometheus)

Phase 2: Medium Scale (5,000 – 50,000 RPS) — Recommended Target

Move to Event-Driven Architecture using Kafka
- Convert Synchronous Saga → Choreography-based Saga
Implement Read Replicas for Wallet & Transaction services
Use Redis Cluster for distributed locking and caching
Deploy on Kubernetes (EKS) with Horizontal Pod Autoscaler (HPA)
Implement Circuit Breaker + Bulkhead tuning
Add Distributed Tracing (OpenTelemetry + Jaeger/Zipkin)

Phase 3: High Scale (50,000+ RPS)

Database Sharding by userId or walletId (using PostgreSQL Citus or custom routing)
Introduce CQRS (Command Query Responsibility Segregation)
Use Cassandra or CockroachDB for high-write transaction logs (optional)
Multi-AZ & Multi-Region deployment
Advanced caching layers (Redis + Local Cache)
Rate limiting at multiple levels (Gateway + Service)

Key Architectural Improvements

From Synchronous → Asynchronous (Biggest impact)
From Monolithic DB → Sharded DB
From Single Instance → Horizontally Scaled
Strong Consistency → Eventual Consistency (where acceptable)

Priority Order (What I Would Do First)

Move from H2 → PostgreSQL
Implement proper Idempotency
Introduce Kafka for async processing (especially notifications & non-critical steps)
Deploy on Kubernetes with multiple instances
Add comprehensive observability

=> Currently, the system is designed for learning purposes. To make it production-ready and capable of handling tens of thousands of requests per second, I would follow a phased approach — starting with PostgreSQL and idempotency, then moving to event-driven architecture using Kafka, and finally achieving horizontal scaling with Kubernetes and database sharding.

What is Saga Pattern (In general, not in the context of your project)?

Saga Pattern is an architectural pattern used in microservices to manage distributed transactions (transactions that span across multiple services).

The Problem it Solves:

In a monolith, you can do everything in one database transaction. In microservices, each service has its own database, so you cannot use a single ACID transaction.

If one service fails in the middle, you need a way to maintain consistency (e.g., don't debit money if credit fails).

How Saga Pattern Works:

It breaks one big business transaction into a series of small local transactions.

Each service performs its own local transaction and then either publishes an event (choreography) or replies to the orchestrator (orchestration).
If any step fails, it triggers compensating transactions (rollback actions) to undo the previous successful steps.
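
The execute-then-compensate idea can be sketched as a tiny orchestrator. This is a generic illustration under stated assumptions: SagaSketch and its Step interface are hypothetical, everything runs in one process, and a real saga would also persist its progress so it can recover after a crash.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.List;

// Hypothetical in-process saga runner: local steps with compensating actions.
class SagaSketch {
    interface Step {
        void execute() throws Exception; // the local transaction
        void compensate();               // undoes a previously successful execute()
    }

    // Runs steps in order. If one fails, compensates the completed steps in reverse order.
    static boolean run(List<Step> steps) {
        Deque<Step> completed = new ArrayDeque<>();
        for (Step step : steps) {
            try {
                step.execute();
                completed.push(step);
            } catch (Exception failure) {
                while (!completed.isEmpty()) {
                    completed.pop().compensate();
                }
                return false;
            }
        }
        return true;
    }
}
```

For a P2P transfer, the steps would be "debit sender" (compensation: refund sender) followed by "credit receiver"; if the credit fails, the refund runs and the sender's balance is restored.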

Two Types of Saga:

Orchestration-based Saga (Most common)
=> There is a central Saga Orchestrator service.
=> It coordinates all steps and decides what to do next.
=> Easier to manage and debug.
Choreography-based Saga
=> No central coordinator.
=> Services listen to events and react automatically.
=> More loosely coupled but harder to manage.

Key Advantages:

Maintains eventual consistency
More resilient than 2PC (Two Phase Commit)
Each service remains independent

Disadvantages:

More complex to implement
Need to write compensating logic for every step
