Most Spring Boot Developers Still Can’t Answer These Interview Questions (Part 3)

These aren’t trivia questions. They’re the kind that reveal whether you’ve actually operated Spring Boot in production.

Over the last few years, I’ve noticed an interesting pattern in senior backend interviews.

The questions rarely start with Spring Boot.

They start with production incidents.

“Our service suddenly started returning 500s.”

“Authentication randomly stopped working.”

“An exception was thrown, but the database still committed.”

On the surface, these sound like framework questions. In reality, they’re debugging questions in disguise. The actual test is whether you know how to think when something is broken and the dashboard looks completely fine.

Here are five of those scenarios.

“Everything looks healthy. Why is the API returning 500s?”

The monitoring dashboard shows nothing alarming.

Traffic: 15,000 requests per minute. CPU: 35%. Memory: 45%. No OOM errors. No obvious exceptions in the logs. And yet customers are hitting 500 errors consistently.

Take 30 seconds. Don’t scroll yet. Where do you start?

Most engineers go straight to logs. That’s reasonable. But what if the logs show nothing useful — just connection timeouts and generic failures, no stack traces pointing anywhere interesting?

This is where the investigation gets more honest.

CPU was under 40%. Memory looked fine. Nothing in Grafana was alarming. It took an embarrassingly long time before someone suggested taking a thread dump — and there it was.

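Every Tomcat worker thread was stuck on the same call, waiting on an internal HTTP service that had quietly started taking four seconds per response. The JVM wasn’t struggling. The threads were just sitting there, waiting. (Depending on the HTTP client, such threads show up as WAITING, TIMED_WAITING, or even RUNNABLE inside a socket read, so read the stack frames, not just the state.)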

That’s the core issue with thread pool exhaustion. CPU and memory being healthy tells you the JVM is fine. It says nothing about whether the threads are actually doing useful work. A Spring Boot application under Tomcat has a fixed number of worker threads — around 200 by default. If every thread is blocked waiting on a slow downstream call, new requests queue up. The queue fills. Requests time out. Everything reports 500, while your CPU sits comfortably at 35%.
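To make that concrete, here is a toy model in plain Java, not Tomcat itself. The class name, the pool size of 2 and the five requests are all made up for illustration, and a latch stands in for the slow downstream service. It shows every worker parked and the remaining requests queued while the CPU does essentially nothing.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Toy model of a worker pool whose threads are all stuck on a slow downstream call.
// Pool size and request count are illustrative, not Tomcat's real values.
class PoolExhaustionDemo {

    /** Returns {activeWorkers, queuedRequests} observed while the downstream is stalled. */
    static int[] simulate(int workers, int requests) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                workers, workers, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<Runnable>());
        CountDownLatch downstreamResponds = new CountDownLatch(1); // stands in for the slow service
        CountDownLatch workersBusy = new CountDownLatch(Math.min(workers, requests));

        for (int i = 0; i < requests; i++) {
            pool.execute(() -> {
                workersBusy.countDown();
                try {
                    downstreamResponds.await(); // parked here: no CPU used, no progress made
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        try {
            workersBusy.await(); // wait until every worker has picked up a request
            return new int[] { pool.getActiveCount(), pool.getQueue().size() };
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            downstreamResponds.countDown(); // release the workers so the pool can drain
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        int[] snapshot = simulate(2, 5);
        System.out.println("active=" + snapshot[0] + " queued=" + snapshot[1]); // prints "active=2 queued=3"
    }
}
```

Scale the same picture up to 200 workers and a four-second downstream call and you get the incident above: full pool, growing queue, idle CPU.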

The diagnostic isn’t more dashboards. It’s a thread dump — which shows what every thread is actually doing at a given moment.
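In practice you would usually reach for jstack, jcmd <pid> Thread.print, or Actuator's /actuator/threaddump endpoint if it is exposed. As a rough sketch of the same idea from inside the JVM, the snippet below uses the JDK's ThreadMXBean to count live threads by state. The class name is hypothetical, and a real investigation would look at stack frames rather than just counts.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.EnumMap;
import java.util.Map;

class ThreadStateSummary {

    /** Counts live threads by state, using the JDK's built-in management bean. */
    static Map<Thread.State, Integer> countByState() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        Map<Thread.State, Integer> counts = new EnumMap<>(Thread.State.class);
        // false, false: skip lock-ownership details, which keeps the snapshot cheap
        for (ThreadInfo info : threads.dumpAllThreads(false, false)) {
            if (info != null) {
                counts.merge(info.getThreadState(), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        System.out.println(countByState());
    }
}
```

A pool where nearly every worker sits in a waiting state on the same frame is the signature to look for.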

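Spring Boot Actuator helps too, once the metrics endpoint is exposed over HTTP (it isn’t by default). /actuator/metrics reports Tomcat thread usage such as tomcat.threads.busy (when server.tomcat.mbeanregistry.enabled is set) and HikariCP pool metrics: active connections, pending acquisitions, timeouts. The same exhaustion pattern applies to connection pools, and these numbers explain what’s happening in a way that CPU graphs never will.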

“An exception was thrown. Why did the database still commit?”

Here’s a method worth reading carefully before scrolling:

@Transactional
public void createOrder() {
    saveOrder();
    try {
        paymentService.process();
    } catch (Exception e) {
        log.error(e.getMessage());
    }
}

Does this roll back?

No. Catching the exception and swallowing it means Spring never sees it. From the transaction manager’s perspective, the method completed normally. The transaction commits.

The first answer is usually obvious. The second one isn’t.

Remove the try-catch and let the exception propagate. Most developers stop here, confident the transaction will now roll back. But Spring’s default rollback behaviour only triggers on unchecked exceptions: RuntimeException, Error, and their subclasses. If paymentService.process() throws a checked exception, the transaction commits anyway. You’d need @Transactional(rollbackFor = Exception.class) to change that. This surprises a lot of people the first time it costs them something in production.
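The rule is small enough to write down. The sketch below is a plain-Java model of that decision, not Spring's own code: the class and method names are mine, and the two sample exceptions are invented for the example.

```java
class RollbackRules {

    /** Models the default rule: roll back on unchecked exceptions and errors only. */
    static boolean defaultRollbackOn(Throwable t) {
        return t instanceof RuntimeException || t instanceof Error;
    }

    /** Models rollbackFor = Exception.class layered on top of the default rule. */
    static boolean rollbackForException(Throwable t) {
        return t instanceof Exception || defaultRollbackOn(t);
    }

    public static void main(String[] args) {
        Throwable checked = new java.io.IOException("payment gateway unreachable");
        Throwable unchecked = new IllegalStateException("card declined");
        System.out.println(defaultRollbackOn(checked));    // false: the transaction commits
        System.out.println(defaultRollbackOn(unchecked));  // true: the transaction rolls back
        System.out.println(rollbackForException(checked)); // true
    }
}
```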

Now change the scenario again. What if paymentService isn’t an injected bean at all — what if it’s just another method in the same class?

Spring wraps your bean in a proxy. When an external caller invokes createOrder(), the call goes through that proxy, which is what applies the transaction logic. But when createOrder() calls another method on this, it bypasses the proxy entirely. It’s a direct Java method call. Whatever transaction annotation lives on that inner method is completely ignored — the proxy never got involved.

The annotation isn’t magic. Spring only gets a chance to apply transaction logic when execution passes through its proxy. Once you understand that, a lot of “mysterious” transaction behaviour stops being mysterious — the call is just going around the proxy instead of through it.
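You can watch the bypass happen without Spring at all. This is a minimal sketch using a JDK dynamic proxy as a stand-in for Spring's transactional proxy (Spring Boot typically uses a class-based proxy instead, but the self-invocation behaviour is the same). OrderService, processPayment and the recording list are hypothetical names for the demo.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;
import java.util.ArrayList;
import java.util.List;

class SelfInvocationDemo {

    interface OrderService {
        void createOrder();
        void processPayment();
    }

    static class OrderServiceImpl implements OrderService {
        public void createOrder() {
            processPayment(); // plain call on `this`: the proxy never sees it
        }
        public void processPayment() { }
    }

    /** Wraps the target in a proxy that records every call it intercepts. */
    static OrderService proxied(OrderService target, List<String> intercepted) {
        InvocationHandler handler = (proxy, method, methodArgs) -> {
            intercepted.add(method.getName()); // stand-in for "apply transaction logic"
            return method.invoke(target, methodArgs);
        };
        return (OrderService) Proxy.newProxyInstance(
                OrderService.class.getClassLoader(),
                new Class<?>[] { OrderService.class },
                handler);
    }

    public static void main(String[] args) {
        List<String> intercepted = new ArrayList<>();
        proxied(new OrderServiceImpl(), intercepted).createOrder();
        System.out.println(intercepted); // prints [createOrder]
    }
}
```

Only createOrder shows up. The inner call to processPayment ran, but nothing had the chance to wrap it, which is why moving the inner method to a separate bean is the usual fix.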

“Authentication randomly stopped working for two endpoints”

A routine deployment goes out. Within minutes, reports start coming in.

/api/orders        ✅
/api/users         ✅
/actuator/health   ❌
/internal/status   ❌

The authentication interceptor isn’t running for those two paths.

The initial instinct is path configuration — an exclusion pattern that’s too broad, or a mismatched URL. Worth checking. But here’s where it gets interesting: if the paths look correct, the problem isn’t configuration. It’s a misunderstanding of what interceptors actually intercept.

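Spring MVC interceptors only apply to requests that pass through the DispatcherServlet, and only for the handler mappings they were registered with. Requests handled by other servlets, or resolved by a handler mapping the interceptor was never attached to, won’t trigger it at all. Actuator endpoints are a common example: they are registered through their own handler mapping (and, on a separate management port, in their own context), so an interceptor added through WebMvcConfigurer typically never sees them. An interceptor-based authentication strategy silently misses those requests. No error. No warning. Just no authentication.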

Spring Security operates at the servlet filter level — earlier in the request lifecycle, before the dispatcher is involved. A filter sees every request regardless of how it’s handled downstream.

This leads to an architectural question worth sitting with: should authentication live in an interceptor at all?

For most production systems, the answer is probably no. Authentication is a security boundary. Security boundaries belong as early in the request pipeline as possible. Putting authentication in a Spring MVC interceptor means accepting that anything handled outside the dispatcher context is silently unprotected — and that any future routing change could silently open a gap.

“The scheduled job is running three times simultaneously”

A Spring Boot application with @Scheduled gets deployed to Kubernetes with three replicas.

Pod A  →  executes
Pod B  →  executes
Pod C  →  executes

Every pod runs its own JVM. Every JVM schedules the job independently. The job runs three times.

Take 30 seconds. How do you fix it?

My first instinct when I hit this was “just use ShedLock.” Acquire a distributed lock before running, release it after. Clean. Obvious. Done.

Five minutes later, someone asked: “What happens if the pod crashes after acquiring the lock?”

The lock is still held. No other pod will acquire it until it expires. Set the expiry too long and the next scheduled execution is skipped entirely. Set it too short and another pod might acquire the lock before the original job finishes — which was the original problem. There’s no clean answer. Lock expiry is a genuine tradeoff between availability and correctness, and the right value depends on how long the job actually runs and what concurrent execution would actually cost.
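The tradeoff is easier to see with timestamps. The sketch below is an in-memory stand-in for a shared lock table, not ShedLock's implementation; the class name, the "report" lock name and the ten-minute lease are assumptions for the example. The lease plays roughly the role that ShedLock's lockAtMostFor setting does.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

// In-memory stand-in for a shared lock table; a real setup would keep this in a database or similar store.
class ExpiringLockTable {
    private final Map<String, Instant> lockedUntil = new HashMap<>();

    /** Grants the lock only if nobody holds it or the previous holder's lease has expired. */
    synchronized boolean tryAcquire(String name, Instant now, Duration lease) {
        Instant expiry = lockedUntil.get(name);
        if (expiry != null && now.isBefore(expiry)) {
            return false;
        }
        lockedUntil.put(name, now.plus(lease));
        return true;
    }

    synchronized void release(String name) {
        lockedUntil.remove(name);
    }

    public static void main(String[] args) {
        ExpiringLockTable locks = new ExpiringLockTable();
        Instant t0 = Instant.parse("2024-01-01T00:00:00Z");
        Duration lease = Duration.ofMinutes(10);

        System.out.println(locks.tryAcquire("report", t0, lease));                  // true: pod A wins
        // Pod A crashes here and never calls release().
        System.out.println(locks.tryAcquire("report", t0.plusSeconds(300), lease)); // false: lease still held
        System.out.println(locks.tryAcquire("report", t0.plusSeconds(601), lease)); // true: lease expired
    }
}
```

If the job is scheduled every five minutes, that crashed pod just cost you a run. Shorten the lease to fix that, and a job that takes longer than the lease can overlap with itself.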

This is where idempotency stops being academic. Distributed locking reduces the probability of concurrent execution — it doesn’t eliminate every failure mode. If the job is idempotent, running it twice shifts the failure mode from “data corruption” to “redundant work,” which is dramatically easier to tolerate. If it isn’t idempotent, you’re depending on the lock to be perfect.

Distributed locks under failure conditions are not perfect.

Even with locking in place, the business logic should still be idempotent. The lock is a first line of defence. Idempotency is what keeps you safe when the first line doesn’t hold.
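One common way to get there is to key each run on its schedule slot and refuse duplicates. This is a minimal sketch under stated assumptions: the set stands in for a database table with a unique constraint on the run key, and the class, method and key names are hypothetical.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class IdempotentJob {
    // Stand-in for a table with a unique constraint on the run key.
    private final Set<String> claimedRuns = ConcurrentHashMap.newKeySet();
    private final AtomicInteger sideEffects = new AtomicInteger();

    /** Runs the work at most once per runKey; returns false for a duplicate. */
    boolean runOnce(String runKey) {
        if (!claimedRuns.add(runKey)) {
            return false; // another pod (or a retry) already claimed this slot
        }
        // In a real system the claim and the work would share one transaction,
        // so a crash mid-run does not leave the slot claimed but unfinished.
        sideEffects.incrementAndGet(); // the real work would go here
        return true;
    }

    int sideEffectCount() {
        return sideEffects.get();
    }
}
```

With this in place, three pods calling runOnce("report-2024-01-01T02:00") produce one side effect and two cheap no-ops, whether or not the lock held.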

For jobs where correctness is critical, a Kubernetes CronJob (with concurrencyPolicy: Forbid to prevent overlapping runs) is often the cleaner architecture: coordination at the infrastructure level, rather than the application level. Even then, the Kubernetes documentation advises making jobs idempotent, because a CronJob can occasionally create two Jobs for one slot, or none. Solving duplicate execution is easy. Solving it correctly under failure, with defined behaviour for every crash scenario, is the actual problem.

“One configuration change took down production”

A timeout value gets changed in a YAML file. Deployment succeeds. Health checks pass.

timeout: 500ms  →  timeout: 5ms

Within two minutes, the entire service is failing. Every downstream call times out instantly because nothing can complete in five milliseconds.

@ConfigurationProperties with Bean Validation is the right starting point. Annotate the properties class with @Validated and add bounds (@Min and @Max on a numeric field, or Hibernate Validator’s @DurationMin and @DurationMax on a Duration), and the application won’t start if configuration violates them. A 5ms timeout fails at startup, before any traffic is served.
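The fail-fast idea itself doesn't depend on annotations. Here is a plain-Java stand-in that does the same bounds check in a constructor instead of through Bean Validation. The class name and the 100ms to 30s bounds are hypothetical; pick limits that match the downstream service you are actually calling.

```java
import java.time.Duration;

// Plain-Java stand-in for a validated properties class; name and bounds are hypothetical.
final class DownstreamTimeout {
    static final Duration MIN = Duration.ofMillis(100);
    static final Duration MAX = Duration.ofSeconds(30);

    private final Duration value;

    DownstreamTimeout(Duration value) {
        // Reject out-of-range values at construction time, before any request is served.
        if (value == null || value.compareTo(MIN) < 0 || value.compareTo(MAX) > 0) {
            throw new IllegalArgumentException(
                    "timeout must be between " + MIN + " and " + MAX + ", got " + value);
        }
        this.value = value;
    }

    Duration value() {
        return value;
    }
}
```

new DownstreamTimeout(Duration.ofMillis(500)) constructs normally, while Duration.ofMillis(5) throws immediately, which is the behaviour you want from a typo in a YAML file.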

But the deeper issue isn’t validation syntax.

Configuration changes regularly bypass the same engineering discipline applied to code. A YAML change rarely goes through code review. It rarely gets load tested in staging. It gets treated as a config tweak — not a code change — even though it can cause exactly the same damage.

The real risk is misconfiguration that passes validation but is still wrong for the actual system. No constraint annotation catches a timeout that’s numerically valid but wrong for the downstream service it’s throttling. What catches it is treating configuration with the same seriousness as code: review, staging, and explicit testing before it reaches production.

The Pattern Underneath All of This

None of these five scenarios are really about Spring Boot.

Thread pool exhaustion isn’t a Spring Boot concept. Proxy limitations aren’t specific to Spring. Distributed lock expiry is a general distributed systems problem. Configuration discipline is just good engineering.

Spring Boot is just the vocabulary.

Production thinking is the language.

Knowing the vocabulary helps you read the documentation. Knowing the language is what gets you through the incident at 2 AM — when the dashboard looks fine, the logs say nothing useful, and something is clearly wrong.

If Part 4 happens, it’ll be even less about Spring Boot — and even more about the kinds of failures that don’t show up in any tutorial.

Part of a series on system design, production engineering, and the interview questions that reveal how engineers actually think under pressure. And if you’re preparing for practical engineering interviews or trying to improve production-level thinking beyond just solving DSA problems, I’ve also been exploring platforms like PracHub.


Most Spring Boot Developers Still Can’t Answer These Interview Questions (Part 3) was originally published in Javarevisited on Medium, where people are continuing the conversation by highlighting and responding to this story.