I Watched a Microsoft Java Champion Talk About K8s Efficiency. Here is What I Learned.

Why 60% of Java workloads on Kubernetes are wasting money and the 4 lessons from Microsoft & Akamas to fix it.

Java is the backbone of enterprise software, and Kubernetes is the operating system of the cloud. It would be logical to assume that by 2026, these two giants would work perfectly together out of the box. Yet, after attending a recent deep-dive session between Bruno Borges (Principal Product Manager at Microsoft and Java Champion) and Stefano Doni (CTO at Akamas), I realized that this assumption is costing companies a fortune in wasted resources and performance issues.

The data they presented, drawn from telemetry across thousands of production JVMs, was sobering: the vast majority of Java workloads on Kubernetes run with default configurations that actively damage the applications they serve.

Here are the four critical lessons I took away from the session.

Lesson 1: “Container-Aware” doesn’t mean “Container-Optimized”

The first major misconception concerns memory configuration. We are often led to believe that modern Java versions are “container-aware”, implying that the JVM sees the container limits and configures itself optimally. In reality, the JVM is aware of the limits but falls back on conservative defaults: if no explicit heap size is set, it typically uses only 25% of the memory limit for the heap.

This default leads to a huge waste of resources: give a container 1 GB of RAM and the JVM allocates only 256 MB of heap. And the scenario is common, since the majority of surveyed JVMs run with no more than 1 GB of RAM and 1 CPU.

“Yes, if you don’t set the heap size, you are wasting resources, and as we saw in the data, most containers have at least 1 GB of RAM and with 1 GB of RAM, you’re allocating 256 MB of heap size.” — Bruno Borges

This conservative default is a legacy of an era when the JVM was designed to share server resources and avoid consuming all available memory. However, in a containerized environment where resources are guaranteed, this conservative behavior prevents the JVM from maximizing resource utilization.
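
A quick way to see this in your own pods is to ask the JVM what it actually chose. Below is a minimal sketch (the class name HeapCheck is just for illustration): run it inside a container with a 1 GB limit and no explicit heap settings, and it will typically report a maximum heap of roughly 256 MB. A common remedy, assuming you know how much memory your non-heap components need, is to raise the percentage with -XX:MaxRAMPercentage or set an explicit -Xmx.

```java
// Minimal sketch: report the heap size the JVM chose inside this container.
public class HeapCheck {
    public static void main(String[] args) {
        long maxHeapMiB = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Max heap the JVM will use: " + maxHeapMiB + " MiB");
        // With a 1 GiB container limit and no -Xmx / -XX:MaxRAMPercentage set,
        // this typically prints roughly 256 MiB (the ~25% default).
        // Illustrative way to claim more of the limit for the heap:
        //   java -XX:MaxRAMPercentage=75 HeapCheck
    }
}
```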

Lesson 2: “Micro-Containers” are a latency trap

The trend of using “Micro-Containers”, often assigning less than one CPU, creates a fundamental architectural conflict for Java. Java is a highly multi-threaded engine, relying on background threads for core tasks like Garbage Collection (GC) and the Just-In-Time (JIT) compiler.

Think of the Kubernetes CPU limit not as a speed limit but as a time quota. Give a container half a CPU (500m) and it gets 50 milliseconds of runtime in every 100-millisecond period. Java, however, relies on JIT compilation, and a burst of compilation can consume that quota almost instantly.

If the JVM is not given enough CPU, the quota can be consumed almost entirely by GC and JIT activity, leaving the business code with no time to run. The result is an application that freezes and shows high latency, which developers often mistake for heavy user load.

“You think it’s because of too many users are hitting the web service when in reality it’s because the garbage collector is working and that’s where CPU throttling is happening.” — Bruno Borges
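
The mismatch is easy to observe from inside the JVM. The sketch below (class name CpuCheck is illustrative) prints the processor count the JVM believes it has; with a 500m limit it generally rounds up and reports 1, even though the cgroup quota only grants about 50 ms of CPU time per 100 ms period, a budget that GC and JIT threads can exhaust on their own.

```java
// Minimal sketch: compare what the JVM "sees" with the quota the container actually gets.
public class CpuCheck {
    public static void main(String[] args) {
        int cpus = Runtime.getRuntime().availableProcessors();
        System.out.println("Processors visible to the JVM: " + cpus);
        // With limits.cpu = 500m the JVM typically rounds up and reports 1 processor,
        // but the CFS quota still only allows ~50 ms of runtime per 100 ms period.
        // GC and JIT threads draw from that same budget, so bursts of compilation
        // or collection can trigger throttling before business code even runs.
    }
}
```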

Lesson 3: Scaling out can make things worse

Adding more pods (horizontal scaling) can actually worsen performance if each pod is too small. Every replica still needs CPU for the JVM’s internal management threads, such as the GC and the JIT compiler, and in a small container those threads fight the business logic for the limited quota. Starved of CPU, each JVM ends up spending most of its time on its own internal work rather than on business work.

The data presented in the webinar showed that a strategy of using small containers and many replicas can actually result in resource waste.

“I would say that’s okay, but do consider if running slightly bigger containers with slightly fewer instances will give you the same amount of resource utilization. You’re still using the same amount of CPU and memory, but will give you much better performance because now the JVM has room to breathe. The JVM has room to operate at a more stable level.” — Bruno Borges

For most JVM workloads, the recommended strategy is therefore to slightly increase the CPU of existing replicas rather than drastically increase the replica count.
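
The arithmetic behind that advice is worth making explicit. The sketch below uses purely illustrative numbers to compare two shapes of the same total budget: four pods at 500m each versus two pods at one full CPU each. The total CPU spend is identical, but the quota available to each JVM doubles, which is exactly the “room to breathe” Borges describes.

```java
// Minimal sketch: same total CPU budget, two different pod shapes (illustrative numbers).
public class PodShapes {
    record Shape(String name, int replicas, double cpuPerPod) {
        double totalCpu() { return replicas * cpuPerPod; }
        double quotaMsPer100ms() { return cpuPerPod * 100; }
    }

    public static void main(String[] args) {
        Shape many = new Shape("many small pods", 4, 0.5);
        Shape few  = new Shape("fewer bigger pods", 2, 1.0);
        for (Shape s : new Shape[] { many, few }) {
            System.out.printf("%s: total %.1f CPU, %.0f ms of CPU per 100 ms per pod%n",
                    s.name(), s.totalCpu(), s.quotaMsPer100ms());
        }
        // Both shapes consume 2.0 CPUs in total, but each bigger pod gets twice the
        // quota per scheduling period, so GC/JIT work is less likely to starve requests.
    }
}
```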

Lesson 4: The “silent” GC switch

A significant trap lies in Garbage Collector (GC) ergonomics. The JVM automatically selects its default GC algorithm based on the amount of memory and CPU it detects. If developers test their applications in generous staging environments, the JVM is likely to use a high-performance GC, like G1GC. However, if that same code is deployed to small, “cost-optimized” production containers with insufficient CPU or memory, the JVM may silently downgrade to the single-threaded SerialGC.

SerialGC is rarely configured explicitly by developers. While it can be fine for many applications, its single-threaded, “stop-the-world” collections often lead to severe tail latency. The core issue is the unexpected switch: the runtime you tested is not the runtime you ship.

“The default depends on how much memory and how much CPU you have, which again if it’s not enough CPU or memory you actually default to Serial GC, which can be great for most applications.” — Bruno Borges

This silent GC downgrade prevents developers from comparing performance against a consistent baseline.
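
A low-effort guard is to make the selected collector visible and, ideally, pin it. The sketch below (class name GcCheck is illustrative) uses the standard GarbageCollectorMXBean API to log which collectors the running JVM actually picked, so a silent fall-back to SerialGC in a small production container shows up in the logs; setting the collector explicitly, for example with -XX:+UseG1GC, keeps staging and production on the same algorithm regardless of container size.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Minimal sketch: log which garbage collectors this JVM actually selected,
// so an ergonomic fall-back to Serial GC doesn't go unnoticed.
public class GcCheck {
    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("Active collector: " + gc.getName());
        }
        // G1 typically reports "G1 Young Generation" / "G1 Old Generation",
        // while Serial GC reports "Copy" / "MarkSweepCompact".
        // Pinning the choice (e.g. -XX:+UseG1GC) removes the environment-dependent switch.
    }
}
```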

Stop Guessing

The conclusion of the discussion wasn’t to suggest a single magic flag. The point is that manual tuning at scale has become impossible. You cannot solve the three-variable equation of throughput, latency, and cost by guessing.

This is where Akamas Insights comes into play. Instead of manually auditing YAML files, Akamas connects to your observability stack and runs a comprehensive, automated health check. It tells you immediately where you are wasting memory due to the “25% trap”, where CPU throttling is killing your latency, and how to right-size your containers to stop flying blind.

