Java Is Fast. Your Code Might Not Be

A deep dive into common Java anti-patterns that degrade performance, demonstrating how fixing them can lead to a 5x increase in throughput.
Part 1 of 3 in the Java Performance Optimization series. Parts 2 and 3 coming soon.
I built a Java order-processing app for a talk I gave at DevNexus a couple of weeks ago. The app worked. Tests passed. I ran a load test and collected a Java Flight Recording (JFR).
Before any changes: 1,198ms elapsed time, 85,000 orders per second, peak heap sitting at just over 1GB, 19 GC pauses.
After: 239ms. 419,000 orders per second. 139MB heap. 4 GC pauses.
Same app. Same tests. Same JDK. No architectural changes. And those numbers get a lot more meaningful when you consider that code like this doesn't run on a single box in production. It runs across a fleet.
In Part 2 I'll walk through the profiling data behind those numbers: the flame graph, which methods were actually hot, and what changed when we fixed them. Before we get there, you need to know what kinds of things we were actually fixing.
The problems were patterns that show up in real codebases. They compile fine, they sneak through code review, and they're the kind of thing you could miss without profiling data telling you where to look. Here are eight of them.
TL;DR: Fixing anti-patterns like these turned a Java app that took 1,198ms into one that took 239ms. Here are some to look for and fix:
- String concatenation in loops: O(n²) copying from immutability
- O(n²) stream iteration inside loops: streaming the full list per element
- String.format() in hot paths: slowest string builder, parses format every call
- Autoboxing in hot paths: millions of throwaway wrapper objects
- Exceptions for control flow: fillInStackTrace() walks the entire call stack
- Too-broad synchronization: one lock becomes the bottleneck
- Recreating reusable objects: ObjectMapper, DateTimeFormatter, Gson per call
- Virtual thread pinning (JDK 21–23): synchronized + blocking I/O pins carriers
After fixing: 5x throughput, 87% less heap, 79% fewer GC pauses. Same app, same tests, same JDK.
1. String Concatenation in Loops
String report = "";
for (String line : logLines) {
    report = report + line + "\n";
}
This code looks good, right? The problem is what String immutability means in practice.
Every time you use +, Java creates a brand new String object, a full copy of all previous content with the new bit appended. The old one gets discarded. This happens every single iteration.
The characters being copied scale as O(n²). If you have 10,000 lines, iteration 1 copies roughly nothing, iteration 5,000 copies about 5,000 lines' worth of accumulated content, and iteration 10,000 copies all of it. Add those up and you get on the order of n²/2 line copies in total. BellSoft ran JMH benchmarks on exactly this and showed that when n grows by 4x, the loop-concatenation version slows down by more than 7x, much worse than linear growth.
The fix:
StringBuilder sb = new StringBuilder();
for (String line : logLines) {
    sb.append(line).append("\n");
}
String report = sb.toString();
StringBuilder works off a single mutable character buffer. One buffer, resized only occasionally as it grows. Every append writes into that buffer. One toString() at the end.
Note: Since JDK 9, the compiler is smart enough to optimize "Order: " + id + " total: " + amount on a single line. But that optimization doesn't carry into loops. Inside a loop, each iteration still runs its own concatenation and throws away the intermediate result, so you still pay for a fresh copy every time. You have to declare a StringBuilder before the loop yourself, like the fix above shows.
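To put rough numbers on the copying cost, here is a minimal sketch that counts characters instead of timing anything. The class and method names are made up for illustration, and the model is deliberately simplified: it assumes each loop concatenation copies the whole accumulated string once, and it ignores the occasional buffer resize inside StringBuilder.

```java
import java.util.Collections;
import java.util.List;

public class ConcatCost {

    // Characters copied if every iteration builds a brand-new String
    // holding everything accumulated so far (simplified model).
    public static long charsCopiedByConcat(List<String> lines) {
        long copied = 0;
        long accumulated = 0;
        for (String line : lines) {
            accumulated += line.length() + 1; // +1 for the "\n"
            copied += accumulated;            // the whole result is copied again
        }
        return copied;
    }

    // Characters written if each line is appended once to a shared buffer.
    // Ignores buffer resizes, which are amortized linear.
    public static long charsAppendedByBuilder(List<String> lines) {
        long appended = 0;
        for (String line : lines) {
            appended += line.length() + 1;
        }
        return appended;
    }

    public static void main(String[] args) {
        List<String> lines = Collections.nCopies(10_000, "123456789");
        System.out.println("concat:  " + charsCopiedByConcat(lines));
        System.out.println("builder: " + charsAppendedByBuilder(lines));
    }
}
```

For 10,000 nine-character lines, this model gives about 500 million characters copied for the loop concatenation against 100,000 for the builder. It is a counting exercise, not a benchmark, so treat it as intuition for the growth rate only.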
2. Accidental O(n²) with Streams Inside Loops
for (Order order : orders) {
    int hour = order.timestamp().atZone(ZoneId.systemDefault()).getHour();
    long countForHour = orders.stream()
        .filter(o -> o.timestamp().atZone(ZoneId.systemDefault()).getHour() == hour)
        .count();
    ordersByHour.put(hour, countForHour);
}
This looks reasonable. You're grouping orders by hour. But look at what's happening: for each order, you're streaming over the entire list to count how many orders share that hour. If you have 10,000 orders, that's 10,000 iterations times 10,000 stream elements. That's 100 million comparisons for what should be a single pass.
In my demo app, this exact pattern was the single largest CPU hotspot. It accounted for nearly 71% of CPU stack samples in the JFR recording.
The fix:
for (Order order : orders) {
    int hour = order.timestamp().atZone(ZoneId.systemDefault()).getHour();
    ordersByHour.merge(hour, 1L, Long::sum);
}
One pass. O(n). Each order increments its hour's count directly. You could also use Collectors.groupingBy(... Collectors.counting()) to do it in a single stream pipeline, but the merge approach is clear and avoids the overhead of creating a stream at all.
If you see a .stream() call inside a loop body, that's a signal to pause and check whether you're doing redundant work.
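For reference, the groupingBy alternative mentioned above could look roughly like this. It is a sketch under assumptions: the Order record here is a hypothetical stand-in with an Instant timestamp (the demo app's real type isn't shown), the class name is invented, and the zone is passed in rather than read from ZoneId.systemDefault() so the result is reproducible. Records need Java 16 or later.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.ZoneOffset;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class OrdersByHour {

    // Hypothetical stand-in for the article's Order type.
    public record Order(String id, Instant timestamp) {}

    // Single pass over the list: group by hour of day, count each group.
    public static Map<Integer, Long> countByHour(List<Order> orders, ZoneId zone) {
        return orders.stream()
                .collect(Collectors.groupingBy(
                        o -> o.timestamp().atZone(zone).getHour(),
                        Collectors.counting()));
    }

    public static void main(String[] args) {
        List<Order> orders = List.of(
                new Order("a", Instant.parse("2024-01-01T10:15:00Z")),
                new Order("b", Instant.parse("2024-01-01T10:45:00Z")),
                new Order("c", Instant.parse("2024-01-01T11:05:00Z")));
        System.out.println(countByHour(orders, ZoneOffset.UTC));
    }
}
```

The stream is created once, outside any loop, which is the property that matters here.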
3. String.format() in Hot Paths
public String buildOrderSummary(String orderId, String customer, double amount) {
    return String.format("Order %s for %s: $%.2f", orderId, customer, amount);
}
String.format() tends to get recommended as the clean, readable way to build strings. Yep, it's readable, and it's also the slowest string-building option in Java when you're calling it frequently.
Baeldung ran JMH benchmarks across every string concatenation approach in Java. String.format() came in last in every category. It has to parse the format string on every call (with regex-based token matching on older JDKs) and dispatch through the full java.util.Formatter machinery. StringBuilder was consistently the fastest.
The fix:
return "Order " + orderId + " for " + customer + ": $" + String.format("%.2f", amount);
Use String.format() for the numeric formatting where you need it, and let the compiler optimize the rest. Or just use a StringBuilder if you need full control.
String.format() is fine for config loading, startup code, error messages, anywhere that runs infrequently. Move it out of anything your profiler says is hot.
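If you do go the StringBuilder route, one possible shape is sketched below. This is not the demo app's code: the class name is invented, and swapping String.format("%.2f", ...) for BigDecimal is my assumption about one way to get two decimal places without the Formatter. It always uses '.' as the separator, ignores locale rules, and throws on NaN or infinite amounts, so measure it against your own workload before adopting it.

```java
import java.math.BigDecimal;
import java.math.RoundingMode;

public class OrderSummary {

    public static String buildOrderSummary(String orderId, String customer, double amount) {
        // Assumption: plain two-decimal output, '.' separator, half-up rounding.
        String formattedAmount = BigDecimal.valueOf(amount)
                .setScale(2, RoundingMode.HALF_UP)
                .toPlainString();

        // Pre-size the buffer so typical summaries fit without a resize.
        return new StringBuilder(32 + orderId.length() + customer.length())
                .append("Order ").append(orderId)
                .append(" for ").append(customer)
                .append(": $").append(formattedAmount)
                .toString();
    }

    public static void main(String[] args) {
        System.out.println(buildOrderSummary("A1", "Ada", 12.5));
    }
}
```

Whether this beats the one-line concatenation above in a given app is a question for JMH or a profiler, not something to assume.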
4. Autoboxing in Hot Paths
Long sum = 0L;
for (Long value : values) {
    sum += value;
}
What's actually happening at the JVM level:
Long sum = Long.valueOf(0L);
for (Long value : values) {
    sum = Long.valueOf(sum.longValue() + value.longValue());
}
Each iteration unboxes sum to get a long, adds, then boxes the result back into a new Long object. With a million elements, you've created close to a million Long objects that the GC has to clean up (only small values are served from the Long cache). Each Long on a 64-bit JVM takes roughly 16 to 24 bytes on the heap, depending on the object header layout. That's at least 16MB of heap churn for what should be a simple addition loop.
The fix:
long sum = 0L; // primitive, not the wrapper
for (long value : values) {
    sum += value;
}
Where this tends to sneak in: aggregation and processing loops. Summing metrics, accumulating counters, building stats. Boxed types creep in because someone used Long in a collection signature somewhere upstream and nobody thought about what it costs downstream in the loop. That can be legitimately easy to miss.
Watch for Integer, Long, or Double used as local loop variables or accumulators. Also watch for List<Long> and Map<String, Integer> in frequently-called code. Every .get() and .put() involves a box/unbox round trip that you're paying for silently.
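As a small illustration of keeping the accumulator primitive, here is a sketch with invented class and method names. The first method shows the minimum change when you are stuck with a List<Long> from upstream: each element is still unboxed on read, but nothing new is boxed per iteration. The second shows the fully primitive version, assuming you control the data and can hold it in a long[].

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.LongStream;

public class PrimitiveSum {

    // Boxed input, primitive accumulator: unboxes each element once,
    // allocates no new Long objects inside the loop.
    public static long sumList(List<Long> values) {
        long sum = 0L;
        for (long value : values) {
            sum += value;
        }
        return sum;
    }

    // Fully primitive: no wrapper objects anywhere.
    public static long sumArray(long[] values) {
        long sum = 0L;
        for (long value : values) {
            sum += value;
        }
        return sum;
    }

    public static void main(String[] args) {
        long[] array = LongStream.rangeClosed(1, 1_000).toArray();
        List<Long> list = LongStream.rangeClosed(1, 1_000)
                .boxed()
                .collect(Collectors.toList());
        System.out.println(sumArray(array));
        System.out.println(sumList(list));
    }
}
```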
5. Exceptions for Control Flow
public int parseOrDefault(String value, int defaultValue) {
    try {
        return Integer.parseInt(value);
    } catch (NumberFormatException e) {
        return defaultValue;
    }
}
If this method is called in a tight loop with a meaningful percentage of non-numeric inputs, you have a performance problem that might not look like one.
The expensive part is Throwable.fillInStackTrace(), which runs inside the Throwable constructor every time an exception is created. It walks the entire call stack via a native method and materializes it into StackTraceElement objects. The deeper your call stack, the more expensive this is. Imagine a situation in a framework like Spring where this can get very deep. Norman Maurer from the Netty pro
Source: Hacker News










