A Use For Java
I’m as surprised as you are that I found a use for Java.
There is no shortage of criticism for Java and the JVM ecosystem in 2026. Personally, even though I’m still relatively early in my career, I’m already quite tired of Java, likely for all the same reasons you would expect: verbosity, fundamental coupling to object-oriented design, a heavyweight runtime, etc. (it’s also prone to eating up all that RAM that I spent millions on).
As someone who spends a great deal of time learning about newer, emerging languages, it’s almost impossible to get excited about writing Java. I’ll never reach for it for a personal project, and when I discover a popular modern software project that’s written in Java, I’m almost confused. “Why didn’t they use Rust? If they wanted GC, why not Go?” Or some variation thereof. And of course, that view is a bit naive. Just because a language is new & exciting, and it has all these cool features that address the shortcomings of other languages, doesn’t mean that people can’t keep enjoying the existing languages.
Just because I don’t like Java, and believe that newer languages are cooler, doesn’t mean we should all give up on Java. Nor does it mean we should frown upon new Java projects. Despite this, the question was always looming in the back of my mind: is there a genuine use case for Java that another, newer language isn’t a good fit for? More informally: “when should you reach for Java?”
The historical argument was cross-platform support. That’s more or less a solved problem today; almost all modern languages have cross-platform support without a JVM runtime. Another historical argument may have been better userspace threading support, but that fell apart relatively quickly as well. I’m sure there are things that I’m missing; I don’t have the historical context of someone who’s been doing Java for 20 years.
But in my mind, the reason one would reach for Java was that their company wrote Java. It’s almost like buying an Xbox because all your friends had an Xbox. You would code in Java because the rest of your company’s codebase was already written in Java, there was already all this investment in the ecosystem, and it was “battle-tested”. That never really felt like a good argument to me; it seemed very much like the sunk cost fallacy. The only way we would get more “battle-tested” languages would be to actually invest in the newer ones.
Recently, I took an interest in learning about dedicated batch processing and streaming architectures. I was going through the historical ‘big data’ papers, starting with MapReduce [1] and working my way up to Dataflow [2]. Along the way, while reading about Apache Spark [3] and Apache Flink [4], I started to look through the code. The Apache Software Foundation is deeply invested in the JVM; so much of its big-data ecosystem is built on it. So I wasn’t surprised to see that Flink, Spark, Beam, all of these incredible projects, were built on the JVM.
Then, the same questions started to surface again. “Why Java? If we want low latency, why use a language with a runtime? If we want high parallelism, why not choose a language with a more modern concurrency model? If we want to run heavy, CPU-intensive workloads with in-memory datasets, why use a language that is already so memory-intensive, with such high overhead?”
That isn’t to say that Java can’t do performance, or that its concurrency model is obsolete. There’s a lot you can do with GC tuning and JIT warm-up, and there are tons of great libraries and methodologies for writing highly parallel Java programs. But my major qualm is that this is all extra work: things you need to do on top of what the language gives you. By contrast, many modern languages feel like they give you these powers from the start, without much extra effort.
So, as many people like me might do, I started to think about what it would take to implement a system like Apache Spark or Apache Flink in another, newer language. Surely Rust could solve all of these problems, with so much better performance and all the nice ergonomics and superpowers that it provides. Why hadn’t someone thought of this sooner?
I ran into a problem early in the design phase. And even though it was a fundamental blocker for my initial idea, it was actually pretty exciting, because it was a problem that could only be solved with a unique combination of components that only something like the JVM could provide.
The problem
Let’s take Apache Spark as an example. At a high level, we want a distributed system of machines that all work together to solve a problem. The user structures the problem in such a way that we can distribute computations among our cluster of workers and have them all work separately, but coordinate their efforts to achieve a result much quicker. Additionally, we want this distributed system to be adaptable: to take new work from the coordinator without having to recompile/redeploy to solve the new problem. This way, the system is live; we can give it more jobs as often as we’d like, and never have to shut it down between jobs.
This would be simple to do if the types of computations being done at the worker nodes were drawn from a known, canonical set of instructions. For example, in a distributed database system, the coordinator (primary) can have replicas replicate state by sending them an arbitrary sequence of commands (a write-ahead log). The replicas understand the format of the log, and even though its contents arrive in an arbitrary order (and contain arbitrary values), there is a canonical set of high-level instructions that will never change (INSERT, ALTER TABLE, etc.). Since the user speaks a declarative language (SQL), the coordinator and replicas can do this relatively simply.
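To make the contrast concrete, here is a minimal sketch (Java 21; the command names and shapes are made up for illustration) of that canonical-instruction-set approach: the replica only ever has to understand a fixed, closed set of command types, no matter what order or values the log contains.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WalReplay {
    // A closed, canonical set of instructions: the replica knows every
    // possible command type at compile time.
    sealed interface Command permits Insert, Delete {}
    record Insert(String key, String value) implements Command {}
    record Delete(String key) implements Command {}

    public static void main(String[] args) {
        Map<String, String> store = new HashMap<>();
        // The log's order and values are arbitrary, but the instruction
        // set itself never changes.
        List<Command> log = List.of(
                new Insert("a", "1"),
                new Insert("b", "2"),
                new Delete("a"));
        for (Command c : log) {
            switch (c) {
                case Insert i -> store.put(i.key(), i.value());
                case Delete d -> store.remove(d.key());
            }
        }
        System.out.println(store); // prints {b=2}
    }
}
```

No code ever crosses the wire here, only data, which is why this design works fine in any language.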
Spark is different from that. Spark sends actual Java closures to workers. Within the closures, the user can write whatever Java code they want: imperative code. They aren’t forced to adhere to some API or canonical instruction set (other than JVM bytecode, technically). When you’re actually using Spark, you can literally provide a .jar file to the coordinator. The jar of course contains Spark API code for the transformations & actions, but within those transformations, there’s arbitrary Java code that runs on the worker nodes.
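Spark’s real machinery involves a closure cleaner and pluggable serializers, but the core JVM trick can be sketched in-process with plain JDK serialization: a lambda cast to Serializable can be flattened to bytes on the “coordinator”, then revived and invoked on the “worker”, captured state and all. (The names here are mine; in a real cluster the worker would also need the jar containing the defining class on its classpath.)

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.util.function.Function;

public class ClosureShip {
    public static void main(String[] args) throws Exception {
        int offset = 42; // captured state travels with the closure

        // The intersection cast makes this lambda serializable.
        Function<Integer, Integer> closure =
                (Function<Integer, Integer> & Serializable) x -> x + offset;

        // "Coordinator": flatten the closure object to bytes.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bos)) {
            out.writeObject(closure);
        }
        byte[] wire = bos.toByteArray();

        // "Worker": revive the closure and invoke it, no recompilation.
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(wire))) {
            @SuppressWarnings("unchecked")
            Function<Integer, Integer> received =
                    (Function<Integer, Integer>) in.readObject();
            System.out.println(received.apply(1)); // prints 43
        }
    }
}
```

The body of the lambda is arbitrary imperative code, which is exactly the property the canonical-instruction-set approach can’t give you.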
What does it actually take to allow an arbitrary user-provided jar to run across worker nodes (whose underlying capacities & architectures may be heterogeneous)? If we take just a single closure and want it to run across many machines, how do we do that without recompilation at the workers? Several things need to be in place:
- Runtime reflection, to introspect into the closure object and pick apart references/state
- A stable ABI, so that serializing/deserializing the closure object is straightforward
- The ability for workers to run arbitrary bytecode at runtime, once they’ve deserialized the closure and loaded it into memory
- An intermediate representation that is agnostic to the underlying worker’s actual native instruction set, in case we have a heterogeneous fleet
The JVM has all these things. It can use runtime reflection to introspect into closures, and can perform serialization/deserialization in a standard way. The JVM runtime is literally a bytecode interpreter with a JIT, so it can not only run arbitrary bytecode at runtime, it can even compile it to native code for better performance.
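The “run arbitrary bytecode at runtime” part boils down to `ClassLoader.defineClass`. Here is a minimal sketch (the class and method names are made up, and the network is simulated by reading a compiled class file off our own classpath; a real worker would receive these bytes over the wire):

```java
import java.io.InputStream;

public class DynamicLoad {
    // Stand-in for user code that arrives as bytecode.
    public static class Task {
        public String run() { return "ran on worker"; }
    }

    // A loader that turns raw bytes into a live Class at runtime.
    static class BytesLoader extends ClassLoader {
        Class<?> define(String name, byte[] b) {
            return defineClass(name, b, 0, b.length);
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulate "bytes off the network" by reading our own .class file.
        byte[] bytecode;
        try (InputStream in =
                DynamicLoad.class.getResourceAsStream("DynamicLoad$Task.class")) {
            bytecode = in.readAllBytes();
        }

        // "Worker": define the class from raw bytes, no recompilation,
        // then instantiate and invoke it reflectively.
        Class<?> taskClass = new BytesLoader()
                .define("DynamicLoad$Task", bytecode);
        Object task = taskClass.getDeclaredConstructor().newInstance();
        Object result = taskClass.getMethod("run").invoke(task);
        System.out.println(result); // prints "ran on worker"
    }
}
```

And once the class is defined, the JIT treats it like any other code: hot paths get compiled to native instructions.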
These are all fundamentally challenging things to do in non-JVM languages. In fact, an environment like the JVM is pretty much the only type of runtime that could support all of these things (that I know of).
Let’s rewrite it in Rust
I love Rust. I spend a lot more time writing Java, and a great deal of time in between wondering how much of the Java code I wrote would be better and faster if it was in Rust. But what’s so interesting is that in order to solve the above problem in Rust, we would essentially just be writing a JVM runtime all over again.
I spent a good deal of time trying to think about how I could solve this problem in Rust. What if we use a proc macro over the closure, i.e. pass the closure to syn to get a token stream, then run that? Not possible, unless we write a Rust interpreter. And what good would a Rust interpreter do? At that point, we might as well use Python. The same goes for spawning a subprocess: maybe we could spawn one that compiles the closure/token stream and then runs it? But that defeats the point, which is to avoid recompilation at the workers.
If we wanted to run arbitrary runtime closures in Rust (at least in the same way as Spark), we would need all of the features listed above, none of which Rust has, by design. Even if the Rust ABI were stabilized tomorrow, the others are really hard challenges to solve if you want to remain in-process! And this is true of any language that ships a static binary. It’s the unique place the JVM sits, between interpreting bytecode and running native code (via the JIT), that makes this possible at all.
There is no shortage of projects today whose whole pitch is “let’s just take this existing thing and rewrite it in Rust.” I’ll be the first to admit that I’m much more inclined to look into a new software project if it’s written in Rust. And that’s not necessarily a bad thing. Modern languages that escape the overbearing weight of object-oriented design are endlessly fascinating to read about and see in action, especially when so many projects achieve excellent performance without sacrificing the quality of the developer experience, as with Rust. But even I must admit that there is a limit. You can’t just rewrite everything in Rust.
And this isn’t to say that Rust has a fundamental shortcoming here. It’s just an example of a sharp boundary that exists between the two languages, and the types of problems that they’re good at solving. Even as a fundamental Rust optimist, I have to acknowledge that the Venn diagram of the two languages isn’t just one big circle. There are problems that make one say “maybe I should write this in Java”, even in 2026.