Sunday, September 01, 2024

Jacobin at the 3-year Mark

A Brief Look at the History

Jacobin, the project to build a JVM in go, has just reached its third anniversary. This is a good time to reflect a bit on the project. Three years ago, I started work on Jacobin believing that I could have a JVM running Java 11 code seamlessly in 18-24 months. I am a bit chastened by how much I underestimated both the timeline and the difficulty! (In addition, Jacobin has since advanced to supporting Java 17.)

Initially, like most JVM projects, I believe, I wrote the parser for class files. Because Oracle's JVM spec is clear and highly detailed about the anatomy of class files, this work moved along quickly. Then came the interpreter, which interprets and runs the bytecodes.
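To give a flavor of why the class-file work moves quickly, here is a minimal sketch in go of reading the fixed header that starts every class file (magic number, minor version, major version, all big-endian). The type and function names (`classHeader`, `parseClassHeader`) are hypothetical and this is not Jacobin's actual parser; the values themselves come from the class-file format, where Java 17 classes carry major version 61.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// classHeader holds the first three fields of a class file:
// magic, minor_version, and major_version.
type classHeader struct {
	Magic uint32
	Minor uint16
	Major uint16
}

// parseClassHeader reads the fixed-size header that begins every class file.
// All multi-byte values in a class file are stored big-endian.
func parseClassHeader(b []byte) (classHeader, error) {
	if len(b) < 8 {
		return classHeader{}, errors.New("class file too short")
	}
	h := classHeader{
		Magic: binary.BigEndian.Uint32(b[0:4]),
		Minor: binary.BigEndian.Uint16(b[4:6]),
		Major: binary.BigEndian.Uint16(b[6:8]),
	}
	if h.Magic != 0xCAFEBABE {
		return classHeader{}, fmt.Errorf("bad magic number 0x%X", h.Magic)
	}
	return h, nil
}

func main() {
	// Header bytes of a class compiled for Java 17 (major version 0x3D = 61).
	data := []byte{0xCA, 0xFE, 0xBA, 0xBE, 0x00, 0x00, 0x00, 0x3D}
	h, err := parseClassHeader(data)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("major=%d minor=%d\n", h.Major, h.Minor) // major=61 minor=0
}
```

The rest of the parser (constant pool, fields, methods, attributes) follows the same pattern of reading length-prefixed, big-endian structures in the order the spec lays them out.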

This required a huge amount of code to get all the basics in place: not only the interpreter itself, but also the classloaders, method tables, frames and the frame stack, and many other small items--all of which had to be written accurately and integrated correctly with the other bits of the system. 
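At its core, the interpreter is a fetch-decode-execute loop running over a frame that holds the method's bytecode, a program counter, and an operand stack. The sketch below is a deliberately tiny, hypothetical version of that idea (the names `frame` and `run` are ours, not Jacobin's) that handles only a few int opcodes; the opcode values are the ones defined in the JVM spec.

```go
package main

import "fmt"

// Opcode values as defined in the JVM specification.
const (
	opIconst1 = 0x04
	opIconst2 = 0x05
	opBipush  = 0x10
	opIadd    = 0x60
	opImul    = 0x68
	opIreturn = 0xAC
)

// frame is a much-simplified stack frame: the method's bytecode,
// a program counter, and an operand stack of ints.
type frame struct {
	code []byte
	pc   int
	ops  []int32
}

func (f *frame) push(v int32) { f.ops = append(f.ops, v) }

// pop removes and returns the top of the operand stack.
// (Stack-underflow checks are omitted in this sketch.)
func (f *frame) pop() int32 {
	v := f.ops[len(f.ops)-1]
	f.ops = f.ops[:len(f.ops)-1]
	return v
}

// run is a fetch-decode-execute loop over a handful of int opcodes.
func run(f *frame) (int32, error) {
	for f.pc < len(f.code) {
		op := f.code[f.pc]
		f.pc++
		switch op {
		case opIconst1:
			f.push(1)
		case opIconst2:
			f.push(2)
		case opBipush:
			// The operand is a signed byte that follows the opcode.
			f.push(int32(int8(f.code[f.pc])))
			f.pc++
		case opIadd:
			b, a := f.pop(), f.pop()
			f.push(a + b)
		case opImul:
			b, a := f.pop(), f.pop()
			f.push(a * b)
		case opIreturn:
			return f.pop(), nil
		default:
			return 0, fmt.Errorf("unimplemented opcode 0x%02X at pc %d", op, f.pc-1)
		}
	}
	return 0, fmt.Errorf("fell off the end of the method")
}

func main() {
	// Bytecode equivalent of: return (1 + 2) * 10;
	code := []byte{opIconst1, opIconst2, opIadd, opBipush, 10, opImul, opIreturn}
	result, err := run(&frame{code: code})
	fmt.Println(result, err) // 30 <nil>
}
```

A real JVM adds local variables, a frame stack for method calls, and the remaining 200-odd opcodes, which is where the "huge amount of code" comes from.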

The interpreter is almost finished. We've implemented 202.5 of the 204 byte codes. INVOKEINTERFACE explains the decimal fraction--it's almost complete. The final remaining byte code is INVOKEDYNAMIC, which, we expect, will not be finished before year-end. Maybe longer, alas. It's a beast!

My principal collaborator, Richard Elkins, has made two enormous contributions (and many, many smaller ones). The first is that he wrote (and keeps extending, dang it!) a suite of end-to-end integration tests that give Jacobin a heavy workout. These are the kinds of tests that throw and catch an exception 100 times, or that throw, catch, and rethrow, then catch and rethrow again a dozen layers deep, to uncover seams or not-quite-correct details. These tests (along with 700+ unit tests) have pushed us to refine the implementation and make sure everything works correctly.

That is, except in one area where we have struggled greatly and where Richard has been working steadfastly for a long time. This is an issue I entirely failed to anticipate: how much of the standard JDK libraries are written in native code--that is, not in Java.

At first, we chose to reimplement those methods in go. But that proved unsatisfactory for several reasons, most especially that one small native method could lead into a warren of rabbit holes, each with its own set of native functions, all of which were needed just to make the one original method work.
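One plausible shape for the "reimplement in go" approach is a table that maps each native method (class, name, and JVM type descriptor) to a go function. The sketch below is hypothetical (`nativeFunc`, `natives`, and `callNative` are illustrative names, not Jacobin's code); the two entries correspond to methods that are declared `native` in the JDK. Every method missing from the table fails at call time, which is exactly how the rabbit holes show up.

```go
package main

import (
	"fmt"
	"math"
	"time"
)

// nativeFunc is a hypothetical signature for a go stand-in for a JDK
// native method: it receives the Java arguments and returns the result.
type nativeFunc func(args []interface{}) interface{}

// natives maps "class.method" plus the JVM method descriptor to its go
// implementation.
var natives = map[string]nativeFunc{
	"java/lang/System.currentTimeMillis()J": func(args []interface{}) interface{} {
		return time.Now().UnixMilli() // J = long, so return int64
	},
	"java/lang/Float.floatToRawIntBits(F)I": func(args []interface{}) interface{} {
		return int32(math.Float32bits(args[0].(float32))) // I = int
	},
}

// callNative looks up a native method by class, name, and descriptor
// and invokes it, or reports that no go implementation exists yet.
func callNative(class, method, desc string, args []interface{}) (interface{}, error) {
	key := class + "." + method + desc
	fn, ok := natives[key]
	if !ok {
		return nil, fmt.Errorf("native method not implemented: %s", key)
	}
	return fn(args), nil
}

func main() {
	v, err := callNative("java/lang/Float", "floatToRawIntBits", "(F)I", []interface{}{float32(1.0)})
	fmt.Println(v, err) // 1065353216 <nil>

	// Object.hashCode is also native in the JDK, but has no entry here.
	_, err = callNative("java/lang/Object", "hashCode", "()I", nil)
	fmt.Println(err)
}
```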

Recently, we've begun experimenting with using the PureGo project to bridge from Jacobin to the native function libraries in the JDK. If we can get it to work suitably for our needs, it would remove the pressure to reimplement the core Java libraries in go.

The other unexpected problem (which we have solved) was static initialization blocks. They are a feature of Java that is almost never used by developers, but which is extensively employed by the JVM and the JDK libraries. Unfortunately, the JVM spec barely mentions static initialization blocks and we had to figure this out somewhat blindly until we were able to work out the details and get the blocks running as expected by the JDK.

Our goal now is to finish the last of the bytecodes, address the open seams revealed by Richard's test suite and then run the Computer Language Benchmarks. Once those are running, we'll begin inviting folks who follow the project to do early testing. (If you want to be among those early testers, give us a star on GitHub and open a ticket. Anyone who's opened a ticket with us is automatically enrolled in the alpha and beta programs.)

What's the Motivation for all this Effort?

Initially, I started this project because I thought it would be great to expand my knowledge of Java by understanding the JVM better. It was an educational passion project. But as the years have gone by, a new and more useful mission has emerged: Jacobin is the only Java 17-capable JVM written entirely in one language. This means that you can download the source code, compile it, and run it in your IDE and see Java instructions executing one at a time. That's really quite cool!

We've decided to explore making an elegant UI as a viewport into Jacobin, so that if you want to know exactly how your Java program executes, you can observe the whole thing at the level of detail you want. 

In the official JDKs (those based on OpenJDK), this is very difficult for the typical developer. Those JDKs are written in multiple languages and the code is difficult to follow. While heavily commented in places, those comments are aimed at JDK experts who have full contextual information. (This is not a critique, I should add. With 100+ developers working on the JDK, comments are necessarily oriented towards developers with the requisite knowledge and background.)

Jacobin, by contrast, is written in one language, heavily commented and, we believe, approachable by the average developer. The trade-off will be speed and a few missing features, mostly detailed on the project's GitHub status page.

That page also shows the overall project status. 

The Last Six Months

Jacobin 0.5.0, which first saw life on 28 Feb of this year, added all missing bytecodes other than INVOKEINTERFACE and INVOKEDYNAMIC, implemented the full exception throw and catch mechanisms, interned Java strings, added the Java file I/O libraries, and began a deep exploration of PureGo for calling native methods.
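A key piece of the catch mechanism is the exception table that each method's Code attribute carries: when an exception is thrown at some pc, the JVM scans the table in order and jumps to the first handler whose range covers that pc and whose catch type matches the exception's class or one of its superclasses. If nothing matches, the frame is popped and the search continues in the caller. The go sketch below shows only the per-method lookup; `exceptionEntry`, `superOf`, and `findHandler` are hypothetical names, and the string-keyed class hierarchy stands in for real class objects.

```go
package main

import "fmt"

// exceptionEntry mirrors one row of a method's exception table:
// the handler covers pc values in [startPC, endPC).
type exceptionEntry struct {
	startPC, endPC, handlerPC int
	catchType                 string // "" means catch-all (as used for finally)
}

// superOf is a hypothetical stand-in for the class hierarchy.
var superOf = map[string]string{
	"java/lang/ArithmeticException": "java/lang/RuntimeException",
	"java/lang/RuntimeException":    "java/lang/Exception",
	"java/lang/Exception":           "java/lang/Throwable",
	"java/lang/Error":               "java/lang/Throwable",
}

// isSubclass walks up the hierarchy from cls looking for target.
func isSubclass(cls, target string) bool {
	for c := cls; c != ""; c = superOf[c] {
		if c == target {
			return true
		}
	}
	return false
}

// findHandler returns the handler pc for an exception thrown at pc, or -1
// if this method has no matching handler and the frame must be unwound.
func findHandler(table []exceptionEntry, pc int, exClass string) int {
	for _, e := range table {
		if pc >= e.startPC && pc < e.endPC &&
			(e.catchType == "" || isSubclass(exClass, e.catchType)) {
			return e.handlerPC
		}
	}
	return -1
}

func main() {
	// A simplified table: two typed handlers and a catch-all, all covering pc 0..9.
	table := []exceptionEntry{
		{startPC: 0, endPC: 10, handlerPC: 20, catchType: "java/lang/ArithmeticException"},
		{startPC: 0, endPC: 10, handlerPC: 25, catchType: "java/lang/Exception"},
		{startPC: 0, endPC: 10, handlerPC: 30, catchType: ""},
	}
	fmt.Println(findHandler(table, 5, "java/lang/ArithmeticException"))  // 20
	fmt.Println(findHandler(table, 5, "java/lang/RuntimeException"))     // 25
	fmt.Println(findHandler(table, 5, "java/lang/Error"))                // 30
	fmt.Println(findHandler(table, 12, "java/lang/ArithmeticException")) // -1
}
```

Because the table is searched in order, the more specific handlers must come first, which is how javac lays them out.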

Testing

As we've discussed many times in these updates, we're close to fanatical about testing. We currently rely on 712 unit tests and 155 end-to-end integration tests, for a total of 867 tests (up from 708 six months ago).

By the Numbers

Jacobin consists of 54,156 lines (including code, comments, and blank lines). The Jacotest suite consists of 28,486 lines, for a total project size of 82,642 lines. Of those, 25,588 lines are production code and 57,054 are testing code, which means our testing code is 223% of the size of the production code. This ratio is down from six months ago, when it was over 300%. The decrease is due to the large number of native functions we've translated into go but have not yet tested extensively, because we are still looking for the definitive approach to integrating those functions with Jacobin (as touched on briefly above). Once that's worked out, those functions will get our usual heavy testing.

If you'd like to show your support for Jacobin JVM, we'd love a ⭐ on GitHub. That helps keep our motivation high! If you want more frequent updates, please follow us on Twitter (@jacobin_jvm).



Monday, February 26, 2024

Jacobin JVM at 30 Months

This month, the Jacobin JVM project reaches the 30-month milestone, with release 0.4.0. Because Richard Elkins (@texadactyl) and I have been working together daily on features for the last six months, we've made good progress. Our goal is to have Jacobin run a standard set of benchmarks before year-end. After that, we'll begin to ask for volunteers to test Jacobin with their code. As ever, the larger goal is to deliver a more-than-minimal JVM written entirely in a single language (go).

To be honest, Jacobin is already much more than minimal, but we want to get it closer to feature parity with the HotSpot JVM, which is the JVM that ships in OpenJDK. During the past six months, we've added:

* exception handling, covering both caught and uncaught exceptions and errors. For uncaught errors, we try to provide somewhat more detail about the exception than the HotSpot JVM does. However, for users who prefer HotSpot's exact wording, we provide the -strictJDK command-line option, which reproduces HotSpot's messages word for word.

* improved diagnostic data in trace logs. Prior to this release, our trace logs focused on the bytecode instructions, showing the class, method, bytecode, and the top of the operand stack (TOS). We now print the entire operand stack with each bytecode instruction, so that we can watch data items move up and down the stack as they are pushed and popped. While this generates huge trace listings, it gives us a step-by-step record of how classes execute.

* handling methods with a variable number of arguments

* static initializer blocks. Initializer blocks are rarely used by developers, but crucial to the operation of the JVM. At the language level, a static initializer is a block of code in a class body marked static { ...code here... }. Such blocks are most often used to initialize static variables. They are executed when the class is initialized, before any of its other code runs, even before a constructor. Inside the JVM, they appear whenever classes use static variables, which means frequently. And they can entail complex chain reactions in which they need to load other classes and run those classes' static initializer blocks.

* revised architecture. One of the confounding aspects of working on a system with so many discrete subsystems that must all interoperate in a carefully choreographed process is that it's difficult to anticipate the exact shape and interfaces a subsystem must have when it's first designed. In part, that's because we generally cannot implement all the features right away--only the essential ones. Gradually, as Jacobin moves forward, earlier decisions to not include certain lesser-used features need to be revised. In this release, we revised how we look up methods and how we handle static variables. In both cases, we simplified existing code. 

Hacker News

Jacobin JVM made the front page of Hacker News. That post, by Ye Lin Aug, generated 184 interesting comments. We appreciated this unexpected coverage and did our best to answer the many questions.

What's next

In the next six-month sprint, we are hopeful that we can:

* implement all remaining bytecodes except INVOKEDYNAMIC, which will surely take us longer to complete

* implement java.lang.Class: there are several Java classes that are so dependent on the JVM's design that every JVM needs to implement them by hand. These include classes for threads, debugging classes, and, of course, java.lang.Class...among others.

* add file I/O libraries (it might seem odd to see this here, but the JDK's file I/O libraries rely on native functions, which we need to implement in go. This will primarily be done via the Facade design pattern, but some additional coding will likely be required.)

* expand our work on handling JAR files. Jacobin already handles JAR files, but we want to make sure that code is robust enough to handle all details and forms of JAR files, so that execution never fails.

All of this in preparation for running benchmark suites and, eventually, soliciting alpha testers.

In the above text, I've referred to this milestone as a "release." The term is misleading. We're not creating a release, but just marking the code at this 2.5-year anniversary as v. 0.4.0. As discussed on the GitHub project site, we don't yet recommend you try Jacobin. However, by the end of the next sprint, we hope to start inviting folks to give it a try.

Testing

As discussed in previous posts, we're deeply committed to testing. Jacobin's test suites currently run a total of 708 tests, which include 597 unit tests and 111 integration tests. We'll be boosting these numbers significantly in preparation for inviting alpha testers.

Jacobin by the Numbers

At present, Jacobin consists of a production codebase of 15,814 lines (includes code, comments, and blank lines). The testing code consists of 24,178 lines plus 26,874 lines in the Jacotest test suite. This gives 50,912 lines of tests, which is 3.22x the size of the production code. Our eventual goal is a significantly greater multiple. 

If you'd like to show your support for Jacobin JVM, we'd love a ⭐ on GitHub. That helps keep our motivation high! If you want more frequent updates, please follow us on Twitter (@jacobin_jvm).