Binstock on Software

Jacobin JVM after 3½ years of Development

On February 28, we reached the 3½-year point in the construction of Jacobin, the JVM written entirely in Go. As usual, a six-month anniversary prompts us to update the project status and publish the latest numbers about the code base.

What we just did

The past six months were spent primarily cleaning up the code base, including rewrites of two significant subsystems: the interpreter and the logging/tracing framework. No one wants to spend project time on rewrites, but Richard Elkins and I concluded that although the designs of the two subsystems were good enough for earlier versions of Jacobin, they were not robust enough to carry us into the next phase of the project. This was in large part due to the organic way in which Jacobin has grown. As we understand the innards of the JVM more deeply, we recognize that minor decisions made a while ago now impose constraints on us that would have been difficult to foresee at the time.

For example, our original interpreter was implemented as a huge switch statement with 203 cases (one for each bytecode). This created a single mega-sized function of more than 3,000 lines. Our tools had a hard time parsing such a large function and supporting refactoring and other activities. We converted the huge switch into a dispatch table of function pointers to 203 different functions, one per bytecode. Now, each discrete bytecode function is easily understood by our tools. This design is also more efficient: the functions for the most frequently executed bytecodes remain in cache, while the never-executed bytecodes remain outside of the cache. This frees up cache for other code.
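To give a rough idea of the shape of this design, here is a minimal sketch of a dispatch-table interpreter in Go. It is not Jacobin's actual code: the `frame` type, the function names, and the convention that each function returns the amount by which to advance the program counter are all assumptions made for illustration. Only the four opcode values come from the JVM specification.

```go
package main

import (
	"errors"
	"fmt"
)

// Opcode values as defined in the JVM specification.
const (
	ICONST_1 = 0x04
	ICONST_2 = 0x05
	IADD     = 0x60
	IRETURN  = 0xAC
)

// frame is a hypothetical, stripped-down stack frame (illustration only).
type frame struct {
	code  []byte
	pc    int
	stack []int32
	done  bool
}

func (f *frame) push(v int32) { f.stack = append(f.stack, v) }

func (f *frame) pop() int32 {
	v := f.stack[len(f.stack)-1]
	f.stack = f.stack[:len(f.stack)-1]
	return v
}

// bytecodeFunc executes one instruction and returns how far to advance pc.
type bytecodeFunc func(f *frame) int

// dispatchTable is indexed by opcode; a nil entry means "not implemented".
var dispatchTable [256]bytecodeFunc

func init() {
	dispatchTable[ICONST_1] = doIconst1
	dispatchTable[ICONST_2] = doIconst2
	dispatchTable[IADD] = doIadd
	dispatchTable[IRETURN] = doIreturn
}

func doIconst1(f *frame) int { f.push(1); return 1 }
func doIconst2(f *frame) int { f.push(2); return 1 }

func doIadd(f *frame) int {
	b := f.pop()
	a := f.pop()
	f.push(a + b)
	return 1
}

// doIreturn leaves the return value on the stack and stops the loop.
func doIreturn(f *frame) int { f.done = true; return 0 }

// run replaces the giant switch: fetch the opcode, look up its function, call it.
func run(code []byte) (int32, error) {
	f := &frame{code: code}
	for !f.done {
		if f.pc >= len(f.code) {
			return 0, errors.New("ran past end of code without a return")
		}
		op := f.code[f.pc]
		fn := dispatchTable[op]
		if fn == nil {
			return 0, fmt.Errorf("unimplemented opcode 0x%02X at pc %d", op, f.pc)
		}
		f.pc += fn(f)
	}
	if len(f.stack) == 0 {
		return 0, errors.New("empty operand stack at return")
	}
	return f.pop(), nil
}

func main() {
	result, err := run([]byte{ICONST_1, ICONST_2, IADD, IRETURN})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(result) // prints 3
}
```

Each bytecode lives in its own small function, so tools only ever have to parse a few lines at a time, and the interpreter loop itself stays a handful of lines long.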

Our initial logging system had also outgrown its design. Originally, it was structured by levels of granularity: severe-warning-fine-finer-finest levels. Alas, I had no particular scheme to identify what showed in fine vs. finer vs. finest, so it was never clear which level to use to get the data we needed. Our new tracing framework works primarily by topic: JVM initialization, class loading, bytecode execution, etc. Now, it’s clear what gets logged where. Within some of those categories, limited control over verbosity is provided.
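As an illustration of the difference, tracing by topic can be sketched in Go as a set of independent switches rather than a single severity threshold. This is a sketch under my own assumptions, not the Jacobin framework: the topic names, the `tracer` type, and its methods are hypothetical.

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// topic is a bit flag, so any combination of topics can be enabled at once.
// The names below are hypothetical and chosen only for this sketch.
type topic uint8

const (
	topicInit      topic = 1 << iota // JVM initialization
	topicClassLoad                   // class loading
	topicExec                        // bytecode execution
)

// tracer writes a message only if that message's topic is enabled.
type tracer struct {
	enabled topic
	out     io.Writer
}

func (t *tracer) enable(topics ...topic) {
	for _, tp := range topics {
		t.enabled |= tp
	}
}

func (t *tracer) trace(tp topic, format string, args ...interface{}) {
	if t.enabled&tp == 0 {
		return // topic is switched off: nothing is logged
	}
	fmt.Fprintf(t.out, format+"\n", args...)
}

// demo enables only class-loading traces and returns what was written.
func demo() string {
	var sb strings.Builder
	tr := &tracer{out: &sb}
	tr.enable(topicClassLoad)

	tr.trace(topicInit, "initializing globals")             // suppressed
	tr.trace(topicClassLoad, "loaded %s", "java/lang/Object") // written
	tr.trace(topicExec, "pc=%d", 0)                         // suppressed
	return sb.String()
}

func main() {
	fmt.Print(demo())
}
```

With levels, the caller has to guess how fine is fine enough; with topics, the question becomes simply which subsystem you want to watch.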

One of the best aspects of Jacobin is the extremely detailed bytecode tracing we can do. Using -trace=verbose on the command line shows every executing instruction along with the state of the frame’s operand stack before the instruction is executed. This visibility gives us a deep view into the running code. Sometimes, though, you just need to know which bytecodes in which methods are being run, without the ancillary data. You can get this with:
-trace=inst

During the past six months, we’ve also greatly expanded the port of JDK library native methods (that is, methods written in C++ and accessed via JNI) to Go. Because they're now implemented in Go, we can call them faster than JNI would allow and stay true to our mission of a 100% Go implementation.
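Conceptually, a port like this can be pictured as a lookup table from a method's fully qualified signature to a plain Go function, so that an invocation of a native method becomes an ordinary Go call. The sketch below rests on my own assumptions: the table layout, the key format, the `goMethod` type, and `invokeNative` are hypothetical and do not describe Jacobin's internals. `Float.floatToRawIntBits` is used as the example because it is declared native in the JDK and has a direct Go equivalent.

```go
package main

import (
	"fmt"
	"math"
)

// goMethod is a hypothetical descriptor for a native method ported to Go.
type goMethod struct {
	paramCount int
	fn         func(params []interface{}) interface{}
}

// nativeTable maps "class.method(descriptor)" to its Go implementation.
// The key format is an assumption made for this sketch.
var nativeTable = map[string]goMethod{
	"java/lang/Float.floatToRawIntBits(F)I": {paramCount: 1, fn: floatToRawIntBits},
}

// floatToRawIntBits returns the IEEE 754 bit pattern of a float as a Java int.
func floatToRawIntBits(params []interface{}) interface{} {
	f := params[0].(float32)
	return int32(math.Float32bits(f))
}

// invokeNative looks up the Go implementation and calls it directly,
// with no JNI boundary to cross.
func invokeNative(sig string, params ...interface{}) (interface{}, error) {
	m, ok := nativeTable[sig]
	if !ok {
		return nil, fmt.Errorf("no Go implementation for %s", sig)
	}
	if len(params) != m.paramCount {
		return nil, fmt.Errorf("%s expects %d parameter(s), got %d",
			sig, m.paramCount, len(params))
	}
	return m.fn(params), nil
}

func main() {
	bits, err := invokeNative("java/lang/Float.floatToRawIntBits(F)I", float32(1.0))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(bits) // prints 1065353216 (0x3F800000)
}
```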

We also contributed a correction to the JVM specification document. Rah!

What’s next…

Having redesigned key systems and cleaned up the code, we’re now ready to tackle the last two unimplemented bytecodes: finishing INVOKEINTERFACE and beginning work on INVOKEDYNAMIC. We hope to have them wrapped up in the next six months. After that, we’ll fill in the features we need to successfully run third-party benchmark suites. If we can run a full suite of benchmarks by the end of the year, we’ll be happy. At that point, we can begin working on visibility (our raison d’être) and performance.

By the numbers

The rewrite and refactoring slightly shrank the size of our codebase. In the last six months, we’ve committed 721 times to the GitHub repository. Jacobin now consists of 44,862 lines (which includes code, comments, and blank lines). The Jacotest suite, which comprises most of our full-class tests, consists of an additional 23,421 lines, giving the complete project 68,283 lines.

As we’ve discussed many times before, we’re intently preoccupied with code integrity, so we do lots of testing. Currently, we run 836 unit tests and 180 end-to-end tests in the Jacotest suite with every non-trivial commit, for a total of 1,016 tests. This compares with 867 tests at the end of the last period. The unit tests comprise 22,684 lines. Adding the Jacotest suite results in a total of 46,105 lines of test code exercising 22,178 lines of production code (Jacobin’s 44,862 lines less the 22,684 lines of unit tests). That is, our testing codebase is 207% the size of our production code. Six months ago, it was 223%, so we’ve slipped a bit, but we’re working on it!

Show your support

If you'd like to show your support for Jacobin JVM, we'd love a ⭐ on GitHub. That helps keep our motivation high! If you want more frequent updates, please follow platypusguy on Bluesky.


Sunday, March 02, 2025
