Jacobin JVM after 3½ years of Development
On February 28, we reached the 3½-year point in the construction of
Jacobin, the JVM written entirely in Go. As usual, a six-month
anniversary prompts us to update the project status and publish the latest numbers about the code base.
What we just did
The past six months were spent primarily cleaning up the
code base, including rewrites of two significant subsystems: the interpreter
and the logging/tracing framework. No one wants to spend project time on
rewrites, but Richard Elkins and I concluded that although the designs of the two subsystems
were good enough for earlier versions of Jacobin, they were not robust enough to carry
us into the next phase of the project. This was in large part due to the
organic way in which Jacobin has grown. As we understand the innards of the JVM
more deeply, we recognize that minor decisions made a while ago now impose
constraints that would have been difficult to foresee at the time.
For example, our original interpreter was implemented as a
huge switch statement with 203 cases (one for each bytecode). This created a
single mega-sized function of more than 3,000 lines. Our tools had a hard time parsing such a large function, which hampered refactoring
and other activities. We converted the huge switch into a dispatch table of
function pointers to 203 separate functions. Now, each discrete
bytecode function is easily understood by our tools. This design is also more
efficient: the functions for the most frequently executed bytecodes remain in cache,
while the never-executed bytecodes remain outside of the cache. This frees up
cache for other code.
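To make the switch-to-table change concrete, here is a minimal sketch of a dispatch-table interpreter in Go. It is not Jacobin's actual code: the frame type, the handler names, and the convention that a handler returns the number of bytes to advance (or a negative value to signal a return) are all assumptions made for illustration. Only the opcode values come from the JVM specification.

```go
package main

import "fmt"

// frame is a hypothetical, stripped-down stand-in for a JVM stack frame.
type frame struct {
	code  []byte  // bytecode of the method being executed
	pc    int     // index of the current instruction
	stack []int32 // operand stack (ints only, to keep the sketch small)
}

func (f *frame) push(v int32) { f.stack = append(f.stack, v) }

func (f *frame) pop() int32 {
	v := f.stack[len(f.stack)-1]
	f.stack = f.stack[:len(f.stack)-1]
	return v
}

// Opcode values as defined in the JVM specification.
const (
	opIconst1 = 0x04
	opIconst2 = 0x05
	opBipush  = 0x10
	opIadd    = 0x60
	opIreturn = 0xAC
)

// opFunc executes one instruction and returns how many bytes to advance
// the pc. A negative result means "return from the method" (an assumption
// of this sketch, not a Jacobin convention).
type opFunc func(f *frame) int

func doIconst1(f *frame) int { f.push(1); return 1 }
func doIconst2(f *frame) int { f.push(2); return 1 }
func doBipush(f *frame) int  { f.push(int32(int8(f.code[f.pc+1]))); return 2 }
func doIadd(f *frame) int    { b, a := f.pop(), f.pop(); f.push(a + b); return 1 }
func doIreturn(f *frame) int { return -1 }

// dispatch maps each opcode to its handler; unlisted opcodes stay nil.
var dispatch = [256]opFunc{
	opIconst1: doIconst1,
	opIconst2: doIconst2,
	opBipush:  doBipush,
	opIadd:    doIadd,
	opIreturn: doIreturn,
}

// run interprets code until a handler signals a return.
func run(code []byte) (int32, error) {
	f := &frame{code: code}
	for f.pc < len(f.code) {
		op := f.code[f.pc]
		fn := dispatch[op]
		if fn == nil {
			return 0, fmt.Errorf("unimplemented opcode 0x%02X at pc %d", op, f.pc)
		}
		step := fn(f)
		if step < 0 {
			return f.pop(), nil // the return value is on top of the stack
		}
		f.pc += step
	}
	return 0, fmt.Errorf("fell off the end of the bytecode at pc %d", f.pc)
}

func main() {
	// iconst_2; bipush 40; iadd; ireturn
	result, err := run([]byte{opIconst2, opBipush, 40, opIadd, opIreturn})
	fmt.Println(result, err) // prints "42 <nil>"
}
```

Each handler is a small, separate function, so tools can parse and refactor it on its own, and adding a bytecode means adding one function and one table entry.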
Our initial logging system had also outgrown its design.
Originally, it was structured by levels of granularity:
severe-warning-fine-finer-finest levels. Alas, I had no particular scheme to
identify what showed in fine vs. finer vs. finest, so it was never clear which
level to use to get the data we needed. Our new tracing framework works primarily
by topic: JVM initialization, class loading, bytecode execution, etc. Now, it’s
clear what gets logged where. Within some of those categories, limited control
over verbosity is provided.
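As a rough illustration of tracing by topic rather than by level, the sketch below keeps one bit flag per topic and emits a message only when its topic is enabled. Everything here is hypothetical: the Topic type, the Tracer type, and the topic names "init", "class", and "inst" are invented for the example and are not Jacobin's actual API or option list.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// Topic is a bit flag identifying one tracing category (hypothetical).
type Topic uint

const (
	TraceInit      Topic = 1 << iota // JVM initialization
	TraceClassLoad                   // class loading
	TraceInst                        // bytecode execution
)

// topicNames maps illustrative option names to topics.
var topicNames = map[string]Topic{
	"init":  TraceInit,
	"class": TraceClassLoad,
	"inst":  TraceInst,
}

// parseTopics turns a comma-separated list such as "init,inst" into a bit set.
func parseTopics(spec string) (Topic, error) {
	var enabled Topic
	for _, name := range strings.Split(spec, ",") {
		t, ok := topicNames[strings.TrimSpace(name)]
		if !ok {
			return 0, fmt.Errorf("unknown trace topic %q", name)
		}
		enabled |= t
	}
	return enabled, nil
}

// Tracer writes a message only if its topic has been switched on.
type Tracer struct{ enabled Topic }

// On reports whether the given topic is enabled.
func (t Tracer) On(topic Topic) bool { return t.enabled&topic != 0 }

// Trace formats and writes one line to stderr for an enabled topic.
func (t Tracer) Trace(topic Topic, format string, args ...any) {
	if t.On(topic) {
		fmt.Fprintf(os.Stderr, format+"\n", args...)
	}
}

func main() {
	set, err := parseTopics("init,inst")
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	tr := Tracer{enabled: set}
	tr.Trace(TraceInit, "JVM starting")
	tr.Trace(TraceClassLoad, "loading %s", "java/lang/Object") // suppressed: topic is off
	tr.Trace(TraceInst, "pc=%d opcode=%s", 0, "ICONST_2")
}
```

With this shape, the question "which level do I need?" becomes "which topic do I care about?", which is the property the paragraph above describes.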
One of the best aspects of Jacobin is the extremely detailed
bytecode tracing we can do. Running with -trace=verbose
on the command line shows every executing instruction along with the state of
the frame’s operand stack before the instruction is executed. This visibility
gives us a deep view into the running code. Sometimes, though, you just need to know which
bytecodes in which methods are being run, without the ancillary data. You can
get this with
-trace=inst
During the past six months, we’ve also greatly expanded the
port of JDK library native methods (that is, methods written in C++ and accessed via JNI) to
Go. Because they're now implemented in Go, we can call them faster than JNI would allow and stay true to
our mission of a 100% Go implementation.
We also contributed a correction to the JVM specification
document. Rah!
What’s next…
Having redesigned key systems and cleaned up the code, we’re now
ready to tackle the last two unimplemented bytecodes: we will finish INVOKEINTERFACE and begin work
on INVOKEDYNAMIC. We hope to have them wrapped up in the next six months. And
then, we’ll fill in the features we need to successfully run third-party
benchmark suites. If we can run a full suite of benchmarks by the end of the year, we’ll be
happy. At that point, we can begin working on visibility (our raison d’être)
and performance.
By the numbers
The rewrite and refactoring slightly shrank the size of our codebase.
In the last six months, we’ve committed 721 times to the GitHub repository. Jacobin
now consists of 44,862 lines (which includes code, comments, and blank lines).
The Jacotest suite, which comprises most of our full-class tests, consists of an additional 23,421 lines, giving the complete project 68,283
lines.
As we’ve discussed many times before, we’re intently
preoccupied with code integrity, so we do lots of testing. Currently, we run
836 unit tests and 180 end-to-end tests in the Jacotest suite with every non-trivial commit—for
a total of 1,016 tests. This compares with 867 tests at the end of the last
period. The unit tests comprise 22,684 lines. Adding the
Jacotest suite results in a total of 46,105 lines of test code exercising 22,178 lines of
production code. That is, our testing codebase is 207% the size of our
production code. Six months ago, it was 223%, so we’ve slipped a bit—but we’re
working on it!
Show your support
If you'd like to show your support for Jacobin JVM, we'd love a ⭐ on GitHub. That helps keep our motivation high! If you want more frequent updates, please follow platypusguy on Bluesky.