Binstock on Software: jacobin

Showing posts with label jacobin. Show all posts

Monday, February 26, 2024

Jacobin JVM at 30 Months

This month, the Jacobin JVM project reaches the 30-month milestone, with release 0.4.0. Because for the last six months Richard Elkins (@texadactyl) and I have been working together daily on features, we've made good progress. Our goal is before year-end to have it run a standard set of benchmarks. After that, we'll begin to ask for volunteers to test Jacobin with their code. As ever, the larger goal is to deliver a more-than-minimal JVM written entirely in a single language (go).

To be honest, Jacobin is already much more than minimal, but we want to get it closer to feature parity with the HotSpot JVM, which is the JVM that ships in OpenJDK. During the past six months, we've added:

* exception handling, both caught and uncaught exceptions and errors. For uncaught errors, we try to provide somewhat more detail about the exception than does the HotSpot JVM. However, for users who prefer HotSpot's exact wording, we provide the -strictJDK command-line option, which uses the exact same wording as HotSpot.

* improved diagnostic data in trace logs. Prior to this release, out trace logs were focused on the bytecode instructions, showing the class, method, bytecode and the top of the operand stack (TOS). We now print out the entire operand stack with each bytecode instruction so that we can watch data items move up and down the stack as pushes and pops move them. While this generates huge trace listings, it lets us watch the execution of classes in a real-time document.

* handling methods with a variable number of arguments

* static initializer blocks. Initiatlizer blocks are rarely used by developers, but crucial to the operation of the JVM. At the language level, they're blocks of code between {{ and }} or in freestanding blocks of code between marked static{ ...code here... }. They're most often used to initialize static variables. The code blocks are executed before any code in a class, even before a constructor. Inside the JVM, they appear when classes use static variables, which means frequently. And they can entail complex chain reactions in which they need to instantiate other classes and run their static initializer blocks.

* revised architecture. One of the confounding aspects of working on a system with so many discrete subsystems that must all interoperate in a carefully choreographed process is that it's difficult to anticipate the exact shape and interfaces a subsystem must have when it's first designed. In part, that's because we generally cannot implement all the features right away--only the essential ones. Gradually, as Jacobin moves forward, earlier decisions to not include certain lesser-used features need to be revised. In this release, we revised how we look up methods and how we handle static variables. In both cases, we simplified existing code.

Hacker News

Jacobin JVM made the front page of Hacker News. That post by Ye Lin Aug, generated 184 interesting comments. We appreciated this unexpected coverage and did our best to answer the many questions.

What's next

In the next six month sprint, we are hopeful that we can:

* implement all remaining bytecodes except INVOKEDYNAMIC, which will surely take us longer to complete

* implement java.lang.Class: there are several Java classes that are so dependent on the JVM's design that every JVM needs to implement them by hand. These include classes for threads, debugging classes, and, of course, java.lang.Class...among others.

* add file I/O libraries (it might seem odd to see this here, but the JDK's file I/O libraries are native functions. We need to implement then in go. This will primarily be via use of the Facade design pattern, but there will likely be some additional coding required.)

* expanded work on handling JAR files. Presently Jacobin does handle JAR files. However, we want to make sure that code is robust enough to handle all details and forms of JAR files, so that execution never fails.

All of this in preparation for running benchmark suites and, eventually, soliciting alpha testers.

In the above text, I've referred to this milestone as a "release." The term is misleading. We're not creating a release, but just marking the code at this 2.5-year anniversary as v. 0.4.0. As discussed on the GitHub project site, we don't yet recommend you try Jacobin. However, by the end of the next sprint, we hope to start inviting folks to give it a try.

Testing

As discussed in previous posts, we're deeply committed to testing. Jacobin's test suites currently run a total of 708 tests, which include 597 unit tests and 111 integration tests. We'll be boosting these number significantly in preparation for inviting alpha testers.

Jacobin by the Numbers

At present, Jacobin consists of a production codebase of 15,814 lines (includes code, comments, and blank lines). The testing code consists of 24,178 lines plus 26,874 lines in the Jacotest test suite. This gives 50,912 lines of tests, which is 3.22x the size of the production code. Our eventual goal is a significantly greater multiple.

If you'd like to show your support for Jacobin JVM, we'd love a ⭐ on GitHub. That helps keep our motivation high! If you want more frequent updates, please follow us on Twitter (@jacobin_jvm)

Wednesday, August 09, 2023

Jacobin at the 2-year Mark

Jacobin (a JVM written entirely in Go) just reached its 2-year anniversary. Since our 18-month update, a lot has happened. We have:

Added instantiation of non-static classes
Added support for superclasses
Implemented the JDK’s native math libraries in Go
Added support for multidimensional arrays
Added support for compact strings
The interpreter now handles 190 bytecodes (out of 203)
Default to using the classes and libraries bunded with the OpenJDK
Significant instruction-level tracing capabilities (see below)

What we’re working on now and taking up shortly:

Making sure that our test suites generate the same results as the OpenJDK JVM
Adding the final bytecodes to the interpreter. (Some of these are very complicated, so they will likely take a while.)
Add exception handling
Add support for interfaces

Even before these goals are attained, we expect that to start running benchmarks and third-party test suites on Jacobin.

Much of the good progress we’ve made since our 18-month update is due to the addition of Richard Elkins (@texadactyl) to the team. He implemented the JDK’s native math libraries and has created a test suite, Jacotest, which grinds on existing and upcoming features.

Tracing and Peering into the JVM

Our progress remains very much aligned with the original goals for Jacobin: a JVM capable of running Java17 programs, written entirely in Go with no dependencies, delivered as a small executable from a cohesive, extensively commented codebase.

At present, Jacobin is a 3.9MB executable that is tested daily on Windows, Linux, and MacOS. Because it’s a single codebase, we have the pleasure of loading it into our IDE (GoLand, kindly provided by JetBrains) and stepping through the execution of a class bytecode-by-bytecode following the execution path across classes and libraries.

To give us a roadmap, we expanded our already detailed instruction tracing to show the values on the operand stack and other useful details. Here is a sample of the tracing log (available by specifying the -trace:inst option on the command line):

java/lang/StringLatin1 meth: inflate PC: 30, GOTO TOS: -

java/lang/StringLatin1 meth: inflate PC: 3, ILOAD TOS: -

java/lang/StringLatin1 meth: inflate PC: 5, ILOAD TOS: 0 int64 22

java/lang/StringLatin1 meth: inflate PC: 7, IF_ICMPGE TOS: 1 int64 22

java/lang/StringLatin1 meth: inflate PC: 33, RETURN TOS: -

java/lang/StringLatin1 meth: toChars PC: 14, ALOAD_1 TOS: -

java/lang/StringLatin1 meth: toChars PC: 15, ARETURN TOS: 0 Object

java/lang/String meth: toCharArray PC: 14, GOTO TOS: 0 Object

java/lang/String meth: toCharArray PC: 24, ARETURN TOS: 0 Object

main meth: main PC: 41, ASTORE TOS: 0 Object: &{{68288800 0} <nil> [{[I 0xc000004450}]}

main meth: main PC: 43, GETSTATIC TOS: -

(Some entries removed for simplicity.) In this listing, you see on the extreme left, the class name, the method name, the program counter (PC, which is the number of the bytecode being executed), the bytecode, and the value on the top of the stack (TOS). In this, TOS: 0 means there is one item on the stack (at position 0) and its type and value are shown immediately to the right (or on the next line in case of line wrapping).

Notice that in this excerpt, execution starts in java.lang.StringLatin1/inflate(), eventually returns to the calling function in java.lang.String, toCharArray(). When this completes, it returns to the main method in the class called main. which is loaded with a pointer to an object that consists of an array of integers (in this particular case, an array of chars that form a string)

Testing

As stated in our previous posts, we’re deeply committed to testing. Currently, Jacobin uses a testbed of 618 tests: 525 unit tests and additional 93 tests in the Jacotest suite. Even at this level, we’re not satisfied with the depth of coverage, and we expect to continue expanding the testing aggressively.

By the Numbers

Jacobin consists of 11,097 lines (this includes code, comments, and blank lines). The 525 unit tests represent 21,465 lines. The Jacotest suite consists of and additional 21,921 lines (mostly Java). This totals to 43,386 lines of testing code, which means our test code is currently 3.91x the size of our production code. We aim to increase that ratio as we move forward.

So, where do we stand?

We’re not quite ready for users to begin testing Jacobin. In this coming year, we aim to ship a release that you can try out and test with your own Java classes. At that point, we’ll pivot to improving performance. (If you want to jump the gun, though, you can always download the code and do a build. Instructions on the release page.)

If you want to help the project, we’d love a star on GitHub (this helps keeps our motivation high) and perhaps let others know about the project.

Tuesday, February 14, 2023

Jacobin JVM at 18 months

Earlier this month, the Jacobin JVM project (a JVM written in Go) reached its 18-month milestone. Since our post at the 12-month mark, we have added support for numerous Java bytecodes to the interpreter, including all the bytecodes for longs, floats, doubles and their operations, all the bit manipulations, and all operations on single-dimensional arrays of primitives. We've implemented 176 bytecodes at present and expect to finish up the remaining ones we need during the coming six months.

At present, Jacobin can execute simple static classes, which is enough to allow us to test functionality and to begin running benchmarks. While performance has not in any way been a goal during our work, as we get closer to finishing the interpreter, it will assume greater importance. @suresk is already sketching out an observability client, similar to VisualVM and other tools, to guide our optimization work.

Jacobin continues to meet our initial goals: it is written entirely in Go and has no dependencies. It runs fast and the executable is only 3.1MB (on Windows). It runs Java class files and JARs compiled by Java 7 through Java 17.

By the numbers

Jacobin's codebase consists of 25,813 lines (which include code, comments, and blank lines). As mentioned in earlier posts, we have a very deep commitment to testing as shown by the fact that this codebase includes 18,015 lines of testing code for the 7,798 of production code. This is a ratio of testing code to production code of 2.31x -- our highest to date (as we set out to do in earlier posts). Those 18K lines represent 429 unit and integration tests.

Easy Things You Can Do to Help

While Jacobin is still in pre-alpha mode, if you choose to build it or run one of the posted executables on GitHub, we’d love your feedback. We respond quickly to any and all feedback and questions. In this regard, Richard Elkins (@texadactyl) deserves our heartfelt thanks for running Jacobin on various test files and sharing his results with us.

If you’d just like to show your support for the project, we'd love a star on GitHub. Knowing people are interested in Jacobin really helps keep our motivation and spirits high. If you're on Twitter, please follow our handle (@jacobin_jvm) to keep abreast of what we’re doing.

Sunday, May 08, 2022

Jacobin JVM project after nine months

After nine months, Jacobin has been steadily moving forward. The biggest news of this quarter is that Spencer Uresk (@suresk on Twitter and suresk on GitHub) has joined the project. He's made his presence felt right away by rewriting how Jacobin loads JDK classes at start-up. Previously, we provided a curated set of JDK classes in the Jacobin distribution. These classes were searched for in the directory specified by JACOBIN_HOME. Spencer's improvement is that the classes are now loaded directly from the Java distribution on the runtime system. This means that if your system already has a JDK installed on it, all you need to run Jacobin is the single Jacobin executable file. Spencer has now turned his attention to running JAR files (because at present, Jacobin runs only individual class files).

While Spencer's working on that, I (Andrew Binstock) am continuing the work on the bytecode interpreter. Work is slow but steady and I aim to have it mostly complete by the end of this three-month cycle.

As of this quarter, we are compatible with classes through Java 17 (previously only through Java 11). We don't enforce sealed classes (Java 17's big new feature) but we can execute our test classes from Java 17 just fine.

By the numbers

Project size has risen from 17,588 lines in our codebase to 19,173, which consists of 6,383 lines of production code and 12,970 lines in 248 tests--a 2.003x ratio of test code to production code. We'll look to increase that ratio as we move forward. (It was at 1.4x at the three-month mark, and 2.10x at the six-month mark.)

How you can help

If you're interested in this project and you're on Github, we'd love a star. Knowing people are interested in Jacobin really helps keep our motivation and spirits high. If you're on Twitter, follow our handle (@jacobin_jvm). Thanks for your interest and support!

Friday, December 17, 2021

How the Jacobin JVM Accesses Methods

Executing methods is the principal activity of the JVM. There are many steps involved in finding, loading, executing methods correctly. The Jacobin JVM uses a variety of techniques to accelerate this process as described here. (To follow, you need to know just a little Java.)

Methods in class files

Java methods are stored in class files in a section where various kinds of class attributes are located. Each method contains instructions in the form of Java bytecodes. It also contains a series of attributes that provide additional execution information (such as data for handling exceptions, debugging info, etc.) Functions are stored by name and type, which are represented by indexes into an area of the class file called the constant pool. Those indexes ultimately point to strings in UTF-8 format (actually, a Java-specific variant of UTF-8). A typical example looks like this:

java/io/PrintStream.println:(I)V

This shows the usual println() method that prints an integer to the console. Note that the name of the class precedes the method name. The class name has transformed the usual . into forward slashes. The single dot demarks the method name, which is followed by a colon and the method signature. The part in parentheses indicates the parameter type (I=integer) and the V after the closing parenthesis indicates the return value, which here is void (V=void). Note to Java nerds: the signature of a method typically does not include the return value. It's specified here so that the JVM knows what to expect as a return value.

Extracting methods for use by the JVM

The classloader is a JVM subsystem that locates classes needed by the application, parses them, and places (or loads) the parsed data into an important area of the JVM called the method area. The method area, despite its name, holds entire classes. When an app requires a method, it looks into the method area and determines whether the class has been loaded. If not, it asks the classloader subsystem to locate and load the class into the method area. Once the class is there, the JVM looks through all the method, resolves the name and signature strings for each of the methods and sees whether they match the method being looked for. When a match is found, the bytecodes are loaded and executed. (If the method is not found, a runtime error results.)

This search can be extremely expensive. For example, the Java standard Class.class in Java 11 has 139 methods--that's potentially a lot of look-ups! To save time, most JVMs, including Jacobin, cache the method data once it's been looked up, so that the search is performed only once.

The Method Table

In Jacobin, the caching is done using a method table (see file MTable.go). When a method is invoked, Jacobin (like many JVMs), first checks the method table to see whether the method has previously been located and loaded. If not, then the search as described previously is performed. In Jacobin, the method is located and stored in the method table and then the look-up in the table is performed a second time and the result passed to the calling method.

Additional Considerations

Thread safety: The method table, like the method area, is a JVM-wide data structure. That is, all executing threads in the JVM can access it. As a result, it's conceivable that two threads would be updating the method table simultaneously. To avoid this problem, the table uses a mutex lock on every update.

Performance: While developing the many capabilities of a JVM, Jacobin is aiming for acceptable performance. Eventually, though we'll be working very hard to maximize performance. Some of the techniques we have in our notebooks for future enhancements (some of which are used in other JVMs):

For the main() class and other classes that might appear in the same JAR, loading the methods directly into the method table, rather than waiting for the initial method search to load them.
When a method is loaded into the method table, deleting it from the class entry in the method area. There is no need to have the same data in memory twice. Doing this, reduces the memory footprint of the JVM.
When a class's methods are searched for a match (that is, prior to an entry in the method table), if no match is found in the class, then the superclass must be checked. If that fails, then that super-class's superclass is checked and so on up the chain until java.lang.Object is reached at the top of the object hierarchy. A simple optimization is to give every loaded class a complete list of all the superclass methods with pointers to them, so that the JVM does not have to climb the hierarchy in its search, but can tell quickly whether the method exists or not.

There are surely other optimizations and refinements, which we hope to explore and to include if they lead to better execution.

Tuesday, November 02, 2021

Jacobin JVM project after three months

Development on Jacobin, the JVM written in go that supports Java 11, has been proceeding rapidly. In the 100 days since the beginning of the project, there have been 314 pushed commits. I'll give more stats below. Here's where we stand:

Jacobin can read, parse, format check, and load class files. This process happens very quickly. For example, running all these steps on one of the largest classes in the JDK distribution, BigDecimal.class, takes just 2ms. When parsed, BigDecimal has 1567 entries in its constant pool, 37 fields, and 167 methods. That's a huge class!

When a class is loaded by Jacobin or any other JVM, it necessarily pulls in other classes to be loaded. For example, all classes run from the command-line have a superclass. Often, that superclass is java.lang.Object, which depends on other classes. Among these are java.lang.Class and java.lang.String; various I/O classes are needed as well. The OpenJDK-based JVMs (essentially, all JVMs except IBM's J9 and some embeddable VMs) address this need by preloading hundreds of widely used classes at JVM start-up. For a look at the list of all the classes loaded just to display the JVM version info, run this from the command line:

java -verbose:class -version

On my Java 11 test system, this command preloads 381 classes (in 347ms!) While Jacobin does not need as many classes loaded to run the specified class, it needs a subset of them. The next step in the project is to identify the required classes and load them quickly. To this end, loading opertions (parsing and format checking) will need to be done in parallel. Fortunately, one of the go language's strengths is a rich set of easy-to-use resources for precisely this kind of concurrent operation.

After this task is completed, work will begin on execution.

Testing Thoroughly

One of the principal goals of Jacobin is to be a reliable JVM. This requires disciplined work in the planning, development, and testing. Development is based entirely in tasks which are logged in a cloud-instance of JetBrains' excellent tool, YouTrack (graciously provide for free). You can see the presence of this tracking, in that every commit on GitHub starts with the corresponding task name. (Presently, the most recent task is JACOBIN-89.) Quality of the code is reviewed by automatic linters on GitHub. Currently, the code merits an A+. The goreport badge on the jacobin GitHub project, takes you to the most recent report.

Testing is done on a near-fanatical basis. Let me explain:

In 2005, I was a contractor with Agitar, a now-shuttered company that made a tool which would read a Java codebase and generate unit tests for missing areas of coverage. It worked great. In conversations with their sales engineers, they told me they used a back-of-the-envelope calculation to assess a company's commitment to testing. They compared the size of the test codebase to the production code. If the test codebase was 50% the size, the company had some commitment to testing. Over 80% was a clear and strong commitment to testing, and over 100% meant a deeply engrained testing culture.

The current code base of Jacobin consists of 8,342 lines (includes: code, comments, blank lines). Of those, 4,718 lines are in tests. That is, the testing codebase is 130.2% the size of the production code. The goal is to get that ratio even higher. Future quarterly updates will reveal our success in this effort.

Want to help?

It's always great to know a project is interesting to others. If Jacobin is interests you and you want to encourage its progress, a GitHub star is our preference. If you want to participate more directly, let me know in the comments, which are kept private. We also love code reviews, suggestions, and later on, we'll surely need folks to do testing. Whatever your interest, thanks for your time!

Binstock on Software