Friday, December 17, 2021

How the Jacobin JVM Accesses Methods

Executing methods is the principal activity of the JVM. There are many steps involved in finding, loading, executing methods correctly. The Jacobin JVM uses a variety of techniques to accelerate this process as described here. (To follow, you need to know just a little Java.) 

Methods in class files

Java methods are stored in class files in a section where various kinds of class attributes are located. Each method contains instructions in the form of Java bytecodes. It also contains a series of attributes that provide additional execution information (such as data for handling exceptions, debugging info, etc.) Functions are stored by name and type, which are represented by indexes into an area of the class file called the constant pool. Those indexes ultimately point to strings in UTF-8 format (actually, a Java-specific variant of UTF-8). A typical example looks like this:

java/io/PrintStream.println:(I)V

This shows the usual println() method that prints an integer to the console. Note that the name of the class precedes the method name. The class name has transformed the usual . into forward slashes. The single dot demarks the method name, which is followed by a colon and the method signature. The part in parentheses indicates the parameter type (I=integer) and the V after the closing parenthesis indicates the return value, which here is void (V=void). Note to Java nerds: the signature of a method typically does not include the return value. It's specified here so that the JVM knows what to expect as a return value.

Extracting methods for use by the JVM

The classloader is a JVM subsystem that locates classes needed by the application, parses them, and places (or loads) the parsed data into an important area of the JVM called the method area. The method area, despite its name, holds entire classes. When an app requires a method, it looks into the method area and determines whether the class has been loaded. If not, it asks the classloader subsystem to locate and load the class into the method area. Once the class is there, the JVM looks through all the method, resolves the name and signature strings for each of the methods and sees whether they match the method being looked for. When a match is found, the bytecodes are loaded and executed. (If the method is not found, a runtime error results.)

This search can be extremely expensive. For example, the Java standard Class.class in Java 11 has 139 methods--that's potentially a lot of look-ups! To save time, most JVMs, including Jacobin, cache the method data once it's been looked up, so that the search is performed only once.

The Method Table

In Jacobin, the caching is done using a method table (see file MTable.go). When a method is invoked, Jacobin (like many JVMs), first checks the method table to see whether the method has previously been located and loaded. If not, then the search as described previously is performed. In Jacobin, the method is located and stored in the method table and then the look-up in the table is performed a second time and the result passed to the calling method.  

Additional Considerations

Thread safety: The method table, like the method area, is a JVM-wide data structure. That is, all executing threads in the JVM can access it. As a result, it's conceivable that two threads would be updating the method table simultaneously. To avoid this problem, the table uses a mutex lock on every update. 

Performance: While developing the many capabilities of a JVM, Jacobin is aiming for acceptable performance. Eventually, though we'll be working very hard to maximize performance. Some of the techniques we have in our notebooks for future enhancements (some of which are used in other JVMs):

  • For the main() class and other classes that might appear in the same JAR, loading the methods directly into the method table, rather than waiting for the initial method search to load them. 
  • When a method is loaded into the method table, deleting it from the class entry in the method area. There is no need to have the same data in memory twice. Doing this, reduces the memory footprint of the JVM.
  • When a class's methods are searched for a match (that is, prior to an entry in the method table), if no match is found in the class, then the superclass must be checked. If that fails, then that super-class's superclass is checked and so on up the chain until java.lang.Object is reached at the top of the object hierarchy. A simple optimization is to give every loaded class a complete list of all the superclass methods with pointers to them, so that the JVM does not have to climb the hierarchy in its search, but can tell quickly whether the method exists or not. 

There are surely other optimizations and refinements, which we hope to explore and to include if they lead to better execution.