Binstock on Software: Java

Showing posts with label Java. Show all posts

Monday, February 26, 2024

Jacobin JVM at 30 Months

This month, the Jacobin JVM project reaches the 30-month milestone, with release 0.4.0. Because for the last six months Richard Elkins (@texadactyl) and I have been working together daily on features, we've made good progress. Our goal is before year-end to have it run a standard set of benchmarks. After that, we'll begin to ask for volunteers to test Jacobin with their code. As ever, the larger goal is to deliver a more-than-minimal JVM written entirely in a single language (go).

To be honest, Jacobin is already much more than minimal, but we want to get it closer to feature parity with the HotSpot JVM, which is the JVM that ships in OpenJDK. During the past six months, we've added:

* exception handling, both caught and uncaught exceptions and errors. For uncaught errors, we try to provide somewhat more detail about the exception than does the HotSpot JVM. However, for users who prefer HotSpot's exact wording, we provide the -strictJDK command-line option, which uses the exact same wording as HotSpot.

* improved diagnostic data in trace logs. Prior to this release, out trace logs were focused on the bytecode instructions, showing the class, method, bytecode and the top of the operand stack (TOS). We now print out the entire operand stack with each bytecode instruction so that we can watch data items move up and down the stack as pushes and pops move them. While this generates huge trace listings, it lets us watch the execution of classes in a real-time document.

* handling methods with a variable number of arguments

* static initializer blocks. Initiatlizer blocks are rarely used by developers, but crucial to the operation of the JVM. At the language level, they're blocks of code between {{ and }} or in freestanding blocks of code between marked static{ ...code here... }. They're most often used to initialize static variables. The code blocks are executed before any code in a class, even before a constructor. Inside the JVM, they appear when classes use static variables, which means frequently. And they can entail complex chain reactions in which they need to instantiate other classes and run their static initializer blocks.

* revised architecture. One of the confounding aspects of working on a system with so many discrete subsystems that must all interoperate in a carefully choreographed process is that it's difficult to anticipate the exact shape and interfaces a subsystem must have when it's first designed. In part, that's because we generally cannot implement all the features right away--only the essential ones. Gradually, as Jacobin moves forward, earlier decisions to not include certain lesser-used features need to be revised. In this release, we revised how we look up methods and how we handle static variables. In both cases, we simplified existing code.

Hacker News

Jacobin JVM made the front page of Hacker News. That post by Ye Lin Aug, generated 184 interesting comments. We appreciated this unexpected coverage and did our best to answer the many questions.

What's next

In the next six month sprint, we are hopeful that we can:

* implement all remaining bytecodes except INVOKEDYNAMIC, which will surely take us longer to complete

* implement java.lang.Class: there are several Java classes that are so dependent on the JVM's design that every JVM needs to implement them by hand. These include classes for threads, debugging classes, and, of course, java.lang.Class...among others.

* add file I/O libraries (it might seem odd to see this here, but the JDK's file I/O libraries are native functions. We need to implement then in go. This will primarily be via use of the Facade design pattern, but there will likely be some additional coding required.)

* expanded work on handling JAR files. Presently Jacobin does handle JAR files. However, we want to make sure that code is robust enough to handle all details and forms of JAR files, so that execution never fails.

All of this in preparation for running benchmark suites and, eventually, soliciting alpha testers.

In the above text, I've referred to this milestone as a "release." The term is misleading. We're not creating a release, but just marking the code at this 2.5-year anniversary as v. 0.4.0. As discussed on the GitHub project site, we don't yet recommend you try Jacobin. However, by the end of the next sprint, we hope to start inviting folks to give it a try.

Testing

As discussed in previous posts, we're deeply committed to testing. Jacobin's test suites currently run a total of 708 tests, which include 597 unit tests and 111 integration tests. We'll be boosting these number significantly in preparation for inviting alpha testers.

Jacobin by the Numbers

At present, Jacobin consists of a production codebase of 15,814 lines (includes code, comments, and blank lines). The testing code consists of 24,178 lines plus 26,874 lines in the Jacotest test suite. This gives 50,912 lines of tests, which is 3.22x the size of the production code. Our eventual goal is a significantly greater multiple.

If you'd like to show your support for Jacobin JVM, we'd love a ⭐ on GitHub. That helps keep our motivation high! If you want more frequent updates, please follow us on Twitter (@jacobin_jvm)

Wednesday, August 09, 2023

Jacobin at the 2-year Mark

Jacobin (a JVM written entirely in Go) just reached its 2-year anniversary. Since our 18-month update, a lot has happened. We have:

Added instantiation of non-static classes
Added support for superclasses
Implemented the JDK’s native math libraries in Go
Added support for multidimensional arrays
Added support for compact strings
The interpreter now handles 190 bytecodes (out of 203)
Default to using the classes and libraries bunded with the OpenJDK
Significant instruction-level tracing capabilities (see below)

What we’re working on now and taking up shortly:

Making sure that our test suites generate the same results as the OpenJDK JVM
Adding the final bytecodes to the interpreter. (Some of these are very complicated, so they will likely take a while.)
Add exception handling
Add support for interfaces

Even before these goals are attained, we expect that to start running benchmarks and third-party test suites on Jacobin.

Much of the good progress we’ve made since our 18-month update is due to the addition of Richard Elkins (@texadactyl) to the team. He implemented the JDK’s native math libraries and has created a test suite, Jacotest, which grinds on existing and upcoming features.

Tracing and Peering into the JVM

Our progress remains very much aligned with the original goals for Jacobin: a JVM capable of running Java17 programs, written entirely in Go with no dependencies, delivered as a small executable from a cohesive, extensively commented codebase.

At present, Jacobin is a 3.9MB executable that is tested daily on Windows, Linux, and MacOS. Because it’s a single codebase, we have the pleasure of loading it into our IDE (GoLand, kindly provided by JetBrains) and stepping through the execution of a class bytecode-by-bytecode following the execution path across classes and libraries.

To give us a roadmap, we expanded our already detailed instruction tracing to show the values on the operand stack and other useful details. Here is a sample of the tracing log (available by specifying the -trace:inst option on the command line):

java/lang/StringLatin1 meth: inflate PC: 30, GOTO TOS: -

java/lang/StringLatin1 meth: inflate PC: 3, ILOAD TOS: -

java/lang/StringLatin1 meth: inflate PC: 5, ILOAD TOS: 0 int64 22

java/lang/StringLatin1 meth: inflate PC: 7, IF_ICMPGE TOS: 1 int64 22

java/lang/StringLatin1 meth: inflate PC: 33, RETURN TOS: -

java/lang/StringLatin1 meth: toChars PC: 14, ALOAD_1 TOS: -

java/lang/StringLatin1 meth: toChars PC: 15, ARETURN TOS: 0 Object

java/lang/String meth: toCharArray PC: 14, GOTO TOS: 0 Object

java/lang/String meth: toCharArray PC: 24, ARETURN TOS: 0 Object

main meth: main PC: 41, ASTORE TOS: 0 Object: &{{68288800 0} <nil> [{[I 0xc000004450}]}

main meth: main PC: 43, GETSTATIC TOS: -

(Some entries removed for simplicity.) In this listing, you see on the extreme left, the class name, the method name, the program counter (PC, which is the number of the bytecode being executed), the bytecode, and the value on the top of the stack (TOS). In this, TOS: 0 means there is one item on the stack (at position 0) and its type and value are shown immediately to the right (or on the next line in case of line wrapping).

Notice that in this excerpt, execution starts in java.lang.StringLatin1/inflate(), eventually returns to the calling function in java.lang.String, toCharArray(). When this completes, it returns to the main method in the class called main. which is loaded with a pointer to an object that consists of an array of integers (in this particular case, an array of chars that form a string)

Testing

As stated in our previous posts, we’re deeply committed to testing. Currently, Jacobin uses a testbed of 618 tests: 525 unit tests and additional 93 tests in the Jacotest suite. Even at this level, we’re not satisfied with the depth of coverage, and we expect to continue expanding the testing aggressively.

By the Numbers

Jacobin consists of 11,097 lines (this includes code, comments, and blank lines). The 525 unit tests represent 21,465 lines. The Jacotest suite consists of and additional 21,921 lines (mostly Java). This totals to 43,386 lines of testing code, which means our test code is currently 3.91x the size of our production code. We aim to increase that ratio as we move forward.

So, where do we stand?

We’re not quite ready for users to begin testing Jacobin. In this coming year, we aim to ship a release that you can try out and test with your own Java classes. At that point, we’ll pivot to improving performance. (If you want to jump the gun, though, you can always download the code and do a build. Instructions on the release page.)

If you want to help the project, we’d love a star on GitHub (this helps keeps our motivation high) and perhaps let others know about the project.

Tuesday, February 14, 2023

Jacobin JVM at 18 months

Earlier this month, the Jacobin JVM project (a JVM written in Go) reached its 18-month milestone. Since our post at the 12-month mark, we have added support for numerous Java bytecodes to the interpreter, including all the bytecodes for longs, floats, doubles and their operations, all the bit manipulations, and all operations on single-dimensional arrays of primitives. We've implemented 176 bytecodes at present and expect to finish up the remaining ones we need during the coming six months.

At present, Jacobin can execute simple static classes, which is enough to allow us to test functionality and to begin running benchmarks. While performance has not in any way been a goal during our work, as we get closer to finishing the interpreter, it will assume greater importance. @suresk is already sketching out an observability client, similar to VisualVM and other tools, to guide our optimization work.

Jacobin continues to meet our initial goals: it is written entirely in Go and has no dependencies. It runs fast and the executable is only 3.1MB (on Windows). It runs Java class files and JARs compiled by Java 7 through Java 17.

By the numbers

Jacobin's codebase consists of 25,813 lines (which include code, comments, and blank lines). As mentioned in earlier posts, we have a very deep commitment to testing as shown by the fact that this codebase includes 18,015 lines of testing code for the 7,798 of production code. This is a ratio of testing code to production code of 2.31x -- our highest to date (as we set out to do in earlier posts). Those 18K lines represent 429 unit and integration tests.

Easy Things You Can Do to Help

While Jacobin is still in pre-alpha mode, if you choose to build it or run one of the posted executables on GitHub, we’d love your feedback. We respond quickly to any and all feedback and questions. In this regard, Richard Elkins (@texadactyl) deserves our heartfelt thanks for running Jacobin on various test files and sharing his results with us.

If you’d just like to show your support for the project, we'd love a star on GitHub. Knowing people are interested in Jacobin really helps keep our motivation and spirits high. If you're on Twitter, please follow our handle (@jacobin_jvm) to keep abreast of what we’re doing.

Friday, December 17, 2021

How the Jacobin JVM Accesses Methods

Executing methods is the principal activity of the JVM. There are many steps involved in finding, loading, executing methods correctly. The Jacobin JVM uses a variety of techniques to accelerate this process as described here. (To follow, you need to know just a little Java.)

Methods in class files

Java methods are stored in class files in a section where various kinds of class attributes are located. Each method contains instructions in the form of Java bytecodes. It also contains a series of attributes that provide additional execution information (such as data for handling exceptions, debugging info, etc.) Functions are stored by name and type, which are represented by indexes into an area of the class file called the constant pool. Those indexes ultimately point to strings in UTF-8 format (actually, a Java-specific variant of UTF-8). A typical example looks like this:

java/io/PrintStream.println:(I)V

This shows the usual println() method that prints an integer to the console. Note that the name of the class precedes the method name. The class name has transformed the usual . into forward slashes. The single dot demarks the method name, which is followed by a colon and the method signature. The part in parentheses indicates the parameter type (I=integer) and the V after the closing parenthesis indicates the return value, which here is void (V=void). Note to Java nerds: the signature of a method typically does not include the return value. It's specified here so that the JVM knows what to expect as a return value.

Extracting methods for use by the JVM

The classloader is a JVM subsystem that locates classes needed by the application, parses them, and places (or loads) the parsed data into an important area of the JVM called the method area. The method area, despite its name, holds entire classes. When an app requires a method, it looks into the method area and determines whether the class has been loaded. If not, it asks the classloader subsystem to locate and load the class into the method area. Once the class is there, the JVM looks through all the method, resolves the name and signature strings for each of the methods and sees whether they match the method being looked for. When a match is found, the bytecodes are loaded and executed. (If the method is not found, a runtime error results.)

This search can be extremely expensive. For example, the Java standard Class.class in Java 11 has 139 methods--that's potentially a lot of look-ups! To save time, most JVMs, including Jacobin, cache the method data once it's been looked up, so that the search is performed only once.

The Method Table

In Jacobin, the caching is done using a method table (see file MTable.go). When a method is invoked, Jacobin (like many JVMs), first checks the method table to see whether the method has previously been located and loaded. If not, then the search as described previously is performed. In Jacobin, the method is located and stored in the method table and then the look-up in the table is performed a second time and the result passed to the calling method.

Additional Considerations

Thread safety: The method table, like the method area, is a JVM-wide data structure. That is, all executing threads in the JVM can access it. As a result, it's conceivable that two threads would be updating the method table simultaneously. To avoid this problem, the table uses a mutex lock on every update.

Performance: While developing the many capabilities of a JVM, Jacobin is aiming for acceptable performance. Eventually, though we'll be working very hard to maximize performance. Some of the techniques we have in our notebooks for future enhancements (some of which are used in other JVMs):

For the main() class and other classes that might appear in the same JAR, loading the methods directly into the method table, rather than waiting for the initial method search to load them.
When a method is loaded into the method table, deleting it from the class entry in the method area. There is no need to have the same data in memory twice. Doing this, reduces the memory footprint of the JVM.
When a class's methods are searched for a match (that is, prior to an entry in the method table), if no match is found in the class, then the superclass must be checked. If that fails, then that super-class's superclass is checked and so on up the chain until java.lang.Object is reached at the top of the object hierarchy. A simple optimization is to give every loaded class a complete list of all the superclass methods with pointers to them, so that the JVM does not have to climb the hierarchy in its search, but can tell quickly whether the method exists or not.

There are surely other optimizations and refinements, which we hope to explore and to include if they lead to better execution.

Sunday, September 28, 2008

Banishing Return Status Codes

The most enduringly popular post on this blog is Perfecting OO's Small Classes and Short Methods, which presents a short series of stringent guidelines to help an imperative-trained developer master OO.

If I were to add one item to the list, it would be: Don't use return codes to indicate the status of an action. Developers trained in languages such as C have the habit of using return codes to indicate the success or the nature of failure of the work done by a function. This approach is used because of the lack of a structured exception mechanism. But when exceptions are part of the language, the use of status codes isa poor choice. Among the key reasons are: many status codes are easily ignored; developers will expect problems to be reported via the exception mechanism; exceptions are much more descriptive. And finally, exceptions enable return codes to be used for something useful--namely returning a data item.

Astute readers will note that in Java, null is frequently used as a return value to indicate a problem (as in Collections). This practice subverts the previous points, and it too should be avoided. Returning a null presents code with many problems it should not have to face. The first is the risk of a null-pointer blow-up because the return value was accessed without being checked. This leads to the code bloat of endless null value checks. A much better solution, which avoids this problem, is to return an empty item (empty string, empty collection, etc.). This too communicates that no data item fulfilled the function's mandate, but it does not risk the null-pointer problem, and it frequently requires no special code to handle the error condition.

Hence, if your OO code is characterized by heavy reliance on return codes (many of which I am certain are not checked), consider rewriting it in favor of exceptions and use return statements solely for returning non-null data items.

Monday, September 01, 2008

A Parameter-Validation Smell and a Solution

Last week, Jeff Fredrick and I did a day-long code review of Platypus. We used a pair-programming approach, with Jeff driving and I helping with the navigation. Eventually, we got into the input parser, which parses input lines into a series of tokens: text, commmands, macros, and comments. Macros can require a second parsing pass, and commands often require additional parsing of parameters.

Once you get a parser working well (that is, it passes unit and functional tests, and it handles errors robustly), you generally don't want to mess with refactoring it. Experience tells you that parsers have hideous code in them and wisdom tells you to leave it alone. However, we launched in.

A frequent cause of otiose code was my extensive parameter checking. Parameters were validated at every step as tokens passed through multiple levels of parsing logic. Likewise, the movement of the parse point was updated multiple tiems as the logic resolved itself back up the processing stack. This too had to be validated repeatedly.

Jeff came up with an elegant refactoring that I could not find in the usual sources. He created an inner class consisting of the passed variables, a few methods for validating them, and a few more methods for manipulating them.

This class was then passed to the methods in lieu of the individual parameters--thereby reducing the number of parameters to one or two. And because the class constructor verified the initialization of the fields, I need only to check whether the passed class was null, rather than validate each of the internal fields.

The effect was to reduce complexity of already complex code, enforce DRY, and place the validation of the variables inside a class that contained them--a set of small, but important improvements. And like many of the best refactorings, it seems obvious in retrospect.

So, if you find your class's methods are repeatedly validating the same parameters, try bundling them in an inner class along with their validation logic. You'll like the results.

Tuesday, June 03, 2008

The Handiest Java Book in Years.

One of the constant challenges I have as a Java developer is keeping up with the numerous good FOSS dev tools. I no sooner start testing one tool and adapting my project to it, when a new one comes along. Being an analyst and naturally curious, this new product (or new release) represents a constant temptation. Is it better than what I am using? How much effort is required to try it out? What does it do better? On and on.

I can put a lot of those concerns to rest now. I just received a copy of Java Power Tools from O'Reilly and it's exactly what I've been looking for. It contains deep explanations of the principal FOSS dev tools in 10 major categories. These explanations are not two- or four-page summaries, but in-depth expositions that provide crucial info on the strengths and weaknesses of the product. The author, John Smart, then provides detailed tutorial on using the product. It's clear he's spent lots of time exploring the dark corners of each tool. And he makes good use of that knowledge in his comparisons and comments on the products.

If you want to spend an hour or so coming up to speed on what a product is about before installing it (and without having to work through the usually limited docs), this book will get you there faster and enable you get an overview of a whole lot of tools quickly and with the assurance you have a clear understanding. Here are the tools that are covered, followed by the number of pages for each one in parentheses:

BUILD TOOLS: Ant (55), Maven (60)
SCM: CVS (20), Subversion (78)
CI: Continuum (24p) Cruise Control (19) LuntBuild (32) Hudson (19)
IM: Openfire (12)
UNIT TESTING: JUnit (20) TestNG (25) Cobertura (17)
OTHER TESTING: StrutsTestCase (10) DbUnit (44p) JUnitPerf (10) JMeter (20) SoapUI (22) Selenium (30( Fest (9)
PROFILING: with Sun tools (16) with Eclipse (15)
DEFECT MANAGEMENT: Bugzilla (20) Trac (35)
QUALITY: Checkstyle (20) PMD (18p) FindBugs (12) Jupiter (18) Mylyn (14p)

All told, 856 pages of crisp, well-written explanations. A must-have reference for the bookshelf.

Thursday, May 22, 2008

Is the popularity of unit tests waning?

Before getting into my concerns about whether unit testing's popularity has peaked, let me state that I think unit testing is the most important benefit wrought by the agile revolution. I agree that you can write perfectly good programs without unit tests (we did put man on the moon in 1969, after all), but for most programs of any size, you're likely to be far better off using unit tests than not.

The problem is that only a small subset of developers understand that. And recent data points suggests that the number of programmers who use unit tests is not exactly growing quickly. I'll list some of the data points below that I've been developing for my column in SD Times.

1) Commercial products on the wane. Agitar was a company whose entire fate was tied to the popularity of unit testing. Despite very good products, a free service to auto-generate unit tests for your code, and some terrific exponents (especially Alberto Savoia and Jeff Frederick) to tell their story, the company closed a down a few weeks ago, essentially having come to the conclusion that it could never be sold at a price that could repay investors. So rather than ask for more funding, it closed down. If unit testing were gaining popularity robustly, Agitar surely would have come to a different conclusion.

2) Few OSS products. Except for the xUnit frameworks themselves, few FOSS tools for unit testing have been adopted. The innovative Jester project, which built a tool that looked for untested or poorly tested logic, essentially stopped development a long time ago because to quote the founder, Ivan Moore, in a comment to me "so few sites are into unit testing enough to care about perfecting their tests."

3) Major Java instructors aren't teaching it. Consider this interview with Cay Horstmann, co-author of the excellent Core Java books. (He asks, "If so many experienced developers don't write unit tests, what does that say?" In speculating on an answer, he implies that good developers don't need unit tests. Ugh!)

4) Unit testing books are few and far between. I am seeing about one new one a year. And as yet, not a single book on JUnit 4, which has been out for nearly three years(!).

5) Alternative unit-testing frameworks, such as the excellent TestNG, are essentially completely invisible. I was at a session on scripting this spring at SD West and in a class of 30 or so, two people had heard of TestNG (the teacher and I).

I could speculate on causes, but I have no clear culprit to point to. Certainly, unit testing needs to be evangelized more. And evangelized correctly. The folks who insist on 100% code coverage are making a useful tool unpalatable to serious programmers (as discussed here by Howard Lewis Ship, the inventor of Tapestry). But, I think the cause has to be something deeper than this. I would love to hear thoughts from readers in real-world situations where unit testing has been abandoned, cut back, or simply rejected--and why.

It would be a shame to have unit testing disappear and its current users viewed as aging, pining developers hankering for a technology the world has largely passed by. That would return programmers to the tried-and-true practice of glassy-eyed staring at a debugger for hours--something I have not missed at all.

Wednesday, April 23, 2008

Perfecting OO's Small Classes and Short Methods

In The ThoughtWorks Anthology a new book from the Pragmatic Programmers, there is a fascinating essay called “Object Calisthenics” by Jeff Bay. It’s a detailed exercise for perfecting the writing of the small routines that demonstrate characterize good OO implementations. If you have developers who need to improve their ability to write OO routines, I suggest you have a look-see at this essay. I will try to summarize Bay’s approach here.

He suggests writing a 1000-line program with the constraints listed below. These constraints are intended to be excessively restrictive, so as to force developers out of the procedural groove. I guarantee if you apply this technique, their code will move markedly towards object orientation. The restrictions (which should be mercilessly enforced in this exercise) are:

1. Use only one level of indentation per method. If you need more than one level, you need to create a second method and call it from the first. This is one of the most important constraints in the exercise.

2. Don’t use the ‘else’ keyword. Test for a condition with an if-statement and exit the routine if it’s not met. This prevents if-else chaining; and every routine does just one thing. You’re getting the idea.

3. Wrap all primitives and strings. This directly addresses “primitive obsession.” If you want to use an integer, you first have to create a class (even an inner class) to identify it’s true role. So zip codes are an object not an integer, for example. This makes for far clearer and more testable code.

4. Use only one dot per line. This step prevents you from reaching deeply into other objects to get at fields or methods, and thereby conceptually breaking encapsulation.

5. Don’t abbreviate names. This constraint avoids the procedural verbosity that is created by certain forms of redundancy—if you have to type the full name of a method or variable, you’re likely to spend more time thinking about its name. And you’ll avoid having objects called Order with methods entitled shipOrder(). Instead, your code will have more calls such as Order.ship().

6. Keep entities small. This means no more than 50 lines per class and no more than 10 classes per package. The 50 lines per class constraint is crucial. Not only does it force concision and keep classes focused, but it means most classes can fit on a single screen in any editor/IDE.

7. Don’t use any classes with more than two instance variables. This is perhaps the hardest constraint. Bay’s point is that with more than two instance variables, there is almost certainly a reason to subgroup some variables into a separate class.

8. Use first-class collections. In other words, any class that contains a collection should contain no other member variables. The idea is an extension of primitive obsession. If you need a class that’s a subsumes the collection, then write it that way.

9. Don’t use setters, getters, or properties. This is a radical approach to enforcing encapsulation. It also requires implementation of dependency injection approaches and adherence to the maxim “tell, don’t ask.”

Taken together, these rules impose a restrictive encapsulation on developers and force thinking along OO lines. I assert than anyone writing a 1000-line project without violating these rules will rapidly become much better at OO. They can then, if they want, relax the restrictions somewhat. But as Bay points out, there’s no reason to do so. His team has just finished a 100,000-line project within these strictures.

Monday, April 07, 2008

Easy Does It With easyb

I just got back from the CITcon conference, which is the thrice-yearly confab of agile developers who use continuous integration (the "CIT" in the conference name). This was my second time at CITcon. It's an open-space conference that is--surprise!--free, and chock-a-block full of good information. The principal reason it's so informative is that anyone committed enough to CI to go to a conference has probably spent a lot of time thinking about how to solve problems of build and test at his/her site. And this concern and reflection on these issues is amply evident in the discussions in the hallways and the informal presentations.

All the sessions I attended were thought-provoking. But probably the most interesting was a presentation by Andy Glover, the president of Stelligent, an agile consultancy. He runs a great blog in which has been touting a tool called easyb, which enables you to script unit tests so that they describe a scenario (rather than a code feature) and then test for the expected result. I've read Andy's enthusiasm for easyb, but it wasn't until I saw him demo it that I understood what the excitement was about.

The key benefits are 1) you can show a non-programmer (like the manager who is expecting the software any day now) that you have written tests that match every one of his requirements--easyb enables you to do this by writing the test in near English language; 2) you can test at a slightly higher level than the unit test: rather than test tiny features individually, you can quite easily test a succession of conditions that are chained together.

This approach is called--a little misleadingly,--behavior-driven development; which was an immediate turn off for me. I really don't want to learn another x-driven development. I just want to do what I do better. And I think easyb might just be such a tool. So, don't worry about the name, and hop over to the easyb website for a quick look-see. You'll like what you find.

Tuesday, February 19, 2008

Restarting the Platypus And the Lessons Learned

As many of you know, I have spent much of my free time during the last 24 months working on an open-source project called Platypus. The project's goal is to implement a command language like TeX, which enables users to embed formatting commands directly into text and generate documents of typeset quality in PDF, Microsoft Word, and HTML. The aims of Platypus are to be much easier to use than Tex and to provide many features of interest to developers, especially for printing code and listings.

After approximately 20,000 lines of Java code and comments, I have concluded that I need to restart and re-architect the project. The more I code, the more I see that I am adding top floors to a leaning tower. Eventually I'll topple it. So by restarting Platypus, I hope to straighten out architectural shortcomings and deliver a better, more expandable product more quickly.

In the process of coming to this decision, I have been able to crystallize several key lessons, a few of which I could probably have seen foreseen.

PROJECT AND DESIGN LESSONS

1) It's extremely difficult to figure out where your architecture is deficient if you have never done the kind of project you're currently undertaking. The best you can do is layout some basic architecture, abide by good dev practices, and learn as you go. Alas, as in this case, it took 20K lines of work to recognize that the architecture was irretrievably flawed and how.

2) First, do the known hard parts that can't be avoided. In the case of Platypus, I knew from the get-go I wanted a full programming language for the user to employ (the lack of which is one of the major failings of TeX). Early on, I decided that language would be JavaScript (JS). And having decided that and knowing that Java 6 had built-in support for JS, I put the issue aside for later implementation. When I revisited implementing JS, I realized that my command syntax no longer worked well with JS and that some commands would have been better implemented in JS. Had I written code and worked with embedded JS from the beginning, I could have avoided these dissonances, and I would have experienced Platypus more from the perspective of the user.

3) If you have to write a parser, write it just once. I wrote a parser for basic and compound commands, but did not anticipate intricacies of the syntax for very complex commands (think of commands to specify a table, for example). When it came time to add these commands, I found myself undoing a lot of parser work and then trying to back-fit existing syntax in to the new grammar. Parsers are a nightmare to get right, so make sure you write them just once. This means planning all your syntax ahead of time and in great detail.

4) Learn how to present your project crisply. Everyone understands writing a debugger for a hot new language is a cool project that will attract contributors. But projects where there is no immediately identifiable built-in community require crisply articulated messages. I did not do this well.

PROGRAMMING LESSONS:

a) Design Java classes around dependency injection. This will make the classes work better together and help you test them well. I am not saying use a DI framework, which is overkill in many situations, just use the DI pattern.

b) When it comes to modularity for input and output processing, plug-ins are an excellent design model. They form a very convenient way to break projects into sub-projects, and they make it easier for contributors to work on a discrete part of the project.

c) Unit testing delivers extra value when you're writing difficult code. The ability to be deep in the parser and test for some specific semantic occurrence right away is pure gold. Unit testing can also show up design errors. At one point, I was implementing a whole slew of similar commands (relating to output of special characters). I'd made each special character its own class; so each character required: copying a class, renaming it, and changing two strings inside it. Then rinse, lather, repeat. Likewise, my unit tests were just copied, tweaked, and run. This obviously is not a great use of unit tests (what are you actually testing? your ability to tweak a test template?). This was a good clue that the design needs revisiting. And in fact, it did. In the new design, one class will handle all special characters.

OBSERVATIONS ABOUT TOOLS

1) Of all the Java IDEs I've used, I like IntelliJ IDEA best. And since I review IDEs regularly for InfoWorld, I've used lots of them including the high-priced pups. IDEA provides the best user experience, bar none. However, as the number of classes in Platypus climbed towards 200, I noticed IDEA became very slow. Clicking on a file to haul it into an edit window was a tedious process, sometimes taking 30 seconds or more (even with v. 7 of the product). Because of this, I will look elsewhere at restart. I expect to use NetBeans 6 (a hugely improved IDE, by the by) at least initially. We'll see how that works out.

2) I switched from Ant to Maven while working on Platypus. Maven is a much better solution for many build steps than Ant. See my blog post on this. However, I dislike both products. I find that I still have to waste lots of time doing simple configurations. Also, I also don't like using XML for configuring builds. I generally concur with Tapestry's Howard Ship, that Ivy plus some other tool might be a better solution. I'll explore this as I go.

3) Continuous Integration (CI) is a good concept and there are truly great tools out there. But outside of providing historical data about a project, CI's value is limited on a one-developer project. This especially true when builds are already being done on a separate machine and only code that past tests is checked into the repository. (Nonetheless, the historical data is reason enough to continue using it.)

There are surely other lessons to be learned, and as they come to me, I'll post them on this blog if they seem useful.

Words of Thanks

It would be quite wrong to end this post without pausing to deeply thank Jeff Frederick, who was exceedingly generous with his time and insights while I worked on this first phase and who hastened my realization of several important aspects that I've touched on in this post. Thank you!

Tuesday, December 11, 2007

Beautiful Code vs. Readable Code

For many years--decades actually--I was a big fan of beautiful code. I read almost everything by Brian Kernighan, Jon Bentley, and P. J. Plauger. This passion for elegant code was an attempt to re-create the rush I felt when I first read:

*x++ = *y++

in the C Programming Language. I'd never seen anything so beautifully succinct. It was luminous!

But as years passed, I read many clever algorithms, many impressive optimizations, many small tricks. And I got less and less charge from each of these discoveries. The reason, quite frankly, is that they almost always fell into one of two categories: some very elegant expressiveness in a new language (Ruby converts from Java can attest to this) or a technique that I'm not likely to ever use. In other words, I was chasing baubles.

In time, my esthetic sense turned to code clarity for its jollies. Today, if I can pick up a blob of complex code, read it in one pass, and accurately understand what it's doing; then I feel the rush again. I most often have this feeling when reading the code of great, non-academic developers. To be honest, when reveling in such moments, I frequently have the perception that my code is not like theirs. Even my best code doesn't quite snap together like theirs does. And I have wondered what I could do to improve my code clarity.

Kent Beck's new book, Implementation Patterns is a short handbook on code clarity. I have read much of it and already I recognize some bad habits that undermine my code's readability. Beck basically looks at typical coding issues and dispenses sage advice.

This means that some recommendations are best suited to beginners, while there are just enough of the other more subtle suggestions to keep the attention of a veteran who cares about clarity. For example, one poor habit I have developed without thinking about it too much is mixing levels of abstraction in the same method. So, for example (using Beck's example):

void process() {
input();
count++;
output();
}

Here the second statement is clearly at a different level of abstraction than the other two, which makes the code harder to read quickly. Beck proposes the following, which I agree is clearer.

void process() {
input();
tally();
output();
}

There are many other habits of mine that this book has illuminated. And in the half a dozen changes it will bring to my style, I think I have derived more benefit than in all the essays I've read on beautiful code.

Before leaving off, however, I should point out two caveats. The beginner-to-intermediate material dominates; so you'll need to skim over large parts of the text to extract the valuable nuggets. (However, this aspect makes it a great gift for junior programmers at your site.) A second point is that the book lacks for good editing. A book on code clarity should be pellucid; this one is not. (Consider the use of the word 'patterns,' which is highly misleading. It's not at all about patterns.) But these are forgivable issues. The book is a useful read if you share my appreciation of clear code.

Friday, May 18, 2007

Groovy Gaining Traction

Java developers suddenly have a wealth of choices when it comes to dynamic languages that run on the JVM. There's JavaFX, which Sun announced at JavaOne this year, and JRuby, which Sun expects to complete sometime this year, and then, of course, there's my favorite: Groovy. Groovy makes writing Java programs far easier. It essentially takes Java and removes the syntactical cruft, leaving a neat language that makes you terrifically productive.

Because Groovy took a long time getting out of the gate, it's taken some licks in the press. However, it's clear that Java developers are catching on to its benefits. The JavaOne bookstore published its daily top-10 sales during the show. The picture on this post, shows the Day 2 list with two Groovy titles in the top 10 (at places 5 and 8). Overall, the Groovy bible, Groovy in Action, came in at number 5 for the show. Interest is definitely growing.

If you haven't tried Groovy yourself, it's definitely worth a look. Here are a couple of good overviews:

The Groovy Getting Started Guide
Andrew Glover's Overview and Tutorial on DeveloperWork

Wednesday, May 16, 2007

Unit Testing Private Variables and Functions

How do you write unit tests to exercise private functions and check on private variables? For my projects, I have relied on a technique of adding special testing-only methods to my classes. These methods all have names that begin with FTO_ (for testing only). My regular code may not call these functions. Eventually, I'll write a rule that code-checkers can enforce to make sure that these violations of data hiding don't accidentally appear in non-test code.

However, for a long time I've wanted to know if there is a better way to do this. So, I did what most good programmers do--I asked someone who knows testing better than I do. Which meant talking to the ever-kind Jeff Frederick, who is the main committer of the popular CI server Cruise Control (and the head of product development at Agitar).

Jeff contended that the problem is really one of code design. If all methods are short and specific, then it should be possible to test a private variable by probing the method that uses it. Or said another way: if you can't get at the variable to test it, chances are it's buried in too much code. (Extract Method, I have long believed, is the most important refactoring.)

Likewise private methods. Make 'em small, have them do only one thing, and call them from accessible methods.

I've spent a week noodling around with this sound advice. It appeals to me because almost invariably when I refactor code to make it more testable, I find that I've improved it. So far, Jeff is mostly right. I can eliminate most situations by cleaning up code. However, there are a few routines that look intractable. While I work at find a better way to refactor them (a constant quest of mine, actually), I am curious to know how you solve this problem.

Monday, April 16, 2007

Update to my review of Java IDEs

My review of three leading enterprise Java IDEs appeared in later March in InfoWorld. I've received some nice comments on the piece and a few corrections. Most of the corrections come from Sun in my coverage of NetBeans 5.5. Here are the principal items:

I complained that NetBeans does not use anti-aliased fonts. I overlooked a switch that can turn on these fonts in Java 5. On Java 6, they're on by default, if your system is set with font smoothing on. (It's hard to figure why NetBeans does not default to these fonts on Java 5, as do Eclipse, IntelliJ, and most of the other IDEs.)
I recommended that potential users look at NetBeans 6.0 beta, because it adds many of the features that I complain are missing. Sun gently points out that the current version of 6.0 is not quite in beta yet, but should be in beta within the next few months. For the latest version and roadmap, go to netbeans.org
After extensive discussions, Sun convinced me that they have greater support for Java 6 than I originally gave them credit for. I originally wrote they provided 'minimal' coverage. In retrospect, I would say 'good' support for Java 6.
Finally, Sun was kind enough to point out that I give them too much credit in saying that they have deployment support for JavaDB. In fact, they provide only start/stop control from within NetBeans.

If there are further corrections, I'll update this post.