Tuesday, February 19, 2008

Restarting the Platypus And the Lessons Learned

As many of you know, I have spent much of my free time during the last 24 months working on an open-source project called Platypus. The project's goal is to implement a command language like TeX, which enables users to embed formatting commands directly into text and generate documents of typeset quality in PDF, Microsoft Word, and HTML. The aims of Platypus are to be much easier to use than TeX and to provide many features of interest to developers, especially for printing code and listings.

After approximately 20,000 lines of Java code and comments, I have concluded that I need to restart and re-architect the project. The more I code, the more I see that I am adding top floors to a leaning tower. Eventually I'll topple it. So by restarting Platypus, I hope to straighten out architectural shortcomings and deliver a better, more expandable product more quickly.

In the process of coming to this decision, I have been able to crystallize several key lessons, a few of which I could probably have foreseen.


1) It's extremely difficult to figure out where your architecture is deficient if you have never done the kind of project you're currently undertaking. The best you can do is lay out some basic architecture, abide by good dev practices, and learn as you go. Alas, in this case, it took 20K lines of work to recognize that the architecture was irretrievably flawed, and how.

2) First, do the known hard parts that can't be avoided. In the case of Platypus, I knew from the get-go I wanted a full programming language for the user to employ (the lack of which is one of the major failings of TeX). Early on, I decided that language would be JavaScript (JS). And having decided that and knowing that Java 6 had built-in support for JS, I put the issue aside for later implementation. When I revisited implementing JS, I realized that my command syntax no longer worked well with JS and that some commands would have been better implemented in JS. Had I written code and worked with embedded JS from the beginning, I could have avoided these dissonances, and I would have experienced Platypus more from the perspective of the user.

3) If you have to write a parser, write it just once. I wrote a parser for basic and compound commands, but did not anticipate the intricacies of the syntax for very complex commands (think of commands to specify a table, for example). When it came time to add these commands, I found myself undoing a lot of parser work and then trying to back-fit existing syntax into the new grammar. Parsers are a nightmare to get right, so make sure you write them just once. This means planning all your syntax ahead of time and in great detail.

4) Learn how to present your project crisply. Everyone understands writing a debugger for a hot new language is a cool project that will attract contributors. But projects where there is no immediately identifiable built-in community require crisply articulated messages. I did not do this well.


a) Design Java classes around dependency injection. This will make the classes work better together and help you test them well. I am not saying use a DI framework, which is overkill in many situations; just use the DI pattern.

b) When it comes to modularity for input and output processing, plug-ins are an excellent design model. They form a very convenient way to break projects into sub-projects, and they make it easier for contributors to work on a discrete part of the project.

c) Unit testing delivers extra value when you're writing difficult code. The ability to be deep in the parser and test for some specific semantic occurrence right away is pure gold. Unit testing can also show up design errors. At one point, I was implementing a whole slew of similar commands (relating to output of special characters). I'd made each special character its own class; so each character required: copying a class, renaming it, and changing two strings inside it. Then rinse, lather, repeat. Likewise, my unit tests were just copied, tweaked, and run. This obviously is not a great use of unit tests (what are you actually testing? your ability to tweak a test template?). This was a good clue that the design needed revisiting. And in fact, it did. In the new design, one class will handle all special characters.
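
To make points a) through c) concrete, here is a minimal Java sketch. It is not Platypus code: every name in it (OutputPlugin, HtmlPlugin, SpecialCharacters, DocumentProcessor) and the bracketed command syntax are assumptions invented for illustration. It shows collaborators handed in through a constructor (the DI pattern without a framework), an output plug-in behind a small interface, and a single table-driven class standing in for one class per special character.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PlatypusSketch {

    /** Hypothetical plug-in contract: one implementation per output format. */
    interface OutputPlugin {
        String formatName();
        String render(String text);
    }

    /** Hypothetical HTML plug-in; a PDF or Word plug-in would implement the same interface. */
    static class HtmlPlugin implements OutputPlugin {
        public String formatName() { return "html"; }
        public String render(String text) { return "<p>" + text + "</p>"; }
    }

    /** One table-driven class instead of one class per special character. */
    static class SpecialCharacters {
        private final Map<String, String> table = new LinkedHashMap<>();

        /** Adding a character is now a data entry, not a copied class. */
        SpecialCharacters register(String command, String character) {
            table.put(command, character);
            return this;
        }

        /** Replaces every registered command in the text with its character. */
        String expand(String text) {
            String result = text;
            for (Map.Entry<String, String> entry : table.entrySet()) {
                result = result.replace(entry.getKey(), entry.getValue());
            }
            return result;
        }
    }

    /** Collaborators arrive through the constructor: plain DI, no framework. */
    static class DocumentProcessor {
        private final SpecialCharacters characters;
        private final OutputPlugin output;

        DocumentProcessor(SpecialCharacters characters, OutputPlugin output) {
            this.characters = characters;
            this.output = output;
        }

        String process(String source) {
            return output.render(characters.expand(source));
        }
    }

    public static void main(String[] args) {
        // The command names below are made up for this sketch.
        SpecialCharacters chars = new SpecialCharacters()
                .register("[copyright]", "\u00A9")
                .register("[euro]", "\u20AC");
        DocumentProcessor processor = new DocumentProcessor(chars, new HtmlPlugin());
        System.out.println(processor.process("[copyright] 2008, price 5 [euro]"));
    }
}
```

Because DocumentProcessor receives its collaborators rather than creating them, a unit test can pass in a stub plug-in or a one-entry character table and check a single behavior. With the table-driven class, a test exercises the lookup logic once instead of re-running a copied template for each character.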


1) Of all the Java IDEs I've used, I like IntelliJ IDEA best. And since I review IDEs regularly for InfoWorld, I've used lots of them including the high-priced pups. IDEA provides the best user experience, bar none. However, as the number of classes in Platypus climbed towards 200, I noticed IDEA became very slow. Clicking on a file to haul it into an edit window was a tedious process, sometimes taking 30 seconds or more (even with v. 7 of the product). Because of this, I will look elsewhere at restart. I expect to use NetBeans 6 (a hugely improved IDE, by the by) at least initially. We'll see how that works out.

2) I switched from Ant to Maven while working on Platypus. Maven is a much better solution for many build steps than Ant. See my blog post on this. However, I dislike both products. I find that I still have to waste lots of time doing simple configurations. I also don't like using XML for configuring builds. I generally concur with Tapestry's Howard Lewis Ship that Ivy plus some other tool might be a better solution. I'll explore this as I go.

3) Continuous Integration (CI) is a good concept and there are truly great tools out there. But outside of providing historical data about a project, CI's value is limited on a one-developer project. This is especially true when builds are already being done on a separate machine and only code that passes tests is checked into the repository. (Nonetheless, the historical data is reason enough to continue using it.)

There are surely other lessons to be learned, and as they come to me, I'll post them on this blog if they seem useful.

Words of Thanks

It would be quite wrong to end this post without pausing to deeply thank Jeff Frederick, who was exceedingly generous with his time and insights while I worked on this first phase and who hastened my realization of several important aspects that I've touched on in this post. Thank you!


Anonymous said...

Since you are using JavaScript as the internal command language, have you considered "going Rhino" for the whole shebang?

Andrew Binstock said...

No, I haven't. Java has better tools than JS; I know Java better; and by staying with Java, I can reuse some of the code that's already been written. That being said, if I can find a way to unit-test JS-based routines and they don't cause too much havoc in terms of performance and memory consumption, I am likely to use JS for implementing some commands.

Anonymous said...

While it is an admirable project to do a follow-on to TeX, I have to wonder who will use it?
While there is a user base for TeX it seems that they are mostly, shall we say, "legacy" users.
One would think that the next generation of academicians would be more inclined to use a graphical interface.
Also, there is the matter of time frame. It took Knuth the better part of 10 years just to get the basic program done. It does not seem much more efficient than he.
What will be useful 10 years from now?
Last, I would never consider using JS. While it is a widely known language, and it is easy to put in, I don't see that it is particularly applicable.
These are just my thoughts, and I would be interested in your opinion.

Andrew Binstock said...

@anonymous Thanks for a very thoughtful post. I appreciate your comments.

Agreed that if Platypus simply replaces TeX, it has little value. The legacy TeX crowd is not about to switch.

However, Platypus aims to go beyond the TeX document use case and address the following:

1) lack of tools for developers. Especially the lack of tools for documentation, which I believe is one of the key reasons that so many projects have such poor or non-existent docs. The ability to handle code intelligently, provide special support for listings, and to create docs in PDF, HTML, and other formats should also help in using Platypus for documentation.

2) templating: Data fields can be read from a file and loaded into the document, enabling the creation of reports as well as live documents.

The graphical front-end will need to be addressed at some point. First with a Platypus-friendly editor and possibly eventually a GUI tool. Possibly LyX modified to output Platypus. That's a bit down the road at the moment, but I agree it's important.

The time frame for Platypus should be more favorable to me than it was to Knuth with TeX. He was starting from scratch. I can (and do) ride on his shoulders; plus he had to invent a whole font system, which is not an issue for me. Moreover, there are excellent back-end libraries for PDF document creation, as well as for other formats. Still, to succeed, other developers will surely need to participate in Platypus.

I am coming around to your opinion on JS, and am now looking more closely at Groovy as the language for scripting. Fortunately, there are choices ;-)

Anonymous said...

Ok, now I have to ask, why have you not done any visible design work?

I can't find even a high level design. While I am not a proponent of death by analysis, you do need some.

But how can any design be done when there is no functional specification? I am not a typesetting domain expert, I am a software engineer.

I hardly think typesetting is a moving target as far as a specification. Right now you need to stop and create a detailed specification that defines exactly what the program will DO specifically. Sorry, "but typesetting like TeX but better" won't work as a functional specification.
If I were to decide I wanted to help, I would have no idea how I could help you.
Then after you have a specification, which should not in any way address HOW it will be done, you should then create a design that at least names the components that are responsible for implementing the features in the specification.
After the high-level design is done, you need to define the interfaces between the components.
With no mention of how those components will work.

I know you know all this, and yet you haven't done it. Until you do you will just be drifting. You are way too caught up in tools and implementations, and I think you are not addressing the big picture.

If you do as I suggest, you will find it much easier to get help as well. If you have this design with clear interfaces, others can go off and build pieces of it. You must have a team if you ever expect to achieve this goal.

You know it's like that letter to the editor about patterns - he expected a pattern was a substitute for a specification. Even if his co-worker had known the facade pattern he still would not have gotten the API he wanted. (another big problem with patterns).

A pattern is not a specification, it is a means of implementing a specification.
The design follows the specification, the implementation follows the design.

Andrew Binstock said...

@anonymous. Thanks for your follow-up post. Your comments had the effect of making me get around to updating the requirements doc from my various notes. I've now posted the revised reqs on the Platypus site at: http://platypus.pz.org/PlatypusFunctionalRequirements-v2.01.pdf

You're right that I should also post the interfaces, especially for the plug-ins. I am still working on these and so won't have them posted for a little bit yet.

Thanks for the push to get this info published on the site.

Anonymous said...

You need not post this, but just as some additional comments - if you think about it the internet is just one giant specification.
Implementations come and go, but the spec is stable. There are 1000s of web server implementations, but there is only one specification.
RFC's never deal with implementations, only with what gets done, not how.

Moral: Spend more time up front creating good specification for your project, and it will have a long life. Implementations may come and go, but a good design lives on.

Anonymous said...

I've been following the project for a while, it's unfortunate that the project has to be restarted from the ground up but I'm still excited about using Platypus in the future. Anyway best of luck on the project, I'm looking forward to testing the new releases!

Ittay Dror said...

I think the gradle project got things right:

1. you can easily use any ant task
2. there's a model for defining dependencies, that uses ivy
3. plugins can provide model objects a-la maven's pom. they can also provide standard targets (so to build a standard jar, your build file is just 'usePlugin("java")'.
4. you can rewrite any target
5. each module has an object representing it. then, in the root build file (or any module's build file) you can manipulate these objects, rather than needing to create a build file in each (even if it is just 5 lines). but, if you do have a build file, it inherits methods & variables from its parent (the one that defined to use it).
6. build scripts are groovy, so you have methods, control structures etc.
7. cross module target dependencies. so you can generate sources in module M1, then compile module M2, then compile module M1.

Andrew Binstock said...

@ittay Interesting you should mention Gradle. I am in the process of evaluating it. Initial results look promising. Thanks for the post.

Anonymous said...

i use itext's xml-scripting to produce a huge product-catalog which is a composition of text (descriptions), images and prices.

im not lucky with that and i guess that a platypus 1.0.x would be a much better hit of my needs ...

have you considered integrating platypus with itext as a subproject/module or associated project ? i believe this would give its popularity a boost.

i also use groovy which IMO is a much better choice than JS. but be warned. groovy makes things easy only if you have reached a certain level of knowledge. beyond that level is loss.

have a nice time !

Anonymous said...

you might also look at lout.wiki.sourceforge.net which is another typesetting system that addresses weaknesses in TeX.

Andrew Binstock said...

@Anonymous re Lout: When I started Platypus, Lout had been untouched for years by its developer, Jeffrey Kingston, and its output driver for PDF was broken. At the time, it was not a realistic option.

Lout has improved in the intervening period, as Kingston is more active once again. Lout is an improvement over troff and TeX, but it has constraints, which Platypus looks to transcend:

1) Lout emits only PostScript and text.
2) Lout has no scripting language.
3) Lout is advanced very slowly, with long gaps before bugs are fixed.
4) Lout has poor support for Windows.

#1 and #4 are deal-killers for me. And #2 is definitely problematic.

Still, as I said, it is an improvement over TeX and troff.