Sunday, August 20, 2006

Writing your own language -- How to choose a VM

Let's say you want to write your own language, possibly even your own domain-specific language (DSL), and you want to run it on a VM. Which VM do you target?

You might instinctively think of the JVM (especially because there are so many tools to help you target the JVM) or the .NET CLR. These choices have the benefit of being highly optimized platforms on which numerous, substantial libraries already run. However, in many cases they will not be the best choice, for several reasons: they're big and difficult for non-technical users to install (and you can't bundle them with your app); they can't be embedded into other applications; and they might not support features you need.

There are many interesting alternatives that are small, reasonably fast, and have active communities supporting them.

There are, of course, the Perl and Python VMs. As you'll quickly see upon careful examination, however, both of those VMs are intimately tied to the languages they run. (No surprise there.) In addition, neither VM was really designed to be targeted by other languages, so information on developing languages for them is not widely or conveniently available.

Other VMs do provide extensive support for new language developers, because they know this is the only way for them to build community. Here are three among the many you could choose from:

Lua, which is consistently the fastest-performing VM outside of the JVM. It's also small, open source, widely used, easy to embed, very well documented, and supported by an active community.

Neko VM, which is small, easy to embed, and very actively supported by its developer. It is particularly amenable to embedding in C applications.

Pawn, designed primarily for embedding. Its own default language is a lot like C.

These VMs are all open source. Other VM candidates are listed here, although this list is far from complete.

Most of these VMs encourage your compiler to output not bytecode but source code in their native language. In most cases, this is the right approach because 1) the VM's own compiler can optimize your code better than you could by writing bytecode directly, and 2) it saves you a whole lot of aggravation by not having to learn the bytecode and the minutiae of VM internals. (Some knowledge of VM internals will, of course, be necessary.)
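To make the source-output approach concrete, here is a minimal sketch in Python of a compiler back end that emits Lua source text instead of bytecode. The tiny AST (Num, Var, BinOp, Let, Print) and the function names are invented for this illustration and do not come from any particular tool; the only thing assumed about Lua is its ordinary syntax for local variables, arithmetic, and print.

```python
from dataclasses import dataclass


# Hypothetical AST for a toy language (names are illustrative only).
@dataclass
class Num:
    value: int


@dataclass
class Var:
    name: str


@dataclass
class BinOp:
    op: str          # one of "+", "-", "*", "/"
    left: object
    right: object


@dataclass
class Let:
    name: str
    value: object


@dataclass
class Print:
    value: object


def emit_expr(node) -> str:
    """Translate an expression node into a Lua expression string."""
    if isinstance(node, Num):
        return str(node.value)
    if isinstance(node, Var):
        return node.name
    if isinstance(node, BinOp):
        if node.op not in {"+", "-", "*", "/"}:
            raise ValueError(f"unsupported operator: {node.op}")
        # Always parenthesize so we never depend on Lua's precedence rules.
        return f"({emit_expr(node.left)} {node.op} {emit_expr(node.right)})"
    raise TypeError(f"not an expression: {node!r}")


def emit_program(statements) -> str:
    """Translate a list of statements into Lua source, one line each."""
    lines = []
    for stmt in statements:
        if isinstance(stmt, Let):
            lines.append(f"local {stmt.name} = {emit_expr(stmt.value)}")
        elif isinstance(stmt, Print):
            lines.append(f"print({emit_expr(stmt.value)})")
        else:
            raise TypeError(f"not a statement: {stmt!r}")
    return "\n".join(lines)


program = [
    Let("x", BinOp("+", Num(1), Num(2))),
    Print(BinOp("*", Var("x"), Num(4))),
]
lua_source = emit_program(program)
```

The resulting string can be handed to the Lua interpreter as-is, which leaves bytecode generation and optimization to Lua's own compiler.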

That being said, how do you decide which VM fits your needs best? Some principal considerations include (more in the link below):

  • Does the VM have native support for the data items you need?
  • Does the VM support the language features you need (garbage collection, multithreading, tail recursion, etc.)?
  • Does the VM support the performance features you need (optimization, JIT compiler, etc.)?

Once you have found a virtual machine that suits your needs, you have only to check out the community behind it, to make sure you're not nearly or completely on your own.
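One simple way to work through the checklist above is to write down your requirements as a set and compare it against each candidate. The sketch below is only an illustration: the candidate names (vm_a, vm_b, vm_c) and their feature sets are made-up placeholders, not claims about any real VM, and you would fill them in from each project's documentation.

```python
# Hypothetical feature matrix: placeholder names and features, not real data.
CANDIDATES = {
    "vm_a": {"garbage_collection", "tail_calls", "embeddable"},
    "vm_b": {"garbage_collection", "jit", "threads"},
    "vm_c": {"garbage_collection", "tail_calls", "jit", "embeddable"},
}


def shortlist(required, candidates):
    """Return the names of candidates that offer every required feature."""
    return sorted(
        name for name, features in candidates.items()
        if required <= features  # subset test: nothing required is missing
    )


def missing_features(required, candidates):
    """For each candidate, list the required features it lacks."""
    return {
        name: sorted(required - features)
        for name, features in candidates.items()
    }


needs = {"garbage_collection", "tail_calls", "embeddable"}
matches = shortlist(needs, CANDIDATES)
gaps = missing_features(needs, CANDIDATES)
```

The gaps mapping is often the more useful output, since a VM that lacks one feature you can work around may still beat one that meets every requirement but has no community behind it.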

Note: Lambda the Ultimate is a terrific site for aficionados of programming language development. Here is a link to a post on this topic that might shed further light. If this area interests you, definitely tag/bookmark the Lambda site.

Final note: JetBrains, the makers of the popular IntelliJ IDE, have a mechanism that can be modified to provide an IDE extension for your language.


Anonymous said...

You can bundle the JRE with your app, depending on your interpretation of the licence. That should be easier to understand when it's all GPL, hopefully later on this year.

Anonymous said...

Try adding the Tcl VM to your list. It is used at least by the L language and the Tcl language itself.

Anonymous said...

What about LLVM?

Anonymous said...

I wonder, though, if all VMs don't overly recapitulate the semantics of the language for which they were originally designed. For instance, the JVM and the CLR embody concepts of "stack" and "heap" that make continuations harder to develop. The CLR (2.0+) has baked-in the concept of generic types, but that buys in to a whole system of types that not every language designer might agree with.