Tuesday, December 11, 2007

Beautiful Code vs. Readable Code



For many years--decades actually--I was a big fan of beautiful code. I read almost everything by Brian Kernighan, Jon Bentley, and P. J. Plauger. This passion for elegant code was an attempt to re-create the rush I felt when I first read:

*x++ = *y++

in The C Programming Language. I'd never seen anything so beautifully succinct. It was luminous!

But as years passed, I read many clever algorithms, many impressive optimizations, many small tricks. And I got less and less charge from each of these discoveries. The reason, quite frankly, is that they almost always fell into one of two categories: some very elegant expressiveness in a new language (Ruby converts from Java can attest to this) or a technique that I'm not likely to ever use. In other words, I was chasing baubles.

In time, my esthetic sense turned to code clarity for its jollies. Today, if I can pick up a blob of complex code, read it in one pass, and accurately understand what it's doing, then I feel the rush again. I most often have this feeling when reading the code of great, non-academic developers. To be honest, when reveling in such moments, I frequently have the perception that my code is not like theirs. Even my best code doesn't quite snap together like theirs does. And I have wondered what I could do to improve my code clarity.

Kent Beck's new book, Implementation Patterns, is a short handbook on code clarity. I have read much of it, and already I recognize some bad habits that undermine my code's readability. Beck basically looks at typical coding issues and dispenses sage advice.

This means that some recommendations are best suited to beginners, while just enough of the subtler suggestions remain to keep the attention of a veteran who cares about clarity. For example, one poor habit I have developed without thinking about it too much is mixing levels of abstraction in the same method. So, to use Beck's example:

void process() {
    input();
    count++;
    output();
}

Here the second statement is clearly at a different level of abstraction than the other two, which makes the code harder to read quickly. Beck proposes the following, which I agree is clearer.

void process() {
    input();
    tally();
    output();
}
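The fix costs only a one-line helper. A minimal sketch of what tally() would wrap, inside the same class (count is the field from Beck's example):

private int count;

private void tally() {
    count++;
}

Now all three statements in process() sit at the same level of abstraction.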

There are many other habits of mine that this book has illuminated. And in the half-dozen changes it will bring to my style, I think I have derived more benefit than from all the essays I've read on beautiful code.

Before leaving off, however, I should point out two caveats. The beginner-to-intermediate material dominates, so you'll need to skim over large parts of the text to extract the valuable nuggets. (That aspect, however, makes it a great gift for the junior programmers at your site.) The second point is that the book lacks good editing. A book on code clarity should be pellucid; this one is not. (Consider the use of the word 'patterns,' which is highly misleading: the book is not at all about patterns.) But these are forgivable issues. The book is a useful read if you share my appreciation of clear code.

Sunday, December 09, 2007

Commercial CI server now available free

JetBrains, the folks behind many well-loved dev tools (such as IntelliJ IDEA and ReSharper), have announced that TeamCity 3.0, the just-released version of their continuous-integration server, is now available free for teams of up to 20 developers. TeamCity's stock in trade is ease of use (and particularly good integration with IntelliJ IDEA).

Sunday, November 25, 2007

The Fallacy of 100% Code Coverage

While I love RSS for aggregating feeds from various blogs, nothing beats having an expert comb through articles and posts, culling the best ones. Few people, if any, do that culling better for software development than Andy Glover of Stelligent. On his blog, he posts weekly collections of interesting links on tools and agile development topics. It's one of the first RSS items I read. (Fair warning: Glover makes his selections livelier by describing them with terms from the disco era.)

A meme that has appeared recently in his links is a curious dialog about the value of 100% code coverage in unit tests. It was not until I read those posts that I realized there were people still unclear on this issue. So, let's start at the foundational level: 100% code coverage is a fallacious goal. Unit testing is designed to provide two principal benefits: 1) to validate the operation of code; 2) to create sensors that can detect when code operation has changed, thereby identifying unanticipated effects of code changes. There is no point in writing tests that do not fulfill one of these two goals. Consequently, a getter or setter should not be the target of a unit test:

public void setHeight(float newHeight) {
    height = newHeight;
}

This code cannot go wrong (unless you believe that your language's assignment operator doesn't work consistently ;-). Likewise, there is no benefit in a test as a sensor here. The operation of this code cannot change. Hence, any time spent writing a unit test for this routine is wasted.

Developers who can't resist the allure of 100% code coverage are, it seems, seduced by one of two drivers:

1) Their coverage tool gives classes with 100% code coverage a green bar (or other graphical satisfaction). Describing this phenomenon, Cedric Beust, the author of the TestNG testing framework, writes in his just-released (and highly recommended) book, Next Generation Java Testing, that this design of coverage tools is "evil" (p. 149). And later, he correctly piles on with: "Even if we were stupid enough to waste the time and effort to reach 100% code coverage..." The problem with coding for graphical trinkets is explained in the next driver.

2) If developers attain 100% code coverage--even at the cost of writing meaningless tests--they can be certain they haven't forgotten to test some crucial code. This viewpoint is the real illusion. By equating proper testing with 100% code coverage, the developer has confused two issues. It's what you test and how (that is, quality) that determines code quality, not numerical coverage targets (quantity). Capable writers of unit tests know that some routines need validation through dozens of unit tests. Because these tests repeatedly work the same code, they don't result in greater test coverage, but they do result in greater code quality. By pushing for an artificial goal of 100%, a developer is incentivized against writing multiple tests for complex code in order to have the time to write tests for getters and setters. That surely can't be right.
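To make that concrete, here is a sketch of my own (DateValidator and the sample values are hypothetical): five JUnit tests that all walk the same small routine. Coverage stops rising after the first test; the quality of the validation keeps rising through the fifth.

import org.junit.Test;
import static org.junit.Assert.*;

public class DateValidatorTest {
    // All five tests exercise the same method, so coverage barely moves
    // after the first one -- but the quality of the testing certainly does.
    @Test public void acceptsOrdinaryDate() { assertTrue(DateValidator.isValid("2007-11-25")); }
    @Test public void acceptsLeapDay()      { assertTrue(DateValidator.isValid("2008-02-29")); }
    @Test public void rejectsNonLeapDay()   { assertFalse(DateValidator.isValid("2007-02-29")); }
    @Test public void rejectsMonthZero()    { assertFalse(DateValidator.isValid("2007-00-15")); }
    @Test public void rejectsDayOverrun()   { assertFalse(DateValidator.isValid("2007-04-31")); }
}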

Wednesday, November 07, 2007

Great Book for Programming Fonts




As I've learned from working on Platypus, programming font operations is one of the most complex and convoluted areas of software development you're likely to run into...ever. It's like driving to the river of madness and drinking deeply, then walking around the desert naked for 40 days, all the while reassuring yourself that you must be making progress because you're still coding. It's awful.

The reasons are complex and numerous. First among these is that file formats are capricious things. Microsoft and Adobe have both published numerous font formats--some in response to market needs, others for competitive reasons, still others because of internal pressures. The second problem is that these formats are designed for use by font experts, not by developers. They often include cryptic parameters, tables within tables, and absolutely nothing that is clear or obvious save the copyright notice.

Third is the matter of encoding. There are numerous encodings of font characters. These too seem driven by reasons largely outside of need and formulated with no particular eye to future requirements. Try to figure out encodings for CJK fonts (Chinese, Japanese, and Korean character sets), and you'll feel like walking around with your hair on fire. Even in simple encodings, difficulties arise. For example, Apple and Windows use different encodings in the basic character sets, which is why apostrophes in Mac-generated documents show up on some PCs as the euro symbol. Unicode? Foggettabout it. No font today implements anywhere close to all the characters. And those that come even halfway (none of which are free, as far as I'm aware) are huge, multimegabyte propositions. In sum, fonts are a topic shot through with problems and treacherous details.

Until now, there has been no central reference that developers could turn to for help. Each new font format (PostScript, TrueType, OpenType, etc.) required starting anew and learning the peculiarities from scratch. But a new 1000-page book by Yannis Haralambous, entitled Fonts & Encodings (from O'Reilly), has just appeared, and it's the first real tie-line to sanity in the jungle of glyphs. It explains fonts, formats, and encodings in tremendous detail, along with elaborate discussions of tools. It is the defining book for technical users of fonts.

Before I discuss two limitations, I want to reiterate that this is a great book and nothing I say should override that view. However, it's not a developer-oriented book. Except for some SVG XML and some TeX, there is little source code. So, information on how to access font data and use it to lay out documents programmatically, or just to print text, is still left as a challenge to the reader (though the book gets you most of the way there). The book also discusses MetaFont in too much detail, in my view, because this format, now little used, is already extensively described by its inventor, Donald Knuth. I'd have preferred more coverage of, say, bitmap fonts than a re-presentation of this info. But these two items aside, this is the book to get if you ever have to do anything with fonts. It'll give you hope; real hope.

Thursday, November 01, 2007

Most Popular CI Servers (An informal survey)

At CITCON Brussels last month, attendees could write up questions of any kind on a bulletin board, and others could come by and post answers as they were moved to. One poll was: Which CI server do you use? The answers were, in order of popularity:

CruiseControl
Hudson
Anthill Pro, TeamCity (tied)

I don't have the exact vote counts, because the results were taken down before I could get the final tallies. But suffice it to say that CruiseControl received the lion's share, Hudson a handful, and Anthill Pro and TeamCity garnered one vote each.

This survey, of course, is not scientific. Despite the fact that it was a CI conference, the voters were self-selecting, and ThoughtWorks, which is the company behind CruiseControl, was well represented at the conference. (It is in fact a sponsor of CITCON.) So, high CruiseControl figures would be expected. (Even without this factor, though, I expect it would have placed first due to its quality implementation, wide industry adoption, and its considerable lead in plug-ins to various tools and packages.)

The Hudson numbers, however, are interesting and probably meaningful. Hudson is a comparative newcomer to CI. But it has been winning converts quickly due to its ease of use. If you have a small project or just want to test the waters of CI, Hudson might well be the server to use.

Anthill Pro is a high-end CI server that can be found in two versions: an OSS version and a paid version. It was not until this conference, though, that I discovered these are completely different products. They were built from different codebases and the OSS version was a one-time release that is not updated.

I was surprised that LuntBuild made no appearance, as not so long ago, its users were raving about its tremendous ease of use. Perhaps Hudson is stealing its thunder, or perhaps its users just weren't at CITCON. It's hard to say in a small poll.

Monday, October 29, 2007

CITCON Brussels 2007

I recently returned from CITCON, the continuous-integration conference. It's held three times a year, once each in the US, Europe, and Asia. Even though CITCON lasts only one evening plus one full day, it was surely one of the most informative developer events I have ever attended. And I have been to a lot of developer conferences/shows/seminars.

What made CITCON so productive were several unusual aspects:

1) Registration is limited to 100 attendees, so the event has a human-sized feel. Unlike shows where several hundred attendees sit in a single session, at CITCON, by the third session I knew several of the attendees in the room from previous sessions. By the end, it seemed I'd been in sessions with about half the attendees.

2) CITCON uses the open-conference format, in which there is no pre-existing list of presentations. Rather, everyone gets together the first night and proposes topics they'd like to know more about or ones they'd like to present. Then attendees mark the sessions they'd like to attend. The sessions with the most votes are slotted for the next day, and you are free to drift in and out of them as you wish. In future posts, I'll discuss this format in greater depth. It creates an interesting blend: some sessions are presentations, others are discussions. Of these, the discussions were consistently the most interesting. For example, one session I attended (as one of six attendees) discussed the problem of delaying commits until after code reviews: how do you handle the opposing pressures of quality enforcement vs. timely commits? Various attendees explained what they had tried at their sites and what had not worked. Finally, Patrick Smith from Agitar described a method he had used in consulting, and the session moved to analysis of the benefits of that approach. This kind of fruitful sharing is near impossible at regular tradeshows, except at BoF sessions, which often still lack the give-and-take of a shared problem.

3) The conference sessions all fit into one day. This remarkably changes your motivation. You show up early, you go to every session, you stay late, and you hang out afterwards for social hour to go over notes with other attendees, especially those who have interest/challenges in common with you.

Focused, informative, and no talking heads. Very cool. I'll be back. Next instance is in Denver in April.

Tuesday, October 09, 2007

Continuous Integration Servers

The authors of the recent book on Continuous Integration have undertaken a series of snapshots of CI servers, which will be a big help to everyone who is assessing CI.

There are many CI servers to choose from, as this table amply demonstrates. For myself, I am undertaking evaluations of three options; and after some preliminary research, I have decided on these finalists:

  • CruiseControl (the grand-daddy of them all and the defining CI server)
  • Continuum (because I work with Maven 2, this should be a good fit)
  • Hudson (admired for its ease of use)

I'll report on my findings, although the results will be colored somewhat by how well these tools work with my current project, whose CI needs are clear, but modest. Partly in preparation, I will shortly be hanging with the cool cats at CITCON, a CI conference being held October 19-20 in Brussels. Attendance is free but limited to 100 attendees. CITCON is the brainchild of Jeff Frederick and Paul Julius of CruiseControl fame, but it typically draws cognoscenti and users of other CI servers. The format is open, meaning that it's more of a series of BoF sessions than pure lecture. I'll report on the good stuff in future posts for those who can't make it to Belgium in time.

Thursday, September 27, 2007

New Uses of Virtualization: Slides Here

Earlier this week, I lectured at InfoWorld's Virtualization Summit on a topic that has interested me for a long time: uses of virtualization outside of the two principal use cases (server consolidation and developer testing of portability). Here is the slide deck from the presentation. It discusses security, training, demos, desktop consolidation, and virtual appliances, among other uses.

Monday, September 17, 2007

From Ant to Maven

I spent part of the last week migrating Platypus from Ant to Maven 2. This is a migration I've been itching to do for a while. I don't much like Ant, because I find I spend far too much time struggling with it.

Maven, by comparison, works on a "convention over configuration" model that centers on a specific build sequence and an expected file layout for your project. Understand these two, and Maven makes builds simple and very rich. For one, Maven downloads all dependencies for the utilities and reports you want to run as part of your build cycle; there is no more wrestling with Ant's dependency errors. In addition, Maven's end product (beyond the build's binaries) is a website that it re-creates on each run and loads with reports and data about your build. So, you and the team always know where things stand with the project.
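For reference, this is the expected layout under the Maven 2 convention; put files in these places, and the build needs almost no configuration:

project/
    pom.xml                <-- the project descriptor
    src/main/java/         <-- application source
    src/main/resources/    <-- files packaged with the app
    src/test/java/         <-- unit tests
    src/test/resources/    <-- files available to the tests
    target/                <-- all build output (generated)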

The one complaint I read about Maven 2 is that it's hard to find the info you need to set it up and use it. This is actually not the case if you know where to look. Unfortunately, it takes a lot of digging before you find that two excellent 300-page PDF tutorials are available at no cost, plus a great introduction. So for those who need to know, here are the links to the Maven support docs:


Those resources should solve nearly any issue you encounter. In an upcoming column in SD Times, I describe in greater detail the benefits I have found in migrating from Ant to Maven. Try Maven, you'll like it!

Sunday, September 09, 2007

So much for strong passwords...

Perhaps you think Fgpyyih804423 is a strong password. As discussed here, it took OphCrack 160 seconds to break it. That's scary!

Sunday, September 02, 2007

Ubuntu Everywhere?

I have long felt that desktop Linux would become a reality only when you could go to a Linux gathering and find no more than a third of the attendees at the command line. In other words, as long as users are frequently at the command line, the OS is not ready for a big share of the desktop. Desktop users require ease of use.

Earlier this summer, I was at O'Reilly's Ubuntu Live conference in Portland, and the Ubuntu tribe were almost all using the GUI. This inflection point confirms for me Ubuntu's claim as the desktop Linux distro. (The conference was especially enjoyable because of the lack of zealotry. It was simply a conclave of the interested, with no excess of the us-against-the-world mentality--a factor that made it a far more rewarding experience.)

Having secured its place on the desktop, Ubuntu is trying to move to the server, where competition is much more intense, and where the desktop origins could help as well as hurt. Time will tell.

However, the desktop roots did not preclude Ubuntu's use in Microwulf, the first-ever supercomputer for less than $2500 and first-ever under the $100/Gigaflop threshold.

For your friends who want to try Ubuntu, but who are not geeks, I highly recommend an approachable, not-too-techie intro: Ubuntu Linux for Non-Geeks: A Pain-Free, Project-Based, Get-Things-Done Guidebook from the ever-readable No Starch Press.

Tuesday, August 21, 2007

Good, approachable book on SOA

SOA is becoming increasingly hot and lots of developers are wondering what they need to know to implement it without getting lost in the competing standards, the infinite implementation details, and the lack of robust tools.

To the rescue comes SOA for the Business Developer by Ben Margolis, which presents the core technologies of SOA without doing the usual deep dive to the lowest levels of detail. This refreshing approach enables you to read about the five central technologies (XML, XPath, BPEL, and the upcoming SCA and SDO) without it being a massive effort. These technologies are presented clearly (the style is remarkably readable), and each is highlighted with a few key examples of working code. The purpose is not to convert you into an expert in any of these, but to give you enough familiarity that you understand how the pieces work, how they fit together, and, from there, how to go about writing a simple SOA application, should you want to.

The book is perfect for development managers who want to come up to speed on the SOA components and look at some code without getting dragged into minutia. It's also just right for real geeks who need talking knowledge of the same. The easy style, good examples, and compact size (300 pages) mean that you can go from 0-60 pretty quickly.

Recommended (with the hope that other volumes that provide such a gentle intro for existing developers will become more common).

Wednesday, August 08, 2007

Unboxing Gotcha in Java

In a recent blog post, Java wiseman Norman Richards points out that this innocent-looking line of code contains a gotcha that most IDEs (he checked Eclipse, I checked IntelliJ) don't warn you about:

int sucker = thisMethodReallyReturnsAnIntegerNotAnInt();

Due to autoboxing (well, technically, unboxing), this line throws the dreaded NullPointerException at runtime whenever the method returns null.
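A minimal illustration of the trap (lookupSomething() is a hypothetical stand-in for any method that returns Integer):

Integer result = lookupSomething();  // suppose the lookup misses and returns null
int sucker = result;                 // compiles without complaint, but unboxing
                                     // calls result.intValue() -- NullPointerException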

What Richards doesn't say is that this code is so innocent looking (and the bug possibility so little recognized), that almost no one would write a unit test for this assignment. At least, that is, until it blows up the first time.

Good catch. OK, back to work: Ctrl-F "Integer"...

Thursday, August 02, 2007

Continuous Integration Book is Out



The first book dedicated solely to continuous integration has just come out. I've been poring over it and have learned several things. The great dearth of useful documentation for CI implementations made me very hopeful that this book would give me a wealth of new insights.

Unfortunately, I was a bit disappointed. The book does a good job of explaining what CI is and why you should use it, and it's the text I'd rely on to sway a manager who needed convincing. But after these explanations, the book wanders around. I understand the problem: it's hard to talk about CI without giving examples for a specific CI server--of which there are so many. So the authors chose to talk about other topics: specifically, build and test issues. This is a good book on best practices for build and test cycles; but alas, these are treated as topics disjoint from CI.

What disappointed me more was the lack of information on choosing a CI server. Many (but not all) CI packages are given very cursory discussions in Appendix B, where they share space with discussions of Ant and Maven. The book definitely punted here, when it should have done right by its readers and really explained the differences and offered guidelines on choosing properly.

Finally, a personal grouse point. This book follows a fad in computer books of putting an epigraph at the start of every chapter. Properly chosen epigraphs should be 1) witty, 2) incisive, or 3) unexpected. Most of those in this book are prosaic. Do we gain anything from Larry Bird's "First, master the fundamentals." or from Henry Ford intoning "Quality means doing it right when no one is looking."?

Overall, I feel this could have been a great book. The thirst for this information is deep and the authors are knowledgeable. However, it didn't quite come together in this edition.

Tuesday, July 31, 2007

Needed Code Reminders

In a recent blog post, Larry O'Brien points out that we need more varied types of tags in our code than the overworked TODO. He suggests one that enables you to mark code intended for a future feature (despite YAGNI)--which I understand, but would strongly tend to avoid. I think the XP folks have it right in hewing closely to YAGNI.

But Larry's larger theme comes up for me as well. There are some useful tags I can think of. The most urgent for me is one you can put on suspected cruft; something like: CRUFT. This would signal that maintenance needs to check whether the code is in use anywhere, and if not, to delete it. Not just "in use" in the technical sense, but in the sense of doing anything that's still needed.

I've also been hankering for a more graded version of TODO. Such as TODO-NOW and then plain TODO. The latter would mean "todo sometime."
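A hypothetical sketch of how this small taxonomy might look in source (the specifics are invented for illustration):

// CRUFT: looks superseded by StreamWriter -- verify nothing still needs this, then delete
// TODO-NOW: handle zero-length input before the next release
// TODO: consider caching this lookup someday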

IntelliJ (and presumably other IDEs) enables you to create these custom tags and have them recognized/processed by the IDE, which is really helpful.

Thursday, July 12, 2007

A Limitation in Subversion. Any ideas?

Today, I was speaking with an expert at CollabNet (the Subversion folks) about a problem I have with Subversion. To my amazement, he told me Subversion can't handle my (apparently) simple problem.

When I work on Platypus, I start by checking out my code from the public Subversion repository. I put it in my local project directory and work there. Along the way, I might write code that breaks existing tests. At some point before I get all tests working, I want to check my code into an SCM, so that I can get back to where I am right now if something should go wrong. Best practices in SCM forbid committing code to a project directory if it breaks tests. So, the way to do what I need, it would seem, is to use a temporary repository to commit to until I'm finished coding. Then, when all tests pass, I can commit the code back to the original public Platypus repository.

But Subversion does not permit checking code from one directory into different repositories. The suggestion from CollabNet was to copy the code to a temporary directory from which I could commit it to a separate Subversion repository. This, of course, is a non-starter--you don't keep two live copies of the code you're working on. That makes for all sorts of errors.

Another suggestion was to create a branch in the public repository and commit the small changes to that branch, and eventually merge the branch when all tests pass. I don't like this either, because it forces me to check non-working code into a public repository--the whole thing I want to avoid.
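For the record, that suggested branch workflow would look roughly like this (the repository URL is hypothetical):

svn copy http://svn.example.org/platypus/trunk \
    http://svn.example.org/platypus/branches/wip \
    -m "work-in-progress branch for code with failing tests"
svn switch http://svn.example.org/platypus/branches/wip

The mechanics are simple enough; my objection is to where those commits end up.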

The only solution I've come up with is to use a different SCM (Perforce) for my local repository: check temporary code into it, then, when everything is copacetic, check the tested code into the public Subversion repository. This works, but it's hardly elegant.

How do you solve this problem or how do you avoid it to begin with? TIA

Monday, July 09, 2007

Wikipedia: The New Google?

The other day over lunch, x86 optimizing expert Rich Gerber articulated a change in his on-line searching that has also been showing up in my habits recently: I often search for a term in Wikipedia before I look in Google. And in many cases, my quest ends successfully in Wikipedia without recourse to Google or additional searching.

The difference is not so much in the quality of the information (although Wikipedia is demonstrably excellent), but in the quality of the links. For example, compare Wikipedia and Google results for Lua. Do Google first. At first blush, the results look pretty good. Then, search via Wikipedia. See those links, and you'll need no further convincing. (For the moment, I won't bring in the language tutorial and lots of other useful information found on the same page.)

As Gerber puts it succinctly: which would you rather have--a machine's interpolation of relevant links, or links chosen by experts?

Wednesday, June 27, 2007

Office 2007: Getting the Hang of It

In mid-May, I decided to switch to Office 2007, fearing that I would eventually start receiving docs in the new formats and then be forced to migrate. The expected flow of docs has not materialized (except from a few contacts at Microsoft); however, I have persevered with Office 2007.

I mostly use three apps: Word (lots), Outlook (some, as I don't use it for email), and Excel (somewhat less). So, I know Word 2007 fairly well. Initially, I hated it and turned off the silly new ribbon. After heavily customizing the icon bar with the features I use most, I began finding the ribbon more useful, and now I leave it on all the time. In addition, my fingers are quickly finding their way around the commands. As time passes, my appreciation of Word 2007 deepens. There are many neat features in the user experience that are difficult to describe without doing a screencast. Suffice it to say that the new interface does eventually make you more productive and your documents more elegant. (I should note that I don't use the collaboration features very much.)

I'm going to stay with Word 2007, which I didn't expect--two weeks ago I was still cursing myself for migrating. Now that I am more productive and enjoying Word 2007, I can honestly ask myself: Would I recommend the upgrade to someone with similar needs? Probably not. The new features are certainly nice to have, but they're not compelling enough to be the sole reason for migrating. Meaning that if, for some reason, I were forced to switch back to Word 2003, I would manage fine--but I would have to adjust to the absence of the convenience features.

Thursday, June 21, 2007

I am one of the little green men

In addition to my regular work, I have begun writing a monthly column for GreenerComputing.com, a site that specializes in discussing news and issues relating to environmental matters in computing.

The principal green issues in IT generally fall into three areas:


  • power consumption
  • regulatory compliance
  • hardware disposal

As anyone who follows the news these days already knows, there's lots of buzz about green, so this is a fun area to work in and a good complement to my traditional areas of professional focus.

Unlike the mainstream green topics (consumer, building design, etc.), green IT makes sense only if the bottom-line aspects also make sense. So, green IT operates within narrow constraints. And because of this, it lacks the moralizing dimension and is instead purely pragmatic. I think I'll like this aspect.

Monday, June 11, 2007

Milestone 1 of Platypus shipped today

The open-source project I've been working on for months shipped its first milestone today. The project is called Platypus (for page layout and typesetting system), and it enables you to generate PDF docs (eventually HTML and RTF as well) from text files in which you embed formatting commands. It's reminiscent of TeX, but it updates many features of that system, adds ease of use (especially!), and eventually will add numerous report-friendly and developer-oriented features (such as language-sensitive code listings).

The current milestone is a small subset intended for early adopters who are interested enough to send feedback. It can do the following:


  • basic Type 1 fonts
  • bold, italics, underline, strikethrough
  • foreign characters (enough for French, German, Spanish)
  • left, center, right alignment and justified text
  • customizable page size
  • customizable margins
  • indented/unindented paragraphs
  • page-number footer
  • debugging features

Milestone 2 should ship in late Q4 2007.

Documentation, examples, source code, intended schedule, and other resources are available at the project website.

Wednesday, June 06, 2007

Great Nerd Stim!




Writing a parser represents a kind of trophy achievement in programming, in my opinion. For years, it meant using very quirky tools such as yacc and (later) bison. In the last few years, however, a new generation of parser generators has emerged that has eased the task considerably. Among these are CUP and JavaCC.

No tool, however, has generated more excitement than ANTLR, which its users celebrate for its ease of use and its ability to generate parsers in several languages. The ANTLR site has plenty of info on how to use it, but there has long been an unfilled need for a book you can bring with you to read on a long flight (and imagine the cool languages you could write, which is the real nerd-stim).

This month, however, the Pragmatic Programmers filled that gap with a new book, The Definitive ANTLR Reference: Building Domain-Specific Languages. It's a very readable intro to writing parsers in general, and specifically to writing ANTLR-usable language specifications. Normally, such books are long, dry, pedantic ordeals that force you to write many small test versions before you understand enough theory to start the actual work you want to do. This book takes away much of that drudgery and makes the topic truly approachable. Actually, it sort of lulls you into the false belief that you could write a new scripting language fairly easily. Writing and implementing language specs is still not easy, but with this book and the ANTLR software, it's easier than it's ever been.
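To give a taste of what those language specifications look like, here is a toy sketch of my own (not from the book): an ANTLR v3 grammar that recognizes sums of integers, such as 2+40+0.

grammar Sum;

expr : INT ('+' INT)* ;                            // an integer, then any number of +integer
INT  : '0'..'9'+ ;                                 // lexer rule: one or more digits
WS   : (' '|'\t'|'\n')+ { $channel = HIDDEN; } ;   // hide whitespace from the parser

Run ANTLR on this file, and it generates the lexer and parser classes (in Java by default) for you.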

Monday, May 28, 2007

Comments--How many should you have?

While there is considerable conversation about how many unit tests to write (I have two recent posts on the topic--here and here--based on conversations with vendors), few people have much to say about how many comments there should be. Once the usual themes (more comments, keep comments up-to-date, avoid pointless comments) have been stated, the conversation ends. Everyone understands what relevant and up-to-date comments mean, but few will hazard a guess as to how many of them are necessary.

Interestingly, the comment ratio is a key factor in one of the most interesting metrics, the maintainability index (MI). It is also a ratio that is rewarded by Ohloh.net, the emerging site for tracking details of open-source projects. Ohloh gives projects a kudo for a high percentage of comments. The question is how high is high enough to earn the kudo. According to Ohloh, the average open source project runs around 35% comments. Projects in the top third overall get the kudo. I don't know the cut-off for this top third, but I do know Apache's FOP with a 46% ratio of comments definitely qualifies.

Comment count is a metric that is particularly easy to spoof. You could do what some OSS projects do and list all the license terms at the top of each file. I've always disliked scrolling through this pointless legalese. In my file headers, I simply point to the URL of the license, which is sufficient. But all the license boilerplate inflates comment counts amazingly. So does commenting out code and leaving it in the codebase (gag!) or writing Javadoc for simple getters and setters (also gag).

But where legitimate comments are concerned, 35% is probably a good working number to shoot for. I find many OSS projects at this ratio have quite readable code. I'll be working to bring my own codebase up to this level.

Friday, May 18, 2007

Groovy Gaining Traction


Java developers suddenly have a wealth of choices when it comes to dynamic languages that run on the JVM. There's JavaFX, which Sun announced at JavaOne this year; JRuby, which Sun expects to complete sometime this year; and then, of course, there's my favorite: Groovy. Groovy makes writing Java programs far easier. It essentially takes Java and removes the syntactical cruft, leaving a neat language that makes you terrifically productive.
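A small illustration of the cruft removal (my own example, with a names list assumed to be in scope):

// Java: print each name, uppercased
for (String name : names) {
    System.out.println(name.toUpperCase());
}

// Groovy: the same loop
names.each { println it.toUpperCase() }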

Because Groovy took a long time getting out of the gate, it has taken some licks in the press. However, it's clear that Java developers are catching on to its benefits. The JavaOne bookstore published its daily top-10 sales during the show. The picture on this post shows the Day 2 list with two Groovy titles in the top 10 (at places 5 and 8). Overall, the Groovy bible, Groovy in Action, came in at number 5 for the show. Interest is definitely growing.

If you haven't tried Groovy yourself, it's definitely worth a look. Here are a couple of good overviews:

Wednesday, May 16, 2007

Unit Testing Private Variables and Functions

How do you write unit tests to exercise private functions and check on private variables? For my projects, I have relied on a technique of adding special testing-only methods to my classes. These methods all have names that begin with FTO_ (for testing only). My regular code may not call these functions. Eventually, I'll write a rule that code-checkers can enforce to make sure that these violations of data hiding don't accidentally appear in non-test code.
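A minimal sketch of the convention (the class and field are hypothetical):

public class PageLayout {
    private float leading;   // private state that tests need to inspect

    // FTO_ = For Testing Only. Production code must never call this;
    // eventually a code-checker rule will enforce that.
    float FTO_getLeading() {
        return leading;
    }
}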

However, for a long time I've wanted to know if there is a better way to do this. So, I did what most good programmers do--I asked someone who knows testing better than I do. That meant talking to the ever-kind Jeff Frederick, who is the main committer of the popular CI server CruiseControl (and the head of product development at Agitar).

Jeff contended that the problem is really one of code design. If all methods are short and specific, then it should be possible to test a private variable by probing the method that uses it. Or said another way: if you can't get at the variable to test it, chances are it's buried in too much code. (Extract Method, I have long believed, is the most important refactoring.)

Likewise private methods. Make 'em small, have them do only one thing, and call them from accessible methods.

I've spent a week noodling around with this sound advice. It appeals to me because, almost invariably, when I refactor code to make it more testable, I find that I've improved it. So far, Jeff is mostly right. I can eliminate most situations by cleaning up code. However, a few routines still look intractable. While I work at finding a better way to refactor them (a constant quest of mine, actually), I am curious to know how you solve this problem.

Thursday, May 10, 2007

Reusing IDE and SATA Drives: A Solution


Because I review lots and lots of tools, I find myself going through PCs pretty quickly. It's not fair to gauge the performance of a product on old hardware, so each year I buy new PCs. Over the years, I've accumulated lots of IDE drives from the PCs I've discarded. I rarely use them, but every once in a while I would like to know what's on them and whether I can reuse one. Unfortunately, this is a time-consuming task, especially hooking up the drive to a PC that can access it.


I recently came across an elegant solution to this problem: the USB 2.0 Universal Drive Adapter from NewerTech. This device comes with a power supply for the HDD and a separate cable that plugs into an ATA-IDE drive, a notebook IDE drive, or a SATA drive. The other end of the cable is a USB plug. So, you attach the cable, power up the drive, and plug the USB end into your PC--and the drive magically pops up as a USB drive on your system, with full read and write capabilities.

I have cleaned up a bunch of IDE drives during the last week using this adapter. In the process, I've discovered it has some limitations. It did not work well on older drives. Some would not power up (but they did start up when I swapped them into a PC), and others did not handle read/writes well (Windows generated write errors), although it's hard to know if the errors come from the drive or the adapter. But for most drives from the last few years, the product worked without a hitch. Neat solution and, at $24.95 retail, a no-brainer purchase.

Further note: I increasingly use virtualization for my testbed infrastructure. When I'm done with a review, I archive the VMs. This keeps my disks mostly pristine and reusable, so I am not discarding disks with old PCs nearly as much as before. The fact that drives today are > 500GB also helps. ;-)

Monday, April 30, 2007

How many unit tests per method? A rule of thumb

The other day, I met with Burke Cox who heads up Stelligent, a company that specializes in helping sites set up their code-quality infrastructure (build systems, test frameworks, code coverage analysis, continuous integration--the whole works, all on a fixed-price basis). One thing Stelligent does before leaving is to impart some of the best practices they've developed over the years.

A best practice Stelligent uses for determining the number of unit tests to write for a given method struck me as completely original. The number of tests is based on the method's cyclomatic complexity (aka McCabe complexity). This complexity metric starts at 1 and adds 1 for every path the code can take. Many tools today generate cyclomatic complexity counts for methods. Stelligent's rule of thumb is:

  • complexity 1-3: no test (likely the method is a getter/setter)
  • complexity 3-10: 1 test
  • complexity 11+: number of tests = complexity / 2
I like this guide, but would change one aspect. I think a cyclomatic complexity of 10 should have more than 1 test. I'd be more inclined to go with: complexity 3-10: # of tests = complexity / 3.
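To make the counting concrete, here is an example of my own (not Stelligent's): the method below has a base complexity of 1, plus 1 for each of its three ifs, for a cyclomatic complexity of 4--which, under either version of the rule of thumb, calls for one or two tests.

// Cyclomatic complexity = 1 + 3 = 4
int clamp(int value, int low, int high) {
    if (low > high) throw new IllegalArgumentException("low > high");
    if (value < low) return low;
    if (value > high) return high;
    return value;
}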

Note: you have to be careful with cyclomatic complexity. I recently wrote a switch statement that had 40 cases. Technically, that's a complexity measure of 40. Obviously, it's pointless to write lots of unit tests for 40 case statements that differ trivially. But, when the complexity number derives directly from the logic of the method, I think Stelligent's rule of thumb (with my modification) is an excellent place to start.

Monday, April 23, 2007

Effectiveness of Pair-wise Tests

The more I use unit testing, the more I wander into areas that are more typically the province of true testers (rather than of developers). One area I frequently visit is the problem of combinatorial testing, which is how to test code that must handle a large number of possible values. Let's say I have a function with four parameters that are all boolean. There are, therefore, 16 possible combos. My temptation is to write 16 unit tests. But the concept of pairwise testing argues against this test-every-permutation approach. It is based on the belief that most bugs occur in the interaction of pairs of values (rather than in a specific configuration of three or four of them). So, pairwise experts look at the 16 possible values my switches can have and choose the minimum number of tests in which all pairs of values have been exercised. It turns out that 5 tests will exercise every pair of switch combinations.
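One such set of 5 tests appears below; it's easy (if tedious) to check that every pair of columns exhibits all four combinations FF, FT, TF, and TT:

test    a  b  c  d
  1     F  F  F  F
  2     F  T  T  T
  3     T  F  T  T
  4     T  T  F  T
  5     T  T  T  F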

The question I've wondered about is if I write those five unit tests, rather than the more ambitious 16 tests, what have I given up? The answer is: not much. At the recent Software Test & Performance Conference, I attended a session by BJ Rollison who heads up TestingMentor.com when he's not drilling Microsoft's testers in the latest techniques. He provided some interesting figures from Microsoft's analysis of pair-wise testing.

For attrib.exe (which takes a path + 6 optional args), he did minimal testing, pairwise, and comprehensive testing with the following results:

Minimal: 9 tests, 74% code coverage, 358 code blocks covered.
Pairwise: 13 tests, 77% code coverage, 370 code blocks covered.
Maximal: 972 tests, 77% code coverage, 370 code blocks covered.

A similar test exploration with findstr.exe (which takes a string + 19 optional args) found that pairwise testing via 136 tests covered 74% of the app, while maximal coverage consisting of 3,533 tests covered 76% of the app.

These numbers make sense. Surely, if you test a key subset of pair possibilities, testing additional combinations is not likely to exercise completely different routines, so code coverage should not increase much for tests beyond the pairwise recommendations. What surprised me was that pairwise got such high numbers to begin with: 70+% is pretty decent coverage.

From now on, pairwise testing will be part of my unit-testing design toolbox. For a list of tools that can find the pairs to test, see here. Rollison highly recommended Microsoft's free PICT tool (see previous link), which also provides a means to specify special relationships between the various factors in the combinations.

Monday, April 16, 2007

Update to my review of Java IDEs

My review of three leading enterprise Java IDEs appeared in late March in InfoWorld. I've received some nice comments on the piece, and a few corrections. Most of the corrections come from Sun and concern my coverage of NetBeans 5.5. Here are the principal items:

  • I complained that NetBeans does not use anti-aliased fonts. I overlooked a switch that can turn on these fonts in Java 5. On Java 6, they're on by default, if your system is set with font smoothing on. (It's hard to figure why NetBeans does not default to these fonts on Java 5, as do Eclipse, IntelliJ, and most of the other IDEs.)
  • I recommended that potential users look at NetBeans 6.0 beta, because it adds many of the features that I complain are missing. Sun gently points out that the current version of 6.0 is not quite in beta yet, but should be in beta within the next few months. For the latest version and roadmap, go to netbeans.org
  • After extensive discussions, Sun convinced me that they have greater support for Java 6 than I originally gave them credit for. I originally wrote they provided 'minimal' coverage. In retrospect, I would say 'good' support for Java 6.
  • Finally, Sun was kind enough to point out that I give them too much credit in saying that they have deployment support for JavaDB. In fact, they provide only start/stop control from within NetBeans.
If there are further corrections, I'll update this post.

Tuesday, March 27, 2007

InfoWorld Moves to Online-Only

InfoWorld magazine announced today that it would be abandoning the print publication and going to an entirely online format. This move continues a trend that has been in place for many years in technical publications. I understand the economics of the move and think that, on that basis, it's the right one. However, I confess some sadness at this transition. I like print media. I am currently overseas, and it was a great pleasure to pack magazines in my carry-on luggage and read through pages of material on the plane. SD Times and Dr. Dobb's are the only developer mags I read regularly that still come in printed format. And I generally read their printed versions before I read the online material.

For the same reasons, I bemoan the lack of printed documentation today. Reading printed docs for tools I use regularly is a rewarding activity. Inevitably, I find features I didn't know about. This is more difficult to do in the search-and-locate model that digital documents deliver. Seapine and Perforce are among the last of the development-tool vendors to provide printed docs. And I do love them for it.

Anyway, starting on April 3, you'll have to read my InfoWorld reviews strictly online (at www.infoworld.com). See you there!

Thursday, March 22, 2007

Characterization Tests

In my current column in SD Times, I discuss characterization tests, an as-yet little-discussed form of testing. They are unit tests whose purpose is not to validate the correctness of code but to capture in tests the behavior of existing code. The idea, first popularized in Michael Feathers' excellent volume Working Effectively with Legacy Code, is that the tests can reveal the scope of any changes you make to a codebase.

You write comprehensive tests of the codebase before you touch a line. Then, you make your changes to the code, and rerun the tests. The failing tests reveal the dependencies on the code you modified. You clean up the failing tests and now they reflect the modified code base. Then, rinse, lather, repeat.
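A sketch of what such a test might look like (PriceFormatter and the captured strings are hypothetical):

import org.junit.Test;
import static org.junit.Assert.*;

public class PriceFormatterCharacterizationTest {
    @Test
    public void pinDownCurrentFormatting() {
        // Expected values captured from a run of today's code.
        // They record what the code DOES, not what it should do.
        assertEquals("$1,234.50", PriceFormatter.format(1234.5));
        assertEquals("$0.00",     PriceFormatter.format(0.0));
    }
}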

The problem historically with this approach is that writing the tests is a colossal pain, especially on large codebases. Moreover, large changes in the code break so many tests that no one wants to go back and redo 150 tests, say, just to capture the scope of changes. And so, frequently, the good idea falls by the wayside--with unfortunate effects when it comes time for functional testing of the revised code. To the rescue comes Agitar with a free service called JUnitFactory. Get a free license, send them your Java code, and they will send you back the characterization tests for it. Cool idea. Change your code, run the tests, verify that nothing unexpected broke. And then have JUnitFactory re-create a complete set of characterization tests. How cool is that?

Actually, I use Agitar's service not only for characterization but also on my program as I am developing it. The generated tests always include unit tests I never thought of writing. Try it out. You'll like it.

Sunday, March 18, 2007

MIPS per Watt: The Progression...

Last week I purchased a Kill-a-Watt Electricity Usage Monitor, which measures the wattage used by any plugged-in device. It's already proving its value.

I began measuring the electrical consumption of the three workstations I use the most. The numbers strongly suggest that the power savings from multicore are real. The question that remains is whether they're substantial enough to matter to many folks, especially small sites with only a few systems. Here we go. (Notes: all workstations are Dell systems with 2GB or 3GB RAM and two SATA HDDs. The performance measurements are the processor arithmetic tests from the highly regarded Sandra benchmark suite. Energy consumption was measured while the systems were at rest. Systems are shown in chronological order of manufacture, oldest first.)


The upshot is that the dual-core systems are definitely the performance equivalents of the earlier dual-processor Xeon beasts (look at the performance parity between the two Intel processors), but energy consumption of the multicore systems is almost 40% less.

However, the difference in energy consumption between the Pentium D and the AMD system is not that great. Moreover, the difference in CPU performance, while appearing to be a lot, feels the same when I'm at the controls.

So, I think multiprocessor boxes are easy candidates for replacement by multicore systems, but upgrading from one multicore system to another does not look compelling currently. (The Pentium D system is about a year older than the AMD system.)

Wednesday, March 07, 2007

Mylar, Tasktop, and the Intriguing Logo

Yesterday, I was down at EclipseCon, a yearly gathering of the Eclipse faithful. I had lunch with Mik Kersten, the driving force behind the suddenly very popular Mylar plug-in to Eclipse that helps you organize tasks and the data that goes with them. He's working on a similar idea for desktops at his new company, Tasktop. From what I saw this will be very useful in managing the mass of data we all deal with daily. Betas are expected in late Q2.

Before you go over to the website, try to guess the meaning of Tasktop's logo: <=>

When Mik first asked me, I thought it could be a dual-headed double arrow, an emoticon for a very happy, surprised person, or a vague reference to XML. But those are all wrong. The correct answer: less is more. Cute!

Thursday, March 01, 2007

How Many Unit Tests Are Enough?

Recently, I was down visiting the folks at Agitar, who make great tools for unit testing. Going there always results in interesting conversations, because they really live and breathe unit testing and are always finding new wrinkles in how to apply the technique. During one conversation, they casually threw out a metric for unit testing that I'd never heard before. It answers the question: how many unit tests are enough? You'll note that pretty much all the books and essays on unit testing go to great pains to avoid answering this question--for fear, I presume, that by presenting any number they will discourage developers from writing more. Likewise, if the suggested number is high, they risk discouraging developers who will see the goal as overwhelming and unreachable.

The engineers at Agitar, however, did offer a number (as a side comment to an unrelated point). They said (I'm paraphrasing) that if the amount of test code equals the amount of program code, you're generally in pretty good shape. Their experience shows that parity of the two codebases translates into code coverage of around 70%, which means a well-tested app. Needless to say, they'd probably want to qualify this statement, but as a rule of thumb, I think it's a very practical data point--even if a bit ambitious.

I will see how it works out on my open-source projects. Currently, my ratio of test-code to app-code is (cough, cough) just over 42%. I guess I know what I'll be doing this weekend.

Friday, February 23, 2007

A Web Host Bill of Rights

Like many small businesses, my company (Pacific Data Works) contracts web hosting to a third party--actually, third parties. We have two websites: one hosted at LunarPages, the other at Web.com (formerly called Interland). Our main site and mail server are at Interland.

Two days ago, Web.com suffered a massive "facilities" problem that for 10 hours shut down not only hosted accounts like ours but Web.com itself. The company's own website was off the air. Because of this problem, all e-mails sent to us and to other companies hosted at Web.com were bounced back as undeliverable.

Everyone understands that grave things can happen that compromise web-hosting services, but I don't understand what happened next: nothing. Web.com sent out no notice to customers that they might have lost e-mails, and no apology for the inconvenience. I don't care much about the apology, although it would be nice; I do care about not finding out about the bounced e-mails until I started receiving word from correspondents who were surprised their e-mails to us were rejected.

LunarPages suffered a day-long blackout last year due to a power problem in their host building. They didn't notify anyone either, but they did post a long mea culpa on their official blog explaining the problem. However, their website still advertises that they are down less than 9 hours a year.

I think it is about time for a Bill of Rights for customers of web hosts. At minimum:

  • Web hosts should notify customers when the Web site has been unavailable for more than 4 hours.
  • Web hosts should notify customers when e-mail service has been down for any period in which incoming emails were bounced back.
  • Web hosts must post accurate information about outages on their website. An outage is defined as any period of time in which more than 20% of hosted accounts are not available.
  • All outages should be fully explained as to the nature of the problem and what is being done to make sure it will not recur.
  • Web hosts should refund the pro-rated share of hosting fees for outages automatically.

I think that's a good start.

Thursday, February 15, 2007

Languages and compilation speed

Yesterday, I went to visit Electric Cloud, a company that provides a suite of tools for building large applications, including a build manager and a distributed make system. While I was there, I met the CEO, John Ousterhout, the designer of Tcl/Tk. He made an interesting observation about large compilations: in visits to customers with large codebases to compile, his team found that C++ code compiled the slowest, C was next, and Java was the fastest mainstream language to compile. The position of C++ does not surprise me, but Java being faster than C did--although, to be honest, I'd never given the matter much thought.

Because this discussion was a branch off the main topic, I didn't get to pursue it further, but Ousterhout did attribute it in passing to the efficiency of Java compilers. Without more info, though, I'm not sure I'm convinced that it's a question of compiler efficiency. I suspect that not needing to generate native code is a big factor, as is the simplified linking step.

Monday, February 12, 2007

Virtualization Slides

I gave a pair of talks today at InfoWorld's Virtualization Executive Forum. One was pure lecture, the other a panel. The lecture (on the variety of uses of virtualization in IT) in particular was well attended. It seems managers correctly sense that there's a lot more you can do with virtualization than desktop code verification and server consolidation. Here are the slides from the lecture.

Tuesday, February 06, 2007

What Today's DNS Attack Looked Like

As you might have read in the press today, a group of hackers tried to take down the domain name system (DNS), a key component of the Internet. They did this by flooding two of the top-level DNS servers with requests. This picture, provided by Ripe.Net, shows the attack in full swing.





On the left are numbers 1-13, which refer to the 13 top-level DNS servers. They handle DNS requests from lower-level servers that cannot resolve a particular DNS address. As can be seen, two of the servers were targeted simultaneously in an attack that lasted several hours.

For readers who aren't familiar with DNS, it's the service that translates domain names into the numerical addresses on which the Internet actually runs. So cnn.com, for example, is translated by a DNS server into 64.236.29.120. Knock out the servers that do this translation, and you can reach sites only via their numerical IP addresses.

Fortunately, these 13 top-level DNS servers are redundant, so all this attack did was to slow some Internet/Web queries. However, if all 13 had been attacked, things would have become quite serious.

Monday, January 29, 2007

Learning Java via TDD: An impressive approach


I have never been a fan of test-driven development. I think the concept is a curiosity that makes you write lots of throw-away code, while keeping lots of code that really doesn't test anything useful. But my most fundamental disagreement with TDD is that it is counter to the way people think. And part of the great joy of programming is building solutions, rather than building problems for the solutions to solve. For this reason, I think that many people who like TDD in the abstract eventually move back to a model of writing the code and immediately writing unit tests to exercise it. (This happens to be my model, but I didn't come to it via TDD. Rather, I adopted it because I became profoundly convinced of the value of lots of unit tests written immediately after coding.)

I've expressed this view before and would have held on to it unmodified had I not come across a terrific and unique Java tutorial: Agile Java by Jeff Langr. Forget the "agile" attribute. This is a book that teaches Java via TDD. And, it turns out, this is a very interesting way of doing things. How many times have we seen Java tomes start with "Hello World"? This one starts by teaching JUnit. And through JUnit, it teaches TDD, and then Java.

This approach, which is brilliantly original, has three distinct benefits. First, readers become accustomed to Java's verbose but unrevealing error messages. Rather than looking at a stack dump in disbelief, they're used to debugging and jumping in. Second, of course, they're used to testing and to writing code that is testable. The third and, for me, most important benefit is that this book cannot present Java in the usual way. (Such as: here's an inner class. Here's why you need it. Here's a code snippet. OK, now off to statics....) Rather, each chapter requires writing a mini-app that exercises the topics the author wants to present. So, you get to think about OO and Java, rather than merely learning language syntax.
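
To give a feel for what teaching JUnit before "Hello World" means in practice, here is a sketch in the spirit of the book's opening chapters--my example, not Langr's:

import junit.framework.TestCase;

public class GameTest extends TestCase {
    public void testNewGameStartsAtZero() {
        assertEquals(0, new Game().score());
    }
}

class Game {
    int score() { return 0; } // just enough code to make the test pass
}

The test comes first; the Game class exists only because the test demanded it.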

The book also forces the reader to think about challenging testing issues. And I do mean challenging. It is the only text I've ever read anywhere that shows how to run a unit test on a log record written to the console. How would you solve that problem?
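
One common answer (a sketch of mine, not necessarily Langr's approach) is to swap System.out for an in-memory stream for the duration of the test; AuditLog here is a hypothetical class under test:

import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import junit.framework.TestCase;

public class ConsoleLogTest extends TestCase {
    public void testWarningIsWrittenToConsole() {
        PrintStream saved = System.out;
        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        System.setOut(new PrintStream(captured));
        try {
            new AuditLog().warn("disk full");
        } finally {
            System.setOut(saved); // always restore the real console
        }
        assertTrue(captured.toString().contains("disk full"));
    }
}

class AuditLog {
    void warn(String message) { System.out.println("WARN: " + message); }
}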

The only drawback is that the same topics are discussed in many different places throughout the book (each discussion amplifying earlier points), so you really can't use it as a reference after the fact. This is a small gripe, as there are plenty of Java references online and in print.

I recently wrote a review of Java tutorials (before I knew about this book). Despite the many great titles I discuss in that article, if I were teaching a Java class today, this is definitely the book I would use. Bar none. And I don't even like TDD.

Saturday, January 27, 2007

One activity that is inherently productive: unit testing

In a post today in his blog, Larry O'Brien states: "...let's be clear that of all the things we know about software development, there are only two things that we know to be inherently highly productive: Well-treated talented programmers and iterative development incorporating client feedback"

I find it hard not to add unit testing to this list.

Of all the things that have changed in how I program during the last six or seven years, nothing comes close to unit testing in terms of making me more productive. In addition, it has made me a better programmer, because as I write code I am thinking about how to test it. And the changes that result are almost always improvements.
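
To illustrate the kind of change I mean--my example, not O'Brien's--thinking about testability tends to push logic out of I/O-laden code and into pure methods a test can call directly:

import junit.framework.TestCase;

public class PricingTest extends TestCase {
    public void testBulkOrdersGetTenPercentOff() {
        assertEquals(90.0, Pricing.discounted(100.0, 50), 0.001);
    }
}

class Pricing {
    // A pure function: no database, no console--trivially testable.
    static double discounted(double price, int quantity) {
        return quantity >= 50 ? price * 0.9 : price;
    }
}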

Wednesday, January 24, 2007

Multicores not as productive as you expected?


For a while, I have been intrigued by how small the performance pop is from multicore processor designs. I've written about this and, finally, I think I can begin to quantify it. I'll use AMD's processors for the sole reason that the company has for years posted a performance measure for each of its processors. (Originally, this data was a move to counter Intel's now-abandoned fascination with high clock speeds.)

This chart shows the performance as published by AMD and the corresponding clock speeds for most of its recent processors. I have broken the figures out for the three branches in the Athlon processor family (which is AMD's desktop chip).

There are several interesting aspects to this chart, but the one I want to focus on is the performance of the rightmost entry. The dual-core Athlon 64 X2 with a rating of 5200 has a clock speed of 2600 MHz. Now, notice the Athlon XP with a rating of 2600 (10th entry from the left): it has a clock speed of 2133 MHz.

In theory, since the AMD ratings are linear, a dual-core processor should give you nearly, but not quite, the performance of two single-core chips. So two 2600-rated chips should give you roughly the performance of a 5200-rated dual-core chip. Using the chart, we would expect two 2.133 GHz cores to deliver the 5200 performance figure. In reality, though, it takes two 2.6 GHz cores to do this--far more than we would expect. The gap is actually even wider than that, because the dual-core chips have faster buses and larger caches than the Athlon XP we're comparing them to, so they can make far better use of each clock cycle.
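
A quick back-of-envelope check of that gap, using only the clock figures above (a sketch, not a benchmark):

public class ScalingGap {
    public static void main(String[] args) {
        double expectedMHz = 2133; // clock of a single-core 2600-rated Athlon XP
        double actualMHz   = 2600; // per-core clock of the 5200-rated X2
        double overhead = (actualMHz - expectedMHz) / expectedMHz;
        // Prints about 22%: the extra clock each core needs just to
        // keep up with the linear-scaling prediction.
        System.out.printf("Extra clock per core: %.0f%%%n", overhead * 100);
    }
}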

So, why does it take so much more than twice the clock speed to deliver dual-core performance? The original 2600-rated Athlon XP had a memory controller built into the chip. On the X2, however, the two cores do not have dedicated memory controllers--instead, they share a single on-chip memory controller. This adds overhead. The cores also share interfaces to the rest of the system, and so again must contend for shared resources to get the attention they need.

Don't be fooled into thinking this is an AMD-specific issue. (As I said earlier, I used AMD only because they are kind enough to publish clock-speed and performance data for their chips.) Intel is in exactly the same boat--whatever is shared between cores is plenty expensive. Expect, as time passes, to see chip vendors trying to limit these shared resources.

Tuesday, January 23, 2007

Enter The Komodo Dragon


Komodo 4.0 from ActiveState came out of beta and was released today. Komodo has long been viewed as the premier IDE for scripting languages (Python, Perl, and Tcl especially). It continues that tradition with this release and adds Ruby and RoR support. It's the first high-end IDE with intelligent editing and debugging for Ruby and RoR that I'm aware of.

Komodo also provides numerous tools for Ajax development, including JavaScript debugging; dedicated editors for XML, HTML, and CSS; an HTTP viewer; and a DOM editor.

If you work in any of the languages Komodo supports, you owe it to yourself to examine it (for free). If you work in any two of them, you probably should just buy it. At $245, it's a steal.

Wednesday, January 17, 2007

C PDF Library

In my current column in SD Times, I discuss the open-source iText library for creating PDF files. iText enables developers to create reports in PDF, HTML, and RTF from any Java application--including servlets. I mentioned in the article that I previously had looked for a PDF library in C and could not find one that was free and open source.

Eagle-eyed reader Reid Thompson kindly sent me a note pointing out the Libharu project. In my quick scan, it's not quite as feature-packed as iText, but it does cover pretty much all the functionality needed for business reports. And it has one advantage over iText that will prove compelling to some developers: Ruby bindings.
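
For a sense of what these libraries buy you, here's roughly what a minimal PDF takes in Java using iText's classic com.lowagie.text API (package names from the 1.x/2.x era; treat this as a sketch and check the current docs):

import java.io.FileOutputStream;
import com.lowagie.text.Document;
import com.lowagie.text.Paragraph;
import com.lowagie.text.pdf.PdfWriter;

public class HelloPdf {
    public static void main(String[] args) throws Exception {
        Document doc = new Document();
        // Bind the document to an output file before opening it.
        PdfWriter.getInstance(doc, new FileOutputStream("hello.pdf"));
        doc.open();
        doc.add(new Paragraph("Hello from iText"));
        doc.close();
    }
}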

Tuesday, January 16, 2007

Want to be a Jolt Award Finalist? Three Steps

I have been a Jolt judge for all 17 years that the award has existed. This position has been a great privilege because during that time span, the award has become the equivalent of the Academy Awards for software-development tools. One reason for this rise to prominence is that vendors sense that the judges put a lot of work and deliberation into their choices. This somewhat understates the work we do, as I'll explain shortly. This post, however, focuses on a common query from vendors whose products did not advance to the final round: what more could we have done to advance? The answer frequently is: plenty.

To give context to the pointers below, it's important to understand a few things about the Jolt judging process:

  • For judges, the Jolt season represents a period of intense activity. I expect that in any given Jolt season I will spend more than 100 hours on product selection. That's a lot of time, and it's all volunteered--we receive no payment.

  • Judge deliberations are secret. We use several mechanisms for sharing our perspectives. Their contents are sealed at the end of the judging cycle. Only discussions relating to procedural matters are retained year to year. So, asking for the judges' rationale for a certain decision will not (or, at least, should not) result in useful information.

  • Judges vote in secret. No one but the person tabulating the results at Doctor Dobb's knows which judge voted for what. Frequently, the results are mystifying to me. I don't understand why product X was left out, while the clearly superior product Y was included. There is only one obvious answer: other people evaluate the product differently than I do.

  • Judges recuse themselves from any category of products in which they can derive financial benefit or where they work for one of the vendors whose product is nominated.

So, how can a vendor influence a product's fate?

  1. Have a good product. This more than any other factor will improve your prospects. If your product is nominated year after year, make sure that you have something new to say. We frequently kick out products that are the same as last year's save for a few tweaks. Remember this is an annual award, so greatness must have occurred during the coverage year (generally November to November).

  2. Be able to articulate why your product is better than others. Judges who have never heard of your product need a reason to vote for it. Give it to them. Many vendors set up portals specifically for Jolt judges. They include movie clips of the product (10-15 minutes), screen shots, and generated reports. This is a superb idea. Judges can go to the site and in 20 minutes figure out whether they see any magic there. If you choose to do this, emphasize how your product differs from or improves on the others. Don't try to demo every feature. The judges just want to know why they should vote for you. If you want to create a competitive grid with feature comparisons, this helps too.

  3. Follow up with the judges. Two categories I voted in this weekend (when we voted for finalists) had more than 30 products. After looking at a large number of websites, the products tend to blur. Even though I take notes, when I go back and re-read them, it's hard to remember my exact perspective. If I don't know a product beforehand, it's likely to fall into this blurry region unless it has some incredibly good (or bad) feature. You have a PR agency, right? Put them to work. Have them contact me. Send me a press kit in the mail. Some companies used to send 'swag'--an industry term for the inexpensive promotional tchotchkes vendors give out. Better yet, send out a boxed copy of your software. This helps. In a sea of choices, having a name to remember and with which I can associate specific features is a big plus.

    Depending on how I split my votes I can vote for anywhere from 4 to 10 products per category. I almost never vote for as few as four. A problem I have is that once I've voted for the top products, I might have only a few votes left for the remaining 20-30 products. At this point, I need some reason to vote for your product. Just because it's good is not sufficient. Make its good points memorable and you're likely to get one of those last few votes.

The coming stage--choosing the winner from the finalists--is completely different. Now judges will want to download software. A common error vendors commit at this point is making the licensing difficult. The more difficult you make the licensing, the less time I will have to look at your product. This seems obvious, but many vendors fear that judges will somehow do something terrible with their software--something like using it after the judging period is over. All the judges know that they can't turn over their license keys to their employers. At worst, they will keep using the product themselves. This is desirable. You want judges of the product to tell other developers, "I use the Acme debugger and it runs circles around product X." Almost invariably, the judges are influencers in major communities. Celebrate their use of your product. Make licensing easy.

I hope this post answers some preliminary questions.

Friday, January 05, 2007

Correction Re Java Tutorials

Grrr. I intensely dislike being wrong in reviews and recommendations. I put lots of effort into my published writings so that readers can rely with confidence on what I recommend. So, it's quite painful--actually, it's a sense of shame and depression coupled with a lot of pain--to realize that I bollixed a review in print.

Such is the case with my quick guide to Java tutorials that appears in the current column in SD Times. In it, I write the following (after discussing the major books in this market):

Finally, for those who want something more serious but don’t require the omnibus tomes, there’s “The Java Tutorial” 4th Edition, by Zakhour et al. (Addison-Wesley Professional). In 600 pages, it presents all of the language proper, with well-chosen code examples, plus the basics of the major API sets. It’s put out by the same team that developed Sun’s outstanding online Java tutorials, which might be the best tutorials ever developed for any language. Get this book to start with, unless one of the others has a particular feature you feel is critical. Either way, you’ll be treated well.

The problem is that this book is really not very good, and I have to retract my recommendation. It contains some unpardonable errors: its list of Java keywords is incorrect; it refers readers to other books for topics it introduces and never brings to a close--giving only the other book's title, not a chapter or page number, nor even a link to a website that covers the same topic; it makes reference to material that has not yet been presented; and, finally, it flags points as important with no explanation of why. Don't consider this book; instead, go to Volume 1 of Core Java to get the basics of the language in fewer than 800 pages.

My erroneous recommendation stems from an aspect I failed to consider. The book does have good parts, especially those covering the latest additions to Java (enums, for example). I looked at these and was impressed. I then made the error of assuming the rest of the book was as good as the sections I'd read. The trouble is that this book is a gang project, and the quality of the various authors' work clearly fluctuates. I happened to hit the few highlights in my review pass.

Curiously, a few years ago, this problem would not have occurred. Books were primarily written by one or two authors, so quality was consistent throughout--be it good or bad. Any given pair of chapters pretty much reflected the book's overall quality. With gang books, this is much more rarely the case, as I was painfully reminded.