Sunday, November 25, 2007

The Fallacy of 100% Code Coverage

While I love RSS for aggregating feeds from various blogs, nothing beats having an expert combing through articles and posts, culling the best ones. Few people, if any, do that culling better for software development than Andy Glover of Stelligent. His blog posts weekly collections of interesting links to tools and agile development topics. It's one of the first RSS items I read. (Fair warning, Glover makes his selections livelier by describing them with terms from the disco era.)

A meme that has appeared recently in his links is a curious dialog about the value of 100% code coverage in unit tests. It was not until I read those posts that I realized there were people still unclear on this issue. So, let's start at the foundational level: 100% code coverage is a fallacious goal. Unit testing is designed to provide two principal benefits: 1) validate the operation of code; 2) create sensors that can detect when code operation has changed, thereby identifying unanticipated effects of code changes. There is no point in writing tests that do not fulfill one of the two goals. Consequently, a getter or setter should not be the target of a unit test:

public void setHeight( float newHeight )
{ height = newHeight; }

This code cannot go wrong (unless you believe that your language's assignment operator doesn't work consistently ;-). Likewise, there is no benefit in a test as a sensor here. The operation of this code cannot change. Hence, any time spent writing a unit test for this routine is wasted.

Developers who can't resist the allure of 100% code coverage are, it seems, seduced by one of two drivers:

1) their coverage tool gives classes with 100% code coverage a green bar (or other graphical satisfaction). Describing this phenomenon, Cedric Beust, the author of the TestNG testing framework, writes in his just released (and highly recommended) book, Next Generation Java Testing, that this design of coverage tools is "evil." (p. 149) And later, he correctly piles on with: "Even if we were stupid enough to waste the time and effort to reach 100% code coverage..." The problem with coding for graphical trinkets is explained in the next driver.

2) if developers attain 100% code coverage--even at the cost of writing meaningless tests--they can be certain they haven't forgotten to test some crucial code. This viewpoint is the real illusion. By basing proper testing on 100% code coverage, the developer has confused two issues. It's what you're testing and how (so, quality) that determines code quality, not numerical coverage targets (quantity). Capable writers of unit tests know that some routines need validation through dozens of unit tests. Because these tests repeatedly work the same code, they don't result in greater test coverage, but they do result in greater code quality. By pushing for an artificial goal of 100% a developer is incentivized against writing multiple tests for complex code, in order to have the time to write tests for getters and setters. That surely can't be right.

Wednesday, November 07, 2007

Great Book for Programming Fonts

As I've learned from working on Platypus, programming font operations is one of the most complex and convoluted areas of software development you're likely to run into...ever. It's like driving to the river madness and drinking deeply, then walking around the desert naked for 40 days, all the while reassuring yourself that you must be making progress because you're still coding. It's awful.

The reasons are complex and numerous. First among these is that file formats are capricious things. Microsoft and Adobe have both published numerous font formats--some in response to market needs, others for competitive reasons, still others because of internal pressures. The second problem is that these formats are designed for use by font experts, not by developers. They often include cryptic parameters, tables within tables, and absolutely nothing that is clear or obvious save the copyright notice.

Third is the matter of encoding. There are numerous encodings of font characters. These too seem driven by reasons largely outside of need and formulated with no particular eye to future requirements. Try to figure out encodings for CJK fonts (Chinese, Japanese, Korean character sets), and you'll feel like walking around with your hair on fire. Even in simple encodings, difficulties arise. For example, Apple and Windows use different encodings in the basic character sets, which is why apostrophes in Mac-generated documents show up on some PCs as the euro symbol. Unicode? Foggettabout it. No font today implements close to all the characters. And those that come even halfway (of which none are free that I'm aware), they are huge multimegabyte propositions. In sum, fonts are a topic shot through and through with problems and treacherous details.

Until now, there has been no central reference that developers could turn to for help. Each new font (PostScript, TrueType, OpenType, etc.) required starting anew and learning the peculiarities from scratch . But a new 1000-page book by Yannis Haralambous, entitled Fonts & Encodings (from O'Reilly) has just appeared and it's the first real tie-line to sanity in the jungle of glyphs. It explains fonts, formats, and encodings in tremendous detail; along with elaborate discussions of tools. It is the defining book for technical users of fonts.

Before I discuss two limitations, I want to reiterate that this is a great book and nothing I say should override this view. However, it's not a developer-oriented book. Except for some SVG XML and some TeX, there is little source code. So, information on how to access font data and use it to lay out documents programmatically or just to print text is still left as a challenge to the reader (though the book gets you most of the way there). The book also discussed MetaFont in too much detail, in my view, because this format, which is now little used, is extensively described by its inventor, Donald Knuth. I'd have preferred more coverage of bitmap fonts, say, then re-presenting this info. But these two items aside, this is the book to get if you ever have to do anything with fonts. It'll give you hope; real hope.

Thursday, November 01, 2007

Most Popular CI Servers (An informal survey)

At CITCON Brussels last month, attendees were allowed to write up questions of any kind on a bulletin board and others could come by and post answer as they were moved to. One poll was: Which CI server do you use? The answers were, in order of popularity:

Anthill Pro, TeamCity (tied)

I don't have the exact vote counts, because the results were taken down before I could get the final tallies. But suffice it to say, that CruiseControl received the lion's share, Hudson a handlful, and Anthill Pro and TeamCity garnered 1 vote each.

This survey, of course, is not scientific. Despite the fact that it was a CI conference, the voters were self-selecting, and ThoughtWorks, which is the company behind CruiseControl, was well represented at the conference. (It is in fact a sponsor of CITCON.) So, high CruiseControl figures would be expected. (Even without this factor, though, I expect it would have placed first due to its quality implementation, wide industry adoption, and its considerable lead in plug-ins to various tools and packages.)

The Hudson numbers, however, are interesting and probably meaningful. Hudson is a comparative newcomer to CI. But it has been winning converts quickly due to its ease of use. If you have a small project or just want to test the waters of CI, Hudson might well be the server to use.

Anthill Pro is a high-end CI server that can be found in two versions: an OSS version and a paid version. It was not until this conference, though, that I discovered these are completely different products. They were built from different codebases and the OSS version was a one-time release that is not updated.

I was surprised that LuntBuild made no appearance, as not so long ago, its users were raving about its tremendous ease of use. Perhaps Hudson is stealing its thunder, or perhaps its users just weren't at CITCON. It's hard to say in a small poll.