Thursday, November 18, 2010

The Most Important Book of The Year


Continuous Delivery
by Jez Humble and David Farley


I have reviewed many books on this website and I have gone through numerous others as part of my work on the Jolt Awards, but it’s been a very long time since I’ve read a book as useful and likely game-changing as Continuous Delivery.

The basic premise of the book is that we need to move past continuous integration into a fuller cycle of activities that go beyond build and test. Specifically, this new orientation calls for building and testing on all platforms, and for creating and deploying the final deliverables for all platforms, with every check-in. The benefit of this approach is that the development organization at any given moment always has: 1) immediate feedback on deployment issues; 2) a deployable binary; 3) a completely automated process to build, test, and deploy on all platforms.

This simple concept—a kind of continuous integration on mega steroids—has profound repercussions, all of which make your process better. The first and most important is that you have to automate everything downstream from the coding. And the authors mean everything. The most common point where people hem and haw about automation is deployment. But Humble and Farley make it clear you have to “bring that pain forward,” and fix the process so it can be automated. (If you don’t have any idea how you might refine and automate deployment, think virtualization. Can you emulate your current systems on virtual machines and then progressively simplify deployment of the software to the point of automation? Good, you’re on your way.)

But the mechanics of deployment may be the least of your challenges. (And here, the book’s name could be viewed as misleading: deployment is only one aspect it covers.) You also have to build, run, and test the software on every platform you ship on. You’re not reasonably going to be able to do that if you have to change configurations and manually reset values for different platforms. The authors guide you to finding the one path that gets you across the river Jordan without spending 40 years in the desert of bit twiddling. The key is to use a single codebase and move the platform-dependent stuff into configuration files. This is non-trivial, but the authors offer plenty of good advice.
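To make the single-codebase idea concrete, here is a minimal Java sketch. It is not taken from the book: the class name, the property keys, and the "platform.key with a default fallback" naming scheme are all assumptions chosen for illustration. The code never branches on the platform itself; it only looks up values that differ per platform.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Locale;
import java.util.Properties;

/** Hypothetical example: one codebase, with platform differences kept in configuration. */
class PlatformConfig {

    /** Maps an os.name-style string to a short platform key (assumed naming scheme). */
    static String platformKey(String osName) {
        String os = osName.toLowerCase(Locale.ROOT);
        if (os.contains("mac") || os.contains("darwin")) return "mac";
        if (os.contains("win")) return "windows";
        return "linux";
    }

    /** Parses configuration text written in java.util.Properties format. */
    static Properties parse(String text) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(text));
        } catch (IOException e) {
            throw new IllegalStateException("Unreadable configuration", e);
        }
        return props;
    }

    /** Looks up "<platform>.<key>", falling back to "default.<key>". */
    static String lookup(Properties props, String platform, String key) {
        String value = props.getProperty(platform + "." + key);
        return value != null ? value : props.getProperty("default." + key);
    }

    public static void main(String[] args) {
        // In a real build this text would live in a file under version control.
        String text = String.join("\n",
                "default.tmp.dir=/tmp/app",
                "windows.tmp.dir=C:/Temp/app",
                "default.line.ending=LF",
                "windows.line.ending=CRLF");
        Properties props = parse(text);
        String platform = platformKey(System.getProperty("os.name"));
        System.out.println(platform + " tmp.dir = " + lookup(props, platform, "tmp.dir"));
    }
}
```

With this arrangement, adding a platform means adding configuration lines rather than touching code, which is what makes it feasible to build and test every platform on every check-in.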

Testing is another topic Humble and Farley explore in great depth. Testing in the context of continuous delivery is not just running unit tests and a regression suite. No, this is running all tests--unit, integration, UAT, and so on. How to automate them effectively occupies probably the largest chunk of the book. Even if you don’t accept the continuous delivery concept, this section is worth the price of admission. It’s mind-expanding, in ways that the hundreds of articles we’ve all read about agile testing on Digg and Reddit never touch on. You see very quickly how much more automation you could do and how to get from your miserable semi-manual existence to the smooth flow of full and continuous automation.

What impresses about the book is how the authors consistently work through hard problems. They are not daunted by them and there is no attempt to pass over them with hand waving. Hard things are examined in detail with a perspective that derives from the authors’ own extensive experience.

I have literally never read a better book on process. I believe that going forward, this book will redefine agile process and CI; and it will have as much influence as--I have to go back to 1999, here--Fowler’s book on Refactoring did on code.

Monday, October 25, 2010

Bluebeam's PDF Creation Tool Suite

I use a variety of PDF tools in my editorial work. I frequently create, mark up, manipulate, and combine PDFs. In addition, I contribute to the open source Platypus typesetting project, whose major output format is PDF. And the PDF plugin is my specific bailiwick. So, over the years, I've come to know a thing or two about PDFs, as well as the limitations of PDF tools.

The standard for PDF tools has been Adobe's Acrobat suite. But this suite is expensive, somewhat quirky, and at times works poorly with other tools. Acrobat plugins to Microsoft Office and Internet Explorer are especially unreliable, and they frequently make their host programs behave erratically. I always uninstall them.

This means I need to use other options to convert Word documents to PDF. There are several common solutions out there, but only one of them is completely satisfactory. For example, the Microsoft Office PDF plugin does not embed all fonts, nor does it give you the option to do so. Specifically, it does not embed the Base14 fonts.

This is a design error (that is common). Here is its history. For many years, Adobe guaranteed that Adobe Acrobat Reader would provide 14 fonts (the so-called Base14 fonts) in all implementations. These fonts were Times Roman, Courier, and Helvetica typefaces (each in regular, bold, italic, and bold italic—so 12 fonts) plus a Symbol and a Dingbat font. The rule was you did not need to embed these fonts in PDF documents, because Acrobat Reader would supply them. This scheme never worked very well. Its first limitation was that not all Times Roman fonts looked the same, so the same document could look strikingly different on two different computers. A few years ago, Adobe quietly discontinued supporting Base14 fonts in Acrobat Reader. The result is that if you're creating a PDF for distribution, you must embed all fonts, even the old Base14 fonts, if you want it to maintain your original format and layout.

The Microsoft Office plugin does not have this option, so as a result PDFs you generate with it are not guaranteed to look correct on other systems. And, in fact, they frequently do not.

The PDF generator that comes with Adobe Acrobat (not the Reader, but the paid tools) works better. It does offer an option to embed all fonts. However, in Word documents with many links, it fails to identify all links. And so rather than being clickable, the links show up as pure text.

To remedy this, I tested various Word-to-PDF tools and found none that consistently met all requirements until I ran into Bluebeam PDF Revu, a tool I had not previously heard of.

The first thing I noticed was that Bluebeam's plugins were stable and they worked correctly. The second thing I discovered was that Revu found all links in documents and, by default, it embedded all fonts. So far, so good. The attention to small details in its PDFs is part of Bluebeam's DNA--it was designed as a tool for CAD users, so correctly rendering every detail of a document is a specialty.

Like the Adobe Acrobat toolbox, Revu provides editing capabilities, with better text mark-up tools than Acrobat. It also enables you to construct your own menu of tools for faster access to frequently performed operations. Form handling, digital signatures, etc. work exactly as expected. Multi-document processing can also be automated with the product. Adobe Acrobat Pro—the comparable offering from Adobe—retails at $449 list, and $350 at Amazon. The academic version of Acrobat can be found for the same price as the full Bluebeam Revu ($149) product. So, if you want the full range of options, better implemented than in Adobe's offering, and at a lower price, have a look at Bluebeam PDF Revu. (They offer a 30-day free trial.)

Thursday, February 04, 2010

Keeping LOC and Tests in Balance


The proliferation of metrics in software development threatens to take important quantitative measures and bury them beneath an avalanche of noisy numbers. Consequently, it's important to look for certain ratios and trends among the numbers to inform you whether a project is healthy. One tell-tale relation links LOCs and number of tests. These two values should grow in direct proportion to each other.

The included diagram presents the ratio of these two values for Platypus, the OSS project I work on.

As you can see, except for a few dips here and there, these numbers have stayed in lock step for the last 18 months. And, as you might expect, code coverage from these tests has similarly remained in a fairly narrow range--right around 60%.

The most typical violation of this ratio is, as you would guess, a jump in LOCs without a corresponding rise in tests. This is something managers should watch out for. With a good dashboard, they can tell early on when these trend lines diverge. This is frequently, but not always, indicative of a problem. (For example, it could be that a lot of code without tests was imported to the project.) Whatever the cause is, managers need to find out and respond accordingly.

(For the record, the tests counted in this diagram include unit tests and functional tests.)
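The kind of dashboard check described above can be sketched in a few lines of Java. Everything here is hypothetical: the class and field names, the 25% tolerance, and the sample numbers are made up for illustration and are not the Platypus data.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical dashboard check: flags snapshots where LOC outgrows the test count. */
class LocTestBalance {

    /** One measurement of a project; the field names are illustrative. */
    static final class Snapshot {
        final String label;
        final int linesOfCode;
        final int testCount;

        Snapshot(String label, int linesOfCode, int testCount) {
            this.label = label;
            this.linesOfCode = linesOfCode;
            this.testCount = testCount;
        }

        double locPerTest() {
            return (double) linesOfCode / testCount;
        }
    }

    /**
     * Returns the labels of snapshots whose LOC-per-test ratio exceeds the
     * first snapshot's ratio by more than the tolerance (0.25 means 25%).
     */
    static List<String> divergences(List<Snapshot> history, double tolerance) {
        List<String> flagged = new ArrayList<>();
        if (history.isEmpty()) {
            return flagged;
        }
        double baseline = history.get(0).locPerTest();
        for (Snapshot s : history) {
            if (s.locPerTest() > baseline * (1 + tolerance)) {
                flagged.add(s.label);
            }
        }
        return flagged;
    }

    public static void main(String[] args) {
        // Made-up numbers: code jumps in March while the tests barely move.
        List<Snapshot> history = List.of(
                new Snapshot("Jan", 10_000, 500),
                new Snapshot("Feb", 11_000, 550),
                new Snapshot("Mar", 15_000, 560));
        System.out.println(divergences(history, 0.25)); // prints [Mar]
    }
}
```

A flagged month is only a prompt to investigate; as noted above, an import of untested code would trip the same check.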

Sunday, November 22, 2009

The Limitations of TDD

During the last 12-18 months, TDD has broken into the mainstream, it seems. And now, we're starting to see some backlash, as its limitations become better understood. Here is a sample discussion from Artima.com. Cédric Beust, who wrote the commentary, is not some unknown guy with a weird name. He wrote the TestNG unit testing framework, which is second only to JUnit in popularity. He also wrote the book, Next Generation Java Testing, which is probably the best book on pragmatic software testing that I've read in a long time. Here goes...

> That's an interesting point. Are you, in effect, saying
> that unit testing is overly emphasized, and at the expense
> of other forms of testing?


This has also been my experience, although to be honest, I see this problem more in agile/XP literature than in the real world.

This is the reason why I claim that:

- TDD encourages micro-design over macro-design
- TDD generates code churn

If you obsessively do TDD, you write tests for code that you are pretty much guaranteed to throw away. And when you do that, you will have to refactor your tests or rewrite them completely. Whether this refactoring can be done automatically or not is beside the point: you are in effect creating more work for yourself.

When I start solving a problem, I like to iterate two or three times on my code before I'm comfortable enough to write a test.

Another important point is that unit tests are a convenience for *you*, the developer, while functional tests are important for your *users*. When I have limited time, I always give priority to writing functional tests. Your duty is to your users, not to your test coverage tools.

You also bring up another interesting point: overtesting can lead to paralysis. I can imagine reaching a point where you don't want to modify your code because you will have too many tests to update (especially in dynamically typed languages, where you can't use tools that will automate this refactoring for you). The lesson here is to do your best so that your tests don't overlap.

--Cedric Beust

Tuesday, August 04, 2009

My Interview with Alexander Stepanov and Paul McJones

InformIT.com has posted my interview with Alexander Stepanov (of STL fame) and his co-author Paul McJones. Their just-released book, Elements of Programming, tries to map algorithm implementations back to symbolic logic and algebraic theorems, thereby--in theory--improving their design and correctness.

In the discussion, we broach many topics that derive from this approach to programming.

Saturday, July 25, 2009

Groovy Books

I have been using Groovy to write functional tests for Platypus, the open-source typesetting project I work on. I am likely to make Groovy the default scripting language for Platypus in the next milestone. In the process, I've had to come up to speed on Groovy and I've been reading through and looking over the various Groovy titles on the market. Here's my take.


The Groovy bible today, without the slightest doubt, is Groovy in Action, which at 650+ pages is also the most detailed book. Its principal limitation is that Groovy has undergone several revisions since it came out. Because of this, a second edition is being written. Early access to e-drafts of that edition is available here, although little as yet has been published.

If you'd like a shorter and more up-to-date introduction to Groovy, I recommend Programming Groovy by Venkat Subramaniam. At less than 300 pages, it's a quick read, provides all the needed info quickly, and covers all the highlights, with a good balance of detail.

Many people consider Grails to be the killer app for Groovy. It's a web framework that rides above Spring and Hibernate and removes much of the complexity of using those components. If you are learning Groovy to use Grails, then Beginning Groovy and Grails is an excellent choice. It's clear, approachable, and teaches you enough Groovy to be able to follow the tutorial on Grails.

Once you get comfortable with basic Groovy, you'll quickly find yourself pining for a book of recipes that shows you how to quickly get basic tasks done using Groovy metaphors. There are two somewhat flawed recipe books on the market. The first is Groovy Recipes from Scott Davis, a well-regarded lecturer in the Groovy area. While calling itself a recipe book, it frequently diverges into tutorials and odd humor--both of which are obstacles when trying to find information. Some important topics are not covered at all, such as testing--which is one of the major areas where Groovy benefits Java. Database access is also not covered. In other areas, Davis' explanations seem to lack an understanding of what the user would be looking for. Nonetheless, I have successfully used some of Davis' recipes in my work. A good alternative is Groovy and Grails Recipes from Bashar Abdul-Jawad. This title is a true recipe book and very readable. The Groovy portion is too short, however, and an important section on file recipes (which does appear in the Davis book) is omitted. However, if you're learning Groovy to get to Grails, this is the best choice. And Abdul-Jawad does a good job understanding what readers are looking for.

Ideally, O'Reilly would publish one of its trademark comprehensive recipe books and we could all settle on that. However, when I contacted O'Reilly about upcoming Groovy titles, the company indicated it had none in the immediate pipeline.

That's pretty much it for Groovy books; although there are several others that focus exclusively on Grails. One publisher, Apress, seems to dominate that Grails market. The two titles above that cover Grails are from Apress as is the Definitive Guide to Grails, written by Graeme Rocher, who designed Grails. In the past I've been skeptical of Apress books due to wide variations in their quality, but the Groovy/Grails titles I've examined have been consistently of high quality.

As Groovy gains a wider audience, I expect more titles to emerge from all the technical book publishers.




Wednesday, May 20, 2009

The Fan programming language: compile to Java and .NET

I have recently been playing with Fan, a programming language that reminds me a lot of Groovy, but has additional capabilities, such as actors. Its binaries run either on the JVM or .NET. Below is my recent column in SDTimes about the language. 

In recent times, we are seeing an extraordinary proliferation of new languages. On one hand, thousands of domain-specific languages (DSLs) have been spawned by the advent of tools that facilitate their creation. On the other hand, we find an equal surge in full-scale, general-purpose programming languages.

 The renaissance of these larger programming languages derives from several advances: 1) a renewed interest in dynamic languages and their benefits; 2) hardware that’s fast enough to run dynamic languages rapidly; and 3) the existence of two run-time environments—the JVM and the .NET CLR—that are widely used, well understood, and fast. As a result, we have an embarrassment of language choices that was inconceivable a decade ago.

In this column, I have previously highlighted various interesting options among these languages: Ruby, Groovy, D, NetRexx, and a few others that elegantly address specific problems. Recently, I have been spending time with the Fan programming language, which, while still early in its development cycle, is more finished and mature than most new languages at this point in their development.

Fan is a dynamic, OO language that runs on the JVM and the .NET CLR. It does this by generating intermediate code (called fcode) that is dynamically translated into Java bytecodes or a .NET DLL at startup. This step introduces a slight pause, after which programs run at full “native” speed for the given environment.

New languages arise because a developer needed to solve a problem that was not addressed well by common alternatives. The developers of Fan, a pair of brothers—Brian and Andy Frank—worked on embedded Java applications and found it difficult to sell the accompanying software to customers who were committed to Windows Mobile and .NET. So, they decided to write Fan to solve the problem and to keep it small enough that it could fit easily in a mobile device. 

In the process, they removed language verbosity and added features they wanted. Their vision is remarkably balanced and complete. The language, on the verge of freezing its 1.0 features, offers: dynamic typing and/or strong typing (à la Groovy), closures and first-class functions, extensive concurrency support (thread-safe classes with immutability specified, threads with built-in message passing, and actors), and elegant handling of various namespace issues. Low-level features include default method parameters, nullable data types, built-in field accessors, unchecked-only exceptions, and simplified numerics. The numerics handle the overflow problem that is the favorite of language puzzle writers: all integers are longs and all floats are doubles. So either type uses 64 bits and effectively does not overflow. Chars are 16-bit UTF entities.
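For readers who have not run into the overflow puzzle mentioned above, here is a small illustration. It is written in Java rather than Fan, and the class and method names are made up for the example. It computes the number of microseconds in a day, first with 32-bit ints and then with 64-bit longs.

```java
/** Illustrates 32-bit overflow in Java and how 64-bit arithmetic avoids it here. */
class OverflowDemo {

    /** Microseconds per day computed with 32-bit ints: the product silently wraps. */
    static int microsPerDayInt() {
        return 24 * 60 * 60 * 1000 * 1000;
    }

    /** The same product computed with 64-bit longs: the true value fits easily. */
    static long microsPerDayLong() {
        return 24L * 60 * 60 * 1000 * 1000;
    }

    public static void main(String[] args) {
        System.out.println(microsPerDayInt());   // prints 500654080
        System.out.println(microsPerDayLong());  // prints 86400000000
    }
}
```

A 64-bit integer can still overflow in principle, which is why "effectively" is the right word: the values most programs handle fit with a great deal of room to spare.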

A particularly interesting aspect of Fan is the libraries. As Brian Frank told me, “Solving the JVM/CLR portability was the easy part. The hard part was what to do with the libraries and APIs.” What the brothers did was to rethink the API sets, eliminate cruft, and use a different concept of grouping. Whereas .NET and Java both use a large number of packages that include moderate numbers of classes, Fan uses few packages that contain large numbers of classes. The result is that a developer can almost always guess correctly which package to link to for a specific need. In addition, Fan has sensible, built-in library defaults. For example, all file I/O defaults to buffered.

The good design of a language can take it only so far. To succeed, it needs good tools, good docs, and an active community. The language tools (compiler, etc.) are all open source and written in Fan. The code is clean and surprisingly readable. As to IDE support, there is currently a plugin for JetBrains IDEA and one in the very early stages for Eclipse. The Frank brothers do all their coding in regular text editors.

The documents are very good. Probably, the best I’ve seen for any new language at this point and far better than much older “new” languages, such as D. The website is well organized and elegant; and the tutorials and “cookbook” entries are clean and plentiful. It’s difficult to assess language community size in general, but more so with Fan because it does not figure on Tiobe, due I suspect to the difficulty of teasing out data for a language named Fan. For this reason and for richer Google search results, there is a move afoot to change the name of the language. Nonetheless, the community is definitely small and active. The latter aspect is due to the responsiveness of the Frank brothers to users’ questions, requests, and defect reports.

Fan solves a lot of problems elegantly. If it continues growing as it has during the past year, I anticipate it will evolve into an attractive solution for some development organizations.

The biggest challenge right now is the early stage in which most IDE plugins are currently found. A second limitation, which is about to be fixed in the upcoming point release, is that libraries and binary modules are all placed by default in the same directory. The discussion on this point, found on the language's discussion boards, shows the attentive regard of the Frank brothers for their users as they kicked around various schemes, elicited comments, and posted thoughtful replies. It's one of the most spam-free, low-noise discussion groups I've been a part of in a long while. I expect good things from this language.

Monday, January 05, 2009

The Agile Rules in HP's Original Garage

According to a recent HP poster, these were the rules in Bill Hewlett and Dave Packard's famous garage:


  • Believe you can change the world.
  • Work quickly, keep the tools unlocked, work whenever.
  • Know when to work alone and when to work together.
  • Share tools, ideas. Trust your colleagues.
  • No Politics. No bureaucracy. (These are ridiculous in a garage).
  • The customer defines a job well done.
  • Radical ideas are not bad ideas.
  • Invent different ways of working.
  • Make a contribution every day. If it doesn’t contribute, it doesn’t leave the garage.
  • Believe that together we can do anything.
  • Invent.

Curiously, it sounds like something the agile guys might have written (had they not written the manifesto). I prefer this wording because of its greater applicability and more dynamic presentation.

    Thursday, November 13, 2008

    Bob Martin's "Clean Code" Reviewed

    I have gone through "Uncle Bob" Martin's new book, Clean Code, which is a lengthy presentation of rules that will help Java developers write better code. It's similar to Kent Beck's Implementation Patterns, except more code-fixated. Clean Code has some good points, but it contains several weaknesses that seem to have gone entirely unnoticed by the reviewers on Amazon. So, here's the scoop.

    First of all, it's well hidden, but the book is only partially written by Bob Martin. Many chapters are written by other consultants who work at Martin's company--many of whom I've never heard of. The one stand-out exception is Michael Feathers, whose chapter on error handling is one of the clearest in the book. I wish he had written more.

    The main body consists primarily of explaining various coding rules that Martin calls heuristics and to which he assigns coded abbreviations for later reference. Alas, unlike patterns that have meaningful names as shortcuts, Martin chooses meaningless notations such as C2 and G26. So, "the function should do nothing but compacting[G30]" is a shortcut for the author, but a pain for the reader who has to cross-reference these references repeatedly to know what Martin is talking about.

    Unlike Beck's book, there is no theoretical framework to Martin's prescriptions. The book is a series of examples from which he teases this rule and that. Because of this lack of framework there is a certain desultory aspect--the rules come in seemingly random order.

    Some of them make you want to leap up and clap. For example, his rule that Javadoc should not contain HTML. How many times I've come to the same conclusion! I want to read comments in code easily. The small lift that HTML brings to Javadoc pages is not in any way worth the difficulty it adds to the reading of comments in code. Bob Martin's one of the first persons I've encountered to say so unequivocally.

    Other rules are good, but later contradicted. For example, Martin states that you should never leave commented-out code in place. [C5] As he points out, no one knows why it's commented out and so it remains in place forever. However, later on in an example of refactoring code per his own rules, Martin comments out large blocks of code without an explanation of how that squares with his earlier advice. (p.374)

    Martin also uses questionable coding preferences. For example, all of his code uses indents of 2 columns. 2 columns? It makes every routine look like a solid chunk of code. It's clearly not a practice to be recommended.

    A large portion of the book is an example of Martin refactoring someone else's code. He takes a long piece from an OSS project and proceeds to "improve" it. I found this section uncompelling. Perhaps because in Fowler's masterpiece Refactoring, each refactoring magically transforms the code. By comparison, Bob Martin's work seems journeyman-like. I didn't find the initial code interesting nor did I find Martin's cleaned-up version luminous. I was expecting a before-and-after scenario that would make me sit up and take notice. Instead, the exercise felt preachy, condescending at times, and ultimately not terribly convincing.

    My last gripe addresses an inexcusable error: typos. There aren't many but they are frequent enough to be distracting. For example, Martin seemingly does not understand the difference between it's and its (p. 272, p. 296, among others). And his code contains typos too (p. 309). This carelessness erodes credibility. Books that preach quality should be flawless at the level of spelling and grammar.

    Overall, I think some organizations can use several of Martin's heuristics as a means of boosting their in-house coding standards. But I doubt that careful coders will find much of value. Those developers will be better served by Beck's Implementation Patterns, which is based on principles and so communicates much more information in fewer words. Since my review of Beck's book, I must confess my admiration for it has deepened, and it's the volume I would recommend if you're looking to write cleaner code.

    Sunday, September 28, 2008

    Banishing Return Status Codes

    The most enduringly popular post on this blog is Perfecting OO's Small Classes and Short Methods, which presents a short series of stringent guidelines to help an imperative-trained developer master OO.

    If I were to add one item to the list, it would be: Don't use return codes to indicate the status of an action. Developers trained in languages such as C have the habit of using return codes to indicate the success or the nature of failure of the work done by a function. This approach is used because of the lack of a structured exception mechanism. But when exceptions are part of the language, the use of status codes is a poor choice. Among the key reasons are: many status codes are easily ignored; developers will expect problems to be reported via the exception mechanism; exceptions are much more descriptive. And finally, exceptions enable return codes to be used for something useful--namely returning a data item.
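A minimal Java sketch of the contrast follows. The port-parsing task, the class name, and the method names are hypothetical, chosen only to show the two styles side by side.

```java
/** Hypothetical example contrasting a C-style status code with an exception. */
class StatusCodeDemo {

    /** C-style: returns -1 on bad input, which callers can silently ignore. */
    static int parsePortWithStatus(String text) {
        try {
            int port = Integer.parseInt(text.trim());
            return (port >= 1 && port <= 65535) ? port : -1;
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    /** Exception style: a returned value is always a usable port number. */
    static int parsePort(String text) {
        int port;
        try {
            port = Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("Not a number: '" + text + "'", e);
        }
        if (port < 1 || port > 65535) {
            throw new IllegalArgumentException("Port out of range: " + port);
        }
        return port;
    }

    public static void main(String[] args) {
        // The status code says only that something failed, and is easy to overlook.
        System.out.println(parsePortWithStatus("eighty"));
        try {
            parsePort("eighty");
        } catch (IllegalArgumentException e) {
            // The exception cannot be ignored by accident and says what went wrong.
            System.out.println(e.getMessage());
        }
    }
}
```

In the second version the return value carries only data, so a caller can never mistake an error marker for a port number.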

    Astute readers will note that in Java, null is frequently used as a return value to indicate a problem (as in Collections). This practice subverts the previous points, and it too should be avoided. Returning a null presents code with many problems it should not have to face. The first is the risk of a null-pointer blow-up because the return value was accessed without being checked. This leads to the code bloat of endless null value checks. A much better solution, which avoids this problem, is to return an empty item (empty string, empty collection, etc.). This too communicates that no data item fulfilled the function's mandate, but it does not risk the null-pointer problem, and it frequently requires no special code to handle the error condition.
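Here is a short Java sketch of the empty-item approach. The class and method names are invented for the example. The method returns an empty list when nothing matches, so the caller's loop simply runs zero times and needs no null check.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical example: signal "nothing found" with an empty list, never null. */
class EmptyResultDemo {

    /** Returns the words that start with the prefix; the result is never null. */
    static List<String> wordsStartingWith(List<String> words, String prefix) {
        List<String> matches = new ArrayList<>();
        for (String word : words) {
            if (word.startsWith(prefix)) {
                matches.add(word);
            }
        }
        return matches; // empty, not null, when nothing matched
    }

    public static void main(String[] args) {
        List<String> words = List.of("groovy", "grails", "java");

        // No null check needed: with no matches, the loop body never runs.
        for (String word : wordsStartingWith(words, "x")) {
            System.out.println(word);
        }
        System.out.println(wordsStartingWith(words, "gr").size()); // prints 2
    }
}
```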

    Hence, if your OO code is characterized by heavy reliance on return codes (many of which I am certain are not checked), consider rewriting it in favor of exceptions and use return statements solely for returning non-null data items.

    Monday, September 01, 2008

    A Parameter-Validation Smell and a Solution

    Last week, Jeff Fredrick and I did a day-long code review of Platypus. We used a pair-programming approach, with Jeff driving and I helping with the navigation. Eventually, we got into the input parser, which parses input lines into a series of tokens: text, commands, macros, and comments. Macros can require a second parsing pass, and commands often require additional parsing of parameters.

    Once you get a parser working well (that is, it passes unit and functional tests, and it handles errors robustly), you generally don't want to mess with refactoring it. Experience tells you that parsers have hideous code in them and wisdom tells you to leave it alone. However, we launched in.

    A frequent cause of otiose code was my extensive parameter checking. Parameters were validated at every step as tokens passed through multiple levels of parsing logic. Likewise, the movement of the parse point was updated multiple times as the logic resolved itself back up the processing stack. This too had to be validated repeatedly.

    Jeff came up with an elegant refactoring that I could not find in the usual sources. He created an inner class consisting of the passed variables, a few methods for validating them, and a few more methods for manipulating them.

    This class was then passed to the methods in lieu of the individual parameters--thereby reducing the number of parameters to one or two. And because the class constructor verified the initialization of the fields, I needed only to check whether the passed class was null, rather than validate each of the internal fields.
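The shape of this refactoring can be sketched as follows. This is not the actual Platypus code: the class names, fields, and the word-reading method are hypothetical stand-ins meant only to show validation moving into the bundled class.

```java
/** Hypothetical sketch of bundling parser parameters with their validation. */
class TokenParser {

    /** Holds the values that used to be passed, and re-validated, separately. */
    static final class ParseContext {
        private final char[] input;
        private int parsePoint;

        ParseContext(char[] input, int parsePoint) {
            if (input == null) {
                throw new IllegalArgumentException("input is null");
            }
            this.input = input;
            moveTo(parsePoint);
        }

        /** Moves the parse point; the range check now lives in one place only. */
        void moveTo(int newPoint) {
            if (newPoint < 0 || newPoint > input.length) {
                throw new IllegalArgumentException("parse point out of range: " + newPoint);
            }
            parsePoint = newPoint;
        }

        boolean atEnd() { return parsePoint == input.length; }

        /** Returns the current character and advances; call only when not atEnd(). */
        char next() { return input[parsePoint++]; }

        int parsePoint() { return parsePoint; }
    }

    /** Reads one whitespace-delimited word; only the context needs a null check. */
    static String readWord(ParseContext ctx) {
        if (ctx == null) {
            throw new IllegalArgumentException("ctx is null");
        }
        StringBuilder word = new StringBuilder();
        while (!ctx.atEnd()) {
            char c = ctx.next();
            if (Character.isWhitespace(c)) {
                break;
            }
            word.append(c);
        }
        return word.toString();
    }

    public static void main(String[] args) {
        ParseContext ctx = new ParseContext("cmd arg".toCharArray(), 0);
        System.out.println(readWord(ctx)); // prints cmd
        System.out.println(readWord(ctx)); // prints arg
    }
}
```

Because an invalid context cannot be constructed, every method that receives one can trust its contents and skip the per-field checks.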

    The effect was to reduce complexity of already complex code, enforce DRY, and place the validation of the variables inside a class that contained them--a set of small, but important improvements. And like many of the best refactorings, it seems obvious in retrospect.

    So, if you find your class's methods are repeatedly validating the same parameters, try bundling them in an inner class along with their validation logic. You'll like the results.

    Tuesday, June 03, 2008

    The Handiest Java Book in Years.


    One of the constant challenges I have as a Java developer is keeping up with the numerous good FOSS dev tools. I no sooner start testing one tool and adapting my project to it, when a new one comes along. Being an analyst and naturally curious, this new product (or new release) represents a constant temptation. Is it better than what I am using? How much effort is required to try it out? What does it do better? On and on.

    I can put a lot of those concerns to rest now. I just received a copy of Java Power Tools from O'Reilly and it's exactly what I've been looking for. It contains deep explanations of the principal FOSS dev tools in 10 major categories. These explanations are not two- or four-page summaries, but in-depth expositions that provide crucial info on the strengths and weaknesses of the product. The author, John Smart, then provides a detailed tutorial on using the product. It's clear he's spent lots of time exploring the dark corners of each tool. And he makes good use of that knowledge in his comparisons and comments on the products.

    If you want to spend an hour or so coming up to speed on what a product is about before installing it (and without having to work through the usually limited docs), this book will get you there faster and enable you to get an overview of a whole lot of tools quickly and with the assurance you have a clear understanding. Here are the tools that are covered, followed by the number of pages for each one in parentheses:

    BUILD TOOLS: Ant (55), Maven (60)
    SCM: CVS (20), Subversion (78)
    CI: Continuum (24), Cruise Control (19), LuntBuild (32), Hudson (19)
    IM: Openfire (12)
    UNIT TESTING: JUnit (20), TestNG (25), Cobertura (17)
    OTHER TESTING: StrutsTestCase (10), DbUnit (44), JUnitPerf (10), JMeter (20), SoapUI (22), Selenium (30), Fest (9)
    PROFILING: with Sun tools (16), with Eclipse (15)
    DEFECT MANAGEMENT: Bugzilla (20), Trac (35)
    QUALITY: Checkstyle (20), PMD (18), FindBugs (12), Jupiter (18), Mylyn (14)

    All told, 856 pages of crisp, well-written explanations. A must-have reference for the bookshelf.

    Thursday, May 22, 2008

    Is the popularity of unit tests waning?

    Before getting into my concerns about whether unit testing's popularity has peaked, let me state that I think unit testing is the most important benefit wrought by the agile revolution. I agree that you can write perfectly good programs without unit tests (we did put man on the moon in 1969, after all), but for most programs of any size, you're likely to be far better off using unit tests than not.

    The problem is that only a small subset of developers understand that. And recent data points suggest that the number of programmers who use unit tests is not exactly growing quickly. I'll list some of the data points below that I've been developing for my column in SD Times.

    1) Commercial products on the wane. Agitar was a company whose entire fate was tied to the popularity of unit testing. Despite very good products, a free service to auto-generate unit tests for your code, and some terrific exponents (especially Alberto Savoia and Jeff Frederick) to tell their story, the company closed down a few weeks ago, essentially having come to the conclusion that it could never be sold at a price that could repay investors. So rather than ask for more funding, it closed down. If unit testing were gaining popularity robustly, Agitar surely would have come to a different conclusion.

    2) Few OSS products. Except for the xUnit frameworks themselves, few FOSS tools for unit testing have been adopted. The innovative Jester project, which built a tool that looked for untested or poorly tested logic, essentially stopped development a long time ago because, to quote the founder, Ivan Moore, in a comment to me, "so few sites are into unit testing enough to care about perfecting their tests."

    3) Major Java instructors aren't teaching it. Consider this interview with Cay Horstmann, co-author of the excellent Core Java books. (He asks, "If so many experienced developers don't write unit tests, what does that say?" In speculating on an answer, he implies that good developers don't need unit tests. Ugh!)

    4) Unit testing books are few and far between. I am seeing about one new one a year. And as yet, not a single book on JUnit 4, which has been out for nearly three years(!).

    5) Alternative unit-testing frameworks, such as the excellent TestNG, are essentially completely invisible. I was at a session on scripting this spring at SD West and in a class of 30 or so, two people had heard of TestNG (the teacher and I).

    I could speculate on causes, but I have no clear culprit to point to. Certainly, unit testing needs to be evangelized more. And evangelized correctly. The folks who insist on 100% code coverage are making a useful tool unpalatable to serious programmers (as discussed here by Howard Lewis Ship, the inventor of Tapestry). But, I think the cause has to be something deeper than this. I would love to hear thoughts from readers in real-world situations where unit testing has been abandoned, cut back, or simply rejected--and why.

    It would be a shame to have unit testing disappear and its current users viewed as aging, pining developers hankering for a technology the world has largely passed by. That would return programmers to the tried-and-true practice of glassy-eyed staring at a debugger for hours--something I have not missed at all.

    Friday, April 25, 2008

    Knuth Interview Posted

    My interview with Donald Knuth is now posted. It's a long piece, that has some unusually interesting points, including:

    - why Knuth doesn't believe in designing code for reuse
    - he's most unconvinced of multithreading and multicore on the desktop
    - discussion of the tools he uses to program and write (including Ubuntu)
    - etc.

    A very fun read (and a fun interview to do).

    Wednesday, April 23, 2008

    Perfecting OO's Small Classes and Short Methods

    In The ThoughtWorks Anthology, a new book from the Pragmatic Programmers, there is a fascinating essay called “Object Calisthenics” by Jeff Bay. It’s a detailed exercise for perfecting the writing of the small routines that characterize good OO implementations. If you have developers who need to improve their ability to write OO routines, I suggest you have a look-see at this essay. I will try to summarize Bay’s approach here.

    He suggests writing a 1000-line program with the constraints listed below. These constraints are intended to be excessively restrictive, so as to force developers out of the procedural groove. I guarantee if you apply this technique, their code will move markedly towards object orientation. The restrictions (which should be mercilessly enforced in this exercise) are:

    1. Use only one level of indentation per method. If you need more than one level, you need to create a second method and call it from the first. This is one of the most important constraints in the exercise.

    2. Don’t use the ‘else’ keyword. Test for a condition with an if-statement and exit the routine if it’s not met. This prevents if-else chaining; and every routine does just one thing. You’re getting the idea.

    3. Wrap all primitives and strings. This directly addresses “primitive obsession.” If you want to use an integer, you first have to create a class (even an inner class) to identify its true role. So zip codes are an object not an integer, for example. This makes for far clearer and more testable code.

    4. Use only one dot per line. This step prevents you from reaching deeply into other objects to get at fields or methods, and thereby conceptually breaking encapsulation.

    5. Don’t abbreviate names. This constraint avoids the procedural verbosity that is created by certain forms of redundancy—if you have to type the full name of a method or variable, you’re likely to spend more time thinking about its name. And you’ll avoid having objects called Order with methods entitled shipOrder(). Instead, your code will have more calls such as Order.ship().

    6. Keep entities small. This means no more than 50 lines per class and no more than 10 classes per package. The 50 lines per class constraint is crucial. Not only does it force concision and keep classes focused, but it means most classes can fit on a single screen in any editor/IDE.

    7. Don’t use any classes with more than two instance variables. This is perhaps the hardest constraint. Bay’s point is that with more than two instance variables, there is almost certainly a reason to subgroup some variables into a separate class.

    8. Use first-class collections. In other words, any class that contains a collection should contain no other member variables. The idea is an extension of primitive obsession. If you need a class that subsumes the collection, then write it that way.

    9. Don’t use setters, getters, or properties. This is a radical approach to enforcing encapsulation. It also requires implementation of dependency injection approaches and adherence to the maxim “tell, don’t ask.”

    Taken together, these rules impose a restrictive encapsulation on developers and force thinking along OO lines. I assert that anyone writing a 1000-line project without violating these rules will rapidly become much better at OO. They can then, if they want, relax the restrictions somewhat. But as Bay points out, there’s no reason to do so. His team has just finished a 100,000-line project within these strictures.
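
    To make a few of these rules concrete, here is a minimal sketch of my own (it is not taken from Bay's essay). It illustrates only rules 2, 3, 8, and 9, and makes no attempt to honor the others. The class names ZipCode and DeliveryArea, and their methods, are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Rule 3: wrap the primitive/string so its role is explicit (hypothetical class).
final class ZipCode {
    private final String digits;

    ZipCode(String digits) {
        // Rule 2: a guard clause that exits early, rather than an if/else chain.
        if (digits == null || !digits.matches("\\d{5}")) {
            throw new IllegalArgumentException("A zip code is five digits");
        }
        this.digits = digits;
    }

    // Rule 9: the object answers a question itself; there is no getter for the field.
    boolean sameAs(ZipCode other) {
        return digits.equals(other.digits);
    }
}

// Rule 8: a first-class collection; the list is the only member variable.
final class DeliveryArea {
    private final List<ZipCode> served = new ArrayList<>();

    void serve(ZipCode zip) {
        served.add(zip);
    }

    // Callers tell the area what they want to know instead of pulling the list out.
    boolean covers(ZipCode zip) {
        for (ZipCode candidate : served) {
            if (candidate.sameAs(zip)) return true;
        }
        return false;
    }
}
```

    Note that the loop in covers() nests an if inside a for, so a strict reading of rule 1 would push that test into a helper method of its own.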

    Monday, April 07, 2008

    Easy Does It With easyb

    I just got back from the CITcon conference, which is the thrice-yearly confab of agile developers who use continuous integration (the "CIT" in the conference name). This was my second time at CITcon. It's an open-space conference that is--surprise!--free, and chock-a-block full of good information. The principal reason it's so informative is that anyone committed enough to CI to go to a conference has probably spent a lot of time thinking about how to solve problems of build and test at his/her site. And this concern and reflection on these issues is amply evident in the discussions in the hallways and the informal presentations.

    All the sessions I attended were thought-provoking. But probably the most interesting was a presentation by Andy Glover, the president of Stelligent, an agile consultancy. He runs a great blog in which he has been touting a tool called easyb, which enables you to script unit tests so that they describe a scenario (rather than a code feature) and then test for the expected result. I've read Andy's enthusiasm for easyb, but it wasn't until I saw him demo it that I understood what the excitement was about.

    The key benefits are 1) you can show a non-programmer (like the manager who is expecting the software any day now) that you have written tests that match every one of his requirements--easyb enables you to do this by writing the test in near English language; 2) you can test at a slightly higher level than the unit test: rather than test tiny features individually, you can quite easily test a succession of conditions that are chained together.

    This approach is called--a little misleadingly--behavior-driven development, which was an immediate turn-off for me. I really don't want to learn another x-driven development. I just want to do what I do better. And I think easyb might just be such a tool. So, don't worry about the name, and hop over to the easyb website for a quick look-see. You'll like what you find.

    Monday, March 31, 2008

    Great Reference For Ruby


    Ruby aficionados have been working for the last few years under a serious handicap: there was no good, up-to-date reference on their favorite language. Sure, the Pickaxe book provided some guidance, but it's a hybrid work--part tutorial, part reference. And the reference section was a summary, rather than an in-depth exposition.

    Ever-dependable O'Reilly just released The Ruby Programming Language, which is without a doubt the definitive Ruby reference. Not only is it co-authored by Yukihiro "Matz" Matsumoto, the inventor of Ruby, but it is superbly well edited, so that every page is full of useful information presented clearly. And at more than 400 pages, that's a lot of information. Couple this book with The Ruby Cookbook, which I reviewed on this blog, and you have probably the best 1-2 combination for learning and using Ruby.

    Tuesday, February 19, 2008

    Restarting the Platypus And the Lessons Learned

    As many of you know, I have spent much of my free time during the last 24 months working on an open-source project called Platypus. The project's goal is to implement a command language like TeX, which enables users to embed formatting commands directly into text and generate documents of typeset quality in PDF, Microsoft Word, and HTML. The aims of Platypus are to be much easier to use than TeX and to provide many features of interest to developers, especially for printing code and listings.

    After approximately 20,000 lines of Java code and comments, I have concluded that I need to restart and re-architect the project. The more I code, the more I see that I am adding top floors to a leaning tower. Eventually I'll topple it. So by restarting Platypus, I hope to straighten out architectural shortcomings and deliver a better, more expandable product more quickly.

    In the process of coming to this decision, I have been able to crystallize several key lessons, a few of which I could probably have foreseen.

    PROJECT AND DESIGN LESSONS

    1) It's extremely difficult to figure out where your architecture is deficient if you have never done the kind of project you're currently undertaking. The best you can do is lay out some basic architecture, abide by good dev practices, and learn as you go. Alas, as in this case, it took 20K lines of work to recognize that the architecture was irretrievably flawed and how.

    2) First, do the known hard parts that can't be avoided. In the case of Platypus, I knew from the get-go I wanted a full programming language for the user to employ (the lack of which is one of the major failings of TeX). Early on, I decided that language would be JavaScript (JS). And having decided that and knowing that Java 6 had built-in support for JS, I put the issue aside for later implementation. When I revisited implementing JS, I realized that my command syntax no longer worked well with JS and that some commands would have been better implemented in JS. Had I written code and worked with embedded JS from the beginning, I could have avoided these dissonances, and I would have experienced Platypus more from the perspective of the user.

    3) If you have to write a parser, write it just once. I wrote a parser for basic and compound commands, but did not anticipate intricacies of the syntax for very complex commands (think of commands to specify a table, for example). When it came time to add these commands, I found myself undoing a lot of parser work and then trying to back-fit existing syntax into the new grammar. Parsers are a nightmare to get right, so make sure you write them just once. This means planning all your syntax ahead of time and in great detail.

    4) Learn how to present your project crisply. Everyone understands writing a debugger for a hot new language is a cool project that will attract contributors. But projects where there is no immediately identifiable built-in community require crisply articulated messages. I did not do this well.

    PROGRAMMING LESSONS:

    a) Design Java classes around dependency injection. This will make the classes work better together and help you test them well. I am not saying use a DI framework, which is overkill in many situations, just use the DI pattern.

    b) When it comes to modularity for input and output processing, plug-ins are an excellent design model. They form a very convenient way to break projects into sub-projects, and they make it easier for contributors to work on a discrete part of the project.

    c) Unit testing delivers extra value when you're writing difficult code. The ability to be deep in the parser and test for some specific semantic occurrence right away is pure gold. Unit testing can also show up design errors. At one point, I was implementing a whole slew of similar commands (relating to output of special characters). I'd made each special character its own class; so each character required: copying a class, renaming it, and changing two strings inside it. Then rinse, lather, repeat. Likewise, my unit tests were just copied, tweaked, and run. This obviously is not a great use of unit tests (what are you actually testing? your ability to tweak a test template?). This was a good clue that the design needed revisiting. And in fact, it did. In the new design, one class will handle all special characters.
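
    As a rough illustration of that last point, one table-driven class can replace a pile of near-identical ones. This is only a sketch under my own assumptions, not the actual Platypus design; the class name SpecialCharacters and the command names are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: one table-driven class instead of one class per character.
final class SpecialCharacters {
    private final Map<String, String> outputByCommand = new HashMap<>();

    SpecialCharacters() {
        // Command names are invented for illustration; they are not Platypus syntax.
        outputByCommand.put("copyright", "\u00A9");
        outputByCommand.put("registered", "\u00AE");
        outputByCommand.put("euro", "\u20AC");
    }

    // Returns the text to emit for a command, or null if the command is unknown.
    String outputFor(String command) {
        return outputByCommand.get(command);
    }
}
```

    With this shape, adding a character is one more table entry, and the unit tests can concentrate on the lookup logic rather than on a copied template per character.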

    OBSERVATIONS ABOUT TOOLS

    1) Of all the Java IDEs I've used, I like IntelliJ IDEA best. And since I review IDEs regularly for InfoWorld, I've used lots of them including the high-priced pups. IDEA provides the best user experience, bar none. However, as the number of classes in Platypus climbed towards 200, I noticed IDEA became very slow. Clicking on a file to haul it into an edit window was a tedious process, sometimes taking 30 seconds or more (even with v. 7 of the product). Because of this, I will look elsewhere at restart. I expect to use NetBeans 6 (a hugely improved IDE, by the by) at least initially. We'll see how that works out.

    2) I switched from Ant to Maven while working on Platypus. Maven is a much better solution for many build steps than Ant. See my blog post on this. However, I dislike both products. I find that I still have to waste lots of time doing simple configurations. Also, I don't like using XML for configuring builds. I generally concur with Tapestry's Howard Ship, that Ivy plus some other tool might be a better solution. I'll explore this as I go.

    3) Continuous Integration (CI) is a good concept and there are truly great tools out there. But outside of providing historical data about a project, CI's value is limited on a one-developer project. This is especially true when builds are already being done on a separate machine and only code that has passed tests is checked into the repository. (Nonetheless, the historical data is reason enough to continue using it.)

    There are surely other lessons to be learned, and as they come to me, I'll post them on this blog if they seem useful.

    Words of Thanks

    It would be quite wrong to end this post without pausing to deeply thank Jeff Frederick, who was exceedingly generous with his time and insights while I worked on this first phase and who hastened my realization of several important aspects that I've touched on in this post. Thank you!

    Friday, January 25, 2008

    Internal USB Ports: What do you think they're for?

    Earlier this week, I was being briefed by HP about some recently released workstations. As we were moving through the slide-deck, a small item caught my attention: one workstation claimed to have 2 USB ports on the front panel, 6 on the back, and 2 marked "internal." Why, I asked, would anyone want an internal USB port on a PC? Care to guess?

    The answer is: for dongle keys. Yeah, they're still around and they use USB form factors. The internal aspect is interesting. It's designed so you can insert the dongle, lock the PC and nobody walks off with the dongle key.

    I honestly would never have guessed.

    Tuesday, January 22, 2008

    Excellent Explanation of Dependency Injection (Inversion of Control)

    I've read lots of explanations of Dependency Injection or DI (formerly known as Inversion of Control) and the associated Hollywood Principle ("Don't call us, we'll call you."). They all tend to be unclear, either because they delve immediately into highly detailed explanations, or they tie the explanation specifically to one particular technology, such that either the pattern is lost or its simplicity is. Here is the clearest explanation I've found--slightly edited for brevity (from the very good Spring in Action, 2nd. Ed. by Craig Walls):

    "Any nontrivial application is made up of two or more classes that collaborate with each other to perform some business logic. Traditionally, each object is responsible for obtaining its own references to the objects it collaborates with (its dependencies). When applying DI, the objects are given their dependencies at creation time by some external entity that coordinates each object in the system. In other words, dependencies are injected into objects."

    I find that very clear.

    Dependency Injection was originally called Inversion of Control (IoC) because the normal control sequence would be the object finds the objects it depends on by itself and then calls them. Here, this is reversed: The dependencies are handed to the object when it's created. This also illustrates the Hollywood Principle at work: Don't call around for your dependencies, we'll give them to you when we need you.

    If you don't use DI, you're probably wondering why it's a big deal. It delivers a key advantage: loose coupling. Objects can be added and tested independently of other objects, because they don't depend on anything other than what you pass them. When using traditional dependencies, to test an object you have to create an environment where all of its dependencies exist and are reachable before you can test it. With DI, it's possible to test the object in isolation passing it mock objects for the ones you don't want or need to create. Likewise, adding a class to a project is facilitated because the class is self-contained, so this avoids the "big hairball" that large projects often evolve into.
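
    A small sketch may help here. It is plain constructor injection with no framework, and every name in it (RateSource, PriceConverter, FixedRateSource) is hypothetical, chosen only to illustrate the pattern.

```java
// The dependency, expressed as an interface (all names here are hypothetical).
interface RateSource {
    double rateFor(String currency);
}

// Without DI, this class would construct or look up its own RateSource.
// With DI, the dependency is handed to it at creation time.
final class PriceConverter {
    private final RateSource rates;

    PriceConverter(RateSource rates) {
        this.rates = rates;
    }

    double convert(double amount, String currency) {
        return amount * rates.rateFor(currency);
    }
}

// In a test, a hand-written mock stands in for the real rate service.
final class FixedRateSource implements RateSource {
    public double rateFor(String currency) {
        return 2.0; // the same rate for every currency, which keeps the test predictable
    }
}
```

    A test can now build new PriceConverter(new FixedRateSource()) and check convert(10.0, "EUR") without any network, database, or configuration in place, which is the loose coupling described above.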

    The challenge of DI is writing an entire application using it. A few classes are no big deal, but a whole app is much more difficult. For entire applications, you frequently want a framework to manage the dependencies and the interactions between objects. DI frameworks are often driven by XML files that help specify what to pass to whom and when. Spring is a full-service Java DI framework; other lighter DI frameworks include NanoContainer and the even more lightweight PicoContainer.

    Most of these frameworks have good tutorials to help beginners find their way.

    Wednesday, January 09, 2008

    Use Virtualization to Avoid Malware While WebSurfing

    In presentations at Infoworld's Virtualization Summits (slides here), I have repeatedly discussed how virtualization can prevent malware infections when you surf the web. The idea is to surf and do all transactions from inside a VM. Most attendees listen to this suggestion, but they seem primarily to be waiting for me to move on to the meat of my talk. I suspect they don't take the advice to heart because they feel they have various utilities on the alert for viruses and malware infections. However, as we see here, even well-known companies, such as Sears and Kmart, install key loggers and malware that route private data to third parties. This means that even if you go only to sites you believe are known good, you can still be infected with malware.

    By browsing from within a VM, you protect yourself against many malicious packages. In the ideal scenario, you use two VMs: One for important transactions where security is paramount (online banking, investment accounts, etc.) and another for all other browsing.

    If either VM becomes infected, delete it, make a clone of the master VM, and resume browsing. Periodically, you should throw out the "just browsing" VM and bring over a clean instance, so that any undetected stealth malware is disposed of. You'll need to bring over your bookmarks file when you swap VMs or, if you prefer, you can use any of the tag services (del.icio.us and the like) to maintain your list of favorites.

    I use VirtualPC from Microsoft, which can be downloaded for free. You can use it to run a Windows VM, but you need to make sure you have valid licenses for those VMs. (Actually, until April 1, you need no license at all. You can download a Windows VM with IE installed directly from Microsoft.) Using a UNIX/Linux VM is an alternative approach that provides three advantages over Windows: licenses are free, the VMs are smaller (less than the 750MB Windows needs, typically), and malware writers rarely target Linux, so your VMs stay cleaner/safer longer.

    One version of Linux you can't use for this purpose, though, is Ubuntu, surprisingly. It does not install correctly on Microsoft Virtual PC. Despite a wealth of tips, I have not been able to find a way to get it to run. However, Novell SUSE works fine. And I am sure other distros do too.

    Anyway, this rarely discussed use of virtualization enables me to surf with impunity and with no fear of being hijacked.

    Tuesday, December 11, 2007

    Beautiful Code vs. Readable Code



    For many years--decades actually--I was a big fan of beautiful code. I read almost everything by Brian Kernighan, Jon Bentley, and P. J. Plauger. This passion for elegant code was an attempt to re-create the rush I felt when I first read:

    *x++ = *y++

    in the C Programming Language. I'd never seen anything so beautifully succinct. It was luminous!

    But as years passed, I read many clever algorithms, many impressive optimizations, many small tricks. And I got less and less charge from each of these discoveries. The reason, quite frankly, is that they almost always fell into one of two categories: some very elegant expressiveness in a new language (Ruby converts from Java can attest to this) or a technique that I'm not likely to ever use. In other words, I was chasing baubles.

    In time, my esthetic sense turned to code clarity for its jollies. Today, if I can pick up a blob of complex code, read it in one pass, and accurately understand what it's doing; then I feel the rush again. I most often have this feeling when reading the code of great, non-academic developers. To be honest, when reveling in such moments, I frequently have the perception that my code is not like theirs. Even my best code doesn't quite snap together like theirs does. And I have wondered what I could do to improve my code clarity.

    Kent Beck's new book, Implementation Patterns, is a short handbook on code clarity. I have read much of it and already I recognize some bad habits that undermine my code's readability. Beck basically looks at typical coding issues and dispenses sage advice.

    This means that some recommendations are best suited to beginners, while there are just enough of the other more subtle suggestions to keep the attention of a veteran who cares about clarity. For example, one poor habit I have developed without thinking about it too much is mixing levels of abstraction in the same method. So, for example (using Beck's example):

    void process() {
        input();
        count++;
        output();
    }

    Here the second statement is clearly at a different level of abstraction than the other two, which makes the code harder to read quickly. Beck proposes the following, which I agree is clearer.

    void process() {
        input();
        tally();
        output();
    }

    There are many other habits of mine that this book has illuminated. And in the half a dozen changes it will bring to my style, I think I have derived more benefit than in all the essays I've read on beautiful code.

    Before leaving off, however, I should point out two caveats. The beginner-to-intermediate material dominates; so you'll need to skim over large parts of the text to extract the valuable nuggets. (However, this aspect makes it a great gift for junior programmers at your site.) A second point is that the book lacks for good editing. A book on code clarity should be pellucid; this one is not. (Consider the use of the word 'patterns,' which is highly misleading. It's not at all about patterns.) But these are forgivable issues. The book is a useful read if you share my appreciation of clear code.

    Sunday, December 09, 2007

    Commercial CI server now available free

    JetBrains, the folks behind many well-loved dev tools (such as IntelliJ IDEA and ReSharper) have announced that TeamCity 3.0, the just-released version of their continuous-integration server, is now available for free for teams of up to 20 developers. TeamCity's stock in trade is ease of use (and particularly good integration with IntelliJ IDEA).

    Sunday, November 25, 2007

    The Fallacy of 100% Code Coverage

    While I love RSS for aggregating feeds from various blogs, nothing beats having an expert combing through articles and posts, culling the best ones. Few people, if any, do that culling better for software development than Andy Glover of Stelligent. His blog posts weekly collections of interesting links to tools and agile development topics. It's one of the first RSS items I read. (Fair warning, Glover makes his selections livelier by describing them with terms from the disco era.)

    A meme that has appeared recently in his links is a curious dialog about the value of 100% code coverage in unit tests. It was not until I read those posts that I realized there were people still unclear on this issue. So, let's start at the foundational level: 100% code coverage is a fallacious goal. Unit testing is designed to provide two principal benefits: 1) validate the operation of code; 2) create sensors that can detect when code operation has changed, thereby identifying unanticipated effects of code changes. There is no point in writing tests that do not fulfill one of the two goals. Consequently, a getter or setter should not be the target of a unit test:

    public void setHeight( float newHeight )
    { height = newHeight; }

    This code cannot go wrong (unless you believe that your language's assignment operator doesn't work consistently ;-). Likewise, there is no benefit in a test as a sensor here. The operation of this code cannot change. Hence, any time spent writing a unit test for this routine is wasted.

    Developers who can't resist the allure of 100% code coverage are, it seems, seduced by one of two drivers:

    1) their coverage tool gives classes with 100% code coverage a green bar (or other graphical satisfaction). Describing this phenomenon, Cedric Beust, the author of the TestNG testing framework, writes in his just released (and highly recommended) book, Next Generation Java Testing, that this design of coverage tools is "evil." (p. 149) And later, he correctly piles on with: "Even if we were stupid enough to waste the time and effort to reach 100% code coverage..." The problem with coding for graphical trinkets is explained in the next driver.

    2) if developers attain 100% code coverage--even at the cost of writing meaningless tests--they can be certain they haven't forgotten to test some crucial code. This viewpoint is the real illusion. By basing proper testing on 100% code coverage, the developer has confused two issues. It's what you're testing and how (so, quality) that determines code quality, not numerical coverage targets (quantity). Capable writers of unit tests know that some routines need validation through dozens of unit tests. Because these tests repeatedly work the same code, they don't result in greater test coverage, but they do result in greater code quality. By pushing for an artificial goal of 100% a developer is incentivized against writing multiple tests for complex code, in order to have the time to write tests for getters and setters. That surely can't be right.
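
    To illustrate the point about repeated tests, here is a small sketch of my own (the class names are hypothetical). A single check is enough for a line-coverage tool to count the one statement in isLeap() as covered, yet it takes all four checks to validate the logic.

```java
// A one-statement routine: any single call executes the whole line.
final class LeapYears {
    static boolean isLeap(int year) {
        return year % 4 == 0 && (year % 100 != 0 || year % 400 == 0);
    }
}

// Four checks that all work the same line of code. The last three add nothing
// to a line-coverage figure, but they are the ones that catch a botched century rule.
final class LeapYearChecks {
    static void run() {
        check(LeapYears.isLeap(2008), "2008 is divisible by 4");
        check(!LeapYears.isLeap(2007), "2007 is not divisible by 4");
        check(!LeapYears.isLeap(1900), "1900 is a century not divisible by 400");
        check(LeapYears.isLeap(2000), "2000 is a century divisible by 400");
    }

    private static void check(boolean ok, String what) {
        if (!ok) throw new AssertionError(what);
    }
}
```

    The time spent on the three "redundant" checks is exactly the time a 100% target would rather see spent on getters and setters.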

    Wednesday, November 07, 2007

    Great Book for Programming Fonts




    As I've learned from working on Platypus, programming font operations is one of the most complex and convoluted areas of software development you're likely to run into...ever. It's like driving to the river of madness and drinking deeply, then walking around the desert naked for 40 days, all the while reassuring yourself that you must be making progress because you're still coding. It's awful.

    The reasons are complex and numerous. First among these is that file formats are capricious things. Microsoft and Adobe have both published numerous font formats--some in response to market needs, others for competitive reasons, still others because of internal pressures. The second problem is that these formats are designed for use by font experts, not by developers. They often include cryptic parameters, tables within tables, and absolutely nothing that is clear or obvious save the copyright notice.

    Third is the matter of encoding. There are numerous encodings of font characters. These too seem driven by reasons largely outside of need and formulated with no particular eye to future requirements. Try to figure out encodings for CJK fonts (Chinese, Japanese, Korean character sets), and you'll feel like walking around with your hair on fire. Even in simple encodings, difficulties arise. For example, Apple and Windows use different encodings in the basic character sets, which is why apostrophes in Mac-generated documents show up on some PCs as the euro symbol. Unicode? Foggettabout it. No font today implements close to all the characters. And those that come even halfway (none of which are free, as far as I'm aware) are huge multimegabyte propositions. In sum, fonts are a topic shot through and through with problems and treacherous details.
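
    For readers who want to see an encoding mismatch in action, here is one common way to produce a stray euro sign from an apostrophe. I am not claiming it is the exact Mac/Windows mismatch described above; this sketch assumes the text was written as UTF-8 and then read as windows-1252, and the class name ApostropheMangler is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: encode with one charset, decode with another.
final class ApostropheMangler {
    static String misread(String text) {
        // A typographic apostrophe (U+2019) becomes the bytes E2 80 99 in UTF-8.
        byte[] utf8Bytes = text.getBytes(StandardCharsets.UTF_8);
        // Read as windows-1252, each of those bytes turns into its own character:
        // E2 is a-circumflex, 80 is the euro sign, 99 is the trademark sign.
        return new String(utf8Bytes, Charset.forName("windows-1252"));
    }
}
```

    Calling ApostropheMangler.misread("\u2019") turns the single apostrophe into three characters, with the euro sign in the middle.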

    Until now, there has been no central reference that developers could turn to for help. Each new font (PostScript, TrueType, OpenType, etc.) required starting anew and learning the peculiarities from scratch. But a new 1000-page book by Yannis Haralambous, entitled Fonts & Encodings (from O'Reilly) has just appeared and it's the first real tie-line to sanity in the jungle of glyphs. It explains fonts, formats, and encodings in tremendous detail; along with elaborate discussions of tools. It is the defining book for technical users of fonts.

    Before I discuss two limitations, I want to reiterate that this is a great book and nothing I say should override this view. However, it's not a developer-oriented book. Except for some SVG XML and some TeX, there is little source code. So, information on how to access font data and use it to lay out documents programmatically or just to print text is still left as a challenge to the reader (though the book gets you most of the way there). The book also discusses MetaFont in too much detail, in my view, because this format, which is now little used, is extensively described by its inventor, Donald Knuth. I'd have preferred more coverage of bitmap fonts, say, than re-presenting this info. But these two items aside, this is the book to get if you ever have to do anything with fonts. It'll give you hope; real hope.

    Thursday, November 01, 2007

    Most Popular CI Servers (An informal survey)

    At CITCON Brussels last month, attendees were allowed to write up questions of any kind on a bulletin board and others could come by and post answers as they were moved to. One poll was: Which CI server do you use? The answers were, in order of popularity:

    CruiseControl
    Hudson
    Anthill Pro, TeamCity (tied)

    I don't have the exact vote counts, because the results were taken down before I could get the final tallies. But suffice it to say that CruiseControl received the lion's share, Hudson a handful, and Anthill Pro and TeamCity garnered 1 vote each.

    This survey, of course, is not scientific. Despite the fact that it was a CI conference, the voters were self-selecting, and ThoughtWorks, which is the company behind CruiseControl, was well represented at the conference. (It is in fact a sponsor of CITCON.) So, high CruiseControl figures would be expected. (Even without this factor, though, I expect it would have placed first due to its quality implementation, wide industry adoption, and its considerable lead in plug-ins to various tools and packages.)

    The Hudson numbers, however, are interesting and probably meaningful. Hudson is a comparative newcomer to CI. But it has been winning converts quickly due to its ease of use. If you have a small project or just want to test the waters of CI, Hudson might well be the server to use.

    Anthill Pro is a high-end CI server that can be found in two versions: an OSS version and a paid version. It was not until this conference, though, that I discovered these are completely different products. They were built from different codebases and the OSS version was a one-time release that is not updated.

    I was surprised that LuntBuild made no appearance, as not so long ago, its users were raving about its tremendous ease of use. Perhaps Hudson is stealing its thunder, or perhaps its users just weren't at CITCON. It's hard to say in a small poll.

    Monday, October 29, 2007

    CITCON Brussels 2007

    I recently returned from CITCON, the continuous-integration conference. It's held three times a year, once each in the US, Europe, and Asia. Even though CITCON lasts only one evening plus one full day, it was surely one of the most informative developer events I have ever attended. And I have been to a lot of developer conferences/shows/seminars.

    What made CITCON so productive were several unusual aspects:

    1) Registration is limited to 100 attendees, so it has a human-sized feel to it. Unlike some shows where there are several hundred attendees in a single session, at CITCON, by the third session, I knew several of the attendees in the room from previous sessions. By the end, I'd been in sessions with about half the attendees, it seemed.

    2) CITCON uses the open conference format, in which there is no pre-existing list of presentations. Rather, everyone gets together the first night and proposes topics they'd like to know more about or ones they'd like to present. Then attendees get to mark the ones they'd like to go to. The sessions with the most votes are slotted for the next day. You are free to drift in and out of the sessions as you wish. In future posts, I'll discuss this format in greater depth. However, it creates an interesting blend: some sessions are presentations, others are discussions. Of these, the discussions were consistently the most interesting. For example, one session I attended (I was one of six attendees) discussed the problem of delaying commits until after code reviews. How do you handle the opposing pressures of quality enforcement vs. timely commits? Various attendees explained what they had done at their sites and what had not worked. Finally, Patrick Smith from Agitar described a method he had used in consulting, and the session moved to analysis of the benefits of that approach. This kind of fruitful sharing is nearly impossible at regular tradeshows except at BoF sessions, which often still lack the give-and-take of a shared problem.

    3) The conference sessions all fit into one day. This remarkably changes your motivation. You show up early, you go to every session, you stay late, and you hang out afterwards for social hour to go over notes with other attendees, especially those who have interest/challenges in common with you.

    Focused, informative, and no talking heads. Very cool. I'll be back. Next instance is in Denver in April.

    Tuesday, October 09, 2007

    Continuous Integration Servers

    The authors of the recent book on Continuous Integration have undertaken a series of snapshots of CI servers, which will be a big help to everyone who is assessing CI.

    There are many CI servers to choose from, as this table amply demonstrates. For myself, I am undertaking evaluations of three options. After some preliminary research, I have decided on these finalists:

    • CruiseControl (the grand-daddy of them all and the defining CI server)
    • Continuum (because I work with Maven 2, this should be a good fit)
    • Hudson (admired for its ease of use)

    I'll report on my findings, although the results will be colored somewhat by how well these tools work with my current project, whose CI needs are clear, but modest. Partly in preparation, I will shortly be hanging with the cool cats at CITCON, a CI conference being held October 19-20 in Brussels. Attendance is free but limited to 100 attendees. CITCON is the brainchild of Jeff Frederick and Paul Julius of CruiseControl fame, but it typically draws cognoscenti and users of other CI servers. The format is open, meaning that it's more of a series of BoF sessions than pure lecture. I'll report on the good stuff in future posts for those who can't make it to Belgium in time.

    Thursday, September 27, 2007

    New Uses of Virtualization: Slides Here

    Earlier this week, I lectured at InfoWorld's Virtualization Summit on a topic that has interested me for a long time: uses of virtualization outside of the two principal use cases (server consolidation and developer testing of portability). Here is the slide deck from the presentation. It discusses security, training, demos, desktop consolidation, and virtual appliances, among other uses.

    Monday, September 17, 2007

    From Ant to Maven

    I spent part of the last week migrating Platypus from Ant to Maven 2. This is a migration I've been itching to do for a while. I don't much like Ant, because I find I spend far too much time struggling with it.

    Maven, by comparison, works on a "convention over configuration" model that centers on a specific build sequence and an expected file layout for your project. Understand these two, and Maven makes builds simple and very rich. For one, Maven downloads all dependencies for utilities and reports you want to run as part of your build cycle. There is no more wrestling with Ant's dependency errors. In addition, Maven's end product (beyond the build's binaries) is a website that it re-creates on each run; it loads the site with reports and data about your build. So, you and the team always know where things stand with the project.

    The one complaint I read about Maven 2 is that it's hard to find the info you need to set it up and use it. This is actually not the case, if you know where to look. Unfortunately, it takes a lot of digging before you find that two excellent 300-page PDF tutorials are available at no cost, plus a great introduction. So for those who need to know, here are the links to Maven support docs:


    Those resources should solve nearly any issue you encounter. In an upcoming column in SD Times, I describe in greater detail the benefits I have found in migrating from Ant to Maven. Try Maven, you'll like it!

    Sunday, September 09, 2007

    So much for strong passwords...

    Perhaps you think Fgpyyih804423 is a strong password. As discussed here, it took OphCrack 160 seconds to break it. That's scary!

    Sunday, September 02, 2007

    Ubuntu Everywhere?

    I have long felt that desktop Linux would become a reality only when you could go to a Linux gathering and find no more than a third of the attendees at the command line. In other words, as long as users are frequently at the command line, the OS is not ready for a big share of the desktop. Desktop users require ease of use.

    Earlier this summer, I was at O'Reilly's Ubuntu Live conference in Portland, and the Ubuntu tribe were almost all using the GUI. This inflection point confirms for me Ubuntu's claim as the desktop Linux distro. (The conference was especially enjoyable because of the lack of zealotry. It was simply a conclave of the interested with no excess of the us-against-the-world mentality--a factor which made it a far more rewarding experience.)

    Having secured its place on the desktop, Ubuntu is trying to move to the server, where competition is much more intense, and where the desktop origins could help as well as hurt. Time will tell.

    However, the desktop roots did not preclude Ubuntu's use in Microwulf, the first-ever supercomputer for less than $2500 and first-ever under the $100/Gigaflop threshold.

    For your friends who want to try Ubuntu, but who are not geeks, I highly recommend an approachable, not-too-techie intro: Ubuntu Linux for Non-Geeks: A Pain-Free, Project-Based, Get-Things-Done Guidebook from the ever-readable No Starch Press.

    Tuesday, August 21, 2007

    Good, approachable book on SOA

    SOA is becoming increasingly hot and lots of developers are wondering what they need to know to implement it without getting lost in the competing standards, the infinite implementation details, and the lack of robust tools.

    To the rescue comes SOA for the Business Developer from Ben Margolis, which presents the core technologies of SOA without doing the usual deep dive to the lowest levels of detail. This refreshing approach enables you to read about the five central technologies (XML, XPath, BPEL, and the upcoming SCA and SDO) without it being a massive effort. These technologies are presented clearly (the style is remarkably readable) and each is highlighted with a few key examples consisting of working code. The purpose is not to convert you into an expert in any of these, but to give you enough familiarity that you understand how the pieces work, how they fit together, and from there how to go about writing a simple SOA application, should you want to.

    The book is perfect for development managers who want to come up to speed on the SOA components and look at some code without getting dragged into minutiae. It's also just right for real geeks who need talking knowledge of the same. The easy style, good examples, and compact size (300 pages) mean that you can go from 0-60 pretty quickly.

    Recommended (with the hope that other volumes that provide such a gentle intro for existing developers will become more common).

    Wednesday, August 08, 2007

    Unboxing Gotcha in Java

    In a recent blog post, Java wiseman Norman Richards points out that this innocent-looking line of code contains a gotcha that most IDEs (he checked Eclipse, I checked IntelliJ) don't warn you about:

    int sucker = thisMethodReallyReturnsAnIntegerNotAnInt();

    Due to autoboxing (well, technically, unboxing), this code compiles cleanly, but if the method returns null, unboxing that null into sucker throws the dreaded NullPointerException.

    What Richards doesn't say is that this code is so innocent looking (and the bug possibility so little recognized), that almost no one would write a unit test for this assignment. At least, that is, until it blows up the first time.
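
    To make the gotcha concrete, here is a minimal sketch. The class and method names (UnboxingGotcha, lookupCount, lookupCountOrDefault) are made up for illustration and are not from Richards's post; a map lookup simply stands in for any method that returns an Integer that may be null.

```java
import java.util.HashMap;
import java.util.Map;

class UnboxingGotcha {
    // Hypothetical stand-in for a method that returns Integer, not int.
    static Integer lookupCount(Map<String, Integer> counts, String key) {
        return counts.get(key); // null when the key is absent
    }

    // One defensive alternative: check for null before unboxing.
    static int lookupCountOrDefault(Map<String, Integer> counts, String key, int fallback) {
        Integer boxed = lookupCount(counts, key);
        return boxed != null ? boxed : fallback;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new HashMap<>();
        counts.put("apples", 3);

        int ok = lookupCount(counts, "apples"); // unboxes without trouble
        System.out.println(ok);

        try {
            int sucker = lookupCount(counts, "pears"); // compiles, but unboxes null
            System.out.println(sucker);
        } catch (NullPointerException e) {
            System.out.println("NullPointerException while unboxing");
        }

        System.out.println(lookupCountOrDefault(counts, "pears", 0));
    }
}
```

    A unit test that feeds the assignment a missing key would catch this, which is exactly the test almost nobody thinks to write.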

    Good catch. OK, back to work: Ctrl-F "Integer"...

    Thursday, August 02, 2007

    Continuous Integration Book is Out



    The first book dedicated solely to continuous integration has just come out. I've been poring over it and have learned several things. The great dearth of useful documentation for CI implementations made me very hopeful that this book would give me a wealth of new insights.

    Unfortunately, I was a bit disappointed. The book does a good job of explaining what CI is and why you should use it; and it's the text I'd rely on to sway a manager who needed convincing. But after these explanations, the book wanders around. I understand the problem: it's hard to talk about CI without giving examples for a specific CI server--of which there are so many. So the authors chose to talk about other topics: specifically, build and test issues. This is a good book for best practices for build and test cycles; but alas these are treated as disjoint topics from CI.

    What disappointed me more was the lack of information on choosing a CI server. Many (but not all) CI packages are given very cursory discussions in Appendix B, where they share space with discussions of Ant and Maven. The book definitely punted here when it should have done right by its readers and really explained the differences and offered guidelines on choosing properly.

    Finally, a personal grouse point. This book follows a fad in computer books of putting an epigraph at the start of every chapter. Properly chosen epigraphs should be 1) witty, 2) incisive, or 3) unexpected. Most of those in this book are prosaic. Do we gain anything from this quotation from Larry Bird: "First, master the fundamentals"? Or from Henry Ford intoning, "Quality means doing it right when no one is looking"?

    Overall, I feel this could have been a great book. The thirst for this information is deep and the authors are knowledgeable. However, it didn't quite come together in this edition.

    Tuesday, July 31, 2007

    Needed Code Reminders

    In a recent blog post, Larry O'Brien points out that we need more varied types of tags in our code than the overworked TODO. He suggests one that enables you to mark code intended for a future feature (despite YAGNI). I understand the idea, but would strongly tend to avoid it. I think the XP folks have it right in hewing closely to YAGNI.

    But Larry's larger theme comes up for me as well. There are some useful tags I can think of. The most urgent to me is one that you can put on suspected cruft. Something like: CRUFT. This would suggest that maintenance needs to be done to check if this code is in use anywhere, and if not then to delete it. Not just "in use" in the technical sense, but in the sense of doing anything that's still needed.

    I've also been hankering for a more graded version of TODO. Such as TODO-NOW and then plain TODO. The latter would mean "todo sometime."

    IntelliJ (and presumably other IDEs) enables you to create these custom tags and have them recognized/processed by the IDE, which is really helpful.
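
    If your tooling doesn't support custom tags, a few lines of code can approximate the idea. What follows is only a sketch under my own assumptions: the tags live in // comments, the tag names are the ones floated above (TODO-NOW, TODO, CRUFT), and the class CodeTagScanner is hypothetical rather than part of any IDE or library.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CodeTagScanner {
    // TODO-NOW is listed before TODO so the longer tag wins when both could match.
    private static final Pattern TAG =
            Pattern.compile("//\\s*(TODO-NOW|TODO|CRUFT)\\b");

    // Returns entries of the form "lineNumber:TAG" for the first tag on each line.
    static List<String> findTags(List<String> sourceLines) {
        List<String> found = new ArrayList<>();
        for (int i = 0; i < sourceLines.size(); i++) {
            Matcher m = TAG.matcher(sourceLines.get(i));
            if (m.find()) {
                found.add((i + 1) + ":" + m.group(1));
            }
        }
        return found;
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
                "// TODO-NOW: fix rounding before the next release",
                "int total = sum(values);",
                "// CRUFT: is anything still calling legacyExport()?",
                "// TODO: support more output formats");
        for (String hit : findTags(lines)) {
            System.out.println(hit);
        }
    }
}
```

    Sorting or filtering the results by tag would then give you the graded list: TODO-NOW items first, plain TODO items for "sometime," and CRUFT items for the next cleanup pass.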

    Thursday, July 12, 2007

    A Limitation in Subversion. Any ideas?

    Today, I was speaking with an expert at CollabNet (the Subversion folks) about a problem I have with Subversion. To my amazement, he told me Subversion can't handle my (apparently) simple problem.

    When I work on Platypus, I start by checking out my code from the public Subversion repository. I put it in my local project directory, and work there. Along the way, I might write code that breaks existing tests. At some point before I get all tests working, I want to check my code into an SCM, so that I can get back to where I am right now, if something should go wrong. Best practices in SCM forbid committing code to a project directory if it breaks tests. So, the way to do what I need, it would seem, is to use a temporary repository to commit to until I'm finished coding. Then, when all tests pass, I can commit the code back to the original public Platypus repository.

    But Subversion does not permit checking code from one directory into different repositories. The suggestion from CollabNet was to copy the code to a temporary directory from which I could commit it to a separate Subversion repository. This, of course, is a non-starter--you don't keep two live copies of the code you're working on. That makes for all sorts of errors.

    Another suggestion was to create a branch in the public repository and commit the small changes to that branch, and eventually merge the branch when all tests pass. I don't like this either, because it forces me to check non-working code into a public repository--the whole thing I want to avoid.

    The only solution I've come up with is to use a different SCM (Perforce) for my local repository. I check temporary code into it. Then, when everything is copacetic, I check the tested code into the public Subversion directory. This works, but it's hardly elegant.

    How do you solve this problem or how do you avoid it to begin with? TIA