Thursday, May 22, 2008

Is the popularity of unit tests waning?

Before getting into my concerns about whether unit testing's popularity has peaked, let me state that I think unit testing is the most important benefit wrought by the agile revolution. I agree that you can write perfectly good programs without unit tests (we did put man on the moon in 1969, after all), but for most programs of any size, you're likely to be far better off using unit tests than not.

The problem is that only a small subset of developers understand that. And recent data points suggest that the number of programmers who use unit tests is not exactly growing quickly. Below, I'll list some of the data points I've been developing for my column in SD Times.

1) Commercial products on the wane. Agitar was a company whose entire fate was tied to the popularity of unit testing. Despite very good products, a free service to auto-generate unit tests for your code, and some terrific exponents (especially Alberto Savoia and Jeff Frederick) to tell their story, the company closed down a few weeks ago, essentially having come to the conclusion that it could never be sold at a price that could repay investors. So rather than ask for more funding, it shut its doors. If unit testing were gaining popularity robustly, Agitar surely would have come to a different conclusion.

2) Few OSS products. Except for the xUnit frameworks themselves, few FOSS tools for unit testing have been adopted. The innovative Jester project, which built a tool that looked for untested or poorly tested logic, essentially stopped development a long time ago because, to quote the founder, Ivan Moore, in a comment to me, "so few sites are into unit testing enough to care about perfecting their tests."

3) Major Java instructors aren't teaching it. Consider this interview with Cay Horstmann, co-author of the excellent Core Java books. (He asks, "If so many experienced developers don't write unit tests, what does that say?" In speculating on an answer, he implies that good developers don't need unit tests. Ugh!)

4) Unit testing books are few and far between. I am seeing about one new one a year. And as yet, not a single book on JUnit 4, which has been out for nearly three years(!).

5) Alternative unit-testing frameworks, such as the excellent TestNG, are essentially completely invisible. I was at a session on scripting this spring at SD West and in a class of 30 or so, two people had heard of TestNG (the teacher and I).

I could speculate on causes, but I have no clear culprit to point to. Certainly, unit testing needs to be evangelized more. And evangelized correctly. The folks who insist on 100% code coverage are making a useful tool unpalatable to serious programmers (as discussed here by Howard Lewis Ship, the inventor of Tapestry). But, I think the cause has to be something deeper than this. I would love to hear thoughts from readers in real-world situations where unit testing has been abandoned, cut back, or simply rejected--and why.

It would be a shame to have unit testing disappear and its current users viewed as aging, pining developers hankering for a technology the world has largely passed by. That would return programmers to the tried-and-true practice of glassy-eyed staring at a debugger for hours--something I have not missed at all.

Tuesday, February 19, 2008

Restarting the Platypus And the Lessons Learned

As many of you know, I have spent much of my free time during the last 24 months working on an open-source project called Platypus. The project's goal is to implement a command language like TeX, which enables users to embed formatting commands directly into text and generate documents of typeset quality in PDF, Microsoft Word, and HTML. The aims of Platypus are to be much easier to use than TeX and to provide many features of interest to developers, especially for printing code and listings.

After approximately 20,000 lines of Java code and comments, I have concluded that I need to restart and re-architect the project. The more I code, the more I see that I am adding top floors to a leaning tower. Eventually I'll topple it. So by restarting Platypus, I hope to straighten out architectural shortcomings and deliver a better, more expandable product more quickly.

In the process of coming to this decision, I have been able to crystallize several key lessons, a few of which I could probably have foreseen.

PROJECT AND DESIGN LESSONS

1) It's extremely difficult to figure out where your architecture is deficient if you have never done the kind of project you're currently undertaking. The best you can do is lay out some basic architecture, abide by good dev practices, and learn as you go. Alas, as in this case, it took 20K lines of work to recognize that the architecture was irretrievably flawed, and how.

2) First, do the known hard parts that can't be avoided. In the case of Platypus, I knew from the get-go I wanted a full programming language for the user to employ (the lack of which is one of the major failings of TeX). Early on, I decided that language would be JavaScript (JS). And having decided that and knowing that Java 6 had built-in support for JS, I put the issue aside for later implementation. When I revisited implementing JS, I realized that my command syntax no longer worked well with JS and that some commands would have been better implemented in JS. Had I written code and worked with embedded JS from the beginning, I could have avoided these dissonances, and I would have experienced Platypus more from the perspective of the user.
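For what it's worth, the built-in support really is easy to use. Here's a minimal sketch (the variable names are illustrative, not Platypus code) of evaluating user-supplied JS through Java 6's javax.script API, which ships with the Rhino engine registered under the name "JavaScript":

    import javax.script.ScriptEngine;
    import javax.script.ScriptEngineManager;
    import javax.script.ScriptException;

    public class ScriptSketch {
        public static void main(String[] args) throws ScriptException {
            // Java 6 bundles Rhino, registered under the name "JavaScript"
            ScriptEngine js = new ScriptEngineManager().getEngineByName("JavaScript");

            // Expose a value to user scripts (an illustrative document property)
            js.put("pageWidth", 612);

            // Evaluate a snippet the way an embedded script command might be run
            Object result = js.eval("pageWidth / 2");
            System.out.println("Result: " + result);
        }
    }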

3) If you have to write a parser, write it just once. I wrote a parser for basic and compound commands, but did not anticipate intricacies of the syntax for very complex commands (think of commands to specify a table, for example). When it came time to add these commands, I found myself undoing a lot of parser work and then trying to back-fit existing syntax into the new grammar. Parsers are a nightmare to get right, so make sure you write them just once. This means planning all your syntax ahead of time and in great detail.

4) Learn how to present your project crisply. Everyone understands that writing a debugger for a hot new language is a cool project that will attract contributors. But projects with no immediately identifiable built-in community require crisply articulated messages. I did not do this well.

PROGRAMMING LESSONS:

a) Design Java classes around dependency injection. This will make the classes work better together and help you test them well. I am not saying use a DI framework, which is overkill in many situations, just use the DI pattern.
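To make this concrete, here's a minimal sketch of the pattern (hypothetical class names, not actual Platypus code): the collaborator is passed in through the constructor rather than created inside the class, so a test can substitute a fake.

    // The collaborator is expressed as an interface...
    interface OutputWriter {
        void write(String text);
    }

    // ...and injected through the constructor rather than instantiated internally.
    class ParagraphEmitter {
        private final OutputWriter out;

        ParagraphEmitter(OutputWriter out) {
            this.out = out;
        }

        void emit(String paragraph) {
            out.write(paragraph.trim());
        }
    }

    // In a unit test, a trivial fake stands in for the real PDF/Word/HTML writer.
    class RecordingWriter implements OutputWriter {
        final StringBuilder written = new StringBuilder();
        public void write(String text) { written.append(text); }
    }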

b) When it comes to modularity for input and output processing, plug-ins are an excellent design model. They form a very convenient way to break projects into sub-projects, and they make it easier for contributors to work on a discrete part of the project.
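Roughly, the shape I have in mind looks like this (a sketch with hypothetical names, not the final Platypus interfaces): each output format implements one small interface, and the core engine talks only to that interface.

    import java.util.HashMap;
    import java.util.Map;

    // Each output format (PDF, Word, HTML, ...) lives in its own plug-in.
    interface OutputPlugin {
        String formatName();                 // e.g., "pdf" or "html"
        void emitDocument(String content);   // simplified signature
    }

    // The core engine registers plug-ins and dispatches by format name,
    // so adding a new output format never requires touching the core.
    class PluginRegistry {
        private final Map<String, OutputPlugin> plugins = new HashMap<String, OutputPlugin>();

        void register(OutputPlugin plugin) {
            plugins.put(plugin.formatName(), plugin);
        }

        void emit(String format, String content) {
            OutputPlugin p = plugins.get(format);
            if (p == null) {
                throw new IllegalArgumentException("No plug-in for format: " + format);
            }
            p.emitDocument(content);
        }
    }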

c) Unit testing delivers extra value when you're writing difficult code. The ability to be deep in the parser and test for some specific semantic occurrence right away is pure gold. Unit testing can also show up design errors. At one point, I was implementing a whole slew of similar commands (relating to output of special characters). I'd made each special character its own class, so each character required copying a class, renaming it, and changing two strings inside it. Then rinse, lather, repeat. Likewise, my unit tests were just copied, tweaked, and run. This obviously is not a great use of unit tests (what are you actually testing? your ability to tweak a test template?). This was a good clue that the design needed revisiting. And in fact, it did. In the new design, one class will handle all special characters.
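To give a feel for the direction of the new design, here is a simplified sketch (the command names are hypothetical): a single class backed by a lookup table replaces the slew of copy-pasted one-character classes.

    import java.util.HashMap;
    import java.util.Map;

    // One class handles every special-character command via a lookup table,
    // instead of one nearly identical class per character.
    class SpecialCharacters {
        private static final Map<String, String> GLYPHS = new HashMap<String, String>();
        static {
            GLYPHS.put("mdash", "\u2014");
            GLYPHS.put("ndash", "\u2013");
            GLYPHS.put("copyright", "\u00A9");
            // ...one entry per supported character, not one class per character
        }

        String lookup(String commandName) {
            String glyph = GLYPHS.get(commandName);
            if (glyph == null) {
                throw new IllegalArgumentException("Unknown special character: " + commandName);
            }
            return glyph;
        }
    }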

OBSERVATIONS ABOUT TOOLS

1) Of all the Java IDEs I've used, I like IntelliJ IDEA best. And since I review IDEs regularly for InfoWorld, I've used lots of them including the high-priced pups. IDEA provides the best user experience, bar none. However, as the number of classes in Platypus climbed towards 200, I noticed IDEA became very slow. Clicking on a file to haul it into an edit window was a tedious process, sometimes taking 30 seconds or more (even with v. 7 of the product). Because of this, I will look elsewhere at restart. I expect to use NetBeans 6 (a hugely improved IDE, by the by) at least initially. We'll see how that works out.

2) I switched from Ant to Maven while working on Platypus. Maven is a much better solution for many build steps than Ant. See my blog post on this. However, I dislike both products. I find that I still have to waste lots of time doing simple configurations, and I don't like using XML for configuring builds. I generally concur with Tapestry's Howard Ship that Ivy plus some other tool might be a better solution. I'll explore this as I go.

3) Continuous Integration (CI) is a good concept, and there are truly great tools out there. But outside of providing historical data about a project, CI's value is limited on a one-developer project. This is especially true when builds are already being done on a separate machine and only code that passes tests is checked into the repository. (Nonetheless, the historical data is reason enough to continue using it.)

There are surely other lessons to be learned, and as they come to me, I'll post them on this blog if they seem useful.

Words of Thanks

It would be quite wrong to end this post without pausing to deeply thank Jeff Frederick, who was exceedingly generous with his time and insights while I worked on this first phase and who hastened my realization of several important aspects that I've touched on in this post. Thank you!

Wednesday, May 16, 2007

Unit Testing Private Variables and Functions

How do you write unit tests to exercise private functions and check on private variables? For my projects, I have relied on a technique of adding special testing-only methods to my classes. These methods all have names that begin with FTO_ (for testing only). My regular code may not call these functions. Eventually, I'll write a rule that code-checkers can enforce to make sure that these violations of data hiding don't accidentally appear in non-test code.
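In practice, it looks something like this (a simplified illustration, not code from an actual project):

    class LineCounter {
        private int linesProcessed;

        void processLine(String line) {
            // ...real processing would happen here...
            linesProcessed++;
        }

        // For Testing Only: exposes private state so a unit test can inspect it.
        // Production code must never call methods whose names begin with FTO_.
        int FTO_getLinesProcessed() {
            return linesProcessed;
        }
    }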

However, for a long time I've wanted to know if there is a better way to do this. So, I did what most good programmers do--I asked someone who knows testing better than I do. Which meant talking to the ever-kind Jeff Frederick, who is the main committer of the popular CI server Cruise Control (and the head of product development at Agitar).

Jeff contended that the problem is really one of code design. If all methods are short and specific, then it should be possible to test a private variable by probing the method that uses it. Or said another way: if you can't get at the variable to test it, chances are it's buried in too much code. (Extract Method, I have long believed, is the most important refactoring.)

Likewise private methods. Make 'em small, have them do only one thing, and call them from accessible methods.
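Here's a simplified, hypothetical example of what Jeff is describing: the private variable is never exposed, but because the method that uses it is small and specific, a test can verify the variable's effect through ordinary calls.

    class Indenter {
        private int indentLevel;   // private state we want to verify

        void increaseIndent() { indentLevel++; }
        void decreaseIndent() { if (indentLevel > 0) indentLevel--; }

        // Small, single-purpose method: its output reflects the private variable,
        // so a test can probe indentLevel without any back door.
        String formatLine(String text) {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < indentLevel; i++) {
                sb.append("    ");
            }
            return sb.append(text).toString();
        }
    }

    // A test then checks the private state indirectly:
    //   Indenter ind = new Indenter();
    //   ind.increaseIndent();
    //   assertEquals("    hello", ind.formatLine("hello"));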

I've spent a week noodling around with this sound advice. It appeals to me because almost invariably when I refactor code to make it more testable, I find that I've improved it. So far, Jeff is mostly right. I can eliminate most situations by cleaning up code. However, there are a few routines that look intractable. While I work at finding a better way to refactor them (a constant quest of mine, actually), I am curious to know how you solve this problem.

Monday, April 30, 2007

How many unit tests per method? A rule of thumb

The other day, I met with Burke Cox who heads up Stelligent, a company that specializes in helping sites set up their code-quality infrastructure (build systems, test frameworks, code coverage analysis, continuous integration--the whole works, all on a fixed-price basis). One thing Stelligent does before leaving is to impart some of the best practices they've developed over the years.

A best practice Stelligent uses for determining the number of unit tests to write for a given method struck me as completely original. The number of tests is based on the method's cyclomatic complexity (aka McCabe complexity). This complexity metric starts at 1 and adds 1 for each decision point (branch) in the code. Many tools today generate cyclomatic complexity counts for methods. Stelligent's rule of thumb is:

  • complexity 1-3: no test (likely the method is a getter/setter)
  • complexity 3-10: 1 test
  • complexity 11+: number of tests = complexity / 2

I like this guide, but would change one aspect. I think a cyclomatic complexity of 10 should have more than 1 test. I'd be more inclined to go with: complexity 3-10: # of tests = complexity / 3.
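Expressed as code, the rule with my modification comes out roughly like this (an illustrative reading, not a Stelligent tool; note that the published bands overlap at a complexity of 3, which here gets one test):

    class TestCountRule {
        // Suggested number of unit tests for a method, given its cyclomatic complexity.
        static int suggestedTestCount(int complexity) {
            if (complexity < 3) {
                return 0;                            // trivial method, likely a getter/setter
            } else if (complexity <= 10) {
                return Math.max(1, complexity / 3);  // my modification: complexity / 3
            } else {
                return complexity / 2;               // Stelligent's rule for 11+
            }
        }
    }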

Note: you have to be careful with cyclomatic complexity. I recently wrote a switch statement that had 40 cases. Technically, that's a complexity measure of 40. Obviously, it's pointless to write lots of unit tests for 40 case statements that differ trivially. But, when the complexity number derives directly from the logic of the method, I think Stelligent's rule of thumb (with my modification) is an excellent place to start.

Monday, April 23, 2007

Effectiveness of Pair-wise Tests

The more I use unit testing, the more I wander into areas that are more typically the province of true testers (rather than of developers). One area I frequently visit is the problem of combinatorial testing, which is how to test code that must handle a large number of possible values. Let's say I have a function with four parameters that are all boolean. There are, therefore, 16 possible combos. My temptation is to write 16 unit tests. But the concept of pairwise testing argues against this test-every-permutation approach. It is based on the belief that most bugs arise from the interaction of pairs of values (rather than from a specific configuration of three or four of them). So, pair-wise experts look at the 16 possible values my switches can have and choose the minimum number of tests in which all pairs of values have been exercised. It turns out there are 5 tests that will exercise every pair of switch combinations.
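For the four-boolean example above, one such set of five tests looks like this (the method under test is hypothetical; what matters is the combinations): across the five calls, every pair of parameters takes on all four of its possible value pairs.

    import org.junit.Test;
    import static org.junit.Assert.assertNotNull;

    public class PairwiseExampleTest {

        // Hypothetical method under test with four boolean switches.
        private Object configure(boolean a, boolean b, boolean c, boolean d) {
            return new Object();   // stand-in for the real behavior
        }

        // Five cases instead of sixteen: every pair of parameters is
        // exercised with all four of its value combinations.
        @Test public void case1() { assertNotNull(configure(false, false, false, false)); }
        @Test public void case2() { assertNotNull(configure(false, true,  true,  true )); }
        @Test public void case3() { assertNotNull(configure(true,  false, true,  true )); }
        @Test public void case4() { assertNotNull(configure(true,  true,  false, true )); }
        @Test public void case5() { assertNotNull(configure(true,  true,  true,  false)); }
    }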

The question I've wondered about is: if I write those five unit tests, rather than the more ambitious 16 tests, what have I given up? The answer is: not much. At the recent Software Test & Performance Conference, I attended a session by BJ Rollison, who heads up TestingMentor.com when he's not drilling Microsoft's testers in the latest techniques. He provided some interesting figures from Microsoft's analysis of pair-wise testing.

For attrib.exe (which takes a path + 6 optional args), he did minimal testing, pairwise, and comprehensive testing with the following results:

Minimal: 9 tests, 74% code coverage, 358 code blocks covered
Pairwise: 13 tests, 77% code coverage, 370 code blocks covered
Maximal: 972 tests, 77% code coverage, 370 code blocks covered

A similar test exploration with findstr.exe (which takes a string + 19 optional args) found that pairwise testing via 136 tests covered 74% of the app, while maximal coverage consisting of 3,533 tests covered 76% of the app.

These numbers make sense. Surely, if you test a key subset of the possible pairs, testing additional combinations is not likely to exercise completely different routines, so code coverage should not increase much for the tests that exceed pair-wise recommendations. What surprised me was that pairwise got such high numbers to begin with. 70+% is pretty decent coverage.

From now on, pair-wise testing will be part of my unit-testing design toolbox. For a list of tools that can find the pairs to test, see here. Rollison highly recommended Microsoft's free PICT tool (see previous link), which also provides a means to specify special relationships between the various factors in the combinations.

Thursday, March 22, 2007

Characterization Tests

In my current column in SD Times, I discuss characterization tests, which are an as-yet little discussed form of testing. They are unit tests whose purpose is not to validate the correctness of code but to capture in tests the behavior of existing code. The idea, first popularized in Michael Feathers' excellent volume Working Effectively with Legacy Code, is that the tests can reveal the scope of any changes you make to a codebase.

You write comprehensive tests of the codebase before you touch a line. Then, you make your changes to the code, and rerun the tests. The failing tests reveal the dependencies on the code you modified. You clean up the failing tests and now they reflect the modified code base. Then, rinse, lather, repeat.
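A characterization test looks like any other unit test; the difference is where the expected value comes from. You run the existing code, record whatever it currently produces, and assert exactly that. A simplified, hypothetical example:

    import org.junit.Test;
    import static org.junit.Assert.assertEquals;

    public class LegacyFormatterCharacterizationTest {

        // A stand-in for an existing legacy method whose behavior we want to pin down.
        private String formatAmount(double amount) {
            return "$" + String.format("%,.2f", amount);
        }

        // Characterization test: the expected value was not derived from a spec.
        // It is simply what the code produced when the test was written, so any
        // change in behavior, intended or not, shows up as a failure.
        @Test
        public void capturesCurrentFormattingBehavior() {
            assertEquals("$1,234.50", formatAmount(1234.5));
        }
    }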

The problem historically with this approach is that writing the tests is a colossal pain, especially on large code bases. Moreover, large changes in the code break so many tests that no one wants to go back and redo 150 tests, say, just to capture the scope of changes. And so, frequently, the good idea falls by the wayside--with unfortunate effects when it comes time for functional testing of the revised code. To the rescue comes Agitar with a free service called JUnitFactory. Get a free license, send them your Java code, and they will send you back the characterization tests for it. Cool idea. Change your code, run the tests, verify that nothing unexpected broke. And then have JUnitFactory re-create a complete set of characterization tests. How cool is that?

Actually, I use Agitar's service not only for characterization but for my program as I am developing it. The tests always show me unit tests I never thought of writing. Try it out. You'll like it.

Thursday, March 01, 2007

How Many Unit Tests Are Enough?

Recently, I was down visiting the folks at Agitar, who make great tools for doing unit testing. Going there always results in interesting conversations because they really live and breathe unit testing and are always finding new wrinkles in how to apply the technique. During one conversation, they casually threw out a metric for unit testing that I'd never heard before. It answers the question of: How many unit tests are enough? You'll note that pretty much all the books and essays on unit testing go to great pains to avoid answering this question, for fear, I presume, that by presenting any number they will discourage developers from writing more. Likewise, if the suggested number is high, they risk discouraging developers who will see the goal as overwhelming and unreachable.

The engineers at Agitar, however, did offer a number (as a side comment to an unrelated point). They said (I'm paraphrasing) that if the amount of test code equals the amount of program code, you're generally in pretty good shape. Their experience shows that parity of the two codebases translates into code coverage of around 70%, which means a well-tested app. Needless to say, they'd probably want to qualify this statement, but as a rule of thumb, I think it's a very practical data point--even if a bit ambitious.

I will see how it works out on my open-source projects. Currently, my ratio of test-code to app-code is (cough, cough) just over 42%. I guess I know what I'll be doing this weekend.

Saturday, January 27, 2007

One activity that is inherently productive: unit testing

In a post today in his blog, Larry O'Brien states: "...let's be clear that of all the things we know about software development, there are only two things that we know to be inherently highly productive: Well-treated talented programmers and iterative development incorporating client feedback"

I find it hard not to add unit testing to this list.

Of all the things that have changed in how I program during the last six or seven years, nothing comes close to unit testing in terms of making me more productive. In addition, it has made me a better programmer, because as I write code I am thinking about how to test it. And the changes that result are almost always improvements.