I have gone through "Uncle Bob" Martin's new book, Clean Code,which is a lenthy presentation of rules that will help Java developers write better code. It's similar to Kent Beck's Implementation Patterns,except more code-fixated. Clean Code has some good points, but it contains several weaknesses that seem to have gone entirely by the reviewers on Amazon. So, here's the scoop.
First of all, it's well hidden, but the book is only partially written by Bob Martin. Many chapters are written by other consultants who work at Martin's company--many of whom I've never heard of. The one stand-out exception is Michael Feathers, whose chapter on error handling is one of the clearest in the book. I wish he had written more.
The main body consists primarily of explaining various coding rules that Martin calls heuristics and to which he assigns coded abbreviations for later reference. Alas, unlike patterns that have meaningful names as shortcuts, Martin chooses meaningless notations such as C2 and G26. So, "the function should do nothing but compacting[G30]" is a shortcut for the author, but a pain for the reader who has to cross-reference these references repeatedly to know what Martin is talking about.
Unlike Beck's book, there is no theoretical framework to Martin's prescriptions. The book is a series of examples from which he teases this rule and that. Because of this lack of framework there is a certain desultory aspect--the rules come in seemingly random order.
Some of them make you want to leap up and clap. For example, his rule that Javadoc should not contain HTML. How many times I've come to the same conclusion! I want to read comments in code easily. The small lift that HTML brings to Javadoc pages is not in anyway worth the difficulty it adds to the reading of comments in code. Bob Martin's one of the first persons I've encountered to say so unequivocally.
Other rules are good, but later contradicted. For example, Martin states that you should never leave commented-out code in place. [C5] As he points out, no one knows why it's commented out and so it remains in place forever. However, later on in an example of refactoring code per his own rules, Martin comments out large blocks of code without an explanation of how that squares with his earlier advice. (p.374)
Martin also uses questionable coding preferences. For example, all of his code uses indents of 2 columns. 2 columns? It makes every routine look like a solid chunk of code. It's clearly not a practice to be recommended.
A large portion of the book is an example of Martin refactoring someone else's code. He takes a long piece from an OSS project and proceeds to "improve" it. I found this section uncompelling. Perhaps because in Fowler's masterpiece Refactoring,each refactoring magically transforms the code. By comparison, Bob Martin's work seems journeyman-like. I didn't find the initial code interesting nor did I find Martin's cleaned-up version luminous. I was expecting a before-and-after scenario that would make me sit up and take notice. Instead, the exercise felt preachy, condescending at times, and ultimately not terribly convincing.
My last gripe addresses an inexcusable error: typos. There aren't many but they are frequent enough to be distracting. For example, Martin seemingly does not understand the difference between it's and its. (p. 272, p. 296, among others) And his code contains typos too. (p. 309). This carelessness erodes credibility. Books that preach quality should be flawless at the level of spelling and grammar.
Overall, I think some organizations can use several of Martin's heuristics as a means of boosting their in-house coding standards. But I doubt that careful coders will find much of value. Those developers will be better served by Beck's Implementation Patterns,which is based on principles and so communicates much more information in fewer words. Since my review of Beck's book, I must confess my admiration for it has deepened, and it's the volume I would recommend if you're looking to write cleaner code.
Thursday, November 13, 2008
Sunday, September 28, 2008
Banishing Return Status Codes
The most enduringly popular post on this blog is Perfecting OO's Small Classes and Short Methods, which presents a short series of stringent guidelines to help an imperative-trained developer master OO.
If I were to add one item to the list, it would be: Don't use return codes to indicate the status of an action. Developers trained in languages such as C have the habit of using return codes to indicate the success or the nature of failure of the work done by a function. This approach is used because of the lack of a structured exception mechanism. But when exceptions are part of the language, the use of status codes isa poor choice. Among the key reasons are: many status codes are easily ignored; developers will expect problems to be reported via the exception mechanism; exceptions are much more descriptive. And finally, exceptions enable return codes to be used for something useful--namely returning a data item.
Astute readers will note that in Java, null is frequently used as a return value to indicate a problem (as in Collections). This practice subverts the previous points, and it too should be avoided. Returning a null presents code with many problems it should not have to face. The first is the risk of a null-pointer blow-up because the return value was accessed without being checked. This leads to the code bloat of endless null value checks. A much better solution, which avoids this problem, is to return an empty item (empty string, empty collection, etc.). This too communicates that no data item fulfilled the function's mandate, but it does not risk the null-pointer problem, and it frequently requires no special code to handle the error condition.
Hence, if your OO code is characterized by heavy reliance on return codes (many of which I am certain are not checked), consider rewriting it in favor of exceptions and use return statements solely for returning non-null data items.
If I were to add one item to the list, it would be: Don't use return codes to indicate the status of an action. Developers trained in languages such as C have the habit of using return codes to indicate the success or the nature of failure of the work done by a function. This approach is used because of the lack of a structured exception mechanism. But when exceptions are part of the language, the use of status codes isa poor choice. Among the key reasons are: many status codes are easily ignored; developers will expect problems to be reported via the exception mechanism; exceptions are much more descriptive. And finally, exceptions enable return codes to be used for something useful--namely returning a data item.
Astute readers will note that in Java, null is frequently used as a return value to indicate a problem (as in Collections). This practice subverts the previous points, and it too should be avoided. Returning a null presents code with many problems it should not have to face. The first is the risk of a null-pointer blow-up because the return value was accessed without being checked. This leads to the code bloat of endless null value checks. A much better solution, which avoids this problem, is to return an empty item (empty string, empty collection, etc.). This too communicates that no data item fulfilled the function's mandate, but it does not risk the null-pointer problem, and it frequently requires no special code to handle the error condition.
Hence, if your OO code is characterized by heavy reliance on return codes (many of which I am certain are not checked), consider rewriting it in favor of exceptions and use return statements solely for returning non-null data items.
Monday, September 01, 2008
A Parameter-Validation Smell and a Solution
Last week, Jeff Fredrick and I did a day-long code review of Platypus. We used a pair-programming approach, with Jeff driving and I helping with the navigation. Eventually, we got into the input parser, which parses input lines into a series of tokens: text, commmands, macros, and comments. Macros can require a second parsing pass, and commands often require additional parsing of parameters.
Once you get a parser working well (that is, it passes unit and functional tests, and it handles errors robustly), you generally don't want to mess with refactoring it. Experience tells you that parsers have hideous code in them and wisdom tells you to leave it alone. However, we launched in.
A frequent cause of otiose code was my extensive parameter checking. Parameters were validated at every step as tokens passed through multiple levels of parsing logic. Likewise, the movement of the parse point was updated multiple tiems as the logic resolved itself back up the processing stack. This too had to be validated repeatedly.
Jeff came up with an elegant refactoring that I could not find in the usual sources. He created an inner class consisting of the passed variables, a few methods for validating them, and a few more methods for manipulating them.
This class was then passed to the methods in lieu of the individual parameters--thereby reducing the number of parameters to one or two. And because the class constructor verified the initialization of the fields, I need only to check whether the passed class was null, rather than validate each of the internal fields.
The effect was to reduce complexity of already complex code, enforce DRY, and place the validation of the variables inside a class that contained them--a set of small, but important improvements. And like many of the best refactorings, it seems obvious in retrospect.
So, if you find your class's methods are repeatedly validating the same parameters, try bundling them in an inner class along with their validation logic. You'll like the results.
Once you get a parser working well (that is, it passes unit and functional tests, and it handles errors robustly), you generally don't want to mess with refactoring it. Experience tells you that parsers have hideous code in them and wisdom tells you to leave it alone. However, we launched in.
A frequent cause of otiose code was my extensive parameter checking. Parameters were validated at every step as tokens passed through multiple levels of parsing logic. Likewise, the movement of the parse point was updated multiple tiems as the logic resolved itself back up the processing stack. This too had to be validated repeatedly.
Jeff came up with an elegant refactoring that I could not find in the usual sources. He created an inner class consisting of the passed variables, a few methods for validating them, and a few more methods for manipulating them.
This class was then passed to the methods in lieu of the individual parameters--thereby reducing the number of parameters to one or two. And because the class constructor verified the initialization of the fields, I need only to check whether the passed class was null, rather than validate each of the internal fields.
The effect was to reduce complexity of already complex code, enforce DRY, and place the validation of the variables inside a class that contained them--a set of small, but important improvements. And like many of the best refactorings, it seems obvious in retrospect.
So, if you find your class's methods are repeatedly validating the same parameters, try bundling them in an inner class along with their validation logic. You'll like the results.
Tuesday, June 03, 2008
The Handiest Java Book in Years.
One of the constant challenges I have as a Java developer is keeping up with the numerous good FOSS dev tools. I no sooner start testing one tool and adapting my project to it, when a new one comes along. Being an analyst and naturally curious, this new product (or new release) represents a constant temptation. Is it better than what I am using? How much effort is required to try it out? What does it do better? On and on.
I can put a lot of those concerns to rest now. I just received a copy of Java Power Tools from O'Reilly and it's exactly what I've been looking for. It contains deep explanations of the principal FOSS dev tools in 10 major categories. These explanations are not two- or four-page summaries, but in-depth expositions that provide crucial info on the strengths and weaknesses of the product. The author, John Smart, then provides detailed tutorial on using the product. It's clear he's spent lots of time exploring the dark corners of each tool. And he makes good use of that knowledge in his comparisons and comments on the products.
If you want to spend an hour or so coming up to speed on what a product is about before installing it (and without having to work through the usually limited docs), this book will get you there faster and enable you get an overview of a whole lot of tools quickly and with the assurance you have a clear understanding. Here are the tools that are covered, followed by the number of pages for each one in parentheses:
BUILD TOOLS: Ant (55), Maven (60)
SCM: CVS (20), Subversion (78)
CI: Continuum (24p) Cruise Control (19) LuntBuild (32) Hudson (19)
IM: Openfire (12)
UNIT TESTING: JUnit (20) TestNG (25) Cobertura (17)
OTHER TESTING: StrutsTestCase (10) DbUnit (44p) JUnitPerf (10) JMeter (20) SoapUI (22) Selenium (30( Fest (9)
PROFILING: with Sun tools (16) with Eclipse (15)
DEFECT MANAGEMENT: Bugzilla (20) Trac (35)
QUALITY: Checkstyle (20) PMD (18p) FindBugs (12) Jupiter (18) Mylyn (14p)
All told, 856 pages of crisp, well-written explanations. A must-have reference for the bookshelf.
Thursday, May 22, 2008
Is the popularity of unit tests waning?
Before getting into my concerns about whether unit testing's popularity has peaked, let me state that I think unit testing is the most important benefit wrought by the agile revolution. I agree that you can write perfectly good programs without unit tests (we did put man on the moon in 1969, after all), but for most programs of any size, you're likely to be far better off using unit tests than not.
The problem is that only a small subset of developers understand that. And recent data points suggests that the number of programmers who use unit tests is not exactly growing quickly. I'll list some of the data points below that I've been developing for my column in SD Times.
1) Commercial products on the wane. Agitar was a company whose entire fate was tied to the popularity of unit testing. Despite very good products, a free service to auto-generate unit tests for your code, and some terrific exponents (especially Alberto Savoia and Jeff Frederick) to tell their story, the company closed a down a few weeks ago, essentially having come to the conclusion that it could never be sold at a price that could repay investors. So rather than ask for more funding, it closed down. If unit testing were gaining popularity robustly, Agitar surely would have come to a different conclusion.
2) Few OSS products. Except for the xUnit frameworks themselves, few FOSS tools for unit testing have been adopted. The innovative Jester project, which built a tool that looked for untested or poorly tested logic, essentially stopped development a long time ago because to quote the founder, Ivan Moore, in a comment to me "so few sites are into unit testing enough to care about perfecting their tests."
3) Major Java instructors aren't teaching it. Consider this interview with Cay Horstmann, co-author of the excellent Core Java books. (He asks, "If so many experienced developers don't write unit tests, what does that say?" In speculating on an answer, he implies that good developers don't need unit tests. Ugh!)
4) Unit testing books are few and far between. I am seeing about one new one a year. And as yet, not a single book on JUnit 4, which has been out for nearly three years(!).
5) Alternative unit-testing frameworks, such as the excellent TestNG, are essentially completely invisible. I was at a session on scripting this spring at SD West and in a class of 30 or so, two people had heard of TestNG (the teacher and I).
I could speculate on causes, but I have no clear culprit to point to. Certainly, unit testing needs to be evangelized more. And evangelized correctly. The folks who insist on 100% code coverage are making a useful tool unpalatable to serious programmers (as discussed here by Howard Lewis Ship, the inventor of Tapestry). But, I think the cause has to be something deeper than this. I would love to hear thoughts from readers in real-world situations where unit testing has been abandoned, cut back, or simply rejected--and why.
It would be a shame to have unit testing disappear and its current users viewed as aging, pining developers hankering for a technology the world has largely passed by. That would return programmers to the tried-and-true practice of glassy-eyed staring at a debugger for hours--something I have not missed at all.
The problem is that only a small subset of developers understand that. And recent data points suggests that the number of programmers who use unit tests is not exactly growing quickly. I'll list some of the data points below that I've been developing for my column in SD Times.
1) Commercial products on the wane. Agitar was a company whose entire fate was tied to the popularity of unit testing. Despite very good products, a free service to auto-generate unit tests for your code, and some terrific exponents (especially Alberto Savoia and Jeff Frederick) to tell their story, the company closed a down a few weeks ago, essentially having come to the conclusion that it could never be sold at a price that could repay investors. So rather than ask for more funding, it closed down. If unit testing were gaining popularity robustly, Agitar surely would have come to a different conclusion.
2) Few OSS products. Except for the xUnit frameworks themselves, few FOSS tools for unit testing have been adopted. The innovative Jester project, which built a tool that looked for untested or poorly tested logic, essentially stopped development a long time ago because to quote the founder, Ivan Moore, in a comment to me "so few sites are into unit testing enough to care about perfecting their tests."
3) Major Java instructors aren't teaching it. Consider this interview with Cay Horstmann, co-author of the excellent Core Java books. (He asks, "If so many experienced developers don't write unit tests, what does that say?" In speculating on an answer, he implies that good developers don't need unit tests. Ugh!)
4) Unit testing books are few and far between. I am seeing about one new one a year. And as yet, not a single book on JUnit 4, which has been out for nearly three years(!).
5) Alternative unit-testing frameworks, such as the excellent TestNG, are essentially completely invisible. I was at a session on scripting this spring at SD West and in a class of 30 or so, two people had heard of TestNG (the teacher and I).
I could speculate on causes, but I have no clear culprit to point to. Certainly, unit testing needs to be evangelized more. And evangelized correctly. The folks who insist on 100% code coverage are making a useful tool unpalatable to serious programmers (as discussed here by Howard Lewis Ship, the inventor of Tapestry). But, I think the cause has to be something deeper than this. I would love to hear thoughts from readers in real-world situations where unit testing has been abandoned, cut back, or simply rejected--and why.
It would be a shame to have unit testing disappear and its current users viewed as aging, pining developers hankering for a technology the world has largely passed by. That would return programmers to the tried-and-true practice of glassy-eyed staring at a debugger for hours--something I have not missed at all.
Friday, April 25, 2008
Knuth Interview Posted
My interview with Donald Knuth is now posted. It's a long piece, that has some unusually interesting points, including:
- why Knuth doesn't believe in designing code for reuse
- he's most unconvinced of multithreading and multicore on the desktop
- discussion of the tools he uses to program and write (including Ubuntu)
- etc.
A very fun read (and a fun interview to do).
- why Knuth doesn't believe in designing code for reuse
- he's most unconvinced of multithreading and multicore on the desktop
- discussion of the tools he uses to program and write (including Ubuntu)
- etc.
A very fun read (and a fun interview to do).
Wednesday, April 23, 2008
Perfecting OO's Small Classes and Short Methods
In The ThoughtWorks Anthology a new book from the Pragmatic Programmers, there is a fascinating essay called “Object Calisthenics” by Jeff Bay. It’s a detailed exercise for perfecting the writing of the small routines that demonstrate characterize good OO implementations. If you have developers who need to improve their ability to write OO routines, I suggest you have a look-see at this essay. I will try to summarize Bay’s approach here.
He suggests writing a 1000-line program with the constraints listed below. These constraints are intended to be excessively restrictive, so as to force developers out of the procedural groove. I guarantee if you apply this technique, their code will move markedly towards object orientation. The restrictions (which should be mercilessly enforced in this exercise) are:
1. Use only one level of indentation per method. If you need more than one level, you need to create a second method and call it from the first. This is one of the most important constraints in the exercise.
2. Don’t use the ‘else’ keyword. Test for a condition with an if-statement and exit the routine if it’s not met. This prevents if-else chaining; and every routine does just one thing. You’re getting the idea.
3. Wrap all primitives and strings. This directly addresses “primitive obsession.” If you want to use an integer, you first have to create a class (even an inner class) to identify it’s true role. So zip codes are an object not an integer, for example. This makes for far clearer and more testable code.
4. Use only one dot per line. This step prevents you from reaching deeply into other objects to get at fields or methods, and thereby conceptually breaking encapsulation.
5. Don’t abbreviate names. This constraint avoids the procedural verbosity that is created by certain forms of redundancy—if you have to type the full name of a method or variable, you’re likely to spend more time thinking about its name. And you’ll avoid having objects called Order with methods entitled shipOrder(). Instead, your code will have more calls such as Order.ship().
6. Keep entities small. This means no more than 50 lines per class and no more than 10 classes per package. The 50 lines per class constraint is crucial. Not only does it force concision and keep classes focused, but it means most classes can fit on a single screen in any editor/IDE.
7. Don’t use any classes with more than two instance variables. This is perhaps the hardest constraint. Bay’s point is that with more than two instance variables, there is almost certainly a reason to subgroup some variables into a separate class.
8. Use first-class collections. In other words, any class that contains a collection should contain no other member variables. The idea is an extension of primitive obsession. If you need a class that’s a subsumes the collection, then write it that way.
9. Don’t use setters, getters, or properties. This is a radical approach to enforcing encapsulation. It also requires implementation of dependency injection approaches and adherence to the maxim “tell, don’t ask.”
Taken together, these rules impose a restrictive encapsulation on developers and force thinking along OO lines. I assert than anyone writing a 1000-line project without violating these rules will rapidly become much better at OO. They can then, if they want, relax the restrictions somewhat. But as Bay points out, there’s no reason to do so. His team has just finished a 100,000-line project within these strictures.
He suggests writing a 1000-line program with the constraints listed below. These constraints are intended to be excessively restrictive, so as to force developers out of the procedural groove. I guarantee if you apply this technique, their code will move markedly towards object orientation. The restrictions (which should be mercilessly enforced in this exercise) are:
1. Use only one level of indentation per method. If you need more than one level, you need to create a second method and call it from the first. This is one of the most important constraints in the exercise.
2. Don’t use the ‘else’ keyword. Test for a condition with an if-statement and exit the routine if it’s not met. This prevents if-else chaining; and every routine does just one thing. You’re getting the idea.
3. Wrap all primitives and strings. This directly addresses “primitive obsession.” If you want to use an integer, you first have to create a class (even an inner class) to identify it’s true role. So zip codes are an object not an integer, for example. This makes for far clearer and more testable code.
4. Use only one dot per line. This step prevents you from reaching deeply into other objects to get at fields or methods, and thereby conceptually breaking encapsulation.
5. Don’t abbreviate names. This constraint avoids the procedural verbosity that is created by certain forms of redundancy—if you have to type the full name of a method or variable, you’re likely to spend more time thinking about its name. And you’ll avoid having objects called Order with methods entitled shipOrder(). Instead, your code will have more calls such as Order.ship().
6. Keep entities small. This means no more than 50 lines per class and no more than 10 classes per package. The 50 lines per class constraint is crucial. Not only does it force concision and keep classes focused, but it means most classes can fit on a single screen in any editor/IDE.
7. Don’t use any classes with more than two instance variables. This is perhaps the hardest constraint. Bay’s point is that with more than two instance variables, there is almost certainly a reason to subgroup some variables into a separate class.
8. Use first-class collections. In other words, any class that contains a collection should contain no other member variables. The idea is an extension of primitive obsession. If you need a class that’s a subsumes the collection, then write it that way.
9. Don’t use setters, getters, or properties. This is a radical approach to enforcing encapsulation. It also requires implementation of dependency injection approaches and adherence to the maxim “tell, don’t ask.”
Taken together, these rules impose a restrictive encapsulation on developers and force thinking along OO lines. I assert than anyone writing a 1000-line project without violating these rules will rapidly become much better at OO. They can then, if they want, relax the restrictions somewhat. But as Bay points out, there’s no reason to do so. His team has just finished a 100,000-line project within these strictures.
Monday, April 07, 2008
Easy Does It With easyb
I just got back from the CITcon conference, which is the thrice-yearly confab of agile developers who use continuous integration (the "CIT" in the conference name). This was my second time at CITcon. It's an open-space conference that is--surprise!--free, and chock-a-block full of good information. The principal reason it's so informative is that anyone committed enough to CI to go to a conference has probably spent a lot of time thinking about how to solve problems of build and test at his/her site. And this concern and reflection on these issues is amply evident in the discussions in the hallways and the informal presentations.
All the sessions I attended were thought-provoking. But probably the most interesting was a presentation by Andy Glover, the president of Stelligent, an agile consultancy. He runs a great blog in which has been touting a tool called easyb, which enables you to script unit tests so that they describe a scenario (rather than a code feature) and then test for the expected result. I've read Andy's enthusiasm for easyb, but it wasn't until I saw him demo it that I understood what the excitement was about.
The key benefits are 1) you can show a non-programmer (like the manager who is expecting the software any day now) that you have written tests that match every one of his requirements--easyb enables you to do this by writing the test in near English language; 2) you can test at a slightly higher level than the unit test: rather than test tiny features individually, you can quite easily test a succession of conditions that are chained together.
This approach is called--a little misleadingly,--behavior-driven development; which was an immediate turn off for me. I really don't want to learn another x-driven development. I just want to do what I do better. And I think easyb might just be such a tool. So, don't worry about the name, and hop over to the easyb website for a quick look-see. You'll like what you find.
All the sessions I attended were thought-provoking. But probably the most interesting was a presentation by Andy Glover, the president of Stelligent, an agile consultancy. He runs a great blog in which has been touting a tool called easyb, which enables you to script unit tests so that they describe a scenario (rather than a code feature) and then test for the expected result. I've read Andy's enthusiasm for easyb, but it wasn't until I saw him demo it that I understood what the excitement was about.
The key benefits are 1) you can show a non-programmer (like the manager who is expecting the software any day now) that you have written tests that match every one of his requirements--easyb enables you to do this by writing the test in near English language; 2) you can test at a slightly higher level than the unit test: rather than test tiny features individually, you can quite easily test a succession of conditions that are chained together.
This approach is called--a little misleadingly,--behavior-driven development; which was an immediate turn off for me. I really don't want to learn another x-driven development. I just want to do what I do better. And I think easyb might just be such a tool. So, don't worry about the name, and hop over to the easyb website for a quick look-see. You'll like what you find.
Monday, March 31, 2008
Great Reference For Ruby
Ruby aficionados have been working for the last few years under a serious handicapt: there was not good, up-to-date reference on their favorite language. Sure, the Pickaxe book provided some guidance, but it's a hybrid work--part tutorial, part reference. And the reference section was a summary, rather than an in-depth exposition.
Ever-dependable O'Reilly just released Ruby Programming Language, which is without a doubt the definitive Ruby reference. Not only is it co-authored by Yukihiro "Matz" Matusmoto, the inventor of Ruby, but it is superbly well edited, so that every page is full of useful information presented clearly. And at more than 400 pages, that's a lot of information. Couple this book with The Ruby Cookbook, which I reviewed on this blog, and you have probably the best 1-2 combination for learning and using Ruby.
Tuesday, February 19, 2008
Restarting the Platypus And the Lessons Learned
As many of you know, I have spent much of my free time during the last 24 months working on an open-source project called Platypus. The project's goal is to implement a command language like TeX, which enables users to embed formatting commands directly into text and generate documents of typeset quality in PDF, Microsoft Word, and HTML. The aims of Platypus are to be much easier to use than Tex and to provide many features of interest to developers, especially for printing code and listings.
After approximately 20,000 lines of Java code and comments, I have concluded that I need to restart and re-architect the project. The more I code, the more I see that I am adding top floors to a leaning tower. Eventually I'll topple it. So by restarting Platypus, I hope to straighten out architectural shortcomings and deliver a better, more expandable product more quickly.
In the process of coming to this decision, I have been able to crystallize several key lessons, a few of which I could probably have seen foreseen.
PROJECT AND DESIGN LESSONS
1) It's extremely difficult to figure out where your architecture is deficient if you have never done the kind of project you're currently undertaking. The best you can do is layout some basic architecture, abide by good dev practices, and learn as you go. Alas, as in this case, it took 20K lines of work to recognize that the architecture was irretrievably flawed and how.
2) First, do the known hard parts that can't be avoided. In the case of Platypus, I knew from the get-go I wanted a full programming language for the user to employ (the lack of which is one of the major failings of TeX). Early on, I decided that language would be JavaScript (JS). And having decided that and knowing that Java 6 had built-in support for JS, I put the issue aside for later implementation. When I revisited implementing JS, I realized that my command syntax no longer worked well with JS and that some commands would have been better implemented in JS. Had I written code and worked with embedded JS from the beginning, I could have avoided these dissonances, and I would have experienced Platypus more from the perspective of the user.
3) If you have to write a parser, write it just once. I wrote a parser for basic and compound commands, but did not anticipate intricacies of the syntax for very complex commands (think of commands to specify a table, for example). When it came time to add these commands, I found myself undoing a lot of parser work and then trying to back-fit existing syntax in to the new grammar. Parsers are a nightmare to get right, so make sure you write them just once. This means planning all your syntax ahead of time and in great detail.
4) Learn how to present your project crisply. Everyone understands writing a debugger for a hot new language is a cool project that will attract contributors. But projects where there is no immediately identifiable built-in community require crisply articulated messages. I did not do this well.
PROGRAMMING LESSONS:
a) Design Java classes around dependency injection. This will make the classes work better together and help you test them well. I am not saying use a DI framework, which is overkill in many situations, just use the DI pattern.
b) When it comes to modularity for input and output processing, plug-ins are an excellent design model. They form a very convenient way to break projects into sub-projects, and they make it easier for contributors to work on a discrete part of the project.
c) Unit testing delivers extra value when you're writing difficult code. The ability to be deep in the parser and test for some specific semantic occurrence right away is pure gold. Unit testing can also show up design errors. At one point, I was implementing a whole slew of similar commands (relating to output of special characters). I'd made each special character its own class; so each character required: copying a class, renaming it, and changing two strings inside it. Then rinse, lather, repeat. Likewise, my unit tests were just copied, tweaked, and run. This obviously is not a great use of unit tests (what are you actually testing? your ability to tweak a test template?). This was a good clue that the design needs revisiting. And in fact, it did. In the new design, one class will handle all special characters.
OBSERVATIONS ABOUT TOOLS
1) Of all the Java IDEs I've used, I like IntelliJ IDEA best. And since I review IDEs regularly for InfoWorld, I've used lots of them including the high-priced pups. IDEA provides the best user experience, bar none. However, as the number of classes in Platypus climbed towards 200, I noticed IDEA became very slow. Clicking on a file to haul it into an edit window was a tedious process, sometimes taking 30 seconds or more (even with v. 7 of the product). Because of this, I will look elsewhere at restart. I expect to use NetBeans 6 (a hugely improved IDE, by the by) at least initially. We'll see how that works out.
2) I switched from Ant to Maven while working on Platypus. Maven is a much better solution for many build steps than Ant. See my blog post on this. However, I dislike both products. I find that I still have to waste lots of time doing simple configurations. Also, I also don't like using XML for configuring builds. I generally concur with Tapestry's Howard Ship, that Ivy plus some other tool might be a better solution. I'll explore this as I go.
3) Continuous Integration (CI) is a good concept and there are truly great tools out there. But outside of providing historical data about a project, CI's value is limited on a one-developer project. This especially true when builds are already being done on a separate machine and only code that past tests is checked into the repository. (Nonetheless, the historical data is reason enough to continue using it.)
There are surely other lessons to be learned, and as they come to me, I'll post them on this blog if they seem useful.
Words of Thanks
It would be quite wrong to end this post without pausing to deeply thank Jeff Frederick, who was exceedingly generous with his time and insights while I worked on this first phase and who hastened my realization of several important aspects that I've touched on in this post. Thank you!
After approximately 20,000 lines of Java code and comments, I have concluded that I need to restart and re-architect the project. The more I code, the more I see that I am adding top floors to a leaning tower. Eventually I'll topple it. So by restarting Platypus, I hope to straighten out architectural shortcomings and deliver a better, more expandable product more quickly.
In the process of coming to this decision, I have been able to crystallize several key lessons, a few of which I could probably have seen foreseen.
PROJECT AND DESIGN LESSONS
1) It's extremely difficult to figure out where your architecture is deficient if you have never done the kind of project you're currently undertaking. The best you can do is layout some basic architecture, abide by good dev practices, and learn as you go. Alas, as in this case, it took 20K lines of work to recognize that the architecture was irretrievably flawed and how.
2) First, do the known hard parts that can't be avoided. In the case of Platypus, I knew from the get-go I wanted a full programming language for the user to employ (the lack of which is one of the major failings of TeX). Early on, I decided that language would be JavaScript (JS). And having decided that and knowing that Java 6 had built-in support for JS, I put the issue aside for later implementation. When I revisited implementing JS, I realized that my command syntax no longer worked well with JS and that some commands would have been better implemented in JS. Had I written code and worked with embedded JS from the beginning, I could have avoided these dissonances, and I would have experienced Platypus more from the perspective of the user.
3) If you have to write a parser, write it just once. I wrote a parser for basic and compound commands, but did not anticipate intricacies of the syntax for very complex commands (think of commands to specify a table, for example). When it came time to add these commands, I found myself undoing a lot of parser work and then trying to back-fit existing syntax in to the new grammar. Parsers are a nightmare to get right, so make sure you write them just once. This means planning all your syntax ahead of time and in great detail.
4) Learn how to present your project crisply. Everyone understands writing a debugger for a hot new language is a cool project that will attract contributors. But projects where there is no immediately identifiable built-in community require crisply articulated messages. I did not do this well.
PROGRAMMING LESSONS:
a) Design Java classes around dependency injection. This will make the classes work better together and help you test them well. I am not saying use a DI framework, which is overkill in many situations, just use the DI pattern.
b) When it comes to modularity for input and output processing, plug-ins are an excellent design model. They form a very convenient way to break projects into sub-projects, and they make it easier for contributors to work on a discrete part of the project.
c) Unit testing delivers extra value when you're writing difficult code. The ability to be deep in the parser and test for some specific semantic occurrence right away is pure gold. Unit testing can also show up design errors. At one point, I was implementing a whole slew of similar commands (relating to output of special characters). I'd made each special character its own class; so each character required: copying a class, renaming it, and changing two strings inside it. Then rinse, lather, repeat. Likewise, my unit tests were just copied, tweaked, and run. This obviously is not a great use of unit tests (what are you actually testing? your ability to tweak a test template?). This was a good clue that the design needs revisiting. And in fact, it did. In the new design, one class will handle all special characters.
OBSERVATIONS ABOUT TOOLS
1) Of all the Java IDEs I've used, I like IntelliJ IDEA best. And since I review IDEs regularly for InfoWorld, I've used lots of them including the high-priced pups. IDEA provides the best user experience, bar none. However, as the number of classes in Platypus climbed towards 200, I noticed IDEA became very slow. Clicking on a file to haul it into an edit window was a tedious process, sometimes taking 30 seconds or more (even with v. 7 of the product). Because of this, I will look elsewhere at restart. I expect to use NetBeans 6 (a hugely improved IDE, by the by) at least initially. We'll see how that works out.
2) I switched from Ant to Maven while working on Platypus. Maven is a much better solution for many build steps than Ant. See my blog post on this. However, I dislike both products. I find that I still have to waste lots of time doing simple configurations. Also, I also don't like using XML for configuring builds. I generally concur with Tapestry's Howard Ship, that Ivy plus some other tool might be a better solution. I'll explore this as I go.
3) Continuous Integration (CI) is a good concept and there are truly great tools out there. But outside of providing historical data about a project, CI's value is limited on a one-developer project. This especially true when builds are already being done on a separate machine and only code that past tests is checked into the repository. (Nonetheless, the historical data is reason enough to continue using it.)
There are surely other lessons to be learned, and as they come to me, I'll post them on this blog if they seem useful.
Words of Thanks
It would be quite wrong to end this post without pausing to deeply thank Jeff Frederick, who was exceedingly generous with his time and insights while I worked on this first phase and who hastened my realization of several important aspects that I've touched on in this post. Thank you!
Friday, January 25, 2008
Internal USB Ports: What do you think they're for?
Earlier this week, I was being briefed by HP about some recently released workstations. As we were moving through the slide-deck, a small item caught my attention: one workstation claimed to have 2 USB ports on the front panel, 6 on the back, and 2 marked "internal." Why, I asked, would anyone want an internal USB port on a PC? Care to guess?
The answer is: for dongle keys. Yeah, they're still around and they use USB form factors. The internal aspect is interesting. It's designed so you can insert the dongle, lock the PC and nobody walks off with the dongle key.
I honestly would never have guessed.
The answer is: for dongle keys. Yeah, they're still around and they use USB form factors. The internal aspect is interesting. It's designed so you can insert the dongle, lock the PC and nobody walks off with the dongle key.
I honestly would never have guessed.
Tuesday, January 22, 2008
Excellent Explanation of Dependency Injection (Inversion of Control)
I've read lots of explanations of Dependency Injection or DI (formerly known as Inversion of Control) and the associated Hollywood Principle ("Don't call us, we'll call you."). They all tend to be unclear, either because they delve immediately into highly detailed explanations, or they tie the explanation specifically to one particular technology. Such that either the pattern is lost or its simplicity is. Here is clearest explanation I've found--slightly edited for brevity (from the very good Spring in Action, 2nd. Ed. by Craig Walls):
"Any nontrivial application is made up of two or more classes that collaborate with each other to perform some business logic. Traditionally, each object is responsible for obtaining its own references to the objects it collaborates with (its dependencies). When applying DI, the objects are given their dependencies at creation time by some external entity that coordinates each object in the system. In other words, dependencies are injected into objects."
I find that very clear.
Dependency Injection was originally called Inversion of Control (IoC) because the normal control sequence would be the object finds the objects it depends on by itself and then calls them. Here, this is reversed: The dependencies are handed to the object when it's created. This also illustrates the Hollywood Principle at work: Don't call around for your dependencies, we'll give them to you when we need you.
If you don't use DI, you're probably wondering why it's a big deal. It delivers a key advantage: loose coupling. Objects can be added and tested independently of other objects, because they don't depend on anything other than what you pass them. When using traditional dependencies, to test an object you have to create an environment where all of its dependencies exist and are reachable before you can test it. With DI, it's possible to test the object in isolation passing it mock objects for the ones you don't want or need to create. Likewise, adding a class to a project is facilitated because the class is self-contained, so this avoids the "big hairball" that large projects often evolve into.
The challenge of DI is writing an entire application using it. A few classes are no big deal, but a whole app is much more difficult. For entire applications, you frequently want a framework to manage the dependencies and the interactions between objects. DI frameworks are often driven by XML files that help specify what to pass to whom and when. Spring is a full-service Java DI framework; other lighter DI frameworks include NanoContainer and the even more lightweight PicoContainer .
Most of these frameworks have good tutorials to help beginners find their way.
"Any nontrivial application is made up of two or more classes that collaborate with each other to perform some business logic. Traditionally, each object is responsible for obtaining its own references to the objects it collaborates with (its dependencies). When applying DI, the objects are given their dependencies at creation time by some external entity that coordinates each object in the system. In other words, dependencies are injected into objects."
I find that very clear.
Dependency Injection was originally called Inversion of Control (IoC) because the normal control sequence would be the object finds the objects it depends on by itself and then calls them. Here, this is reversed: The dependencies are handed to the object when it's created. This also illustrates the Hollywood Principle at work: Don't call around for your dependencies, we'll give them to you when we need you.
If you don't use DI, you're probably wondering why it's a big deal. It delivers a key advantage: loose coupling. Objects can be added and tested independently of other objects, because they don't depend on anything other than what you pass them. When using traditional dependencies, to test an object you have to create an environment where all of its dependencies exist and are reachable before you can test it. With DI, it's possible to test the object in isolation passing it mock objects for the ones you don't want or need to create. Likewise, adding a class to a project is facilitated because the class is self-contained, so this avoids the "big hairball" that large projects often evolve into.
The challenge of DI is writing an entire application using it. A few classes are no big deal, but a whole app is much more difficult. For entire applications, you frequently want a framework to manage the dependencies and the interactions between objects. DI frameworks are often driven by XML files that help specify what to pass to whom and when. Spring is a full-service Java DI framework; other lighter DI frameworks include NanoContainer and the even more lightweight PicoContainer .
Most of these frameworks have good tutorials to help beginners find their way.
Wednesday, January 09, 2008
Use Virtualization to Avoid Malware While WebSurfing
In presentations at Infoworld's Virtualization Summits (slides here), I have repeatedly discussed how virtualization can prevent malware infections when you surf the web. The idea is to surf and do all transactions from inside a VM. Most attendees listen to this suggestion, but they seem primarily to be waiting for me to move onto the meat of my talk. I suspect they don't take the advice to heart because they feel they have various utilities on the alert for viruses and malware infections. However, as we see here, even well-known companies, such as Sears and Kmart, install key loggers and malware that route private data to third parties. Meaning, that even if you go only to sites you believe are known good, you can still be infected with malware.
By browsing from within a VM, you protect yourself against many malicious packages. In the ideal scenario, you use two VMs: One for important transactions where security is paramount (online banking, investment accounts, etc.) and another for all other browsing.
If either VM becomes infected, delete it, make a clone of the master VM, and resume browsing. Periodically, you should throw out the "just browsing" VM and bring over a clean instance, so that any undetected stealth malware is disposed of. You'll need to bring over your bookmarks file when you swap VMs or, if you prefer, you can use any of the tag services (del.icio.us and the like) to maintain your list of favorites.
I use VirtualPC from Microsoft, which can be downloaded for free. You can use it to run a Windows VM, but you need to make sure you have valid licenses for those VMs. (Actually, until April 1, you need no license at all. You can download a Windows VM with IE installed directly from Microsoft.) Using a UNIX/Linux VM is an alternative approach that provides three advantages over Windows: licenses are free, the VMs are smaller (less than the 750MB Windows needs, typically), and malware writers rarely target Linux, so your VMs stay cleaner/safer longer.
One version of Linux you can't use for this purpose, though, is Ubuntu, surprisingly. It does not install correctly on Microsoft Virtual PC. Despite a wealth of tips, I have not been able to find a way to get it to run. However, Novell SUSE works fine. And I am sure other distros do too.
Anyway, this rarely discussed use of virtualization enables me to surf with impunity and with no fear of being hijacked.
By browsing from within a VM, you protect yourself against many malicious packages. In the ideal scenario, you use two VMs: One for important transactions where security is paramount (online banking, investment accounts, etc.) and another for all other browsing.
If either VM becomes infected, delete it, make a clone of the master VM, and resume browsing. Periodically, you should throw out the "just browsing" VM and bring over a clean instance, so that any undetected stealth malware is disposed of. You'll need to bring over your bookmarks file when you swap VMs or, if you prefer, you can use any of the tag services (del.icio.us and the like) to maintain your list of favorites.
I use VirtualPC from Microsoft, which can be downloaded for free. You can use it to run a Windows VM, but you need to make sure you have valid licenses for those VMs. (Actually, until April 1, you need no license at all. You can download a Windows VM with IE installed directly from Microsoft.) Using a UNIX/Linux VM is an alternative approach that provides three advantages over Windows: licenses are free, the VMs are smaller (less than the 750MB Windows needs, typically), and malware writers rarely target Linux, so your VMs stay cleaner/safer longer.
One version of Linux you can't use for this purpose, though, is Ubuntu, surprisingly. It does not install correctly on Microsoft Virtual PC. Despite a wealth of tips, I have not been able to find a way to get it to run. However, Novell SUSE works fine. And I am sure other distros do too.
Anyway, this rarely discussed use of virtualization enables me to surf with impunity and with no fear of being hijacked.