Wednesday, April 23, 2008

Perfecting OO's Small Classes and Short Methods

In The ThoughtWorks Anthology a new book from the Pragmatic Programmers, there is a fascinating essay called “Object Calisthenics” by Jeff Bay. It’s a detailed exercise for perfecting the writing of the small routines that demonstrate characterize good OO implementations. If you have developers who need to improve their ability to write OO routines, I suggest you have a look-see at this essay. I will try to summarize Bay’s approach here.

He suggests writing a 1000-line program with the constraints listed below. These constraints are intended to be excessively restrictive, so as to force developers out of the procedural groove. I guarantee if you apply this technique, their code will move markedly towards object orientation. The restrictions (which should be mercilessly enforced in this exercise) are:

1. Use only one level of indentation per method. If you need more than one level, you need to create a second method and call it from the first. This is one of the most important constraints in the exercise.

2. Don’t use the ‘else’ keyword. Test for a condition with an if-statement and exit the routine if it’s not met. This prevents if-else chaining; and every routine does just one thing. You’re getting the idea.

3. Wrap all primitives and strings. This directly addresses “primitive obsession.” If you want to use an integer, you first have to create a class (even an inner class) to identify it’s true role. So zip codes are an object not an integer, for example. This makes for far clearer and more testable code.

4. Use only one dot per line. This step prevents you from reaching deeply into other objects to get at fields or methods, and thereby conceptually breaking encapsulation.

5. Don’t abbreviate names. This constraint avoids the procedural verbosity that is created by certain forms of redundancy—if you have to type the full name of a method or variable, you’re likely to spend more time thinking about its name. And you’ll avoid having objects called Order with methods entitled shipOrder(). Instead, your code will have more calls such as Order.ship().

6. Keep entities small. This means no more than 50 lines per class and no more than 10 classes per package. The 50 lines per class constraint is crucial. Not only does it force concision and keep classes focused, but it means most classes can fit on a single screen in any editor/IDE.

7. Don’t use any classes with more than two instance variables. This is perhaps the hardest constraint. Bay’s point is that with more than two instance variables, there is almost certainly a reason to subgroup some variables into a separate class.

8. Use first-class collections. In other words, any class that contains a collection should contain no other member variables. The idea is an extension of primitive obsession. If you need a class that’s a subsumes the collection, then write it that way.

9. Don’t use setters, getters, or properties. This is a radical approach to enforcing encapsulation. It also requires implementation of dependency injection approaches and adherence to the maxim “tell, don’t ask.”

Taken together, these rules impose a restrictive encapsulation on developers and force thinking along OO lines. I assert than anyone writing a 1000-line project without violating these rules will rapidly become much better at OO. They can then, if they want, relax the restrictions somewhat. But as Bay points out, there’s no reason to do so. His team has just finished a 100,000-line project within these strictures.


Anonymous said...

Somewhat reminds me of, §3 :

" Object-oriented programming generates a lot of what looks like work. Back in the days of fanfold, there was a type of programmer who would only put five or ten lines of code on a page, preceded by twenty lines of elaborately formatted comments. Object-oriented programming is like crack for these people: it lets you incorporate all this scaffolding right into your source code. Something that a Lisp hacker might handle by pushing a symbol onto a list becomes a whole file of classes and methods. So it is a good tool if you want to convince yourself, or someone else, that you are doing a lot of work."

I guess such a 1000 lines project can easily be reduced to something like maybe 400 lines in a more usual, less constrained OO-way (one where you would allow the else keyword for example, or one where you would allow a loop in your method). That leads us to about 150 lines of Python, and even 80/100 lines of Lisp. Hey, the whole project now fits in one or two screens ! But of course, it does not look like hard work anymore...

akuhn said...

Greate exercise! Reminds of Challenge #1 of our programming II lecture:

Anonymous said...

Nice article and a good suggestion of an exercise. I can see how the restrictions will enforce a good (or at least better) adoption of OO practises.

I really can't agree with "9. Don’t use setters, getters". I'm of the opinion that these kinds of methods should always be used. Keeping the validation code in one place (e.g. erroring when setting something to null) is just common sense.

When writing something using IOC putting a similar validation in the getter gives me a sanity check that my Spring XML is correct and is injecting all the required things.

Also, without using a getter (and assuming the attributes are private) how can you override the functionality of a getter in inheriting classes? I'm wondering if you've just missed the word "public" and number 9 should read "Do not use public getters, setters...". If that's the case, can I just change my naming convention? ;-)

Anonymous said...

Hold on. If I am allowed only one level of identing in a method then I can't use 'if' statements at all. Perhaps they mean only two levels....

Anonymous said...

Still an awesome idea though.

Anonymous said...

Perhaps these rules are good for newbies who might otherwise right 'bad' OO code by, for instance, writing a 1000 line method crammed with all sorts of unrelated code. But for 'good' coders blindly constaining your code following these rules will, at best, lead to code that is no better than what they would have written in the first place and in most cases will lead to code that is not as good.

'good' code is tightly bound and loosely coupled (you never defined what good code is). Let's assume that a coder has written some code that is as tightly bound, loosely coupled, and as readable as the code could ever be, yet the code violates some of the constraints that you have listed (and good code will often violate these constraints, they are rather arbitrary, without proof that they correlate highly to 'good' code).
Then rewriting the code only make the code less good, not better.

If you want to make good coders out of newbie coders then instead of merely giving them rules to follow teach them the principles of what makes code 'good' and have regular code reviews to improve technique.

Anonymous said...

I like this. I am obsessed with OOP and I am always looking for ideas on how to improve my code and this really helps.

Thanks alot.

Anonymous said...

This is a joke, right? :D I guess you could write a 100000-line program under these restrictions, but it would probably do about as much as a 2000-line normally written one.

Anonymous said...

From reddit comments

"Don’t use any classes with more than two instance variables. "

class Point3D {
int x,y,z;


Unknown said...

Why would anyone store a zip code as an integer. When do you perform a calculation with a zip code? I agree that zip codes should be encapsulated in their own object, but the zip code should be a string and there should be a property called country that stores an ISO country code so that the zip code string can be validated against the particular format of the country. Just sayin'.

Anonymous said...

I think some people are missing the point of this exercise. Nobody is claiming that you should rigidly follow these rules for your production code. It's an exercise to help change the way you think about structuring your code, and give you better OO habits when writing real-world code.

Anonymous said...

"Nobody is claiming that you should rigidly follow these rules for your production code."

In my opinion the very last paragraph implies that you should follow these rules.

I wonder how many lines of spurious code are in their 100,000 line project...

Douglas Squirrel said...

You could enforce some of these rules with a tool like Checkstyle in Java. I wonder if anyone has tried this? For a 100000-line project you would need some kind of automated enforcement, I would think.

My colleagues have recently been moving away from rigid rules of this type, but maybe we should try this as an exercise since our OO skills are not what we would like.

Jeffrey Fredrick said...

"3. Wrap all primitives and strings."

This is one of my favorite suggestions, but I come at it a slightly different way that I think explains why it is a good idea. What I argue is that "data + constraints = a type". So with the zip code example you should create a ZipCode class rather than using an int or a String because a zip code can't be just any String or int, it will need to be validated and you'll want that validation in a single location.

Anonymous said...

I do not see how 8 and 9 can coexist.

If you have a wrapper around a collection, encapsulating the collection, how would one manipulate the data?

With 9 in effect, how does one even gain access to the collection to manipulate it?

I like most of these ideas though. I say, pick a couple that makes most practical sense (there are certainly 3-4 there). The others, don't bother.

Anonymous said...

It isn't supposed to be how you should write programs. It is an exercise to help you improve your object orientated thinking in real applciations. I think people are missing that. This isn't supposed to be an article on how to do object orientated design. It is an essay on imposing insane limitations on an exercise to improve your overall ability to develop oo applications.

MarlonV said...

Good code, bad code, little code, alot?
At the end, if I can write sane code, which delivers the expected result in an efficient non-eyestabbing kind of way, I am happy.

Putting restrictions on everything is so western... tell that to Da vinci

Paul Keeble said...

Well after doing this experiment I can say that the results for me were interesting:
1) My classes were always smaller and numerous than most of the developers I have worked with, now they are ridiculously so.
2) I ended up with some highly reuseable pieces, because of these rules
3) The domain model had a lot of extras in it that I wouldn't have created, and likely I wouldn't do that again
3) I had to break the 2 attributes rule twice, because it made sense to do so.
4) I don't miss getters and setterss.
5) "Micro repeats" of business code have always been a problem in all the systems I've worked with, this code didn't seem to have any because not having getters forced the logic back where it belonged.

I'm sure the program could have been shorter, but it was nice and simple to unit test and simple to understand.

The code certainly feels different to what I normally write and in a mostly good way. I felt I learnt something doing the experiment and I'd recommend giving it a go to anyone.

Anonymous said...

These rules are quite common in smalltalk programs. Another smalltalk rule, from Kent Beck in Smalltalk Best Practice Patterns pg 22, is "Divide your program into methods that perform one identifiable task. Keep all of the operations in a method at the same level of abstraction. This will naturally result in programs with many small methods, each a few lines long"

Nat Pryce said...

Interesting, but very little of those exercises have much to do with object orientation. They are really about procedural programming with abstract data types. Not a single one of those rules touches on the one thing common between object-oriented languages: polymorphism.

Here's a new rule you could follow: don't just avoid "else" blocks, avoid "if" statements altogether.

Mike Simons said...

I see this exercise as akin to a database normalization exercise.

It may be good to know how to get to 5NF but that doesn't mean you should always use it.

A follow up exercise on how to de-normalize back to a usable and maintable level would be cool.

Unknown said...

> no more than 50 lines per class

This constraint makes it pretty much impossible to write proper javadoc.

Thorough javadoc will more or less reduce the lines available for Java code to something like 30 lines. Such a limited line count will cause the code to artificially split into a lot more classes, resulting in lower cohesion and higher coupling.

Unknown said...

I fully agree with Nat Pryce. Due to the "procedural programming with abstract data types" effect of the constraints, the exercise focuses the programmer on domain modeling as opposed to hack hack hacking in C, Python or Lisp. If you accept these premises, the LOC pollution argument is of a different significance.

Andrew Binstock said...

@christian vest. You're right. I think the recommendation applies to non-Javadoc lines. Otherwise, the small methods this approaches favors would be swamped by Javadoc and a class would never fit on single screen.

Andrew Binstock said...

@nat pryce and @evangelos. Agreed with your overall point, although these exercises do support a key OO quality: encapsulation.

Anonymous said...

To be super pedantic, the rules force you to think about "information hiding" not "encapsulation". The two are orthogonal concerns.

Unknown said...

@binstock regard the 50 line cap.

If we're only to count the lines of actual Java code, and not let javadoc lines count toward the 50 line cap, then that defeats the argument that 50 lines is what can reasonably fit within the window of most editors. You could, ofcourse, use automatic code-folding on your javadoc, or you could create interfaces for every method worth documenting and then put the javadoc in the interface, but neither of these approaches seem ideal. They are both whacky solutions to self-inflicted made-up problems. The former smells like a loop-hole and the latter ... well, I don't know how to put it, but it doesn't feel right either, though you might somehow get it to work with Qi4j or similar styles code.

Some of the other posters mention that these rules might be intended for newbies, as a way to pound a deeper understanding of OO into their thought process. If so, then the rules might be designed under the assumption that these people don't write any javadoc to begin with. So it might be assumed that whatever newbies might learn from doing a project or two under these constraints is more important than taking on a habit of writing javadoc.

Anonymous said...

"Don’t use any classes with more than two instance variables. "

class Point2D {
int x,y;

class Point3D {
Point2D point2D;
int z;
} voilà!

Anonymous said...

"If you have a wrapper around a collection, encapsulating the collection, how would one manipulate the data?"

Let the class that encapsulates the data perform operations on the data.

"With 9 in effect, how does one even gain access to the collection to manipulate it?"

The encapsulating class performs the necessary operations on the collection.

Not saying it's easy, but I think it's a good excercize.

Anonymous said...

"class Point2D {
int x,y;

class Point3D {
Point2D point2D;
int z;
} voilà!"

Anybody who's really programmed geometry code
will know it should be

class Point3D {
int p[3];

as everything you would want to do will naturally
be expressed as a matrix/vector op.

Anonymous said...

It should be stressed more strongly that many of these rules are ONLY good for isolated exercises. Restrictions like not allowing more than two instance variables will lead to an excessive amount of abstraction. Also, while getters (setters not so could call anything a setter as long as arguments are passed) should be avoided, they're necessary when designing anything that needs to have a degree of flexibility (DLLs used by a bunch of different programs, etc.). These are good for little exercises to goof off with, but I would never want to work with code that had 10 50 line classes doing the same work that one 200 line class could do - guidelines like these turn programming into secretarial work more than anything else. I like to program, not type.

Keeping the number of files small per package as well as encapsulating many data types are good rules to live by, but don't take anything to an extreme.

Anonymous said...

I have a question about rule #1. A C++ method often looks like:

bool X::DoSomething()
  Do some setup ...
  for(i=0, i < ToDo; i++)
    Do some stuff ...
    if (a condition)
      return true;
    Do some other stuff ...
  Do some cleanup ...
  return false;

This is two levels of indentation. How would you do the equivalent with one?

Thanks for helping me understand this!

Andrew Binstock said...

@anonymous re rlue #1 (two levels of indent): The for-loop would be put in a separate method.

Remember, this is just an exercise.

Unknown said...

We tried this one night at a monday night code session. It is not practical.

What's worst, in the thoughtworks article the example code breaks the very rules he sets forth.

Also, realize that anyplace you have
new ArrayList < MyClass >()

you will be writing a brand new class.
and 3D point won't be
new 3dPoint (1,2,3) but
new 3dPoint(new Coordinate(1), new Coordinate(2), new Coordinate(3));
new 3dPoint(new XCoordinate(1), new YCordinate(2), new ZCoordinate(3))

and those coordinates won't have a getValue()... so you might need a function like
multiple(Scalar times)
add(Scalar amount)

probably also a
add (XCoordinate)

try writing the method to find
the slope of 2 coordinate. remember, no getters/setters, and wrap all ints!

I would love to see 1,000 lines of this code that you are proud of. we did 500 lines, it took us about 2 hours, and was massively confusing. Using regular methods we achieved the same thing in 5 minutes.

cogknight said...

At first I didn't like getters and setters. But I've found them quite handy in maintenance. Limits the searches if you want to find where a variable is set, instead of having to wade through all of it's uses. I don't really see a reason to throw that away.

voice10 said...

#2. you can't remove "else" if you need to check a condition. you can create method which just handles only "if" but then you're moving handling of "else" at upper level.

Bryan Stenson said...

Because bytes aren't free...a five digit US zipcode as a string is 4x the size of a uint...

Though, OOP probably isn't the solution in an environment where such a constraint exists...

zlayer said...

I'm not at home with Java/OO, but how about this?
class Point {
int coord;

class Point3D {
Array points = new Array(3);