Thursday, July 12, 2007

A Limitation in Subversion. Any ideas?

Today, I was speaking with an expert at CollabNet (the Subversion folks) about a problem I have with Subversion. To my amazement, he told me Subversion can't handle my (apparently) simple problem.

When I work on Platypus, I start by checking out my code from the public Subversion repository. I put it in my local project directory, and work there. Along the way, I might write code that breaks existing tests. At some point before I get all tests working, I want to check my code into an SCM, so that I can get back to where I am right now, if something should go wrong. Best practices in SCM forbid committing code to a project directory if it breaks tests. So, the way to do what I need, it would seem, is to use a temporary repository to commit to until I'm finished coding. Then, when all tests pass, I can commit the code back to the original public Platypus repository.

But Subversion does not permit checking code from one directory into different repositories. The suggestion from CollabNet was to copy the code to a temporary directory from which I could commit it to a separate Subversion repository. This, of course, is a non-starter--you don't keep two live copies of the code you're working on. That makes for all sorts of errors.

Another suggestion was to create a branch in the public repository and commit the small changes to that branch, and eventually merge the branch when all tests pass. I don't like this either, because it forces me to check non-working code into a public repository--the whole thing I want to avoid.

The only solution I've come up with is to use a different SCM (Perforce) for my local repository. I check temporary code into it. Then, when everything is copacetic, I check the tested code into the public Subversion directory. This works, but it's hardly elegant.

How do you solve this problem or how do you avoid it to begin with? TIA


Anonymous said...

Does Subversion support the concept of 'private' or 'sandbox' branches? With Surround SCM for example, you can create what's called a Workspace branch which is private to a user. Developers can continually commit code to that branch w/o affecting the QA or production builds. Then merge code changes up as they bring the code into compliance with existing tests and/or create new ones. Since a user can't see any other user's Workspace, there's no potential for mixups as to what code to use.

Anonymous said...

I think you just have to use a branch for this since there is no native construct in SVN. It's a topic that comes up with MS Team Foundation developers since it has a feature called shelving...

It's worth reading some of these links...

I'd say non-working code in a repository is OK as long as it's not being pulled down for automated builds. It's a bigger problem if the trunk has non-working code. For most people, creating a private branch and keeping a copy of code there is a good idea in case their machine dies.

Andrew Binstock said...

Matt: Thanks for your comment. Subversion does not have this concept, as such. Surround SCM would be an option, if it were not for the fact that I must use Subversion for the public OSS repository.

Brian: I think you're right. I should just use a branch that makes it clear the code must not be used. Thanks for the link and the solution, which is probably the best way to get around this limitation.

Odd that the CollabNet folks have not fixed this.

Unknown said...

If you have lots of untested code, it probably means that your development cycles are too long and you're not breaking things down into small enough pieces of code. My suggestion is to focus on smaller pieces of unit-testable code, and more frequent check-ins. If you use Mylar, it will automatically associate Bugzilla, Trac, or JIRA issues with your code whenever you perform your check-ins. This will make it easier to find all code associated with a given issue, especially if you have people collaborating on the same task.

Depending on your project philosophy, you can either declare that the HEAD is always "somewhat" unstable and that individual release branches are the points of stability in the project or create a separate branch for development builds.

Neil Bartlett said...

I think you've hit a limitation of all centralized SCM systems. Decentralized SCMs are becoming popular of late; perhaps you could look into one of those (e.g. Mercurial, git, darcs or Bazaar). They are designed to handle exactly this use case, along with many others that cannot be handled by centralized systems.

However, if you really have to use Subversion, then a branch with an appropriate do_not_use label is probably the only solution.
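For anyone who wants to see the branch approach spelled out, here is a rough sketch of the Subversion commands involved. It only builds and prints the command lines; it does not run them. The repository URL, the branch name, and the commit messages are made up for illustration, and the sketch assumes the conventional trunk/branches layout and a reasonably recent svn client (older clients need explicit revision ranges, or --reintegrate, on the merge step).

```python
import shlex

# Hypothetical repository root, used only for illustration.
REPO_ROOT = "https://svn.example.org/platypus"


def svn_private_branch_workflow(repo_root, branch_name, checkpoint_messages, final_message):
    """Return the svn commands for a do-not-use work branch (assumes trunk/branches layout)."""
    trunk = f"{repo_root}/trunk"
    branch = f"{repo_root}/branches/{branch_name}"
    commands = [
        # Create the branch on the server, then point the working copy at it.
        ["svn", "copy", trunk, branch, "-m", f"Create {branch_name}: work in progress, do not use"],
        ["svn", "switch", branch],
    ]
    for message in checkpoint_messages:
        # Checkpoint commits land on the branch, never on trunk.
        commands.append(["svn", "commit", "-m", message])
    commands += [
        # Once all tests pass: go back to trunk, merge the branch, commit once.
        ["svn", "switch", trunk],
        ["svn", "merge", branch],
        ["svn", "commit", "-m", final_message],
    ]
    return commands


if __name__ == "__main__":
    workflow = svn_private_branch_workflow(
        REPO_ROOT,
        "do_not_use_wip",
        ["WIP: tests still failing"],
        "Merge finished work; all tests pass",
    )
    for command in workflow:
        print(shlex.join(command))
```

The point of the label in the branch name is purely social: nothing in Subversion stops someone from checking the branch out, so the name has to carry the warning.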

Anonymous said...

On the commercial side, this would also be easy in AccuRev.

I'm still using Subversion, but am looking at experimenting with darcs or git in the future.

Dennis Nagel said...

So I'm wondering if I missed it or you didn't say which IDE you use... Eclipse does its own local file revision storage. I won't go so far as to suggest it can 'label' your entire set of files in that local repository, but I wouldn't put it past some eager Eclipse plugin genius to come up with a way to do that.

hth. D.

Adrian said...

The answer is a distributed version control system, because what you need is a "feature branch". Branching in SVN is cheap and quick, but the merging can be very painful. Branching in a DVCS is just as cheap, but the merging is much nicer. A DVCS also allows you to make local commits and work offline, and is generally a lot faster than SVN.

There are several DVCS systems that will work with Subversion, so you don't have to migrate your central server just yet.

SVK is probably the most "native". This uses a Subversion repository as backing storage, and can push and merge revisions to a normal SVN repo. It suffers a little from being slow.

git is the real speed demon of the DVCS world, and can interact with SVN repos via the git-svn component.

Bazaar also has good support for SVN repositories via the bzr-svn plugin. Bazaar arguably has the best support for Windows of the three big next-generation DVCS systems, and being written (mostly) in Python makes it easier to scratch your own itches, should you have them. Not as fast as git, but still fast enough for the majority of projects and, like git and Mercurial, much faster than SVN.

Bazaar ended up being my choice for a project involving the merging of multiple simultaneous lines of development over some 4000 files. Mercurial has an edge case that my working tree ran into, and git was too scary for my end-users to contemplate.

Users make multiple commits into their working branches but at the end of the day are expected to commit a single working "merge" revision to the trunk which doesn't break anything.

Andrew Binstock said...

@Adrian: Thanks for your thoughtful comment. I have come to the same conclusion you did; namely, that for the specific need I identified, using a DVCS is a very workable solution.
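As a rough illustration of what the DVCS route could look like with git-svn, here is a sketch that lists the commands for the workflow discussed above: broken checkpoints are committed to a private local branch, then squashed into a single commit before anything goes to Subversion. It only builds and prints the command lines. The repository URL, directory, branch names, and messages are all made up, and the name of the local branch that tracks trunk ("master" here) is an assumption that depends on the git version and configuration.

```python
import shlex

# Hypothetical values, used only for illustration.
SVN_URL = "https://svn.example.org/platypus/trunk"
WORK_DIR = "platypus"


def git_svn_workflow(svn_url, work_dir, checkpoint_messages, final_message,
                     main_branch="master", wip_branch="wip"):
    """Return the commands for private checkpoints on top of a public SVN repository.

    Every command after the clone is meant to be run inside work_dir.
    main_branch is an assumption: the local branch that tracks SVN trunk.
    """
    commands = [
        ["git", "svn", "clone", svn_url, work_dir],
        # Do the risky work on a private local branch.
        ["git", "checkout", "-b", wip_branch],
    ]
    for message in checkpoint_messages:
        # Local checkpoints: these never leave the developer's machine.
        commands.append(["git", "commit", "-a", "-m", message])
    commands += [
        # Once all tests pass: pick up upstream changes on the tracking branch.
        ["git", "checkout", main_branch],
        ["git", "svn", "rebase"],
        # Collapse the checkpoints into one commit so that only tested code is published.
        ["git", "merge", "--squash", wip_branch],
        ["git", "commit", "-m", final_message],
        ["git", "svn", "dcommit"],
    ]
    return commands


if __name__ == "__main__":
    workflow = git_svn_workflow(
        SVN_URL,
        WORK_DIR,
        ["WIP: refactor started, tests failing", "WIP: half the tests pass"],
        "Finish refactor; all tests pass",
    )
    for command in workflow:
        print(shlex.join(command))
```

The squash step matters because git svn dcommit publishes each local commit as its own Subversion revision; without it, the non-working checkpoints would end up in the public history after all.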