Thoughts on SCM and related issues

ClearCase is nice but proprietary and fragile.

CVS is often good enough but very tedious for serious branching and merging.

Darcs is really interesting but maybe too weird.

SVN I haven't tried.

Arch seems very good, at least for small-scale stuff. Many people seem to be put off by the funky filenames used for metadata. Actually, having the metadata outside the working copy tree would be nice in general, but does any system other than CC (ClearCase) accomplish this? At least in principle it should be possible to put Arch metadata under e.g. ~/.arch instead of inside the working copy tree. A metadata-free source tree would be nice for scripts at least - no need for SCM tool awareness in the build.

It would be nice to get CC-like build auditing. But how to do that without losing the SCM-tool-independent build? At least some build auditing should be possible without a special file system: run the build under strace and use the SCM tool or package database to look up the versions of the files and directories that were used.
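As a rough illustration of the strace idea, here is a minimal sketch that pulls the successfully opened paths out of a trace. It assumes the trace was produced with something like `strace -f -e trace=open,openat`, and the line layout the regular expression expects is an assumption too, since strace output differs between versions. The name `files_read` and the sample lines are made up for the example.

```python
import re

# Matches open()/openat() calls and captures the path and the return value.
# The assumed layout is: openat(AT_FDCWD, "path", FLAGS) = 3
OPEN_RE = re.compile(
    r'\bopen(?:at)?\((?:[^,"]+,\s*)?"([^"]+)"[^)]*\)\s*=\s*(-?\d+)'
)

def files_read(strace_lines):
    """Return the sorted list of paths that were opened successfully."""
    paths = set()
    for line in strace_lines:
        m = OPEN_RE.search(line)
        # A negative return value means the open failed (e.g. ENOENT).
        if m and int(m.group(2)) >= 0:
            paths.add(m.group(1))
    return sorted(paths)

# Hypothetical trace lines, not output from a real build.
sample = [
    'openat(AT_FDCWD, "main.c", O_RDONLY) = 3',
    'openat(AT_FDCWD, "/usr/include/stdio.h", O_RDONLY|O_NOCTTY) = 4',
    'openat(AT_FDCWD, "missing.h", O_RDONLY) = -1 ENOENT (No such file or directory)',
    '[pid 1234] open("util.c", O_RDONLY) = 3',
]

print(files_read(sample))
```

The resulting path list is what one would then feed to the SCM tool or the package database to find out which version of each file went into the build.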

I feel that merging, configuring for build, building and installing packages are related. They all deal with dependencies. Or at least graph-like relations. For building, an object depends on its sources. For merging, the deltas to apply depend on the revision history and the merges already done. Installing an application package requires installing the library packages it uses. Wouldn't it be nice to have a tool that would be a revision control system to which I would be able to say: I want this kernel, these apps for this CPU, this mainboard, these devices, these features and these options, and the tool would respond with a combination of readily installed binary packages it found to exist in one part of the net, partly built packages with objects obtained from another part of the net, automatically merged source code trees from here and there, merge conflicts of this and that, and a complaint about completely missing things, i.e. a task list for things needing to be written. Yeah - sounds really sweet. Whether anything like this is ever feasible would probably depend on how much metadata it requires. Development should not be encumbered by requiring too much metadata. Auditing could probably produce some metadata automagically.
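The common graph shape can be shown with a small sketch. It assumes a plain mapping from each item to the things it depends on and a set of items that some repository can provide. The function `plan` and all package names are hypothetical, and cycles are simply cut off rather than reported.

```python
def plan(target, depends_on, available):
    """Return (order, missing).

    order   - available items, dependencies first
    missing - items nobody provides, i.e. the "task list"
    """
    order, missing, seen = [], [], set()

    def visit(item):
        if item in seen:
            return
        seen.add(item)
        for dep in depends_on.get(item, []):
            visit(dep)
        # Record the item only after everything it depends on.
        (order if item in available else missing).append(item)

    visit(target)
    return order, missing

# Hypothetical dependency data.
depends_on = {
    "app": ["libfoo", "libbar"],
    "libfoo": ["libc"],
    "libbar": ["libc", "driver-x"],
}
available = {"app", "libfoo", "libbar", "libc"}

order, missing = plan("app", depends_on, available)
print(order)    # what can be installed or built, in order
print(missing)  # what still needs to be written
```

The same traversal would serve for objects and sources, or for deltas and the revisions they apply to. Only the meaning of the edges changes.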

What does it matter if merging, building etc are related? Well, probably they are related enough for there to be some added value in having a tool that is aware of the relations. A hybrid of apt, dpkg and arch. I suppose I should take a look at the gentoo portage thing.

Yet another thought on merging, building and installing: apt is in a way doing something that a ClearCase audited build does. If you view your /etc/apt/sources.list as your CC view config spec, you could argue that apt is doing winkins - the apt repositories in your sources.list just explicitly specify DO storage locations. Also, I'd argue that a "cleartool findmerge .." does something similar to "apt-get update; apt-get -s dist-upgrade" - they both show you the deltas between your current local state and the state of a repository you are following.
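Stripped of all detail, the shared operation is a comparison of two name-to-version maps. The sketch below is only a toy model of that idea, not of how apt or cleartool actually work. The function `pending_deltas` and the version data are invented for illustration.

```python
def pending_deltas(local, remote):
    """Map each name whose remote version differs from the local one
    to a (local_version, remote_version) pair. None means "absent"."""
    names = set(local) | set(remote)
    return {
        name: (local.get(name), remote.get(name))
        for name in sorted(names)
        if local.get(name) != remote.get(name)
    }

# Hypothetical state: installed packages or checked-out file versions.
local = {"libfoo": "1.0", "app": "2.1"}
remote = {"libfoo": "1.1", "app": "2.1", "libbar": "0.3"}

print(pending_deltas(local, remote))
```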

SVN and Arch seem to emphasize atomic commits and whole-tree deltas. I think this is not so important. Getting an out-of-sync checkout can be annoying, but if one is checking out an unlabeled (=untagged) version, I'd say one gets what one asks for. Also, depending on one's working habits, whole-tree deltas might be bad: they can tie together unrelated changes. At least for me, doing nice deltas into a source repository does not happen naturally. I guess it can be argued that this is just a general problem in my working methods, but what often happens is:

Why is it too much trouble to group the day's work into sane changesets? Well - the version containing the whole delta is the only one I've tested. To me, constructing a changeset implies that the delta is at least somewhat sound, i.e. maybe even tested. If I factor the complete delta into changesets, it would be nice if the tree at least compiled with each changeset applied in isolation.

But is it sensible to require that source code changes relate to compilability or correctness? Often the value of SCM comes from the simple fact that it is OK to check in complete crap. Sometimes thinking in code can be useful.

Humm..I should start versioning my pages..

One pretty fundamental difference between version control systems seems to be distributed vs centralized. Only BitKeeper and Arch seem to qualify as distributed. All the others I know (including ClearCase - MultiSite is distribution built on a centralized system) use a centralized repository. Distributed tools can be used with a shared repository, making them centralized, and distribution can be built on top of a centralized tool, but the distributed vs centralized choice in the underlying tool has deep implications. See, e.g. Larry's article.

Another thing in versioning in general that can be seen as fundamental is the way changes are applied to the repository. In this respect, darcs is unique. For darcs a change is by default parallel to every other delta in the repository. All other tools I know apply changes sequentially, i.e. a new delta is dependent on previous deltas.
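The difference can be sketched with a toy model. It assumes, purely for illustration, that a delta is just a name plus the set of files it touches, and that in the parallel model a delta depends only on earlier deltas touching one of the same files. This is a simplification of my own, not darcs's actual patch theory, which works at a finer grain. All names below are hypothetical.

```python
def sequential_deps(patches):
    """Sequential model: each delta depends on every delta before it."""
    names = [name for name, _ in patches]
    return {name: names[:i] for i, name in enumerate(names)}

def parallel_deps(patches):
    """Parallel model (simplified): a delta depends only on earlier
    deltas that touch at least one of the same files."""
    deps = {}
    for i, (name, files) in enumerate(patches):
        deps[name] = [
            earlier
            for earlier, earlier_files in patches[:i]
            if files & earlier_files
        ]
    return deps

# Hypothetical history: (delta name, files touched).
patches = [
    ("fix-typo", {"README"}),
    ("add-parser", {"parser.c"}),
    ("parser-bugfix", {"parser.c"}),
]

print(sequential_deps(patches))
print(parallel_deps(patches))
```

In the sequential model "parser-bugfix" drags "fix-typo" along with it. In the parallel model it needs only "add-parser", so it could be moved to another repository without the unrelated README change.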


Last modified: Fri Aug 19 12:27:27 EEST 2005