Mercurial Subrepositories

The basics

To use subrepositories, you manually create a file named .hgsub in the root of the parent repo’s working directory. Each line of this file maps a subdirectory location (relative to the parent repo’s working directory) to the URL of the corresponding Mercurial repo. For example:

local-clone = http://hg-server/a/b/backing-repo

says to clone http://hg-server/a/b/backing-repo into local subdirectory local-clone.

When you later hg commit in the parent repository, for each subrepo listed in .hgsub, Mercurial determines the local clone’s parent changeset, and records that changeset’s ID in .hgsubstate.

Later still, if you hg update to the parent changeset created in the previous step, Mercurial automatically updates each subrepo to the changeset whose ID was recorded in .hgsubstate.

Thus, each changeset in the parent repo is a snapshot of the exact state of the entire collection of subrepos at the moment the parent commit was made.
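
The setup described above can be sketched as a small shell helper. This is a minimal sketch under stated assumptions, not a definitive procedure: add_subrepo is a hypothetical name, and the helper assumes it is run from the root of the parent repo’s working directory.

```shell
# add_subrepo is a hypothetical helper, not part of Mercurial.
# Run it from the root of the parent repo's working directory.
add_subrepo() {
    subdir=$1   # location relative to the parent's working directory
    url=$2      # URL of the backing Mercurial repo

    hg clone "$url" "$subdir" || return 1           # create the local clone
    printf '%s = %s\n' "$subdir" "$url" >> .hgsub   # map the subdirectory to its URL
    hg add .hgsub                                   # needed the first time .hgsub is created
    hg commit -m "Add subrepo $subdir"              # records the clone's parent in .hgsubstate
}
```

With the example mapping from above, the call would be add_subrepo local-clone http://hg-server/a/b/backing-repo. Note that the final hg commit also commits any other pending changes in the parent.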

Notes:

  1. With sufficiently new versions of Mercurial, an attempt to commit in the parent repo will be aborted if any subrepo contains uncommitted changes; this guards against creating a “snapshot” that can’t actually be reverted to. (Local changes in the parent repo are fine; they’ll get committed as normal.) To override this and commit recursively, use commit’s --subrepos/-S option.
  2. .hgsub is user-edited and revision-controlled, just like any other file in the project’s working directory.
  3. .hgsubstate is also revision-controlled, but it is maintained automatically by Mercurial (specifically, it is updated during a “commit” in the parent, to reflect each subrepo’s then-current state). In general, it should not be manually edited.

Limitations

As of Mercurial 2.1, subrepository support is still somewhat rough around the edges.

Subrepos must be updated manually

There is no mechanism to automatically update subrepositories to new changesets; the only thing Mercurial can do for you is, as described above, to restore things to a preexisting, snapshotted state. Thus, to create a changeset containing, for example, “the latest and greatest” of each subrepo, you have to do it manually:

  1. Possible preparatory work (see below)
  2. For each subrepo in turn, get the subrepo into the state you want, using the usual Mercurial commands — pull, update, merge, commit, etc. The goal here is that for each subrepo, your clone’s parent be precisely the changeset you want to include in your snapshot
  3. Do an hg commit in the parent repo

The “possible preparatory work” mentioned in (1) consists of doing, for the parent repo, almost what (2) describes for the subrepos: getting it into the state you’re going to want it in when it comes time to do the final commit (3). This has to happen before the corresponding work in the subrepos (2), because many Mercurial operations in the parent will perturb the state of the subrepos. Suppose you carefully herd all your subrepos into line, then cd parent; hg update some-other-changeset. The update will update each subrepo to its parent as recorded in some-other-changeset … and you’ll have to herd them all back into line again.
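
The ordering described above (parent first, then each subrepo, then the snapshot commit) can be sketched as follows. This is only an illustration: snapshot_latest is a hypothetical helper, it assumes “the latest and greatest” means whatever a plain hg pull followed by hg update gives you in each repo, and it assumes it is run from the root of the parent’s working directory.

```shell
# snapshot_latest is a hypothetical helper, not part of Mercurial.
# Usage: snapshot_latest "commit message" subrepo-path...
snapshot_latest() {
    message=$1
    shift

    # (1) Preparatory work: settle the parent first, because updating the
    #     parent also moves every subrepo to its recorded changeset.
    hg pull || return 1
    hg update || return 1

    # (2) Bring each subrepo to the changeset you want in the snapshot.
    for sub in "$@"; do
        hg -R "$sub" pull || return 1
        hg -R "$sub" update || return 1
    done

    # (3) Commit in the parent, which records each subrepo's parent changeset.
    hg commit -m "$message"
}
```

If a subrepo needs a merge or a specific revision rather than the default update target, do that by hand in step (2) before committing.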

Not all hg subcommands are supported

Subrepository support has only been added to some of Mercurial’s commands. This is expected to improve over time.  hg help subrepos, section Interaction with Mercurial Commands, has all the gory details.

In general though, most commands with subrepo support do not recurse by default into subrepos; to make them recurse, pass --subrepos/-S. There are a few exceptions, e.g. where recursing was deemed essential to correct operation, and so always occurs — push and update, for example.
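
As an illustration of opting in to recursion, the sketch below uses hg status with --subrepos to check whether the parent or any subrepo has pending changes. tree_is_clean is a hypothetical helper name, and ignoring untracked files is an assumption made here for convenience.

```shell
# tree_is_clean is a hypothetical helper, not part of Mercurial.
# Succeeds when neither the parent nor any subrepo has modified, added,
# removed, or missing files. Untracked files are deliberately ignored.
tree_is_clean() {
    changes=$(hg status --subrepos --modified --added --removed --deleted)
    [ -z "$changes" ]
}
```

A check like tree_is_clean || echo "uncommitted changes somewhere in the tree" can be handy before attempting a commit in the parent.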

Best practices

Avoid subrepos when possible

Suppose you have a project main, which depends on two independent libraries libA and libB. It is best not to use subrepositories at all, but to manage the dependency by other means (e.g. Maven or Ant+Ivy).

But that’s not always feasible; sometimes subrepos are the best approach. In that case…

Use a “thin” parent

Rather than having:

    main
     |
 +---+---+
 |       |
subA    subB

Do:

       parent
         |
 +-------+-------+
 |       |       |
main    subA    subB

Where parent is a small project consisting, ideally, of little more than the subrepo configuration and a master build file (POM, Makefile, etc.).

FIXME: Cite reference.

Tagging and subrepos

If the subrepos represent independent projects (e.g. Intelliware Commons components), obviously each one should be branched, tagged, etc. on its own project’s release schedule, and the rest of this section does not apply. But if they’re intimately related to, and on the same release schedule as, other subrepos, things get more complicated.

In principle, you don’t need tags on the subrepos, since the tag on the parent is sufficient to retrieve the state of the entire tree. That works as advertised, but it can be confusing — to figure out which changeset from a given subrepo was used for a given release, one has to do, essentially:

$ hg cat -R parent-repo -r tag .hgsubstate | grep subrepo
b8c38dcff02a76b5936ab4e0bdc1fd6170d513bd3 subrepo
$ hg log -R subrepo -r b8c38dcff02a    # "b8c3..." is the first part of the changeset ID displayed by the previous command
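
The lookup above can be wrapped in a small function. This is a sketch only: subrepo_rev_at is a hypothetical name, it assumes it is run from the root of the parent’s working directory, and it relies on each .hgsubstate line having the form shown above (changeset ID, a space, then the subrepo path).

```shell
# subrepo_rev_at is a hypothetical helper, not part of Mercurial.
# Usage: subrepo_rev_at TAG SUBREPO-PATH
# Prints the changeset ID recorded for SUBREPO-PATH in the parent at TAG.
subrepo_rev_at() {
    tag=$1
    sub=$2
    # Compare everything after the first space with the wanted path, so
    # that subrepo paths containing spaces still match exactly.
    hg cat -r "$tag" .hgsubstate |
        awk -v path="$sub" 'substr($0, index($0, " ") + 1) == path { print $1 }'
}
```

The result can then be fed to the second command, for example hg log -R subrepo -r "$(subrepo_rev_at tag subrepo)".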

To save people this effort — and indeed, to have the tags directly visible in TortoiseHG — it can be very helpful to directly tag the subrepos, rather than depending on the parent repo’s tag. Here’s how to accomplish that:

XXX Add Illustrations

1. Get parent into the state you want it (this is the “possible preparatory work” described in Limitations)

2. Get each subrepo into the state you want it:

  • update it to the changeset you want
  • hg tag tagname
  • hg update tagname, so that it’s the tagged changeset that gets snapshotted, rather than the changeset that created the tag (reminder: in Mercurial, these are never the same changeset)

3. Tag parent:

  • hg commit -m "Snapshot subrepo versions for X.Y.Z"    # to snapshot the subrepo versions
  • hg tag tagname
  • hg up tagname
  • Verify that all the subrepos are still at the right place; it’s easy to make a mistake on this intensely manual procedure, and so worth a couple of minutes to double-check your work :-/
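
Steps 2 and 3 above, plus the final verification, can be sketched as two shell functions. Both names (tag_release and verify_snapshot) are hypothetical, and the sketch assumes that the parent and every subrepo have already been updated to the changesets you want, that all subrepos are Mercurial repositories, and that you run it from the root of the parent’s working directory.

```shell
# tag_release and verify_snapshot are hypothetical helpers, not part of Mercurial.

# Usage: tag_release TAGNAME subrepo-path...
tag_release() {
    tag=$1
    shift
    for sub in "$@"; do
        hg -R "$sub" tag "$tag" || return 1      # tag the subrepo's current changeset
        hg -R "$sub" update "$tag" || return 1   # step back onto the tagged changeset
    done
    hg commit -m "Snapshot subrepo versions for $tag" || return 1
    hg tag "$tag" || return 1
    hg update "$tag"
}

# Compare each subrepo's current parent with the ID recorded in .hgsubstate.
verify_snapshot() {
    rc=0
    while read -r node path <&3; do
        actual=$(hg -R "$path" log -r . --template '{node}')
        if [ "$actual" != "$node" ]; then
            echo "mismatch: $path is at $actual, expected $node"
            rc=1
        fi
    done 3< .hgsubstate
    return $rc
}
```

Running verify_snapshot right after tag_release X.Y.Z subA subB automates the double-check from the last bullet, although it is still worth a quick look by eye.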

Tricks & traps

hg onsub

The hg onsub command (provided by the OnsubExtension, http://mercurial.selenic.com/wiki/OnsubExtension) lets you run an arbitrary command in every subrepo:

hg onsub hg parent
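
If the extension is not available, a roughly similar effect can be had with a plain shell loop over .hgsubstate. This is a sketch, not a replacement for onsub: for_each_subrepo is a hypothetical name, and it assumes it is run from the root of the parent’s working directory after at least one commit has created .hgsubstate.

```shell
# for_each_subrepo is a hypothetical helper, not part of Mercurial or onsub.
# Usage: for_each_subrepo COMMAND [ARGS...]
# Runs COMMAND inside each subrepo directory listed in .hgsubstate.
for_each_subrepo() {
    rc=0
    # Read the list on file descriptor 3 so COMMAND keeps its normal stdin.
    while read -r node path <&3; do
        (cd "$path" && "$@") || rc=1
    done 3< .hgsubstate
    return $rc
}
```

For example, for_each_subrepo hg parent runs hg parent in each subrepo in turn.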