From 725ca8a8e77847049630b1f409c28fb37e943dc2 Mon Sep 17 00:00:00 2001 From: Junio C Hamano Date: Sun, 22 Jan 2006 23:53:07 -0800 Subject: [PATCH] Add Subproject Design Notes. Signed-off-by: Junio C Hamano --- Makefile | 11 ++ Subpro.txt | 472 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ 2 files changed, 483 insertions(+) create mode 100644 Makefile create mode 100644 Subpro.txt diff --git a/Makefile b/Makefile new file mode 100644 index 00000000..32c8bd85 --- /dev/null +++ b/Makefile @@ -0,0 +1,11 @@ +all: + +clean: + rm -f Subpro.html + + +all: Subpro.html + +%.html: %.txt + asciidoc -bxhtml11 $*.txt + diff --git a/Subpro.txt b/Subpro.txt new file mode 100644 index 00000000..8340d888 --- /dev/null +++ b/Subpro.txt @@ -0,0 +1,472 @@ +Notes on Subproject Support +=========================== +Junio C Hamano + +Scenario +-------- + +The examples in the following discussion show how this proposal +plans to help this: + +. A project to build an embedded Linux appliance "gadget" is + maintained with git. + +. The project uses linux-2.6 kernel as its subcomponent. It + starts from a particular version of the mainline kernel, but + adds its own code and build infrastructure to fit the + appliance's needs. + +. The working tree of the project is laid out this way: ++ +------------ + Makefile - Builds the whole thing. + linux-2.6/ - The kernel, perhaps modified for the project. + appliance/ - Applications that run on the appliance, and + other bits. +------------ + +. The project is willing to maintain its own changes out of tree + of the Linux kernel project, but would want to be able to feed + the changes upstream, and incorporate upstream changes to its + own tree, taking advantage of the fact that both itself and + the Linux kernel project are version controlled with git. + +. To make the story a bit more interesting, later in the history + of development, `linux-2.6/` and `appliance/` directories will + be renamed to `kernel/` and `gadget/`. 
+ +The idea here is to: + +. Keep `linux-2.6/` part as an independent project. The work by + the project on the kernel part can be naturally exchanged with + the other kernel developers this way. Specifically, a tree + object contained in commit objects belonging to this project + does *not* have `linux-2.6/` directory at the top. + +. Keep the `appliance/` part as another independent project. + Applications are supposed to be more or less independent from + the kernel version, but some other bits might be tied to a + specific kernel version. Again, a tree object contained in + commit objects belonging to this project does *not* have + `appliance/` directory at the top. + +. Have another project that combines the whole thing together, + so that the project can keep track of which versions of the + parts are built together. + +We will call the project that binds things together the +'toplevel project'. Other projects that hold `linux-2.6/` part +and `appliance/` part are called 'subprojects'. + + +Setting up +---------- + +Let's say we have been working on the appliance software, +independently version controlled with git. Also the kernel part +has been version controlled separately, like this: +------------ +$ ls -dF current/*/.git current/* +current/Makefile current/appliance/.git/ current/linux-2.6/.git/ +current/appliance/ current/linux-2.6/ +------------ + +Now we would want to get a combined project. First we would +clone from these repositories (which is not strictly needed -- +we could use `$GIT_ALTERNATE_OBJECT_DIRECTORIES` instead): + +------------ +$ mkdir combined && cd combined +$ cp ../current/Makefile . 
+$ git init-db +$ mkdir -p .git/refs/subs/{kernel,gadget}/{heads,tags} +$ git clone-pack ../current/linux-2.6/ master | read kernel_commit junk +$ git clone-pack ../current/appliance/ master | read gadget_commit junk +------------ + +We will introduce a new command to set up a combined project: + +------------ +$ git bind-projects \ + $kernel_commit linux-2.6/ \ + $gadget_commit appliance/ +------------ + +This would probably do the equivalent of: + +------------ +$ rm -f "$GIT_DIR/index" +$ git read-tree --prefix=linux-2.6/ $kernel_commit +$ git read-tree --prefix=appliance/ $gadget_commit +$ git update-index --bind linux-2.6/ $kernel_commit +$ git update-index --bind appliance/ $gadget_commit +------------ +[NOTE] +============ +Earlier outlines sent to the git mailing list talked +about `$GIT_DIR/bind` to record which subprojects are bound to +which subtree in the current working tree and index. This +proposal instead records that information in the index file +with the `update-index --bind` command. + +Also note that in this round of the proposal, there are no separate +branches that keep track of heads of subprojects. +============ + +Let's not forget to add the `Makefile`, and check the whole +thing out from the index file. +------------ +$ git add Makefile +$ git checkout-index -f -u -q -a +------------ + +Now our directory should be identical to the `current` +directory. After making sure of that, we should be able to +commit the whole thing: + +------------ +$ diff -x .git -r ../current ../combined +$ git commit -m 'Initial toplevel project commit' +------------ + +Which should create a new commit object that records what is in +the index file as its tree, with `bind` lines to record which +subproject commit objects are bound at what subdirectory, and +updates the `$GIT_DIR/refs/heads/master`.
Such a commit object +might look like this: +------------ +tree 04803b09c300c8325258ccf2744115acc4c57067 +bind 5b2bcc7b2d546c636f79490655b3347acc91d17f linux-2.6/ +bind 0bdd79af62e8621359af08f0afca0ce977348ac7 appliance/ +author Junio C Hamano 1137965565 -0800 +committer Junio C Hamano 1137965565 -0800 + +Initial toplevel project commit +------------ + +Notice that `Makefile` at the top is part of the toplevel +project in this example, but that is not necessary. We could +instead have the appliance subproject include this file. In +such a setup, the appliance subproject would have had `Makefile` +and the `appliance/` directory at the toplevel. The `bind` line for +that project would have said "the rest is bound at `/`" and +`write-tree \--exclude=linux-2.6/` would have been used to write +the tree for that subproject out of the combined index. + + +Making further commits +---------------------- + +The easiest case is when you update the Makefile without +changing anything in the subprojects. In such a case, we just +need to create a new commit object that records the new tree +with the current `HEAD` as its parent, and with the same set of +`bind` lines. + +When we have changes to the subproject part, we would make a +separate commit to the subproject part and then record the whole +thing by making a commit to the toplevel project. The user +interaction might go this way: +------------ +$ git commit +error: you have changes to the subproject bound at linux-2.6/. +$ git commit --subproject linux-2.6/ +$ git commit +------------ + +With the new `\--subproject` option, the directory structure +rooted at the `linux-2.6/` part is written out as a tree, and a new +commit object is created that records that tree object, with the +commit bound to that portion of the tree (`5b2bcc7b` in the above +example) as its parent. Then the final `git commit` +would record the whole tree with an updated `bind` line for the +`linux-2.6/` part.
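As a rough illustration of the last step, the Python sketch below reads the `bind` lines out of a commit object like the one shown earlier and rewrites the one for a given subdirectory. It only covers the `bind` bookkeeping; writing the new tree and recording the parent are left out. The helper names `parse_binds` and `rebind` are hypothetical, and the header layout is assumed from the example commit object in these notes, which describe a proposal rather than an existing format.

```python
def parse_binds(commit_text):
    """Return {subdirectory: commit_id} taken from the `bind` header lines."""
    binds = {}
    for line in commit_text.split("\n"):
        if line == "":  # a blank line ends the header part
            break
        if line.startswith("bind "):
            _, commit_id, path = line.split(" ", 2)
            binds[path] = commit_id
    return binds


def rebind(commit_text, path, new_commit_id):
    """Return commit_text with the `bind` line for `path` set to new_commit_id."""
    header, sep, body = commit_text.partition("\n\n")
    out = []
    found = False
    for line in header.split("\n"):
        if line.startswith("bind ") and line.split(" ", 2)[2] == path:
            line = "bind %s %s" % (new_commit_id, path)
            found = True
        out.append(line)
    if not found:
        raise KeyError(path)  # nothing is bound at that subdirectory
    return "\n".join(out) + sep + body


# The example commit object from the text above.
EXAMPLE = """\
tree 04803b09c300c8325258ccf2744115acc4c57067
bind 5b2bcc7b2d546c636f79490655b3347acc91d17f linux-2.6/
bind 0bdd79af62e8621359af08f0afca0ce977348ac7 appliance/
author Junio C Hamano 1137965565 -0800
committer Junio C Hamano 1137965565 -0800

Initial toplevel project commit
"""
```

Calling `rebind(EXAMPLE, "linux-2.6/", new_id)` with the ID of the freshly made subproject commit would leave the `appliance/` line untouched, which matches the idea that only the committed subproject moves.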
+ + +Checking out +------------ + +When cloning such a toplevel project, `git clone` without the `-n` +option would check out the working tree. This is done by +reading the tree object recorded in the commit object (which +records the whole thing), and adding the information from the +`bind` lines to the index file. + +------------ +$ cd .. +$ git clone -n combined cloned ;# clone the one we created earlier +$ cd cloned +$ git checkout +------------ + +This round of the proposal does not maintain separate branch heads +for subprojects. The bound commits and their subdirectories +are recorded in the index file from the commit object, so there +is no need to do anything other than updating the index and the +working tree. + + +Switching branches +------------------ + +Along with the traditional two-way merge by `read-tree -m -u`, +we would need to look at: + +. `bind` lines in the current `HEAD` commit. + +. `bind` lines in the commit we are switching to. + +. subproject binding information in the index file. + +to make sure we do sensible things. + +Just as, until very recently, we did not allow switching +branches when a two-way merge would lose local changes, we can +start by refusing to switch branches when the subprojects bound +in the index do not match what is recorded in the `HEAD` commit. + +Because in this round of the proposal we do not use the +`$GIT_DIR/bind` file nor separate branches to keep track of +heads of the subprojects, there is nothing other than the +working tree and the index file that needs to be updated when +switching branches. + + +Merging +------- + +Merging two branches of the toplevel project can use the +traditional merging mechanism mostly unchanged. The merge base +computation can be done using the `parent` ancestry information +taken from the two toplevel project branch heads being merged, +and merging of the whole tree can be done with a three-way merge +of the whole tree using the merge base and two head commits.
+For reasons described later, we would not merge the subproject +parts of the trees during this step, though. + +When the two branch heads use different versions of a subproject, +things get a bit tricky. First, let's forget for a moment about +the case where they bind the same project at different locations. +We would refuse if they do not have the same number of `bind` +lines that bind something at the same subdirectories. + +------------ +$ git merge 'Merge in a side branch' HEAD side +error: the merged heads have subprojects bound at different places. + ours: + linux-2.6/ + appliance/ + theirs: + kernel/ + gadget/ + manual/ +------------ + +Such renaming can be handled by first moving the bind points in +our branch, and redoing the merge (this is a rare operation +anyway). It might go like this: + +------------ +$ git reset +$ git update-index --unbind linux-2.6/ +$ git update-index --unbind appliance/ +$ git update-index --bind $kernel_commit kernel/ +$ git update-index --bind $gadget_commit gadget/ +$ git commit -m 'Prepare for merge with side branch' +$ git merge 'Merge in a side branch' HEAD side +error: the merged heads have subprojects bound at different places. + ours: + kernel/ + gadget/ + theirs: + kernel/ + gadget/ + manual/ +------------ + +Their branch added another subproject, so this did not work (or +it could be the other way around -- we might have been the one +with the `manual/` subproject while they didn't have it). This suggests +that we may want an option to `git merge` to allow taking a +union of subprojects. Again, this is a rare operation, and +always taking a union would have created a toplevel project that +had both `kernel/` and `linux-2.6/` bound to the same Linux +kernel project, possibly of different vintages, so it would be +prudent to require the set of bound subprojects to match exactly +and give the user an option to take a union.
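The check described above can be sketched in Python as follows. This is only an illustration under assumptions: `check_bind_points` is a hypothetical helper, its `union` flag stands in for whatever option `git merge` would grow, and the two dictionaries stand in for the `bind` lines of the two heads (subdirectory mapped to bound commit).

```python
def check_bind_points(ours, theirs, union=False):
    """Compare the subdirectories bound in two toplevel heads.

    `ours` and `theirs` map a subdirectory to the bound subproject commit.
    Returns the sorted bind points of the merge result, or raises
    ValueError when the two sets differ and a union was not requested.
    """
    our_paths, their_paths = set(ours), set(theirs)
    if our_paths != their_paths and not union:
        raise ValueError(
            "the merged heads have subprojects bound at different places")
    # With identical sets this is just that set; otherwise it is the union.
    return sorted(our_paths | their_paths)
```

With our head binding `kernel/` and `gadget/` and theirs also binding `manual/`, the default call refuses, while `union=True` yields all three bind points. Deciding which commit ends up bound at a shared subdirectory is a separate question that this sketch does not touch.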
+ +------------ +$ git merge --union-subprojects 'Merge in a side branch' HEAD side +error: the subproject at 'kernel/' needs to be merged first. +------------ + +Here, the version of the Linux kernel project in the `side` +branch was different from what our branch had on our `bind` +line. On what kind of difference should we give this error? +Initially, I think we could require that one be a fast-forward of +the other (ours might be ahead of theirs, or the other way +around), and take the descendant. + +Or we could do an independent merge of subproject heads, using +the `parent` ancestry of the bound subproject heads to find +their merge-base and doing a three-way merge. This would leave +the merge result in the subproject part of the working tree and +the index. + +[NOTE] +This is the reason we did not do the whole-tree three-way merge +earlier. The subproject commit bound to the merge base commit +used for the toplevel project may not be the merge base between +the subproject commits bound to the two toplevel project +commits. + +So let's first deal with the case of merging only a subproject part into +our tree. + + +Merging subprojects +------------------- + +An operation of more practical importance is to be able to merge +changes made elsewhere into the projects bound to our toplevel +project. + +------------ +$ git pull --subproject=kernel/ git://git.kernel.org/.../linux-2.6/ +------------ + +might do: + +. fetch the current `HEAD` commit from Linus. +. find the subproject commit bound at the `kernel/` subtree. +. perform the usual three-way merge of these two commits, in + `kernel/` part of the working tree. + +After that, `git commit` with the `\--subproject` option would be needed to +make a commit. + +[NOTE] +This suggests that we would need to have something similar to +`MERGE_HEAD` for merging the subproject part.
In the case of +merging two toplevel project commits, we probably can read the +`bind` lines from the `MERGE_HEAD` commit and either our `HEAD` +commit or our index file. Further, we probably would require +that the latter two match, just as we currently require that the +index file match our `HEAD` commit before `git merge`. + +Just like the current `pull = fetch + merge` semantics, the +subproject-aware version `git pull \--subproject=frotz/` would be +a `git fetch \--subproject=frotz/` followed by a `git merge +\--subproject=frotz/`. So the above would be: + +. Fetch the head. ++ +------------ +$ git fetch --subproject=kernel/ git://git.kernel.org/.../linux-2.6/ +------------ ++ +which would fetch the commit chain from the remote repository, and +write something like this to `FETCH_HEAD`: ++ +------------ +3ee68c4...\tfor-merge-into kernel/\tbranch 'master' of git://.../linux-2.6 +------------ + +. Run `git merge`. ++ +------------ +$ git merge --subproject=kernel/ \ + 'Merge git://.../linux-2.6 into kernel/' HEAD 3ee68c4... +------------ + +. In case it does not cleanly automerge, `git merge` would write +the necessary information for a later `git commit` to use in +`MERGE_HEAD`. It may look like this: ++ +------------ +3ee68c4af3fd7228c1be63254b9f884614f9ebb2 kernel/ +------------ ++ +Similarly, the `MERGE_MSG` file will hold the merge message. + +With this, a later invocation of `git commit` to record the +result of hand resolving would be able to notice that: + +. We should be first resolving the `kernel/` subproject, not the + whole thing. +. The remote `HEAD` is `3ee68c4\...` commit. +. The merge message is `Merge git://\.../linux-2.6 into kernel/`. + +and would make a merge commit, and register that resulting +commit in the index file using `update-index \--bind` instead of +updating *any* branch head.
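To make the `MERGE_HEAD` layout above concrete, here is a minimal Python sketch of how a later `git commit` might read it back. `parse_merge_head` is a hypothetical helper, and treating a line that carries only a commit ID as a merge of the whole project is an assumption made for the sketch; the notes do not spell that case out.

```python
def parse_merge_head(text):
    """Parse MERGE_HEAD content into a list of (commit_id, subproject_path).

    subproject_path is None when a line carries only a commit ID, which
    this sketch assumes to mean a merge of the whole project.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        commit_id, _, path = line.partition(" ")
        entries.append((commit_id, path or None))
    return entries
```

Given the example line above, this returns the remote commit together with `kernel/`, which is the information `git commit` needs in order to resolve the subproject first and then register the result with `update-index --bind`.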
+ + +Management of Subprojects +------------------------- + +While the above as a mechanism would support version controlling +of subprojects as a part of *one* larger toplevel project, it +probably is worth pointing out that having a separate repository +to manage the subproject independently would be a good idea. +The same subproject can be incorporated into more than one +toplevel project, and after all, a subproject should be +something that can stand on its own. In our example scenario, +the `kernel/` project is used as a subproject for the "gadget" +product, but at the same time, the organization that runs the +"gadget" project may use Linux on their development machines, +and have their own kernel hackers, not necessarily related to +the use of the kernel in the "gadget" product. + +What this suggests is that not only do we need to be able to pull +the kernel development history *into* the subproject of the +"gadget" project, but we also need to be able to push the +development history of the kernel part alone *out* *of* the +"gadget" project to another repository that deals only with the +kernel part. + +It might go this way. First the setup: + +------------ +$ git clone git://git.kernel.org/.../linux-2.6 Linux +$ ls -dF * +cloned/ combined/ current/ Linux/ +------------ + +That is, in addition to the `combined/` which we have been using +to develop the "gadget" product in, we now have a repository for +the kernel, cloned from Linus. In the previous section, we have +outlined how we update the kernel subproject part of the `combined/` +repository from the `kernel.org` repository. The same procedure +would work for pulling from the `Linux/` repository here. + +We are now going the other way: propagating the kernel work done +in the "gadget" project repository `combined/` back to `Linux/`.
+We might do this at the lowest level: + +------------ +$ cd combined +$ git cat-file commit HEAD | + sed -ne 's|^bind \([0-9a-f]*\) kernel/$|\1|p' >.git/refs/heads/linux26 +$ git push ../Linux linux26:master +------------ + +Or, more realistically, since the `Linux` project might already +have its own commits on its `master`: + +------------ +$ cd Linux +$ git pull ../combined linux26 +------------ + +Either way we would need an easy way to maintain the `linux26` +branch in the above example, and that will have to be part of +the wrapper scripts like `git commit` (more likely, that would +be a job for `git commit \--subproject`) for usability's +sake; in other words, the `cat-file commit` piped to `sed` above +is not something the end user would do, but something that is +done by the wrapper scripts. + +Hopefully the people who work in the `Linux/` repository would run +`format-patch` and feed their changes back to the kernel +community. -- 2.11.0