--- /dev/null
+From: Junio C Hamano <junkio@cox.net>
+Subject: Re: Make "git clone" less of a deathly quiet experience
+Date: Sun, 12 Feb 2006 19:36:41 -0800
+Message-ID: <7v4q3453qu.fsf@assigned-by-dhcp.cox.net>
+References: <Pine.LNX.4.64.0602102018250.3691@g5.osdl.org>
+ <7vwtg2o37c.fsf@assigned-by-dhcp.cox.net>
+ <Pine.LNX.4.64.0602110943170.3691@g5.osdl.org>
+ <1139685031.4183.31.camel@evo.keithp.com> <43EEAEF3.7040202@op5.se>
+ <1139717510.4183.34.camel@evo.keithp.com>
+ <46a038f90602121806jfcaac41tb98b8b4cd4c07c23@mail.gmail.com>
+Content-Type: text/plain; charset=us-ascii
+Cc: Keith Packard <keithp@keithp.com>, Andreas Ericsson <ae@op5.se>,
+ Linus Torvalds <torvalds@osdl.org>,
+ Git Mailing List <git@vger.kernel.org>,
+ Petr Baudis <pasky@suse.cz>
+Return-path: <git-owner@vger.kernel.org>
+In-Reply-To: <46a038f90602121806jfcaac41tb98b8b4cd4c07c23@mail.gmail.com>
+ (Martin Langhoff's message of "Mon, 13 Feb 2006 15:06:42 +1300")
+
+Martin Langhoff <martin.langhoff@gmail.com> writes:
+
+> +1... there should be an easy-to-compute threshold trigger to say --
+> hey, let's quit being smart and send this client the packs we got and
+> get it over with. Or perhaps a client flag so large projects can
+> recommend that users do their initial clone with --gimme-all-packs?
+
+What upload-pack does boils down to:
+
+ * find out the latest of what client has and what client asked.
+
+ * run "rev-list --objects ^client ours" to make a list of
+ objects client needs. The actual command line has multiple
+ "clients" to exclude what is unneeded to be sent, and
+ multiple "ours" to include refs asked. When you are doing
+ a full clone, ^client is empty and ours is essentially
+ --all.
+
+ * feed that output to "pack-objects --stdout" and send out
+ the result.
+
+If you run this command:
+
+ $ git-rev-list --objects --all |
+ git-pack-objects --stdout >/dev/null
+
+It would say some things. The phases of operations are:
+
+ Generating pack...
+ Counting objects XXXX...
+ Done counting XXXX objects.
+ Packing XXXXX objects.....
+
+Phase (1). From the time it says "Generating pack..." up to
+"Done counting XXXX objects.", the time is spent by rev-list
+listing all the objects to be sent out.
+
+Phase (2). After that, it decides which object to delta against
+which other object, while twenty or so dots are printed after
+"Packing XXXXX objects." (see the #git IRC log from a couple of
+days ago; Linus describes how pack building works).
+
+Phase (3). After the dots stop, the program becomes silent.
+That is where it actually does the delta compression and writeout.
+
+You would notice that quite a lot of time is spent in all
+phases.
+
+There is an internal hook to create a full-repository pack inside
+upload-pack (which is what runs on the other end when you run
+fetch-pack or clone-pack), but it works slightly differently
+from what you are suggesting, in that it still tries to do the
+"correct" thing. It still runs "rev-list --objects --all", so
+"dangling objects" are never sent out.
+
+We could cheat in all phases to speed things up, at the expense
+of ending up sending excess objects. So let's pretend we
+decided to treat everything in .git/objects/pack/pack-* (and
+the packs found in alternates as well) as containing objects
+that are interesting to the cloner.
+
+(1) This part unfortunately cannot be totally eliminated. By
+ assuming all packs are interesting, we could use the object
+ names from the pack index, which is a lot cheaper than
+ rev-list object traversal. We still need to run rev-list
+ --objects --all --unpacked to pick up the loose objects,
+ which we cannot find by looking at the pack index.
+
+ This however needs to be done in conjunction with the second
+ phase change. pack-objects depends on the hint rev-list
+ --objects output gives it to group the blobs and trees with
+ the same pathnames together, and that greatly affects the
+ packing efficiency. Unfortunately pack index does not have
+ that information -- it does not know type, nor pathnames.
+ Type is relatively cheap to obtain but pathnames for blob
+ objects are inherently unavailable.
+
+(2) This part can be mostly eliminated for already packed
+ objects, because we have already decided to cheat by sending
+ everything, so we can just reuse how objects are deltified
+ in existing packs. It still needs to be done for loose
+ objects we collected to fill the gap in (1).
+
+(3) This also can be sped up by reusing what is already in
+ packs. The pack index records the starting (but not the
+ ending) offset of each object in the pack, so we can sort by
+ offset to find out which part of the existing pack
+ corresponds to which object, and then reorder the objects in
+ the final pack. This needs to be done somewhat carefully to
+ preserve the locality of objects (again, see #git log). The
+ deltifying and compressing for loose objects cannot be
+ avoided.
+
+ While we are writing things out in (3), we need to keep
+ track of a running SHA1 sum of what we write out so that we
+ can fill in the correct checksum at the end, but I am
+ guessing that is relatively cheap compared to the
+ deltification and compression cost we are currently paying
+ in this phase.
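+
The offset bookkeeping described in (3) could look roughly like the sketch below. It is only an illustration: the object names, offsets, and pack size are invented, the input is a hand-written list rather than a real pack index, and it assumes the pack ends with a 20-byte SHA1 trailer.

```shell
#!/bin/sh
# Sketch only: the names, offsets, and pack size are invented.
# Each input line is "<offset> <object-name>", standing in for what
# a pack index records; the pack is assumed to end with a 20-byte
# SHA1 trailer.
pack_size=1000
extents=$(
	printf '%s\n' '512 cccc' '12 aaaa' '200 bbbb' |
	sort -n |
	awk -v end="$((pack_size - 20))" '
		# An object ends where the next one (by offset) begins.
		NR > 1 { print name, start, $1 - start }
		{ start = $1; name = $2 }
		# The last object runs up to the trailing checksum.
		END { if (NR) print name, start, end - start }
	'
)
# One line per object: name, starting offset, length in bytes.
echo "$extents"
```

Each output line gives the byte range that could be copied verbatim from the existing pack for that object, which is what makes reordering possible without recompressing.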
+
+NB. In the #git log, Linus made it sound like I am clueless
+about how a pack is generated, but if you check commit 9d5ab96,
+the "recency of delta is inherited from base", one of the tricks
+that have a big performance impact, was done by me ;-).
+
+
--- /dev/null
+From: Junio C Hamano <junkio@cox.net>
+Subject: Resetting paths
+Date: Thu, 09 Feb 2006 20:40:15 -0800
+Message-ID: <7vlkwjzv0w.fsf@assigned-by-dhcp.cox.net>
+Content-Type: text/plain; charset=us-ascii
+Return-path: <git-owner@vger.kernel.org>
+
+While working on "assume unchanged" git series, I found one
+thing missing from the current set of tools.
+
+While I worked on the parts of the system that deal with the
+cached lstat() information, I needed a way to debug that, so I
+hacked the ls-files -t option to show entries marked as "always
+matches the index" with lowercase tag letters. This was
+primarily a debugging aid.
+
+Then I committed the whole thing with "git commit -a" by
+mistake. In order to rewind the HEAD to pre-commit state, I can
+say "git reset --soft HEAD^", but after doing that, now I want
+to unupdate the index so that ls-files.c matches the pre-commit
+HEAD.
+
+"git reset --mixed" is a heavy-handed tool for that. It reads
+the entire index from the HEAD commit without touching the
+working tree, so I would need to add the modified paths back
+with "git update-index".
+
+The low-level voodoo to do so for this particular case is this
+one-liner:
+
+ git ls-tree HEAD ls-files.c | git update-index --index-info
+
+Have people found themselves in a similar situation? This
+could take different forms.
+
+ * you did "git update-index" on a wrong path. This is my
+ example and the above voodoo is a recipe for recovery.
+
+ * you did "git add" on a wrong path and you want to remove it.
+ This is easier than the above:
+
+ git update-index --force-remove path
+
+ * you did the above recovery from "git add" on a wrong path,
+ and you want to add it again. The same voodoo would work in
+ this case as well.
+
+ git ls-tree HEAD path | git update-index --index-info
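+
The recovery recipe can be tried out in a scratch repository. The sketch below is an illustration only: the repository location, the file name file.txt, its contents, and the committer identity are all made up, and it uses the "git <command>" spelling of current git releases.

```shell
#!/bin/sh
# Sketch only: the repository, file name, and contents are made up.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
echo one >file.txt
git add file.txt
git -c user.name=Example -c user.email=example@example.com \
	commit -q -m initial

# Stage a change by mistake.
echo two >file.txt
git update-index file.txt

# Put the HEAD version of the path back into the index; the
# working tree file is not touched.
git ls-tree HEAD file.txt | git update-index --index-info

# The index should now match HEAD for this path...
git diff --cached --quiet && staged=clean || staged=dirty
# ...while the working tree keeps the modification.
worktree=$(cat file.txt)
echo "index: $staged, working tree: $worktree"
```

The point of the check at the end is that only the index entry is rewound; the edit in the working tree survives and can be staged again later.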
+
+We could add "git reset path..." to reduce typing for the above,
+but I am wondering if it is worth it.
+
+BTW, this shows how "index centric" git is. With other SCMs that
+have only the last commit and the working tree files, you do not
+have to worry about any of these things, so it might appear that
+the index is just a nuisance. But if you do not have any "registry of
+paths to be committed", you cannot do a partial commit like what
+I did above ("commit changes to all files other than
+ls-files.c") without listing all the paths to be committed, or
+fall back on CVS style "one path at a time", breaking an atomic
+commit, so there is a drawback to not having an index as well.
+
+
+
-e '/^[^\/][^\/]\//p' |
while read topic
do
- rebase= done= not_done= trouble=
+ rebase= done= not_done= trouble= date=
# (1)
only_next_1=`git-rev-list ^master "^$topic" ${next} | sort`
# (2)
not_in_master=`
- git-rev-list --pretty=oneline ^master "$topic" |
- sed -e 's/^[0-9a-f]* //'
+ git-rev-list ^master "$topic"
`
test -z "$not_in_master" &&
done="${LF}Fully merged -- delete."
# (3)
not_in_next=`
- git-rev-list --pretty=oneline ^${next} "$topic" |
- sed -e 's/^[0-9a-f]* / - /'
+ git-rev-list --pretty=oneline ^${next} "$topic"
`
if test -n "$not_in_next"
then
then
trouble="${LF}### MODIFIED AFTER COOKED ###"
fi
+ last=`expr "$not_in_next" : '\([0-9a-f]*\) '`
+ date=`
+ git-rev-list -1 --pretty "$last" |
+ sed -ne 's/^Date: *\(.*\)/ (\1)/p'
+ `
+ not_in_next=`echo "$not_in_next" | sed -e 's/^[0-9a-f]* / - /'`
not_done="${LF}Still not merged in ${next}$rebase.$LF$not_in_next"
elif test -n "$done"
then
not_done="${LF}Up to date."
fi
- echo "*** $topic ***$trouble$done$not_done"
+ echo "*** $topic ***$date$trouble$done$not_done"
if test -z "$trouble$not_done" &&
test -n "$done" &&