to import into git.
For our first example, we're going to start a totally new repository from
-scratch, with no pre-existing files, and we'll call it "git-tutorial".
+scratch, with no pre-existing files, and we'll call it `git-tutorial`.
To start up, create a subdirectory for it, change into that
-subdirectory, and initialize the git infrastructure with "git-init-db":
+subdirectory, and initialize the git infrastructure with `git-init-db`:
- mkdir git-tutorial
- cd git-tutorial
- git-init-db
+------------------------------------------------
+mkdir git-tutorial
+cd git-tutorial
+git-init-db
+------------------------------------------------
to which git will reply
which is just git's way of saying that you haven't been doing anything
strange, and that it will have created a local .git directory setup for
-your new project. You will now have a ".git" directory, and you can
-inspect that with "ls". For your new empty project, ls should show you
+your new project. You will now have a `.git` directory, and you can
+inspect that with `ls`. For your new empty project, it should show you
three entries, among other things:
- - a symlink called HEAD, pointing to "refs/heads/master"
-
- Don't worry about the fact that the file that the HEAD link points to
- doesn't even exist yet - you haven't created the commit that will
- start your HEAD development branch yet.
+ - a symlink called `HEAD`, pointing to `refs/heads/master`
++
+Don't worry about the fact that the file that the `HEAD` link points to
+doesn't even exist yet - you haven't created the commit that will
+start your `HEAD` development branch yet.
- - a subdirectory called "objects", which will contain all the
+ - a subdirectory called `objects`, which will contain all the
objects of your project. You should never have any real reason to
look at the objects directly, but you might want to know that these
- objects are what contains all the real _data_ in your repository.
-
- - a subdirectory called "refs", which contains references to objects.
-
- In particular, the "refs" subdirectory will contain two other
- subdirectories, named "heads" and "tags" respectively. They do
- exactly what their names imply: they contain references to any number
- of different "heads" of development (aka "branches"), and to any
- "tags" that you have created to name specific versions in your
- repository.
-
- One note: the special "master" head is the default branch, which is
- why the .git/HEAD file was created as a symlink to it even if it
- doesn't yet exist. Basically, the HEAD link is supposed to always
- point to the branch you are working on right now, and you always
- start out expecting to work on the "master" branch.
-
- However, this is only a convention, and you can name your branches
- anything you want, and don't have to ever even _have_ a "master"
- branch. A number of the git tools will assume that .git/HEAD is
- valid, though.
-
- [ Implementation note: an "object" is identified by its 160-bit SHA1
- hash, aka "name", and a reference to an object is always the 40-byte
- hex representation of that SHA1 name. The files in the "refs"
- subdirectory are expected to contain these hex references (usually
- with a final '\n' at the end), and you should thus expect to see a
- number of 41-byte files containing these references in this refs
- subdirectories when you actually start populating your tree ]
+ objects are what contain all the real 'data' in your repository.
+
+ - a subdirectory called `refs`, which contains references to objects.
+
+In particular, the `refs` subdirectory will contain two other
+subdirectories, named `heads` and `tags` respectively. They do
+exactly what their names imply: they contain references to any number
+of different 'heads' of development (aka 'branches'), and to any
+'tags' that you have created to name specific versions in your
+repository.
+
+One note: the special `master` head is the default branch, which is
+why the `.git/HEAD` file was created as a symlink to it even if it
+doesn't yet exist. Basically, the `HEAD` link is supposed to always
+point to the branch you are working on right now, and you always
+start out expecting to work on the `master` branch.
+
+However, this is only a convention, and you can name your branches
+anything you want, and don't have to ever even 'have' a `master`
+branch. A number of the git tools will assume that `.git/HEAD` is
+valid, though.
+
+[NOTE]
+An "object" is identified by its 160-bit SHA1 hash, aka "name",
+and a reference to an object is always the 40-byte hex
+representation of that SHA1 name. The files in the "refs"
+subdirectory are expected to contain these hex references
+(usually with a final '\n' at the end), and you should thus
+expect to see a number of 41-byte files containing these
+references in the refs subdirectories when you actually start
+populating your tree.
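To make the 40-byte and 41-byte figures concrete, here is a minimal sketch that computes an object name by hand, without using git at all. The helper `blob_name` is hypothetical (it is not a git command), and it rests on an assumption the text above does not spell out: that the name of a file-contents object is the SHA1 of the header `blob <size>` plus a NUL byte, followed by the contents.

```shell
# Hypothetical helper, not a git command: print the object name git would
# assign to a blob holding the contents of file $1.
# Assumption: the name is the SHA1 of "blob <size>\0" followed by the contents.
blob_name() {
    size=$(wc -c <"$1" | tr -d ' ')
    { printf 'blob %s\000' "$size"; cat "$1"; } |
        if command -v sha1sum >/dev/null 2>&1; then sha1sum; else shasum -a 1; fi |
        cut -d' ' -f1
}

# A scratch file holding the single line "Hello World".
tmp=$(mktemp)
echo "Hello World" >"$tmp"
name=$(blob_name "$tmp")
rm -f "$tmp"

echo "$name"                     # the 40-digit hex name
printf '%s\n' "$name" | wc -c    # size of a ref file holding it: 40 digits plus '\n'
```

If the assumption about the header holds, the printed name should begin with 557db03, the abbreviated object name this tutorial uses for a file with those contents.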
You have now created your first git repository. Of course, since it's
empty, that's not very useful, so let's start populating it with data.
in your git repository. We'll start off with a few bad examples, just to
get a feel for how this works:
- echo "Hello World" >hello
- echo "Silly example" >example
+------------------------------------------------
+echo "Hello World" >hello
+echo "Silly example" >example
+------------------------------------------------
you have now created two files in your working tree (aka "working directory"), but to
actually check in your hard work, you will have to go through two steps:
- commit that index file as an object.
The first step is trivial: when you want to tell git about any changes
-to your working tree, you use the "git-update-cache" program. That
+to your working tree, you use the `git-update-cache` program. That
program normally just takes a list of filenames you want to update, but
to avoid trivial mistakes, it refuses to add new entries to the cache
(or remove existing ones) unless you explicitly tell it that you're
-adding a new entry with the "--add" flag (or removing an entry with the
-"--remove") flag.
+adding a new entry with the `--add` flag (or removing an entry with the
+`--remove` flag).
So to populate the index with the two files you just created, you can do
- git-update-cache --add hello example
+------------------------------------------------
+git-update-cache --add hello example
+------------------------------------------------
and you have now told git to track those two files.
which will print out "Hello World". The object 557db03 is nothing
more than the contents of your file "hello".
-[ Digression: don't confuse that object with the file "hello" itself. The
- object is literally just those specific _contents_ of the file, and
- however much you later change the contents in file "hello", the object we
- just looked at will never change. Objects are immutable. ]
+[NOTE]
+Don't confuse that object with the file "hello" itself. The
+object is literally just those specific _contents_ of the file, and
+however much you later change the contents in file "hello", the object we
+just looked at will never change. Objects are immutable.
-[ Digression #2: the second example demonstrates that you can
- abbreviate the object name to only the first several
- hexadecimal digits in most places. ]
+[NOTE]
+The second example demonstrates that you can
+abbreviate the object name to only the first several
+hexadecimal digits in most places.
Anyway, as we mentioned previously, you normally never actually take a
look at the objects themselves, and typing long 40-character hex
names is not something you'd normally want to do. The above digression
-was just to show that "git-update-cache" did something magical, and
+was just to show that `git-update-cache` did something magical, and
actually saved away the contents of your files into the git object
database.
-Updating the cache did something else too: it created a ".git/index"
+Updating the cache did something else too: it created a `.git/index`
file. This is the index that describes your current working tree, and
something you should be very aware of. Again, you normally never worry
about the index file itself, but you should be aware of the fact that
In particular, let's not even check in the two files into git yet, we'll
start off by adding another line to "hello" first:
- echo "It's a new day for git" >>hello
+------------------------------------------------
+echo "It's a new day for git" >>hello
+------------------------------------------------
and you can now, since you told git about the previous state of "hello", ask
git what has changed in the tree compared to your old index, using the
filenames with their contents (and their permissions), and we're
creating the equivalent of a git "directory" object:
- git-write-tree
+------------------------------------------------
+git-write-tree
+------------------------------------------------
and this will just output the name of the resulting tree, in this case
(if you have done exactly as I've described) it should be
exactly what git-commit-tree spits out, we can do this all with a simple
shell pipeline:
- echo "Initial commit" | git-commit-tree $(git-write-tree) > .git/HEAD
+------------------------------------------------
+echo "Initial commit" | git-commit-tree $(git-write-tree) > .git/HEAD
+------------------------------------------------
which will say:
file to HEAD, doing "git-diff-cache --cached -p HEAD" should thus return
an empty set of differences, and that's exactly what it does.
-[ Digression: "git-diff-cache" really always uses the index for its
- comparisons, and saying that it compares a tree against the working
- tree is thus not strictly accurate. In particular, the list of
- files to compare (the "meta-data") _always_ comes from the index file,
- regardless of whether the --cached flag is used or not. The --cached
- flag really only determines whether the file _contents_ to be compared
- come from the working tree or not.
-
- This is not hard to understand, as soon as you realize that git simply
- never knows (or cares) about files that it is not told about
- explicitly. Git will never go _looking_ for files to compare, it
- expects you to tell it what the files are, and that's what the index
- is there for. ]
+[NOTE]
+"git-diff-cache" really always uses the index for its
+comparisons, and saying that it compares a tree against the working
+tree is thus not strictly accurate. In particular, the list of
+files to compare (the "meta-data") _always_ comes from the index file,
+regardless of whether the --cached flag is used or not. The --cached
+flag really only determines whether the file _contents_ to be compared
+come from the working tree or not.
++
+This is not hard to understand, as soon as you realize that git simply
+never knows (or cares) about files that it is not told about
+explicitly. Git will never go _looking_ for files to compare, it
+expects you to tell it what the files are, and that's what the index
+is there for.
However, our next step is to commit the _change_ we did, and again, to
understand what's going on, keep in mind the difference between "working
work through the index file, so the first thing we need to do is to
update the index cache:
- git-update-cache hello
+------------------------------------------------
+git-update-cache hello
+------------------------------------------------
(note how we didn't need the "--add" flag this time, since git knew
about the file already).
this wasn't an initial commit any more), but you've done that once
already, so let's just use the helpful script this time:
- git commit
+------------------------------------------------
+git commit
+------------------------------------------------
which starts an editor for you to write the commit message and tells you
a bit about what you have done.
and you will see exactly what has changed in the repository over its
short history.
-[ Side note: the "--root" flag is a flag to git-diff-tree to tell it to
- show the initial aka "root" commit too. Normally you'd probably not
- want to see the initial import diff, but since the tutorial project
- was started from scratch and is so small, we use it to make the result
- a bit more interesting. ]
+[NOTE]
+The "--root" flag is a flag to git-diff-tree to tell it to
+show the initial aka "root" commit too. Normally you'd probably not
+want to see the initial import diff, but since the tutorial project
+was started from scratch and is so small, we use it to make the result
+a bit more interesting.
With that, you should now be having some inkling of what git does, and
can explore on your own.
-[ Side note: most likely, you are not directly using the core
- git Plumbing commands, but using Porcelain like Cogito on top
- of it. Cogito works a bit differently and you usually do not
- have to run "git-update-cache" yourself for changed files (you
- do tell underlying git about additions and removals via
- "cg-add" and "cg-rm" commands). Just before you make a commit
- with "cg-commit", Cogito figures out which files you modified,
- and runs "git-update-cache" on them for you. ]
+[NOTE]
+Most likely, you are not directly using the core
+git Plumbing commands, but using Porcelain like Cogito on top
+of it. Cogito works a bit differently and you usually do not
+have to run "git-update-cache" yourself for changed files (you
+do tell underlying git about additions and removals via
+"cg-add" and "cg-rm" commands). Just before you make a commit
+with "cg-commit", Cogito figures out which files you modified,
+and runs "git-update-cache" on them for you.
Tagging a version
it in the ".git/refs/tags/" subdirectory instead of calling it a "head".
So the simplest form of tag involves nothing more than
- git tag my-first-tag
+------------------------------------------------
+git tag my-first-tag
+------------------------------------------------
which just writes the current HEAD into the .git/refs/tags/my-first-tag
file, after which point you can then use this symbolic name for that
working tree, with the local git information hidden in the ".git"
subdirectory. There is nothing else. What you see is what you got.
-[ Side note: you can tell git to split the git internal information from
- the directory that it tracks, but we'll ignore that for now: it's not
- how normal projects work, and it's really only meant for special uses.
- So the mental model of "the git information is always tied directly to
- the working tree that it describes" may not be technically 100%
- accurate, but it's a good model for all normal use ]
+[NOTE]
+You can tell git to split the git internal information from
+the directory that it tracks, but we'll ignore that for now: it's not
+how normal projects work, and it's really only meant for special uses.
+So the mental model of "the git information is always tied directly to
+the working tree that it describes" may not be technically 100%
+accurate, but it's a good model for all normal use.
This has two implications:
made a mistake and want to start all over), you can just do simple
rm -rf git-tutorial
-
- and it will be gone. There's no external repository, and there's no
- history outside the project you created.
++
+and it will be gone. There's no external repository, and there's no
+history outside the project you created.
- if you want to move or duplicate a git repository, you can do so. There
is a "git clone" command, but if all you want to do is just to
create a copy of your repository (with all the full history that
went along with it), you can do so with a regular
"cp -a git-tutorial new-git-tutorial".
-
- Note that when you've moved or copied a git repository, your git index
- file (which caches various information, notably some of the "stat"
- information for the files involved) will likely need to be refreshed.
- So after you do a "cp -a" to create a new copy, you'll want to do
++
+Note that when you've moved or copied a git repository, your git index
+file (which caches various information, notably some of the "stat"
+information for the files involved) will likely need to be refreshed.
+So after you do a "cp -a" to create a new copy, you'll want to do
git-update-cache --refresh
-
- in the new repository to make sure that the index file is up-to-date.
++
+in the new repository to make sure that the index file is up-to-date.
Note that the second point is true even across machines. You can
duplicate a remote git repository with _any_ regular copy mechanism, be it
and in fact a lot of the common git command combinations can be scripted
with the "git xyz" interfaces, and you can learn things by just looking
-at what the git-*-script scripts do ("git reset" is the above two lines
-implemented in "git-reset-script", but some things like "git status" and
-"git commit" are slightly more complex scripts around the basic git
+at what the `git-*-script` scripts do (`git reset` is the above two lines
+implemented in `git-reset-script`, but some things like `git status` and
+`git commit` are slightly more complex scripts around the basic git
commands).
Many (most?) public remote repositories will not contain any of
-the checked out files or even an index file, and will _only_ contain the
+the checked out files or even an index file, and will 'only' contain the
actual core git files. Such a repository usually doesn't even have the
-".git" subdirectory, but has all the git files directly in the
+`.git` subdirectory, but has all the git files directly in the
repository.
To create your own local live copy of such a "raw" git repository, you'd
used earlier, and create a branch in it. You do that by simply just
saying that you want to check out a new branch:
- git checkout -b mybranch
+------------
+git checkout -b mybranch
+------------
will create a new branch based at the current HEAD position, and switch
to it.
-[ Side note: if you make the decision to start your new branch at some
- other point in the history than the current HEAD, you can do so by
- just telling "git checkout" what the base of the checkout would be.
- In other words, if you have an earlier tag or branch, you'd just do
+[NOTE]
+================================================
+If you make the decision to start your new branch at some
+other point in the history than the current HEAD, you can do so by
+just telling "git checkout" what the base of the checkout would be.
+In other words, if you have an earlier tag or branch, you'd just do
git checkout -b mybranch earlier-commit
- and it would create the new branch "mybranch" at the earlier commit,
- and check out the state at that time. ]
+and it would create the new branch "mybranch" at the earlier commit,
+and check out the state at that time.
+================================================
You can always just jump back to your original "master" branch by doing
git branch
-which is nothing more than a simple script around "ls .git/refs/heads".
+which is nothing more than a simple script around `ls .git/refs/heads`.
There will be an asterisk in front of the branch you are currently on.
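As a rough illustration of how little such a script needs to do, the sketch below lists the files under `refs/heads` and marks the one that `HEAD` points to. It is not the real "git branch" script: `list_branches` is a hypothetical function, the directory layout is a mock built in a temporary directory, and it assumes that `HEAD` is a symlink into `refs/heads/` as described earlier in this tutorial.

```shell
# Hypothetical sketch, not the real "git branch" script.
# Assumes .git/HEAD is a symlink into refs/heads/, as described in this tutorial.
list_branches() {
    gitdir=$1
    current=$(readlink "$gitdir/HEAD")
    for ref in "$gitdir"/refs/heads/*; do
        [ -e "$ref" ] || continue          # no branch heads at all
        name=${ref##*/}
        if [ "refs/heads/$name" = "$current" ]; then
            echo "* $name"
        else
            echo "  $name"
        fi
    done
}

# Mock layout with two branch heads; the file contents are placeholders.
demo=$(mktemp -d)
mkdir -p "$demo/.git/refs/heads"
echo "placeholder" >"$demo/.git/refs/heads/master"
echo "placeholder" >"$demo/.git/refs/heads/mybranch"
ln -s refs/heads/master "$demo/.git/HEAD"

branches=$(list_branches "$demo/.git")
rm -rf "$demo"
echo "$branches"
```

With the mock layout above, `master` is the line that gets the asterisk, because that is where the `HEAD` symlink points.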
Sometimes you may wish to create a new branch _without_ actually
being the same as the original "master" branch, let's make sure we're in
that branch, and do some work there.
- git checkout mybranch
- echo "Work, work, work" >>hello
- git commit -m 'Some work.' hello
+------------
+git checkout mybranch
+echo "Work, work, work" >>hello
+git commit -m 'Some work.' hello
+------------
Here, we just added another line to "hello", and we used a shorthand for
both doing a "git-update-cache hello" and "git commit" by just giving the
does some work in the original branch, and simulate that by going back
to the master branch, and editing the same file differently there:
- git checkout master
+------------
+git checkout master
+------------
Here, take a moment to look at the contents of "hello", and notice how they
don't contain the work we just did in "mybranch" - because that work
hasn't happened in the "master" branch at all. Then do
- echo "Play, play, play" >>hello
- echo "Lots of fun" >>example
- git commit -m 'Some fun.' hello example
+------------
+echo "Play, play, play" >>hello
+echo "Lots of fun" >>example
+git commit -m 'Some fun.' hello example
+------------
since the master branch is obviously in a much better mood.
script called "git resolve", which wants to know which branches you want
to resolve and what the merge is all about:
- git resolve HEAD mybranch "Merge work in mybranch"
+------------
+git resolve HEAD mybranch "Merge work in mybranch"
+------------
where the third argument is going to be used as the commit message if
the merge can be resolved automatically.
open "hello" in our editor (whatever that may be), and fix it up somehow.
I'd suggest just making it so that "hello" contains all four lines:
- Hello World
- It's a new day for git
- Play, play, play
- Work, work, work
+------------
+Hello World
+It's a new day for git
+Play, play, play
+Work, work, work
+------------
and once you're happy with your manual merge, just do a
- git commit hello
+------------
+git commit hello
+------------
which will very loudly warn you that you're now committing a merge
(which is correct, so never mind), and you can write a small merge
from the "master" branch, git will know how you merged it, so you'll not
have to do _that_ merge again.
-Another useful tool, especially if you do not work in X-Window
-environment all the time, is "git show-branch".
+Another useful tool, especially if you do not always work in an X-Window
+environment, is "git show-branch".
------------------------------------------------
$ git show-branch master mybranch
Local directory
/path/to/repo.git/
-[ Digression: you could do without using any branches at all, by
- keeping as many local repositories as you would like to have
- branches, and merging between them with "git pull", just like
- you merge between branches. The advantage of this approach is
- that it lets you keep set of files for each "branch" checked
- out and you may find it easier to switch back and forth if you
- juggle multiple lines of development simultaneously. Of
- course, you will pay the price of more disk usage to hold
- multiple working trees, but disk space is cheap these days. ]
-
-[ Digression #2: you could even pull from your own repository by
- giving '.' as <remote-repository> parameter to "git pull". ]
+[NOTE]
+You could do without using any branches at all, by
+keeping as many local repositories as you would like to have
+branches, and merging between them with "git pull", just like
+you merge between branches. The advantage of this approach is
+that it lets you keep a set of files for each "branch" checked
+out and you may find it easier to switch back and forth if you
+juggle multiple lines of development simultaneously. Of
+course, you will pay the price of more disk usage to hold
+multiple working trees, but disk space is cheap these days.
+
+[NOTE]
+You could even pull from your own repository by
+giving '.' as the <remote-repository> parameter to "git pull".
It is likely that you will be pulling from the same remote
repository from time to time. As a shorthand, you can store
Examples.
- (1) git pull linus
- (2) git pull linus tag v0.99.1
- (3) git pull jgarzik/netdev-2.6.git/ e100
+. git pull linus
+. git pull linus tag v0.99.1
+. git pull jgarzik/netdev-2.6.git/ e100
the above are equivalent to:
- (1) git pull http://www.kernel.org/pub/scm/git/git.git/ HEAD
- (2) git pull http://www.kernel.org/pub/scm/git/git.git/ tag v0.99.1
- (3) git pull http://www.kernel.org/pub/.../jgarzik/netdev-2.6.git e100
+. git pull http://www.kernel.org/pub/scm/git/git.git/ HEAD
+. git pull http://www.kernel.org/pub/scm/git/git.git/ tag v0.99.1
+. git pull http://www.kernel.org/pub/.../jgarzik/netdev-2.6.git e100
Publishing your work
update the public repository from it. This is often called
"pushing".
-[ Side note: this public repository could further be mirrored,
- and that is how kernel.org git repositories are done. ]
+[NOTE]
+This public repository could further be mirrored,
+and that is how kernel.org git repositories are done.
Publishing the changes from your local (private) repository to
your remote (public) repository requires a write privilege on
into it later. Obviously, this repository creation needs to be
done only once.
-[ Digression: "git push" uses a pair of programs,
- "git-send-pack" on your local machine, and "git-receive-pack"
- on the remote machine. The communication between the two over
- the network internally uses an SSH connection. ]
+[NOTE]
+"git push" uses a pair of programs,
+"git-send-pack" on your local machine, and "git-receive-pack"
+on the remote machine. The communication between the two over
+the network internally uses an SSH connection.
Your private repository's GIT directory is usually .git, but
your public repository is often named after the project name,
you need to make sure that you have the "git-receive-pack"
program on the $PATH.
-[ Side note: many installations of sshd do not invoke your shell
- as the login shell when you directly run programs; what this
- means is that if your login shell is bash, only .bashrc is
- read and not .bash_profile. As a workaround, make sure
- .bashrc sets up $PATH so that you can run 'git-receive-pack'
- program. ]
+[NOTE]
+Many installations of sshd do not invoke your shell
+as the login shell when you directly run programs; what this
+means is that if your login shell is bash, only .bashrc is
+read and not .bash_profile. As a workaround, make
+sure .bashrc sets up $PATH so that you can run the 'git-receive-pack'
+program.
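One way to apply that workaround is sketched below as a fragment for `.bashrc`. The variable `GIT_BIN` and the location `$HOME/bin` are assumptions made for illustration only; substitute the directory where "git-receive-pack" is actually installed on the remote machine.

```shell
# Fragment for ~/.bashrc. GIT_BIN is a hypothetical location; point it at
# the directory where git-receive-pack is actually installed.
GIT_BIN="$HOME/bin"

# Prepend the directory only if it is not already on $PATH, so that
# sourcing the file repeatedly does not keep growing $PATH.
case ":$PATH:" in
    *":$GIT_BIN:"*) ;;
    *) PATH="$GIT_BIN:$PATH" ;;
esac
export PATH
```

The `case` guard is a design choice rather than a requirement: a plain `PATH="$GIT_BIN:$PATH"` would also work, at the cost of a duplicate entry each time the file is read.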
Your "public repository" is now ready to accept your changes.
Come back to the machine where you have your private repository. From
packed, and stores the packed file in .git/objects/pack
directory.
-[ Side Note: you will see two files, pack-*.pack and pack-*.idx,
- in .git/objects/pack directory. They are closely related to
- each other, and if you ever copy them by hand to a different
- repository for whatever reason, you should make sure you copy
- them together. The former holds all the data from the objects
- in the pack, and the latter holds the index for random
- access. ]
+[NOTE]
+You will see two files, pack-\*.pack and pack-\*.idx,
+in the .git/objects/pack directory. They are closely related to
+each other, and if you ever copy them by hand to a different
+repository for whatever reason, you should make sure you copy
+them together. The former holds all the data from the objects
+in the pack, and the latter holds the index for random
+access.
If you are paranoid, running the "git-verify-pack" command would
detect if you have a corrupt pack, but do not worry too much.
You can try running "find .git/objects -type f" before and after
you run "git prune-packed" if you are curious.
-[ Side Note: "git pull" is slightly cumbersome for HTTP transport,
- as a packed repository may contain relatively few objects in a
- relatively large pack. If you expect many HTTP pulls from your
- public repository you might want to repack & prune often, or
- never. ]
+[NOTE]
+"git pull" is slightly cumbersome for HTTP transport,
+as a packed repository may contain relatively few objects in a
+relatively large pack. If you expect many HTTP pulls from your
+public repository you might want to repack & prune often, or
+never.
If you run "git repack" again at this point, it will say
"Nothing to pack". Once you continue your development and
A recommended workflow for a "project lead" goes like this:
- (1) Prepare your primary repository on your local machine. Your
- work is done there.
+1. Prepare your primary repository on your local machine. Your
+ work is done there.
- (2) Prepare a public repository accessible to others.
+2. Prepare a public repository accessible to others.
- (3) Push into the public repository from your primary
- repository.
+3. Push into the public repository from your primary
+ repository.
- (4) "git repack" the public repository. This establishes a big
- pack that contains the initial set of objects as the
- baseline, and possibly "git prune-packed" if the transport
- used for pulling from your repository supports packed
- repositories.
+4. "git repack" the public repository. This establishes a big
+ pack that contains the initial set of objects as the
+ baseline, and possibly "git prune-packed" if the transport
+ used for pulling from your repository supports packed
+ repositories.
- (5) Keep working in your primary repository. Your changes
- include modifications of your own, patches you receive via
- e-mails, and merges resulting from pulling the "public"
- repositories of your "subsystem maintainers".
+5. Keep working in your primary repository. Your changes
+ include modifications of your own, patches you receive via
+ e-mails, and merges resulting from pulling the "public"
+ repositories of your "subsystem maintainers".
++
+You can repack this private repository whenever you feel like.
- You can repack this private repository whenever you feel
- like.
+6. Push your changes to the public repository, and announce it
+ to the public.
- (6) Push your changes to the public repository, and announce it
- to the public.
-
- (7) Every once in a while, "git repack" the public repository.
- Go back to step (5) and continue working.
+7. Every once in a while, "git repack" the public repository.
+ Go back to step 5 and continue working.
A recommended work cycle for a "subsystem maintainer" who works
on that project and has their own "public repository" goes like this:
- (1) Prepare your work repository, by "git clone" the public
- repository of the "project lead". The URL used for the
- initial cloning is stored in .git/branches/origin.
-
- (2) Prepare a public repository accessible to others.
+1. Prepare your work repository, by "git clone" the public
+ repository of the "project lead". The URL used for the
+ initial cloning is stored in .git/branches/origin.
- (3) Copy over the packed files from "project lead" public
- repository to your public repository by hand; preferrably
- use rsync for that task.
+2. Prepare a public repository accessible to others.
- (4) Push into the public repository from your primary
- repository. Run "git repack", and possibly "git
- prune-packed" if the transport used for pulling from your
- repository supports packed repositories.
+3. Copy over the packed files from "project lead" public
+ repository to your public repository by hand; preferably
+ use rsync for that task.
- (5) Keep working in your primary repository. Your changes
- include modifications of your own, patches you receive via
- e-mails, and merges resulting from pulling the "public"
- repositories of your "project lead" and possibly your
- "sub-subsystem maintainers".
+4. Push into the public repository from your primary
+ repository. Run "git repack", and possibly "git
+ prune-packed" if the transport used for pulling from your
+ repository supports packed repositories.
- You can repack this private repository whenever you feel
- like.
+5. Keep working in your primary repository. Your changes
+ include modifications of your own, patches you receive via
+ e-mails, and merges resulting from pulling the "public"
+ repositories of your "project lead" and possibly your
+ "sub-subsystem maintainers".
++
+You can repack this private repository whenever you feel
+like.
- (6) Push your changes to your public repository, and ask your
- "project lead" and possibly your "sub-subsystem
- maintainers" to pull from it.
+6. Push your changes to your public repository, and ask your
+ "project lead" and possibly your "sub-subsystem
+ maintainers" to pull from it.
- (7) Every once in a while, "git repack" the public repository.
- Go back to step (5) and continue working.
+7. Every once in a while, "git repack" the public repository.
+ Go back to step 5 and continue working.
A recommended work cycle for an "individual developer" who does
not have a "public" repository is somewhat different. It goes
like this:
- (1) Prepare your work repository, by "git clone" the public
- repository of the "project lead" (or a "subsystem
- maintainer", if you work on a subsystem). The URL used for
- the initial cloning is stored in .git/branches/origin.
+1. Prepare your work repository by running "git clone" on the public
+ repository of the "project lead" (or a "subsystem
+ maintainer", if you work on a subsystem). The URL used for
+ the initial cloning is stored in .git/branches/origin.
- (2) Do your work there. Make commits.
+2. Do your work there. Make commits.
- (3) Run "git fetch origin" from the public repository of your
- upstream every once in a while. This does only the first
- half of "git pull" but does not merge. The head of the
- public repository is stored in .git/refs/heads/origin.
+3. Run "git fetch origin" from the public repository of your
+ upstream every once in a while. This does only the first
+ half of "git pull" but does not merge. The head of the
+ public repository is stored in .git/refs/heads/origin.
- (4) Use "git cherry origin" to see which ones of your patches
- were accepted, and/or use "git rebase origin" to port your
- unmerged changes forward to the updated upstream.
+4. Use "git cherry origin" to see which ones of your patches
+ were accepted, and/or use "git rebase origin" to port your
+ unmerged changes forward to the updated upstream.
- (5) Use "git format-patch origin" to prepare patches for e-mail
- submission to your upstream and send it out. Go back to
- step (2) and continue.
+5. Use "git format-patch origin" to prepare patches for e-mail
+ submission to your upstream and send them out. Go back to
+ step 2 and continue.
Working with Others, Shared Repository Style
- "goddamn idiotic truckload of sh*t": when it breaks
This is a stupid (but extremely fast) directory content manager. It
-doesn't do a whole lot, but what it _does_ do is track directory
+doesn't do a whole lot, but what it 'does' do is track directory
contents efficiently.
There are two object abstractions: the "object database", and the
characteristics: they are all deflated with zlib, and have a header
that not only specifies their tag, but also provides size information
about the data in the object. It's worth noting that the SHA1 hash
-that is used to name the object is the hash of the original data.
+that is used to name the object is the hash of the original data
+plus this header, so `sha1sum` 'file' does not match the object name
+for 'file'.
(Historical note: in the dawn of the age of git the hash
-was the sha1 of the _compressed_ object)
+was the sha1 of the 'compressed' object.)
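
As a rough illustration of the point about the header (a sketch, not how git
itself computes the name): assuming the header of a blob is the text
`blob <size>` followed by a NUL byte, and that `sha1sum`, `wc` and `mktemp`
are available, the object name can be reproduced by hand. The helper name
`blob_name` is made up for this example.

```shell
# Compute a blob's object name by hand:
# SHA-1 over the header "blob <size>\0" followed by the file contents.
blob_name() {
    size=$(wc -c <"$1" | tr -d ' ')
    { printf 'blob %s\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}

tmp=$(mktemp)
printf 'hello\n' >"$tmp"

plain=$(sha1sum <"$tmp" | cut -d' ' -f1)   # hash of the contents alone
named=$(blob_name "$tmp")                  # hash of header plus contents

echo "sha1sum of contents: $plain"
echo "object name:         $named"
rm -f "$tmp"
```

The two values differ, which is why running `sha1sum` on a file does not give
you the name of its blob object.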
As a result, the general consistency of an object can always be tested
independently of the contents or the type of the object: all objects can
The structured objects can further have their structure and
connectivity to other objects verified. This is generally done with
-the "git-fsck-cache" program, which generates a full dependency graph
+the `git-fsck-cache` program, which generates a full dependency graph
of all objects, and verifies their internal consistency (in addition
to just verifying their superficial consistency through the hash).
~~~~~~~~~~~
A "blob" object is nothing but a binary blob of data, and doesn't
refer to anything else. There is no signature or any other
-verification of the data, so while the object is consistent (it _is_
+verification of the data, so while the object is consistent (it 'is'
indexed by its sha1 hash, so the data itself is certainly correct), it
has absolutely no other attributes. No name associations, no
permissions. It is purely a blob of data (i.e. normally "file
So you can trust the contents of a tree to be valid, the same way you
can trust the contents of a blob, but you don't know where those
-contents _came_ from.
+contents 'came' from.
Side note on trees: since a "tree" object is a sorted list of
"filename+content", you can create a diff between two trees without
changes need a smarter "diff" implementation.
A tree is created with link:git-write-tree.html[git-write-tree] and
-its data can be accessed by link:git-ls-tree.html[git-ls-tree]
+its data can be accessed by link:git-ls-tree.html[git-ls-tree].
+Two trees can be compared with link:git-diff-tree.html[git-diff-tree].
Commit Object
~~~~~~~~~~~~~
result, for example.
Note on commits: unlike real SCM's, commits do not contain
-rename information or file mode chane information. All of that is
+rename information or file mode change information. All of that is
implicit in the trees involved (the result tree, and the result trees
of the parents), and describing that makes no sense in this idiotic
file manager.
A commit is created with link:git-commit-tree.html[git-commit-tree] and
-its data can be accessed by link:git-cat-file.html[git-cat-file]
+its data can be accessed by link:git-cat-file.html[git-cat-file].
Trust
~~~~~
An aside on the notion of "trust". Trust is really outside the scope
of "git", but it's worth noting a few things. First off, since
-everything is hashed with SHA1, you _can_ trust that an object is
+everything is hashed with SHA1, you 'can' trust that an object is
intact and has not been messed with by external sources. So the name
of an object uniquely identifies a known state - just not a state that
you may want to trust.
way once you have the name of a commit.
So to introduce some real trust in the system, the only thing you need
-to do is to digitally sign just _one_ special note, which includes the
+to do is to digitally sign just 'one' special note, which includes the
name of a top-level commit. Your digital signature shows others
that you trust that commit, and the immutability of the history of
commits tells others that they can trust the whole history.
integrity; the trust framework (and signature provision and
verification) has to come from outside.
-A tag is created with link:git-mktag.html[git-mktag] and
-its data can be accessed by link:git-cat-file.html[git-cat-file]
+A tag is created with link:git-mktag.html[git-mktag],
+its data can be accessed by link:git-cat-file.html[git-cat-file],
+and the signature can be verified by
+link:git-verify-tag-script.html[git-verify-tag].
The "index" aka "Current Directory Cache"
In particular, the index certainly does not need to be consistent with
the current directory contents (in fact, most operations will depend on
-different ways to make the index _not_ be consistent with the directory
+different ways to make the index 'not' be consistent with the directory
hierarchy), but it has three very important attributes:
'(a) it can re-generate the full state it caches (not just the
haven't lost any information as long as you have the name of the tree
that it described.
-At the same time, the directory index is at the same time also the
+At the same time, the index is also the
staging area for creating new trees, and creating a new tree always
involves a controlled modification of the index file. In particular,
the index file can have the representation of an intermediate tree that
To tell git that yes, you really do realize that certain files no
longer exist in the archive, or that new files should be added, you
-should use the "--remove" and "--add" flags respectively.
+should use the `--remove` and `--add` flags respectively.
-NOTE! A "--remove" flag does _not_ mean that subsequent filenames will
+NOTE! A `--remove` flag does 'not' mean that subsequent filenames will
necessarily be removed: if the files still exist in your directory
structure, the index will be updated with their new status, not
-removed. The only thing "--remove" means is that update-cache will be
+removed. The only thing `--remove` means is that update-cache will be
considering a removed file to be a valid thing, and if the file really
does not exist any more, it will update the index accordingly.
-As a special case, you can also do "git-update-cache --refresh", which
+As a special case, you can also do `git-update-cache --refresh`, which
will refresh the "stat" information of each index entry to match the current
-stat information. It will _not_ update the object status itself, and
+stat information. It will 'not' update the object status itself, and
it will only update the fields that are used to quickly test whether
an object still matches its old backing store object.
git-read-tree <sha1 of tree>
and your index file will now be equivalent to the tree that you saved
-earlier. However, that is only your _index_ file: your working
+earlier. However, that is only your 'index' file: your working
directory contents have not been modified.
4) index -> working directory
files. This is not a very common operation, since normally you'd just
keep your files updated, and rather than write to your working
directory, you'd tell the index files about the changes in your
-working directory (i.e. "git-update-cache").
+working directory (i.e. `git-update-cache`).
However, if you decide to jump to a new version, or check out somebody
else's version, or just restore a previous tree, you'd populate your
index file with read-tree, and then you need to check out the result
with
+
git-checkout-cache filename
-or, if you want to check out all of the index, use "-a".
+or, if you want to check out all of the index, use `-a`.
NOTE! git-checkout-cache normally refuses to overwrite old files, so
if you have an old version of the tree already checked out, you will
-need to use the "-f" flag (_before_ the "-a" flag or the filename) to
-_force_ the checkout.
+need to use the `-f` flag ('before' the `-a` flag or the filename) to
+'force' the checkout.
Finally, there are a few odds and ends which are not purely moving
git-commit-tree will return the name of the object that represents
that commit, and you should save it away for later use. Normally,
-you'd commit a new "HEAD" state, and while git doesn't care where you
+you'd commit a new `HEAD` state, and while git doesn't care where you
save the note about that state, in practice we tend to just write the
-result to the file ".git/HEAD", so that we can always see what the
+result to the file `.git/HEAD`, so that we can always see what the
last committed state was.
6) Examining the data
shows the type of the object, and once you have the type (which is
usually implicit in where you find the object), you can use
- git-cat-file blob|tree|commit <objectname>
+ git-cat-file blob|tree|commit|tag <objectname>
to show its contents. NOTE! Trees have binary content, and as a result
there is a special helper for showing that content, called
-"git-ls-tree", which turns the binary content into a more easily
+`git-ls-tree`, which turns the binary content into a more easily
readable form.
It's especially instructive to look at "commit" objects, since those
tend to be small and fairly self-explanatory. In particular, if you
-follow the convention of having the top commit name in ".git/HEAD",
+follow the convention of having the top commit name in `.git/HEAD`,
you can do
git-cat-file commit $(cat .git/HEAD)
Once you know the three trees you are going to merge (the one
"original" tree, aka the common case, and the two "result" trees, aka
the branches you want to merge), you do a "merge" read into the
-index. This will throw away your old index contents, so you should
+index. This will complain if it has to throw away your old index contents, so you should
make sure that you've committed those - in fact you would normally
always do a merge against your last commit (which should thus match
what you have in your current index anyway).
To do the merge, do
- git-read-tree -m <origtree> <target1tree> <target2tree>
+ git-read-tree -m -u <origtree> <yourtree> <targettree>
which will do all trivial merge operations for you directly in the
index file, and you can just write the result out with
-"git-write-tree".
+`git-write-tree`.
+
+Historical note. We did not have the `-u` facility when this
+section was first written, so we used to warn that
+the merge is done in the index file, not in your
+working directory, and your working directory will no longer match your
+index.
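
To make "trivial merge" a little more concrete, here is a simplified sketch of
the per-file decision, assuming the path exists in all three trees and is
identified by its blob name in each. The function name `trivial_merge` and the
placeholder blob names are invented for the example; the real rules in
`git-read-tree` cover more cases (added and removed paths, for instance).

```shell
# trivial_merge ORIG OURS THEIRS
# Prints the blob name to keep, or "unmerged" when a real 3-way merge is needed.
trivial_merge() {
    orig=$1 ours=$2 theirs=$3
    if [ "$ours" = "$theirs" ]; then
        echo "$ours"        # unchanged on both sides, or changed the same way
    elif [ "$orig" = "$ours" ]; then
        echo "$theirs"      # only the other branch changed the file
    elif [ "$orig" = "$theirs" ]; then
        echo "$ours"        # only our branch changed the file
    else
        echo "unmerged"     # both sides changed it, in different ways
    fi
}

trivial_merge aaa aaa aaa
trivial_merge aaa bbb bbb
trivial_merge aaa aaa ccc
trivial_merge aaa bbb ccc
```

Only the last call is left for you (or a merge program) to sort out; the
others can be settled by comparing object names alone.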
-NOTE! Because the merge is done in the index file, and not in your
-working directory, your working directory will no longer match your
-index. You can use "git-checkout-cache -f -a" to make the effect of
-the merge be seen in your working directory.
-NOTE2! Sadly, many merges aren't trivial. If there are files that have
+8) Merging multiple trees, continued
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Sadly, many merges aren't trivial. If there are files that have
been added, moved or removed, or if both branches have modified the
same file, you will be left with an index tree that contains "merge
-entries" in it. Such an index tree can _NOT_ be written out to a tree
+entries" in it. Such an index tree can 'NOT' be written out to a tree
object, and you will have to resolve any such merge clashes using
other tools before you can write out the result.
-
-[ fixme: talk about resolving merges here ]
+You can examine such index state with the `git-ls-files --unmerged`
+command. An example:
+
+------------------------------------------------
+$ git-read-tree -m $orig HEAD $target
+$ git-ls-files --unmerged
+100644 263414f423d0e4d70dae8fe53fa34614ff3e2860 1 hello.c
+100644 06fa6a24256dc7e560efa5687fa84b51f0263c3a 2 hello.c
+100644 cc44c73eb783565da5831b4d820c962954019b69 3 hello.c
+------------------------------------------------
+
+Each line of the `git-ls-files --unmerged` output begins with
+the blob mode bits, blob SHA1, 'stage number', and the
+filename. The 'stage number' is git's way to say which tree it
+came from: stage 1 corresponds to the `$orig` tree, stage 2 to the
+`HEAD` tree, and stage 3 to the `$target` tree.
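
If you want to pick the stages apart in a script, the four fields can be split
with `awk`. This is only a sketch: it works on the sample output above pasted
into a variable (the name `unmerged` is made up here), and it assumes
filenames without whitespace.

```shell
# Sample output copied from the example above. Real output puts a tab
# before the filename; awk's default field splitting handles both.
unmerged='100644 263414f423d0e4d70dae8fe53fa34614ff3e2860 1 hello.c
100644 06fa6a24256dc7e560efa5687fa84b51f0263c3a 2 hello.c
100644 cc44c73eb783565da5831b4d820c962954019b69 3 hello.c'

# Fields: $1 = mode bits, $2 = blob SHA1, $3 = stage number, $4 = filename.
stages=$(printf '%s\n' "$unmerged" | awk '
    BEGIN { tree[1] = "$orig"; tree[2] = "HEAD"; tree[3] = "$target" }
    { printf "%s stage %d (%s): %s\n", $4, $3, tree[$3], $2 }
')
printf '%s\n' "$stages"
```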
+
+Earlier we said that trivial merges are done inside
+`git-read-tree -m`. For example, if the file did not change
+from `$orig` to `HEAD` nor `$target`, or if the file changed
+from `$orig` to `HEAD` and `$orig` to `$target` the same way,
+obviously the final outcome is what is in `HEAD`. What the
+above example shows is that file `hello.c` was changed from
+`$orig` to `HEAD` and `$orig` to `$target` in a different way.
+You could resolve this by running your favorite 3-way merge
+program, e.g. `diff3` or `merge`, on the blob objects from
+these three stages yourself, like this:
+
+------------------------------------------------
+$ git-cat-file blob 263414f... >hello.c~1
+$ git-cat-file blob 06fa6a2... >hello.c~2
+$ git-cat-file blob cc44c73... >hello.c~3
+$ merge hello.c~2 hello.c~1 hello.c~3
+------------------------------------------------
+
+This would leave the merge result in `hello.c~2` file, along
+with conflict markers if there are conflicts. After verifying
+the merge result makes sense, you can tell git what the final
+merge result for this file is by:
+
+ mv -f hello.c~2 hello.c
+ git-update-cache hello.c
+
+When a path is in unmerged state, running `git-update-cache` for
+that path tells git to mark the path resolved.
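
If you do not have `merge` installed, `diff3 -m` can play the same role. The
following sketch runs outside of any repository: it fakes the three stage
files with made-up contents in a temporary directory, so that you can see what
a clean 3-way merge looks like before trying it on real blobs.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

# Stage 1: the common ancestor ($orig)
printf 'one\ntwo\nthree\nfour\nfive\n' >hello.c~1
# Stage 2: our side (HEAD) changed the first line
printf 'ONE\ntwo\nthree\nfour\nfive\n' >hello.c~2
# Stage 3: the other side ($target) changed the last line
printf 'one\ntwo\nthree\nfour\nFIVE\n' >hello.c~3

# diff3 -m MINE OLDER YOURS writes the merged text to stdout and exits
# non-zero if it had to leave conflict markers in the output.
if diff3 -m hello.c~2 hello.c~1 hello.c~3 >hello.c; then
    echo "merged cleanly"
else
    echo "conflicts left in hello.c"
fi
cat hello.c
```

Unlike `merge`, which overwrites its first argument, `diff3 -m` writes to
standard output, so the result goes straight into `hello.c` and the `mv` step
is not needed; in a real repository you would still run
`git-update-cache hello.c` afterwards.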
+
+The above is the description of a git merge at the lowest level,
+to help you understand what conceptually happens under the hood.
+In practice, nobody, not even git itself, uses three `git-cat-file`
+for this. There is a `git-merge-cache` program that extracts the
+stages to temporary files and calls a `merge` script on them:
+
+ git-merge-cache git-merge-one-file-script hello.c
+
+and that is what higher level `git resolve` is implemented with.