The number of Git users keeps growing, for personal projects and in business alike. Having observed a few hundred trainees during Git training sessions, I would say that about 80% of them had already been using Git for some months. And of those 80%, fewer than 1% knew the inner workings of Git.

So much for the numbers, but why get your hands into the plumbing? “I have my list of commands that I use every day, and that’s enough for me.” Except that when things go wrong, it is impossible to understand what happened and why.

Hence the idea of taking a look under the hood of the tool.

Git: everything is an object … well, almost

The organization of Git’s metadata is largely inspired by a file system. Everything relies on objects to store data and metadata. Before going further, let us keep in mind that these objects share a few characteristics:
  • each object is identified by a checksum, in this case a SHA-1, which guarantees its integrity and, in practice, its uniqueness
  • every object is immutable: once created, an object is never modified and (almost) never deleted
  • all of these objects are stored in the local repository, in the .git/objects directory
With these prerequisites in place, let’s move on to the first object on the list: the blob. It is simple: it holds the full content of a file saved in the project at a given point in history. Like every object, it is identified by a SHA-1. The blob does not record the file name. For each file, we can find n blobs, which represent its n recorded versions. In file system jargon, the blob is the closest thing to a file.
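To make the checksum less abstract, here is a minimal Python sketch of how a blob identifier can be computed by hand. It assumes a repository that uses SHA-1 and loose (unpacked) objects; the function names git_object_id and loose_object_path are mine, not part of Git.

```python
import hashlib


def git_object_id(obj_type, payload):
    """Return the SHA-1 that Git assigns to an object of the given type."""
    # Git hashes a small header ("<type> <size>\0") followed by the raw content.
    header = f"{obj_type} {len(payload)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + payload).hexdigest()


def loose_object_path(object_id):
    """Return where a loose (unpacked) object would live in the repository."""
    # The first two hex digits name a subdirectory, the other 38 name the file.
    # (On disk, the file content itself is zlib-compressed.)
    return f".git/objects/{object_id[:2]}/{object_id[2:]}"


empty_blob = git_object_id("blob", b"")
print(empty_blob)
print(loose_object_path(empty_blob))

# Only the content is hashed: two files with the same content share one blob,
# whatever their names are.
print(git_object_id("blob", b"hello world\n"))
```

The file name never enters the hash, which is why the blob cannot refer to it: the name has to be stored somewhere else.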

The second object is the tree. It describes the project’s directory structure and, to continue our comparison with a file system, it is the equivalent of a directory. A tree lists the files and subdirectories it contains: for each entry, its name, its mode (entry type and permission bits) and the SHA-1 of the corresponding object, a blob for a file or another tree for a subdirectory. Timestamps are not stored here; they belong to the commit.
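As an illustration, a tree can be rebuilt by hand in the same way. This is only a sketch under a few assumptions: a SHA-1 repository, entries already sorted the way Git expects, and helper names (tree_entry, tree_id) that I made up for the example.

```python
import hashlib

# Usual modes found in tree entries (no leading zero for directories).
MODE_FILE = "100644"
MODE_EXECUTABLE = "100755"
MODE_DIRECTORY = "40000"


def tree_entry(mode, name, object_id):
    """Serialize one entry: '<mode> <name>', a NUL byte, then 20 raw SHA-1 bytes."""
    return f"{mode} {name}".encode("utf-8") + b"\x00" + bytes.fromhex(object_id)


def tree_id(entries):
    """Hash a tree made of already-serialized, already-sorted entries."""
    body = b"".join(entries)
    header = f"tree {len(body)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + body).hexdigest()


# A tree with a single empty file named README (hypothetical content).
EMPTY_BLOB = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
readme = tree_entry(MODE_FILE, "README", EMPTY_BLOB)
print(len(readme))
print(tree_id([readme]))

# The empty tree has a well-known identifier.
print(tree_id([]))
```

Each entry carries a name, a mode and the identifier of another object, and nothing else.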

Blobs and trees are rarely handled directly in daily Git practice. The third object is the commit. This object does not contain any code itself. It contains the following:
  • the commit message, which you carefully polished of course
  • the identity of the committer and of the author, each with a date: this distinction makes it possible to trace the exact origin of the code, even when the author has no commit rights and someone else applies the change
  • a pointer to a tree: the commit points to a specific tree representing the state of the project at a given time
  • a parent: a pointer to the commit that immediately precedes it (none for the very first commit, several for a merge). This is what makes it possible to build the history of commits.
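The fields listed above appear as plain text inside the commit object. The following sketch parses such a text; the sample commit is entirely hypothetical (names, dates and parent identifier are invented), and parse_commit is my own helper, not a Git API.

```python
def parse_commit(payload):
    """Split the text of a commit object into its headers and its message."""
    # Headers and message are separated by the first blank line.
    head, _, message = payload.partition("\n\n")
    commit = {"tree": None, "parents": [], "author": None,
              "committer": None, "message": message}
    for line in head.split("\n"):
        key, _, value = line.partition(" ")
        if key == "parent":
            # A merge commit has several "parent" lines, a root commit has none.
            commit["parents"].append(value)
        elif key in ("tree", "author", "committer"):
            commit[key] = value
    return commit


# Hypothetical commit content, for illustration only.
PARENT_ID = "a" * 40
sample = (
    "tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904\n"
    f"parent {PARENT_ID}\n"
    "author Alice Example <alice@example.com> 1700000000 +0100\n"
    "committer Bob Example <bob@example.com> 1700000500 +0100\n"
    "\n"
    "Add the training slides\n"
)

commit = parse_commit(sample)
print(commit["tree"])
print(commit["parents"])
print(commit["message"])
```

Following the "parent" values from commit to commit is all it takes to rebuild the history.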

Simple, right? Let’s put it all together, end to end: a commit points to a tree, which itself points to other trees and to blobs.

Git is also references

OK, we have commits, but how do we organize them? Git is based on branches. But what is a branch, in the end? It’s very simple: a branch is a reference to a specific commit, namely the last one made on that branch. This commit is called the head (or tip) of the branch. All of this lives in the .git/refs/heads directory: one file per branch, each containing the SHA-1 of the branch’s head commit. Let’s take an example: my project has two branches.
    # git branch
    dev
    * master
I find these branches again as files in the heads directory:
    # ls .git/refs/heads
    dev master
    # cat .git/refs/heads/master
    3d7bb81994a5eaaf9eca5f12f335a8c0491eca56
    
I check that this SHA-1 really is the latest commit by viewing the history of the master branch:
    # git log --oneline master
    3d7bb81 images utilisées pour les supports git
    bf98d90 Utilisation du nouveau template pour la formation git avancé
    10d16d7 Utilisation du nouveau template de slides pour git en pratique
    aba2d6a Merge branch 'feature1' into 'master'
Everything is there! One last thing: Git still has to record the current branch in order to always know where we are in the project. This information is simply stored in the .git/HEAD file.
    # cat .git/HEAD
    ref: refs/heads/master
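The two lookups shown above (HEAD, then the branch file) can be chained in a few lines of Python. This is a sketch under stated assumptions: it only reads loose ref files and ignores refs that Git may have packed into .git/packed-refs; resolve_head and make_fake_git_dir are hypothetical helpers, and the demo runs against a throwaway directory that merely mimics the layout, using the SHA-1 from the example.

```python
import os
import tempfile


def resolve_head(git_dir):
    """Return (branch_ref, commit_id) for the current position of a repository."""
    with open(os.path.join(git_dir, "HEAD"), encoding="utf-8") as fh:
        head = fh.read().strip()
    if not head.startswith("ref: "):
        # Detached HEAD: the file holds a commit SHA-1 directly.
        return None, head
    ref = head[len("ref: "):]
    # Assumption: the ref exists as a loose file (packed-refs is not handled).
    with open(os.path.join(git_dir, *ref.split("/")), encoding="utf-8") as fh:
        return ref, fh.read().strip()


def make_fake_git_dir(root, branch, commit_id):
    """Create a minimal .git layout for the demo (not a real repository)."""
    git_dir = os.path.join(root, ".git")
    os.makedirs(os.path.join(git_dir, "refs", "heads"))
    with open(os.path.join(git_dir, "HEAD"), "w", encoding="utf-8") as fh:
        fh.write(f"ref: refs/heads/{branch}\n")
    branch_file = os.path.join(git_dir, "refs", "heads", branch)
    with open(branch_file, "w", encoding="utf-8") as fh:
        fh.write(commit_id + "\n")
    return git_dir


with tempfile.TemporaryDirectory() as tmp:
    fake = make_fake_git_dir(tmp, "master",
                             "3d7bb81994a5eaaf9eca5f12f335a8c0491eca56")
    current_ref, current_commit = resolve_head(fake)

print(current_ref)
print(current_commit)
```

Switching branches therefore amounts, on the reference side, to rewriting one line in .git/HEAD.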
That’s it for the plumbing and for this first article on Git. To go further, join one of our training sessions; otherwise, see you in the next article.