
A special hash in Git.

I recently dug into Git's internals and came across a special hash, so I wrote this article to share what I learned.

==============

If you are reading this article, you are probably already comfortable with everyday Git operations. But while using Git, have you ever come across the following hash?

4b825dc642cb6eb9a060e54bf8d69288fbee4904

You might say that every object in Git has a hash, and nobody pays attention to the actual values. Fair enough.

This one, however, really is special, and the rest of this article explains why.

Where does the hash in Git come from?

Git recognizes this hash in every repository that uses the default SHA-1 object format, even an empty one, because knowledge of this object is built into Git itself. You can verify this with git show:

    $ git show 4b825dc642cb6eb9a060e54bf8d69288fbee4904
    tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904

So where does this hash come from? First, a little background: at its core, Git is a simple key-value store. You can insert any kind of content into a Git repository, and Git hands back a unique key that you can use to retrieve that content at any time.

The git hash-object command computes the key for a piece of content, and with the -w option it also stores the object in the repository:

    $ echo 'test' | git hash-object -w --stdin
    9daeafb9864cf43055ae93beb0afd6c7d144bfa4
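
To see where such a key comes from, here is a minimal sketch that recomputes it without Git. It assumes the default SHA-1 object format and that `sha1sum` from GNU coreutils is installed; `git_object_sha1` is a hypothetical helper name, not a Git command. The idea is that Git hashes a short header (`<type> <size>` followed by a NUL byte) together with the raw content:

```shell
# Hypothetical helper (not part of Git): hash stdin the way Git hashes an
# object of the given type, assuming the SHA-1 object format.
git_object_sha1() {
  kind="$1"
  tmp="$(mktemp)"
  cat > "$tmp"                            # buffer stdin so it can be measured
  size="$(wc -c < "$tmp" | tr -d ' ')"    # content length in bytes
  # Header is "<type> <size>" plus a NUL byte, followed by the raw content.
  { printf '%s %s\0' "$kind" "$size"; cat "$tmp"; } | sha1sum | cut -d ' ' -f 1
  rm -f "$tmp"
}

blob_hash="$(printf 'test\n' | git_object_sha1 blob)"
echo "$blob_hash"   # 9daeafb9864cf43055ae93beb0afd6c7d144bfa4
```

The result matches the key printed by git hash-object above, because `echo 'test'` sends the five bytes `test\n` and the header for that content is `blob 5`.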

Internally, the data Git stores looks roughly like the following, where every object is identified by its hash:

[Figure: Git data model, from Pro Git, 2nd edition]

P.S. If you want to know more, the Git Internals chapter of Pro Git explains this in more detail.

Now to the point: how is this special hash generated? It is the hash of an empty tree, that is, a tree object with no entries. You can verify this by hashing empty input, such as /dev/null, as a tree object:

    $ git hash-object -t tree /dev/null
    4b825dc642cb6eb9a060e54bf8d69288fbee4904
    # or
    $ printf '' | git hash-object -t tree --stdin
    4b825dc642cb6eb9a060e54bf8d69288fbee4904
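
The same value can be reproduced without Git at all. This is only a sketch, assuming the default SHA-1 object format and that `sha1sum` from GNU coreutils is available. An empty tree has zero bytes of content, so the only thing that gets hashed is the object header, `tree 0` followed by a NUL byte:

```shell
# Hash only the object header: type "tree", size 0, then a NUL byte.
# There is no content after the header because the tree has no entries.
empty_tree="$(printf 'tree 0\0' | sha1sum | cut -d ' ' -f 1)"
echo "$empty_tree"   # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```

This is why the hash is identical in every SHA-1 repository: it depends only on that fixed header, not on anything stored in the repository.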

Special uses of the empty tree hash

The empty tree hash is handy with git diff. For example, to check the committed contents of a file or directory for whitespace errors, use the --check option and diff the empty tree against HEAD:

    $ echo "test  " > readme.md
    $ git add . && git commit -m "init"
    [master 6d8e897] init
     1 file changed, 1 insertion(+), 3 deletions(-)
    $ git diff $(git hash-object -t tree /dev/null) HEAD --check -- readme.md
    readme.md:1: trailing whitespace.
    +test
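
Dropping the path filter applies the same check to everything in HEAD. The sketch below runs in a throwaway repository created with `mktemp`; the file names and the commit identity are made up for illustration, and it assumes `git` is on the PATH:

```shell
# Throwaway repository so the example does not touch any real project.
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo" || exit 1

printf 'clean line\n' > ok.txt
printf 'trailing spaces  \n' > bad.txt
git add .
# Identity is passed inline so the commit works without global configuration.
git -c user.name='Example' -c user.email='example@example.invalid' \
  commit -q -m 'init'

empty_tree="$(git hash-object -t tree /dev/null)"
# --check exits non-zero when it finds problems, so keep that from aborting.
check_output="$(git diff --check "$empty_tree" HEAD || true)"
echo "$check_output"
```

Because every line in HEAD counts as added relative to the empty tree, the report should mention bad.txt but not ok.txt.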

The empty tree hash is also very useful when writing Git hooks. A fairly common pattern in a pre-commit hook is to validate the staged files before accepting the commit, using code similar to the following:

    for changed_file in $(git diff --cached --name-only --diff-filter=ACM HEAD)
    do
      if ! validate_file "$changed_file"; then
        echo "Aborting commit"
        exit 1
      fi
    done

This works when the repository already has commits, but in a brand-new repository HEAD does not point to any commit yet, so the diff fails. To handle the initial commit, fall back to the empty tree hash:

    if git rev-parse --verify -q HEAD > /dev/null; then
      against=HEAD
    else
      # Initial commit: diff against an empty tree object
      against="$(git hash-object -t tree /dev/null)"
    fi
    
    for changed_file in $(git diff --cached --name-only --diff-filter=ACM "$against")
    do
      if ! validate_file "$changed_file"; then
        echo "Aborting commit"
        exit 1
      fi
    done
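
To try the fallback in isolation, the following sketch sets up a throwaway repository with no commits and lists the staged files. The repository path comes from `mktemp` and the file name is made up for illustration; it assumes `git` is on the PATH:

```shell
# Throwaway repository with no commits yet, so HEAD does not resolve.
repo="$(mktemp -d)"
git init -q "$repo"
cd "$repo" || exit 1

printf 'hello\n' > notes.txt
git add notes.txt

# Same fallback as in the hook: use HEAD if it resolves, else the empty tree.
if git rev-parse --verify -q HEAD > /dev/null; then
  against=HEAD
else
  against="$(git hash-object -t tree /dev/null)"
fi

staged="$(git diff --cached --name-only --diff-filter=ACM "$against")"
echo "against=$against"
echo "staged=$staged"
```

In a fresh repository the else branch is taken, and the staged file shows up as added relative to the empty tree.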

References

https://git-scm.com/book/en

https://floatingoctothorpe.uk/2017/empty-trees-in-git.html


Originally published on personal blog: 方寸之间
