I recently dug into how Git works internally and came across a special hash, so I wrote this article to share what I learned.
==============
If you are reading this article, you are probably familiar with everyday Git operations. But while using Git, have you ever come across the following hash?
4b825dc642cb6eb9a060e54bf8d69288fbee4904
You may think: every object in Git has a hash, so why would anyone pay attention to one particular value? Usually, no one does.
But this hash really is special, and the rest of this article explains why.
Where does the hash in Git come from?
Every Git repository, even an empty one, knows about this hash. You can verify this with git show:
$ git show 4b825dc642cb6eb9a060e54bf8d69288fbee4904
tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
So where does this hash come from? To answer that, we first need to understand a little about Git internals: at its core, Git is a simple key-value database. You can insert any kind of content into a Git repository, and Git hands back a unique key that you can use to retrieve the content at any time.
We can use the git hash-object command to store an object and get its key:
$ echo 'test' | git hash-object -w --stdin
9daeafb9864cf43055ae93beb0afd6c7d144bfa4
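The key is not arbitrary. Git hashes a short header (the object type, a space, the content size in bytes, and a NUL byte) followed by the content itself. The sketch below reproduces the key above without calling Git at all. It assumes the default SHA-1 object format and that sha1sum from GNU coreutils is installed (on macOS, shasum is the usual substitute).

```shell
# 'test\n' is 5 bytes long, so the header is "blob 5" followed by a NUL byte.
key=$(printf 'blob 5\0test\n' | sha1sum | cut -d ' ' -f 1)
echo "$key"
# prints 9daeafb9864cf43055ae93beb0afd6c7d144bfa4
```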
Internally, Git stores every object under its corresponding hash value.
P.S. If you are curious, the Git Internals chapter of Pro Git explains this in more detail.
Now, to the point: how is this special hash generated? It is simply the hash of an empty tree. You can verify this by hashing empty input, such as /dev/null, as a tree object:
$ git hash-object -t tree /dev/null
4b825dc642cb6eb9a060e54bf8d69288fbee4904
# or
$ echo -n '' | git hash-object -t tree --stdin
4b825dc642cb6eb9a060e54bf8d69288fbee4904
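The value also falls straight out of Git's object format, in which the hashed data is a header (type, space, size, NUL byte) followed by the content. A tree with no entries has an empty body, so only the header "tree 0" plus the NUL byte gets hashed. A minimal sketch, assuming the SHA-1 object format and GNU sha1sum rather than any Git command:

```shell
# An empty tree has zero bytes of content, so the whole input is the header.
empty_tree=$(printf 'tree 0\0' | sha1sum | cut -d ' ' -f 1)
echo "$empty_tree"
# prints 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```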
Special uses of the empty tree hash
The empty tree hash can be used with git diff. For example, to check a directory for whitespace errors, you can use the --check option and compare the empty tree with HEAD:
$ echo "test " > readme.md
$ git add . && git commit -m "init"
[master 6d8e897] init
1 file changed, 1 insertion(+), 3 deletions(-)
$ git diff $(git hash-object -t tree /dev/null) HEAD --check -- readme.md
readme.md:1: trailing whitespace.
+test
The empty tree hash is also very useful when writing Git hooks. A fairly common pattern is to validate the staged files before accepting a commit, using code similar to the following:
for changed_file in $(git diff --cached --name-only --diff-filter=ACM HEAD)
do
    if ! validate_file "$changed_file"; then
        echo "Aborting commit"
        exit 1
    fi
done
This works fine when there are previous commits, but in a repository with no commits yet, the HEAD reference cannot be resolved. To solve this problem, you can diff against the empty tree hash when checking the initial commit:
if git rev-parse --verify -q HEAD > /dev/null; then
    against=HEAD
else
    # Initial commit: diff against an empty tree object
    against="$(git hash-object -t tree /dev/null)"
fi
for changed_file in $(git diff --cached --name-only --diff-filter=ACM "$against")
do
    if ! validate_file "$changed_file"; then
        echo "Aborting commit"
        exit 1
    fi
done
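To see the fallback in action, the sketch below stages a file in a brand-new repository and lists it by diffing the index against the empty tree. The temporary directory and the file name are made up for the demonstration, and no commit is ever created.

```shell
# Throwaway repository with one staged file and no commits.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
echo 'hello' > readme.md
git add readme.md

# HEAD cannot be resolved yet, so the else branch is taken.
if git rev-parse --verify -q HEAD > /dev/null; then
    against=HEAD
else
    against=$(git hash-object -t tree /dev/null)
fi

# Compared with the empty tree, every staged file counts as added.
staged=$(git diff --cached --name-only --diff-filter=ACM "$against")
echo "$staged"
# prints readme.md
```

Passing HEAD to git diff --cached at this point would fail instead, because HEAD does not resolve to a commit yet.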
References
https://floatingoctothorpe.uk/2017/empty-trees-in-git.html
Originally published on personal blog: 方寸之间