Git, a brief history of versions
After many years of using Git on personal and professional projects, I have often had the feeling that many people approach this software with friction, fear or even misunderstanding. Today I would like to write about this tool and share some tips to demystify it and make it even more accessible. In this article we will go over the basics: what a commit and a branch are, the notion of an atomic commit and, of course, some tips to be more comfortable with the software.
Did you say a commit? Did you say a branch?
In fact, it’s not very complicated. A commit is simply an object that points to a snapshot of a project (its documents), with these main characteristics:
- id: A unique hash of 40 hexadecimal characters (SHA-1), computed from the commit’s content
- parent: The hash of the previous commit (or previous commits when it’s a merge commit)
- author: The author of the commit, not surprisingly
- message: The description of what the commit contains
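To see these characteristics on a real commit, you can print the raw commit object with git cat-file -p. The sketch below is a minimal example that assumes a POSIX shell with Git installed. It creates a throwaway repository in a temporary directory, and the file name, messages and the Jane Doe identity are made up. The raw object also shows a tree line (the snapshot itself) and a committer line. The id is not stored inside the object: it is the hash of the object’s content.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"      # throwaway repository
git init -q
git config user.name "Jane Doe"
git config user.email "jane@example.com"

echo "Foo" > file.md
git add file.md
git commit -q -m "Add file.md"
echo "Bar" >> file.md
git commit -q -am "Append Bar to file.md"

# Raw commit object: tree (the snapshot), parent, author, committer, then the message
git cat-file -p HEAD

# The commit id: 40 hexadecimal characters in a default (SHA-1) repository
git rev-parse HEAD
```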
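Now that we have seen what a commit is, let’s see what a branch is.
Nothing very complicated here either: a branch is simply a movable pointer to a commit. Because each commit points to its parent, a branch also gives access to the whole succession of commits leading to it, and it moves forward automatically each time you commit on it.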
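A quick way to check that a branch is only a pointer is to resolve branch names to commit ids with git rev-parse. This is a throwaway example with the same assumptions as above: a temporary repository, a hypothetical branch named feature, and empty commits C0 to C2.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch "main"
git config user.name "Jane Doe"
git config user.email "jane@example.com"

git commit -q --allow-empty -m "C0"
git commit -q --allow-empty -m "C1"

# A branch name resolves to exactly one commit id
git rev-parse main                      # same id as HEAD
git branch feature                      # a second pointer on the same commit
git rev-parse feature

git commit -q --allow-empty -m "C2"     # only the checked-out branch (main) moves
git log --oneline --decorate
```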
⚙️ Different states
In the schema above, a document corresponds to a file or folder. The commands shown in this schema to switch from one state to another are the ones I use the most; there are of course other commands that do the same thing.
When you want to know what state your documents are in, you can run the git status command.
The different states your documents can be in are:
- Unmodified: no document has been modified since the last commit
- Working Directory: a document that has been modified is in this state
- Staging Area: after running the git add my_document command, the document is in the staging area
- Local Repository: once the commit has been created with the git commit command, it is in the local repository (the Git repository on the local machine)
- Remote repository: the remote repository is the upstream repository (GitHub, GitLab, etc.), which receives your commits when you run git push
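Here is a minimal sketch that walks a document through these states. It assumes a POSIX shell and Git. A local bare repository, created in a temporary directory, stands in for the remote repository (GitHub, GitLab, etc.). The paths, identity and contents are made up. The comments show what git status --short prints at each step.

```shell
set -eu
base=$(mktemp -d) && cd "$base"
git init -q --bare remote.git            # stands in for GitHub, GitLab, etc.
git init -q work && cd work
git symbolic-ref HEAD refs/heads/main
git config user.name "Jane Doe"
git config user.email "jane@example.com"
git remote add origin ../remote.git

echo "Foo" > my_document
git add my_document && git commit -q -m "Add my_document"
git status --short                       # Unmodified: prints nothing

echo "Bar" >> my_document
git status --short                       # " M my_document": Working Directory

git add my_document
git status --short                       # "M  my_document": Staging Area

git commit -q -m "Update my_document"    # Local Repository
git push -q origin main                  # Remote repository
git status --short                       # clean again
```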
🧨 Atomic commit
We speak of an atomic commit when it respects the following rules:
- Must concern only one subject (a bug fix, a functionality, etc.)
- Must not make the project inconsistent (by failing tests, making it impossible to build, etc.)
- Must have a clear and concise message
But why would we want to make atomic commits? What are they for?
Readability
Let’s be honest: in the example above, it’s quite hard to understand what the commits do. Which features do they relate to?
Once the commits are grouped by feature, it immediately becomes much clearer!
Identification of bugs/regressions
Let’s take the following case: we have a branch with 6 commits on which the tests have not been run since C0, and after running them on C5 we notice that they no longer pass. We therefore need to identify the commit from which they stopped passing. For this, we can use the git bisect command, which is very useful in this situation (it is detailed later in this article). However, our commits are not atomic: some of them break the tests, which are only fixed in the following commit. In that case, git bisect cannot clearly identify the commit that breaks the tests.
If we had atomic commits, and therefore respected the rule “Must not make the project inconsistent (by failing tests, making it impossible to build, etc.)”, the git bisect command would quickly identify the first commit that introduces the bug/regression.
Rollback on a feature
If we need to roll back a feature that is spread over several commits (C2, C3 and C4 in the schema above), undoing it quickly becomes complicated.
With atomic commits, on the other hand, and therefore respecting the rule “Must concern only one subject (a bug fix, a functionality, etc.)”, we would just have to remove commit C2 to remove the feature.
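To give an idea of what this looks like, here is a small sketch in which the feature lives in a single commit. The schema is not reproduced here, so the history is invented: C1 to C3, with C2 being the feature. The sketch uses git revert, which undoes a commit by recording a new commit with the inverse changes. This is one option among others. Dropping C2 with an interactive rebase is another, on a branch that has not been shared.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane@example.com"

echo "core" > app.txt && git add app.txt && git commit -q -m "C1: core"
echo "feature" > feature.txt && git add feature.txt && git commit -q -m "C2: add feature"
echo "more" >> app.txt && git commit -q -am "C3: unrelated change"

# Undo the whole feature by reverting its single atomic commit (C2 = HEAD~1).
# A new commit with the inverse changes is recorded; C3 is left untouched.
git revert --no-edit HEAD~1
ls
```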
📚 Basic commands
Checkout
This command updates the documents in your working directory to match a given version.
Branch
The git checkout command is very useful to switch between branches. What actually happens is that HEAD (Git’s pointer to what is currently checked out) is moved to another branch.
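You can watch HEAD move with git symbolic-ref HEAD, which prints the branch HEAD currently points to. This is a minimal sketch on a throwaway repository, with a hypothetical develop branch created on the same commit as main.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Jane Doe"
git config user.email "jane@example.com"
git commit -q --allow-empty -m "C0"
git branch develop

git symbolic-ref HEAD      # refs/heads/main: HEAD points to the main branch
git checkout -q develop    # git switch develop does the same in Git 2.23 and later
git symbolic-ref HEAD      # refs/heads/develop: only HEAD has moved
```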
Document
This time, the purpose of checkout on a document is to restore the file to the state it had in a previous version. In the schema above, the document file.md gets back the content it had in C1 (“Foo”). The resulting change (the removal of “Bar”) is added to the staging area.
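The same scenario can be reproduced as follows. This is a minimal sketch on a throwaway repository, where HEAD~1 plays the role of C1 and the commit messages are made up. The restored content is written both to the working directory and to the staging area.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane@example.com"

echo "Foo" > file.md && git add file.md && git commit -q -m "C1"
echo "Bar" >> file.md && git commit -q -am "C2"

# Restore file.md as it was in C1 (HEAD~1).
# Newer equivalent (Git 2.23+): git restore --source=HEAD~1 --staged --worktree file.md
git checkout HEAD~1 -- file.md
cat file.md                # Foo
git status --short         # "M  file.md": the removal of "Bar" is staged
```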
Commit and detached HEAD
It is also possible to move HEAD to a previous commit, but we then enter a detached HEAD state. This state often scares developers, but what does it mean?
A detached HEAD only means that HEAD points directly to a commit instead of pointing to a branch. The risk is that if we create new commits from there and then switch to a branch, no branch points to these new commits: they become orphaned, and once nothing references them anymore (not even the reflog), Git’s garbage collection will eventually delete them (see the schema below).
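The usual way to avoid orphaned commits is to give them a branch before leaving the detached HEAD state. Here is a minimal sketch on a throwaway repository. The branch name experiment and the commit messages are hypothetical. git symbolic-ref -q HEAD fails silently when HEAD is detached, which makes the state easy to detect.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Jane Doe"
git config user.email "jane@example.com"
git commit -q --allow-empty -m "C0"
git commit -q --allow-empty -m "C1"

git checkout -q HEAD~1                        # detached HEAD on C0
git symbolic-ref -q HEAD || echo "HEAD is detached"
git commit -q --allow-empty -m "C2 (experiment)"

# Before switching away, point a branch at the new commit so it is not orphaned
git branch experiment
git checkout -q main
git log --oneline experiment
```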
Merge
Fast-forward
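Fast-forward is the simplest case. When the current branch is a direct ancestor of the branch we want to merge, Git does not need to combine anything: it simply moves the pointer of the current branch forward to the last commit of the merged branch. The commits C3, C4 and C5 in the schema above thus become part of the current branch without any new commit being created.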
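A minimal sketch of a fast-forward on a throwaway repository, with invented commit names. The --ff-only option makes git merge refuse to do anything other than a fast-forward, which is a convenient way to check that no merge commit will be created.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Jane Doe"
git config user.email "jane@example.com"
git commit -q --allow-empty -m "C1"
git checkout -q -b develop
git commit -q --allow-empty -m "C2"
git commit -q --allow-empty -m "C3"

git checkout -q main
git merge --ff-only develop    # main simply moves forward to develop's last commit
git log --oneline
```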
Non fast-forward
When main is not a direct ancestor of develop (the two branches have diverged since their common ancestor), Git compares the 3 important commits (the common ancestor, the last commit pointed to by main and the last commit pointed to by develop) and creates a merge commit containing the result of this three-way comparison.
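The sketch below builds two diverging branches in a throwaway repository, with invented files and commit names. git merge-base prints their common ancestor. The resulting merge commit can then be recognised by its two parents.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Jane Doe"
git config user.email "jane@example.com"
echo "base" > a.txt && git add a.txt && git commit -q -m "C1"
git checkout -q -b develop
echo "dev" > b.txt && git add b.txt && git commit -q -m "C2 on develop"
git checkout -q main
echo "main" > c.txt && git add c.txt && git commit -q -m "C3 on main"

git merge-base main develop       # the common ancestor (C1)
git merge --no-edit develop       # three-way merge, creates a merge commit
git rev-list --parents -n 1 HEAD  # merge commit id followed by its two parents
```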
Rebase
Let’s move on to a very interesting command: git rebase. This command does not create a merge commit. Its strategy is to take the commits made since the common ancestor (C4, C5 and C6 in the schema above) and to apply them one by one on top of the last commit of the branch we are rebasing onto (main in the schema above). If a commit contains the same changes as a commit already present on that branch, it is not applied. Put simply, in the schema, commit C5 is not applied because it contains the same changes as commit C3.
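The skipping of C5 can be reproduced on a throwaway repository. In this minimal sketch, the history is invented to mimic the schema. C5 on develop adds exactly the same file as C3 on main, so after git rebase main only C4 and C6 are replayed on top of C3.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name "Jane Doe"
git config user.email "jane@example.com"
echo "base" > base.txt && git add base.txt && git commit -q -m "C1"
git checkout -q -b develop
echo "four" > four.txt && git add four.txt && git commit -q -m "C4"
echo "fix" > fix.txt && git add fix.txt && git commit -q -m "C5 (same change as C3)"
echo "six" > six.txt && git add six.txt && git commit -q -m "C6"
git checkout -q main
echo "fix" > fix.txt && git add fix.txt && git commit -q -m "C3"

git checkout -q develop
git rebase main       # replays C4 and C6 on top of C3; C5 is skipped
git log --oneline
```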
🔑 Tips and tricks
Bisect
This command is extremely useful when we need to find the commit that is causing a bug or regression. As its name suggests, git bisect uses a bisection method (a binary search through the history) to find the origin.
Let’s imagine that our unit tests no longer pass on commit C5, and we know that they passed on commit C1. Our tests are run by the tests.sh script, which returns 0 on success and 1 otherwise.
git bisect start C5 C1
git bisect run ./tests.sh
git bisect reset # quit the bisect session
The steps performed by git bisect are as follows:
- Determination of the interval: [C1, C5]
- Determination of the intermediate commit: C3
- Test on C3: the tests do not pass, so the search starts again with the new interval [C1, C3]
- Determination of the intermediate commit: C2
- Test on C2: the tests pass, so C3 is the commit that introduces the regression
At the end, git bisect gives you the hash of the commit that introduces the regression (C3 in our example). You can now leave the session with git bisect reset and fix this regression, with an interactive rebase for example!
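To try the bisect commands above without a real project, here is a minimal sketch that fabricates a five-commit history in a temporary repository. The tests.sh script and the status.txt file are hypothetical stand-ins for a real test suite: the script exits with 0 as long as status.txt contains pass, and commit C3 is deliberately the one that breaks it. HEAD and HEAD~4 play the roles of C5 and C1.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane@example.com"

# Hypothetical test script: exits 0 while status.txt says "pass", 1 otherwise
cat > tests.sh <<'EOF'
#!/bin/sh
grep -qx pass status.txt
EOF
chmod +x tests.sh
echo pass > status.txt
git add tests.sh status.txt && git commit -q -m "C1"

for i in 2 3 4 5; do
  if [ "$i" -eq 3 ]; then echo fail > status.txt; fi   # C3 introduces the regression
  echo "$i" > counter.txt
  git add . && git commit -q -m "C$i"
done

git bisect start HEAD HEAD~4     # bad = C5, good = C1
git bisect run ./tests.sh        # reports "<hash> is the first bad commit"
first_bad=$(git rev-parse refs/bisect/bad)
git bisect reset                 # quit the bisect session
git log -1 --format=%s "$first_bad"
```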
Interactive rebase
You know the git rebase command, but do you know its wonderful --interactive option (or -i for friends)? It allows you to rewrite history by modifying, merging, reordering or dropping your commits, and much more.
If you want to rework your last 3 commits, you can run the following command:
git rebase -i HEAD~3
Git opens an editor with the list of commits and the actions you can apply to each of them. For example, in the following todo list, we modify the message of the commit “Added store exporter endpoint” and merge the commit “Fix typo” into the commit “Added brand exporter endpoint”.
r 9ceb668 Added store exporter endpoint
pick 3efb63f Added brand exporter endpoint
f d83dbbc Fix typo

# Commands:
# p, pick <commit> = use commit
# r, reword <commit> = use commit, but edit the commit message
# e, edit <commit> = use commit, but stop for amending
# s, squash <commit> = use commit, but meld into previous commit
# f, fixup <commit> = like "squash", but discard this commit's log message
# x, exec <command> = run command (the rest of the line) using shell
# d, drop <commit> = remove commit
# l, label <label> = label current HEAD with a name
# t, reset <label> = reset HEAD to a label
# m, merge [-C <commit> | -c <commit>] <label> [# <oneline>]
# . create a merge commit using the original merge commit's
# . message (or the oneline, if no original merge commit was
# . specified). Use -c <commit> to reword the commit message.
Once the rebase is done, the commit has its new message and the changes made by “Fix typo” are included in the commit “Added brand exporter endpoint” (note that the commit hashes have changed, since these are new commits).
⚠️ This command must be used with caution: it is often a bad idea to use it on branches that have already been merged or shared, or on protected or common branches such as main and develop ⚠️
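Interactive rebase normally opens your editor, which makes it hard to show in a script. Git lets you replace the editor used for the todo list through the GIT_SEQUENCE_EDITOR environment variable. The sketch below uses sed to turn the third pick into fixup, reproducing the “Fix typo” part of the example above on a fabricated history. The files and messages are invented. The reword step is left out because it would open a second editor for the message. The sketch assumes a sed that accepts -i.bak, as both GNU and BSD sed do.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane@example.com"
git commit -q --allow-empty -m "Initial commit"
echo "store" > store.txt && git add store.txt && git commit -q -m "Added store exporter endpoint"
echo "brnad" > brand.txt && git add brand.txt && git commit -q -m "Added brand exporter endpoint"
echo "brand" > brand.txt && git commit -q -am "Fix typo"

# The todo list lists the last 3 commits, oldest first; line 3 is "pick <hash> Fix typo".
# Replacing "pick" with "fixup" melds it into the previous commit and discards its message.
GIT_SEQUENCE_EDITOR="sed -i.bak '3s/^pick/fixup/'" git rebase -i HEAD~3
git log --format=%s
```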
Reflog
Before I knew this command, I was afraid of breaking everything in Git during my rebases, merges and other operations. I didn’t realize that it was almost always possible to go back, because Git records where HEAD and each branch pointed after every action. This history can be browsed with the git reflog command.
Let’s imagine that you have just finished an interactive rebase (now that you know how to use it) and, on second thought, you need to return to the state before it. Don’t panic: you just have to recover the hash of the commit HEAD pointed to before the rebase.
In the schema above, we can see all the steps performed and retrieve the commit from before the interactive rebase (18a45dd). We just have to run the following command to go back to this commit:
git reset --hard 18a45dd
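The trick is done! Magic happened! Just keep in mind that git reset --hard also throws away any uncommitted changes in your working directory, and those are not recorded in the reflog.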
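Here is a minimal sketch of the same recovery on a throwaway repository. To keep it short, the operation we undo is a git reset --hard rather than an interactive rebase, but the recovery works the same way: find the previous position in the reflog and reset to it. HEAD@{1} is the reflog syntax for where HEAD was one step before. The commit names are made up.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane@example.com"
git commit -q --allow-empty -m "C1"
git commit -q --allow-empty -m "C2"
git commit -q --allow-empty -m "C3"
before=$(git rev-parse HEAD)

git reset -q --hard HEAD~2       # "oops": C2 and C3 disappear from the branch
git reflog -n 3                  # HEAD@{1} is where HEAD was before the reset

git reset -q --hard 'HEAD@{1}'   # go back to the state before the reset
git log --format=%s
```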
Useful commands
To finish this article, I would like to present 3 useful commands that I use every day.
Reset last commit
git reset --soft HEAD^
This command, which I call “reset last commit”, undoes the last commit and keeps its changes in the staging area. Very useful when a commit went out too quickly!
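A minimal sketch on a throwaway repository, with a made-up notes.md file committed too early. After the soft reset, the commit is gone but its content is still staged, ready to be committed again.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane@example.com"
git commit -q --allow-empty -m "C1"
echo "draft" > notes.md && git add notes.md && git commit -q -m "WIP committed too quickly"

git reset --soft HEAD^    # undo the last commit, keep its changes staged
git status --short        # "A  notes.md"
git log --format=%s       # only C1 remains
```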
Add changes to the last commit
git commit --amend --no-edit
This command adds the content of the staging area to the last commit without changing its message. Useful when you forgot some changes in the last commit!
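A minimal sketch on a throwaway repository, where b.txt plays the role of the forgotten change. Note that --amend replaces the last commit with a new one (with a new hash), so the same caution as for interactive rebase applies to commits that have already been pushed.

```shell
set -eu
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Jane Doe"
git config user.email "jane@example.com"
echo "a" > a.txt && git add a.txt && git commit -q -m "Add a and b"
echo "b" > b.txt && git add b.txt        # the file we forgot

git commit -q --amend --no-edit          # same message, now including b.txt
git show --stat --format=%s HEAD
```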
Keep the control when you stage your documents
git add -p
This option is awesome! Git goes through each modified block (hunk) and asks you whether to add it to the staging area or not! It is simply essential when you want to stage only certain parts of a file.
Conclusion
Git is a powerful piece of software that you should not be afraid to play with, since you can almost always go back thanks to the reflog. That’s why you should not hesitate to craft nice atomic commits by playing with git rebase -i or git reset --soft/--hard. You can even quickly find the origin of a bug or regression with bisect!
I hope that you come out of this article knowing more about Git, and that you will no longer be afraid of breaking everything with a tool you use every day!
Thank you for reading my first article!