Version Control and Git
Objectives📍
What Version Control is and where it comes from
Why it is useful
The basic principles of Git:
the Git database
structure of the Git repository
the staging area
commits
Once upon a time…
… in the mystical kingdom of Academia, there lived a brilliant group of young scholars pursuing their Ph.D. in the magical realm of psychology. As they delved into the enchanted world of research and code development, they often found themselves tangled in the perplexing web of version confusion and collaborative chaos.
In this realm, the wise elders spoke of a powerful tool known as Git, a magical version control system that held the key to unlocking the secrets of seamless collaboration and code management. Legends were told of its ability to weave a protective spell around the precious codes, ensuring they remained untangled and unbroken, even in the face of the most challenging experiments.
Our young scholars, burdened with the weight of multiple revisions and the fear of losing their hard work, sought the guidance of a Git wizard. With Git as their trusted guide, the scholars discovered the joy of branching, a magical ability to create parallel universes for their code. No longer bound by the fear of experimenting with new ideas, they joyfully explored uncharted territories, knowing that the original code would remain unscathed.
Merge conflicts, once dreaded like dark curses, became mere bumps in the road as Git provided a harmonious way to reconcile differences between conflicting versions. The scholars danced through these challenges, emerging victorious with a united and cohesive codebase.
As the magical incantation “git push” echoed through the corridors of academia, the students marveled at Git’s power to teleport their creations to a safe and centralized repository, accessible to all who sought the wisdom within. No longer did they fear the loss of their precious work, for Git had cast a protective shield around their creations.
In the collaborative tapestry of research, Git emerged as the invisible thread that bound the efforts of each scholar into a magnificent masterpiece. Through pull requests and code reviews, they engaged in a cooperative dance, refining and enhancing each other’s contributions with grace and precision.
And so, with Git as their steadfast companion, the Ph.D. students embarked on a journey of discovery and innovation, unfazed by the challenges that once cast shadows upon their path. The enchanted land of Academia rejoiced as the scholars harnessed the magical powers of Git, transforming their code development process into a tale of seamless collaboration and boundless possibilities.
And thus, in this mystical realm, the legend of Git lived on as a timeless testament to the wondrous magic it bestowed upon those who dared to embrace its mystical embrace.[1]
Where Version Control Comes from
Version control originally comes from software development. Of course, we develop software too, such as our experiment or data analysis code. However, we rarely develop our code in a team of hundreds of people. When developing an app, even a very simple gaming app, many people are involved in active development at the same time. The source code for even a very simple gaming app can easily have 50,000 lines of code, whereas the code for analysing our research data likely has far fewer. The code for the gaming app also consists of different components (modules), so the development process is usually divided along those modules, meaning that one person is responsible for one module. In the end, those modules have to fit together smoothly. Apps also usually come in different versions: we have all heard of alpha, beta, or release versions of apps.
So professional software developers were confronted with several challenges:
keep track of what each person is changing in the code
incorporate changes without breaking another module
have multiple streams of work for different versions
etc.
Sometimes Git may feel too fine-grained for your way of working, because it makes you proceed in small steps. There might be steps in the Git workflow that you consider unnecessary for your own needs. In that case, it helps to remember where version control comes from to understand why the version control system works the way it does.
Centralization
Another key challenge in software development is enabling collaborative code development independent of time and space, i.e., centralization. Git on its own only provides you with local version control: it is software that runs on your computer, and no one else has access to your work (=decentralized). For collaborative code development, all developers need access to the same project and must be able to see all of the project's changes and history (=centralized).
Collaborative code development dependent on time and space → collaborative code development independent of time and space
For this, collaborative coding platforms such as GitHub or GitLab are key. This also touches on a common misconception: Git is NOT the same as GitHub/GitLab. In principle, they are independent of each other. You can use Git on your local machine without using GitHub/GitLab, and you can use GitHub/GitLab without using Git on your local machine. However, the combination of Git and GitHub/GitLab gives you the best of both worlds: you can track everything you're doing on your local machine even without internet access, and you can share everything you did with your collaborators with just one command.
| Git | GitHub/GitLab |
| --- | --- |
| software on your computer | collaboration platforms based on the Git software |
| decentralized | centralized |
| local version control | shared, remote version control |
Benefits of Version Control
Traceability
Track changes
Annotate each change with a message (=explain what and why), which gives you a complete annotated history of the code
Multiple streams of work (or version sets) that are independent of each other, while still being able to merge the work back together
For collaborative work this means:
keeps all team members on the same page
makes sure that all team members can work simultaneously on the same project
makes sure that everybody is working on the latest version of the project
The basic principle of Git
As a version control system, what Git does is simply store different versions of your project for you. Git itself is software that provides you with a database in which the different versions of your project are stored. All of these versions can be looked at and retrieved at any time, using the appropriate commands. So, what is important for working with Git is knowing how to communicate with it, i.e., knowing the language of Git (which we will learn during this course!).

Fig. 10 The basic principle of Git. Local folder vs. Git version database.
How the versioning works
“Ok, so how exactly does Git store different versions of my project?”
Well, every time you want to save the current state of your project (=one version), Git takes a “snapshot” of what your folder currently looks like (=commit a version). The reference to this snapshot (=commit hash) is then saved in your commit history, and with this commit hash you can find this version in your version database, a.k.a. the Git repository. Committing a version means saving a snapshot of what all your files currently look like.
“Hello git, please save the current state of the project”
making a commit = Git takes a snapshot of the current state of your project and saves it in the Git repository
commits are standalone versions of the project
for every commit, Git creates a hash which looks like this: 7c35a3ce607a14953f070f0f83b5d74c2296ef93
all hashes can be found in the commit history and can be used to look at or retrieve an earlier version
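To make the idea of a hash a little more concrete, here is a minimal Python sketch of how Git derives such an identifier, assuming Git's default SHA-1 object format: the object's type and size are put in front of the content as a header, and the result is hashed. The function name `git_object_hash` is made up for this illustration; Git does all of this for you internally.

```python
import hashlib


def git_object_hash(kind: str, body: bytes) -> str:
    """Return the SHA-1 hex digest Git would assign to an object.

    Assumes Git's default SHA-1 object format: the hashed data is
    "<kind> <size in bytes>", then a NUL byte, then the raw body.
    """
    header = f"{kind} {len(body)}".encode() + b"\0"
    return hashlib.sha1(header + body).hexdigest()


# The content of a file is stored as a "blob" object.
blob_hash = git_object_hash("blob", b"hello world\n")
print(blob_hash)       # a string of hexadecimal characters
print(len(blob_hash))  # 40
```

A commit is hashed in the same way, with the kind `commit` and a body that records the snapshot, the parent commit(s), the author and the commit message. This is why changing any of these gives a different hash.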
“Where is this version database?”
When you ask Git to version control your files (=initialize Git), it will create a hidden folder inside your local folder. This hidden folder (called .git) is your version database.

Fig. 11 After initializing Git in your project folder, a hidden .git folder is placed inside your project folder. Inside this hidden .git folder the different versions of the project are stored.
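If you would like to see the hidden folder appear, the following Python sketch runs `git init` in a throwaway folder and lists the hidden entries afterwards. It assumes Git is installed and available on your PATH; the helper name `init_and_list_hidden` and the file name `experiment.py` are made up for this illustration.

```python
import shutil
import subprocess
import tempfile
from pathlib import Path


def init_and_list_hidden(folder: Path) -> list:
    """Run `git init` in `folder` and return the names of its hidden entries."""
    subprocess.run(["git", "init", "--quiet", str(folder)], check=True)
    return sorted(p.name for p in folder.iterdir() if p.name.startswith("."))


if shutil.which("git") is None:
    print("Git is not installed, so there is nothing to demonstrate.")
else:
    # A temporary folder stands in for your project folder.
    with tempfile.TemporaryDirectory() as tmp:
        project = Path(tmp)
        (project / "experiment.py").write_text("# experiment code\n")
        print(init_and_list_hidden(project))
```

If Git is available, the printed list should contain `.git`, while `experiment.py` stays exactly where it was.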
“What does “different versions” of my project even mean? How do I know when to commit a version and when not?”
First of all: The expression “committing a version” actually should be “committing changes”. Because that’s what you do: You start with a first version of your project and over time you change things compared to previous versions. So, what you commit in the end are actually the changes to the project.
Second: For the decision on when to make a commit, it is important to know that every time you make a commit you are asked to write a short description of what this commit is about (=commit message). So, what you should ask yourself before making a commit is: “Is the difference between my current version and the previous version worth being saved as a standalone version of my project?”.
The commit message you give is bound to your commit, i.e., the commit message should state what you changed. Only then will you be able to figure out which commit you need to go back to if you’re looking for a specific version of your project.
Another practice about commits is to commit changes that belong together. Namely, if you changed something in file1 and something in file2 and those changes are logically related, you should commit them together. If changes you made are not logically related, you should commit them separately.
Example for logically related changes
We are working on our choice_rtt project. We make a change to the last committed version by coding a new stimulus shape. We insert this new stimulus shape in our experiment code. The changes happen in different files, yet the changes are logically related (make new shape - use new shape). We should commit those changes together, meaning after we changed both files.
Example for NOT logically related changes
We are working on our choice_rtt project. We make a change to the last committed version by coding a new stimulus shape. We also make a change in our manuscript (which is awesome, btw), changing the affiliation of one of the co-authors. Those changes are not logically related. You should not put them in the same commit.
The staging area
“But what if I worked on multiple different things since the last committed version and now I’m ready to commit my changes? Will I not have to commit all files together then?”
Luckily, no! Git has something called the staging area, which solves this problem. Staging your files is basically giving them a valid ticket for a ride to the version database. Only the files with a valid ticket for the next train (= the files you staged) can get onto this train and take a ride to the version database (= commit to the version database). Every other file is left behind.
In the end, the basic Git workflow looks like this:

Fig. 12 Basic Git workflow. Every change you want to commit needs to be staged first. Only staged changes will be committed in a new version.
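The train ticket picture can also be written down as a small toy model in Python. This is a deliberately simplified sketch and not how Git is implemented: the class `ToyRepo`, its methods and the file names are all made up for this illustration. It only shows the rule that staged changes end up in the next commit and everything else is left behind.

```python
class ToyRepo:
    """A deliberately simplified stand-in for a Git repository."""

    def __init__(self):
        self.working = {}  # files in your project folder: name -> content
        self.staged = {}   # files holding a "ticket" for the next commit
        self.history = []  # committed versions: (message, snapshot)

    def edit(self, name, content):
        """Change a file in the project folder."""
        self.working[name] = content

    def stage(self, name):
        """Give one file a ticket for the next commit."""
        self.staged[name] = self.working[name]

    def commit(self, message):
        """Save a new version: the last version plus the staged changes."""
        if not self.staged:
            raise ValueError("nothing staged, nothing to commit")
        last = self.history[-1][1] if self.history else {}
        snapshot = {**last, **self.staged}
        self.history.append((message, snapshot))
        self.staged = {}  # every ticket has been used
        return snapshot


repo = ToyRepo()
repo.edit("shape.py", "new stimulus shape")
repo.edit("experiment.py", "uses the new shape")
repo.edit("manuscript.md", "new affiliation")

# Only the two logically related files get a ticket.
repo.stage("shape.py")
repo.stage("experiment.py")
snapshot = repo.commit("Add new stimulus shape and use it in the experiment")

print(sorted(snapshot))             # ['experiment.py', 'shape.py']
print("manuscript.md" in snapshot)  # False
```

The manuscript change is still sitting in the project folder and can get its own ticket, and its own commit message, afterwards.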
My staged changes being excited to take the train to version database:
My staged changes showing commitment and getting on board:
My unstaged changes watching the staged changes take off:
My committed changes arriving at version database: