Continuous Integration#
Continuous integration (CI), in the way we want to use it, essentially means automatically running the tests you wrote previously. In a wider sense, CI is the practice of frequently (usually multiple times per day) integrating the changes that individuals make to a project into a main, shared version. This matters because, if you work collaboratively on GitLab (or GitHub), you probably work on your own branch or on a feature branch, but eventually all work comes together in the main branch. The main branch is also the branch from which the “software package” will be built in the end. So it is very important that everything coming together in the main branch works well, both individually (checked with unit tests) and together (checked with integration tests).
CI software is also typically used to identify any conflicts and bugs that are introduced by changes, so they are found and fixed early, minimising the effort required to do so. Running tests regularly also saves humans from needing to do it manually.
CI on GitLab#
GitLab is only one of many providers that let you automate your tests whenever you push a new version to the remote repository. As GitLab is a collaborative code platform, it does this for every developer whenever they push to a branch that contains a pipeline definition, so you can use CI to test the main branch as well as other branches. The way this is done on GitLab is through a .gitlab-ci.yml file.
In this .gitlab-ci.yml file you specify a pipeline, which is run on every push. Each branch can hold only one .gitlab-ci.yml file. So, every test you want to run when pushing to the main branch needs to be specified in the version of .gitlab-ci.yml on the main branch. If you also want, e.g., unit tests to run when pushing to your own branch in the collaborative repo, you can specify this in the version of .gitlab-ci.yml on your branch (and so on).
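As a minimal sketch of what the version of the file on a personal branch might contain, the following pipeline defines a single job that runs the unit tests on every push to that branch. The job name unit_tests and the tests/unit directory are assumptions made for illustration only; the image tag and requirements.txt are taken from the example pipeline further down.

```yaml
# .gitlab-ci.yml on a feature branch (illustrative sketch)
unit_tests:                # hypothetical job name
  image: python:3.10.6     # gives the runner a Python environment
  script:
    - pip install -U -r requirements.txt   # install the project dependencies
    - pip install pytest
    - pytest tests/unit    # hypothetical path to the unit tests
```

Because no stage is given, GitLab places the job in its default test stage.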
Requirements for a CI pipeline on GitLab#
A functioning CI pipeline consists of multiple components. Basically, you can imagine that you are asking an empty machine to execute your test scripts. So what you need to do is
install the software on the machine that it needs to do what you want it to do
tell it what to do
tell it how to do it
and tell it when to do it
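One way to read these four steps against a single job definition is sketched below. The job name check_code is hypothetical, and the mapping in the comments is a simplification rather than official GitLab terminology; the keywords themselves are explained in the following sections.

```yaml
check_code:                 # hypothetical job name
  image: python:3.10.6      # the software the machine needs
  script:                   # what to do and how to do it
    - pip install pytest
    - pytest
  rules:                    # when to do it
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
```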
The machine#
The “machine” that will be running the pipeline is called a runner. So, GitLab holds the pipeline file (= GitLab knows what to do) and the runner picks up this pipeline file and runs it (= the runner actually does the work). Runners differ in e.g. size, speed, or operating system. GitLab provides you with built-in runners, so you don’t need to worry about setting one up. However, if you wanted to, you could set up your own runner. This is complex and not really needed for our purposes, which is why we will not do this.
Attributes of a CI pipeline#
As mentioned above, the pipeline is written in the .gitlab-ci.yml file. The pipeline consists of different components that need to be defined. These are:
stage
image
script
variables
artifacts
Stage#
Stages are run one after another. Within a stage, one or more jobs are executed. All jobs under the same stage run in parallel.
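The ordering can be sketched with two stages and three jobs. The job names and the tests/unit and tests/integration directories are assumptions for illustration. Here the two test jobs may run side by side, but only after the job in the lint stage has succeeded.

```yaml
stages:
  - lint
  - test

check_formatting:          # hypothetical job, runs first
  stage: lint
  image: python:3.10.6
  script:
    - pip install black
    - black --check .

unit_tests:                # hypothetical job, runs in parallel with integration_tests
  stage: test
  image: python:3.10.6
  script:
    - pip install pytest
    - pytest tests/unit

integration_tests:         # hypothetical job, same stage as unit_tests
  stage: test
  image: python:3.10.6
  script:
    - pip install pytest
    - pytest tests/integration
```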
Typical stages of a CI pipeline are:
build: in this stage you build your application
lint: in this stage you run all your linting and formatting
test: in this stage you run all your tests
deploy: in this stage you deploy your application
build stage#
In the build stage you install all the dependencies that your app needs in order to run properly (= requirements) and compile your code. In software development, an app usually consists of multiple code files or modules, and to run tests such as integration tests, the code needs to be compiled first. In our example, the requirements to fulfil are the Python packages we need to install.
lint and test stage#
We learned in a previous section what linting is and which tests to write for our research code. In these stages you simply call the linting and test frameworks and let them do their job.
deploy stage#
In this stage you release your app to another server. For example, you might want to save your current version not only on GitLab but also on the institutional server. Or you might want to publish your app as a Docker image for others to reuse. Or you might want to publish a JupyterBook, just like this one. Or, in the TAM Data Hub, we will have a CI to deploy your GitLab content to the TAM data repository, where you can then obtain a DOI. All this requires that the content of your GitLab repository leaves the repository (= deployment).
Image#
Container images are read-only templates containing instructions for a container. A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another. A Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings. This means: depending on which image you use, you have different tools available in this container. Containers isolate software from its environment and ensure that it works uniformly despite differences.
As our runner is basically an empty machine, we need to give it a basic setup, such as a certain Python version in our case. For this, there are images available which hold the instructions for creating a Python environment.
Note: if a job does not specify an image, GitLab falls back to a default one (a ruby image on GitLab's hosted runners). If you need another image, override the default by specifying that image. Find the image you need on https://hub.docker.com/.
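As a sketch of how the image can be set, the pipeline below declares one image for all jobs through the default keyword and overrides it in a single job. The job names mirror the example pipeline further down; the python:3.11 tag is only an illustrative choice.

```yaml
default:
  image: python:3.10.6     # used by every job that does not set its own image

run_tests:
  script:
    - python --version     # prints the Python version provided by the image
    - pip install pytest
    - pytest

run_linters:
  image: python:3.11       # overrides the default for this job only
  script:
    - pip install black
    - black --check .
```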
Script#
With script you specify which commands or scripts should be run in the respective stage. For example, in the test stage you specify where the tests you’ve written are stored, relative to the path of the GitLab repo. Or, if you use a test framework such as pytest, you simply call pytest.
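Both variants might look as follows in one job. The path tests/check_data.py is hypothetical and stands in for a test script of your own. The commands run in the listed order, and the job fails as soon as one of them exits with an error.

```yaml
run_tests:
  stage: test
  image: python:3.10.6
  script:
    - python --version
    - python tests/check_data.py   # hypothetical script, path relative to the repo root
    - pip install pytest
    - pytest tests/                # let pytest collect the tests in this folder
```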
Variables#
With variables you can store values you want to re-use within your pipeline. GitLab provides some predefined variables, and you can also define your own. For example, in the deploy stage, when you want to release your repo to another server, you might need to provide login credentials for that server. To avoid showing your login credentials in your pipeline (which can be seen by anyone if your repo is public), you should go to Settings –> CI/CD –> Variables and create variables for them, like $username_docker and $password_docker, that hold the respective information. You then use those variables in your pipeline instead of writing out your actual login credentials.
Or, if you want to run a pipeline for MATLAB, you first need to provide a license file, which you should also specify as a variable.
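The three kinds of variables can be sketched together as follows. TEST_DIR is a hypothetical custom variable, $CI_COMMIT_BRANCH is predefined by GitLab, and $username_docker and $password_docker are assumed to have been created under Settings –> CI/CD –> Variables. The curl call is only a stand-in for whatever tool your deployment actually uses, and server-url is a placeholder.

```yaml
variables:
  TEST_DIR: tests            # hypothetical custom variable, visible to all jobs

run_tests:
  stage: test
  image: python:3.10.6
  script:
    - echo "Testing branch $CI_COMMIT_BRANCH"   # predefined by GitLab
    - pip install pytest
    - pytest "$TEST_DIR"

deploy_teamserver:
  stage: deploy
  image: python:3.10.6
  variables:
    SERVER_URL: server-url   # placeholder, visible to this job only
  script:
    # the credentials are read from the project settings, not from this file
    - curl --fail --user "$username_docker:$password_docker" "$SERVER_URL"
```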
Artifacts#
Artifacts can be any output of your pipeline that you want to keep in a file, e.g., reports of your tests (JUnit report in xml format).
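A sketch of a test job that keeps its JUnit report as an artifact is shown below. The file name report.xml and the one-week expiry are arbitrary choices for illustration.

```yaml
run_tests:
  stage: test
  image: python:3.10.6
  script:
    - pip install pytest
    - pytest --junitxml=report.xml   # write the test results as JUnit XML
  artifacts:
    when: always            # keep the report even if tests fail
    paths:
      - report.xml          # makes the file available for download
    reports:
      junit: report.xml     # lets GitLab display the test results
    expire_in: 1 week
```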
Example pipeline#
stages:
  - build
  - lint
  - test
  - deploy

install_requirements:
  stage: build
  image: python:3.10.6
  script:
    - pip install -U -r requirements.txt

run_linters:
  stage: lint
  image: python:latest
  script:
    - pip install black
    # --check reports files that need reformatting without changing them
    - black --check source_file_or_directory

run_tests:
  stage: test
  image: python:3.10.6
  script:
    # each job starts in a fresh container, so install the requirements again
    - pip install -U -r requirements.txt
    - pip install pytest
    - pytest
  rules:
    # run this job only for pushes to the default branch
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH

deploy_teamserver:
  stage: deploy
  image: my-image
  variables:
    SERVER_URL: server-url
  script: name-of-my-app $SERVER_URL
Note: An example pipeline for MATLAB can be found here.
Task
Create a .gitlab-ci.yml file in your choice_rtt folder (upper level).
Copy the CI from above (without the deploy stage).
Add and commit the CI, and push it to GitLab.
Go to GitLab –> Build –> Pipelines and watch your CI run.