Continuous Integration#

Continuous integration (CI), in the way we want to use it, means automatically running the tests you wrote previously. In a wider sense, CI is the practice of frequently integrating the changes made by individual contributors into a main, shared version of a project (usually multiple times per day). The reason is that when you work collaboratively on GitLab (or GitHub), you probably work on your own branch or on a feature branch, but eventually all work comes together in the main branch. The main branch is also the branch from which the “software package” will eventually be built. So it is very important that everything merged into the main branch works, both individually (verified with unit tests) and together (verified with integration tests).

CI software is also typically used to identify any conflicts and bugs that are introduced by changes, so they are found and fixed early, minimising the effort required to do so. Running tests regularly also saves humans from needing to do it manually.

CI on GitLab#

GitLab is only one of many providers that let you automate your tests whenever you push a new version to the remote repository. As GitLab is a collaborative code platform, it does this for every developer who pushes to the repository. By default the pipeline runs on a push to any branch that contains a pipeline definition, so you can use CI to test the main branch as well as other branches. On GitLab, the pipeline is defined in a .gitlab-ci.yml file.

In this .gitlab-ci.yml file you specify a pipeline that runs on every push. Each branch holds exactly one version of the .gitlab-ci.yml file, and a push to a branch runs the version stored on that branch. So, every test you want to run when pushing to the main branch needs to be specified in the .gitlab-ci.yml version on the main branch. If you also want, e.g., unit tests to run when pushing to your own branch in the collaborative repo, specify this in the .gitlab-ci.yml version on your branch (and so on).

Requirements for a CI pipeline on GitLab#

A functioning CI pipeline consists of multiple components. Basically, you can imagine that you are asking an empty machine to execute your test scripts. So what you need to do is

  • install the software the machine needs to do what you want it to do

  • tell it what to do

  • tell it how to do it

  • and tell it when to do it
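One way to read the four points above is to map them onto the keywords of a single job. The following is only a sketch: the job name my_job is a placeholder, and the mapping in the comments is an interpretation rather than official GitLab terminology.

```yaml
# Hypothetical job; the name "my_job" is a placeholder.
my_job:
  image: python:3.10.6   # the software the machine needs: a base environment with Python
  stage: test            # where in the pipeline the job sits
  script:                # what to do and how to do it: the commands to execute
    - pip install pytest
    - pytest
  rules:                 # when to do it: here, only for pushes to the default branch
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
```

Each of these keywords is explained in the sections that follow.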

The machine#

The “machine” that runs the pipeline is called a runner. GitLab holds the pipeline file (= GitLab knows what to do), and the runner picks up this file and executes it (= the runner actually does the work). Runners differ in, e.g., size, speed, or operating system. GitLab provides you with built-in runners, so you don’t need to worry about setting one up. If you wanted to, you could set up your own runner, but this is complex and not needed for our purposes, which is why we will not do it.

Attributes of a CI pipeline#

As mentioned above, the pipeline is written in the .gitlab-ci.yml file. It consists of different components that need to be defined. Those are

  • stage

  • image

  • script

  • variables

  • artifacts

Stage#

Stages run one after another. Each stage holds one or more jobs, and all jobs in the same stage run in parallel.
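The ordering can be illustrated with a small sketch. The job names and echo commands below are placeholders, and whether the two test jobs really run at the same time depends on how many runners are available.

```yaml
stages:
  - lint
  - test

# The only job in the first stage.
check_formatting:
  stage: lint
  script:
    - echo "runs first"

# Two jobs in the same stage: both start once the lint stage has succeeded.
unit_tests:
  stage: test
  script:
    - echo "runs after lint, alongside integration_tests"

integration_tests:
  stage: test
  script:
    - echo "runs after lint, alongside unit_tests"
```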

Typical stages of a CI pipeline are:

  • build: in this stage you build your application

  • lint: in this stage you run all your linting and formatting

  • test: in this stage you run all your tests

  • deploy: in this stage you deploy your application

build stage#

In the build stage you install all the dependencies your app needs in order to run properly (= requirements) and compile your code. In software development, an app usually consists of multiple code files or modules, and to run tests such as integration tests, the code needs to be compiled first. In our example, the requirement is that the Python packages we need are installed.

lint and test stage#

We learned in a previous section what linting is and which tests to write for our research code. In these stages you simply call the linting and test frameworks and let them do their job.

deploy stage#

In this stage you release your app to another server. For example, you might want to save your current version not only on GitLab but also on the institutional server. Or you might want to publish your app as a Docker image for others to reuse. Or you might want to publish a JupyterBook, just like this one. Or, in the TAM Data Hub, we will have a CI pipeline that deploys your GitLab content to the TAM data repository, where you can then obtain a DOI. All this requires that the content of your GitLab repository leaves the repository (= deployment).

Image#

Container images are read-only templates containing instructions for a container. A container is a standard unit of software that packages up code and all its dependencies so the application runs quickly and reliably from one computing environment to another. A Docker container image is a lightweight, standalone, executable package of software that includes everything needed to run an application: code, runtime, system tools, system libraries and settings. This means: depending on which image you use you have different tools available in this container. Containers isolate software from its environment and ensure that it works uniformly despite differences.

As our runner is basically an empty machine, we need to give it a basic setup, in our case a certain Python version. For this, there are images available that hold the instructions for creating a Python environment.

Note: if you do not specify an image, GitLab falls back to a default one (a ruby image on GitLab’s built-in runners). If you need another image, override the default by specifying the image you want. Find the image you need on https://hub.docker.com/.
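As a sketch of how the override can look: an image can be set once for all jobs under the default keyword and replaced for a single job. The job names and the chosen Python versions are only examples.

```yaml
# Image used by every job that does not name its own.
default:
  image: python:3.10.6

# Uses the default image.
run_tests:
  script:
    - python --version

# Overrides the default image for this job only.
run_linters:
  image: python:3.11
  script:
    - python --version
```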

Script#

With script you specify the commands that should be run in a job of the respective stage. For example, in the test stage you specify where the tests you’ve written are located, relative to the root of the GitLab repository. Or, if you use a test framework such as pytest, you simply call pytest.
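A minimal sketch of such a script section, assuming the tests live in a folder called tests/ at the top level of the repository (this folder name is an assumption, adjust it to your project):

```yaml
run_tests:
  stage: test
  image: python:3.10.6
  script:
    # Commands run one after another, starting in the root of the repository.
    - pip install pytest
    # "tests/" is a hypothetical path; calling plain "pytest" lets it discover the tests itself.
    - pytest tests/
```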

Variables#

With variables you can store values you want to re-use within your pipeline. GitLab has some predefined variables, and you can define your own. For example, in the deploy stage, when you want to release your repo to another server, you might need to provide login credentials for that server. To keep your login credentials from being shown in your pipeline (which can be seen by anyone if your repo is public), go to Settings –> CI/CD –> Variables and create variables for them, such as $username_docker and $password_docker, that hold the respective information. Then use those variables in your pipeline instead of writing out your actual login credentials.
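The three kinds of variables can be combined in one job, as in the following sketch. It assumes that $username_docker and $password_docker were created in the project settings as described above, that the runner supports the docker:dind service, and that registry.example.com stands in for your real registry address.

```yaml
publish_image:
  stage: deploy
  image: docker:latest
  services:
    - docker:dind
  variables:
    # Variable defined in the pipeline itself; the URL is a placeholder.
    REGISTRY_URL: registry.example.com
  script:
    # $CI_COMMIT_SHORT_SHA is one of GitLab's predefined variables.
    - echo "Deploying commit $CI_COMMIT_SHORT_SHA"
    # The credentials come from Settings -> CI/CD -> Variables and never appear in this file.
    - echo "$password_docker" | docker login -u "$username_docker" --password-stdin "$REGISTRY_URL"
```

Passing the password via standard input keeps it out of the command line that is printed to the job log.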

Or, if you want to run a pipeline for MATLAB you first need to provide some license file, which you should also specify as a variable.

Artifacts#

Artifacts can be any output of your pipeline that you want to keep as a file, e.g., reports of your tests (such as a JUnit report in XML format).
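For the JUnit case, a sketch could look like this. The file name report.xml is an arbitrary choice; pytest writes the report through its --junitxml option.

```yaml
run_tests:
  stage: test
  image: python:3.10.6
  script:
    - pip install pytest
    # Write the test results to a JUnit-style XML file.
    - pytest --junitxml=report.xml
  artifacts:
    # Keep the report even if some tests fail.
    when: always
    reports:
      junit: report.xml
```

With a report declared this way, GitLab can display the test results in the pipeline view.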

Example pipeline#

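# Stages run in this order.
stages:
  - build
  - lint
  - test
  - deploy

# Check that the dependencies can be installed.
install_requirements:
  stage: build
  image: python:3.10.6
  script:
    - pip install -U -r requirements.txt

# Run the black formatter on the code.
run_linters:
  stage: lint
  image: python:latest
  script:
    - pip install black
    - black source_file_or_directory

# Each job starts from a fresh copy of its image,
# so the test job installs what it needs itself.
run_tests:
  stage: test
  image: python:3.10.6
  script:
    - pip install -U -r requirements.txt
    - pip install pytest
    - pytest
  rules:
    # Run the tests only for pushes to the default (main) branch.
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH

# Placeholder deploy job: image, URL and command need to be replaced.
deploy_teamserver:
  stage: deploy
  image: my-image
  variables:
    SERVER_URL: server-url
  script: name-of-my-app $SERVER_URL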

Note: An example pipeline for MATLAB can be found here.

Task

  • create a .gitlab-ci.yml file in your choice_rtt folder (upper level).

  • copy the CI from above (without the deploy stage).

  • add and commit the CI, and push it to GitLab.

  • go to GitLab –> Build –> Pipelines and watch your CI run.