Code testing#


Looking back at our script from the previous section, i.e. the one you just brought up to code with regard to comments and formatting guidelines, there is at least one other big question we have to address…

How do we know that the code does what it is supposed to be doing?

Now you might think “What do you mean? It is running the things I wrote down, done.” However, the reality looks different…

Background#

Generally, there are two major reasons why it is of the utmost importance to check and evaluate code:

  1. Mistakes made while coding

  2. Code instability and changes

Let’s have a quick look at both.

Mistakes made while coding#

It is very, very easy to make mistakes when coding. A single misplaced character can cause a program or script’s output to be entirely wrong or to vary tremendously from its expected behavior. This can happen because of a plus sign that should have been a minus, or because one piece of code works in one unit while a piece of code written by another researcher works in a different unit. Everyone makes mistakes, but the results can be catastrophic. Careers can be damaged or ended, vast sums of research funds can be wasted, and valuable time may be lost to exploring incorrect avenues.


Code instabilities and changes#

The second reason is also challenging, but in a different way: the code you are using and writing is affected by underlying numerical instabilities and by changes during development.

Regarding the first, there are intrinsic numerical errors and instabilities that may lead unstable functions towards distinct local minima. This phenomenon is only aggravated by prominent differences between operating systems.
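
As a small illustration of why exact comparisons can be fragile, consider plain floating-point arithmetic (a minimal sketch, independent of any particular script):

import math

# Floating-point arithmetic is not exact: 0.1 + 0.2 is actually stored as
# 0.30000000000000004, so a naive equality check fails.
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True: compares within a tolerance

In tests, pytest.approx offers the same tolerance-based comparison, which is why numerical results are usually tested against a tolerance rather than for exact equality.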


Concerning the second, a lot of the code you are using is going to be part of packages and modules that are developed and maintained by other people. Along the way, the code you are using, e.g. a function, is going to change, more or less prominently. It could be a change in rounding or a complete change of inputs and outputs. Either way, the effects on your application and/or pipeline might be significant and, most importantly, unbeknownst to you.

This is why software and code tests are vital: they ensure the expected outcome and let you check and evaluate changes along the development process.

Motivation#

Even if problems in a program are caught before research is published, it can be difficult to figure out which results are contaminated and must be re-done. This represents a huge loss of time and effort. Catching these problems as early as possible minimises the amount of work it takes to fix them, and for most researchers time is by far their most scarce resource. You should not skip writing tests because you are short on time, you should write tests because you are short on time. Researchers cannot afford to have months or years of work go down the drain, and they can’t afford to repeatedly manually check every little detail of a program that might be hundreds or hundreds of thousands of lines long. Writing tests to do it for you is the time-saving option, and it’s the safe option.


As researchers write code they generally do some tests as they go along, often by adding in print statements and checking the output. However, these tests are often thrown away as soon as they pass and are no longer present to check what they were intended to check. It is comparatively very little work to place these tests in functions and keep them so they can be run at any time in the future. The additional labour is minimal, the time saved and safeguards provided are invaluable. Further, by formalising the testing process into a suite of tests that can be run independently and automatically, you provide a much greater degree of confidence that the software behaves correctly and increase the likelihood that defects will be found.
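
As a rough sketch of what that looks like in practice (the function name and values here are made up for illustration), the throwaway print check becomes a small test function you can keep and re-run:

# Throwaway check, typically deleted once the output "looks right":
# print(convert_rt_to_seconds(1500))   # hopefully prints 1.5

def convert_rt_to_seconds(rt_ms):
    """Convert a reaction time from milliseconds to seconds."""
    return rt_ms / 1000

# The same check, kept as a test function that can be re-run at any time:
def test_convert_rt_to_seconds():
    assert convert_rt_to_seconds(1500) == 1.5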


Testing also affords researchers much more peace of mind when working on or improving a project. After changing their code, a researcher will want to check that their changes or fixes have not broken anything. Providing researchers with a fail-fast environment allows the rapid identification of failures introduced by changes to the code. The alternative, of the researcher writing and running whatever small tests they have time for, is far inferior to a good testing suite which can thoroughly check the code.


Another benefit of writing tests is that it typically forces a researcher to write cleaner, more modular code, as such code is far easier to write tests for, leading to an improvement in code quality. Good quality code is far easier (and altogether more pleasant) to work with than the tangled rat’s nests of code I’m sure we’ve all come across (and, let’s be honest, written). This point is expanded upon in the section Unit Testing.


Research software#

As well as benefiting individual researchers, testing also benefits research as a whole. It makes research more reproducible by answering the question “how do we even know this code works?”. If tests are never saved, just done and deleted, the proof cannot be reproduced easily.

Testing also helps prevent valuable grant money being spent on projects that may be partly or wholly flawed due to mistakes in the code. Worse, if mistakes are not found and the work is published, any subsequent work that builds upon the project will be similarly flawed.

Perhaps the cleanest expression of why testing is important for research as a whole can be found in the Software Sustainability Institute slogan: better software, better research.


General guidance and good practice for testing#

There are several different kinds of testing which each have best practice specific to them (see Types of Testing). Nevertheless, there is some general guidance that applies to all of them, which will be outlined here.

Write Tests - Any Tests!#

Starting the process of writing tests can be overwhelming, especially if you have a large code base. Further to that, as mentioned, there are many kinds of tests, and implementing all of them can seem like an impossible mountain to climb. That is why the single most important piece of guidance in this chapter is as follows: write some tests. Testing one tiny thing in a code that’s thousands of lines long is infinitely better than testing nothing in a code that’s thousands of lines long. You may not be able to do everything, but doing something is valuable.

Make improvements where you can, and do your best to include tests with new code you write even if it’s not feasible to write tests for all the code that’s already written.


Run the tests#

The second most important piece of advice in this chapter: run the tests. Having a beautiful, perfect test suite is no use if you rarely run it. Leaving long gaps between test runs makes it more difficult to track down what has gone wrong when a test fails, because a lot of the code will have changed. Also, if it has been weeks or months since tests have been run and they fail, it is difficult or impossible to know which results obtained in the meantime are still valid, and which have to be thrown away because they could have been impacted by the bug.


It is best to automate your testing as far as possible. If each test needs to be run individually then that boring painstaking process is likely to get neglected. This can be done by making use of a testing framework (discussed later). Ideally set your tests up to run at regular intervals, possibly every night.

Consider setting up continuous integration (discussed in the continuous integration session) on your project. This will automatically run your tests each time you make a change to your code and, depending on the continuous integration software you use, will notify you if any of the tests fail.

Consider how long it takes your tests to run#

Some tests, like Unit Testing, only test a small piece of code and so are typically very fast. However, other kinds of tests, such as System Testing, which tests the entire code from end to end, may take a long time to run depending on the code. As such, it can be obstructive to run the entire test suite after each little bit of work.


In that case it is better to run lighter-weight tests such as unit tests frequently, and longer tests only once per day, overnight. It is also good to scale the number of each kind of test you have in relation to how long they take to run. You should have a lot of unit tests (or other types of tests that are fast) but far fewer tests which take a long time to run.
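
One way to separate fast and slow tests is pytest’s marker mechanism, sketched below. The slow marker name is our own choice here and would ideally be registered in your pytest configuration so that pytest does not warn about an unknown marker.

import time

import pytest

@pytest.mark.slow
def test_full_pipeline_end_to_end():
    # Stand-in for a lengthy end-to-end run that should only happen overnight.
    time.sleep(5)
    assert True

def test_quick_sanity_check():
    # A fast check that can run after every small change.
    assert 2 + 2 == 4

During the day you could then run only the fast tests with pytest -m "not slow" and leave the complete suite, including the slow tests, for the nightly run.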

Document the tests and how to run them#

It is important to provide documentation that describes how to run the tests, both for yourself in case you come back to a project in the future, and for anyone else that may wish to build upon or reproduce your work.


This documentation should also cover subjects such as:

  • any resources, such as test dataset files that are required

  • any configuration/settings adjustments needed to run the tests

  • what software (such as testing frameworks) needs to be installed

Ideally, you would provide scripts to set up and configure any resources that are needed.

Test Realistic Cases#

Make the cases you test as realistic as possible. If, for example, you have dummy data to run tests on, you should make sure that data is as similar as possible to the actual data. If your actual data is messy with a lot of null values, so should your test dataset be.
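
For example, a small dummy dataset that mimics messy real data might be built like this (a minimal sketch; the column names and values are made up):

import numpy as np
import pandas as pd

# A tiny dummy dataset that deliberately mimics the messiness of real data:
# a missing age, a missing response time and an implausible age value.
dummy_data = pd.DataFrame({
    "participant_id": ["sub-01", "sub-02", "sub-03"],
    "age": [24, np.nan, 999],
    "response_time": [0.53, np.nan, 0.61],
})

def test_dummy_data_contains_missing_values():
    # Whatever cleaning function you test should be exercised with NaNs present.
    assert dummy_data["response_time"].isna().sum() == 1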

Use a Testing Framework#

There are tools available to make writing and running tests easier; these are known as testing frameworks. Find one you like, learn about the features it offers, and make use of them. A very common testing framework for Python is pytest.

Coverage#

Code coverage is a measure of how much of your code is “covered” by tests. More precisely, it is a measure of how much of your code is run when tests are conducted. So, for example, if you have an if statement but only test cases where that if statement evaluates to False, then none of the code in the if block will be run. As a result, your code coverage would be < 100%. Code coverage doesn’t include documentation like comments, so adding more documentation doesn’t affect your percentages.
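
For example (a minimal sketch with a made-up function): if only the first test below is run, the else branch is never executed and coverage stays below 100%; adding the second test covers the remaining branch.

def describe_rt(rt):
    if rt < 1.0:
        return "fast"
    else:
        return "slow"

def test_fast_response():
    # Only exercises the `if` branch, so the `else` branch stays uncovered.
    assert describe_rt(0.4) == "fast"

def test_slow_response():
    # Exercises the remaining `else` branch, bringing coverage to 100%.
    assert describe_rt(2.5) == "slow"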

Aim to get your coverage as close to 100% as possible, but keep in mind that any testing is better than none. Various tools and bots can measure coverage across programming languages; for Python, the pytest-cov plugin for pytest is commonly used. Beware, however, the illusion of good coverage: lines that are merely executed are not necessarily well tested. Thorough testing involves exercising the same code under multiple scenarios, and testing smaller chunks of code makes it easier to validate the logic precisely.

Use test doubles/stubs/mocking where appropriate#

Use test doubles like stubs or mocks for isolating code in tests. Ensure tests make it easy to pinpoint failures, which can be hard when code depends on external factors like internet connections or objects. For example, a web interaction test might fail due to internet issues, not code bugs. Similarly, a test involving an object might fail because of the object itself, which should have its own tests. Eliminate these dependencies with test doubles, which come in several types:

  • Dummy objects are placeholders that aren’t actually used in testing beyond filling method parameters.

  • Fake objects have simplified, functional implementations, like an in-memory database instead of a real one.

  • Stubs provide partial implementations to respond only to specific test cases and might record call information.

  • Mocks simulate interfaces or classes, with predefined outputs for method calls, often recording interactions for test validation.

Test doubles replace real dependencies, making tests more focused and reliable. Mocks can be hand-coded or generated with mock frameworks, which allow dynamic behavior definition. A common mock example is a data provider, where a mock simulates the data source to ensure consistent test conditions, contrasting with the real data source used in production.
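
To make this concrete, here is a minimal sketch of the data-provider example using Python’s built-in unittest.mock (the function and method names are made up for illustration):

from unittest.mock import Mock

def mean_age(data_provider):
    """Compute the mean age from whatever the data provider returns."""
    ages = data_provider.get_ages()
    return sum(ages) / len(ages)

def test_mean_age_with_mock_provider():
    # The mock stands in for a real database or web service, so the test
    # depends only on our own logic, not on external resources.
    mock_provider = Mock()
    mock_provider.get_ages.return_value = [20, 30, 40]

    assert mean_age(mock_provider) == 30
    mock_provider.get_ages.assert_called_once()  # verify the interaction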

Overview of Testing Types#

There are a number of different kinds of tests, which will be briefly discussed in the following.

Firstly, there are positive tests and negative tests. Positive tests check that something works, for example, testing that a function that multiplies some numbers together outputs the correct answer. Negative tests check that something generates an error when it should. For example, nothing can go quicker than the speed of light, so a plasma physics simulation code may contain a test that an error is raised if there are any particles faster than this, as it indicates there is a deeper problem in the code.

In addition to these two kinds of tests, there are also different levels of tests which test different aspects of a project. These levels are outlined below and both positive and negative tests can be present at any of these levels. A thorough test suite will contain tests at all of these levels (though some levels will need very few).

However, before we check out the different test options, we have to talk about one aspect that is central to all of them: assert.

Assert - a test’s best friend#

In order to check and evaluate if a certain piece of code is doing what it is supposed to do, in a reliable manner, we need a way of asserting what the “correct output” should be and testing the outcome we get against it.

In Python, assert is a statement used to test whether a condition is true. If the condition is true, the program continues to execute as normal. If the condition is false, the program raises an AssertionError exception and optionally can display an accompanying message. The primary use of assert is for debugging and testing purposes, where it helps to catch errors early by ensuring that certain conditions hold at specific points in the code.

Syntax#

The basic syntax of an assert statement is:

assert condition, "Optional error message"

  • condition: This is the expression to be tested. If the condition evaluates to True, nothing happens, and the program continues to execute. If it evaluates to False, an AssertionError is raised.

  • "Optional error message": This is the message that is shown when the condition is false. This message is optional, but it’s helpful for understanding why the assertion failed.

Usage in Testing#

In the context of testing, assert statements are used to verify that a function or a piece of code behaves as expected. They are a simple yet powerful tool for writing test cases, where you check the outcomes of various functions under different inputs. Here’s how you might use assert in a test (a small sketch of some of these checks follows the list below):

  • Checking Function Outputs: To verify that a function returns the expected value.

  • Validating Data Types: To ensure that variables or return values are of the correct type.

  • Testing Invariants: To check conditions that should always be true in a given context.

  • Comparing Data Structures: To ensure that lists, dictionaries, sets, etc., contain the expected elements.
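
For instance, the last three kinds of checks could look like this (a minimal sketch with made-up values):

rts = [0.51, 0.48, 0.72]

# Validating a data type
assert isinstance(rts, list), "Expected reaction times to be stored in a list"

# Testing an invariant: reaction times can never be negative
assert all(rt >= 0 for rt in rts), "Found a negative reaction time"

# Comparing data structures
assert sorted(rts) == [0.48, 0.51, 0.72], "Sorted reaction times do not match"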

A simple example#

Here’s a simple example demonstrating how assert might be used in a test case:

def add(a, b):
    return a + b

# Test case for the add function
def test_add():
    result = add(2, 3)
    assert result == 5, "Expected add(2, 3) to be 5"

test_add()  # This will pass silently since 2 + 3 is indeed 5

If add(2, 3) did not return 5, the assert statement would raise an AssertionError with the message "Expected add(2, 3) to be 5".

Best Practices#

  • Use for Testing: Leverage assert primarily in testing frameworks or during the debugging phase, not as a mechanism for handling runtime errors in production code.

  • Clear Messages: Include clear, descriptive messages with assert statements to make it easier to identify the cause of a test failure.

  • Test Precisely: Each assert should test one specific aspect of your code’s behavior to make diagnosing issues straightforward.

Runtime testing#

Runtime tests are tests that run as part of the program itself. They may take the form of checks within the code.

For example, we could use the following runtime tests to test the first block of our analysis_pipeline.py script:

import requests
import zipfile
from io import BytesIO

url = 'https://gitlab.com/julia-pfarr/nowaschool/-/raw/main/school/materials/CI_CD/crtt.zip?ref_type=heads'
extract_to_path = '/Users/peerherholz/Desktop/'

req = requests.get(url)
if req.status_code == 200:
    print('Downloading Completed')
    with zipfile.ZipFile(BytesIO(req.content)) as zfile:
        zfile.extractall(extract_to_path)
else:
    print('Download failed.')
Downloading Completed

Advantages of runtime testing:#

  • run within the program, so can catch problems caused by logic errors or edge cases

  • makes it easier to find the cause of the bug by catching problems early

  • catching problems early also helps prevent them escalating into catastrophic failures. It minimises the blast radius.

Disadvantages of runtime testing:#

  • tests can slow down the program

  • what is the right thing to do if an error is detected? How should this error be reported? Raising an exception is a recommended route here (see the sketch below).
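
For instance, a runtime check could raise an exception instead of printing a message, so that a failed download stops the pipeline immediately (a minimal sketch; the function name is made up):

def check_download(status_code):
    # Report a failed runtime check via an exception instead of a print,
    # so the pipeline stops right away instead of continuing with bad data.
    if status_code != 200:
        raise RuntimeError(f"Download failed with status code {status_code}")

check_download(200)    # passes silently
# check_download(404)  # would raise RuntimeError and halt the pipeline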

Smoke tests#

Very brief initial checks that ensure the basic requirements for running the project hold. If these fail there is no point proceeding to additional levels of testing until they are fixed.

For example, we could use the following smoke tests to test the first block of our analysis_pipeline.py script:

import requests
from zipfile import ZipFile, BadZipFile
from io import BytesIO
import os

def test_download_and_extraction(url, extraction_path):
    """
    Test downloading a ZIP file from a URL and extracting it to a specified path.

    Args:
    - url (str): URL of the ZIP file to download.
    - extraction_path (str): The filesystem path where the ZIP file contents will be extracted.
    """

    # 1. URL Accessibility
    response = requests.head(url)
    assert response.status_code == 200, "URL is not accessible or does not exist"
    #assert 'application/zip' in response.headers['Content-Type'], "URL does not point to a ZIP file"

    # 2. Successful Download
    response = requests.get(url)
    assert response.status_code == 200, "Failed to download the file"

    # 3. Correct File Type and Extraction
    try:
        with ZipFile(BytesIO(response.content)) as zipfile:
            zipfile.extractall(extraction_path)
        assert True  # If extraction succeeds
    except BadZipFile:
        assert False, "Downloaded file is not a valid ZIP archive"

    # 4. Check Extracted Files
    extracted_files = os.listdir(extraction_path)
    assert len(extracted_files) > 0, "No files were extracted"

    print(f"Test passed: Downloaded and extracted ZIP file to {extraction_path}")

# Example usage
url = 'https://gitlab.com/julia-pfarr/nowaschool/-/raw/main/school/materials/CI_CD/crtt.zip?ref_type=heads'
extraction_path = '/Users/peerherholz/Desktop/'

test_download_and_extraction(url, extraction_path)
Test passed: Downloaded and extracted ZIP file to /Users/peerherholz/Desktop/

Unit tests#

A level of the software testing process where individual units of a software are tested. The purpose is to validate that each unit of the software performs as designed.

For example, we could use the following unit tests to test the second block of our analysis_pipeline.py script:

import pandas as pd

def test_data_conversion():
    def process_data(df, columns_select):
        # Assuming df is the DataFrame before conversion
        data_loaded_sub_part = df[columns_select]
        # Insert more DF operations if needed
        return data_loaded_sub_part

    # Load the raw data (before conversion)
    raw_data_path = '/Users/peerherholz/Desktop/choice_rtt/sourcedata/sub-01/ses-post/01_post_crtt_exp_2024-02-02_09h43.24.388.csv'  # Update this path
    raw_data_df = pd.read_csv(raw_data_path, delimiter=',')

    # Columns to select and any other processing details
    columns_select = ['participant_id', 'age', 'left-handed', 'Do you like this session?', 'session', 'TargetImage', 'keyboard_response.corr', 'trialRespTimes']

    # Process the raw data
    processed_data_df = process_data(raw_data_df, columns_select)

    # Load the expected data (after conversion) for comparison
    expected_data_path = '/Users/peerherholz/Desktop/choice_rtt/sub-01/ses-post/beh/sub-01_ses-post_task-ChoiceRTT_beh.tsv'  # Update this path
    expected_data_df = pd.read_csv(expected_data_path, delimiter='\t')

    # Assertions
    assert list(processed_data_df.columns) == list(expected_data_df.columns), "Columns do not match"
    assert processed_data_df.shape == expected_data_df.shape, "DataFrame shapes do not match"
    
    # Compare the first row as dicts
    processed_first_row = processed_data_df.iloc[0].to_dict()
    expected_first_row = expected_data_df.iloc[0].to_dict()
    
    for key in processed_first_row:
        if isinstance(processed_first_row[key], float):
            assert abs(processed_first_row[key] - expected_first_row[key]) < 1e-5, f"Row values do not match for column {key}"
        else:
            assert processed_first_row[key] == expected_first_row[key], f"Row values do not match for column {key}"

Unit Testing Tips#

  • many testing frameworks have tools specifically geared towards writing and running unit tests; pytest does as well

  • isolate the development environment from the test environment

  • write test cases that are independent of each other. For example, if a unit A utilises the result supplied by another unit B, you should test unit A with a test double, rather than actually calling the unit B. If you don’t do this your test failing may be due to a fault in either unit A or unit B, making the bug harder to trace.

  • aim at covering all paths through a unit and pay particular attention to loop conditions.

  • in addition to writing cases to verify the behaviour, write cases to ensure the performance of the code. For example, if a function that is supposed to add two numbers takes several minutes to run there is likely a problem.

  • if you find a defect in your code, write a test that exposes it (a small sketch follows this list). Why? First, you will later be able to catch the defect if you do not fix it properly. Second, your test suite is now more comprehensive. Third, you will most probably be too lazy to write the test after you have already fixed the defect.
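
As an example of the last point (a hypothetical function and bug): suppose you discovered that averaging reaction times crashed on an empty trial list. Writing the test below pins the defect down and keeps it from silently returning.

def mean_rt(rts):
    # Fixed version: an empty trial list now returns 0.0 instead of raising
    # ZeroDivisionError, which was the original (hypothetical) defect.
    if not rts:
        return 0.0
    return sum(rts) / len(rts)

def test_mean_rt_handles_empty_trial_list():
    assert mean_rt([]) == 0.0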

Integration tests#

A level of software testing where individual units are combined and tested as a group. The purpose of this level of testing is to expose faults in the interaction between integrated units.

Integration Testing Approaches#

There are several different approaches to integration testing.

  • Big Bang: an approach to integration testing where all or most of the units are combined together and tested at one go. This approach is taken when the testing team receives the entire software in a bundle. So what is the difference between Big Bang integration testing and system testing? Well, the former tests only the interactions between the units while the latter tests the entire system.

  • Top Down: an approach to integration testing where top-level sections of the code (that themselves contain many smaller units) are tested first and lower level units are tested step by step after that.

  • Bottom Up: an approach to integration testing where integration between bottom level sections are tested first and upper-level sections step by step after that. Again test stubs should be used, in this case to simulate inputs from higher level sections.

  • Sandwich/Hybrid is an approach to integration testing which is a combination of Top Down and Bottom Up approaches.

Which approach you should use will depend on which best suits the nature/structure of your project.

For example, we could use the following integration test to test the first and second block of our analysis_pipeline.py script:

import requests
import zipfile
from io import BytesIO
import os
import pandas as pd
from glob import glob

import pytest
from tempfile import TemporaryDirectory

# In practice you would import download_and_extract_data and convert_data
# from analysis_pipeline; they are redefined below for illustration.


def download_and_extract_data(url, extract_to_path):
    print('Downloading started')
    response = requests.get(url)
    print('Downloading Completed')
    with zipfile.ZipFile(BytesIO(response.content)) as zfile:
        zfile.extractall(extract_to_path)


def convert_data(source_dir, target_dir):
    data_files = glob(os.path.join(source_dir, '*'))
    columns_select = ['participant_id', 'age', 'left-handed', 'Do you like this session?', 'session', 'TargetImage', 'keyboard_response.corr', 'trialRespTimes']

    for index, participant in enumerate(data_files):
        print(f'Working on {participant}, file {index+1}/{len(data_files)}')
        data_loaded_part = pd.read_csv(participant, delimiter=',')
        data_loaded_sub_part = data_loaded_part[columns_select]
        # Additional processing...
        # Save converted data
        output_file = os.path.join(target_dir, os.path.basename(participant))
        data_loaded_sub_part.to_csv(output_file, sep='\t', index=False)


def test_download_and_data_conversion():
    with TemporaryDirectory() as tmp_dir:
        download_dir = os.path.join(tmp_dir, "download")
        os.makedirs(download_dir, exist_ok=True)
        convert_dir = os.path.join(tmp_dir, "convert")
        os.makedirs(convert_dir, exist_ok=True)

        test_url = 'https://example.com/test_data.zip'
        
        # Block 1: Download and extract
        download_and_extract_data(test_url, download_dir)
        extracted_files = os.listdir(download_dir)
        assert extracted_files, "Download or extraction failed."
        
        # Example additional check: Verify the extracted file names or types
        # This step assumes you know what files you're expecting
        expected_files = ['data1.csv', 'data2.csv']  # Example expected files
        assert all(file in extracted_files for file in expected_files), "Missing expected files after extraction."

        # Block 2: Convert data
        convert_data(download_dir, convert_dir)
        converted_files = os.listdir(convert_dir)
        assert converted_files, "Data conversion failed."

        # Content validation for one of the converted files
        # This assumes you know the structure of the converted data
        sample_converted_file = os.path.join(convert_dir, converted_files[0])
        df = pd.read_csv(sample_converted_file, sep='\t')
        
        # Check if specific columns are present in the converted file
        expected_columns = ['participant_id', 'age', 'left-handed', 'Do you like this session?', 'session', 'stim_file', 'response', 'response_time']
        assert all(column in df.columns for column in expected_columns), "Converted file missing expected columns."

        # Basic content check: Ensure no empty rows for key columns
        assert df['participant_id'].notnull().all(), "Null values found in 'participant_id' column."
        assert df['session'].notnull().all(), "Null values found in 'session' column."

        # Example of performance metric (very basic)
        import time
        start_time = time.time()
        convert_data(download_dir, convert_dir)
        end_time = time.time()
        assert (end_time - start_time) < 60, "Conversion took too long."

Integration Testing Tips#

Ensure that you have a proper detailed design document in which the interactions between each unit are clearly defined. It is difficult or impossible to perform integration testing without this information.

Make sure that each unit is unit tested and fix any bugs before you start integration testing. If there is a bug in the individual units then the integration tests will almost certainly fail even if there is no error in how they are integrated.

Use mocking/stubs where appropriate.
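
For example, pytest’s built-in monkeypatch fixture can stub out network access, so that an integration test does not depend on the internet or on the remote server being up (a minimal sketch; the function names are made up):

import requests

class FakeResponse:
    """A stand-in for requests.Response with just the attributes we need."""
    status_code = 200
    content = b"dummy zip bytes"

def fetch_archive(url):
    return requests.get(url).content

def test_fetch_archive_without_network(monkeypatch):
    # Replace requests.get with a stub so that no real network call is made.
    monkeypatch.setattr(requests, "get", lambda url: FakeResponse())
    assert fetch_archive("https://example.com/data.zip") == b"dummy zip bytes"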

System tests#

A level of the software testing process where a complete, integrated system is tested. The purpose of this test is to evaluate whether the system as a whole gives the correct outputs for given inputs.

System Testing Tips#

System tests, also called end-to-end tests, run the program, well, from end to end. As such these are the most time consuming tests to run. Therefore you should only run these if all the lower-level tests (smoke, unit, integration) have already passed. If they haven’t, fix the issues they have detected first before wasting time running system tests.

Because of their time-consuming nature it will also often be impractical to have enough system tests to trace every possible route through a program, especially if there are a significant number of conditional statements. Therefore you should consider the system test cases you run carefully and prioritise:

  • the most common routes through a program

  • the most important routes for a program

  • cases that are prone to breakage due to structural problems within the program. Ideally it is better to just fix those problems, but cases exist where this may not be feasible.

Because system tests can be time consuming it may be impractical to run them very regularly (such as multiple times a day after small changes in the code). Therefore it can be a good idea to run them each night (and to automate this process) so that if errors are introduced that only system testing can detect, the developer(s) will be made aware of them relatively quickly.

For example, we could use the following system test to test our analysis_pipeline.py script:

import pytest
import subprocess
from tempfile import TemporaryDirectory
import os
import pandas as pd

def run_pipeline(script_path, data_dir, output_dir):
    """Executes the analysis pipeline script."""
    subprocess.run(['python', script_path, '--data-dir', data_dir, '--output-dir', output_dir], check=True)

@pytest.mark.system
def test_analysis_pipeline():
    with TemporaryDirectory() as tmp_dir:
        data_dir = os.path.join(tmp_dir, "data")
        os.makedirs(data_dir, exist_ok=True)
        output_dir = os.path.join(tmp_dir, "output")
        os.makedirs(output_dir, exist_ok=True)

        script_path = 'analysis_pipeline.py'  # Update if your script is in a different location
        run_pipeline(script_path, data_dir, output_dir)

        # Specific check 1: Verify the structure of output CSV files
        output_files = [f for f in os.listdir(output_dir) if f.endswith('.csv')]
        assert output_files, "No CSV output files found after running the pipeline."

        for output_file in output_files:
            df = pd.read_csv(os.path.join(output_dir, output_file))
            expected_columns = ['participant_id', 'age', 'left-handed', 'session', 'stim_file', 'response', 'response_time', 'trial_type', 'trial']
            assert all(column in df.columns for column in expected_columns), f"Missing expected columns in {output_file}."

            # Specific check 2: Verify data integrity, e.g., non-negative ages and response times
            assert (df['age'] >= 0).all(), f"Negative values found in 'age' column of {output_file}."
            assert (df['response_time'] >= 0).all(), f"Negative values found in 'response_time' column of {output_file}."

        # Additional check: Verify summary statistics in analysis_results.txt (if applicable)
        summary_file_path = os.path.join(output_dir, 'analysis_results.txt')
        if os.path.exists(summary_file_path):
            with open(summary_file_path, 'r') as file:
                summary_contents = file.read()
                # Example: Check for a specific summary statistic
                assert "Mean response time:" in summary_contents, "Expected summary statistic 'Mean response time' not found in analysis results."

                # Further parsing and checks can be added based on the expected format and content of the summary statistics

Acceptance and regression tests#

A level of the software testing process where a system is tested for acceptability. The purpose of this test is to evaluate the system’s compliance with the project requirements and assess whether it is acceptable for the purpose.

Acceptance testing#

Acceptance tests are among the last types of tests performed on software prior to delivery. Acceptance testing is used to determine whether a piece of software satisfies all of the requirements from the user’s perspective. Does this piece of software do what it needs to do? These tests are sometimes built against the original specification.

Because research software is typically written by the researcher that will use it (or at least with significant input from them) acceptance tests may not be necessary.

Regression testing#

Regression testing checks for unintended changes by comparing new test results to previously recorded ones, ensuring that updates don’t break the software. It is critical because even seemingly unrelated code changes can cause issues. While suitable for all testing levels, it is especially vital in system testing, where it can automate otherwise tedious manual checks. Regression tests are created by recording the outputs for specific inputs, then re-running the tests later and comparing the results to detect discrepancies. They are essential for team projects, and just as crucial for solo work to catch self-introduced errors.

Regression testing approaches differ in their focus. Common examples include:

  • Bug regression: retest a specific bug that has been allegedly fixed

  • Old fix regression testing: retest several old bugs that were fixed, to see if they are back. (This is the classical notion of regression: the program has regressed to a bad state.)

  • General functional regression: retest the project broadly, including areas that worked before, to see whether more recent changes have destabilized working code.

  • Conversion or port testing: the program is ported to a new platform and a regression test suite is run to determine whether the port was successful.

  • Configuration testing: the program is run with a new device or on a new version of the operating system or in conjunction with a new application. This is like port testing except that the underlying code hasn’t been changed; only the external components that the software under test must interact with have.

For example, we could use the following regression test to test our analysis_pipeline.py script:

import pytest
import subprocess
from tempfile import TemporaryDirectory
import os
import pandas as pd
import filecmp
import difflib

SCRIPT_PATH = 'path/to/your/analysis_pipeline.py'  # Update this path
BASELINE_DIR = 'path/to/your/baseline_data'  # Directory containing baseline results

def run_pipeline(script_path, data_dir, output_dir):
    """Executes the analysis pipeline script."""
    subprocess.run(['python', script_path, '--data-dir', data_dir, '--output-dir', output_dir], check=True)

def compare_files(file1, file2):
    """Compares two files line by line."""
    with open(file1, 'r') as f1, open(file2, 'r') as f2:
        diff = difflib.unified_diff(
            f1.readlines(), f2.readlines(),
            fromfile='baseline', tofile='current',
        )
        diff_list = list(diff)
        if diff_list:
            print('Differences found:\n', ''.join(diff_list))
        return not diff_list

@pytest.mark.regression
def test_pipeline_against_baseline():
    with TemporaryDirectory() as tmp_dir:
        data_dir = os.path.join(tmp_dir, "data")
        output_dir = os.path.join(tmp_dir, "output")
        os.makedirs(data_dir, exist_ok=True)
        os.makedirs(output_dir, exist_ok=True)

        # Assuming the input data is prepared in data_dir

        run_pipeline(SCRIPT_PATH, data_dir, output_dir)

        # Compare each output file against its baseline counterpart
        for baseline_file in os.listdir(BASELINE_DIR):
            baseline_path = os.path.join(BASELINE_DIR, baseline_file)
            current_path = os.path.join(output_dir, baseline_file)

            assert os.path.exists(current_path), f"Expected output file {baseline_file} not found in current run."

            # Compare files (could be CSV, TXT, etc.)
            assert compare_files(baseline_path, current_path), f"File {baseline_file} does not match baseline."

Testing frameworks#

Testing frameworks are essential in the software development process, enabling developers to ensure their code behaves as expected. These frameworks facilitate various types of testing, such as unit testing, integration testing, functional testing, regression testing, and performance testing. By automating the execution of tests, verifying outcomes, and reporting results, testing frameworks help improve code quality and software stability.

Key Features of Testing Frameworks#

  • Test Organization: Helps structure and manage tests effectively.

  • Fixture Management: Supports setup and teardown operations for tests.

  • Assertion Support: Provides tools for verifying test outcomes.

  • Automated Test Discovery: Automatically identifies and runs tests.

  • Mocking and Patching: Allows isolation of the system under test.

  • Parallel Test Execution: Reduces test suite execution time.

  • Extensibility: Offers customization through plugins and hooks.

  • Reporting: Generates detailed reports on test outcomes.

pytest#

pytest is a powerful testing framework for Python that is easy to start with but also supports complex functional testing. It is known for its simple syntax, detailed assertion introspection, automatic test discovery, and a wide range of plugins and integrations.

Running pytest#

There are various options to run pytest. Let’s start with the easiest one: running all tests written in a specific directory.
First, you need to ensure pytest is installed in your computational environment. If not, install it using:

%%bash
pip install pytest

Additionally, you have to make sure that all your tests are placed in a dedicated directory and that their filenames follow one of these patterns: test_*.py or *_test.py.

Following the structure we discussed in the RDM session, they should ideally be placed in the code directory. Thus, let’s create a corresponding tests directory there.

import os

os.makedirs('/Users/peerherholz/Desktop/choice_rtt/code/tests', exist_ok=True)

Next, we will create our test files and save them in the test directory.

%%writefile /Users/peerherholz/Desktop/choice_rtt/code/tests/test_download.py

import requests
from zipfile import ZipFile, BadZipFile
from io import BytesIO
import os

def test_download_and_extraction():
    """
    Test downloading a ZIP file from a URL and extracting it to a specified path.

    Args:
    - url (str): URL of the ZIP file to download.
    - extraction_path (str): The filesystem path where the ZIP file contents will be extracted.
    """

    url = 'https://gitlab.com/julia-pfarr/nowaschool/-/raw/main/school/materials/CI_CD/crtt.zip?ref_type=heads'
    extraction_path = '/Users/peerherholz/Desktop/'
    
    # 1. URL Accessibility
    response = requests.head(url)
    assert response.status_code == 200, "URL is not accessible or does not exist"
    #assert 'application/zip' in response.headers['Content-Type'], "URL does not point to a ZIP file"

    # 2. Successful Download
    response = requests.get(url)
    assert response.status_code == 200, "Failed to download the file"

    # 3. Correct File Type and Extraction
    try:
        with ZipFile(BytesIO(response.content)) as zipfile:
            zipfile.extractall(extraction_path)
        assert True  # If extraction succeeds
    except BadZipFile:
        assert False, "Downloaded file is not a valid ZIP archive"

    # 4. Check Extracted Files
    extracted_files = os.listdir(extraction_path)
    assert len(extracted_files) > 0, "No files were extracted"

    print(f"Test passed: Downloaded and extracted ZIP file to {extraction_path}")
Writing /Users/peerherholz/Desktop/choice_rtt/code/tests/test_download.py
%%writefile /Users/peerherholz/Desktop/choice_rtt/code/tests/test_conversion.py


import pandas as pd

def test_data_conversion():
    def process_data(df, columns_select):
        # Assuming df is the DataFrame before conversion
        data_loaded_sub_part = df[columns_select]
        # Insert more DF operations if needed
        return data_loaded_sub_part

    # Load the raw data (before conversion)
    raw_data_path = '/Users/peerherholz/Desktop/choice_rtt/sourcedata/sub-01/ses-post/01_post_crtt_exp_2024-02-02_09h43.24.388.csv'  # Update this path
    raw_data_df = pd.read_csv(raw_data_path, delimiter=',')

    # Columns to select and any other processing details
    columns_select = ['participant_id', 'age', 'left-handed', 'Do you like this session?', 'session', 'TargetImage', 'keyboard_response.corr', 'trialRespTimes']

    # Process the raw data
    processed_data_df = process_data(raw_data_df, columns_select)

    # Load the expected data (after conversion) for comparison
    expected_data_path = '/Users/peerherholz/Desktop/choice_rtt/sub-01/ses-post/beh/sub-01_ses-post_task-ChoiceRTT_beh.tsv'  # Update this path
    expected_data_df = pd.read_csv(expected_data_path, delimiter='\t')

    # Assertions
    assert list(processed_data_df.columns) == list(expected_data_df.columns), "Columns do not match"
    assert processed_data_df.shape == expected_data_df.shape, "DataFrame shapes do not match"
    
    # Compare the first row as dicts
    processed_first_row = processed_data_df.iloc[0].to_dict()
    expected_first_row = expected_data_df.iloc[0].to_dict()
    
    for key in processed_first_row:
        if isinstance(processed_first_row[key], float):
            assert abs(processed_first_row[key] - expected_first_row[key]) < 1e-5, f"Row values do not match for column {key}"
        else:
            assert processed_first_row[key] == expected_first_row[key], f"Row values do not match for column {key}"
Writing /Users/peerherholz/Desktop/choice_rtt/code/tests/test_conversion.py

Now, navigate to your test directory (provide the path to it) and run pytest via:

os.chdir('/Users/peerherholz/Desktop/choice_rtt/code/tests')
%%bash
pytest
============================= test session starts ==============================
platform darwin -- Python 3.7.0, pytest-7.4.4, pluggy-1.2.0
rootdir: /Users/peerherholz/Desktop/choice_rtt/code/tests
plugins: anyio-3.5.0
collected 2 items

test_conversion.py F                                                     [ 50%]
test_download.py .                                                       [100%]

=================================== FAILURES ===================================
_____________________________ test_data_conversion _____________________________

    def test_data_conversion():
        def process_data(df, columns_select):
            # Assuming df is the DataFrame before conversion
            data_loaded_sub_part = df[columns_select]
            # Insert more DF operations if needed
            return data_loaded_sub_part
    
        # Load the raw data (before conversion)
        raw_data_path = '/Users/peerherholz/Desktop/choice_rtt/sourcedata/sub-01/ses-post/01_post_crtt_exp_2024-02-02_09h43.24.388.csv'  # Update this path
        raw_data_df = pd.read_csv(raw_data_path, delimiter=',')
    
        # Columns to select and any other processing details
        columns_select = ['participant_id', 'age', 'left-handed', 'Do you like this session?', 'session', 'TargetImage', 'keyboard_response.corr', 'trialRespTimes']
    
        # Process the raw data
        processed_data_df = process_data(raw_data_df, columns_select)
    
        # Load the expected data (after conversion) for comparison
        expected_data_path = '/Users/peerherholz/Desktop/choice_rtt/sub-01/ses-post/beh/sub-01_ses-post_task-ChoiceRTT_beh.tsv'  # Update this path
        expected_data_df = pd.read_csv(expected_data_path, delimiter='\t')
    
        # Assertions
>       assert list(processed_data_df.columns) == list(expected_data_df.columns), "Columns do not match"
E       AssertionError: Columns do not match
E       assert ['participant...etImage', ...] == ['participant...im_file', ...]
E         At index 5 diff: 'TargetImage' != 'stim_file'
E         Right contains 2 more items, first extra item: 'trial_type'
E         Use -v to get more diff

test_conversion.py:27: AssertionError
=============================== warnings summary ===============================
../../../../anaconda3/envs/neuro_ai/lib/python3.7/site-packages/pandas/compat/numpy/__init__.py:10
  /Users/peerherholz/anaconda3/envs/neuro_ai/lib/python3.7/site-packages/pandas/compat/numpy/__init__.py:10: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
    _nlv = LooseVersion(_np_version)

../../../../anaconda3/envs/neuro_ai/lib/python3.7/site-packages/pandas/compat/numpy/__init__.py:11
  /Users/peerherholz/anaconda3/envs/neuro_ai/lib/python3.7/site-packages/pandas/compat/numpy/__init__.py:11: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
    _np_version_under1p16 = _nlv < LooseVersion("1.16")

../../../../anaconda3/envs/neuro_ai/lib/python3.7/site-packages/pandas/compat/numpy/__init__.py:12
  /Users/peerherholz/anaconda3/envs/neuro_ai/lib/python3.7/site-packages/pandas/compat/numpy/__init__.py:12: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
    _np_version_under1p17 = _nlv < LooseVersion("1.17")

../../../../anaconda3/envs/neuro_ai/lib/python3.7/site-packages/pandas/compat/numpy/__init__.py:13
  /Users/peerherholz/anaconda3/envs/neuro_ai/lib/python3.7/site-packages/pandas/compat/numpy/__init__.py:13: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
    _np_version_under1p18 = _nlv < LooseVersion("1.18")

../../../../anaconda3/envs/neuro_ai/lib/python3.7/site-packages/pandas/compat/numpy/__init__.py:14
  /Users/peerherholz/anaconda3/envs/neuro_ai/lib/python3.7/site-packages/pandas/compat/numpy/__init__.py:14: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
    _np_version_under1p19 = _nlv < LooseVersion("1.19")

../../../../anaconda3/envs/neuro_ai/lib/python3.7/site-packages/pandas/compat/numpy/__init__.py:15
  /Users/peerherholz/anaconda3/envs/neuro_ai/lib/python3.7/site-packages/pandas/compat/numpy/__init__.py:15: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
    _np_version_under1p20 = _nlv < LooseVersion("1.20")

../../../../anaconda3/envs/neuro_ai/lib/python3.7/site-packages/setuptools/_distutils/version.py:351
  /Users/peerherholz/anaconda3/envs/neuro_ai/lib/python3.7/site-packages/setuptools/_distutils/version.py:351: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
    other = LooseVersion(other)

../../../../anaconda3/envs/neuro_ai/lib/python3.7/site-packages/pandas/compat/numpy/function.py:125
../../../../anaconda3/envs/neuro_ai/lib/python3.7/site-packages/pandas/compat/numpy/function.py:125
  /Users/peerherholz/anaconda3/envs/neuro_ai/lib/python3.7/site-packages/pandas/compat/numpy/function.py:125: DeprecationWarning: distutils Version classes are deprecated. Use packaging.version instead.
    if LooseVersion(_np_version) >= LooseVersion("1.17.0"):

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
=========================== short test summary info ============================
FAILED test_conversion.py::test_data_conversion - AssertionError: Columns do not match
=================== 1 failed, 1 passed, 9 warnings in 2.36s ====================
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
Cell In[52], line 1
----> 1 get_ipython().run_cell_magic('bash', '', 'pytest\n')

File ~/anaconda3/envs/nowaschool/lib/python3.10/site-packages/IPython/core/interactiveshell.py:2517, in InteractiveShell.run_cell_magic(self, magic_name, line, cell)
   2515 with self.builtin_trap:
   2516     args = (magic_arg_s, cell)
-> 2517     result = fn(*args, **kwargs)
   2519 # The code below prevents the output from being displayed
   2520 # when using magics with decorator @output_can_be_silenced
   2521 # when the last Python token in the expression is a ';'.
   2522 if getattr(fn, magic.MAGIC_OUTPUT_CAN_BE_SILENCED, False):

File ~/anaconda3/envs/nowaschool/lib/python3.10/site-packages/IPython/core/magics/script.py:154, in ScriptMagics._make_script_magic.<locals>.named_script_magic(line, cell)
    152 else:
    153     line = script
--> 154 return self.shebang(line, cell)

File ~/anaconda3/envs/nowaschool/lib/python3.10/site-packages/IPython/core/magics/script.py:314, in ScriptMagics.shebang(self, line, cell)
    309 if args.raise_error and p.returncode != 0:
    310     # If we get here and p.returncode is still None, we must have
    311     # killed it but not yet seen its return code. We don't wait for it,
    312     # in case it's stuck in uninterruptible sleep. -9 = SIGKILL
    313     rc = p.returncode or -9
--> 314     raise CalledProcessError(rc, cell)

CalledProcessError: Command 'b'pytest\n'' returned non-zero exit status 1.

pytest will automatically discover tests within any files that match the pattern described above in the directory and its subdirectories.

You can also run specific tests, e.g. tests from a specific file or those matching a certain pattern.
You can specify the file like so:

%%bash
pytest /Users/peerherholz/Desktop/choice_rtt/code/tests/test_download.py
============================= test session starts ==============================
platform darwin -- Python 3.7.0, pytest-7.4.4, pluggy-1.2.0
rootdir: /Users/peerherholz/Desktop/choice_rtt/code/tests
plugins: anyio-3.5.0
collected 1 item

test_download.py .                                                       [100%]

============================== 1 passed in 0.90s ===============================

or run tests matching a name pattern like so:

pytest -k "pattern"

You can also run tests marked with a custom marker: if you’ve used custom markers to decorate your tests (e.g., @pytest.mark.regression), you can run only the tests with that marker:

pytest -m markername

Writing Basic Tests with pytest#

Tests in pytest are simple to write: you start with plain test functions and, as your test suite grows, pytest provides a rich set of features for more complex scenarios.

def test_example():
    assert 1 + 1 == 2

Using Fixtures for Setup and Teardown#

pytest fixtures define setup and teardown logic for tests, ensuring tests run under controlled conditions.

import pytest

@pytest.fixture
def sample_data():
    return [1, 2, 3, 4, 5]

def test_sum(sample_data):
    assert sum(sample_data) == 15
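
Fixtures can also handle teardown: everything before the yield runs before the test, and everything after it runs afterwards, even if the test fails (a minimal sketch):

import os
import tempfile

import pytest

@pytest.fixture
def temp_results_file():
    handle, path = tempfile.mkstemp(suffix=".csv")
    os.close(handle)
    yield path           # the test receives this path
    os.remove(path)      # teardown: clean up the temporary file

def test_results_file_is_writable(temp_results_file):
    with open(temp_results_file, "w") as f:
        f.write("participant_id\n")
    assert os.path.getsize(temp_results_file) > 0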

Parameterizing Tests#

pytest allows running a single test function with different inputs using @pytest.mark.parametrize.

import pytest

@pytest.mark.parametrize("a,b,expected", [(1, 1, 2), (2, 3, 5), (3, 3, 6)])
def test_addition(a, b, expected):
    assert a + b == expected

Conclusion:#

Embracing testing and testing frameworks like pytest and incorporating a comprehensive testing strategy are essential steps towards achieving high-quality software development. These frameworks not only automate the testing process but also provide a structured approach to addressing a wide spectrum of testing requirements. By leveraging their capabilities, researchers and software developers can ensure thorough test coverage, streamline debugging, and maintain high standards of software quality and performance.

Task for y’all!

Remember our script from the beginning? You already went through it a couple of times and brought it up to code (get it?). Now, we would like to add some tests for our script to ensure its functionality.

  1. Add tests that check if the dataset was downloaded and unzipped properly, as well as if the DataFrames have the correct shape. (Make sure to look at the Intro to data handling section again.)

  2. Add tests that check if the DataFrame has the right amount and types of columns after the conversion and if the first few columns contain the expected values.

You have 40 min.