Metadata#
Objectives📍
General metadata (bibliographic/administrative metadata)
Content- and field-specific metadata (BIDS)
Metadata = data about data#
We all know and use metadata every day. Watch this short video to see which kinds of metadata you already know:
To get a sense of what metadata for a research project should include, ask yourself: “What would someone unfamiliar with your data (and possibly your research) need in order to find, evaluate, understand, and reuse them?”
Task
Let’s imagine someone gives you a value from their data which only says “Temperature 31.5”. What questions would you ask this researcher to find out what “Temperature 31.5” means, and to decide whether you can reuse it for your own research?
Possible questions
Temperature 31.5…
of what?
location?
in what unit?
is this value averaged?
collected how?
collected when?
precision/accuracy?
according to whom?
has anyone checked the quality of this value?
Note: Most of the following content was copied from The Turing Way Handbook under a CC-BY 4.0 licence.
General metadata for a research project#
Having data available is of no use if it cannot be understood. Without metadata to provide provenance and context, the data can’t be used effectively. For example, a table of numbers is useless if no headings describe what the columns/rows contain. Therefore you should ensure that open datasets include consistent metadata, that is, information about the data, so that the data is fully described. This requires that the information accompanying data is captured in documentation and metadata.
Documentation provides context for your work. It allows your collaborators, colleagues and future you to understand what has been done and why.
Data documentation can be done on different levels. All documentation accompanying data should be written in clear, plain language. Documentation gives data users sufficient information to understand the source, strengths, weaknesses, and analytical limitations of the data so that they can make informed decisions when using it.
Metadata is information about the data, descriptors that facilitate cataloguing data and data discovery. Often, metadata are intended for machine reading while documentation is mostly written for human reading.
When data is submitted to a trusted data repository, the machine-readable metadata is generated by the repository. If the data is not in a repository a text file with machine-readable metadata can be added as part of the documentation.
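Outside a repository, such a machine-readable metadata file can be as simple as a small JSON document alongside the data. Below is a minimal sketch in Python; the field names are illustrative (loosely modelled on common bibliographic fields), not a formal standard, so replace them with whatever schema your repository or discipline requires.

```python
import json

# Illustrative metadata for the "Temperature 31.5" example above.
# All field names and values here are assumptions for demonstration.
metadata = {
    "title": "Air temperature measurements, site A",
    "creator": "Jane Doe",
    "date_collected": "2024-06-01",
    "variable": "air temperature",
    "units": "degrees Celsius",
    "licence": "CC-BY-4.0",
}

# Write the metadata as a machine-readable text file next to the data.
with open("metadata.json", "w", encoding="utf-8") as f:
    json.dump(metadata, f, indent=2)
```

Because the file is plain JSON, both humans and machines can read it, and a repository or search engine can index the fields directly.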
The type of research and the nature of the data also influence what kind of documentation is necessary.
The level of documentation and metadata will vary according to the project, and the range of people the data needs to be understood by.
Examples of documentation may include items like data dictionaries (see here for a template) or codebooks, protocols, logbooks or lab journals, README files, research logs, analysis syntax, algorithms and code comments.
Variables should be defined and explained using data dictionaries or codebooks.
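A data dictionary can itself be a small, machine-readable table. The sketch below writes one as a CSV file in Python; the variable names, descriptions, and column layout are made up for illustration — in practice you would list every variable in your own dataset.

```python
import csv

# Hypothetical data dictionary describing three variables in a dataset.
# Variable names and descriptions are assumptions for demonstration.
rows = [
    {"variable": "temp_c", "description": "Air temperature",
     "units": "degrees Celsius", "type": "float"},
    {"variable": "site_id", "description": "Measurement site identifier",
     "units": "", "type": "string"},
    {"variable": "measured_at", "description": "Timestamp of measurement (UTC)",
     "units": "", "type": "ISO 8601 datetime"},
]

# Write the dictionary as a CSV so it stays readable by humans and machines.
with open("data_dictionary.csv", "w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=["variable", "description", "units", "type"])
    writer.writeheader()
    writer.writerows(rows)
```

Note how the `units` column answers exactly the kind of question raised by the “Temperature 31.5” example earlier.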
Data should be stored in logical and hierarchical folder structures, with a README file used to describe the structure. The README file is helpful for others and will also help you find your data in the future. See the README template from Cornell for an example.
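One way to set up such a hierarchy is to script it, so the structure is reproducible. The folder names below are assumptions chosen for illustration; adapt them to your own project (and see the Cornell template for what a fuller README should contain).

```python
import os

# Hypothetical project layout — the folder names are illustrative.
folders = [
    "myproject/data/raw",
    "myproject/data/processed",
    "myproject/code",
    "myproject/docs",
]
for folder in folders:
    os.makedirs(folder, exist_ok=True)

# A short README describing the structure, so collaborators (and future
# you) can navigate the project.
readme = """# myproject

- data/raw        : unmodified data as collected
- data/processed  : cleaned and derived data
- code            : analysis scripts
- docs            : protocols and supporting documentation
"""
with open("myproject/README.md", "w", encoding="utf-8") as f:
    f.write(readme)
```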
It is best practice to use recognized community metadata standards to make it easier for datasets to be combined.
Tagging#
Tags are keywords assigned to files, a way to add metadata that lets you organise files more flexibly. While a file can only be in one folder at a time, it can have an unlimited number of tags.
Some tips include:
Use short tag names (one or two words)
Be consistent with tags
Not all file formats allow tags, and when files are transferred tags may be stripped
See Tagging and Finding Your Files by MIT libraries for more information.
Task
Go to your OSF project and enter the general metadata. Give your project some tags.
Task
Open the README.md file in your local folder and write documentation for our project with the following information:
Author, co-authors, collaborators
Experiment description
Method description for collecting or generating the data, as well as the methods for processing data, if data other than raw data are being contributed (you can, e.g., list the file names and state what each file contains)
Contact information
You can visit the Pavlovia GitLab again if you need some information on the task or output.
This README is not complete yet, but we simply don’t have more information at this point. It is important to update the README according to the README template from Cornell as soon as you have all this information.
Community Standards - Metadata#
The use of community-defined standards for metadata is vital for reproducible research and allows for the comparison of heterogeneous data from multiple sources, domains and disciplines. Metadata standards are often discipline-specific, although not every discipline uses them. You can check whether your discipline uses metadata standards through FAIRsharing, a resource to identify and cite the metadata or identifier schemas, databases or repositories that exist for your data and discipline. There are also situations where researchers use more general metadata standards, for example when they store their data in a generic archive and have to adhere to the archive’s metadata standards.
In this case, a text file with discipline specific metadata can be added as part of the documentation.
Want to learn more about Metadata and Metadata Standards? Watch an introduction video.
BIDS Metadata (field specific metadata)#
In a previous section, Project- and Data Organization, we learned about BIDS. Thankfully, BIDS is a community-defined standard and provides us with discipline-specific metadata. In line with its philosophy of minimizing complexity and maximizing adoption and flexibility, BIDS doesn’t make you annotate your data with pages of metadata. However, it gives you a nomenclature for how metadata is named and the units in which it is given. It’s easiest to see for yourself in an example:
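In BIDS, metadata for a tabular data file typically lives in a JSON “sidecar” next to it. The sketch below writes such a sidecar for a hypothetical behavioural task; the task name, column names, and descriptions are assumptions for illustration — the exact required fields depend on your data type, so check the BIDS specification for your case.

```python
import json

# Sketch of a BIDS-style JSON sidecar describing the columns of a
# behavioural .tsv file. All names and descriptions are illustrative.
sidecar = {
    "TaskName": "simple reaction time task",  # assumed task name
    "trial": {"Description": "Trial number within the run"},
    "response_time": {
        "Description": "Time from stimulus onset to key press",
        "Units": "ms",
    },
    "accuracy": {
        "Description": "Whether the response was correct",
        "Levels": {"0": "incorrect", "1": "correct"},
    },
}

# File name follows the BIDS sub-<label>_task-<label> pattern.
with open("sub-01_task-rt_beh.json", "w", encoding="utf-8") as f:
    json.dump(sidecar, f, indent=2)
```

Notice how the sidecar answers the “in what unit?” and “collected how?” questions from the temperature example in a machine-readable way.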
Task
Go to the BIDS handbook and look through the metadata specification for the behavioral experiment we will work with throughout the week. Which metadata do you think we will collect or need? We will discuss in 10 minutes.