Cardinality

In our ongoing exploration of using the Semantic Engine to describe your data, there’s one concept we haven’t yet discussed—but it’s an important one: cardinality.

Cardinality refers to the number of values that a data field (specifically an array) can contain. It’s a way of describing how many items you’re expecting to appear in a given field, and it plays a crucial role in data descriptions, verification, and interpretation.

What Is an Array?

Before we talk about cardinality, we need to understand arrays. In data terms, an array is a field that can hold multiple values, rather than just one.

For example, imagine a dataset where you’re recording the languages a person speaks. Some people might speak only one language, while others might speak three or more. Instead of creating separate fields for “language1”, “language2”, and so on, you might store them all in one field as an array.

In an Excel spreadsheet, this might look like:

An example of an array attribute with a list of languages.
An example of an array attribute with a list of languages.

Here, the “Languages” column contains comma-separated lists—an informal representation of an array. Each cell in that column holds more one or more values.

What Is Cardinality?

Once you know you’re dealing with arrays, cardinality describes how many values are expected or allowed.

You can define:

  • Minimum cardinality – the fewest number of values allowed

  • Maximum cardinality – the most number of values allowed

Let’s return to the “Languages” example. If every person must list at least one language, you would set the minimum cardinality to 1. If your system supports a maximum of three languages per person, you would set the maximum cardinality to 3. You can also specify a minimum and maximum (for example a minimum of 1 and a maximum of 5).

Why Does Cardinality Matter?

Cardinality helps verify data ensuring that each entry meets the expected structure, and supports machine-readable data because the Semantic Engine supports cardinality descriptions of research data.

Cardinality is a simple but essential concept when working with arrays of data. Whether you’re developing survey responses, cataloguing plant attributes, or managing research metadata, specifying cardinality ensures that your data behaves as expected.

In short: if your data field can hold more than one value, cardinality lets you define how many it should hold.

Written by Carly Huitema