
Unit framing to standardize units
In research and data-intensive environments, precision and clarity are critical. Yet one of the most common sources of confusion—often overlooked—is how units of measure are written and interpreted.
Take the unit micromolar, for example. Depending on the source, it might be written as uM, μM, umol/L, μmol/l, or umol-1. Each of these notations attempts to convey the same concentration unit. But when machines, or even humans, process large amounts of data across systems, this inconsistency introduces ambiguity and errors.
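As a rough illustration of how such variants can be collapsed into a single code, here is a minimal Python sketch. The lookup table and the function name normalize_unit are hypothetical and made up for this example; the variant spellings are the ones listed above. One detail worth knowing: the micro sign (U+00B5) and the Greek letter mu (U+03BC) look identical but are different characters, so the sketch folds them together with Unicode NFKC normalization.

```python
import unicodedata

# Hypothetical target code for this example (the UCUM form of micromolar).
CANONICAL_MICROMOLAR = "umol/L"

# The notations mentioned in the text; "\u03bc" is the Greek letter mu.
_VARIANTS = ["uM", "\u03bcM", "umol/L", "\u03bcmol/l", "umol-1"]


def _key(raw):
    """Trim whitespace and apply NFKC, which maps the micro sign
    (U+00B5) onto the Greek mu (U+03BC)."""
    return unicodedata.normalize("NFKC", raw.strip())


# Build the lookup once: every known spelling points to the same code.
_LOOKUP = {_key(variant): CANONICAL_MICROMOLAR for variant in _VARIANTS}


def normalize_unit(raw):
    """Return the canonical code for a known spelling, or None if the
    spelling is not in the (hypothetical) lookup table."""
    return _LOOKUP.get(_key(raw))
```

The matching is deliberately case-sensitive, because case carries meaning in unit symbols (m is milli, M is mega). Unknown spellings return None rather than a guess, so they can be flagged for a person to review.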
The role of standards
To ensure clarity, consistency, and interoperability, standardized units are essential. This is especially true in environments where data is:
- Shared across labs or institutions
- Processed by machines or algorithms
- Reused or aggregated for meta-analysis
- Integrated into digital infrastructures like knowledge graphs or semantic databases
Standardization ensures that “1 μM” in one dataset is understood exactly the same way in another, which helps make data FAIR (Findable, Accessible, Interoperable, and Reusable).
UCUM: Unified Code for Units of Measure
One widely adopted system for encoding units is UCUM—the Unified Code for Units of Measure. Developed by the Regenstrief Institute, UCUM is designed to be unambiguous, machine-readable, compact, and internationally applicable.
In UCUM:
- micromolar becomes umol/L
- acre becomes [acr_us]
- milligrams per deciliter becomes mg/dL
This kind of clarity is vital when integrating data or automating analyses.
UCUM doesn’t include all units
While UCUM covers a broad range of units, it’s not exhaustive. Many disciplines use niche or domain-specific units that UCUM doesn’t yet describe. This can be a problem when strict adherence to UCUM would mean leaving out critical information or forcing awkward approximations. Furthermore, UCUM doesn’t offer an exhaustive list of all possible units; instead, the UCUM specification describes rules for creating units. For the Semantic Engine, we have adopted and extended existing lists of units to create a list of common agri-food units.
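To make the idea of "rules for creating units" more concrete, here is a small Python sketch of how a code can be built from a prefix plus a unit atom, with terms joined by "/" or ".". It covers only a tiny, hand-picked subset (four prefixes, four metric atoms, one bracketed atom) and ignores exponents, parentheses, and multi-character prefixes, so it is not a UCUM validator. The function names are hypothetical.

```python
# Tiny illustrative subset; the real UCUM tables are far larger.
PREFIXES = {"u", "m", "d", "k"}        # micro, milli, deci, kilo
METRIC_ATOMS = {"mol", "L", "g", "m"}  # mole, litre, gram, metre
NON_METRIC_ATOMS = {"[acr_us]"}        # non-metric atoms take no prefix


def is_valid_term(term):
    """A term is a bare atom, or one single-letter prefix plus a metric atom."""
    if term in METRIC_ATOMS or term in NON_METRIC_ATOMS:
        return True
    prefix, atom = term[:1], term[1:]
    return prefix in PREFIXES and atom in METRIC_ATOMS


def is_valid_code(code):
    """Split on '/' (division) and '.' (multiplication), then check
    each term against the subset above."""
    terms = code.replace("/", ".").split(".")
    return all(is_valid_term(term) for term in terms)
```

With this subset, umol/L, mg/dL, and [acr_us] are accepted, while uM is rejected because M is not one of the listed atoms, and k[acr_us] is rejected because a non-metric atom cannot take a prefix.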
Unit framing overlays of the Semantic Engine
To bridge the gap between familiar, domain-specific unit expressions and standardized UCUM representations, the Semantic Engine supports what’s known as a unit framing overlay.
Here’s how it works:
- Researchers can input units in a familiar format (e.g., acre or uM).
- Researchers can add a unit framing overlay that helps them map their units to UCUM codes (e.g., "[acr_us]" or "umol/L").
- The result is data that is human-friendly, machine-readable, and standards-compliant, all at the same time.
This approach offers both flexibility for researchers and consistency for machines.
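The mapping idea can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the attribute names, the dictionary layout, and the helper functions are hypothetical and do not reflect the Semantic Engine's actual overlay format. The point is only that the researcher's own unit text is kept, and a UCUM code is attached alongside it.

```python
# Hypothetical attributes and the units as a researcher might write them.
entered_units = {"field_area": "acre", "nitrate_conc": "uM"}

# Hypothetical framing: attribute name -> UCUM code.
# This is NOT the Semantic Engine's real overlay schema.
unit_framing = {"field_area": "[acr_us]", "nitrate_conc": "umol/L"}


def frame_units(entered, framing):
    """Pair each attribute's familiar unit with its UCUM code.
    Attributes without a mapping get None for the UCUM code."""
    framed = {}
    for attribute, display_unit in entered.items():
        framed[attribute] = {
            "display": display_unit,          # human-friendly form
            "ucum": framing.get(attribute),   # machine-readable form
        }
    return framed


def unmapped(framed):
    """List attributes that still lack a UCUM code."""
    return sorted(name for name, unit in framed.items() if unit["ucum"] is None)


framed = frame_units(entered_units, unit_framing)
```

Keeping both forms side by side means nothing the researcher wrote is overwritten, and attributes that still lack a UCUM code are easy to list for follow-up.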
Final thoughts
Standardized units aren’t just a technical detail—they’re a cornerstone of data reliability, semantic precision, and interoperability. Adopting standards like UCUM helps ensure that your data can be trusted, reused, and integrated with confidence.
By adopting unit framing overlays with UCUM, ADC enables data documentation that meets both the practical needs of researchers and the technical requirements of modern data infrastructure.
Written by Carly Huitema