Data Usage

Imagine this scenario. During her first field season as a principal investigator, a professor watched a graduate student realize—two weeks too late—that no one had recorded soil temperature at the sampling sites. The team had pH, moisture, GPS coordinates… but not the one variable that explained the anomaly in their results. A return trip wasn’t possible. The data gap was permanent.

After that, she changed how her lab collected data.

Instead of relying on ad hoc spreadsheets, she worked with her students to design schemas for their lab’s routine data collection. These weren’t schemas for final data deposit—they were practical structures for the messy, active phase of research. The goal was simple: define in advance what gets collected, how it’s recorded, and which values are allowed.

Researchers can use the Semantic Engine to create the schemas they need at every stage of their research program, from active data collection to final data deposition.

For data collection, once a schema is established, it can be uploaded into the Semantic Engine to generate a Data Entry Excel (DEE) file.

Each DEE contains:

  • A schema description sheet – documentation pulled directly from the schema, including variable definitions and code lists.

  • A data entry sheet – pre-labeled columns that follow the schema rules.

The schema description sheet of a Data Entry Excel.
Data Entry Excel showing the sheet for data entry.

Because the documentation lives in the same file as the data, nothing has to be retyped, reinvented, or remembered from scratch. The schema description sheet also includes code lists that populate the drop-down menus in the data entry sheet, reducing inconsistent terminology and formatting errors.

If the standard schema isn’t sufficient, it can be edited in the Semantic Engine. Researchers can add attributes or adjust fields without rebuilding everything from scratch. The updated schema can then generate a new DEE, preserving previous structure while incorporating the changes.

This approach addresses a common problem: unstructured Excel data. Without standardization, spreadsheets accumulate inconsistent date formats, unit mismatches, ambiguous abbreviations, and missing values. Cleaning that data later is costly and error-prone.

By organizing data entry around a schema:

  • Required information is visible and less likely to be forgotten.

  • Fieldwork becomes more reliable – critical variables are collected the first time.

  • Data from multiple researchers or projects can be harmonized more easily.

  • Manual cleaning and interpretation are reduced.

The generated DEE does not enforce full validation inside Excel (beyond drop-down lists). For formal validation, the completed spreadsheet can be uploaded to the Semantic Engine’s Data Verification tool.
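To illustrate what this kind of schema-based verification involves, here is a minimal plain-Python sketch. The column names and code lists are hypothetical examples, not the Semantic Engine’s actual format or implementation:

```python
# Minimal sketch of schema-based verification: check each data row
# against the code lists defined in a schema.
# Column names and allowed values are hypothetical examples.
schema_code_lists = {
    "Fertilizer": {"Control", "High-N"},
    "Cultivar": {"A", "B"},
}

def verify_rows(rows):
    """Return a list of (row_index, column, bad_value) problems."""
    problems = []
    for i, row in enumerate(rows):
        for column, allowed in schema_code_lists.items():
            value = row.get(column)
            if value not in allowed:
                problems.append((i, column, value))
    return problems

rows = [
    {"Fertilizer": "Control", "Cultivar": "A"},
    {"Fertilizer": "high-n", "Cultivar": "A"},  # wrong case: flagged
]
print(verify_rows(rows))  # -> [(1, 'Fertilizer', 'high-n')]
```

Even this simple check catches the inconsistent-terminology problems (wrong case, typos, out-of-list values) that drop-down menus reduce but cannot fully prevent.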

Using schema-driven Data Entry Excel files turns data structure into a practical research tool. Instead of discovering gaps during analysis, researchers define expectations at the point of collection—when it matters most.

Written by Carly Huitema

Generalist and Specialist Data Repositories

Research data repositories can be described along two important dimensions:

  1. how broad or specialized their scope is, and

  2. where they sit in the research data lifecycle.

Understanding these distinctions helps researchers navigate the technologies and repositories available for research data.

Generalist Repositories

Generalist repositories are designed to accept many different kinds of data across disciplines. They prioritize inclusivity and flexibility, offering a common technical platform where researchers can deposit datasets that do not fit neatly into a single domain.

A useful metaphor is the junk drawer in a kitchen. A junk drawer contains many useful items—batteries, spare cables, elastic bands—but finding a specific item often requires some effort. Similarly, generalist repositories can hold valuable datasets, but those datasets may be described with relatively generic metadata and limited domain-specific structure.

As a result, data in generalist repositories can be:

  • Harder to discover through precise searches

  • More difficult to interpret without additional context

  • Less immediately reusable by domain experts

Examples of generalist repositories include Dataverse (Borealis in Canada), Figshare, and OSF.

Specialist Repositories

Specialist repositories focus on a specific discipline, data type, or research community. They typically enforce domain-specific metadata standards, controlled vocabularies, and structured submission requirements.

Continuing the kitchen metaphor, specialist repositories resemble a cutlery drawer: clearly organized, purpose-built, and easy to use—provided you are looking for the right type of item. Knives go in one place, forks in another, and everything has a defined role.

Because of this structure, specialist repositories tend to make data:

  • More findable through precise, domain-aware search

  • Easier to interpret due to consistent metadata

  • More interoperable with related tools and systems

  • More reusable for future research

In other words, data in specialist repositories are often more FAIR than data in generalist repositories. However, this specialization also limits what they can accept. Many interdisciplinary datasets—particularly in agri-food research—do not align cleanly with the strict models of existing specialist repositories and therefore end up in generalist ones. Examples of specialist repositories include GenBank, PDB, and GEO.

The Research Data Lifecycle: Active and Archival Data

Another important way to think about data repositories is in relation to the research data lifecycle.

Research data typically move through several phases:

  1. Planning and collection

  2. Processing, active analysis and refinement

  3. Publication and dissemination

  4. Long-term preservation and reuse

Repositories are often designed to support either active data or archival data, but not both equally well.

Active Data

Active data are produced and used during the course of research. They may be incomplete, frequently updated, or subject to access restrictions due to confidentiality, sensitivity, or competitive concerns.

This is the phase where data are still being cleaned, analyzed, and interpreted. Changes are expected, and collaboration is often ongoing. Most formal repositories are not designed to support this stage, which is typically handled through local storage, shared drives, or project-specific platforms.

Archival Data

Once research is complete and results have been published, data generally move into an archival phase. At this point, datasets are more stable, less likely to change, and often less sensitive—especially if they have been anonymized or if concerns about being “scooped” no longer apply.

Most well-known repositories, including Dataverse, Figshare, and domain-specific archives such as the Protein Data Bank (PDB), are designed primarily for archival data. Their strengths lie in long-term preservation, persistent identifiers (PIDs like DOIs), citation, and access, rather than supporting ongoing analysis or frequent updates.

Bridging the Gaps

It would be inefficient to build a highly specialized repository for every possible type of dataset—much like building a kitchen with a separate drawer for every object that might otherwise end up in the junk drawer. Instead, a more scalable approach is to improve the organization and description of data held in generalist repositories.

Agri-food Data Canada’s approach focuses on developing tools, guidance, and training that help researchers add structure and context to their data wherever they are deposited. By enhancing metadata quality and enabling interoperability between repositories, it becomes possible to make data in generalist repositories more FAIR—without requiring a proliferation of narrowly specialized infrastructure.

Together, specialist and generalist repositories, along with active and archival data systems, form complementary parts of the research data ecosystem. Recognizing their respective roles helps researchers choose appropriate platforms and supports more effective data reuse over time.

Written by Carly Huitema

How Pivot Tables Simplify Analysis of Field Measurements

If you’re working with agricultural or experimental data, Excel’s pivot tables can make summarizing results quick and intuitive. Instead of manually calculating averages or totals for each treatment, you can let Excel do the heavy lifting—organizing your measurements by treatment, cultivar, and replicate automatically.


The Scenario

Suppose you ran a field experiment measuring plant height under different fertilizer treatments and cultivars.
Your spreadsheet might look like this:

Replicate   Fertilizer   Cultivar   Height (cm)
1           Control      A          42
2           Control      A          45
3           Control      A          43
1           High-N       A          56
2           High-N       A          58
3           High-N       A          57
1           Control      B          39
2           Control      B          41
3           Control      B          40
1           High-N       B          50
2           High-N       B          52
3           High-N       B          51


That’s a lot of data points—especially if you have more cultivars or treatments. A pivot table can summarize this instantly.


Step 1: Create the Pivot Table

  1. Select your data range.
  2. Go to Insert → PivotTable.
  3. In the PivotTable Field List:
    • Drag Fertilizer to Rows.
    • Drag Cultivar to Columns.
    • Drag Height (cm) to Values.

By default, Excel might show the Sum of Height (cm)—but you can change this.


Step 2: Start with a Count Check

Before jumping into averages, it’s good practice to first check that your dataset is complete.
In the Values field, choose Value Field Settings → Count.

This shows how many measurements were recorded for each fertilizer–cultivar combination.
For example:

Pivot table in Excel showing the count for a dataset.

If you notice missing or extra counts, you’ll know your data entry needs review before proceeding. This quick check often catches typos or missing replicates that might otherwise distort your summary statistics.


Step 3: Calculate Averages and Standard Deviations

Once the counts look correct:

  • Change the Values field to show Average of Height (cm) for mean comparison.
  • To assess variability, you can add Height (cm) to Values again and set it to StdDev.
  • Remove totals and grand totals, which don’t make sense for this analysis.

Your pivot table might now look like this:

Pivot table with settings to calculate average and standard deviation.

To save the results outside of the pivot table, copy the pivot table cells and paste them as values.
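The same count, average, and standard-deviation summary can also be reproduced outside Excel. Here is a plain-Python sketch, using only the standard library, applied to the example rows from the table above:

```python
from collections import defaultdict
from statistics import mean, stdev

# Example rows from the field-trial table above:
# (replicate, fertilizer, cultivar, height_cm)
rows = [
    (1, "Control", "A", 42), (2, "Control", "A", 45), (3, "Control", "A", 43),
    (1, "High-N", "A", 56), (2, "High-N", "A", 58), (3, "High-N", "A", 57),
    (1, "Control", "B", 39), (2, "Control", "B", 41), (3, "Control", "B", 40),
    (1, "High-N", "B", 50), (2, "High-N", "B", 52), (3, "High-N", "B", 51),
]

# Group heights by (fertilizer, cultivar) — the equivalent of dragging
# those fields to Rows and Columns in a pivot table.
groups = defaultdict(list)
for _, fertilizer, cultivar, height in rows:
    groups[(fertilizer, cultivar)].append(height)

# Count check first, then average and standard deviation.
for key in sorted(groups):
    heights = groups[key]
    print(key, len(heights), round(mean(heights), 1), round(stdev(heights), 1))
```

As in the pivot-table workflow, the count (`len`) is worth checking before the averages: a group with two or four replicates instead of three signals a data-entry problem.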


Why It’s Powerful

  • Data validation: Count first to ensure consistent replication.
  • Efficient summarization: Quickly compute averages and standard deviations.
  • Flexible exploration: Swap Cultivar and Fertilizer fields, or add Replicate to drill into variability.
  • Instant updates: Refresh when new data is added—no formula updates required.



Takeaway

Pivot tables turn raw experimental data into structured insights. In field trials, this means you can check data completeness, summarize treatment effects, and explore cultivar differences—all from the same dataset, all within Excel.

Written by Carly Huitema

Alrighty, let’s briefly introduce this topic.  AI, or LLMs, are the latest shiny object in the world of research and everyone wants to use them to create really cool things!  I, myself, am just starting to drink the Kool-Aid by using CoPilot to clean up some of my writing – not these blog posts, obviously!!

Now, all these really cool AI tools or agents use data.  You’ve all heard the saying “Garbage In… Garbage Out”?  So, think about that for a moment.  If our students and researchers collect data and create little to no documentation with their data – and then that data becomes available to an AI agent – how comfortable are you with the results?  What are they based on?  Data without documentation???

Let’s flip the conversation the other way now.  Imagine using AI agents for data creation or data analysis without understanding how the AI works, what data it draws on, or how the models work – throwing all those questions to the wind and using the AI agent’s results just the same.  How do you think that will affect our research world?

I’m not going to dwell on these questions – but want to get them out there and have folks think about them.   Agri-food Data Canada (ADC) has created data documentation tools that can easily fit into the AI world – let’s encourage everyone to document their data and build better data resources that can then be used in developing AI agents.

Michelle


image created by AI

When you’re building a data schema, you’re making decisions not only about what data to collect, but also about how it should be structured. One of the most useful tools you have is format restrictions.

What Are Format Entries?

A format entry in a schema defines a specific pattern or structure that a piece of data must follow. For example:

  • A date must look like YYYY-MM-DD, following the ISO 8601 standard.
  • An email address must have the format name@example.com
  • A DNA sequence might only include the letters A, T, G, and C

These formats are usually enforced using rules like regular expressions (regex) or standardized format types.
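For example, rules like the three above can be expressed as regular expressions. The patterns below are illustrative simplifications written for this post, not the Semantic Engine’s own rules (real-world email and date validation is usually stricter):

```python
import re

# Simplified, illustrative format rules.
formats = {
    "date": re.compile(r"^\d{4}-\d{2}-\d{2}$"),       # YYYY-MM-DD
    "email": re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$"),
    "dna": re.compile(r"^[ATGC]+$"),                   # only A, T, G, C
}

def matches(kind, value):
    """Return True if value satisfies the named format rule."""
    return bool(formats[kind].match(value))

print(matches("date", "2025-03-15"))  # True
print(matches("date", "15/03/25"))    # False
print(matches("dna", "ATTGC"))        # True
print(matches("dna", "ATXGC"))        # False
```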

Why Would You Want to Restrict Format?

Restricting the format of data entries is about ensuring data quality, consistency, and usability. Here’s why it’s important:

To Avoid Errors Early

If someone enters a date as “15/03/25” instead of “2025-03-15”, you might not know whether it means March 15 or March 25, or even which year. A clear format prevents confusion and catches errors before they become a problem.

To Make Data Machine-Readable

Computers need consistency. A standardized format means data can be processed, compared, or validated automatically. For example, if every date follows the YYYY-MM-DD format, it’s easy to sort them chronologically or filter them by year. This is especially helpful for sorting files in folders on your computer.
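The sorting benefit follows directly from the format itself: because the largest unit (the year) comes first, ISO-formatted date strings sort chronologically even when treated as plain text. A quick illustration:

```python
# ISO YYYY-MM-DD strings sort chronologically as plain text,
# because year precedes month precedes day.
dates = ["2025-03-15", "2024-12-01", "2025-01-07"]
print(sorted(dates))  # ['2024-12-01', '2025-01-07', '2025-03-15']
```

The same property is what makes ISO-dated filenames line up in order in a file browser.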

To Improve Interoperability

When data is shared across systems or platforms, shared formats ensure everyone understands it the same way. This is especially important in collaborative research.

Format in the Semantic Engine

Using the Semantic Engine you can add a format feature to your schema and describe what format you want the data to be entered in. While the schema writes the format rule in RegEx, you don’t need to learn how to do this. Instead, the Semantic Engine uses a set of prepared RegEx rules that users can select from. These are documented in the format GitHub repository where new format rules can be proposed by the community.

After you have added format rules to your schema, you can use the Semantic Engine’s Data Entry Web tool to verify your data against those rules.

Final Thoughts

Format restrictions may seem technical, but they’re essential to building reliable, reusable, and clean data. When you use them thoughtfully, they help everyone—from data collectors to analysts—work more confidently and efficiently.

Written by Carly Huitema

…and we’re back to the data ownership quandary…

Just when I think I may have heard all the different types of questions and situations that can arise in the context of data ownership – I hear a new one.  When I first heard the situation I’m going to share with you in a moment – I thought nah… this must be a one-off.  But then I heard it again from a different individual in a different situation – so it MUST be a “thing”!  When I’m honest with myself, look back, and contemplate my own situations – I’m left wondering too!!!

So let’s work through a research situation.  You have been hired onto a project as a graduate student – working towards your MSc.  You’re SO excited and happy about this wonderful opportunity you have.  You work with your supervisor and lab group to create the most appropriate experimental design to answer your research question, and begin your data collection.   You heard about the Semantic Engine and created your data schema to match your data collection.  Two years down the road and you’re ready to move on – your thesis is complete and you’ve graduated.  What about your data?  What do you do with it?

The BIG question here – WHO owns this data?  The supervisor, who is the PI on the research project you’ve been hired onto?  OR you, as the data collector and analyser?  Hmmm…  When you think about these questions, the next question becomes WHO is responsible for the data and what happens to it?  I would love to hear what readers think about this.  Email me at edwardsm@uoguelph.ca if you have an opinion.

OK what are my thoughts? I’ll let you know on my next blog post 🙂

Michelle


image created by CoPilot