Documentation

That is the question to ask when it comes to historical research data.

So – yes, I found some of the original research data collected by OAC researchers back in 1877 – BUT… do we put the time and resources into pulling it out of the PDFs and making it accessible to the world? I know I’ve asked this question in a different way in previous posts – but it’s a question that keeps coming up.

There are definitely aspects of this question that need to be objectively reviewed:

  • WHY undertake this venture? For curiosity, or because it is valuable to you as a researcher?
  • HOW far back do we go? 10 yrs? 50 yrs? 150 yrs?
  • WHERE will you keep this data?  Hmm…  that’s a big question today!
  • WHO will steward this data?

As a data geek – I want to pull this out and steward it – but I have to be practical about it as well. How much have research technologies changed over this time period? How valid is this data today? Let’s think about this in a different way. Textbooks – when we teach, we update our textbooks on a regular basis since there are new materials to teach and new ways to view and teach them. I have Statistics textbooks going way back – 1950s – but I don’t use these to teach; I use the new, updated 2025 texts. I may read the older texts to get a perspective on why or how things have changed – but I will use the newer texts as resources for my students.

Let’s go back to data. I have this cool data on weights and feed intakes of animals in the 1870s, but our animals have changed over the past 150+ years. Is the historical data really of use in today’s active research projects? Probably not – unless you are a historian. See how I can always find a way or reason to keep this data? But really, time and available resources come into play – do we have the time, money, and resources to extract and preserve data that may or may not be of use?

Oish! I can talk circles around this! I believe that everyone will come to a point where they will need to make these types of decisions – but for today’s research data – let’s document it and deposit it into a repository – the data is relevant! Not like my 150+ year old data 🙂

Michelle

Imagine this scenario. On her first field season as a principal investigator, a professor watched a graduate student realize—two weeks too late—that no one had recorded soil temperature at the sampling sites. The team had pH, moisture, GPS coordinates… but not the one variable that explained the anomaly in their results. A return trip wasn’t possible. The data gap was permanent.

After that, she changed how her lab collected data.

Instead of relying on ad hoc spreadsheets, she worked with her students to design schemas for their lab’s routine data collection. These weren’t schemas for final data deposit—they were practical structures for the messy, active phase of research. The goal was simple: define in advance what gets collected, how it’s recorded, and which values are allowed.
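
To make that concrete, here is a minimal sketch (in Python, purely for illustration) of the kind of decisions such a schema records: attribute names, types, units, and allowed values. The variable names, units, and code lists below are hypothetical, and this is not the Semantic Engine’s own schema format.

```python
# A minimal sketch of the decisions a data-collection schema pins down before
# fieldwork starts. Variable names, units, and code lists are hypothetical;
# this is NOT the Semantic Engine's own schema format.
soil_sampling_schema = {
    "site_id":          {"type": "text",    "description": "unique sampling site label"},
    "sample_date":      {"type": "date",    "format": "YYYY-MM-DD"},
    "soil_temperature": {"type": "numeric", "unit": "degrees Celsius"},
    "soil_moisture":    {"type": "numeric", "unit": "% volumetric water content"},
    "soil_texture":     {"type": "code",    "allowed": ["sand", "loam", "clay"]},
}
```

Writing this down up front is what makes a gap like the missing soil temperature visible before the field season, not after.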

Researchers can use the Semantic Engine to create schemas that they need for all stages of their research program, from active data collection to final data deposition.

For data collection, once a schema is established, it can be uploaded into the Semantic Engine to generate a Data Entry Excel (DEE) file.

Each DEE contains:

  • A schema description sheet – documentation pulled directly from the schema, including variable definitions and code lists.

  • A data entry sheet – pre-labeled columns that follow the schema rules.

The schema description sheet of a Data Entry Excel.
Data Entry Excel showing the sheet for data entry.

Because the documentation lives in the same file as the data, nothing has to be retyped, reinvented, or remembered from scratch. The schema description sheet also includes code lists that populate the drop-down menus in the data entry sheet, reducing inconsistent terminology and formatting errors.
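
The drop-downs themselves are ordinary Excel in-cell data validation, which the Semantic Engine sets up for you when it generates the DEE. Purely to illustrate the mechanism, here is a rough sketch using the openpyxl Python library; the column name and code list are hypothetical.

```python
# Sketch: turning a schema code list into an in-cell drop-down in Excel.
# The Semantic Engine does this for you; column name and code list are made up.
from openpyxl import Workbook
from openpyxl.worksheet.datavalidation import DataValidation

wb = Workbook()
ws = wb.active
ws["E1"] = "soil_texture"  # pre-labelled column taken from the schema

# Restrict entries in that column to the schema's code list.
dv = DataValidation(type="list", formula1='"sand,loam,clay"', allow_blank=True)
ws.add_data_validation(dv)
dv.add("E2:E1000")

wb.save("data_entry_sketch.xlsx")
```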

If an established schema isn’t sufficient, it can be edited in the Semantic Engine. Researchers can add attributes or adjust fields without rebuilding everything from scratch. The updated schema can then generate a new DEE, preserving the previous structure while incorporating the changes.

This approach addresses a common problem: unstructured Excel data. Without standardization, spreadsheets accumulate inconsistent date formats, unit mismatches, ambiguous abbreviations, and missing values. Cleaning that data later is costly and error-prone.
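
As a tiny, made-up illustration of that cleanup cost: the same date recorded three different ways can defeat any single parsing rule, so someone ends up fixing rows by hand.

```python
# Made-up example: one date, recorded three ways in an unstructured spreadsheet.
import pandas as pd

raw = pd.Series(["2024-07-03", "03/07/2024", "July 3 2024"])

# A single strict parsing rule only recovers the first value; the rest become NaT
# and must be fixed by hand. A schema that fixes the format up front avoids this.
parsed = pd.to_datetime(raw, format="%Y-%m-%d", errors="coerce")
print(parsed)
```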

By organizing data entry around a schema:

  • Required information is visible and less likely to be forgotten.

  • Fieldwork becomes more reliable – critical variables are collected the first time.

  • Data from multiple researchers or projects can be harmonized more easily.

  • Manual cleaning and interpretation are reduced.

The generated DEE does not enforce full validation inside Excel (beyond drop-down lists). For formal validation, the completed spreadsheet can be uploaded to the Semantic Engine’s Data Verification tool.
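
As a rough idea of what such verification involves, here is a short pandas sketch. To be clear, this is not the Semantic Engine’s Data Verification tool; the file name and the rules are hypothetical.

```python
# Sketch of post-entry checks on a completed spreadsheet (hypothetical rules).
# This illustrates the idea only - it is not the Semantic Engine's verification tool.
import pandas as pd

rules = {
    "soil_temperature": {"required": True},
    "soil_texture":     {"required": True, "allowed": {"sand", "loam", "clay"}},
}

df = pd.read_excel("data_entry_sketch.xlsx", sheet_name=0)

problems = []
for column, rule in rules.items():
    if column not in df.columns:
        problems.append(f"missing column: {column}")
        continue
    if rule.get("required") and df[column].isna().any():
        problems.append(f"{column}: blank cells found")
    allowed = rule.get("allowed")
    if allowed:
        unexpected = set(df[column].dropna()) - allowed
        if unexpected:
            problems.append(f"{column}: values outside the code list: {sorted(map(str, unexpected))}")

print("\n".join(problems) if problems else "no problems found")
```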

Using schema-driven Data Entry Excel files turns data structure into a practical research tool. Instead of discovering gaps during analysis, researchers define expectations at the point of collection—when it matters most.

Written by Carly Huitema

I’m sure by now you’ve heard the AAFC news – seven research facilities closing, with many job cuts. Research facilities with over a century of research, data, reports… Oh, you all know where I’m going with this!!! Yup! Where is all that data? Gone? Hidden? Maybe in some repository? I don’t know!

What I do know is that we, as an industry and as data researchers and archivists, need to seriously think about that data! Lacombe Research Centre – 119 years of research – much of it in the field of meat science! If anyone works in that area, you are well aware of the changes we’ve made over time in the quality of our meats – how we evaluate and grade – a lot of that research was developed at Lacombe! As a beef geneticist who worked in the meat science field, I will be crying if that data is not saved or at least documented! Uh-oh, I said that magic word “document”.

I’m trying to stay optimistic and hopeful – but when I attend industry-related meetings and the primary question that arises is “What data?” followed by “Where is the data?”, I get scared! The only reason I am familiar with the type of research and data that was collected at Lacombe is because of my research background. If I were to run a search today for pork grade data – ok, let’s try it for giggles.

Screenshot: Google results for “pork grade data”

Hmmm…  ok I should add Canada and see if that changes anything….

Screenshot: Google results for “Canadian pork grade data”


Yup! As I suspected, nothing but reports – no data! So that initial question of “What data?” followed by “Where is the data?” is not being answered!

Two points I want to make here:

  1. Data is NOT easy to find – nothing shows up for Lacombe at all! If you didn’t know this data existed, you wouldn’t know to ask about it. The classic “you don’t know what you don’t know!” So – if we don’t know it exists, is it OK to let it go? Maybe I shouldn’t worry about the data that’s been collected over the past 119 years?
  2. This is the MAIN problem that we are trying to solve with both ADC and the CS-DCC!   A catalogue of data sources to search across.  A place to visit to determine IF the data exists – followed by where the data exists.  BUT if we don’t know it exists or if it disappears then….

Let’s wake up and acknowledge that our data is VALUABLE and needs to be preserved!

Let’s hope I am wrong and the data collected at the seven AAFC facilities slated for closure will be preserved and FAIR!

Michelle

You’ve seen this word thrown around a lot!  Data about data.  Data Documentation.  Information about your data.  So many different ways to define “metadata”.

If you’ve been reading our blog posts – you know that we are STRONG advocates for data documentation!! I, personally, am a STRONG believer in metadata – without it, all the time and money that went into data collection has been flushed away. Without that crucial documentation or metadata, the data you or your team collected is useless, since no one can understand what the data is – let alone how to use it.

Let’s add another word now – Standards.  Yes!  Believe it or not there are many different metadata standards out there!  I would argue that most scientific disciplines have an established metadata standard.  Now – as a researcher – are you familiar with these?  Did you know there was a metadata standard for your field of research?

At Agri-food Data Canada, we are aware that this can be very overwhelming – so that’s one of the primary reasons we encourage you to use the Semantic Engine to document the data that you collect – as you collect it. Let’s work at documenting the data in a machine-readable/actionable format – then we can translate it to the metadata standard that your field of research uses. WOW! Easy peasy? Ok, there’s some work involved in creating crosswalks across metadata standards – but first and foremost, let’s NOT fret about what the best or recommended metadata standard is in your field – let’s DOCUMENT that data – and crosswalk it over later. Let’s be honest – most of us forget to document and need to go back months later to remember what we did! So document now in an easy-to-use format – the Semantic Engine – and then come talk to us about how to crosswalk to the metadata standard in your field.
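
Just to give a flavour of what a crosswalk can look like – here is a small, hypothetical Python sketch mapping locally documented fields onto Dublin Core terms (a real, widely used metadata standard). The local field names and values are made up; the point is only that the mapping can be written down once and reused.

```python
# Hypothetical crosswalk: locally documented fields -> Dublin Core terms.
crosswalk_to_dublin_core = {
    "dataset_title":       "dc:title",
    "lead_researcher":     "dc:creator",
    "collection_date":     "dc:date",
    "dataset_description": "dc:description",
    "keywords":            "dc:subject",
}

# A locally documented record (made-up values).
local_record = {
    "dataset_title": "1870s OAC animal weight and feed intake records",
    "lead_researcher": "OAC research staff",
    "collection_date": "1877",
    "dataset_description": "Weights and feed intakes of animals, extracted from scanned reports",
    "keywords": "livestock; feed intake",
}

# Translate the local documentation into the target standard.
dublin_core_record = {
    crosswalk_to_dublin_core[field]: value
    for field, value in local_record.items()
    if field in crosswalk_to_dublin_core
}
print(dublin_core_record)
```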

Hang on!  One more word today:

INTEROPERABILITY!   

Let me just drop that one here – isn’t this part of what I’m rambling on about today?

Michelle