Research Data Management

The Case for Uploadable Form Data: A More Flexible Approach to Online Submissions

Anyone who has worked extensively with online submission systems will recognize a familiar frustration: you have done the hard work of gathering, drafting, and refining your content – often collaboratively, across multiple documents and tools – and now you face the tedious task of manually copying everything into a web form, field by field. The content is ready; the process of getting it into the system is not.

This challenge comes up repeatedly in our work at Agri-food Data Canada (ADC) and the Climate-Smart Data Collaboration Centre (CS-DCC), and it is not unique to any one platform or domain. It reflects a structural gap in how most online forms are designed: they are built for data entry, not data transfer.

The Collaboration Problem

This tension is particularly well illustrated in the context of data management plans. As noted in a recent article from Upstream:

“Data management planning is often a collaborative process, involving researchers, librarians, and institutional support staff. External tools and shared documents make it easier to iterate on plans, incorporate guidance, and ensure alignment with institutional policies and available resources. When plans are created directly within submission systems, that collaborative process can become more difficult.”

The same dynamic applies across many submission workflows. Researchers, teams, and support staff work best when they can iterate freely in shared documents, draw on existing resources, and incorporate guidance from multiple sources. Forcing that process into a single submission interface introduces friction at exactly the wrong moment – when content is nearly complete and should be easy to finalize.

Why APIs Are Not Always the Answer

The conventional infrastructure response to interoperability problems is to connect systems via APIs. While APIs are powerful and appropriate in many contexts, they come with real constraints. Both parties must be ready and willing to build and maintain the connection. Integration work requires technical resources on both sides. Security and access management become more complex. And the result is a series of point-to-point connections rather than a flexible, open approach that any participant can use.

APIs are well suited for tightly coupled systems with dedicated integration teams. They are less well suited for the diverse, distributed ecosystems that characterize research infrastructure – where institutions, tools, and workflows vary enormously and where not every participant has the capacity to build custom integrations.

A Simpler Proposal: Publishable Schemas and Uploadable Data Files

At ADC, we have been exploring a different approach. The core idea is straightforward: if an online form publishes its expected data structure as a downloadable schema or sample data file, then users can prepare their submissions outside the system – collaboratively, using whatever tools work best for them – and upload a structured file (such as a .json file) when they are ready to submit.

The form receives the uploaded file, validates it against the expected format, and populates the interface with the pre-filled content. The user retains final control, reviewing and editing within the UI before submitting. This preserves the benefits of collaborative, tool-agnostic preparation while keeping the submission process human-centered and editable.
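To make the mechanics concrete, here is a minimal sketch in Python of the validate-then-populate step, using the open-source jsonschema library. The schema and the field names (title, storage_plan, retention_years) are hypothetical illustrations, not the format of any particular system; a real form would publish its own schema document for download.

```python
# Minimal sketch of the validate-then-populate pattern described above.
# The schema and field names are hypothetical; a real form would publish
# its own schema for users to download and prepare against.
import json
from jsonschema import validate, ValidationError  # pip install jsonschema

# The data structure the form publishes as a downloadable schema.
SUBMISSION_SCHEMA = {
    "type": "object",
    "properties": {
        "title": {"type": "string"},
        "storage_plan": {"type": "string"},
        "retention_years": {"type": "integer", "minimum": 0},
    },
    "required": ["title", "storage_plan"],
}

def load_submission(path):
    """Validate an uploaded .json file and return fields ready to pre-fill the form."""
    with open(path) as f:
        data = json.load(f)
    try:
        validate(instance=data, schema=SUBMISSION_SCHEMA)
    except ValidationError as err:
        # Reject the upload with a useful message; the user fixes the file and retries.
        raise ValueError(f"Upload does not match the published schema: {err.message}")
    return data  # the UI pre-fills its fields from this dict; the user reviews before submitting

fields = load_submission("dmp_draft.json")
print(fields["title"])
```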

Consider the DMP Assistant, the Alliance’s data management planning tool. Currently, users draft their plans directly in the online interface. Under this model, a researcher could instead work with their existing lab documentation, institutional guidance, and an AI assistant to compile all the relevant information into a structured .json file, then upload it into the DMP Assistant to populate the form in a single step – arriving at the editing stage with a complete draft rather than a blank form.

We Already Do This

This is not a theoretical proposal. ADC already supports this kind of workflow in the Semantic Engine’s schema development tool. We provide a structured prompt that helps users draft their schema content with an AI assistant, then upload the resulting .json file directly into the Semantic Engine editor. The file is parsed, the fields are populated, and the user continues working from there – a draft that may contain AI-generated errors, but one the user can review, correct, and improve.

The pattern works. It reduces manual data entry, supports collaborative preparation, and lowers the barrier for users who want to work in familiar tools before engaging with a specialized system. We believe it is a model worth broader adoption, and one that submission systems of all kinds could implement without requiring significant infrastructure investment from either side.

Written by Carly Huitema

 

Imagine this scenario. In her first field season as a principal investigator, a professor watched a graduate student realize—two weeks too late—that no one had recorded soil temperature at the sampling sites. The team had pH, moisture, GPS coordinates… but not the one variable that explained the anomaly in their results. A return trip wasn’t possible. The data gap was permanent.

After that, she changed how her lab collected data.

Instead of relying on ad hoc spreadsheets, she worked with her students to design schemas for their lab’s routine data collection. These weren’t schemas for final data deposit—they were practical structures for the messy, active phase of research. The goal was simple: define in advance what gets collected, how it’s recorded, and which values are allowed.

Researchers can use the Semantic Engine to create the schemas they need at all stages of their research program, from active data collection to final data deposition.

For data collection, once a schema is established, it can be uploaded into the Semantic Engine to generate a Data Entry Excel (DEE) file.

Each DEE contains:

  • A schema description sheet – documentation pulled directly from the schema, including variable definitions and code lists.

  • A data entry sheet – pre-labeled columns that follow the schema rules.

The schema description sheet of a Data Entry Excel.
Data Entry Excel showing the sheet for data entry.

Because the documentation lives in the same file as the data, nothing has to be retyped, reinvented, or remembered from scratch. The schema description sheet also includes code lists that populate the drop-down menus in the data entry sheet, reducing inconsistent terminology and formatting errors.
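To illustrate the mechanism (a generic sketch, not the Semantic Engine’s actual implementation), here is how a code list from a schema can drive an Excel drop-down using Python’s openpyxl library. The column names and code list values are hypothetical.

```python
# Sketch: turn a schema's code list into an Excel drop-down.
# Column names and the code list are hypothetical examples.
from openpyxl import Workbook
from openpyxl.worksheet.datavalidation import DataValidation

wb = Workbook()
ws = wb.active
ws.title = "Data Entry"
ws.append(["sample_id", "soil_type", "temperature_C"])  # pre-labeled columns from the schema

# Drop-down restricted to the schema's code list for soil_type.
dv = DataValidation(type="list", formula1='"clay,loam,sand,silt"', allow_blank=True)
dv.errorTitle = "Invalid entry"
dv.error = "Value must come from the schema's code list."
ws.add_data_validation(dv)
dv.add("B2:B100")  # apply to the soil_type column

wb.save("data_entry.xlsx")
```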

If the standard schema isn’t sufficient, it can be edited in the Semantic Engine. Researchers can add attributes or adjust fields without rebuilding everything from scratch. The updated schema can then generate a new DEE, preserving the previous structure while incorporating the changes.

This approach addresses a common problem: unstructured Excel data. Without standardization, spreadsheets accumulate inconsistent date formats, unit mismatches, ambiguous abbreviations, and missing values. Cleaning that data later is costly and error-prone.

By organizing data entry around a schema:

  • Required information is visible and less likely to be forgotten.

  • Fieldwork becomes more reliable – critical variables are collected the first time.

  • Data from multiple researchers or projects can be harmonized more easily.

  • Manual cleaning and interpretation are reduced.

The generated DEE does not enforce full validation inside Excel (beyond drop-down lists). For formal validation, the completed spreadsheet can be uploaded to the Semantic Engine’s Data Verification tool.
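As a rough illustration of the kinds of checks such verification performs (a generic sketch, not the Data Verification tool itself; the column names and code list are hypothetical), a script might confirm that required columns are present and that every entered value falls within the schema’s code list:

```python
# Generic sketch of schema-style checks on a completed spreadsheet.
# Not the Semantic Engine's Data Verification tool; names are hypothetical.
import pandas as pd

CODE_LISTS = {"soil_type": {"clay", "loam", "sand", "silt"}}
REQUIRED = ["sample_id", "soil_type", "temperature_C"]

df = pd.read_excel("data_entry.xlsx", sheet_name="Data Entry")

missing = [c for c in REQUIRED if c not in df.columns]
if missing:
    print("Missing required columns:", missing)

for column, allowed in CODE_LISTS.items():
    # Flag non-blank values that fall outside the schema's code list.
    bad = df[~df[column].isin(allowed) & df[column].notna()]
    for idx, value in bad[column].items():
        print(f"Row {idx + 2}: '{value}' is not in the {column} code list")  # +2 for header row, 1-based rows
```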

Using schema-driven Data Entry Excel files turns data structure into a practical research tool. Instead of discovering gaps during analysis, researchers define expectations at the point of collection—when it matters most.

Written by Carly Huitema

In my last post, “OAC and Historical Ag Data – has the 150 year old mystery been solved?”, I ended with a couple of highlighted statements made by Wm. Johnston, who was the rector/president of OAC back in the early years (1874-1879):

  • conduct experiments and publish the results
  • lay his hands upon our results and follow them if desirable

How many of you saw Research Data Management at play here??  Oh, it yelled documentation to me too!  Obtain results and follow them if desirable – how do you do this without documentation?

Back in 1876, Wm. Johnston had the foresight to set out what he called a code of action for any trials conducted on the Experimental Farm.  As I read through these, I see a number of parallels to today’s research processes – setting out a research question, ensuring you look at the results in the context of the trial, documentation, and the practicality of running trials on a working farm!  I find it funny how this code of action was put in place back in 1876 and yet we are still teaching some of these basics today.  For reference, here are the code of action items that relate to both livestock and field experiments:

  1. All principles must be laid down by facts of practice and science
  2. If a principle seems wanting, we may have to establish one
  3. To ascertain the exact state of information regarding any contemplated experiment – repetition might be useless
  4. To select the subject of enquiry
  5. The solving of a definite question, whether affirmative or negative
  6. Whether the subject is practical or economical, of both combined
  7. The arrangement of a definite plan of operations – the form in which the enquiry should be prosecuted
  8. Uniformity of treatment
  9. Duplicates indispensable

There are 8 more that are relevant if you are embarking on an experiment where you are feeding animals.  These include items such as: previous treatment and present condition; periodical weighing; character of housing and temperature; the greatest result in the least time, at the least cost; and four more…  Additional field experiment principles include: uniformity of soil and exposure; analysis of soil and manures; all useless and misleading without minuteness; the result of one experiment suggesting another; and twelve more…  It is suggested that all these principles be observed and documented – WOW!!!

As I continue to review the experiments and data available from the initial years of what we now know as OAC, I see that we also have a strong history in research management – and yes, I’m going to extend that to research data management.  The information – the data of that time – is included in these publications – albeit Annual Reports – but all the information – the data – is there!!

In the early years – before any requests came in for specific experiments on the farm – feeding trials for the pigs and cattle were predominant, as were cropping/planting trials on the site.  There are tables and images of a lot of these trials in the Annual Reports.  I have only read the first 4 years and I am VERY impressed with the “data” and information on these trials.

So now the question that begs to be asked – now what?  How much data did I find?  What is the value of this data?   Is it only valuable to the data geeks?  What if I told you there were crude weather reports and data included in some of these reports?   Would that change your mind regarding the value of this information?

Something to ponder…. Till the next time,

Michelle

 

Generalist and Specialist Data Repositories

Research data repositories can be described along two important dimensions:

  1. how broad or specialized their scope is, and

  2. where they sit in the research data lifecycle.

Understanding these distinctions helps clarify the range of technologies and repositories available for research data.

Generalist Repositories

Generalist repositories are designed to accept many different kinds of data across disciplines. They prioritize inclusivity and flexibility, offering a common technical platform where researchers can deposit datasets that do not fit neatly into a single domain.

A useful metaphor is the junk drawer in a kitchen. A junk drawer contains many useful items—batteries, spare cables, elastic bands—but finding a specific item often requires some effort. Similarly, generalist repositories can hold valuable datasets, but those datasets may be described with relatively generic metadata and limited domain-specific structure.

As a result, data in generalist repositories can be:

  • Harder to discover through precise searches

  • More difficult to interpret without additional context

  • Less immediately reusable by domain experts

Examples of generalist repositories include Dataverse (Borealis in Canada), Figshare, and OSF.

Specialist Repositories

Specialist repositories focus on a specific discipline, data type, or research community. They typically enforce domain-specific metadata standards, controlled vocabularies, and structured submission requirements.

Continuing the kitchen metaphor, specialist repositories resemble a cutlery drawer: clearly organized, purpose-built, and easy to use—provided you are looking for the right type of item. Knives go in one place, forks in another, and everything has a defined role.

Because of this structure, specialist repositories tend to make data:

  • More findable through precise, domain-aware search

  • Easier to interpret due to consistent metadata

  • More interoperable with related tools and systems

  • More reusable for future research

In other words, data in specialist repositories are often more FAIR than data in generalist repositories. However, this specialization also limits what they can accept. Many interdisciplinary datasets—particularly in agri-food research—do not align cleanly with the strict models of existing specialist repositories and therefore end up in generalist ones. Examples of specialist repositories include GenBank, PDB, and GEO.

The Research Data Lifecycle: Active and Archival Data

Another important way to think about data repositories is in relation to the research data lifecycle.

Research data typically move through several phases:

  1. Planning and collection

  2. Processing, active analysis and refinement

  3. Publication and dissemination

  4. Long-term preservation and reuse

Repositories are often designed to support either active data or archival data, but not both equally well.

Active Data

Active data are produced and used during the course of research. They may be incomplete, frequently updated, or subject to access restrictions due to confidentiality, sensitivity, or competitive concerns.

This is the phase where data are still being cleaned, analyzed, and interpreted. Changes are expected, and collaboration is often ongoing. Most formal repositories are not designed to support this stage, which is typically handled through local storage, shared drives, or project-specific platforms.

Archival Data

Once research is complete and results have been published, data generally move into an archival phase. At this point, datasets are more stable, less likely to change, and often less sensitive—especially if they have been anonymized or if concerns about being “scooped” no longer apply.

Most well-known repositories, including Dataverse, Figshare, and domain-specific archives such as the Protein Data Bank (PDB), are designed primarily for archival data. Their strengths lie in long-term preservation, persistent identifiers (PIDs like DOIs), citation, and access, rather than supporting ongoing analysis or frequent updates.

Bridging the Gaps

It would be inefficient to build a highly specialized repository for every possible type of dataset—much like building a kitchen with a separate drawer for every object that might otherwise end up in the junk drawer. Instead, a more scalable approach is to improve the organization and description of data held in generalist repositories.

Agri-food Data Canada’s approach focuses on developing tools, guidance, and training that help researchers add structure and context to their data wherever it is deposited. By enhancing metadata quality and enabling interoperability between repositories, it becomes possible to make data in generalist repositories more FAIR—without requiring a proliferation of narrowly specialized infrastructure.

Together, specialist and generalist repositories, along with active and archival data systems, form complementary parts of the research data ecosystem. Recognizing their respective roles helps researchers choose appropriate platforms and supports more effective data reuse over time.

Written by Carly Huitema

In Canada, national research data infrastructure is coordinated by the Digital Research Alliance of Canada (DRAC). The Alliance provides the digital tools and platforms that researchers depend on to manage data, perform advanced computing, and leverage research software. Supported by federal funding, DRAC works with partners across the country to expand access, improve security, and strengthen the digital research workforce. These efforts enable Canadian researchers in all disciplines to conduct more efficient, secure, and interoperable research.

As part of the ongoing modernization of Canada’s research infrastructure, DRAC is preparing to introduce a national Registration Agency for Research Activity Identifiers (RAiDs). RAiDs are a relatively new category of persistent identifiers designed to support the accurate identification, management, and linking of research activities—often conceptualized as research “projects”—throughout their full lifecycle.

Why RAiDs Matter

RAiDs provide a globally unique, persistent identifier for a research activity and connect that activity to:

  • People (researchers, collaborators)

  • Organizations (institutions, funders)

  • Outputs (publications, datasets, software)

  • Related resources (grants, ethics approvals, infrastructure)

This enables research projects to be tracked, referenced, and integrated across multiple systems. RAiDs are especially relevant in environments where interoperability is critical, such as national and international research data platforms.

For Canada, RAiDs are being positioned as a foundational component of the Canadian Research Data Platform, where they will facilitate information exchange between services, reduce duplication, and improve project-level transparency across institutions.

How RAiDs Are Minted

A key principle of the RAiD system is that researchers cannot independently mint RAiD identifiers. RAiDs must be generated through a recognized RAiD Service Provider. This approach ensures consistency, quality, and proper registration within the global RAiD infrastructure.

Two other identifiers illustrate different minting processes:

  • ORCID allows individuals to mint their own researcher identifier at the ORCID website.

  • DOIs, however, must be issued by an authorized DOI service provider such as Dataverse, Zenodo, or Figshare.

RAiDs follow the DOI model rather than the ORCID model. Institutions, not individuals, carry the responsibility for minting and maintaining the associated metadata.

As DRAC moves toward becoming a national RAiD Registration Agency, Canadian researchers and institutions will gain a dedicated domestic pathway to obtain RAiDs that are recognized and resolvable globally.

The Global RAiD Registry

All RAiD identifiers and their metadata are maintained in a centralized global registry managed by the International RAiD Data Service, currently coordinated by the Australian Research Data Commons and partner organizations. This registry serves as the authoritative source for RAiD information and provides stable, persistent resolution of RAiD identifiers.

The registry stores:

  • The RAiD itself

  • Descriptive metadata about the research activity

  • Relationships to researchers, institutions, datasets, and grants

  • Activity lifecycle events (start, updates, completion)

  • Version histories and changes over time

Functionally, the RAiD registry operates in a manner similar to:

  • DataCite, which maintains DOI metadata

  • ORCID, which maintains researcher metadata

It is the central location where systems can query, resolve, and verify RAiD information.

The RAiD metadata schema is published openly and can be reviewed at:
https://metadata.raid.org/en/v1.6/index.html
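Purely as an illustration of the kind of information a RAiD record carries – the field names below are simplified and hypothetical, not the official v1.6 property names, so consult the published schema above for the authoritative structure:

```python
# Illustrative only: a simplified picture of what a RAiD record connects.
# Field names are hypothetical; see the published v1.6 schema (link above)
# for the authoritative property names and structure.
raid_record = {
    "identifier": "https://raid.org/10.12345/example",  # hypothetical RAiD
    "title": "Soil health monitoring pilot",
    "dates": {"start": "2024-01-15", "end": None},  # activity lifecycle
    "contributors": [
        {"orcid": "https://orcid.org/0000-0000-0000-0000", "role": "principal-investigator"}
    ],
    "organisations": [
        {"ror": "https://ror.org/00000000", "role": "lead-research-organisation"}  # placeholder ROR
    ],
    "related_objects": [
        {"doi": "https://doi.org/10.5555/example-dataset", "type": "dataset"}
    ],
}
```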

Can Anyone Use the RAiD Metadata Schema?

Any organization—or individual—can choose to document their research activities using the publicly available RAiD metadata model. However, without going through an authorized RAiD Service Provider, they cannot mint an official RAiD identifier, and the resulting record will not be registered in the global RAiD registry or participate in the broader RAiD ecosystem.

Official registration is what ensures global uniqueness, persistent resolution, and interoperability across research platforms.

Conclusion

RAiDs are emerging as a critical component of modern research infrastructure, offering a structured, persistent mechanism for identifying and connecting research activities with all related people, outputs, and systems. The Digital Research Alliance of Canada’s plan to establish a national RAiD Registration Agency represents a significant step toward improving the coordination, traceability, and interoperability of research in Canada.

As Canada’s research ecosystem continues to evolve, the adoption of standardized, globally recognized identifiers like RAiDs will support more transparent, connected, and efficient research workflows—benefiting researchers, institutions, and the broader scientific community.

Written by Carly Huitema

In research environments, effective data management depends on clarity, transparency, and interoperability. As datasets grow in complexity and scale, institutions must ensure that research data is FAIR: not only accessible but also well-documented, interoperable, and reusable across diverse systems and contexts, including research Data Spaces.

The Semantic Engine (which runs the OCA Composer), developed by Agri-food Data Canada (ADC) at the University of Guelph, addresses this need.

What is the OCA Composer?

The OCA Composer is based on the Overlays Capture Architecture (OCA), an open standard for describing data in a structured, machine-readable format. Using OCA allows datasets to become self-describing, meaning that each element, unit, and context is clearly defined and portable.

This approach reduces reliance on separate documentation files or institutional knowledge. Instead, OCA schemas ensure that the meaning of data remains attached to the data itself, improving how datasets are shared, reused, and integrated over time. This makes data easier to interpret for both humans and machines.
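As a rough sketch of the layered idea (simplified for illustration; see the OCA specification for the normative serialization), a capture base lists the attributes of a dataset while separate overlays attach human-readable labels, units, and other context:

```python
# Simplified illustration of the Overlays Capture Architecture idea:
# a capture base defines the attributes, and overlays layer on labels
# and units. Not the normative OCA serialization.
capture_base = {
    "type": "capture_base",
    "attributes": {"sample_id": "Text", "soil_temp": "Numeric", "collected_on": "DateTime"},
}

label_overlay = {
    "type": "label_overlay",
    "language": "en",
    "attribute_labels": {"sample_id": "Sample ID", "soil_temp": "Soil temperature", "collected_on": "Collection date"},
}

unit_overlay = {
    "type": "unit_overlay",
    "attribute_units": {"soil_temp": "°C"},
}

# Bundled together, the documentation travels with the data itself.
oca_bundle = {"capture_base": capture_base, "overlays": [label_overlay, unit_overlay]}
```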

The OCA Composer provides a visual interface for creating these schemas. Researchers and data managers can build machine-readable documentation without programming skills, making structured data description more accessible to those involved in data governance and research.

Why Use OCA Composer in your Data Space

Implementing standards can be challenging for many Data Spaces and organizations. The OCA Composer simplifies this process by offering a guided workflow for creating structured data documentation. This can help researchers:

  • Standardize data descriptions across projects and teams
  • Improve dataset discoverability and interoperability
  • Support collaboration through consistent documentation templates (e.g. Data Entry Excel)
  • Increase transparency and trust in data definitions

By making metadata a central part of data management, researchers can strengthen their overall data strategy.

Integration and Customization

The OCA Composer can support the creation and running of Data Spaces by organizations, departments, research projects and more. These Data Spaces often have unique digital environments and branding requirements. The OCA Composer supports this through embedding and white labelling features. These allow the tool to be integrated directly into existing platforms, enabling users to create and verify schemas while remaining within the infrastructure of the Data Space. Institutions can also apply their own branding to maintain a consistent visual identity.

This flexibility means the Composer can be incorporated into internal portals, research management systems, or open data platforms including Data Spaces while preserving organizational control and customization.

To integrate the OCA Composer into your systems or Data Space, check out our more detailed technical documentation. Alternatively, consult with Agri-food Data Canada for help, support, or as a partner in your grant application.

 

Written by Ali Asjad and Carly Huitema

Surprise! Surprise!  I’m switching gears for this blog post – stepping off my historical data and data ownership pedestal for a bit 🙂

I want to talk about RDM – Research Data Management – today.  For the past decade I’ve been working with colleagues offering workshops on this topic and working with the Research Data Lifecycle – yup, I’ve also talked about this in the past:

 

Throughout these posts and the FAIR set of blog posts, I always think to myself – we are NOT teaching anyone anything new or really exciting.  It is more about bringing these challenges to light and nudging everyone to think about RDM when they start their research projects.  At the end of the workshops, I often have students thank me, comment on how funny my examples were, and leave.  But once they get more involved with their projects – that’s when I get the “OH! I get it now!” – and they start from scratch and re-organize their files and project folders.

As ADC matures, we are getting calls from projects to help with their RDM – specifically with the “management” aspect of their data.  Questions are usually to the effect of: we have terabytes of data – what do we do?  This is a basic yet VERY daunting question – let’s be honest!  So let’s work through this together and hopefully some of the tips we re-share here will help.

Organizing your Research Project Data

When organizing a project, there are many ways to go about it.  Let’s use my bookshelf in my home office as an example.  I can organize my books by author, or by topic, or by frequency of use, or by colour, or… you get the idea!  How I organize the books on my shelf is really a personal choice, based on how “I” use the books.  Now, let’s turn to how to organize your project data.  Chances are you will have many different views and opinions on how to organize the data.  The project team may consider organizing it by date received, or by instrument used to collect the data, or by individual collecting the data – the options are almost endless.  In my opinion, there are two main ways to think about it: how the data was collected VS how the data will be used.

In extremely large projects where we have terabytes of data, you should start by asking yourself the very basic question: “How will we use this data?  How do we anticipate using this data?”  Organizing the data by animal or plot does NOT make sense if you anticipate working with the data across many dates.  So would organizing the data by date be better?  Let’s be honest – there is NO one way or right way for all!  But I HIGHLY recommend you and/or your team spend time working through the best organization for your data.

Let me show you WHY this will save you a LOT of time.  Here is an image I took of my files from an old work laptop – eek!  15 years ago!  There are a LOT of problems with how I organized my work laptop at that time.  Now, if I need to find a presentation I did for a conference in 2009, I will need to open ALL those PowerPoint presentations, review the content, rename them, and place them in a more appropriate directory.  Friday_April_11 sitting in a large unorganized directory just doesn’t work!  What if I need to find that historical polling data from the 1950s?  I would need to open each Excel file and browse it to determine whether it is the correct one or not.  Psst – it’s the one titled “husbands_fauults_maritalStatus” 😉

a list of files that are NOT organized

In this directory there are only a few files – so, manageable.  BUT imagine this was a directory or folder on your computer with TBs of your data!  Names are all over the place: one instrument may provide filenames as INSTR01.dat, one student may name their files as MEdwards_202505.csv, and a third researcher may name their files as PROJECT02_data.xlsx.  Without any guidance, everyone places their data files where they think it makes sense – think back to my bookshelf example and you have one big mess, similar to my files back in 2009!

Remember, you have TBs of data!  The time it takes to open every file, review the contents, rename it, and move it to a new organizational structure is time saved IF you decide on an organizational structure at the start of your project!  YES, as project management changes, there may also be a re-org of the data – but let’s come up with a structure, document it, and leave it for the whole team to use!
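For example, a documented structure might look something like this (purely illustrative – the right structure is the one that matches how YOUR team will use the data):

    myproject/
        README.txt      (documents the structure and the file naming rules)
        01_raw/         (data exactly as collected – never edited)
        02_processed/   (cleaned and standardized data)
        03_analysis/
        04_outputs/     (figures, reports, presentations)

With the README at the top level, anyone joining the project can find the rules without asking.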

Sounds easy right??

If you and your team are starting a project and would like to meet with us to help – please send us an email at adc@uoguelph.ca. We are currently working with a couple of larger projects and would love to help you out too!

Michelle

 


image created by AI

It’s me again!  Yup, back to that historical data topic too!  I didn’t want to leave everyone wondering what I did with my old data – so I thought I’d take you on a tour of my research data adventures and what has happened to all that data.

BSc(Agr) 4th year project data – 1987-1988

Let’s start with my BSc(Agr) data – that image you saw in my last post was indeed part of my 4th year project and a small piece of a provincial (Nova Scotia) mink breeding project: “Estimation of genetic parameters in mink for commercially important traits”.  The data was collected over 2 years and YES it was collected by hand and YES I have it in my binder (here in my office).  Side note: if you have ever worked with mink – it can take days to smell human again after working with them 🙂  Now some of you may be thinking – hang on – breeding project – data in hand – um…  how were the farms able to make breeding decisions if I had the data?  Did they get a copy of the data?

Remember we are talking 40 years ago – and YES, every piece of data that we collected, IF it was relevant for any farm decisions, was photocopied and later entered into a farm management system.  So, no management data was lost!  However, I took bone diameter measurements, length measures, weights at regular intervals, and many more measures that frankly were NOT necessary or of interest to the management of the animals at that time.  Now that data – to me – is valuable!!  So – what did I do with it?  A few years ago – during some down time – I transcribed it, and now I have a series of Excel files with the data.  The next question would be – where is the data?  Another topic for the next blog post 😉

MSc project data – 1989-1990

Moving on to my MSc data – “Estimation of swine carcass composition by video image analysis” (https://bac-lac.on.worldcat.org/oclc/27849855?lang=en).  Hang on to your hats for this!!

Image of folders
Image of acetate tracings

 

And you thought handwritten data was bad!  Here are all the printouts of my MSc thesis data and hand drawn acetate tracings from a variety of pork cuts!  Now what?  Remember I keep bringing us back to this concept of historical data.  Well, this is to show you that historical data comes in many different formats.  Does this have value?  Should I do something with this?

Well – you should all know the answer by now 🙂  Yes, a couple of years ago I transcribed the raw data sheets into Excel files.  But those tracings – they’re just hanging around for the moment.  I just cannot get myself to throw them out – maybe some day I’ll figure out what to do with them.  If you have any suggestions – I would LOVE to hear from you.

Also note that the manager of the swine unit at that time kept his own records for management and breeding purposes – this data was only for research purposes.

PhD project data – 1995-1997

So up until now – it took work, but I was able to transcribe and re-use my BSc(Agr) and MSc research data.  Now for the really fun part.  Here are a couple more pictures that might take some of you back.

Box of 3.5″ diskettes
Image of two 3.5″ diskettes

Yup!  My whole PhD data was either on these lovely 3.5″ diskettes or on a central server – which is now defunct!  Now we might get excited and think – hey, it’s digital!  No need to transcribe!  BUT – and that’s a VERY LOUD

BUT

These diskettes are 30 years old!  Yes, I bought a USB diskette drive, and when I went through these, only 3 were readable!  And the data on them were in a format that I can no longer read without investing a LOT of time and potentially money!

Now the really sad part – these data were again part of a large rotational breeding program.  The manager also kept his own records – but there was SO much valuable data, especially on the meat quality side of my trials, that was not kept and is now lost!  To this day, I am aware that there were years of data from this larger beef trial that were not kept.  It’s really hard to see and know that this has happened!

Lessons learned?

Have we really learned anything?  For me, personally, these 3 studies have instilled a desire to save research data – but I have come to realize that not everyone feels the same way.  That’s ok!  Each of us needs to consider if there is an impact to losing that OLD or historical data.  For my 3 studies: the mink one – the farm managers kept what they needed, and the extra measures I was taking would not have impacted the breeding decisions or the industry – so, ok, we can let that data die.  It’s a great resource for teaching statistics though.

My MSc data – again – I feel that it followed a similar pattern to my BSc(Agr) trials.  Although, from a statistical point of view, there are a few great studies that someone could do with this data – so who knows if that will happen or not.

Now my PhD data – that one really stings!  I am working with the same Research Centre today – yes, 30 years later!  I wish we had had a bigger push to save that data.  Believe me – we tried – there are a few of us around today who still laugh at the trials and tribulations of creating and resurrecting the Elora Beef Database – but we just haven’t gotten there yet – and I personally am aware of a lot of data that will never be available.

So I ask you – is YOUR research data worth saving?  What are your research data adventures?  Where will they leave your data?

Michelle

 


image created by AI

I left off my last blog post with a question – well, actually a few questions: WHO owns this data?  The supervisor – who is the PI on the research project you’ve been hired onto?  OR you, as the data collector and analyser?  Hmmm…  When you think about these questions, the next question becomes: WHO is responsible for the data and what happens to it?

As you already know, there really are NO clear answers to these questions.  My recommendation is that the supervisor, PI, or lab manager set out a Standard Operating Procedure (SOP) guide for data collection.  Yes, I know this really does NOT address the data ownership question – but it does address my last question: WHO is responsible for the data and what happens to it?  And let’s face it – isn’t that just another elephant in the room?  Who is responsible for making the research data FAIR?

Oh my, have I just jumped into another rabbit hole?

We have been talking about FAIR data, building tools, and making them accessible to our research community and beyond – BUT are we missing the bigger vision here?  I talk to researchers, and most agree that they want to make their data FAIR and share it beyond their lab – BUT… let’s be honest – that’s a lot of work!  Who is going to do it?  Here at ADC, our goal is to work with our research community to help them make agri-food data (and beyond) FAIR – we’ve been creating tools and training materials, and now we are on the precipice of changing the research data culture – well, I thought we were – and now I’m left wondering: who is RESPONSIBLE for setting out these procedures in a research project?  WHO should be the TRUE force behind changing the data culture and encouraging FAIR research data?

Don’t worry – for anyone reading this – we are VERY determined to change the research data culture by continuing to make the transition to FAIR data easy and straightforward.  It’s just an interesting question, and one I would love for you all to consider: WHO is RESPONSIBLE for the data collected in a research project?

Till the next post – let’s consider Copyright and data – oh yes!  Let’s tackle that hurdle 🙂

Michelle

 


image created by AI

GitHub is more than just a code repository; it is a powerful tool for collaborative documentation and standards development, and an important tool for the development of FAIR data. In the context of writing and maintaining documentation, GitHub provides a comprehensive ecosystem that enhances the quality, accessibility, and efficiency of the process. Here’s why GitHub is invaluable for documentation:

  1. Version Control: Every change to the documentation is tracked, ensuring that edits can be reviewed, reverted, or merged with ease. This creates a clear history of revisions, with authorship clearly attributed to contributors. While this is possible using a tool such as Google Docs or even Word, version control is a central feature of GitHub, which offers much stronger tooling than these alternatives.
  2. Collaboration: GitHub makes collaboration among team members easy. Contributors can suggest changes, discuss updates, and resolve questions through pull requests and issues.
  3. Accessibility: Hosting documentation on GitHub makes it easily accessible to a wide audience. Users can view, clone, or download the latest version of documentation from anywhere.
  4. Markdown Support: GitHub natively supports Markdown, a simple and powerful way to create and format documentation. Markdown lets you write clean, readable text with minimal effort.
  5. Integration and Automation: GitHub integrates with various tools and services. One common usage in documentation is the ability to connect GitHub content with static site generators (e.g., Jekyll, Docusaurus). This then allows documentation to be presented as a webpage with a clean interface for reading, but with the backend tools of GitHub for content management and collaborative creation.

Learn how to start using GitHub

To learn more about how to use GitHub, ADC has contributed content to this online book, which introduces research data management and how to use GitHub for people who write documentation. The project is itself an example of documentation hosted on GitHub, using the static site generator Jekyll to turn back-end Markdown pages into an HTML-based webpage.

From the GitHub introduction you can learn about how to navigate GitHub, write in Markdown, edit files and folders, work on different branches of a project, and sync your GitHub work with your local computer. All of these techniques are useful for working collaboratively on documentation and standards using GitHub.

Agri-food Data Canada is a partner in the recently announced Climate Smart Agriculture and Genomics project and is a member of the Data Hub. One of our outputs as part of this team has been the introduction to GitHub documentation.

Written by Carly Huitema