GitHub

Oh WOW! Back in October I talked about possible places to store and search for data schemas. For a quick reminder check out Searching for Data Schemas and Searching for variables within Data Schemas. I think I also stated somewhere or rather sometime, that as we continue to add to the Semantic Engine tools we want to take advantage of processes and resources that already exist – like Borealis, Lunaris, and odesi. In my opinion, by creating data schemas and storing them in these national platforms, we have successfully made our Ontario Agricultural College research data and Research Centre data FAIR.

But, there’s still a little piece to the puzzle that is missing – and that my dear friends is a catalogue. Oh I think I heard my librarian colleagues sigh :). Searching across National platforms is fabulous! It allows users to “stumble” across data that they may not have known existed and this is what we want. But, remember those days when you searched for a book in the library – found the catalogue number – walked up a few flights of stairs or across the room to find the shelf and then your book? Do you remember what else you did when you found that shelf and book? Maybe looked at the titles of the books that were found on the same shelf as the book you sought out? The titles were not the same, they may have contained different words – but the topic was related to the book you wanted. Today, when you perform a search – the results come back with the “word” you searched for. Great! Fabulous! But it doesn’t provide you with the opportunity to browse other related results. How these results are related will depend on how the catalogue was created.

I want to share with you the beginnings of a catalogue or rather catalogues. Now, let’s be clear, when I say catalogue, I am using the following definition: “a complete list of items, typically one in alphabetical or other systematic order.” (from the Oxford English Dictionary). We have started to create 2 catalogues at ADC – one is the Agri-food research centre schema library listing all the data schemas currently available at the Ontario Dairy Research Centre and the Ontario Beef Research Centre and the second is a listing of data schemas being used in a selection of Food from Thought research projects.

As we continue to develop these catalogues, keep an eye out for more study level information and a more complete list of data schemas.

image created by AI

GitHub is more than just a code repository, it is a a powerful tool for collaborative documentation and standards development. GitHub is an important tool for the development of FAIR data. In the context of writing and maintaining documentation, GitHub provides a comprehensive ecosystem that enhances the quality, accessibility, and efficiency of the process. Here’s why GitHub is invaluable for documentation:

Version Control: Every change to the documentation is tracked, ensuring that edits can be reviewed, reverted, or merged with ease. This enables a clear history of revisions with clear authorship identified to contributors. While this is possible using a tool such as Google Docs or even Word, version control is a central feature of GitHub and it offers much stronger tooling compared to other methods.
Collaboration: GitHub makes collaboration easy among team members. Contributors can suggest changes, discuss updates, and resolve questions through pull requests and issues.
Accessibility: Hosting documentation on GitHub makes it easily accessible to a wide audience. Users can view, clone, or download the latest version of documentation from anywhere.
Markdown Support: GitHub natively supports Markdown which is a simple and powerful way to create and format documentation. Markdown lets you write clean, readable text with minimal effort.
Integration and Automation: GitHub integrates with various tools and services. One common usage in documentation is the ability to connect GitHub content with static site generators (e.g., Jekyll, Docusaurus). This then allows documentation to be presented as a webpage with a clean interface for reading, but with the backend tools of GitHub for content management and collaborative creation.

Learn how to start using GitHub

To learn more about how to use GitHub, ADC has contributed content to this online book with introductions to research data management and how to use GitHub for people who write documentation. This project itself is an example of documentation hosted in GitHub and using the static site generator Jekyll to turn back-end markdown pages into an HTML-based webpage.

From the GitHub introduction you can learn about how to navigate GitHub, write in Markdown, edit files and folders, work on different branches of a project, and sync your GitHub work with your local computer. All of these techniques are useful for working collaboratively on documentation and standards using GitHub.

Agri-food Data Canada is a partner in the recently announced Climate Smart Agriculture and Genomics project and is a member of the Data Hub. One of our outputs as part of this team has been the introduction to GitHub documentation.

Written by Carly Huitema

Funding for Agri-food Data Canada is provided in part by the Canada First Research Excellence Fund

University of Guelph

50 Stone Road East,
Guelph, Ontario, Canada
N1G 2W1

Call: (226) 971-0357

Resources

Quick Links

University of Guelph