Funding for Agri-food Data Canada is provided in part by the Canada First Research Excellence Fund
At Agri-food Data Canada we are helping our researchers make their research data more FAIR (Findable, Accessible, Interoperable and Reusable). Open data, where data is made available in archival data catalogues such as Borealis, is an important step in improving data FAIRness. However, it isn’t always feasible to make data openly available. Some data is sensitive because it contains personally identifying information (PII) and cannot be shared. Data is also often in an ‘active research state’: researchers are still collecting and analyzing it, and it is not ready to be made openly available.
An active research data catalogue lets researchers tell the community what types of data they are currently working with (or have available), so that they can identify collaborators or contribute to research projects without making the data itself available. Instead, the community can learn what data is being collected by searching catalogues of schemas: metadata that describes important details about the research data but contains no research data itself.
An active research data catalogue can serve many communities. A lab could host a catalogue so that all lab members can find out what data the team is collecting and in what formats. A lab catalogue also creates the opportunity to reuse schemas between projects, which promotes interoperability and makes it easier to combine datasets, reducing the work of ‘data wrangling’. Researchers working together in a larger project can help their community keep on top of what data everyone is collecting by creating a shared active research data catalogue for the project, storing project-specific schemas that describe the data being collected by project members. In the same vein, departments could host these schema repositories to help their members understand what data their colleagues are collecting, perhaps identifying future opportunities to collaborate.
At Agri-food Data Canada we are supporting these kinds of advancements with the development of the Semantic Engine. The Semantic Engine lets researchers, the people with the most familiarity with the data, easily write machine-readable data schemas. It guides researchers through a series of questions that ask them to describe their tabular data. Schemas can be improved gradually by reloading them into the Semantic Engine and adding more information as it becomes relevant or needed. Once a machine-readable schema has been generated, it can be used to produce a version of the schema for publishing on webpages, and these published versions are collected together in an active research data catalogue. If a schema becomes sufficiently refined, researchers could also deposit it in an archival data repository such as Borealis and give it a DOI, a persistent identifier that they and others can cite when they have data that follows that particular schema.
Examples of active research data repositories:
Food from Thought schema library
Agri-food research centre schema library
Connect with Agri-food Data Canada for help setting up a repository for your lab, project, department or anywhere else you’d like to see what research data is being collected. Our process is to create a GitHub repository hosting a just-the-docs website, which can display the markdown schema files generated by the Semantic Engine. Because we are using publicly available, open-source resources, there is no cost for the researchers who build and use the schema library. Free GitHub-hosted sites are required to be publicly available, so your free schema library would be searchable on the internet. For a fee, GitHub also supports private documentation websites if researchers are interested in this approach.
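To make the idea of a markdown schema page more concrete, here is a minimal Python sketch of how a schema description could be turned into a page that a just-the-docs site can list. It is an illustration only and not how the Semantic Engine works internally: the dictionary layout, the function name schema_to_markdown and the example columns are all hypothetical. The only just-the-docs conventions it relies on are the title and nav_order front matter keys.

```python
import json


def cell(value: object) -> str:
    """Escape pipe characters so a value cannot break the markdown table."""
    return str(value).replace("|", "\\|")


def schema_to_markdown(schema: dict, nav_order: int = 1) -> str:
    """Render a (hypothetical) schema dictionary as a markdown page.

    The page starts with YAML front matter so that a just-the-docs site
    can show it in the navigation. No research data is included, only
    the description of the columns.
    """
    lines = [
        "---",
        # json.dumps gives a double-quoted string, which is also valid YAML.
        f"title: {json.dumps(schema['title'])}",
        f"nav_order: {nav_order}",
        "---",
        "",
        f"# {schema['title']}",
        "",
        schema["description"],
        "",
        "| Attribute | Type | Unit | Description |",
        "|---|---|---|---|",
    ]
    for attr in schema["attributes"]:
        lines.append(
            f"| {cell(attr['name'])} | {cell(attr['type'])} "
            f"| {cell(attr['unit'])} | {cell(attr['description'])} |"
        )
    return "\n".join(lines) + "\n"


# Hypothetical example schema: an assumption for illustration, not real project metadata.
example_schema = {
    "title": "Soil sampling",
    "description": "Columns recorded for each soil core.",
    "attributes": [
        {"name": "plot_id", "type": "Text", "unit": "",
         "description": "Identifier of the field plot"},
        {"name": "depth", "type": "Numeric", "unit": "cm",
         "description": "Sampling depth"},
    ],
}

page = schema_to_markdown(example_schema)
print(page)
```

Saving the returned text as a .md file in the repository would give the catalogue one page per schema, each describing the columns of a dataset without exposing the data.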
Connect with us at ADC to set up an active research data catalogue today: adc@uoguelph.ca
© 2023 University of Guelph