Funding for Agri-food Data Canada is provided in part by the Canada First Research Excellence Fund
Decentralised Semantics: A Semantic Engine user perspective
October 23-26. Salzburg, Austria.
Carly Huitema, A. Michelle Edwards, Paul Knowles, Philippe Page
The FAIR (Findable, Accessible, Interoperable and Reusable) data principles were created to guide the improvement of research data (https://www.go-fair.org/fair-principles/). As data curators and educators, we often see individual research groups and researchers establish their own unique data collection processes, resulting in poor and inconsistent data documentation. At the conclusion of a project, the data may be accessible to and understood by members of the team, but it is often not usable by anyone outside of those most closely associated with data collection and analysis. This is generally because the documentation lacks the context that other researchers need to understand and use the data.
An important source of contextual information about a dataset is the data schema. A schema describes the structure of the data, and aspects of it are often called a data dictionary. A schema contains more than just the attribute or variable names; it can include important details such as descriptions of attributes, language-specific labels, rules for data validation, ontological tagging, and machine-readable formatting rules. To ensure that schemas are FAIR, it is important not only to help researchers write schemas but also to make it easy for them to create machine-actionable schemas.
At Agri-food Data Canada we have adopted Overlays Capture Architecture (OCA), a global, open, extensible standard for writing machine-actionable data schemas, which is hosted at the non-profit Human Colossus Foundation in Switzerland. The OCA format recognizes that a schema performs multiple tasks: for example, human understanding via descriptions and labels, internationalisation via information in multiple languages, and machine understanding through data validation rules. The OCA architecture for describing a schema is built on a ‘capture base’, which describes the bedrock foundation of a schema structure distilled to the most pertinent and minimal details. To the capture base, the OCA architecture adds task-specific overlays that reference the content-addressable identifier of the capture base. All these components are collected together in a schema bundle, which is machine-actionable and can be stored with the data or made more accessible by being saved in a repository, where it can be given a persistent identifier to be referenced by the community. To take advantage of the benefits of OCA schemas and to simplify their creation for researchers, we have created tools that allow researchers to easily compose OCA schemas, initially via templates but also through a web interface.
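The relationship between a capture base, its overlays, and a schema bundle can be illustrated with a minimal Python sketch. This is not the OCA serialization: the field names, the attribute names (animal_id, milk_yield_kg), and the use of a SHA-256 digest over sorted JSON are all assumptions chosen for illustration, and the OCA specification defines its own identifier and file formats.

```python
import hashlib
import json


def digest(obj: dict) -> str:
    """Return a SHA-256 hex digest of a canonical (sorted-key) JSON encoding.

    Stand-in for a content-addressable identifier; not the OCA algorithm.
    """
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Capture base: only the minimal structure (hypothetical attributes and types).
capture_base = {
    "type": "capture_base",
    "attributes": {"animal_id": "Text", "milk_yield_kg": "Numeric"},
}
base_id = digest(capture_base)


def make_overlay(kind: str, content: dict) -> dict:
    """Build a task-specific overlay that references the capture base by digest."""
    overlay = {"type": kind, "capture_base": base_id}
    overlay.update(content)
    return overlay


# Two label overlays, one per language, added without touching the capture base.
label_en = make_overlay(
    "label",
    {"language": "en", "labels": {"animal_id": "Animal ID", "milk_yield_kg": "Milk yield (kg)"}},
)
label_fr = make_overlay(
    "label",
    {"language": "fr", "labels": {"animal_id": "ID de l'animal", "milk_yield_kg": "Production de lait (kg)"}},
)

# Bundle: the capture base plus every overlay that points at it.
bundle = {"capture_base": capture_base, "overlays": [label_en, label_fr]}

# Adding overlays leaves the capture base, and therefore its digest, unchanged.
print(base_id == digest(bundle["capture_base"]))
```

Because each overlay carries the digest of the capture base rather than a copy of it, overlays written by different people can be checked against the same structure, and any change to the capture base itself produces a different identifier.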
The OCA schema standard is a useful tool for anyone wanting to document datasets. The flexibility of OCA means that users can collaboratively or independently improve schemas through the creation of additional overlays while the unique identity of each is preserved through cryptographic digests. Any OCA schema is machine-actionable and ready to be incorporated into workflows. The tools for creating schemas are user friendly, open, accessible, and extensible. All these features of OCA ensure that it can be adapted and adopted into a variety of data ecosystems and technologies.
During our presentation, we will introduce OCA and describe how researchers have adopted OCA schemas at the University of Guelph.
May 22-26, 2023. Toledo, Spain.
Agri-food Data Canada: A data ecosystem serving agri-food sustainability
Lucas Alcantara, Carly Huitema, A. Michelle Edwards
Agri-food Data Canada (ADC) is creating a data ecosystem serving agri-food sustainability. Through investments in technology, infrastructure, and culture, we are helping researchers and the research community get more value from the data researchers are already collecting. Agri-food Data Canada’s approach is guided by the FAIR data principles (that data should be Findable, Accessible, Interoperable and Reusable). To improve data FAIRness, ADC is 1) creating a semantic engine that will help researchers create and use better machine-actionable, reusable, and accessible descriptions and governance for their data, projects, algorithms, tools, workflows, and other digital research outputs; 2) collaborating on projects supporting the federation of data silos, to ensure that metadata and access rights can travel with the data from source to destination within the ADC federation; 3) developing tools to help researchers with data provenance and traceability; and 4) creating a culture of FAIR data by developing knowledge-sharing resources such as webinars, training, and teaching materials. ADC works with partners to align our approaches and contribute to the global research community, with the goal of ensuring that research data is FAIR. One collection of tools under development at ADC is the Semantic Engine. While there are many approaches to harmonizing data through the creation of data platforms, ADC sees value in enriching heterogeneous data with tools that improve data without requiring data platform infrastructure. Researchers can improve their data documentation workflows by adding context to their data through the creation of machine-actionable data schemas. At the heart of the Semantic Engine is the Overlays Capture Architecture (OCA), an international open standard created by the non-profit organization Human Colossus Foundation. OCA has a layered architecture that is machine-actionable and easy to generate.
OCA schemas allow multiple contributors to improve a schema independently and permit the bundling of schemas with appropriate task-specific schema overlays. Schemas can be internationalized through the creation of language-specific overlays, and adding them does not change the underlying structure of the schema, which ensures interoperability and allows schemas to be continually improved throughout the dataset’s lifecycle. OCA also permits the use of downstream data validation rules carried by schemas and enables the incorporation of ontological terms. For example, ontologies, terms, and data standards endorsed by ICAR can be added to schemas to improve data interoperability and harmonization, which are essential for advancing the international agri-food sector. Agri-food Data Canada is developing a powerful collection of tools and creating a data ecosystem that will reduce barriers to data documentation, ease data sharing, and support the international agri-food sector’s data needs.
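As a rough illustration of validation rules travelling with a schema and being applied downstream, the following Python sketch checks a record against a small rule set. The rule layout, the attribute names, the identifier pattern, and the breed codes are all hypothetical; they are not taken from the OCA specification or from any ICAR standard.

```python
import re

# Hypothetical validation rules, keyed by attribute name, as a schema might carry them.
rules = {
    "animal_id": {"pattern": r"[A-Z]{2}\d{6}"},   # two letters followed by six digits
    "breed": {"allowed": {"HO", "JE", "AY"}},      # illustrative code list only
}


def validate(record: dict) -> list:
    """Return a list of human-readable problems found in one data record."""
    errors = []
    for attr, rule in rules.items():
        value = record.get(attr)
        if value is None:
            errors.append(f"{attr}: missing")
            continue
        # Format rule: the whole value must match the pattern.
        if "pattern" in rule and not re.fullmatch(rule["pattern"], str(value)):
            errors.append(f"{attr}: '{value}' does not match the expected format")
        # Entry-code rule: the value must be one of the allowed codes.
        if "allowed" in rule and value not in rule["allowed"]:
            errors.append(f"{attr}: '{value}' is not an allowed code")
    return errors


print(validate({"animal_id": "CA123456", "breed": "HO"}))
print(validate({"animal_id": "123", "breed": "XX"}))
```

Keeping the rules in data rather than in code means the same checks can be rerun by anyone who receives the schema alongside the dataset.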
April 18-20, 2023. Mountain View, California, United States of America.