An engine to help researchers generate meaning for data.
The benefits of better data schemas
Data must be structured to be understood, and a schema describes that structure.

For example, a schema can describe what information is contained within the columns of a dataset. Researchers can tune the detail of descriptions in their data schemas depending on their needs.

The better a schema is, the more value it adds to the associated dataset. For researchers, this brings a host of benefits. By documenting your data better, you help your present self, your future self, and your collaborators. You can avoid mystery data and the time spent following references to figure out what you did with the data years ago. You can also avoid costly mistakes, such as realizing after hours of analysis that your assumptions about the data were wrong (or worse, publishing and having someone else discover that your conclusions rest on incorrectly interpreted data).
A better data schema also helps the wider research community when you share your data. Better documentation means that you spend less time answering questions from other data users. You can communicate the context of the data more clearly and help ensure your data is used appropriately. This can be especially valuable in cross-disciplinary research, where other people are less familiar with the conventions of your discipline.
How to easily write better schemas
Writing better data schemas can be difficult because of the amount of work involved and the knowledge needed to do it well. Agri-food Data Canada is creating the semantic engine to help researchers write better data schemas with less effort. We are developing the semantic engine together with researchers to ensure that it meets their needs.
To create the semantic engine, Agri-food Data Canada is partnering with the Human Colossus Foundation to adopt their work on Overlay Capture Architecture (OCA) as the underlying schema standard. Overlay Capture Architecture is an extensible, flexible, international, open, and machine-accessible standard for schemas.

An OCA schema takes a table representation of a schema and splits each feature into a separate layer. Each layer is a separate file (written in a machine-readable format) that references the Capture Base, the basic foundation of the schema describing the dataset. Adding layers adds more detail to the schema, making it easier to understand and use data that has been collected and structured according to that schema.
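As a rough illustration of this layering, the sketch below models a capture base and two layers as plain Python dictionaries. The field names (`attributes`, `capture_base`, `attribute_labels`, `attribute_units`), the example columns, and the use of a SHA-256 digest as the identifier are assumptions made for this example only. The OCA specification defines its own file format and identifier scheme, which this sketch does not reproduce.

```python
import hashlib
import json


def make_capture_base(attributes):
    """Build a minimal capture base and derive an identifier from its content.

    The SHA-256 digest is a stand-in for illustration; it is not the
    identifier scheme defined by the OCA specification.
    """
    body = {"type": "capture_base", "attributes": attributes}
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    return {**body, "id": digest}


def make_layer(capture_base, layer_type, content):
    """Build a layer that points back to the capture base by its identifier."""
    return {"type": layer_type, "capture_base": capture_base["id"], **content}


# Hypothetical dataset columns and their basic types.
capture_base = make_capture_base(
    {"plot_id": "Text", "yield_kg_ha": "Numeric", "harvest_date": "DateTime"}
)

# Each feature of the schema lives in its own layer (one file per layer).
label_layer = make_layer(
    capture_base,
    "label",
    {"language": "en",
     "attribute_labels": {"plot_id": "Plot identifier",
                          "yield_kg_ha": "Yield",
                          "harvest_date": "Harvest date"}},
)
unit_layer = make_layer(
    capture_base, "unit", {"attribute_units": {"yield_kg_ha": "kg/ha"}}
)

# Write each piece out as machine-readable JSON text.
files = {name: json.dumps(obj, indent=2)
         for name, obj in [("capture_base", capture_base),
                           ("label_en", label_layer),
                           ("unit", unit_layer)]}
```

Because each layer carries only the identifier of the capture base, layers can be stored, shared, and added independently of one another.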

There are many benefits to this layered schema architecture, especially improved interoperability and extensibility. Each layer is independent and references the unique identifier of the capture base. You can begin with a very basic schema and, as it becomes necessary (or as the schema becomes popular), add layers that reference the capture base and increase the schema's usability. You can also extend and improve other people’s schemas to fit your needs. For example, you can add a layer with the labels and information in your own language to make the schema easier to use for people who don’t speak the language of the original documentation. Rather than creating a new schema, you add new layers while keeping the same capture base, which keeps your data interoperable.
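The language example can be sketched in a few lines. Here a French label layer is added to an existing schema, and the identifier of the capture base is the same before and after. As before, the dictionary layout, the helper names `add_label_layer` and `labels_for`, and the SHA-256 identifier are hypothetical choices for illustration, not the OCA file format.

```python
import hashlib
import json


def base_id(attributes):
    """Stand-in identifier: SHA-256 of the canonical JSON of the attributes."""
    canonical = json.dumps(attributes, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


attributes = {"plot_id": "Text", "yield_kg_ha": "Numeric"}
schema = {
    "capture_base": {"id": base_id(attributes), "attributes": attributes},
    "layers": [
        {"type": "label", "language": "en",
         "capture_base": base_id(attributes),
         "labels": {"plot_id": "Plot identifier",
                    "yield_kg_ha": "Yield (kg/ha)"}},
    ],
}


def add_label_layer(schema, language, labels):
    """Append a label layer; the capture base itself is never modified."""
    unknown = set(labels) - set(schema["capture_base"]["attributes"])
    if unknown:
        raise ValueError(f"labels for unknown attributes: {sorted(unknown)}")
    schema["layers"].append({
        "type": "label",
        "language": language,
        "capture_base": schema["capture_base"]["id"],
        "labels": dict(labels),
    })


def labels_for(schema, language):
    """Return the labels of the first label layer in the given language."""
    for layer in schema["layers"]:
        if layer["type"] == "label" and layer["language"] == language:
            return layer["labels"]
    raise KeyError(language)


id_before = schema["capture_base"]["id"]
add_label_layer(schema, "fr", {"plot_id": "Identifiant de la parcelle",
                               "yield_kg_ha": "Rendement (kg/ha)"})
id_after = schema["capture_base"]["id"]
```

Since `id_before` and `id_after` are equal, data described by the original schema and data described by the extended schema still point to the same capture base.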

Layers can be more than just descriptions and labels. For example, you can add a data transformation layer, which contains the instructions for transforming data from another schema into your format. This might be important when you want to work with data recorded in unfamiliar units; the data transformation layer records how to convert data from one schema to another, making data collected with two different schema bases interoperable.
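A minimal sketch of what such a transformation layer might record, and how a tool could apply it, is shown below. The layer structure, the `rules` field, the placeholder identifiers, and the example columns are all assumptions for illustration; the source schema is imagined to store weights in pounds and the target schema in kilograms.

```python
# Hypothetical transformation layer: maps attributes of a source schema to
# attributes of a target schema, with an optional scale factor for units.
transformation_layer = {
    "type": "transformation",
    "source_capture_base": "source-schema-id (placeholder)",
    "target_capture_base": "target-schema-id (placeholder)",
    "rules": {
        # 1 lb is defined as exactly 0.45359237 kg.
        "weight_lb": {"target": "weight_kg", "scale": 0.45359237},
        # Renamed only; the value is copied unchanged.
        "animal": {"target": "animal_id", "scale": None},
    },
}


def apply_transformation(record, layer):
    """Convert one record from the source schema into the target schema."""
    converted = {}
    for source_name, rule in layer["rules"].items():
        value = record[source_name]
        if rule["scale"] is not None:
            value = value * rule["scale"]
        converted[rule["target"]] = value
    return converted


record = {"animal": "A-17", "weight_lb": 100.0}
converted = apply_transformation(record, transformation_layer)
# converted has the keys "weight_kg" and "animal_id"
```

Keeping the conversion rules in a layer, rather than in analysis scripts, means the instructions travel with the schema and can be reused by anyone combining the two datasets.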
The semantic engine being created by Agri-food Data Canada in partnership with the Human Colossus Foundation lets researchers create, use and export schemas using the flexible and extensible OCA standard. The semantic engine is an engine to help researchers generate meaning for data.