new feature

There is a new feature just released in the Semantic Engine!

Once you have written your schema, you can use it to enter and verify data in your web browser.

After you have uploaded a schema, you will find the link to the new tool in the Quick Links list. Watch our video tutorial on how to easily create your own schema.

Link to the Data Entry Web plus Verification tool in the Quick Links section.

Add data

The Data Entry Web (DEW) tool lets you upload your schema and, optionally, a dataset. If you choose to upload a dataset, remember that Agri-food Data Canada and the Semantic Engine tool never receive your data. Instead, your data is ‘uploaded’ into your browser and all the data processing happens locally.

If you don’t want to upload a dataset, you can skip this step and go right to the end, where you can enter and verify your data in the web browser. Add blank rows using the ‘Add rows’ button at the bottom and then enter your data. You can hover over a ? to see what data is expected, or click on ‘verification rules’ to see the schema again to help you enter your data.


Screenshot of entering data following the rules of a schema using Data Entry Web.

If you upload your dataset, you will be able to use the ‘match attributes’ feature. If your schema and your dataset use the same column headers (aka variables or attributes), the DEW tool will automatically match those columns with the corresponding schema attributes. Unmatched data column headers are listed in the unassigned variables box to help you identify what is still available to be matched. You can create a match by selecting the correct column name in the associated drop-down. You can also unmatch an assigned match by selecting its column name.
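
To make the matching idea concrete, here is a minimal Python sketch. It is not the DEW tool's actual code: the function name `match_attributes` and the sample attribute and column names are made up for illustration, and it assumes that "the same column headers" means an exact, case-sensitive match.

```python
def match_attributes(schema_attributes, dataset_headers):
    """Pair each schema attribute with an identically named dataset column.

    Returns (matches, unassigned): matches maps schema attribute -> column
    header, and unassigned lists the headers still available for matching.
    Exact, case-sensitive matching is an assumption made for this sketch.
    """
    headers = set(dataset_headers)
    matches = {attr: attr for attr in schema_attributes if attr in headers}
    matched_columns = set(matches.values())
    unassigned = [h for h in dataset_headers if h not in matched_columns]
    return matches, unassigned


# Hypothetical schema attributes and dataset column headers.
schema_attributes = ["farm", "sample_date", "sample_time"]
dataset_headers = ["farm", "Date", "sample_time", "notes"]

matches, unassigned = match_attributes(schema_attributes, dataset_headers)

# A manual match, like picking a column name in the drop-down.
matches["sample_date"] = "Date"
unassigned.remove("Date")

print(matches)
print(unassigned)
```

In this sketch, "notes" stays in the unassigned list because nothing in the schema corresponds to it.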

Matching attributes between schema and dataset in the DEW tool.

Matching data does two things:

1) Lets you verify the data in a data column (aka variable or attribute) against the rules of the schema. No matching, no verification.

2) When you export data from the DEW tool, you have the option of renaming your columns to the matching schema attribute names. This will automate future matching attempts and can also help you harmonize your dataset to the schema. No matching, no renaming.

Verify data

After you have either entered or ‘uploaded’ data, it is time to use one of the important tools of DEW – the verification tool! (Read our blog post about why it is verification and not validation.)

Verification works by comparing the data you have entered against the rules of the schema. It can only verify against the schema rules, so if a rule isn’t documented or described correctly in the schema, the data won’t verify correctly either. You can always schedule a consultation with ADC to receive one-on-one help with writing your schema.


Verifying data using a schema in the DEW tool of the Semantic Engine.

In the above example you can see the first variable/attribute/column is called farm, and the DEW tool displays it as a list to select items from. In your schema you would set this feature up by making the attribute a list (aka entry codes). The other errors we can see in this table are the times. When you look up the schema rules (either via the link to verification rules, which pops up the schema for reference, or by hovering over the column’s ?), you can see the expected time should be in ISO standard format (HH:MM:SS), which means two digits for the hour. A correct time would be something like 09:15:00. These format rules and more are available as the format overlay in the Semantic Engine when writing your schema. See the figure below for an example of adding a format rule to a schema using the Semantic Engine.
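
If it helps to see the idea in code, here is a small Python sketch of the two kinds of check described above: a list (entry codes) rule and an HH:MM:SS format rule. It is only an illustration, not how the DEW tool is implemented. The regular expression, the `verify` function, and the farm codes `farm_a` and `farm_b` are all assumptions made for this example.

```python
import re

# HH:MM:SS with two digits for each part (hours 00-23).
# This pattern is an illustrative assumption, not the schema's stored rule.
TIME_FORMAT = re.compile(r"([01]\d|2[0-3]):[0-5]\d:[0-5]\d")

# Hypothetical rules: column name -> function returning True when a value passes.
RULES = {
    "farm": lambda value: value in {"farm_a", "farm_b"},  # entry codes (a list)
    "sample_time": lambda value: TIME_FORMAT.fullmatch(value) is not None,
}


def verify(rows, rules):
    """Return (row index, column, value) for every value that breaks a rule."""
    errors = []
    for index, row in enumerate(rows):
        for column, check in rules.items():
            value = row.get(column, "")
            if not check(value):
                errors.append((index, column, value))
    return errors


rows = [
    {"farm": "farm_a", "sample_time": "09:15:00"},
    {"farm": "farm_c", "sample_time": "9:15:00"},  # unknown code, one-digit hour
]

errors = verify(rows, RULES)
for index, column, value in errors:
    print(f"row {index}: '{value}' does not follow the rule for '{column}'")
```

Only columns that have a rule are checked here, which mirrors the point made earlier: no matching, no verification.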

Add format rules for data entry using the Semantic Engine

Export data

A key thing to remember: because ADC and the Semantic Engine never store your data, if you leave the webpage, you lose the data! After you have done all the hard work of fixing your data, you will want to export it to keep your results.

You have a few choices when you export the data. If you export to .csv, you have the option of keeping your original data headers or changing your headers to the matched schema attributes. When you export to Excel, you will generate an Excel file that follows our Data Entry Excel template. The first sheet will contain all the schema documentation and the next sheet will contain your data with the matching schema attribute names.
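
The two .csv export choices can be sketched in a few lines of Python. Again, this is an illustration under assumptions rather than the tool's own code: `export_csv`, the column names, and the shape of the `matches` mapping (schema attribute to dataset column) are invented for the example.

```python
import csv
import io


def export_csv(rows, original_headers, matches=None):
    """Write rows as CSV text.

    If matches (schema attribute -> dataset column) is given, matched columns
    are renamed to their schema attribute; unmatched columns keep their header.
    """
    rename = {column: attribute for attribute, column in (matches or {}).items()}
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([rename.get(header, header) for header in original_headers])
    for row in rows:
        writer.writerow([row.get(header, "") for header in original_headers])
    return out.getvalue()


# Hypothetical data and matches.
rows = [{"Farm": "farm_a", "Time": "09:15:00"}]
headers = ["Farm", "Time"]
matches = {"farm": "Farm", "sample_time": "Time"}

original = export_csv(rows, headers)           # keeps the original headers
renamed = export_csv(rows, headers, matches)   # uses schema attribute names

print(original)
print(renamed)
```

A file exported with the schema attribute names would match automatically the next time, since its headers are then the same as the schema's.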

The new Data Entry Web tool of the Semantic Engine can help you enter and verify your data. Reuse your schema and improve your data quality using these tools available at the Semantic Engine.


Written by Carly Huitema

Let’s take a little jaunt back to my FAIR posts.  Remember that first one?  R is for Reusable?  Now, it’s one thing to talk about data re-usability, but it’s an entirely different thing to put this into action.  Well, here at Agri-food Data Canada or ADC we like to put things into action, or think about it as “putting our money where our mouth is”.   Oh my!  I’m starting to sound like a billboard – but it’s TIME to show off what we’re doing!

Alrighty – data re-usability.  Last time I talked about this, I mentioned the reproducibility crisis and the “fear” of people other than the primary data collector using your data.  Let’s take this to the next level.  I WANT to use data that has been collected by other researchers, research labs, locales, etc… But now the challenge becomes – how do I find this data?  How can I determine whether I want to use it or whether it fits my research question without downloading the data and possibly running some pre-analysis, before deciding to use it or not?

ADC’s Re-usable Data Explorer App

How about our newest application, the Re-usable Data Explorer App? The premise behind this application is that research data will be stored in a data repository; for our instance we use Borealis, the Canadian Dataverse Repository. At the University of Guelph, I have been working with researchers in the Ontario Agricultural College for a few years now to help them deposit data from papers that have already been published – check out the OAC Historical Data project. There are currently almost 1,500 files that have been deposited, representing almost 60 studies. WOW! Now I want to explore what data there is and whether it is applicable to my study.

Let’s visit the Re-usable Data Explorer App and select Explore Borealis at the top of the page. You have the option to select Study Network or Data Review. Select Study Network and be WOWed. You can then select a department within OAC or the Historical project. I’m choosing the Historical project for the biggest impact! I also love the Authors option.


Look at how all these authors are linked, just based on the research data they deposited into the OAC historical project!  Select an author to see how many papers they are involved with and see how their co-authors link to others and so on.

But ok – where’s the data? Let’s go back and select a keyword. Remember, lots of files means you need a little patience for the entire keyword network to load!! Zoom in to select your keyword of choice – I’ll select “Nitrogen”. You will notice that the keywords need some cleaning up, and that will happen over the next few iterations of this project. Alright, nitrogen appears in 4 studies – let’s select Data Review at the top. Now I need to select one of the 4 studies – I selected the Replication Data for: Long-term cover cropping suppresses foliar and fruit disease in processing tomatoes.

What do I see?
All the metadata – at the moment this comes directly from Borealis – watch for data schemas to pop up here in the future!  Let’s select Data Exploration – OOOPS the data is restricted for this study – no go.

Alrighty let’s select another study:  Replication Data for: G18-03 – 2018 Greens height fertility trial

Metadata – see it!  Let’s try Data exploration – aha!  Looking great – select a datafile – anything with a .tab ending – and you will see a listing of the raw data.  Check out Data Summary and Data Visualization tabs!

heatmap

Wow!!  This gives me an idea of the relationship of the variables in this dataset and I can determine by browsing these different visualizations and summary statistics whether this dataset fits the needs of my current study – whether I can RE-USE this data!

Last question though – ok, I’ve found a dataset I want to use – how do I access it? Easy… Go to the Study Overview tab and scroll down to the DOI of the dataset. Click it or copy it into your browser and it will take you to the dataset in the data repository, where you can click Access Dataset to view your download options.

Data Re-use at my fingertips

Now isn’t that just great!  This project came from a real use case scenario and I just LOVE what the team has created!  Try it out and let us know what you think or if you run into any glitches!

I’m looking forward to the finessing that will take place over the next year or so – but for now enjoy!!

Michelle

The Semantic Engine has a new upgrade for importing existing entry codes!

If you don’t know what entry codes are, you can check out our blog post about how to use entry codes. We also walk through an example of entry codes in our video tutorial.

While you can type your entry codes and labels in directly when writing your schema, if you have a lot of entry codes it might be easier to import them. We already discussed how to import entry codes from a .csv file, or copy them from another attribute, but you can also import them from another OCA schema. You use the same process for uploading the schema bundle as you would for the .csv file.

Add entry codes by clicking on the upload arrow next to the attribute name.

The advantage of using entry codes from an existing schema is that you can reuse work that someone has already done. If you like their choice of entry codes, your schema can now include them too. After importing a list of entry codes, you can extend the list by adding more codes as needed.

You can watch an example of entry codes in action in our tutorial video.

Entry codes are very valuable and can really help with your data standardization. The Semantic Engine can help you add them to your data schemas.

Written by Carly Huitema

The Semantic Engine has gotten a recent upgrade for importing entry codes.

If you don’t remember what entry codes are, they help with data standardization and quality by limiting what people can enter in a field for a specific attribute. You can read more about entry codes in our entry code blog post.

Now, the Semantic Engine lets you upload a .csv file that contains your entry codes rather than typing them in individually. You can include the code and its labels in multiple languages in your entry code .csv file. Don’t worry if the languages don’t appear in your schema; you will have a chance to pick which ones you want to use.

An example .csv file with a code and its label in both French and English.

After you have created your .csv file, it is time to add the entry codes to your schema.

Once you have selected ‘list’ for your attribute, it will appear on the screen for adding entry codes. Select the up arrow to upload your .csv file containing the entry codes and their labels.

Add entry codes by clicking on the upload arrow next to the attribute name.

The Semantic Engine will try to auto-match the columns and will give you a screen to check the matching and fill in the correct fields if you need to.

Match the code and language columns with the correct columns from your imported .csv file.

After you have matched columns (and discarded what you don’t need) you now have imported entry codes into your schema. Now you can save the .csv files to reuse when it comes to adding more entry code overlays. It is best practice to include labels for all your languages in your schema, even if they are a repeat of the Code column itself. You can always change it later.
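
For readers who like to see the shape of the data, here is a rough Python sketch of what an entry code .csv and the column-matching step amount to. The column names (Code, English, French), the language keys, and the `load_entry_codes` function are assumptions for illustration only; they are not the Semantic Engine's file requirements or code. The fallback to the code itself follows the best-practice note above about giving every language a label.

```python
import csv
import io

# Hypothetical entry code file: one code column plus one label column per language.
ENTRY_CODES_CSV = """Code,English,French
B,Barley,Orge
O,Oats,Avoine
R,Rye,
"""


def load_entry_codes(text, code_column, language_columns):
    """Read entry codes and their labels from CSV text.

    language_columns maps a language key to the CSV column holding its labels.
    A missing label falls back to the code itself.
    """
    reader = csv.DictReader(io.StringIO(text))
    codes = {}
    for row in reader:
        code = row[code_column]
        codes[code] = {
            language: (row.get(column) or code)
            for language, column in language_columns.items()
        }
    return codes


codes = load_entry_codes(
    ENTRY_CODES_CSV,
    code_column="Code",
    language_columns={"eng": "English", "fra": "French"},
)
print(codes)
```

Leaving a language out of `language_columns` plays the role of discarding a column you don't need.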

Entry codes are an excellent way to support quality data entry as well as internationalization in your schemas. The Semantic Engine has made it easier to add them using .csv files.

Written by Carly Huitema