Searching for variables within Data Schemas
My last post was all about where to store your data schemas and how to search for them. Now let’s take it to the next step – how do I search for what’s INSIDE a data schema – in other words how do I search for the variables or attributes that someone has described in their data schema? A little caveat here – up to this point, we have been trying to take advantage of National data platforms that are already available – how can we take advantage of these with our services? Notice the words in that last statement “up to this point” – yes that means we have some new options and tools coming VERY soon. But for now – let’s see how we can take advantage of another National data repository odesi.ca.
Findable in the FAIR principles?
How can a data schema help us meet the recommendations of this principle? Well…. technically I showed you one way in my last post – right? Finding the data schema or the metadata about our dataset. But let’s dig a little deeper and try another example using the Ontario Dairy Research Centre (ODRC) data schemas to find the variables that we’re measuring this time.
As I noted in my last post there are more than 30 ODRC data schemas and each has a listing of the variables that are being collected. As a researcher who works in the dairy industry – I’m REALLY curious to see WHAT is being collected at the ODRC – by this I mean – what variables, measures, attributes. But, when I look at the README file for the data schemas in Borealis, I have to read it all and manually look through the variable list OR use some keyboard combination to search within the file. This means I need to search for the data schema first and then search within all the relevant data schemas. This sounds tedious and heck I’ll even admit too much work!
BUT! There is another solution – odesi.ca – another National data repository hosted by Ontario Council of University Libraries (OCUL) that curates over 5,700 datasets, and has recently incorporated the Borealis collection of research data. Let’s see what we can see using this interface.
Let’s work through our example – I want to see what milking variables are being used by the ODRC – in other words, are we collecting “milking” data? Let’s try it together:
- Visit odesi.ca
- To avoid the large number of results from any searches, let’s start by restricting our results to the Borealis entries – Under Collection – Uncheck Select All – Select Borealis Research Data
- In the Find Data box type: milking
- Since we are interested in what variables that collect milking data – change Anywhere to Variable
- Click Search
- I see over 20 results – notice that there are Replication dataset results along with the data schema results. The Replication dataset entries refer to studies that have been conducted and whose data has been uploaded and available to researchers – this is fabulous – it shows you how previous projects have also collected milking data.
For our purposes – let’s review the information related to the ODRC Data Schema entries. Let’s pick the first one on my list ODRC data schema: Tie stalls and maternity milkings. Notice that it states there is one Matching Variable? It is the variable called milking_device. If you select the the data schema you will see all the relevant and limited study level metadata along with a DOI for this schema. By selecting the variable you will also see a little more detail regarding the chosen attribute.
NOTE – there is NO data with our data schemas – we have added dummy data to allow odesi.ca to search at a variable level, but we are NOT making any data available here – we are using this interface to increase the visibility of the types of variables we are working with. To access any data associated with these data schemas, researchers need to visit the ODRC website as noted in the associated study metadata.
I hope you found this as exciting as I do! Researchers across Canada can now see what variables and information is being collected at our Research Centres – so cool!!
Look forward to some more exciting posts on how to search within and across data schemas created by the Semantic Engine. Go try it out for yourself!!!