FAIR

Findable

Accessible (where possible)

Interoperable

Reusable

Good day everyone!  We’re back looking at the FAIR principles and have now moved on to A for Accessible (where possible).  Let’s first review the two principles under Accessible:

A1. (Meta)data are retrievable by their identifier using a standardised communications protocol
A2. Metadata are accessible, even when the data are no longer available

Source: FAIR principles

Oh my!  What in the world are you talking about?  Metadata?  Identifier?  Accessible when data are no longer available?  Come on!  I’m a researcher who just wants to collect my data, document it as I need for my purposes, publish the papers and data where required, and move on to the next project.  Now you’re telling me I need to ensure my metadata has an identifier?  Like a DOI (digital object identifier)??  How in the world do I do that?  And what????  My metadata should be accessible even if my data is NOT available?  Ok – now I’m confused!

The above is a conversation I imagined having with my younger researcher self.  Let’s say 20 years ago, to put this into context.  However, I can almost imagine having this conversation with some of our researchers today!  Lucky for them, we have tools and services to help out with what appears to be a scary and very time-consuming proposition.

So, let’s walk through this.  I’ve been going on and on and on about metadata – I know!  What can I say?  I love my metadata and am trying to convince everyone around me that they should love their metadata too!  I’ve heard this saying a few times now: “Writing a love letter/note to one’s future self”, or something similar (those of you who know me are well aware of my inability to remember sayings).  This is a great way to think about metadata: writing a note to yourself to remind yourself about what you’re doing with your data.  Now let’s think about principle A1 above – we’re suggesting that you save your metadata, that love letter, and put it somewhere safe, in a place where you can retrieve it when you want it or need it.  Me, I have my notebooks.  YES, I have saved them all going back a number of years.  BUT, ask me to find something in them – ha ha ha!  Not going to happen quickly!  I’ll find it “sometime”.  Now if you take your metadata, love letter, or data schema and deposit it in… say… Borealis…  Guess what?  You will now have a unique identifier for it (a DOI) and you’ll be able to find it – not like me going through all my notebooks!  You’ve got nothing to lose and everything to gain – so let’s try it!  Check out Saving, Depositing and/or Publishing your Schema, or book an appointment with the ADC team to work through it with you.
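For the curious, here is roughly what “retrievable by their identifier using a standardised communications protocol” can look like in practice.  This is only a minimal sketch, not how Borealis or any particular repository works internally.  It assumes the identifier is a DOI, which resolves over plain HTTPS through https://doi.org/.  The function name `build_metadata_request` and the DOI `10.1234/EXAMPLE` are made up for illustration.  The `Accept` header asks for citation metadata in CSL JSON, a format that DOI registration agencies such as DataCite and Crossref offer through content negotiation; whether a given DOI supports it depends on where it was registered.

```python
from urllib.request import Request

DOI_RESOLVER = "https://doi.org/"  # the public DOI resolver


def build_metadata_request(doi: str) -> Request:
    """Build (but do not send) an HTTPS request for a DOI's metadata.

    Hypothetical helper for illustration only.
    """
    cleaned = doi.strip()
    # Accept DOIs written as "doi:10.xxxx/..." or as a full resolver URL.
    for prefix in ("doi:", DOI_RESOLVER):
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
    if not cleaned.startswith("10."):
        raise ValueError(f"not a DOI: {doi!r}")
    # Content negotiation: ask for machine-readable citation metadata
    # instead of the human-readable landing page.
    headers = {"Accept": "application/vnd.citationstyles.csl+json"}
    return Request(DOI_RESOLVER + cleaned, headers=headers)


# Placeholder DOI, not a real deposit.
request = build_metadata_request("doi:10.1234/EXAMPLE")
print(request.full_url)
print(request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would actually fetch the record.  The point is simply that the identifier plus an everyday protocol (HTTPS) is all anyone needs to get at your metadata – no digging through notebooks required.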

Alrighty, we’re moving towards increased accessibility of our data and metadata.  Now, what about data that, for one reason or another, canNOT be shared or made available?  Why in the world should I still create that metadata, love letter, data schema, and deposit it?  This is where I feel we have failed the world of science over the past few decades.   Isn’t science about sharing and building knowledge?  How can we do this if we are not aware of the data that has been collected or the studies that have been conducted?  I know, I know, I should stay on top of journal publications!  BUT!  Are all studies published?  Heck no!  Why?  In some cases, because study results are not statistically significant, so why publish?  If you don’t publish, how can anyone be aware of the data that may have been collected?   Word of mouth only goes so far!  So let’s publish that data schema, metadata, love letter!  Remember it’s the metadata only – NO data!  So NO excuses!

Here’s another reason why you should publish your data schemas – citations!  YES!  You can cite a data schema the same way you would cite a paper.  Hmm…  hang on now, citations of my work…  yes!  A benefit to you!!

Let’s round this all up by saying that the more (meta)data we make accessible, the more knowledge we build.  There really is NO downside to sharing your metadata!  So let’s do it!

Michelle

Findable

Accessible (where possible)

Interoperable

Reusable

Oh my INTEROPERABILITY!   What a HUGE word this is, and let’s be honest: are we all comfortable with what it means?  Remember data – interoperability…..  Ah, but let’s start with a silly and very basic example first:  a tool – more specifically a wrench.

Tools, let’s think about a wrench – you know, that tool that helps us remove or tighten nuts – ok, before someone pipes up and corrects me – let’s be more specific:  a ratcheting or combination wrench (google it to get a picture).  They come in different sizes – which is great and really helpful for those tough nuts, since they fit snugly around the nut and you can really yank on that wrench to loosen it.   I’m sure many of you can relate to this.  However, how many times have you tried this and that darn wrench is just a little too big or just a little too small – and we know that those nuts are a standard size!  What’s going on???  Don’t laugh too hard at this analogy – I don’t know how many times we’ve encountered this in our household.  Metric vs SAE – yup!  Different standards for the tools – Ugh!  Now where’s my 8mm, or was it the 1/4 inch?

If these were interoperable, I should be able to use one wrench for my nuts regardless of whether they were metric or SAE.  But alas, there are 2 standards, and the only way around them is to use an adjustable wrench – aha, an answer to the 2 standards and a way to interoperability?

Let’s turn to data now – how do we gather information about the data we are working with?  Metadata!  The metadata will, or rather should, tell us how the data was measured, whether any standards were used, whether the metadata itself follows a standard, and in general what data we’re working with.  Now, let’s say we are working with weights measured on dogs.  I am more comfortable weighing my dogs in pounds (lb) and my colleague is more comfortable weighing their dogs in kilograms (kg).  What happens when I pool these 2 data sources together?  I may have an interesting set of data with a mix of weights in lb and kg.  I may well have a Great Dane who appears to weigh the same as a Chihuahua!  I NEED that metadata to help me, as a researcher or data user, understand what my data represents and what transformations I may or may not need in order to pool the data!

Without this information I cannot integrate different sources of data!  Think of that wrench – my nuts were metric and my wrench was SAE – it just won’t fit.  The only difference with data – I can still pool all that data and come up with interesting and nonsensical results – that just won’t work with the wrench.
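To make the dog example concrete, here is a small sketch of what “use the metadata before you pool” could look like.  Everything in it is invented for illustration: the two datasets, the dog weights, the dictionary layout, and the function name `pool_weights_kg`.  The one hard fact is the conversion factor, since a pound is defined as exactly 0.45359237 kg.

```python
LB_TO_KG = 0.45359237  # exact definition of the international pound

# Conversion factors to a common unit (kilograms).
TO_KG = {"kg": 1.0, "lb": LB_TO_KG}

# Each dataset travels with a tiny metadata record naming its unit.
# All values below are made up for this example.
my_dogs = {
    "metadata": {"variable": "weight", "unit": "lb"},
    "data": {"Great Dane": 140.0, "Chihuahua": 5.0},
}
colleague_dogs = {
    "metadata": {"variable": "weight", "unit": "kg"},
    "data": {"Labrador": 30.0},
}


def pool_weights_kg(*datasets):
    """Pool weights from several datasets, converting everything to kg.

    Refuses to pool a dataset whose metadata does not name a known unit,
    rather than silently mixing lb and kg.
    """
    pooled = {}
    for dataset in datasets:
        unit = dataset.get("metadata", {}).get("unit")
        if unit not in TO_KG:
            raise ValueError(f"cannot pool: unknown or missing unit {unit!r}")
        for dog, weight in dataset["data"].items():
            pooled[dog] = round(weight * TO_KG[unit], 2)
    return pooled


pooled = pool_weights_kg(my_dogs, colleague_dogs)
for dog, kg in pooled.items():
    print(f"{dog}: {kg} kg")
```

The design choice worth noticing is the `ValueError`: a dataset with no unit in its metadata gets rejected rather than guessed at.  That is the wrench refusing to fit, which is a lot better than a Great Dane quietly weighing the same as a Chihuahua.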

So interoperability, when we think of the FAIR principles, is the ability to integrate data from different sources, as long as the following hold:

I1. (Meta)data use a formal, accessible, shared, and broadly applicable language for knowledge representation.
I2. (Meta)data use vocabularies that follow FAIR principles
I3. (Meta)data include qualified references to other (meta)data

Source: FAIR Principles

Michelle

Findable

Accessible (where possible)

Interoperable

Reusable

I believe most of us are now familiar with this acronym?  The FAIR principles were published in 2016.  I have to admit that part of me really wants to create a song around these 4 words – but I’ll save you all from that scary venture.  Seriously though, how many of us are aware of the FAIR principles?  Better yet, how many of us are aware of the impact of the FAIR principles?  Over my next blog posts we’ll take a look at each of the FAIR letters and I’ll pull them all together with the RDM posts – YES, there is a relationship!

So, YES, I’m working backwards and there’s a reason for this.  I really want to “sell” you on the idea of FAIR.  Why do we consider this so important and a key to effective Research Data Management?  Oh heck, it is also a MAJOR key to science today.

R is for Reusable

Reusable data – hang on – you want to REUSE my data?  But I’m the only one who understands it!   I’m not finished using it yet!  This data was created to answer one research question, there’s no way it could be useful to anyone else!  Any of these statements sound familiar?   Hmmm…  I may have pointed some of these out in the RDM posts – but aside from that – truthfully, can you relate to any of these statements?  No worries, I already know the answer and I’m not going to ask you to confess to believing or having said or thought any of these.  Ah I think I just heard that community sigh of relief 🙂

So let’s look at what can happen when a researcher does not take care of their data or does not put measures into place to make their data FAIR – remember we’re concentrating on the R for reusability today.

Reproducibility Crisis?

Have you heard about the reproducibility crisis in our scientific world?  The inability to reproduce published studies.  Imagine statements like this: “…in the field of cancer research, only about 20-25% of the published studies could be validated or reproduced…” (Miyakawa, 2020).  How scary is that?  Sometimes when we think about reproducibility and reuse of our data, the questions that come to mind – at least my mind – are: why would someone want my data?  It’s not that exciting!  But boy oh boy, when you step back and think about the bigger picture – holy cow!!!  We are not just talking about data in our little neck of the woods – this challenge of making your research data available to others has a MUCH broader and larger impact!  20-25% of published studies!!!  And that’s just in the cancer research field.  If you start looking into this crisis you will see other numbers too!

So, really, what’s the problem here?   Someone cannot reproduce a study – maybe it’s the age of the equipment, or my favourite – the statistical methodologies were not written in a way that would let the reader reproduce the results even IF they had access to the original data.  There are many reasons why a study may not be reproducible – BUT – our focus is the DATA!

The study I referred to above also talks about some of the issues the author encountered in his capacity as a reviewer.  The issue that I want to highlight here is lack of access to the RAW data, or insufficient documentation about the data – aha!!  That’s the link to RDM.  Creating adequate documentation about your data will only help you and any future users of your data!  Many studies cannot be reproduced because the raw data is NOT accessible and/or it is NOT documented!

Pitfalls of NO Reusable Data

There have been a few notable researchers who have lost their careers because of their data, or rather lack thereof.  One example is Brian Wansink, formerly of Cornell University.  His research was ground-breaking at the time: studying eating habits, looking at how cafeterias could make food more appealing to children – it was truly great stuff!  BUT…..  when asked for the raw data…..  that’s when everything fell apart.  To learn more about this situation follow the link I provided above that will take you to a TIME article.

This is a worst-case scenario – I know – but maybe I am trying to scare you!  Let’s start treating our data as a first-class citizen and not an artifact of our research projects.  FAIR data is research data that should be Findable, Accessible (where possible), Interoperable, and REUSABLE!  Start thinking beyond your study – one never knows when the data you collected during your MSc or PhD may be crucial to a study in the future.  Let’s ensure it’s available and documented – remember Research Data Management best practices – for the future.

Michelle