Data Quality Annotations

What do you do when you’ve collected data but you need to also include notes in the data. Do you mix the data together with the notes?

Here we build on our previous blog post describing data quality comments with worked examples.

An example of quality comments embedded into numeric data is if you include values such as NULL or NA when you have a data table. Below are some examples of datatypes being assigned to different attributes (variables v1-v8). You can see in v5 that there is are numeric measurements values mixed together with quality notations such as NULL, NA, or BDL (below detection limit).

Examples of different types of datatype classifications.
Examples of different types of datatype classifications.

Technically, this type of data would be given the datatype of text when using the Semantic Engine. However, you may wish to use v5 as a numeric datatype so that you can perform analysis with it. You could delete all the text values, but then you would be losing this important data quality information.

As we described in a previous blog post, one solution to this challenge is to add quality comments to your dataset. How you would do this is demonstrated in the next data example.

In this next example there are two variables: c and v. Variable v contains a mixture of numeric values and text.

step 1: Rename v to v_raw. It is good practice to always keep raw data in its original state.

step 2: copy the values into v_analysis and here you can remove any text values and make other adjustments to values.

step 3: document your adjustments in a new column called v_quality and using a quality code table.

The quality code table is noted on the right of the data. When using the Semantic Engine you would put this in a separate .csv file and import it as an entry code list. You would also remove the highlighted dataypes (numeric, text etc.) which don’t belong in the dataset but are written here to make it easier to understand.

Example of adding additional rows of data to a dataset with quality comments.
Quality Annotations

You can watch the entire example being worked through using the Semantic Engine in this YouTube video. Note that even without using the Semantic Engine you can annotate data with quality comments, the Semantic Engine just makes the process easier.

 

Written by Carly Huitema