Documenting your work: Variable Names – Research Data Management (RDM)
The next stop on our RDM travels is “Documenting your work”. Those 3 words can scare a lot of people. Let’s face it: documenting means spending time writing things down or creating scripts, and it can be viewed as taking time away from conducting research and analysis. Yes, I know, I know, and anyone who has worked with me in the past knows that I value documentation VERY highly! Without documentation, your data is valuable to YOU at this moment, but 6 months or 5 years down the road it may become useless. On this note, before I start talking about the details of documenting your work, I would like to share the Data Sharing and Management Snafu in 3 Short Acts video. I cannot believe that this video is 10 years old, but it is still SO relevant. If you have not seen it, please watch it! It highlights WHY we are talking about RDM, and near the end it deals with our topic today: documenting your data.
Reference: NYU Health Sciences Library. “Data Sharing and Management Snafu in 3 Short Acts” YouTube, 19 Dec 2012, https://www.youtube.com/watch?v=N2zK3sAtr-4.
Variable Names
So let’s talk about variable names for your statistical analyses. Creating variable names is usually done with statistical analysis packages in mind. Let’s be honest: we only want to create the variable names once. If we have to rename them, we increase our chances of introducing oddities in our analyses and outputs. Hmm… could I be talking about personal experiences? How many times, in the past, have I fallen into the trap of naming my variables V1, V2, V3, etc., or ME1 or ME_final? It is so easy to fall into these situations, especially when we have a deadline looming. So let’s try to build some habits that will help us avoid these situations and help us create data documentation that can eventually be shared and understood by researchers outside of our inner circle. A great place to begin is by reviewing the naming characteristics of the most popular packages used by University of Guelph researchers, based on a survey I conducted in 2017.
Length of Variable Name
SAS: 32 characters long
Stata: 32 characters long
Matlab: 63 characters long
SPSS: 64 bytes long = 64 characters in English or 32 characters in Chinese
R: 10,000 characters long
1st Character of a Variable Name
SAS: MUST be a letter or an underscore
Stata: MUST be a letter or an underscore
Matlab: MUST be a letter
SPSS: MUST be a letter, an underscore or @,#,$
R: MUST be a letter, or a period that is not followed by a number
Special Characters in Variable Names
SAS: NOT allowed
Stata: NOT allowed
Matlab: NOT allowed
SPSS: ONLY Period, @, #, $ are allowed
R: ONLY Period is allowed
Case in Variable Names
SAS: Mixed case – Presentation only
Stata: Case sensitive
Matlab: Case sensitive
SPSS: Mixed case – Presentation only
R: Case sensitive
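Because some packages, such as SAS and SPSS, treat case as presentation only, two names that differ only by case can end up referring to the same variable once a dataset moves between packages. Here is a minimal Python sketch of how one might spot such pairs before they cause trouble. The function name `case_collisions` and the sample column names are made up for illustration.

```python
from collections import defaultdict


def case_collisions(names):
    """Group variable names that differ only by letter case."""
    groups = defaultdict(list)
    for name in names:
        # Names that lowercase to the same text land in the same group.
        groups[name.lower()].append(name)
    # Keep only the groups that contain more than one spelling.
    return {key: spellings for key, spellings in groups.items() if len(spellings) > 1}


# Hypothetical column names, for illustration only.
columns = ["Weight", "weight", "diet_a", "Diet_A", "price"]
print(case_collisions(columns))
# {'weight': ['Weight', 'weight'], 'diet_a': ['diet_a', 'Diet_A']}
```

Any name that shows up in the result is worth renaming before the data is shared.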
Recommended Best Practice for Variable Names
Based on the naming characteristics listed above, here are the Recommended Best Practices to consider when naming your variables:
- Set Maximum length to 32 characters
- ALWAYS start variable names with a letter
- Numbers can be used anywhere in the variable name AFTER the first character
- The ONLY special character to use in a variable name is the underscore “_”
- Do NOT use blanks or spaces
- Use lowercase
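If you would like to check a list of names against these recommendations, the rules can be written as a single pattern. The following is only a sketch in Python, not an official tool. The names `NAME_PATTERN` and `is_recommended_name` are my own, and the pattern assumes plain English letters.

```python
import re

# Assumed encoding of the recommendations above:
# a lowercase letter first, then lowercase letters, digits, or underscores,
# for a total length of at most 32 characters.
NAME_PATTERN = re.compile(r"[a-z][a-z0-9_]{0,31}")


def is_recommended_name(name):
    """Return True when the whole name follows the recommended practices."""
    return NAME_PATTERN.fullmatch(name) is not None


# Hypothetical candidate names, for illustration only.
for candidate in ["diet_a", "weight2", "2nd_weight", "Fibre cm", "ME_final"]:
    print(candidate, is_recommended_name(candidate))
# diet_a True
# weight2 True
# 2nd_weight False
# Fibre cm False
# ME_final False
```

The last three fail because they start with a number, contain a space, or use uppercase letters.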
Example Variable Names
Heading in Excel or description of the measure to be taken → variable name to be used in a statistical analysis
Diet A → diet_a
Fibre length in centimetres → fibre_cm
Location of farm → location
Price paid for fleece → price
Weight measured during 2nd week of trial → weight2
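When there are many Excel headings to convert, a small script can suggest a first draft of each name. The sketch below, in Python, applies only the mechanical rules (lowercase, underscores, a letter first, 32 characters at most). The function `suggest_name` and the `v_` prefix are assumptions made for illustration. Shortening “Fibre length in centimetres” to fibre_cm is still a decision for you to make, so treat the output as a starting point.

```python
import re

MAX_LENGTH = 32  # the recommended maximum length


def suggest_name(heading):
    """Suggest a draft variable name from a spreadsheet heading."""
    name = heading.strip().lower()
    # Replace every run of characters other than a-z and 0-9 with one underscore.
    name = re.sub(r"[^a-z0-9]+", "_", name).strip("_")
    # Names must start with a letter; "v_" is a hypothetical prefix.
    if not name or not name[0].isalpha():
        name = "v_" + name
    # Truncate, then drop any underscore left hanging at the end.
    return name[:MAX_LENGTH].rstrip("_")


print(suggest_name("Diet A"))            # diet_a
print(suggest_name("Location of farm"))  # location_of_farm
```

Whatever name you settle on, keep the original heading as the label for the variable.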
Label or description of the variable
Let’s ALWAYS make sure that the description or label behind each variable name stays documented. Check out the Semantic Engine, an easy-to-use tool to document your dataset!
Conclusion
Variable names are only one piece of the documentation for any study, but they are usually the first piece we work on as we collect our data or once we start the analysis. In the next RDM post, I will talk about the other aspects of documentation and present different ways to do it.