Naming Without Spaces
In research data management, small decisions about naming can have outsized consequences. One of the simplest—and most important—best practices is to avoid spaces when you are assigning names. While spaces may seem harmless and human-friendly, they often create problems when data is processed, shared, or analyzed across different tools and systems. The Semantic Engine for example, enforces simple rules for better attribute names without spaces.
The Problem with Spaces
Spaces can introduce ambiguity and errors in many computational environments. For example:
- Programming languages may require additional syntax (such as quotes or special characters) to correctly reference file names or fields containing spaces.
- APIs and URLs convert spaces into encoded forms (like
%20), making paths harder to read and increasing the chance of errors.
In contrast, using consistent, space-free naming makes data easier to automate, share, and scale—key goals in reproducible research.
A Better Approach: Naming Conventions
Instead of spaces, researchers use structured naming conventions that improve readability while remaining machine-friendly. The most common conventions are outlined below.
1. Snake Case (snake_case)
- Words are written in lowercase and separated by underscores.
- Example:
sample_id,gene_expression_level - Best for: Data schemas, file names, and many programming contexts (especially Python).
- Benefits: Clear separation between words, widely supported, easy to read.
2. Kebab Case (kebab-case)
- Words are lowercase and separated by hyphens.
- Example:
sample-id,gene-expression-level - Best for: File names and URLs.
- Benefits: Clean and readable; commonly used in web contexts.
- Note: Some programming languages interpret hyphens as minus signs, so avoid in variable names.
3. Camel Case (camelCase)
- The first word is lowercase; subsequent words are capitalized.
- Example:
sampleId,geneExpressionLevel - Best for: Programming variables (common in JavaScript and many APIs).
- Benefits: Compact and readable without separators.
4. Pascal Case (PascalCase)
- Each word starts with a capital letter, including the first.
- Example:
SampleId,GeneExpressionLevel - Best for: Class names and structured types in many programming languages.
- Benefits: Clearly distinguishes multi-word identifiers, often used for higher-level objects.
Choosing the Right Convention
The key is consistency. Pick a convention appropriate for your tools and stick with it across your dataset, codebase, and documentation.
Conclusion
Eliminating spaces in file and schema names is a small change that greatly improves the reliability and portability of research data. By adopting clear naming conventions like snake case, kebab case, camel case, or Pascal case, researchers create datasets that are easier to process, share, and reproduce.
Written by Carly Huitema
