Ensuring Code Consistency and Reproducibility with R Projects and renv in RStudio

Ensuring code consistency and reproducibility is paramount. Imagine collaborating on a project where each member uses different package versions, so the same script produces different results on different machines. Two fundamental steps guard against this: setting up an organized, self-contained R project, and using renv, a dependency manager for R, to manage project-specific packages and environments.

 

Creating an R project in RStudio is a straightforward process

 

Step 1: Open RStudio

Launch RStudio on your computer. If you haven’t installed RStudio yet, you can download it from the official website: RStudio Download Page.

Step 2: Create a New Project

Once RStudio is open, navigate to the top menu and click on “File” > “New Project”. A dialog box will appear with options for creating a new project.

Step 3: Choose Project Type

In the dialog box, you’ll see several project types to choose from. Select “New Directory” and then choose the type of project you want to create. For a generic R project, select “New Project”.

Step 4: Choose Project Directory

After selecting “New Project”, click “Next”. You’ll be prompted to choose a directory for your new project. This is where all your project files will be stored. You can either create a new directory or choose an existing one. I suggest you always create a new directory.

Step 5: Enter Project Name

Give your project a name in the “Directory name” field. This name will be used to name the new directory and will also be the name that identifies your project in RStudio.

Step 6: Additional Options

On the same screen, you’ll see additional options. Check the box that says “Create a git repository” if you want to initialize a Git repository for version control. Next, check the box that says “Use renv with this project” to use renv for managing project dependencies. This will automatically set up renv to manage your project’s package dependencies.

Step 7: Create Project

Once you’ve chosen a directory, entered a project name, and selected the desired options, click “Create Project”. RStudio will create the project directory, set up Git (if selected), activate renv, and open the project as a new RStudio session.

Step 8: Start Working

Your new project is now set up and ready to use. You’ll see the project directory in the “Files” pane on the bottom right of the RStudio interface. You can start working on your R scripts, import data, create plots, and more within this project.
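If you’d like to confirm from the console that the wizard created what you expected, a small check along these lines may help. It is only a sketch: describe_project() is a hypothetical helper written for this post, not an RStudio or renv function, and it assumes the usual layout of a .Rproj file, a .git folder, and an renv.lock file or renv folder in the project root.

```r
# Hypothetical helper: report which project components exist in `path`.
describe_project <- function(path = ".") {
  list(
    # Name of the RStudio project file, if any (e.g. "my-analysis.Rproj")
    rproj_file = list.files(path, pattern = "\\.Rproj$"),
    # TRUE when a Git repository was initialized in the project root
    has_git = dir.exists(file.path(path, ".git")),
    # TRUE when renv has left a lockfile or its own folder in the project
    has_renv = file.exists(file.path(path, "renv.lock")) ||
      dir.exists(file.path(path, "renv"))
  )
}

# Inspect the current working directory, which is the project root
# when the project is open in RStudio.
str(describe_project())
```

An empty rproj_file entry suggests the session is not running from the project root.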

 

Using renv Package Manager

 

If you already initialized renv when you created your project, skip to Step 2.

Step 1: Initializing renv

Start by installing and loading the renv package, then initialize it for the project with renv::init(). This only needs to be done once per project, so comment these lines out afterwards to avoid reinstalling and reinitializing every time the script runs.

# Install, load and initialize renv
install.packages("renv")
library(renv)
renv::init()
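Rather than commenting the setup lines out by hand, one option is to make them safe to re-run. The sketch below assumes that a missing renv.lock file means the project has not been initialized yet; needs_renv_init() is a hypothetical helper, not part of renv.

```r
# Hypothetical helper: TRUE when `project` has no renv lockfile yet.
needs_renv_init <- function(project = ".") {
  !file.exists(file.path(project, "renv.lock"))
}

# Run the setup only from an interactive session (such as the RStudio
# console), and only when the project has not been initialized.
if (interactive() && needs_renv_init()) {
  # Install renv itself only when it is missing
  if (!requireNamespace("renv", quietly = TRUE)) {
    install.packages("renv")
  }
  renv::init()
}
```

With this guard in place, the block can stay at the top of a script: once renv.lock exists, it does nothing.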

Step 2: Installing and Managing Packages

With renv activated, installing and managing packages becomes a breeze. You can install packages as usual from various sources like CRAN, GitHub, or even specific versions.

# Install the latest dplyr version
install.packages("dplyr")

# Or install a specific dplyr version directly using renv
renv::install("dplyr@1.0.7")

Step 3: Saving Project Dependencies

A crucial step in ensuring reproducibility is saving project dependencies. renv accomplishes this by creating a lockfile (renv.lock) that records the exact version of each package the project depends on. To make sure new or updated dependencies are added to the lockfile, create a snapshot of your project using renv::snapshot() after you install or update packages.

Step 4: Collaborating and Restoring Environments

Sharing your project with collaborators is seamless. Just share the project along with the renv.lock file. Collaborators can then restore the project environment to its exact state using renv::restore().
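Steps 3 and 4 can be sketched as two small wrappers, one for each side of the collaboration. The function names are made up for illustration; the renv calls inside them (status(), snapshot() and restore()) are the real ones.

```r
# Author side: run after installing or updating packages.
record_environment <- function() {
  renv::status()    # report how the library, lockfile and code differ
  renv::snapshot()  # write the current package versions to renv.lock
}

# Collaborator side: run after receiving or pulling the project.
recreate_environment <- function() {
  renv::restore()   # install the versions recorded in renv.lock
}
```

Both functions assume renv is installed and that they are called from inside the project.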

 

Why does this matter?

 

Let’s look at an example that shows how renv helps keep code consistent over time. Consider the “.by” argument, which dplyr introduced in version 1.1.0 to group data within a single call.

# Summarise mean height by species and homeworld
library(dplyr)

starwars %>%
  summarise(
    mean_height = mean(height),
    .by = c(species, homeworld)
  )

# If you run the code above, you will get the following on the R console:
# A tibble: 57 × 3
   species homeworld mean_height
   <chr>   <chr>           <dbl>
 1 Human   Tatooine         179.
 2 Droid   Tatooine         132
 3 Droid   Naboo             96
 4 Human   Alderaan         176.
 5 Human   Stewjon          182
 6 Human   Eriadu           180
 7 Wookiee Kashyyyk         231
 8 Human   Corellia         175
 9 Rodian  Rodia            173
10 Hutt    Nal Hutta        175

 

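Now, if your collaborators are using an older version of dplyr, say version 1.0.7, which does not have the “.by” argument, the same code still runs but returns a different result. Older versions treat “.by” as the name of a new column rather than as a grouping instruction, so no error warns you that anything went wrong.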

# Running the same code as above would return the following:
# A tibble: 174 × 2
   mean_height .by
         <dbl> <chr>
 1          NA Human
 2          NA Droid
 3          NA Droid
 4          NA Human
 5          NA Human
 6          NA Human
 7          NA Human
 8          NA Droid
 9          NA Human
10          NA Human
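Because the older version fails silently, a script that relies on “.by” could also check the installed dplyr version before running. This is a minimal sketch, assuming 1.1.0 is the first release with “.by” as described above; supports_by() and require_by() are hypothetical helpers, not dplyr functions.

```r
# Hypothetical helper: TRUE when `version` is at least 1.1.0,
# the dplyr release that introduced `.by`.
supports_by <- function(version) {
  utils::compareVersion(as.character(version), "1.1.0") >= 0
}

# Hypothetical helper: stop with a clear message when the installed
# dplyr is too old. The default looks up the installed version.
require_by <- function(installed = utils::packageVersion("dplyr")) {
  installed <- as.character(installed)
  if (!supports_by(installed)) {
    stop("dplyr ", installed, " predates `.by`; run renv::restore() first.")
  }
  invisible(TRUE)
}
```

Calling require_by() at the top of a script stops it early with a clear message instead of producing the misleading table above. It complements renv rather than replacing it: the lockfile prevents the mismatch, and the check makes any remaining mismatch visible.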

 

By using renv, you can keep your R projects reproducible and consistent across different environments. Managing dependencies, sharing projects, and adapting to package updates become far easier, enabling smooth collaboration and reliable analysis.

So, next time you start a new project, make sure to set up an R project in RStudio, and remember the power of renv in keeping your code reproducible and your results consistent.

Happy coding!

 

Written by Lucas Alcantara