How to Use {renv} and Bioconductor for Reproducible Data Analysis
Have you ever wanted to go back to an analysis you did in the past or share it with one of your colleagues? But as soon as you do, you start dealing with:
- Package installation issues
- Getting different results compared to the last time you ran your code
- Unexpected errors when running your code
Fortunately, {renv} lets us record which packages, in which versions, and from which repositories our project uses.
We already covered {renv} in R renv: How to Manage Dependencies in R Projects Easily, but in this blog post we are going to discuss how to use it alongside Bioconductor.
What is Bioconductor?
Bioconductor is a project that aims to develop and share open source software for precise and repeatable analysis of biological data. It offers a repository of 2289 software packages (as of Bioconductor 3.20).
Bioconductor differs from CRAN in two ways:
- Bioconductor is released every six months, and each release is designed to work with a specific R version. CRAN packages are updated continuously without reference to a particular R version.
- Bioconductor offers its own installation tool via the BiocManager package, so you can use BiocManager::install() instead of install.packages().
Why Use {renv} With Bioconductor?
You might wonder why we would want to use {renv} if Bioconductor repositories are already versioned and come in releases.
However, release branches in Bioconductor may still be updated with bug fixes (source). Bioconductor Packages: Development, Maintenance, and Peer Review even describes the procedure for introducing bug fixes into a Bioconductor release.
This means that even though we are using, for example, Bioconductor 3.20, a package within it might have been updated with a bug fix since we last installed it. Using {renv} allows us to keep track of which version is actually being used.
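To make this concrete, here is a minimal sketch in base R. It relies on the x.y.z version scheme used by Bioconductor packages, where a bug fix inside a release bumps only the last component. The version numbers and the helper name `same_release_line` are made up for illustration.

```r
# Hypothetical versions of one package, both served by the same Bioconductor release
installed_last_year <- package_version("1.58.0")
installed_today     <- package_version("1.58.2")

# Illustrative helper: TRUE when two versions share major.minor,
# i.e. they plausibly come from the same release branch
same_release_line <- function(a, b) {
  identical(unlist(unclass(a))[1:2], unlist(unclass(b))[1:2])
}

same_release_line(installed_last_year, installed_today)  # same release branch?
installed_today > installed_last_year                     # is today's build newer?
```

Both checks can hold at once: the release is the same, yet the installed code is not, which is the gap a lock file closes.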
Bioconductor packages you are using might depend on CRAN packages. CRAN packages are not versioned alongside Bioconductor. Using {renv} allows us to track the CRAN packages we might be indirectly using and make sure we continue using the same versions.
Therefore, just using a specific Bioconductor release is not enough to ensure package version reproducibility. It needs to be combined with a tool like {renv} so we can be confident that our project keeps using the same package versions.
Ready to take reproducibility a step further? Learn how to combine {renv} with Docker to create a reproducible Shiny application environment.
How to Use Bioconductor With {renv}
Let’s start by creating a new project:
# install.packages("usethis")
usethis::create_project("renv_bioconductor")
Now, let’s start using renv in our project:
# install.packages("renv")
renv::init(bioconductor = TRUE)
Notice the bioconductor parameter. It can be used to specify the Bioconductor release we want to use (e.g. "3.20"), or we can pass TRUE to use the default version of Bioconductor recommended by the BiocManager package.
Note: Remember Bioconductor releases are designed to work with specific versions of R. Bioconductor includes that information in the Release Announcements.
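The pairing between a Bioconductor release and an R version can be checked before calling renv::init(). The sketch below is illustrative only: the lookup table is written by hand from the release announcements for Bioconductor 3.17 to 3.20, and `r_matches_bioc` is a hypothetical helper, not part of renv or BiocManager.

```r
# Hand-maintained lookup (assumption: copied from the release announcements)
bioc_to_r <- c("3.17" = "4.3", "3.18" = "4.3", "3.19" = "4.4", "3.20" = "4.4")

# Hypothetical helper: does the given R version match what a release expects?
r_matches_bioc <- function(bioc_version, r_version = getRversion()) {
  expected <- bioc_to_r[[bioc_version]]
  # Keep only major.minor of the R version, e.g. "4.4.1" -> "4.4"
  actual <- sub("^([0-9]+\\.[0-9]+).*$", "\\1", as.character(r_version))
  identical(actual, expected)
}

r_matches_bioc("3.20", "4.4.1")  # R 4.4.x against Bioconductor 3.20
r_matches_bioc("3.20", "4.2.3")  # an older R against the same release
```

Called without a second argument, the helper checks the R version of the running session.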
Now, after restarting our R session, we can check which package repositories we are using by running options("repos"):
$repos
BioCsoft
"https://bioconductor.org/packages/3.20/bioc"
BioCann
"https://bioconductor.org/packages/3.20/data/annotation"
BioCexp
"https://bioconductor.org/packages/3.20/data/experiment"
BioCworkflows
"https://bioconductor.org/packages/3.20/workflows"
BioCbooks
"https://bioconductor.org/packages/3.20/books"
CRAN
"https://cloud.r-project.org"
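As a quick sanity check, the release can be read back out of those URLs. This is a minimal sketch in base R: the `repos` vector is copied from the output above rather than read from a live session, and `bioc_release_in` is a made-up helper name.

```r
# Repositories as printed above (hard-coded here so the sketch runs anywhere)
repos <- c(
  BioCsoft      = "https://bioconductor.org/packages/3.20/bioc",
  BioCann       = "https://bioconductor.org/packages/3.20/data/annotation",
  BioCexp       = "https://bioconductor.org/packages/3.20/data/experiment",
  BioCworkflows = "https://bioconductor.org/packages/3.20/workflows",
  BioCbooks     = "https://bioconductor.org/packages/3.20/books",
  CRAN          = "https://cloud.r-project.org"
)

# Hypothetical helper: pull the "x.y" release out of each Bioconductor URL
bioc_release_in <- function(urls) {
  bioc_urls <- grep("/packages/[0-9]+\\.[0-9]+/", urls, value = TRUE)
  unique(unname(sub("^.*/packages/([0-9]+\\.[0-9]+)/.*$", "\\1", bioc_urls)))
}

releases <- bioc_release_in(repos)
length(releases) == 1  # do all Bioconductor repositories agree on one release?
```

In a real project the same helper could be applied to getOption("repos") instead of the hard-coded vector.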
All right, now that we have both Bioconductor and CRAN repositories set up, let’s install some packages. We can use renv::install to install Bioconductor packages:
renv::install(c("GenomicFeatures", "AnnotationDbi"))
You can also use install.packages().
Now, we can continue using renv the same way as in any other project. For demonstration purposes let’s add a main.R file in our project with the following content:
library(GenomicFeatures)
library(AnnotationDbi)
When we run renv::snapshot, it will record both of the used packages, their dependencies and versions in the renv.lock file.
renv::snapshot()
Let’s have a look at our renv.lock file:
{
  "R": {
    "Version": "4.3.1",
    "Repositories": [
      {
        "Name": "BioCsoft",
        "URL": "https://bioconductor.org/packages/3.20/bioc"
      },
      {
        "Name": "BioCann",
        "URL": "https://bioconductor.org/packages/3.20/data/annotation"
      },
      {
        "Name": "BioCexp",
        "URL": "https://bioconductor.org/packages/3.20/data/experiment"
      },
      {
        "Name": "BioCworkflows",
        "URL": "https://bioconductor.org/packages/3.20/workflows"
      },
      {
        "Name": "BioCbooks",
        "URL": "https://bioconductor.org/packages/3.20/books"
      },
      {
        "Name": "CRAN",
        "URL": "https://cloud.r-project.org"
      }
    ]
  },
  "Bioconductor": {
    "Version": "3.20"
  },
  "Packages": {
    ...
  }
}
We can see that it stores:
- The list of repositories in use: the Bioconductor repositories plus CRAN, which Bioconductor packages may depend on.
- The version of the Bioconductor release used.
Now, whenever we go back to our code or share it with our colleagues, all used dependencies (in the correct versions!) can be installed using renv::restore().
Bonus: Using Posit Package Manager Bioconductor Mirrors
Posit Package Manager allows you to host your own mirrors of Bioconductor, and it might already be in use at your company. There is also a free public instance of Posit Package Manager available for community use.
Using Bioconductor mirrors of Posit Package Manager has the following benefits:
- Better reproducibility as generated Bioconductor setup instructions include a recommended CRAN snapshot compatible with the Bioconductor version we want to use
- Faster package installation times thanks to pre-compiled CRAN binaries
- Better security thanks to curated CRAN repositories and package vulnerability reporting (introduced in Posit Package Manager 2023.12.0)
- Possibility of hosting internally developed packages
To start using a Bioconductor mirror, we can follow the instructions available in the Setup tab. In our example we are using macOS, and we assume we want to use Bioconductor 3.20.
Now, we can use the generated instructions to configure renv:
# Create new project
usethis::create_project("renv_bioconductor_ppm")
# Configure BiocManager to use Posit Package Manager:
options(BioC_mirror = "https://packagemanager.posit.co/bioconductor")
options(BIOCONDUCTOR_CONFIG_FILE = "https://packagemanager.posit.co/bioconductor/config.yaml")
# Configure the CRAN snapshot recommended by the generated setup instructions:
options(repos = c(CRAN = "https://packagemanager.posit.co/cran/2023-10-25"))
# Initialize renv
renv::init(bioconductor = "3.20")
Our lock file should now look like this:
{
  "R": {
    "Version": "4.3.1",
    "Repositories": [
      {
        "Name": "BioCsoft",
        "URL": "https://packagemanager.posit.co/bioconductor/packages/3.20/bioc"
      },
      {
        "Name": "BioCann",
        "URL": "https://packagemanager.posit.co/bioconductor/packages/3.20/data/annotation"
      },
      {
        "Name": "BioCexp",
        "URL": "https://packagemanager.posit.co/bioconductor/packages/3.20/data/experiment"
      },
      {
        "Name": "BioCworkflows",
        "URL": "https://packagemanager.posit.co/bioconductor/packages/3.20/workflows"
      },
      {
        "Name": "BioCbooks",
        "URL": "https://packagemanager.posit.co/bioconductor/packages/3.20/books"
      },
      {
        "Name": "CRAN",
        "URL": "https://packagemanager.posit.co/cran/2023-10-25"
      }
    ]
  },
  "Bioconductor": {
    "Version": "3.20"
  },
  "Packages": {
    "renv": {
      "Package": "renv",
      "Version": "1.0.3",
      "Source": "Repository",
      "Repository": "CRAN",
      "Requirements": [
        "utils"
      ],
      "Hash": "41b847654f567341725473431dd0d5ab"
    }
  }
}
As we can see, we are using the public Posit Package Manager mirror of Bioconductor 3.20 together with a CRAN snapshot dated October 25th, 2023.
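The dated CRAN URL in this configuration follows a simple pattern, so it can be built from a date. Here is a minimal sketch, assuming the public Posit Package Manager URL layout shown in this post. `ppm_cran_snapshot` is a hypothetical helper, and the right date for a given Bioconductor release should still come from the generated setup instructions.

```r
# Hypothetical helper: build a dated CRAN snapshot URL for the public
# Posit Package Manager (URL layout assumed from the example in this post)
ppm_cran_snapshot <- function(date) {
  date <- as.Date(date)  # fails early on malformed dates
  sprintf("https://packagemanager.posit.co/cran/%s", format(date, "%Y-%m-%d"))
}

snapshot_url <- ppm_cran_snapshot("2023-10-25")
# options(repos = c(CRAN = snapshot_url))  # uncomment inside a real project
```

Keeping the date in one place makes it easier to spot when the snapshot and the Bioconductor release have drifted apart.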
Summing Up {renv} and Bioconductor for Reproducible Data Analysis
- Bioconductor is a project that aims to develop and share open source software for precise and repeatable analysis of biological data. It offers a repository of 2289 packages (as of Bioconductor 3.20).
- Using a specific Bioconductor release is not enough to ensure package version reproducibility, which is why it is a good idea to combine it with renv.
- To start a new renv project with Bioconductor, you can just run renv::init(bioconductor = TRUE).
- Better reproducibility can be achieved by using the Bioconductor mirrors of Posit Package Manager, because the setup instructions include a compatible CRAN snapshot. Additionally, we might benefit from faster package installation and enhanced security.