RStudio Package Tests - From Theory to Implementation in R and Python

Let's face it - the technical aspect of writing <strong>R and Python packages</strong> from scratch isn't complicated. However, the challenging part lies in <strong>proper testing</strong>. The goals with <strong>package tests</strong> are to ensure the package works properly and without any bugs on the client's hardware and that the correct dependency versions are used. To do this, we use the <strong>RStudio IDE</strong>. Today you'll learn how to write RStudio package tests for Python and R packages, and you'll also learn how to run and package them. Yes, we'll use RStudio for both R and Python. R naturally has better support, but Python is catching up fast. Toward the end of the article, we'll share a couple of tips and tricks regarding package tests and testing in general. Let's get started! <blockquote>Interested in Testing in Shiny? <a href="https://appsilon.com/shinytest2-vs-cypress-e2e-testing/" target="_blank" rel="noopener">Read our comprehensive guide on shinytest2 vs Cypress</a>.</blockquote> Table of contents: <ul><li><a href="#why-rstudio">Package Tests and RStudio IDE - Why RStudio?</a></li><li><a href="#r-tests">R Tests in RStudio with testthat</a></li><li><a href="#python-tests">Python Tests in RStudio with PyTest</a></li><li><a href="#tips">Package Tests and RStudio IDE - Tips & Tricks</a></li><li><a href="#summary">Summing up Tests and RStudio IDE</a></li></ul> <hr /> <h2 id="why-rstudio">Package Tests and RStudio IDE - Why RStudio?</h2> <a href="https://posit.co" target="_blank" rel="noopener">RStudio</a> is an Integrated Development Environment (IDE) explicitly tailored for R - a programming language for statistical computing and graphics. 
With their recent <a href="https://appsilon.com/posit-rstudio-rebrands/" target="_blank" rel="noopener">rebrand to Posit</a>, the company aims to be more <a href="https://solutions.posit.co/write-code/python/" target="_blank" rel="noopener">Python-friendly</a> and deliver a single data science ecosystem for R and Python. In other words, the name "RStudio" is a tad confusing if you're supporting both R and Python, hence the rebranding. If you're familiar with R, you know that RStudio makes it really simple to test R functions and packages. Today we'll show you how to do both R and Python package tests in RStudio. But first, what really is a package test, and what is a package?
<h3>What is a Package?</h3>
A package/library/module is a common name for a collection of prewritten code you can use to solve a certain issue without writing everything from scratch. Think of the <code>ggplot2</code> package in R, or the <code>matplotlib</code> library in Python - they both offer amazing data visualization support through a set of built-in functions. In addition, you can also tweak just about every aspect with these two packages. Would you care to write them from scratch? Maybe, but it would take you months of dedicated work to come close, and oftentimes the projects you're working on have a strict and short deadline. That's where packages come in handy. Now, the package you write can implement any programming logic you want. It can be as simple as printing "Hello, world" to the screen, or as complex as training neural network models. There's no minimum requirement for the problem complexity or the number of lines of code.
<h3>What are Package Tests?</h3>
Once you have the programming logic figured out, you'll want to test it against every scenario you can imagine. It's a good practice to write tests for your functions and packages, so you can guarantee nothing will break after adding some functionality in future releases or modifying the way something works. 
R and RStudio have excellent support for package tests with <code>testthat</code>. It's an R package you'll learn how to use in the following section. <h2 id="r-tests">R Tests in RStudio with testthat</h2> We'll start by creating a new R package. Open RStudio, set a working directory to a location you want to save the package, and in the console run the following command: <pre><code class="language-r">devtools::create("myrpackage")</code></pre> You should see an output similar to this one: <img class="size-full wp-image-17908" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b29f6ccbae63b4a1136d81_1-1.webp" alt="Image 1 - Creating a new R package with devtools" width="3286" height="2230" /> Image 1 - Creating a new R package with devtools Feel free to replace <code>myrpackage</code> with whatever name you see fit, of course. You'll see the following directory structure after running the above command: <img class="size-full wp-image-17910" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7af330f03f2fe35162f36_93621e4e_2-1.webp" alt="Image 2 - R package directory structure" width="1846" height="1096" /> Image 2 - R package directory structure Let's go over the responsibilities of each file and folder: <ul><li><code>R/</code> - A folder in which all of your R files will go.</li><li><code>NAMESPACE</code> - Manages what needs to be exposed to users of your R package. <code>devtools</code> will take care of the changes for you, and it's unlikely you'll ever have to edit this file manually.</li><li><code>DESCRIPTION</code> - Your package metadata, such as package name, version, description, author info, license, and so on. We'll use it shortly to add R dependencies.</li><li><code>myrpackage.Rproj</code> - RStudio-specific file attached to the project.</li></ul> Okay, we have the package configured, so now let's write some R functions. 
<h3>R Functions and Dependencies</h3>
As said previously, all R code will live in the <code>R/</code> folder. Create a <code>my_functions.R</code> file inside it, and paste the following three functions:
<pre><code class="language-r">sum_nums <- function(a, b) {
  return(a + b)
}

sum_nums_err <- function(a, b) {
  return(a + b + 5)
}

get_users <- function(url) {
  req <- httr::GET(url = url)
  res <- httr::content(req, as = "text", encoding = "UTF-8")
  parsed <- jsonlite::fromJSON(res)
  return(parsed$data)
}</code></pre>
The functions are utterly simple - the first two are used to add numbers (the second function adds a constant to the sum), and the last function makes an API request to a URL and returns the content. You can see how we haven't imported the packages using the <code>library()</code> function, and that's deliberate. <b>You should never explicitly import R packages in your own package</b>, but instead, use the double colon notation (<code>::</code>). Now it's time to address dependencies. Inside the <code>DESCRIPTION</code> file, add the following section:
<pre><code class="language-text">Imports:
    httr (>= 1.4.4),
    jsonlite (>= 1.8.4)</code></pre>
You can always check the version installed on your system by running <code>packageVersion("packageName")</code> from the R console. In the end, the <code>DESCRIPTION</code> file should look like this:
<img class="size-full wp-image-17912" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7af338561628120ce0558_f5a8ea3d_3-1.webp" alt="Image 3 - The package DESCRIPTION file" width="3106" height="2230" /> Image 3 - The package DESCRIPTION file
We now have the function logic and package dependencies out of the way, so next, let's focus on package tests in RStudio.
<h3>Getting Started with R Package Tests</h3>
We mentioned earlier that we'll use the <code>testthat</code> R package to manage the testing. 
You'll have to install it first, so do that with the following command:
<pre><code class="language-r">install.packages("testthat")</code></pre>
Once installed, you can tell the package you want to use <code>testthat</code>:
<pre><code class="language-r">usethis::use_test(3)</code></pre>
The <code>3</code> passed in is taken as the <code>name</code> argument, so the generated test file is named after it. Versioning of <code>testthat</code> (at the time of writing 3 was the <a href="https://testthat.r-lib.org" target="_blank" rel="noopener">latest version</a>) is handled implicitly by the internal <code>use_testthat_impl()</code> helper in <code>usethis</code>, or explicitly when <code>usethis::use_testthat()</code> is used to set up a tests directory.
<img class="size-full wp-image-17914" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7af345d4d39bf0cf1a5a2_0aa9a67e_4-1.webp" alt="Image 4 - Using testthat in a custom R package" width="1148" height="338" /> Image 4 - Using testthat in a custom R package
The last R console command will create a folder <code>tests/</code>, which contains a file named <code>testthat.R</code> and a folder named <code>testthat/</code>. Further, that folder contains a test file that <code>usethis::use_test(3)</code> created because of our name argument.
<h4>Automated Test Files</h4>
Creating a test file is automated by <code>usethis::use_test()</code> when a name is not specified and the file we want to test is the script currently open. To do so, follow these steps:
<ol><li>Make sure <code>my_functions.R</code> is the active file (i.e. the current tab in the source pane).</li><li>Run <code>usethis::use_test()</code>.</li><li><code>tests/testthat/test-my_functions.R</code> should be created.</li></ol>
<h4>Manual Test Files</h4>
Now let's create <b>your test file</b> manually. In <code>tests/testthat/</code> create a new R file named <code>test_my_functions.R</code>. In general, the manual test files should have a prefix <code>test_</code>, followed by the name of your R script. 
Once the file is created, paste the following R code inside:
<pre><code class="language-r">library(testthat)

test_that("sum_nums tests", {
  expect_equal(sum_nums(5, 10), 15)
  expect_equal(sum_nums(5, -10), -5)
  expect_equal(sum_nums(3 * 5, 5 * 5), 40)
})

test_that("sum_nums_err tests", {
  expect_equal(sum_nums_err(5, 10), 15)
  expect_equal(sum_nums_err(5, -10), -5)
  expect_equal(sum_nums_err(3 * 5, 5 * 5), 40)
})

test_that("get_users tests", {
  expect_type(get_users("https://dummy.restapiexample.com/api/v1/employees"), "list")
  expect_type(get_users("https://dummy.restapiexample.com/api/v1/employees"), "data.frame")
  expect_length(get_users("https://dummy.restapiexample.com/api/v1/employee/1"), 1)
})</code></pre>
These three code blocks will run a couple of tests. The first block should always pass since the values of passed-in parameters are summed correctly. The second block should always fail since we've just copied the test conditions. Remember that the <code>sum_nums_err()</code> function adds <code>5</code> to the number sum. The third block will fail on the second test since the return type of the <code>get_users()</code> function is a list. Overall, you should have the following package directory structure before proceeding:
<img class="size-full wp-image-17916" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7af3461a49425c805ce35_d5f93b70_5-1.webp" alt="Image 5 - Directory structure after adding the test files" width="1846" height="1096" /> Image 5 - Directory structure after adding the test files
And now it's finally time to run R package tests in RStudio.
<h3>Run R Package Tests in RStudio</h3>
RStudio will automatically figure out you're in a test file. You can verify that by inspecting the options in the top panel - you'll see the "Run Tests" button. 
Click on it, and you'll see the following after a couple of seconds:
<img class="size-full wp-image-17918" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b29f7085a91527d5caf919_6-1.webp" alt="Image 6 - Running package tests in RStudio" width="3106" height="2230" /> Image 6 - Running package tests in RStudio (your version of RStudio may differ, check for the 'Build' tab in the top right pane)
As you can see, we have four failed tests, all of which we expected. That's how easy it is to test R packages in RStudio. Before proceeding, I recommend you create your own unit tests. Preferably, create tests that you know will pass or fail in certain scenarios. You could write a test that you know will pass, and then alter the code so that the unit test will expose the change as a failed test. We'll now shift our focus to Python package tests.
<h2 id="python-tests">Python Tests in RStudio with PyTest</h2>
There are numerous testing libraries when it comes to Python, but we'll stick to <code>pytest</code>. It takes a single shell command to install it, but first, there's a bit of setting up to do. RStudio has taken its first steps into better Python integration. It might not be as strong as R integration, but it will progress over time. We'll make sure to give you an update, so stay tuned to <a href="http://appsilon.com/blog/" target="_blank" rel="noopener">Appsilon Blog</a>. The first step, or at least the recommended one, is to set up a new <b>Python virtual environment</b>. This will keep all the dependencies separate from the global Python interpreter, so we can be sure there's no dependency mismatch. Navigate to a folder in which you want to save the Python library, and then run the following shell commands:
<pre><code class="language-bash">python3 -m venv .venv
source .venv/bin/activate

pip install wheel setuptools twine requests pandas pytest pytest-runner</code></pre>
You can also specify Python versions. 
This can be managed with <a href="https://github.com/pyenv/pyenv" target="_blank" rel="noopener">pyenv</a> and the <a href="https://github.com/pyenv/pyenv-virtualenv" target="_blank" rel="noopener">pyenv-virtualenv plugin</a>. This is generally good practice in Python when working with virtual environments. If you choose to do this, run the following shell commands:
<pre><code>pyenv install 3.9.15
pyenv virtualenv 3.9.15 my-env-name
pyenv local my-env-name</code></pre>
The last command creates a <code>.python-version</code> file. With pyenv and that file in place, the environment will activate automatically, so you no longer need to source it manually. We will proceed without pyenv, though.
<h3>Python Library Continued</h3>
The <code>venv</code> and <code>pip</code> commands from earlier create and activate a new virtual environment, and also install a couple of dependencies we'll use throughout the section, such as <code>requests</code> for making HTTP requests, <code>pytest</code> for testing, and <code>pandas</code> for working with data. The next step is to create a directory structure. You can do this from a code editor or from a Terminal. We'll give you a couple of shell commands you can copy:
<pre><code class="language-bash">touch setup.py
touch README.md

mkdir mypylib
cd mypylib
touch __init__.py
touch my_functions.py
cd ..

mkdir tests
cd tests
touch __init__.py
touch test_my_functions.py</code></pre>
You shouldn't see any output if following along, but the Terminal window should look similar to this:
<img class="size-full wp-image-17920" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b29f62236d529e4be90f09_7-1.webp" alt="Image 7 - Creating a Python directory structure" width="1641" height="959" /> Image 7 - Creating a Python directory structure
For additional confirmation, the file and folder structure should look like this:
<img class="size-full wp-image-17922" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b29f633ff6b1693ab1c9d0_8-1.webp" alt="Image 8 - Python library file and folder structure" width="1846" height="1096" /> Image 8 - Python library file and folder structure
But what do these files stand for? Let's go over them one by one:
<ul><li><code>mypylib/</code> - A folder that contains the code for your custom Python library.</li><li><code>mypylib/__init__.py</code> - Marks a directory as a Python package directory.</li><li><code>mypylib/my_functions.py</code> - Our source logic, Python code accessible after installing the library.</li><li><code>README.md</code> - Text description of the library.</li><li><code>setup.py</code> - A file that indicates the package has been packaged with Distutils, and makes for easy installation with <code>pip</code>.</li><li><code>tests/</code> - A folder containing Python test files.</li><li><code>tests/__init__.py</code> - Marks a directory as a Python package directory.</li><li><code>tests/test_my_functions.py</code> - Python file containing the actual tests for the <code>my_functions.py</code> file.</li><li><code>.venv/</code> - Virtual environment files and folders.</li></ul>
As you can see, Python's <code>pytest</code> follows the <code>test_</code> naming convention, identically to R's <code>testthat</code>, which is one less thing to remember! 
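If <code>touch</code> isn't available in your shell (for example, on Windows), the same skeleton can be created from Python. The snippet below is a minimal sketch using only the standard library. The <code>scaffold()</code> helper is a hypothetical name introduced for illustration, and it only creates empty files, mirroring the shell commands above.

```python
from pathlib import Path

# Files from the layout described above, relative to the project root
FILES = [
    "setup.py",
    "README.md",
    "mypylib/__init__.py",
    "mypylib/my_functions.py",
    "tests/__init__.py",
    "tests/test_my_functions.py",
]


def scaffold(root="."):
    """Create the empty files (and their parent folders) under root.

    Hypothetical helper for illustration; existing files are left untouched.
    """
    created = []
    for rel in FILES:
        path = Path(root) / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch(exist_ok=True)  # same effect as the shell `touch`
        created.append(path)
    return created
```

Calling <code>scaffold()</code> from the project folder should leave you with the same files as the shell commands, minus the virtual environment, which still has to be created separately.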
With the basics out of the way, open up the <code>mypylib/my_functions.py</code> file and paste the following code:
<pre><code class="language-python">import requests
import pandas as pd


def sum_nums(a, b):
    return a + b


def sum_nums_err(a, b):
    return a + b + 5


def get_users(url):
    headers = {"User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.1 Safari/605.1.15"}
    req = requests.get(url, headers=headers)
    res = req.json()
    return pd.DataFrame(res["data"])</code></pre>
The <code>get_users()</code> function needs extra <code>headers</code> information if you want to avoid the 406 status, but everything else is identical, with the obvious R to Python translation. We can import and use the libraries as we normally would in Python; there's no need for "double colon" or any other special notation. And finally, let's take care of the test file. Open up <code>tests/test_my_functions.py</code> and paste in the following:
<pre><code class="language-python">from mypylib import my_functions
import pandas as pd


def test_sum_nums():
    assert my_functions.sum_nums(5, 10) == 15
    assert my_functions.sum_nums(5, -10) == -5
    assert my_functions.sum_nums(3 * 5, 5 * 5) == 40


def test_sum_nums_err():
    assert my_functions.sum_nums_err(5, 10) == 15
    assert my_functions.sum_nums_err(5, -10) == -5
    assert my_functions.sum_nums_err(3 * 5, 5 * 5) == 40


def test_get_users():
    assert type(my_functions.get_users("https://dummy.restapiexample.com/api/v1/employees")) == list
    assert type(my_functions.get_users("https://dummy.restapiexample.com/api/v1/employees")) == pd.DataFrame
    assert len(my_functions.get_users("https://dummy.restapiexample.com/api/v1/employee/1")) == 1</code></pre>
As you can see, it's almost identical to what we had previously in R, just translated into Python. 
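The <code>test_sum_nums_err()</code> case above is written to fail on purpose. For comparison, here is a minimal sketch of what passing assertions could look like once the extra <code>5</code> is accounted for. The two functions are copied inline only so the snippet runs on its own (in the package you would import them from <code>mypylib</code>), and the test names are made up for illustration.

```python
# Copied from mypylib/my_functions.py so the sketch is self-contained
def sum_nums(a, b):
    return a + b


def sum_nums_err(a, b):
    return a + b + 5


def test_sum_nums_err_adjusted():
    # Each expected value is the plain sum plus the constant 5
    assert sum_nums_err(5, 10) == 20
    assert sum_nums_err(5, -10) == 0
    assert sum_nums_err(3 * 5, 5 * 5) == 45


def test_sum_nums_err_offset():
    # Alternative: state the relationship instead of hard-coding results
    for a, b in [(5, 10), (5, -10), (15, 25)]:
        assert sum_nums_err(a, b) == sum_nums(a, b) + 5
```

For the type checks in <code>test_get_users()</code>, <code>isinstance(result, pd.DataFrame)</code> is usually the more idiomatic way to write the comparison than <code>type(result) == pd.DataFrame</code>.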
Run the tests now by running the following command from the Terminal:
<pre><code class="bash">pytest -v</code></pre>
You'll immediately see the following output on the screen:
<img class="size-full wp-image-17924" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b29f645d4048c02f9709e8_9-1.webp" alt="Image 9 - PyTest output (1)" width="3106" height="2230" /> Image 9 - PyTest output (1)
PyTest provides you with a lot of details on what went wrong, too much to fit the screen. If you'd like to see less of it, simply run <code>pytest</code> or <code>pytest tests</code> without <code>-v</code>; additionally, you can explore how to run a single test file or single test from a certain file <a href="https://stackoverflow.com/questions/36456920/is-there-a-way-to-specify-which-pytest-tests-to-run-from-a-file" target="_blank" rel="noopener">here</a>. Here's the last portion of the output:
<img class="size-full wp-image-17926" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b29f6436689419f5cd8da3_10-1.webp" alt="Image 10 - PyTest output (2)" width="3106" height="2230" /> Image 10 - PyTest output (2)
We got the same passes and failures as in R with one exception - the <code>get_users()</code> function returns a <code>pd.DataFrame</code> instead of a list, so that's the case that failed here. Otherwise, the output conveys the same information. Up next, we'll take a look at some common tips and tricks when working with package tests in RStudio and tests in general.
<h2 id="tips">Package Tests and RStudio IDE - Tips & Tricks</h2>
We'll now go over a series of best practices when it comes to unit tests and package tests, written from years of experience in the field.
<h3>Keep Things Simple</h3>
As you increase the level of complexity in your tests, you're likely to introduce errors to them. "Testing" tests is not a thing, so make sure to keep them simple, readable, and easy for developers to understand. 
Have you heard of <b>cyclomatic complexity</b>? It's a term that indicates the number of possible execution paths a given function can follow. Naturally, a function with a lower cyclomatic complexity is easier to follow, understand, and maintain, which means you're less likely to introduce bugs when working on it. You should always optimize for a low cyclomatic complexity (e.g., with a linter tool), especially when writing tests.
<h3>Keep Things Deterministic</h3>
A piece of code should always behave the same if no changes were made to it - that's the basic definition of the word <i>deterministic</i>. In unit and package tests, this means a function should always pass or always fail the test, provided you don't change the underlying logic behind it, regardless of how many times you run it. Having nondeterministic tests - or tests that sometimes pass and sometimes fail without changes to the logic - means developers won't trust them. Make sure your tests don't depend on other tests, file systems, network and API availability, and other environmental values. That's the only way to ensure your tests are deterministic.
<h3>Always Address a Single Use Case</h3>
This one is simple to understand. Every test you write should be used to test a single use case, and a single use case only. Writing tests this way will give you a better insight into the reasons why the test case failed, which means you'll be faster in discovering code errors.
<blockquote>Want to make your R code more durable? <a href="https://appsilon.com/best-practices-for-durable-r-code/" target="_blank" rel="noopener">Make sure to optimize on these 4 areas</a>.</blockquote>
<h3>Make Sure the Tests are as Fast as Possible</h3>
If it takes ages to run your tests, most developers will skip them, or won't run them as often as they should. Do everything you can to make the tests fast because extensive and repeated testing is the only way to have confidence in your code. 
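One way to act on both the determinism and the speed advice is to stub out the network call, so the test neither waits for nor depends on a live API. The sketch below is an illustration under stated assumptions: <code>fetch_data()</code> is a hypothetical stand-in for <code>get_users()</code> that uses the standard library's <code>urllib</code> instead of <code>requests</code> and <code>pandas</code>, the URL is a placeholder, and the canned payload is made up.

```python
import io
import json
from unittest import mock
from urllib import request


def fetch_data(url):
    """Hypothetical stand-in for get_users(): return the "data" field of a JSON response."""
    with request.urlopen(url) as resp:
        return json.load(resp)["data"]


def test_fetch_data_offline():
    # Canned response body; no real HTTP request is made
    body = json.dumps({"data": [{"id": 1}, {"id": 2}]}).encode("utf-8")
    with mock.patch.object(request, "urlopen", return_value=io.BytesIO(body)) as fake:
        result = fetch_data("https://example.invalid/employees")
    # Same input, same output on every run
    assert result == [{"id": 1}, {"id": 2}]
    fake.assert_called_once_with("https://example.invalid/employees")
```

The same idea carries over to the <code>requests</code>-based function from earlier: patch the call that touches the network and hand the function a fixed response, so the assertions exercise only your own logic.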
There isn't a concrete definition of how fast is fast enough, so that's something you'll have to figure out on your own. Faster is always better.
<h3>Consider Test Automation</h3>
Automated tests represent the type of test done without much human intervention. Sure, people have to develop this framework when first starting out, but from that point, the execution is done automatically, usually as a part of the build process. <b>But why bother with automation?</b> Testing small units by hand is tedious, repetitive, and less reliable than conducting tests in an automated manner. A dedicated unit testing framework can help you with making testing more automated. Automated testing is also considered to be more efficient, cheaper, and time-saving. The most common approach nowadays is by utilizing a <a href="https://about.gitlab.com/topics/ci-cd/" target="_blank" rel="noopener">CI/CD pipeline</a> (Continuous Integration / Continuous Deployment), which is an important DevOps and Agile methodology practice.
<hr />
<h2 id="summary">Summing up Tests and RStudio IDE</h2>
It's always a good idea to make your code testable; it's the only way to ensure it will run smoothly as you continue to make changes to your project. Today you've learned how to approach package tests in R and Python, and specifically how to use RStudio for the job. It's not perfect for Python yet, but we expect the level of support to skyrocket in the near future. Be sure to check out the <a href="https://posit.co/conference/" target="_blank" rel="noopener">Posit::Conf 2023</a> to hear the latest and greatest from the RStudio creators. We hope you liked our guide. Feel free to share thoughts and ideas in the comment section below, or reach out to us on Twitter - <a href="http://twitter.com/appsilon" target="_blank" rel="noopener">@appsilon</a>. We'd love to hear your thoughts on package testing and unit testing in general.
<blockquote>What is User Testing? 
<a href="https://appsilon.com/user-tests-build-better-shiny-apps-with-effective-user-testing/" target="_blank" rel="noopener">Read (and watch) our guide to effective user tests</a>.</blockquote>