Maximizing Efficiency: A Guide to Benchmarking Memory Usage in Shiny Apps

By:
Ryszard Szymański
November 30, 2023

R/Shiny allows you to <strong>prototype</strong> a working web application <strong>quickly</strong> and <strong>easily</strong>. However, with increasing amounts of data, your app may become slow and, in extreme cases, crash due to insufficient memory.

When the worst-case scenario happens, we need to figure out a way to <strong>lower the memory usage of our app to avoid those crashes</strong>.

A crucial part of optimization efforts is benchmarking <strong>how much memory our app is consuming</strong>. This allows us to check if the changes we made to the app are indeed moving us in the right direction.

In this step-by-step guide, we will describe how to do that based on an example application.
<h3>Table of Contents</h3><ul><li><a href="#memory-usage">How to Measure Memory Usage of Shiny</a></li><li><a href="#example">Example App</a></li><li><a href="#limitations">Limitations</a></li><li><a href="#conclusion">Conclusion</a></li></ul>
<h2 id="memory-usage">How to Measure Memory Usage of Shiny</h2>
You might already be familiar with the <code>{profmem}</code> package for profiling memory usage of R expressions. <code>{profmem}</code> uses <a href="https://cran.r-project.org/web/packages/profmem/vignettes/profmem.html" target="_blank" rel="noopener noreferrer"><code>Rprofmem</code> under the hood</a>, and its <a href="https://cran.r-project.org/web/packages/profmem/profmem.pdf" target="_blank" rel="noopener noreferrer">docs</a> note that <code>utils::Rprofmem()</code> cannot quantify the total memory usage at a given time: it only logs allocations and therefore does not reflect deallocations done by the garbage collector.

Additionally, <code>Rprofmem</code> does not track allocations made by non-R native libraries, or by packages that use native <code>calloc()</code> or <code>free()</code> for internal objects.
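
As a minimal sketch of that limitation (assuming the <code>{profmem}</code> package is installed and R was built with memory profiling support), the snippet below profiles an expression that allocates a large vector and then frees it. The vector size is an arbitrary choice for illustration.
<pre><code class="language-r">
library(profmem)

# Rprofmem-based profiling is only available if R was built with it enabled
if (capabilities("profmem")) {
  p &lt;- profmem({
    x &lt;- numeric(1e6)  # allocates roughly 8MB
    rm(x)
    gc()               # the memory is released here...
  })

  # ...but the total still counts the allocation, since frees are not logged
  print(total(p))
}
</code></pre>
The reported total reflects what was allocated during the expression, not what the process still holds afterwards, which is why it is a poor proxy for the footprint of a long-running app.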

In the context of Shiny, we are usually interested in how much memory the R process running our app is using. That information allows us to estimate what infrastructure we will need to provision in order to host our app and get an overall feel of how our app scales memory-wise (e.g. does memory usage increase drastically with more users?).

To achieve that, we will use the <code>{bench}</code> package, which provides the <a href="https://bench.r-lib.org/reference/bench_process_memory.html" target="_blank" rel="noopener noreferrer"><code>bench_process_memory</code> function</a>. That function uses operating system APIs to determine how much memory is used by the current R process, including all the memory from child processes and memory allocated outside R’s garbage collector heap.

<code>bench::bench_process_memory</code> informs us not only about the currently used amount of memory but also about the peak memory usage that occurred during the process lifecycle.
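
A quick way to get a feel for the function is to call it before and after a large allocation. This is only a sketch, assuming <code>{bench}</code> is installed; the allocation size is arbitrary and the exact numbers will differ between machines and operating systems.
<pre><code class="language-r">
mem_before &lt;- bench::bench_process_memory()

# Allocate roughly 80MB (1e7 doubles at 8 bytes each); size chosen for illustration
x &lt;- rnorm(1e7)

mem_after &lt;- bench::bench_process_memory()

# Both results are named vectors with "current" and "max" entries
print(mem_before)
print(mem_after)
</code></pre>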

<strong>📝 Note:</strong> There are also other packages that can be used for measuring process memory usage, like <a href="https://github.com/shinra-dev/memuse" target="_blank" rel="noopener noreferrer">memuse</a>. However, as of today, it does not support measuring peak memory usage on macOS. We submitted a pull request adding that support, but later learned that <code>{bench}</code> already provides it. Hence, we recommend using <code>{bench}</code>.

Throughout our example, we will use the following helper function:
<pre><code class="language-r">
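# Poll the app URL until it responds or the timeout (in seconds) elapses
wait_for_app_to_start &lt;- function(url, timeout = 5) {
  deadline &lt;- Sys.time() + timeout
  repeat {
    # req_perform() throws on connection errors, i.e. while the app is still starting
    is_up &lt;- tryCatch({
      httr2::request(url) |&gt; httr2::req_perform()
      TRUE
    }, error = function(e) FALSE)
    if (is_up) return(invisible(TRUE))
    if (Sys.time() &gt; deadline) stop("App did not start within ", timeout, " seconds")
    Sys.sleep(0.2)
  }
}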
<br>measure_mem_usage &lt;- function() {
 result_file &lt;- tempfile(fileext = ".rds")
 port &lt;- httpuv::randomPort()
 app_process &lt;- callr::r_bg(
   function(result_file, port) {
     on.exit({
       saveRDS(bench::bench_process_memory(), result_file)
     })
     
     shiny::runApp(port = port)
   }, args = list(result_file = result_file, port = port))
 
 on.exit({
   if (app_process$is_alive()) {
     app_process$kill()
   }
 })
 
 app_url &lt;- paste0("http://127.0.0.1:", port)
 
 wait_for_app_to_start(app_url)
 
 utils::browseURL(app_url)
 
 cat("Press [enter] to finish the test...")
 line &lt;- readline()
 
 app_process$interrupt()
 
 app_process$wait()
 
 readRDS(result_file)
}
</code></pre>
<strong>Let’s break down one by one what is happening in this function:</strong>
<ol><li>We start a Shiny app in a separate R process. This is important as we don’t want the work we did previously in our R session to impact the results (e.g. we might have analyzed a large dataset, which could be the source of peak memory usage).</li><li>Inside that background process, we register a callback on function exit that saves the memory measurements in a temporary file.</li><li>After the background R process with our app is started, our function opens the app in our browser and waits for user input. This gives us time to simulate user interactions with our app.</li><li>Once we are done clicking through our app, we can hit enter in our R console, and the background process will be interrupted. Once the background process terminates, we read the memory measurements from the temporary file.</li></ol>
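The core of that pattern can be tried in isolation, without Shiny. The sketch below assumes <code>{callr}</code> and <code>{bench}</code> are installed; the large allocation is only a stand-in for the app, and the background process finishes on its own instead of being interrupted.
<pre><code class="language-r">
result_file &lt;- tempfile(fileext = ".rds")

worker &lt;- callr::r_bg(
  function(result_file) {
    # Runs when the function exits, whether normally or after an interrupt
    on.exit(saveRDS(bench::bench_process_memory(), result_file))

    # Stand-in for shiny::runApp(): allocate some memory in the child process
    x &lt;- rnorm(1e7)
    invisible(NULL)
  },
  args = list(result_file = result_file)
)

worker$wait()          # block until the background process finishes
readRDS(result_file)   # measurements taken inside the background process
</code></pre>
Because the measurement is taken in the child process, whatever is loaded in the calling R session does not influence the result.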
Let’s see that in action:


<blockquote>Discover more insights on boosting your app's speed and efficiency in our detailed piece: <a href="https://appsilon.com/shiny-benchmark-measuring-app-performance/" target="_blank" rel="noopener">shiny.benchmark – How to Measure Performance Improvements in R Shiny Apps</a>.</blockquote>
<h2 id="example">Example App</h2>
All right, now let’s use our memory benchmarking function on an actual app. Let’s assume we are working with credit card data; we will generate a fake dataset using <code>{charlatan}</code> and save it in an SQLite database:
<pre><code class="language-r">
library(charlatan)
library(DBI)
library(dplyr)
<br>set.seed(123)
<br># Generate Fake Data
TABLE_ROW_COUNT &lt;- 1e7
<br>fake_providers &lt;- ch_credit_card_provider(100)
fake_data &lt;- data.frame(
 provider = sample(fake_providers, size = TABLE_ROW_COUNT, replace = TRUE)
)
<br># Save data to sqlite database
conn &lt;- dbConnect(drv = RSQLite::SQLite(), "database.sqlite")
<br>dbWriteTable(
 conn = conn,
 name = "credit_cards",
 value = fake_data,
 overwrite = TRUE
)
</code></pre>
<strong>Now, let’s create a Shiny App that will display the top 10 most popular card providers:</strong>
<pre><code class="language-r">
library(DBI)
library(dplyr)
library(reactable)
library(shiny)
<br>conn &lt;- dbConnect(drv = RSQLite::SQLite(), "database.sqlite")
<br>shiny::onStop(function() {
 dbDisconnect(conn)
})
<br>ui &lt;- fluidPage(
 titlePanel("Credit Cards App"),
 reactableOutput("top_credit_providers")
)
<br>server &lt;- function(input, output, session) {
 credit_cards &lt;- dbGetQuery(
   conn = conn,
   "SELECT * FROM credit_cards"
 )
<br>  output$top_credit_providers &lt;- renderReactable({
   top_providers &lt;- credit_cards |&gt;
     group_by(provider) |&gt;
     summarise(popularity = n()) |&gt;
     arrange(desc(popularity)) |&gt;
     head(10) |&gt;
     collect()
   
   reactable(top_providers)
 })
 
}
<br>shinyApp(ui, server)
</code></pre>
<strong>Let’s see how much memory the app is using with our helper function:</strong>
<pre><code class="language-r">
&gt; measure_mem_usage()
Press [enter] to finish the test...
<br>current     max
 481MB   481MB
</code></pre>
Ok, now let’s see how that changes if we simulate multiple sessions within the app - this can be done by opening multiple tabs with our app. Here are the results for 2, 3, 4 and 5 sessions:
<pre><code class="language-r">
&gt; measure_mem_usage() # 2 sessions
Press [enter] to finish the test...
<br>current     max
 606MB   606MB
</code></pre>
<pre><code class="language-r">
&gt; measure_mem_usage() # 3 sessions
Press [enter] to finish the test...
<br>current     max
 678MB   678MB
</code></pre>
<pre><code class="language-r">
&gt; measure_mem_usage() # 4 sessions
Press [enter] to finish the test...
<br>current     max
 769MB   769MB
</code></pre>
<pre><code class="language-r">
&gt; measure_mem_usage() # 5 sessions
Press [enter] to finish the test...
<br>current     max
 844MB   844MB
</code></pre>
<img class="size-full wp-image-22221" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01955268d39c6e128baac_memory-usage-across-different-numbers-of-sessions-1.webp" alt="" width="850" height="547" /> Memory Usage Across Different Numbers of Sessions 1

Based on the above measurements, we can see that each additional session allocates roughly an extra 72MB to 125MB of memory (about 91MB on average).
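
The per-session increase can be derived directly from the reported values. The snippet below simply re-enters the numbers from the outputs above; the variable names are our own.
<pre><code class="language-r">
# Peak memory (MB) reported above for 1 to 5 sessions
max_mem_mb &lt;- c(481, 606, 678, 769, 844)

# Extra memory added by each additional session
per_session_mb &lt;- diff(max_mem_mb)
per_session_mb        # 125 72 91 75
mean(per_session_mb)  # 90.75
</code></pre>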

Let’s try to make our app more efficient. Some of you probably noticed that we are fetching the data separately for each session, which means we store the same data multiple times in our app.
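
A rough way to see why each session is this expensive is to estimate the size of one copy of the data. The sketch below uses base R only; the provider names are hypothetical stand-ins for the <code>{charlatan}</code> output, and the column is 10x smaller than the one in the example app.
<pre><code class="language-r">
set.seed(123)

# Hypothetical provider names standing in for the {charlatan} output
providers &lt;- c("Visa", "Mastercard", "Maestro")

# A 10x smaller column than in the example app, to keep the sketch cheap
provider_column &lt;- sample(providers, size = 1e6, replace = TRUE)

# Each element of a character vector is an 8-byte pointer on 64-bit R,
# so 1e6 rows take about 8MB and the full 1e7 rows about 80MB
format(object.size(provider_column), units = "MB")
</code></pre>
An estimate of about 80MB per copy is in the same range as the per-session growth we measured, which supports the idea that the duplicated data is the main cost.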

We can make that more efficient by fetching the data <a href="https://shiny.posit.co/r/articles/improve/scoping/" target="_blank" rel="noopener noreferrer">in the global scope</a>.
<pre><code class="language-r">
library(DBI)
library(dplyr)
library(reactable)
library(shiny)
<br>conn &lt;- dbConnect(drv = RSQLite::SQLite(), "database.sqlite")
<br>credit_cards &lt;- dbGetQuery(
 conn = conn,
 "SELECT * FROM credit_cards"
)
<br>shiny::onStop(function() {
 dbDisconnect(conn)
})
<br>ui &lt;- fluidPage(
 titlePanel("Credit Cards App"),
 reactableOutput("top_credit_providers")
)
<br>server &lt;- function(input, output, session) {
 
 output$top_credit_providers &lt;- renderReactable({
   top_providers &lt;- credit_cards |&gt;
     group_by(provider) |&gt;
     summarise(popularity = n()) |&gt;
     arrange(desc(popularity)) |&gt;
     head(10) |&gt;
     collect()
   
   reactable(top_providers)
 })
 
}
<br>shinyApp(ui, server)
</code></pre>
<strong>Let’s measure if that made our app more memory efficient:</strong>
<pre><code class="language-r">
&gt; measure_mem_usage() # 1 session
Press [enter] to finish the test...
<br>current     max
 474MB   474MB
</code></pre>
<pre><code class="language-r">
&gt; measure_mem_usage() # 2 sessions
Press [enter] to finish the test...
<br>current     max
 497MB   497MB
</code></pre>
<pre><code class="language-r">
&gt; measure_mem_usage() # 3 sessions
Press [enter] to finish the test...
<br>current     max
 503MB   503MB  
</code></pre>
<pre><code class="language-r">
&gt; measure_mem_usage() # 4 sessions
Press [enter] to finish the test...
<br>current     max
 530MB   530MB
</code></pre>
<pre><code class="language-r">
&gt; measure_mem_usage() # 5 sessions
Press [enter] to finish the test...
<br>current     max
 546MB   546MB
</code></pre>
<img class="size-full wp-image-22223" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b019573519a56046b4280a_memory-usage-across-different-numbers-of-sessions-2.webp" alt="" width="850" height="547" /> Memory Usage Across Different Numbers of Sessions 2

As we can see, our app now allocates only an extra 6MB to 27MB per session. This is an almost 4x improvement!

Let’s try to make it even better! Currently, we are fetching the whole credit card dataset into the R process memory, <strong>but we only display the top 10 values! What a waste of memory!</strong>

Let’s fix that by moving the computations into the database. This is very easy thanks to <code>{dbplyr}</code>, as we can reuse the same <code>{dplyr}</code> functions.
<pre><code class="language-r">
library(DBI)
library(dplyr)
library(reactable)
library(shiny)
<br>conn &lt;- dbConnect(drv = RSQLite::SQLite(), "database.sqlite")
<br># Lazy reference to the table; requires {dbplyr} to be installed
credit_cards &lt;- tbl(conn, "credit_cards")
<br>shiny::onStop(function() {
 dbDisconnect(conn)
})
<br>ui &lt;- fluidPage(
 titlePanel("Credit Cards App"),
 reactableOutput("top_credit_providers")
)
<br>server &lt;- function(input, output, session) {
 
 output$top_credit_providers &lt;- renderReactable({
   top_providers &lt;- credit_cards |&gt;
     group_by(provider) |&gt;
     summarise(popularity = n()) |&gt;
     arrange(desc(popularity)) |&gt;
     head(10) |&gt;
     collect()
   
   reactable(top_providers)
 })
 
}
<br>shinyApp(ui, server)
</code></pre>
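To check that the aggregation really runs inside the database, one option is <code>show_query()</code>, which prints the SQL that <code>{dbplyr}</code> generates. The sketch below uses a throwaway in-memory SQLite database with a tiny made-up table rather than the example dataset, and assumes <code>{dbplyr}</code> and <code>{RSQLite}</code> are installed.
<pre><code class="language-r">
library(DBI)
library(dplyr)

# Throwaway in-memory database with a tiny, made-up table
conn &lt;- dbConnect(RSQLite::SQLite(), ":memory:")
dbWriteTable(conn, "credit_cards", data.frame(provider = c("Visa", "Visa", "Maestro")))

# Nothing is fetched yet: this only builds a lazy query
query &lt;- tbl(conn, "credit_cards") |&gt;
  group_by(provider) |&gt;
  summarise(popularity = n()) |&gt;
  arrange(desc(popularity)) |&gt;
  head(10)

# Print the SQL generated by {dbplyr}
show_query(query)

# Only now are the (at most 10) aggregated rows pulled into R
top_providers &lt;- collect(query)

dbDisconnect(conn)
</code></pre>
Only the aggregated result crosses into the R process, so its size no longer depends on the number of rows in the table.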
<strong>Let’s repeat our benchmarks:</strong>
<pre><code class="language-r">
&gt; measure_mem_usage() # 1 session
Press [enter] to finish the test...
<br>current     max
 229MB   229MB
</code></pre>
<pre><code class="language-r">
&gt; measure_mem_usage() # 2 sessions
Press [enter] to finish the test...
<br>current     max
 225MB   225MB
</code></pre>
<pre><code class="language-r">
&gt; measure_mem_usage() # 2 sessions
Press [enter] to finish the test...
<br>current     max
 231MB   231MB
</code></pre>
<pre><code class="language-r">
&gt; measure_mem_usage() # 3 sessions
Press [enter] to finish the test...
<br>current     max
 232MB   232MB
</code></pre>
<pre><code class="language-r">
&gt; measure_mem_usage() # 4 sessions
Press [enter] to finish the test...
<br>current     max
 233MB   233MB
</code></pre>
<pre><code class="language-r">
&gt; measure_mem_usage() # 5 sessions
Press [enter] to finish the test...
<br>current     max
 233MB   233MB
</code></pre>
<img class="size-full wp-image-22225" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b019573519a56046b42846_memory-usage-across-different-numbers-of-sessions-3.webp" alt="" width="850" height="547" /> Memory Usage Across Different Numbers of Sessions 3

Now the memory usage of our app seems to be barely increasing; there is only a 4MB difference between the app used by 1 user and the app used by 5 users.

Not to mention that, compared to the versions of the app that fetched the whole dataset into memory, we are saving at least 245MB of memory (474MB versus 229MB with a single session)!
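
To put the three versions side by side, the snippet below re-enters the 1-session and 5-session values reported above and computes the average growth per additional session. The version labels and column names are our own.
<pre><code class="language-r">
# Peak memory (MB) with 1 and 5 sessions, taken from the measurements above
benchmarks &lt;- data.frame(
  version = c("per-session fetch", "global fetch", "database query"),
  one_session = c(481, 474, 229),
  five_sessions = c(844, 546, 233)
)

# Average extra memory per additional session: 90.75, 18 and 1 MB
benchmarks$avg_per_session &lt;-
  (benchmarks$five_sessions - benchmarks$one_session) / 4

benchmarks
</code></pre>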
<h2 id="limitations">Limitations</h2>
The described method of measuring the memory usage of a Shiny app has its limitations. For example, if our app is using <code>{promises}</code>, our measurements might be less accurate depending on the type of future backend we are using.

If our backend uses child processes, <a href="https://bench.r-lib.org/reference/bench_process_memory.html" target="_blank" rel="noopener noreferrer">bench::bench_process_memory</a> will include them in the measurements. For example, when using <code>future::multicore</code>, <a href="https://github.com/HenrikBengtsson/future/issues/155" target="_blank" rel="noopener noreferrer">futures are run in child processes of the main R process</a>.

However, if we are using <code>future::multisession</code>, futures are run in separate processes (not child processes), and in that case, memory used by those processes won’t be included in the measurements.
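
One possible way to inspect worker memory, sketched below under the assumption that <code>{future}</code> and <code>{bench}</code> are installed, is to call <code>bench::bench_process_memory()</code> from inside a future so that the worker reports on its own process. This is not part of the helper function above, and the worker count is arbitrary.
<pre><code class="language-r">
library(future)

# Two background R sessions; the worker count is an arbitrary choice
plan(multisession, workers = 2)

# Measured inside a worker: memory of that worker process
worker_mem &lt;- value(future(bench::bench_process_memory()))

# Measured in the main process, as in measure_mem_usage()
main_mem &lt;- bench::bench_process_memory()

print(worker_mem)
print(main_mem)

# Shut down the workers
plan(sequential)
</code></pre>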
<h2 id="conclusion">Conclusion</h2>
In this blog post, we described how to benchmark the memory usage of a Shiny app using the <code>{bench}</code> package.

Additionally, we showed that loading data in the global scope gave an almost 4x improvement in per-session memory usage, and that moving computations into a database saved a further 245MB while keeping memory usage almost flat as sessions were added.

This improves the scalability of our application and might allow us to cut down on infrastructure costs, as machines with less memory can be used to handle the same traffic.

If you found this article helpful, don't miss out on the latest trends and advancements in R/Shiny — <a href="https://appsilon.us16.list-manage.com/subscribe?u=c042d7c0dbf57c5c6f8b54598&amp;id=870d5bfc05" target="_blank" rel="noopener">subscribe to Shiny Weekly for regular updates and exclusive content</a>.
