
Loading large data frames when building Shiny apps can significantly increase app initialization time. When we ran into this issue in a recent project, we decided to review the available methods for reading data from csv files (as provided by our client) into R. In this article, we identify the most efficient of these methods using benchmarking and explain our workflow.

We will compare the following:

  1. read.csv from utils, which was the standard way of reading csv files into R in RStudio,
  2. read_csv from readr, which replaced the former as the standard way of doing it in RStudio,
  3. load and readRDS from base, and
  4. read_feather from feather and fread from data.table.

Data

To start, we generate some random data for the test…

set.seed(123)
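# 10 integer columns of 1.5 million rows each, plus 10 columns of random
# 5-character strings (1,000 values each, recycled to the full row count)
df <- data.frame(replicate(10, sample(0:2000, 15 * 10^5, replace = TRUE)),
                 replicate(10, stringi::stri_rand_strings(1000, 5)))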

…and save it to disk in each format so that we can measure load times. Besides the csv format, we will also need feather, RDS and RData files.

path_csv <- '../assets/data/fast_load/df.csv'
path_feather <- '../assets/data/fast_load/df.feather'
path_rdata <- '../assets/data/fast_load/df.RData'
path_rds <- '../assets/data/fast_load/df.rds'
library(feather)
library(data.table)
write.csv(df, file = path_csv, row.names = FALSE)
write_feather(df, path_feather)
save(df, file = path_rdata)
saveRDS(df, path_rds)
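
Before timing anything, it can be worth confirming that each format reads back the same data. The following is a minimal sketch using base R only, on a small stand-in data frame (small_df and the temporary paths are assumptions made for illustration, not part of the benchmark). It also shows one practical difference between the two native formats: readRDS returns the object itself, while load restores objects under their saved names and returns only those names.

```r
# Small stand-in for the benchmark data frame (hypothetical size and columns).
set.seed(123)
small_df <- data.frame(
  num = sample(0:2000, 1000, replace = TRUE),
  txt = sample(c("abcde", "fghij", "klmno"), 1000, replace = TRUE),
  stringsAsFactors = FALSE
)

tmp_csv   <- tempfile(fileext = ".csv")
tmp_rds   <- tempfile(fileext = ".rds")
tmp_rdata <- tempfile(fileext = ".RData")

write.csv(small_df, file = tmp_csv, row.names = FALSE)
saveRDS(small_df, tmp_rds)
save(small_df, file = tmp_rdata)

# Read each file back.
from_csv <- read.csv(tmp_csv, stringsAsFactors = FALSE)
from_rds <- readRDS(tmp_rds)

# load() returns the names of the restored objects, not the data itself,
# so we load into a separate environment and fetch the object by name.
loaded_env   <- new.env()
loaded_names <- load(tmp_rdata, envir = loaded_env)
from_rdata   <- loaded_env[[loaded_names]]

# Compare each result with the original data frame.
print(all.equal(from_csv, small_df))
print(all.equal(from_rds, small_df))
print(all.equal(from_rdata, small_df))
```

Loading into a separate environment avoids silently overwriting an object of the same name in the calling environment, which is easy to do with load.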

Next, we can check the resulting file sizes:

files <- c(path_csv, path_feather, path_rdata, path_rds)
info <- file.info(files)
info$size_mb <- info$size/(1024 * 1024)
print(subset(info, select=c("size_mb")))
##                                       size_mb
## ../assets/data/fast_load/df.csv     1780.3005
## ../assets/data/fast_load/df.feather 1145.2881
## ../assets/data/fast_load/df.RData    285.4836
## ../assets/data/fast_load/df.rds      285.4837

The csv and feather files take up much more storage space than the others: csv needs about 6 times and feather about 4 times as much space as RDS or RData.
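
Part of the size gap comes from compression: saveRDS and save compress their output with gzip by default. A minimal sketch of the effect, using base R only and a smaller stand-in data frame (demo_df and the temporary file names are assumptions made for illustration):

```r
# Smaller stand-in data frame (hypothetical size), integers only.
set.seed(123)
demo_df <- data.frame(replicate(5, sample(0:2000, 1e5, replace = TRUE)))

rds_compressed   <- tempfile(fileext = ".rds")
rds_uncompressed <- tempfile(fileext = ".rds")

saveRDS(demo_df, rds_compressed)                      # default: compress = TRUE (gzip)
saveRDS(demo_df, rds_uncompressed, compress = FALSE)  # same data, no compression

# Compare the resulting file sizes in megabytes.
sizes_mb <- file.size(c(rds_compressed, rds_uncompressed)) / 1024^2
names(sizes_mb) <- c("compressed", "uncompressed")
print(round(sizes_mb, 2))
```

The exact ratio depends on the data, so the numbers from this sketch should not be expected to match the table above.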

Looking to learn more about importing data into R? This DataCamp tutorial covers everything from importing simple text files to more advanced SPSS and SAS files.

Benchmark

We will use the microbenchmark package to compare read times over 10 runs of each of the following methods:

  • utils::read.csv
  • readr::read_csv
  • data.table::fread
  • base::load
  • base::readRDS
  • feather::read_feather

 

library(microbenchmark)
benchmark <- microbenchmark(readCSV = utils::read.csv(path_csv),
               readrCSV = readr::read_csv(path_csv, progress = FALSE),
               fread = data.table::fread(path_csv, showProgress = FALSE),
               loadRdata = base::load(path_rdata),
               readRds = base::readRDS(path_rds),
               readFeather = feather::read_feather(path_feather), times = 10)
print(benchmark, signif = 2)
## Unit: seconds
##        expr   min    lq       mean median    uq   max neval
##     readCSV 200.0 200.0 211.187125  210.0 220.0 240.0    10
##    readrCSV  27.0  28.0  29.770890   29.0  32.0  33.0    10
##       fread  15.0  16.0  17.250016   17.0  17.0  22.0    10
##   loadRdata   4.4   4.7   5.018918    4.8   5.5   5.9    10
##     readRds   4.6   4.7   5.053674    5.1   5.3   5.6    10
## readFeather   1.5   1.8   2.988021    3.4   3.6   4.1    10

And the winner is… feather! However, using feather requires converting the file to the feather format first.
Using load or readRDS also improves performance (second and third place, with medians of about 5 seconds each) and has the added benefit of storing a smaller, compressed file. In both cases, the file must first be converted to the proper format.

When it comes to reading directly from csv, fread clearly beats both read_csv and read.csv (median of 17 seconds versus 29 and 210 seconds, respectively), and is thus the best option for reading a csv file.

Ultimately, we chose to work with feather files. The csv to feather conversion is quick, and we did not have a strict limit on storage space. Had storage been constrained, the RDS or RData format would probably have been a more appropriate choice.

The final workflow was:

  1. reading a csv file provided by our customer using fread,
  2. writing it to feather using write_feather, and
  3. loading a feather file on app initialization using read_feather.

The first two tasks were done once and outside of the Shiny App context.
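
The workflow above can be sketched as two small functions. This is an illustration under stated assumptions rather than our production code: the helper names convert_csv_to_feather and load_app_data are hypothetical, and the usage part runs on a tiny temporary csv file, guarded so that it only executes when both data.table and feather are installed.

```r
# Steps 1 and 2: one-time conversion, run outside the Shiny app.
# convert_csv_to_feather() is a hypothetical helper name.
convert_csv_to_feather <- function(csv_path, feather_path) {
  dt <- data.table::fread(csv_path, showProgress = FALSE)
  feather::write_feather(dt, feather_path)
  invisible(feather_path)
}

# Step 3: called on app initialization to read the pre-converted file.
# load_app_data() is a hypothetical helper name.
load_app_data <- function(feather_path) {
  feather::read_feather(feather_path)
}

# Usage on a tiny temporary file; skipped if either package is missing.
if (requireNamespace("data.table", quietly = TRUE) &&
    requireNamespace("feather", quietly = TRUE)) {
  csv_path     <- tempfile(fileext = ".csv")
  feather_path <- tempfile(fileext = ".feather")
  write.csv(data.frame(id = 1:3, label = c("a", "b", "c")),
            csv_path, row.names = FALSE)

  convert_csv_to_feather(csv_path, feather_path)
  app_data <- load_app_data(feather_path)
  print(dim(app_data))
}
```

Keeping the conversion in its own function makes it easy to rerun whenever the customer delivers a new csv file, while the app itself only ever touches the feather file.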

There is also quite an interesting benchmark by Hadley here on reading complete files into R. Please note that the functions defined in that post return a character object, so you will have to parse the text yourself to obtain the familiar data frame.
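
To illustrate the extra parsing step, here is a minimal sketch that uses base R's readChar as a stand-in for a whole-file reader (the functions in that post may differ; the file and column names below are made up for the example). The file arrives as a single string and still has to be parsed into a data frame.

```r
# Write a tiny csv file to a temporary location (hypothetical example data).
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, label = c("a", "b", "c")),
          csv_path, row.names = FALSE)

# Read the complete file into one character string.
raw_text <- readChar(csv_path, nchars = file.size(csv_path))
print(class(raw_text))
print(length(raw_text))

# The string is not yet a data frame; it has to be parsed first.
parsed <- read.csv(text = raw_text, stringsAsFactors = FALSE)
str(parsed)
```

For large files, this second parsing step can cost as much time as the whole-file read saved, which is worth measuring before adopting the approach.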

If you run into any issues, our team at Appsilon, an RStudio Full Certified Partner, is ready to answer your questions about loading data into R and other topics related to R Shiny, Data Analytics, and Machine Learning. We’re experts in this area, and we’d love to chat with you.
