Connecting R to S3: How to Upload and Download Files from an AWS S3 Bucket
More and more people are looking for ways to store files in their native form. Sure, databases are faster and more organized, but they can't store files in their source format (think images and audio), at least not without severe compromises. That's where object storage comes in.
Today you'll learn how to create a new storage bucket on Amazon S3 (Simple Storage Service), and how to upload and download files from it with the R programming language. S3 automatically replicates your data across multiple servers and data centers, which makes it a common choice for mission-critical applications. Let's dig in!
<blockquote>Are your R Shiny applications slow? <a href="https://appsilon.com/scaling-and-infrastructure-why-is-my-shiny-app-slow/" target="_blank" rel="noopener">Here are a couple of tips on scaling and infrastructure</a>.</blockquote>
Table of contents:
<ul>
<li><a href="#configure">How to Create and Configure a New AWS S3 Bucket</a></li>
<li><a href="#connect">Connecting R to S3: How to Establish S3 Connection</a></li>
<li><a href="#work">Connecting R to S3: How to Upload and Download Files with R</a></li>
<li><a href="#summary">Summing up Connecting R to S3</a></li>
</ul>
<hr />
<h2 id="configure">How to Create and Configure a New AWS S3 Bucket</h2>
The logical first step is to create a new S3 bucket. It's assumed you already have an AWS account configured, so we won't cover that.
<h3>Creating a new AWS S3 Bucket</h3>
To create a new S3 bucket, log in to your AWS console and select <i>S3</i> under the available services. You'll see the following screen, provided you don't have any buckets created:
<img class="size-full wp-image-20388" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d74e6b3c80c04fb8630b_83d9f960_1-3.webp" alt="Image 1 - S3 homepage" width="2347" height="1192" /> Image 1 - S3 homepage
Click on the big orange button on the right side of the screen to create a bucket.
This takes you to a new screen where you can enter a bucket name and some additional configuration:
<img class="size-full wp-image-20390" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d74f00fd9b7af135d14b_ba248701_2-3.webp" alt="Image 2 - Bucket general configuration" width="1480" height="1269" /> Image 2 - Bucket general configuration
Take note of the bucket region (<i>eu-north-1</i> for us), as you'll need it later. As for the rest of the options, you can leave them on their default settings:
<img class="size-full wp-image-20392" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d74feb35c6618a0f71ac_fc6c174e_3-3.webp" alt="Image 3 - Bucket public access" width="1480" height="1269" /> Image 3 - Bucket public access
Finally, scroll down to the bottom of the page and click on the <i>Create bucket</i> button:
<img class="size-full wp-image-20394" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d750c24a1b8139c5cf15_2ada142d_4-3.webp" alt="Image 4 - Finalizing bucket creation" width="1480" height="1315" /> Image 4 - Finalizing bucket creation
Your bucket will be created and you'll see it in the bucket list:
<img class="size-full wp-image-20396" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d75121d589fe2956668f_ffbfe3b9_5-3.webp" alt="Image 5 - S3 bucket list" width="1654" height="1088" /> Image 5 - S3 bucket list
And that's it for the bucket creation. There's one more step before connecting to S3 from R, so let's get through it.
<h3>Configuring AWS Access Keys</h3>
AWS access keys let R authenticate with AWS S3 without any manual steps, especially if you save them as environment variables. More on that in a bit.
In the top-right corner of the screen, you'll see your username. Click on it, and select <i>Security credentials</i>.
You'll see the <i>Access keys</i> section in the middle of the list:
<img class="size-full wp-image-20398" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d7512a88faf6f30fe67b_b1451330_6-3.webp" alt="Image 6 - IAM console" width="2302" height="1088" /> Image 6 - IAM console
Click on <i>Create access key</i> and one will be automatically created for you. Just make sure to copy both the access key ID and the secret access key somewhere safe.
<img class="size-full wp-image-20400" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d752ebc881cc38344b05_a3af8695_7-3.webp" alt="Image 7 - New access keys" width="1923" height="1088" /> Image 7 - New access keys
We now have everything needed to start connecting R to S3. Let's do that next.
<h2 id="connect">Connecting R to S3: How to Establish S3 Connection</h2>
You'll need to know your AWS region, access key ID, and secret access key to establish an S3 connection from R. The <code>aws.s3</code> package reads them from environment variables, so we have to set these up first.
To start, install the package, along with <code>dotenv</code>, which we'll use to load the credentials:
<pre><code class="language-r">install.packages(c("aws.s3", "dotenv"))</code></pre>
Next, let's set up the environment variables.
<h3>Creating the .env File</h3>
The <code>.env</code> file is used to store key-value pairs that will be loaded as environment variables. That way, you can keep your R scripts free of sensitive info. You can also add the <code>.env</code> file to <code>.gitignore</code> so the credentials stay local to you.
Start by creating the <code>.env</code> file (no file name, just the extension), right where you plan to store your R script. Paste the following inside:
<pre><code class="language-text">AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
AWS_DEFAULT_REGION=</code></pre>
And, of course, fill in the values with your credentials and AWS region. There's no need to surround the values with quotes, as they're read as plain strings.
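Once the variables are loaded into your R session (for instance with <code>dotenv::load_dot_env()</code>), it can be handy to confirm that none of them is missing without printing the secrets themselves. Here's a minimal sketch in base R. The helper name <code>missing_aws_vars()</code> is hypothetical and not part of any package:
<pre><code class="language-r"># Names of the environment variables used in this article
required_vars <- c(
  "AWS_ACCESS_KEY_ID",
  "AWS_SECRET_ACCESS_KEY",
  "AWS_DEFAULT_REGION"
)

# Hypothetical helper: returns the names of variables that are unset or empty
missing_aws_vars <- function(vars = required_vars) {
  values <- Sys.getenv(vars, unset = "")
  vars[values == ""]
}

missing <- missing_aws_vars()
if (length(missing) == 0) {
  message("All AWS environment variables are set.")
} else {
  message("Missing variables: ", paste(missing, collapse = ", "))
}</code></pre>
The check only reports variable names, so it's safe to leave in a script that gets shared or logged.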
<h3>Establishing an R to S3 Connection</h3>
And now it's time to establish a connection! Load the <code>aws.s3</code> package and make sure to load the environment variables from the <code>.env</code> file:
<pre><code class="language-r">library(aws.s3)
dotenv::load_dot_env()</code></pre>
The environment variables are now set up, which means R should have no trouble communicating with AWS. Run the following command to test the connection:
<pre><code class="language-r">bucketlist()</code></pre>
If you don't get an error, it means R can communicate with AWS S3. You should see a list of available buckets along with their creation dates:
<img class="size-full wp-image-20402" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d7526e90276821da2c9d_c5a4dbb7_8-3.webp" alt="Image 8 - Listing S3 Buckets from R" width="632" height="110" /> Image 8 - Listing S3 Buckets from R
Connection established! Up next, let's explore how to save files to S3, and also how to download them.
<h2 id="work">Connecting R to S3: How to Upload and Download Files with R</h2>
This section will show you how to upload and download files from S3 with R. But before we do so, we'll write a simple pipeline to get the data from the web.
<h3>R Data Pipeline for Downloading a JSON File from Web</h3>
As for the data source, we'll use the <a href="https://jsonplaceholder.typicode.com" target="_blank" rel="noopener">JSON Placeholder</a> API. It's a free API you can use to learn the concepts of working with REST APIs. The API has a <code>/posts</code> endpoint, and that's the place from which we'll extract the data:
<img class="size-full wp-image-20404" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d753374a4f9b8ed4597c_07068f3d_9-3.webp" alt="Image 9 - JSON Placeholder API" width="1739" height="965" /> Image 9 - JSON Placeholder API
As for the R code, you'll need two packages - <code>httr</code> and <code>jsonlite</code>, so make sure to have them installed.
The script makes a GET request to the API, reads the response body as text, and parses it with <code>fromJSON()</code>. Finally, the <code>writeLines()</code> function is used to save the JSON text to disk. Here's the full code snippet:
<pre><code class="language-r">library(httr)
library(jsonlite)

# Fetch the posts and read the response body as a JSON string
url <- "https://jsonplaceholder.typicode.com/posts"
response <- GET(url)
json_data <- content(response, "text", encoding = "UTF-8")

# Parse into an R object (handy for inspection, not needed for the upload)
parsed_data <- fromJSON(json_data)

# Save the JSON text to disk
writeLines(json_data, "posts_data.json")</code></pre>
As soon as you run it, you'll see the <code>posts_data.json</code> file saved in your R working directory. Here's what it contains:
<img class="size-full wp-image-20406" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d7540750e0f883990a90_a0b32e0d_10-3.webp" alt="Image 10 - Saved JSON file" width="1665" height="1161" /> Image 10 - Saved JSON file
Great! We now have the data, so let's upload it to S3.
<h3>How to Upload Files from R to S3</h3>
The <code>aws.s3</code> package has a convenient <code>put_object()</code> function that's responsible for uploading files from R to S3. Here's a list of parameters it expects:
<ul>
<li><code>file</code>: Path to the local file you want to upload</li>
<li><code>object</code>: File name (path) in the bucket</li>
<li><code>bucket</code>: Name of your S3 bucket</li>
</ul>
Here's what it looks like in practice:
<pre><code class="language-r">put_object(
  file = "posts_data.json",
  object = "posts_data_from_r.json",
  bucket = "appsilonbucket"
)</code></pre>
You'll see the following output once you run the function:
<img class="size-full wp-image-20408" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d754eb35c6618a0f7582_75bbec97_11-3.webp" alt="Image 11 - Uploading a file to S3" width="594" height="250" /> Image 11 - Uploading a file to S3
And that's it! That's all it takes to upload a local file to S3.
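If you'd like to confirm the upload from R itself, here's a minimal sketch. It assumes <code>put_object()</code> returns <code>TRUE</code> on success and relies on <code>object_exists()</code> from the same package. The helper name <code>upload_and_verify()</code> is hypothetical, and the file and bucket names are the ones used in this article:
<pre><code class="language-r"># Hypothetical helper: upload a file, then check that the object is in the bucket
upload_and_verify <- function(file, object, bucket) {
  # Fail early if the local file is missing
  if (!file.exists(file)) {
    stop("Local file not found: ", file)
  }

  uploaded <- aws.s3::put_object(file = file, object = object, bucket = bucket)
  if (!isTRUE(uploaded)) {
    return(FALSE)
  }

  # Ask S3 whether the object is now present
  isTRUE(aws.s3::object_exists(object = object, bucket = bucket))
}

# Example call (requires valid AWS credentials in the environment):
# upload_and_verify("posts_data.json", "posts_data_from_r.json", "appsilonbucket")</code></pre>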
Here's what the bucket contains after the file upload:
<img class="size-full wp-image-20410" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d75557c12c8e1dd8ca64_58905b8d_12-3.webp" alt="Image 12 - File uploaded to S3" width="1739" height="965" /> Image 12 - File uploaded to S3
Up next, let's see how to download files back to R.
<h3>How to Download Files from S3 to R</h3>
Downloading files from S3 to R is as simple as uploading. The <code>aws.s3</code> package has a <code>get_object()</code> function that will fetch the object from S3. It expects the following parameters:
<ul>
<li><code>object</code>: File name (path) in your S3 bucket</li>
<li><code>bucket</code>: Name of your S3 bucket</li>
</ul>
Here's how to use the function in R:
<pre><code class="language-r">user_data <- get_object(
  object = "posts_data_from_r.json",
  bucket = "appsilonbucket"
)
user_data</code></pre>
The <code>user_data</code> variable holds the file contents as a raw vector of bytes. That's not human-readable, so we can use the <code>writeBin()</code> function to save it to disk as a JSON file:
<pre><code class="language-r">writeBin(user_data, "downloaded_file.json")</code></pre>
Here's what the downloaded file contains:
<img class="size-full wp-image-20414" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d75719357fedd7b4d3fe_73684bf5_14-3.webp" alt="Image 14 - Saved JSON file" width="1739" height="1169" /> Image 14 - Saved JSON file
Yup - it's exactly the same as the file we fetched from the web a couple of sections ago. That's how easy it is to work with AWS S3 in R. Let's make a brief recap next before finishing off.
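If you'd rather work with the downloaded data directly in R instead of writing it to disk, the raw vector can be converted to text and parsed. Here's a minimal sketch, assuming <code>jsonlite</code> is installed. The <code>raw_bytes</code> value is a hand-made stand-in for what <code>get_object()</code> returns, so the snippet runs without an S3 connection:
<pre><code class="language-r">library(jsonlite)

# Stand-in for the raw vector returned by get_object() (assumption for this demo)
raw_bytes <- charToRaw('[{"userId": 1, "id": 1, "title": "hello"}]')

# Convert the bytes to a JSON string, then parse it into a data frame
json_text <- rawToChar(raw_bytes)
posts <- fromJSON(json_text)

str(posts)</code></pre>
With a real download, you'd pass <code>user_data</code> to <code>rawToChar()</code> in place of <code>raw_bytes</code>.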
<hr />
<h2 id="summary">Summing up Connecting R to S3</h2>
Connecting R to S3 is easier than it seems at first. All you need are security credentials, and the authentication happens behind the scenes if you load those credentials as environment variables. There are no manual connection strings and no connection state to keep track of - everything is done for you automatically.
You now know how to upload and download files from R, which will come in handy when working on those big, mission-critical R projects.
<i>What's your primary use case for S3 or another object storage platform?</i> Let us know in the comment section below.
<blockquote>Your Shiny app is amazing but no one uses it? <a href="https://appsilon.com/reasons-why-shiny-user-adoption-fails/">Here are 9 ways user adoption can fail</a>.</blockquote>