How to Deploy RStudio (Posit) Workbench to AWS Using Terraform
If you’re searching for cloud services, it’s safe to assume you know about Amazon Web Services (AWS). The AWS cloud platform is arguably the most popular cloud provider, competing with the likes of Azure, Google, and IBM. AWS offers a large set of cloud-based products including databases, computing, developer tools, and enterprise applications. With the right platform and the right tools, you can improve operational efficiency, minimize risk, and quickly scale projects. But not all platforms are created equal, and depending on the needs of your project and the programs your team uses, you might find another platform a better fit. In this tutorial, we'll deploy RStudio Workbench to AWS using Terraform, an infrastructure-as-code tool for modeling cloud infrastructure in code. RStudio Workbench is a powerful tool for creating data science insights. Whether you work with R or Python, Workbench makes a developer's life easier, from team collaboration on centralized servers and server access control to authentication and load balancing. You can learn about the many benefits of Workbench over open-source RStudio Server by reading the <a href="https://docs.rstudio.com/ide/server-pro/">Workbench documentation (previously RStudio Server Pro)</a>. <blockquote><strong>Struggling to find tools for managing distributed teams? Appsilon uses <a href="https://appsilon.com/rstudio-connect-as-a-solution-for-remote-data-science-teams/" target="_blank" rel="noopener noreferrer">RStudio Connect as a solution for remote teams</a>.</strong></blockquote> Continue reading to deploy RStudio Workbench to AWS with Terraform and discover how you can close the gap between your current DevOps and what's possible through digital transformation. 
<ul><li><a href="#anchor-1" target="_blank" rel="noopener noreferrer">Using Packer to build an AMI</a><ul><li><a href="#anchor-2" target="_blank" rel="noopener noreferrer">Pre-requisites</a></li><li><a href="#anchor-3" target="_blank" rel="noopener noreferrer">Variables</a></li><li><a href="#anchor-4" target="_blank" rel="noopener noreferrer">Clone repository</a></li><li><a href="#anchor-5" target="_blank" rel="noopener noreferrer">Template file: ami.pkr.hcl</a></li><li><a href="#anchor-6" target="_blank" rel="noopener noreferrer">Building Packer image</a></li></ul> </li> <li><a href="#anchor-7" target="_blank" rel="noopener noreferrer">Using Terraform to deploy RStudio (Posit) Workbench to AWS</a> <ul><li><a href="#anchor-8" target="_blank" rel="noopener noreferrer">Pre-requisites</a></li><li><a href="#anchor-9" target="_blank" rel="noopener noreferrer">Write configuration</a></li><li><a href="#anchor-10" target="_blank" rel="noopener noreferrer">Initialize the directory</a></li><li><a href="#anchor-11" target="_blank" rel="noopener noreferrer">Create infrastructure</a></li><li><a href="#anchor-12" target="_blank" rel="noopener noreferrer">Demo</a></li><li><a href="#anchor-13" target="_blank" rel="noopener noreferrer">Destroying infrastructure</a></li></ul> </li> </ul> <img class="aligncenter wp-image-8191 size-full" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b0200e6d8f975c08204684_architecture.webp" alt="Architecture of deploying RStudio Workbench to AWS" width="803" height="271" /> Note: When this article was written, Posit PBC was still called RStudio PBC. We use RStudio and Posit interchangeably in this text (e.g., RStudio Workbench and Posit Workbench are the same product). <h2 id="anchor-1">Using Packer to build an AMI</h2> <a href="https://www.packer.io/docs" target="_blank" rel="noopener noreferrer">Packer</a> is a free and open-source tool for creating golden images for multiple platforms from a single source configuration. 
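To make "a single source configuration" concrete, here is a minimal sketch of what a Packer HCL template for an EBS-backed AMI can look like. It is not the template from the repository used in this tutorial: the source label <code>ubuntu</code>, the default region, the instance type, the AMI name, and the placeholder shell provisioner are all assumptions chosen for illustration.
<pre><code class="language-hcl"># Illustrative sketch only; names and values below are assumptions.
packer {
  required_plugins {
    amazon = {
      version = "&gt;= 1.0.0"
      source  = "github.com/hashicorp/amazon"
    }
  }
}

# Region where the finished AMI is stored (assumed default).
variable "aws_region" {
  type    = string
  default = "eu-west-1"
}

# Builder: launches a temporary EC2 instance and snapshots it into an AMI.
source "amazon-ebs" "ubuntu" {
  region        = var.aws_region
  instance_type = "t3.medium" # assumed build instance size
  ssh_username  = "ubuntu"
  ami_name      = "rstudio-workbench-example-${formatdate("YYYYMMDDhhmmss", timestamp())}"

  # Base image: the most recent Ubuntu 20.04 AMI published by Canonical.
  source_ami_filter {
    filters = {
      name                = "ubuntu/images/*ubuntu-focal-20.04-amd64-server-*"
      root-device-type    = "ebs"
      virtualization-type = "hvm"
    }
    owners      = ["099720109477"] # Canonical's AWS account ID
    most_recent = true
  }
}

build {
  sources = ["source.amazon-ebs.ubuntu"]

  # Placeholder provisioner; the real template runs Ansible at this stage.
  provisioner "shell" {
    inline = ["echo 'provisioning steps go here'"]
  }
}
</code></pre>
The same <code>build</code> block could list several <code>source</code> entries, which is how one template can target more than one platform.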
<h3 id="anchor-2">Pre-requisites</h3> Before you can run packer to build images, you'll need a few pre-requisites. <ul><li>Python 3.9.6+</li><li><a href="https://git-scm.com/book/en/v2/Getting-Started-Installing-Git" target="_blank" rel="nofollow noopener noreferrer">git v2.30.1+</a></li><li><a href="https://www.packer.io/downloads" target="_blank" rel="nofollow noopener noreferrer">packer v1.7.3+</a></li><li><a href="https://docs.aws.amazon.com/cli/latest/userguide/install-cliv2.html" target="_blank" rel="nofollow noopener noreferrer">aws-cli v2.0+</a></li><li><a href="https://github.com/99designs/aws-vault" target="_blank" rel="noopener noreferrer">aws-vault v6.3.1+</a></li><li><a href="https://docs.ansible.com/ansible/latest/installation_guide/index.html" target="_blank" rel="nofollow noopener noreferrer">ansible v2.11.1</a></li><li>Access to the AWS account where the images will be built</li></ul> <h3 id="anchor-3">Variables</h3> The following environment variables should be set before running packer: <ul><li>AWS Credentials<ul><li><code>AWS_ACCESS_KEY_ID</code></li><li><code>AWS_SECRET_ACCESS_KEY</code></li></ul> </li> </ul> We suggest using <code>aws-vault</code> for managing AWS profiles and setting these variables automatically. Store AWS credentials for the <code>appsilon</code> profile: <script src="https://gist.github.com/MicahAppsilon/5bbdcd6252aa7652c6d59f1e87de3124.js"></script> Execute a command (using temporary credentials): <script src="https://gist.github.com/MicahAppsilon/2a9394acfa0f7ca190812e7c848e3897.js"></script> <h3 id="anchor-4">Clone repository</h3> Make sure git is installed and then clone the repository with the Packer configuration files for building RStudio Workbench. 
<script src="https://gist.github.com/MicahAppsilon/a34434d7b63648a157b75ae31d759aaf.js"></script> Fetch Ansible roles defined in <code>requirements.yml</code> by running the following command: <script src="https://gist.github.com/MicahAppsilon/1509c8b5a3c78d2ef191b2e58be06473.js"></script> Next, run the command below to validate the AMI template: <script src="https://gist.github.com/MicahAppsilon/df13f297db93b96ecbe23a86aef4a9e1.js"></script> You should see no output if the files have no issues. <h3 id="anchor-5">Template file: ami.pkr.hcl</h3> The table below describes the purpose of the variables used by the <code>ami.pkr.hcl</code> template. <table style="height: 117px;" width="841"> <thead> <tr> <th><strong>Variable</strong></th> <th><strong>Purpose</strong></th> </tr> </thead> <tbody> <tr> <td><code>aws_region</code></td> <td>AWS region where the destination AMI will be stored once built.</td> </tr> <tr> <td><code>r_version</code></td> <td>Version of R to install on AMI.</td> </tr> <tr> <td><code>rstudio_version</code></td> <td>Version of RStudio Workbench to install on AMI.</td> </tr> </tbody> </table> Some configuration blocks used in the template file and their purpose: <ul><li><code>source</code> is used to define the builder Packer will use. In our case, it is the <code>amazon-ebs</code> builder, which creates Amazon AMIs backed by EBS volumes for use in EC2. More information: <a href="https://www.packer.io/docs/builders/amazon/ebs" target="_blank" rel="nofollow noopener noreferrer">Amazon EBS - Builders | Packer by HashiCorp</a>.</li><li><code>source_ami_filter</code> defines which AMI to use as a base image for our RStudio Workbench image - <code>ubuntu/images/*ubuntu-focal-20.04-amd64-server-*</code>. Note that the <code>owners</code> filter is restricted to Canonical Group Limited, the company that publishes the official Ubuntu base AMIs. 
This ensures that the proper image is used.</li><li>The <code>provisioner</code> block is used to install and configure third-party software on the machine image after booting. Provisioners prepare the system for use, so in our case we first install <code>ansible</code> and some <code>python</code> dependencies, and then execute <code>ansible-playbook</code> to install and configure R and RStudio Workbench on the system. By supplying <code>var.r_version</code> and <code>var.rstudio_version</code> (default values defined in <code>./example.auto.pkrvars.hcl</code>) as extra arguments, we can control which versions of the corresponding components are installed.</li></ul> <h3 id="anchor-6">Building Packer image</h3><ol><li>Make sure all the required software (listed above) is installed.</li><li>Load your AWS credentials into your environment using <code>aws-vault</code> and run <code>packer build</code>:</li></ol> <script src="https://gist.github.com/MicahAppsilon/3a58f038aff87cd196b96b65398bf1ea.js"></script> It will take around 15 minutes for the image to be generated. Once the process is completed, you'll see the image under <code>Services -> EC2 -> AMIs</code>. The AMI ID is printed at the end of the build output. <a href="https://asciinema.org/a/431363" target="_blank" rel="noopener noreferrer"><img src="https://asciinema.org/a/431363.svg" /></a> <h2 id="anchor-7">Using Terraform to deploy RStudio (Posit) Workbench to AWS</h2> <a href="https://www.terraform.io/docs/index.html" target="_blank" rel="noopener noreferrer">Terraform</a> is an infrastructure as code (IaC) tool that allows you to build, change, and version infrastructure safely and efficiently. You can run Terraform on several operating systems including macOS, Linux, and Windows. To see the full list, explore the Terraform download documentation. <blockquote><strong>Is your enterprise missing out on AI innovations? 
Maybe it's time to <a href="https://appsilon.com/want-to-build-an-ai-model-for-your-business-read-this/" target="_blank" rel="noopener noreferrer">build an AI model for your business</a>.</strong></blockquote> <h3 id="anchor-8">Pre-requisites</h3> Before you can run Terraform to deploy RStudio Workbench to AWS, you'll need the following pre-requisites: <ul><li><a href="https://learn.hashicorp.com/tutorials/terraform/install-cli" target="_blank" rel="nofollow noopener noreferrer">terraform v1.0.0+</a></li><li>Access to the AWS account where the RStudio Workbench will be deployed</li></ul> <h3 id="anchor-9">Write configuration</h3> The set of files used to describe infrastructure in Terraform is known as a Terraform configuration. You will write your first configuration to create a single RStudio Workbench instance, but prior to that you will have to create resources like: <ul><li>VPC</li><li>Security Group</li></ul> Each Terraform configuration must be in its own working directory. Create a directory for your configuration first. <script src="https://gist.github.com/MicahAppsilon/7d0f5949cb97179892322cb502eb71c8.js"></script> Change into the directory. <script src="https://gist.github.com/MicahAppsilon/8fd816eee075ebadd3febac9a2be8eb3.js"></script> Create a file to define your initial Terraform configuration: <script src="https://gist.github.com/MicahAppsilon/402a90f73ccdaa2951270d3f645f257c.js"></script> Open <code>terraform.tf</code> in your text editor, paste in the configuration below, and save the file. <script src="https://gist.github.com/MicahAppsilon/7e219d045de333a62ba56745a686b83f.js"></script> The <code>terraform {}</code> block contains Terraform settings, including the required providers Terraform will use to provision your infrastructure. For each provider, the source attribute defines an optional hostname, a namespace, and the provider type. Terraform installs providers from the Terraform Registry by default. 
In this example configuration, the AWS provider's source is defined as <code>hashicorp/aws</code>, which is shorthand for <code>registry.terraform.io/hashicorp/aws</code> (<a href="https://learn.hashicorp.com/tutorials/terraform/aws-build" target="_blank" rel="noopener noreferrer">Build Infrastructure | Terraform - HashiCorp Learn</a>). <code>required_version</code> specifies the minimum version of Terraform that must be installed on your machine. <h4>VPC</h4> Create a file to define your VPC. <script src="https://gist.github.com/MicahAppsilon/2d57ed1e5200d20f38eaf1774145998f.js"></script> Open <code>vpc.tf</code> in your text editor, paste in the configuration below, and save the file. <script src="https://gist.github.com/MicahAppsilon/a1a990cae90fb2abf8cfed8899c4e230.js"></script> We are using a community Terraform module that creates VPC resources on AWS. We specify some custom parameters, like <code>vpc_name</code>, <code>vpc_cidr</code>, and the region in which the resources will be created. For more information on how this module works, make sure to read the module documentation. <h4>Security groups</h4> Create a file to define your security groups, which will act as a virtual firewall for your RStudio Workbench instances to control incoming and outgoing traffic. <script src="https://gist.github.com/MicahAppsilon/e7d366d9894bf980d25ba1f829178b21.js"></script> Open <code>sg.tf</code> in your text editor, paste in the configuration below, and save the file. <script src="https://gist.github.com/MicahAppsilon/01f6ea17e7cac66e804edf7cc015c866.js"></script> We are using a community Terraform module that creates security group resources on AWS. We specify some custom parameters, like the <code>vpc_id</code> from the <code>vpc.tf</code> file where we defined our VPC in the previous step, the security group name, and the ingress and egress rules. We allow all incoming connections on ports 80, 443, 22 (SSH), and 8787 (the default RStudio Workbench port), and also allow ICMP requests. 
We allow all outgoing connections so that our instance can reach out to the internet. For more information on how this module works, make sure to read the module documentation. <h4>EC2</h4> Create a file to define your RStudio Workbench EC2 instance. <script src="https://gist.github.com/MicahAppsilon/281e29120deb90856fd428b224398cd1.js"></script> Open <code>ec2.tf</code> in your text editor, paste in the configuration below, and save the file. <script src="https://gist.github.com/MicahAppsilon/baae91503f304385a3b185d29ff309bc.js"></script> We use <code>aws_ami</code> as a data source to get the ID of the RStudio Workbench AMI we built with Packer earlier. <code>aws_key_pair</code> provides an EC2 key pair resource. A key pair is used to control login access to EC2 instances. We need to specify the path to a public SSH key on our local workstation so that we can connect to our EC2 instance later. Use Amazon's <a href="https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-key-pairs.html#how-to-generate-your-own-key-and-import-it-to-aws" target="_blank" rel="noopener noreferrer">guide on generating keys</a> to create an SSH key if you do not have one yet. We are also using a community Terraform module that creates an EC2 resource on AWS. 
We specify some custom parameters, like: <ul><li><code>instance_count</code> - number of instances to run</li><li><code>ami</code> - the ID of the RStudio Workbench AMI</li><li><code>instance_type</code> - the flavor of our server (refer to <a href="https://aws.amazon.com/ec2/instance-types/" target="_blank" rel="noopener noreferrer">instance types</a> for available types and pricing)</li><li><code>subnet_ids</code> - network subnets to run the server in (created in <code>vpc.tf</code> previously)</li><li><code>key_name</code> - SSH key to associate with the instance</li><li><code>vpc_security_group_ids</code> - a list of security groups that should be associated with our instance</li><li><code>associate_public_ip_address</code> - a boolean flag that indicates whether a publicly accessible IP should be assigned to our instance; we set this to <code>true</code> to be able to access our RStudio Workbench over the internet</li><li><code>root_block_device</code> - a configuration block to extend default storage for our instance</li></ul> For more information on how this module works, make sure to read the <a href="https://registry.terraform.io/modules/terraform-aws-modules/ec2-instance/aws/latest" target="_blank" rel="nofollow noopener noreferrer">module documentation</a>. <h4>Optional: Application Load Balancer</h4> Optionally, we can create an ALB (Application Load Balancer), a feature of Elastic Load Balancing that allows a developer to configure and route incoming end-user traffic to applications based in the AWS public cloud. If you have a custom domain, we can use an ALB to configure access to RStudio Workbench over a human-friendly DNS record, something like <code>https://workbench.appsilon.com</code>. 
To do so, create a file <code>alb.tf</code>: <script src="https://gist.github.com/MicahAppsilon/d6f94786de753f63d1ad050f6c58ff75.js"></script> Open <code>alb.tf</code> in your text editor, paste in the configuration below, and save the file. <script src="https://gist.github.com/MicahAppsilon/9d41d0678cf8b7684db580daec38ef89.js"></script> We are using the ACM community Terraform module here. The module creates an ACM certificate and validates it using Route53 DNS. The ACM module relies on <code>aws_route53_zone</code> data retrieved from your account, so we need to specify the name of an existing zone here. The <a href="https://registry.terraform.io/modules/terraform-aws-modules/acm/aws/latest" target="_blank" rel="nofollow noopener noreferrer">module documentation</a> contains pertinent information and updates. The rest of the file configures <code>https</code> access to the RStudio Workbench instance defined in <code>ec2.tf</code>, creates a load balancer, and adds a human-friendly DNS entry. All <code>http</code> access will be redirected to <code>https</code>. <h3 id="anchor-10">Initialize the directory</h3> When you create a new configuration, or check out an existing configuration from version control, you need to initialize the directory with <code>terraform init</code>. Initializing a configuration directory downloads and installs the providers defined in the configuration, which in this case is the <code>aws</code> provider. Initialize the directory. <script src="https://gist.github.com/MicahAppsilon/3a5f405d50f6a56d7c09f9cad3e02a56.js"></script> Next, execute <code>terraform validate</code> to make sure the configuration is valid. 
<script src="https://gist.github.com/MicahAppsilon/ca95a0c4f22aeb3e329e5de3893e7a45.js"></script> <h3 id="anchor-11">Create infrastructure</h3> Apply the configuration now with the <code>aws-vault exec appsilon -- terraform apply</code> command. <blockquote>Beware: the <code>apply</code> command should be executed from within the <code>aws-vault</code> environment so that Terraform can access your AWS account.</blockquote> Before it applies any changes, Terraform prints out the execution plan, which describes the actions Terraform will take to change your infrastructure to match the configuration. Terraform will then pause and wait for your approval before proceeding. If anything in the plan seems incorrect or dangerous, it is safe to abort here with no changes made to your infrastructure. In this case, the plan is acceptable, so type <code>yes</code> at the confirmation prompt to proceed. Executing the plan will take a few minutes since Terraform waits for the EC2 instance to become available. You have now created infrastructure using Terraform! Visit the <a href="https://console.aws.amazon.com/ec2/v2/home?region=eu-west-1#Instances:sort=instanceId" target="_blank" rel="nofollow noopener noreferrer">EC2 console</a> and find your new instance up and running. Use the EC2 console to grab the IP address if you want to connect to the instance over SSH. <blockquote>Note: Per the <code>aws</code> provider block, your instance was created in the <code>eu-west-1</code> region. Ensure that your AWS Console is set to this region.</blockquote> You can also access your RStudio Workbench instance directly over IP or over DNS (if you deployed the optional Application Load Balancer). 
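If you would rather not look up the address in the console, a Terraform output can print it for you. The following is a minimal sketch and not part of the original configuration: it assumes the EC2 module block in <code>ec2.tf</code> is labeled <code>ec2</code> (a hypothetical label; use whatever label your file has) and relies on the <code>public_ip</code> output that the community <code>ec2-instance</code> module exposes. The file name <code>outputs.tf</code> and the output name are also illustrative.
<pre><code class="language-hcl"># outputs.tf (illustrative file name)
# "ec2" is an assumed module label; replace it with the label used in ec2.tf.
output "workbench_public_ip" {
  description = "Public IP address(es) of the RStudio Workbench instance"
  value       = module.ec2.public_ip
}
</code></pre>
After the next <code>terraform apply</code>, running <code>terraform output workbench_public_ip</code> prints the value, which you can then use for SSH or for opening Workbench on port 8787.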
<img class="aligncenter wp-image-8174" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b0200f48b83fcd2e569ba4_over_dns.webp" alt="screen capture of RStudio auth-sign-in" width="587" height="431" /> <h3 id="anchor-12">Demo</h3> <a href="https://asciinema.org/a/431374" target="_blank" rel="noopener noreferrer"><img class="" src="https://asciinema.org/a/431374.svg" width="689" height="560" /></a> <h3 id="anchor-13">Destroying infrastructure</h3> The <code>terraform destroy</code> command terminates resources managed by your Terraform project. This command is the inverse of <code>terraform apply</code> in that it terminates all the resources specified in your Terraform state. It does not destroy resources running elsewhere that are not managed by the current Terraform project. Use it with caution to remove the AWS resources you created during this tutorial and reduce your AWS bill. <blockquote>Beware: the <code>destroy</code> command should be executed from within the <code>aws-vault</code> environment so that Terraform can access your AWS account.</blockquote> <h3>Repository with Code</h3> You can find the complete example of the Terraform code above in Appsilon's GitHub repo <a href="https://github.com/Appsilon/terraform-aws-rstudio-workbench-example" target="_blank" rel="noopener noreferrer">Terraform AWS RStudio Workbench example</a>. <h2>Empower your team through DevOps in the cloud</h2> Whether you're well entrenched in your cloud transformation or are just starting out, the most important thing you can do is keep learning. DevOps through the cloud can improve agility and time to market, but there's always room for improvement. You might consider decoupling applications from physical resources and designing them to be cloud-native. 
Or make the most of <a href="https://appsilon.com/top-5-tips-for-rstudio-desktop-workbench/" target="_blank" rel="noopener noreferrer">RStudio Workbench</a> by automating performance testing to validate efficient use of resources. You can also be selective when migrating existing applications to prioritize and make sure the cost is justified. And when in doubt, it's always advisable to seek guidance to ensure your project is set up with the right tools, provide DevOps and cloud training, and make sure a proper framework is in place. If you're not sure how to proceed, reach out to <a href="https://appsilon.com/" target="_blank" rel="noopener noreferrer">Appsilon</a>. Appsilon has a well-balanced team of R/Shiny specialists, frontend developers, full-stack engineers, DevOps, and business analysts with years of experience developing ML frameworks and <a href="https://clutch.co/profile/appsilon" target="_blank" rel="noopener noreferrer">data science solutions for Fortune 500 companies</a>. We enjoy solving tough enterprise challenges and understand that unique problems require custom solutions.