Why Remote Data Science Teams Should Use RStudio (Posit) Connect

Much more of the world's workforce is working remotely than ever before. This new normal of remote work is likely to remain the status quo even if the global pandemic situation dramatically improves. Data Science teams are no exception. Distributed teams bring unique challenges, and data science team leaders may be looking for new tools. In this article, we’ll explain how RStudio (Posit) Connect helps organizations properly organize remote data science teams and overcome the typical inefficiencies of remote work. We'll also show you some interesting features of RStudio Connect that you might not have heard about previously.  Some common problems for distributed teams include: <ul><li>Onboarding new users, teams, and “teams of teams” </li><li>Version control and arriving at a “single source of truth”</li><li>Organizational overhead</li><li>Security issues</li></ul> At Appsilon we’ve grappled with these challenges for years as we’ve promoted a remote-work-friendly culture since the early days of our company. Our data scientists and developers collaborate with each other daily from at least three cities in two different countries, and we frequently work with clients around the globe in faraway time zones. We’ve found that <b>RStudio Connect</b> is a tool that can aid all of the parties involved with Data Science in an organization: <b>producers</b> of artifacts, <b>consumers</b> of artifacts, and <b>IT Administrators</b>. RStudio Connect empowers employees to consume and distribute information within an organization and reduces a lot of unnecessary labor going into these processes. 
Some features of RStudio Connect that we'll cover in this article include:  <ul><li><a href="#onboarding">Onboarding New Members to the DS Ecosystem</a></li><li><a href="#simplify">Simplifying the Role of the System Administrator</a></li><li><a href="#task-scheduling">Task Scheduling</a></li><li><a href="#version-control">Version Control</a></li><li><a href="#custom-emails">Custom Emails</a></li><li><a href="#security">Enhanced Security</a></li></ul> Note: At the time of writing this article, Posit PBC was RStudio PBC. We use RStudio and Posit interchangeably in this text (e.g. RStudio Connect == Posit Connect). <h2 id="onboarding">Getting Started and Onboarding New Members to the DS Ecosystem</h2> One of the first problems that an organization may encounter in a remote work scenario is <b>onboarding new individuals and teams</b> to the data science ecosystem. RStudio Connect shortens the time it takes to get remote teams up and running with sharing and consuming R/Shiny applications. One of the main reasons for this is that much of the infrastructure work is completed for you automatically – there’s no need to design and maintain your own internal solutions for problems like user authentication. We've seen organizations spend vast amounts of developer time endlessly replicating features that are included automatically in RStudio Connect.  Maybe an organization does not have IT Administrator support for its data science team and users. In this case, the data scientists themselves may have to deploy and manage RStudio Connect. Connect’s developers had this use case in mind. RStudio has provided a “Jump Start Examples” tutorial within Connect to help Data Scientists adapt to their new environment and quickly learn best practices. This reduces the hands-on work that team leaders have to do to onboard new users and ensures that everyone gets started with the same common knowledge of the ecosystem and its capabilities.  
<img class="wp-image-3756 size-full" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b39d3ac3a7c4b7c22c11ba_pasted-image-0.webp" alt="Jump Start Examples" width="931" height="600" /> <em>Jump Start Examples [Source: RStudio]</em> <h2 id="simplify">Simplifying the Role of the System Administrator</h2> RStudio Connect can help simplify the role of the system administrator by offering tools to manage visitor load: <ul><li style="font-weight: 400;">Detailed metrics for the server and the associated processes</li><li style="font-weight: 400;">Logs for all processes spawned by Connect</li><li style="font-weight: 400;">Secure deployments and interactions with artifacts using SSL/TLS</li></ul> Then there is the issue of access management. A recent release (1.8.0) makes it even easier to support data science teams with one enhancement in particular: seamless single sign-on (SSO) integration. RStudio Connect can integrate with the SAML Identity Provider (or IdP) of your company’s choice to perform user authentication and, optionally, user/group membership management. In the SAML world, RStudio Connect fulfills the role of the service provider (or SP). In addition, every RStudio Connect user account is configured with a role that controls its default capabilities on the system. Data scientists, analysts, and others working in R will most likely want “publisher” accounts. Other users are likely to need only “viewer” accounts. <h2 id="task-scheduling">Task Scheduling with RStudio (Posit) Connect</h2> One powerful feature of RStudio Connect is the ability to schedule tasks. These tasks can be anything from simple ETL jobs to daily reports. Version 1.8.0 makes it easier for administrators to track these tasks across all publishers in a single place. This new view makes it possible to identify conflicts or times when <b>the server is being overbooked</b>.
<img class="size-full wp-image-3755" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d663cba09eb75281ad76_e466a2c4_pasted-image-0-1.webp" alt="RStudio Connect Task Scheduling" width="1600" height="894" /> <em>RStudio Connect Task Scheduling [Source: RStudio]</em> <h2 id="version-control">Version Control and Single Source of Truth</h2> An important reason to use RStudio Connect is the <b>single source of truth</b> feature. It is built around the “<a href="https://rstudio.github.io/pins" target="_blank" rel="noopener noreferrer">pins</a>” R package and provides a way for R users to easily share resources using RStudio Connect. Your resources may be text files (CSV, JSON, etc.), R objects (.Rds, .Rda, etc.), or any other type of file you want to share. Sharing these files can be useful in many situations, such as when multiple pieces of content require the same data. Rather than copying that data, each piece of content references a “single source of truth” hosted on RStudio Connect. When content depends on processed datasets or model objects that need to be regularly updated, rather than redeploying the content each time the information changes, use a pinned resource and update only the dataset or model. The update can be automated using a scheduled R Markdown document. Other deployed content will read the newest data on each run. Connect is also helpful when you need to share resources that aren’t structured for traditional tools like databases. Models saved as R objects aren’t easy to store in a database. Rather than using email or file systems to share these R objects, use RStudio Connect to host these resources as pins. This ensures that everyone has easy access to the R objects in a single place. A single source of truth means <b>time savings</b> for all participants, wherever they may be located.
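To make the pinned-resource workflow a little more concrete, here is a minimal sketch using the pins package. It assumes pins 1.0 or later and uses a temporary local board as a stand-in so that it runs without a server. The dataset and the pin name <code>demand_forecast</code> are made up for illustration. Against a real Connect server you would create the board with <code>board_connect()</code> instead, and pin names there are typically prefixed with the owner's username.

```r
library(pins)

# Stand-in board so the sketch runs anywhere. On a real server you would
# use pins::board_connect() here (assumption: pins >= 1.1.0, with the
# server URL and API key supplied through environment variables).
board <- board_temp()

# Producer side, e.g. a scheduled R Markdown document:
# publish the processed dataset under a fixed, well-known name.
# (Hypothetical data, purely for illustration.)
forecast <- data.frame(week = 1:3, units = c(950, 1020, 1100))
pin_write(board, forecast, name = "demand_forecast", type = "rds")

# Consumer side, e.g. a Shiny app or a report:
# read the current version by name instead of bundling a copy of the data.
latest <- pin_read(board, "demand_forecast")
print(latest)
```

Because consumers look the data up by name on each run, refreshing the pin is enough to update every piece of content that reads it, and nothing needs to be redeployed.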
Read more about how data quality and data validation save time and resources <a href="https://appsilon.com/data-quality/" target="_blank" rel="noopener noreferrer">here</a>. <h2 id="custom-emails">Custom Emails: Reduce Manual Tasks</h2> So now your data science ecosystem is up and running. Next – sending plots, tables, and results inline in emails is a powerful way for data scientists to make an impact. RStudio Connect allows you to create custom emails to send daily reminders and conditional alerts, and to track key metrics. A recent release of the <a href="https://cran.r-project.org/web/packages/blastula/index.html" target="_blank" rel="noopener noreferrer">blastula package</a> makes it even easier for data scientists to specify these emails programmatically: <figure class="highlight"> <pre><code class="language-r" data-lang="r">library(blastula)

# `demand_forecast` and `increase` are assumed to be computed earlier
# in the scheduled R Markdown document.
if (demand_forecast &gt; 1000) {
  # Render the email body and attach it to this scheduled run
  render_connect_email(input = "alert-supply-team-email.Rmd") %&gt;%
    attach_connect_email(
      subject = sprintf("ALERT: Forecasted increase of %g units", increase),
      attach_output = TRUE,
      attachments = c("demand_forecast_data.csv")
    )
} else {
  # Below the threshold: skip the email for this run
  suppress_scheduled_email()
}</code></pre> </figure>
Imagine sending emails about updates to datasets and dashboards <b>manually</b> for a year or more. Now imagine sharing R Shiny applications (and/or Plumber APIs, Pins, R Markdown docs, etc.) as easily as you share memes on Instagram. Which scenario is more appealing? <h2 id="security">Enhanced Security</h2> With the deployment of a new network – a whole new ecosystem really – <b>security</b> should be a primary concern. For instance, you need to think about preventing brute-force and dictionary attacks. By default, RStudio Connect allows as many login attempts as it can handle from any source when using the PAM, LDAP, and Password authentication providers. Users will be able to log in directly by entering their username and password.
Setting the <a href="https://docs.rstudio.com/connect/1.8.2/admin/appendix/configuration/#Authentication.ChallengeResponseEnabled" target="_blank" rel="noopener noreferrer">Authentication.ChallengeResponseEnabled</a> flag to true enables a CAPTCHA form on the login screen and requires that the CAPTCHA be solved in order to authenticate. Both visual and audio CAPTCHA challenges are provided for accessibility needs. <blockquote><strong>Discover the benefits of <a href="https://appsilon.com/why-use-rstudio-connect-authentication/" target="_blank" rel="noopener noreferrer">RStudio Connect Authentication and how to set it up</a>.</strong></blockquote> Additionally, we recommend setting up separate instances of RStudio Connect depending on their purpose - one public instance and a second instance accessible only from the internal infrastructure. This means that you can host publicly accessible demos of Shiny dashboards while keeping your internal RStudio Connect infrastructure closed to unauthorized users. This way it’s easy to show off your work to clients or provide public access without compromising on security. <h2>Concluding RStudio (Posit) Connect for remote teams</h2> Just as Olga Mierzwa-Sulima points out in her article on Remote Data Science Team Best Practices, distributed and non-distributed Data Science teams alike can benefit from efficient workflows and collaborative tools. We’ve found that RStudio Connect has solved many of our workflow problems with a wide array of available tools and packages. Further, when sharing your data work is as simple as a couple of clicks, you can raise the data literacy of your entire organization by increasing access to meaningful data insights. We encourage other Data Science teams around the world to consider reaching out to certified RStudio partners for further consultation to make sure that RStudio Connect is the right choice for them.
As an <a href="https://appsilon.com/appsilon-data-science-is-now-an-rstudio-full-service-certified-partner/" target="_blank" rel="noopener noreferrer">RStudio Full Certified Partner</a>, we’re well-positioned to help you make the leap or provide further advice. Reach out to us at hello@wordpress.appsilon.com. <h2>Resources</h2><ul><li>CAPTCHA configuration<ul><li><a href="https://docs.rstudio.com/connect/1.8.2/admin/appendix/configuration/#Authentication.ChallengeResponseEnabled" target="_blank" rel="noopener noreferrer">https://docs.rstudio.com/connect/1.8.2/admin/appendix/configuration/#Authentication.ChallengeResponseEnabled</a></li></ul> </li> <li>Blastula package <ul><li><a href="https://cran.r-project.org/web/packages/blastula/index.html" target="_blank" rel="noopener noreferrer">https://cran.r-project.org/web/packages/blastula/index.html</a></li></ul> </li> <li>RStudio Pins package <ul><li><a href="https://rstudio.github.io/pins" target="_blank" rel="noopener noreferrer">https://rstudio.github.io/pins</a></li></ul> </li> </ul> <h2>Learn More</h2><ul><li><a href="https://appsilon.com/remote-data-science-team-best-practices-scrum-github-and-docker/">Remote Data Science Team Best Practices: Scrum, GitHub, Docker, and More</a></li><li><a href="https://blog.rstudio.com/2020/07/21/4-tips-to-make-your-shiny-dashboard-faster/?utm_source=appsilon_blog&amp;utm_medium=blog&amp;utm_campaign=appsilon" target="_blank" rel="noopener noreferrer">4 Tips to Make Your Shiny Dashboard Faster</a></li><li><a href="https://appsilon.com/how-to-write-production-ready-r-code/" target="_blank" rel="noopener noreferrer">Video: How to Write Production-Ready R Code</a></li><li>Try out Appsilon's R Shiny <a href="http://shiny.tools">open source</a> packages</li></ul>
