We have worked on numerous R Shiny application projects and have learned a few things along the way. The following tips will save you and your department time and money, and put you in a position for massive success with your R Shiny project.
A bit about us: we are Appsilon Data Science Consulting, and we specialize in building data products for the enterprise space. One of our core competencies is building decision support systems with R Shiny; we have delivered over 50 commercial Shiny dashboards. We have also pioneered the space of supporting concurrent users with Shiny, demonstrating that a Shiny app can serve up to 700 users at once, where the previous conventional wisdom was that one could host only a handful of users at a time. Along the way we have learned a great deal about the best practices for taking a Shiny project from idea to production in a structured corporate environment. In this article, I will discuss the specific steps you can take to achieve reliability without sacrificing speed of development, show you how to avoid costly mistakes, and put your project in a position to succeed.
"Why this tool?" is a question that should be asked of every technology, and Shiny is no different. There is a sweet spot of projects for which Shiny is the perfect choice. To find it, let's compare Shiny with the other available tools.
Shiny has unmatched speed of development. I remember the first deal we won with a Fortune 500 company: we won it by implementing a proof of concept of the application they asked for in 24 hours. That is something we could not have done with any other framework or language.
When compared to spreadsheets, you get a beautiful user interface as well as automation: you can automate anything in the background. You also get straightforward work on shared resources, scalability, reproducibility, and, most importantly, source code. And as Marcin Dubel, one of our software engineers, noted in this article, you get additional features like plotting and expandable rows that you can't find in Excel or Google Sheets.
Source code is also the main benefit when compared to Business Intelligence applications like Tableau and Power BI; additionally, you get almost no running fees, full customization, and better machine learning (ML) possibilities. So whenever you need a data product that is more complex than a spreadsheet or a BI dashboard, and at the same time you're not 100% sure of the scope and requirements, Shiny is the perfect choice.
Now that we agree that we want to use Shiny…
Imagine you have a Shiny application that you want to build. We will go through the process of creating it together.
The first lesson is a really simple one: don't over-spend on your project. Creating a tool that you want to deliver to a large number of people is going to be a big project. Instead, start with a small group of people who are willing to help you and who are going to use the tool themselves. Continue to work with them until you reach the point where they are keen to recommend the tool to their peers. Once you get there, you can go all in. This is crucial, because if you build a tool without speaking directly with your stakeholders, your users, you will build something that is not useful and has no buy-in from the users. You will end up with a huge, expensive project that brings no value to the organization.
Good architecture can simplify our lives and even bring us joy. The same goes for code: if we use Shiny modules to separate the logic into smaller, independent parts, we can maintain the code more easily and verify the correctness of each module in isolation.
You can extract the business logic from the reactive code and run tests to verify that it is correct.
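As a minimal sketch of this idea (the function and column names here are hypothetical examples, not from the article), the business logic lives in a plain R function that needs no running app, and a Shiny module merely wraps it:

```r
library(shiny)

# Pure business logic, extracted from any reactive code: easy to unit test.
# (top_sellers and the 'revenue' column are hypothetical examples.)
top_sellers <- function(sales, n = 5) {
  sales[order(-sales$revenue), ][seq_len(min(n, nrow(sales))), ]
}

# A Shiny module wrapping that logic: the UI and server halves share an id.
topSellersUI <- function(id) {
  ns <- NS(id)
  tagList(
    numericInput(ns("n"), "How many products?", value = 5, min = 1),
    tableOutput(ns("table"))
  )
}

topSellersServer <- function(id, sales) {
  moduleServer(id, function(input, output, session) {
    output$table <- renderTable(top_sellers(sales, input$n))
  })
}
```

In an app you would call `topSellersUI("top")` in the UI and `topSellersServer("top", sales_data)` in the server; meanwhile `top_sellers()` can be tested without starting Shiny at all.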
If you load data inside the Shiny server code, or, even worse, separately for each user session, that's not the way to go if you want to scale. Shared data should be loaded once per process.
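A sketch of the difference (the `sales.csv` path is a placeholder, and the fallback data frame exists only so this snippet is self-contained):

```r
# global.R (or the top of app.R): this code runs once per R process,
# and the result is shared by every session that process serves.
path <- "sales.csv"                      # hypothetical dataset
app_data <- if (file.exists(path)) {
  read.csv(path)
} else {
  # fallback so this sketch runs on its own
  data.frame(region = c("EMEA", "APAC"), revenue = c(120, 80))
}

# server.R: the body below runs once per *user session*.
# Do NOT read the CSV here: per-session loading multiplies
# memory use and start-up time by the number of connected users.
server <- function(input, output, session) {
  output$summary <- renderTable(aggregate(revenue ~ region, app_data, sum))
}
```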
Ideally, we want to have the right mix of tests. I've seen engineers, and business folks too, with a love/hate relationship with tests: they love having them, but they hate creating them.
The key here is understanding that tests pay off when done well. The largest group should be very simple unit tests that each verify one basic piece of the logic, then fewer and fewer of the more complex tests, until you get down to a handful of end-to-end tests. This allows you to minimize the time needed for manual testing.
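The base of that pyramid might look like this, assuming the testthat package (the de facto standard for R unit testing); `discounted_price` is a hypothetical piece of business logic:

```r
library(testthat)

# A tiny piece of business logic (hypothetical example).
discounted_price <- function(price, discount) {
  stopifnot(discount >= 0, discount <= 1)
  round(price * (1 - discount), 2)
}

# The base of the pyramid: many fast, simple unit tests like these.
test_that("discount is applied correctly", {
  expect_equal(discounted_price(100, 0.2), 80)
  expect_equal(discounted_price(19.99, 0), 19.99)
})

test_that("invalid discounts are rejected", {
  expect_error(discounted_price(100, 1.5))
})
```

Tests like these run in milliseconds, so they can run on every commit; only the few end-to-end tests at the top of the pyramid need a full browser session.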
Sadly, this is not yet the standard for Shiny apps. What we see more often is an anti-pattern called the “Test Cone” (below).
This is a negative example: few or no unit tests and a lot of end-to-end (e2e) tests. Running end-to-end tests takes a lot of time, so down the road people stop running them and stop writing new ones. And not having tests is the easiest way to get bugs. We don't want bugs in production, and we want to catch as many of them as possible early.
But we need to validate more than the logic and the source code. We also need to validate the data.
Lesson four is about validating data before it reaches the Shiny app, and doing so automatically. Set an owner for each dataset, someone who takes responsibility and reacts properly if any check fails. We also need to set up logging infrastructure and keep the app diagnosable when anything goes wrong: gather both the successful uses of your app and its errors. You also want to make your app fault-tolerant. If an API the app depends on goes down, the app should still be useful to the greatest extent possible. We don't want business folks staring at an R stack trace in their browsers.
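One common way to get that fault tolerance in base R is `tryCatch` with a cached fallback; the API and dataset below are simulated for illustration:

```r
# Wrap a hypothetical API call so that a failure degrades gracefully
# (logged, cached data served) instead of crashing the session.
fetch_with_fallback <- function(fetch, fallback, log = message) {
  tryCatch(
    fetch(),
    error = function(e) {
      log(sprintf("API call failed (%s); serving cached data",
                  conditionMessage(e)))
      fallback
    }
  )
}

# Simulated failing API and a cached copy of the last good response:
failing_api  <- function() stop("connection refused")
cached_rates <- data.frame(currency = "EUR", rate = 1.08)

rates <- fetch_with_fallback(failing_api, cached_rates)
# The app keeps rendering with cached_rates; the failure is only logged.
```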
Although this is the standard for software engineering in general, this is not standard yet for R and Shiny.
To prevent errors, you can set up a daily data validation email to the data owners, or alert them by text message. You can also set up continuous integration to validate code and style automatically; you can use lintr for that. Sometimes it makes sense to show the data status directly in the app, so end users know which dataset they are working on. You can use the open source package shiny.info for that.
We are hyper-vigilant about data quality, so we have even catalogued the different kinds of problems that can affect data. Here is a more detailed cheat sheet and explanation of the kinds of checks available. If you set up these checks, you will avoid many errors in the production environment.
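A minimal sketch of such automated checks in base R (the column names and rules are hypothetical, stand-ins for whatever your cheat sheet prescribes):

```r
# Run before the data reaches the app; returns a character vector of
# problems, empty if the dataset passed. Columns/rules are hypothetical.
validate_sales <- function(df) {
  problems <- character(0)
  required <- c("date", "region", "revenue")
  missing  <- setdiff(required, names(df))
  if (length(missing) > 0)
    problems <- c(problems,
                  paste("missing columns:", paste(missing, collapse = ", ")))
  if ("revenue" %in% names(df) && any(is.na(df$revenue)))
    problems <- c(problems, "NA values in revenue")
  if ("revenue" %in% names(df) && any(df$revenue < 0, na.rm = TRUE))
    problems <- c(problems, "negative revenue")
  problems
}

good <- data.frame(date = "2020-01-01", region = "EMEA", revenue = 100)
bad  <- data.frame(date = "2020-01-01", revenue = -5)
```

If `validate_sales()` returns a non-empty vector, that vector is exactly what the daily email to the dataset owner should contain.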
So the app is up and running and well tested; we're ready to go all in. Typically you will want to run the Shiny app in multiple processes, and quite often on multiple servers. You can use Amazon Web Services (AWS) for the servers and RStudio Connect to run multiple processes. Once you have that setup, you need to run performance tests.
Below is a real example of a deployment architecture. The load balancer distributes work among the application instances in RStudio Connect. The master node is responsible for performing the test, gathering statistics, and summarizing them. To make it even sweeter, we have an R script that gathers the logs and creates a report from the performance tests.
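The article does not prescribe a tool for this, but one open-source option is RStudio's shinyloadtest package together with its shinycannon CLI; a sketch of that workflow, with placeholder URLs and paths:

```r
# 1. Record one realistic user session against the running app
#    (the URL is a placeholder for your deployment):
shinyloadtest::record_session("http://localhost:3838/sales-app/")
# -> writes recording.log

# 2. Replay the recording with many simulated users via the
#    shinycannon command-line tool (run in a shell):
#    shinycannon recording.log http://localhost:3838/sales-app/ \
#        --workers 50 --output-dir run50

# 3. Load the results and render an HTML report:
runs <- shinyloadtest::load_runs("run50")
shinyloadtest::shinyloadtest_report(runs, "load-test-report.html")
```

The report shows per-user latency under load, which is exactly the statistic the master node needs to gather.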
Finally, the last lesson: deployment and automation. We need at least two deployment environments, so that developers can test on a different server from the one the users are on. Automated deployment pays off very quickly: you can operate on cloud resources almost magically, and it is very easy to roll back or to spin up another instance of an app in seconds.
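As one possible shape for that automation (an assumption on my part, not the article's stated tooling), a deployment script using the rsconnect package can be driven by CI, with the server names below as placeholders:

```r
# deploy.R -- run by CI, not by hand. Server names are placeholders
# for servers previously registered with rsconnect::addConnectServer().
library(rsconnect)

target <- Sys.getenv("DEPLOY_ENV", "staging")   # "staging" or "production"

deployApp(
  appDir  = "app",
  appName = "sales-dashboard",
  server  = if (target == "production") "connect.example.com"
            else "connect-staging.example.com",
  forceUpdate = TRUE
)
```

Because the environment is chosen by a single variable, the same script deploys to staging on every merge and to production on a tagged release, and rolling back is just redeploying the previous tag.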
So we are at the point where Shiny is much more than just a beautiful user interface. We have a list of engineering tasks that need to be done to make the app really useful and ready for production in an enterprise environment. But the good news is that it’s all possible with R. We can deploy such apps with Shiny. We can make apps for hundreds of users, for whole departments. And we can make them bulletproof as well.