R for Programmers - 7 Essential R Packages for Programmers
<em><strong>Updated</strong>: October 15, 2022.</em> <span data-preserver-spaces="true">R is a programming language created by Ross Ihaka and Robert Gentleman in 1993. It was designed for analytics, statistics, and data visualization. Nowadays, R can handle anything from basic programming to machine learning and deep learning. Today we will explore how programmers can approach learning and practicing R.</span>
<span data-preserver-spaces="true">As mentioned before, R can do almost anything. It performs best on data-related work such as statistics, data science, and machine learning.</span>
<span data-preserver-spaces="true">The language is most widely used in academia, but many large companies such as Google, Facebook, Uber, and Airbnb use it daily.</span>
<span data-preserver-spaces="true">This R for programmers guide will show you how to:</span>
<ul><li><a href="#load-datasets">Load datasets</a></li><li><a href="#scrape-webpages">Scrape Webpages</a></li><li><a href="#rest-apis">Build REST APIs</a></li><li><a href="#data-analysis">Analyze Data and Show Statistical Summaries</a></li><li><a href="#data-visualization">Visualize Data</a></li><li><a href="#machine-learning">Train a Machine Learning Model</a></li><li><a href="#web-applications">Develop Simple Web Applications</a></li><li><a href="#markdown">Create Interactive Markdown Documents with Quarto</a></li></ul>
<hr />
<h2 id="load-datasets"><span data-preserver-spaces="true">Load datasets</span></h2>
<span data-preserver-spaces="true">To perform any sort of analysis, you first have to load the data. With R, you can connect to almost any data source. For most source types, a quick search will turn up either a ready-made package or an example of the API calls you need.</span>
<span data-preserver-spaces="true">For a simple demonstration, we'll see how to load CSV data. 
You can find the Iris dataset in CSV format at </span><a class="editor-rtfLink" href="https://gist.githubusercontent.com/netj/8836201/raw/6f9306ad21398ea43cba4f7d537619d0e07d5ae3/iris.csv" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">this link</span></a><span data-preserver-spaces="true">, so please download it to your machine. Here's how to load it in R:</span>
<pre><code class="language-r"># Read the downloaded file from the current working directory
iris <- read.csv("iris.csv")
head(iris)</code></pre>
<span data-preserver-spaces="true">And here's what the <code>head</code> function outputs - the first six rows:</span>
<img class="size-full wp-image-6051" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d5ee0a40386eec0ab9d8_f62ab9d7_1.webp" alt="Image 1 - Iris dataset head" width="1084" height="330" /> Image 1 - Iris dataset head
<span data-preserver-spaces="true">Did you know there's no need to download the dataset? You can load it from the web:</span>
<pre><code class="language-r"># read.csv() also accepts a URL instead of a local path
iris <- read.csv("https://gist.githubusercontent.com/netj/8836201/raw/6f9306ad21398ea43cba4f7d537619d0e07d5ae3/iris.csv")
head(iris)</code></pre>
<span data-preserver-spaces="true">That's all great, but what if you can't find an appropriate dataset? That's where web scraping comes into play.</span>
<h2 id="scrape-webpages"><span data-preserver-spaces="true">Web scraping</span></h2>
<span data-preserver-spaces="true">A good dataset is difficult to find, so sometimes you have to be creative. Web scraping is considered one of the more "creative" ways of collecting data, as long as you don't cross any legal boundaries. </span>
<span data-preserver-spaces="true">In R, the <code>rvest</code> package is used for the task. As some websites have strict policies against scraping, we need to be extra careful. There are pages online designed for practicing web scraping, so that's good news for us. 
We will scrape this </span><a class="editor-rtfLink" href="http://books.toscrape.com/" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">page</span></a><span data-preserver-spaces="true"> and retrieve book titles in a single category:</span>
<pre><code class="language-r">library(rvest)

url <- "http://books.toscrape.com/catalogue/category/books/travel_2/index.html"

# Each book title is the text of a link nested inside an h3 element
titles <- read_html(url) %>%
  html_nodes("h3") %>%
  html_nodes("a") %>%
  html_text()</code></pre>
<span data-preserver-spaces="true">The <code>titles</code> variable contains the following elements:</span>
<img class="size-full wp-image-6052" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b02193e606e242ff08fef4_002.webp" alt="Image 2 - Web Scraping example in R" width="1008" height="104" /> Image 2 - Web Scraping example in R
<span data-preserver-spaces="true">Yes - it's that easy. Just don't cross any boundaries. Check if a website has a public API first - if so, there's no need for scraping. If not, check their policies.</span>
<h2 id="rest-apis"><span data-preserver-spaces="true">Build REST APIs</span></h2>
<span data-preserver-spaces="true">We can't have an R for Programmers article without discussing REST APIs. With practical machine learning comes the issue of model deployment. A common option is to wrap the model's predictive functionality in a REST API. Showing how to do that effectively would require at least an article or two, so we will cover only the basics today.</span>
<span data-preserver-spaces="true">In R, the <code>plumber</code> package is used to build REST APIs. 
Here's the example API that is generated by default when you create a <code>plumber</code> project:</span>
<pre><code class="language-r">library(plumber)

#* @apiTitle Plumber Example API

#* Echo back the input
#* @param msg The message to echo
#* @get /echo
function(msg = "") {
  list(msg = paste0("The message is: '", msg, "'"))
}

#* Plot a histogram
#* @png
#* @get /plot
function() {
  rand <- rnorm(100)
  hist(rand)
}

#* Return the sum of two numbers
#* @param a The first number to add
#* @param b The second number to add
#* @post /sum
function(a, b) {
  as.numeric(a) + as.numeric(b)
}</code></pre>
<span data-preserver-spaces="true">The API has three endpoints:</span>
<ol><li><span data-preserver-spaces="true"><code>/echo</code> - returns a specified message in the response </span></li><li><span data-preserver-spaces="true"><code>/plot</code> - shows a histogram of 100 random normally distributed numbers</span></li><li><span data-preserver-spaces="true"><code>/sum</code> - sums two numbers</span></li></ol>
<span data-preserver-spaces="true">The <code>plumber</code> package comes with Swagger UI, so you can explore and test your API in the web browser. Let's take a look:</span>
<img class="size-full wp-image-6053" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d5ef49b91708eb32e44f_1a45d7c8_003.gif" alt="Image 3 - Plumber REST API Showcase" width="1128" height="826" /> Image 3 - Plumber REST API Showcase
<h2 id="data-analysis"><span data-preserver-spaces="true">Statistics and Data Analysis</span></h2>
<span data-preserver-spaces="true">This is one of the biggest reasons why R is so popular. There are entire books and courses on this topic, so we will only go over the basics. We intend to cover more advanced concepts in the following articles, so stay tuned to our blog if that interests you.</span>
<span data-preserver-spaces="true">Most of the data manipulation in R is done with the <code>dplyr</code> package. 
Still, we need a dataset to manipulate - </span><em><span data-preserver-spaces="true">Gapminder</span></em><span data-preserver-spaces="true"> will do the trick. It is available in R through the <code>gapminder</code> package. Here's how to load both libraries and explore the first couple of rows:</span>
<pre><code class="language-r">library(dplyr)
library(gapminder)

head(gapminder)</code></pre>
<span data-preserver-spaces="true">You should see the following in the console:</span>
<img class="size-full wp-image-6054" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d5f025b5dfc1d5c88791_f0992af5_4.webp" alt="Image 4 - Head of Gapminder dataset" width="1026" height="414" /> Image 4 - Head of Gapminder dataset
<span data-preserver-spaces="true">To perform any kind of statistical analysis, you could use R's built-in functions such as <code>min</code>, <code>max</code>, <code>range</code>, <code>mean</code>, <code>median</code>, <code>quantile</code>, <code>IQR</code>, <code>sd</code>, and <code>var</code>. These are great if you need something specific, but a simple call to the <code>summary</code> function will most likely give you enough information:</span>
<pre><code class="language-r">summary(gapminder)</code></pre>
<span data-preserver-spaces="true">Here's a statistical summary of the Gapminder dataset:</span>
<img class="size-full wp-image-6055" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d5f0be40ac3284ae209d_3b764523_5.webp" alt="Image 5 - Statistical summary of the Gapminder dataset" width="1848" height="334" /> Image 5 - Statistical summary of the Gapminder dataset
<span data-preserver-spaces="true">With <code>dplyr</code>, you can drill down and keep only the data of interest. 
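</span>
<span data-preserver-spaces="true">As a minimal sketch of that idea, the snippet below keeps only the 2007 rows and then summarises life expectancy per continent. The choice of year, the grouping, and the names <code>continent_summary</code>, <code>countries</code>, and <code>mean_life_exp</code> are our own assumptions for illustration, not part of the original example:</span>
<pre><code class="language-r">library(dplyr)
library(gapminder)

# Keep only the 2007 rows, then aggregate per continent
continent_summary <- gapminder %>%
  filter(year == 2007) %>%
  group_by(continent) %>%
  summarise(
    countries = n(),               # number of countries per continent
    mean_life_exp = mean(lifeExp)  # unweighted average across countries
  ) %>%
  arrange(desc(mean_life_exp))

print(continent_summary)</code></pre>
<span data-preserver-spaces="true">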
Let's see how to show only the data for Poland and how to calculate the total GDP:</span>
<pre><code class="language-r">gapminder %>%
  filter(continent == "Europe", country == "Poland") %>%
  mutate(TotalGDP = pop * gdpPercap)</code></pre>
<span data-preserver-spaces="true">The corresponding results are shown in the console:</span>
<img class="size-full wp-image-6056" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b2708ee7bf5d6c2d3035b0_6.webp" alt="Image 6 - History data and total GDP for Poland" width="1216" height="586" /> Image 6 - History data and total GDP for Poland
<h2 id="data-visualization"><span data-preserver-spaces="true">Data Visualization</span></h2>
<span data-preserver-spaces="true">R is known for its strong data visualization capabilities. The <code>ggplot2</code> package is a good starting point because it's easy to use and looks great by default. We'll use it to make a couple of basic visualizations on the Gapminder dataset.</span>
<span data-preserver-spaces="true">To start, we will create a line chart showing the total population of Poland over time. We will need to filter the dataset first, so it only contains data for Poland. 
Below you'll find a code snippet for library imports, dataset filtering, and data visualization:</span>
<pre><code class="language-r">library(dplyr)
library(gapminder)
library(scales)
library(ggplot2)

poland <- gapminder %>%
  filter(continent == "Europe", country == "Poland")

ggplot(poland, aes(x = year, y = pop)) +
  geom_line(size = 2, color = "#0099f9") +
  ggtitle("Poland population over time") +
  xlab("Year") +
  ylab("Population") +
  expand_limits(y = c(10^6 * 25, NA)) +
  scale_y_continuous(
    labels = paste0(c(25, 30, 35, 40), "M"),
    breaks = 10^6 * c(25, 30, 35, 40)
  ) +
  theme_bw()</code></pre>
<span data-preserver-spaces="true">Here is the corresponding output:</span>
<img class="size-full wp-image-6057" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b2708ee7bf5d6c2d303614_7.webp" alt="Image 7 - Poland population over time" width="2134" height="1382" /> Image 7 - Poland population over time
<span data-preserver-spaces="true">You can get a similar visualization with just the first two lines of the <code>ggplot</code> call - the others are added for styling.</span>
<span data-preserver-spaces="true">The <code>ggplot2</code> package can display almost any data visualization type, so let's explore bar charts next. We want to visualize life expectancy across European countries in 2007. 
Here is the code snippet for dataset filtering and visualization:</span>
<pre><code class="language-r">europe_2007 <- gapminder %>%
  filter(continent == "Europe", year == 2007)

ggplot(europe_2007, aes(x = reorder(country, -lifeExp), y = lifeExp)) +
  geom_bar(stat = "identity", fill = "#0099f9") +
  geom_text(aes(label = lifeExp), color = "white", hjust = 1.3) +
  ggtitle("Average life expectancy in European countries in 2007") +
  xlab("Country") +
  ylab("Life expectancy (years)") +
  coord_flip() +
  theme_bw()</code></pre>
<span data-preserver-spaces="true">Here's what the chart looks like:</span>
<img class="size-full wp-image-6058" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b2708f6402741d1b155573_8.webp" alt="Image 8 - Average life expectancy in European countries in 2007" width="2142" height="1386" /> Image 8 - Average life expectancy in European countries in 2007
<span data-preserver-spaces="true">Once again, the first two lines of the <code>ggplot</code> call will produce similar output. The rest are here to make it look better.</span>
<h2 id="machine-learning"><span data-preserver-spaces="true">Training a Machine Learning Model</span></h2>
<span data-preserver-spaces="true">Another must-have point in any R for programmers guide is machine learning. The <code>rpart</code> package implements decision trees, and we will use it to make a classifier for the well-known Iris dataset. The dataset is built into R, so you don't have to worry about loading it manually. 
The <code>caTools</code> package is used for the train/test split.</span>
<span data-preserver-spaces="true">Here's how to load the libraries, perform the train/test split, and fit and visualize the model:</span>
<pre><code class="language-r">library(caTools)
library(rpart)
library(rpart.plot)

# Load the built-in dataset (this also restores it if `iris` was
# overwritten by the CSV version earlier in the session)
data(iris)

set.seed(42)
# sample.split() expects the vector of labels, not the whole data frame
is_train <- sample.split(iris$Species, SplitRatio = 0.75)
iris_train <- subset(iris, is_train == TRUE)
iris_test <- subset(iris, is_train == FALSE)

model <- rpart(Species ~ ., data = iris_train, method = "class")
rpart.plot(model)</code></pre>
<span data-preserver-spaces="true">The snippet shouldn't take more than a second or two to execute. Once done, you'll be presented with the following visualization:</span>
<img class="size-full wp-image-6059" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b2708f0bafec45eedc6091_9.webp" alt="Image 9 - Decision tree visualization for Iris dataset" width="954" height="402" /> Image 9 - Decision tree visualization for Iris dataset
<span data-preserver-spaces="true">The figure above shows the decision rules the model learned. We can now evaluate the model on previously unseen data (the test set). Here's how to make predictions, print the confusion matrix, and calculate accuracy:</span>
<pre><code class="language-r">preds <- predict(model, iris_test, type = "class")

# Rows are actual species, columns are predicted species
confusion_matrix <- table(iris_test$Species, preds)
print(confusion_matrix)

# Correct predictions sit on the diagonal
accuracy <- sum(diag(confusion_matrix)) / sum(confusion_matrix)
print(accuracy)</code></pre>
<img class="size-full wp-image-6060" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d5f35c2b4c98eb805a2b_1b2474a8_10.webp" alt="Image 10 - Confusion matrix and accuracy on the test subset" width="377" height="165" /> Image 10 - Confusion matrix and accuracy on the test subset
<span data-preserver-spaces="true">In the run shown above, the model reached roughly 95% accuracy with only a couple of lines of code. Your exact numbers may differ slightly depending on the split. 
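</span>
<span data-preserver-spaces="true">Overall accuracy can hide differences between classes. As a rough sketch, assuming a base-R random split instead of the <code>caTools</code> split used above (so the numbers will not match Image 10), per-class recall and precision can be read off the same kind of confusion matrix. The names <code>train_idx</code>, <code>recall</code>, and <code>precision</code> are our own:</span>
<pre><code class="language-r">library(rpart)

data(iris)
set.seed(42)

# Assumption: a plain random 75/25 split on row indices (not stratified)
train_idx <- sample(seq_len(nrow(iris)), size = floor(0.75 * nrow(iris)))
iris_train <- iris[train_idx, ]
iris_test <- iris[-train_idx, ]

model <- rpart(Species ~ ., data = iris_train, method = "class")
preds <- predict(model, iris_test, type = "class")

# Rows are actual classes, columns are predicted classes
confusion_matrix <- table(actual = iris_test$Species, predicted = preds)

# Recall: correct predictions divided by the actual count of each class
recall <- diag(confusion_matrix) / rowSums(confusion_matrix)
# Precision: correct predictions divided by the predicted count of each class
precision <- diag(confusion_matrix) / colSums(confusion_matrix)

print(round(cbind(recall, precision), 3))</code></pre>
<span data-preserver-spaces="true">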
</span>
<h2 id="web-applications"><span data-preserver-spaces="true">Develop Simple Web Applications</span></h2>
<span data-preserver-spaces="true">At </span><a class="editor-rtfLink" href="https://wordpress.appsilon.com/shiny" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">Appsilon</span></a><span data-preserver-spaces="true">, we are global leaders in R Shiny, and we've developed some of the world's most </span><a class="editor-rtfLink" href="http://demo.appsilon.ai/" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">advanced R Shiny dashboards</span></a><span data-preserver-spaces="true">. Shiny is a go-to package for developing web applications in R.</span>
<span data-preserver-spaces="true">For the web app example in this R for programmers guide, we'll see how to make a simple interactive dashboard that displays a scatter plot of two user-specified columns. The dataset of choice is also built into R - <code>mtcars</code>.</span>
<span data-preserver-spaces="true">Here is a script for the Shiny app:</span>
<pre><code class="language-r">library(shiny)
library(ggplot2)

ui <- fluidPage(
  sidebarPanel(
    width = 3,
    tags$h4("Select"),
    varSelectInput(
      inputId = "x_select",
      label = "X-Axis",
      data = mtcars
    ),
    varSelectInput(
      inputId = "y_select",
      label = "Y-Axis",
      data = mtcars
    )
  ),
  mainPanel(
    plotOutput(outputId = "scatter")
  )
)

server <- function(input, output) {
  output$scatter <- renderPlot({
    # Turn the selected column names into symbols for use inside aes()
    col1 <- sym(input$x_select)
    col2 <- sym(input$y_select)

    ggplot(mtcars, aes(x = !!col1, y = !!col2)) +
      geom_point(size = 6, color = "#0099f9") +
      ggtitle("MTCars Dataset Explorer") +
      theme_bw()
  })
}

shinyApp(ui = ui, server = server)</code></pre>
<span data-preserver-spaces="true">And here's the corresponding Shiny app:</span>
<img class="size-full wp-image-6061" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b0219ed89fb3d6dd308544_11.gif" alt="Image 11 - MTCars Shiny app" 
width="1196" height="476" /> Image 11 - MTCars Shiny app
<span data-preserver-spaces="true">This dashboard is as simple as they come, but that doesn't mean you can't develop beautiful-looking apps with Shiny.</span>
<blockquote><span data-preserver-spaces="true">Looking for inspiration? </span><a class="editor-rtfLink" href="https://demo.appsilon.ai/" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">Take a look at our Shiny App Demo Gallery.</span></a></blockquote>
<h2 id="markdown">Create Interactive Markdown Documents with Quarto</h2>
Think of R Quarto as a next-gen version of R Markdown. It allows you to create high-quality articles, reports, presentations, PDFs, books, Word documents, ePubs, websites, and even more - all straight from R. To get started, please refer to our <a href="https://appsilon.com/r-quarto-tutorial/">official R Quarto guide</a>. In this section, we'll only show you how to make a Markdown document, not how to export it. The guide covers exporting in depth. In RStudio, click on the plus button and select <em>Quarto Document</em>. The setup process is simple: add the title and the author, and leave everything else as is. The code snippet below shows you how to visualize the MT Cars dataset both as a table and as a chart with R Quarto:
<pre><code class="language-r">---
title: "Demo Document"
author: "Dario Radečić"
format: html
editor: visual
---

## MT Cars Dataset

A dataset well known among data science and machine learning professionals.

```{r}
head(mtcars)
```

## Data visualization

You can easily visualize data with R Quarto Markdown documents. @fig-mtcars shows the relationship between `wt` and `mpg`:

```{r}
#| label: fig-mtcars
#| fig-cap: Vehicle weight (1000 lbs) vs. miles per gallon

library(ggplot2)

ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(size = 5, aes(color = cyl))
```</code></pre>
Here's what the document looks like in visual mode:
<img class="size-full wp-image-16155" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b27099243c2f7f2e63d9b8_12-2.webp" alt="Image 12 - R Quarto markdown document" width="1312" height="1446" /> Image 12 - R Quarto markdown document
It doesn't get easier than that. There are many more things you can do with Quarto than we can cover here, but additional articles on the Appsilon blog have you covered.
<blockquote>Want to learn more about R Quarto? <a href="https://appsilon.com/r-quarto-tutorial/" target="_blank" rel="noopener">Read our complete guide on the Appsilon blog</a>.</blockquote>
And that's all for today. Let's wrap things up next.
<hr />
<h2><span data-preserver-spaces="true">Summing up R for Programmers</span></h2>
<span data-preserver-spaces="true">To conclude - R can do almost anything that a general-purpose programming language can do. The question isn't "Can R do it?", but instead "Is R the right tool for the job?". If you are working on anything data-related, then yes, R can do it and is a strong candidate for the job.</span>
<span data-preserver-spaces="true">If you don't intend to work with data in any way, shape, or form, R might not be the optimal tool. Sure, R can do almost anything, but some tasks are much easier to do in Python or Java.</span>
<span data-preserver-spaces="true">Want to learn more about R? 
Start here:</span> <ul><li><a href="https://appsilon.com/rstudio-shortcuts-and-tips/" target="_blank" rel="noopener noreferrer">RStudio Shortcuts and Tricks</a></li><li><a class="editor-rtfLink" href="https://wordpress.appsilon.com/how-to-write-production-ready-r-code/" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">How to Write Production-Ready R Code: Tools and Patterns</span></a></li><li><a class="editor-rtfLink" href="https://wordpress.appsilon.com/video-tutorial-create-and-customize-a-simple-shiny-dashboard/" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">Video: Create and Customize a Simple Shiny Dashboard</span></a></li></ul>