 9 November, 2021

Boxplots with R and ggplot2

Are your data visualizations an eyesore? It’s a common problem, so don’t worry too much about it. The solution is easier than you think, as R provides countless ways to make stunning visuals. Today you’ll learn how to create impressive boxplots with R and the ggplot2 package.

This article demonstrates how to make stunning boxplots with ggplot based on any dataset. We’ll start simple with a brief introduction and interpretation of boxplots and then dive deep into visualizing and styling ggplot boxplots.

What Is a Boxplot?

A boxplot is one of the simplest ways of representing a distribution of a continuous variable. It consists of two parts:

  • Box: Extends from the first to the third quartile (Q1 to Q3), with a line in the middle that represents the median. The distance between Q1 and Q3 is known as the interquartile range (IQR).
  • Whiskers: Lines extending from both ends of the box that indicate variability outside Q1 and Q3. Each whisker reaches the most extreme observation that lies within 1.5 * IQR of the box, so it ends no lower than Q1 - 1.5 * IQR and no higher than Q3 + 1.5 * IQR. Any observation beyond those limits is drawn as an outlier.
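
To make the arithmetic concrete, here is a small sketch that computes the box and whisker values by hand. It uses the mpg column of R's built-in mtcars dataset purely as sample data, and the variable names (lower_fence, upper_fence, and so on) are our own. It relies on R's default quantile method, so other tools may give slightly different quartiles.

```r
# Quartiles of the sample variable (R's default quantile method)
q <- quantile(mtcars$mpg, probs = c(0.25, 0.5, 0.75))
q1 <- unname(q[1])
q3 <- unname(q[3])
iqr <- q3 - q1                      # same value as IQR(mtcars$mpg)

# Limits that the whiskers are not allowed to cross
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

# Whiskers end at the most extreme observations inside the limits
inside <- mtcars$mpg[mtcars$mpg >= lower_fence & mtcars$mpg <= upper_fence]
whiskers <- range(inside)

# Anything beyond the limits would be drawn as an outlier point
outliers <- mtcars$mpg[mtcars$mpg < lower_fence | mtcars$mpg > upper_fence]
```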

Take a look at the following visual representation:

Image 1 – Boxplot representation

In short, boxplots pack a lot of information into a single chart. They show where the bulk of the values lies, whether the distribution is roughly symmetric or skewed in either direction, and which observations are outliers. A boxplot alone can't confirm that a variable is normally distributed, but it makes departures from symmetry easy to spot.

Let’s see how you can use R and ggplot to visualize boxplots.

Make Your First ggplot Boxplot

R has many datasets built-in, one of them being mtcars. It’s a small and easy-to-explore dataset we’ll use today to draw boxplots. You’ll need only ggplot2 installed to follow along.

We’ll visualize boxplots for the mpg (Miles per gallon) variable among different cyl (Number of cylinders) options in most of the charts. You’ll have to convert the cyl variable to a factor beforehand. Here’s how:
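
A minimal sketch of that conversion, assuming the dataset is copied into a data frame named df (our choice of name, so the built-in mtcars stays untouched):

```r
library(ggplot2)

# Work on a copy of the built-in dataset
df <- mtcars

# Treat the number of cylinders as a category, not a number
df$cyl <- as.factor(df$cyl)

head(df)
```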

The head() function prints the first six rows of the dataset:

Image 2 – Head of MTCars dataset

From the image alone, you can see that mpg is continuous and cyl is categorical. That's exactly the combination of variable types you're looking for when working with boxplots.

Visualization

You can make ggplot boxplots look stunning with a bit of work, but starting out they’ll look pretty plain. Think of this as a blank canvas to paint your beautiful boxplot story. The geom_boxplot() function is used in ggplot2 to draw boxplots. Here’s how to use it to make a default-looking boxplot of the miles per gallon variable:
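
A minimal sketch, assuming a copy of mtcars named df (our name). Mapping mpg to the y aesthetic draws a single vertical box:

```r
library(ggplot2)

df <- mtcars

# One boxplot for the whole mpg column
p <- ggplot(df, aes(y = mpg)) +
  geom_boxplot()

p
```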

Image 3 – Simple boxplot with ggplot2

And boy is it ugly. We’ll deal with the stylings later after we go over the basics.

Every so often, you’ll want to visualize multiple boxplots on a single chart — each representing a distribution of the variable with some filter condition applied. For example, we can visualize the distribution of miles per gallon for every possible cylinder value. The latter is already converted to a factor, so you’re ready to go.

Here’s the code:
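
One way to write it, sketched under the assumption that the data lives in a data frame named df with cyl stored as a factor:

```r
library(ggplot2)

df <- mtcars
df$cyl <- as.factor(df$cyl)

# One box per cylinder count: categories on x, mpg on y
p <- ggplot(df, aes(x = cyl, y = mpg)) +
  geom_boxplot()

p
```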

Image 4 – Miles per gallon among different cylinder numbers

It makes sense: the more cylinders a car has, the fewer miles per gallon it gets. There are outliers among the eight-cylinder cars, drawn as individual dots beyond the whiskers.

You can change the orientation of the chart if you find this one hard to look at. Just call the coord_flip() function when coding the chart:
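
A short sketch of the flipped chart, again assuming a data frame named df with cyl as a factor:

```r
library(ggplot2)

df <- mtcars
df$cyl <- as.factor(df$cyl)

# Same chart as before, with the axes swapped
p <- ggplot(df, aes(x = cyl, y = mpg)) +
  geom_boxplot() +
  coord_flip()

p
```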

Image 5 – Changing the orientation

We’ll stick with the default orientation moving forward. Let’s say you want to display every data point on the boxplot. The mtcars dataset is relatively small, so it might actually be a good idea. You’ll have to call the geom_dotplot() function to do so:
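
A sketch of one way to overlay the points. The binaxis, stackdir, and dotsize values are our own picks for this small dataset and will likely need tuning for other data:

```r
library(ggplot2)

df <- mtcars
df$cyl <- as.factor(df$cyl)

p <- ggplot(df, aes(x = cyl, y = mpg)) +
  geom_boxplot() +
  # Bin along the y axis and stack the dots around each box's center
  geom_dotplot(binaxis = "y", stackdir = "center", dotsize = 0.5)

p
```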

Image 6 – Displaying all data points on the boxplot

Be extra careful if you’re doing this for a larger dataset. Outliers are a bit harder to spot and it’s easy to get overwhelmed.

Let’s explore how you can make boxplots more appealing to the eye.

Style ggplot Boxplots — Change Theme, Outline, and Fill Color

Let’s start with the outline color. It might just be enough to give your visualization an extra punch. You can specify which variable controls the color in the call to aes(), and then use the scale_color_manual() function to provide a vector of colors:
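
A minimal sketch, assuming a data frame named df with cyl as a factor. The three hex codes are placeholders we picked for illustration, one per cylinder level:

```r
library(ggplot2)

df <- mtcars
df$cyl <- as.factor(df$cyl)

# Placeholder palette: one color per level of cyl
outline_colors <- c("#3a0ca3", "#c9184a", "#3a5a40")

p <- ggplot(df, aes(x = cyl, y = mpg, color = cyl)) +
  geom_boxplot() +
  scale_color_manual(values = outline_colors)

p
```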

Image 7 – Changing the outline color

There are other ways to specify the color palette, but we find this option to be the most customizable.

If you want to change the fill color instead, you have options. You can specify a color to the fill parameter inside geom_boxplot() if you want all boxplots to have the same color:
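
For instance, the following sketch fills every box with the same color. The hex code is an arbitrary choice of ours, and df is assumed to be a copy of mtcars with cyl as a factor:

```r
library(ggplot2)

df <- mtcars
df$cyl <- as.factor(df$cyl)

# A fixed fill goes outside aes(), so it applies to every box
p <- ggplot(df, aes(x = cyl, y = mpg)) +
  geom_boxplot(fill = "#0099f8")

p
```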

Image 8 – Changing the fill color

The alternative is to apply the same logic we used for the outline color: a variable controls which color is applied where, and you can use the scale_fill_manual() function to change the colors:
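
A sketch of the per-group fill, with the same assumptions as before (a data frame named df, cyl as a factor, and placeholder colors of our choosing):

```r
library(ggplot2)

df <- mtcars
df$cyl <- as.factor(df$cyl)

# Placeholder palette: one fill color per level of cyl
fill_colors <- c("#0099f8", "#e74c3c", "#2ecc71")

p <- ggplot(df, aes(x = cyl, y = mpg, fill = cyl)) +
  geom_boxplot() +
  scale_fill_manual(values = fill_colors)

p
```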

Image 9 – Changing the fill color (2)

Now we’re getting somewhere. The only thing we haven’t addressed is that horrendous background color. You can get rid of it by changing the theme. For example, adding theme_classic() will make your chart a bit more modern and minimalistic:
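
A sketch that builds on the filled boxplot and swaps the theme. The colors remain placeholders of ours:

```r
library(ggplot2)

df <- mtcars
df$cyl <- as.factor(df$cyl)

p <- ggplot(df, aes(x = cyl, y = mpg, fill = cyl)) +
  geom_boxplot() +
  scale_fill_manual(values = c("#0099f8", "#e74c3c", "#2ecc71")) +
  # White background, no grid lines, plain axis lines
  theme_classic()

p
```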

Image 10 – Changing the theme

Style boils down to personal preference, but this one is much easier to look at in our opinion.

There’s still one gigantic elephant in the room left to discuss — titles and labels. No one knows what your ggplot boxplot represents without them.

Add Text, Titles, Subtitles, Captions, and Axis Labels to ggplot Boxplots

Let’s start with text labels. It’s somewhat unusual to add them to boxplots, as they’re usually used on charts where exact values are displayed (bar, line, etc.). Nevertheless, you can display any text you want with ggplot boxplots, you’ll just have to get a bit more creative.

For example, if you want to display the number of observations, mean, and median above every boxplot, you’ll first have to declare a function that fetches that information. We decided to name ours get_box_stats():
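
Here is one possible shape for such a function, offered as a sketch. stat_summary() expects a function that takes the y values of one group and returns a data frame. The upper_limit parameter and the 0.95 factor are our own way of placing the label near the top of the chart:

```r
# Summarize one group's values as a text label.
# y: numeric vector with the values of a single boxplot
# upper_limit: hypothetical helper value used only to position the label
get_box_stats <- function(y, upper_limit = max(mtcars$mpg) * 1.15) {
  data.frame(
    y = 0.95 * upper_limit,  # vertical position of the label
    label = paste0(
      "Count = ", length(y), "\n",
      "Mean = ", round(mean(y), 2), "\n",
      "Median = ", round(median(y), 2)
    )
  )
}

get_box_stats(c(1, 2, 3))
```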

Discover more Boxplot arguments in the ggplot2 boxplot documentation.

You can now pass it to the stat_summary() function when drawing boxplots:
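
The sketch below repeats a version of the helper so it runs on its own. The hjust and vjust values, the fill colors, and the label placement are our assumptions and can be adjusted freely:

```r
library(ggplot2)

df <- mtcars
df$cyl <- as.factor(df$cyl)

# Helper that returns the label and its position for one group
get_box_stats <- function(y, upper_limit = max(df$mpg) * 1.15) {
  data.frame(
    y = 0.95 * upper_limit,
    label = paste0(
      "Count = ", length(y), "\n",
      "Mean = ", round(mean(y), 2), "\n",
      "Median = ", round(median(y), 2)
    )
  )
}

p <- ggplot(df, aes(x = cyl, y = mpg, fill = cyl)) +
  geom_boxplot() +
  scale_fill_manual(values = c("#0099f8", "#e74c3c", "#2ecc71")) +
  # Run the helper once per box and draw its result as text
  stat_summary(fun.data = get_box_stats, geom = "text", hjust = 0.5, vjust = 0.9) +
  theme_classic()

p
```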

Image 11 – Adding text

Neat, right? Much better than displaying values directly on the chart.

Let’s cover titles and axis labels next. These are mandatory for production-ready charts, because without them users don’t know what they’re looking at. You can use the following code snippet to add a title, subtitle, caption, x-axis label, and y-axis label:
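
A minimal sketch using labs(). All of the label texts below are placeholders we made up for the mtcars example:

```r
library(ggplot2)

df <- mtcars
df$cyl <- as.factor(df$cyl)

p <- ggplot(df, aes(x = cyl, y = mpg, fill = cyl)) +
  geom_boxplot() +
  scale_fill_manual(values = c("#0099f8", "#e74c3c", "#2ecc71")) +
  labs(
    title = "Miles per gallon among different cylinder numbers",
    subtitle = "Made with ggplot2",
    caption = "Source: mtcars dataset",
    x = "Number of cylinders",
    y = "Miles per gallon"
  ) +
  theme_classic()

p
```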

Image 12 – Adding title, subtitle, caption, and axis labels

If you think these look a bit plain, you’re not alone. You can use the theme() function to style them. Be aware that a complete theme such as theme_classic() replaces any custom styles declared before it, so call theme() after theme_classic():
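
A sketch of the styling, with theme() placed after theme_classic() so the custom settings survive. The color, sizes, and font faces are our own choices, and the label texts are placeholders:

```r
library(ggplot2)

df <- mtcars
df$cyl <- as.factor(df$cyl)

p <- ggplot(df, aes(x = cyl, y = mpg, fill = cyl)) +
  geom_boxplot() +
  scale_fill_manual(values = c("#0099f8", "#e74c3c", "#2ecc71")) +
  labs(
    title = "Miles per gallon among different cylinder numbers",
    subtitle = "Made with ggplot2",
    caption = "Source: mtcars dataset",
    x = "Number of cylinders",
    y = "Miles per gallon"
  ) +
  theme_classic() +
  # Custom styles come last so the complete theme doesn't overwrite them
  theme(
    plot.title = element_text(color = "#0099f8", size = 16, face = "bold", hjust = 0.5),
    plot.subtitle = element_text(face = "bold.italic", hjust = 0.5),
    plot.caption = element_text(face = "italic")
  )

p
```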

Image 13 – Styling title, subtitle, and caption

Much better — assuming you like the blue color. Everything covered so far is just enough to get you on the right track when making ggplot boxplots, so we’ll stop here.

Looking for more examples of boxplots? Check out the R-bloggers boxplot feed to see what the R community has to say.


Conclusion

Today you’ve learned what boxplots are, how to draw them with R and the ggplot2 package, and how to make them aesthetically pleasing by changing colors and adding text, titles, and axis labels. That’s enough to style boxplots however you want. You know what to tweak, and now it’s up to you to pick fonts and colors. When creating data visualizations with R, you’re only limited by your creativity (and R knowledge). If you need help finding inspiration or tools, be sure to check out what can be achieved with advanced R programming.

At Appsilon, we’ve used the ggplot2 package frequently when developing enterprise R Shiny dashboards for Fortune 500 companies. If you have a keen eye for design and know a thing or two about R and Shiny, reach out. We have several R Shiny developer positions available.

Read more: How to Start a Career as an R Shiny Developer



Maria Grycuk
Project Manager