How to Make Stunning Histograms in R: A Complete Guide with ggplot2
<em><strong>Updated</strong>: September 1, 2022.</em>
<h2>R ggplot histogram</h2>
Be honest. How uninspiring are your data visualizations? Expert designers make graph design look effortless, but in reality, that couldn’t be further from the truth. Luckily, the R programming language provides countless ways to make your visualizations eye-catching. Today you'll learn how to make R ggplot histograms and how to tweak them to their full potential.
Read more on our R ggplot series:
<ul><li><a title="How to Make Stunning Bar Charts with R" href="https://wordpress.appsilon.com/ggplot2-bar-charts/" target="_blank" rel="noopener noreferrer">Bar Charts with R</a></li><li><a title="How to Make Stunning Line Charts with R" href="https://wordpress.appsilon.com/ggplot2-line-charts/" target="_blank" rel="noopener noreferrer">Line Charts with R</a></li><li><a href="https://appsilon.com/ggplot-scatter-plots/" target="_blank" rel="noopener noreferrer">Scatter Plots with R</a></li><li><a href="https://appsilon.com/ggplot2-boxplots/" target="_blank" rel="noopener noreferrer">Boxplots with R</a></li></ul>
This article will show you how to make stunning histograms with R’s <code>ggplot2</code> library. We’ll start with a brief introduction to the theory behind histograms, just in case you’re rusty on the subject. You’ll then see how to create R ggplot histograms and tweak them to take them to new heights.
Table of contents:
<ul><li><a href="#what-is-a-histogram">What Is a Histogram?</a></li><li><a href="#first-histogram">Make Your First ggplot Histogram</a></li><li><a href="#style">How to Style and Annotate ggplot Histograms</a></li><li><a href="#text">Add Text, Titles, Subtitles, Captions, and Axis Labels to ggplot Histograms</a></li><li><a href="#conclusion">Conclusion</a></li></ul>
<hr />
<h2 id="what-is-a-histogram">What Is a Histogram?</h2>
A histogram is a way to graphically represent the distribution of your data using bars of different heights. A single bar (bin) represents a range of values, and the height of the bar represents how many data points fall into the range. You can change the number of bins easily.
The easiest way to understand them is through visualization. The image below shows a histogram of 10,000 numbers drawn from a standard normal distribution (mean = 0, standard deviation = 1):
<img class="size-full wp-image-8771" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d4e6bd0d518f384bcaaa_61c92ef2_1.webp" alt="Image 1 - Histogram of a standard normal distribution" width="2286" height="1594" /> Image 1 - Histogram of a standard normal distribution
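If you'd like to draw a similar chart yourself, here is a minimal sketch. The seed, the variable names, and the styling are our own assumptions, so the bars won't match the image exactly:
<pre><code class="language-r">library(ggplot2)

# Assumption: the seed is arbitrary and only makes the draw reproducible
set.seed(42)

# 10,000 draws from a standard normal distribution (mean = 0, sd = 1)
normal_df <- data.frame(value = rnorm(n = 10000, mean = 0, sd = 1))

# Histogram with 30 bins, styled with a blue fill and black outline
p_normal <- ggplot(normal_df, aes(x = value)) +
  geom_histogram(bins = 30, color = "#000000", fill = "#0099F8")
p_normal</code></pre>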
Although at first glance the histogram doesn't look like much, it actually tells you a lot. When data is distributed normally (bell curve), you can draw the following conclusions:
<ul><li>About <strong>68.27%</strong> of the data points are located within 1 standard deviation of the mean (34.13% in either direction).</li><li>About <strong>95.45%</strong> of the data points are located within 2 standard deviations of the mean (47.72% in either direction).</li><li>About <strong>99.73%</strong> of the data points are located within 3 standard deviations of the mean (49.87% in either direction).</li><li>Anything outside the -3 and +3 standard deviation range is often treated as an <strong>outlier</strong>.</li></ul>
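You can check these shares yourself with the normal cumulative distribution function, <code>pnorm()</code>, from base R. The helper name <code>within_k_sd()</code> is our own. Because the values are computed directly rather than summed from rounded halves, they can differ from hand-rounded figures in the last decimal:
<pre><code class="language-r"># Share of a normal distribution within k standard deviations of the mean
within_k_sd <- function(k) pnorm(k) - pnorm(-k)

# Percentages for 1, 2, and 3 standard deviations
shares <- round(100 * within_k_sd(1:3), 2)
shares
# [1] 68.27 95.45 99.73</code></pre>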
In reality, you’re rarely dealing with a perfectly normal distribution. It’s usually skewed in either direction or has multiple peaks. Keep this in mind when drawing conclusions from the shape of a histogram alone.
Let’s see how you can use R and ggplot to visualize histograms.
<h2 id="first-histogram">Make Your First ggplot Histogram</h2>
We’ll use the <code>Gapminder</code> dataset throughout the article to visualize histograms. It’s a relatively small dataset showing life expectancy, population, and GDP per capita in countries between 1952 and 2007. We’ll use only a subset that shows countries in Europe and discard everything else.
Here’s the code you need to import libraries, load, and filter the dataset:
<pre><code class="language-r">library(dplyr)
library(ggplot2)
library(gapminder)

# Keep only the European countries
gm_eu <- gapminder %>%
  filter(continent == "Europe")
gm_eu</code></pre>
Here’s what the first few rows of <code>gm_eu</code> look like:
<img class="size-full wp-image-8772" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d4e7f9a88e2eece1401d_eb355cab_2.webp" alt="Image 2 - Europe countries of the Gapminder dataset" width="1152" height="670" /> Image 2 - European countries of the Gapminder dataset
We’ll visualize the <code>lifeExp</code> column with histograms, as it provides enough continuous data to play around with.
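Before plotting, it can help to glance at the numbers behind the column. The snippet below is a small sketch using base R functions only. It rebuilds <code>gm_eu</code> so it runs on its own:
<pre><code class="language-r">library(dplyr)
library(gapminder)

gm_eu <- gapminder %>%
  filter(continent == "Europe")

# Number of rows and distinct countries in the European subset
nrow(gm_eu)
length(unique(gm_eu$country))

# Minimum, quartiles, mean, and maximum of life expectancy
summary(gm_eu$lifeExp)</code></pre>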
Let’s make the most basic ggplot histogram first. Once you’ve passed the dataset and the aesthetics to <code>ggplot()</code>, add a <code>geom_histogram()</code> layer:
<pre><code class="language-r">ggplot(gm_eu, aes(lifeExp)) +
geom_histogram()</code></pre>
<img class="size-full wp-image-8773" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d4e829ecb62f89b3a484_821f90bc_3.webp" alt="Image 3 - Default histogram" width="2286" height="1594" /> Image 3 - Default histogram
Well, you won’t see anything like that on a website or in a magazine, so we’d better get our hands dirty with some tweaking.
Let’s start by changing the number of bins (bars). The default value is 30, and it works in most cases. If you want your histograms to look <em>boxier</em>, use fewer bins. On the other hand, go big if you want your histograms to look like density plots. Here’s what a histogram with 10 bins looks like:
<pre><code class="language-r">ggplot(gm_eu, aes(lifeExp)) +
geom_histogram(bins = 10)</code></pre>
<img class="size-full wp-image-8774" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d4e9f7aa9374eb240d4e_67b16d62_4.webp" alt="Image 4 - Histogram with 10 bins" width="2286" height="1594" /> Image 4 - Histogram with 10 bins
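Instead of the number of bins, you can also set their width through the <code>binwidth</code> argument of <code>geom_histogram()</code>. The sketch below assumes a width of 2 years and aligns bin edges to even numbers with <code>boundary = 0</code>. Both values are illustrative choices of ours, not recommendations:
<pre><code class="language-r">library(dplyr)
library(ggplot2)
library(gapminder)

gm_eu <- gapminder %>%
  filter(continent == "Europe")

# Each bar now spans exactly 2 years of life expectancy
p_binwidth <- ggplot(gm_eu, aes(lifeExp)) +
  geom_histogram(binwidth = 2, boundary = 0)
p_binwidth</code></pre>
Setting the width is handy when the bars should map to round, easy-to-read intervals.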
Let’s stick with the default number of bins for the rest of the article, as it looks somewhat better.
The coloring is painful to look at. There’s nothing wrong with gray, but it looks too boring. Here’s how to enhance your ggplot histogram and give it some Appsilon flair: a blue fill color with black borders:
<pre><code class="language-r">ggplot(gm_eu, aes(lifeExp)) +
geom_histogram(color = "#000000", fill = "#0099F8")</code></pre>
<img class="size-full wp-image-8775" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d4e900fd9b7af13402fe_629b0814_5.webp" alt="Image 5 - Tweaking the fill and outline color" width="2286" height="1594" /> Image 5 - Tweaking the fill and outline color
Much better, provided you like the blue color. Let’s dive deeper into styling and annotations next.
<h2 id="style">How to Style and Annotate ggplot Histograms</h2>
<h3>Styling</h3>
You can bring more life to your ggplot histogram. For example, we sometimes like to add a vertical line representing the mean, and two surrounding lines representing the range between -1 and +1 standard deviations from the mean. It’s a good idea to style the lines differently, just so your histogram isn’t confusing.
The following code snippet draws a black line at the mean, and dashed black lines at -1 and +1 standard deviation marks:
<pre><code class="language-r"># Note: linewidth requires ggplot2 3.4.0 or newer; older versions use size for lines
ggplot(gm_eu, aes(lifeExp)) +
  geom_histogram(color = "#000000", fill = "#0099F8") +
  # Solid line at the mean
  geom_vline(aes(xintercept = mean(lifeExp)), color = "#000000", linewidth = 1.25) +
  # Dashed lines one standard deviation above and below the mean
  geom_vline(aes(xintercept = mean(lifeExp) + sd(lifeExp)), color = "#000000", linewidth = 1, linetype = "dashed") +
  geom_vline(aes(xintercept = mean(lifeExp) - sd(lifeExp)), color = "#000000", linewidth = 1, linetype = "dashed")</code></pre>
<img class="size-full wp-image-8776" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d4eae4ab0fdeecf90a58_d7edb5b2_6.webp" alt="Image 6 - Adding vertical lines to histograms" width="2286" height="1594" /> Image 6 - Adding vertical lines to histograms
<strong>Are you up for a challenge?</strong> Try to recreate our histogram from <em>Image 1</em>. Hint: use <code>geom_segment()</code> instead of <code>geom_vline()</code>.
Every so often you want to make your ggplot histogram richer by combining it with a density plot. It shows more or less the same information, just in a <em>smoother</em> format. Here’s how you can add a density plot overlay to your histogram:
<pre><code class="language-r">ggplot(gm_eu, aes(lifeExp)) +
  # after_stat(density) rescales bar heights to densities (replaces the deprecated ..density..)
  geom_histogram(aes(y = after_stat(density)), color = "#000000", fill = "#0099F8") +
  geom_density(color = "#000000", fill = "#F85700", alpha = 0.6)</code></pre>
<img class="size-full wp-image-8777" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d4ea7e3fcb2832ab6539_793bde43_7.webp" alt="Image 7 - Adding density plots to histograms" width="2286" height="1594" /> Image 7 - Adding density plots to histograms
It’s a somewhat richer data representation than the histogram alone. For example, if you were to embed the above chart in a dashboard, you could let the user toggle the overlay for maximum customizability.
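One way to prepare for such a toggle is to wrap the chart in a function with a logical flag. This is only a sketch: the function name <code>life_exp_histogram()</code> and the <code>show_density</code> argument are hypothetical, and in a Shiny app the flag would typically come from an input such as a checkbox:
<pre><code class="language-r">library(dplyr)
library(ggplot2)
library(gapminder)

gm_eu <- gapminder %>%
  filter(continent == "Europe")

# Hypothetical helper: show_density switches the density overlay on or off
life_exp_histogram <- function(data, show_density = TRUE) {
  p <- ggplot(data, aes(lifeExp)) +
    geom_histogram(aes(y = after_stat(density)), bins = 30, color = "#000000", fill = "#0099F8")

  # Add the overlay only when requested
  if (show_density) {
    p <- p + geom_density(color = "#000000", fill = "#F85700", alpha = 0.6)
  }
  p
}

# Histogram without the overlay
life_exp_histogram(gm_eu, show_density = FALSE)</code></pre>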
<blockquote><strong>Do you want to build dashboards professionally? <a href="https://appsilon.com/how-to-start-a-career-as-an-r-shiny-developer/" target="_blank" rel="noopener noreferrer">Here’s how to start a career as an R Shiny Developer</a>.</strong></blockquote>
<h3>Annotations</h3>
Finally, let’s see how you can add annotations to your ggplot histogram. Maybe you find vertical lines too intrusive, and you just want a plain textual representation of specific values.
First things first, you’ll need to create a <code>data.frame</code> for annotations. It should contain X and Y values, and also the labels that will be displayed:
<pre><code class="language-r">annotations <- data.frame(
x = c(round(min(gm_eu$lifeExp), 2), round(mean(gm_eu$lifeExp), 2), round(max(gm_eu$lifeExp), 2)),
y = c(4, 52, 5),
label = c("Min:", "Mean:", "Max:")
)
</code></pre>
You can now include these in a <code>geom_text()</code> layer. Hint: make the annotations bold, so they’re easier to spot:
<pre><code class="language-r">ggplot(gm_eu, aes(lifeExp)) +
geom_histogram(color = "#000000", fill = "#0099F8") +
geom_text(data = annotations, aes(x = x, y = y, label = paste(label, x)), size = 5, fontface = "bold")</code></pre>
<img class="size-full wp-image-8778" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d4eb87994b7930ba395a_be2dbe14_8.webp" alt="Image 8 - Adding annotations to histograms" width="2286" height="1594" /> Image 8 - Adding annotations to histograms
The trick with annotations is making sure there’s some gap between them, so the text doesn’t overlap.
<h3>R ggplot histogram theming</h3>
Let’s also see how you can remove this grayish background color. The easiest approach is by adding a more minimalistic theme to the chart. The <code>theme_classic()</code> is one of our top picks:
<pre><code class="language-r">ggplot(gm_eu, aes(lifeExp)) +
geom_histogram(color = "#000000", fill = "#0099F8") +
theme_classic()</code></pre>
<img class="size-full wp-image-8779" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d4ecdfb1b34228fa726f_09708992_9.webp" alt="Image 9 - Changing the theme" width="2286" height="1594" /> Image 9 - Changing the theme
If that theme isn't your cup of tea, here's the good news: <a href="https://ggplot2.tidyverse.org/reference/ggtheme.html">you have options</a>. Let's explore a couple of them.
The one below will apply a dark look to your charts:
<pre><code class="language-r">ggplot(gm_eu, aes(lifeExp)) +
geom_histogram(color = "#000000", fill = "#0099F8") +
theme_dark()</code></pre>
<img class="size-full wp-image-15456" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d4ec237e4636e90ddbac_9561df63_10.webp" alt="Image 10 - Dark theme" width="2284" height="1728" /> Image 10 - Dark theme
The dark and blue combo doesn't necessarily work well, but you can always change the fill color of the bins to something lighter.
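As a quick sketch of that idea, the snippet below swaps the fill for a lighter shade of blue. The hex value <code>#9AD8FF</code> is an arbitrary pick of ours, so feel free to replace it:
<pre><code class="language-r">library(dplyr)
library(ggplot2)
library(gapminder)

gm_eu <- gapminder %>%
  filter(continent == "Europe")

# Assumption: "#9AD8FF" is just one example of a lighter fill
p_dark <- ggplot(gm_eu, aes(lifeExp)) +
  geom_histogram(color = "#000000", fill = "#9AD8FF") +
  theme_dark()
p_dark</code></pre>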
In case you want to get rid of axes and axis labels altogether, the Void theme is your friend:
<pre><code class="language-r">ggplot(gm_eu, aes(lifeExp)) +
geom_histogram(color = "#000000", fill = "#0099F8") +
theme_void()</code></pre>
<img class="size-full wp-image-15458" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d4ed00fd9b7af1340742_447f6982_11.webp" alt="Image 11 - Void theme" width="2284" height="1728" /> Image 11 - Void theme
We also like the Test theme. It keeps styling to a minimum and surrounds the plot panel with a gray border:
<pre><code class="language-r">ggplot(gm_eu, aes(lifeExp)) +
geom_histogram(color = "#000000", fill = "#0099F8") +
theme_test()</code></pre>
<img class="size-full wp-image-15460" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b267014b3ab6e1ead85d55_12.webp" alt="Image 12 - Test theme" width="2284" height="1728" /> Image 12 - Test theme
The only thing missing from our ggplot histogram is the title and axis labels. The users don’t know what they’re looking at without them.
<h2 id="text">Add Text, Titles, Subtitles, Captions, and Axis Labels to ggplot Histograms</h2>
Titles and axis labels are mandatory for production-ready charts. Subtitles or captions are optional, but we’ll show you how to add them as well. The magic happens in the <code>labs()</code> layer. You can use it to specify the values for title, subtitle, caption, X-axis, and Y-axis:
<pre><code class="language-r">ggplot(gm_eu, aes(lifeExp)) +
  geom_histogram(color = "#000000", fill = "#0099F8") +
  labs(
    title = "Histogram of Life Expectancy in Europe",
    subtitle = "Made by Appsilon",
    caption = "Source: Gapminder dataset",
    x = "Life expectancy",
    y = "Count"
  ) +
  theme_classic()</code></pre>
<img class="wp-image-8780 size-full" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d4eeb303cf646946af43_a44ce1a2_10.webp" alt="Image 13 - Adding title, subtitle, caption, and axis labels" width="2286" height="1594" /> Image 13 - Adding title, subtitle, caption, and axis labels
It’s a good start, but the newly added elements don’t stand out. You can change the font, color, and size, among other things, in the <code>theme()</code> layer. Just make sure to add a complete theme layer like <code>theme_classic()</code> before your custom <code>theme()</code> call. A complete theme replaces every theme setting, so custom styles placed before it would get overridden:
<pre><code class="language-r">ggplot(gm_eu, aes(lifeExp)) +
  geom_histogram(color = "#000000", fill = "#0099F8") +
  labs(
    title = "Histogram of Life Expectancy in Europe",
    subtitle = "Made by Appsilon",
    caption = "Source: Gapminder dataset",
    x = "Life expectancy",
    y = "Count"
  ) +
  theme_classic() +
  theme(
    plot.title = element_text(color = "#0099F8", size = 16, face = "bold"),
    plot.subtitle = element_text(size = 10, face = "bold"),
    plot.caption = element_text(face = "italic")
  )</code></pre>
<img class="wp-image-8781 size-full" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d4ef6d09e2c23971aae1_d35b7067_11.webp" alt="Image 14 - Styling title, subtitle, and caption" width="2286" height="1594" /> Image 14 - Styling title, subtitle, and caption
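A side note on layer order: the sketch below builds the same chart twice, once with the custom title style after <code>theme_classic()</code> and once before it. The object names are our own, and inspecting the <code>theme</code> field of a plot object is just one informal way to see which settings survived:
<pre><code class="language-r">library(dplyr)
library(ggplot2)
library(gapminder)

gm_eu <- gapminder %>%
  filter(continent == "Europe")

base_plot <- ggplot(gm_eu, aes(lifeExp)) +
  geom_histogram(bins = 30, color = "#000000", fill = "#0099F8") +
  labs(title = "Histogram of Life Expectancy in Europe")

title_style <- theme(
  plot.title = element_text(color = "#0099F8", size = 16, face = "bold")
)

# Custom styles last: the blue, bold title is kept
styled <- base_plot + theme_classic() + title_style

# Complete theme last: theme_classic() replaces the custom title style
overridden <- base_plot + title_style + theme_classic()

# Title color stored in each plot's theme
styled$theme$plot.title$colour
overridden$theme$plot.title$colour</code></pre>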
The chart is starting to shape up now, and the title color matches the fill of our ggplot histogram. We’ve covered everything needed to get you started visualizing your data distributions with histograms, so we’ll call it a day here. But there's so much more you can do with your visualizations. Check out some of our <a href="https://appsilon.com/shiny/" target="_blank" rel="noopener noreferrer">Shiny demos</a> to see where advanced-level R programming can take your data visualizations.
<blockquote><strong>Did you know there’s another way to visualize data distributions? Read our <a href="https://appsilon.com/how-to-make-stunning-boxplots-in-r-a-complete-guide-with-ggplot2/" target="_blank" rel="noopener noreferrer">complete guide to boxplots</a>.</strong></blockquote>
<hr />
<h2 id="conclusion">Summary of R ggplot Histogram</h2>
Today you’ve learned what histograms are, why they are important for visualizing the distribution of continuous data, and how to make them appealing with R and the <code>ggplot2</code> library. It’s enough to set you on the right track, and now it’s up to you to apply this knowledge to your datasets. We’re sure you can manage it.
At Appsilon, we’ve used histograms and the <code>ggplot2</code> package in developing enterprise R Shiny dashboards for Fortune 500 companies. If you have experience with R and R Shiny, we might have a position ready for you.
<blockquote>Start a career at Appsilon — <a href="https://appsilon.com/careers/" target="_blank" rel="noopener noreferrer"> positions available</a>.</blockquote>