R lattice: How to Create Powerful and Flexible Statistical Graphics in R
Data visualization plays a vital role in understanding and interpreting big and messy datasets. Think of it as a <b>bridge</b> that connects the raw numbers to a form that's easier to comprehend and analyze. Through visualization, you can uncover patterns, trends, and insights that might remain hidden in spreadsheets or databases. Statistical graphs in R crank the whole thing to 11. Unlike basic graphs, they are tailored for more in-depth analysis, and focus on multivariate data and complex relationships within datasets. They help you reveal the underlying structure of data and make it simpler to draw meaningful insights. This is where the <a href="https://cran.r-project.org/web/packages/lattice/index.html" target="_blank" rel="noopener">R lattice</a> package comes in. It's a package designed specifically for creating <b>advanced statistical graphs</b>, and it makes the complexity of multivariate data visualization easier to handle. In this article, we'll explore the nuances of the R lattice package and how it can enhance your data visualization capabilities. <blockquote>Looking to explore data through a drag-and-drop environment? <a href="https://appsilon.com/r-esquisse-drag-n-drop-charts/" target="_blank" rel="noopener">Try R Esquisse - It's just like Tableau</a>.</blockquote> Table of contents: <ul><li><a href="#introduction">Introduction to the R Lattice Package</a></li><li><a href="#examples">Practical Guide to Using R Lattice</a></li><li><a href="#summary">Summing up R Lattice</a></li></ul> <hr /> <h2 id="introduction">Introduction to the R Lattice Package</h2> You might be wondering what the R lattice package is all about. Well, it was created by Deepayan Sarkar, and started off with the idea of supporting <i>trellis graphs</i> (displays that show a variable, or the relationship between variables, conditioned on one or more other variables). Over time, it has grown into a powerful tool for creating high-level statistical graphics in R. 
<b>Its main goal nowadays?</b> To help you visualize complex, multivariate data more easily. Lattice stands out for a few reasons. It's been around for a while and is known for its <b>speed</b>, especially when dealing with large datasets. It's also flexible enough to cover most standard graphics needs while still handling more unusual requirements. If you're wondering how it compares to <b>R's industry standard</b> - <a href="https://appsilon.com/tag/ggplot2/" target="_blank" rel="noopener">ggplot2</a> - here are a few things you should know: <ul><li>Lattice tends to be <b>faster</b>, which is a big plus when you're working with huge amounts of data.</li><li>ggplot2 is known for its consistency and its popularity for publication-quality graphics, but lattice beats it in <b>simplicity</b>.</li><li>Lattice is amazing for <b>initial data analysis and model fitting</b>. But when it's time to create those final polished plots, you might want to switch back to ggplot2.</li></ul> Long story short, R lattice is not a replacement for ggplot2 - it's just another option you have that's better suited to the scenarios listed above. Think of R lattice as your <b>toolkit for exploratory data analysis</b>. It allows you to play around with different visual formats, tweak settings, and see how your data behaves under various conditions. This is incredibly helpful when you're trying to understand the story behind your data or when you're trying to identify trends and patterns that aren't immediately obvious. 
If you're also new to ggplot2, here are a couple of good resources to get started: <ul><li><a href="https://appsilon.com/ggplot2-bar-charts/" target="_blank" rel="noopener">How to Make Stunning Bar Charts in R</a></li><li><a href="https://appsilon.com/ggplot2-line-charts/" target="_blank" rel="noopener">How to Make Stunning Line Charts in R</a></li><li><a href="https://appsilon.com/ggplot-scatter-plots/" target="_blank" rel="noopener">How to Make Stunning Scatter Plots in R</a></li></ul> In the following section, we'll dive deep into the practical stuff - from installation to charting, and everything in between. <h2 id="examples">Practical Guide to Using R Lattice</h2> You know the basics behind R lattice by now, so the only thing left to cover is practical use cases. Let's start with installation, and then we'll dive into some charts. <h3>Prerequisite: Installing R Lattice</h3> Lattice is one of R's recommended packages, so it ships with most standard R installations and may already be available on your machine. If it isn't, you can install it just like you would any other package - via the <code>install.packages()</code> function run from the R console: <pre><code class="language-r">install.packages('lattice')</code></pre> The package is now installed, so let's use it next. <h3>Prerequisite: Dataset for Visualization</h3> There's one more thing we'll need before diving into the charts - the dataset. After all, it's the one thing providing actual values for the graphs. We'll keep things simple and leverage the built-in <code>mtcars</code> dataset. We made a couple of modifications to it, mainly to <b>convert a couple of variables to factors</b> for ease of visualization later. 
Run the following snippet and you'll be good to go: <pre><code class="language-r">library(lattice)

# Copy mtcars and convert gear and cyl to factors
data <- mtcars
data$gear <- factor(data$gear, levels = c(3, 4, 5))
data$cyl <- factor(data$cyl, levels = c(4, 6, 8))
head(data)</code></pre> This is what the dataset looks like: <img class="size-full wp-image-22624" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d7739b186acba34f9a86_97e10e1d_1.webp" alt="Image 1 - Head of the mtcars dataset" width="755" height="207" /> Image 1 - Head of the mtcars dataset Onto the visualization next. <h3>Scatter Plots and Chart Basics with Lattice</h3> Probably the most common chart type in general (not only in statistics) is the scatter plot. It shows a relationship between two variables on a two-dimensional plot. There's a way to <b>include more than 2 variables</b>, but more on that in a bit. R lattice uses the <code>xyplot()</code> function for creating scatter plots. Inside it, you need to represent the relationship as a formula of the form <code>y ~ x</code>, and also pass your data.frame through the <code>data</code> argument. Here's how it works in practice: <pre><code class="language-r">xyplot(hp ~ mpg, data = data)</code></pre> <img class="size-full wp-image-22626" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d7746115b77c55a19062_15c29920_2.webp" alt="Image 2 - Simple scatter plot" width="1256" height="972" /> Image 2 - Simple scatter plot Not pretty, but works! Sometimes you want to enrich your scatter plots by <b>adding one more categorical variable</b>. This is typically done by altering marker colors depending on the category of the third variable. 
As it turns out, doing this in R lattice is quite straightforward: <pre><code class="language-r">xyplot(hp ~ mpg, groups = cyl, data = data, auto.key = TRUE)</code></pre> <img class="size-full wp-image-22628" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d7757a511f178f8e5793_aacd20c8_3.webp" alt="Image 3 - Scatter plot with groups and legend" width="1256" height="972" /> Image 3 - Scatter plot with groups and legend The <code>groups</code> argument assigns a different marker color to each level of <code>cyl</code>, and <code>auto.key = TRUE</code> adds a legend to your chart. If the chart looks too cramped to you, we have good news. You can <b>automatically split it into multiple subplots</b>. For example, we'll condition on the <code>cyl</code> variable by placing it after the <code>|</code> operator in the formula. Since it has 3 levels, lattice will create 3 subplots: <pre><code class="language-r">xyplot(hp ~ mpg | cyl, groups = cyl, data = data, scales = "free")</code></pre> <img class="size-full wp-image-22630" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d7767a511f178f8e5813_2361e780_4.webp" alt="Image 4 - Multiple scatter plots" width="1256" height="972" /> Image 4 - Multiple scatter plots If you're wondering, the <code>scales = "free"</code> parameter makes sure each subplot has its own axis scale. But if you want to literally take your charts to another dimension, you're in luck. The <code>cloud()</code> function allows you to <b>create 3D scatter plots in R lattice</b>. The formula syntax is a bit longer now since we're working with an additional variable: <pre><code class="language-r">cloud(hp ~ mpg * disp, data = data, groups = cyl, auto.key = TRUE)</code></pre> <img class="size-full wp-image-22632" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d7779006b587a562d3ab_8abdffae_5.webp" alt="Image 5 - 3D scatter plot" width="1256" height="972" /> Image 5 - 3D scatter plot And that's pretty much it for the fundamentals of scatter plots in lattice. 
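As a recap of the pieces above, here's a minimal sketch that combines conditioning (<code>|</code>) and grouping (<code>groups</code>) in a single call. It assumes the same factor conversion as before. The choice of <code>gear</code> as the conditioning variable and the object name <code>p</code> are our own picks for illustration: <pre><code class="language-r">library(lattice)

# Same preparation as earlier: treat gear and cyl as factors
data <- mtcars
data$gear <- factor(data$gear, levels = c(3, 4, 5))
data$cyl <- factor(data$cyl, levels = c(4, 6, 8))

# One panel per gear value, marker color by cylinder count
p <- xyplot(
  hp ~ mpg | gear,
  groups = cyl,
  data = data,
  auto.key = list(columns = 3)
)

# lattice functions return a "trellis" object; printing it draws the chart
print(p)</code></pre> Storing the result in a variable is optional, but it's handy when you want to build a chart once and draw it later. Inside a function or a loop, the explicit <code>print()</code> call is what actually triggers the drawing. 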
You now understand the basics, so next we'll shift our focus to the reason you're actually here - <b>statistical plots</b>. <h3>Distribution Plots - Histogram, Box Plots, and Density Charts</h3> This section will cover a wide array of charts that fit nicely under one umbrella term - distribution plots. Let's begin with the one everyone knows. Histograms are one way of visualizing the <b>distribution of a single variable</b>. You can plot them with lattice by using the <code>histogram()</code> function, and you can also tweak the number of bins by playing around with the <code>breaks</code> parameter. Here's one histogram showing the overall miles per gallon distribution: <pre><code class="language-r">histogram(~mpg, data = data, breaks = 9)</code></pre> <img class="size-full wp-image-22634" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d777d7575130662e73e0_f23941f2_6.webp" alt="Image 6 - A histogram" width="1256" height="972" /> Image 6 - A histogram Another commonly used plot for showing variable distribution is a box plot. Making things better is the fact you can split it by a category, demonstrated by the formula below. In short, you're seeing the horsepower distribution for each cylinder count: <pre><code class="language-r">bwplot(hp ~ cyl, data = data)</code></pre> <img class="size-full wp-image-22636" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d77816b3ff510e39adc2_f45951df_7.webp" alt="Image 7 - A box plot" width="1256" height="972" /> Image 7 - A box plot Box plots come in several flavors, such as the violin plot. It conveys similar information, but with a different look and feel. Violin plots allow you to see where <b>the bulk of your data is located</b>. 
You can convert a box plot to a violin plot by adding <code>panel = panel.violin</code> to your existing box plot implementation: <pre><code class="language-r">bwplot(hp ~ cyl, data = data, panel = panel.violin)</code></pre> <img class="size-full wp-image-22638" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d779c6822ae9d748a625_538ec8bd_8.webp" alt="Image 8 - A violin plot" width="1256" height="972" /> Image 8 - A violin plot Let's dial back to histograms. Sometimes you don't want to put data into bins and would rather see a smooth curve instead. That's where <b>density plots</b> come in. In essence, they show the same information as histograms, but in a smoother fashion. R lattice has a <code>densityplot()</code> function you can use: <pre><code class="language-r">densityplot(~mpg, data = data, plot.points = FALSE)</code></pre> <img class="size-full wp-image-22640" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d7799b186acba34f9f1d_0b6e20ac_9.webp" alt="Image 9 - A density plot" width="1256" height="972" /> Image 9 - A density plot The <code>plot.points = FALSE</code> parameter specifies we don't want the original data points to be plotted below the curve. Let's now shift gears and discuss a slightly more advanced usage of the lattice package. All of the functions shown earlier allow you to <b>plot multiple charts</b> in a layout of your choice. For example, you can condition the box plot of <code>hp</code> by <code>cyl</code> on <code>gear</code> to end up with 3 horizontally aligned panels, one per gear value, each showing horsepower box plots by cylinder count. 
Don't forget to also include the <code>layout</code> parameter, as it controls chart organization in columns and rows: <pre><code class="language-r">bwplot(hp ~ cyl | gear, data = data, layout = c(3, 1), scales = "free")</code></pre> <img class="size-full wp-image-22642" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d77ac6822ae9d748a6a8_29307dd5_10.webp" alt="Image 10 - Multiple box plots" width="1256" height="972" /> Image 10 - Multiple box plots If you're not a fan of having multiple individual plots but would like to show multiple distribution curves based on a single categorical factor, you can use the following code snippet as an example: <pre><code class="language-r">densityplot(~mpg, groups = cyl, data = data, plot.points = FALSE, auto.key = TRUE)</code></pre> <img class="size-full wp-image-22644" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d77b9b186acba34f9fb0_a4d7efde_11.webp" alt="Image 11 - Multiple density curves on a single plot" width="1256" height="972" /> Image 11 - Multiple density curves on a single plot This gets you a familiar-looking density plot with multiple density curves. Neat! There's one thing all of the charts made so far have in common - <b>they look horrible</b>. The following section will show you how to address the visuals. <h3>How to Style R Lattice Charts</h3> We mentioned earlier in the article that lattice isn't the package of choice if you want to produce the best-looking visuals. Still, there's much you can tweak, and this section will steer you in the right direction. Up first, we'll address the disgusting scatter plot of horsepower and miles per gallon. Here are the changes we'll make, alongside some basic explanations: <ul><li><b>Add a title to the legend: </b>Achieved with the <code>auto.key</code> argument. 
Make sure to pass the <code>title</code> as a list property.</li><li><b>Tweak marker color and size: </b>The <code>par.settings</code> argument allows you to pass a list of marker colors, symbols, and sizes through its <code>superpose.symbol</code> component. Not intuitive by any stretch of the imagination, but that's how you do it.</li><li><b>Add titles to axes and the chart: </b>For obvious reasons - any chart is incomplete without these.</li></ul> Here's how the visual tweaks look in the code: <pre><code class="language-r">xyplot(
  hp ~ mpg,
  groups = cyl,
  data = data,
  # Add legend title
  auto.key = list(title = "Cylinder", points = TRUE, lines = FALSE),
  # Tweak marker color and size
  par.settings = list(superpose.symbol = list(
    col = c("red", "green", "blue"),
    pch = c(16, 17, 18),
    cex = c(1.5, 1.5, 1.5)
  )),
  # Chart/axis title
  xlab = "Miles per gallon",
  ylab = "Horse power",
  main = "Car Horsepower by MPG and Cylinder"
)</code></pre> <img class="size-full wp-image-22646" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d77c13d4a2a1c0fc801a_1bf1bd75_12.webp" alt="Image 12 - Styled scatter plot" width="1256" height="972" /> Image 12 - Styled scatter plot Not the best-looking visual we've ever seen, but also not embarrassing. The second example will show you how to apply similar styles to our multiple-density plot. In addition to everything covered in the previous styled chart, we'll add the following: <ul><li><b>Tweak the thickness of each line: </b>Done through the same <code>par.settings</code> argument, but this time by using <code>superpose.line</code>. The rest is intuitive.</li><li><b>Left-align the chart title: </b>By adding the <code>just = "right"</code> parameter to <code>main</code>. Once again, not intuitive, but what can you do.</li></ul> Since this is a density plot, you don't really need a Y-axis label, but you know how to add it if you want to. 
Here's the full code snippet: <pre><code class="language-r">densityplot(
  ~mpg,
  groups = cyl,
  data = data,
  plot.points = FALSE,
  # Add legend title
  auto.key = list(title = "Cylinder Count"),
  # Add a left-aligned title
  main = list(label = "Density Plot of MPG by Cylinder Count", just = "right"),
  # Change line color and thickness
  par.settings = list(superpose.line = list(
    col = c("red", "blue", "green"),
    lwd = c(2, 4, 2)
  ))
)</code></pre> <img class="size-full wp-image-22648" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d77d87994b7930bc19e1_45418ec3_13.webp" alt="Image 13 - Styled density plot" width="1256" height="972" /> Image 13 - Styled density plot And that's all you need to know to start working with the R lattice package. Let's make a brief recap next. <hr /> <h2 id="summary">Summing up R Lattice</h2> In this article, we've taken a close look at the R lattice package by exploring how it simplifies the process of creating detailed statistical graphs. This package is particularly useful when dealing with large amounts of data or when you need to analyze complex relationships between multiple variables. To summarize, <b>Lattice is fast, flexible, and allows for a lot of customization, making it a great tool for both beginners and experienced users</b>. While ggplot2 might be preferred for creating polished, final visuals for presentations or publications, lattice is excellent for initial explorations and analyses. Its speed and efficiency make it a practical choice for quickly understanding your data. Therefore, make sure you have it in your toolbelt as a data professional. <blockquote>Today you've learned what R lattice is and how it compares to ggplot2 - <a href="https://appsilon.com/matplotlib-vs-ggplot/" target="_blank" rel="noopener">But what about ggplot2 vs. matplotlib? Read our detailed comparison</a>.</blockquote>