R and GPX - How to Read and Visualize GPX Files in R


Geospatial data is everywhere around us, and it's essential for data professionals to know how to work with it. One common way to store this type of data is in GPX files. Today you'll learn the essentials, from the theory behind the format to parsing and visualizing GPX files in R.

We'll start simple, with a bit of theory and some commonly asked questions. This background helps you understand how geospatial data is stored. If you're already familiar with the topic, feel free to skip the first section.

<blockquote>New to geomapping in R? Follow this <a href="https://appsilon.com/leaflet-geomaps/" target="_blank" rel="noopener">guide to make stunning geomaps in R with Leaflet</a>.</blockquote>

Table of contents:
<ul><li><a href="#introduction">Introduction to R and GPX</a></li><li><a href="#load-gpx">How to Load and Parse GPX files in R</a></li><li><a href="#visualize">How to Visualize GPX files in R</a></li><li><a href="#summary">Summary of R and GPX</a></li></ul>

<hr />

<h2 id="introduction">Introduction to R and GPX</h2>

Online route mapping services such as Strava and Komoot let you export routes in the GPX file format. It's an easy and convenient way to store and share geospatial data, such as geolocation (latitude, longitude), elevation, and more, which you can then analyze and visualize.

For example, take a look at the following image. It represents a Strava cycling route in Croatia I plan to embark on later this summer. It's the highest paved road in the country, and I expect the views to be breathtaking:

<img class="alignnone size-full wp-image-13682" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d321c9f020529c869f48_4d2e26ca_1-4.webp" alt="Image 1 - Strava cycling route" width="1836" height="1212" />

Why is this relevant? Because Strava allows you to export any route or workout in GPX file format. But what is GPX anyway?
<h3>What is a GPX file?</h3>

Put simply, GPX stands for <i>GPS eXchange Format</i>, and it's nothing but a simple text file with geographical information, such as latitude, longitude, elevation, time, and so on. If you plot these points on a map, you'll know exactly where you need to go, and what sort of terrain you might expect, at least according to the elevation.

The Strava route we'll analyze today is just a plain route and has 1855 latitude, longitude, and elevation data points. If I were to complete this route and export the file from my workouts, it would also include timestamps.

These data points are easy to load into R. You don't need a GPX-specific package to combine R and GPX - a general-purpose XML parser does the job. More on that in a bit.

<h3>What is the difference between GPS and GPX?</h3>

This is a common question beginners have. GPS stands for <i>Global Positioning System</i>, a system that provides users with positioning, navigation, and timing services. GPX, on the other hand, is a file format used to exchange GPS data. A GPX file can hold waypoints, tracks, and routes, along with attributes such as elevation and time for each recorded point.

If you're working on GPS programs or plan to build navigation applications, you'll run into GPX files often. GPX is an open standard in the geospatial world that has been around for two decades, so it's important to know how to work with it.

<h3>What program opens a GPX file?</h3>

Because a GPX file is plain text, any text editor can open it, but to make sense of the data on a map you'll need dedicated software or a programming language. Downloadable software includes Google Earth Pro and Garmin BaseCamp, just to name a few. If you're into coding, you should know that any major programming language can load and parse GPX files, R and Python included.

<h2 id="load-gpx">How to Load and Parse GPX files in R</h2>

Now you'll learn how to combine R and GPX. First things first, we'll load a GPX file into R. To do so, we'll have to install a library for parsing XML files.
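Before installing anything, it can help to peek at the raw file with base R. This is only a minimal sketch: it assumes the exported route is saved as <code>croatia_bike.gpx</code> (the file name used later in this article) in your current working directory, and the <code>gpx_path</code> variable is introduced here purely for illustration:

<pre><code class="language-r"># Assumed file name and location: croatia_bike.gpx in the working directory
gpx_path &lt;- "croatia_bike.gpx"

# Print the first 15 lines of the raw file, but only if it actually exists
if (file.exists(gpx_path)) {
  writeLines(readLines(gpx_path, n = 15, warn = FALSE))
}</code></pre>

You should see nested tags in angle brackets rather than binary content.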
Yes - GPX is simply an XML-based format:

<pre><code class="language-r">install.packages("XML")</code></pre>

We can now use the <code>XML::htmlTreeParse()</code> function to read a GPX file. It relies on a lenient HTML parser that doesn't enforce XML namespaces, which keeps the XPath queries used later short. Make sure you know where your file is saved beforehand:

<pre><code class="language-r">library(XML)

# Parse the GPX file into an internal XML document
gpx_parsed &lt;- htmlTreeParse(file = "croatia_bike.gpx", useInternalNodes = TRUE)
gpx_parsed</code></pre>

The <code>gpx_parsed</code> variable contains the following:

<img class="size-full wp-image-13684" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d3224e6cc45db1fa1c72_ca10a445_2-4.webp" alt="Image 2 - Contents of a GPX file loaded into R" width="1988" height="612" />
Image 2 - Contents of a GPX file loaded into R

If you think that looks like a mess, you are not wrong. The file is pretty much unreadable in this form, but you can spot a structure if you focus for long enough. Each <code>trkpt</code> element stores the latitude and longitude of a point as attributes, and its nested <code>ele</code> tag contains the elevation.
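To make that structure easier to see, here's a minimal sketch that parses a tiny hand-written GPX snippet instead of the full file. The two points and their elevations are made up for illustration and don't come from the Strava route, and the <code>sample_*</code> names are hypothetical:

<pre><code class="language-r">library(XML)

# Hypothetical two-point track, written by hand for illustration only
gpx_text &lt;- '&lt;gpx version="1.1" creator="example"&gt;
  &lt;trk&gt;&lt;trkseg&gt;
    &lt;trkpt lat="45.10" lon="14.20"&gt;&lt;ele&gt;10.5&lt;/ele&gt;&lt;/trkpt&gt;
    &lt;trkpt lat="45.11" lon="14.21"&gt;&lt;ele&gt;12.0&lt;/ele&gt;&lt;/trkpt&gt;
  &lt;/trkseg&gt;&lt;/trk&gt;
&lt;/gpx&gt;'

# Parse the string the same way as the file, but with asText = TRUE
sample_parsed &lt;- htmlTreeParse(gpx_text, asText = TRUE, useInternalNodes = TRUE)

# Every track point is a trkpt node; lat and lon are its attributes
sample_points &lt;- getNodeSet(sample_parsed, "//trkpt")
length(sample_points)
xmlAttrs(sample_points[[1]])

# Elevation lives in the nested ele element
xpathSApply(sample_parsed, "//trkpt/ele", xmlValue)</code></pre>

The real file follows the same pattern, just with many more <code>trkpt</code> nodes, so the latitude, longitude, and elevation values can be pulled out with the same kind of XPath queries.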
Use the following R code to extract the coordinates and elevation and store them in a more readable data structure - <code>data.frame</code>:

<pre><code class="language-r"># lat/lon are attributes of each trkpt node; elevation is the text of its ele child
coords &lt;- xpathSApply(doc = gpx_parsed, path = "//trkpt", fun = xmlAttrs)
elevation &lt;- xpathSApply(doc = gpx_parsed, path = "//trkpt/ele", fun = xmlValue)

# Values are parsed as text, so convert them to numbers
df &lt;- data.frame(
  lat = as.numeric(coords["lat", ]),
  lon = as.numeric(coords["lon", ]),
  elevation = as.numeric(elevation)
)

head(df, 10)
tail(df, 10)</code></pre>

<img class="size-full wp-image-13686" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d3239f5df3a3ab21b22c_6dee6d9c_3-4.webp" alt="Image 3 - First 10 rows of the GPX file" width="490" height="460" />
Image 3 - First 10 rows of the GPX file

<img class="size-full wp-image-13688" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d3244e6cc45db1fa1cef_1a993bf5_4-4.webp" alt="Image 4 - Last 10 rows of the GPX file" width="526" height="422" />
Image 4 - Last 10 rows of the GPX file

The route represents a roundtrip, so the starting and ending data points will be almost identical. The interesting part is likely in the middle, but we can't know that for sure before inspecting the data further. The best way to do so is graphically, so next, we'll go over a couple of options for visualizing GPX data in R.

<h2 id="visualize">How to Visualize GPX files in R</h2>

When it comes to data visualization and GPX files, you have options. You can go as simple as using the built-in <code>plot()</code> function or you can pay for custom solutions. Arguably the best approach would be to use the <code>ggmap</code> package, but its Google Maps backend requires an API key linked to a Google Cloud Platform billing account, so it isn't entirely free. We won't cover it in the article, but we'll go over the next best thing.

For starters, let's explore the most basic option.
It boils down to plotting a line chart that has all individual data points connected:

<pre><code class="language-r">plot(x = df$lon, y = df$lat, type = "l", col = "black", lwd = 3,
     xlab = "Longitude", ylab = "Latitude")</code></pre>

<img class="size-full wp-image-13690" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d3244d0cd7b298106278_230d890a_5-4.webp" alt="Image 5 - Plotting GPX data points with R's built-in function" width="1970" height="1928" />
Image 5 - Plotting GPX data points with R's built-in function

The shape of the route looks right, but the visualization isn't very useful on its own. There's no underlying map below it, so we have no idea where this route takes place.

The other, significantly better alternative is the <code>leaflet</code> package. It's designed for visualizing geospatial data, so it won't have any trouble working with our data frame:

<pre><code class="language-r">library(leaflet)

# Draw the route as a single polyline on top of the default map tiles
leaflet() %&gt;%
  addTiles() %&gt;%
  addPolylines(data = df, lat = ~lat, lng = ~lon, color = "#000000", opacity = 0.8, weight = 3)</code></pre>

<img class="size-full wp-image-13692" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d325ff233005820f69c5_c0257275_6-4.webp" alt="Image 6 - Plotting GPX data points with Leaflet" width="1732" height="1554" />
Image 6 - Plotting GPX data points with Leaflet

Now we're getting somewhere! The route looks almost identical to the one shown earlier on Strava, but we don't have to stop here. You can invest hours into producing a perfect geospatial visualization, but for the purpose of this article, we'll display one additional thing - elevation.

Leaflet doesn't ship with an easy way of using elevation data (numeric) for coloring purposes, so we have to be somewhat creative. The <code>get_color()</code> function will return one of four colors, depending on the elevation group.
Then, data points for each group are added manually to the map inside a <code>for</code> loop:

<pre><code class="language-r">library(dplyr)  # provides rowwise(), mutate(), and lag()

# Map an elevation value to one of four color groups
get_color &lt;- function(elevation) {
  if (elevation &lt; 500) {
    return("green")
  }
  if (elevation &lt; 1000) {
    return("yellow")
  }
  if (elevation &lt; 1500) {
    return("orange")
  }
  return("red")
}

# New dataset with the new variable for color
# rowwise() is needed because get_color() handles one value at a time
df_color &lt;- df %&gt;%
  rowwise() %&gt;%
  mutate(color = get_color(elevation))

# Color of the previous point, used to connect neighboring segments
df_color$last_color &lt;- dplyr::lag(df_color$color)

# Map
map &lt;- leaflet() %&gt;% addTiles()
for (color in levels(as.factor(df_color$color))) {
  map &lt;- addPolylines(map, lat = ~lat, lng = ~lon, data = df_color[df_color$color == color | df_color$last_color == color, ], color = ~color)
}
map</code></pre>

<img class="size-full wp-image-13694" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d326d16df45f371e9aff_99edccd2_7-3.webp" alt="Image 7 - Plotting GPX data points and elevation with Leaflet" width="1612" height="1342" />
Image 7 - Plotting GPX data points and elevation with Leaflet

The map isn't perfect, but it shows which route segments have a higher elevation than the others.

<hr />

<h2 id="summary">Summary of R and GPX</h2>

And that's the basics of R and GPX! You've learned the basic theory behind this file format, and how to work with it in the R programming language. We've only scratched the surface, as there's plenty more you can do. For example, plotting the elevation profile or making the polyline interactive would be an excellent next step.

Now it's time for the homework assignment. We encourage you to play around with any GPX file you can find and use R to visualize it. Feel free to explore other visualization libraries and make something truly amazing. When done, please share your results with us on Twitter - <a href="https://twitter.com/appsilon" target="_blank" rel="noopener">@appsilon</a>. We'd love to see what you can come up with.
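If you'd like a head start on the elevation profile mentioned above, here's a rough sketch in base R. It is not part of the original walkthrough: the <code>haversine_m()</code> helper and the <code>sample_df</code> data frame are hypothetical names with made-up values, and the code assumes elevation is recorded in metres. Swap <code>sample_df</code> for the <code>df</code> built earlier to try it on the real route:

<pre><code class="language-r"># Great-circle distance in metres between pairs of points (haversine formula)
haversine_m &lt;- function(lat1, lon1, lat2, lon2, radius = 6371000) {
  to_rad &lt;- pi / 180
  dlat &lt;- (lat2 - lat1) * to_rad
  dlon &lt;- (lon2 - lon1) * to_rad
  a &lt;- sin(dlat / 2)^2 + cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(pmin(1, sqrt(a)))
}

# Hypothetical stand-in for the article's df (made-up values)
sample_df &lt;- data.frame(
  lat = c(45.000, 45.001, 45.002, 45.003),
  lon = c(14.000, 14.001, 14.002, 14.003),
  elevation = c(100, 105, 112, 108)
)

# Distance of each point from the previous one, then the running total in km
n &lt;- nrow(sample_df)
step_m &lt;- c(0, haversine_m(sample_df$lat[-n], sample_df$lon[-n],
                           sample_df$lat[-1], sample_df$lon[-1]))
sample_df$distance_km &lt;- cumsum(step_m) / 1000

# Elevation profile: distance travelled on the x-axis, elevation on the y-axis
plot(x = sample_df$distance_km, y = sample_df$elevation, type = "l", lwd = 3,
     xlab = "Distance (km)", ylab = "Elevation (m)")</code></pre>

The haversine formula treats the Earth as a sphere and ignores elevation change between points, which is usually accurate enough for a quick profile of a cycling route.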
<blockquote>Want to build interactive maps with R and R Shiny? <a href="https://appsilon.com/leaflet-vs-tmap-build-interactive-maps-with-r-shiny/" target="_blank" rel="noopener">Try Leaflet and Tmap</a>.</blockquote>
