stringr: 10 Examples on How to Do Efficient String Processing in R

By: Dario Radečić
June 13, 2023

Working with strings in R can be surprisingly complex and challenging. Mixed content, such as letters, digits, and language-specific characters, adds further complexity. It's even worse if you're collecting string data through a website form. Good luck processing that. Truth be told, R's built-in functions for working with strings leave a lot to be desired. That's where the R package stringr comes in. It's part of the <a href="https://stringr.tidyverse.org/" target="_blank" rel="noopener">Tidyverse</a> ecosystem, so if you use the Tidyverse, you likely already have it installed. Don't worry if you don't, as we'll walk you through the installation steps. This article demonstrates 10 useful stringr functions that you should know to work efficiently with string data and avoid wasting time reinventing the wheel. Before diving into the examples, we'll cover some basics about the R stringr package. <blockquote>Need to manage environment-specific configuration files in R? <a href="https://appsilon.com/r-config/" target="_blank" rel="noopener">Look no further than the R config package</a>.</blockquote> Table of contents: <ul><li><a href="#introduction">What Is the stringr Package and How to Install It</a></li><li><a href="#examples">stringr in Action - 10 Functions to Preprocess Textual Data</a></li><li><a href="#summary">Summing Up Strings in R with stringr</a></li></ul> <hr /> <h2 id="introduction">What Is the stringr Package and How to Install It</h2> The stringr package provides you with a collection of functions for working with strings. It was developed by <a href="https://appsilon.com/best-r-shiny-books-and-courses/#book-1" target="_blank" rel="noopener">Hadley Wickham</a>, Chief Scientist at Posit and a well-known figure in the R community. The package is designed to be user-friendly and easy to learn, which makes it an essential tool for anyone who wants to work with string data effectively.
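To make the point about R's built-in string functions concrete, here is a small side-by-side comparison of base R's <code>grepl()</code> and <code>sub()</code> with their stringr counterparts. It's our own illustration on a few sample words and assumes stringr is already installed. Notice that base R puts the pattern first, while every stringr function starts with <code>str_</code> and takes the string first:

<pre><code class="language-r">library(stringr)

# Base R: the pattern comes first, and the names follow no shared scheme
grepl("ar", c("house", "car", "arm chair"))
# FALSE TRUE TRUE
sub("e", "***", c("house", "telephone"))
# "hous***" "t***lephone"

# stringr: str_ prefix, and the string is always the first argument
str_detect(c("house", "car", "arm chair"), "ar")
# FALSE TRUE TRUE
str_replace(c("house", "telephone"), "e", "***")
# "hous***" "t***lephone"</code></pre>

Both pairs return identical results. The difference is only in how predictable the interface is, which starts to matter once you combine several string operations.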
The <code>stringr</code> package has a lot going for it. Its function naming is consistent, which isn't a given in other packages: every <code>stringr</code> function starts with the <code>str_</code> prefix, followed by a descriptive name. It covers pretty much any common string task, from simple operations to pattern matching, substitution, trimming, splitting, and much more. Under the hood, it's an easy-to-understand tidyverse wrapper around the most common functions from the <code>stringi</code> package, so unless your use case is fairly complex, you won't need to call <code>stringi</code> directly. Before you can use the package, you'll have to install it. The recommended method is to install the entire <code>tidyverse</code>, as <code>stringr</code> is a part of it. You can do so by running the following command from the R console: <pre><code class="language-r">install.packages("tidyverse")</code></pre> Alternatively, you can install only <code>stringr</code> by running the following command: <pre><code class="language-r">install.packages("stringr")</code></pre> Either way, you now have <code>stringr</code> installed, so let's go over the top 10 functions. <h2 id="examples">stringr in Action - 10 Functions to Preprocess Textual Data</h2> This section walks you through 10 <code>stringr</code> functions that come in handy when preprocessing textual data. As for the data, we'll declare a vector <code>x</code> that contains five strings: <pre><code class="language-r">library(stringr)

x &lt;- c("house", "car", "plant", "telephone", "arm chair")
print(x)</code></pre> Here's what it looks like: <img class="size-full wp-image-18893" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7af000a1858e6d4e924f9_629c0668_1-2.webp" alt="Image 1 - Vector of strings" width="1370" height="102" /> Image 1 - Vector of strings We can now apply a whole collection of <code>stringr</code> functions to this vector.
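One property worth knowing before going function by function: most stringr functions are vectorised over both the string and the pattern, and a missing value in the input gives <code>NA</code> in the output instead of an error. Here's a quick sketch on a few literal values of our own choosing:

<pre><code class="language-r">library(stringr)

# One result per input string; NA stays NA and the empty string has length 0
str_length(c("car", NA, ""))
# 3 NA 0

# The pattern is vectorised too: each pattern is checked against the single string
str_detect("arm chair", c("arm", "sofa"))
# TRUE FALSE</code></pre>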
Let's start with a simple one. <h3>1. str_length()</h3> This function returns the number of characters in a string. When applied to a vector, it returns a vector where each item is the number of characters in the corresponding string. The <code>str_length()</code> function takes a string or a vector of strings as a parameter and returns either an integer or an integer vector, depending on what was passed in. Take a look at the following example, where we apply the function to the entire vector at once: <pre><code class="language-r">str_length(x)</code></pre> And this is the output: <img class="size-full wp-image-18895" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7af00849aa4fb70c92d51_93733bd7_2-2.webp" alt="Image 2 - R stringr str_length() function" width="358" height="96" /> Image 2 - stringr str_length() function The returned integer vector matches the input vector of strings and tells you how long each string is. <h3>2. str_sub()</h3> The <code>str_sub()</code> function returns a substring of a given string. It takes three parameters: <ol><li>The string (or a vector of strings)</li><li>The starting position of the substring</li><li>The ending position of the substring</li></ol> Positions start at 1 and both ends are inclusive, so if you pass in <code>2</code> and <code>5</code> for the last two parameters, you get the characters from the 2nd through the 5th position. Negative positions count back from the end of the string, with <code>-1</code> being the last character. This function is much easier to understand in practice, so let's apply it to our vector of strings: <pre><code class="language-r">str_sub(x, start = 2, end = 5)</code></pre> And here is the result: <img class="size-full wp-image-18897" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7af016aa62d746d2deaed_c6e3e929_3-2.webp" alt="Image 3 - R stringr str_sub() function" width="840" height="100" /> Image 3 - stringr str_sub() function It's useful when you want to limit the number of characters or cut off the start or end of a string. <h3>3.
str_detect()</h3> This function checks whether a pattern occurs in a string. The <code>str_detect()</code> function takes two parameters: your string (or vector of strings) and a pattern to search for. If the pattern is found, the function returns <code>TRUE</code>; otherwise, it returns <code>FALSE</code>. When applied to a vector, it returns one boolean per string. Let's take a look at it in code. We'll search for the letter pattern <code>ar</code> in our vector of strings: <pre><code class="language-r">str_detect(x, "ar")</code></pre> Here's the resulting vector of booleans: <img class="size-full wp-image-18899" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7af02b5f5c329492a3cca_15239783_4-2.webp" alt="Image 4 - R stringr str_detect() function" width="732" height="98" /> Image 4 - stringr str_detect() function Because it's a boolean vector, you can use it to select only those input strings that satisfy the condition: <pre><code class="language-r">x[str_detect(x, "ar")]</code></pre> We now get a vector of strings back: <img class="size-full wp-image-18901" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b29f4d22b1cc05c827b1f7_5-2.webp" alt="Image 5 - R stringr str_detect() function (2)" width="612" height="104" /> Image 5 - stringr str_detect() function (2) There's a more convenient function for doing so, which we'll explore later, but it doesn't hurt to be a bit creative. <h3>4. str_replace()</h3> The <code>str_replace()</code> function replaces the first occurrence of a pattern in a string with a specified replacement string. It takes three parameters: <ol><li>The string (or a vector of strings) to search</li><li>The pattern to search for</li><li>The replacement string</li></ol> The function returns a modified string in which the match of the pattern is replaced with the replacement string, <b>but only at the first occurrence</b>.
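Keep in mind that the pattern argument of <code>str_detect()</code>, <code>str_replace()</code>, and the other pattern-based functions is treated as a regular expression by default. That matters as soon as the pattern contains special characters such as <code>.</code>, and wrapping it in <code>fixed()</code> switches to literal matching. Regular expressions also let <code>str_replace()</code> reuse parts of the match through capture groups. A short sketch with example strings of our own:

<pre><code class="language-r">library(stringr)

# "." is a regex wildcard, so it matches any character
str_detect(c("a.b", "axb"), ".")
# TRUE TRUE

# fixed() matches the literal dot only
str_detect(c("a.b", "axb"), fixed("."))
# TRUE FALSE

# \\1 and \\2 refer to the two capture groups in the pattern
str_replace("arm chair", "(\\w+) (\\w+)", "\\2 \\1")
# "chair arm"</code></pre>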
Let's give it a shot and replace the letter <code>e</code> with the string <code>***</code>: <pre><code class="language-r">str_replace(x, "e", "***")</code></pre> Here's what it returns: <img class="size-full wp-image-18903" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7af032814a08d133da3c7_5cd4d5cd_6-2.webp" alt="Image 6 - R stringr str_replace() function" width="1554" height="98" /> Image 6 - stringr str_replace() function The function does what was advertised, replacing only the first occurrence of the search pattern. Just take a look at the <code>telephone</code> string and you'll see that only the first <code>e</code> was replaced. If you want to replace all occurrences, use the next function. <h3>5. str_replace_all()</h3> This function is almost identical to the previous one, but it replaces <b>all occurrences</b> of the search pattern with the provided replacement string. It takes the same parameters, so there's no need to go over them again. This time, we'll replace every <code>e</code> with the string <code>***</code>. Here's the code: <pre><code class="language-r">str_replace_all(x, "e", "***")</code></pre> And these are the results: <img class="size-full wp-image-18905" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b29fa9337d6008eabd3a56_7-2.webp" alt="Image 7 - R stringr str_replace_all() function" width="1900" height="100" /> Image 7 - stringr str_replace_all() function Take a look at the <code>telephone</code> string and you'll immediately see that all <code>e</code>'s were replaced. In practice, you'll likely use <code>str_replace_all()</code> much more frequently than <code>str_replace()</code>. <h3>6. str_count()</h3> The <code>str_count()</code> function counts the number of times a search pattern appears in a string.
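Since the search pattern can be a regular expression, a character class such as <code>[aeiou]</code> lets <code>str_count()</code> count several letters in a single call. Here's a quick sketch that counts the vowels in each of the five words from our example vector, plus a plain single-letter count for comparison:

<pre><code class="language-r">library(stringr)

# [aeiou] matches any single lowercase vowel
str_count(c("house", "car", "plant", "telephone", "arm chair"), "[aeiou]")
# 3 1 1 4 3

# A plain letter counts every occurrence, not just the first
str_count("telephone", "e")
# 3</code></pre>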
<code>str_count()</code> takes two parameters: the string (or vector of strings) to search, and the search pattern, which is interpreted as a regular expression by default. The function returns an integer (or an integer vector) representing the number of times the search pattern was found. Let's use the letter <code>a</code> as the search pattern and perform the search on our vector of words: <pre><code class="language-r">str_count(x, "a")</code></pre> Here's what the function returns: <img class="size-full wp-image-18907" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b29fa94a2eaf755d9279c4_8-2.webp" alt="Image 8 - R stringr str_count() function" width="438" height="102" /> Image 8 - stringr str_count() function That's the number of times the letter <code>a</code> appears in each of the input strings. Easy! <h3>7. str_subset()</h3> Remember when we said earlier that there's an easier way to get the strings that satisfy a condition than indexing with a boolean vector? Well, this is the function for the job. The <code>str_subset()</code> function returns the subset of a vector of strings that match a given search pattern. It takes two parameters: the vector of strings to search and the search pattern itself. Let's take a look at this function in code and return all words that contain the letter <code>a</code>: <pre><code class="language-r">str_subset(x, "a")</code></pre> We get a vector of three strings back: <img class="size-full wp-image-18909" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b29faaafe4df21c48a899a_9-2.webp" alt="Image 9 - R stringr str_subset() function" width="860" height="104" /> Image 9 - stringr str_subset() function Neat! No need to reinvent the wheel. <h3>8. str_trim()</h3> The <code>str_trim()</code> function is useful when you have messy strings with leading and trailing whitespace. It removes that whitespace from a single string or from every string in a vector.
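<code>str_trim()</code> also accepts a <code>side</code> argument (<code>"both"</code>, <code>"left"</code>, or <code>"right"</code>) for when only one end should be cleaned. Note that it never touches whitespace between words. For that, stringr's <code>str_squish()</code> trims both ends and also collapses inner runs of whitespace into a single space, which is often what form data needs. A small sketch on a made-up input:

<pre><code class="language-r">library(stringr)

# Only strip leading whitespace
str_trim("  hello  world  ", side = "left")
# "hello  world  "

# Default side = "both": the double space between the words is kept
str_trim("  hello  world  ")
# "hello  world"

# str_squish() trims both ends and reduces inner whitespace to one space
str_squish("  hello  world  ")
# "hello world"</code></pre>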
Since our vector <code>x</code> doesn't contain any elements with leading or trailing whitespace, we'll declare a new one that does: <pre><code class="language-r">y &lt;- c("  hello ", "from  ", " R   ")
print(y)</code></pre> Here's what it looks like: <img class="size-full wp-image-18911" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b29faa5b341395a2e55964_10-2.webp" alt="Image 10 - String vector with leading and trailing whitespaces" width="746" height="102" /> Image 10 - String vector with leading and trailing whitespaces From here, just pass this vector into the <code>str_trim()</code> function and you'll be good to go: <pre><code class="language-r">str_trim(y)</code></pre> This is the result: <img class="size-full wp-image-18913" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b29fabf2a5e39943539453_11-2.webp" alt="Image 11 - R stringr str_trim() function" width="518" height="106" /> Image 11 - stringr str_trim() function This function is particularly useful when processing form data, as it makes sure that whitespace typed by mistake at the start or end of a field doesn't end up in your data. <h3>9. str_split()</h3> This function splits a string, or each string in a vector, into substrings and returns a list with one vector of substrings per input string. It splits on a delimiter that you pass in, so the function has two required parameters: the string and the delimiter.
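Beyond the two required parameters, <code>str_split()</code> takes an optional <code>n</code> argument that caps the number of pieces and a <code>simplify</code> argument that returns a character matrix instead of a list. Here's a short sketch on a few multi-word phrases; with <code>simplify = TRUE</code>, rows with fewer pieces are padded with empty strings:

<pre><code class="language-r">library(stringr)

# n = 2 splits at the first space only and keeps the rest together
str_split("brown laptop case", " ", n = 2)
# a list with one element: "brown" "laptop case"

# simplify = TRUE returns a 3 x 3 character matrix instead of a list
str_split(c("office chair", "front desk", "brown laptop case"), " ", simplify = TRUE)</code></pre>

The matrix form is handy when every string has roughly the same structure and you want to turn the pieces into columns.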
There's only one string with two words in our <code>x</code> vector, so we'll declare a new one with wordier strings: <pre><code class="language-r">z &lt;- c("office chair", "front desk", "brown laptop case")
print(z)</code></pre> This is what it looks like: <img class="size-full wp-image-18915" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b29f66d1d86db85264ee7c_12-1.webp" alt="Image 12 - A vector of lengthy strings" width="1386" height="108" /> Image 12 - A vector of lengthy strings We can now call <code>str_split()</code> on <code>z</code> and pass in a space as the delimiter: <pre><code class="language-r">str_split(z, " ")</code></pre> The function returns a list in which each element is a vector of strings: <img class="size-full wp-image-18917" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7af0967eac46f1bcfd80a_6c3501d9_13.webp" alt="Image 13 - R stringr str_split() function" width="644" height="478" /> Image 13 - stringr str_split() function Let's take a look at one more function before wrapping up. <h3>10. str_to_xyz()</h3> There's actually no function named <code>str_to_xyz()</code>, but there's a set of functions for changing the case of a string or a vector of strings. You can use one of the following functions: <ul><li><code>str_to_title()</code> - To capitalize the first letter of each word in a string</li><li><code>str_to_sentence()</code> - To capitalize the first letter of a string</li><li><code>str_to_upper()</code> - To uppercase the entire string</li><li><code>str_to_lower()</code> - To lowercase the entire string</li></ul> We'll show you two of these in action.
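The other two case functions, <code>str_to_sentence()</code> and <code>str_to_lower()</code>, work the same way. All four also take a <code>locale</code> argument (English by default), which matters for the language-specific characters mentioned at the start, for example the Turkish dotted and dotless i. A quick sketch with our own sample strings:

<pre><code class="language-r">library(stringr)

# Title case capitalises every word, sentence case only the first one
str_to_title("arm chair")
# "Arm Chair"
str_to_sentence("arm chair")
# "Arm chair"

str_to_lower("TELEPHONE")
# "telephone"

# Case rules depend on the locale: in Turkish, "i" uppercases to the dotted "İ"
str_to_upper("i", locale = "tr")</code></pre>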
First, let's use <code>str_to_title()</code> on the entire vector <code>x</code>: <pre><code class="language-r">str_to_title(x)</code></pre> Here are the results: <img class="size-full wp-image-18919" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7af0a74fe971f0385a034_28577c5f_14.webp" alt="Image 14 - R stringr str_to_title() function" width="1374" height="106" /> Image 14 - stringr str_to_title() function Each word now has its first letter capitalized. Up next, let's take a look at <code>str_to_upper()</code>. Here's the code: <pre><code class="language-r">str_to_upper(x)</code></pre> And these are the results: <img class="size-full wp-image-18921" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7af0b0a1858e6d4e92fe1_9111ee48_15.webp" alt="Image 15 - R stringr str_to_upper() function" width="1384" height="94" /> Image 15 - stringr str_to_upper() function All letters of each vector item are now uppercase. And these are the top 10 stringr functions you must know. Let's make a brief recap next. <hr /> <h2 id="summary">Summing Up Strings in R with stringr</h2> To conclude, the R stringr package packs a powerful set of functions for working with text data. We've explored 10 of them in this article, and we hope they'll help you in your work. The main benefit of using the stringr package is its simplicity. The functions are intuitive and easy to use, even for newcomers to R. In addition, the package offers consistent syntax across functions, making it easy to learn and apply these tools to different text analysis projects. <i>What's your favorite stringr/stringi function? Or a set of functions?</i> Make sure to share in the comment section below, or reach out on Twitter - <a href="https://twitter.com/appsilon" target="_blank" rel="noopener">@appsilon</a>. We'd love to hear your thoughts. <blockquote>Having trouble managing dependencies in R projects?
<a href="https://appsilon.com/renv-how-to-manage-dependencies-in-r/" target="_blank" rel="noopener">Try R renv, you'll never look back</a>.</blockquote>
