Connecting R to Azure: How to Connect to an Azure-Hosted SQL Server Database from R

Reading time:
time
min
By:
Dario Radečić
January 1, 1970

Data engineering has been getting a lot of coverage in the last couple of years, and for a good reason - it's an integral part of any data project and enables data scientists and analytics engineers to do their job without worrying about data quality. One of the first concepts any data engineer must know is connecting various programming languages to various databases. Today we'll go about connecting R to the Azure SQL Server database. You'll learn how to provision the database, install the driver, and do a lot of SQL magic straight from R. Let's dig in! <blockquote>Programming in R can significantly improve your business workflows - <a href="https://appsilon.com/r-programming-vs-excel-for-business-workflow/" target="_blank" rel="noopener">Here are 5 practical examples how</a>.</blockquote> Table of contents: <ul><li><a href="#database-setup">SQL Server Database Setup on Azure - Provision the Database and Install the Driver</a></li><li><a href="#connect">How to Connect R to Azure SQL Server Database</a></li><li><a href="#create-table">Connecting R to Azure - Creating Tables</a></li><li><a href="#insert-data">Inserting and Selecting Data into Azure SQL Server from R</a></li><li><a href="#aggregate-data">Connecting R to Azure - Data Manipulation and Aggregation</a></li><li><a href="#visualize-data">Visualizing R Azure Data with ggplot2</a></li></ul> <hr /> <h2 id="database-setup">SQL Server Database Setup on Azure - Provision the Database and Install the Driver</h2> It's assumed you already have an Azure account configured with everything that goes with it - active subscription and resource group. If that's the case, provisioning a new SQL Server database instance will feel like a walk in the park. <h3>How to Provision a SQL Server Database on Azure</h3> Open up the Azure dashboard, open up the side menu, and click on <i>SQL databases</i> as shown below: <img class="size-full wp-image-20343" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d722ff254f2505a6b55f_ff83d68c_1-2.webp" alt="Image 1 - Creating a new Azure resource" width="1562" height="1216" /> Image 1 - Creating a new Azure resource This will list all of the databases you currently have. If you're new to Azure, you won't have any, so click on the create new database button. This will redirect you to the database creation wizard. The first step is to create a new database server, so click on that option before clicking on anything else. Give your server a name, and pick <i>Use both SQL and Azure AD authentication</i> under <i>Authentication method</i>. Specify the admin username and password, and click on <i>OK</i>: <img class="size-full wp-image-20345" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d722bc3cf4aed6cda7e7_9a0ef90e_2-2.webp" alt="Image 2 - Creating a new SQL database server" width="1968" height="2240" /> Image 2 - Creating a new SQL database server Now make sure your subscription, resource group, and previously configured server are selected in the <i>Basics</i> tab. Give your database a name before proceeding onto the <i>Network</i> tab: <img class="size-full wp-image-20347" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d723537e1bbc539681ed_a6b15539_3-2.webp" alt="Image 3 - Basic SQL database configuration" width="1968" height="2240" /> Image 3 - Basic SQL database configuration While in the Network tab, make sure to set the <i>Connectivity method</i> to <i>Public endpoint</i>, and also to allow all firewall rules: <img class="size-full wp-image-20349" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d7246778869364ec01f8_bd7d9853_4-2.webp" alt="Image 4 - SQL database networking" width="1968" height="2240" /> Image 4 - SQL database networking And that's it! You can switch over to the <i>Review + create</i> tab to see everything you've configured: <img class="size-full wp-image-20351" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d724c039958680cbdd44_2832f052_5-2.webp" alt="Image 5 - SQL database final review" width="1968" height="2240" /> Image 5 - SQL database final review If you're satisfied with what you see, click on the big blue <i>Create</i> button to start provisioning the database. It will take a couple of minutes, but once done, you should see the following screen: <img class="size-full wp-image-20353" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d725cba09eb7528229b8_7fdb6980_6-2.webp" alt="Image 6 - SQL Server database successful creation" width="1968" height="1556" /> Image 6 - SQL Server database successful creation Your database is now provisioned, which means you're almost ready to connect to it. Let's get your connection parameters next. <h3>How to Obtain Connection Parameters</h3> Continuing where we left off in <i>Image 6</i>, click on the <i>Go to resource</i> button. This will open up the overview panel of your database instance, as shown below: <img class="size-full wp-image-20355" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d72609973e5f1b980b02_0fdc69ed_7-2.webp" alt="Image 7 - Azure database service overview" width="2918" height="1556" /> Image 7 - Azure database service overview On the right side of the screen, you'll see the <b>Connection strings</b> property with a link. Click on that link, and you'll be redirected to the following screen (make sure to switch to the ODBC tab): <img class="size-full wp-image-20357" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d7277b3b08462745fe2c_6307d667_8-2.webp" alt="Image 8 - Azure database connection strings" width="2918" height="1556" /> Image 8 - Azure database connection strings We've configured our database to use both SQL and AD authentication, which means the first connection string should work fine. Just note the <code>{your_password_here}</code> placeholder. That's where you should write the password for your user that was created earlier (shown in <i>Image 2</i>). Copy this piece of code somewhere safe, and make sure to specify the password. That's all you need for now. We're almost there, but the connection to an Azure-hosted SQL Server database won't work without an adequate driver. Let's see how to install it next. <h3>How to Install SQL Server Driver on Your PC/Laptop</h3> Your PC/Mac/Laptop needs a SQL Server driver in order to communicate with a remote database. Luckily, it's really easy to install. We're running macOS, so we'll show installation instructions only for it. For other operating systems, feel free to use the following links as a resource: <ul><li><a href="https://learn.microsoft.com/en-us/sql/connect/odbc/microsoft-odbc-driver-for-sql-server?view=sql-server-ver16" target="_blank" rel="noopener">SQL Server Driver for Windows</a></li><li><a href="https://learn.microsoft.com/en-us/sql/connect/odbc/linux-mac/installing-the-microsoft-odbc-driver-for-sql-server?view=sql-server-ver16&amp;tabs=alpine18-install%2Calpine17-install%2Cdebian8-install%2Credhat7-13-install%2Crhel7-offline" target="_blank" rel="noopener">SQL Server Driver for Linux</a></li></ul> As for macOS, simply open a new Terminal window and run the following commands one by one: <pre><code class="language-bash">/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/master/install.sh)" brew tap microsoft/mssql-release https://github.com/Microsoft/homebrew-mssql-release brew update HOMEBREW_ACCEPT_EULA=Y brew install msodbcsql18 mssql-tools18</code></pre> Installation took between 10 and 15 minutes on our end. Patience is the key. Once the driver is installed, you can proceed to the following section in which we'll establish a database connection through R. <h2 id="connect">How to Connect R to Azure SQL Server Database</h2> You now have the driver installed and a connection string stored somewhere safe. That's all you need to establish a connection through R. This section will show the best practice for doing so. <h3>Setting up the .env File</h3> Wait, what is <code>.env</code> file? Well, it's a file without a filename (only an extension) that's used to hold sensitive information such as database connection strings. You essentially write the values there and add the file to <code>.gitignore</code>, so your credentials aren't visible to everyone. The best part is that you can load this <code>.env</code> file in R and read its key-value pairs as environment variables. To get started, create the file and paste the following: <pre><code class="language-text">DB_CONN_STR=</code></pre> You'll want to paste the ODBC connection string right after the equals sign. There's no need to surround the connection string with quotes, so keep that in mind. Let's now see how to read this <code>.env</code> file in R, and how to use it to establish a database connection. <h3>Establishing R Azure SQL Server Connection</h3> We'll use R's <code>odbc</code> package to communicate with the database. You'll also need <code>DBI</code> installed, so make sure you have both of these installed before proceeding. The <code>load_dot_env()</code> function will load the environment variables that hold the connection string: <pre><code class="language-r">library(odbc) dotenv::load_dot_env()</code></pre> And now, you can access this environment variable when declaring a new database connection via the <code>dbConnect()</code> function: <pre><code class="language-r">conn = dbConnect(  odbc::odbc(),  .connection_string = Sys.getenv("DB_CONN_STR") )</code></pre> And that's it - clean and free of sensitive info. If you don't get any errors, it means the connection to the database was successfully established. If you do, make sure you've entered the password correctly. <h2 id="create-table">Connecting R to Azure - Creating Tables</h2> Now let's start actually working with the SQL Server database by creating a table. It will be a simple one, containing info on a couple of movies. We'll store the values for movie name, average rating, duration, category, and a made-up value for a number of views. We'll use stats from a couple of recent movies, but that's a topic for a different section. Here's the command you need to run to create the table: <pre><code class="language-r">dbSendQuery(  conn = conn,  statement = paste(    "CREATE TABLE movies (",    "mov_id INT PRIMARY KEY IDENTITY(1, 1),",    "mov_name VARCHAR(100),",    "mov_avg_rating DECIMAL(3, 2),",    "mov_duration INT,",    "mov_category VARCHAR(32),",    "mov_num_watches INT",    ")"  ) )</code></pre> The <code>dbSendQuery()</code> function is used to run a SQL statement on the database. The statement itself is just one long string split into multiple lines with the R's <code>paste()</code> function. These are the results you'll see in the R console: <img class="size-full wp-image-20359" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d72885b6bd7053b8b502_8f60ac24_9-2.webp" alt="Image 9 - Create table statement result" width="1704" height="312" /> Image 9 - Create table statement result It looks like the table was created, but how can we confirm? That's easy - you can use the <code>dbListTables()</code> function to list all the tables in the database: <pre><code class="language-r">dbListTables(conn = conn)</code></pre> Here's the output: <img class="size-full wp-image-20361" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d7298e4ed268cf59860f_41de9a22_10-2.webp" alt="Image 10 - Listing all database tables" width="896" height="294" /> Image 10 - Listing all database tables You can see our <code>movies</code> table taking first place. You could (and should) create your tables in a new schema, but we've decided to keep everything in one place to make things simpler. The next step is to insert some data into our newly created table. <h2 id="insert-data">Inserting and Selecting Data into Azure SQL Server from R</h2> This section will show you how to insert multiple records into a SQL Server database one by one, and then how to get them all and convert them into a data frame object. <h3>How to Insert Data to Azure SQL Server from R</h3> The easiest way to insert records into the database is to use the <code>dbSendQuery()</code> function to run a SQL <code>INSERT</code> statement. We already have the <code>movies</code> table, so let's now populate it with 5 recent ones: <pre><code class="language-r">dbSendQuery(conn = conn, statement = "INSERT INTO movies (mov_name, mov_avg_rating, mov_duration, mov_category, mov_num_watches) VALUES ('Oppenheimer', 8.70, 180, 'Biography', 10257400)") dbSendQuery(conn = conn, statement = "INSERT INTO movies (mov_name, mov_avg_rating, mov_duration, mov_category, mov_num_watches) VALUES ('Mission: Impossible - Dead Reckoning Part One', 8.00, 163, 'Action', 1958890)") dbSendQuery(conn = conn, statement = "INSERT INTO movies (mov_name, mov_avg_rating, mov_duration, mov_category, mov_num_watches) VALUES ('Avatar: The Way of Water', 7.60, 192, 'Action', 575901)") dbSendQuery(conn = conn, statement = "INSERT INTO movies (mov_name, mov_avg_rating, mov_duration, mov_category, mov_num_watches) VALUES ('John Wick: Chapter 4', 7.80, 169, 'Action', 2979900)") dbSendQuery(conn = conn, statement = "INSERT INTO movies (mov_name, mov_avg_rating, mov_duration, mov_category, mov_num_watches) VALUES ('Barbie', 7.50, 114, 'Comedy', 1276950)") dbCommit(conn = conn)</code></pre> Once again, the <code>mov_num_watches</code> values are completely made up. The others were picked up from official sources. These records are now stored in the database, so let's see how to collect them with R. <h3>How to Select Data from Azure SQL Server to R</h3> In order to fetch rows from a SQL Server table in R, you'll have to combine the <code>dbSendQuery()</code> and <code>dbFetch()</code> functions. The first one is used to run a SQL <code>SELECT</code> statement, and the second one fetches the results and stores them into a variable. Here's the implementation example: <pre><code class="language-r">movies &lt;- dbSendQuery(conn = conn, statement = "SELECT * FROM movies") movies &lt;- dbFetch(movies) movies</code></pre> The resulting <code>movies</code> variable now holds a data frame object, containing all the records from the <code>movies</code> table: <img class="size-full wp-image-20363" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d7298e4ed268cf598680_af8479e7_11-2.webp" alt="Image 11 - Selecting all data from a table" width="2040" height="312" /> Image 11 - Selecting all data from a table We now have some data to work with. Let's aggregate it and explore additional storage options next. <h2 id="aggregate-data">Connecting R to Azure - Data Manipulation and Aggregation</h2> In this section, you'll learn how to aggregate previously fetched data to extract average ratings by movie category and watch time in hours. <h3>Average Rating by Category</h3> Every movie has a category attached to it. Truth be told, we have a scarce library of 5 movies, 3 of them being action ones. Nevertheless, the logic will translate to much larger tables. R's <code>dplyr</code> pacakge allows us to group the dataset by the movie category and calculate summary statistics - the number of movies in a category and the average rating across the category. We only care about the prior, and the former is here just to put some weight on the average rating. Here's the code: <pre><code class="language-r">library(dplyr) <br>avg_rating_by_category &lt;- movies %&gt;%  group_by(mov_category) %&gt;%  summarise(    count = n(),    avg_rating = mean(mov_avg_rating)  ) avg_rating_by_category</code></pre> This is what the resulting data frame looks like: <img class="size-full wp-image-20365" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d72ac77436744522f08d_04afee15_12-2.webp" alt="Image 12 - Calculating average movie rating by category" width="620" height="306" /> Image 12 - Calculating average movie rating by category One done, one to go. <h3>Watch Time</h3> Our dataset currently has columns describing movie duration in minutes and number of watches. If we multiply these two and divide the result by 60, we'll get the total watch time in hours. Here's a code snippet that implements this logic: <pre><code class="language-r">watch_time &lt;- movies %&gt;%  mutate(watch_time_hr = (mov_duration * mov_num_watches) / 60) %&gt;%  select(c("mov_name", "watch_time_hr")) watch_time</code></pre> The resulting data frame contains the movie name and the watch time in hours: <img class="size-full wp-image-20367" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d72b21d589fe295646f2_983922e0_13-2.webp" alt="Image 13 - Calculating watch time in hours" width="1128" height="308" /> Image 13 - Calculating watch time in hours We now have two new data frames that are ready to be pushed to the database. The following section will show you what's the optimal strategy for that. <h3>Save Aggregated R data.frame to Azure SQL Server</h3> In a from-scratch approach, you would first create dedicated SQL tables and then iterate over a data structure to insert rows. R packs a convenient function that will make your life easier. The <code>dbWriteTable()</code> function allows you to dump the data frame directly to a database table. You simply have to provide the table name (the table doesn't have to exist) and your data frame data structure: <pre><code class="language-r">dbWriteTable(conn = conn, name = "movies_rating_by_cat", value = avg_rating_by_category, row.names = FALSE) dbWriteTable(conn = conn, name = "movies_watch_time", value = watch_time, row.names = FALSE)</code></pre> The tables should now be created and populated. We can make sure of that by running the <code>dbListTables()</code> function once again: <pre><code class="language-r">dbListTables(conn = conn)</code></pre> Here are the results: <img class="size-full wp-image-20369" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d72bc6822ae9d7486e9a_177561cc_14-2.webp" alt="Image 14 - Listing all database tables" width="732" height="194" /> Image 14 - Listing all database tables And it looks like both tables were created! Let's verify if they actually contain our data next. <h2 id="visualize-data">Visualizing R Azure Data with ggplot2</h2> The aim of this section is to extract values from the two previously created tables and to visualize them with R's <code>ggplot2</code> package. <h3>Average Rating by Category</h3> Up first, let's take care of the average rating table. We can fetch the data in the same manner as before - by combining <code>dbSendQuery()</code> and <code>dbFetch()</code> functions: <pre><code class="language-r">library(ggplot2) <br>db_rating_by_cat &lt;- dbSendQuery(conn = conn, statement = "SELECT * FROM movies_rating_by_cat") db_rating_by_cat &lt;- dbFetch(db_rating_by_cat) db_rating_by_cat</code></pre> We get a familiar-looking data frame back: <img class="size-full wp-image-20371" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d72c6778869364ec05e5_50242cdd_15-2.webp" alt="Image 15 - Average rating by category fetched from the database" width="606" height="220" /> Image 15 - Average rating by category fetched from the database And now for the visualization, we'll use <code>ggplot2</code> to make a vertical bat chart with text labels for each bar. The individual bars represent movie categories, while the height of the bars represents the average movie rating for that category. To further complicate things, we'll also color the bars according to the number of movies in the category: <pre><code class="language-r">ggplot(db_rating_by_cat, aes(x = avg_rating, y = mov_category, fill = count)) +  geom_bar(stat = "identity", position = "dodge") +  geom_text(aes(label = avg_rating), vjust = -0.5, color = "black", size = 4) +  xlab("Average Rating") +  ylab("Movie Category") +  ggtitle("Average Rating by Category") +  theme_minimal() +  coord_flip()</code></pre> Here's the resulting visualization: <img class="size-full wp-image-20373" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d72d019bb567ab85fd52_4a8e4f2b_16-2.webp" alt="Image 16 - Average rating by category visualized" width="1858" height="1572" /> Image 16 - Average rating by category visualized The <i>Action</i> category is the only one with multiple movies, so it's the only relevant category to examine. Up next, let's make a similar chart for the watch time. <h3>Watch Time</h3> Before the plot, we'll have to extract the data from the database. You already know how to do this, but here's a code snippet if you get stuck: <pre><code class="language-r">db_watch_time &lt;- dbSendQuery(conn = conn, statement = "SELECT * FROM movies_watch_time") db_watch_time &lt;- dbFetch(db_watch_time) db_watch_time</code></pre> The resulting data frame has only two columns - movie name and number of hours watched: <img class="size-full wp-image-20375" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d72e87994b7930bbddfe_5fe704d0_17-2.webp" alt="Image 17 - Watch time fetched from the database" width="1124" height="274" /> Image 17 - Watch time fetched from the database As for the visualization, we'll make a horizontal bar chart in which the length of the bar corresponds to watch hours. The code is somewhat similar to the previous chart: <pre><code class="language-r">ggplot(db_watch_time, aes(x =mov_name, y = watch_time_hr)) +  geom_col(fill = "steelblue") +  xlab("Watch Time (in hours)") +  ylab("Movie Name") +  ggtitle("Watch Time of Movies") +  theme_minimal() +  coord_flip()</code></pre> This is what the chart looks like: <img class="size-full wp-image-20377" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b023a10b330f2efeac897c_18-2.webp" alt="Image 18 - Watch time visualized" width="1898" height="1488" /> Image 18 - Watch time visualized And that's it for today! You now know how to can connect to an Azure SQL Server database from R, extract data, and make amazing data visualizations. Do you need more assistance in understanding <code>ggplot2</code> code? We have plenty of articles for you, covering <a href="https://appsilon.com/ggplot2-bar-charts/" target="_blank" rel="noopener">bar charts</a>, <a href="https://appsilon.com/ggplot2-line-charts/" target="_blank" rel="noopener">line charts</a>, <a href="https://appsilon.com/ggplot-scatter-plots/" target="_blank" rel="noopener">scatter plots</a>, <a href="https://appsilon.com/ggplot2-boxplots/" target="_blank" rel="noopener">box plots</a>, and <a href="https://appsilon.com/ggplot2-histograms/" target="_blank" rel="noopener">histograms</a>. <hr /> <h2 id="summary">Connecting R to Azure - Summary</h2> Long story short, connecting R to Azure SQL Server database takes more work than, let's say, connecting to Postgres or MySQL. The connection string is somewhat more complex, and you need to install an additional driver. It's nothing you can't do in a couple of minutes, but definitely worth pointing out. This article demonstrated how to connect R to a SQL Server database, how to create a table, insert data, fetch data, aggregate data, and finally, how to visualize data straight from the database. It's more or less all you need to get started exploring further. <i>Which database engine do you use most often and what's your preferred way of connecting to it from R?</i> Let us know in the comment section below. <blockquote>Developing production-ready R Shiny apps is no small task. <a href="https://appsilon.com/performant-r-shiny-apps-with-database-indexing-normalization/" target="_blank" rel="noopener">Make sure your company is giving you everything you need to make this process smoother</a>.</blockquote>

Have questions or insights?

Engage with experts, share ideas and take your data journey to the next level!

Is Your Software GxP Compliant?

Download a checklist designed for clinical managers in data departments to make sure that software meets requirements for FDA and EMA submissions.
Explore Possibilities

Share Your Data Goals with Us

From advanced analytics to platform development and pharma consulting, we craft solutions tailored to your needs.

Talk to our Experts
r
database
tutorials