Introduction to SQL: 5 Key Concepts Every Data Professional Must Know

Estimated time:
time
min

<h2><span data-preserver-spaces="true">Introduction to SQL</span></h2> <span data-preserver-spaces="true">SQL has been around for decades, and is a go-to language for data analysis and lookups. With the rise of data-related programming languages such as R and Python, it's easy to use SQL only for a simple <code>SELECT *</code> statement and perform the filterings and aggregations later. While tempting, that's not the best solution.</span> <blockquote><span data-preserver-spaces="true">Are you a programmer who wants to learn R? </span><a class="editor-rtfLink" href="https://wordpress.appsilon.com/r-for-programmers/" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">Check out our complete R guide for programmers</span></a><span data-preserver-spaces="true">.</span></blockquote> <span data-preserver-spaces="true">Today you'll learn the basics of SQL through a ton of hands-on examples. You'll need to have a </span><a class="editor-rtfLink" href="https://www.postgresql.org/" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">PostgreSQL</span></a><span data-preserver-spaces="true"> database installed to follow along.</span> <span data-preserver-spaces="true">The article is structured as follows:</span> <ul><li><a href="#dataset">Dataset introduction</a></li><li><a href="#select">Select data</a></li><li><a href="#filter">Filter data</a></li><li><a href="#sort">Sort data</a></li><li><a href="#match">Match data</a></li><li><a href="#join">Join and group data</a></li><li><a href="#conclusion">Conclusion</a></li></ul> <h2 id="dataset"><span data-preserver-spaces="true">Dataset Introduction</span></h2> <span data-preserver-spaces="true">As briefly mentioned earlier, you'll need to have the PostgreSQL database installed. You'll also need the </span><a class="editor-rtfLink" href="https://www.postgresqltutorial.com/postgresql-sample-database/" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">Dvd Rental</span></a><span data-preserver-spaces="true"> dataset that you can load to your database via the </span><em><span data-preserver-spaces="true">restore</span></em><span data-preserver-spaces="true"> functionality.</span> <span data-preserver-spaces="true">The ER diagram of the Dvd rental dataset is shown below:</span> <img class="size-full wp-image-6518" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d57d303d514718a0c89c_29614cef_1-4.webp" alt="Image 1 - DVD rental database diagram (source: https://www.postgresqltutorial.com/postgresql-sample-database/)" width="730" height="920" /> Image 1 - DVD rental database diagram (source: https://www.postgresqltutorial.com/postgresql-sample-database/) <span data-preserver-spaces="true">You'll use customer and payment tables throughout the article, but feel free to explore the others on your own.</span> <h2 id="select"><span data-preserver-spaces="true">Select Data</span></h2> <span data-preserver-spaces="true">The most basic operation you'll do with SQL is selecting the data. It's done with the <code>SELECT</code> keyword (not case-sensitive). If you want to grab all columns from a particular table, you can use the <code>SELECT * FROM &lt;table_name&gt;</code> syntax. Likewise, if you want only specific columns, you can replace the star sign with column names.</span> <span data-preserver-spaces="true">Let's take a look at a couple of examples to grasp a full picture.</span> <span data-preserver-spaces="true">Here's how you can grab all data from the </span><em><span data-preserver-spaces="true">customer</span></em><span data-preserver-spaces="true"> table:</span> <script src="https://gist.github.com/darioappsilon/e0fe136a4ba37b06dfa27e1984b913dc.js"></script> <span data-preserver-spaces="true">The results are shown in the image below:</span> <img class="size-full wp-image-6519" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d57e2195e903584eb801_8ca9ce7c_2-5.webp" alt="Image 2 - Data from the customer table" width="2712" height="604" /> Image 2 - Data from the customer table <span data-preserver-spaces="true">But what if you only want the data on customer ID, first, and last name? Here's what you can do:</span> <script src="https://gist.github.com/darioappsilon/c400ccb749902f2bf3cacccddcf682fc.js"></script> <span data-preserver-spaces="true">And here are the results:</span> <img class="size-full wp-image-6520" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d57f215de09d850dc308_13506224_3-5.webp" alt="Image 3 - Data from the customer table - custom columns" width="986" height="604" /> Image 3 - Data from the customer table - custom columns <span data-preserver-spaces="true">It's always a good practice to specify column names instead of using the star syntax. The table structure might change in the future, so you might end up with more columns than anticipated. Even if that's not the case, what's the point of selecting columns you won't work with?</span> <h2 id="filter"><span data-preserver-spaces="true">Filter Data</span></h2> <span data-preserver-spaces="true">It's highly likely you won't need all records from a table. That's where filtering comes into play. You can filter the result set with the <code>WHERE</code> keyword. Any condition that has to be met goes after it.</span> <span data-preserver-spaces="true">Here's how you can grab only inactive customers:</span> <script src="https://gist.github.com/darioappsilon/dca19ed432016fa50b222d4953596254.js"></script> <span data-preserver-spaces="true">The results are shown below:</span> <img class="size-full wp-image-6521" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d5807a511f178f8ccfa6_c5457972_4-4.webp" alt="Image 4 - Inactive customers from the customer table" width="2694" height="604" /> Image 4 - Inactive customers from the customer table <span data-preserver-spaces="true">But what if you want to filter by multiple conditions? You can't use the <code>WHERE</code> keyword again. Instead, you can list conditions separated by the <code>AND</code> keyword.</span> <span data-preserver-spaces="true">Here's how to select all inactive customers from the first store (store_id is 1):</span> <script src="https://gist.github.com/darioappsilon/74eff39131d24bdf3a4aecabf2c0cb2a.js"></script> <span data-preserver-spaces="true">Here are the results:</span> <img class="size-full wp-image-6522" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d5800ca3d363ce96781b_d8cbc627_5-4.webp" alt="Image 5 - Inactive customers from the first store" width="2680" height="480" /> Image 5 - Inactive customers from the first store <span data-preserver-spaces="true">You can put as many filter conditions after this one; just make sure to separate them with the <code>AND</code> keyword.</span> <h2 id="sort"><span data-preserver-spaces="true">Sort Data</span></h2> <span data-preserver-spaces="true">Sorting is an essential part of every analysis. Maybe you want to sort users by their registration date, products by the expiration date, or movies by rating - the <code>ORDER BY</code> keyword has you covered.</span> <span data-preserver-spaces="true">Let's see how you can sort the customers by their respective ID:</span> <script src="https://gist.github.com/darioappsilon/7765b0e3512d2e3ab5e6d571793b4526.js"></script> <span data-preserver-spaces="true">The results are shown below:</span> <img class="size-full wp-image-6523" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d581c039958680caa76c_cff1f249_6-3.webp" alt="Image 6 - Customers sorted by the customer ID " width="2710" height="604" /> Image 6 - Customers sorted by the customer ID <span data-preserver-spaces="true">As you can see, sorting works in ascending order by default. Sometimes you want items ordered from highest to lowest (descending), so you'll need one additional keyword - <code>DESC</code>.</span> <span data-preserver-spaces="true">Here's how to do the same sorting but in descending order:</span> <script src="https://gist.github.com/darioappsilon/5dba76c34bf27c5d02b120e77783cc5e.js"></script> <span data-preserver-spaces="true">The results are shown in the image below:</span> <img class="size-full wp-image-6524" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d58288e64fe97f78b10a_9b1c400a_7-3.webp" alt="Image 7 - Customers sorted by customer ID in the descending order" width="2710" height="604" /> Image 7 - Customers sorted by customer ID in the descending order <span data-preserver-spaces="true">And that's all there is to data sorting.</span> <h2 id="match"><span data-preserver-spaces="true">Match Data</span></h2> <span data-preserver-spaces="true">Sometimes you don't know what exactly are you looking for, but you have a rough idea. For example, maybe you know that the name of the customer of interest starts with some letter (or a sequence of letters), but you're not quite sure.</span> <span data-preserver-spaces="true">That's where matching comes in. In SQL, matching is implemented with the <code>LIKE</code> keywords. There are multiple ways to do matching, but we'll cover only the basics.</span> <span data-preserver-spaces="true">For example, let's say you want to see only these customers whose first name starts with "An":</span> <script src="https://gist.github.com/darioappsilon/63c68c6304247ca57920add0a39e39ae.js"></script> <span data-preserver-spaces="true">The results are shown below:</span> <img class="size-full wp-image-6525" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d5836778869364eadd3e_524dc956_8-3.webp" alt="Image 8 - Customers whose first name starts with &quot;An&quot;" width="2680" height="528" /> Image 8 - Customers whose first name starts with "An" <span data-preserver-spaces="true">You can do matching on different parts of the variable. For example, let's say you want only those customers whose first name ends with "ne":</span> <script src="https://gist.github.com/darioappsilon/487342c656a17a0461eede59b7dbc6c0.js"></script> <span data-preserver-spaces="true">Here are the results:</span> <img class="size-full wp-image-6526" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d58314d9d20d749316ee_fb6c2d24_9-2.webp" alt="Image 9 - Customers whose first name ends with &quot;ne&quot;" width="2710" height="604" /> Image 9 - Customers whose first name ends with "ne" <span data-preserver-spaces="true">There are more advanced matching operations, like specifying the number of characters before and after, but that's beyond the scope for today.</span> <h2 id="join"><span data-preserver-spaces="true">Join and Group Data</span></h2> <span data-preserver-spaces="true">It's improbable that all of the data you need is stored in a single table. More often than not, you'll have to use joins to combine results from two or more tables. Luckily, that's easy to do with SQL.</span> <span data-preserver-spaces="true">There are many types of joins:</span> <ul><li><span data-preserver-spaces="true"><code>INNER JOIN</code> - returns the rows that have matching values in both tables</span></li><li><span data-preserver-spaces="true"><code>LEFT JOIN</code> - returns all rows from the left table and only the matched rows from the right table</span></li><li><span data-preserver-spaces="true"><code>RIGHT JOIN</code> - returns all rows from the right table and only the matched rows from the left table</span></li><li><span data-preserver-spaces="true"><code>FULL JOIN</code> - returns all rows when there's a match in either table</span></li></ul> <span data-preserver-spaces="true">Here's how you can use joins to combine </span><em><span data-preserver-spaces="true">customer</span></em><span data-preserver-spaces="true"> and </span><em><span data-preserver-spaces="true">payment</span></em><span data-preserver-spaces="true"> tables and extract the amount paid per transaction:</span> <script src="https://gist.github.com/darioappsilon/e4c50f2737b96bdbbc6eec548b44437f.js"></script> <span data-preserver-spaces="true">The results are shown below:</span> <img class="size-full wp-image-6527" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d584d48dce80f00e277d_bef75ea0_10-2.webp" alt="Image 10 - Amount paid per transaction by a customer" width="1040" height="610" /> Image 10 - Amount paid per transaction by a customer <span data-preserver-spaces="true">As you can see, there are multiple records for every customer. That's because a single record represents a single transaction, and a single customer can make multiple transactions.</span> <span data-preserver-spaces="true">If you want to find the sum of the amounts for every customer, you'll have to use the <code>GROUP BY</code> keyword and an aggregation function. Let's take a look at an example:</span> <script src="https://gist.github.com/darioappsilon/63ddbe0211074b844e2d0e74eb916ff4.js"></script> <span data-preserver-spaces="true">The results are shown below:</span> <img class="size-full wp-image-6528" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d584ee674add8ab7f41e_2234fcbd_11-2.webp" alt="Image 11 - Total amount per customer" width="946" height="610" /> Image 11 - Total amount per customer <span data-preserver-spaces="true">So what happened here? Put simply, you've made distinct groups from every first and last name (and assumed every customer has a unique name) and calculated the sum per group.</span> <span data-preserver-spaces="true">A couple of things can be improved. For example, right now we're returning all records completely unsorted. The following code snippet orders the rows by the total sum (descending) and keeps only the first five rows:</span> <script src="https://gist.github.com/darioappsilon/0030dfcfc34f3c3d68213ac1356e075a.js"></script> <span data-preserver-spaces="true">Here are the results:</span> <img class="size-full wp-image-6529" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7d585d48dce80f00e2997_1639a829_12-2.webp" alt="Image 12 - Top 5 customers by the amount spent" width="914" height="326" /> Image 12 - Top 5 customers by the amount spent <span data-preserver-spaces="true">You can still improve this result set. For example, let's say you want to combine the first and last name to a single column named </span><em><span data-preserver-spaces="true">customer_name</span></em><span data-preserver-spaces="true">, and you also want to rename the aggregated column to </span><em><span data-preserver-spaces="true">total</span></em><span data-preserver-spaces="true">:</span> <script src="https://gist.github.com/darioappsilon/a53d3cb5e3d1864157c4cff87494806f.js"></script> <span data-preserver-spaces="true">The results are shown below:</span> <img class="size-full wp-image-6530" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b395b4f32c7f090d661ef6_13-1.webp" alt="Image 13 - Top 5 customers by the amount spent (v2)" width="502" height="326" /> Image 13 - Top 5 customers by the amount spent (v2) <h2 id="conclusion"><span data-preserver-spaces="true">Conclusion</span></h2> <span data-preserver-spaces="true">Today you've learned the basics of SQL in the PostgreSQL environment. SQL is a broad topic, and you can (and should) always learn more. More intermediate and advanced-level guides are coming soon, so stay tuned to the Appsilon blog.</span> <span data-preserver-spaces="true">To summarize - do as much data filtering/aggregating in the database. It's a much faster approach than dragging the entire dataset(s) to the memory and performing the filtering there.</span> <h2><span data-preserver-spaces="true">Learn More</span></h2><ul><li><a class="editor-rtfLink" href="https://wordpress.appsilon.com/7-data-science-skills/" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">7 Must-Have Skills to Get a Job as a Data Scientist</span></a></li><li><a class="editor-rtfLink" href="https://wordpress.appsilon.com/r-linear-regression/" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">Machine Learning with R: A Complete Guide to Linear Regression</span></a></li><li><a class="editor-rtfLink" href="https://wordpress.appsilon.com/r-logistic-regression/" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">Machine Learning with R: A Complete Guide to Logistic Regression</span></a></li><li><a class="editor-rtfLink" href="https://wordpress.appsilon.com/r-rest-api/" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">How to Make REST APIs with R: A Beginners Guide to Plumber</span></a></li><li><a class="editor-rtfLink" href="https://wordpress.appsilon.com/r-dplyr-tutorial/" target="_blank" rel="noopener noreferrer"><span data-preserver-spaces="true">How to Analyze Data with R: A Complete Beginner Guide to dplyr</span></a></li></ul> <a href="https://appsilon.com/careers/" target="_blank" rel="noopener noreferrer"><img class="size-large wp-image-5940 aligncenter" src="https://wordpress.appsilon.com/wp-content/uploads/2020/11/appsilon.hiring0-1024x576.jpg" alt="" width="1024" height="576" /></a> <p style="text-align: center;"><strong><span data-preserver-spaces="true">Appsilon is hiring for remote roles! See our </span></strong><a class="editor-rtfLink" href="https://wordpress.appsilon.com/careers/" target="_blank" rel="noopener noreferrer"><strong><span data-preserver-spaces="true">Careers</span></strong></a><strong><span data-preserver-spaces="true"> page for all open positions, including </span></strong><a class="editor-rtfLink" href="https://wordpress.appsilon.com/careers/#r-shiny-developer" target="_blank" rel="noopener noreferrer"><strong><span data-preserver-spaces="true">R Shiny Developers</span></strong></a><strong><span data-preserver-spaces="true">, </span></strong><a class="editor-rtfLink" href="https://wordpress.appsilon.com/careers/#fullstack-software-engineer-tech-lead" target="_blank" rel="noopener noreferrer"><strong><span data-preserver-spaces="true">Fullstack Engineers</span></strong></a><strong><span data-preserver-spaces="true">, </span></strong><a class="editor-rtfLink" href="https://wordpress.appsilon.com/careers/#frontend-engineer" target="_blank" rel="noopener noreferrer"><strong><span data-preserver-spaces="true">Frontend Engineers</span></strong></a><strong><span data-preserver-spaces="true">, a </span></strong><a class="editor-rtfLink" href="https://wordpress.appsilon.com/careers/#senior-infrastructure-engineer" target="_blank" rel="noopener noreferrer"><strong><span data-preserver-spaces="true">Senior Infrastructure Engineer</span></strong></a><strong><span data-preserver-spaces="true">, and a </span></strong><a class="editor-rtfLink" href="https://wordpress.appsilon.com/careers/#community-manager" target="_blank" rel="noopener noreferrer"><strong><span data-preserver-spaces="true">Community Manager</span></strong></a><strong><span data-preserver-spaces="true">. Join Appsilon and work on groundbreaking projects with the world's most influential Fortune 500 companies.</span></strong></p>

Contact us!
Damian's Avatar
Damian Rodziewicz
Head of Sales
Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.
tutorials
data analytics
sql