Top 10 Machine Learning Evaluation Metrics for Regression - Implemented in R

By:
Dario Radečić
September 12, 2023

It's one thing to train a machine learning model, but how can you know it's any good? That's where evaluation metrics come into play. Today, we bring you the top 10 machine learning evaluation metrics for regression, implemented from scratch in R. We'll go over the theory, mathematical formulas, and R code first, and then we'll train a machine learning model and use these metrics to evaluate its performance. Make sure to stay tuned to the <a href="https://appsilon.com/blog/" target="_blank" rel="noopener">Appsilon blog</a>, as we'll soon release a similar article regarding classification datasets.
All of the metrics you'll see can be calculated by calling some R function or referring to a specialized package, but we'll implement everything from scratch. This way, you'll get a much better understanding of the topic. So without further ado, let's dive into the good stuff.
<blockquote>Practical Machine Learning Use-Case - <a href="https://appsilon.com/yolo-counting-nests-antarctic-birds/" target="_blank" rel="noopener">Counting Nests of Shags with YOLO</a>.</blockquote>
Table of contents:
<ul><li><a href="#metrics">Machine Learning Evaluation Metrics for Regression - From Theory to Implementation</a></li><li><a href="#use-case">Using Regression Evaluation Metrics on a Real Dataset - A Machine Learning Use-Case</a></li><li><a href="#summary">Summing Up Machine Learning Evaluation Metrics for Regression</a></li></ul>
<hr />
<h2 id="metrics">Machine Learning Evaluation Metrics for Regression - From Theory to Implementation</h2>
This portion of the article will walk you through the top 10 machine learning evaluation metrics for regression, from background and theory to mathematical formulas and R code implementation. We'll start with the most obvious one - <i>Mean Absolute Error</i>.
<h3>1. Mean Absolute Error (MAE)</h3>
The <i>Mean Absolute Error</i> metric measures the average absolute difference between predicted and actual values.
You can calculate it by taking the absolute value of the difference between the predicted and actual values, and then taking the average across all samples. This metric is popular among data scientists and machine learning engineers because it's easy to interpret - it tells you how far off your regression model is on average, in the units of the target variable. Take a look at the mathematical formula:
<img class="size-full wp-image-18505" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7aec097ee4cc141789528_b7415b22_1-1.webp" alt="Image 1 - Mean Absolute Error formula" width="478" height="134" />
Image 1 - Mean Absolute Error formula
It's not that difficult to replicate in the R programming language, but if you get stuck, just copy our code snippet:
<pre><code class="language-r">MAE &lt;- function(y, y_pred) {
  mean(abs(y - y_pred))
}</code></pre>
<h3>2. Mean Squared Error (MSE)</h3>
Mean Squared Error is almost identical to Mean Absolute Error, but with one critical difference - it measures the squared difference between actual and predicted values instead of the absolute difference. Squaring penalizes larger errors more heavily than smaller errors, which is frequently the behavior you want when training machine learning models. For this reason, <b>MSE is one of the most common loss functions for regression models</b>. Keep in mind that MSE returns the error in units squared, so it's a bit trickier to interpret. For example, if you're predicting housing prices in thousands of dollars, MSE will return an error in the unit of thousands of dollars squared. Taking the square root of the result is the natural next step if you care about interpretability, and that's where the next metric comes in.
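To see the effect of squaring in numbers, here's a minimal sketch that compares the two metrics on made-up values. The vectors are hypothetical and chosen only for illustration, and the lowercase helpers `mae()` and `mse()` are defined locally so the snippet runs on its own:

```r
# Hypothetical actual values and two sets of predictions (illustration only)
y      <- c(10, 12, 14, 16, 18)
y_even <- c(11, 11, 15, 15, 19)  # every prediction is off by 1
y_miss <- c(10, 12, 14, 16, 23)  # four exact predictions, one off by 5

# Local helpers that follow the definitions of MAE and MSE
mae <- function(y, y_pred) mean(abs(y - y_pred))
mse <- function(y, y_pred) mean((y - y_pred)^2)

mae(y, y_even)  # 1
mae(y, y_miss)  # 1
mse(y, y_even)  # 1
mse(y, y_miss)  # 5
```

Both sets of predictions have the same MAE, but the one with a single large miss has an MSE five times higher.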
Take a look at the mathematical formula:
<img class="size-full wp-image-18507" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7aec008d47aa0b02f2b12_5df1e7a7_2-1.webp" alt="Image 2 - Mean Squared Error formula" width="498" height="134" />
Image 2 - Mean Squared Error formula
If you want an R implementation, here's one approach that works:
<pre><code class="language-r">MSE &lt;- function(y, y_pred) {
  mean((y - y_pred)^2)
}</code></pre>
<h3>3. Root Mean Squared Error (RMSE)</h3>
The <i>Root Mean Squared Error</i> metric does everything MSE does, but it makes the error more interpretable by bringing the value back to the original units. If you're predicting age in years, it makes no sense to interpret the error in years squared, as it's just confusing. RMSE still penalizes larger errors like MSE but takes the additional step of bringing the error values back to the original unit. Take a look at the mathematical formula:
<img class="size-full wp-image-18509" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7aec1b54b1c0441e91687_4e325a2b_3-1.webp" alt="Image 3 - Root Mean Squared Error formula" width="592" height="180" />
Image 3 - Root Mean Squared Error formula
R implementation is also straightforward:
<pre><code class="language-r">RMSE &lt;- function(y, y_pred) {
  sqrt(mean((y - y_pred)^2))
}</code></pre>
<h3>4. Root Mean Squared Log Error (RMSLE)</h3>
The <i>Root Mean Squared Log Error</i> metric is similar to RMSE, but it takes the logarithm of the predicted and actual values before calculating the squared difference. It is often used when the target variable spans a wide range of values. The RMSLE function first takes the natural logarithm of the predicted and actual values (after adding 1 to each, so that zeros can be handled) and then calculates the squared differences between them. Once done, it takes the mean and the square root, just like RMSE.
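One practical consequence of working on the log scale is that RMSLE reacts to relative rather than absolute differences. Here's a minimal sketch with made-up numbers; the lowercase helpers `rmse()` and `rmsle()` are local to this snippet, and `rmsle()` assumes the common variant that adds 1 before taking the logarithm (via base R's `log1p()`):

```r
# Local helpers; rmsle() uses log1p(x), which is log(1 + x)
rmse  <- function(y, y_pred) sqrt(mean((y - y_pred)^2))
rmsle <- function(y, y_pred) sqrt(mean((log1p(y_pred) - log1p(y))^2))

# Hypothetical cases: the same absolute error of 10 on a small and a large target
rmse(100, 110)     # 10
rmse(1000, 1010)   # 10
rmsle(100, 110)    # about 0.094
rmsle(1000, 1010)  # about 0.0099
```

RMSE treats both cases identically, while RMSLE considers the error on the larger target to be much less severe.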
Take a look at the mathematical formula:
<img class="size-full wp-image-18511" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7aec1d92a154296f79f31_bfe1f374_4-1.webp" alt="Image 4 - Root Mean Squared Log Error formula" width="998" height="180" />
Image 4 - Root Mean Squared Log Error formula
R implementation is easy, just be careful with the brackets:
<pre><code class="language-r">RMSLE &lt;- function(y, y_pred) {
  sqrt(mean((log(y_pred + 1) - log(y + 1))^2))
}</code></pre>
<h3>5. R-squared (R2)</h3>
The <i>R-squared</i> metric, sometimes also called the <i>Coefficient of Determination</i>, measures how much of the variance in the target variable is explained by the machine learning model. Values typically range from 0 to 1, larger being better, and 1 meaning the model fits the data perfectly. On unseen data the value can also drop below 0, which means the model performs worse than always predicting the mean. R-squared is calculated as 1 minus the ratio between the sum of squared residuals and the total sum of squares of the actual values around their mean. Take a look at the mathematical formula:
<img class="size-full wp-image-18513" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7aec27ee95b6c9e2b5646_1b9811f9_5-1.webp" alt="Image 5 - R-squared formula" width="584" height="121" />
Image 5 - R-squared formula
If you prefer R over math, here's the code snippet for you:
<pre><code class="language-r">R2 &lt;- function(y, y_pred) {
  1 - sum((y - y_pred)^2) / sum((y - mean(y))^2)
}</code></pre>
<h3>6. Mean Absolute Percentage Error (MAPE)</h3>
The <i>Mean Absolute Percentage Error</i> measures the average percentage difference between predicted and actual values. You can calculate MAPE by taking the absolute value of the difference between the predicted and actual values, then dividing these by the actual values, and finally taking the average across all samples. It's quite a process, but you'll see that the implementation is mostly straightforward.
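The three steps can be traced by hand on a tiny example. This is a minimal sketch with hypothetical vectors, and it also shows what happens when an actual value is zero:

```r
# Hypothetical actual and predicted values (illustration only)
y      <- c(100, 200, 400)
y_pred <- c(110, 180, 400)

abs_err <- abs(y - y_pred)   # step 1: 10, 20, 0
pct_err <- abs_err / abs(y)  # step 2: 0.10, 0.10, 0.00
mape    <- mean(pct_err) * 100  # step 3: about 6.67 (percent)
mape

# A single actual value of zero makes the result infinite
y_zero    <- c(0, 10)
pred_zero <- c(1, 10)
mape_zero <- mean(abs(y_zero - pred_zero) / abs(y_zero)) * 100
mape_zero  # Inf
```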
The MAPE metric is often used when errors need to be reported in relative terms, and it requires that the target variable never takes a value of zero (the formula divides by the actual values), so keep that in mind. Take a look at the mathematical formula:
<img class="size-full wp-image-18515" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7aec212c4dca380d43c59_5233aeb8_6-1.webp" alt="Image 6 - Mean Absolute Percentage Error formula" width="708" height="134" />
Image 6 - Mean Absolute Percentage Error formula
Implementation in R is close to effortless, especially if you consider all the steps described in the first paragraph:
<pre><code class="language-r">MAPE &lt;- function(y, y_pred) {
  mean(abs((y - y_pred) / y)) * 100
}</code></pre>
<h3>7. Symmetric Mean Absolute Percentage Error (SMAPE)</h3>
The <i>Symmetric Mean Absolute Percentage Error</i> is quite similar to MAPE, but it can handle actual values of zero, as long as the matching prediction isn't zero as well. You can calculate it by first taking the absolute value of the difference between the predicted and actual values, then dividing it by the sum of the absolute predicted and actual values, and finally multiplying it by 2 and taking the average across all samples. Once again, it's quite a few steps, but the metric is useful for all cases in which you would use MAPE, and also when the target variable can be zero. Take a look at the mathematical formula:
<img class="size-full wp-image-18517" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7aec364509da69a82cdc1_7f9d378c_7-1.webp" alt="Image 7 - Symmetric Mean Absolute Percentage Error formula" width="758" height="134" />
Image 7 - Symmetric Mean Absolute Percentage Error formula
From-scratch implementation in R is quite straightforward if you understand the formula:
<pre><code class="language-r">SMAPE &lt;- function(y, y_pred) {
  mean(2 * abs(y - y_pred) / (abs(y) + abs(y_pred))) * 100
}</code></pre>
<h3>8.
Mean Directional Accuracy (MDA)</h3>
The <i>Mean Directional Accuracy</i> metric is quite a unique one, as it measures the percentage of correct predictions in terms of the direction of change (up or down). You can calculate it by counting the number of times the predicted change from the previous actual value has the same sign as the actual change, divided by the number of comparisons. Because it relies on the order of the observations, the metric is meaningful mainly for sequential data such as time series. Easy enough, but the formula might look scary at first:
<img class="size-full wp-image-18519" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7aec4bb4e9308f9a44be4_5a0a98b9_8-1.webp" alt="Image 8 - Mean Directional Accuracy formula" width="1077" height="134" />
Image 8 - Mean Directional Accuracy formula
R implementation also isn't the easiest to read, especially if you want to condense the code into a single line:
<pre><code class="language-r">MDA &lt;- function(y, y_pred) {
  mean(sign(y[2:length(y)] - y[1:(length(y) - 1)]) == sign(y_pred[2:length(y_pred)] - y[1:(length(y) - 1)]))
}</code></pre>
<h3>9. Median Absolute Error (MedAE)</h3>
Let's now take a break from these complicated equations and cover a simple and intuitive metric - <i>Median Absolute Error</i>. It's almost identical to MAE, but it calculates the median instead of the mean. For this reason, MedAE is less sensitive to outliers. Take a look at the formula:
<img class="size-full wp-image-18521" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7aec5ef0ba20db044e1e4_0a286688_9-1.webp" alt="Image 9 - Median Absolute Error formula" width="1076" height="50" />
Image 9 - Median Absolute Error formula
Implementing MedAE in R requires almost no effort:
<pre><code class="language-r">MedAE &lt;- function(y, y_pred) {
  median(abs(y - y_pred))
}</code></pre>
<h3>10. Explained Variance Score (EVS)</h3>
And finally, let's discuss <i>Explained Variance Score</i>. This metric measures the proportion of variance in the target variable that is explained by the model.
You can calculate EVS by taking the difference between the variance of the actual values and the variance of the residuals, and then dividing by the variance of the actual values. The values for this metric typically range from 0 to 1, where 1 indicates a perfect fit to the data; a poorly fitting model can also score below 0. Take a look at the formula:
<img class="size-full wp-image-18523" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7aec697ee4cc141789a84_ef08d539_10-1.webp" alt="Image 10 - Explained Variance Score formula" width="504" height="117" />
Image 10 - Explained Variance Score formula
R implementation is easy, as there's already a <code>var()</code> function built-in:
<pre><code class="language-r">EVS &lt;- function(y, y_pred) {
  1 - var(y - y_pred) / var(y)
}</code></pre>
<h2 id="use-case">Using Regression Evaluation Metrics on a Real Dataset - A Machine Learning Use-Case</h2>
By now, you should be familiar with the machine learning evaluation metrics for regression we've described in the previous section. You don't have to be an expert, of course, just be aware of what each metric stands for and what its possible value range is. In this section, we'll shift focus to training a machine learning model from scratch and using the previously implemented regression metrics to evaluate the model. This is not a comprehensive section on machine learning. Our goal is to train the model as fast as possible so we can focus on evaluation using the regression metrics. For more in-depth guides on machine learning, check out <a href="https://appsilon.com/tag/machine-learning/" target="_blank" rel="noopener">our recent articles</a>.
<h3>Dataset Loading and Preparation</h3>
We'll keep things simple and use the well-known <a href="https://www.cs.toronto.edu/~delve/data/boston/bostonDetail.html" target="_blank" rel="noopener">Boston Housing Dataset</a>. You don't have to download it, as it's built into R's <code>MASS</code> package. You might have to install the package, though.
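If a package is missing, a one-time install along these lines should do. This is only a sketch: it assumes a CRAN mirror is reachable from your machine, and the `pkgs` and `to_install` names are just illustrative:

```r
# Packages used in the rest of this section
pkgs <- c("MASS", "caret")

# Keep only the ones that are not installed yet
installed  <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
to_install <- pkgs[!installed]

# Install whatever is missing (does nothing if both are already available)
if (length(to_install) > 0) install.packages(to_install)
```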
First, let's import the dataset and see what it looks like:
<pre><code class="language-r">library(MASS)
library(caret)

data(Boston)
df &lt;- Boston
head(df)</code></pre>
Here are the first six rows, returned by the <code>head()</code> function:
<img class="size-full wp-image-18525" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7aec6849aa4fb70c8ffee_ebd7b608_11-1.webp" alt="Image 11 - Head of the Boston Housing Prices dataset" width="1772" height="408" />
Image 11 - Head of the Boston Housing Prices dataset
For simplicity's sake, we'll remove all rows that have any missing values. This shouldn't be a problem, since the dataset is tailored toward machine learning newcomers:
<pre><code class="language-r"># Remove all rows with missing values
df &lt;- df[complete.cases(df), ]</code></pre>
<h3>Train/Test Split</h3>
The goal of splitting a dataset into training and testing subsets is to set aside a portion of the data that the predictive model never sees during training. This way, we can monitor how the model behaves on previously unseen data. A common choice is to have 80% of the data in the training set, and 20% in the testing set. Here's how to implement that in R:
<pre><code class="language-r">set.seed(123)
trainIndex &lt;- createDataPartition(df$medv, p = 0.8, list = FALSE)
train &lt;- df[trainIndex, ]
test &lt;- df[-trainIndex, ]

dim(train)
dim(test)</code></pre>
The <code>dim()</code> function prints the dimensionality of a <code>data.frame</code> (number of rows, number of columns):
<img class="size-full wp-image-18527" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7aec8f5697e4ce35e499d_0c908cc1_12-1.webp" alt="Image 12 - Dimensionality of training and testing sets" width="278" height="206" />
Image 12 - Dimensionality of training and testing sets
<h3>Dataset Standardization</h3>
Finally, let's talk about standardization.
It's a step you shouldn't skip because many machine learning algorithms expect to have data on the same scale. For example, if one feature ranges from 1 to 10, and the other from 1000 to 10000, a machine learning model might think the second feature is more important. It isn't necessarily, and standardization is here to tell the model that. We'll use the <code>caret</code> R package to center and scale the predictors in both sets, using the means and standard deviations learned from the training set only, as implemented via the following snippet:
<pre><code class="language-r"># Standardize the predictors (medv, the target, is the last column and is left as-is)
preProcValues &lt;- preProcess(train[, -ncol(train)], method = c("center", "scale"))
train[, -ncol(train)] &lt;- predict(preProcValues, train[, -ncol(train)])
test[, -ncol(test)] &lt;- predict(preProcValues, test[, -ncol(test)])</code></pre>
The last step is to ensure the target variable is numeric in both sets. You can do so by applying the <code>as.numeric()</code> function on the column:
<pre><code class="language-r"># Ensure the target variable is numeric
train$medv &lt;- as.numeric(train$medv)
test$medv &lt;- as.numeric(test$medv)</code></pre>
Let's check the structure of the training set to verify everything looks correct:
<pre><code class="language-r">str(train)</code></pre>
<img class="size-full wp-image-18529" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b7aec910baf85d764dc397_16835d36_13-1.webp" alt="Image 13 - Structure of the training set" width="1490" height="830" />
Image 13 - Structure of the training set
All variables are numeric, and all predictors (not the target variable) are on the same scale. Let's continue by training a machine learning model.
<h3>Training a Machine Learning Model</h3>
R makes it incredibly easy to train a regression model. All we have to do is call the <code>train()</code> function from <code>caret</code> on our training dataset, and specify we want to predict <code>medv</code> based on all of the available features.
The following code snippet trains a linear regression model (<code>method = "lm"</code>) and prints its summary:
<pre><code class="language-r">model &lt;- train(medv ~ ., data = train, method = "lm")
summary(model)</code></pre>
<img class="size-full wp-image-18531" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b29f68946999252296b231_14-1.webp" alt="Image 14 - Machine learning model summary" width="1374" height="1618" />
Image 14 - Machine learning model summary
If you see a star next to the feature name, it means the feature's coefficient is statistically significant. <b>How significant?</b> The number of stars determines that. The more stars the feature has, the lower the P-value, and hence the stronger the evidence that the feature has a non-zero effect on the target. Keep in mind that a low P-value doesn't measure the size of the effect - that's what the coefficient estimate is for. We'll now use this model to make predictions on the test set:
<pre><code class="language-r">predictions &lt;- predict(model, newdata = test)
predictions</code></pre>
<img class="size-full wp-image-18533" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b29f8933cc61f272760a40_15-1.webp" alt="Image 15 - Test set predictions" width="3060" height="874" />
Image 15 - Test set predictions
These are the values we'll use to evaluate the model with our choice of 10 regression metrics.
<h3>Machine Learning Evaluation Metrics for Regression in Action</h3>
This is the part you've been waiting for. We'll now finally use our 10 R functions to calculate the values of regression metrics. One approach is to make a <code>data.frame</code> of regression metrics, just so we don't have to print each one manually.
Here's the code snippet:
<pre><code class="language-r">dfEvalMetrics &lt;- data.frame(
  "MAE"   = MAE(y = test$medv, y_pred = predictions),
  "MSE"   = MSE(y = test$medv, y_pred = predictions),
  "RMSE"  = RMSE(y = test$medv, y_pred = predictions),
  "RMSLE" = RMSLE(y = test$medv, y_pred = predictions),
  "R2"    = R2(y = test$medv, y_pred = predictions),
  "MAPE"  = MAPE(y = test$medv, y_pred = predictions),
  "SMAPE" = SMAPE(y = test$medv, y_pred = predictions),
  "MDA"   = MDA(y = test$medv, y_pred = predictions),
  "MedAE" = MedAE(y = test$medv, y_pred = predictions),
  "EVS"   = EVS(y = test$medv, y_pred = predictions)
)
t(dfEvalMetrics)</code></pre>
And here are the results:
<img class="size-full wp-image-18535" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b29f8a04183d06cce5b07f_16-1.webp" alt="Image 16 - Values of the regression evaluation metrics" width="410" height="616" />
Image 16 - Values of the regression evaluation metrics
The RMSE is roughly 4.6, so a typical prediction misses the actual value by about 4.6 units of <code>medv</code> (thousands of dollars), with larger misses weighted more heavily. In percentage terms, predictions are off by roughly 17% on average (MAPE). The regression model explains approximately 75.7% of the variance in the data.
<hr />
<h2 id="summary">Summing Up Machine Learning Evaluation Metrics for Regression</h2>
And there you have it - a comprehensive guide to machine learning evaluation metrics for regression in R. We've covered all of the metrics we work with daily, and explained them with theory, math, and R programming language implementation. We hope this article gave you a clear picture of how easy it is to implement these metrics from scratch, and that you've also gained a deeper understanding of the math and overall logic. <i>What are your favorite machine learning evaluation metrics for regression? Are there any we've left out?</i> Make sure to let us know in the comment section below.
Or even better - reach out on Twitter - <a href="http://twitter.com/appsilon" target="_blank" rel="noopener">@appsilon</a>. We'd love to hear your feedback. <blockquote>Are you up for a challenge? <a href="https://appsilon.com/data-science-take-home-challenges/" target="_blank" rel="noopener">Here are 5 R-based take-home challenges for Data Scientists</a>. How many can you solve?</blockquote>
