xspliner: An R Package to Build Explainable Surrogate ML Models

July 7, 2020

<em>This talk was presented virtually at eRum 2020 by <a href="https://wordpress.appsilon.com">Appsilon</a> engineer Krystian Igras. <a href="https://youtu.be/_cQQBuU4jm8">Here</a> is a direct link to the video.</em>
<h3>Why Should We Explain Black Box ML Models?</h3>
The vast majority of state-of-the-art ML algorithms are black boxes, meaning it is difficult to understand their inner workings. The more these algorithms are used as decision support systems in everyday life, the more important it becomes to understand their underlying decision rules. This matters for many reasons, including regulatory requirements and the need to confirm that the model has learned sensible features. For instance, a particular ML algorithm might discriminate against a minority group for an arbitrary reason. It is difficult to catch this sort of problem if your model is a black box. I have created an R package (xspliner) that helps create explainable surrogate models to better understand black box ML algorithms.

<img class="wp-image-4830 size-full" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b0220e1f8aab1bceb8d9b5_Krystian_eRum_appsilon_xspliner_10.webp" alt="xspliner pdp" width="1920" height="1080" /> <em>Marginal response and PDP curves</em>

One of the most promising methods to explain black box ML models is to build an explainable surrogate model. This can be achieved by inferring Partial Dependence Plot (PDP) curves from the black box model and building Generalized Linear Models based on these curves. The advantage of this approach is that it is model agnostic, which means you can use it regardless of what methods you used to create your model.
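
To make the idea of a PDP curve concrete, here is a minimal sketch in plain Python. It is not the xspliner API: `black_box`, `partial_dependence`, and the toy data are hypothetical names and values made up for illustration. The curve is obtained by fixing one feature at each grid value, predicting for every row of the data, and averaging the predictions.

```python
from statistics import mean


def black_box(x1, x2):
    """Hypothetical stand-in for a fitted black box model's predict function."""
    return x1 ** 2 + 3.0 * x2


def partial_dependence(predict, rows, feature_index, grid):
    """Average prediction over the data with one feature fixed at each grid value."""
    curve = []
    for value in grid:
        preds = []
        for row in rows:
            modified = list(row)
            modified[feature_index] = value  # overwrite only the feature of interest
            preds.append(predict(*modified))
        curve.append(mean(preds))
    return curve


# Toy data set (made up for illustration only)
rows = [(-1.0, 0.0), (0.0, 1.0), (1.0, 2.0), (2.0, 3.0)]
grid = [-1.0, 0.0, 1.0, 2.0]

pdp_x1 = partial_dependence(black_box, rows, 0, grid)
print(pdp_x1)  # [5.5, 4.5, 5.5, 8.5]
```

Because the function only calls `predict`, it never looks inside the model, which is what makes the approach model agnostic.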

<img class="wp-image-4833 size-full" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b0220f10fa9efb4ccc1958_Krystian_eRum_appsilon_xspliner0.webp" alt="glm model xspliner" width="1920" height="1080" /> <em>Construction of Generalized Linear Model with spline-based approximated PDP transformations</em>
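
The construction in the figure can be sketched roughly as follows. This is an illustration under stated assumptions, not xspliner's implementation: `black_box` is a hypothetical model, piecewise-linear interpolation stands in for the spline approximation, the surrogate is a Gaussian GLM with identity link (ordinary least squares), and it is fitted to the black box's own predictions. All function names are made up for this example.

```python
from statistics import mean


def black_box(x1, x2):
    """Hypothetical stand-in for a trained black box model."""
    return x1 ** 2 + 3.0 * x2


def pdp_curve(predict, rows, j, grid):
    """PDP: mean prediction with feature j fixed at each grid value."""
    return [mean(predict(*(r[:j] + (v,) + r[j + 1:])) for r in rows) for v in grid]


def interpolate(grid, values):
    """Piecewise-linear approximation of a PDP curve (simple stand-in for a spline)."""
    def transform(x):
        x = min(max(x, grid[0]), grid[-1])  # clamp to the grid range
        for (x0, y0), (x1, y1) in zip(zip(grid, values), zip(grid[1:], values[1:])):
            if x <= x1:
                return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
        return values[-1]
    return transform


def solve(a, b):
    """Solve a small linear system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][k] * x[k] for k in range(i + 1, n))) / m[i][i]
    return x


def fit_linear(design, y):
    """Least squares via normal equations (Gaussian GLM, identity link)."""
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * t for r, t in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


# Toy data on a regular grid (made up for illustration only)
grid1, grid2 = [-2.0, -1.0, 0.0, 1.0, 2.0], [0.0, 1.0, 2.0]
rows = [(a, b) for a in grid1 for b in grid2]

# One PDP-based transformation per feature
t1 = interpolate(grid1, pdp_curve(black_box, rows, 0, grid1))
t2 = interpolate(grid2, pdp_curve(black_box, rows, 1, grid2))

# Linear model on the transformed features, fitted to black box predictions
design = [[1.0, t1(a), t2(b)] for a, b in rows]
targets = [black_box(a, b) for a, b in rows]
coef = fit_linear(design, targets)  # approximately [-5.0, 1.0, 1.0]


def surrogate(x1, x2):
    """Explainable surrogate: intercept plus weighted PDP transformations."""
    return coef[0] + coef[1] * t1(x1) + coef[2] * t2(x2)
```

The surrogate stays readable because each feature enters through a single one-dimensional curve and a single coefficient. In this toy case the black box is additive, so the surrogate reproduces it exactly at the grid points; with a real model the fit would only be approximate.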

In this presentation, you will learn what PDP curves and GLMs are and how you can calculate them based on black box models. I'll also show you a custom visualization of how PDP curves are constructed. We will then take a look at a credit-scoring use case in which we take a GBM model as the black box and build an explainable GLM surrogate from it. Finally, the new model is used to create a user-friendly credit scoring tool that also gives the creditor a detailed report explaining the final decision on whether to grant credit. Want to use xspliner? It is available on CRAN!

<h3>Learn More</h3><ul><li>Want to learn how to create a computer vision model within an R environment? Watch Jędrzej Świeżewski's eRum/useR <a href="https://appsilon.com/fast-ai-in-r/">presentation on fast.ai in R</a>.</li><li>Want to learn how to write high-quality, production-ready R code? See Marcin Dubel's eRum/useR presentation on Production-Ready R Code <a href="https://youtu.be/U1-j7c_8LFQ">here</a>.</li><li>Video Tutorial: <a href="https://appsilon.com/video-tutorial-create-and-customize-a-simple-shiny-dashboard/">How to Create and Customize a Simple Shiny Dashboard</a></li><li>Find more Appsilon Data Science tutorials <a href="https://appsilon.com/tag/tutorials/">here</a>.</li></ul>
Does your company need help with enterprise data analytics, machine learning, or Shiny dashboards? Reach out to us at hello@wordpress.appsilon.com.
