xspliner: An R Package to Build Explainable Surrogate ML Models
<em>This talk was presented virtually at eRum 2020 by <a href="https://wordpress.appsilon.com">Appsilon</a> engineer Krystian Igras. <a href="https://youtu.be/_cQQBuU4jm8">Here</a> is a direct link to the video.</em>
<h3>Why Should We Explain Black Box ML Models?</h3>
The vast majority of state-of-the-art ML algorithms are black boxes, meaning it is difficult to understand their inner workings. The more algorithms are used as decision support systems in everyday life, the greater the need to understand the decision rules behind them. This matters for many reasons, including regulatory compliance and making sure that the model has learned sensible features. For instance, a particular ML algorithm might discriminate against a minority group for an arbitrary reason, and this sort of problem is difficult to catch if your model is a black box. I have created an R package, xspliner, that helps build explainable surrogate models to better understand black box ML algorithms.
<img class="wp-image-4830 size-full" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b0220e1f8aab1bceb8d9b5_Krystian_eRum_appsilon_xspliner_10.webp" alt="xspliner pdp" width="1920" height="1080" /> <em>Marginal response and PDP curves</em>
One of the most promising ways to explain a black box ML model is to build an explainable surrogate model. This can be achieved by computing Partial Dependence Plot (PDP) curves from the black box model, which show the model's average prediction as a function of a single feature, approximating these curves with splines, and using the resulting transformations as predictors in a Generalized Linear Model (GLM). The advantage of this approach is that it is model agnostic: because it only needs the black box model's predictions, you can use it regardless of the method used to build that model.
<img class="wp-image-4833 size-full" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b0220f10fa9efb4ccc1958_Krystian_eRum_appsilon_xspliner0.webp" alt="glm model xspliner" width="1920" height="1080" /> <em>Construction of Generalized Linear Model with spline-based approximated PDP transformations</em>
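To make the idea above concrete, here is a minimal sketch in Python (not R, and not xspliner's API) of the general technique under several assumptions. `black_box` is a hypothetical stand-in for any opaque model. PDPs are computed by brute force: fix one feature to each grid value and average the predictions. A cubic regression spline with a truncated-power basis and quantile knots stands in for the package's own spline fit. The surrogate is a Gaussian GLM with identity link, which reduces to ordinary least squares, fitted here to the black box's predictions. That last choice is one reasonable option for a surrogate, not necessarily what xspliner does.

```python
import numpy as np

def black_box(X):
    # Hypothetical opaque model: any function mapping an (n, p) array to n predictions.
    return np.sin(2 * X[:, 0]) + 0.5 * X[:, 1] ** 2

def partial_dependence(predict, X, j, grid):
    # PDP: for each grid value v, set feature j to v in every row and average the predictions.
    values = []
    for v in grid:
        X_mod = X.copy()
        X_mod[:, j] = v
        values.append(predict(X_mod).mean())
    return np.array(values)

def spline_basis(x, knots):
    # Cubic truncated-power basis: x, x^2, x^3 and (x - k)^3_+ for each interior knot k.
    cols = [x, x ** 2, x ** 3] + [np.clip(x - k, 0, None) ** 3 for k in knots]
    return np.column_stack(cols)

def fit_spline(grid, pdp, knots):
    # Least-squares regression spline approximating one PDP curve; returns the fitted transform.
    B = np.column_stack([np.ones_like(grid), spline_basis(grid, knots)])
    coef, *_ = np.linalg.lstsq(B, pdp, rcond=None)
    return lambda x: np.column_stack([np.ones_like(x), spline_basis(x, knots)]) @ coef

def build_surrogate(predict, X, n_grid=50, n_knots=4):
    transforms = []
    for j in range(X.shape[1]):
        grid = np.linspace(X[:, j].min(), X[:, j].max(), n_grid)
        # Interior knots at evenly spaced quantiles of the observed feature values.
        knots = np.quantile(X[:, j], np.linspace(0, 1, n_knots + 2)[1:-1])
        transforms.append(fit_spline(grid, partial_dependence(predict, X, j, grid), knots))

    def design(X_new):
        # Intercept plus one spline-transformed column per feature.
        return np.column_stack([np.ones(len(X_new))] +
                               [t(X_new[:, j]) for j, t in enumerate(transforms)])

    # Gaussian GLM with identity link = OLS on the transformed features.
    beta, *_ = np.linalg.lstsq(design(X), predict(X), rcond=None)
    return (lambda X_new: design(X_new) @ beta), beta

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(500, 2))
surrogate, beta = build_surrogate(black_box, X)
y_bb = black_box(X)
r2 = 1 - np.var(y_bb - surrogate(X)) / np.var(y_bb)
print("GLM coefficients:", np.round(beta, 3))
print("R^2 of surrogate vs. black box:", round(r2, 3))
```

Each GLM coefficient multiplies an entire spline-shaped PDP curve, so a feature's effect can be read directly from its curve and its coefficient. The R^2 against the black box's predictions gives a rough check of how faithfully the surrogate mimics the original model.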
In this presentation, you will learn what PDP curves and GLMs are and how you can calculate them from black box models. I'll also show you a custom visualization of how PDP curves are constructed. We will then look at a credit-scoring use case in which we treat a GBM model as the black box and use it to build an explainable surrogate GLM. Finally, the new model is used to create a user-friendly credit-scoring tool that also gives the creditor a detailed report explaining the final decision on whether to grant credit. Want to use xspliner? It is available on CRAN!
<h3>Learn More</h3><ul><li>Want to learn how to create a computer vision model within an R environment? Watch Jędrzej Świeżewski's eRum/useR <a href="https://appsilon.com/fast-ai-in-r/">presentation on fast.ai in R</a>.</li><li>Want to learn how to write high-quality, production-ready R code? See Marcin Dubel's eRum/useR presentation on Production-Ready R Code <a href="https://youtu.be/U1-j7c_8LFQ">here</a>.</li><li>Video Tutorial: <a href="https://appsilon.com/video-tutorial-create-and-customize-a-simple-shiny-dashboard/">How to Create and Customize a Simple Shiny Dashboard</a></li><li>Find more Appsilon Data Science tutorials <a href="https://appsilon.com/tag/tutorials/">here</a>.</li></ul>
Does your company need help with enterprise data analytics, machine learning, or Shiny dashboards? Reach out to us at hello@wordpress.appsilon.com.