eRum 2020: Appsilon Presentations On xspliner, fast.ai, and Writing Production-Ready R Code
As you may already know, the 2020 European R Users Meeting will be a virtual event. This year, Appsilon engineers Krystian Igras, Marcin Dubel, and Jędrzej Świeżewski will be giving virtual presentations on Friday, June 19th. Tune in to learn about xspliner, making production-ready R code, and using R for Machine Learning projects with fast.ai. You can view all talks at the eRum 2020 Hopin event here (make sure to bookmark this link for Friday).
The presentations are all on Friday, June 19th. Yes, two will happen in parallel – so you’ll have to choose!
- 10:15 CEST: Krystian Igras: Explaining Black-Box Models with xspliner to Make Deliberate Business Decisions
- 11:50 CEST: Marcin Dubel: Tools and Patterns for Making Clean & Production-Ready R Code
- 11:50 CEST: Jędrzej Świeżewski, PhD: Fast.ai in R: Preserving Wildlife with Computer Vision
We hope to see you there! Please find abstracts for each talk below:
Krystian Igras: Explaining black-box models with xspliner to make deliberate business decisions
The vast majority of state-of-the-art ML algorithms are black boxes, meaning it is difficult to understand their inner workings. The more algorithms are used as decision support systems in everyday life, the greater the need to understand their underlying decision rules. This matters for many reasons, from regulatory requirements to verifying that the model has learned sensible features. You can achieve all of that with the xspliner R package that I have created.
One of the most promising methods for explaining models is building surrogate models. This can be achieved by inferring Partial Dependence Plot (PDP) curves from the black-box model and building Generalized Linear Models (GLMs) based on these curves. The advantage of this approach is that it is model-agnostic, which means you can use it regardless of which method you used to create your model.
From this presentation, you will learn what PDP curves and GLMs are and how you can calculate them based on black box models. We will take a look at an interesting business use case in which we’ll find out whether the original black box model or the surrogate one is a better decision system for our needs. Finally, we will see an example of how you can explain your models using this approach with the xspliner package for R (available on CRAN!).
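To make the surrogate idea concrete, here is a minimal base-R sketch of the general approach described above: train a black-box model, extract a PDP curve for one feature, and fit a GLM on the PDP-transformed feature. This illustrates the concept only, not xspliner's actual API; xspliner automates and generalizes these steps. The simulated data, the choice of `ppr` as the "black box," and all variable names are assumptions for illustration.

```r
# Sketch of the PDP -> GLM surrogate idea, base R only.
# Here the "black box" is a projection pursuit regression (stats::ppr).
set.seed(42)
n <- 500
df <- data.frame(x1 = runif(n, -3, 3), x2 = runif(n, -3, 3))
df$y <- sin(df$x1) + 0.5 * df$x2 + rnorm(n, sd = 0.2)

black_box <- ppr(y ~ x1 + x2, data = df, nterms = 3)

# Partial dependence of the response on x1: fix x1 at each grid point
# and average the black-box predictions over the observed data.
grid <- seq(-3, 3, length.out = 50)
pdp  <- sapply(grid, function(v) {
  d <- df
  d$x1 <- v
  mean(predict(black_box, newdata = d))
})

# Fit a smooth curve to the PDP and use it as a feature transformation.
f_x1 <- splinefun(grid, pdp)

# Surrogate GLM built on the PDP-transformed feature.
surrogate <- glm(y ~ f_x1(x1) + x2, data = df)
summary(surrogate)$coefficients
```

If the black box captured the nonlinearity in `x1`, the coefficient on the transformed term should be close to 1, which is one quick sanity check that the surrogate faithfully tracks the original model.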
Marcin Dubel: Tools and Patterns for Making Clean & Production-Ready R Code
In this talk you’ll learn the tools and best practices for writing clean, reproducible R code in a working environment, ready to be shared and productionized. Save your team the time spent on maintenance, adjustments, and struggling with packages.
R is a great tool for fast data analysis. Its simple setup, combined with powerful features and community support, makes it a perfect language for many subject matter experts, e.g. in finance or bioinformatics. Nevertheless, it is often the case that while the code provides a great solution, the application or model cannot easily be distributed to other team members or beyond the team.
Both Appsilon and I personally have taken part in many R projects whose goal was to clean and organize the code as well as the project structure. The data science teams working for our clients have all the expert knowledge and skills required to deliver value, but they lack the programming experience required to produce mature, reproducible, production-quality code.
We would like to share our approach, best practices, and useful tools for sharing code shamelessly.
During the presentation I will show:
- setting up the development environment with **packrat**, **renv**, and **Docker**,
- organizing the project structure,
- best practices for writing R code, automated with a **linter**,
- sharing code using **git**,
- organizing the workflow with **drake**,
- optimizing Shiny apps and data loading with **plumber** and a **database**,
- preparing tests and continuous integration with **CircleCI**.
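As a small taste of the workflow-organization point, here is a hedged sketch of a drake pipeline. It assumes the drake package is installed; the dataset, file name, and target names are invented for illustration, and the example writes its own `data.csv` so it can run end to end.

```r
library(drake)

# Create a small example dataset so the pipeline below is self-contained.
write.csv(data.frame(x = 1:20, y = 2 * (1:20) + rnorm(20)),
          "data.csv", row.names = FALSE)

# A drake plan declares targets and the commands that build them;
# drake infers the dependency graph from the code itself.
plan <- drake_plan(
  raw   = read.csv(file_in("data.csv")),  # tracked input file
  model = lm(y ~ x, data = raw),          # depends on `raw`
  fit   = summary(model)                  # depends on `model`
)

make(plan)   # builds only targets that are missing or outdated
readd(fit)   # retrieve a built target from drake's cache
```

Because drake tracks which targets are up to date, re-running `make(plan)` after editing only the `fit` command rebuilds just that one target, which is the reproducibility payoff the talk describes.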
Jędrzej Świeżewski, PhD: Fast.ai in R: Preserving Wildlife with Computer Vision
In this presentation, we will discuss using the latest techniques in computer vision as an important part of “AI for Good” efforts, namely, enhancing wildlife preservation. We will show how to use the latest technical advances from an R setup, even when they were originally implemented in Python.
A topic rightfully receiving growing attention among Machine Learning researchers and practitioners is how to put the power of these advancing tools to good use. One avenue for such efforts is assisting wildlife conservation by employing computer vision to make observations of wildlife much more effective. We will discuss several such efforts during the talk.
One of the most promising computer vision frameworks developed recently is fast.ai, a wrapper around PyTorch, a Python framework used for computer vision among other things. While it incorporates the latest theoretical developments in the field (such as one-cycle policy training), it provides an easy-to-use interface that allows a much wider audience to benefit from these tools, including AI for Good initiatives run by people who are not formally trained in Machine Learning.
During the presentation, we will show how to use a model trained with the Python fastai library within an R workflow via the reticulate package. We will focus on use cases involving classifying species of African wildlife from camera-trap images.
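The bridge between the two languages can be sketched in a few lines. This is a hedged illustration only: it assumes a Python environment with fastai installed and reachable by reticulate, and the file names `export.pkl` (a previously exported fastai learner) and `zebra.jpg` are hypothetical placeholders.

```r
# Hedged sketch: calling a fastai model from R via reticulate.
library(reticulate)

# Import the fastai vision module through reticulate.
fastai <- import("fastai.vision.all")

# Load a previously exported learner (path is a placeholder)
# and classify a single camera-trap image.
learn  <- fastai$load_learner("export.pkl")
result <- learn$predict("zebra.jpg")

result[[1]]  # first element of the returned tuple: the predicted label
```

The `$` operator on reticulate objects maps directly to Python attribute access, so the R code mirrors the equivalent Python almost line for line.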
If you’re looking for some R tutorials, try these out: