Benefits of Model Serialization in ML

The process of saving your model to use it later is called <b>serialization</b>. In this article, we’ll discuss the various benefits of machine learning model serialization. We use PyTorch in our machine learning projects, so we’ll focus on this technology, but the key message applies to other technologies as well. At the end, we’ll compare the different methods in a single table.

Table of Contents
<ul><li><a href="#mbaza">Mbaza case for model serialization</a></li><li><a href="#scenario">Typical scenarios for ML model serialization</a></li><li><a href="#simple">The simplest way to save a PyTorch model</a></li><li><a href="#torchscript">Exporting the model to TorchScript</a></li><li><a href="#onnx">Exporting the ML model to the ONNX format</a></li><li><a href="#comparison">Final comparison - PyTorch vs TorchScript vs ONNX</a></li></ul>

<hr />

<h2 id="mbaza"><b>Mbaza case for model serialization</b></h2>
While working on the <a href="https://appsilon.com/gabon-wildlife-ai-for-biodiversity-conservation/" target="_blank" rel="noopener">Mbaza project</a>, we used serialization to address the following issues:
<ol><li>We were constrained by inference time: we wanted to use larger models, but they were too slow.</li><li>The original Python environment needed for inference was very heavy.</li><li>With each model we wanted to use, we needed to ship an inference environment matching that model.</li></ol>
<blockquote>Need to manage your machine learning data? Check out <a href="https://appsilon.com/ml-data-versioning-with-dvc/" target="_blank" rel="noopener">ML data versioning with DVC</a>.</blockquote>
Exporting our model to the <a href="https://onnx.ai/" target="_blank" rel="noopener">ONNX format</a> solved all of the above issues!

<h2 id="scenario"><b>Typical scenarios for ML model serialization</b></h2>
In most cases, after training a model, you want to be able to run inference with it. Usually this is not an immediate need, but it may be desired later on. Depending on your use case, you might want to run the inference on:
<ol><li>The same machine you trained the model on, in an identical setup.</li><li>The same machine configuration but without a GPU, using only the CPU.</li><li>A different operating system, e.g., a model trained on Linux that has to run on a Windows setup.</li><li>Specialized devices like a Raspberry Pi.</li></ol>
The above list is not comprehensive, but you get the point. Often, you can pick the best model by looking solely at numbers like:
<ol><li>Loss value.</li><li>Chosen metrics.</li><li>Execution time per sample/batch (a rough timing sketch is shown at the end of this section).</li><li>Amount of memory required by the model.</li></ol>
Before we continue, let’s clarify something. ‘Many cases’, ‘often’, or ‘usually’ doesn’t mean all cases, and sometimes you have to <i>see</i> the model’s results (looking at you, GANs 👀). In that situation, you want to be able to compare results from different models, trained on different architectures, possibly on different hardware setups. Sometimes a well-designed configuration system might solve this issue, but - and I know I sound like a broken record here - that’s not always the case.
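Measuring that execution time fairly takes a little care. Here is a minimal timing sketch, assuming you already have a loaded PyTorch <code>model</code> and a representative <code>batch</code> tensor (both are placeholders, not code from the Mbaza project):
<pre><code>import time

import torch


@torch.no_grad()
def time_inference(model, batch, n_runs=50, warmup=5):
    """Rough average per-batch inference time, for comparing candidate models."""
    model.eval()
    for _ in range(warmup):
        model(batch)                  # warm-up runs: caches, cuDNN autotuning, etc.
    if batch.is_cuda:
        torch.cuda.synchronize()      # wait for queued GPU work before starting the clock
    start = time.perf_counter()
    for _ in range(n_runs):
        model(batch)
    if batch.is_cuda:
        torch.cuda.synchronize()
    return (time.perf_counter() - start) / n_runs
</code></pre>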
<h2 id="simple"><b>The simplest way to save a PyTorch model</b></h2> The <i>go-to</i> method to save your model with PyTorch is to call torch.save(model.state_dict(), PATH) on your model. By following the official PyTorch tutorial, you can save and load your model in the same environment with or without the GPU. It’s also easy to<a href="https://pytorch.org/tutorials/recipes/recipes/saving_and_loading_a_general_checkpoint.html" target="_blank" rel="noopener"> save/load the model for further training</a>. <blockquote>Learn how to maximize your data science projects using <a href="https://appsilon.com/pytorch-lightning-hydra-templates-in-machine-learning/" target="_blank" rel="noopener">templates with PyTorch Lightning &amp; Hydra</a>.</blockquote> So what are the limitations of this approach? Well, you need the same environment for model training and inference. Sometimes you might be lucky enough that the same model code will work under various PyTorch versions, but it’s not guaranteed. Often your code might depend on some additional libraries that may be poorly written and break the API in minor versions. If you switch the environment to different OS problems are likely to get worse. <blockquote>Ready to publish your APIs, Jupyter Notebook, and Interactive Python content in one place? <a href="https://appsilon.com/how-to-deploy-rstudio-connect-into-local-kubernetes-cluster/" target="_blank" rel="noopener">Deploy RStudio Connect on a local Kubernetes cluster with our step-by-step guide</a>.</blockquote> On the plus side, we have to say that this is the easiest way to save the model and will work every time in a constant environment setup. <h2 id="torchscript"><b>Exporting the model to TorchScript</b></h2> Suppose that you’re interested in running the model’s inference in the future and will not retrain your model anymore. Then you might consider<a href="https://pytorch.org/tutorials/recipes/torchscript_inference.html" target="_blank" rel="noopener"> the TorchScript export option</a>. <h3>What is TorchScript?</h3> Creators of PyTorch developed a new language-agnostic TorchScript format for neural networks models serialization. It comes with a built-in, <b>just-in-time compilation</b>, and makes your model independent of particular python/PyTorch versions, not to mention other libraries. It comes with many advantages! For example, the model exported to TorchScript doesn’t require the original code to load. This can be very useful in certain cases. <h3>TorchScript influence on performance</h3> By taking advantage of the earlier mentioned just-in-time compilation, TorchScript models evaluate <b>faster</b> than raw PyTorch models. It’s worth mentioning that you can compile <b>only some parts of your code</b> with jit to make them faster while being able to fine-tune your model in general at the same time. Using additional libraries doesn’t only introduce more dependencies, it also boosts the environment size. By using TorchScript you can create a smaller environment (although you still have to install PyTorch) or even get rid of it all the way and <a href="https://pytorch.org/tutorials/advanced/cpp_export.html" target="_blank" rel="noopener">use only C++</a>!  <h3>TorchScript downsides</h3> TorchScript doesn’t allow you to fine-tune your models. Also, not every operation is supported by TorchScript yet. However, we<b> rarely observed a need to make edits in our code</b> to export to TorchScript. Usually calling scripted_model = torch.jit.script(model); scripted_model.save(PATH) is enough! 
<h2 id="onnx"><b>Exporting ML model to the ONNX format</b></h2> It seems like we solved most of our aforementioned problems, so why discuss ONNX? There's a good reason. So stay with me. It's what we used in the Mbaza project and you might find it helpful in yours. <h3>What is ONNX?</h3> ONNX stands for the Open Neural Network Exchange. It’s a single format developed to serve the interface role between different frameworks. You can train the model in PyTorch, Tensorflow, scikit-learn, Caffe2, xgboost, and many more, and export it into the ONNX format.  Regardless of the training, you will always be using <a href="https://onnxruntime.ai/" target="_blank" rel="noopener">ONNX Runtime</a> to do the inference. This means you<b> don’t even need a PyTorch</b> to run your PyTorch models! And, if you have an ONNX Runtime written in different technology like C++, JS, C#, or java, you may not need Python altogether! There are a lot of cases when the ONNX format comes in handy. <h3>ONNX influence on performance</h3> This all sounds too good to be true. How does it affect speed?  We typically saw an<b> increase of over 50% in the model speed performance</b> when compared to raw PyTorch with ONNX being much faster than TorchScript! Remember, the sole PyTorch package weighs around 700MB-1GB compressed (depending on the version, and architecture). The minimal environment with python 3.9 and PyTorch 1.11 weighs 1.7GB, while the minimal environment with python 3.9 and onnxruntime 1.11 weighs 270MB. That makes it <b>over 6 times smaller</b>. <h3>ONNX downsides</h3> So what’s the catch? The set of supported operations in ONNX is even more restricted than in TorchScript. More code changes may be required to make it work with ONNX. But believe me, it’s worth it if you move to production. Exporting with ONNX is a bit trickier than with TorchScript. This is because exporting to ONNX requires you to provide the example input to the network and its name. But don’t worry, there is a top-notch tutorial in<a href="https://pytorch.org/tutorials/advanced/super_resolution_with_onnxruntime.html" target="_blank" rel="noopener"> the official PyTorch documentation</a>. <h2 id="comparison"><b>Final comparison - PyTorch vs TorchScript vs ONNX</b></h2> We can conclude the above discussions in the following table: <img class="size-full wp-image-14348 aligncenter" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01dd9e9c00d5eac12f4ff_Comparison-Table-List-Infographic-Gantt-Chart-Graph.webp" alt="ML model format export comparison between PyTorch, TorchScript, and ONNX" width="1024" height="768" /> When we compare all export methods we see that only exporting with raw PyTorch by saving the optimizer dict allows fine-tuning later. For inference, it’s best to use the ONNX format as it’s easily runnable on various hardware and OSes. In case it’s hard to adjust your code to ONNX format, you might want to consider TorchScript. It is very easy to switch versions of TorchScript and ONNX models as import doesn’t require the original model’s code. <h2>Model serialization in machine learning - summary</h2> I hope that this post helped you to see the differences between ways of model serialization in PyTorch, potential problems, and how to deal with them. I’m sure that after this warm start it’ll be much easier to serialize your model! Are you having trouble with your model? Want to collaborate on a <a href="https://appsilon.com/data-for-good/" target="_blank" rel="noopener">Data for Good</a> project? 
<h2 id="comparison"><b>Final comparison - PyTorch vs TorchScript vs ONNX</b></h2>
We can summarize the above discussion in the following table:
<img class="size-full wp-image-14348 aligncenter" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01dd9e9c00d5eac12f4ff_Comparison-Table-List-Infographic-Gantt-Chart-Graph.webp" alt="ML model format export comparison between PyTorch, TorchScript, and ONNX" width="1024" height="768" />
When we compare all export methods, we see that only saving with raw PyTorch (including the optimizer state dict) allows fine-tuning later. For inference, it’s best to use the ONNX format, as it’s easily runnable on various hardware and operating systems. If it’s hard to adjust your code to the ONNX format, you might want to consider TorchScript. Switching between versions of TorchScript and ONNX models is very easy, as loading them doesn’t require the original model’s code.

<h2>Model serialization in machine learning - summary</h2>
I hope this post helped you see the differences between the ways of serializing a model in PyTorch, the potential problems, and how to deal with them. I’m sure that after this warm start it’ll be much easier to serialize your model!

Are you having trouble with your model? Want to collaborate on a <a href="https://appsilon.com/data-for-good/" target="_blank" rel="noopener">Data for Good</a> project? Reach out to Appsilon’s <a href="https://appsilon.com/ai-research/" target="_blank" rel="noopener">AI & Research team</a> to see how we can streamline your development, enhance project management, and help you develop innovative solutions!
