PyTorch Lightning & Hydra - Templates in Machine Learning
Are you maximizing the benefit of templates for your machine learning or data science projects? At Appsilon, we’ve built numerous R Shiny dashboards and machine learning projects for data science teams at Fortune 500 companies. Over the years, we’ve recognized the value of templates for quickly building and, equally importantly, maintaining these projects. R Shiny applications often start as quick Proofs of Concept (POCs) for clients. More often than not, these POCs transition into well-tested, easy-to-maintain, and scalable applications. <blockquote>Looking to achieve this and more with your Shiny applications? Try out our <a href="https://github.com/Appsilon/rhino">Rhino package</a> to create Shiny apps the Appsilon way!</blockquote> We know the importance of implementing templates, frameworks, and best practices from the get-go. But not everyone is as well-versed in the trials that follow. Sooner or later, Project Managers realize that a more rigid project structure is needed, as module dependencies devolve into the infamous spaghetti-code design or tests for crucial parts of the software come up short. If you’re interested in trying out Shiny for your machine learning or data science projects, you can follow our <a href="https://appsilon.github.io/rhino/articles/how-to/migrate-app-to-rhino.html">guide to migrating Shiny apps to Rhino</a>. At Appsilon, we’re more than just R Shiny experts. We’re data scientists, developers, machine learning engineers, analysts, and more. We use whatever tools help us solve the problem at hand. <blockquote>What is the YOLO Algorithm and YOLO Object Detection? Explore the <a href="https://appsilon.com/object-detection-yolo-algorithm/" target="_blank" rel="noopener">most popular guide to YOLO Object Detection!</a></blockquote> In this post, we’ll focus on a case study of maintaining and developing a rather ‘aged’ deep learning project written in PyTorch. We’ll show you what we struggled with and what helped us, in the hope that it showcases the importance of templates and helps you avoid similar pitfalls. <hr /> <h2>Project background</h2> To give a little insight, we were (at least) the third owner of this code, with some parts dating back to 2016. In the deep learning world, 6 years might as well be an entire epoch. We ended up refactoring it to PyTorch Lightning and using the <a href="https://github.com/ashleve/lightning-hydra-template">lightning-hydra-template</a>. It worked wonders! But we’ll explain why later. So what exactly were the problems we faced? Because we joined the party late, too many minor errors had been introduced and caught too late, wasting our computing resources, time, and, frankly, money. We also had to remember to copy each trained model to the <i>backup</i> directory and manually mark it with a proper version tag, or risk losing the work. We already used <a href="https://appsilon.com/neptune-for-mlops/">Neptune</a> to not only monitor training but also keep track of experiments in relation to code versions. But Neptune can only take you so far. The configuration was stored in a .py file, a setup we inherited as part of the legacy code. It was an advanced, custom machine learning setup, including GANs. The case was based on a peculiar data type, so the code included numerous complicated conversion functions. And yet, none of them were automatically tested. The conclusion was clear: we had to refactor the code.
<h2>Refactoring code to PyTorch Lightning</h2> It’s easy to say <i>refactor the code</i>, but where do we start? What do we follow? One thing was obvious: we wanted to use <a href="https://www.pytorchlightning.ai/">PyTorch Lightning</a>, or pl for short. We wanted to train our models on both single and multiple GPUs, while still being able to develop and test the code locally on a CPU. We knew PyTorch Lightning was capable of that and <i>much more</i>. <blockquote>Need to manage your machine learning data? Check out <a href="https://appsilon.com/ml-data-versioning-with-dvc/" target="_blank" rel="noopener">ML data versioning with DVC</a>.</blockquote> To start using pl, you make your main model class inherit from pl.LightningModule instead of nn.Module. The good news is that pl.LightningModule itself inherits from nn.Module, so all your old code remains compatible (super important!). While rewriting into the PyTorch Lightning framework, we had to disentangle the code, extract clear training and validation loops, and take care of loading our datasets. Every change we made to adapt the code to PyTorch Lightning also increased its readability. <h2>Benefits of using PyTorch Lightning</h2> So now we have our model code rewritten, which is a big benefit on its own. But because we used PyTorch Lightning, we gained additional benefits! The full list is hard to fit here, so we’ll share the <b>features we found most useful</b> (the code sketch below shows how several of them fit together): <ol><li style="font-weight: 400;" aria-level="1">Once it started working, it worked <b>flawlessly on both CPU and GPU</b> with just a simple parameter switch.</li><li style="font-weight: 400;" aria-level="1">While debugging, <b>setting the option detect_anomaly=True</b> was bliss. It was much easier to use than <a href="https://pytorch.org/docs/stable/autograd.html#anomaly-detection">the default PyTorch anomaly detection</a> and allowed us to track down some nasty bugs.</li><li style="font-weight: 400;" aria-level="1">Running the code on <b>a single batch of training and validation</b> with fast_dev_run is exceptionally convenient for smoke-testing.</li><li style="font-weight: 400;" aria-level="1">After the code works, it’s time to... time it. A bunch of <a href="https://pytorch-lightning.readthedocs.io/en/stable/advanced/profiler.html">available profilers</a> let you <b>profile your code by changing a single parameter</b>!</li><li style="font-weight: 400;" aria-level="1"><b>Neptune logger integration</b> worked out of the box.</li></ol> Last but not least, it’s always nice when the terminal output looks enjoyable, which is possible thanks to the <b>rich library for displaying tables and progress bars</b>. <img class="size-full wp-image-13357" src="https://webflow-prod-assets.s3.amazonaws.com/6525256482c9e9a06c7a9d3c%2F65b01e078470194980450548_PyTorch-progress-bars.gif" alt="Pytorch progress bars indicating epoch and testing" width="958" height="542" /> Gif by <a href="https://devblog.pytorchlightning.ai/super-charged-progress-bars-with-rich-lightning-669653d6ab97" target="_blank" rel="noopener">PyTorch Lightning team</a> via Medium.
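To make this more concrete, here is a minimal, illustrative sketch of what the switch to pl.LightningModule and the Trainer options listed above can look like. The model architecture, metric names, and logger settings below are made-up examples for this post, not code from the project.
<pre><code class="language-python">
import torch
from torch import nn
import pytorch_lightning as pl


class LitRegressor(pl.LightningModule):
    """Toy model: pl.LightningModule already inherits from nn.Module,
    so existing layers and forward logic carry over unchanged."""

    def __init__(self, lr: float = 1e-3):
        super().__init__()
        self.save_hyperparameters()
        self.net = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))

    def forward(self, x):
        return self.net(x)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = nn.functional.mse_loss(self(x), y)
        self.log("train/loss", loss)
        return loss

    def validation_step(self, batch, batch_idx):
        x, y = batch
        self.log("val/loss", nn.functional.mse_loss(self(x), y))

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.lr)


# The Trainer flags mentioned above: one switch covers CPU/GPU,
# anomaly detection, a quick smoke-test run, and a built-in profiler.
trainer = pl.Trainer(
    accelerator="auto",   # CPU locally, GPU(s) on the training machine
    devices="auto",
    max_epochs=10,
    detect_anomaly=True,  # surface NaNs/Infs produced in the backward pass
    fast_dev_run=True,    # run a single batch of train/val to catch bugs early
    profiler="simple",    # print a timing report when the run finishes
    # logger=NeptuneLogger(...),  # from pytorch_lightning.loggers, works out of the box
)
# trainer.fit(LitRegressor(), train_dataloaders=..., val_dataloaders=...)
</code></pre>
Turning fast_dev_run off and pointing accelerator/devices at your GPUs is all it takes to go from a local smoke test to a full training run.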
<h2><b>Machine learning template</b></h2> So far we’ve touched on rewriting the core PyTorch code. It solves some of our aforementioned problems, but not all of them. To resolve the rest, we needed additional help. <blockquote>Is your data clean and ready for your pipeline? Learn how to use these <a href="https://appsilon.com/data-cleaning-in-r/" target="_blank" rel="noopener">2 R packages to clean and validate datasets</a>.</blockquote> After some research, we decided to try out <a href="https://github.com/ashleve/lightning-hydra-template">lightning-hydra-template</a> - a GitHub project from user <a href="https://github.com/ashleve">ashleve</a> with over 1.2k stars! It is a template for neural network projects in PyTorch that uses <a href="https://hydra.cc/">Hydra</a> for managing experiment runs and configuration. By using this template, alongside Hydra (which we’ll discuss next), we gained a clear structure to follow. Now all our experiment scripts and notebooks are separated from the main model code. Some other features we appreciate are: <ol><li style="font-weight: 400;" aria-level="1">An already <b>prepared .gitignore file</b>. One might say it’s not a big deal, but you always end up adding __pycache__/*, .vscode, and so on to .gitignore, so why not start from a prepared version and fine-tune it later?</li><li style="font-weight: 400;" aria-level="1"><b>Pre-commit hooks for git</b>. It is good practice to format your code, sort imports, remove trailing whitespace, and so on before committing. Usually, you <i>don’t want to waste your time</i> setting up git commit hooks, and then the overall project quality drops. Why not do it at the very beginning?</li><li style="font-weight: 400;" aria-level="1">A <b>setup.cfg file</b> that prepares us for using pytest.</li><li style="font-weight: 400;" aria-level="1"><b>Encouragement to use a .env file</b> for your environment variables. It is the right place to store API keys; you shouldn’t put them in the regular configs!</li></ol> The last open question is why we decided to use Hydra and what it helped us solve. <h2>Benefits of using Hydra for configuration and experiment running</h2> Using Hydra to manage your configuration is more involved than storing configs as constants at the top of Python files or in plain JSON/YAML files. So what are the benefits? To name just a few: <ol><li style="font-weight: 400;" aria-level="1">You can organize your configs in a modular way: one file is responsible for your model config, one for your paths, one for the logger, and so on. Yet all of them are available from a single object in Python.</li><li style="font-weight: 400;" aria-level="1">The Hydra config object in Python is compatible with PyTorch Lightning and Neptune. With a single call, the whole config is attached to the remote experiment!</li><li style="font-weight: 400;" aria-level="1">Storing configs in separate files has benefits of its own, but with Hydra you can also override parameters from the CLI without changing any files, and everything is still properly recorded in Neptune.</li></ol> OK, but how will I know which parameters were used to train my model locally if I override the config from the CLI? Hydra creates a separate directory for each experiment run! The directory is named in %m-%d-%Y/%H-%M-%S fashion (and can be customized if need be), so it is always easy to find your experiment. The final, resolved config is stored in this directory, along with any files you create during the run; extremely convenient. The information on the corresponding Neptune experiment is also added there.
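To give a feel for how this works in practice, below is a minimal, hypothetical Hydra entry point. The config names and fields are illustrative rather than the template’s exact files, and the version_base argument assumes Hydra 1.2 or newer.
<pre><code class="language-python">
import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(version_base=None, config_path="configs", config_name="train")
def main(cfg: DictConfig) -> None:
    # Hydra composes the modular config files (e.g. configs/model.yaml,
    # configs/paths.yaml, configs/logger.yaml) into this single `cfg` object.
    print(OmegaConf.to_yaml(cfg))

    # Values are accessed as attributes, e.g. cfg.model.lr or cfg.paths.data_dir.
    # hydra.utils.instantiate(cfg.model) can even build objects directly from
    # the config, which is how templates like this typically wire up models
    # and loggers.


if __name__ == "__main__":
    main()
</code></pre>
Any value can then be overridden from the CLI without touching the files, for example python train.py model.lr=0.001 trainer.max_epochs=20, and Hydra writes the fully resolved config into that run’s timestamped output directory.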
<h2>Benefits of templates for projects - closing notes</h2> Take it from a team that builds multiple projects every quarter: use templates when starting a new project. A good template makes rapid development easy from the start while keeping good practices front of mind. And if your code does lose quality, you had better find the time to refactor before it’s too late! In our case, using PyTorch Lightning and Hydra greatly improved our code’s readability and maintainability, and allowed us to easily add tests to track correctness. <blockquote>Ready to publish your R Markdown/APIs/Jupyter Notebook/interactive Python content in one place? <a href="https://appsilon.com/how-to-deploy-rstudio-connect-into-local-kubernetes-cluster/" target="_blank" rel="noopener">Deploy RStudio Connect on a local Kubernetes cluster with our step-by-step guide</a>.</blockquote> There are many tools, templates, and frameworks out there, so don’t feel that you have to do it one particular way. Find what works best for your team and adjust accordingly. Do you have a favorite tool that you find helpful for your machine learning or data science projects? Share it with us in the comments below!