3 Ways to Enhance AI’s Role in Drug Discovery

By:

Jędrzej Świeżewski, Ph.D.

December 13, 2024

AI is transforming drug discovery and research, delivering results with increased speed, accuracy, and reliability. By applying the right methodological and technical insight to refine existing industry models, we can build on established methods and achieve results once thought impossible.

Explore how AI accelerates protein crystal detection in drug development. Read the whitepaper: AI in Drug Development: Fast and Accurate Protein Crystal Detection.

As “right” is a subjective term, let’s first explain what it means using some examples of recent collaborations that focus on the AI and ML pipelines for protein crystallization screening.

Protein crystallization screening is a critical yet time-consuming step in drug development. Robots run the experiments: they prepare a protein, attempt to crystallize it under various conditions (such as different pH levels or temperatures), and extract any resulting crystal for X-ray imaging and analysis. Scientists then use the protein structure determined this way to design drugs that target it effectively.

Traditional methods relied heavily on the expertise of human crystallographers, but consistency has been a challenge. Studies have shown that experts agree with one another on crystallization results only 70-90% of the time, and that an individual expert’s agreement with their own earlier judgment can be as low as 83%.
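To make the agreement figures concrete, here is a minimal sketch of how a raw agreement rate between two labeling passes could be computed. The function name `percent_agreement`, the label names, and the ten example labels are illustrative assumptions, not data or code from the studies mentioned above.

```python
def percent_agreement(labels_a, labels_b):
    """Fraction of images on which two labeling passes give the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both passes must label the same images")
    if not labels_a:
        raise ValueError("need at least one labeled image")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Hypothetical labels for ten images (illustrative only, not real data).
expert_1 = ["crystal", "clear", "precipitate", "crystal", "clear",
            "crystal", "other", "clear", "precipitate", "crystal"]
expert_2 = ["crystal", "clear", "crystal", "crystal", "clear",
            "precipitate", "other", "clear", "precipitate", "crystal"]

agreement = percent_agreement(expert_1, expert_2)
print(f"agreement: {agreement:.0%}")
```

The same function covers self-consistency: pass in one expert’s labels from two separate sessions on the same images.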

Protein crystallization screening is therefore an area that clearly demonstrates the successful application of AI and ML in drug development. My recent projects with partners highlight three key aspects that can significantly improve precision and consistency.

These key aspects are:

Leverage industry standard tools and methods.
Make the whole process robust and reproducible.
Go beyond what is possible for humans.

The three examples below demonstrate their application.

Curious about how machine learning is transforming drug discovery? Check out our blog post on 5 Promising Applications of Machine Learning in Drug Discovery to learn more.

30% Reduction in Missed Crystals on the MARCO Benchmark

The most popular ML model in this space is called MARCO (Machine Recognition of Crystallization Outcomes), a collaboration launched several years ago by leading institutes and companies in the field. They gathered a large dataset, trained a machine-learning model, and open-sourced it. The MARCO model has been widely adopted by crystallization teams.

A widely recognized problem with the MARCO model is that it doesn’t generalize well to new datasets, which limits its effectiveness outside its original scope. When we examined this model, we asked ourselves: “How can we improve on it?”

To assess whether improvements were possible, we benchmarked MARCO using the same training and test datasets it originally employed. Then, by refining the ML modeling techniques, we achieved a significant improvement in a key metric: crystal recall. This is the percentage of images that truly contain crystals and that the model correctly flags as containing crystals; the remainder are missed crystals.

Without increasing false positives, we reduced the rate of missed crystals by nearly one-third.
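As a rough illustration of how these quantities relate, the sketch below computes crystal recall, the miss rate, and the relative reduction in missed crystals between two models. The function names and the toy predictions are assumptions made for this example; they are not the MARCO benchmark data or the evaluation code used in the project.

```python
def confusion_counts(y_true, y_pred):
    """Count outcomes for binary labels: 1 = crystal, 0 = no crystal."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # crystals found
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # crystals missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false alarms
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fn, fp, tn


def crystal_recall(y_true, y_pred):
    """Share of true crystal images that the model flags as crystal."""
    tp, fn, _, _ = confusion_counts(y_true, y_pred)
    if tp + fn == 0:
        raise ValueError("no crystal images in y_true")
    return tp / (tp + fn)


def miss_rate(y_true, y_pred):
    """Share of true crystal images that the model fails to flag."""
    tp, fn, _, _ = confusion_counts(y_true, y_pred)
    if tp + fn == 0:
        raise ValueError("no crystal images in y_true")
    return fn / (tp + fn)


# Toy example (illustrative only): 10 crystal images, 10 without crystals.
y_true = [1] * 10 + [0] * 10
baseline_pred = [1] * 7 + [0] * 3 + [1] + [0] * 9  # misses 3 crystals
improved_pred = [1] * 8 + [0] * 2 + [1] + [0] * 9  # misses 2 crystals

baseline_miss = miss_rate(y_true, baseline_pred)
improved_miss = miss_rate(y_true, improved_pred)
relative_reduction = (baseline_miss - improved_miss) / baseline_miss

print(f"recall: {crystal_recall(y_true, baseline_pred):.0%} -> "
      f"{crystal_recall(y_true, improved_pred):.0%}")
print(f"relative reduction in missed crystals: {relative_reduction:.0%}")
```

In this toy case both models raise one false alarm, so the drop in missed crystals is not bought with extra false positives, which is the property the result above describes.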

Next, we examined how MARCO performed on entirely new data from a different lab, a known challenge in the field. Crystallographers often report that MARCO’s performance diminishes with their unique datasets. Testing MARCO on a particularly challenging dataset, we found its accuracy was just 76%. To improve this, we adapted the neural network and trained the model with a purposefully small additional subset of challenging images - just 60 with crystals and 60 without.

With only these 120 additional images, our model’s accuracy improved by nearly ten percentage points.
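One way such a small, balanced fine-tuning set could be assembled is sketched below. This is an assumption about the general approach, not the selection procedure actually used: the function `balanced_subset`, the image identifiers, and the label layout are hypothetical, and the real subset was chosen to contain challenging images rather than drawn at random.

```python
import random


def balanced_subset(image_ids, labels, per_class=60, seed=0):
    """Pick `per_class` images from each class, reproducibly.

    `labels` holds True for images with crystals and False otherwise.
    """
    rng = random.Random(seed)  # fixed seed keeps the selection reproducible
    by_class = {}
    for image_id, label in zip(image_ids, labels):
        by_class.setdefault(label, []).append(image_id)

    subset = []
    for label in sorted(by_class):
        ids = by_class[label]
        if len(ids) < per_class:
            raise ValueError(f"class {label!r} has only {len(ids)} images")
        subset.extend((image_id, label) for image_id in rng.sample(ids, per_class))

    rng.shuffle(subset)  # mix the classes before training
    return subset


# Hypothetical pool of 1,000 labeled images, a quarter of them with crystals.
image_ids = [f"img_{i:04d}" for i in range(1000)]
labels = [i % 4 == 0 for i in range(1000)]

subset = balanced_subset(image_ids, labels, per_class=60, seed=0)
print(len(subset))
```

Fixing the seed means the same 120 images are selected on every run, which keeps a fine-tuning experiment repeatable.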

Scaling up with larger datasets, we achieved even more substantial gains. With datasets containing merely two thousand images, our model’s accuracy climbed into the 90% range, far surpassing MARCO’s performance on the same data.

The takeaway? By applying the latest techniques and tailoring ML models appropriately, we can surpass the current state of the art and significantly improve the accuracy and robustness of protein crystallization workflows. Importantly, huge local datasets are not needed to fine-tune the models to local needs. What is readily available is often enough, provided it is combined with an efficient modeling strategy.

Missed Crystals Reduced from 15% to Less Than 3% on Production Data

AstraZeneca wanted to extend the results achieved with MARCO. In collaboration with the University of York, they developed an improved internal model that seemed promising, but the crystallographers weren’t confident in its reliability.

Appsilon partnered with AstraZeneca to identify and address shortcomings in how their new model was built and assessed. This work improved the robustness of both the model and its evaluation process. Additionally, we implemented several technical enhancements to the machine learning pipeline.

The results were striking: AstraZeneca’s original model missed 15% of crystals in their production data. With our improved model, this rate dropped to less than 3%.

This dramatic improvement highlights the importance of robust methodologies, reproducible processes, and targeted technical advancements in delivering effective machine learning solutions. It also underscores how AI can significantly enhance outcomes when applied correctly.

Reproducibility is key to scaling AI in drug discovery. Dive into our article on Building Reproducible Data Pipelines to explore how they make a difference.

Pattern Detection Beyond Human Capability

The models discussed so far focus on brightfield imagery, which is the industry standard. However, experiments often collect additional data, such as UV light imagery and time-lapse images taken over multiple days.

In collaboration with a leading crystallization center, we’re developing a model that incorporates these additional data dimensions. Our aim is for the model to use this extra information to improve crystal detection even further.

Unlocking the potential hidden in the additional modalities of the data is expected to redefine detection capabilities, enabling discoveries previously thought impossible.

This highlights how AI can analyze data and images in realms beyond human cognition to provide unparalleled insights.

Discover how ML enhances RNA-ligand binding predictions for drug discovery. Learn more about our collaboration with the International Institute of Molecular and Cell Biology in Warsaw (IIMCB).

Optimize AI and ML in Your Drug Discovery Processes

AI is now an essential tool in drug development research, speeding up discovery and improving results. By identifying and addressing issues and opportunities in the AI and ML pipeline, we can adapt the models to enhance accuracy, consistency, and efficiency.

Expert knowledge and experience are critical for understanding the strengths and limitations of these models, refining them to tackle specific challenges, and ensuring they deliver meaningful, actionable results.

Not sure how to apply AI models to your drug discovery pipelines?

Contact us to discuss how we can combine technical expertise with domain insight to unlock AI’s potential.

Would you like to learn more about how machine learning transforms protein crystal detection? Learn more about our model, Crystal Clear Vision.
