Dexterous Drug Discovery

AI in Drug Development: Fast and Accurate Protein Crystal Detection

The process of drug discovery is notoriously costly and time-consuming. Protein crystallization is an established approach to finding complex protein targets. However, due to its complexity, it requires substantial computational resources and expertise. The latest advancements in AI offer solutions to these challenges.

astellas
Genmab
merck
johnson and johnson
World Health Organisation
Kenvue
Phuse
Pharmaverse

Executive Summary

The process of drug discovery is notoriously costly and time-consuming. Protein crystallization is an established approach to finding complex protein targets. However, due to its complexity, it requires substantial computational resources and expertise.

Machine learning is emerging as a key strategy for mitigating these costs. Yet the existing state-of-the-art model, MARCO, misses almost 10% of crystals even on its own test data and demands extensive computational effort. It is also often found to perform poorly on local data. This inefficiency leads to missed opportunities in identifying potential drug candidates, directly impacting the speed and cost of bringing new medicines to market.

Below we present a brief overview of the advancements achievable in this area, supported by specific use cases.

State of the Art Review

Traditional approach

  1. Leveraging protein crystallization for drug discovery hinges on finding the right crystallization conditions for a given protein. This is often done with high throughput screening, e.g., using robotic setups.
  2. A key bottleneck of this process is that crystallization conditions for proteins of interest are hard to find. Moreover, even when a trial does produce a crystal, a crystallographer has to spot it before it can be shipped for synchrotron analysis and used in the downstream process. When measured systematically [1-3], trained crystallographers agree on crystallization outcomes in roughly 80% (70-93%) of the observed images.
  3. As a result, while some crystals are found this way, many go unnoticed, draining resources and prolonging costly screening efforts.
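
The agreement figure quoted above can be made concrete with a small calculation. The following is a minimal sketch, assuming agreement is measured as the fraction of images on which two raters assign the same label, averaged over all rater pairs; the cited studies may define it differently. The function name, rater names and scores are hypothetical.

```python
from itertools import combinations


def pairwise_agreement(scores_by_rater):
    """Mean fraction of images on which two raters assign the same label.

    scores_by_rater maps a rater name to that rater's labels, one per image,
    with all raters scoring the same images in the same order.
    """
    raters = list(scores_by_rater)
    if len(raters) < 2:
        raise ValueError("need at least two raters")
    rates = []
    for a, b in combinations(raters, 2):
        labels_a, labels_b = scores_by_rater[a], scores_by_rater[b]
        if not labels_a or len(labels_a) != len(labels_b):
            raise ValueError("all raters must score the same non-empty image set")
        matches = sum(x == y for x, y in zip(labels_a, labels_b))
        rates.append(matches / len(labels_a))
    return sum(rates) / len(rates)


# Made-up scores for five images from three hypothetical raters.
example_scores = {
    "rater_1": ["crystal", "clear", "precipitate", "clear", "crystal"],
    "rater_2": ["crystal", "clear", "precipitate", "other", "crystal"],
    "rater_3": ["crystal", "clear", "clear", "clear", "crystal"],
}
agreement = pairwise_agreement(example_scores)
print(f"mean pairwise agreement: {agreement:.0%}")
```

Raw agreement does not correct for matches that happen by chance, so a chance-corrected statistic may be preferable when one outcome class dominates the plates.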

MARCO

  1. Introduced in 2018 [1], the MARCO model is an established state-of-the-art solution. It improves on the traditional approach above by using machine learning to automate the inspection of crystallization outcomes.
  2. The method has been developed in a wide collaboration of academic and industry partners, with the hope of producing a universal solution to the problem.
  3. MARCO is still often the go-to method for automating crystallization screening. Unfortunately, while a step forward, it falls short in the following areas:
    • Poor generalization to local data. When tested on data outside of the original dataset, MARCO’s accuracy can drop to as low as 60% [4,5], making it unusable for practical applications.
    • High computational demand. Due to its architecture, the MARCO model is hard to fine-tune with local data. The retraining is also computationally expensive.
    • Rigid structure. The original model was developed to distinguish between four classes of outcomes only. In some business cases, this is not fine-grained enough, and the original model is hard to adapt to custom requirements.
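
Both the generalization gap and the fixed class set can be checked on a laboratory's own images before committing to a model. The sketch below is illustrative only: it assumes the four coarse classes are named crystal, precipitate, clear and other, and it uses a hypothetical mapping from a lab's finer-grained scores to those classes. Neither the names nor the mapping come from the MARCO codebase.

```python
from collections import Counter

# Assumed names for the four coarse outcome classes.
COARSE_CLASSES = ("crystal", "precipitate", "clear", "other")

# Hypothetical mapping from a lab's finer-grained scores to the coarse classes.
LOCAL_TO_COARSE = {
    "microcrystals": "crystal",
    "needles": "crystal",
    "light_precipitate": "precipitate",
    "heavy_precipitate": "precipitate",
    "clear": "clear",
    "skin": "other",
}


def evaluate(true_local, predicted_coarse):
    """Return overall accuracy and per-class recall on locally labelled images."""
    if not true_local or len(true_local) != len(predicted_coarse):
        raise ValueError("need one prediction per labelled image")
    truth = [LOCAL_TO_COARSE[label] for label in true_local]
    pairs = list(zip(truth, predicted_coarse))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    totals = Counter(truth)
    hits = Counter(t for t, p in pairs if t == p)
    # Recall is only defined for classes that occur in the local labels.
    recall = {c: hits[c] / totals[c] for c in COARSE_CLASSES if totals[c]}
    return accuracy, recall


# Made-up labels and model outputs for six images.
local_labels = ["needles", "microcrystals", "clear",
                "heavy_precipitate", "skin", "light_precipitate"]
model_output = ["crystal", "clear", "clear",
                "precipitate", "other", "clear"]
accuracy, recall = evaluate(local_labels, model_output)
print(f"accuracy: {accuracy:.2f}, crystal recall: {recall['crystal']:.2f}")
```

Collapsing fine-grained scores into four classes makes an off-the-shelf model comparable with local annotations, but it discards exactly the distinctions that some business cases need.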

Crystal Clear Vision: A Breakthrough in AI-Powered Detection

  1. Crystal Clear Vision is a powerful framework for tailored machine learning models, optimized for local data, fine-tuning, deployment and customization to specific needs (e.g., image modalities, time series patterns, varying crystallization outcome classes).
  2. We have proven its effectiveness on three separate occasions, outperforming:
    • The MARCO model on the benchmark MARCO dataset [1], by 30%
    • The benchmark model on a public dataset [4], by 68%
    • The model previously used at a Top 10 Pharma company, by 81% (see below)
[Chart: Testing Crystal Clear Vision]

Insights from Domain Experts

Challenges and pain points:

It would be nice not to be obliged to inspect the plates manually because it is quite a boring process and irritating for the eyes.
- Team lead at a CRO

Protein crystals are integral in the way we do science. Getting a 2 Ångström resolution of drug-protein interaction goes a really long way in demonstrating a claim and saying that we can move on to an animal model because we have a really promising lead compound here.
- Protein scientist at top10 Biopharma

Adoption barriers:

People think that they are good at spotting protein crystals but we have proven that that is not the case.
- Professor at the University of York.

There are some people who are really averse to any change. There are people who are threatened by new technology taking their jobs. Others will think that this will give them so much more time to actually do their job.
- Protein scientist at top10 Biopharma.

Impact of AI like Crystal Clear Vision:

You want to minimize the number of experiments that you do by learning something from previous experiments. It's important to save time and money and you do that by finding the crystals as soon as you possibly can.
- Professor at the University of York

It would be nice to have a solution that would allow you to scan plates faster and even in an efficient way.
- Team lead at a CRO

ROI and practicality:

If we had a solution for this problem, it would definitely speed up the work, especially if it could be combined with some analysis of the conditions that those crystals were identified, because some screens could be redundant.
- Team lead at a CRO

Typical Problems in Protein Crystallography

Challenges in Protein Crystal Detection vs. the Bright Future with an AI-Driven Solution

  1. Challenge: High-throughput screening is time-consuming, and its repetitiveness makes it extremely difficult to stay focused when screening thousands of images in a batch.
     With AI: With more robust outcome assessments, condition searches can be shortened and time-to-crystal reduced.
  2. Challenge: Inconsistent scoring of crystallization outcomes by crystallographers makes it hard to scale searches over the space of conditions across many people.
     With AI: With a unified approach to scoring, results across crystallization attempts become comparable, so cross-attempt insights can be drawn.
  3. Challenge: Peculiar crystallization setups or techniques are not accommodated by state-of-the-art solutions; e.g., MARCO is known to struggle with LCP data, and modalities such as UV or SHG need to be inspected manually.
     With AI: Leveraging imagery beyond bright field in a single model brings the knowledge hidden in it to the forefront from the start.

Case Study: 5x Improvement Achieved at Top 10 Pharma

We have partnered with a Top 10 Pharmaceutical company to assess the performance of our approach on data from the company's crystallization laboratory. Together we curated a dataset of particular interest to the partner as a benchmark for machine learning models. Our method (based on a transformer architecture, careful weight adjustment and several generalization-enhancing techniques) significantly outperformed the previously used model on this challenging benchmark. In particular, we improved:
  • Overall accuracy from 86% to 94%
  • Recall on crystals from 85% to 97%
The recall improvement is especially important, as it corresponds to a 5-fold reduction in the likelihood of missing a crystal: the miss rate falls from 15% to 3%. This result was achieved using single bright-field images only, so we are looking forward to incorporating other modalities in the process.
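
The 5-fold figure follows directly from the two recall values, since the chance of missing a crystal is one minus the recall. A minimal sketch of that arithmetic, using the rounded percentages reported above and a hypothetical batch of 1,000 true crystals:

```python
def miss_rate(recall):
    """Fraction of true crystals the model fails to flag."""
    return 1.0 - recall


recall_before = 0.85  # previously used model, as reported above
recall_after = 0.97   # new model, as reported above

fold_reduction = miss_rate(recall_before) / miss_rate(recall_after)

# Hypothetical batch size, chosen only to make the difference tangible.
true_crystals = 1000
missed_before = round(true_crystals * miss_rate(recall_before))
missed_after = round(true_crystals * miss_rate(recall_after))

print(f"fold reduction in miss rate: {fold_reduction:.1f}")
print(f"missed per {true_crystals} crystals: {missed_before} before, {missed_after} after")
```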

Once you crystallize a protein - don’t let it go unnoticed!

Customize the AI-driven solution: from a basic off-the-shelf MARCO to a tailored multi-modal solution that works on your data and increases your efficiency.

Can you spot the crystal? Our model did!


References, Acknowledgements

[1] Bruno AE, Charbonneau P, Newman J, Snell EH, So DR, et al. (2018) Classification of crystallization outcomes using deep convolutional neural networks. PLOS ONE 13(6): e0198883. https://doi.org/10.1371/journal.pone.0198883
[2] Wilson J. Automated Classification of Images from Crystallisation Experiments. In: Perner P, editor. Advances in Data Mining. Applications in Medicine, Web Mining, Marketing, Image and Signal Mining. Springer Berlin Heidelberg. p. 459–473.
[3] Snell EH, Luft JR, Potter SA, Lauricella AM, Gulde SM, Malkowski MG, et al. Establishing a training set through the visual analysis of crystallization trials. Part I: 50 000 images. Acta Cryst D. 2008;64(11):1123–1130.
[4] Rosa N, Watkins CJ, Newman J (2023) Moving beyond MARCO. PLOS ONE 18(3): e0283124. https://doi.org/10.1371/journal.pone.0283124
[5] Milne J, Qian C, Hargreaves D, Wang Y, Wilson J (2023) Not getting in too deep: A practical deep learning approach to routine crystallisation image classification. PLOS ONE 18(3): e0282562. https://doi.org/10.1371/journal.pone.0282562
[6] We thank the numerous crystallographers, data scientists, researchers and directors for their work in the field, trust, shared opinions and data.