AI-Generated Proteins Achieve 84% Expression Success in the Lab

  • MP4 reached an 84% lab expression rate, vastly surpassing the 20–30% seen with non-AI methods.
  • Development of MP5 is underway, aiming to expand molecular programming capabilities beyond proteins.

MP4: A Generalized Protein Design Foundation Model

Our transformer-based foundation model uses both textual descriptions and quantitative variables to generate de novo protein sequences, enabling the creation of entirely new proteins and redesigns of existing ones. It was trained in Q3 2024 on 3.2 billion data points across 70 different tasks, using 3,800 AMD MI250 GPU-days.

Recently, we released thousands of de novo sequences, including 5,300 new enzymes across 1,900 functional classes.

From Design to Reality: 84% Success Rate in the Lab

We selected 96 AI-designed sequences and sent them to the lab for a variety of tests. Two could not be cloned, leaving 94 for the final round. Of those, 79 (84%) expressed successfully: they can be physically produced by ribosomes and remain soluble, confirming that they can be made as intended outside of a digital model. In the field, expression rates for computationally designed, diverse, de novo proteins would be expected to fall between 20% and 30%, which makes our 84% success rate a notable achievement. The detailed whitepaper is here, and you can explore the 96 test sequences in our MP4 expression repository.
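For readers who want to check the arithmetic, here is a minimal Python sketch that reproduces the 84% figure from the counts above. The Wilson score interval is our own addition to give a sense of the uncertainty in a sample of 94; it is not a statistic taken from the whitepaper, and the function names are illustrative.

```python
import math

def expression_rate(selected, failed_cloning, expressed):
    """Return (number tested, fraction expressed) from the raw counts."""
    tested = selected - failed_cloning
    return tested, expressed / tested

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for ~95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Counts reported in this post: 96 selected, 2 not clonable, 79 expressed.
tested, rate = expression_rate(selected=96, failed_cloning=2, expressed=79)
low, high = wilson_interval(79, tested)

print(f"79/{tested} expressed = {rate:.1%}")  # 79/94 expressed = 84.0%
print(f"Approximate 95% interval: {low:.1%} to {high:.1%}")
```

The rate here uses the 94 cloned sequences as the denominator, as in the text. Counting the two uncloned sequences as failures would use 96 instead and give a slightly lower figure.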

Figure: The numbers indicate the level of expression based on Adaptyv Bio's metrics, where higher values correspond to greater expression levels. The names represent the sequence IDs from our repository, and you can track detailed information for each sequence here.

Post-Training Validation

The 96 sequences sent to the lab were selected from a repository of 1,000+ novel sequences we published in September. The majority of these sequences demonstrate high pLDDT, a confidence score for the predicted structure that serves as a proxy for foldability, and also score high in both sequence novelty and predicted function match. Their viability was stringently tested with a combination of computational methods, as outlined in our MP4 whitepaper.
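To make the idea of screening on these three scores concrete, here is a rough Python sketch of a threshold filter. It is only an illustration: the actual selection procedure is the one described in the whitepaper, and the field names, cut-off values, ranking rule, and demo records below are all assumptions made up for this example.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Candidate:
    seq_id: str            # hypothetical ID, not a real repository entry
    plddt: float           # mean pLDDT on the usual 0-100 scale
    novelty: float         # assumed 0-1 score, higher = more novel
    function_match: float  # assumed 0-1 score, higher = better match

def select_candidates(
    candidates: List[Candidate],
    min_plddt: float = 70.0,           # assumed cut-off
    min_novelty: float = 0.5,          # assumed cut-off
    min_function_match: float = 0.5,   # assumed cut-off
    limit: int = 96,
) -> List[Candidate]:
    """Keep candidates that clear all three thresholds, best pLDDT first."""
    passing = [
        c for c in candidates
        if c.plddt >= min_plddt
        and c.novelty >= min_novelty
        and c.function_match >= min_function_match
    ]
    passing.sort(key=lambda c: c.plddt, reverse=True)
    return passing[:limit]

# Made-up records purely to exercise the filter.
pool = [
    Candidate("demo_001", 91.2, 0.82, 0.77),
    Candidate("demo_002", 64.5, 0.90, 0.81),  # fails the pLDDT cut-off
    Candidate("demo_003", 85.0, 0.31, 0.88),  # fails the novelty cut-off
    Candidate("demo_004", 78.3, 0.66, 0.59),
]
chosen = select_candidates(pool)
print([c.seq_id for c in chosen])  # ['demo_001', 'demo_004']
```

Requiring every score to clear its own threshold, rather than averaging them, means a sequence cannot compensate for low novelty with a high pLDDT; whether that matches the real pipeline is not something this post states.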

Figure: The X-axis represents the degree of sequence uniqueness and novelty compared to a database of protein sequences, while the Y-axis indicates the expression level.

What Is Next?

We’ve launched additional lab tests, including thermostability, scale-up, and crystallography, with results to be published soon. Meanwhile, we’re developing MP5, a more versatile and powerful molecular programming model.

We extend our thanks to the AMD MI250 GPU team for their support in the training process and to Adaptyv Bio for the lab support.