Mol.E aims to design protein sequences directly from function
March 10, 2023
Natural proteins are one of the fundamental building blocks of living organisms. For decades, scientists have been engineering proteins to improve or even radically change their functions. Protein engineering, coupled with a better understanding of human disease, has led to new therapeutics for the treatment of many diseases, including cancer.
In the past few years, the use of AI/ML in protein-related research has grown rapidly, thanks to the incredible protein structure prediction results achieved by AlphaFold, which was developed by DeepMind and publicized in 2021. AlphaFold was made possible by a combination of new algorithmic developments (namely Transformers) and the availability of hardware (namely TPUs).
Introducing Mol.E
Here, we report a preview of Mol.E, a Transformer-based model developed by 310.ai and trained on NVIDIA GPUs, which is the first step toward using ML to design protein sequences directly from function. Much current work is focused on protein sequences (to take advantage of large datasets) or on protein structure (to take advantage of AlphaFold results), so why do we focus on function? Well, the biology paradigm is that sequence encodes structure, and that structure dictates function. So, ultimately, it’s the relationship between sequence and function that is most impactful.
Our most accurate version of Mol.E to date uses 16 layers, 16 attention heads, a vector dimension of 800, and a feed-forward dimension of 800. It’s trained on batches of 50 samples at 260 samples per second, over 260M samples in total. R&D work on 8 NVIDIA A10 Tensor Core GPUs allows us to compare up to 8 experiments in parallel to learn and iterate quickly. So far, we see sequence recovery (hard accuracy) of 64.1% (compared to ProGen at 45%). When we compare the predicted structures of generated and target protein sequences, between 26.2% and 29.5% of our samples have a TM-score > 0.5 (compared to FoldingDiff at 22.6%).
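As a rough sanity check on the training figures above, the short Python sketch below works out what they imply for step count and wall-clock time. It assumes the 260 samples per second rate is sustained for the entire 260M-sample run, which the figures above don't confirm, and the function and constant names are ours, purely for illustration.

```python
# Figures quoted in this post.
TOTAL_SAMPLES = 260_000_000   # 260M training samples
BATCH_SIZE = 50               # samples per batch
SAMPLES_PER_SECOND = 260      # reported throughput

SECONDS_PER_DAY = 86_400


def training_budget(total_samples: int, batch_size: int, samples_per_second: float):
    """Return (optimizer steps, seconds, days) implied by the quoted figures.

    Assumption: throughput is constant over the whole run.
    """
    steps = total_samples // batch_size
    seconds = total_samples / samples_per_second
    days = seconds / SECONDS_PER_DAY
    return steps, seconds, days


steps, seconds, days = training_budget(TOTAL_SAMPLES, BATCH_SIZE, SAMPLES_PER_SECOND)
print(f"{steps:,} steps, {seconds:,.0f} s, about {days:.1f} days")
```

Under that assumption, the run works out to 5.2 million batches and one million seconds, or a little under twelve days of training.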
Training samples | Accuracy |
---|---|
52M | 50% |
100M | 60.7% |
260M | 64.1% |
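To make the accuracy numbers above more concrete, here is a minimal Python sketch of one common way to compute sequence recovery: the fraction of positions at which a generated sequence carries the same residue as its target. This is an assumption about what "hard accuracy" means here, not a description of Mol.E's actual evaluation code. The function names and the toy sequences are hypothetical, and the sketch assumes generated and target sequences have equal length.

```python
from typing import Iterable, Tuple


def sequence_recovery(generated: str, target: str) -> float:
    """Fraction of positions where the generated residue matches the target.

    Assumption: both sequences are the same length, so they can be
    compared position by position without alignment.
    """
    if len(generated) != len(target):
        raise ValueError("sequences must have equal length")
    if not target:
        raise ValueError("sequences must be non-empty")
    matches = sum(g == t for g, t in zip(generated, target))
    return matches / len(target)


def mean_recovery(pairs: Iterable[Tuple[str, str]]) -> float:
    """Average per-sequence recovery over (generated, target) pairs."""
    scores = [sequence_recovery(g, t) for g, t in pairs]
    if not scores:
        raise ValueError("need at least one pair")
    return sum(scores) / len(scores)


# Toy example with made-up sequences (not real Mol.E outputs).
pairs = [
    ("MKTAYIAK", "MKTAYLAK"),  # 7 of 8 positions match
    ("GSHM", "GSHM"),          # identical
]
print(mean_recovery(pairs))
```

Averaging per sequence, as done here, weights short and long proteins equally. Pooling matches over all residues is another reasonable choice and can give a different number.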
One interaction at a time
While Mol.E has been tested on the design of proteins with up to 99 properties at a time, there is still much more to be done. For now, the properties we’ve focused on are qualitative and apply to a single protein. We’re excited to be tackling protein-protein interactions next and setting the stage to create truly unique proteins that nature never dreamed of.
Find out more about us at https://310.ai/