MPM3
MPM3 is a transformer-based AI model for molecule programming. It can be used for redesign, diversification, or de novo (from scratch) design of protein sequences. The latest version is MPM4.
Protein design is the process of creating proteins with specific functions by manipulating the amino acid sequence. Scientists design proteins to create anti-cancer drugs (e.g. antibodies), gene editing medicines (e.g. Cas9), laundry detergents (e.g. amylases), digestible milk (e.g. lactases), and more. The terms protein engineering and protein optimization usually refer to small changes on top of a natural starting point. The term protein design usually refers to more extensive changes on top of some starting point. The term de novo protein design implies creating a sequence from scratch, though the design may still have a known starting point.
Redesign A0A1L9RXX7 residues 40-60
Redesign Inputs
Protein sequence: A single chain protein sequence with length less than 300 amino acids.
Residue range: A continuous range of residues to be redesigned. E.g. 50-60.
Function (optional): A keyword describing a protein function. E.g. "leucine rich repeat" or "hydrolase".
Temperature (optional): Lower values introduce fewer changes; higher values introduce more changes.
Redesign Outputs
Protein sequence: A modified single chain protein sequence with length equal to input length. Only the residue range specified is modified.
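The redesign contract described above (output length equals input length, and only the specified residue range changes) can be checked mechanically. The following is a minimal sketch, not part of MPM3 itself. It assumes that residue ranges such as "50-60" are 1-based and inclusive, which the text does not state, and the function names are hypothetical.

```python
def parse_residue_range(text):
    """Parse a range such as "50-60" into (start, end).

    Assumption: positions are 1-based and inclusive.
    """
    start_text, end_text = text.split("-")
    start, end = int(start_text), int(end_text)
    if start < 1 or end < start:
        raise ValueError(f"invalid residue range: {text!r}")
    return start, end


def only_range_modified(original, redesigned, residue_range):
    """Return True if the two sequences have equal length and are
    identical everywhere outside the given residue range."""
    start, end = parse_residue_range(residue_range)
    if end > len(original):
        raise ValueError("residue range extends past the end of the sequence")
    if len(redesigned) != len(original):
        return False
    # Compare the prefix before the range and the suffix after it.
    return (original[:start - 1] == redesigned[:start - 1]
            and original[end:] == redesigned[end:])


if __name__ == "__main__":
    original = "MKTAYIAKQR"    # made-up toy sequence
    redesigned = "MKTAWLGKQR"  # residues 5-7 changed
    print(only_range_modified(original, redesigned, "5-7"))
```

A check like this is useful when post-processing many results, since it catches any output that changes residues outside the requested window.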
Redesign Examples
Redesign Q06750 residues 31-52 at temperature 1.2, and show 3 results.
Diversify Inputs
Protein sequence: A single chain protein sequence with length less than 300 amino acids.
Function (optional): A keyword describing a protein function. E.g. "leucine rich repeat" or "hydrolase".
Temperature (optional): Lower values introduce fewer changes; higher values introduce more changes.
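The effect of temperature can be illustrated with the way it is commonly applied in generative models: the model's scores (logits) are divided by the temperature before being turned into probabilities. The sketch below assumes MPM3 uses temperature in this conventional way, which the text does not confirm. The logits are made-up numbers for three candidate amino acids at one position.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    largest = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - largest) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    # Hypothetical scores: the first entry stands for the original residue.
    logits = [2.0, 1.0, 0.0]
    for temperature in (0.5, 1.0, 2.0):
        probs = softmax_with_temperature(logits, temperature)
        print(temperature, [round(p, 3) for p in probs])
```

Under this assumption, a low temperature concentrates probability on the highest-scoring residue (fewer changes), while a high temperature spreads probability across alternatives (more changes).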
Diversify Outputs
Protein sequence: A modified single chain protein sequence. Length is usually equal to input length, though rare insertions or deletions are possible.
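Because diversified outputs can contain substitutions as well as rare insertions or deletions, it helps to summarize how a variant differs from its input. The sketch below uses Python's standard `difflib.SequenceMatcher` as a rough stand-in for a proper biological sequence alignment. The function name is hypothetical and the counts are approximate when edits are clustered.

```python
from difflib import SequenceMatcher


def summarize_changes(original, variant):
    """Count approximate substitutions, insertions, and deletions."""
    matcher = SequenceMatcher(None, original, variant, autojunk=False)
    counts = {"replace": 0, "insert": 0, "delete": 0}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            # A replaced block may differ in length on each side.
            counts["replace"] += max(i2 - i1, j2 - j1)
        elif tag == "insert":
            counts["insert"] += j2 - j1
        elif tag == "delete":
            counts["delete"] += i2 - i1
    return counts


if __name__ == "__main__":
    original = "MKTAYIAKQR"  # made-up toy sequence
    print(summarize_changes(original, "MKTAWIAKQR"))  # one substitution
    print(summarize_changes(original, "MKTAIAKQR"))   # one deletion
```

Comparing these counts across outputs generated at different temperatures gives a quick view of how much each setting changes the sequence.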
Diversify Examples
Diversify Q7L266 with temperature 2.0
From Scratch Inputs
Function: A keyword describing a protein function. E.g. "leucine rich repeat" or "hydrolase".
From Scratch Outputs
Protein sequence: A single chain protein sequence with length less than 300 amino acids.
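A generated sequence can be sanity-checked against the stated output format before further analysis. This is a minimal sketch with a hypothetical function name. It assumes outputs use only the 20 standard one-letter amino acid codes, which the text does not state.

```python
# Assumption: outputs use only the 20 standard amino acid letters.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def is_valid_sequence(sequence, max_length=300):
    """Return True for a non-empty sequence shorter than max_length
    that contains only standard amino acid letters."""
    return (0 < len(sequence) < max_length
            and set(sequence) <= STANDARD_AMINO_ACIDS)


if __name__ == "__main__":
    print(is_valid_sequence("MKTAYIAKQR"))  # made-up toy sequence
    print(is_valid_sequence("MKTAYIAKQX"))  # contains a non-standard letter
```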
From Scratch Examples
Create a de novo hydrolase
Create an asparagine protein
Analyze Protein Design Results
Although protein design results must ultimately be evaluated experimentally, some computational evaluations are possible.
- Function and Property Prediction: When a good, independent computational predictor of the desired function or property is available, it can provide an excellent validation of designed protein sequences. Unfortunately, such a predictor is usually not available.
- Structural Examination: A combination of structure prediction and expert knowledge can be used to evaluate a function that is closely tied to protein structure. Examples include locating the catalytic triad of an enzyme or confirming that a known binding motif is preserved.
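Before running structure prediction, a quick sequence-level check can confirm that a known motif survived the design. The sketch below searches for the G-x-S-x-G pentapeptide found around the active-site serine of many serine hydrolases, written here as the regular expression `G.S.G`. The function names and the toy sequences are hypothetical, and a sequence match is only a first filter, not a substitute for structural examination.

```python
import re


def find_motif(sequence, pattern):
    """Return (1-based position, matched text) for each motif occurrence."""
    return [(m.start() + 1, m.group()) for m in re.finditer(pattern, sequence)]


def motif_preserved(original, designed, pattern):
    """Return True if the motif occurs in both sequences."""
    return bool(find_motif(original, pattern)) and bool(find_motif(designed, pattern))


if __name__ == "__main__":
    gxsxg = r"G.S.G"           # serine hydrolase consensus motif
    original = "AAGHSMGAA"     # made-up toy sequence containing the motif
    designed = "AVGYSLGTA"     # made-up variant that keeps the motif
    print(find_motif(original, gxsxg))
    print(motif_preserved(original, designed, gxsxg))
```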