Protein Design
Protein design is the process of creating proteins with specific functions by manipulating the amino acid sequence. Scientists design proteins to create anti-cancer drugs (e.g. antibodies), gene editing medicines (e.g. Cas9), laundry detergents (e.g. amylases), digestible milk (e.g. lactases), and more. The terms protein engineering and protein optimization usually refer to small changes to a natural starting point, protein design usually refers to more extensive changes to a starting point, and de novo protein design implies creating a sequence from scratch (though it may still draw on a known starting point).
Redesign A0A1L9RXX7 residues 40-60
Sequence Design
MPM3
MPM3 is a transformer-based AI model for molecule programming that can be used for redesign, diversification, or de novo (from scratch) design of protein sequences. The latest version is MPM4.
Redesign Inputs
Protein sequence: A single chain protein sequence with length less than 300 amino acids.
Residue range: A continuous range of residues to be redesigned. E.g. 50-60.
Function (optional): A keyword describing a protein function. E.g. "leucine rich repeat" or "hydrolase".
Temperature (optional): Lower values introduce fewer changes; higher values introduce more changes.
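This page does not say how MPM3 applies temperature internally. As a minimal sketch, assuming it follows the common practice of dividing per-residue scores (logits) by the temperature before sampling, the effect can be illustrated as follows. The function name and the toy scores are hypothetical.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Convert raw scores to probabilities after dividing by temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Toy scores for three candidate residues at one position (hypothetical values).
toy_logits = [2.0, 1.0, 0.0]
low_t = softmax_with_temperature(toy_logits, 0.5)   # sharper distribution
high_t = softmax_with_temperature(toy_logits, 2.0)  # flatter distribution
```

Under this assumption, a low temperature concentrates probability on the top-scoring residue, while a high temperature flattens the distribution so that lower-scoring residues are sampled more often, which is consistent with more changes appearing in the output.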
Redesign Outputs
Protein sequence: A modified single chain protein sequence with length equal to input length. Only the residue range specified is modified.
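The input and output constraints above can be checked on a returned sequence with a few lines of code. This is a sketch only: the function name is hypothetical, the toy sequences are made up, and it assumes the residue range is 1-based and inclusive, which this page does not state.

```python
def changed_positions(original, redesigned, start, end, max_length=300):
    """Return 1-based positions that differ, verifying they lie in start..end."""
    if len(original) >= max_length:
        raise ValueError(f"input must be shorter than {max_length} residues")
    if len(redesigned) != len(original):
        raise ValueError("redesign output must keep the input length")
    if not (1 <= start <= end <= len(original)):
        raise ValueError("residue range is outside the sequence")
    # Compare residue by residue and record where the two sequences differ.
    changed = [i + 1 for i, (a, b) in enumerate(zip(original, redesigned)) if a != b]
    outside = [p for p in changed if p < start or p > end]
    if outside:
        raise ValueError(f"residues changed outside the range: {outside}")
    return changed

# Toy 10-residue sequences (hypothetical), redesigned over residues 3-7.
positions = changed_positions("MKTAYIAKQR", "MKTGYLAKQR", 3, 7)
```

A check like this is a quick way to confirm that only the requested range was modified before passing a redesigned sequence to downstream evaluation.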
Redesign Example Scripts
Redesign Q06750 residues 31-52 at temperature 1.2, and show 3 results.
Diversify Inputs
Protein sequence: A single chain protein sequence with length less than 300 amino acids.
Function (optional): A keyword describing a protein function. E.g. "leucine rich repeat" or "hydrolase".
Temperature (optional): Lower values introduce fewer changes; higher values introduce more changes.
Diversify Outputs
Protein sequence: A modified single chain protein sequence. Length is usually equal to input length, though rare insertions or deletions are possible.
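Because a diversified sequence may contain insertions or deletions, a position-by-position comparison is not always enough. The sketch below uses Python's standard difflib module to list substitutions, insertions, and deletions. The function name and toy sequences are hypothetical, and difflib is a generic text-matching heuristic rather than a biological alignment method, so a dedicated aligner would be preferable for heavily diverged sequences.

```python
from difflib import SequenceMatcher

def summarize_changes(original, variant):
    """List substitutions, insertions, and deletions between two sequences."""
    matcher = SequenceMatcher(None, original, variant, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Positions are reported 1-based relative to the original sequence.
            changes.append((tag, i1 + 1, original[i1:i2], variant[j1:j2]))
    return changes

# Toy sequences (hypothetical): one substitution, then one deletion.
substitution = summarize_changes("MKTAYIAK", "MKTGYIAK")
deletion = summarize_changes("MKTAYIAK", "MKTAYAK")
```

Each entry gives the kind of change, where it starts in the original, and the residues removed and added.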
Diversify Example Scripts
Diversify Q7L266 with temperature 2.0
From Scratch Inputs
Function: A keyword describing a protein function. E.g. "leucine rich repeat" or "hydrolase".
From Scratch Outputs
Protein sequence: A single chain protein sequence. Length less than 300 amino acids.
From Scratch Example Scripts
Create a de novo hydrolase
ProtGPT2
ProtGPT2 is a transformer-based AI method for unconditional de novo protein sequence design.
Inputs
none: Protein sequences are generated unconditionally, so no input is required.
Outputs
Protein sequence: A single chain protein sequence. Length is typically between 400 and 500 amino acids.
Example Scripts
Create 3 proteins
Structure Design
ProteinMPNN
ProteinMPNN is a graph neural network (GNN) based AI method for designing a protein sequence given a protein structure. This is called structure-based design or inverse folding.
Inputs
PDB file: A protein structure. Multiple chains are allowed.
Fixed residues: Specification of which parts of the structure to keep fixed (not designed).
Outputs
Protein sequence: A designed sequence of the same length as the input structure.
Example Scripts
Inversefold P00698
Inversefold P60568 and show 3 results
How to Evaluate Protein Design Results
Protein design results must ultimately be evaluated experimentally, but some computational evaluations are possible.
- Function and Property Prediction: When a good, independent computational predictor of the desired function or property is available, it can provide an excellent validation of designed protein sequences. Unfortunately, such a predictor is often not available.
- Structural Examination: A combination of structure prediction and expert knowledge can be used to evaluate a function that is closely tied to protein structure. For example, one can check that an enzyme's catalytic triad is present or that a known binding motif is preserved.
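As a simple sequence-level complement to structural examination, one can check whether a known motif survives in a designed sequence. The sketch below looks for the G-X-S-X-G consensus motif found in many serine hydrolases, written as a regular expression. The function names and toy sequences are hypothetical, and the position comparison assumes the design has no insertions or deletions. A sequence match does not confirm that a catalytic triad is correctly arranged in 3D; that still requires a predicted or experimental structure.

```python
import re

def motif_positions(sequence, pattern):
    """Return 1-based start positions of every (possibly overlapping) match."""
    lookahead = re.compile(f"(?=({pattern}))")
    return [m.start() + 1 for m in lookahead.finditer(sequence)]

def motif_preserved(original, designed, pattern):
    """True if the design has the motif at every position the original does."""
    wanted = motif_positions(original, pattern)
    found = set(motif_positions(designed, pattern))
    return bool(wanted) and all(p in found for p in wanted)

GXSXG = "G.S.G"          # "." matches any single residue
original = "MKAGHSMGLT"  # toy sequence with the motif at residue 4
kept = "MRAGYSQGLT"      # toy design that keeps G, S, and G in place
broken = "MKAGHAMGLT"    # toy design that mutates the central serine
```

Calling motif_preserved(original, kept, GXSXG) reports that the motif is retained, while the same call with broken reports that it was lost.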
Integration with Other Tools
Folding
Folding designed sequences can be used to verify that they are foldable and retain the desired shape or motifs required for function.
Function Prediction
Computational tools that predict functions and properties from sequences can be used to evaluate designed proteins for their intended functions and characteristics, while also screening for undesirable traits.