Protein Design

Protein design is the process of creating proteins with specific functions by manipulating the amino acid sequence. Scientists design proteins to create anti-cancer drugs (e.g. antibodies), gene editing medicines (e.g. Cas9), laundry detergents (e.g. amylases), digestible milk (e.g. lactases), and more. The terms protein engineering and protein optimization usually refer to small changes to a natural starting point, protein design usually refers to more extensive changes to some starting point, and de novo protein design implies creating a sequence from scratch (though it may still use a known starting point).

Redesign A0A1L9RXX7 residues 40-60

Sequence Design

MPM3

MPM3 is a transformer-based AI model for molecule programming that can be used for redesign, diversification, or de novo (from scratch) design of protein sequences. The latest version is MPM4.

Redesign Inputs

Protein sequence: A single chain protein sequence with length less than 300 amino acids.

Residue range: A continuous range of residues to be redesigned. E.g. 50-60.

Function (optional): A keyword describing a protein function. E.g. "leucine rich repeat" or "hydrolase".

Temperature (optional): Lower values introduce fewer changes; higher values introduce more changes.
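The effect of temperature can be illustrated with a minimal sketch. It assumes that temperature has its usual meaning in generative models, where per-residue scores are divided by the temperature before being converted to probabilities. This page does not describe how MPM3 applies temperature internally, so the function name and the example scores below are hypothetical.

```python
import math
from typing import Dict


def temperature_probabilities(scores: Dict[str, float], temperature: float) -> Dict[str, float]:
    """Convert raw scores to probabilities after dividing by `temperature`.

    Assumption: this is generic softmax temperature scaling, not a
    documented description of how MPM3 works.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {residue: score / temperature for residue, score in scores.items()}
    peak = max(scaled.values())  # subtract the maximum for numerical stability
    weights = {residue: math.exp(value - peak) for residue, value in scaled.items()}
    total = sum(weights.values())
    return {residue: weight / total for residue, weight in weights.items()}


# Hypothetical scores for one position where the original residue is leucine.
scores = {"L": 2.0, "I": 1.0, "V": 0.5}
low = temperature_probabilities(scores, 0.5)
high = temperature_probabilities(scores, 2.0)
print(f"P(L) at temperature 0.5: {low['L']:.2f}")
print(f"P(L) at temperature 2.0: {high['L']:.2f}")
```

Under this assumption, a low temperature concentrates probability on the highest-scoring residue, so fewer positions change. A high temperature flattens the distribution, so alternatives are sampled more often.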

Redesign Outputs

Protein sequence: A modified single chain protein sequence with length equal to input length. Only the residue range specified is modified.
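The two output guarantees above (same length, changes confined to the requested range) can be checked with a short script. This is a sketch, not part of the tool. It assumes residue ranges are 1-based and inclusive, which this page does not state, and the function names and sequences are made up for illustration.

```python
from typing import List, Tuple


def parse_residue_range(spec: str) -> Tuple[int, int]:
    """Parse a range such as "50-60" into (start, end), assumed 1-based inclusive."""
    start_text, end_text = spec.split("-")
    start, end = int(start_text), int(end_text)
    if start < 1 or end < start:
        raise ValueError(f"invalid residue range: {spec!r}")
    return start, end


def changed_positions(original: str, designed: str) -> List[int]:
    """Return the 1-based positions where two equal-length sequences differ."""
    if len(original) != len(designed):
        raise ValueError("redesign output should have the same length as the input")
    return [i for i, (a, b) in enumerate(zip(original, designed), start=1) if a != b]


def respects_range(original: str, designed: str, spec: str) -> bool:
    """Return True if every change falls inside the requested residue range."""
    start, end = parse_residue_range(spec)
    if end > len(original):
        raise ValueError("residue range extends past the end of the sequence")
    return all(start <= pos <= end for pos in changed_positions(original, designed))


# Made-up sequences: the only difference is I -> L at position 6.
original = "MKTAYIAKQRQISFVKSHFSRQ"
designed = "MKTAYLAKQRQISFVKSHFSRQ"
print(changed_positions(original, designed))
print(respects_range(original, designed, "5-8"))
```

A check like this is useful when redesign results are passed to downstream tools that rely on the untouched regions being identical to the input.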

Redesign Example Scripts

Redesign Q06750 residues 31-52 at temperature 1.2, and show 3 results.

Diversify Inputs

Protein sequence: A single chain protein sequence with length less than 300 amino acids.

Function (optional): A keyword describing a protein function. E.g. "leucine rich repeat" or "hydrolase".

Temperature (optional): Lower values introduce fewer changes; higher values introduce more changes.

Diversify Outputs

Protein sequence: A modified single chain protein sequence. Length is usually equal to input length, though rare insertions or deletions are possible.
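Because diversified sequences can contain insertions or deletions, a position-by-position comparison is not always enough to count changes. The sketch below uses Python's standard `difflib.SequenceMatcher` as a rough stand-in for a proper sequence alignment. It is a generic text-matching heuristic, not a biological aligner, and the function name and sequences are made up for illustration.

```python
from difflib import SequenceMatcher
from typing import Dict


def summarize_changes(original: str, variant: str) -> Dict[str, int]:
    """Count substituted, inserted, and deleted residues between two sequences."""
    matcher = SequenceMatcher(None, original, variant, autojunk=False)
    counts = {"replace": 0, "insert": 0, "delete": 0}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "replace":
            # A replaced block may differ in length; count the longer side.
            counts["replace"] += max(i2 - i1, j2 - j1)
        elif tag == "insert":
            counts["insert"] += j2 - j1
        elif tag == "delete":
            counts["delete"] += i2 - i1
    counts["length_change"] = len(variant) - len(original)
    return counts


# Made-up sequences: one substitution, then one deletion.
original = "MKTAYIAKQR"
substituted = summarize_changes(original, "MKTAYLAKQR")
shortened = summarize_changes(original, "MKTAYAKQR")
print(substituted)
print(shortened)
```

For real analyses, a dedicated pairwise alignment tool with a substitution matrix would give more meaningful results than this heuristic.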

Diversify Example Scripts

Diversify Q7L266 with temperature 2.0

From Scratch Inputs

Function: A keyword describing a protein function. E.g. "leucine rich repeat" or "hydrolase".

From Scratch Outputs

Protein sequence: A single chain protein sequence. Length less than 300 amino acids.

From Scratch Example Scripts

Create a de novo hydrolase

ProtGPT2

ProtGPT2 is a transformer-based AI method for unconditional de novo protein sequence design.

Inputs

none: Protein sequences are generated unconditionally, so no input is required.

Outputs

Protein sequence: A single chain protein sequence. Length is typically between 400 and 500 amino acids.

Example Scripts

Create 3 proteins

Structure Design

ProteinMPNN

ProteinMPNN is a GNN-based AI method for designing a protein sequence given a protein structure. This is called structure-based design or inverse folding.

Inputs

PDB file: A protein structure. Multiple chains are allowed.

Fixed residues: Specification of which parts of the structure to keep fixed (not designed).

Outputs

Protein sequence: A designed sequence of the same length as the input structure.

Example Scripts

Inversefold P00698

Inversefold P60568 and show 3 results

How to Evaluate Protein Design Results

Although protein design results must ultimately be evaluated experimentally, some computational evaluations are possible.

  • Function and Property Prediction: When a good, independent computational predictor of the desired function or property is available, it can provide an excellent validation of designed protein sequences. Unfortunately, such a predictor is often unavailable.
  • Structural Examination: A combination of structure prediction and expert knowledge can be used to evaluate a function that is closely tied to protein structure. Examples include locating the catalytic triad of an enzyme or confirming that a known binding motif is preserved.
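A quick sequence-level check can complement the structural examination described above by flagging designs that have lost a required motif or a key residue before any structure is predicted. The sketch below is illustrative only. The function names and the example sequence are made up, positions are assumed to be 1-based, and the G-x-S-x-G pattern is used as an example of a motif found around the catalytic serine in many lipases and esterases.

```python
import re
from typing import Dict, List


def motif_positions(sequence: str, pattern: str) -> List[int]:
    """Return 1-based start positions of every match of a regex motif."""
    return [match.start() + 1 for match in re.finditer(pattern, sequence)]


def residues_preserved(sequence: str, required: Dict[int, str]) -> bool:
    """Return True if each required 1-based position holds the expected residue."""
    return all(
        1 <= pos <= len(sequence) and sequence[pos - 1] == residue
        for pos, residue in required.items()
    )


# Made-up design containing G-H-S-M-G at positions 4 to 8.
design = "MKTGHSMGAA"
print(motif_positions(design, "G.S.G"))
print(residues_preserved(design, {6: "S"}))
```

Passing this check does not show that the motif is correctly positioned in three dimensions, so it is best treated as a cheap filter ahead of structure prediction and expert review.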

Integration with Other Tools

Folding

Folding a designed sequence can be used to verify that it is foldable and that it retains the shape or motifs required for function.

Function Prediction

Computational tools that predict functions and properties from sequences can be used to evaluate designed proteins for their intended functions and characteristics, while also screening for undesirable traits.