Structure Prediction

Folding is the problem of taking a sequence (string) and predicting its structure (3D coordinates). Also called structure prediction, folding usually applies to proteins, but RNA folding is also studied. Protein structures are useful for understanding how a protein functions and provide a starting point for making changes to improve that function. Because experimental structure determination using cryo-EM or crystallography is slow, expensive, and sometimes impossible, computational structure prediction can be very valuable.

Fold 'DIHICGICKQQFNNLDAFVAHKQSGSQ'

Protein

AlphaFold2

AlphaFold (aka AlphaFold2, AF2) is a transformer-based AI model for predicting the 3D structure of proteins from sequence. It is widely treated as the gold standard for accuracy; the newest version, AlphaFold3, may be more accurate but is not as freely available for use.

ESMFold

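ESMFold is an ML model that adapts the AlphaFold2 architecture, replacing the multiple sequence alignment input with a protein language model (ESM-2), which makes prediction much faster.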

Inputs

Protein sequence: A string of single-letter amino acid codes. Only the 20 common amino acids are supported, and the sequence must be a single chain (monomeric protein).
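The input constraints above can be checked before a sequence is submitted for folding. The following is a minimal sketch, not part of any folding tool: the function name validate_sequence is hypothetical, and stripping whitespace and upper-casing the input are assumptions made here for convenience.

```python
# The 20 common amino acids, as single-letter codes.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def validate_sequence(sequence: str) -> str:
    """Return a cleaned single-chain sequence, or raise ValueError.

    Assumption: whitespace is ignored and lowercase letters are accepted.
    """
    cleaned = "".join(sequence.split()).upper()
    if not cleaned:
        raise ValueError("sequence is empty")
    # Anything outside the 20 common codes (e.g. X, B, Z, U, '*') is rejected.
    invalid = sorted(set(cleaned) - STANDARD_AMINO_ACIDS)
    if invalid:
        raise ValueError("unsupported residue codes: " + ", ".join(invalid))
    return cleaned


if __name__ == "__main__":
    print(validate_sequence("DIHICGICKQQFNNLDAFVAHKQSGSQ"))
```

A sequence containing a chain separator or an ambiguous code such as X raises ValueError instead of being silently passed on.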

Outputs

PDB file: File containing the predicted 3D coordinates of each residue in the input sequence.
pLDDT: Stored in the B-factor column of the PDB file. A per-residue confidence score between 0 and 100; higher is better.
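Because pLDDT is written into the B-factor field, it can be read back from the PDB file with plain string slicing. The sketch below relies on the fixed-width PDB ATOM record layout (atom name in columns 13-16, B-factor in columns 61-66) and takes one value per residue from the alpha carbon (CA). The function name plddt_per_residue is hypothetical, and choosing the CA atom as the residue's representative is an assumption.

```python
from typing import List


def plddt_per_residue(pdb_text: str) -> List[float]:
    """Extract one pLDDT value per residue from the text of a PDB file.

    Assumption: the CA atom of each residue carries that residue's pLDDT.
    """
    scores = []
    for line in pdb_text.splitlines():
        # ATOM records only; HETATM entries (ligands, ions) are skipped.
        if not line.startswith("ATOM"):
            continue
        # Columns 13-16 hold the atom name (0-based slice 12:16).
        if line[12:16].strip() != "CA":
            continue
        # Columns 61-66 hold the B-factor (0-based slice 60:66).
        scores.append(float(line[60:66]))
    return scores


if __name__ == "__main__":
    import sys

    # Usage: python plddt.py predicted_structure.pdb
    with open(sys.argv[1]) as handle:
        values = plddt_per_residue(handle.read())
    print(f"{len(values)} residues, mean pLDDT {sum(values) / len(values):.1f}")
```

The returned list follows the residue order in the file, so it can be lined up directly against the input sequence.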

Example Scripts

Fold A0A7S7MT40

Fold 'GIGDPVTCLKSGAICHPVFCPRRYKQIGTCGLPGTKCCKKP'

Defensin

Nucleic Acids

AlphaFold3

AlphaFold3 (aka AF3) is the successor to AF2. It uses diffusion to predict the 3D structure of proteins together with nucleic acids, small molecules, and post-translational modifications. It handles multiple chains and is therefore also a cofolding or docking method.

RoseTTAFold All-Atom

RoseTTAFold All-Atom (RFAA) is a variation of RoseTTAFold for predicting the 3D structure of proteins together with nucleic acids, small molecules, and post-translational modifications. It handles multiple chains and is therefore also a cofolding or docking method.

RoseTTAFoldNA

RoseTTAFoldNA is a variation of RoseTTAFold for AI structure prediction of proteins together with DNA and RNA. It handles multiple chains and is therefore also a cofolding or protein-nucleic acid docking method.

DeepFoldRNA

DeepFoldRNA is a transformer-based AI model for predicting RNA 3D structure. It works on RNA only.

How to Evaluate Folding Results

  • pLDDT: Predicted local distance difference test (pLDDT) is a per-residue confidence score. 0-50: very low (correlated with disorder/flexibility), 50-70: low, 70-90: high, 90-100: very high (highly structured/stable).
  • PAE: Predicted aligned error (PAE) is a confidence score for each pair of residues. 0-5 angstroms (low): The relative position of the 2 residues is known (they move together). 20+ angstroms (high): The relative position is not known (they move independently of each other).
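The thresholds above translate directly into a small helper for summarizing a prediction. This is a minimal sketch with hypothetical function names. Two choices are assumptions rather than part of the definitions above: a score that falls exactly on a boundary (50, 70, 90) is placed in the higher band, and PAE values between 5 and 20 angstroms are labeled "intermediate".

```python
from collections import Counter
from typing import Dict, List

BANDS = ["very low", "low", "high", "very high"]


def plddt_band(score: float) -> str:
    """Map a pLDDT score (0-100) to its confidence band."""
    if not 0 <= score <= 100:
        raise ValueError("pLDDT must be between 0 and 100")
    if score < 50:
        return "very low"
    if score < 70:
        return "low"
    if score < 90:
        return "high"
    return "very high"


def band_fractions(scores: List[float]) -> Dict[str, float]:
    """Fraction of residues that fall in each pLDDT band."""
    if not scores:
        raise ValueError("no scores given")
    counts = Counter(plddt_band(s) for s in scores)
    return {band: counts[band] / len(scores) for band in BANDS}


def pae_label(pae_angstroms: float) -> str:
    """Label a single residue-pair PAE value."""
    if pae_angstroms < 0:
        raise ValueError("PAE cannot be negative")
    if pae_angstroms <= 5:
        return "confident"      # relative position is known
    if pae_angstroms >= 20:
        return "uncertain"      # relative position is not known
    return "intermediate"       # assumption: not defined in the list above


if __name__ == "__main__":
    print(band_fractions([95.0, 80.0, 60.0, 30.0]))
    print(pae_label(3.2), pae_label(27.0))
```

A structure where most residues land in the "high" or "very high" bands is usually a reasonable starting point for downstream work, while long "very low" stretches often point to disordered regions rather than a failed prediction.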

Integration with Other Tools

Docking

Folding typically refers to predicting the structure of a single-chain protein. Predicting the structure of multi-chain proteins, or of proteins together with other molecules (like small molecules), is called docking or cofolding.