Structure Prediction
Folding is the problem of taking a sequence (string) and predicting its structure (3D coordinates). Also called structure prediction, folding usually applies to proteins, but RNA folding is also studied. Protein structures are useful for understanding how a protein functions and provide a starting point for making changes to improve that function. Because experimental structure determination using cryo-EM or crystallography is slow, expensive, and sometimes impossible, computational structure prediction can be very valuable.
Fold 'DIHICGICKQQFNNLDAFVAHKQSGSQ'
Protein
AlphaFold2
AlphaFold (aka AlphaFold2, AF2) is a transformer-based AI model for predicting the 3D structure of proteins from sequence. It is the gold standard for accuracy among freely available models (the newest version, AlphaFold3, may be more accurate, but it is not as freely available for use).
ESMFold
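ESMFold is an ML model related to AlphaFold2 that trades some accuracy for higher speed. It predicts structure from a single sequence using a protein language model (ESM-2) instead of a multiple sequence alignment.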
Inputs
Protein sequence: A string of single-letter amino acid codes. Only the 20 common amino acids are supported, and the sequence must be a single-chain (monomeric) protein.
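The input constraint above can be checked before a sequence is submitted. The following is a minimal sketch in Python, not part of any folding tool: the function names `invalid_residues` and `is_valid_sequence` are hypothetical, and the check assumes that "common 20 amino acids" means the standard single-letter alphabet ACDEFGHIKLMNPQRSTVWY.

```python
# The 20 standard amino acids in single-letter code (assumed alphabet).
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def invalid_residues(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based position, letter) for every non-standard character."""
    cleaned = sequence.strip().upper()
    return [
        (position, letter)
        for position, letter in enumerate(cleaned, start=1)
        if letter not in STANDARD_AMINO_ACIDS
    ]


def is_valid_sequence(sequence: str) -> bool:
    """True if the sequence is non-empty and uses only standard residues."""
    cleaned = sequence.strip()
    return bool(cleaned) and not invalid_residues(cleaned)
```

Reporting positions rather than a bare pass/fail makes it easier to spot ambiguity codes such as X, B, or Z that often appear in database sequences.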
Outputs
PDB file: File containing the predicted 3D coordinates of each residue in the input sequence.
pLDDT: A per-residue confidence score between 0 and 100, with higher being better. It is stored in the B-factor column of the PDB file.
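Because the pLDDT values live in the B-factor column, they can be read back with plain string slicing. The sketch below assumes the output follows the standard fixed-column PDB layout (atom name in columns 13-16, chain ID in column 22, residue number in columns 23-26, B-factor in columns 61-66) and that every atom of a residue carries the same score, so only the C-alpha atom is read. The function names `read_plddt` and `mean_plddt` are hypothetical.

```python
def read_plddt(pdb_text: str) -> dict[tuple[str, int], float]:
    """Map (chain ID, residue number) to the pLDDT stored in the B-factor column."""
    scores: dict[tuple[str, int], float] = {}
    for line in pdb_text.splitlines():
        # One C-alpha atom per residue is enough to get the per-residue score.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            chain_id = line[21]
            residue_number = int(line[22:26])
            scores[(chain_id, residue_number)] = float(line[60:66])
    return scores


def mean_plddt(scores: dict[tuple[str, int], float]) -> float:
    """Average pLDDT over all residues; raises ValueError if there are none."""
    if not scores:
        raise ValueError("no C-alpha ATOM records found")
    return sum(scores.values()) / len(scores)
```

For a file on disk, the text can be loaded with `pathlib.Path(path).read_text()` and passed to `read_plddt`. A full PDB parser is a better choice when insertion codes or alternate locations matter.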
Example Scripts
Fold A0A7S7MT40
Fold 'GIGDPVTCLKSGAICHPVFCPRRYKQIGTCGLPGTKCCKKP'
Nucleic Acids
AlphaFold3
AlphaFold3 (aka AF3) is a newer version of AF2 that uses diffusion to predict the 3D structure of proteins, nucleic acids, small molecules, and post-translational modifications. It handles multiple chains and can therefore also be used as a cofolding or docking method.
RoseTTAFold All-Atom
RoseTTAFold All-Atom (RFAA) is a variation of RoseTTAFold for predicting the 3D structure of proteins, nucleic acids, small molecules, and post-translational modifications. It handles multiple chains and can therefore also be used as a cofolding or docking method.
RoseTTAFoldNA
RoseTTAFoldNA is a variation of RoseTTAFold for AI structure prediction of proteins together with DNA and RNA. It handles multiple chains and can therefore also be used as a cofolding or protein-nucleic acid docking method.
DeepFoldRNA
DeepFoldRNA is a transformer-based AI model for predicting RNA 3D structure. It works only for RNA.
How to Evaluate Folding Results
- pLDDT: Predicted local distance difference test (pLDDT) is a per-residue confidence score. 0-50: very low (correlated with disorder/flexibility), 50-70: low, 70-90: high, 90-100: very high (highly structured/stable).
- PAE: Predicted aligned error (PAE) is a residue-pair confidence score. 0-5 angstroms (low): The relative position of the two residues is known (they move together). 20+ angstroms (high): The relative position is not known (they move independently of each other).
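The thresholds above translate directly into a small lookup, sketched here in Python. Two details are assumptions rather than part of the definitions: each boundary value (50, 70, 90) is assigned to the higher band, and PAE values between 5 and 20 angstroms, which the list does not name, are labeled "intermediate". The function names are hypothetical.

```python
def plddt_band(score: float) -> str:
    """Classify a per-residue pLDDT score (0-100) into a confidence band."""
    if not 0 <= score <= 100:
        raise ValueError(f"pLDDT must be between 0 and 100, got {score}")
    if score < 50:
        return "very low"
    if score < 70:
        return "low"
    if score < 90:
        return "high"
    return "very high"


def pae_band(pae_angstroms: float) -> str:
    """Classify a residue-pair PAE value (in angstroms)."""
    if pae_angstroms < 0:
        raise ValueError(f"PAE cannot be negative, got {pae_angstroms}")
    if pae_angstroms <= 5:
        return "low"  # relative position is known
    if pae_angstroms >= 20:
        return "high"  # relative position is not known
    return "intermediate"  # assumed label; not defined in the list above
```

In practice the two scores answer different questions: pLDDT flags which residues are locally reliable, while PAE indicates whether two regions, such as separate domains, are placed reliably relative to each other.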
Integration with Other Tools
Docking
Folding typically refers to predicting the structure of a single-chain protein. Predicting the structure of multi-chain proteins, or of proteins together with other molecules (like small molecules), is called docking or cofolding.