DiffDock for drug discovery: good, bad, and ugly

Molecular docking plays a crucial role in the initial computational steps of the in silico drug design pipeline. Because many drugs are small molecules (ligands), accurately predicting their conformation and placement on the protein surface (binding) is essential for modifying the biological function of the target molecules.

Traditional docking methods evaluate calculated poses with empirical free-energy scoring functions, which approximate rather than represent the actual thermodynamic energy. These functions estimate the binding affinity of each pose (the atomic configuration of a ligand).

Here, poses can be understood in two ways: 1) a global, coarse exploration of different sites on the surface of the target molecule (when the location of the binding pocket is unknown), and 2) a local, precise investigation of potential ligand conformations in a specific region (when the location of the binding pocket is known). In both cases, the scoring function assesses the quality or likelihood of a pose, and the search part of the algorithm stochastically modifies the pose to find the global optimum (minimum) of the scoring function.
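The search half of such an algorithm can be sketched as a Metropolis-style random walk. This is a toy illustration under strong assumptions, not the search used by any particular docking program: the pose is reduced to a single 3D position, and `toy_score` and `POCKET` are hypothetical stand-ins for a real scoring function and binding site.

```python
import math
import random

POCKET = (1.0, -2.0, 0.5)  # hypothetical pocket centre for the toy score


def toy_score(pose):
    """Hypothetical scoring function: squared distance to POCKET (lower is better)."""
    return sum((p - c) ** 2 for p, c in zip(pose, POCKET))


def stochastic_search(score, start, steps=5000, step_size=0.5, temperature=0.1, seed=0):
    """Randomly perturb the pose and accept moves by the Metropolis criterion."""
    rng = random.Random(seed)
    current, current_score = start, score(start)
    best, best_score = current, current_score
    for _ in range(steps):
        candidate = tuple(x + rng.gauss(0.0, step_size) for x in current)
        candidate_score = score(candidate)
        delta = candidate_score - current_score
        # Always accept improvements; accept worse poses with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score < best_score:
                best, best_score = current, current_score
    return best, best_score


best_pose, best_value = stochastic_search(toy_score, start=(10.0, 10.0, 10.0))
print(best_pose, best_value)
```

Occasionally accepting a worse pose is what lets this kind of search escape local minima; real tools add many refinements on top of this basic loop.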

Various tools employ different parameters and equations to calculate such scores. For instance, the widely used program AutoDock (Morris et al. 2009) applies a weighted sum of physicochemical terms such as van der Waals and electrostatic interactions, hydrogen bonds, torsion strain, and others. However, since these scores mimic rather than represent actual binding energy, their reliability is imperfect and varies significantly depending on the tool used.
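To make the "weighted sum of terms" idea concrete, here is a minimal sketch. The weights, functional forms, and parameter values are illustrative assumptions chosen for readability; they are not AutoDock's calibrated coefficients or its actual energy terms.

```python
# Illustrative weights only; NOT AutoDock's calibrated coefficients.
WEIGHTS = {"vdw": 0.2, "elec": 0.1, "hbond": 0.1, "torsion": 0.3}


def lennard_jones(r, epsilon=0.2, r_min=3.5):
    """12-6 van der Waals term with its minimum (-epsilon) at r = r_min (in Å)."""
    ratio = r_min / r
    return epsilon * (ratio ** 12 - 2 * ratio ** 6)


def coulomb(r, q1, q2, dielectric=4.0):
    """Electrostatic term in kcal/mol (332.06 converts e^2/Å to kcal/mol)."""
    return 332.06 * q1 * q2 / (dielectric * r)


def score_pose(atom_pairs, n_hbonds, n_rotatable_bonds):
    """Weighted sum of pairwise and per-ligand terms; lower is better.

    atom_pairs: list of (distance_in_angstrom, ligand_charge, protein_charge).
    """
    vdw = sum(lennard_jones(r) for r, _, _ in atom_pairs)
    elec = sum(coulomb(r, q1, q2) for r, q1, q2 in atom_pairs)
    hbond = -1.0 * n_hbonds              # crude reward per hydrogen bond
    torsion = float(n_rotatable_bonds)   # crude penalty per rotatable bond
    return (WEIGHTS["vdw"] * vdw + WEIGHTS["elec"] * elec
            + WEIGHTS["hbond"] * hbond + WEIGHTS["torsion"] * torsion)


pairs = [(3.5, -0.4, 0.3), (4.0, 0.1, -0.2)]
print(score_pose(pairs, n_hbonds=1, n_rotatable_bonds=2))
```

Changing any weight changes the ranking of poses, which is one reason scores from different tools are not directly comparable.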

The DiffDock authors (Corso et al. 2022) propose a different approach to molecular docking. Rather than treating it as a search problem, DiffDock frames docking as a generative modeling problem over possible ligand poses. In other words, "given a ligand and target protein structure, DiffDock learns a distribution over ligand poses". The "Diff" in DiffDock refers to diffusion, which, in this context, can be seen as "the progressive refinement of random poses via updates of their translations, rotations, and torsion angles".

Previous AI-based docking models (Stärk et al. 2022; Lu et al. 2022) treated the problem as regression. However, a regression model tends to predict the mean of the distribution, which may lie in a region of low density. In scenarios such as global symmetry in proteins (aleatoric uncertainty), regression models may therefore fail to capture the diverse valid poses. Even without strong aleatoric uncertainty, epistemic uncertainty can cause problems in regression models, such as steric clashes in EquiBind's (Stärk et al. 2022) predictions and self-intersections in TANKBind's (Lu et al. 2022) predictions.

| Regression models | Generative models |
| --- | --- |
| EquiBind, TANKBind | DiffDock |
| Predict the mean of the distribution, potentially in a region of low density | Sample all true poses, overcoming uncertainty in correct poses |
| Have steric clashes and numerous self-intersections even with moderate global symmetry (aleatoric uncertainty) | More accurate and excels in challenging scenarios, such as handling global symmetry in proteins (aleatoric uncertainty) |
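The "mean in a low-density region" problem is easy to reproduce in one dimension. The sketch below is a toy, not DiffDock or EquiBind: it assumes two hypothetical, equally likely binding sites on a line, uses the sample mean as the best constant predictor under squared error, and imitates a generative model by simply resampling observed poses.

```python
import random
import statistics

rng = random.Random(0)
SITES = (-5.0, 5.0)  # hypothetical symmetric binding sites on a 1D axis


def sample_true_pose():
    """Draw a pose near one of the two equally likely sites."""
    return rng.choice(SITES) + rng.gauss(0.0, 0.3)


def distance_to_nearest_site(x):
    return min(abs(x - s) for s in SITES)


observed = [sample_true_pose() for _ in range(2000)]

# The constant that minimises mean squared error is the sample mean.
regression_prediction = statistics.fmean(observed)

# A generative model is imitated here by resampling the observed poses.
generated = [rng.choice(observed) for _ in range(5)]

print(distance_to_nearest_site(regression_prediction))
print([round(distance_to_nearest_site(x), 2) for x in generated])
```

The regression-style prediction lands roughly midway between the two sites, where no true pose exists, while every sampled pose sits close to one of them.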

Testing in real-life applications

Despite achieving a higher percentage of successful dockings compared to other tools, DiffDock's results still require additional verification before practical usage in scientific applications. Let’s take a look at how it works in action. I tested DiffDock with the well-known complexes involving aspirin, ibuprofen, and paracetamol with their respective targets. You can find the details and PDB IDs of these complexes in the “Drug design for beginners” blog post. Targets are highlighted in light gray, while the drugs in their original binding pose and pocket (as depicted in the experimental PDB structures) are presented in dark gray.

[Figure: DiffDock, testing in real-life applications]

All the poses predicted by DiffDock are shown in color. DiffDock successfully predicts the position of ibuprofen, with all 40 predictions falling into the binding pocket in appropriate conformations. In contrast, the paracetamol docking results are poor: only a few predictions bind close to the native binding site, while the majority of poses are located outside the pocket. Aspirin falls in the middle: the poses are in the correct binding pocket, but some are translated or flipped relative to the native pose.

Overall, DiffDock should be used with caution: you can't rely on every prediction it makes. For example, if the predicted poses look like those in the paracetamol case (spread around the surface), then the model is not confident in its results and you shouldn't trust them. However, the general trend is that when all the poses are close to one another and placed in the same region of the target molecule surface (as in the ibuprofen example), you can trust the results more. Thanks to the authors of the model, we also have a rough practical interpretation of DiffDock confidence scores: treat a score above 0 as indicating a good prediction, and a negative score as indicating moderate to low confidence (the more negative the score, the less confident the model and the worse the prediction).
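These two rules of thumb (pose clustering and the sign of the confidence score) can be combined into a quick triage step. The sketch below is only an illustration: the `triage` helper, its labels, and the 5 Å `max_spread` cutoff are assumptions made for this example, not values published by the DiffDock authors.

```python
import math


def centroid(coords):
    """Mean position of a pose given as a list of (x, y, z) tuples."""
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))


def pose_spread(poses):
    """Largest distance (Å) between the centroids of any two poses."""
    centres = [centroid(p) for p in poses]
    return max(
        (math.dist(a, b) for i, a in enumerate(centres) for b in centres[i + 1:]),
        default=0.0,
    )


def triage(poses, top_confidence, max_spread=5.0):
    """Rough trust label; max_spread is an assumed cutoff, not a published one."""
    confident = top_confidence > 0               # rule of thumb quoted in the text
    clustered = pose_spread(poses) <= max_spread
    if confident and clustered:
        return "trust more"
    if confident or clustered:
        return "treat with caution"
    return "do not trust"


tight = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)], [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0)]]
scattered = tight + [[(30.0, 0.0, 0.0), (31.0, 0.0, 0.0)]]
print(triage(tight, top_confidence=0.4))
print(triage(scattered, top_confidence=-2.1))
```

Even a "trust more" label is only a prompt for closer inspection, since clustered poses can still be clustered in the wrong place.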

Architecture insights

The main idea behind DiffDock is that “Any ligand pose consistent with a seed conformation can be reached by a combination of (1) ligand translations, (2) ligand rotations, and (3) changes to torsion angles”. Hence, the authors construct the score model and the confidence model to take as input the current ligand pose and the protein structure in 3D space. Both models use SE(3)-equivariant convolutional networks over point clouds, but the score model operates on a coarse-grained representation of protein structures (C-alpha), while the confidence model utilizes an all-atom structure. The representation involves heterogeneous geometric graphs, language model embeddings, and convolution operations for translational, rotational, and torsional scores. The multiscale setup improves performance and speeds up the process compared to atomic-scale approaches. The architectural components include initial features, distance-based connections, and specific convolutional operations tailored to each model's requirements.
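The three kinds of pose update named in the quote can be written down with elementary geometry. The following is a minimal sketch under stated assumptions: the function names, the order in which the updates are applied, and the explicit list of atoms moved by each torsion are choices made for this example, and it does not reproduce how DiffDock itself parameterizes or couples these updates.

```python
import math


def rotate_about_axis(point, origin, axis, angle):
    """Rotate point around the line through origin along axis (Rodrigues' formula)."""
    norm = math.sqrt(sum(a * a for a in axis))
    k = [a / norm for a in axis]
    v = [p - o for p, o in zip(point, origin)]
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    cross = [k[1] * v[2] - k[2] * v[1],
             k[2] * v[0] - k[0] * v[2],
             k[0] * v[1] - k[1] * v[0]]
    dot = sum(ki * vi for ki, vi in zip(k, v))
    return tuple(o + vi * cos_a + ci * sin_a + ki * dot * (1 - cos_a)
                 for o, vi, ci, ki in zip(origin, v, cross, k))


def update_pose(coords, translation, rot_axis, rot_angle, torsions):
    """Apply torsion changes, then a rotation about the centroid, then a translation.

    torsions: list of (i, j, moving_atoms, angle); the atoms listed in
    moving_atoms rotate about the bond from atom i to atom j.
    """
    coords = [tuple(c) for c in coords]
    for i, j, moving, angle in torsions:
        origin = coords[j]
        axis = [b - a for a, b in zip(coords[i], coords[j])]
        for m in moving:
            coords[m] = rotate_about_axis(coords[m], origin, axis, angle)
    n = len(coords)
    centre = tuple(sum(c[d] for c in coords) / n for d in range(3))
    coords = [rotate_about_axis(c, centre, rot_axis, rot_angle) for c in coords]
    return [tuple(x + t for x, t in zip(c, translation)) for c in coords]


# Toy four-atom chain; twist the last atom by 180 degrees about the 1 -> 2 bond.
chain = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 1.0, 0.0)]
twisted = update_pose(chain, (0.0, 0.0, 0.0), (0.0, 0.0, 1.0), 0.0,
                      [(1, 2, [3], math.pi)])
print(twisted)
```

Because every update is a rotation or a translation, bond lengths are preserved, so the ligand's local chemistry stays intact while its pose changes.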

This architecture achieves a higher success rate when docking into both experimental and predicted target structures. The evaluation used two metrics: the percentage of predictions with a heavy-atom RMSD < 2 Å, and the median heavy-atom RMSD between the predicted and the experimental ligand atoms. The experimental structures were 363 PDBBind (Wang et al. 2005) complexes from 2019 that were used for testing the model. DiffDock outperforms state-of-the-art tools such as SMINA (Koes, Baumgartner, and Camacho 2013), QuickVina-Q (Alhossary et al. 2015), GLIDE (Yang et al. 2021), GNINA (McNutt et al. 2021), AutoDock Vina (Eberhardt et al. 2021), EquiBind (Stärk et al. 2022), and TANKBind (Lu et al. 2022). It is also 3 to 12 times faster than the best search-based method, although the improvement in accuracy is only moderate. This speed is particularly valuable for applications like virtual screening and reverse screening for drug candidates or protein targets.
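Both evaluation metrics are simple to compute once predicted and experimental ligand coordinates are available. This sketch assumes the heavy atoms of the two poses are listed in the same order and applies no symmetry correction, which a careful benchmark would need; the function names and the example RMSD values are made up for illustration.

```python
import math
import statistics


def rmsd(pred, ref):
    """Root-mean-square deviation (Å) between matched heavy-atom coordinates."""
    if len(pred) != len(ref):
        raise ValueError("poses must contain the same number of atoms")
    squared = sum(math.dist(p, r) ** 2 for p, r in zip(pred, ref))
    return math.sqrt(squared / len(pred))


def summarize(rmsds, threshold=2.0):
    """Return (percentage of predictions with RMSD < threshold, median RMSD)."""
    success = 100.0 * sum(r < threshold for r in rmsds) / len(rmsds)
    return success, statistics.median(rmsds)


native = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
predicted = [(0.0, 1.0, 0.0), (1.5, 1.0, 0.0)]
print(rmsd(predicted, native))

example_rmsds = [0.8, 1.5, 3.2, 6.0]  # made-up values, one per complex
print(summarize(example_rmsds))
```

Reporting the median alongside the success rate matters because a handful of badly misplaced poses can dominate a mean.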

The availability of Google Colab and 310 copilot notebooks makes running DiffDock convenient. As input, you provide a ligand in SMILES (Weininger 1988) notation and the structure of your protein target in PDB format. Another strength is its compatibility with predicted structures, on which it yields similar accuracy. DiffDock’s “ability to generalize to imperfect structures, even without retraining, can be attributed to a combination of (1) the robustness of the diffusion model to small perturbations in the backbone atoms, and (2) the fact that DiffDock does not use the exact position of side chains in the score model and is therefore forced to implicitly model their flexibility”. This highlights the importance of dynamics in understanding and predicting protein-ligand complexes. Hence, methods like molecular dynamics (MD) simulations, or the new AI tools (Jing, Berger, and Jaakkola 2024) that try to replace or accelerate them, are of paramount importance for the next leap in structural biology after AlphaFold2 (Jumper et al. 2021).

References

Alhossary, Amr, Stephanus Daniel Handoko, Yuguang Mu, and Chee-Keong Kwoh. 2015. “Fast, Accurate, and Reliable Molecular Docking with QuickVina 2.” Bioinformatics 31 (13): 2214–16. https://doi.org/10.1093/bioinformatics/btv082.

Corso, Gabriele, Hannes Stärk, Bowen Jing, Regina Barzilay, and Tommi Jaakkola. 2022. “DiffDock: Diffusion Steps, Twists, and Turns for Molecular Docking.” https://doi.org/10.48550/ARXIV.2210.01776. 

Eberhardt, Jerome, Diogo Santos-Martins, Andreas F. Tillack, and Stefano Forli. 2021. “AutoDock Vina 1.2.0: New Docking Methods, Expanded Force Field, and Python Bindings.” Journal of Chemical Information and Modeling 61 (8): 3891–98.

Jing, Bowen, Bonnie Berger, and Tommi Jaakkola. 2024. “AlphaFold Meets Flow Matching for Generating Protein Ensembles.” https://doi.org/10.48550/ARXIV.2402.04845.

Jumper, John, Richard Evans, Alexander Pritzel, Tim Green, Michael Figurnov, Olaf Ronneberger, Kathryn Tunyasuvunakool, et al. 2021. “Highly Accurate Protein Structure Prediction with AlphaFold.” Nature 596 (7873): 583–89. https://doi.org/10.1038/s41586-021-03819-2.

Koes, David Ryan, Matthew P. Baumgartner, and Carlos J. Camacho. 2013. “Lessons Learned in Empirical Scoring with Smina from the CSAR 2011 Benchmarking Exercise.” Journal of Chemical Information and Modeling 53 (8): 1893–1904. https://doi.org/10.1021/ci300604z.

Lu, Wei, Qifeng Wu, Jixian Zhang, Jiahua Rao, Chengtao Li, and Shuangjia Zheng. 2022. “TANKBind: Trigonometry-Aware Neural NetworKs for Drug-Protein Binding Structure Prediction.” bioRxiv. https://doi.org/10.1101/2022.06.06.495043.

McNutt, Andrew T., Paul Francoeur, Rishal Aggarwal, Tomohide Masuda, Rocco Meli, Matthew Ragoza, Jocelyn Sunseri, and David Ryan Koes. 2021. “GNINA 1.0: Molecular Docking with Deep Learning.” Journal of Cheminformatics 13 (1): 43. https://doi.org/10.1186/s13321-021-00522-2.

Morris, Garrett M., Ruth Huey, William Lindstrom, Michel F. Sanner, Richard K. Belew, David S. Goodsell, and Arthur J. Olson. 2009. “AutoDock4 and AutoDockTools4: Automated Docking with Selective Receptor Flexibility.” Journal of Computational Chemistry 30 (16): 2785–91. https://doi.org/10.1002/jcc.21256.

Stärk, Hannes, Octavian-Eugen Ganea, Lagnajit Pattanaik, Regina Barzilay, and Tommi Jaakkola. 2022. “EquiBind: Geometric Deep Learning for Drug Binding Structure Prediction.” https://doi.org/10.48550/ARXIV.2202.05146.

Wang, Renxiao, Xueliang Fang, Yipin Lu, Chao-Yie Yang, and Shaomeng Wang. 2005. “The PDBbind Database: Methodologies and Updates.” Journal of Medicinal Chemistry 48 (12): 4111–19. https://doi.org/10.1021/jm048957q.

Weininger, David. 1988. “SMILES, a Chemical Language and Information System. 1. Introduction to Methodology and Encoding Rules.” Journal of Chemical Information and Computer Sciences 28 (1): 31–36. https://doi.org/10.1021/ci00057a005.

Yang, Ying, Kun Yao, Matthew P. Repasky, Karl Leswing, Robert Abel, Brian K. Shoichet, and Steven V. Jerome. 2021. “Efficient Exploration of Chemical Space with Docking and Deep Learning.” Journal of Chemical Theory and Computation 17 (11): 7106–19. https://doi.org/10.1021/acs.jctc.1c00810.