DiffDock, but better

The latest version of DiffDock, DiffDock-L (Corso et al. 2024), improves blind-docking accuracy by up to 50% compared to its predecessor and other tools. The improvement is attributed to more training data, larger models, and novel synthetic data generation techniques. Like the previous version, it uses a two-step approach: a diffusion model samples candidate ligand poses, and a confidence model then scores and ranks each pose.

Traditional evaluations of machine learning (ML) docking methods often overlook generalization across different protein structures. This matters especially for binding pockets, which are highly conserved regions in proteins: even proteins with low sequence similarity can share similar binding pockets, as illustrated by two proteins that have only 22% amino acid sequence similarity yet have similar binding pockets. It also highlights a weakness of existing docking benchmarks: they often lack diversity in binding modes, since most protein-ligand interactions are relatively similar due to their vital functions, which makes it difficult to assess how well a method generalizes across the proteome.

Recreated based on the original paper

To address this issue, the authors examined the domain organization of proteins instead of relying on the previously used time-based separation of the train and test datasets. Protein domains are distinct, stable structural units that often play similar roles across different proteins and share evolutionary origins. Using domains, Corso et al. classified protein-ligand complexes more thoroughly, which ensured that proteins sharing a domain appeared either only in the train dataset or only in the test dataset. As a result, they proposed DockGen, a benchmark that evaluates how well ML models generalize across protein domains. It revealed that current ML-based docking methods are inadequate at predicting binding poses for unseen pockets.
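To make the idea of a domain-based split concrete, here is a minimal sketch in Python. It is not the DockGen pipeline: the complex IDs and domain labels are made up, and it assumes each complex carries exactly one domain label, which is a simplification. The only point is that whole domains, rather than individual complexes, are assigned to train or test.

```python
import random
from collections import defaultdict

def split_by_domain(complexes, test_fraction=0.2, seed=0):
    """Split complexes so that no domain label appears in both train and test.

    `complexes` is a list of (complex_id, domain_label) pairs. Both fields are
    hypothetical placeholders for whatever annotation is actually available.
    """
    # Group complex IDs under their domain label.
    by_domain = defaultdict(list)
    for complex_id, domain in complexes:
        by_domain[domain].append(complex_id)

    # Shuffle the domains (not the complexes) reproducibly.
    domains = sorted(by_domain)
    random.Random(seed).shuffle(domains)

    # Hold out whole domains for the test set.
    n_test = max(1, round(len(domains) * test_fraction))
    test_domains = set(domains[:n_test])

    train = [c for d in domains if d not in test_domains for c in by_domain[d]]
    test = [c for d in domains if d in test_domains for c in by_domain[d]]
    return train, test, test_domains

# Made-up example data: seven complexes spread over five domains.
complexes = [
    ("cplx_a", "kinase"), ("cplx_b", "kinase"),
    ("cplx_c", "protease"), ("cplx_d", "nuclear_receptor"),
    ("cplx_e", "protease"), ("cplx_f", "globin"),
    ("cplx_g", "lipocalin"),
]
train_ids, test_ids, held_out = split_by_domain(complexes)
```

With a time-based split, two complexes from the same domain can easily end up on opposite sides of the cutoff date. Here that cannot happen by construction.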

Docking, along with protein folding and other structural biology challenges, is often approached as an NP-hard combinatorial optimization problem: “The NP perspective suggests a useful insight into the problem: it is easier to check that a pose is good than to generate a good pose.” 

From this perspective, evaluating pose quality is simpler than generating good poses. This suggests that self-training strategies, in which feedback from a discriminative model refines the generative model's exploration of conformation space, should generalize better to protein domains that lack ground truth poses. In other words, the procedure can be divided into three steps: "(1) rolling out the steps of the generation process, (2) evaluating the success at the end, and (3) feeding back information from the end to the initial steps." With this in mind, the authors proposed Confidence Bootstrapping, which improves the diffusion sampling component using feedback from the confidence model. It raised the success rate on the DockGen benchmark from 10% to 24%.
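The three-step loop can be illustrated with a deliberately tiny toy in Python. Everything here is an assumption made for illustration: the "pose" is a single number, the generator is a Gaussian rather than a diffusion model, and the `confidence` function is a hand-written stand-in for a trained confidence model. Refitting the generator on its most confident samples is closer to the classic cross-entropy method than to the fine-tuning used in the paper, but the feedback structure is the same.

```python
import random
import statistics

def confidence(pose, pocket=3.0):
    """Stand-in confidence model: scores a 1-D 'pose' higher the closer it is
    to a pocket position that the generator never sees directly."""
    return -abs(pose - pocket)

def bootstrap(mean=0.0, std=2.0, rounds=5, n_samples=200, keep=0.2, seed=0):
    """Toy self-training loop: sample, score, refit the generator."""
    rng = random.Random(seed)
    for _ in range(rounds):
        # (1) Roll out the generator to get candidate poses.
        poses = [rng.gauss(mean, std) for _ in range(n_samples)]
        # (2) Evaluate the end result with the confidence model.
        poses.sort(key=confidence, reverse=True)
        elite = poses[: int(n_samples * keep)]
        # (3) Feed the result back: refit the generator on confident samples.
        mean = statistics.fmean(elite)
        std = max(statistics.stdev(elite), 1e-3)  # avoid collapsing to zero
    return mean, std

final_mean, final_std = bootstrap()
print(f"generator now centred at {final_mean:.2f} (std {final_std:.3f})")
```

No ground truth pose is ever shown to the generator. It only receives the ranking produced by the scoring function, which is what makes this kind of loop attractive for domains without known poses.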

Intrigued? You can already try DiffDock Web with the latest DiffDock-L today!

Let’s compare the old and the new versions of DiffDock. The upper panel shows the results from the old DiffDock (you can read more about them in our previous blog post), and the bottom panel shows the predictions made by the newest DiffDock-L. We used the same three systems as before: the ibuprofen (PDB ID: 3P6H), aspirin (PDB ID: 1OXR), and paracetamol (PDB ID: 3PY4) complexes. Note that DiffDock-L outputs only 10 predictions by default, compared to the previous 40. Some of the differences may also stem from the different resources used for the predictions: we ran the old DiffDock on Google Colab and DiffDock-L on Hugging Face.

We don’t see many differences in the aspirin and ibuprofen cases: the predicted ligand poses are distributed similarly in the native binding pocket, and the pose conformations are very close, especially for aspirin. Paracetamol, however, shows a wider range of binding sites. DiffDock’s predictions are split between two binding pockets, while DiffDock-L places the ligand all over the target surface, in more than two binding pockets. Interestingly, the new binding pockets don’t match the old ones: even though DiffDock-L predicts more binding sites overall, they don’t coincide with DiffDock’s pockets. All this shows that you still need to be careful with docking: don’t carelessly pass the output to the next drug design steps without thoroughly checking your predictions first.
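One quick sanity check on a set of predictions is to count how many distinct sites the poses occupy. The Python sketch below groups poses by the distance between their ligand centroids. The coordinates are invented, the 8 Å cutoff is an arbitrary assumption, and the greedy grouping is a simple heuristic, not something DiffDock provides.

```python
import math

def centroid(atoms):
    """Mean position of a list of (x, y, z) atom coordinates."""
    n = len(atoms)
    return tuple(sum(atom[i] for atom in atoms) / n for i in range(3))

def group_poses_by_site(poses, cutoff=8.0):
    """Greedy grouping: a pose joins the first site whose seed centroid lies
    within `cutoff` (in angstroms); otherwise it starts a new site."""
    sites = []  # list of (seed_centroid, [pose indices])
    for idx, atoms in enumerate(poses):
        c = centroid(atoms)
        for seed, members in sites:
            if math.dist(c, seed) <= cutoff:
                members.append(idx)
                break
        else:
            sites.append((c, [idx]))
    return [members for _, members in sites]

# Invented coordinates: two poses close together, two far away from them.
poses = [
    [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
    [(0.5, 0.5, 0.0), (2.0, 0.5, 0.0)],
    [(20.0, 0.0, 0.0), (21.5, 0.0, 0.0)],
    [(0.0, 30.0, 0.0), (1.5, 30.0, 0.0)],
]
sites = group_poses_by_site(poses)
print(f"{len(poses)} poses fall into {len(sites)} sites: {sites}")
```

If ten poses land in one or two groups, as in the aspirin and ibuprofen cases, the prediction is at least self-consistent. If they scatter across many groups, as with paracetamol, that is a sign to inspect the results more closely before using them.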

References

Corso, Gabriele, Arthur Deng, Benjamin Fry, Nicholas Polizzi, Regina Barzilay, and Tommi Jaakkola. 2024. “Deep Confident Steps to New Pockets: Strategies for Docking Generalization.”
https://arxiv.org/abs/2402.18396