Drug design for beginners

Eva Smorodina
December 6, 2023

Challenges in experimental protein structure determination

Biological macromolecules (proteins, for example) have complex 3D structures that are vital for their functions and interactions within living organisms. However, not every protein sequence has a corresponding experimentally determined structure. This limitation stems from the challenges of current structure determination methods such as X-ray crystallography, cryo-EM, and NMR. All of these approaches require significant expertise, resources, and sometimes luck to purify the protein of interest and solve its structure, which not every laboratory or company can afford. Flexible molecules with highly dynamic structures pose a special problem, because extra effort is needed to “tame the shrew” before its structure can be determined. It can be even more frustrating when you need not just any structure, but a high-quality one in which individual atoms can be distinguished (high resolution), rather than just groups of residues or local elements (low resolution).

Artificial intelligence revolution in molecular modeling

To overcome these obstacles and try to avoid experimental structure determination (at least at the beginning of the research), scientists can apply molecular modeling tools. This field allows us to predict missing structures, refine existing ones to enhance their properties, and deepen our understanding of the intricate relationship between protein structure and function.

Thanks to the recent boom in artificial intelligence (AI), we now have plenty of tools that can produce a 3D model from a 1D protein sequence. Almost everyone has heard the buzzword AI by now, but few people could say precisely what it means. Very briefly, AI is a class of computer algorithms that “mimic” human intelligence. Unlike traditional programs, AI can perform tasks that typically require human intelligence, such as making decisions, recognizing patterns, and learning from experience. It includes techniques such as machine learning (ML) and deep learning (DL), which process large amounts of data and learn the hidden patterns in it. ML and DL are the types of AI most commonly applied to structural biology problems. ML algorithms learn from data and make predictions or decisions based on what they have learned. DL, a subset of ML, uses artificial neural networks with multiple layers, which lets models learn more complex patterns from larger datasets than classical ML methods usually can.

Even before AI, many computational tools could tackle biological challenges such as structure prediction. However, getting a good protein model for every sequence was quite complicated. One of the most famous of these tools, Rosetta (Leman et al., 2020), required numerous iterations with changing parameters to produce a structure prediction of good quality.

Now it has become much easier. One of the best-known AI tools, AlphaFold2 (Jumper et al., 2021), is conveniently wrapped into a Google Colaboratory (or just “Colab”) notebook that allows people, even those without any computer science background, to simply paste a sequence and get the desired structure. The quality of the models has also increased drastically compared to non-AI analogs. You can find a relative comparison between the performance of non-AI and AI tools in the picture below (adapted from DeepMind’s AlphaFold post).
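Since these tools start from a 1D sequence, it can help to tidy the sequence before pasting it anywhere. The snippet below is only a minimal sketch of that preparation step and is not part of AlphaFold2 or its Colab notebook: the helper names clean_sequence and invalid_residues and the toy peptide are made up for illustration, and the code assumes that the input should contain only the 20 standard one-letter amino acid codes.

```python
from typing import Dict

# The 20 standard amino acids in one-letter code.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(raw: str) -> str:
    """Return a single-line, upper-case sequence from raw or FASTA-formatted text.

    Hypothetical helper written for this post, not a function of any library.
    """
    lines = [line.strip() for line in raw.splitlines()]
    # Skip FASTA header lines (they start with '>') and empty lines.
    residues = "".join(line for line in lines if line and not line.startswith(">"))
    # Remove whitespace inside the sequence and normalise the case.
    return "".join(residues.split()).upper()


def invalid_residues(sequence: str) -> Dict[int, str]:
    """Map 1-based positions to characters that are not standard amino acids."""
    return {
        position: letter
        for position, letter in enumerate(sequence, start=1)
        if letter not in STANDARD_AMINO_ACIDS
    }


if __name__ == "__main__":
    # Made-up toy peptide; 'X' marks an unknown residue.
    fasta = """>toy_peptide
    mkta yiak
    QRXQ
    """
    sequence = clean_sequence(fasta)
    print(sequence)                    # MKTAYIAKQRXQ
    print(invalid_residues(sequence))  # {11: 'X'}
```

Whether a given tool accepts non-standard letters such as X depends on the tool, so treat the reported positions as something to double-check rather than as hard errors.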

Ligand-binding site prediction for drug design

But getting the structure is not the end; it is just the beginning. For example, one of the most appealing tasks in drug design is ligand-binding site prediction. Many current medications, such as aspirin, ibuprofen, and paracetamol, are in fact small-molecule ligands (Beck et al., 2022). A ligand binds at a particular place on a target protein (the protein-ligand binding site) and modifies the protein's function. A well-known example of such “interference” is enzyme inhibition. Enzymes often work in cascades: each one catalyzes a reaction, so that an initial “input” compound is converted through a series of transformations into “output” products. Some of these products can cause discomfort, and if something blocks the cascade in the middle (before the “nasty” compounds appear), the person feels better. Aspirin works this way: it blocks the active site of an enzyme (cyclooxygenase) involved in the synthesis of compounds that mediate inflammation, pain, and fever in the body, which prevents them from being produced. Therefore, the ability to predict ligand-binding sites is of great importance in drug design and, ultimately, for human well-being.
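To make the idea of a binding site more concrete, here is a minimal sketch of one common geometric way to describe a site when a protein-ligand complex is already known: collect every residue that has at least one atom within a distance cutoff of any ligand atom. This is not a prediction method and not the approach of any particular tool. The function name binding_site_residues, the 4 Å default cutoff, and the toy coordinates are all assumptions chosen for illustration.

```python
import math
from typing import Dict, List, Tuple

Coordinate = Tuple[float, float, float]


def binding_site_residues(
    protein_atoms: Dict[str, List[Coordinate]],
    ligand_atoms: List[Coordinate],
    cutoff: float = 4.0,  # assumed cutoff in angstroms; adjust to taste
) -> List[str]:
    """Return residues with at least one atom within `cutoff` of any ligand atom."""
    site = []
    for residue, atoms in protein_atoms.items():
        # A single close atom pair is enough to count the residue as part of the site.
        if any(
            math.dist(atom, ligand_atom) <= cutoff
            for atom in atoms
            for ligand_atom in ligand_atoms
        ):
            site.append(residue)
    return site


if __name__ == "__main__":
    # Made-up coordinates (in angstroms), not taken from a real structure.
    protein = {
        "SER10": [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)],
        "TYR25": [(3.0, 3.0, 0.0)],
        "LEU40": [(12.0, 0.0, 0.0)],
    }
    ligand = [(2.0, 1.0, 0.0), (3.0, 1.0, 0.0)]
    print(binding_site_residues(protein, ligand))  # ['SER10', 'TYR25']
```

In a real project the coordinates would come from a structure file, and a predicted site could be compared against a residue list like this one.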

References

Jumper, J., Evans, R., Pritzel, A. et al. Highly accurate protein structure prediction with AlphaFold. Nature (2021). https://doi.org/10.1038/s41586-021-03819-2

Leman, J.K., Weitzner, B.D., Lewis, S.M. et al. Macromolecular modeling and design in Rosetta: recent methods and frameworks. Nat Methods (2020). https://doi.org/10.1038/s41592-020-0848-2

Beck, H., Härter, M., Haß, B. et al. Small molecules and their impact in drug discovery: A perspective on the occasion of the 125th anniversary of the Bayer Chemical Research Laboratory. Drug Discovery Today (2022). https://doi.org/10.1016/j.drudis.2022.02.015