When AI Meets Structural Biology: Unlocking Protein Secrets to Improve Human Health

Article by: Sivasankar Putta, Amitabha Chaudhuri, Apurva Kalia

Proteins: Building blocks of life

Proteins are the cell’s nanomachines—the workhorses of biology. Present in every tissue, they are essential to life: they build cellular structures, catalyze energy production, regulate nearly every biological process, and protect the body from internal and external threats.

But what makes proteins truly fascinating is their structure. Every protein is composed of a string of amino acids folded into a unique three-dimensional structure, and this structure determines the protein’s function. Understanding these structures, and how they assemble almost magically from a string of amino acids, has been at the heart of structural biology since 1934, when the first X-ray diffraction image of a protein crystal was recorded.

The chemical composition of proteins, however, was understood earlier. Around the turn of the twentieth century, Emil Fischer demonstrated how amino acids are chemically linked to each other through peptide bonds to form polypeptide chains of different lengths, ranging from the 76 amino acids of ubiquitin (one of the smallest human proteins) to the roughly 27,000 to 35,000 amino acids of titin (the longest human protein). Later, in the early 1950s, Linus Pauling and Robert Corey provided the first clues about protein structure by proposing the α-helix and β-sheet, two fundamental structural elements found in most proteins. The first three-dimensional protein structure, that of myoglobin, was then solved, followed by the tetrameric structure of hemoglobin (1).

Figure 1. The four levels of protein structure, from amino acid sequence to three-dimensional structure.

Work on protein structure has been recognized with seven Nobel Prizes, including the 1962 Nobel Prize in Chemistry, awarded to John Kendrew and Max Perutz for solving the first protein structures, those of myoglobin and hemoglobin. Structural biology techniques became the workhorse for solving protein structures, and this work has continued unabated over the last 50 years, creating a repository of more than 240,000 experimentally determined structures and over a million computed structure models in the Protein Data Bank (PDB).

Figure 1 illustrates the hierarchical organization of protein structure across four levels. The primary structure is depicted as a linear chain of amino acids, represented by coloured spheres, showing the sequence that contains the code for higher-order folding. The secondary structure (the first level of folding) highlights local folding patterns, exemplified here by an alpha-helix stabilized through hydrogen bonds. The tertiary structure demonstrates the complete three-dimensional folding of a single polypeptide chain, combining elements such as alpha-helices and beta-sheets into a compact, functional unit. Finally, the quaternary structure shows the assembly of multiple folded polypeptide subunits, illustrated as intertwined purple and green subunits, forming a functional multi-subunit protein complex. Together, these representations capture the progressive levels of protein organization, from a simple sequence to a biologically active structure.

The precise three-dimensional structure of a protein is stabilized by a combination of forces: strong covalent disulfide bonds and a variety of weaker non-covalent interactions, such as electrostatic interactions (salt bridges), hydrogen bonds, hydrophobic interactions, and van der Waals forces (1).

These interactions are strongest when the amino acids are oriented and packed optimally around each other. When gene mutations replace amino acids with others, both the packing and the bonds holding the structure in place are destabilized (Figure 2).

Figure 2. A critical point mutation can affect protein structure, ultimately impacting its function.

Loss of a protein’s structure alters its function, and this is the cause of many human diseases. Several well-known examples demonstrate the relationship between an amino acid change and loss of protein function. In sickle cell disease, a single amino acid change in hemoglobin (glutamic acid to valine at position 6 of the β-globin chain) alters its structure, causing red blood cells to deform and impairing oxygen transport. In cystic fibrosis, deletion of phenylalanine 508 (ΔF508) in the CFTR (Cystic Fibrosis Transmembrane Conductance Regulator) protein prevents proper folding, leading to degradation of the protein and defective chloride transport. Similarly, mutations in p53 (Y220C, V143A, G245S, R175H, R249S), a key tumor suppressor, destabilize its structure and prevent it from regulating cell division, contributing to cancer. These examples highlight that preserving structural integrity is essential to protein function and to health (2,3).
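Mutation labels such as Y220C or R175H follow a compact convention: the original residue in one-letter code, the 1-based position, and then the replacement residue. The Python sketch below shows one way to read that notation and apply it to a sequence. The function name `apply_point_mutation` and the ten-residue peptide are hypothetical and chosen only for illustration; the peptide is not a real protein sequence.

```python
import re

# One-letter codes of the 20 standard amino acids.
AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def apply_point_mutation(sequence: str, mutation: str) -> str:
    """Apply a substitution written as <original><position><new>, e.g. 'E6V'.

    Positions are 1-based. The function checks that the residue named in the
    label really is present at that position before substituting it.
    """
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", mutation)
    if match is None:
        raise ValueError(f"unrecognised mutation format: {mutation!r}")

    original = match.group(1)
    position = int(match.group(2))
    replacement = match.group(3)

    if original not in AMINO_ACIDS or replacement not in AMINO_ACIDS:
        raise ValueError(f"non-standard residue in {mutation!r}")
    if not 1 <= position <= len(sequence):
        raise ValueError(f"position {position} is outside the sequence")
    if sequence[position - 1] != original:
        raise ValueError(
            f"expected {original} at position {position}, "
            f"found {sequence[position - 1]}"
        )

    # Strings are immutable, so build the mutant from the two flanking slices.
    return sequence[: position - 1] + replacement + sequence[position:]


# Hypothetical toy peptide (not a real protein), with E at position 6.
toy_peptide = "MKTAYEDLLK"
mutant_peptide = apply_point_mutation(toy_peptide, "E6V")
```

The check against the original residue is a deliberate design choice: it catches numbering mismatches, which are common when a mutation label refers to a different isoform or to a sequence with a signal peptide removed.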

How protein structure determination transformed medicine and human health.

The success stories of drug discovery over the last four decades are deeply connected to our ability to visualize and understand three-dimensional structures of proteins. From X-ray crystallography to cryo-electron microscopy and now artificial intelligence (AI), every major innovation has opened new doors to precision medicine through our understanding of protein structure and function.

X-ray crystallography: The foundation of rational drug design.

Protein structures derived from X-ray crystallography provided the very first insights for rational drug design. In this method, X-rays diffracted by a protein crystal produce a pattern of spots, which is computationally processed to reconstruct a 3D atomic model of the protein. The crystal structure of HIV protease enabled the rational design of the first HIV protease inhibitors (e.g., saquinavir) (Figure 3A), revolutionizing antiretroviral therapy. Structural insights identify functional pockets in proteins, and small molecules are then designed to fit these pockets and thereby modulate protein function. A precise fit between the binding pocket and the small molecule dictates specificity and helps limit off-target toxicity. The design of many targeted drugs for protein kinases was likewise guided by X-ray crystal structures of protein–inhibitor complexes (4,5).

Electron crystallography, X-ray free-electron lasers (XFEL) and small-angle X-ray scattering (SAXS): Capturing the dynamics.

In electron crystallography, electron beams are used instead of X-rays to analyse 2D crystals of proteins. This approach is especially powerful for membrane proteins, which are difficult to crystallize in 3D. It enabled the elucidation of aquaporin water channels, providing insights that informed strategies for developing diuretics and treatments for kidney diseases.

XFEL allows the capture of ultrafast structural changes in proteins with femtosecond time resolution. These studies have revealed dynamic processes in systems such as photosystem II and in drug targets like GPCRs, supporting drug optimization.

SAXS provides low-resolution structural information on macromolecules in solution. It has guided the optimization of monoclonal antibodies through studies of aggregation and conformational stability (5).

Nuclear Magnetic Resonance (NMR) spectroscopy: Watching proteins in motion.

NMR spectroscopy added the ability to investigate protein dynamics and behavior in solution rather than in crystals. In a strong magnetic field, NMR measures how atomic nuclei respond to radiofrequency pulses, providing information about distances between atoms and their movements. A landmark success in drug discovery came from structural studies of the Bcl-2 family of proteins, which regulate cell death, using the Structure-Activity Relationship (SAR) by NMR approach (Figure 3B). This work led to the design of ABT-737 and related BH3 mimetics for cancer therapy (6).

Figure 3. Drug–target structures revealed by major structural biology breakthrough techniques.

A) HIV-1 protease in complex with saquinavir, solved by X-ray crystallography (PDB ID: 1HXB). B) Bcl-xL bound to ligands, solved using the SAR by NMR technique (PDB ID: 1YSG). C) Cryo-EM structure of ozanimod-bound S1P₁ in complex with Gi protein (PDB ID: 7EW0). Structural images were taken from the PDB.

Cryo-Electron Microscopy (Cryo-EM): Opening new dimensions.

Cryo-EM flash-freezes proteins in thin ice and images them with electron beams, producing thousands of 2D images that can be combined into high-resolution 3D structures. A landmark cryo-EM structure for drug discovery is widely considered to be the adenosine A2A receptor (A2AR) in complex with an engineered heterotrimeric G protein. One of the first cryo-EM structures with an FDA-approved drug bound is the ozanimod–sphingosine-1-phosphate receptor 1 (S1P₁)–Gi complex (Figure 3C); ozanimod is used to treat multiple sclerosis and ulcerative colitis.
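Why combining thousands of images helps can be shown with a deliberately simplified Python sketch: averaging many noisy copies of the same signal suppresses random noise by roughly the square root of the number of copies. Everything here is an assumption made for illustration, including the 1D "image", the noise level, and the function names. Real single-particle processing must also align the particles and determine their orientations before reconstructing a 3D map, none of which is modelled here.

```python
import random


def noisy_copy(signal, noise_sd, rng):
    """Return the signal with independent Gaussian noise added to each pixel."""
    return [value + rng.gauss(0.0, noise_sd) for value in signal]


def average_images(images):
    """Average a stack of equally sized images pixel by pixel."""
    count = len(images)
    return [sum(pixels) / count for pixels in zip(*images)]


def rms_error(image, reference):
    """Root-mean-square difference between an image and a reference."""
    squared = [(a - b) ** 2 for a, b in zip(image, reference)]
    return (sum(squared) / len(squared)) ** 0.5


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical 1D "particle image": a small bump on a flat background.
true_signal = [0.0, 0.0, 1.0, 2.0, 1.0, 0.0, 0.0, 0.0]
noise_sd = 2.0  # noise as large as the signal itself

single_image = noisy_copy(true_signal, noise_sd, rng)
image_stack = [noisy_copy(true_signal, noise_sd, rng) for _ in range(1000)]
averaged_image = average_images(image_stack)

single_error = rms_error(single_image, true_signal)
averaged_error = rms_error(averaged_image, true_signal)
```

With 1,000 copies the expected noise in the average is about 2.0 / sqrt(1000), or roughly 0.06, so the bump that is buried in any single image becomes clearly visible.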

In addition, single-particle analysis has revealed the structures of TRP channels and voltage-gated sodium channels, informing the design of modulators for pain and cardiac disorders. Cryo-EM has enabled the resolution of large and complex biomolecular assemblies, such as ribosomes bound to antibiotics, guiding the development of a new generation of antimicrobial drugs. More recently, cryo-EM also played a crucial role in determining the structure of the SARS-CoV-2 spike protein, paving the way for rapid vaccine development during the COVID-19 pandemic.

Cryo-electron tomography (cryo-ET) is an advanced form of cryo-EM that images a sample at a series of tilt angles and reconstructs macromolecules in their native cellular environment. This method has helped reveal viral assembly intermediates and assists antiviral drug design (7,8,9).

Challenges and the AI revolution: From prediction to protein design.

Structural modelling of disease-related proteins, particularly those that are disordered or experimentally challenging, has traditionally relied on computational approaches such as homology modelling, threading, and ab initio methods (e.g., MODELLER, I-TASSER, Rosetta). While these approaches provide valuable structural insights, they often face limitations: they can be time-consuming, depend on available templates, and may be less accurate for complex or novel protein targets.

The advent of AI-driven approaches, especially deep learning–based tools, has transformed protein structure prediction. AI models automatically learn complex relationships between sequence, structure, and function from vast protein databases, identifying patterns that traditional methods may miss. By integrating data-driven learning with biologically and geometrically informed priors, AI can efficiently predict accurate 3D structures—even for disordered proteins or those lacking homologous templates.

Advanced learning paradigms, including supervised, unsupervised, and reinforcement learning, enable AI to predict structural and functional outcomes, refine protein variants, and explore large design spaces computationally. Consequently, AI not only accelerates structure determination but also improves reliability, offering transformative potential for therapeutic protein design and drug development. Figure 4 illustrates a simple AI-based workflow that starts from the target’s protein sequence and ends with drug optimization. Importantly, AI complements and enhances traditional computational strategies. Many AI models incorporate principles from homology modelling, threading, and ab initio approaches into their training or prediction pipelines, ensuring these foundational methods remain relevant in modern protein structure analysis.

Table 1 summarizes the correspondence between traditional and AI/deep learning–based methods across key areas of structural modelling and protein engineering (10).

Figure 4. Integrated structural and AI-based computational workflow for protein modeling and drug discovery.

Experimental protein structures often lack flexible regions; for example, the regions missing from the p53 structure shown here (PDB ID: 4AC0) can be modelled using full-length AlphaFold predictions. Molecular dynamics (MD) simulations capture conformational dynamics, while virtual screening identifies small-molecule binders. Together, this pipeline integrates AI-based structure prediction, dynamics, and ligand screening to support therapeutic development. p53 images were created using PyMOL.

AlphaFold is a groundbreaking AI system developed by DeepMind that uses deep learning to predict protein structures with near-experimental accuracy. It revolutionized biology by addressing the decades-old challenge of determining a protein’s 3D structure from its amino acid sequence. The first version, released in 2018, estimated distances and angles between amino acids using deep learning. In 2020, AlphaFold 2 introduced a transformer-based architecture with the innovative Evoformer module, which combined sequence and structural data through attention mechanisms and iterative refinement. In 2021, DeepMind and EMBL-EBI launched the AlphaFold Protein Structure Database, which by 2022 provided predicted structures for over 200 million proteins, greatly expanding the scope of research in drug discovery, genomics, and molecular biology. In 2024, AlphaFold 3 expanded these capabilities as a multimodal AI model able to predict the 3D structures and interactions of a broad range of biological molecules, including proteins, DNA/RNA, ligands, and antibodies (10).

AlphaFold remains a landmark in science, helping researchers understand diseases like Alzheimer’s and Parkinson’s, map cancer mutations, and identify diagnostic biomarkers (11). Similarly, RoseTTAFold from David Baker’s lab offered a faster, lighter approach, which proved critical during COVID-19 for modelling viral proteins and supporting diagnostic test design. Tools like ESMFold eliminate the need for multiple sequence alignments, enabling rapid structural predictions for patient and metagenomic samples where homologs are not available. Other methods, including OmegaFold, HelixFold, and Uni-Fold, have enhanced scalability and speed, facilitating research into cancer variants and rare disease protein structures.

AI is also advancing the modeling of protein complexes. AlphaFold-Multimer, for instance, predicts antibody–antigen interactions, assisting antibody design and elucidating virus–host binding mechanisms, such as SARS-CoV-2 spike–ACE2 interactions. Beyond prediction, AI integrates with experimental techniques: tools like CryoDRGN and DeepEM accelerate the interpretation of cryo-EM density maps, supporting the identification of critical diagnostic proteins like prions and amyloid fibrils. Complementary AI tools, such as MutPred2, EVE, and AlphaMissense, predict mutation impacts, providing essential insights for clinical genomics. Furthermore, generative models like RFdiffusion from Baker’s lab enable the design of novel binders for disease markers, supporting rapid diagnostics and early cancer detection. AI also aids biomarker discovery by linking structural fingerprints of circulating proteins to imaging markers, such as amyloid fibrils in Alzheimer’s disease (12).

However, these predictions often provide only static snapshots, while real proteins are dynamic entities. This is where molecular dynamics (MD) simulations play a crucial role, helping us understand protein movements and interactions in a biological context. MD simulations employ physics-based models to simulate protein motion over time, applying Newton’s laws to reveal how proteins fold, flex, and interact with ligands in realistic environments. The integration of AI with protein structure dynamics, particularly through MD simulations, is proving transformative in drug discovery, making structure-based drug design more accurate, faster, and cost-effective. These advancements are expected to accelerate target identification, deepen our understanding of disease mechanisms, and streamline drug development.
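To make the phrase "applying Newton’s laws" concrete, here is a minimal Python sketch of the kind of time-stepping loop an MD engine runs. It uses the velocity Verlet integrator, a scheme widely used in MD, on a single particle attached to a harmonic spring, which stands in for one bond. The force model, the parameter values, and the reduced (unitless) quantities are assumptions made for illustration; a production simulation would use a full force field, many thousands of atoms, solvent, and temperature control.

```python
def harmonic_force(x, k, x0):
    """Restoring force of a spring with stiffness k and rest position x0."""
    return -k * (x - x0)


def total_energy(x, v, mass, k, x0):
    """Kinetic plus potential energy of the particle on the spring."""
    return 0.5 * mass * v * v + 0.5 * k * (x - x0) ** 2


def velocity_verlet(x, v, mass, k, x0, dt, n_steps):
    """Integrate Newton's second law (F = m * a) with velocity Verlet.

    Returns the list of positions visited and the final velocity.
    """
    trajectory = [x]
    force = harmonic_force(x, k, x0)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * force / mass   # half-step velocity update
        x = x + dt * v_half                    # full-step position update
        force = harmonic_force(x, k, x0)       # force at the new position
        v = v_half + 0.5 * dt * force / mass   # complete the velocity update
        trajectory.append(x)
    return trajectory, v


# Hypothetical parameters in reduced units: a bond stretched 0.2 past rest.
mass, k, x0 = 1.0, 1.0, 1.0
x_start, v_start = 1.2, 0.0
dt, n_steps = 0.01, 1000

energy_start = total_energy(x_start, v_start, mass, k, x0)
trajectory, v_end = velocity_verlet(x_start, v_start, mass, k, x0, dt, n_steps)
energy_end = total_energy(trajectory[-1], v_end, mass, k, x0)
```

Comparing `energy_start` with `energy_end` is a common sanity check: with a sufficiently small time step, a well-behaved integrator keeps the total energy of an isolated system nearly constant over the run.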

AI now not only predicts protein structures but also designs them, opening a new era in precision medicine. Companies like Insilico Medicine, BenevolentAI, Recursion, and Exscientia already have AI-modeled drug candidates in clinical trials at various phases, in areas including oncology, fibrosis, psychiatry, and rare disorders. Others are progressing toward the clinic; for example, Generate Biomedicines is developing AI-designed protein therapeutics for cancer and genetic diseases. Additional companies, such as Charm Therapeutics (oncology), Iktos (rare and infectious diseases), XtalPi (cancer, antivirals, metabolic disorders), Owkin (oncology, immunology), and Standigm (neurodegeneration, cancer), are building diverse pipelines with AI-driven approaches.

Altogether, AI-driven structural modeling has evolved from traditional homology-based methods to advanced deep learning tools. These approaches have transformed diagnostics by speeding biomarker discovery, improving mutation analysis, supporting drug and antibody design, and even creating novel proteins for diagnostic tests (13,14,15). The integration of AI with traditional experimental methods represents a truly synergistic approach in modern protein science. Computational predictions can guide experimental planning, accelerate structural determination, and even inspire novel protein designs, while techniques such as X-ray crystallography, NMR, and cryo-EM provide the empirical validation necessary for scientific rigor. By streamlining workflows, enhancing hypothesis generation, and prioritizing the most promising experiments, AI allows researchers to allocate resources more efficiently and focus on the most impactful investigations.

In this collaborative framework, AI serves as a powerful partner, amplifying human expertise and laboratory capabilities rather than replacing them. The 2024 Nobel Prize in Chemistry—awarded to David Baker for his work in computational protein design and to Demis Hassabis and John Jumper for AlphaFold—highlights that AI achieves its greatest value when paired with experimental validation. Despite the impressive accuracy of AI predictions, they may still overlook critical aspects such as protein dynamics, rare conformations, post-translational modifications, or complex multi-protein interactions, all of which are essential for understanding and designing effective therapies.

Table 1. Advancement of Protein Structure Prediction Tools.

Our Goal at ThinkBio.AI®

At ThinkBio.AI®, we leverage the powerful integration of AI-driven protein structure predictions and molecular simulations to explore protein conformational dynamics at the molecular level, providing deeper insights into disease mechanisms. By elucidating the protein structure–function relationship in health and disease contexts, our goal is to accelerate rational drug design, enable the prediction of functionally significant mutations, and support the development of precision therapies. This approach not only speeds up disease research but also supports the creation of targeted treatments, ultimately improving human health. In parallel, we develop AI platforms and tools that empower pharmaceutical, healthcare, and research teams to transform complex biological and clinical data into actionable insights.

References

  1. Leach, A.R. & Gillet, V.J. (2005). 3D structure and the drug-discovery process. Mol. BioSyst. 1, 327–337.
  2. Dobson, C.M. (2003). Protein folding and misfolding. Nature 426, 884–890.
  3. Valastyan, J.S. & Lindquist, S. (2014). Mechanisms of protein-folding diseases at a glance. Dis. Model. Mech. 7, 9–14.
  4. Noble, M.E.M., Endicott, J.A. & Johnson, L.N. (2004). Protein kinase inhibitors: insights into drug design from structure. Science 303, 1800–1805.
  5. Grandori, R. (2023). Protein structure and dynamics in the era of integrative structural biology. Front. Biophys. 1, 1219843.
  6. Oltersdorf, T., Elmore, S.W., Shoemaker, A.R. et al. (2005). An inhibitor of Bcl-2 family proteins induces regression of solid tumours. Nature 435, 677–681.
  7. García-Nafría, J., Lee, Y., Bai, X., Carpenter, B. & Tate, C.G. (2018). Cryo-EM structure of the adenosine A₂A receptor coupled to an engineered heterotrimeric G protein. eLife 7, e35946.
  8. Yuan, Y., Jia, G., Wu, C. et al. (2021). Structures of signaling complexes of lipid receptors S1PR1 and S1PR5 reveal mechanisms of activation and drug recognition. Cell Res. 31, 1263–1274.
  9. Walls, A.C., Park, Y.-J., Tortorici, M.A., Wall, A., McGuire, A.T. & Veesler, D. (2020). Structure, function, and antigenicity of the SARS-CoV-2 spike glycoprotein. Cell 181, 281–292.
  10. Koh, H.Y., Zheng, Y., Yang, M. et al. (2025). AI-driven protein design. Nat. Rev. Bioeng.
  11. Mubeen, H., Masood, A., Zafar, A., Khan, Z.Q., Khan, M.Q. & Nisa, A.U. (2024). Insights into AlphaFold's breakthrough in neurodegenerative diseases. Ir. J. Med. Sci. 193(5), 2577–2588.
  12. Li, H., Nithin, C., Kmiecik, S. & Huang, S.Y. (2025). Computational methods for modeling protein–protein interactions in the AI era: current status and future directions. Drug Discov. Today 30(6), 104382.
  13. Fu, Y., Ding, X., Zhang, M. et al. (2024). Intestinal mucosal barrier repair and immune regulation with an AI-developed gut-restricted PHD inhibitor. Nat. Biotechnol.
  14. Qiu, X., Li, H., Ver Steeg, G. & Godzik, A. (2024). Advances in AI for protein structure prediction: implications for cancer drug discovery and development. Biomolecules 14(3), 339. doi: 10.3390/biom14030339.
  15. Ren, F., Aliper, A., Chen, J. et al. (2025). A small-molecule TNIK inhibitor targets fibrosis in preclinical and clinical models. Nat. Biotechnol. 43, 63–75.