What is the Protein Folding Problem?

Proteins are the molecular machines of life: enzymes, antibodies, receptors, structural components. They're made of chains of amino acids, but a protein's function depends on its three-dimensional structure: how that chain folds into a precise 3D shape.

The protein folding problem: given just the sequence of amino acids (determined from the gene), can we predict the protein's 3D structure? The amino acid sequence is the "instructions"; the 3D structure is the "result" of those instructions. But the folding process is incredibly complex.

Why Folding Is Hard
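  • Combinatorial explosion: A 100-amino-acid protein has an astronomical number of possible conformations, on the order of 10^100 if each amino acid can adopt roughly 10 states
  • Levinthal's paradox: Sampling those conformations at random would take longer than the age of the universe, yet real proteins fold in milliseconds to seconds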
  • Complex physics: Hydrogen bonds, electrostatics, van der Waals forces, hydrophobic effects
  • Subtle effects: Tiny sequence changes can dramatically alter structure
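
The first two points can be checked with a rough back-of-the-envelope calculation. The sketch below is illustrative only: the 10 states per amino acid and the sampling rate of 10^13 conformations per second are assumptions chosen for the arithmetic, not measured values.

```python
# Back-of-the-envelope version of Levinthal's argument.
# Every input here is an illustrative assumption, not a measurement.

STATES_PER_RESIDUE = 10          # assumed conformations per amino acid
N_RESIDUES = 100                 # chain length used in the example above
SAMPLES_PER_SECOND = 1e13        # assumed rate: one conformation per 0.1 picosecond
SECONDS_PER_YEAR = 3.156e7
AGE_OF_UNIVERSE_YEARS = 1.38e10  # about 13.8 billion years


def exhaustive_search_years(states, residues, rate):
    """Years needed to visit every conformation once at the given rate."""
    conformations = states ** residues  # exact integer, 10^100 here
    seconds = conformations / rate
    return seconds / SECONDS_PER_YEAR


years = exhaustive_search_years(STATES_PER_RESIDUE, N_RESIDUES, SAMPLES_PER_SECOND)
universe_ages = years / AGE_OF_UNIVERSE_YEARS

print(f"Exhaustive search: about {years:.1e} years")
print(f"That is about {universe_ages:.1e} times the age of the universe")
```

Under these assumptions the search takes on the order of 10^79 years. Changing the assumed number of states or the sampling rate shifts the exponent, but not the conclusion that proteins cannot be folding by random search.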

Why It Matters

Knowing a protein's structure reveals how it works and how to intervene when it malfunctions.

Experimentally determining structures (by X-ray crystallography, NMR, or cryo-EM) is slow, expensive, and doesn't work for all proteins. By 2020, only about 170,000 structures had been experimentally solved, out of hundreds of millions of known protein sequences.

50 Years of Attempts

In 1972, Christian Anfinsen won the Nobel Prize for showing that a protein's structure is determined by its amino acid sequence; the folding information is encoded in the sequence itself.[5] This implied prediction should be possible.

For 50 years, researchers tried various approaches. Progress was made, but accuracy remained limited, especially for proteins unlike any with known structures.

CASP: The Critical Competition

In 1994, John Moult organized the first CASP (Critical Assessment of protein Structure Prediction), a blind competition in which researchers predict the structures of proteins that have been solved experimentally but not yet made public.

CASP became the field's benchmark. For years, scores plateaued around 40-60 on the GDT (Global Distance Test) scale, where 100 means perfect prediction. The problem seemed intractable.
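
To give a feel for what a GDT score measures, here is a minimal sketch of the commonly used GDT_TS variant: the average, over distance cutoffs of 1, 2, 4, and 8 angstroms, of the percentage of residues whose predicted position lies within that cutoff of the experimental position. It assumes the two structures are already superimposed and uses one point per residue. The official calculation searches over many superpositions to maximize each percentage, so this simplified version is for intuition only. The function name `gdt_ts` and the toy coordinates are made up for the example.

```python
import math


def gdt_ts(predicted, reference, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Simplified GDT_TS for two pre-aligned lists of (x, y, z) positions.

    Assumes the structures are already superimposed; the real metric
    also optimizes the superposition, which this sketch does not do.
    """
    if not predicted or len(predicted) != len(reference):
        raise ValueError("structures must be non-empty and of equal length")
    # Per-residue error in angstroms.
    distances = [math.dist(p, r) for p, r in zip(predicted, reference)]
    # Fraction of residues within each cutoff.
    fractions = [sum(d <= c for d in distances) / len(distances) for c in cutoffs]
    return 100.0 * sum(fractions) / len(fractions)


# Toy example: four residues with errors of 0.5, 1.5, 3.0 and 10.0 angstroms.
reference = [(0.0, 0.0, 0.0)] * 4
predicted = [(0.5, 0.0, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.0), (10.0, 0.0, 0.0)]

print(gdt_ts(predicted, reference))  # 56.25
print(gdt_ts(reference, reference))  # 100.0
```

In the toy example, one residue in four is within 1 angstrom, two within 2, and three within both 4 and 8, so the score is the average of 25, 50, 75, and 75.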

DeepMind Enters (2016-2018)

DeepMind, the London-based AI company (owned by Alphabet/Google), had made headlines when its AlphaGo program defeated the world champion at Go. In 2016, the company turned its attention to protein folding.

Led by Demis Hassabis (DeepMind co-founder and CEO) and John Jumper (senior researcher), a team began developing AlphaFold.

AlphaFold 1 (CASP13, 2018)

The first version competed in CASP13 and won decisively, achieving GDT scores around 60, well above competitors. It used deep learning to predict distances between amino acid pairs, then optimized structures to match those predictions.
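
The idea of optimizing a structure to match predicted distances can be illustrated with a toy example. The sketch below is not AlphaFold's actual procedure: it treats each residue as a single point, takes the target distances from a made-up five-point structure as a stand-in for a network's predictions, and runs plain gradient descent on a squared-error loss. All names and numbers are assumptions for illustration.

```python
import math
import random


def loss(coords, target):
    """Sum of squared differences between current and target pair distances."""
    n = len(coords)
    return sum(
        (math.dist(coords[i], coords[j]) - target[i][j]) ** 2
        for i in range(n)
        for j in range(i + 1, n)
    )


def gradient_step(coords, target, lr):
    """Move every point one gradient-descent step toward the target distances."""
    n = len(coords)
    grads = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = math.dist(coords[i], coords[j])
            if d == 0.0:
                continue  # direction undefined for coincident points
            # Derivative of (d - target)^2 with respect to point i.
            coeff = 2.0 * (d - target[i][j]) / d
            for axis in range(3):
                grads[i][axis] += coeff * (coords[i][axis] - coords[j][axis])
    return [[c - lr * g for c, g in zip(p, grad)] for p, grad in zip(coords, grads)]


# Made-up "true" structure; its pair distances stand in for predicted distances.
native = [[0.0, 0.0, 0.0], [3.8, 0.0, 0.0], [3.8, 3.8, 0.0],
          [0.0, 3.8, 0.0], [1.9, 1.9, 3.0]]
target = [[math.dist(a, b) for b in native] for a in native]

rng = random.Random(0)  # fixed seed so the run is repeatable
coords = [[rng.uniform(-5.0, 5.0) for _ in range(3)] for _ in native]

initial_loss = loss(coords, target)
for _ in range(2000):
    coords = gradient_step(coords, target, lr=0.01)
final_loss = loss(coords, target)

print(f"loss before: {initial_loss:.2f}, after: {final_loss:.4f}")
```

Because pair distances do not change under rotation, translation, or mirroring, the optimized points can match the target distances without matching the original coordinates. This is one reason a real system needs more information than distances alone.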

It was impressive but not yet transformative. The structures weren't accurate enough for many applications.

AlphaFold 2: The Breakthrough (CASP14, 2020)

In November 2020, AlphaFold 2 competed in CASP14. The results stunned the field:[1]

"This is a problem that I was beginning to think would not get solved in my lifetime."

- John Moult, CASP founder[4]

How It Works

AlphaFold 2 introduced several innovations:

  1. Attention mechanisms: Uses "Evoformer" blocks that let distant parts of the sequence influence each other, capturing long-range interactions crucial for folding
  2. Multiple sequence alignments: Analyzes evolutionary relatives of the target protein, extracting co-evolution information that constrains structure
  3. Iterative refinement: Predicts structure, then feeds that prediction back to improve it
  4. End-to-end learning: Trained directly on protein structures, learning features automatically
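
To illustrate the first item, here is a minimal sketch of generic scaled dot-product self-attention, the basic operation that lets every position in a sequence weigh information from every other position, however far apart they are. It is not the Evoformer itself, which is considerably more elaborate. The weights are random rather than learned, and the sizes (6 residues, 4 features) are arbitrary assumptions for the example.

```python
import math
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Turn a list of scores into positive weights that sum to 1."""
    peak = max(row)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(x, w_q, w_k, w_v):
    """Scaled dot-product self-attention over a sequence of feature vectors."""
    q, k, v = matmul(x, w_q), matmul(x, w_k), matmul(x, w_v)
    scale = math.sqrt(len(k[0]))
    # scores[i][j]: how strongly position i attends to position j,
    # with no penalty for how far apart i and j are in the sequence.
    scores = [[sum(a * b for a, b in zip(qi, kj)) / scale for kj in k] for qi in q]
    weights = [softmax(row) for row in scores]
    return matmul(weights, v), weights


def random_matrix(rows, cols, rng):
    """Matrix of Gaussian noise, standing in for learned parameters."""
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
seq_len, dim = 6, 4                   # toy sizes: 6 residues, 4 features each
x = random_matrix(seq_len, dim, rng)  # stand-in per-residue features
w_q, w_k, w_v = (random_matrix(dim, dim, rng) for _ in range(3))

output, weights = self_attention(x, w_q, w_k, w_v)

# Attention that the first residue pays to each residue, including the last.
print([round(w, 3) for w in weights[0]])
```

Each row of `weights` sums to 1, and the first residue gives nonzero weight to the last one. In a trained model those weights are learned, so residues that end up in contact in the folded structure can exchange information directly.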

The Key People

Demis Hassabis

Co-founder and CEO of DeepMind. A former chess prodigy and game designer, Hassabis pursued the goal of "solving intelligence, and then using that to solve everything else." Protein folding was a test case.

John Jumper

Lead researcher on AlphaFold 2. A physicist by training, Jumper brought deep understanding of both machine learning and protein biophysics. He architected the technical approach that achieved the breakthrough.

Hassabis and Jumper shared the 2024 Nobel Prize in Chemistry for AlphaFold, along with David Baker for related work on computational protein design.[3]

Impact: Revolutionizing Biology

The AlphaFold Database

In 2021, DeepMind and EMBL-EBI released predicted structures for 98.5% of human proteins, then in 2022 expanded the database to over 200 million proteins from across the tree of life.[2] This free database contains more structures than had been experimentally determined in the entire history of structural biology.

Applications

COVID-19 research benefited immediately. AlphaFold was used to study SARS-CoV-2 proteins and potential drug targets.

Limitations and Future

AlphaFold isn't perfect, and open problems remain. The next frontier includes protein design (creating proteins with desired functions), understanding protein dynamics, and predicting the effects of drugs and mutations.

Legacy

AlphaFold represents a paradigm shift. The protein folding problem, once the "holy grail" of computational biology, has been essentially solved. AlphaFold is already one of the most impactful AI applications in science, with ripples that will be felt for decades.

Sources

  1. Jumper, J., et al. (2021). Highly accurate protein structure prediction with AlphaFold. Nature, 596(7873), 583-589.
  2. Varadi, M., et al. (2022). AlphaFold Protein Structure Database: massively expanding the structural coverage of protein-sequence space. Nucleic Acids Research, 50(D1), D419-D426.
  3. NobelPrize.org. (2024). The Nobel Prize in Chemistry 2024. nobelprize.org
  4. Service, R. F. (2020). 'The game has changed.' AI triumphs at solving protein structures. Science, 370(6521), 1144-1145.
  5. Anfinsen, C. B. (1973). Principles that govern the folding of protein chains. Science, 181(4096), 223-230.