What is DNA Sequencing?

DNA sequencing is the process of determining the precise order of nucleotides (A, T, G, C) in a DNA molecule. This "reading" of the genetic code has transformed biology, enabling us to understand genomes, diagnose diseases, solve crimes, trace ancestry, and engineer organisms.

The history of sequencing is a story of increasingly clever chemistry and engineering, driving costs down and throughput up by orders of magnitude.

Sequencing Generations
  • First generation: Sanger sequencing (1977); reads ~1,000 bases per reaction
  • Second generation: NGS/short-read (2005+); massively parallel; reads millions of short fragments
  • Third generation: Long-read (2010+); reads thousands to millions of bases per molecule

Before Sequencing: The Early Days

In 1953, Watson and Crick described DNA's double helix structure. But knowing the structure didn't reveal the sequence, the actual genetic "letters." For two decades, sequencing remained essentially impossible.

Early methods were laborious. Ray Wu developed enzymatic approaches in the early 1970s but could only sequence short stretches. The field needed a breakthrough.

Frederick Sanger: The Dideoxy Method (1977)

Frederick Sanger was a quiet British biochemist who had already won a Nobel Prize (1958) for sequencing the protein insulin. In the 1970s, he turned to DNA.

The Elegant Insight

Sanger's method, published in 1977, used a brilliant trick. He employed dideoxynucleotides (ddNTPs), modified nucleotides lacking the 3' hydroxyl group needed to form the next phosphodiester bond. When incorporated, they terminate chain elongation.[1]

By running four reactions (each with a small proportion of one ddNTP: ddATP, ddTTP, ddGTP, or ddCTP), you generate a ladder of fragments ending at every occurrence of that base. Separating these fragments by size reveals the sequence.

How It Works

  1. Prime and extend: A primer binds to the template; DNA polymerase begins synthesis
  2. Random termination: Occasionally, a ddNTP is incorporated instead of a normal dNTP, terminating that chain
  3. Size separation: Gel electrophoresis separates fragments by length
  4. Read the ladder: The sequence is read from the pattern of bands
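
The four steps above can be sketched as a small simulation. This is a toy model under stated assumptions, not a description of laboratory practice: the function names, the 10% termination chance, and the number of molecules per reaction are all invented for illustration, and the input is given as the newly synthesized strand to keep the code short.

```python
import random

def sanger_ladders(strand, n_molecules=2000, dd_fraction=0.1, seed=0):
    """Simulate four chain-termination reactions (toy model, assumed parameters).

    Returns {ddNTP base: set of fragment lengths that ended on that base}.
    """
    rng = random.Random(seed)
    ladders = {base: set() for base in "ACGT"}
    for dd_base in "ACGT":               # one reaction per ddNTP
        for _ in range(n_molecules):     # many copies extend in each reaction
            for length, base in enumerate(strand, start=1):
                # At a matching base, a ddNTP is occasionally incorporated
                # instead of the normal dNTP, which terminates this chain.
                if base == dd_base and rng.random() < dd_fraction:
                    ladders[dd_base].add(length)
                    break
    return ladders

def read_ladder(ladders):
    """Sort fragments by length (the 'gel') and read which lane holds each band."""
    lane_by_length = {length: base
                      for base, lengths in ladders.items()
                      for length in lengths}
    return "".join(lane_by_length[length] for length in sorted(lane_by_length))

strand = "ATGCGTACCTGA"
print(read_ladder(sanger_ladders(strand)))
```

Because termination is random, the sketch needs many molecules per reaction so that every position in the strand produces at least one fragment; a band that never forms would simply be missing from the read.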

Sanger received his second Nobel Prize, in Chemistry, in 1980, making him one of only five people to have won two Nobel Prizes.[4]

Automation

The original method used radioactive labels and manual reading of X-ray films. Leroy Hood and colleagues at Caltech automated the process using fluorescent labels (different colors for each base) and laser detection. This enabled automated DNA sequencers, machines that could sequence DNA continuously.

"The development of methods to sequence DNA was perhaps the single most important technical advance in biological research in the last century."

The Human Genome Project (1990-2003)

The ultimate test of sequencing technology: decode the 3 billion base pairs of human DNA.

The public Human Genome Project, led by Francis Collins (NIH) and carried out by an international consortium, used Sanger sequencing at massive scale. A parallel private effort led by Craig Venter at Celera Genomics used a "shotgun" approach, sequencing random fragments and assembling them computationally.
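
The idea of assembling random fragments computationally can be illustrated with a greedy overlap merge. This is a deliberately simplified sketch, not the algorithm Celera used: the function names, the minimum overlap of 4 bases, and the short example genome are assumptions made for illustration, and production assemblers rely on far more elaborate graph-based methods.

```python
def overlap(a, b, min_len=4):
    """Length of the longest suffix of `a` that equals a prefix of `b`."""
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0

def greedy_assemble(reads, min_len=4):
    """Repeatedly merge the pair of contigs with the largest overlap."""
    contigs = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    # Drop reads that are fully contained in another read.
    contigs = [r for r in contigs if not any(r != o and r in o for o in contigs)]
    while len(contigs) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # no remaining overlaps: leave the contigs separate
        merged = contigs[best_i] + contigs[best_j][best_size:]
        contigs = [c for k, c in enumerate(contigs) if k not in (best_i, best_j)]
        contigs.append(merged)
    return contigs

genome = "ATGGCGTGCAATCCGTTAGCTAAGG"  # made-up sequence with no repeated 4-mers
reads = [genome[10:20], genome[0:10], genome[15:25], genome[5:15]]
print(greedy_assemble(reads))
```

The example genome was chosen to contain no repeated 4-base words, so every overlap found is a true one. Real genomes are not so cooperative, which is why repeats make assembly hard.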

The project cost $2.7 billion and took 13 years. The draft sequence was announced in 2000 (in a joint announcement by Collins and Venter), with the "complete" sequence published in 2003.[2]

Next-Generation Sequencing (2005+)

Sanger sequencing, while revolutionary, had limitations: each reaction read one DNA fragment at a time. The next breakthrough was massively parallel sequencing.

The Key Innovation

Instead of reading one fragment per reaction, NGS platforms sequence millions of fragments simultaneously on a single chip. This parallelization, combined with clever biochemistry, increased throughput by orders of magnitude.

Major Platforms

Illumina (Sequencing by Synthesis)

The dominant platform today. DNA fragments are attached to a flow cell surface, amplified into clusters, and sequenced one base per cycle using fluorescently labeled, reversibly terminating nucleotides. After each cycle, a camera captures which base was incorporated at each cluster.
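
A rough sketch of this cycle-by-cycle readout follows. It is an illustration only: the dye colors, cluster names, and function names are hypothetical, and templates are written in the direction the polymerase reads them so that the base incorporated in cycle i is simply the complement of template position i.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}
# Hypothetical dye assignment; real instruments use other channel schemes.
DYE = {"A": "green", "C": "blue", "G": "yellow", "T": "red"}
BASE_FOR_DYE = {color: base for base, color in DYE.items()}

def run_cycles(clusters, n_cycles):
    """Return one 'image' per cycle: {cluster id: color observed}."""
    images = []
    for cycle in range(n_cycles):
        image = {}
        for cluster_id, template in clusters.items():
            if cycle < len(template):
                incorporated = COMPLEMENT[template[cycle]]
                image[cluster_id] = DYE[incorporated]
        images.append(image)
    return images

def call_bases(images):
    """Build each cluster's read by stacking its colors across cycles."""
    reads = {}
    for image in images:
        for cluster_id, color in image.items():
            reads[cluster_id] = reads.get(cluster_id, "") + BASE_FOR_DYE[color]
    return reads

clusters = {"cluster_1": "TACG", "cluster_2": "GGCA"}
print(call_bases(run_cycles(clusters, n_cycles=4)))
```

The point of the sketch is the data layout: each image covers every cluster at once, so the number of reads grows with the number of clusters while the number of cycles sets the read length.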

Ion Torrent (Semiconductor Sequencing)

Detects hydrogen ions released during nucleotide incorporation. No optical system needed, as detection is purely electronic.
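
As background not stated above, semiconductor sequencing supplies one nucleotide type at a time, and the signal grows with the number of bases incorporated in that flow, so a run of identical bases gives a proportionally larger signal. The sketch below models this with idealized, noise-free signals; the flow order and function names are assumptions made for illustration.

```python
FLOW_ORDER = "TACG"  # assumed cyclic order in which nucleotides are supplied

def simulate_flows(strand, n_flows):
    """Idealized signal per flow: number of bases incorporated (H+ released)."""
    signals = []
    position = 0
    for flow in range(n_flows):
        nucleotide = FLOW_ORDER[flow % len(FLOW_ORDER)]
        count = 0
        # A homopolymer run is incorporated entirely within a single flow.
        while position < len(strand) and strand[position] == nucleotide:
            count += 1
            position += 1
        signals.append(count)
    return signals

def decode_flows(signals):
    """Turn per-flow counts back into a base sequence."""
    return "".join(FLOW_ORDER[i % len(FLOW_ORDER)] * count
                   for i, count in enumerate(signals))

signals = simulate_flows("TTACGGGA", n_flows=8)
print(signals, decode_flows(signals))
```

In practice the signal is an analog measurement rather than an exact count, so telling a run of seven bases from a run of eight is harder than telling one from two.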

Short-Read Advantages and Limitations

NGS reads are short (100-300 bases for Illumina). For human genomes, this means sequencing is like shredding a book and reassembling it from overlapping fragments. Repetitive regions (common in genomes) are challenging to assemble correctly.
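
The repeat problem can be made concrete with a small example. The two made-up genomes below differ only in the order of the unique segments between three copies of a repeat. The sequences and the function name are invented for illustration, and reads are idealized as every error-free substring of a given length.

```python
from collections import Counter

def all_reads(genome, read_length):
    """Every substring of the given length, with multiplicity."""
    return Counter(genome[i:i + read_length]
                   for i in range(len(genome) - read_length + 1))

REPEAT = "GATTACA"  # 7 bases, present three times in each genome
genome_1 = "CCGT" + REPEAT + "TTGG" + REPEAT + "AACC" + REPEAT + "GGAT"
genome_2 = "CCGT" + REPEAT + "AACC" + REPEAT + "TTGG" + REPEAT + "GGAT"

# Reads no longer than the repeat: both genomes yield exactly the same reads.
short_reads_identical = all_reads(genome_1, 7) == all_reads(genome_2, 7)

# Reads spanning the repeat plus one flanking base on each side: now they differ.
long_reads_identical = all_reads(genome_1, 9) == all_reads(genome_2, 9)

print(short_reads_identical, long_reads_identical)
```

With the shorter reads, no assembler could tell which of the two genomes produced the data, because the data are identical. Only reads long enough to cross a repeat and reach unique sequence on both sides carry the information needed to order the segments.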

Third Generation: Long-Read Sequencing (2010+)

Long-read technologies can sequence individual DNA molecules for thousands to millions of bases. Reads this long can span most repetitive regions, which largely resolves the assembly problem that short reads face.

Pacific Biosciences (PacBio)

Single-molecule real-time (SMRT) sequencing watches a single DNA polymerase molecule incorporating fluorescent nucleotides in real time. Read lengths of 10,000-20,000+ bases are routine.

Oxford Nanopore

A truly revolutionary approach: DNA passes through a nanoscale pore, and the changes in electrical current as the bases transit reveal the sequence. No optics, no amplification, no fluorescence.

The Cost Revolution

The drop in sequencing cost has few parallels in technology: from the $2.7 billion Human Genome Project to a genome sequenced for roughly $200.

This outpaced Moore's Law by orders of magnitude, driven largely by NGS technology.[5]
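
A back-of-the-envelope comparison shows what "orders of magnitude" means here. The two dollar figures come from this article; the 20-year span and the two-year doubling period for Moore's Law are assumptions made for the calculation. The $2.7 billion covers the entire Human Genome Project rather than a single sequencing run, so treat the result as a rough illustration.

```python
import math

HGP_COST_USD = 2.7e9        # from the text: Human Genome Project cost
GENOME_COST_USD = 200       # from the text: the "$200 genome"
YEARS_ELAPSED = 20          # assumption: roughly two decades between the figures
MOORE_DOUBLING_YEARS = 2    # assumption: a common statement of Moore's Law

actual_fold = HGP_COST_USD / GENOME_COST_USD
moore_fold = 2 ** (YEARS_ELAPSED / MOORE_DOUBLING_YEARS)
gap_orders = math.log10(actual_fold / moore_fold)

print(f"Actual cost reduction:      {actual_fold:,.0f}-fold")
print(f"Moore's Law over same span: {moore_fold:,.0f}-fold")
print(f"Gap: about {gap_orders:.1f} orders of magnitude")
```

Under these assumptions, a Moore's Law pace would have brought the price down about a thousandfold, while the figures quoted here imply a reduction of more than ten millionfold.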

Applications

Sequencing has gone from impossible to routine, from laboratories to living rooms. The $200 genome represents one of the most dramatic technology cost reductions in history.[3]

Sources

  1. Sanger, F., et al. (1977). DNA sequencing with chain-terminating inhibitors. PNAS, 74(12), 5463-5467.
  2. Lander, E. S., et al. (2001). Initial sequencing and analysis of the human genome. Nature, 409(6822), 860-921.
  3. Shendure, J., et al. (2017). DNA sequencing at 40: past, present and future. Nature, 550(7676), 345-353.
  4. NobelPrize.org. (1980). The Nobel Prize in Chemistry 1980. nobelprize.org
  5. NHGRI. (2024). The Cost of Sequencing a Human Genome. genome.gov