What is DNA Sequencing?
DNA sequencing is the process of determining the precise order of nucleotides (A, T, G, C) in a DNA molecule. This "reading" of the genetic code has transformed biology, enabling us to understand genomes, diagnose diseases, solve crimes, trace ancestry, and engineer organisms.
The history of sequencing is a story of increasingly clever chemistry and engineering, driving costs down and throughput up by orders of magnitude.
- First generation: Sanger sequencing (1977); reads ~1,000 bases per reaction
- Second generation: NGS/short-read (2005+); massively parallel; reads millions of short fragments
- Third generation: Long-read (2010+); reads thousands to millions of bases per molecule
Before Sequencing: The Early Days
In 1953, Watson and Crick described DNA's double helix structure. But knowing the structure didn't reveal the sequence, the actual genetic "letters." For two decades, sequencing remained essentially impossible.
Early methods were laborious. Ray Wu developed enzymatic approaches in the early 1970s but could only sequence short stretches. The field needed a breakthrough.
Frederick Sanger: The Dideoxy Method (1977)
Frederick Sanger was a quiet British biochemist who had already won a Nobel Prize (1958) for sequencing the protein insulin. In the 1970s, he turned to DNA.
The Elegant Insight
Sanger's method, published in 1977, used a brilliant trick. He employed dideoxynucleotides (ddNTPs), modified nucleotides lacking the 3' hydroxyl group needed to form the next phosphodiester bond. When incorporated, they terminate chain elongation.[1]
By running four reactions, each containing a small proportion of one ddNTP (ddATP, ddTTP, ddGTP, or ddCTP), you generate in each reaction a ladder of fragments ending at every occurrence of that reaction's base. Separating the fragments from all four reactions by size, side by side, reveals the sequence.
How It Works
- Prime and extend: A primer binds to the template; DNA polymerase begins synthesis
- Random termination: Occasionally, a ddNTP is incorporated instead of a normal dNTP, terminating that chain
- Size separation: Gel electrophoresis separates fragments by length
- Read the ladder: The sequence is read from the pattern of bands
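The four steps above can be sketched as a small simulation. This is an illustrative toy under stated assumptions, not a model of real chemistry: the function names, the 5% termination probability, and the 2,000 molecules per reaction are all hypothetical, the primer is ignored, and the template is written in the order the polymerase traverses it.

```python
import random

# Base pairing used to build the new strand from the template.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sanger_lane(template, dd_base, n_molecules=2000, dd_fraction=0.05, seed=0):
    """Simulate one of the four reactions and return the fragment lengths seen.

    dd_fraction is an assumed probability that a ddNTP (rather than the
    normal dNTP) is incorporated wherever the new strand needs dd_base.
    """
    rng = random.Random(seed)
    lengths = set()
    for _ in range(n_molecules):
        for position, template_base in enumerate(template, start=1):
            needs_dd_base = COMPLEMENT[template_base] == dd_base
            if needs_dd_base and rng.random() < dd_fraction:
                lengths.add(position)  # chain terminated at this length
                break
    return lengths


def read_ladder(template):
    """Run all four lanes, sort bands by length, and read off the new strand."""
    lanes = {base: sanger_lane(template, base) for base in "ATGC"}
    band_at = {length: base for base, lengths in lanes.items() for length in lengths}
    # "N" marks a length for which no lane produced a band.
    return "".join(band_at.get(length, "N") for length in range(1, len(template) + 1))


print(read_ladder("TACGGATCCA"))  # ATGCCTAGGT
```

Each lane only reports lengths at which its own base occurs, so merging the four lanes and reading them from shortest to longest fragment recovers the newly synthesized strand, which is the complement of the template.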
Sanger received his second Nobel Prize in 1980, making him one of only five people to win two Nobels.[4]
Automation
The original method used radioactive labels and manual reading of X-ray films. Leroy Hood and colleagues at Caltech automated the process using fluorescent labels (different colors for each base) and laser detection. This enabled automated DNA sequencers, machines that could sequence DNA continuously.
"The development of methods to sequence DNA was perhaps the single most important technical advance in biological research in the last century."
The Human Genome Project (1990-2003)
The ultimate test of sequencing technology: decode the 3 billion base pairs of human DNA.
Led by Francis Collins (NIH) and an international consortium, the public Human Genome Project used Sanger sequencing at massive scale. A parallel private effort led by Craig Venter at Celera Genomics used a whole-genome "shotgun" approach, sequencing random fragments and assembling them computationally.
The project cost $2.7 billion and took 13 years. The draft sequence was announced in 2000 (in a joint announcement by Collins and Venter), with the "complete" sequence published in 2003.[2]
Next-Generation Sequencing (2005+)
Sanger sequencing, while revolutionary, had limitations: each reaction read one DNA fragment at a time. The next breakthrough was massively parallel sequencing.
The Key Innovation
Instead of reading one fragment per reaction, NGS platforms sequence millions of fragments simultaneously on a single chip. This parallelization, combined with clever biochemistry, increased throughput by orders of magnitude.
Major Platforms
Illumina (Sequencing by Synthesis)
The dominant platform today. DNA fragments are attached to a flow cell surface, amplified into clusters, and sequenced with fluorescently labeled reversible-terminator nucleotides, so that exactly one base is incorporated per cycle. After each cycle, a camera records which base was incorporated at each cluster.
- Strengths: High accuracy (~99.9%), high throughput, low cost per base
- Limitations: Short reads (150-300 bases), requires amplification
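The figures in these bullets can be turned into a few rough numbers. The sketch below is an illustration, not a vendor specification: the Phred scale (Q = -10 log10 of the error probability) is the conventional way to express per-base accuracy but is not mentioned above, the 30x coverage target is an assumption, and the function names are hypothetical. The 3-billion-base genome size and 150-base read length come from the text.

```python
import math


def phred_quality(accuracy):
    """Convert per-base accuracy (between 0 and 1) to a Phred score."""
    error = 1.0 - accuracy
    if error <= 0:
        raise ValueError("accuracy must be below 1.0")
    return -10.0 * math.log10(error)


def expected_errors(read_length, accuracy):
    """Average number of miscalled bases in one read."""
    return read_length * (1.0 - accuracy)


def reads_needed(genome_size, read_length, coverage):
    """Reads required so that total bases = coverage x genome size."""
    return math.ceil(genome_size * coverage / read_length)


print(round(phred_quality(0.999)))            # 30
print(round(expected_errors(150, 0.999), 2))  # 0.15
print(reads_needed(3_000_000_000, 150, 30))   # 600000000
```

Under these assumptions, 99.9% accuracy corresponds to roughly Q30, and covering a human genome 30 times over with 150-base reads takes on the order of 600 million reads, which is why massive parallelism matters.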
Ion Torrent (Semiconductor Sequencing)
Detects hydrogen ions released during nucleotide incorporation. No optical system needed, as detection is purely electronic.
Short-Read Limitations
NGS reads are short (100-300 bases for Illumina). For a human genome, sequencing is therefore like shredding a book and reassembling it from overlapping fragments. Repetitive regions, which are common in genomes, are hard to assemble correctly because a read shorter than the repeat cannot be placed uniquely.
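The shredded-book analogy can be made concrete with a toy assembler. The sketch below uses a simple greedy overlap-merge strategy chosen for illustration; it is not the algorithm used by any particular production assembler, and the function names, the example reads, and the minimum overlap of 3 bases are assumptions.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the best overlap is shorter than min_len.
    """
    for size in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:size]):
            return size
    return 0


def greedy_assemble(reads, min_len=3):
    """Repeatedly merge the pair of reads with the largest overlap."""
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_size, best_i, best_j = 0, None, None
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    size = overlap(a, b, min_len)
                    if size > best_size:
                        best_size, best_i, best_j = size, i, j
        if best_size == 0:
            break  # nothing left to merge; return the separate contigs
        merged = reads[best_i] + reads[best_j][best_size:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three overlapping 8-base reads from a 16-base sequence, given out of order.
reads = ["CTGAAGTC", "ATGCGTAC", "GTACCTGA"]
print(greedy_assemble(reads))  # ['ATGCGTACCTGAAGTC']
```

This works here because every overlap is unique. When a repeat is longer than the reads, several pairs overlap equally well, and a greedy merge can join fragments that are not actually adjacent in the genome.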
Third Generation: Long-Read Sequencing (2010+)
Long-read technologies can sequence individual DNA molecules for thousands to millions of bases. A single read can span an entire repetitive region, which largely resolves the assembly problem that short reads face.
Pacific Biosciences (PacBio)
Single-molecule real-time (SMRT) sequencing watches a single DNA polymerase molecule incorporating fluorescent nucleotides in real time. Read lengths of 10,000-20,000+ bases are routine.
Oxford Nanopore
A truly revolutionary approach: DNA passes through a nanoscale pore, and changes in the electrical current as the bases transit reveal the sequence. No optics, no amplification, no fluorescence.
- MinION: A USB-stick-sized sequencer costing ~$1,000
- Read length: Virtually unlimited; reads of >2 million bases achieved
- Portability: Sequencing in the field, in space, even in the jungle
The Cost Revolution
The drop in sequencing cost is unprecedented in technology:
- 2001: ~$100,000,000 per human genome
- 2007: ~$10,000,000
- 2014: ~$1,000
- 2024: ~$200
This outpaced Moore's Law by orders of magnitude, driven by the revolution in NGS technology.[5]
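The comparison with Moore's Law can be checked with simple arithmetic on the figures listed above. This is a back-of-the-envelope sketch: it assumes Moore's Law means a doubling every two years, treats the listed costs as exact, and uses hypothetical names throughout.

```python
import math

# Approximate cost per human genome in US dollars, from the list above.
COST_PER_GENOME = {2001: 100_000_000, 2007: 10_000_000, 2014: 1_000, 2024: 200}

MOORE_DOUBLING_YEARS = 2.0  # assumed doubling period for Moore's Law


def halving_time_years(year_a, year_b):
    """Average time for the cost to halve between two listed years."""
    halvings = math.log2(COST_PER_GENOME[year_a] / COST_PER_GENOME[year_b])
    return (year_b - year_a) / halvings


fold_drop = COST_PER_GENOME[2001] / COST_PER_GENOME[2024]
moore_fold = 2 ** ((2024 - 2001) / MOORE_DOUBLING_YEARS)

print(f"Cost fell {fold_drop:,.0f}-fold from 2001 to 2024")
print(f"Moore's Law pace over the same period: about {moore_fold:,.0f}-fold")
print(f"Average halving time 2001-2024: {halving_time_years(2001, 2024):.2f} years")
print(f"Average halving time 2007-2014: {halving_time_years(2007, 2014):.2f} years")
```

On these numbers, the cost fell about 500,000-fold while a two-year doubling would give roughly 2,900-fold, and the steepest stretch (2007 to 2014) corresponds to the cost halving about every six months.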
Key Contributors
- Frederick Sanger: Developed the foundational sequencing method
- Walter Gilbert: Developed chemical cleavage sequencing (shared 1980 Nobel with Sanger)
- Leroy Hood: Automated fluorescent sequencing
- Craig Venter: Shotgun sequencing and the race to sequence the human genome
- Shankar Balasubramanian & David Klenerman: Developed sequencing-by-synthesis (Illumina's core technology)
- Hagan Bayley & others: Pioneered nanopore sequencing
Applications
- Precision medicine: Cancer genomics, pharmacogenomics, rare disease diagnosis
- Infectious disease: Pathogen identification, outbreak tracking, antibiotic resistance
- Research: Understanding gene function, evolution, developmental biology
- Agriculture: Crop and livestock improvement
- Ancestry: Consumer genetics, forensics, population genetics
Sequencing has gone from impossible to routine, from laboratories to living rooms. The $200 genome represents one of the most dramatic technology cost reductions in history.[3]
Sources
- [1] Sanger, F., et al. (1977). DNA sequencing with chain-terminating inhibitors. PNAS, 74(12), 5463-5467.
- [2] Lander, E. S., et al. (2001). Initial sequencing and analysis of the human genome. Nature, 409(6822), 860-921.
- [3] Shendure, J., et al. (2017). DNA sequencing at 40: past, present and future. Nature, 550(7676), 345-353.
- [4] NobelPrize.org. (1980). The Nobel Prize in Chemistry 1980. nobelprize.org
- [5] NHGRI. (2024). The Cost of Sequencing a Human Genome. genome.gov