Deoxyribonucleic
acid
(DNA) is the primary chemical component of chromosomes and is the material
of which genes are made. It is sometimes called
the "molecule of heredity," because parents transmit copied portions
of their own DNA to offspring during reproduction, and because
they propagate their traits by doing so.
In bacteria and other simple, or
prokaryotic, organisms, DNA is
distributed more or less throughout the cell. In the complex, or
eukaryotic, cells that make up
plants, animals, and other
multi-celled organisms, most of the DNA resides
in the cell nucleus. The energy-generating
organelles known as chloroplasts and mitochondria also carry DNA,
as do many viruses.
Although sometimes
called "the molecule of heredity," pieces of DNA as people typically
think of them are not single molecules. Rather, they are pairs
of molecules, which entwine like vines to form a double
helix (top
half of the illustration at the right).
Each vine-like
molecule is a strand of DNA: a chemically linked chain of nucleotides, each of which
consists of a sugar, a phosphate and
one of four kinds of aromatic "bases". Because DNA strands
are composed of these nucleotide subunits, they are polymers.
The diversity
of the bases means that there are four kinds of nucleotides, which
are commonly referred to by the identity of their bases. These
are adenine (A), thymine (T), cytosine (C), and guanine (G).
In a DNA double
helix, two polynucleotide strands come together through complementary pairing of the
bases, which occurs by hydrogen bonding. Each base
forms hydrogen bonds readily to only one other -- A to T and C
to G -- so that the identity of the base on one strand dictates
what base must face it on the opposing strand. Thus the entire
nucleotide sequence of each strand
is complementary to that of the other, and when separated, each
may act as a template with which to replicate the other (middle
and lower half of the illustration at the right).
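The pairing rule lends itself to a short sketch in Python. This is only an illustration under simple assumptions: strands are written as plain strings over A, T, C and G, and the names `PAIRS` and `complement` are hypothetical, chosen for this example.

```python
# Pairing rule from the text: A faces T, and C faces G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand: str) -> str:
    """Return the base that faces each position of `strand`, position by position."""
    return "".join(PAIRS[base] for base in strand)


strand = "ATTGCAC"
partner = complement(strand)    # the opposing strand dictated by pairing
restored = complement(partner)  # using the partner as a template gives back the original
```

Because each base has exactly one partner, applying `complement` twice returns the starting sequence, which is the sense in which either separated strand can serve as a template for the other.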
Because pairing
causes the nucleotide bases to face the helical axis, the sugar
and phosphate groups of the nucleotides run along the outside,
and the two chains they form are sometimes called the "backbones"
of the helix. In fact, it is chemical bonds between the phosphates
and the sugars that link one nucleotide to the next in the DNA
strand.
Within a gene,
the sequence of nucleotides along a DNA strand defines a protein, which an organism may manufacture
or "express" at one or several
points in its life using the information in the sequence. The
relationship between the nucleotide sequence and the amino-acid
sequence of the protein is determined by simple cellular rules
of translation, known
collectively as the genetic code. Reading along
the "protein-coding" sequence of a gene, each successive sequence
of three nucleotides (called a codon) specifies or "encodes" one
amino acid.
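Reading a sequence three nucleotides at a time can be sketched as follows. The sketch makes several simplifying assumptions that go beyond the text: it reads the protein-coding DNA sequence directly (skipping the RNA intermediate), it uses the standard genetic code with one-letter amino-acid abbreviations, and it stops at the first stop codon. The names `CODON_TABLE` and `translate` are hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, one letter per amino acid and '*' for a stop codon,
# listed with the first, second and third codon positions each cycling through T, C, A, G.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(coding_sequence: str) -> str:
    """Translate successive codons until a stop codon or the end of the sequence."""
    protein = []
    for i in range(0, len(coding_sequence) - 2, 3):
        amino_acid = CODON_TABLE[coding_sequence[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


peptide = translate("ATGGCCTGGTAA")  # codons ATG, GCC, TGG, then the stop codon TAA
```

Here each of the 64 possible codons maps to one amino acid or to a stop signal, so several codons can encode the same amino acid.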
In many species of organism, only a small
fraction of the total sequence of the genome appears to encode protein.
The function of the rest is a matter of speculation. It is known
that certain nucleotide sequences specify affinity for DNA binding proteins,
which play a wide variety of vital roles, in particular through
control of replication and transcription. These sequences are
frequently called regulatory sequences,
and researchers assume that so far they have identified only a
tiny fraction of the total that exist. "Junk DNA" represents sequences
that do not yet appear to contain genes or to have a function.
Sequence also
determines a DNA segment's susceptibility to cleavage by restriction enzymes,
the quintessential tools of genetic engineering.
The positions of cleavage sites throughout an individual's genome
determine one kind of "DNA fingerprint" for that individual.
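How sequence determines cleavage can be illustrated with a small sketch. It assumes the well-known enzyme EcoRI, which recognizes GAATTC and cuts between the G and the first A; it models only one strand and ignores the staggered cut on the opposing strand. The function names and the example sequences are hypothetical.

```python
def cut_positions(sequence: str, site: str = "GAATTC", offset: int = 1) -> list[int]:
    """Return the indices at which the strand is cut (offset bases into each site)."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start + offset)
        start = sequence.find(site, start + 1)
    return positions


def fragment_lengths(sequence: str, site: str = "GAATTC", offset: int = 1) -> list[int]:
    """Return the lengths of the pieces left after cutting at every site."""
    cuts = [0] + cut_positions(sequence, site, offset) + [len(sequence)]
    return [end - begin for begin, end in zip(cuts, cuts[1:])]


individual_a = "AAGAATTCTTGAATTCGG"  # two recognition sites
individual_b = "AAGAATTCTTGAGTTCGG"  # one base differs, so the second site is lost

pattern_a = fragment_lengths(individual_a)
pattern_b = fragment_lengths(individual_b)
```

A single-base difference removes a site and changes the set of fragment lengths, which is why the pattern of fragments can distinguish one genome from another.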
Main article: DNA replication
Mechanical properties relevant to biology
The hydrogen
bonds between the strands of the double helix are weak enough
that they can be easily separated by enzymes. Enzymes known as helicases unwind the strands
to facilitate the advance of sequence-reading enzymes such as
DNA
polymerase. The unwinding builds up torsional strain, which enzymes called
topoisomerases relieve by transiently cleaving the backbone of one of the strands so that it
can swivel around the other. The strands can also be separated
by gentle heating, as used in PCR, provided they have
fewer than about 10,000 base pairs (10 kilobase
pairs, or 10 kbp). The intertwining of the DNA strands makes long
segments difficult to separate.
When the ends
of a piece of double-helical DNA are joined so that it forms a
circle, as in plasmid DNA, the strands are topologically knotted. This
means they cannot be separated by gentle heating or by any process
that does not involve breaking a strand. The task of unknotting
topologically linked strands of DNA falls to enzymes known as
topoisomerases. Some of
these enzymes unknot circular DNA by cleaving two strands so that
another double-stranded segment can pass through. Unknotting is
required for the replication of circular DNA as well as for various
types of recombination in linear
DNA.
Figure: Space-filling model of a section of a DNA molecule.
The DNA helix
can assume one of three slightly different geometries, of which
the "B" form described by James D. Watson and Francis Crick is believed
to predominate in cells. It is 2 nanometers wide and extends
3.4 nanometers per 10 bp of sequence. This is also the approximate
length of sequence in which the helix makes one complete turn
about its axis. This frequency of twist (known as the helical
pitch) depends largely on stacking forces that each base
exerts on its neighbors in the chain.
The narrow
breadth of the double helix makes it impossible to detect by conventional
electron
microscopy, except by heavy staining. At the same time, the
DNA found in many cells can be macroscopic in length -- approximately
5 centimeters
long for strands in a human chromosome. Consequently, cells must
compact or "package" DNA to carry it within them. This is one
of the functions of the chromosomes, which contain spool-like
proteins known as histones, around which DNA winds.
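The dimensions quoted above allow a rough calculation of how much sequence a strand of a given length contains. The sketch below uses only the figures from the text (2 nanometers wide, 3.4 nanometers per 10 bp, about 10 bp per turn); the constant and function names are hypothetical, and the results are order-of-magnitude estimates.

```python
RISE_NM_PER_BP = 3.4 / 10  # 3.4 nm of helix length per 10 bp
BP_PER_TURN = 10           # approximate length of one complete turn
WIDTH_NM = 2.0             # width of the B-form helix


def contour_length_nm(base_pairs: float) -> float:
    """Length of a fully extended helix containing `base_pairs` base pairs."""
    return base_pairs * RISE_NM_PER_BP


def helical_turns(base_pairs: float) -> float:
    """Number of complete turns the helix makes over `base_pairs` base pairs."""
    return base_pairs / BP_PER_TURN


def base_pairs_for_length(length_nm: float) -> float:
    """Number of base pairs needed to span `length_nm` nanometers."""
    return length_nm / RISE_NM_PER_BP


five_cm_in_nm = 5e-2 / 1e-9                            # 5 centimeters in nanometers
chromosome_bp = base_pairs_for_length(five_cm_in_nm)   # on the order of 1.5e8 bp
length_to_width = five_cm_in_nm / WIDTH_NM             # tens of millions to one
```

On these figures a 5-centimeter strand holds roughly 150 million base pairs and is tens of millions of times longer than it is wide, which gives a sense of why packaging is necessary.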
The B form
of the DNA helix twists 360° per 10.6 bp in the absence of strain.
But many molecular biological processes can induce strain. A DNA
segment with excess or insufficient helical twisting is referred
to, respectively, as positively or negatively "supercoiled". DNA in vivo
is typically negatively supercoiled, which facilitates the unwinding
of the double-helix required for RNA transcription.
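The sign convention for supercoiling can be made concrete with a small calculation. This sketch is a simplification: it treats the number of times the strands wrap around each other as a single count and compares it with the relaxed value of one turn per 10.6 bp given above. The ratio it computes is commonly called the superhelical density; that term, the function names, and the example plasmid size are assumptions that do not come from the text.

```python
BP_PER_TURN_RELAXED = 10.6  # strain-free B-form value from the text


def relaxed_turns(base_pairs: float) -> float:
    """Number of helical turns the segment would have in the absence of strain."""
    return base_pairs / BP_PER_TURN_RELAXED


def superhelical_density(actual_turns: float, base_pairs: float) -> float:
    """Fractional excess (positive) or deficit (negative) of turns relative to relaxed DNA."""
    relaxed = relaxed_turns(base_pairs)
    return (actual_turns - relaxed) / relaxed


def classify(actual_turns: float, base_pairs: float, tolerance: float = 1e-9) -> str:
    """Name the supercoiling state implied by the sign of the density."""
    density = superhelical_density(actual_turns, base_pairs)
    if abs(density) < tolerance:
        return "relaxed"
    return "positively supercoiled" if density > 0 else "negatively supercoiled"


# Hypothetical 1,060 bp circle: 100 turns when relaxed, 94 turns when underwound.
density = superhelical_density(94, 1060)
state = classify(94, 1060)
```

An underwound segment has fewer turns than the relaxed value, so its density is negative, matching the negatively supercoiled state described for DNA in vivo.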
The two other
known double-helical forms of DNA, called A and Z, differ modestly
in their geometry and dimensions. The A form appears likely to
occur only in dehydrated samples of DNA, such as those used in crystallography experiments,
and possibly in hybrid pairings of DNA and RNA strands. Segments of
DNA that cells have methylated for regulatory
purposes may adopt the Z geometry, in which the strands turn about
the helical axis like a mirror image of the B form.
The asymmetric
shape and linkage of nucleotides mean that a DNA strand always
has a discernible orientation or directionality. Because of this
directionality, close inspection of a double helix reveals that,
although the nucleotides along one strand are heading one way
(e.g. the "ascending strand") the others are heading
the other (e.g. the "descending strand"). This arrangement
of the strands is called antiparallel.
For reasons
of chemical nomenclature, people who work with DNA refer to the
asymmetric termini of each strand as the 5' and
3' ends (pronounced "five prime" and "three prime").
DNA workers and enzymes alike always read nucleotide sequences
in the "5' to 3' direction". In a vertically
oriented double helix, the 3' strand is said to be ascending while
the 5' strand is said to be descending.
As a result
of their antiparallel arrangement and the sequence-reading preferences
of enzymes, even if both strands carried identical instead of
complementary sequences, cells could properly translate only one
of them. A cell could read the other strand only backwards. Molecular biologists
call a sequence "sense" if it is translated or
translatable, and they call its complement "antisense".
It follows then, somewhat paradoxically, that the template for
transcription is the antisense strand. The resulting
transcript is an RNA replica of the sense strand and
is itself sense.
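The relationship between sense strand, antisense template and transcript can be checked with a short sketch. It assumes that every sequence is written in the 5' to 3' direction and that RNA carries uracil (U) where DNA carries thymine (T), a detail not stated above. The function names are hypothetical.

```python
DNA_PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}  # RNA base paired to each DNA base


def antisense_of(sense: str) -> str:
    """Return the complementary strand, written 5' to 3' (hence the reversal)."""
    return "".join(DNA_PAIRS[base] for base in reversed(sense))


def transcribe(template: str) -> str:
    """Return the RNA built against a 5'-to-3' template, itself written 5' to 3'."""
    return "".join(DNA_TO_RNA[base] for base in reversed(template))


sense = "ATGGCC"
antisense = antisense_of(sense)     # the strand used as the template
transcript = transcribe(antisense)  # reads like the sense strand, with U in place of T
```

The reversal in both functions reflects the antiparallel arrangement: the strand that pairs with a 5'-to-3' sequence runs the other way, so it must be flipped before it is written in the conventional direction.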
Some viruses
blur the distinction between sense and antisense, because certain
sequences of their genomes do double duty, encoding
one protein when read 5' to 3' along one strand, and a second
protein when read in the opposite direction along the other strand.
As a result, the genomes of these viruses are unusually compact
for the number of genes they contain, which biologists view as
an adaptation.
Topologists
like to note that the juxtaposition of the 3' end of one DNA strand
beside the 5' end of the other at both termini of a double-helical
segment makes the arrangement a "crab canon".
In some viruses DNA appears in a non-helical,
single-stranded form. Because many of the DNA
repair mechanisms of cells work only on paired bases, viruses
that carry single-stranded DNA genomes mutate more frequently
than they would otherwise. As a result, such species may adapt
more rapidly to avoid extinction. The result would not be so favorable
in more complicated and more slowly replicating organisms, however,
which may explain why only viruses carry single-stranded DNA.
These viruses presumably also benefit from the lower cost of replicating
one strand versus two.
Working in
the 19th century, biochemists initially isolated DNA and RNA (mixed
together) from cell nuclei. They were relatively quick to appreciate
the polymeric nature of their "nucleic acid" isolates, but realized
only later that nucleotides were of two types--one containing
ribose and the other deoxyribose. It was this subsequent discovery
that led to the identification and naming of DNA as a substance
distinct from RNA.
Friedrich
Miescher (1844-1895) discovered a substance he called "nuclein"
in 1869. Somewhat later he isolated a pure sample of the material
now known as DNA from the sperm of salmon, and in 1889 his pupil,
Richard
Altmann, named it "nucleic acid". This substance was found
to exist only in the chromosomes.
Max Delbrück, Nikolai
V. Timofeeff-Ressovsky, and Karl
G. Zimmer published results in 1935 suggesting that chromosomes
are very large molecules the structure of which can be changed
by treatment with X-rays, and that by so changing their structure
it was possible to change the heritable characteristics governed
by those chromosomes. (Delbrück and Salvador Luria were awarded
the Nobel Prize in 1969 for their work on the genetic structure
of viruses.) In 1943, Oswald Theodore Avery
discovered that traits proper to the "smooth" form of the Pneumococcus
could be transferred to the "rough" form of the same bacteria
merely by making the killed "smooth" (S) form available to the
live "rough" (R) form. Quite unexpectedly, the living R Pneumococcus
bacteria were transformed into a new strain of the S form, and
the transferred S characteristics turned out to be heritable.
In 1944, the
renowned physicist Erwin Schrödinger published
a brief book entitled What is Life?, in which he maintained
that chromosomes contained what he called the "hereditary code-script"
of life. He added: "But the term code-script is, of course, too
narrow. The chromosome structures are at the same time instrumental
in bringing about the development they foreshadow. They are law-code
and executive power -- or, to use another simile, they are architect's
plan and builder's craft -- in one." He conceived of these dual
functional elements as being woven into the molecular structure
of chromosomes. By understanding the exact molecular structure
of the chromosomes one could hope to understand both the "architect's
plan" and also how that plan was carried out through the "builder's
craft." Francis Crick, James Watson, Maurice Wilkins, Seymour
Benzer, et al., took up the physicist's challenge to work
out the structure of the chromosomes and the question of how the
segments of the chromosomes that were conceived to relate to specific
traits could possibly do their jobs.
Just how the
presence of specific features in the molecular structure of chromosomes
could produce traits and behaviors in living organisms was unimaginable
at the time. Because chemical dissection of DNA samples always
yielded the same four nucleotides, the chemical composition of
DNA appeared simple, perhaps even uniform. Organisms, on the other
hand, are fantastically complex individually and widely diverse
collectively. Geneticists did not speak of genes as conveyors
of "information" in such words, but if they had, they would not
have hesitated to quantify the amount of information that genes
need to convey as vast. The idea that information might reside
in a chemical in the same way that it exists in text--as a finite
alphabet of letters arranged in a sequence of unlimited length--had
not yet been conceived. It would emerge upon the discovery of
DNA's structure, but few researchers imagined that DNA's structure
had much to say about genetics.
In the 1950s,
only a few groups made it their goal to determine the structure
of DNA. These included an American group led by Linus
Pauling, and two groups in Britain. At Cambridge University,
Crick and Watson were building physical models using metal rods
and balls, in which they incorporated the known chemical structures
of the nucleotides, as well as the known position of the linkages
joining one nucleotide to the next along the polymer. At King's College, London,
Maurice Wilkins and Rosalind Franklin were
examining X-ray diffraction patterns
of DNA fibers.
A key inspiration
in the work of all of these teams was the discovery in 1948 by
Pauling that many proteins included helical shapes (see alpha helix).
Pauling had deduced this structure from X-ray patterns.
Even in the initial crude diffraction data from DNA, it was evident
that the structure involved helices. But this insight was only
a beginning. There remained the questions of how many strands
came together, whether this number was the same for every helix,
whether the bases pointed toward the helical axis or away, and
ultimately what were the explicit angles and coordinates of all
the bonds and atoms. Such questions motivated the modeling efforts
of Watson and Crick.
In their modeling,
Watson and Crick restricted themselves to what they saw as chemically
and biologically reasonable. Still, the breadth of possibilities
was very wide. A breakthrough occurred in 1952, when Erwin
Chargaff visited Cambridge and inspired Crick with a description
of experiments Chargaff had published in 1947. Chargaff had observed
that the proportions of the four nucleotides vary between one
DNA sample and the next, but that for particular pairs of nucleotides
-- adenine and thymine, guanine and cytosine -- the two nucleotides
are always present in equal proportions.
Watson and
Crick had begun to contemplate double helical arrangements, and
they saw that by reversing the directionality of one strand with
respect to the other, they could provide an explanation for Chargaff's
puzzling finding. This explanation was the complementary pairing
of the bases, which also had the effect of ensuring that the distance
between the phosphate chains did not vary along a sequence. Watson
and Crick were able to discern that this distance was constant
and to measure its exact value of 2 nanometers from an X-ray pattern
obtained by Franklin. The same pattern also gave them the 3.4
nanometer-per-10 bp "pitch" of the helix. The pair quickly converged
upon a model, which they announced before Franklin herself published
any of her work.
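Why complementary pairing accounts for Chargaff's observation can be verified by counting bases in a paired duplex. The sketch below builds the opposing strand from the pairing rule and tallies both strands together; the sequences are invented examples and the function names are hypothetical.

```python
from collections import Counter

PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def duplex_composition(strand: str) -> Counter:
    """Count each base across a strand and the partner dictated by pairing."""
    partner = "".join(PAIRS[base] for base in strand)
    return Counter(strand + partner)


def gc_fraction(strand: str) -> float:
    """Fraction of the duplex made up of G and C."""
    counts = duplex_composition(strand)
    return (counts["G"] + counts["C"]) / sum(counts.values())


sample_one = duplex_composition("AAGCTTTGA")  # an A/T-rich example
sample_two = duplex_composition("GGCGCATCC")  # a G/C-rich example
```

In every duplex the counts of A and T match, as do those of G and C, while the overall proportion of G and C is free to differ from one sample to the next, just as Chargaff reported.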
The great
assistance Watson and Crick derived from Franklin's data has become
a subject of controversy, and it has angered people who believe
Franklin has not received the credit due to her. The most controversial
aspect is that Franklin's critical X-ray pattern was shown to
Watson and Crick without Franklin's knowledge or permission. Wilkins
showed it to them at his lab while Franklin was away.
Watson and
Crick's model attracted great interest immediately upon its presentation.
Arriving at their conclusion on February 21,
1953, Watson and Crick made their
first announcement on February 28. Their paper 'A Structure for Deoxyribose
Nucleic Acid' was published on April 25. In
an influential presentation in 1957, Crick laid out the "Central Dogma", which foretold
the relationship between DNA, RNA, and proteins, and articulated
the "sequence hypothesis." A critical confirmation of the replication
mechanism that was implied by the double-helical structure followed
in 1958 in the form of the
Meselson-Stahl experiment.
Work by Crick and coworkers deciphered the genetic
code not long afterward. These findings represent the birth
of molecular biology.
Watson, Crick,
and Wilkins were awarded a Nobel Prize
in 1962, by which time Franklin had
died.