BCH5425 Molecular Biology and Biotechnology
Dr. Michael Blaber
Period of time between the first powered
flight and the landing on the moon (1903-1969):
66 years
Period of time between discovery
of structure of DNA and determination of the sequence of the entire
human genome (1953-2010?)
57 years (?)
Genomic DNA libraries
Size of some genomes and chromosomes:
|Comparative Sequence Sizes||Base pairs
|(yeast chromosome 3)||350 Thousand
|Escherichia coli (bacterium) genome||4.6 Million
|Largest yeast chromosome now mapped||5.8 Million
|Entire yeast genome (completed 5/96)||15 Million
|Smallest human chromosome (Y)||50 Million
|Largest human chromosome (1)||250 Million
|Entire human genome||3 Billion
- The human genome contains approximately 50,000
unique genes within 3-4 billion base pairs
of DNA, scattered about in 23 pairs of chromosomes.
Fragmentation of genomic DNA for library construction
Restriction endonuclease digestion
- A six-cutter (e.g. EcoRI) will cut on average once every 4^6 = 4096 bp
(about 4.1 Kb). Complete digestion of human DNA with this type of enzyme
will result in approximately 1 x 10^6 unique fragments.
- What is the probability of finding a clone within a given
The exact probability of having any given DNA sequence in the
library can be calculated from the equation
N = ln(1 - P)/ln(1 - f)
where
P is the desired probability
f is the fractional proportion of the genome in a single recombinant
N is the necessary number of recombinants
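The equation follows from a short probability argument. The sketch below assumes that each recombinant carries an independent, randomly chosen fraction f of the genome, which is an idealization of a real library:

```latex
\begin{align*}
\Pr(\text{sequence absent from one recombinant}) &= 1 - f \\
\Pr(\text{sequence absent from all } N \text{ recombinants}) &= (1 - f)^{N} \\
P = 1 - (1 - f)^{N} \quad &\Longrightarrow \quad N = \frac{\ln(1 - P)}{\ln(1 - f)}
\end{align*}
```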
For example, how large a library (i.e. how many clones) would
you need in order to have a 99% probability of finding a desired
sequence represented in a library created by digestion with a
six-cutter?
N = ln(1 - 0.99)/ln(1 - (4096/(3 x 10^9)))
N = 3.37 x 10^6 clones
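The worked example can be checked numerically. The following is a minimal sketch; the function name `clones_needed` and the variable names are illustrative choices, not part of the original notes, and the genome size is the 3 x 10^9 bp figure used in the example.

```python
import math

def clones_needed(p, insert_bp, genome_bp):
    """Return N = ln(1 - P) / ln(1 - f) for a library of equal-sized inserts."""
    f = insert_bp / genome_bp  # fraction of the genome in one recombinant
    return math.log(1 - p) / math.log(1 - f)

# Values taken from the worked example (assumed: average fragment = 4^6 bp)
six_cutter_fragment_bp = 4 ** 6   # a 6-bp site occurs about once per 4096 bp
human_genome_bp = 3e9

n = clones_needed(0.99, six_cutter_fragment_bp, human_genome_bp)
print(f"{n:.3g} clones")          # about 3.37 x 10^6
```

Raising the desired probability P or shrinking the insert size both increase N, which is why larger inserts are attractive for genomic libraries.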
Thus, from this type of analysis we can see that we need a technology
which will allow us to achieve the following:
- Stable insertion of relatively large DNA fragments into
our cloning vector
- High efficiency of insertion and the ability to handle
large numbers of clones
- For example, when plating E. coli
colonies on a 3" petri plate, the maximum practical density
to allow isolation of individual colonies is about 100-200
colonies per plate.
- If we were to try to plate our library of 3.37 x 10^6 clones
in such a way, we would need about 22,500 plates.
- Not only that, but such large DNA fragments are not well tolerated
in typical E. coli cloning vectors such as pBR322.
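The plate count quoted above is simple arithmetic, sketched here for the whole 100-200 colonies-per-plate range. The midpoint of 150 colonies per plate is an assumption chosen to reproduce the figure in the text; the variable names are illustrative.

```python
import math

library_size = 3.37e6        # clones, from the library-size estimate in the text
colonies_per_plate = 150     # assumed midpoint of the 100-200 range

plates = math.ceil(library_size / colonies_per_plate)

# Bounds for the densest (200) and sparsest (100) practical plating
plates_min = math.ceil(library_size / 200)
plates_max = math.ceil(library_size / 100)

print(plates, plates_min, plates_max)
```

Even at the densest practical plating, the library would still require well over ten thousand plates.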
Bacteriophage lambda vectors are commonly used for construction
of genomic libraries
Bacteriophage λ is an E. coli
phage whose icosahedral phage particle (capsid) contains
the viral genome.
- During replication, the phage DNA is produced in a concatemeric
form, which is cleaved by appropriate endonucleases to allow packaging
of a single genome within the phage capsid.
- It was found that internal regions of the phage genome, which
were not essential to phage replication, could be removed and
replaced with DNA of interest.
- This hybrid DNA could be efficiently packaged, and form an
The advantages of this type of system vs plasmids like pBR322
- The phage genome is able to package efficiently with DNA
inserts as large as 20 Kb.
- Furthermore, the packaged phage are highly infectious and
infect E. coli at a much higher efficiency than plasmid transformation
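To see what the larger insert size buys, the library-size equation N = ln(1 - P)/ln(1 - f) can be applied to 20 Kb inserts. This is a rough sketch that assumes every clone carries the maximum 20 Kb insert and that fragments are a random sample of a 3 x 10^9 bp genome; the function name is illustrative.

```python
import math

def clones_needed(p, insert_bp, genome_bp=3e9):
    """N = ln(1 - P) / ln(1 - f), with f = insert size / genome size."""
    return math.log(1 - p) / math.log(1 - insert_bp / genome_bp)

n_six_cutter = clones_needed(0.99, 4096)    # complete six-cutter digest
n_lambda = clones_needed(0.99, 20_000)      # assumed 20 Kb lambda inserts

print(f"six-cutter fragments: {n_six_cutter:.3g} clones")
print(f"20 Kb lambda inserts: {n_lambda:.3g} clones")
print(f"reduction: {n_six_cutter / n_lambda:.1f}-fold")
```

Under these assumptions the required library shrinks roughly in proportion to the insert size, from about 3.4 million clones to about 0.7 million.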
Incomplete Digestion of Genomic DNA will allow identification
of sequence overlaps
Complete digestion with an endonuclease will result in a library
containing no overlapping fragments:
- However, incomplete digestion will result in a library
containing overlapping fragments:
- Thus, the sequence information obtained from one clone
will allow the isolation of clones containing neighboring (overlapping)
- This can allow large contiguous stretches of sequence information
to be obtained ("Chromosome Walking").
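The logic of joining overlapping clones can be illustrated in code. The sketch below is purely illustrative: it treats clones as already-sequenced strings and joins them by exact suffix/prefix matches, whereas at the bench the neighboring clone is found by hybridization. The toy sequences and the function names `overlap` and `walk` are hypothetical.

```python
def overlap(a, b, min_len=4):
    """Length of the longest suffix of a equal to a prefix of b (0 if below min_len)."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0

def walk(start, clones, min_len=4):
    """Extend a contig from `start` by repeatedly appending the best-overlapping clone."""
    contig = start
    remaining = [c for c in clones if c != start]
    while remaining:
        best = max(remaining, key=lambda c: overlap(contig, c, min_len))
        k = overlap(contig, best, min_len)
        if k == 0:
            break                      # no remaining clone overlaps the contig end
        contig += best[k:]             # add only the new, non-overlapping part
        remaining.remove(best)
    return contig

# Toy clones from an incomplete digest; each shares 6 bases with its neighbor
clones = ["ATGGCGTACGTT", "TACGTTAGCCTA", "AGCCTAGGATCC"]
contig = walk(clones[0], clones)
print(contig)
```

With a complete digest there would be no shared ends, `overlap` would return 0 for every pair, and the walk would stop at the first clone.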
Once a library (cDNA or genomic) has been constructed we want
to be able to identify clones which contain DNA of interest.
- For example, from protein sequence information we can deduce
possible stretches of the corresponding DNA sequence (there
will however be ambiguity due to the degeneracy of codons).
- If we can synthesize an oligonucleotide complementary to our
DNA sequence of interest we can use it to specifically hybridize
to the appropriate clone in our library (i.e. to probe the library).
In standard methodologies the oligonucleotide is phosphorylated
at the 5' end with radiolabeled [γ-32P]ATP
and T4 polynucleotide kinase.
- The probe is then incubated with individual phage plaques
which have been fixed onto nitrocellulose and their DNA denatured
by treatment with base.
- If the plaque contains DNA complementary to the probe sequence,
the probe will hybridize.
- If the nitrocellulose (containing many individual plaques)
is exposed to x-ray film, only those plaques with hybridized
probe will show up (as a dark spot):
Note that it is important to keep
track of the orientation of the nitrocellulose in relationship
to the x-ray film (usually radioactive ink is used to identify
the nitrocellulose orientation).
If we are designing DNA probes from protein sequence information
we will have possible ambiguity in our deduced DNA sequence
used for the design of the probe.
- Usually 14-24mer oligonucleotides are used as probes. At three bases per codon, a 14-24mer
probe means we need a stretch of 5-8 amino acids in the
- Given the choice, the best amino acid sequences to look for
in a polypeptide are those with low codon degeneracy (see
- Thus, we would look for a short stretch of polypeptide sequence
hopefully containing Met or Trp (one codon each),
and with the remaining amino acids comprising either Phe,
Tyr, His, Gln, Asn, Lys, Asp, Glu or Cys (two codons each).
- Regions including Leu, Arg or Ser
are to be avoided (6 codons each).
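The effect of amino acid choice on probe degeneracy can be made concrete by multiplying the number of codons at each position. In this minimal sketch the codon counts are those of the standard genetic code, while the function name `probe_degeneracy` and the two five-residue peptides are hypothetical examples.

```python
from math import prod

# Number of codons for each amino acid in the standard genetic code
CODON_COUNT = {
    "Met": 1, "Trp": 1,
    "Phe": 2, "Tyr": 2, "His": 2, "Gln": 2, "Asn": 2,
    "Lys": 2, "Asp": 2, "Glu": 2, "Cys": 2,
    "Ile": 3,
    "Val": 4, "Pro": 4, "Thr": 4, "Ala": 4, "Gly": 4,
    "Leu": 6, "Arg": 6, "Ser": 6,
}

def probe_degeneracy(peptide):
    """Number of distinct DNA sequences that could encode the peptide."""
    return prod(CODON_COUNT[aa] for aa in peptide)

favorable = ["Met", "Trp", "Asp", "Cys", "Lys"]     # hypothetical peptide
unfavorable = ["Leu", "Arg", "Ser", "Val", "Gly"]   # hypothetical peptide

print(probe_degeneracy(favorable))     # 1*1*2*2*2 = 8
print(probe_degeneracy(unfavorable))   # 6*6*6*4*4 = 3456
```

A 15-mer probe designed from the first peptide would be a mixture of 8 oligonucleotides, while one designed from the second would require 3456.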
During oligonucleotide synthesis multiple bases will be incorporated
at ambiguous positions.
- Thus our probe will actually be a mixture of oligonucleotides.
- The higher the degeneracy, the greater the possibility of "false
positives", i.e. clones which hybridize but are unrelated
to the actual sequence we want.
- Positive clones are sequenced and the deduced amino acid sequence
is compared to our polypeptide sequence information to identify
If the particular vector, or phage, used to construct a cDNA library
contains a promoter region upstream
of the insertion site we may be able to screen for desired clones
by looking for expression of the protein
- In this case, we need an assay which is both sensitive
(we will not be producing a lot of protein) and specific
(we want to minimize any false positives).
- One of the best assays, which is both sensitive and specific,
makes use of antibodies.
Antigen, antibody, epitope
One of the defense mechanisms of vertebrates is the ability to
distinguish between self and non-self molecules.
- Thus, if a foreign molecule (either from another species or
sometimes from another individual within a species) invades a
vertebrate organism, the immune system functions to learn to identify
- In future invasions by the same molecule, the organism mounts
a defense against it by producing specific antibodies which
recognize and bind to the foreign antigen.
- When antibodies bind to antigen certain white blood cells
(macrophages and monocytes) recognize the invading body as foreign
and respond by destroying it.
Antibodies are 'Y' shaped molecules which contain two identical
heavy chains, and two identical light chains.
- The stem of the 'Y' comprises the Fc
(constant) domain, and the 'arms' of the 'Y' comprise
the Fab (antigen-binding) domains, whose tips carry the variable regions.
- Antigens bind to the complementarity-determining regions
(CDRs) located at the ends of the Fab domains.
Antibodies are synthesized by B lymphocytes. Each B lymphocyte
is capable of producing a single type of antibody directed against
a specific structural determinant, or epitope, on an antigen.
- Thus, an immune response to a protein antigen may result in
a population of B lymphocytes each producing antibodies which
recognize a different structural determinant of the foreign protein.
- An epitope may be a contiguous region of 5 or 6 amino acids
in the foreign polypeptide, or the epitope may comprise a half
dozen or so amino acids brought in juxtaposition in the native
protein, yet widely spaced in the polypeptide sequence.
- Thus, some antibodies will recognize native and denatured
forms of a foreign protein equally well, while other antibodies
may only recognize one or the other.
If the protein of interest has been purified it can be used to
induce an immune response in a host animal.
- Typical host animals include mouse, chicken, rabbit, goat,
sheep, horse and occasionally, human.
- After an initial immunization, followed by one or more booster
shots, the B lymphocytes of the host animal may produce antibodies
directed against the antigen.
- The antibodies can be purified from blood samples withdrawn
from the animal. Such preparations of antibodies are said to be
polyclonal.
- This refers to the fact that the antibodies present are from
a collection of different B lymphocytes and thus will recognize
a variety of different epitopes on the antigen protein.
- The ability to isolate antibodies from blood samples means
that the host animal does not need to be destroyed.
- Of course, the size of the animal determines how much antibody
one can obtain. For example, a rabbit can provide 5 ml of blood
every two weeks, a mouse provides significantly less, while a
horse can provide quite a bit more.
An antibody isolated from a single B lymphocyte cell population
is termed monoclonal.
- It recognizes a single epitope on the antigenic protein.
- Antibody producing B lymphocytes can be isolated from the
spleen or from lymph nodes. However, they have a finite life
span in culture, i.e. they will undergo a certain number of
cell divisions and then die.
- These cells can, however, be fused with immortal (cancerous
myeloma) lymphocytes to produce a hybridoma cell.
- Such a cell is immortal like the myeloma, and produces
a specific antibody from the B lymphocyte. The ability to grow
indefinitely in culture allows the isolation of useful amounts
of specific monoclonal antibodies.
Sometimes immunizing with the protein of interest is problematic:
appropriate amounts of purified material cannot be produced, or
the protein is itself toxic at the dosage level necessary to produce
an immune response.
- If partial sequence information is known, then large amounts
of polypeptides representing short fragments of the protein, can
be synthesized and used to immunize the animal.
- Often these polypeptides are covalently attached to a carrier
protein (typically serum albumin) to enhance the antigenic response.
- Antibodies produced against such peptides will recognize only
epitopes within the polypeptide. Thus, even polyclonal antibodies
would be quite limited in their epitope recognition.
As with radiolabeled oligonucleotides, antibodies can be used
to identify library clones which contain a cDNA of interest. This
method would of course rely upon a host vector or phage which
contains a promoter upstream from the site of insertion of the cDNA.
- Antibodies can be used to screen viral plaques or plasmid
colonies which have been bound to nitrocellulose.
- Bound antibodies can be identified using radiolabeled protein
A (which binds to immunoglobulins) or via a second antibody (which,
like protein A, can recognize general immunoglobulins) which has
a dye or dye releasing enzyme covalently attached.
1998 Dr. Michael Blaber