BCH 4053 Biochemistry I
Fall 2001
Dr. Michael Blaber
Lecture 9
Chemical groups in proteins, Purification of proteins, Amino acid sequencing
Chemical Groups in Proteins
Simple proteins: have only amino acids in them
Conjugated proteins: have other types of molecules as an integral part of their structure - often covalently bonded through linkage(s) either with the main chain or side chain atoms
- The prosthetic group is the non-protein group that is found in a conjugated protein.
The following is a list of commonly observed prosthetic groups in conjugated proteins:
- Glycoproteins. These have carbohydrate, typically covalently bonded to either Ser, Thr or Asn side chains. The carbohydrate often confers added solubility, serves to regulate the in vivo half-life, and is a commonly observed modification in extracellular and membrane-bound proteins. It can also function in cell-cell communication and identity.
- Lipoproteins. These have lipid prosthetic groups. Lipoproteins may serve a transport function for lipids, and the lipids are therefore typically bound via non-covalent interactions. Lipids can also serve to anchor soluble proteins to a lipid surface or membrane.
- Nucleoproteins. Complexes (often non-covalent) of nucleic acids and proteins. Often involved in the packaging or regulation of the nucleic acid (i.e. genetic material).
- Phosphoproteins. These are proteins with a phosphate covalently bound to the side chain hydroxyl groups of either serine, threonine or tyrosine. Phosphorylation often regulates the functional activity of the protein to which it is bound (i.e. turning it on or off depending on the state of phosphorylation). This regulation often is part of a signal transduction pathway from outside to inside the cell.
- Metalloproteins. Contain a metal ion bound via non-covalent electrostatic interactions. The metal ion is often required for function or stability of the metalloprotein (e.g. often the metalloprotein is a metal-requiring enzyme), or the protein is a transport molecule for the metal.
- Hemoproteins. An important subclass of metalloproteins that contain a heme (iron, Fe) prosthetic group. Many important redox and energy pathways involve hemoproteins.
- Flavoproteins. Flavin is an important prosthetic group for a variety of redox reactions. As a functional group it has the ability to reversibly bind/release electrons.
Purification of Protein Mixtures
Selective purification of a particular protein from a mixture of different proteins takes advantage of the unique physical and chemical properties of the protein of interest.
- A "purification scheme" is a series of fractionation steps using methodologies that separate mixtures of proteins based upon some physical or chemical property
What common properties distinguish different proteins?
- Molecular mass. Proteins have a mass that is proportional to the length of the polypeptide chain. The average molecular mass of the amino acids, weighted by how often each occurs in proteins, is approximately 128 Da, or 128 g/mole. However, a water (18 Da) is released when a peptide bond is formed. So a useful heuristic (rule of thumb) is that the average mass of an amino acid in a polypeptide is 110 Da. Therefore, a protein with 150 amino acids would have a molecular mass of approximately 16,500 Da (16.5 kDa), or 16,500 g/mole. Various analytical methods will separate molecules based on molecular mass (for example, a method known as gel-filtration chromatography).
- Solubility. Different proteins have different solubilities in certain salt solutions. This can be used to selectively fractionate a protein of interest.
- pKa. Different proteins contain different numbers and types of ionizable groups, each with its own pKa value. Thus, at a given pH different proteins will have a different net charge, and this electrostatic property can be used to separate proteins (in a method known as ion-exchange chromatography).
- Hydrophobicity. Different proteins will have different proportions of aliphatic and aromatic side chains on their surface. These groups contribute to the hydrophobicity of the protein. This property can be used to separate such proteins in a method known as hydrophobic interaction chromatography.
- Affinity for specific ligands. The functionality of a particular protein may require the binding of a specific ligand or prosthetic group. This property can be exploited to separate such proteins in a method known as affinity chromatography.
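The 110 Da per residue rule of thumb from the molecular mass item above lends itself to a quick calculation. The following is a minimal sketch; the function names are illustrative only and not part of the lecture.

```python
# Rule of thumb from the text: ~110 Da per amino acid residue in a polypeptide.
AVG_RESIDUE_MASS_DA = 110


def estimate_mass_da(n_residues: int) -> int:
    """Approximate molecular mass (Da) of a polypeptide of n_residues."""
    return n_residues * AVG_RESIDUE_MASS_DA


def estimate_length(mass_da: float) -> int:
    """Approximate number of residues for a polypeptide of the given mass."""
    return round(mass_da / AVG_RESIDUE_MASS_DA)


print(estimate_mass_da(150))   # 16500 Da, i.e. 16.5 kDa
print(estimate_length(16500))  # 150 residues
```

The estimate ignores the actual composition of the protein, so it should be read as an approximation only.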
A key point about fractionation steps is that you need to know which fraction contains your protein of interest.
- An "assay" is an analytical methodology used to identify a component of interest (i.e. the protein you want to purify)
Before you can purify something, you must have an assay.
Ideally, an assay should be:
- Specific. It detects only your protein of interest, and does not give any "false positives" or "false negatives".
- Sensitive. Often the process of assaying can be destructive to the sample. A sensitive assay needs only a small amount of material, so you don't use up all your sample when you assay it.
- Rapid. You don't want to wait weeks for the result.
- Quantitative. You don't want a simple "yes/no" answer in an assay. In order to be able to determine yield and purity you must have a quantitative assay.
Yield tells you how much protein of interest is recovered after each fractionation step. Combined losses can rapidly deplete your protein. Overall yield is the product of the yield for each fractionation step.
Purity should increase with each fractionation step. The hallmark of a pure protein is that the purity does not increase no matter what additional purification steps are taken.
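To illustrate how combined losses deplete the protein, the overall yield can be computed as the product of the individual step yields. This is a small sketch; the four step yields are made-up numbers chosen only for illustration.

```python
from math import prod


def overall_yield(step_yields):
    """Overall yield is the product of the fractional yield of each step."""
    return prod(step_yields)


# Hypothetical fractional yields for four successive fractionation steps.
steps = [0.90, 0.75, 0.80, 0.60]

print(f"{overall_yield(steps):.1%}")  # 32.4%
```

Even though no single step in this example loses more than 40% of the protein, less than a third of the starting material remains at the end.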
More detailed information on protein purification and chromatography can be found at:
Protein purification: Assays and Initial Steps in a Purification
Protein purification: Ion exchange, Dialysis and Concentration
Protein purification: Gel Filtration, Hydrophobic Interaction Chromatography, Affinity Chromatography, Plumbing
Protein purification: Running the experiment
Determination of the amino acid sequence of a protein
Proteins are chemically well-defined. All the molecules of a given sample of pure protein have an identical primary sequence.
The analytical chemistry for determination of the sequence of amino acids in a polypeptide has been worked out, and to a large degree, automated.
It should be noted that proteins can also be sequenced by sequencing the DNA that codes for them (and subsequently translating the DNA sequence into protein sequence). Many proteins are actually sequenced in this way. But direct protein sequencing is still important in many cases.
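Starting at the amino terminal, the amino acids in a polypeptide can be sequentially identified using a method known as Edman degradation. The practical limit on "amino terminal sequencing" is about 25-35 "cycles" (i.e. amino acids). In the so-called "Edman chemistry" associated with N-terminal peptide sequencing, phenylisothiocyanate reacts with the free N-terminal amino group, the derivatized N-terminal residue is then cleaved from the chain under acidic conditions, and the released residue is converted to a stable phenylthiohydantoin derivative for identification.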

- Note that with Edman chemistry only the N-terminal residue is attacked and removed; the rest of the polypeptide remains intact after the reaction.
- The new amino terminal group (previously the second amino acid in the polypeptide chain) is now available for another round of reactions. Thus, the method can be automated.
- The amino acid side chain of the phenylthiohydantoin derivative can be identified using liquid chromatography. Modern amino acid sequencers can typically sequence on the order of two to three dozen cycles (amino acids) of a polypeptide.
- Note that the reaction requires a free amino group on the N-terminal of the protein. If the amino-terminal residue is methylated or formylated then the reaction will not proceed (and the polypeptide is said to have a "blocked" N-terminal).
- Starting at the carboxyl terminal, the amino acids can be sequentially identified using enzymes called carboxypeptidases, which sequentially hydrolyze amino acids at the carboxy terminal. This is not well automated (often done by hand), and can typically identify only 5-6 amino acids at the carboxy terminal.
- Therefore, only short peptides of ~30 amino acids can be directly sequenced in their entirety. Most proteins require some type of chemical fragmentation to be able to sequence the entire polypeptide sequence.
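The cyclic nature of N-terminal sequencing, and its practical limits, can be mimicked with a toy simulation. This is only a sketch of the bookkeeping, not of the chemistry: the cycle limit of 30 is an assumed value within the 25-35 range quoted above, and the 40-residue peptide is a made-up sequence.

```python
def edman_sequence(peptide, max_cycles=30, blocked=False):
    """Simulate N-terminal sequencing.

    Returns (identified_residues, unsequenced_remainder). Each cycle removes
    and identifies only the current N-terminal residue. A blocked N-terminal
    prevents the first cycle, so nothing is identified.
    """
    if blocked:
        return [], peptide
    identified = []
    remaining = peptide
    for _ in range(min(max_cycles, len(peptide))):
        identified.append(remaining[0])  # N-terminal residue is released
        remaining = remaining[1:]        # next residue becomes the new N-terminal
    return identified, remaining


peptide = "AFNKGLSDEV" * 4  # hypothetical 40-residue peptide (one-letter codes)

identified, remainder = edman_sequence(peptide)
print(len(identified), remainder)  # 30 AFNKGLSDEV

print(edman_sequence(peptide, blocked=True)[0])  # []
```

The ten residues left over in the first call are the reason longer polypeptides must be fragmented before they can be sequenced in full.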

Peptide Mapping
How can sequence information for the entire polypeptide be obtained?
One method is that of peptide mapping. Peptide mapping makes use of proteolytic cleavages of the polypeptide to produce smaller polypeptides. These smaller polypeptides can then be isolated from one another and subjected to sequence analysis.
However, now we have another problem. The products of such proteolytic cleavages are peptide fragments, and although we might be able to separate and sequence the individual peptides, we have no idea what order they are supposed to be in.

How do we order the different sequences that we obtain?
One of the easiest ways is to repeat the experiment, but with a protease with a different specificity, and in this way obtain overlapping sequence information.
| Name | Source | Specificity |
|---|---|---|
| Chymotrypsin | Bovine pancreas | Cleavage after Tyr, Phe and Trp; some cleavage after Leu, Met and Ala |
| Bromelain | Pineapple | Cleavage after Lys, Ala and Tyr |
| Trypsin | Bovine pancreas | Cleavage after Arg, less after Lys |
| V8 protease | Staphylococcus aureus | Cleavage after Glu, less after Asp |

Overlapping sequence information can allow you to align the peptides in the correct order and determine the sequence of the original large polypeptide (i.e. protein).
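The idea of ordering peptides through overlaps can be sketched in a few lines. The example below assumes idealized, complete cleavage (trypsin after every Arg and Lys, chymotrypsin after every Phe, Trp and Tyr), assumes all fragments are unique, and uses a made-up 15-residue sequence; real digests are messier.

```python
def digest(sequence, cleave_after):
    """Cut the sequence after every residue listed in cleave_after."""
    fragments, start = [], 0
    for i, residue in enumerate(sequence):
        if residue in cleave_after:
            fragments.append(sequence[start:i + 1])
            start = i + 1
    if start < len(sequence):
        fragments.append(sequence[start:])
    return fragments


def bridges(peptide, left, right):
    """True if peptide spans the junction between fragments left and right."""
    return any(left.endswith(peptide[:j]) and right.startswith(peptide[j:])
               for j in range(1, len(peptide)))


def order_fragments(fragments, overlapping):
    """Order fragments using peptides from a second digest as overlaps."""
    successor = {}
    for left in fragments:
        for right in fragments:
            if left != right and any(bridges(p, left, right) for p in overlapping):
                successor[left] = right
    # The first fragment is the one that never follows another fragment.
    ordered = [next(f for f in fragments if f not in successor.values())]
    while ordered[-1] in successor:
        ordered.append(successor[ordered[-1]])
    return ordered


protein = "AFNKGLWDERSYVMK"       # hypothetical sequence
tryptic = digest(protein, "RK")   # ['AFNK', 'GLWDER', 'SYVMK']
chymo = digest(protein, "FWY")    # ['AF', 'NKGLW', 'DERSY', 'VMK']

# Pretend the tryptic peptides were recovered in an arbitrary order.
shuffled = ["SYVMK", "AFNK", "GLWDER"]
print("".join(order_fragments(shuffled, chymo)))  # AFNKGLWDERSYVMK
```

Here the chymotryptic peptides NKGLW and DERSY each span a tryptic cleavage site, which is what allows the three tryptic peptides to be placed in order.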
- Another complication to direct protein sequencing is the effect of disulfide bonds and multiple polypeptide chains in the tertiary structure of a protein:
A single polypeptide will give an unambiguous amino terminal sequence:

| Cycle | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Amino acid | Alanine | Phenylalanine | Asparagine | Lysine |
However, a disulfide-linked pair of polypeptides will give an ambiguous sequence:

| Cycle | 1 | 2 | 3 | 4 |
|---|---|---|---|---|
| Amino acid(s) | Alanine, Asparagine | Proline, Phenylalanine | Aspartic acid, Asparagine | Lysine, Methionine |
One of the first steps in protein sequencing is therefore to reduce any disulfide bonds and to separate the individual polypeptide chains.
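The ambiguity caused by two linked chains can be reproduced with a short sketch. The first chain is the single-polypeptide example above; the second chain (Asn-Pro-Asp-Met) is an assumption, inferred as the residues left over in the ambiguous table.

```python
from itertools import product


def edman_mixture(chains, cycles):
    """Residues released at each cycle when all chains are sequenced together."""
    return [sorted({chain[i] for chain in chains if i < len(chain)})
            for i in range(cycles)]


chain_a = ["Ala", "Phe", "Asn", "Lys"]
chain_b = ["Asn", "Pro", "Asp", "Met"]  # assumed second chain

per_cycle = edman_mixture([chain_a, chain_b], cycles=4)
for cycle, residues in enumerate(per_cycle, start=1):
    print(cycle, residues)

# Every way of picking one residue per cycle is a candidate chain sequence.
candidates = list(product(*per_cycle))
print(len(candidates))  # 16
```

With two residues released at each of four cycles there are 16 candidate sequences for either chain, and the data alone cannot say which is correct.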
Other considerations prior to amino or carboxyl terminal sequencing
The amino acid composition may be determined using acid hydrolysis. However, this can use up valuable material that could be better put to use in sequencing
Various methodologies can be used to identify whether the protein is a conjugated protein. If prosthetic groups are covalently bound to side chains, this can interfere with identification of the side chains, or the sequencing chemistry
For more information see the following:
Protein Sequencing, Peptide Mapping, Synthetic Genes
Sequence Determination by Mass Spectrometry
Mass spectrometry is a method that separates and quantifies molecules based upon their mass-to-charge ratio (m/z). It is accurate enough to assign a mass to a molecule to within 1 Da. Therefore, the composition of atoms within the molecule can be accurately identified.
The Nature of Amino Acid Sequences
When scientists first began sequencing proteins there were many unanswered questions regarding proteins and the amino acids.
- For example, there are 20 common amino acids; are they equally represented in protein sequences? (i.e. is each amino acid present to an equal extent, or 5%?)
- How similar are homologous proteins from different species? Will we find that related organisms have related amino acid sequences (e.g. are rat and mouse hemoglobin more similar to each other than to human hemoglobin?)
- Can we use this information to infer the evolutionary relationship between organisms?
With regard to amino acids in proteins, it was found that while each amino acid can be found in proteins, some (e.g. alanine) are present in larger amounts, and some (e.g. tryptophan) are relatively infrequent:
| Amino Acid | Frequency of Occurrence in Proteins (%) |
|---|---|
| Ala | 9.0 |
| Arg | 4.7 |
| Asn | 4.4 |
| Asp | 5.5 |
| Cys | 2.8 |
| Gln | 3.9 |
| Glu | 6.2 |
| Gly | 7.5 |
| His | 2.1 |
| Ile | 4.6 |
| Leu | 7.5 |
| Lys | 7.0 |
| Met | 1.7 |
| Phe | 3.5 |
| Pro | 4.6 |
| Ser | 7.1 |
| Thr | 6.0 |
| Trp | 1.1 |
| Tyr | 3.5 |
| Val | 6.9 |
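The frequencies listed above can be compared directly against the 5% expected if all 20 amino acids were equally represented. The following sketch uses only the values from the table; the variable names are illustrative.

```python
# Frequency of occurrence in proteins (%), copied from the table above.
FREQ = {
    "Ala": 9.0, "Arg": 4.7, "Asn": 4.4, "Asp": 5.5, "Cys": 2.8,
    "Gln": 3.9, "Glu": 6.2, "Gly": 7.5, "His": 2.1, "Ile": 4.6,
    "Leu": 7.5, "Lys": 7.0, "Met": 1.7, "Phe": 3.5, "Pro": 4.6,
    "Ser": 7.1, "Thr": 6.0, "Trp": 1.1, "Tyr": 3.5, "Val": 6.9,
}

uniform = 100 / len(FREQ)  # 5.0% if every amino acid were equally common
above = sorted(aa for aa, f in FREQ.items() if f > uniform)
most = max(FREQ, key=FREQ.get)
least = min(FREQ, key=FREQ.get)

print(f"total = {sum(FREQ.values()):.1f}%")  # total = 99.6%
print(f"above {uniform:.0f}%: {above}")
print(f"{most} is {FREQ[most] / FREQ[least]:.1f}x as common as {least}")  # Ala is 8.2x as common as Trp
```

Nine of the twenty amino acids occur more often than the uniform 5%, and the values sum to 99.6% rather than exactly 100% because of rounding in the table.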
One surprise for scientists who studied homologous proteins from different species involved the comparison of humans with the other great apes, in particular the chimpanzee.
- Human and chimp cytochrome c (a protein involved in electron transport) turned out to be identical to each other. In fact, the first few proteins compared between chimp and human proved to be identical - which led some scientists to joke that the differences between humans and chimps were merely cultural. However, further characterization identified amino acid sequence differences between the proteins of humans and chimps. Nonetheless, it was clear that many proteins were nearly identical when comparing human and chimp species.
Protein sequence analysis provided a way to determine the "similarity" of species on a molecular level.
- "Tree" diagrams could be constructed to reflect molecular similarities
- Comparisons identified groups of organisms that were very similar to each other, and significantly different from other organisms. These groups, or "nodes", could be diagrammed as branch-points in evolutionary trees.
- These trees, based on molecular similarity, were very similar to trees constructed from phylogenetic relationships (i.e. morphological characteristics). Thus, morphological differences have as their basis sequence differences in proteins. Mechanisms that lead to amino acid mutations in proteins can therefore result in morphological differences.
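Molecular similarity of the kind described above is commonly expressed as percent identity between aligned sequences. The sketch below assumes the sequences are already aligned and of equal length; the species labels and ten-residue sequences are invented for illustration and are not real protein data.

```python
from itertools import combinations


def percent_identity(seq_a, seq_b):
    """Percentage of aligned positions that carry the same amino acid."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq_a, seq_b))
    return 100.0 * matches / len(seq_a)


# Hypothetical aligned sequences (one-letter amino acid codes).
sequences = {
    "species_1": "MKTAYIAKQR",
    "species_2": "MKTAYIAKQR",
    "species_3": "MKTAYIAKQK",
    "species_4": "MRSAYLGKHK",
}

for (name_a, seq_a), (name_b, seq_b) in combinations(sequences.items(), 2):
    print(f"{name_a} vs {name_b}: {percent_identity(seq_a, seq_b):.0f}%")
```

Pairs with high identity (here species_1, species_2 and species_3) would be grouped together near a common branch-point, while species_4 would join the tree at a deeper node.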
Shared genetic diseases. Humans, chimps, bonobos, gorillas and orangutans (the "great apes") share certain genetic diseases. For example, all of the great apes have the same defect in a gene for an enzyme necessary to make vitamin C. Thus, all need to get vitamin C from plants in their diet. The same defect is found in monkeys and tarsiers, whereas lemurs and most other mammals retain a working gene. Thus, this mutation occurred in a common ancestor of these primates after diverging from the lineage leading to lemurs, but prior to diverging into the different species of great apes.
Another side of sequence similarity is the following: Proteins with similar functionalities often have similar tertiary structures, and therefore, similar amino acid sequences
- Oxygen-binding proteins that contain prosthetic iron groups all have similar overall tertiary structures, and presumably evolved from some ancient iron-binding protein
Yet another surprise was related to the economy with which nature can produce a variety of functional proteins using a relatively small "toolbox" of tertiary structures.
- Although a given organism (e.g. a bacterium or a human) can produce up to around 30,000 different proteins, there are only ten fundamental "superfolds" or tertiary structure categories that have been identified. All known protein structures are variations of these basic structures, or combinations of these structures.
- Rather than "reinvent the wheel", nature appears to achieve new functionalities by mutations introduced into existing protein structures.
© 2001 Dr. Michael Blaber