Genetic code


The genetic code is the set of rules that maps trinucleotide sequences in DNA or mRNA (called codons) to specific amino acids in the protein product. The code is nearly universal across living cells and viruses, although there are occasional exceptions, for instance in the mitochondrial genetic code.

The genetic code was first deciphered in an experiment by Nirenberg & Matthaei, in which translation was performed in a cell-free system. A poly-uracil mRNA sequence was translated in vitro and the protein product contained only phenylalanine residues. This showed that the codon UUU specifies the amino acid phenylalanine. Similarly, a poly-cytosine mRNA produced a poly-proline protein, showing that CCC specifies proline. Repeated experiments of this kind enabled the amino acids coded for by 54 of the 64 possible codons to be determined.
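The codon-to-amino-acid mapping and the two homopolymer results above can be reproduced with a short Python sketch. It assumes the standard genetic code, written in the compact 64-letter form used for NCBI translation table 1 (bases ordered T, C, A, G, one-letter amino acid codes, "*" for stop). The names CODON_TABLE and translate are illustrative only, and the function ignores start codons and simply reads triplets from the first base.

```python
from itertools import product

# Standard genetic code (assumed: NCBI translation table 1), bases ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# product() yields TTT, TTC, TTA, TTG, TCT, ... which matches the string above.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(seq):
    """Translate a DNA or mRNA string triplet by triplet; '*' marks a stop codon."""
    dna = seq.upper().replace("U", "T")   # treat mRNA and DNA spellings alike
    usable = len(dna) - len(dna) % 3      # ignore an incomplete trailing codon
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


print(translate("UUUUUUUUU"))  # FFF (phenylalanine only)
print(translate("CCCCCCCCC"))  # PPP (proline only)
```

F and P are the one-letter codes for phenylalanine and proline, so the output mirrors the poly-uracil and poly-cytosine results described above.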

Not all codons exclusively encode amino acids. Some act as start codons or stop codons, instructing initiation or termination of polypeptide synthesis, respectively.

  • The start codon is ATG in DNA (AUG in mRNA); at initiation it encodes methionine in eukaryotes or formyl-methionine in bacteria
  • There are three possible stop codons (sometimes referred to as nonsense codons):
    • The amber stop codon, TAG (or UAG), which can also encode pyrrolysine in some archaea and bacteria
    • The ochre stop codon, TAA (or UAA)
    • The opal stop codon, TGA (or UGA), which can also encode selenocysteine in certain mRNA contexts
  • Mutations which generate a stop codon from an otherwise amino acid-coding codon are called nonsense mutations and may be subdivided according to which type of stop codon is incorporated, i.e. there are amber, ochre and opal mutations.
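
The start and stop rules listed above can be sketched in Python as follows. This is a simplified illustration, not a gene finder: it assumes translation begins at the first ATG in the sequence and ends at the first in-frame stop codon, and it ignores the pyrrolysine and selenocysteine readings. The function names open_reading_frame and nonsense_type are hypothetical.

```python
START = "ATG"
STOP_NAMES = {"TAG": "amber", "TAA": "ochre", "TGA": "opal"}


def open_reading_frame(seq):
    """Return the codons from the first ATG up to (not including) the first in-frame stop."""
    dna = seq.upper().replace("U", "T")
    start = dna.find(START)
    if start == -1:
        return []                      # no start codon, nothing to translate
    codons = []
    for i in range(start, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if codon in STOP_NAMES:
            break                      # termination of polypeptide synthesis
        codons.append(codon)
    return codons


def nonsense_type(original_codon, mutated_codon):
    """Name a nonsense mutation (amber, ochre or opal), or return None if it is not one."""
    original = original_codon.upper().replace("U", "T")
    mutated = mutated_codon.upper().replace("U", "T")
    if original in STOP_NAMES:
        return None                    # the original codon was already a stop codon
    return STOP_NAMES.get(mutated)


print(open_reading_frame("CCATGAAATGGTAGCC"))  # ['ATG', 'AAA', 'TGG']
print(nonsense_type("TGG", "TAG"))             # amber
```

Note that the TGA spanning positions 3 to 5 of the example sequence is not treated as a stop, because it is not in frame with the ATG.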

The reading frame is the alignment in which a nucleotide sequence is divided into consecutive triplet codons. Within a given reading frame the genetic code is non-overlapping, i.e. the last two nucleotides of one codon cannot be combined with the first nucleotide of the adjacent codon to make another codon.

5'...AAA|AGA|ACT|TGC...3'
can only be read as shown above, and not as:
5'...AA|AAG|AAC|TTG|C...3'
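
The example above can be reproduced with a small Python sketch that splits the same twelve bases into triplets at each of the three possible offsets. The helper name split_codons is illustrative, and leftover bases at either end are simply dropped.

```python
def split_codons(seq, offset=0):
    """Split seq into complete triplets starting at offset; incomplete ends are dropped."""
    return [seq[i:i + 3] for i in range(offset, len(seq) - 2, 3)]


seq = "AAAAGAACTTGC"
for offset in range(3):
    print(offset, split_codons(seq, offset))
# 0 ['AAA', 'AGA', 'ACT', 'TGC']
# 1 ['AAA', 'GAA', 'CTT']
# 2 ['AAG', 'AAC', 'TTG']
```

Offset 0 gives the correct reading shown first, while offset 2 gives the shifted reading AAG|AAC|TTG from the second line.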

However, the reading frame of a DNA sequence can be disrupted by the insertion or deletion of a number of bases that is not a multiple of 3. Such mutations are aptly known as frame-shift mutations.
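
As a rough illustration of why the multiple-of-3 rule matters, the Python sketch below deletes one base and then three bases from the example sequence and compares the resulting codons. The helper names and the choice of deleted positions are assumptions made for the demonstration.

```python
def split_codons(seq):
    """Split seq into complete triplets from the first base; leftover bases are dropped."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def is_frameshift(indel_length):
    """An insertion or deletion shifts the frame unless its length is a multiple of 3."""
    return indel_length % 3 != 0


original = "AAAAGAACTTGC"
one_deleted = original[:3] + original[4:]    # remove the single base at index 3
three_deleted = original[:3] + original[6:]  # remove the whole second codon, AGA

print(split_codons(original))       # ['AAA', 'AGA', 'ACT', 'TGC']
print(split_codons(one_deleted))    # ['AAA', 'GAA', 'CTT']
print(split_codons(three_deleted))  # ['AAA', 'ACT', 'TGC']
print(is_frameshift(1), is_frameshift(3))  # True False
```

After the single-base deletion every downstream codon changes, whereas the three-base deletion removes one codon and leaves the others intact.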

There are 64 codons in the genetic code because each of the three nucleotide positions in a codon can be occupied by any one of 4 bases, giving 4 x 4 x 4 = 64 possible triplets. Because 64 codons are used to code for only 20 amino acids (61 amino acid-coding codons plus 3 stop codons), many amino acids can be coded for by more than one codon. This phenomenon is called degeneracy of the genetic code, and is the reason why some point mutations (silent mutations) have no effect on the amino acid coded for in the protein product.
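
The counting argument and the degeneracy it implies can be checked with a short Python sketch. It assumes the standard genetic code in the compact 64-letter form of NCBI translation table 1 (bases ordered T, C, A, G, "*" for stop); the variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Enumerate every triplet: 4 choices at each of 3 positions.
codons = ["".join(c) for c in product(BASES, repeat=3)]
codon_table = dict(zip(codons, AMINO_ACIDS))

# Count how many codons specify each amino acid (or stop).
degeneracy = Counter(codon_table.values())

print(len(codons))                  # 64
print(degeneracy["*"])              # 3 stop codons
print(len(degeneracy) - 1)          # 20 amino acids (excluding '*')
print(degeneracy["L"], degeneracy["M"], degeneracy["W"])  # 6 1 1

# A silent point mutation: CTT -> CTC changes the codon but not the amino acid.
print(codon_table["CTT"], codon_table["CTC"])  # L L
```

Leucine (L) has six codons, while methionine (M) and tryptophan (W) have one each, so a point mutation in a leucine codon is more likely to be silent than one in a methionine codon.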

Codons in DNA are transcribed into mRNA during transcription, and are then recognised during translation by a complementary anticodon sequence in an aminoacyl-tRNA. The specificity of the amino acid incorporated is therefore ensured by codon-anticodon base pairing. However, because there are fewer than 45 tRNA species in most organisms but 61 amino acid-coding codons (of 64 possible codons), some tRNAs must recognise more than one codon. This is made possible by non-Watson-Crick base pairing between the third base of the mRNA codon and the first base of the tRNA anticodon, called wobble base pairing. These base pairs are less thermodynamically stable than the conventional ones, but enable multiple mRNA codons to bind to the same tRNA anticodon.
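
A minimal Python sketch of transcription and codon-anticodon recognition is given below. It makes several assumptions: the DNA input is the coding strand (so the mRNA is the same sequence with U in place of T), both codon and anticodon are written 5' to 3', and wobble pairing follows Crick's classical rules, in which the 5' base of the anticodon may pair with more than one base at the 3' position of the codon (I stands for inosine). Modified bases other than inosine are ignored, and the function names are hypothetical.

```python
# Strict Watson-Crick partners in RNA.
WATSON_CRICK = {"A": "U", "U": "A", "G": "C", "C": "G"}

# Codon 3' bases that each anticodon 5' base can read (assumed: Crick's wobble rules).
WOBBLE = {"C": "G", "A": "U", "U": "AG", "G": "CU", "I": "UCA"}


def transcribe(coding_strand):
    """Return the mRNA for a DNA coding strand: same sequence with U instead of T."""
    return coding_strand.upper().replace("T", "U")


def can_read(anticodon, codon):
    """True if the tRNA anticodon can pair with the mRNA codon (both written 5'->3')."""
    a1, a2, a3 = anticodon.upper()
    c1, c2, c3 = codon.upper()
    # Pairing is antiparallel: codon base 1 faces anticodon base 3, and so on.
    strict = WATSON_CRICK.get(a3) == c1 and WATSON_CRICK.get(a2) == c2
    # Only the third codon position is allowed to wobble.
    return strict and c3 in WOBBLE.get(a1, "")


print(transcribe("TTTTTC"))                       # UUUUUC
print(can_read("GAA", "UUC"), can_read("GAA", "UUU"))  # True True
print(can_read("GAA", "UUA"))                     # False
```

In this sketch a single anticodon, GAA, reads both phenylalanine codons UUU and UUC: the C pairing is a conventional G-C pair and the U pairing is a G-U wobble pair.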

The complete genetic code is described by the table below.

[Image: GeneticCode.png (table of the genetic code)]