cDNA library

A cDNA library is a collection of cloned complementary DNA (cDNA) sequences, copied from the mRNA that is expressed in a given tissue at a given time. In other words, a cDNA library represents at least a portion of the transcriptome of that tissue. Such libraries offer a 'snapshot view' of gene expression. Sequencing individual clones taken at random from these libraries allows us to build up “expressed sequence tag” (EST) collections from a wide range of organisms and tissue types. cDNA libraries should be contrasted with genomic libraries, where the entire genome, including non-coding regions, is represented in a collection of clones.


Genes in eukaryotic organisms usually contain introns, regions that do not code for protein. This means that it is often very difficult to identify and clone the entire coding sequence (i.e. only the exons) of a gene, with any certainty, from a new genome. Most genomic cloning programmes therefore also involve the direct cloning of protein-coding sequences, by making complementary DNA (cDNA) copies of messenger RNA, from which the introns have already been removed by splicing, in order to generate a cDNA library. This is done using the enzyme reverse transcriptase, which synthesises DNA using RNA as a template.

Eukaryotic mRNA is relatively easy to isolate and copy into a DNA sequence because it has a poly-A tail at its 3' end. The tail can be annealed to an oligo-dT sequence in the lab (for example on a purification column, where RNA without a poly-A tail washes through and the bound mRNA is eluted afterwards), and an annealed oligo-dT primer can then be extended by reverse transcriptase to make a single-stranded cDNA copy of the messenger sequence. The mRNA template is removed from the nascent cDNA by alkali treatment (which degrades RNA but not DNA) and a poly-G tail is attached to the 3' end of the cDNA. A poly-C primer can then be hybridised to this tail and, using DNA polymerase, the single-stranded cDNA can be converted into a double-stranded form. The ds-cDNA is then methylated (to protect internal restriction sites from being cut), attached to EcoRI linkers, and ligated into a suitable vector for cloning; for cDNA libraries, the usual vector of choice is a plasmid or, for larger coding sequences, a bacteriophage.
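
The base-pairing logic behind these steps can be sketched in a few lines of Python. This is only an illustration, not a laboratory protocol: the function names and the min_tail parameter are invented for the example, all sequences are written 5' to 3', and the poly-G tailing step is left out, so the second strand is simply the full complement of the first.

```python
# Illustrative model of cDNA synthesis; names and parameters are hypothetical.
RNA_TO_DNA = {"A": "T", "U": "A", "G": "C", "C": "G"}
DNA_TO_DNA = {"A": "T", "T": "A", "G": "C", "C": "G"}


def first_strand_cdna(mrna, min_tail=10):
    """Return the first-strand cDNA (5'->3') copied from an mRNA (5'->3').

    The mRNA must end in a poly-A tail of at least `min_tail` bases,
    otherwise the oligo-dT primer has nothing to anneal to.
    """
    mrna = mrna.upper()
    if not mrna.endswith("A" * min_tail):
        raise ValueError("no poly-A tail for the oligo-dT primer to anneal to")
    # Reverse transcriptase reads the template 3'->5', so the product is
    # the reverse complement of the mRNA, written in DNA bases.
    return "".join(RNA_TO_DNA[base] for base in reversed(mrna))


def second_strand(cdna):
    """Return the strand complementary to a single-stranded cDNA (5'->3')."""
    return "".join(DNA_TO_DNA[base] for base in reversed(cdna))


if __name__ == "__main__":
    message = "AUGGCC" + "A" * 12      # toy mRNA with a poly-A tail
    first = first_strand_cdna(message)
    print(first)                       # begins with the oligo-dT primer region
    print(second_strand(first))        # same sequence as the mRNA, with T for U
```

Note that the second strand has the same sequence as the original message (with T in place of U), which is why a cDNA clone can be read directly as the coding sequence.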

The relative representation of a cDNA in a given library reflects the relative abundance of the corresponding mRNA in the source tissue. As such, isolating a cDNA that corresponds to an mRNA rarely expressed in a given tissue type requires a much larger library than isolating one that corresponds to an abundantly expressed mRNA.
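
How much larger can be estimated with a commonly used approximation: if a given mRNA makes up a fraction f of the message population, the number of clones N needed to include it at least once with probability P is N = ln(1 - P) / ln(1 - f). The sketch below assumes that every clone is an independent random draw from the mRNA pool, and ignores cloning bias and incomplete cDNAs; the function name and the example abundances are illustrative only.

```python
import math


def clones_needed(abundance, probability=0.99):
    """Estimate the number of clones needed to include a given cDNA.

    abundance   -- fraction of the mRNA pool made up by the message (0 < f < 1)
    probability -- desired chance of finding at least one such clone (0 < P < 1)
    """
    if not 0 < abundance < 1:
        raise ValueError("abundance must be between 0 and 1")
    if not 0 < probability < 1:
        raise ValueError("probability must be between 0 and 1")
    # Chance of missing the message in N independent clones is (1 - f) ** N;
    # solve (1 - f) ** N = 1 - P for N and round up to a whole clone.
    return math.ceil(math.log(1 - probability) / math.log(1 - abundance))


if __name__ == "__main__":
    # Hypothetical abundances, chosen only to show the scaling.
    for label, f in [("abundant, 1 in 100", 1e-2), ("rare, 1 in 100,000", 1e-5)]:
        print(label, clones_needed(f))
```

Under these assumptions the required library size grows roughly in inverse proportion to the abundance of the message.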

cDNA enrichment:

It is possible to enrich a cDNA library to contain more of an mRNA sequence of interest by selecting mRNA from a particular tissue type where it is likely to be abundantly expressed (e.g. insulin mRNA in pancreatic beta cells).

One procedure involves hybridising all the cDNA from the desired tissue type with all of the mRNA from a tissue type which expresses the desired mRNA only in low concentrations. cDNA that is abundant in the first tissue but not in the second finds no partner among the mRNA of the low-expressing tissue and stays single-stranded, whereas cDNA from genes expressed in both tissues forms hybrids; the hybrids can thus be discarded to isolate the desired cDNA. Hybrids can be separated from non-hybrids by passing the mixture over a hydroxyapatite column, which binds hybridised cDNA:mRNA duplexes. Unhybridised cDNA is eluted and can be made double-stranded and ligated into a vector in the normal way.
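
In outline, this subtraction behaves like a set difference. The sketch below is a deliberately simplified model: each cDNA and mRNA is reduced to a gene name, hybridisation is treated as all-or-nothing, and the gene lists are made up for the example. Real hybridisation depends on concentration and time, so messages present at low levels in the second tissue are only partly removed.

```python
def subtract(target_cdna, driver_mrna):
    """Split target-tissue cDNA into eluted (unhybridised) and bound (hybrid).

    target_cdna -- gene names of cDNA made from the tissue of interest
    driver_mrna -- gene names of mRNA from the tissue used for subtraction
    """
    driver = set(driver_mrna)
    # cDNA with a matching mRNA forms a duplex and is retained on the column.
    bound = [gene for gene in target_cdna if gene in driver]
    # cDNA with no partner stays single-stranded and is eluted.
    eluted = [gene for gene in target_cdna if gene not in driver]
    return eluted, bound


if __name__ == "__main__":
    # Hypothetical expression lists for two tissues.
    beta_cell_cdna = ["insulin", "actin", "GAPDH"]
    other_tissue_mrna = ["actin", "GAPDH", "albumin"]
    eluted, bound = subtract(beta_cell_cdna, other_tissue_mrna)
    print("eluted:", eluted)
    print("bound:", bound)
```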