Effects of library size on in vitro selection

The library

Every in vitro selection experiment begins with a large library of DNA sequences. The library is synthesized chemically, with the incorporation of each successive nucleotide in the random region randomized. Each such position is denoted N in the sequence, where N can be adenine (A), guanine (G), cytosine (C), or thymine (T). The end result is a pool of DNA sequences in which each individual molecule is unique. In fact, a selection experiment can contain as many as 10^16 different DNA molecules! This diversity matters because these unique molecules can adopt a variety of secondary and tertiary structures that are important for function.

Library Length:

A key decision must be made at the start of every in vitro selection experiment: the length of N, often referred to as the random region. At InnovoGENE Biosciences, we offer three library lengths (N30, N40, and N50). The length of the random region dictates the potential diversity of the starting library: a longer random region allows a greater number of unique sequences. The advantage of a longer region is the possibility of accessing more intricate secondary and tertiary structures. However, an extremely long random region is not necessarily the optimal choice. Long random regions are also more susceptible to forming inhibitory motifs that can interfere with isolating your sequence of interest. In addition, longer and more complex motifs are often out-competed by shorter motifs simply because they are present in fewer copies in the starting library. Even if the longer motif is superior in activity, enrichment can still be biased toward the shorter, more abundant sequence. This bias is often dubbed "the tyranny of short motifs". Conversely, a shorter random region reduces this bias but also limits the structural complexity that may be required to perform certain functions.
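To make the copy-number argument concrete, here is a rough back-of-the-envelope sketch. It assumes a pool of about 10^16 molecules (the figure cited in this article), uniformly random and independent positions, and a single contiguous motif with one exact sequence. Real functional motifs are often split across a sequence and tolerate variation, so the numbers are illustrative only. The function name is hypothetical.

```python
def expected_motif_occurrences(motif_len, region_len, pool_size=10**16):
    """Expected number of times one specific contiguous motif appears in the pool.

    Assumes every position is an independent, uniform draw from A/C/G/T.
    By linearity of expectation: pool_size * (number of windows) / 4^motif_len.
    """
    if motif_len > region_len:
        return 0.0  # the motif cannot fit inside the random region
    windows = region_len - motif_len + 1  # start positions available to the motif
    return pool_size * windows / 4 ** motif_len


# Compare a short and a long motif in an N40 library (illustrative lengths).
short = expected_motif_occurrences(10, 40)
long_ = expected_motif_occurrences(25, 40)
print(f"10-nt motif: ~{short:.1e} expected occurrences")
print(f"25-nt motif: ~{long_:.1e} expected occurrences")
print(f"abundance ratio (short/long): ~{short / long_:.1e}")
```

Under these assumptions the short motif starts out many orders of magnitude more abundant than the long one, which is the head start that the enrichment bias described above builds on.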

Sequence Sampling:

Every initial library for in vitro selection has a practical limit on sequence sampling. With only four nucleotide building blocks available, the size of sequence space is 4^N, where N is the length of the random region. Although random regions commonly range from N30 to N80, researchers have used libraries as short as N20 and longer than N200. In practice, an in vitro selection library contains at most ~10^16 unique molecules, but the 4^N sequence space can greatly outnumber the initial pool. As N increases past ~N25, the starting library becomes under-sampled. For example, the sequence space for N80 contains approximately 10^48 unique sequences, of which only a very small fraction (10^16/10^48, or about 1 in 10^32) is sampled for in vitro selection.
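The sampling arithmetic above can be checked with a short calculation. This is a minimal sketch that takes the ~10^16 pool size from the text as a fixed upper limit and assumes every molecule in the pool is distinct, so the result is an upper bound on coverage. The function names are illustrative.

```python
import math

POOL_SIZE = 10**16  # practical limit on unique molecules, as quoted in the text


def sequence_space(n):
    """Number of possible sequences for a random region of length n (4^n)."""
    return 4 ** n


def fraction_sampled(n, pool_size=POOL_SIZE):
    """Upper bound on the fraction of sequence space one pool can cover (capped at 1)."""
    return min(1.0, pool_size / sequence_space(n))


for n in (20, 25, 30, 40, 50, 80):
    exponent = math.log10(sequence_space(n))
    print(f"N{n}: 4^{n} ~ 10^{exponent:.1f}, fraction sampled <= {fraction_sampled(n):.1e}")
```

With these assumptions, the pool stops covering the full space in the mid-20s (4^27 is the first power of four that exceeds 10^16), in line with the ~N25 threshold mentioned above, and an N80 library samples less than 1 part in 10^32 of its sequence space.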