The number of biological samples that will require storage and distribution is expected to increase very rapidly as efforts to map and sequence the human genome and other genomes proceed, and as protein gene products are produced together with either polyvalent antibodies or monoclonal-antibody-producing cells for each protein. Completely sequencing the human genome with present sequencing methods requires approximately ten million DNA fragments averaging 300 bases in length. These fragments will be produced by restriction-enzyme digestion of longer inserts in lambda phage, which will in turn be produced from cosmid inserts (circa 40 kb), which will in turn be produced from yeast artificial chromosomes (approximately 400 kb). More than one genome equivalent of DNA must be considered, because overlapping fragments are required to establish fragment order. All of these samples will need to be preserved for long periods and remain readily accessible to researchers.
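The fragment figures above can be checked with simple arithmetic. The following is a minimal sketch that assumes a haploid human genome of roughly 3 x 10^9 base pairs (a value not stated in the text); the fragment, cosmid, and YAC sizes are the ones quoted above. Lambda inserts are left out because no size is given for them. Under this assumption, ten million 300-base fragments amount to about one genome equivalent, which is why the overlap needed for ordering pushes the real total higher.

```python
# Worked arithmetic for the fragment counts quoted in the text.
# ASSUMPTION: haploid human genome of about 3 x 10^9 bp (not from the text).
GENOME_BP = 3_000_000_000   # assumed genome size, base pairs
FRAGMENT_BP = 300           # average sequencing fragment (from the text)
COSMID_BP = 40_000          # cosmid insert, circa 40 kb (from the text)
YAC_BP = 400_000            # YAC insert, approximately 400 kb (from the text)
FRAGMENTS = 10_000_000      # fragment count quoted in the text


def inserts_per_genome(insert_bp: int, genome_bp: int = GENOME_BP) -> int:
    """Minimum number of non-overlapping inserts needed to span one genome
    equivalent (ceiling division, so a partial insert still counts)."""
    return -(-genome_bp // insert_bp)


def genome_equivalents(count: int, insert_bp: int, genome_bp: int = GENOME_BP) -> float:
    """Total bases carried by `count` inserts, expressed in genome equivalents."""
    return count * insert_bp / genome_bp


if __name__ == "__main__":
    print("YACs per genome:", inserts_per_genome(YAC_BP))                  # 7500
    print("cosmids per genome:", inserts_per_genome(COSMID_BP))            # 75000
    print("300-base fragments per genome:", inserts_per_genome(FRAGMENT_BP))  # 10000000
    print("coverage of ten million fragments:",
          genome_equivalents(FRAGMENTS, FRAGMENT_BP))                      # 1.0
```

Any overlap between neighbouring fragments raises the count above these no-overlap minimums at every level of the hierarchy.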
Estimated costs of storage, validation, and distribution are high and can reach $1,500 per sample. The total number of samples to be stored and distributed will itself depend on cost, since storing and distributing smaller inserts may cost more than preparing them as needed from larger inserts or artificial chromosomes. Storage, characterization, data management, and distribution costs were estimated at approximately one quarter of a billion dollars ($250,000,000) at the Aug. 6, 1987 meeting on "The Cost of Human Genome Projects" sponsored by the Office of Technology Assessment of the U.S. Congress. Estimates of the total number of samples that must be stored for the human genome sequencing project alone run as high as ten million.
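To give a sense of scale, the sketch below combines the quoted figures. The text does not say that the $1,500 per-sample ceiling and the $250,000,000 total refer to the same set of samples, so pairing each with the ten-million sample count is an assumption made only for illustration. The `worth_storing` rule is likewise hypothetical: it restates the trade-off described above, keeping a sample only when storing and distributing it costs less than re-preparing it on each request.

```python
# Illustrative cost arithmetic built from figures quoted in the text.
# ASSUMPTION: both cost figures are applied to the same ten million samples,
# which the text does not state.
PER_SAMPLE_CEILING_USD = 1_500      # upper per-sample estimate (from the text)
OTA_TOTAL_USD = 250_000_000         # 1987 OTA meeting estimate (from the text)
SAMPLE_COUNT = 10_000_000           # upper sample-count estimate (from the text)


def total_cost(per_sample_usd: float, samples: int) -> float:
    """Total cost if every sample incurs the same per-sample cost."""
    return per_sample_usd * samples


def implied_per_sample(total_usd: float, samples: int) -> float:
    """Per-sample cost implied by spreading a total over all samples."""
    return total_usd / samples


def worth_storing(store_usd: float, reprepare_usd: float, expected_requests: int) -> bool:
    """Hypothetical rule: store a sample only if its lifetime storage and
    distribution cost is below the cost of re-preparing it for every request."""
    return store_usd < reprepare_usd * expected_requests


if __name__ == "__main__":
    print(total_cost(PER_SAMPLE_CEILING_USD, SAMPLE_COUNT))   # 15000000000
    print(implied_per_sample(OTA_TOTAL_USD, SAMPLE_COUNT))    # 25.0
```

The two results differ by a factor of sixty, which suggests why the total number of samples worth storing is so sensitive to per-sample cost.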
With conventional storage methods and equipment, many useful samples may not be stored at all, because re-isolating them may be cheaper than storing them. If storage and distribution costs can be markedly reduced, the burden of repeatedly re-isolating fragments that have already been isolated may also be reduced. The larger the number of samples that can be efficiently stored and retrieved, the less effort is required of individual investigators.
Since the purpose of any genome sequencing project is to provide both experimental material and data (the sequence and the gene) to individual investigators, it is important to provide as complete a library of samples as possible at the lowest possible cost. Present technology, developed over approximately thirty years at the American Type Culture Collection, includes storage of cells at liquid nitrogen temperature (-196 °C) and storage of lyophilized cells at higher temperatures, often -70 °C (-94 °F). Storage has traditionally been in heat-sealed glass vials. This brings attendant problems: keeping labels attached; including as much data as possible on each label; maintaining records of sample location (within a large farm of liquid nitrogen tanks and mechanical refrigerators) and of sample origin, classification, and descriptive text; and keeping records of distribution and use experience.
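The record-keeping burden listed above can be made concrete with a minimal sketch of the information each stored sample carries. The class and field names below (`SampleRecord`, `Distribution`, `record_distribution`, and the location string format) are hypothetical and chosen only to mirror the items named in the text: location, origin, classification, descriptive text, and distribution and use history. The small `c_to_f` helper simply confirms the -70 °C / -94 °F equivalence quoted above.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Distribution:
    """One shipment of a sample (hypothetical structure)."""
    recipient: str
    date: str          # ISO 8601 date string, e.g. "1988-01-15"
    notes: str = ""    # use experience reported back by the recipient


@dataclass
class SampleRecord:
    """Hypothetical per-sample record covering the items the text lists."""
    sample_id: str
    location: str          # e.g. "tank 3 / rack 2 / box 7 / position 14"
    storage_temp_c: float  # e.g. -196.0 for liquid nitrogen
    origin: str
    classification: str
    description: str
    distributions: List[Distribution] = field(default_factory=list)

    def record_distribution(self, recipient: str, date: str, notes: str = "") -> None:
        """Append a distribution event to this sample's history."""
        self.distributions.append(Distribution(recipient, date, notes))


def c_to_f(celsius: float) -> float:
    """Convert degrees Celsius to degrees Fahrenheit."""
    return celsius * 9 / 5 + 32
```

For example, `c_to_f(-70)` returns `-94.0`. Keeping the distribution history on the record itself, rather than only on a physical label, sidesteps the limited space on a vial label.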