Identification of multiple expressed genes, of organisms or of micro-organisms can be based on the presence in their genetic material of specific sequences. Identification and quantification of expressed genes is usually performed after reverse transcribing mRNA into its corresponding cDNA and detection of said cDNA via specific capture molecules present on or attached to micro-arrays. Detection of specific organisms can be performed easily by amplifying a particular sequence of their genomic DNA and then detecting and/or identifying these amplified sequences. Quantification of the target sequences bound to their specific capture molecules allows estimation of the amount of target molecules present in the initial sample. Preferably, appropriate control means are included and the necessary corrections made to take into account the efficiency of the different steps, such as the copying or amplification of target sequences and their capture via hybridization upon a micro-array.
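The correction against a control can be illustrated with a minimal sketch. It assumes a control sequence spiked into the sample at a known amount, a signal that is linear in the amount of bound target, and a control that passes through copying, amplification and hybridization with the same efficiency as the target. The function name and its parameters are hypothetical and chosen for illustration only.

```python
def estimate_target_amount(target_signal, control_signal, control_amount):
    """Estimate the amount of target in the initial sample.

    Assumption (not from the source): the signal is linear in the amount
    of bound target, and the spiked control has the same overall
    efficiency (copying, amplification, capture) as the target.
    """
    if control_amount <= 0:
        raise ValueError("control_amount must be positive")
    if control_signal <= 0:
        raise ValueError("control_signal must be positive")
    # Signal produced per unit of control; this factor absorbs the
    # combined efficiency of all steps between sample and read-out.
    signal_per_unit = control_signal / control_amount
    return target_signal / signal_per_unit
```

For example, a target spot reading half the signal of a control spiked at 10 units would be estimated at 5 units under these assumptions.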
Micro-arrays bearing arrays of nucleotide sequences are produced mainly by two methods. The first method, described in U.S. Pat. Nos. 5,510,270 and 5,700,637, is based on photolithographic in situ synthesis of capture molecules or sequences on a solid support. Photolithographic DNA synthesis uses rapid solid phase phosphoramidite chemistry. Positional and sequential control are achieved by a combination of 5′-photoprotected phosphoramidites, which can be activated by irradiation with light, and a set of masks containing holes at appropriate positions through which light can pass. Upon irradiation, the photoprotecting groups present on the partial oligonucleotide sequences synthesized earlier in the process are removed, and the oligomers are extended by one nucleotide after the relevant monomer is added. Current coupling efficiencies impose a practical size limit of about 25 bases on these chips. Beyond this limit, incomplete products accumulate.
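The link between coupling efficiency and the size limit can be made concrete with a short calculation. The sketch below assumes that every coupling step succeeds independently with the same efficiency, so that the fraction of full-length product decays geometrically with the number of steps. The efficiency values in the loop are illustrative assumptions, not figures from the cited patents.

```python
def full_length_fraction(coupling_efficiency, n_couplings):
    """Fraction of chains that are full length after n coupling steps.

    Assumption (not from the source): each step succeeds independently
    with the same per-step efficiency.
    """
    if not 0.0 <= coupling_efficiency <= 1.0:
        raise ValueError("coupling_efficiency must lie between 0 and 1")
    if n_couplings < 0:
        raise ValueError("n_couplings must be non-negative")
    return coupling_efficiency ** n_couplings


if __name__ == "__main__":
    # Hypothetical per-step efficiencies, chosen only for illustration.
    for efficiency in (0.90, 0.95, 0.99):
        for length in (10, 25, 50):
            fraction = full_length_fraction(efficiency, length)
            print(f"efficiency={efficiency:.2f} length={length:2d} "
                  f"full-length fraction={fraction:.3f}")
```

Under these assumptions, a per-step efficiency of 95% leaves only about 28% full-length product after 25 couplings, which illustrates why incomplete products come to dominate as the sequences get longer.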
The photolithographic method results in the presence of short oligonucleotides on a support. Using this methodology, a gene or a gene sequence is identified by a series of capture sequences of the same species rather than by a unique sequence. To be able to control specific hybridization of a particular target sequence, it is necessary to perform for each sequence a control, the control sequence being identical to the initial sequence but for one base difference. The main advantage of the method is the possibility to miniaturize the thus obtained arrays or chips and to generate high-density arrays containing several thousands of capture sequences.
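A control sequence that differs from the capture sequence by a single base can be derived as in the following sketch. The choice of the central position and of the complementary base as the substitution is an assumption made for illustration; the source only requires that the control differ by one base. The function name is hypothetical.

```python
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def mismatch_control(probe, position=None):
    """Return a copy of `probe` with exactly one base changed.

    Assumptions (not from the source): the mismatch is placed at the
    central position by default, and the base is replaced by its
    complement.
    """
    probe = probe.upper()
    if not probe or any(base not in COMPLEMENT for base in probe):
        raise ValueError("probe must be a non-empty string over A, C, G, T")
    if position is None:
        position = len(probe) // 2
    if not 0 <= position < len(probe):
        raise IndexError("position is outside the probe")
    # Swap the chosen base for its complement; every other base is kept.
    return probe[:position] + COMPLEMENT[probe[position]] + probe[position + 1:]
```

Comparing the signal on the capture sequence with the signal on its one-base control gives a per-sequence check that hybridization is specific.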
The second method is based on the chemical or enzymatic synthesis of capture sequences before mechanical deposition onto known specific locations of the array substrate. With this method, there is no restriction in the size of the sequences to be spotted, deposited or attached. One major advantage of the deposition technology, compared to the in situ synthesis approach, is its great versatility. This method allows production of micro-arrays or chips for virtually any molecule of interest including but not limited to nucleic acid sequences of any length, antibodies, lipids, carbohydrates, small chemical compounds, etc. Furthermore, the synthesis of sequences can be optimized, sequences can be purified, their quality checked before use and/or their concentration adjusted before coupling to the solid surface. However, one disadvantage of the method is that the process is time-consuming since each sequence has to be handled separately before spotting on the micro-array or chip, thereby limiting the size of the arrays.
The chemistry of a hybridization-to-oligonucleotide micro-array is clearly different from that of an array constructed with long DNA capture sequences or molecules. It has been observed that long specific capture sequences give much better binding of a complementary target sequence present in a solution or sample than their corresponding short fragments. In practice long polynucleotide capture sequences are used for direct binding of long polynucleotide target molecules or sequences. In a typical gene expression experiment, the capture sequences for cDNA binding contain 50 bases or more, for example 70 bases, or may even contain 600 bases or nucleotides.
When short oligonucleotides of only 15-20 bases are used as capture sequences (see e.g. U.S. Pat. No. 5,510,270), adequate detection, identification and/or quantification of long cDNAs is possible, provided that some modifications are made to the detection protocol. The RNA is first reverse transcribed into its corresponding cDNA by using a primer carrying a transcription start site for T7 RNA polymerase. This cDNA is then retranscribed in vitro into several RNA copies, which are then cut into small pieces. These small RNA fragments are then used for hybridization on arrays bearing a series of capture sequences for each of these RNA fragments. Fragmentation is necessary to ensure sufficient access of the target RNA sequences to the very short capture sequences. Specific algorithms are required to adequately correlate the hybridization pattern of these different capture molecules with the original sequence(s) of the target DNA or mRNA.
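The relation between fragmentation and the resulting hybridization pattern can be sketched as follows. This is a simplified model under stated assumptions: sequences are written in the DNA alphabet, fragments are cut at fixed intervals (real fragmentation is not position-controlled), and a capture sequence counts as bound only when its perfect complement lies entirely within one fragment. All function names are hypothetical.

```python
_COMPLEMENT_TABLE = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Return the reverse complement of a DNA-alphabet sequence."""
    return sequence.upper().translate(_COMPLEMENT_TABLE)[::-1]


def fragment(sequence, size):
    """Cut a sequence into consecutive pieces of at most `size` bases.

    Assumption: fixed-interval cuts, used here only to keep the
    example deterministic.
    """
    if size <= 0:
        raise ValueError("size must be positive")
    return [sequence[i:i + size] for i in range(0, len(sequence), size)]


def hybridization_pattern(target, probes, fragment_size):
    """Map each probe name to True if some fragment can bind the probe.

    A probe binds when its reverse complement occurs inside a single
    fragment (perfect-match model, an assumption of this sketch).
    """
    pieces = fragment(target.upper(), fragment_size)
    return {
        name: any(reverse_complement(probe) in piece for piece in pieces)
        for name, probe in probes.items()
    }
```

In this model a capture sequence whose complement spans a cut site gives no signal, which is one reason why a series of capture sequences per target, together with an algorithm that interprets the combined pattern, is needed.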
A similar adaptation is made for the detection of double stranded DNA (dsDNA), which preferentially re-associates in solution rather than hybridizing to capture sequences present on or attached to a solid substrate. Again, the amplicons have to be transcribed into RNA using a double amplification process performed with primer(s) bearing T3 or T7 sequences, followed by transcription with an RNA polymerase. These RNAs are cut into pieces of about 40 bases before being detected on an array (see e.g. example 1 of international patent application WO97/29212). The above technique was applied therein for the identification of the Mycobacterium tuberculosis rpoB gene, using capture nucleotide sequences of less than 30 nucleotides. The described method is complicated in the sense that it does not allow direct detection of amplicons resulting from genetic amplification reactions (such as PCR), but requires another cycle of reactions and copying of target sequences, each of which introduces extra bias in the quantification of said target sequences.
Despite some disadvantages, the construction of micro-arrays via chemical synthesis and deposition of oligonucleotides or short polynucleotide sequences is useful, since it is a fast and low-cost process. In addition, the design of capture molecules or sequences can easily be adapted to the requirements of the sequences to be analyzed or discriminated.
Capture molecules do not necessarily have to consist of nucleotide sequences. Target molecules may as well be antibodies, antigens, receptors or ligands to be detected or captured by binding to their respective corresponding counterparts (antibody, antigen, ligand, receptor, ...). The above list of examples is non-exhaustive.
A customer is not always well served by standard micro-arrays, even though these may allow detection of a large number of different target molecules. The customer may want to detect target molecules that are not customarily detected, or may want to refine the level of discrimination. There is thus an ever increasing demand for customized or semi-customized micro-arrays to which the customer can add, as desired, some extra detection molecules. The basic or standard micro-array that is further customized may already contain many different capture molecules for the detection of well-defined genes.
Standard micro-arrays can be constructed and delivered to many users, who can then use them for the detection of some well-defined target genes alongside other target genes that change from one application to another or from one user to another.