The present invention relates to a method for efficiently screening an enormous number of test samples. More specifically, it relates to a method for efficiently detecting the correlation between test samples in groups A and B when they have material interactions between them. For example, the method of the present invention is especially useful for genomic structural analysis.
Research on genomic structural analysis these days can be classified as "genome mapping" and "sequencing." In genome mapping, the chromosomal DNA structure is reconstructed by mapping the genome using various techniques and by aligning many fragmented genomic DNAs. In sequencing, the nucleotide sequence of genomic DNA is clarified by determining nucleotide sequences of aligned DNA fragments. The present invention is especially useful for genome mapping.
Previously, when test samples in groups A and B materially interacted, detecting their correlation required examining whether or not each test sample successively withdrawn one-by-one from group A corresponded to each test sample in group B in the interaction. Therefore, if group A consists of m test samples and group B consists of n test samples, m × n screenings were required.
An example of such a conventional screening method is the method for correlating Sequence Tagged Site (STS) markers (group A) with Bacterial Artificial Chromosome (BAC) clones (group B). STS is a concept for systematically marking the human genome (Olson, M. et al., Science 245: 1434-1435, 1989). STSs consist of short nucleotide sequences of about 200 to 300 bp and possess a sequence which cannot be found in other sites of the genome. Accordingly, the same STS contained in multiple clones indicates that these clones share common regions. By performing PCR using primers designed for STS with genomic DNA as the template, the amplified product of a length corresponding to the STS can be confirmed as a single band (S. B. Primrose, "Principles of Genome Analysis," Blackwell Science Ltd., 1995). In the combination of STS markers (group A) and BAC clones (group B), the BAC clone corresponding to an STS marker has conventionally been detected by PCR screening or hybridization screening in which the STS markers are tested one by one against the BAC clones.
Physical mapping using STS has been used in a limited field by many researchers. The region of the causative gene for cystic fibrosis covered by 30 YAC clones in chromosome 7 has been integrated into a single aligned clone of more than 1.5 Mb (Green and Olson, Science 250: 94-98, 1990). Foote et al. succeeded in aligning 196 clones covering more than 98% of the euchromatin region of the human Y chromosome (Foote et al., Science 258: 60-66, 1992). A YAC-STS integrated map, a combination of physical map with genetic map, on the long arm (q) of chromosome 21 has been prepared (Chumakov et al., Nature 359, 380-387, 1991).
These days, a combination of BAC libraries or PAC (P1-derived artificial chromosome) libraries with STS markers makes it possible to cover most of the human genome. However, a method for efficiently detecting numerous correspondences of STSs to such large-scale libraries has not been established.
In these conventional methods, the increasing number of test samples in groups A and B results in a geometrically progressive increase in the number of screenings, requiring enormous amounts of time and labor.
For example, in genome mapping in general, screening of DNA libraries usually requires numerous repetitions of filter hybridizations or a series of PCR assays prepared for each library (Asakawa et al., Gene 191: 69-79, 1997). Therefore, to align library clones covering the entire human genome, numerous combinations of DNA libraries and probes must be screened.
The utilization of DNA chips for genomic analysis is highly expected to facilitate more speedy screening. Since oligonucleotides of desired nucleotide sequences can be cumulated in high density on DNA chips, the hybridization assay can be carried out for numerous combinations by a single hybridization. In fact, the mapping of 256 varieties of STS markers against yeast cosmid clone using DNA chips has been reported (Sapolsky, R. J. et al., Genomics 33, 445-456, 1996). However, according to the conventional approach to these problems, the correlation must be examined for all probable combinations of DNAs and probes as before, even with DNA chips. The utilization of DNA chips as such thus does not provide a novel principle enabling the efficient detection of correlation for numerous combinations.
The present inventors considered that the utilization of mixed test samples might reduce the work for detecting the interaction between test samples. Naturally, the random mixing of test samples is not useful for the final clarification of correlation based on their interactions. The present inventors have found that, by systematically mixing test samples in group A based on binary notation and identifying the interactions between these mixtures and test samples in group B, the correlation between the groups can be identified more efficiently than by prior methods, thereby accomplishing this invention.
An objective of this invention is to efficiently detect the correlation by the following method, when test samples (Ai) in group A correlate to test samples (Bj) in group B based on material interactions. In the following, the "k" in the phrase "the k-th bit" is the position of a bit counted from the right of the figure. For example, the k-th bit of "110" where k equals 1 is "0".
Namely, the present invention relates to the following method:
[1] a method for determining a combination of test samples out of those constituting groups A and B which correlate physically, chemically or biologically, wherein said method comprises the following steps:
(1) providing m ($2^{n-1} \le m \le 2^{n}-1$, where m and n are natural numbers; $m \ge 3$, $n \ge 2$) test samples Ai ($3 \le i \le m$) in group A and x (x is a natural number) test samples Bj ($1 \le j \le x$) in group B,
(2) assigning a g-bit ($n \le g$) ID number based on the binary notation to each test sample Ai in group A,
(3) mixing test samples Ai in group A having "1" for the first bit of ID numbers based on binary notation to make mixture C1, and similarly mixing test samples Ai in group A having "1" for the k-th ($1 \le k \le g$) bit of ID numbers to make mixture Ck, thus obtaining g varieties of mixtures comprising mixtures from C1 through Cg,
(4) detecting the interaction of each of g varieties of mixtures from C1 through Cg with test samples Bj in group B,
(5) determining g-bit binary numbers having "1" or "0" for the k-th bit by assigning "1" when the interaction is detected between each mixture constituting mixtures from C1 through Cg and Bj in group B, and "0" when no interaction is detected, and
(6) determining the correlation between test sample Ai in group A and test sample Bj in group B by referring test sample Ai in group A to the corresponding binary number obtained in (5);
[2] the method of [1], wherein the correlation between test samples involves the interaction between test samples constituting group A and group B;
[3] the method of [2], wherein the correlation based on the interaction between test samples is in the ratio of 1:1 or 1:many;
[4] the method of [1], wherein g is n;
[5] the method of [4], wherein each test sample in group A is assigned an individual ID number up to $2^{n}-1$;
[6] the method of any one of [1] through [5], wherein said method comprises steps for detecting the interaction between mixture Ca obtained by mixing all test samples in group A and test sample Bj in group B;
[7] a method for determining a combination of test samples out of those constituting groups A and B and correlating them physically, chemically or biologically, wherein said method comprises assigning ID numbers of more than two series to one test sample in group A followed by the repetition of the method of [1];
[8] the method of any one of [1] through [7], wherein said test sample in group A is an oligonucleotide and said test sample in group B is DNA; and
[9] an STS mapping method comprising performing the method of [8], wherein said method uses STS markers as test samples in group A and genome libraries as test samples in group B.
The principle of the present invention is as follows.
(1) First, "m" ($2^{n-1} \le m \le 2^{n}-1$, where m and n are natural numbers; $m \ge 3$, $n \ge 2$) test samples Ai ($3 \le i \le m$) in group A and "x" (x is a natural number) test samples Bj ($1 \le j \le x$) in group B are provided, and
(2) each test sample Ai in group A is assigned a number based on binary notation.
An ID number is systematically assigned by converting the number based on decimal notation to binary notation as shown in Table 1 (wherein p, q, r, P, Q, and R are "0" or "1"). In order to assign each test sample an individual number based on the binary notation, $2^{n}-1$ numbers with n bits are required. In the following, the number assigned to each test sample in group A may be referred to as the ID number.
The mixtures are then prepared by mixing test samples in group A according to the following procedure (3).
(3) Test samples Ai in group A having "1" for the first bit of the ID number based on binary notation are mixed to make mixture C1. Similarly, test samples Ai in group A having "1" for the k-th bit of ID numbers are mixed to make mixture Ck ($1 \le k \le n$) so that n different mixtures from C1 through Cn are obtained.
The criteria for preparing these mixtures are shown in Table 2, wherein Cn through C1 represent mixtures, "1" indicates the addition of each test sample in group A to each mixture, and "0" indicates no addition.
By utilizing mixtures thus obtained and performing procedures (4) to (5) below, the binary numbers by which test samples in group A are specified are determined.
(4) With each of these n varieties of mixtures from C1 through Cn, the interaction of test samples Bj in group B is detected, and
(5) With each mixture Ck constituting mixtures from C1 through Cn, Ck is assigned "1" when the interaction of test sample Bj in group B is detected, and "0" when the interaction is not detected, thus determining an n-bit binary number having "0" or "1" for the k-th bit.
Procedures for determining the binary numbers are summarized in Table 3. In this table, the presence of interaction of mixtures Cn through C1 with each of the test samples constituting group B is indicated as "1," and the absence, as "0." Based on this table, the k-th bit represents the detection result of the interaction of each test sample mixture Ck in group A with a specific test sample in group B.
Finally, procedure (6) determines the test sample in group A corresponding to a specific test sample in group B.
(6) The correlation between test sample Ai in group A and test sample Bj in group B is determined by referring to the test sample Ai in group A which corresponds to the binary number obtained in (5).
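Procedures (1) through (6) can be illustrated with a minimal sketch. The function names, the sample labels "A1" to "A6", and the use of a set of truly interacting samples to stand in for the laboratory detection step are all assumptions made for illustration and do not come from the specification.

```python
def build_mixtures(ids, n_bits):
    """Procedure (3): mixture Ck holds every group-A sample whose ID has "1" at bit k.

    ids maps a (hypothetical) sample label to its integer ID number.
    The returned list is ordered C1, C2, ..., Cn.
    """
    return [
        {name for name, id_num in ids.items() if (id_num >> (k - 1)) & 1}
        for k in range(1, n_bits + 1)
    ]


def decode(interacting, mixtures, ids):
    """Procedures (4)-(6) for one group-B sample.

    `interacting` simulates the assay: it is the set of group-A samples that
    really interact with this group-B sample. A mixture gives a signal when it
    contains at least one of them.
    """
    number = 0
    for k, mixture in enumerate(mixtures, start=1):
        if mixture & interacting:          # interaction detected with Ck
            number |= 1 << (k - 1)         # write "1" into the k-th bit
    by_id = {id_num: name for name, id_num in ids.items()}
    return number, by_id.get(number)       # None if no sample has this ID


# Six group-A samples numbered 1..6 need three bits, hence three mixtures.
ids = {f"A{i}": i for i in range(1, 7)}
mixtures = build_mixtures(ids, 3)
print(decode({"A3"}, mixtures, ids))       # (3, 'A3'), i.e. binary 011
```

Under these assumptions a group-B sample that interacts with nothing decodes to 0, which is why the ID number 0 is left unassigned in the general case.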
In the present invention, the physical, chemical or biological correlation between test samples constituting groups A and B means the relationship by which test samples in both groups interact as mediated by the physical, chemical or biological reaction. Preferably, this reaction is specific so that it can be detected only between specific test samples. Also, the correlation in interactions between test samples in group A and those in group B means that test samples in group A and those in group B are somehow correlated so that their relationship can be clarified by detecting their interaction. For example, interactions between test samples can be represented by the binding reaction based on the specific affinity. More specifically, the binding reaction is exemplified by hybridization of nucleic acid, antigen-antibody reaction, various ligand-receptor reactions, or enzyme-substrate reactions.
In addition, the interaction can be not only the binding reaction between materials but also the functional combination associated with the signal transduction. The functional combination can be exemplified by the combination triggering the transcriptional initiation or signal transduction associated with the binding of the transcriptional regulatory factor to the transcriptional regulatory region, or the binding of the agonist compound to the membrane receptor. In the present invention, the correlation between test samples in both groups is not limited to 1:1, and can be 1:many as shown in FIG. 1. Preferably, in this invention, the correlation of test samples in group A with those in group B is as close to 1:many or 1:1 as possible. However, the correlation can be accurately detected even in a many:1 relation, for example according to the method described below utilizing this invention.
The method of the present invention can be applied to any combination if the interaction between test samples (Ai) in group A and test samples (Bj) in group B can be detected. However, since it is necessary to use mixtures of test samples in group A, the failure to detect the interaction with test samples in group B owing to mixing of test samples belonging to group A should be avoided. The present invention can be utilized when the correlation between two groups must be detected. More preferably, the correlation can be detected based on the interaction between materials. This invention is especially useful for assaying numerous test samples such as in genomic analysis, high throughput screening, combinatorial chemistry, etc.
For example, group A can be oligonucleotide markers (such as STS markers, VNTR, RFLP, or microsatellite), and group B, DNA library clones (genomic library clones such as BAC, PAC, P1, YAC, cosmid vectors, etc.). This invention may also be used for the binding assay for a gene using numerous transcriptional regulatory factors and analogues thereof, proteins and binding proteins, antigens and antibodies, enzymes and substrates, etc. Furthermore, this invention can be used for mapping cDNA and EST to the genome. In particular, genome mapping, wherein numerous repetitions of screening are required and the desirable correlation between test samples in two groups (that is, 1:1, or 1:many) can be expected, is a useful field of application of the invention. The present inventors have designated the method of using this invention in genome analyses as "digital hybridization screening."
When there are three or more test samples (m) in group A, the use of this invention can reduce the number of screenings as compared with identifying the interactions with all combinations of test samples. For example, when m is three, the number of screenings (n) becomes two; when m is seven, n becomes three; and when m is 123, n becomes seven ($2^{n-1} \le m \le 2^{n}-1$; m and n are natural numbers, $m \ge 3$, $n \ge 2$). According to the present invention, seven repeated confirmations of interaction will thus give the same results as by confirming interactions with each of the 123 test samples. In this invention, screening efficiency increases with the number of test samples in group A.
In addition, if it is guaranteed in advance that every test sample constituting group B corresponds to one of the test samples in group A, there is theoretically an exceptional relation between m and n. In such a situation, the all-negative value (e.g., 00000000) can also be used as an ID number. Therefore, m and n are related as $2^{n-1}+1 \le m \le 2^{n}$, where m and n are natural numbers and $m \ge 3$, $n \ge 2$.
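The two relations between m and n stated above can be checked numerically. The following is a small sketch; the function names and the flag `every_b_matches` are hypothetical, and the figure of 1000 group-B samples is chosen only for illustration.

```python
def mixtures_needed(m, every_b_matches=False):
    """Smallest n satisfying the relation between m and n.

    General case:       2**(n-1)     <= m <= 2**n - 1  (ID 0 is not assigned)
    Exceptional case:   2**(n-1) + 1 <= m <= 2**n      (ID 0 may be assigned)
    """
    if m < 3:
        raise ValueError("the method assumes m >= 3")
    return (m - 1).bit_length() if every_b_matches else m.bit_length()


def assay_counts(m, x):
    """(one-by-one assays, assays with mixtures) for m group-A and x group-B samples."""
    return m * x, mixtures_needed(m) * x


for m in (3, 7, 123):
    print(m, mixtures_needed(m))   # 3 2, then 7 3, then 123 7

print(assay_counts(123, 1000))     # (123000, 7000)
```

With four test samples, for example, the general relation gives n = 3, while the exceptional relation gives n = 2 because the value 00 becomes usable as an ID number.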
However, in some cases, having too many test samples in group A will decrease the sensitivity for detecting the correlation between test samples. The decreased sensitivity can be avoided by performing the present invention after dividing test samples in group A into appropriate subgroups. In some other cases, an increasing number of test samples in group A will increase the possibility that the correlation becomes "many:1." An increase in "many:1" correlations may reduce the accuracy of detecting correlation. A high possibility of a "many:1" correlation may occur when DNA markers that are assumed to be very closely localized are used as the test samples in group A and the test sample in group B is a DNA library such as BAC clones. In such a case, the DNA markers constituting group A are first collected in one group. The present invention is then applied with the group thus formed as one of the test samples in group A. Collecting test samples into one group means assigning the same ID number to different test samples in group A. BAC clones which are clearly not correlated should be selected out at this stage and further subjected to a secondary screening by an appropriate method. Alternatively, for analogues of chemical compounds, screening should be performed with a group of analogues as one test sample, and the analogues then subjected to a secondary screening by a suitable method. The secondary screening can be performed by individually identifying the mutual relations between test samples. For example, in order to detect the mutual relations between BAC clones and STS by secondary screening, hybridization using individual probes or PCR using STS primers with the BAC clone as the template is performed to determine the correlation.
There are, however, no particular limitations in the number of test samples (x) in group B; the efficiency increases with increasing the number (x).
The number assigned to each test sample Ai in group A can be made an ID specific to the sample by selecting a suitable number up to $2^{n}-1$. In general, the ID number should be unique to each test sample. However, as described above, it is also possible to collectively give the same ID number to different test samples as one group. When there are close to $2^{n}-1$ test samples, the ID numbers may preferably be assigned randomly instead of sequentially from 1 so that numbers are properly distributed among test samples. Also, the difference in the number of test samples Ai in group A contained in each mixture Ck may preferably be minimized. For example, if Ai comprises 64 test samples, only A64 (1000000) has "1" for the seventh bit of the ID number when numbers are assigned sequentially from 1. Therefore, each mixture from C1 to C6 comprises 32 test samples, whereas mixture C7 contains only one test sample, A64 (1000000). This situation does not substantially affect the digital hybridization screening. However, equalizing the numbers of test samples comprising each mixture Ck can be expected to standardize the labeling and to make the background level uniform. More specifically, it is possible to make the difference in the number of test samples comprising each mixture Ck one or less. Even in the case of the 64 test samples above, the difference in the number of test samples in each mixture can be kept within one by assigning successive numbers from the 32nd to the 95th based on decimal notation (from 0100000 to 1011111 in binary notation). With other numberings, it is also possible to distribute 64 test samples among mixtures so that the difference in the number of test samples in each mixture is one or less.
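The effect of the numbering on mixture sizes in the 64-sample example can be verified by counting, for each bit, how many ID numbers have "1" at that bit. This is only an illustrative sketch; the function name `mixture_sizes` is an assumption.

```python
def mixture_sizes(id_numbers, n_bits):
    """Number of test samples in each mixture C1..Cn for the given ID numbers."""
    return [
        sum((id_num >> (k - 1)) & 1 for id_num in id_numbers)
        for k in range(1, n_bits + 1)
    ]


# 64 samples numbered sequentially from 1: only ID 64 (1000000) enters C7.
sequential = mixture_sizes(range(1, 65), 7)
print(sequential)   # [32, 32, 32, 32, 32, 32, 1]

# 64 samples numbered 32..95 (0100000 to 1011111): every mixture holds 32 samples.
shifted = mixture_sizes(range(32, 96), 7)
print(shifted)      # [32, 32, 32, 32, 32, 32, 32]
```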
In the present invention, test sample Ai in group A is assigned an ID number based on the binary notation and used to prepare mixture Ck. By using the binary notation, the results of interaction detection can be directly correlated with the numeral for each bit. In binary notation, a number is generally written with "1" and "0". However, in the present invention, a binary number can be expressed with symbols other than "1" and "0", since "1" or "0" indicates the presence or absence of a bit in binary notation and does not limit the invention to using only "1" or "0."
In preparing mixture Ck ($1 \le k \le n$), which comprises test samples Ai in group A, the amount of each test sample Ai in each mixture is not particularly limited. In some cases, for example, mixing the test samples Ai in an equal amount or an equimolar amount produces a homogeneous rate of interaction with test samples in group B, facilitating the interpretation of detection results. If there is a difference in the labeling efficiency among the respective test samples Ai, or in the rate of interaction between groups A and B owing to the combination, it is possible to equalize the rate of interaction by modulating the ratio of the test samples Ai. For example, the present invention can use a mixture of probes comprising DNA radioisotope-labeled by two different methods, the kination method and the random oligomer elongation method. In such a case, since the intensity of signals obtained from each probe is expected to be different, an equalized signal intensity cannot be achieved by mixing equal amounts. Therefore, the signal intensity can be equalized by first measuring the labeling efficiency and signal intensity of each probe and, based on the result, adjusting the mixing ratio of probes.
In the present invention, drugs and solvents which do not adversely affect screening may be added to mixture Ck, in addition to test samples Ai in group A. If a test sample in group A may adversely affect the screening by interaction, an agent may be added to reduce the harmful effect.
Mixture Ca comprising all test samples in group A may be used to enhance the fidelity of the present invention. If results showing the interaction are not obtained due to the reaction between Ca and test samples in group B, either there is no combination which causes an interaction between them or there is a problem in detecting the interaction.
The interactions of n varieties of mixtures from C1 through Cn with each of test samples Bj in group B can be detected by any methods for identifying their interaction. For example, interactions between oligonucleotides and DNAs can be detected using hybridization between them as the marker.
In order to describe the present invention more understandably, a hypothetical screening of 24 clones as group B performed with six DNA markers as group A is diagrammatically represented here. These six DNA marker probes were specified by assigning a three-bit ID number to each probe, for example, "001" to DNA probe 1 and "010" to DNA probe 2 (Table 4).
These DNA probes were then mixed in the combinations shown in Table 4 to prepare four probe mixtures, M1, M2, M3, and MA. In the table, "1" indicates the presence of the marker and "0" indicates its absence. Hybridization was carried out separately with each of the four probe mixtures (MA, M1, M2, and M3) against four identical clone filters (FIG. 2). The probe mixture MA, containing all six probes, detected four clones (A, B, C, and D), whereas the other probe mixtures detected these same clones in different combinations. The hybridization pattern was then examined for individual clones. For example, clone A was positive for probe mixtures M1 and M2 but negative for M3, resulting in the three-bit pattern "011". This matrix pattern indicated that clone A is correlated with a specific DNA probe, probe 3 (Table 5). Similarly, the remaining three clones were correlated with specific DNA probes.
In DNA library screening using DNA probes, screening fidelity can be enhanced as follows.
(1) Use double-offset filters or two replica filters for a single series of probe mixtures to prevent erroneous results caused by false-positive and false-negative signals. These filters should all yield the same detection results as the original filter. Accordingly, they must generate identical signal patterns for the same series of probe mixtures. Any difference in the signal detection pattern indicates that one of the results is erroneous.
(2) Adding a parity bit to each ID number will standardize the number of "1"s based on binary notation to an even number (Table 6). That is, test samples of group A having an odd number of "1"s in the ID number are collected to provide mixture Co, and signals from test samples in group B for this mixture are simply recorded. When an interaction between a test sample in group B and this mixture Co is detected, a parity bit of "1" is added to the ID number. When the test sample in group A that interacts with a test sample in group B originally has an even number of "1"s in its ID number, no interaction is detected with mixture Co, and the parity bit is therefore always 0. As a result, the number of 1's in the ID number plus parity bit is always an even number for all test samples in group B. One parity bit is added to the seven-bit figure determined as the ID number through the above-described procedures, and the resulting eight-bit figure is referred to the test samples in group A.
The "1"s in the finally decided eight-bit figure are counted. If a signal is missing or excessive for any bit, the number of "1"s will become an odd number, indicating trouble. The above is a hypothetical case in which the number of "1"s constituting the ID bits and parity bit is standardized to an even number. Obviously, the number of "1"s can also be standardized to an odd number. When standardized to an odd number, an even number of "1"s indicates trouble. Thus, the fidelity of the screening method in the present invention can be enhanced by using only one additional series of filters.
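The parity check of item (2) can be sketched as follows for seven-bit ID numbers standardized to an even count. The function names and the example ID are illustrative assumptions; `observed_parity` stands for the signal recorded with mixture Co.

```python
def parity_bit(id_num):
    """Parity bit for a group-A sample: 1 if its ID has an odd number of 1's.

    Samples with parity bit 1 are exactly those pooled into mixture Co.
    """
    return bin(id_num).count("1") % 2


def passes_parity(observed_id, observed_parity):
    """True if ID bits plus parity bit contain an even number of 1's."""
    return (bin(observed_id).count("1") + observed_parity) % 2 == 0


probe_id = 0b0000111                       # three 1's, so the probe goes into Co
print(parity_bit(probe_id))                # 1

print(passes_parity(0b0000111, 1))         # True: all signals detected
print(passes_parity(0b0000101, 1))         # False: one signal is missing
print(passes_parity(0b0001111, 1))         # False: one signal is excessive
```

A single parity bit only flags an odd number of wrong bits; two simultaneous errors would pass this check, which is a general property of parity and not a statement from the specification.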
(3) Performing another screening with a reverse bit assigned to each ID of an STS will also remarkably enhance the fidelity of the screening method of the present invention (Table 7).
If the screening result is precise, ID (screening 1) plus ID (screening 2) is always 1111111. If any signals are missing, ID (screening 1) plus ID (screening 2) is always less than 1111111. If more than one STS hybridizes with a single clone, ID (screening 1) plus ID (screening 2) is always greater than 1111111. This strategy requires two screenings.
Therefore, the maximum advantage of performing screenings 1 and 2 is that a combination of probe and clone in an accurate correlation can be detected. In contrast, with the result obtained by either screening 1 or screening 2 alone, it is impossible to discriminate correct cases, incorrect cases, and cases in which more than one probe hybridizes with a single clone, resulting in mixed data in these cases. If most results of either screening are correct, two screenings are not required. However, in general, two screenings provide a significant capability to distinguish the correct result from the incorrect ones.
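The comparison of screenings 1 and 2 can be sketched by adding the two observed IDs bit by bit, without carry. This is a simplified illustration: the function name, the returned pair of flags, and the example ID numbers are assumptions, and the observed result for two probes is modeled as the bitwise OR of their IDs.

```python
def compare_screenings(id1, id2, n_bits):
    """Return (signal_missing, multiple_probes) for one clone.

    id1 is the ID read in screening 1; id2 is the ID read in screening 2,
    where each probe carries the reversed (complemented) bits.
    A per-bit sum of 0 means a signal is missing; a sum of 2 means that
    more than one probe hybridized with the clone.
    """
    sums = [((id1 >> k) & 1) + ((id2 >> k) & 1) for k in range(n_bits)]
    return (0 in sums, 2 in sums)


MASK = 0b1111111                                   # seven-bit ID numbers
probe_a, probe_b = 0b0101100, 0b0000011            # hypothetical probe IDs

# One probe, every signal detected: each bit sums to 1.
print(compare_screenings(probe_a, ~probe_a & MASK, 7))            # (False, False)

# One probe, one signal lost in screening 1.
print(compare_screenings(0b0101000, ~probe_a & MASK, 7))          # (True, False)

# Two probes on the same clone: observed IDs are the OR of both probes.
s1 = probe_a | probe_b
s2 = (~probe_a & MASK) | (~probe_b & MASK)
print(compare_screenings(s1, s2, 7))                              # (False, True)
```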
Performing two screenings enables not only enhancing the fidelity of screening but also separating multiple signals caused by the many:1 correspondence. The separation of signals will be specifically described in the following. For example, a single STS probe is assigned binary ID numbers in two different series, FORWARD ID and REVERSE ID. FORWARD ID and REVERSE ID are assigned binary numbers in an independent series, and mixtures in different combinations are prepared. The correlation with clones constituting libraries will be determined separately based on the present invention. In such an embodiment with FORWARD ID and REVERSE ID, the following natural number relation is independently established:
$2^{n-1} \le m \le 2^{n}-1$,
where m and n are natural numbers, $m \ge 3$, and $n \ge 2$.
FORWARD ID and REVERSE ID for a single STS probe are assigned so that their sum always has "1" in all bits. When one probe corresponds to a certain clone, the sum of FORWARD ID and REVERSE ID determined for that clone must thus have "1" in all bits. For example, in eight-bit IDs, the sum will become "11111111." However, when two STS probes hybridize with one particular clone, the sum of FORWARD ID and REVERSE ID derived from the hybridization contains "2" for one or more bits, such as "11221212" for an eight-bit number. The addition here does not follow binary notation but, for convenience's sake, follows the notation 1+1=2, 0+1=1, 1+0=1, and 0+0=0 for each bit, because this is more easily understood than binary notation. However, this invention is not limited to such an expression pattern. The expression pattern need only clearly indicate whether both FORWARD ID and REVERSE ID, only one of them, or neither of them are obtained for each bit. In such a case, it is possible to deduce which two of the STS probes hybridize with a clone according to the following concept. The presence of "2" in only one place, that is, the expression of "2" for only 1 bit as in "11121111," indicates the interaction with one combination of STS probes (two probes), enabling the straightforward identification of both STS probes.
Theoretically, as the number of bits containing "2" increases to 2, 3, 4, 5, 6, 7, or 8, the number of STS probes corresponding to the target increases to 4, 8, 16, 32, 64, 128, or 254, and there are thus 2, 4, 8, 16, 32, 64, or 127 different combinations of probe pairs corresponding to the target. That is, when "2" is expressed in 2, 3, 4, and 5 bits, the possible correspondence of STS probes can be narrowed to 4, 8, 16, and 32 varieties (half of the combinations). If one or two probes correspond to one particular clone, their correlation can be narrowed in this way.
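The narrowing of probe pairs described above can be sketched by enumerating every pair of FORWARD IDs compatible with the observed sums. The sketch assumes that exactly two probes hybridize, that REVERSE ID is the bitwise complement of FORWARD ID, and that no signal is missing; the function name and the example IDs are illustrative only.

```python
from itertools import product


def candidate_pairs(forward, reverse, n_bits):
    """List the pairs of FORWARD IDs consistent with the observed signals.

    forward / reverse: observed n-bit results (as integers) for one clone.
    A bit seen in both series (sum 2) is where the two probes differ; a bit
    seen only in FORWARD is 1 in both probes; only in REVERSE, 0 in both.
    """
    shared = 0
    differing = []
    for k in range(n_bits):
        f = (forward >> k) & 1
        r = (reverse >> k) & 1
        if f and r:
            differing.append(k)
        elif f:
            shared |= 1 << k
    pairs = set()
    for choice in product((0, 1), repeat=len(differing)):
        a = b = shared
        for k, bit in zip(differing, choice):
            if bit:
                a |= 1 << k        # probe a has 1 here, probe b has 0
            else:
                b |= 1 << k        # probe b has 1 here, probe a has 0
        pairs.add((min(a, b), max(a, b)))
    return sorted(pairs)


# Probes 10110100 and 10100100 on one clone give the per-bit sums "11121111".
single = candidate_pairs(0b10110100, 0b01011011, 8)
print([(format(a, "08b"), format(b, "08b")) for a, b in single])

# Two bits with "2" leave two candidate pairs, three bits leave four.
print(len(candidate_pairs(0b11, 0b11, 2)), len(candidate_pairs(0b111, 0b111, 3)))
```

The enumeration does not exclude the all-zero ID number; when that number is not assigned, the pair containing it can simply be dropped, which is consistent with the count of 127 rather than 128 pairs given above for eight bits.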
When screenings 1 and 2 are performed, the signals generated by hybridization of two probes with one particular clone can often be separated into the component signals even if the result is expressed by a single kind of signal such as an autoradiogram based on radioisotope labeling, instead of the below-described multicolor probe. First, each probe can be identified if there is one "2" as described above, even when the difference in signal intensities generated by each probe is small. In addition, each probe can be identified when there is a definite difference between the signals, even when there are two or more "2"s. For example, the sum of FORWARD ID (11111100) and REVERSE ID (11111111) becomes "22222211," indicating the hybridization of two or more probes. Here, we assume that these are strong and weak signals and can be distinguished. If we represent the strong and weak signals for the bits having 2 as the sum by S1 and W1, the above results may be expressed as FORWARD ID (S1, W1, S1, S1, S1, W1, 0, 0) and REVERSE ID (W1, S1, W1, W1, W1, S1, 1, 1). From these results, the FORWARD ID of the probe expressing the strong signal is 10111000, and the FORWARD ID of the probe expressing the weak signal is 01000100. Although three or more STS probes may be hybridized, unnecessary secondary or tertiary screenings may be omitted when the correlation can be confirmed using the probes having the ID numbers thus isolated. Furthermore, the method of using two eight-bit ID numbers is equivalent to assigning one 16-bit ID number. Thus, the fidelity of the method of the present invention can be enhanced by using additional bits beyond the minimally required bits. "Minimally required bits" means the number of bits (n) needed for assigning individual ID numbers to test samples in group A.
For this purpose, additional bits are provided by adding a desired number of bits to the minimally required bits to produce a g-bit ID number.
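The separation of overlapped signals by intensity, described above for the strong (S1) and weak (W1) signals, can be sketched as follows. This is only an illustration under stated assumptions: exactly two probes hybridize, their intensities are reliably distinguishable, and each observation is encoded as "S", "W", "1", or "0". The function name `split_by_intensity` is hypothetical.

```python
def split_by_intensity(forward, reverse):
    """Recover the FORWARD IDs of a strong-signal and a weak-signal probe.

    Each argument holds one observation per bit:
    "S" strong signal, "W" weak signal, "1" positive, "0" negative.
    Assumption: exactly two probes hybridize and differ in intensity.
    """
    strong, weak = [], []
    for f, r in zip(forward, reverse):
        if f == "S" and r == "W":
            strong.append("1")  # strong probe is in the forward mixture
            weak.append("0")    # weak probe is in the reverse mixture
        elif f == "W" and r == "S":
            strong.append("0")
            weak.append("1")
        elif f == "1" and r == "0":
            strong.append("1")  # both probes have bit 1
            weak.append("1")
        elif f == "0" and r == "1":
            strong.append("0")  # both probes have bit 0
            weak.append("0")
        else:
            raise ValueError(f"uninterpretable observation: {f!r}, {r!r}")
    return "".join(strong), "".join(weak)


# Observations from the worked example in the text.
forward = ["S", "W", "S", "S", "S", "W", "0", "0"]
reverse = ["W", "S", "W", "W", "W", "S", "1", "1"]
strong_id, weak_id = split_by_intensity(forward, reverse)
print(strong_id, weak_id)
```

For the observations of the worked example, the sketch yields 10111000 for the strong probe and 01000100 for the weak probe.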
In many cases in which two probes are assumed to correspond to one particular clone, the number of combinations of two STS probes assumed to interact can be narrowed by using additional bits. For example, if n bits are minimally required, assigning an (n+2)-bit ID number to the STS probes by using two additional bits and then performing screenings 1 and 2 can reduce the number of probe combinations to ¼ to 1/16 of that obtained when the screening is carried out with probes having an n-bit ID number. The more bits added, the greater the reduction becomes. However, the necessary number of bits should be determined considering the frequency of correspondence of two probes to one clone, since additional probe mixtures must be prepared in proportion to the additional bits.
(4) Another screening with a second series of oligonucleotide probes (such as reverse primers) will decrease the number of false-positive signals.
(5) Certain probes may hybridize with multiple clones if they contain multicopy sequences such as long and short repetitive sequences. This would interfere with determining the proper ID number. These repetitive sequences, if known, should be eliminated by performing a careful homology search in computer databases. In practice, a preliminary experiment should generally be performed to find and eliminate undesired STS probes prior to actual screening. When group A contains oligonucleotide probes having affinity to multicopy sequences such as long and short repetitive sequences, the identical reaction pattern for the same clone may be observed among different probe mixtures. Therefore, the oligonucleotides to be eliminated can be detected using this unique reaction pattern as a marker. In this invention, probes causing such a non-specific reaction are designated as bad background oligo-probes (BBO). The concepts of steps (1) through (5) can be used not only singly but also in appropriate combinations, providing various options for enhancing the fidelity of screening according to this invention.
The interacting combination of STS markers and genome library can be detected by reacting probe mixtures (group A) comprising labeled STS markers with genome libraries (group B) fixed on filters or DNA chips. Methods for fixing a genome library to filters are known. Although there are several tens of thousands to several hundreds of thousands of clones in each genome library, an efficient assay system can be constructed by using a high-density filter on which several thousand varieties of clones can be fixed. DNA chips are also useful for fixing DNA at a high density. Each clone of the genome library is separately fixed in a grid on the DNA chip and reacted with the mixture of fluorescence-labeled probes. The grid showing a positive hybridization reaction is then determined. Reacting the probe mixture with such a densely arrayed library remarkably enhances throughput as compared with the conventional method, which requires one-by-one reactions.
As described above, more accurate results are obtained in the method for determining the correlation according to the present invention when the test samples in group A correspond to those in group B in a ratio of 1:1 or 1:many. This is because, in principle, only two kinds of data, positive “1” or negative “0,” can be obtained when radioisotope labels are detected by autoradiography. When two or more probes detect one target, the correct ID number cannot be derived owing to the overlapped signals. However, the correct correlation can be found even with a many:1 correspondence when a special labeling method is used, as described below. This method will be described with reference to an example for finding the correlation between STS probes and the genome library.
For example, STS probes are given many varieties of labels that generate distinguishable signals. Such labeling is exemplified by fluorescent dyes of different fluorescence wavelengths and by pigments having different colors. When all probes are labeled so as to give the same signal, it is difficult to distinguish the correspondence if many probes correspond to a particular clone (that is, a many:1 correspondence). However, when the probes give distinguishable signals, it is possible to clarify which of the co-present probes reacts, enabling the separation of signals. For example, if two kinds of probes correspond to a single clone, five-color labeling theoretically enables the identification of the probes in 80% of the cases. In the present invention, such multilabeled probes, which enable a more accurate determination of the correlation, are designated multicolor probes.
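The 80% figure can be checked with a short enumeration. The sketch assumes that each of the two probes receives one of the available labels uniformly and independently, and that the pair is separable whenever the two labels differ; the function name `separable_fraction` is hypothetical.

```python
from fractions import Fraction
from itertools import product


def separable_fraction(n_labels):
    """Fraction of two-probe cases in which the two labels differ.

    Assumption: each probe gets one of n_labels labels, chosen uniformly
    and independently of the other probe.
    """
    labels = range(n_labels)
    pairs = list(product(labels, repeat=2))       # all ordered label pairs
    differing = sum(1 for a, b in pairs if a != b)
    return Fraction(differing, len(pairs))


for n in (2, 3, 5):
    print(n, separable_fraction(n))
```

For five labels, 20 of the 25 equally likely label pairs differ, which gives 4/5, that is, 80%.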
It is also possible to minimize the number of ID numbers by using a multicolor probe because even though the identical ID number is assigned to different test samples, they can be distinguished by the difference in labeling. Using this characteristic, it is theoretically possible to reduce the number of varieties of mixtures (the number of reactions for establishing the correspondence) in proportion to the number of varieties of labeling. For example, when the identical ID number is assigned to five different probes using five varieties of labels, the number of mixtures will be reduced to ⅕.
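Sharing one ID number among differently labeled probes can be sketched as a lookup keyed by the pair (ID number, label). The probe names, label names, and the function `assign_ids` are hypothetical and serve only as an illustration.

```python
def assign_ids(probes, labels):
    """Assign each probe an (ID number, label) pair.

    Probes are taken in order; every group of len(labels) consecutive
    probes shares one ID number and is told apart by its label.
    """
    table = {}
    for index, probe in enumerate(probes):
        id_number = index // len(labels) + 1   # ID numbers start at 1
        label = labels[index % len(labels)]
        table[(id_number, label)] = probe
    return table


probes = [f"STS{i:02d}" for i in range(1, 21)]           # 20 hypothetical probes
labels = ["red", "green", "blue", "yellow", "orange"]    # 5 hypothetical labels
table = assign_ids(probes, labels)
distinct_ids = {id_number for id_number, _ in table}
print(len(probes), len(distinct_ids))
```

In this sketch, 20 probes with five labels need only four distinct ID numbers, and each probe is still recovered unambiguously from its ID number and label.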
By using the method for determining the interacting combination of the present invention, it is possible to efficiently perform not only genome analysis but also the screening of transcriptional regulatory agents and of agonist compounds for membrane receptors. These applications will be specifically described in the following.
The present invention can be applied to screening transcriptional regulators corresponding to target genes. The present invention allows the activity of candidate compounds to be screened simultaneously against the transcriptional regulatory regions of numerous target genes rather than against a single target. Transcriptional evaluation plasmids are prepared by inserting the structure formed by replacing the coding region of each gene with a reporter gene and linking it to the transcriptional regulatory region located upstream on its 5′-side. Cells are transformed with these plasmids, and the transcriptional regulatory activity is then assayed by successively contacting the transformants with candidate compound mixtures. By transforming each transformant with many varieties of plasmid, conditions equivalent to those corresponding to the mixtures of test samples in group A can be constructed. By just screening each candidate compound using a few transformants, the correlation of candidate compounds having transcriptional regulatory activity with the transcriptional regulatory regions as their targets can be determined.
The present invention also enables the efficient screening of agonist compounds for membrane receptors with unknown functions as a part of the functional analysis of genes. The transcriptional signal at the final stage of the intracellular signal transduction is utilized for this screening. A considerable portion of the final transcriptional signals generated by membrane receptors is thought to converge on either the cAMP responsive element (CRE), the terminal element of the cAMP signal, or AP1, one of the terminal elements of the Ca signal. Cultured cells are co-transformed with a transcriptional evaluation plasmid, in which a reporter gene is linked downstream of CRE or AP1, and another plasmid expressing a membrane receptor with unknown functions. The agonist activity of candidate compounds can be determined by contacting candidate agonist compounds with this transformant and detecting the expression of the reporter activity. In this case, a process equivalent to that for preparing test sample mixtures in group A of the present invention can be achieved by transferring many membrane receptor genes to the same cell. The correlation between candidate compounds having agonist activity and their target membrane receptors can be found by just screening each candidate compound using a few transformants. In this method, the correlation between receptors and agonists can be detected even with membrane receptors having unknown functions, provided they trigger the signal transduction mediated by CRE or AP1.
Epitope analysis of monoclonal antibodies can also be performed based on the present invention. Antigen fragments are first prepared as test samples in group A. For example, in the case of proteinaceous antigens, oligopeptide libraries comprising amino acid sequences shifted by several amino acids each from the terminus are synthesized and assigned ID numbers. Epitope analysis can be performed by identifying the correlation of these libraries with monoclonal antibodies as test samples in group B. For some macromolecular antigens, the correspondence to several hundreds of oligopeptides might have to be examined. However, the application of the present invention enables the correlation to be clarified with only a few assays.
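The epitope analysis above can be sketched as follows, assuming that one mixture is prepared per bit of the ID number and that the antibody reacts with exactly one oligopeptide. The amino acid sequence, the peptide length and shift, and all function names are hypothetical illustrations rather than values taken from the invention.

```python
def overlapping_peptides(sequence, length, shift):
    """Cut a sequence into peptides shifted by a fixed number of residues."""
    return [sequence[i:i + length]
            for i in range(0, len(sequence) - length + 1, shift)]


def build_mixtures(n_peptides):
    """Return one mixture per bit; each mixture lists peptide ID numbers.

    ID numbers run from 1 to n_peptides. Mixture j contains every
    peptide whose ID number has bit j set.
    """
    n_bits = n_peptides.bit_length()
    return [[id_number for id_number in range(1, n_peptides + 1)
             if id_number >> bit & 1]
            for bit in range(n_bits)]


def decode(reactions):
    """Rebuild the ID number from the per-mixture positive/negative results."""
    return sum(1 << bit for bit, positive in enumerate(reactions) if positive)


sequence = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQAPILSRV"  # hypothetical antigen
peptides = overlapping_peptides(sequence, length=8, shift=2)
mixtures = build_mixtures(len(peptides))

bound_id = 11  # assumed: the antibody reacts only with peptide number 11
reactions = [bound_id in mixture for mixture in mixtures]
found_id = decode(reactions)
print(len(peptides), len(mixtures), found_id, peptides[found_id - 1])
```

In this sketch, 17 oligopeptides are covered by five mixtures, and the pattern of positive mixtures reads out the ID number of the reacting oligopeptide directly. By the same counting, a few hundred oligopeptides would need about nine mixtures.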