The invention relates to a method and a kit for identifying an unknown allele of a polyallelic gene.
1.1 General Introduction
Many genes exist as multiple alleles which differ from each other by small differences in sequence. It is sometimes desirable to identify an unknown allele of a polyallelic gene. For example, such identification is often necessary to match the alleles of the human leucocyte antigen (HLA) genes in a prospective donor and a prospective recipient in a tissue or organ transplant operation; if the donor and recipient have the same HLA alleles, the probability of the recipient rejecting the donor's tissue is greatly reduced.
However, it can be a difficult task to identify precisely an unknown allele of a polyallelic gene because two alleles can differ from each other by as little as one nucleotide.
The difficulties are increased in genes which have a very large number of different alleles, such as the major histocompatibility complex (MHC) genes (e.g. the HLA class I genes which have 222 known alleles).
To date the most favourable bone marrow transplant (BMT) and kidney transplant results have been obtained using sibling donors who are genotypically HLA-identical to the recipient, but such donors are available for only about 30% of patients (1-5). BMT using unrelated donors can be successful, but these transplants have higher rates of graft failure, increased incidence and severity of Graft versus Host Disease and more frequent complications related to delayed or inadequate immune reconstitution (4).
New molecular biological methods for detection of genetic polymorphism currently provide an opportunity to improve matching of unrelated donors as well as a research tool to investigate the relationship between genetic disparity and transplant complications. These molecular typing methods include sequence-specific amplification, hybridisation with oligonucleotide probes, heteroduplex analysis, single strand conformation polymorphism analysis and direct nucleotide sequencing.
Each of these molecular approaches has been used for routine HLA class II typing (6), but a variety of reasons related to the HLA class I gene structure have complicated their application to class I typing and made it relatively unsuccessful. The reasons for these complications are the extensive polymorphism of class I and the degree of sequence homology between the A, B and C loci of class I. In addition, sequence homology between class I classical and non-classical genes and the reported 12 pseudogenes can cause problems for specific locus amplifications (7).
The low occurrence of “allele specific” sequences at polymorphic sites is a feature of the HLA class I genes that has limited the resolution of all current DNA typing approaches. An “allele specific” sequence is a sequence that is only present in one allele and can therefore be used to distinguish the allele from other alleles. The occurrence on more than one exon of the specific sites for determining the allelic specificity causes additional problems in the identification of individual alleles. As a result, there is at present no single method of typing which can identify all HLA class I alleles at high resolution; see Table A below.
1.2 Sequence Specific Primer Amplification (PCR-SSP)
This method utilises both the group-specific and, when present, allele-specific sequence sites in PCR primer design. The SSP design is based on the amplification refractory mutation system (ARMS), in which a mismatch at the 3′ residue of the primer inhibits non-specific amplification (8,9).
Although each SSP reaction may not individually provide sufficient specificity to define an allele, the use of combinations of sequence specific primers allows the amplification of their common sequences to give the desired specificity.
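The way combinations of SSP reactions narrow down an allele can be illustrated with a minimal sketch. It assumes a simplified ARMS-style rule in which a primer is extended only if its 3′ terminal base matches the template at the priming site. The allele names, sequences, primers and positions are hypothetical and chosen only for illustration; real primer design also depends on annealing conditions and strand orientation, which are ignored here.

```python
from typing import Dict, List, Tuple

# Hypothetical toy alleles (not real HLA sequences).
ALLELES: Dict[str, str] = {
    "a1": "ACGTAC",
    "a2": "ACGTAT",
    "a3": "ACCTAC",
}


def primer_extends(template: str, primer: str, position: int) -> bool:
    """Simplified ARMS rule: extension occurs only if the primer's 3' base
    matches the template base at the corresponding site."""
    site = template[position:position + len(primer)]
    return len(site) == len(primer) and site[-1] == primer[-1]


def positive_alleles(alleles: Dict[str, str],
                     reactions: List[Tuple[str, int]]) -> List[str]:
    """Return the alleles that would be amplified in every listed reaction."""
    return sorted(
        name for name, seq in alleles.items()
        if all(primer_extends(seq, primer, pos) for primer, pos in reactions)
    )


reaction_1 = ("ACG", 0)  # hypothetical primer with 3' base G at template index 2
reaction_2 = ("TAC", 3)  # hypothetical primer with 3' base C at template index 5

print(positive_alleles(ALLELES, [reaction_1]))              # ['a1', 'a2']
print(positive_alleles(ALLELES, [reaction_2]))              # ['a1', 'a3']
print(positive_alleles(ALLELES, [reaction_1, reaction_2]))  # ['a1']
```

Neither reaction alone singles out one allele in this toy panel, but the two positive results taken together are consistent only with `a1`.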
However, despite its high accuracy, PCR-SSP is only in some cases more informative than serology. The reason for this is the low occurrence of allele specific sequence motifs in the exons, and this limitation has stimulated a vast amount of research into the identification of allele specific motifs even in the intron sequences (10). To date, however, this approach has not contributed considerably to the identification of more alleles.
Another limitation of this method is that it detects a limited number of polymorphic sequences which are utilised to predict the entire sequence. If an unknown allele is present in a particular sample this extrapolation may be incorrect.
In addition, the successful use of the technique relies on group specific amplification and therefore prior knowledge of broad HLA specificity is needed.
1.3 Single Strand Conformation Polymorphism (SSCP)
This technique is based on the electrophoretic mobility of single stranded nucleic acids in a non-denaturing polyacrylamide gel, which depends mainly on sequence-related conformation (11-13). The technique can be employed for isolating single alleles which could then be used for further manipulation and analysis such as direct sequencing. The pattern of bands obtained after electrophoresis may be diagnostic for an allele (14,15).
The major disadvantage of SSCP is the tendency of single-stranded DNA to adopt many conformational forms under the same electrophoretic conditions, resulting in the presence of several bands from the same product; this makes identification more difficult. In addition, there is a high degree of variation and inconsistency in the sensitivity of this method for detecting mutations or allelic variations, and there is a physical limitation on the size of the DNA fragment, which is of the order of 200-400 base pairs (16).
1.4 Denaturing Gradient Gel Electrophoresis (DGGE) and Temperature Gradient Gel Electrophoresis (TGGE) (17,18)
The underlying principle of both techniques is the difference in the degree of melting between two alleles (double stranded DNA) which results in a reduction of mobility of the DNA fragments in polyacrylamide gels containing a denaturing reagent (DGGE) or a temperature gradient (TGGE).
Both techniques have been used frequently for screening mutations in genetic systems with one or two variants. They are only rarely used for the separation of alleles in highly polymorphic systems such as HLA.
Both techniques require specific conditions for a particular system under investigation and, in addition, where two alleles share common sequence segments with low melting points they may not always be differentiated. The simultaneous melting of both alleles will produce very similar retardations.
1.5 Cloning of DNA
This is the classical method of preparation of a single sequence, i.e. the sequence derived from a single allele. A variety of constructs has been used to introduce the required DNA fragment into a plasmid and grow sufficient copies for analysis. This method yields pure samples of the analyte, but is time consuming to perform and several clones are normally tested to ascertain the homogeneity of the product.
1.6 Heteroduplex Analysis
Fully matched DNA duplexes are more stable than those with base mismatches. Instability of the duplex increases with the number of nucleotide mismatches; these cause formation of loops and bends in the linear DNA fragment which produce an increasing “drag effect” in polyacrylamide gels, retarding the affected migrating bands (18-21).
Mismatched DNA hybrids (heteroduplex) may be formed at the end of each PCR cycle between coamplified alleles from a particular locus or loci due to primer cross reaction at sites with similar sequences. During the annealing stage of each cycle of the PCR, a proportion of sense strands of each allele may anneal to anti-sense strands of different alleles. The banding pattern obtained in PAGE analysis can be useful for identifying the alleles involved in the reaction (22-24).
Heteroduplex analysis is an approach that has been utilised to compare HLA genes of a particular donor and recipient. HLA genes are amplified, denatured (melted into single strands) and mixed together under conditions that promote renaturation to form double stranded molecules. If the HLA genes of a donor and recipient are similar but not identical, heteroduplexes will form consisting of one strand of an allele of donor origin and a second strand from a different allele of recipient origin (25,26). The sensitivity of this method can be increased by adding DNA from an HLA allele that is not present in the donor or recipient.
The major advantage of heteroduplex analysis is that it is relatively easy and inexpensive. Limitations of this approach include inability to detect certain HLA disparities, potential detection of irrelevant silent mutations and lack of specific information regarding the nature of the alleles involved.
To date this approach has been used for HLA class II typing with limited success. Its application to class I typing has not been successful.
1.7 Sequence Specific Oligonucleotide Probes (PCR-SSO)
SSO typing involves amplification of HLA alleles from a particular locus followed by hybridisation with a panel of oligonucleotide probes to detect polymorphic sequences that distinguish one allele or group of alleles from all others. In polymorphic systems a one step operation may not always differentiate all the known alleles; selected primers can be used to achieve amplification of individual alleles which are then identified by specific probes. This second stage of oligotyping is often referred to as high resolution oligotyping (6).
The advantages of the PCR-SSO method are specificity, sensitivity, simplicity and reproducibility; it is also relatively inexpensive to operate and allows simultaneous processing of many samples. This approach has been applied successfully, for example to typing of HLA class II alleles.
The major methodological drawback of this approach is that the complexity of the technique is directly related to the number of alleles under investigation and the presence of two alleles in the heterozygous condition can complicate the identification process.
Published oligotyping methods could result in incorrect interpretation of data if certain combinations of recently discovered alleles are present in a specimen. It is therefore necessary to update the reagents used in the identification step.
Several typing approaches for HLA-A and B based on PCR-SSO have been published; these typically require over 40 and 90 probes respectively (27,28). The operation of these methods is time consuming and the resolution obtained is only moderate.
1.8 Nucleotide Sequencing
DNA templates for sequencing can be produced by a variety of methods, the most popular being the sequencing of cloned genomic or cDNA fragments, or the direct sequencing of DNA fragments produced solely by PCR (as in 1.2 above). These templates represent a single sequence derived from one haplotype. Alleles from both haplotypes of a heterozygous sample may be co-amplified and sequenced together using locus-specific PCR primers.
The recent availability of computer software, which allows the user to align the derived sequence against established sequence libraries, has facilitated the analysis and allele assignments for heterozygous samples in which both templates are sequenced at the same time (27). The effectiveness of this method depends on the amount and frequency of ambiguous heterozygous combinations; for example there are many HLA class II alleles that when present together in one sample cannot be differentiated by this method. The number of such ambiguous combinations of allele sequences is even greater for HLA class I alleles.
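The ambiguity described above can be illustrated with a minimal sketch. When two alleles are sequenced together, each position shows the set of bases present in either allele, and two different allele pairs can produce the same combined read. The allele names and sequences below are hypothetical and are not drawn from any HLA database.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical toy alleles of equal length.
ALLELES: Dict[str, str] = {
    "P": "ACGT",
    "Q": "ATGC",
    "R": "ACGC",
    "S": "ATGT",
}

Read = Tuple[FrozenSet[str], ...]


def combined_read(seq_a: str, seq_b: str) -> Read:
    """Model a heterozygous read: at each position, the set of bases seen."""
    return tuple(frozenset((x, y)) for x, y in zip(seq_a, seq_b))


def ambiguous_genotypes(alleles: Dict[str, str]) -> List[List[Tuple[str, str]]]:
    """Group allele pairs that give an identical combined read."""
    by_read: Dict[Read, List[Tuple[str, str]]] = {}
    for n1, n2 in combinations(sorted(alleles), 2):
        read = combined_read(alleles[n1], alleles[n2])
        by_read.setdefault(read, []).append((n1, n2))
    return [pairs for pairs in by_read.values() if len(pairs) > 1]


print(ambiguous_genotypes(ALLELES))  # [[('P', 'Q'), ('R', 'S')]]
```

In this toy set the genotypes P+Q and R+S both read as A, C/T, G, C/T, so a combined read alone cannot tell them apart; phase information (which base at position 2 is on the same strand as which base at position 4) is lost.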
To date two HLA class I typing approaches based on direct sequencing have been published. Both require serology information followed by allele specific PCR amplification and then direct sequencing (14,30). More recent practice, however, is to amplify DNA fragments without prior knowledge of the allele groups and to use locus specific PCR amplification. Theoretically these approaches should give the highest resolution, but they are beset by ambiguous sequence combinations which cannot be resolved satisfactorily and in practice these methods are expensive and difficult to perform routinely.
Genetic recombination plays a key role in the generation of HLA alleles. This is supported by pairwise comparison of the nucleotide sequences. The most closely related pairs of alleles usually differ by localised clusters of substitutions for which both sequence motifs can be found in other alleles. This pattern implicates interallelic conversion or double recombination as the diversifying mechanism (7). Although the vast majority of such events appear to involve recombination between alleles of the same locus, there are several cases that involve recombination between alleles of different loci (31).
In comparison to the many pairs of alleles that differ by localised clusters of substitutions, few pairs differ by point substitutions and of these only a handful differ by a substitution that has not been found in another allele. Thus, it appears that the rate at which point mutations create new alleles is slower than the rate at which new mutations are subsequently recombined with existing mutations (FIG. 1).
Comparison of allelic HLA class I sequences (32) reveals substitutions throughout the coding region. There is, however, a higher frequency of substitutions within exons 2 and 3 which encode the α1 and α2 domains of the HLA molecule. In comparing pairs of HLA-A, B and C alleles only 2 pairs out of a total of 6,460 possible combinations cannot be distinguished on the basis of nucleotide sequences in exons 2, 3 and 4. However, if the comparison is restricted to exons 2 and 3 this number only increases to 5 pairs of ambiguous sequences. By contrast, when comparison is restricted to either exon 2 or exon 3 alone then the number of ambiguous pairs increases significantly (Table B). This observation is relevant to the design of DNA-based methods for class I typing because it shows that for practical purposes all alleles can be discriminated on the basis of sequence analysis of exon 2 and 3. Although there is some polymorphism in exon 4 encoding the α3 domain, mostly in HLA-A alleles, incorporating the information from exon 4 into the above analysis does not significantly increase the number of pairs for which the alleles can be discriminated.
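The kind of pairwise comparison described above can be sketched as follows. The sketch counts the allele pairs that remain indistinguishable when only selected exons are compared. The allele names and exon sequences are hypothetical toy data, so the counts illustrate the trend only and do not reproduce the figures in Table B.

```python
from itertools import combinations
from typing import Dict, Iterable, List, Tuple

# Hypothetical toy alleles: exon number -> exon sequence.
ALLELES: Dict[str, Dict[int, str]] = {
    "X1": {2: "ACGT", 3: "GGAA", 4: "TTTT"},
    "X2": {2: "ACGT", 3: "GGAT", 4: "TTTT"},
    "X3": {2: "ACGA", 3: "GGAA", 4: "TTTT"},
    "X4": {2: "ACGA", 3: "GGAA", 4: "TTTC"},
}


def ambiguous_pairs(alleles: Dict[str, Dict[int, str]],
                    exons: Iterable[int]) -> List[Tuple[str, str]]:
    """Return allele pairs whose sequences are identical in all given exons."""
    exon_list = list(exons)
    return [
        (a, b)
        for a, b in combinations(sorted(alleles), 2)
        if all(alleles[a][e] == alleles[b][e] for e in exon_list)
    ]


print(len(ambiguous_pairs(ALLELES, [2, 3, 4])))  # 0
print(len(ambiguous_pairs(ALLELES, [2, 3])))     # 1
print(len(ambiguous_pairs(ALLELES, [2])))        # 2
print(len(ambiguous_pairs(ALLELES, [3])))        # 3
```

As in the text, dropping exon 4 adds few ambiguous pairs in this toy set, whereas restricting the comparison to a single exon adds more.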
In the development of PCR-based methodologies for the detection of alleles, one of the most important steps is the identification of primer sequences unique to the target gene such that the amplified fragment includes all polymorphic sites of interest and is also manageable in length. Typing of the polymorphic sites in exons 2 and 3 would facilitate the identification of all recognised alleles of HLA-A, B and C loci, with 5 exceptions, if suitable locus-specific amplification could be achieved.
Specificity of the primers should ensure the effective amplification of target gene fragments. In practice however, trace amplification of competing, cross-hybridising templates may also take place. In addition, due to the shared polymorphic sequence motifs between class I alleles of all three loci, non-specific coamplification of the DNA fragments would hinder specific identification. In practice, it would therefore be advantageous to use a method that allows the separation of the desired product from the undesirable PCR fragments.
Within exons 2 and 3 of the HLA-A, B and C genes there are only a few locus specific sites which are located primarily in the central region of each exon which would restrict the amplification to incomplete exon fragments. As discussed above, this would reduce the allele specific information necessary for the identification of all allelic variants.
The two polymorphic exons are flanked by introns 1 and 3, and separated by intron 2. Thus, the ideal location for primer sites to amplify exons 2 and 3 together as one fragment would be within introns 1 and 3.
Cereb and collaborators (33) have described primer sequences located in the first and third introns which can be used for locus-specific amplification of the entire exon 2 and 3 region of the HLA-A, B and C genes in one fragment. Their data indicated that the primers used in that study were effective in the amplification of HLA-A, B and C genes. Furthermore, the amplification was truly locus-specific, as assessed by hybridisation with locus-specific, group-specific, and allele-specific oligonucleotide probes.
The invention provides a method for identifying an unknown allele of a polyallelic gene, which method comprises
(i) contacting the unknown allele with a panel of probes, each of which recognises a sequence motif that is present in some alleles of the polyallelic gene but not in others;
(ii) observing which probes recognise the unknown allele so as to obtain a fingerprint of the unknown allele; and
(iii) comparing the fingerprint with fingerprints of known alleles.
The invention also provides a kit for identifying an unknown allele of a polyallelic gene, which kit comprises a panel of probes, each of which probes recognises a sequence motif that is present in some alleles of the polyallelic gene but not in others. (The same motifs may also occur in other loci in linked gene complexes with similar exon/intron structures.) The kit preferably also comprises a database which indicates which probes in the panel recognise each allele of the polyallelic gene.
The use of a panel of probes which each recognises a different motif allows identification of which motifs are present in the unknown allele. The alleles of the polyallelic gene (and alleles of other genes in a linked complex) each have a unique combination of motifs and so identification of this combination (or “fingerprint”) leads to identification of the unknown allele. Thus, the invention allows identification of alleles of polyallelic genes, such as the HLA class I genes, which may not contain “allele specific” sequences (i.e., individual sequences which are unique to one particular allele). The technology of the invention is referred to as Universal Recombinant Site Targeting Oligonucleotide, or URSTO.
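Steps (i) to (iii) of the method can be illustrated with a minimal sketch. It assumes, purely for illustration, that probe recognition can be modelled as an exact substring match; actual hybridisation depends on probe design and stringency. The probe names, motifs, allele names and sequences are hypothetical, and the dictionary of known alleles stands in for the database mentioned above.

```python
from typing import Dict, FrozenSet, List

# Hypothetical probe panel: probe name -> motif recognised.
PROBES: Dict[str, str] = {
    "m1": "ACGT",
    "m2": "GGCA",
    "m3": "TTAG",
    "m4": "CCGA",
}

# Hypothetical known alleles; no motif is unique to a single allele.
KNOWN_ALLELES: Dict[str, str] = {
    "allele_A": "AAACGTAAGGCAAA",      # carries m1 and m2
    "allele_B": "AAACGTAATTAGAA",      # carries m1 and m3
    "allele_C": "AAGGCAAATTAGAACCGA",  # carries m2, m3 and m4
}


def fingerprint(sequence: str, probes: Dict[str, str]) -> FrozenSet[str]:
    """Steps (i) and (ii): record which probes recognise the sequence."""
    return frozenset(name for name, motif in probes.items() if motif in sequence)


def identify(unknown: str, known: Dict[str, str],
             probes: Dict[str, str]) -> List[str]:
    """Step (iii): return known alleles whose fingerprint matches the unknown."""
    target = fingerprint(unknown, probes)
    return sorted(
        name for name, seq in known.items()
        if fingerprint(seq, probes) == target
    )


sample = "TTTACGTCCCTTAGCC"  # hypothetical unknown allele
print(sorted(fingerprint(sample, PROBES)))       # ['m1', 'm3']
print(identify(sample, KNOWN_ALLELES, PROBES))   # ['allele_B']
```

In this toy panel each of m1, m2 and m3 occurs in two alleles, so none is allele specific, yet the combination of positive and negative probes is unique to each allele.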