Oligonucleotides are short nucleotide sequences that correspond to part of the sequence of a DNA molecule. DNA is a polynucleotide: a polymer built of nucleotide units, each comprising a phosphate group, a sugar (deoxyribose), and a base (adenine, guanine, cytosine or thymine). DNA molecules form a double helix in which the two strands are held together by hydrogen bonds between specific base pairs: adenine always pairs with thymine, and guanine with cytosine. Thus the sequence of one strand of the helix determines the sequence of the other:
5' . . . ATGAAATCTGTACATGGT . . . 3'
3' . . . TACTTTAGACATGTACCA . . . 5'

L represents a linker group containing at least one carbon atom which is capable of bonding to Ar and Y; and
Y represents a specifically acid labile group.
a) a chamber, for example a column, filled with a suitable substrate, such as reverse phase silica gel or polystyrene, and
b) a kit for grafting a protective group as defined above on a molecule.
protecting at least one group in at least one compound in a mixture of compounds to be separated with a group as defined above, and
passing the mixture of compounds through a chamber filled with a reverse phase silica or polystyrene material.
The sequence of a particular piece of DNA might represent part of a gene which 'encodes', and thus directs the production of, a particular protein. Protein production is mediated via an RNA copy of this DNA (with uracil in place of thymine), where the order of the bases defines the order in which the amino acids are joined together to form the protein. Because there are 20 amino acids but only four bases, it follows that a group of bases is needed to code for one amino acid. These groups, called 'codons', each comprise three bases. Thus methionine is coded for by ATG, histidine by CAT, lysine by AAA, proline by CCT, etc., and the DNA sequence 5'-ATGAAACCTCATAAA-3' codes for the amino acid sequence Met-Lys-Pro-His-Lys. Altering the DNA sequence by substituting one base for another can alter the encoded amino acid sequence. For example, if the CCT codon in the DNA sequence 5'-ATGAAACCTCATAAA-3' is changed to CAT (a single nucleotide C to A substitution), the resulting amino acid sequence changes from Met-Lys-Pro-His-Lys to Met-Lys-His-His-Lys. This ability to change, or 'mutate', the nucleotide sequence of a gene or DNA fragment is the basis of the technique known as site-directed mutagenesis. A known DNA sequence is usually mutated with the aid of a synthetic oligonucleotide, and the technique is therefore often called oligonucleotide-directed mutagenesis.
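The base-pairing rule, the codon reading and the single-base substitution described above can be illustrated with a minimal sketch. The function names (complement, translate, substitute) are hypothetical and chosen only for illustration, and the codon table is deliberately limited to the four codons named in the text rather than the full genetic code.

```python
# Base-pairing rule from the text: A pairs with T, G pairs with C.
PAIRS = {"A": "T", "T": "A", "G": "C", "C": "G"}

# Assumption: only the four codons named in the text are included,
# so any other codon raises a KeyError.
CODONS = {"ATG": "Met", "CAT": "His", "AAA": "Lys", "CCT": "Pro"}


def complement(strand):
    """Return the base-paired partner, written in the same left-to-right order."""
    return "".join(PAIRS[base] for base in strand)


def translate(dna):
    """Read the sequence three bases at a time and join the amino acid names."""
    codons = [dna[i:i + 3] for i in range(0, len(dna), 3)]
    return "-".join(CODONS[codon] for codon in codons)


def substitute(dna, position, base):
    """Return a copy of dna with the base at the 0-indexed position replaced."""
    return dna[:position] + base + dna[position + 1:]


original = "ATGAAACCTCATAAA"
# Change the middle base of the third codon: CCT becomes CAT.
mutant = substitute(original, 7, "A")

print(complement("ATGAAATCTGTACATGGT"))
print(translate(original))
print(translate(mutant))
```

The complement is returned in the same left-to-right order as its input, so it corresponds to the lower strand of the pair shown earlier, read from its 3' end to its 5' end.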
The critical feature of site-directed mutagenesis is that it allows pre-designed mutations to be specifically introduced into a target gene. With the structures of many important proteins having been determined at atomic resolution, particularly by X-ray crystallography but also by NMR spectroscopy, it is possible to analyze the structure and function of enzymes in great detail. The importance of one or more amino acid residues can be assessed by engineering specific modifications into the protein structure. Most commonly the mutations involve single base substitutions, but it is also possible to construct deletions and insertions in the DNA sequence, resulting in shorter or longer polypeptides.
It can therefore be seen that there is a need to be able to synthesise oligonucleotide sequences of high purity. Further information regarding site-directed mutagenesis is to be found in an article by Chapman and Reid in Chemistry in Britain, March 1993, pp. 202-204.
In organic synthesis, in particular multi-step organic synthesis, the purification of the products obtained can present more problems than the synthesis itself. This is particularly true of oligonucleotide synthesis, which systematically uses protective groups that can also play supplementary roles. One solution to this problem would be to modify the oligonucleotide, or any other molecule requiring protection, by binding it to a solid in a physically reversible manner in the course of working up and purification. To date no protective group has been disclosed as having such a property.
GB-A-2251242 (Ramage et al.) describes a protecting group for use in peptide synthesis. The protecting group is of formula Ar-L- where Ar represents a substantially plane, fused ring system containing at least 4 aromatic rings and L represents a group containing at least one carbon atom which is capable of bonding to a group to be protected, which may be an N-terminal amino group of an α-amino acid in the synthesis of peptides. The protecting groups improve the purification of crude peptides by affinity chromatography on porous graphitised carbon (PGC) HPLC columns.
The protecting groups disclosed in GB-A-2251242 are based on tetrabenzo[a,c,g,i]fluorene (Tbf). However, this is a base labile group. Oligonucleotide synthesis involves several basic treatments, e.g. phosphate deprotection, so this base labile protecting group is unstable under the conditions of oligonucleotide synthesis. It is also important that a protecting group for oligonucleotide synthesis can be removed under mild acidic conditions. The protecting group should require only simple reaction conditions for its introduction onto a nucleotide. The molecule should be regioselective, as it is important that the 3'-hydroxyl of the nucleotides remains unprotected. It is also preferable that the molecule should enable monitoring of reactions and purifications.