In forensic science, short tandem repeat (STR) analysis is frequently used to profile an individual and/or a sample from a location or an item, with a view to matching that profile to another or to determining a non-match so as to eliminate a link. The Applied Biosystems AMPFISTR SGM PLUS DNA profiling system, for instance, is a single multiplex reaction used to PCR amplify ten STRs and amelogenin (for gender determination) using fluorescently labelled primers. The discriminating power of such multiplexes in forensic applications is such that the chance of two unrelated people having the same profile is approximately 10^-13.
The analysis involves collection and preparation of the sample, PCR amplification using the multiplex, and separation of the products according to their size using one of a number of techniques. Separation using capillary gel array electrophoresis, CE, is becoming increasingly popular because it can analyse a large number of samples in a short time period, removes the need to produce the gel manually, and is suited to automation.
The multiplexes used are designed so that the alleles of different loci which are labelled with the same colour dye do not overlap with one another, so that each STR can be determined by its position and colour. The fragment sizes in the above example range from 100 to 360 bases and four different coloured dyes are used.
As well as the dyes associated with the multiplex products, a red labelled standard size marker having bases ranging from 50 to 400 is provided. This is formed of concatamers with constant ACTG proportions, and is run in the same lanes/capillaries as the samples being considered.
Comparison of the position of the sample products, Q samples, with the standard size marker forms the first part of the sizing process. The second part involves comparison with an allelic ladder formed of dye labelled known size and sequence fragments. The limited number of suitable dyes means that the allelic ladder has to be run in a separate lane, or in the case of CE, separate capillary, to a Q sample.
Theory of Size Determination
Both the allelic ladder markers and unknown or questioned (Q) alleles are sized relative to an internal set of DNA markers such as HD-400 ROX standards. The size of an allele (in bases) is always estimated relative to sequenced standards that comprise control allelic ladder markers.
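As an illustration of sizing against an internal standard, the following is a minimal sketch of the local method of Elder and Southern, which is cited in rule a) later in this section. It assumes the commonly described form of that method: a reciprocal curve L = c/(m - m0) + L0 is fitted through three neighbouring standards, once using two standards below and one above the unknown and once using one below and two above, and the two size estimates are averaged. The function names, the generic "mobility" value m (for CE this might be a scan number or migration time) and the data layout are assumptions made for illustration and are not taken from the described system.

```python
import bisect


def fit_reciprocal(points):
    """Fit L = c / (m - m0) + L0 exactly through three (m, L) points.

    Rearranged, the curve is L*m = m0*L + L0*m + k, which is linear in
    m0, L0 and k. Subtracting the first point's equation from the other
    two leaves a 2x2 linear system in m0 and L0.
    """
    (m1, l1), (m2, l2), (m3, l3) = points
    a11, a12, b1 = l2 - l1, m2 - m1, l2 * m2 - l1 * m1
    a21, a22, b2 = l3 - l1, m3 - m1, l3 * m3 - l1 * m1
    det = a11 * a22 - a12 * a21
    if det == 0:
        raise ValueError("standards are collinear; cannot fit the curve")
    m0 = (b1 * a22 - a12 * b2) / det
    l0 = (a11 * b2 - a21 * b1) / det
    c = (l1 - l0) * (m1 - m0)
    return c, m0, l0


def local_size(standards, mobility):
    """Estimate a fragment size (bases) from its mobility.

    standards: list of (mobility, size_in_bases) pairs sorted by mobility.
    Two standards are required on each side of the unknown fragment.
    """
    mobilities = [m for m, _ in standards]
    i = bisect.bisect_left(mobilities, mobility)
    if i < 2 or i > len(standards) - 2:
        raise ValueError("need two standards either side of the fragment")
    four = standards[i - 2:i + 2]
    estimates = []
    # First fit: two below and one above; second fit: one below and two above.
    for triple in (four[:3], four[1:]):
        c, m0, l0 = fit_reciprocal(triple)
        estimates.append(c / (mobility - m0) + l0)
    return sum(estimates) / 2
```

Because only the four nearest standards are used, a local distortion of the run affects the estimate less than it would in a single curve fitted to the whole size range.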
Because all size measurements are made relative to the allelic ladder, determination of the absolute size is not important since comparisons are made directly against a control of the same size and sequence—hence it is only the distance of separation between control allelic ladder marker and the Q allele that is important. This is an important consideration because different internal size standards from different manufacturers will give different absolute results. Consequently, it is only necessary to standardise allelic ladders.
Provided that the size of an allele is no more than 0.5 bases from the measured allelic ladder control standard then a designation is safe to make. In any electrophoretic system distortion of the run may also occur and this can result in band shift which occasionally pushes a band into the next ‘bin’. To capture these events, Gill et al [Int. J. Legal Med. 1996, 109, 14-22] also introduced a series of rules based on measurement of band shift relative to the allelic ladder marker, which are explained with reference to FIG. 1. In particular:
a) The sizes in bases of questioned allele Q1 and allelic ladder allele (x) are measured relative to the internal standard, usually by using the Elder and Southern [Anal. Biochem. 1983, 128, 227-231] local method of measurement. This method calculates the size of the Q allele relative to the two adjacent internal size standards either side of it. Measurements are repeated with Q2 and allelic ladder allele (y). Delta (d) values are always conditioned on the allele where both the allelic ladder marker and Q alleles are coincident within the same 1 base bin. The differences in size between the questioned alleles and the allelic ladder markers are defined as:
d_{1|x} = f_{Q1} − f_{x} and d_{2|y} = f_{Q2} − f_{y}, where f = size in bases.
Hence d_{1|x} and d_{2|y} must each lie within ±0.5 bases for the alleles to be designated.
b) The band shift association rule states that if one band of a heterozygote is shifted, then the other allele will also be shifted in the same direction and to the same extent. If d1 and d2 are the respective band shifts for heterozygous alleles Q1 and Q2 relative to allelic ladder markers x and y respectively, then the allele designation is made only if:
d_{1|x} − d_{2|y} < 0.5
c) If a heterozygote comprising 2 rare alleles is observed, then this observation must be confirmed by re-analysis (band shifts will usually shunt the alleles into adjacent bins, corresponding to alleles x±1 and y±1, that are usually occupied by rare alleles).
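The three rules above can be sketched as a single check. This is an illustrative reading of the rules, not the implementation of any particular expert system: the function name, argument names and returned strings are hypothetical, and rule b) is applied here to the magnitude of the difference between the two band shifts, which is an assumption about how that rule is intended.

```python
def designate_heterozygote(q1, x, q2, y, both_rare=False,
                           bin_half_width=0.5, max_shift_difference=0.5):
    """Apply the band shift rules to a heterozygote.

    q1, q2: measured sizes (bases) of the questioned alleles.
    x, y:   measured sizes (bases) of the matching allelic ladder alleles.
    both_rare: True if both designated alleles are rare (rule c).
    """
    d1 = q1 - x  # band shift of Q1 relative to ladder allele x
    d2 = q2 - y  # band shift of Q2 relative to ladder allele y

    # Rule a): each allele must sit inside the 1 base bin of its ladder allele.
    if abs(d1) >= bin_half_width or abs(d2) >= bin_half_width:
        return "no designation"

    # Rule b): both bands should be shifted in the same direction and to a
    # similar extent (magnitude of the difference is an assumption here).
    if abs(d1 - d2) >= max_shift_difference:
        return "no designation"

    # Rule c): two rare alleles together must be confirmed by re-analysis.
    if both_rare:
        return "confirm by re-analysis"

    return "designate"
```

For example, shifts of +0.1 and +0.2 bases pass all three rules, whereas shifts of +0.3 and -0.3 bases each fall inside their bins but fail rule b) because they differ by 0.6 bases.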
These rules can be programmed into expert systems.
Problems
In flat bed gel electrophoresis a single allelic ladder lane is used for a gel slab which may have a large number of Q sample lanes run on it. In CE the convention is also to run one capillary of the array with an allelic ladder and the other capillaries with Q samples, to maximise the number of Q samples which can be processed in a given time. The applicant has determined that this approach is not appropriate if the results are to be of the utmost validity.
Because it is not possible to include internal allelic ladder markers within each capillary (because of insufficient dyes and cost), the comparisons of the allelic ladder with Q samples are always made between different gels in CE. In effect each capillary is a different gel. As a result the applicant has realised that the present system of using only one allelic ladder in a set of 96 capillaries can lead to substantial question marks over the designations applied to individual results from the 95 capillaries containing samples under analysis. A methodology has therefore been developed to establish the number of allelic ladder-containing capillaries which should be run to ensure that the designations are correct in an absolute sense, and also to give confidence that the designations offer the necessary level of statistical confidence when scrutinised as forensic evidence, in a court of law for instance. Even with relatively small numbers of capillaries in the array, such as 16, the necessary number of allelic ladder-containing capillaries needs to be verified.
Deviations in performance between one capillary and another arise for a variety of reasons.
Silica capillaries possess an excess of negative charge on their surfaces. Cations from aqueous solution build up at these surfaces, and when a voltage is applied the cations in the solution are attracted toward the cathode. This results in a bulk flow toward the cathode (electro-osmotic flow, EOF), which is against the migration of the DNA, causing disruption to separation and a consequent reduction in the resolution of DNA fragments. The commonest method to reduce EOF is to coat the capillary inner surface in order to modify or mask the charge on the silica surface, but there can be variations in the modification which results. Variations in the level of modification also apply to more sophisticated systems such as AB's use of a “dynamic coating polymer”, POP-6 (Performance Optimised Polymer).
Because the separation of the molecules, whether it be due to the transient entanglement mechanism and/or to non-tangling collisions between the DNA and polymer, is proportional to the size of the molecule, and because the mobility of DNA is also sequence dependent (notably, AT-rich sequences show anomalous migration rates relative to internal size standards), it is recommended that the evaluation of the number of allelic ladders required is conducted in relation to the loci showing the greatest standard deviation against the size standards in test. The philosophy is preferably to carry out evaluation on worst case scenarios. Using this principle, the methods and logic described should be applicable to any capillary array system, whatever the number of capillaries or loci being considered.
Experimental Demonstration of Variation
The methodology followed in analysing the samples and obtaining standard deviation data is set out in Appendix 1 below.
Allelic ladders were run across an entire 96× array and the standard deviations (SDs) of each allele were compared, FIG. 2. Interestingly, different loci have different characteristics. The standard deviation is both locus dependent and dependent upon the molecular weight of individual alleles; the standard deviation increases with the molecular weight. The data form three distinct clusters: a) low SD: D2S1338, D16S359, D21S11, HUMVWFA31/A, HUMTH01, D19S433; b) high SD: HUMFIBRA(FGA), D8S1179, D3S1358; c) intermediate SD: D18S51. The high molecular weight alleles of the HUMFIBRA locus show the greatest SD, followed by D18S51. All other loci have much lower SDs.
The repeating sequences of the high SD loci HUMFIBRA, D8 and D3 are approximately 75% AT-rich. D18S51, which also has an elevated standard deviation, is 75% AT-rich. The only locus that does not fit this pattern is D16, which is part of the low SD cluster.
Worst case scenarios are defined as alleles that are most likely to fall outside their 1-base bin (as discussed in the theory of size analysis above) and these can be evaluated using high molecular weight HUMFIBRA alleles because they have the highest SD's—the range for a high molecular weight HUMFIBRA allele was approximately 1.25 bases, see FIG. 3, with a maximum SD of 0.16, see FIG. 4 which provides Table 1.
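The selection of a worst case allele from a full-array ladder run can be sketched as follows. The function names and the data layout (a mapping from allele name to the sizes measured in each capillary) are hypothetical, and the sample standard deviation is used as an assumption, since the text does not state which form of the SD was calculated.

```python
import statistics


def spread(sizes):
    """Return (standard deviation, range) of one allele's measured sizes."""
    return statistics.stdev(sizes), max(sizes) - min(sizes)


def worst_case_allele(sizes_by_allele):
    """Return the allele whose measured sizes have the largest SD.

    sizes_by_allele: dict mapping an allele name to the list of sizes
    (bases) measured for that ladder allele in each capillary.
    """
    return max(sizes_by_allele,
               key=lambda allele: statistics.stdev(sizes_by_allele[allele]))


# Made-up measurements for two ladder alleles, for illustration only.
example = {
    "low MW allele": [100.0, 100.1, 99.9, 100.0],
    "high MW allele": [328.0, 328.6, 327.4, 328.2],
}
worst = worst_case_allele(example)
worst_sd, worst_range = spread(example[worst])
```

The allele returned by `worst_case_allele` is the one on which the later evaluation of the number of ladders would be carried out.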
Determination of Number of Allelic Ladders Needed
Referring to the sizing theory, a ‘bin’ of ±0.5 bases is constructed around the observed position of an allelic ladder allele. Supposing that the range of measurement error is also one base, then provided that Q alleles fall within this bin they are correctly designated. However, this is entirely dependent upon the estimate of the mean (B) being coincident with the actual mean (A).
If the bin has been constructed from an observation (B) that is in the tail of the error distribution, see FIG. 5a, then the one base bin will overlap into an adjacent bin (portion C). It is therefore possible that Q alleles that should actually be designated in the next bin could appear to fall in the bin constructed around observation (B) and could be mis-designated as a result.
However, if the maximum measurement error range is set at just ±0.25 bases, centred on the estimate of the mean (B), see FIG. 5b, then even if the estimate (B) is taken from the tail of the measurement error distribution of (A) the Q alleles will always fall within the correct bin, because the ±0.25 bases bin around (B) will always be within the ±0.5 bases bin around (A). This means that, effectively, the range should be no greater than 0.5 bases in total if mis-designation is to be avoided completely.
Against this position the possibility of error is therefore minimised by providing a best estimate of the mean (B), and this is dependent upon the number of allelic ladders that are run across the 96× array in order to make such estimates. The present position in the prior art is that a single run of the allelic ladder is used to determine (B). Of course the more capillaries which are used for allelic ladders the better the estimate of (B), but the more allelic ladders that are used, the lower the number of samples that remain for analysis.
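The containment argument can be written as a one-line inequality. The symbols are introduced here for illustration only: a is the actual mean (A), b is the estimate (B) and q is the measured size of a Q allele. It is assumed that b lies within 0.25 bases of a and that q lies within 0.25 bases of b.

```latex
|q - a| \;\le\; |q - b| + |b - a| \;\le\; 0.25 + 0.25 \;=\; 0.5
```

Under those two assumptions q cannot leave the ±0.5 bases bin around (A), which is why a total range of 0.5 bases is the limit quoted.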
As a consequence of this position it is necessary to establish the minimum number of capillaries which need to be used for allelic ladders to achieve the necessary degree of confidence. To achieve this, simulation was used to determine the relationship between the standard deviation and the number of allelic ladders run across the array. For each simulation a constant n allelic ladders were chosen at random, with replacement, from the array of 96 capillaries (where n=1 to 20) and the experiment was repeated 1000 times so that a large sample of estimates of the mean and median were generated from a single array data-set. The whole simulation was repeated by changing the value of n. Standard deviations were calculated from the 1000 estimates of the mean and median for each value of n.
The worst case scenario was evaluated specifically with the high molecular weight HUMFIBRA allele 47.2 (MW=328). Ideally, as identified above, the range should be less than 0.5 bases and this corresponds to a critical SD<0.083 (the critical SD is 0.5/6, under the assumption that with a normal distribution over 99% of observations are covered by a bin of width 6×SD). Standard deviations of means and medians from the 1000 simulations are shown in FIG. 4. As can be seen from FIG. 4, the critical standard deviation corresponded to that calculated from 4 allelic ladders for this high molecular weight allele. Standard deviations of the mean were marginally lower than standard deviations of the median estimate. The simulation experiments were repeated with allele HUMFIBRA 26 (molecular weight 253). The critical SD was achieved with just 2 allelic ladders with this lower molecular weight allele marker, see FIG. 6.
Similar determinations of the minimum number of allelic ladders needed can be made for different loci, for different particular alleles or for different levels of certainty. The determination can also be made for different assumed distributions of the mean, other than normal. The determination can be made in a similar way also for capillary arrays with different numbers of capillaries.
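The resampling procedure described above can be sketched as follows. The function names are hypothetical, and the 96 input values are synthetic (drawn from a normal distribution with a mean of 328 bases and an SD of 0.16) purely so that the sketch runs; in practice they would be the sizes measured for one ladder allele in each capillary of a real array.

```python
import random
import statistics


def sd_of_estimates(array_sizes, n_ladders, n_repeats, rng):
    """SD of the mean and of the median over repeated random draws.

    Each repeat draws n_ladders capillaries at random, with replacement,
    and records the mean and median of the sizes drawn.
    """
    means, medians = [], []
    for _ in range(n_repeats):
        sample = rng.choices(array_sizes, k=n_ladders)  # with replacement
        means.append(statistics.mean(sample))
        medians.append(statistics.median(sample))
    return statistics.stdev(means), statistics.stdev(medians)


def sd_by_ladder_count(array_sizes, max_ladders=20, n_repeats=1000, seed=0):
    """Repeat the simulation for n = 1 .. max_ladders."""
    rng = random.Random(seed)
    return {n: sd_of_estimates(array_sizes, n, n_repeats, rng)
            for n in range(1, max_ladders + 1)}


# Synthetic stand-in for one allele measured in 96 capillaries.
_data_rng = random.Random(1)
synthetic_array = [_data_rng.gauss(328.0, 0.16) for _ in range(96)]
results = sd_by_ladder_count(synthetic_array)
```

Each entry of `results` holds the SD of the mean and the SD of the median for one value of n, which is the quantity compared against the critical SD when choosing the number of ladders.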
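The critical SD and the resulting number of ladders can also be checked with a short calculation. This sketch replaces the simulation with the textbook approximation that the SD of a mean of n independent measurements is the single-measurement SD divided by the square root of n; that approximation, and the function names, are assumptions introduced here and are not the method described in the text.

```python
import math


def critical_sd(total_range=0.5, sd_widths=6.0):
    """Critical SD for a bin of the given total width (bases)."""
    return total_range / sd_widths


def min_ladders(single_capillary_sd, total_range=0.5, sd_widths=6.0,
                max_ladders=96):
    """Smallest n for which SD / sqrt(n) falls below the critical SD.

    Returns None if no n up to max_ladders is sufficient.
    """
    limit = critical_sd(total_range, sd_widths)
    for n in range(1, max_ladders + 1):
        if single_capillary_sd / math.sqrt(n) < limit:
            return n
    return None
```

With the maximum SD of 0.16 quoted earlier for the high molecular weight HUMFIBRA allele, `min_ladders(0.16)` evaluates to 4, since 0.16/√3 ≈ 0.092 is above the critical value of about 0.083 while 0.16/√4 = 0.08 is below it. This agrees with the number of ladders found by simulation.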
Array and Capillary Specific Variations
The investigations behind this work also established that other variations in the capillary array could make a significant impact on the accuracy and/or robustness of the determinations made from the results.
When the differences in five consecutive runs were compared for HUMFIBRA 47.2 it was noted that there were a number of consistent behaviours in terms of speed of migration. Among the 12 capillaries in which fragments migrated the fastest in these tests, it was notable that fragments in capillary no. 1 (the left hand capillary when viewed from the front) always migrated the fastest. Capillaries 2, 3 and 4 were also included in the list, reinforcing the trend that the capillaries in which fragments migrated the fastest were the low-numbered ones. However, it is noticeable that fragments in capillaries 55, 62 and 67 also migrated faster than expected from the trend and, furthermore, that this was a reproducible effect between different runs on the same array. This effect was not reproducible when a new capillary array was installed, however, and so each array requires separate evaluation for the faster or slower mid-array capillaries. The higher-numbered capillaries, however, were consistently among the slower.
The implication of this position is that, given the systematic variation in speed across the array, the slower and faster capillaries should be avoided for use as the allelic ladder bearing capillaries. This means that care needs to be taken in the choice of capillaries used to run the allelic ladders. If capillaries are all chosen from either extreme (capillaries 1-10 or 86-96) then the calculated means will tend to be continually over or underestimated. Ideally, ladders should be chosen from the mid positions on the array, subject to the further observations made below, to reduce further the chance of mis-designating alleles. Assessment of each CE machine is advisable to establish variations of this type.
Referring to FIG. 7 it is clear that within a particular array different capillaries will give different individual performance, over and above any effect of the left to right/fast to slow variation. Capillaries from the middle part of the array which give fast or slow speeds should therefore also be avoided for the allelic ladder capillaries. As these performance variations affect different capillaries in different arrays, it is recommended that the set up procedure for an array include an evaluation of the speed of the individual capillaries and that the fastest few and slowest few (or those exhibiting performance above or below a threshold) be excluded from use as the allelic ladder capillaries.
To assess each machine separately and each new array separately it is recommended that each machine is characterised by running 96 allelic ladders across the array in order to characterise the separate capillaries and to ensure that those chosen to run allelic ladders give a mean result that is reasonably close to the ‘true’ mean.
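One way of turning a full-array characterisation run into a choice of ladder capillaries is sketched below. The function name, the input layout (capillary number mapped to the size measured for one reference ladder allele), the numbers of capillaries excluded and the rule of picking the capillaries closest to the array mean are all assumptions made for illustration; the text only requires that edge capillaries and unusually fast or slow capillaries be avoided and that the chosen capillaries give a mean close to the ‘true’ mean.

```python
import statistics


def choose_ladder_capillaries(size_by_capillary, n_ladders=4,
                              n_edge=10, n_extreme=5):
    """Pick capillaries to carry allelic ladders.

    size_by_capillary: dict mapping capillary number to the size (bases)
    measured for a reference ladder allele in that capillary.
    """
    numbers = sorted(size_by_capillary)

    # Avoid the capillaries at either end of the array.
    candidates = numbers[n_edge:len(numbers) - n_edge]

    # Avoid the most extreme remaining capillaries, wherever they sit.
    ranked = sorted(candidates, key=lambda c: size_by_capillary[c])
    candidates = ranked[n_extreme:len(ranked) - n_extreme]

    # Prefer capillaries whose result is closest to the whole-array mean.
    array_mean = statistics.mean(list(size_by_capillary.values()))
    by_closeness = sorted(
        candidates, key=lambda c: abs(size_by_capillary[c] - array_mean))
    return sorted(by_closeness[:n_ladders])
```

Because mid-array outliers differ between arrays, a selection of this kind would need to be repeated for each new array rather than reused.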
Variation with Extent of Use
As well as inherent variation in speed for a CE machine, a CE array and individual capillaries within an array, variation with time occurs. Indeed a point is reached with CE arrays at which one or more of the capillaries is no longer functioning, the results produced are of no use and the tests need to be repeated. Unfortunately, as a significant time period often elapses between a test run being performed and the results being analysed (at which time the breakdown of the array is noticed), very substantial numbers of further tests will have been done in the meantime and these will need repeating (with consequent time and cost implications).
To avoid this the applicant suggests one or both of the following monitoring routines for CE arrays.
Firstly, by recording allelic ladder standard deviations between runs for arrays as a whole, or more preferably for individual capillaries, the variation in the standard deviation with time can be established. An SD level above a threshold can be used to warn of array breakdown and prompt a shift to a new array. Secondly, a similar aim can be achieved by carrying out a full analysis by running allelic ladders across the entire array at regular intervals. The performance of an array can be monitored by direct reference to the standard deviation: it would be expected that the standard deviation would increase if the array starts to break down, acting as an early warning indicator of a problem.
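The first monitoring routine can be sketched as a simple threshold check on the ladder SD of successive runs. The function names and the data layout are hypothetical, and the threshold is left as a parameter because the text does not give a value for it.

```python
import statistics


def ladder_sd_per_run(runs):
    """SD of one ladder allele's measured sizes for each run.

    runs: list of runs in time order; each run is the list of sizes
    (bases) measured for the same ladder allele in the ladder capillaries.
    """
    return [statistics.stdev(sizes) for sizes in runs]


def first_run_over_threshold(run_sds, threshold):
    """Index of the first run whose SD exceeds the threshold, else None."""
    for index, sd in enumerate(run_sds):
        if sd > threshold:
            return index
    return None
```

A returned index would act as the warning to inspect or replace the array before further sample runs are committed to it.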
The methodology set out in the present invention: in determining the effective number of allelic ladders which should be used; in determining which capillaries to avoid for the allelic ladders due to machine or array variations; in determining which capillaries to avoid for allelic ladders due to capillary specific variations; and in providing forewarning of array breakdown offers technology which provides more effective CE analysis through more rigorous results and improved processing efficiency.
Appendix 1
For sample preparation the DNA was extracted from buccal scrapes using QIAAMP spin columns (Qiagen) as described by Greenspoon et al. [J. Forensic Sci. 1998, 43, 1024-1030] and was PCR amplified using an STR multiplex from Applied Biosystems (AB), the AMPFISTR SGM Plus DNA profiling technique, as described by Cotton et al [Forensic Sci. Int. 2000, 112, 151-161] for use on the AB 377 flat-bed gel automated sequencer. A concatamer internal size standard, AB HD400 Rox, was included with every sample; this included the following fragment sizes: 50, 60, 90, 100, 120, 150, 160, 180, 190, 200, 220, 240, 260, 280, 290, 300, 320, 340, 360, 380, 400 base pairs. In addition the SGM Plus allelic ladder size standard was incorporated into 8 capillaries per array run. The allelic ladder cocktail comprises alleles from the following loci using filter F.
Dyes are 5-FAM (blue), JOE (green) and NED (yellow). The loci and their dye colours are: D3S1358 (blue); HUMVWFA31/A (blue); D16S359 (blue); D2S1338 (blue); Amelogenin (green); D8S1179 (green); D21S11 (green); D18S51 (green); D19S433 (yellow); HUMTH01 (yellow); HUMFIBRA(FGA) (yellow).
The STR loci utilised are tetrameric repeat sequences. Alleles in the ladders encompass the entire range of common alleles, are spaced at a minimum of 4 bases apart and coincide with the common alleles. In HUMFIBRA(FGA), D19S433 and D21S11, 2 base variants are common; hence 2 base variants are included in the ladders for these loci.
HIDI Formamide (Applied Biosystems) was used as the loading buffer.
The size standard was diluted in HIDI Formamide at a ratio of 1 in 40. Samples: 1.5 µl PCR product + 13.5 µl HIDI Formamide/HD400 size standard. Ladders: 1.5 µl allelic ladder + 10 µl HIDI Formamide/HD400 size standard.
Electrophoresis was conducted on the ABI Prism 3700 CE platform using standard run parameters. Labelling and sizing of DNA fragments and their allelic designation was carried out with Genotyper 1.1.1 software.
The separation matrix used was POP-6 (Performance Optimised Polymer-6[7]) with 1×TBE running buffer (supplied by AB). The samples were injected from AB Gene Microtitre plates or ABI Thermofast Microtitre plates. A sample transfer volume of 2.5 µl was used, with electrokinetic injection at 10 kV for 11 secs, a run voltage of 7.5 kV, a run temperature of 50° C. and a cuvette temperature between 45° C. and 50° C. (note that this temperature must be optimised for each separate 3700 machine, otherwise sensitivity is compromised); the cuvette polymer flow rate was 12000 counts.