High-content screening (HCS), the application of automated sub-cellular imaging and image analysis to investigating cellular signalling pathways and processes (S. A. Haney, P. LaPan, J. Pan and J. Zhang, “High-content screening moves to the front of the line,” Drug Discovery Today, Vol. 11, No. 19-20, pp. 889-894, October 2006), is becoming widely adopted across both industry and academia as a rapid and cost-effective route to generating highly informative biological data. HCS provides investigators with powerful technologies and applications for the detailed investigation of cellular biology in situ and in context, and as a consequence generates large multi-parameter data sets, for example data sets derived from large numbers of cellular images.
In many studies the full potential of these data has not been explored or exploited. Standard methods of data analysis and comparison routinely used in high-throughput screening (HTS), such as the mean and standard deviation, obscure underlying patterns and trends in HCS data by averaging over cellular population responses.
A simple example of this occurs in chemical inhibitor or RNAi studies, where a 50% decrease in the mean response, as measured by HTS metrics, may represent 50% inhibition in all cells or, alternatively, 100% inhibition in 50% of the cells, with the remainder unaffected.
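The masking effect of a mean response can be illustrated with a minimal Python sketch. The per-cell readouts, the untreated baseline of 100 and the 10-unit "fully inhibited" cutoff below are hypothetical values chosen purely for illustration. Both populations have the same mean, yet they differ completely in the fraction of cells that are fully inhibited.

```python
from statistics import mean

# Hypothetical per-cell readouts (arbitrary units); untreated cells would read 100.
n_cells = 1000
uniform = [50.0] * n_cells                                   # 50% inhibition in every cell
all_or_none = [0.0] * (n_cells // 2) + [100.0] * (n_cells // 2)  # 100% inhibition in half the cells

def fraction_fully_inhibited(cells, cutoff=10.0):
    """Fraction of cells whose readout falls below a hypothetical cutoff."""
    return sum(1 for c in cells if c < cutoff) / len(cells)

print(mean(uniform), mean(all_or_none))            # 50.0 50.0
print(fraction_fully_inhibited(uniform),
      fraction_fully_inhibited(all_or_none))       # 0.0 0.5
```

A mean-based HTS metric reports the same 50% decrease for both populations; only a per-cell view reveals the difference.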
The situation is worsened further because cellular intensity and spatial data are rarely, if ever, normally distributed, which makes the mean and standard deviation poor descriptors of their distributions.
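The effect of a skewed distribution on these summary statistics can be sketched as follows, using a hypothetical population of 100 cells in which 80 are dim and 20 are bright (arbitrary units; the values are invented for illustration). The mean minus one standard deviation falls below zero even though no cell has a negative intensity, and the median lies far from the mean.

```python
from statistics import mean, median, pstdev

# Hypothetical, right-skewed per-cell intensities (arbitrary units):
# most cells are dim and a minority are bright.
intensities = [10.0] * 80 + [200.0] * 20

mu = mean(intensities)        # 48.0
sigma = pstdev(intensities)   # 76.0 (population standard deviation)
med = median(intensities)     # 10.0

# A "mean +/- SD" summary implies a lower bound below zero,
# which no cell in this population can actually have.
print(f"mean={mu}, sd={sigma}, mean-sd={mu - sigma}, median={med}")
```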
Consequently, comparing HCS data between samples on the basis of averaged responses not only underutilises the data but is also likely to be inaccurate in many cases.
Limitations of standard data averaging techniques have led to the adoption of various non-parametric analysis methods, such as use of the Kolmogorov-Smirnov (KS) distance (S. Siegel and N. J. Castellan, Non-Parametric Statistics for the Behavioural Sciences, McGraw-Hill, New York, USA, 2nd Edition, 1988) for comparing cell population data and distributions in HCS data (Z. E. Perlman, M. D. Slack, Y. Feng, T. J. Mitchison, L. F. Wu and S. J. Altschuler, “Multidimensional drug profiling by automated microscopy,” Science, Vol. 306, pp. 1194-1198, 12 Nov. 2004; and B. Zhang, X. Gu, U. Uppalapati, M. A. Ashwell, D. S. Leggett and C. J. Li, “High-content fluorescent-based assay for screening activators of DNA damage checkpoint pathways,” Journal of Biomolecular Screening, Vol. 13, No. 6, pp. 538-543, 19 Jun. 2008).
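For reference, the two-sample KS distance is the largest absolute difference between the empirical cumulative distribution functions (ECDFs) of two populations. The following is a minimal, dependency-free Python sketch of that statistic; the function names are illustrative, and in practice a library routine such as SciPy's `scipy.stats.ks_2samp` would typically be used. Applied to two hypothetical populations with the same mean of 50 (every cell at 50, versus half the cells at 0 and half at 100), it returns a distance of 0.5.

```python
from bisect import bisect_right

def ecdf(sorted_sample, x):
    """Empirical CDF: fraction of observations <= x (sample must be sorted)."""
    return bisect_right(sorted_sample, x) / len(sorted_sample)

def ks_distance(a, b):
    """Two-sample KS distance: max |F_a(x) - F_b(x)| over all observed values.

    Both ECDFs are right-continuous step functions that only change at
    observed values, so checking every pooled observation finds the supremum.
    """
    sa, sb = sorted(a), sorted(b)
    return max(abs(ecdf(sa, x) - ecdf(sb, x)) for x in sa + sb)

# Hypothetical populations with identical means (50) but different shapes.
uniform = [50.0] * 1000
all_or_none = [0.0] * 500 + [100.0] * 500
print(ks_distance(uniform, all_or_none))  # 0.5
```

Because the statistic compares whole distributions rather than their averages, it distinguishes populations that a mean-based comparison treats as identical.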
For example, US 2006/0154236 (Altschuler et al) describes methods and systems for the analysis of cells based on the automated collection of data from image processing software and the statistical analysis of these data. The methods described include the use of an intra-sample KS distance as a measure of population differences, and a means of normalising the KS distance by dividing it by a measure of the variability of the descriptor (e.g. standard deviation) within a population.
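A minimal sketch of the normalisation summarised above is given below. It assumes, as a hypothetical choice, that the divisor is the population standard deviation of the descriptor in a control population; the exact definition used in US 2006/0154236 may differ, and the function names and data are illustrative only.

```python
from bisect import bisect_right
from statistics import pstdev

def ks_distance(a, b):
    """Two-sample KS distance: max |F_a(x) - F_b(x)| over the pooled observations."""
    sa, sb = sorted(a), sorted(b)
    na, nb = len(sa), len(sb)
    return max(abs(bisect_right(sa, x) / na - bisect_right(sb, x) / nb)
               for x in sa + sb)

def normalised_ks(test, control):
    """Hypothetical normalisation: KS distance divided by the control population's SD."""
    spread = pstdev(control)
    if spread == 0:
        raise ValueError("control has zero spread; normalised KS is undefined")
    return ks_distance(test, control) / spread

control = [1.0, 2.0, 3.0, 4.0]   # hypothetical control descriptor values
test = [3.0, 4.0, 5.0, 6.0]      # hypothetical treated descriptor values
print(ks_distance(test, control))    # 0.5
print(normalised_ks(test, control))  # 0.5 / sqrt(1.25), about 0.447
```

Because the KS distance is unchanged when the descriptor is rescaled while the standard deviation scales with it, a normalised value of this form depends on the units of the descriptor.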
However, whilst the use of such non-parametric data analysis methods is an improvement on previous techniques, there remains a need for faster and more accurate data analysis techniques, particularly for analysing the extremely large multi-parameter data sets typically generated by HCS/HTS.