There is currently a need in drug discovery and in general biological research for methods and apparatus for accurately performing cell-based assays. Cell-based assays are advantageously employed for assessing the biological activity of chemical compounds.
In addition, there is a need to quickly and inexpensively screen large numbers of chemical compounds. This need has arisen in the pharmaceutical industry where it is common to test chemical compounds for activity against a variety of biochemical objects, for example, receptors, enzymes and nucleic acids. These chemical compounds are collected in large libraries, sometimes exceeding one million distinct compounds.
Performing cell-based assays typically involves recording cellular images and quantifying these images using algorithms of image analysis software. Instruments are known for imaging fluorescently labelled cells, and the software of these instruments has a number of analysis modules which quantify, for example, biological protein translocations and reactions to stimuli of cellular pathways within the fluorescently labelled cells.
Such instruments typically require a user to initially set up image processing software parameters to recognise cells having particular characteristics. In order to train the software correctly, the user must first be trained to a required standard, which is often a time-consuming, complex and expensive procedure. A lack of user training can result in poor output data, leading to an inefficient use of the technology for biocellular analysis.
The article ‘Location Proteomics; Providing Critical Information for Systems Biology’, G.I.T. Imaging and Microscopy 2/2005, describes the use of subcellular location features to cluster images into sets. Images with similar patterns of protein locations may be grouped together automatically without human assistance.
International patent application WO 2002097714 describes an expert system and software method for image recognition optimised for the repeating patterns characteristic of organic material. The method is performed by computing parameters across a two dimensional grid of pixels. The parameters are fed to multiple neural networks, one for each parameter, which have been trained with images. Each neural network then outputs a measure of similarity of the unknown material to the known material on which the network has been trained. However, using a neural network makes it difficult for a user to adjust the system and influence the outcomes of the analyses.
International patent application WO 2004088574 and the scientific poster entitled ‘Learning Algorithms Applied to Cell Subpopulation Analysis in High Content Screening’, by Bohdan Soltys, Yuriy Alexandrov, Denis Remezov, Marcin Swiatek, Louis Dagenais, Samantha Murphy and Ahmad Yekta, describe a method of classifying biological cells into cellular subpopulations. A user supervises creation of a ‘training data set’ by selecting individual cells from an image and classifying the selected cells into a particular subpopulation. Thus, the user determines which, and how many, cellular descriptors are best for classifying the cells. The training data set includes classification data which identifies characteristics of cells of a certain subpopulation. The training data set is subsequently applied to a cellular image to identify and classify further cells without user supervision. When a large set of descriptors is used, the classification process becomes a time-consuming and complicated task. In order to provide successful training, the size of the training data set should grow exponentially with the number of descriptors used. This is known as the “curse of dimensionality”, as described by Trevor Hastie, Robert Tibshirani and Jerome Friedman in the publication “The Elements of Statistical Learning” at page 25. The use of supervised classification methods for cellular assays is therefore relatively limited.
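The exponential growth referred to above can be illustrated with a short calculation. The following is a minimal sketch only, not part of any cited method: the function names are hypothetical, and it assumes descriptors scaled to a unit hypercube with training cells spread uniformly over it. It shows how many training cells are needed to keep the same sampling density per descriptor, and how wide a neighbourhood must be along each descriptor to capture a fixed fraction of the training cells.

```python
def samples_for_equal_density(samples_in_1d: int, num_descriptors: int) -> int:
    """Training-set size giving the same per-descriptor sampling density
    as `samples_in_1d` cells would give with a single descriptor."""
    return samples_in_1d ** num_descriptors


def edge_length_for_fraction(fraction: float, num_descriptors: int) -> float:
    """Edge length of a sub-cube of the unit hypercube that contains
    `fraction` of uniformly distributed training cells."""
    return fraction ** (1.0 / num_descriptors)


if __name__ == "__main__":
    # Illustrative values only: 10 cells per descriptor, 1% neighbourhood.
    for num_descriptors in (1, 2, 5, 10):
        needed = samples_for_equal_density(10, num_descriptors)
        edge = edge_length_for_fraction(0.01, num_descriptors)
        print(f"{num_descriptors:2d} descriptors: {needed:>14,d} cells, "
              f"edge length {edge:.3f}")
```

Under these assumptions, a neighbourhood holding only 1% of the training cells already spans well over half of the range of every descriptor once ten descriptors are used, so such a neighbourhood is no longer local.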
International patent application WO 2006001843 describes a system for characterising a multidimensional distribution of responses from the objects in a population subject to a perturbation. The methods enable the creation of a “degree of response” scale interpolated from non-perturbed and perturbed reference populations. A “fingerprint” of an object, such as a cell, is measured in terms of a feature vector, namely a vector of descriptor values that characterise the object. However, the manner of selection of features that constitute a fingerprint in any particular application is not disclosed.
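By way of illustration only, a “degree of response” scale of this general kind might be sketched as follows. This is not the method of the cited application, whose details are not reproduced here. It is an assumed, simplified construction in which a fingerprint is projected onto the straight line joining the centroids of the two reference populations, so that 0 corresponds to the non-perturbed centroid and 1 to the perturbed centroid. All function names are hypothetical.

```python
from statistics import fmean


def centroid(population):
    """Mean feature vector of a population given as a list of feature vectors."""
    return [fmean(column) for column in zip(*population)]


def degree_of_response(fingerprint, unperturbed, perturbed):
    """Position of `fingerprint` along the axis joining the two reference
    centroids: 0.0 at the non-perturbed centroid, 1.0 at the perturbed one.
    Assumed linear interpolation, chosen for illustration only."""
    start = centroid(unperturbed)
    end = centroid(perturbed)
    axis = [e - s for s, e in zip(start, end)]
    norm_sq = sum(a * a for a in axis)
    if norm_sq == 0:
        raise ValueError("reference populations have identical centroids")
    offset = [f - s for f, s in zip(fingerprint, start)]
    return sum(o * a for o, a in zip(offset, axis)) / norm_sq


if __name__ == "__main__":
    # Made-up two-descriptor fingerprints, for illustration only.
    unperturbed = [[0.0, 0.0], [0.0, 2.0]]
    perturbed = [[4.0, 1.0], [4.0, 1.0]]
    print(degree_of_response([2.0, 5.0], unperturbed, perturbed))
```

The result of such a sketch depends entirely on which descriptors make up the feature vector, which is the selection step left open by the cited application.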
It is an object of the present invention to provide a more automated and efficient method of setting up a system for performing image and data analysis, in particular for systems that typically contain several populations of biological objects.