The present invention relates to analyzing and interpreting multidimensional datasets. Examples of such datasets include optical recordings of neuronal cell slice fluorescence and differences in expression levels of multiple genes within a population of patients or subjects.
It is often desirable to understand the relationships among the various events occurring within such a multidimensional dataset. For example, various neurons in a neuronal cell slice may exhibit spontaneous activity in a time series of optical images. It would be desirable to determine which groups of neurons, if any, were ever coactive (i.e., active at the same time or at specific different times), which were regularly coactive (i.e., coactive at multiple times over the period of observation), and which neuron, if any, consistently activates before or after another neuron activates. It would also be advantageous to know the statistical significance of the relationships between the various events, that is, whether the correlation among the various events is stronger than would be expected from random activity.
These and other advantages are achieved by the present invention, which provides a method and system for analyzing a multidimensional dataset and for detecting relationships between various events reflected in the dataset.
In an exemplary embodiment, a method is presented for analyzing a sequence of data arrays including selecting at least one type of region of interest and at least one region of interest for each type of region of interest chosen from said data arrays, and transforming the sequence of data arrays into a simplified data array with a first dimension equal to the number of selected regions of interest and a second dimension equal to the number of data arrays in the original sequence of data arrays. The simplified data array is then examined to detect events of interest in the regions of interest, and those events of interest are stored in a second simplified data array having the same dimensions as the first simplified data array, except that each element of the second array holds a binary value. The second simplified array is then analyzed to determine relationships between the events of interest and, correspondingly, between the regions of interest.
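By way of illustration only, the transformation into the first and second simplified data arrays might be sketched as follows. The function names, the use of the mean pixel value within each region, and the threshold of one standard deviation above each region's mean are assumptions made for this sketch and are not taken from the embodiment itself.

```python
from statistics import mean, pstdev

def reduce_to_traces(frames, rois):
    """Return one row per region of interest and one column per data array.

    frames: sequence of 2-D arrays (lists of rows) of equal shape.
    rois:   list of regions, each a list of (row, col) coordinates.
    """
    return [[mean(frame[r][c] for r, c in roi) for frame in frames]
            for roi in rois]

def detect_events(traces, n_sd=1.0):
    """Mark an element 1 where a trace exceeds its own mean by n_sd
    standard deviations (an assumed criterion), else 0."""
    events = []
    for trace in traces:
        threshold = mean(trace) + n_sd * pstdev(trace)
        events.append([1 if value > threshold else 0 for value in trace])
    return events

# Four 2x2 data arrays; the first region brightens in the third array,
# the second region brightens in the fourth.
frames = [
    [[1, 1], [5, 5]],
    [[1, 1], [5, 5]],
    [[9, 9], [5, 5]],
    [[1, 1], [9, 9]],
]
rois = [[(0, 0), (0, 1)], [(1, 0), (1, 1)]]

traces = reduce_to_traces(frames, rois)  # first simplified array: 2 x 4
events = detect_events(traces)           # second simplified array: 2 x 4, binary
```

Both arrays have a first dimension equal to the number of regions of interest and a second dimension equal to the number of data arrays in the sequence.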
In one exemplary embodiment, analyzing includes plotting a portion or all of the data in the first simplified array to allow visual examination of the relationships between the activities of interest in various regions of interest. In another exemplary embodiment, the analysis step involves detecting events of interest that are coactive and determining whether the number of coactive events is statistically significant. This embodiment may include detecting all such coactive events (i.e. events where at least two regions of interest are active simultaneously), detecting instances where many regions of interest are coactive simultaneously, or detecting instances where two or more regions of interest are each active in a certain temporal relationship with respect to one another (also referred to as coactivity).
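As a non-limiting sketch of the coactivity analysis, the number of simultaneous coactive events can be counted in the binary array and compared against surrogate arrays. The shuffling scheme shown here, in which each region's events are independently permuted in time so that each region keeps its own event count, is an assumed null model chosen for illustration; the function names and the number of shuffles are likewise hypothetical.

```python
import random

def count_coactive(events):
    """Count columns in which at least two regions are active at once."""
    return sum(1 for column in zip(*events) if sum(column) >= 2)

def shuffle_p_value(events, n_shuffles=1000, seed=0):
    """Return the observed count and the fraction of shuffled surrogates
    whose coactive count matches or exceeds it."""
    rng = random.Random(seed)
    observed = count_coactive(events)
    hits = 0
    for _ in range(n_shuffles):
        # Permute each row independently; per-region event counts are kept.
        surrogate = [rng.sample(row, len(row)) for row in events]
        if count_coactive(surrogate) >= observed:
            hits += 1
    return observed, hits / n_shuffles

# Three regions of interest observed over twelve data arrays.
events = [
    [1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 1, 0],
    [1, 0, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0, 0, 0, 1, 0, 0, 0],
]

observed, p_value = shuffle_p_value(events)
```

A small fraction suggests that the observed coactivity is stronger than would be expected from random activity under the assumed null model.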
In a further exemplary embodiment, the data analysis involves calculating a correlation coefficient between two regions of interest based on how often the two regions are coactive relative to how often the first region is active. A map of the regions of interest is displayed, with a line drawn between each pair of regions having a thickness proportional to the correlation coefficient between the two regions.
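One possible reading of this correlation coefficient, offered only as a sketch, is the number of data arrays in which both regions are active divided by the number in which the first region is active. The function name and the handling of a region that is never active are assumptions.

```python
def correlation_matrix(events):
    """Entry [i][j] is (times regions i and j are both active) divided by
    (times region i is active). The matrix is generally not symmetric."""
    n = len(events)
    matrix = [[0.0] * n for _ in range(n)]
    for i, first in enumerate(events):
        active = sum(first)
        if active == 0:
            continue  # assumed convention: leave the row at zero
        for j, second in enumerate(events):
            if i != j:
                both = sum(a * b for a, b in zip(first, second))
                matrix[i][j] = both / active
    return matrix

# Three regions of interest observed over four data arrays.
events = [
    [1, 1, 0, 1],
    [1, 0, 0, 1],
    [0, 0, 1, 0],
]

coefficients = correlation_matrix(events)
```

Because the denominator is the activity of the first region only, the coefficient from region 0 to region 1 differs from the coefficient from region 1 to region 0 in this example. A display routine could scale line thickness by these values.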
Another exemplary embodiment includes plotting a cross-correlogram or histogram of events of interest in a particular region of interest with respect to events of interest in another region of interest, so that the histogram will reveal the number of times an event of interest in the first region of interest occurs a certain number of locations away from an event of interest in the second region of interest in the second simplified data array. The cross-correlogram can also be plotted with respect to a single region of interest, thus showing how many times an event of interest occurs before or after the occurrence of another event of interest in the same region of interest.
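The counts underlying such a cross-correlogram might be computed as in the following sketch, in which the lag is the number of locations separating two events along a row of the second simplified data array. The function name and the `max_lag` parameter are hypothetical.

```python
from collections import Counter

def cross_correlogram(reference, target, max_lag):
    """Return counts for lags -max_lag..max_lag, where lag is the index of
    an event in `target` minus the index of an event in `reference`."""
    ref_times = [t for t, v in enumerate(reference) if v]
    tgt_times = [t for t, v in enumerate(target) if v]
    counts = Counter()
    for r in ref_times:
        for t in tgt_times:
            lag = t - r
            if -max_lag <= lag <= max_lag:
                counts[lag] += 1
    return [counts[lag] for lag in range(-max_lag, max_lag + 1)]

# Two rows of a binary event array.
region_a = [1, 0, 0, 1, 0, 0, 1, 0]
region_b = [0, 1, 0, 0, 1, 0, 0, 1]

cross = cross_correlogram(region_a, region_b, max_lag=2)
auto = cross_correlogram(region_a, region_a, max_lag=3)
```

Passing the same row as both arguments gives the single-region case; in that case the zero-lag bin simply counts each event paired with itself.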
Other exemplary embodiments include performing hidden Markov modeling on the second simplified data array to determine a hidden Markov state sequence, and displaying a cross-correlogram between events of interest occurring in one region of interest while that region is in one of the detected Markov states. A further exemplary embodiment includes performing a singular value decomposition on the first simplified data array.
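As one illustrative way of determining a hidden Markov state sequence, the Viterbi algorithm can decode the most likely sequence of states for a single row of the binary array. The two-state model below and all of its probabilities are assumed values chosen for the sketch; how the model parameters are obtained is outside its scope.

```python
import math

def viterbi(observations, start, trans, emit):
    """Most likely state sequence for a discrete hidden Markov model,
    computed in the log domain. All probabilities must be nonzero."""
    n_states = len(start)
    score = [math.log(start[s]) + math.log(emit[s][observations[0]])
             for s in range(n_states)]
    back = []
    for obs in observations[1:]:
        pointers, new_score = [], []
        for s in range(n_states):
            # Best predecessor state for arriving in state s.
            best = max(range(n_states),
                       key=lambda p: score[p] + math.log(trans[p][s]))
            pointers.append(best)
            new_score.append(score[best] + math.log(trans[best][s])
                             + math.log(emit[s][obs]))
        back.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# Hypothetical two-state model: state 0 is "quiet", state 1 is "active".
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]   # trans[previous][next]
emit = [[0.9, 0.1], [0.3, 0.7]]    # emit[state][observed binary value]

row = [0, 0, 0, 0, 1, 1, 0, 1, 1, 0, 0, 0, 0]
states = viterbi(row, start, trans, emit)
```

The decoded sequence has one state per location in the row, so events can afterwards be grouped according to the state in which they occurred.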