A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure exactly as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
The present invention is related to systems, methods, and products for managing biological data generated by scanning arrays of biological materials.
Synthesized nucleic acid probe arrays, such as GeneChip® probe arrays available from Affymetrix, Inc. of Santa Clara, Calif., and spotted probe arrays such as those made using 417™ or 427™ Arrayers from Affymetrix, have been used to generate unprecedented amounts of information about biological systems. For example, the GeneChip® Human Genome U133 Set (HG-U133A and HG-U133B) from Affymetrix is available on two microarrays containing over 1,000,000 unique oligonucleotide features covering more than 39,000 transcript variants that represent more than 33,000 human genes. Analysis of expression data from such microarrays may lead to the development of new drugs and new diagnostic tools.
There is a demand among users of probe arrays for methods and systems for accessing, analyzing, and managing the vast amount of information collected using nucleic acid probe arrays or using other types of probe arrays. These methods and systems may include the use of software applications and related hardware that implement so-called data mining tools. The operations of these data mining tools typically are facilitated by organizing the data that they mine in appropriate formats. It often is desirable to employ data management applications to provide these formatting operations, as well as to provide other data management functions.
Systems, methods, and computer program products are described herein to address these and other needs. Reference will now be made in detail to illustrative, non-limiting embodiments. Various other alternatives, modifications, and equivalents are possible. For example, while certain systems, methods, and computer software products are described using exemplary embodiments for analyzing data from experiments that employ Affymetrix® GeneChip® probe arrays and/or spotted arrays made using arrayers from Affymetrix, Inc., these systems, methods, and products generally may be applied with respect to many other probe arrays and parallel biological assays.
In one embodiment, a data mining tool is described for mining data that, optionally, may be provided in a publish database. The word "publish" used as an adjective in this context refers to a database that is in a format, and/or organized in accordance with a schema, to facilitate access by data analysis applications, data mining applications, data reporting applications, other data processing applications, or any combination thereof. The word "access" in this context refers to storing, retrieving, or otherwise manipulating data. The word "publish" and its grammatical variants used as a verb in this context refer to formatting and/or organizing a database so that it is accessible as a publish database.
More particularly, in some embodiments a data mining tool is described that includes a data structure populator that stores one or more first sets of data selected for querying into a first data structure. The tool also has a query builder that provides at least a first query based, at least in part, on one or more query parameters. Also included in the tool is a query manager that interrogates the first data structure with the first query. In these embodiments, the one or more first sets of data are based, at least in part, on experiments related to synthesized probe arrays and spotted probe arrays.
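The three elements just described (data structure populator, query builder, and query manager) can be illustrated with a minimal Python sketch. It is not the described implementation: the record layout, the field names (`probe`, `array_type`, `signal`), the query parameters (`min_signal`, `array_type`), and the sample values are all hypothetical and chosen only to show how the elements might cooperate.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical record layout: one result per probe per array type.
Record = Dict[str, object]


@dataclass
class DataStructurePopulator:
    """Stores the data sets selected for querying into one data structure."""
    rows: List[Record] = field(default_factory=list)

    def populate(self, *data_sets: List[Record]) -> List[Record]:
        for data_set in data_sets:
            self.rows.extend(data_set)
        return self.rows


def build_query(**params) -> Callable[[Record], bool]:
    """Query builder: turns query parameters into a predicate over a record.

    The parameter names `min_signal` and `array_type` are assumptions.
    """
    def predicate(rec: Record) -> bool:
        if "min_signal" in params and rec["signal"] < params["min_signal"]:
            return False
        if "array_type" in params and rec["array_type"] != params["array_type"]:
            return False
        return True
    return predicate


def run_query(rows: List[Record], query: Callable[[Record], bool]) -> List[Record]:
    """Query manager: interrogates the data structure with the query."""
    return [rec for rec in rows if query(rec)]


# Made-up sample data for a synthesized array and a spotted array.
synthesized = [
    {"probe": "P1", "array_type": "synthesized", "signal": 1200.0},
    {"probe": "P2", "array_type": "synthesized", "signal": 90.0},
]
spotted = [
    {"probe": "P1", "array_type": "spotted", "signal": 800.0},
]

rows = DataStructurePopulator().populate(synthesized, spotted)
hits = run_query(rows, build_query(min_signal=500.0))
```

In this sketch both kinds of array data land in the same structure, so a single query can interrogate synthesized and spotted results together.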
In some implementations of these embodiments, the data structure populator includes a pivot table populator, and the first data structure is a pivot table. At least one of the one or more first sets of data may be user-selected, and the one or more query parameters may, at least in part, be user-selected.
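As an illustration of the pivot table implementation, the following Python sketch pivots long-format records into a table with one row per probe and one column per experiment. The function name `build_pivot_table`, the key names, and the sample values are assumptions made for this example only.

```python
from collections import defaultdict


def build_pivot_table(records, row_key="probe", col_key="experiment",
                      value_key="signal"):
    """Pivot long-format records into {row: {column: value}}."""
    table = defaultdict(dict)
    for rec in records:
        table[rec[row_key]][rec[col_key]] = rec[value_key]
    return dict(table)


# Made-up sample records: one signal value per probe per experiment.
records = [
    {"probe": "P1", "experiment": "E1", "signal": 10.0},
    {"probe": "P1", "experiment": "E2", "signal": 30.0},
    {"probe": "P2", "experiment": "E1", "signal": 5.0},
]

pivot = build_pivot_table(records)

# Example query against the pivot table: probes whose signal in
# experiment E1 exceeds a user-selected threshold.
selected = [probe for probe, cols in pivot.items() if cols.get("E1", 0.0) > 7.0]
```

Keeping one column per experiment lets a query compare values across experiments for the same probe without rescanning the source records.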
The data mining tool may also include a database registration processor that provides the one or more first sets of data to the data structure populator based, at least in part, on a user selection of at least one database. The at least one database may be organized in accordance with a database schema integrated for both synthesized probe array data and spotted probe array data, such as the AADM schema from Affymetrix, Inc. of Santa Clara, Calif. The data mining tool may further include a query parameter provider that provides the one or more query parameters to the query builder based on a user selection of at least one of the one or more query parameters. Yet another element of the data mining tool, in some implementations, is a results tables and graphs builder that graphically displays data returned as a result of the first query. This display may include any one or more of a table, spreadsheet, scatter plot, histogram, series plot, and/or fold change plot.
In these or other embodiments, the data mining tool may also include a query parameter provider that provides the one or more query parameters to the query builder based on a selection of at least one of the one or more first sets of data, wherein: the one or more first sets of data are selected to include combinations of comparison analyses for two or more sets of replicate data, and the selection of at least one of the one or more query parameters includes a selection of difference call results for all of the comparison analyses. The selection of at least one of the one or more query parameters may further include a ranking based on a count of the difference call results, and/or a ranking based on a percentage of the difference call results.
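The ranking of difference call results across combinations of replicate comparisons can be sketched as follows. This is an illustrative example under stated assumptions: the replicate names, the call labels ("I" for increase, "D" for decrease, "NC" for no change), and the call values are hypothetical and are not taken from any particular analysis software.

```python
from itertools import product

# Two sets of replicate data (hypothetical experiment names).
control = ["C1", "C2"]
treated = ["T1", "T2"]

# Every control replicate is compared with every treated replicate,
# giving 2 x 2 = 4 comparison analyses.
comparisons = list(product(control, treated))

# Made-up difference calls per probe, one per comparison analysis.
calls = {
    "P1": {("C1", "T1"): "I", ("C1", "T2"): "I", ("C2", "T1"): "I", ("C2", "T2"): "I"},
    "P2": {("C1", "T1"): "I", ("C1", "T2"): "NC", ("C2", "T1"): "I", ("C2", "T2"): "D"},
    "P3": {("C1", "T1"): "NC", ("C1", "T2"): "NC", ("C2", "T1"): "NC", ("C2", "T2"): "NC"},
}


def rank_by_call(calls, comparisons, call="I"):
    """Rank probes by how many comparisons produced the given call.

    Returns (probe, count, percentage) tuples, highest count first.
    """
    ranked = []
    for probe, by_comparison in calls.items():
        count = sum(1 for c in comparisons if by_comparison.get(c) == call)
        percent = 100.0 * count / len(comparisons)
        ranked.append((probe, count, percent))
    ranked.sort(key=lambda item: item[1], reverse=True)
    return ranked


ranking = rank_by_call(calls, comparisons)
```

Because every probe here is scored over the same four comparisons, ranking by count and ranking by percentage give the same order; the two would differ only if probes were scored over different numbers of comparisons.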
Yet other embodiments are directed to a data mining method including the following steps: storing one or more first sets of data selected for querying into a first data structure; providing at least a first query based, at least in part, on one or more query parameters; and interrogating the first data structure with the first query. In this method, the one or more first sets of data are based, at least in part, on experiments related to synthesized probe arrays and spotted probe arrays. Other embodiments are directed to a computer program product for data mining comprising a computer usable medium storing control logic that, when executed on a computer system, performs this method.
Also described are embodiments directed to a computer display system. The system includes a processor (e.g., computer CPU) coupled to a display device such that data is displayed on the display device in a graphical user interface (GUI). In this system, a method is performed for displaying the GUI and operating upon data displayed in or received from the GUI, including the following steps: (1) displaying a first frame in the GUI including a first set of graphical elements corresponding to at least one set of synthesized probe array results, and a second set of graphical elements corresponding to at least one set of spotted probe array results; (2) receiving a user selection of one or more of the first or second sets of graphical elements for querying; (3) displaying a second frame in the GUI including a third set of graphical elements capable of representing query criteria; (4) receiving a user selection of one or more query criteria based on the third set of graphical elements; (5) providing a first query based on the user-selected query criteria; and (6) querying with the first query at least one set of the at least one set of synthesized probe array results or the at least one set of spotted probe array results.
In accordance with yet further embodiments, a computer system is described that includes a processor and a memory unit. Stored in the memory unit is a set of data mining instructions that, when executed by the processor, performs a method comprising the steps of storing first data into a first data structure provided from a first database and interrogating the first data structure with a first query. At least a portion of the first data are based, at least in part, on experiments related to at least one synthesized probe array and at least one spotted probe array. In some implementations, the first data are selected to include combinations of comparison analyses for two or more sets of replicate data.
Other embodiments include a computer system that has a processor and a memory unit. Stored in the memory unit is a set of data mining instructions that, when executed by the processor, performs a method comprising the steps of storing first data into a first data structure provided from a first database and interrogating the first data structure with a first query, wherein the first data include combinations of comparison analyses for two or more sets of replicate biological data.
The above embodiments and implementations are not necessarily inclusive or exclusive of each other and may be combined in any manner that is non-conflicting and otherwise possible, whether they be presented in association with a same, or a different, aspect or implementation. The description of one implementation is not intended to be limiting with respect to other implementations. Also, any one or more functions, steps, operations, or techniques described elsewhere in this specification may, in alternative implementations, be combined with any one or more functions, steps, operations, or techniques described in the summary. Thus, the above implementations are illustrative rather than limiting.