A portion of the disclosure of this patent document contains material that is subject to copyright protection. The copyright owner has no objection to the facsimile reproduction by anyone of the patent document or the patent disclosure exactly as it appears in the Patent and Trademark Office patent file or records, but otherwise reserves all copyright rights whatsoever.
The present invention is related to systems, methods, and products for managing biological data generated by scanning arrays of biological materials.
Synthesized nucleic acid probe arrays, such as GeneChip® probe arrays available from Affymetrix, Inc. of Santa Clara, Calif., and spotted probe arrays such as those made using 417™ or 427™ Arrayers from Affymetrix, have been used to generate unprecedented amounts of information about biological systems. For example, the GeneChip® Human Genome U133 Set (HG-U133A and HG-U133B) from Affymetrix is available on two microarrays containing over 1,000,000 unique oligonucleotide features covering more than 39,000 transcript variants that represent more than 33,000 human genes. Analysis of expression data from such microarrays may lead to the development of new drugs and new diagnostic tools.
There is a demand among users of probe arrays for methods and systems for accessing, analyzing, and managing the vast amount of information collected using nucleic acid probe arrays or using other types of probe arrays. These methods and systems may include the use of software applications and related hardware that implement so-called data mining tools. The operations of these data mining tools typically are facilitated by organizing the data that they mine in appropriate formats. It often is desirable to employ data management applications to provide these formatting operations, as well as to provide other data management functions.
Systems, methods, and computer program products are described herein to address these and other needs. Reference will now be made in detail to illustrative, non-limiting embodiments. Various other alternatives, modifications and equivalents are possible. For example, while certain systems, methods, and computer software products are described using exemplary embodiments for analyzing data from experiments that employ Affymetrix® GeneChip® probe arrays and/or spotted arrays made using arrayers from Affymetrix, these systems, methods, and products generally may be applied with respect to many other probe arrays and parallel biological assays.
In one embodiment, a data manager is described for providing a publish database. The word "publish" used as an adjective in this context refers to a database that is in a format, and/or organized in accordance with a schema, to facilitate access by data analysis applications, data mining applications, data reporting applications, other data processing applications, or any combination thereof. The word "access" in this context refers to storing, retrieving, or otherwise manipulating data. The word "publish" and its grammatical variants used as a verb in this context refer to formatting and/or organizing a database so that it is accessible as a publish database.
The data manager in these embodiments includes a results-for-publication identifier that identifies synthesized probe array results and spotted probe array results for publishing. This identification may be based, at least in part, on user selections. As used in this context, "identifies" and its grammatical variants are intended to be understood broadly. For example, a list of probe array results may simply be consecutively identified for publishing, or various criteria (based on time of experiment, type of experiment, a priority indicator representing the importance of the experiment, and so on) may be used to selectively identify probe array results for publishing or for publishing in a certain order. The data manager also includes a publisher that publishes the synthesized probe array results and the spotted probe array results in a publish database.
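The identification step described above can be illustrated with a minimal Python sketch. It is not the implementation of the described data manager: the `ArrayResult` record, the `identify_for_publishing` function, the field names, and the sample values are all hypothetical, chosen only to show consecutive identification, identification from user selections, and ordering by a priority indicator.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ArrayResult:
    """Hypothetical record for one set of probe array results."""
    name: str
    array_type: str          # "synthesized" or "spotted"
    experiment_date: date
    priority: int = 0        # larger value = more important (assumption)


def identify_for_publishing(results, selected_names=None, order_by_priority=False):
    """Return the results identified for publishing.

    With no arguments, every result is identified consecutively, in input
    order. If selected_names is given, only results whose names were
    selected by the user are kept. If order_by_priority is True, results
    are ordered by descending priority, then by earliest experiment date.
    """
    if selected_names is None:
        chosen = list(results)
    else:
        chosen = [r for r in results if r.name in selected_names]
    if order_by_priority:
        chosen.sort(key=lambda r: (-r.priority, r.experiment_date))
    return chosen


# Illustrative sample data only; not taken from any real experiment.
results = [
    ArrayResult("exp1", "synthesized", date(2001, 3, 1), priority=1),
    ArrayResult("exp2", "spotted", date(2001, 2, 1), priority=5),
    ArrayResult("exp3", "synthesized", date(2001, 1, 1), priority=5),
]

for r in identify_for_publishing(results, order_by_priority=True):
    print(r.name, r.array_type)
```

Keeping identification separate from publishing mirrors the split between the results-for-publication identifier and the publisher; other criteria, such as type of experiment, could be added as further filters in the same function.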
The data in the publish database may be organized in accordance with an integrated database schema. The term "database schema" refers to a scheme for relationships among database entities. In some implementations, the database schema may include entity relationships among database objects. The database typically is a relational database. As is well known to those of ordinary skill in the relevant art, a number of tools exist for designing and documenting database schemas, such as Erwin® software from Computer® Associates International, Inc. of Islandia, N.Y. The word "integrated" in this context means that both data in the publish database related to the synthesized probe array results and data in the publish database related to the spotted probe array results are included in the same database schema. One example of an integrated database schema is the AADM schema from Affymetrix, described below. In some implementations of these embodiments, the synthesized probe array results are in a first format, the spotted probe array results are in a second format, and the publish database is in a third format. That is, the publisher typically converts data from the first and second formats into a third format. The publisher typically stores the publish database in a memory unit of a computer. In these implementations, data in the publish database typically is addressable from a common reference address of the memory unit. Also, the publisher may store the publish database in the memory unit as one or more related files. These files may be related, for example, by using a common name and distinguishing the files based on different file extensions, or in accordance with any of a variety of other methods and techniques known to those of ordinary skill in the relevant art.
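As a rough illustration of an integrated schema, the following Python sketch uses the standard `sqlite3` module to store results from two different source formats in one relational schema. It does not reproduce the AADM schema: the table names, column names, the two input formats (a list of dictionaries and tab-delimited text lines), and the sample values are all assumptions made for this example.

```python
import sqlite3


def create_publish_db(conn):
    """Create a hypothetical integrated schema: both array types share tables."""
    conn.executescript("""
        CREATE TABLE experiment (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            array_type TEXT NOT NULL
                CHECK (array_type IN ('synthesized', 'spotted'))
        );
        CREATE TABLE result (
            experiment_id INTEGER NOT NULL REFERENCES experiment(id),
            feature TEXT NOT NULL,
            value REAL NOT NULL
        );
    """)


def from_synthesized(records):
    """Convert the assumed first format (list of dicts) to (feature, value) rows."""
    return [(r["probe_set"], float(r["signal"])) for r in records]


def from_spotted(lines):
    """Convert the assumed second format ('spot<TAB>intensity' lines) to rows."""
    rows = []
    for line in lines:
        spot, intensity = line.split("\t")
        rows.append((spot, float(intensity)))
    return rows


def publish(conn, name, array_type, rows):
    """Store one set of converted results in the publish database."""
    cur = conn.execute(
        "INSERT INTO experiment (name, array_type) VALUES (?, ?)",
        (name, array_type),
    )
    exp_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO result (experiment_id, feature, value) VALUES (?, ?, ?)",
        [(exp_id, feature, value) for feature, value in rows],
    )
    conn.commit()
    return exp_id


# Illustrative sample data only.
conn = sqlite3.connect(":memory:")
create_publish_db(conn)
publish(conn, "exp1", "synthesized",
        from_synthesized([{"probe_set": "ps_1", "signal": 120.5}]))
publish(conn, "exp2", "spotted", from_spotted(["A1\t87.0", "A2\t15.25"]))

# A single query can now span both kinds of results.
for row in conn.execute(
    "SELECT e.name, e.array_type, r.feature, r.value "
    "FROM experiment e JOIN result r ON r.experiment_id = e.id"
):
    print(row)
```

The two converter functions stand in for the conversion from the first and second formats into the third; because both kinds of results land in the same tables, a downstream analysis tool needs only one query path.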
The data manager may, in some implementations, also include an experimental results registration processor that registers the synthesized probe array results and the spotted probe array results for publishing. This registration may be based, at least in part, on user selections. For example, a user may select certain probe array results from a graphical user interface displaying a tree of files containing probe array results from multiple experiments with synthesized and/or spotted probe arrays.
In accordance with other embodiments, a method is described for providing a publish database. The method includes the steps of identifying synthesized probe array results and spotted probe array results for publishing, and publishing the synthesized probe array results and the spotted probe array results in a publish database.
In yet other embodiments, a method is described for displaying a graphical user interface (GUI) in a computer display system having a processor coupled to a display device. Data is displayed on the display device in the GUI according to the following steps: (1) displaying a first frame in the GUI including a first set of graphical elements corresponding to at least one set of synthesized probe array results, and a second set of graphical elements corresponding to at least one set of spotted probe array results; (2) receiving a user selection of one or more of the first or second sets of graphical elements for publication in a publish database; and (3) displaying a second frame in the GUI including a third set of graphical elements corresponding to the publish database and, in relation thereto, a fourth set of one or more graphical elements corresponding to those of the first or second sets of graphical elements selected by the user. The first frame may include a data file view wherein the first and second sets of graphical elements are arranged in a first tree structure. The first set of graphical elements may be arranged in one branch of the first tree structure and the second set of graphical elements may be arranged in another branch of the first tree structure. In these implementations, at least one of the first set of graphical elements may correspond to a synthesized probe array result file; and at least one of the second set of graphical elements may correspond to a spotted probe array result file. Also, the second frame may include an active database view wherein the third set of graphical elements is associated with a root of a tree structure, and the one or more graphical elements corresponding to those of the first or second sets of graphical elements selected by the user are associated with branches attached to the root.
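The two tree structures described above can be modeled, independently of any GUI toolkit, with a small Python sketch. The `Node` class, the builder functions, the branch labels, and the file names are hypothetical; the sketch only shows a data file view with one branch per array type and an active database view whose root is the publish database and whose branches are the user-selected results.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical tree node standing in for a graphical element."""
    label: str
    children: list = field(default_factory=list)

    def add(self, label):
        child = Node(label)
        self.children.append(child)
        return child


def build_data_file_view(synthesized_files, spotted_files):
    """First frame: one branch for synthesized results, another for spotted."""
    root = Node("Data Files")
    synthesized_branch = root.add("Synthesized")
    for name in synthesized_files:
        synthesized_branch.add(name)
    spotted_branch = root.add("Spotted")
    for name in spotted_files:
        spotted_branch.add(name)
    return root


def build_active_database_view(database_name, selected_files):
    """Second frame: publish database at the root, selected results as branches."""
    root = Node(database_name)
    for name in selected_files:
        root.add(name)
    return root


def render(node, depth=0):
    """Return the tree as indented text lines, two spaces per level."""
    lines = ["  " * depth + node.label]
    for child in node.children:
        lines.extend(render(child, depth + 1))
    return lines


# Illustrative names only.
first_frame = build_data_file_view(["synth_result_1", "synth_result_2"],
                                   ["spot_result_1"])
second_frame = build_active_database_view("PublishDB",
                                          ["synth_result_1", "spot_result_1"])
print("\n".join(render(first_frame)))
print("\n".join(render(second_frame)))
```

In a real interface each node would be bound to a widget and to the underlying result file; the sketch keeps only the structural relationship between the two frames.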
In these or other embodiments, the method of displaying the GUI may also include receiving a user activation of a publishing operation to be applied to one or more of the set of synthesized probe array results and/or one or more of the set of spotted probe array results. The step of receiving a user activation may include receiving a user selection of one or more of the fourth set of graphical elements, and causing the probe array results corresponding to the user-selected ones of the fourth set of graphical elements to be published to the publish database.
In another embodiment, a graphical user interface (GUI) is described for use with a computer display system having a processor coupled to a display device such that data is displayed on the display device in the GUI. The GUI includes: (1) a first frame including a first set of graphical elements corresponding to at least one set of synthesized probe array results, and a second set of graphical elements corresponding to at least one set of spotted probe array results; and (2) a second frame including a third set of graphical elements corresponding to the publish database and, in relation thereto, a fourth set of one or more graphical elements corresponding to those of the first or second sets of graphical elements selected by a user.
In a further embodiment, a computer program product is described for providing a publish database that, when executed on a computer system, performs a method comprising the steps of: (1) identifying at least one set of synthesized probe array results and at least one set of spotted probe array results for publishing; and (2) publishing the at least one set of synthesized probe array results and the at least one set of spotted probe array results as a first set of data in a publish database.
In yet a further embodiment, a computer system having a processor and a memory unit is described. A set of data management instructions, when stored in the memory unit and executed by the processor, performs a method for providing a publish database comprising the acts of identifying at least one set of synthesized probe array results and at least one set of spotted probe array results for publishing, and publishing the at least one set of synthesized probe array results and the at least one set of spotted probe array results as a first set of data in a publish database.
The above embodiments and implementations are not necessarily inclusive or exclusive of each other and may be combined in any manner that is non-conflicting and otherwise possible, whether they be presented in association with a same, or a different, aspect or implementation. The description of one implementation is not intended to be limiting with respect to other implementations. Also, any one or more function, step, operation, or technique described elsewhere in this specification may, in alternative implementations, be combined with any one or more function, step, operation, or technique described in the summary. Thus, the above implementations are illustrative rather than limiting.