This invention relates to database systems and methods for storing and manipulating experimental data.
The discovery of new materials with novel chemical and physical properties often leads to the development of new and useful technologies. Traditionally, the discovery and development of materials has predominantly been a trial-and-error process carried out by scientists who generate data one experiment at a time. This process suffers from low success rates, long timelines, and high costs, particularly as the desired materials increase in complexity. There is currently a tremendous amount of activity directed towards the discovery and optimization of materials, such as superconductors, zeolites, magnetic materials, phosphors, catalysts, thermoelectric materials, high and low dielectric materials and the like. Unfortunately, even though the chemistry of extended solids has been extensively explored, few general principles have emerged that allow one to predict with certainty the composition, structure and/or reaction pathways for the synthesis of such solid state materials.
As a result, the discovery of new materials depends largely on the ability to synthesize and analyze large numbers of new materials. Given approximately 100 elements in the periodic table that can be used to make compositions consisting of two or more elements, an incredibly large number of possible new compounds remain largely unexplored, especially when processing variables are considered. One approach to the preparation and analysis of such large numbers of compounds has been the application of combinatorial chemistry.
In general, combinatorial chemistry refers to the approach of creating vast numbers of compounds by reacting a set of starting chemicals in all possible combinations. Since its introduction into the pharmaceutical industry in the late 1980s, it has dramatically sped up the drug discovery process and is now becoming a standard practice in that industry (Chem. Eng. News Feb. 12, 1996). More recently, combinatorial techniques have been successfully applied to the synthesis of inorganic materials (G. Briceno et al., SCIENCE 270, 273-275, 1995 and X. D. Xiang et al., SCIENCE 268, 1738-1740, 1995). By use of various surface deposition techniques, masking strategies, and processing conditions, it is now possible to generate hundreds to thousands of materials of distinct compositions per square inch. These materials include high-Tc superconductors, magnetoresistors, and phosphors.
Using these techniques, it is now possible to create large libraries of diverse compounds or materials, including biomaterials, organics, inorganics, intermetallics, metal alloys, and ceramics, using a variety of sputtering, ablation, evaporation, and liquid dispensing systems as disclosed in U.S. Pat. Nos. 5,959,297, 6,004,617 and 6,030,917, which are incorporated by reference herein.
The generation of large numbers of new materials presents a significant challenge for conventional analytical techniques. By applying parallel or rapid serial screening techniques to these libraries of materials, however, combinatorial chemistry accelerates the speed of research, facilitates breakthroughs, and expands the amount of information available to researchers. Furthermore, the ability to observe the relationships between hundreds or thousands of materials in a short period of time enables scientists to make well-informed decisions in the discovery process and to find unexpected trends. High throughput screening techniques have been developed to facilitate this discovery process, as disclosed, for example, in U.S. Pat. Nos. 5,959,297, 6,030,917 and 6,034,775, which are incorporated by reference herein.
The vast quantities of data generated through the application of combinatorial and/or high throughput screening techniques can easily overwhelm conventional data acquisition, processing and management systems. Existing laboratory data management systems are ill-equipped to handle the large numbers of experiments required in combinatorial applications, and are not designed to rapidly acquire, process and store the large amount of data generated by such experiments. These shortcomings impose significant limitations on both experimental and data processing throughput, and those limitations stand in the way of the promised benefits of combinatorial techniques.
Basing laboratory data management systems on current relational or object-oriented databases leads to significant limitations. Systems based on relational databases struggle to provide a facility for effectively defining and processing data that is intrinsically hierarchical in nature. Systems based on current object-oriented databases struggle to offer the necessary processing throughput and/or may lack the flexibility, which relational systems provide through the relational view, to recompose internal data or to access the internal structures of objects directly. Thus, there is a need for laboratory data management systems that combine the object-oriented approach's ability to process hierarchical data with the processing power and/or flexibility of relational database systems.
In general, in one aspect, the invention provides methods, apparatus, including computer program apparatus, and laboratory data management systems implementing techniques for processing data (including, e.g., receiving, manipulating and/or storing data) from chemical experimentation for or on a library of materials or a subset of such a library of materials. The techniques can include receiving data from a chemical experiment on a library of materials having a plurality of members and generating a representation of the chemical experiment. The representation includes data defining an experiment object having a plurality of properties derived from the chemical experiment. The experiment object is associated with the library of materials. The representation also includes data defining one or more element objects, each of which is associated with one or more members of the library of materials.
Particular implementations of the invention can include one or more of the following advantageous features. The chemical experiment can have a type that is one of a pre-defined set of one or more experiment types. The representation can implement a data model describing the set of experiment types, the data model including an experiment base class having a set of experiment base class properties including a classname property for identifying a derived experiment class and a library ID property for identifying a library of materials, and one or more derived experiment classes, each of which is associated with one of the experiment types and has a plurality of derived experiment class properties derived from the associated experiment type. The chemical experiment can be represented by a first experiment object instantiated from the derived experiment class associated with the type of the relevant chemical experiment, and by a second experiment object instantiated from the experiment base class, the classname property of the second experiment object having a value identifying the derived experiment class associated with the experiment type of the chemical experiment, and the library ID property of the second experiment object having a value identifying the library of materials.
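The two-object representation described above can be illustrated with a minimal sketch. It assumes Python dataclasses; the derived class `XRDExperiment`, its parameter fields, the `EXPERIMENT_CLASSES` registry and the `represent` helper are all hypothetical names chosen for illustration, not part of the described system.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

# Hypothetical registry: maps a classname value back to its derived experiment class.
EXPERIMENT_CLASSES: Dict[str, type] = {}


def register(cls: type) -> type:
    """Record a derived experiment class under its Python class name."""
    EXPERIMENT_CLASSES[cls.__name__] = cls
    return cls


@dataclass
class Experiment:
    """Experiment base class carrying the properties shared by every experiment type."""
    library_id: str      # identifies the library of materials
    classname: str = ""  # identifies the derived experiment class

    def __post_init__(self) -> None:
        # Default the classname to the concrete class of this object.
        if not self.classname:
            self.classname = type(self).__name__


@register
@dataclass
class XRDExperiment(Experiment):
    """Hypothetical derived class for an X-ray diffraction experiment type."""
    wavelength_nm: float = 0.0
    scan_start_deg: float = 0.0
    scan_end_deg: float = 0.0


def represent(derived: Experiment) -> Tuple[Experiment, Experiment]:
    """Pair a derived experiment object with a base-class object that points back to it."""
    base = Experiment(library_id=derived.library_id, classname=type(derived).__name__)
    return base, derived
```

With this arrangement, every experiment, whatever its type, yields a base-class object with the same two properties, so a generic store can index all experiments uniformly while the `classname` value tells a loader which derived class holds the type-specific parameters.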
The representation can include data defining one or more data set objects or image objects. Data set objects can include sets of one or more values derived from the chemical experiment, each value being associated with one or more of the members of the library of materials. Image objects can include data representing a state of some or all of the members of the library of materials at a time during the chemical experiment. Data set objects and image objects can be associated with properties of an associated experiment object. The representation can include a self-describing representation of the chemical experiment, such as an XML string, and can also include a Java object, a COM IDL interface or a CORBA IDL interface describing the chemical experiment.
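A self-describing XML representation of an experiment and one associated data set object might look like the following sketch, built with Python's standard `xml.etree.ElementTree`. The tag and attribute names (`Experiment`, `Property`, `DataSet`, `Value`, `libraryId`, `position`) are assumptions for illustration only; the text does not specify an XML vocabulary.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List, Tuple


def experiment_to_xml(classname: str, library_id: str,
                      parameters: Dict[str, str],
                      data_set_property: str,
                      data_set: List[Tuple[int, float]]) -> str:
    """Serialize one experiment, plus one data set, as a self-describing XML string."""
    root = ET.Element("Experiment", {"classname": classname, "libraryId": library_id})
    # Experiment parameters become named <Property> entities.
    for name, value in parameters.items():
        ET.SubElement(root, "Property", {"name": name}).text = value
    # The data set is tied to an experiment property by name; each value
    # is tied to a library member by its position.
    data_set_elem = ET.SubElement(root, "DataSet", {"property": data_set_property})
    for position, value in data_set:
        ET.SubElement(data_set_elem, "Value", {"position": str(position)}).text = str(value)
    return ET.tostring(root, encoding="unicode")
```

Because every value carries its own tag and attributes, the string can be read by a person or by a parser without a separate schema, which is the property the text relies on when it calls the representation self-describing.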
The techniques can also include parsing the representation to map the data from the chemical experiment to tables in a relational database based on the properties of an associated experiment object. Parsing the representation can include identifying each of a plurality of XML entities in an XML stream, each entity having associated content; mapping each XML entity into a corresponding object property; and assigning the content associated with an XML entity to a database table based on the corresponding object property. Derived experiment class properties can include properties derived from parameters of an associated experiment type. Data set object values can be derived from or measured for parameters of the chemical experiment, and data set objects can be associated with experiment object properties derived from the corresponding experiment parameters.
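The three parsing steps (identify each XML entity in a stream, map it to an object property, assign its content to a table chosen by that property) can be sketched as below. The lookup tables `TAG_TO_PROPERTY`, `PROPERTY_TO_TABLE` and `KEY_ATTRIBUTE`, and the table names they produce, are hypothetical; a real implementation would derive them from the experiment class definitions.

```python
import io
import xml.etree.ElementTree as ET
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical mappings: XML entity tag -> object property -> relational table,
# plus the attribute that supplies each row's key.
TAG_TO_PROPERTY = {"Property": "parameters", "Value": "data_set"}
PROPERTY_TO_TABLE = {"parameters": "experiment_parameter", "data_set": "element_value"}
KEY_ATTRIBUTE = {"Property": "name", "Value": "position"}


def parse_to_tables(xml_text: str) -> Dict[str, List[Tuple[str, str]]]:
    """Stream through the XML and bucket each mapped entity's content by target table."""
    rows: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    stream = io.BytesIO(xml_text.encode("utf-8"))
    for _event, elem in ET.iterparse(stream, events=("end",)):
        prop = TAG_TO_PROPERTY.get(elem.tag)
        if prop is None:
            continue  # unmapped entities (e.g. <Experiment>, <DataSet>) are skipped here
        key = elem.get(KEY_ATTRIBUTE[elem.tag], "")
        rows[PROPERTY_TO_TABLE[prop]].append((key, elem.text or ""))
    return dict(rows)
```

Using `iterparse` handles each entity as its closing tag is read, so memory use stays bounded even for large data sets; the table each row lands in is decided solely by the object property the entity maps to.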
The techniques can also include storing the content in the assigned database table in the relational database, and the database can be searched to return a search result including data identifying a set of element objects satisfying search terms specified for one or more searchable fields. Search results can be stored as lists of element objects that satisfy the search terms of a query. Element object values can be displayed for one or more displayable fields. Object representations of chemical experiment data can be reconstructed from the database based on an object identifier specifying content to be retrieved from the database, from which content an object representation is generated based on a class name included in the specified content. The object representation can be mapped to an XML stream describing the content.
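Storage, search and reconstruction can be sketched with Python's built-in `sqlite3` module. The single `element` table, its columns, the `CLASSES` registry and the whitelisted searchable fields are assumptions for illustration; the point is that each row keeps its class name, so an object can be rebuilt from an object identifier alone.

```python
import sqlite3
from dataclasses import dataclass
from typing import List


@dataclass
class Element:
    library_id: str
    position: int
    value: float


# Hypothetical registry used to rebuild an object from the class name stored with its row.
CLASSES = {"Element": Element}
SEARCHABLE_FIELDS = {"library_id", "position", "value"}
OPERATORS = {"=", "<", ">", "<=", ">="}


def open_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE element (id INTEGER PRIMARY KEY, classname TEXT,"
               " library_id TEXT, position INTEGER, value REAL)")
    return db


def store(db: sqlite3.Connection, el: Element) -> int:
    """Persist an element together with its class name; return its object identifier."""
    cur = db.execute(
        "INSERT INTO element (classname, library_id, position, value) VALUES (?, ?, ?, ?)",
        (type(el).__name__, el.library_id, el.position, el.value))
    return cur.lastrowid


def search(db: sqlite3.Connection, field: str, op: str, operand) -> List[int]:
    """Return the identifiers of elements satisfying one search term, e.g. value > 10."""
    if field not in SEARCHABLE_FIELDS or op not in OPERATORS:
        raise ValueError(f"unsupported search term: {field} {op}")
    # field and op come from the whitelists above; only the operand is user data, so it is bound.
    cur = db.execute(f"SELECT id FROM element WHERE {field} {op} ? ORDER BY id", (operand,))
    return [row[0] for row in cur]


def reconstruct(db: sqlite3.Connection, object_id: int) -> Element:
    """Rebuild an object from its row, choosing the class by the stored class name."""
    row = db.execute("SELECT classname, library_id, position, value FROM element WHERE id = ?",
                     (object_id,)).fetchone()
    if row is None:
        raise KeyError(object_id)
    classname, library_id, position, value = row
    return CLASSES[classname](library_id=library_id, position=position, value=value)
```

The list returned by `search` is simply a list of object identifiers, so it can be saved as a result list and later expanded into full objects with `reconstruct`.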
In general, in another aspect, the invention provides a data model for describing data from a set of pre-defined types of chemical experiments capable of being performed on a library of materials. The data model includes an experiment base class, one or more derived experiment classes, and an element class. The experiment base class has a set of experiment base class properties including a classname property for identifying a derived experiment class and a library ID property for identifying a library of materials. The derived experiment classes are associated with respective experiment types and have a plurality of derived experiment class properties derived from the associated experiment type. The element class has a plurality of element class properties including a position property for identifying one or more members of a library of materials and a value property for storing a value derived from a chemical experiment for the members identified by the position property. A specific experiment in the set of pre-defined experiments is represented by a first experiment object instantiated from the derived experiment class associated with the type of the chemical experiment, and by a second experiment object instantiated from the experiment base class. The classname property of the second experiment object has a value identifying the derived experiment class associated with the experiment type of the chemical experiment, and the library ID property of the second experiment object has a value identifying the library of materials.
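Because the element class's position property can identify one or more members, a single element can carry one value for a whole region of a library. The sketch below assumes, hypothetically, a rectangular library addressed by (row, column); the `block` and `value_at` helpers are illustrative names, not part of the described data model.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional, Tuple

Member = Tuple[int, int]  # (row, column) of a member on a hypothetical rectangular library


@dataclass(frozen=True)
class Element:
    position: FrozenSet[Member]  # one or more library members this element describes
    value: float                 # value derived from the experiment for those members


def block(r0: int, c0: int, r1: int, c1: int) -> FrozenSet[Member]:
    """Return every member in the inclusive rectangle from (r0, c0) to (r1, c1)."""
    return frozenset((r, c) for r in range(r0, r1 + 1) for c in range(c0, c1 + 1))


def value_at(elements: Iterable[Element], member: Member) -> Optional[float]:
    """Return the value of the first element whose position covers the member, if any."""
    for el in elements:
        if member in el.position:
            return el.value
    return None
```

For example, `Element(block(1, 0, 2, 3), 4.0)` records one measured value for the eight members in rows 1 and 2, columns 0 through 3.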
Advantages that can be seen in implementations of the invention include one or more of the following. Decoupling of client processes from the database isolates the experimental process from the underlying data storage, which means that client processes may be extended without requiring immediate extension of the database schema, and/or that the database is protected from unintended and/or unauthorized alteration by client processes. The database can be extended and remain backward compatible with existing client processes. Data is persisted in a human-readable format, aiding in error diagnosis. Using a common schema to store data describing different experiment types means that objects representing experimental data can be recomposed across technical disciplines.
The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.