This invention relates to the crystallization of molecules, in particular, to a method and system for capturing a large number of crystallization trial observations and creating a relational database based on the trial observations.
Macromolecular x-ray crystallography is an essential aspect of modern drug discovery and molecular biology. Using x-ray crystallographic techniques, the three-dimensional structures of biological macromolecules, such as proteins, nucleic acids, and their various complexes, can be determined at practically atomic level resolution. The enormous value of three-dimensional information has led to a growing demand for innovative products in the area of protein crystallization, which is currently the major rate limiting step in x-ray structure determination.
One of the first and most important steps of the x-ray crystal structure determination of a target macromolecule is to grow large, well diffracting crystals of the macromolecule. As techniques for collecting and analyzing x-ray diffraction data have become more rapid and automated, crystal growth has become a rate limiting step in the structure determination process.
Vapor diffusion is the most widely used technique for crystallization in modern macromolecular x-ray crystallography. In this technique, a small volume of the macromolecule sample is mixed with an approximately equal volume of a crystallization solution. The resulting drop of liquid (containing macromolecule and dilute crystallization solution) is sealed in a chamber with a much larger reservoir volume of the crystallization solution. The drop is kept separate from the reservoir, either by hanging from a glass cover slip or by sitting on a tiny pedestal. Over time, the crystallization drop and the reservoir solutions equilibrate via vapor diffusion of the volatile species. Supersaturating concentrations of the macromolecule are achieved, resulting in crystallization in the drop when the appropriate reservoir solution is used.
The process of growing biological macromolecule crystals remains, however, a highly empirical process. Macromolecular crystallization is a hyperdimensional phenomenon, dependent on a host of experimental parameters including pH, temperature, and the concentration of salts, macromolecules, and the particular precipitating agent (of which there are hundreds). A sampling of this hyperspace, via thousands of crystallization trials, eventually leads to the precise conditions for crystal growth. Thus, the ability to rapidly and easily generate many crystallization trials is important in determining the right conditions for crystallization. Also, since so many multidimensional data points are generated in these crystallization trials, it is imperative that the experimenter be able to accurately record and analyze the data so that promising conditions are pursued, while no further time, resources, and effort are spent on negative conditions.
Recently, an international protein structure initiative has taken shape with the goal of determining the three-dimensional structures of all representative protein folds. This massive undertaking in structural biology, which may some day rival the human genome sequencing project in size and scope, is estimated to require a minimum of 100,000 x-ray structure determinations of newly discovered proteins for which no structural information is currently available or predicted. For perspective, the total number of reported novel crystal structures determined to date (spanning nearly 50 years of work) is only approximately 10,000.
Using existing methods for the crystallization of proteins (random screens of conditions), the protein structure initiative will require a minimum of approximately 100 million crystallization trials. In addition, the biological information gleaned from genomic research in the protein structure initiative is expected to create even more demand for structural information. Specifically, the biotechnology and pharmaceutical industries are estimated to require upwards of tenfold more protein crystallization experiments (one billion) as a result of research and structure-based drug design and the use of crystallized therapeutic proteins. This would require that each of the approximately 500 macromolecular crystallography labs worldwide be responsible for setting up approximately 2000 crystallization trials every working day of the year for five years. Currently, there is no known device available for collecting macromolecular crystallization data for analysis on this scale. Thus, there is a need for a device that permits the efficient capture and storage of large amounts of crystallization trial data.
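The per-laboratory workload quoted above can be checked with a short calculation. The sketch below uses the figures given in the text (one billion trials, 500 laboratories, five years); the number of working days per year is not stated in the text and is an assumption, and the function name is illustrative only.

```python
# Figures from the text.
TOTAL_TRIALS = 1_000_000_000   # one billion crystallization experiments
LABS = 500                     # macromolecular crystallography labs worldwide
YEARS = 5

def trials_per_lab_per_day(total, labs, years, working_days_per_year):
    """Average number of trials each lab must set up per working day."""
    return total / (labs * years * working_days_per_year)

# Assumption: between 200 and 250 working days per year.
rate_200 = trials_per_lab_per_day(TOTAL_TRIALS, LABS, YEARS, 200)
rate_250 = trials_per_lab_per_day(TOTAL_TRIALS, LABS, YEARS, 250)
print(f"{rate_250:.0f} to {rate_200:.0f} trials per lab per working day")
```

Under these assumptions the result falls between 1600 and 2000 trials per laboratory per working day, which is consistent with the approximate figure stated above.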
The present invention is directed to providing a system and method for capturing large amounts of crystallization trial data and storing the captured data in a relational database for subsequent analysis. The preferred implementation of the present invention includes software having a plurality of database managers. Preferably, the database managers include a trial manager, a solution manager, a matrix manager, a compound buffer manager, a chemical manager, an apparatus manager, a subunit manager, a macromolecule manager, a macromolecule formulation manager, a complex macromolecule formulation manager, a manufacturer manager, a collaborator manager, a project manager, and a user manager. Preferably, the software also includes query tools for mining the database and a database object manager for managing database objects. To assist a user of the database, preferably, the software also includes a help tool for managing help documentation. In essence, the software provides a crystallographer with a graphical user interface (GUI) to facilitate the entry of relevant data into a relational database. In addition to the software, the presently preferred implementation of the present invention includes a trial observation system, i.e., a microscope for viewing the results of crystallization trials, and, preferably, a positioning mechanism for positioning the results of multiple crystallization trials within the observation area of the trial observation system.
In accordance with other aspects of this invention, the software resides in a computer having a central processing unit (CPU) and a memory.
In accordance with further aspects of the invention, the trial manager includes a sequence of executable steps that causes the computer to launch "builders" to guide the user through the initial setup of the trials in order to arrive at the ultimate goal of collecting crystallization trial data for storage in a relational database. The builders include a crystallization trial builder that, in its normal mode, captures all aspects of a crystallization trial in which all crystallization drops receive the same composition of macromolecule formulations and other solutions together with a specified volume of crystallant from each of the crystallants that comprise a specified crystallization matrix. A GUI builder takes the user through all necessary steps required to set up a crystallization trial.
In accordance with further aspects of this invention, the information captured by the crystallization trial builder includes, but is not limited to: a unique trial ID, a project name, gas purge information (specifies the use of gases to bathe the crystallization setup), temperature (the temperature at which the crystallization trial will be conducted), temperature units, reservoir volume, reservoir volume units, preparation date, an oil overlay specification (oil can be chosen and the volume specified for the placement of oil on top of the crystallization drop or on top of the reservoir solution), oil overlay units, the order of addition and volume of macromolecule formulations and other solutions that are added to the crystallization drop including the volume of the crystallant that is added to the crystallization drop, barcode IDs for each crystallization apparatus that is used in the trial, the name of the apparatus, the name of the collaborator, and the name of the user (defaults to the user that is logged on).
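By way of illustration only, the information listed above could be held in a record such as the following. This is a minimal sketch, not the schema of the described software: the class names, field names, and default units are assumptions chosen to mirror the items enumerated in the text.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class DropAddition:
    """One solution added to the crystallization drop (hypothetical type)."""
    order: int            # position in the order of addition
    solution_name: str    # macromolecule formulation, crystallant, or other solution
    volume: float
    volume_units: str = "uL"   # assumed default unit

@dataclass
class CrystallizationTrial:
    """Fields mirror the items the trial builder is said to capture."""
    trial_id: str
    project_name: str
    temperature: float
    temperature_units: str
    reservoir_volume: float
    reservoir_volume_units: str
    preparation_date: date
    apparatus_name: str
    user_name: str
    collaborator_name: str = ""
    gas_purge: str = ""
    oil_overlay: str = ""
    oil_overlay_volume: float = 0.0
    oil_overlay_units: str = "uL"
    drop_additions: list = field(default_factory=list)
    apparatus_barcodes: list = field(default_factory=list)

    def drop_volume(self) -> float:
        """Total volume dispensed into each drop."""
        return sum(a.volume for a in self.drop_additions)

trial = CrystallizationTrial(
    trial_id="T0001", project_name="example project",
    temperature=4.0, temperature_units="C",
    reservoir_volume=500.0, reservoir_volume_units="uL",
    preparation_date=date(2000, 1, 15),
    apparatus_name="24-well plate", user_name="jdoe",
    drop_additions=[
        DropAddition(1, "macromolecule formulation", 1.0),
        DropAddition(2, "crystallant", 1.0),
    ],
    apparatus_barcodes=["000123"],
)
print(trial.drop_volume())
```

Keeping the drop additions as an ordered list preserves the order of addition, which the text identifies as part of the captured record.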
As noted above, preferably, the crystallization trial builder captures all aspects of a complex or combinatorial crystallization trial where the reservoir chambers of a crystallization apparatus can be filled with a specified volume of any crystallant from any existing matrix. The crystallization drops can individually receive specified compositions of macromolecule formulations and other solutions together with a specified volume of crystallant and in a specified order of addition. The crystallization trial builder includes a GUI that takes the user through all necessary steps required to set up a complex or combinatorial crystallization trial.
In accordance with still other aspects of this invention, the solutions database manager captures the composition of all the solutions that are not part of a crystallization matrix. Preferably, solutions can be categorized as either: buffer, heavy atom, additive, formulations, or stock solutions. Solutions are given a name and can contain a descriptive comment. The user can build any solution from the chemicals in a database. To do this, a solvent is defined and then chemicals are selected and final concentrations are specified. Individual chemical components of solutions include a tag that indicates the user's intended use of each chemical in the solution. For each solution created, preferably, the user can specify the solution name, the pH of solution, the pH determination method, the viscosity, the vapor pressure osmolality, the osmolality units, the conductivity, and the conductivity units. Stock solutions have the special feature that they can be created with only one compound buffer or chemical. This rule is enforced by a GUI that forms part of the solutions database manager. The utility of stock solutions is that they provide a source of chemicals and solutions from which crystallization solutions can be prepared.
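The single-component rule for stock solutions can be illustrated as a validation step. The following is a sketch under stated assumptions: the text says the rule is enforced by the GUI, whereas here it is shown as a check in a hypothetical data model, and all names are illustrative.

```python
from dataclasses import dataclass, field

# Categories named in the text.
CATEGORIES = {"buffer", "heavy atom", "additive", "formulation", "stock"}

@dataclass
class Component:
    chemical: str
    concentration: float
    units: str
    intended_use: str   # tag recording the user's intended use of the chemical

@dataclass
class Solution:
    name: str
    category: str
    solvent: str
    components: list = field(default_factory=list)

    def __post_init__(self):
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown solution category: {self.category}")

    def add(self, component: Component) -> None:
        # A stock solution may hold only one compound buffer or chemical.
        if self.category == "stock" and self.components:
            raise ValueError("a stock solution may contain only one component")
        self.components.append(component)

stock = Solution(name="4 M NaCl", category="stock", solvent="water")
stock.add(Component("sodium chloride", 4.0, "M", "salt"))
try:
    stock.add(Component("glycerol", 10.0, "% v/v", "additive"))
    second_component_rejected = False
except ValueError:
    second_component_rejected = True
print(second_component_rejected)
```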
In accordance with still further aspects of this invention, the matrix manager captures information on the crystallization conditions present in any crystallization screen or crystal growth matrix (sets of solutions used to promote protein crystallization) that the user may build. Preferably, a database containing commercially available crystal growth matrices is preloaded into the memory of the computer.
Matrices that are not commercially available are created using a new matrix builder GUI, included in the matrix manager, that comprises a series of window utilities that enable a user to specify the total number of crystallization solutions in a matrix, and how they are arranged in a grid with a specified number of columns. Preferably, the user can specify the name of the matrix, the name of a manufacturer if the matrix is commercial, or the name of the user responsible for preparing the matrix if the matrix is noncommercial. Preferably, the individual crystallization solutions that comprise a matrix are created using several powerful GUI tools that enable a user to do the following: drag and drop an existing crystallization solution from one matrix into a specific position within a matrix that is being newly created, copy an existing crystallization solution from one matrix into any number of specific positions within a matrix that is being newly created, copy a diluted (with specified percentage dilution factor) crystallization solution from one matrix into any number of specific positions within a matrix that is being newly created, add any chemical at a specified concentration to any crystallization solution that is being created, and specify the physical properties of the crystallization solution including the vapor pressure, osmolality, pH (estimated or measured), conductivity, the conductivity and osmolality units, and the viscosity. If a matrix has one or more chemicals or buffers that have been systematically varied in concentration within the matrix grid, preferably, the user is allowed to specify which component has been varied along either the X or Y axis of the matrix grid.
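Two of the matrix-building operations described above, placing solutions in a grid with a specified number of columns and copying a diluted solution into several positions, can be sketched as follows. The function names are hypothetical, and the interpretation of the percentage dilution factor (the percentage of the original concentration that is retained) is an assumption, since the text does not define it.

```python
def grid_position(index, columns):
    """Map a 0-based solution index to a 0-based (row, column) grid position."""
    if columns <= 0:
        raise ValueError("columns must be positive")
    return divmod(index, columns)

def dilute(solution, percent):
    """Return a copy of a solution (chemical -> concentration) at reduced strength.

    Assumption: `percent` is the percentage of the original strength retained.
    """
    if not 0 < percent <= 100:
        raise ValueError("percent must be in (0, 100]")
    return {chemical: conc * percent / 100 for chemical, conc in solution.items()}

def copy_into(matrix, solution, positions):
    """Place independent copies of a solution at several grid positions."""
    for position in positions:
        matrix[position] = dict(solution)

# A 24-solution matrix laid out in 6 columns.
source = {"ammonium sulfate": 2.0, "Tris pH 8.5": 0.1}
new_matrix = {}
copy_into(new_matrix, dilute(source, 50), [grid_position(i, 6) for i in (0, 7)])
print(sorted(new_matrix))
```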
In accordance with yet other aspects of this invention, the compound buffer manager captures and maintains a list of compound buffers and their final pHs. The compound buffer manager includes a GUI that enables a user to select a buffering agent and combine it with the pH conjugate. Preferably, only one buffering agent and pH conjugate is allowed for any one compound buffer. Once the combination of buffering agent and pH conjugate has been made, the user can specify the final pH.
In accordance with still further aspects of this invention, the chemicals manager captures and maintains information used in macromolecular crystallography, preferably including the following properties: chemical name, abbreviated chemical name, chemical formula, molecular mass, physical state at room temperature, density, manufacturer of the chemical, catalog number, and CAS number. The chemicals manager allows a user to categorize chemicals according to their intended use in a crystallization experiment. Preferably, the categories include: buffering agents, pH conjugates, precipitants (or crystallants), salts, CSIs (co-factors, substrates, and inhibitors), chelators, detergents, reducing agents, cryocoolants, nucleation suppressants, organics, heavy atom compounds, metals, gases, solvents or other. Once a chemical has been put into a database, it can be associated with one or more of the different chemical classes. Preferably, the database comes preloaded with over 500 standard chemicals.
In accordance with still further aspects of this invention, the apparatus manager captures and maintains information relating to crystallization apparatuses. Preferably, the captured information includes the apparatus name, the number of columns, the number of rows, the manufacturer and the drop and reservoir dimensions.
In accordance with yet further aspects of this invention, the subunits manager includes a GUI that captures the name, source (e.g., species), molecular mass, pI (Isoelectric point), macromolecule class (several classes are available and are expandable, for example, protein, DNA, RNA), and a comment that is associated with individual macromolecular subunits (single polymeric chains of biological building blocks).
In accordance with still further aspects of this invention, the macromolecule manager captures and maintains descriptions of complex assemblages of macromolecule subunits including, but not limited to, stoichiometry of subunits (the number of each of the subunits present in the complex), source (e.g., species), and a comment that is associated with a particular macromolecular assemblage. Preferably, the macromolecule manager includes a GUI that enables a user to build macromolecules from subunits.
In accordance with yet further aspects of this invention, the macromolecule formulation manager captures and maintains descriptions of macromolecule formulations (macromolecules in a solution). Preferably, the macromolecule formulation manager includes a GUI that enables a user to place any number of macromolecules at a specified concentration into any single solution. The resulting macromolecule formulation is given a name, and additional information is specified including the preparation date, storage temperature, storage temperature units, the preparator name (derived from the user list), and any needed comment and objects, such as image files, or applications, such as spreadsheets or word processing documents and the like.
In accordance with yet still other aspects of this invention, the complex macromolecule formulation manager captures and maintains descriptions of complex macromolecule formulations (mixtures of macromolecule formulations and other solutions). Preferably, the complex macromolecule formulation manager includes a GUI that enables users to mix any number of macromolecule formulations or other solutions at specified volumes. The resulting concentrations of all components are recalculated and the new complex macromolecule formulation is given a name, and additional information is specified including the preparation date, storage temperature, storage temperature units, and preparator name (derived from the user list) and any needed comment.
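The recalculation of concentrations on mixing can be sketched as a volume-weighted average. The text does not give the formula used by the software; the version below assumes ideal mixing with additive volumes, and the function name and data layout are illustrative.

```python
def mix(parts):
    """Combine solutions and recalculate component concentrations.

    `parts` is a list of (volume, {component: concentration}) pairs.
    Assumption: volumes are additive, so each final concentration is
    sum(concentration * volume) / total volume.
    Returns (total_volume, {component: final_concentration}).
    """
    total_volume = sum(volume for volume, _ in parts)
    if total_volume <= 0:
        raise ValueError("total volume must be positive")
    amounts = {}
    for volume, concentrations in parts:
        for name, concentration in concentrations.items():
            amounts[name] = amounts.get(name, 0.0) + concentration * volume
    return total_volume, {name: a / total_volume for name, a in amounts.items()}

# 10 uL of a 20 mg/mL macromolecule formulation mixed with 10 uL of 1.0 M NaCl.
volume, final = mix([
    (10.0, {"macromolecule": 20.0}),
    (10.0, {"NaCl": 1.0}),
])
print(volume, final)
```

All volumes must share one unit, and each component must use one concentration unit across the inputs; the sketch does not convert units.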
In accordance with yet other aspects of this invention, the manufacturers manager captures and maintains information relating to manufacturers of chemicals, reagents and apparatuses. Preferably, the information is captured by a GUI that includes space for the manufacturer name, phone, street, city, state, zip, country, e-mail, fax, department and web site address. Preferably, clicking on the web address of the manufacturer launches an internet browser that serves up the relevant web pages.
In accordance with yet other aspects of this invention, the collaborators manager captures and maintains information relating to collaborators. Preferably, the information is captured by a GUI that includes space for the collaborator name, phone, street, city, state, zip, country, e-mail, fax, department, and web site address. Preferably, clicking on the web address of the collaborator launches an internet browser that serves up the relevant web pages.
In accordance with yet other aspects of this invention, the projects manager captures and maintains information relating to projects. Preferably, the information is captured by a GUI that includes space for the project name and a comment used to describe the nature of the project.
In accordance with yet other aspects of this invention, the users manager captures and maintains information related to users. Preferably, the information is captured by a GUI that includes space for the user name, password and enrolled status (restricted, user, power user, or administrator). The use of different roles enables the software to limit access to the database and/or aspects of the software through defined levels of security.
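The four enrolled statuses can be modeled as ordered access levels. The sketch below is an assumption in several respects: the text names the roles but does not state that they are strictly ordered, and the actions and the minimum role assigned to each are hypothetical examples rather than the software's actual policy.

```python
from enum import IntEnum

class Role(IntEnum):
    """Enrolled statuses named in the text, assumed to be ordered by privilege."""
    RESTRICTED = 0
    USER = 1
    POWER_USER = 2
    ADMINISTRATOR = 3

# Hypothetical mapping of actions to the minimum role required.
REQUIRED_ROLE = {
    "view_trials": Role.RESTRICTED,
    "record_observation": Role.USER,
    "edit_matrix": Role.POWER_USER,
    "manage_database_objects": Role.ADMINISTRATOR,
}

def is_allowed(role, action):
    """Return True if `role` meets the minimum level required for `action`."""
    return role >= REQUIRED_ROLE[action]

print(is_allowed(Role.USER, "record_observation"))
print(is_allowed(Role.USER, "manage_database_objects"))
```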
In accordance with further aspects of this invention, the database object manager has features that enable rapid analysis of crystallization trial data. Preferably, crystallization trial data is rapidly retrieved by scanning of barcode labels on crystallization apparatuses or in notebooks. Preferably, the database object manager includes a GUI that allows a user to open multiple trial observation sessions so that they can be viewed side by side. This enables the rapid identification of important factors in crystallization. Individual wells can display pictographical representations of results.
Preferably, the database object manager GUI also provides a crystal query that creates a table describing all of the parameters associated with trial observation data wherein a crystal was observed. Preferably, the table includes information on well position, observation session, trial ID, crystal morphology, crystallant name, crystallant composition, and crystallization drop composition. Also preferably, the table is designed so that it can be alphabetically or numerically sorted by clicking on the heading of any one of these categories. Also preferably, a user can export the entire crystal query table, or portions thereof, to HTML for printing or sharing of information via e-mail or the internet.
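A crystal query of this kind can be sketched against a relational table. The example uses Python's built-in sqlite3 module as a stand-in for the database server; the table layout, column names, sample rows, and function names are all hypothetical and are not taken from the described software.

```python
import sqlite3
from html import escape

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE observation ("
    "trial_id TEXT, session INTEGER, well TEXT, result TEXT, crystallant TEXT)"
)
conn.executemany(
    "INSERT INTO observation VALUES (?, ?, ?, ?, ?)",
    [  # illustrative rows only
        ("T001", 1, "A1", "clear", "Matrix-1 #1"),
        ("T001", 2, "A1", "small needle", "Matrix-1 #1"),
        ("T001", 2, "B3", "precipitate", "Matrix-1 #15"),
        ("T002", 1, "C2", "big plate", "Matrix-2 #26"),
    ],
)

NON_CRYSTAL = ("clear", "precipitate", "spherulite", "skin", "phase separation")
COLUMNS = ("well", "session", "trial_id", "result", "crystallant")

def crystal_query(conn, sort_by="trial_id"):
    """Return observations in which a crystal was seen, sorted by one column."""
    if sort_by not in COLUMNS:   # whitelist, since ORDER BY cannot be parameterized
        raise ValueError(f"cannot sort by {sort_by!r}")
    placeholders = ",".join("?" for _ in NON_CRYSTAL)
    sql = (
        f"SELECT {', '.join(COLUMNS)} FROM observation "
        f"WHERE result NOT IN ({placeholders}) ORDER BY {sort_by}"
    )
    return conn.execute(sql, NON_CRYSTAL).fetchall()

def to_html(headings, rows):
    """Render the query table as a minimal HTML table for printing or sharing."""
    head = "".join(f"<th>{escape(h)}</th>" for h in headings)
    body = "".join(
        "<tr>" + "".join(f"<td>{escape(str(v))}</td>" for v in row) + "</tr>"
        for row in rows
    )
    return f"<table><tr>{head}</tr>{body}</table>"

rows = crystal_query(conn, sort_by="result")
report = to_html(COLUMNS, rows)
print(report)
```

Sorting by a chosen column corresponds to clicking a table heading, and the HTML string corresponds to the export described in the text.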
In accordance with yet other further aspects of this invention, the software includes a drop composition calculator that allows the user to quickly calculate the concentration of components in a crystallization drop using variable "vapor diffusion equilibration" settings.
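One way such a calculator might work is sketched below. The text does not describe the underlying model, so everything here is an assumption: the drop is taken to lose only water, full equilibration is taken to shrink the drop to the volume of crystallant originally added (so that the crystallant returns to its reservoir concentration), and the equilibration setting is a fraction between 0 and 1 that interpolates linearly between the initial and final drop volumes.

```python
def drop_composition(additions, crystallant_volume, equilibration=0.0):
    """Estimate component concentrations in a crystallization drop.

    additions: list of (volume, {component: concentration}) pairs dispensed
        into the drop, including the crystallant itself.
    crystallant_volume: volume of crystallant added to the drop.
    equilibration: 0.0 for a freshly mixed drop, 1.0 for the assumed
        fully equilibrated drop (hypothetical setting).
    """
    if not 0.0 <= equilibration <= 1.0:
        raise ValueError("equilibration must be between 0 and 1")
    initial_volume = sum(volume for volume, _ in additions)
    if not 0.0 < crystallant_volume <= initial_volume:
        raise ValueError("crystallant volume must be part of the drop volume")
    # Assumed model: only water leaves the drop, shrinking it toward the
    # volume of crystallant that was added.
    volume = initial_volume - equilibration * (initial_volume - crystallant_volume)
    amounts = {}
    for part_volume, concentrations in additions:
        for name, concentration in concentrations.items():
            amounts[name] = amounts.get(name, 0.0) + concentration * part_volume
    return {name: amount / volume for name, amount in amounts.items()}

# 1 uL of a 10 mg/mL macromolecule formulation plus 1 uL of 1.0 M NaCl crystallant.
additions = [(1.0, {"macromolecule": 10.0}), (1.0, {"NaCl": 1.0})]
fresh = drop_composition(additions, crystallant_volume=1.0, equilibration=0.0)
equilibrated = drop_composition(additions, crystallant_volume=1.0, equilibration=1.0)
print(fresh, equilibrated)
```

In this simplified model an equal-volume drop starts at half the stock concentrations and returns to the stock concentrations at full equilibration. Real drops also exchange other volatile species, which the sketch ignores.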
In accordance with other aspects of this invention, preferably, the software includes extensive results reporting features suitable for creating hypertext mark-up language (HTML) reports of crystallization trial results including all relevant information and any digital images that may have been captured. Such reports can be printed by an Internet browser or shared via e-mail. Preferably, the reports have the following features: a print well feature that prints only information that pertains to a single crystallization setup, a print plate feature that prints information that pertains to all crystallization setups in a crystallization plate, and a print trial feature that prints all crystallization setups and plates in a trial. Each of the three print options has two modifiers: a single chosen observation session (the session with current focus) or all observation sessions (the entire history).
In addition to the features that enable the rapid analysis of crystallization trial data, preferably, the software includes special support features, such as giving administrator users access to a powerful database object manager. This utility allows users to update, delete (as long as database integrity is not violated), back up, and output (as HTML, text, or SQL script) database objects. Also, preferably, the software allows multiple users to be networked to others via a local area network (LAN) that allows simultaneous access. Preferably, this feature is enabled by the use of Windows NT and Microsoft SQL Server 7.0. Such enablement allows data security and backup to be managed at both the level of the Windows NT login account and the level of the user login. Enablement using Microsoft SQL Server 7.0 also provides various levels of security. If desired, database backup can be achieved by direct copying of a mirrored database to additional storage devices. In addition, this enablement allows the entire database to be output to a SQL text script that can be used to recreate the entire database from scratch, if necessary. In addition, this enablement allows individual database tables to be output to text, SQL script, or HTML files for printing and viewing in a browser. The text files can also be imported into spreadsheet programs or other databases. The SQL scripts, when run by the Enterprise Manager of SQL Server 7.0, recreate a specific composite object within the database. This feature allows multiple users to share composite objects such as crystal growth matrices in the absence of a LAN environment.
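The idea of writing a database out as a SQL text script and recreating it elsewhere can be illustrated with Python's built-in sqlite3 module. This is only an analogy: the described embodiment uses Microsoft SQL Server 7.0, whose scripting tools are not shown here, and the table and helper names below are hypothetical.

```python
import sqlite3

def dump_to_script(conn):
    """Return a SQL text script that recreates the database contents."""
    return "\n".join(conn.iterdump())

def restore_from_script(script):
    """Build a new in-memory database by running a SQL script."""
    restored = sqlite3.connect(":memory:")
    restored.executescript(script)
    return restored

# A small matrix table standing in for a shared composite object.
source = sqlite3.connect(":memory:")
source.execute("CREATE TABLE matrix_solution (position INTEGER, crystallant TEXT)")
source.executemany(
    "INSERT INTO matrix_solution VALUES (?, ?)",
    [(1, "2.0 M ammonium sulfate"), (2, "30% PEG 4000")],
)
source.commit()

script = dump_to_script(source)
copy = restore_from_script(script)
print(copy.execute("SELECT COUNT(*) FROM matrix_solution").fetchone()[0])
```

Because the script is plain text, it can be passed between users by file or e-mail, which corresponds to sharing composite objects without a LAN.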
As noted above, in a Windows NT/Microsoft SQL Server 7.0 embodiment of the invention, user security can be provided at the level of the Windows NT login, and at the level of user passwords within the application. Multiple user profiles ensure various levels of access to the database. While the platform for an embodiment is the Microsoft Windows Operating System, it should be readily understood that other operating system environments may be used.
Also, preferably, the software provides extensive help documentation to a user through the GUI that answers most common questions. If desired, help may be in HTML format and accessed via a call to an external web site.
Other preferred features of the invention are directed to facilitate the collection of crystallization trial data. Barcode stickers may be used to label the crystallization apparatuses. This enables crystallization trial information to be quickly recalled with a simple scan of a barcode that is located on a crystallization apparatus, or which has been placed in a notebook. This allows the user to unambiguously keep track of crystallization apparatuses and crystallization trials. With a simple trace of the barcode, a user is able to immediately access all information on a given trial or immediately start to collect more trial data. Preferably, the software system employs speech recognition and audio feedback. This allows a user to easily capture trial results without having to look away from the trial observation system, i.e., the microscope. Simple commands and an audio feedback allow a user to keep track of where he or she is in the data collection session without having to continuously crosscheck the crystallization trial well that he or she is observing. Preferably, the voice commands are user definable, resulting in very high accuracy of speech recognition. In addition, the user definable speech commands enable languages other than English to be handled. For example, the Russian word for "clear" is "glasnost." Typing "glasnost" into the command line for "clear" directs the speech recognition utility to interpret "glasnost" as being equivalent to the word "clear." Hence, any language that has words which can be phonetically spelled in English can be supported. Preferably, the "audio" feedback is fed in stereo through computer speakers or a headset. This allows for directionality of sound, which provides a user with a sense of direction as the user examines a crystallization trial well under a microscope.
For example, a "next well" command can result in a swishing sound that moves from left to right, which is a standard direction of movement in crystallization examination apparatuses. Preferably, after a result has been input, it is repeated to the user so that the user can be sure that the result was accurately captured. In addition to speech recognition, the GUI allows users to use the pointer to click on a set of defined crystallization results and thereby input the results into the database.
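The user-definable command vocabulary described above amounts to a mapping from spoken phrases to a fixed set of commands. The sketch below shows only that mapping; it does not perform speech recognition, and the class name, method names, and the particular set of built-in commands are assumptions for illustration.

```python
# Hypothetical set of commands understood by the data collection session.
COMMANDS = ("clear", "precipitate", "next well", "take a picture")

class CommandVocabulary:
    """Maps user-defined spoken phrases to the session's built-in commands."""

    def __init__(self):
        # By default each command is triggered by its own English name.
        self._phrases = {command: command for command in COMMANDS}

    def define(self, command, phrase):
        """Register an additional phrase (phonetic spelling) for a command."""
        if command not in COMMANDS:
            raise KeyError(f"unknown command: {command}")
        self._phrases[phrase.strip().lower()] = command

    def interpret(self, recognized_text):
        """Return the command for recognized text, or None if unrecognized."""
        return self._phrases.get(recognized_text.strip().lower())

vocabulary = CommandVocabulary()
# The example from the text: "glasnost" typed into the command line for "clear".
vocabulary.define("clear", "glasnost")
print(vocabulary.interpret("Glasnost"))
print(vocabulary.interpret("next well"))
```

Returning None for unrecognized text leaves it to the caller to decide how to prompt the user, for example through the audio feedback described above.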
Preferably, trial results are presented pictographically, which allows a user to "see" rather than read the results. Pictographs are highly intuitive and can capture greater than 95 percent of all standard crystallization trial results. Preferably, the results are presented pictographically on a computer monitor (laptop or PC) in a grid array. Preferably, the monitor is positioned comfortably next to the microscope, an arrangement which takes advantage of a user's peripheral vision and enables the user to maintain a sense of positioning during a data collection session. This provides an additional crosscheck to ensure that the correct data is being collected. Preferably, audio feedback is mirrored by written text that appears on the computer monitor, which provides further cross-checking to ensure accurate results reporting. The pictographical results include: clear, precipitate, spherulite, skin, phase separation, microcrystal, big block, medium block, small block, tiny block, big pyramid, medium pyramid, small pyramid, tiny pyramid, big hexagon, medium hexagon, small hexagon, tiny hexagon, big needle, medium needle, small needle, tiny needle, big plate, medium plate, small plate, tiny plate, big urchin, medium urchin, small urchin, tiny urchin, big rod, medium rod, small rod, tiny rod, big leaf, medium leaf, small leaf and tiny leaf. Preferably, red bars of five different lengths are used to indicate: the number of crystals or microcrystals, the density of precipitate, the number of spherulites, the number of phase separations and the thickness of skin. One crystal descriptor called "twin" that can apply to all crystals with clear morphology is also displayed. Preferably, data types are used to map the conditions which caused crystallization to occur. Other features allow a user to map by chemical type or subunit or macromolecule class.
These data fields provide powerful data mining and query capabilities to the relational database of the invention.
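The result vocabulary listed above has a regular structure: a handful of non-crystalline outcomes, microcrystal, and eight crystal shapes in four sizes. A minimal sketch of how it might be enumerated and validated follows. It treats spherulite and skin as separate results, as the description of the bars suggests; the class name, the 1 to 5 bar scale, and the reading of "clear morphology" as the sized shapes are assumptions.

```python
from dataclasses import dataclass
from itertools import product

SIZES = ("big", "medium", "small", "tiny")
SHAPES = ("block", "pyramid", "hexagon", "needle", "plate", "urchin", "rod", "leaf")
NON_CRYSTAL = ("clear", "precipitate", "spherulite", "skin", "phase separation")

# Sized crystal shapes, in the order given in the text (all sizes of one shape first).
SIZED_CRYSTALS = tuple(f"{size} {shape}" for shape, size in product(SHAPES, SIZES))
RESULTS = NON_CRYSTAL + ("microcrystal",) + SIZED_CRYSTALS

@dataclass(frozen=True)
class Observation:
    """One pictographic result for one well (hypothetical record)."""
    well: str
    result: str
    bar_level: int = 1   # assumed scale: 1 (shortest bar) to 5 (longest bar)
    twin: bool = False   # descriptor for crystals with clear morphology

    def __post_init__(self):
        if self.result not in RESULTS:
            raise ValueError(f"unknown result: {self.result}")
        if not 1 <= self.bar_level <= 5:
            raise ValueError("bar_level must be between 1 and 5")
        # Assumption: "clear morphology" means one of the sized shapes.
        if self.twin and self.result not in SIZED_CRYSTALS:
            raise ValueError("twin applies only to crystals with clear morphology")

observation = Observation(well="A1", result="small needle", bar_level=3, twin=True)
print(len(RESULTS), observation)
```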
Preferably, the GUI of the trial data collection component of the present invention includes a video window which displays a digital video stream of data generated by a digital color camera that is mounted to the microscope via a standard C mount. Also, preferably, a simple voice command can direct the camera to "take a picture," which provides a high resolution color digital record of the specific observation that is made. Preferably, the digital cameras supported by the software include all cameras that support Video for Windows drivers and can be supplied with a PCMCIA card that can be used in a laptop and a PCI card that can be used in a PC.
If desired, the present invention can be used as a front end component of a bigger software application that controls liquid handling robots designed to set up new crystallization trials. Specifically, an embodiment of the present invention can be used as an interface to a liquid handling robot that sets up crystallization trials using plasticware and chemical reagents. Crystallization trials may be monitored by an embodiment of the present invention so as to allow a user to design a series of optimized crystallization solutions. The optimization solutions may be prepared by a second liquid handling robot also under the control of an embodiment of the present invention, which could direct the first liquid handling robot to physically create the optimized crystallization trials solutions. In this way, all aspects of crystallization trials except those that require the trained eye of a user would be mechanized.
The benefits achieved by the present invention are numerous. Prior to the present invention, capturing crystallization trial data required that users spend countless hours behind the microscope, continuously looking away from the microscope to write their observations on sheets of paper in a notebook. Crystallization trials can last for periods of a year or more. The end result has been reams of notebook pages containing the results of thousands of crystallization trials. Since the information was written in notebooks, there was no efficient means for analyzing the data. The present invention provides a customized relational database that allows crystallization trial data to be captured in real time, i.e., as crystallization setups are viewed through a microscope. Allowing command and control of the trial observation session to be driven by the user's voice, via a speech recognition interface, and facilitated by audio feedback ensures a high level of accuracy. Accuracy is high because researchers do not have to look up from the microscope to enter crystallization trial data into the database.
Displaying the results of a crystallization trial pictographically represents a major breakthrough in how crystallization trial observations are presented for analysis. The relational database created by an embodiment of the present invention allows a powerful query language to be used to create specific query tools for analyzing crystallization trial data.
Preferably, all of the graphical user interface is designed to be highly intuitive by mimicking the activities that go on in a crystallization laboratory. In essence, the present invention provides an "electronic wet lab" that can be used to capture all aspects of the preparation work that goes into setting up modern crystallization trials.