I. Fluorescence Activated Cell Sorting (FACS)
Flow cytometry is a technique for obtaining information about cells and cellular processes by allowing a thin stream of a single-cell suspension to “flow” through one or more laser beams and measuring the resulting light scatter and emitted fluorescence. Since there are many useful ways of rendering cells fluorescent, it is a widely applicable technique and is very important in basic and clinical science, especially immunology. Its importance is increased by the fact that it is also possible to sort fluorescently labeled live cells for functional studies with an instrument called the Fluorescence Activated Cell Sorter (FACS).
Flow cytometry is computerized because without computers the data analysis would be infeasible. As flow cytometry has matured, the importance of combining flow data with data from other sources has become clear, as has the need for multi-site collaborations, particularly for clinical research. This led to our interest in developing methods for naming or identifying flow cytometry samples, reagents, and instruments (among other things) and in maintaining a shared repository of information about these samples, reagents, and instruments.
Flow cytometry was revolutionized in the late 1970s with the introduction of monoclonal antibodies that could be coupled to a fluorochrome and used as FACS reagents. However, nomenclature for these reagents has been a hodgepodge, in spite of the fact that monoclonals are useful precisely because they can be uniquely and accurately named, i.e., the antibody produced by a clone is always the same, whereas naturally produced sera are highly variable. Our work in capturing the experimental semantics of FACS experiments made it clear that we needed at least a local nomenclature and underscored the value of a global nomenclature for FACS data and monoclonal antibodies, which are useful in many fields besides flow cytometry.
II. DNA Arrays/Microarrays
During the past decade, the development of array-based hybridization technology has received great attention. This high-throughput method, in which hundreds to thousands of polynucleotide probes immobilized on a solid surface are hybridized to target nucleic acids to gain sequence and function information, has brought economic incentives to many applications. See, e.g., McKenzie et al., Eur. J. of Hum. Genet. 6:417-429 (1998), Green et al., Curr. Opin. in Chem. Biol. 2:404-410 (1998), and Gerhold et al., TIBS 24:168-173 (1999).
III. Gels
Gel electrophoresis is a standard technique used in biology. It is designed to allow a sample to be pulled through a semisolid medium, such as agarose, by an electric field. This technique allows small molecules and macromolecules to be separated by either their size or their charge.
IV. Prior Art
Although there is a wide variety of tools that purport to help scientists deal with the complex data collected in today's laboratories, virtually all of these so-called Laboratory Information Management Systems (LIMS) or Electronic Laboratory Notebook systems (ELNs) approach data collection and management from the perspective of final data output and interpretation. None of these systems addresses the basic needs of the bench scientist, who lacks even minimal tools for automating the collection and storage of data annotated with sufficient information to enable its analysis and interpretation as a study proceeds.
The absence of automated support for this basic laboratory function, particularly when data is collected with today's complex data-intensive instrumentation, constitutes a significant block to creative and cost-effective research. Except in very rare instances, the study and experiment descriptions that scientists use to interpret the digitized data these instruments generate are stored in paper-bound notebooks or unstructured computer files whose connection to the data must be manually established and maintained. The volatility of these connections, aggravated by turnover in laboratory personnel, makes it necessary to complete the interpretation of digitized data as rapidly as possible and seriously shortens the useful lifetime of data that could otherwise be mined repeatedly.
In addition, because paper notebook or unstructured computer information is difficult to make available to other investigators, particularly at different sites or across time, laboratories that would like to make their primary data or their specific findings available to collaborators or other interested parties are unable to do so. Thus, although computer use now facilitates many aspects of research, and although the Internet now makes data sharing and cooperative research possible, researchers are prevented from taking full advantage of these tools by the lack of appropriately tailored computer support for integrating and accessing their work.
Finally, because the minimal computerized support for research that currently exists has developed piecemeal, usually in response to needs encountered during collection of particular kinds of data, nothing currently exists for integrating the different types of data collected within an overall study. For example, although automated methods for collecting, maintaining, and using DNA microarray data are now becoming quite sophisticated, the integration of these data with information about the source of the material analyzed, or with data or results from FACS or other types of analyses done with the same material, is largely a manual task. It requires recovery of data and information stored on paper or in diverse files at diverse locations that are often known only to one or a small number of researchers directly concerned with the details of the project. In fact, it is common for individual bench scientists to repeat experiments, sometimes several times, because key information or data was “misplaced” or its location was lost over time.
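The kind of lateral integration described above can be illustrated with a minimal Python sketch. It assumes that every record, whatever its type, carries a shared sample identifier. The field names (sample_id, data_file), the record contents, and the function integrate_by_sample are all hypothetical and are used only for illustration.

```python
from collections import defaultdict


def integrate_by_sample(*record_sets):
    """Group records from any number of sources under their shared sample_id.

    Each argument is a (source_name, records) pair, where records is a list
    of dicts that each contain a "sample_id" key (an assumed convention).
    """
    merged = defaultdict(dict)
    for source_name, records in record_sets:
        for record in records:
            merged[record["sample_id"]][source_name] = record
    return dict(merged)


# Hypothetical records for one sample analyzed by two different methods.
samples = [{"sample_id": "S1", "source": "spleen"}]
facs = [{"sample_id": "S1", "data_file": "facs_S1.fcs"}]
arrays = [{"sample_id": "S1", "data_file": "array_S1.txt"}]

study = integrate_by_sample(
    ("sample", samples), ("facs", facs), ("microarray", arrays)
)
```

With such a structure, study["S1"] gathers the description of the source material together with the FACS and microarray records for the same material, which is the link that otherwise has to be maintained by hand.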
V. Protégé
Protégé is a knowledge-based software environment developed at Stanford University. Information regarding Protégé may be retrieved from the web site http://protege.stanford.edu. Protégé is an ontology editor and a knowledge-base editor. An ontology is a structured, object-oriented model that captures knowledge of a domain. Protégé is also a Java tool that provides an extensible architecture for the creation of customized knowledge-based tools. The tools of Protégé also provide for customized knowledge acquisition forms and the entry of domain knowledge. Protégé further provides a platform that can be extended with graphical widgets for tables, diagrams, and animation components to access other applications embedded in knowledge-based systems. Finally, Protégé is a library that other applications can use to access and display knowledge bases.
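To make the notion of an ontology more concrete, the following is a minimal Python sketch of a domain model built from classes, slots, and instances. It is not Protégé's API or storage format. The names OntologyClass, Instance, and all_slots, as well as the example reagent values, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class OntologyClass:
    """A concept in the domain, with named slots and an optional parent."""
    name: str
    slots: tuple = ()
    parent: "OntologyClass | None" = None

    def all_slots(self):
        """Return inherited slots followed by this class's own slots."""
        inherited = self.parent.all_slots() if self.parent else ()
        return inherited + tuple(self.slots)


@dataclass
class Instance:
    """A concrete individual whose values must fit the slots of its class."""
    cls: OntologyClass
    values: dict = field(default_factory=dict)

    def __post_init__(self):
        # Reject values for slots the class (or its ancestors) does not define.
        unknown = set(self.values) - set(self.cls.all_slots())
        if unknown:
            raise ValueError(
                f"unknown slots for {self.cls.name}: {sorted(unknown)}"
            )


# Hypothetical domain: a reagent class specialized to monoclonal antibodies.
reagent = OntologyClass("Reagent", slots=("name",))
antibody = OntologyClass(
    "MonoclonalAntibody", slots=("clone", "fluorochrome"), parent=reagent
)
stain = Instance(
    antibody,
    {"name": "anti-CD4", "clone": "hypothetical-clone-1", "fluorochrome": "FITC"},
)
```

In this sketch the class hierarchy plays the role of the ontology, while instances such as stain hold the domain knowledge entered against it. A knowledge acquisition form would correspond to a user interface generated from all_slots().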