The present invention relates to the field of data entry and retrieval. Specifically, the present invention relates to a method and system having the capability to organize an annotation structure and to query both data and annotations in computer systems. More particularly, the present invention enables the annotation of stored information, and permits the capture, sharing, and querying of data and annotations.
Successful planning and decision making in many technical and other industries depends on the expeditious and correct interpretation of complex information. For example, in the drug industry the data may have origins as diverse as high throughput screening experiments, clinical trials, patent information and research journals. In the petroleum industry the data may span seismic measurements, aerial surveys, laboratory data and economic forecasts. A system capable of providing unified access to disparate data sources and applications reduces the time spent finding, accessing, preparing, transforming and reformatting data, and allows professionals to focus on the interpretation and extraction of knowledge for planning and decision making.
However, one complication with providing this type of unified access is that the data inevitably spans several disciplines, with an attendant probability of misinterpretation. Extensive knowledge of multiple domains is required if misuse is to be avoided.
Therefore, there is still an unsatisfied need for an information management system that clarifies the generation, use, and purpose of the data. The information management system can capture knowledge about the genesis and history of the data, how analyses are done, how decisions are made, and what the outcomes are. This "corporate memory" forms the basis for the analysis required to make better technical and business decisions.
Several attempts have been made to access information based on annotations. Illustrative attempts are described in the following references:
U.S. Pat. No. 5,404,295 to Katz et al.
U.S. Pat. No. 5,600,775 to King et al.
U.S. Pat. No. 5,832,474 to Lopresti et al.
U.S. Pat. No. 5,548,739 to Yung.
For example, U.S. Pat. No. 5,404,295 describes a method and apparatus for computer retrieval of database material. Annotations are provided for selected database subdivisions and are converted to a structured form and stored in that form along with connections to corresponding subdivisions. Searching for relevant subdivisions involves entering a query in natural language or structured form, converting natural language queries to structured form, matching the structured form query against stored annotations, and retrieving database subdivisions connected to matched annotations.
However, the teaching of this patent is limited to a system with the capability to search the annotations to locate the database material. The system does not have the capability to search the stored information based on both the annotations and database material, or to search on database material to retrieve the annotations. As a result, the system is not suitable for directly locating a subset of data where the filter has predicates on both the annotations and database material. Rather, it will locate all database material that corresponds to the annotation predicates and it would require a second step to filter this subdivision and to apply the data predicates.
The present invention contemplates a method and apparatus for capturing annotations about database material in a way that allows queries with conditions or predicates on both the database material and the annotations. Database material may be text, graphics, spreadsheets, relational tables or any other material which may be stored and indexed. An annotatable data item (i.e. the subsection of database material that can be annotated) is any entity referenced by an index (e.g. by an object identifier) or any attribute or subcomponent of such an entity, or any arbitrary set of such items. Examples include a table such as a relational table or spreadsheet, a view such as a relational view, a row within a table, a cell within a table (i.e. the intersection of a column and a row), a column within a table, an object, an attribute of an object, a set of rows or columns from one table, or a set of rows from different tables. The annotatable data items may be in a single source or multiple sources, or span such sources. Multiple annotations may be entered for a single annotatable data item.
The annotations, together with the pointer information that relates them to the original database material, may be stored in a separate source so that the data model and operation of the sources containing the original database material is not affected. It is the pointer information that allows formulation of the queries to retrieve either annotations related to specific database material or database material related to specific annotations.
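The separation described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Pointer` and `Annotation` classes, the `lab_db` source, and all data values are hypothetical and do not come from the specification. The sketch shows that the source holding the database material is never modified, and that the pointer alone supports lookups in both directions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Pointer:
    """Hypothetical pointer to an annotatable data item."""
    source: str
    table: str
    row: Optional[int] = None      # None means the pointer covers every row
    column: Optional[str] = None   # None means the pointer covers every column


@dataclass
class Annotation:
    pointer: Pointer
    author: str
    comment: str


# Original database material (assumed example data); never modified below.
SOURCES = {
    "lab_db": {
        "assays": {
            1: {"compound": "A-17", "activity": 8.2},
            2: {"compound": "B-03", "activity": 2.4},
        }
    }
}

# Annotations live in their own store, separate from SOURCES.
cell = Pointer("lab_db", "assays", row=1, column="activity")
ANNOTATION_STORE: List[Annotation] = [
    Annotation(cell, "Smith", "Measured on a recalibrated instrument"),
]


def annotations_for(pointer: Pointer) -> List[Annotation]:
    """Database material -> annotations: match on the stored pointer."""
    return [a for a in ANNOTATION_STORE if a.pointer == pointer]


def resolve(pointer: Pointer):
    """Annotation -> database material: follow the pointer into the source."""
    table = SOURCES[pointer.source][pointer.table]
    if pointer.row is None:
        return table
    row = table[pointer.row]
    return row if pointer.column is None else row[pointer.column]


found = annotations_for(cell)
value = resolve(found[0].pointer)
```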
Annotations may be used to capture information such as additional facts about the database material, the opinions and judgments of experts about the database material, and/or links to other related material. Annotations may be entered manually or automatically by an application. Henceforth, the person or application that enters an annotation will be referred to as an annotation author, and the person or application that retrieves annotation and/or database material will be referred to as the reader.
Annotations may be captured in structured form to enhance queryability and semantic interpretation, as well as to provide some order for users entering this additional information content. The entry of comments in an unorganized and undisciplined way can often lead to more data with little useful content. The structure comprises labeled categories, which aid semantic interpretation. The annotation structure could be as simple as a "header" category containing attributes (or fields) recording who wrote the annotation and when, together with a "business meaning" category containing a single "Comment" field for a textual description of the data item being annotated. In this example, the title of the latter category, "business meaning", can aid in the interpretation of the "Comment" field. An annotation structure may be more complicated than the one illustrated above and contain many categories, each of which contains a number of attributes. Some or all of these attributes may have constraints placed on their values. For example, the constraints may be on the datatype (e.g. numeric, character) and/or on the permitted values, so that users have to enter values consistent with a particular datatype or consistent with an input list or pick-list. The constraints enforce more structure and consistency in the annotation content and also enhance the queryability with today's query engines.
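As a rough illustration of the structure described above, the sketch below models categories, attributes, and the two kinds of constraint (datatype and pick-list). The class names, the `validate` helper, and the "quality" category with its fields are assumptions made for the example; only the "header" and "business meaning" categories and the "Comment" field are taken from the text.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Sequence


@dataclass
class Attribute:
    name: str
    datatype: type = str                       # datatype constraint
    pick_list: Optional[Sequence[Any]] = None  # allowed values, if constrained


@dataclass
class Category:
    label: str
    attributes: List[Attribute]


STRUCTURE = [
    Category("header", [Attribute("author"), Attribute("date")]),
    Category("business meaning", [Attribute("Comment")]),
    # Hypothetical extra category showing both constraint kinds.
    Category("quality", [
        Attribute("reliability", str, ("high", "medium", "low")),
        Attribute("replicates", int),
    ]),
]


def validate(structure: List[Category],
             values: Dict[str, Dict[str, Any]]) -> List[str]:
    """Return a list of constraint violations (an empty list means valid)."""
    errors = []
    for category in structure:
        entered = values.get(category.label, {})
        for attr in category.attributes:
            if attr.name not in entered:
                continue  # this sketch treats every attribute as optional
            value = entered[attr.name]
            if not isinstance(value, attr.datatype):
                errors.append(f"{category.label}.{attr.name}: wrong datatype")
            elif attr.pick_list is not None and value not in attr.pick_list:
                errors.append(f"{category.label}.{attr.name}: not in pick-list")
    return errors


good = {
    "header": {"author": "Smith", "date": "1999-04-01"},
    "business meaning": {"Comment": "Peak biological activity for this batch"},
    "quality": {"reliability": "high", "replicates": 3},
}
bad = {"quality": {"reliability": "excellent", "replicates": "three"}}

good_errors = validate(STRUCTURE, good)
bad_errors = validate(STRUCTURE, bad)
```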
The capture and query of information from experts represents one important feature of the present invention. To this end, the present method allows the structure of annotations to be standardized based on the "group" to which the author and reader belong, as well as on the data item being annotated. A group can be as small as one person, in which case there can be a personalized annotation structure, or it can contain a "related" set of people, such as people of a particular discipline or performing a particular role. Henceforth, a group will be referred to as a "context". There is a context associated with the annotation author as well as with the reader. Thus, the structure for the entry of an annotation about any one data item may differ depending on the context of the author, and this information may be presented differently on retrieval depending on the context of the reader. These structures that are associated with contexts can be used to give a level of credibility to the annotations. That is, the annotation structure may be set up such that only experts in a given discipline (context) can enter information or advice pertaining to the expertise of that discipline. Filtering and transforming the entered annotation content based on the context of the reader can be used to retrieve only relevant information, to "hide" information that the reader's context is not authorized to see, or to present the information in a form easily understood by the discipline or role of the reader. Multiple annotations from authors with different contexts or within the same context can be attached to a single annotatable data item.
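The role of context at entry time and at retrieval time can be sketched as follows. The context names, field names, and the `enter` and `present` helpers are hypothetical; the sketch only assumes that each author context has its own entry structure and that each reader context has a set of fields it is allowed to see.

```python
from typing import Any, Dict

# Fields each author context may fill in (assumed example contexts).
ENTRY_STRUCTURES = {
    "chemist": ["Comment", "assay_reliability"],
    "economist": ["Comment", "forecast_basis"],
}

# Fields each reader context is allowed to see on retrieval.
VISIBLE_FIELDS = {
    "chemist": {"author", "Comment", "assay_reliability"},
    "manager": {"author", "Comment"},
}


def enter(author: str, author_context: str,
          fields: Dict[str, Any]) -> Dict[str, Any]:
    """Accept an annotation only if it fits the author's context structure."""
    allowed = ENTRY_STRUCTURES[author_context]
    unexpected = sorted(set(fields) - set(allowed))
    if unexpected:
        raise ValueError(f"{author_context} may not enter: {unexpected}")
    return {"author": author, "context": author_context, **fields}


def present(annotation: Dict[str, Any], reader_context: str) -> Dict[str, Any]:
    """Filter the stored annotation down to what this reader context may see."""
    visible = VISIBLE_FIELDS.get(reader_context, set())
    return {k: v for k, v in annotation.items() if k in visible}


note = enter("Smith", "chemist",
             {"Comment": "Result looks sound", "assay_reliability": "high"})
for_manager = present(note, "manager")
for_chemist = present(note, "chemist")
```

Here the economist context cannot enter an `assay_reliability` value, which is one simple way to tie credibility to discipline, and the manager context receives a reduced view of the same stored annotation.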
It should be understood that the foregoing capabilities encompass a single annotation structure containing an attribute such as "Comment" or "URL" for every annotatable data item, wherein annotations of this type are entered and retrieved in the same way by all author/readers.
The method of the present invention is outlined as follows:
The type of annotatable data item is identified and the allowed structures for this type are registered. A type may include, but is not limited to, "set of rows of table x" or "any cell in column y of spreadsheet z". This registration step can be done as a preprocessing step or may be done immediately before annotation entry.
For annotation entry, an annotatable data item is chosen (e.g. the 5th cell in column y of spreadsheet z) and an annotation is entered and stored. The annotation is associated with the annotatable data item at the time of entry by including pointer information to the annotatable data item with the annotation. Optionally, the annotation may be "propagated", that is, automatically associated with additional annotatable data items using extra information defined in the registration step. Once annotations have been stored, queries may be issued to retrieve the annotation content, the database material, or both.
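The registration and entry steps above might look like the following sketch. Everything named here is an assumption for illustration: the `register` and `annotate` functions, the tuple form of the pointer, and the propagation rule that also attaches the annotation to a hypothetical derived column `y_scaled`.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Item = Tuple[str, str, int]  # assumed pointer form: (spreadsheet, column, row)

REGISTRY: Dict[str, Dict[str, Any]] = {}
STORE: List[Dict[str, Any]] = []


def register(item_type: str, fields: List[str],
             propagate: Optional[Callable[[Item], List[Item]]] = None) -> None:
    """Registration step: record the allowed structure and any propagation rule."""
    REGISTRY[item_type] = {"fields": fields, "propagate": propagate}


def annotate(item_type: str, item: Item,
             values: Dict[str, Any]) -> Dict[str, Any]:
    """Entry step: store the annotation together with its pointer information."""
    entry = REGISTRY[item_type]  # raises KeyError if the type was not registered
    missing = [f for f in entry["fields"] if f not in values]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    targets = [item]
    if entry["propagate"] is not None:
        targets += entry["propagate"](item)  # optional propagation
    annotation = {"pointers": targets, **values}
    STORE.append(annotation)
    return annotation


CELL_TYPE = "any cell in column y of spreadsheet z"
register(
    CELL_TYPE,
    fields=["author", "Comment"],
    # Hypothetical rule: also annotate the same row of a derived column.
    propagate=lambda item: [(item[0], "y_scaled", item[2])],
)

stored = annotate(CELL_TYPE, ("z", "y", 5),
                  {"author": "Smith", "Comment": "Value entered by hand"})
```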
There are a number of query modes possible. In the first mode, the reader may browse the annotations in the context of the database material. That is, the reader identifies the specific database material of interest and all accompanying annotations are retrieved. This is achieved by issuing a query using the pointer information stored with the annotations. This mode is useful when the reader is perusing database material and wants to read annotations that contain related information or links to related information.
A second mode refers to querying for particular annotations in the context of the data. That is, the reader first identifies the database material of interest. This may include identifying an annotatable data item or a type of annotatable data item. In the case of an annotatable data item, the reader asks for the accompanying annotations with particular characteristics (e.g. where the author field contains Smith). In the case of a type, the reader may ask for elements of the type whose annotations have particular characteristics. A query is issued that uses the pointer information and specifies a filter on the annotation content.
The reader may alternatively ask for only the elements of the type and their annotations where the elements of the type and their annotations both have filters on their content. In this case, a query is issued that uses the pointer information and specifies a filter on the annotation content and also a filter on the data content.
The second mode is useful when the reader wishes to review only certain annotations that relate to the data (e.g. all those by expert X) or when the reader wishes to focus on particular database material and annotation content (e.g. find all the data and annotations about drug molecules that have biological activity greater than x (data content) and for which the experts said the experimental measurement was reliable (annotation content)).
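The first two modes can be illustrated with an in-memory SQLite database. The schema, table names, and rows below are invented for the example; the point is only that the pointer columns (`target_table`, `target_row`) let a single query carry predicates on both the data content and the annotation content, without a second filtering step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE molecules (id INTEGER PRIMARY KEY, name TEXT, activity REAL);
CREATE TABLE annotations (
    id INTEGER PRIMARY KEY,
    target_table TEXT,    -- pointer information: which table
    target_row INTEGER,   -- pointer information: which row
    author TEXT,
    reliable INTEGER,     -- 1 if the expert judged the measurement reliable
    comment TEXT
);
INSERT INTO molecules VALUES (1, 'M-001', 7.5), (2, 'M-002', 3.1), (3, 'M-003', 9.0);
INSERT INTO annotations VALUES
    (1, 'molecules', 1, 'Smith', 1, 'Assay repeated three times'),
    (2, 'molecules', 2, 'Smith', 1, 'Clean run'),
    (3, 'molecules', 3, 'Jones', 0, 'Instrument drift suspected');
""")


def browse(row_id):
    """Mode 1: all annotations attached to one data item."""
    return conn.execute(
        "SELECT author, comment FROM annotations "
        "WHERE target_table = 'molecules' AND target_row = ? ORDER BY id",
        (row_id,),
    ).fetchall()


def by_author(row_id, author):
    """Mode 2: annotations on one data item, filtered on annotation content."""
    return conn.execute(
        "SELECT comment FROM annotations "
        "WHERE target_table = 'molecules' AND target_row = ? AND author = ? "
        "ORDER BY id",
        (row_id, author),
    ).fetchall()


def reliable_actives(min_activity):
    """Mode 2 with filters on both data content and annotation content."""
    return conn.execute(
        "SELECT m.name, m.activity, a.author, a.comment "
        "FROM molecules AS m JOIN annotations AS a "
        "ON a.target_table = 'molecules' AND a.target_row = m.id "
        "WHERE m.activity > ? AND a.reliable = 1 ORDER BY m.id",
        (min_activity,),
    ).fetchall()


hits = reliable_actives(5.0)
```

With this sample data, only M-001 satisfies both predicates: M-002 is judged reliable but falls below the activity threshold, and M-003 is active but its measurement was flagged as unreliable.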
The third mode involves querying across the full body of annotations, regardless of the database material being annotated. This may be used, for example, for locating all annotations containing a particular category or for locating annotations containing particular content. For example, an exemplary query can be: How many times has Simulation package x been used to generate production estimates?
The fourth mode, which involves querying for particular data in the context of the annotations, is an extension of the third mode. In this case, the query retrieves not only the annotations of interest but also the database material that they annotate. For example, in the fourth mode, the answer to the above exemplary query, "How many times has Simulation package x been used to generate production estimates?", might include not only how many times package x has been used but also the values of the production estimates. This mode also uses the pointer information in order to formulate the query to retrieve the appropriate database material.
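The difference between the third and fourth modes can be shown with a small sketch of the exemplary query. The estimate values, field names, and annotation fields (`category`, `tool`) are made up for illustration; the third mode looks only at annotation content, while the fourth follows each pointer back to the annotated data.

```python
from typing import Any, Dict, List, Tuple

Pointer = Tuple[str, int]  # assumed pointer form: (field name, year)

# Hypothetical database material: production estimates keyed by pointer.
ESTIMATES: Dict[Pointer, float] = {
    ("field_a", 2001): 120.0,
    ("field_b", 2001): 95.5,
    ("field_c", 2001): 60.0,
}

# Hypothetical annotation store; each entry carries pointer information.
ANNOTATIONS: List[Dict[str, Any]] = [
    {"pointer": ("field_a", 2001), "category": "provenance",
     "tool": "Simulation package x"},
    {"pointer": ("field_b", 2001), "category": "provenance",
     "tool": "Simulation package x"},
    {"pointer": ("field_c", 2001), "category": "provenance",
     "tool": "Manual estimate"},
]


def mode_three(tool: str) -> List[Dict[str, Any]]:
    """Query across all annotations, regardless of what they annotate."""
    return [a for a in ANNOTATIONS if a.get("tool") == tool]


def mode_four(tool: str) -> List[Tuple[Pointer, float]]:
    """Same annotation filter, then follow each pointer to the annotated data."""
    return [(a["pointer"], ESTIMATES[a["pointer"]]) for a in mode_three(tool)]


usage_count = len(mode_three("Simulation package x"))
estimates_from_x = mode_four("Simulation package x")
```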
According to a preferred embodiment, an information management method is implemented by the information management system, whereby one or more users such as administrators, annotation authors, readers, and/or applications, start the information management method of the present invention by setting up an annotation structure. Using the information management system, a user is capable of performing any one or more of the following tasks or processes:
Enter annotations about the data or fields by various input means.
Browse annotations in the context of data.
Simultaneously query for both annotations and data.
Query for particular annotations in the context of data.
Query across the full body of annotations.
Query for particular data in the context of the annotations.
It is therefore clear that the information management system is not domain specific, in that it can be used in combination with any application regardless of the complexities of the underlying technical or professional fields. The data model for the annotations (i.e., the annotation metadata model) is generic, self-describing, and self-contained.
The information management system is adaptable to the user's query preferences in that it can operate in a data-centric mode or in an annotation-centric mode. The data-centric mode will be explained in connection with FIGS. 5 and 6, and allows the user to select desired data items and to subsequently query and retrieve data and annotations based on these data items. The annotation-centric mode will be explained in connection with FIGS. 7 and 8, and allows the user to select the annotation categories and to subsequently query and retrieve data and annotations based on the content of the selected annotation categories. As a result, the information management system allows both data and annotations to be queried, in that queries can be made over the data content, over the annotation content, or over both simultaneously. This provides the ability to query the annotations, or the annotations and the data, and further provides the ability to retrieve the annotations when their associated data is retrieved.
Yet another feature of the information management system is its ability to allow annotations to be targeted to, or associated with data at different levels of granularities, such as: collection/view/table, attribute/column, instance/row, cell, arbitrary combinations thereof, and so forth.
Still another feature of the information management system is its ability to support storage and retrieval of annotations with a generic structure or a more specific structure, where the structure can depend on the nature of the data being annotated and the context of the author of the annotation.
In addition, the information management system is capable of supporting annotations of data in a variety of sources, formats, and/or data models. The information management system can annotate data in multiple sources when coupled with a data integration engine, in only one source, or in any source regardless of the source's data model (diverse sources). Further, the information management system can annotate views on the data in these sources, without requiring the data sources being annotated to be modified. The information management system can have multiple annotations for the same data object, and different annotations on the same data item can be entered by different people/applications or by the same person/application at different times. Moreover, when the annotations are retrieved, they can be filtered or modified in a way that depends on the context of the reader. The annotations can also be propagated to specific target data items that can be selected from a drop-down list or specified by entering free-format text, a numeric value, a document, a URL, and so forth.