1. Field of the Invention
The present invention relates to the field of data processing and, more particularly, to methods and systems for referencing selected portions of data.
2. Description of the Related Art
An annotation system is used to create, store, and retrieve descriptive information about objects. Annotations may exist in various forms, including textual annotations (e.g., descriptions, revisions, clarifications, comments, instructions, etc.), graphical annotations (e.g., pictures, symbols, etc.), sound clips, etc. Further, virtually any identifiable collection of data may be annotated, such as a database table (or spreadsheet), as well as any subportion (or sub-object) thereof, such as a column, row, or cell of the table.
These different data objects may be annotated for different reasons. For example, in a biomedical environment, a database table may be annotated to explain why it was created (e.g., for a particular branch of medical research), a column may be annotated to clarify the type of data it holds (e.g., test results), a row may be annotated to comment on a particular set of data (e.g., all related to a common patient), while an individual cell may be annotated to comment on the significance of a particular value stored therein (e.g., an alarmingly high test result). Further, annotations may also be made on a selected group of individual cells, for example, to comment on an important relationship between cells in the group.
Some annotation systems store annotations separately, without modifying the annotated data objects themselves. For example, annotations are often contained in annotation records stored in a separate annotation store, typically a database. The annotation records typically contain information about the annotations contained therein, such as the creation date and author of the annotation, and an identification of the annotated data object, typically in the form of an index that may be used to retrieve a reference to the annotated data. For example, when retrieving a set of annotations for a document, corresponding references may be retrieved as well, to identify the annotated data. The format of the corresponding references may vary depending on the type of data annotated.
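By way of illustration only, a separately stored annotation record of the kind described above may be sketched as follows. The class name, field names, store layouts, and sample values are hypothetical and are not drawn from any particular annotation system.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AnnotationRecord:
    """Hypothetical annotation record kept apart from the annotated data."""
    author: str
    created: date
    text: str
    reference_index: int  # index used to look up a reference to the annotated data


# Hypothetical stores; the annotated table itself is never modified.
reference_store = {1: "http://example.com/data/test_results"}
annotation_store = [
    AnnotationRecord("author_a", date(2004, 1, 15), "Created for a research study.", 1),
    AnnotationRecord("author_b", date(2004, 2, 3), "Rechecked against the lab report.", 1),
]


def annotations_with_references(records, references):
    """Pair each annotation with the reference its index resolves to."""
    return [(rec, references[rec.reference_index]) for rec in records]


pairs = annotations_with_references(annotation_store, reference_store)
```

In this sketch, each record carries only an index, so several annotations on the same data object may share a single stored reference.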
For example, a reference to a database table may include a location and name of the table, for example, as a network file path or Uniform Resource Locator (URL). In addition, a reference to a column may include a column name (or number), a reference to a row may include a row number, while a reference to a cell may include both a column name and row number. Thus, references to all these types of data objects may be stored relatively easily and efficiently. A reference to a selected group of cells, on the other hand, presents a challenge, as the reference must include (provide an indication of) each individual cell in the group.
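As a minimal sketch of the reference formats just described, a single structure holding a table location plus an optional column name and row number may represent a table, column, row, or cell. The class name, field names, and URL below are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DataReference:
    """Hypothetical reference to a table or to one of its sub-objects."""
    table_url: str                # location and name of the table
    column: Optional[str] = None  # set when a column or cell is referenced
    row: Optional[int] = None     # set when a row or cell is referenced

    def kind(self) -> str:
        """Report which type of data object this reference identifies."""
        if self.column is not None and self.row is not None:
            return "cell"
        if self.column is not None:
            return "column"
        if self.row is not None:
            return "row"
        return "table"


TABLE = "http://example.com/data/test_results"  # hypothetical URL
table_ref = DataReference(TABLE)
column_ref = DataReference(TABLE, column="result")
row_ref = DataReference(TABLE, row=17)
cell_ref = DataReference(TABLE, column="result", row=17)
```

Each of these references has a small, fixed size. No comparable fixed-size form exists in this sketch for an arbitrary group of cells.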
One conventional approach to ensuring that each individual cell in the group is included is to reference each cell explicitly, for example, as a column name/row number pair. However, as a selected group may span hundreds of cells that each need to be referenced, this approach may result in inefficient storage, particularly as each selected group of cells may have several associated annotations. Further, when retrieving annotations and references to the corresponding annotated data from a network-connected annotation database, transmitting the large number of column/row pairs may consume valuable network bandwidth.
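The storage cost of explicit referencing may be illustrated with the following sketch, which assumes a hypothetical selection of five columns by one hundred rows and a JSON encoding chosen only for the purpose of measuring size.

```python
import json


def explicit_reference(columns, first_row, last_row):
    """List every selected cell as a [column name, row number] pair."""
    return [[col, row]
            for col in columns
            for row in range(first_row, last_row + 1)]


# Hypothetical column names for a selection of 5 columns by 100 rows.
columns = ["patient_id", "test_a", "test_b", "test_c", "test_d"]
pairs = explicit_reference(columns, 1, 100)

# Serialized form that would be stored, or sent over a network, per reference.
payload = json.dumps(pairs)

print(len(pairs))    # 500 column/row pairs for a single selected group
print(len(payload))  # size of the serialized reference in characters
```

The number of pairs grows with the product of the selected rows and columns, and the full list is repeated for every stored or transmitted copy of the reference.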
One alternative approach is to store column/row pairs for only the corners of the selected group of cells, resulting in more efficient storage. A disadvantage of this approach, however, is that a reference storing only column/row pairs for the corners may become invalid when columns and/or rows are inserted into or deleted from the table. For example, the content of the annotation may have described a relationship between a deleted row/column and others, or the description may not apply to inserted rows/columns. Another disadvantage of this approach is that only contiguous selections of data may be accurately referenced by their corners, thus preventing the use of this approach for discontiguous selections of data (e.g., that exclude certain portions of data within the four corners).
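Both disadvantages of corner-based referencing may be seen in the following sketch. The function names, the in-memory table, and the sample values are hypothetical. Row numbers are assumed to be 1-based positions in the table.

```python
def rows_in_range(rows, first_row, last_row):
    """Resolve a corner-style row range (1-based, inclusive) to row contents."""
    return rows[first_row - 1:last_row]


def cells_in_corners(columns, top_left, bottom_right):
    """Expand two corner (column name, row number) pairs into covered cells."""
    (c0, r0), (c1, r1) = top_left, bottom_right
    i0, i1 = columns.index(c0), columns.index(c1)
    return {(col, row)
            for col in columns[i0:i1 + 1]
            for row in range(r0, r1 + 1)}


# 1. Insertion changes what the stored corners refer to.
rows = ["patient A", "patient B", "patient C", "patient D"]
before = rows_in_range(rows, 2, 3)  # rows originally annotated
rows.insert(1, "patient X")         # new row inserted ahead of row 2
after = rows_in_range(rows, 2, 3)   # same corners, different data

# 2. Corners cannot exclude a cell that lies inside the bounding rectangle.
selected = {("test_a", 1), ("test_a", 2), ("test_b", 1)}
covered = cells_in_corners(["test_a", "test_b"], ("test_a", 1), ("test_b", 2))
extra = covered - selected          # cells referenced but never selected
```

Here `before` holds the rows for patients B and C, while `after` holds the rows for patients X and B, and `extra` contains the unselected cell `("test_b", 2)`.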
Accordingly, there is a need for an improved method for referencing a selected group of data, preferably that results in efficient storage, as well as flexible representation of different types of selected groups.