An annotation system is one in which descriptive information is stored about objects, or parts of objects, without modifying the objects themselves. Annotation systems exist in which annotations are stored in the data stream of the target objects themselves. Such systems have many disadvantages. In a preferred annotation system, annotations are stored separately from the target data source. This provides a great deal of flexibility in managing the data source and its associated annotations. The separate annotation store system is the subject of the present invention and will be referred to herein simply as the “annotation store”. Annotation systems are in high demand in the life sciences and biotechnology, but are not limited to those domains.
An annotation store, typically a database, contains the descriptive information for the annotation. An indexing scheme is used to map each annotation to the target object or to a position within the target object. We refer to the objects (collections of bytes of data) that are potential targets for annotations as “data sources”. Annotation systems can have client components ranging from a standalone annotation program to annotation plug-ins that integrate with third-party vendor software.
Digital fingerprints are described in “Digital Signatures: How They Work” in the Apr. 9, 1996 issue of PC Magazine. A digital fingerprint is a computable identifier for a given set of bytes. Desirable properties of a digital fingerprint include conciseness (for ease of storage and transmission), uniqueness (to avoid different sets of bytes having the same fingerprint), determinism (the same fingerprint should always be computed for the same set of bytes), and ease of computation (to facilitate quick computation of a large number of fingerprints). One popular example of a digital fingerprint is the MD5 hash algorithm, which calculates a 128-bit digital fingerprint for a given collection of bytes.
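As a minimal sketch of the properties listed above, the following Python fragment computes an MD5 fingerprint using the standard-library hashlib module. The function name fingerprint and the sample bytes are illustrative assumptions and are not part of any described system.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the MD5 digest of `data` as a hexadecimal string."""
    return hashlib.md5(data).hexdigest()


# Hypothetical data source contents, used only for illustration.
sample = b"example data source contents"
fp = fingerprint(sample)

# Determinism: the same bytes always yield the same fingerprint.
assert fp == fingerprint(sample)

# Conciseness: an MD5 digest is 128 bits (16 bytes), i.e. 32 hex characters,
# regardless of how large the input is.
assert len(fp) == 32
assert len(fingerprint(sample * 10_000)) == 32

# Changing the bytes changes the fingerprint for this pair of inputs.
assert fp != fingerprint(sample + b"!")
```

Uniqueness holds only with high probability for any fixed-length fingerprint, since more possible byte sequences exist than possible digests.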
An annotation is referred to as “lost” when it cannot be retrieved by a user working with the data source to which the annotation is targeted. A data source is referred to as “lost” when it cannot be recovered by a user who has retrieved an annotation on that data source via an external process, such as an annotation search or an annotation browser.
In example prior art annotation systems (FIG. 4), the following procedures are used in creating, storing, and retrieving an annotation: First, a user 401 retrieves and opens 402 the target data source, “DS”, from a location 405, “L”. Examples of “L” include a network location (e.g. Internet URL “intranet.server.com/files/my_spreadsheet.xls”), a local path (e.g. “c:\data\article20a.pdf”), or a content-management identifier (e.g. “MyCMS:Store:98a021”). The user then creates the annotation 403, “A”, by entering the information that comprises “A”. The annotation store 407 records the relationship between “A” and “L” 404. If the user creates another annotation, “A2” on the data source from “L”, then a relationship between “A2” and “L” will also be recorded in the annotation store. Thus, there is a many-to-one relationship between annotations and data-source locations within the annotation store.
Referring to prior art FIG. 5, when a user 501 later opens 503 “DS” from location “L” 405, the annotation store 406 is queried for all annotations associated with “L”. In the above scenario, both “A” and “A2” 505 would be returned 504, and the user can work with the annotations and their target data source.
Referring to prior art FIG. 6, a user 601 may access “A” or “A2” using an external mechanism, such as an annotation search 603 or browser interface. In this case, the annotation store 605 is queried for the location at which the target data source can be found. Because “A” (or “A2”) is related to “L”, “L” is returned to the user and, once more, the user can work with both the annotation and its target data source.
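The location-keyed bookkeeping described for prior art FIGS. 4-6 can be sketched as follows. This is an illustrative model only: the class name LocationKeyedAnnotationStore, its method names, and the in-memory dictionaries are assumptions made for the example, and annotations are represented simply by their labels.

```python
from collections import defaultdict
from typing import Optional


class LocationKeyedAnnotationStore:
    """Toy store that relates each annotation to a data-source location."""

    def __init__(self) -> None:
        self._by_location = defaultdict(list)  # location -> [annotation, ...]
        self._location_of = {}                 # annotation -> location

    def record(self, annotation: str, location: str) -> None:
        """Record the relationship between an annotation and a location."""
        self._by_location[location].append(annotation)
        self._location_of[annotation] = location

    def annotations_for(self, location: str) -> list:
        """Return all annotations related to a location (FIG. 5 style query)."""
        # .get() avoids creating an empty entry for an unknown location.
        return list(self._by_location.get(location, []))

    def location_for(self, annotation: str) -> Optional[str]:
        """Return the location related to an annotation (FIG. 6 style query)."""
        return self._location_of.get(annotation)


store = LocationKeyedAnnotationStore()
L = "c:\\data\\article20a.pdf"

# Two annotations on the same location: a many-to-one relationship.
store.record("A", L)
store.record("A2", L)

print(store.annotations_for(L))  # ['A', 'A2']
print(store.location_for("A2"))  # c:\data\article20a.pdf
```

Both queries depend entirely on the location string, which is the only link between an annotation and its data source in this model.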
The traditional annotation system, examples of which are shown in prior art FIGS. 4-6, has many shortcomings. For example, referring to prior art FIG. 7, consider the case in which “DS” is accessed from a location other than “L”. (For example, this might occur if a second user sends “DS” as an email attachment to a user 706.) Thus we have the case of a user 706 accessing “DS” from a new location, “L2”. When the annotation store 705 is queried 704 for all annotations related to “L2”, nothing is returned, and the annotations “A” and “A2” are lost.
A second example of a shortcoming (prior art FIG. 8) involves scenarios in which the user 804 accesses the annotation “A” through an external search 805 or browser mechanism and attempts to locate the target data source “DS”. As before, the annotation store returns location “L”, but if “DS” no longer exists at “L” (for example, if a local copy of an article was annotated prior to the article being moved 802 to a content-management system 803), then “DS” will be lost.
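The two shortcomings of prior art FIGS. 7 and 8 can be reproduced with a small model in which plain dictionaries stand in for the annotation store and for the places where copies of “DS” can be found. All variable names and the byte contents are hypothetical and chosen only for illustration.

```python
# Relationships as a location-keyed annotation store would record them.
annotations_by_location = {"L": ["A", "A2"]}
location_of_annotation = {"A": "L", "A2": "L"}

# Locations at which a copy of the data source DS currently exists.
available = {"L": b"bytes of DS"}

# FIG. 7 scenario: the same bytes are opened from a new location, L2
# (for example, after being received as an email attachment).
available["L2"] = available["L"]
found = annotations_by_location.get("L2", [])
assert found == []                       # "A" and "A2" are lost
assert available["L2"] == available["L"]  # although the bytes are identical

# FIG. 8 scenario: DS is moved from L into a content-management system.
available["MyCMS:Store:98a021"] = available.pop("L")
loc = location_of_annotation["A"]        # the store still answers "L"
assert loc == "L"
assert loc not in available              # DS is lost: nothing exists at L
```

In both scenarios the bytes of the data source are unchanged; only the location changed, and the location is the sole key the store holds.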