An annotation system is one where descriptive information is stored about objects, or parts of objects, without modifying the objects themselves. Annotation systems exist in which annotations are stored in the data stream of the target objects themselves. Such systems have many disadvantages. In a preferred annotation system, annotations are stored separately from the target data source. This provides a great deal of flexibility in managing the data source and its associated annotations. The separate annotation store system is the subject of the present invention and will be referred to simply as the “annotation store” herein. Annotation systems are in high demand in Life Sciences and biotech, but are not limited to that domain.
An annotation store, typically a database, contains the descriptive information for the annotation. An indexing scheme is used to map each annotation to the target object or the position within the target object. We refer to the objects (collections of bytes of data) that are potential targets for annotations as “data sources”. Annotation systems can have client components ranging from a standalone annotation program to annotation plug-ins that integrate with third party vendor software.
Digital fingerprints are described in “Digital Signatures: How They Work” in the Apr. 9, 1996 issue of PC Magazine. A digital fingerprint is a computable identifier for a given set of bytes. Desirable properties of a digital fingerprint include conciseness (for ease of storage and transmission), uniqueness (to avoid different sets of bytes having the same fingerprint), determinism (the same fingerprint should always be computed for the same set of bytes), and ease of computation (to facilitate quick computation of a large number of fingerprints). One popular example of a digital fingerprint is the MD5 hash algorithm, which calculates a 128-bit digital fingerprint for a given collection of bytes.
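By way of illustration only, the properties listed above can be observed with a short sketch that uses the MD5 implementation in the Python standard library. The function name fingerprint and the sample byte strings are assumptions introduced for this example and do not appear in the referenced article.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the MD5 digest of `data` as a hexadecimal string.

    `fingerprint` is a hypothetical helper name used only in this sketch.
    """
    return hashlib.md5(data).hexdigest()


# Hypothetical sample contents of a data source.
original = b"example data source contents"
modified = original + b"!"

fp_first = fingerprint(original)
fp_again = fingerprint(original)   # determinism: same bytes, same fingerprint
fp_other = fingerprint(modified)   # different bytes are expected to differ

# Conciseness: an MD5 digest is 16 bytes (128 bits), whatever the input size.
digest_size_in_bytes = len(bytes.fromhex(fp_first))
```

The digest length is fixed, so the fingerprint of a large file costs no more to store or transmit than that of a small one.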
An annotation is referred to as “lost” when it cannot be retrieved by a user working with the data source to which the annotation is targeted. A data source is referred to as “lost” when it cannot be recovered by a user who has retrieved an annotation on that data source via an external process, such as an annotation search or an annotation browser.
U.S. patent application Ser. No. 10/600,316, “MANAGEMENT AND RECOVERY OF DATA OBJECT ANNOTATIONS USING DIGITAL FINGERPRINTING”, assigned to IBM and incorporated herein by reference, teaches an annotation system using digital fingerprinting.
As shown in that application, in the prior art annotation system (referring to FIG. 4), the following procedures are used in creating, storing, and retrieving an annotation. First, a user 401 retrieves and opens 402 the target data source, “DS”, from a location 405, “L”. Examples of “L” include a network location (e.g. Internet URL “intranet.server.com/files/my_spreadsheet.xls”), a local path (e.g. “c:\data\article20a.pdf”), or a content-management identifier (e.g. “MyCMS:Store:98a021”). The user then creates the annotation 403, “A”, by entering the information that comprises “A”. The annotation store 407 records the relationship between “A” and “L” 404. If the user creates another annotation, “A2”, on the data source from “L”, then a relationship between “A2” and “L” will also be recorded in the annotation store. Thus, there is a many-to-one relationship between annotations and data-source locations within the annotation store.
Referring to prior art FIG. 5, when a user 501 later opens 503 “DS” from location “L” 405, the annotation store 406 is queried for all annotations associated with “L”. In the above scenario, both “A” and “A2” 505 would be returned 504, and the user can work with the annotations and their target data source.
Referring to prior art FIG. 6, a user 601 may access “A” or “A2” using an external mechanism, such as an annotation search 603 or browser interface. In this case, the annotation store 605 is queried for the location at which the target data source can be found. Because “A” (or “A2”) is related to “L”, “L” is returned to the user and, once more, the user can work with both the annotation and its target data source.
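The location-keyed bookkeeping described with reference to FIGS. 4-6 can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not the implementation of the referenced application: the class name LocationKeyedStore, its method names, and the annotation texts are all hypothetical.

```python
from collections import defaultdict


class LocationKeyedStore:
    """Hypothetical in-memory stand-in for a location-keyed annotation store."""

    def __init__(self):
        self._by_location = defaultdict(list)  # location -> annotation ids
        self._location_of = {}                 # annotation id -> location
        self._text = {}                        # annotation id -> annotation body

    def create(self, annotation_id, text, location):
        # FIG. 4: record the relationship between the annotation and "L".
        self._by_location[location].append(annotation_id)
        self._location_of[annotation_id] = location
        self._text[annotation_id] = text

    def annotations_for(self, location):
        # FIG. 5: query for all annotations associated with a location.
        return list(self._by_location.get(location, []))

    def location_for(self, annotation_id):
        # FIG. 6: an annotation found by search yields its target location.
        return self._location_of[annotation_id]


store = LocationKeyedStore()
L = "c:\\data\\article20a.pdf"
store.create("A", "first note (hypothetical text)", L)
store.create("A2", "second note (hypothetical text)", L)

found = store.annotations_for(L)       # both annotations, many-to-one on "L"
target = store.location_for("A2")      # the location "L"
```

In this sketch the location string is the only key that links an annotation to its data source, which is the property examined in the shortcomings that follow.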
The traditional annotation system, examples of which are shown in prior art FIGS. 4-6, has many shortcomings. For example, referring to prior art FIG. 7, consider the case in which “DS” is accessed from a location other than “L”. (For example, this might occur if a second user sends “DS” as an email attachment to a user 706.) Thus we have the case of a user 706 accessing “DS” from a new location, “L2”. When the annotation store 705 is queried 704 for all annotations related to “L2”, nothing is returned, and the annotations “A” and “A2” are lost.
A second example of a shortcoming (prior art FIG. 8) involves scenarios in which the user 804 accesses the annotation “A” through an external search 805 or browser mechanism and attempts to locate the target data source “DS”. As before, the annotation store returns location “L”, but if “DS” no longer exists at “L” (for example, if a local copy of an article was annotated prior to the article being moved 802 to a content-management system 803), then “DS” will be lost.
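Both shortcomings can be reproduced in a few lines. The sketch below is illustrative only and assumes a simplified model in which plain dictionaries stand in for the annotation store and for the places where copies of “DS” reside; the variable and function names are hypothetical.

```python
# State of a location-keyed store after "A" and "A2" were created on "DS" at "L".
annotations_by_location = {"L": ["A", "A2"]}
location_of_annotation = {"A": "L", "A2": "L"}

# Hypothetical view of where copies of "DS" currently reside.
data_sources = {"L": b"bytes of DS"}


def annotations_for(location):
    """Return the annotation ids recorded against `location`, if any."""
    return annotations_by_location.get(location, [])


def open_target(annotation_id):
    """Return the data source at the annotation's recorded location, or None."""
    location = location_of_annotation[annotation_id]
    return data_sources.get(location)


# FIG. 7: the same bytes arrive at a new location "L2" (e.g. an email attachment).
data_sources["L2"] = data_sources["L"]
annotations_at_l2 = annotations_for("L2")   # nothing recorded: "A", "A2" are lost

# FIG. 8: "DS" is moved from "L" into a content-management system.
data_sources["MyCMS:Store:98a021"] = data_sources.pop("L")
source_for_a = open_target("A")             # store still says "L": "DS" is lost
```

In both cases the bytes of “DS” are unchanged; only the location differs, and the location is the sole key the store holds.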
Today, annotation is generally performed on objects such as (but not limited to) images, drawings, spreadsheets, web pages, and word processing documents by storing a reference to the annotation against a known position in the structure of the object. This is accomplished in one of three ways: by using a named position in the object's data model (a well-known, pre-defined structure) that gives absolute positional information; by using a measured byte offset within the file that makes up the object (also a well-known structure); or, when the object is an image, for example, by using spatial coordinates mapped onto a rendered view of the object. In all these cases, the general structure of the object being annotated is pre-defined and well known, and the object is generally stored as a whole or as parts that make up the whole.
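The three kinds of positional reference described above might be modeled as in the following sketch. It is offered only as an illustration; the type names NamedPosition, ByteOffset, SpatialRegion, and Annotation, their fields, and the helper annotated_bytes are assumptions made for this example rather than structures taken from any particular annotation system.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class NamedPosition:
    """A named position in the object's data model (e.g. a spreadsheet cell)."""
    name: str


@dataclass(frozen=True)
class ByteOffset:
    """A measured byte range within the file that makes up the object."""
    offset: int
    length: int


@dataclass(frozen=True)
class SpatialRegion:
    """Spatial coordinates mapped onto a rendered view of the object."""
    x: int
    y: int
    width: int
    height: int


Anchor = Union[NamedPosition, ByteOffset, SpatialRegion]


@dataclass(frozen=True)
class Annotation:
    object_id: str   # hypothetical identifier of the annotated object
    anchor: Anchor   # where within the object the annotation applies
    text: str        # the descriptive information itself


def annotated_bytes(data: bytes, anchor: ByteOffset) -> bytes:
    """Return the slice of `data` that a byte-offset anchor refers to."""
    return data[anchor.offset:anchor.offset + anchor.length]


note = Annotation("report.doc", ByteOffset(offset=6, length=5), "check this word")
selected = annotated_bytes(b"hello world", note.anchor)
```

Each anchor type presupposes a stable, well-known structure in the target object: a data model with named positions, a file with fixed byte layout, or a rendered view with fixed dimensions.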
In a typical usage scenario, a user (for the purpose of this document, the word “users” denotes software programs in addition to individual end users) opens an object for viewing, at which time the annotation store is queried for annotations associated with that object; if any are found, the user can also view them. Alternatively, the user is otherwise made aware of the annotations as they are matched to the appropriate place in the existing well-known structure of the object concerned. In another mode, the user might search the annotation store and find annotations, which list the objects to which they apply.
Using the method of annotation described above, it is not possible to annotate data or views in application programs for which there are no well-known, relatively static (single or multiple) stored object structures that the annotation system can use to map a point of reference to the annotation. Furthermore, in many application programs the objects used to render a view to the user are transitory and are established dynamically during the user's interaction with the application (for instance, a user navigating web pages). Consequently, there is no persisting object that the annotation store can use as the reference to which an annotation may be usefully related. This is particularly true of systems where the application program does not include a pre-designed annotation capability as part of its standard functionality.
There is a need to be able to annotate unstructured or transient objects.