The Internet is the largest network of computers. Large corporations and educational institutions may have their own networks of computers, which may themselves be part of, or apart from, the Internet. Digital data stored on one or more computers (called “Source Data”) may be accessed by one or more other computers and altered by such other computer(s) to generate “Derivative Data”. Often, the Source Data is modified by a computer other than the computer that is requesting the Derivative Data. The Derivative Data may be stored on one or more other computers, which may include all or some of the computers on which the Source Data was stored and all or some of the computers that altered the Source Data. When the Source Data is representative of an image, it is called Source Image Data and the altered data is called Derivative Image Data.
There are many well-known methods of creating Derivative Image Data (“DID”) from Source Image Data (“SID”). Many of these methods consist of applying one or more transformations, T(1), T(2), . . . T(n), to the SID. These transformations may act on one or more SID sets and produce one or more DID sets. For example, if the SID is a digital image with an even number of pixels in each row and an even number of rows, T(1) may be a transformation that “crops the source image to create a new image consisting of the upper right-hand quarter of the source image”. If the SID is a digital image where each pixel consists of three 8-bit numbers, R, B, G, that indicate the red, blue and green intensity values, respectively, for each pixel, T(2) may be a transformation which “interchanges the R and B intensity values”. A derivative image may be created from a source image by performing T(1) and then T(2) and then T(1) on the SID. Other examples of image transformations are the rotation, scaling, filtering and image processing operations contained in Adobe's Photoshop software. Such methods are known as deterministically computable methods: they generate a DID set from a specific set of SID sets by applying a specific set of completely defined transformations in a specific order. By contrast, if the SID set consisted of numbers and a transformation S was “multiply every other number by a random number generated by the local computer”, then this method would not be deterministically computable unless the method of computing the random number was also specified and reproducible.
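The T(1), T(2), T(1) sequence described above can be sketched as follows. This is a minimal illustration only, not a method taken from any particular product: the image is assumed to be a list of rows of (R, B, G) tuples, and the names t1_crop_upper_right, t2_swap_r_and_b, apply_in_order and make_source are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Pixel = Tuple[int, int, int]   # (R, B, G), the ordering used in the text
Image = List[List[Pixel]]      # a list of rows, each row a list of pixels


def t1_crop_upper_right(img: Image) -> Image:
    """T(1): keep the upper right-hand quarter of the image."""
    rows, cols = len(img), len(img[0])
    if rows % 2 or cols % 2:
        raise ValueError("T(1) is only defined for even row and column counts")
    return [row[cols // 2:] for row in img[: rows // 2]]


def t2_swap_r_and_b(img: Image) -> Image:
    """T(2): interchange the R and B intensity values of every pixel."""
    return [[(b, r, g) for (r, b, g) in row] for row in img]


def apply_in_order(img: Image, transforms: Sequence[Callable[[Image], Image]]) -> Image:
    """Apply completely defined transformations in a fixed order."""
    for transform in transforms:
        img = transform(img)
    return img


def make_source(rows: int, cols: int) -> Image:
    """Hypothetical test image: the pixel at row y, column x is (y, x, y + x)."""
    return [[(y % 256, x % 256, (y + x) % 256) for x in range(cols)]
            for y in range(rows)]


source = make_source(4, 4)
recipe = [t1_crop_upper_right, t2_swap_r_and_b, t1_crop_upper_right]
derivative = apply_in_order(source, recipe)
```

Because every step is fully specified and involves no unrecorded randomness, repeating the same recipe on the same source always yields the same derivative. Note that applying T(1) twice requires the intermediate image to have even dimensions as well, which is why the sketch uses a 4 by 4 source.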
There are many standard and proprietary formats for image data. Some data formats do not contain information that describes how the data are to be interpreted. For example, consider a data set D consisting of 512×512×8 bits of data. This data set D may represent a gray-scale image with 256 gray levels at each of the 512×512 pixel sites, or the same data set D may represent balances in bank accounts. Other formats of data include meta-data (that is, data about the data) that enables proper interpretation of the data. For example, there may be a header (another data set) which is attached to the beginning of D, which is text and reads “the data following this text consists of 512×512 bytes of data, each byte of which represents an 8-bit gray level pixel value and the pixels are arranged in an array of 512 rows and 512 columns of pixels with the first pixel value being located at the upper left-hand corner of the image and the subsequent pixels filling the array across rows and down columns.” An alternative is to append a file name extension, such as .jpg or .gif, which indicates that the data in the named file has a standard, well-documented format either known to the public or, in the case of proprietary formats, to authorized users of the format. Many image formats use a combination of the file name extension and header data to provide interpretative information. For example, the .jpg format includes a header structure, and the header structure has a field in which users may insert data, such as a comment, which provides even more meta-data. Some fields of header data may be necessary for the format to conform to its specification and other fields may be optional.
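The ambiguity of headerless data, and the way a header removes it, can be illustrated with a short sketch. Everything named here is an assumption made for illustration: the one-line "GRAY8 rows cols" text header is a made-up format, not a standard one, and the reading of the bytes as 32-bit little-endian account balances is just one arbitrary alternative interpretation.

```python
import struct
from typing import List

# 512 * 512 bytes of raw data with no description of what they mean.
raw = bytes(range(256)) * 1024


def as_gray_image(data: bytes, rows: int = 512, cols: int = 512) -> List[List[int]]:
    """Interpret the bytes as 8-bit gray levels, row by row from the upper left."""
    if len(data) != rows * cols:
        raise ValueError("data length does not match rows * cols")
    return [list(data[r * cols:(r + 1) * cols]) for r in range(rows)]


def as_balances(data: bytes) -> List[int]:
    """Interpret the same bytes as unsigned 32-bit little-endian integers."""
    count = len(data) // 4
    return list(struct.unpack(f"<{count}I", data[: count * 4]))


def with_header(data: bytes, rows: int, cols: int) -> bytes:
    """Prefix a hypothetical one-line text header that says how to read the data."""
    return f"GRAY8 {rows} {cols}\n".encode("ascii") + data


def read_with_header(blob: bytes) -> List[List[int]]:
    """Use the hypothetical header to pick the interpretation of the payload."""
    header, _, payload = blob.partition(b"\n")
    kind, rows, cols = header.decode("ascii").split()
    if kind != "GRAY8":
        raise ValueError(f"unknown data kind: {kind}")
    return as_gray_image(payload, int(rows), int(cols))


image = as_gray_image(raw)
balances = as_balances(raw)
described = read_with_header(with_header(raw, 512, 512))
```

Both as_gray_image and as_balances succeed on the same bytes, which is exactly the problem: nothing in the data says which reading is intended. Once the header is present, the reader no longer has to guess.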
When an application program is written, such as a program to display a .jpg image on a computer screen, the program may be written to ignore optional data in a header. An application program may still properly display the .jpg image, even if it does not use the optional data to display the image. Image data formats that include header field(s) for data which an application program does not need in order to generate an image conforming to the format specification are termed herein “commentable formats”. The element of commentable formats that is important for the present invention is that they provide a mechanism for a program to insert and make use of reasonably large data strings without interfering with the proper interpretation of the formatted data by another, independent program which cannot parse or use the data strings. Although only image data is discussed herein, those skilled in the art will immediately understand that the appended header may be replaced by any mechanism which provides a documented place for meta-data and that such formats include formats for video and audio data, 3-dimensional data such as for CAT-scans, computer graphic data, virtual reality data and such other forms of data that have commentable formats.
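One real instance of such an optional field is the JPEG comment (COM) marker segment: the two marker bytes 0xFF 0xFE, followed by a two-byte big-endian length that counts itself, followed by the comment bytes. The sketch below shows how one program might insert a data string there and read it back. It is an illustration under stated assumptions, not the mechanism of the invention: the function names are hypothetical, the comment is placed after any leading APPn segments as a conservative choice, and the sample byte stream is a skeletal marker sequence rather than a decodable picture.

```python
import struct
from typing import List

SOI = b"\xff\xd8"   # start-of-image marker
EOI = b"\xff\xd9"   # end-of-image marker
COM = 0xFE          # second byte of the comment marker 0xFF 0xFE
SOS = 0xDA          # start of scan; entropy-coded data follows it


def _end_of_app_segments(jpeg: bytes) -> int:
    """Return the offset just past any APPn segments that follow SOI."""
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF and 0xE0 <= jpeg[pos + 1] <= 0xEF:
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        pos += 2 + length
    return pos


def insert_comment(jpeg: bytes, text: bytes) -> bytes:
    """Insert a COM segment carrying `text` after the leading APPn segments."""
    if jpeg[:2] != SOI:
        raise ValueError("not a JPEG stream: missing SOI marker")
    if len(text) > 0xFFFF - 2:
        raise ValueError("comment too long for a single COM segment")
    segment = b"\xff\xfe" + struct.pack(">H", len(text) + 2) + text
    pos = _end_of_app_segments(jpeg)
    return jpeg[:pos] + segment + jpeg[pos:]


def read_comments(jpeg: bytes) -> List[bytes]:
    """Collect the payloads of COM segments that appear before the scan data."""
    comments = []
    pos = 2
    while pos + 4 <= len(jpeg) and jpeg[pos] == 0xFF:
        marker = jpeg[pos + 1]
        if marker in (SOS, 0xD9):
            break
        (length,) = struct.unpack(">H", jpeg[pos + 2:pos + 4])
        if marker == COM:
            comments.append(jpeg[pos + 4:pos + 2 + length])
        pos += 2 + length
    return comments


# Skeletal stream for illustration: SOI, a 16-byte JFIF APP0 segment, EOI.
app0 = b"\xff\xe0" + struct.pack(">H", 16) + b"JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00"
original = SOI + app0 + EOI

note = b"source=example; transforms=T1,T2,T1"   # hypothetical meta-data string
tagged = insert_comment(original, note)
```

A reader that does not understand the comment can skip the segment using its length field, so the rest of the stream is interpreted exactly as before, while a cooperating program can recover the string with read_comments.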
There are many methods that relate to the use of source and derivative images. For example, the Open Prepress Interface (“OPI”) specifies a mechanism by which a user of a compliant document creation program can work with a reduced-size version (derivative) of a high-quality original digital image (source), moving the derivative around in the document (for example, for placement purposes), and then send the document, which includes a file pointer to the source image, to a printer. The printer then replaces the derivative image with the source image in the printed output. However, such methods do not include information as to how the derivative image was generated from the source image, and the file pointer is not universal but specific to a particular file system.
There are many well-known aspects to the management of digital data. One task may be to erase all digital data that has not been read or altered for a year, and such tasks may be done efficiently. However, there are many valuable image management tasks which relate to the relationship of source and derivative images and that cannot now be done efficiently. For example, one of the most popular methods of generating images for the World Wide Web involves the use of Adobe's Photoshop program. Inside Photoshop, images are created in layers with, for example, one layer being a background photo (layer 1), another layer being an inset photo of a sports star (layer 2), another layer being a marketing brand icon (layer 3), another layer being a photo of a product (layer 4) and another layer being text (layer 5). A photo appearing on the Internet may consist of all of these layers, each superimposed on the previous one. One source-derivative data management task may be, for example, to replace all old brand icons appearing on such web images with new brand icons. Currently, except for looking (whether it is done by a person or by a computer image processing program) at every image on every web site (this approach is called the method of exhaustive search), there is no method for completing such a data management task. The method of exhaustive search, carried out by humans, is feasible only on small networks; there are not enough people to carry out an exhaustive search on the Internet within a time period that renders such a search useful to people and corporations. The method of exhaustive search, as carried out by computers, is only feasible when one imposes very restrictive conditions on the derivative data sets.
For example, when brand images are arbitrarily rotated, scaled and filtered, even if such transformations are limited only to those enabled by the Photoshop program, no known computer program can identify such transformed brand images as being derived from source brand images.
What is needed is a system and method for identifying the source sets used to generate derivative images and the transformations used for generating such images.