The present invention relates to data processing systems and methods, and in particular to a method and a system for handling processor-intensive operations in a data processing system such as a computer-based data retrieval system. The invention is useful for allocating resources in a data processing system and/or for reducing data access delays.
Data libraries which are accessible from any one of a large number of computer terminals within a distributed network are well known in the art. As the Internet, intranets and the World Wide Web have gained in popularity, and imaging systems have become more widely available, libraries of images and other data objects have been stored on server computers and connected for access by many people via the Internet or intranet. Individuals can, for example, request copies of required objects such as images using a Web Browser installed on a client computer connected to the Internet.
Making data objects such as images accessible to many users can create opportunities for malicious parties to replace the images on the server with forgeries, or to intercept and replace images that have been transmitted to others. In addition, with the development of sophisticated image editing software, tools which allow easy alteration of the content of any digital image are widely available. Thus, the integrity of a digitally stored image may be in question unless safeguards are provided.
Watermarking and digital signature techniques have been developed which allow verification of the source and integrity of an image (i.e. detection of any changes to it), as well as providing a means for checking that the user is authorised to access the image (by only allowing access if the requester knows the digital signature), and enabling subsequent identification of unauthorised copies. Thus, watermarking and digital signature techniques are beneficial to image owners and licensors and can also be beneficial to those requesting image access.
An end user may specify a digital signature of an image when requesting the image file either a) if the data retrieval system requires this before it will deliver the file or b) if the user wishes to check the authenticity and integrity of the image before it is delivered to them. Additionally, a data retrieval server may be adapted to perform dynamic digital signature checking of images even if the user is not required to know the signature.
In addition to digital signature checking, a server computer may be adapted to dynamically execute a watermarking process, either for all stored objects or only for certain categories of object or certain categories of requester, when it receives a request for a copy of an object. This watermarking enables subsequent identification and verification. Since images as stored in a digital image library may not have watermarks, it may be desired to watermark the images prior to distribution of copies in order to embed information such as the identity of either the distributor or the requester.
A problem which arises with systems which perform watermarking dynamically (i.e. when an object is requested from a repository, rather than when the object is stored in the repository or earlier, which is the conventional approach) is the time delay that an end user can experience while waiting for the object to be delivered. Dynamic watermarking is a computer processor-intensive operation leading to the possibility of delays in object retrieval. These delays increase with the number of requests being processed concurrently.
Such delays may be considered particularly undesirable for users who have requested objects for which no signature checking or watermarking is required, since in this case their retrieval and delivery operations involve relatively little processing but can be significantly delayed by digital watermarking of objects for other users.
It is a first aspect of the present invention to provide a method and a system for data retrieval, the method including:
responsive to receipt of data retrieval requests by the data retrieval system, identifying requested data retrieval operations requiring a first predetermined task to be performed;
inputting into a first processing queue only the data retrieval operations requiring said first predetermined task to be performed;
handling data retrieval operations which do not require said first predetermined task to be performed without performing said first predetermined task; and
processing the data retrieval operations in the first processing queue, including performing said first predetermined task.
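The steps of the first aspect can be illustrated by a minimal sketch. It is not the claimed implementation. The `Request` record, its `needs_watermark` flag, and the `dispatch` and `process_queue` helpers are hypothetical names introduced only for illustration, and the watermarking itself is simulated by appending a suffix to the object identifier.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Request:
    user_id: str
    object_id: str
    needs_watermark: bool  # hypothetical flag standing in for the identification step


def dispatch(requests):
    """Put only watermark-requiring operations into the first queue."""
    first_queue = deque()
    delivered = []
    for req in requests:
        if req.needs_watermark:
            first_queue.append(req)
        else:
            # Handled without performing the first predetermined task.
            delivered.append(req.object_id)
    return first_queue, delivered


def process_queue(first_queue):
    """Process queued operations, performing the (simulated) watermarking task."""
    done = []
    while first_queue:
        req = first_queue.popleft()
        done.append(f"{req.object_id}+wm")
    return done


requests = [
    Request("alice", "img1", True),
    Request("bob", "doc7", False),
    Request("alice", "img2", True),
]
first_queue, fast = dispatch(requests)
slow = process_queue(first_queue)
```

In this sketch the request for `doc7` is delivered without ever waiting behind the two watermarking operations.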
A data retrieval method according to the invention preferably also includes inputting into a second processing queue separate from said first queue data retrieval operations which require a second predetermined task but do not require said first predetermined task, and processing said first and second queues independently.
The first predetermined task is preferably the performance of a watermarking process. By separating simple retrievals from retrieve-and-watermark operations, which are processor-intensive, it is possible to prevent retrieval operations (and hence users) not requiring watermarking from being overly impacted by the resource demands of users whose requested data objects do require watermarking.
Methods and systems according to one embodiment of the invention determine for each received request both whether watermarking is required and whether a second post-retrieval processing task is required (such as digital signature checking, or conversion from TIFF format to JPEG format). Simple data retrieval operations and retrieval-with-signature-checking operations can then be separated from retrieval-with-format-conversion operations, and all of these are separated from retrieval-with-watermarking operations.
Alternative embodiments of the invention may separate processing operations according to both the size of the requested object and the particular tasks to be performed on it.
Thus, requested data retrieval operations are separated for processing according to the predicted resource requirements of carrying out that processing. This facilitates improved resource allocation and enables a reduction of the delay experienced for operations which are not processor intensive.
The data retrieval operations requiring performance of one or more predetermined tasks are preferably identified by comparing a data-object-identifier obtained from a received request with a table of object types and required processing tasks for each type.
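The table lookup described above might be sketched as follows. The table contents, the task names, and the convention that an object identifier has the form `type:name` are all assumptions made for illustration; the source does not specify how an object type is derived from the identifier.

```python
# Hypothetical table of object types and the processing tasks each requires.
REQUIRED_TASKS = {
    "photo": ("signature_check", "watermark"),
    "scan_tiff": ("convert_to_jpeg",),
    "text": (),
}


def object_type(object_id: str) -> str:
    # Assumed identifier convention: "<type>:<name>".
    return object_id.split(":", 1)[0]


def tasks_for(object_id: str):
    """Look up the required tasks; unknown types require none."""
    return REQUIRED_TASKS.get(object_type(object_id), ())


def queue_for(object_id: str) -> str:
    """Choose a destination according to the required tasks."""
    tasks = tasks_for(object_id)
    if "watermark" in tasks:
        return "first"   # processor-intensive queue
    if tasks:
        return "second"  # other post-retrieval tasks
    return "direct"      # delivered without post-retrieval processing
```

Treating unknown object types as requiring no tasks is a design choice of this sketch only; a real system might instead reject such requests.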
The first and second processing queues are preferably circularly-linked lists to which requested data retrieval operations are added, the operations being indexed according to the IDs of their end user requesters. Each node in the first ring (circularly-linked list) contains a list of all of the operations to be performed for a respective user which require the processor-intensive first post-retrieval task, and each node in the second ring contains a list of all of the operations to be performed for a respective user which require a second post-retrieval task.
The independent processing of each circularly-linked list preferably involves serving each of the plurality of users on a 'round-robin' basis, under the control of a scheduler, with a predetermined amount of processing being performed for each user each time that user's tasks in the respective linked-list are given attention. This "predetermined amount of processing" is preferably either an equal amount of processing time (e.g. 2 seconds of CPU time) for each user, or the completion of one or more particular processing tasks for one or more requested objects for each user. Alternatively, predetermined but different amounts of processing may be performed for different categories of user.
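One possible shape for such a ring is sketched below, assuming that the predetermined amount of processing is the completion of exactly one operation per user per turn. The `UserNode` and `Ring` classes and their methods are hypothetical, and the removal of a user's node once its list is empty is a choice made for this sketch rather than something the source prescribes.

```python
from collections import deque


class UserNode:
    """One ring node: the pending operations of a single user."""

    def __init__(self, user_id):
        self.user_id = user_id
        self.operations = deque()
        self.next = self  # a lone node forms a ring with itself


class Ring:
    def __init__(self):
        self.prev = None  # most recently served node; prev.next is served next
        self.nodes = {}   # index from user ID to ring node

    def add(self, user_id, operation):
        node = self.nodes.get(user_id)
        if node is None:
            node = UserNode(user_id)
            self.nodes[user_id] = node
            if self.prev is None:
                self.prev = node
            else:
                # Link the new user in so it is served last in the current cycle.
                node.next = self.prev.next
                self.prev.next = node
                self.prev = node
        node.operations.append(operation)

    def serve_next(self):
        """Perform one operation for the next user in circular sequence."""
        if self.prev is None:
            return None
        node = self.prev.next
        op = node.operations.popleft()
        if node.operations:
            self.prev = node
        else:
            del self.nodes[node.user_id]
            if node is self.prev:      # it was the only node in the ring
                self.prev = None
            else:
                self.prev.next = node.next  # unlink the emptied node
        return node.user_id, op


ring = Ring()
for user, op in [("alice", "a1"), ("alice", "a2"), ("alice", "a3"),
                 ("bob", "b1"), ("carol", "c1"), ("carol", "c2")]:
    ring.add(user, op)

order = []
while (served := ring.serve_next()) is not None:
    order.append(served[1])
```

Although alice submitted three operations before bob and carol submitted any, each user receives one unit of processing per turn, so bob and carol are not held up behind all of alice's work.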
It is a second aspect of the present invention to provide a method and a system for allocating data processing resources between users of a data retrieval system, the method including:
inputting data retrieval requests to a scheduler within the data retrieval system;
identifying requested data retrieval operations requiring performance of a predetermined processor-intensive task (e.g. watermarking);
retrieving requested data objects from a repository;
inputting into a first circularly-linked-list the data retrieval operations requiring performance of said predetermined task, the operations being indexed according to their end user requesters;
handling data retrieval operations which do not require performance of said predetermined task without performing said predetermined task; and
processing data retrieval operations in said first circularly-linked-list, including performing said predetermined task, in a circular sequence such that a predefined unit of processing is performed for each end user requester in turn.
The handling of data retrieval operations which do not require said predetermined processor-intensive task may simply involve delivery to the requester without performance of post-retrieval processing. Alternatively or additionally, data retrieval operations which do not require the first processor-intensive task but do require a different post-retrieval task (e.g. TIFF to JPEG format conversion, or digital signature checking) may be input into a second circularly-linked-list separate from said first circularly-linked-list, the operations in the second list being indexed according to their end user requesters. The second list is then processed in a circular sequence independently of processing of the first list such that for each list a predefined unit of processing is performed for each end user requester in turn.
The time that users spend waiting for objects from a digital object library is preferably also reduced by limiting the system to only one CPU-intensive watermarking process per processor. If multiple processors are available (on one or more computers), each may run a watermarking process such that there may exist as many watermarking threads as there are processors to run them without adversely affecting processing throughput.
Limiting to one watermarking thread per processor can avoid the reduction in overall throughput which may result from multiple processor-intensive threads running simultaneously on a single processor. This is achieved in a preferred embodiment of the invention by means of a control process which only starts one such thread on each processor and which obtains new tasks from the watermarking process input queue only when a previous watermarking operation is complete. This is distinguished from the alternative approach of invoking new instances of the watermarking process whenever a request is received and there is no currently available instance of the watermarking process to handle it.
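The control pattern described above, in which a fixed set of workers is started once and each worker takes a new task from the input queue only when its previous task is complete, might be sketched as follows. The `watermark` function is a stand-in, `num_processors` is a parameter supplied by the caller, and Python threads are used purely to show the control flow: they are not bound to individual processors, so this sketch does not demonstrate the throughput benefit itself.

```python
import queue
import threading


def watermark(obj):
    # Stand-in for the processor-intensive watermarking operation.
    return f"{obj}+wm"


def run_workers(objects, num_processors):
    """Start exactly one worker per processor; each pulls tasks one at a time."""
    tasks = queue.Queue()
    for obj in objects:
        tasks.put(obj)

    results = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                # A new task is taken only after the previous one has finished.
                obj = tasks.get_nowait()
            except queue.Empty:
                return
            marked = watermark(obj)
            with lock:
                results.append(marked)

    threads = [threading.Thread(target=worker) for _ in range(num_processors)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


marked = run_workers(["img1", "img2", "img3"], num_processors=2)
```

No additional worker is ever created in response to an incoming request; requests that arrive while all workers are busy simply wait in the queue.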
Because dynamic watermarking is so processor-intensive, only one such operation can be effectively handled at one time. Simultaneously running two watermarking processes on the same processor will cause the tasks to take more than twice as long as each would take if run serially, because there is a small delay each time usage of the processor is switched. Thus, the user will experience a delay whichever one of two objects he wants to access. For example, if an object takes 10 seconds to watermark, running two watermarking threads in parallel on a single processor may take 24 seconds, rather than the 20 seconds taken to watermark them one at a time.
The limitation to one watermarking thread per processor is particularly advantageous when implemented in accordance with a further aspect of the invention. In this further aspect, images are stored in a hierarchical arrangement of pages within page sets within folders, each page including a full image, a thumbnail image and a digital signature. The system's response to a user's request for access to a page set containing images is to perform the following steps:
(i) to retrieve from the repository all of the thumbnails within pages of the page set and to send the thumbnails to the requester;
(ii) to retrieve from the repository all of the full images within pages of the page set and to initiate digital signature checking and dynamic watermarking of the full images in anticipation of a subsequent user request for a selected one or more of said full images.
Because the watermarking is performed by a single thread per processor, the potential for overloading the processor is reduced and there is an increased likelihood of some of the full watermarked images being available in the server's cache memory within an acceptable time.
Furthermore, the invention according to a preferred embodiment uses a combination of (i) fetching of full images for watermarking in advance of end-user requests for specific images, (ii) initiation of watermarking of these full images by means of a single process on each processor prior to user selection of a specific image, and (iii) promotion of an image within the queue of images to be watermarked in response to that image being selected by the end user from the set of thumbnails. Thus, a selected image is promoted so as to be the next in line for processing following completion of a current in-progress watermarking task.
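The promotion behaviour can be illustrated with a small sketch for a single processor. The `WatermarkQueue` class, its method names, and the page identifiers are hypothetical, and watermarking is again simulated by appending a suffix.

```python
from collections import deque


class WatermarkQueue:
    """Pre-fetched images awaiting watermarking by a single process."""

    def __init__(self, prefetched):
        self.pending = deque(prefetched)
        self.in_progress = None
        self.cache = {}  # watermarked images ready for delivery

    def start_next(self):
        """Begin watermarking the image at the head of the queue."""
        self.in_progress = self.pending.popleft() if self.pending else None
        return self.in_progress

    def finish_current(self):
        """Complete the in-progress task and cache the result."""
        if self.in_progress is not None:
            self.cache[self.in_progress] = f"{self.in_progress}+wm"
            self.in_progress = None

    def promote(self, image_id):
        """Move a user-selected image to be next in line.

        The in-progress task is not interrupted.
        """
        if image_id in self.pending:
            self.pending.remove(image_id)
            self.pending.appendleft(image_id)
            return True
        return False


q = WatermarkQueue(["page1", "page2", "page3", "page4"])
first = q.start_next()          # watermarking of page1 begins
promoted = q.promote("page4")   # the user selects the thumbnail of page4
q.finish_current()              # page1 completes
second = q.start_next()         # page4 is processed ahead of page2 and page3
```

The selected image waits only for the one task already in progress, rather than for every image ahead of it in the pre-fetch order.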
By allowing for promotion in response to user selection, in a system which pre-fetches images for processing and limits to one processor-intensive process running on each processor, user access delays are significantly reduced.
A method, a data retrieval system, or a resource manager according to the invention may be implemented within a computer program product comprising computer readable program code stored on a computer readable storage medium.
The invention is particularly suitable for use with computer-based image libraries, but may be used for retrieval by a data retrieval system of any data objects where some of the data retrieval requests require a relatively slow processing operation to be performed and others do not; undue delays of the 'fast' retrieval requests can be avoided by separating their processing from the processing of the 'slow' retrieval requests.