In an x-ray system for angioplasty (angiography), a sequence of x-ray pulses is generated, with which the human body is irradiated. In an image recording system (image intensifier, flat-panel detector or suchlike), an ‘image’ corresponding to the degree of x-ray absorption is generated for each of these x-ray pulses. Such image sequences can also originate from memory units or external signal sources (including other modalities such as MR, CT or ultrasound). These digital x-ray images are subsequently processed in an image processing system (IPS), with different algorithms being used for image enhancement purposes. The images thus processed are in turn output in temporal sequence on a monitor for diagnosis purposes.
It is characteristic of an IPS that the frequency of the x-ray pulses is fixed or variable and typically lies in a range of 1 to 60 images per second. A requirement for an IPS is that only a specific, predetermined time span, the so-called latency of the IPS, may elapse between the arrival of the image information from the image recording system and the display of this image on the monitor.
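The timing constraint can be made concrete with a small calculation. The following is a minimal sketch only: the function names and the latency values used with it are assumptions chosen for illustration and are not taken from the system described here.

```python
import math


def frame_period_ms(frames_per_second: float) -> float:
    """Time between two successive images, in milliseconds."""
    if frames_per_second <= 0:
        raise ValueError("frame rate must be positive")
    return 1000.0 / frames_per_second


def frames_in_flight(frames_per_second: float, latency_ms: float) -> int:
    """Approximate number of images inside the IPS at the same time.

    Assumption: every image spends at most latency_ms between its
    arrival and its display, and images arrive at a constant rate.
    """
    return math.ceil(latency_ms / frame_period_ms(frames_per_second))
```

At 50 images per second the frame period is 20 ms, so any processing step that needs longer than 20 ms per image cannot run on a single computing unit without falling behind the image source.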
An IPS must further allow the image data of the current examination to be stored simultaneously on non-volatile storage (e.g. a hard disk). At a later point in time, this image data can then be reprocessed by the algorithms of the IPS, if necessary by algorithms other than those used during the original recording or with a different parameterization of the algorithms, while the original frame rates must again be maintained. A further application is reproduction at a higher or lower speed.
A typical application is shown in FIG. 1, which illustrates an IPS according to the prior art. Starting from an image source BQ, different algorithms are allocated to different computing units PE1, PE2.1, PE2.2, PE3.1, PE3.2, PE4 as well as to a memory, and the data finally reaches a data sink DS.
The following difficulties arise when an IPS is implemented as shown in FIG. 1.
The computing time of the individual algorithms may differ greatly. Differences of one or more orders of magnitude in the required computing time can easily exist between the individual algorithms.
The overall computing power required to execute the algorithms is very high and can generally no longer be provided by a single processor.
Dividing the work across a multiprocessor system is not a trivial matter. Due to the strict timing of the input image data, a rigidly clocked system is generally used, with the clock corresponding to the image sequence frequency and/or a multiple or fraction thereof.
In different applications, the configuration of the processing chain can differ markedly, for instance when the data does not come from the image recording system (live) but instead from the non-volatile storage (replay).
As a result of the high level of computing power required, a so-called application-specific integrated circuit (ASIC) is particularly suited to implementing the algorithms. However, the effort involved in designing such a circuit is enormous, and the production costs are only amortized at very high quantities. Such a circuit also cannot be changed afterwards, for instance when an improved algorithm has been found.
With a realization in a programmable logic device (FPGA), the costs are amortized at a significantly lower quantity. The development costs are also considerably lower than with an ASIC. However, the expandability of the algorithms is significantly restricted by the predetermined number of logic elements available in an FPGA. The design of the logic is not simple and is mastered only by specialists.
The use of programmable processors (universal processor CPU or signal processor DSP) achieves a significant simplification. An IPS typically uses a number of processors to process the image data. The necessary computing power is achieved by distributing the overall task across several processors. A pipeline of processing steps is frequently established here, as is shown in FIG. 1. The known techniques for using several processors are the sequencing of processing steps and the division of data among several identically operating processing stages (striping), with the individual partial results being combined again by means of interleaving.
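Striping and interleaving can be sketched in a few lines. This is one possible reading of the technique, assuming that successive images are distributed round-robin over identical lanes and merged back into their original order; the helper names `stripe`, `interleave` and `process_striped` are hypothetical, and worker threads merely stand in for separate processors.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def stripe(items: Sequence[T], n: int) -> List[List[T]]:
    """Distribute items round-robin over n lanes (lane k gets items k, k+n, ...)."""
    if n < 1:
        raise ValueError("at least one lane is required")
    return [list(items[k::n]) for k in range(n)]


def interleave(lanes: Sequence[Sequence[R]]) -> List[R]:
    """Inverse of stripe: merge the lane results back into the original order."""
    total = sum(len(lane) for lane in lanes)
    n = len(lanes)
    return [lanes[i % n][i // n] for i in range(total)]


def process_striped(items: Sequence[T], stage: Callable[[T], R], n: int) -> List[R]:
    """Run the same stage on n lanes in parallel and interleave the results."""
    lanes = stripe(items, n)
    with ThreadPoolExecutor(max_workers=n) as pool:
        # Each worker handles one complete lane, as one processor would.
        results = list(pool.map(lambda lane: [stage(x) for x in lane], lanes))
    return interleave(results)
```

Because every lane runs the same processing step, the output order depends only on the interleaving rule and not on which lane finishes first.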
The computing units PE1, PE2.1, . . . (Processing Elements) shown in FIG. 1 can be different kinds of computing units, such as for instance ASICs, FPGAs, DSPs, universal processors, microcontrollers, routers, peripheral controllers and suchlike. In the previous solutions, of which the arrangement shown in FIG. 1 is an example, a specific topology of the data flow is established by physically connecting the individual PEs. The allocation of the algorithms to the computing units is carried out in a design phase, in which the topology, the computing power, the data transfer and the latency requirements of the individual stages as well as of the overall processing have to be taken into account. The previous approaches have generally resulted in a direct mapping of the algorithms onto assigned PEs, as is the case in the design shown in FIG. 1. This direct allocation is normally regarded as a ‘natural’ realization of the sequence of algorithms. Accordingly, the data paths are fixed and direct connections between the PEs are established.
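The kind of check carried out in such a design phase can be illustrated as follows. This is a simplified sketch under stated assumptions: the `Stage` type, the `check_design` function, the stage names and all timing figures are hypothetical, the stages are assumed to form a simple chain, and data transfer times between PEs are ignored.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Stage:
    name: str          # hypothetical label for the PE(s) running one algorithm
    compute_ms: float  # time one image spends in this stage
    lanes: int = 1     # number of identical PEs sharing the stage (striping)


def check_design(stages: Sequence[Stage],
                 frames_per_second: float,
                 max_latency_ms: float) -> List[str]:
    """Return a list of violated constraints for a fixed chain of stages."""
    period_ms = 1000.0 / frames_per_second
    problems = []
    for s in stages:
        # With striping, the stage can accept a new image every compute_ms / lanes.
        if s.compute_ms / s.lanes > period_ms:
            problems.append(f"{s.name}: cannot keep up with {frames_per_second} images/s")
    # Every image passes through all stages, so the compute times add up.
    total_ms = sum(s.compute_ms for s in stages)
    if total_ms > max_latency_ms:
        problems.append(f"latency {total_ms:.1f} ms exceeds budget {max_latency_ms:.1f} ms")
    return problems
```

A chain that passes this check at one image rate may fail at another, which shows why a mapping that is fixed at design time is difficult to reuse when the configuration of the processing chain changes.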