The present invention is directed to a system and method for cone-beam reconstruction in medical imaging or the like and more particularly to such a system and method implemented on one or more microprocessors. The present invention is also useful for nondestructive testing, single photon emission tomography and CT-based explosive detection, micro CT or micro cone beam volume CT, etc.
Cone-beam reconstruction has attracted much attention in the medical imaging community. Examples of cone-beam reconstruction are found in the commonly assigned U.S. Pat. Nos. 5,999,587 and 6,075,836 and U.S. patent application Ser. Nos. 09/589,115 and 09/640,713, whose disclosures are hereby incorporated by reference in their entireties into the present disclosure.
CT (computed tomography) image reconstruction algorithms can be classified into two major classes: filtered backprojection (FBP) and iterative reconstruction (IR). Filtered backprojection is more often discussed because it is accurate and amenable to fast implementation. Filtered backprojection can be implemented as an exact reconstruction method or as an approximate reconstruction method, both based on the Radon transform and/or the Fourier transform.
The cone beam reconstruction process is time-consuming and requires a large number of computing operations. Currently, the cone beam reconstruction process is prohibitively long for clinical and other practical applications. Considering a set of data with projection size N=512, since the time and computation for FBP is O(N^4), the reconstruction requires on the order of gigaflops (GFLOPS) of computation. Usually, fast cone beam reconstruction can be achieved through the use of an improved algorithm and a faster computing engine.
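The O(N^4) estimate above can be made concrete with a short calculation. The following is a minimal sketch, assuming that backprojection visits each of the N^3 voxels once per projection and that the number of projections scales with N; the function name is hypothetical, and the 288-projection case uses the test configuration reported elsewhere in this disclosure.

```python
def backprojection_updates(n, n_projections=None):
    """Count voxel updates: each of the n**3 voxels is visited once per projection."""
    if n_projections is None:
        n_projections = n  # assumption: the number of views scales with N, giving N**4
    return n ** 3 * n_projections


N = 512
worst_case = backprojection_updates(N)           # N**4
with_288_views = backprojection_updates(N, 288)  # 512**3 voxels, 288 projections

print(f"N^4 for N={N}: {worst_case:.3e} voxel updates")
print(f"288 projections: {with_288_views:.3e} voxel updates")
```

Each voxel update costs several arithmetic operations (coordinate mapping, interpolation, weighting, accumulation), so the total operation count is a small multiple of these figures.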
Existing fast algorithms for reconstruction are based on either the Fourier Slice Theorem or a multi-resolution re-sampling of the backprojection. Algorithms based on the Fourier Slice Theorem use interpolation to transform the Fourier projection data from the polar grid to the Cartesian grid, from which the reconstruction can be obtained by an inverse FFT. Much work has been done to reduce the FBP time, most of it focused on fan-beam data. That work includes the linogram method and the “links” method as well as related fast methods for re-projection. An approximate method has been proposed based on the sinogram and “link”; such a method works for 2D FBP and can achieve O(N^2 log N) complexity. The “link” method has been extended to 3D cone-beam FBP; after the projection data in each row are rebinned, the same method as in 2D can be applied to the rebinned data, and the data processing time can be brought down to O(N^3 log N) complexity for cone beam reconstruction. Another fast algorithm has been presented, using Fast Hierarchical Backprojection (FHBP) algorithms for 2D FBP, which address some of the shortcomings of existing fast algorithms. FHBP algorithms are based on a hierarchical decomposition of the Radon transform and need O(N^2 log N) computing complexity for reconstruction. Unfortunately, experimental evidence indicates that for reasonable image sizes, N≈10^3, the realized performance gain over the more straightforward FBP is much less than the potential N/log N speedup. There is also a loss in reconstruction quality compared with the Feldkamp algorithm. In a real implementation, the total reconstruction time depends not only on the computing complexity, but also on the loop unit time. The 3D cone beam FBP mentioned above, which uses the link method, needs additional memory space to store the “link” area. The link reconstruction table area, which contains interpolation coefficients and address information for accessing the “link” data, takes O(N^3) additional memory and lowers performance because of the memory access time. As a result, the speed-up is smaller than N/log N.
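To give a sense of scale for the potential N/log N speedup and for the O(N^3) table discussed above, the following sketch evaluates both for two image sizes. It is illustrative only: the logarithm is assumed to be base 2, constant factors are ignored, and the table layout (one 4-byte entry per voxel) is a hypothetical assumption rather than the layout used by any of the methods cited.

```python
import math


def potential_speedup(n):
    """Ratio of direct FBP work to N-log-N-type fast FBP work, ignoring constants."""
    return n / math.log2(n)


def link_table_bytes(n, bytes_per_entry=4):
    """Size of an O(N^3) table, assuming one entry per voxel (hypothetical layout)."""
    return n ** 3 * bytes_per_entry


for n in (512, 1024):
    mib = link_table_bytes(n) / 2 ** 20
    print(f"N={n}: potential speedup ~{potential_speedup(n):.1f}x, table ~{mib:.0f} MiB")
```

Under these assumptions the table for N=512 already occupies a large share of the main memory of a typical PC of the period, which is consistent with the observation that memory access time erodes the theoretical gain.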
A customized backprojection hardware engine having parallelism and pipelining of various kinds can push the execution speed to the very limit. The hardware can be an FPGA-based module, an ASIC module, a customized mask-programmable gate array, a cell-based IC, a field programmable logic device or an add-in board with high-speed RISC or DSP processors. Those boards usually use high-speed multi-port buffer memory or a DMA controller to increase the speed of data exchange between boards. Some techniques, such as vector computing and pre-interpolating projection data, are used with the customized engine to reduce the number of reconstruction operations. Most of the customized hardware is built for 2D FBP reconstruction applications. No reconstruction engine that is based on a single microprocessor or multiple microprocessors and that is specially designed for fast cone beam reconstruction is commercially available.
A multi-processor computer or a multi-computer system can be used to accelerate the cone beam reconstruction algorithm. Many large-scale parallel computers have tightly coupled processors interconnected by high-speed data paths. The multi-processor computer can be a shared memory computer or a distributed memory computer. Much work has been done on large-scale and extremely expensive parallel computers. Most of that work uses an algorithm based on the 3D Radon transform. As an example, the Feldkamp algorithm and two iterative algorithms, 3D ART and SIRT, have been implemented on large-scale computers such as the Cray T3D, Paragon and SP1. In such implementations, the local data partition is used for the Feldkamp algorithm and the SIRT algorithm, while the global data partition is used for the ART algorithm. The implementation is voxel driven. The communication speed between processors is important to the reconstruction time, and the Feldkamp implementation achieves its best performance on Multiple Instruction Multiple Data (MIMD) computers. Parallel 2D FBP has been implemented on Intel Paragon and CM5 computers. Using customized accelerating hardware or a large-scale parallel computer is not a cost-effective solution for fast reconstruction, and it is not convenient to modify an algorithm or add a new one for research work.
In a distributed computing environment, many computers can be connected together to work as a multi-computer system. The computing tasks are distributed to each computer. Usually the parallel program running on a multi-computer system uses a standard library such as MPI (message passing interface) or PVM (parallel virtual machine). Parallel reconstruction has been tested on a group of Sun Sparc2 computers connected by an Ethernet network, with an implementation based on the PVM library. The Feldkamp algorithm has been implemented on heterogeneous workstation clusters based on the MPI library. The implementation runs on six computer clusters, and the results show that load balancing resulted in processor utilization of 81.8%, and that the use of asynchronous communication improved processor utilization to 91.9%. The biggest disadvantage of multi-computer clusters is that communication overhead decreases reconstruction speed. Since cone beam reconstruction involves a large amount of data in memory, the data is usually distributed among the computers, and the computers need to exchange data in the backprojection phase. That exchange of data comes at a significant cost to reconstruction speed. Another disadvantage is that a small-size reconstruction engine cannot be built from multi-computer clusters. There have also been some attempts to implement cone beam reconstruction on distributed computing technology such as CORBA (Common Object Request Broker Architecture). Usually such a distributed computing library incurs more communication overhead than direct use of the MPI library, thus resulting in lower reconstruction speed.
Besides parallelism between processors, a single processor can gain data and operation parallelism with some micro-architecture techniques. Instruction-level parallelism (ILP) is a family of processor and compiler design techniques that speed up execution by causing individual machine operations to execute in parallel. Modern processors can divide instruction execution into several stages; techniques such as pipelining and branch prediction permit the execution of multiple instructions simultaneously. To enable data processing parallelism, some processors add single instruction multiple data (SIMD) instructions, which can process several data elements with one instruction. Such processors include Intel's IA-32 architecture with MMX™ and SSE/SSE2, Motorola's PowerPC™ with AltiVec™ and the AMD Athlon with 3DNow!™. However, to date, such parallelism has not been exploited in cone-beam reconstruction.
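The idea of processing several data elements with one instruction can be illustrated in software. The following sketch emulates a packed 16-bit addition of the kind performed in hardware by an MMX packed-add instruction: four 16-bit values are packed into one 64-bit word and added with a single integer addition, with carries prevented from crossing lane boundaries. It is only an emulation for explanatory purposes; the function names are hypothetical, and a real SIMD unit performs the lane-wise addition directly.

```python
LANES = 4
LANE_BITS = 16
LANE_MASK = (1 << LANE_BITS) - 1
HIGH_BITS = 0x8000_8000_8000_8000  # top bit of each 16-bit lane


def pack(values):
    """Pack four unsigned 16-bit integers into one 64-bit word (lane 0 lowest)."""
    word = 0
    for i, v in enumerate(values):
        word |= (v & LANE_MASK) << (i * LANE_BITS)
    return word


def unpack(word):
    """Split a 64-bit word back into its four 16-bit lanes."""
    return [(word >> (i * LANE_BITS)) & LANE_MASK for i in range(LANES)]


def packed_add(a, b):
    """Add four 16-bit lanes at once with wraparound; carries never cross lanes."""
    low = (a & ~HIGH_BITS) + (b & ~HIGH_BITS)  # add the low 15 bits of each lane
    return low ^ ((a ^ b) & HIGH_BITS)         # fold in the top bits without carry


a = pack([1, 2, 3, 65535])
b = pack([10, 20, 30, 1])
print(unpack(packed_add(a, b)))  # the last lane wraps around to 0
```

One addition on the packed words replaces four scalar additions, which is the source of the data parallelism described above.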
In light of the above, it will be readily apparent that a need exists in the art to perform cone-beam reconstruction at a practically acceptable speed without the need for customized hardware or a large-scale computer. It is therefore an object of the invention to provide a system and method for cone-beam reconstruction which can be performed quickly on inexpensive, widely available equipment.
To achieve the above and other objects, the present invention is directed to a practical implementation of high-speed cone-beam reconstruction (CBR) on a commercially available PC based on hybrid computing (HC). Feldkamp CBR is implemented with multi-level acceleration, performing HC utilizing single instruction multiple data (SIMD) and making the execution units (EU) in the processor work effectively. The multi-thread and fiber support in the operating system can be exploited, which automatically enables reconstruction parallelism in a multi-processor environment and also makes data I/O to the hard disk more effective. Memory and cache access are optimized by proper data partitioning. Tested on an Intel Pentium III 500 MHz computer and compared to the traditional implementation, the present invention can decrease filtering time by more than 75% for 288 projections each having 512^2 data points and can save more than 60% of the reconstruction time for a 512^3 cube, while maintaining good precision with less than 0.08% average error. The resulting system is cost-effective and high-speed. An effective reconstruction engine can be built with a commercially available Symmetric Multi-processor (SMP) computer, which is easy and inexpensive to upgrade along with newer PC processors and memory with higher access speed.
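As an illustration of data partitioning combined with multi-threading, the following sketch divides the slices of a reconstruction volume into contiguous slabs and hands each slab to a worker thread. It is a minimal sketch under stated assumptions, not the partitioning scheme of the invention: the slab-per-thread layout, the function names and the placeholder per-slab work are all hypothetical, and in Python the threads illustrate only the structure of the work division, not a speed gain.

```python
from concurrent.futures import ThreadPoolExecutor


def partition_slices(n_slices, n_workers):
    """Split range(n_slices) into contiguous, nearly equal (start, stop) slabs."""
    base, extra = divmod(n_slices, n_workers)
    slabs, start = [], 0
    for w in range(n_workers):
        stop = start + base + (1 if w < extra else 0)
        if stop > start:  # skip empty slabs when there are more workers than slices
            slabs.append((start, stop))
        start = stop
    return slabs


def backproject_slab(slab):
    """Placeholder for per-slab work; here it only counts the slices it owns."""
    start, stop = slab
    return stop - start


def run(n_slices=512, n_workers=4):
    """Partition the volume and process every slab on its own worker thread."""
    slabs = partition_slices(n_slices, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        done = list(pool.map(backproject_slab, slabs))
    return slabs, sum(done)


slabs, total = run()
print(slabs, total)
```

Because each slab covers a contiguous block of slices, each worker touches a compact region of the volume, which is the kind of locality that favors cache reuse.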
In the present invention, the Feldkamp algorithm cone beam reconstruction (FACBR) can achieve high speed with good precision. The test environment is an Intel Pentium III 500 MHz computer with 640 MB of 100 MHz memory. The results show that the reconstruction of a 512^3 cube from 288 projections can be finished in less than 20 minutes while maintaining good precision, whereas the old implementation required more than 100 minutes. Several simulated phantoms have been used to test the precision of the HC FACBR. A comparison of the reconstructed image with a simulated phantom image and with images reconstructed by the traditional method shows less than a 0.04% average error relative to the traditional-method images and good precision relative to the computer-simulated phantoms. A linear attenuation coefficient distribution of a three-dimensional object can be reconstructed quickly and accurately.
A higher-speed SSE2-enabled Pentium IV and a 2- or 4-processor PC are expected to permit 512^3 cube FACBR in a few minutes in the future. FACBR is implemented with multi-level acceleration and hybrid computing utilizing SIMD and ILP technology. Memory and cache access are optimized by proper data partitioning. Compared to implementation on a large-scale computer or on computer clusters, the present invention is cost-effective and high-speed. A commercially available SMP computer provides an effective reconstruction engine which is easy and inexpensive to upgrade along with newer PC processors. By contrast, custom-built hardware is expensive and very difficult to upgrade.
A high-speed implementation will be disclosed for FACBR on a PC. Techniques for hybrid execution (HE) and hybrid data (HD) will also be disclosed. With these hybrid computing features, good memory organization and instruction optimization, a high-speed Feldkamp implementation can be realized on a general purpose PC with a high performance-to-price ratio. The HD and HE techniques can also be applied to implementations on other hardware platforms to improve the FACBR performance. With higher clock frequency processors and an inexpensive, commercially available SMP PC, it is possible to achieve performance comparable to that of expensive, inconvenient customized hardware. Because a commercially available PC is used to achieve high performance, it is convenient to design new algorithms and a new system for cone beam reconstruction, and it is possible to integrate an image grab system and a 3D rendering system in a single system which is easy to configure and upgrade.
As Intel x86 CPU frequency has increased to the GHz level, it is practical and economically feasible to build a Multi-Processor x86-based high-speed cone beam reconstruction computing engine. Although the Feldkamp algorithm is an approximate cone beam reconstruction algorithm, it is a practical and efficient 3D reconstruction algorithm and is a basic component in a few exact cone-beam reconstruction algorithms including the present invention.
The present invention implements parallel processing on a single microprocessor or on multiple processors. The use of hybrid computing (both fixed and floating point calculation) accelerates the cone-beam reconstruction without reducing the accuracy of the reconstruction and without increasing image noise. Those characteristics are particularly important for the reconstruction of soft tissue, e.g., for cancer detection.
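The combination of fixed and floating point calculation can be illustrated with detector interpolation, a step that occurs in any backprojection. In the following sketch the detector coordinate is converted to a fixed-point number, so that the sample index and the interpolation weight are obtained with an integer shift and mask, while the projection values themselves remain in floating point. This is a minimal sketch under stated assumptions and not the hybrid scheme of the invention: the 8-bit fractional precision, the function names and the sample data are hypothetical.

```python
FRAC_BITS = 8          # assumed number of fractional bits in the fixed-point coordinate
ONE = 1 << FRAC_BITS   # fixed-point representation of 1.0


def split_fixed(u):
    """Convert a non-negative coordinate to (integer index, fixed-point weight)."""
    q = int(round(u * ONE))              # coordinate in fixed point
    return q >> FRAC_BITS, q & (ONE - 1)


def sample_hybrid(row, u):
    """Linear interpolation with an integer weight and floating-point data."""
    i, w = split_fixed(u)
    return (row[i] * (ONE - w) + row[i + 1] * w) / ONE


def sample_float(row, u):
    """Reference linear interpolation carried out entirely in floating point."""
    i = int(u)
    t = u - i
    return row[i] * (1 - t) + row[i + 1] * t


row = [0.0, 10.0, 20.0, 40.0]
for u in (1.25, 2.3):
    h, f = sample_hybrid(row, u), sample_float(row, u)
    print(f"u={u}: hybrid={h:.6f} float={f:.6f} diff={abs(h - f):.6f}")
```

The difference between the two results is bounded by the quantization of the weight, at most half of one fixed-point step times the difference between neighboring samples, so the number of fractional bits controls the trade-off between speed and precision.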