As is known in the art, Magnetic Resonance Imaging (MRI) is based on the physical phenomenon of nuclear magnetic resonance and has been successfully employed in medicine and biophysics for more than 15 years. In this examination modality, the subject is subjected to a strong, constant magnetic field. As a result thereof, the nuclear spins in the subject align, these having been previously irregularly oriented. Radiofrequency energy can then excite these “ordered” spins to a specific oscillation. This oscillation generates the actual measured signal in MRI that is picked up with suitable reception coils. By utilizing non-uniform magnetic fields, which are generated by gradient coils, the test subject can be spatially encoded in all three spatial directions, which is generally referred to as “location encoding”.
The acquisition of the data in MRI ensues in k-space (frequency domain). The MRI image or spatial domain is obtained from the MRI data in k-space by means of Fourier transformation. The location encoding of the subject that k-space defines ensues by means of gradients in all three spatial directions. A distinction is made between the slice selection (defines an exposure slice in the subject, usually the Z-axis), the frequency encoding (defines a direction in the slice, usually the x-axis) and the phase encoding (defines the second dimension within the slice, usually the y-axis).
As is also known, fast MR image reconstruction has become more and more important for real-time applications such as interventional imaging. Measurement data from MRI scanners can be nicely interpreted as complex-valued samples in the frequency space, more commonly referred to as k-space in the MR literature. The switching patterns of the magnetic field gradients applied during the measurement determine the sampling trajectories in k-space. Conventional MRI adopts Cartesian trajectories as illustrated in FIG. 2A. When the entire k-space is sampled in this fashion, the image is reconstructed by a straightforward application of the FFT to the measurement data. Other trajectories such as radial, spiral, and Lissajou are possible and generally called non-Cartesian or non-uniform sampling. In this work, we concentrate on the radial trajectories, which is shown in FIG. 2B.
Fast MR image reconstruction has become more and more important for real-time applications such as interventional imaging. An imaging speed of at least 5 frames per second is necessary in order to provide immediate feedback to the physicians. This motivates faster image acquisition and reconstruction.
A popular technique to reconstruct images from non-Cartesian trajectories in k-space is the so-called gridding method, see J. O'Sullivan, “Fast sinc function gridding algorithm for Fourier inversion in computer tomography,” IEEE Transaction on Medical Imaging M 1-4(4), pp. 200-207, 1985 and J. I. Jackson, C. H. Meyer, D. G. Nishimura, and A. Macovski, “Selection of a convolution function for Fourier inversion using gridding,” IEEE Transaction on Medical Imaging 10(3), pp. 473-478, 1991. The basic idea of gridding is to resample the raw measurement data on the Cartesian grid. Then, the Fast Fourier Transform (FFT) is used to reconstruct the target image.
B. Dale, M. Wendt, and J. L. Duerk proposed a fast implementation which exploits table look-up operations, discuss in a paper entitled “A rapid look-up table method for reconstructing MR images from arbitrary k-space trajectories,” IEEE Transaction on Medical Imaging 20(3), pp. 207-217. Given a reasonable window size for interpolation (usually 3×3 or 5×5), this algorithm provides acceptable image quality with high computational efficiency. However, on currently available MR image reconstruction hardware, this algorithm is still a performance bottleneck for real-time imaging applications.
As is also known, graphics processing units (GPUs) provide a powerful subset for parallel single instruction multiple data (SIMD)-type operations which outperform current CPUs even with streaming SIMD extension (SSE) 2 or 3 optimization. Furthermore, there is a second type of parallelism available on modern graphics cards. Today there are 16 to 24 pixel units on board executing SIMD operations in parallel.
S. Thilaka and L. Donald, GPU Gems 2, ch. Medical Image Reconstruction with the FFT, pp. 765-784. Addison-Wesley, describe one variation of GPU FFT implementations and applied it to MR image reconstruction for the case of Cartesian sampling. Interestingly, the GPU implementation of the filtered backprojection algorithm has been more widely investigated in the computed tomography (CT) literature, see F. Xu and K. Mueller, “Accelerating popular tomographic reconstruction algorithms on commodity PC graphics hardware”, IEEE Transaction of Nuclear Science, 2005.
Real-time MRI always strives for higher frame rates. MR image acquisition can be sped up further by using multiple receivers and exploiting the differences in their spatial sensitivity profiles to create images from only partially sampled k-space. However, the algorithms for reconstructing these data sets are also quite computational intensive and pose another bottleneck for real-time MR image reconstruction. With existing commercial hardware, either parallel image reconstruction, see S. Müller, M. Bock1, C. Fink, S. Zhlsdorff, P. Speier, and W. Semmler, “truefisp mri with active catheter tracking and real-time parallel image reconstruction,” in Proc. Intl. Soc. Mag. Reson. Med, 13th Scientific Meeting & Exhibition, p. 2158, 2005 (GRAPPA (M. A. Griswold and et al., “Generalized autocalibrating partially parallel acquisitions (grappa),” Magnetic Resonance in Medicine 47(6), pp. 1202-1210, 2002) in particular), or gridding can barely be achieved in real-time, but not both together.
Concurrently, using the graphic processing unit (GPU) to improve algorithm performance has become increasingly popular.
There are several publications on calculating the FFT on the GPU. For the CPU-based FFT, the FFTW library (Fastest Fourier Transform in the West, www.fftw.org) is claimed to be the fastest implementation using all possible optimizations like SSE2/3 and hyper threading. Moreland et al. presented in a paper entitled “The FFT on a GPU,” in HWWS '03, Proceedings of the ACM SIGGRAPH/EUROGRAPHICS conference on Graphics hardware, pp. 112-119, Eurographics Association, (Aire-la-Ville, Switzerland, Switzerland), 2003, the first GPU implementation with the performance being slower than FFTW. The FFTW library claims to be the fastest implementation of the FFT on the CPU using all kinds of optimizations like SSE2/3 and hyper threading. It is used as the golden standard to compare to GPU implementations.
Schiwietz Thomas and Westermann Ruediger, “GPU-PIV”, in proceedings of VMV 2004, pp. 151-158 and Jansen Thomas, von Rymon-Lipinski Bartosz, Hanssen Nils and Keeve Erwin, “Fourier Volume Rendering on the GPU Using a Split-Stream-FFT”, in proceedings of VMV 2004, pp. 395-403 presented GPU implementations with the performance being comparable to the FFTW on the CPU.
Sumanaweera Thilaka and Lui Donald, “Medical Image Reconstruction with the FFT”, in “GPU Gems 2”, Addison-Wesley (2005), pp. 765-784 describe different GPU FFT implementations and its applications in medical image reconstruction. The authors explain MR reconstruction from raw data sampled on a Cartesian grid. While there are not many papers on MR reconstruction using the GPU with radial trajectories, there are many papers about CT reconstruction using the GPU, see Fang Xu and Klaus Mueller, “Accelerating popular tomographic reconstruction algorithms on commodity PC graphics hardware,” IEEE Transaction of Nuclear Science, 2005.
As is known in the art, graphics cards provide readable and writable data structures in GPU memory. Basically, there are three types of data structures available: 1D, 2D, and 3D arrays, all referred to as textures. Among them, 2D textures provide the fastest update performance. The array elements of a texture are called texels. Each texel can have up to four float-typed components. The four components are often called the RGBA channels, as they are originally used to represent the red (R), green (G), blue (B), and alpha (A) intensities of a color for rendering.
To set values in a texture, the GPU processing pipeline consisting of several stages is utilized. This procedure is also referred to as “rendering” which is carried over from the original convention when screen pixels are drawn by the pipeline. FIG. 3 illustrates three important stages in the graphics card pipeline, which we elaborate below:                Vertex shader: A stream of vertices enters the vertex shader stage. A vertex is a data structure on the GPU with attributes such as position vectors for certain coordinate systems. In this stage, vertex attributes are manipulated according to the objectives of the applications. Traditionally, user-defined vertex shader programs are used to transform position vectors from one space to another. In one instance of our work, vertices are used to address the k-space coordinates.        Geometry shader (triangle setup): In this stage, sets of three vertices are selected to setup triangles according to a predefined GPU state. These triangles will be used by the next pipeline stage.        Rasterization/Pixel shader: Using the three vertices of each triangle defined above as knot points, this rasterization stage bilinearly interpolates all vertex attributes for the texels circumscribed by this triangle. This is done by the hardware and is highly computational efficient.        
Additional information of GPU programming is provided in: J. D. Foley, A. van Dam, S. K. Feiner, and J. F. Hughes, Computer graphics (2nd ed. in C): principles and practice, Addison-Wesley Longman Publishing Co., Inc., Boston, Mass., USA, 1996; M. Woo, Davis, and M. B. Sheridan, OpenGL Programming Guide: The Official Guide to Learning OpenGL, Version 1.2, Addison-Wesley Longman Publishing Co., Inc., Boston, Mass., USA, 1999; K. Gray, Microsoft DirectX 9 Programmable Graphics Pipeline, Microsoft Press, Redmond, Wash., USA, 2003; Information about GPGPU programming. http://www.gpgpu.org; and S. Thilaka and L. Donald, GPU Gems 2, ch. Medical Image Reconstruction with the FFT, pp. 765-784. Addison-Wesley, 2005 all incorporated herein by reference.